AWS Snowball User Guide
Using the Snowball Client
Transferring data from a Hadoop Distributed File System (HDFS) cluster is not supported with the Amazon S3 Adapter for Snowball; you must use the Snowball client. Following, you can find information about how to prepare for and perform HDFS data transfer.
Although you can write HDFS data to a Snowball, you can't write Hadoop data from a Snowball to your
local HDFS. As a result, export jobs are not supported for HDFS.
If you have a large number of small files, say under a megabyte each in size, then transferring them all
at once has a negative impact on your performance. This performance degradation is due to per-file
overhead when you transfer data from HDFS clusters.
Important
The batch option for the Snowball client's copy command is not supported for HDFS
data transfers. If you must transfer a large number of small files from an HDFS cluster, we
recommend that you find a method of collecting them into larger archive files, and then
transferring those. Keep in mind that these archives, not the individual files, are what is imported into Amazon S3. If you want the files in their original state, extract them from the archives after the import is complete.
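For example, one way to bundle small files is to stage them on your workstation with the Hadoop client and collect them into a tar archive before copying the archive to the Snowball. The following is a sketch only; the paths and bucket name are placeholders for your own values.

# Stage the small files from the HDFS cluster onto the workstation (placeholder paths).
hdfs dfs -get hdfs://192.0.2.0:9000/data/small-files /tmp/small-files

# Bundle the staged files into a single archive.
tar -cf /tmp/small-files.tar -C /tmp small-files

# Copy the archive to the Snowball (placeholder bucket name).
snowball cp /tmp/small-files.tar s3://your-bucket/archives/small-files.tar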
Preparing for Transferring Your HDFS Data with the Snowball Client
Before you transfer your HDFS (version 2.x) data, do the following:
Confirm the Kerberos authentication settings for your HDFS cluster – The Snowball client supports
Kerberos authentication for communicating with your HDFS in two ways: with the Kerberos login
already on the host system and with authentication through specifying a principal and keytab in the
snowball cp command. The following HDFS/Kerberos encryption types are known to work with
Snowball:
des3-cbc-sha1-kd
aes-128-cts-hmac-sha1-96
aes-256-cts-hmac-sha1-96
rc4-hmac (arcfour-hmac)
Alternatively, you can copy from an unsecured HDFS cluster.
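For example, to use the first method (a Kerberos login that is already on the host system), you can obtain a ticket with the standard Kerberos tools before you run the Snowball client. The keytab path, principal, and realm shown following are placeholders.

# Obtain a Kerberos ticket from a keytab (placeholder keytab path and principal).
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-user@EXAMPLE.COM

# Confirm that the ticket was granted.
klist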
Confirm that your workstation has the Hadoop client 2.x version installed on it – To use the
Snowball client, your workstation needs to have the Hadoop client 2.x installed, running, and able to
communicate with your HDFS 2.x cluster.
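For example, you can confirm both points from the workstation with the standard Hadoop tools. The Namenode URI shown following is the example value used later in this topic; substitute the URI for your own cluster.

# Confirm that the installed Hadoop client is version 2.x.
hadoop version

# Confirm that the workstation can reach the HDFS 2.x cluster.
hdfs dfs -ls hdfs://192.0.2.0:9000/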
Confirm the location of your site-specific configuration files – If you are using site-specific
configuration files, you need to use the --hdfsconfig parameter to pass the location of each XML
file.
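For example, a copy command that passes two site-specific configuration files might look like the following sketch. The file paths, source path, and bucket name are placeholders, and repeating --hdfsconfig once for each XML file is an assumption here; check the Snowball client's command documentation for the exact option syntax.

snowball cp --recursive \
    --hdfsconfig /etc/hadoop/conf/core-site.xml \
    --hdfsconfig /etc/hadoop/conf/hdfs-site.xml \
    hdfs://192.0.2.0:9000/data s3://your-bucket/data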
Confirm your Namenode URI – Each HDFS 2.x cluster has a Namenode. The cluster's core-site.xml file
includes a property element with the name fs.defaultFS and a value of the form IP address:port,
for example hdfs://192.0.2.0:9000. You use this value, the Namenode URI, as a part of the source
schema when you run Snowball client commands to perform operations on your HDFS cluster. For
more information, see Sources for the Snowball Client Commands (p. 54).
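For example, assuming that the Hadoop client on your workstation is configured against your cluster, you can print the Namenode URI directly from the cluster configuration.

# Print the fs.defaultFS value from the cluster configuration; the output has the
# form hdfs://IP address:port, for example hdfs://192.0.2.0:9000.
hdfs getconf -confKey fs.defaultFS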
Note
Currently, only HDFS 2.x clusters are supported with Snowball. You can still transfer data from
an HDFS 1.x cluster by staging the data that you want to transfer on a workstation, and then
copying that data to the Snowball with the standard snowball cp commands and options.
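For example, the staging approach for an HDFS 1.x cluster might look like the following sketch. The paths and bucket name are placeholders, and the recursive option shown is only one of the copy options described in the Snowball client documentation.

# Stage the data from the HDFS 1.x cluster onto the workstation (placeholder paths).
hadoop fs -get /data/from-hdfs1 /staging/from-hdfs1

# Copy the staged data to the Snowball with the standard snowball cp command.
snowball cp --recursive /staging/from-hdfs1 s3://your-bucket/from-hdfs1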
When you have confirmed the information listed previously, identify the Amazon S3 bucket that you
want your HDFS data imported into.
After your preparations for the HDFS import are complete, you can begin. If you haven't created your
job yet, follow the steps in Importing Data into Amazon S3 with AWS Snowball (p. 16) until you reach
Use the AWS Snowball Client (p. 22), and then return to this topic.