AWS Snowball User Guide
Using the Snowball Client
Transferring data from a Hadoop Distributed File System (HDFS) cluster is not supported with the Amazon S3 Adapter for Snowball; you must use the Snowball client. Following, you can find information about how to prepare for and perform HDFS data transfer.
Although you can write HDFS data to a Snowball, you can't write Hadoop data from a Snowball to your
local HDFS. As a result, export jobs are not supported for HDFS.
If you have a large number of small files, say under a megabyte each in size, then transferring them all
at once has a negative impact on your performance. This performance degradation is due to per-file
overhead when you transfer data from HDFS clusters.
Important
The batch option for the Snowball client's copy command is not supported for HDFS
data transfers. If you must transfer a large number of small files from an HDFS cluster, we
recommend that you find a method of collecting them into larger archive files, and then
transferring those. Keep in mind that these archives, not the individual files, are what is imported into Amazon S3. If you want the files in their original state, extract them from the archives after the import is complete.
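For example, one way to bundle small files is to stage them on your workstation with the Hadoop client and collect them into a tar archive before copying the archive to the Snowball. The following is a sketch only; the paths and bucket name are placeholders for your own values.

# Stage the small files from the HDFS cluster onto the workstation (placeholder paths).
hdfs dfs -get hdfs://192.0.2.0:9000/data/small-files /tmp/small-files

# Bundle the staged files into a single archive.
tar -cf /tmp/small-files.tar -C /tmp small-files

# Copy the archive to the Snowball (placeholder bucket name).
snowball cp /tmp/small-files.tar s3://your-bucket/archives/small-files.tar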
Preparing for Transferring Your HDFS Data with the Snowball Client
Before you transfer your HDFS (version 2.x) data, do the following:
Confirm the Kerberos authentication settings for your HDFS cluster – The Snowball client supports
Kerberos authentication for communicating with your HDFS in two ways: with the Kerberos login
already on the host system and with authentication through specifying a principal and keytab in the
snowball cp command. The following HDFS/Kerberos encryption types are known to work with
Snowball:
des3-cbc-sha1-kd
aes-128-cts-hmac-sha1-96
aes-256-cts-hmac-sha1-96
rc4-hmac (arcfour-hmac)
Alternatively, you can copy from an unsecured HDFS cluster.
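For example, to use the first method (a Kerberos login that is already on the host system), you can obtain a ticket with the standard Kerberos tools before you run the Snowball client. The keytab path, principal, and realm shown following are placeholders.

# Obtain a Kerberos ticket from a keytab (placeholder keytab path and principal).
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-user@EXAMPLE.COM

# Confirm that the ticket was granted.
klist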
Confirm that your workstation has the Hadoop client 2.x version installed on it – To use the
Snowball client, your workstation needs to have the Hadoop client 2.x installed, running, and able to
communicate with your HDFS 2.x cluster.
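For example, you can confirm both points from the workstation with the standard Hadoop tools. The Namenode URI shown following is the example value used later in this topic; substitute the URI for your own cluster.

# Confirm that the installed Hadoop client is version 2.x.
hadoop version

# Confirm that the workstation can reach the HDFS 2.x cluster.
hdfs dfs -ls hdfs://192.0.2.0:9000/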
Confirm the location of your site-specific configuration files – If you are using site-specific
configuration files, you need to use the --hdfsconfig parameter to pass the location of each XML
file.
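For example, a copy command that passes two site-specific configuration files might look like the following sketch. The file paths, source path, and bucket name are placeholders, and repeating --hdfsconfig once for each XML file is an assumption here; check the Snowball client's command documentation for the exact option syntax.

snowball cp --recursive \
    --hdfsconfig /etc/hadoop/conf/core-site.xml \
    --hdfsconfig /etc/hadoop/conf/hdfs-site.xml \
    hdfs://192.0.2.0:9000/data s3://your-bucket/data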
Confirm your Namenode URI – Each HDFS 2.x cluster has a Namenode. The cluster's core-site.xml file
includes a property element with the name fs.defaultFS and a value of the form IP address:port,
for example hdfs://192.0.2.0:9000. You use this value, the Namenode URI, as a part of the source
schema when you run Snowball client commands to perform operations on your HDFS cluster. For
more information, see Sources for the Snowball Client Commands (p. 54).
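For example, assuming that the Hadoop client on your workstation is configured against your cluster, you can print the Namenode URI directly from the cluster configuration.

# Print the fs.defaultFS value from the cluster configuration; the output has the
# form hdfs://IP address:port, for example hdfs://192.0.2.0:9000.
hdfs getconf -confKey fs.defaultFS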
Note
Currently, only HDFS 2.x clusters are supported with Snowball. You can still transfer data from
an HDFS 1.x cluster by staging the data that you want to transfer on a workstation, and then
copying that data to the Snowball with the standard snowball cp commands and options.
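For example, the staging approach for an HDFS 1.x cluster might look like the following sketch. The paths and bucket name are placeholders, and the recursive option shown is only one of the copy options described in the Snowball client documentation.

# Stage the data from the HDFS 1.x cluster onto the workstation (placeholder paths).
hadoop fs -get /data/from-hdfs1 /staging/from-hdfs1

# Copy the staged data to the Snowball with the standard snowball cp command.
snowball cp --recursive /staging/from-hdfs1 s3://your-bucket/from-hdfs1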
When you have confirmed the information listed previously, identify the Amazon S3 bucket that you
want your HDFS data imported into.
After your preparations for the HDFS import are complete, you can begin. If you haven't created your
job yet, follow the steps in Importing Data into Amazon S3 with AWS Snowball (p. 16) until you reach
Use the AWS Snowball Client (p. 22), and then return to this topic.