I am using distcp between Hadoop 0.20 and Hadoop 2.2.0. I am getting an error during data transfer between these clusters using the distcp command below:
hadoop distcp -skipcrccheck -update
I am getting the following error:
HTTP_OK expected, received 400
Solutions will be appreciated.
Related
I want to transfer files from an unsecured HDFS cluster to a Kerberized cluster. I am using distcp to transfer the files, with the following command:
hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://<ip>:8020/<sourcedir> hdfs://<ip>:8020/<destinationdir>
I am getting the following error after I executed the above command in the kerberized cluster.
java.io.EOFException: End of File Exception between local host is: "<xxx>"; destination host is: "<yyy>"; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
This error occurs because the cluster is blocked for RPC communication. In such cases the webhdfs protocol can be used instead, so the above distcp can be rewritten as:
hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://xxx:8020/src_path webhdfs://yyy:50070/target_path
There is also a very good blog post on distcp that covers this in more detail.
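If the copy still fails, it can help to first confirm that the webhdfs endpoint is reachable from the host running distcp. A minimal check, assuming the default NameNode HTTP port 50070 and a placeholder host/path (add --negotiate -u : if the cluster you are querying is Kerberized), would be:
curl -i "http://yyy:50070/webhdfs/v1/target_path?op=GETFILESTATUS"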
I am trying to copy data from one HDFS cluster to another. The first command below works, but the second one does not. Any suggestions why?
(works)
hadoop distcp hdfs://abc.net:8020/foo/bar webhdfs://def.net:14000/bar/foo
(does not work)
hadoop distcp webhdfs://abc.net:50070/foo/bar webhdfs://def:14000/bar/foo
Thanks!
If the two clusters are running incompatible versions of HDFS, you can use the webhdfs protocol to run distcp between them:
hadoop distcp webhdfs://namenode1:50070/source/dir webhdfs://namenode2:50070/destination/dir
When using webhdfs, the NameNode URI and the NameNode HTTP port must be provided for both the source and the destination.
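If you are not sure which HTTP port a NameNode uses, on a Hadoop 2.x cluster you can read it from the configuration; dfs.namenode.http-address is the standard key (older releases use dfs.http.address):
hdfs getconf -confKey dfs.namenode.http-address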
I am setting up Spark with Hadoop 2.3.0 on Mesos 0.21.0. When I try to run Spark on the master, I get these error messages from the stderr of the Mesos slave:
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1229 12:34:45.923665 8571 fetcher.cpp:76] Fetching URI
'hdfs://10.170.207.41/spark/spark-1.2.0.tar.gz'
I1229 12:34:45.925240 8571 fetcher.cpp:105] Downloading resource from
'hdfs://10.170.207.41/spark/spark-1.2.0.tar.gz' to
'/tmp/mesos/slaves/20141226-161203-701475338-5050-6942-S0/frameworks/20141229-111020-701475338-5050-985-0001/executors/20141226-161203-701475338-5050-6942-S0/runs/8ef30e72-d8cf-4218-8a62-bccdf673b5aa/spark-1.2.0.tar.gz'
E1229 12:34:45.927089 8571 fetcher.cpp:109] HDFS copyToLocal failed:
hadoop fs -copyToLocal 'hdfs://10.170.207.41/spark/spark-1.2.0.tar.gz'
'/tmp/mesos/slaves/20141226-161203-701475338-5050-6942-S0/frameworks/20141229-111020-701475338-5050-985-0001/executors/20141226-161203-701475338-5050-6942-S0/runs/8ef30e72-d8cf-4218-8a62-bccdf673b5aa/spark-1.2.0.tar.gz'
sh: 1: hadoop: not found
Failed to fetch: hdfs://10.170.207.41/spark/spark-1.2.0.tar.gz
Failed to synchronize with slave (it's probably exited)
The interesting thing is that when I switch to the slave node and run the same command
hadoop fs -copyToLocal 'hdfs://10.170.207.41/spark/spark-1.2.0.tar.gz'
'/tmp/mesos/slaves/20141226-161203-701475338-5050-6942-S0/frameworks/20141229-111020-701475338-5050-985-0001/executors/20141226-161203-701475338-5050-6942-S0/runs/8ef30e72-d8cf-4218-8a62-bccdf673b5aa/spark-1.2.0.tar.gz'
it works fine.
When starting the Mesos slave, you have to specify the path to your Hadoop installation via the following parameter:
--hadoop_home=/path/to/hadoop
Without that it just didn't work for me, even though I had the HADOOP_HOME environment variable set up.
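For reference, a launch line along these lines is what this implies; the master address and Hadoop path below are placeholders for your own values:
mesos-slave --master=<mesos-master>:5050 --hadoop_home=/usr/lib/hadoop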
I am new to Hadoop. I am transferring data between Hadoop 0.20 and Hadoop 2.2.0 using the distcp command.
During the transfer I am getting the following error:
Check-sum mismatch between
hftp://10.0.3.28:50070/hive/warehouse/staging_precall_cdr/operator=idea/PRECALL_CDR_Assam_OCT_JAN.csv
and
hdfs://10.0.20.118:9000/user/hive/warehouse/PRECALL_CDR_Assam_OCT_JAN.csv
I have also tried -skipcrccheck and -Ddfs.checksum.type=CRC32, but neither resolved the issue.
Solutions will be appreciated.
It looks like a known issue with copying data between Hadoop 0.20 and 2.2.0, tracked in Jira: https://issues.apache.org/jira/browse/HDFS-3054.
A workaround is to preserve the block size and checksum type during the distcp copy with -pbc:
hadoop distcp -pbc <SRC> <DEST>
or skip the CRC check with the -skipcrccheck option (it has to be combined with -update):
hadoop distcp -skipcrccheck -update <SRC> <DEST>
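Putting the first workaround together with the hosts and paths from the error message above (the exact source and destination directories are assumptions based on that message), the command would look something like:
hadoop distcp -pbc hftp://10.0.3.28:50070/hive/warehouse/staging_precall_cdr/operator=idea/PRECALL_CDR_Assam_OCT_JAN.csv hdfs://10.0.20.118:9000/user/hive/warehouse/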
During a distcp between two versions of Hadoop I am getting the following error:
Server IPC version 9 cannot communicate with client version 3
I am using the following command:
hadoop distcp
Solutions will be appreciated.
distcp directly from hdfs:// to hdfs:// does not work between these versions, because the RPC protocols are incompatible.
You must run distcp on the destination cluster and use the hftp:// protocol (a read-only protocol) for the source cluster.
Note: the default ports are different for different protocols, so the command ends up looking like:
hadoop distcp hftp://<source>:50070/<src path> hdfs://<dest>:8020/<dest path>
or, with example host names filled in:
hadoop distcp hftp://foo.company.com:50070/data/baz hdfs://bar.company.com:8020/data/
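Because the checksum algorithms also differ between 0.20 and 2.x (see the check-sum mismatch question above), you may additionally need -update -skipcrccheck on top of the hftp source. A hedged full invocation, still run from the destination cluster and reusing the same example hosts, would be:
hadoop distcp -update -skipcrccheck hftp://foo.company.com:50070/data/baz hdfs://bar.company.com:8020/data/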