getting error during distcp between two different versions of hadoop clusters - hadoop

I am using distcp between Hadoop 0.20 and Hadoop 2.2.0. I am getting an error during data transfer between these clusters when using the distcp command below:
hadoop distcp -skipcrccheck -update
I get the following error:
HTTP_OK expected, received 400
Solutions will be appreciated.
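For reference, a complete distcp invocation between incompatible HDFS versions is usually run on the destination cluster and reads the source over the read-only hftp protocol, as in the sketch below; the hosts and paths are placeholders:
hadoop distcp -skipcrccheck -update hftp://<source-namenode>:50070/<src path> hdfs://<dest-namenode>:8020/<dest path>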

Related

Transfer of files from unsecured hdfs to secured hdfs cluster

I wanted to transfer files from an unsecured HDFS cluster to a kerberized cluster. I am using distcp to transfer the files, with the following command.
hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://<ip>:8020/<sourcedir> hdfs://<ip>:8020/<destinationdir>
I am getting the following error after I executed the above command in the kerberized cluster.
java.io.EOFException: End of File Exception between local host is: "<xxx>"; destination host is: "<yyy>; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
This error occurs because the cluster is blocked for RPC communication. In such cases the webhdfs protocol can be used instead, so the distcp above can be rewritten as:
hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://xxx:8020/src_path webhdfs://yyy:50070/target_path
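Note that because the command is executed from the kerberized cluster, the client also needs a valid Kerberos ticket before the copy starts. A minimal sketch, assuming a hypothetical keytab and principal:
kinit -kt /etc/security/keytabs/hdfs.keytab hdfs@EXAMPLE.COM
hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://xxx:8020/src_path webhdfs://yyy:50070/target_path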

Hadoop distcp not working

I am trying to copy data from one HDFS cluster to another. Any suggestion why the 1st command works but the 2nd does not?
(works)
hadoop distcp hdfs://abc.net:8020/foo/bar webhdfs://def.net:14000/bar/foo
(does not work)
hadoop distcp webhdfs://abc.net:50070/foo/bar webhdfs://def:14000/bar/foo
Thanks!
If the two clusters are running incompatible versions of HDFS, you can use the webhdfs protocol to distcp between them.
hadoop distcp webhdfs://namenode1:50070/source/dir webhdfs://namenode2:50070/destination/dir
When using webhdfs, the NameNode URI and the NameNode HTTP port must be provided for both the source and the destination.
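One quick way to confirm that a given host and port actually speak webhdfs (port 14000 is commonly an HttpFS gateway, while 50070 is the NameNode's own HTTP port) is to hit the REST endpoint directly; the host and path below are just the ones from the example above:
curl -i "http://abc.net:50070/webhdfs/v1/foo/bar?op=LISTSTATUS"
A 200 response with a JSON listing means the endpoint is reachable and the path exists.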

spark with Hadoop 2.3.0 on Mesos 0.21.0 with error "sh: 1: hadoop: not found" on slave

I am setting up Spark with Hadoop 2.3.0 on Mesos 0.21.0. When I try to run Spark from the master, I get these error messages in the stderr of the Mesos slave:
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1229 12:34:45.923665 8571 fetcher.cpp:76] Fetching URI
'hdfs://10.170.207.41/spark/spark-1.2.0.tar.gz'
I1229 12:34:45.925240 8571 fetcher.cpp:105] Downloading resource from
'hdfs://10.170.207.41/spark/spark-1.2.0.tar.gz' to
'/tmp/mesos/slaves/20141226-161203-701475338-5050-6942-S0/frameworks/20141229-111020-701475338-5050-985-0001/executors/20141226-161203-701475338-5050-6942-S0/runs/8ef30e72-d8cf-4218-8a62-bccdf673b5aa/spark-1.2.0.tar.gz'
E1229 12:34:45.927089 8571 fetcher.cpp:109] HDFS copyToLocal failed:
hadoop fs -copyToLocal 'hdfs://10.170.207.41/spark/spark-1.2.0.tar.gz'
'/tmp/mesos/slaves/20141226-161203-701475338-5050-6942-S0/frameworks/20141229-111020-701475338-5050-985-0001/executors/20141226-161203-701475338-5050-6942-S0/runs/8ef30e72-d8cf-4218-8a62-bccdf673b5aa/spark-1.2.0.tar.gz'
sh: 1: hadoop: not found
Failed to fetch: hdfs://10.170.207.41/spark/spark-1.2.0.tar.gz
Failed to synchronize with slave (it's probably exited)
The interesting thing is that when I switch to the slave node and run the same command:
hadoop fs -copyToLocal 'hdfs://10.170.207.41/spark/spark-1.2.0.tar.gz'
'/tmp/mesos/slaves/20141226-161203-701475338-5050-6942-S0/frameworks/20141229-111020-701475338-5050-985-0001/executors/20141226-161203-701475338-5050-6942-S0/runs/8ef30e72-d8cf-4218-8a62-bccdf673b5aa/spark-1.2.0.tar.gz'
it works fine.
When starting the Mesos slave, you have to specify the path to your Hadoop installation through the following parameter:
--hadoop_home=/path/to/hadoop
Without that it just didn't work for me, even though I had the HADOOP_HOME environment variable set up.
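Putting that together, the slave launch ends up looking roughly like the sketch below; the master address and the Hadoop path are placeholders for your own setup:
mesos-slave --master=<master-ip>:5050 --hadoop_home=/path/to/hadoop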

getting Check-sum mismatch during data transfer between two different versions of hadoop

I am new to Hadoop. I am transferring data between Hadoop 0.20 and Hadoop 2.2.0 using the distcp command.
During the transfer I am getting the error below:
Check-sum mismatch between
hftp://10.0.3.28:50070/hive/warehouse/staging_precall_cdr/operator=idea/PRECALL_CDR_Assam_OCT_JAN.csv
and
hdfs://10.0.20.118:9000/user/hive/warehouse/PRECALL_CDR_Assam_OCT_JAN.csv
I have also tried -skipcrccheck and -Ddfs.checksum.type=CRC32, but neither solved the problem.
Solutions will be appreciated.
This looks like a known issue when copying data between the 0.20 and 2.2.0 Hadoop versions; see https://issues.apache.org/jira/browse/HDFS-3054.
A workaround is to preserve block size and checksum type during the distcp copy with -pbc.
hadoop distcp -pbc <SRC> <DEST>
OR
Use Skip CRC check using -skipcrccheck option
hadoop distcp -skipcrccheck -update <SRC> <DEST>
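Combining the second workaround with the source and destination from the error message above, the full invocation would look roughly like this (paths trimmed to the parent directories shown in the error, so adjust to your layout):
hadoop distcp -update -skipcrccheck hftp://10.0.3.28:50070/hive/warehouse/staging_precall_cdr hdfs://10.0.20.118:9000/user/hive/warehouse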

distcp not working between hadoop versions 2.0.0 and 0.20

During distcp between two versions of Hadoop I am getting the error below:
Server IPC version 9 cannot communicate with client version 3
I am using the command below:
hadoop distcp
Solutions will be appreciated.
distcp does not work between different versions when both source and destination use hdfs://.
You must run the distcp on the destination cluster and use the hftp:// protocol (a read-only protocol) on the source cluster.
Note: the default ports are different for different protocols, so the command ends up looking like:
hadoop distcp hftp://<source>:50070/<src path> hdfs://<dest>:8020/<dest path>
or, with example values:
hadoop distcp hftp://foo.company.com:50070/data/baz hdfs://bar.company.com:8020/data/
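If you are unsure which HTTP port the source NameNode exposes for hftp, it is the NameNode HTTP address (dfs.http.address on older releases, dfs.namenode.http-address on newer ones); on a 2.x client you can read it with getconf, for example:
hdfs getconf -confKey dfs.namenode.http-address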
