Connection timeout during map reduce process - hadoop

I am transferring data from one cluster to another cluster using the distcp command. I am getting the below problem during the map reduce process:
java.net.ConnectException: Connection timed out
I am using the below command:
/home/hadoop/hadoop/bin/hadoop distcp -update -skipcrccheck "hftp://source:50070//hive/warehouse//tablename" "hdfs://destination:9000//hive/warehouse//tablename"
How can I solve this problem? Solutions will be appreciated.

If you are trying to transfer data from one HDFS cluster to another, why are you using the hftp protocol?
hftp is for transferring data from an ftp server into HDFS.
Try this for HDFS to HDFS:
/home/hadoop/hadoop/bin/hadoop distcp -update -skipcrccheck "hdfs://source:50070/hive/warehouse/tablename" "hdfs://destination:9000/hive/warehouse/tablename"
For FTP to HDFS, use the correct FTP address.
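If the timeout persists, keep in mind that the distcp map tasks run on the destination cluster's worker nodes and connect to the source cluster directly, so those nodes need network access to it as well. A quick check from one of the destination worker nodes (the host names are the ones from the question, source-datanode is a placeholder for any DataNode host on the source cluster, and 50075 is only the default DataNode HTTP port):
nc -vz source 50070
nc -vz source-datanode 50075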

Related

Hadoop copy file to remote server

How do I achieve this: initiate a hadoop command from a server with the Hadoop client installed to move data from the Hadoop cluster to a remote Linux server that does not have the Hadoop client installed?
Since you can't send all the blocks of a file directly to a remote host, you would have to do this:
hadoop fs -get /path/src/file
scp ./file user@host:/path/dest/file
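If local disk space on the client is tight, the same transfer can be streamed without the intermediate copy; a sketch, assuming passwordless ssh access to the remote host:
hadoop fs -cat /path/src/file | ssh user@host "cat > /path/dest/file"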

Transfer of files from unsecured hdfs to secured hdfs cluster

I want to transfer files from an unsecured HDFS cluster to a kerberized cluster. I am using distcp to transfer the files, with the following command:
hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://<ip>:8020/<sourcedir> hdfs://<ip>:8020/<destinationdir>
I get the following error after executing the above command on the kerberized cluster:
java.io.EOFException: End of File Exception between local host is: "<xxx>"; destination host is: "<yyy>; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
This error happens because the cluster is blocked for RPC communication. In such cases the webhdfs protocol can be used, so the above distcp command can be rewritten as:
hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://xxx:8020/src_path webhdfs://yyy:50070/target_path
There is also a very good blog post on distcp that covers this.
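Before rerunning, it can help to confirm that the webhdfs endpoint on the secured cluster is reachable from a node where you hold a valid Kerberos ticket; a minimal check, assuming yyy exposes WebHDFS on the default port 50070 and user@REALM is a placeholder principal:
kinit user@REALM
hadoop fs -ls webhdfs://yyy:50070/target_path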

Transfer file from local machine of one cluster to HDFS of another cluster

I have 2 Hadoop clusters (A and B) and want to transfer a file from the local filesystem of cluster A to HDFS on cluster B. Is there a way to do it?
I tried copyFromLocal and put, but it looks like they don't copy the file over to the HDFS of cluster B; they report that the operation is not supported:
copyFromLocal: Not supported
FYI: the connection looks open, as I am able to read the HDFS of cluster B from cluster A (hadoop fs -ls hdfs://NNofB:port/path).
Not sure if there is a direct way from HDFS->HDFS, but you could get from HDFS on a node in ClusterA, scp the data to a node in ClusterB, then put that data into HDFS from that node in ClusterB.
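A sketch of that route (user, host, and path names are placeholders):
# on a node in cluster A: fetch the file from HDFS (skip this step if it is already on local disk)
hadoop fs -get /path/on/A/file ./file
# copy it to a node in cluster B
scp ./file user@nodeB:/tmp/file
# on that cluster B node: load it into cluster B's HDFS
hadoop fs -put /tmp/file /path/on/B/file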

Hadoop distcp not working

I am trying to copy data from one HDFS cluster to another. Any suggestion why the first command works but not the second?
(works)
hadoop distcp hdfs://abc.net:8020/foo/bar webhdfs://def.net:14000/bar/foo
(does not work)
hadoop distcp webhdfs://abc.net:50070/foo/bar webhdfs://def:14000/bar/foo
Thanks!
If the two clusters are running incompatible versions of HDFS, then
you can use the webhdfs protocol to distcp between them.
hadoop distcp webhdfs://namenode1:50070/source/dir webhdfs://namenode2:50070/destination/dir
The NameNode URI and NameNode HTTP port should be provided for both the source and the destination if you are using webhdfs.
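If you are unsure which HTTP port the destination NameNode actually listens on, you can read it from the client configuration on that cluster; a quick check using the standard property name (your value may differ, and an HttpFS gateway such as the one on port 14000 is configured separately):
hdfs getconf -confKey dfs.namenode.http-address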

Error with flume and remote hdfs sink

I'm trying to run Flume with an HDFS sink. HDFS is running properly on a different machine and I can even interact with it from the Flume machine, but when I run Flume and send events to it I get the following error:
2013-05-26 14:22:11,399 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:456)] HDFS IO error
java.io.IOException: Callable timed out after 25000 ms
at org.apache.flume.sink.hdfs.HDFSEventSink.callWithTimeout(HDFSEventSink.java:352)
at org.apache.flume.sink.hdfs.HDFSEventSink.append(HDFSEventSink.java:727)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:430)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:679)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:258)
at java.util.concurrent.FutureTask.get(FutureTask.java:119)
at org.apache.flume.sink.hdfs.HDFSEventSink.callWithTimeout(HDFSEventSink.java:345)
... 5 more
Again, connectivity is not an issue since I can interact with HDFS using the hadoop command line (the Flume machine is NOT a datanode).
The weirdest part is that after killing Flume I can see that the tmp file is created in HDFS, but it's empty (and the .tmp extension remains).
Any ideas as to why this could be happening? Thanks a lot!
Check three things. First, your firewall should be off, i.e. iptables should be stopped. Second, the value of the property agent.sinks.hdfs-sink.hdfs.path = hdfs://PUBLIC_IP:8020/user/hdfs/flume should use the public IP and not the private IP.
And change
agent.sinks.hdfs-sink.hdfs.callTimeout = 180000
because the default is 10000 ms, which is very little time for HDFS to react.
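Putting that together, the relevant part of the agent configuration would look roughly like this (the agent and sink names follow the answer; the path and timeout values are only examples):
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = hdfs://PUBLIC_IP:8020/user/hdfs/flume
agent.sinks.hdfs-sink.hdfs.callTimeout = 180000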
Thanks,
Shilpa
