copying local files to hdfs is very very slow - hadoop

I am trying to copy local files to the hdfs using following command
$ hdfs dfs -put [local_path] [dfs_uri]
the commands works and copy the files, but 2 KB/sec or even slower.
any help appreciated :)
NOTE: Cloudera Express 5.5.1 (#8 built by jenkins on 20151201-1818 git: 2a7dfe22d921bef89c7ee3c2981cb4c1dc43de7b)

Related

How to copy a file from HDFS to a Windows machine?

I want to copy a .csv file from our Hadoop cluster in my local Desktop, so I can edit the file and upload back (replace).
I tried:
hadoop fs -copyToLocal /c_transaction_label.csv C:/Users/E_SJIRAK/Desktop
which yielded:
copyToLocal: '/Users/E_SJIRAK/Desktop': No such file or directory:
file:////Users/E_SJIRAK/Desktop
Help would be appreciated.
If you have SSH'd into the Hadoop cluster, then you cannot copyToLocal into Windows.
You need a 2 step process. Download from HDFS to the Linux environment. Then use SFTP (WinSCP, Filezilla, etc) or Putty scp command from Windows host to get files into your Windows machine.
Otherwise, you need to setup hadoop CLI command on Windows itself.

Uploading file in HDFS cluster

I was learning hadoop and till now I configured 3 Node cluster
127.0.0.1 localhost
10.0.1.1 hadoop-namenode
10.0.1.2 hadoop-datanode-2
10.0.1.3 hadoop-datanode-3
My hadoop Namenode directory looks like below
hadoop
bin
data-> ./namenode ./datanode
etc
logs
sbin
--
--
As I learned that when we upload a large file in the cluster in divide the file into blocks, I want to upload a 1Gig file in my cluster and want to see how it is being stored in datanode.
Can anyone help me with the commands to upload file and see where these blocks are being stored.
First, you need to check if you have Hadoop tools in your path, if not - I recommend integrate them into it.
One of the possible ways of uploading a file to HDFS:hadoop fs -put /path/to/localfile /path/in/hdfs
I would suggest you read the documentation and get familiar with high-level commands first as it will save you time
Hadoop Documentation
Start with "dfs" command, as this one of the most often used commands

Hortonworks VM - Hadoop batch upload?

Is there a way to batch upload files to Hadoop under a Hortonworks VM running CentOS? I see I can use the Ambari - Sandbox's HDFS Files tool, but that only allows uploading one-by-one. Apparently you could use Redgate's HDFS Explorer in the past, but it's no longer available. Hadoop is made to process big data, but it's absurd having to upload all files one-by-one...
Thank you!
Of course you can use the * wildcard in copyFromLocal, f.e.:
hdfs dfs -copyFromLocal input/* /tmp/input

MrJob spends a lot of time Copying local files into hdfs

The problem I'm encountering is this:
Having already put my input.txt (50MBytes) file into HDFS, I'm running
python ./test.py hdfs:///user/myself/input.txt -r hadoop --hadoop-bin /usr/bin/hadoop
It seems that MrJob spends a lot of time copying files to hdfs (again?)
Copying local files into hdfs:///user/myself/tmp/mrjob/test.myself.20150927.104821.148929/files/
Is this logical? Shouldn't it use input.txt directly from HDFS?
(Using Hadoop version 2.6.0)
Look at the contents of hdfs:///user/myself/tmp/mrjob/test.myself.20150927.104821.148929/files/ and you will see that input.txt isn't the file that's being copied into HDFS.
What's being copied is mrjob's entire python directory, so that it can be unpacked on each of your nodes. (mrjob assumes that mrjob is not installed on each of the nodes in your cluster.)

Issues when using hadoop to copy files from grid to local

I am trying to copy some files from the hadoop HDFS to local. I used the following command
hadoop fs -copyToLocal <hdfs path> <local path>
The size of the file is just 80M. I had run a job before where I had no issue in copying files of size 70MB to local. However, this time I am having Input/Output error
copyToLocal: Input/output error
can anyone tell me what could have gone wrong?
It might be a space constraint on your machine. I had the same issue because the file was too big for it to be moved to my local machine. Once I made space, I was able to perform the copyToLocal operation.

Resources