Copying a local folder to HDFS through R - hadoop

I am trying to export a folder from my local file system to HDFS. I am running my code through R. How can I do this?
Hope for suggestions

You can use R's system command to do that easily:
system("hadoop fs -put /path/to/file /path/in/hdfs")
You can also use the rhdfs package, in particular the functions hdfs.write or hdfs.copy, which should do the same.
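For example, here is a minimal sketch combining both approaches; the paths and the HADOOP_CMD location are placeholders for your own setup, and it assumes the rhdfs package is installed and Hadoop is on your PATH:
# Option 1: shell out to the Hadoop CLI; -put accepts a directory,
# so the whole local folder is copied recursively.
system("hadoop fs -put /path/to/local/folder /path/in/hdfs")
# Option 2: use rhdfs from R; HADOOP_CMD must point at your hadoop binary.
Sys.setenv(HADOOP_CMD = "/usr/local/hadoop/bin/hadoop")  # placeholder path, adjust to your install
library(rhdfs)
hdfs.init()
hdfs.put("/path/to/local/folder", "/path/in/hdfs")
Both options should give the same result; hdfs.put is rhdfs's helper for copying from the local filesystem into HDFS.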

Related

How to copy file from local directory in another drive to HDFS in Apache Hadoop?

I'm new to Apache Hadoop and I'm trying to copy a simple text file from my local directory to HDFS on Hadoop, which is up and running. However, Hadoop is installed in D: while my file is in C:.
If I use the -put or copyFromLocal command in cmd with the file in the aforementioned drive, it doesn't allow me to do that. However, if I place the text file in the same D: drive, the file is correctly uploaded to Hadoop and can be seen on Hadoop localhost. The code that works with the file and Hadoop in the same drive is as follows:
hadoop fs -put /test.txt /user/testDirectory
If my file is in a separate drive, I get the error '/test.txt': No such file or directory. I've tried variations of /C/pathOfFile/test.txt but to no avail, so in short, I need to know how to access a local file in another directory, specifically with respect to the -put command. Any help for this probably amateurish question will be appreciated.
If your current cmd session is in D:\, then your command looks at the root of that drive.
You could try prefixing the path:
file:/C:/test.txt
Otherwise, cd to the path containing your file first, then just -put test.txt or -put .\test.txt
Note: HDFS doesn't know about the difference between C: and D: unless you actually set fs.defaultFS to something like file:/D:/hdfs
From your question I assume that you have installed Hadoop in a Virtual Machine (VM) on a Windows installation. Please provide more details if this assumption is incorrect. The issue is that your VM considers drive D: the local directory, which is where -put and -copyFromLocal look for files. C: is currently not visible to these commands.
You need to mount drive C: in your VM in order to make its files available locally to Hadoop. There are guides out there depending on your VM. Be careful while doing so, in order not to mishandle any Windows installation files.

Hadoop copyFromLocal: '.': No such file or directory

I use Windows 8 with the cloudera-quickstart-vm-5.4.2-0 VirtualBox VM.
I downloaded a text file as words.txt into the Downloads folder.
I changed directory to Downloads and used hadoop fs -copyFromLocal words.txt
I get the no such file or directory error.
Can anyone explain why this is happening and how to solve this issue?
Someone told me this error occurs when Hadoop is in safe mode, but I have made sure that the safe mode is OFF.
It's happening because hdfs:///user/cloudera doesn't exist.
Running hdfs dfs -ls probably gives you a similar error.
Without a specified destination folder, it looks for ., the current HDFS directory for the UNIX account running the command.
You must run hdfs dfs -mkdir "/user/$(whoami)" before your current UNIX account can use HDFS, or you can specify another existing HDFS location to copy to.

Copied file from HDFS doesn't show up on local machine

I copied a folder from HDFS to my local machine using the following command:
hdfs dfs -copyToLocal hdfs:///user/myname/output-64-32/ ~/Documents/fromHDFS
But I cannot see any files in the fromHDFS folder, and when I try to run the command again, it says "File exists".
Any help is really appreciated.
Thanks.
Try these:
rm -r ~/Documents/fromHDFS/*
hdfs dfs -get /user/myname/output-64-32/ ~/Documents/fromHDFS/

No such file or directory when copying a file to Hadoop

I'm a beginner in Hadoop. When I use
hadoop fs -ls /
and
hadoop fs -mkdir /pathname
everything is OK, but I want to use my CSV file in Hadoop. My file is on the C: drive. I used the -put, wget, and copyFromLocal commands like these:
Hadoop fs -put c:/ path / myhadoopdir
Hadoop fs copyFromLoacl c:/...
Wget ftp://c:/...
But for the first two I get the error no such file or directory /myfilepathinc:
And for the third:
Unable to resolve host address "c"
Thanks for your help
Looking at your commands, it seems that there could be a couple of reasons for this issue.
Hadoop fs -put c:/ path / myhadoopdir
Hadoop fs copyFromLoacl c:/...
Use hadoop fs -copyFromLocal correctly.
Check your local file permissions. You have to give full access to that file.
You have to give the absolute path location both locally and in HDFS.
Hope it will work for you.
salmanbw's answer is correct. To be more clear:
Suppose your file is "c:\testfile.txt"; use the command below.
Also make sure you have write permission to your directory in HDFS.
hadoop fs -copyFromLocal c:\testfile.txt /HDFSdir/testfile.txt

How can I run the wordCount example in Hadoop?

I'm trying to run the following example in Hadoop: http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
However, I don't understand the commands that are being used, specifically how to create an input file, upload it to HDFS, and then run the word count example.
I'm trying the following command:
bin/hadoop fs -put inputFolder/inputFile inputHDFS/
however it says
put: File inputFolder/inputFile does not exist
I have this folder inside the Hadoop folder (the parent of "bin"), so why is this happening?
thanks :)
Hopefully this isn't overkill:
Assuming you've installed Hadoop (in either local, distributed, or pseudo-distributed mode), you have to make sure Hadoop's bin directory and other miscellaneous parameters are in your PATH. On Linux/Mac this is a simple matter of adding the following to one of your shell files (~/.bashrc, ~/.zshrc, ~/.bash_profile, etc., depending on your setup and preferences):
export HADOOP_INSTALL_DIR=/path/to/hadoop # /opt/hadoop or /usr/local/hadoop, for example
export JAVA_HOME=/path/to/jvm
export PATH=$PATH:$HADOOP_INSTALL_DIR/bin
export PATH=$PATH:$HADOOP_INSTALL_DIR/sbin
Then run exec $SHELL or reload your terminal. To verify the Hadoop installation, type hadoop version and check that no errors are raised. Assuming you followed the instructions on how to set up a single-node cluster and started the Hadoop services with the start-all.sh command, you should be good to go:
In pseudo-dist mode, your file system pretends to be HDFS. So just reference any path like you would with any other Linux command, like cat or grep. This is useful for testing, and you don't have to copy anything around.
With an actual HDFS running, I use the copyFromLocal command (I find it to just work):
$ hadoop fs -copyFromLocal ~/data/testfile.txt /user/hadoopuser/data/
Here I've assumed you're performing the copy on a machine that is part of the cluster. Note that if your hadoopuser is the same as your UNIX username, you can drop the /user/hadoopuser/ part; everything is implicitly assumed to happen inside your HDFS user dir. Also, if you're using a client machine to run commands on a cluster (you can do that too!), know that you'll need to pass the cluster's configuration using the -conf flag right after hadoop fs, like:
# assumes your username is the same as the one on HDFS, as explained earlier
$ hadoop fs -conf ~/conf/hadoop-cluster.xml -copyFromLocal ~/data/testfile.txt data/
For the input file, you can use any file(s) that contain text. I used some random files from the Gutenberg site.
Last, to run the wordcount example (it comes as a jar in the Hadoop distro), just run the command:
$ hadoop jar /path/to/hadoop-*-examples.jar wordcount /user/hadoopuser/data/ /user/hadoopuser/output/wc
This will read everything in the data/ folder (it can have one or many files) and write everything to the output/wc folder, all on HDFS. If you run this in pseudo-dist mode, there is no need to copy anything; just point it to the proper input and output dirs. Make sure the wc dir doesn't exist, or your job will crash (it cannot write over an existing dir). See this for a better wordcount breakdown.
Again, all this assumes you've made it through the setup stages successfully (no small feat).
Hope this wasn't too confusing - good luck!
