Hadoopy won't get past mkdir - hadoop

I'm currently working on a project that makes use of Hadoop (2.7.0). I have a two-node cluster configured and working (for the most part). I can run mapper/reducer jobs manually without any problems, but when I try to start a job with hadoopy I get an error. After debugging, I see it originates from the following command executed by hadoopy:
hadoop fs -mkdir _hadoopy_tmp
This yields the error:
mkdir: '_hadoopy_tmp': No such file or directory
When doing it manually, mkdir works fine if I start the directory name with a '/'. If I don't start with '/', I get the same error as above. The same goes for the ls command (ls / gives me a result, ls . gives me an error that there is no such file or directory). I'm guessing I screwed up the Hadoop configuration somewhere; I just can't figure out where.
EDIT: to clarify: I'm aware that you should use the mkdir command with an absolute path (i.e. with a '/' in front of it). When interacting with Hadoop through the terminal I do this. However, the hadoopy framework doesn't seem to do it (it throws the error shown above). My question is: is there a fix/workaround for this in hadoopy, or do I have to rewrite its source code?

I don't understand what 'manually' means for you, but the errors you are seeing make perfect sense in my opinion: if you want to create a directory in the Hadoop FS, you should give the exact path. There isn't a problem there, and you didn't screw up anything. I recommend doing it this way:
$HADOOP_HOME/bin/hdfs dfs -mkdir /name_of_new_folder/
PS: I don't know anything about hadoopy; I'm just speaking from my experience with Hadoop (and some things should be handled the same way in both, which is why I'm answering here; please correct me if I'm wrong).
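If hadoopy really does have to create a relative path, one possible workaround (an assumption on my part, not something hadoopy documents) is that relative HDFS paths resolve against /user/<username>, which often doesn't exist yet on a fresh cluster. Creating that home directory first should let the relative mkdir succeed:
# create the HDFS home directory for the current user (path assumes the default /user layout)
$ hadoop fs -mkdir -p /user/$(whoami)
# relative paths such as _hadoopy_tmp should now resolve under /user/<username>
$ hadoop fs -mkdir _hadoopy_tmp
$ hadoop fs -ls .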

Related

Error while trying to execute hdfs command

While trying to copy a file from local disk to HDFS it shows an error even though the syntax is proper. It says no such file or directory even though the file physically exists.
What should I do? I have tried all three commands to transfer/copy the file.
hadoop fs -put /Users/Sneha/Desktop/bank.xlsx /user/gulafsha_parveen_simplilearn/myproject
Shows Error:
no such file for /Users/Sneha/Desktop/bank.xlsx
I think one good way to troubleshoot would be to do an ls with the same user. Something like below.
$ ls /Users/Sneha/Desktop/bank.xlsx
Hope the output will make things clear.
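Assuming the same paths from the question, a fuller check before retrying the copy might look like this (the HDFS target directory is taken from the question and may need to be created first):
# check the local file exists and is readable by the user running the hadoop command
$ ls -l /Users/Sneha/Desktop/bank.xlsx
# check the target HDFS directory exists; create it if it doesn't
$ hadoop fs -ls /user/gulafsha_parveen_simplilearn/myproject
$ hadoop fs -mkdir -p /user/gulafsha_parveen_simplilearn/myproject
# then retry the copy
$ hadoop fs -put /Users/Sneha/Desktop/bank.xlsx /user/gulafsha_parveen_simplilearn/myproject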

What is the difference between moveFromLocal vs put and copyToLocal vs get in hadoop hdfs commands

Basically, what is the major difference between moveFromLocal/copyToLocal and the put/get commands in the Hadoop CLI?
moveFromLocal: Similar to put command, except that the source localsrc is deleted after it’s copied.
copyToLocal: Similar to get command, except that the destination is restricted to a local file reference.
Source.
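A quick illustration with placeholder paths (the file and directory names below are made up):
# put: copy sample.txt into HDFS, keeping the local copy
$ hadoop fs -put sample.txt /user/hadoopuser/data/
# moveFromLocal: copy sample.txt into HDFS, then delete the local copy
$ hadoop fs -moveFromLocal sample.txt /user/hadoopuser/data/
# get: copy a file out of HDFS to a local destination
$ hadoop fs -get /user/hadoopuser/data/sample.txt /tmp/
# copyToLocal: same as get, but the destination must be a local file reference
$ hadoop fs -copyToLocal /user/hadoopuser/data/sample.txt /tmp/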

Job not running: 'No such file or directory' but the script exists

I'm a bioinformatician, new to the community and quite new to working with bash commands.
I recently encountered a very trivial error message, but for me the issue is a bit complex to fix.
Briefly, when I launch a script with the qsub command (from the master node), the job does not work and I find the following error message in the 'log' file:
Fatal error: cannot open file
'/data/users/genethongandolfi/scripts/multi454.mse/multi454fasta.manip.r':
No such file or directory
This sounds quite strange to me since the path to the script file called 'multi454fasta.manip.r' is correct (I already checked with the 'find' command).
I also tried to move the script into the home directory /home/genethongandolfi/scripts, and the error message changes: the job runs because the system finds the script, but it does not find the input file in the usual path /data/users/genethongandolfi/analysis/etc... . It seems that the /data/users/... path is not recognized when I launch a job.
There are a couple of reasons why this could be the case:
The file location on the slave node is different from the master
The file permissions on the slave do not permit access to the file
If you can, try logging into the slave node, changing to the user running the job, and checking the file location and permissions.
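As a concrete sketch (the hostname below is a placeholder for whatever your compute node is called):
# log into the compute node and switch to the user the job runs as
$ ssh node01                       # placeholder hostname
$ su - genethongandolfi
# check that the script and the input path are visible and readable from this node
$ ls -l /data/users/genethongandolfi/scripts/multi454.mse/multi454fasta.manip.r
$ ls -ld /data/users/genethongandolfi/analysis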
Had the same error for a simple C program in the form of an .exe.
Removing the .exe from the shell script did eventually fix it.
So instead of ./program.exe write ./program

hadoop fs –put Unknown command

Created a folder [LOAN_DATA] with the below command
hadoop fs -mkdir hdfs://masterNode:8020/tmp/hadoop-hadoop/dfs/LOAN_DATA
Now, when I list the contents of the directory /tmp/hadoop-hadoop/dfs using the web UI, it shows LOAN_DATA.
But when I want to store some data from a TXT file into the LOAN_DATA directory using put or copyFromLocal, I get
put: Unknown command
Command used:
hadoop fs –put '/home/hadoop/my_work/Acquisition_2012Q1.txt' hdfs://masterNode:8020/tmp/hadoop-hadoop/dfs/LOAN_DATA
How to resolve this issue?
This issue may occur when you copy-paste a command and run it. It happens because the document you copied from used a different character (such as an en dash) in place of the plain hyphen.
For example:
If you copy/paste and execute the command -
hdfs dfs -put workflow.xml /testfile/workflow.xml
You may get-
–put: Unknown command
OR
–p-t: Unknown command
This happens because the copy is made from a rich-text or UTF-8 document, and the - or u (or any other character) may have been replaced by a visually similar character from a different code point.
So just type the command on the terminal (don't copy/paste) and you should be fine.
Alternatively, if you are running a shell script which was copied from some other editor, run dos2unix on the script before running it on the Linux terminal.
E.g.: dos2unix <shell_script.sh>
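If you want to confirm what actually got pasted, one option (just a sketch; the filename is a placeholder) is to dump the bytes and look for anything outside plain ASCII:
# show multi-byte / non-printing characters in a suspect script
$ cat -A suspect_script.sh
# or dump the pasted command itself; an en dash shows up as the 3-byte UTF-8 sequence 342 200 223 instead of a plain '-'
$ echo 'hadoop fs –put workflow.xml /testfile/workflow.xml' | od -c | head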
I tried your command and it appears there is a typo in the above command 'hadoop fs –put ....'.
Instead of '–put', use '-put' or '-copyFromLocal'. The problem is with '–'; the correct character is '-'. As such, the error is obvious :-)
Here is my example (using a get command instead of put):
$ hadoop fs –get /tmp/hadoop-data/output/* /tmp/hadoop-data/output/
–get: Unknown command
$ hadoop fs -get /tmp/hadoop-data/output/* /tmp/hadoop-data/output/
get: `/tmp/hadoop-data/output/part-r-00000': File exists
Anand's answer is, of course, correct. But it might not have been a typo so much as a subtle trap. When people are learning a new technology, they often copy and paste commands from websites and blogs, and what was originally entered as a plain hyphen can end up copied as an en dash. Dashes differ from hyphens only in that they are a tad longer, so the mistake is hard to spot, but since a dash is a completely different character the command is wrong, that is, "not found".

How can I run the wordCount example in Hadoop?

I'm trying to run the following example in hadoop: http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
However, I don't understand the commands being used, specifically how to create an input file, upload it to HDFS, and then run the word count example.
I'm trying the following command:
bin/hadoop fs -put inputFolder/inputFile inputHDFS/
however it says
put: File inputFolder/inputFile does not exist
I have this folder inside the hadoop directory (the one containing "bin"), so why is this happening?
thanks :)
Hopefully this isn't overkill:
Assuming you've installed hadoop (in either local, distributed or pseudo-distributed mode), you have to make sure hadoop's bin directory is on your path and the related environment variables are set. On Linux/Mac this is a simple matter of adding the following to one of your shell files (~/.bashrc, ~/.zshrc, ~/.bash_profile, etc. - depending on your setup and preferences):
export HADOOP_INSTALL_DIR=/path/to/hadoop # /opt/hadoop or /usr/local/hadoop, for example
export JAVA_HOME=/path/to/jvm
export PATH=$PATH:$HADOOP_INSTALL_DIR/bin
export PATH=$PATH:$HADOOP_INSTALL_DIR/sbin
Then run exec $SHELL or reload your terminal. To verify the hadoop binary is found, type hadoop version and see that no errors are raised. Assuming you followed the instructions on how to set up a single-node cluster and started the hadoop services with the start-all.sh command, you should be good to go.
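As a quick sanity check (just a sketch; the exact daemon list depends on your setup):
# reload the shell so the new PATH takes effect
$ exec $SHELL
# verify the hadoop binary is found and prints its version without errors
$ hadoop version
# with the services started via start-all.sh, jps should list the Hadoop daemons
$ jps    # expect NameNode, DataNode, ResourceManager, NodeManager, etc. in pseudo-distributed mode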
In local (standalone) mode there is no real HDFS; paths simply refer to your local file system, so just reference any path like you would with any other linux command, like cat or grep. This is useful for testing, and you don't have to copy anything around.
With an actual HDFS running, I use the copyFromLocal command (I find it to just work):
$ hadoop fs -copyFromLocal ~/data/testfile.txt /user/hadoopuser/data/
Here I've assumed you're performing the copy on a machine that is part of the cluster. Note that if your hadoopuser is the same as your unix username, you can drop the /user/hadoopuser/ part - it is implicitly assumed that everything happens inside your HDFS user dir. Also, if you're using a client machine to run commands against a cluster (you can do that too!), know that you'll need to pass the cluster's configuration using the -conf flag right after hadoop fs, like:
# assumes your username is the same as the one on HDFS, as explained earlier
$ hadoop fs -conf ~/conf/hadoop-cluster.xml -copyFromLocal ~/data/testfile.txt data/
For the input file, you can use any file(s) that contain text. I used some random files from the Gutenberg site.
Last, to run the wordcount example (comes as jar in hadoop distro), just run the command:
$ hadoop jar /path/to/hadoop-*-examples.jar wordcount /user/hadoopuser/data/ /user/hadoopuser/output/wc
This will read everything in the data/ folder (it can have one or many files) and write everything to the output/wc folder - all on HDFS. If you run this in local mode, there's no need to copy anything - just point it at the proper input and output dirs. Make sure the wc dir doesn't exist beforehand or your job will crash (it cannot write over an existing dir). See this for a better wordcount breakdown.
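Once the job finishes, you can inspect the result straight from HDFS (the paths below match the example above; the part file name assumes the default single reducer):
# list the output directory; each reducer writes a part-r-* file plus a _SUCCESS marker
$ hadoop fs -ls /user/hadoopuser/output/wc
# print the word counts
$ hadoop fs -cat /user/hadoopuser/output/wc/part-r-00000 | head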
Again, all this assumes you've made it through the setup stages successfully (no small feat).
Hope this wasn't too confusing - good luck!
