HADOOP Error: Could not find or load main class org.apache.hadoop.fs.FsShell - hadoop

I'm building a wordcounter program and I want to create a working directory in the HDFS, but when I execute hdfs dfs -mkdir wordcount or other commands from hdfs dfs command list it returns me Error: Could not find or load main class org.apache.hadoop.fs.FsShell. Google have told me that maybe it is a problem with path variable, but i checked it and it's ok. Thank you!

The error means hadoop classpath command has issues, not PATH.
And you don't need HDFS to run or learn MapReduce / Spark WordCount code. It works fine on your local filesystem as well

Related

can't run a MapReduce Job on Hadoop

I'm trying to run a mapreduce job took from the internet. This job takes in input a 'points.dat' file and makes a k-means clustering on it. It should produce a file 'centroids.dat' and a file with points matched to their own centroid. A couple of months this was working, but now i'm trying to re-execute on a new installation.
I made
bin/hdfs dfs -copyFromLocal ..//..//../home/<myusername>/Downloads/points.dat
Everything is fine and the file appears in the web service tool in the /user// path on hdfs . Jps is ok
The jar requests args:
<input> <output> <n clusters>
so i made
bin/hadoop jar ../../../home/<myusername>/Downloads/kmeans.jar /user/<myusername>/ /out 3
it creates a "centroids.dat" file in /user/ and a out/ directory. As much as i can understand it tries to re-read "centroids.dat" to execute. So it ends with some failures like
"failed creating symlink /tmp/hadoop-<myusername>/mapred/local/1466809349241/centroids.dat <- /usr/local/hadoop/centroids.dat
So java raise a FileNotFoundException
I tried to shorten the question as much as possible. If more info are needed, no problem for me
I think you are missing to mention main class in your command
bin/hadoop jar kmeans.jar MainClass input output

How do you transfer files onto the Hadoop FS (HDFS) on WIndows cmdline without Cygwin?

I have zero experience with Hadoop, but suddenly have to use it at work with Spark on Windows. My question, which has been asked a few times here, but I never could quite get the syntax for what I need, is this. I'm trying to transfer a simple file called:
gensortText.txt which let's say is at c:\gensortText.txt
I know you can use hadoop fs -copyFromLocal. I've tried these things:
hadoop fs -copyFromLocal C:\gensortText.txt hdfs://0.0.0.0:19000
ERROR: Relative path in absolute URI.
hadoop fs -copyFromLocal C:\gensortOutText.txt \tmp\hadoop-Administrator\dfs
ERROR: copyFromLocal: `tmphadoop-Administratordfs': No such file or directory
and a number of other variations with hdfs: and using the tmp directory which all returned similar errors.
I have hadoop in c:\deploy as suggested in the Hadoop2Windows guide (which works and allowed me to run Hadoop. I can access the WebGui and all that). Hadoop has created my new HDFS at c:\temp. Please someone help me figure out how to transfer files into the system. It can even be manually if that's possible, but that doesn't seem to work as it doesn't show up in the Web GUI when I go to "Utilities->Browse the Filesystem". Nothing shows up there actually.
Can someone please help. Any information that's relevant I can provide, but I'm so new to this I don't really know what would be helpful. I think it's just my syntax for the cmdline tool. Can someone give me a concrete example of how to use hadoop -fs copyFromLocal or another simple way to do this? Sorry for my ignorance on the subject, and thanks for any help
To be able to run hadoop commands on Windows you need to have winutils installed and visible to hadoop process.

How to move Word and PDF documents to Hadoop HDFS?

I want to copy/upload some files from a local system (a system not in Hadoop cluster) onto Hadoop HDFS. The local system can be Windows system too.
I tried with Flume spool directory. It works fine with Text files. For other docs, the mime type is getting corrupted.
Please let me know different approaches to load a file(s) to HDFS.
hadoop fs -copyFromLocal <localsrc> URI
Check Hadoop documentation: copyFromLocal
Keep in mind, Apache Flume wasn't created to copy some files.
You can also use hadoop fs -put <localsrcpath> <hdfspath>
This is one of the alternative to copyFromLocal
In hadoop 2.0 (YARN) you can do as follows to transfer local files to HDFS:
hdfs dfs -put "localsrcpath" "hdfspath"
where hdfs is the command located in the bin directory.
Java code can do that easily. You don't require any tools for this. Check below, the piece of code that worked:
Configuration conf = new Configuration();
try {
conf.set("fs.defaultFS",<<namenode>>); //something like hdfs://server:9000 or copy from core-site.xml
FileSystem fileSystem= FileSystem.get(conf);
System.out.println("Uploading please wait...");
fileSystem.copyFromLocalFile(false, new Path(args[0]), new Path(args[1].trim()));//args[0]=C://file or dir args[1]=/imported
Prepare jar out of this and run on any OS. Keep in mind you no need to
have Hadoop running in the machine, where you are going to run this
code. If you need any help, add comments.
Don't forget to add dnsresolver line where you run this code. Open /drivers/etc/hosts (for Windows)
hadoopnamenode ip-address
slavenode ip-address
First you need to load docs from your Windows machine to linux machine using filezilla or other tool.
And then you need to use:
hadoop fs -put localsrcpath hdfspath
Following command will also work.
hadoop fs -copyFromLocal localsrcpath hdfspath

Output Folders for Amazon EMR

I want to jun a custom jar, whose main class a chain of map reduce jobs, with the output of the first job going as the input of the second jar, and so on.
What do I set in FileOutputFormat.setOutputPath("what path should be here?");
If I specify -outputdir in the argument, I get the error FileAlraedy exists. If I don't specify, then I do not know where the ouput will land. I want to be able to see the ouput from every job of the chained mapreduce jobs.
Thanks in adv. Pls help!
You are likely getting the "FileAlraedy exists" error because that output directory exists prior to the job you are running. Make sure to delete the directories that you specify as output for your Hadoop jobs; otherwise you will not be able to run those jobs.
Good practice is to take output from command line as it will increase flexibility of your code And you will compile your jar only once provided the changes are related to your path.
for EMR if you launch your cluster and compile your jar
For eg.
dfs_ip_folder=HDFS_IP_DIR
dfs_op_folder=HDFS_OP_DIR
hadoop jar hadoop-examples-*.jar wordcount ${dfs_ip_folder} ${dfs_op_folder}
Note : you have to create dfs_ip_folder and store input data inside it.
dfs_op_folder will be created automatically on HDFS not on local file system
To access the HDFS op folder either you can copy it to local file system or you can do cat.
eg.
hadoop fs -cat ${dfs_op_folder}/<file_name>
hadoop fs -copyToLocal ${dfs_op_folder} ${your_local_input_dir_path}

hadoop - Where are input/output files stored in hadoop and how to execute java file in hadoop?

Suppose I write a java program and i want to run it in Hadoop, then
where should the file be saved?
how to access it from hadoop?
should i be calling it by the following command? hadoop classname
what is the command in hadoop to execute the java file?
The simplest answers I can think of to your questions are:
1) Anywhere
2,3,4)$HADOOP_HOME/bin/hadoop jar [path_to_your_jar_file]
A similar question was asked here Executing helloworld.java in apache hadoop
It may seem complicated, but it's simpler than you might think!
Compile your map/reduce classes, and your main class into a jar. Let's call this jar myjob.jar.
This jar does not need to include the Hadoop libraries, but it should include any other dependencies you have.
Your main method should set up and run your map/reduce job, here is an example.
Put this jar on any machine with the hadoop command line utility installed.
Run your main method using the hadoop command line utility:
hadoop jar myjob.jar
Hope that helps.
where should the file be saved?
The data should be saved in "hdfs". You will want to probably load it into the cluster from your data source using something like Apache Flume. The file can be placed anywhere but most home is /user/hadoop/
how to access it from hadoop?
SSH into the hadoop cluster headnode like a standard linux server.
To list your hadoop root hdfs
hadoop fs -ls /
should i be calling it by the following command? hadoop classname
You should be using the hadoop command to access your data and run your programs, try hadoop help
what is the command in hadoop to execute the java file?
hadoop -jar MyJar.jar com.mycompany.MainDriver arg[0] arg[1] ...

Resources