Issue with pseudo-distributed mode configuration of Hadoop - hadoop

I am trying to set up a pseudo-distributed configuration of Hadoop 2.0.4. The start-dfs.sh script works fine. However, start-mapred.sh fails to start the JobTracker and TaskTracker. Below is the error I am getting. Looking at the error, it seems it is not able to pick up the right jar file. Please let me know if you have any idea about this issue. Thanks.
FATAL org.apache.hadoop.mapred.JobTracker: java.lang.NoSuchMethodError: org/apache/hadoop/mapred/JobACLsManager.<init>(Lorg/apache/hadoop/mapred/JobConf;)V
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2182)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1895)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1889)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:311)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:302)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:297)
at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4820)

It turned out I was using incorrect jars, so first I replaced those. Then I created a new directory with the Hadoop conf files and formatted the namenode. Finally it worked. :)
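For reference, the recovery described above amounts to something like the following; the conf directory path is illustrative, and the script names match the MR1-style setup from the question:
export HADOOP_CONF_DIR=/path/to/new/conf   # point the daemons at the fresh conf directory (path is illustrative)
bin/hadoop namenode -format                # re-initializes HDFS metadata; this wipes any existing data
bin/start-dfs.sh
bin/start-mapred.sh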

Related

Error: hadoop jar tutorial_classes_com.jar WordCount /WordCountApp/input /WordCountApp/output shows an error in mapreduce.xml on one system

When I try to run this command it shows an error on one system, but when I tried it on another it works. Why is that? I'm using hadoop-3.3.2 from the Lubuntu terminal.
The job is not submitting; it shows an error in the mapred.xml configuration, but everything is set up properly. I ran it on another system and there it works. What should I do?
hadoop jar tutorial_classes_com.jar WordCount /WordCountApp/input /WordCountApp/output
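One quick way to narrow this down is to compare the effective MapReduce configuration between the machine where the job works and the one where it fails. The sketch below assumes a default tarball layout and uses other-host as a placeholder for the working machine:
# Compare mapred-site.xml on this machine against the working one (paths are illustrative).
diff <(ssh other-host cat /path/to/hadoop-3.3.2/etc/hadoop/mapred-site.xml) \
     "$HADOOP_HOME/etc/hadoop/mapred-site.xml"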

Why is the MR2 map task running under the 'yarn' user and not under the user I ran the Hadoop job as?

I'm trying to run a MapReduce job on MR2, Hadoop version 2.6.0-cdh5.8.0. The job uses a relative path to a directory containing a lot of files to be compressed based on some criteria (not really relevant for this question). I'm running my job as follows:
sudo -u my_user hadoop jar my_jar.jar com.example.Main
There is a folder on HDFS under the path /user/my_user/ with the files. But when I run my job I get the following exception:
java.io.FileNotFoundException: File /user/yarn/<path_from_job> does not exist.
I'm migrating this job from MR1, where it works correctly. My guess is that this is happening because of YARN, since each container is started under the 'yarn' user. In my job configuration I've tried to set mapreduce.job.user.name="my_user", but this didn't help.
I've found ${user.home} used in my job configuration, but I don't know where it is set or whether it is possible to change it.
The only solution I have found so far is to provide an absolute path to the folder. Is there any other way around this? I feel like this is not the correct approach.
Thank you
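To illustrate the behaviour being described: relative HDFS paths resolve against the home directory of whichever user the code effectively runs as, which is why the same path ends up under /user/yarn inside the task. A minimal sketch, with the directory name and the way the path is passed to the job being illustrative:
# The same relative path resolves differently depending on the effective user:
sudo -u my_user hadoop fs -ls data_dir    # resolves to /user/my_user/data_dir
sudo -u yarn hadoop fs -ls data_dir       # resolves to /user/yarn/data_dir
# Passing the fully qualified path removes the ambiguity:
sudo -u my_user hadoop jar my_jar.jar com.example.Main /user/my_user/data_dir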

Hadoop namenode format not working

I've been trying to install Hadoop 2.7.0 on Ubuntu, but when I enter the hadoop namenode -format command I get the following message:
Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode
I've triple-checked all the configuration files, but I can't seem to find where the problem is.
I followed this tutorial: http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php
Can anyone please tell me why this is not working?
You have to add hadoop-hdfs-2.7.0.jar to your Hadoop classpath. Just add these lines to $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
export HADOOP_HOME=/path/to/hadoop
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/hdfs/hadoop-hdfs-2.7.0.jar
Now stop all Hadoop processes and try formatting the namenode again. Post the error if you get one.
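After restarting, a quick sanity check is to confirm that the HDFS jar actually shows up on the effective classpath:
# List the classpath entries and look for the jar added above.
hadoop classpath | tr ':' '\n' | grep hadoop-hdfs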

Running MapReduce code that uses zooKeeper

I want to ask how to execute MapReduce Java code that uses ZooKeeper.
My first program just creates a variable (a znode) and has each mapper modify it.
So I modified the WordCount code just to test ZooKeeper for the first time.
When I run it from the Eclipse console everything goes well, and I can see the changes to the value of the znode, etc.
However, when I try to execute it from the Linux command line:
bin/hadoop jar ./myjar.jar algo.WordCount /input.txt /out
I get the following error:
Error: java.lang.ClassNotFoundException: org.apache.zookeeper.Watcher
I added the path of the jar file using conf.set("mapred.jar","...."); in the MapReduce code, but I don't know why it does not recognize the ZooKeeper classes.
Any idea?
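For what it's worth, a common way to make an extra dependency such as the ZooKeeper jar visible both to the client and to the map/reduce tasks is sketched below. It assumes the driver goes through ToolRunner/GenericOptionsParser so that -libjars is honored, and the jar path is illustrative:
# Put the ZooKeeper jar on the client classpath and ship it with the job.
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/path/to/zookeeper.jar
bin/hadoop jar ./myjar.jar algo.WordCount -libjars /path/to/zookeeper.jar /input.txt /out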

Hadoop streaming with python on Windows

I'm using Hortonworks HDP for Windows and have it successfully configured with a master and 2 slaves.
I'm using the following command:
bin\hadoop jar contrib\streaming\hadoop-streaming-1.1.0-SNAPSHOT.jar -files file:///d:/dev/python/mapper.py,file:///d:/dev/python/reducer.py -mapper "python mapper.py" -reducer "python reduce.py" -input /flume/0424/userlog.MDAC-HD1.MDAC.local..20130424.1366789040945 -output /flume/o%1 -cmdenv PYTHONPATH=c:\python27
The mapper runs through fine, but the log reports that the reduce.py file wasn't found. From the exception it looks like the Hadoop task runner is creating the symlink for the reducer to the mapper.py file.
When I checked the job configuration file, I noticed that mapred.cache.files is set to:
hdfs://MDAC-HD1:8020/mapred/staging/administrator/.staging/job_201304251054_0021/files/mapper.py#mapper.py
It looks like although the reduce.py file is being added to the jar file, it's not being included in the configuration correctly and can't be found when the reducer tries to run.
I think my command is correct. I've tried using -file parameters instead, but then neither file is found.
Can anyone see or know of an obvious reason?
Please note, this is on Windows.
EDIT- I've just run it locally and it worked, looks like my problem may be with the copying of the files round the cluster.
Still welcome input!
Well, that's embarrassing... my first question and I answered it myself.
I found the problem by renaming the Hadoop conf file to force default settings, which meant the local job tracker.
The job ran properly, and that gave me the room to work out what the problem is: it looks like communication around the cluster isn't as complete as it needs to be.
Looking at your command, it shows "file:///d:/dev/python/reducer.py" for the -files option, but you specify reduce.py for -reducer. Does this cause the problem? Sorry, I am not sure.
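For completeness, here is the same streaming command with the shipped file and the -reducer script name made consistent (keeping reducer.py as the file that is actually shipped); everything else is unchanged from the command above:
bin\hadoop jar contrib\streaming\hadoop-streaming-1.1.0-SNAPSHOT.jar ^
  -files file:///d:/dev/python/mapper.py,file:///d:/dev/python/reducer.py ^
  -mapper "python mapper.py" -reducer "python reducer.py" ^
  -input /flume/0424/userlog.MDAC-HD1.MDAC.local..20130424.1366789040945 ^
  -output /flume/o%1 -cmdenv PYTHONPATH=c:\python27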
