Running MapReduce code that uses ZooKeeper - hadoop

I want to ask how to run MapReduce Java code that uses ZooKeeper.
My first program just creates a variable (znode) and has each mapper modify it.
So I modified the WordCount code just to test ZooKeeper for the first time.
When I run it from the Eclipse console, everything goes well, and I can see the changes to the znode's value, etc.
However, when I tried to run it from the Linux command line:
bin/hadoop jar ./myjar.jar algo.WordCount /input.txt /out
I got the following error:
Error: java.lang.ClassNotFoundException: org.apache.zookeeper.Watcher
I added the path of the jar file using conf.set("mapred.jar","...."); in the MapReduce code, but I don't know why it did not recognize the ZooKeeper classes.
Any idea?
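One likely cause: mapred.jar only names the job jar itself, so it does not put other jars such as ZooKeeper's on the classpath. Below is a minimal sketch of a common workaround, assuming the driver parses Hadoop's generic options (for example through ToolRunner), since -libjars is ignored otherwise. The directory /path/to/zookeeper/lib and the helper names join_jars and run_wordcount are hypothetical.

```shell
#!/bin/bash
# Hypothetical directory holding the ZooKeeper jar(s); adjust to your install.
ZK_LIB_DIR="${ZK_LIB_DIR:-/path/to/zookeeper/lib}"

# Join every *.jar in a directory with the given separator.
join_jars() {
  local dir="$1" sep="$2" out="" jar
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue   # skip the literal pattern when nothing matches
    out="${out:+$out$sep}$jar"
  done
  printf '%s\n' "$out"
}

# Not called automatically; run it once the paths above are correct.
run_wordcount() {
  # HADOOP_CLASSPATH (colon-separated) covers the client JVM that submits the job.
  # -libjars (comma-separated) ships the jars to the map and reduce tasks.
  HADOOP_CLASSPATH="$(join_jars "$ZK_LIB_DIR" ":")" \
    bin/hadoop jar ./myjar.jar algo.WordCount \
    -libjars "$(join_jars "$ZK_LIB_DIR" ",")" \
    /input.txt /out
}
```

After sourcing the file, calling run_wordcount from the Hadoop install directory submits the job. Note that the generic options go after the main class name and before the job's own arguments.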

Related

See print output in a Python script running on Spark with spark-submit

I have to test some code using Spark and I'm pretty new to it.
The code I have runs an ETL script on a cluster. The ETL script is written in Python and has several prints in it, but I'm unable to see their output. The Python script is passed to spark-submit with the --py-files option. I don't know whether those prints are unreachable because they happen in the YARN executors, and whether I should change them to logs and use log4j, or add them to an accumulator reachable by the driver.
Any suggestions would help.
The final goal is to see how the execution of the code is going. I don't know if simple prints are the best solution, but they were already in the code I was given to test.
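On YARN, print calls that run inside executors go to each container's stdout rather than to the console where spark-submit was launched, so one option is to pull the aggregated container logs with yarn logs after the run. A rough sketch, assuming log aggregation is enabled on the cluster and that the spark-submit output was saved to a file (for example by piping it through tee). The file name submit.log and the helper names are hypothetical.

```shell
#!/bin/bash
# Read saved spark-submit output on stdin and print the first YARN
# application ID found (format: application_<timestamp>_<sequence>).
extract_app_id() {
  grep -o 'application_[0-9][0-9]*_[0-9][0-9]*' | head -n 1
}

# Not called automatically. Usage: show_python_prints submit.log
show_python_prints() {
  local app_id
  app_id="$(extract_app_id < "$1")"
  [ -n "$app_id" ] || { echo "no application ID found in $1" >&2; return 1; }
  # Dumps the aggregated container logs, which include executor stdout.
  yarn logs -applicationId "$app_id"
}
```

Prints in the driver part of the script should show up directly on the console when the job is submitted in client deploy mode, so this is only needed for code that runs on the executors.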

How can I fix ClassNotFoundException when executing an HBase Java application from the command line?

I don't know anything about bash, but I put together a script to help me run my HBase Java application:
#!/bin/bash
HADOOP_CLASSPATH="$(hbase classpath)"
hadoop jar my.jar my_pkg.my_class
When I run it I get:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy
When I echo out the HADOOP_CLASSPATH I see that hbase-server-1.2.0-cdh5.8.0.jar is there...
Is the hadoop jar command ignoring the HADOOP_CLASSPATH?
Also, I have tried running the commands from the command line instead of using my script. I get the same error.
The approach was inspired by this Cloudera question.
The solution was to include the Hadoop class path on the same line. I am not certain what the difference is, but this works:
HADOOP_CLASSPATH="$(hbase classpath)" hadoop jar my.jar my_pkg.my_class
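The difference is most likely about exporting: a plain VAR=value line sets a shell variable that child processes do not inherit unless it is exported, whereas the VAR=value command form places the variable in that command's environment. The sketch below shows this with a stand-in variable, DEMO_CLASSPATH, and a child sh process in place of hadoop; both are illustrative only.

```shell
#!/bin/bash
# Stand-in for `hadoop`: a child process that reports what it inherited.
probe='echo "${DEMO_CLASSPATH:-<not set>}"'

unset DEMO_CLASSPATH

# 1. Assignment on its own line: a shell variable only, not passed to children.
DEMO_CLASSPATH="/opt/demo/lib/demo.jar"
plain="$(sh -c "$probe")"

# 2. Assignment on the same line as the command: placed in that command's environment.
unset DEMO_CLASSPATH
inline="$(DEMO_CLASSPATH="/opt/demo/lib/demo.jar" sh -c "$probe")"

# 3. Assignment plus export: inherited by every later child process.
export DEMO_CLASSPATH="/opt/demo/lib/demo.jar"
exported="$(sh -c "$probe")"

echo "plain:    $plain"      # plain:    <not set>
echo "inline:   $inline"     # inline:   /opt/demo/lib/demo.jar
echo "exported: $exported"   # exported: /opt/demo/lib/demo.jar
```

So in the original script, writing export HADOOP_CLASSPATH="$(hbase classpath)" on its own line should have the same effect as the one-line form.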

Issue with pseudo-distributed mode configuration of Hadoop

I am trying to set up Hadoop 2.0.4 in pseudo-distributed mode. The start-dfs.sh script works fine. However, start-mapred.sh fails to start the JobTracker and TaskTracker. Below is the error I am getting. From the error, it looks like it is not able to pick up the jar file. Please let me know if you have any idea about this issue. Thanks.
FATAL org.apache.hadoop.mapred.JobTracker: java.lang.NoSuchMethodError: org/apache/hadoop/mapred/JobACLsManager.<init>(Lorg/apache/hadoop/mapred/JobConf;)V
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2182)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1895)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1889)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:311)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:302)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:297)
at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4820)
It seems I was using incorrect jars. So first I replaced those. Then I created a new directory with the Hadoop conf files and formatted the NameNode. Finally it worked. :)

Error: Failed to create DataStorage while running embedded Pig in Java

I wrote a simple program to test embedded Pig in Java, running in mapreduce mode.
The Hadoop version on the server I am running on is 0.20.2-cdh3u4a, and the Pig version is 0.10.0-cdh3u4a.
When I run in local mode, it runs successfully. But when I run in mapreduce mode, it gives me an error.
I run my program using the following commands, as shown in http://pig.apache.org/docs/r0.9.1/cont.html#embed-java
javac -cp pig.jar EmbedPigTest.java
javac -cp pig.jar:.:/etc/hadoop/conf EmbedPigTest.java input.txt
My program fails with the following error:
Exception in thread "main" java.lang.RuntimeException: Failed to create DataStorage
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:214)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)
at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
at org.apache.pig.PigServer.<init>(PigServer.java:226)
at org.apache.pig.PigServer.<init>(PigServer.java:215)
at org.apache.pig.PigServer.<init>(PigServer.java:211)
at org.apache.pig.PigServer.<init>(PigServer.java:207)
at WordCount.main(EmbedPigTest.java:9)
Some online resources say that this problem occurs because of differing Hadoop versions, but I didn't understand what I should do. Suggestions, please!
This is happening because you are linking to the wrong jar. Please see the link below; it describes this issue very well.
http://localsteve.wordpress.com/2012/09/30/embedding-pig-for-cdh4-java-apps-fer-realz/
I faced the same kind of issue when I tried to use Pig in mapreduce mode without starting the services.
Please check that all services are running using jps before using Pig in mapreduce mode.
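The jps check can be scripted. A small sketch, assuming the classic MR1 daemon names (NameNode, DataNode, JobTracker, TaskTracker) that a pseudo-distributed Hadoop 0.20-era install runs on one machine. The function name missing_daemons is hypothetical, and on a real cluster the daemons are spread across hosts, so the list would need adjusting per machine.

```shell
#!/bin/bash
# Read `jps` output ("<pid> <MainClass>" per line) on stdin and
# print each expected daemon that is absent.
missing_daemons() {
  local seen daemon
  seen="$(awk '{print $2}')"
  for daemon in NameNode DataNode JobTracker TaskTracker; do
    # -x matches whole lines, so SecondaryNameNode does not count as NameNode.
    printf '%s\n' "$seen" | grep -qx "$daemon" || echo "$daemon"
  done
}
```

Running jps | missing_daemons prints nothing when all four daemons are up, and otherwise lists the ones to start before launching Pig in mapreduce mode.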

Using different hadoop-mapreduce-client-core.jar to run hadoop cluster

I'm working on a Hadoop cluster with CDH4.2.0 installed and ran into this error. It's been fixed in later versions of Hadoop, but I don't have access to update the cluster. Is there a way to tell Hadoop to use this jar when running my job, through command-line arguments like
hadoop jar MyJob.jar -D hadoop.mapreduce.client=hadoop-mapreduce-client-core-2.0.0-cdh4.2.0.jar
where the new mapreduce-client-core.jar file is the patched jar from the ticket? Or must Hadoop be completely recompiled with this new jar? I'm new to Hadoop, so I don't know all the command-line options that are possible.
I'm not sure how that would work, since when you execute the hadoop command you're actually executing code in the client jar.
Can you not use MR1? The ticket says this issue only occurs when you're using MR2, so unless you really need YARN you're probably better off using the MR1 library to run your map/reduce.
