Cloudera Manager: Where do I put Java ClassPath for MapReduce jobs?

I've got Hadoop-Lzo working happily on my local pseudo-cluster but the second I try the same jar file in production, I get:
java.lang.RuntimeException: native-lzo library not available
The libraries are verified to be on the DataNodes, so my question is:
In what screen / setting do I specify the location of the native-lzo library?

For MapReduce you need to add the entries to the MapReduce Client Environment Safety Valve. You can find it by going to the View and Edit tab under Configuration for the MapReduce service. Then add these lines there:
HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/*
JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native
Also add the LZO codecs to the io.compression.codecs property under the MapReduce service. To do that, go to io.compression under the View and Edit tab under Configuration and add these lines:
com.hadoop.compression.lzo.LzoCodec
com.hadoop.compression.lzo.LzopCodec
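For reference, these entries get appended to whatever codecs are already listed in io.compression.codecs, so the deployed client configuration should end up with a property along these lines (a sketch; the pre-existing codec entries depend on your distribution):
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>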
Do not forget to restart your MR daemons after making the changes. Once restarted, redeploy your MR client configuration.
For detailed help on how to use LZO, you can visit this link:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/latest/Cloudera-Manager-Installation-Guide/cmig_install_LZO_Compression.html
HTH

Try sudo apt-get install lzop on your TaskTracker nodes.

Related

Application (job) list empty on Hadoop 2.x

I have a Hadoop 2.8.1 installation on a macOS Sierra (Darwin Kernel version 16.7.0) and it's working fine, except the application/tasks tracking.
1) At first, I thought it was a problem with the Resource Manager web interface. So:
I've copied the yarn-site.xml template to the etc/yarn-site.xml file, but it didn't help.
I've tried to change the default 'dr. who' user to my Hadoop user on the Resource Manager (http://localhost:18088/cluster/apps/RUNNING?user.name=myUser), but that didn't help either.
2) I can't track my applications (jobs) from the command line either: yarn application -list always returns empty.
3) One more piece of information: the application's INFO output shows the following lines, but I can't access that URL.
INFO mapreduce.Job: The url to track the job: http://localhost:8080/
INFO mapreduce.Job: Running job: job_local2009332672_0001
Is it a YARN problem? Should I change another settings file? Thanks!
Look at mapreduce.framework.name in mapred-site.xml in your HADOOP_CONF_DIR.
Set its value to yarn.
If you don't have a mapred-site.xml, copy and rename the mapred-default.xml file.
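If you end up creating the file yourself, a minimal mapred-site.xml needs only that one property (a sketch):
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
The job_local2009332672_0001 ID in your INFO output is the telltale sign: your jobs are currently running in the LocalJobRunner, which is why the ResourceManager UI and yarn application -list see nothing.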
Thanks for the answer, I had been looking for this setting without success. I had been making changes to etc/hosts for nothing.
The answer is to set mapreduce.framework.name in mapred-site.xml to yarn, as stated by cricket_007.
This sets YARN as the default framework for MapReduce operations.

Running Mahout Job on Hadoop: Got ClassNotFoundException

I'm trying to run a Mahout KMeans example on the Cloudera QuickStart VM for Hadoop. I read in a Cloudera blog post and in a Stack Overflow post that I can use the -libjars option to attach the Mahout jars.
I put the jar files KMeansHadoop.jar, mahout-core-0.9.jar and mahout-math-0.9.jar in the same folder and ran:
hadoop jar KMeansHadoop.jar SimpleKMeansClustering -libjars mahout-core-0.9.jar mahout-math-0.9.jar
But I still get the error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector
What am I doing wrong? Thank you!
Firstly, I believe the -libjars values need to be comma-separated. But that only makes your third-party jars available to the cluster. You may also need to use HADOOP_CLASSPATH to make those jars available on the client side (e.g., on the edge node from which you're kicking off your job).
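Putting both together, the invocation would look something like this (a sketch, assuming the jars sit in your working directory; note that HADOOP_CLASSPATH is colon-separated while -libjars is comma-separated, and that -libjars is only picked up if your driver parses arguments via GenericOptionsParser, e.g. by implementing Tool):
export HADOOP_CLASSPATH=mahout-core-0.9.jar:mahout-math-0.9.jar
hadoop jar KMeansHadoop.jar SimpleKMeansClustering \
    -libjars mahout-core-0.9.jar,mahout-math-0.9.jar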
Check out this post. It helped me a lot when I was working my way through this exact issue with getting Driven to work with Cascading.

I cannot see the running applications in hadoop 2.5.2 (yarn)

I installed Hadoop 2.5.2 and I can run the wordcount sample successfully. However, when I want to see the applications running on YARN, I can't: the All Applications page is always empty (shown in the following screen).
Is there anyway to make the jobs visible?
Please try localhost:19888, or check the value of the job history web URL property (mapreduce.jobhistory.webapp.address) configured in your mapred-site.xml.
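If the JobHistory Server isn't running at all, start it first; in Hadoop 2.x that's (a sketch, run from your Hadoop installation directory):
sbin/mr-jobhistory-daemon.sh start historyserver
The property itself lives in mapred-site.xml and defaults to port 19888:
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>0.0.0.0:19888</value>
</property>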

where is the hadoop task manager UI

I installed the hadoop 2.2 system on my ubuntu box using this tutorial
http://codesfusion.blogspot.com/2013/11/hadoop-2x-core-hdfs-and-yarn-components.html
Everything worked fine for me and now when I do
http://localhost:50070
I can see the management UI for HDFS. Very good!!
But then I was going through another tutorial which tells me that there must be a task manager UI running at http://mymachine.com:50030 and http://mymachine.com:50060,
but on my machine I cannot open these ports.
I have already done
start-dfs.sh
start-yarn.sh
start-all.sh
Is something wrong? Why can't I see the task manager UI?
You have installed YARN (MRv2) which runs the ResourceManager. The URL http://mymachine.com:50030 is the web address for the JobTracker daemon that comes with MRv1 and hence you are not able to see it.
To see the ResourceManager UI, check your yarn-site.xml file for the following property:
yarn.resourcemanager.webapp.address
By default, it should point to resource_manager_hostname:8088.
Assuming your ResourceManager runs on mymachine, you should see the ResourceManager UI at http://mymachine.com:8088/
Make sure all your daemons are up and running before you visit the URL for the ResourceManager.
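If you want to pin the web UI address explicitly, the entry in yarn-site.xml looks like this (a sketch; since 8088 is the default, the property is often absent and the default applies):
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>mymachine.com:8088</value>
</property>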
For Hadoop 2 (aka YARN / MRv2), i.e. any Hadoop installation versioned 2.x or higher, it's at port 8088, e.g. localhost:8088.
For Hadoop 1, i.e. any Hadoop installation versioned lower than 2.x (e.g. 1.x or 0.x), it's at port 50030, e.g. localhost:50030.
By default, the Hadoop (HDFS NameNode) UI is at the address below:
http://mymachine.com:50070

Using different hadoop-mapreduce-client-core.jar to run hadoop cluster

I'm working on a Hadoop cluster with CDH4.2.0 installed and ran into this error. It's been fixed in later versions of Hadoop, but I don't have access to update the cluster. Is there a way to tell Hadoop to use this jar when running my job, through a command line argument like
hadoop jar MyJob.jar -D hadoop.mapreduce.client=hadoop-mapreduce-client-core-2.0.0-cdh4.2.0.jar
where the new mapreduce-client-core.jar file is the patched jar from the ticket. Or must Hadoop be completely recompiled with the new jar? I'm new to Hadoop, so I don't know all the command line options that are possible.
I'm not sure how that would work, as when you're executing the hadoop command you're actually executing code in the client jar.
Can you not use MR1? The ticket says the issue only occurs when you're using MR2, so unless you really need YARN, you're probably better off using the MR1 library to run your map/reduce.
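If you still want to experiment with overriding the client jar, one unsupported approach is to put the patched jar at the front of the client classpath; a sketch, assuming the patched jar sits in your working directory (verify that your version's hadoop-config.sh honors HADOOP_USER_CLASSPATH_FIRST, and note this only changes what the client JVM loads, not what the cluster nodes run):
export HADOOP_USER_CLASSPATH_FIRST=true
export HADOOP_CLASSPATH=$(pwd)/hadoop-mapreduce-client-core-2.0.0-cdh4.2.0.jar
hadoop jar MyJob.jar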
