How to change tmp directory in yarn - hadoop

I have written a MR job and have run it in local mode with following configuration settings
mapred.local.dir=<<local directory having good amount of space>>
fs.default.name=file:///
mapred.job.tracker=local
on Hadoop 1.x
Now I am using Hadoop 2.x and the same Job I am running with the same Configuration settings, but I am getting error :
Disk Out of Space
Is it that If I switch from Hadoop 1.x to 2.x (using Hadoop-2.6 jars), the same Configuration Settings to change the Tmp Dir not work.??
What are the new Settings to configure the "tmp" directory of MR1 (mapred API) on Hadoop 2.6.
Kindly advice.
Regards
Cheers :))

Many properties in 1.x have been deprecated and replaced with new properties in 2.x.
mapred.child.tmp has been replaced by mapreduce.task.tmp.dir
mapred.local.dir has been replaced by mapreduce.cluster.local.dir
Have a look at complete list of deprecated properties and new equivalent properties at Apache website link

It can be done by setting
mapreduce.cluster.local.dir=<<local directory having good amount of space>>

Related

Why Hive will search its configuration profile in HADOOP_CONF_DIR first?

Today I found that if I copy hive-site.xml into $HADOOP_HOME/etc/hadoop/, Hive will use the hive-site.xml in the $HADOOP_HOME/etc/hadoop/ instead of the one in $HIVE_HOME/conf, and it will also search for the hive-log4j.properties in $HADOOP_HOME/etc/hadoop/.
If not found, Hive will just use the default one in /lib/hive-common-1.1.0-cdh5.7.6.jar!/hive-log4j.properties instead of the customized one in $HIVE_HOME/conf, but why?
I searched the keyword copy hive-site.xml to HADOOP_HOME in the official Hive manual in apache.org but failed to find any explanation...
My Hive version is hive-1.1.0-cdh5.7.6, Hadoop version hadoop-2.6.0-cdh5.7.6, JDK 1.7.
So, you've mentioned Sqoop, therefore I'll point out the proper processes for getting hive XML configuration.
1) There's a classpath problem if the file isn't found. Copying the file is one solution, but a poor one. A symlink is preferred.
Every time I've used Sqoop, I never messed around with controlling any XML files - it just worked. Therefore, both HDP and CDH must have the proper classpath and/or symlinks setup.
2) The documentation states where configurations are loaded from
Sqoop will fall back to $HADOOP_HOME. If it is not set either, Sqoop will use the default installation locations for Apache Bigtop, /usr/lib/hadoop and /usr/lib/hadoop-mapreduce, respectively.
The active Hadoop configuration is loaded from $HADOOP_HOME/conf/, unless the $HADOOP_CONF_DIR environment variable is set
This classpath controls where configurations are loaded from
3) You can also, at runtime, give extra files
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
sqoop import -files $HIVE_HOME/conf/hive-site.xml ...

How to Configure MR1 on CDH5.1 vm

I have installed CDH5.1 VM on my machine. CDH 5.1 is by default set to MR2(YARN). I would want to change the configuration from MR2 to MR1. Request to let me know the changes that need to be done.
Just do the steps to set MR configuration as given in the cdh5.1.2 Documentation
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/cdh5ig_mr_cluster_deploy.html#topic_11_3
then use the hadoop command and not he yarn command to run the jar

Installing/Configuring Hue with Apache frameworks

Has anyone tried configuring Hue with the frameworks from Apache and not with CDH. The documentation says to set the mapred.jobtracker.plugins property to org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin and check the JT log files to make sure that the Thrift plugin has been loaded. But, I don't see anything related to Thrift in the JT log files. And, also looks like the mapred.jobtracker.plugins in not defined in the mapred-site.xml for Hadoop 1.2.1 which is the latest stable release.
Did you had the mapred.jobtracker.plugins to the mapred-site.xml? Hadoop support plugins since 1.2.0 so you should be good.

Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses-submiting job2remoteClustr

I recently upgraded my cluster from Apache Hadoop1.0 to CDH4.4.0. I have a weblogic server in another machine from where i submit jobs to this remote cluster via mapreduce client. I still want to use MR1 and not Yarn. I have compiled my client code against the client jars in the CDH installtion (/usr/lib/hadoop/client/*)
Am getting the below error when creating a JobClient instance. There are many posts related to the same issue but all the solutions refer to the scenario of submitting the job to a local cluster and not to remote and specifically in my case from a wls container.
JobClient jc = new JobClient(conf);
Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
But running from the command prompt on the cluster works perfectly fine.
Appreciate your timely help!
I had a similar error and added the following jars to classpath and it worked for me:
hadoop-mapreduce-client-jobclient-2.2.0.2.0.6.0-76:hadoop-mapreduce-client-shuffle-2.3.0.jar:hadoop-mapreduce-client-common-2.3.0.jar
It's likely that your app is looking at your old Hadoop 1.x configuration files. Maybe your app hard-codes some config? This error tends to indicate you are using the new client libraries but that they are not seeing new-style configuration.
It must exist since the command-line tools see them fine. Check your HADOOP_HOME or HADOOP_CONF_DIR env variables too although that's what the command line tools tend to pick up, and they work.
Note that you need to install the 'mapreduce' service and not 'yarn' in CDH 4.4 to make it compatible with MR1 clients. See also the '...-mr1-...' artifacts in Maven.
In my case, this error was due to the version of the jars, make sure that you are using the same version as in the server.
export HADOOP_MAPRED_HOME=/cloudera/parcels/CDH-4.1.3-1.cdh4.1.3.p0.23/lib/hadoop-0.20-mapreduce
I my case i was running sqoop 1.4.5 and pointing it to the latest hadoop 2.0.0-cdh4.4.0 which had the yarn stuff also thats why it was complaining.
When i pointed sqoop to hadoop-0.20/2.0.0-cdh4.4.0 (MR1 i think) it worked.
As with Akshay (comment by Setob_b) all I needed to fix was to get hadoop-mapreduce-client-shuffle-.jar on my classpath.
As follows for Maven:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-shuffle</artifactId>
<version>${hadoop.version}</version>
</dependency>
In my case, strangely this error was because in my 'core-site.xml' file, I mentioned "IP-address" rather than "hostname".
The moment I mentioned "hostname" in place of IP address and in "core-site.xml" and "mapred.xml" and re-installed mapreduce lib files, error got resolved.
in my case, i resolved this by using hadoop jar instead of java -jar .
it's usefull, hadoop will provide the configuration context from hdfs-site.xml, core-site.xml ....

Cloudera Manager: Where do I put Java ClassPath for MapReduce jobs?

I've got Hadoop-Lzo working happily on my local pseudo-cluster but the second I try the same jar file in production, I get:
java.lang.RuntimeException: native-lzo library not available
The libraries are verified to be on the DataNodes, so my question is:
In what screen / setting do I specify the location of the native-lzo library?
For MapReduce you need to add the entries to the MapReduce Client Environment Safety valve. You can find MapReduce Client Safety by going to View and Edit tab under Configuration. Then add these lines over there :
HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/*
JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native
Also add the LZO codecs to the io.compression.codecs property under the MapReduce Service. To do that go to io.compression under View and Edit tab under Configuration and these lines :
com.hadoop.compression.lzo.LzoCodec
com.hadoop.compression.lzo.LzopCodec
Do not forget to restart your MR daemons after making the changes. Once restarted redeploy your MR client configuration.
For a detailed help on how to use LZO you can visit this link :
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/latest/Cloudera-Manager-Installation-Guide/cmig_install_LZO_Compression.html
HTH
try sudo apt-get install lzop in your TaskTracker nodes.

Resources