Configure job memory in Hadoop 1.2.0

I need to set the -Xmx property for a job's tasks running on the data nodes.
On the TaskTracker node I tried putting these properties
<property>
<name>mapred.map.java.opts</name>
<value>-Xmx64m</value>
</property>
<property>
<name>mapred.reduce.java.opts</name>
<value>-Xmx64m</value>
</property>
into conf/core-site.xml,
but it has no effect on submitted jobs; I still see the java processes with -Xmx200m in the process list.
Please advise.

Try using:
<property>
<name>mapred.map.child.java.opts</name>
<value>-Xmx64m</value>
</property>
<property>
<name>mapred.reduce.child.java.opts</name>
<value>-Xmx64m</value>
</property>
in your conf/mapred-site.xml on each data node.
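The -Xmx200m you are seeing is the Hadoop 1.x default for mapred.child.java.opts, so the child JVMs keep that value until one of the child-opts properties overrides it. If your driver class goes through ToolRunner/GenericOptionsParser, you can also override the setting per job from the command line; a minimal sketch, assuming a hypothetical myjob.jar and MyJob main class:
hadoop jar myjob.jar MyJob \
  -D mapred.map.child.java.opts=-Xmx64m \
  -D mapred.reduce.child.java.opts=-Xmx64m \
  input output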

Related

Apache Ignite Hadoop Accelerator MapReduce Jobs are not in the JobHistory Server

I'm currently running Apache Ignite Hadoop accelerator for MapReduce. The jobs run, but I am unable to see them in the JobHistoryServer. I wouldn't expect to see the jobs in Yarn's Resource Manager (and don't).
I'm running my MapReduce jobs like
hadoop --config path/to/config/ jar path/to/jar ....
In the mapred-site.xml, I've added
<property>
<name>mapreduce.framework.name</name>
<value>ignite</value>
</property>
<property>
<name>mapreduce.jobtracker.address</name>
<value>[your_host]:11211</value>
</property>
My mapreduce.jobhistory.* settings have not been changed.
In the core-site.xml I've added
<property>
<name>fs.default.name</name>
<value>igfs://igfs#/</value>
</property>
<property>
<name>fs.igfs.impl</name>
<value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
</property>
<property>
<name>fs.AbstractFileSystem.igfs.impl</name>
<value>org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem</value>
</property>
I've also added ignite-core-1.6.0.jar, ignite-hadoop-1.6.0.jar, and ignite-shmem-1.0.0.jar to the $HADOOP_HOME path. Similarly, I've exported HADOOP_HOME, HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, and HADOOP_MAPRED_HOME.
Is this functionality not supported by Ignite or am I doing something wrong?
Also, is there a way to track the MapReduce job running on Ignite?
Currently Ignite does not integrate with the Hadoop JobHistory server in any way; issue https://issues.apache.org/jira/browse/IGNITE-3766 tracks that request.
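Until that is implemented, job tracking has to go through Ignite's own tooling rather than the JobHistory UI. As a sketch (assuming a default Ignite 1.6 install), the Visor command-line console shipped with the distribution can show the compute tasks the accelerator executes:
# start Visor, connect to the cluster, then list executed tasks
$IGNITE_HOME/bin/ignitevisorcmd.sh
visor> open
visor> tasks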

Yarn timeline server log aggregation

I am configuring Hadoop 2.7.1 to retain YARN jobs for longer.
I have enabled log aggregation and the JobHistory/timeline servers. When a job completes in the ResourceManager it does show up in the JobHistory server (if you give the correct URL); however, the JobHistory server only lists MapReduce jobs, not YARN applications.
The problem is that the job is not visible in the timeline server; in fact, no jobs show up in the timeline server at all.
Current yarn-site.xml configuration :
<property>
<name>yarn.timeline-service.hostname</name>
<value>host1</value>
</property>
<property>
<name>yarn.timeline-service.address</name>
<value>${yarn.timeline-service.hostname}:10200</value>
</property>
<property>
<name>yarn.timeline-service.webapp.address</name>
<value>${yarn.timeline-service.hostname}:8188</value>
</property>
<property>
<name>yarn.timeline-service.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.generic-application-history.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://${yarn.timeline-service.hostname}:19888/jobhistory/logs/</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/var/vm/apps/hadoop/logs</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/var/vm/apps/hadoop/logs</value>
</property>
Am I providing conflicting configuration by using the JobHistory server AND the timeline server?
At the end of the day I want the YARN logs persisted to HDFS for viewing in the web UI over the following days/weeks.
You need to set the mapreduce.job.emit-timeline-data property to true in mapred-site.xml.
This enables MapReduce jobs to push events to the timeline server.
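For example, alongside your other settings in mapred-site.xml:
<property>
<name>mapreduce.job.emit-timeline-data</name>
<value>true</value>
</property>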

Only one node in ResourceManager

Is it normal that in the ResourceManager UI (nodemanager:8088/cluster/nodes) I can see only one node?
In my test environment I set up a two-node cluster, and the command bin/hdfs dfsadmin -report shows me two nodes.
Sorry, but I have found the solution myself.
You need to add the following properties to your conf/yarn-site.xml file on all nodes:
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>resourcemanager_address:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>resourcemanager_address:8032</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>resourcemanager_address:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>resourcemanager_address:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>resourcemanager_address:8033</value>
</property>
That will overwrite the default settings for the ResourceManager addresses (the default is 0.0.0.0).
Hope this helps someone.
You can also simply set
<property>
<name>yarn.resourcemanager.hostname</name>
<value>resourcemanager_address</value>
</property>
... and the rest of the properties will be set correctly automatically.
To point out the obvious, make sure you start/restart the NodeManager as well:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager
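Once the NodeManagers have been restarted with the updated yarn-site.xml, you can check registration from any node with the standard YARN CLI; both nodes should be listed in RUNNING state:
yarn node -list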

Multiple Input Paths configuration in OOZIE

I am trying to configure a MapReduce job in Oozie. The job has two different input formats and two input data folders. I followed this post: How to configure oozie workflow for multi-input path with multiple mappers
and added these properties to my workflow.xml:
<property>
<name>mapred.input.dir.formats</name>
<value>folder/data/*;org.apache.hadoop.mapred.SequenceFileInputFormat\,data/*;org.apache.hadoop.mapred.TextInputFormat</value>
</property>
<property>
<name>mapred.input.dir.mappers</name>
<value>folder/data/*;....PublicMapper\,data/*;....PublicMapper</value>
</property>
but when the job is launched I get the following error: "No input paths specified in job".
Can anyone help me?
Thanks.
You need to set some additional properties:
<property>
<name>mapreduce.inputformat.class</name>
<value>org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat</value>
</property>
<property>
<name>mapreduce.map.class</name>
<value>org.apache.hadoop.mapreduce.lib.input.DelegatingMapper</value>
</property>
I faced the same issue today, so I used the following properties.
<property>
<name>mapreduce.inputformat.class</name>
<value>org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat</value>
</property>
<property>
<name>mapreduce.map.class</name>
<value>org.apache.hadoop.mapreduce.lib.input.DelegatingMapper</value>
</property>
<property>
<name>mapreduce.input.multipleinputs.dir.formats</name>
<value>/first/input/path;org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat,/second/input/path;org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat</value>
</property>
<property>
<name>mapreduce.input.multipleinputs.dir.mappers</name>
<value>/first/input/path;com.first.Mapper,/second/input/path;com.second.Mapper</value>
</property>
The difference is that instead of mapred.input.dir.formats and mapred.input.dir.mappers, which are part of the old MapReduce API, I used mapreduce.input.multipleinputs.dir.formats and mapreduce.input.multipleinputs.dir.mappers respectively. The code worked just fine after that. I ran it on Hadoop 1.2.1 and Oozie 3.3.2.
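For context, those two properties encode exactly what a driver would set programmatically with the new-API MultipleInputs class. A minimal sketch (assuming the mapper class names from the example above and that the new-API MultipleInputs is available in your Hadoop version):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;

public class MultiInputDriver {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "multi-input");
        // Each call pairs one input path with its InputFormat and Mapper,
        // mirroring the semicolon-separated values of the two properties.
        MultipleInputs.addInputPath(job, new Path("/first/input/path"),
                KeyValueTextInputFormat.class, com.first.Mapper.class);
        MultipleInputs.addInputPath(job, new Path("/second/input/path"),
                KeyValueTextInputFormat.class, com.second.Mapper.class);
        // ... configure reducer, output format and output path, then submit.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Writing the properties by hand in workflow.xml reproduces what addInputPath does internally, which is also why the DelegatingInputFormat/DelegatingMapper properties above are required.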

HBase UI doesn't show any region servers

I run HBase in distributed mode. HBase starts region server java processes on all nodes, but the web UI doesn't show them:
http://s1.ipicture.ru/uploads/20120517/16DXTnsU.png
Here is my hbase-site.xml:
<configuration>
<property>
<name>hbase.zookeeper.quorum</name>
<value>10.3.6.44</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/hdfs/zookeeper</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://10.3.6.44:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
</configuration>
By the way, the Hadoop cluster is running normally and sees all the datanodes.
Thanks very much for your help.
The problem was with DNS and the hosts file.
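For anyone hitting the same symptom, a common hosts-file culprit (an assumption here, since the post does not show the actual file) is the node's own hostname being mapped to a loopback address, so each region server registers itself under an address the master and web UI cannot use. The hostnames and IP below are purely illustrative:
# /etc/hosts (problematic)
127.0.1.1   node1.example.com node1
# /etc/hosts (fixed: map the hostname to the node's real IP)
10.3.6.45   node1.example.com node1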
Add this property to your hbase-site.xml file and see if it works for you:
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
