I have configured a 3-node Hadoop cluster and am trying to use Hive on top of it. Hive always seems to be running only in local mode. I heard that Hive takes its cluster settings from Hadoop, so I ran a job directly in Hadoop and it turned out to be running in local mode as well. I have installed Hive on all three nodes. I'm attaching the logs and configuration files; please ask me if you need any further details.
Hive Log:
INFO : Number of reduce tasks determined at compile time: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : number of splits:1
INFO : Submitting tokens for job: job_local49819314_0002
INFO : The url to track the job: http://localhost:8080/
INFO : Job running in-process (local Hadoop)
INFO : 2016-01-27 23:56:30,389 Stage-1 map = 100%, reduce = 100%
INFO : Ended Job = job_local49819314_0002
Hadoop Word Count Log:
16/01/27 23:46:20 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/01/27 23:46:20 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/01/27 23:46:20 INFO input.FileInputFormat: Total input paths to process : 1
16/01/27 23:46:20 INFO mapreduce.JobSubmitter: number of splits:1
16/01/27 23:46:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local494116460_0001
16/01/27 23:46:20 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/01/27 23:46:20 INFO mapreduce.Job: Running job: job_local494116460_0001
16/01/27 23:46:20 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/01/27 23:46:20 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/01/27 23:46:20 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
16/01/27 23:46:20 INFO mapred.LocalJobRunner: Waiting for map tasks
16/01/27 23:46:20 INFO mapred.LocalJobRunner: Starting task: attempt_local494116460_0001_m_000000_0
16/01/27 23:46:20 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/01/27 23:46:20 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/01/27 23:46:20 INFO mapred.MapTask: Processing split: hdfs://master:9000/exercise3:0+18834811
16/01/27 23:46:20 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/01/27 23:46:20 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/01/27 23:46:20 INFO mapred.MapTask: soft limit at 83886080
16/01/27 23:46:20 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/01/27 23:46:20 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/01/27 23:46:20 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/01/27 23:46:21 INFO mapreduce.Job: Job job_local494116460_0001 running in uber mode : false
16/01/27 23:46:21 INFO mapreduce.Job: map 0% reduce 0%
16/01/27 23:46:26 INFO mapred.LocalJobRunner: map > map
16/01/27 23:46:27 INFO mapreduce.Job: map 13% reduce 0%
16/01/27 23:46:29 INFO mapred.LocalJobRunner: map > map
16/01/27 23:46:30 INFO mapreduce.Job: map 19% reduce 0%
16/01/27 23:46:32 INFO mapred.LocalJobRunner: map > map
16/01/27 23:46:33 INFO mapreduce.Job: map 29% reduce 0%
16/01/27 23:46:35 INFO mapred.LocalJobRunner: map > map
16/01/27 23:46:36 INFO mapreduce.Job: map 36% reduce 0%
16/01/27 23:46:38 INFO mapred.LocalJobRunner: map > map
16/01/27 23:46:39 INFO mapreduce.Job: map 45% reduce 0%
16/01/27 23:46:41 INFO mapred.LocalJobRunner: map > map
16/01/27 23:46:42 INFO mapreduce.Job: map 54% reduce 0%
16/01/27 23:46:44 INFO mapred.LocalJobRunner: map > map
16/01/27 23:46:45 INFO mapreduce.Job: map 62% reduce 0%
16/01/27 23:46:46 INFO mapred.LocalJobRunner: map > map
16/01/27 23:46:46 INFO mapred.MapTask: Starting flush of map output
16/01/27 23:46:46 INFO mapred.MapTask: Spilling map output
16/01/27 23:46:46 INFO mapred.MapTask: bufstart = 0; bufend = 21289849; bufvoid = 104857600
16/01/27 23:46:46 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23806260(95225040); length = 2408137/6553600
16/01/27 23:46:47 INFO mapred.MapTask: Finished spill 0
16/01/27 23:46:47 INFO mapred.Task: Task:attempt_local494116460_0001_m_000000_0 is done. And is in the process of committing
16/01/27 23:46:47 INFO mapred.LocalJobRunner: map
16/01/27 23:46:47 INFO mapred.Task: Task 'attempt_local494116460_0001_m_000000_0' done.
16/01/27 23:46:47 INFO mapred.LocalJobRunner: Finishing task: attempt_local494116460_0001_m_000000_0
16/01/27 23:46:47 INFO mapred.LocalJobRunner: map task executor complete.
16/01/27 23:46:47 INFO mapred.LocalJobRunner: Waiting for reduce tasks
16/01/27 23:46:47 INFO mapred.LocalJobRunner: Starting task: attempt_local494116460_0001_r_000000_0
16/01/27 23:46:47 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/01/27 23:46:47 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/01/27 23:46:47 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@15602819
16/01/27 23:46:47 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=333971456, maxSingleShuffleLimit=83492864, mergeThreshold=220421168, ioSortFactor=10, memToMemMergeOutputsThreshold=10
16/01/27 23:46:47 INFO reduce.EventFetcher: attempt_local494116460_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
16/01/27 23:46:47 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local494116460_0001_m_000000_0 decomp: 13082052 len: 13082056 to MEMORY
16/01/27 23:46:47 INFO reduce.InMemoryMapOutput: Read 13082052 bytes from map-output for attempt_local494116460_0001_m_000000_0
16/01/27 23:46:47 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 13082052, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->13082052
16/01/27 23:46:47 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
16/01/27 23:46:47 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/01/27 23:46:47 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
16/01/27 23:46:47 INFO mapred.Merger: Merging 1 sorted segments
16/01/27 23:46:47 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 13082040 bytes
16/01/27 23:46:47 INFO reduce.MergeManagerImpl: Merged 1 segments, 13082052 bytes to disk to satisfy reduce memory limit
16/01/27 23:46:47 INFO reduce.MergeManagerImpl: Merging 1 files, 13082056 bytes from disk
16/01/27 23:46:47 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
16/01/27 23:46:47 INFO mapred.Merger: Merging 1 sorted segments
16/01/27 23:46:47 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 13082040 bytes
16/01/27 23:46:47 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/01/27 23:46:47 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
16/01/27 23:46:47 INFO mapreduce.Job: map 100% reduce 0%
16/01/27 23:46:53 INFO mapred.LocalJobRunner: reduce > reduce
16/01/27 23:46:53 INFO mapreduce.Job: map 100% reduce 85%
16/01/27 23:46:56 INFO mapred.LocalJobRunner: reduce > reduce
16/01/27 23:46:56 INFO mapreduce.Job: map 100% reduce 89%
16/01/27 23:46:59 INFO mapred.LocalJobRunner: reduce > reduce
16/01/27 23:46:59 INFO mapreduce.Job: map 100% reduce 92%
16/01/27 23:47:02 INFO mapred.LocalJobRunner: reduce > reduce
16/01/27 23:47:02 INFO mapreduce.Job: map 100% reduce 96%
16/01/27 23:47:05 INFO mapred.LocalJobRunner: reduce > reduce
16/01/27 23:47:05 INFO mapreduce.Job: map 100% reduce 99%
16/01/27 23:47:08 INFO mapred.LocalJobRunner: reduce > reduce
16/01/27 23:47:08 INFO mapreduce.Job: map 100% reduce 100%
16/01/27 23:47:11 INFO mapred.LocalJobRunner: reduce > reduce
16/01/27 23:47:18 INFO mapred.Task: Task:attempt_local494116460_0001_r_000000_0 is done. And is in the process of committing
16/01/27 23:47:18 INFO mapred.LocalJobRunner: reduce > reduce
16/01/27 23:47:18 INFO mapred.Task: Task attempt_local494116460_0001_r_000000_0 is allowed to commit now
16/01/27 23:47:18 INFO output.FileOutputCommitter: Saved output of task 'attempt_local494116460_0001_r_000000_0' to hdfs://master:9000/output/_temporary/0/task_local494116460_0001_r_000000
16/01/27 23:47:18 INFO mapred.LocalJobRunner: reduce > reduce
16/01/27 23:47:18 INFO mapred.Task: Task 'attempt_local494116460_0001_r_000000_0' done.
16/01/27 23:47:18 INFO mapred.LocalJobRunner: Finishing task: attempt_local494116460_0001_r_000000_0
16/01/27 23:47:18 INFO mapred.LocalJobRunner: reduce task executor complete.
16/01/27 23:47:18 INFO mapreduce.Job: Job job_local494116460_0001 completed successfully
16/01/27 23:47:18 INFO mapreduce.Job: Counters: 35
File System Counters
FILE: Number of bytes read=26711328
FILE: Number of bytes written=40348644
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=37669622
HDFS: Number of bytes written=12758437
HDFS: Number of read operations=13
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=65535
Map output records=602035
Map output bytes=21289849
Map output materialized bytes=13082056
Input split bytes=93
Combine input records=602035
Combine output records=58349
Reduce input groups=58349
Reduce shuffle bytes=13082056
Reduce input records=58349
Reduce output records=58349
Spilled Records=116698
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=123
Total committed heap usage (bytes)=848297984
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=18834811
File Output Format Counters
Bytes Written=12758437
Configuration Files
The same configuration is present on all the nodes.
mapred-site.xml
<configuration>
<property>
<name>mapreduce.job.tracker</name>
<value>master:5431</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/home/huser/hadoop-2.7.1/hadoop_tmp/history/intermediate</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/home/huser/hadoop-2.7.1/hadoop_tmp/history/done</value>
</property>
<property>
<name>mapred.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobtracker.address</name>
<value>master:54311</value>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>master:50030</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/huser/hadoop-2.7.1/hadoop_tmp/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/huser/hadoop-2.7.1/hadoop_tmp/hdfs/datanode</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
</configuration>
Bashrc
export JAVA_HOME=/opt/jdk/jdk1.8.0_66
export PATH=$PATH:$JAVA_HOME
# -- HADOOP ENVIRONMENT VARIABLES START -- #
export HADOOP_HOME=/home/huser/hadoop-2.7.1/
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_CONF=$HADOOP_HOME/etc/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
# -- HADOOP ENVIRONMENT VARIABLES END -- #
# -- Hive Variables Start --#
export HIVE_HOME=/home/huser/apache-hive-1.2.1-bin
export HIVE_CONF=$HIVE_HOME/conf
export PATH=$HIVE_HOME/bin:$PATH
export PATH=$HIVE_HOME/lib:$PATH
export ANT_LIB=/home/huser/apache-ant-1.9.6/lib
# -- Hive Variables End -- #
hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master/metastore</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value></value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://master:9083</value>
</property>
<property>
<name>mapreduce.job.tracker</name>
<value>master:5431</value>
</property>
</configuration>
Hive Properties
set hive.exec.mode.local.auto;
+----------------------------------+--+
| set |
+----------------------------------+--+
| hive.exec.mode.local.auto=false |
+----------------------------------+--+
set mapred.job.tracker;
+----------------------------------+--+
| set |
+----------------------------------+--+
| mapred.job.tracker=master:54311 |
+----------------------------------+--+
start-dfs.sh and start-yarn.sh
start all the datanodes, the namenode, the resourcemanager, etc. The cluster itself is working fine; the problem seems to be only with the jobs. I can see that all the datanodes are available at http://master:50070 and a job history page at http://master:8088.
If you need any further logs or config files let me know.
Thanks.
The problem turned out to be with mapred-site.xml.
This is the new file:
<configuration>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/home/huser/hadoop-2.7.1/hadoop_tmp/history/intermediate</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/home/huser/hadoop-2.7.1/hadoop_tmp/history/done</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobtracker.address</name>
<value>master:54311</value>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>master:50030</value>
</property>
</configuration>
mapreduce.job.tracker does not seem to be a valid property.
I have also changed
HADOOP_HOME=/home/huser/hadoop-2.7.1/ to
HADOOP_HOME=/home/huser/hadoop-2.7.1, removing the trailing slash (/).
I have also changed hive-site.xml to the following:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master/metastore</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>1</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://master:9083</value>
</property>
</configuration>
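A quick way to confirm the change took effect (a minimal check, assuming the standard Hive and YARN command-line tools are available):

# From beeline / the Hive CLI, the effective value should now be yarn:
#   set mapreduce.framework.name;
# and the job IDs of new queries should no longer carry the job_local prefix.
# From the shell, a running query should show up as a YARN application:
yarn application -list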
Related
The configuration is set up to run MapReduce jobs in cluster mode on top of YARN, but they keep running in local mode.
I am not able to figure out what the issue is.
Below is yarn-site.xml (at the master node):
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>namenode:8031</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name> <!-- NodeManager auxiliary service -->
<value>mapreduce_shuffle</value> <!-- the shuffle service that serves map output to the reducers -->
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>namenode:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>namenode:8032</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>namenode</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2042</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
yarn-site.xml (at slave node)
<property>
<name>yarn.nodemanager.aux-services</name> <!-- NodeManager auxiliary service -->
<value>mapreduce_shuffle</value> <!-- the shuffle service that serves map output to the reducers -->
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>namenode:8031</value> <!-- address of the ResourceManager's resource tracker -->
</property>
mapred-site.xml (at master node and slave node)
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>2048</value>
</property>
On submitting the job, the output is as below:
18/12/06 16:20:43 INFO input.FileInputFormat: Total input paths to process : 1
18/12/06 16:20:43 INFO mapreduce.JobSubmitter: number of splits:2
18/12/06 16:20:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1556004420_0001
18/12/06 16:20:43 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/12/06 16:20:43 INFO mapreduce.Job: Running job: job_local1556004420_0001
18/12/06 16:20:43 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/12/06 16:20:43 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/12/06 16:20:43 INFO mapred.LocalJobRunner: Waiting for map tasks
18/12/06 16:20:43 INFO mapred.LocalJobRunner: Starting task: attempt_local1556004420_0001_m_000000_0
18/12/06 16:20:43 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/12/06 16:20:43 INFO mapred.MapTask: Processing split: hdfs://namenode:9001/all-the-news/articles1.csv:0+134217728
18/12/06 16:20:43 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/12/06 16:20:43 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/12/06 16:20:43 INFO mapred.MapTask: soft limit at 83886080
18/12/06 16:20:43 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/12/06 16:20:43 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/12/06 16:20:43 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/12/06 16:20:44 INFO mapreduce.Job: Job job_local1556004420_0001 running in uber mode : false
18/12/06 16:20:44 INFO mapreduce.Job: map 0% reduce 0%
18/12/06 16:20:49 INFO mapred.LocalJobRunner: map > map
18/12/06 16:20:50 INFO mapreduce.Job: map 1% reduce 0%
18/12/06 16:20:52 INFO mapred.LocalJobRunner: map > map
18/12/06 16:20:55 INFO mapred.LocalJobRunner: map > map
18/12/06 16:20:56 INFO mapreduce.Job: map 2% reduce 0%
18/12/06 16:20:58 INFO mapred.LocalJobRunner: map > map
18/12/06 16:21:01 INFO mapred.LocalJobRunner: map > map
18/12/06 16:21:02 INFO mapreduce.Job: map 3% reduce 0%
18/12/06 16:21:04 INFO mapred.LocalJobRunner: map > map
Why is it running in local mode?
I am running this job on a 200 MB file with 3 nodes: 2 datanodes and 1 namenode.
The /etc/hosts file is shown below:
127.0.0.1 localhost
127.0.1.1 anil-Lenovo-Product
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.8.98 namenode
192.168.8.99 datanode
192.168.8.100 datanode2
First, check whether these configurations are actually effective, either at the default address
http://{your-resource-manager-host}:8088/conf or at
your configured UI address: http://namenode:8088/conf
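For example, a quick check from the shell (host and port as configured above; the value should be yarn):

# Print the property name plus the text that follows it, enough to see its effective value
curl -s http://namenode:8088/conf | grep -o 'mapreduce\.framework\.name.\{0,80\}'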
Then make sure these properties are configured:
in mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
in yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
Then restart the YARN service and check whether it works.
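For example (assuming the stock sbin scripts are on the PATH):

# Restart YARN so the updated mapred-site.xml / yarn-site.xml are re-read
stop-yarn.sh
start-yarn.sh
jps   # the master should list ResourceManager, the workers should list NodeManager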
Jobs are submitted through the ClientProtocol interface, and one of its two implementations is created when the client starts:
LocalClientProtocolProvider - job IDs prefixed with job_local
YarnClientProtocolProvider - job IDs prefixed with job_
The choice is made according to the MRConfig.FRAMEWORK_NAME configuration (the key "mapreduce.framework.name"), whose valid options are classic, yarn, and local.
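In other words, the job ID in the submission log already tells you which provider was picked, e.g. (the log file name is a placeholder):

grep "Submitting tokens for job" <job-client-log>
#   job_local1556004420_0001  -> LocalClientProtocolProvider (local runner)
#   job_1450326099697_0001    -> YarnClientProtocolProvider (submitted to YARN)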
Good luck!
I am using Hadoop 2.7.0 in pseudo-distributed mode on a Fedora 22 virtual machine. A few days back the MapReduce jobs ran fine, but after I installed Oozie and modified yarn-site.xml I started getting the error below when running the Pi example job, and try as I might I have not been able to debug it.
EDITED - I am running the job from the command line and NOT through the Oozie workflow engine. Command - hadoop jar 10 100
Starting Job
15/12/17 15:22:05 INFO client.RMProxy: Connecting to ResourceManager at /192.168.122.1:8032
15/12/17 15:22:06 INFO input.FileInputFormat: Total input paths to process : 10
15/12/17 15:22:06 INFO mapreduce.JobSubmitter: number of splits:10
15/12/17 15:22:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1450326099697_0001
15/12/17 15:22:07 INFO impl.YarnClientImpl: Submitted application application_1450326099697_0001
15/12/17 15:22:07 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1450326099697_0001/
15/12/17 15:22:07 INFO mapreduce.Job: Running job: job_1450326099697_0001
15/12/17 15:22:17 INFO mapreduce.Job: Job job_1450326099697_0001 running in uber mode : false
15/12/17 15:22:17 INFO mapreduce.Job: map 0% reduce 0%
15/12/17 15:22:24 INFO mapreduce.Job: map 10% reduce 0%
15/12/17 15:22:30 INFO mapreduce.Job: map 20% reduce 0%
15/12/17 15:22:36 INFO mapreduce.Job: map 30% reduce 0%
15/12/17 15:22:42 INFO mapreduce.Job: map 40% reduce 0%
15/12/17 15:22:46 INFO mapreduce.Job: map 50% reduce 0%
15/12/17 15:22:51 INFO mapreduce.Job: map 60% reduce 0%
15/12/17 15:22:56 INFO mapreduce.Job: map 70% reduce 0%
15/12/17 15:23:01 INFO mapreduce.Job: map 80% reduce 0%
15/12/17 15:23:07 INFO mapreduce.Job: map 90% reduce 0%
15/12/17 15:23:13 INFO mapreduce.Job: map 100% reduce 0%
15/12/17 15:23:18 INFO mapreduce.Job: map 100% reduce 100%
15/12/17 15:23:23 INFO ipc.Client: Retrying connect to server: vlan722-rsvd-router.ddr.priv/192.168.122.1:34460. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
sleepTime=1000 MILLISECONDS)
15/12/17 15:23:24 INFO ipc.Client: Retrying connect to server: vlan722-rsvd-router.ddr.priv/192.168.122.1:34460. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
sleepTime=1000 MILLISECONDS)
15/12/17 15:23:25 INFO ipc.Client: Retrying connect to server: vlan722-rsvd-router.ddr.priv/192.168.122.1:34460. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
sleepTime=1000 MILLISECONDS)
15/12/17 15:23:28 INFO mapreduce.Job: map 0% reduce 0%
15/12/17 15:23:28 INFO mapreduce.Job: Job job_1450326099697_0001 failed with state FAILED due to: Application application_1450326099697_0001 failed 2 times due to AM Container for
appattempt_1450326099697_0001_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://hadoop:8088/cluster/app/application_1450326099697_0001Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1450326099697_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
15/12/17 15:23:28 INFO mapreduce.Job: Counters: 0
Job Finished in 82.924 seconds
Estimated value of Pi is 3.14800000000000000000
yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>
/home/osboxes/hadoop/etc/hadoop,
/home/osboxes/hadoop/share/hadoop/common/*,
/home/osboxes/hadoop/share/hadoop/common/lib/*,
/home/osboxes/hadoop/share/hadoop/hdfs/*,
/home/osboxes/hadoop/share/hadoop/hdfs/lib/*,
/home/osboxes/hadoop/share/hadoop/yarn/*,
/home/osboxes/hadoop/share/hadoop/yarn/lib/*,
/home/osboxes/hadoop/share/hadoop/mapreduce/*,
/home/osboxes/hadoop/share/hadoop/mapreduce/lib/*
</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>5120</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>http://192.168.122.1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>http://192.168.122.1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>http://192.168.122.1:8031</value>
</property>
<property>
<name>yarn.nodemanager.address</name>
<value>http://192.168.122.1:8041</value>
</property>
Any help on this would be very much appreciated.
EDIT - yarn-site.xml before
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Finally, I solved the issue by making the following change to mapred-site.xml:
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>http://localhost:19888</value>
</property>
After that the jobs ran perfectly fine.
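As an aside, the history and tracking pages only respond if the Job History Server daemon is actually running; it is started separately from HDFS and YARN (a sketch, assuming the stock Hadoop 2.x sbin scripts):

$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
jps   # should now list JobHistoryServer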
My job gets stuck once the first mapper (Reducemapper2) completes, at "map 50% reduce 0%". I have tried to debug it a lot and googled as well, but I am not able to figure out the reason. Below is the driver class.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class Reducedriver {
public static void main(String args[]) throws Exception
{
if(args.length!=3)
{
System.err.println("Usage: Worddrivernewapi <input path1> <inputpath2> <output path>");
System.exit(-1);
}
Configuration conf=new Configuration();
Job job=new Job(conf,"Reducesideexample");
job.setJarByClass(Reducedriver.class);
job.setJobName("Reducedriver");
Path path1=new Path(args[0]);
Path path2=new Path(args[1]);
MultipleInputs.addInputPath(job,path1,TextInputFormat.class,Reducemapper1.class);
MultipleInputs.addInputPath(job,path2,TextInputFormat.class,Reducemapper2.class);
FileOutputFormat.setOutputPath(job,new Path(args[2]));
//job.setMapperClass(Reducemapper1.class);
job.setPartitionerClass(Reducepartitioner.class);
//job.setSortComparatorClass(Reducesortcomparator.class);
job.setGroupingComparatorClass(Reducegroupcomparator.class);
job.setReducerClass(Reducereducer.class);
//job.setNumReduceTasks(0);
job.setMapOutputKeyClass(ReduceWritable.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setOutputFormatClass(TextOutputFormat.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
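For reference, the driver checks for exactly three arguments, so a run looks like this (the jar name and paths are placeholders):

hadoop jar reducesidejoin.jar Reducedriver /user/hduser/input1 /user/hduser/input2 /user/hduser/output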
Could someone help me figure out the issue?
This is pseudo-distributed mode with a capacity of 2 mappers and 2 reducers. I have had multiple successful runs with this capacity before.
Log for a single mapper (JobTracker log):
2015-05-16 11:10:56,630 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2015-05-16 11:10:57,126 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2015-05-16 11:10:57,288 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2015-05-16 11:10:57,309 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@42f93a98
2015-05-16 11:10:57,484 INFO org.apache.hadoop.mapred.MapTask: Processing split: hdfs://localhost:9000/user/hduser/test/mapmainfile.dat:0+40
2015-05-16 11:10:57,512 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2015-05-16 11:10:57,591 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
2015-05-16 11:10:57,592 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
2015-05-16 11:10:57,607 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library not loaded
2015-05-16 11:10:57,666 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2015-05-16 11:10:57,669 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
From the terminal:
15/05/16 11:10:50 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/05/16 11:10:50 INFO input.FileInputFormat: Total input paths to process : 1
15/05/16 11:10:50 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/05/16 11:10:50 WARN snappy.LoadSnappy: Snappy native library not loaded
15/05/16 11:10:50 INFO input.FileInputFormat: Total input paths to process : 1
15/05/16 11:10:51 INFO mapred.JobClient: Running job: job_201505161109_0001
15/05/16 11:10:52 INFO mapred.JobClient: map 0% reduce 0%
15/05/16 11:11:04 INFO mapred.JobClient: map 100% reduce 0%.
When I tried to debug it locally I could see that the first mapper completes and the map progress stops at 50%.
LocalJobRunner log:
15/05/16 11:36:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/05/16 11:36:08 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/05/16 11:36:08 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/05/16 11:36:08 INFO input.FileInputFormat: Total input paths to process : 1
15/05/16 11:36:08 WARN snappy.LoadSnappy: Snappy native library not loaded
15/05/16 11:36:08 INFO input.FileInputFormat: Total input paths to process : 1
15/05/16 11:36:08 INFO mapred.JobClient: Running job: job_local815502428_0001
15/05/16 11:36:09 INFO mapred.LocalJobRunner: Waiting for map tasks
15/05/16 11:36:09 INFO mapred.LocalJobRunner: Starting task: attempt_local815502428_0001_m_000000_0
15/05/16 11:36:09 INFO util.ProcessTree: setsid exited with exit code 0
15/05/16 11:36:09 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@11507b87
15/05/16 11:36:09 INFO mapred.MapTask: Processing split: file:/home/hduser/hadoop/myexamples/mainmapdatafile.dat:0+137
15/05/16 11:36:09 INFO mapred.MapTask: io.sort.mb = 100
15/05/16 11:36:09 INFO mapred.MapTask: data buffer = 79691776/99614720
15/05/16 11:36:09 INFO mapred.MapTask: record buffer = 262144/327680
15/05/16 11:36:09 INFO mapred.JobClient: map 0% reduce 0%
15/05/16 11:36:18 INFO mapred.LocalJobRunner:
15/05/16 11:36:18 INFO mapred.JobClient: map 6% reduce 0%
15/05/16 11:36:27 INFO mapred.LocalJobRunner:
15/05/16 11:36:28 INFO mapred.JobClient: map 12% reduce 0%
15/05/16 11:36:36 INFO mapred.LocalJobRunner:
15/05/16 11:36:37 INFO mapred.JobClient: map 18% reduce 0%
15/05/16 11:36:45 INFO mapred.LocalJobRunner:
15/05/16 11:36:46 INFO mapred.JobClient: map 25% reduce 0%
15/05/16 11:36:51 INFO mapred.LocalJobRunner:
15/05/16 11:36:52 INFO mapred.JobClient: map 31% reduce 0%
15/05/16 11:36:57 INFO mapred.LocalJobRunner:
15/05/16 11:36:58 INFO mapred.JobClient: map 37% reduce 0%
15/05/16 11:37:03 INFO mapred.LocalJobRunner:
15/05/16 11:37:04 INFO mapred.JobClient: map 43% reduce 0%
15/05/16 11:37:09 INFO mapred.LocalJobRunner:
15/05/16 11:37:10 INFO mapred.JobClient: map 50% reduce 0%
15/05/16 11:37:12 INFO mapred.MapTask: Starting flush of map output
15/05/16 11:37:12 INFO mapred.MapTask: Starting flush of map output
15/05/16 11:37:18 INFO mapred.LocalJobRunner:
Hi, I have a MapReduce jar that runs perfectly fine for small input files. When I say small, I mean sample input files I created with fewer than 10 lines of input. But when I try to run the job on an input file of 1.8 GB, I get an OutOfMemoryError. I'm not sure what I'm supposed to do.
Is there any way I can limit the number of tasks being spawned, and have fewer tasks run for longer durations?
Around 20 tasks are spawned on the large input file before I get this error. Here's part of the log generated for the first two tasks.
13/12/13 12:00:22 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
13/12/13 12:00:22 INFO mapreduce.Job: Running job: job_local1170901099_0001
13/12/13 12:00:22 INFO mapred.LocalJobRunner: OutputCommitter set in config null
13/12/13 12:00:22 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
13/12/13 12:00:22 INFO mapred.LocalJobRunner: Waiting for map tasks
13/12/13 12:00:22 INFO mapred.LocalJobRunner: Starting task: attempt_local1170901099_0001_m_000000_0
13/12/13 12:00:22 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
13/12/13 12:00:22 INFO mapred.Task: Using ResourceCalculatorProcessTree : null
13/12/13 12:00:22 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/chaitanya.nadig/friendship.txt:0+134217728
13/12/13 12:00:22 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
13/12/13 12:00:23 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
13/12/13 12:00:23 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
13/12/13 12:00:23 INFO mapred.MapTask: soft limit at 83886080
13/12/13 12:00:23 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
13/12/13 12:00:23 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
13/12/13 12:00:23 INFO mapreduce.Job: Job job_local1170901099_0001 running in uber mode : false
13/12/13 12:00:23 INFO mapreduce.Job: map 0% reduce 0%
13/12/13 12:00:24 INFO mapred.MapTask: Starting flush of map output
13/12/13 12:00:24 INFO mapred.LocalJobRunner: Starting task: attempt_local1170901099_0001_m_000001_0
13/12/13 12:00:24 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
13/12/13 12:00:24 INFO mapred.Task: Using ResourceCalculatorProcessTree : null
13/12/13 12:00:24 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/chaitanya.nadig/friendship.txt:134217728+134217728
13/12/13 12:00:24 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
13/12/13 12:00:24 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
13/12/13 12:00:24 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
13/12/13 12:00:24 INFO mapred.MapTask: soft limit at 83886080
13/12/13 12:00:24 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
13/12/13 12:00:24 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
13/12/13 12:00:25 INFO mapred.MapTask: Starting flush of map output
This is the tail of the log which is generated when the error occurs.
13/12/13 12:00:43 INFO mapred.MapTask: Starting flush of map output
13/12/13 12:00:43 INFO mapred.Task: Task:attempt_local1170901099_0001_m_000020_0 is done. And is in the process of committing
13/12/13 12:00:43 INFO mapred.LocalJobRunner: map
13/12/13 12:00:43 INFO mapred.Task: Task 'attempt_local1170901099_0001_m_000020_0' done.
13/12/13 12:00:43 INFO mapred.LocalJobRunner: Finishing task: attempt_local1170901099_0001_m_000020_0
13/12/13 12:00:43 INFO mapred.LocalJobRunner: Map task executor complete.
13/12/13 12:00:43 WARN mapred.LocalJobRunner: job_local1170901099_0001
java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at org.apache.hadoop.io.Text.setCapacity(Text.java:266)
at org.apache.hadoop.io.Text.append(Text.java:236)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:238)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:164)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:695)
13/12/13 12:00:44 INFO mapreduce.Job: map 100% reduce 0%
13/12/13 12:00:44 INFO mapreduce.Job: Job job_local1170901099_0001 failed with state FAILED due to: NA
13/12/13 12:00:44 INFO mapreduce.Job: Counters: 22
File System Counters
FILE: Number of bytes read=27635962
FILE: Number of bytes written=28018656
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=5338170260
HDFS: Number of bytes written=0
HDFS: Number of read operations=25
HDFS: Number of large read operations=0
HDFS: Number of write operations=1
Map-Reduce Framework
Map input records=0
Map output records=0
Map output bytes=0
Map output materialized bytes=6
Input split bytes=122
Combine input records=0
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=5
Total committed heap usage (bytes)=530186240
File Input Format Counters
Bytes Read=118909386
This answer is late, but posting it in case it helps someone else. The problem was that the file I was trying to process was corrupted. I got different copy of the file and ran my MR job on it and everything worked fine.
My first impulse would be to ask what your startup parameters are. Typically, when you run MapReduce and hit an out-of-memory error, you would use something like the following as your startup params:
-Dmapred.map.child.java.opts=-Xmx1G -Dmapred.reduce.child.java.opts=-Xmx1G
The key here is that these two amounts are cumulative, so the amounts you specify, added together, should not come close to exceeding the memory available on your system after you start MapReduce.
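For example, an invocation with those options might look like this (the jar, driver class, and paths are placeholders, and -D options are only honored if the driver goes through ToolRunner/GenericOptionsParser):

hadoop jar my-job.jar MyDriver \
  -Dmapred.map.child.java.opts=-Xmx1G \
  -Dmapred.reduce.child.java.opts=-Xmx1G \
  /input /output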
Might be late, but I solved this by setting the following parameter to 0.2:
mapred.job.shuffle.input.buffer.percent
This tells the reducer JVM to reserve only 20% of its heap for the shuffle buffers, rather than the default 70%. You are getting the "out of heap space" error because the shuffle is asking the JVM for memory that is not available to it; rather than spilling, it just throws the exception. If you ask for only 20%, chances are you will get the memory, and once you exceed the allotted memory the spilling logic kicks in.
Of course, the downside is slowness.
You can also calculate the amount of memory available at run time and then reset the buffer.
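A minimal way to try this on a single job, without touching the cluster-wide mapred-site.xml (jar, class, and paths are placeholders; again, -D requires ToolRunner/GenericOptionsParser):

hadoop jar my-job.jar MyDriver \
  -Dmapred.job.shuffle.input.buffer.percent=0.2 \
  /input /output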
It would be highly appreciated if someone could help me figure out what went wrong in my configuration.
I wanted to increase the value of io.sort.mb, so I added the property below to core-site.xml:
<property>
<name>io.sort.mb</name>
<value>350m</value>
</property>
The runtime information attached below clearly shows that the value of io.sort.mb did not change; the default io.sort.mb = 100 remained.
13/08/15 16:43:34 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin#1e5e96c1
13/08/15 16:43:34 INFO mapred.MapTask: numReduceTasks: 1
13/08/15 16:43:34 INFO mapred.MapTask: **io.sort.mb = 100**
13/08/15 16:43:34 INFO mapred.MapTask: data buffer = 79691776/99614720
13/08/15 16:43:34 INFO mapred.MapTask: record buffer = 262144/327680
13/08/15 16:43:34 INFO mapred.MapTask: Starting flush of map output
13/08/15 16:43:34 INFO mapred.MapTask: Finished spill 0
13/08/15 16:43:34 INFO mapred.Task: Task:attempt_local_0001_m_004609_0 is done. And is in the process of commiting
Since it was not working, I also added the property to mapred-site.xml; however, I got the same outcome as above.
Can anyone suggest what I should do?
Thanking you in advance.
Haq
According to the article here, io.sort.mb should be about 10 * io.sort.factor, provided you have enough RAM.
"core-site.xml"
<property>
<name>io.sort.factor</name>
<value>100</value>
<description>More streams merged at once while sorting files.</description>
</property>
<property>
<name>io.sort.mb</name>
<value>200</value>
<description>Higher memory-limit while sorting data.</description>
</property>
Try changing the sort factor on all the nodes as well.
This configuration should go in mapred-site.xml instead of core-site.xml.
Refer: http://hadoop.apache.org/docs/r1.0.4/mapred-default.html