Hadoop MapReduce stuck and never completes

I am using Pig to store data into Hive. When I execute the program, it shows 0% complete, gets stuck, and never finishes. I let it run for around 3 hours and it never reported any error. After some searching I found that the problem might be in the yarn-site.xml and mapred-site.xml configuration. I changed the configuration, but it had no effect at all.
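One frequent cause of a job sitting at 0% is that YARN cannot allocate a container for the ApplicationMaster, often because of memory settings. As a starting point, a minimal memory configuration in yarn-site.xml might look like the sketch below (the values are illustrative assumptions; size them to your nodes):

    <!-- yarn-site.xml: illustrative values, tune to your hardware -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>4096</value>   <!-- total memory YARN may allocate on this node -->
    </property>
    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>512</value>    <!-- smallest container YARN will grant -->
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>4096</value>   <!-- largest single container -->
    </property>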

Related

Hive applications are slow to start

Hive (Tez) queries start lazily on YARN.
We are running Hive queries on the Tez engine. As soon as we submit a query we can see its status as RUNNING, but the actual job does not start for 10 to 15 minutes.
I am not sure what code to present to illustrate the problem, as Hive, Tez and YARN involve a lot of configuration. Please let me know if any configuration is needed for further investigation of my issue.
The expected behaviour is for the query to execute as soon as it is submitted.
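Before digging into Hive or Tez settings, it is worth checking where the application is actually waiting. A quick diagnostic sketch with the standard yarn CLI (the application ID is a placeholder):

    # List applications waiting in the scheduler vs. actually running
    yarn application -list -appStates ACCEPTED,RUNNING
    # Inspect a single application's queue, state and diagnostics
    yarn application -status application_1234567890123_0001
    # Full or unhealthy NodeManagers delay container allocation
    yarn node -list -all

If the application sits in ACCEPTED, the delay is usually queue capacity or cluster resources rather than Hive itself.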

Hadoop Pig job not running

I am testing Hadoop; as of now I have:
1) localhost:8088 working
2) localhost:50070 working
3) I created a few files on hdfs
Then I launch Pig, do a LOAD on a file, then a FILTER, and then a DUMP.
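Schematically, that sequence looks like the following (the file name and schema are made up for illustration):

    -- hypothetical input and schema, just to show the LOAD / FILTER / DUMP chain
    lines = LOAD '/user/jr/data.txt' USING PigStorage(',') AS (id:int, name:chararray);
    big = FILTER lines BY id > 100;
    DUMP big;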
When I DUMP, Pig displays info about the MapReduce job starting.
It ends with a line like:
"MapReduceLauncher - 0% complete" + "Running Jobs are [job_xxx]".
So I think the job is launched. I even see it as an ACCEPTED app in the Hadoop interface on localhost:8088. But then nothing happens: it is stuck at 0% complete :-(
So, the job is "ACCEPTED" but never goes to RUNNING :-(
Should I do something to make my grunt/Pig command-line instructions run?
Thanks.
JR.
PS: I can't copy and paste anything from my job environment.
I unblocked the situation when I realised that my hard drive was 90% full. At that level, Hadoop refuses to write any more logs. I just had to delete some (big!) files to get it running again...
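For context, this matches YARN's disk health checker: once a disk crosses the utilization threshold (90% by default in Hadoop 2.x), the NodeManager is marked unhealthy and stops accepting containers, so accepted jobs never start. If freeing space is not an option, the threshold can be raised in yarn-site.xml (the value below is an illustrative assumption and only buys time):

    <property>
      <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
      <value>95.0</value>   <!-- default is 90.0 -->
    </property>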

Has anyone used mapred.job.tracker=local in a Hadoop streaming job?

For the last few weeks we have been using Hadoop streaming to calculate some reports every day. Recently we made a change to our program: if the input size is smaller than 10 MB, we set mapred.job.tracker=local in the JobConf, so the job runs locally.
But last night many jobs failed, with status 3 returned by runningJob.getJobState().
I don't know why, and there is nothing in stderr.
Googling turns up nothing related to this question, so I'm wondering whether I should be using mapred.job.tracker=local in production at all. Maybe it's just a debugging aid for development supplied by Hadoop.
Does anyone know anything about this? Any information is appreciated. Thank you.
I believe setting mapred.job.tracker=local has nothing to do with your error as local is the default value.
This config parameter defines the host and port that the MapReduce job tracker runs at. If it is set to "local", then jobs are run in-process as a single map and reduce task.
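For reference, the switch described in the question looks roughly like this on the old mapred API (a sketch; the class name, input path handling, and threshold are placeholders, not the poster's code):

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    public class LocalModeSwitch {
        // Run the job in-process when the input is small (placeholder threshold: 10 MB).
        public static void configure(JobConf conf, String inputDir) throws Exception {
            long inputBytes = FileSystem.get(conf)
                    .getContentSummary(new Path(inputDir)).getLength();
            if (inputBytes < 10L * 1024 * 1024) {
                conf.set("mapred.job.tracker", "local"); // single in-process map and reduce
            }
        }
    }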

Hadoop mapreduce getMapOutput failed

Current setup:
- Hadoop 0.20.2-cdh3u3
- HBase 0.90.4-cdh3u3
- Jetty 6.1.14
- Running on a VM (Debian Squeeze)
The problem appears during a MapReduce job over an HBase table. In the reduce phase it crashes every time at the very same point, with these entries in tasktracker.log:
ERROR org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_201205290717_0001_m_000010_0,3) failed:
org.mortbay.jetty.EofException
WARN org.mortbay.log: Committed before 410 getMapOutput(attempt_201205290717_0001_m_000010_0,3) failed :
org.mortbay.jetty.EofException
ERROR org.mortbay.log: /mapOutput
java.lang.IllegalStateException: Committed
Hoping someone has faced the same or a similar problem before; looking for a solution.
I am facing the same issue here.
On my cluster this happens on all slaves (datanodes & tasktrackers) except one. As a result the overall reduce phase first progresses very slowly, and at a certain point the reduce progress so far is rolled back due to some error; the reduce then starts all over again, and the job never finishes.
There is an open major issue in the bugtracker. See https://issues.apache.org/jira/browse/MAPREDUCE-5
Let us hope it will be fixed some day, but at the moment I cannot use my Hadoop program with huge files (> 3 GB) at all. In my case I hope I can avoid it with additional data cleaning and more efficient data structures (Trove, fastutil), so the problem doesn't occur at all, but honestly that feels like the wrong approach here; not having to make those smaller tweaks was the main reason for starting with Hadoop anyway.
The Jetty EofException is observed when a reduce task prematurely closes its connection to the Jetty server. Restart the tasktrackers and run the job again, and see if that works for you.
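If restarting does not help, a commonly suggested mitigation on 0.20-era clusters is to give the tasktracker's embedded Jetty more shuffle capacity in mapred-site.xml. A sketch with assumed values (tuning knobs, not a verified fix for this bug):

    <property>
      <name>tasktracker.http.threads</name>
      <value>80</value>   <!-- default 40; threads serving map output to reducers -->
    </property>
    <property>
      <name>mapred.reduce.parallel.copies</name>
      <value>10</value>   <!-- default 5; concurrent shuffle fetches per reducer -->
    </property>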

Hadoop Mapper not running my class

Using Hadoop 0.20, I am creating a chain of jobs, job1 and job2 (whose mappers are in x.jar; there is no reducer), with a dependency between them, and submitting it to the Hadoop cluster using JobControl. Note that I have called setJarByClass, and getJar returns the correct jar file when checked before submission.
Submission goes through and there seem to be no errors in the user logs or the jobtracker, but I don't see my Mapper being executed (no sysouts or log output); instead the default output appears in the output folder (the input file is read and written out as-is). I am able to run the job directly using x.jar, but I am really out of clues as to why it does not run under JobControl.
Please help!
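For reference, a minimal version of the setup described above looks roughly like this on the 0.20 mapred API (the driver class and job wiring are placeholders, not the poster's actual code):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.jobcontrol.Job;
    import org.apache.hadoop.mapred.jobcontrol.JobControl;

    public class ChainDriver {
        public static void main(String[] args) throws Exception {
            JobConf jobConf1 = new JobConf();
            jobConf1.setJarByClass(ChainDriver.class); // as the poster does
            JobConf jobConf2 = new JobConf();
            jobConf2.setJarByClass(ChainDriver.class);
            // mapper classes and input/output paths omitted for brevity

            Job job1 = new Job(jobConf1);
            Job job2 = new Job(jobConf2);
            job2.addDependingJob(job1);        // job2 starts only after job1 succeeds

            JobControl control = new JobControl("chain");
            control.addJob(job1);
            control.addJob(job2);

            new Thread(control).start();       // JobControl is a Runnable
            while (!control.allFinished()) {
                Thread.sleep(1000);
            }
            control.stop();
        }
    }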
This issue bugged me for quite a few days. Finally I found that it was the mapred.used.genericoptionsparser property which caused the issue. Set it to true and everything started working fine.
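In code, that is a one-line setting on each job's configuration before wrapping it for JobControl (using jobConf1/jobConf2 from the sketch above; the property name is taken from the answer):

    // Mark the conf as if it had been through GenericOptionsParser
    jobConf1.setBoolean("mapred.used.genericoptionsparser", true);
    jobConf2.setBoolean("mapred.used.genericoptionsparser", true);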
