I was running a few MapReduce programs on a Hadoop cluster. The programs executed successfully and produced the required output.
Using the jps command, I noticed that RunJar was still running as a process. I stopped my cluster, but the process was still up.
I know that hadoop jar invokes RunJar to execute the jar, but is it normal for the process to stay up even after the job completes?
If yes, then multiple RunJar instances will keep running. How can I make sure that RunJar also stops once the job completes? (I don't wish to kill the process.)
The RunJar process is normally the result of someone or something running "hadoop jar".
You can kill the process with:
kill 13082
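If you would rather not read the PID off the jps output by eye, the lookup can be scripted. This is only a minimal sketch, assuming jps prints one "<pid> <name>" pair per line; the function name runjar_pids is made up for illustration.

```shell
# Hypothetical helper: print the PID of every RunJar process.
# Reads jps-style output ("<pid> <name>" per line) on stdin.
runjar_pids() {
    awk '$2 == "RunJar" { print $1 }'
}
```

Running `jps | runjar_pids` should then print one PID per line, and each of those can be passed to kill as above.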
I have a Hadoop cluster running Cloudera's CDH3, the equivalent of Apache Hadoop 0.20.2. I want to restart the JobTracker because some jobs are not getting killed. I tried killing them from the command line; the command executes successfully, but the jobs remain in Job Cleanup: Pending status. In any case, I want to restart the JobTracker and see whether that cleans up the jobs. I know the command to restart the JobTracker, but I am not sure whether I need to put the NameNode in safe mode before I restart it.
You can try to kill the unwanted jobs using hadoop job -kill <Job-ID> and check the command's exit status with echo "$?". If that doesn't work, a restart is the only option.
The Hadoop JobTracker and NameNode are independent components, so there is no need to put the NameNode in safe mode before restarting the JobTracker. You can restart the JobTracker process alone (and the TaskTrackers if required).
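The kill-and-check step can be wrapped in a small function. This is just a sketch; try_kill is a hypothetical name, and it only reports the exit status of whatever command it is given. As the question shows, a zero exit status does not guarantee that the job actually leaves the pending state.

```shell
# Hypothetical helper: run a kill command and report its exit status.
# Example call: try_kill hadoop job -kill <Job-ID>
try_kill() {
    if "$@"; then
        echo "kill command succeeded (exit status 0)"
    else
        status=$?
        echo "kill command failed (exit status $status)"
        return "$status"
    fi
}
```

If the function reports a failure, or reports success but the job is still listed, restarting the JobTracker is the fallback described above.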
I'm encountering a problem with Pig and Oozie.
I have a Pig script that tries to read data from a non-existent table, so an exception is thrown in the initialize method of the RecordReader. That is fine and expected, since the table definitely doesn't exist.
The problem starts when such a script is launched via Oozie on a multi-node Hadoop cluster: after the first attempt the job just hangs and does nothing until another job is submitted to the cluster.
If launched from the command line (pig -f test.pig), it doesn't hang. It also doesn't hang if launched in local mode or on a single-node cluster (from the command line or via Oozie).
I really hope someone has had a problem like this and can help me.
I have configured Spark jobserver to run on YARN.
I am able to submit Spark jobs to YARN, but even after a job finishes it does not exit on YARN.
For example:
I tried to create a simple Spark context.
The context shows up in the jobserver, but YARN keeps running the process and it does not quit; I have to kill the tasks manually.
Yarn Job
Spark Context
The jobserver shows the contexts, but as soon as I try to run any task in one of them, the jobserver gives me an error:
{
"status": "ERROR",
"result": "context test-context2 not found"
}
My Spark UI is also not very helpful
I'm working with CDH 5.1. It runs normal Hadoop jobs on YARN, but Hive still works with mapred. Sometimes a big query hangs for a long time and I want to kill it.
I can find this big job in the JobTracker web console, but the console doesn't provide a button to kill it.
Another way is to kill it from the command line. However, I couldn't find any running job from the command line.
I have tried 2 commands:
yarn application -list
mapred job -list
How can I kill a big query like this?
You can get the Job ID from Hive CLI when you run a job or from the Web UI. You can also list the job IDs using the application ID from resource manager. Ideally, you should get everything from
mapred job -list
or
hadoop job -list
Using the job ID, you can kill the job with the following command.
hadoop job -kill <job_id>
Another alternative would be to kill the application using
yarn application -kill <application_id>
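For MapReduce jobs that actually run on YARN, the job ID and the application ID share the same numeric suffix and differ only in the prefix (job_ versus application_), so one can be derived from the other. A small sketch, with hypothetical function names and a made-up ID in the usage note; it does not apply to jobs that run on the old JobTracker, which have no application ID.

```shell
# Hypothetical helpers: convert between the two ID spellings
# by swapping the prefix and keeping the numeric suffix.
job_to_app_id() {
    echo "application_${1#job_}"
}

app_to_job_id() {
    echo "job_${1#application_}"
}
```

For instance, `job_to_app_id job_1400000000000_0001` prints `application_1400000000000_0001`, which is the form that yarn application -kill expects.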
What's the difference between running a jar file with the commands "hadoop jar" and "yarn jar"?
I've used the "hadoop jar" command on my Mac successfully, but I want to be sure that the execution is correct and runs in parallel on my four cores.
Thanks!!!
Short Answer
They are probably identical for you, but even if they aren't, they should both utilize your cluster to the best of its ability.
Longer Answer
The /usr/bin/yarn script sets up the execution environment so that all of the yarn commands can be run. The /usr/bin/hadoop script isn't quite as concerned with YARN-specific functionality. However, if your cluster is set up to use YARN as the default implementation of MapReduce (MRv2), then hadoop jar will probably act the same as yarn jar for a MapReduce job.
Either way you're probably fine, but you can always check the ResourceManager (or JobTracker) web interface to see how your job is distributed across the cluster, whether it's a single-node cluster or not.
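Whether MapReduce is set up to run on YARN is controlled by the mapreduce.framework.name property in mapred-site.xml (the value yarn selects MRv2). Here is a rough sketch for reading that value; framework_name is a hypothetical helper, the location of mapred-site.xml depends on your installation, and the text matching is deliberately naive (it does not handle XML comments or included files).

```shell
# Hypothetical helper: print the value of mapreduce.framework.name
# from the mapred-site.xml file given as $1 (prints nothing if unset).
framework_name() {
    # Join the file into one line so <name> and <value> can be matched together.
    tr '\n' ' ' < "$1" |
        sed -n 's/.*<name>mapreduce\.framework\.name<\/name>[[:space:]]*<value>\([^<]*\)<\/value>.*/\1/p'
}
```

If this prints yarn for your configuration file, hadoop jar and yarn jar should both submit MapReduce jobs to YARN.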