I want to restart an oozie bundle job that is in a killed state.
The bundle job has one coordinator job.
I accidentally killed the coordinator job instead of killing a particular action id.
Due to this the bundle job is also marked as killed.
I tried -rerun on the individual actions of the coordinator job, but no new actions are getting created.
I tried -rerun on the bundle job as well, but I don't see any change in the bundle job status; it is still showing as KILLED.
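For reference, the rerun attempts looked roughly like this (the Oozie URL, job ids and action numbers are placeholders, and the exact flags may not match what I actually typed):

# coordinator-level rerun of specific action numbers
oozie job -oozie http://<oozie-host>:11000/oozie -rerun <coord-job-id> -action <action-numbers> -refresh
# bundle-level rerun, optionally scoped to one coordinator by name
oozie job -oozie http://<oozie-host>:11000/oozie -rerun <bundle-job-id> -coordinator <coord-name> -refresh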
Please suggest if there is a way to get the bundle job running again.
Thanks
Related
I have a Hadoop cluster running Cloudera's CDH3 (the equivalent of Apache Hadoop 0.20.2). I want to restart the JobTracker because there are some jobs which are not getting killed. I tried killing them from the command line; the command executes successfully, but the jobs are still in "Job Cleanup: Pending" status. In any case, I want to restart the JobTracker and see if that cleans up the jobs. I know the command to restart the JobTracker, but I am not sure if I need to put the NameNode into safe mode before I restart the JobTracker.
You can try to kill the unwanted jobs using hadoop job -kill <Job-ID> and check the command's exit status with echo "$?". If that doesn't work, a restart is the only option.
The Hadoop JobTracker and NameNode are independent components, so there is no need to put the NameNode into safe mode before a JobTracker restart. You can restart the JobTracker process alone (and the TaskTrackers if required).
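A minimal sketch of that sequence, assuming a CDH3 package install (the init script names below are an assumption and depend on how the cluster was installed):

hadoop job -list                    # find the stuck job ids
hadoop job -kill <job-id>           # e.g. a job_* id from the list above
echo "$?"                           # 0 means the kill command itself succeeded

# if the jobs stay in "Job Cleanup: Pending", restart only the JobTracker:
sudo /etc/init.d/hadoop-0.20-jobtracker restart
# and, if required, the TaskTrackers on the worker nodes:
sudo /etc/init.d/hadoop-0.20-tasktracker restart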
Say I have a currently running Oozie bundle containing coordinators A and B.
I do development work on coordinator B and I want to relaunch coordinator B.
Is there any way to relaunch a coordinator inside an Oozie bundle without restarting the bundle itself? I don't want to restart coordinator A.
Otherwise, is there a way to add or remove a coordinator from a currently running Oozie bundle?
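The closest thing I have found so far is the bundle-level rerun that can be scoped to specific coordinators by name, but I am not sure whether it applies cleanly to a bundle that is still running (the job id and coordinator name below are placeholders):

oozie job -oozie http://<oozie-host>:11000/oozie -rerun <bundle-job-id> -coordinator coordinatorB -refresh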
I have a Spring Batch job which reads from, transforms, and writes to an Oracle database. I am running the job via the CommandLineJobRunner utility (using a fat jar plus dependencies generated with the Maven Shade plugin); the job subsequently fails halfway through with a "java heap memory limit reached" error, and it is not marked as FAILED but rather still shows status STARTED.
I tried to re-run the job using the same job parameters (as the docs suggest) but this gives me this error:
5:24:34.147 [main] ERROR o.s.b.c.l.s.CommandLineJobRunner - Job Terminated in error: A job execution for this job is already running: JobInstance: id=1, version=0, Job=[maskTableJob]
org.springframework.batch.core.repository.JobExecutionAlreadyRunningException: A job execution for this job is already running: JobInstance: id=1, version=0, Job=[maskTableJob]
at org.springframework.batch.core.repository.support.SimpleJobRepository.createJobExecution(SimpleJobRepository.java:120) ~[maskng-batch-1.0-SNAPSHOT-executable.jar:1.0-SNAPSH
I have tried all sorts of things (like manually setting the status to FAILED, or using the -restart argument), but to no avail. Is there something I am missing here? I thought one of the strong points of Spring Batch was its ability to restart jobs where they left off.
The first thing you should know is that JobLauncher cannot be used to restart a job which has already run.
The reason you are getting "JobExecutionAlreadyRunningException" is that the job parameters you are passing already belong to a JobInstance in the DB whose last execution is still marked as running, hence the exception.
In Spring Batch, a job can be restarted only if it has completed with "FAILED" or "STOPPED" status.
JobOperator has a restart method which can be used to restart a failed job by passing the JobExecution id of an execution that ended with "FAILED" or "STOPPED" status.
Please note that a job cannot be restarted if it has finished with "COMPLETED" status.
In that case you will have to submit a new job with new job parameters.
If you want to manually set the status of the job to FAILED, run the query below and then restart the job using the JobOperator.restart() method.
update batch_job_execution set status = 'FAILED', version = version + 1 where job_instance_id = <jobId>;
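Since the job is launched via CommandLineJobRunner from a shaded jar, the restart can also be issued from the command line once the execution is marked FAILED. A rough sketch, where the jar and job name come from the question, the configuration class is a placeholder, and the exact argument order may vary by Spring Batch version:

java -cp maskng-batch-1.0-SNAPSHOT-executable.jar \
  org.springframework.batch.core.launch.support.CommandLineJobRunner \
  <your.batch.JobConfiguration> -restart maskTableJob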
Improper handling of transaction management could be one possible reason why your job status is not getting updated to "FAILED". Please make sure your transaction is committed even if the job encounters a runtime exception.
I was executing a few MapReduce programs on the Hadoop cluster. The programs executed successfully and gave the required output.
Using the jps command, I noticed that RunJar was still running as a process. I stopped my cluster, but the process was still up.
I know that hadoop jar invokes RunJar to execute the jar, but is it normal that the process is still up even after job completion?
If yes, then multiple RunJar instances will keep accumulating. How can I make sure that RunJar also stops after job completion (I don't wish to kill the process)?
The RunJar process is normally the result of someone or something running "hadoop jar <jar-file>".
You can kill the process with:
kill 13082
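If the PID is not known in advance, a quick way to find any lingering RunJar processes (a minimal sketch; jps ships with the JDK):

jps -l | grep RunJar                    # list JVMs whose main class is RunJar, with their PIDs
kill <pid>                              # stop the one you no longer need
# or match on the full class name instead:
pkill -f org.apache.hadoop.util.RunJar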
I have configured Spark jobserver to run on YARN.
I am able to send Spark jobs to YARN, but even after a job finishes it does not quit on YARN.
For example:
I tried to create a simple Spark context.
The context is reflected in the jobserver, but YARN is still running the process and not quitting it; I have to manually kill the tasks.
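For reference, the context was created and the job submitted through the usual spark-jobserver REST endpoints, roughly like this (host, port, app name and class path are placeholders):

# create a context named test-context2
curl -d "" 'http://<jobserver-host>:8090/contexts/test-context2?num-cpu-cores=1&memory-per-node=512m'
# run a job inside that context
curl -d "" 'http://<jobserver-host>:8090/jobs?appName=<my-app>&classPath=<my.JobMainClass>&context=test-context2&sync=true'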
(screenshot: YARN job)
(screenshot: Spark context)
The job server shows the contexts, but as soon as I try to run any task in one of them, the job server gives me an error:
{
"status": "ERROR",
"result": "context test-context2 not found"
}
My Spark UI is also not very helpful