I have a Spring Batch job which reads, transforms and writes to an Oracle database. I am running the job via the CommandLineJobRunner utility (using a fat jar plus dependencies generated with the Maven Shade plugin); the job subsequently fails halfway through with a "java heap memory limit reached" error, and it is not marked as FAILED but rather still shows status STARTED.
I tried to re-run the job using the same job parameters (as the docs suggest), but this gives me the following error:
5:24:34.147 [main] ERROR o.s.b.c.l.s.CommandLineJobRunner - Job Terminated in error: A job execution for this job is already running: JobInstance: id=1, version=0, Job=[maskTableJob]
org.springframework.batch.core.repository.JobExecutionAlreadyRunningException: A job execution for this job is already running: JobInstance: id=1, version=0, Job=[maskTableJob]
at org.springframework.batch.core.repository.support.SimpleJobRepository.createJobExecution(SimpleJobRepository.java:120) ~[maskng-batch-1.0-SNAPSHOT-executable.jar:1.0-SNAPSH
I have tried all sorts of things (like manually setting the status to FAILED, or using the -restart argument) but to no avail. Is there something I am missing here? I thought one of the strong points of Spring Batch was its ability to restart jobs where they left off.
The first thing you should know is that JobLauncher cannot be used to restart a job which has already run.
The reason you are getting JobExecutionAlreadyRunningException is that the job parameters you are passing already exist in the job repository database, attached to an execution that is still recorded as running, hence the exception.
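You can see this directly in the job repository metadata: the last execution of that JobInstance is still recorded as STARTED, because the JVM died from the heap error before a final status could be written, so SimpleJobRepository refuses to create a new execution. Below is a minimal sketch of how to confirm this with JobExplorer (the "batch-context.xml" file name is only a placeholder for your own batch configuration, not something from your setup):

import java.util.List;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobInstance;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class InspectStuckExecution {
    public static void main(String[] args) {
        // "batch-context.xml" is a placeholder; point it at the configuration
        // that defines your job repository / job explorer beans.
        ClassPathXmlApplicationContext ctx =
                new ClassPathXmlApplicationContext("batch-context.xml");
        try {
            JobExplorer explorer = ctx.getBean(JobExplorer.class);
            JobInstance instance = explorer.getJobInstance(1L); // id=1 from the error message
            List<JobExecution> executions = explorer.getJobExecutions(instance);
            // The last execution still reports STARTED, which is what blocks the re-run.
            for (JobExecution execution : executions) {
                System.out.println(execution.getId() + " -> " + execution.getStatus());
            }
        } finally {
            ctx.close();
        }
    }
}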
In Spring Batch, a job can be restarted only if it completed with FAILED or STOPPED status.
JobOperator has a restart method which can be used to restart a failed job by passing the JobExecution id of an execution that ended with FAILED or STOPPED status.
Please note that a job cannot be restarted if it has completed with COMPLETED status.
In that case you will have to submit a new job with new job parameters.
If you want to manually set the status of the job to FAILED, run the query below and then restart the job using the JobOperator.restart() method.
update batch_job_execution set status='FAILED', version=version+1 where job_instance_id=jobId;
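Once the execution has been marked FAILED, restarting it through JobOperator might look like the following minimal sketch (assuming a jobOperator bean is wired up with your job repository, job registry and job launcher; the context file name and execution id are placeholders, not taken from your setup):

import org.springframework.batch.core.launch.JobOperator;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class RestartMaskTableJob {
    public static void main(String[] args) throws Exception {
        // "batch-context.xml" is a placeholder for your own batch configuration.
        ClassPathXmlApplicationContext ctx =
                new ClassPathXmlApplicationContext("batch-context.xml");
        try {
            JobOperator jobOperator = ctx.getBean(JobOperator.class);
            long failedExecutionId = 1L; // id of the execution you just marked FAILED
            // Restarts the execution from the last committed point of each step.
            Long newExecutionId = jobOperator.restart(failedExecutionId);
            System.out.println("Restarted as execution " + newExecutionId);
        } finally {
            ctx.close();
        }
    }
}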
Improper transaction management could be one possible reason why your job status is not getting updated to FAILED. Please make sure your transaction completes even if the job encounters a runtime exception.
Related
I want to restart an Oozie bundle job that is in a killed state.
The bundle job has one coordinator job.
I accidentally killed the coordinator job instead of killing a particular action id.
Due to this the bundle job is also marked as killed.
I tried -rerun on the individual actions on the coordinator job, but no new actions are getting created.
I tried -rerun on the bundle job but don't see any change in the bundle job status. It is still showing as killed.
Please suggest if there is a way to get the bundle job running again.
Thanks
I'm encountering a problem with Pig and Oozie.
I have a Pig script that tries to read data from a non-existent table, so an exception is thrown in the initialize method of RecordReader. That is fine and expected, since the table definitely doesn't exist.
The problem starts when such a script is launched via Oozie on a multi-node Hadoop cluster: after the first attempt the job just hangs and does nothing until any other job is submitted to the cluster.
If launched from the command line (pig -f test.pig) it doesn't hang. It also doesn't hang if launched in local mode or on a single-node cluster (via the command line or via Oozie).
I really hope someone has had a problem like this and can help me.
I have changed my Kerberos cluster to unkerberized; after that the exception below occurred while launching MR jobs.
Application application_1458558692044_0001 failed 1 times due to AM Container for appattempt_1458558692044_0001_000001 exited with exitCode: -1000
For more detailed output, check application tracking page:http://hdp1.impetus.co.in:8088/proxy/application_1458558692044_0001/Then, click on links to logs of each attempt.
Diagnostics: Operation not permitted
Failing this attempt. Failing the application.
I was able to continue my work by doing the following:
delete the yarn folder from all the nodes at the location defined by the yarn.nodemanager.local-dirs property,
then restart the YARN processes.
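If you are unsure where those local directories live on each node, one way to check is to read yarn.nodemanager.local-dirs from the cluster configuration. The snippet below is only an illustrative sketch (it prints the directories to clean; it does not delete anything or restart any service):

import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ShowNodeManagerLocalDirs {
    public static void main(String[] args) {
        // Loads yarn-default.xml / yarn-site.xml from the classpath.
        YarnConfiguration conf = new YarnConfiguration();
        // yarn.nodemanager.local-dirs: the directories to wipe on every node
        // before restarting the NodeManagers.
        for (String dir : conf.getTrimmedStrings(YarnConfiguration.NM_LOCAL_DIRS)) {
            System.out.println(dir);
        }
    }
}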
I am new to Oozie and am trying to write an Oozie workflow on CDH 4.1.1. I started the Oozie service and then checked the status using this command:
sudo service oozie status
I got the message:
running
Then I tried this command for checking the status:
oozie admin --oozie http://localhost:11000/oozie status
And I got the below exception:
java.lang.NullPointerException
at java.io.Writer.write(Writer.java:140)
at org.apache.oozie.client.AuthOozieClient.writeAuthToken(AuthOozieClient.java:182)
at org.apache.oozie.client.AuthOozieClient.createConnection(AuthOozieClient.java:137)
at org.apache.oozie.client.OozieClient.validateWSVersion(OozieClient.java:243)
at org.apache.oozie.client.OozieClient.createURL(OozieClient.java:344)
at org.apache.oozie.client.OozieClient.access$000(OozieClient.java:76)
at org.apache.oozie.client.OozieClient$ClientCallable.call(OozieClient.java:410)
at org.apache.oozie.client.OozieClient.getSystemMode(OozieClient.java:1299)
at org.apache.oozie.cli.OozieCLI.adminCommand(OozieCLI.java:1323)
at org.apache.oozie.cli.OozieCLI.processCommand(OozieCLI.java:499)
at org.apache.oozie.cli.OozieCLI.run(OozieCLI.java:466)
at org.apache.oozie.cli.OozieCLI.main(OozieCLI.java:176)
null
Reading the exception stack, I am unable to figure out the reason for this exception. Please let me know why I got this exception and how to resolve this.
Try disabling the USE_AUTH_TOKEN_CACHE_SYS_PROP environment property in your cluster, as suggested by your stack trace and the client code.
Usually clusters are set up with Kerberos-based authentication, which is configured by following the steps here. I'm not sure if you want to do that, but I just wanted to mention it as an FYI.
A Pig script (Tez enabled) embedded in a shell wrapper returns exit code 0 and exits gracefully even when it throws an error.
In a batch process the task is supposed to error out and stop the process, but in this case all the downstream tasks get executed.
Reading the Pig JIRA, this is said to be fixed in 0.8.1, but I am using 0.14 and still have the issue.
Has anyone faced such an issue?