I have a batch that is scheduled to run every day at 00:05.
On some days, a job in the batch fails immediately with a ModelNotFoundException.
However, the model that was not found does exist.
There were no changes to any of the models concerned (Field, Category, Condition) in the database.
There is also no code in the application that allows the category to be deleted.
Retrying the job manually from the Horizon dashboard makes the job pass.
The DBA said there are no errors in the logs and there was no scheduled maintenance at that time.
What can possibly cause this?
Related
I am using Advanced Scheduler to schedule a cron job to run once daily at 12 PM. But the code executes automatically even though I have disabled the trigger and disconnected my GitHub repository. So far the code has executed three times, at intervals of 2 hours.
I don't understand why it is executing automatically.
Any idea?
My Spring Batch application is running on the PCF platform and is connected to a single MySQL database instance. It runs fine when only one application instance is up and running, but as soon as there is more than one application instance I get the exception org.springframework.dao.DuplicateKeyException. This is probably happening because the same batch job fires at the same time on each instance and tries to insert into the batch instance table with the same job ID. Is there any way to prevent this kind of failure? Put another way, I would like a solution where only one batch job runs at a time even when multiple instances are running.
For me, it is a good sign that a DuplicateKeyException is thrown, because it achieves exactly what you want: Spring Batch already makes sure that the same job execution is not executed in parallel (i.e. only one server instance executes the job successfully while the others fail to execute it).
So I see no harm in your case. If you don't like this exception, you can catch it and re-throw it as an application-level exception saying something like "the job is being executed by another server instance, so skip it", as in the sketch below.
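A minimal sketch of that catch-and-translate idea, assuming the job is launched programmatically through a JobLauncher; the class name and the wrapper exception used here are made up for illustration only:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.dao.DuplicateKeyException;

public class TolerantJobTrigger {

    private final JobLauncher jobLauncher;

    public TolerantJobTrigger(JobLauncher jobLauncher) {
        this.jobLauncher = jobLauncher;
    }

    public void launch(Job job, JobParameters params) {
        try {
            jobLauncher.run(job, params);
        } catch (DuplicateKeyException e) {
            // Another instance already registered this execution in the metadata tables,
            // so treat it as "someone else is running it" rather than as a real failure.
            throw new IllegalStateException(
                    "The job is being executed by another server instance, skipping.", e);
        } catch (Exception e) {
            // Checked launcher exceptions (already running, restart, etc.) wrapped for brevity;
            // a real application would define its own exception type here.
            throw new IllegalStateException("Job launch failed", e);
        }
    }
}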
If you really want only one server instance to try to trigger the job, with the other servers not even attempting it, that is not a Spring Batch problem but a question of how you ensure that only one server node fires the request in a distributed environment. If the batch job is fired as a scheduled task using @Scheduled, you can consider using a distributed lock such as ShedLock to make sure it is executed at most once at the same time, on one node only.
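For example, a rough sketch of the ShedLock approach, assuming ShedLock 4.x with a LockProvider bean (e.g. JdbcTemplateLockProvider) and @EnableSchedulerLock configured elsewhere; the job name, cron expression and lock durations are placeholders, not part of the question:

import net.javacrumbs.shedlock.spring.annotation.SchedulerLock;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class NightlyBatchTrigger {

    private final JobLauncher jobLauncher;
    private final Job nightlyJob; // the Spring Batch job definition to launch

    public NightlyBatchTrigger(JobLauncher jobLauncher, Job nightlyJob) {
        this.jobLauncher = jobLauncher;
        this.nightlyJob = nightlyJob;
    }

    // Only the instance that acquires the lock runs the job; the other instances skip it.
    @Scheduled(cron = "0 0 1 * * *") // example schedule: 01:00 every day
    @SchedulerLock(name = "nightlyJob", lockAtMostFor = "PT30M", lockAtLeastFor = "PT5M")
    public void launchNightlyJob() throws Exception {
        JobParameters params = new JobParametersBuilder()
                .addLong("runAt", System.currentTimeMillis())
                .toJobParameters();
        jobLauncher.run(nightlyJob, params);
    }
}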
Currently we are using Spring Cloud Data Flow to run a sequence of apps we have created, based on a definition. Each of the apps we have made is a Spring Batch job with individual steps. The issue we are having is that when one of the steps inside an app's batch job fails, this is reflected as expected in the step_execution, job_execution, and task_execution tables in the SCDF database. However, we are not able to rerun any failed SCDF job from the top SCDF level, because the row in the step_execution table for SCDF's step related to the overall app never propagates to FAILED in the Status column; it is always COMPLETED no matter what happens. Below I have included a picture which shows what I mean: test-simple8-test-app is the app we have created, while check-step, sleep-step, and should-error-step are steps inside the job for that app. You can see that should-error-step has FAILED for both ExitCode and Status, while the entry for the app itself has COMPLETED for Status and FAILED for ExitCode.
Relevant Table
We have tried altering what we report in the task_execution table, since we saw that CTR looks for certain fields there, but it still seems it does not affect the Status column in step_execution. If we manually change that value in the database to FAILED, it proceeds as we would expect and as is normal for Spring Batch, in that it resumes the job from that app and re-executes it.
Is there a good way to resolve this problem, or is it a problem with the way we are approaching it?
Edit: Added Flow Diagram for better clarity
Problem
Users can submit data to generate a report, which triggers a Spring Batch job. If the same data is submitted (by the same user or another user), the same report should be served without Spring Batch starting a new job, on the premise that the report has already been generated.
To make matters a little more complicated, generated reports expire after 90 days. The idea behind this is that the data gleaned from various web services used to build the report is likely out of date. Therefore, after 90 days the report should be re-generated using new data from those web services.
Questions
When a job has already run, how can I discover the job execution id for that job? This id is used in the URL to uniquely identify a report. JobExplorer is severely limited in querying Spring Batch data.
How can I trigger another instance of the job only after 90 days? The issue is that given duplicate job parameters, a JobInstanceAlreadyCompleteException will be thrown. Must I encode the 90 days as an extra identifying parameter, or is there an easier way? (A rough illustration of this idea is sketched below.)
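One rough way to encode the 90-day window from the second question is to fold it into an identifying job parameter, so a new JobInstance only appears once the window rolls over. This is only a sketch: the parameter names and the calendar-bucket scheme are assumptions, and bucketing by calendar window only approximates "expires 90 days after generation"; tying expiry to the actual generation time needs a business check like the answer below describes.

import java.time.LocalDate;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;

public class ReportJobParameters {

    // Same report data + same 90-day bucket => same JobInstance, so Spring Batch will
    // refuse to run it again; once the bucket value changes, a fresh instance is created.
    public static JobParameters forReport(String reportDataHash) {
        long ninetyDayBucket = LocalDate.now().toEpochDay() / 90;
        return new JobParametersBuilder()
                .addString("reportData", reportDataHash)
                .addLong("window", ninetyDayBucket)
                .toJobParameters();
    }
}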
Cleaning up old jobs, as well as expired reports, must be done using business methods.
With this premise in place, you can try a different path to solve your problem:
Every user launches a different job, with the same report properties but an extra job parameter to make each job different.
The first step checks, using a business method, whether there is a running job for that report; in this case, notify the user that they have to wait or retry later (use a decider; a sketch follows after this list).
The second step checks, using a business method, whether there is a completed, non-expired report; if so, retrieve it and show it to the user (use a decider, as in the previous step).
Generate the report (deleting the old one, if necessary).
Show the report to the user.
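As an illustration of the decider idea in the steps above, a minimal JobExecutionDecider might look like the following sketch; ReportRepository, the "reportKey" parameter and the status names are assumptions standing in for your own business method and flow transitions:

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

public class ExpiredReportDecider implements JobExecutionDecider {

    /** Hypothetical business DAO over the report metadata tables (not the Spring Batch tables). */
    public interface ReportRepository {
        boolean hasValidReport(String reportKey, int maxAgeDays);
    }

    private final ReportRepository reportRepository;

    public ExpiredReportDecider(ReportRepository reportRepository) {
        this.reportRepository = reportRepository;
    }

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        String reportKey = jobExecution.getJobParameters().getString("reportKey");
        // "VALID" routes the flow to the "show report" step, "EXPIRED" to regeneration.
        return reportRepository.hasValidReport(reportKey, 90)
                ? new FlowExecutionStatus("VALID")
                : new FlowExecutionStatus("EXPIRED");
    }
}

The decider would then be wired into the job flow with the usual .on("VALID") / .on("EXPIRED") transitions.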
Of course, the generated-report metadata tables are separate from the Spring Batch tables and should be accessed using a DAO related to your domain context (the report, in your case).
Could this be a valid alternative?
I would like to be able to set all indexes to "Update on schedule" and then have them all update automatically in the background, as Magento says they should. The problem is, this doesn't happen. There is no cron job that automatically reindexes (see this related question).
So, if I have to create my own cron job, how do I do this exactly in an efficient way? I don't want to run "php shell/indexer.php reindexall". That does full rebuilds of index tables. Sure, I could do that nightly, but that means that no changes will be reflected on the frontend until the next day. That's not an acceptable solution. If I run full reindexes throughout the day, I end up with the same problem that I have right now - table locks and slowness due to reindexing while people are working in the admin.
Magento's new "partial reindexing" should fix this right?
This is my understanding of how it works:
I edit an entity that has a related index (e.g. a product).
A database trigger adds a record to related change log tables.
Some process later reads the change log tables and reindexes these specific entities.
Concrete example
I update a value in "catalog_product_entity_varchar".
The database trigger "trg_catalog_product_entity_varchar_after_update" flags this product as changed by inserting a new version into "catalog_product_flat_cl" and "catalogsearch_fulltext_cl".
A partial reindex process reads these change log tables and reindexes only the products mentioned, into "catalog_product_flat" and "catalogsearch_fulltext" respectively.
If this were the case, the reindexing process would be minimal and could be run often, even every minute, to the point where indexing becomes almost unnoticeable to admin users. (I say every minute because Magento tells us this is possible.)
In this release, however, the flat catalog is updated for you — either every minute, or according to your Magento cron job.
Where is this mystical partial reindex? How do I call it instead of reindexing everything?
Is there a reindexPartial()?
The enterprise_refresh_index cron job appears to run this. It runs every time the Magento cron runs. See Enterprise_Index_Model_Observer::refreshIndex().
This is not intended to be run manually because of the need to establish a lock file. It is easiest to just run the cron.php file if you need a manual reindex.
I believe I just have a project-specific issue with this not running.
The partial reindexing is executed through the cron scheduler built into Magento. You do not need to run the actual indexer.php file. Instead, you must set up Magento's built-in cron scheduler based on the documentation.
Documentation: http://www.magentocommerce.com/wiki/groups/227/setting_up_magento_in_cron
You simply execute the cron.php file, which will in turn call the partial reindexing process.
php5-cli -f /home/USERNAME/public_html/cron.php
How it works:
A change to an entity is made and is flagged to be reindexed.
A cron job executes the cron.php file.
Magento checks to see which cron tasks it should run, and runs the partial reindexing process.
The indexing process will see the changed entity and update the index tables with the new values.