I'm setting up a Quartz driven job in a Spring. The job needs a single argument which is the id of a database record that it can use to locate the data it needs to process.
The sequence is:
Job starts,
locates next available record id,
processes data.
Because the record id is unknown until the job starts, I cannot set it up when I create the job. I also need to account for restarts if things go bad. From reading Quartz doco it appears that if I store the record Id in the trigger's JobDataMap, then when the server restarts, the job will automatically restart with the same record Id it was original started with.
This is where things get tricky, I'm trying to figure out where and when to get the record Id so I can store it in the trigger's JobDataMap. I'm thinking I need to implement a TriggerListener and use it to set the record Id in the JobDataMap when the triggerFired() callback is called. This will involve a call to the database to get the record Id.
I'm not really sure if this approach is the correct one, or whether I'm barking up the wrong tree. Can someone with some quartz experience tell me if this is correct, or if there is a better way to configure a jobs arguments so that they can be dynamically set and a restart will preserve them?
Thanks
Derek
Related
I have a large set of users in my project like 50m.
I should create a playlist for each user every day, for doing this, I'm currently using this method:
I have a column in my users' table that holds the latest time of creating a playlist for that user, and I name it last_playlist_created_at.
I run a query on the users' table and get the top 1000s, that selects the list of users which their last_playlist_created_at is past one day and sort the result in ascending order by last_playlist_created_at
After that, I run a foreach on the result and publish a message for each in my message-broker.
Behind the message-broker, I start around 64 workers to process the messages (create a playlist for the user) and update last_playlist_created_at in the users' table.
If my message-broker messages list was empty, I will repeat these steps (While - Do-While)
I think the processing method is good enough and can be scalable as well,
but the method we use to create the message for each user is not scalable!
How should I do to dispatch a large set of messages for each of my users?
Ok, so my answer is completely based on your comment where you mentioned that you use while(true) to check if the playlist needs to be updated which does not seem so trivial.
Although this is a design question and there are multiple solutions, here's how I would solve it.
First up, think of updating the playlist for a user as a job.
Now, in your case this is a scheduled Job. ie. once a day.
So, use a scheduler to schedule the next job time.
Write a Scheduled Job Handler to push this to a Message Queue. This part is just to handle multiple jobs at the same time where you could control the flow.
Generate the playlist for the user based on the job. Create a Schedule event for the next day.
You could persist Scheduled Job data just to avoid race conditions.
When we start a job via dbms_schedular, it creates a record in *_SCHEDULER_JOBS
Is there a way at the middle or end of a job. I write onto the record
Example:
Job_A spawns Procedure_B
And inside Procedure_B, I can write back to the record in *_scheduler_jobs in Job_A
As procedure_B is looping on a sequence of sub-task e.g. batch_id=1 to 10
and I want to write id=1 and id=10 into the entry for Job_A in *_SCHEDULER_JOBS
-Thanks for all the help.
Question is a bit vague on "I want to write id=1 and id=10 into the entry for Job_A in *_SCHEDULER_JOBS".
From within the procedure, you can use the SYS_CONTEXT('USERENV','BG_JOB_ID') to get the object id of the scheduler job, and look the owner / name up in ALL_OBJECTS. If the job is owned by the same user as the procedure or has been granted appropriate privileges, it can then call DBMS_SCHEDULER.SET_ATTRIBUTE to make changes to the job. Depending what attribute it changes, that will probably only kick in for the next run of the job.
BG_JOB_ID only works if the job is run from the scheduler, and not if someone is doing dbms_scheduler.run_job within their own session. Given multiple sessions could do this concurrently, you probably don't want them all trying to make changes to the job anyway.
I am using Spring batch JDBCCursorItemReader to read set of data from a table. Once data is read spring batch will process each row in a chunk(reader, processor, writer). Now I want to update/delete those records which my reader fetched to avoid reprocessing by another instance of same job. Can someone please tell me how can I do this in reader?
Thanks
Like it has been pointed out this might be a bad design idea. However if your sure this is what you want to do,
create a two step job,
step a, with commit interval as 1
Read the record
Write the updated record with the current job execution id
step b
Read the record where job execution id is current job execution id
process and update as needed
Notes
I do not recommend this approach for reasons stated in the comments
A commit interval of 1 would kill you performance wise so this approach if ever used should be for a low volume job only.
I'm trying to initialize my data in my Azure Data Tables but I only want this to happen once on the server at startup (i.e. via the WebRole Role Entry OnStart routine). The problem is if I have multiple instances starting up at the same time then potentially either one of those instances can add records to the same table at the same time hence duplicating the data at runtime.
Is there there like an overarching routine for all instances? An application object in which I can shove a value into and check it in each of the instances to see if the tables have been created or not? A singleton of some sort that azure exposes?
Cheers
Rob
No, but you could use a Blob lease as a mutex. You could also use a table lock in SQL Azure, if you're using that.
You could also use a Queue, and drop a message in there and then just one role would pick up the message and process it.
You could create a new single instance role that does this job on role start.
To be really paranoid about this and address the event of failure in the middle of writing the data, you can do something even more complex.
A queue message is a great way to ensure transactional capabilities as long as the work you are doing can be idempotent.
Each instance adds a message to a queue.
Each instance polls the queue and on receiving a message
Reads the locking row from the table.
If the ‘create data state’ value is ‘unclaimed’
Attempts to update the row with a ‘in process’ value and a timeout expiration timestamp based on the amount of time needed to create the data.
if the update is successful, the instance owns the task of creating the data
So create the data
update the ‘create data state’ to ‘committed’
delete the message
else if the update is unsuccessful the instance does not own the task
so just delete the message.
Else if the ‘create data’ value is ‘in process’, check if the current time is past the expiration timestamp.
That would imply that the ‘in process’ failed
So try all over again to set the state to ‘in process’, delete the incomplete written rows
And try recreating the data, updating the state and deleting the message
Else if the ‘create data’ value is ‘committed’
Just delete the queue message, since the work has been done
I have a fairly simple domain model involving a list of Facility aggregate roots. Given that I'm using CQRS and an event-bus to handle events raised from the domain, how could you handle validation on sets? For example, say I have the following requirement:
Facility's must have a unique name.
Since I'm using an eventually consistent database on the query side, the data in it is not guaranteed to be accurate at the time the event processesor processes the event.
For example, a FacilityCreatedEvent is in the query database event processing queue waiting to be processed and written into the database. A new CreateFacilityCommand is sent to the domain to be processed. The domain services query the read database to see if there are any other Facility's registered already with that name, but returns false because the CreateNewFacilityEvent has not yet been processed and written to the store. The new CreateFacilityCommand will now succeed and throw up another FacilityCreatedEvent which would blow up when the event processor tries to write it into the database and finds that another Facility already exists with that name.
The solution I went with was to add a System aggregate root that could maintain a list of the current Facility names. When creating a new Facility, I use the System aggregate (only one System as a global object / singleton) as a factory for it. If the given facility name already exists, then it will throw a validation error.
This keeps the validation constraints within the domain and does not rely on the eventually consistent query store.
Three approaches are outlined in Eventual Consistency and Set Validation:
If the problem is rare or not important, deal with it administratively, possibly by sending a notification to an admin.
Dispatch a DuplicateFacilityNameDetected event, which could kick off an automated resolution process.
Maintain a Service that knows about used Facility names, maybe by listening to domain events and maintaining a persistent list of names. Before creating any new Facility, check with this service first.
Also see this related question: Uniqueness validation when using CQRS and Event sourcing
In this case, you may implement a simple CRUD style service that basically does an insert in a Sql table with a primary key constraint.
The insert will only happen once. When duplicate commands with the same value that should only exist one time hits the aggregate, the aggregate calls the service, the service fails the Insert operation due to a violation of the Primary Key constraint, throws an error, the whole process fails and no events are generated, no reporting in the query side, maybe a reporting of the failure in a table for eventual consistency checking where the user can query to know the status of the command processing. To check that, just query again and again the Command Status View Model with the Command Guid.
Obviously, when the command holds a value that does not exists in the table for primary key checking, the operation is a success.
The table of the primary key constraint should be only be used as a service, but, because you implemented Event sourcing, you can replay the events to rebuild the table of primary key constraint.
Because uniqueness check would be done before data writing, so the better method is to build a event-tracking service, which would send a notification when the process finished or terminated.