I am new to the RabbitMQ fanout exchange. I have used it to fetch and update a large number of records in a database. I have 4 servers up and running.
I am able to fetch and update records from multiple servers, but I am facing the following issue:
When I call the API, let's say there are 2000 records and I am fetching them in batches of 100.
Suppose I want to change the status from created to active.
What happens is that the first server takes the first 100 records and updates the status from created to active for each record.
In parallel, the second, third and fourth servers each fetch 100 records.
When another server picks up the same entry, it does not wait for the first server's update-and-save process to finish.
The other 3 servers still see the status as created, because the first server's update has not been saved yet.
Example:
id: 1, status: created
id: 2, status: created
First server (checks that the status is created, which is true): fetches id 1 (update status to active > save > show on UI).
At the same time,
the second server picks up the same record, checks the status for id 1,
and sees created, because the first server is not finished with id 1 yet.
That is why it fetches the same record again (id 1 > fetch record > update status > save > show on UI).
Please help me solve this issue.
You're fighting yourself here by using the fanout. If you want to broadcast requests to all workers simultaneously, that's what fanout can do, but if you want the workers to pool together and process each task once, they can all service a single queue.
Try attaching all the workers to a single queue, and inject into that queue as necessary. Each worker will pull out jobs from the queue, and only one worker will be assigned to each job.
The fanout model is used for situations where all workers must be notified of something, like a configuration change, a message they might be interested in, etc.
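A minimal sketch of that single-queue (competing consumers) setup with the Python pika client; the queue name, connection details, and message format are assumptions:

```python
import pika

# Every worker runs this same script and attaches to the same named queue;
# RabbitMQ then delivers each message to exactly one of the consumers.
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="record_updates", durable=True)

# Give each worker only one unacknowledged message at a time, so a busy
# worker does not hoard a backlog while others sit idle.
channel.basic_qos(prefetch_count=1)

def handle(ch, method, properties, body):
    record_id = body.decode()
    # ... update this record's status from created to active here ...
    ch.basic_ack(delivery_tag=method.delivery_tag)  # ack only after the update succeeds

channel.basic_consume(queue="record_updates", on_message_callback=handle)
channel.start_consuming()
```

Because each record id is a separate message delivered to exactly one consumer, two servers can never pick up id 1 at the same time.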
Let’s say I have a microservice that needs to generate millions of reports with even more rows of data.
Business rules:
One client generates 0 to many reports on a single run
Many clients can be generating reports in a single run
Any request to generate a report for a client that is currently processing should throw an error
The reports are generated on a schedule.
The schedule is stored in the database of the microservice (a) for each client. The schedule is managed by a separate microservice (b) and the data is replicated via integration events to microservice a.
Ex:
Client A, Schedule = today
Client B, Schedule = 3 days from now
Only client A will have a report generated.
Now, let’s say the microservice gets a request to generate all reports for clients configured to generate today. Since it has to generate millions of reports, we want it to horizontally scale.
However, I’m having a hard time identifying a great way to do this. Some ideas:
Only let one instance of microservice (a) retrieve the clients that need to generate today; this can be polled so that if that instance fails, another can pick it up. Insert this data into a shared cache, or into a topic or queue, that all other instances will process from, and scale based on the number of messages in the topic (see the sketch after this list).
Let another microservice (b) make the request for generation and pass each request into a topic or queue that microservice (a) reads from. However, this introduces a dependency between the services and can cause some data-ownership ambiguities.
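For the first idea, the "only one instance retrieves the clients" step can be done with an atomic set-if-absent on the shared cache. A rough sketch with redis-py; the key name, TTL, and instance id are assumptions:

```python
import redis

r = redis.Redis()

def try_become_dispatcher(instance_id: str) -> bool:
    # SET ... NX EX succeeds for exactly one caller; the TTL lets another
    # instance take over if the current holder dies mid-run.
    return bool(r.set("report:dispatch-lock", instance_id, nx=True, ex=120))

if try_become_dispatcher("instance-1"):
    # The winner enumerates the clients scheduled for today and publishes
    # one message per report to the shared topic/queue; every instance
    # (including this one) consumes from that queue and scales on its depth.
    ...
```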
I'm new to jobs and queues.
At the moment, I'm only really using the ->later() method on a Mail. This places each mail on the default queue.
There are instances where I need to cancel jobs on the queue related to a specific model ID. I don't really see any reference to deleting pending jobs in the queue, only to deleting/clearing failed jobs.
In Telescope, there are tags showing the Model IDs associated with each pending job.
There are a few things I was hoping to do:
Delete all jobs associated with a specific model ID
Listen for the execution of a job based on a specific model ID, so I may update the database table with the date/timestamp of when the job actually executed. (users can queue emails to send hours in advance and I'd like to log when their customer actually receives the email)
Remove the record associated with the job, since it should not exist if the email didn't actually get sent.
Hoping for some advice on how to solve this problem of needing to manage jobs in this fashion.
I'm using Redis if that makes any difference.
I'm a little confused about how the server.connection-timeout property works in a Spring Boot REST API project.
I have a Spring Boot REST API project with a delete endpoint that performs a couple of delete operations on database tables. For example, this delete API deletes rows from 3 tables, as follows.
The delete API takes a customer Id as input and executes the following:
Delete all records matching the customer Id in Table A (delete call to an external DB)
Delete all records matching the customer Id in Table B (delete call to an external DB)
Delete all records matching the customer Id in Table C (delete call to an external DB)
My question is: if I set server.connection-timeout to 5 seconds, what does it actually mean?
I have two assumptions:
The delete REST API will time out in 5 seconds, meaning all 3 external DB calls have to complete within 5 seconds; if not, the REST API times out.
Each external DB call has a 5-second timeout, in this case 15 seconds in total.
In the worst case, if each of the 3 external DB calls takes 4 seconds, the delete API will take 12 seconds to respond. Is this a valid assumption?
I think you are confusing things. server.connection-timeout is the time the connector waits for another HTTP request before closing the connection.
It doesn't matter how much time it takes to complete the request.
In your case, if server.connection-timeout is 5 seconds, it will not affect deletes #1, #2 or #3 that you mentioned.
In simple terms, connection-timeout does not apply to long-running requests. Instead, it applies to the initial connection, when the server waits for the client to request something.
Default: the connector’s container-specific default is used. Use a value of -1 to indicate infinite timeout.
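In Spring Boot 2.x that would look like the following in application.properties (the 5-second value is just the figure from the question; newer Boot versions replaced this property with server-specific ones such as server.tomcat.connection-timeout):

```properties
# Time the connector waits for the next request on an idle (keep-alive)
# connection before closing it -- not a limit on request-processing time,
# so it puts no bound on the three DB deletes.
server.connection-timeout=5s
```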
I have a unique problem and am trying to see what the best implementation for it would be.
I have a table with half a million rows. Each row represents a business entity. I need to fetch information about each entity from the internet and update it back on the table asynchronously. (This process takes about 2 to 3 minutes.)
I cannot get all these rows updated efficiently with 1 instance of the microservice, so I am planning to scale up to multiple instances.
Each microservice instance is an async daemon that fetches one business entity at a time, processes the data, and finally updates the data back to the table.
Here is my problem: with multiple instances, how do I ensure that no 2 instances work on the same business entity (the same row) during the update process? I want an optimal solution, preferably without having to maintain any state at the application layer.
You have to use an external system (Database/Cache) to save information about each instance.
Example: ShedLock. It creates a table or document in the database where it stores information about the current locks.
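ShedLock itself is a Java library, but the underlying idea is easy to sketch in any language: an instance only proceeds if it can atomically claim a lock row. A rough Python/SQL sketch; the locks table, its columns, and the DB-API driver are assumptions:

```python
import datetime

def try_acquire(conn, lock_name, instance_id, ttl_seconds=300):
    """Claim `lock_name` if it is free or expired; return True on success.

    Assumes a row for `lock_name` was inserted into `locks` up front.
    """
    now = datetime.datetime.utcnow()
    until = now + datetime.timedelta(seconds=ttl_seconds)
    with conn.cursor() as cur:
        # The WHERE clause makes the claim atomic: if two instances race,
        # only one UPDATE can match the free/expired row.
        cur.execute(
            "UPDATE locks SET locked_by = %s, locked_until = %s "
            "WHERE name = %s AND locked_until < %s",
            (instance_id, until, lock_name, now),
        )
        conn.commit()
        return cur.rowcount == 1
```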
I would suggest you use a worker queue, which looks like a perfect fit for your problem. Just load the data (or the ids of the data) into the queue once, then let the consumers consume it.
You can see a clear explanation here:
https://www.rabbitmq.com/tutorials/tutorial-two-python.html
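The producer side of that tutorial, sketched with pika: load every entity id into the queue once, then run as many consumer instances as you need. The queue name and the id-fetching helper are assumptions:

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="entities", durable=True)

# fetch_all_entity_ids() is a hypothetical helper, e.g. SELECT id FROM entities.
for entity_id in fetch_all_entity_ids():
    channel.basic_publish(
        exchange="",               # default exchange routes by queue name
        routing_key="entities",
        body=str(entity_id),
        properties=pika.BasicProperties(delivery_mode=2),  # survive broker restart
    )
connection.close()
```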
I have a Ruby daemon that selects 100 records from the database and performs a task with them.
To make it faster, I usually create 3 instances of the same daemon, and each one selects different data using MySQL LIMIT and OFFSET.
The problem is that sometimes a task is performed 2 or 3 times on the same record.
So I think that relying only on LIMIT and OFFSET is not enough, since 2 or more daemons can sometimes collect the same data at the same time.
How can I do it safely, so that no 2 instances select the same data?
Daemon 1 => selects records from 1 to 100
Daemon 2 => selects records from 101 to 200
Daemon 3 => selects records from 201 to 300
Rather than rolling your own solution, you might want to look at existing solutions for processing background jobs like Resque (a personal favorite). With Resque, you would queue a job for each of your rows using a trigger that makes sense in your application (it's hard to say without any context) for example a link on your website. At all times you would keep X number of workers running (three in your case) and Resque will do the queue management work for you. Resque uses Redis as a backend, so it supports atomic push/pop out of the gate (no more double-processing).
Resque also comes with a very intuitive and easy to use web interface for monitoring your jobs and workers.
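The atomic pop is the part that removes the double processing: two workers can never receive the same element. A minimal illustration of the idea with redis-py (the list key is an assumption, and Resque's real internals are more involved):

```python
import redis

r = redis.Redis()

# Producer: push each record id onto the list once.
for record_id in range(1, 301):
    r.rpush("jobs", record_id)

# Worker loop (run one per daemon): BLPOP removes and returns the head of
# the list atomically, so even with three daemons running, each id is
# handed to exactly one of them.
while True:
    item = r.blpop("jobs", timeout=5)
    if item is None:
        break  # queue drained
    _key, record_id = item
    # ... perform the task for record_id ...
```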