Multiple queues vs multiple jobs in resque

I am using resque to background process two types of jobs:
(1) 3rd-party API requests
(2) DB queries and inserts
While the two job types can be processed in parallel with each other, jobs within each type must run in serial order. For example, DB operations need to happen serially, but can execute in parallel with the 3rd-party API requests.
I am contemplating either of the following methods for executing this:
(1) Having two queues, with one queue handling only API requests and the other queue handling only DB queries. Each queue has its own worker.
(2) One single queue but two workers, one for each job type.
I would like to know the difference between the two approaches and which of the two would be the better one to take.

Selecting an architecture here is not straightforward; you have to keep many things in mind.
Having two queues with one queue handling only API requests and the other queue handling only db queries. Each queue will have its own worker.
Ans: You can use this architecture when both queues are roughly equally busy. If one queue has many jobs while the other is empty, its worker will sit idle while jobs wait in the other queue.
You should always aim for full utilisation of your workers.
One single queue but two workers. One worker for each job.
Ans: This approach is what we also use in our project.
All jobs are enqueued to one queue, and multiple workers run against it.
Your workers will always be busy no matter which type of job is present, so proper worker utilisation is possible.
In the end, I would suggest the 2nd approach. Just note that once several generic workers share one queue, two jobs of the same type can run concurrently, so if each type must stay strictly serial you will need a per-type lock or a dedicated worker per type.
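The utilisation argument is language-agnostic. Here is a minimal sketch (in Java, since resque itself is Ruby; the job bodies are hypothetical) of two generic workers draining one shared queue, so neither sits idle while work remains:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SharedQueueDemo {
    public static void main(String[] args) {
        // One shared queue holding both job types (API calls and DB writes).
        BlockingQueue<Runnable> jobs = new LinkedBlockingQueue<>();

        // Two generic workers: each takes whatever job is next, so neither
        // sits idle while the other job type piles up.
        for (int i = 0; i < 2; i++) {
            new Thread(() -> {
                try {
                    while (true) {
                        Runnable job = jobs.take(); // blocks until work arrives
                        job.run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "worker-" + i).start();
        }

        // Enqueue a mix of hypothetical jobs; either worker may pick either type.
        jobs.add(() -> System.out.println("3rd-party API request"));
        jobs.add(() -> System.out.println("DB insert"));
    }
}
```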

Related

Achieve concurrency in Kafka consumers

We are working on parallelising our Kafka consumers to process more records and handle peak load. One thing we already do is spin up as many consumers as there are partitions, within the same consumer group.
Our consumer makes an API call, which is synchronous as of now. We felt that making this API call asynchronous would let our consumer handle more load, so we are trying to make the API call asynchronous and advance the offset in its response callback. However, we are seeing an issue with this:
By making the API call asynchronous, we may get the response for the last record first, while the API calls for earlier records have not yet completed (or even started). If we commit the offset as soon as we receive the response for the last record, the offset moves past all the earlier records. If the consumer then restarts or a partition rebalances, we will never re-receive any record before the committed offset, and we will miss the unprocessed records.
As of now we already have 25 partitions. We would like to understand whether anyone has achieved parallelism without increasing the number of partitions, or whether increasing partitions is the only way (given the offset issues above).
First, you need to decouple (if only at first) the reading of the messages from the processing of those messages. Next, look at how many concurrent calls you can make to your API, as it doesn't make sense to call it more frequently than the server can handle, asynchronously or not. If the number of concurrent API calls is roughly equal to the number of partitions you have in your topic, then it doesn't make sense to call the API asynchronously.
If the number of partitions is significantly less than the max number of possible concurrent API calls, then you have a few choices. You could try to make the max number of concurrent API calls with fewer threads (one per consumer) by calling the API asynchronously as you suggest, or you can create more threads and make your calls synchronously. Of course, then you get into the problem of how your consumers can hand their work off to a greater number of shared threads, but that's exactly what streaming execution platforms like Flink or Storm do for you. Streaming platforms (like Flink) that offer checkpoint processing can also handle your problem of how to handle offset commits when messages are processed out of order. You could roll your own checkpoint processing and your own shared thread management, but you'd have to really want to avoid using a streaming execution platform.
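To make the "don't exceed what the server can handle" point concrete, here is a hedged sketch of a single consumer thread handing records off to a bounded pool, so the pool size, not the partition count, caps the concurrent API calls (callApi and the sizing are placeholders, not anything from your setup):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class BoundedApiCaller {
    private static final int MAX_CONCURRENT_CALLS = 16; // tune to what the API server can handle
    private final ExecutorService pool = Executors.newFixedThreadPool(MAX_CONCURRENT_CALLS);
    private final Semaphore inFlight = new Semaphore(MAX_CONCURRENT_CALLS);

    /** Called from the single consumer thread for each record's payload. */
    public void submit(String payload) throws InterruptedException {
        inFlight.acquire();              // back-pressure: blocks the consumer if the API is saturated
        pool.execute(() -> {
            try {
                callApi(payload);        // hypothetical synchronous API call
            } finally {
                inFlight.release();
            }
        });
    }

    private void callApi(String payload) { /* your HTTP client here */ }
}
```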
Finally, you might have more consumers than max possible concurrent API calls, but then I'd suggest that you just have fewer consumers and share partitions, not API calling threads.
And, of course, you can always change the number of your topic partitions to make your preferred option above more feasible.
Either way, to answer your specific question, you want to look at how Flink does checkpoint processing with Kafka offset commits. To oversimplify (because I don't think you want to roll your own), the Kafka consumers have to remember not only the offsets they just committed, but also the previously committed offsets, and that defines a block of messages flowing through your application. Either that block of messages is processed all the way through in its entirety, or you need to roll back the processing state of each thread to the point where the last message in the previous block was processed. Again, that's a major oversimplification, but that's kinda how it's done.
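If you do decide to roll your own, the usual bookkeeping is to commit only the longest contiguous prefix of processed offsets per partition. A minimal sketch, assuming markProcessed is invoked from your async completion callback (the Kafka client types are real; seeding of the starting offset is simplified):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ContiguousOffsetTracker {
    // Per partition: offsets whose processing has completed, kept sorted.
    private final Map<TopicPartition, TreeSet<Long>> done = new ConcurrentHashMap<>();
    // Per partition: the next offset we have not yet committed.
    // In real code, seed this from the first polled offset per partition.
    private final Map<TopicPartition, Long> nextToCommit = new ConcurrentHashMap<>();

    /** Call from the async completion callback of each record. */
    public synchronized void markProcessed(TopicPartition tp, long offset) {
        done.computeIfAbsent(tp, k -> new TreeSet<>()).add(offset);
    }

    /** Commit the longest contiguous prefix of processed offsets.
     *  Must run on the polling thread: KafkaConsumer is not thread-safe. */
    public synchronized void commitSafe(KafkaConsumer<?, ?> consumer) {
        Map<TopicPartition, OffsetAndMetadata> commits = new HashMap<>();
        for (Map.Entry<TopicPartition, TreeSet<Long>> e : done.entrySet()) {
            long start = nextToCommit.getOrDefault(e.getKey(), 0L);
            long next = start;
            TreeSet<Long> offsets = e.getValue();
            while (offsets.contains(next)) {   // walk the contiguous prefix
                offsets.remove(next);
                next++;
            }
            if (next > start) {
                nextToCommit.put(e.getKey(), next);
                // By Kafka convention, the committed offset is the next one to read.
                commits.put(e.getKey(), new OffsetAndMetadata(next));
            }
        }
        if (!commits.isEmpty()) consumer.commitSync(commits);
    }
}
```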
You should look at Kafka batch processing. In a nutshell: you can set up large batches (batch.size is a producer setting; the consumer-side analogues are fetch.min.bytes and max.poll.records) with a small number of partitions, or even a single one. Once a whole batch of messages has been consumed on the consumer side (i.e. is in RAM), you can parallelize processing of those messages any way you want.
I would really like to share links, but there are far too many of them across the web.
UPDATE
In terms of committing offsets, you can do this for the whole batch.
In general, Kafka doesn't reach its target performance by inflating the partition count, but rather by relying on batch processing.
I have already seen a lot of projects suffer from over-scaled partitions (you may only see the issues later, during rebalancing for example). The rule of thumb: look at every available batch setting first.
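As a hedged illustration of that batch-first approach: a consumer tuned to pull large batches, fan each batch out in memory, and commit once per batch (broker address, topic name, and the exact sizing values are placeholders):

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class BatchingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("group.id", "batch-demo");
        props.put("enable.auto.commit", "false");          // commit manually, per batch
        props.put("max.poll.records", "1000");             // return big in-memory batches
        props.put("fetch.min.bytes", "1048576");           // let the broker accumulate ~1 MB
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic"));       // assumed topic name
            while (true) {
                ConsumerRecords<String, String> batch = consumer.poll(Duration.ofSeconds(1));
                if (batch.isEmpty()) continue;

                // The whole batch is in RAM now: parallelize it any way you want.
                List<ConsumerRecord<String, String>> records = new ArrayList<>();
                batch.forEach(records::add);
                records.parallelStream().forEach(r -> process(r.value()));

                consumer.commitSync(); // commit once, for the whole batch
            }
        }
    }

    private static void process(String value) { /* your per-record work */ }
}
```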

Multiple consumers per kinesis shard

I read you can have multiple consumer apps per kinesis stream.
http://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-kcl.html
However, I heard you can only have one consumer per shard. Is this true? I can't find any documentation to support it, and can't imagine how that could be if multiple consumers are reading from the same stream. Surely it doesn't mean the producer needs to repeat content across different shards for different consumers.
The Kinesis Client Library starts threads in the background, each listening to one shard in the stream. You cannot connect to a shard from multiple threads; that is by design.
http://docs.aws.amazon.com/kinesis/latest/dev/kinesis-record-processor-scaling.html
For example, if your application is running on one EC2 instance, and is processing one Amazon Kinesis stream that has four shards. This one instance has one KCL worker and four record processors (one record processor for every shard). These four record processors run in parallel within the same process.
In the explanation above, the term "KCL worker" refers to a Kinesis consumer application, not to its threads.
But below, the same "KCL worker" term refers to a "Worker" thread in the application, which is a Runnable.
Typically, when you use the KCL, you should ensure that the number of instances does not exceed the number of shards (except for failure standby purposes). Each shard is processed by exactly one KCL worker and has exactly one corresponding record processor, so you never need multiple instances to process one shard.
See the Worker.java class in KCL source.
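To make "exactly one record processor per shard" concrete, here is a minimal sketch against the KCL 1.x IRecordProcessor interface (package paths and signatures from memory; verify them against the KCL version you use). The Worker instantiates one of these per shard it owns:

```java
import java.util.List;

import com.amazonaws.services.kinesis.clientlibrary.interfaces.IRecordProcessor;
import com.amazonaws.services.kinesis.clientlibrary.interfaces.IRecordProcessorCheckpointer;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShutdownReason;
import com.amazonaws.services.kinesis.model.Record;

public class ShardProcessor implements IRecordProcessor {
    private String shardId;

    @Override
    public void initialize(String shardId) {
        this.shardId = shardId; // one instance per shard, so this stays stable
    }

    @Override
    public void processRecords(List<Record> records, IRecordProcessorCheckpointer checkpointer) {
        for (Record r : records) {
            // r.getData() is a ByteBuffer holding the payload
        }
        try {
            checkpointer.checkpoint(); // record progress for this shard
        } catch (Exception e) {
            // the KCL throws specific checkpoint exceptions; log/handle appropriately
        }
    }

    @Override
    public void shutdown(IRecordProcessorCheckpointer checkpointer, ShutdownReason reason) {
        if (reason == ShutdownReason.TERMINATE) {
            try { checkpointer.checkpoint(); } catch (Exception e) { /* log */ }
        }
    }
}
```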
Late to the party, but the answer is that you can have multiple consumers per kinesis shard. A KCL instance will only start one process per shard, but you can have another KCL instance consuming the same stream (and shard), assuming the second one has permission.
There are limits, though, as laid out in the docs, including:
Each shard can support up to 5 transactions per second for reads, up to a maximum total data read rate of 2 MB per second.
If you want a stream with multiple consumers where each message will be processed once, you're probably better off with something like Amazon Simple Queue Service.
To keep it simple, you can have multiple different Lambda functions triggered on the Kinesis data; that way both of your Lambdas get all the data from the stream. The downside is that you will then have to increase throughput at the Kinesis level, which is going to be pricey. Use SQS instead for your use case.

Is out of order command queue useful on AMD GPU?

It seems to me that one OpenCL command queue won't dispatch commands to more than one hardware queue. So commands in an out-of-order command queue are still executed one by one, just not necessarily in the order they were enqueued?
So if I want to make use of multiple hardware queues, all I can do is create multiple OpenCL command queues?
OOO (out-of-order) queues exist to serve user-event dependencies. With a single in-order queue, this type of application can end up blocked, waiting on a user event that never comes; and creating one queue per job is also suboptimal.
If you want parallelism in the execution, OOO is NOT what you need; multiple queues are.
A common approach is to use one queue for IO and one queue for running kernels.
But you can also use a queue per thread in a multi-threaded processing scheme. The IO of each thread will overlap the execution of the other threads.
NOTE: nVIDIA does support parallel execution of jobs in a single queue, but that is outside the standard.

Suggestion for Oracle AQ dequeue approach

I need to dequeue messages coming from an Oracle queue on a continuous basis.
As far as I can tell, we can dequeue messages in two ways: either through the asynchronous auto-notification approach, or by a manual polling process where one dequeues one message at a time.
I can't go for the asynchronous notification feature because the number of messages received can reach 1000 within 5 minutes during peak hours, and I do not want to overload the database by spawning multiple callback procedures in the background.
With the manual polling process, I can create a one-time scheduler job that runs 24*7 and calls a stored procedure that dequeues messages in a loop in WAIT mode (in effect listening for messages).
The problems with this approach are that
1) the scheduler job runs continuously and occupies one permanent job slot, and
2) the stored procedure never exits, since it runs in a loop waiting for messages.
Are there any alternative or better solutions where I do not need a job or procedure running continuously, looking for messages?
Could I use the auto-notification approach to get notified of the very first message, unsubscribe the subscriber, dequeue the remaining messages manually, and subscribe to the queue again once there are no more messages? Is this a safe approach, and would I lose any messages between unsubscribing and re-subscribing?
BTW, we use an Oracle 10gR2 database, so I can't use the PURGE ON NOTIFICATION option.
Appreciate your expert solutions!
You're right, it's not a good idea to use auto-notification for a high-volume queue.
At one client I've seen a one-time scheduler job that runs 24*7. It seems to work reasonably well, and they can enqueue a special "STOP" message (which goes to the top of the queue) that the procedure listens for and uses to stop processing messages.
However, generally I'd lean towards a job that runs regularly (e.g. once per minute, or whatever granularity is suitable for you) which would dequeue all the messages. I'd put the dequeue in a loop with a loop counter and a "maximum messages" limiter based on the maximum number of messages you'd expect in a 1-minute period. The job would keep processing messages until (a) there are no more messages in the queue, or (b) the maximum limit has been reached.
You can then set the schedule for the job based on the maximum delay you want to see between an enqueue and a dequeue. E.g. if it doesn't matter if a message isn't processed within 5 minutes, you could set the job to run once every 5 minutes.
The maximum limit needs to be quite a high figure - e.g. 10x or 100x the expected maximum number - otherwise a spike could flood your queue and it might not keep up. The idea of the maximum limit is to ensure that the job never runs forever. This should give ops enough time to detect a problem with the queue (e.g. if some rogue process is flooding the queue with bogus messages).
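A sketch of that bounded drain loop, assuming you reach the queue through AQ's JMS interface rather than PL/SQL (the connection and consumer setup are elided; the same loop-plus-limit shape applies inside a stored procedure using DBMS_AQ.DEQUEUE with a short wait):

```java
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageConsumer;

public class QueueDrainer {

    /**
     * Dequeue until the queue is empty or maxMessages is hit, whichever
     * comes first, so the scheduled job can never run forever.
     */
    public int drain(MessageConsumer consumer, int maxMessages, long waitMillis)
            throws JMSException {
        int processed = 0;
        while (processed < maxMessages) {
            Message msg = consumer.receive(waitMillis); // returns null once the queue is empty
            if (msg == null) break;
            handle(msg);                                // your per-message processing
            processed++;
        }
        return processed; // the caller (the scheduled job) simply exits afterwards
    }

    private void handle(Message msg) { /* ... */ }
}
```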

Storm as a replacement for Multi-threaded Consumer/Producer approach to process high volumes?

We have an existing setup where upstream systems send messages to us on a message queue and we process them. The content is XML and we simply unmarshal it. This unmarshalling step is followed by a write to the DB (putting the relevant values into the relevant columns).
The system is set to interface with many more upstream systems, and our volumes are going to increase to a peak size of 40mm per day.
Our current way of processing is to have listeners on the queues and then multiple threads of producers and consumers doing the unmarshalling and the subsequent DB write.
My question: can this process fit the Storm use case scenario?
I mean, can MQ be my spout, with two bolts: one to unmarshal, which then feeds the next bolt that does the write to the DB?
If yes, what benefit would I derive? Is it goodbye to the cumbersome multi-threaded producer/worker pattern of code?
And if it is as simple as the above, where/why would one still resort to the conventional multi-threaded producer/consumer approach?
My point being: is there a data volume/frequency at which Storm starts to shine compared to the conventional approach?
PS: I'm very new to this, trying to get the hang of it, and want to ascertain whether this line of thinking is right.
Regards,
CVM
Definitely, this scenario can fit into a Storm topology. The spouts can pull from MQ, and the bolts can handle the unmarshalling and subsequent processing.
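For illustration, a minimal wiring sketch; MqSpout, UnmarshalBolt and DbWriterBolt are hypothetical classes you would implement against your MQ client, your unmarshaller and your DB driver, while the builder API is Storm's own:

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class MqToDbTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // MqSpout, UnmarshalBolt and DbWriterBolt are placeholders for your
        // own implementations (BaseRichSpout / BaseRichBolt subclasses).
        builder.setSpout("mq-spout", new MqSpout(), 2);
        builder.setBolt("unmarshal", new UnmarshalBolt(), 8)
               .shuffleGrouping("mq-spout");
        builder.setBolt("db-writer", new DbWriterBolt(), 8)
               .shuffleGrouping("unmarshal");

        Config conf = new Config();
        conf.setNumWorkers(2); // scale out by raising parallelism and worker count
        StormSubmitter.submitTopology("mq-to-db", conf, builder.createTopology());
    }
}
```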
The major benefit over the conventional multi-threaded pattern is the ability to add more worker nodes as the load increases; this is not so easy with traditional producer/consumer patterns.
A specific data volume number is a very broad question, since it depends on a large number of factors, hardware among them.
