Python Requests - Batch API calls - performance

I am working on an API to push records from a database to an endpoint.
I have managed to write a piece of code that grabs the records, parses them, and pushes them to the API, and it works just fine.
It manages to push around 400 req/min, and I was wondering if I can batch these requests to make it a bit more performant, but I can't wrap my head around how this can be achieved.
The API call url is:
http://call-url.com/service/{id}
Consider the payload for an id:
id = 101
{
  "stars": 3,
  "position": 5,
  "date": "2002-04-30T10:00:00+00:00"
}
url = http://call-url.com/service/101
I am using Python and the requests module to push the records.
At the moment I am grabbing the records, building the payload for each individual id, and pushing them one by one.
I have bumped into asyncio and threading so far, but I need to ensure I don't push the same request twice.
Is it possible to push more than 1 record at once?
Thank you,

You can utilize any AMQP message broker, RabbitMQ for example. The concept is simple; just check the tutorial. Split your script into main.py (reads the DB, prepares the payloads, and pushes them to the queue) and worker.py (gets a payload from the queue and sends it to the API), and then just spawn as many worker.py processes as you need...
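A minimal sketch of that split, using the pika client for RabbitMQ; the queue name "api_jobs", the fetch_records() helper, and the connection details are assumptions for illustration, not part of the original setup:

# main.py -- reads the DB and enqueues one JSON payload per id
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="api_jobs", durable=True)  # survives broker restarts

for record in fetch_records():  # hypothetical: your DB query goes here
    payload = {"id": record["id"], "stars": record["stars"],
               "position": record["position"], "date": record["date"]}
    channel.basic_publish(
        exchange="",
        routing_key="api_jobs",
        body=json.dumps(payload),
        properties=pika.BasicProperties(delivery_mode=2),  # persistent message
    )
connection.close()

# worker.py -- consumes payloads from the queue and pushes them to the API
import json
import pika
import requests

def handle(ch, method, properties, body):
    payload = json.loads(body)
    url = "http://call-url.com/service/{}".format(payload.pop("id"))
    requests.post(url, json=payload, timeout=10)
    ch.basic_ack(delivery_tag=method.delivery_tag)  # ack only after the push succeeds

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="api_jobs", durable=True)
channel.basic_qos(prefetch_count=1)  # hand each worker one message at a time
channel.basic_consume(queue="api_jobs", on_message_callback=handle)
channel.start_consuming()

Since the broker delivers each message to only one consumer, concurrent workers never pick up the same job, which also covers the duplicate-request concern; a message is redelivered only if a worker dies before acknowledging it.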

Related

Spring Batch Remote Chunking Chunk Response

I have implemented Spring Batch remote chunking with Kafka, including both the manager and worker configuration. I want to send some DTO or object in the ChunkResponse from the worker side to the manager and do some processing once I receive the response. Is there any way to achieve this? I also want to know the count of records processed after each chunk is processed on the worker side, and I have to update the database frequently with the count.
I want to send some DTO or object in the ChunkResponse from the worker side to the manager and do some processing once I receive the response. Is there any way to achieve this?
I'm not sure the remote chunking feature was designed to send items from the manager to workers and back again. The ChunkResponse is what the manager expects from workers, and I see no way you can send processed items in it (except perhaps serializing the item in the ChunkResponse#message field, or storing it in the execution context, both of which are bad ideas).
I want to know the count of records processed after each chunk is processed on the worker side, and I have to update the database frequently with the count.
The StepContribution is what you are looking for here. It holds all the counts (read count, write count, etc.). You can get the step contribution from the ChunkResponse on the manager side and do what is required with the result.

WebFlux - handle each item asynchronously before returning

I am fairly new to WebFlux and I am looking for what seems to be a pretty common usage pattern. Basically, I have a Spring controller which returns a Flux<A> (where A is a row fetched from the DB using R2DBC). I want to perform an async operation on each received object (for instance, send a push notification for each object, which also requires a DB call for the user's push token before sending the push). The operations should be done asynchronously, so the API end-users receive their data with no delay. Is there an existing pattern for this?

Do we need complex sending strategies for GCM / FCM?

Currently I'm working on a SaaS with support for multiple tenants that can enable push notifications for their user-bases.
I'm thinking of using a message queue to store all pushes and send them with a separate service. That new service would need to read from the queue and send the push notifications.
My question now is: do I need to come up with a complex sending strategy? I know that GCM has a limit of 1000 devices per request, so this needs to be considered. I also can't wait for x pushes to come in, as this might delay a previous push from being sent. My next thought was to create a global array and fill it with pushes from the queue. A loop would then drain that array every, say, 1 second and send the pushes. This way pushes would get sent for sure and I wouldn't exceed the 1000-device limit.
So, although this might work, I'm not sure an infinite loop is the best way to go. I'm also wondering if GCM/FCM even has a request limit. If not, I wouldn't need to aggregate the pushes in the first place and could ditch the loop, simply firing a request for each push that gets pulled from the queue.
Any enlightenment on this topic or improvement of my prototypical algorithm would be great!
Do I need to come up with a complex sending strategy?
Not really. GCM/FCM is simple enough. Just send the message to the GCM/FCM server and it will queue it on its own, then (as per its behavior) send it as soon as possible.
I know that GCM has a limit of 1000 devices per request, so this needs to be considered.
I think you're misreading the 1000 devices per request limit. The 1000 devices limit refers to the number of registration tokens you add in the list when using the registration_ids parameter:
This parameter specifies a list of devices (registration tokens, or IDs) receiving a multicast message. It must contain at least 1 and at most 1000 registration tokens.
This means you can send the same message payload to at most 1000 devices in a single request (you can then batch the requests, 1000 tokens each, if you need to reach more devices).
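A minimal sketch of that batching in Python, using the legacy FCM HTTP endpoint; the server key and token list are placeholders, and whether you send a notification or data payload depends on your setup:

# split the device tokens into chunks of 1000 and send one request per chunk
import requests

FCM_URL = "https://fcm.googleapis.com/fcm/send"  # legacy HTTP endpoint
SERVER_KEY = "your-server-key"  # placeholder credential

def send_to_all(tokens, payload):
    headers = {"Authorization": "key=" + SERVER_KEY,
               "Content-Type": "application/json"}
    for i in range(0, len(tokens), 1000):  # registration_ids is capped at 1000
        body = {"registration_ids": tokens[i:i + 1000],
                "notification": payload}
        requests.post(FCM_URL, json=body, headers=headers, timeout=10)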
I'm wondering if GCM / FCM even has a request limit?
AFAIK, there is no such limit. Ditch the loop. Whenever you successfully send a message to the GCM/FCM server, it will enqueue the message and keep it until it can be delivered.

Ruby Sockets and parallel event handling

I'm writing a library that can interact with a socket server that transmits data as events in response to certain actions my library sends it.
I created an Actions module that formats the actions so that the server can read them. It also generates an action_id, so that the events parser can match an event with the action that triggered it. More than one event per action is possible.
While I'm sending my action to the server, the event parser keeps receiving data from the server, so they work independently of each other (though they do cooperate: the events response aggregator triggers the action callback).
In my model, I want to get a list of some resource from the server. The server sends its data one line at a time, but that's handled by the events aggregator, so don't worry about that.
Okay, my problem:
In my model I am requesting the resources, but since the events are being parsed in another thread, I need to run an "infinite" loop that checks whether the list is filled, and then break out of it to return the list to the consumer of the model (e.g. my controller).
Is there another (better) way of doing this or am I on the right track? I would love your thoughts :)
Here is my story in code: https://gist.github.com/anonymous/8652934
Check out Ruby EventMachine.
It's designed to simplify this sort of reactor pattern application.
It depends on the implementation. The code you provided doesn't show how the requests and responses are actually processed.
If you know exactly how many responses you're supposed to receive, then on each one you could check whether all of them have arrived, and then execute a specific action, e.g.:
# suppose response_receiver is the method which receives the server response
def response_receiver(data)
  @responses_list << data
  if @responses_list.size == @expected_size
    # all expected responses have arrived -- execute some action
  end
end

What is the right approach for an async work queue with results?

I have a REST server on Heroku. It will have N dynos for the REST service and N dynos for workers.
Essentially, I have some long-running REST requests. When these come in, I want to delegate them to one of the workers and give the client a redirect so it can poll the operation and eventually get its result.
I am going to use Jedis/Redis from RedisToGo for this. As far as I can tell, there are two ways I can do this:
1. Use the PUB/SUB functionality: have the publisher create unique identities for the work results and return these in a redirect URI to the REST client.
2. Essentially the same thing, but use RPUSH/BLPOP instead of PUB/SUB.
I'm not sure what the advantage of #1 is. For example, if I have a task called LongMathOperation, it seems like I can simply have a list for it. The list elements would be JSON objects holding the math operation's arguments as well as a UUID generated by the REST server for where the results should be placed. All the worker dynos would then make blocking BLPOP calls, and the first one there would get the job, process it, and put the result in Redis under the UUID key.
Make sense? So my question is: why would using PUB/SUB be better than this? What does PUB/SUB bring to the table here that I am missing?
Thanks!
I would also use lists, because pub/sub messages are not persistent. If you have no subscribers, then the messages are lost. In other words, if for whatever reason you do not have any workers listening, the client won't get served properly. Lists, on the other hand, are persistent. But pub/sub does not take as much memory as lists, obviously for the same reason: there is nothing to store.
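A minimal sketch of the list-based variant, written in Python with redis-py for brevity (the same commands exist in Jedis); the queue name "math_jobs" and do_long_math_operation() are made up for illustration:

import json
import uuid
import redis

r = redis.Redis()

# REST side: enqueue a job and give the client a key to poll for the result
def enqueue(args):
    result_key = str(uuid.uuid4())
    r.rpush("math_jobs", json.dumps({"args": args, "result_key": result_key}))
    return result_key  # embed this in the redirect URI returned to the client

# worker side: block until a job arrives, process it, store the result
def work_forever():
    while True:
        _queue, raw = r.blpop("math_jobs")  # first idle worker gets the job
        job = json.loads(raw)
        result = do_long_math_operation(job["args"])  # hypothetical long task
        r.set(job["result_key"], json.dumps(result))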
