Process a stream of sessions on AWS - aws-lambda

Is there a way to implement something like Flink's session window on AWS with Lambda and some way of managing messages?
We have a stream of small events, each carrying a session id. We cannot guarantee the order in which events arrive, and we don't always have a session-finished event. We know that session ids are unique. We also know that once a session is finished it won't be restarted, and that while a session is active we will receive a message every minute or so. We need to process the entire session as a whole.
We want to wait for a quiet period of X minutes, and if no messages arrive in that time we will process the entire session as a whole.
This is exactly what Flink's session window (with an inactivity gap) does. Is there a way to do the same thing purely using AWS Lambda and its triggers?
There can be tens of millions of sessions active at the same time.

It's not possible with AWS Lambda alone.
Lambdas are stateless: they can process messages one by one, but they cannot maintain state across a sequence of messages, which is what the kind of windowing logic you describe requires.
Maybe an option for you would be Kinesis Data Analytics? Under the hood it is actually Flink, provided as a managed service by AWS, so maybe you'll get the "lambda-like" experience you're looking for there.


What is the best way to share events between Google Cloud Run containers

I have a service which is running on many Cloud Run containers.
When a single container (A) receives a web request to do some work, I need all the other live containers to fetch some updated data from Elasticsearch.
I would have expected ES to have a "listening" type of connection, such as Firebase offers, but this is not possible.
Right now I am having to poll the database from each service.
Is there a better way to achieve this sort of cross-container sync when using Cloud Run? Would Pub/Sub be the best solution here?
It's unusual but not impossible to achieve.
First of all, you have to understand the instance lifecycle: the CPU is allocated only while a request is being processed. Otherwise, the CPU is throttled (below 5%). That's also why you pay only while your instance is processing requests, and not while the instance is kept warm (and offloaded after a while).
That being said, it's useless and inefficient to try to update instances in the background when no request is being processed.
Therefore, the idea is to do the work when the instance receives a request. The downside is that this solution increases request latency (the instance first syncs its cache and then processes the request).
Concretely, the solution is to store, somewhere central, the date of the latest cache update, and to keep that same information in each instance. When the instance receives a request, the first thing it does is compare its own cache date with the central data date.
If they are the same, there's nothing to do; continue the processing.
If the central data date is newer than the instance's date, update the instance data and then process the request.
You can store the data, and the date of that data, in Firestore for instance, or in Memorystore, or in any other database.
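A minimal sketch of that check in Go, assuming Memorystore (Redis) accessed via the go-redis client; the key name "data:updated_at" and the refreshFromElasticsearch helper are hypothetical:

    package main

    import (
        "context"
        "net/http"
        "sync"

        "github.com/redis/go-redis/v9"
    )

    var (
        rdb       = redis.NewClient(&redis.Options{Addr: "10.0.0.3:6379"}) // Memorystore address (assumption)
        mu        sync.Mutex
        localDate string // date of the data currently cached in this instance
    )

    func handler(w http.ResponseWriter, r *http.Request) {
        ctx := r.Context()

        // 1. Read the central "last updated" marker.
        centralDate, err := rdb.Get(ctx, "data:updated_at").Result()
        if err != nil && err != redis.Nil {
            http.Error(w, "cache check failed", http.StatusInternalServerError)
            return
        }

        // 2. Refresh the local cache only if the central marker changed.
        mu.Lock()
        if centralDate != localDate {
            refreshFromElasticsearch(ctx) // hypothetical: reload whatever this service caches
            localDate = centralDate
        }
        mu.Unlock()

        // 3. Process the request with up-to-date data.
        w.Write([]byte("ok"))
    }

    // refreshFromElasticsearch stands in for the real Elasticsearch query.
    func refreshFromElasticsearch(ctx context.Context) {}

    func main() {
        http.HandleFunc("/", handler)
        http.ListenAndServe(":8080", nil)
    }

This way each instance only goes back to Elasticsearch when the central marker has actually changed.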
Pub/Sub can also be a solution, but it's more complex to implement. Each instance, when it starts, has to create its own pull subscription on a topic, and when the instance is killed you have to delete that subscription.
Then, when a request comes in, your instance has to pull the subscription, get the messages, if any, and update its local cache.
It could be faster than the previous solution, but it's harder to implement; a sketch of the pull side follows.
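A rough sketch of that per-instance subscription idea, assuming the cloud.google.com/go/pubsub client; the project ID, topic name, and subscription naming are made up for illustration:

    package main

    import (
        "context"
        "fmt"
        "os"
        "time"

        "cloud.google.com/go/pubsub"
    )

    func main() {
        ctx := context.Background()
        client, err := pubsub.NewClient(ctx, "my-project") // hypothetical project ID
        if err != nil {
            panic(err)
        }

        // At startup: one pull subscription per instance (the identity used here is illustrative).
        subID := fmt.Sprintf("cache-invalidation-%s-%d", os.Getenv("K_REVISION"), os.Getpid())
        sub, err := client.CreateSubscription(ctx, subID, pubsub.SubscriptionConfig{
            Topic:            client.Topic("cache-invalidation"), // hypothetical topic
            ExpirationPolicy: 24 * time.Hour,                     // let dead instances' subscriptions expire
        })
        if err != nil {
            panic(err)
        }

        // On each incoming request: briefly drain pending messages, then serve.
        pullCtx, cancel := context.WithTimeout(ctx, 500*time.Millisecond)
        defer cancel()
        _ = sub.Receive(pullCtx, func(ctx context.Context, m *pubsub.Message) {
            // A message means the data changed: refresh the local cache here.
            m.Ack()
        })
    }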

Should we store Events in a database? (Event Driven Design)

We have several services that publish and subscribe to Domain Events. What we usually do is log events whenever we publish and whenever we process them. We basically use this to apply the choreography pattern.
We are not doing Event Sourcing in these systems, and there's no programmatic use for the events after publishing/processing. That's the main reason we opted not to store them in a durable container, like a database or an event store.
Question is, are we missing some fundamental thing by doing this?
Is storing Events a must?
I consider queued messages as system messages, even if they represent some domain event in an event-driven architecture (pub/sub messaging).
There is absolutely no hard-and-fast rule about their storage. If you would like to keep them around you could have your messaging mechanism forward them to some auditing endpoint for storage and then remove them after some time (if necessary).
You are not missing anything fundamental by not storing them.
You're definitely not missing out on anything (but there is a catch), especially if the business has no need for it. An event-sourced system would definitely store all the events generated by the system in a database (or some other event store).
The main use of an event store is to be able to restore the system to its current state after a failure by replaying messages. To make this recovery process faster we have snapshots.
In your case, since these events are only relevant until the process is completed, it would not make sense to store them, until you have a failure (this is the catch), especially in a distributed-transaction scenario.
What would I suggest?
Don't store the events themselves, but log the relevant details about them, and maybe use an ELK stack or Grafana to store those logs.
Use either the Saga pattern or the Routing Slip pattern in the case of a distributed transaction, and log those as well.
In case a failure occurs while processing an event, put that event into an exception queue and handle it there. If it's part of a distributed transaction, make sure the events either all share the same TransactionId or carry a CorrelationId so you can look up the logs and save your system.
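A minimal sketch of the exception-queue suggestion, assuming Kafka as the transport via github.com/segmentio/kafka-go; the topic name, header name, and helper functions are hypothetical:

    package main

    import (
        "context"
        "log"

        "github.com/segmentio/kafka-go"
    )

    // exceptionQueue is a hypothetical dead-letter/exception topic writer.
    var exceptionQueue = &kafka.Writer{
        Addr:  kafka.TCP("localhost:9092"),
        Topic: "orders.exceptions",
    }

    func handleEvent(ctx context.Context, msg kafka.Message) {
        if err := process(msg); err != nil {
            // Forward the failed event, keeping a correlation-id header, so it can
            // be found in the logs and replayed or compensated later.
            failed := kafka.Message{
                Key:     msg.Key,
                Value:   msg.Value,
                Headers: append(msg.Headers, kafka.Header{Key: "correlation-id", Value: correlationID(msg)}),
            }
            if werr := exceptionQueue.WriteMessages(ctx, failed); werr != nil {
                log.Println("could not write to exception queue:", werr)
            }
        }
    }

    func process(msg kafka.Message) error        { return nil }    // placeholder for the real handler
    func correlationID(msg kafka.Message) []byte { return msg.Key } // placeholder: propagate the real id

    func main() {
        handleEvent(context.Background(), kafka.Message{Key: []byte("42"), Value: []byte(`{"orderId":"42"}`)})
    }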
For reliably performing your business transactions in a distributed architecture you somehow need to make sure that your events are published at least once.
So a service that publishes events needs to persist each event within the same transaction as the change that causes it to be created.
Considering you are publishing events via infrastructure services (e.g. a messaging service), you cannot rely on them being available all the time.
Also, your own service instance could go down after persisting your newly created or changed aggregate but before it had the chance to publish the event via, for instance, a messaging service.
Question is, are we missing some fundamental thing by doing this? Is storing Events a must?
It doesn't matter that you are not doing event sourcing. Unless it is okay from the business perspective to sometimes lose an event forever, you need to temporarily persist your event within your local transaction until it has been published.
You can look into the Transactional Outbox Pattern to achieve reliable event publishing.
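A minimal sketch of that pattern, assuming a relational database and a hypothetical outbox table; the table, column, and event names are illustrative, and the relay that later publishes and clears the outbox rows is not shown:

    package main

    import (
        "database/sql"
        "encoding/json"
        "log"
        "time"

        _ "github.com/lib/pq"
    )

    type OrderPlaced struct {
        OrderID  string    `json:"orderId"`
        PlacedAt time.Time `json:"placedAt"`
    }

    func placeOrder(db *sql.DB, orderID string) error {
        tx, err := db.Begin()
        if err != nil {
            return err
        }
        defer tx.Rollback() // no-op if Commit succeeds

        // 1. Persist the aggregate change.
        if _, err := tx.Exec(`INSERT INTO orders (id, status) VALUES ($1, 'PLACED')`, orderID); err != nil {
            return err
        }

        // 2. Persist the event in the outbox table within the same transaction.
        payload, _ := json.Marshal(OrderPlaced{OrderID: orderID, PlacedAt: time.Now()})
        if _, err := tx.Exec(`INSERT INTO outbox (event_type, payload) VALUES ($1, $2)`, "OrderPlaced", payload); err != nil {
            return err
        }

        // 3. Commit both atomically; a separate relay publishes outbox rows afterwards.
        return tx.Commit()
    }

    func main() {
        db, err := sql.Open("postgres", "postgres://user:pass@localhost/app?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        if err := placeOrder(db, "42"); err != nil {
            log.Fatal(err)
        }
    }

A poller or CDC-based relay then reads the outbox table and forwards the rows to the message broker.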
Note: Logging/tracking your events somehow for monitoring or later analyzing/reporting purpose is a different thing and has another motivation.

Microservice failure Scenario

I am working on a microservice architecture. One of my services is exposed to a source system, which uses it to post data. This microservice publishes the data to Redis (I am using Redis pub/sub), and the data is then consumed by a couple of other microservices.
Now, if one of those other microservices is down and not able to process the data from Redis pub/sub, I have to retry with the published data when that microservice comes back up. The source cannot push the data again, and manual intervention is not possible, so I thought of three approaches:
1. Use Redis itself for storing and retrieving the data.
2. Use a database for storing the data before publishing. I have many source and target microservices that use Redis pub/sub, and with this approach I would have to insert every request into the DB first and then its response status. I would also have to use a shared database; this adds a couple more exception-handling cases and doesn't look very efficient to me.
3. Use Kafka in place of Redis pub/sub. Traffic is low, which is why I used Redis pub/sub, and it's not feasible to change.
In both of the first two cases, I have to use a scheduler, and there is a window within which I have to retry or subsequent requests will fail.
Is there any other way to handle the above cases?
For point 2:
- Store the data in a DB.
- Create a daemon process that processes the data from that table.
- This daemon process can be configured to fit your needs.
- The daemon process polls the DB and publishes any pending data; it also deletes the data once it has been published.
Not in a microservice architecture, but I have seen this approach work efficiently when communicating with 3rd-party services. A rough sketch of such a daemon is shown below.
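A sketch only, assuming a hypothetical pending_events table in Postgres and Redis pub/sub as the transport (go-redis and lib/pq clients); all names are made up:

    package main

    import (
        "context"
        "database/sql"
        "log"
        "time"

        _ "github.com/lib/pq"
        "github.com/redis/go-redis/v9"
    )

    func main() {
        db, err := sql.Open("postgres", "postgres://user:pass@localhost/app?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
        ctx := context.Background()

        // Poll the table on a fixed interval, publish, then delete what was sent.
        for range time.Tick(5 * time.Second) {
            rows, err := db.QueryContext(ctx, `SELECT id, payload FROM pending_events ORDER BY id LIMIT 100`)
            if err != nil {
                log.Println("poll failed:", err)
                continue
            }
            type ev struct {
                id      int64
                payload string
            }
            var batch []ev
            for rows.Next() {
                var e ev
                if err := rows.Scan(&e.id, &e.payload); err == nil {
                    batch = append(batch, e)
                }
            }
            rows.Close()

            for _, e := range batch {
                if err := rdb.Publish(ctx, "events", e.payload).Err(); err != nil {
                    log.Println("publish failed, will retry next tick:", err)
                    break // keep the row so the next tick retries it
                }
                if _, err := db.ExecContext(ctx, `DELETE FROM pending_events WHERE id = $1`, e.id); err != nil {
                    log.Println("delete failed:", err)
                }
            }
        }
    }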
At the very outset, as you mentioned, we do indeed seem to have only three possibilities.
This is one of those situations where you want a handshake from the service both after pushing and after processing. To accomplish that, a middleware queuing system would be the right choice.
Although a bit more complex to set up, what you can do is use Kafka for streaming this. Configuring producers and consumer groups properly can help you do the job smoothly.
Using a DB for storage would be overkill, considering your situation is "this data is to be processed", not "this data is to be persisted".
But, alternatively, storing the data in Redis and reading it in a cron/scheduled job would make your job much simpler. Once the job has run successfully, you can remove the data from the cache and save Redis memory; a sketch of that follows.
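A minimal sketch of that idea, assuming the go-redis client and a hypothetical list key "pending:orders"; the scheduling and payload handling are placeholders:

    package main

    import (
        "context"
        "log"
        "time"

        "github.com/redis/go-redis/v9"
    )

    func main() {
        rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
        ctx := context.Background()

        // Producer side (in addition to the pub/sub publish): keep a durable copy.
        rdb.RPush(ctx, "pending:orders", `{"orderId":"42"}`)

        // Scheduled job: retry anything still pending, remove it once processed.
        for range time.Tick(time.Minute) {
            for {
                payload, err := rdb.LPop(ctx, "pending:orders").Result()
                if err == redis.Nil {
                    break // nothing pending
                }
                if err != nil {
                    log.Println("redis error:", err)
                    break
                }
                if err := processPayload(payload); err != nil {
                    // Put it back so the next run retries it.
                    rdb.LPush(ctx, "pending:orders", payload)
                    break
                }
            }
        }
    }

    func processPayload(p string) error { return nil } // placeholder for the real handler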
If you can comment further on the architecture and the implementation, I can update my answer accordingly. :)

Notifying golongpoll.SubscriptionManager of an event from kafka-go

I was writing a POC on long-polling using Go.
I see the general package to be used is https://github.com/jcuga/golongpoll .
But suppose I want to publish an event to the golongpoll.SubscriptionManager from a general context, especially when there is a possibility that the long-poll API request is being served by one machine while the Kafka event for that particular consumer group is consumed by another instance in the cluster.
The examples given in the documentation don't talk about such a scenario at all, even though it seems like a common one. One way I can think of is to have a distributed cache like Redis in between and have all the services poll it for changes, but that sounds a bit dumb to me.

Service synchronization issue

I've created two services.
One of them (the scheduler) only sends requests to the other (the backoffice) to perform some "large" operations.
When the backoffice receives a request:
It first creates a mark (a key in Redis) to indicate that the process has started.
Each time a request arrives:
The backoffice checks whether the mark exists.
If it exists, it means the previous process has not yet finished, and the request is skipped.
Otherwise, it performs the large process.
When the process is finished, the key in Redis is removed.
It would be something like this:
if (key exists)
    return;
create key;
make long process... (1);
remove key;
The problem arises when the service is destroyed before the process has finished, so it never removes the mark in Redis. That means the process will never run again.
Is there any way to solve this kind of problem?
The way to solve this problem is to use an existing engine, as building a custom scalable and robust solution for reliable service orchestration is really hard.
I recommend looking at Uber Cadence Workflow, which would allow you to convert your pseudocode into a real production application with minor changes.
You can run a background job that updates a timestamp under the key, e.g. every minute.
When the service attempts to start the process, it must verify the key's existence (as it does now) plus the timestamp under the key. If the timestamp is more than a minute old, the previous attempt is stale and you can start over.
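A minimal sketch of that heartbeat idea in Go, assuming the go-redis client; the key name, the one-minute staleness threshold from above, and makeLongProcess are illustrative. Note that the check-and-set below is not atomic; a SET NX or a small Lua script would be needed if two instances can race:

    package main

    import (
        "context"
        "time"

        "github.com/redis/go-redis/v9"
    )

    func tryStartLargeProcess(ctx context.Context, rdb *redis.Client) bool {
        const key = "backoffice:large-process"

        // Check whether a previous run is still alive.
        if ts, err := rdb.Get(ctx, key).Result(); err == nil {
            last, _ := time.Parse(time.RFC3339, ts)
            if time.Since(last) < time.Minute {
                return false // a previous process is still running: skip
            }
            // Heartbeat is stale: the previous service died, so start over.
        }

        // Mark the process as started and keep the heartbeat fresh in the background.
        rdb.Set(ctx, key, time.Now().Format(time.RFC3339), 0)
        hbCtx, stopHeartbeat := context.WithCancel(ctx)
        go func() {
            t := time.NewTicker(30 * time.Second)
            defer t.Stop()
            for {
                select {
                case <-hbCtx.Done():
                    return
                case <-t.C:
                    rdb.Set(hbCtx, key, time.Now().Format(time.RFC3339), 0)
                }
            }
        }()

        makeLongProcess() // the "(1)" step from the pseudocode

        stopHeartbeat()
        rdb.Del(ctx, key) // normal completion: remove the mark
        return true
    }

    func makeLongProcess() { time.Sleep(5 * time.Second) } // placeholder for the long-running work

    func main() {
        rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
        tryStartLargeProcess(context.Background(), rdb)
    }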
Sounds like you should be using a messaging queue to schedule tasks for the back office service. Queuing solutions like RabbitMQ allow you to manually acknowledge (or “ack”) that the process is complete. Whenever a subscriber crashes, the queue detects that the connection dropped without acknowledgement and will re-enqueue the same task which will be picked up by the next available subscriber. Here’s another thread talking about this problem specifically focused on messaging queues:
What happens to fetched messages when RabbitMQ consumer crashes?
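For illustration, a consumer with manual acknowledgements might look like the following, assuming the github.com/rabbitmq/amqp091-go client; the queue name and connection URL are made up:

    package main

    import (
        "log"

        amqp "github.com/rabbitmq/amqp091-go"
    )

    func main() {
        conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        ch, err := conn.Channel()
        if err != nil {
            log.Fatal(err)
        }

        // autoAck=false: if this consumer crashes before Ack, RabbitMQ re-queues
        // the message and delivers it to the next available consumer.
        msgs, err := ch.Consume("backoffice-tasks", "", false, false, false, false, nil)
        if err != nil {
            log.Fatal(err)
        }

        for d := range msgs {
            runLargeProcess(d.Body)
            d.Ack(false) // acknowledge only after the work is actually done
        }
    }

    func runLargeProcess(payload []byte) {} // placeholder for the long-running work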
