Using Akka.NET / Actor System for an ETL process

I'm new to the world of actor modeling and I am in love with the idea. But is there a pattern for processing a batch of messages, simply for bulk storage, in a safe manner?
I'm afraid that if I read 400 of an expected 500 messages and put them in a list, and the system shuts down, I will lose those 400 messages from the (persisted)
mailbox. In a service bus world you could ask for a batch of messages and commit all of them only once they were processed. Thank you.

You may want to combine your actor system with some service bus/reliable queue, like RabbitMQ or Azure Service Bus, and use the actor system only for message processing.
From within Akka.NET itself, you have the persistence extension, which can be used for storing actor state in a persistent backend of your choice. It also contains a dedicated kind of actor, AtLeastOnceDeliveryActor, which can be used to resend messages until they are confirmed.
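A rough sketch of that deliver/confirm cycle, written against the JVM Akka classic API (Akka.NET's AtLeastOnceDeliveryActor follows the same deliver/confirm-delivery shape); the actor and message names are placeholders, and a persistence journal still has to be configured separately:

```java
import akka.actor.ActorSelection;
import akka.persistence.AbstractPersistentActorWithAtLeastOnceDelivery;
import java.io.Serializable;

public class ReliableSender extends AbstractPersistentActorWithAtLeastOnceDelivery {

    // Events persisted to the journal before delivery, so a crash/restart replays them and resumes redelivery.
    static class MsgSent implements Serializable { final String payload; MsgSent(String payload) { this.payload = payload; } }
    static class MsgConfirmed implements Serializable { final long deliveryId; MsgConfirmed(long deliveryId) { this.deliveryId = deliveryId; } }

    // Wire messages exchanged with the destination actor.
    static class Msg implements Serializable { final long deliveryId; final String payload; Msg(long deliveryId, String payload) { this.deliveryId = deliveryId; this.payload = payload; } }
    static class Confirm implements Serializable { final long deliveryId; Confirm(long deliveryId) { this.deliveryId = deliveryId; } }

    private final ActorSelection destination;

    public ReliableSender(ActorSelection destination) { this.destination = destination; }

    @Override
    public String persistenceId() { return "reliable-sender"; }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .match(String.class, s -> persist(new MsgSent(s), this::updateState))
            .match(Confirm.class, c -> persist(new MsgConfirmed(c.deliveryId), this::updateState))
            .build();
    }

    @Override
    public Receive createReceiveRecover() {
        // Replaying the persisted events on recovery restores the set of unconfirmed deliveries.
        return receiveBuilder().match(Object.class, this::updateState).build();
    }

    private void updateState(Object event) {
        if (event instanceof MsgSent) {
            MsgSent sent = (MsgSent) event;
            deliver(destination, deliveryId -> new Msg(deliveryId, sent.payload)); // redelivered until confirmed
        } else if (event instanceof MsgConfirmed) {
            confirmDelivery(((MsgConfirmed) event).deliveryId);                    // stop redelivery
        }
    }
}
```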

You can extend split and aggregate in your ESB to do it; I built something similar with Mule ESB a long time ago.

Read data directly from a REST API with Kafka

Is such a situation even possible?
There is an application "XYZ" (in which there is no Kafka) that exposes a REST API. It is a Spring Boot application with which an Angular application communicates.
A new application (Spring Boot) is created which wants to use Kafka and needs to fetch data from the "XYZ" application. And it wants to do this using Kafka.
The "XYZ" application has an example endpoint, [GET] api/message/all, which returns all messages.
Is there a way to "connect" Kafka directly to this endpoint and read data from it? In short, the idea is for Kafka to consume data directly from the endpoint: communication between two microservices, where one microservice does not have Kafka.
What suggestions do you have for solving this situation? Because I guess this option is not possible. Is it necessary to add a publisher in application XYZ which will send data to the queue, and only then will it be available for consumption by the new application?
Getting them via the REST interface might not be a very good idea.
Simply put, in the messaging world, message delivery guarantees are a big topic, and the standard ways to solve that with Kafka are usually:
Producing messages from your service using the Producer API to a Kafka topic (sketched below).
Using Kafka Connect to read from an outbox table.
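For the first option, a minimal sketch of what producing from the service might look like; the topic name xyz-messages, the broker address and the JSON payload are assumptions, not anything defined by application XYZ:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class XyzMessageProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all"); // wait for all in-sync replicas before acknowledging

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Wherever application XYZ currently stores a new message, it would additionally do this:
            producer.send(new ProducerRecord<>("xyz-messages", "message-id-1", "{\"text\":\"hello\"}"));
            producer.flush();
        }
    }
}
```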
Since you most likely have a database already attached to your API service, the problem of dual writes might arise if you choose to produce the messages directly to a topic. What this means is that a write to the database might fail while the same data is successfully written to Kafka, or vice versa, so you can end up with inconsistent states. Depending on your use case this might or might not be a problem.
Nevertheless, to overcome that, the outbox pattern can come in handy.
Via the outbox pattern, you'd basically write your messages to a table, a so-called outbox table, and then use Kafka Connect to poll this table of the database. Kafka Connect is basically a cluster of workers that consume this database table and forward its entries to a Kafka topic. You might want to look at Confluent Cloud, which offers a fully managed Kafka Connect service, so you don't have to manage the cluster of workers yourself. Once you have the messages in a Kafka topic, you can consume them with the standard Kafka Consumer API / Streams API.
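As a sketch of the write side of the outbox pattern (plain JDBC; the table and column names and the Postgres connection string are assumptions): the business row and the outbox row are written in one local transaction, which is what avoids the dual-write problem, and Kafka Connect later forwards the outbox rows to a topic.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class OrderService {
    // Writes the business row and the outbox row in ONE local transaction;
    // Kafka Connect polls the outbox table afterwards and publishes its rows.
    public void placeOrder(String orderId, String payloadJson) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/xyz", "app", "secret")) {
            conn.setAutoCommit(false);
            try (PreparedStatement order = conn.prepareStatement(
                     "INSERT INTO orders (id, payload) VALUES (?, ?)");
                 PreparedStatement outbox = conn.prepareStatement(
                     "INSERT INTO outbox (aggregate_id, event_type, payload) VALUES (?, ?, ?)")) {
                order.setString(1, orderId);
                order.setString(2, payloadJson);
                order.executeUpdate();

                outbox.setString(1, orderId);
                outbox.setString(2, "OrderPlaced");
                outbox.setString(3, payloadJson);
                outbox.executeUpdate();

                conn.commit(); // either both rows exist or neither does
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }
}
```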
What you're looking for is a source connector for your specific database, e.g. MongoDB:
https://www.confluent.io/hub/mongodb/kafka-connect-mongodb
For now, most source connectors produce in an at-least-once fashion. This means that the topic you configure the connector to write to might contain a message twice, so if you need messages to be processed exactly once, make sure you think about deduplicating them.
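A minimal deduplication sketch on the consumer side, assuming the producer puts a unique message id in the record key; in a real service the set of seen ids would live in a database or state store rather than in memory, so it survives restarts:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.Collections;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

public class DeduplicatingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "xyz-importer");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Set<String> seenIds = new HashSet<>(); // would be durable storage in production

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("xyz-messages"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    String messageId = record.key();      // assumes the producer sets a unique id as the key
                    if (seenIds.add(messageId)) {
                        process(record.value());          // first time we see this id
                    }                                     // otherwise: duplicate, skip it
                }
                consumer.commitSync();
            }
        }
    }

    static void process(String value) {
        System.out.println("processing " + value);
    }
}
```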

Difficulty Understanding Event Sourcing Microservice Event Receiving/Communication

I've been aware of event sourcing, CQRS, DDD and micro services for a little while and I'm now at that point where I want to try and start implementing stuff and giving something a go.
I've been looking into the technical side of CQRS and I understand the DDD concepts in there. How both the write side handles commands from the UI and publishes events from it, and how the read side handles events and creates projections on them.
The difficulty I'm having is the communication and handling of events from service to service (both from a write service to a read service and between micro services).
So I want to focus on Event Store (this one: https://eventstore.com/, to be less ambiguous). This is what I want to use, as I understand it is perfect for event sourcing, and the simple nature of storing the events means I can use it as a message bus as well.
So my issue falls into two questions:
Between the write and the read side, in order for the read side to receive/fetch the events created by the write side, am I right in thinking something like a catch up subscription can be used to subscribe to a stream to receive any events written to it, or do I use something like polling to fetch events from a given point?
Between micro services, I am having an even harder time... When looking at CQRS tutorials/talks etc., they always seem to talk through an example of an isolated service which receives commands from the UI/API. This is fine. I understand the write side will have an API attached to it so the user can interact with it to perform commands, e.g. create a customer. However, say I have two micro services, e.g. an order micro service and a shipping micro service, how does the shipping micro service get the events published from the order micro service? Specifically, how do those events translate into commands for the shipping service?
So let's take a simple example: a command is created from the order's API to place an order, and an OrderPlacedEvent is published to the event store. How does the shipping service listen and react to this if it then needs to DispatchOrder and in turn create an OrderDispatchedEvent?
Does the write side of the shipping microservice then need to poll or also have a catch up subscription to the order stream? If so, how does an event get translated into a command using a DDD approach?
something like a catch up subscription can be used to subscribe to a stream to receive any events written to it
Yes, using catch-up subscriptions is the right way of doing it. You need to keep the stream position of your subscription persisted somewhere as well.
Here you can find some sample code that works. I am not posting the whole snippet since it is too long.
The projection service startup flow is:
Load the checkpoint (the first time it will be the stream start)
Subscribe to the stream from that checkpoint
The runtime flow will then be:
The subscription will then call the function you provide when it receives an event. There's some plumbing to do there; for example, if you subscribe to $all, you need to filter out system events (this will be easier in the next version of Event Store)
Project the event
Store the new checkpoint
If you make your projections idempotent, you can store the checkpoint from time to time and save some IO.
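A minimal sketch of that startup and runtime flow; EventStoreClient and CheckpointStore here are hypothetical interfaces standing in for the real client library and checkpoint storage, not the actual Event Store API:

```java
// Hypothetical interfaces standing in for the real client and checkpoint storage.
interface CheckpointStore {
    long load();             // returns the stream start the very first time
    void save(long position);
}

interface EventStoreClient {
    // Calls the handler for every event after fromPosition, then keeps pushing new ones (catch-up subscription).
    void subscribe(String stream, long fromPosition, EventHandler handler);
}

interface EventHandler {
    void handle(String eventType, byte[] data, long position);
}

public class ProjectionService {
    private final EventStoreClient client;
    private final CheckpointStore checkpoints;

    public ProjectionService(EventStoreClient client, CheckpointStore checkpoints) {
        this.client = client;
        this.checkpoints = checkpoints;
    }

    public void start() {
        long checkpoint = checkpoints.load();                  // 1. load the checkpoint
        client.subscribe("$all", checkpoint, (type, data, position) -> {
            if (type.startsWith("$")) {                        // 2. skip system events when subscribed to $all
                return;
            }
            project(type, data);                               // 3. project the event
            checkpoints.save(position);                        // 4. store the new checkpoint
        });
    }

    private void project(String type, byte[] data) {
        // update the read model here
    }
}
```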
how does the shipping micro service get the events published from the order micro service
When you build a brand new system and you have a small team working on all the components, you can take a shortcut and subscribe to domain events from another service, as you'd do with projections. Within the integration context (between the boxes), ordering should not be important, so you can use persistent subscriptions and you won't need to think about checkpoints; Event Store will do it for you.
Be aware that it introduces tight coupling on the domain event schema of the originating service. Your contexts will have the Partnership relationship or the downstream service will be a Conformist.
When you move forward with your system, you might decide to decouple those contexts properly. So you introduce a stable event API for the service, which publishes events for others to consume. The same subscription that you used for integration can now instead take care of translating domain (internal) events to integration (external) events. The consuming context would then use the stable API, and the domain model of the upstream service is free to keep iterating, as long as the conversion is kept up to date.
It won't be necessary to use Event Store for the downstream context, they could just as well use a message broker. Integration events usually don't need to be persisted due to their transient nature.
We are running a webinar series about Event Sourcing at Event Store; check our web site to get on-demand access to previous webinars, and you might find it interesting to join future ones.
The difficulty I'm having is the communication and handling of events from service to service (both from a write service to a read service and between micro services).
The difficulty is not your fault - the DDD literature is really weak when it comes to discussing the plumbing.
Greg Young discusses some of the issues of subscription in the latter part of his Polyglot Data talk.
Eventide Project has documentation that does a decent job of explaining the principles behind how the plumbing fits things together.
Between micro services, I am having an even harder time...
The basic idea: your message store is fundamentally a database; when the host of your microservice wakes up, it queries the message store for messages after some checkpoint, and then feeds them to your domain logic (updating its own local copy of the checkpoint as needed).
So the host pulls a document with events in it from the store, and transforms that document into a stream of handle(Event) commands that ultimately get passed to your domain component.
Put another way, you build a host that polls the database for information, parses the response, and then passes the parsed data to the domain model, and writes its own checkpoints.
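A minimal sketch of that pull-based host; MessageStore and Domain are hypothetical stand-ins for the real message store query API and the domain component, and the checkpoint would be persisted rather than kept in a local variable:

```java
import java.util.List;

// Hypothetical types: a message store query API and a domain component with handle(Event).
record Event(long position, String type, String payload) {}

interface MessageStore {
    List<Event> readAfter(long checkpoint, int batchSize);
}

interface Domain {
    void handle(Event event);
}

public class PollingHost {
    public static void run(MessageStore store, Domain domain) throws InterruptedException {
        long checkpoint = 0L;                              // load from durable storage in practice
        while (true) {
            List<Event> batch = store.readAfter(checkpoint, 100);
            for (Event event : batch) {
                domain.handle(event);                      // feed the parsed event to the domain logic
                checkpoint = event.position();             // advance (and persist) the checkpoint
            }
            if (batch.isEmpty()) {
                Thread.sleep(1000);                        // nothing new; poll again later
            }
        }
    }
}
```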

MassTransit Multiple Consumers

I have an environment where I have only one app server. I have some messages that take a while to service (like 10 seconds or so) and I'd like to increase throughput by running multiple instances of my consumer application to process these messages. I've read about the "competing consumer" pattern and gather that this should be avoided when using MassTransit. According to the MassTransit docs here, each receive endpoint should have a unique queue name. I'm struggling to understand how to map this recommendation to my environment. Is it possible to have N instances of consumers running where each receives the same message, but only one of the instances actually acts on it? In other words, can we implement the "competing consumer" pattern but across multiple queues instead of one?
Or am I looking at this wrong? Do I really need to look into the "Send" method as opposed to "Publish"? The downside with "Send" is that it requires the sender to have direct knowledge of the existence of an endpoint, and I want to be dynamic with the number of consumers/endpoints I have. Is there anything built in to MassTransit that could help with keeping track of how many consumer instances/queues/endpoints there are that can service a particular message type?
Thanks,
Andy
so the "avoid competing consumers" guidance was from when MSMQ was the primary transport. MSMQ would fall over if multiple threads where reading from the queue.
If you are using RabbitMQ, then competing consumers work brilliantly. Competing consumers is the right answer. Each competing consume will use the same receive from endpoint.

Sending files over MSMQ

In a retail scenario, each store reports its daily transactions to the backend system at the end of the day. Today a file consisting of the daily transactions and some other meta information is transferred from the stores to the backend using FTP. I'm currently investigating replacing FTP with something else. MSMQ has been suggested as an alternative transport mechanism. So my question is: do we need to write a custom Windows service that sticks the daily transactions file into a message object and sends it on its way, or is there any out-of-the-box mechanism in MSMQ to handle this?
Also, since the files we want to transfer can reach 5-6 MB for large stores, should we rule out MSMQ? In that case, are there any other suggested technologies we should investigate?
Cheers!
NServiceBus provides a nice abstraction over MSMQ for situations like this. You get the reliable messaging aspects of MSMQ, along with a very nice programming model for defining your messages.
MSMQ is limited to a 4MB message size, however, and there are two ways you could deal with this in NServiceBus:
NServiceBus has a concept called the Data Bus, which takes the large attachments in your messages and transmits them reliably using another method. This is handled by the infrastructure and as far as your message handlers are concerned, the data is just there.
You could break up the payload into smaller atomic messages and send them as normal messages. The NServiceBus infrastructure would ensure that they all arrive at their destination and are processed. I would recommend this method unless it's absolutely critical that the entire huge data dump is processed as one atomic transaction.
One other thing to note is that the fact that you do nightly dumps is probably a limitation of a previous system. With NServiceBus it may be possible to change the system so that these bits of information are sent in a more immediate fashion, which will result in much more up-to-date data all the time, which may be a big win for the business.
You can look at IBM Sterling Managed File Transfer and WebSphere MQ Managed File Transfer products.
You can consider WebSphere MQ MFT if you require both messaging and file transfer capabilities. On the other hand if your requirement is just file transfer then you can look at Sterling MFT.
Sending files over a messaging transport is not trivial. If you put the entire file into a single message you can have the atomicity you need but tuning the messaging provider for wide variance in message sizes can be challenging. If all the files are of about the same size, one per message is about the simplest solution.
On the other hand, you can split the files into multiple messages but then you've got to reassemble them, in the right order, include a protocol to detect and resend missing segments, integrity-check the file received against the file sent, etc. You also probably want to check that the files on either end did not change during the transmission.
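As a rough sketch of the sender side of that splitting approach (the receiver still has to buffer, reorder, request resends of missing chunks, and verify the hash), assuming a FileChunk message shape and a 3 MB chunk size to stay safely under MSMQ's 4 MB limit:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HexFormat;
import java.util.List;

// One chunk of a file, small enough to fit under the transport's message size limit.
record FileChunk(String fileId, int sequence, int totalChunks, String fileSha256, byte[] data) {}

public class FileSplitter {
    static final int CHUNK_SIZE = 3 * 1024 * 1024;

    public static List<FileChunk> split(Path file) throws IOException, NoSuchAlgorithmException {
        byte[] bytes = Files.readAllBytes(file);
        String sha256 = HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(bytes));
        int total = (bytes.length + CHUNK_SIZE - 1) / CHUNK_SIZE;

        List<FileChunk> chunks = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            int from = i * CHUNK_SIZE;
            int to = Math.min(from + CHUNK_SIZE, bytes.length);
            // sequence + totalChunks let the receiver reassemble in order and detect missing parts;
            // fileSha256 lets it integrity-check the reassembled file against the original.
            chunks.add(new FileChunk(file.getFileName().toString(), i, total, sha256,
                    Arrays.copyOfRange(bytes, from, to)));
        }
        return chunks;
    }
}
```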
With any of these systems you also need the system to be smart enough to manage the disposition of sending and receiving files under normal and exception conditions, log the transfers, etc.
So when considering whether to move to messaging the two best options are either to move natively to messaging and give up files altogether, or to use an enterprise managed file transfer solution that runs atop the messaging provider that you choose. None of the off-the-shelf MFT products will cost as much in the long run as developing it yourself if you wish to do it right with robust exception handling and reporting.
If the stores are on separate networks and communicate over the internet, then MSMQ is not really an option. NServiceBus provides the concept of a gateway, which allows MSMQ messages to be transported asynchronously over HTTP or HTTPS.

JMS Producer-Consumer-Observer (PCO)

In JMS there are Queues and Topics. As I understand it so far, queues are best used for producer/consumer scenarios, whereas topics can be used for publish/subscribe. However, in my scenario I need a way to combine both approaches and create a producer-consumer-observer architecture.
In particular, I have producers which write to some queues, and workers which read from these queues, process the messages in them, and then write the result to a different queue (or topic). Whenever a worker has done a job, my GUI should be notified and update its representation of the current system state. Since the workers and the GUI are different processes, I cannot apply a simple observer pattern or notify the GUI directly.
What is the best way to realize this using a combination of queues and/or topics? The GUI should always be notified, but it should never consume anything from a queue.
I would like to solve this with JMS directly and not use any additional technology such as RMI to implement the observer part.
To give a more concrete example:
1. I have a queue with packages (PACKAGEQUEUE), produced by a machine (PackageProducer).
2. I have a worker which takes a package from the PACKAGEQUEUE, adds an address, and then writes it to a MAILQUEUE (AddressWorker).
3. Another worker processes the MAILQUEUE and sends the packages out by mail (MailWorker).
After step 2, when a message is written to the MAILQUEUE, I want to notify the GUI and update the status of the package. Of course the GUI should not consume the messages in the MAILQUEUE; only the MailWorker must consume them.
You can use a combination of queue and topic for your solution.
Your GUI application can subscribe to a topic, say MAILQUEUE_NOTIFICATION. Every time (i.e. at step 2) the AddressWorker writes a message to MAILQUEUE, a copy of that message should be published to the MAILQUEUE_NOTIFICATION topic. Since the GUI application has subscribed to the topic, it will receive that publication containing information on the status of the package, and the GUI can be updated with its contents.
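For illustration, a minimal sketch of the AddressWorker doing exactly that with the plain JMS API; the ActiveMQ connection factory, the broker URL and the hard-coded address string are assumptions:

```java
import javax.jms.*;

public class AddressWorker {
    public static void main(String[] args) throws JMSException {
        // Any JMS provider works; ActiveMQ's connection factory is assumed here.
        ConnectionFactory factory =
                new org.apache.activemq.ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(true, Session.SESSION_TRANSACTED);

        MessageConsumer packages = session.createConsumer(session.createQueue("PACKAGEQUEUE"));
        MessageProducer mail = session.createProducer(session.createQueue("MAILQUEUE"));
        MessageProducer notifications = session.createProducer(session.createTopic("MAILQUEUE_NOTIFICATION"));

        while (true) {
            Message incoming = packages.receive();
            TextMessage outgoing = session.createTextMessage(
                    ((TextMessage) incoming).getText() + ";address=Some Street 1");

            mail.send(outgoing);           // consumed exclusively by the MailWorker
            notifications.send(outgoing);  // every subscriber (e.g. the GUI) gets a copy

            session.commit();              // consume + both sends as one transacted unit
        }
    }
}
```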
HTH
