How to set up a domain model with Lagom? - microservices

I'm currently trying to build an application that handles personal finances. I'm struggling with the Lagom way of doing things because I can't find any example of a "real" application built with Lagom. I have to guess what the best practices are, and I'm constantly afraid of falling into pitfalls.
My case is the following: I have Users, Accounts and Transactions. Accounts belong to users but can be "shared" between them (with some sort of authorization system, one user is admin and other can read or edit the account). Transactions have an optional "debit" account, an optional "credit" account and an amount which is always positive.
The scenarios I am considering are the following:
I consider that transactions belong to accounts and are part of the account entity as a list of entries. In that scenario, a transfer transaction must have a "sister" entry in the other account. This seems easy to implement but I'm concerned about:
the potential size of the entity (and the snapshots). What happens if I have accounts that contain thousands or tens of thousands of transactions?
the duplication of the transaction in several accounts.
I consider that transactions have their own service. In that case I can use Kafka to publish events when transactions are recorded so the Account entity can "update" its balance. In that case, does it make sense to have a "balance" property in the entity, or a read-side event listener for transaction events that updates the read database?
I can have two Persistent Entities in the same service, but in that case I'm struggling with the read-side. Let's say I have a transaction: I want to insert into the "transactions" table and update the "accounts" table. Should I have multiple read-side processors that listen to different events but write to the same db?
What do you think?

I think that you shouldn't have a separate 'Transactions' entity because it is tightly coupled to the account entity; in fact, the transactions of an account are nothing more than the event log of that account. So I recommend persisting each balance change with a unique transaction id and, when it is a transfer, the id of the other account, and having the read-side processor listen to the account's events to store them in the read model.
Done this way, a transfer is just a message between the two accounts that results in a modification of each balance, which is then persisted as part of the event log of each of them. This seems more natural, and you don't have to manage a separate aggregate root that, in addition, is tightly coupled to the account entities.
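As a rough illustration of that idea, here is a minimal sketch in plain Python. It is not Lagom's API: the names BalanceChanged, persist, transfer and project are all hypothetical, and the in-memory journal list merely stands in for the persisted event log. The entity keeps only the balance as state, while the read side is derived from the events.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BalanceChanged:
    """One entry in an account's event log (hypothetical event shape)."""
    transaction_id: str
    amount: int                    # signed: positive = credit, negative = debit
    other_account: Optional[str]   # set only when the change is part of a transfer


@dataclass
class Account:
    account_id: str
    balance: int = 0               # the only state the entity itself keeps

    def apply(self, event: BalanceChanged) -> None:
        self.balance += event.amount


# Stand-in for the persisted event log (assumption: a real system uses a journal).
journal: List[Tuple[str, BalanceChanged]] = []


def persist(account: Account, event: BalanceChanged) -> None:
    """Append the event to the log, then update the entity state from it."""
    journal.append((account.account_id, event))
    account.apply(event)


def transfer(tx_id: str, source: Account, target: Account, amount: int) -> None:
    """Record a transfer as one event per account, sharing the transaction id."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    persist(source, BalanceChanged(tx_id, -amount, target.account_id))
    persist(target, BalanceChanged(tx_id, amount, source.account_id))


def project(log):
    """Read-side sketch: turn logged events into rows for a transactions table."""
    return [(acc_id, ev.transaction_id, ev.amount, ev.other_account)
            for acc_id, ev in log]
```

Note that the two persist calls in transfer are not atomic here; in a real system the second account would react to a message or event from the first.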


Event Sourcing and concurrent, contradictory events creation

I am having a hard time figuring this one out. Maybe you can help me.
Problem statement:
Imagine there is a system that records financial transactions of an account (like a wallet service). Transactions are stored in a database and each Transaction denotes an increase or decrease of the balance of a given amount.
On the application code side, when the User wants to purchase, all Transactions for his account are being pulled from the DB and the current balance is calculated. Based on the result, the customer has or has not sufficient funds for the purchase (the balance can never go below zero).
Transactions example:
ID userId amount currency, otherData
Transaction(12345, 54321, 180, USD, ...)
Transaction(12346, 54321, -50, USD, ...)
Transaction(12347, 54321, 20, USD, ...)
Those 3 from above would mean the User has 150 USD on his balance.
Concurrent access:
Now, imagine there are 2 or more instances of such an application. Imagine the User has a balance of 100 USD and buys two items worth 100 USD each at the same time. The requests for these purchases go to two different instances, which both read all Transactions from the DB and reduce them into currentBalance. In both replicas, at the same time, the balance equals 100 USD. Both services allow the purchase and add a new Transaction, Transaction(12345, 54321, -100, USD, ...), which decreases the balance by 100.
If two contradictory Transactions are inserted into the DB, the balance is incorrect: -100 USD.
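The race can be reproduced deterministically with a small sketch. The function names are hypothetical and the list of amounts stands in for the Transactions table; the point is only that both instances check the balance before either one writes.

```python
def balance(transactions):
    """Reduce a list of signed amounts into the current balance."""
    return sum(transactions)


def simulate_race(transactions, price):
    """Two instances perform check-then-insert with the checks interleaved first."""
    # Step 1: both instances read the history and check funds before any write.
    instance_a_ok = balance(transactions) >= price
    instance_b_ok = balance(transactions) >= price
    # Step 2: both act on a decision that is stale by the time they insert.
    if instance_a_ok:
        transactions.append(-price)
    if instance_b_ok:
        transactions.append(-price)
    return balance(transactions)
```

Calling simulate_race([100], 100) leaves the balance at -100, because nothing ties the insert to the state that the check was based on.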
How should I deal with such a situation?
I know that usually optimistic or pessimistic concurrency control is used. So here are my doubts about both:
Optimistic concurrency
It's about keeping the version of the resource and comparing it before the actual update, like a CAS operation. Since Transactions are a form of events - immutable entities - there is no resource whose version I could track. I do not update anything. I only insert new changes to the balance, which have to be consistent with all other existing Transactions.
Pessimistic concurrency
It's about locking the table/page/row for modification, for cases where conflicts happen more often in the system. Yeah, ok... locking a table/page for each insert is off the table, I think (scalability and high load concerns). And locking rows - well, which rows do I lock? Again, I do not modify anything in the DB state.
Open ideas
My feeling is that this kind of problem has to be solved at the application code level. Some still vague ideas that come to my mind now:
1. A distributed cache which holds a "lock of given User", so that only one Transaction can be processed at a time (purchase, deposit, withdrawal, refund, anything).
2. Each Transaction has a field such as previousTransactionId - a pointer to the last committed Transaction - and some kind of unique index on this field (exactly one Transaction can point to exactly one Transaction in the past, with the first Transaction ever having a null value). This way I'd get a constraint violation error when trying to insert a duplicate.
3. Asynchronous processing with a queueing system and a topic per user: exactly one instance processing Transactions for a given User one by one. Nice try, but unfortunately I need to be synchronous with the purchase in order to reply to a 3rd party system.
One thing to note is that typically there's a per-entity offset (a monotonically increasing number, e.g. Account|12345|6789 could be the 6789th event for account #12345) associated with each event. Thus, assuming the DB in which you're storing events supports it, you can get optimistic concurrency control by remembering the highest offset seen when reconstructing the state of that entity and conditioning the insertion of events on there not being events for account #12345 with offsets greater than 6789.
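A minimal sketch of that offset-based check, using SQLite from the Python standard library as a stand-in for the event store. The table layout and the function names (open_store, load, append, purchase) are assumptions made for illustration. Instead of a conditional insert, the sketch relies on a primary key over (account_id, seq): a writer always inserts at "highest offset seen + 1", so a concurrent writer that got there first causes a constraint violation. This is equivalent only as long as offsets are gapless.

```python
import sqlite3


def open_store():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE events ("
        " account_id INTEGER NOT NULL,"
        " seq INTEGER NOT NULL,"        # per-account offset: 1, 2, 3, ...
        " amount INTEGER NOT NULL,"
        " PRIMARY KEY (account_id, seq))"
    )
    return db


def load(db, account_id):
    """Rebuild the balance and remember the highest offset seen."""
    row = db.execute(
        "SELECT COALESCE(SUM(amount), 0), COALESCE(MAX(seq), 0)"
        " FROM events WHERE account_id = ?",
        (account_id,),
    ).fetchone()
    return row[0], row[1]


def append(db, account_id, expected_seq, amount):
    """Insert at expected_seq + 1; the primary key rejects a concurrent writer."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO events (account_id, seq, amount) VALUES (?, ?, ?)",
                (account_id, expected_seq + 1, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def purchase(db, account_id, price, max_retries=3):
    """First writer wins; a loser reloads, re-checks the balance and retries."""
    for _ in range(max_retries):
        current, seq = load(db, account_id)
        if current < price:
            return False            # insufficient funds after (re)calculation
        if append(db, account_id, seq, -price):
            return True
    return False
```

If two instances both load the account at the same offset, only the first append succeeds; the second one fails, reloads, and then sees that the funds are gone.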
There are datastores which support the idea of "fencing": only one instance is allowed to publish events to a particular stream, which is another way to achieve optimistic concurrency control.
There are approaches which move pessimistic concurrency control into the application/framework/toolkit code. Akka/Akka.Net (disclaimer: I am employed by Lightbend, which maintains and sells commercial support for one of those two projects) has cluster sharding, which allows multiple instances of an application to coordinate ownership of entities between themselves. For example, instance A might have account 12345 and instance B might have account 23456. If instance B receives a request for account 12345, it (massively simplifying) effectively forwards the request to instance A, which enforces that only one request for account 12345 is being processed at a time. This approach can in some ways be thought of as a combination of your first idea, the distributed cache (of note: this distributed cache is not only providing concurrency control, but is actually caching the application state too, e.g. the account balance and any other data useful for deciding if a transaction can be accepted), and your third, the per-user queue (even though it's presenting a synchronous API to the outside world).
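The core property here (one request per account at a time, against state held in memory) can be sketched inside a single process. This is not Akka and does nothing about distributing ownership across instances, which is the hard part that cluster sharding solves; AccountRegistry and its methods are hypothetical names, and a per-account lock stands in for the entity's mailbox.

```python
import threading
from collections import defaultdict


class AccountRegistry:
    """Single-process stand-in for 'one owner per account'."""

    def __init__(self):
        self._guard = threading.Lock()            # protects the lock table itself
        self._locks = defaultdict(threading.Lock)
        self._balances = defaultdict(int)         # cached application state

    def _lock_for(self, account_id):
        with self._guard:
            return self._locks[account_id]

    def deposit(self, account_id, amount):
        with self._lock_for(account_id):
            self._balances[account_id] += amount

    def purchase(self, account_id, price):
        """Check and update happen under the same per-account lock."""
        with self._lock_for(account_id):
            if self._balances[account_id] < price:
                return False
            self._balances[account_id] -= price
            return True

    def balance(self, account_id):
        with self._lock_for(account_id):
            return self._balances[account_id]
```

Callers still get a synchronous answer, but two concurrent purchases against the same account are processed one after the other, so only one of them can spend the last 100 USD.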
Additionally, it is often possible to design the events such that they form a conflict-free replicated data type (CRDT) which effectively allows forks in the event log as long as there's a guarantee that they can be reconciled. One could squint and perhaps see bank accounts allowing overdrafts (where the reconciliation is allowing a negative balance and charging a substantial fee) as an example of a CRDT.
How should I deal with such a situation?
The general term for the problem you are describing is set validation. If there is some property that must hold for the set taken as a whole, then you need to have some form of lock to prevent conflicting writes.
Optimistic/pessimistic are just two different locking implementations.
In the event that you have concurrent writes, the usual general mechanism is that first writer wins. The losers of the race follow the "concurrent modification" branch, and either retry (recalculating again to ensure that the desired properties still hold) or abort.
In a case like you describe, if your insertion code is responsible for confirming that the user balance is not negative, then that code needs to be able to lock the entire transaction history for the user.
Now: notice that "if" in the previous paragraph, because it's really important. One of the things you need to understand in your domain is whether or not your system is the authority for transactions.
If your system is the authority, then maintaining the invariant is reasonable, because your system can say "no, that one isn't a permitted transaction", and everyone else has to go along with it.
If your system is NOT the authority - you are getting copies of transactions from "somewhere else", then your system doesn't have veto power, and shouldn't be trying to skip transactions just because the balance doesn't work out.
So we might need a concept like "overdrawn" in our system, rather than trying to state absolutely that balance will always satisfy some invariant.
Fundamentally, collaborative/competitive domains with lots of authorities working in parallel require a different understanding of properties and constraints than the simpler models we can use with a single authority.
In terms of implementation, the usual approach is that the set has a data representation that can be locked as a whole. One common approach is to keep an append-only list of changes to the set (sometimes referred to as the set's history or "event stream").
In relational databases, one successful approach I've seen is to implement a stored procedure that takes the necessary arguments and then acquires the appropriate locks (i.e. applying "tell, don't ask" to the relational data store); that allows you to insulate the application code from the details of the data store.

system design - How to update cache only after persisted to database?

After watching this awesome talk by Martin Kleppmann about how Kafka can be used to stream events so that we can get rid of 2-phase commits, I have a couple of questions related to updating a cache only when the database is updated properly.
Problem Statement
Let's say you have a Redis cache which stores the user's profile pic and a Postgres database which is used for all the User-related operations (creation, update, deletion, etc.).
I want to update my Redis cache if and only if a new user has been successfully added to my database.
How can I do that using Kafka?
If I am to take the example given in the video then the workflow would follow something like this:
User registers
Request is handled by User Registration Micro service
User Registration Microservice inserts a new entry into the Users table.
Then it generates a User Creation Event in the user_created topic.
Cache population microservice consumes the newly created User Creation Event
Cache population microservice updates the Redis cache.
The problem is: what would happen if the User Registration Microservice crashed just after writing to the database, before it could send the event to Kafka?
What would be the correct way of handling this?
Does the User Registration Microservice maintain the last event it published? How can it reliably do that? Does it write to a DB? Then the problem starts all over again: what if it published the event to Kafka but failed before it could update its last known offset?
There are three broad approaches one can take for this:
There's the transactional outbox pattern, wherein, in the same transaction as inserting the new entry into the user table, a corresponding user creation event is inserted into an outbox table. Some process then eventually queries that outbox table, publishes the events in that table to Kafka, and deletes the events in the table. Since the inserts are in the same transaction, they either both occur or neither occurs; barring a bug in the process which publishes the outbox to Kafka, this guarantees that every user insert eventually has an associated event published (at least once) to Kafka.
There's a more event-sourcingish pattern, where you publish the user creation event to Kafka and then some consuming process inserts into the user table based on the event. Since this happens with a delay, this strongly suggests that the user registration service needs to keep state of which users it has published creation events for (with the combination of Kafka and Postgres being the source of truth for this). Since Kafka allows a message to be consumed by arbitrarily many consumers, a different consumer can then update Redis.
Change data capture (e.g. Debezium) can be used to tie into Postgres' write-ahead log (as Postgres actually event sources under the hood...) and publish an event that essentially says "this row was inserted into the user table" to Kafka. A consumer of that event can then translate that into a user created event.
CDC in some sense moves the transactional outbox into the infrastructure, at the cost of requiring that the context it inherently throws away be reconstructed later (which is not always possible).
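To make the first of those approaches concrete, here is a minimal sketch of a transactional outbox, with SQLite from the Python standard library standing in for Postgres. The table names, the function names and the publish callable are assumptions for illustration; in a real deployment publish would wrap a Kafka producer.

```python
import json
import sqlite3


def open_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    db.execute(
        "CREATE TABLE outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " topic TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    return db


def register_user(db, user_id, name):
    """Insert the user row and its event in one transaction: both or neither."""
    with db:  # commits on success, rolls back if either insert fails
        db.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name))
        db.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("user_created", json.dumps({"id": user_id, "name": name})),
        )


def relay_outbox(db, publish):
    """Publish pending events in order, deleting each one only after it was sent.

    `publish` is a hypothetical callable (topic, payload). If it raises, or the
    process dies before the delete, the row stays and is sent again on the next
    run, which is why delivery is at-least-once.
    """
    rows = db.execute("SELECT id, topic, payload FROM outbox ORDER BY id").fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)
        with db:
            db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
    return len(rows)
```

Because an event may be published more than once, the consumer that updates Redis should be idempotent, for example by overwriting the cache entry keyed by user id.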
That said, I'd strongly advise against having ____ creation be a microservice and I'd likewise strongly advise against a RInK store like Redis. Both of these smell like attempts to paper over architectural deficiencies by adding microservices and caches.
The one-foot-on-the-way-to-event-sourcing approach isn't one I'd recommend, but if one starts there, the requirement to make the registration service stateful suddenly opens up possibilities which may remove the need for Redis, limit the need for a Kafka-like thing, and allow you to treat the existence of a DB as an implementation detail.

How to split a monolithic application into microservices when most operations are on one aggregate?

I am splitting a monolithic application into microservices based on this schema
Following DDD, I found 4 aggregates for this schema: User, JobTitle, Skill and Course. So there should be 4 microservices, corresponding to Account, JobTitle, Skill and Course.
The tricky thing is that when I look at all the APIs provided to the Front End, 90% of them are related to User-dependent entities such as user_skills, user_job_titles ...
If I follow DDD, then the Account microservice would be very big; most of the logic would go inside this microservice and the others would be tiny. Based on this, I came up with 2 solutions:
I split the application into 2 microservices: an Account microservice for the User aggregate and an Insight microservice for the JobTitle, Skill and Course aggregates. The problem with this solution is that it will create a lot of communication between Account and Insight, since most of the time Account needs data from the Insight microservice for its business logic. And the original problem, that the Account microservice holds most of the APIs for the FE, is not eliminated.
Keeping 4 microservices, Account, JobTitle, Skill and Course, and pushing the User-dependent entities into the other corresponding microservices. For example, user_skills will go into the Skill microservice. This solution looks better, but it will break the DDD aggregate principle and could bring more problems later.
How should I split this monolithic application into microservices?
I don't know exactly how many dependent relations the user entity has with the other aggregates, but in this case you could replicate the relation data in your user microservice and keep it updated using events.
For example, if you have a table called user_skills, you can place it in the users domain. When a skill changes in the skill microservice, that service writes an event to some sort of message broker like Kafka, and the user microservice listens for that event and updates its user_skills table accordingly.
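A small sketch of that replication, with the broker left out: the handler below is what the user microservice would run for each consumed event. The event type SkillRenamed and the class UserSkillsReplica are hypothetical, and in-memory dictionaries stand in for the user_skills table and the replicated skill data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillRenamed:
    """Hypothetical event published by the skill microservice."""
    skill_id: int
    new_name: str


class UserSkillsReplica:
    """State held inside the user microservice."""

    def __init__(self):
        self.user_skills = {}   # user_id -> set of skill ids (owned by this service)
        self.skill_names = {}   # skill_id -> name (replicated copy, updated by events)

    def add_user_skill(self, user_id, skill_id):
        self.user_skills.setdefault(user_id, set()).add(skill_id)

    def on_skill_renamed(self, event):
        # Applying the same event twice gives the same result, so redelivery is safe.
        self.skill_names[event.skill_id] = event.new_name

    def skills_of(self, user_id):
        """Answer a front-end query without calling the skill microservice."""
        ids = self.user_skills.get(user_id, set())
        return sorted(self.skill_names.get(i, "<unknown>") for i in ids)
```

The trade-off is eventual consistency: between the change in the skill microservice and the arrival of the event, the user microservice serves the old name.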

Can I keep a copy of a table of one database in another database in a microservice architecture?

I am currently new to microservice architecture so thanks in advance.
I have two different services, a User Service and a Footballer Service, each having its own database (User database and Footballer database).
The Footballer service has a database with a single table storing footballer information.
The User service has a database which stores User details along with other user related data.
Now a User can add footballers to their team by querying the Footballer service and I need to store them somewhere in order to be displayed later.
Currently I'm storing the footballers for each user in a table in the User database, whereby I make a call to the Footballer service to give me the details of a specific footballer by ID and save them in the User database by mapping against the User ID.
So is it a good idea to do that, and does it mean I'm replicating data between two services?
And if it is, then in what other ways can I achieve the same functionality?
Currently I'm storing the footballers for each user in a table in the User database, whereby I make a call to the Footballer service to give me the details of a specific footballer by ID and save them in the User database by mapping against the User ID.
"Caching" is a fairly common pattern. From the perspective of the User microservice, the data from Footballer is just another input which you might save or not. If you are caching, you'll usually want to have some sort of timestamp/version on the cached data.
Caching identifiers is pretty normal - we often need some kind of correlation identifier to connect data in two different places.
If you find yourself using Footballer data in your User domain logic (that is to say, the way that User changes depends on the Footballer data available)... that's more suspicious, and may indicate that your boundaries are incorrectly drawn / some of your capabilities are in the wrong place.
If you are expecting the User Service to be autonomous - that is to say, to be able to continue serving its purpose even when Footballer is out of service, then your code needs to be able to work from cached copies of the data from Footballer and/or be able to suspend some parts of its work until fresh copies of that data are available.
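One way to picture such a cached copy is sketched below. Everything named here is an assumption: fetch is a hypothetical callable that asks the Footballer service for a footballer by ID and returns a dict with name and version, and ConnectionError stands in for whatever failure your client raises when that service is down.

```python
import time
from dataclasses import dataclass


@dataclass
class CachedFootballer:
    footballer_id: int
    name: str
    version: int        # version reported by the Footballer service
    fetched_at: float   # when this copy was taken


class FootballerCache:
    def __init__(self, fetch, max_age_seconds=300.0, clock=time.time):
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries = {}

    def get(self, footballer_id):
        entry = self._entries.get(footballer_id)
        now = self._clock()
        if entry is not None and now - entry.fetched_at < self._max_age:
            return entry                      # fresh enough, no remote call
        try:
            data = self._fetch(footballer_id)
        except ConnectionError:
            if entry is not None:
                return entry                  # stay autonomous: serve the stale copy
            raise                             # nothing cached, cannot answer
        entry = CachedFootballer(footballer_id, data["name"], data["version"], now)
        self._entries[footballer_id] = entry
        return entry
```

The fetched_at and version fields let the caller decide whether a stale copy is still acceptable for display or whether the operation should wait for fresh data.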
People usually follow DDD (Domain-Driven Design) in the case of microservices.
So here in your case there are two domains, i.e. 2 services.
So the User service should only do user-specific tasks; it should not be concerned with footballer data.
Hence, according to DDD, the footballers that are linked to the user should be stored in the Footballer service.
Replicating the ID wouldn't be considered replication in a microservices architecture.

How can I divide one database to multi databases?

I want to decompose my application to adopt a microservices architecture, and I will need to come up with a solid strategy to split my database (MySQL) into multiple small databases (MySQL) aligned with my applications.
TL;DR: It depends on the scenario and on what each service will do
Although there is no clear answer to this, since it really depends on your needs and on what each service should do, you can come up with a general starting point (assuming you don't need to keep the existing database type).
Let's assume you have a monolithic e-commerce application, and you want to split this application into smaller services, each one with its own database.
The approach you could use is to create services that each handle one part of the website: for example, you could have one service that handles user authentication, one for the orders, one for the products, one for the invoices and so on...
Now, each service will have its own database, and here comes another question: which database should a specific service have? One of the advantages of this kind of architecture is that each service can have its own kind of database, so for example the products service can have a non-relational database, such as MongoDB, since all it does is get details about products, so you don't have to manage any relations.
The orders service, on the other hand, could have a relational database, since you want to keep a relation between the order and the invoice for that order. But wait, invoices are handled by the invoice service, so how can you keep the relation between these two without sharing the database? Well, that's one of the "issues" of this approach: you have to keep services independent while also letting them communicate with each other. How can we do this? There is no clear answer here either... One approach could be to just pass all invoice details to the orders service as well, or you can just pass the invoice ID when saving the order and later retrieve the invoice via an API call to the invoice service, or you can pass all the relevant details you need for the invoice to an API endpoint in the order service that stores this data in a specific table in the database (since most of the time you don't need the entire actual object), etc... The possibilities are endless...
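The second option (store only the invoice ID with the order and fetch the invoice through the invoice service's API when needed) could look roughly like this. The class names and the fetch_invoice callable are hypothetical; in practice fetch_invoice would be an HTTP or RPC client for the invoice service.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Order:
    order_id: int
    invoice_id: int   # reference only; the invoice itself lives in the invoice service


class OrderService:
    def __init__(self, fetch_invoice: Callable[[int], Dict]):
        self._orders = {}                    # stand-in for the orders database
        self._fetch_invoice = fetch_invoice  # hypothetical client for the invoice API

    def save_order(self, order_id: int, invoice_id: int) -> None:
        self._orders[order_id] = Order(order_id, invoice_id)

    def order_with_invoice(self, order_id: int) -> Dict:
        """Join order and invoice at read time through the API, not in SQL."""
        order = self._orders[order_id]
        return {
            "order_id": order.order_id,
            "invoice": self._fetch_invoice(order.invoice_id),
        }
```

The cost of this choice is that reading an order with its invoice now depends on the invoice service being reachable, which is the reason the other options copy some invoice details into the orders database.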
