Axon and Spring's repository integration - event-sourcing

I've read the Axon documentation and looked at all provided sample projects, especially the AxonBank which I'm referencing here, but one thing is still bothering me and is not explained as far as I see:
It is my understanding that in Axon you perform queries against a read database which represents the materialized view, e.g. an H2 database that contains the latest BankAccount JPA entity (here). However, if you have a Spring repository, e.g. JpaRepository<BankAccount, Long> (here), you also have the save method, which should only be used for commands. Shouldn't you split the repository into a read-only and a write-only repository?
Could someone also point me to the documentation on how Axon works with this repository? To an uninitiated developer it looks like a "normal" JPA repository, i.e. the entity seems mutable and always up to date.
But from a theoretical perspective I expect an immutable entity in a zero state where a projection is created by applying all events, does this happen in the background with Axon?
What would happen if I update the entity with JpaRepository#save but not the aggregate? Will they be out of sync?
It seems that we have two sources of truth in this case, which shouldn't be the case theoretically.

Let me try to help you!
What you are describing is the CQRS pattern - especially the Query side!
The repository you mentioned is usually used in @EventHandler methods to build your projections, which store the data in the shape your queries need.
Looking at the AxonBank, it should be clearly visible here.
I don't think there is anything in the Axon documentation specifically about it, but this is indeed a regular JPA repository. Of course, you can use whatever you want as your Query side.
What would happen if I update the entity with JpaRepository#save but not the aggregate? Will they be out of sync?
In this case, your view model would be updated based on something other than events, which is not what you want. This repository should be updated only in response to events, which most of the time are published by your Aggregates.
It seems that we have two sources of truth in this case, which shouldn't be the case theoretically.
Regarding your question about the source of truth: your events should always be the source of truth. You should not update the repository anywhere other than in @EventHandler methods.
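To make the "update only from events" rule concrete, here is a minimal plain-Java sketch. It does not use Axon or Spring: the event classes, BankAccountProjection and BankAccountQueries are hypothetical names, and an in-memory map stands in for the JPA repository. In a real Axon application the on(...) methods would carry the @EventHandler annotation. The read-only interface is one possible way to keep save-like operations away from query code.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Demo entry point: the read model changes only as a result of events.
class ProjectionDemo {
    public static void main(String[] args) {
        BankAccountProjection projection = new BankAccountProjection();
        projection.on(new AccountCreated("acc-1"));
        projection.on(new MoneyDeposited("acc-1", new BigDecimal("25.00")));

        // Query code only gets the read-only interface, so it has no way to write.
        BankAccountQueries queries = projection;
        System.out.println(queries.balanceOf("acc-1").orElseThrow(IllegalStateException::new));
    }
}

// Hypothetical events; in Axon these would be published by the aggregate.
final class AccountCreated {
    final String accountId;
    AccountCreated(String accountId) { this.accountId = accountId; }
}

final class MoneyDeposited {
    final String accountId;
    final BigDecimal amount;
    MoneyDeposited(String accountId, BigDecimal amount) {
        this.accountId = accountId;
        this.amount = amount;
    }
}

// Read-only view handed to the query side.
interface BankAccountQueries {
    Optional<BigDecimal> balanceOf(String accountId);
}

// The only writer of the view model (assumption: a map replaces the JPA repository).
final class BankAccountProjection implements BankAccountQueries {
    private final Map<String, BigDecimal> balances = new HashMap<>();

    void on(AccountCreated event) {
        balances.put(event.accountId, BigDecimal.ZERO);
    }

    void on(MoneyDeposited event) {
        balances.merge(event.accountId, event.amount, BigDecimal::add);
    }

    @Override
    public Optional<BigDecimal> balanceOf(String accountId) {
        return Optional.ofNullable(balances.get(accountId));
    }
}
```

Because the view is derived purely from events, it can be dropped and rebuilt by replaying them, which is why the events remain the single source of truth.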


Repositories (database gateway) in Clean Architecture

I have been studying Robert Martin's Clean Architecture. I loved this book, but when I looked at some tutorials and code that are supposed to be examples of clean architecture, I saw a repository interface declared in the Entities (enterprise business rules) layer, with methods referencing an Entities class (which means the repository implementation in the adapters layer depends directly on the Entities layer, bypassing the application layer). Although the dependency rule is followed in this sample code, I have some questions about it:
Wouldn't a repository interface fit better in the application layer, since persistence is more an application rule than a core rule about the entities?
In his book, and also in his blog article (search for the "What data crosses the boundaries" subtitle), Uncle Bob says that only simple data structures should be passed across the boundaries, never entities. So is this sample code wrong?
Wouldn't a repository interface fit better in the application layer, since persistence is more an application rule than a core rule about the entities?
The repository interfaces are made to be used by interactors (aka. use cases). With the repository interface an interactor says what it wants but not how it is fulfilled. Thus they belong to the interactors and should therefore be placed in the use cases (application) layer.
I don't know the example you mentioned in your question, so I can't investigate it to see if there is some reason why the author put the repository interfaces in the entities layer and if it could have been solved in another way.
In his book, and also in his blog article (search for the "What data crosses the boundaries" subtitle), Uncle Bob says that only simple data structures should be passed across the boundaries, never entities. So is this sample code wrong?
Uncle Bob says
Typically the data that crosses the boundaries is simple data structures.
and later
For example, many database frameworks return a convenient data format in response to a query. We might call this a RowStructure. We don’t want to pass that row structure inwards across a boundary.
Let me explain why I highlighted "typically".
So what he doesn't want in his architecture is that classes, or any types, that are defined in the database layer appear in the entities or the interactors. Because this would be a dependency from higher level modules to details.
Sometimes developers want to implement a use case in a more relaxed way, but that comes at a price.
E.g. when you have a "Show order details" use case and you have loaded the complete order entity in your interactor, you usually have to copy almost all of its data to the data structure that passes the boundary (the output port) in order to strictly follow the architecture.
You could also pass the entity itself to the output port, but this would violate the strict architecture rules. So if you want to follow the strict rules, you must copy the data to a new data structure; if not, you pass the entity to the output port.
But before you pass the entity directly to the output port, you should consider the pros and cons.
If you pass the entity to the output port, you have a dependency from the interface adapters layer to the entity layer. The dependency rule that each dependency should point inwards still holds, but the dependency skips one layer.
Such a relaxed architecture introduces the risk that e.g. controllers might invoke business methods on entities, which is something they shouldn't do. It's up to you to decide whether you (and your colleagues) are disciplined enough not to invoke business methods from the controller, or whether you would rather protect yourself and others from doing so by introducing a new data structure and accepting the copy effort.
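As an illustration of the strict variant, here is a small plain-Java sketch. All names (Order, OrderDetailsResponse, OrderDetailsOutputPort, ShowOrderDetails, RecordingPresenter) are made up for this example, and the interactor receives the entity directly instead of loading it from a repository to keep the sketch short. The interactor copies the entity's data into a simple structure, so nothing behind the output port can reach the entity's business methods.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ShowOrderDemo {
    public static void main(String[] args) {
        Order order = new Order("o-42", "Alice");
        order.addLine("Keyboard", 2);

        RecordingPresenter presenter = new RecordingPresenter();
        new ShowOrderDetails(presenter).execute(order);
        System.out.println(presenter.lastResponse.orderId + " " + presenter.lastResponse.lines);
    }
}

// Entity: has a business method that outer layers should never call.
final class Order {
    final String id;
    final String customerName;
    private final List<String> lines = new ArrayList<>();

    Order(String id, String customerName) {
        this.id = id;
        this.customerName = customerName;
    }

    void addLine(String product, int quantity) { lines.add(quantity + " x " + product); }

    List<String> lines() { return new ArrayList<>(lines); }
}

// Simple data structure that crosses the boundary instead of the entity.
final class OrderDetailsResponse {
    final String orderId;
    final String customerName;
    final List<String> lines;

    OrderDetailsResponse(String orderId, String customerName, List<String> lines) {
        this.orderId = orderId;
        this.customerName = customerName;
        this.lines = Collections.unmodifiableList(new ArrayList<>(lines));
    }
}

// Output port, owned by the use case layer.
interface OrderDetailsOutputPort {
    void present(OrderDetailsResponse response);
}

// Interactor: pays the copy effort so the entity never leaves this layer.
final class ShowOrderDetails {
    private final OrderDetailsOutputPort output;

    ShowOrderDetails(OrderDetailsOutputPort output) { this.output = output; }

    void execute(Order order) {
        output.present(new OrderDetailsResponse(order.id, order.customerName, order.lines()));
    }
}

// Stand-in for a presenter in the interface adapters layer.
final class RecordingPresenter implements OrderDetailsOutputPort {
    OrderDetailsResponse lastResponse;

    @Override
    public void present(OrderDetailsResponse response) { lastResponse = response; }
}
```

The relaxed variant would declare present(Order) instead, saving the copy but exposing addLine to the presenter.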
From my point-of-view, which is a little more influenced by Domain-driven Design not only Clean Architecture, I see repositories as clear members of the domain layer (or business or core layer mostly referred to in clean architecture).
There are mainly two reasons for that:
1. Repositories are domain services that contain logic that has a meaning to business people.
Like finding an entity (aggregate) based on different criteria. For instance, a customer support employee, when on a support call, might have to search for an order either by the order number or if the customer does not have that they have to search by customer name and other criteria. The order repository would provide something like findOrderByCustomerName() which reflects the business language and needs. So repositories can contain more than just methods to retrieve the entity and save the entity. In some cases they can also provide a primitive value. For instance, a repository for incident tickets could give you the number of unresolved tickets for a specific product. Again, it needs to fulfill some business logic related purpose.
2. The domain (business) layer itself might need to consume a repository in order to perform some business logic.
If business logic does not fit with one single entity it is often the case that some kind of domain service is needed that contains the required logic. And this service containing business logic could need access to a repository to fulfill its tasks.
Let's say we have a company-internal meeting room reservation application. If you want to make a new appointment, there might be business logic that says it must not be possible to schedule an appointment if there is already another appointment at the same time in the same meeting room. So the domain logic to ask whether the requested meeting room is still free at that time might be part of the meeting room repository, providing something like IsRoomFreeAt(RoomId roomId, DateTime requestedMeetingTime). And there could be a service in the domain layer executing all the related business logic which needs this repository's functionality. Even if you don't want to apply that query pattern and would rather load the respective meeting room and check its free status on the entity, the result would be the same.
Whether one approach or the other is better suited to the respective application use case is another question. The important part here is that the business logic that makes sure no rooms are double-booked requires some information from a repository, no matter how.
If the repository were located in the application layer, the dependencies would no longer point inwards, because the domain (business) layer would now depend on the application layer and not just the other way around.
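The meeting room example could be sketched roughly like this in Java. The names MeetingRoomRepository, ReservationService and InMemoryMeetingRoomRepository are hypothetical, room IDs are plain strings, and a set stands in for the database. The interface and the service both live in the domain layer; only the implementation belongs to an outer layer, so every dependency still points inwards.

```java
import java.time.LocalDateTime;
import java.util.HashSet;
import java.util.Set;

class ReservationDemo {
    public static void main(String[] args) {
        ReservationService service = new ReservationService(new InMemoryMeetingRoomRepository());
        LocalDateTime slot = LocalDateTime.of(2024, 1, 15, 10, 0);
        System.out.println(service.reserve("room-1", slot)); // first booking of the slot
        System.out.println(service.reserve("room-1", slot)); // same room, same slot again
    }
}

// Domain layer: the repository interface speaks the business language.
interface MeetingRoomRepository {
    boolean isRoomFreeAt(String roomId, LocalDateTime requestedMeetingTime);
    void saveReservation(String roomId, LocalDateTime meetingTime);
}

// Domain layer: a domain service that needs the repository to enforce a rule.
final class ReservationService {
    private final MeetingRoomRepository rooms;

    ReservationService(MeetingRoomRepository rooms) { this.rooms = rooms; }

    // Returns false instead of double-booking a room.
    boolean reserve(String roomId, LocalDateTime time) {
        if (!rooms.isRoomFreeAt(roomId, time)) {
            return false;
        }
        rooms.saveReservation(roomId, time);
        return true;
    }
}

// Outer layer: one possible implementation (assumption: in-memory storage).
final class InMemoryMeetingRoomRepository implements MeetingRoomRepository {
    private final Set<String> reserved = new HashSet<>();

    @Override
    public boolean isRoomFreeAt(String roomId, LocalDateTime requestedMeetingTime) {
        return !reserved.contains(key(roomId, requestedMeetingTime));
    }

    @Override
    public void saveReservation(String roomId, LocalDateTime meetingTime) {
        reserved.add(key(roomId, meetingTime));
    }

    private static String key(String roomId, LocalDateTime time) {
        return roomId + "@" + time;
    }
}
```

If MeetingRoomRepository were declared in the application layer, ReservationService could not reference it without the domain layer depending outwards.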

Eventual consistency - Axon conflict resolver

I'm working on a PoC to evaluate the use of Axon framework for the development of a new application.
My concern is about the eventual consistency with the CQRS pattern since consistency is a requirement for us.
There are a lot of articles and threads about this topic, so I apologize if I'm creating a duplicate thread.
Axon offers a conflict resolver, but I'm not sure I understand how it works.
I found an example in an open source project.
This solution stores the version of the aggregate in the event store and in the read model. The client then reads the version from the read model.
What if I have different read models, could there be version conflicts?
How does Axon solve the conflicts?
Before we dive into how Axon deals with consistency, there are a few things that I'd like to point out in the context of CQRS as a concept.
There is a lot of misconception around consistency in combination with CQRS. The concept of eventual consistency applies between the different models that you have defined within your application. For example, a Command Model may have changed state recently, but the Query Model doesn't reflect that state yet. The Query Model is eventually consistent with the Command Model. However, the information within that Query Model is still consistent in itself.
More importantly, this allows you to make conscious choices around where consistency is important and where it can be relaxed. Typically, Command Models make decisions in which consistency is important. You'd want to make sure each decision is made with the relevant knowledge of recent changes. That's the purpose of the Aggregate. An Aggregate will always make decisions that are consistent with its state.
I recommend reading up on the Reactive Principles document [1], namely Section V [2].
Now for Axon. Axon implements the concepts of DDD and CQRS very strictly. Consistency is sacred within an Aggregate. For example, when using Event Sourcing, the events within an Aggregate's stream are guaranteed to have been generated based on a state that included all previous events in that stream. In other words, event number 9 in the stream was created with the knowledge of events number 0 through 8. Guaranteed.
When events are published, this doesn't mean any projections are already up to date. This may take a few milliseconds. Relaxing consistency here allows us to scale our system. The only downside is that a user may execute a command, perform a query and not see the results yet. This is actually much more common in systems than you think. There are numerous ways to prevent this from being a problem. Updating user interfaces in real-time is a powerful way of working with this. Then it doesn't matter which user made the change; they see it practically immediately.
The other way round may pose a challenge. A user observes the system state through a Query. This may (and always will, even without CQRS) provide stale data; the data may have been altered while the user is watching it. The user decides to make a change. However, in parallel, the information has already been changed. This other change may be such that, had the user known, it would have never submitted that Command.
In Axon, you can use Conflict Resolvers to detect these "unseen" parallel actions. You can take the "aggregate sequence" number from incoming events and store it with your projection. If a user action results in a Command towards that aggregate, pass the stored Aggregate Sequence as the Expected Aggregate Version. If the actual Aggregate's version doesn't match it (because the aggregate has been altered in the meantime), you get to decide whether that is problematic. There is a short explanation in the Reference Guide [3].
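The idea behind the expected version can be sketched in plain Java. This is deliberately not Axon's API: VersionedAggregate, ConflictException and the string events are hypothetical, and the predicate plays the role of "you get to decide whether that is problematic". The version is the sequence number of the last event, so an empty aggregate has version -1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class ConflictDemo {
    public static void main(String[] args) {
        VersionedAggregate account = new VersionedAggregate();
        account.handle("AddressChanged", -1, unseen -> true); // version becomes 0
        account.handle("NameChanged", 0, unseen -> true);     // version becomes 1

        // A user who last saw version 0 changes the address. The unseen
        // NameChanged event is judged harmless, so the command is accepted.
        account.handle("AddressChanged", 0, unseen -> unseen.contains("AddressChanged"));

        try {
            // Another user who last saw version 0 renames; an unseen rename conflicts.
            account.handle("NameChanged", 0, unseen -> unseen.contains("NameChanged"));
        } catch (ConflictException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}

final class ConflictException extends RuntimeException {
    ConflictException(String message) { super(message); }
}

final class VersionedAggregate {
    private final List<String> events = new ArrayList<>();

    // Sequence number of the last event, or -1 if there are no events yet.
    long version() { return events.size() - 1; }

    // expectedVersion is the version the client read from its projection.
    void handle(String newEvent, long expectedVersion, Predicate<List<String>> isConflict) {
        long actual = version();
        if (expectedVersion < -1 || expectedVersion > actual) {
            throw new IllegalArgumentException("Unknown expected version " + expectedVersion);
        }
        if (expectedVersion < actual) {
            // Events the client has not seen when it issued the command.
            List<String> unseen =
                    new ArrayList<>(events.subList((int) expectedVersion + 1, events.size()));
            if (isConflict.test(unseen)) {
                throw new ConflictException("expected version " + expectedVersion
                        + " but aggregate is at " + actual);
            }
        }
        events.add(newEvent);
    }
}
```

With several read models, each one simply stores the sequence number of the last event it has processed for an aggregate; a lagging read model hands out an older expected version, which makes a conflict check more likely rather than less safe.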
I hope this sheds some light on consistency in the context of CQRS and Axon.

How should we design database when working with multiple version of same service

For a microservice-based product, we want to provide backward compatibility.
This means we will have multiple versions of same service running at a time.
Problem: When new version is created, there are changes in database TABLES, few columns are added and few are altered. In this case if database is same for services, it will impact older services. What is the best way to handle this ?
Can we have database tables with versions?
One known way is to have a different database for each service version, which we want to avoid.
You should never be in this situation. If columns are added, you can have a DTO which does not send these newly added columns to older versions. If you have to remove a column, don't remove it; just stop using it for new APIs. And if you need to alter a column, create a new column instead, let the new API talk to the new one, and retire the old one later.
Having said that, such changes should be resisted, and if you have to make them, you need to make sure you have ways to maintain the sanity of the data. If you stop using an existing column and add a new one, how will you read the data when you look at the whole thing?
What will happen when new api makes call to historic data, what will happen when you run a reporting tool on it.
There are so many question that will need to be answered other than how api needs to be served and how services will manage the changes.
Creating a new table can be a solution, but how good or bad it is depends on your use case: what the changes are, what the significance of the data in the service was, and what its historic significance is, i.e. whether you need the older data or can dump it, etc.
I feel like this is more of business decision rather than technical one.
As far as backward compatibility is concerned, I try to provide it at my controller level. I try as far as possible to have just one core business logic in my code and map older APIs to the newer one by either providing default values or doing the required conversions.
I would never want to keep two sets of logic. It takes some effort, but I am able to find my way. Your case might not be the same as mine, but still try to avoid keeping two tables or two databases for old and new APIs, and try to concentrate the changes needed to manage old APIs in one place.
First of all, it's a very good question, and the design is tricky.
You should refer to this answer to get a fair and broad idea.
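A rough Java sketch of that controller-level mapping, with entirely hypothetical names (CreateCustomerV1, CreateCustomerV2, CustomerService, CustomerControllerV1) and an assumed default value: the old request shape is converted to the current one, so there is only one piece of core logic and one schema behind both API versions.

```java
class VersionedApiDemo {
    public static void main(String[] args) {
        CustomerService service = new CustomerService();
        CustomerControllerV1 oldApi = new CustomerControllerV1(service);

        System.out.println(oldApi.create(new CreateCustomerV1("Alice")));
        System.out.println(service.create(new CreateCustomerV2("Bob", "DE")));
    }
}

// Old request shape: the v1 API had no country field.
final class CreateCustomerV1 {
    final String name;
    CreateCustomerV1(String name) { this.name = name; }
}

// Current request shape, the only one the core logic knows about.
final class CreateCustomerV2 {
    final String name;
    final String countryCode;
    CreateCustomerV2(String name, String countryCode) {
        this.name = name;
        this.countryCode = countryCode;
    }
}

// Single core business logic shared by all API versions.
final class CustomerService {
    String create(CreateCustomerV2 request) {
        return request.name + " (" + request.countryCode + ")";
    }
}

// Old API endpoint: all backward-compatibility handling is concentrated here.
final class CustomerControllerV1 {
    // Assumption for this sketch: old clients implicitly meant this country.
    static final String DEFAULT_COUNTRY = "US";

    private final CustomerService service;

    CustomerControllerV1(CustomerService service) { this.service = service; }

    String create(CreateCustomerV1 old) {
        return service.create(new CreateCustomerV2(old.name, DEFAULT_COUNTRY));
    }
}
```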
Can we have database tables with versions?
In my opinion, you can have whatever you want but this is not recommended because of kind of complexities that it introduces to the system. This is what is concluded in above answer too.
What is the best way to handle this?
The way I do it, and have seen it done in a few systems that I didn't work on, is that the API is basically treated as a presentation layer, and DB changes that are incompatible with a previous version of the API are avoided.
I.e. let's say there is an API change in a newer version which doesn't require a DB change: no problem, all is well and good, move ahead.
Then let's say there is a new API version which calls for a DB change that will break the existing system / old version. That's not good; try to rework your solution to achieve the same functionality in a way that doesn't break your existing version. If that's not possible (obviously everything is possible!), it's a case of a major product merge and upgrade, and it needs to be deferred until the old version is discarded.
For this very reason, right from the first attempt, we need to design DB tables and JPA entities to be as complete and as broad as possible, and keep DTOs and entities distinct so that changes are mainly needed on the DTO side and not on the entity side.
Obviously, these are subjective opinions, will vary case by case basis and open for debates.

The Clean Architecture, usecase dependencies

Recently, I found my way to The Clean Architecture post by Uncle Bob. But when I tried to apply it to a current project, I got stuck when a usecase needed to depend on another usecase.
For example, my Domain Model is Goal and Task. One Goal can have many Tasks. When I update a Task, it needs to update the information of its parent Goal. In other words, the UpdateTask usecase will have the UpdateGoal usecase as a dependency. I am not sure if this is acceptable, or if we should avoid usecase-level dependencies.
A use case corresponds to one piece of functionality of your application. Generally, when one use case needs to invoke another, something is off in the design.
Updating a goal in isolation is not the same scenario as updating it because one of its tasks changed; in the latter case, almost certainly only part of the goal's data is updated, not all of it.
You will surely have to use the goal repository and the goal entity in both, but they are completely different scenarios. In your case you are not duplicating logic, only calls to the repository or the entity. Saving a few lines of code now can be expensive in the future.
In short, it is not a good idea to have dependencies between use cases.
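Here is one way this could look in Java. The names (Goal, GoalRepository, InMemoryGoalRepository, UpdateGoal, UpdateTask) are invented for the sketch, tasks are reduced to a "completed" flag, and persistence is an in-memory map. Both use cases depend on the same repository and entity, but neither depends on the other.

```java
import java.util.HashMap;
import java.util.Map;

class UseCaseDemo {
    public static void main(String[] args) {
        GoalRepository goals = new InMemoryGoalRepository();
        goals.save(new Goal("g-1", "Learn Clean Architecture"));

        new UpdateGoal(goals).execute("g-1", "Apply Clean Architecture");
        new UpdateTask(goals).execute("g-1", true);

        Goal goal = goals.findById("g-1");
        System.out.println(goal.title() + ", completed tasks: " + goal.completedTasks());
    }
}

// Entity: the shared logic lives here, not in a use case.
final class Goal {
    final String id;
    private String title;
    private int completedTasks;

    Goal(String id, String title) {
        this.id = id;
        this.title = title;
    }

    void rename(String newTitle) { this.title = newTitle; }
    void taskCompleted() { completedTasks++; }
    String title() { return title; }
    int completedTasks() { return completedTasks; }
}

interface GoalRepository {
    Goal findById(String id);
    void save(Goal goal);
}

final class InMemoryGoalRepository implements GoalRepository {
    private final Map<String, Goal> store = new HashMap<>();

    @Override
    public Goal findById(String id) { return store.get(id); }

    @Override
    public void save(Goal goal) { store.put(goal.id, goal); }
}

// Use case 1: the user edits a goal directly.
final class UpdateGoal {
    private final GoalRepository goals;
    UpdateGoal(GoalRepository goals) { this.goals = goals; }

    void execute(String goalId, String newTitle) {
        Goal goal = goals.findById(goalId);
        goal.rename(newTitle);
        goals.save(goal);
    }
}

// Use case 2: a task changes, which touches only part of the parent goal.
// It talks to the repository and the entity, not to UpdateGoal.
final class UpdateTask {
    private final GoalRepository goals;
    UpdateTask(GoalRepository goals) { this.goals = goals; }

    void execute(String goalId, boolean taskNowCompleted) {
        Goal goal = goals.findById(goalId);
        if (taskNowCompleted) {
            goal.taskCompleted();
        }
        goals.save(goal);
    }
}
```

The two execute methods repeat the find/save calls, but that repetition is cheap compared with coupling two scenarios that will likely evolve independently.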

Repository pattern with "modern" data access strategies

So I was searching the web looking for best practices when implementing the repository pattern with multiple data stores when I found my entire way of looking at the problem turned upside down. Here's what I have...
My application is a BI tool pulling data from (as of now) four different databases. Due to internal constraints, I am currently using LINQ-to-SQL for data access but require a design that will allow me to change to Entity Framework or NHibernate or the next data access du jour. I also hold steadfast to decoupled layers in my apps using an IoC framework (Castle Windsor in this case).
As such, I've used the Repository pattern to abstract the actual data access code from my business layer. As a result, my business object is coded against some I<Entity>Repository interface and the IoC Container is used to manage the actual implementation. In this case, I would expect to have a concrete Linq<Entity>Repository that implements the interface using LINQ-to-SQL to do the work. Later I could replace this with an EF<Entity>Repository with no changes required to my business layer.
Also, because I'm coding against the interface, I can easily mock the repository for unit testing purposes.
So the first question that I have as I begin coding the application is whether I should have one repository per DataContext or per entity (as I've typically done)? Let's say one database contains Customers and Sales with the expected relationship. Should I have a single OrderTrackingRepository with methods that work with both entities or have a separate CustomerRepository and a different SalesRepository?
Next, as a BI tool, the primary interface is for reporting, charting, etc and often will require a "mashup" of data across multiple sources. For instance, the reality is that one database contains customer information while another handles sales information and a third holds other financial information but one of my requirements is to display aggregated information that spans all three. Plus, I have to support dynamic filtering in the UI. Obviously working directly against the LINQ-to-SQL or EF DataContext objects (Table<Entity>, for instance) will allow me to pretty much do anything. What's the best approach to expose that same functionality to my business logic when abstracting the DAL with a repository interface?
This article indicates that EF4 has turned this approach around and that the repository is nothing more than an IQueryable returned from the EF DataContext which brings up a whole other set of questions.
But, I think I've rambled on enough...
UPDATE (Thanks, Steven!)
Okay, let me put a more tangible (for me, at least) example on the table and clarify a few points that will hopefully lead to an approach I can better wrap my head around.
While I understand what Steven has proposed, I have a team of developers I have to consider when implementing such things and I'm afraid they will get lost in the complexity (yes, a real problem here!).
So, let's remove any direct tie-in with Linq-to-Sql because I don't want a solution that is dependent upon the way L2S works - or even EF, for that matter. My intent has been to abstract away the data access technology being used so that I can change it as needed without requiring collateral changes to the consuming code in my business layer. I've accomplished this in the past by presenting the business layer with IRepository interfaces to work against. Perhaps these should have been named IUnitOfWork or, more to my liking, IDataService, but the goal is the same. These interfaces typically exposed methods such as Add, Remove, Contains and GetByKey, for example.
Here's my situation. I have three databases to work with. One is DB2 and contains all of the business information for a customer (franchise) such as their info and their Products, Orders, etc. Another, SQL Server database contains their financial history while a third SQL Server database contains application-specific information. The first two databases are shared by multiple applications.
Through my application, the customer may enter/upload their financial information for a given time period. When entered, I have to perform the following steps:
1. Validate the entered data against a set of static rules. For example, the data must contain a legitimate customer ID value (in the case of an upload). This requires a lookup in the DB2 database to verify that the supplied customer ID exists and is current.
2. Next I have to validate the data against a set of dynamic rules which are contained in the third (SQL Server) database. An example may be that a given value cannot exceed a certain percentage of another value.
3. Once validated, I persist the data to the second SQL Server database containing the financial data.
All the while, my code must have loosely-coupled dependencies so I may mock them in my unit tests.
As part of the analysis, I know that I have three distinct data stores to work with and about a half-dozen or so entities (at this time) that I am working with. In generic terms, I presume that I would have three DataContexts in my application, one per data store, with the entities exposed by the appropriate data context.
I could then create a separate I{repository|unit of work|service} for each entity that would be consumed by my business logic with a concrete implementation that knows which data context to use. But this seems to be a risky proposition: as the number of entities increases, so does the number of individual repository|UoW|service types.
Then, take the case of my validation logic which works with multiple entities and, thereby, multiple data contexts. I'm not sure this is the most efficient way to do this.
The other requirement that I have yet to mention is on the reporting side where I will need to execute some complex queries on the data stores. As of right now, these queries will be limited to a single data store at a time, but the possibility is there that I might need to have the ability to mash data together from multiple sources.
Finally, I am considering the idea of pulling out all of the data access stuff for the first two (shared) databases into their own project and have been looking at WCF Data Services as a possible approach. This would give me the basis for a consistent approach for any application making use of this data.
How does this change your thinking?
In your case I would recommend returning IEnumerable<T> from your repo's data queries. I usually aggregate calls from multiple repos through a service class that represents the domain problem and encapsulates my business logic. To keep it clean, I try to keep my repos focused on the domain problem. I liken my DataContext to a repo, and extract an interface using a T4 template to make life easier for mocking. But there is nothing stopping you from using a traditional repo that encapsulates your calls. Doing it this way will allow you to switch ORMs at any stage.
I have also done a lot of work in this area and INITIALLY came to the same conclusion; however, it is NOT a good solution. The point of the repo is to abstract queries into discrete chunks of work. Exposing IQueryable is too ad hoc and raises issues later down the line. You lose your ability to scale. You lose your ability to optimize queries (let's say I want to move to a highly optimized stored proc). You lose your ability to use IoC for the repo to switch out data access layers (say, switching the project from SQL to Mongo). You lose your ability to provide effective data caching in the repo (which is a major strength of the Repository pattern). I would recommend taking a CLOSE look at WHY we have a Repository pattern. It isn't simply an "ORM" mapping layer. What made this really clear to me was the CQRS pattern.
Furthermore, allowing the ad hoc nature of IQueryable opens you up to ill-fitting reuse of queries. It is GENERALLY not a good idea to reuse queries, since from query to query you see slight deviations, which ends up with two byproducts: queries become too broad and inefficient, and queries become riddled with unmaintainable IF/THEN statements to cater for the deviations.
IQueryable is easy, but opens you up to an unmaintainable mess.
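To illustrate the "discrete chunks of work" argument, here is a small sketch. It is written in Java rather than C# only to keep the examples in this collection in one language, and every name in it (SalesRepository, InMemorySalesRepository, CachingSalesRepository) is hypothetical. The repository exposes one purpose-built query instead of an open-ended queryable, which is what makes it possible to put a caching implementation (or a stored-procedure or Mongo one) behind the same interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RepositoryDemo {
    public static void main(String[] args) {
        InMemorySalesRepository source = new InMemorySalesRepository();
        source.addOrder("cust-1", "order-100");
        source.addOrder("cust-1", "order-101");

        SalesRepository sales = new CachingSalesRepository(source);
        System.out.println(sales.findOrderIdsByCustomer("cust-1"));
        System.out.println(sales.findOrderIdsByCustomer("cust-1"));
        System.out.println("underlying calls: " + source.calls);
    }
}

// One named query per business need; callers cannot compose arbitrary queries.
interface SalesRepository {
    List<String> findOrderIdsByCustomer(String customerId);
}

// Stand-in for the ORM-backed implementation (assumption: in-memory data).
final class InMemorySalesRepository implements SalesRepository {
    private final Map<String, List<String>> ordersByCustomer = new HashMap<>();
    int calls; // counts how often the "database" is actually hit

    void addOrder(String customerId, String orderId) {
        ordersByCustomer.computeIfAbsent(customerId, id -> new ArrayList<>()).add(orderId);
    }

    @Override
    public List<String> findOrderIdsByCustomer(String customerId) {
        calls++;
        return new ArrayList<>(ordersByCustomer.getOrDefault(customerId, new ArrayList<>()));
    }
}

// Caching decorator: possible only because the query is a discrete, known operation.
final class CachingSalesRepository implements SalesRepository {
    private final SalesRepository delegate;
    private final Map<String, List<String>> cache = new HashMap<>();

    CachingSalesRepository(SalesRepository delegate) { this.delegate = delegate; }

    @Override
    public List<String> findOrderIdsByCustomer(String customerId) {
        return cache.computeIfAbsent(customerId, delegate::findOrderIdsByCustomer);
    }
}
```

The cache here never expires, which is fine for a sketch but would need an invalidation strategy in a real application.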
Look at this SO answer. I think it shows a simplified model of what you want. IQueryable<T> is indeed our new Repository :-). DataContext and ObjectContext are our Unit of Work.
Here is a blog post that describes the model you might be looking for.
It would be wise to hide the shared databases behind a service. This will solve several problems:
This will make the database private to the service, which makes it much easier to change the implementation when needed.
You can put the needed validation logic (for database 1) in that service and can create tests for that validation logic in that project.
Clients accessing that service can assume correctness of the service, and its validation logic.
The result of this is that your application will send data to the service to validate it, call the service to fetch data, query its own private database (database 3), and join the data from the three data sources together locally. I've never been a fan of using cross-database or even cross-server (in your situation) database calls and letting the database join everything together. Transactions will be promoted to distributed transactions, and it's hard to predict how much data the servers will exchange.
When you abstract the shared databases behind the service, things get easier (at least from your application's point of view). Your application calls services it trusts which limits the amount of code in that application and the amount of tests. You still want to mock the calls to such a service, but that would be pretty easy. It should also solve the problem of validating over multiple data sources.
Validation is always a hard part. I'm very familiar with the Validation Application Block, and love it for its flexibility. It isn't an easy framework, however, but you might take a peek at what you can do with it. For instance, I've written several articles about integration with O/RM tools and how to 'embed' a context (context as in DataContext/Unit of Work) in the Validation Application Block.
Please have a look at my IRepository pattern implementation using EF 4.0.
My solution has the following features:
Supports connections to multiple DBs
One repository per entity
Support for execution of queries
Unit of Work pattern implementation
Support for validating entities using VAB guidance
Common operations are kept at base class level. Heavy use of OOP techniques for code reusability and ease of maintenance.
