For my Symfony2 webapp, I have created a data model consisting of 16 entities. The main entities are User and Company. There is another entity: Location.
No functionality concerning Location has been implemented, so it is unused for now. (That is: it is defined as a Doctrine entity, but not used in any controller.)
I recently signed up for a monitoring service (NewRelic), which lets me inspect database performance. Funny thing: my Location entity, which I don't use at all, consumes the most resources. It has the highest throughput and is the most time-consuming entity in my database. (Apart from _session, but ignore that.)
All other entities listed in that monitoring app are entities that are used in my app.
There are a couple of other entities that are defined but not implemented, and none of those are listed in my monitoring app.
Could Location be a Symfony2-reserved entity name that, if used, consumes performance? Do you have any other explanation for this behaviour?
Our scenario: we receive bulks of messages from Kafka and write them to the DB after some processing. Currently we achieve DB-write rates (in our company network) of 300,000 to 310,000 records/min. But my colleagues want more (500K-600K/min).
The affected Java application has a functional layer (a "business facade", so to speak). Underneath it are the persistence-layer classes, which write records, grouped by table, into the DB as bulk inserts/updates. Each bulk insert/update is implemented as @Transactional(REQUIRED), i.e. the default setting. Therefore, a received group of Kafka messages often means more than one database transaction.
I know that a DB-commit is expensive in terms of performance. I used the following settings when configuring our Spring-based data sources:
useConfigs=maxPerformance
rewriteBatchedStatements=true
prepStmtCacheSize=256
prepStmtCacheSqlLimit=2048
This did improve performance, but not to the desired benchmark of 500K-600K DB-writes/min.
Question to you, colleagues: is it OK, from the standpoint of software architecture and for the sake of performance, to annotate our "business facade" class as @Transactional(REQUIRED) and the DB-layer classes as @Transactional(SUPPORTS)? That way I would have only one transaction per group/bulk of Kafka messages and increase the DB-write rate by avoiding "excessive" commits.
Personally, I'm a bit hesitant about this change. On the one hand, I'm breaking the boundaries of responsibility between the individual classes/layers: "high-level" business logic classes should know nothing about transaction management, and the persistence-layer classes should treat DB transactions as their core task. On the other hand, unwanted "cross-dependencies" arise: if an update for a table XYZ fails, a rollback is also made for another table ABC, although everything ran smoothly there (remember, all tables now get their updates and inserts within one transaction!).
What do you think about this potential change in transaction management? How can you fine-tune a Spring Boot application to achieve higher write rates (configuration or maybe implementation changes)?
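To make the commit-count argument concrete, here is a minimal toy model in plain Java. It is not Spring: ToyTxManager, BatchDemo and their methods are hypothetical names that only imitate the "join the open transaction, or start a new one" rule of REQUIRED. With a transaction already open at the facade, a SUPPORTS method in the persistence layer would join it in the same way.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a transaction manager. It only counts commits and rollbacks. */
class ToyTxManager {
    private int depth = 0;               // greater than 0 while a transaction is open
    private boolean rollbackOnly = false;
    int commits = 0;
    int rollbacks = 0;

    /** Mimics REQUIRED: join the open transaction, or start a new one. */
    void required(Runnable work) {
        boolean outermost = (depth == 0);
        depth++;
        try {
            work.run();
        } catch (RuntimeException e) {
            rollbackOnly = true;         // any failure dooms the whole transaction
            throw e;
        } finally {
            depth--;
            if (outermost) {
                if (rollbackOnly) rollbacks++; else commits++;
                rollbackOnly = false;
            }
        }
    }
}

class BatchDemo {
    /** Persistence layer: one bulk write per table. */
    static void writeTable(ToyTxManager tx, String table, List<String> written) {
        tx.required(() -> written.add(table)); // a bulk insert/update would go here
    }

    /** Current setup: no transaction at the facade, so every table commits on its own. */
    static int commitsPerTable(List<String> tables) {
        ToyTxManager tx = new ToyTxManager();
        List<String> written = new ArrayList<>();
        for (String table : tables) writeTable(tx, table, written);
        return tx.commits;
    }

    /** Proposed setup: the facade opens the transaction and the table writes join it. */
    static int commitsPerBatch(List<String> tables) {
        ToyTxManager tx = new ToyTxManager();
        List<String> written = new ArrayList<>();
        tx.required(() -> {
            for (String table : tables) writeTable(tx, table, written);
        });
        return tx.commits;
    }
}
```

For three tables, commitsPerTable counts three commits and commitsPerBatch counts one. The same model also shows the cross-dependency from the question: if one table write throws inside the facade-level transaction, the single outer transaction is rolled back, including the writes that succeeded.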
I've been struggling to decide whether to keep JPA entities as classes separate from the domain entities within a single bounded context. I've faced the following choices:
Use separate domain classes for aggregate roots, aggregates, etc., with domain repositories that wrap the Spring JPA repositories and converters that map JPA entities <> domain entities with only the required data
Lazy loading has to be given up unless the mappers/converters handle it inside the domain repositories, which is overkill.
When saving objects, there might be related aggregate roots (one-to-many relationships). In complex logic I then had to take extreme care of the state of the domain entity before passing it to the domain repository: either fill it with all related data or map it (with another method in the converter) without relationship data (no cascading applied when JPA persists)
A lot of duplicated code to avoid such situations, even for very simple use cases
Or use JPA entities as my domain entities. So far I have found multiple examples/opinions in favour of this, such as
https://github.com/citerus/dddsample-core/tree/Spring_Annotations_Autowire
http://www.javamagazine.mozaicreader.com/MayJune2018/Twitter#&pageSet=50&page=0
Should JPA entities and DDD entities be the same classes?
DDD, domain entities/VO and JPA
How to implement DDD using Spring Crud/Jpa Repository
On the other hand, there are opinions like this
Is it a good practice to use JPA entities as domain models?
My question, in the long run and from experience:
What would cost more effort & time?
Are both approaches acceptable as practices?
What are the pros and cons of both?
What would cost more effort & time?
Decoupling almost always does. It's a trade-off.
Are both approaches acceptable as practices?
Yes. I see many conflicting opinions on both approaches, but really, they're just opinions. Both are used in practice, and both have a cost.
What are the pros and cons of both?
Using JPA entities as domain entities (1) notably reduces the time cost and (2) lets you use lazy loading on relationships, which avoids extra code in the application service. The second point holds only if you do not follow the rule of referencing other aggregates by ID; that rule is also a matter of opinion, but following it costs you JPA's lazy loading.
One downside of this approach, as I see it, is unit testing. A unit test should not depend on starting up a container, a database, etc. It should purely test business logic, and that is not fully possible with such frameworks. See this answer for example
JPA Entity must be unit tested and how?
Keeping JPA entities as separate classes in the infrastructure layer, with wrapper repositories, makes it easier to mock data in unit tests and to test purely the domain (business rules). In contrast to the previous pros, it costs you the mapping effort and time, plus a lot of duplicated code for mapping, wrapping repositories, etc. It also brings the headache (and this should be counted as a pro) of caring about the state of your domain entity, because mapping nulls onto the JPA entity will affect how relationships are written to your persistence store, and you REALLY SHOULD CARE about the state of your domain entity.
Automatic ORM lazy loading is also no longer available for free. Either:
1- You put a reference to other aggregates as a member in your aggregate root (breaking the rule of referencing aggregates by ID) and handle that in the mappers, or
2- You fetch from the repository only the required data of the aggregate root, with the other aggregates' IDs as reference members. This requires well-defined queries in the repository implementation, so it means writing and customizing a lot of queries instead of using the default ones, which return full JPA entities with lazy-loaded references ready to use.
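As a rough illustration of the "separate classes plus mapper" option, here is a minimal sketch in plain Java. All names (OrderRecord, Order, OrderMapper, the status strings) are hypothetical; in a real project OrderRecord would carry the JPA annotations, which are left out so the example runs without a container.

```java
import java.util.Objects;

/** Persistence-side class; in a real project this would be the annotated JPA entity. */
class OrderRecord {
    Long id;
    Long customerId;
    String status;
}

/** Domain-side aggregate root: refers to the other aggregate by ID only. */
final class Order {
    private final Long id;
    private final Long customerId;
    private String status;

    Order(Long id, Long customerId, String status) {
        this.id = id;
        // The domain object guards its own state, so nulls never reach the mapper.
        this.customerId = Objects.requireNonNull(customerId, "customerId");
        this.status = Objects.requireNonNull(status, "status");
    }

    /** A business rule that can be unit tested without any database. */
    void cancel() {
        if ("SHIPPED".equals(status)) {
            throw new IllegalStateException("a shipped order cannot be cancelled");
        }
        status = "CANCELLED";
    }

    Long id() { return id; }
    Long customerId() { return customerId; }
    String status() { return status; }
}

/** The mapping code that this approach obliges you to write and maintain. */
final class OrderMapper {
    static Order toDomain(OrderRecord record) {
        return new Order(record.id, record.customerId, record.status);
    }

    static OrderRecord toRecord(Order order) {
        OrderRecord record = new OrderRecord();
        record.id = order.id();
        record.customerId = order.customerId();
        record.status = order.status();
        return record;
    }
}
```

The mapper is the duplicated code mentioned above; the payoff is that Order.cancel() can be tested with a plain constructor call.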
We have a database that manages codes, such as a list of valid currencies, a list of country codes, etc (hereinafter known as CodesDB).
We also have multiple microservices that in a monolithic app + database would have foreign key constraints to rows in tables in the CodesDB.
When a microservice receives a request to modify data, what are my options for ensuring the codes passed in the request are valid?
I am currently leaning towards having the CodesDB microservice post an event onto a service bus whenever a code is added or modified. Each other microservice interested in that type of code (country / currency / etc.) can then issue an API request to the CodesDB microservice to grab the state it needs and reflect the changes in its own local DB. That way we get referential integrity within each microservice DB.
Is this the correct approach? Are there any other recommended approaches?
Asynchronous event-based notification is a pattern commonly used in the microservices world for ensuring eventual consistency. Depending on how strict your consistency requirements are, you may have to add further checks.
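The event-then-fetch flow described in the question could look roughly like the following sketch. LocalCodeReplica and its methods are hypothetical, and the Function passed to the constructor stands in for an HTTP client of the CodesDB microservice; a real service would persist the copy in its own database rather than in memory.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Local copy of the codes that one microservice cares about. */
final class LocalCodeReplica {
    // Hypothetical client: given a code type, returns the current codes from CodesDB.
    private final Function<String, Set<String>> codesApi;
    private final Map<String, Set<String>> codesByType = new ConcurrentHashMap<>();

    LocalCodeReplica(Function<String, Set<String>> codesApi) {
        this.codesApi = codesApi;
    }

    /** Event handler: called when the bus announces a change for a code type. */
    void onCodesChanged(String type) {
        codesByType.put(type, Set.copyOf(codesApi.apply(type)));
    }

    /** Request-time validation against the local copy, with no remote call. */
    boolean isValid(String type, String code) {
        return codesByType.getOrDefault(type, Set.of()).contains(code);
    }
}
```

Between a change in CodesDB and the arrival of the event, isValid answers from stale data. That window is the eventual consistency you accept with this design.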
Other possible approaches are:
Read-only data stores using a materialized view. This is a form of the CQRS pattern in which data from multiple services is stored in denormalized form in a read-only data store. The data is updated asynchronously using the approach mentioned above. Consumers get fast access to the data without having to query multiple services.
Caching - you could also use a distributed or replicated cache, depending on your performance and consistency requirements.
I am trying to convert a monolithic application to a microservice-oriented architecture. On the back end I am using the Spring and Spring Boot frameworks, on the front end Angular 2, and PostgreSQL as the database.
My confusion is this: when I design my databases as distributed, according to functionality, there may be 5 databases. That is, I am designing by vertical partition. I would then implement inter-microservice communication to achieve the entire functionality.
The other way I am considering is to partition the current structure horizontally. My domain is educational universities, so half of the universities would go into one DB and the rest into another DB, and the services would be deployed per region (two deployments for the two sets of universities).
Currently I have decided to continue with the last-mentioned approach. I am new to this kind of task, since it is an architecture task, and I am also a beginner in the microservice and distributed database world. Could someone confirm that my approach will solve my problem? Can I continue with my second approach - horizontal partitioning of databases according to domain object?
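For illustration only, horizontal partitioning by university could be routed as in the sketch below. UniversityShardRouter is a hypothetical class, and assigning universities to databases by ID modulo the shard count is an assumption; the question only says that half of the universities go to each DB.

```java
import java.util.List;

/** Picks the database that holds all data of a given university. */
final class UniversityShardRouter {
    private final List<String> shardUrls;

    UniversityShardRouter(List<String> shardUrls) {
        if (shardUrls.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shardUrls = List.copyOf(shardUrls);
    }

    /** The same university always maps to the same shard. */
    String shardFor(long universityId) {
        // floorMod keeps the index non-negative even for negative IDs.
        int index = (int) Math.floorMod(universityId, (long) shardUrls.size());
        return shardUrls.get(index);
    }
}
```

Every query then has to carry the university ID so that it can be routed, and queries spanning universities on different shards need extra handling.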
Can I continue with my second approach - horizontal partitioning of databases according to domain object?
Temporarily yes, if based on that you are able to scale your current system to meet your needs.
Now let's think about why you want to move to microservices as a development style in the first place.
Small components - easier to manage
Independently deployable - continuous delivery
Multiple languages
Code organized around business capabilities
and so on.
When moving to microservices, you should not have multiple services reading directly from each other's databases, because that makes them tightly coupled.
One service should be completely ignorant of how another service has designed its internal structure.
If you want to move towards microservices and take full advantage of them, you should partition vertically, as you say, and have the services talk to each other.
Moving towards microservices will also bring you lots and lots of other problems. I have tried to compile how one should get started with microservices at this link.
How to separate services that read data from the same table:
Let's first create a dummy example: we have three services, Order, Shipping and Customer, all of them separate microservices.
Multiple services can require data from the same table in the following ways:
Service one needs to read data from another service for things like validation.
The Order and Shipping services might need some data from the Customer service to complete their operations.
E.g.: when placing an order, one calls the Order Service API with a customer ID, and the Order Service might need to validate whether it is a valid customer.
One approach: database-level exposure (not recommended). Use the same customer table, which binds the Order service to the Customer service implementation.
Another approach: call the other service to get the data.
Variation 1: call the Customer service to check whether the customer exists and get some customer data, such as the name, and save it in the Order service.
Variation 2: do not validate while placing the order; on the OrderPlaced event, check asynchronously with the Customer service, validate, and update the state of the order if required.
I recommend calling the other service to get the data, choosing the variation based on the consistency you want.
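Variation 1 could be sketched as follows. CustomerClient, PlacedOrder and OrderService are hypothetical names; in practice CustomerClient would be implemented with an HTTP call to the Customer service API, which is replaced here by an interface so that the example stays self-contained.

```java
import java.util.Optional;

/** The Order service's view of the Customer service API (hypothetical). */
interface CustomerClient {
    Optional<String> findCustomerName(long customerId);
}

/** The order keeps its own snapshot of the customer data it needs. */
final class PlacedOrder {
    final long customerId;
    final String customerName;
    final String item;

    PlacedOrder(long customerId, String customerName, String item) {
        this.customerId = customerId;
        this.customerName = customerName;
        this.item = item;
    }
}

final class OrderService {
    private final CustomerClient customers;

    OrderService(CustomerClient customers) {
        this.customers = customers;
    }

    /** Validates the customer through the other service, never through its table. */
    PlacedOrder placeOrder(long customerId, String item) {
        String name = customers.findCustomerName(customerId)
                .orElseThrow(() -> new IllegalArgumentException("unknown customer " + customerId));
        return new PlacedOrder(customerId, name, item);
    }
}
```

The Order service depends only on the CustomerClient contract, so the Customer service can change its tables freely. The stored name is a snapshot and can go stale, which is the consistency trade-off mentioned above.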
In some use cases you want a single transaction spanning data from multiple services.
E.g.: deleting a customer. You might want all orders of that customer to be deleted as well.
In this case you need to deal with eventual consistency: service one raises an event and service two reacts accordingly.
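A minimal in-memory sketch of that flow is shown below. InMemoryBus and OrderStore are hypothetical stand-ins for a message broker and for the Order service's own storage; deliverPending is called by hand here, whereas a real broker would deliver asynchronously.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.LongConsumer;

/** Stand-in for a message broker carrying "customer deleted" events. */
final class InMemoryBus {
    private final List<LongConsumer> handlers = new ArrayList<>();
    private final Queue<Long> pending = new ArrayDeque<>();

    void subscribeCustomerDeleted(LongConsumer handler) {
        handlers.add(handler);
    }

    /** Called by the Customer service after it has deleted the customer. */
    void publishCustomerDeleted(long customerId) {
        pending.add(customerId);
    }

    /** Delivers queued events to all subscribers. */
    void deliverPending() {
        Long customerId;
        while ((customerId = pending.poll()) != null) {
            for (LongConsumer handler : handlers) {
                handler.accept(customerId);
            }
        }
    }
}

/** The Order service's own data, keyed by customer ID. */
final class OrderStore {
    private final Map<Long, List<String>> ordersByCustomer = new HashMap<>();

    void add(long customerId, String orderId) {
        ordersByCustomer.computeIfAbsent(customerId, k -> new ArrayList<>()).add(orderId);
    }

    /** Reaction to the event: remove the orders of the deleted customer. */
    void onCustomerDeleted(long customerId) {
        ordersByCustomer.remove(customerId);
    }

    int countFor(long customerId) {
        return ordersByCustomer.getOrDefault(customerId, List.of()).size();
    }
}
```

Between publishCustomerDeleted and deliverPending the orders of a deleted customer still exist. That gap is what "eventual" means here, and the rest of the system has to tolerate it.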
If this answers your question, fine; otherwise, please specify in what kind of scenario multiple services need to call another service.
If that still does not solve it, you can email me at puneetjindal.11#gmail.com and I will answer.
Currently I have decided to continue with the last-mentioned approach.
If you want horizontal scalability (scaling for an increasingly large number of client connections) for your database, you may be better off with a technology that was designed to work as a scalable, distributed system, something like CockroachDB or a NoSQL store. CockroachDB, for example, has built-in data sharding and replication and lets you grow by adding server nodes as required.
when I design my databases as distributed, according to functionality, there may be 5 databases
It sounds like you had the right general idea: split by domain functionality. Here's a link to a previous answer regarding general DB design with microservices.
In the microservices world, each microservice owns a set of functionalities and the data manipulated by those functionalities. If a microservice needs data owned by another microservice, it cannot go directly to the database maintained/owned by that microservice; instead, it calls an API exposed by the other microservice.
Regarding the placement of data, there are various options. You can store the data owned by a microservice in a NoSQL database like MongoDB, DynamoDB or Cassandra (it really depends on the microservice's use case), OR you can have a different table for each microservice in a single instance of a SQL database. BUT remember: if you choose a single SQL database instance with multiple tables, there must be no joins (basically no interaction) between tables owned by different microservices.
I would suggest you start small and then think about database scaling issues when the usage of the system grows.
In our project we're trying to apply the Bounded Context approach, and we've run into a rather obvious performance problem. E.g., we have different classes (in different contexts) representing a user in the system: Person in our core domain's context and User in the security context. So we have two different repositories, one for each aggregate, but they use the same table in the DB and sometimes access the same data.
Is there a common solution to minimize DB roundtrips in this case? Are there ORMs that deal with this, or should we code some caching system ourselves?
upd: the db is from legacy app, and we'll have to use it "as is"
So we have two different repositories, one for each aggregate, but they use the same table in the DB and sometimes access the same data.
The fact that you have two aggregates stored in the same table is an indication of a problem with the design. In this case, it seems you have two bounded contexts - a BC for the core domain (Person is here) and an identity/access BC (User is here). The BCs are related and the latter can be seen as upstream from the former. A Person in the core domain has a corresponding User in the identity BC, but they are not exactly the same thing.
Beyond this relationship between the BCs, there are questions regarding ownership of behavior. For example, both a Person and a User may have a name, and what has to be determined is who owns the behavior of changing a name. This can be implemented in several ways. Person may own its name, with changes propagated to the identity BC. Alternatively, User may own changes to the name, in which case they must be propagated to Person via a synchronization mechanism.
Overall, your problem could be addressed in two ways. First, you can store the Person and User aggregates in different tables. Any given query should use only one of these tables, and they can be synchronized in an eventually consistent manner. Another approach is to decouple the behavioral domain model from a model designed for queries (a read model). This way, you can create a read model designed to serve one or more specific screens and have a customized query, perhaps even outside of an ORM.
If all Users are Persons too (sometimes external services are modeled as special users as well), the only data that User and Person should share in the database are their identifiers.
Indeed, each entity in a domain model should hold references only to the data it needs to ensure its invariants.
Moreover, I guess that Users are identified by username and Persons by something else (a VAT code or similar).
Thus, the simplest optimization technique is to avoid encapsulating in an entity any information that is not required to ensure its invariants.
Furthermore, you simply need an effective context mapping technique to get easily from User to Person when needed. I use shared identifiers for this.
As an example, you can expose the Person's identifier in the User class, so that a simple query to the Person repository provides the data you need.
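The shared-identifier idea could be sketched like this. The class and field names are hypothetical, and the in-memory map stands in for the Person repository's real data access.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Security context: holds only what authentication needs, plus the shared ID. */
final class User {
    final String username;
    final long personId; // shared identifier pointing into the core domain

    User(String username, long personId) {
        this.username = username;
        this.personId = personId;
    }
}

/** Core domain context. */
final class Person {
    final long id;
    final String fullName;

    Person(long id, String fullName) {
        this.id = id;
        this.fullName = fullName;
    }
}

/** Stand-in for the Person repository of the core domain. */
final class PersonRepository {
    private final Map<Long, Person> byId = new HashMap<>();

    void save(Person person) {
        byId.put(person.id, person);
    }

    Optional<Person> findById(long id) {
        return Optional.ofNullable(byId.get(id));
    }
}
```

Code that starts from a User loads the Person only when it actually needs core-domain data, for example with personRepository.findById(user.personId), so the security context never reads Person columns on its own.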
Finally, I recommend Vaughn Vernon's series on aggregate root design.