I have an application which stores entities in MariaDB. The entity has some relations to other entities, and one of the child entities has a binary attribute. See the following:
Parent
\ 1-n ChildA
- attrA1
- ...
\ 1-n ChildB
- attrB1
- binaryAttr
Storing the binary files in the DB of course has an impact on the DB size, and thereby on our backup concept. Besides, this binary data doesn't necessarily have to live in the DB at all.
I'm thinking about combining MariaDB with an S3 compatible object store, so that the structured data is persisted in the DB and the binary files in the object store.
We are using Spring Boot (Data-JPA) with Apache Camel.
The naïve approach would be to store the binary file under a uuid key in the object store and afterwards persist the rest of the entity, with the reference (uuid), to the DB.
But there's no easy transaction handling for this, nor transparent handling of the entity (I have to handle persisting the entity and the binary data separately).
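To make the drawback concrete, here is a minimal sketch of that naïve approach, assuming an AWS SDK v2 S3Client bean and a hypothetical ChildB entity with a ChildBRepository (Spring Data JPA). The S3 upload happens outside the JPA transaction, so the two writes can still drift apart:

```java
import java.util.UUID;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.DeleteObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

@Service
public class ChildBService {

    private static final String BUCKET = "my-bucket"; // assumed bucket name

    private final S3Client s3;                        // injected S3 client
    private final ChildBRepository repository;        // hypothetical Spring Data JPA repository

    public ChildBService(S3Client s3, ChildBRepository repository) {
        this.s3 = s3;
        this.repository = repository;
    }

    @Transactional
    public ChildB store(ChildB entity, byte[] binaryData) {
        String key = UUID.randomUUID().toString();

        // 1. Upload the binary payload; this is NOT part of the DB transaction.
        s3.putObject(PutObjectRequest.builder().bucket(BUCKET).key(key).build(),
                RequestBody.fromBytes(binaryData));

        // 2. Persist the entity with only the uuid reference to the object.
        entity.setBinaryKey(key);
        try {
            return repository.save(entity);
        } catch (RuntimeException e) {
            // Best-effort compensation: remove the orphaned object. With JPA the insert
            // may only fail at flush/commit time, so this does not fully close the gap -
            // which is exactly the transaction-handling drawback described above.
            s3.deleteObject(DeleteObjectRequest.builder().bucket(BUCKET).key(key).build());
            throw e;
        }
    }
}
```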
Is there a java / spring based framework to overcome the drawbacks (transaction handling, transparent handling) of two different "datasources"?
What is the best practice to handle such a scenario or is it totally unrecommended?
Is there an extension mechanism for Spring / Hibernate to handle object store persistence within the Hibernate ORM mapping process?
Is it possible to implement one repository for persisting / loading entity and binary data at once?
Can anyone give me some direction?
I'm considering using spring-data-jdbc for a project.
But I don't have any control over the DB schema.
My domain model can be populated from the existing tables, but they differ in many ways.
Examples:
A specific aggregate in my model consists of nested Value-Objects. The corresponding table only features flat columns, so the nested Value-Objects would have to be mapped manually.
On the other hand, there are aggregates that don't have many nested Value-Objects, but the corresponding tables are organized according to a star schema, so the values are distributed over many tables (instead of a single one).
I guess this prevents me from using many of the Quality-Of-Life features (like Query-Derivation and Mapping).
Do I actually get anything significant out of spring-data-jdbc in comparison to using a plain JdbcTemplate in this scenario?
The scenario you describe would make me tend towards plain JdbcTemplate.
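With plain JdbcTemplate, the manual mapping you describe (nested Value-Objects over flat columns) might look roughly like this; the CUSTOMER table and its columns are just placeholders for your existing schema:

```java
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Repository;

// Hypothetical aggregate with a nested value object
record Address(String street, String city) {}
record Customer(long id, String name, Address address) {}

@Repository
class CustomerDao {

    private final JdbcTemplate jdbcTemplate;

    CustomerDao(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    Customer findById(long id) {
        // The existing table is flat (STREET, CITY as plain columns),
        // so the nested Address value object is assembled by hand.
        return jdbcTemplate.queryForObject(
                "SELECT ID, NAME, STREET, CITY FROM CUSTOMER WHERE ID = ?",
                (rs, rowNum) -> new Customer(
                        rs.getLong("ID"),
                        rs.getString("NAME"),
                        new Address(rs.getString("STREET"), rs.getString("CITY"))),
                id);
    }
}
```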
But I would consider using the aggregate approach Spring Data JDBC uses:
Load complete aggregates
Reference between aggregates using ids, or something like an AggregateReference
And if you have an aggregate that actually can be mapped using Spring Data JDBC you can still do that.
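For the aggregates that do map cleanly, a sketch of that style with Spring Data JDBC could look like this (names are illustrative only):

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.jdbc.core.mapping.AggregateReference;
import org.springframework.data.repository.CrudRepository;

// Aggregate root, loaded and saved as a whole by Spring Data JDBC
class PurchaseOrder {
    @Id Long id;
    String orderNumber;
    // Reference to another aggregate by id only, not by a mapped object graph
    AggregateReference<Customer, Long> customer;
}

class Customer {
    @Id Long id;
    String name;
}

interface PurchaseOrderRepository extends CrudRepository<PurchaseOrder, Long> {}
interface CustomerRepository extends CrudRepository<Customer, Long> {}
```

An order would then reference its customer via AggregateReference.to(customerId), and each repository always loads its complete aggregate.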
I'm using Spring Data JPA to expose REST APIs. In my application there are two types of tables (current and archival); the structure of the current and archival tables is identical, and data is moved from the current table to the archival table over time for performance reasons. I have repository classes to retrieve data from the current and archival tables separately, and pagination is also implemented for those repositories.
Now I have a requirement to fetch the eligible records from both tables based on criteria and apply pagination in a single shot. Is this possible with Spring Data JPA?
You can keep the latest version in both tables and when you search for data you just do a regular search.
Another option would be to create a view over the two tables (see the sketch below).
I also think Hibernate Envers might be able to do that, though I never tried it.
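A sketch of the view option, assuming Spring Boot 3 (jakarta.persistence) and made-up table/column names; the view unions the two identically structured tables so a single repository can page over both:

```java
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.persistence.Table;

import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.repository.PagingAndSortingRepository;

// Assumes a database view over the two identically structured tables, e.g.:
// CREATE VIEW combined_record AS
//   SELECT * FROM current_record
//   UNION ALL
//   SELECT * FROM archival_record;
@Entity
@Table(name = "combined_record")
class CombinedRecord {
    @Id Long id;
    String status; // example criteria column
}

// Pagination and query derivation then work across both tables at once
interface CombinedRecordRepository extends PagingAndSortingRepository<CombinedRecord, Long> {
    Page<CombinedRecord> findByStatus(String status, Pageable pageable);
}
```

Since the entity is backed by a view, it should be treated as read-only; writes would still go through the existing repositories for the current table.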
I am trying to convert a monolithic application into a microservice-oriented architecture. On the back end I am using the Spring and Spring Boot frameworks; on the front end I am using Angular 2, and PostgreSQL as the database.
My confusion is this: when I design my databases as distributed, split by functionality, it may end up with 5 databases, meaning I'm designing according to a vertical partition. I'm then thinking of implementing inter-microservice communication to achieve the overall functionality.
The other way I'm considering is to horizontally partition the current structure. My domain is based on universities, so half of the universities would go into one DB and the rest into another DB, and the services would be deployed across two regions (one for each set of universities).
Currently I have decided to continue with the last-mentioned approach. I am new to this kind of architectural task, and a beginner in the microservice and distributed-database world. Can someone confirm that my approach will solve my issue? Can I continue with my second approach - horizontal partitioning of databases according to the domain object?
Can I continue with my second approach - horizontal partitioning of databases according to the domain object?
Temporarily yes, if based on that you are able to scale your current system to meet your needs.
Now let's think about why you want to move to microservices as a development style in the first place:
Small components - easier to manage
Independently deployable - continuous delivery
Multiple Languages
The code is organized around business capabilities
and .....
When moving to microservices, you should not have multiple services reading directly from each other's databases, as that makes them tightly coupled.
One service should be completely ignorant of how another service designed its internal structure.
Now if you want to move towards microservices and take full advantage of that, you should have the vertical partition, as you say, with services talking to each other.
Also, while moving towards microservices you will run into lots and lots of other problems. I tried compiling how one should start on microservices at this link.
How to separate services which are reading data from the same table:
Now let's first create a dummy example: we have three services, Order, Shipping, and Customer, all three different microservices.
The following are ways in which multiple services require data from the same table:
One service needs to read data from another service for things like validation.
The Order and Shipping services might need some data from the Customer service to complete their operation.
E.g.: while placing an order, one will call the Order Service API with a customer id; the Order Service might then need to validate whether it's a valid customer or not.
One approach: database-level exposure -- not recommended -- use the same customer table -- which binds the Order service to the Customer service implementation.
Another approach: call another service to get the data.
Variation 1: call the Customer service to check whether the customer exists, get some customer data like name, and save this in the Order service.
Variation 2: do not validate while placing the order; on the OrderPlaced event, check asynchronously with the Customer service, validate, and update the state of the order if required.
I recommend the "call another service to get data" approach, choosing the variation based on the consistency you want.
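A rough sketch of variation 1, where the Order service asks the Customer service over its API instead of touching its tables; the URL and the CustomerDto fields are assumptions, not a fixed contract:

```java
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

// Minimal DTO for the data the Order service actually needs
record CustomerDto(String id, String name) {}

@Service
class CustomerClient {

    private final RestTemplate restTemplate = new RestTemplate(); // a sketch; usually injected

    CustomerDto fetchCustomer(String customerId) {
        // Synchronous validation while placing the order: if this lookup fails,
        // the Order service can reject the order without ever reading the customer table.
        return restTemplate.getForObject(
                "http://customer-service/customers/{id}", CustomerDto.class, customerId);
    }
}
```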
In some use cases you want a single transaction between data from multiple services.
For example: deleting a customer. You might want all orders of the customer to be deleted as well.
In this case you need to deal with eventual consistency: service one raises an event and then service two reacts accordingly.
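As a rough in-process illustration of that flow (in a real system the event would travel over a broker such as Kafka or RabbitMQ, and the names here are made up):

```java
import org.springframework.context.ApplicationEventPublisher;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

// Event published by the Customer service when a customer is removed
record CustomerDeleted(String customerId) {}

@Component
class CustomerDeletionService {

    private final ApplicationEventPublisher events;

    CustomerDeletionService(ApplicationEventPublisher events) {
        this.events = events;
    }

    void deleteCustomer(String customerId) {
        // ... delete the customer in this service's own database ...
        events.publishEvent(new CustomerDeleted(customerId));
    }
}

@Component
class OrderEventHandler {

    @EventListener
    void on(CustomerDeleted event) {
        // Service two reacts eventually: delete (or flag) all orders of that customer.
    }
}
```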
Now if this answers your question, OK; otherwise specify in what kind of scenario multiple services need to call another service.
If it's still not solved, you could email me at puneetjindal.11#gmail.com and I will answer you.
Currently I have decided to continue with the last-mentioned approach.
If you want horizontal scalability (scaling for an increasingly large number of client connections) for your database, you may be better off with a technology that was designed to work as a scalable, distributed system, something like CockroachDB or a NoSQL database. CockroachDB, for example, has built-in data sharding and replication and allows you to grow by adding server nodes as required.
when I design my databases as distributed, split by functionality, it may end up with 5 databases
It sounds like you had the right general idea - split by domain functionality. Here's a link to a previous answer regarding general DB design with microservices.
In the microservices world, each microservice owns a set of functionalities and the data manipulated by those functionalities. If a microservice needs data owned by another microservice, it cannot go directly to the database maintained/owned by the other microservice; rather, it calls an API exposed by the other microservice.
Now, regarding the placement of data, there are various options - you can store the data owned by a microservice in a NoSQL database like MongoDB, DynamoDB, or Cassandra (it really depends on the microservice's use case), OR you can have a different table for each microservice in a single instance of a SQL database. BUT remember, if you choose a single instance of a SQL database with multiple tables, there must be no joins (basically no interaction) between tables owned by different microservices.
I would suggest you start small and then think about database scaling issues when the usage of the system grows.
In a microservices architecture, each microservice has its own database and tables should not be duplicated in different databases.
But there are tables, like lookup tables (also called reference tables), that are needed by multiple microservices.
Should we put lookup tables in each microservice database, or is it better to create a new microservice with a database holding all the lookup tables?
Lookup tables will usually contain read-only data (they are like view models), so they can be made available across the system in whatever technical solution you choose.
A shared read-only database table, a distributed cache, a CDN...
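One possible shape of the distributed-cache option: each microservice pulls the read-only lookup data from wherever it is published and caches it locally. The URL, the CountryCode type, and the cache name are illustrative; caching also needs @EnableCaching on a configuration class:

```java
import java.util.List;

import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

// Illustrative lookup value
record CountryCode(String code, String name) {}

@Service
class LookupClient {

    private final RestTemplate restTemplate = new RestTemplate(); // a sketch; usually injected

    @Cacheable("countryCodes")
    List<CountryCode> countryCodes() {
        // The lookup data is read-only, so serving it from a cache (or a CDN) is safe.
        CountryCode[] codes = restTemplate.getForObject(
                "http://lookup-service/country-codes", CountryCode[].class);
        return codes == null ? List.of() : List.of(codes);
    }
}
```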
Make sense?