Why are interface projections much slower than constructor projections and entity projections in Spring Data JPA with Hibernate?

I've been wondering which kind of projection I should use, so I ran a small test covering 5 types of projections (based on the docs: https://docs.spring.io/spring-data/jpa/docs/current/reference/html/#projections):
1. Entity projection
This is just a standard findAll() provided by Spring Data repository. Nothing fancy here.
Service:
List<SampleEntity> projections = sampleRepository.findAll();
Entity:
@Entity
@Table(name = "SAMPLE_ENTITIES")
public class SampleEntity {
@Id
private Long id;
private String name;
private String city;
private Integer age;
}
2. Constructor projection
Service:
List<NameOnlyDTO> projections = sampleRepository.findAllNameOnlyConstructorProjection();
Repository:
@Query("select new path.to.dto.NameOnlyDTO(e.name) from SampleEntity e")
List<NameOnlyDTO> findAllNameOnlyConstructorProjection();
Data transfer object:
@NoArgsConstructor
@AllArgsConstructor
public class NameOnlyDTO {
private String name;
}
3. Interface projection
Service:
List<NameOnly> projections = sampleRepository.findAllNameOnlyBy();
Repository:
List<NameOnly> findAllNameOnlyBy();
Interface:
public interface NameOnly {
String getName();
}
4. Tuple projection
Service:
List<Tuple> projections = sampleRepository.findAllNameOnlyTupleProjection();
Repository:
@Query("select e.name as name from SampleEntity e")
List<Tuple> findAllNameOnlyTupleProjection();
5. Dynamic projection
Service:
List<DynamicProjectionDTO> projections = sampleRepository.findAllBy(DynamicProjectionDTO.class);
Repository:
<T> List<T> findAllBy(Class<T> type);
Data transfer object:
public class DynamicProjectionDTO {
private String name;
public DynamicProjectionDTO(String name) {
this.name = name;
}
}
Some additional info:
The project was built using gradle spring boot plugin (version 2.0.4), which uses Spring 5.0.8 under the hood. Database: H2 in memory.
Results:
Entity projections took 161.61 ms on average out of 100 iterations.
Constructor projections took 24.84 ms on average out of 100 iterations.
Interface projections took 252.26 ms on average out of 100 iterations.
Tuple projections took 21.41 ms on average out of 100 iterations.
Dynamic projections took 23.62 ms on average out of 100 iterations.
-----------------------------------------------------------------------
One iteration retrieved (from DB) and projected 100 000 objects.
-----------------------------------------------------------------------
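Averages like the ones above could be collected with a plain timing loop. The following is only a minimal sketch of such a loop, not the harness from the linked test repository: the class name `ProjectionTimer`, the method `averageMillis` and the stand-in workload are assumptions made for illustration. In a real run the `Runnable` would wrap a repository call such as `sampleRepository.findAll()`.

```java
class ProjectionTimer {

    /** Runs the task the given number of times and returns the mean wall-clock time in milliseconds. */
    static double averageMillis(int iterations, Runnable task) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        long totalNanos = 0L;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        // nanoseconds -> milliseconds, then average over all iterations
        return totalNanos / 1_000_000.0 / iterations;
    }

    public static void main(String[] args) {
        // Stand-in workload (hypothetical); replace with the repository call under test.
        double avg = averageMillis(100, () -> new StringBuilder("sample").reverse());
        System.out.printf("Stand-in task took %.2f ms on average out of 100 iterations.%n", avg);
    }
}
```

A loop like this ignores JIT warm-up and garbage collection pauses, so the absolute numbers should be read as rough indications rather than precise measurements.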
Notes:
It is understandable that retrieving entities takes some time. Hibernate tracks these objects for changes, lazy loading and so on.
Constructor projections are really fast and have no limitations on the DTO side, but require manual object creation in the @Query annotation.
Interface projections turned out to be really slow. See question.
Tuple projections were the fastest, but are not the most convenient to work with. They need an alias in JPQL and the data has to be retrieved by calling .get("name") instead of .getName().
Dynamic projections look pretty cool and fast, but must have exactly one constructor. No more, no less. Otherwise Spring Data throws an exception, because it doesn't know which one to use (it takes constructor parameters to determine which data to retrieve from DB).
Question:
Why do interface projections take longer than retrieving entities? Each interface projection returned is actually a proxy. Is it so expensive to create that proxy? If so, doesn't it defeat the main purpose of projections (since they are meant to be faster than entities)? The other projections look awesome though. I would really love some insight on this. Thank you.
EDIT :
Here is the test repository: https://github.com/aurora-software-ks/spring-boot-projections-test in case you want to run it yourself. It is very easy to set up. Readme contains everything you need to know.

I experienced similar behavior with an older version of Spring Data and this was my take on it: https://arnoldgalovics.com/how-much-projections-can-help/
I had a talk with Oliver Gierke (Spring Data lead) and he made some improvements (that's why you get such "good" results :-) ), but basically there will always be a cost to having abstractions versus coding it manually.
This is a trade-off like everything else. On one hand you get flexibility, easier development and less maintenance (hopefully); on the other hand you get full control and a slightly uglier query model.
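To make the cost of the abstraction more concrete, here is a minimal sketch that backs the `NameOnly` interface from the question with a JDK dynamic proxy and compares it with a hand-written DTO. This is an analogy only: Spring Data's own projection machinery is more involved, and the names `ProxyProjectionSketch`, `proxyFor` and `NameOnlyDto` are made up for this example. The point is that every getter call on the proxy goes through a reflective handler and a map lookup, while the DTO getter is a plain field read.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Collections;
import java.util.Map;

class ProxyProjectionSketch {

    /** Same shape as the NameOnly interface from the question. */
    interface NameOnly {
        String getName();
    }

    /** Hand-written DTO: the getter is a plain field read, no reflection involved. */
    static final class NameOnlyDto {
        private final String name;

        NameOnlyDto(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    /** Backs the interface with a JDK dynamic proxy that resolves getters from a row map. */
    static NameOnly proxyFor(Map<String, Object> row) {
        InvocationHandler handler = (proxy, method, args) -> {
            String methodName = method.getName();
            if (methodName.startsWith("get") && methodName.length() > 3) {
                // getName -> name
                String property = Character.toLowerCase(methodName.charAt(3)) + methodName.substring(4);
                return row.get(property);
            }
            throw new UnsupportedOperationException(methodName);
        };
        return (NameOnly) Proxy.newProxyInstance(
                NameOnly.class.getClassLoader(), new Class<?>[] {NameOnly.class}, handler);
    }

    public static void main(String[] args) {
        NameOnly viaProxy = proxyFor(Collections.<String, Object>singletonMap("name", "Alice"));
        NameOnlyDto viaDto = new NameOnlyDto("Alice");
        System.out.println(viaProxy.getName() + " / " + viaDto.getName());
    }
}
```

Multiplied by 100 000 rows per iteration, the extra proxy object and the indirect call per getter are one plausible contributor to the gap seen in the measurements.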

Each one has its Pros and Cons:
Interface projection:
Nested, dynamic and open projections are allowed, but Spring generates a proxy at runtime.
DTO projection:
Faster, but nested, dynamic and open projections are not allowed.

Related

Spring Data JPA DistinctBy projections

Good day fellow hibernators!
I have a question on how the DistinctBy clause works in conjunction with Spring Data's projection
Assume I have 3 classes:
public class Task {
Long id;
@ManyToOne(fetch = LAZY)
@JoinColumn(name = "project_id")
private Project project;
@OneToOne
@JoinColumn(name = "contact_id")
private Contact assigned;
Boolean deleted;
// ...
}
public class Contact {
Long id;
// ...
}
public class Project {
Long id;
@OneToMany(fetch = LAZY, mappedBy = "project")
private Set<Task> tasks;
// ...
}
These would be my domain classes. Notice, Project does have a "One2Many" to Tasks, Contact does not. Now, I have 2 interfaces for my projections and the basic TaskRepo with 2 methods:
public interface JustProject {
Project getProject();
}
public interface JustAssignee {
Contact getContact();
}
public interface TaskRepo extends CrudRepository<Task, Long>, JpaSpecificationExecutor<Task> {
List<JustAssignee> findDistinctByDeletedFalse();
List<JustProject> findDistinctByDeletedFalseAndDeletedFalse();
}
The way it works for me right now is that, findDistinctByDeletedFalse returns as many instances as there are distinct contacts for tasks (e.g. if there are 10 tasks but only 3 contacts, the method will return just 3 objects containing all the 3 distinct contacts). Same for findDistinctByDeletedFalseAndDeletedFalse but on project level.
Now I have a few questions here and would love to get some help in understanding how this works exactly.
is the distinct clause applied after the search is done?
my initial assumption was that this behavior would not work as it does now. I assumed that the distinct clause is applied before the result is fetched, meaning that it would be DISTINCT based on the underlying task model, not the returned JustAssignee or JustProject model.
is there any way I could somehow not abuse the ...AndDeletedFalse redundant appendix? I need both the two methods from the repo but I feel like I had to cheat just to obtain that result...
... am I doing something wrong? I wanted to get "all distinct contacts/projects assigned to all tasks" in as elegant a way as possible. I ended up trying this DistinctBy approach precisely because I was unsure how it works and wanted to try my luck. I really didn't think it would work this way, but now that it does I would really like to understand why!
Many thanks <3
The DISTINCT keyword is applied to the query, and therefore its effect depends on the select list, which in turn is controlled by the projection. So if you have only project or only contact in your projection, DISTINCT gets applied to those values only. Note, though, that this relies somewhat on the boundaries of the JPA specification, and I wouldn't be surprised if you see different behaviour with different implementations. See https://github.com/eclipse-ee4j/jpa-api/issues/189 and https://github.com/eclipse-ee4j/jpa-api/issues/124 for somewhat related issues raised against the specification.
In order to differentiate methods that otherwise differ only in the return value, you can add any additional string between find and By in the method name. For example, you might rename your methods to findDistinctContactsByDeletedFalse and findDistinctProjectsByDeletedFalse.
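The dependence of DISTINCT on the select list can be illustrated with an in-memory analogy. The sketch below does not run any JPA query; it only mimics "select distinct <column>" over a few hypothetical task rows, and the names `DistinctProjectionSketch`, `TaskRow` and `distinctCount` are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class DistinctProjectionSketch {

    /** Hypothetical flattened task row: the task id plus the ids of its project and contact. */
    static final class TaskRow {
        final long id;
        final long projectId;
        final long contactId;

        TaskRow(long id, long projectId, long contactId) {
            this.id = id;
            this.projectId = projectId;
            this.contactId = contactId;
        }
    }

    /** Mimics "select distinct <selectList> from Task": distinct is applied to the projected values. */
    static <T> long distinctCount(List<TaskRow> rows, Function<TaskRow, T> selectList) {
        return rows.stream().map(selectList).distinct().count();
    }

    /** Six tasks spread over two projects and three contacts. */
    static List<TaskRow> sampleRows() {
        return Arrays.asList(
                new TaskRow(1, 10, 100),
                new TaskRow(2, 10, 101),
                new TaskRow(3, 10, 102),
                new TaskRow(4, 11, 100),
                new TaskRow(5, 11, 101),
                new TaskRow(6, 11, 100));
    }

    public static void main(String[] args) {
        List<TaskRow> rows = sampleRows();
        System.out.println("distinct tasks:    " + distinctCount(rows, r -> r.id));
        System.out.println("distinct projects: " + distinctCount(rows, r -> r.projectId));
        System.out.println("distinct contacts: " + distinctCount(rows, r -> r.contactId));
    }
}
```

With the task id in the select list every row is unique, while projecting only the project or only the contact collapses the result to the distinct values of that column, which matches the behaviour described in the question.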
I guess this is the best that you can get with Spring Data JPA. You might be able to use just a single method by using the dynamic projections approach, but I think this is a perfect use case for Blaze-Persistence Entity Views.
I created the library to allow easy mapping between JPA models and custom interface or abstract class defined models, something like Spring Data Projections on steroids. The idea is that you define your target structure(domain model) the way you like and map attributes(getters) via JPQL expressions to the entity model.
A DTO model for your use case could look like the following with Blaze-Persistence Entity-Views:
@EntityView(Task.class)
public interface TaskAggregateDto {
// A synthetic "id" to get a grouping context on object level
@IdMapping("1")
int getGroupKey();
Set<ProjectDto> getProjects();
Set<ContactDto> getContacts();
@EntityView(Project.class)
interface ProjectDto {
@IdMapping
Long getId();
String getName();
}
@EntityView(Contact.class)
interface ContactDto {
@IdMapping
Long getId();
String getName();
}
}
The Spring Data integration allows you to use it almost like Spring Data Projections: https://persistence.blazebit.com/documentation/entity-view/manual/en_US/index.html#spring-data-features
public interface TaskRepo extends CrudRepository<Task, Long>, JpaSpecificationExecutor<Task> {
TaskAggregateDto findOneByDeletedFalse();
}

Spring Data JPA - Class-based projections with Specification and Pageable

I am trying to identify the best approach for the following scenario. I have a few entities in my project where I am applying some Spring Data JPA concepts, so that I have a clean service that loads entities as DTOs and is not difficult to maintain when needed.
@Entity
class Order {
Long id;
Datetime createdDate;
String note;
List<Item> items;
...
}
@Entity
class Item {
Long id;
Order order;
...
}
Currently, I have a service method based on the specification and pageable classes to load my objects.
public Page<Order> findAll(Specification<Order> spec, Pageable pageable) {
return repository.findAll(spec, pageable);
}
This approach is working fine, however, it has some performance issues. I would like to use class-based projections with the specification and pageable classes. Do you guys have a recommendation or an example using this approach?
Regards,
Caique Ferreira
This feature is not yet supported, here is a bug tracker for it.

Is there a way to create one JPA entity based on many database tables and do I really have to do this or is it a bad practice?

I'm quite new to Spring Data JPA technology and currently facing one task I can't deal with. I am seeking best practice for such cases.
In my Postgres database I have two tables connected by a one-to-many relation. Table 'account' has a field 'type_id', which is a foreign key referencing the field 'id' of table 'account_type':
So the 'account_type' table only plays the role of a dictionary. Accordingly, I've created two JPA entities (Kotlin code):
@Entity
class Account(
@Id @GeneratedValue var id: Long? = null,
var amount: Int,
@ManyToOne var accountType: AccountType
)
@Entity
class AccountType(
@Id @GeneratedValue var id: Long? = null,
var type: String
)
In my Spring Boot application I'd like to have a RestController which will be responsible for returning all accounts in JSON format. To do that I made the entity classes serializable and wrote a simple REST controller:
@GetMapping("/getAllAccounts", produces = [APPLICATION_JSON_VALUE])
fun getAccountsData(): String {
val accountsList = accountRepository.findAll().toMutableList()
return json.stringify(Account.serializer().list, accountsList)
}
where accountRepository is just an interface which extends CrudRepository<Account, Long>.
And now if I go to :8080/getAllAccounts, I'll get the Json of the following format (sorry for formatting):
[
{"id":1,
"amount":0,
"accountType":{
"id":1,
"type":"DBT"
}
},
{"id":2,
"amount":0,
"accountType":{
"id":2,
"type":"CRD"
}
}
]
But what I really want from that controller is just
[
{"id":1,
"amount":0,
"type":"DBT"
},
{"id":2,
"amount":0,
"type":"CRD"
}
]
Of course I can create new serializable class for accounts which will have String field instead of AccountType field and can map JPA Account class to that class extracting account type string from AccountType field. But for me it looks like unnecessary overhead and I believe that there could be a better pattern for such cases.
For example what I have in my head is that probably somehow I can create one JPA entity class (with String field representing account type) which will be based on two database tables and unnecessary complexity of having inner object will be reduced automagically each time I call repository methods :) Moreover I will be able to use this entity class in my business logic without any additional 'wrappers'.
P.S. I read about the @SecondaryTable annotation, but it looks like it only works in cases where there is a one-to-one relation between the two tables, which is not my case.
There are a couple of options which allow clean separation without a DTO.
Firstly, you could look at using a projection which is kind of like a DTO mentioned in other answers but without many of the drawbacks:
https://docs.spring.io/spring-data/jpa/docs/current/reference/html/#projections
@Projection(
name = "accountSummary",
types = { Account.class })
public interface AccountSummaryProjection {
Long getId();
Integer getAmount();
@Value("#{target.accountType.type}")
String getType();
}
You then simply need to update your controller to call either a query method with a List return type, or a method which takes the projection class as an argument.
https://docs.spring.io/spring-data/jpa/docs/current/reference/html/#projection.dynamic
@GetMapping("/getAllAccounts", produces = [APPLICATION_JSON_VALUE])
@ResponseBody
fun getAccountsData(): List<AccountSummaryProjection>{
return accountRepository.findAllAsSummary();
}
An alternative approach is to use the Jackson annotations. I note that in your question you are manually transforming the result to a JSON String and returning a String from your controller. You don't need to do that if the Jackson JSON library is on the classpath. See my controller above.
So if you leave the serialization to Jackson you can separate the view from the entity using a couple of annotations. Note that I would apply these using a Jackson mixin rather than having to pollute the Entity model with Json processing instructions however you can look that up:
@Entity
class Account(
@Id @GeneratedValue var id: Long? = null,
var amount: Int,
//in real life I would apply these using a Jackson mixin
//to prevent polluting the domain model with view concerns.
@JsonDeserialize(converter = StringToAccountTypeConverter::class)
@JsonSerialize(converter = AccountTypeToStringConverter::class)
@ManyToOne var accountType: AccountType
)
You then simply create the necessary converters:
public class StringToAccountTypeConverter extends StdConverter<String, AccountType>
implements org.springframework.core.convert.converter.Converter<String, AccountType> {
@Autowired
private AccountTypeRepository repo;
@Override
public AccountType convert(String value) {
//look up in repo and return
}
}
and vice versa:
public class AccountTypeToStringConverter extends StdConverter<AccountType, String>
implements org.springframework.core.convert.converter.Converter<AccountType, String> {
@Override
public String convert(AccountType value) {
return value.getType();
}
}
One of the least complicated ways to achieve what you are aiming for, at least from the external clients' point of view, is custom serialisation, which you seem to be aware of and which @YoManTaMero has expanded upon.
Obtaining the desired class structure might not be possible. The closest I've managed to find is related to the @SecondaryTable annotation, but the caveat is that this only works for @OneToOne relationships.
In general, I'd pinpoint your problem to the issue of DTOs and Entities. The idea behind JPA is to map the schema and content of your database to code in an accessible but accurate way. It takes away the heavy-lifting of managing SQL queries, but it is designed mostly to reflect your DB's structure, not to map it to a different set of domains.
If the organisation of your DB schema does not exactly match the needs of your system's I/O communication, this might be a sign that:
Your DB has not been designed correctly;
Your DB is fine, but the manageable entities (tables) in it simply do not match directly to the business entities (models) in your external communication.
Should the second be the case, Entities should be mapped to DTOs, which can then be passed around. A single Entity may map to a few different DTOs. A single DTO might take more than one (related!) entity to be created. This is good practice for medium-to-large systems in the first place: handing out references to the object that is the direct access point to your database is a risk.
Mind that simply because the id of the accountType is not taking part in your external communication does not mean it will never be a part of your business logic.
To sum up: JPA is designed with ease of database access in mind, not for smoothing out external communication. For that, other tools - such as e.g. Jackson serializer - are used, or certain design patterns - like DTO - are being employed.
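As a rough illustration of the Entity-to-DTO mapping described above, the following plain-Java sketch flattens the nested account type into a single string field. The question's entities are written in Kotlin; the classes here are simplified stand-ins without JPA annotations, and `AccountSummaryDto`, `toDto` and `toDtos` are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AccountDtoMappingSketch {

    /** Simplified stand-in for the AccountType entity. */
    static final class AccountType {
        final Long id;
        final String type;

        AccountType(Long id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    /** Simplified stand-in for the Account entity. */
    static final class Account {
        final Long id;
        final int amount;
        final AccountType accountType;

        Account(Long id, int amount, AccountType accountType) {
            this.id = id;
            this.amount = amount;
            this.accountType = accountType;
        }
    }

    /** Flat view for external communication: the nested type collapses to a string. */
    static final class AccountSummaryDto {
        final Long id;
        final int amount;
        final String type;

        AccountSummaryDto(Long id, int amount, String type) {
            this.id = id;
            this.amount = amount;
            this.type = type;
        }
    }

    static AccountSummaryDto toDto(Account account) {
        return new AccountSummaryDto(account.id, account.amount, account.accountType.type);
    }

    static List<AccountSummaryDto> toDtos(List<Account> accounts) {
        List<AccountSummaryDto> result = new ArrayList<>();
        for (Account account : accounts) {
            result.add(toDto(account));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Account> accounts = Arrays.asList(
                new Account(1L, 0, new AccountType(1L, "DBT")),
                new Account(2L, 0, new AccountType(2L, "CRD")));
        for (AccountSummaryDto dto : toDtos(accounts)) {
            System.out.println(dto.id + " " + dto.amount + " " + dto.type);
        }
    }
}
```

The entity keeps its reference to the account type for business logic, while only the flat DTO is handed to the serializer.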
One approach to solve this is to annotate accountType with @JsonIgnore and add a getType method like
@JsonProperty("type")
fun getType(): String {
return accountType.type
}

Avoid N+1 with DTO mapping on Hibernate entities

In our Restful application we decided to use DTO's to shield the Hibernate domain model for several reasons.
We map Hibernate entities to DTO and vice versa manually using DTOMappers in the Service Layer.
Example in Service Layer:
@Transactional(readOnly=true)
public PersonDTO findPersonWithInvoicesById(Long id) {
Person person = personRepository.findById(id);
return PersonMapperDTOFactory.getInstance().toDTO(person);
}
The main concept could be explained like this:
JSON (Jackson parser) <-> Controller <-> Service Layer (uses Mapping Layer) <-> Repository
We agreed that we retrieve associations by performing a HQL (or Criteria) using a left join.
This is mostly a performant way to retrieve relations and avoids the N+1 select issue.
However, it's still possible to have the N+1 select issue when a developer mistakenly forgets to do a left join. The relations will still be fetched because the PersonDTOMapper will iterate over the Invoices of a Person for converting to InvoiceDTOs. So the data is still fetched because the DTOMapper is executed where a Hibernate Session is active (managed by Spring)
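The effect described above can be simulated without Hibernate. The sketch below is only a model: a `Supplier` stands in for a lazy collection, and an `AtomicInteger` counts simulated database round trips. All names (`NPlusOneSketch`, `findAllLazy`, `findAllJoinFetch`, `mapInvoices`) are hypothetical and do not correspond to Hibernate or Spring APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class NPlusOneSketch {

    /** Stand-in for an entity whose invoices collection is loaded on first access. */
    static final class Person {
        final String name;
        private final Supplier<List<String>> invoiceLoader;
        private List<String> invoices;

        Person(String name, Supplier<List<String>> invoiceLoader) {
            this.name = name;
            this.invoiceLoader = invoiceLoader;
        }

        List<String> getInvoices() {
            if (invoices == null) {
                invoices = invoiceLoader.get(); // simulated lazy load
            }
            return invoices;
        }
    }

    /** One query for the persons; each invoices collection costs another query when touched. */
    static List<Person> findAllLazy(List<String> names, AtomicInteger queries) {
        queries.incrementAndGet();
        List<Person> result = new ArrayList<>();
        for (String name : names) {
            result.add(new Person(name, () -> {
                queries.incrementAndGet();
                return Arrays.asList(name + "-invoice");
            }));
        }
        return result;
    }

    /** One query that already contains the invoices, as a left join fetch would. */
    static List<Person> findAllJoinFetch(List<String> names, AtomicInteger queries) {
        queries.incrementAndGet();
        List<Person> result = new ArrayList<>();
        for (String name : names) {
            List<String> loaded = Arrays.asList(name + "-invoice");
            result.add(new Person(name, () -> loaded));
        }
        return result;
    }

    /** Touches every invoices collection, as a DTO mapper would; returns the number of invoices seen. */
    static int mapInvoices(List<Person> persons) {
        int mapped = 0;
        for (Person person : persons) {
            mapped += person.getInvoices().size();
        }
        return mapped;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b", "c");
        AtomicInteger lazyQueries = new AtomicInteger();
        mapInvoices(findAllLazy(names, lazyQueries));
        AtomicInteger joinQueries = new AtomicInteger();
        mapInvoices(findAllJoinFetch(names, joinQueries));
        System.out.println("lazy: " + lazyQueries.get() + " queries, join fetch: " + joinQueries.get());
    }
}
```

In this model the mapper silently turns one query into N+1 whenever the join is forgotten, which is exactly the situation that stays hidden while the session is still open during mapping.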
Is there some way to make the Hibernate Session 'not active' in our DTOMappers? We would face a LazyInitializationException that should trigger the developer that he didn't fetch some data like it should.
I've read about #Transactional(propagation = Propagation.NOT_SUPPORTED) that suspends the transaction. However, I don't know that it was intended for such purposes.
What is a clean solution to achieve this? Alternatives are also very welcome!
Usually I use the mapper in the controller layer. From my perspective, the service layer manages the application's business logic, while DTOs are very useful if you want to represent data to the external world in a different way. Mapping there means you may get the LazyInitializationException you are looking for.
I have one more reason to prefer this solution: just imagine you need to invoke a public method inside another public method in the service class. In that case you might need to call the mapper several times.
If you are using Hibernate, then there are specific ways that you can determine if an associated object has been lazy-loaded.
For example, let's say you have an entity class Foo that contains a @ManyToOne 'foreign' association to entity class Bar, which is represented by a field in Foo called bar.
In your DTO mapping code you can check whether the associated bar has been lazy-loaded using the following code:
if (!(bar instanceof HibernateProxy) ||
!((HibernateProxy)bar).getHibernateLazyInitializer().isUninitialized()) {
// bar has already been lazy-loaded, so we can
// recursively load a BarDTO for the associated Bar object
}
The simplest solution to achieve what you desire is to clear the entity manager after querying and before invoking the DTO mapper. That way, the object will be detached, and access to uninitialized associations will trigger a LazyInitializationException instead.
I felt your pain as well which drove me to developing Blaze-Persistence Entity Views which allows you to define DTOs as interfaces and map to the entity model, using the attribute name as default mapping, which allows very simple looking mappings.
Here is a little example:
@Entity
class Person {
@Id Long id;
String name;
String lastName;
String address;
String city;
String zipCode;
}
@EntityView(Person.class)
interface PersonDTO {
@IdMapping Long getId();
String getName();
}
Querying would be as simple as
@Transactional(readOnly=true)
public PersonDTO findPersonWithInvoicesById(Long id) {
return personRepository.findById(id);
}
interface PersonRepository extends EntityViewRepository<PersonDTO, Long> {
PersonDTO findById(Long id);
}
Since you seem to be using Spring data, you will enjoy the spring data integration.

Performance issues using neo4j @QueryResult

I'm using neo4j with spring data.
When I use queries which return multiple fields, I generally try to return an interface (@QueryResult annotated), so I won't need to convert the results afterwards.
For some reason I experience very bad performance as the number of results grows.
Does anyone have a solution?
I'm using neo4j 2.0.1 through rest, spring data for neo4j 3.0.0
The dataset is very small, less than a 100 nodes, and the result set is at most ~10 records.
Spring-Data-Neo4j-Rest is quite slow because it was built for the embedded database.
Here are fixes that can help improve the performance of your application still leveraging SDN's power to deserialize the objects for you.
Write your own cypher queries, and use @QueryResult more to retrieve connected nodes. DON'T USE THE @Fetch annotation. SDN will make extra REST calls in an attempt to serialize the entire object graph when it could be done in a single query.
For example,
public class Person {
@RelatedTo (type = "MARRIED", direction = BOTH)
private Person partner;
}
@QueryResult
public class PersonResult {
@ResultColumn("person")
private Person person;
@ResultColumn("partner")
private Person partner;
}
In your repository then fetch your results
public interface PersonRepository extends GraphRepository <Person> {
@Query ("MATCH person-[m:MARRIED]-partner RETURN person, partner")
List<PersonResult> findAllMarried ();
}
AGAIN, REMOVE THOSE DEADLY @FETCH ANNOTATIONS.
Your objects are serialized in batch this way. This is very efficient, pending the SDN team working out a better way to solve this problem.
