preventing OpenJPA N+1 select performance problem on maps

When I have an entity that contains a Map, e.g.
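import java.util.HashMap;
import java.util.Map;
import javax.persistence.ElementCollection;
import javax.persistence.Entity;
import javax.persistence.FetchType;

@Entity
public class TestEntity {
    @ElementCollection(fetch = FetchType.EAGER)
    Map<String, String> strings = new HashMap<String, String>();
}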
and I select multiple entities (SELECT z FROM TestEntity z), OpenJPA 2.0 performs one query for each TestEntity to fetch the map, even though I used FetchType.EAGER. This also happens when the Map value is an entity and I use @OneToMany instead of @ElementCollection. In principle this can be done more efficiently with one query that selects all the map entries for all returned TestEntities. For Collection-valued fields OpenJPA already does this by default (property "openjpa.jdbc.EagerFetchMode" with value="parallel"), but it seems to fail on this simple entity. (The problem is the same with value="join".)
Could I be doing something wrong? Is there an easy way to tell OpenJPA to not perform a query per entity but only one?
Or is there already any work planned on improving this (I filed it under https://issues.apache.org/jira/browse/OPENJPA-1920)?
It is a problem for us because we wish to fetch (and detach) a list of about 1900 products which takes almost 15 seconds with OpenJPA. It takes less than a second with my own native query.
Having to write only one native query wouldn't be much of a problem but the map we use is inside a reusable StringI18N entity which is referenced from several different entities (and can be deep in the object graph), so native queries are a maintenance headache.
Any help getting performance up is greatly appreciated.
EDIT: explicitly using JOIN FETCH does not help either:
"SELECT z FROM TestEntity z JOIN FETCH z.strings"
OpenJPA's TRACE still shows that it executes one SQL statement for each individual TestEntity.

It might be a pain (correction: I know it'll be a pain), but have you tried actually mapping your 2-field TestEntity as a full JPA-persisted @Entity?
I know that Hibernate used to treat @ElementCollection mappings rather differently from @OneToMany mappings, for example; OpenJPA could well be doing something similar.
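The single-query idea from the question (select all map entries for all returned entities at once, then distribute them) can be sketched in plain Java. This is only an illustration under assumptions: MapEntryRow, MapEntryGrouper and the table/column names in the comment are hypothetical and are not OpenJPA's actual schema or API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row shape for one query over the map table, e.g.
//   SELECT owner_id, map_key, map_value FROM test_entity_strings WHERE owner_id IN (...)
// (table and column names are assumptions for illustration).
final class MapEntryRow {
    final long ownerId;
    final String key;
    final String value;

    MapEntryRow(long ownerId, String key, String value) {
        this.ownerId = ownerId;
        this.key = key;
        this.value = value;
    }
}

final class MapEntryGrouper {
    // Groups the flat result of a single query into one map per owning entity,
    // so no separate SELECT per entity is needed.
    static Map<Long, Map<String, String>> groupByOwner(List<MapEntryRow> rows) {
        Map<Long, Map<String, String>> result = new LinkedHashMap<>();
        for (MapEntryRow row : rows) {
            result.computeIfAbsent(row.ownerId, id -> new LinkedHashMap<>())
                  .put(row.key, row.value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<MapEntryRow> rows = new ArrayList<>();
        rows.add(new MapEntryRow(1L, "en", "chair"));
        rows.add(new MapEntryRow(1L, "de", "Stuhl"));
        rows.add(new MapEntryRow(2L, "en", "table"));
        System.out.println(groupByOwner(rows));
    }
}
```

With a native query feeding such rows, the number of statements stays constant regardless of how many entities are returned, which is the effect the hand-written native query in the question achieves.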

Related

Force JPA to not use union and fetch tables one by one

I have 5 similar tables against which I need to run the same query and fetch the data in pages. I use polymorphic queries (an abstract superclass with @Inheritance, so that all rows are fetched automatically).
But this approach has problems, as noted here: Database pressure on Polymorphic queries
The issue is that the queries use UNION ALL, which makes the DB search through millions of rows just to return 500 results. So instead I want to execute the queries serially.
When I call the method, JPA should go to the first table and fetch its data in pages; once that table is exhausted, it should move on to the second table, and so on.
Right now, with the union, I put a ton of pressure on the database. With the new approach the pressure should be lower, as only one table is accessed at a time.
I do not know a way to do this without changing the setup I have right now, which looks like this:
public interface OhlcDao extends JpaRepository<AbstractOhlc, OhlcId> {
Slice<OhlcRawBean<? extends OhlcBean>> findByIdSourceIdAndIdTickerIdIn(
String sourceId,
Set<String> tickerId,
PageRequest pageRequest
);
}
The method uses a union to fetch the data, which I do not like.
Is there a way to make this work in JPA or Hibernate by changing some internal code (that is, without changing my setup), so that a method like this does not use unions?
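The serial behaviour described above (exhaust the first table page by page, then move to the next) can be approximated in application code. The following is a minimal sketch under assumptions, not a JPA or Hibernate feature: PagedSource, SequentialPager and the in-memory stand-in are hypothetical names, and in practice each source would wrap a query against one concrete table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical abstraction over one table: returns at most `limit` rows starting at `offset`.
interface PagedSource<T> {
    List<T> fetch(int offset, int limit);
}

// Walks the sources in order; a source is queried only after the previous one is exhausted.
final class SequentialPager<T> {
    private final List<PagedSource<T>> sources;
    private int sourceIndex = 0;
    private int offsetInSource = 0;

    SequentialPager(List<PagedSource<T>> sources) {
        this.sources = sources;
    }

    // Returns the next page; an empty list means every source is exhausted.
    List<T> nextPage(int pageSize) {
        List<T> page = new ArrayList<>();
        while (page.size() < pageSize && sourceIndex < sources.size()) {
            int wanted = pageSize - page.size();
            List<T> chunk = sources.get(sourceIndex).fetch(offsetInSource, wanted);
            page.addAll(chunk);
            offsetInSource += chunk.size();
            if (chunk.size() < wanted) {
                // The current source ran dry: continue with the next one.
                sourceIndex++;
                offsetInSource = 0;
            }
        }
        return page;
    }
}

final class SequentialPagerDemo {
    // In-memory stand-in for a table, used only for this illustration.
    static <T> PagedSource<T> inMemory(List<T> rows) {
        return (offset, limit) -> {
            int from = Math.min(offset, rows.size());
            int to = Math.min(offset + limit, rows.size());
            return new ArrayList<>(rows.subList(from, to));
        };
    }

    public static void main(String[] args) {
        List<PagedSource<Integer>> tables = new ArrayList<>();
        tables.add(inMemory(Arrays.asList(1, 2, 3)));
        tables.add(inMemory(Arrays.asList(4)));
        tables.add(inMemory(Arrays.asList(5, 6)));
        SequentialPager<Integer> pager = new SequentialPager<>(tables);
        List<Integer> page;
        while (!(page = pager.nextPage(2)).isEmpty()) {
            System.out.println(page);
        }
    }
}
```

The trade-off is that the pager is stateful and the overall ordering is "table by table", not a global sort across all five tables.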

Using Spring Query Methods JPA to efficiently query DB without multiple SELECT statements

I have an entity that has simple String columns as well as many ElementCollections (List and Map). Looking at my Postgres logs, I noticed that querying for this entity triggers a series of consecutive SELECT queries to load all the ElementCollections.
For efficiency, I would imagine that one SELECT query with some inner JOINs would be better than all of the individual SELECT queries. Is there a way to do that without manually writing a very verbose select query with all the INNER JOINs?
I have been looking at FetchTypes, the Spring Data query language, and DTO projections, but I imagine there might be a more straightforward way. The benefit I had been taking for granted is that Spring generates the queries for me: if I write the JOINs explicitly, I have to update my query every time I add a new field, whereas with generated queries I don't have to do anything.
// Person.java
@Entity
public class Person {
    @Id
    long personId;
    @Column
    String firstName;
    @Column
    String lastName;
    @ElementCollection
    Set<String> someField;
    @ElementCollection
    Map<String, String> otherField;
    @ElementCollection
    Set<String> anotherField;
    @ElementCollection
    Map<String, String> yetAnotherField;
}
What is happening right now is
SELECT firstName, lastName FROM Person WHERE personId=$1
SELECT someField FROM Person_SomeField WHERE someField.personId=$1
SELECT otherField.key, otherField.value FROM Person_OtherField WHERE otherField.personId=$1
And this continues for all of the ElementCollections tables which leads to a lot of queries.

Change your annotation to @ElementCollection(fetch = FetchType.EAGER).
It sounds like those fields are being lazily loaded (Hibernate waits until they are accessed before loading them), which results in the N+1 queries you are seeing. LAZY loading is the default behavior for this type of member, which makes sense because loading it is not cheap. However, if you always want these members loaded, setting the fetch type to EAGER can make sense: it forces Hibernate to load them along with the entity itself. This is the documentation for the fetch option on @ElementCollection:
(Optional) Whether the collection should be lazily loaded or must be
eagerly fetched. The EAGER strategy is a requirement on the
persistence provider runtime that the collection elements must be
eagerly fetched. The LAZY strategy is a hint to the persistence
provider runtime.

A method in an entity needs to load all data from the aggregate; how to optimize this?

I have a problem with an aggregate that will grow over time.
One day there will be thousands of records and performance will suffer.
@Entity
public class Serviceman ... {
    @ManyToMany(mappedBy = "servicemanList")
    private List<ServiceJob> services = new ArrayList<>();
    ...
    public Optional<ServiceJob> firstServiceJobAfterDate(LocalDateTime dateTime) {
        return services.stream().filter(i -> i.getStartDate().isAfter(dateTime))
            .min(Comparator.comparing(ServiceJob::getStartDate));
    }
}
The method loads every ServiceJob just to get one of them.
Maybe I should delegate this method to a service that uses native SQL.

You have to design small aggregates instead of large ones.
This essay explains in detail how to do it: http://dddcommunity.org/library/vernon_2011/. It explains how to decompose your aggregates into smaller ones so you can manage the complexity.
In your case, instead of having one aggregate consisting of two entities (Serviceman and ServiceJob, with Serviceman being the aggregate root), you can decompose it into two smaller aggregates with a single entity each. ServiceJob will reference Serviceman by ID, and you can use a ServiceJobRepository to make queries.
In your example you will have ServiceJobRepository.firstServiceJobAfterDate(guid servicemanID, DateTime date).
This way, if you have a lot of entities and you need to scale, you can store the ServiceJob entities on another DB server.
If for some reason Serviceman and ServiceJob need references to each other to do their work, you can use a service that uses ServicemanRepository and ServiceJobRepository to load both aggregates and pass them to one another.
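The decomposition above can be sketched in plain Java. This is a minimal illustration, not a definitive design: the use of UUID for the serviceman ID and the in-memory repository are assumptions, and a real implementation would push the filter and ordering into the database query instead of scanning a list.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

// ServiceJob is its own aggregate and refers to the serviceman by ID only.
final class ServiceJob {
    private final UUID servicemanId;
    private final LocalDateTime startDate;

    ServiceJob(UUID servicemanId, LocalDateTime startDate) {
        this.servicemanId = servicemanId;
        this.startDate = startDate;
    }

    UUID getServicemanId() { return servicemanId; }
    LocalDateTime getStartDate() { return startDate; }
}

// Repository contract named in the answer; the parameter types are assumptions.
interface ServiceJobRepository {
    Optional<ServiceJob> firstServiceJobAfterDate(UUID servicemanId, LocalDateTime date);
}

// In-memory stand-in used only to make the sketch runnable.
final class InMemoryServiceJobRepository implements ServiceJobRepository {
    private final List<ServiceJob> jobs = new ArrayList<>();

    void save(ServiceJob job) {
        jobs.add(job);
    }

    @Override
    public Optional<ServiceJob> firstServiceJobAfterDate(UUID servicemanId, LocalDateTime date) {
        return jobs.stream()
                .filter(job -> job.getServicemanId().equals(servicemanId))
                .filter(job -> job.getStartDate().isAfter(date))
                .min(Comparator.comparing(ServiceJob::getStartDate));
    }

    public static void main(String[] args) {
        InMemoryServiceJobRepository repository = new InMemoryServiceJobRepository();
        UUID servicemanId = UUID.randomUUID();
        repository.save(new ServiceJob(servicemanId, LocalDateTime.of(2024, 1, 10, 9, 0)));
        repository.save(new ServiceJob(servicemanId, LocalDateTime.of(2024, 1, 5, 9, 0)));
        System.out.println(repository
                .firstServiceJobAfterDate(servicemanId, LocalDateTime.of(2024, 1, 1, 0, 0))
                .map(ServiceJob::getStartDate));
    }
}
```

Because Serviceman no longer holds the job list, loading a Serviceman never drags thousands of ServiceJob rows into memory.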

How to write a method with AND and multiple OR parameters in Spring Data JPA

I am trying to formulate a method name for this query:
@Query("from Employees where department = ?1 and (fullTime = true or contractor = true or subContractor = true)")
I thought this method would do the trick, but it does an AND on dept and full time only:
public List<Employees> findByDepartmentAndfullTimeTrueOrContractorTrueOrSubContractorTrue(String dept);
This is a related question: Spring JPA Data "OR" query, but it was asked in 2012. Is there a way to achieve this without having to use @Query?

This is currently not supported and probably never will be, for a very simple reason:
Derived queries are considered a means to define very simple queries. I admit this is blurry, but if you get to findByDepartmentAndfullTimeTrueOrContractorTrueOrSubContractorTrue it's time to rethink whether that's actually what you want to expose to clients. It's awkward to write, awkward to read, and it is probably more than a collection of predicates: it conveys a higher-level meaning and thus should be named in a more descriptive way.
The solution, as you already discovered, is to use @Query or Querydsl predicates.
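The behaviour reported in the question is consistent with AND binding more tightly than OR in the derived method name. The plain-Java sketch below only illustrates that difference in grouping; Employee and PrecedenceDemo are hypothetical names, and the predicates stand in for the generated and the hand-written queries.

```java
import java.util.function.Predicate;

// Hypothetical stand-in for the Employees entity from the question.
final class Employee {
    final String department;
    final boolean fullTime;
    final boolean contractor;
    final boolean subContractor;

    Employee(String department, boolean fullTime, boolean contractor, boolean subContractor) {
        this.department = department;
        this.fullTime = fullTime;
        this.contractor = contractor;
        this.subContractor = subContractor;
    }
}

final class PrecedenceDemo {
    // Grouping implied by the derived method name: (department AND fullTime) OR contractor OR subContractor.
    static Predicate<Employee> derived(String dept) {
        return e -> (e.department.equals(dept) && e.fullTime) || e.contractor || e.subContractor;
    }

    // Grouping written explicitly in the @Query: department AND (fullTime OR contractor OR subContractor).
    static Predicate<Employee> intended(String dept) {
        return e -> e.department.equals(dept) && (e.fullTime || e.contractor || e.subContractor);
    }

    public static void main(String[] args) {
        // A contractor from a different department shows the difference.
        Employee salesContractor = new Employee("sales", false, true, false);
        System.out.println("derived:  " + derived("it").test(salesContractor));
        System.out.println("intended: " + intended("it").test(salesContractor));
    }
}
```

A method name has no way to express the parentheses, which is why the explicit query is needed here.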

How to obtain a generic search/find method with spring data?

All I need is to provide all my repositories with a generic search/find method.
Something like this:
public interface BaseRepository<T, ID extends Serializable>
extends PagingAndSortingRepository<T, ID> {
Iterable<T> search(SearchParameters sp);
}
where the SearchParameters object represents a set of values for each property, and probably a condition to apply on them.
The JPA Criteria API is probably the way to go, but I'm really having a hard time finding something that fits my needs.

I used an approach that goes in the same direction, but I would call it dynamic rather than generic. It now works pretty well, and we are able to generate all desired filters automatically just by supplying the search entity. I also thought the Criteria API was the way to go, but after a while it got too messy with all the side effects, so I turned around and built the query string with parameters myself.
I created an entity scanner that takes all domain entities and generates filter definition objects for each desired filter. The scanner takes an entity and follows its properties up to a certain level (to keep the number of filters at bay). I cannot give you the code here since it belongs to a customer, but I can describe the approach.
What I needed in the filter definition is this: entity type, property path, property type, a values expression in case we render options (think master data), the joins needed (to avoid joining the same tables several times), and open/closed brackets. This is the definition of a filter.
Then you need a value object holding the user's current configuration: input value, operator (e.g. >=), brackets, and filter link (and/or).
With this we can render a completely dynamic filter engine with some small limitations. For example, I have not yet implemented parent searches of the same entity.
You might start simple and generate a subquery for each filter, like: where id in (select ....) and/or id in (select ...). This works fine if the number of entities is not too high, but you will feel the performance penalty of several subqueries if the domain entity table has many rows.
Then you dive in and find a way to separate out the joins needed for a property path, and in the query creator you work out how to join an entity again only when necessary.
As I said, start simple. Take first-level properties of simple types like String and create your query engine. Enhance it by following specific entity joins, and after that you can go crazy and introduce expressions that fetch options for rendering a select, or use the conversion service for input parameters, and so on.
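The "start simple" step (first-level properties, a query string built with parameters) might look roughly like the sketch below. It is not the customer code mentioned above: Filter, BuiltQuery and FilterQueryBuilder are hypothetical names, brackets and joins are left out, and the property paths are assumed to come from the scanned filter definitions and not from raw user input.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical value object for one user-configured filter.
final class Filter {
    final String propertyPath; // e.g. "price"; assumed to come from a filter definition
    final String operator;     // e.g. ">="
    final Object value;        // user input, always bound as a named parameter
    final String link;         // "and" / "or"; ignored for the first filter

    Filter(String propertyPath, String operator, Object value, String link) {
        this.propertyPath = propertyPath;
        this.operator = operator;
        this.value = value;
        this.link = link;
    }
}

// Query text plus the named parameters to bind to it.
final class BuiltQuery {
    final String jpql;
    final Map<String, Object> params;

    BuiltQuery(String jpql, Map<String, Object> params) {
        this.jpql = jpql;
        this.params = params;
    }
}

final class FilterQueryBuilder {
    private static final Set<String> OPERATORS =
            new HashSet<>(Arrays.asList("=", "<>", "<", "<=", ">", ">=", "like"));
    private static final Set<String> LINKS = new HashSet<>(Arrays.asList("and", "or"));

    static BuiltQuery build(String entityName, List<Filter> filters) {
        StringBuilder jpql = new StringBuilder("select e from ").append(entityName).append(" e");
        Map<String, Object> params = new LinkedHashMap<>();
        for (int i = 0; i < filters.size(); i++) {
            Filter f = filters.get(i);
            // Only whitelisted operators and links are ever concatenated into the query text.
            if (!OPERATORS.contains(f.operator) || (i > 0 && !LINKS.contains(f.link))) {
                throw new IllegalArgumentException("unsupported operator or link");
            }
            String name = "p" + i;
            jpql.append(i == 0 ? " where " : " " + f.link + " ");
            jpql.append("e.").append(f.propertyPath).append(' ')
                .append(f.operator).append(" :").append(name);
            params.put(name, f.value);
        }
        return new BuiltQuery(jpql.toString(), params);
    }

    public static void main(String[] args) {
        BuiltQuery query = build("Product", Arrays.asList(
                new Filter("price", ">=", 10, null),
                new Filter("name", "like", "A%", "or")));
        System.out.println(query.jpql);
        System.out.println(query.params);
    }
}
```

The resulting string and parameter map could then be handed to whatever query API is in use; joins and brackets from the filter definition would be layered on top of this later.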
