Force JPA to not use union and fetch tables one by one - spring-boot

I have 5 similar tables against which I need to execute the same query and fetch data in pages. I have used polymorphic queries (an abstract superclass with @Inheritance, so that all rows are fetched automatically).
But this approach has problems, as noted here: Database pressure on Polymorphic queries
The issue is that the queries use UNION ALL, which makes the DB search through millions of rows just to get 500 results. So instead I want to execute this serially.
When I execute the method, JPA should go to the first table and fetch data in pages; when that table is exhausted, it should go to the second table, and so on.
Right now, with the union, I put a ton of pressure on the database. With this new approach, I expect less pressure, as only one table is accessed at a time.
I do not know a way to do this without changing the setup I have right now. For example, right now I have it like this:
public interface OhlcDao extends JpaRepository<AbstractOhlc, OhlcId> {
    Slice<OhlcRawBean<? extends OhlcBean>> findByIdSourceIdAndIdTickerIdIn(
        String sourceId,
        Set<String> tickerId,
        PageRequest pageRequest
    );
}
The method uses a union to fetch data, which I do not like.
Is there a way to make this work in JPA or Hibernate by changing any internal code (that is, without changing my setup, so that a similar method does not use unions)?
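The serial behaviour described above (read one table in pages, then move on to the next) can be pictured with a minimal plain-Java sketch. It is not JPA or Hibernate code and does not change how a repository method is executed. SequentialPager and its (offset, limit) sources are hypothetical names; each source stands in for a paged query against a single table.

```java
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical helper: reads several sources one after another instead of via UNION ALL. */
final class SequentialPager<T> {
    // Each source stands in for one per-table query: (offset, limit) -> rows.
    private final List<BiFunction<Integer, Integer, List<T>>> sources;
    private final int pageSize;
    private int sourceIndex = 0; // which table is currently being read
    private int offset = 0;      // offset inside the current table

    SequentialPager(List<BiFunction<Integer, Integer, List<T>>> sources, int pageSize) {
        this.sources = sources;
        this.pageSize = pageSize;
    }

    /** Returns the next non-empty page, or an empty list once every source is exhausted. */
    List<T> nextPage() {
        while (sourceIndex < sources.size()) {
            List<T> rows = sources.get(sourceIndex).apply(offset, pageSize);
            if (rows.size() < pageSize) {
                // Short page: this table is finished, continue with the next one.
                sourceIndex++;
                offset = 0;
            } else {
                offset += rows.size();
            }
            if (!rows.isEmpty()) {
                return rows;
            }
        }
        return List.of();
    }
}
```

In a Spring Data setup each source could delegate to a per-table repository method; that wiring is an assumption and is not shown. Only one source is queried per call unless it turns out to be empty.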

Related

Method in Entity needs to load all data from aggregate, how to optimize this?

I have a problem with an aggregate that will grow over time.
One day there will be thousands of records and performance will be bad.
@Entity
public class Serviceman ... {
    @ManyToMany(mappedBy = "servicemanList")
    private List<ServiceJob> services = new ArrayList<>();
    ...
    public Optional<ServiceJob> firstServiceJobAfterDate(LocalDateTime dateTime) {
        return services.stream().filter(i -> i.getStartDate().isAfter(dateTime))
            .min(Comparator.comparing(ServiceJob::getStartDate));
    }
}
The method loads every ServiceJob just to return one of them.
Maybe I should delegate this method to a service that uses native SQL.
You have to design small aggregates instead of large ones.
This essay explains in detail how to do it: http://dddcommunity.org/library/vernon_2011/. It explains how to decompose your aggregates to smaller ones so you can manage the complexity.
In your case, instead of having an aggregate consisting of two entities, Serviceman and Servicejob, with Serviceman being the aggregate root, you can decompose it into two smaller aggregates with a single entity each. ServiceJob will reference Serviceman by ID and you can use ServicejobRepository to make queries.
In your example you will have ServicejobRepository.firstServiceJobAfterDate(guid servicemanID, DateTime date).
This way, if you have a lot of entities and you need to scale, you can store Servicejob entities on another DB server.
If for some reason Serviceman or Servicejob need references to each other to do their work, you can use a Service that uses ServicemanRepository and ServicejobRepository to get both aggregates and pass them to one another so they can do their work.
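A minimal plain-Java sketch of that split, under stated assumptions: ServiceJob is its own aggregate holding only the Serviceman's ID, and the repository (written ServiceJobRepository here, to match the ServiceJob class) exposes the query. The in-memory implementation is a hypothetical stand-in; a real repository would push the filter and ordering into the database query, so that only one row is loaded.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

// ServiceJob is its own aggregate and points at Serviceman by ID only.
record ServiceJob(UUID id, UUID servicemanId, LocalDateTime startDate) {}

// Repository contract for the query that used to live on the Serviceman entity.
interface ServiceJobRepository {
    Optional<ServiceJob> firstServiceJobAfterDate(UUID servicemanId, LocalDateTime date);
}

// In-memory stand-in, for illustration only (assumption, not a persistence layer).
final class InMemoryServiceJobRepository implements ServiceJobRepository {
    private final List<ServiceJob> jobs = new ArrayList<>();

    void save(ServiceJob job) {
        jobs.add(job);
    }

    @Override
    public Optional<ServiceJob> firstServiceJobAfterDate(UUID servicemanId, LocalDateTime date) {
        // Earliest job of this serviceman that starts strictly after the given date.
        return jobs.stream()
                .filter(j -> j.servicemanId().equals(servicemanId))
                .filter(j -> j.startDate().isAfter(date))
                .min(Comparator.comparing(ServiceJob::startDate));
    }
}
```

The Serviceman entity no longer holds a List<ServiceJob>, so loading a Serviceman does not pull in its jobs.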

Optimizing Hibernate layer

After moving the logic from a legacy application (SQL/ColdFusion) to Spring REST with Hibernate, we have experienced slowness in the application. The main reason is that Hibernate generates many queries for work we used to do with one single query in the legacy application (a two-page-long query).
Right now, I'm looking at selecting proper fetch strategies and trying to optimize the code. Could you please point me to any other areas I need to investigate to optimize the Hibernate layer, or give any other suggestions?
Try to use DTOs instead of entities (you can load DTOs directly from the database)
Review the loading strategy (Eager or Lazy)
Try to use Native Queries more
Try to use more parameters to restrict the result set
You also can leverage some caching technique (cache all static data)
Try to implement hashCode and equals for each entity
If you use HQL queries, add 'join fetch'; it avoids the N+1 query problem.
e.g.:
select a from Model a
inner join fetch a.b as b
Add indexes for columns that are used in the where condition.
e.g.: add an index for the column 'name', which is used in the where condition below.
select a from Model a where a.name ='x'
See the links below:
http://www.thoughts-on-java.org/tips-to-boost-your-hibernate-performance/
https://docs.jboss.org/hibernate/orm/3.3/reference/en/html/performance.html
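The N+1 effect that 'join fetch' is meant to avoid can be shown without a database. The sketch below is a toy in plain Java, not Hibernate code: CountingStore is a hypothetical in-memory stand-in that only counts round trips, so the per-parent access pattern can be compared with a single combined fetch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy data source (hypothetical) that counts round trips instead of talking to a database. */
final class CountingStore {
    private final Map<Integer, List<String>> childrenByParent = new HashMap<>();
    int queries = 0;

    void add(int parentId, String child) {
        childrenByParent.computeIfAbsent(parentId, k -> new ArrayList<>()).add(child);
    }

    // One round trip: the ids of all parents.
    List<Integer> findParentIds() {
        queries++;
        return new ArrayList<>(childrenByParent.keySet());
    }

    // One round trip per call: what lazy loading amounts to for each parent.
    List<String> findChildren(int parentId) {
        queries++;
        return childrenByParent.getOrDefault(parentId, List.of());
    }

    // One round trip for everything: the analogue of a join fetch.
    Map<Integer, List<String>> findParentsWithChildren() {
        queries++;
        return new HashMap<>(childrenByParent);
    }
}
```

Iterating findParentIds() and calling findChildren() for each id costs 1 + N round trips for N parents, while findParentsWithChildren() costs one.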

How to obtain a generic search/find method with spring data?

All I need is to provide all my repositories with a generic search/find method.
Something like this:
public interface BaseRepository<T, ID extends Serializable>
        extends PagingAndSortingRepository<T, ID> {
    Iterable<T> search(SearchParameters sp);
}
where the SearchParameters object represents a set of values for each property, and probably a condition to apply on them.
JPA Criteria is probably the way to go, but I'm really having a hard time finding something that fits my needs.
I used an approach that goes in the same direction, but I would call it dynamic rather than generic. It is now working pretty well, and we are able to generate all desired filters automatically just by supplying the search entity. I also thought the Criteria API was the way to go, but after a while it got too messy with all the side effects, so I turned around and built the query string with parameters myself.
I created an entity scanner that takes all domain entities and generates filter definition objects for each desired filter. This scanner takes an entity and follows its properties up to a certain depth (to keep the number of filters at bay). I cannot give you the code here since it belongs to a customer, but I can describe the approach.
What I needed in the filter definition is this: entity type, property path, property type, values expression in case we render options (think master data), joins needed (to avoid joining the same tables several times), open/closed bracket. This is the definition of a filter.
Then you need a value object holding the current configuration of a user: input value, operator (>=), brackets, filter link (and/or).
With this we can render a completely dynamic filter engine with some small limitations. For example, I have not yet implemented parent searches of the same entity.
You might start simple and generate a subquery for each filter, like: where id in (select ....) and/or id in (select ...). This works fine if the number of entities is not too high, but you will feel the performance penalty of several subqueries if the domain entity table has many rows.
Then you dive in and find a way to separate out the joins needed for a property path, and in the query creator you work out how to join an entity again only when necessary.
As I said, start simple. Take first-level properties of simple types like String and create your query engine. Enhance it by following specific entity joins; after that you can go crazy and introduce expressions that fetch options for rendering a select, or use the conversion service for input parameters, and so on.
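The "start simple" step could look roughly like the sketch below: first-level properties only, every filter linked with "and", and values bound as named parameters. It is not the customer code mentioned above. Filter, BuiltQuery and QueryBuilder are hypothetical names, and property paths are assumed to come from a trusted scanner rather than from user input.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, reduced filter: property path, comparison operator, user-supplied value.
record Filter(String propertyPath, String operator, Object value) {}

// Result of the build step: query text plus the named parameters to bind.
record BuiltQuery(String jpql, Map<String, Object> parameters) {}

final class QueryBuilder {
    private static final List<String> ALLOWED_OPERATORS =
            List.of("=", "<>", ">", ">=", "<", "<=", "like");

    static BuiltQuery build(String entityName, List<Filter> filters) {
        StringBuilder jpql = new StringBuilder("select e from " + entityName + " e");
        Map<String, Object> params = new LinkedHashMap<>();
        for (int i = 0; i < filters.size(); i++) {
            Filter f = filters.get(i);
            // Operators are checked against a whitelist; values never enter the query text.
            if (!ALLOWED_OPERATORS.contains(f.operator())) {
                throw new IllegalArgumentException("Unsupported operator: " + f.operator());
            }
            String name = "p" + i;
            jpql.append(i == 0 ? " where " : " and ");
            jpql.append("e.").append(f.propertyPath())
                .append(' ').append(f.operator())
                .append(" :").append(name);
            params.put(name, f.value());
        }
        return new BuiltQuery(jpql.toString(), params);
    }
}
```

The returned text and parameter map could then be handed to whatever query API is in use. Brackets, or-links and joins along nested property paths are left out on purpose.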

Performance issue using Foreach in LINQ

I am using an IList<Employee> that holds more than 5000 records retrieved with LINQ. How could this be done better? empdetailsList has 5000 entries.
Example :
foreach(Employee emp in empdetailsList)
{
    Employee employee=new Employee();
    employee=Details.GetFeeDetails(emp.Emplid);
}
The above example takes a lot of time to iterate over each entry of empdetailsList, because I need to get the corresponding fees list for each one.
Can anybody suggest what to do?
Linq to SQL/Linq to Entities use a deferred execution pattern. As soon as you call For Each or anything else that indirectly calls GetEnumerator, that's when your query gets translated into SQL and performed against the database.
The trick is to make sure your query is completely and correctly defined before that happens. Use Where(...), and the other Linq filters to reduce as much as possible the amount of data the query will retrieve. These filters are built into a single query before the database is called.
Linq to SQL/Linq to Entities also both use Lazy Loading. This is where if you have related entities (like Sales Order --> has many Sales Order Lines --> has 1 Product), the query will not return them unless it knows it needs to. If you did something like this:
Dim orders = entities.SalesOrders
For Each o in orders
    For Each ol in o.SalesOrderLines
        Console.WriteLine(ol.Product.Name)
    Next
Next
You will get awful performance, because at the time of calling GetEnumerator (the start of the For Each), the query engine doesn't know you need the related entities, so "saves time" by ignoring them. If you observe the database activity, you'll then see hundreds/thousands of database roundtrips as each related entity is then retrieved 1 at a time.
To avoid this problem, if you know you'll need related entities, use the Include() method in Entity Framework. If you've got it right, when you profile the database activity you should only see a single query being made, and every item being retrieved by that query should be used for something by your application.
If the call to Details.GetFeeDetails(emp.Emplid); involves another round-trip of some sort, then that's the issue. I would suggest altering your query in this case to return fee details with the original IList<Employee> query.

preventing OpenJPA N+1 select performance problem on maps

When I have an entity that contains a Map, e.g.
@Entity
public class TestEntity {
    @ElementCollection(fetch = FetchType.EAGER)
    Map<String, String> strings = new HashMap<String, String>();
}
and I select multiple entities (SELECT z FROM TestEntity z), OpenJPA 2.0 performs one query for each TestEntity to fetch the map, even though I used FetchType.EAGER. This also happens when the Map value is an entity and I use @OneToMany instead of @ElementCollection. In principle this can be done more efficiently with one query that selects all the map entries for all returned TestEntities. For Collection-valued fields OpenJPA already does this by default ("openjpa.jdbc.EagerFetchMode" value="parallel") but it seems to fail on this simple entity. (Same problem with value="join").
Could I be doing something wrong? Is there an easy way to tell OpenJPA to not perform a query per entity but only one?
Or is there already any work planned on improving this (I filed it under https://issues.apache.org/jira/browse/OPENJPA-1920)?
It is a problem for us because we wish to fetch (and detach) a list of about 1900 products which takes almost 15 seconds with OpenJPA. It takes less than a second with my own native query.
Having to write only one native query wouldn't be much of a problem but the map we use is inside a reusable StringI18N entity which is referenced from several different entities (and can be deep in the object graph), so native queries are a maintenance headache.
Any help getting performance up is greatly appreciated.
EDIT: explicitly using JOIN FETCH does not help either:
"SELECT z FROM TestEntity z JOIN FETCH z.strings"
OpenJPA's TRACE still shows that it executes one SQL statement for each individual TestEntity.
It might be a pain (correction: I know it'll be a pain) but have you tried actually mapping your 2-field TestEntity as a full JPA-persisted @Entity?
I know that Hibernate used to treat @ElementCollections rather differently to @OneToManys for example - OpenJPA could well be doing something similar.

Resources