I added pagination to a Criteria API query and noticed increased memory usage during query execution. I also used a spring-data-jpa method query to return the same result, but there the memory is cleaned up after every batch is processed. I tried detaching, flushing, and clearing objects from the EntityManager, but memory use keeps going up; it occasionally drops, but not as much as with method queries. My question is: what could cause this memory use if the objects are detached, and how do I deal with it?
Memory usage with Criteria API pageable:
Memory usage with method query:
Code
Since I'm also updating entities retrieved from the DB, I save the ID of the last processed entity, so that when an entity gets updated the query doesn't skip the next selected page. The code below is not from the real app I'm working on; it is just a recreation of the issue I'm having.
Repository code:
@Override
public Slice<Player> getPlayers(int lastId, Pageable pageable) {
    List<Predicate> predicates = new ArrayList<>();
    CriteriaBuilder criteriaBuilder = entityManager.getCriteriaBuilder();
    CriteriaQuery<Player> criteriaQuery = criteriaBuilder.createQuery(Player.class);
    Root<Player> root = criteriaQuery.from(Player.class);
    predicates.add(criteriaBuilder.greaterThan(root.get("id"), lastId));
    criteriaQuery.where(criteriaBuilder.and(predicates.toArray(Predicate[]::new)));
    criteriaQuery.orderBy(criteriaBuilder.asc(root.get("id")));
    var query = entityManager.createQuery(criteriaQuery);
    if (pageable.isPaged()) {
        int pageSize = pageable.getPageSize();
        int offset = pageable.getPageNumber() > 0 ? pageable.getPageNumber() * pageSize : 0;
        // Fetch additional element and skip it based on the pageSize to know hasNext value.
        query.setMaxResults(pageSize + 1);
        query.setFirstResult(offset);
        var resultList = query.getResultList();
        boolean hasNext = pageable.isPaged() && resultList.size() > pageSize;
        return new SliceImpl<>(hasNext ? resultList.subList(0, pageSize) : resultList, pageable, hasNext);
    } else {
        return new SliceImpl<>(query.getResultList(), pageable, false);
    }
}
Iterating through pageables:
@Override
public Slice<Player> getAllPlayersPageable() {
    int lastId = 0;
    boolean hasNext = false;
    Pageable pageable = PageRequest.of(0, 200);
    do {
        var players = playerCriteriaRepository.getPlayers(lastId, pageable);
        if (!players.isEmpty()) {
            lastId = players.getContent().get(players.getContent().size() - 1).getId();
            for (var player : players) {
                System.out.println(player.getFirstName());
                entityManager.detach(player);
            }
        }
        hasNext = players.hasNext();
    } while (hasNext);
    return null;
}
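The iteration scheme above (filter on the last seen ID, read one extra row to learn whether another page follows) can be sketched without a database. This is only an in-memory illustration: the class KeysetPagingSketch, its IdSlice holder and the sorted list of IDs are hypothetical stand-ins, not part of the original app.

```java
import java.util.ArrayList;
import java.util.List;

class KeysetPagingSketch {

    /** One page of ids plus a flag telling whether more rows follow. */
    static final class IdSlice {
        final List<Integer> content;
        final boolean hasNext;

        IdSlice(List<Integer> content, boolean hasNext) {
            this.content = content;
            this.hasNext = hasNext;
        }
    }

    /**
     * Returns up to pageSize ids greater than lastId from a list sorted ascending.
     * One extra element is read so hasNext can be computed without a count query.
     */
    static IdSlice nextSlice(List<Integer> sortedIds, int lastId, int pageSize) {
        List<Integer> fetched = new ArrayList<>();
        for (int id : sortedIds) {
            if (id > lastId) {
                fetched.add(id);
                if (fetched.size() == pageSize + 1) {
                    break; // look-ahead row reached, stop reading
                }
            }
        }
        boolean hasNext = fetched.size() > pageSize;
        List<Integer> content = hasNext ? new ArrayList<>(fetched.subList(0, pageSize)) : fetched;
        return new IdSlice(content, hasNext);
    }

    /** Walks all slices and returns how many ids were visited in total. */
    static int visitAll(List<Integer> sortedIds, int pageSize) {
        int lastId = 0;
        int visited = 0;
        IdSlice slice;
        do {
            slice = nextSlice(sortedIds, lastId, pageSize);
            if (!slice.content.isEmpty()) {
                lastId = slice.content.get(slice.content.size() - 1);
                visited += slice.content.size();
            }
        } while (slice.hasNext);
        return visited;
    }
}
```

Because every slice starts from the last seen ID rather than from a row offset, updating already processed rows does not shift the following slices.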
I think you are running into a query plan cache issue here that is related to the use of the JPA Criteria API and how numeric values are handled. Hibernate will render all numeric values as literals into an intermediary HQL query string which is then compiled. As you can imagine, every "scroll" to the next page will be a new query string so you gradually fill up the query plan cache.
One possible solution is to use a library like Blaze-Persistence which has a custom JPA Criteria API implementation and a Spring Data integration that will avoid these issues and at the same time improve the performance of your queries due to a better pagination implementation.
All your code would stay the same, you just have to include the integration and configure it as documented in the setup section.
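To illustrate why literals in the query text can fill a plan cache, here is a toy model in plain Java. It is not Hibernate's actual cache: PlanCacheSketch and the query strings are hypothetical, and the map simply keeps one entry per distinct query string. In Criteria API terms, the second variant would roughly correspond to binding the value through a ParameterExpression from CriteriaBuilder.parameter(...); whether that resolves the memory growth in the original application is not confirmed here.

```java
import java.util.HashMap;
import java.util.Map;

class PlanCacheSketch {

    // Toy stand-in for a query plan cache: one entry per distinct query string.
    private final Map<String, String> plans = new HashMap<>();

    /** "Compiles" the query on first sight and reuses the plan afterwards. */
    String planFor(String queryString) {
        return plans.computeIfAbsent(queryString, q -> "plan for [" + q + "]");
    }

    int size() {
        return plans.size();
    }

    /** Pages through with the last id rendered as a literal in the query text. */
    static int entriesWithLiterals(int pages, int pageSize) {
        PlanCacheSketch cache = new PlanCacheSketch();
        for (int page = 0; page < pages; page++) {
            int lastId = page * pageSize;
            cache.planFor("select p from Player p where p.id > " + lastId + " order by p.id");
        }
        return cache.size();
    }

    /** Pages through with the last id passed as a bind parameter instead. */
    static int entriesWithParameter(int pages) {
        PlanCacheSketch cache = new PlanCacheSketch();
        for (int page = 0; page < pages; page++) {
            // The value would be supplied separately, so the text never changes.
            cache.planFor("select p from Player p where p.id > :lastId order by p.id");
        }
        return cache.size();
    }
}
```

With 50 pages the literal variant ends up with 50 cache entries, while the parameter variant keeps a single one.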
Related
I am writing a pagination function in my repository layer. The function receives Spring's Pageable object and some value as parameters, like this:
public Page<Foo> filterFoo(Pageable pageable, String value) {
    CriteriaBuilder cb = entityManager.getCriteriaBuilder();
    CriteriaQuery<Foo> fooQuery = cb.createQuery(Foo.class);
    Root<Foo> foo = fooQuery.from(Foo.class);
    fooQuery.where(adding predicate for match value);
    List<Foo> result = entityManager.createQuery(fooQuery)
            // Pageable page numbers are zero-based, so getOffset() already is page * size
            .setFirstResult((int) pageable.getOffset())
            .setMaxResults(pageable.getPageSize())
            .getResultList();
    return new PageImpl<>(result, pageable, xxxx);
}
The function returns Spring's PageImpl object filled with my result. For PageImpl I also need to set the total count of objects that match the predicates. This count must of course ignore maxResults and firstResult. Is it possible to make another database call with my fooQuery to get the total number of records for that query without the limit? What is the best practice for using Pageable and the Criteria API in JPA? Thank you in advance.
Because the generated SQL uses aliases, you may need to make a separate query to get the total row count.
For example:
CriteriaQuery<Long> countQuery = cb.createQuery(Long.class);
countQuery.select(cb.count(countQuery.from(Foo.class)));
if (Objects.nonNull(filters)) {
    countQuery.where(filters);
}
return new PageImpl<>(result, pageable, em.createQuery(countQuery).getSingleResult());
Here filters is the same as your "adding predicate for match value" expression.
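The split between the two queries can be pictured with an in-memory sketch: the count applies only the filter, while the page content applies the same filter plus offset and limit. CountAndPageSketch and its method names are hypothetical, and a list stands in for the table.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class CountAndPageSketch {

    /** Total number of matching rows; deliberately ignores offset and limit. */
    static <T> long countMatching(List<T> rows, Predicate<T> filter) {
        return rows.stream().filter(filter).count();
    }

    /** Content of one page: same filter, but with offset and limit applied. */
    static <T> List<T> pageContent(List<T> rows, Predicate<T> filter, long offset, int limit) {
        return rows.stream()
                .filter(filter)
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

The value from countMatching plays the role of the third PageImpl argument, and pageContent plays the role of the result list.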
Alternatively, you may use a Tuple query with a custom SQL function to calculate the row count in the same select query.
Like this:
public class SqlFunctionsMetadataBuilderContributor implements MetadataBuilderContributor {
    @Override
    public void contribute(MetadataBuilder metadataBuilder) {
        metadataBuilder.applySqlFunction(
            "count_over",
            new SQLFunctionTemplate(
                StandardBasicTypes.LONG,
                "(count(?1) over())"
            )
        );
    }
}
and Criteria:
public Page<Foo> findAll(Specification<Foo> specification, Pageable pageable) {
    CriteriaBuilder cb = em.getCriteriaBuilder();
    CriteriaQuery<Tuple> cq = cb.createTupleQuery();
    Root<Foo> fooRoot = cq.from(Foo.class);
    // second tuple element carries the total row count computed by the window function
    cq.select(cb.tuple(fooRoot, cb.function("count_over", Long.class, fooRoot.get("id"))));
    Predicate filters = specification.toPredicate(fooRoot, cq, cb);
    if (Objects.nonNull(filters)) {
        cq.where(filters);
    }
    TypedQuery<Tuple> query = em.createQuery(cq);
    query.setFirstResult((int) pageable.getOffset());
    query.setMaxResults(pageable.getPageSize());
    List<Tuple> result = query.getResultList();
    if (result.isEmpty()) {
        return new PageImpl<>(List.of());
    }
    return new PageImpl<>(
        result.stream().map(tuple -> (Foo) tuple.get(0)).collect(toUnmodifiableList()),
        pageable,
        (long) result.get(0).get(1)
    );
}
See more about SQLFunction: https://vladmihalcea.com/hibernate-sql-function-jpql-criteria-api-query/ and Custom SQL for Order in JPA Criteria API
During code optimization I found a few areas where I was using findOne() within a for loop:
public List<User> validateUsers(List<String> userIds) {
    List<User> validUsers = new ArrayList<>();
    for (String userId : userIds) {
        User user = userRepository.findOne(userId); //Network hit :: expensive call
        //Perform validations
        ...
        //Add valid users to validUsers list
        ...
    }
    return validUsers;
}
The above method takes a long time if I pass a huge list of users to validate (around 5 sec for 300 users).
Then I changed the method to use findAll() and perform the validations on the result collection:
public List<User> validateUsers(List<String> userIds) {
    List<User> validUsers = new ArrayList<>();
    Iterable<User> itr = userRepository.findAll(userIds); //Only one Network hit
    for (User user : itr) {
        //Perform validations
        ...
        //Add valid users to validUsers list
        ...
    }
    return validUsers;
}
Now for 300 users, results come back in 100 ms.
Question is: are there any side effects of using findAll(), considering the underlying structure of Cassandra? Also, I am using CrudRepository. Should I use CassandraRepository?
The following are the things to consider when attempting this:
How big the users table is, if you are using findAll.
The partition keys of the user table.
As Cassandra queries are faster on the primary key fields, findOne might perform better with a large amount of data.
However, can you try
List<T> findAllById(Iterable<ID> ids);
from org.springframework.data.cassandra.repository.CassandraRepository
Is it possible to fetch data in user-defined ranges [int starting record - int last record]?
In my case the user will define in the query string which range he wants to fetch.
I have tried something like this
Pageable pageable = new PageRequest(0, 10);
Page<Project> list = projectRepository.findAll(spec, pageable);
where spec is my defined specification, but unfortunately this does not help.
Maybe I am doing something wrong here.
I have looked at other methods provided by Spring Data JPA, but none of them are of much help.
The user can enter something like this: localhost:8080/Section/employee? range{"columnName":name,"from":6,"to":20}
So this says to fetch employee data, and it will fetch the first 15 records (sorted by columnName; the sorting does not matter as of now).
If you can suggest something better, that would be great. If you think I have not provided enough information, please let me know and I will provide it.
Update: I do not want to use native or createQuery statements (unless I have no other option).
Maybe something like this:
Pageable pageable = new PageRequest(0, 10);
Page<Project> list = projectRepository.findAll(spec, new pageable(int startIndex,int endIndex){
// here my logic.
});
If you have better options, you can suggest me that as well.
Thanks.
Your approach didn't work because new PageRequest(0, 10) doesn't do what you think. As stated in the docs, the input arguments are page and size, not limit and offset.
As far as I know (and somebody correct me if I'm wrong), there is no "out of the box" support for what you need in the default Spring Data repositories. But you can create a custom implementation of Pageable that takes limit/offset parameters. Here is a basic example - Spring data Pageable and LIMIT/OFFSET
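The difference between (page, size) and (offset, limit) can be sketched with a few lines of arithmetic. The class RangeToOffsetSketch is hypothetical, and it assumes the "from"/"to" values in the question are 1-based and inclusive, which matches the "6 to 20 gives 15 records" reading; adjust if your range is 0-based or end-exclusive.

```java
class RangeToOffsetSketch {

    /** Rows skipped before the first requested row (from is assumed 1-based). */
    static int offsetFor(int from) {
        if (from < 1) {
            throw new IllegalArgumentException("from must be >= 1");
        }
        return from - 1;
    }

    /** Number of rows in the inclusive range from..to. */
    static int limitFor(int from, int to) {
        if (to < from) {
            throw new IllegalArgumentException("to must be >= from");
        }
        return to - from + 1;
    }

    /** Zero-based row offset at which (page, size) style paging starts. */
    static long offsetOfPage(int page, int size) {
        return (long) page * size;
    }
}
```

For the range 6 to 20 this gives offset 5 and limit 15, which no single (page, 15) request can express, since page offsets are always multiples of the size (0, 15, 30, and so on). That is why a custom Pageable carrying offset and limit directly is suggested.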
We can do this with pagination, passing the database table column name, the value, and the row range as below:
@Transactional(readOnly=true)
public List<Employee> queryEmployeeDetails(String columnName, String columnData, int startRecord, int endRecord) {
    // columnName is concatenated into the query text, so validate it against known property names first
    Query query = sessionFactory.getCurrentSession()
            .createQuery("from Employee emp where emp." + columnName + " = :columnData");
    query.setParameter("columnData", columnData);
    query.setFirstResult(startRecord);
    // setMaxResults takes a row count, not an end index
    query.setMaxResults(endRecord - startRecord);
    List<Employee> list = (List<Employee>) query.list();
    return list;
}
If I am understanding your problem correctly, you want your repository to allow the user to
1. Provide criteria for the query (through Specification)
2. Provide a column to sort by
3. Provide the range of results to retrieve.
If my understanding is correct, then:
In order to achieve 1., you can make use of JpaSpecificationExecutor from Spring Data JPA, which allows you to pass in a Specification for the query.
Both 2. and 3. are achievable in JpaSpecificationExecutor by use of Pageable. Pageable allows you to provide the starting index, the number of records, and the sorting columns for your query. You will need to implement your own range-based Pageable. PageRequest is a good reference for what you can implement (or you can extend it, I believe).
So I got this working as one of the answers suggested: I implemented my own Pageable and overrode getPageSize(), getOffset(), and getSort(). That's it (in my case I did not need more).
public Range(int startIndex, int endIndex, String sortBy) {
    this.startIndex = startIndex;
    this.endIndex = endIndex;
    this.sortBy = sortBy;
}

@Override
public int getPageSize() {
    if (endIndex == 0)
        return 0;
    return endIndex - startIndex;
}

@Override
public int getOffset() {
    return startIndex;
}

@Override
public Sort getSort() {
    if (sortBy != null && !sortBy.equalsIgnoreCase(""))
        return new Sort(Direction.ASC, sortBy);
    else
        return new Sort(Direction.ASC, "id");
}
where startIndex and endIndex are the indexes of the first and last record.
To use it:
repository.findAll(spec, new Range(0, 20, "id"));
There is no offset parameter you can simply pass. However there is a very simple solution for this:
int pageNumber = offset / limit; // integer division; exact only when offset is a multiple of limit
PageRequest pReq = PageRequest.of(pageNumber, limit);
The client just has to keep track of the offset instead of the page number. By this I mean your controller would receive the offset instead of the page number.
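A sketch of the offset-to-page conversion, assuming offset and limit are plain ints. Integer division only yields the right page when the offset is a multiple of the limit, so this version checks that first. OffsetToPageSketch and its methods are hypothetical helper names, not Spring Data API.

```java
class OffsetToPageSketch {

    /** True when the offset falls exactly on a page boundary for this limit. */
    static boolean isAligned(int offset, int limit) {
        return limit > 0 && offset >= 0 && offset % limit == 0;
    }

    /** Page index for an aligned offset; rejects offsets a page cannot express. */
    static int pageNumber(int offset, int limit) {
        if (!isAligned(offset, limit)) {
            throw new IllegalArgumentException(
                    "offset " + offset + " is not a multiple of limit " + limit);
        }
        return offset / limit;
    }
}
```

The result of pageNumber(offset, limit) can be passed to PageRequest.of(pageNumber, limit). For offsets that are not aligned, a custom Pageable that returns the offset directly is the safer route.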
Hope this helps!
While trying to optimise one of the big queries against our GAE/J datastore, we found some unexpected results during an experiment.
The code itself is pretty straightforward; I have copied it below:
public static CursorEntityWrapper<Media> getMediasByUsernameCursored(String username, String cursorString, int batchSize, MediaFetchGroup... fetchGroups) {
    PersistenceManager pm = PMF.get().getPersistenceManager();
    List<Media> result = new ArrayList<>();
    Transaction tx = pm.currentTransaction();
    try {
        DBUtils.applyFetchGroups(pm, fetchGroups);
        tx.begin();
        Query query = pm.newQuery(Media.class);
        query.getFetchPlan().setFetchSize(batchSize);
        if (StringUtils.isNotBlank(cursorString)) {
            Cursor c = Cursor.fromWebSafeString(cursorString);
            Map<String, Object> extensionMap = new HashMap<>();
            extensionMap.put(JDOCursorHelper.CURSOR_EXTENSION, c);
            query.setExtensions(extensionMap);
        }
        query.setFilter("mediaUserString == userParam");
        query.declareParameters("java.lang.String userParam");
        query.setRange(0, batchSize);
        result = (List<Media>) query.execute(username);
        Cursor cursor = JDOCursorHelper.getCursor(result);
        cursorString = cursor.toWebSafeString();
        tx.commit();
    } finally {
        DBUtils.closeTransaction(tx, log);
        DBUtils.closePMAndDetachResult(pm, result);
    }
    return results;
}
As you can see, we have a query method that returns all Media entities for a username. This entity has some child entities, which you can request through the optional fetch-group parameters.
We got very poor performance from this query when asking for one of the large fetch groups. For a result of 60 entities it takes about 10 to 15 seconds to return, and when the result has 200 entities the response can take up to 40 seconds.
One interesting experiment: we thought the transaction would have a major impact on performance, so we removed the transaction from the method and deployed the application again. But it seems we were wrong: the method became even slower without the transaction.
So my question is: why is a query without a transaction slower than one with a transaction?
I'm using Spring Data JPA with Querydsl. I have a method that returns query results in pages containing the total count. Getting the total count is expensive and I would like to cache it. How is that possible?
My naive approach
@Cacheable("queryCount")
private long getCount(JPAQuery query) {
    return query.count();
}
does not work (to make it work the way I wanted, the actual cache key should not be the whole query, just the criteria). Anyway, I tested it, it did not work, and then I found this: Spring 3.1 @Cacheable - method still executed
The way I understand this, I can only cache public interface methods. However, in said method I would need to cache a property of the return value, e.g.
Page<T> findByComplexProperty(...)
I would need to cache
page.getTotalElements();
Annotating the whole method works (it is cached) but not the way I would like. Assume getting the total count takes 30 seconds. Then for every new page request the user needs to wait 30 seconds. If he goes back a page, the cache is used, but I want the count to be run exactly once and then fetched from the cache.
How can I do that?
My solution was to autowire the cache manager in the class creating the complex query:
@Autowired
private CacheManager cacheManager;
and then create a simple private method getCount
private long getCount(JPAQuery query) {
    Predicate whereClause = query.getMetadata().getWhere();
    String key = whereClause.toString();
    Cache cache = this.cacheManager.getCache(QUERY_CACHE);
    Cache.ValueWrapper value = cache.get(key);
    if (value == null) {
        Long result = query.count();
        cache.put(key, result);
        return result;
    }
    return (Long) value.get();
}
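The same idea (key the cache on the criteria only, so the count runs once and every later page reuses it) can be sketched in plain Java. CountCacheSketch is a hypothetical stand-in that uses a ConcurrentHashMap where the answer uses Spring's CacheManager, and a LongSupplier in place of query.count().

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

class CountCacheSketch {

    // Keyed by the criteria text only, so all pages of one search share an entry.
    private final Map<String, Long> counts = new ConcurrentHashMap<>();

    /** Returns the cached count, running the expensive count at most once per key. */
    long getCount(String criteriaKey, LongSupplier expensiveCount) {
        return counts.computeIfAbsent(criteriaKey, k -> expensiveCount.getAsLong());
    }
}
```

Unlike a real cache, this map never evicts or expires entries, so a stale count stays until the application restarts; a production setup would still want the eviction settings of the configured cache.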