Spring Data Elasticsearch repository with Pageable is returning only 10000 documents - spring-boot

I have an index with 17364 documents in Elasticsearch:
$curl http://localhost:9200/performance/_count
{"count":17364,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}
Spring Data repository:
public interface TestRepository extends ElasticsearchRepository<Transaction, String> {
}
Fetch all documents page by page and print:
public void testReport() throws JsonProcessingException {
int page = 0, pageSize = 1000;
Pageable of = PageRequest.of(page, pageSize);
Page<Transaction> all = testRepository.findAll(of);
int numberOfPages = all.getTotalPages();
log.info("All pages: {}, {}", numberOfPages, all.getTotalElements());
while (true) {
log.info("Current page: {}, {}", of.getPageNumber(), of.getPageSize());
for (Transaction txn : all) {
log.info(mapper.writeValueAsString(txn));
}
if (!all.hasNext()) {
break;
}
// request the next page and replace the current one
of = all.nextPageable();
all = testRepository.findAll(of);
}
}
This code is returning only 10000 documents although the index has 17364 documents. Could you please help me to find why this is happening.
ElasticSearch Version: 7.9
spring-boot-starter-parent: 2.3.2.RELEASE

I see two options:
A. Since you have only 17364 documents, you could increase the index.max_result_window setting of your index (default 10000) to, e.g., 20000, so that you can paginate to the end:
PUT performance/_settings
{
"index.max_result_window": 20000
}
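To see why paging stops at 10000, here is a small plain-Java model of the from/size check (no Elasticsearch client involved; ResultWindow and its methods are hypothetical names). It assumes the rule that a request is rejected when from + size exceeds index.max_result_window, whose default is 10000.

```java
import java.util.ArrayList;
import java.util.List;

class ResultWindow {
    // Elasticsearch's default value for index.max_result_window.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // A from/size request is accepted only if from + size <= max_result_window.
    static boolean isAllowed(int page, int pageSize, int maxResultWindow) {
        long from = (long) page * pageSize;
        return from + pageSize <= maxResultWindow;
    }

    // 0-based page numbers that a plain from/size loop can request for totalDocs documents.
    static List<Integer> reachablePages(long totalDocs, int pageSize, int maxResultWindow) {
        List<Integer> pages = new ArrayList<>();
        long totalPages = (totalDocs + pageSize - 1) / pageSize;
        for (int page = 0; page < totalPages; page++) {
            if (isAllowed(page, pageSize, maxResultWindow)) {
                pages.add(page);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        // 17364 documents in pages of 1000 means 18 pages in total.
        System.out.println(reachablePages(17_364, 1000, DEFAULT_MAX_RESULT_WINDOW).size()); // 10
        System.out.println(reachablePages(17_364, 1000, 20_000).size());                    // 18
    }
}
```

With the default window only pages 0 to 9 (10000 documents) are reachable; raising the window to 20000 makes all 18 pages reachable.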
B. If you have a bigger index and/or increasing the index.max_result_window limit is not an option for any reason, then you need to leverage the Scroll API. Spring Data Elasticsearch supports two ways of doing that.
The first method uses the ElasticsearchTemplate.searchForStream() method, which internally uses the Scroll API:
SearchHitsIterator<Transaction> stream = elasticsearchTemplate.searchForStream(searchQuery, Transaction.class, IndexCoordinates.of("performance"));
The second method is a bit more low-level. You need to add a method that returns a Stream to your repository definition:
public interface TestRepository extends ElasticsearchRepository<Transaction, String> {
Stream<Transaction> findScrollAll();
}
Then implement that method with ElasticsearchTemplate.searchScrollStart() and ElasticsearchTemplate.searchScrollContinue().
Addition, a 3rd option:
Just define a method
Stream<SearchHit<Transaction>> searchBy();
in your TestRepository, or one with just the return type Stream<Transaction>.
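The start/continue pattern behind searchScrollStart() and searchScrollContinue() can be modelled without Elasticsearch: open a scroll, keep asking for the next batch with the same scroll id until a batch comes back empty, then release the scroll context. In this sketch FakeScrollIndex, Batch and readAll are hypothetical stand-ins; with Spring Data Elasticsearch you would call the template methods instead and should check their exact signatures for your version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ScrollModel {
    // One batch of hits plus the scroll id to continue from (hypothetical type).
    record Batch(String scrollId, List<Integer> hits) {}

    // Hypothetical in-memory stand-in for an index that supports scrolling.
    static class FakeScrollIndex {
        private final int totalDocs;
        private final Map<String, Integer> cursors = new HashMap<>();
        private int nextId = 0;

        FakeScrollIndex(int totalDocs) {
            this.totalDocs = totalDocs;
        }

        // Opens a scroll context and returns the first batch.
        Batch start(int batchSize) {
            String id = "scroll-" + (nextId++);
            cursors.put(id, 0);
            return next(id, batchSize);
        }

        // Returns the next batch of an open scroll; an empty batch means the end was reached.
        Batch next(String scrollId, int batchSize) {
            int pos = cursors.get(scrollId);
            List<Integer> hits = new ArrayList<>();
            for (int i = pos; i < Math.min(pos + batchSize, totalDocs); i++) {
                hits.add(i);
            }
            cursors.put(scrollId, pos + hits.size());
            return new Batch(scrollId, hits);
        }

        // Releases the scroll context on the "server".
        void clear(String scrollId) {
            cursors.remove(scrollId);
        }

        int openContexts() {
            return cursors.size();
        }
    }

    // Start, continue until an empty batch, and always clear the scroll context.
    static int readAll(FakeScrollIndex index, int batchSize) {
        int count = 0;
        Batch batch = index.start(batchSize);
        try {
            while (!batch.hits().isEmpty()) {
                count += batch.hits().size();
                batch = index.next(batch.scrollId(), batchSize);
            }
        } finally {
            index.clear(batch.scrollId());
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(readAll(new FakeScrollIndex(17_364), 1000)); // 17364
    }
}
```

Unlike from/size paging, no request ever needs a large from offset, which is why scrolling is not bound by max_result_window.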

Related

How to implement a list of DB update queries in one call with SpringBoot Webflux + R2dbc application

The goal of my Spring Boot WebFlux R2DBC application: the controller accepts a request containing a list of DB UPDATE or INSERT details and responds with a result summary.
I can write a ReactiveCrudRepository-based repository to implement each DB operation, but I don't know how to write the service that groups the execution of the list of DB operations and composes a result summary response.
I am new to Java reactive programming. Thanks for any suggestions and help.
Chen
I got the hint from here: https://www.vinsguru.com/spring-webflux-aggregation/ . The idea is:
1. From the request, create 3 Monos:
Mono<List<Long>> monoEndDateSet: DB row ids of the update operations;
Mono<List<Long>> monoCreateList: DB row ids of the newly inserted rows;
Mono<ChangeSupplyResponse> monoRespFilled: the response with some known fields already filled in.
2. Use Mono.zip to aggregate the 3 Monos, then map the Tuple3 into a single Mono<ChangeSupplyResponse> to return.
Below are the key parts of the code:
public Mono<ChangeSupplyResponse> ChangeSupplies(ChangeSupplyRequest csr){
ChangeSupplyResponse resp = ChangeSupplyResponse.builder().build();
resp.setEventType(csr.getEventType());
resp.setSupplyOperationId(csr.getSupplyOperationId());
resp.setTeamMemberId(csr.getTeamMemberId());
resp.setRequestTimeStamp(csr.getTimestamp());
resp.setProcessStart(OffsetDateTime.now());
resp.setUserId(csr.getUserId());
Mono<List<Long>> monoEndDateSet = getEndDateIdList(csr);
Mono<List<Long>> monoCreateList = getNewSupplyEntityList(csr);
Mono<ChangeSupplyResponse> monoRespFilled = Mono.just(resp);
return Mono.zip(monoRespFilled, monoEndDateSet, monoCreateList).map(this::combine).as(operator::transactional);
}
private ChangeSupplyResponse combine(Tuple3<ChangeSupplyResponse, List<Long>, List<Long>> tuple){
ChangeSupplyResponse resp = tuple.getT1().toBuilder().build();
List<Long> endDateIds = tuple.getT2();
resp.setEndDatedDemandStreamSupplyIds(endDateIds);
List<Long> newIds = tuple.getT3();
resp.setNewCreatedDemandStreamSupplyIds(newIds);
resp.setSuccess(true);
Duration span = Duration.between(resp.getProcessStart(), OffsetDateTime.now());
resp.setProcessDurationMillis(span.toMillis());
return resp;
}
private Mono<List<Long>> getNewSupplyEntityList(ChangeSupplyRequest csr) {
Flux<DemandStreamSupplyEntity> fluxNewCreated = Flux.empty();
for (SrmOperation so : csr.getOperations()) {
if (so.getType() == SrmOperationType.createSupply) {
DemandStreamSupplyEntity e = buildEntity(so, csr);
fluxNewCreated = fluxNewCreated.mergeWith(this.demandStreamSupplyRepository.save(e));
}
}
return fluxNewCreated.map(e -> e.getDemandStreamSupplyId()).collectList();
}
...
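The zip-then-combine shape of this answer (wait for several independent results, then merge them into one response) can be shown without Reactor using the JDK's CompletableFuture.thenCombine. It is used here only as a dependency-free stand-in for Mono.zip(...).map(this::combine): it is eager rather than lazy and gives no transactional guarantees. Summary and changeSupplies are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

class ZipCombineSketch {
    // Hypothetical summary object, standing in for ChangeSupplyResponse.
    record Summary(List<Long> endDatedIds, List<Long> createdIds, boolean success) {}

    // Combines two independent async results once both complete,
    // the same shape as zipping two Monos and mapping the tuple.
    static CompletableFuture<Summary> changeSupplies(
            CompletableFuture<List<Long>> endDated,
            CompletableFuture<List<Long>> created) {
        return endDated.thenCombine(created, (ends, news) -> new Summary(ends, news, true));
    }

    public static void main(String[] args) {
        CompletableFuture<List<Long>> endDated = CompletableFuture.supplyAsync(() -> List.of(1L, 2L));
        CompletableFuture<List<Long>> created = CompletableFuture.supplyAsync(() -> List.of(10L));
        System.out.println(changeSupplies(endDated, created).join());
    }
}
```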

Memory leak with Criteria API Pageable

I implemented pageable functionality in a Criteria API query and noticed increased memory usage during query execution. I also used a Spring Data JPA method query to return the same result, and there the memory is cleaned up after every batch is processed. I tried detaching, flushing and clearing objects from the EntityManager, but memory usage keeps going up; occasionally it drops, but not as much as with method queries. My question is: what could cause this memory usage if the objects are detached, and how can I deal with it?
Memory usage with Criteria API pageable:
Memory usage with method query:
Code
Since I'm also updating entities retrieved from the DB, I use an approach where I save the ID of the last processed entity, so that when an entity gets updated the query doesn't skip the next selected page. The code example below is not from the real app I'm working on; it is just a recreation of the issue I'm having.
Repository code:
@Override
public Slice<Player> getPlayers(int lastId, Pageable pageable) {
List<Predicate> predicates = new ArrayList<>();
CriteriaBuilder criteriaBuilder = entityManager.getCriteriaBuilder();
CriteriaQuery<Player> criteriaQuery = criteriaBuilder.createQuery(Player.class);
Root<Player> root = criteriaQuery.from(Player.class);
predicates.add(criteriaBuilder.greaterThan(root.get("id"), lastId));
criteriaQuery.where(criteriaBuilder.and(predicates.toArray(Predicate[]::new)));
criteriaQuery.orderBy(criteriaBuilder.asc(root.get("id")));
var query = entityManager.createQuery(criteriaQuery);
if (pageable.isPaged()) {
int pageSize = pageable.getPageSize();
int offset = pageable.getPageNumber() > 0 ? pageable.getPageNumber() * pageSize : 0;
// Fetch additional element and skip it based on the pageSize to know hasNext value.
query.setMaxResults(pageSize + 1);
query.setFirstResult(offset);
var resultList = query.getResultList();
boolean hasNext = pageable.isPaged() && resultList.size() > pageSize;
return new SliceImpl<>(hasNext ? resultList.subList(0, pageSize) : resultList, pageable, hasNext);
} else {
return new SliceImpl<>(query.getResultList(), pageable, false);
}
}
Iterating through pageables:
@Override
public Slice<Player> getAllPlayersPageable() {
int lastId = 0;
boolean hasNext = false;
Pageable pageable = PageRequest.of(0, 200);
do {
var players = playerCriteriaRepository.getPlayers(lastId, pageable);
if(!players.isEmpty()){
lastId = players.getContent().get(players.getContent().size() - 1).getId();
for(var player : players){
System.out.println(player.getFirstName());
entityManager.detach(player);
}
}
hasNext = players.hasNext();
} while (hasNext);
return null;
}
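The lastId technique above is keyset pagination: filter on id > lastId, order by id, fetch one extra row to compute hasNext, and keep the offset at zero. A plain-Java sketch of that loop, with an in-memory list of ids standing in for the table (KeysetPagination and its methods are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

class KeysetPagination {
    // One keyset "slice": the rows to process and whether more rows exist after them.
    record SliceResult(List<Integer> content, boolean hasNext) {}

    // Emulates "WHERE id > :lastId ORDER BY id LIMIT pageSize + 1" over ids sorted ascending.
    static SliceResult nextSlice(List<Integer> sortedIds, int lastId, int pageSize) {
        List<Integer> rows = new ArrayList<>();
        for (int id : sortedIds) {
            if (id > lastId) {
                rows.add(id);
                if (rows.size() == pageSize + 1) {
                    break; // the extra row is only used to detect hasNext
                }
            }
        }
        boolean hasNext = rows.size() > pageSize;
        return new SliceResult(hasNext ? rows.subList(0, pageSize) : rows, hasNext);
    }

    // Walks every slice: the offset stays at zero and only the lastId cursor moves.
    static int processAll(List<Integer> sortedIds, int pageSize) {
        int lastId = 0;
        int processed = 0;
        SliceResult slice;
        do {
            slice = nextSlice(sortedIds, lastId, pageSize);
            List<Integer> content = slice.content();
            if (!content.isEmpty()) {
                lastId = content.get(content.size() - 1);
                processed += content.size();
            }
        } while (slice.hasNext());
        return processed;
    }

    public static void main(String[] args) {
        List<Integer> ids = IntStream.rangeClosed(1, 1000).boxed().toList();
        System.out.println(processAll(ids, 200)); // 1000
    }
}
```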
I think you are running into a query plan cache issue here, related to the use of the JPA Criteria API and how numeric values are handled. Hibernate renders numeric values as literals into an intermediary HQL query string, which is then compiled. Since lastId changes on every page, every "scroll" to the next page produces a new query string, so you gradually fill up the query plan cache.
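A toy model of that effect: if each page's lastId ends up in the query text, every page produces a distinct cache key, whereas a bound parameter keeps a single key. The query strings and the LRU map below are illustrative stand-ins, not Hibernate's actual rendering or cache implementation, and the cache size is an arbitrary bound. Even with eviction, every new page is a cache miss and a fresh compilation, and the cache stays full of single-use plans.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PlanCacheSketch {
    // Bounded LRU map standing in for a query plan cache (illustrative only).
    static class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        LruCache(int maxEntries) {
            super(16, 0.75f, true); // access order: least recently used entries are evicted first
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
        }
    }

    // "Compiles" one query per page and returns the number of cached plans afterwards.
    static int cachedPlans(int pages, int pageSize, boolean inlineLiterals, int cacheSize) {
        LruCache<String, String> cache = new LruCache<>(cacheSize);
        for (int page = 0; page < pages; page++) {
            int lastId = page * pageSize;
            String query = inlineLiterals
                    ? "select p from Player p where p.id > " + lastId + " order by p.id"
                    : "select p from Player p where p.id > :lastId order by p.id";
            if (!cache.containsKey(query)) {
                cache.put(query, "compiled plan"); // a miss means another compilation
            }
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println(cachedPlans(50, 200, true, 2048));  // 50 distinct plans
        System.out.println(cachedPlans(50, 200, false, 2048)); // 1 plan, reused
    }
}
```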
One possible solution is to use a library like Blaze-Persistence which has a custom JPA Criteria API implementation and a Spring Data integration that will avoid these issues and at the same time improve the performance of your queries due to a better pagination implementation.
All your code would stay the same, you just have to include the integration and configure it as documented in the setup section.

Spring data JDBC query creation with pagination complains IncorrectResultSizeDataAccessException: Incorrect result size

I'm struggling to get the pagination feature working, as described in the reference documentation.
This is my table schema:
CREATE TABLE cities
(
id int PRIMARY KEY,
name varchar(255),
pref_id int
);
Repository:
public interface CityRepository extends CrudRepository<CityEntity, Integer> {
Page<CityEntity> findAll(Pageable pageable);
// get all cities in the prefecture
Page<CityEntity> findByPrefId(Integer prefId, Pageable pageable);
}
Test code:
Page<CityEntity> allCities = repository.findAll(PageRequest.of(0, 10));
Page<CityEntity> cities = repository.findByPrefId(1, PageRequest.of(0, 10));
findAll works well, but findByPrefId throws the following error:
Incorrect result size: expected 1, actual 10
org.springframework.dao.IncorrectResultSizeDataAccessException: Incorrect result size: expected 1, actual 10
at org.springframework.dao.support.DataAccessUtils.nullableSingleResult(DataAccessUtils.java:100)
at org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate.queryForObject(NamedParameterJdbcTemplate.java:237)
at org.springframework.data.jdbc.repository.query.AbstractJdbcQuery.lambda$singleObjectQuery$1(AbstractJdbcQuery.java:115)
at org.springframework.data.jdbc.repository.query.PartTreeJdbcQuery.execute(PartTreeJdbcQuery.java:98)
at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor$QueryMethodInvoker.invoke(QueryExecutorMethodInterceptor.java:195)
at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.doInvoke(QueryExecutorMethodInterceptor.java:152)
at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.invoke(QueryExecutorMethodInterceptor.java:130)
...
If I change the method signature into List<CityEntity> findByPrefId(Integer prefId, Pageable pageable), it works.
Am I missing something? I'm using the latest version of spring-data-jdbc (2.0.2.RELEASE).
I don't know the technical details, but this is what I learned from experience.
In your case, if the total number of cities is less than pageable.getPageSize(), your repository will return a List<>.
But if the total number of cities is bigger than pageable.getPageSize(), your repository will return a Page<>.
Knowing that, this is what I did to work around it.
Long amount = repository.countByPrefId(prefId);
if (pagination.getPageSize() > amount) {
List<CityEntity> list = repository.findByPrefId(prefId);
} else {
Page<CityEntity> pages = repository.findByPrefId(prefId, pagination);
}
This also means that in your repository you'll have two different methods: one with Pageable as a parameter and one with only prefId as a parameter.
I believe the accepted answer is referring to Spring Data JPA, which does return pages based on a count query derived from the custom query (or set manually via countQuery), so there is no reason for the if/else.
However this flat out does not work in Spring Data JDBC.
https://jira.spring.io/browse/DATAJDBC-554
A workaround is provided in the link, but for reference:
interface FooRepository extends PagingAndSortingRepository<FooEntity, Long> {
List<FooEntity> findAllByBar(String bar, Pageable pageable);
Long countAllByBar(String bar);
}
And then combining those 2 queries like this (passing the count as a lambda lets getPage skip the count query when the page content already determines the total):
List<FooEntity> fooList = repository.findAllByBar("...", pageable);
Page<FooEntity> fooPage = PageableExecutionUtils.getPage(fooList, pageable, () -> repository.countAllByBar("..."));
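PageableExecutionUtils.getPage accepts the count as a supplier so it can avoid the count query when the content alone determines the total. The plain-Java function below is a simplified model of that decision, written for illustration; it is not the Spring Data source, so check the class in your version for the exact behaviour. PageTotalSketch is a hypothetical name.

```java
import java.util.List;
import java.util.function.LongSupplier;

class PageTotalSketch {
    // Simplified model of deciding a page's total: the count query runs
    // only when the page content cannot determine the total by itself.
    static long total(List<?> content, long offset, int pageSize, LongSupplier countQuery) {
        if (offset == 0 && content.size() < pageSize) {
            return content.size(); // first page is not full, so it holds every result
        }
        if (!content.isEmpty() && content.size() < pageSize) {
            return offset + content.size(); // a partial later page is the last page
        }
        return countQuery.getAsLong(); // otherwise the total has to come from the database
    }

    public static void main(String[] args) {
        LongSupplier count = () -> {
            System.out.println("count query executed");
            return 42L;
        };
        System.out.println(total(List.of("a", "b"), 40, 10, count)); // 42, without a count query
        System.out.println(total(List.of("a", "b"), 0, 2, count));   // full page, so the count query runs
    }
}
```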

In spring data mongodb how to achieve pagination for aggregation

In Spring Data MongoDB, using MongoTemplate or MongoRepository, how do I achieve pagination for an aggregation?
This is an answer to an old post, but I'll provide an answer in case anyone else comes along while searching for something like this.
Building on the previous solution by Fırat KÜÇÜK: passing results.size() as the "total" argument of the PageImpl constructor will not make paging work the way you expect paging to work. It sets the total to (at most) the page size every time, so instead you need to find out the actual total number of results that your query would return:
public Page<UserListItemView> list(final Pageable pageable) {
long total = getCount(<your property name>, <your property value>);
final Aggregation agg = newAggregation(
skip(pageable.getPageNumber() * pageable.getPageSize()),
limit(pageable.getPageSize())
);
final List<UserListItemView> results = mongoTemplate
.aggregate(agg, User.class, UserListItemView.class)
.getMappedResults();
return new PageImpl<>(results, pageable, total);
}
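The effect of the total argument can be checked with the arithmetic a page object uses: totalPages = ceil(total / pageSize) and hasNext = pageNumber + 1 < totalPages. In this plain-Java sketch, PageMath is a hypothetical helper and 95 is an arbitrary example total.

```java
class PageMath {
    // Number of pages needed for totalElements items.
    static int totalPages(long totalElements, int pageSize) {
        return (int) ((totalElements + pageSize - 1) / pageSize);
    }

    // Whether a page after pageNumber (0-based) exists.
    static boolean hasNext(int pageNumber, long totalElements, int pageSize) {
        return pageNumber + 1 < totalPages(totalElements, pageSize);
    }

    public static void main(String[] args) {
        // Passing results.size() (10) as the total on page 0: looks like the only page.
        System.out.println(hasNext(0, 10, 10)); // false
        // Passing the real total (95): 10 pages, so a next page exists.
        System.out.println(hasNext(0, 95, 10)); // true
    }
}
```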
Now, the best way to get the total number of results is another question, and one that I am still trying to figure out. The method I tried (and it worked) was to run almost the same aggregation twice (once to get the total count, and again to get the actual results for paging), but using only the MatchOperation followed by a GroupOperation to get the count:
private long getCount(String propertyName, String propertyValue) {
MatchOperation matchOperation = match(Criteria.where(propertyName).is(propertyValue));
GroupOperation groupOperation = group(propertyName).count().as("count");
Aggregation aggregation = newAggregation(matchOperation, groupOperation);
return mongoTemplate.aggregate(aggregation, Foo.class, NumberOfResults.class).getMappedResults().get(0).getCount();
}
private static class NumberOfResults {
private int count;
public int getCount() {
return count;
}
public void setCount(int count) {
this.count = count;
}
}
It seems kind of inefficient to run nearly the same query twice, but if you are going to page results, the pageable object must know the total number of results if you really want it to behave like paging. If anyone can improve on my method to get the total count of results, that would be awesome!
Edit: This will also provide the count, and it is simpler because you do not need a wrapper object to hold the result, so you can replace the entire previous code block with this one:
private long getCount(String propertyName, String propertyValue) {
Query countQuery = new Query(Criteria.where(propertyName).is(propertyValue));
return mongoTemplate.count(countQuery, Foo.class);
}
In addition to ssouris' solution, you can use the Pageable classes for the results:
public Page<UserListItemView> list(final Pageable pageable) {
final Aggregation agg = newAggregation(
skip(pageable.getPageNumber() * pageable.getPageSize()),
limit(pageable.getPageSize())
);
final List<UserListItemView> results = mongoTemplate
.aggregate(agg, User.class, UserListItemView.class)
.getMappedResults();
return new PageImpl<>(results, pageable, results.size());
}
You can use MongoTemplate with
org.springframework.data.mongodb.core.aggregation.Aggregation#skip
and
org.springframework.data.mongodb.core.aggregation.Aggregation#limit:
Aggregation agg = newAggregation(
project("tags"),
skip(10),
limit(10)
);
AggregationResults<TagCount> results = mongoTemplate.aggregate(agg, "tags", TagCount.class);
List<TagCount> tagCount = results.getMappedResults();
Based on the answer https://stackoverflow.com/a/39784851/4546949, I wrote this in Java.
Use an aggregation group stage to get the count and an array of the data, along with the other paging information:
AggregationOperation group = Aggregation.group().count().as("total")
.addToSet(pageable.getPageNumber()).as("pageNumber")
.addToSet(pageable.getPageSize()).as("pageSize")
.addToSet(pageable.getOffset()).as("offset")
.push("$$ROOT").as("data");
Use an aggregation project stage to slice the data according to the paging information:
AggregationOperation project = Aggregation.project()
.andInclude("pageSize", "pageNumber", "total", "offset")
.and(ArrayOperators.Slice.sliceArrayOf("data").offset((int) pageable.getOffset()).itemCount(pageable.getPageSize()))
.as("data");
Use MongoTemplate to run the aggregation:
Aggregation aggr = newAggregation(group, project);
CustomPage page = mongoTemplate.aggregate(aggr, Foo.class, CustomPage.class).getUniqueMappedResult();
Create a CustomPage.
public class CustomPage {
private long pageSize;
private long pageNumber;
private long offset;
private long total;
private List<Foo> data;
}
Here is my generic solution:
public Page<ResultObject> list(Pageable pageable) {
// build your main stages
List<AggregationOperation> mainStages = Arrays.asList(match(....), group(....));
return pageAggregation(pageable, mainStages, "target-collection", ResultObject.class);
}
public <T> Page<T> pageAggregation(
final Pageable pageable,
final List<AggregationOperation> mainStages,
final String collection,
final Class<T> clazz) {
final List<AggregationOperation> stagesWithCount = new ArrayList<>(mainStages);
stagesWithCount.add(count().as("count"));
final Aggregation countAgg = newAggregation(stagesWithCount);
final Long count = Optional
.ofNullable(mongoTemplate.aggregate(countAgg, collection, Document.class).getUniqueMappedResult())
.map(doc -> ((Number) doc.get("count")).longValue())
.orElse(0L);
final List<AggregationOperation> stagesWithPaging = new ArrayList<>(mainStages);
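// an empty $sort stage is rejected by MongoDB, so only add it when a sort is requested
if (pageable.getSort().isSorted()) {
stagesWithPaging.add(sort(pageable.getSort()));
}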
stagesWithPaging.add(skip(pageable.getOffset()));
stagesWithPaging.add(limit(pageable.getPageSize()));
final Aggregation resultAgg = newAggregation(stagesWithPaging);
final List<T> result = mongoTemplate.aggregate(resultAgg, collection, clazz).getMappedResults();
return new PageImpl<>(result, pageable, count);
}
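The structure of pageAggregation (reuse the same main stages once for a count and once for sort/skip/limit) can be mirrored with Java streams. This is a model only: TwoPassPaging and PageResult are hypothetical, the stream operations stand in for aggregation stages, and in MongoDB each pass is a separate aggregation round trip.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class TwoPassPaging {
    record PageResult<T>(List<T> content, long total) {}

    // mainStages plays the role of the shared match/group stages and is applied twice.
    static <T> PageResult<T> page(List<T> source,
                                  Function<Stream<T>, Stream<T>> mainStages,
                                  Comparator<T> sort,
                                  long offset,
                                  int pageSize) {
        // Pass 1: main stages + count.
        long total = mainStages.apply(source.stream()).count();
        // Pass 2: main stages + sort + skip + limit.
        List<T> content = mainStages.apply(source.stream())
                .sorted(sort)
                .skip(offset)
                .limit(pageSize)
                .toList();
        return new PageResult<>(content, total);
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.rangeClosed(1, 50).boxed().toList();
        // "Match" even numbers, sort descending, third page of 5.
        PageResult<Integer> result = page(data, s -> s.filter(n -> n % 2 == 0),
                Comparator.reverseOrder(), 10, 5);
        System.out.println(result); // content [30, 28, 26, 24, 22], total 25
    }
}
```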
To return a Page object with the correct pageable values, I find this the simplest way:
Criteria criteria = Criteria.where("type").is("project");
Aggregation aggregation = Aggregation.newAggregation(Aggregation.match(criteria),
Aggregation.group("id").last("id").as("id"), Aggregation.project("id"),
Aggregation.skip(pageable.getOffset()),
Aggregation.limit(pageable.getPageSize()));
Page<Draft> page = PageableExecutionUtils.getPage(mongoTemplate.aggregate(aggregation, Draft.class, Draft.class).getMappedResults(), pageable, () -> mongoTemplate.count(new Query(criteria), Draft.class));
Another approach would be to extend the PagingAndSortingRepository<T, ID> interface. Then, you can create an @Aggregation query method like this:
@Aggregation(pipeline = {
"{ $match: { someField: ?0 } }",
"{ $project: { _id: 0, someField: 1} }"
})
List<StuffAggregateModel> aggregateStuff(final String somePropertyName, final Pageable pageable);
Just call this from your business logic service class and construct the Pageable (which also contains sort options, if desired) and call the repo method. I like this approach because of the simplicity and the sheer minimization of the amount of code that you have to write. If your query (aggregation pipeline) is simple enough, this is probably the best solution. Maintenance coding for this approach is nearly effortless.
My answer, using MongoDB $facet:
// User(_id, first name, etc), Car (user_id, brand, etc..)
LookupOperation lookupStageCar = Aggregation.lookup("cars", "_id", "user_id", "car");
MatchOperation matchStage = Aggregation.match(Criteria.where("car.user_id").exists(true));
CountOperation countOperation = Aggregation.count().as("total");
AddFieldsOperation addFieldsOperation = Aggregation.addFields().addFieldWithValue("page", pageable.getPageNumber()).build();
SkipOperation skipOperation = Aggregation.skip(pageable.getOffset());
LimitOperation limitOperation = Aggregation.limit(pageable.getPageSize());
// here the magic
FacetOperation facetOperation = Aggregation.facet( countOperation, addFieldsOperation).as("metadata")
.and(skipOperation, limitOperation).as("data");
// users with car
List<AggrigationResults> map = mongoTemplate.aggregate(Aggregation.newAggregation( lookupStageCar, matchStage, facetOperation), "User", AggrigationResults.class).getMappedResults();
———————————————————————————
public class AggrigationResults {
private List<Metadata> metadata;
private List<User> data;
}
public class Metadata {
private long total;
private long page;
}
———————————————————————————
output:
{
"metadata" : [
{
"total" : 300,
"page" : 3
}
],
"data" : [
{
... original document ...
},
{
... another document ...
},
{
... etc up to 10 docs ...
}
]
}
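Because $facet wraps the count in a metadata array, reading the total needs a guard: when nothing matches, the $count stage produces no document and metadata comes back as an empty list. A plain-Java sketch with hypothetical record types mirroring the Metadata and AggrigationResults classes above:

```java
import java.util.List;

class FacetResultSketch {
    // Hypothetical mirrors of the Metadata / AggrigationResults classes.
    record Metadata(long total, long page) {}
    record FacetResult<T>(List<Metadata> metadata, List<T> data) {}

    // Reads the total from the facet output, treating an empty metadata array as zero results.
    static long total(FacetResult<?> result) {
        return result.metadata().isEmpty() ? 0L : result.metadata().get(0).total();
    }

    public static void main(String[] args) {
        FacetResult<String> none = new FacetResult<>(List.of(), List.of());
        FacetResult<String> some = new FacetResult<>(List.of(new Metadata(300, 3)), List.of("doc"));
        System.out.println(total(none)); // 0
        System.out.println(total(some)); // 300
    }
}
```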
See: How to use MongoDB aggregation for pagination?

How to query data via Spring data JPA with user defined offset and limit (Range)

Is it possible to fetch data in user defined ranges [int starting record -int last record]?
In my case, the user will define in the query string the range of data he wants to fetch.
I have tried something like this
Pageable pageable = new PageRequest(0, 10);
Page<Project> list = projectRepository.findAll(spec, pageable);
Where spec is my defined specification, but unfortunately this does not help.
Maybe I am doing something wrong here.
I have looked at the other methods Spring Data JPA provides, but none of them help much.
The user can enter something like this: localhost:8080/Section/employee?range{"columnName":name,"from":6,"to":20}
This says to fetch employee data, records 6 to 20 (15 records), sorted by columnName (the sorting does not matter as of now).
If you can suggest something better, that would be great. If you think I have not provided enough information, please let me know and I will provide it.
Update: I do not want to use native or createQuery statements (unless I have no other option).
Maybe something like this:
Pageable pageable = new PageRequest(0, 10);
Page<Project> list = projectRepository.findAll(spec, new pageable(int startIndex,int endIndex){
// here my logic.
});
If you have better options, you can suggest me that as well.
Thanks.
Your approach didn't work because new PageRequest(0, 10) doesn't do what you think. As stated in the docs, its arguments are page and size, not offset and limit.
As far as I know (and somebody correct me if I'm wrong), there is no out-of-the-box support for what you need in the default Spring Data repositories. But you can create a custom implementation of Pageable that takes limit/offset parameters. Here is a basic example: Spring data Pageable and LIMIT/OFFSET
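We can do this with pagination, by setting the database table column name, the value and the record range as below:
@Transactional(readOnly = true)
public List<Employee> queryEmployeeDetails(String columnName, String columnData, int startRecord, int endRecord) {
// HQL cannot bind a property name as a parameter, so columnName is concatenated;
// validate it against a whitelist of Employee properties to avoid HQL injection.
Query query = sessionFactory.getCurrentSession().createQuery("from Employee emp where emp." + columnName + " = :value");
query.setParameter("value", columnData);
query.setFirstResult(startRecord);
// setMaxResults takes a row count, not an end index
query.setMaxResults(endRecord - startRecord);
@SuppressWarnings("unchecked")
List<Employee> list = (List<Employee>) query.list();
return list;
}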
If I am understanding your problem correctly, you want your repository to allow the user to:
1. Provide criteria for the query (through a Specification)
2. Provide the column to sort by
3. Provide the range of results to retrieve.
If my understanding is correct, then:
To achieve 1., you can make use of JpaSpecificationExecutor from Spring Data JPA, which allows you to pass in a Specification for the query.
Both 2 and 3 are achievable in JpaSpecificationExecutor through a Pageable. A Pageable lets you provide the starting index, the number of records, and the sorting columns for your query. You will need to implement your own range-based Pageable. PageRequest is a good reference for what to implement (or you can probably extend it).
So I got this working as one of the answers suggested: I implemented my own Pageable and overrode getPageSize(), getOffset() and getSort(), that's it (in my case I did not need more).
public Range(int startIndex, int endIndex, String sortBy) {
this.startIndex = startIndex;
this.endIndex = endIndex;
this.sortBy = sortBy;
}
@Override
public int getPageSize() {
if (endIndex == 0)
return 0;
return endIndex - startIndex;
}
@Override
public int getOffset() { // in Spring Data Commons 2.x, Pageable.getOffset() returns long
return startIndex;
}
@Override
public Sort getSort() {
if (sortBy != null && !sortBy.equalsIgnoreCase(""))
return new Sort(Direction.ASC, sortBy);
else
return new Sort(Direction.ASC, "id");
}
where startIndex and endIndex are the start and end indexes of the records.
To access it:
repository.findAll(spec, new Range(0, 20, "id"));
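Since getPageSize() returns endIndex - startIndex and getOffset() returns startIndex, endIndex behaves as an exclusive bound: records 6 to 20 inclusive need a range of (6, 21). A plain-Java sketch of that arithmetic against an in-memory list (RangeSketch is hypothetical and does not implement Spring's Pageable):

```java
import java.util.List;
import java.util.stream.IntStream;

class RangeSketch {
    // Hypothetical range with an inclusive start and an exclusive end index,
    // reporting the offset/page size pair a custom Pageable would return.
    record Range(int startIndex, int endIndex) {
        long offset() {
            return startIndex;
        }

        int pageSize() {
            return endIndex - startIndex;
        }
    }

    // Applies offset/size to an in-memory list, like OFFSET/LIMIT would in SQL.
    static <T> List<T> apply(List<T> rows, Range range) {
        int from = (int) Math.min(range.offset(), rows.size());
        int to = (int) Math.min(range.offset() + range.pageSize(), rows.size());
        return rows.subList(from, to);
    }

    public static void main(String[] args) {
        List<Integer> rows = IntStream.range(0, 100).boxed().toList();
        System.out.println(apply(rows, new Range(6, 21))); // [6, 7, ..., 20], 15 records
    }
}
```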
There is no offset parameter you can simply pass. However, there is a simple workaround when the offset is a multiple of the limit:
int pageNumber = offset / limit;
PageRequest pReq = PageRequest.of(pageNumber, limit);
The client just has to keep track of the offset instead of the page number. By this I mean your controller would receive the offset instead of the page number.
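A quick check of the conversion: integer division gives the page number, and PageRequest.of(page, size) then starts at page * size, so an offset that is not a multiple of the limit is rounded down. Plain Java, with a hypothetical helper class:

```java
class OffsetToPage {
    // Page number for a given offset and limit (integer division rounds down).
    static int pageNumber(int offset, int limit) {
        return offset / limit;
    }

    // Offset a page-based request actually starts at: pageNumber * limit.
    static long effectiveOffset(int offset, int limit) {
        return (long) pageNumber(offset, limit) * limit;
    }

    public static void main(String[] args) {
        System.out.println(effectiveOffset(20, 10)); // 20, exact
        System.out.println(effectiveOffset(25, 10)); // 20, rounded down from 25
    }
}
```

If arbitrary offsets are needed, a custom offset-based Pageable (as in the other answers) avoids this rounding.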
Hope this helps!

Resources