Cannot get percentiles or value from Aggregation - spring-boot

The parentSalaries is a list of Buckets of size 1 and contains Aggregations of size 2, which are "precentials_salary" and "avg_salary".
I am trying to get the percentile values (5.0, 25.0, etc.) and the value under the "avg_salary" aggregation. However, the Aggregation type has no method like "getValue" or "getPercentiles".
I can see the data but cannot extract it.
The code that I have is as follows:
private void doSomething(Aggregations aggregations) {
//aggregations is the Aggregations from the SearchResponse
Terms parentSalaryRatio = aggregations.get("parent_salary_ratio");
if (parentSalaryRatio != null) {
List<? extends Terms.Bucket> parentSalaries = parentSalaryRatio.getBuckets();
getTotalAvgSalaries(parentSalaries);
}
}
private void getTotalAvgSalaries(List<? extends Terms.Bucket> parentSalaries) {
Aggregations aggregations = parentSalaries.get(0).getAggregations();
Aggregation precentials = aggregations.get("precentials_salary");
Aggregation avgSalary = aggregations.get("avg_salary");
}
Any help would be greatly appreciated.

Found the issue.
I used ParsedSingleValueNumericMetricsAggregation to extract the "value" data; it has a value() method. ParsedAvg can be used as well, since it extends ParsedSingleValueNumericMetricsAggregation.
For the percentiles, I used ParsedTDigestPercentiles, as P.J.Meisch suggested.


Spring Data Elasticsearch repository with Pageable is returning only 10000 documents

I have an index with 17364 documents in Elasticsearch.
$curl http://localhost:9200/performance/_count
{"count":17364,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}
Spring Data repository:
public interface TestRepository extends ElasticsearchRepository<Transaction, String> {
}
Fetch all documents page by page and print:
public void testReport() {
int page = 0, pageSize = 1000;
Pageable of = PageRequest.of(page, pageSize);
Page<Transaction> all = testRepository.findAll(of);
int numberOfPages = all.getTotalPages();
log.info("All pages: {}, {}", numberOfPages, all.getTotalElements());
do {
log.info("Current page: {}, {}", of.getPageNumber(), of.getPageSize());
for (Transaction txn : all) {
log.info(mapper.writeValueAsString(txn));
}
of = of.next();
all = testRepository.findAll(of);
} while (all.hasContent());
}
This code returns only 10000 documents although the index has 17364 documents. Could you please help me find out why this is happening?
Elasticsearch version: 7.9
spring-boot-starter-parent: 2.3.2.RELEASE
I see two options:
A. Since you have only 17364 documents, you could increase the index.max_result_window setting in your index to (e.g.) 20000, so that you can paginate till the end:
PUT performance/_settings
{
"index.max_result_window": 20000
}
B. If you have a bigger index and/or increasing the index.max_result_window limit is not an option for any reason, then you need to leverage the Scroll API. Spring Data ES supports two ways for doing that.
The first method uses ElasticsearchTemplate.searchForStream(), which internally uses the Scroll API:
SearchHitsIterator<Transaction> stream = elasticsearchTemplate.searchForStream(searchQuery, Transaction.class, "performance");
The second method is a bit more low-level. You need to modify your repository definition with a method that returns a Stream:
public interface TestRepository extends ElasticsearchRepository<Transaction, String> {
Stream<Transaction> findScrollAll();
}
And then implement that method with ElasticsearchTemplate.searchScrollStart() and ElasticsearchTemplate.searchScrollContinue().
Addition, a third option:
Just define a method
Stream<SearchHit<Transaction>> searchBy();
in your TestRepository, or use the plain return type Stream<Transaction>.
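The 10000 cutoff in the question matches Elasticsearch's default index.max_result_window: a paged search is rejected once from + size exceeds that window. A minimal plain-Java sketch of that arithmetic follows (no Elasticsearch client involved; the class and method names are made up for illustration):

```java
// Hypothetical helper, not part of Spring Data or Elasticsearch.
class ResultWindowSketch {

    // A page can be fetched with from/size paging only if from + size <= max_result_window.
    static boolean isPageReachable(int page, int pageSize, int maxResultWindow) {
        long from = (long) page * pageSize;
        return from + pageSize <= maxResultWindow;
    }

    // Number of documents that can be read through full pages before the window is exceeded.
    static long reachableDocuments(long totalDocs, int pageSize, int maxResultWindow) {
        long reachablePages = maxResultWindow / pageSize;
        return Math.min(totalDocs, reachablePages * pageSize);
    }

    public static void main(String[] args) {
        // Numbers taken from the question: 17364 documents, page size 1000.
        System.out.println(reachableDocuments(17364, 1000, 10000));
        System.out.println(reachableDocuments(17364, 1000, 20000));
    }
}
```

With the question's numbers this gives 10000 for the default window and 17364 once the window is raised to 20000, which is why option A is enough for a small index.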

ES Match query analogue in Lucene

I use queries like this one to run in ES:
boolQuery.must(QueryBuilders.matchQuery("field", value).minimumShouldMatch("50%"))
What's the straight analogue for this query in Lucene?
Match Query, as I understand it, basically analyzes the query, and creates a BooleanQuery out of all the terms the analyzer finds. You could get sorta close by just passing the text through QueryParser.
But you could replicate it something like this:
public static Query makeMatchQuery (String fieldname, String value) throws IOException {
//get a builder to start adding clauses to.
BooleanQuery.Builder qbuilder = new BooleanQuery.Builder();
//We need to analyze that value, and get a tokenstream to read terms from
Analyzer analyzer = new StandardAnalyzer();
TokenStream stream = analyzer.tokenStream(fieldname, new StringReader(value));
stream.reset();
//Iterate the token stream, and add them all to our query
int countTerms = 0;
while(stream.incrementToken()) {
countTerms++;
Query termQuery = new TermQuery(new Term(
fieldname,
stream.getAttribute(CharTermAttribute.class).toString()));
qbuilder.add(termQuery, BooleanClause.Occur.SHOULD);
}
stream.close();
analyzer.close();
//The min should match is a count of clauses, not a percentage. So for 50%, count/2
qbuilder.setMinimumNumberShouldMatch(countTerms / 2);
Query finalQuery = qbuilder.build();
return finalQuery;
}
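The countTerms / 2 line hard-codes the 50% case. As a rough sketch of the more general rule that the Elasticsearch documentation describes for percentage values (the computed number is rounded down, and a negative percentage says how many clauses may be missing), the clause count could be derived as follows. The helper name is hypothetical, and it covers only plain percentages, not the combined forms:

```java
// Hypothetical helper approximating the percentage rules of minimum_should_match.
class MinShouldMatchSketch {

    static int fromPercent(int optionalClauses, int percent) {
        // Integer division truncates toward zero, so the computed share is rounded down in magnitude.
        int share = optionalClauses * percent / 100;
        // A negative percentage states how many clauses may be missing.
        int result = percent < 0 ? optionalClauses + share : share;
        return Math.max(0, result);
    }

    public static void main(String[] args) {
        System.out.println(fromPercent(5, 50));   // 5 terms, 50%: at least 2 must match
        System.out.println(fromPercent(4, -25));  // 4 terms, 25% may be missing: at least 3 must match
    }
}
```

The result could then be passed to qbuilder.setMinimumNumberShouldMatch(...) in place of countTerms / 2.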

Spring data mongo db count nested objects with a specific condition

I have a document like this:
'subject' : {
'name' :"...."
'facebookPosts':[
{
date:"14/02/2017 20:20:03" , // it is a string
text:"facebook post text here",
other stuff here
}
]
}
I want to count, within a specific subject, the facebookPosts whose date field contains e.g. "23/07/2016".
Currently I do that by fetching all the documents and counting on the client side (Spring), but I think that's not efficient.
You need to aggregate your results.
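final Aggregation aggregation = Aggregation.newAggregation(
// keep only documents that have at least one matching post
Aggregation.match(Criteria.where("facebookPosts.date").regex(REGEX)),
Aggregation.unwind("facebookPosts"),
// after the unwind, drop the non-matching posts of those documents
Aggregation.match(Criteria.where("facebookPosts.date").regex(REGEX)),
Aggregation.group().count().as("count"));
Regex might not be the best solution; it is just an example.
unwind splits the array into separate elements, which you can then filter and count.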
Create a class that will hold the count, something like:
public class PostCount {
private Long count;
// getters, setters
}
And then execute it like this:
AggregationResults<PostCount> postCount = mongoTemplate.aggregate(aggregation, Subject.class, PostCount.class);
long count = postCount.getMappedResults().get(0).getCount();
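One detail worth keeping in mind: a $match on an array field selects whole documents, so after $unwind every post of a matched document is present, including posts from other days. The following plain-Java model (no MongoDB involved; each inner list stands for the facebookPosts dates of one document, and all names are illustrative) shows the difference between counting right after the unwind and filtering once more before counting:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative model only; it mimics the pipeline stages with Java streams.
class UnwindCountSketch {

    // Models: $match (document level) -> $unwind -> count.
    static long countAfterUnwind(List<List<String>> documents, String day) {
        return documents.stream()
                .filter(dates -> dates.stream().anyMatch(d -> d.contains(day)))
                .flatMap(List::stream)
                .count();
    }

    // Models: $match -> $unwind -> $match (element level) -> count.
    static long countMatchingPosts(List<List<String>> documents, String day) {
        return documents.stream()
                .filter(dates -> dates.stream().anyMatch(d -> d.contains(day)))
                .flatMap(List::stream)
                .filter(d -> d.contains(day))
                .count();
    }

    public static void main(String[] args) {
        List<List<String>> documents = Arrays.asList(
                Arrays.asList("23/07/2016 10:00:00", "14/02/2017 20:20:03"),
                Arrays.asList("01/01/2017 00:00:00"));
        System.out.println(countAfterUnwind(documents, "23/07/2016"));   // counts both posts of the first document
        System.out.println(countMatchingPosts(documents, "23/07/2016")); // counts only the post from that day
    }
}
```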

In spring data mongodb how to achieve pagination for aggregation

In Spring Data MongoDB, using MongoTemplate or MongoRepository, how can I achieve pagination for an aggregation?
This is an answer to an old post, but I'll provide an answer in case anyone else comes along while searching for something like this.
Building on the previous solution by Fırat KÜÇÜK: giving results.size() as the value for the "total" field in the PageImpl constructor will not make paging work the way you expect paging to work. It sets the total size to the page size every time, so instead you need to find out the actual total number of results that your query would return:
public Page<UserListItemView> list(final Pageable pageable) {
long total = getCount(<your property name>, <your property value>);
final Aggregation agg = newAggregation(
skip(pageable.getPageNumber() * pageable.getPageSize()),
limit(pageable.getPageSize())
);
final List<UserListItemView> results = mongoTemplate
.aggregate(agg, User.class, UserListItemView.class)
.getMappedResults();
return new PageImpl<>(results, pageable, total);
}
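To see why the total matters, here is the bare paging arithmetic in plain Java (independent of Spring, which may apply its own adjustments inside PageImpl; the names are illustrative). The number of pages is derived from the total, so passing the size of the current page as the total collapses everything to a single page:

```java
// Illustrative paging arithmetic; not Spring's PageImpl.
class PagingMathSketch {

    // Number of documents to skip for a zero-based page number.
    static long offset(int pageNumber, int pageSize) {
        return (long) pageNumber * pageSize;
    }

    // Ceiling division: how many pages are needed to hold totalElements.
    static long totalPages(long totalElements, int pageSize) {
        return (totalElements + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(totalPages(95, 10)); // real total of 95 results
        System.out.println(totalPages(10, 10)); // total mistakenly set to the page size
        System.out.println(offset(3, 10));      // skip value for the fourth page
    }
}
```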
Now, then, the best way to get the total number of results is another question, and it is one that I am currently trying to figure out. The method that I tried (and it worked) was to almost run the same aggregation twice, (once to get the total count, and again to get the actual results for paging) but using only the MatchOperation followed by a GroupOperation to get the count:
private long getCount(String propertyName, String propertyValue) {
MatchOperation matchOperation = match(Criteria.where(propertyName).is(propertyValue));
GroupOperation groupOperation = group(propertyName).count().as("count");
Aggregation aggregation = newAggregation(matchOperation, groupOperation);
return mongoTemplate.aggregate(aggregation, Foo.class, NumberOfResults.class).getMappedResults().get(0).getCount();
}
private class NumberOfResults {
private int count;
public int getCount() {
return count;
}
public void setCount(int count) {
this.count = count;
}
}
It seems kind of inefficient to run nearly the same query twice, but if you are going to page results, the pageable object must know the total number of results if you really want it to behave like paging. If anyone can improve on my method to get the total count of results, that would be awesome!
Edit: This will also provide the count, and it is simpler because you do not need a wrapper object to hold the result, so you can replace the entire previous code block with this one:
private long getCount(String propertyName, String propertyValue) {
Query countQuery = new Query(Criteria.where(propertyName).is(propertyValue));
return mongoTemplate.count(countQuery, Foo.class);
}
In addition to ssouris's solution, you can use the Pageable classes for the results.
public Page<UserListItemView> list(final Pageable pageable) {
final Aggregation agg = newAggregation(
skip(pageable.getPageNumber() * pageable.getPageSize()),
limit(pageable.getPageSize())
);
final List<UserListItemView> results = mongoTemplate
.aggregate(agg, User.class, UserListItemView.class)
.getMappedResults();
return new PageImpl<>(results, pageable, results.size());
}
You can use MongoTemplate with
org.springframework.data.mongodb.core.aggregation.Aggregation#skip
and
org.springframework.data.mongodb.core.aggregation.Aggregation#limit
Aggregation agg = newAggregation(
project("tags"),
skip(10),
limit(10)
);
AggregationResults<TagCount> results = mongoTemplate.aggregate(agg, "tags", TagCount.class);
List<TagCount> tagCount = results.getMappedResults();
As per the answer https://stackoverflow.com/a/39784851/4546949 I wrote code for Java.
Use aggregation group to get count and array of data with other paging information.
AggregationOperation group = Aggregation.group().count().as("total")
.addToSet(pageable.getPageNumber()).as("pageNumber")
.addToSet(pageable.getPageSize()).as("pageSize")
.addToSet(pageable.getOffset()).as("offset")
.push("$$ROOT").as("data");
Use Aggregation project to slice as per the paging information.
AggregationOperation project = Aggregation.project()
.andInclude("pageSize", "pageNumber", "total", "offset")
.and(ArrayOperators.Slice.sliceArrayOf("data").offset((int) pageable.getOffset()).itemCount(pageable.getPageSize()))
.as("data");
Use mongo template to aggregate.
Aggregation aggr = newAggregation(group, project);
CustomPage page = mongoTemplate.aggregate(aggr, Foo.class, CustomPage.class).getUniqueMappedResult();
Create a CustomPage.
public class CustomPage {
private long pageSize;
private long pageNumber;
private long offset;
private long total;
private List<Foo> data;
}
Here is my generic solution:
public Page<ResultObject> list(Pageable pageable) {
// build your main stages
List<AggregationOperation> mainStages = Arrays.asList(match(....), group(....));
return pageAggregation(pageable, mainStages, "target-collection", ResultObject.class);
}
public <T> Page<T> pageAggregation(
final Pageable pageable,
final List<AggregationOperation> mainStages,
final String collection,
final Class<T> clazz) {
final List<AggregationOperation> stagesWithCount = new ArrayList<>(mainStages);
stagesWithCount.add(count().as("count"));
final Aggregation countAgg = newAggregation(stagesWithCount);
final Long count = Optional
.ofNullable(mongoTemplate.aggregate(countAgg, collection, Document.class).getUniqueMappedResult())
.map(doc -> ((Integer) doc.get("count")).longValue())
.orElse(0L);
final List<AggregationOperation> stagesWithPaging = new ArrayList<>(mainStages);
stagesWithPaging.add(sort(pageable.getSort()));
stagesWithPaging.add(skip(pageable.getOffset()));
stagesWithPaging.add(limit(pageable.getPageSize()));
final Aggregation resultAgg = newAggregation(stagesWithPaging);
final List<T> result = mongoTemplate.aggregate(resultAgg, collection, clazz).getMappedResults();
return new PageImpl<>(result, pageable, count);
}
To return a Page object with the correct values for the Pageable, I find this to be the best and simplest way:
Aggregation aggregation = Aggregation.newAggregation(Aggregation.match(Criteria.where("type").is("project")),
Aggregation.group("id").last("id").as("id"), Aggregation.project("id"),
Aggregation.skip(pageable.getPageNumber() * pageable.getPageSize()),
Aggregation.limit(pageable.getPageSize()));
PageableExecutionUtils.getPage(mongoTemplate.aggregate(aggregation, Draft.class, Draft.class).getMappedResults(), pageable,() -> mongoTemplate.count(Query.of(query).limit(-1).skip(-1), Draft.class));
Another approach would be to extend the PagingAndSortingRepository<T, ID> interface. Then, you can create an @Aggregation query method like this:
@Aggregation(pipeline = {
"{ $match: { someField: ?0 } }",
"{ $project: { _id: 0, someField: 1} }"
})
List<StuffAggregateModel> aggregateStuff(final String somePropertyName, final Pageable pageable);
Just call this from your business logic service class and construct the Pageable (which also contains sort options, if desired) and call the repo method. I like this approach because of the simplicity and the sheer minimization of the amount of code that you have to write. If your query (aggregation pipeline) is simple enough, this is probably the best solution. Maintenance coding for this approach is nearly effortless.
My answer with MongoDB $facet
// User(_id, first name, etc), Car (user_id, brand, etc..)
// lookup(from, localField, foreignField, as): join cars.user_id against the user's _id
LookupOperation lookupStageCar = Aggregation.lookup("cars", "_id", "user_id", "car");
MatchOperation matchStage = Aggregation.match(Criteria.where("car.user_id").exists(true));
CountOperation countOperation = Aggregation.count().as("total");
AddFieldsOperation addFieldsOperation = Aggregation.addFields().addFieldWithValue("page", pageable.getPageNumber()).build();
SkipOperation skipOperation = Aggregation.skip(Long.valueOf(pageable.getPageNumber() * pageable.getPageSize()));
LimitOperation limitOperation = Aggregation.limit(pageable.getPageSize());
// here the magic
FacetOperation facetOperation = Aggregation.facet( countOperation, addFieldsOperation).as("metadata")
.and(skipOperation, limitOperation).as("data");
// users with car
List<AggrigationResults> map = mongoTemplate.aggregate(Aggregation.newAggregation( lookupStageCar, matchStage, facetOperation), "User", AggrigationResults.class).getMappedResults();
———————————————————————————
public class AggrigationResults {
private List<Metadata> metadata;
private List<User> data;
}
public class Metadata {
private long total;
private long page;
}
———————————————————————————
output:
{
"metadata" : [
{
"total" : 300,
"page" : 3
}
],
"data" : [
{
... original document ...
},
{
... another document ...
},
{
... etc up to 10 docs ...
}
]
}
see : How to use MongoDB aggregation for pagination?

How to get distance - MongoDB Template Near function

I'm trying to find nearby places.
The code below is working fine,
but I'm not able to get the actual distance of each place from my given lat/lng.
Criteria criteria = new Criteria("coordinates")
.near(new Point(searchRequest.getLat(),searchRequest.getLng()));
Query query = new Query();
query.addCriteria(criteria);
query.addCriteria(criteriaName);
query.limit(5);
List<Place> ls = (List<Place>) mongoTemplate.find(query, Place.class);
You can do it with the geoNear aggregation. In spring-data-mongodb, GeoNearOperation represents this aggregation.
Extend the Place class (or create a new class) with a field that will hold the distance information (example with inheritance):
public class PlaceWithDistance extends Place {
private double distance;
public double getDistance() {
return distance;
}
public void setDistance(final double distance) {
this.distance = distance;
}
}
Instead of Criteria with Query, use an aggregation. The second argument of geoNear is the name of the field where the distance should be set:
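// Note: Point takes (x, y), which for geo data means (longitude, latitude); check the order against your stored coordinates
final NearQuery nearQuery = NearQuery
.near(new Point(searchRequest.getLat(), searchRequest.getLng()));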
nearQuery.num(5);
nearQuery.spherical(true); // if using 2dsphere index, otherwise delete or set false
// "distance" argument is name of field for distance
final Aggregation a = newAggregation(geoNear(nearQuery, "distance"));
final AggregationResults<PlaceWithDistance> results =
mongoTemplate.aggregate(a, Place.class, PlaceWithDistance.class);
// results.forEach(System.out::println);
List<PlaceWithDistance> ls = results.getMappedResults();
Just to make it easier - associated imports:
import static org.springframework.data.mongodb.core.aggregation.Aggregation.geoNear;
import static org.springframework.data.mongodb.core.aggregation.Aggregation.newAggregation;
import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.aggregation.AggregationResults;
import org.springframework.data.mongodb.core.aggregation.GeoNearOperation;
import org.springframework.data.mongodb.core.query.NearQuery;
Walery Strauch's example was useful for me.
However, I wanted to:
run an aggregate query to get all the points in a 2dsphere index within a given distance in kilometers or meters (Metrics.KILOMETERS and Metrics.MILES are available);
not specify the collection name as part of the POJO.
I have a 2dsphere index with the old (legacy) coordinate representation in MongoDB. I am using Mongo as a sharded database for geospatial queries. My nearSphere query (without aggregation) was failing only when a shard key was added to the same collection that has the 2dsphere index.
With the implementation below and a shard key in the same collection, I am able to fetch the required data.
Here is the sample:
import org.springframework.data.geo.Metrics;
final NearQuery query = NearQuery.near(new Point(longitude, latitude), Metrics.KILOMETERS)
.num(limit)
.minDistance(distanceInKiloMeters)
.maxDistance(maxNearByUEDistanceInKiloMeters)
.spherical(true);
final Aggregation a = newAggregation(geoNear(query, "distance"));
final AggregationResults<PlaceWithDistance> results = offlineMongoTemplate.aggregate(a, "myCollectionName", PlaceWithDistance.class);
final List<PlaceWithDistance> measurements = new ArrayList<PlaceWithDistance>(results.getMappedResults());
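To sanity-check the distance values returned by geoNear, a great-circle distance can be computed on the client with the haversine formula. This is a sketch under assumptions: the earth radius of 6378.137 km is an assumed constant, the class and method names are made up, and the result will only approximate what MongoDB reports:

```java
// Illustrative client-side check; not part of Spring Data or the MongoDB driver.
class HaversineSketch {

    // Assumed earth radius in kilometers (equatorial value).
    static final double EARTH_RADIUS_KM = 6378.137;

    // Great-circle distance in kilometers between two points given in degrees.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
        return EARTH_RADIUS_KM * c;
    }

    public static void main(String[] args) {
        // One degree of longitude along the equator.
        System.out.println(distanceKm(0.0, 0.0, 0.0, 1.0));
    }
}
```

Comparing this value with the "distance" field of a few PlaceWithDistance results is a quick way to confirm that the unit and the longitude/latitude order are what you expect.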
