Spring Elasticsearch Aggregation Filtering Not Working - spring

I'm trying to query pricing stats on products I am recording in my Elasticsearch Database by product number. The pricing may be for new, used or refurbished products, so I wish to filter on condition. The condition filter works as a JSON query in Marvel returning stats based on two price documents with condition new.
When I try to do similar using the Java API, I am getting stats based on 4 documents that includes 2 new and 2 refurbished.
Could anyone please identify what I am doing wrong in the Java code below?
Thanks.
Here's the working JSON Query:
GET /stats/price/_search
{
"query": {
"match_phrase": {"mpc": "MGTX2LL/A"}
},
"size": 0,
"aggs" : {
"low_price_stats" : {
"filter": {
"term" : { "condition" : "new"}
},
"aggs" : {
"price_stats" : { "extended_stats" : { "field" : "price" } }
}
}
}
}
And the problematic Java:
public Aggregations aggByManufacturerPartNumber(String mpn) {
SearchQuery searchQuery = new NativeSearchQueryBuilder()
.withIndices("stats")
.withTypes("price")
.withQuery(termQuery("mpn", mpn))
.withFilter(
FilterBuilders.termFilter("condition", "New")
)
.addAggregation(AggregationBuilders.extendedStats("stats_agg").field("price"))
.build();
Aggregations aggregations = elasticsearchTemplate.query(searchQuery, new ResultsExtractor<Aggregations>() {
@Override
public Aggregations extract(SearchResponse response) {
return response.getAggregations();
}
});
return aggregations;
}

In your Java code you're only building the price_stats sub-aggregation without its parent filter aggregation. The call to withFilter will create a filter at the query level, not at the aggregation level. The correct Java code that matches your JSON query would be like this:
// build top-level filter aggregation
FilterAggregationBuilder lowPriceStatsAgg = AggregationBuilders.filter("low_price_stats")
.filter(FilterBuilders.termFilter("condition", "new"));
// build extended stats sub-aggregation
lowPriceStatsAgg.subAggregation(AggregationBuilders.extendedStats("stats_agg").field("price"));
// build query
SearchQuery searchQuery = new NativeSearchQueryBuilder()
.withIndices("stats")
.withTypes("price")
.withQuery(termQuery("mpn", mpn))
.addAggregation(lowPriceStatsAgg)
.build();
// then get the results
Aggregations aggs = response.getAggregations();
Filter lowPriceStats = aggs.get("low_price_stats");
ExtendedStats statsAgg = lowPriceStats.get("stats_agg");
Besides, also note that in your JSON query you have a match_phrase on the mpc field while in your Java code you have a term query on the mpn field. So you probably need to fix that, too, but the above code fixes the aggregation part only.
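The difference between a query-level filter and a filter aggregation is easiest to see in the raw request body. Below is a minimal Python sketch that only builds that body as a dict. It assumes the field names from the question (mpn, condition, price); the helper name build_price_stats_request is made up for illustration and nothing here talks to Elasticsearch.

```python
import json


def build_price_stats_request(mpn, condition):
    """Build a search body whose stats are restricted by a filter aggregation.

    Hypothetical helper: it mirrors the shape of the working JSON query and
    only returns a dict; sending it to a cluster is left to the caller.
    """
    return {
        # Same term query as the Java code in the question.
        "query": {"term": {"mpn": mpn}},
        "size": 0,
        "aggs": {
            # Parent filter aggregation: narrows the documents seen by the
            # sub-aggregation, not the hits of the query itself.
            "low_price_stats": {
                "filter": {"term": {"condition": condition}},
                "aggs": {
                    "stats_agg": {"extended_stats": {"field": "price"}}
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_price_stats_request("MGTX2LL/A", "new"), indent=2))
```

Note that the condition filter sits inside "aggs", next to the sub-aggregation it restricts, and there is no top-level filter in the body at all.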

Related

How to apply filter on filtered data in elastic search using API

How can I add multiple filters on an index?
I want to filter results by first_name and then by category using the Elasticsearch client.
This works in the Kibana dashboard.
I want to achieve the same functionality using the Elasticsearch client and Python,
but I am only able to filter the data once.
Sample code
@app.route('/get-data')
@login_required
def get_permission():
uri = f'https://localhost:9200/'
client = Elasticsearch(hosts=uri, basic_auth=(session['username'], session['password']), ca_certs=session['cert'], verify_certs=False)
body = {
"from" : 0,
"size" : 20,
"query" : {
"bool" : {
"must" : [],
"filter" : [],
"must_not":[],
"should" :[],
}
}
}
index_data = client.search(index=index, body=body)
return render_template('showdata.html', index_data=index_data)
I have looked into msearch, but it's not working.
I tried the msearch method in Dev Tools.
The results are not correct.
Is there any way to filter, or reapply the search method, on already filtered data without messing up the old query?
filter is an array in the Elasticsearch DSL, so you should be able to provide multiple filters in that array. I can't help with the Python code, but in JSON the filter array looks like this:
{
"query": {
"bool": {
"filter": [
{
"prefix": {
"question_body_markdown": "i"
}
},
{
"term": {
"customer.first_name": "foo"
}
}
]
}
}
}
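On the Python side, the filter array above is just a list inside the request dict, so a second filter can be added without touching the first. A minimal sketch, assuming the body layout from the question; add_term_filter is a hypothetical helper name, and first_name and category are taken from the question as the fields to filter on.

```python
import copy


def add_term_filter(body, field, value):
    """Return a copy of a search body with one more term filter appended.

    The original body is left untouched, so an earlier query can be reused
    and narrowed step by step.
    """
    new_body = copy.deepcopy(body)
    bool_query = new_body.setdefault("query", {}).setdefault("bool", {})
    bool_query.setdefault("filter", []).append({"term": {field: value}})
    return new_body


if __name__ == "__main__":
    base = {
        "from": 0,
        "size": 20,
        "query": {"bool": {"must": [], "filter": [], "must_not": [], "should": []}},
    }
    by_name = add_term_filter(base, "first_name", "foo")
    by_name_and_category = add_term_filter(by_name, "category", "bar")
    print(by_name_and_category)
```

The resulting dict can be passed as the body of client.search in the question's code. A term filter matches exact values, so this assumes the two fields are indexed in a way that supports exact matching (for example as keyword fields).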

Elasticsearch Jest client add condition to json query

I am using Elasticsearch 6.3 with Jest client 6.3 (Java API)
Search search = new Search.Builder(jsonQueryString)
.addIndex("SOME_INDEX")
.build();
SearchResult result = jestClient.execute(search);
And this is my sample JSON query
{
"query": {
"bool" : {
"filter": {
"match" :{
"someField" : "some value"
}
}
}
}
}
The JSON query string is accepted as a POST request body and then passed to the Jest client. Before I can execute the JSON query on the Jest client, I need to add conditions to the query, e.g.
{
"query": {
"bool" : {
"filter": {
"match" :{
"someField" : "some value"
}
},
"must": {
"match" :{
"systemField" : "pre-defined value"
}
}
}
}
}
Is there an API that can parse the JSON query and add conditions to it before it is executed on the Jest client? The JSON query can be any query supported by the Query DSL and does not necessarily contain a bool condition. I need to add a pre-defined condition to the query. I appreciate any help on this. Thanks very much.
There is no out-of-the-box Elasticsearch or Jest API to achieve the above. The workaround I implemented uses the Jackson ObjectMapper:
// convert the search request body into object node
ObjectNode searchRequestNode = objectMapper.readValue(queryString, ObjectNode.class);
// extract the query
String query = searchRequestNode.get("query").toString();
// wrap the original query and add conditions
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
boolQueryBuilder.must(QueryBuilders.wrapperQuery(query));
boolQueryBuilder.filter(QueryBuilders.termsQuery("fieldA", listOfValues));
boolQueryBuilder.filter(QueryBuilders.termQuery("fieldB", value));
// convert querybuilder to json query string
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(boolQueryBuilder);
String queryWithFilters = searchSourceBuilder.toString();
// convert json string to object node
ObjectNode queryNode = objectMapper.readValue(queryWithFilters, ObjectNode.class);
// replace original query with the new query containing added conditions
searchRequestNode.set("query", queryNode.get("query"));
String finalSearchRequestWithOwnFilters = searchRequestNode.toString();
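The same wrap-and-replace idea does not depend on Jackson or the query builders; it works on any parsed JSON tree. A minimal Python sketch of the technique using only the standard json module. The function name add_conditions is hypothetical, and falling back to match_all when the request has no "query" key is an assumption of this sketch, not something the workaround above does.

```python
import json


def add_conditions(request_json, extra_filters):
    """Wrap the request's query in a bool query and add filter clauses.

    request_json: the search request body as a JSON string.
    extra_filters: a list of Query DSL clauses (dicts) to add as filters.
    Everything outside "query" (size, sort, ...) is preserved.
    """
    request = json.loads(request_json)
    # Assumption: a request without "query" is treated as match_all.
    original_query = request.get("query", {"match_all": {}})
    request["query"] = {
        "bool": {
            "must": [original_query],
            "filter": list(extra_filters),
        }
    }
    return json.dumps(request)


if __name__ == "__main__":
    incoming = '{"query": {"match": {"someField": "some value"}}, "size": 5}'
    system_filter = {"term": {"systemField": "pre-defined value"}}
    print(add_conditions(incoming, [system_filter]))
```

Because the original query is nested as a single must clause, it does not matter whether the incoming query is a bool query or any other query type.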

"filtered query does not support sort" when using Hibernate Search

I'm trying to issue a query which includes sorting
from Hibernate Search 5.7.1.Final
to ElasticSearch 2.4.2.
When I'm using curl I get the results:
curl -XPOST 'localhost:9200/com.example.app.model.review/_search?pretty' -d '
{
"query": { "match" : { "authors.name" : "Puczel" } },
"sort": { "title": { "order": "asc" } }
}'
But when I issue the query from code:
protected static Session session;
public static void prepareSession()
{
SessionFactory sessionFactory = new Configuration().configure()
.buildSessionFactory();
session = sessionFactory.openSession();
}
...
protected static void testJSONQueryWithSort()
{
FullTextSession fullTextSession = Search.getFullTextSession(session);
QueryDescriptor query = ElasticsearchQueries.fromJson(
"{ 'query': { 'match' : { 'authors.name' : 'Puczel' } }, 'sort': { 'title': { 'order': 'asc' } } }");
List<?> result = fullTextSession.createFullTextQuery(query, Review.class).list();
System.out.println("\n\nSearch results for 'author.name:Puczel':");
for(Object object : result)
{
Review review = (Review) object;
System.out.println(review.toString());
}
}
I get an Exception:
"[filtered] query does not support [sort]"
I understand where it comes from, because the query
that Hibernate Search issues is different than my curl query
- specifying the type is realised differently:
{
"query":
{
"filtered":
{
"query":
{
"match":{"authors.name":"Puczel"}
},
"sort":{"title":{"order":"asc"}},
"filter":{"type":{"value":"com.example.app.model.Review"}}
}
}
}
But I don't know how to change it.
I tried using the sort example from Hibernate documentation:
https://docs.jboss.org/hibernate/search/5.7/reference/en-US/html_single/#__a_id_elasticsearch_query_sorting_a_sorting
But the example is not full. I don't know:
which imports to use (there are multiple matching),
what are the types of the undeclared variables, like s,
how to initialise the variable luceneQuery.
I will appreciate any remarks on this.
Yes, as mentioned in the javadoc of org.hibernate.search.elasticsearch.ElasticsearchQueries.fromJson(String):
Note that only the 'query' attribute is supported.
So you must use the Hibernate Search API to perform sorts.
which imports to use (there are multiple matching),
Sort is the one from Lucene (org.apache.lucene), List is from java.util, and all the other imports should be from Hibernate Search (org.hibernate.search).
what are the types of the undeclared variables, like s
s is a FullTextSession retrieved through org.hibernate.search.Search.getFullTextSession(Session). It will also work with a FullTextEntityManager retrieved through org.hibernate.search.jpa.Search.getFullTextEntityManager(EntityManager).
how to initialise the variable luceneQuery
You'll have to use the query builder (qb):
Query luceneQuery = qb.keyword().onField("authors.name").matching("Puczel").createQuery();
If you intend to use the Hibernate Search API, and you're not comfortable with it yet, I'd recommend reading the general documentation first (not just the Elasticsearch part, which only mentions Elasticsearch specifics): https://docs.jboss.org/hibernate/search/5.7/reference/en-US/html_single/#search-query

How to get selected object only from an array

I have a collection with documents of the following structure:
{
"category": "movies",
"movies": [
{
"name": "HarryPotter",
"language": "english"
},
{
"name": "Fana",
"language": "hindi"
}
]
}
I want to query with movie name="fana" and the response should be
{
"category": "movies",
"movies": [
{
"name": "HarryPotter",
"language": "english"
}
]
}
How do I get the above using spring mongoTemplate?
You can try something like this.
Non-Aggregation based approach:
public MovieCollection getMoviesByName() {
BasicDBObject fields = new BasicDBObject("category", 1).append("movies", new BasicDBObject("$elemMatch", new BasicDBObject("name", "Fana").append("size", new BasicDBObject("$lt", 3))));
BasicQuery query = new BasicQuery(new BasicDBObject(), fields);
MovieCollection groupResults = mongoTemplate.findOne(query, MovieCollection.class);
return groupResults;
}
Aggregation based approach:
import static org.springframework.data.mongodb.core.aggregation.Aggregation.*;
import static org.springframework.data.mongodb.core.query.Criteria.where;
public List<BasicDBObject> getMoviesByName() {
Aggregation aggregation = newAggregation(unwind("movies"), match(where("movies.name").is("Fana").and("movies.size").lt(1)),
project(fields().and("category", "$category").and("movies", "$movies")));
AggregationResults<BasicDBObject> groupResults = mongoTemplate.aggregate(
aggregation, "movieCollection", BasicDBObject.class);
return groupResults.getMappedResults();
}
$unwind of mongodb aggregation can be used for this.
db.Collection.aggregate([
{$unwind : '$movies'},
{$match : {'movies.name' : 'Fana'}}
])
You can try the above query to get required output.
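To see what an unwind-then-match pipeline produces for the sample document, here is a small pure-Python emulation of the two stages. It is only an illustration of the semantics for simple equality matches, not MongoDB code; the helper names unwind, get_path and match_equals are made up for this sketch.

```python
def unwind(documents, field):
    """Yield one copy of each document per element of its array field."""
    for doc in documents:
        for element in doc.get(field, []):
            # The array is replaced by a single element in each copy.
            yield {**doc, field: element}


def get_path(doc, dotted_path):
    """Follow a dotted path such as 'movies.name' through nested dicts."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def match_equals(documents, dotted_path, value):
    """Keep only documents whose value at dotted_path equals value."""
    return [doc for doc in documents if get_path(doc, dotted_path) == value]


if __name__ == "__main__":
    collection = [{
        "category": "movies",
        "movies": [
            {"name": "HarryPotter", "language": "english"},
            {"name": "Fana", "language": "hindi"},
        ],
    }]
    print(match_equals(unwind(collection, "movies"), "movies.name", "Fana"))
```

Two things stand out: after unwinding, movies is a single object rather than an array, and the comparison is case-sensitive, so "fana" does not match the stored "Fana".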
The approaches above give you a solution using aggregation and a basic query. But if you don't want to use BasicDBObject, the code below will work:
Query query = new Query();
query.fields().elemMatch("movies", Criteria.where("name").is("Fana"));
List<Movies> movies = mongoTemplate.find(query, Movies.class);
The drawback of this query is that it may return duplicate results present in different documents, since more than 1 document may match this criteria. So you can add _id in the criteria like below:
Criteria criteria = Criteria.where("_id").is(movieId);
Query query = new Query().addCriteria(criteria);
query.fields().elemMatch("movies", Criteria.where("name").is("Fana"));
query.fields().exclude("_id");
List<Movies> movies = mongoTemplate.find(query, Movies.class);
I am excluding "_id" of the document in the response.

Filter facet returns count of all documents and not range

I'm using Elasticsearch and Nest to create a query for documents within a specific time range as well as doing some filter facets. The query looks like this:
{
"facets": {
"notfound": {
"query": {
"term": {
"statusCode": {
"value": 404
}
}
}
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"time": {
"from": "2014-04-05T05:25:37",
"to": "2014-04-07T05:25:37"
}
}
}
]
}
}
}
In the specific case, the total hits of the search is 21 documents, which fits the documents within that time range in Elasticsearch. But the "notfound" facet returns 38, which fits the total number of ErrorDocuments with a StatusCode value of 404.
As I understand the documentation, facets collect data from within the search. In this case, the "notfound" facet should never be able to return a count higher than 21.
What am I doing wrong here?
There's a distinct difference between filter, query, filtered_query and facet filter, which is good to know.
Top level filter
{
filter: {}
}
This acts as a post-filter, meaning it will filter the results after the query phase has ended. Since facets are part of the query phase, filters do not influence the documents that are faceted over. Filters do not alter score and are therefore very cacheable.
Top level query
{
query: {}
}
Queries influence the score of a document and are therefore less cacheable than filters. Queries run in the query phase and thus also influence the documents that are faceted over.
Filtered query
{
query: {
filtered: {
filter: {},
query: {}
}
}
}
This allows you to run filters in the query phase taking advantage of their better cacheability and have them influence the documents that are facetted over.
Facet filter
"facets" : {
"<FACET NAME>" : {
"<FACET TYPE>" : {
...
},
"facet_filter" : {
"term" : { "user" : "kimchy"}
}
}
}
This allows you to apply a filter to the documents that the facet is run over. Remember that it'll be a combination of the query phase and the facet filter unless you also specify global:true on the facet.
Query Facet/Filter Facet
{
"facets" : {
"wow_facet" : {
"query" : {
"term" : { "tag" : "wow" }
}
}
}
}
This is the one that @thomasardal is using in this case, which is perfectly fine. It's a facet type that returns a single value: the query hit count.
The fact that your Query Facet returns 38 and not 21 is because you use a filter for your time range.
You can fix this either by doing the filter in a filtered_query in the query phase, or by applying a facet filter (not a filter_facet) to your query_facet. Because filters are cached better, you are better off using a facet filter inside your filter facet.
Confusingly Filter Facets are specified using .FacetFilter() on the search object. I will change this in 1.0 to avoid future confusion.
Sadly: .FacetFilter() and .FacetQuery() in NEST do not allow you to specify a facet filter like you can with other facets:
var results = typedClient.Search<object>(s => s
.FacetTerm(ft=>ft
.OnField("myfield")
.FacetFilter(f=>f.Term("filter_facet_on_this_field", "value"))
)
);
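The first fix mentioned above, moving the time range out of the top-level filter and into a filtered query, can be sketched as a request body. This is a minimal Python sketch that only builds the dict; it assumes the field names from the question (time, statusCode), the helper name build_faceted_request is made up, and the facets and filtered syntax applies only to the old Elasticsearch versions this thread is about.

```python
import json


def build_faceted_request(time_from, time_to, status_code=404):
    """Build a legacy request where the range filter runs in the query phase.

    Because the range sits inside a filtered query instead of a top-level
    (post) filter, the "notfound" facet only counts documents in that range.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "range": {"time": {"from": time_from, "to": time_to}}
                },
            }
        },
        "facets": {
            "notfound": {
                "query": {"term": {"statusCode": {"value": status_code}}}
            }
        },
    }


if __name__ == "__main__":
    body = build_faceted_request("2014-04-05T05:25:37", "2014-04-07T05:25:37")
    print(json.dumps(body, indent=2))
```

Compared with the request in the question, the only structural change is that the body no longer has a top-level filter key.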
Your issue here is that you are performing a Filter Facet and not a normal facet on your query (which will follow the restrictions applied via the query filter). In the JSON, the issue is the "query" between the facet name "notfound" and the "term" entry. This tells Elasticsearch to run this as a separate query and facet on the results of that separate query, not on your main query with the date range filter. So your JSON should look like the following:
{
"facets": {
"notfound": {
"term": {
"statusCode": {
"value": 404
}
}
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"time": {
"from": "2014-04-05T05:25:37",
"to": "2014-04-07T05:25:37"
}
}
}
]
}
}
}
Since I see you have this tagged with NEST as well: in your call using NEST, you are probably using FacetFilter on your search request. Switch this to just Facet to get the desired result.
