Elasticsearch faceted query returns incorrect count

I need help with aggregate / faceted queries in Elasticsearch. I have used a faceted query to group the results, but the grouped results do not come back with the correct counts.
Please suggest how to get grouped results from Elasticsearch.
{
  "query" : {
    "query_string" : {"query" : "pared_cat_id:1"}
  },
  "facets" : {
    "subcategory" : {
      "terms" : {
        "field" : "sub_cat_id",
        "size" : 50,
        "order" : "term",
        "all_terms" : true
      }
    }
  },
  "from" : 0,
  "size" : 50
}
I am trying to get results grouped by sub category id for the parent category id that is passed in.

"query_string" : {"query" : "pared_cat_id:1"}
This clause is applied to the overall data and not to the facet counts.
To restrict the counts as well, you need to use a facet query, in which you can specify the same condition that you are specifying in the main query string.
So the facet counts you are seeing now are computed over the whole data, without "query_string" : {"query" : "pared_cat_id:1"} applied. If you want the facet counts after applying "query_string" : {"query" : "pared_cat_id:1"}, provide it in the facet query.

Elasticsearch facet queries work very well in terms of accuracy; at least I have not seen any problem yet.
Just a few questions:
Is this field a string or numeric? Can you give an example?
Have you applied any custom mapping, or are you using the default "standard" analyzer?
Please state the kind of inaccuracy you see. For example, should "aa" have a count of 100 but shows 50, or is it some other kind of inaccuracy?

An Elasticsearch terms facet can return incorrect counts if the number of shards is greater than 1, because each shard only reports its own top terms. In addition, facets are deprecated and will be removed in a future release. You are encouraged to migrate to aggregations instead.
I suggest that you take a look at this blog post, in which Alex Brasetvik gives a good description along with some examples of how to use the aggregations feature properly.
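As a rough sketch of that migration, the facet request from the question could be rewritten as a terms aggregation along these lines. The field names (`pared_cat_id`, `sub_cat_id`) are taken from the question; the helper name `subcategory_counts_request` is hypothetical, and the body is only built as a Python dict here, not sent to a cluster.

```python
import json


def subcategory_counts_request(parent_cat_id, max_buckets=50):
    """Build a search body that counts documents per sub_cat_id
    for one parent category (hypothetical helper, not an ES API)."""
    return {
        # Same query_string as in the question; it scopes the aggregation too.
        "query": {
            "query_string": {"query": "pared_cat_id:%s" % parent_cat_id}
        },
        "aggs": {
            "subcategory": {
                "terms": {
                    "field": "sub_cat_id",
                    # Upper bound on the number of buckets returned.
                    "size": max_buckets,
                }
            }
        },
        # Only the bucket counts are needed, so no hits are requested.
        "size": 0,
    }


print(json.dumps(subcategory_counts_request(1), indent=2))
```

The counts then come back under `aggregations.subcategory.buckets` in the response, one bucket per sub category id.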

Related

Filter on score after rescore in Elasticsearch

I have been on an internet manhunt for days for this and getting ready to give up. I need to filter on _score in Elasticsearch after the rescore function has completed. So given an example query like this:
POST /_search
{
  "query" : {
    "match" : {
      "message" : {
        "operator" : "or",
        "query" : "the quick brown"
      }
    }
  },
  "rescore" : {
    "window_size" : 50,
    "query" : {
      "rescore_query" : {
        "match_phrase" : {
          "message" : {
            "query" : "the quick brown",
            "slop" : 2
          }
        }
      },
      "query_weight" : 0.7,
      "rescore_query_weight" : 1.2
    }
  }
}
Say just for simplicity's sake that the above returns 5 documents with scores ranging from 0.0 to 1.0. I want the final returned results set to only be the documents with a score above 0.90. In other words, take those newly-rescored docs, and hand them off to a filter where it drops all documents scored below 0.90.
I have tried many, many different ways but nothing is working. Post_filter is apparently meant to come after the main query but before rescore, so that one doesn't work. min_score does not work at all with rescore, it only works with the original ES scores from the main query. Aggs is one functionality that I am able to get to work after rescore, but aggregating is not what I need to do here. But at least it shows me that ES has the ability to continue operating on the data after a rescore query.
Any thoughts on how to get this seemingly simple task accomplished? I have also tried using function_score and script_score but really those are just ways to further modify the scores, whereas I need to filter on the scores generated by the rescore. The requirement here is to get it done in the query. We can't do it as a post-processing step.

Elasticsearch and Aggregation

I have been given a problem where I need to perform a search based on different fields. For example, in the UI the user is given several search options such as company name, department, state/province, title, country and region.
The user selects a few of these options, such as company name, department and state. I need to perform the search on these fields and return the results.
Can I do this with the help of aggregations in Elasticsearch? Can anyone give me a detailed example of how this can be done?
I tried a few examples, such as performing an aggregation on gender. The query is as follows:
"aggs" : {"group_by_gender" : {"terms" : {"field" : "gender"}}}
When I ran this type of query, all the sources (from the documents) were returned, so I was confused about whether the aggregation was actually performed.
Thanks in advance.
Aggregations are meant to make statistics over the values of fields. If you need to search documents depending on fields, you need to make (boolean) queries.
Example:
POST myIndex/_search
{
  "query" : {
    "bool" : {
      "must" : [
        {"term" : { "name" : "kimchy" }},
        {"term" : { "state" : "unicorn planet" }}
      ]
    }
  }
}
See the Elasticsearch bool query.
A bool query combines clauses of type must, should, must_not and filter. Each clause holds another query, such as match, term or match_all.
Hope this helps.
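A minimal sketch of how the options chosen in the UI might be turned into such a bool query is shown below. The helper name `build_search_body` and the field names (`company_name`, `department`, `state`) are assumptions for illustration, and `term` clauses only match exact, un-analyzed values, so the real mapping may call for `match` instead.

```python
def build_search_body(selected):
    """Turn the options a user filled in into a bool query.

    `selected` maps field name -> chosen value; options the user
    left empty (None or "") are skipped.
    """
    clauses = [
        {"term": {field: value}}
        for field, value in selected.items()
        if value not in (None, "")
    ]
    if not clauses:
        # Nothing selected: fall back to matching every document.
        return {"query": {"match_all": {}}}
    # Every selected option has to match, hence "must".
    return {"query": {"bool": {"must": clauses}}}


body = build_search_body(
    {"company_name": "acme", "department": "sales", "state": ""}
)
print(body)
```

Only the two filled-in options end up in the `must` list; the empty `state` option is ignored.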

Elastic search using aggregations instead of facets

I am trying to figure out how to write the following query using the new aggregations instead of facets. The reason for the change is that I would then like to take it further and, instead of showing just 10 tags, show all tags with a count over 0.
{
  "query" : { "query_string" : {"query" : "T*"} },
  "facets" : {
    "tags" : { "terms" : {"field" : "tags"} }
  }
}
Any help would be greatly appreciated
Most facet types have an equivalent aggregation type. The equivalent of the terms facet type is the terms aggregation type.
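To illustrate that mapping, here is a small sketch that rewrites the terms facets in a request body as terms aggregations. The function name `terms_facets_to_aggs` is hypothetical, it handles only terms facets, and it carries over just the `field` and `size` options. A terms aggregation returns only buckets with at least one document by default, so showing "all tags with a count over 0" comes down to choosing a `size` at least as large as the number of distinct tags.

```python
def terms_facets_to_aggs(body, size=None):
    """Return a copy of `body` with each terms facet rewritten
    as a terms aggregation (only `field` and `size` are kept)."""
    # Copy everything except the old "facets" section.
    converted = {key: value for key, value in body.items() if key != "facets"}
    aggs = {}
    for name, facet in body.get("facets", {}).items():
        old_terms = facet["terms"]
        new_terms = {"field": old_terms["field"]}
        if "size" in old_terms:
            new_terms["size"] = old_terms["size"]
        if size is not None:
            # Caller-supplied bucket limit overrides the facet's own size.
            new_terms["size"] = size
        aggs[name] = {"terms": new_terms}
    if aggs:
        converted["aggs"] = aggs
    return converted


facet_body = {
    "query": {"query_string": {"query": "T*"}},
    "facets": {"tags": {"terms": {"field": "tags"}}},
}
print(terms_facets_to_aggs(facet_body, size=1000))
```

The value 1000 is only a placeholder for "more than the number of distinct tags in the index".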

Elastic Search truncating hits.total via score

Is it possible to execute a query and filter it so that only elements with score > 1.0 are considered in the hits.total response?
I believe you can use min_score to achieve this (http://www.elasticsearch.org/guide/reference/api/search/min-score/). The ES docs example:
{
  "min_score" : 0.5,
  "query" : {
    "term" : { "user" : "kimchy" }
  }
}
As the docs also say, this isn't usually practical because scoring is a relative calculation. If you're heavily influencing the results however, it might be what you need.

Queries vs Filters - Order of execution

I've read this question, and a colleague of mine made me doubt my understanding:
In a filtered query, when is the filter applied? Before or after executing the query? When is the result cached?
If the filter is applied beforehand, wouldn't it be a good thing to duplicate the query part in the filters?
If the filter is applied afterward, then I'm having trouble understanding what is cached.
Luckily, ES provides two types of filters for you to work with:
{
  "query" : {
    "field" : { "title" : "Catch-22" }
  },
  "filter" : {
    "term" : { "year" : 1961 }
  }
}

{
  "query" : {
    "filtered" : {
      "query" : {
        "field" : { "title" : "Catch-22" }
      },
      "filter" : {
        "term" : { "year" : 1961 }
      }
    }
  }
}
In the first case, the filter is applied to all documents found by the query. In the second case, the documents are filtered before the query runs. This yields better performance.
Quoted from: http://www.packtpub.com/elasticsearch-server-for-fast-scalable-flexible-search-solution/book
As for caching, I'm not sure how the cache mechanism of filters works.
My guess would be:
First case: since the filter runs against the set of results returned by the query, the cache is specific to that result set.
Second case: the filter is applied first, so the cache is stored for the indices you searched against. This cache is more reusable because it does not rely on the content of the query, but it comes at a larger memory cost and a longer query time on the first run (before the cache is generated).
Let me explain how a search query is executed.
First, there is always a complete set of documents that you want to search.
If you include a filter with the search query, it simply makes that set smaller; in other words, filters are cached results of the same query.
You then have a smaller set to search with your query text.
Now to your doubt: duplicating the query in the filters will only increase the overhead of the cache mechanism. There are many guidelines on what to include in a filter and what to leave out. It all comes down to relevancy.
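For comparison, the two request shapes discussed above can be sketched as plain Python dicts. This uses the later spellings rather than the ones in the quoted book: from Elasticsearch 1.0 the top-level `filter` is called `post_filter`, and from 2.0 the `filtered` query is expressed as a `bool` query with a `filter` clause. The helper names are hypothetical, and a `match` query stands in for the older `field` query.

```python
def post_filtered_search(query, filter_clause):
    """Filter applied to the hits after the query has run.
    Aggregations in the same request still see the unfiltered hits."""
    return {"query": query, "post_filter": filter_clause}


def filtered_search(query, filter_clause):
    """Filter applied as part of the query itself.
    The filter clause narrows the matches and does not affect scoring."""
    return {"query": {"bool": {"must": [query], "filter": [filter_clause]}}}


title_query = {"match": {"title": "Catch-22"}}
year_filter = {"term": {"year": 1961}}

print(post_filtered_search(title_query, year_filter))
print(filtered_search(title_query, year_filter))
```

Both requests return the same hits here; the difference lies in when the filter runs and in what aggregations would count.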
