Using term or terms with one value in Elasticsearch queries - elasticsearch

I am querying an Elasticsearch index using the values of a field. Sometimes, I have to extract all the documents having a field set to exactly one value; Some other times I have to retrieve all the documents having a field, set with one of the values in a list of values.
The latter use case contains the former. Can I use a single query using the terms construct?
POST /_search
{
"query": {
"terms" : { "user" : ["kimchy", "elasticsearch"]}
}
}
Or, in cases I know I need to search only for a unique value, it is better to use the term construct?
POST _search
{
"query": {
"term" : { "user" : "kimchy" }
}
}
Which approach is better regarding performance? Does Elasticsearch perform any optimization if the value in the terms construct is unique?
Thanks to all.

See this link. Terms query is automatically cached while term query is not . So, the next you run the same query, the took time for query for execution will be faster. So if you have a case where you need to run the same query again and again, terms query is a good choice. If not, there is not much of difference between the two.

Related

Elasticsearch: What is the difference between a match and a term in a filter?

I was following an ES tutorial, and at some point I wrote a query using term in the filter instead the recommended solution using match. My understanding is that match was used in the query part to get scoring, while term was used in the filter part to just remove hits before enter the query part. To my surprise match also works in the filter part.
What is the difference between:
GET blogs/_search
{
"query": {
"bool": {
"filter": {
"match": {
"category.keyword": "News"
}
}
}
}
}
and:
GET blogs/_search
{
"query": {
"bool": {
"filter": {
"term": {
"category.keyword": "News"
}
}
}
}
}
Both returns the same hits, and the score is 0 for all hits.
What is the behaviour or match in a filter clause? I would expect it to yield some score, but it does not.
What I thought:
term : does not analyze either the parameter or the field, and it is a yes/no scenario.
match : analyzes parameter and field and calculates a score of how good they match.
But when using match against a keyword in the filter part of the query, how does it behave?
The match query is a high-level query that resorts to using a term query if it needs to.
Scoring has nothing to do with using match instead of term. Scoring kicks in when you use bool/must/should instead of bool/filter.
Here is how the match query works:
First, it checks the type of the field.
If it's a text field then the value will be analyzed, either with the analyzer specified in the query (if any), or with the search- or index-time analyzer specified in the mapping.
If it's a keyword field (like in your case), then the input is not analyzed and taken "as is"
Since you're using the match query on a keyword field and your input is a single term, nothing is analyzed and the match query resorts to using a term query underneath. This is why you're seeing the same results.
In general, it's always best to use a match query as it is smart enough to know what to do given the field you're querying and the input data you're searching for.
You can read more about the difference between the two here.

What is the difference between `constant_score + filter` and `term` query?

I have two queries in Elasticsearch:
{
"term" : {
"price" : 20
}
}
and
"constant_score" : {
"filter" : {
"term" : {
"price" : 20
}
}
}
They are returning the same query result. I wonder what the main difference between them. I read some articles about scoring document. And I believe both queries are scoring document. The constant_score will use default score 1.0 to match the document's score. So I don't see much difference between these two.
The results would be exactly the exact.
However, the biggest difference is that the constant_score/filter version will cache the results of the term query since it's run in a filter context. All future executions will leverage that cache. Also, one feature of the constant_score query is that the returned score is always equal to the given boost value (which defaults to 1)
The first query will be run outside of the filter context and hence not benefit from the filter cache.

Nested count queries

i'm looking to add a feature to an existing query. Basically, I run a query that returns say 1000 documents. Those documents all have the same structure, only the values of certain fields vary. What i'd like, is to not only get the full list as a result, but also count how many results have a field X with the value Y, how many results have the same field X with the value Z etc...
Basically get all the results + 4 or 5 "counts" that would act like the SQL "group by", in a way.
The point of this is to allow full text search over all the clients in our database (without filtering), while showing how many of those are active clients, past clients, active prospects etc...
Any way to do this without running additional / separate queries ?
EDIT WITH ANSWER :
Aggregations is the way to go. Here's how I did it, it's so straightforward that I expected much harder work !
{
"query": {
"term": {
"_type":"client"
}
},
"aggregations" : {
"agg1" : {
"terms" : {
"field" : "listType.typeRef.keyword"
}
}
}
}
Note that it's even in a list of terms and not a single field, that's just how easy it was !
I believe what you are looking for is the aggregation query.
The documentation should be clear enough, but if you struggle please give us your ES query and we will help you from there.

Query that works on difference of dates

Consider I have a doc which has createdDate and closedDate. Now I want to find all docs where (closedDate - createdDate) > 2. I am not able to apply script in range field. Any clue how to proceed with this.
I think this may be possbile by using scripts. By isn't any way I can perform this by query.
Isn't a way to perform this like
{
"range" : {
"date" : {
"gt" : "{createdDate} - {closedDate}/d > 2"
}
}
}
The only way to do that by query is to index an additonal duration field before-hand into your JSON document. Personally I would store the duration in milliseconds and use filters for queries.
If this is not acceptable you will have to use script fields. Described here and here in the Elasticsearch docu.
IMO saving the durtion to each document is preferable, especially if you frequently use the duration for further analysis. The additional field does not cost a lot of memory, but reduces the need for calculations (and therefore is likly to speed up query time) And Especially in Elasticsearch memory shouldn't be a big issue.
Yes, you can do this via script
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": "(doc.closedDate.value - doc.createdDate.value)/86400000 > 2"
}
}
]
}
}
}
Note: make sure to enable dynamic scripting in order to try this.
However, it'd be best to already compute that difference at indexing time and then use a range query on that difference field.

How is Elastic Search sorting when no sort option specified and no search query specified

I wonder how Elastic search is sorting (on what field) when no search query is specified (I just filter on documents) and no sort option specified. It looks like sorting is than random ... Default sort order is _score, but score is always 1 when you do not specify a search query ...
You got it right. Its then more or less random with score being 1. You still get consistent results as far as I remember. You have the "same" when you get results in SQL but don't specify ORDER BY.
Just in case someone may see this post even it posted over 6 yrs ago..
When you wanna know how elasticsearch calculate its own score known as _score, you can use the explain option.
I suppose that your query(with filter & without search) might like this more or less (but the point is making the explain option true) :
POST /goods/_search
{
"explain": true,
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"term": {
"maker_name": "nike"
}
}
}
}
}
As running this, you will notice that the _explaination of each hits describes as below :
"_explanation" : {
"value" : 1.0,
"description" : "ConstantScore(maker_name:nike)",
"details" : [ ]
}
which means ES gave constant score to all of the hits.
So to answer the question, "yes".
The results are sorted kinda randomly because all the filtered results have same (constant) score without any search query.
By the way, enabling an explain option is more helpful when you use search queries. You will see how ES calculates the score and will understand the reason why it returns in that order.
Score is mainly used for sorting, Score is calculated by lucene score calculating using several constraints,For more info refer here .

Resources