Elasticsearch query for multiple conditions

I want to build a query in Elasticsearch that has 3 sub-conditions.
1. It must satisfy at least one of a list of provided values.
2. After 1, condition 2 must be satisfied, and then condition 3.
(1 must be satisfied; 2 and 3 must also be satisfied, but only after 1 is satisfied.)
1 is a list of values, so any one of them matching will suffice.
Please give an outline of how to frame the Elasticsearch query using boolean parameters.
Thanks in advance.

{
  "query" : {
    "filtered" : {
      "filter" : {
        "bool" : {
          "must" : [
            { "term" : {"sessionId": "-ShAwL2KlnVeo6nMMNX3ycVlc0kdikOWPC8vShyvpRpdmOQJkbBo-FiLJymsuZp36gcQs1I"}}
          ],
          "should" : [
            { "term" : {"visitorId": "b090606f-968d-fef4-33e3-3341f3a04265"}},
            { "term" : {"clientIp": "192.168.8.100"}}
          ]
        }
      }
    }
  }
}
Clauses listed under must are required: a document has to match all of them.
Clauses listed under should are alternatives: a document can match any of them.
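The filtered query was removed in Elasticsearch 5.0, so on newer versions the same idea is usually written as a plain bool query. The Python sketch below builds such a request body as a dictionary. The function name build_query is hypothetical, and it assumes you want the should clauses to stay mandatory: in a bool query that also has must or filter clauses, should clauses become optional unless minimum_should_match is set.

```python
import json


def build_query(session_id, visitor_id, client_ip):
    """Require the session to match, plus at least one of the two identity clauses."""
    return {
        "query": {
            "bool": {
                # filter clauses must all match and do not contribute to the score
                "filter": [
                    {"term": {"sessionId": session_id}},
                ],
                # should clauses are the alternatives
                "should": [
                    {"term": {"visitorId": visitor_id}},
                    {"term": {"clientIp": client_ip}},
                ],
                # without this setting, the should clauses are optional
                # once a must or filter clause is present
                "minimum_should_match": 1,
            }
        }
    }


body = build_query(
    "-ShAwL2KlnVeo6nMMNX3ycVlc0kdikOWPC8vShyvpRpdmOQJkbBo-FiLJymsuZp36gcQs1I",
    "b090606f-968d-fef4-33e3-3341f3a04265",
    "192.168.8.100",
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request with whatever client you already use.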

Related

Is there a difference between using search terms and should when querying Elasticsearch

I am performing a refactor of the code to query an ES index, and I was wondering if there is any difference between the two snippets below:
"bool" : {
  "should" : [ {
    "terms" : {
      "myType" : [ 1 ]
    }
  }, {
    "terms" : {
      "myType" : [ 2 ]
    }
  }, {
    "terms" : {
      "myType" : [ 4 ]
    }
  } ]
}
and
"terms" : {
  "myType" : [ 1, 2, 4 ]
}
Please check this post from the Elastic discuss forum, which answers your question. Copying it here for quick reference:
There are a few differences.
The simplest to see is the verbosity: terms queries just list an array, while term queries require more JSON.
terms queries do not score matches based on IDF (the rareness) of matched terms; the term query does.
term queries can only have up to 1024 values due to the Boolean query's max clause count.
terms queries can have more terms.
By default, Elasticsearch limits the terms query to a maximum of 65,536 terms. You can change this limit using the index.max_terms_count setting.
Which of them is going to be faster? Is speed also related to the number of terms?
It depends. They execute differently. term queries do more expensive scoring but do so lazily. They may "skip" over docs during execution because other, more selective criteria may advance the stream of matching docs considered.
The terms query doesn't do expensive scoring but is more eager: it creates the equivalent of a single bitset with a one or zero for every doc by ORing all the potential matching docs up front. Many terms can share the same bitset, which is what provides the scalability in term numbers.
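For the refactor in the question, a small helper can do the rewrite mechanically. The Python sketch below is only an illustration (collapse_terms is a hypothetical name): it merges a bool query whose only clauses are should-level terms clauses on a single field into one terms query, and returns anything else unchanged. The two forms should match the same documents, but, as noted above, their scoring and execution differ.

```python
def collapse_terms(bool_query):
    """Merge should-only `terms` clauses on one field into a single terms query.

    Returns the input object unchanged when it does not have that exact shape.
    """
    bool_body = bool_query.get("bool", {})
    should = bool_body.get("should", [])
    # Only handle a bool query that contains nothing but a non-empty should list
    if set(bool_body) != {"should"} or not should:
        return bool_query

    field = None
    values = []
    for clause in should:
        terms = clause.get("terms") if isinstance(clause, dict) else None
        # Each clause must be exactly {"terms": {<one field>: [...]}}
        if not isinstance(terms, dict) or len(clause) != 1 or len(terms) != 1:
            return bool_query
        (name, vals), = terms.items()
        if field is None:
            field = name
        elif name != field:
            return bool_query  # clauses on different fields cannot be merged
        for value in vals:
            if value not in values:  # keep order, drop duplicates
                values.append(value)
    return {"terms": {field: values}}


original = {
    "bool": {
        "should": [
            {"terms": {"myType": [1]}},
            {"terms": {"myType": [2]}},
            {"terms": {"myType": [4]}},
        ]
    }
}
print(collapse_terms(original))
```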

What is the difference between `constant_score + filter` and `term` query?

I have two queries in Elasticsearch:
{
"term" : {
"price" : 20
}
}
and
"constant_score" : {
"filter" : {
"term" : {
"price" : 20
}
}
}
They return the same result. I wonder what the main difference between them is. I have read some articles about document scoring, and I believe both queries score documents. The constant_score query gives each matching document a default score of 1.0. So I don't see much difference between the two.
The results would be exactly the same.
However, the biggest difference is that the constant_score/filter version will cache the results of the term query, since it's run in a filter context. All future executions will leverage that cache. Also, one feature of the constant_score query is that the returned score is always equal to the given boost value (which defaults to 1).
The first query will be run outside of the filter context and hence not benefit from the filter cache.
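To make the relationship between the two forms concrete, here is a minimal Python sketch that wraps any clause in constant_score. The helper name constant_score is hypothetical; the boost parameter is the one mentioned above and is written out explicitly here even though it defaults to 1.

```python
def constant_score(filter_clause, boost=1.0):
    """Wrap a clause so it runs in filter context and every hit scores `boost`."""
    return {"constant_score": {"filter": filter_clause, "boost": boost}}


# The plain term query from the question (query context, relevance-scored)
scored = {"term": {"price": 20}}

# The same clause in filter context, with a fixed score for every match
unscored = constant_score(scored)
print(unscored)
```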

How to use multiple query strings with aggregation in elasticsearch

How to use multiple query strings with aggregate functions in elasticsearch?
For example:
if a>0 AND a<1, then {"low":count(aggregate count of records within 0 to 1)}
else if a > 1 AND a < 100, then {"normal":count(aggregate count of records within 1 to 100)}
else {"high":count(aggregate count of records after 100)}
How to achieve this using Request Body Query string?
Thank you in advance.
Assuming that a is a field that you search on, I think the easiest way for you to do that is using the range aggregation with buckets for each of your use-cases (low, normal, high).
You cannot bind aggregations to conditions of your query; that is something you would have to do in your own code. But if you use the range aggregation, you could define your buckets like this:
POST /_search
{
  "aggs" : {
    "a_ranges" : {
      "range" : {
        "field" : "a",
        "ranges" : [
          { "to" : 1 },
          { "from" : 1, "to" : 100 },
          { "from" : 100 }
        ]
      }
    }
  }
}
Depending on your query, two of these buckets would remain empty, but this should give you the result you want.
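The labelling part ("low", "normal", "high") can be handled with the key option of each range plus a few lines of client code. The Python sketch below is an illustration under stated assumptions: it uses the 1 and 100 boundaries from the question, the function names are hypothetical, and sample_response is made-up data shaped like the aggregations section of a search response, not real output.

```python
def build_range_aggs(field="a"):
    """Request body with one labelled bucket per range; size 0 skips the hits."""
    return {
        "size": 0,
        "aggs": {
            "a_ranges": {
                "range": {
                    "field": field,
                    "ranges": [
                        {"key": "low", "to": 1},                   # a < 1
                        {"key": "normal", "from": 1, "to": 100},   # 1 <= a < 100
                        {"key": "high", "from": 100},              # a >= 100
                    ],
                }
            }
        },
    }


def counts_by_label(response, agg_name="a_ranges"):
    """Turn the list of buckets into a {label: doc_count} mapping."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hypothetical response fragment, for illustration only
sample_response = {
    "aggregations": {
        "a_ranges": {
            "buckets": [
                {"key": "low", "to": 1.0, "doc_count": 3},
                {"key": "normal", "from": 1.0, "to": 100.0, "doc_count": 12},
                {"key": "high", "from": 100.0, "doc_count": 5},
            ]
        }
    }
}
print(counts_by_label(sample_response))  # {'low': 3, 'normal': 12, 'high': 5}
```

Note that in a range aggregation the from value is inclusive and the to value is exclusive, so a value of exactly 1 falls in the "normal" bucket here.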

Finding fields Elasticsearch has matched on

I am using Elasticsearch to search for a group a user should join. I have the user data nested into the search query. On return I get back the closest matched group that user should be in.
The field I am searching on is a nested field as follows:
`{"interests": [
{"topics":["python", "stackoverflow", "elasticsearch"]},
{"topics":["arts", "textiles"]}
]}`
However, if you want to understand why something matched, how do you do this?
Elasticsearch does have an explain function, which shows what the score is made up of using TF-IDF, but not specifically which terms were used.
For example, if I search for 'Textile', the doc should match on 'textiles'. Thus I want the term 'textiles' to be returned in explain or some other way.
The only way I can see to meet this need is to store the search and the retrieved document, and then process both to work out which words ES most likely matched on.
EDIT - for some more clarity of the question
An example in my index of a group which has "interests": ['arts', 'fine arts', 'art painting', 'arts and crafts', 'sports']
Now, in my search, I am looking for Arts and many other things. The term I am searching for comes up in this list many times, and thus should always be a contributor.
What I want in the response is something saying that these words were matched, ['arts', 'fine arts', 'art painting', 'arts and crafts'], along with the degree to which they match, i.e. 'arts' should be higher than the others, but all the others are also relevant.
Elasticsearch allows you to specify the _name field for all queries and
filters. This means that you can separate your query into different parts with
separate names, which will allow you to determine which parts matched.
For example:
{
  "query" : {
    "bool" : {
      "should" : [
        {"match" : { "interests.topics" : {"query" : "python", "_name" : "py-topic"} }},
        {"match" : { "interests.topics" : {"query" : "arts", "_name" : "arts-topic"} }}
      ]
    }
  }
}
Then, in your response, each hit will include an array (matched_queries) listing which queries (or filters) matched, so you can determine whether the py-topic query and/or the arts-topic query matched.
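Reading the names back on the client side is straightforward. The Python sketch below assumes the usual response shape, where each hit carries a matched_queries list; the function name matched_names is hypothetical, and sample_response is made-up data for illustration rather than real output.

```python
def matched_names(response):
    """Map each hit's _id to the list of named queries it matched."""
    return {
        hit["_id"]: hit.get("matched_queries", [])
        for hit in response["hits"]["hits"]
    }


# Hypothetical response fragment, for illustration only
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_score": 1.3, "matched_queries": ["py-topic", "arts-topic"]},
            {"_id": "2", "_score": 0.4, "matched_queries": ["arts-topic"]},
        ]
    }
}

for doc_id, names in matched_names(sample_response).items():
    print(doc_id, names)
```

This tells you which named clauses matched, not the exact indexed terms, so one named clause per search term is the finest granularity it gives you.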

How to enable fuzziness for phrase queries in ElasticSearch

We're using ElasticSearch for searching through millions of tags. Our users should be able to include boolean operators (+, -, "xy", AND, OR, brackets). If no hits are returned, we fall back to a spelling suggestion provided by ES and search again. That's our query:
$ curl -XGET 'http://127.0.0.1:9200/my_index/my_type/_search' -d '
{
  "query" : {
    "query_string" : {
      "query" : "some test query +bools -included",
      "default_operator" : "AND"
    }
  },
  "suggest" : {
    "text" : "some test query +bools -included",
    "simple_phrase" : {
      "phrase" : {
        "field" : "my_tags_field",
        "size" : 1
      }
    }
  }
}'
Instead of only providing a fallback to spelling suggestions, we'd like to enable fuzzy matching. If, for example, a user searches for "stackoverfolw", ES should return matches for "stackoverflow".
Additional question: What's the better-performing method for "correcting" spelling errors? As it is now, we have to perform two subsequent requests, first with the original search term, then with the term suggested by ES.
The query_string query does support some fuzziness, but only when using the ~ operator, which I don't think fits your use case. I would add a fuzzy query and combine it with the existing query_string using OR. For instance, you can use a bool query and add the fuzzy query as a should clause, keeping the original query_string as a must clause.
As for your additional question about how to correct spelling mistakes: I would use fuzzy queries to correct them automatically, and two subsequent requests if you want the user to select the right correction from a list (e.g. "Did you mean"), but your approach sounds good too.
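A rough Python sketch of the combined query follows. It takes the "OR" reading of the suggestion, placing both the query_string and the fuzzy query under should so that either one can produce a hit. The function name is hypothetical, the field name comes from the question, and the choice of "AUTO" fuzziness is an assumption. The fuzzy query works on a single term, so the sketch assumes the caller has already picked out the word to match fuzzily.

```python
def build_fuzzy_fallback(user_query, field, term):
    """Match either the strict query_string or a fuzzy match on one term."""
    return {
        "query": {
            "bool": {
                "should": [
                    # the original strict query with boolean operators
                    {"query_string": {"query": user_query, "default_operator": "AND"}},
                    # tolerant match for a single, possibly misspelled term
                    {"fuzzy": {field: {"value": term, "fuzziness": "AUTO"}}},
                ],
                # at least one of the two alternatives has to match
                "minimum_should_match": 1,
            }
        }
    }


body = build_fuzzy_fallback("stackoverfolw", "my_tags_field", "stackoverfolw")
print(body)
```

Because both clauses contribute to the score, documents that satisfy the strict query should tend to rank above those that only match fuzzily.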
