Search string keyword by elasticsearch - elasticsearch

I'm having trouble implementing an Elasticsearch search for the query "energy saving tv".
I have 3 documents with a "title" field:
T1: Phone with LG application is an energy saving tv
T2: That tv made by energy saving LG applications
T3: Phone with LG application ensures optimal energy saving
Then I used a "match" query with the "and" operator for the query "energy saving tv":
GET my_index/_search
{
"query": {
"match": {
"title": {
"query": "energy saving tv",
"operator": "and"
}
}
}
}
Result:
Score T1: 5.0
Score T2: 5.37
So T2 scores higher than T1, but I want titles of the form "energy*saving*tv" (words in the same order as in the keyword) to score higher. Please help. Thank you very much!

You can use a match_phrase query to match a phrase made up of several words.
{
"query": {
"match_phrase": {
"title": "energy saving tv"
}
}
}
Note that this will only match T1 since the exact order is preserved.
If you also want to include other results with a more mixed up or spread apart word order you can add the slop parameter.
This will also match T2, but with a lower score:
{
"query": {
"match_phrase": {
"title": {
"query": "energy saving tv",
"slop": 10
}
}
}
}
The slop basically defines the upper limit to how often you can move a query term to the right or left in order to match the document. It defaults to 0.
E.g. going from the query "energy saving tv" to the document "energy tv saving" would require a slop of 2, since tv moves one term to the left and saving moves one term to the right.
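To make the slop arithmetic concrete, here is a minimal Python sketch of a simplified model of sloppy phrase matching: each matched position is shifted back by the term's offset in the query, and the spread of the shifted positions is the slop needed. It assumes plain whitespace tokenization and lowercasing, and the function name required_slop is made up for this illustration; it is not an Elasticsearch API and may differ from the real scorer in edge cases.

```python
from itertools import product


def required_slop(query: str, document: str):
    """Smallest slop at which the phrase could match, or None if it cannot match."""
    q_terms = query.lower().split()
    d_terms = document.lower().split()

    # Collect every position at which each query term occurs in the document.
    candidates = []
    for term in q_terms:
        positions = [i for i, t in enumerate(d_terms) if t == term]
        if not positions:
            return None  # a query term is missing entirely
        candidates.append(positions)

    best = None
    for combo in product(*candidates):
        if len(set(combo)) != len(combo):
            continue  # each query term needs its own document position
        # Shift each position back by the term's offset in the query.
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        width = max(shifted) - min(shifted)
        if best is None or width < best:
            best = width
    return best


print(required_slop("energy saving tv", "energy tv saving"))
```

Under this model T1 needs a slop of 0 and T2 needs a slop of 5, so a slop of 10 is enough to include T2, while T3 can never match because it has no "tv".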
See this answer for a great visual explanation.

Related

Match all words in any order but duplicates consider as individual

We have a text field that is matched against words in any order, but when the same word appears more than once in the query, only documents containing the same number of occurrences should be returned (that is, duplicate words must not be collapsed).
{
"match": {
"field": {
"operator": "and",
"query": "2019 1 Scc 1"
}
}
}
Wrong results: 2019 6 SCC 1, 2019 5 SCC 1, SCC 1 2009 6
Correct result: 1 2019 Scc 1
A match query only checks matching term by term, so it has no notion of how many times a term occurs in the query.
The cool way to fulfil your requirement would be to create a new scripted similarity for your field, but I'm not sure that such a script can access the term frequency of the request :(
But maybe a match_phrase trick can do the job for you. A match_phrase query handles the request terms as a whole phrase (so it looks at token positions to determine a match). So if you configure a big slop (like 10), the match_phrase query will match when every term of the request has a match in the document field (at a distinct position).
So duplicate tokens in the request need to be found twice in the document.
Here is an example:
POST <index>/_search
{
"query": {
"match_phrase": {
"field": {
"slop": 10,
"query": "2019 1 1 Scc 1"
}
}
}
}
I can't guarantee it will work for all your use cases, but it's a starting point :)
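The condition being aimed at (every query term must be found at least as many times as it is repeated in the query) can be sketched client-side in Python. This is only an illustration of the requirement, assuming whitespace tokenization and case-insensitive comparison; the function name is hypothetical and this is not what Elasticsearch runs internally.

```python
from collections import Counter


def has_all_terms_with_duplicates(query: str, field_value: str) -> bool:
    """True if the field contains each query term at least as often as the query does."""
    needed = Counter(query.lower().split())
    present = Counter(field_value.lower().split())
    # Counter returns 0 for missing terms, so absent terms fail the check.
    return all(present[term] >= count for term, count in needed.items())


for candidate in ["2019 6 SCC 1", "2019 5 SCC 1", "SCC 1 2009 6", "1 2019 Scc 1"]:
    print(candidate, has_all_terms_with_duplicates("2019 1 Scc 1", candidate))
```

A check like this could also be used to post-filter results if the match_phrase trick turns out to be too loose or too strict for some inputs.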

What is the difference between must and filter in Query DSL in elasticsearch?

I am new to Elasticsearch and I am confused about the difference between must and filter. I want to perform an AND operation between my terms, so I did this
POST /xyz/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"city": "city1"
}
},
{
"term": {
"saleType": "sale_type1"
}
}
]
}
}
}
which gave me the required results matching both the terms, and on using filter like this
POST /xyz/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"city": "city1"
}
}
],
"filter": {
"term": {
"saleType": "sale_type1"
}
}
}
}
}
I get the same result, so when should I use must and when should I use filter? What is the difference?
must contributes to the score. In filter, the score of the query is ignored.
In both must and filter, the clause(query) must appear in matching documents. This is the reason for getting same results.
You may check this link
Score
The relevance score of each document is represented by a positive floating-point number called the _score. The higher the _score, the more relevant the document.
A query clause generates a _score for each document.
To learn how the score is calculated, refer to this link
must returns a score for every matching document. This score helps you rank the matching documents and compare their relative relevance (using the magnitude of each document's score).
With this, one can say roughly how much more relevant Doc 1 is than Doc 2, or that Docs 1 to 7 are much more relevant than Doc 8+.
For how the relative score is determined, you can refer to the references below.
Briefly, it is related to the number of term occurrences in the document, the document length, and the average number of term occurrences in your database index.
filter doesn't return a score. All one can say is, all matching documents are of relevance. But it won't help in evaluating if one is more relevant than the other. You can think of filter as a must with only 2 scores: zero or non-zero, and where all zero-scored documents are dropped.
filter is helpful if you just want to whitelist or blacklist documents, e.g. keep only the documents belonging to the topic "pets".
In summary, there are 3 points that will help you in deciding when to use what:
must is your only choice when comparing/ranking documents by relevance
filter excludes all documents that don't match
filter is a lot faster because Elasticsearch doesn't need to compute the relative score
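As a rough sketch of how the two contexts can be kept apart when building a request, the Python helper below puts clauses that should influence ranking under must and pure yes/no conditions under filter. The helper name bool_query is made up for this example; the field names and values are the ones from the question.

```python
def bool_query(scoring=None, filtering=None):
    """Build a bool query body: 'scoring' clauses go to must, 'filtering' clauses to filter."""
    body = {}
    if scoring:
        body["must"] = list(scoring)      # contributes to _score
    if filtering:
        body["filter"] = list(filtering)  # must match, but the score is ignored
    return {"query": {"bool": body}}


query = bool_query(
    scoring=[{"term": {"city": "city1"}}],
    filtering=[{"term": {"saleType": "sale_type1"}}],
)
print(query)
```

Both clauses still have to match, so the hits are the same as with two must clauses; only the score computation differs.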
References:
Query vs Filter: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html
Computation of Relevance: https://www.infoq.com/articles/similarity-scoring-elasticsearch/

How to filter results based on frequency of repeating terms in an array in elasticsearch

I have an array field with a lot of keywords, and I need to filter the documents based on how many times a particular keyword is repeated in those arrays.
For example, if my field name is "nationality" and for document 1 it consists of the following
doc1
nationality :
["US","UK","Australia","India","US","US"]
and for doc2
nationality:
["US","UK","US","US","US","China"]
I want only those documents to be shown where the term "US" occurs more than 3 times. That would make only doc2 to be shown. How to do this?
You can implement this with a script filter.
{
"query": {
"filtered": {
"filter": {
"script": {
"script": "_index['nationality']['US'].tf() > 3"
}
}
}
}
}
In this script, the "nationality" field is checked for the term "US", and the number of occurrences is taken from tf() (term frequency). Only the documents with a term frequency greater than three are shown in the results. Note that this syntax belongs to older Elasticsearch releases; the filtered query was removed in 5.0. You can learn more about the filter operations here
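To double-check what the condition should return for the two sample documents, the same "more than 3 occurrences" rule can be sketched in plain Python. This runs client-side on the arrays from the question and is only an illustration of the expected result, not a replacement for the script filter; the function name is hypothetical.

```python
docs = {
    "doc1": ["US", "UK", "Australia", "India", "US", "US"],
    "doc2": ["US", "UK", "US", "US", "US", "China"],
}


def docs_with_term_more_than(documents, term, threshold):
    """Return ids of documents whose array contains `term` more than `threshold` times."""
    return [doc_id for doc_id, values in documents.items() if values.count(term) > threshold]


print(docs_with_term_more_than(docs, "US", 3))
```

doc1 contains "US" three times and doc2 four times, so only doc2 passes the strict "greater than 3" comparison.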

Constant Score Query elasticsearch boosting

My understanding of Constant Score Query in elasticsearch is that boost factor would be assigned as score for every matching query. The documentation says:
A query that wraps a filter or another query and simply returns a constant score equal to the query boost for every document in the filter.
However when I send this query:
"query": {
"constant_score": {
"filter": {
"term": {
"source": "BBC"
}
},
"boost": 3
}
},
"fields": ["title", "source"]
all the matching documents are given a score of 1?! I cannot figure out what I am doing wrong, and had also tried with query instead of filter in constant_score.
Scores are only meant to be relative to all other scores in a given result set, so a result set where everything has the score of 3 is the same as a result set where everything has the score of 1.
Really, the only purpose of the relevance _score is to sort the results of the current query in the correct order. You should not try to compare the relevance scores from different queries. - Elasticsearch Guide
Either the constant score is being ignored because it's not being combined with another query, or it's being normalized. As @keety said, check the output of explain to see exactly what's going on.
A constant score query gives an equal score to every matching document, irrespective of scoring factors like TF, IDF etc. It can be used when you don't care how well a doc matched, only whether it matched, and you still want to assign a score, unlike with filter.
If you want the score to be literally 3 for all the documents matching a particular query, then you should use a function score query, something like
"query": {
"function_score": {
"functions": [
{
"filter": { "term": { "source": "BBC" } },
"weight": 3
}
]
}
...
}

Boosting in Elasticsearch

I am new to Elasticsearch. In Elasticsearch we can use boost in almost all queries. I understand it's used to modify the score of documents, but I can't figure out its actual use. My question is: if I use boost values in some queries, will it affect the final score of the search, or the boost rank of docs in the index itself?
And what is the main difference between boosting at index time and boosting at query time?
Thanks in Advance..!
Query time boost allows you to give more weight to one query than to another. For instance, let's say you are querying the title and body fields for "Quick Brown Fox", you could write it as:
{
"query": {
"bool": {
"should": [
{
"match": {
"title": "Quick Brown Fox"
}
},
{
"match": {
"body": "Quick Brown Fox"
}
}
]
}
}
}
But you decide that you want the title field to be more important than the body field, which means you need to boost the query on the title field by (eg) 2:
{
"query": {
"bool": {
"should": [
{
"match": {
"title": {
"query": "Quick Brown Fox",
"boost": 2
}
}
},
{
"match": {
"body": "Quick Brown Fox"
}
}
]
}
}
}
(Note how the structure of the match clause changed to accommodate the boost parameter).
The boost value of 2 doesn't double the _score exactly, since the scores go through a normalization process. So you should think of boost as a way to make this query clause relatively more important than the other query clauses.
My doubt is if I use boost values in some queries, will it affect the final score of the search
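The structural change between the short and the expanded form of the match clause can be captured in a small Python helper. This is a sketch only: the helper names match_clause and title_body_query are made up here, and the output is just the request body from the example above.

```python
def match_clause(field, text, boost=None):
    """Return a match clause, using the expanded form only when a boost is given."""
    if boost is None:
        return {"match": {field: text}}
    return {"match": {field: {"query": text, "boost": boost}}}


def title_body_query(text, title_boost=None):
    """Search title and body, optionally making the title clause more important."""
    return {
        "query": {
            "bool": {
                "should": [
                    match_clause("title", text, title_boost),
                    match_clause("body", text),
                ]
            }
        }
    }


print(title_body_query("Quick Brown Fox", title_boost=2))
```

Keeping the boost as a parameter makes it easy to tune at query time without touching the index.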
Yes it does, but you shouldn't rely on the actual value of _score anyway. Its only purpose is to allow Elasticsearch to decide which documents are most relevant to this query. If the query changes, the scores change.
Re index time boosting: don't use it. It's inflexible and error prone.
Boosting at query time won't modify your index. It only applies a boost factor to fields when searching.
I prefer boosting at query time as it's more flexible. If you need to change your boost rules and you had set them at index time, you will probably need to reindex.
Use case for boosting: suppose you are building an e-commerce web app and your product data is in Elasticsearch. Whenever a customer uses the search bar, you query Elasticsearch and display the results in the web app.
Elasticsearch computes a relevance score for every matching document and returns the results sorted by that score.
Now let's assume a user searches for "samsung phones". Should your web app show only Samsung phones? The answer is no.
Your web app should show other phones as well (the user may like those too), but it should show Samsung phones first (since that is what he/she is looking for) and the other phones after them.
So the question is: how do you query so that Samsung phones come up first in the results? The answer is the relevance score.
Let's say you run a query for all mobile phones plus a boosted clause for Samsung phones, so that Samsung phones get a higher relevance score.
Then the results will contain Samsung phones first, followed by the other phones.
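One way such a query could look is sketched below in Python: a must clause selects all phones, and a boosted should clause lifts the requested brand to the top. The field names "category" and "brand", the default boost of 2.0, and the function name phone_search are assumptions for illustration; they do not come from any real mapping.

```python
def phone_search(brand, brand_boost=2.0):
    """Match every phone, but score phones of the requested brand higher."""
    return {
        "query": {
            "bool": {
                # Every hit must be a phone (hypothetical "category" field).
                "must": [{"match": {"category": "phones"}}],
                # Optional clause: matching it adds to the score (hypothetical "brand" field).
                "should": [
                    {"match": {"brand": {"query": brand, "boost": brand_boost}}}
                ],
            }
        }
    }


print(phone_search("samsung"))
```

Because the should clause is optional when a must clause is present, phones from other brands still match; they simply score lower.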
