Search documents that have at least one word from a list in ElasticSearch

I would like to search for documents with 1) some phrases that must exist in one of three fields, and 2) a list of words, at least one of which must occur in one of the fields, such as ['supply', 'procure', 'purchase'].
Below is the current ES query I use, which meets the first requirement. How should I add the word list to this query?
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "ford",
"fields": [
"title",
"description",
"news_content"
]
}
},
{
"multi_match": {
"query": "lone star",
"fields": [
"title",
"description",
"news_content"
],
"type": "phrase"
}
}
]
}
}
}

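You are almost there: just add a clause for the word list with operator OR, which solves your second use case of a list of words where at least one of them must occur in one of the fields. (OR is already the default operator for multi_match; setting it explicitly just makes the intent clear.)
Let me show you with an example: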
Index def
{
"mappings" :{
"properties" :{
"title" :{
"type" : "text"
},
"description":{
"type" : "text"
}
}
}
}
Index sample docs
{
"title" : "foo",
"description": "opster"
}
{
"title" : "bar",
"description": "stackoverflow"
}
{
"title" : "baz",
"description": "nodesc"
}
Search query: notice I am searching for `foo amit`, a list of words, so at least one of them should match in either of the 2 fields
{
"query": {
"bool": {
"should": {
"multi_match": {
"query": "foo amit",
"fields": [
"title",
"description"
],
"operator": "or" --> notice operator OR
}
}
}
}
}
Search result
"hits": [
{
"_index": "white",
"_type": "_doc",
"_id": "1",
"_score": 0.9808291,
"_source": {
"title": "foo", --> notice this matches, as `foo` is present and we used operator OR in the query.
"description": "opster"
}
}
]
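Applied to the query from the question, the word list simply becomes a third clause in the same must array. The Python sketch below shows that structure using the field names and words from the question. The helper name build_search_query is hypothetical, and the code only builds the request body; it does not call Elasticsearch.

```python
import json


def build_search_query(fields, must_terms, must_phrases, any_of_words):
    """Hypothetical helper: required terms and phrases, plus a word list
    where at least one word has to match in one of the fields."""
    must = []
    for term in must_terms:
        # Plain multi_match clause, like the original "ford" clause.
        must.append({"multi_match": {"query": term, "fields": list(fields)}})
    for phrase in must_phrases:
        # Phrase clause: the words must appear together and in order.
        must.append({"multi_match": {"query": phrase,
                                     "fields": list(fields),
                                     "type": "phrase"}})
    if any_of_words:
        # One clause for the whole list; with operator "or" a single
        # matching word is enough for this clause to be satisfied.
        must.append({"multi_match": {"query": " ".join(any_of_words),
                                     "fields": list(fields),
                                     "operator": "or"}})
    return {"query": {"bool": {"must": must}}}


if __name__ == "__main__":
    body = build_search_query(
        fields=["title", "description", "news_content"],
        must_terms=["ford"],
        must_phrases=["lone star"],
        any_of_words=["supply", "procure", "purchase"],
    )
    print(json.dumps(body, indent=2))
```

Keeping the word list in a single multi_match clause means the must array stays flat: every clause has to match, but inside the last clause any one of the words is enough.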

Related

Update the score of Pinned Documents in Elastic Search

I have a requirement to always show some documents at the top of the search results. For that, I have used a pinned query to pin some documents, and the pinned documents get a score value of 1.7014122E38.
But I have another requirement to modify the score of the pinned documents, which I'm unable to achieve at the query level.
Sample Documents
{
"docs": [
{
"_id": 1,
"name": "jack"
},
{
"_id": 2,
"name": "ryan"
},
{
"_id": 3,
"name": "mark"
},
{
"_id": 4,
"name": "taylor"
},
{
"_id": 5,
"name": "taylor"
}
]
}
ES Query
{
"query": {
"bool": {
"should": [
{
"pinned": {
"ids": [
"3"
],
"organic": {
"bool": {
"must": [
{
"multi_match": {
"query": "taylor",
"fields": [
"name"
]
}
}
]
}
}
}
}
]
}
}
}
Now I want to multiply the pinned documents' score by some value, which I'm unable to achieve in ES.
Can someone please help me to solve this requirement?
Since the pinned queries' scores are calculated at query time, there's no way of knowing what they will end up being. It could be 1.7014122E38 but also 1.7014122402528844E38 etc.
What you could do is use a sort script and check whether the implicit score is unusually high (I chose Integer.MAX_VALUE as the boundary), which would indicate that you're dealing with a pinned document. If that's the case, you can override the pinned documents' scores however you like.
POST your-index/_search?track_scores&filter_path=hits.hits._id,hits.hits._source,hits.hits.sort
{
"query": {
"bool": {
"should": [
{
"pinned": {
"ids": [ "3" ],
"organic": {
"bool": {
"must": [
{
"multi_match": {
"query": "taylor",
"fields": [
"name"
]
}
}
]
}
}
}
}
]
}
},
"sort": [
{
"_script": {
"order": "desc",
"type": "number",
"script": {
"source": "_score >= Integer.MAX_VALUE ? params.score_rewrite : _score",
"params": {
"score_rewrite": 42
}
}
}
}
]
}
Note that it's necessary to set the track_scores URI parameter because when sorting on a field, the scores are not computed by default.
That way, the resulting hits would look along the lines of:
{
"hits" : {
"hits" : [
{
"_id" : "3", <-- pinned ID
"_source" : {
"name" : "mark"
},
"sort" : [
42.0 <-- overridden sort
]
},
{
"_id" : "4",
"_source" : {
"name" : "taylor"
},
"sort" : [
0.875468730926 <-- default sort
]
},
{
"_id" : "5",
"_source" : {
"name" : "taylor"
},
"sort" : [
0.875468730926
]
}
]
}
}
P.S.: Integer.MAX_VALUE is arbitrary and there's absolutely no guarantee that it'll catch all pinned docs. In other words, a bit of experimentation will be needed to choose a bulletproof boundary.
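To make the effect of the sort script easier to see, here is a local Python simulation of the expression `_score >= Integer.MAX_VALUE ? params.score_rewrite : _score` followed by a descending sort. It does not run inside Elasticsearch; the function names are hypothetical and the input scores are just the sample values from the output above.

```python
# Java's Integer.MAX_VALUE, the (arbitrary) boundary used in the script.
INTEGER_MAX_VALUE = 2**31 - 1


def rewrite_score(score, score_rewrite, boundary=INTEGER_MAX_VALUE):
    # Same decision as the Painless ternary in the sort script.
    return float(score_rewrite) if score >= boundary else score


def sort_hits(hits, score_rewrite):
    """hits: list of (doc_id, score) pairs.
    Returns (doc_id, sort_value) pairs in "order": "desc"."""
    keyed = [(doc_id, rewrite_score(score, score_rewrite)) for doc_id, score in hits]
    # sorted() is stable, so ties keep their input order.
    return sorted(keyed, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample = [("4", 0.875468730926), ("3", 1.7014122e38), ("5", 0.875468730926)]
    for doc_id, value in sort_hits(sample, score_rewrite=42):
        print(doc_id, value)
```

Since the rewritten value only drives the sort, a score_rewrite smaller than the organic scores would push the pinned documents below the organic hits, so pick it with your organic score range in mind.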

Returning documents that match multiple wildcard string queries

I'm new to Elasticsearch and would greatly appreciate help on this.
In the query below I only want the first document to be returned, but instead both documents are returned. How can I write a query that searches for two wildcard strings on two separate fields and only returns documents that match both?
I think what's being returned currently is score dependent, but I don't need the score.
POST /pr/_doc/1
{
"type": "Type ONE",
"currency":"USD"
}
POST /pr/_doc/2
{
"type": "Type TWO",
"currency":"USD"
}
GET /pr/_search
{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "Type ON*",
"fields": ["type"],
"analyze_wildcard": true
}
},
{
"simple_query_string": {
"query": "US*",
"fields": ["currency"],
"analyze_wildcard":true
}
}
]
}
}
}
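Your simple_query_string clauses use the default operator OR, so "Type ON*" is already satisfied by the term `type` alone, which both documents contain. Use the query below instead, which uses query_string with "default_operator": "AND"; see the query string query documentation for in-depth information and further reading.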
Search query
{
"query": {
"query_string": {
"query": "(Type ON*) AND (US*)",
"fields" : ["type", "currency"],
"default_operator" : "AND"
}
}
}
Index your sample docs and the query returns only your expected doc:
"hits": [
{
"_index": "multiplequery",
"_type": "_doc",
"_id": "1",
"_score": 2.1823215,
"_source": {
"type": "Type ONE",
"currency": "USD"
}
}
]
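The difference between the two operators can be illustrated with a simplified Python simulation of the boolean matching. It is an assumption-laden stand-in, not Elasticsearch: tokens come from lowercasing and splitting on whitespace (roughly what the standard analyzer does for these values), and fnmatch stands in for wildcard matching. With OR, the term `type` alone satisfies the first clause for both documents; with AND, `on*` must also match, which only "Type ONE" does.

```python
from fnmatch import fnmatchcase


def tokens(text):
    # Rough stand-in for the standard analyzer: lowercase, split on whitespace.
    return text.lower().split()


def clause_matches(field_text, query, default_operator):
    terms = tokens(query)  # analyze_wildcard lowercases "ON*" to "on*"
    field_tokens = tokens(field_text)
    per_term = [any(fnmatchcase(tok, term) for tok in field_tokens) for term in terms]
    return all(per_term) if default_operator == "and" else any(per_term)


def doc_matches(doc, default_operator):
    # Both clauses sit in "must", so both have to match.
    return (clause_matches(doc["type"], "Type ON*", default_operator)
            and clause_matches(doc["currency"], "US*", default_operator))


docs = {"1": {"type": "Type ONE", "currency": "USD"},
        "2": {"type": "Type TWO", "currency": "USD"}}

if __name__ == "__main__":
    for op in ("or", "and"):
        print(op, [doc_id for doc_id, doc in docs.items() if doc_matches(doc, op)])
```

By the same reasoning, adding "default_operator": "and" to each of the original simple_query_string clauses should also work, since simple_query_string accepts that parameter too.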

Word and phrase search on multiple fields in ElasticSearch

I'd like to search documents using Python through ElasticSearch. I am looking for documents that contain a word and/or a phrase in any one of three fields.
GET /my_docs/_search
{
"query": {
"multi_match": {
"query": "Ford \"lone star\"",
"fields": [
"title",
"description",
"news_content"
],
"minimum_should_match": "-1",
"operator": "AND"
}
}
}
In the above query, I'd like to get documents whose title, description, or news_content contain "Ford" and "lone star" (as a phrase).
However, it seems that it does not consider "lone star" as a phrase. It returns documents with "Ford", "lone", and "star".
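I was able to reproduce your issue and solved it using the Elasticsearch REST API, as I am not familiar with the Python syntax. Fortunately you provided your search query in JSON format, so I built my solution on top of it. The reason your query fails is that multi_match does not treat double quotes as phrase syntax: the text is simply analyzed into the separate terms ford, lone and star.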
Index def
{
"mappings": {
"properties": {
"title": {
"type": "text"
},
"description" :{
"type" : "text"
},
"news_content" : {
"type" : "text"
}
}
}
}
Sample docs
{
"title" : "Ford",
"news_content" : "lone star", --> note this matches your criteria
"description" : "foo bar"
}
{
"title" : "Ford",
"news_content" : "lone",
"description" : "star"
}
Search query you are looking for
{
"query": {
"bool": {
"must": [ --> note this, both clause must match
{
"multi_match": {
"query": "ford",
"fields": [
"title",
"description",
"news_content"
]
}
},
{
"multi_match": {
"query": "lone star",
"fields": [
"title",
"description",
"news_content"
],
"type": "phrase" --> note `lone star` must be phrase
}
}
]
}
}
}
The result contains just one doc from the sample:
"hits": [
{
"_index": "so_phrase",
"_type": "_doc",
"_id": "1",
"_score": 0.9527341,
"_source": {
"title": "Ford",
"news_content": "lone star",
"description": "foo bar"
}
}
]
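Since the question mentions Python, here is a sketch of how the answer's query could be generated from raw input such as `Ford "lone star"`: quoted parts become phrase clauses and the remaining words become plain multi_match clauses, all under must. The helper name and the quote-parsing regex are assumptions for illustration; the resulting dict is the request body and can be sent with whichever Python client you use.

```python
import json
import re

# Matches either a double-quoted phrase (group 1) or a single word (group 2).
TOKEN_RE = re.compile(r'"([^"]+)"|(\S+)')


def build_word_and_phrase_query(user_input, fields):
    """Hypothetical helper: quoted parts -> phrase clauses, words -> plain clauses."""
    must = []
    for phrase, word in TOKEN_RE.findall(user_input):
        clause = {"query": phrase or word, "fields": list(fields)}
        if phrase:
            clause["type"] = "phrase"
        must.append({"multi_match": clause})
    # "must": every word and every phrase has to match somewhere.
    return {"query": {"bool": {"must": must}}}


if __name__ == "__main__":
    body = build_word_and_phrase_query('Ford "lone star"',
                                       ["title", "description", "news_content"])
    print(json.dumps(body, indent=2))
```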

How Elasticsearch relevance score gets calculated?

I am using multi_match with phrase_prefix for full text search in Elasticsearch 5.5. The ES query looks like this:
{
query: {
bool: {
must: {
multi_match: {
query: "butt",
type: "phrase_prefix",
fields: ["item.name", "item.keywords"],
max_expansions: 10
}
}
}
}
}
I am getting the following response:
[
{
"_index": "items_index",
"_type": "item",
"_id": "2",
"_score": 0.61426216,
"_source": {
"item": {
"keywords": "amul butter, milk, butter milk, flavoured",
"name": "Flavoured Butter"
}
}
},
{
"_index": "items_index",
"_type": "item",
"_id": "1",
"_score": 0.39063013,
"_source": {
"item": {
"keywords": "amul butter, milk, butter milk",
"name": "Butter Milk"
}
}
}
]
The mappings are as follows (I am using the default mappings):
{
"items_index" : {
"mappings" : {
"parent_doc": {
...
"properties": {
"item" : {
"properties" : {
"keywords" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
}
Why is the item with "name": "Flavoured Butter" getting a higher score of 0.61426216 than the document with "name": "Butter Milk", with a score of 0.39063013?
I tried applying a boost to "item.name" and removing "item.keywords" from the search fields, but I get the same results.
How do scores in Elasticsearch work? Are the above results correct in terms of relevance?
The scoring for phrase_prefix is similar to that of best_fields, meaning that the score of a document is the score obtained from the best-matching field, which here is item.keywords.
So item.name isn't adding to the score.
Refer to the multi_match types documentation (multi-match-types).
You can use two multi_match queries to combine the scores from keywords and name:
{
"query": {
"bool": {
"must": [{
"multi_match": {
"query": "butt",
"type": "phrase_prefix",
"fields": [
"item.keywords"
],
"max_expansions": 10
}
},{
"multi_match": {
"query": "butt",
"type": "phrase_prefix",
"fields": [
"item.name"
],
"max_expansions": 10
}
}]
}
}
}
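A rough way to picture the difference: with best_fields (and phrase_prefix), the per-field scores are combined like a dis_max query, taking the best field's score plus tie_breaker times the other matching fields (tie_breaker defaults to 0.0), whereas a bool query with two must clauses adds the clause scores. The Python functions below are a simplified illustration of those two combination rules only; the per-field scores in the example are made-up numbers, not the actual BM25 scores from the question, which you can inspect with the explain API.

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    """dis_max-style combination: best field's score, plus tie_breaker
    times the scores of the other matching fields."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)


def bool_must_score(clause_scores):
    # A bool query sums the scores of its matching must clauses.
    return sum(clause_scores)


if __name__ == "__main__":
    hypothetical = [0.5, 0.25]  # e.g. item.keywords and item.name
    print(best_fields_score(hypothetical))                   # only the best field counts
    print(best_fields_score(hypothetical, tie_breaker=0.5))  # other field adds a fraction
    print(bool_must_score(hypothetical))                     # both clauses add up
```

With these hypothetical inputs, best_fields_score gives 0.5 (or 0.625 with tie_breaker 0.5), while bool_must_score gives 0.75, which is why splitting the fields into two clauses lets item.name contribute.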

ElasticSearch : MultiMatch with "match a list of values"

How can I combine MultiMatch + "match a list of values" in a single query?
I.e., I want to query a list of names ["John","Bas","Peter"] against a list of fields ["first_name","Alias","nick_name","surname"]
-match a list of values-
{
"query": {
"filtered" : {
"filter" : {
"terms": {
"first_name": ["John","Bas","Peter"]
}
}
}
}
}
-MultiMatch-
{
"multi_match" : {
"query": "john",
"fields": ["first_name","Alias","nick_name","surname"]
}
}
You can provide multiple search terms to the multi_match as well:
{
"query": {
"multi_match": {
"query": "John Bas Peter",
"fields": [
"first_name",
"Alias",
"nick_name",
"surname"
]
}
}
}
Also, multi_match has various types that determine how it transforms the query internally: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#multi-match-types
If you use the _validate API with this query, you can see exactly how ES rewrites it:
GET /_validate/query?explain=true
{
"query": {
"multi_match": {
"query": "John Bas Peter",
"fields": [
"first_name",
"Alias",
"nick_name",
"surname"
]
}
}
}
The above gives:
"explanations": [
{
"index": "test",
"valid": true,
"explanation": "((Alias:john Alias:bas Alias:peter) | (surname:john surname:bas surname:peter) | (nick_name:john nick_name:bas nick_name:peter) | (first_name:john first_name:bas first_name:peter))"
}
]
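The explanation string has a regular shape: within each field, the analyzed terms are OR'ed (space-separated should clauses), and the per-field groups are combined with `|`, which is how the dis_max behind the default best_fields type is printed. The Python sketch below reproduces that shape for illustration only; it assumes lowercasing plus whitespace splitting as a stand-in for the analyzer, and it keeps the fields in the order given, whereas Elasticsearch printed them in a different order above.

```python
def explain_best_fields(query, fields):
    """Hypothetical helper: render the shape of a best_fields rewrite."""
    terms = query.lower().split()  # stand-in for the standard analyzer
    # One OR group per field, e.g. "(first_name:john first_name:bas ...)".
    per_field = ["(" + " ".join(f"{field}:{term}" for term in terms) + ")"
                 for field in fields]
    # "|" separates the dis_max alternatives: the best field wins.
    return "(" + " | ".join(per_field) + ")"


if __name__ == "__main__":
    print(explain_best_fields("John Bas Peter",
                              ["first_name", "Alias", "nick_name", "surname"]))
```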
