Word and phrase search on multiple fields in ElasticSearch

I'd like to search documents in ElasticSearch using Python. I am looking for documents that contain a word and/or a phrase in any one of three fields.
GET /my_docs/_search
{
"query": {
"multi_match": {
"query": "Ford \"lone star\"",
"fields": [
"title",
"description",
"news_content"
],
"minimum_should_match": "-1",
"operator": "AND"
}
}
}
In the above query, I'd like to get documents whose title, description, or news_content contain "Ford" and "lone star" (as a phrase).
However, it seems that it does not consider "lone star" as a phrase. It returns documents with "Ford", "lone", and "star".

I was able to reproduce your issue and solved it using the Elasticsearch REST API, as I am not familiar with the Python client syntax. Since you provided your search query in JSON format, I built my solution on top of it.
Index def
{
"mappings": {
"properties": {
"title": {
"type": "text"
},
"description" :{
"type" : "text"
},
"news_content" : {
"type" : "text"
}
}
}
}
Sample docs
{
"title" : "Ford",
"news_content" : "lone star", --> note this matches your criteria
"description" : "foo bar"
}
{
"title" : "Ford",
"news_content" : "lone",
"description" : "star"
}
Search query you are looking for
{
"query": {
"bool": {
"must": [ --> note this, both clause must match
{
"multi_match": {
"query": "ford",
"fields": [
"title",
"description",
"news_content"
]
}
},
{
"multi_match": {
"query": "lone star",
"fields": [
"title",
"description",
"news_content"
],
"type": "phrase" --> note `lone star` must be phrase
}
}
]
}
}
}
Result contains just one doc from sample
"hits": [
{
"_index": "so_phrase",
"_type": "_doc",
"_id": "1",
"_score": 0.9527341,
"_source": {
"title": "Ford",
"news_content": "lone star",
"description": "foo bar"
}
}
]
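Since the question mentions Python, here is a minimal sketch of how the same request body could be built as a plain dict. The helper name `term_and_phrase_query` and the `FIELDS` constant are made up for illustration, and how you pass the dict to your Elasticsearch client depends on the client version you use.

```python
import json

# Fields taken from the question; adjust to your own mapping.
FIELDS = ["title", "description", "news_content"]


def term_and_phrase_query(term, phrase, fields=FIELDS):
    """Build a bool query that requires `term` and the exact `phrase`.

    Each clause may match in any of `fields`, but both clauses must match.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    # Plain word: default multi_match behaviour.
                    {"multi_match": {"query": term, "fields": list(fields)}},
                    # Phrase: "type": "phrase" keeps the words adjacent and in order.
                    {
                        "multi_match": {
                            "query": phrase,
                            "fields": list(fields),
                            "type": "phrase",
                        }
                    },
                ]
            }
        }
    }


body = term_and_phrase_query("ford", "lone star")
print(json.dumps(body, indent=2))
```

The printed JSON has the same shape as the search query above, so it can be compared against it directly before sending it to the cluster.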

Related

Elastic search how to query the results for the keyword exists in the given fields

I have an Elasticsearch index of emails that uses the following mappings for the email sender and recipients:
"mappings": {
...
"recipients": {
"type": "keyword"
},
"sender": {
"type": "keyword"
},
...
I am given a list of email addresses and want to find the emails where any of those addresses is either the sender OR a recipient. For example, I tried the following query:
{
"query": {
"multi_match" : {
"query": "abc#apple.com defg#samsung.com",
"operator": "OR",
"fields": [ "recipients", "sender" ],
"type": "cross_fields"
}
}
}
to find the emails where (abc#apple.com is the sender or a recipient) OR (defg#samsung.com is the sender or a recipient). But it doesn't return any results, even though matching documents exist.
Does anyone know how to query for emails where any of the addresses is the sender or a recipient?
Thanks
It's good that you have found a solution, but it is important to understand why multi_match didn't work, why query_string worked, and why you should avoid query_string if possible.
As mentioned, in the official Elasticsearch documentation,
Also, your multi_match query didn't work because you provided both addresses in a single query string, abc#apple.com defg#samsung.com. The input is analyzed according to each field's analyzer, and because both fields are of type keyword it is kept as one term. So Elasticsearch tries to find the whole string abc#apple.com defg#samsung.com in your fields, not abc#apple.com or defg#samsung.com.
If you want to use multi_match, the right query would be
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "abc#apple.com",
"operator": "OR",
"fields": [
"recipients",
"sender"
],
"type": "cross_fields"
}
},
{
"multi_match": {
"query": "defg#samsung.com",
"operator": "OR",
"fields": [
"recipients",
"sender"
],
"type": "cross_fields"
}
}
]
}
}
}
which returns below documents.
"hits": [
{
"_index": "71367024",
"_id": "1",
"_score": 0.6931471,
"_source": {
"recipients": "abc#apple.com",
"sender": "foo#bar.com"
}
},
{
"_index": "71367024",
"_id": "2",
"_score": 0.6931471,
"_source": {
"recipients": "defg#samsung.com",
"sender": "baz#bar.com"
}
}
]
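Because the question starts from a list of addresses, the query above can be generated instead of written by hand. The sketch below is one way to do that in Python; the function name `any_email_query` is hypothetical, and `minimum_should_match` is set explicitly only to make the "at least one address" intent visible.

```python
def any_email_query(emails, fields=("recipients", "sender")):
    """Build one multi_match clause per address and OR them together.

    Each address is sent as its own query string, so a keyword field is
    compared against a single address rather than the whole list.
    """
    should = [
        {
            "multi_match": {
                "query": email,
                "operator": "OR",
                "fields": list(fields),
                "type": "cross_fields",
            }
        }
        for email in emails
    ]
    # At least one of the per-address clauses has to match.
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


body = any_email_query(["abc#apple.com", "defg#samsung.com"])
print(len(body["query"]["bool"]["should"]))  # 2
```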
I think I may have found the answer. The following query works:
{
"query": {
"query_string" : {
"query": "abc#apple.com OR defg#samsung.com",
"fields": [ "recipients", "sender" ]
}
}
}

Returning documents that match multiple wildcard string queries

I'm new to Elasticsearch and would greatly appreciate help with this.
In the query below I only want the first document to be returned, but instead both documents are returned. How can I write a query that searches for two wildcard strings on two separate fields and only returns documents that match both?
I think what's being returned currently is score dependent, but I don't need the score.
POST /pr/_doc/1
{
"type": "Type ONE",
"currency":"USD"
}
POST /pr/_doc/2
{
"type": "Type TWO",
"currency":"USD"
}
GET /pr/_search
{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "Type ON*",
"fields": ["type"],
"analyze_wildcard": true
}
},
{
"simple_query_string": {
"query": "US*",
"fields": ["currency"],
"analyze_wildcard":true
}
}
]
}
}
}
Use the query below, which is a query_string query with default_operator set to AND. See the query_string documentation for in-depth information and further reading.
Search query
{
"query": {
"query_string": {
"query": "(Type ON*) AND (US*)",
"fields" : ["type", "currency"],
"default_operator" : "AND"
}
}
}
Index your sample docs and it returns your expected doc only:
"hits": [
{
"_index": "multiplequery",
"_type": "_doc",
"_id": "1",
"_score": 2.1823215,
"_source": {
"type": "Type ONE",
"currency": "USD"
}
}
]
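If the wildcard patterns come from user input or a list, the query string above can be assembled programmatically. This is a small Python sketch under that assumption; `all_patterns_query` is an illustrative name, and the patterns are inserted as-is, so any reserved query_string characters in them would still need escaping.

```python
def all_patterns_query(patterns, fields):
    """Join patterns as "(p1) AND (p2) ..." inside a query_string query."""
    # Parentheses keep multi-word patterns such as "Type ON*" grouped.
    query = " AND ".join(f"({pattern})" for pattern in patterns)
    return {
        "query": {
            "query_string": {
                "query": query,
                "fields": list(fields),
                "default_operator": "AND",
            }
        }
    }


body = all_patterns_query(["Type ON*", "US*"], ["type", "currency"])
print(body["query"]["query_string"]["query"])  # (Type ON*) AND (US*)
```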

Search documents have at least one word in a list in ElasticSearch

I would like to search for documents with 1) some phrases that must exist in one of three fields and 2) a list of words of which at least one occurs in one of the fields, such as ['supply', 'procure', 'purchase'].
Below is the ES query I currently use, which meets the first requirement. How should I add the word list to this query?
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "ford",
"fields": [
"title",
"description",
"news_content"
]
}
},
{
"multi_match": {
"query": "lone star",
"fields": [
"title",
"description",
"news_content"
],
"type": "phrase"
}
}
]
}
}
}
You are almost there. Just add the operator OR to your query, which solves your second use case: a list of words of which at least one occurs in one of the fields.
Let me show you with an example:
Index def
{
"mappings" :{
"properties" :{
"title" :{
"type" : "text"
},
"description":{
"type" : "text"
}
}
}
}
Index sample doc
{
"title" : "foo",
"description": "opster"
}
{
"title" : "bar",
"description": "stackoverflow"
}
{
"title" : "baz",
"description": "nodesc"
}
Search query. Notice I am searching for foo amit, a list of words of which at least one should match in either of the 2 fields.
{
"query": {
"bool": {
"should": {
"multi_match": {
"query": "foo amit",
"fields": [
"title",
"description"
],
"operator": "or" --> notice operator OR
}
}
}
}
}
Search result
"hits": [
{
"_index": "white",
"_type": "_doc",
"_id": "1",
"_score": 0.9808291,
"_source": {
"title": "foo", --> notice this matches as `foo` is present and we used operator OR in the query.
"description": "opster"
}
}
]
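To connect this back to the original query, here is a hedged Python sketch that combines the two required clauses from the question with the word list. The helper name is hypothetical. The word-list clause is placed under `must` rather than `should` on purpose: once a bool query has `must` clauses, `should` clauses are optional by default, so leaving it under `should` would only affect scoring.

```python
FIELDS = ["title", "description", "news_content"]


def required_plus_any_word(term, phrase, words, fields=FIELDS):
    """Require `term`, the exact `phrase`, and at least one of `words`."""
    fields = list(fields)
    return {
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": term, "fields": fields}},
                    {"multi_match": {"query": phrase, "fields": fields, "type": "phrase"}},
                    # operator "or": a single word from the list is enough.
                    {
                        "multi_match": {
                            "query": " ".join(words),
                            "fields": fields,
                            "operator": "or",
                        }
                    },
                ]
            }
        }
    }


body = required_plus_any_word("ford", "lone star", ["supply", "procure", "purchase"])
print(body["query"]["bool"]["must"][2]["multi_match"]["query"])  # supply procure purchase
```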

No match on document if the search string is longer than the search field

I have a title I am looking for
The title is, and is stored in a document as
"Police diaries : stefan zweig"
When I search "Police"
I get the result.
But when I search Policeman
I do not get the result.
Here is the query:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"fields": [
"title",
omitted because irrelevance...
],
"query": "Policeman",
"fuzziness": "1.5",
"prefix_length": "2"
}
}
],
"must": {
omitted because irrelevance...
}
}
},
"sort": [
{
"_score": {
"order": "desc"
}
}
]
}
and here is the mapping
{
"books": {
"mappings": {
"book": {
"_all": {
"analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
},
"properties": {
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
},
"sort": {
"type": "text",
"analyzer": "to order in another language, (creates a string with symbols)",
"fielddata": true
}
}
}
}
}
}
}
}
It should be noted that I have documents with the title "some title"
which get hits if I search for "someone title".
I can't figure out why the police book is not showing up.
So your question has 2 parts:
1. You want to find the title containing police when searching for policeman.
2. You want to know why some title documents match a search for someone title, since based on that you expect the first case to match as well.
Let me first explain why the second query matches and why the first one doesn't, and then show how to make the first one work.
Your document containing some title creates below tokens and you can verify this with analyzer API.
POST /_analyze
{
"text": "some title",
"analyzer" : "standard" --> default analyzer for text field
}
Generated tokens
{
"tokens": [
{
"token": "some",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "title",
"start_offset": 5,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 1
}
]
}
Now, when you search for someone title using the match query, the query text is analyzed with the same analyzer that was used on the field at index time.
So it creates 2 tokens, someone and title, and the match query matches on the title token, which is why the document appears in your search result. You can also use the Explain API to see in detail how it matches.
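The token overlap can be illustrated offline with a rough Python model. `simple_tokens` below is only a stand-in for the standard analyzer (lowercase, split on non-alphanumerics), not the real implementation. The edit-distance helper is added on the assumption that fuzzy matching tolerates at most two edits, which would explain why the fuzziness setting in the question does not bridge policeman and police either.

```python
import re


def simple_tokens(text):
    """Rough stand-in for the standard analyzer: lowercase and split."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(
                min(
                    prev[j] + 1,               # delete ca
                    cur[j - 1] + 1,            # insert cb
                    prev[j - 1] + (ca != cb),  # substitute (free if equal)
                )
            )
        prev = cur
    return prev[-1]


# "someone title" and "some title" share the token "title", so they match.
print(set(simple_tokens("someone title")) & set(simple_tokens("some title")))  # {'title'}

# "Policeman" shares no token with the stored title.
print(set(simple_tokens("Policeman")) & set(simple_tokens("Police diaries : stefan zweig")))  # set()

# Three deletions separate the two words.
print(edit_distance("policeman", "police"))  # 3
```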
How to match the police title when searching for policeman
You need to use a synonym token filter, as shown in the example below.
Index Def
{
"settings": {
"analysis": {
"analyzer": {
"synonyms": {
"filter": [
"lowercase",
"synonym_filter"
],
"tokenizer": "standard"
}
},
"filter": {
"synonym_filter": {
"type": "synonym",
"synonyms" : ["policeman => police"] --> note this
}
}
}
},
"mappings": {
"properties": {
"dialog": {
"type": "text",
"analyzer": "synonyms"
}
}
}
}
Index sample doc
{
"dialog" : "police"
}
Search query having term policeman
{
"query": {
"match" : {
"dialog" : {
"query" : "policeman"
}
}
}
}
And search result
"hits": [
{
"_index": "so_syn",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"dialog": "police" --> note source has `police` only.
}
}
]
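To see why this works, here is a simplified Python model of an explicit synonym rule such as `policeman => police`. It is a sketch of the idea only: the helper names are made up, whitespace splitting stands in for the standard tokenizer, and only single-word rules are handled. Because the same analyzer runs at index time and at search time, the query token is rewritten to the token that is stored.

```python
def parse_rules(rules):
    """Turn rules like "a, b => c" into a {source: target} mapping."""
    mapping = {}
    for rule in rules:
        sources, target = rule.split("=>")
        for source in sources.split(","):
            mapping[source.strip()] = target.strip()
    return mapping


def analyze(text, mapping):
    """Lowercase, split on whitespace, then rewrite tokens via the rules."""
    return [mapping.get(tok, tok) for tok in text.lower().split()]


mapping = parse_rules(["policeman => police"])

indexed_tokens = set(analyze("police", mapping))  # what the document stores
query_tokens = analyze("policeman", mapping)      # what the search looks for

print(query_tokens)                                      # ['police']
print(any(tok in indexed_tokens for tok in query_tokens))  # True
```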

How Elasticsearch relevance score gets calculated?

I am using multi_match with phrase_prefix for full text search in Elasticsearch 5.5. ES query looks like
{
query: {
bool: {
must: {
multi_match: {
query: "butt",
type: "phrase_prefix",
fields: ["item.name", "item.keywords"],
max_expansions: 10
}
}
}
}
}
I am getting following response
[
{
"_index": "items_index",
"_type": "item",
"_id": "2",
"_score": 0.61426216,
"_source": {
"item": {
"keywords": "amul butter, milk, butter milk, flavoured",
"name": "Flavoured Butter"
}
}
},
{
"_index": "items_index",
"_type": "item",
"_id": "1",
"_score": 0.39063013,
"_source": {
"item": {
"keywords": "amul butter, milk, butter milk",
"name": "Butter Milk"
}
}
}
]
The mappings are as follows (I am using the default mappings)
{
"items_index" : {
"mappings" : {
"parent_doc": {
...
"properties": {
"item" : {
"properties" : {
"keywords" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
}
Why does the item with "name": "Flavoured Butter" get the higher score of 0.61426216, compared to the document with "name": "Butter Milk" and score 0.39063013?
I tried applying a boost to "item.name" and removing "item.keywords" from the search fields, and got the same results.
How do scores in Elasticsearch work? Are the above results correct in terms of relevance?
The scoring for phrase_prefix is similar to that of best_fields, meaning that the score of a document is the score obtained from the best-matching field, which here is item.keywords.
So item.name isn't adding to the score.
Refer: multi-match-types
You can use 2 multi_match queries to combine the scores from keywords and name.
{
"query": {
"bool": {
"must": [{
"multi_match": {
"query": "butt",
"type": "phrase_prefix",
"fields": [
"item.keywords"
],
"max_expansions": 10
}
},{
"multi_match": {
"query": "butt",
"type": "phrase_prefix",
"fields": [
"item.name"
],
"max_expansions": 10
}
}]
}
}
}
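The same per-field split can be generated for any number of fields. The Python sketch below assumes a hypothetical helper, `per_field_phrase_prefix`; since a bool query adds up the scores of its matching clauses, each field then contributes to the total instead of only the best one. Note that `must` also requires every listed field to match.

```python
def per_field_phrase_prefix(text, fields, max_expansions=10):
    """Build one phrase_prefix multi_match clause per field under bool/must."""
    clauses = [
        {
            "multi_match": {
                "query": text,
                "type": "phrase_prefix",
                "fields": [field],  # one field per clause
                "max_expansions": max_expansions,
            }
        }
        for field in fields
    ]
    return {"query": {"bool": {"must": clauses}}}


body = per_field_phrase_prefix("butt", ["item.keywords", "item.name"])
print([c["multi_match"]["fields"] for c in body["query"]["bool"]["must"]])
```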
