Is it possible to set the TYPE parameter using simple query string query in Elastic Search - elasticsearch

When using a query string query in ES and matching multiple fields, I can set a TYPE parameter to configure how ES combines/scores matches across those fields.
e.g. I want to match two fields in my index and combine the scores from both fields:
GET /_search
{
"query": {
"query_string" : {
"query" : "test",
"fields": ["titel", "content"],
"type": "most_fields"
}
}
}
The parameter seems to be missing from the simple query string query. What is the default mode for simple query string? How are scores chosen/combined? Is it possible to set the type?

Simple query string doesn't have a type parameter. It sums the scores from each field.
Consider the index below and let's see how different queries calculate the score, using the explain API.
Mapping:
PUT testindex6
{
"mappings": {
"properties": {
"title":{
"type": "text"
},
"description":{
"type": "text"
}
}
}
}
Data:
POST testindex6/_doc
{
"title": "dog",
"description":"dog is brown"
}
1. Query_string best_fields(default)
Finds documents which match any field, but uses the _score from the
best field
GET testindex6/_search?explain=true
{
"query": {
"query_string": {
"default_field": "*",
"query": "dog brown",
"type":"best_fields"
}
}
}
Result:
"_explanation" : {
"value" : 0.5753642,
"description" : "max of:",
"details" : [
{
"value" : 0.5753642,
"description" : "sum of:",
},
{
"value" : 0.2876821,
"description" : "sum of:",
}
]
}
best_fields takes the maximum score among the matched fields.
2. Query_string most_fields
Sums the scores from the matched fields
GET testindex6/_search?explain=true
{
"query": {
"query_string": {
"default_field": "*",
"query": "dog brown",
"type":"most_fields"
}
}
}
Result
"_explanation" : {
"value" : 0.8630463,
"description" : "sum of:",
"details" : [
{
"value" : 0.5753642,
"description" : "sum of:"
....
},
{
"value" : 0.2876821,
"description" : "sum of:"
....
}
]
}
}
3. Simple_Query_String
Query
GET testindex6/_search?explain=true
{
"query": {
"simple_query_string": {
"query": "dog brown",
"fields": ["*"]
}
}
}
Result:
"_explanation" : {
"value" : 0.8630463,
"description" : "sum of:",
"details" : [
{
"value" : 0.5753642,
"description" : "sum of:",
},
{
"value" : 0.2876821,
"description" : "sum of:"
}
]
}
}
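The combination step in the three explanations above can be reproduced by hand. The following is a minimal sketch, not Elasticsearch code: `combine_field_scores` is a hypothetical helper, and the per-field values are simply copied from the `_explanation` output above.

```python
import math


def combine_field_scores(field_scores, mode):
    """Combine per-field scores the way the explanations above describe.

    "best_fields" keeps the maximum ("max of:"),
    "most_fields" adds the scores up ("sum of:").
    """
    if mode == "best_fields":
        return max(field_scores)
    if mode == "most_fields":
        return sum(field_scores)
    raise ValueError(f"unsupported mode: {mode}")


# Per-field scores copied from the explain output for the "dog brown" query.
field_scores = [0.5753642, 0.2876821]

best = combine_field_scores(field_scores, "best_fields")
most = combine_field_scores(field_scores, "most_fields")

print("best_fields:", best)
# Rounded to 7 decimals to hide floating-point noise from the addition.
print("most_fields / simple_query_string:", round(most, 7))
```

The sum matches the value reported for both most_fields and simple_query_string on this index, while best_fields keeps only the larger of the two field scores.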
So you can see the score is the same for most_fields and simple_query_string (both do a "sum of"). But there is a difference between them. Consider the index below.
I have created a field title of type text with a subfield shingles that uses a shingle analyzer.
PUT index_2
{
"settings": {
"analysis": {
"analyzer": {
"analyzer_shingle": {
"tokenizer": "standard",
"filter": [
"lowercase",
"shingle"
]
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"fields": {
"shingles": {
"search_analyzer": "analyzer_shingle",
"analyzer": "analyzer_shingle",
"type": "text"
}
}
}
}
}
}
Data:
POST index_2/_doc
{
"title":"the brown fox"
}
1. Most_fields
Query:
GET index_2/_search?explain=true
{
"query": {
"query_string": {
"query": "brown fox",
"fields": ["*"],
"type":"most_fields"
}
}
}
Result:
"_explanation" : {
"value" : 1.3650365,
"description" : "sum of:",
"details" : [
{
"value" : 0.7896724,
"description" : "sum of:",
},
{
"value" : 0.5753642,
"description" : "sum of:",
}
]
}
2. Simple_Query_string
Query
GET index_2/_search?explain=true
{
"query": {
"simple_query_string": {
"query": "brown fox",
"fields": ["*"]
}
}
}
Result:
"_explanation" : {
"value" : 1.2632996,
"description" : "sum of:",
"details" : [
{
"value" : 0.6316498,
"description" : "sum of:",
},
{
"value" : 0.6316498,
"description" : "sum of:"
}
]
}
}
As you can see, the score differs between most_fields and simple_query_string even though both sum the scores.
The reason is that most_fields uses each field's own analyzer at query time (remember that title (standard) and title.shingles (analyzer_shingle) have different analyzers), while simple_query_string uses the default analyzer of the index (standard) for all fields.
If we query with most_fields and force it to use the standard analyzer, you will see the score is the same.
Query:
GET index_2/_search?explain=true
{
"query": {
"query_string": {
"query": "brown fox",
"fields": ["*"],
"type":"most_fields",
"analyzer": "standard"-->instead of field analyzer respectively use standard for all
}
}
}
Result:
"_explanation" : {
"value" : 1.2632996,
"description" : "sum of:",
"details" : [
{
"value" : 0.6879354,
"description" : "sum of:"
},
{
"value" : 0.5753642,
"description" : "sum of:"
}
]
}
I think simple_query_string is for simple scenarios. If you are using different analyzers for different fields, use query_string or bool/match queries.
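Since only query_string accepts a type, one way to keep that choice in a single place is a small request builder. This is a sketch under assumptions: `build_multi_field_query` is a hypothetical helper name, and it only assembles the JSON body; sending it to a cluster is left to whatever client you use.

```python
import json


def build_multi_field_query(text, fields, score_mode=None, analyzer=None):
    """Return a search body for `text` over `fields`.

    If score_mode is given (e.g. "most_fields"), a query_string query is
    built, because simple_query_string has no "type" parameter.
    Otherwise a simple_query_string query is built.
    """
    query_name = "query_string" if score_mode else "simple_query_string"
    params = {"query": text, "fields": list(fields)}
    if score_mode:
        params["type"] = score_mode
    if analyzer:
        # Overrides the per-field analyzers at query time.
        params["analyzer"] = analyzer
    return {"query": {query_name: params}}


typed = build_multi_field_query(
    "brown fox", ["*"], score_mode="most_fields", analyzer="standard"
)
simple = build_multi_field_query("brown fox", ["*"])

print(json.dumps(typed, indent=2))
print(json.dumps(simple, indent=2))
```

The first body corresponds to the most_fields query with a forced standard analyzer shown above; the second is the plain simple_query_string variant.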

Related

Elasticsearch multi-match fields doesn't include query string

I am doing a multi-match search using the following query object:
{
_source: [
'baseline',
'cdrp',
'date',
'description',
'dev_status',
'element',
'event',
'id'
],
track_total_hits: true,
query: {
bool: {
filter: [{name: "baseline", values: ["1f.0.1.0", "1f.1.8.3"]}],
should: [
{
multi_match:{
query: "national",
fields: ["cdrp","description","narrative.*","title","cop"]
}
}
]
}
},
highlight: { fields: { '*': {} } },
sort: [],
from: 0,
size: 50
}
I'm expecting the word "national" to be found within the description or narrative.* fields, but only one of the 2 records returned meets my expectations. I'm trying to understand why.
elasticsearch.config.ts
"settings": {
"analysis": {
"analyzer": {
"search_synonyms": {
"tokenizer": "whitespace",
"filter": [
"graph_synonyms",
"lowercase",
"asciifolding"
],
}
}
}
},
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "search_synonyms"
},
"narrative": {
"type":"object",
"properties":{
"_all":{
"type": "text",
"analyzer": "search_synonyms"
}
}
},
}
}
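A should clause works like OR. When the bool query also contains a filter or must clause, the should clauses are optional: they don't filter out documents, they only affect scoring. Documents which match the should clause are scored higher.
If you want to filter on the multi_match, you can move it inside the filter clause: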
filter: [
{
name: "baseline", values: ["1f.0.1.0", "1f.1.8.3"]
},
{
multi_match:
{
query: "national",
fields: ["cdrp","description","narrative.*","title","cop"]
}
}
]
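The difference between the two placements can be sketched as a small builder. `build_search_body` is a hypothetical helper, and expressing the baseline restriction as a `terms` query is an assumption on my part, since the question shows that filter only in shorthand form.

```python
import json


def build_search_body(text, fields, baselines, require_text_match=True):
    """Build a bool query body (illustrative helper, not a library API).

    require_text_match=True puts multi_match into the filter clause, so
    documents without the text are excluded and the clause adds no score.
    require_text_match=False leaves it in should, where it only boosts
    the score of documents that already pass the filter.
    """
    text_clause = {"multi_match": {"query": text, "fields": list(fields)}}
    # Assumption: the baseline restriction is written as a terms query.
    bool_query = {"filter": [{"terms": {"baseline": list(baselines)}}]}
    if require_text_match:
        bool_query["filter"].append(text_clause)
    else:
        bool_query["should"] = [text_clause]
    return {"query": {"bool": bool_query}}


fields = ["cdrp", "description", "narrative.*", "title", "cop"]
baselines = ["1f.0.1.0", "1f.1.8.3"]

strict = build_search_body("national", fields, baselines)
loose = build_search_body("national", fields, baselines, require_text_match=False)

print(json.dumps(strict, indent=2))
```

With `strict`, every hit has to contain the search text in at least one of the listed fields; with `loose`, any document with a matching baseline is returned.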
Filter vs must: both return only documents matching the specified clauses. Filter doesn't score documents, so if you are not interested in the score or in the order of the returned documents, you can use filter. The two are the same apart from scoring.
Documents with more matches are scored higher.
multi_match uses best_fields by default:
Finds documents which match any field, but uses the _score from the
best field.
It uses the score of the highest-scoring field as the score of each document.
Example
Document 1 has matches in two fields: field1 (score 2) and field2 (score 1)
Document 2 has a match in one field: field2 (score 3)
Document 2 will be ranked higher even though only one field matched.
You can change it to most_fields:
Finds documents which match any field and combines the _score from
each field.
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "test",
"fields": [],
"type": "most_fields"
}
}
]
}
}
}
Even then, a document with fewer matching fields can be ranked higher because of a high score in a single field caused by multiple term matches.
If you want each field to contribute the same score irrespective of the number of tokens matched, you need to use a constant_score query:
{
"query": {
"bool": {
"should": [
{
"constant_score": {
"filter": {
"term": {
"field1": "test"
}
}
}
},
{
"constant_score": {
"filter": {
"term": {
"field2": "test"
}
}
}
}
]
}
},
"highlight": {
"fields": {
"field1": {},
"field2": {}
}
}
}
Result:
"hits" : [
{
"_index" : "index18",
"_type" : "_doc",
"_id" : "iSCe6nEB8J88APx3YBGn",
"_score" : 2.0, --> one score per field matched
"_source" : {
"field1" : "test",
"field2" : "test"
},
"highlight" : {
"field1" : [
"<em>test</em>"
],
"field2" : [
"<em>test</em>"
]
}
},
{
"_index" : "index18",
"_type" : "_doc",
"_id" : "iiCe6nEB8J88APx3ghF-",
"_score" : 1.0,
"_source" : {
"field1" : "test",
"field2" : "abc"
},
"highlight" : {
"field1" : [
"<em>test</em>"
]
}
},
{
"_index" : "index18",
"_type" : "_doc",
"_id" : "iyCf6nEB8J88APx3UhF8",
"_score" : 1.0,
"_source" : {
"field1" : "test do",
"field2" : "abc"
},
"highlight" : {
"field1" : [
"<em>test</em> do"
]
}
}
]
}
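The scores in this result can be mimicked outside Elasticsearch. The sketch below is only an approximation: `constant_score_total` is a hypothetical function, and splitting on whitespace stands in for the analyzer, which is good enough for these sample values. Each matching field contributes 1.0 (the default boost of constant_score), no matter how often the term occurs.

```python
def constant_score_total(doc, term, fields):
    """Add 1.0 for every field that contains `term` at least once."""
    total = 0.0
    for field in fields:
        # Lowercase + whitespace split as a stand-in for text analysis.
        tokens = str(doc.get(field, "")).lower().split()
        if term in tokens:
            total += 1.0
    return total


# The _source values of the three hits shown above.
docs = [
    {"field1": "test", "field2": "test"},
    {"field1": "test", "field2": "abc"},
    {"field1": "test do", "field2": "abc"},
]

scores = [constant_score_total(d, "test", ["field1", "field2"]) for d in docs]
print(scores)  # [2.0, 1.0, 1.0]
```

The first document scores 2.0 because both fields match; the other two score 1.0 each, regardless of field length.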

Search documents have at least one word in a list in ElasticSearch

I would like to search for documents where 1) some phrases must exist in one of three fields, and 2) at least one word from a list, such as ['supply', 'procure', 'purchase'], occurs in one of the fields.
Below is the ES query I currently use, which meets the first requirement. How should I add the word list to this query?
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "ford",
"fields": [
"title",
"description",
"news_content"
]
}
},
{
"multi_match": {
"query": "lone star",
"fields": [
"title",
"description",
"news_content"
],
"type": "phrase"
}
}
]
}
}
}
You are almost there: just add the OR operator to your query, which solves your second use case of a list of words in which at least one occurs in one of the fields.
Let me show you with an example:
Index def
{
"mappings" :{
"properties" :{
"title" :{
"type" : "text"
},
"description":{
"type" : "text"
}
}
}
}
Index sample doc
{
"title" : "foo",
"description": "opster"
}
{
"title" : "bar",
"description": "stackoverflow"
}
{
"title" : "baz",
"description": "nodesc"
}
Search query. Notice I am searching for foo amit, a list of words, so at least one of them should match in either of the 2 fields
{
"query": {
"bool": {
"should": {
"multi_match": {
"query": "foo amit",
"fields": [
"title",
"description"
],
"operator": "or" --> notice operator OR
}
}
}
}
}
Search result
"hits": [
{
"_index": "white",
"_type": "_doc",
"_id": "1",
"_score": 0.9808291,
"_source": {
"title": "foo", --> notice this matches because `foo` is present and we used operator OR in the query.
"description": "opster"
}
}
]
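Putting the two requirements from the question together could look like the following sketch. It is my own combination of the asker's query with the OR-operator idea, not something either post states verbatim; `build_query` is a hypothetical helper. The word list goes into the must array as a third multi_match, so at least one of the words has to occur in one of the fields.

```python
import json

FIELDS = ["title", "description", "news_content"]


def build_query(required_word, required_phrase, any_of_words, fields=FIELDS):
    """Build a bool/must query: word AND phrase AND (any word of the list)."""
    must = [
        {"multi_match": {"query": required_word, "fields": list(fields)}},
        {
            "multi_match": {
                "query": required_phrase,
                "fields": list(fields),
                "type": "phrase",
            }
        },
        {
            # With operator "or", one matching word in any field is enough.
            "multi_match": {
                "query": " ".join(any_of_words),
                "fields": list(fields),
                "operator": "or",
            }
        },
    ]
    return {"query": {"bool": {"must": must}}}


body = build_query("ford", "lone star", ["supply", "procure", "purchase"])
print(json.dumps(body, indent=2))
```

Keeping the word-list clause in must rather than should matters here: next to other must clauses, a should clause would only influence the score.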

Word and phrase search on multiple fields in ElasticSearch

I'd like to search documents using Python through ElasticSearch. I am looking for documents which contain a word and/or phrase in any one of three fields.
GET /my_docs/_search
{
"query": {
"multi_match": {
"query": "Ford \"lone star\"",
"fields": [
"title",
"description",
"news_content"
],
"minimum_should_match": "-1",
"operator": "AND"
}
}
}
In the above query, I'd like to get documents whose title, description, or news_content contain "Ford" and "lone star" (as a phrase).
However, it seems that it does not consider "lone star" as a phrase. It returns documents with "Ford", "lone", and "star".
I was able to reproduce your issue and solved it using the Elasticsearch REST API, as I am not familiar with the Python syntax. I'm glad you provided your search query in JSON format; I built my solution on top of it.
Index def
{
"mappings": {
"properties": {
"title": {
"type": "text"
},
"description" :{
"type" : "text"
},
"news_content" : {
"type" : "text"
}
}
}
}
Sample docs
{
"title" : "Ford",
"news_content" : "lone star", --> note this matches your criteria
"description" : "foo bar"
}
{
"title" : "Ford",
"news_content" : "lone",
"description" : "star"
}
Search query you are looking for
{
"query": {
"bool": {
"must": [ --> note this, both clause must match
{
"multi_match": {
"query": "ford",
"fields": [
"title",
"description",
"news_content"
]
}
},
{
"multi_match": {
"query": "lone star",
"fields": [
"title",
"description",
"news_content"
],
"type": "phrase" --> note `lone star` must be phrase
}
}
]
}
}
}
Result contains just one doc from sample
"hits": [
{
"_index": "so_phrase",
"_type": "_doc",
"_id": "1",
"_score": 0.9527341,
"_source": {
"title": "Ford",
"news_content": "lone star",
"description": "foo bar"
}
}
]
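Since the question asks about Python, here is a minimal sketch of how the query above could be generated from an input string such as `Ford "lone star"`. The helper names are hypothetical, and the parsing rule (double-quoted text becomes a phrase, everything else a single word) is an assumption about the intended input format. Only the request body is built; pass the resulting dict to whichever client you use.

```python
import json
import re

FIELDS = ["title", "description", "news_content"]


def split_words_and_phrases(text):
    """Split the input into bare words and double-quoted phrases."""
    words, phrases = [], []
    # Each match fills exactly one group: a quoted phrase or a bare word.
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', text):
        if phrase:
            phrases.append(phrase)
        else:
            words.append(word)
    return words, phrases


def build_bool_query(text, fields=FIELDS):
    """One must clause per word, one phrase-type must clause per phrase."""
    words, phrases = split_words_and_phrases(text)
    must = [{"multi_match": {"query": w, "fields": list(fields)}} for w in words]
    must += [
        {"multi_match": {"query": p, "fields": list(fields), "type": "phrase"}}
        for p in phrases
    ]
    return {"query": {"bool": {"must": must}}}


body = build_bool_query('Ford "lone star"')
print(json.dumps(body, indent=2))
```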

Elasticsearch is not returning a document I expect in the search results

I have a collection of customers that have a first name, last name, email, description and owner id. I want to take a character string from the app and search on all the fields, with a priority order. I'm using boost to achieve that.
Currently I have a lot of test customers with the name Sean in various fields within the documents. I have 2 documents that contain an email with sean.jones#email.com. One document contains the same email in the description.
When I perform the following search, I'm missing the document in the search results that does not contain the email in the description.
Here is my query:
{
"query" : {
"bool" : {
"filter" : {
"match" : {
"ownerId" : "acct_123"
}
},
"must" : [
{
"bool" : {
"should" : [
{
"prefix" : {
"firstName" : {
"value" : "sean",
"boost" : 3
}
}
},
{
"prefix" : {
"lastName" : {
"value" : "sean",
"boost" : 3
}
}
},
{
"terms" : {
"boost" : 2,
"description" : [
"sean"
]
}
},
{
"prefix" : {
"email" : {
"value" : "sean",
"boost" : 1
}
}
}
]
}
}
]
}
}
}
Here is the document that I'm missing:
{
"_index" : "xxx",
"_id" : "cus_123",
"_version" : 1,
"_type" : "customers",
"_seq_no" : 9096,
"_primary_term" : 1,
"found" : true,
"_source" : {
"firstName" : null,
"id" : "cus_123",
"lastName" : null,
"email" : "sean.jones#email.com",
"ownerId" : "acct_123",
"description" : null
}
}
When I look at the current results, all of the documents have a score of 3.0. They have "Sean" in the name as well, so they score higher. When I do an _explain on the document I'm missing, with the query above, I get the following:
{
"_index": "xxx",
"_type": "customers",
"_id": "cus_123",
"matched": true,
"explanation": {
"value": 1.0,
"description": "sum of:",
"details": [
{
"value": 1.0,
"description": "sum of:",
"details": [
{
"value": 1.0,
"description": "ConstantScore(email._index_prefix:sean)",
"details": []
}
]
},
{
"value": 0.0,
"description": "match on required clause, product of:",
"details": [
{
"value": 0.0,
"description": "# clause",
"details": []
},
{
"value": 1.0,
"description": "ownerId:acct_123",
"details": []
}
]
}
]
}
}
Here are my mappings:
{
"properties": {
"firstName": {
"type": "text",
"index_prefixes": {
"max_chars": 10,
"min_chars": 1
}
},
"email": {
"analyzer": "my_email_analyzer",
"type": "text",
"index_prefixes": {
"max_chars": 10,
"min_chars": 1
}
},
"lastName": {
"type": "text",
"index_prefixes": {
"max_chars": 10,
"min_chars": 1
}
},
"description": {
"type": "text"
},
"ownerId": {
"type": "text"
}
}
}
"my_email_analyzer": {
"type": "custom",
"tokenizer": "uax_url_email"
}
If I'm understanding this correctly, because this document only scores 1, it's not meeting a particular threshold. I've tried adjusting the min_score but had no luck. Any thoughts on how I can get this document to be included in the search results?
Thanks so much.
It depends on what you mean by "missing":
1. is it that the document does not make it into the number of hits (the "total")?
2. or is it that the document itself does not show up as a hit in the hits list?
If it's #2, you may want to increase the number of documents Elasticsearch fetches and returns by adding a size clause to your search request (the default size is 10):
Example
"size": 50

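To tell the two cases apart, one can compare the reported total with the hits actually returned. The sketch below works on an already parsed search response; `diagnose_missing` and the sample response are hypothetical and only illustrate the check.

```python
def diagnose_missing(response, doc_id):
    """Classify why `doc_id` is absent from a parsed search response."""
    hits = response["hits"]
    total = hits["total"]
    # Elasticsearch 7+ reports total as {"value": ..., "relation": ...};
    # older versions use a plain integer.
    total_count = total["value"] if isinstance(total, dict) else total
    returned_ids = [hit["_id"] for hit in hits["hits"]]
    if doc_id in returned_ids:
        return "returned"
    if total_count > len(returned_ids):
        # More documents matched than were returned: raise "size" or page.
        return "possibly cut off by size"
    return "not matched"


# Hypothetical response: 12 documents matched, only 2 are shown here.
sample_response = {
    "hits": {
        "total": {"value": 12, "relation": "eq"},
        "hits": [{"_id": "cus_001"}, {"_id": "cus_002"}],
    }
}

print(diagnose_missing(sample_response, "cus_123"))
```

If the result is "possibly cut off by size", the document may simply rank below the returned page, which fits the explain output in the question (score 1.0 versus 3.0 for the other hits).
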
Is it possible to use a more-like-this query on nested fields?

I have an "event" type based on a (nested) press article, including the title, and the text, which both have multifields.
I've tried :
{
"query":{
"nested":{
"path":"article",
"query":{
"mlt":{
"fields":["article.title.search","article.text.search"],
"max_query_terms": 20,
"min_term_freq": 1,
"include": "false",
"like":[{
"_index":"myindex",
"_type":"event",
"doc":{
"article":{
"title":"this is the title",
"text":"this is the body of the article"
}
}
}]
}
}
}
}
}
But it always returns 0 hits
{
"query": {
"nested":{
"path":"articles",
"query":{
"more_like_this" : {
"fields" : ["articles.brand", "articles.category", "articles.material"],
"like" : [
{
"_index" : "$index",
"_type" : "$type",
"_id" : "$id"
}
],
"min_term_freq" : 1,
"max_query_terms" : 20
}
}
}
}
}
This works for me, taking into consideration that the nested fields you are using must be mapped with term vectors enabled.
"brand": {
"type": "string",
"index": "not_analyzed",
"term_vector": "yes"
}
Refer to: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html
