ElasticSearch query is not working with only 2 characters

I have a field with this mapping definition:
identifierNumber: {
type: "keyword",
fields: { text: { type: "text" } },
},
The values of this field look like 22-001, 22-002, etc.
I am sending the following query to Elasticsearch:
{
"query": {
"bool": {
"filter": [
{
"term": {
"status": "NEW"
}
}
],
"must": [
{
"simple_query_string": {
"query": "22 22~",
"fields": [
"title^3",
"identifierNumber^2"
],
"lenient": true
}
}
]
}
},
"sort": []
}
This query returns 0 results.
Changing the simple_query_string query to 22001 or 22-001 returns relevant results.
Can someone explain why the original query with only 2 characters does not work?

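I think you need to add the field "identifierNumber.text" to the fields of the simple_query_string clause. A keyword field indexes the whole value 22-001 as a single term, so the query term 22 cannot match it; the text sub-field is analyzed (with the standard analyzer by default), which splits 22-001 into the terms 22 and 001.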
"simple_query_string": {
"query": "22 22~",
"fields": [
"title^3",
"identifierNumber.text"
],
"lenient": true
}
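To see why the text sub-field helps, here is a rough Python approximation of the terms each field indexes for 22-001. It is an assumption for illustration, not Elasticsearch's actual tokenizer: the keyword field keeps the value as one term, while the standard analyzer splits on the hyphen and lowercases. The real standard tokenizer follows Unicode word-boundary rules; the regex below only mimics it for simple ASCII values like these.

```python
import re


def keyword_terms(value: str) -> list[str]:
    # A keyword field indexes the whole value as one exact term.
    return [value]


def standard_terms(value: str) -> list[str]:
    # Rough approximation of the standard analyzer for simple ASCII input:
    # split on non-word characters (such as "-") and lowercase the pieces.
    return [t.lower() for t in re.findall(r"\w+", value)]


if __name__ == "__main__":
    value = "22-001"
    print(keyword_terms(value))   # ['22-001']
    print(standard_terms(value))  # ['22', '001']
    # The query term "22" can only match the analyzed text sub-field.
    print("22" in keyword_terms(value), "22" in standard_terms(value))  # False True
```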

Related

Elasticsearch Query NOT searching in the specified fields

I am struggling with an Elasticsearch query. In the fields option we have specified '*', which means it should search all fields, and we have given higher weights to a few fields. But it isn't working as it should.
This query was written by my colleague; it'd be great if you could explain it and point out the solution. Here's my query:
{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "Atoms for Peace",
"default_operator": "AND",
"flags": "PREFIX|PHRASE|NOT|AND|OR|FUZZY|WHITESPACE",
"fields": [
"*",
"systemNumber^5",
"global_search",
"objectType^2",
"partTypes.text",
"partTypes.id",
"gs_am_people^2",
"gs_am_person^2",
"gs_am_org^2",
"gs_title^2",
"_currentLocation.displayName",
"briefDescription",
"physicalDescription",
"summaryDescription",
"_flatPersonsNameId",
"_flatPeoplesNameId",
"_flatOrganisationsNameId",
"_primaryDate",
"_primaryDateEarliest",
"_primaryDateLatest"
]
}
}
]
}
}
Your query is fine, but it will not search fields of the "nested" data type.
From the docs:
Searching across all eligible fields does not include nested documents. Use a nested query to search those documents.
So you need to add a nested query (replace record below with the path of your nested field):
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"simple_query_string": {
"query": "Atoms for Peace",
"default_operator": "AND",
"flags": "PREFIX|PHRASE|NOT|AND|OR|FUZZY|WHITESPACE",
"fields": [
"*",
"systemNumber^5",
"global_search",
"objectType^2",
"partTypes.text",
"partTypes.id",
"gs_am_people^2",
"gs_am_person^2",
"gs_am_org^2",
"gs_title^2",
"_currentLocation.displayName",
"briefDescription",
"physicalDescription",
"summaryDescription",
"_flatPersonsNameId",
"_flatPeoplesNameId",
"_flatOrganisationsNameId",
"_primaryDate",
"_primaryDateEarliest",
"_primaryDateLatest"
]
}
},
{
"nested": {
"path": "record",
"query": {
"simple_query_string": {
"query": "Atoms for Peace",
"default_operator": "AND",
"flags": "PREFIX|PHRASE|NOT|AND|OR|FUZZY|WHITESPACE",
"fields": [
"*"
]
}
}
}
}
]
}
}
}
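If the same search has to be assembled from code, a small Python helper can produce the body above. This is a hypothetical sketch: the function name is made up, and record stands in for whatever nested path your mapping actually uses. Like the answer, it searches "*" inside each nested clause.

```python
import json


def search_with_nested(text: str, top_level_fields: list, nested_paths: list) -> dict:
    """Hypothetical helper: run the same simple_query_string on the top-level
    fields and, through a nested query, inside each nested path."""
    common = {
        "query": text,
        "default_operator": "AND",
        "flags": "PREFIX|PHRASE|NOT|AND|OR|FUZZY|WHITESPACE",
    }
    should = [{"simple_query_string": {**common, "fields": top_level_fields}}]
    for path in nested_paths:
        # "*" on the top level does not reach nested documents,
        # so each nested path gets its own nested clause.
        should.append({
            "nested": {
                "path": path,
                "query": {"simple_query_string": {**common, "fields": ["*"]}},
            }
        })
    # At least one of the clauses (top level or any nested path) must match.
    return {"query": {"bool": {"minimum_should_match": 1, "should": should}}}


if __name__ == "__main__":
    body = search_with_nested("Atoms for Peace", ["*", "systemNumber^5", "objectType^2"], ["record"])
    print(json.dumps(body, indent=2))
```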

How to filter the following query in Elasticsearch?

I am using the following search:
{
"_source": [
"title",
"bench",
"court",
"id_"
],
"size": 10,
"query": {
"bool": {
"must": {
"multi_match": {
"query": "murder",
"fields": [
"title",
"content"
]
}
},
"should": {
"multi_match": {
"query": "murder",
"fields": [
"title.standard",
"content.standard"
]
}
}
}
},
"highlight": {
"fields": {
"title": {},
"content": {}
}
}
}
I now want to filter the results by the _id that Elasticsearch assigned during indexing, for example {"_id" : 5903}. I guess I have to use a term query. Only a document whose _id matches should be returned. How can I do that?
To filter your query by document ID (one or many), Elasticsearch has a dedicated ids query. Here are the details: https://www.elastic.co/guide/en/elasticsearch/reference/master/query-dsl-ids-query.html
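For example, the ids query can be placed in the filter clause of the existing bool query, so it restricts the results without affecting scoring. The Python helper below is a hypothetical sketch that adds such a clause to a copy of the search body; _id values are strings, so 5903 is passed on as "5903".

```python
import copy
import json


def filter_by_ids(body: dict, ids: list) -> dict:
    """Hypothetical helper: return a copy of `body` whose bool query
    also requires the document _id to be one of `ids`."""
    new_body = copy.deepcopy(body)  # leave the caller's dict untouched
    bool_query = new_body["query"]["bool"]
    filters = bool_query.get("filter", [])
    if isinstance(filters, dict):  # a single filter clause may be an object
        filters = [filters]
    # ids query in filter context: only the listed documents survive, no scoring
    bool_query["filter"] = filters + [{"ids": {"values": [str(i) for i in ids]}}]
    return new_body


if __name__ == "__main__":
    search = {
        "_source": ["title", "bench", "court", "id_"],
        "size": 10,
        "query": {
            "bool": {
                "must": {"multi_match": {"query": "murder", "fields": ["title", "content"]}},
                "should": {"multi_match": {"query": "murder",
                                           "fields": ["title.standard", "content.standard"]}},
            }
        },
    }
    print(json.dumps(filter_by_ids(search, [5903]), indent=2))
```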

Elasticsearch "AND in query_string" vs. "default_operator AND"

elasticsearch v7.1.1
I don't understand the difference between a query_string containing "AND"
and "default_operator": "AND".
I thought they would yield the same result, but they don't:
HTTP PUT http://localhost:9200/umlautsuche
{
"settings": {
"analysis": {
"char_filter": {
"my_char_filter": {
"type": "mapping",
"mappings": ["ph => f"]
}
},
"filter": {
"my_ngram": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 10
}
},
"analyzer": {
"my_name_analyzer": {
"tokenizer": "standard",
"char_filter": [
"my_char_filter"
],
"filter": [
"lowercase",
"german_normalization"
]
}
}
}
},
"mappings": {
"date_detection": false,
"dynamic_templates": [
{
"string_fields_german": {
"match_mapping_type": "string",
"match": "*",
"mapping": {
"type": "text",
"analyzer": "my_name_analyzer"
}
}
},
{
"dates": {
"match": "lastModified",
"match_pattern": "regex",
"mapping": {
"type": "date",
"ignore_malformed": true
}
}
}
]
}
}
HTTP POST http://localhost:9200/_bulk
{ "index" : { "_index" : "umlautsuche", "_id" : "1" } }
{"vorname": "Stephan-Jörg", "nachname": "Müller", "ort": "Hollabrunn"}
{ "index" : { "_index" : "umlautsuche", "_id" : "2" } }
{"vorname": "Stephan-Joerg", "nachname": "Mueller", "ort": "Hollabrunn"}
{ "index" : { "_index" : "umlautsuche", "_id" : "3" } }
{"vorname": "Stephan-Jörg", "nachname": "Müll", "ort": "Hollabrunn"}
This query returns no results, which I did not expect:
HTTP POST http://localhost:9200/umlautsuche/_search
{
"query": {
"query_string": {
"query": "Stefan Müller Jör*",
"analyze_wildcard": true,
"default_operator": "AND",
"fields": ["vorname", "nachname"]
}
}
}
This query returns the results I expected:
HTTP POST http://localhost:9200/umlautsuche/_search
{
"query": {
"query_string": {
"query": "Stefan AND Müller AND Jör*",
"analyze_wildcard": true,
"default_operator": "AND",
"fields": ["vorname", "nachname"]
}
}
}
How do I configure the query or analyzer so that I don't need these ANDs between my search terms?
What you are facing is an obscure aspect of how query_string applies its boolean operators, and possibly undocumented behavior. Because of this, I believe it is better either to use a bool query with explicit logic or to use copy_to.
Let me explain in a bit more detail what's going on and how you can fix it.
Why doesn't the first query match?
In order to see how the query gets executed, let's set profile: true:
POST /umlautsuche/_search
{
"query": {
"query_string": {
"query": "Stefan Müller Jör*",
"analyze_wildcard": true,
"default_operator": "AND",
"fields": [
"vorname",
"nachname"
]
}
},
"profile": true
}
In the ES response we will see:
"profile": {
"shards": [
{
"id": "[QCANVs5gR0GOiiGCmEwj7w][umlautsuche][0]",
"searches": [
{
"query": [
{
"type": "BooleanQuery",
"description": "+((+nachname:stefan +nachname:muller) | (+vorname:stefan +vorname:muller)) +(nachname:jor* | vorname:jor*)",
"time_in_nanos": 17787641,
"breakdown": {
"set_min_competitive_score_count": 0,
We are interested in this part:
"+((+nachname:stefan +nachname:muller) | (+vorname:stefan +vorname:muller)) +(nachname:jor* | vorname:jor*)"
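Without going into a deep analysis, we can tell that this query wants to find documents whose surname contains both stefan and muller, or whose first name contains both stefan and muller (plus jor* in either field). That is impossible here: among the documents, stefan only ever appears as a first name and muller only as a surname.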
What we actually want to do, I presume, is "find people whose full name is Stefan Müller Jör*". This is not what the query generated by Elasticsearch does.
Why does the second query match?
Let's do the same trick with profile: true. The response contains this:
"profile": {
"shards": [
{
"id": "[QCANVs5gR0GOiiGCmEwj7w][umlautsuche][0]",
"searches": [
{
"query": [
{
"type": "BooleanQuery",
"description": "+(nachname:stefan | vorname:stefan) +(nachname:muller | vorname:muller) +(nachname:jor* | vorname:jor*)",
"time_in_nanos": 17970342,
"breakdown": {
We can see that the query got interpreted like this:
"+(nachname:stefan | vorname:stefan) +(nachname:muller | vorname:muller) +(nachname:jor* | vorname:jor*)"
We can roughly read this as "find people where each of these three terms appears in either the first name or the surname", which is what we expect it to do.
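The difference between the two parsed queries can be checked by hand. The Python sketch below evaluates both profile descriptions against the three indexed documents. The token sets are an assumption about what my_name_analyzer produces (ph mapped to f, lowercasing, german_normalization folding ö/oe to o and ü/ue to u); a trailing * stands for a prefix match.

```python
# Assumed output of my_name_analyzer for the three bulk-indexed documents.
DOCS = {
    "1": {"vorname": {"stefan", "jorg"}, "nachname": {"muller"}},  # Stephan-Jörg Müller
    "2": {"vorname": {"stefan", "jorg"}, "nachname": {"muller"}},  # Stephan-Joerg Mueller
    "3": {"vorname": {"stefan", "jorg"}, "nachname": {"mull"}},    # Stephan-Jörg Müll
}


def has(doc: dict, field: str, term: str) -> bool:
    """True if the term (or prefix, when it ends with '*') is in the field's tokens."""
    tokens = doc[field]
    if term.endswith("*"):
        return any(t.startswith(term[:-1]) for t in tokens)
    return term in tokens


def first_query(doc: dict) -> bool:
    # +((+nachname:stefan +nachname:muller) | (+vorname:stefan +vorname:muller))
    # +(nachname:jor* | vorname:jor*)
    def both_in(field: str) -> bool:
        return has(doc, field, "stefan") and has(doc, field, "muller")

    return (both_in("nachname") or both_in("vorname")) and (
        has(doc, "nachname", "jor*") or has(doc, "vorname", "jor*")
    )


def second_query(doc: dict) -> bool:
    # +(nachname:stefan | vorname:stefan) +(nachname:muller | vorname:muller)
    # +(nachname:jor* | vorname:jor*)
    return all(
        has(doc, "nachname", t) or has(doc, "vorname", t)
        for t in ("stefan", "muller", "jor*")
    )


if __name__ == "__main__":
    print([i for i, d in DOCS.items() if first_query(d)])   # []
    print([i for i, d in DOCS.items() if second_query(d)])  # ['1', '2']
```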
The query_string documentation says that with default_operator: AND, spaces should be interpreted as ANDs:
The default operator used if no explicit operator is specified. For
example, with a default operator of OR, the query capital of Hungary
is translated to capital OR of OR Hungary, and with default operator
of AND, the same query is translated to capital AND of AND Hungary.
The default value is OR.
However, as we have just seen, this does not seem to hold, at least when querying multiple fields.
So what can we do about it?
Use bool with explicit logic
This query seems to work:
POST /umlautsuche/_search
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "Stefan Müller Jör*",
"analyze_wildcard": true,
"fields": [
"vorname"
]
}
},
{
"query_string": {
"query": "Stefan Müller Jör*",
"analyze_wildcard": true,
"fields": [
"nachname"
]
}
}
]
}
}
}
This query is not an exact equivalent; consider it an example. For instance, if we had another record like this, without "Jörg":
{"vorname": "Stephan", "nachname": "Müller", "ort": "Hollabrunn"}
the bool query above would match it despite the missing "Jörg". To overcome this you could write a more complex bool query, but that will not do if you want to avoid parsing the user input.
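One possible shape of that more complex bool query, assuming you are willing to split the input on whitespace yourself, is one required query_string clause per term, each searching both fields. This mirrors the structure of the second query's profile output. The helper below is a hypothetical Python sketch; query_string syntax inside an individual term would still be interpreted.

```python
import json


def per_term_query(text: str, fields: list) -> dict:
    """Hypothetical helper: every whitespace-separated term must match
    in at least one of `fields` (bool must of per-term query_string clauses)."""
    clauses = [
        {"query_string": {"query": term, "analyze_wildcard": True, "fields": fields}}
        for term in text.split()
    ]
    return {"query": {"bool": {"must": clauses}}}


if __name__ == "__main__":
    body = per_term_query("Stefan Müller Jör*", ["vorname", "nachname"])
    print(json.dumps(body, indent=2, ensure_ascii=False))
```

With this shape, a record without Jörg no longer matches, because the Jör* clause is required on its own.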
How can we still use plain, unparsed query string?
Introduce a copy_to field
We can use the copy_to capability. It copies the values of several fields into another field, so that the terms from all of them end up in a single field that can be queried as a whole.
We will have to modify the mapping configuration (unfortunately the existing index will have to be recreated):
"mappings": {
"date_detection": false,
"dynamic_templates": [
{
"name_fields_german": {
"match_mapping_type": "string",
"match": "*name",
"mapping": {
"type": "text",
"analyzer": "my_name_analyzer",
"copy_to": "full_name"
}
}
},
{
"string_fields_german": {
"match_mapping_type": "string",
"match": "*",
"mapping": {
"type": "text",
"analyzer": "my_name_analyzer"
}
}
},
{
"dates": {
"match": "lastModified",
"match_pattern": "regex",
"mapping": {
"type": "date",
"ignore_malformed": true
}
}
}
]
}
Then we can populate the index in exactly the same manner as we did before.
Now we can query the new field full_name with the following query:
POST /umlautsuche/_search
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "Stefan Müller Jör*",
"analyze_wildcard": true,
"default_operator": "AND",
"fields": [
"full_name"
]
}
}
]
}
}
}
This query returns the same 2 documents as the second query. So in this case default_operator: AND behaves as we would expect, requiring all tokens from the query to match.
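Conceptually, copy_to makes full_name hold the tokens of both vorname and nachname, so the AND over the query terms is checked against one combined token set. The Python sketch below models that. The token sets are an assumption about what my_name_analyzer produces, and record "4" is hypothetical, added to show that a record without Jörg is excluded.

```python
# Assumed analyzer output per field; document "4" is a hypothetical extra record.
DOCS = {
    "1": {"vorname": {"stefan", "jorg"}, "nachname": {"muller"}},
    "2": {"vorname": {"stefan", "jorg"}, "nachname": {"muller"}},
    "3": {"vorname": {"stefan", "jorg"}, "nachname": {"mull"}},
    "4": {"vorname": {"stefan"}, "nachname": {"muller"}},  # no Jörg
}


def full_name(doc: dict) -> set:
    # copy_to: the target field ends up with the tokens of every source field
    return doc["vorname"] | doc["nachname"]


def matches(tokens: set, term: str) -> bool:
    if term.endswith("*"):  # prefix query such as jor*
        return any(t.startswith(term[:-1]) for t in tokens)
    return term in tokens


def and_query_on_full_name(doc: dict, terms: tuple) -> bool:
    # default_operator AND on a single field: every term must match full_name
    return all(matches(full_name(doc), t) for t in terms)


if __name__ == "__main__":
    terms = ("stefan", "muller", "jor*")
    print([i for i, d in DOCS.items() if and_query_on_full_name(d, terms)])  # ['1', '2']
```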
Hope that helps!

Searching in specific fields of types

Consider the following query:
{
"query" : {
"match_phrase" : {
"_all" : "Smith"
}
}
}
How would I specify in which fields of which types it may search, instead of searching in everything? (field names may be non-unique across types)
I've tried the query below, but it didn't work: it returns no results (it does return results when I remove the person. prefix from all fields):
{
"query": {
"multi_match": {
"query": "Smith",
"fields": [
"person.first_name",
"person.last_name",
"person.age"
],
"lenient": true
}
}
}
I'm sending these queries to http://localhost:9200/tsf-model/_search.
If you can build your query dynamically, I think you can combine your multi_match query with a type query for each type to achieve what you want:
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"bool": {
"filter": [
{
"type": {
"value": "type1"
}
},
{
"multi_match": {
"query": "Smith",
"fields": [
"field1",
"field3",
"field5"
]
}
}
]
}
},
{
"bool": {
"filter": [
{
"type": {
"value": "type2"
}
},
{
"multi_match": {
"query": "Smith",
"fields": [
"field2",
"field4",
"field6"
]
}
}
]
}
}
]
}
}
}
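If the field lists per type live in your application code, the body above can be generated. A hypothetical Python sketch follows; like the answer, it places the multi_match in filter context, so it restricts the results without contributing to the score.

```python
import json


def per_type_search(text: str, fields_by_type: dict) -> dict:
    """Hypothetical helper: one bool clause per mapping type, each requiring
    that type plus a multi_match over that type's own fields."""
    should = [
        {
            "bool": {
                "filter": [
                    {"type": {"value": doc_type}},
                    {"multi_match": {"query": text, "fields": fields}},
                ]
            }
        }
        for doc_type, fields in fields_by_type.items()
    ]
    # A document qualifies if it satisfies at least one per-type clause.
    return {"query": {"bool": {"minimum_should_match": 1, "should": should}}}


if __name__ == "__main__":
    body = per_type_search("Smith", {
        "type1": ["field1", "field3", "field5"],
        "type2": ["field2", "field4", "field6"],
    })
    print(json.dumps(body, indent=2))
```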

Multi_match and match queries together

I have the following queries in Elasticsearch:
{
"query": {
"multi_match": {
"query": "bluefin bat",
"type": "phrase",
"fields": [
"title^5",
"body.value"
]
}
},
"highlight": {
"fields": {
"body.value": {
"number_of_fragments": 3
}
}
},
"fields": [
"title",
"id"
]
}
I have tried using "dis_max", but then two of my fields have to be searched with the same query,
whereas the remaining match query has a different query text.
That match query looks like this:
{
"query": {
"match": {
"ingredients": {
"query": "key1, key2",
"analyzer": "keyword_analyzer"
}
}
}
}
How can I combine these two queries without using dis_max to join them?
I figured out the answer. multi_match internally applies "dis_max",
hence you cannot apply dis_max with multi_match.
What I could do instead is use a bool query to solve this type of problem:
should roughly translates to a boolean OR, while must is equivalent to AND.
So this is how I modified my query:
{
"query": {
"bool":{
"should": [
{"multi_match":
{"query": "SOME_QUERY",
"type": "phrase",
"fields": ["title^5","body"]
}
},
{
"match":{
"labels" :{
"query": "SOME_QUERY",
"analyzer": "keyword_analyzer"
}
}
},
{
"match":{
"displayName" :{
"query": "SOME_QUERY",
"fuzziness": "AUTO"
}
}
}
],
"minimum_should_match": "50%"
}
},
"fields": ["title","id","labels","displayName","username"],
"highlight": {
"fields": {
"body.storage.value": {
"number_of_fragments": 3}
}
}
}
I hope this helps someone in future.
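Applied to the original question, the same idea means putting the phrase multi_match and the match on ingredients, each with its own query text, side by side in a bool should. The Python sketch below builds that body; the helper name is hypothetical, and keyword_analyzer is assumed to be defined in the index settings, as in the question.

```python
import json


def phrase_or_ingredients(phrase: str, ingredients: str) -> dict:
    """Hypothetical helper: match documents with the phrase in title/body
    OR the given ingredients, each clause using its own query text."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"multi_match": {"query": phrase, "type": "phrase",
                                     "fields": ["title^5", "body.value"]}},
                    {"match": {"ingredients": {"query": ingredients,
                                               "analyzer": "keyword_analyzer"}}},
                ],
                # With only should clauses at least one must match anyway;
                # stating it keeps the intent explicit.
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(phrase_or_ingredients("bluefin bat", "key1, key2"), indent=2))
```

Use must instead of should if both clauses have to match.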
