Elasticsearch - Multi-Word Exact Match

I want to adjust the following query so that it matches a multi-word string exactly. Whenever I try this, Elasticsearch seems to tokenize the strings before searching. How can I specify that a particular substring must be matched exactly?
{
"query": {
"query_string": {
"query": "string OR string2 OR this is my multi word string",
"fields": ["title","description"]
}
}
}
My mapping is as follows:
{
"indexname": {
"properties": {
"title": {
"type": "multi_field",
"fields": {
"title": {"type": "string"},
"original": {"type" : "string", "index" : "not_analyzed"}
}
},
"location": {
"type": "geo_point"
}
}
}
}

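By default, query_string and match queries are analyzed, so use a term query, which is not analyzed. A term query matches one exact value in one field: it does not understand operators such as OR, and it cannot target multiple fields. It also has to target the not_analyzed title.original sub-field, because the analyzed title field stores individual words rather than the whole string. So combine one term clause per value in a bool query (the description field would need its own not_analyzed sub-field to be searched the same way). Please try the query below:
{
"query": {
"bool": {
"should": [
{
"term": {
"title.original": {
"value": "string"
}
}
},
{
"term": {
"title.original": {
"value": "string2"
}
}
},
{
"term": {
"title.original": {
"value": "this is my multi word string"
}
}
}
]
}
}
}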
Hope it helps!
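To see why a term query needs the not_analyzed sub-field, here is a minimal Python sketch of what ends up in the index. It is an approximation, not Elasticsearch's analysis code: the analyzed title field is assumed to behave roughly like the standard analyzer (lowercase, split on non-word characters), while title.original stores the whole string as a single term. A term query skips analysis, so it only matches when its value equals one stored term exactly.

```python
import re


def analyzed_terms(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def not_analyzed_terms(text):
    """A not_analyzed field indexes the whole value as one term."""
    return [text]


def term_query_matches(indexed_terms, value):
    """A term query is not analyzed: the value must equal one indexed term exactly."""
    return value in indexed_terms


title = "this is my multi word string"

# The analyzed field only holds single words, so the full phrase never matches.
print(term_query_matches(analyzed_terms(title), title))              # False
# The not_analyzed sub-field holds the phrase verbatim.
print(term_query_matches(not_analyzed_terms(title), title))          # True
# Exact matching on the not_analyzed field is also case-sensitive.
print(term_query_matches(not_analyzed_terms(title), title.upper()))  # False
```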

Related

Proximity-Relevance in elasticsearch

I have a JSON record in Elasticsearch with the following fields:
"streetName": "5 Street",
"name": ["Shivam Apartments"]
I tried the query below, but it returns nothing once I add the streetName bool clause:
{
"query": {
"bool": {
"must": [
{
"bool": {
"must": {
"match": {
"name": {
"query": "shivam apartments",
"minimum_should_match": "80%"
}
}
}
}
},
{
"bool": {
"must": {
"match": {
"streetName": {
"query": "5 street",
"minimum_should_match": "80%"
}
}
}
}
}
]
}
}
}
Document Mapping
{
"rabc_documents": {
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "autocomplete_analyzer",
"position_increment_gap": 0
},
"streetName": {
"type": "keyword"
}
}
}
}
}
According to the Elasticsearch documentation on keyword fields:
"Keyword fields are only searchable by their exact value".
Keyword values are also case-sensitive.
Taking the above into account:
Searching for "5 street" will not match "5 Street" ('s' vs 'S') on a keyword field.
minimum_should_match has no effect on a keyword field, because the whole value is a single term.
Suggestion: for partial matches, use a "text" mapping instead of "keyword". Keyword fields are meant for filtering, term-based aggregations, etc.
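A minimal Python sketch of why the streetName clause fails while a text field would work. It only approximates Elasticsearch: the text analyzer is modeled as lowercase plus a split on non-word characters, and minimum_should_match percentages are rounded down with a floor of one term. Treat it as an illustration, not the real matching logic.

```python
import re


def keyword_terms(value):
    """keyword field without a normalizer: the whole value is one case-sensitive term."""
    return {value}


def text_terms(value):
    """Rough stand-in for the standard analyzer of a text field."""
    return set(re.findall(r"\w+", value.lower()))


def match_query(indexed_terms, query, field_analyzer, minimum_should_match=1.0):
    """Simplified match query: the query is analyzed like the field, and at least
    minimum_should_match (a fraction, rounded down, minimum 1) of its terms must hit."""
    query_terms = field_analyzer(query)
    required = max(1, int(len(query_terms) * minimum_should_match))
    return len(query_terms & indexed_terms) >= required


stored = "5 Street"
print(match_query(keyword_terms(stored), "5 street", keyword_terms, 0.8))  # False: 's' vs 'S'
print(match_query(text_terms(stored), "5 street", text_terms, 0.8))        # True
print(match_query(text_terms(stored), "5 avenue", text_terms, 0.8))        # True: 1 of 2 terms is enough
```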

Not able to search in a compound query using an analyzer

I have a problem index with multiple fields, e.g. tags (a comma-separated string of tags), author and tester. I am building a global search where problems can be searched by all these fields at once.
I am using a bool query, e.g.:
{
"query": {
"bool": {
"must": [{
"match": {
"author": "author_username"
}
},
{
"match": {
"tester": "tester_username"
}
},
{
"match": {
"tags": "<tag1,tag2>"
}
}
]
}
}
}
Without the analyzer I get results, but the tags are split on whitespace, e.g. python 3 is searched as python or 3.
But I want to search for Python 3 as a single term. So I created an analyzer for tags that treats every comma-separated tag as one token instead of splitting on whitespace.
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "pattern",
"pattern": ","
}
}
}
},
"mappings": {
"properties": {
"tags": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "standard"
}
}
}
}
But now I am not getting any results. What am I missing? I could not find how analyzers are used in compound queries in the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/compound-queries.html
Adding an example:
{
"query": {
"bool": {
"must": [{
"match": {
"author": "test1"
}
},
{
"match": {
"tester": "test2"
}
},
{
"match": {
"tags": "test3, abc 4"
}
}
]
}
}
}
The results should match all the fields, but for the tags field there should be a union of tags, and the query should be split on commas, not on spaces. That is, the query should match test3 and abc 4, but the query above searches for test3, abc and 4.
You need to either remove search_analyzer from your mapping or pass my_analyzer in the match query:
GET tags/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"tags": {
"query": "python 3",
"analyzer": "my_analyzer"
}
}
}
]
}
}
}
By default, queries use the analyzer defined in the field mapping, but this can be overridden with the search_analyzer setting, and the analyzer parameter of a match query overrides both.
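A minimal Python sketch of why the original mapping returned nothing: tags are indexed with the comma-based pattern tokenizer but searched with the standard analyzer, so the two sides never share a token. Both analyzers are approximations (the standard analyzer is modeled as lowercase plus a split on non-word characters), and the indexed value "python 3,java" is hypothetical. The sketch also assumes the pattern tokenizer keeps surrounding whitespace, so "test3, abc 4" yields " abc 4" with a leading space; a pattern such as ",\s*" (written ",\\s*" inside JSON) would likely avoid that. Note that my_analyzer has no lowercase filter, so matching on it is case-sensitive.

```python
import re


def my_analyzer(text):
    """Approximation of the question's pattern tokenizer ("pattern": ","):
    split on commas and drop empty tokens. No lowercase or trim filter is applied."""
    return [tok for tok in re.split(",", text) if tok]


def standard_analyzer(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def match(indexed_tokens, query, search_analyzer):
    """Simplified match query with the default OR operator: any shared token is a hit."""
    return bool(set(indexed_tokens) & set(search_analyzer(query)))


indexed = my_analyzer("python 3,java")  # hypothetical tags value -> ['python 3', 'java']

print(match(indexed, "python 3", standard_analyzer))  # False: query becomes ['python', '3']
print(match(indexed, "python 3", my_analyzer))        # True: query stays ['python 3']
print(my_analyzer("test3, abc 4"))                    # ['test3', ' abc 4']
```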

Elasticsearch "AND in query_string" vs. "default_operator AND"

elasticsearch v7.1.1
I don't understand the difference between a query_string containing "AND"
and "default_operator": "AND".
I thought they would yield the same result, but they don't:
HTTP PUT http://localhost:9200/umlautsuche
{
"settings": {
"analysis": {
"char_filter": {
"my_char_filter": {
"type": "mapping",
"mappings": ["ph => f"]
}
},
"filter": {
"my_ngram": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 10
}
},
"analyzer": {
"my_name_analyzer": {
"tokenizer": "standard",
"char_filter": [
"my_char_filter"
],
"filter": [
"lowercase",
"german_normalization"
]
}
}
}
},
"mappings": {
"date_detection": false,
"dynamic_templates": [
{
"string_fields_german": {
"match_mapping_type": "string",
"match": "*",
"mapping": {
"type": "text",
"analyzer": "my_name_analyzer"
}
}
},
{
"dates": {
"match": "lastModified",
"match_pattern": "regex",
"mapping": {
"type": "date",
"ignore_malformed": true
}
}
}
]
}
}
HTTP POST http://localhost:9200/_bulk
{ "index" : { "_index" : "umlautsuche", "_id" : "1" } }
{"vorname": "Stephan-Jörg", "nachname": "Müller", "ort": "Hollabrunn"}
{ "index" : { "_index" : "umlautsuche", "_id" : "2" } }
{"vorname": "Stephan-Joerg", "nachname": "Mueller", "ort": "Hollabrunn"}
{ "index" : { "_index" : "umlautsuche", "_id" : "3" } }
{"vorname": "Stephan-Jörg", "nachname": "Müll", "ort": "Hollabrunn"}
This query returns no results, which I did not expect:
HTTP POST http://localhost:9200/umlautsuche/_search
{
"query": {
"query_string": {
"query": "Stefan Müller Jör*",
"analyze_wildcard": true,
"default_operator": "AND",
"fields": ["vorname", "nachname"]
}
}
}
This query returns the results I expected:
HTTP POST http://localhost:9200/umlautsuche/_search
{
"query": {
"query_string": {
"query": "Stefan AND Müller AND Jör*",
"analyze_wildcard": true,
"default_operator": "AND",
"fields": ["vorname", "nachname"]
}
}
}
How do I configure the query or analyzer so that I don't need these "AND"s between my search terms?
What you are facing is the obscure boolean logic of query_string operators, and possibly undocumented behavior. Because of this, I believe it is better either to use a bool query with explicit logic or to use copy_to.
Let me explain in a bit more detail what's going on and how you can fix it.
Why doesn't the first query match?
In order to see how the query gets executed, let's set profile: true:
POST /umlautsuche/_search
{
"query": {
"query_string": {
"query": "Stefan Müller Jör*",
"analyze_wildcard": true,
"default_operator": "AND",
"fields": [
"vorname",
"nachname"
]
}
},
"profile": true
}
In the ES response we will see:
"profile": {
"shards": [
{
"id": "[QCANVs5gR0GOiiGCmEwj7w][umlautsuche][0]",
"searches": [
{
"query": [
{
"type": "BooleanQuery",
"description": "+((+nachname:stefan +nachname:muller) | (+vorname:stefan +vorname:muller)) +(nachname:jor* | vorname:jor*)",
"time_in_nanos": 17787641,
"breakdown": {
"set_min_competitive_score_count": 0,
We are interested in this part:
"+((+nachname:stefan +nachname:muller) | (+vorname:stefan +vorname:muller)) +(nachname:jor* | vorname:jor*)"
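Without going into deep analysis, we can tell that this query wants to find documents that contain both stefan and muller in the surname field, or both in the first-name field, plus a jor* term in either field. That is impossible here: no document has both names in the same field (for example, stefan is never a surname among the documents).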
What we actually want to do, I presume, is "find people whose full name is Stefan Müller Jör*". This is not what the query generated by Elasticsearch does.
Why does the second query match?
Let's do the same trick with profile: true. The response contains this:
"profile": {
"shards": [
{
"id": "[QCANVs5gR0GOiiGCmEwj7w][umlautsuche][0]",
"searches": [
{
"query": [
{
"type": "BooleanQuery",
"description": "+(nachname:stefan | vorname:stefan) +(nachname:muller | vorname:muller) +(nachname:jor* | vorname:jor*)",
"time_in_nanos": 17970342,
"breakdown": {
We can see that the query got interpreted like this:
"+(nachname:stefan | vorname:stefan) +(nachname:muller | vorname:muller) +(nachname:jor* | vorname:jor*)"
Which we can roughly read as "find people who have each of these three names as either a first name or a surname", which is what we expect it to do.
The query_string documentation says that with default_operator: AND, spaces should be interpreted as ANDs:
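A small Python sketch can make the difference between the two profiled queries concrete. It hard-codes the tokens that my_name_analyzer is assumed to produce for the three bulk-indexed documents ("ph" becomes "f", the hyphen splits the first name, german_normalization folds ü/ö and ue/oe). These token lists were worked out by hand, not captured from Elasticsearch. Both functions then evaluate the boolean structures exactly as printed in the profile output.

```python
# Tokens assumed to be produced by my_name_analyzer (worked out by hand).
DOCS = {
    "1": {"vorname": ["stefan", "jorg"], "nachname": ["muller"]},  # Stephan-Jörg Müller
    "2": {"vorname": ["stefan", "jorg"], "nachname": ["muller"]},  # Stephan-Joerg Mueller
    "3": {"vorname": ["stefan", "jorg"], "nachname": ["mull"]},    # Stephan-Jörg Müll
}
FIELDS = ("nachname", "vorname")


def term(doc, field, value):
    return value in doc[field]


def prefix(doc, field, value):
    return any(tok.startswith(value) for tok in doc[field])


def first_query(doc):
    # +((+nachname:stefan +nachname:muller) | (+vorname:stefan +vorname:muller))
    # +(nachname:jor* | vorname:jor*)
    same_field = any(term(doc, f, "stefan") and term(doc, f, "muller") for f in FIELDS)
    return same_field and any(prefix(doc, f, "jor") for f in FIELDS)


def second_query(doc):
    # +(nachname:stefan | vorname:stefan) +(nachname:muller | vorname:muller)
    # +(nachname:jor* | vorname:jor*)
    every_term = all(any(term(doc, f, t) for f in FIELDS) for t in ("stefan", "muller"))
    return every_term and any(prefix(doc, f, "jor") for f in FIELDS)


print([i for i, d in DOCS.items() if first_query(d)])   # []
print([i for i, d in DOCS.items() if second_query(d)])  # ['1', '2']
```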
The default operator used if no explicit operator is specified. For
example, with a default operator of OR, the query capital of Hungary
is translated to capital OR of OR Hungary, and with default operator
of AND, the same query is translated to capital AND of AND Hungary.
The default value is OR.
However, from what we have just seen, this does not seem to be correct, at least when querying multiple fields.
So what can we do about it?
Use bool with explicit logic
This query seems to work:
POST /umlautsuche/_search
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "Stefan Müller Jör*",
"analyze_wildcard": true,
"fields": [
"vorname"
]
}
},
{
"query_string": {
"query": "Stefan Müller Jör*",
"analyze_wildcard": true,
"fields": [
"nachname"
]
}
}
]
}
}
}
This query is not an exact equivalent; consider it an example. For instance, if we had another record like this one, without "Jörg":
{"vorname": "Stephan", "nachname": "Müller", "ort": "Hollabrunn"}
the bool query above would match it despite the missing "Jörg". To overcome this, you can write a more complex bool query, but that will not work if you want to avoid parsing user input.
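The loophole of the per-field bool query can be sketched in Python. Each query_string clause in it runs on a single field with the default OR operator, so a clause is satisfied as soon as any query token hits that field. The token lists are hand-written assumptions about my_name_analyzer output, and the record with first name Stephan and surname Müller but no Jörg is hypothetical.

```python
# Query tokens assumed after analysis: "Stefan Müller" -> stefan, muller; "Jör*" -> jor*
QUERY_TERMS = ["stefan", "muller"]
QUERY_PREFIX = "jor"


def field_clause(tokens):
    """query_string on a single field with the default OR operator:
    the clause matches if any query term or the prefix hits."""
    return (any(t in tokens for t in QUERY_TERMS)
            or any(tok.startswith(QUERY_PREFIX) for tok in tokens))


def bool_per_field(doc):
    """bool/must with one query_string clause per field."""
    return field_clause(doc["vorname"]) and field_clause(doc["nachname"])


print(bool_per_field({"vorname": ["stefan", "jorg"], "nachname": ["muller"]}))  # True
print(bool_per_field({"vorname": ["stefan"], "nachname": ["muller"]}))          # True, despite no "jorg"
print(bool_per_field({"vorname": ["stefan", "jorg"], "nachname": ["mull"]}))    # False
```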
How can we still use a plain, unparsed query string?
Introduce a copy_to field
We can use the copy_to feature. It copies the content of several fields into another field, which is then analyzed as a whole.
We will have to modify the mapping configuration (unfortunately the existing index will have to be recreated):
"mappings": {
"date_detection": false,
"dynamic_templates": [
{
"name_fields_german": {
"match_mapping_type": "string",
"match": "*name",
"mapping": {
"type": "text",
"analyzer": "my_name_analyzer",
"copy_to": "full_name"
}
}
},
{
"string_fields_german": {
"match_mapping_type": "string",
"match": "*",
"mapping": {
"type": "text",
"analyzer": "my_name_analyzer"
}
}
},
{
"dates": {
"match": "lastModified",
"match_pattern": "regex",
"mapping": {
"type": "date",
"ignore_malformed": true
}
}
}
]
}
Then we can populate the index in exactly the same manner as we did before.
Now we can query the new field full_name with the following query:
POST /umlautsuche/_search
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "Stefan Müller Jör*",
"analyze_wildcard": true,
"default_operator": "AND",
"fields": [
"full_name"
]
}
}
]
}
}
}
This query returns the same 2 documents as the second query. Thus, in this case default_operator: AND behaves as we would expect, requiring all tokens from the query to match.
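A Python sketch of why copy_to makes default_operator: AND behave: once both name fields land in one full_name field, "every token must match" is evaluated against a single token list. It assumes full_name is analyzed with the same my_name_analyzer (so concatenating the per-field tokens is equivalent), and the token lists, including the "hypothetical" record without Jörg, are hand-written rather than real Elasticsearch output.

```python
# Tokens assumed to be produced by my_name_analyzer (worked out by hand).
DOCS = {
    "1": {"vorname": ["stefan", "jorg"], "nachname": ["muller"]},
    "2": {"vorname": ["stefan", "jorg"], "nachname": ["muller"]},
    "3": {"vorname": ["stefan", "jorg"], "nachname": ["mull"]},
    "hypothetical": {"vorname": ["stefan"], "nachname": ["muller"]},
}


def full_name(doc):
    """copy_to: the values of both *name fields end up in one full_name field."""
    return doc["vorname"] + doc["nachname"]


def and_query(tokens, terms=("stefan", "muller"), prefixes=("jor",)):
    """default_operator AND on a single field: every term and prefix must match."""
    return (all(t in tokens for t in terms)
            and all(any(tok.startswith(p) for tok in tokens) for p in prefixes))


print([i for i, d in DOCS.items() if and_query(full_name(d))])  # ['1', '2']
```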
Hope that helps!

Search-as-you-type inside arrays

I am trying to implement a search-as-you-type query inside an array.
This is the structure of the documents:
{
"guid": "6f954d53-df57-47e3-ae9e-cb445bd566d3",
"labels":
[
{
"name": "London",
"lang": "en"
},
{
"name": "Llundain",
"lang": "cy"
},
{
"name": "Lunnainn",
"lang": "gd"
}
]
}
So far, this is what I came up with:
{
"query": {
"multi_match": {
"fields": ["labels.name"],
"query": "london",
"type": "phrase_prefix"
}
}
}
This works exactly as intended.
The problem is that I would also like to search by language.
What I tried is:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": ["labels.name"],
"query": "london",
"type": "phrase_prefix"
}
},
{
"term": {
"labels.lang": "gd"
}
}
]
}
}
}
but the two clauses can be satisfied by different elements of the array.
For example, I would like to search only the Welsh (cy) labels. That means the clause containing the city name should match only labels whose "lang" is "cy".
How do I write this kind of query?
Internally, Elasticsearch flattens nested JSON objects, so it can't correlate the lang and name of a specific element in the labels array. If you want this kind of correlation, you'll need to index your documents differently.
The usual way to do this is to use the nested data type with a matching nested query.
The query would end up looking something like this:
{
"query": {
"nested": {
"path": "labels",
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": ["labels.name"],
"query": "london",
"type": "phrase_prefix"
}
},
{
"term": {
"labels.lang": "gd"
}
}
]
}
}
}
}
}
But note that you'll also need to specify a nested mapping for your labels, e.g.:
"properties": {
"labels": {
"type": "nested",
"properties": {
"name": {
"type": "text"
/* you might want to add other mapping-related configuration here */
},
"lang": {
"type": "keyword"
}
}
}
}
Other ways to do this include:
Indexing each label as a separate document, repeating the guid field
Using parent/child documents
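A Python sketch of the difference between object (flattened) and nested matching, using the document from the question. The phrase_prefix match is crudely modeled as a case-insensitive startswith on the whole name, which is only adequate for these single-word names; everything else follows the flattening behavior described above.

```python
DOC = {
    "guid": "6f954d53-df57-47e3-ae9e-cb445bd566d3",
    "labels": [
        {"name": "London", "lang": "en"},
        {"name": "Llundain", "lang": "cy"},
        {"name": "Lunnainn", "lang": "gd"},
    ],
}


def name_matches(name, prefix):
    """Very rough stand-in for phrase_prefix on a single-word name."""
    return name.lower().startswith(prefix.lower())


def object_query(doc, prefix, lang):
    """object mapping: labels.name and labels.lang are flattened into two
    independent value lists, so the conditions may hit different elements."""
    names = [label["name"] for label in doc["labels"]]
    langs = [label["lang"] for label in doc["labels"]]
    return any(name_matches(n, prefix) for n in names) and lang in langs


def nested_query(doc, prefix, lang):
    """nested mapping: both conditions must hold for the same labels element."""
    return any(name_matches(label["name"], prefix) and label["lang"] == lang
               for label in doc["labels"])


print(object_query(DOC, "london", "gd"))  # True: "London" and "gd" come from different labels
print(nested_query(DOC, "london", "gd"))  # False
print(nested_query(DOC, "llun", "cy"))    # True
```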
You should use the nested data type in your mapping instead of the object data type. For a detailed explanation, see:
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
So you should define the mapping of your field something like this:
{
"properties": {
"labels": {
"type": "nested",
"properties": {
"name": {
"type": "text"
},
"lang": {
"type": "keyword"
}
}
}
}
}
After this, you can query it with a nested query:
{
"query": {
"nested": {
"path": "labels",
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": ["labels.name"],
"query": "london",
"type": "phrase_prefix"
}
},
{
"term": {
"labels.lang": "gd"
}
}
]
}
}
}
}
}

How to get Elasticsearch to return exact matches first and then other matches in the result

I need help with Elasticsearch. I am trying to get the exact-match results first and then the documents that match only one field, using the following query, but with no luck. Basically, I want the top-scoring hits first, followed by the less accurate hits that match only one field, all in one search result.
The mapping is as follows:
{
"palsx1493": {
"mappings": {
"pals": {
"properties": {
"aboutme": {
"type": "string"
},
"dob": {
"type": "date",
"format": "date"
},
"fccode": {
"type": "string"
},
"fcname": {
"type": "string"
},
"learning": {
"type": "nested",
"properties": {
"skillslevel": {
"type": "string"
},
"skillsname": {
"type": "string"
}
}
},
"name": {
"type": "string"
},
"rating": {
"type": "string"
},
"teaching": {
"type": "nested",
"properties": {
"skillslevel": {
"type": "string"
},
"skillsname": {
"type": "string"
}
}
},
"trate": {
"type": "string"
},
"treg": {
"type": "string"
}
}
}
}
}
}
When searching, I need the results to return the exactly matched documents first, followed by lower-scoring documents matched on the teaching skillname, in that order of priority. What happens now is that I correctly get the exact matches first, but then the learning.skillname matches, and only then the teaching.skillname matches. I want the last two swapped, so that teaching.skillname matches come right after the exact matches.
Exact match:
1. fcname (the "from country" name; either a specific country name or just "Any Country")
2. dob: date of birth, matched against a date range given as input
3. teaching: skillname
4. learning: skillname
This is what I have tried with no luck:
{
"query": {
"bool": {
"should": [
{ "match": { "fcname": "spain"}},
{ "range": {
"dob": {
"from": "1950-10-10",
"to": "1967-12-12"
}
}
},
{
"nested": {
"path": "learning",
"score_mode": "max",
"query": {
"bool": {
"must": [
{ "match": { "learning.skillname": learningSkillName}}
]
}
}
}
},
{
"nested": {
"path": "teaching",
"query": {
"bool": {
"must": [
{ "match": { "teaching.skillname": teachingSkillName}}
]
}
}
}
}
]
}
}
}
Look at how your fields are indexed. By default, string fields are analyzed for full-text search: the value is run through the analyzer and the resulting terms are stored in the inverted index.
For exact string matching, set "index": "not_analyzed",
e.g.:
"nick": {
"type": "string",
"index": "not_analyzed"
},
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-core-types.html
I figured it out. The solution was to use the function_score query to add to the score of documents in which a certain field matched. Replacing the teaching nested clause in the query above with the following gave me the correct result:
"nested": {
"path": "teaching",
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{ "match": { "teaching.skillname": "xxx"}}
]
}
},
"functions": [
{
"script_score": {
"script": "_score + 2"
}
}],
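A rough Python sketch of why adding to the teaching clause reorders the results. All per-clause scores are invented, bool should scoring is modeled as a plain sum (ignoring the coord factor and query normalization that older versions apply), and the "_score + 2" result is assumed to replace the nested clause score, which corresponds to boost_mode "replace". The snippet above is cut off before any boost_mode setting, and the default boost_mode is "multiply".

```python
# Hypothetical per-clause scores for three documents (invented for illustration).
DOCS = {
    "exact": {"fcname": 1.5, "dob": 1.0, "learning": 1.2, "teaching": 0.9},
    "learning_only": {"learning": 1.2},
    "teaching_only": {"teaching": 0.9},
}


def bool_should_score(clauses, teaching_bonus=0.0):
    """Sum of matching should clauses. teaching_bonus models script_score
    "_score + 2" replacing the teaching clause score (assumed boost_mode "replace")."""
    total = 0.0
    for name, score in clauses.items():
        if name == "teaching":
            score += teaching_bonus
        total += score
    return total


def ranking(bonus):
    return sorted(DOCS, key=lambda d: bool_should_score(DOCS[d], bonus), reverse=True)


print(ranking(0.0))  # ['exact', 'learning_only', 'teaching_only']
print(ranking(2.0))  # ['exact', 'teaching_only', 'learning_only']
```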
