Fuzzy query in elastic search in not working, even with the exact value the results are empty.
ES Version: 7.6.2
Index Mapping: Below are the mapping details
{
"movies" : {
"mappings" : {
"properties" : {
"genre" : {
"type" : "text",
"fields" : {
"field" : {
"type" : "keyword"
}
}
},
"id" : {
"type" : "long"
},
"rating" : {
"type" : "double"
},
"title" : {
"type" : "text"
}
}
}
}
}
Documents: Below documents are present in the index
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "movies",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"id" : 2,
"title" : "Raju Ban gaya gentleman",
"rating" : 2,
"genre" : [
"Drama"
]
}
},
{
"_index" : "movies",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"id" : 2,
"title" : "Baat ban jaegi gentleman",
"rating" : 4,
"genre" : [
"Drama"
]
}
}
]
}
}
Query: Below is the query which i am using for searching the document
GET movies/_search
{
"query": {
"fuzzy": {
"title": {"value": "Bat ban jaegi gentleman", "fuzziness": 1}
}
}
}
I haven't used fuzzy queries before and per my understanding it should work just fine.
Fuzzy queries are not analyzed but the field is so your search for
Bat ban jaegi gentleman will be divided into different terms and Bat will be analyzed and that term will be further used to filter down the result.
You can refer to this answer as well ElasticSearch's Fuzzy Query as to why fuzzy query analyze on field.
But since you want to analyze complete title, you can change your mapping of title to have keyword field as well.
You can see how exactly your string will be tokenized by analyze API:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html
Below is mapping for the same:
"mappings": {
"properties": {
"genre": {
"type": "text",
"fields": {
"field": {
"type": "keyword"
}
}
},
"id": {
"type": "long"
},
"rating": {
"type": "double"
},
"title": {
"type": "text",
"fields": {
"field": {
"type": "keyword"
}
}
}
}
}
Now if you search on title.field you will get desired result. Search query is :
{
"query": {
"fuzzy": {
"title.field": {"value": "Bat ban jaegi gentleman", "fuzziness": 1}
}
}
}
Result obtained in this case is :
"hits": [
{
"_index": "ftestmovies",
"_type": "_doc",
"_id": "2",
"_score": 0.9381845,
"_source": {
"title": "Baat ban jaegi gentleman",
"rating": 4,
"genre": [
"Drama"
]
}
}
]
Related
Getting incorrect inner hits from parent child relationship when combined with boolean query
Hi Everyone
I am getting incorrect inner hits results when combining parent-child query with boolean query. To reproduce the issue, I create this Index
PUT /my-index-000001
{
"mappings": {
"_routing": {
"required": true
},
"properties": {
"parentProperty": {
"type": "text"
},
"childProperty": {
"type": "text"
},
"id": {
"type": "integer"
},
"myJoinField": {
"type": "join",
"relations": {
"parent": "mychild"
}
}
}
}
}
then I add these three documents (document with Id equals "1" is the parent of the other two documents)
POST /my-index-000001/_doc/1?routing=1
{
"id": 1,
"parentProperty": "a parent document",
"myJoinField": "parent"
}
POST /my-index-000001/_doc/2?routing=1
{
"id": 2,
"childProperty": "queensland civil administration",
"myJoinField": {
"name":"mychild",
"parent":"1"
}
}
POST /my-index-000001/_doc/3?routing=1
{
"id": 3,
"childProperty": "beautiful weather",
"myJoinField": {
"name":"mychild",
"parent":"1"
}
}
now we set up our index with 3 documents. I am looking for all child documents that meet this boolean query: [childProperty contains either "queensland civil" or both "beautiful" and "nothing"].
I expect that elastic returns only the child document with Id "2" since the child document with Id "3" does not have the term "nothing" in it.
The translated version of this query is as follows:
GET /my-index-000001/_search
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"has_child": {
"inner_hits": {
"name": "opr1"
},
"query": {
"query_string": {
"analyzer": "stop",
"query": "childProperty:(\"queensland civil\")"
}
},
"type": "mychild"
}
},
{
"bool": {
"must": [
{
"has_child": {
"inner_hits": {
"name": "opr2"
},
"query": {
"query_string": {
"query": "childProperty:(beautiful)"
}
},
"type": "mychild"
}
},
{
"has_child": {
"inner_hits": {
"name": "opr3"
},
"query": {
"query_string": {
"query": "childProperty:(nothing)"
}
},
"type": "mychild"
}
}
]
}
}
]
}
}
}
and the result that is returned from elasitc is as follows:
{
"took" : 24,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_routing" : "1",
"_source" : {
"id" : 1,
"parentProperty" : "a parent document",
"myJoinField" : "parent"
},
"inner_hits" : {
"opr1" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.2814486,
"hits" : [
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.2814486,
"_routing" : "1",
"_source" : {
"id" : 2,
"childProperty" : "queensland civil administration",
"myJoinField" : {
"name" : "mychild",
"parent" : "1"
}
}
}
]
}
},
"opr2" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.7549127,
"hits" : [
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.7549127,
"_routing" : "1",
"_source" : {
"id" : 3,
"childProperty" : "beautiful weather",
"myJoinField" : {
"name" : "mychild",
"parent" : "1"
}
}
}
]
}
},
"opr3" : {
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
}
}
]
}
}
as you can see in the result the elastic returns both child document which clearly is against what I have written in the "must" section of the query.
but if I rewrite the query as following then it will return ONLY the expected document (document with Id "2"):
GET /my-index-000001/_search
{
"query": {
"bool": {
"must": [
{
"has_child": {
"inner_hits": {
"name": "opr1"
},
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"query_string": {
"query": "childProperty:(\"queensland civil\")"
}
},
{
"bool": {
"must": [
{
"query_string": {
"query": "childProperty:(beautiful)"
}
},
{
"query_string": {
"query": "childProperty:(weather1)"
}
}
]
}
}
]
}
},
"type": "mychild"
}
}
]
}
}
}
here is the correct result:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_routing" : "1",
"_source" : {
"id" : 1,
"parentProperty" : "a parent document",
"myJoinField" : "parent"
},
"inner_hits" : {
"opr1" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.2814486,
"hits" : [
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.2814486,
"_routing" : "1",
"_source" : {
"id" : 2,
"childProperty" : "queensland civil administration",
"myJoinField" : {
"name" : "mychild",
"parent" : "1"
}
}
}
]
}
}
}
}
]
}
}
I appreciate it if someone tells me what I did wrong in the first query or if this is the default behavior in elasitc when it comes to parent/child relationship.
Is it possible to configure the mapping of an index, or the discover view of this in index in a way that an array inside the documents is / will be sorted?
Background: I have a es index with documents containing an array:
This array is updated from time to time with new entries (objects containing a timestamp), and I would like this arrays to be sorted according to the timestamp inside the objects.
If your field is define as nested type then you can use inner_hits to sort the array of object. it will return the sorted object array inside inner_hits for each document.
You can define field as nested like below:
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"openTimes": {
"type": "nested",
"properties": {
"date": {
"type": "date"
},
"name": {
"type": "keyword"
}
}
}
}
}
}
Let consider below is your sample data:
{"index": { } }
{ "name": "second on 6th (3rd on the 5th)", "openTimes": [ { "date": "2018-12-05T12:00:00" ,"name":"abc"}, { "date": "2018-12-06T11:00:00","name":"xyz" }] }
{"index": { } }
{ "name": "third on 6th (1st on the 5th)", "openTimes": [ {"date": "2018-12-05T10:00:00","name":"abc"}, { "date": "2018-12-06T12:00:00","name":"xyz" }] }
{"index": { } }
{ "name": "first on the 6th (2nd on the 5th)", "openTimes": [ {"date": "2018-12-05T11:00:00","name":"abc" }, { "date": "2018-12-06T10:00:00","name":"xyz" }] }
Below is Query:
{
"query": {
"nested": {
"path": "openTimes",
"query": {
"match_all": {}
},
"inner_hits": {
"sort": {
"openTimes.date": "desc"
}
}
}
}
}
Sample Response:
{
"_index" : "nested-listings",
"_type" : "_doc",
"_id" : "u0fw338BMCbs63yKkqi0",
"_score" : 1.0,
"_source" : {
"name" : "second on 6th (3rd on the 5th)",
"openTimes" : [
{
"date" : "2018-12-05T12:00:00",
"name" : "abc"
},
{
"date" : "2018-12-06T11:00:00",
"name" : "xyz"
}
]
},
"inner_hits" : {
"openTimes" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "nested-listings",
"_type" : "_doc",
"_id" : "u0fw338BMCbs63yKkqi0",
"_nested" : {
"field" : "openTimes",
"offset" : 1
},
"_score" : null,
"_source" : {
"date" : "2018-12-06T11:00:00",
"name" : "xyz"
},
"sort" : [
1544094000000
]
},
{
"_index" : "nested-listings",
"_type" : "_doc",
"_id" : "u0fw338BMCbs63yKkqi0",
"_nested" : {
"field" : "openTimes",
"offset" : 0
},
"_score" : null,
"_source" : {
"date" : "2018-12-05T12:00:00",
"name" : "abc"
},
"sort" : [
1544011200000
]
}
]
}
}
}
}
I am creating the following my_cars index
PUT my_cars
{
"settings": {
"analysis": {
"analyzer": {
"sortable": {
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"analyzer": "sortable"
}
}
}
}
When i check the mapping , it seems fine :-
{
"my_cars" : {
"mappings" : {
"properties" : {
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
},
"analyzer" : "sortable"
}
}
}
}
}
But now when i run the query for search and sort
GET my_cars/_search
{
"query": {
"match_all": {}
},
"sort": {
"name.keyword": {
"order": "asc"
}
}
}
The capital/uppercase results show up first , hence making me think the analyzer is not working fine. the result i get is as follows :-
{
"took" : 163,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "my_cars",
"_type" : "_doc",
"_id" : "f1RLUnoBZEZpPd-TeK9L",
"_score" : null,
"_source" : {
"name" : "Apples",
"price" : 250
},
"sort" : [
"Apples"
]
},
{
"_index" : "my_cars",
"_type" : "_doc",
"_id" : "H7JLUnoBh60DJePfnpGB",
"_score" : null,
"_source" : {
"name" : "Brocoli",
"price" : 250
},
"sort" : [
"Brocoli"
]
},
{
"_index" : "my_cars",
"_type" : "_doc",
"_id" : "gFRLUnoBZEZpPd-Tyq9A",
"_score" : null,
"_source" : {
"name" : "azus",
"price" : 110
},
"sort" : [
"azus"
]
},
{
"_index" : "my_cars",
"_type" : "_doc",
"_id" : "gVRMUnoBZEZpPd-TAq-A",
"_score" : null,
"_source" : {
"name" : "botpzus",
"price" : 80
},
"sort" : [
"botpzus"
]
}
]
}
}
As you can see the lowercase names come in last, how do i fix this ? I have build my analyzer based on THIS question. But unlike the answer in that question , i am unable to add the analyzer field directly inside the keyword mapping. How do i fix my alphabetical search irrespective of casing ?
The solution is to use a normalizer on the name.keyword:
PUT my_cars
{
"settings": {
"analysis": {
"normalizer": {
"sortable": {
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"normalizer": "sortable"
}
}
}
}
}
}
My mapping looks like so:
"condition": {
"properties": {
"name": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
},
and some data I have looks like:
"condition": [
{
"name": "condition",
"value": "new",
},
{
"name": "condition",
"value": "gently-used",
}
]
How can I write a query that finds all objects within the array that have a new condition?
I have the following but I am getting 0 results back:
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"attribute_condition": "new"
}
}
]
}
}
}
First, you need to map your condition field as a nested type.
"condition": {
"type": "nested",
"properties": {
"name": { "type": "keyword" },
"value": { "type": "keyword" }
}
},
Now you're able to query each element of the condition array independently from each other. Next, you need to use the nested query and request to retrieve the inner hits and output them in the inner_hits object of the query response
{
"query": {
"bool": {
"must": {
"nested": {
"path": "condition",
"query": {
"match": {
"condition.value": "new"
}
},
"inner_hits": {}
}
}
}
}
}
An example response will look like below:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.6931471,
"hits" : [
{
"_index" : "nested",
"_type" : "_doc",
"_id" : "Xx_LN3gBp5RUqdfAef3B",
"_score" : 0.6931471,
"_source" : {
"condition" : [
{
"name" : "condition",
"value" : "new"
},
{
"name" : "condition",
"value" : "gently-used"
}
]
},
"inner_hits" : { <--- here begins the list of inner hits
"condition" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.6931471,
"hits" : [
{
"_index" : "nested",
"_type" : "_doc",
"_id" : "Xx_LN3gBp5RUqdfAef3B",
"_nested" : {
"field" : "condition",
"offset" : 0
},
"_score" : 0.6931471,
"_source" : {
"name" : "condition",
"value" : "new"
}
}
]
}
}
}
}
]
}
}
I am using elasticsearch 6.8 and doing below query:
curl localhost:9200/twitter/_search?pretty=true -H 'Content-Type: application/json' -d '
{ "query": {"match_phrase": { "name": ".C" }}}'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "twitter",
"_type" : "1",
"_id" : "2",
"_score" : 0.2876821,
"_source" : {
"name" : "my name C 100"
}
},
{
"_index" : "twitter",
"_type" : "1",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "my name .C 100"
}
}
]
}
}
You see that two documents get returned but I don't expect the first one which doesn't have .C get returned. I have tried to escape dot with {"match_phrase": { "name": "\\.C" }} but it doesn't work.
I don't want to change the type of the name to be keyword because I still need tokenizer.
And I have put . as protected words in the index setting as below:
#curl localhost:9200/twitter/_settings?
{
"twitter" : {
"settings" : {
"index" : {
"number_of_shards" : "5",
"provided_name" : "twitter",
"creation_date" : "1579489541087",
"analysis" : {
"filter" : {
"word_delim_filter" : {
"type" : "word_delimiter",
"protected_words" : [
"."
]
}
},
"analyzer" : {
"content" : {
"type" : "custom",
"tokenizer" : "whitespace"
},
"custom_synonyms_delim" : {
"filter" : [
"word_delim_filter"
],
"tokenizer" : "whitespace"
}
}
},
"number_of_replicas" : "1",
"uuid" : "nYr7NPdVRCqIcTzzM_iBeQ",
"version" : {
"created" : "6080299"
}
}
}
}
}
How can I escape dot in the query?
Here is a working example of how to handle dot in your scenario:
Mapping
PUT my_index
{
"settings": {
"analysis": {
"filter": {
"word_delim_filter": {
"type": "word_delimiter",
"type_table": [
". => ALPHANUM"
]
}
},
"analyzer": {
"content": {
"type": "custom",
"tokenizer": "whitespace"
},
"custom_synonyms_delim": {
"filter": [
"word_delim_filter"
],
"type": "custom",
"tokenizer": "keyword"
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"analyzer": "custom_synonyms_delim",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
Indexing documents
POST my_index/_doc/1
{
"name" : "my name C 100"
}
POST my_index/_doc/2
{
"name" : "my name .C 100"
}
Search Query
GET my_index/_search
{
"query": {
"match_phrase": {
"name": ".C"
}
}
}
Results
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.6931472,
"_source" : {
"name" : "my name .C 100"
}
}
]
Hope this helps