How to escape special characters in a `match_phrase` query? - elasticsearch

I am using Elasticsearch 6.8 and running the query below:
curl localhost:9200/twitter/_search?pretty=true -H 'Content-Type: application/json' -d '
{ "query": {"match_phrase": { "name": ".C" }}}'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "twitter",
"_type" : "1",
"_id" : "2",
"_score" : 0.2876821,
"_source" : {
"name" : "my name C 100"
}
},
{
"_index" : "twitter",
"_type" : "1",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "my name .C 100"
}
}
]
}
}
Two documents are returned, but I did not expect the first one, which does not contain .C. I tried escaping the dot with {"match_phrase": { "name": "\\.C" }}, but that did not work.
I don't want to change the type of the name field to keyword because I still need the tokenizer.
I have also added . as a protected word in the index settings, as shown below:
#curl localhost:9200/twitter/_settings?
{
"twitter" : {
"settings" : {
"index" : {
"number_of_shards" : "5",
"provided_name" : "twitter",
"creation_date" : "1579489541087",
"analysis" : {
"filter" : {
"word_delim_filter" : {
"type" : "word_delimiter",
"protected_words" : [
"."
]
}
},
"analyzer" : {
"content" : {
"type" : "custom",
"tokenizer" : "whitespace"
},
"custom_synonyms_delim" : {
"filter" : [
"word_delim_filter"
],
"tokenizer" : "whitespace"
}
}
},
"number_of_replicas" : "1",
"uuid" : "nYr7NPdVRCqIcTzzM_iBeQ",
"version" : {
"created" : "6080299"
}
}
}
}
}
How can I escape dot in the query?

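Here is a working example of how to handle the dot in your scenario. The type_table entry maps . to ALPHANUM, so the word_delimiter filter treats the dot as part of a token instead of as a delimiter: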
Mapping
PUT my_index
{
"settings": {
"analysis": {
"filter": {
"word_delim_filter": {
"type": "word_delimiter",
"type_table": [
". => ALPHANUM"
]
}
},
"analyzer": {
"content": {
"type": "custom",
"tokenizer": "whitespace"
},
"custom_synonyms_delim": {
"filter": [
"word_delim_filter"
],
"type": "custom",
"tokenizer": "keyword"
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"analyzer": "custom_synonyms_delim",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
Indexing documents
POST my_index/_doc/1
{
"name" : "my name C 100"
}
POST my_index/_doc/2
{
"name" : "my name .C 100"
}
Search Query
GET my_index/_search
{
"query": {
"match_phrase": {
"name": ".C"
}
}
}
Results
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.6931472,
"_source" : {
"name" : "my name .C 100"
}
}
]
Hope this helps
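To see why the dot matters, here is a toy Python model of phrase matching under two different tokenizations. It is only a sketch: the two analyzer functions are hypothetical stand-ins, not Elasticsearch's real analysis chain. One throws the dot away and the other keeps it inside the token.

```python
import re

def strip_punctuation_tokens(text):
    # Toy analyzer: split on anything that is not a letter or digit, so "." is discarded.
    return [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]

def dot_preserving_tokens(text):
    # Toy analyzer: split on whitespace only, so ".C" survives as a single token.
    return text.split()

def phrase_matches(doc, phrase, analyze):
    # A phrase matches when its tokens appear consecutively in the document's tokens.
    doc_tokens, phrase_tokens = analyze(doc), analyze(phrase)
    n = len(phrase_tokens)
    return n > 0 and any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )

docs = ["my name C 100", "my name .C 100"]

# When the dot is stripped, the query ".C" becomes "C" and both documents match.
loose = [d for d in docs if phrase_matches(d, ".C", strip_punctuation_tokens)]

# When the dot is kept, only the document that really contains ".C" matches.
strict = [d for d in docs if phrase_matches(d, ".C", dot_preserving_tokens)]

print(loose)
print(strict)
```

Escaping the dot in the query cannot help in the first case, because the dot was already dropped when the documents were indexed.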

Related

Elasticsearch: sorted nested array

Is it possible to configure the mapping of an index, or the Discover view of that index, so that an array inside the documents is always sorted?
Background: I have an ES index with documents containing an array:
This array is updated from time to time with new entries (objects containing a timestamp), and I would like the array to be sorted by the timestamp inside the objects.
If your field is defined as a nested type, you can use inner_hits to sort the array of objects. The sorted objects are returned inside inner_hits for each document; the array in _source keeps its original order.
You can define the field as nested like this:
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"openTimes": {
"type": "nested",
"properties": {
"date": {
"type": "date"
},
"name": {
"type": "keyword"
}
}
}
}
}
}
Consider the following sample data:
{"index": { } }
{ "name": "second on 6th (3rd on the 5th)", "openTimes": [ { "date": "2018-12-05T12:00:00" ,"name":"abc"}, { "date": "2018-12-06T11:00:00","name":"xyz" }] }
{"index": { } }
{ "name": "third on 6th (1st on the 5th)", "openTimes": [ {"date": "2018-12-05T10:00:00","name":"abc"}, { "date": "2018-12-06T12:00:00","name":"xyz" }] }
{"index": { } }
{ "name": "first on the 6th (2nd on the 5th)", "openTimes": [ {"date": "2018-12-05T11:00:00","name":"abc" }, { "date": "2018-12-06T10:00:00","name":"xyz" }] }
Below is the query:
{
"query": {
"nested": {
"path": "openTimes",
"query": {
"match_all": {}
},
"inner_hits": {
"sort": {
"openTimes.date": "desc"
}
}
}
}
}
Sample Response:
{
"_index" : "nested-listings",
"_type" : "_doc",
"_id" : "u0fw338BMCbs63yKkqi0",
"_score" : 1.0,
"_source" : {
"name" : "second on 6th (3rd on the 5th)",
"openTimes" : [
{
"date" : "2018-12-05T12:00:00",
"name" : "abc"
},
{
"date" : "2018-12-06T11:00:00",
"name" : "xyz"
}
]
},
"inner_hits" : {
"openTimes" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "nested-listings",
"_type" : "_doc",
"_id" : "u0fw338BMCbs63yKkqi0",
"_nested" : {
"field" : "openTimes",
"offset" : 1
},
"_score" : null,
"_source" : {
"date" : "2018-12-06T11:00:00",
"name" : "xyz"
},
"sort" : [
1544094000000
]
},
{
"_index" : "nested-listings",
"_type" : "_doc",
"_id" : "u0fw338BMCbs63yKkqi0",
"_nested" : {
"field" : "openTimes",
"offset" : 0
},
"_score" : null,
"_source" : {
"date" : "2018-12-05T12:00:00",
"name" : "abc"
},
"sort" : [
1544011200000
]
}
]
}
}
}
}
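If you only need the sorted order on the client side, the effect of the inner_hits sort can be mimicked in a few lines of Python. This is a sketch, not part of the original answer: the function names are made up for illustration, and dates without a time zone are assumed to be UTC, which is consistent with the sort values in the sample response.

```python
from datetime import datetime, timezone

def to_epoch_millis(iso):
    # Assumption: a date string without a zone is interpreted as UTC.
    dt = datetime.fromisoformat(iso).replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

def sort_nested_desc(doc, path="openTimes", key="date"):
    # Pair each nested object with its original array offset, then order newest first.
    indexed = list(enumerate(doc[path]))
    indexed.sort(key=lambda pair: to_epoch_millis(pair[1][key]), reverse=True)
    return [
        {"offset": offset, "_source": obj, "sort": [to_epoch_millis(obj[key])]}
        for offset, obj in indexed
    ]

doc = {
    "name": "second on 6th (3rd on the 5th)",
    "openTimes": [
        {"date": "2018-12-05T12:00:00", "name": "abc"},
        {"date": "2018-12-06T11:00:00", "name": "xyz"},
    ],
}

hits = sort_nested_desc(doc)
for hit in hits:
    print(hit["offset"], hit["_source"]["date"], hit["sort"])
```

The original array in doc is left untouched, just as _source is in the response above.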

Elasticsearch version 6-7: analyzer used for sort not working

I am creating the following my_cars index:
PUT my_cars
{
"settings": {
"analysis": {
"analyzer": {
"sortable": {
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"analyzer": "sortable"
}
}
}
}
When I check the mapping, it seems fine:
{
"my_cars" : {
"mappings" : {
"properties" : {
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
},
"analyzer" : "sortable"
}
}
}
}
}
But when I run the following search-and-sort query:
GET my_cars/_search
{
"query": {
"match_all": {}
},
"sort": {
"name.keyword": {
"order": "asc"
}
}
}
The uppercase results show up first, which makes me think the analyzer is not working. The result I get is as follows:
{
"took" : 163,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "my_cars",
"_type" : "_doc",
"_id" : "f1RLUnoBZEZpPd-TeK9L",
"_score" : null,
"_source" : {
"name" : "Apples",
"price" : 250
},
"sort" : [
"Apples"
]
},
{
"_index" : "my_cars",
"_type" : "_doc",
"_id" : "H7JLUnoBh60DJePfnpGB",
"_score" : null,
"_source" : {
"name" : "Brocoli",
"price" : 250
},
"sort" : [
"Brocoli"
]
},
{
"_index" : "my_cars",
"_type" : "_doc",
"_id" : "gFRLUnoBZEZpPd-Tyq9A",
"_score" : null,
"_source" : {
"name" : "azus",
"price" : 110
},
"sort" : [
"azus"
]
},
{
"_index" : "my_cars",
"_type" : "_doc",
"_id" : "gVRMUnoBZEZpPd-TAq-A",
"_score" : null,
"_source" : {
"name" : "botpzus",
"price" : 80
},
"sort" : [
"botpzus"
]
}
]
}
}
As you can see, the lowercase names come last. How do I fix this? I built my analyzer based on this question, but unlike the answer there, I am unable to add the analyzer directly inside the keyword mapping. How do I get an alphabetical sort irrespective of casing?
The solution is to use a normalizer on name.keyword. The sortable analyzer applies only to the text field name, while the sort runs on the name.keyword sub-field, which is indexed as is; keyword fields accept a normalizer rather than an analyzer:
PUT my_cars
{
"settings": {
"analysis": {
"normalizer": {
"sortable": {
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"normalizer": "sortable"
}
}
}
}
}
}
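The difference between the two orderings can be reproduced in plain Python with the four names from the question. This is only an approximation of what the lowercase normalizer does to the indexed keyword value, not Elasticsearch code.

```python
names = ["Apples", "Brocoli", "azus", "botpzus"]

# Sorting the raw values compares code points, so every uppercase letter
# sorts before any lowercase one. This is the order seen in the question.
raw_order = sorted(names)

# Sorting on a lowercased key approximates sorting on a keyword field
# whose values were passed through a lowercase normalizer.
normalized_order = sorted(names, key=str.lower)

print(raw_order)
print(normalized_order)
```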

Finding all objects with a certain field in ElasticSearch

My mapping looks like so:
"condition": {
"properties": {
"name": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
},
and some data I have looks like:
"condition": [
{
"name": "condition",
"value": "new"
},
{
"name": "condition",
"value": "gently-used"
}
]
How can I write a query that finds all objects within the array that have a new condition?
I have the following but I am getting 0 results back:
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"attribute_condition": "new"
}
}
]
}
}
}
First, you need to map your condition field as a nested type.
"condition": {
"type": "nested",
"properties": {
"name": { "type": "keyword" },
"value": { "type": "keyword" }
}
},
Now you can query each element of the condition array independently of the others. Next, use a nested query and request the inner hits, which are returned in the inner_hits object of the query response:
{
"query": {
"bool": {
"must": {
"nested": {
"path": "condition",
"query": {
"match": {
"condition.value": "new"
}
},
"inner_hits": {}
}
}
}
}
}
An example response will look like below:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.6931471,
"hits" : [
{
"_index" : "nested",
"_type" : "_doc",
"_id" : "Xx_LN3gBp5RUqdfAef3B",
"_score" : 0.6931471,
"_source" : {
"condition" : [
{
"name" : "condition",
"value" : "new"
},
{
"name" : "condition",
"value" : "gently-used"
}
]
},
"inner_hits" : { <--- here begins the list of inner hits
"condition" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.6931471,
"hits" : [
{
"_index" : "nested",
"_type" : "_doc",
"_id" : "Xx_LN3gBp5RUqdfAef3B",
"_nested" : {
"field" : "condition",
"offset" : 0
},
"_score" : 0.6931471,
"_source" : {
"name" : "condition",
"value" : "new"
}
}
]
}
}
}
}
]
}
}
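A rough Python model may help show why the nested type matters here. It is a simplified sketch with hypothetical function names: an array of plain objects is indexed as flat multi-value fields, whereas nested objects are matched one by one, and each match keeps its array offset.

```python
def flatten_objects(objs):
    # Object-type arrays end up as flat multi-value fields,
    # so the pairing between each name and its value is lost.
    flat = {}
    for obj in objs:
        for key, value in obj.items():
            flat.setdefault(key, []).append(value)
    return flat

def nested_inner_hits(objs, field, value):
    # Nested type: each object is checked on its own, and every match
    # reports its offset in the original array, like inner_hits does.
    return [
        {"offset": i, "_source": obj}
        for i, obj in enumerate(objs)
        if obj.get(field) == value
    ]

conditions = [
    {"name": "condition", "value": "new"},
    {"name": "condition", "value": "gently-used"},
]

flat = flatten_objects(conditions)
matches = nested_inner_hits(conditions, "value", "new")

print(flat)
print(matches)
```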

How to build simple terms query for nested object?

I have an index like this:
PUT job_offers
{
"mappings": {
"properties": {
"location": {
"properties": {
"slug": {
"type": "keyword"
},
"name": {
"type": "text"
}
},
"type": "nested"
},
"experience": {
"properties": {
"slug": {
"type": "keyword"
},
"name": {
"type": "text"
}
},
"type": "nested"
}
}
}
}
I insert this object:
POST job_offers/_doc
{
"title": "Junior Ruby on Rails Developer",
"location": [
{
"slug": "new-york",
"name": "New York"
},
{
"slug": "atlanta",
"name": "Atlanta"
},
{
"slug": "remote",
"name": "Remote"
}
],
"experience": [
{
"slug": "junior",
"name": "Junior"
}
]
}
This query returns 0 documents.
GET job_offers/_search
{
"query": {
"terms": {
"location.slug": [
"remote",
"new-york"
]
}
}
}
Can you explain why? I thought it should return documents where location.slug is remote or new-york.
Nested fields must be queried with a nested query, which has a different syntax:
GET job_offers/_search
{
"query": {
"nested": {
"path": "location",
"query": {
"terms": {
"location.slug": ["remote","new-york"]
}
}
}
}
}
Result:
"hits" : [
{
"_index" : "job_offers",
"_type" : "_doc",
"_id" : "wWjoXnEBs0rCGpYsvUf4",
"_score" : 1.0,
"_source" : {
"title" : "Junior Ruby on Rails Developer",
"location" : [
{
"slug" : "new-york",
"name" : "New York"
},
{
"slug" : "atlanta",
"name" : "Atlanta"
},
{
"slug" : "remote",
"name" : "Remote"
}
],
"experience" : [
{
"slug" : "junior",
"name" : "Junior"
}
]
}
}
]
This returns the entire document when any location.slug matches "remote" or "new-york". If you want only the matched nested objects, use inner_hits:
GET job_offers/_search
{
"query": {
"nested": {
"path": "location",
"query": {
"terms": {
"location.slug": ["remote","new-york"]
}
},
"inner_hits": {} --> note
}
}
}
Result:
"hits" : [
{
"_index" : "job_offers",
"_type" : "_doc",
"_id" : "wWjoXnEBs0rCGpYsvUf4",
"_score" : 1.0,
"_source" : {
"title" : "Junior Ruby on Rails Developer",
"location" : [
{
"slug" : "new-york",
"name" : "New York"
},
{
"slug" : "atlanta",
"name" : "Atlanta"
},
{
"slug" : "remote",
"name" : "Remote"
}
],
"experience" : [
{
"slug" : "junior",
"name" : "Junior"
}
]
},
"inner_hits" : { --> will give matched nested object
"location" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "job_offers",
"_type" : "_doc",
"_id" : "wWjoXnEBs0rCGpYsvUf4",
"_nested" : {
"field" : "location",
"offset" : 0
},
"_score" : 1.0,
"_source" : {
"slug" : "new-york",
"name" : "New York"
}
},
{
"_index" : "job_offers",
"_type" : "_doc",
"_id" : "wWjoXnEBs0rCGpYsvUf4",
"_nested" : {
"field" : "location",
"offset" : 2
},
"_score" : 1.0,
"_source" : {
"slug" : "remote",
"name" : "Remote"
}
}
]
}
}
}
}
]
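If you build these requests from application code, a small helper keeps the nested wrapper in one place. The sketch below is plain Python with a made-up function name (nested_terms_query); it only assembles the request body shown above and does not talk to Elasticsearch.

```python
import json

def nested_terms_query(path, field, values, with_inner_hits=False):
    # Wrap a terms clause in a nested clause; the field name
    # must be prefixed with the nested path.
    nested = {
        "path": path,
        "query": {"terms": {f"{path}.{field}": list(values)}},
    }
    if with_inner_hits:
        # An empty inner_hits object requests the matching nested
        # objects with default options.
        nested["inner_hits"] = {}
    return {"query": {"nested": nested}}

body = nested_terms_query(
    "location", "slug", ["remote", "new-york"], with_inner_hits=True
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET job_offers/_search.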
I also see that you are using two fields for the same data with different types. If the data in both fields (name and slug) is the same and only the data type differs, you can use multi-fields instead. As the Elasticsearch documentation on fields puts it:
It is often useful to index the same field in different ways for
different purposes. This is the purpose of multi-fields. For instance,
a string field could be mapped as a text field for full-text search,
and as a keyword field for sorting or aggregations:
In that case your mapping becomes:
PUT job_offers
{
"mappings": {
"properties": {
"location": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
},
"type": "nested"
},
"experience": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
},
"type": "nested"
}
}
}
}

Fuzzy query not giving any results

A fuzzy query in Elasticsearch is not working; even with the exact value, the results are empty.
ES Version: 7.6.2
Index Mapping: Below are the mapping details
{
"movies" : {
"mappings" : {
"properties" : {
"genre" : {
"type" : "text",
"fields" : {
"field" : {
"type" : "keyword"
}
}
},
"id" : {
"type" : "long"
},
"rating" : {
"type" : "double"
},
"title" : {
"type" : "text"
}
}
}
}
}
Documents: The documents below are present in the index
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "movies",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"id" : 2,
"title" : "Raju Ban gaya gentleman",
"rating" : 2,
"genre" : [
"Drama"
]
}
},
{
"_index" : "movies",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"id" : 2,
"title" : "Baat ban jaegi gentleman",
"rating" : 4,
"genre" : [
"Drama"
]
}
}
]
}
}
Query: Below is the query I am using to search the documents
GET movies/_search
{
"query": {
"fuzzy": {
"title": {"value": "Bat ban jaegi gentleman", "fuzziness": 1}
}
}
}
I haven't used fuzzy queries before, and as I understand it this should work just fine.
Fuzzy queries are term-level queries: the search value is not analyzed, but the title field is. The title is indexed as separate lowercased terms (baat, ban, jaegi, gentleman), while Bat ban jaegi gentleman is compared against those terms as one whole string. No single term is within one edit of it, so nothing matches.
You can also refer to the answer to "ElasticSearch's Fuzzy Query" for why a fuzzy query behaves this way on an analyzed field.
Since you want to match against the complete title, you can add a keyword sub-field to the title mapping.
You can see exactly how your string is tokenized with the analyze API:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html
Below is the mapping for the same:
"mappings": {
"properties": {
"genre": {
"type": "text",
"fields": {
"field": {
"type": "keyword"
}
}
},
"id": {
"type": "long"
},
"rating": {
"type": "double"
},
"title": {
"type": "text",
"fields": {
"field": {
"type": "keyword"
}
}
}
}
}
Now if you search on title.field you will get the desired result. The search query is:
{
"query": {
"fuzzy": {
"title.field": {"value": "Bat ban jaegi gentleman", "fuzziness": 1}
}
}
}
The result obtained in this case is:
"hits": [
{
"_index": "ftestmovies",
"_type": "_doc",
"_id": "2",
"_score": 0.9381845,
"_source": {
"title": "Baat ban jaegi gentleman",
"rating": 4,
"genre": [
"Drama"
]
}
}
]
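The difference between the two fields can be checked with a small edit-distance experiment in Python. This is a simplified sketch: it uses plain Levenshtein distance and a whitespace-plus-lowercase stand-in for the analyzer, and it ignores other fuzzy options such as transpositions, prefix_length and max_expansions.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance (insert, delete, substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def analyzed_terms(title):
    # Rough stand-in for the default analyzer on a text field:
    # lowercase the text and split it on whitespace.
    return title.lower().split()

query = "Bat ban jaegi gentleman"
titles = ["Raju Ban gaya gentleman", "Baat ban jaegi gentleman"]

# Fuzzy on the analyzed text field: the whole query string is
# compared with each individual indexed term.
text_hits = [
    t for t in titles
    if any(levenshtein(query, term) <= 1 for term in analyzed_terms(t))
]

# Fuzzy on the keyword sub-field: the whole query string is
# compared with the whole stored title.
keyword_hits = [t for t in titles if levenshtein(query, t) <= 1]

print(text_hits)
print(keyword_hits)
```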

Resources