Elasticsearch fuzzy query unexpected results - elasticsearch

I have 2 indices, cities and places. The places index has a mapping like this:
{
"mappings": {
"properties": {
"cityId": {
"type": "integer"
},
"cityName": {
"type": "text"
},
"placeName": {
"type": "text"
},
"status": {
"type": "keyword"
},
"category": {
"type": "keyword"
},
"reviews": {
"properties": {
"rating": {
"type": "long"
},
"comment": {
"type": "keyword"
},
"user": {
"type": "nested"
}
}
}
}
}
}
And the cities index is mapped like this:
{
"mappings": {
"properties": {
"state": {
"type": "keyword"
},
"postal": {
"type": "keyword"
},
"phone": {
"type": "keyword"
},
"email": {
"type": "keyword"
},
"notes": {
"type": "keyword"
},
"status": {
"type": "keyword"
},
"cityName": {
"type": "text"
},
"website": {
"type": "keyword"
},
"cityId": {
"type": "integer"
}
}
}
}
Initially we had a single document structure where each city had its places embedded, but I was having trouble searching the nested places array, so I changed the structure to the above. I want to be able to search both cityName and placeName in a single query with fuzziness. I have a city whose name includes the word Welder's, and some places in the same city also have the word Welder's in their names; both fields have type text. However, when searching for welder, neither of the two queries below returns these documents, while a search for welders or welder's does return them. I am not sure why welder won't match Welder's. I didn't specify an analyzer when creating either index, and I'm not explicitly defining one in the query. Can anyone help me make this query behave as expected?
Query 1: index = places
{
"query": {
"bool": {
"should": [
{
"match": {
"placeName": {
"query": "welder",
"fuzziness": 20
}
}
},
{
"match": {
"cityName": {
"query": "welder",
"fuzziness": 20
}
}
}
]
}
}
}
Query 2: index = places
{
"query": {
"match": {
"placeName": {
"query": "welder",
"fuzziness": 20
}
}
}
}
Can anyone post a query that, when given the word welder, would return documents having Welder's in their name? (It should also work for other terms like this; welder is just an example.)
Edit 1:
This is a sample place document I would want to be returned by any of the queries posted above:
{
"cityId": 29,
"placeName": "Welder's Garage Islamabad",
"cityName": "Islamabad",
"status": "verified",
"category": null,
"reviews": []
}

Using your mapping and query with fuzziness set to "20", I am getting the document back. A fuzziness of 20 would tolerate an edit distance of 20 between the searched word and welder's, so even "w" would match "welder's". I suspect this value is different in your actual query.
If you want searches for welder or welders to also return welder's, you can use the stemmer token filter.
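For intuition (a side note, not part of the original answer): welder and welder's differ by an edit distance of 2, which is already the maximum that Lucene-based fuzzy matching supports, so a plain Levenshtein check explains why welder can match Welder's once fuzziness is at least 2. A quick sketch in Python:

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

print(edit_distance("welder", "welder's"))   # 2: insert "'" and "s"
print(edit_distance("welders", "welder's"))  # 1: insert "'"
```

With an (unrealistic) fuzziness of 20, even "w" is within 7 edits of "welder's", which is why the answer above notes that nearly anything matches.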
Mapping:
PUT indexfuzzy
{
"mappings": {
"properties": {
"cityId": {
"type": "integer"
},
"cityName": {
"type": "text"
},
"placeName": {
"type": "text",
"analyzer": "my_analyzer"
},
"status": {
"type": "keyword"
},
"category": {
"type": "keyword"
},
"reviews": {
"properties": {
"rating": {
"type": "long"
},
"comment": {
"type": "keyword"
},
"user": {
"type": "nested"
}
}
}
}
},
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"stem_possessive_english",
"stem_minimal_english"
]
}
},
"filter": {
"stem_possessive_english": {
"type": "stemmer",
"name": "possessive_english"
},
"stem_minimal_english": {
"type": "stemmer",
"name": "minimal_english"
}
}
}
}
}
Query:
GET indexfuzzy/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"placeName": {
"query": "welder"--> welder,welders,welder's will work
}
}
},
{
"match": {
"cityName": {
"query": "welder"
}
}
}
]
}
}
}
Result:
[
{
"_index" : "indexfuzzy",
"_type" : "_doc",
"_id" : "Jc-yx3ABd7NBn_0GTBdp",
"_score" : 0.2876821,
"_source" : {
"cityId" : 29,
"placeName" : "Welder's Garage Islamabad",
"cityName" : "Islamabad",
"status" : "verified",
"category" : null,
"reviews" : [ ]
}
}
]
possessive_english: removes trailing 's from tokens
minimal_english: removes plural s from tokens
GET <index_name>/_analyze
{
"text": "Welder's Garage Islamabad",
"analyzer": "my_analyzer"
}
returns
{
"tokens" : [
{
"token" : "welder", --> will be matched for welder's, welders
"start_offset" : 0,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "garage",
"start_offset" : 9,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "islamabad",
"start_offset" : 16,
"end_offset" : 25,
"type" : "<ALPHANUM>",
"position" : 2
}
]
}
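The token stream above can be approximated in a few lines of Python. This is a mental model only, not how Lucene actually implements the standard tokenizer or the two stemmers (the real minimal_english stemmer has more rules):

```python
def possessive_english(token: str) -> str:
    # like the possessive_english stemmer: drop a trailing "'s"
    return token[:-2] if token.endswith("'s") else token

def minimal_english(token: str) -> str:
    # very rough take on the minimal_english stemmer: drop a plural "s"
    if len(token) > 3 and token.endswith("s") and not token.endswith(("ss", "us")):
        return token[:-1]
    return token

def my_analyzer(text: str) -> list[str]:
    # standard tokenizer (approximated by a whitespace split) -> lowercase -> stemmers
    return [minimal_english(possessive_english(t.lower())) for t in text.split()]

print(my_analyzer("Welder's Garage Islamabad"))  # ['welder', 'garage', 'islamabad']
```

Because welder, welders, and welder's all reduce to the same token welder at index and search time, a plain match query finds them without any fuzziness.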

Related

Filter aggregation keys with non nested mapping in elasticsearch

I have following mapping:
{
"Country": {
"properties": {
"State": {
"properties": {
"Name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"Code": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"Lang": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
}
This is sample document:
{
"Country": {
"State": [
{
"Name": "California",
"Code": "CA",
"Lang": "EN"
},
{
"Name": "Alaska",
"Code": "AK",
"Lang": "EN"
},
{
"Name": "Texas",
"Code": "TX",
"Lang": "EN"
}
]
}
}
I am querying on this index to get aggregates of count of states by name. I am using following query:
{
"from": 0,
"size": 0,
"query": {
"query_string": {
"query": "Country.State.Name: *Ala*"
}
},
"aggs": {
"counts": {
"terms": {
"field": "Country.State.Name.raw",
"include": ".*Ala.*"
}
}
}
}
I am able to get only the keys matching the query_string by using the include regex in the terms aggregation, but there seems to be no way to make the include regex case-insensitive.
The result I want is:
{
"aggregations": {
"counts": {
"buckets": [
{
"key": "Alaska",
"doc_count": 1
}
]
}
}
}
Is there other solution available to get me only keys matching query_string without using nested mapping?
Use a normalizer for the keyword datatype. Below is a sample mapping:
Mapping:
PUT country
{
"settings": {
"analysis": {
"normalizer": {
"my_normalizer": { <---- Note this
"type": "custom",
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"properties": {
"Country": {
"properties": {
"State": {
"properties": {
"Name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword",
"normalizer": "my_normalizer" <---- Note this
}
}
},
"Code": {
"type": "text",
"fields": {
"raw": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
},
"Lang": {
"type": "text",
"fields": {
"raw": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
}
}
}
}
}
}
}
}
Document:
POST country/_doc/1
{
"Country": {
"State": [
{
"Name": "California",
"Code": "CA",
"Lang": "EN"
},
{
"Name": "Alaska",
"Code": "AK",
"Lang": "EN"
},
{
"Name": "Texas",
"Code": "TX",
"Lang": "EN"
}
]
}
}
Aggregation Query:
POST country/_search
{
"from": 0,
"size": 0,
"query": {
"query_string": {
"query": "Country.State.Name: *Ala*"
}
},
"aggs": {
"counts": {
"terms": {
"field": "Country.State.Name.raw",
"include": "ala.*"
}
}
}
}
Notice the pattern in include. All the values of your *.raw fields are stored in lowercase because of the normalizer applied in the mapping, so the include regex only needs to match lowercase terms.
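To see why the lowercase pattern works, here is a toy model (a hypothetical helper, not an Elasticsearch API) of what the terms aggregation's include filter sees once the normalizer has lowercased the stored keywords. Note that include uses full-match regex semantics:

```python
import re
from collections import Counter

def terms_with_include(values, include):
    # the lowercase normalizer runs at index time, so the stored keyword
    # terms are already lowercase; `include` must fully match a term
    pattern = re.compile(include)
    return Counter(v.lower() for v in values if pattern.fullmatch(v.lower()))

print(terms_with_include(["California", "Alaska", "Texas"], "ala.*"))
```

Only alaska fully matches "ala.*", so it is the single bucket returned, mirroring the aggregation response below.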
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"counts" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "alaska",
"doc_count" : 1
}
]
}
}
}
Hope this helps!
I was able to fix the problem by using an inline script to filter the keys. (Still a dirty fix, but it solves my use case for now and lets me avoid mapping changes.)
Here is the query I am executing:
{
"from": 0,
"size": 0,
"query": {
"query_string": {
"query": "Country.State.Name: *Ala*"
}
},
"aggs": {
"counts": {
"terms": {
"script": {
"source": "doc['Country.State.Name.raw'].value.toLowerCase().contains('ala') ? doc['Country.State.Name.raw'].value : null",
"lang": "painless"
}
}
}
}
}
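The script's logic, translated to Python for clarity (a sketch of the semantics only, not runnable painless): each document emits its raw key when the lowercased value contains the substring, and null otherwise, so unmatched documents simply fall out of the buckets.

```python
def bucket_key(value: str, needle: str = "ala"):
    # mirrors the painless script: emit the raw value when its lowercase
    # form contains the needle, otherwise null (None) so it is skipped
    return value if needle in value.lower() else None

keys = [bucket_key(v) for v in ["California", "Alaska", "Texas"]]
print([k for k in keys if k is not None])  # ['Alaska']
```

Unlike the normalizer approach, this preserves the original casing of the bucket keys, at the cost of running a script per document.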

Find the number of documents with filter at a particular time in Elasticsearch

I have documents in elasticsearch in which each document looks something like as follows:
{
"id": "T12890ADSA12",
"status": "ENDED",
"type": "SAMPLE",
"updatedAt": "2020-05-29T18:18:08.483Z",
"audit": [
{
"event": "STARTED",
"version": 1,
"timestamp": "2020-04-30T13:41:25.862Z"
},
{
"event": "INPROGRESS",
"version": 2,
"timestamp": "2020-05-14T17:03:09.137Z"
},
{
"event": "INPROGRESS",
"version": 3,
"timestamp": "2020-05-17T17:03:09.137Z"
},
{
"event": "ENDED",
"version": 4,
"timestamp": "2020-05-29T18:18:08.483Z"
}
],
"createdAt": "2020-04-30T13:41:25.862Z"
}
If I wanted to know the number of documents that were in the STARTED state at a particular given time, how could I do that? It should use the timestamp from each event in the events field.
Edit: Mapping of the index is as follows:
{
"id": "text",
"status": "text",
"type": "text",
"updatedAt": "date",
"events": [
{
"event": "text",
"version": long,
"timestamp": "date"
}
],
"createdAt": "date"
}
In order to achieve what you want, you need to make sure that the events array is of nested type because you have two conditions that you need to apply on each array element and this is only possible if events is nested:
"events" : {
"type": "nested", <--- you need to add this
"properties" : {
"event" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"timestamp" : {
"type" : "date"
},
"version" : {
"type" : "long"
}
}
},
Then you'll be able to run the following nested query:
{
"query": {
"nested": {
"path": "events",
"query": {
"bool": {
"must": [
{
"range": {
"events.date": {
"gte": "2020-06-08",
"lte": "2020-06-08"
}
}
},
{
"term": {
"events.event": "STARTED"
}
}
]
}
}
}
}
}
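The point of nested here is that both conditions must hold for the same array element, not merely somewhere in the same document. In plain Python terms (field names taken from the sample document, where the array is called audit):

```python
from datetime import datetime, date

def started_on(doc: dict, day: date) -> bool:
    # a doc matches when one and the same audit entry satisfies both
    # conditions -- that is what the nested query guarantees
    for ev in doc["audit"]:
        ts = datetime.fromisoformat(ev["timestamp"].replace("Z", "+00:00"))
        if ev["event"] == "STARTED" and ts.date() == day:
            return True
    return False

doc = {"audit": [
    {"event": "STARTED", "version": 1, "timestamp": "2020-04-30T13:41:25.862Z"},
    {"event": "ENDED", "version": 4, "timestamp": "2020-05-29T18:18:08.483Z"},
]}
print(started_on(doc, date(2020, 4, 30)))  # True
```

Without nested mapping, a document with a STARTED event on one date and any other event in the queried range could match spuriously, because the conditions would be evaluated over the flattened arrays.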

Counting search results in ElasticSearch by a nested property

Here is a schema with a nested property.
{
"dynamic": "strict",
"properties" : {
"Id" : {
"type": "integer"
},
"Name_en" : {
"type": "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"normalizer": "cloudbuy_normalizer_alphanumeric"
},
"text" : {
"type" : "text",
"analyzer": "english"
}
}
},
"Menus" : {
"type" : "nested",
"properties" : {
"Id" : {
"type" : "integer"
},
"Name" : {
"type" : "keyword",
"normalizer": "normalizer_alphanumeric"
},
"AncestorsIds" : {
"type" : "integer"
}
}
}
}
}
And here is a document.
{
"Id": 12781279
"Name": "Thing of purpose made to fit",
"Menus": [
{
"Id": -571057,
"Name": "Top level menu",
"AncestorsIds": [
-571057
]
}
,
{
"Id": 1022313,
"Name": "Other",
"AncestorsIds": [
-571057
,
1022313
]
}
]
}
For any given query I need a list with two columns: the Menu.Id and the number of documents in the result set that have that Menu.Id in their Menus array.
How?
(Is there any documentation for aggs that isn't impenetrable?)
@Richard, does this query suit your needs?
POST yourindex/_search
{
"_source": "false",
"aggs":{
"menus": {
"nested": {
"path": "Menus"
},
"aggs":{
"menu_aggregation": {
"terms": {
"field": "Menus.Id",
"size": 10
}
}
}
}
}
}
Output:
"aggregations": {
"menus": {
"doc_count": 2,
"menu_aggregation": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": -571057,
"doc_count": 1
},
{
"key": 1022313,
"doc_count": 1
}
]
}
}
}
Here we specify a nested path and then aggregate on the menu Ids.
You can take a look at this documentation page : https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html
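If it helps to see the aggregation's effect without the DSL, here is a rough Python equivalent of counting documents per Menus.Id (a sketch over plain dicts, not an Elasticsearch call):

```python
from collections import Counter

def menu_id_counts(docs):
    # each document contributes at most once per distinct Menus.Id,
    # which is what the nested terms aggregation reports per bucket here
    counts = Counter()
    for doc in docs:
        for menu_id in {m["Id"] for m in doc["Menus"]}:
            counts[menu_id] += 1
    return counts

doc = {"Menus": [{"Id": -571057}, {"Id": 1022313}]}
print(menu_id_counts([doc]))
```

Each bucket's key is a Menu.Id and its doc_count is the number of result-set documents carrying that Id, which is exactly the two-column list asked for.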

Synonyms aggregation in elasticsearch 7 - term based

I am trying to aggregate on a field, but some of its values are near-duplicates, like Med and Medium. I don't want both to appear in my aggregation results; only one of them should. I tried synonyms, but they don't seem to work.
The question is: how can I merge similar term-based aggregation results into a single bucket?
Below is my work.
Mapping and Setting
{
"settings": {
"index" : {
"analysis" : {
"filter" : {
"synonym_filter" : {
"type" : "synonym",
"synonyms" : [
"medium, m, med",
"large, l",
"extra small, xs, x small"
]
}
},
"analyzer" : {
"synonym_analyzer" : {
"tokenizer" : "standard",
"filter" : ["lowercase", "synonym_filter"]
}
}
}
}
},
"mappings": {
"properties": {
"skus": {
"type": "nested",
"properties": {
"labels": {
"dynamic": "true",
"properties": {
"Color": {
"type": "text",
"fields": {
"synonym": {
"analyzer": "synonym_analyzer",
"type": "text",
"fielddata":true
}
}
},
"Size": {
"type": "text",
"fields": {
"synonym": {
"analyzer": "synonym_analyzer",
"type": "text",
"fielddata":true
}
}
}
}
}
}
}
}
}
}
Aggregation
{
"aggs":{
"sizesFilter": {
"aggs": {
"sizes": {
"terms": {
"field": "skus.labels.Size.synonym"
}
}
},
"nested": {
"path": "skus"
}
}
}
}
With only one doc my aggregation result is
"aggregations": {
"sizesFilter": {
"doc_count": 1,
"sizes": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "m",
"doc_count": 1
},
{
"key": "med",
"doc_count": 1
},
{
"key": "medium",
"doc_count": 1
}
]
}
}
}
I got it working by setting the tokenizer in the analyzer to keyword:
{
"analyzer" : {
"synonym_analyzer" : {
"tokenizer" : "keyword",
"filter" : ["lowercase", "synonym_filter"]
}
}
}
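Conceptually, the keyword tokenizer keeps the whole field value as a single token (so multi-word variants like "x small" survive intact), and the synonym filter then collapses each variant onto a shared term. A Python sketch of that mapping (illustrative only; the synonym table mirrors the settings above):

```python
SIZE_SYNONYMS = {
    "m": "medium", "med": "medium",
    "l": "large",
    "xs": "extra small", "x small": "extra small",
}

def normalize_size(value: str) -> str:
    # keyword tokenizer: the whole value is one token; lowercase it,
    # then map any synonym onto its canonical form
    term = value.lower()
    return SIZE_SYNONYMS.get(term, term)

print(normalize_size("Med"), "|", normalize_size("XS"))
```

With every variant reduced to one canonical term, the terms aggregation produces a single bucket per size instead of one bucket per spelling.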

Elastic Search mapper_parsing_exception error

I have created an index in Elasticsearch named test. The index mapping is as follows:
{
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"url": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
After creating the index, I added the following document to it:
{
"title": "demo",
"url": {
"name": "tiger",
"age": 10
}
}
But I am getting the following error:
{"mapper_parsing_exception","reason":"failed to parse field [url] of
type [text]"}
Can anyone help me with this?
If your documents look like this:
{
"title": "demo",
"url": {
"name": "tiger",
"age": 10
}
}
Then your mapping needs to look like this, i.e. url is an object with the name and age fields:
{
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"url": {
"properties": {
"name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"age": {
"type": "integer"
}
}
}
}
You need to create the mapping like this:
PUT test
{
"settings" : {
"number_of_shards" : 1
},
"mapping": {
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"url": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
And the document is
PUT test/doc/1
{
"title": "demo",
"url": {
"name": "tiger",
"age": 10
}
}
GET test/doc/1
And the result is
{
"_index" : "test",
"_type" : "doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"title" : "demo",
"url" : {
"name" : "tiger",
"age" : 10
}
}
}
One reason for this (we hit it on Elastic Cloud, but it is general Elasticsearch behavior) is that data types are assigned to fields the first time they appear in an index, and Elasticsearch will throw this error if a later document carries a different type in that field.
For me, the log field was a string in the first log sent to the index but an object in the second, so the second one got rejected.
Good explanation here: https://discuss.elastic.co/t/getting-illegal-state-exception-error-while-pushing-logs-to-elasticsearch/290029
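That first-write-wins behavior can be sketched in a few lines (a toy model of dynamic mapping, not how Elasticsearch is implemented):

```python
def try_index(mapping: dict, field: str, value) -> bool:
    # dynamic mapping in miniature: the first document fixes the field's
    # type; later documents with a different JSON type are rejected
    json_type = "object" if isinstance(value, dict) else "string"
    if field not in mapping:
        mapping[field] = json_type
        return True
    return mapping[field] == json_type

mapping = {}
print(try_index(mapping, "log", "plain text line"))  # True  (type fixed as string)
print(try_index(mapping, "log", {"level": "info"}))  # False (object now rejected)
```

The fix is to make the field's shape consistent across producers, or to map it explicitly up front so mismatched documents fail loudly at the source.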
