Terms query (Elasticsearch) not working for some field values

This is my mapping, where I have already declared the required fields as keyword, yet my terms aggregation does not work for category_name and storeName, although it works fine for price.
"mappings": {
"properties" : {
"firebaseId":{
"type":"text"
},
"name" : {
"type" : "text",
"analyzer" : "synonym"
},
"name_auto" : {
"type": "text",
"fields": {
"edgengram": {
"type": "text",
"analyzer": "edge_ngram_analyzer",
"search_analyzer": "edge_ngram_search_analyzer"
},
"completion": {
"type": "completion"
}
}
},
"category_name" : {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"storeName" : {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"sku" : {
"type" : "text"
},
"price" : {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"magento_id" : {
"type" : "text"
},
"seller_id" : {
"type" : "text"
},
"square_item_id" : {
"type" : "text"
},
"square_variation_id" : {
"type" : "text"
},
"typeId" : {
"type" : "text"
}
}
}
}
}
This is my query:
{
"size": 0,
"aggs": {
"Category Filter": {
"terms": {
"field": "category_name",
"size": 10
}
},
"Store Filter": {
"terms": {
"field": "storeName",
"size": 10
}
},
"Price Filter": {
"range": {
"field": "price",
"ranges": [
{
"from": 0,
"to": 50
},
{
"from": 50,
"to": 100
},
{
"from": 100,
"to": 200
}
]
}
}
}
}
It returns the following error:
"reason": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [category_name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."

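The error occurs because category_name and storeName themselves are text fields; the keyword type is only declared on their sub-fields. You need to aggregate on the .keyword sub-field of both fields, like this: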
{
"size": 0,
"aggs": {
"Category Filter": {
"terms": {
"field": "category_name.keyword",
"size": 10
}
},
"Store Filter": {
"terms": {
"field": "storeName.keyword",
"size": 10
}
},
"Price Filter": {
"range": {
"field": "price",
"ranges": [
{
"from": 0,
"to": 50
},
{
"from": 50,
"to": 100
},
{
"from": 100,
"to": 200
}
]
}
}
}
}
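The rule above (aggregate on the keyword sub-field, not on the text field) can be sketched in Python. This is only an illustration: aggregatable_field is a hypothetical helper, not part of any Elasticsearch client, and the properties dict is a trimmed copy of the mapping from the question.

```python
from typing import Any, Dict


def aggregatable_field(properties: Dict[str, Any], field: str) -> str:
    """Hypothetical helper: pick a field name a terms aggregation can use."""
    spec = properties[field]
    if spec.get("type") != "text":
        # keyword, numeric, date, ... fields can be aggregated directly
        return field
    # For a text field, look for a keyword multi-field such as "<field>.keyword"
    for sub_name, sub_spec in spec.get("fields", {}).items():
        if sub_spec.get("type") == "keyword":
            return f"{field}.{sub_name}"
    raise ValueError(f"{field!r} is a text field with no keyword sub-field")


# Trimmed copy of the mapping from the question
properties = {
    "category_name": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "storeName": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "sku": {"type": "text"},
}

# Build the aggregation part of the request body
aggs = {
    "Category Filter": {
        "terms": {"field": aggregatable_field(properties, "category_name"), "size": 10}
    },
    "Store Filter": {
        "terms": {"field": aggregatable_field(properties, "storeName"), "size": 10}
    },
}

print(aggs["Category Filter"]["terms"]["field"])  # category_name.keyword
```

A field like sku, which is text with no keyword sub-field, raises ValueError in this sketch; in Elasticsearch it would need a mapping change (or fielddata=true, with the memory cost the error message warns about).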

Related

Improve performance of a nested term aggregation?

Is there a way to improve performance of a nested term aggregation without sampling?
Terms query:
GET <INDEX>/_search?pretty&request_cache=false
{
"_source": false,
"sort": [
"_doc"
],
"size": 0,
"track_total_hits": false,
"aggregations": {
"nested_suggestions": {
"nested": {
"path": "measurement"
},
"aggs": {
"suggestions": {
"terms": {
"field": "measurement.description.label",
"size": 1
}
}
}
}
}
}
...
{
"took" : 8239,
"timed_out" : false,
...
"aggregations" : {
"nested_suggestions" : {
"doc_count" : 226139234,
"suggestions" : {
"doc_count_error_upper_bound" : 7445607,
"sum_other_doc_count" : 214543500,
"buckets" : [
{
"key" : "xxx",
"doc_count" : 11635382
}
]
}
}
}
}
Cardinality query:
GET <INDEX>/_search?pretty&request_cache=false
{
"_source": false,
"sort": [
"_doc"
],
"size": 0,
"track_total_hits": false,
"aggregations": {
"nested_suggestions": {
"nested": {
"path": "measurement"
},
"aggs": {
"suggestions": {
"cardinality": {
"field": "measurement.description.label"
}
}
}
}
}
}
...
{
"took" : 5688,
"timed_out" : false,
...
"aggregations" : {
"nested_suggestions" : {
"doc_count" : 226139234,
"suggestions" : {
"value" : 1379
}
}
}
}
Minimal mapping:
{
"settings": {
"number_of_replicas": "0",
"number_of_shards": "10",
"analysis": {
"normalizer": {
"raw_clean": {
"type": "custom",
"filter": [
"asciifolding"
]
}
}
}
},
"mappings": {
"_doc": {
"dynamic": "strict",
"properties": {
"id": {
"type": "keyword"
},
"measurement": {
"type": "nested",
"dynamic": "strict",
"properties": {
"id": {
"type": "keyword"
},
"description": {
"type": "text",
"norms": false,
"fields": {
"label": {
"type": "keyword",
"normalizer": "raw_clean",
"ignore_above": 255,
"eager_global_ordinals": true
}
}
}
}
}
}
}
}
}
I've verified that the global ordinals have data via /_cat/fielddata?v.
Is this kind of performance expected with nested terms aggregations?
Environment:
Elasticsearch 6.8.3
index size ~200 GB (with the full mapping)
documents ~1 million
nested documents ~225 million
4 CPUs, 16 GB RAM, 500 GB SSD

Filter aggregation keys with non nested mapping in elasticsearch

I have the following mapping:
{
"Country": {
"properties": {
"State": {
"properties": {
"Name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"Code": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"Lang": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
}
This is a sample document:
{
"Country": {
"State": [
{
"Name": "California",
"Code": "CA",
"Lang": "EN"
},
{
"Name": "Alaska",
"Code": "AK",
"Lang": "EN"
},
{
"Name": "Texas",
"Code": "TX",
"Lang": "EN"
}
]
}
}
I am querying this index to get a count of states by name, using the following query:
{
"from": 0,
"size": 0,
"query": {
"query_string": {
"query": "Country.State.Name: *Ala*"
}
},
"aggs": {
"counts": {
"terms": {
"field": "Country.State.Name.raw",
"include": ".*Ala.*"
}
}
}
}
Using the include regex in the terms aggregation, I am able to get only the keys matching the query_string, but there seems to be no way to make the include regex case-insensitive.
The result I want is:
{
"aggregations": {
"counts": {
"buckets": [
{
"key": "Alaska",
"doc_count": 1
}
]
}
}
}
Is there another solution that returns only the keys matching the query_string, without using a nested mapping?
Use a normalizer for the keyword datatype. Below is a sample mapping; note the my_normalizer definition in the settings and the normalizer parameter on each raw sub-field:
Mapping:
PUT country
{
"settings": {
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom",
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"properties": {
"Country": {
"properties": {
"State": {
"properties": {
"Name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
},
"Code": {
"type": "text",
"fields": {
"raw": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
},
"Lang": {
"type": "text",
"fields": {
"raw": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
}
}
}
}
}
}
}
}
Document:
POST country/_doc/1
{
"Country": {
"State": [
{
"Name": "California",
"Code": "CA",
"Lang": "EN"
},
{
"Name": "Alaska",
"Code": "AK",
"Lang": "EN"
},
{
"Name": "Texas",
"Code": "TX",
"Lang": "EN"
}
]
}
}
Aggregation Query:
POST country/_search
{
"from": 0,
"size": 0,
"query": {
"query_string": {
"query": "Country.State.Name: *Ala*"
}
},
"aggs": {
"counts": {
"terms": {
"field": "Country.State.Name.raw",
"include": "ala.*"
}
}
}
}
Notice the pattern in include. Because of the normalizer I've applied, all the values of your *.raw fields are stored in lowercase, so the pattern has to be lowercase too, and the returned keys are lowercase as well.
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"counts" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "alaska",
"doc_count" : 1
}
]
}
}
}
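Why the lowercase pattern works can be sketched in plain Python. This is a rough simulation under stated assumptions, not Elasticsearch code: normalize stands in for the lowercase filter of my_normalizer, terms_with_include is a hypothetical function, and re.fullmatch is used because the include regex has to match the whole term.

```python
import re
from collections import Counter
from typing import Dict, List


def normalize(value: str) -> str:
    # Stand-in for the "lowercase" filter configured in my_normalizer
    return value.lower()


def terms_with_include(docs: List[List[dict]], include: str) -> Dict[str, int]:
    """Simulate a terms aggregation on Name.raw with an include regex."""
    pattern = re.compile(include)
    counts: Counter = Counter()
    for states in docs:
        # Non-nested mapping: every state name of a document is a term of that
        # document, and each distinct term counts once per document.
        terms = {normalize(state["Name"]) for state in states}
        for term in terms:
            if pattern.fullmatch(term):  # the regex must match the entire term
                counts[term] += 1
    return dict(counts)


docs = [[{"Name": "California"}, {"Name": "Alaska"}, {"Name": "Texas"}]]

print(terms_with_include(docs, "ala.*"))    # {'alaska': 1}
print(terms_with_include(docs, ".*Ala.*"))  # {} because stored terms are lowercase
```

The second call shows the flip side of the normalizer: a pattern containing uppercase letters no longer matches anything, because the terms being tested are the normalized ones.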
Hope this helps!
I was able to fix the problem by using an inline script to filter the keys. (It's still a dirty fix, but it solves my use case for now and lets me avoid mapping changes.)
Here is how I am executing the query:
{
"from": 0,
"size": 0,
"query": {
"query_string": {
"query": "Country.State.Name: *Ala*"
}
},
"aggs": {
"counts": {
"terms": {
"script": {
"source": "doc['Country.State.Name.raw'].value.toLowerCase().contains('ala') ? doc['Country.State.Name.raw'].value : null",
"lang": "painless"
}
}
}
}
}

Elastic search fuzzy query unexpected results

I have two indices, cities and places. The places index has a mapping like this:
{
"mappings": {
"properties": {
"cityId": {
"type": "integer"
},
"cityName": {
"type": "text"
},
"placeName": {
"type": "text"
},
"status": {
"type": "keyword"
},
"category": {
"type": "keyword"
},
"reviews": {
"properties": {
"rating": {
"type": "long"
},
"comment": {
"type": "keyword"
},
"user": {
"type": "nested"
}
}
}
}
}
}
And the cities index is mapped like this:
{
"mappings": {
"properties": {
"state": {
"type": "keyword"
},
"postal": {
"type": "keyword"
},
"phone": {
"type": "keyword"
},
"email": {
"type": "keyword"
},
"notes": {
"type": "keyword"
},
"status": {
"type": "keyword"
},
"cityName": {
"type": "text"
},
"website": {
"type": "keyword"
},
"cityId": {
"type": "integer"
}
}
}
}
Initially we had a single document where cities had their places embedded, but I had trouble searching the nested places array, so I changed the structure to the one above. I want to be able to search both cityName and placeName in a single query with fuzziness. I have a city with the word Welder's in its name, and some places in the same location also have Welder's in their name; both fields have type text. However, when I search for welder, neither of the queries below returns these documents, while a search for welders or welder's does. I am not sure why welder won't match Welder's. I didn't specify an analyzer when creating either index, and I am not explicitly defining one in the query. Can anyone help me with this query so that it behaves as expected?
Query 1: index = places
{
"query": {
"bool": {
"should": [
{
"match": {
"placeName": {
"query": "welder",
"fuzziness": 20
}
}
},
{
"match": {
"cityName": {
"query": "welder",
"fuzziness": 20
}
}
}
]
}
}
}
Query 2: index = places
{
"query": {
"match": {
"placeName": {
"query": "welder",
"fuzziness": 20
}
}
}
}
Can anyone post a query that, when passed the word welder, returns documents having Welder's in their name? (It should also work for other terms like this one; this is just an example.)
Edit 1 :
This is a sample place document I would want to be returned by any of the queries posted above:
{
cityId: 29,
placeName: "Welder's Garage Islamabad",
cityName: "Islamabad",
status: "verified",
category: null,
reviews: []
}
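Using your mapping and query with fuzziness set to 20, I do get the document back, so I think this value is different in your actual query. Note that the edit distance used for fuzzy matching is capped at 2 (Lucene does not support more), so as far as I know a fuzziness of 20 behaves like 2 rather than tolerating 20 edits. That is still enough here: welder's is indexed as a single token, and welder is two edits away from it (the added ' and s).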
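To see how far apart these terms are, here is a small Python sketch of plain Levenshtein distance. It is only an illustration: edit_distance is a hypothetical helper, it assumes the default analyzer indexes Welder's as the single lowercase token welder's, and it does not treat a transposition as a single edit the way Elasticsearch fuzzy matching does by default.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute (or keep) the character
                )
            )
        previous = current
    return previous[-1]


print(edit_distance("welder", "welder's"))   # 2
print(edit_distance("welders", "welder's"))  # 1
print(edit_distance("w", "welder's"))        # 7
```

So a search for welder needs an edit distance of at least 2 to reach the indexed token welder's, while welders needs only 1.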
If you want a search for welder or welders to return welder's, you can use a stemmer token filter.
Mapping:
PUT indexfuzzy
{
"mappings": {
"properties": {
"cityId": {
"type": "integer"
},
"cityName": {
"type": "text"
},
"placeName": {
"type": "text",
"analyzer": "my_analyzer"
},
"status": {
"type": "keyword"
},
"category": {
"type": "keyword"
},
"reviews": {
"properties": {
"rating": {
"type": "long"
},
"comment": {
"type": "keyword"
},
"user": {
"type": "nested"
}
}
}
}
},
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"stem_possessive_english",
"stem_minimal_english"
]
}
},
"filter": {
"stem_possessive_english": {
"type": "stemmer",
"name": "possessive_english"
},
"stem_minimal_english": {
"type": "stemmer",
"name": "minimal_english"
}
}
}
}
}
Query (welder, welders and welder's will all work as the placeName query text):
GET indexfuzzy/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"placeName": {
"query": "welder"
}
}
},
{
"match": {
"cityName": {
"query": "welder"
}
}
}
]
}
}
}
Result:
[
{
"_index" : "indexfuzzy",
"_type" : "_doc",
"_id" : "Jc-yx3ABd7NBn_0GTBdp",
"_score" : 0.2876821,
"_source" : {
"cityId" : 29,
"placeName" : "Welder's Garage Islamabad",
"cityName" : "Islamabad",
"status" : "verified",
"category" : null,
"reviews" : [ ]
}
}
]
possessive_english: removes the trailing 's from tokens
minimal_english: removes plurals
GET <index_name>/_analyze
{
"text": "Welder's Garage Islamabad",
"analyzer": "my_analyzer"
}
returns the following tokens; the first one, welder, is also what welder's and welders are reduced to, which is why they all match:
{
"tokens" : [
{
"token" : "welder",
"start_offset" : 0,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "garage",
"start_offset" : 9,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "islamabad",
"start_offset" : 16,
"end_offset" : 25,
"type" : "<ALPHANUM>",
"position" : 2
}
]
}
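The effect of this analyzer chain can be approximated in a few lines of Python. This is a deliberately rough sketch, not the real Lucene implementation: analyze is a hypothetical function, the regex only imitates the standard tokenizer for simple English text, and the plural rule is a crude stand-in for minimal_english.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    """Rough imitation of: standard tokenizer, lowercase, possessive and plural stemming."""
    # Keep apostrophes inside words so that "Welder's" stays a single token
    tokens = re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*", text)
    result = []
    for token in tokens:
        token = token.lower()                 # lowercase filter
        if token.endswith("'s"):              # possessive_english: drop trailing 's
            token = token[:-2]
        # Crude plural removal standing in for minimal_english
        if len(token) > 3 and token.endswith("s") and not token.endswith(("ss", "us")):
            token = token[:-1]
        result.append(token)
    return result


print(analyze("Welder's Garage Islamabad"))  # ['welder', 'garage', 'islamabad']
print(analyze("welders"))                    # ['welder']
print(analyze("welder"))                     # ['welder']
```

Because the same analyzer is applied to the indexed text and to the query text, all three spellings end up as the token welder and therefore match each other.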

Counting search results in ElasticSearch by a nested property

Here is a schema with a nested property.
{
"dynamic": "strict",
"properties" : {
"Id" : {
"type": "integer"
},
"Name_en" : {
"type": "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"normalizer": "cloudbuy_normalizer_alphanumeric"
},
"text" : {
"type" : "text",
"analyzer": "english"
}
}
},
"Menus" : {
"type" : "nested",
"properties" : {
"Id" : {
"type" : "integer"
},
"Name" : {
"type" : "keyword",
"normalizer": "normalizer_alphanumeric"
},
"AncestorsIds" : {
"type" : "integer"
}
}
}
}
}
And here is a document.
{
"Id": 12781279,
"Name": "Thing of purpose made to fit",
"Menus": [
{
"Id": -571057,
"Name": "Top level menu",
"AncestorsIds": [
-571057
]
},
{
"Id": 1022313,
"Name": "Other",
"AncestorsIds": [
-571057,
1022313
]
}
]
}
For any given query I need a list with two columns: the Menu.Id and the number of documents in the result set that have that Menu.Id in their Menus array.
How?
(Is there any documentation for aggs that isn't impenetrable?)
@Richard, does this query suit your needs?
POST yourindex/_search
{
"_source": false,
"aggs":{
"menus": {
"nested": {
"path": "Menus"
},
"aggs":{
"menu_aggregation": {
"terms": {
"field": "Menus.Id",
"size": 10
}
}
}
}
}
}
Output :
"aggregations": {
"menus": {
"doc_count": 2,
"menu_aggregation": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": -571057,
"doc_count": 1
},
{
"key": 1022313,
"doc_count": 1
}
]
}
}
Here we specify a nested path and then aggregate on the menu Ids.
You can take a look at this documentation page : https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html
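What this aggregation computes can be sketched in plain Python. This is only an illustration: count_docs_per_menu is a hypothetical function, and the second document is made up to show a menu shared by two documents. One caveat worth hedging: a terms aggregation inside nested counts nested Menus objects, which equals the number of documents only as long as a given Menus.Id appears at most once per document; if it can repeat, a reverse_nested sub-aggregation is the usual way to get back to parent document counts.

```python
from collections import Counter
from typing import Dict, List


def count_docs_per_menu(docs: List[dict]) -> Dict[int, int]:
    """For each Menus.Id, count the documents whose Menus array contains it."""
    counts: Counter = Counter()
    for doc in docs:
        # A set, so a document is counted once per menu id even if the id repeats
        menu_ids = {menu["Id"] for menu in doc.get("Menus", [])}
        counts.update(menu_ids)
    return dict(counts)


docs = [
    {   # the document from the question (trimmed)
        "Id": 12781279,
        "Menus": [
            {"Id": -571057, "Name": "Top level menu"},
            {"Id": 1022313, "Name": "Other"},
        ],
    },
    {   # hypothetical second document, only in the top level menu
        "Id": 2,
        "Menus": [{"Id": -571057, "Name": "Top level menu"}],
    },
]

print(sorted(count_docs_per_menu(docs).items()))
# [(-571057, 2), (1022313, 1)]
```

Each pair corresponds to one bucket of menu_aggregation: the key is the Menus.Id and the count plays the role of doc_count.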

Synonyms aggregation in elasticsearch 7 - term based

I am trying to aggregate on a field whose values are similar, like Med and Medium. I don't want both to appear in my aggregation results; only one of them should. I tried synonyms, but that doesn't seem to work.
The question is: how can I merge or unify similar aggregation results when the aggregation is term-based?
Below is my work.
Mapping and Setting
{
"settings": {
"index" : {
"analysis" : {
"filter" : {
"synonym_filter" : {
"type" : "synonym",
"synonyms" : [
"medium, m, med",
"large, l",
"extra small, xs, x small"
]
}
},
"analyzer" : {
"synonym_analyzer" : {
"tokenizer" : "standard",
"filter" : ["lowercase", "synonym_filter"]
}
}
}
}
},
"mappings": {
"properties": {
"skus": {
"type": "nested",
"properties": {
"labels": {
"dynamic": "true",
"properties": {
"Color": {
"type": "text",
"fields": {
"synonym": {
"analyzer": "synonym_analyzer",
"type": "text",
"fielddata":true
}
}
},
"Size": {
"type": "text",
"fields": {
"synonym": {
"analyzer": "synonym_analyzer",
"type": "text",
"fielddata":true
}
}
}
}
}
}
}
}
}}
Aggregation
{
"aggs":{
"sizesFilter": {
"aggs": {
"sizes": {
"terms": {
"field": "skus.labels.Size.synonym"
}
}
},
"nested": {
"path": "skus"
}
}
}}
With only one doc my aggregation result is
"aggregations": {
"sizesFilter": {
"doc_count": 1,
"sizes": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "m",
"doc_count": 1
},
{
"key": "med",
"doc_count": 1
},
{
"key": "medium",
"doc_count": 1
}
]
}
}
}
I got it working by setting the tokenizer in the analyzer to "keyword":
{
"analyzer" : {
"synonym_analyzer" : {
"tokenizer" : "keyword",
"filter" : ["lowercase", "synonym_filter"]
}
}
}
