Elasticsearch: Why can't I use "5m" for precision in context queries?

I'm running Elasticsearch 5.5.
I have an index with the following mapping:
"mappings": {
"shops": {
"properties": {
"locations": {
"type": "geo_point"
},
"name": {
"type": "keyword"
},
"suggest": {
"type": "completion",
"contexts": [
{
"name": "location",
"type": "GEO",
"precision": "10m",
"path": "locations"
}
]
}
}
}
I'll add a document as follows:
PUT my_index/shops
{
"name":"random shop",
"suggest":{
"input":"random shop"
},
"locations":[
{
"lat":42.38471212,
"lon":-71.12612357
}
]
}
I try to query for the document with the following JSON call:
GET my_shops/_search
{
"suggest": {
"result": {
"prefix": "random",
"completion": {
"field": "suggest",
"size": 5,
"fuzzy": true,
"contexts": {
"location": [{
"lat": 42.38471212,
"lon": -71.12612357,
"precision": "10mi"
}]
}
}
}
}
}
I get a context error (it was posted as a screenshot, which is not reproduced here).
But when I change the "precision" field to an int, I get the intended search results.
I'm confused on two fronts.
Why is there a context error? The documentation seems to say that this is ok
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/suggester-context.html
Why can't I use string values for the precision values?
At the bottom of the page, I see that the precision values can take either distances or numeric values.
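Since the question reports that an integer precision returns the intended results, one workaround is to send a raw geohash precision at query time. Here is a minimal Python sketch that only builds the request body. The function name build_geo_suggest_query is hypothetical, the field and context names come from the mapping above, and the 1 to 12 bound is my reading of the geohash precision range in the docs, so treat it as an assumption.

```python
def build_geo_suggest_query(prefix, lat, lon, precision, size=5):
    """Build a completion-suggester body with a geo context.

    `precision` is a raw geohash length (an int), the form the question
    reports as working. The 1..12 bound is an assumption based on the
    documented geohash precision range.
    """
    if isinstance(precision, bool) or not isinstance(precision, int):
        raise TypeError("precision must be an int geohash length")
    if not 1 <= precision <= 12:
        raise ValueError("geohash precision must be between 1 and 12")
    return {
        "suggest": {
            "result": {
                "prefix": prefix,
                "completion": {
                    "field": "suggest",
                    "size": size,
                    "fuzzy": True,  # kept as in the question
                    "contexts": {
                        "location": [
                            {"lat": lat, "lon": lon, "precision": precision}
                        ]
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    import json

    print(json.dumps(build_geo_suggest_query("random", 42.38471212, -71.12612357, 6), indent=2))
```

Rejecting strings early in the client keeps a value such as "10mi" from ever reaching the cluster while the string form is failing.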

Related

Multiple concurrent aggregations best practice

I'm considering using Elasticsearch as the backend search engine for a multi-filter utility. For this requirement, multiple aggregation queries will run against the cluster, with an expected response time of ~5 seconds.
Based on the details below, do you think this approach is valid for my use case?
If yes, what is the suggested cluster sizing?
I will certainly have to increase the default values of parameters such as index.mapping.total_fields.limit and index.mapping.nested_objects.limit.
I would much appreciate feedback on the approach below, and on ways to avoid common pitfalls.
Thanks in advance.
Details
Number of expected documents: ~50m
Number of unique field values (facet_name + facet_value): ~1B
Number of queries per second: ~50
Mappings:
{
"mappings": {
"properties": {
"customer_id": {
"type": "keyword"
},
"id": {
"type": "keyword"
},
"mi_score_join": {
"type": "join",
"eager_global_ordinals": true,
"relations": {
"mi_data": "customer_model"
}
},
"model_id": {
"type": "keyword"
},
"number_facet": {
"type": "nested",
"properties": {
"facet_name": {
"type": "keyword"
},
"facet_value": {
"type": "long"
}
}
},
"score": {
"type": "long"
},
"string_facet": {
"type": "nested",
"properties": {
"facet_name": {
"type": "keyword"
},
"facet_value": {
"type": "keyword"
}
}
}
}
}
}
An example for a document:
{
"id": 33421,
"string_facet":
[
{
"facet_value":"true",
"facet_name": "var_a"
},
{
"facet_value":"dummy_country",
"facet_name": "var_b"
},
{
"facet_value":"dummy_",
"facet_name": "var_c"
},
{
"facet_value":"https://dummy.com/",
"facet_name": "var_d"
},
{
"facet_value":"www.dummy.com",
"facet_name": "var_e"
},
{
"facet_value":"dummy",
"facet_name": "var_f"
}
],
"mi_score_join": "mi_data"
}
An example for an aggregation query to be run:
POST test_index/_search
{
"size":0,
"aggs": {
"facets": {
"nested": {
"path": "string_facet"
},
"aggs": {
"names": {
"terms": { "field": "string_facet.facet_name", "size":???},
"aggs": {
"values": {
"terms": { "field": "string_facet.facet_value" }
}
}
}
}
}
}
}
The "size": ??? value will probably have to be the maximum cardinality of the facet_name terms, so that no facet name is cut off.
Filters may be added to the aggregations, based on the filters that are already applied.
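To make the two open points concrete (the size placeholder and the optional filters), here is a rough Python sketch that assembles the request body. The function name build_facet_aggs and its parameters are made up for illustration. The field names come from the mapping above, and the filter is wrapped in a standard filter aggregation inside the nested scope.

```python
def build_facet_aggs(names_size, values_size=10, nested_filter=None):
    """Build the nested facet aggregation from the question.

    names_size    -- replaces the "size": ??? placeholder
    values_size   -- bucket limit for facet values (terms default is 10)
    nested_filter -- optional query on string_facet.* fields, applied
                     inside the nested scope via a filter aggregation
    """
    inner = {
        "names": {
            "terms": {"field": "string_facet.facet_name", "size": names_size},
            "aggs": {
                "values": {
                    "terms": {
                        "field": "string_facet.facet_value",
                        "size": values_size,
                    }
                }
            },
        }
    }
    if nested_filter is not None:
        # Only nested documents matching the filter reach the terms aggs.
        inner = {"filtered": {"filter": nested_filter, "aggs": inner}}
    return {
        "size": 0,
        "aggs": {"facets": {"nested": {"path": "string_facet"}, "aggs": inner}},
    }


if __name__ == "__main__":
    import json

    only_var_a = {"term": {"string_facet.facet_name": "var_a"}}
    print(json.dumps(build_facet_aggs(100, nested_filter=only_var_a), indent=2))
```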

Elastic Search Wildcard query with space failing 7.11

My data is indexed in Elasticsearch 7.11. This is the mapping I got when I added documents directly to my index:
{"properties":{"name":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}
I didn't add the keyword part and have no idea where it came from.
I am running a wildcard query on this field, but I am unable to get results for search terms that contain spaces.
{
"query": {
"bool":{
"should":[
{"wildcard": {"name":"*hello world*"}}
]
}
}
}
I have seen many answers that mention not_analyzed, and I have tried setting {"index":"true"} in the mapping, but it did not help. How can I make the wildcard search work in this version of Elasticsearch?
I tried adding the wildcard field type:
PUT http://localhost:9001/indexname/_mapping
{
"properties": {
"name": {
"type" :"wildcard"
}
}
}
And got following response
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "mapper [name] cannot be changed from type [text] to [wildcard]"
}
],
"type": "illegal_argument_exception",
"reason": "mapper [name] cannot be changed from type [text] to [wildcard]"
},
"status": 400
}
Adding a sample document to match
{
"_index": "accelerators",
"_type": "_doc",
"_id": "602ec047a70f7f30bcf75dec",
"_score": 1.0,
"_source": {
"acc_id": "602ec047a70f7f30bcf75dec",
"name": "hello world example",
"type": "Accelerator",
"description": "khdkhfk ldsjl klsdkl",
"teamMembers": [
{
"userId": "karthik.r#gmail.com",
"name": "Karthik Ganesh R",
"shortName": "KR",
"isOwner": true
},
{
"userId": "anand.sajan#gmail.com",
"name": "Anand Sajan",
"shortName": "AS",
"isOwner": false
}
],
"sectorObj": [
{
"item_id": 14,
"item_text": "Cross-sector"
}
],
"geographyObj": [
{
"item_id": 4,
"item_text": "Global"
}
],
"technologyObj": [
{
"item_id": 1,
"item_text": "Artificial Intelligence"
}
],
"themeColor": 1,
"mainImage": "assets/images/Graphics/Asset 35.svg",
"features": [
{
"name": "Ideation",
"icon": "Asset 1007.svg"
},
{
"name": "Innovation",
"icon": "Asset 1044.svg"
},
{
"name": "Strategy",
"icon": "Asset 1129.svg"
},
{
"name": "Intuitive",
"icon": "Asset 964.svg"
}
],
"logo": {
"actualFileName": "",
"fileExtension": "",
"fileName": "",
"fileSize": 0,
"fileUrl": ""
},
"customLogo": {
"logoColor": "#B9241C",
"logoText": "EC",
"logoTextColor": "#F6F6FA"
},
"collaborators": [
{
"userId": "muhammed.arif#gmail.com",
"name": "muhammed Arif P T",
"shortName": "MA"
},
{
"userId": "anand.sajan#gmail.com",
"name": "Anand Sajan",
"shortName": "AS"
}
],
"created_date": "2021-02-18T19:30:15.238000Z",
"modified_date": "2021-03-11T11:45:49.583000Z"
}
}
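You cannot change the type of an existing field once it has been created. The query fails on name because a text field is analyzed: "hello world example" is indexed as the separate terms hello, world and example, and a wildcard pattern is matched against individual terms, none of which contains a space. (The keyword sub-field you did not add comes from dynamic mapping, which by default maps new string fields as text with a keyword sub-field.) However, you can add another sub-field of type wildcard, like this: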
PUT http://localhost:9001/indexname/_mapping
{
"properties": {
"name": {
"type": "text",
"fields": {
"wildcard": {
"type" :"wildcard"
},
"keyword": {
"type" :"keyword",
"ignore_above":256
}
}
}
}
}
When the mapping is updated, you need to reindex your data so that the new field gets indexed, like this:
POST http://localhost:9001/indexname/_update_by_query
And then when this finishes, you'll be able to query on this new field like this:
{
"query": {
"bool": {
"should": [
{
"wildcard": {
"name.wildcard": "*hello world*"
}
}
]
}
}
}
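To see why the pattern with a space misses on the text field but hits on the wildcard sub-field, here is a toy Python model. It is a simplification under stated assumptions: analyze_simple is a crude stand-in for the standard analyzer (lowercase, split on whitespace), and fnmatchcase is only an approximation of Elasticsearch wildcard syntax. None of these helper names are Elasticsearch APIs.

```python
from fnmatch import fnmatchcase


def analyze_simple(text):
    """Crude stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def wildcard_matches_text_field(value, pattern):
    """Model a wildcard query on an analyzed text field.

    The pattern is compared with each indexed term on its own, so a
    pattern containing a space can never match a single term.
    """
    return any(fnmatchcase(term, pattern) for term in analyze_simple(value))


def wildcard_matches_whole_value(value, pattern):
    """Model a wildcard query on a field that keeps the whole string."""
    return fnmatchcase(value, pattern)


if __name__ == "__main__":
    doc = "hello world example"
    print(wildcard_matches_text_field(doc, "*hello world*"))
    print(wildcard_matches_whole_value(doc, "*hello world*"))
```

The first call models the query against name, the second models the query against name.wildcard.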

Elastic search array of objects nested range aggregation

I'm trying to make range aggregation on the following data set:
{
"ProductType": 1,
"ProductDefinition": "fc588f8e-14f2-4871-891f-c73a4e3d17ca",
"ParentProduct": null,
"Sku": "074617",
"VariantSku": null,
"Name": "Paraboot Avoriaz/Jannu Marron Brut Marron Brown Hiking Boot Shoes",
"AllowOrdering": true,
"Rating": null,
"ThumbnailImageUrl": "/media/1106/074617.jpg",
"PrimaryImageUrl": "/media/1106/074617.jpg",
"Categories": [
"399d7b20-18cc-46c0-b63e-79eadb9390c7"
],
"RelatedProducts": [],
"Variants": [
"84a7ff9f-edf0-4aab-87f9-ba4efd44db74",
"e2eb2c50-6abc-4fbe-8fc8-89e6644b23ef",
"a7e16ccc-c14f-42f5-afb2-9b7d9aefbc5c"
],
"PriceGroups": [
"86182755-519f-4e05-96ef-5f93a59bbaec"
],
"DisplayName": "Paraboot Avoriaz/Jannu Marron Brut Marron Brown Hiking Boot Shoes",
"ShortDescription": "",
"LongDescription": "<ul><li>Paraboot Avoriaz Mountaineering Boots</li><li>Marron Brut Marron (Brown)</li><li>Full leather inners and uppers</li><li>Norwegien Welted Commando Sole</li><li>Hand made in France</li><li>Style number : 074617</li></ul><p>As featured on Pritchards.co.uk</p>",
"UnitPrices": {
"EUR 15 pct": 343.85
},
"Taxes": {
"EUR 15 pct": 51.5775
},
"PricesInclTax": {
"EUR 15 pct": 395.4275
},
"Slug": "paraboot-avoriazjannu-marron-brut-marron-brown-hiking-boot-shoes",
"VariantsProperties": [
{
"Key": "ShoeSize",
"Value": "8"
},
{
"Key": "ShoeSize",
"Value": "10"
},
{
"Key": "ShoeSize",
"Value": "6"
}
],
"Guid": "0d4f6899-c66a-4416-8f5d-26822c3b57ae",
"Id": 178,
"ShowOnHomepage": true
}
I'm aggregating on VariantsProperties, which has the following mapping:
"VariantsProperties": {
"type": "nested",
"properties": {
"Key": {
"type": "keyword"
},
"Value": {
"type": "keyword"
}
}
}
Terms aggregations are working fine with following code:
{
"aggs": {
"Nest": {
"nested": {
"path": "VariantsProperties"
},
"aggs": {
"fieldIds": {
"terms": {
"field": "VariantsProperties.Key"
},
"aggs": {
"values": {
"terms": {
"field": "VariantsProperties.Value"
}
}
}
}
}
}
}
}
However when I try to do a range aggregation to get shoes in size between 8 - 12 such as:
{
"aggs": {
"Nest": {
"nested": {
"path": "VariantsProperties"
},
"aggs": {
"fieldIds": {
"range": {
"field": "VariantsProperties.Value",
"ranges": [ { "from": 8, "to": 12 }]
}
}
}
}
}
}
I get the following error:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Field [VariantsProperties.Value] of type [keyword] is not supported for aggregation [range]"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "product-avenueproductindexdefinition-24476f82-en-us",
"node": "ejgN4XecT1SUfgrhzP8uZg",
"reason": {
"type": "illegal_argument_exception",
"reason": "Field [VariantsProperties.Value] of type [keyword] is not supported for aggregation [range]"
}
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Field [VariantsProperties.Value] of type [keyword] is not supported for aggregation [range]",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Field [VariantsProperties.Value] of type [keyword] is not supported for aggregation [range]"
}
}
},
"status": 400
}
Is there a way to "transform" the terms aggregation into a range aggregation without changing the schema? I know I could extract the data from the terms aggregation and build the ranges myself, but I would prefer a solution within Elasticsearch itself.
There are two ways to solve this:
Option A: Use a script instead of a field. This option will work without having to reindex your data, but depending on your volume of data, the performance might suffer.
POST test/_search
{
"aggs": {
"Nest": {
"nested": {
"path": "VariantsProperties"
},
"aggs": {
"fieldIds": {
"range": {
"script": "Integer.parseInt(doc['VariantsProperties.Value'].value)",
"ranges": [
{
"from": 8,
"to": 12
}
]
}
}
}
}
}
}
Option B: Add an integer sub-field in your mapping.
PUT my-index/_mapping
{
"properties": {
"VariantsProperties": {
"type": "nested",
"properties": {
"Key": {
"type": "keyword"
},
"Value": {
"type": "keyword",
"fields": {
"numeric": {
"type": "integer",
"ignore_malformed": true
}
}
}
}
}
}
}
Once your mapping is modified, you can run _update_by_query on your index in order to reindex the VariantsProperties.Value data:
POST my-index/_update_by_query
Finally, when this last command is done, you can run the range aggregation on the VariantsProperties.Value.numeric field.
Also note that this second option will be more performant in the long term.
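For intuition, both options boil down to parsing the keyword values as integers and counting how many fall into each range, where from is inclusive and to is exclusive. A small Python sketch of that logic follows. count_in_ranges is a hypothetical helper, and skipping unparsable values mimics ignore_malformed from Option B rather than the script in Option A.

```python
def count_in_ranges(values, ranges):
    """Count parsed integer values per range, like a range aggregation.

    `from` is inclusive and `to` is exclusive. Values that cannot be
    parsed as integers are skipped (similar in spirit to ignore_malformed).
    """
    parsed = []
    for raw in values:
        try:
            parsed.append(int(raw))
        except (TypeError, ValueError):
            continue

    buckets = []
    for spec in ranges:
        low = spec.get("from")
        high = spec.get("to")
        count = sum(
            1
            for v in parsed
            if (low is None or v >= low) and (high is None or v < high)
        )
        buckets.append({"from": low, "to": high, "doc_count": count})
    return buckets


if __name__ == "__main__":
    # Shoe sizes from the sample document in the question.
    print(count_in_ranges(["8", "10", "6"], [{"from": 8, "to": 12}]))
```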

elasticsearch facet nested aggregation

Using elasticsearch 7.0.0.
I am following this link.
I have an index test_products with following mapping:
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"dynamic_templates": [
{
"search_result_data": {
"mapping": {
"type": "keyword"
},
"path_match": "search_result_data.*"
}
}
],
"properties": {
"search_data": {
"type": "nested",
"properties": {
"full_text": {
"type": "text"
},
"string_facet": {
"type": "nested",
"properties": {
"facet-name": {
"type": "keyword"
},
"facet-value": {
"type": "keyword"
}
}
}
}
}
}
}
}
And a document inserted with following format:
{
"search_result_data": {
"sku": "wheel-6075-90092",
"gtin": null,
"name": "Matte Black Wheel Fuel Ripper",
"preview_image": "abc.jg",
"url": "9836817354546538796",
"brand": "Fuel Off-Road"
},
"search_data":
{
"full_text": "Matte Black Wheel Fuel Ripper",
"string_facet": [
{
"facet-name": "category",
"facet-value": "Motor Vehicle Rims & Wheels"
},
{
"facet-name": "brand",
"facet-value": "Fuel Off-Road"
}
]
}
}
and one other document.
I am trying to aggregate on string_facet as mentioned in the link.
"aggregations": {
"agg_string_facet": {
"nested": {
"path": "string_facet"
},
"aggregations": {
"facet_name": {
"terms": {
"field": "string_facet.facet-name"
},
"aggregations": {
"facet_value": {
"terms": {
"field": "string_facet.facet-value"
}
}
}
}
}
}
}
But I get all (two) documents returned with :
"aggregations": {
"agg_string_facet": {
"doc_count": 0
}
}
What am I missing here?
Also why are the docs being returned as a response?
Documents are returned in the response because they match your query. If you don't want them, set "size" to 0 (it defaults to 10):
{
"query": {
...
},
"size": 0
}
According to the docs, facets have been removed. The recommendation is to use the terms aggregation instead.
Now, for your question, you can go with two options:
If you'd like to get the unique values of facet-value and facet-name separately, you can do the following:
"aggs":{
"unique facet-values":{
"terms":{
"field": "facet-value.keyword",
"size": 30 #By default is 10, maximum recommended is 10,000
}
},
"unique facet-names":{
"terms":{
"field": "facet-name.keyword",
"size": 30 #By default is 10, maximum recommended is 10,000
}
}
}
If you'd like to get the unique combinations between facet-name and facet-value, you can use the Composite aggregation. If you choose this way, your aggs should look like this:
{
"aggs":{
"unique-facetvalue-and-facetname-combination":{
"composite":{
"size": 30, #By default is 10, maximum recommended is 10,000. No matter what size you choose, you can paginate.
"sources":[
{
"value":
{
"terms":{
"field": "facet-value.keyword"
}
}
},
{
"name":
{
"terms":{
"field": "facet-name.keyword"
}
}
}
]
}
}
}
}
The advantage of using Composite over Terms is that Composite lets you paginate your results with the after key, so you can fetch the buckets page by page instead of in one large request that could hurt your cluster's performance.
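A rough sketch of that pagination loop in Python: each response carries an after_key, which is sent back as "after" in the next request until no buckets come back. The search argument is a hypothetical callable that takes a request body and returns the parsed response (for example a thin wrapper around whatever client you use), and iterate_composite_buckets is likewise an illustrative name.

```python
import copy


def iterate_composite_buckets(search, body, agg_name):
    """Yield every bucket of a composite aggregation, page by page.

    search   -- hypothetical callable: request body (dict) -> response (dict)
    body     -- request body containing aggs[agg_name]["composite"]
    agg_name -- name of the composite aggregation inside "aggs"
    """
    request = copy.deepcopy(body)  # leave the caller's body untouched
    composite = request["aggs"][agg_name]["composite"]
    while True:
        agg = search(request)["aggregations"][agg_name]
        buckets = agg.get("buckets", [])
        if not buckets:
            return
        yield from buckets
        after_key = agg.get("after_key")
        if after_key is None:
            return
        # Resume after the last bucket of the page just consumed.
        composite["after"] = after_key
```

Usage would look like for bucket in iterate_composite_buckets(run_search, body, "unique-facetvalue-and-facetname-combination"), where run_search is your own function.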
Hope this is helpful! :D
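One more thing worth checking for the doc_count of 0: in the mapping from the question, string_facet is nested inside search_data, while the aggregation uses the bare path string_facet. A likely fix (my assumption, not confirmed in this thread) is to use the full path search_data.string_facet for both the nested path and the field names. A Python sketch that builds such a request body, with build_nested_facet_aggs as a made-up helper name:

```python
def build_nested_facet_aggs(path="search_data.string_facet", size=10):
    """Build the facet aggregation using the full nested path.

    Assumes string_facet lives under the nested search_data object, as in
    the mapping shown in the question. The fields are mapped as keyword,
    so no ".keyword" suffix is used.
    """
    return {
        "size": 0,  # aggregation results only, no hits
        "aggs": {
            "agg_string_facet": {
                "nested": {"path": path},
                "aggs": {
                    "facet_name": {
                        "terms": {"field": f"{path}.facet-name", "size": size},
                        "aggs": {
                            "facet_value": {
                                "terms": {
                                    "field": f"{path}.facet-value",
                                    "size": size,
                                }
                            }
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    import json

    print(json.dumps(build_nested_facet_aggs(), indent=2))
```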

How I can get the distinct result?

I am querying Elasticsearch (version 6.4) and want a unique list of results (named eids). I wrote the query below. What I'd like to do is run a text search over two fields, eLabel and pLabel, and then get the distinct eid values. But the result is not aggregated: it shows redundant ids, from 0 to over 20. How can I adjust the query?
{
"query": {
"multi_match": {
"query": "Brazil Capital",
"fields": [
"eLabel",
"pLabel"
]
}
},
"size": 200,
"_source": [
"eid",
"eLabel"
],
"aggs": {
"eids": {
"terms": {
"field": "eid"
}
}
}
}
My current mappings are as follows.
eid : id of entity
eLabel: entity label (ex, Brazil)
prop_id: property id of the entity (eid)
pLabel: the label of the property (ex, is the capital of, is located at ...)
"mappings": {
"entity": {
"properties": {
"eLabel": {
"type": "text" ,
"index_options": "docs" ,
"analyzer": "my_analyzer"
} ,
"eid": {
"type": "keyword"
} ,
"subclass": {
"type": "boolean"
} ,
"pLabel": {
"type": "text" ,
"index_options": "docs" ,
"analyzer": "my_analyzer"
} ,
"prop_id": {
"type": "keyword"
} ,
"pType": {
"type": "keyword"
} ,
"way": {
"type": "keyword"
} ,
"chain": {
"type": "integer"
} ,
"siteKey": {
"type": "keyword"
},
"version": {
"type": "integer"
},
"docId": {
"type": "integer"
}
}
}
}
Based on your comment, you can use the bool query below. I don't think anything is wrong with the aggregation itself; just replace your query with the bool query I've mentioned and I think that would suffice.
A multi_match query matches a document if any of the listed fields matches, so it would retrieve a document even if it has eLabel = "Rio is capital of brazil" and pLabel = "something else entirely here".
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"eLabel": "capital"
}
},
{
"match": {
"pLabel": "brazil"
}
}
]
}
},
"size": 200,
"_source": [
"eid",
"eLabel"
],
"aggs": {
"eids": {
"terms": {
"field": "eid"
}
}
}
}
Note that if you only want the values of eid and do not want the documents, you can set "size":0 in the above query. That way you'd only have aggregation results returned.
Let me know if this helps!!
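If it helps, here is a small Python sketch of the same idea end to end: build the bool query with "size": 0 and read the distinct eid values from the aggregation buckets instead of from the hits. Both function names and the max_eids parameter are made up for illustration. Keep in mind that a terms aggregation returns only 10 buckets by default, so its size has to be raised to see more distinct ids.

```python
def build_distinct_eid_query(e_label_text, p_label_text, max_eids=1000):
    """Bool query on both labels, returning only the eids aggregation."""
    return {
        "size": 0,  # no hits, aggregation results only
        "query": {
            "bool": {
                "must": [
                    {"match": {"eLabel": e_label_text}},
                    {"match": {"pLabel": p_label_text}},
                ]
            }
        },
        "aggs": {"eids": {"terms": {"field": "eid", "size": max_eids}}},
    }


def distinct_eids(response):
    """Extract the distinct eid values from a search response dict."""
    return [bucket["key"] for bucket in response["aggregations"]["eids"]["buckets"]]


if __name__ == "__main__":
    # Hand-written response in the shape a terms aggregation returns.
    sample = {
        "aggregations": {
            "eids": {
                "buckets": [
                    {"key": "Q155", "doc_count": 3},
                    {"key": "Q2844", "doc_count": 1},
                ]
            }
        }
    }
    print(distinct_eids(sample))
```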
