ElasticSearch - Range query on min value in array

Given the following mapping:
"item": {
"properties": {
"name": {
"type": "string",
"index": "standard"
},
"state": {
"type": "string",
"index": "not_analyzed"
},
"important_dates": {
"properties": {
"city_id": {
"type": "integer"
},
"important_date": {
"type": "date",
"format": "dateOptionalTime"
}
}
}
}
}
And given the following items in an index:
{
"_id": 1,
"name": "test data 1",
"state": "california",
"important_dates": [
{
"city_id": 100,
"important_date": "2016-01-01T00:00:00"
},
{
"city_id": 200,
"important_date": "2016-05-15T00:00:00"
}
]
},
{
"_id": 2,
"name": "test data 2",
"state": "wisconsin",
"important_dates": [
{
"city_id": 300,
"important_date": "2016-04-10T00:00:00"
},
{
"city_id": 400,
"important_date": "2016-05-20T00:00:00"
}
]
}
Is it possible to do a range filter on important_dates, but only filter using the min date in the important_dates array? Could this also be expanded to only use the date for a specific city if a city_id was given as a parameter?
Example Queries:
If I have a range filter of 4/9/2016 to 5/17/2016 on important_dates, I only want to get back item 2 since the min date in item 1 doesn't fall within the range given.
If I have a range filter of 4/9/2016 to 5/17/2016 on important_dates and pass in city_id 400, I should not get any results.
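One possible approach for the first query (a sketch, not from the original thread: it assumes a recent Elasticsearch version with Painless, and the index name items as well as the epoch-millisecond parameters for 2016-04-09 and 2016-05-17 are illustrative). Doc values for a multi-valued date field are returned in sorted order, so the first value is the minimum date in the array:
GET /items/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "lang": "painless",
            "source": "if (doc['important_dates.important_date'].size() == 0) { return false; } long minMillis = doc['important_dates.important_date'].value.toInstant().toEpochMilli(); return minMillis >= params.from && minMillis <= params.to;",
            "params": {
              "from": 1460160000000,
              "to": 1463443200000
            }
          }
        }
      }
    }
  }
}
For the city-specific variant, important_dates would need to be mapped as nested, because plain object arrays flatten their fields and lose the pairing between city_id and important_date; a nested query could then restrict the range check to the dates of the given city_id.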

Related

How to filter by score in Elastic KNN search?

I have an index with the following mapping:
{
"test-2": {
"mappings": {
"properties": {
"advert_id": {
"type": "integer"
},
"fraud": {
"type": "boolean"
},
"photos": {
"properties": {
"id": {
"type": "integer"
},
"vector": {
"type": "dense_vector",
"dims": 3,
"index": true,
"similarity": "l2_norm"
}
}
},
"rating": {
"type": "long"
}
}
}
}
}
Here is how my data is saved in Elastic:
{
"advert_id": 123,
"fraud": true,
"photos": [
{
"id": 456,
"vector": [
213.32,
3.23,
4.21
]
}
]
}
I want to search for data with similar vectors using the kNN algorithm. Here is my query for that:
GET /test-2/_knn_search
{
"knn": {
"field": "photos.vector",
"k": 1,
"num_candidates": 5,
"query_vector": [213.32, 3.23, 4.22]
}
}
Elasticsearch returns a score for each hit. The question is: how can I get only the hits with a score greater than N? I know about min_score, but I couldn't apply it in this query.
Now that the kNN search API (/_knn_search) has been integrated into the search API (/_search) as of Elasticsearch 8.4.0, we can use the min_score option, as described in the documentation:
- GET /test-2/_knn_search
+ GET /test-2/_search
{
"knn": {
"field": "photos.vector",
"k": 1,
"num_candidates": 5,
"query_vector": [213.32, 3.23, 4.22]
},
+ "min_score": N
}
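For clarity, the complete request after applying that change looks like this (0.9 is just an illustrative threshold standing in for N):
GET /test-2/_search
{
  "knn": {
    "field": "photos.vector",
    "k": 1,
    "num_candidates": 5,
    "query_vector": [213.32, 3.23, 4.22]
  },
  "min_score": 0.9
}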

Elastic Search Wildcard query with space failing 7.11

My data is indexed in Elasticsearch 7.11. This is the mapping I got when I added documents directly to my index:
{"properties":{"name":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}
I haven't added the keyword part and have no idea where it came from.
I am running a wildcard query on this field, but I am unable to get results for terms containing spaces.
{
"query": {
"bool":{
"should":[
{"wildcard": {"name":"*hello world*"}}
]
}
}
}
I have seen many answers related to not_analyzed, and I have tried updating {"index": "true"} in the mapping, but with no help. How can I make the wildcard search work in this version of Elasticsearch?
I tried adding the wildcard field type:
PUT http://localhost:9001/indexname/_mapping
{
"properties": {
"name": {
"type" :"wildcard"
}
}
}
And got the following response:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "mapper [name] cannot be changed from type [text] to [wildcard]"
}
],
"type": "illegal_argument_exception",
"reason": "mapper [name] cannot be changed from type [text] to [wildcard]"
},
"status": 400
}
Here is a sample document to match:
{
"_index": "accelerators",
"_type": "_doc",
"_id": "602ec047a70f7f30bcf75dec",
"_score": 1.0,
"_source": {
"acc_id": "602ec047a70f7f30bcf75dec",
"name": "hello world example",
"type": "Accelerator",
"description": "khdkhfk ldsjl klsdkl",
"teamMembers": [
{
"userId": "karthik.r#gmail.com",
"name": "Karthik Ganesh R",
"shortName": "KR",
"isOwner": true
},
{
"userId": "anand.sajan#gmail.com",
"name": "Anand Sajan",
"shortName": "AS",
"isOwner": false
}
],
"sectorObj": [
{
"item_id": 14,
"item_text": "Cross-sector"
}
],
"geographyObj": [
{
"item_id": 4,
"item_text": "Global"
}
],
"technologyObj": [
{
"item_id": 1,
"item_text": "Artificial Intelligence"
}
],
"themeColor": 1,
"mainImage": "assets/images/Graphics/Asset 35.svg",
"features": [
{
"name": "Ideation",
"icon": "Asset 1007.svg"
},
{
"name": "Innovation",
"icon": "Asset 1044.svg"
},
{
"name": "Strategy",
"icon": "Asset 1129.svg"
},
{
"name": "Intuitive",
"icon": "Asset 964.svg"
}
],
"logo": {
"actualFileName": "",
"fileExtension": "",
"fileName": "",
"fileSize": 0,
"fileUrl": ""
},
"customLogo": {
"logoColor": "#B9241C",
"logoText": "EC",
"logoTextColor": "#F6F6FA"
},
"collaborators": [
{
"userId": "muhammed.arif#gmail.com",
"name": "muhammed Arif P T",
"shortName": "MA"
},
{
"userId": "anand.sajan#gmail.com",
"name": "Anand Sajan",
"shortName": "AS"
}
],
"created_date": "2021-02-18T19:30:15.238000Z",
"modified_date": "2021-03-11T11:45:49.583000Z"
}
}
You cannot modify a field mapping once created. However, you can create another sub-field of type wildcard, like this:
PUT http://localhost:9001/indexname/_mapping
{
"properties": {
"name": {
"type": "text",
"fields": {
"wildcard": {
"type" :"wildcard"
},
"keyword": {
"type" :"keyword",
"ignore_above":256
}
}
}
}
}
When the mapping is updated, you need to reindex your data so that the new field gets indexed, like this:
POST http://localhost:9001/indexname/_update_by_query
And then when this finishes, you'll be able to query on this new field like this:
{
"query": {
"bool": {
"should": [
{
"wildcard": {
"name.wildcard": "*hello world*"
}
}
]
}
}
}
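As an explanatory aside (not part of the original answer): the reason the original wildcard query on name fails for terms with spaces is that the text field is analyzed with the standard analyzer, so "hello world example" is indexed as the separate tokens hello, world, and example, and the pattern *hello world* is matched against each token individually. You can verify the tokenization with the _analyze API:
POST indexname/_analyze
{
  "analyzer": "standard",
  "text": "hello world example"
}
The wildcard and keyword sub-fields avoid this problem because they index the whole value as a single term.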

Elastic search array of objects nested range aggregation

I'm trying to run a range aggregation on the following data set:
{
"ProductType": 1,
"ProductDefinition": "fc588f8e-14f2-4871-891f-c73a4e3d17ca",
"ParentProduct": null,
"Sku": "074617",
"VariantSku": null,
"Name": "Paraboot Avoriaz/Jannu Marron Brut Marron Brown Hiking Boot Shoes",
"AllowOrdering": true,
"Rating": null,
"ThumbnailImageUrl": "/media/1106/074617.jpg",
"PrimaryImageUrl": "/media/1106/074617.jpg",
"Categories": [
"399d7b20-18cc-46c0-b63e-79eadb9390c7"
],
"RelatedProducts": [],
"Variants": [
"84a7ff9f-edf0-4aab-87f9-ba4efd44db74",
"e2eb2c50-6abc-4fbe-8fc8-89e6644b23ef",
"a7e16ccc-c14f-42f5-afb2-9b7d9aefbc5c"
],
"PriceGroups": [
"86182755-519f-4e05-96ef-5f93a59bbaec"
],
"DisplayName": "Paraboot Avoriaz/Jannu Marron Brut Marron Brown Hiking Boot Shoes",
"ShortDescription": "",
"LongDescription": "<ul><li>Paraboot Avoriaz Mountaineering Boots</li><li>Marron Brut Marron (Brown)</li><li>Full leather inners and uppers</li><li>Norwegien Welted Commando Sole</li><li>Hand made in France</li><li>Style number : 074617</li></ul><p>As featured on Pritchards.co.uk</p>",
"UnitPrices": {
"EUR 15 pct": 343.85
},
"Taxes": {
"EUR 15 pct": 51.5775
},
"PricesInclTax": {
"EUR 15 pct": 395.4275
},
"Slug": "paraboot-avoriazjannu-marron-brut-marron-brown-hiking-boot-shoes",
"VariantsProperties": [
{
"Key": "ShoeSize",
"Value": "8"
},
{
"Key": "ShoeSize",
"Value": "10"
},
{
"Key": "ShoeSize",
"Value": "6"
}
],
"Guid": "0d4f6899-c66a-4416-8f5d-26822c3b57ae",
"Id": 178,
"ShowOnHomepage": true
}
I'm aggregating on VariantsProperties, which has the following mapping:
"VariantsProperties": {
"type": "nested",
"properties": {
"Key": {
"type": "keyword"
},
"Value": {
"type": "keyword"
}
}
}
Terms aggregations work fine with the following code:
{
"aggs": {
"Nest": {
"nested": {
"path": "VariantsProperties"
},
"aggs": {
"fieldIds": {
"terms": {
"field": "VariantsProperties.Key"
},
"aggs": {
"values": {
"terms": {
"field": "VariantsProperties.Value"
}
}
}
}
}
}
}
}
However, when I try to do a range aggregation to get shoes with sizes between 8 and 12, such as:
{
"aggs": {
"Nest": {
"nested": {
"path": "VariantsProperties"
},
"aggs": {
"fieldIds": {
"range": {
"field": "VariantsProperties.Value",
"ranges": [ { "from": 8, "to": 12 }]
}
}
}
}
}
}
I get the following error:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Field [VariantsProperties.Value] of type [keyword] is not supported for aggregation [range]"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "product-avenueproductindexdefinition-24476f82-en-us",
"node": "ejgN4XecT1SUfgrhzP8uZg",
"reason": {
"type": "illegal_argument_exception",
"reason": "Field [VariantsProperties.Value] of type [keyword] is not supported for aggregation [range]"
}
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Field [VariantsProperties.Value] of type [keyword] is not supported for aggregation [range]",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Field [VariantsProperties.Value] of type [keyword] is not supported for aggregation [range]"
}
}
},
"status": 400
}
Is there a way to "transform" the terms aggregation into a range aggregation, without the need of changing the schema? I know I could build the ranges myself by extracting the data from the terms aggregation and building the ranges out of it, however, I would prefer a solution within the elastic itself.
There are two ways to solve this:
Option A: Use a script instead of a field. This option will work without having to reindex your data, but depending on your volume of data, the performance might suffer.
POST test/_search
{
"aggs": {
"Nest": {
"nested": {
"path": "VariantsProperties"
},
"aggs": {
"fieldIds": {
"range": {
"script": "Integer.parseInt(doc['VariantsProperties.Value'].value)",
"ranges": [
{
"from": 8,
"to": 12
}
]
}
}
}
}
}
}
Option B: Add an integer sub-field in your mapping.
PUT my-index/_mapping
{
"properties": {
"VariantsProperties": {
"type": "nested",
"properties": {
"Key": {
"type": "keyword"
},
"Value": {
"type": "keyword",
"fields": {
"numeric": {
"type": "integer",
"ignore_malformed": true
}
}
}
}
}
}
}
Once your mapping is modified, you can run _update_by_query on your index in order to reindex the VariantsProperties.Value data:
POST my-index/_update_by_query
Finally, when this last command is done, you can run the range aggregation on the VariantsProperties.Value.numeric field.
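That final aggregation is the same as the failing one from the question, just pointed at the new sub-field:
{
  "aggs": {
    "Nest": {
      "nested": {
        "path": "VariantsProperties"
      },
      "aggs": {
        "fieldIds": {
          "range": {
            "field": "VariantsProperties.Value.numeric",
            "ranges": [ { "from": 8, "to": 12 } ]
          }
        }
      }
    }
  }
}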
Also note that this second option will be more performant in the long term.

Elasticsearch: Why can't I use "5m" for precision in context queries?

I'm running on Elasticsearch 5.5
I have a document with the following mapping
"mappings": {
"shops": {
"properties": {
"locations": {
"type": "geo_point"
},
"name": {
"type": "keyword"
},
"suggest": {
"type": "completion",
"contexts": [
{
"name": "location",
"type": "GEO",
"precision": "10m",
"path": "locations"
}
]
}
}
}
}
I'll add a document as follows:
PUT my_index/shops
{
"name":"random shop",
"suggest":{
"input":"random shop"
},
"locations":[
{
"lat":42.38471212,
"lon":-71.12612357
}
]
}
I try to query for the document with the following JSON call:
GET my_shops/_search
{
"suggest": {
"result": {
"prefix": "random",
"completion": {
"field": "suggest",
"size": 5,
"fuzzy": true,
"contexts": {
"location": [{
"lat": 42.38471212,
"lon": -71.12612357,
"precision": "10mi"
}]
}
}
}
}
}
I get the following errors:
(error screenshot omitted; original image source: discourse.org)
But when I change the "precision" field to an int, I get the intended search results.
I'm confused on two fronts.
Why is there a context error? The documentation seems to say that this is ok
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/suggester-context.html
Why can't I use string values for the precision values?
At the bottom of the page, I see that the precision values can take either distances or numeric values.

Extract record from multiple arrays based on a filter

I have documents in Elasticsearch with the following structure:
"_source": {
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"price": [
"€ 139",
"€ 125",
"€ 120",
"€ 108"
],
"max_occupancy": [
2,
2,
1,
1
],
"type": [
"Type 1",
"Type 1 - (Tag)",
"Type 2",
"Type 2 (Tag)",
],
"availability": [
10,
10,
10,
10
],
"size": [
"26 m²",
"35 m²",
"47 m²",
"31 m²"
]
}
}
Basically, the detail records are split across 5 arrays, and fields of the same record share the same index position in each of the 5 arrays. As can be seen in the example data, there are 5 arrays (price, max_occupancy, type, availability, size) containing values that belong to the same element. I want to extract the element whose max_occupancy is greater than or equal to 2 (if there is no record with 2, grab a 3; if there is no 3, grab a 4; ...), with the lowest price (in this case the second record), and place the result into a new JSON object like the following:
{
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"price: ": "€ 125",
"max_occupancy": "2",
"type": "Type 1 - (Tag)",
"availability": 10,
"size": "35 m²"
}
Basically, the result structure should show the extracted record (in this case, the one at the second index of every array) and add the general information to it (fields: "last_updated", "country").
Is it possible to extract such a result from Elasticsearch? What kind of query do I need to perform?
Could someone suggest the best approach?
My best approach: go nested with the Nested datatype.
Besides easier querying, it is easier to read and understand the connections between those objects, which are currently scattered across different arrays.
Yes, if you decide on this approach, you will have to edit your mapping and reindex your entire data set.
What would the mapping look like? Something like this:
{
"mappings": {
"properties": {
"last_updated": {
"type": "date"
},
"country": {
"type": "string"
},
"records": {
"type": "nested",
"properties": {
"price": {
"type": "string"
},
"max_occupancy": {
"type": "long"
},
"type": {
"type": "string"
},
"availability": {
"type": "long"
},
"size": {
"type": "string"
}
}
}
}
}
}
EDIT: New document structure (containing nested documents) -
{
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"records": [
{
"price": "€ 139",
"max_occupancy": 2,
"type": "Type 1",
"availability": 10,
"size": "26 m²"
},
{
"price": "€ 125",
"max_occupancy": 2,
"type": "Type 1 - (Tag)",
"availability": 10,
"size": "35 m²"
},
{
"price": "€ 120",
"max_occupancy": 1,
"type": "Type 2",
"availability": 10,
"size": "47 m²"
},
{
"price": "€ 108",
"max_occupancy": 1,
"type": "Type 2 (Tag)",
"availability": 10,
"size": "31 m²"
}
]
}
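One way to migrate the existing flat documents into this nested structure (a sketch, not part of the original answer: old-index and new-index are placeholder names, new-index is assumed to have been created with the nested mapping above, and all five arrays are assumed to have the same length) is the _reindex API with a small Painless script that zips the parallel arrays into records:
POST _reindex
{
  "source": { "index": "old-index" },
  "dest": { "index": "new-index" },
  "script": {
    "lang": "painless",
    "source": "def src = ctx._source; def recs = []; for (int i = 0; i < src['price'].size(); i++) { recs.add(['price': src['price'][i], 'max_occupancy': src['max_occupancy'][i], 'type': src['type'][i], 'availability': src['availability'][i], 'size': src['size'][i]]); } src['records'] = recs; src.remove('price'); src.remove('max_occupancy'); src.remove('type'); src.remove('availability'); src.remove('size');"
  }
}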
Now it's easier to query for any specific condition with a nested query and inner hits. For example:
{
"_source": [
"last_updated",
"country"
],
"query": {
"bool": {
"must": [
{
"term": {
"country": "Italia"
}
},
{
"nested": {
"path": "records",
"query": {
"bool": {
"must": [
{
"range": {
"records.max_occupancy": {
"gte": 2
}
}
}
]
}
},
"inner_hits": {
"sort": {
"records.price": "asc"
},
"size": 1
}
}
}
]
}
}
}
The conditions are: Italia AND max_occupancy >= 2.
Inner hits: sort by price in ascending order and take the first result.
Hope you'll find it useful
