Accessing nested property in Elasticsearch distance script

My index in Elasticsearch has the following mapping:
"couchbaseDocument": {
  "properties": {
    "doc": {
      "properties": {
        "properties": {
          "properties": {
            "location": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}
The source document is as follows:
{"properties" : {"location": "43.706596,-79.4030464"}}
I am trying to use the distance script to calculate the distance based on geo-points. I found this post, Return distance in elasticsearch results?, to help me out. I am trying to get all results, filter by a 1 km radius, get the distance, and sort on the geo_point. The query is constructed as follows:
{
  "query": {
    "match_all": {}
  },
  "filter": {
    "geo_distance": {
      "distance": "1km",
      "doc.properties.location": {
        "lat": 43.710323,
        "lon": -79.395284
      }
    }
  },
  "script_fields": {
    "distancePLANE": {
      "params": {
        "lat": 43.710323,
        "lon": -79.395284
      },
      "script": "doc[properties]['location'].distanceInKm(lat, lon)"
    },
    "distanceARC": {
      "params": {
        "lat": 43.710323,
        "lon": -79.395284
      },
      "script": "doc[properties]['location'].arcDistanceInKm(lat,lon)"
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "doc.properties.location": [-79.395284, 43.710323],
        "order": "desc",
        "unit": "km"
      }
    }
  ],
  "track_scores": true
}
I get the following error with status 500:
"PropertyAccessException[[Error: could not access: properties; in class: org.elasticsearch.search.lookup.DocLookup]\n[Near : {... doc[properties]['location'].distan ....}]\n ^\n[Line: 1, Column: 5]]"
I tried rewriting the query in this way:
..."script": "doc['properties']['location'].arcDistanceInKm(lat,lon)"...
Then I get this error:
"CompileException[[Error: No field found for [properties] in mapping with types [couchbaseDocument]]\n[Near : {... doc['properties']['location']. ....}]\n ^\n[Line: 1, Column: 1]]; nested: ElasticSearchIllegalArgumentException[No field found for [properties] in mapping with types [couchbaseDocument]]; "
When I remove the script part from the query altogether, the sorting and filtering work just fine. Is there a different way to access nested fields when using scripts? Any insights would be really appreciated!
Thank you!

Managed to get it done with
"script" : "doc.last_location.distance(41.12, -71.34)"
Don't know why but doc['last_location'] does not seem to work at all!
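For what it's worth, doc values in scripts are keyed by the full dotted field path, so a form like the following may also be worth trying against the mapping above (an untested sketch, not a confirmed fix):
"script": "doc['doc.properties.location'].arcDistanceInKm(lat, lon)"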

As mentioned in my comment, when you sort by _geo_distance, the "sort" value that is returned is the actual distance, so there is no need to do a separate computation. Details here: http://elasticsearch-users.115913.n3.nabble.com/search-by-distance-and-getting-the-actual-distance-td3317140.html#a3936224
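For illustration, each hit in the response then carries the computed distance in its sort values (a sketch of a response fragment; the distance shown is made up):
"hits": [
  {
    "_source": { "properties": { "location": "43.706596,-79.4030464" } },
    "sort": [ 0.69 ]
  }
]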

Related

Elasticsearch custom geo distance filter

From an Elasticsearch query I'd like to retrieve all the points within a variable distance.
Let's say I have two shops, one willing to deliver at a maximum of 3 km and the other at a maximum of 5 km:
PUT /my_shops/_doc/1
{
  "location": {
    "lat": 40.12,
    "lon": -71.34
  },
  "max_delivery_distance": 3000
}
PUT /my_shops/_doc/2
{
  "location": {
    "lat": 41.12,
    "lon": -72.34
  },
  "max_delivery_distance": 5000
}
For a given location I'd like to know which shops are able to deliver, i.e. the query should return shop 1 if the given location is within 3 km and shop 2 if it is within 5 km:
GET /my_shops/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": max_delivery_distance,
          "location": {
            "lat": 40,
            "lon": -70
          }
        }
      }
    }
  }
}
There's another way to solve this without scripting (a big performance hog!) and let ES sort it out using native geo shapes.
I would model each document as a circle, with a center location and a (delivery) radius. First, your index mapping should look like this:
PUT /my_shops
{
  "mappings": {
    "properties": {
      "delivery_area": {
        "type": "geo_shape",
        "strategy": "recursive"
      }
    }
  }
}
Then your documents need to have the following form:
PUT /my_shops/_doc/1
{
  "delivery_area" : {
    "type" : "circle",
    "coordinates" : [-71.34, 40.12],
    "radius" : "3000m"
  }
}
PUT /my_shops/_doc/2
{
  "delivery_area" : {
    "type" : "circle",
    "coordinates" : [-72.34, 41.12],
    "radius" : "5000m"
  }
}
And finally the query simply becomes a geo_shape query looking at intersections between a delivery point and the delivery area of each shop.
GET /my_shops/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_shape": {
          "delivery_area": {
            "shape": {
              "type": "point",
              "coordinates": [ -70, 40 ]
            },
            "relation": "contains"
          }
        }
      }
    }
  }
}
That's it! No scripting, just geo operations.
I think that you need to work with a script in order to use another field as a parameter. After some research I came to this answer:
GET my_shops/_search
{
  "query": {
    "script": {
      "script": {
        "params": {
          "location": {
            "lat": 40,
            "lon": -70
          }
        },
        "source": """
          return doc['location'].arcDistance(params.location.lat, params.location.lon) <= doc['max_delivery_distance'].value"""
      }
    }
  }
}
Basically, we exploit the fact that the classes related to geo points are whitelisted in Painless (https://github.com/elastic/elasticsearch/pull/40180/) and that scripts accept additional parameters (your fixed location).
According to its documentation, arcDistance returns the distance in meters, so it can be compared directly against max_delivery_distance, which is also expressed in meters (no unit conversion needed).
Additional Note
I assume that location and max_delivery_distance are always defined for each document. If that is not the case, you need to handle missing values, as sketched below.
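A minimal sketch of such a guard in Painless, assuming that documents with a missing location or max_delivery_distance should simply be excluded:
GET my_shops/_search
{
  "query": {
    "script": {
      "script": {
        "params": { "location": { "lat": 40, "lon": -70 } },
        "source": """
          // exclude documents where either field is missing
          if (doc['location'].size() == 0 || doc['max_delivery_distance'].size() == 0) {
            return false;
          }
          return doc['location'].arcDistance(params.location.lat, params.location.lon) <= doc['max_delivery_distance'].value"""
      }
    }
  }
}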
Reference
Another related question
https://github.com/elastic/elasticsearch/pull/40180/

How to correctly query inside of terms aggregate values in elasticsearch, using include and regex?

How do you filter out/search in aggregate results efficiently?
Imagine you have 1 million documents in elastic search. In those documents, you have a multi_field (keyword, text) tags:
{
  ...
  tags: ['Race', 'Racing', 'Mountain Bike', 'Horizontal'],
  ...
},
{
  ...
  tags: ['Tracey Chapman', 'Silverfish', 'Blue'],
  ...
},
{
  ...
  tags: ['Surfing', 'Race', 'Disgrace'],
  ...
},
You can use these values as filters, (facets), against a query to pull only the documents that contain this tag:
...
"filter": [
{
"terms": {
"tags": [
"Race"
]
}
},
...
]
But you want the user to be able to query for possible tag filters. So if the user types race, the results should show (from the previous example) ['Race', 'Tracey Chapman', 'Disgrace']. That way, the user can query for a filter to use. In order to accomplish this, I had to use aggregations:
{
  "aggs": {
    "topics": {
      "terms": {
        "field": "tags",
        "include": ".*[Rr][Aa][Cc][Ee].*", // I have to dynamically form this
        "size": 6
      }
    }
  },
  "size": 0
}
This gives me exactly what I need! But it is slow, very slow. I've tried adding the execution_hint; it does not help me.
You may think, "Just use a query before the aggregate!" But the issue is that it'll pull all values for all documents in that query, meaning you can end up displaying tags that are completely unrelated. If I queried for race before the aggregate and did not use the include regex, I would end up with all those other values, like 'Horizontal', etc.
How can I rewrite this aggregation to work faster? Is there a better way to write this? Do I really have to make a separate index just for values? (sad face) It seems like this would be a common issue, but I have found no answers through documentation and googling.
You certainly don't need a separate index just for the values...
Here's my take on it:
What you're doing with the regex is essentially what should've been done by a tokenizer -- i.e. constructing substrings (or N-grams) such that they can be targeted later.
This means that the keyword Race will need to be tokenized into the n-grams ["rac", "race", "ace"]. (It doesn't really make sense to go any lower than 3 characters -- most autocomplete libraries choose to ignore fewer than 3 characters because the possible matches balloon too quickly.)
Elasticsearch offers the N-gram tokenizer but we'll need to increase the default index-level setting called max_ngram_diff from 1 to (arbitrarily) 10 because we want to catch as many ngrams as is reasonable:
PUT tagindex
{
  "settings": {
    "index": {
      "max_ngram_diff": 10
    },
    "analysis": {
      "analyzer": {
        "my_ngrams_analyzer": {
          "tokenizer": "my_ngrams",
          "filter": [ "lowercase" ]
        }
      },
      "tokenizer": {
        "my_ngrams": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 10,
          "token_chars": [ "letter", "digit" ]
        }
      }
    }
  },
  "mappings": { ... }    <-- see below
}
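To sanity-check the analyzer, you can run it through the _analyze API once the index exists (a quick verification sketch using the settings above):
POST tagindex/_analyze
{
  "analyzer": "my_ngrams_analyzer",
  "text": "Race"
}
This should return the tokens rac, race, and ace, matching the n-grams described above.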
When your tags field is a list of keywords, it's simply not possible to aggregate on that field without resorting to the include option, which can take either exact matches or a regex (which you're already using). Now, we cannot guarantee exact matches, but we also don't want to use a regex! That's why we need to use a nested list, which will treat each tag separately.
Now, nested lists are expected to contain objects, so
{
  "tags": ["Race", "Racing", "Mountain Bike", "Horizontal"]
}
will need to be converted to
{
  "tags": [
    { "tag": "Race" },
    { "tag": "Racing" },
    { "tag": "Mountain Bike" },
    { "tag": "Horizontal" }
  ]
}
After that we'll proceed with the multi-field mapping, keeping the original tags intact but also adding a .tokenized field to search on and a .keyword field to aggregate on:
"index": { ... },
"analysis": { ... },
"mappings": {
"properties": {
"tags": {
"type": "nested",
"properties": {
"tag": {
"type": "text",
"fields": {
"tokenized": {
"type": "text",
"analyzer": "my_ngrams_analyzer"
},
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
We'll then add our adjusted tags docs:
POST tagindex/_doc
{"tags":[{"tag":"Race"},{"tag":"Racing"},{"tag":"Mountain Bike"},{"tag":"Horizontal"}]}
POST tagindex/_doc
{"tags":[{"tag":"Tracey Chapman"},{"tag":"Silverfish"},{"tag":"Blue"}]}
POST tagindex/_doc
{"tags":[{"tag":"Surfing"},{"tag":"Race"},{"tag":"Disgrace"}]}
and apply a filtered terms aggregation inside a nested aggregation:
GET tagindex/_search
{
  "aggs": {
    "topics_parent": {
      "nested": {
        "path": "tags"
      },
      "aggs": {
        "topics": {
          "filter": {
            "term": {
              "tags.tag.tokenized": "race"
            }
          },
          "aggs": {
            "topics": {
              "terms": {
                "field": "tags.tag.keyword",
                "size": 100
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}
yielding
{
  ...
  "topics_parent" : {
    ...
    "topics" : {
      ...
      "topics" : {
        ...
        "buckets" : [
          {
            "key" : "Race",
            "doc_count" : 2
          },
          {
            "key" : "Disgrace",
            "doc_count" : 1
          },
          {
            "key" : "Tracey Chapman",
            "doc_count" : 1
          }
        ]
      }
    }
  }
}
Caveats
In order for this to work, you'll have to reindex.
N-grams will increase the storage footprint -- depending on how many tags per doc you have, it may become a concern.
Nested fields are internally treated as "separate documents", so this affects the disk space too.
P.S.: This is an interesting use case. Let me know how the implementation went!

Elasticsearch Mapping - Rename existing field

Is there any way I can rename a field in an existing Elasticsearch mapping without having to add a new field?
If so, what's the best way to do it in order to avoid breaking the existing mapping?
E.g. from fieldCamelcase to fieldCamelCase:
{
  "myType": {
    "properties": {
      "timestamp": {
        "type": "date",
        "format": "date_optional_time"
      },
      "fieldCamelcase": {
        "type": "string",
        "index": "not_analyzed"
      },
      "field_test": {
        "type": "double"
      }
    }
  }
}
You could do this by creating an ingest pipeline that contains a rename processor, in combination with the Reindex API.
PUT _ingest/pipeline/my_rename_pipeline
{
  "description" : "describe pipeline",
  "processors" : [
    {
      "rename": {
        "field": "fieldCamelcase",
        "target_field": "fieldCamelCase"
      }
    }
  ]
}
POST _reindex
{
  "source": {
    "index": "source"
  },
  "dest": {
    "index": "dest",
    "pipeline": "my_rename_pipeline"
  }
}
Note that you need to be running Elasticsearch 5.x in order to use ingest. If you're running < 5.x then you'll have to go with what @Val mentioned in his comment :)
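Before reindexing, the pipeline can be dry-run against a sample document with the simulate API (a quick sanity-check sketch, reusing the pipeline defined above):
POST _ingest/pipeline/my_rename_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "fieldCamelcase": "some value"
      }
    }
  ]
}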
Renaming a field in ES (version > 5, where the missing query has been removed) using the _update_by_query API:
Example:
POST http://localhost:9200/INDEX_NAME/_update_by_query
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "NEW_FIELD_NAME"
        }
      }
    }
  },
  "script" : {
    "inline": "ctx._source.NEW_FIELD_NAME = ctx._source.OLD_FIELD_NAME; ctx._source.remove(\"OLD_FIELD_NAME\");"
  }
}
First of all, you must understand how Elasticsearch and Lucene store data: in immutable segments (you can easily read about this online).
So any solution will either remove/create documents and change the mapping, or create a new index and thus a new mapping as well.
The easiest way is to use the update by query API: https://www.elastic.co/guide/en/elasticsearch/reference/2.4/docs-update-by-query.html
POST /XXXX/_update_by_query
{
  "query": {
    "missing": {
      "field": "fieldCamelCase"
    }
  },
  "script" : {
    "inline": "ctx._source.fieldCamelCase = ctx._source.fieldCamelcase; ctx._source.remove(\"fieldCamelcase\");"
  }
}
Starting with ES 6.4 you can use "Field Aliases", which provide the functionality you're looking for with close to zero work or resources; see the sketch below.
Do note that aliases can only be used for searching, not for indexing new documents.
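A minimal sketch of such an alias, assuming a hypothetical index name myindex and the field names from the question (7.x typeless syntax; the alias points at the existing field, so no reindex is needed):
PUT myindex/_mapping
{
  "properties": {
    "fieldCamelCase": {
      "type": "alias",
      "path": "fieldCamelcase"
    }
  }
}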

Looking for someone to help me with ElasticSearch

I'm a beginner in Elasticsearch. I'm trying to test whether a geopoint (lat/lon) exists in a list of geopoints.
For example, I give this geopoint:
"lat": 49.01536940596998
"lon": 2.4967825412750244
and I want to test whether this point exists in the list below. Thanks.
"positions": [
{
"millis": 12959023,
"lat": 49.01525113731623,
"lon": 2.4971945118159056,
"rawX": -3754,
"rawY": 605,
"rawVx": 0,
"rawVy": 0,
"speed": 9.801029291617944,
"accel": 0.09442740907572084,
"grounded": true
},
{
"millis": 12959914,
"lat": 49.01536940596998,
"lon": 2.4967825412750244,
"rawX": -3784,
"rawY": 619,
"rawVx": -15,
"rawVy": 7,
"speed": 10.841861737855924,
"accel": -0.09534648619563282,
"grounded": true
}
...
}
To be able to search in an array of objects, you need to use the nested data type. As the linked page explains, to keep the elements of the array independent of each other, you cannot use the default object mapping; first, you will have to update the mapping.
Note: Mappings only take effect on new indexes. Reference.
PUT YOUR_INDEX
{
  "mappings": {
    "YOUR_TYPE": {
      "properties": {
        "positions": {
          "type": "nested"
        }
      }
    }
  }
}
Now we can query the data. You're looking for a bool query, which combines other queries (in your case, term queries). Note that inside the nested query, fields are referenced by their full path (positions.lat, positions.lon):
POST _search
{
  "query": {
    "nested": {
      "path": "positions",
      "query": {
        "bool" : {
          "must" : [
            { "term" : { "positions.lat": 49.01536940596998 } },
            { "term" : { "positions.lon": 2.4967825412750244 } }
          ]
        }
      }
    }
  }
}

How to sort items by array size in ElasticSearch?

I have 3 million items with this structure:
{
  "id": "some_id",
  "title": "some_title",
  "photos": [
    {...},
    {...},
    ...
  ]
}
Some items may have an empty photos field:
{
  "id": "some_id",
  "title": "some_title",
  "photos": []
}
I want to sort by the number of photos so that elements without photos end up at the end of the list.
I have one working solution, but it's very slow on 3 million items:
GET myitems/_search
{
  "filter": {
    ...some filters...
  },
  "sort": [
    {
      "_script": {
        "script": "_source.photos.size()",
        "type": "number",
        "order": "desc"
      }
    }
  ]
}
This query takes 55 seconds to execute. How can I optimize it?
As suggested in the comments, adding a new field with the number of photos would be the way to go. There's a way to achieve this without reindexing all your data by using the update by query plugin.
Basically, after installing the plugin, you can run the following query and all your documents will get that new field. However, make sure that your indexing process also populates that new field in the new documents:
curl -XPOST 'localhost:9200/myitems/_update_by_query' -d '{
  "query" : {
    "match_all" : {}
  },
  "script" : "ctx._source.nb_photos = ctx._source.photos.size();"
}'
After this has run, you'll be able to sort your results simply with:
"sort": {"nb_photos": "desc"}
Note: for this plugin to work, scripting needs to be enabled. That is already the case for you, since you were able to use a sort script, but I'm mentioning it for completeness' sake.
The problem was solved with the transform directive. Now I have this mapping:
PUT /myitems/_mapping/lol
{
  "lol" : {
    "transform": {
      "lang": "groovy",
      "script": "ctx._source['has_photos'] = ctx._source['photos'].size() > 0"
    },
    "properties" : {
      ... fields ...
      "photos" : { "type": "object" },
      "has_photos": { "type": "boolean" },
      ... fields ...
    }
  }
}
Now I can sort items by the existence of photos:
GET /test/_search
{
  "sort": [
    {
      "has_photos": {
        "order": "desc"
      }
    }
  ]
}
Unfortunately, this requires a full reindex.
