Looking for someone to help me with ElasticSearch

I'm a beginner with Elasticsearch. I'm trying to test whether a geopoint (lat/lon) exists in a list of geopoints.
For example, given this geopoint:
"lat": 49.01536940596998
"lon": 2.4967825412750244
I want to test whether this point exists in the list below. Thanks.
"positions": [
{
"millis": 12959023,
"lat": 49.01525113731623,
"lon": 2.4971945118159056,
"rawX": -3754,
"rawY": 605,
"rawVx": 0,
"rawVy": 0,
"speed": 9.801029291617944,
"accel": 0.09442740907572084,
"grounded": true
},
{
"millis": 12959914,
"lat": 49.01536940596998,
"lon": 2.4967825412750244,
"rawX": -3784,
"rawY": 619,
"rawVx": -15,
"rawVy": 7,
"speed": 10.841861737855924,
"accel": -0.09534648619563282,
"grounded": true
}
...
]

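To search in an array of objects, you need to use the nested data type. As the Elasticsearch documentation on the nested type explains, the default (object) mapping flattens the array, so the lat of one element could match together with the lon of another. To keep the elements of the array independent, you cannot use the default mapping. First, you will have to update the mapping.
Note: an existing field cannot be changed from object to nested, so this mapping has to be set on a new index (and your data reindexed into it). The YOUR_TYPE level is only valid up to Elasticsearch 6.x; from 7.0 on, put properties directly under mappings.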
PUT YOUR_INDEX
{
"mappings": {
"YOUR_TYPE": {
"properties": {
"positions": {
"type": "nested"
}
}
}
}
}
Now we can query the data. You're looking for a bool query, which combines other queries (in your case, term queries), wrapped in a nested query so that both terms must match the same array element:
POST _search
{
"query": {
"nested": {
"path": "positions",
"query": {
"bool" : {
"must" : [
{ "term" : { "positions.lat": 49.01536940596998 } },
{ "term" : { "positions.lon": 2.4967825412750244 } }
]
}
}
}
}
}
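If you build this request from application code, here is a minimal Python sketch that produces the same body for any point. The helper name nested_point_query is hypothetical, and the resulting dict is meant to be sent as the body of a search request with whatever client you use. Note that inside a nested query the fields are addressed by their full path (positions.lat), and that exact term matches on floating-point coordinates only work when the query value is identical to the indexed one.

```python
import json


def nested_point_query(lat: float, lon: float, path: str = "positions") -> dict:
    """Build a nested bool query matching any array element with exactly this lat/lon."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        "must": [
                            # Fields inside a nested query use the full path, e.g. positions.lat
                            {"term": {f"{path}.lat": lat}},
                            {"term": {f"{path}.lon": lon}},
                        ]
                    }
                },
            }
        }
    }


body = nested_point_query(49.01536940596998, 2.4967825412750244)
print(json.dumps(body, indent=2))
```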

Related

Elasticsearch custom geo distance filter

From an Elasticsearch query I'd like to retrieve all the points within a variable distance.
Let's say I have 2 shops: one is willing to deliver up to 3 km away and the other up to 5 km away:
PUT /my_shops/_doc/1
{
"location": {
"lat": 40.12,
"lon": -71.34
},
"max_delivery_distance": 3000
}
PUT /my_shops/_doc/2
{
"location": {
"lat": 41.12,
"lon": -72.34
},
"max_delivery_distance": 5000
}
For a given location, I'd like to know which shops are able to deliver, i.e. the query should return shop 1 if the given location is within 3 km of it and shop 2 if the given location is within 5 km of it:
GET /my_shops/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_distance": {
"distance": max_delivery_distance,
"location": {
"lat": 40,
"lon": -70
}
}
}
}
}
}
There's another way to solve this without scripting (scripts can be a big performance hog!) by letting ES sort it out with native geo shapes.
I would model each document as a circle, with a center location and a (delivery) radius. First, your index mapping should look like this:
PUT /my_shops
{
"mappings": {
"properties": {
"delivery_area": {
"type": "geo_shape",
"strategy": "recursive"
}
}
}
}
Then, your documents need to have the following form:
PUT /my_shops/_doc/1
{
"delivery_area" : {
"type" : "circle",
"coordinates" : [-71.34, 40.12],
"radius" : "3000m"
}
}
PUT /my_shops/_doc/2
{
"delivery_area" : {
"type" : "circle",
"coordinates" : [-72.34, 41.12],
"radius" : "5000m"
}
}
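If the shops currently live in the lat/lon plus max_delivery_distance form from the question, a small conversion step produces the circle documents above. This is a sketch with a hypothetical helper name; the main point it shows is that GeoJSON-style coordinates are written as [lon, lat], the reverse of the {"lat", "lon"} object form, and that the radius carries a unit suffix.

```python
def to_delivery_area(lat: float, lon: float, max_delivery_m: int) -> dict:
    """Turn a shop's location and max delivery distance (meters) into a geo_shape circle doc."""
    return {
        "delivery_area": {
            "type": "circle",
            # GeoJSON-style order is [lon, lat], not [lat, lon]
            "coordinates": [lon, lat],
            # Radius as a distance string with an explicit unit
            "radius": f"{max_delivery_m}m",
        }
    }


# (lat, lon, max_delivery_distance in meters), taken from the question
shops = {1: (40.12, -71.34, 3000), 2: (41.12, -72.34, 5000)}
docs = {shop_id: to_delivery_area(*shop) for shop_id, shop in shops.items()}
print(docs[1])
```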
And finally, the query simply becomes a geo_shape query that checks whether each shop's delivery area contains the delivery point.
GET /my_shops/_search
{
"query": {
"bool": {
"filter": {
"geo_shape": {
"delivery_area": {
"shape": {
"type": "point",
"coordinates": [ -70, 40 ]
},
"relation": "contains"
}
}
}
}
}
}
That's it! No scripting, just geo operations.
I think you need a script to use another field as a parameter. After some research, I came up with this answer:
GET my_shops/_search
{
"query": {
"script": {
"script": {
"params": {
"location": {
"lat": 40,
"lon": -70
}
},
"source": """
return doc['location'].arcDistance(params.location.lat, params.location.lon) <= doc['max_delivery_distance'].value"""
}
}
}
}
Basically, we exploit the fact that the classes related to geo points are whitelisted in Painless (https://github.com/elastic/elasticsearch/pull/40180/) and that scripts accept additional parameters (your fixed location).
According to the documentation, arcDistance returns the distance in meters. Since max_delivery_distance is also stored in meters (3000 for 3 km), the two values can be compared directly; you would only divide by 1000 if the limit were stored in km.
Additional Note
I assume that location and max_delivery_distance are defined in every document. If that is not the case, the script has to handle missing values (for example, by checking doc['location'].size() == 0 first).
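To sanity-check which shops a query should return, here is the same comparison done in plain Python: a haversine great-circle distance in meters compared against max_delivery_distance, also in meters, skipping documents with missing fields as the additional note recommends. It assumes a spherical Earth with a mean radius of about 6,371 km, so results may differ marginally from what arcDistance returns; the helper names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius in meters


def arc_distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def shops_that_deliver(shops: dict, lat: float, lon: float) -> list:
    """Return the ids of shops whose max_delivery_distance (meters) covers the point."""
    result = []
    for shop_id, shop in shops.items():
        loc = shop.get("location")
        limit = shop.get("max_delivery_distance")
        if loc is None or limit is None:
            continue  # missing fields: skip this shop instead of failing
        if arc_distance_m(loc["lat"], loc["lon"], lat, lon) <= limit:
            result.append(shop_id)
    return result


shops = {
    1: {"location": {"lat": 40.12, "lon": -71.34}, "max_delivery_distance": 3000},
    2: {"location": {"lat": 41.12, "lon": -72.34}, "max_delivery_distance": 5000},
}
print(shops_that_deliver(shops, 40.13, -71.34))  # → [1]
```

With the point from the question (40, -70), both shops are more than 100 km away, so neither is returned.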
Reference
Another related question
https://github.com/elastic/elasticsearch/pull/40180/

How to correctly query inside of terms aggregate values in elasticsearch, using include and regex?

How do you filter out/search in aggregate results efficiently?
Imagine you have 1 million documents in Elasticsearch. In those documents, you have a multi-field (keyword, text) called tags:
{
...
tags: ['Race', 'Racing', 'Mountain Bike', 'Horizontal'],
...
},
{
...
tags: ['Tracey Chapman', 'Silverfish', 'Blue'],
...
},
{
...
tags: ['Surfing', 'Race', 'Disgrace'],
...
},
You can use these values as filters (facets) in a query to pull only the documents that contain a given tag:
...
"filter": [
{
"terms": {
"tags": [
"Race"
]
}
},
...
]
But you want the user to be able to search for possible tag filters. So if the user types race, the result should show (from the previous example) ['Race', 'Tracey Chapman', 'Disgrace']. That way, the user can search for a filter to use. To accomplish this, I had to use aggregations:
{
"aggs": {
"topics": {
"terms": {
"field": "tags",
"include": ".*[Rr][Aa][Cc][Ee].*", // I have to dynamically form this
"size": 6
}
}
},
"size": 0
}
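The comment in the request above says the include pattern has to be built dynamically. A minimal Python sketch of that step turns each letter into a [Xx] character class and escapes characters that Lucene's regular expression syntax treats as operators (the reserved set below is an assumption based on Lucene's regexp syntax). It only builds the pattern; it does nothing about the speed problem.

```python
# Characters treated as operators by Lucene regexps (assumed list)
LUCENE_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def case_insensitive_include(term: str) -> str:
    """Build a terms-aggregation `include` regex matching `term` anywhere, ignoring case."""
    parts = []
    for ch in term:
        if ch.lower() != ch.upper():
            # Cased letters become a two-character class, e.g. r -> [Rr]
            parts.append(f"[{ch.upper()}{ch.lower()}]")
        elif ch in LUCENE_RESERVED:
            parts.append("\\" + ch)  # escape regex operators
        else:
            parts.append(ch)
    return ".*" + "".join(parts) + ".*"


print(case_insensitive_include("race"))  # → .*[Rr][Aa][Cc][Ee].*
```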
This gives me exactly what I need! But it is slow, very slow. I've tried adding an execution_hint, but it does not help.
You may think, "Just use a query before the aggregation!" The issue is that the aggregation then returns all tag values of every document matched by that query, so you can end up displaying tags that are completely unrelated. If I queried for race before the aggregation and did not use the include regex, I would end up with all those other values, like 'Horizontal', etc.
How can I rewrite this aggregation to run faster? Is there a better way to write it? Do I really have to make a separate index just for the values? (sad face) This seems like a common issue, but I have found no answers in the documentation or by googling.
You certainly don't need a separate index just for the values...
Here's my take on it:
What you're doing with the regex is essentially what should've been done by a tokenizer -- i.e. constructing substrings (or N-grams) such that they can be targeted later.
This means that the keyword Race will need to be tokenized into the n-grams ["rac", "race", "ace"]. (It doesn't really make sense to go any lower than 3 characters -- most autocomplete libraries choose to ignore fewer than 3 characters because the possible matches balloon too quickly.)
Elasticsearch offers the N-gram tokenizer but we'll need to increase the default index-level setting called max_ngram_diff from 1 to (arbitrarily) 10 because we want to catch as many ngrams as is reasonable:
PUT tagindex
{
"settings": {
"index": {
"max_ngram_diff": 10
},
"analysis": {
"analyzer": {
"my_ngrams_analyzer": {
"tokenizer": "my_ngrams",
"filter": [ "lowercase" ]
}
},
"tokenizer": {
"my_ngrams": {
"type": "ngram",
"min_gram": 3,
"max_gram": 10,
"token_chars": [ "letter", "digit" ]
}
}
}
},
"mappings": { ... }  // see below
}
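To get a feel for which tokens this tokenizer produces, here is a rough Python approximation of the ngram tokenizer (min_gram 3, max_gram 10, token_chars letter and digit) followed by the lowercase filter. It is an illustration only; the real tokenizer may differ in edge cases, and running the analyzer through the _analyze API is the authoritative check.

```python
import re


def ngram_tokens(text: str, min_gram: int = 3, max_gram: int = 10) -> list:
    """Approximate the my_ngrams tokenizer plus the lowercase filter."""
    tokens = []
    # token_chars letter/digit: split the input on anything that is not a letter or digit
    for word in re.findall(r"[^\W_]+", text):
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(word):
                    break
                tokens.append(word[start:start + size].lower())
    return tokens


print(ngram_tokens("Race"))  # → ['rac', 'race', 'ace']
```

This also shows why "Tracey Chapman" and "Disgrace" will match race while "Racing" will not: only the first two contain race as a substring of a single word.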
When your tags field is a plain list of keywords, the only way to narrow down the aggregated values is the include option, which accepts either exact values or a regex (which you're already using). We cannot rely on exact matches, and we want to avoid the regex! That's why we need a nested list: it treats each tag as a separate object, so a filter can be applied to each tag individually rather than to the whole document.
Now, nested lists are expected to contain objects so
{
"tags": ["Race", "Racing", "Mountain Bike", "Horizontal"]
}
will need to be converted to
{
"tags": [
{ "tag": "Race" },
{ "tag": "Racing" },
{ "tag": "Mountain Bike" },
{ "tag": "Horizontal" }
]
}
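If you reindex from your application, the conversion can be done per document before indexing. A minimal Python sketch (the helper name to_nested_tags is hypothetical) that leaves every other field untouched and does not modify the input document:

```python
def to_nested_tags(doc: dict) -> dict:
    """Rewrite {"tags": ["a", "b"]} as {"tags": [{"tag": "a"}, {"tag": "b"}]}."""
    converted = dict(doc)  # shallow copy so the caller's document is not changed
    converted["tags"] = [{"tag": t} for t in doc.get("tags", [])]
    return converted


print(to_nested_tags({"tags": ["Race", "Racing", "Mountain Bike", "Horizontal"]}))
```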
After that we'll proceed with the multi field mapping, keeping the original tags intact but also adding a .tokenized field to search on and a .keyword field to aggregate on:
"settings": { ... },
"mappings": {
"properties": {
"tags": {
"type": "nested",
"properties": {
"tag": {
"type": "text",
"fields": {
"tokenized": {
"type": "text",
"analyzer": "my_ngrams_analyzer"
},
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
We'll then add our adjusted tags docs:
POST tagindex/_doc
{"tags":[{"tag":"Race"},{"tag":"Racing"},{"tag":"Mountain Bike"},{"tag":"Horizontal"}]}
POST tagindex/_doc
{"tags":[{"tag":"Tracey Chapman"},{"tag":"Silverfish"},{"tag":"Blue"}]}
POST tagindex/_doc
{"tags":[{"tag":"Surfing"},{"tag":"Race"},{"tag":"Disgrace"}]}
and apply a nested filter terms aggregation:
GET tagindex/_search
{
"aggs": {
"topics_parent": {
"nested": {
"path": "tags"
},
"aggs": {
"topics": {
"filter": {
"term": {
"tags.tag.tokenized": "race"
}
},
"aggs": {
"topics": {
"terms": {
"field": "tags.tag.keyword",
"size": 100
}
}
}
}
}
}
},
"size": 0
}
yielding
{
...
"topics_parent" : {
...
"topics" : {
...
"topics" : {
...
"buckets" : [
{
"key" : "Race",
"doc_count" : 2
},
{
"key" : "Disgrace",
"doc_count" : 1
},
{
"key" : "Tracey Chapman",
"doc_count" : 1
}
]
}
}
}
}
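A sketch of building this request from whatever the user types and reading the suggestions back, in Python. Two assumptions are baked in: the input is lowercased in code because a term query is not analyzed while the indexed ngrams are lowercase, and inputs shorter than min_gram or longer than max_gram are rejected because no indexed token of that length exists. Helper names and the sample response in the usage are hypothetical.

```python
def tag_suggestion_request(user_input: str, size: int = 100,
                           min_gram: int = 3, max_gram: int = 10) -> dict:
    """Build the nested filter + terms aggregation for one user-typed fragment."""
    # term queries are not analyzed, so lowercase to match the lowercase ngrams
    fragment = user_input.strip().lower()
    # Only fragments of min_gram..max_gram characters were indexed as tokens
    if not min_gram <= len(fragment) <= max_gram:
        raise ValueError(f"fragment must be {min_gram}-{max_gram} characters long")
    return {
        "size": 0,
        "aggs": {
            "topics_parent": {
                "nested": {"path": "tags"},
                "aggs": {
                    "topics": {
                        "filter": {"term": {"tags.tag.tokenized": fragment}},
                        "aggs": {
                            "topics": {"terms": {"field": "tags.tag.keyword", "size": size}}
                        },
                    }
                },
            }
        },
    }


def suggested_tags(response: dict) -> list:
    """Extract the bucket keys from a search response shaped like the one above."""
    buckets = response["aggregations"]["topics_parent"]["topics"]["topics"]["buckets"]
    return [bucket["key"] for bucket in buckets]
```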
Caveats
In order for this to work, you'll have to reindex.
Ngrams will increase the storage footprint; depending on how many tags per doc you have, it may become a concern.
Nested fields are internally treated as "separate documents", so this affects disk space too.
P.S.: This is an interesting use case. Let me know how the implementation went!

ElasticSearch: preserve_position_increments not working

According to the docs
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-completion.html
preserve_position_increments=false is supposed to make consecutive keywords in a string searchable. But for me it's not working. Is this a bug? Steps to reproduce in Kibana:
PUT /example-index/
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"_doc": {
"properties": {
"example-suggest-field": {
"type": "completion",
"analyzer": "stop",
"preserve_position_increments": false,
"max_input_length": 50
}
}
}
}
}
PUT /example-index/_doc/1
{
"example-suggest-field": [
{
"input": "Nevermind Nirvana",
"weight" : 10
}
]
}
POST /example-index/_search
{
"suggest": {
"bib-suggest" : {
"prefix" : "nir",
"completion" : {
"field" : "example-suggest-field"
}
}
}
}
POST /example-index/_search
{
"suggest": {
"bib-suggest" : {
"prefix" : "nev",
"completion" : {
"field" : "example-suggest-field"
}
}
}
}
If so, I will file a bug report.
It's not a bug. preserve_position_increments is only useful when you are removing stopwords and would like to search for the token that comes after the stopword (i.e. search for Beat and find The Beatles).
In your case, you should probably index ["Nevermind", "Nirvana"] instead, i.e. an array of tokens.
If you try indexing "The Nirvana" instead, you'll find it by searching for nir.
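Building such a document from a title is straightforward. A Python sketch, with a hypothetical helper name, that puts each whitespace-separated word into the input array so that every word becomes its own completion entry:

```python
def completion_doc(title: str, weight: int = 10, field: str = "example-suggest-field") -> dict:
    """Index every word of the title as a separate completion input."""
    return {field: [{"input": title.split(), "weight": weight}]}


print(completion_doc("Nevermind Nirvana"))
```

Splitting on whitespace is a simplification. You may also want to keep the full title as an additional input so that multi-word prefixes such as "nevermind nir" still match.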

How to filter the result in elastic search to the documents which id contains any of the elements in [10,23,34,44]

I am new to Elasticsearch. I am using the following library to help me with building my search query.
Here is the library:
https://github.com/sudo-suhas/elastic-builder
I have the below code to create my queries:
requestBody = elasticSearchBuilder.requestBodySearch()
  .query(
    elasticSearchBuilder.boolQuery()
      .must(elasticSearchBuilder.multiMatchQuery(feilds, searchQuery)
        .type(sortAlgorithm)
        .tieBreaker(tieBreaker)
        .minimumShouldMatch(searchAccuracy)));
which results in:
{
"bool": {
"must": {
"multi_match": {
"query": "xxxxxxxxx",
"fields": ["name", "owner", "byline_name", "head"],
"type": "best_fields",
"tie_breaker": 0.3
}
}
}
}
Now I want to add another filter that only searches among documents whose id is in this array: [10,23,34,44]. So I am looking for something like between or contains or whatever can solve this. Can anyone help me with this?
Update:
[
{
"id": "80092",
"categoryId": "43229",
"channelId": "54322",
"channelName": "xxxxxxxxxxxx",
"owner": "xxxxxxxxxxxxx",
"ownerChannel": "xxxxxxxxxxxxx",
"bylineName": "xxxxxxxxxxxxxx",
"bylinePublication": "xxxxxxxxxxxx"
}
]
If you want to do this for identifiers, there is a special query called ids
POST http://localhost:9200/your_index/your_type/_search
Content-type: application/json
{
"query": {
"ids" : {
"values" : ["10", "23", "34", "44"]
}
}
}
For other fields, you can use a terms query:
{
"query": {
"constant_score" : {
"filter" : {
"terms" : { "otherField" : [10, 23, 34, 44]}
}
}
}
}
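Combined with the multi_match from the question, the id restriction goes into the filter clause of the same bool query, where it narrows the results without affecting scoring. A Python sketch of the resulting body follows. It assumes the id field is indexed as a keyword (the sample document stores it as the string "80092"), so the values are sent as strings; passing id_field="_id" switches to the ids query instead. Helper names are hypothetical. elastic-builder should offer matching builders (for example a terms query builder), but check its documentation for the exact signatures.

```python
def id_filter(ids, field: str = "id") -> dict:
    """Filter clause matching any of the given ids (sent as strings)."""
    values = [str(i) for i in ids]
    if field == "_id":
        # The ids query targets the document _id metadata field
        return {"ids": {"values": values}}
    # terms matches documents whose field equals any of the listed values
    return {"terms": {field: values}}


def search_body(query_text: str, fields: list, ids, tie_breaker: float = 0.3,
                id_field: str = "id") -> dict:
    """bool query: multi_match for relevance, id restriction as a non-scoring filter."""
    return {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": query_text,
                        "fields": fields,
                        "type": "best_fields",
                        "tie_breaker": tie_breaker,
                    }
                },
                "filter": id_filter(ids, id_field),
            }
        }
    }


body = search_body("xxxxxxxxx", ["name", "owner", "byline_name", "head"], [10, 23, 34, 44])
```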

Accessing nested property in Elasticsearch distance script.

My index in Elasticsearch has the following mapping:
"couchbaseDocument": {
"properties": {
"doc": {
"properties": {
"properties": {
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
}
}
}
The source document is as follows:
{"properties" : {"location":"43.706596,-79.4030464"}}
I am trying to use the distance script to calculate the distance based on geo-points. I found the post "Return distance in elasticsearch results?" to help me out. I am trying to get all results, filter by a 1 km radius, get the distance, and sort on geo_point. The query is constructed as follows:
{
"query": {
"match_all": {}
},
"filter": {
"geo_distance": {
"distance": "1km",
"doc.properties.location": {
"lat": 43.710323,
"lon": -79.395284
}
}
},
"script_fields": {
"distancePLANE": {
"params": {
"lat": 43.710323,
"lon": -79.395284
},
"script": "doc[properties]['location'].distanceInKm(lat, lon)"
},
"distanceARC" :{
"params": {
"lat": 43.710323,
"lon": -79.395284
},
"script": "doc[properties]['location'].arcDistanceInKm(lat,lon)"
}
},
"sort": [
{
"_geo_distance":{
"doc.properties.location": [-79.395284,43.710323],
"order": "desc",
"unit": "km"
}
}
],
"track_scores": true
}
I get the following error with status 500:
"PropertyAccessException[[Error: could not access: properties; in class: org.elasticsearch.search.lookup.DocLookup]\n[Near : {... doc[properties]['location'].distan ....}]\n ^\n[Line: 1, Column: 5]]"
I tried rewriting the query in this way:
..."script": "doc['properties']['location'].arcDistanceInKm(lat,lon)"...
Then I get this error:
"CompileException[[Error: No field found for [properties] in mapping with types [couchbaseDocument]]\n[Near : {... doc['properties']['location']. ....}]\n ^\n[Line: 1, Column: 1]]; nested: ElasticSearchIllegalArgumentException[No field found for [properties] in mapping with types [couchbaseDocument]]; "
When I remove the script part from the query all together, the sorting and filtering works just fine. Is there a different way to access nested fields when using scripts? Any insights would be really appreciated!
Thank you!
Managed to get it done with
"script" : "doc.last_location.distance(41.12, -71.34)"
I don't know why, but doc['last_location'] does not seem to work at all!
As mentioned in my comment, when you sort by _geo_distance, the sort value returned with each hit is the actual distance, so there is no need to do a separate computation. Details here: http://elasticsearch-users.115913.n3.nabble.com/search-by-distance-and-getting-the-actual-distance-td3317140.html#a3936224
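A sketch of that approach in Python: filter by radius, sort by _geo_distance in km, and read each hit's distance from its sort array. It uses the bool filter syntax of newer Elasticsearch versions rather than the top-level filter from the question (an assumption about your version), addresses the field by its full dotted path, and sorts nearest first; the helper names and the response used to exercise distances_km are hypothetical.

```python
def geo_sort_body(field: str, lat: float, lon: float, radius: str = "1km",
                  order: str = "asc") -> dict:
    """Filter to `radius` around the point and sort hits by distance in km."""
    point = {"lat": lat, "lon": lon}
    return {
        "query": {"bool": {"filter": {"geo_distance": {"distance": radius, field: point}}}},
        "sort": [{"_geo_distance": {field: point, "order": order, "unit": "km"}}],
    }


def distances_km(response: dict) -> list:
    """Pair each hit's _id with its distance, taken from the first sort value."""
    return [(hit["_id"], hit["sort"][0]) for hit in response["hits"]["hits"]]


body = geo_sort_body("doc.properties.location", 43.710323, -79.395284)
```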
