Elasticsearch geo distance query range - elasticsearch

Hi, for a project I am trying to return users from Elasticsearch who are between 2km and 4km away from the searching user.
I use the query below:
`
{
"size": 1000,
"from": 0,
"_source": "user_id",
"query":{
"bool":{
"must_not": {
"terms": {
"user_id": []
}
},
"filter":[
{
"geo_distance_range":{
"from":"2km",
"to": "4km",
"location":{
"lon":-122.4194,
"lat":37.7749
}
}
}
]
}
}
}`
The geo_distance_range query is no longer available in Elasticsearch 6.3, which is the version I am using.
Can anyone please help me solve this use case in Elasticsearch 6.3? Aggregations only return the number of users in the range, but I want to return the complete results for all users in the range.

I can't test this, but it seems reasonable that you should be able to combine must and must_not clauses with geo_distance. Note that must_not has to be an array when it holds more than one clause:
"query": {
"bool": {
"must_not": [
{
"terms": {
"user_id": []
}
},
{
"geo_distance": {
"distance": "2km",
"location": [-122.4194, 37.7749]
}
}
],
"must": {
"geo_distance": {
"distance": "4km",
"location": [-122.4194, 37.7749]
}
}
}
}
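The ring logic above (inside 4km, but not inside 2km) can be sketched as a small Python helper that builds the request body. This is only an illustration: the function name `distance_ring_query` and its parameters are made up for this example, the `location` and `user_id` field names are taken from the question, and the outer circle is placed in `filter` rather than `must` on the assumption that scoring is not needed.

```python
import json


def distance_ring_query(lat, lon, inner, outer, field="location", excluded_user_ids=()):
    """Build a search body for documents farther than `inner` but within `outer`.

    Hypothetical helper: it only assembles a dict, it does not talk to a cluster.
    """
    point = {"lat": lat, "lon": lon}  # object form avoids lat/lon ordering mistakes
    # Everything inside the inner circle is excluded.
    must_not = [{"geo_distance": {"distance": inner, field: point}}]
    if excluded_user_ids:
        must_not.append({"terms": {"user_id": list(excluded_user_ids)}})
    return {
        "_source": "user_id",
        "query": {
            "bool": {
                # Everything must lie inside the outer circle.
                "filter": [{"geo_distance": {"distance": outer, field: point}}],
                "must_not": must_not,
            }
        },
    }


body = distance_ring_query(37.7749, -122.4194, "2km", "4km")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into a search request as-is; whether it behaves as intended still needs to be checked against a real 6.3 index.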

Related

geo_distance doesn't return any hit Elasticsearch

I have a problem with this query: when I use the geo_distance filter, nothing is returned. When I remove it, I get proper results. The query is below:
GET _search
{
"query": {
"bool": {
"filter": {
"geo_distance": {
"distance": 20,
"distance_unit": "km",
"coordinates": [48.8488576, 2.3354223]
}
},
"must": {
"term": {
"_type": {
"value": "staff"
}
}
},
"must_not": [
{
"term": {
"cabinet.zipcode": {
"value": "75006"
}
}
},
{
"term": {
"next_availability_in_days": {
"value": "-1"
}
}
}
]
}
}
}
I would appreciate it if someone could give me a hint.
UPDATE
When I run the same query logic through the Elasticsearch Ruby DSL, I get proper results:
<Elasticsearch::Model::Searching::SearchRequest:0x007ff335763560
#definition=
{:index=>["development_app_scoped_index_20170428134744",
"development_app_scoped_index_20170428134744"], :type=>["staff", "light_staff"],
:body=>
{:query=>
{:bool=>
{:must_not=>[
{:term=>{"cabinet.zipcode"=>75006}},
{:term=> {:next_availability_in_days=>-1}}
],
:must=>[
{:term=>{:_type=>"staff"}}
],
:filter=>{:geo_distance=>
{:coordinates=>
{:lat=>48.8488576, :lon=>2.3354223},
:distance=>"6km"
}
}}},
:sort=>[
{:type=>{:order=>"desc"}},
{"_geo_distance"=>{"coordinates"=>"48.8488576,2.3354223", "order"=>"asc",
"unit"=>"km"}},
{:next_availability_in_days=>{:order=>"asc"}},
{:priority=>{:order=>"asc"}}
]
}}
So this is really weird and I'm not sure what's going wrong in ES syntax, but it definitely should work as expected.
Thanks.
There is probably nothing in the range that you have entered.
Try increasing "distance": 20 to "distance": 500 and check the results then. For example, the distance between the two geo points [0,0] and [0,1] is roughly 111 km.
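To sanity-check a figure like the one above without a cluster, a haversine calculation is usually close enough. The sketch below assumes a spherical Earth with a mean radius of 6371 km, so it is an approximation and not what Elasticsearch computes internally; `haversine_km` is a name chosen for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator.
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)
print(round(one_degree, 1))
```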
Another suggestion is to get rid of the "distance_unit" field and put the unit inside the "distance" field, as follows:
{
"query": {
"bool": {
"filter": {
"geo_distance": {
"distance": "20km",
"coordinates": [
48.8488576,
2.3354223
]
}
}
}
}
}
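One more thing worth ruling out is coordinate order. Elasticsearch reads a geo point given as an array in [lon, lat] order, whereas the object and string forms put latitude first. The Ruby request that works passes lat 48.8488576 and lon 2.3354223 as an object, so the array [48.8488576, 2.3354223] may well be pointing at a different place. A minimal sketch of keeping the order explicit, with `geo_point` and `as_lon_lat_array` as hypothetical helper names:

```python
def geo_point(lat, lon):
    """Return the unambiguous object form of a geo point, validating the ranges."""
    if not -90 <= lat <= 90:
        raise ValueError("latitude out of range: %r" % lat)
    if not -180 <= lon <= 180:
        raise ValueError("longitude out of range: %r" % lon)
    return {"lat": lat, "lon": lon}


def as_lon_lat_array(point):
    """Array form of the same point; note that longitude comes first."""
    return [point["lon"], point["lat"]]


paris = geo_point(lat=48.8488576, lon=2.3354223)

# Same filter as in the answer, but with the object form of the point.
query = {
    "query": {
        "bool": {
            "filter": {
                "geo_distance": {"distance": "20km", "coordinates": paris}
            }
        }
    }
}

print(as_lon_lat_array(paris))
```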

function_score: treat missing field as perfect hit

What I need to do is boost documents by location (closer is better). locations is a nested type.
This works fine, except that Elasticsearch does not return documents if locations is missing from the document. If the field is missing, Elasticsearch should treat the document as a perfect hit. Any idea how to achieve this?
My query:
{
"sort": [
{
"_score": "desc"
}
],
"query": {
"function_score": {
"query": {
"bool": {
"must_not": [],
"should": [
{
"nested": {
"path": "locations",
"query": {
"function_score": {
"score_mode": "sum",
"functions": [
{
"gauss": {
"locations.coordinates": {
"origin": {
"lat": "50.1078852",
"lon": "15.0385376"
},
"scale": "100km",
"offset": "20km",
"decay": "0.5"
}
}
}
]
}
}
}
}
]
}
}
}
}
}
BTW: I'm using Elasticsearch 5.0
Add one more function to the functions section with a high weight (like 1000) for missing locations, something like this:
{
"filter": {
"bool": {
"must_not": {
"exists": {
"field": "locations"
}
}
}
},
"weight": 1000
}
So records with missing locations will come first because of high weight.
Syntax can differ a little. More information on queries here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-exists-query.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
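As a rough illustration of where such a function sits, the sketch below assembles a function_score body in Python. Several things here are assumptions rather than tested facts: the helper names are invented, the exists check is wrapped in a nested query because locations is mapped as nested in the question, and match_all stands in for a base query that actually matches documents without locations (a bool with only the nested should clause would not).

```python
import json


def missing_locations_boost(weight=1000, path="locations", field="locations.coordinates"):
    """Function entry that applies `weight` to documents with no nested location."""
    return {
        "filter": {
            "bool": {
                "must_not": {
                    # Assumption: the nested mapping requires a nested wrapper here.
                    "nested": {"path": path, "query": {"exists": {"field": field}}}
                }
            }
        },
        "weight": weight,
    }


def function_score_body(base_query, functions):
    """Wrap a base query and a list of function entries in function_score."""
    return {
        "query": {
            "function_score": {
                "query": base_query,
                "functions": list(functions),
            }
        }
    }


# match_all is a placeholder for a base query that also matches location-less documents.
body = function_score_body({"match_all": {}}, [missing_locations_boost()])
print(json.dumps(body, indent=2))
```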

How to do geolocation search using Query String Query

I was going through Query String Query https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
How can I do a geolocation-based search if I have a latitude, a longitude, and a range in km?
After some experiments and some searching on the net, I achieved this by wrapping the query_string query in a bool query.
{
"size": 0,
"query": {
"bool": {
"must": {
"query_string": {
"query": "one:1 AND two:2"
}
},
"filter": {
"geo_distance": {
"distance": "12km",
"latLong": "16.48,80.61"
}
}
}
},
"from": "0",
"_source": [
"user"
]
}
cheers.
Thank-you.
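The same wrapping can be sketched as a Python helper that takes the query string and the point separately. The name `query_string_near` is hypothetical and the `latLong` field name comes from the answer. Note that the answer uses "size": 0, which returns only totals and no hits, so this sketch defaults to 10.

```python
def query_string_near(query, lat, lon, distance, geo_field="latLong", size=10):
    """Combine a query_string query with a geo_distance filter in one bool query."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text part.
                "must": {"query_string": {"query": query}},
                # Unscored geographic restriction.
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        geo_field: {"lat": lat, "lon": lon},
                    }
                },
            }
        },
    }


body = query_string_near("one:1 AND two:2", 16.48, 80.61, "12km")
```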

Terrible has_child query performance

The following query has terrible performance.
100% sure it is the has_child. Query without it runs under 300ms, with it it takes 9 seconds.
Is there some better way to use the has_child query? It seems like I could query parents, and then children by id and then join client side to do the has child check faster than the ES database engine is doing it...
{
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"has_child": {
"type": "status",
"query": {
"term": {
"stage": "s3"
}
}
}
},
{
"has_child": {
"type": "status",
"query": {
"term": {
"stage": "es"
}
}
}
}
]
}
},
"filter": {
"bool": {
"must": [
{
"term": {
"source": "IntegrationTest-2016-03-01T23:31:15.023Z"
}
},
{
"range": {
"eventTimestamp": {
"from": "2016-03-01T20:28:15.028Z",
"to": "2016-03-01T23:33:15.028Z"
}
}
}
]
}
}
}
},
"aggs": {
"digests": {
"terms": {
"field": "digest",
"size": 0
}
}
},
"size": 0
}
Cluster info:
CPU and memory usage are low. It is an AWS ES Service cluster (v1.5.2). There are many small documents, and since the version AWS is running is old, doc values aren't on by default. Not sure if that is helping or hurting.
Since "stage" is not analyzed (based on your comment) and, therefore, you are not interested in scoring the documents that match on that field, you might realize slight performance gains by using the has_child filter instead of the has_child query. And using a term filter instead of a term query.
In the documentation for has_child, you'll notice:
The has_child filter also accepts a filter instead of a query:
The main performance benefits of using a filter come from the fact that Elasticsearch can skip the scoring phase of the query. Also, filters can be cached which should improve the performance of future searches that use the same filters. Queries, on the other hand, cannot be cached.
Try this instead:
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"source": "IntegrationTest-2016-03-01T23:31:15.023Z"
}
},
{
"range": {
"eventTimestamp": {
"from": "2016-03-01T20:28:15.028Z",
"to": "2016-03-01T23:33:15.028Z"
}
}
},
{
"has_child": {
"type": "status",
"filter": {
"term": {
"stage": "s3"
}
}
}
},
{
"has_child": {
"type": "status",
"filter": {
"term": {
"stage": "es"
}
}
}
}
]
}
}
}
},
"aggs": {
"digests": {
"terms": {
"field": "digest",
"size": 0
}
}
},
"size": 0
}
I bit the bullet and just performed the parent:child join in my application. Instead of waiting 7 seconds for the has_child query, I fire off two consecutive term queries and do some post processing: 200ms.
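The post-processing step of such an application-side join might look roughly like the sketch below. It assumes each child hit has already been reduced to a dict with a `parent_id` key, which is a made-up shape for illustration; the real key depends on how the child documents are fetched.

```python
def parents_with_all_stages(children_by_stage):
    """Return the parent ids that have at least one child in every requested stage.

    `children_by_stage` maps a stage name to the child hits returned by the
    term query for that stage.
    """
    parent_sets = [
        {child["parent_id"] for child in children}
        for children in children_by_stage.values()
    ]
    if not parent_sets:
        return set()
    # A parent qualifies only if it shows up in every stage's result set.
    return set.intersection(*parent_sets)


# Hypothetical results of the two term queries (stage "s3" and stage "es").
children = {
    "s3": [{"parent_id": "p1"}, {"parent_id": "p2"}],
    "es": [{"parent_id": "p2"}, {"parent_id": "p3"}],
}
matching_parents = parents_with_all_stages(children)
print(sorted(matching_parents))
```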

Elasticsearch outputs the score of 1.0 for all results when searching for a single "starred" term

We are using Elasticsearch to search for the most relevant companies in a specific catalog. When we use the normal search term like lettering we get reasonable scores and can sort the results according to the score.
However, when we modify the search term before querying and make the "starred" version of it (e.g., *lettering*) to be able to search for substrings we get a score of 1.0 for every result. The search for substrings is a requirement in the project.
Any ideas on what could cause this relevance computation? The problem occurs only when a single term is used. We get comprehensible scores when we use two starred terms in combination (e.g., *lettering* *digital*).
EDIT 1:
Example mapping (YAML; other properties are mapped in the same way, except for boost, which is different for each property):
elasticSearchMapping:
  type: object
  include_in_all: true
  enabled: true
  properties:
    'keywords':
      type: string
      include_in_all: true
      boost: 50
Query:
{
"query": {
"filtered": {
"query": {
"bool": {
"must": [{
"match_all": []
}, {
"query_string": {
"query": "*lettering*"
}
}]
}
},
"filter": {
"bool": {
"must": [{
"term": {
"__parentPath": "/sites/industrycatalog"
}
}, {
"terms": {
"__workspace": ["live"]
}
}, {
"term": {
"__dimensionCombinationHash": "d751713988987e9331980363e24189ce"
}
}, {
"term": {
"__typeAndSupertypes": "IndustryCatalog:Entry"
}
}],
"should": [],
"must_not": [{
"term": {
"_hidden": true
}
}, {
"range": {
"_hiddenBeforeDateTime": {
"gt": "now"
}
}
}, {
"range": {
"_hiddenAfterDateTime": {
"lt": "now"
}
}
}]
}
}
}
},
"fields": ["__path"],
"script_fields": {
"distance": {
"script": "doc['coordinates'].distanceInKm(51.75631079999999,14.332867899999997)"
}
},
"sort": [{
"customer.featureFlags.industrycatalog": {
"order": "asc"
}
}, {
"_geo_distance": {
"coordinates": {
"lat": "51.75631079999999",
"lon": "14.332867899999997"
},
"order": "asc",
"unit": "km",
"distance_type": "plane"
}
}],
"size": 999999
}
What you are doing is a wildcard query. Wildcard queries are term-level queries, and by default a constant score is applied.
Check the Lucene documentation: WildcardQuery extends MultiTermQuery.
You can also verify this with the help of the explain API; you will see something like this:
"_explanation": {
"value": 1,
"description": "ConstantScore(company:lettering), product of:",
"details": [{
"value": 1,
"description": "boost"
}, {
"value": 1,
"description": "queryNorm"
}]
}
You can change this behavior with the rewrite parameter.
Try this (rewrite also works with the query_string query):
{
"query": {
"wildcard": {
"company": {
"value": "digital*",
"rewrite": "scoring_boolean"
}
}
}
}
It has various options for scoring; see what fits your requirement.
Regarding EDIT 1: the reason you see a score other than 1 for *lettering* *digital* is queryNorm. You can check this with the explain API again. If you look closely, all documents that match both terms share one score, and all documents that match a single term share another.
P.S.: a leading wildcard is not recommended at all. You will get performance issues, since it has to be checked against every single term in the inverted index. You might want to look at the edge ngram or ngram filter instead.
Hope this helps!
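To make the ngram suggestion more concrete: an ngram filter emits substrings of each term at index time, so a plain match on a substring can hit an indexed token without any wildcard. The Python sketch below only imitates that token generation for illustration; the gram sizes 3 to 5 are arbitrary example values, not defaults of any analyzer.

```python
def ngrams(term, min_gram=3, max_gram=5):
    """Return all substrings of `term` with lengths from min_gram to max_gram."""
    term = term.lower()
    grams = []
    for size in range(min_gram, max_gram + 1):
        # Slide a window of `size` characters across the term.
        for start in range(len(term) - size + 1):
            grams.append(term[start:start + size])
    return grams


tokens = ngrams("lettering")
# A search for "etter" would match one of the indexed tokens directly.
print("etter" in tokens)
```

Indexing every substring grows the index, so the gram sizes are a trade-off between index size and how short a searchable substring can be.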
