Why would this cached geo query be slower in Elasticsearch than the uncached one?

Looking at the query below, I've added a cached geo_bounding_box filter in front of my geo_shape filter. My expectation after reading https://www.elastic.co/guide/en/elasticsearch/guide/current/geo-caching.html was that this query should be faster. However, in my benchmarking the query with both filters turns out to be slightly slower on average, and MUCH slower in the worst case. Am I doing something wrong, or misinterpreting the doc?
{
"query": {
"filtered": {
"filter": {
"bool" : {
"must" : [
{"geo_bounding_box" : {
"_cache": True,
"properties.center" : {
"top_left" : {
"lat" : math.ceil(float(lat)),
"lon" : math.floor(float(lon))
},
"bottom_right" : {
"lat" : math.floor(float(lat)),
"lon" : math.ceil(float(lon))
}
}
}},
{"geo_shape": {
"geometry": {
"relation": "intersects",
"shape": {
"coordinates": [lon,lat],
"type": "point"
}
}
}}
]
}
}
}
}
}
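Since the body above calls math.ceil and uses True, it appears to be a Python dict rather than raw JSON. A minimal sketch of building and serializing it follows, assuming the standard json module does the serialization; the function name build_cached_geo_query and the sample coordinates are hypothetical. json.dumps writes Python's True as the JSON literal true.

```python
import json
import math


def build_cached_geo_query(lat, lon):
    """Build the query above: a cached 1-degree bounding box plus a geo_shape point filter."""
    lat, lon = float(lat), float(lon)
    return {
        "query": {
            "filtered": {
                "filter": {
                    "bool": {
                        "must": [
                            {"geo_bounding_box": {
                                "_cache": True,  # Python bool; json.dumps emits true
                                "properties.center": {
                                    # Box snapped outward to whole degrees around the point
                                    "top_left": {"lat": math.ceil(lat), "lon": math.floor(lon)},
                                    "bottom_right": {"lat": math.floor(lat), "lon": math.ceil(lon)},
                                },
                            }},
                            {"geo_shape": {
                                "geometry": {
                                    "relation": "intersects",
                                    "shape": {"coordinates": [lon, lat], "type": "point"},
                                }
                            }},
                        ]
                    }
                }
            }
        }
    }


# Hypothetical sample point, passed as strings as the float() calls above suggest
body = json.dumps(build_cached_geo_query("40.7", "-74.2"))
print(body)
```

Snapping the box to whole degrees means nearby points produce the identical bounding box, which is what makes the cached filter reusable across requests.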

Use lowercase JSON boolean values:
"_cache": true

Related

ElasticSearch Search geo-points within a circle that was created from a geo-point

I've gone through the documentation and searched Google but can't find the answer I'm looking for. All I want to do is search for points within a circle created from a geo-point:
GET /pointsinradius/_doc/_search
{
"query": {
"geo_shape": {
"location": {
"shape": {
"type": "circle",
"radius": "1km",
"coordinates": [
-32.360738, 22.56237
]
}
}
}
}
}
You need to use the geo_distance query.
Here is an example:
GET /my_locations/_search
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_distance" : {
"distance" : "200km",
"pin.location" : {
"lat" : 40,
"lon" : -70
}
}
}
}
}
}
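A minimal sketch of the same request body built from Python, assuming you want to parameterize the field, center and radius. The helper name geo_distance_query is hypothetical, and the values in the usage line are only illustrative. Using explicit lat/lon keys sidesteps the ordering question that arises with coordinate arrays, which are ordered [lon, lat] in GeoJSON.

```python
import json


def geo_distance_query(field, lat, lon, distance):
    """Return a bool query keeping documents whose `field` lies within `distance` of (lat, lon)."""
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        # Explicit keys avoid mixing up latitude and longitude
                        field: {"lat": lat, "lon": lon},
                    }
                },
            }
        }
    }


# Illustrative values: the 1km radius from the question, the center from the answer
request_body = geo_distance_query("pin.location", 40, -70, "1km")
print(json.dumps(request_body, indent=2))
```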

Elastic Search must not queries are slow

I have a test index of 50K documents.
I'm firing 500 (same) queries against it, which have a clause that a field (that is an array of values) "must not" be of "some value".
Out of these 500 queries several fail/time out. (Sometimes it's 5, sometimes it's 9, sometimes it's 18 queries...) Is there a way to make the "must not" queries faster? In production the index is going to be several million docs, and the majority of queries are going to have "must not" clauses.
Mapping is as follows:
{
"jobs_en":{
"mappings":{
"index":{
"_all":{
"enabled":false
},
"properties":{
"GUID":{
"type":"string",
"index":"not_analyzed"
},
"channel":{
"type":"string",
"index":"not_analyzed"
},
"country":{
"type":"string",
"analyzer":"standard"
}
}
}
}
}
}
The query is as follows:
{
"bool" : {
"must" : [ {
"bool" : {
"must" : {
"bool" : { }
},
"must_not" : {
"term" : {
"channel" : "Email"
}
}
}
}, {
"bool" : {
"must" : {
"match" : {
"country" : {
"query" : "US",
"type" : "boolean"
}
}
}
}
} ]
}
}
We have a large database in ES, though I don't think it is as large as yours. Several things help me:
1. Use must if you can.
2. Use must_not together with must.
3. If you are able to, use _source to limit the fields that are returned.
"query" : {
"bool" : {
"must": [
{"term": {
"createUser": {
"value": "processor.imsignal"
}
}
},
{"terms" : {
"imcampaignid" : [70191,66983,70188,70235,70190]
}
}
],
"must_not": [
{"term": {
"source": {
"value": "EMAIL"
}
}
},
{"terms" : {
"category" : ["campaign_email","unsubscribe","from_email"]
}
}
]
}
},
"_source": ["category","source","accountPlatformID"]
Specifying a must clause first speeds up the query. Adding must_not then reduces the number of returned records, which can otherwise be a real performance hit. Finally, limiting the fields returned for each record with _source can really help.
Since there was no other answer, I figured I'd help with what I knew. Believe it or not, for my purposes this query with the must_not outperforms the identical query with only the must clauses by tens of seconds. Telling the query what a document should be is essential; then filter by what it is not.
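The three tips can be sketched as one small Python helper, assuming the request body is assembled as a dict before being sent. The name bool_query and its parameters are hypothetical; the clause values are the ones from the query above.

```python
def bool_query(must, must_not=(), source_fields=None):
    """Assemble a bool query with positive clauses, exclusions, and optional _source filtering."""
    body = {"query": {"bool": {"must": list(must)}}}
    if must_not:
        # Exclusions narrow the set selected by the must clauses
        body["query"]["bool"]["must_not"] = list(must_not)
    if source_fields is not None:
        # Return only the listed fields for each hit
        body["_source"] = list(source_fields)
    return body


request_body = bool_query(
    must=[
        {"term": {"createUser": {"value": "processor.imsignal"}}},
        {"terms": {"imcampaignid": [70191, 66983, 70188, 70235, 70190]}},
    ],
    must_not=[
        {"term": {"source": {"value": "EMAIL"}}},
        {"terms": {"category": ["campaign_email", "unsubscribe", "from_email"]}},
    ],
    source_fields=["category", "source", "accountPlatformID"],
)
```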

elasticsearch query on all array elements

How can I search for documents that have all of the specified tags in the following query? I tried minimum_should_match and "execution": "and", but neither is supported in my query.
GET products/fashion/_search
{
"query": {
"constant_score": {
"filter" : {
"bool" : {
"must" : [
{"terms" : {
"tags" : ["gucci", "dresses"]
}},
{"range" : {
"price.value" : {
"gte" : 100,
"lt" : 1000
}
}}
]
}
}
}
},
"sort": { "date": { "order": "desc" }}
}
====== UPDATE
I found a way to build my queries. The task was to reproduce the following MongoDB query in Elasticsearch:
{
"tags": {
"$all":["gucci","dresses"]
},
"price.value":{"$gte":100,"$lte":1000}
}
And here is my Elasticsearch query:
GET products/fashion/_search
{
"query": {
"bool" : {
"filter" : [
{"term" : {
"tags" : "gucci"
}},
{"term" : {
"tags" : "dresses"
}},
{"range" : {
"price.value" : {
"gte" : 100,
"lte" : 1000
}
}}
]
}
}
}
Do you have a mapping defined for your index? By default, Elasticsearch analyzes string fields. If you want to match exact terms, as you are doing above, you need to mark those fields as not_analyzed in the mapping.
https://www.elastic.co/guide/en/elasticsearch/guide/current/_finding_exact_values.html#_term_filter_with_text
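A small sketch of the difference between the two approaches, assuming the filter list is generated in Python. A single terms clause matches when any listed value is present, while one term clause per tag inside a bool filter requires all of them, which is what MongoDB's $all expresses. The helper names are hypothetical, and the two matches_* functions only mimic the semantics on plain lists.

```python
def all_tags_filter(tags, field="tags"):
    """Expand a tag list into one term clause per tag, so every tag must be present."""
    return [{"term": {field: tag}} for tag in tags]


def matches_any(doc_tags, wanted):
    """Semantics of a single terms clause: at least one wanted tag is present."""
    return bool(set(doc_tags) & set(wanted))


def matches_all(doc_tags, wanted):
    """Semantics of several term clauses under bool/filter: every wanted tag is present."""
    return set(wanted) <= set(doc_tags)


wanted = ["gucci", "dresses"]
filters = all_tags_filter(wanted)
filters.append({"range": {"price.value": {"gte": 100, "lte": 1000}}})
request_body = {"query": {"bool": {"filter": filters}}}
```

With these definitions, a document tagged only "gucci" and "shoes" passes matches_any but not matches_all.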

Difference between using multiple filters and specifying multiple filters in a single "and" clause

I am new to Elasticsearch and don't know what the difference between the following two queries is. Is it just processing time, or are they fundamentally different queries?
1) filters : { and: [{
"bool" : {
"should" : {
"term" : {
"Code" : "1510"
}
}
}
}
,
{
"bool" : {
"should" : {
"term" : {
"Id" : "Id3"
}
}
}
}] }
2) filter: [{
"bool" : {
"must" : [{
"term" : {
"Code" : "1510"
}
}, {
"term" : {
"Id" : "Id3"
}
}]
}
}]
The queries in the OP are logically equivalent.
That being said, I find 2) more intuitive, more readable and simpler.
Generally, bool filters are preferred over and filters for performance reasons, although for the queries in question I doubt the difference is perceptible.
Also, if you keep the and filter, the query in 1) is better written as follows:
"filter": {
"and": [
{
"term": {
"Code": "1510"
}
},
{
"term": {
"Id": "Id3"
}
}
]
}
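The rewrite can be sketched mechanically in Python, assuming the filters are held as dicts. Both function names are hypothetical. The sketch unwraps a bool that contains nothing but a single should clause (which, with no other clauses present, has to match anyway) and then turns the and list into bool/must.

```python
def simplify(clause):
    """Unwrap {"bool": {"should": X}} when X is the only clause in the bool."""
    bool_body = clause.get("bool")
    if isinstance(bool_body, dict) and set(bool_body) == {"should"}:
        should = bool_body["should"]
        if isinstance(should, dict):
            return should
        if isinstance(should, list) and len(should) == 1:
            return should[0]
    return clause


def and_to_bool(and_filter):
    """Convert {"and": [...]} into the equivalent {"bool": {"must": [...]}}."""
    return {"bool": {"must": [simplify(c) for c in and_filter["and"]]}}


legacy = {
    "and": [
        {"bool": {"should": {"term": {"Code": "1510"}}}},
        {"bool": {"should": {"term": {"Id": "Id3"}}}},
    ]
}
converted = and_to_bool(legacy)
```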

Filtering on Elasticsearch Optional Fields

I'm using Elasticsearch to query a document type that has an optional location field. When searching, documents that lack the field should still be returned, while documents that have it should be filtered by location.
It seems like the OR filter in Elasticsearch does not short circuit, as this:
"query": {
"filtered": {
"query": {
"match_phrase_prefix": {
"display_name": "SearchQuery"
}
},
"filter": {
"or": [
{
"missing": {
"field": "location"
}
},
{
"geo_distance" : {
"distance" : "20mi",
"location" : {
"lat" : 33.47,
"lon" : -112.07
}
}
}
]
}
}
}
Fails with "failed to find geo_point field [location]".
Is there any way to perform this (or something along the same vein) in ES?
I don't know why yours isn't working, but I've used the bool filter with great success in the past. The should option is essentially an "or": it makes sure at least one of its clauses is true. Give it a try and comment on my answer if it still doesn't work. Also double-check that I copied your query terms properly :)
{
"filtered" : {
"query" : {
"match_phrase_prefix": {
"display_name": "SearchQuery"
}
},
"filter" : {
"bool" : {
"should" : [
{
"missing": { "field": "location" }
},
{
"geo_distance" : {
"distance" : "20mi",
"location" : {
"lat" : 33.47,
"lon" : -112.07
}
}
}
]
}
}
}
}
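For reference, a sketch of the same bool/should filter produced by a Python helper, assuming the filter is assembled as a dict. The name optional_location_filter and its parameters are hypothetical. It only builds the filter; it makes no claim about whether the original error goes away.

```python
def optional_location_filter(field, lat, lon, distance):
    """Match documents that either lack `field` or lie within `distance` of (lat, lon)."""
    return {
        "bool": {
            "should": [
                # Documents with no value for the field
                {"missing": {"field": field}},
                # Documents whose value is close enough to the given point
                {"geo_distance": {"distance": distance, field: {"lat": lat, "lon": lon}}},
            ]
        }
    }


request_body = {
    "filtered": {
        "query": {"match_phrase_prefix": {"display_name": "SearchQuery"}},
        "filter": optional_location_filter("location", 33.47, -112.07, "20mi"),
    }
}
```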
For anyone with the same issue, I kind of just hacked around it. For any documents that were missing a "location", I added one with a lat/lon of 0/0. Then I altered my query to be:
"filter": {
"or": [
{
"geo_distance": {
"distance": "0.1mi",
"location": {
"lat": 0,
"lon": 0
}
}
},
{
"geo_distance": {
"distance": "30mi",
"location": {
"lat": [lat variable],
"lon": [lon variable]
}
}
}
]
}
