Using geo_shape filter inside bool filter - elasticsearch

I'm trying to combine a geo_shape Elasticsearch filter with a basic term filter inside a bool filter to improve the performance of our Elasticsearch query, but with little success.
This query is used over a set of polygons in Elasticsearch, to determine which shapes the specified point is in.
It seems as though, unless I have the wrong end of the stick, geo_shape filters can't be included inside a bool filter collection like this:
{
"size": 1000,
"fields": [],
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"geo_shape": {
"deliveryAreas.area": {
"shape": {
"coordinates": [
-0.126208,
51.430874
],
"type": "point"
}
}
}
},
{
"term": {
"restaurantState": 3
}
}
]
}
}
}
}
}
The query above runs, but returns 0 results. Using the geo_shape filter outside the bool works fine, but the combination of the two seems to fail. I assume it must be a syntax error, as the Elasticsearch docs recommend this approach to make the expensive geo calls cheaper, but no luck so far.

Related

Elasticsearch "boost" not working when inside "filter"

I'm trying to boost matches on a certain field over another.
This works fine:
{
"query": {
"bool": {
"should": [
{
"terms": {
"boost": 2,
"mainField": "foo"
}
},
{
"terms": {
"otherField": "foo"
}
}
]
}
}
}
When I look at the documents matched on mainField, I see they have a _score of 2.0, as expected.
But when I wrap this same query in a filter:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"should": [
{
"terms": {
"boost": 2,
"mainField": "foo"
}
},
{
"terms": {
"otherField": "foo"
}
}
]
}
}
]
}
}
}
The _score for all documents is 0.0.
The same thing happens for multi_match. By itself (e.g. inside a query) it works fine, but inside a bool + filter, it doesn't work.
Can someone explain why this is the case? I need to wrap it in a filter due to the way my app composes queries.
Some context might also help: I'm trying to return documents that match on either mainField or otherField, but sort the ones matching on mainField first, so I figured boost would be the most appropriate choice here. But let me know if there is a better way.
Clauses placed under filter are always executed in filter context. They always return a score of zero and only determine which documents match.
Refer to the Elasticsearch documentation on query and filter context to learn more.
Because of this, you do not get a _score of 2.0 in the second query, even with the boost applied.
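One possible way to keep the filter wrapper and still get the boost is to leave the matching clauses under filter and repeat them under should, where they run in query context and contribute to _score. The following is a minimal Python sketch that only builds the request body as a dict. The helper name build_boosted_query is hypothetical, the field names come from the question, and writing the clauses in the long term form with value and boost is an assumption about how they would be expressed.

```python
import json


def build_boosted_query(value, main_field="mainField", other_field="otherField", boost=2.0):
    """Build a bool query that filters on either field but scores main_field higher."""
    # Matching clauses: executed in filter context, so they carry no boost.
    matching = [
        {"term": {main_field: {"value": value}}},
        {"term": {other_field: {"value": value}}},
    ]
    # Scoring clauses: executed in query context, so boost affects _score.
    scoring = [
        {"term": {main_field: {"value": value, "boost": boost}}},
        {"term": {other_field: {"value": value}}},
    ]
    return {
        "query": {
            "bool": {
                # The filter decides which documents are returned.
                "filter": [{"bool": {"should": matching}}],
                # With a filter present, should clauses are optional and only add to the score.
                "should": scoring,
            }
        }
    }


print(json.dumps(build_boosted_query("foo"), indent=2))
```

If the filter wrapper is not strictly required, moving the inner bool from filter to must has a similar effect with less duplication.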

Difference between elasticsearch queries

I'm having a hard time trying to figure out why these two queries do not return the same number of results (I'm using elasticsearch 2.4.1):
{
"nested": {
"path": "details",
"filter": [
{ "match": { "details.id": "color" } },
{ "match": { "details.value_str": "red" } }
]
}
}
{
"nested": {
"path": "details",
"filter": {
"bool": {
"must": [
{ "match": { "details.id": "color" } },
{ "match": { "details.value_str": "red" } }
]
}
}
}
}
The first query has more results.
My guess was that the filter clause in the first query was working like an or/should, but if I replace the must in the second query with a should, the query yields more results than either of those two.
How does the meaning of those queries differ?
I'm afraid I have no knowledge of the structure of the indexed documents; all I know is how many rows each query returns.
The first query is wrong: the nested filter cannot be an array. I suspect ES doesn't parse it correctly and only takes one of the match clauses instead of both, which is probably why it returns more results than the second one.
The second query is correct in terms of nested filter and yields exactly what you expect.

Elasticsearch terms query on array of values

I have data in an Elasticsearch index that looks like this:
{
"title": "cubilia",
"people": [
"Ling Deponte",
"Dana Madin",
"Shameka Woodard",
"Bennie Craddock",
"Sandie Bakker"
]
}
Is there a way for me to search for all the people whose names start with "ling" (case insensitive) and get distinct terms with their proper casing, i.e. "Ling Deponte" and not "ling deponte"?
I am fine with changing the mappings on the index in any way.
Edit: this does what I want, but it is a really bad query:
{
"size": 0,
"aggs": {
"person": {
"filter": {
"bool":{
"should":[
{"regexp":{
"people.raw":"(.* )?[lL][iI][nN][gG].*"
}}
]}
},
"aggs": {
"top-colors": {
"terms": {
"size":10,
"field": "people.raw",
"include":
{
"pattern": ["(.* )?[lL][iI][nN][gG].*"]
}
}
}
}
}
}
}
people.raw is not_analyzed
Yes, and you can do it without a regular expression by taking advantage of Elasticsearch's full text capabilities.
GET /test/_search
{
"query": {
"match_phrase": {
"people": "Ling"
}
}
}
Note: This could also be match or match_phrase_prefix in this case. The match_phrase* queries imply an order of the values in the text. match simply looks for any of the values. Since you only have one value, it's pretty much irrelevant.
The problem is that you cannot limit the document responses to just that name because the search API returns documents. With that said, you can use nested documents and get the desired behavior via inner_hits.
You want to avoid wildcard prefixing whenever possible because it simply does not work at scale. To put it in SQL terms, that's like doing a full table scan; you effectively lose the benefit of the inverted index because it has to walk it entirely to find the actual start.
Combining the two should work pretty well though. Here, I use the query to whittle down the results to what you are interested in, then I use your inner aggregation to include terms based only on their value.
{
"size": 0,
"query": {
"match_phrase": {
"people": "Ling"
}
},
"aggs": {
"person": {
"terms": {
"size":10,
"field": "people.raw",
"include": {
"pattern": ["(.* )?[lL][iI][nN][gG].*"]
}
}
}
}
}
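To see which terms the include pattern keeps, the regular expression can be tried against the sample people array from the question. The sketch below uses Python's re.fullmatch as a stand-in for Lucene regular expressions, which are anchored to the whole term. That equivalence is an assumption that holds for the simple constructs used in this pattern, and the helper name matching_terms is hypothetical.

```python
import re

# Same pattern as the "include" clause: optional leading words ending in a space,
# then "ling" in any letter case, then anything.
INCLUDE_PATTERN = re.compile(r"(.* )?[lL][iI][nN][gG].*")


def matching_terms(terms):
    """Return the terms the include pattern would keep, with their original casing."""
    # fullmatch mimics whole-term anchoring (an assumption, see the text above).
    return [term for term in terms if INCLUDE_PATTERN.fullmatch(term)]


people = [
    "Ling Deponte",
    "Dana Madin",
    "Shameka Woodard",
    "Bennie Craddock",
    "Sandie Bakker",
]

print(matching_terms(people))
```

Because the aggregation runs on the not_analyzed people.raw field, the buckets keep the stored casing, while the match_phrase query on the analyzed people field provides the case-insensitive narrowing.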
Hi, please find the query below; it may help with your request:
GET skills/skill/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"wildcard": {
"skillNames.raw": "jav*"
}
}
]
}
}
}
}
}
My intention is to find documents starting with "jav".

Elastic Search Filter performing much slower than Query

As my ES index/cluster has scaled up (# ~2 billion docs now), I have noticed more significant performance loss. So I started messing around with my queries to see if I could squeeze some perf out of them.
As I did this, I noticed that when I used a Boolean Query in my Filter, my results would take about 3.5-4 seconds to come back. But if I do the same thing in my Query, it is more like 10-20 ms.
Here are the 2 queries:
Using a filter
POST /backup/entity/_search?routing=39cd0b95-efc3-4eee-93d1-93e6f5837d6b
{
"query": {"bool":{"should":[],"must":[{"match_all":{}}]}},
"filter": {
"bool": {
"must": [
{
"term": {
"serviceId": "39cd0b95-efc3-4eee-93d1-93e6f5837d6b"
}
},
{
"term": {
"subscriptionId": "3eb5021e-2f1d-4292-9fd5-95788ebfafa0"
}
},
{
"term": {
"subscriptionType": 0
}
},
{
"terms": {
"entityType": [
"4"
]
}
}
]
}
}
}
Using a query
POST /backup/entity/_search?routing=39cd0b95-efc3-4eee-93d1-93e6f5837d6b
{
"query": {"bool":{"should":[],"must":[
{
"term": {
"serviceId": "39cd0b95-efc3-4eee-93d1-93e6f5837d6b"
}
},
{
"term": {
"subscriptionId": "3eb5021e-2f1d-4292-9fd5-95788ebfafa0"
}
},
{
"term": {
"subscriptionType": 0
}
},
{
"terms": {
"entityType": [
"4"
]
}
}
]}}
}
Like I said, the second method where I don't use a Filter at all takes mere milliseconds, while the first query takes almost 4 seconds. This seems completely backwards from what the documentation says. They say that the Filter should actually be very quick and the Query should be the one that takes longer. So why am I seeing the exact opposite here?
Could it be something with my index mapping? If anyone has any idea why this is happening I would love to hear suggestions.
Thanks
The root-level filter element is actually another name for the post_filter element. The top-level filter was supposed to be removed in ES 1.1, but it somehow slipped through and exists in 2.x versions as well.
It is removed completely in ES 5 though.
So, your first query is not a "filter" query. It's a query whose results are used afterwards (if applicable) in aggregations, and then the post_filter/filter is applied to the results. So you basically have a two-step process in there: https://www.elastic.co/guide/en/elasticsearch/reference/1.5/search-request-post-filter.html
More about its performance here:
While we have gained cacheability of the tag filter, we have potentially increased the cost of scoring significantly. Post filters are useful when you need aggregations to be unfiltered, but hits to be filtered. You should not be using post_filter (or its deprecated top-level synonym filter) if you do not have facets or aggregations.
A proper filter query is the following:
{
"query": {
"filtered": {
"query": {
"bool": {
"should": [],
"must": [
{
"match_all": {}
}
]
}
},
"filter": {
"bool": {
"must": [
{
"term": {
"serviceId": "39cd0b95-efc3-4eee-93d1-93e6f5837d6b"
}
},
{
"term": {
"subscriptionId": "3eb5021e-2f1d-4292-9fd5-95788ebfafa0"
}
},
{
"term": {
"subscriptionType": 0
}
},
{
"terms": {
"entityType": [
"4"
]
}
}
]
}
}
}
}
}
A filter is faster. Your problem is that you include the match_all query in your filter case. This matches on all 2 billion of your documents. A set operation has to then be done against the filter to cull the set. Omit the query portion in your filter test and you'll see that the results are much faster.
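On ES 5 and later, where the top-level filter and the filtered query are both gone, the equivalent shape is a bool query with a filter clause. Here is a minimal Python sketch of that rewrite, assuming the request body is a plain dict; the function name move_top_level_filter_into_bool is hypothetical.

```python
import copy
import json


def move_top_level_filter_into_bool(body):
    """Rewrite {"query": Q, "filter": F} as {"query": {"bool": {"must": [Q], "filter": [F]}}}."""
    body = copy.deepcopy(body)  # leave the caller's dict untouched
    top_filter = body.pop("filter", None)
    if top_filter is None:
        return body  # no top-level filter, nothing to rewrite
    # Fall back to match_all when the request had no query of its own.
    query = body.pop("query", {"match_all": {}})
    body["query"] = {"bool": {"must": [query], "filter": [top_filter]}}
    return body


request = {
    "query": {"match_all": {}},
    "filter": {
        "bool": {
            "must": [
                {"term": {"serviceId": "39cd0b95-efc3-4eee-93d1-93e6f5837d6b"}},
                {"term": {"subscriptionType": 0}},
            ]
        }
    },
}

print(json.dumps(move_top_level_filter_into_bool(request), indent=2))
```

Note that this changes behaviour when the request also carries aggregations: a post filter leaves aggregations unfiltered, while a filter inside the query restricts them too.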

Elasticsearch where clause with constant rank: how to do this?

I'm new to Elasticsearch. How do I write the Elasticsearch equivalent of this query?
select * from response where pnrno='sampleid'
I know we have to use the 'filter' option in Elasticsearch, but we do not need any ranking (the ranking can be constant). How can I write a query to achieve this?
You are correct: you can use a filtered query with an empty query clause and only filters. A filter narrows the set of documents on which the query then acts to match further and calculate relevance. Filters are boolean: a document either matches or is rejected (1/0).
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [{
"term": {
"FIELD": "VALUE"
}
}]
}
}
}
}
}
The usual way of achieving this is by using the constant_score query with an embedded term filter, like this:
{
"query": {
"constant_score": {
"filter": {
"term": {
"pnrno": "sampleid"
}
}
}
}
}
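The same request body can be built programmatically. This is a small Python sketch; the helper name where_equals is hypothetical, and the field and value are the ones from the SQL statement in the question.

```python
import json


def where_equals(field, value):
    """Build the constant_score equivalent of: SELECT * ... WHERE field = value."""
    # constant_score wraps a filter and gives every matching document the same score.
    return {"query": {"constant_score": {"filter": {"term": {field: value}}}}}


print(json.dumps(where_equals("pnrno", "sampleid"), indent=2))
```

This assumes pnrno is indexed as an exact value (not analyzed), since a term filter compares against the indexed tokens as they are.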

Resources