Combining 3 elasticsearch queries into one - elasticsearch

My basic problem is that I have three separate queries performing spatial, temporal, and keyword search. I want to combine them into one query, as the following use case describes:
The user enters a keyword to search for documents, and the query returns certain documents. The user then narrows the search with a spatial query, and narrows the results further with a temporal query.
Keyword query
{
"query": {
"match" : { "metadata.o2r.title" : "geosciences" }
}
}
Spatial query
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_shape": {
"metadata.o2r.spatial.geometry": {
"shape": {
"type": "polygon",
"coordinates":
coords
},
"relation": "within"
}
}
}
}
}
}
Temporal query
{
"query": {
"bool": {
"must": [{
"range": {
"metadata.o2r.temporal.begin": {
"from": lower
}
}
},
{
"range": {
"metadata.o2r.temporal.end": {
"to": upper
}
}
}
]
}
}
}
The basic idea is to return documents that contain certain keywords, for a given location and a certain period of time, through a single query.
Combined Query
"query": {
"bool": {
"must": [ {
"match" : { "metadata.o2r.title" : "geosciences"
}
},
{
"filter": {
"geo_shape": {
"metadata.o2r.spatial.geometry": {
"shape": {
"type": "polygon",
"coordinates": coords
},
"relation": "within"
}
}
}
},
{
"range": {
"metadata.o2r.temporal.begin": {
"from": lower
}
}
},
{
"range": {
"metadata.o2r.temporal.end": {
"to": upper
}
}
}
]
}
}

I used a bool/must query to combine all the queries:
{
"query": {
"bool": {
"must": [
{
"range": {
"metadata.o2r.temporal.begin": {
"from": from
}
}
},
{
"range": {
"metadata.o2r.temporal.end": {
"to": to
}
}
},
{
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_shape": {
"metadata.o2r.spatial.geometry": {
"shape": {
"type": "polygon",
"coordinates": coords
},
"relation": "within"
}
}
}
}
},
{
"match" : { "metadata.o2r.title" : "geosciences" }
}
]
}
}
}
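The combination above can also be sketched as a small Python helper that builds the request body. This is only an illustration under stated assumptions: the function name `combined_query` and its parameters are hypothetical, the spatial and temporal clauses are moved into `filter` so that only the keyword match contributes to scoring, and `gte`/`lte` are used on the assumption that they behave like the inclusive `from`/`to` in the queries above.

```python
def combined_query(keyword, coords, lower, upper):
    """Build one bool query from the keyword, spatial and temporal parts.

    Hypothetical helper: the field names come from the question, everything
    else (function name, parameters) is an assumption for illustration.
    """
    return {
        "query": {
            "bool": {
                # Scored clause: the keyword match decides the ranking.
                "must": [
                    {"match": {"metadata.o2r.title": keyword}}
                ],
                # Non-scored clauses: they only narrow the result set.
                "filter": [
                    {
                        "geo_shape": {
                            "metadata.o2r.spatial.geometry": {
                                "shape": {
                                    "type": "polygon",
                                    "coordinates": coords,
                                },
                                "relation": "within",
                            }
                        }
                    },
                    {"range": {"metadata.o2r.temporal.begin": {"gte": lower}}},
                    {"range": {"metadata.o2r.temporal.end": {"lte": upper}}},
                ],
            }
        }
    }
```

Leaving `coords`, `lower`, or `upper` out of the body (rather than sending empty values) would give the step-by-step narrowing described in the use case, with each step adding one more filter clause.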

Related

ElasticSearch should with nested and bool must_not exists

With the following mapping:
"categories": {
"type": "nested",
"properties": {
"category": {
"type": "integer"
},
"score": {
"type": "float"
}
}
},
I want to use the categories field to return documents that either:
have a score above a threshold in a given category, or
do not have the categories field
This is my query:
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"terms": {
"categories.category": [
<id>
]
}
},
{
"range": {
"categories.score": {
"gte": 0.5
}
}
}
]
}
}
}
},
{
"bool": {
"must_not": [
{
"exists": {
"field": "categories"
}
}
]
}
}
],
"minimum_should_match": 1
}
}
}
It correctly returns documents both with and without the categories field, and orders the results so the ones I want come first, but it doesn't filter out the results with a score below the 0.5 threshold.
Great question.
That is because categories is not really a field from Elasticsearch's point of view (that is, a field for which an inverted index is created and used for querying/searching), whereas categories.category and categories.score are.
As a result, the exists query finds categories in no document, so the must_not clause is true for every document, and you get the results you observe.
Modify the query as shown below and your use case should work correctly.
POST <your_index_name>/_search
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"terms": {
"categories.category": [
"100"
]
}
},
{
"range": {
"categories.score": {
"gte": 0.5
}
}
}
]
}
}
}
},
{
"bool": {
"must_not": [ <----- Note this
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "categories.category"
}
},
{
"exists": {
"field": "categories.score"
}
}
]
}
}
}
}
]
}
}
],
"minimum_should_match": 1
}
}
}
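The same structure can be sketched in Python to make the two alternatives explicit. The helper name `score_or_missing` and its parameters are hypothetical; the query it builds mirrors the answer above, with the "field is missing" branch expressed as must_not around a nested exists check.

```python
def score_or_missing(category_id, min_score, path="categories"):
    """Match documents that score high enough in a category OR have no
    nested entries at all. Hypothetical helper mirroring the answer above."""
    # Branch 1: a nested entry with the wanted category and a high score.
    good_score = {
        "nested": {
            "path": path,
            "query": {
                "bool": {
                    "must": [
                        {"terms": {path + ".category": [category_id]}},
                        {"range": {path + ".score": {"gte": min_score}}},
                    ]
                }
            },
        }
    }
    # Branch 2: no nested entry exists. The exists checks must run inside
    # a nested query, because the sub-fields live on the nested documents.
    has_entries = {
        "nested": {
            "path": path,
            "query": {
                "bool": {
                    "must": [
                        {"exists": {"field": path + ".category"}},
                        {"exists": {"field": path + ".score"}},
                    ]
                }
            },
        }
    }
    return {
        "query": {
            "bool": {
                "should": [good_score, {"bool": {"must_not": [has_entries]}}],
                "minimum_should_match": 1,
            }
        }
    }
```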

Find distinct/unique people without a birthday or have a birthday earlier than 3/1/1963

We have some employees and need to find those whose birthday we haven't entered or who were born before 3/1/1963:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": [{ "exists": { "field": "birthday" } }]
}
},
{
"bool": {
"filter": [{ "range": {"birthday": { "lte": 19630301 }} }]
}
}
]
}
}
}
We now need to get distinct names: we only want 1 Jason, 1 Susan, etc. How do we apply a distinct filter to the "name" field while still filtering on the birthday as above? I've tried:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "birthday"
}
}
]
}
},
{
"bool": {
"filter": [
{
"range": {
"birthday": {
"lte": 19630301
}
}
}
]
}
}
]
}
},
"aggs": {
"uniq_gender": {
"terms": {
"field": "name"
}
}
},
"from": 0,
"size": 25
}
but I just get results with duplicate Jasons and Susans. At the bottom it shows me that there are 10 Susans and 12 Jasons. I'm not sure how to get unique ones.
EDIT:
My mapping is very simple. The name field doesn't need to be keyword; it can be text or anything else, as it is just a field that gets returned in the query.
{
"mappings": {
"birthdays": {
"properties": {
"name": {
"type": "keyword"
},
"birthday": {
"type": "date",
"format": "basic_date"
}
}
}
}
}
Without knowing your mapping, I'm guessing that your name field is not analyzed and can be used properly in a terms aggregation.
I suggest using a filter aggregation:
{
"aggs": {
"filtered_employes": {
"filter": {
"bool": {
"must": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "birthday"
}
}
]
}
},
{
"range": {
"birthday": {
"lte": 19630301
}
}
}
]
}
},
"aggs": {
"filtered_employes_by_name": {
"terms": {
"field": "name"
}
}
}
}
}
}
On the other hand, your query is not correct: you're applying a should bool filter. Change it to must and the aggregation will return only results from employees with (missing birthday) and (born before date).
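As a rough sketch of how the distinct names could be read from the aggregation rather than from the hits, the Python below builds a request with `"size": 0` and a terms aggregation on name, then collects one key per bucket. It assumes the two birthday conditions are alternatives (OR), as in the question's original query, since a document without a birthday field cannot also match a range on that field. The function names, the `by_name` and `sample` aggregation names, and the `top_hits` sub-aggregation (one example document per name) are illustrative choices, not part of the thread.

```python
def distinct_names_request(cutoff=19630301, max_names=25):
    """Request body: no hits, one terms bucket per distinct name."""
    birthday_condition = {
        "bool": {
            "should": [
                # No birthday entered ...
                {"bool": {"must_not": [{"exists": {"field": "birthday"}}]}},
                # ... or born on or before the cutoff.
                {"range": {"birthday": {"lte": cutoff}}},
            ],
            "minimum_should_match": 1,
        }
    }
    return {
        "size": 0,  # suppress the (duplicated) hits, keep only buckets
        "query": {"bool": {"filter": [birthday_condition]}},
        "aggs": {
            "by_name": {
                "terms": {"field": "name", "size": max_names},
                # One example document per name, if the fields are needed.
                "aggs": {"sample": {"top_hits": {"size": 1}}},
            }
        },
    }


def distinct_names(response):
    """Pull the unique names out of a search response dictionary."""
    buckets = response["aggregations"]["by_name"]["buckets"]
    return [bucket["key"] for bucket in buckets]
```

Each bucket's `doc_count` still reports how many documents share that name (the "10 Susans and 12 Jasons" from the question), but every name appears only once in the bucket list.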

ElasticSearch: How to apply regular expression on indices

I am trying to restrict the results of a search query to only those indices whose names match the abc-* pattern.
I tried the following regex, but it didn't work.
{
"sort": [
{
"timestamp": {
"order": "desc",
"unmapped_type": "boolean"
}
}
],
"query": {
"filtered": {
"query": {
"query_string": {
"regexp": {
"index": "abc-*"
}
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"timestamp": {
"gte": "now-24h"
}
}
}
]
}
}
}
}
}
Is it possible to use the indices query and apply a regex to it?
Even the following doesn't filter appropriately:
{
"sort": [
{
"timestamp": {
"order": "desc",
"unmapped_type": "boolean"
}
}
],
"query": {
"filtered": {
"query": {
"indices" : {
"query" : { "regexp" : { "index" : "abc-.*" } }
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"timestamp": {
"gte": "now-24h"
}
}
}
]
}
}
}
}
}
There's a much easier solution: specify your index pattern directly in the URL:
POST /abc-*/_search
{
"sort": [
{
"timestamp": {
"order": "desc",
"unmapped_type": "boolean"
}
}
],
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"range": {
"timestamp": {
"gte": "now-24h"
}
}
}
]
}
}
}
}
}
I'm not sure, but I faced the same problem in a different case. I think the problem is the - in "abc-*".
Just replace - with a space and it will work:
"index": "abc *"
The index pattern in the URL only supports native expressions, not regex expressions. It does solve the problem though.
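To illustrate the URL-based approach, here is a minimal Python sketch that keeps the index pattern in the request path and only the time restriction in the body. The helper name `search_request` and its parameters are hypothetical, and the body uses bool/filter instead of the `filtered` query, on the assumption that a newer Elasticsearch version is in use; no HTTP client is involved, the function only returns what would be sent.

```python
def search_request(index_pattern, since="now-24h"):
    """Return the (path, body) pair for a search limited to some indices.

    Hypothetical helper: the index wildcard goes into the URL path, so the
    query itself does not need to know anything about index names.
    """
    path = "/" + index_pattern + "/_search"
    body = {
        "sort": [{"timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                # Filter context: restricts by time without scoring.
                "filter": [{"range": {"timestamp": {"gte": since}}}]
            }
        },
    }
    return path, body
```

For example, `search_request("abc-*")` yields the path `/abc-*/_search`, matching the request line in the answer above.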

Filtered bool vs Bool query : elasticsearch

I have two queries in ES. They have different turnaround times on the same set of documents, yet both are conceptually doing the same thing. I have a few questions:
1- What is the difference between these two?
2- Which one is better to use?
3- If both are the same, why do they perform differently?
1. Filtered bool
{
"from": 0,
"size": 5,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "1987112602"
}
},
{
"term": {
"original_sender_address_number": "6870340319"
}
},
{
"range": {
"x_event_timestamp": {
"gte": "2016-07-01T00:00:00.000Z",
"lte": "2016-07-30T00:00:00.000Z"
}
}
}
]
}
}
}
},
"sort": [
{
"x_event_timestamp": {
"order": "desc",
"ignore_unmapped": true
}
}
]
}
2. Simple Bool
{
"query": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "1277478699"
}
},
{
"term": {
"original_sender_address_number": "8020564722"
}
},
{
"term": {
"cause_code": "573"
}
},
{
"range": {
"x_event_timestamp": {
"gt": "2016-07-13T13:51:03.749Z",
"lt": "2016-07-16T13:51:03.749Z"
}
}
}
]
}
},
"from": 0,
"size": 10,
"sort": [
{
"x_event_timestamp": {
"order": "desc",
"ignore_unmapped": true
}
}
]
}
Mapping:
{
"ccp": {
"mappings": {
"type1": {
"properties": {
"original_sender_address_number": {
"type": "string"
},
"called_party_address_number": {
"type": "string"
},
"cause_code": {
"type": "string"
},
"x_event_timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
.
.
.
}
}
}
}
}
Update 1:
I tried a bool/must query and a bool/filter query on the same set of data, but I found strange behaviour.
1-
The bool/must query is able to find the desired document:
{
"query": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "8701662243"
}
},
{
"term": {
"cause_code": "401"
}
}
]
}
}
}
2-
While the bool/filter query is not able to find the document. If I remove the second field condition, it finds the same record, with field2's value as 401.
{
"query": {
"bool": {
"filter": [
{
"term": {
"called_party_address_number": "8701662243"
}
},
{
"term": {
"cause_code": "401"
}
}
]
}
}
}
Update2:
Found a solution for suppressing the scoring phase of the bool/must query: wrapping it in "constant_score".
{
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "1235235757"
}
},
{
"term": {
"cause_code": "304"
}
}
]
}
}
}
}
}
The record we are trying to match has "called_party_address_number": "1235235757" and "cause_code": "304".
The first one uses the old 1.x query/filter syntax (i.e. filtered queries have been deprecated in favor of bool/filter).
The second one uses the new 2.x syntax but not in a filter context (i.e. you're using bool/must instead of bool/filter). The query with 2.x syntax which is equivalent to your first query (i.e. which runs in a filter context without score calculation = faster) would be this one:
{
"query": {
"bool": {
"filter": [
{
"term": {
"called_party_address_number": "1277478699"
}
},
{
"term": {
"original_sender_address_number": "8020564722"
}
},
{
"term": {
"cause_code": "573"
}
},
{
"range": {
"x_event_timestamp": {
"gt": "2016-07-13T13:51:03.749Z",
"lt": "2016-07-16T13:51:03.749Z"
}
}
}
]
}
},
"from": 0,
"size": 10,
"sort": [
{
"x_event_timestamp": {
"order": "desc",
"ignore_unmapped": true
}
}
]
}
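The relationship between the two syntaxes can be sketched as a small Python rewrite function. This is an illustrative helper (the name `filtered_to_bool` is hypothetical), based on the mapping described above: the `query` part of a filtered query becomes a `must` clause and the `filter` part becomes a `filter` clause of a bool query.

```python
def filtered_to_bool(query):
    """Rewrite a 1.x-style {"filtered": ...} query as a bool query.

    Hypothetical helper: queries without a top-level "filtered" key are
    returned unchanged.
    """
    if "filtered" not in query:
        return query
    filtered = query["filtered"]
    bool_query = {}
    if "query" in filtered:
        # The scored part keeps contributing to relevance.
        bool_query["must"] = filtered["query"]
    if "filter" in filtered:
        # The filter part stays in filter context (no score calculation).
        bool_query["filter"] = filtered["filter"]
    return {"bool": bool_query}
```

Applied to the first query in the question, which has only a `filter` part, this produces a bool query with a single `filter` clause and no scored clauses.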

Elasticsearch Aggregation Word Count with using Stopwords

I'm using Elasticsearch to store my data. I want to count the words in my documents, but I want to see the result without the stopwords. For example, in my current result 'and' is my top word, and I want to remove it. Currently I have 3802 stopwords in my stopword.txt, and I don't want any of them to be shown in the aggregation result. How can I do that? My current query:
{
"query": {
"bool": {
"must": [
{
"range": {
"date": {
"gte": "now-0d/d"
}
}
}
]
}
},
"aggs": {
"words": {
"terms": {
"size" : 0,
"field": "text"
}
}
}
}
The way I want the query to work is:
{
"aggs": {
"filtered": {
"query": {
"bool": {
"must": [
{
"range": {
"date": {
"gte": "now-0d/d"
}
}
}
]
}
},
"filter": {
"my_stop": {
"type": "stop",
"stopwords_path": "/work/projects/stop_words.txt"
}
},
"aggs": {
"words": {
"terms": {
"size" : 0,
"field": "text"
}
}
}
}
}
}
By the way, I have my stopwords list in my custom analyzer, but it doesn't work the way I want.

Resources