filtering on 2 values of same field - elasticsearch

I have a status field, which can have one of several values.
I can filter for data which has status completed. I can also filter for data which has status ongoing.
But I want to display the data with status completed and the data with status ongoing at the same time, and I don't know how to add a filter for 2 values on a single field.
How can I achieve this?
EDIT - Thanks for the answers, but that is not what I wanted.
Like here, where I have filtered for status:completed, I want to filter for 2 values in this exact way.
I know I can edit this filter and use your queries, but I need a simple way to do this (the query way is complex), as I have to show it to my marketing team and they don't have any idea about queries. I need to convince them.

If I understand your question correctly, you want to perform an aggregation on 2 values of a field.
This should be possible with a terms query, similar to this one:
{
"size" : 0,
"query" : {
"bool" : {
"must" : [ {
"terms" : {
"status" : [ "completed", "unpaid" ]
}
} ]
}
},
"aggs" : {
"freqs" : {
"terms" : {
"field" : "status"
}
}
}
}
This will give a result like this one:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"freqs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "unpaid",
"doc_count" : 4
}, {
"key" : "completed",
"doc_count" : 1
} ]
}
}
}
Here is my toy mapping definition:
{
"bookings" : {
"properties" : {
"status" : {
"type" : "keyword"
}
}
}
}

You need a filter aggregation.
{
"size": 0,
"aggs": {
"agg_name": {
"filter": {
"bool": {
"should": [
{
"terms": {
"status": [
"completed",
"ongoing"
]
}
}
]
}
}
}
}
}
Use the above query to get results like this:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 8,
"max_score": 0,
"hits": []
},
"aggregations": {
"agg_name": {
"doc_count": 6
}
}
}
The result you want is the doc_count.

For your reference: in an Elasticsearch bool query, should clauses behave like OR conditions.
{
"query":{
"bool":{
"should":[
{"term":{"status":"completed"}},
{"term":{"status":"ongoing"}}
]
}
},
"aggs" : {
"booking_status" : {
"terms" : {
"field" : "status"
}
}
}
}
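
The answers above all come down to matching one field against a list of values. As a minimal sketch (the helper name two_value_filter is made up for illustration, and nothing here talks to a real cluster), the same request body can be built in Python for any field and any number of values:

```python
import json


def two_value_filter(field, values):
    """Build a search body that keeps documents whose `field` equals any of `values`.

    Hypothetical helper: it only builds the request body as a dict.
    """
    return {
        "query": {
            "bool": {
                # filter context: clauses only include or exclude documents,
                # they do not contribute to the relevance score
                "filter": [{"terms": {field: list(values)}}]
            }
        }
    }


body = two_value_filter("status", ["completed", "ongoing"])
print(json.dumps(body, indent=2))
```

The resulting dict can be sent as the JSON body of a _search request with whatever client you already use.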


What is the performance impact of using multiple queries in one search vs. msearch in Elasticsearch?

I want to correlate queries and responses. For example, 10 responses should be returned for 10 queries.
Msearch (_msearch) satisfies this need, as it returns an empty result even if a query doesn't match. But I believe msearch is lower in performance than a single search (_search) request, which doesn't return as many responses as there are queries.
Questions:
Is there any performance difference between msearch and search (with a bool should query as below)?
How can I get number of requests = number of responses with a search query?
Multiple queries using search with bool should:
GET /index1/_search
{
"from": 0,
"size": 10,
"sort": [
{
"created_date": {
"order": "desc"
}
}
],
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"title": {
"value": "Title 1"
}
}
},
{
"exists": {
"field": "first_name"
}
},
{
"term": {
"field_name": {
"value": "Sample title 1"
}
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"title": {
"value": "Title 2"
}
}
},
{
"exists": {
"field": "last_name"
}
},
{
"term": {
"field_name": {
"value": "Sample title 2"
}
}
}
]
}
}
]
}
}
}
Response:
{
"took" : 15,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 2,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
Multiple queries using Msearch
GET index1/_msearch
{}
{"from":0,"size":10,"sort":[{"created_date":{"order":"desc"}}],"query":{"bool":{"must":[{"term":{"title":{"value":"Title 1"}}},{"exists":{"field":"first_name"}},{"term":{"field_name":{"value":"Sample title 1"}}}]}}}
{}
{"from":0,"size":10,"sort":[{"created_date":{"order":"desc"}}],"query":{"bool":{"must":[{"term":{"title":{"value":"Title 2"}}},{"exists":{"field":"last_name"}},{"term":{"field_name":{"value":"Sample title 2"}}}]}}}
Response:
{
"took" : 23,
"responses" : [
{
"took" : 21,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 2,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"status" : 200
},
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 2,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"status" : 200
}
]
}
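
To illustrate the one-response-per-query property of _msearch shown above, here is a minimal Python sketch. The function names build_msearch_body and correlate are assumptions made for this example; it only builds the newline-delimited body and pairs queries with responses by position, relying on _msearch returning responses in the same order as the requests.

```python
import json


def build_msearch_body(queries):
    """Serialize queries into the newline-delimited body that _msearch expects."""
    lines = []
    for query in queries:
        lines.append(json.dumps({}))     # empty header: use the index from the URL
        lines.append(json.dumps(query))  # the search body itself, on a single line
    return "\n".join(lines) + "\n"       # the body must end with a newline


def correlate(queries, msearch_response):
    """Pair each query with its response by position."""
    responses = msearch_response["responses"]
    if len(responses) != len(queries):
        raise ValueError("expected one response per query")
    return list(zip(queries, responses))


queries = [
    {"query": {"term": {"title": {"value": "Title 1"}}}},
    {"query": {"term": {"title": {"value": "Title 2"}}}},
]
ndjson_body = build_msearch_body(queries)

# Stand-in for a real response: two empty result sets, as in the example above.
fake_response = {"responses": [{"hits": {"hits": []}, "status": 200},
                               {"hits": {"hits": []}, "status": 200}]}
pairs = correlate(queries, fake_response)
print(len(pairs))
```

With a single _search and bool should, by contrast, the hits come back merged, so matching them back to the originating clause would need extra work on the client.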

Bucket Script Aggregation - Elastic Search

I'm trying to build a query in Elasticsearch to get the difference of two values.
Here's the code I'm using:
GET /monitora/_search
{
"size":0,
"aggs": {
"CALC_DIFF": {
"filters": {
"filters": {
"FTS_callback": {"term":{ "msgType": "panorama_fts"}},
"FTS_position": {"term":{ "msgType": "panorama_position"}}
}
},
"aggs": {
"subtract": {
"bucket_script": {
"buckets_path": {
"PCountCall": "_count",
"PcountPos":"_count"
},
"script": "params.PCountCall - params.PcountPos"
}
}
}
}
}
}
And this is what I get back when I run it:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"CALC_DIFF" : {
"buckets" : {
"FTS_callback" : {
"doc_count" : 73530,
"subtract" : {
"value" : 0.0
}
},
"FTS_position" : {
"doc_count" : 156418,
"subtract" : {
"value" : 0.0
}
}
}
}
}
}
However, instead of getting the subtraction inside these buckets (which will always be zero), I was looking for the subtraction of the counts on each bucket, which would return me (73530 - 156418) following this example.
After that, I would like to display the result as a "metric" visualization element in Kibana. Is it possible?
Could anyone give me a hand to get it right?
Thanks in advance!
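
One possible workaround, if the subtraction does not have to happen inside Elasticsearch, is to compute it on the client from the response above. This is only a sketch: the function name bucket_count_difference is hypothetical, and it does not by itself produce a Kibana metric visualization.

```python
def bucket_count_difference(response, agg_name, first, second):
    """Return doc_count(first) - doc_count(second) for a keyed filters aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return buckets[first]["doc_count"] - buckets[second]["doc_count"]


# Trimmed copy of the response shown in the question.
sample_response = {
    "aggregations": {
        "CALC_DIFF": {
            "buckets": {
                "FTS_callback": {"doc_count": 73530},
                "FTS_position": {"doc_count": 156418},
            }
        }
    }
}

diff = bucket_count_difference(sample_response, "CALC_DIFF", "FTS_callback", "FTS_position")
print(diff)  # -82888
```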

What is the difference in these elasticsearch queries?

I have the following elasticsearch query that returns plenty of results.
{
"query": {
"multi_match": {
"query": "swartz",
"fields": ["notes"]
}
},
"size": 20,
"from": 0,
"sort": {
"last_modified_date": {
"order": "desc"
}
}
}
I'm trying to redo it as a bool query so I can add should and must_not, but am getting no results and I'm not sure why.
{
"query": {
"bool": {
"must": [
{"term": { "notes": "swartz" }}
]
}
},
"size": 20,
"from": 0,
"sort": {
"last_modified_date": {
"order": "desc"
}
}
}
Instead of results, what I do get is this.
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 6,
"successful" : 5,
"skipped" : 0,
"failed" : 1,
"failures" : [
{
"shard" : 0,
"index" : ".kibana_1",
"node" : "E2fjoon_Smm5m7LFcQp9XQ",
"reason" : {
"type" : "query_shard_exception",
"reason" : "No mapping found for [last_modified_date] in order to sort on",
"index_uuid" : "0pZdhm_nRXWiWGcqFgvvHQ",
"index" : ".kibana_1"
}
}
]
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
First, I'm not sure why I get results, properly ordered, with the first query; and second, even if I take the sort out of the second query I still get no results.

The first query is a full-text match query (multi_match), which analyzes the search text and looks for any occurrence of "swartz" in the content of "notes".
In SQL terms it is roughly:
where notes ilike '%swartz%'
The second query is a term query, which is not analyzed and looks for an exact match in the field.
In SQL terms, roughly:
where notes = 'swartz'
That probably explains the behavior you see.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
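
If the goal is a bool query that still does full-text matching, one option is to put a match clause (rather than term) inside must. The sketch below assumes notes is an analyzed text field; the helper name notes_search is made up for illustration and only builds the request body.

```python
def notes_search(text, should=None, must_not=None):
    """Build a bool query that full-text matches `text` in the notes field.

    Hypothetical helper; `should` and `must_not` are optional lists of clauses.
    """
    bool_query = {"must": [{"match": {"notes": text}}]}
    if should:
        bool_query["should"] = list(should)
    if must_not:
        bool_query["must_not"] = list(must_not)
    return {
        "query": {"bool": bool_query},
        "size": 20,
        "from": 0,
        "sort": {"last_modified_date": {"order": "desc"}},
    }


body = notes_search("swartz", must_not=[{"match": {"notes": "draft"}}])
print(body["query"])
```

The must_not clause here is just an example of the kind of condition the question wanted to add.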

How to use composite aggregation with a single bucket

The following composite aggregation query
{
"query": {
"range": {
"orderedAt": {
"gte": 1591315200000,
"lte": 1591438881000
}
}
},
"size": 0,
"aggs": {
"my_buckets": {
"composite": {
"sources": [
{
"aggregation_target": {
"terms": {
"field": "supplierId"
}
}
}
]
},
"aggs": {
"aggregated_hits": {
"top_hits": {}
},
"filter": {
"bucket_selector": {
"buckets_path": {
"doc_count": "_count"
},
"script": "params.doc_count > 2"
}
}
}
}
}
}
returns something like below.
{
"took" : 67,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 34,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my_buckets" : {
"after_key" : {
"aggregation_target" : "0HQI2G2HG00100G8"
},
"buckets" : [
{
"key" : {
"aggregation_target" : "0HQI2G0K000100G8"
},
"doc_count" : 4,
"aggregated_hits" : {...}
},
{
"key" : {
"aggregation_target" : "0HQI2G18G00100G8"
},
"doc_count" : 11,
"aggregated_hits" : {...}
},
{
"key" : {
"aggregation_target" : "0HQI2G2HG00100G8"
},
"doc_count" : 16,
"aggregated_hits" : {...}
}
]
}
}
}
The aggregated results are put into buckets based on the condition set in the query.
Is there any way to put them in a single bucket and paginate through the whole result (i.e. 31 documents in this case)?

I don't think you can. A doc's context doesn't include information about other docs unless you perform a cardinality, scripted_metric or terms aggregation. Also, once you bucket your docs based on the supplierId, it'd sort of defeat the purpose of aggregating in the first place...
What you wrote above is as good as it gets, and you'll have to combine the aggregated_hits in a post-processing step.
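
The post-processing step mentioned above could look roughly like this in Python. It is a sketch under assumptions: the function names are made up, and the response is assumed to have the shape shown in the question, with each bucket's top_hits result under aggregated_hits.hits.hits. Note that top_hits returns only 3 hits per bucket by default, so its size would have to be raised to collect every document.

```python
def flatten_bucket_hits(response, agg_name="my_buckets", hits_name="aggregated_hits"):
    """Collect the top_hits documents of every bucket into one flat list."""
    docs = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        docs.extend(bucket[hits_name]["hits"]["hits"])
    return docs


def page(items, page_number, page_size):
    """Return one zero-based page of an already flattened list."""
    start = page_number * page_size
    return items[start:start + page_size]


# Small made-up response with the same shape as the one in the question.
sample_response = {
    "aggregations": {
        "my_buckets": {
            "buckets": [
                {"key": {"aggregation_target": "A"}, "doc_count": 3,
                 "aggregated_hits": {"hits": {"hits": [{"_id": "1"}, {"_id": "2"}, {"_id": "3"}]}}},
                {"key": {"aggregation_target": "B"}, "doc_count": 4,
                 "aggregated_hits": {"hits": {"hits": [{"_id": "4"}, {"_id": "5"}]}}},
            ]
        }
    }
}

all_docs = flatten_bucket_hits(sample_response)
print([d["_id"] for d in page(all_docs, 1, 2)])  # ['3', '4']
```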

Is it possible with aggregation to amalgamate all values of an array property from all grouped documents into the coalesced document?

I have documents with the format similar to the following:
[
{
"name": "fred",
"title": "engineer",
"division_id": 20,
"skills": [
"walking",
"talking"
]
},
{
"name": "ed",
"title": "ticket-taker",
"division_id": 20,
"skills": [
"smiling"
]
}
]
I would like to run an aggs query that would show the complete set of skills for the division: ie,
{
"aggs":{
"distinct_skills":{
"cardinality":{
"field":"division_id"
}
}
},
"_source":{
"includes":[
"division_id",
"skills"
]
}
}
.. so that the resulting hit would look like:
{
"division_id": 20,
"skills": [
"walking",
"talking",
"smiling"
]
}
I know I can retrieve inner_hits and iterate through the list and amalgamate values "manually". I assume it would perform better if I could do it in a query.

Just nest two terms aggregations as shown below:
POST <your_index_name>/_search
{
"size": 0,
"aggs": {
"my_division_ids": {
"terms": {
"field": "division_id",
"size": 10
},
"aggs": {
"my_skills": {
"terms": {
"field": "skills", <---- If it is not keyword field use `skills.keyword` field if using dynamic mapping.
"size": 10
}
}
}
}
}
}
Below is the sample response:
{
"took" : 490,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my_division_ids" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 20, <---- division_id
"doc_count" : 2,
"my_skills" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ <---- Skills
{
"key" : "smiling",
"doc_count" : 1
},
{
"key" : "talking",
"doc_count" : 1
},
{
"key" : "walking",
"doc_count" : 1
}
]
}
}
]
}
}
}
Hope this helps!
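
To get from the nested buckets above to the document shape asked for in the question, a small reshaping step on the client is enough. This is a sketch: the function name skills_by_division is made up, and it assumes the aggregation names my_division_ids and my_skills from the query above.

```python
def skills_by_division(response):
    """Turn the nested terms buckets into one {division_id, skills} dict per division."""
    result = []
    for division in response["aggregations"]["my_division_ids"]["buckets"]:
        result.append({
            "division_id": division["key"],
            "skills": [skill["key"] for skill in division["my_skills"]["buckets"]],
        })
    return result


# Trimmed copy of the sample response above.
sample_response = {
    "aggregations": {
        "my_division_ids": {
            "buckets": [
                {
                    "key": 20,
                    "doc_count": 2,
                    "my_skills": {
                        "buckets": [
                            {"key": "smiling", "doc_count": 1},
                            {"key": "talking", "doc_count": 1},
                            {"key": "walking", "doc_count": 1},
                        ]
                    },
                }
            ]
        }
    }
}

divisions = skills_by_division(sample_response)
print(divisions)  # [{'division_id': 20, 'skills': ['smiling', 'talking', 'walking']}]
```

Keep in mind that both terms aggregations in the query use "size": 10, so only the first 10 divisions and the first 10 skills per division come back unless that is raised.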
