What is the difference in these elasticsearch queries? - elasticsearch

I have the following elasticsearch query that returns plenty of results.
{
"query": {
"multi_match": {
"query": "swartz",
"fields": ["notes"]
}
},
"size": 20,
"from": 0,
"sort": {
"last_modified_date": {
"order": "desc"
}
}
}
I'm trying to redo it as a bool query so I can add should and must_not, but am getting no results and I'm not sure why.
{
"query": {
"bool": {
"must": [
{"term": { "notes": "swartz" }}
]
}
},
"size": 20,
"from": 0,
"sort": {
"last_modified_date": {
"order": "desc"
}
}
}
Instead of results, what I do get is this.
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 6,
"successful" : 5,
"skipped" : 0,
"failed" : 1,
"failures" : [
{
"shard" : 0,
"index" : ".kibana_1",
"node" : "E2fjoon_Smm5m7LFcQp9XQ",
"reason" : {
"type" : "query_shard_exception",
"reason" : "No mapping found for [last_modified_date] in order to sort on",
"index_uuid" : "0pZdhm_nRXWiWGcqFgvvHQ",
"index" : ".kibana_1"
}
}
]
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
First, I'm not sure why I get results and it orders properly with the first query, and secondly, even if I take the sort out of the second query I still get no results.

The first query is a match query, which looks for the term "swartz" anywhere in the analyzed content of "notes".
In SQL terms it's roughly:
where notes ilike '%swartz%'
The second query is a term query, which looks for an exact match on the value stored in the field, without analyzing the search text.
In SQL terms it's roughly:
where notes = 'swartz'
That probably explains the behavior you're seeing.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
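To make the difference concrete, here is a minimal sketch in plain Python, not Elasticsearch itself. The analyze function is a rough stand-in for the standard analyzer (lowercase, split on non-alphanumerics), and the sample note is made up. It suggests that whether a term query matches depends on how notes is mapped (text or keyword) and on the exact casing of the value.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def match_query(indexed_terms, query_text):
    # match analyzes the query text first, then looks for any resulting token
    return any(tok in indexed_terms for tok in analyze(query_text))

def term_query(indexed_terms, value):
    # term does not analyze: the value must equal an indexed term exactly
    return value in indexed_terms

note = "Call Mr. Swartz on Monday"   # made-up document content
text_terms = analyze(note)           # roughly how a text field indexes the note
keyword_terms = [note]               # a keyword field indexes the whole value as one term

print(match_query(text_terms, "Swartz"))    # True
print(term_query(text_terms, "Swartz"))     # False: indexed token is lowercase
print(term_query(text_terms, "swartz"))     # True
print(term_query(keyword_terms, "swartz"))  # False: only the full value would match
```

If the real mapping differs from these assumptions, the outcome will differ too, so checking the mapping of notes is a sensible first step.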

Related

What is the performance impact of using multiple queries in search rather than msearch in Elasticsearch

I want to correlate queries and responses. For example, 10 queries should return 10 responses.
Msearch (_msearch) meets that need, because it returns an empty result even when a query matches nothing. But I believe msearch performs worse than a single search (_search) request, which does not return one response per query.
Questions:
Is there any performance difference between msearch and search (with a bool query as below)?
How can I get number of requests = number of responses with a search query?
Multiple queries using search with bool should.
GET /index1/_search
{
"from": 0,
"size": 10,
"sort": [
{
"created_date": {
"order": "desc"
}
}
],
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"title": {
"value": "Title 1"
}
}
},
{
"exists": {
"field": "first_name"
}
},
{
"term": {
"field_name": {
"value": "Sample title 1"
}
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"title": {
"value": "Title 2"
}
}
},
{
"exists": {
"field": "last_name"
}
},
{
"term": {
"field_name": {
"value": "Sample title 2"
}
}
}
]
}
}
]
}
}
}
Response:
{
"took" : 15,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 2,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
Multiple queries using Msearch
GET index1/_msearch
{}
{"from":0,"size":10,"sort":[{"created_date":{"order":"desc"}}],"query":{"bool":{"must":[{"term":{"title":{"value":"Title 1"}}},{"exists":{"field":"first_name"}},{"term":{"field_name":{"value":"Sample title 1"}}}]}}}
{}
{"from":0,"size":10,"sort":[{"created_date":{"order":"desc"}}],"query":{"bool":{"must":[{"term":{"title":{"value":"Title 2"}}},{"exists":{"field":"last_name"}},{"term":{"field_name":{"value":"Sample title 2"}}}]}}}
Response:
{
"took" : 23,
"responses" : [
{
"took" : 21,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 2,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"status" : 200
},
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 2,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"status" : 200
}
]
}
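As a sketch of how the one-response-per-query property of _msearch can be used, the Python below builds the newline-delimited request body and pairs each query with its response by position. The function names and the trimmed response are hypothetical, and no client library is assumed; it relies only on _msearch returning responses in request order.

```python
import json

def build_msearch_body(queries, header=None):
    """Serialize search bodies into the newline-delimited format _msearch expects."""
    lines = []
    for body in queries:
        lines.append(json.dumps(header or {}))  # header line; {} means use the index from the URL
        lines.append(json.dumps(body))          # the search body itself, on a single line
    return "\n".join(lines) + "\n"              # the request body must end with a newline

def correlate(queries, msearch_response):
    """Pair each query with its response; _msearch keeps the request order."""
    return list(zip(queries, msearch_response["responses"]))

queries = [
    {"query": {"bool": {"must": [{"term": {"title": {"value": "Title 1"}}}]}}},
    {"query": {"bool": {"must": [{"term": {"title": {"value": "Title 2"}}}]}}},
]
body = build_msearch_body(queries)

# Hypothetical response, trimmed to the fields used here.
fake_response = {"responses": [
    {"hits": {"total": {"value": 0}}, "status": 200},
    {"hits": {"total": {"value": 0}}, "status": 200},
]}
pairs = correlate(queries, fake_response)
for query, response in pairs:
    title = query["query"]["bool"]["must"][0]["term"]["title"]["value"]
    print(title, response["hits"]["total"]["value"])
```

This says nothing about relative performance, which would have to be measured on the actual cluster.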

Finding number of files in latest batch using aggregations

I've asked this same question in the Elasticsearch forums, but haven't gotten an answer yet.
Here is the scenario:
I have an index that contains documents representing files uploaded to a file store. These documents are uploaded in batches, and each doc is tagged with a batch_id field as well as the time it was uploaded. They also have a status field that tracks whether the file passed or failed ingestion.
Some example docs:
{
"batch_id" : "a",
"file_name" : "file_1_from_batch_a",
"#timestamp" : "2021-10-12T18:12:54.331Z",
"status" : "success"
}
{
"batch_id" : "a",
"file_name" : "file_2_from_batch_a",
"#timestamp" : "2021-10-12T00:00:00.000Z",
"status" : "success"
}
{
"batch_id" : "b",
"file_name" : "file_1_from_batch_b",
"#timestamp" : "2021-10-13T18:13:00.000Z",
"status" : "failure"
}
{
"batch_id" : "b",
"file_name" : "file_2_from_batch_b",
"#timestamp" : "2021-10-13T18:10:22.450Z",
"status" : "failure"
}
I wish to perform an aggregation query over the index to find out how many failures have occurred in the latest batch of files.
Here's what I've come up with so far, but sadly it's not giving the right answer:
GET my-index/_search
{
"size": 0,
"aggs": {
"most_recent" : {
"terms": {
"field" : "#timestamp",
"order": { "_term": "desc" },
"size": 1
},
"aggs": {
"execution_id": {
"terms": {
"field": "batch_id.keyword"
},
"aggs": {
"failures": {
"filter": {"term": {"status.keyword": "failure"}}
}
}
}
}
}
}
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"most_recent" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 3,
"buckets" : [
{
"key" : 1634148780000,
"key_as_string" : "2021-10-13T18:13:00.000Z",
"doc_count" : 1,
"execution_id" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "b",
"doc_count" : 1,
"failures" : {
"doc_count" : 1
}
}
]
}
}
]
}
}
}
The query is giving me the batch_id of the most recent batch (which is good), but is incorrectly telling me how many files in that batch failed (it should be 2).
I would appreciate any help on this!
You're aggregating on the timestamp, so every bucket has a doc_count of 1 unless two documents share the exact same timestamp.
Because the buckets are built on unique timestamps, the documents are never grouped by batch_id first.
To see this, remove the size parameter from the terms aggregation: the result will contain all four documents, two from batch a and two from batch b, each "grouped" under its own timestamp.
"terms": {
"field" : "#timestamp",
"order": { "_term": "desc" }
}
That explains the result, but I can't currently test a working version, so I'm leaving this here as a draft.
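One possible direction, untested against a live cluster: group by batch_id first, order the batches by their newest timestamp, and count failures inside the winning batch. The Python sketch below checks that logic on the four example docs and then expresses the same idea as a request body. The aggregation names latest_batch and latest_upload are made up for illustration, and the field names are taken from the question as written.

```python
from collections import defaultdict

docs = [
    {"batch_id": "a", "#timestamp": "2021-10-12T18:12:54.331Z", "status": "success"},
    {"batch_id": "a", "#timestamp": "2021-10-12T00:00:00.000Z", "status": "success"},
    {"batch_id": "b", "#timestamp": "2021-10-13T18:13:00.000Z", "status": "failure"},
    {"batch_id": "b", "#timestamp": "2021-10-13T18:10:22.450Z", "status": "failure"},
]

def latest_batch_failures(docs):
    latest = {}                  # batch_id -> newest timestamp seen in that batch
    failures = defaultdict(int)  # batch_id -> number of failed files
    for doc in docs:
        batch = doc["batch_id"]
        # ISO-8601 UTC strings in the same format compare correctly as text
        latest[batch] = max(latest.get(batch, ""), doc["#timestamp"])
        if doc["status"] == "failure":
            failures[batch] += 1
    newest = max(latest, key=latest.get)
    return newest, failures[newest]

# The same idea as a request body: bucket by batch, order buckets by a max sub-aggregation.
query = {
    "size": 0,
    "aggs": {
        "latest_batch": {
            "terms": {
                "field": "batch_id.keyword",
                "order": {"latest_upload": "desc"},
                "size": 1,
            },
            "aggs": {
                "latest_upload": {"max": {"field": "#timestamp"}},
                "failures": {"filter": {"term": {"status.keyword": "failure"}}},
            },
        }
    },
}

print(latest_batch_failures(docs))  # ('b', 2)
```

With this shape, the failures doc_count of the single returned bucket would be the number of failed files in the latest batch.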

Is it possible with aggregation to amalgamate all values of an array property from all grouped documents into the coalesced document?

I have documents with the format similar to the following:
[
{
"name": "fred",
"title": "engineer",
"division_id": 20,
"skills": [
"walking",
"talking"
]
},
{
"name": "ed",
"title": "ticket-taker",
"division_id": 20,
"skills": [
"smiling"
]
}
]
I would like to run an aggs query that would show the complete set of skills for the division: ie,
{
"aggs":{
"distinct_skills":{
"cardinality":{
"field":"division_id"
}
}
},
"_source":{
"includes":[
"division_id",
"skills"
]
}
}
.. so that the resulting hit would look like:
{
"division_id": 20,
"skills": [
"walking",
"talking",
"smiling"
]
}
I know I can retrieve inner_hits, iterate through the list, and amalgamate the values "manually". I assume it would perform better if I could do it in a query.
Just nest one Terms Aggregation inside another, as shown below:
POST <your_index_name>/_search
{
"size": 0,
"aggs": {
"my_division_ids": {
"terms": {
"field": "division_id",
"size": 10
},
"aggs": {
"my_skills": {
"terms": {
"field": "skills", <---- If it is not keyword field use `skills.keyword` field if using dynamic mapping.
"size": 10
}
}
}
}
}
}
Below is the sample response:
{
"took" : 490,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my_division_ids" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 20, <---- division_id
"doc_count" : 2,
"my_skills" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ <---- Skills
{
"key" : "smiling",
"doc_count" : 1
},
{
"key" : "talking",
"doc_count" : 1
},
{
"key" : "walking",
"doc_count" : 1
}
]
}
}
]
}
}
}
Hope this helps!
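To get from that response to the shape asked for in the question, the buckets can be flattened on the client side. This is a small Python sketch; the function name is hypothetical, and the response literal is the sample above trimmed to the fields that are read.

```python
def flatten_division_skills(response):
    """Turn the nested terms buckets into one record per division."""
    records = []
    for division in response["aggregations"]["my_division_ids"]["buckets"]:
        # each inner bucket key is one distinct skill within the division
        skills = [bucket["key"] for bucket in division["my_skills"]["buckets"]]
        records.append({"division_id": division["key"], "skills": skills})
    return records

# Sample response from above, trimmed to the fields used here.
response = {
    "aggregations": {
        "my_division_ids": {
            "buckets": [
                {
                    "key": 20,
                    "doc_count": 2,
                    "my_skills": {
                        "buckets": [
                            {"key": "smiling", "doc_count": 1},
                            {"key": "talking", "doc_count": 1},
                            {"key": "walking", "doc_count": 1},
                        ]
                    },
                }
            ]
        }
    }
}

print(flatten_division_skills(response))
# [{'division_id': 20, 'skills': ['smiling', 'talking', 'walking']}]
```

Note that each terms aggregation only returns as many buckets as its size allows (10 in the query above), so a division with more skills than that would need a larger size.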

filtering on 2 values of same field

I have a status field, which can have one of the following values,
I can filter for data which have status completed. I can also see data which has ongoing.
But I want to display the data which have status completed and ongoing at the same time.
But I don't know how to add filters for 2 values on a single field.
How can I achieve what I want ?
EDIT - Thanks for the answers, but that is not what I wanted.
Here I have filtered for status:completed, and I want to filter for 2 values in this exact way.
I know I can edit this filter and use your queries, but I need a simpler way to do this (the query approach is complex), because I have to show it to my marketing team, who have no idea about queries. I need to convince them.
If I understand your question correctly, you want to perform an aggregation on 2 values of a field.
This should be possible with a query similar to this one with a terms query:
{
"size" : 0,
"query" : {
"bool" : {
"must" : [ {
"terms" : {
"status" : [ "completed", "unpaid" ]
}
} ]
}
},
"aggs" : {
"freqs" : {
"terms" : {
"field" : "status"
}
}
}
}
This will give a result like this one:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"freqs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "unpaid",
"doc_count" : 4
}, {
"key" : "completed",
"doc_count" : 1
} ]
}
}
}
Here is my toy mapping definition:
{
"bookings" : {
"properties" : {
"status" : {
"type" : "keyword"
}
}
}
}
You need a filter in aggregation.
{
"size": 0,
"aggs": {
"agg_name": {
"filter": {
"bool": {
"should": [
{
"terms": {
"status": [
"completed",
"ongoing"
]
}
}
]
}
}
}
}
}
Use the above query to get results like this:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 8,
"max_score": 0,
"hits": []
},
"aggregations": {
"agg_name": {
"doc_count": 6
}
}
}
The result you want is the doc_count.
For reference, in an Elasticsearch bool query, should behaves like an OR condition:
{
"query":{
"bool":{
"should":[
{"term":{"status":"completed"}},
{"term":{"status":"ongoing"}}
]
}
},
"aggs" : {
"booking_status" : {
"terms" : {
"field" : "status"
}
}
}
}
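The OR behaviour of should, and its equivalence to a single terms query over the same values, can be illustrated with a small simulation in plain Python. Nothing here calls Elasticsearch; the helper functions and the sample statuses are made up, and the sketch assumes a bool query with only should clauses, where at least one clause has to match.

```python
def term(field, value):
    # matches documents whose field equals the value exactly
    return lambda doc: doc.get(field) == value

def terms(field, values):
    # matches documents whose field equals any of the listed values
    return lambda doc: doc.get(field) in values

def bool_should(*clauses):
    # with no must or filter clauses, at least one should clause has to match
    return lambda doc: any(clause(doc) for clause in clauses)

# Made-up sample data; status is assumed to be a keyword field.
docs = [
    {"id": 1, "status": "completed"},
    {"id": 2, "status": "ongoing"},
    {"id": 3, "status": "unpaid"},
    {"id": 4, "status": "completed"},
]

via_should = bool_should(term("status", "completed"), term("status", "ongoing"))
via_terms = terms("status", ["completed", "ongoing"])

should_ids = [doc["id"] for doc in docs if via_should(doc)]
terms_ids = [doc["id"] for doc in docs if via_terms(doc)]
print(should_ids, terms_ids)  # [1, 2, 4] [1, 2, 4]
```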

Make Elasticsearch return the number of all documents on query

When I do a query Elasticsearch returns how many hits I get. Can I also get it to reply how many documents it has in total?
Here I've added the imaginary field sum_documents to the result. Does such a thing exist, or do I have to make an extra query to fetch the sum?
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"sum_documents": 500,
"max_score" : null,
"hits" : [ ]
}
}
You can add a global aggregation in your query, and it will return the total document count in your search context (index/alias + type(s))
{
"query": {
"query_string": {
"query": "viking",
"default_operator": "AND"
}
},
"aggs": {
"harvester-test": {
"global": {}
}
}
}
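Reading both numbers back out of the response could look like the Python sketch below. The function name is hypothetical and the response is a made-up example using the numbers from the question. It also allows for hits.total being a plain number in older versions and an object with a value field in 7.x and later.

```python
def matched_and_total(response, global_agg="harvester-test"):
    """Return (documents matching the query, all documents in the search context)."""
    total_hits = response["hits"]["total"]
    # 7.x and later report hits.total as an object, earlier versions as a plain number
    matched = total_hits["value"] if isinstance(total_hits, dict) else total_hits
    # the global aggregation ignores the query, so its doc_count covers every document
    return matched, response["aggregations"][global_agg]["doc_count"]

# Made-up response, trimmed to the fields used here.
response = {
    "hits": {"total": 0, "max_score": None, "hits": []},
    "aggregations": {"harvester-test": {"doc_count": 500}},
}

print(matched_and_total(response))  # (0, 500)
```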
