Update and retrieve in a single query in Elasticsearch

I want to update the status field from "FAILED" to "IN_PROGRESS" for all the docs in one of my Elasticsearch indices that match the query below, and retrieve the updated docs.
{
"query": {
"bool": {
"must": {
"match": { "status": "FAILED" }
},
"filter": [
{
"range": {
"count": { "gte": "2" }
}
},
{
"range": {
"updated": { "gte": "now-2h" }
}
}
]
}
}
}
I know I can achieve this with two queries (update_by_query to update, then a GET to retrieve all the updated docs). The problem is that I want to update and retrieve all the updated docs in a single query.
Is there an efficient way to do this in a single query?

You can use the query below with "_source": false, which returns only the _id of each matching document.
POST multiapi/_search
{
"_source": false,
"query": {
"term": {
"status.keyword": {
"value": "FAILED"
}
}
}
}
From the response you can collect all the _ids and pass them to the ids query below.
POST multiapi/_update_by_query
{
"query": {
"ids": {
"values": ["M1BbcX4Bo1YkEVbN1wG1","NFBbcX4Bo1YkEVbN3gHm"]
}
},
"script": {
"source": "ctx._source['status'] = 'IN_PROGRESS'"
}
}
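The two steps above can be glued together in client code. The following is a minimal Python sketch that only builds and reads request and response bodies; it assumes the usual hits.hits[]._id response shape, and the helper names (extract_ids, build_update_body) are hypothetical. It also passes the new status through script params instead of inlining it, which is a small variation on the script shown above.

```python
def extract_ids(search_response):
    """Collect the _id of every hit in a search response."""
    return [hit["_id"] for hit in search_response["hits"]["hits"]]


def build_update_body(ids, new_status):
    """Build an _update_by_query body that targets exactly the given ids."""
    return {
        "query": {"ids": {"values": list(ids)}},
        "script": {
            # The status is passed as a script parameter (an assumption of
            # this sketch; the answer above inlines the literal instead).
            "source": "ctx._source['status'] = params.status",
            "params": {"status": new_status},
        },
    }


# Hypothetical, trimmed response from the _search call with "_source": false.
sample_response = {
    "hits": {
        "hits": [
            {"_index": "multiapi", "_id": "M1BbcX4Bo1YkEVbN1wG1"},
            {"_index": "multiapi", "_id": "NFBbcX4Bo1YkEVbN3gHm"},
        ]
    }
}

ids = extract_ids(sample_response)
update_body = build_update_body(ids, "IN_PROGRESS")
```

Because the ids are already known on the client, the updated documents can be fetched afterwards by the same ids, for example with another ids query.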
Also, if your index holds a large set of documents, use search_after to retrieve more than 10k documents.
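A rough sketch of how search_after paging might look on the client side, again building request bodies only. It assumes the search is sorted (here on the "updated" field from the question, which is an assumption about the mapping) so that each hit carries a "sort" array; the helper names are hypothetical.

```python
def build_page_body(query, sort, page_size=1000):
    """Build the body for one page of a sorted search that returns ids only."""
    return {
        "size": page_size,
        "_source": False,
        "query": query,
        "sort": sort,
    }


def with_search_after(body, last_hit):
    """Return a copy of body that resumes after the last hit of the previous page."""
    next_body = dict(body)
    # Each hit of a sorted search carries its sort values in "sort".
    next_body["search_after"] = last_hit["sort"]
    return next_body


first_body = build_page_body(
    {"term": {"status.keyword": {"value": "FAILED"}}},
    sort=[{"updated": "asc"}],  # assumed sort field
)

# Hypothetical last hit of the first page.
last_hit = {"_id": "NFBbcX4Bo1YkEVbN3gHm", "sort": [1641900000000]}
second_body = with_search_after(first_body, last_hit)
```

A sort that is not unique per document can skip or repeat hits between pages, so a tiebreaker is usually added to the sort in practice.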

Related

What is the best way to aggregate the time between events in ElasticSearch?

I'm querying an ElasticSearch database in which several applications are logging every change they make to a shared entity - each application is responsible for managing different aspects of this shared entity. The entity is persisted in a document-database, but each change is persisted in this ElasticSearch database.
I'm attempting to query for changes to a specific property (status) in order to track the lifecycle of these Product entities over time. I need to be able to dynamically answer questions like:
Over the last N weeks, what's the average time it took for a Product to move from status-"Created" to status-"Details Submitted"?
During a specific time range, what's the average time it took for a Product to move from status-"Reviewed" to status-"Available Online"?
How long did it take for Products in Group-A to move from status-"Details Submitted" to status-"Reviewed"?
In SQL I might use the group-by clause and perhaps some sub-queries, like:
select avg(submitted), avg(reviewed)
from (
select id,
max(timestamp) as reviewed,
min(timestamp) as submitted,
count(*) as statusChanges
from changes
where (
(key = 'status' and previous = 'Created' and updated = 'Details Submitted')
or (key = 'status' and previous = 'Details Submitted' and updated = 'Reviewed')
) and timestamp > ? and timestamp < ? and group_id = ?
group by id
)
where statusChanges = 2
What's the best way to accomplish something comparable in ElasticSearch?
I've tried using a composite aggregation, which works decently when I need to examine the specific dates on which each Product changed its status, since it allows pagination. However, this doesn't allow any further sorting of results or overall aggregation: you can only sort by the field you grouped by, and you can't aggregate across all products.
I've just recently come across the concept of a Transform index. Is that the best approach for aggregating the results of an aggregation? I haven't gotten access to try this out yet, but I'm attempting to formulate a potential Transform index now and struggling a bit.
Here's the composite query I was able to write for finding out how long each Product remained in a specific status, although I couldn't figure out how to get min_doc_count to work in a composite query...
// GET: https://<my-cluster-hostname>:9092/product-index/_search
{
"size": 0,
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"match_phrase": {
"change.key": "status"
}
},
{
"match_phrase": {
"change.previousValue": "Created"
}
},
{
"match_phrase": {
"change.updatedValue": "Details Submitted"
}
}
]
}
},
{
"bool": {
"must": [
{
"match_phrase": {
"change.key": "status"
}
},
{
"match_phrase": {
"change.previousValue": "Details Submitted"
}
},
{
"match_phrase": {
"change.updatedValue": "Reviewed"
}
}
]
}
}
]
}
},
"aggs": {
"how-long-before-submitted-details-reviewed": {
"composite": {
"size": 20,
"after": {
"product": "<last_uuid_from_previous_page>"
},
"sources": [
{
"product": {
"terms": {
"field": "metadata.uuid.keyword",
"order": "desc"
}
}
}
]
},
"aggs": {
"detailsSubmitted": {
"min": {
"field": "timestamp"
}
},
"detailsReviewed": {
"max": {
"field": "timestamp"
}
}
}
}
}
}
Here's the Transform Index I'm thinking of submitting. But I wonder if there's a way of getting it to cover all status changes, or if instead I'll need to create an index for each status change like this and then filter/sort/aggregate over this Transform Index:
// PUT: https://<my-cluster-hostname>:9092/_transform/details-submitted-to-reviewed
{
"source": {
"index": "product-index",
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"match_phrase": {
"change.key": "status"
}
},
{
"match_phrase": {
"change.previousValue": "Created"
}
},
{
"match_phrase": {
"change.updatedValue": "Details Submitted"
}
}
]
}
},
{
"bool": {
"must": [
{
"match_phrase": {
"change.key": "status"
}
},
{
"match_phrase": {
"change.previousValue": "Details Submitted"
}
},
{
"match_phrase": {
"change.updatedValue": "Reviewed"
}
}
]
}
}
]
}
}
},
"dest": {
"index": "details-submitted-to-reviewed"
},
"pivot": {
"group_by": {
"product-id": {
"terms": {
"field": "metadata.uuid.keyword"
}
}
},
"aggregations": {
"detailsSubmitted": {
"min": {
"field": "timestamp"
}
},
"detailsReviewed": {
"max": {
"field": "timestamp"
}
}
}
}
}
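Until a transform is available, one option is to finish the calculation on the client from the buckets returned by the composite query shown earlier in this question. The sketch below is only an illustration under assumptions: it expects buckets shaped like a composite aggregation response (doc_count plus the detailsSubmitted and detailsReviewed sub-aggregations, whose values are epoch milliseconds), it mimics the SQL "statusChanges = 2" filter with doc_count, and the function name and sample data are hypothetical.

```python
def average_duration_ms(buckets, required_changes=2):
    """Average time between the min and max timestamp of each product.

    Only buckets with exactly `required_changes` matching documents are
    counted, which mirrors `where statusChanges = 2` in the SQL version.
    """
    durations = [
        bucket["detailsReviewed"]["value"] - bucket["detailsSubmitted"]["value"]
        for bucket in buckets
        if bucket["doc_count"] == required_changes
    ]
    if not durations:
        return None
    return sum(durations) / len(durations)


# Hypothetical buckets; timestamps are shortened for readability.
sample_buckets = [
    {"key": {"product": "uuid-a"}, "doc_count": 2,
     "detailsSubmitted": {"value": 1000.0}, "detailsReviewed": {"value": 4000.0}},
    {"key": {"product": "uuid-b"}, "doc_count": 2,
     "detailsSubmitted": {"value": 2000.0}, "detailsReviewed": {"value": 3000.0}},
    # Only one of the two status changes was logged, so this bucket is skipped.
    {"key": {"product": "uuid-c"}, "doc_count": 1,
     "detailsSubmitted": {"value": 500.0}, "detailsReviewed": {"value": 500.0}},
]

average = average_duration_ms(sample_buckets)
```

With paginated composite results, the buckets of every page would need to be collected before averaging.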

Elasticsearch: how can I perform a "terms" and a "range" query together?

In Elasticsearch, the terms query works well for me when searching for multiple IDs in one query.
My original terms query:
{
"query": {
"terms": {
"Id": ["134","156"]
}
}
}
However, I need to add an extra condition, like the following:
{
"query": {
"terms": {
"id": ["163","121","569","579"]
},
"range":{
"age":
{"gt":10}
}
}
}
The "id" field can be a long array.
You can combine both queries using a bool query:
{
"query": {
"bool": {
"must": [
{
"terms": {
"Id": [
"134",
"156"
]
}
},
{
"range": {
"age": {
"gt": 10
}
}
}
]
}
}
}
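When the id list is long, the body is easier to build in code than by hand. A minimal Python sketch follows; the function name is hypothetical, and it places both clauses under "filter" rather than "must", a variation that fits when relevance scoring is not needed.

```python
import json


def terms_and_range_query(ids, min_age):
    """Build a bool query that requires an Id match and age > min_age."""
    return {
        "query": {
            "bool": {
                # "filter" clauses must match, like "must", but do not
                # contribute to the score.
                "filter": [
                    {"terms": {"Id": list(ids)}},
                    {"range": {"age": {"gt": min_age}}},
                ]
            }
        }
    }


body = terms_and_range_query(["134", "156"], 10)
print(json.dumps(body, indent=2))
```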

Elasticsearch find documents based on result of a main query

I want to search for documents based on a field from the result of a main query.
For example, let's say that my doc contains only two fields:
userId
geopoint
I need a query that returns the document of a specific userId and the documents of users that are around his geopoint.
I didn't find a way to do this in one query, so for now I am making two queries (one to retrieve the doc of a user and one to retrieve the users around his geopoint).
Thanks
UPDATE 1
The first query:
GET users/_search
{
"query": {
"term": {
"userId": "10250000075114"
}
}
}
Then I make the second query for users around it
GET users/_search
{
"query": {
"function_score": {
"query": {
"bool": {
"must_not": {
"term": {
"userId": "10250000075114"
}
}
}
},
"functions": [
{
"gauss": {
"rank": {
"origin": "0.8",
"offset": "0.05",
"scale": "0.1"
}
}
},
{
"gauss": {
"startPoint": {
"origin": "32.547484,34.95457",
"offset": "5km",
"scale": "10km"
}
}
},
{
"script_score": {
"script": "_score"
}
}
]
}
}
}
Where the startPoint in the second query is the startPoint result of the first.
What you are looking for is a sub-query (as found in an RDBMS), but sub-queries are not available in Elasticsearch.
You can, however, filter on your user IDs and then find the users around only those users. Please refer to the bool query documentation for more info and examples.
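Since there is no sub-query, the two requests have to be chained by the client. A minimal Python sketch of that chaining, under assumptions: the first response has the usual hits.hits[]._source shape, the location is stored in a startPoint field as a "lat,lon" string (as in the second query of the question), and the function name is hypothetical. The rank and script_score functions from the question are left out for brevity.

```python
def nearby_users_query(first_response, user_id):
    """Build the second query from the user document found by the first one."""
    hits = first_response["hits"]["hits"]
    if not hits:
        raise LookupError(f"no document found for userId {user_id}")
    origin = hits[0]["_source"]["startPoint"]  # assumed field name
    return {
        "query": {
            "function_score": {
                # Exclude the user the search is centred on.
                "query": {"bool": {"must_not": {"term": {"userId": user_id}}}},
                "functions": [
                    {
                        "gauss": {
                            "startPoint": {
                                "origin": origin,
                                "offset": "5km",
                                "scale": "10km",
                            }
                        }
                    }
                ],
            }
        }
    }


# Hypothetical, trimmed response of the first query.
sample_first_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"userId": "10250000075114",
                                     "startPoint": "32.547484,34.95457"}}
        ]
    }
}

second_query = nearby_users_query(sample_first_response, "10250000075114")
```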

Filter and sort based on attributes in Terms lookup document in Elastic Search

I have some documents in my index:
POST "/index/thing/_bulk" -s -d'
{ "index":{ "_id": 1 } }
{ "title":"One thing"}
{ "index":{ "_id": 2 } }
{ "title":"Second thing"}
{ "index":{ "_id": 3 } }
{ "title":"Three things"}
{ "index":{ "_id": 4 } }
{ "title":"And so fourth"}
{ "index":{ "_id": 5 } }
{ "title":"Five things"}
'
I also have documents that contain a user's collection, linked to the other documents (things) through the document's id attribute, like so:
PUT /index/collection/1
{
"items": [
{"id": 1, "time_added": "2017-08-07T09:07:15.000Z", "condition": "fair"},
{"id": 3, "time_added": "2019-08-07T09:07:15.000Z", "condition": "good"},
{"id": 4, "time_added": "2016-08-07T09:07:15.000Z", "condition": "poor"}
]
}
I then use a terms lookup to get all the things in a user's collection, like so:
GET /documents/_search
{
"query" : {
"terms" : {
"_id" : {
"index" : "index",
"type" : "collection",
"id" : 1,
"path" : "items.id"
}
}
}
}
This works fine. I get the three documents in the collection and can search, sort and use aggregations like I want.
But is there a way to aggregate, filter and sort those documents based on the attributes (time_added or condition in this case) in the collection document? Say I wanted to sort based on time_added or filter for condition=="good" from the collection?
Maybe a script that can be applied to the collection to sort or filter the items in there? It feels like this is getting pretty close to an SQL-like left join, so maybe Elasticsearch is the wrong tool?
It looks like you need the nested data type.
Taking your data as an example:
Without nested type:
POST collection/_bulk?filter_path=_
{"index":{}}
{"items":[{"id":11,"time_added":"2017-08-07T09:07:15.000Z","condition":"fair"},{"id":13,"time_added":"2019-08-07T09:07:15.000Z","condition":"good"},{"id":14,"time_added":"2016-08-07T09:07:15.000Z","condition":"poor"}]}
{"index":{}}
{"items":[{"id":21,"time_added":"2017-09-07T09:07:15.000Z","condition":"fair"},{"id":23,"time_added":"2019-09-07T09:07:15.000Z","condition":"good"},{"id":24,"time_added":"2016-09-07T09:07:15.000Z","condition":"poor"}]}
{"index":{}}
{"items":[{"id":31,"time_added":"2017-10-07T09:07:15.000Z","condition":"fair"},{"id":33,"time_added":"2019-10-07T09:07:15.000Z","condition":"good"},{"id":34,"time_added":"2016-10-07T09:07:15.000Z","condition":"poor"}]}
{"index":{}}
{"items":[{"id":41,"time_added":"2017-11-07T09:07:15.000Z","condition":"fair"},{"id":43,"time_added":"2019-11-07T09:07:15.000Z","condition":"good"},{"id":44,"time_added":"2016-11-07T09:07:15.000Z","condition":"poor"}]}
{"index":{}}
{"items":[{"id":51,"time_added":"2017-12-07T09:07:15.000Z","condition":"fair"},{"id":53,"time_added":"2019-12-07T09:07:15.000Z","condition":"good"},{"id":54,"time_added":"2016-12-07T09:07:15.000Z","condition":"poor"}]}
Query (you'd get incorrect results - expected one, got five):
GET collection/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"items.condition": {
"value": "good"
}
}
},
{
"range": {
"items.time_added": {
"lte": "2019-09-01"
}
}
}
]
}
}
}
Aggregation (incorrect results: look at the first bucket, "2016-08-01T00:00:00.000Z"; it contains 3 CONDITION sub-buckets, one for every condition type)
GET collection/_search
{
"size": 0,
"aggs": {
"DATE": {
"date_histogram": {
"field": "items.time_added",
"calendar_interval": "month"
},
"aggs": {
"CONDITION": {
"terms": {
"field": "items.condition.keyword",
"size": 10
}
}
}
}
}
}
With nested type
DELETE collection
PUT collection
{
"mappings": {
"properties": {
"items": {
"type": "nested"
}
}
}
}
# and POST the same data from above
Query (returns just one result)
GET collection/_search
{
"query": {
"nested": {
"path": "items",
"query": {
"bool": {
"must": [
{
"term": {
"items.condition": {
"value": "good"
}
}
},
{
"range": {
"items.time_added": {
"lte": "2019-09-01"
}
}
}
]
}
}
}
}
}
Aggregation (the first date bucket contains just one CONDITION sub-bucket)
GET collection/_search
{
"size": 0,
"aggs": {
"ITEMS": {
"nested": {
"path": "items"
},
"aggs": {
"DATE": {
"date_histogram": {
"field": "items.time_added",
"calendar_interval": "month"
},
"aggs": {
"CONDITION": {
"terms": {
"field": "items.condition.keyword",
"size": 10
}
}
}
}
}
}
}
}
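To see why the two mappings behave differently, here is a small pure-Python simulation of the matching logic on the same sample data. It is a simplified model, not how Elasticsearch is implemented: without nested, the fields of all items are effectively pooled per document, while nested evaluates each item on its own. Dates are compared by their ISO date prefix, which is a simplification of the range query, and all function names are hypothetical.

```python
def make_doc(n, month):
    """Recreate one sample document from the bulk request above."""
    return {"items": [
        {"id": n * 10 + 1, "time_added": f"2017-{month}-07T09:07:15.000Z", "condition": "fair"},
        {"id": n * 10 + 3, "time_added": f"2019-{month}-07T09:07:15.000Z", "condition": "good"},
        {"id": n * 10 + 4, "time_added": f"2016-{month}-07T09:07:15.000Z", "condition": "poor"},
    ]}


docs = [make_doc(n, month)
        for n, month in enumerate(["08", "09", "10", "11", "12"], start=1)]


def matches_flattened(doc, condition, before):
    """Object arrays: each clause may be satisfied by a different item."""
    conditions = {item["condition"] for item in doc["items"]}
    dates = [item["time_added"][:10] for item in doc["items"]]
    return condition in conditions and any(d <= before for d in dates)


def matches_nested(doc, condition, before):
    """Nested: both clauses must be satisfied by the same item."""
    return any(
        item["condition"] == condition and item["time_added"][:10] <= before
        for item in doc["items"]
    )


flattened_hits = [d for d in docs if matches_flattened(d, "good", "2019-09-01")]
nested_hits = [d for d in docs if matches_nested(d, "good", "2019-09-01")]
print(len(flattened_hits), len(nested_hits))  # prints "5 1"
```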
Hope that helps :)

Elasticsearch multi-field query is not working as expected

I've been facing some issues with a multi-field Elasticsearch query. I am trying to fetch all documents whose func_name field matches either of two hard-coded strings. Even though my index has documents with both of these function names, the query result always contains only one func_name. So far I have tried the following queries.
1) The following returns only one function match, even though the documents contain the other function as well:
GET /_search
{
"query": {
"multi_match": {
"query": "FEM_DS_GetTunerStatusInfo MDM_TunerStatusPrint",
"operator": "OR",
"fields": [
"func_name"
]
}
}
}
2) The following intermittently gives me both functions:
GET /_search
{
"query": {
"match": {
"func_name": {
"query": "MDM_TunerStatusPrint FEM_DS_GetTunerStatusInfo",
"operator": "or"
}
}
}
}
3) The following returns only one function match, even though the documents contain the other function as well:
{
"query": {
"bool": {
"should": [
{ "match": { "func_name": "FEM_DS_GetTunerStatusInfo" }},
{ "match": { "func_name": "MDM_TunerStatusPrint" }}
]
}
}
}
Any help is much appreciated.
Thanks for your reply. Let's assume that I have the following kinds of documents in my Elasticsearch index. I want my search to return the first two documents, as they match my func_name.
{
"_index": "diag-178999",
"_source": {
"severity": "MIL",
"t_id": "03468500",
"p_id": "000007c6",
"func_name": "MDM_TunerStatusPrint",
"timestamp": "2017-06-01T02:04:51.000Z"
}
},
{
"_index": "diag-344563",
"_source": {
"t_id": "03468500",
"p_id": "000007c6",
"func_name": "FEM_DS_GetTunerStatusInfo",
"timestamp": "2017-07-20T02:04:51.000Z"
}
},
{
"_index": "diag-101010",
"_source": {
"severity": "MIL",
"t_id": "03468500",
"p_id": "000007c6",
"func_name": "some_func",
"timestamp": "2017-09-15T02:04:51.000Z"
}
}
The two best ways to query Elasticsearch here are to filter by terms on a particular field, or to use an aggregation, which lets you name the result, apply multiple rules, and get a more readable response format.
See: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html. The general aggregations page is also very useful:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html
In your case, you should do:
{
"from" : 0, "size" : 2,
"query": {
"bool": {
"filter": {
"terms": {
"func_name" : ["FEM_DS_GetTunerStatusInfo", "MDM_TunerStatusPrint"]
}
}
}
}
}
OR
{
"aggs": {
"aggregationName": {
"terms": {
"field": "func_name",
"include": ["FEM_DS_GetTunerStatusInfo", "MDM_TunerStatusPrint"]
}
}
}
}
The aggregation at the end is only there to show you how to do the same thing as the query filter. Let me know if it works :)
Best regards
As I understand it, you should use a filtered query to match any document whose func_name is one of the values mentioned above:
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"terms": {
"func_name": [
"FEM_DS_GetTunerStatusInfo",
"MDM_TunerStatusPrint"
]
}
}
]
}
}
}
}
}
See:
Filtered Query, Terms Query
UPDATE for ES 5.0 and later (where the filtered query has been removed):
{
"query": {
"bool": {
"must": [
{
"terms": {
"func_name": [
"FEM_DS_GetTunerStatusInfo",
"MDM_TunerStatusPrint"
]
}
}
]
}
}
}
See: this answer
