I want to update the status field from "FAILED" to "IN_PROGRESS" in all the docs of an Elasticsearch index that match the query below, and retrieve the updated docs.
{
"query": {
"bool": {
"must": {
"match": { "status": "FAILED" }
},
"filter": [
{
"range": {
"count": { "gte": "2" }
}
},
{
"range": {
"updated": { "gte": "now-2h" }
}
}
]
}
}
}
I know I can achieve this with two queries (update_by_query to update, then a GET to retrieve all the updated docs). The problem is that I want to update and retrieve all the updated docs in a single query.
Is there an efficient way to do this in a single query?
You can use the query below with "_source": false, which returns only the _id of each matching document.
POST multiapi/_search
{
"_source": false,
"query": {
"term": {
"status.keyword": {
"value": "FAILED"
}
}
}
}
From the response, collect all the _id values and pass them to the ids query below.
POST multiapi/_update_by_query
{
"query": {
"ids": {
"values": ["M1BbcX4Bo1YkEVbN1wG1","NFBbcX4Bo1YkEVbN3gHm"]
}
},
"script": {
"source": "ctx._source['status'] = 'IN_PROGRESS'"
}
}
Also, if your index holds a large set of documents, use search_after to retrieve more than 10k documents.
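The two steps above can be sketched as plain request bodies. This is only a sketch under assumptions: it builds the JSON bodies and reads a search response, but sends nothing; the helper names are made up for illustration; and search_after needs a sort, so the caller has to pass one that ends in a unique tiebreaker field that exists in the index. It reuses the full query from the question (status plus both range filters) and passes the new status as a script parameter.

```python
from typing import List, Optional, Tuple


def build_id_search_body(sort: list, search_after: Optional[list] = None,
                         size: int = 1000) -> dict:
    """Body for POST <index>/_search that returns only ids, one page at a time."""
    body = {
        "_source": False,
        "size": size,
        "sort": sort,  # assumption: caller supplies a sort with a unique tiebreaker
        "query": {
            "bool": {
                "must": {"match": {"status": "FAILED"}},
                "filter": [
                    {"range": {"count": {"gte": 2}}},
                    {"range": {"updated": {"gte": "now-2h"}}},
                ],
            }
        },
    }
    if search_after is not None:
        body["search_after"] = search_after
    return body


def read_page(response: dict) -> Tuple[List[str], Optional[list]]:
    """Return the ids on this page and the sort values to use for the next page."""
    hits = response["hits"]["hits"]
    ids = [hit["_id"] for hit in hits]
    next_after = hits[-1]["sort"] if hits else None
    return ids, next_after


def build_update_body(ids: List[str], new_status: str = "IN_PROGRESS") -> dict:
    """Body for POST <index>/_update_by_query restricted to the collected ids."""
    return {
        "query": {"ids": {"values": list(ids)}},
        "script": {
            "source": "ctx._source.status = params.status",
            "params": {"status": new_status},
        },
    }
```

A caller would loop: search, read_page, update the returned ids, then search again with the returned sort values until a page comes back empty. The collected ids are then also the list of documents that were updated.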
I'm querying an ElasticSearch database in which several applications are logging every change they make to a shared entity - each application is responsible for managing different aspects of this shared entity. The entity is persisted in a document-database, but each change is persisted in this ElasticSearch database.
I'm attempting to query for changes to a specific property (status) in order to track the lifecycle of these Product entities over time. I need to be able to dynamically answer questions like:
Over the last N weeks, what's the average time it took for a Product to move from status-"Created" to status-"Details Submitted"?
During a specific time range, what's the average time it took for a Product to move from status-"Reviewed" to status-"Available Online"?
How long did it take for Products in Group-A to move from status-"Details Submitted" to status-"Reviewed"?
In SQL I might use the group-by clause and perhaps some sub-queries, like:
select avg(submitted), avg(reviewed)
from (
select id,
max(timestamp) as reviewed,
min(timestamp) as submitted,
count(*) as statusChanges
from changes
where (
(key = 'status' and previous = 'Created' and updated = 'Details Submitted')
or (key = 'status' and previous = 'Details Submitted' and updated = 'Reviewed')
) and timestamp > ? and timestamp < ? and group_id = ?
group by id
)
where statusChanges = 2
What's the best way to accomplish something comparable in ElasticSearch?
I've tried using a composite aggregation, which works decently when I need to examine the specific dates on which each Product changed its status, since it allows pagination. However, this doesn't allow any further sorting of results or overall aggregation: you can only sort by the field you grouped by, and you can't aggregate across all products.
I've just recently come across the concept of a Transform index. Is that the best approach for aggregating the results of an aggregation? I haven't gotten access to try this out yet, but I'm attempting to formulate a potential Transform Index now and struggling a bit.
Here's the composite query I was able to write for finding out how long each Product remained in a specific status, although I couldn't figure out how to get min_doc_count to work in a composite query...
// GET: https://<my-cluster-hostname>:9092/product-index/_search
{
"size": 0,
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"match_phrase": {
"change.key": "status"
}
},
{
"match_phrase": {
"change.previousValue": "Created"
}
},
{
"match_phrase": {
"change.updatedValue": "Details Submitted"
}
}
]
}
},
{
"bool": {
"must": [
{
"match_phrase": {
"change.key": "status"
}
},
{
"match_phrase": {
"change.previousValue": "Details Submitted"
}
},
{
"match_phrase": {
"change.updatedValue": "Reviewed"
}
}
]
}
}
]
}
},
"aggs": {
"how-long-before-submitted-details-reviewed": {
"composite": {
"size": 20,
"after": {
"product": "<last_uuid_from_previous_page>"
},
"sources": [
{
"product": {
"terms": {
"field": "metadata.uuid.keyword",
"order": "desc"
}
}
}
]
},
"aggs": {
"detailsSubmitted": {
"min": {
"field": "timestamp"
}
},
"detailsReviewed": {
"max": {
"field": "timestamp"
}
}
}
}
}
}
Here's the Transform Index I'm thinking of submitting. But I wonder if there's a way of getting it to cover all status changes, or if instead I'll need to create an index for each status change like this and then filter/sort/aggregate over this Transform Index:
// PUT: https://<my-cluster-hostname>:9092/_transform/details-submitted-to-reviewed
{
"source": {
"index": "product-index",
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"match_phrase": {
"change.key": "status"
}
},
{
"match_phrase": {
"change.previousValue": "Created"
}
},
{
"match_phrase": {
"change.updatedValue": "Details Submitted"
}
}
]
}
},
{
"bool": {
"must": [
{
"match_phrase": {
"change.key": "status"
}
},
{
"match_phrase": {
"change.previousValue": "Details Submitted"
}
},
{
"match_phrase": {
"change.updatedValue": "Reviewed"
}
}
]
}
}
]
}
}
},
"dest": {
"index": "details-submitted-to-reviewed"
},
"pivot": {
"group_by": {
"product-id": {
"terms": {
"field": "metadata.uuid.keyword"
}
}
},
"aggregations": {
"detailsSubmitted": {
"min": {
"field": "timestamp"
}
},
"detailsReviewed": {
"max": {
"field": "timestamp"
}
}
}
}
}
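One possible way to get the cross-product average out of the composite query is to do the last step on the client: page through the buckets, keep only products that have both status changes (the job min_doc_count would have done), and average the gap between the two timestamps. A minimal sketch, assuming each bucket has the usual composite shape with doc_count plus the detailsSubmitted and detailsReviewed sub-aggregations from the query above, and that their "value" fields are epoch milliseconds; the function name is made up for illustration.

```python
from typing import Iterable, Optional


def average_transition_ms(buckets: Iterable[dict],
                          required_changes: int = 2) -> Optional[float]:
    """Average (detailsReviewed - detailsSubmitted) over complete products.

    Buckets whose doc_count differs from required_changes are skipped, which
    mirrors the `where statusChanges = 2` clause of the SQL version.
    """
    durations = [
        bucket["detailsReviewed"]["value"] - bucket["detailsSubmitted"]["value"]
        for bucket in buckets
        if bucket["doc_count"] == required_changes
    ]
    if not durations:
        return None  # nothing to average
    return sum(durations) / len(durations)


# Hand-written buckets in the shape a composite aggregation page returns.
example_buckets = [
    {"key": {"product": "p1"}, "doc_count": 2,
     "detailsSubmitted": {"value": 1000.0}, "detailsReviewed": {"value": 4000.0}},
    {"key": {"product": "p2"}, "doc_count": 2,
     "detailsSubmitted": {"value": 2000.0}, "detailsReviewed": {"value": 3000.0}},
    # Only one of the two status changes was logged, so this product is skipped.
    {"key": {"product": "p3"}, "doc_count": 1,
     "detailsSubmitted": {"value": 5000.0}, "detailsReviewed": {"value": 5000.0}},
]
```

The same function could be fed the documents of a transform destination index instead, after adapting the field access, since the pivot in the question produces the same two timestamps per product.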
In Elasticsearch, the terms query works well for me when searching for multiple IDs in one query.
My original terms query:
{
"query": {
"terms": {
"Id": ["134","156"]
}
}
}
However, I need to add an extra condition, like the following:
{
"query": {
"terms": {
"id": ["163","121","569","579"]
},
"range":{
"age":
{"gt":10}
}
}
}
The "id" field can be a long array.
You can combine both queries using a bool query:
{
"query": {
"bool": {
"must": [
{
"terms": {
"Id": [
"134",
"156"
]
}
},
{
"range": {
"age": {
"gt": 10
}
}
}
]
}
}
}
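Since the list of ids can be long, the same bool query can be generated from code. A minimal sketch with made-up helper names: it puts both clauses in filter context rather than must (a swapped-in choice, reasonable when no relevance score is needed), and splits very long id lists into chunks, because a single terms query is capped by the index.max_terms_count setting (65,536 by default).

```python
from typing import List, Sequence


def chunk_ids(ids: Sequence[str], chunk_size: int = 65536) -> List[List[str]]:
    """Split a long id list so each terms query stays under the term limit."""
    ids = list(ids)
    return [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]


def build_ids_with_age_query(ids: Sequence[str], min_age_exclusive: int,
                             id_field: str = "Id") -> dict:
    """bool query: the id is one of `ids` AND age > min_age_exclusive."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"terms": {id_field: list(ids)}},
                    {"range": {"age": {"gt": min_age_exclusive}}},
                ]
            }
        }
    }
```

For example, build_ids_with_age_query(["134", "156"], 10) produces the same two clauses as the query above, and one request per chunk from chunk_ids covers lists beyond the limit.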
I want to search for documents based on a field from the result of a main query.
For example, let's say that my doc contains only two fields:
userId
geopoint
I need a query that returns the document of a specific userId plus the documents of users that are around his geopoint.
I didn't find a way to do this in one query, so for now I am making 2 queries (one to retrieve the doc of a user and one to retrieve the users around his geopoint).
Thanks
UPDATE 1
The first query:
GET users/_search
{
"query": {
"term": {
"userId": "10250000075114"
}
}
}
Then I make the second query for the users around it:
GET users/_search
{
"query": {
"function_score": {
"query": {
"bool": {
"must_not": {
"term": {
"userId": "10250000075114"
}
}
}
},
"functions": [
{
"gauss": {
"rank": {
"origin": "0.8",
"offset": "0.05",
"scale": "0.1"
}
}
},
{
"gauss": {
"startPoint": {
"origin": "32.547484,34.95457",
"offset": "5km",
"scale": "10km"
}
}
},
{
"script_score": {
"script": "_score"
}
}
]
}
}
}
where the startPoint in the second query is the startPoint result of the first.
What you are looking for is a sub-query (which exists in RDBMSs), but sub-queries are not available in Elasticsearch.
However, you can filter on your user IDs and then find the users around only those users; please refer to the boolean query documentation for more info and examples.
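Because there is no sub-query, the chaining has to happen in the client. A minimal sketch of that chaining, assuming the user document stores its location in a startPoint field (as the second query in the question suggests) in a form the gauss origin accepts; the helper names are made up, and nothing is sent to a cluster here.

```python
from typing import Optional


def build_user_lookup(user_id: str) -> dict:
    """First request: fetch the single document of the given user."""
    return {"size": 1, "query": {"term": {"userId": user_id}}}


def extract_start_point(response: dict) -> Optional[str]:
    """Pull startPoint out of the first hit, or None when the user is unknown."""
    hits = response["hits"]["hits"]
    if not hits:
        return None
    return hits[0]["_source"]["startPoint"]


def build_nearby_query(user_id: str, origin: str,
                       offset: str = "5km", scale: str = "10km") -> dict:
    """Second request: score other users by distance from the first user's point."""
    return {
        "query": {
            "function_score": {
                "query": {"bool": {"must_not": {"term": {"userId": user_id}}}},
                "functions": [
                    {"gauss": {"startPoint": {
                        "origin": origin, "offset": offset, "scale": scale}}}
                ],
            }
        }
    }
```

The result of extract_start_point on the first response is passed as origin to build_nearby_query, which is exactly the manual step described in the question, just made explicit.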
I have some documents in my index:
POST "/index/thing/_bulk" -s -d'
{ "index":{ "_id": 1 } }
{ "title":"One thing"}
{ "index":{ "_id": 2 } }
{ "title":"Second thing"}
{ "index":{ "_id": 3 } }
{ "title":"Three things"}
{ "index":{ "_id": 4 } }
{ "title":"And so fourth"}
{ "index":{ "_id": 5 } }
{ "title":"Five things"}
'
I also have documents which contain a user's collection, which are linked to the other documents (things) through the document's id attribute, like so:
PUT /index/collection/1
{
"items": [
{"id": 1, "time_added": "2017-08-07T09:07:15.000Z", "condition": "fair"},
{"id": 3, "time_added": "2019-08-07T09:07:15.000Z", "condition": "good"},
{"id": 4, "time_added": "2016-08-07T09:07:15.000Z", "condition": "poor"}
]
}
I then use a terms lookup to get all the things in a user's collection, like so:
GET /documents/_search
{
"query" : {
"terms" : {
"_id" : {
"index" : "index",
"type" : "collection",
"id" : 1,
"path" : "items.id"
}
}
}
}
This works fine. I get the three documents in the collection and can search, sort and use aggregations like I want.
But is there a way to aggregate, filter and sort those documents based on the attributes (time_added or condition in this case) in the collection document? Say I wanted to sort based on time_added or filter for condition=="good" from the collection?
Maybe a script that can be applied to collection to sort or filter the items in there? It feels like this is getting pretty close to sql like left-join, so maybe Elastic Search is the wrong tool?
It looks like you need the nested data type
Taking your data as an example:
Without nested type:
POST collection/_bulk?filter_path=_
{"index":{}}
{"items":[{"id":11,"time_added":"2017-08-07T09:07:15.000Z","condition":"fair"},{"id":13,"time_added":"2019-08-07T09:07:15.000Z","condition":"good"},{"id":14,"time_added":"2016-08-07T09:07:15.000Z","condition":"poor"}]}
{"index":{}}
{"items":[{"id":21,"time_added":"2017-09-07T09:07:15.000Z","condition":"fair"},{"id":23,"time_added":"2019-09-07T09:07:15.000Z","condition":"good"},{"id":24,"time_added":"2016-09-07T09:07:15.000Z","condition":"poor"}]}
{"index":{}}
{"items":[{"id":31,"time_added":"2017-10-07T09:07:15.000Z","condition":"fair"},{"id":33,"time_added":"2019-10-07T09:07:15.000Z","condition":"good"},{"id":34,"time_added":"2016-10-07T09:07:15.000Z","condition":"poor"}]}
{"index":{}}
{"items":[{"id":41,"time_added":"2017-11-07T09:07:15.000Z","condition":"fair"},{"id":43,"time_added":"2019-11-07T09:07:15.000Z","condition":"good"},{"id":44,"time_added":"2016-11-07T09:07:15.000Z","condition":"poor"}]}
{"index":{}}
{"items":[{"id":51,"time_added":"2017-12-07T09:07:15.000Z","condition":"fair"},{"id":53,"time_added":"2019-12-07T09:07:15.000Z","condition":"good"},{"id":54,"time_added":"2016-12-07T09:07:15.000Z","condition":"poor"}]}
Query (you'd get incorrect results - expected one, got five):
GET collection/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"items.condition": {
"value": "good"
}
}
},
{
"range": {
"items.time_added": {
"lte": "2019-09-01"
}
}
}
]
}
}
}
Aggregation (incorrect results: look at the first bucket "2016-08-01T00:00:00.000Z"; it contains 3 CONDITION sub-buckets, one for every condition type):
GET collection/_search
{
"size": 0,
"aggs": {
"DATE": {
"date_histogram": {
"field": "items.time_added",
"calendar_interval": "month"
},
"aggs": {
"CONDITION": {
"terms": {
"field": "items.condition.keyword",
"size": 10
}
}
}
}
}
}
With nested type
DELETE collection
PUT collection
{
"mappings": {
"properties": {
"items": {
"type": "nested"
}
}
}
}
# and POST the same data from above
Query (returns just one result)
GET collection/_search
{
"query": {
"nested": {
"path": "items",
"query": {
"bool": {
"must": [
{
"term": {
"items.condition": {
"value": "good"
}
}
},
{
"range": {
"items.time_added": {
"lte": "2019-09-01"
}
}
}
]
}
}
}
}
}
Aggregation (the first date bucket contains just one CONDITION sub-bucket)
GET collection/_search
{
"size": 0,
"aggs": {
"ITEMS": {
"nested": {
"path": "items"
},
"aggs": {
"DATE": {
"date_histogram": {
"field": "items.time_added",
"calendar_interval": "month"
},
"aggs": {
"CONDITION": {
"terms": {
"field": "items.condition.keyword",
"size": 10
}
}
}
}
}
}
}
}
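Why the two mappings behave differently can be illustrated without a cluster. The sketch below is a plain-Python simulation, not Elasticsearch code: without the nested type, an array of objects is flattened into independent per-field value lists, so "good" and an early date can come from different items; with nested, both conditions must hold on the same item. The data mirrors the five bulk documents above (ids omitted); the function names are made up for illustration.

```python
from datetime import date


def make_docs() -> list:
    """Five documents shaped like the bulk data above, one per month Aug-Dec."""
    docs = []
    for month in (8, 9, 10, 11, 12):
        docs.append({"items": [
            {"condition": "fair", "time_added": date(2017, month, 7)},
            {"condition": "good", "time_added": date(2019, month, 7)},
            {"condition": "poor", "time_added": date(2016, month, 7)},
        ]})
    return docs


def matches_flattened(doc: dict, condition: str, cutoff: date) -> bool:
    """Object arrays: each field becomes one flat list, item boundaries are lost."""
    conditions = [item["condition"] for item in doc["items"]]
    dates = [item["time_added"] for item in doc["items"]]
    return condition in conditions and any(d <= cutoff for d in dates)


def matches_nested(doc: dict, condition: str, cutoff: date) -> bool:
    """Nested objects: both clauses are evaluated against the same item."""
    return any(
        item["condition"] == condition and item["time_added"] <= cutoff
        for item in doc["items"]
    )


cutoff = date(2019, 9, 1)
flat_hits = sum(matches_flattened(d, "good", cutoff) for d in make_docs())
nested_hits = sum(matches_nested(d, "good", cutoff) for d in make_docs())
print(flat_hits, nested_hits)  # prints "5 1"
```

The flattened check matches every document because each one has some item in "good" condition and some other item added before the cutoff, which is the "expected one, got five" result shown earlier.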
Hope that helps :)
I've been facing some issues with a multi-field Elasticsearch query. I am trying to query all the documents whose func_name field matches either of two hard-coded strings. Even though my index has documents with both of these function names, the query result always contains only one func_name. So far I have tried the following queries.
1) The following returns only one function match, even though the documents have the other function as well:
GET /_search
{
"query": {
"multi_match": {
"query": "FEM_DS_GetTunerStatusInfo MDM_TunerStatusPrint",
"operator": "OR",
"fields": [
"func_name"
]
}
}
}
2) The following intermittently gives me both functions:
GET /_search
{
"query": {
"match": {
"func_name": {
"query": "MDM_TunerStatusPrint FEM_DS_GetTunerStatusInfo",
"operator": "or"
}
}
}
}
3) The following returns only one function match, even though the documents have the other function as well:
{
"query": {
"bool": {
"should": [
{ "match": { "func_name": "FEM_DS_GetTunerStatusInfo" }},
{ "match": { "func_name": "MDM_TunerStatusPrint" }}
]
}
}
}
Any help is much appreciated.
Thanks for your reply. Let's assume that I have the following kind of documents in my Elasticsearch. I want my search to return the first two documents out of all of them, as they match my func_name.
{
"_index": "diag-178999",
"_source": {
"severity": "MIL",
"t_id": "03468500",
"p_id": "000007c6",
"func_name": "MDM_TunerStatusPrint",
"timestamp": "2017-06-01T02:04:51.000Z"
}
},
{
"_index": "diag-344563",
"_source": {
"t_id": "03468500",
"p_id": "000007c6",
"func_name": "FEM_DS_GetTunerStatusInfo",
"timestamp": "2017-07-20T02:04:51.000Z"
}
},
{
"_index": "diag-101010",
"_source": {
"severity": "MIL",
"t_id": "03468500",
"p_id": "000007c6",
"func_name": "some_func",
"timestamp": "2017-09-15T02:04:51.000Z"
}
}
The two best ways to query your ES are either to filter by terms on a particular field, or to use aggregations, which let you name the result, apply multiple rules, and get a more readable response format.
See: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html. The other doc page, also very useful, is here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html
In your case, you should do:
{
"from": 0, "size": 2,
"query": {
"bool": {
"filter": {
"terms": {
"func_name": ["FEM_DS_GetTunerStatusInfo", "MDM_TunerStatusPrint"]
}
}
}
}
}
OR
{
"aggs": {
"aggregationName": {
"terms": {
"field": "func_name",
"include": ["FEM_DS_GetTunerStatusInfo", "MDM_TunerStatusPrint"]
}
}
}
}
The aggregation at the end is just here to show you how to do the same thing as your query filter. Let me know if it's working :)
Best regards
As I understand it, you should use a filtered query to match any document whose func_name is one of the values mentioned above:
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"terms": {
"func_name": [
"FEM_DS_GetTunerStatusInfo",
"MDM_TunerStatusPrint"
]
}
}
]
}
}
}
}
}
See:
Filtered Query, Terms Query
UPDATE for ES 5.0, where the filtered query was removed:
{
"query": {
"bool": {
"must": [
{
"terms": {
"func_name": [
"FEM_DS_GetTunerStatusInfo",
"MDM_TunerStatusPrint"
]
}
}
]
}
}
}
See: this answer
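The ES 5.0 form can also be generated from a list of names. A minimal sketch with a made-up helper name: the terms query does not analyze its input, so the values must equal the indexed terms exactly. The default field func_name.keyword is an assumption about the mapping (an un-analyzed keyword sub-field); pass field="func_name" if that field is itself mapped as keyword.

```python
from typing import Sequence


def build_func_name_query(names: Sequence[str],
                          field: str = "func_name.keyword") -> dict:
    """Match documents whose `field` equals any of the given names exactly."""
    return {
        "query": {
            "bool": {
                # filter context: exact matching only, no relevance scoring
                "filter": [{"terms": {field: list(names)}}]
            }
        }
    }


body = build_func_name_query(
    ["FEM_DS_GetTunerStatusInfo", "MDM_TunerStatusPrint"]
)
```

Using filter instead of must is a swapped-in choice here; both select the same documents, and the filter clause simply skips scoring.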