ElasticSearch Delete Query - Filter with term and range

I have the following query that I am trying to use to delete data from an ElasticSearch index.
{
  "filter": {
    "and": [
      {
        "range": {
          "Time": {
            "from": "20120101T000000",
            "to": "20120331T000000"
          }
        }
      },
      {
        "term": {
          "Source": 1
        }
      }
    ]
  }
}
I have tried to delete documents based on this query. The query returns results fine when I run it as a search against the index, but when I try to run a delete command against the index with it, nothing happens.
I am not sure if I am constructing the query wrong or if something else is going on.

You're using only a filter, while the delete by query API expects a query. You can convert your filter to a query using a filtered query like this:
{
  "filtered": {
    "query": {
      "match_all": {}
    },
    "filter": {
      "and": [
        {
          "range": {
            "Time": {
              "from": "20120101T000000",
              "to": "20120331T000000"
            }
          }
        },
        {
          "term": {
            "Source": 1
          }
        }
      ]
    }
  }
}
Alternatively, you could convert your filter to a query using a bool query with two must clauses, so that you don't need the filtered query at all. That said, the filter approach is preferable, since filters skip scoring and can be cached, which makes them faster.
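For reference, here is a minimal sketch of that bool alternative, using the field names from the question. The index name my_index and the _delete_by_query endpoint are illustrative: Elasticsearch 1.x exposed delete by query as DELETE /index/_query, while newer versions use POST /index/_delete_by_query.
POST /my_index/_delete_by_query
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "Time": {
              "from": "20120101T000000",
              "to": "20120331T000000"
            }
          }
        },
        {
          "term": {
            "Source": 1
          }
        }
      ]
    }
  }
}
On recent versions, where the filtered query and the and filter no longer exist, the same two clauses can be moved under bool.filter instead of must to keep the non-scoring, cacheable behaviour.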

Related

Deduplicate and perform composite aggregation on deduped result

I have an index in Elasticsearch which contains daily transaction data. Each doc has mainly the following fields:
TxnId, Status, TxnType, userId
Two documents can have the same TxnId.
I'm looking for a query that aggregates over status and txnType for unique txnIds, basically something like: select unique txnIds from user_table group by status, txnType.
I have one ES query which dedups on TxnId, and another ES query which performs a composite aggregation on status and txnType. I want to do both things in a single query.
I tried the collapse feature, and also cardinality and dedup approaches, but the query is not giving correct output:
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "streamSource": 3
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "txnId"
  },
  "aggs": {
    "buckets": {
      "composite": {
        "size": 30,
        "sources": [
          {
            "status": {
              "terms": {
                "field": "status"
              }
            }
          },
          {
            "txnType": {
              "terms": {
                "field": "txnType"
              }
            }
          }
        ]
      }
    }
  }
}
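The thread above does not include an answer, but a common pattern for this kind of "distinct count per group" requirement is to keep the composite aggregation and add a cardinality sub-aggregation on txnId inside each bucket; collapse only affects search hits, not aggregations, so it is dropped here. A sketch along those lines (the aggregation name unique_txnIds is illustrative, and cardinality counts are approximate):
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "streamSource": 3
          }
        }
      ]
    }
  },
  "aggs": {
    "buckets": {
      "composite": {
        "size": 30,
        "sources": [
          { "status": { "terms": { "field": "status" } } },
          { "txnType": { "terms": { "field": "txnType" } } }
        ]
      },
      "aggs": {
        "unique_txnIds": {
          "cardinality": {
            "field": "txnId"
          }
        }
      }
    }
  }
}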

Is it ok to use only filter query in elastic search

I have to query Elasticsearch for some data, and all my filters are dropdown values, i.e. exact matches only, so I thought of using only filter clauses and not any must or match query. Is there any problem with this kind of approach?
In the example below I am trying to get the last 15 minutes of data where l1 is any one of ("XYZ", "CFG") and l2 is any one of ("ABC", "CDE").
My query looks like this:
{
  "size": 20,
  "sort": [
    {
      "eventTs": "desc"
    }
  ],
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "eventTs": {
              "gte": "now-15m",
              "lte": "now",
              "format": "epoch_millis",
              "boost": 1
            }
          }
        },
        {
          "terms": {
            "l1": ["XYZ", "CFG"]
          }
        },
        {
          "terms": {
            "l2": ["ABC", "CDE"]
          }
        }
      ]
    }
  }
}
If you don't need _score, which ranks documents by how relevant they are, you can use filter clauses, which execute much faster (score calculation is skipped) and are cached as well.
It's worth reading query and filter context for an in-depth understanding of these concepts.
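If you later do need relevance scoring on some field, you can keep the non-scoring filters and add a scoring clause under must. A sketch, where the match on a hypothetical description field is purely illustrative:
{
  "size": 20,
  "query": {
    "bool": {
      "must": [
        { "match": { "description": "some search text" } }
      ],
      "filter": [
        { "range": { "eventTs": { "gte": "now-15m", "lte": "now" } } },
        { "terms": { "l1": [ "XYZ", "CFG" ] } },
        { "terms": { "l2": [ "ABC", "CDE" ] } }
      ]
    }
  }
}
Only the must clause contributes to _score; the filter clauses still run in filter context.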

update and retrieve in a single query elasticsearch

I want to update the status field from "FAILED" to "IN_PROGRESS" for all the docs in one of my Elasticsearch indices that match the query below, and retrieve the updated docs.
{
  "query": {
    "bool": {
      "must": {
        "match": { "status": "FAILED" }
      },
      "filter": [
        {
          "range": {
            "count": { "gte": "2" }
          }
        },
        {
          "range": {
            "updated": { "gte": "now-2h" }
          }
        }
      ]
    }
  }
}
I know I can achieve this with two queries (an update_by_query to update, followed by a search to retrieve all the updated docs). The problem is that I want to update and retrieve all the updated docs in a single query.
Is there any efficient way to perform this in a single query?
You can use the query below with "_source": false, which will return the _id of every matching document.
POST multiapi/_search
{
  "_source": false,
  "query": {
    "term": {
      "status.keyword": {
        "value": "FAILED"
      }
    }
  }
}
From the response you can collect all the _ids and pass them to the ids query below.
POST multiapi/_update_by_query
{
  "query": {
    "ids": {
      "values": ["M1BbcX4Bo1YkEVbN1wG1", "NFBbcX4Bo1YkEVbN3gHm"]
    }
  },
  "script": {
    "source": "ctx._source['status'] = 'IN_PROGRESS'"
  }
}
Also, if your index has a large document set, use search_after to retrieve more than 10,000 documents.
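A minimal search_after sketch, assuming you page on the updated field (in practice, add a unique tiebreaker field to the sort so documents with identical sort values aren't skipped):
POST multiapi/_search
{
  "size": 10000,
  "_source": false,
  "query": {
    "term": {
      "status.keyword": {
        "value": "FAILED"
      }
    }
  },
  "sort": [
    { "updated": "asc" }
  ]
}
For each following page, repeat the same request and add "search_after": [<sort value of the last hit from the previous page>].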

not in search query in elasticsearch

I was wondering if there is a search query using the Elasticsearch DSL for finding all documents except those that satisfy a certain condition on a dateOptionalTime field in the index.
The equivalent SQL query would be:
select * from index
where (not (date="0001-01-01T00:00:00+03:30"))
and (domain="something.com");
Yes, you can use bool/must_not to achieve this:
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "domain": "something.com"
        }
      },
      "must_not": {
        "term": {
          "date": "0001-01-01T00:00:00+03:30"
        }
      }
    }
  }
}

Elasticsearch aggregations on nested inner hits

I have a large amount of data in Elasticsearch. My documents have a nested field called "records" that contains a list of objects with several fields.
I want to be able to query specific objects from the records list, so I use inner_hits in my query, but it doesn't help because the aggregation request uses size 0, so no hits are returned.
I haven't managed to make an aggregation work only on the inner hits; the aggregation returns results for all the objects inside records, no matter what the query is.
This is the query I am using:
(Each document has first_timestamp and last_timestamp fields, and each object in the records list has a timestamp field)
curl -XPOST 'localhost:9200/_msearch?pretty' -H 'Content-Type: application/json' -d'
{
  "index": ["my_index"],
  "search_type": "count",
  "ignore_unavailable": true
}
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "nested": {
          "path": "records",
          "query": {
            "term": {
              "records.data.field1": "value1"
            }
          },
          "inner_hits": {}
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "first_timestamp": {
                  "gte": 1504548296273,
                  "lte": 1504549196273,
                  "format": "epoch_millis"
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "nested_2": {
      "nested": {
        "path": "records"
      },
      "aggs": {
        "2": {
          "date_histogram": {
            "field": "records.timestamp",
            "interval": "1s",
            "min_doc_count": 1,
            "extended_bounds": {
              "min": 1504548296273,
              "max": 1504549196273
            }
          }
        }
      }
    }
  }
}'
Your query is pretty complex.
In short, here is the query you need:
{
  "size": 0,
  "aggregations": {
    "nested_A": {
      "nested": {
        "path": "records"
      },
      "aggregations": {
        "bool_aggregation_A": {
          "filter": {
            "bool": {
              "must": [
                {
                  "term": {
                    "records.data.field1": "value1"
                  }
                }
              ]
            }
          },
          "aggregations": {
            "reverse_aggregation": {
              "reverse_nested": {},
              "aggregations": {
                "bool_aggregation_B": {
                  "filter": {
                    "bool": {
                      "must": [
                        {
                          "range": {
                            "first_timestamp": {
                              "gte": 1504548296273,
                              "lte": 1504549196273,
                              "format": "epoch_millis"
                            }
                          }
                        }
                      ]
                    }
                  },
                  "aggregations": {
                    "nested_B": {
                      "nested": {
                        "path": "records"
                      },
                      "aggregations": {
                        "my_histogram": {
                          "date_histogram": {
                            "field": "records.timestamp",
                            "interval": "1s",
                            "min_doc_count": 1,
                            "extended_bounds": {
                              "min": 1504548296273,
                              "max": 1504549196273
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Now, let me explain every step by the aggregations' names:
size: 0 -> we are not interested in hits, only in aggregations
nested_A -> data.field1 is under records, so we narrow the scope to records
bool_aggregation_A -> filter by data.field1: value1
reverse_aggregation -> first_timestamp is not in the nested documents, so we need to step back out of records
bool_aggregation_B -> filter by the first_timestamp range
nested_B -> now we scope into records again for the timestamp field (located under records)
my_histogram -> finally, aggregate a date histogram on the timestamp field
Aggregating over inner_hits is not supported by Elasticsearch. The reason is that inner_hits is already an expensive operation, and running aggregations on top of inner_hits would increase the cost of the operation dramatically.
Here is the github link of the issue.
If you want aggregations on inner_hits, you can consider the following approaches:
Make a narrower query where you only fetch the required hits from Elasticsearch and aggregate over them yourself, repeating this until all hits are covered. This can lead to many search queries, which is not advisable.
Let your application layer handle the aggregation logic by writing a smart aggregation parser and running it on the responses from Elasticsearch. This is a little better, but you then have the overhead of maintaining the parser as requirements change.
I would personally recommend changing your data-mapping style in Elasticsearch so that you can run the aggregation directly on it.
You can also try something like this:
PUT records
{
  "mappings": {
    "properties": {
      "records": {
        "type": "nested"
      }
    }
  }
}

POST records/_doc
{
  "records": [
    {
      "data": "test1",
      "value": 1
    },
    {
      "data": "test2",
      "value": 2
    }
  ]
}

GET records/_search
{
  "size": 0,
  "aggs": {
    "all_nested_count": {
      "nested": {
        "path": "records"
      },
      "aggs": {
        "bool_aggs": {
          "filter": {
            "bool": {
              "must": [
                {
                  "term": {
                    "records.data": "test2"
                  }
                }
              ]
            }
          },
          "aggs": {
            "filtered_aggs": {
              "sum": {
                "field": "records.value"
              }
            }
          }
        }
      }
    }
  }
}
Ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/inner-hits.html
