Getting SearchPhaseExecutionException using ElasticSearch Java Client

I am using a filtered query with a sort. When I run the query using the browser plugin, it runs fine. But when I use the Java client that ships with ElasticSearch, I get this error:
org.elasticsearch.action.search.SearchPhaseExecutionException: Failed
to execute phase [dfs], all shards failed; shardFailures
Here is the query that's being run:
{
"from": 0,
"size": 50,
"query": {
"filtered": {
"query": {
"bool": {
"must": {
"bool": {
"should": [
{
"match": {
"_all": {
"query": "Happy Pharrel Williams",
"type": "boolean"
}
}
},
{
"flt": {
"fields": [
"name",
"artists",
"genre",
"albumName"
],
"like_text": "Happy Pharrel Williams"
}
}
]
}
}
}
},
"filter": {
"bool": {
"must": {
"or": {
"filters": [
{
"range": {
"releaseInfo.us": {
"from": null,
"to": "2015-07-22T23:16:12.852Z",
"include_lower": true,
"include_upper": true
}
}
},
{
"and": {
"filters": [
{
"missing": {
"field": "releaseInfo.us"
}
},
{
"range": {
"releaseInfo.WW": {
"from": null,
"to": "2015-07-22T23:16:12.851Z",
"include_lower": true,
"include_upper": true
}
}
}
]
}
}
]
}
}
}
}
}
},
"fields": [],
"sort": [
{
"popularity.US": {
"order": "asc",
"missing": 999
}
},
{
"_score": {}
}
] }
I understand that the error suggests the field I am sorting on is missing in some of the indices. But I have provided the "missing" option in my sort, and the query runs just fine when I run it from the ES head browser plugin.
Do you see anything wrong with the query structure, or is it something else with the Java client?

I was getting the exception because I was sorting on a field that didn't exist in some of the indexed documents. I re-indexed all the documents and it worked.
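
As an alternative to re-indexing, here is a minimal sketch of a sort clause that tolerates the field being absent, written in Python as a plain request-body builder. The "missing" option from the question covers documents that have no value for the field. "unmapped_type" is a separate sort option (available in Elasticsearch 1.4 and later, as far as I know) for shards where the field has no mapping at all. Whether an unmapped field was the actual cause here is an assumption, and the helper name build_sort is hypothetical.

```python
import json


def build_sort(field, order="asc", missing=999, unmapped_type="long"):
    """Build a sort clause that tolerates absent values and unmapped fields."""
    return [
        {
            field: {
                "order": order,
                # Sort value used for documents that have no value for the field.
                "missing": missing,
                # Assumption: on shards where the field has no mapping, sort as
                # if it were mapped with this type instead of failing the shard.
                "unmapped_type": unmapped_type,
            }
        },
        # Tie-break on relevance, as in the original query.
        {"_score": {}},
    ]


print(json.dumps({"sort": build_sort("popularity.US")}, indent=2))
```

The printed "sort" array can replace the one in the query above; the rest of the request stays unchanged.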

Related

Continuous Kibana Transform is not updated as source index and has one checkpoint since creation

Kibana 7.7.1
I am creating a transform to aggregate on a true/false field per user's set of data.
In the source index, documents can be updated with a new status (true/false), and new data also arrives. The interval between changes to the source index can be more than 1 hour.
I am creating the transform with sync on @timestamp.
I created as many transforms as I could, trying every combination of frequency and sync/delay, but none of them ever updated after creation; each has only the one checkpoint from creation.
I am not sure what I am doing wrong.
Here is the transform API call I used:
PUT _transform/mytransform
{
"source": {
"index": [
"mysource"
],
"query": {
"bool": {
"must": [
{
"match_all": {}
}
],
"filter": [
{
"match_phrase": {
"myflag": false
}
},
{
"range": {
"mydate": {
"lt": "now"
}
}
}
],
"should": [],
"must_not": []
}
}
},
"dest": {
"index": "mydest"
},
"frequency": "1h",
"sync": {
"time": {
"field": "@timestamp",
"delay": "24h"
}
},
"pivot": {
"group_by": {
"userID.keyword": {
"terms": {
"field": "userID.keyword"
}
}
},
"aggregations": {
"userDocID.keyword.value_count": {
"value_count": {
"field": "userDocID.keyword"
}
}
}
}
}
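
A simplified model of how the sync settings above interact, sketched in Python. As I understand the transform documentation, each new checkpoint only considers documents whose sync field ("@timestamp" here) lies after the previous checkpoint's upper bound and no later than the current time minus "delay". This is an illustration of that rule under that assumption, not the actual implementation; the function name and the dates are made up.

```python
from datetime import datetime, timedelta


def in_next_checkpoint(doc_time, last_upper, now, delay):
    """Simplified model: does doc_time fall inside the next checkpoint window?"""
    # Data newer than (now - delay) is not considered yet.
    new_upper = now - delay
    return last_upper < doc_time <= new_upper


now = datetime(2020, 6, 2, 12, 0)
delay = timedelta(hours=24)
last_upper = datetime(2020, 6, 1, 11, 0)

# A document stamped 30 minutes ago is still inside the 24h delay.
print(in_next_checkpoint(datetime(2020, 6, 2, 11, 30), last_upper, now, delay))
# A document stamped just after the last checkpoint, older than the delay.
print(in_next_checkpoint(datetime(2020, 6, 1, 11, 30), last_upper, now, delay))
# A document updated in place whose @timestamp still has its old value.
print(in_next_checkpoint(datetime(2020, 5, 20, 9, 0), last_upper, now, delay))
```

Under this model, only the second document is picked up. A status change on an existing document goes unnoticed unless the sync field is also refreshed when the document is updated, and new data becomes visible only once it is older than the delay.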

ElasticCloud - alert on disk usage using metricbeats

I'm struggling to understand how to define an alert for my hosts' disk usage in Elastic Cloud.
The agent is installed on my different hosts with the "system" integration. I'm pretty sure this uses Metricbeat.
I can see this visualization here:
However, the disk usage uses a couple of fields to get its percentage:
system.fsstat.total_size.total
system.fsstat.total_size.used
When I inspect that part of the dashboard, I end up with this:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"@timestamp": {
"gte": "2022-05-12T08:47:46.895Z",
"lte": "2022-05-12T08:57:46.895Z",
"format": "strict_date_optional_time"
}
}
},
{
"bool": {
"must": [],
"filter": [
{
"bool": {
"should": [
{
"match_phrase": {
"data_stream.dataset": "system.fsstat"
}
}
],
"minimum_should_match": 1
}
}
],
"should": [],
"must_not": []
}
}
],
"filter": [],
"should": [],
"must_not": []
}
},
"aggs": {
"timeseries": {
"auto_date_histogram": {
"field": "@timestamp",
"buckets": 1
},
"aggs": {
"4e4dee91-4d1d-11e7-b5f2-2b7c1895bf32": {
"filter": {
"exists": {
"field": "system.fsstat.total_size.used"
}
},
"aggs": {
"docs": {
"top_hits": {
"size": 1,
"fields": [
"system.fsstat.total_size.used"
],
"sort": [
{
"@timestamp": {
"order": "desc"
}
}
]
}
}
}
},
"57c96ee0-4d54-11e7-b5f2-2b7c1895bf32": {
"filter": {
"exists": {
"field": "system.fsstat.total_size.total"
}
},
"aggs": {
"docs": {
"top_hits": {
"size": 1,
"fields": [
"system.fsstat.total_size.total"
],
"sort": [
{
"@timestamp": {
"order": "desc"
}
}
]
}
}
}
}
},
"meta": {
"timeField": "@timestamp",
"panelId": "4e4dc780-4d1d-11e7-b5f2-2b7c1895bf32",
"seriesId": "4e4dee90-4d1d-11e7-b5f2-2b7c1895bf32",
"intervalString": "600000ms",
"indexPatternString": "metrics-*",
"normalized": true
}
}
},
"runtime_mappings": {}
}
I want to create a threshold alert that fires when the disk of any of my hosts reaches, let's say, 90%.
A threshold alert only takes one value, so I'm not able to create this alert.
Should I create a new field somewhere in the Metricbeat index, or should I use a custom query alert?
I'm quite new to Elastic Cloud. I found a couple of solutions using Python scripts and the like, but that seems a bit overkill for what I'm trying to achieve.
Hopefully someone has a simple solution.
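
For reference, the value the alert would need to compare against 90% is the ratio of the two fields named above. A minimal Python sketch of that arithmetic follows, assuming the latest system.fsstat.total_size.used and system.fsstat.total_size.total values per host have already been fetched. The host names and byte counts are made up for illustration, and the function names are hypothetical.

```python
def disk_used_pct(used_bytes, total_bytes):
    """Percentage of disk used, from the two fsstat byte counters."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def hosts_over_threshold(latest_by_host, threshold_pct=90.0):
    """Return the hosts whose usage is at or above the threshold, sorted by name."""
    return sorted(
        host
        for host, (used, total) in latest_by_host.items()
        if disk_used_pct(used, total) >= threshold_pct
    )


# Made-up sample: host -> (used bytes, total bytes)
sample = {
    "web-1": (450_000_000_000, 500_000_000_000),
    "db-1": (100_000_000_000, 500_000_000_000),
}
print(hosts_over_threshold(sample))
```

Any rule that can evaluate this ratio per host, whether through a derived field or a custom query, would express the same condition.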

Elasticsearch multiple fields OR query

Here is an example record that I have stored in ES:
"taskCurateStatus": true,
"taskMigrateStatus": true,
"verifiedFields": 7,
"taskId": "abcdef123",
"operatorEmail": "test@test.com"
Example Query I'm making via /_search:
{
"sort": [
{
"@timestamp": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"match": {
"msg.operator_email": "test@test.com"
}
},
{
"range": {
"@timestamp": {
"gte": "2017-03-05",
"lte": "2017-03-12"
}
}
}
]
}
},
"from": 0,
"size": 50
}
Basically, I also want to filter for documents where EITHER taskCurateStatus or taskMigrateStatus is true. Some messages have only one of them defined. I was thinking of using a should query, but I'm not sure how that would work with the match query. Any help would be appreciated. Thanks
You can add another bool query inside your must clause. This nested bool query can implement the should clause: it combines the two boolean-flag checks, so a document matches when either flag is true.
{
"sort": [{
"@timestamp": {
"order": "desc"
}
}],
"query": {
"bool": {
"must": [{
"match": {
"msg.operator_email": "test@test.com"
}
}, {
"range": {
"@timestamp": {
"gte": "2017-03-05",
"lte": "2017-03-12"
}
}
}, {
"bool": {
"should": [{
"term": {
"taskCurateStatus": {
"value": true
}
}
}, {
"term": {
"taskMigrateStatus": {
"value": true
}
}
}]
}
}]
}
},
"from": 0,
"size": 50
}
Take a look at the above query and see if that helps.
Thanks
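
The same structure can be built programmatically. Below is a minimal Python sketch that assembles the request body as a dictionary. The helper names are hypothetical, and "minimum_should_match": 1 is added only to make the "at least one flag" requirement explicit; it is not part of the query above.

```python
import json


def either_flag_true(flag_fields):
    """Nested bool query that matches when at least one flag field is true."""
    return {
        "bool": {
            "should": [{"term": {f: {"value": True}}} for f in flag_fields],
            # Explicit here for clarity: at least one should clause must match.
            "minimum_should_match": 1,
        }
    }


def build_search(email, start, end, flag_fields):
    """Search body: email AND date range AND (any flag true)."""
    return {
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "must": [
                    {"match": {"msg.operator_email": email}},
                    {"range": {"@timestamp": {"gte": start, "lte": end}}},
                    either_flag_true(flag_fields),
                ]
            }
        },
        "from": 0,
        "size": 50,
    }


body = build_search(
    "test@test.com",
    "2017-03-05",
    "2017-03-12",
    ["taskCurateStatus", "taskMigrateStatus"],
)
print(json.dumps(body, indent=2))
```

Keeping the OR logic in its own nested bool means the outer must clauses still all apply, so the email and date range remain mandatory.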

Filtered aggregation query error

I am trying to run a filtered aggregation like the one below, but I am getting this error:
"Unknown key for a START_OBJECT in [associations]: [disabledDate]." Can anyone review the query and suggest any changes required?
Steps in the query:
1. Query all documents with versionDate less than or equal to the given date.
2. Aggregate on Id.
3. Run a sub-aggregation top hits query with a missing disabledDate filter.
4. Apply a post filter for missing disabledDate.
{
"query": {
"bool": {
"must": [
{
"range": {
"versionDate": {
"from": null,
"to": "2016-05-25T20:53:22.742Z",
"include_lower": false,
"include_upper": true
}
}
},
{
"terms": {
"domainId": [
"yy"
]
}
},
{
"terms": {
"termId": [
"rr"
]
}
}
]
}
},
"aggregations": {
"associations": {
"terms": {
"field": "id",
"size": 0,
"execution_hint": "global_ordinals_low_cardinality",
"order": {
"_term": "asc"
},
"disabledDate": {
"filters": {
"missing": {
"field": "disbaledDate"
}
},
"aggregations": {
"top": {
"top_hits": {
"size": 1,
"_source": {
"includes": [],
"excludes": []
},
"sort": [
{
"versionDate": {
"order": "desc"
}
}
]
}
}
}
}
}
}
},
"post_filter": {
"missing": {
"field": "disabledDate"
}
}
}
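
A likely cause of the error, sketched in Python: the parser accepts only an aggregation type (such as "terms") and an "aggregations"/"aggs" key under a named aggregation, so a sub-aggregation such as disabledDate has to be nested under "aggregations" next to "terms". The sketch also uses the single "filter" aggregation rather than "filters", and spells the field "disabledDate" on the assumption that "disbaledDate" in the query is a typo. It keeps the "missing" filter and "size": 0 from the question, which assumes an older Elasticsearch version that still supports them. The function name is hypothetical.

```python
import json


def build_associations_agg():
    """Terms aggregation on id with a filtered top_hits sub-aggregation."""
    return {
        "associations": {
            "terms": {
                "field": "id",
                "size": 0,
                "execution_hint": "global_ordinals_low_cardinality",
                "order": {"_term": "asc"},
            },
            # Sub-aggregations sit beside "terms", not inside it.
            "aggregations": {
                "disabledDate": {
                    # Single-filter aggregation: keep docs without disabledDate.
                    "filter": {"missing": {"field": "disabledDate"}},
                    "aggregations": {
                        "top": {
                            "top_hits": {
                                "size": 1,
                                "sort": [{"versionDate": {"order": "desc"}}],
                            }
                        }
                    },
                }
            },
        }
    }


print(json.dumps({"aggregations": build_associations_agg()}, indent=2))
```

The "query" and "post_filter" parts of the request are unaffected and can stay as they are.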

How do I limit an ElasticSearch API count by date?

I'm trying to count the number of query matches over a given time range, hitting the URL /{index}/_count with the body indicated below.
I'm new to Query DSL, so it's quite possible I'm overlooking something obvious. However, the straightforward application of a count to an existing query doesn't work. I don't see anything in the docs that indicates a count query should receive special treatment.
I've tried adding a range and aggregations to the query, but I keep getting the following error or some variant:
indices:data/read/count[s]]]; nested:
QueryParsingException[[graylog2_NN] request does not support [{label}]]
Limit query by timestamp:
{
"query": {
"term": { "level":3 },
"range": {
"timestamp": {
"from": "2015-06-16 15:10:09.322",
"to": "2015-06-16 16:10:09.322",
"include_lower": true,
"include_upper": true
}
}
}
}
Use an aggregation:
{
"query": {
"term": { "level":3 }
},
"aggs": {
"range": {
"date_range": {
field: "_timestamp",
"ranges": {
{ "to": "now-1d" },
{ "from": "now-2d" },
}
}
}
}
}
I've also tried plugging in the query exported from the UI (bug icon on an individual stream display), no joy there either (one hour's worth of matches):
{
"from": 0,
"size": 100,
"query": {
"match_all": {}
},
"post_filter": {
"bool": {
"must": [
{
"range": {
"timestamp": {
"from": "2015-06-16 15:10:09.322",
"to": "2015-06-16 16:10:09.322",
"include_lower": true,
"include_upper": true
}
}
},
{
"query": {
"query_string": {
"query": "streams:5568c9dbe4b0b31b781bf105"
}
}
}
]
}
},
"sort": [
{
"timestamp": {
"order": "desc"
}
}
],
"highlight": {
"require_field_match": false,
"fields": {
"*": {
"fragment_size": 0,
"number_of_fragments": 0
}
}
}
}
I've found a query that both matches and lines up pretty closely with numbers I get from the UI ("Search in the last 1 day"):
{
"query": {
"filtered": {
"query": {
"term": { "level":3 }
},
"filter": {
"range": { "timestamp": { "gte": "now-1d" } }
}
}
}
}
Try the following query, which uses a bool query. I use a different timestamp format, which is the default in Elasticsearch. Try that format first; if that doesn't work, modify the timestamp format to match yours.
{
"query": {
"bool" : {
"must" : [
{
"term": { "level":3 }
},
{
"range": {
"timestamp": {
"from": "2015-06-16T15:10:09",
"to": "2015-06-16T16:10:09"
}
}
}
]
}
}
}
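
A minimal Python sketch of a count body that requires both conditions. Two points it relies on, to the best of my knowledge: a query object holds a single top-level query type, so "term" and "range" cannot sit side by side as in the first attempt, and the _count endpoint accepts only a "query" in its body, which would explain the "request does not support" errors for keys such as "aggs", "sort", or "post_filter". The function name is hypothetical, and the dates reuse the ones from the question in ISO format.

```python
import json


def build_count_body(level, start, end):
    """Body for /{index}/_count: level AND timestamp range, nothing else."""
    return {
        "query": {
            "bool": {
                # Both clauses sit in "must", so a document has to satisfy both.
                "must": [
                    {"term": {"level": level}},
                    {"range": {"timestamp": {"gte": start, "lte": end}}},
                ]
            }
        }
    }


body = build_count_body(3, "2015-06-16T15:10:09", "2015-06-16T16:10:09")
print(json.dumps(body, indent=2))
```

If the index stores timestamps in another format, the start and end strings need to match it, as noted in the answer.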

Resources