Query DSL terms filtering with script for day by numeric value - elasticsearch

Within aggs I can get buckets by day of the week, keyed numerically (1-7), using something like this:
"aggs" : {
  "group_by_day" : {
    "terms": {
      "script": "doc['#timestamp'].date.dayOfWeek",
      "order": {
        "_key": "asc"
      }
    }
  }
}
However, I am looking for a way to add a filtering terms clause to the query so that it only returns results for, say, a Monday or Tuesday, and I haven't been able to get this working.
I have tried:
{
  "terms": {
    "script": "doc['#timestamp'].date.dayOfWeek"
  }
}
The script tag doesn't seem to be supported in a terms query, at least not the way I am attempting to use it. Is there another way to filter with a script, or another (better) approach to get what I am trying to achieve? I am using 6.2. Thanks!

Here it is:
"script": {
  "script": {
    "source": "doc['#timestamp'].date.dayOfWeek == 1"
  }
}
I handle the string-to-numeric conversion of the day outside of this query. The clause sits inside query.bool.must.
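To cover more than one day (the question asks for Monday or Tuesday), the same comparison can be repeated and joined with ||. The following is a minimal sketch of doing the name-to-number conversion outside the query, as the answer describes. The helper name day_of_week_clause and the DAY_NUMBERS table are hypothetical, and the mapping assumes 1 = Monday through 7 = Sunday.

```python
# Assumed mapping: 1 = Monday ... 7 = Sunday (hypothetical lookup table).
DAY_NUMBERS = {
    "monday": 1, "tuesday": 2, "wednesday": 3, "thursday": 4,
    "friday": 5, "saturday": 6, "sunday": 7,
}


def day_of_week_clause(day_names, field="#timestamp"):
    """Build a script query clause that matches any of the given days.

    Only the expression from the answer above is used:
    doc['<field>'].date.dayOfWeek == <number>, joined with ||.
    """
    numbers = sorted({DAY_NUMBERS[name.lower()] for name in day_names})
    if not numbers:
        raise ValueError("at least one day is required")
    accessor = f"doc['{field}'].date.dayOfWeek"
    source = " || ".join(f"{accessor} == {n}" for n in numbers)
    return {"script": {"script": {"source": source}}}


# The clause goes inside query.bool.must, as in the answer.
query = {
    "query": {
        "bool": {
            "must": [day_of_week_clause(["Monday", "Tuesday"])]
        }
    }
}
```

The resulting dictionary can be serialized with json.dumps and sent as the search body.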

Related

Custom sort lexicographically as int

I have some Elasticsearch documents with a string property that looks like 10/2021, and it needs to be sorted as an int, but when I perform this query
"sort": [
  {
    "myProperty": {
      "order": "asc"
    }
  },
I get the lexicographic order.
1/2021
10/2021
100/2021
101/2021
102/2021
But I need it to sort by the first number and the year like this:
1/2020
2/2020
...
1/2021
2/2021
I can't figure out how to do a custom sort. Is it even possible?
Solution 1:
Use a scripted sort.
This is not recommended for large data sets: the script runs for every matching document, so it will take time.
GET <>/_search
{
  "query": {
    "match_all": {}
  },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": "String v = doc['myProperty.keyword'].value; int i = v.indexOf('/'); return Integer.parseInt(v.substring(i + 1)) * 1000000L + Integer.parseInt(v.substring(0, i));"
      }
    }
  }
}
Replace myProperty.keyword with your keyword field, or with a string field that has fielddata enabled. The script builds the key as year * 1000000 + number, so documents sort by year first and then by the leading number.
Note: I haven't added null checks to the script, in case you have documents that don't have this field.
Solution 2:
Store another numeric field in Elasticsearch that doesn't contain the "/".
Sort based on that field.
Migrate the data of existing documents to the new field using the update_by_query API.
This is the recommended approach.
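For Solution 2, the numeric value has to order by year first and by the leading number second, which is what the question asks for. A minimal sketch of computing such a value before indexing follows. The function name sort_key and the multiplier are illustrative choices, and it assumes every value has the form number/year.

```python
def sort_key(value: str) -> int:
    """Turn a string such as '10/2021' into a number that sorts by year, then by number.

    Assumes the value has exactly one '/' and that the leading number
    stays below 1,000,000 (the multiplier is an arbitrary illustrative choice).
    """
    number, year = value.split("/")
    return int(year) * 1_000_000 + int(number)


values = ["10/2021", "1/2021", "2/2020", "100/2021", "1/2020"]
ordered = sorted(values, key=sort_key)
```

The integer returned by sort_key is what would be stored in the extra numeric field and used in a plain sort clause.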

Filtering documents by an unknown value of a field

I'm trying to create a query that filters my documents by one value (it can be any one) of a field, in my case "host.name". The point is that I don't know the unique values of this field in advance. I need to find them and choose one to use in the query.
I tried the query below using a Painless script, but I have not been able to achieve the goal.
{
  "sort" : [{"#timestamp": "desc"}, {"host.name": "asc"}],
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "source": """
              String k = doc['host.name'][0];
              return doc['host.name'].value == k;
            """,
            "lang": "painless"
          }
        }
      }
    }
  }
}
I'd appreciate it if anyone could help me improve this idea or suggest a new one.
TL;DR you can't.
The script query context operates on one document at a time, so you won't have access to the other docs' field values. You could use a scripted_metric aggregation, which does allow iterating through all docs, but it's just that: an aggregation, not a query.
I'd suggest to first run a simple terms agg to figure out what values you're working with and then build your queries accordingly.
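The two-step approach suggested above could be sketched roughly as follows: one request body for the terms aggregation, a helper to read the bucket keys, and a second request body that filters on the chosen value. The function names, the aggregation name "hosts", and the sample response are made up for illustration. The sketch also assumes host.name is mapped so that a terms aggregation and a term filter work on it (for example as a keyword field).

```python
def host_values_request(field="host.name", size=100):
    """Request body for step 1: list the distinct values of the field."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "aggs": {"hosts": {"terms": {"field": field, "size": size}}},
    }


def extract_hosts(response):
    """Read the bucket keys out of a terms aggregation response."""
    return [bucket["key"] for bucket in response["aggregations"]["hosts"]["buckets"]]


def host_filter_request(host, field="host.name"):
    """Request body for step 2: filter documents by the chosen value."""
    return {
        "sort": [{"#timestamp": "desc"}],
        "query": {"bool": {"filter": {"term": {field: host}}}},
    }


# Made-up response, shaped like a terms aggregation result.
sample_response = {
    "aggregations": {
        "hosts": {
            "buckets": [
                {"key": "web-01", "doc_count": 12},
                {"key": "web-02", "doc_count": 7},
            ]
        }
    }
}

hosts = extract_hosts(sample_response)
request = host_filter_request(hosts[0])
```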

Elastic relative date math - finding all things today

I'm trying to do a fairly simple query with Elasticsearch, but I don't understand what I'm doing wrong, so I'm posting here for some pointers.
I have an elastic index where each document has a date like so:
{
  // edited for brevity
  "releasedate": "2020-10-03T15:55:03+00:00",
}
I am using Django DRF to make queries, passing along a value such as &releasedate__gt=now-3d/d, which ends up as an Elasticsearch range query like this:
{
  "from": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "releasedate": {
              "gt": "now/d-3d"
            }
          }
        }
      ]
    }
  },
  "size": 10,
  "sort": [
    "_score"
  ]
}
If I want to see all documents since yesterday, I think of it as all documents with a releasedate greater than midnight yesterday. I figured the key part of the query would need to be like so:
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "releasedate": {
              "gt": "now/d-1d"
            }
          }
        }
      ]
    }
  }
}
So I expect this would round the current time down to 00:00 today, then go back one day.
So if I ran this on 2020-10-04, I'd assume it would catch a document with the release date 2020-10-03T15:55:03+00:00.
Here's my reasoning:
Rounding down with now/d would take us to 2020-10-04T00:00.
And then going back one day with -1d would take us to 2020-10-03T00:00.
This ought to include the document, but I'm not seeing it. I need to look back more than one day to find the documents, so I need to use now/d-2d to find matching documents.
Any idea why this might be? I'm unsure how to see what now/d-1d evaluates to as a timezone-aware object. That's the check I would reach for, but I don't know how to do it with Elastic.
FWIW, this is using Elastic 5.6. We'll be updating soon.
Once you round to the nearest day (with either now-2d/d or now/d-2d, as you did), the range bounds become day-based, and the direction of the rounding depends on the operator.
With gt, the rounded value rounds up to the end of the day, so gt: 2020-10-03 (rounded to the day) excludes all of 2020-10-03 and behaves like >= 2020-10-04T00:00. What you need instead of gt is gte, which rounds down and works as >= 2020-10-03T00:00.
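The effect can be checked with a small simulation. This is only a model of the rounding behavior described in the answer (gt rounds a /d value up to the last millisecond of the day, gte rounds it down to midnight), not Elasticsearch's own code. The function names are made up, and everything is assumed to be in UTC.

```python
from datetime import datetime, timedelta, timezone


def resolve_bound(now, days_back, op):
    """Model 'now/d-<days_back>d' for a range query using gt or gte."""
    start_of_day = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if op == "gt":
        # Rounds up: last millisecond of the current day.
        rounded = start_of_day + timedelta(days=1) - timedelta(milliseconds=1)
    elif op == "gte":
        # Rounds down: midnight at the start of the current day.
        rounded = start_of_day
    else:
        raise ValueError("only gt and gte are modelled here")
    return rounded - timedelta(days=days_back)


def matches(doc_time, now, days_back, op):
    """Return True if the document falls inside the modelled range."""
    bound = resolve_bound(now, days_back, op)
    return doc_time > bound if op == "gt" else doc_time >= bound


now = datetime(2020, 10, 4, 12, 0, tzinfo=timezone.utc)
doc = datetime(2020, 10, 3, 15, 55, 3, tzinfo=timezone.utc)

gt_one_day = matches(doc, now, 1, "gt")    # bound: 2020-10-03T23:59:59.999
gt_two_days = matches(doc, now, 2, "gt")   # bound: 2020-10-02T23:59:59.999
gte_one_day = matches(doc, now, 1, "gte")  # bound: 2020-10-03T00:00:00.000
```

Under this model the document is missed by gt with now/d-1d, found by gt with now/d-2d, and found by gte with now/d-1d, which matches what the question observed.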

ElasticSearch how to get docs with 10 or more fields in them?

I want to get all docs that have 10 or more fields in them. I'm guessing something like this:
{
  "query": {
    "range": {
      "fields": {
        "gt": 1000
      }
    }
  }
}
What you can do is run a script query like this:
{
  "query": {
    "script": {
      "script": {
        "source": "params._source.size() >= 10"
      }
    }
  }
}
However, be advised that depending on the number of documents you have and the hardware that supports your cluster, this can negatively impact the performance of your cluster.
A better idea would be to add another integer field that contains the number of fields that the document contains, so you can simply run a range query on it, like in your question.
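The extra-field idea could be sketched like this: count the fields before indexing, store the count, and query it with a range. The field name field_count and both helper names are hypothetical, and the sketch counts top-level fields only.

```python
def with_field_count(doc, count_field="field_count"):
    """Return a copy of the document with its number of top-level fields stored.

    The counter itself is not included in the count.
    """
    enriched = dict(doc)
    enriched.pop(count_field, None)  # ignore a stale counter, if any
    enriched[count_field] = len(enriched)
    return enriched


def min_fields_query(minimum, count_field="field_count"):
    """Range query for documents that have at least `minimum` fields."""
    return {"query": {"range": {count_field: {"gte": minimum}}}}


doc = {"title": "example", "author": "someone", "year": 2020}
indexed = with_field_count(doc)
query = min_fields_query(10)
```

The count has to be recomputed whenever a document is updated, since nothing keeps it in sync automatically.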
According to the documentation of the _source field, you can either do it like that or you can't get results based on field count.
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html

Arithmetic operations with fields

Is it possible to query the result of a subtraction between two fields?
E.g. There are two fields: "start", "end". I would like documents with end - start > 10.
Can this be done directly, or is the only way to create a new field with this difference while loading the documents?
You can use script filters using the scripting syntax explained in the scripting documentation.
For your specific issue, you might do something like:
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "script": {
          "script": "doc['end'].value - doc['start'].value > 10"
        }
      }
    }
  }
}
where you can replace the match_all query with your own.
As is probably clear from the code above, you can access specific fields in your document with the syntax doc['field'] and apply specific functions to their values. In this case, .value (without parentheses) returns the value of the field itself.
A script filter in your query might be the way to go.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-script-query.html
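The filtered query used above was removed in Elasticsearch 5.0, so on later versions the same script would normally sit in a bool query's filter clause. A minimal sketch of building that request body follows. The helper name duration_query is hypothetical, and passing the threshold through params is a choice made here, not something taken from the answers.

```python
def duration_query(min_diff, start="start", end="end"):
    """Build a bool/filter script query for documents where end - start > min_diff."""
    source = f"doc['{end}'].value - doc['{start}'].value > params.min_diff"
    return {
        "query": {
            "bool": {
                "filter": {
                    "script": {
                        "script": {
                            "lang": "painless",
                            "source": source,
                            # Keeping the threshold in params keeps the script text constant.
                            "params": {"min_diff": min_diff},
                        }
                    }
                }
            }
        }
    }


body = duration_query(10)
```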
