Get records for particular day of the week in ElasticSearch - elasticsearch

I have an ES cluster that has some summarized numerical data such that there is exactly 1 record per day. I want to write a query that will return the documents for a specific day of the week. For example, all records for Tuesdays. Currently I am doing this by getting all records for the required date range and then filtering out the ones for the day that I need. Is there a way to do that with a query?

You can do it using a script like this:
POST my_index/_search
{
"query": {
"script": {
"script": {
"source": "doc.my_date.value.dayOfWeek == 2"
}
}
}
}
If you're going to run this query often, you would probably be better off adding another field, dayOfWeek, to your documents that stores the day of the week. You can then query it with a simple term query, which is more efficient than a script.
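A minimal sketch of that suggestion in Python, assuming the date is available as an ISO-8601 string at indexing time. The helper names are hypothetical, and the numbering follows ISO (Monday = 1, so Tuesday = 2), which is what the script above compares against.

```python
from datetime import date


def with_day_of_week(doc: dict, date_field: str = "my_date") -> dict:
    """Return a copy of doc with a dayOfWeek field (ISO: Monday=1 ... Sunday=7)."""
    # Only the YYYY-MM-DD prefix is parsed, so full timestamps also work.
    day = date.fromisoformat(doc[date_field][:10]).isoweekday()
    return {**doc, "dayOfWeek": day}


def day_of_week_query(day: int) -> dict:
    """Build a term query body on the precomputed field."""
    return {"query": {"term": {"dayOfWeek": day}}}


doc = with_day_of_week({"my_date": "2020-08-04"})
print(doc["dayOfWeek"])  # 2020-08-04 is a Tuesday, so this prints 2
print(day_of_week_query(2))
```

The enriched document is what you would index, and the body returned by day_of_week_query is what you would send to _search.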

Related

elasticsearch get date range of most recent ingestion

I have an Elasticsearch index that gets new data in large dumps, so from looking at the graph it's very obvious when new data is added.
If I only want to get data from the most recent ingestion (in this case, data from 2020-08-06), what's the best way of doing this?
I can use this query to get the most recent document:
GET /indexname/_search
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": queryString
}
}
]
}
},
"sort": {
"@timestamp" : "desc"
},
"size": 1
}
This returns the most recent document, in this case a document with a timestamp of 2020-08-06. I can set that as my endDate and set my startDate to that date minus one day, but I'm worried about cases where the data was ingested overnight and spanned two days.
I could keep making requests that go back in time 5 hours at a time to find the most recent large gap, but I'm worried that making requests in a for loop could be time consuming. Is there a smarter way of getting the date range of my most recent ingestion?
When your data is coming in batches it'd be best to attribute an identifier to each batch. That way, there's no date math required.
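One way to sketch the batch-identifier idea in Python. The field name batch_id, the helper names, and the timestamp field default are assumptions for illustration, and batch_id is assumed to be mapped as a keyword.

```python
import uuid


def tag_batch(docs, batch_id=None):
    """Attach one shared batch_id to every document of a single dump."""
    batch_id = batch_id or uuid.uuid4().hex
    return batch_id, [{**d, "batch_id": batch_id} for d in docs]


def batch_date_range_query(batch_id, ts_field="@timestamp"):
    """Body returning the earliest and latest timestamp within one batch."""
    return {
        "size": 0,  # only the aggregations are needed, not the hits
        "query": {"term": {"batch_id": batch_id}},
        "aggs": {
            "start": {"min": {"field": ts_field}},
            "end": {"max": {"field": ts_field}},
        },
    }


batch_id, tagged = tag_batch([{"value": 1}, {"value": 2}])
print(tagged[0]["batch_id"] == tagged[1]["batch_id"])  # True
print(batch_date_range_query(batch_id))
```

With this in place, the "most recent document" query from the question yields the latest batch_id, and a second request with that id returns the whole ingestion regardless of how many days it spanned.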

ElasticSearch: Use Query to get single document ranking

I am trying to use ElasticSearch to compute a ranking. I'm not sure if this is possible and am trying to find out what my options might be. I need to run a query on all documents, sort them descending and then just return what number position in the list a specific record is located.
For example, I want to find out Julie's class ranking. I have records for each student in Julie's grade that contain their names and GPAs, and I want to perform one query that will tell me what her rank is within her grade.
I am hoping there is an ES guru out there that can help because otherwise I am going to need to run a regular query, get back max 10,000 records and figure it out from there.
This cannot be done in a single query.
First you need to get Julie's GPA, and then count the documents whose GPA is higher than hers. Her rank is that count plus one. In the query below, 8 stands for Julie's GPA, and the value_count aggregation counts the documents with a GPA greater than 8.
{
"query": {
"range": {
"gpa": {
"gt": 8
}
}
},
"aggs": {
"count": {
"value_count": {
"field": "name.keyword"
}
}
}
}
A better option is to store the rank in the document itself at indexing time.
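The two-step approach can be sketched in Python as follows. The helper names are hypothetical, and the student list is invented sample data standing in for the index so the arithmetic can be checked locally.

```python
def higher_gpa_body(gpa: float) -> dict:
    """Request body matching students whose gpa is strictly greater."""
    return {"query": {"range": {"gpa": {"gt": gpa}}}}


def rank_from_count(higher_count: int) -> int:
    """Rank is one more than the number of students who scored higher."""
    return higher_count + 1


# Invented sample data, used here instead of a real index.
students = [
    {"name": "Julie", "gpa": 8.0},
    {"name": "Ann", "gpa": 9.1},
    {"name": "Bob", "gpa": 7.5},
    {"name": "Cid", "gpa": 8.4},
]

# Step 1: look up Julie's GPA.
julie_gpa = next(s["gpa"] for s in students if s["name"] == "Julie")
# Step 2: count students with a higher GPA (what the range query would do).
higher = sum(1 for s in students if s["gpa"] > julie_gpa)
print(rank_from_count(higher))  # 3
```

Because the second step only needs a count, it is not affected by the 10,000-hit limit mentioned in the question.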

Exclude results from Elasticsearch / Kibana based on aggregation value

Is it possible to exclude results based on the outcome of an aggregation?
In other words, I have aggregated on a term, and a whole bunch of results appear in a data table ordered in descending order by the count. Is it possible to configure Kibana / Elasticsearch to exclude results where the count is 1 or less (where count is an aggregation)?
I realise I can export the raw data from the data table visualization and delete those records manually through a text editor or excel. But I am trying to convince my organization that elasticsearch is a cool new thing and this is one of their 1st requirements...
You can exclude results from the search by applying a filter. Here is a sample that may be helpful:
{
"query": {
"bool": {
"filter": {
"range": {
"Your_term": {
"gte": 1
}
}
}
}
}
}
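If the goal is to drop buckets by their document count rather than by a field value, the terms aggregation's min_doc_count parameter may be a closer fit. A minimal Python sketch of such a request body follows. The aggregation name by_term and the helper name are hypothetical.

```python
def terms_with_min_count(field: str, min_doc_count: int = 2, size: int = 100) -> dict:
    """Terms aggregation body that only returns buckets with enough documents."""
    return {
        "size": 0,  # skip the hits, only buckets are wanted
        "aggs": {
            "by_term": {
                "terms": {
                    "field": field,
                    # Buckets with fewer documents than this are left out.
                    "min_doc_count": min_doc_count,
                    "size": size,
                }
            }
        },
    }


print(terms_with_min_count("Your_term"))
```

With min_doc_count set to 2, terms that occur only once would not appear in the aggregation result.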

ElasticSearch 2.4 date range histogram using the difference between two date fields

I haven't been able to find anything regarding this for ES 2.* in regards to the problem here or in the docs, so sorry if this is a duplicate.
What I am trying to do is create an aggregation in an ElasticSearch query that will allow me to create buckets based on the difference in a record between 2 date fields.
I.e. If I had data in ES for a shop, I might like to see the time difference between a purchase_date field and shipped_date field.
So in that instance I'd want to create an aggregate that had buckets to give me the hits for when shipped_date - purchase_date is < 1 day, 1-2 days, 3-4 days or 5+ days.
Ideally I was hoping this was possible in an ES query. Is that the case or would the best approach be to process the results into my own array based on the time difference for each hit?
I was able to achieve this by using the built-in expression language, which is enabled by default in ES 2.4. The functionality I wanted was to group my results by the difference between EndDate and DateProcessed in increments of 15 days. The relevant part of the query is:
{
...,
"aggs": {
"reason": {
"date_histogram": {
"min_doc_count": 1,
"interval": "1296000000ms", // 15 days
"format": "epoch_millis",
"script": {
"lang": "expression",
"inline": "doc['DateProcessed'] > doc['EndDate'] ? doc['DateProcessed'] - doc['EndDate'] : -1"
}
}
...
}
}
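To check the interval value and see which bucket a given record would land in, the same arithmetic can be reproduced locally. This Python sketch only mirrors the expression in the script and rounds down to a multiple of the interval. It is an illustration under that assumption, not Elasticsearch's own implementation, and the function name is hypothetical.

```python
DAY_MS = 24 * 60 * 60 * 1000
INTERVAL_MS = 15 * DAY_MS  # 1296000000 ms, the interval used in the query


def bucket_key(date_processed_ms: int, end_date_ms: int,
               interval_ms: int = INTERVAL_MS) -> int:
    """Return the start of the bucket that the date difference falls into."""
    # Same rule as the script: use the difference when positive, else -1.
    if date_processed_ms > end_date_ms:
        diff = date_processed_ms - end_date_ms
    else:
        diff = -1
    # Floor to the nearest lower multiple of the interval.
    return (diff // interval_ms) * interval_ms


print(INTERVAL_MS)                  # 1296000000
print(bucket_key(20 * DAY_MS, 0))   # 1296000000 (the 15 to 30 day bucket)
print(bucket_key(3 * DAY_MS, 0))    # 0 (the 0 to 15 day bucket)
```

Records where DateProcessed is not later than EndDate all map to -1, so in this sketch they collect in a single bucket below zero.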

To Select documents having same startDate and endDate

I have some documents, each of which has startDate and endDate date fields. I need all documents where both of these values are the same. I couldn't find any query that helps me do this.
Elasticsearch supports script filters, which you can use in this case.
Something like this is what you will need:
POST /<yourIndex>/<yourType>/_search
{
"query": {
"filtered": {
"filter": {
"script": {
"script": "doc['startDate'].value == doc['endDate'].value"
}
}
}
}
}
This can be achieved in two ways:
Index solution - While indexing, add an additional field called isDateSame and set it to true or false depending on whether startDate and endDate are equal. Then you can easily query on that field. This is the best-optimized solution.
Script solution - Elasticsearch keeps the indexed data in field data, which is roughly the reverse of an inverted index. Using a script, you can access any indexed field and compare values. This is pretty fast, but not as good as the first option. A script query can be used for this.
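The index solution might look like this in Python. It is a sketch that assumes startDate and endDate arrive in the same string format, so plain equality is enough. The helper names are hypothetical, while isDateSame is the field named above.

```python
def with_same_date_flag(doc: dict) -> dict:
    """Return a copy of doc with isDateSame set from startDate and endDate."""
    return {**doc, "isDateSame": doc["startDate"] == doc["endDate"]}


def same_date_query() -> dict:
    """Term query body selecting documents whose two dates are equal."""
    return {"query": {"term": {"isDateSame": True}}}


docs = [
    {"startDate": "2015-01-01", "endDate": "2015-01-01"},
    {"startDate": "2015-01-01", "endDate": "2015-02-01"},
]
flagged = [with_same_date_flag(d) for d in docs]
print([d["isDateSame"] for d in flagged])  # [True, False]
print(same_date_query())
```

The flag is computed once per document at indexing time, so the search itself needs no script.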
