Optimized multi-index query to get min of maximum of all indexes - Elasticsearch

I am learning Elasticsearch and need help with a multi-index search query.
I have 7 indexes, and every document in each index has a lastUpdatedDate field. I want to query all selected indexes at once and get the minimum of the per-index maximum lastUpdatedDate.
e.g.
index "A" - last updated on 20th Dec - max of all lastUpdatedDate records is 20th Dec
index "B" - last updated on 18th Dec - max of all lastUpdatedDate records is 18th Dec
index "C" - last updated on 19th Dec - max of all lastUpdatedDate records is 19th Dec
The min across these three indexes is 18th Dec.
I can query each index separately from my backend service, but I am thinking of a single optimized query in Java that hits all these indexes at once.
One more example:
Index-A {
  Id: 1, lastUpdatedDate: 15th Dec;
  Id: 2, lastUpdatedDate: 16th Dec;
  Id: 5, lastUpdatedDate: 15th Dec;
  Id: 6, lastUpdatedDate: 20th Dec;
};
Index-B {
  Id: 1, lastUpdatedDate: 21st Dec;
  Id: 2, lastUpdatedDate: 16th Dec;
  Id: 5, lastUpdatedDate: 15th Dec;
  Id: 6, lastUpdatedDate: 20th Dec;
};
Index-C {
  Id: 1, lastUpdatedDate: 22nd Dec;
  Id: 2, lastUpdatedDate: 16th Dec;
  Id: 5, lastUpdatedDate: 15th Dec;
  Id: 6, lastUpdatedDate: 20th Dec;
}
Now the max per index is:
Index-A -> 20th Dec
Index-B -> 21st Dec
Index-C -> 22nd Dec
Then min is 20th Dec

A very simple query would be this one: retrieve the max updated value per index, then take the bucket with the minimum value among all those buckets:
GET _all/_search
{
  "size": 0,
  "aggs": {
    "all_indexes": {
      "terms": {
        "field": "_index",
        "size": 100
      },
      "aggs": {
        "max_updated": {
          "max": {
            "field": "lastUpdatedDate"
          }
        }
      }
    },
    "min_updated": {
      "min_bucket": {
        "buckets_path": "all_indexes>max_updated"
      }
    }
  }
}
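Since you mention running this from a Java backend, here is a rough sketch of the same aggregation built with the Elasticsearch high-level REST client (7.x). The index names A, B and C come from the example above; verify the class and method names against the client version you actually use:

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.PipelineAggregatorBuilders;
import org.elasticsearch.search.aggregations.metrics.NumericMetricsAggregation;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class MinOfMaxLastUpdated {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            // Same structure as the JSON query: terms on _index, max per bucket,
            // then a min_bucket pipeline over the per-index maxima.
            SearchSourceBuilder source = new SearchSourceBuilder()
                .size(0)
                .aggregation(AggregationBuilders.terms("all_indexes")
                    .field("_index")
                    .size(100)
                    .subAggregation(AggregationBuilders.max("max_updated").field("lastUpdatedDate")))
                .aggregation(PipelineAggregatorBuilders.minBucket("min_updated", "all_indexes>max_updated"));

            // Target only the indexes you care about instead of _all.
            SearchRequest request = new SearchRequest("A", "B", "C").source(source);
            SearchResponse response = client.search(request, RequestOptions.DEFAULT);

            // min_bucket comes back as a single-value numeric aggregation (epoch millis here).
            NumericMetricsAggregation.SingleValue minUpdated = response.getAggregations().get("min_updated");
            System.out.println("Min of per-index max lastUpdatedDate: " + minUpdated.value());
        }
    }
}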

Related

Date difference scripted field in Kibana

I wanted to find the difference between two date fields using scripted fields. Here are the two fields and their format:
start_time - June 17th 2018, 09:44:46.000
end_time - June 17th 2018, 09:44:49.000
The difference should give proc_time.
Here's what I am trying to do in the scripted field:
doc['start_time'].date.millis - doc['end_time'].date.millis
But this returns the processing time deducted from the epoch time.
For example, if my processing time is 2 seconds, then the output is epoch time - 2 seconds, which is not what I want.
This is the sample doc:
17 Jun 2018 04:14:46 INFO CSG event file generation started at: Sun Jun 17 04:14:46 CDT 2018
17 Jun 2018 04:14:46 INFO Executing CSG file generation process
Warning: Using a password on the command line interface can be insecure.
17 Jun 2018 04:15:57 INFO Finished at: Sun Jun 17 04:15:57 CDT 2018
Any help would be appreciated.
Update
I've got this working with the following painless script:
((doc['csg_proc_end_time'].date.year) * 31536000 + doc['csg_proc_end_time'].date.monthOfYear * 86400 + doc['csg_proc_end_time'].date.dayOfMonth * 3600 + doc['csg_proc_end_time'].date.secondOfDay) - ((doc['csg_proc_start_time'].date.year) * 31536000 + doc['csg_proc_start_time'].date.monthOfYear * 86400 + doc['csg_proc_start_time'].date.dayOfMonth * 3600 + doc['csg_proc_start_time'].date.secondOfDay)
However, I would welcome any other script which does this in a simpler way.
JSON format for added fields:
"fields": {
"#timestamp": [
"2018-06-20T04:45:00.258Z"
],
"zimbra_proc_time": [
0
],
"csg_proc_time": [
71
],
"perftech_proc_time": [
0
],
"csg_proc_end_time": [
"2018-06-17T04:15:57.000Z"
],
"csg_proc_start_time": [
"2018-06-17T04:14:46.000Z"
]
},
This is what I've done to reproduce your issue and it works properly:
PUT test/doc/1
{
  "csg_proc_end_time": "2018-06-17T04:15:57.000Z",
  "csg_proc_start_time": "2018-06-17T04:14:46.000Z"
}
Now compute the processing time in a script field:
GET test/_search
{
  "script_fields": {
    "proc_time": {
      "script": {
        "source": "(doc.csg_proc_end_time.value.millis - doc.csg_proc_start_time.value.millis) / 1000"
      }
    }
  }
}
Result: 71 seconds
{
  "_index": "test",
  "_type": "doc",
  "_id": "1",
  "_score": 1,
  "fields": {
    "proc_time": [
      71
    ]
  }
}
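If you need the same processing-time computation outside Kibana, for example from a Java backend, a script field can be attached to a search request. This is only a sketch against the Java high-level REST client (7.x) and reuses the script from the answer above; double-check the method names for your client version:

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.script.Script;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class ProcTimeScriptField {
    // Attaches the proc_time script field (same script as above) to a search on the "test" index.
    static void printProcTimes(RestHighLevelClient client) throws Exception {
        SearchSourceBuilder source = new SearchSourceBuilder()
            .scriptField("proc_time", new Script(
                "(doc['csg_proc_end_time'].value.millis - doc['csg_proc_start_time'].value.millis) / 1000"));

        SearchResponse response = client.search(new SearchRequest("test").source(source), RequestOptions.DEFAULT);
        for (SearchHit hit : response.getHits()) {
            Number procTime = hit.getFields().get("proc_time").getValue();
            System.out.println(hit.getId() + " -> " + procTime + " seconds");
        }
    }
}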

Rails split string with multiple parts

I have a string that's been imported from a csv such as:
14th Aug 2009:1, 15th Aug 2009:1, 16th Sep 2015:1|Style1, 17th Sep 2015:1|Style 1
I wish to add this data to my database in a specific way. First I split it on , to get each date group (in this case 4 dates).
Secondly I'd like a way to split each of those date groups into multiple segments: the first with the date, the second with the number after the colon, and then a varying number more for each of the items separated by the | character.
Is there a decent, efficient way to accomplish this in Ruby?
Looking for outcome to be a hash like so:
{ '14th Aug 2009' => 1, '15th Aug 2009' => 1, '16th Aug 2009' => 1, '16th Sep 2015' => { 1 => 'Style 1' }, '17th Sep 2015' => { 1 => 'Style 1' } }
Basically if the string was like so:
15th Aug 2009:1, 16th Sep 2015:3|Style1|Style 1, 17th Sep 2015:1|Style 1
I would get
{ '15th Aug 2009' => 1, '16th Sep 2015' => { '', 'Style 1', 'Style 1' }, '17th Sep 2015' => { 1 => 'Style 1' } }
Basically, the text separated by |'s should be assigned to the number after the colon. If the number is 3 and there are only two pieces of text after it, then one entry is an empty string and the other two contain the text (e.g. "Style 1").
Sorry for sounding very confusing.
I'm assuming that you meant for the '|'-separated items to build an Array, as infused asked about. How about this?
s ="15th Aug 2009:1, 16th Sep 2015:3|Style1|Style 1, 17th Sep 2015:1|Style 1"
result = {}
s.split(',').each do |v|
date,rest = v.split(':')
items = rest.split('|')
if items[0] == "1"
result[date] = 1
else
result[date] = ['', items[1..-1]]
end
end

I don't know how to filter my log file with grok and logstash

I have a small Java app that loads logs similar to the ones below:
Fri May 29 12:10:34 BST 2015 Trade ID: 2 status is :received
Fri May 29 14:12:36 BST 2015 Trade ID: 4 status is :received
Fri May 29 17:15:39 BST 2015 Trade ID: 3 status is :received
Fri May 29 21:19:43 BST 2015 Trade ID: 3 status is :Parsed
Sat May 30 02:24:48 BST 2015 Trade ID: 8 status is :received
Sat May 30 08:30:54 BST 2015 Trade ID: 3 status is :Data not found
Sat May 30 15:38:01 BST 2015 Trade ID: 3 status is :Book not found
Sat May 30 23:46:09 BST 2015 Trade ID: 6 status is :received
I want to use the ELK stack to analyse my logs and filter them.
I would like at least 3 filters: date and time, trade ID, and status.
In the filter part of my Logstash configuration file, here is what I did:
filter {
  grok {
    match => { "message" => "%{DAY} %{MONTH} %{DAY} %{TIME} BST %{YEAR} Trade ID: %{NUMBER:tradeId} status is : %{WORD:status}" }
  }
}
And for the moment I can't filter my logs as I want.
You have some extra spaces in the pattern, and for the status you want to capture the rest of the message, so using GREEDYDATA instead of WORD is the way to go.
filter {
  grok {
    match => { "message" => "%{DAY:day} %{MONTH:month} %{MONTHDAY:monthday} %{TIME:time} BST %{YEAR:year} Trade ID: %{NUMBER:tradeId} status is :%{GREEDYDATA:status}" }
  }
}
For this log line:
Sat May 30 15:38:01 BST 2015 Trade ID: 3 status is :Book not found
You will end up with a json like:
{
  "message" => "Sat May 30 15:38:01 BST 2015 Trade ID: 3 status is :Book not found",
  "@version" => "1",
  "@timestamp" => "2015-08-18T18:28:47.195Z",
  "host" => "Gabriels-MacBook-Pro.local",
  "day" => "Sat",
  "month" => "May",
  "monthday" => "30",
  "time" => "15:38:01",
  "year" => "2015",
  "tradeId" => "3",
  "status" => "Book not found"
}

How to do a year over year aggregation with Elasticsearch?

Assuming I have a date field on a document, I know using the date_histogram aggregation I can get a document count by day, month, year, etc.
What I want to do is get the average document count for January, February, March, etc. over several given years. The same goes for Monday, Tuesday, Wednesday, etc. over several given weeks. Is there a way to do this having just that same date field or what is the best way to accomplish this with Elasticsearch?
Example
Let's say we have a bunch of orders placed over three years:
2012 - Jan (10 orders), Feb (5 orders), Mar (7 orders), Apr (11 orders), etc
2013 - Jan (13 orders), Feb (7 orders), Mar (12 orders), Apr (15 orders), etc.
2014 - Jan (10 orders), Feb (7 orders), Mar (6 orders), Apr (13 orders), etc.
What I want is the average of each month over the given years, so the output would be:
Jan ((10 + 13 + 10) / 3 = 11 orders), Feb (6.33 orders), Mar (8.33 orders), Apr (13 orders), etc.
It would be best if this can be generalized for N years (or N Januaries, etc.) so that we search over any date range.
You can use 'monthOfYear' like this:
"aggregations": {
"timeslice": {
"histogram": {
"script": "doc['timestamp'].date.getMonthOfYear()",
"interval": 1,
"min_doc_count": 0,
"extended_bounds": {
"min": 1,
"max": 12
},
"order": {
"_key": "desc"
}
}
}
The extended bounds will ensure you get a value for every month (even if it is zero).
If you want the month names, you can either do that in your own code, or do this (with the consequence that you won't get values for months that have no data):
"aggregations": {
"monthOfYear": {
"terms": {
"script": "doc['timestamp'].date.monthOfYear().getAsText()",
"order": {
"_term": "asc"
}
}
}
Once you've got this, you can nest your stats aggregation inside this one:
"aggregations: {
"monthOfYear": {
"terms": {
...
},
"aggregations": {
"stats": ...
}
}
}
The question is pretty old now, but, hope this helps someone.
My understanding of what you want is:
You'd like to see the average number of documents per month in yearly buckets. Is that correct?
If so, you could count the number of documents in a year (i.e. the yearly bucket) and then divide by 12 using a script.
E.g. to show the daily average doc count in weekly buckets (assuming 30 days per month):
curl -XGET 'http://localhost:9200/index/type/_search?pretty' -d '{
  "aggs": {
    "monthly_bucket": {
      "date_histogram": { "field": "datefield", "interval": "week" },
      "aggs": {
        "weekly_average": { "sum": { "script": "doc[\"datefield\"].value > 0 ? 1/30 : 0" } }
      }
    }
  }
}'
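If you end up building this from Java rather than sending raw JSON, the month-of-year histogram from the first answer can be assembled with the aggregation builders. This is only a rough sketch with the high-level REST client; the field name timestamp is taken from the answer above, and the script body is the legacy Joda-style syntax shown there, so it may need porting to Painless on recent clusters:

import org.elasticsearch.script.Script;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.BucketOrder;
import org.elasticsearch.search.aggregations.bucket.histogram.HistogramAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class MonthOfYearAggregation {
    // Mirrors the "timeslice" histogram above: one bucket per month number (1-12),
    // kept even when empty thanks to the extended bounds.
    static SearchSourceBuilder monthOfYearHistogram() {
        HistogramAggregationBuilder timeslice = AggregationBuilders.histogram("timeslice")
            .interval(1)
            .minDocCount(0)
            .extendedBounds(1, 12)
            .order(BucketOrder.key(false))
            .script(new Script("doc['timestamp'].date.getMonthOfYear()"));

        return new SearchSourceBuilder().size(0).aggregation(timeslice);
    }
}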

Linq Query - get current month plus previous months

I need to build a LINQ query that will show the results as follows:
Data:
Sales Month
----------------------
10 January
20 February
30 March
40 April
50 May
60 June
70 July
80 August
90 September
100 October
110 November
120 December
I need to get the results based on this scenario:
cumulative sales for month x = sales for month x + cumulative sales for the previous month
That will result in:
Sales Month
--------------------
10 January
30 February (30 = February 20 + January 10)
60 March (60 = March 30 + February 30)
100 April (100 = April 40 + March 60)
.........
Any help on how to build this query?
Thanks a lot!
Since you wanted it in LINQ...
void Main()
{
    List<SaleCount> sales = new List<SaleCount>() {
        new SaleCount() { Sales = 10, Month = 1 },
        new SaleCount() { Sales = 20, Month = 2 },
        new SaleCount() { Sales = 30, Month = 3 },
        new SaleCount() { Sales = 40, Month = 4 },
        ...
    };
    var query = sales.Select((s, i) => new
    {
        CurrentMonth = s.Month,
        CurrentAndPreviousSales = s.Sales + sales.Take(i).Sum(sa => sa.Sales)
    });
}
public class SaleCount
{
    public int Sales { get; set; }
    public int Month { get; set; }
}
...but in my opinion, this is a case where coming up with some fancy LINQ isn't going to be as clear as just writing out the code that the LINQ query is going to generate. This also doesn't scale. For example, including multiple years' worth of data gets even hairier when it wouldn't have to if it were just written out the "old fashioned way".
If you don't want to add up all of the previous sales for each month, you will have to keep track of the total sales somehow. The Aggregate function works okay for this because we can build a list and use its last element as the current total for calculating the next element.
var sales = Enumerable.Range(1, 12).Select(x => x * 10).ToList();
// Append each running total: the previous total (last element) plus the current sale.
var sums = sales.Aggregate(new List<int>(), (list, sale) => { list.Add(list.LastOrDefault() + sale); return list; });
