Missing Indices in Elasticsearch after 6.30pm UTC - elasticsearch

We have an ingestion pipeline that creates indices every 2 hours, e.g. index-2022-05-10-0 at 12am UTC, index-2022-05-10-1 at 2am UTC, and so on. The problem is that after 7pm UTC no new index appears in Elasticsearch. Is it due to a timezone issue? As far as I know Elasticsearch uses UTC, and the ES servers are also configured for UTC.
What might be the issue? The new index for the next day is created correctly at 12am UTC, and if I look at the index creation time in IST, it's 5.30am.
I am working from India, which is 5 hours 30 minutes ahead of UTC, so when it's 7pm UTC the day has already changed in IST and it's 12.30am. Is that the time zone issue because of which further indices are not created? Could someone please help?
Below is the pipeline code
...
"script": { "lang": "painless", "source": "Date d=new Date((long)(timestampfield)*1000); DateFormat f = new SimpleDateFormat("HH"); String crh=(Integer.parseInt(f.format(d))/2).toString(); String nvFormat="yyyy-MM-dd-"+crh; DateFormat f2=new SimpleDateFormat(nvFormat); ctx['_index']="index-"+f2.format(d);"

Related

How to get the same output as the deprecated Date.parse() in Groovy?

I have an application that runs on an old version of Spring. The application has a function that creates date objects using Date.parse as follows:
Date getCstTimeZoneDateNow() {
String dateFormat = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
def zonedDateString = new Date().format(dateFormat, TimeZone.getTimeZone('CST'))
Date date = Date.parse(dateFormat, zonedDateString)
return date // Tue Oct 18 20:36:12 EDT 2022 (in Date)
}
However, the code above is deprecated. I need to produce the same result.
I read other posts, and it seems like Calendar or SimpleDateFormat is preferred.
And I thought SimpleDateFormat has more capabilities.
This post helped me understand more about what is going on in the following code
SimpleDateFormat parse loses timezone
Date getCstTimeZoneDateNow() {
Date now = new Date()
String pattern = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
SimpleDateFormat sdf = new SimpleDateFormat(pattern)
sdf.setTimeZone(TimeZone.getTimeZone('CST'))
// cstDateTime prints times in cst
String cstDateTime = sdf.format(now) // 2022-10-18T20:36:12.088Z (in String)
// JVM current time
Date date = sdf.parse(cstDateTime) // Tue Oct 18 21:36:12 EDT 2022 (in Date)
return date
}
Here my goal is to return the date object that is in the format of Tue Oct 18 20:36:12 EDT 2022
The format is good. However, like the post says, when I do sdf.parse(), it prints in JVM time.
This means, the format is good but the time zone is off.
How can I get the exact same result as before?
It does not have to use SimpleDateFormat. It could be anything.
Thank you so much for reading and for your time.
Perhaps the important thing is that a Date is always neutral with respect to the time zone. The given example shows what is expected to work according to the Java specs:
def format = new SimpleDateFormat()
format.setTimeZone(TimeZone.getTimeZone("CST"))
println new Date()
def date = format.parse(format.format(new Date()))
printf "parsed to %s%n", date
printf "formatted to %s (%s)%n", format.format(date), format.getTimeZone().getDisplayName()
In the output, notice that a different time is shown depending on whether the format or toString() is used, which is perfectly fine: first we format and then parse again using the same format, and thus the same time zone. Later, we use Date.toString() to output the date, and toString() always uses the system default time zone. The time-zone shift is reflected in the output:
Thu Oct 20 09:22:58 EDT 2022
parsed to Thu Oct 20 09:22:00 EDT 2022
formatted to 10/20/22 8:22 AM (Central Standard Time)
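Since the question says it does not have to use SimpleDateFormat, a java.time sketch of the same idea may help; it assumes "CST" is meant as America/Chicago (the abbreviation itself is ambiguous). java.time keeps the zone together with the value, so nothing is silently re-interpreted in the JVM default zone:

import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class CstNowSketch {
    public static void main(String[] args) {
        // "CST" is ambiguous; America/Chicago is assumed here
        ZonedDateTime nowCst = ZonedDateTime.now(ZoneId.of("America/Chicago"));

        // Format the wall-clock time in Chicago; no default-zone conversion happens
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS");
        System.out.println(nowCst.format(fmt));

        // The same instant, printed in UTC
        System.out.println(nowCst.toInstant());
    }
}

If a legacy java.util.Date is still required by downstream code, Date.from(nowCst.toInstant()) converts it; the resulting Date then prints in the system default zone when toString() is called, exactly as described above.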

Set year on syslog events read into logstash read after new year

Question
When reading syslog events with Logstash, how can one set a proper year where:
Syslog events still by default lack the year
Logstash processing can be delayed - logs arriving late, Logstash down for maintenance, the syslog queue backing up
In short - events can arrive in an uneven order - and all / most lack the year.
The Logstash date filter will successfully parse a Syslog date, and use the current year by default. This can be wrong.
One constraint: Logs will never be from the future, not counting TimeZone +/- 1 day.
How can logic be applied to logstash to:
Check if a parsed date appears to be in the future?
Handle "Feb 29" if parsed in the year after the actual leap year.
Date extraction and parsing
I've used the GROK filter plugin to extract the SYSLOGTIMESTAMP from the message into a syslog_timestamp field.
Then the Logstash date filter plugin parses syslog_timestamp into the @timestamp field.
#
# Map the syslog date into the Elasticsearch @timestamp field
#
date {
match => ["syslog_timestamp",
"MMM dd HH:mm:ss",
"MMM d HH:mm:ss",
"MMM dd yyyy HH:mm:ss",
"MMM d yyyy HH:mm:ss" ]
timezone => "Europe/Oslo"
target => "#timestamp"
add_tag => [ "dated" ]
tag_on_failure => [ "_dateparsefailure" ]
}
# Check if a localized date filter can read the date.
if "_dateparsefailure" in [tags] {
date {
match => ["syslog_timestamp",
"MMM dd HH:mm:ss",
"MMM d HH:mm:ss",
"MMM dd yyyy HH:mm:ss",
"MMM d yyyy HH:mm:ss" ]
locale => "no_NO"
timezone => "Europe/Oslo"
target => "#timestamp"
add_tag => [ "dated" ]
tag_on_failure => [ "_dateparsefailure_locale" ]
}
}
Background
We are storing syslog events in Elasticsearch using Logstash. The input comes from a wide variety of servers of different OSes and OS versions, several hundred in total.
On the Logstash server the logs are read from file. Servers ship their logs using the standard syslog forwarding protocol.
The standard Syslog event still only has the month and date in each log, and configuring all servers to also add the year is out of scope for this question.
Problem
From time to time an event will occur where a server's syslog queue backs up. The queue will then (mostly) be released after a syslog or server restart. The patching regime ensures that all servers are rebooted several times a year, so (most likely) any received events will be at most a year old.
In addition, any delay in processing, such as across 31/12 (December) and 1/1 (January), makes an event belong to a different year than the year in which it is processed.
From time to time you also will need to re-read some logs, and then there's the leap year issue of February 29th - 29/02 - "Feb 29".
Examples:
May 25 HH:MM:SS
May 27 HH:MM:SS
May 30 HH:MM:SS
May 31 HH:MM:SS
Mai 31 HH:MM:SS # Localized
In sum: Logs may be late, and we need to handle it.
More advanced DateTime logic can be done with the Logstash Ruby filter plugin.
Leap year
The 29th of February every four years makes "Feb 29" a valid date in 2020, but not in 2021.
The date is saved in syslog_timestamp and run through the date filters shown in the question above.
The following Ruby code will:
Check if this year is a leap year (probably not since parsing failed)
Check if last year was a leap year.
If the date falls outside these checks we can't rightly assume anything else, so this check falls into the "I know and accept the risk" category.
#
# Handle old leap syslog messages, typically from the previous year, while in a non-leap-year
# Ruby comes with a price, so don't run it unless the date filter has failed and the date is "Feb 29".
#
if "_dateparsefailure" in [tags] and "_dateparsefailure_locale" in [tags] and [syslog_timestamp] =~ /^Feb 29/ {
ruby {
code => "
today = DateTime.now
last_year = DateTime.now().prev_year
if not today.leap? then
if last_year.leap? then
timestamp = last_year.strftime('%Y') + event.get('syslog_timestamp')
event.set('[@metadata][fix_leapyear]', LogStash::Timestamp.new(Time.parse(timestamp)))
end
end
"
}
#
# Overwrite the `@timestamp` field if successful and remove the failure tags
#
if [@metadata][fix_leapyear] {
mutate {
copy => { "[@metadata][fix_leapyear]" => "@timestamp" }
remove_tag => ["_dateparsefailure", "_dateparsefailure_locale"]
add_tag => ["dated"]
}
}
}
Date in the future
Dates "in the future" occurs if you get i.e. Nov 11 in a log parsed after New Year.
This Ruby filter will:
Set a tomorrow date variable two days in the future (ymmv)
Check if the parsed event date @timestamp is after (in the future relative to) tomorrow
When reading syslog we assume that logs from the future do not exist. If you run test servers to simulate later dates you must of course adapt to that, but that is outside the scope.
# Fix Syslog date without YEAR.
# If the date is "in the future" we assume it is really in the past by one year.
#
if ![@metadata][fix_leapyear] {
ruby {
code => "
#
# Create a Time object for two days from the current time by adding 172800 seconds.
# This depends on [event][timestamp] being set before any 'date' filter; otherwise use Ruby's `Time.now`
#
tomorrow = event.get('[event][timestamp]').time.localtime() + 172800
#
# Read the @timestamp set by the 'date' filter
#
timestamp = event.get('@timestamp').time.localtime()
#
# If the event timestamp is _newer_ than two days from now
# we assume that this is syslog, and a really old message, and that it is really from
# last year. We cannot be sure that it is not even older, hence the 'assume'.
#
if timestamp > tomorrow then
if defined?(timestamp.usec_with_frac) then
new_date = LogStash::Timestamp.new(Time.new(timestamp.year - 1, timestamp.month, timestamp.day, timestamp.hour, timestamp.min, timestamp.sec, timestamp.usec_with_frac))
else
new_date = LogStash::Timestamp.new(Time.new(timestamp.year - 1, timestamp.month, timestamp.day, timestamp.hour, timestamp.min, timestamp.sec))
end
event.set('@timestamp', new_date)
event.set('[event][timestamp_datefilter]', timestamp)
end
"
}
}
Caveat: I'm by no means a Ruby expert, so other answers or comments on how to improve on the Ruby code or logic will be greatly appreciated.
In the hope that this can help or inspire others.

Carbon php date time math

I know when an account is created in UTC. If the account is cancelled before 2am PST the next day then the account needs to be removed, otherwise it is not removed until later. I'm having trouble coming up with the actual statements to use in Carbon. For example:
$account->getAttribute('created_at');
returns
Illuminate\Support\Carbon #1597790786 {#3432
date: 2020-08-18 22:46:26.0 UTC (+00:00),
}
Therefore I need to know if now() is >= 2020-08-19 02:00:00.0 PDT/PST.
How should I do that?
Switch your date into the time zone, take "tomorrow 2am", then switch back to UTC for the comparison:
$cancellation = $account->getAttribute('cancelled_at');
$creation = $account->getAttribute('created_at');
if ($cancellation < $creation->tz('PST')->modify('tomorrow 2am')->utc()) {
// remove
}

How can I return LocalDate.now() in milliseconds?

I create date now:
ZoneId gmt = ZoneId.of("GMT");
LocalDateTime localDateTime = LocalDateTime.now();
LocalDate localDateNow = localDateTime.toLocalDate();
Then I want return this date in milliseconds:
localDateNow.atStartOfDay(gmt) - 22.08.2017
localDateNow.atStartOfDay(gmt).toEpochSecond(); - 1503360000 (18.01.70)
How can I return LocalDate.now() in milliseconds?
Calling toInstant().toEpochMilli(), as suggested by @JB Nizet's comment, is the right answer, but there's a small and tricky detail about using local dates that you must be aware of.
But before that, some other minor details:
Instead of ZoneId.of("GMT") you can use the built-in constant ZoneOffset.UTC. They're equivalent, but there's no need to create extra redundant objects if the API already provides one that does exactly the same thing.
Instead of calling LocalDateTime.now() and then .toLocalDate(), you can call LocalDate.now() directly - they're equivalent.
Now the tricky details: when you call the now() method (for either LocalDateTime or LocalDate), it uses the JVM's default timezone to get the values for the current date, and this value might be different depending on the timezone configured in the JVM.
In the JVM I'm using, the default timezone is America/Sao_Paulo, and the local time here is 09:37 AM. So LocalDate.now() returns 2017-08-22 (August 22nd, 2017).
But if I change the default timezone to Pacific/Kiritimati, it returns 2017-08-23. That's because in Kiritimati, right now it is already August 23rd, 2017 (and the local time there, at the moment I write this, is 02:37 AM).
So, if I run this code when the default timezone is Pacific/Kiritimati:
LocalDate dtNow = LocalDate.now(); // 2017-08-23
System.out.println(dtNow.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli());
The output is:
1503446400000
Which is the equivalent of August 23rd, 2017 at midnight in UTC.
If I run the same code when the default timezone is America/Sao_Paulo, the result will be:
1503360000000
Which is the equivalent of August 22nd, 2017 at midnight in UTC.
Using now() makes your code depend on the JVM's default timezone. And this configuration can be changed without notice, even at runtime, making your code return different results when such a change occurs.
And you don't need such an extreme case (like someone misconfiguring the JVM to a "very far" timezone). In my case, for example, in the America/Sao_Paulo timezone, if I run the code at 11 PM, LocalDate will return August 22nd, but the current date in UTC will already be August 23rd. That's because 11 PM in São Paulo is the same as 2 AM of the next day in UTC:
// August 22th 2017, at 11 PM in Sao Paulo
ZonedDateTime z = ZonedDateTime.of(2017, 8, 22, 23, 0, 0, 0, ZoneId.of("America/Sao_Paulo"));
System.out.println(z); // 2017-08-22T23:00-03:00[America/Sao_Paulo]
System.out.println(z.toInstant()); // 2017-08-23T02:00:00Z (in UTC is already August 23th)
So using a LocalDate.now() is not a guarantee that I'll always have the current date in UTC.
If you want the current date in UTC (regardless of the JVM default timezone) and set the time to midnight, it's better to use a ZonedDateTime:
// current date in UTC, no matter what the JVM default timezone is
ZonedDateTime zdtNow = ZonedDateTime.now(ZoneOffset.UTC);
// set time to midnight and get the epochMilli
System.out.println(zdtNow.with(LocalTime.MIDNIGHT).toInstant().toEpochMilli());
The output is:
1503360000000
Which is the equivalent of August 22nd, 2017 at midnight in UTC.
Another alternative is to pass the timezone to LocalDate.now, so it can get the correct values for the current date on the specified zone:
// current date in UTC, no matter what the JVM default timezone is
LocalDate dtNowUtc = LocalDate.now(ZoneOffset.UTC);
// set time to midnight and get the epochMilli
System.out.println(dtNowUtc.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli());
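Putting the two recommendations together, a minimal sketch (assuming the goal is "today's date in UTC, at midnight UTC, as epoch milliseconds"):

import java.time.LocalDate;
import java.time.ZoneOffset;

public class UtcMidnightMillis {
    public static void main(String[] args) {
        // Current date in UTC, at start of day in UTC, converted to epoch milliseconds
        long millis = LocalDate.now(ZoneOffset.UTC)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
        System.out.println(millis);
    }
}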

Working with Rails Timezones?

Working on a 3 week new registration chart for metrics, I have the following code:
(3.weeks.ago.to_date..Date.today).map { |date| Metrics.registrations_on(date) }
In Metrics.rb:
def self.registrations_on(date)
date = date.midnight
end_date = date + 24.hours
User.where(:created_at => date..end_date).count
end
Before the day is done here in California, a new day's numbers are already starting to increase. The created_at timestamp is UTC as well.
I'd like to be able to see the stats for today using our time zone. With my data already saved as UTC, I'm curious how to accomplish this.
Open config/application.rb, find config.time_zone, and assign it with appropriate value:
config.time_zone = 'Pacific Time (US & Canada)'
Restart your app, and Rails date/time operations should be adjusted automatically to your time zone.
For a list of all supported time zone strings, use:
bundle exec rake time:zones:all
