Smaller variation between times of different days - algorithm

I have been working on an algorithm that selects a set of date/time objects with a certain characteristic, but with no success.
The data to be used are in a list of lists of date/time objects,
e.g.:
lstDays[i][j], i <= day chooser, j <= time chooser
What is the problem? I need a set of date/time objects whose times of day are as close to each other as possible. Each time in this set must come from a different day.
For example: [2012-09-09 12:00,2012-09-10 12:00, 2012-09-11 12:00]
This set is the best possible result because the difference between its times is zero.
Important
To put this in context: I want to observe whether a phenomenon occurs at the same time on different days. If not, I want to evaluate whether the distance between the times is reasonable for my study.
I would like a generic algorithm that works for any number of days and times. It should return every set of datetime objects together with its time distance:
[2012-09-09 12:00,2012-09-10 12:00, 2012-09-11 12:00], 0
[2012-09-09 13:00,2012-09-10 13:00, 2012-09-11 13:05], 5
and so on.
:: "0", because the difference between the times of day in the first line is zero.
:: "5", because the difference between the times of day in the second line is five minutes.
Edit: Code here
for i in range(len(lstDays)):
    for j in range(len(lstDays[i])):
        print lstDays[i][j]
Output:
2013-07-18 11:16:00
2013-07-18 12:02:00
2013-07-18 12:39:00
2013-07-18 13:14:00
2013-07-18 13:50:00
2013-07-19 11:30:00
2013-07-19 12:00:00
2013-07-19 12:46:00
2013-07-19 13:19:00
2013-07-22 11:36:00
2013-07-22 12:21:00
2013-07-22 12:48:00
2013-07-22 13:26:00
2013-07-23 11:18:00
2013-07-23 11:48:00
2013-07-23 12:30:00
2013-07-23 13:12:00
2013-07-24 11:18:00
2013-07-24 11:42:00
2013-07-24 12:20:00
2013-07-24 12:52:00
2013-07-24 13:29:00
Note: lstDays[i][j] is a datetime object.
lstDays = [ [/*datetime objects from a day i*/], [/*datetime objects from a day i+1*/], [/*datetime objects from a day i+2*/], ... ]
I am not worried about performance for now.
Hope that you can help me! (:

Generate a histogram:
hours = [0] * 24
for object in objects:  # whatever your objects are
    # assuming object.date_time looks like '2013-07-18 10:55:00'
    hour = object.date_time[11:13]  # assuming the hour is in positions 11-12
    hours[int(hour)] += 1
for hour in xrange(24):
    print '%02d: %d' % (hour, hours[hour])

You can always resort to collecting the times into a list, computing the differences between them, and grouping those objects whose difference is below a chosen limit. Everything is packed into a dictionary with the timestamps as keys and the difference as the value. If this is not exactly what you need, I'm pretty sure it should be easy to select whatever result you need from it.
import numpy
import datetime

def seconds_of_day(t):
    # datetime.time objects cannot be subtracted, so compare seconds since midnight
    return t.hour * 3600 + t.minute * 60 + t.second

times_list = [object1.time(), object2.time(), ..., objectN.time()]
limit = 5  # limit of five seconds
groups = {}
for time in times_list:
    delta_times = numpy.asarray([seconds_of_day(tt) - seconds_of_day(time) for tt in times_list])
    whr = numpy.where(abs(delta_times) < limit)[0]
    similar = [str(times_list[ii]) for ii in whr]
    if len(similar) > 1:
        similar.sort()
        max_time = numpy.max(delta_times[whr])  # max? median? mean?
        groups[tuple(similar)] = max_time
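Since the question asks for exactly one time per day and says performance is not a concern, a brute-force enumeration may be enough. The following is a minimal Python 3 sketch, not a definitive implementation. The names `seconds_of_day` and `ranked_sets` are hypothetical, and the "distance" of a set is assumed to be the largest minus the smallest time of day, in seconds. Times on either side of midnight are not treated as close.

```python
from datetime import datetime
from itertools import product

def seconds_of_day(dt):
    """Seconds elapsed since midnight for a datetime."""
    return dt.hour * 3600 + dt.minute * 60 + dt.second

def ranked_sets(lst_days):
    """Return (spread_in_seconds, combination) pairs, smallest spread first.

    Every combination takes exactly one datetime from each day.
    """
    results = []
    for combo in product(*lst_days):
        secs = [seconds_of_day(dt) for dt in combo]
        results.append((max(secs) - min(secs), combo))
    results.sort(key=lambda pair: pair[0])
    return results

# Small illustrative input in the same shape as lstDays (one inner list per day).
lst_days = [
    [datetime(2012, 9, 9, 12, 0), datetime(2012, 9, 9, 13, 0)],
    [datetime(2012, 9, 10, 12, 0), datetime(2012, 9, 10, 13, 0)],
    [datetime(2012, 9, 11, 12, 0), datetime(2012, 9, 11, 13, 5)],
]
for spread, combo in ranked_sets(lst_days)[:2]:
    print(spread, [dt.strftime("%Y-%m-%d %H:%M") for dt in combo])
```

The number of combinations is the product of the list lengths, so this only scales to a modest number of days and times.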

calculating some seasonal climate metrics with Iris

I have a new project calculating some seasonal climate metrics. As part of this, I need to identify, e.g., the wettest quarter in a set of climatological monthly data:
print(pr_cube)
Precipitation / (mm) (time: 12; latitude: 125; longitude: 211)
Dimension coordinates:
time x - -
latitude - x -
longitude - - x
where time is every month, averaged across 30 years, with coord('time') =
DimCoord([2030-01-01 00:00:00, 2030-02-01 00:00:00, 2030-03-01 00:00:00,
2030-04-01 00:00:00, 2030-05-01 00:00:00, 2030-06-01 00:00:00,
2030-07-01 00:00:00, 2030-08-01 00:00:00, 2030-09-01 00:00:00,
2030-10-01 00:00:00, 2030-11-01 00:00:00, 2030-12-01 00:00:00])
I was wondering if I could add a seasons coordinate for all sets of consecutive 3 months, including 'wrapping around', something like this:
iris.coord_categorisation.add_season(cube, coord, name='season',
    seasons=('jfm', 'fma', 'mam', 'amj', 'mjj', 'jja', 'jas', 'aso', 'son', 'ond', 'ndj', 'djf'))
or
season = ('jfm', 'fma', 'mam', 'amj', 'mjj', 'jja', 'jas', 'aso', 'son', 'ond', 'ndj', 'djf')
iris.coord_categorisation.add_season_membership(cube, coord, season, name='all_quarters')
I have not tested this yet; I just wondered whether anyone has suggestions or a recommendation.
And then, get the season with the max rainfall?
Qtr_max_rain = pr_cube.collapsed('season', iris.analysis.MAX)
Would that work correctly?
There may be a way to achieve this using coord_categorisation, but I believe the simplest way is to instead use iris.cube.Cube.rolling_window(). There's no native way to wrap around in the way you need, so you can hack it by duplicating Jan and Feb on the end of the existing data.
I've tested the below and it seems to work as intended. Hopefully it works for you.
# Create extra cube based on Jan and Feb from pr_cube.
extra_months_cube = pr_cube[:2, ...]
# Replace time coordinate with another that is advanced by a year - ensures correct sorting.
# Adjust addition depending on the unit of the time coordinate.
extra_months_coord = extra_months_cube.coord("time") + (24 * 365)
extra_months_cube.remove_coord("time")
extra_months_cube.add_dim_coord(extra_months_coord, 0)
# Combine original cube with extra cube.
both_cubes = iris.cube.CubeList([pr_cube, extra_months_cube])
fourteen_month_cube = both_cubes.concatenate_cube()
# Generate cube of 3-month MAX aggregations.
rolling_cube = fourteen_month_cube.rolling_window("time", iris.analysis.MAX, 3)
Once done, you would of course be free to add your suggested three month labels using iris.cube.Cube.add_aux_coord().
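The wrap-around trick itself (appending January and February after December, then sliding a 3-month window) can be illustrated without Iris. The following Python 3 sketch uses plain lists. The name `wrapped_quarters` and the rainfall values are made up for illustration, and it sums each window, which is an assumption about what "wettest quarter" means rather than the MAX aggregation used above.

```python
MONTH_INITIALS = "jfmamjjasond"

def wrapped_quarters(monthly):
    """Return {label: total} for the 12 wrap-around 3-month windows."""
    if len(monthly) != 12:
        raise ValueError("expected 12 monthly values")
    # Duplicate Jan and Feb at the end so the last windows can wrap around.
    extended = list(monthly) + list(monthly[:2])
    initials = MONTH_INITIALS + MONTH_INITIALS[:2]
    return {initials[i:i + 3]: sum(extended[i:i + 3]) for i in range(12)}

rain = [80, 60, 50, 40, 30, 20, 10, 15, 35, 55, 70, 90]  # made-up monthly values
totals = wrapped_quarters(rain)
wettest = max(totals, key=totals.get)
print(wettest, totals[wettest])
```

The dictionary keys match the twelve labels suggested in the question ('jfm' through 'djf'), so the same labels could be attached to the rolling-window result.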

Awk and calculating start time from end time and duration

I have a file with date, end time and duration in decimal format and I need to calculate the start time. The file looks like:
20140101;1212;1.5
20140102;1515;1.58
20140103;1759;.69
20140104;1100;12.5
...
The duration 1.5 for the time 12:12 means one and a half hours and the start time would be 12:12 - 1:30 = 10:42 AM or 11:00 - 12.5 = 11:00 - 12:30 = 22:30 PM. Is there an easy way for calculating such time differences in Awk or is it the good ol' split-multiply-subtract-and-handle-the-day-break-yourself all over again?
Since the values are in hours and minutes, only the minutes matter and anything smaller can be discarded. For example, duration 1.58 means 1:34, and the leftover 0.8 minutes (48 seconds) can be discarded.
I'm on GNU Awk 4.1.3
As you are using gawk, take advantage of its native time functions:
gawk -F\; '{tmst=sprintf("%s %s %s %s %s 00",\
substr($1,1,4),\
substr($1,5,2),\
substr($1,7,2),\
substr($2,1,2),\
substr($2,3,2))
t1=mktime(tmst)
seconds=sprintf("%f",$3)+0
seconds*=60*60
difference=strftime("%H%M",t1-seconds)
print $0""FS""difference}' file
Results:
20140101;1212;1.5;1042
20140102;1515;1.58;1340
20140103;1759;.69;1717
20140104;1100;12.5;2230
Check: https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html
Explanation:
tmst=sprintf(..): builds a date string from the input that conforms to the datespec expected by mktime, YYYY MM DD HH MM SS [DST].
t1=mktime(tmst): turns the datespec into a timestamp that gawk can handle (the number of seconds elapsed since 1 January 1970).
seconds=sprintf("%f",$3)+0: converts the third field to a float.
seconds*=60*60: converts hours (as a float) to seconds.
difference=strftime("%H%M",t1-seconds): formats the difference in human-readable form, as hours and minutes.
I highly recommend using a programming language that supports datetime calculations, because the details can be tricky once daylight saving shifts are involved. You can use Python, for example:
start_times.py:
import csv
from datetime import datetime, timedelta

with open('input.txt', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=';', quotechar='|')
    for row in reader:
        end_day = row[0]
        end_time = row[1]
        # Create a datetime object
        end = datetime.strptime(end_day + end_time, "%Y%m%d%H%M")
        # Translate duration into minutes
        duration = float(row[2]) * 60
        # Calculate start time
        start = end - timedelta(minutes=duration)
        # Column 3 is the start day (can differ from end day!)
        row.append(start.strftime("%Y%m%d"))
        # Column 4 is the start time
        row.append(start.strftime("%H%M"))
        print ';'.join(row)
Run:
python start_times.py
Output:
20140101;1212;1.5;20140101;1042
20140102;1515;1.58;20140102;1340
20140103;1759;.69;20140103;1717
20140104;1100;12.5;20140103;2230 <-- you see, the day matters!
The above example uses the system's timezone. If the input data refers to a different timezone, Python's datetime module allows you to specify it.
I would do something like this:
awk 'BEGIN{FS=OFS=";"}
{ h=substr($2,1,2); m=substr($2,3,2); mins=h*60 + m; diff=mins - $3*60;
print $0, int(diff/60) ":" int(diff%60)
}' file
That is, convert everything to minutes and then back to hours/minutes.
Test
$ awk 'BEGIN{FS=OFS=";"}{h=substr($2,1,2); m=substr($2,3,2); mins=h*60 + m; diff=mins - $3*60; print $0, int(diff/60) ":" int(diff%60)}' a
20140101;1212;1.5;10:42
20140102;1515;1.58;13:40
20140103;1759;.69;17:17
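The minutes-only arithmetic above leaves the day break to the reader (the 12.5-hour row is not shown in the test). As a minimal sketch, the same idea can be written in Python 3 with a modulo step that wraps into the previous day. The function name `start_time` and the returned day offset are hypothetical additions, and daylight saving shifts are deliberately ignored.

```python
import math

def start_time(end_hhmm, duration_hours):
    """Return (day_offset, 'HHMM') for an end time 'HHMM' and a duration in decimal hours.

    day_offset is 0 when the start falls on the same day and -1 for the previous day.
    """
    end_minutes = int(end_hhmm[:2]) * 60 + int(end_hhmm[2:])
    # Subtract the duration in minutes and drop any leftover fraction of a minute.
    start_minutes = math.floor(end_minutes - duration_hours * 60)
    # divmod with a positive divisor wraps negative values into the previous day.
    day_offset, minute_of_day = divmod(start_minutes, 24 * 60)
    return day_offset, "%02d%02d" % divmod(minute_of_day, 60)

for end, duration in [("1212", 1.5), ("1515", 1.58), ("1759", 0.69), ("1100", 12.5)]:
    print(end, duration, start_time(end, duration))
```

Zero-padding the result keeps times such as 09:05 from being printed as 9:5.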

compare times with milliseconds

I have two dates a start date and an end date. I want to get a new time object which is the difference between the two. The differences I am concerned with are Hours, Minutes, Seconds and Milliseconds. I need to be able to create a new Time object from the result that includes the milliseconds difference
>> require 'time'
=> true
>> start_time = Time.parse '1970-01-01T00:00:00.200'
=> 1970-01-01 00:00:00 +0000
>> end_time = Time.parse '1970-01-01T01:01:01.400'
=> 1970-01-01 01:01:01 +0000
>> difference = Time.at(end_time - start_time)
=> 1970-01-01 01:01:01 +0000
My problem is that difference does not have the milliseconds.
I can see that the Time has milliseconds by running
>> difference.strftime('%H:%M:%S.%L')
=> "01:01:01.199"
but how do I access the milliseconds that are in the Time difference object?
It is critical that I have milliseconds, as I am working with sub-second calculations.
UPDATE
I don't think my first attempt at this question was as descriptive as it should have been; my apologies for that.
require 'time'
a = Time.now
sleep(0.5)
b = Time.now
b - a
# => 0.505087
Milliseconds!
EDIT: Microseconds!
"my problem is that difference does not have the milliseconds"
It does have the milliseconds; Time#to_s / Time#inspect just doesn't show them. Its output is equivalent to: strftime "%Y-%m-%d %H:%M:%S %z"
"how do I access the milliseconds that are in the Time difference object."
usec returns the microseconds and nsec returns the nanoseconds:
time = Time.at(0.2)
time.usec #=> 200000
time.nsec #=> 200000000
For milliseconds you could use
time.usec / 1000 #=> 200
Ruby's Time class has nanosecond precision: you can use Time#to_f to get a fractional number of seconds since the Unix epoch. If you subtract two Time objects, you'll get a fractional number of seconds between them. Thus, to get the number of milliseconds between two times, try:
((time2 - time1) * 1000).to_i

Ruby time subtraction

There is the following task: I need to get minutes between one time and another one: for example, between "8:15" and "7:45". I have the following code:
(Time.parse("8:15") - Time.parse("7:45")).minute
But I get the result "108000.0 seconds".
How can I fix it?
The result you get back is a float of the number of seconds not a Time object. So to get the number of minutes and seconds between the two times:
require 'time'
t1 = Time.parse("8:15")
t2 = Time.parse("7:45")
total_seconds = (t1 - t2) # => 1800.0
minutes = (total_seconds / 60).floor # => 30
seconds = total_seconds.to_i % 60 # => 0
puts "difference is #{minutes} minute(s) and #{seconds} second(s)"
Using floor and modulus (%) allows you to split up the minutes and seconds so it's more human readable, rather than having '6.57 minutes'
You can avoid weird time parsing gotchas (Daylight Saving, running the code around midnight) by simply doing some math on the hours and minutes instead of parsing them into Time objects. Something along these lines (I'd verify the math with tests):
one = "8:15"
two = "7:45"
h1, m1 = one.split(":").map(&:to_i)
h2, m2 = two.split(":").map(&:to_i)
puts (h1 - h2) * 60 + m1 - m2
If you do want to take Daylight Saving into account (e.g. you sometimes want an extra hour added or subtracted depending on today's date) then you will need to involve Time, of course.
Time subtraction returns the value in seconds. So divide by 60 to get the answer in minutes:
=> (Time.parse("8:15") - Time.parse("7:45")) / 60
#> 30.0

Number of days between two Time instances

How can I determine the number of days between two Time instances in Ruby?
> earlyTime = Time.at(123)
> laterTime = Time.now
> time_difference = laterTime - earlyTime
I'd like to determine the number of days in time_difference (I'm not worried about fractions of days. Rounding up or down is fine).
The difference of two times is in seconds. Divide it by the number of seconds in 24 hours.
(t1 - t2).to_i / (24 * 60 * 60)
require 'date'
days_between = (Date.parse(laterTime.to_s) - Date.parse(earlyTime.to_s)).round
Edit ...or more simply...
require 'date'
(laterTime.to_date - earlyTime.to_date).round
earlyTime = Time.at(123)
laterTime = Time.now
time_difference = laterTime - earlyTime
time_difference_in_days = time_difference / 1.day # just divide by 1.day (requires ActiveSupport)
[1] pry(main)> earlyTime = Time.at(123)
=> 1970-01-01 01:02:03 +0100
[2] pry(main)> laterTime = Time.now
=> 2014-04-15 11:13:40 +0200
[3] pry(main)> (laterTime.to_date - earlyTime.to_date).to_i
=> 16175
To account for DST (Daylight Saving Time), you'd have to count it by the days. Note that this assumes less than a day is counted as 1 (rounded up):
num = 0
cur = start_time
while cur < end_time
num += 1
cur = cur.advance(:days => 1)
end
return num
Here is a simple answer that works across DST:
numDays = ((laterTime - earlyTime)/(24.0*60*60)).round
60*60 is the number of seconds in an hour
24.0 is the number of hours in a day. It's a float because some days are a little more than 24 hours, some are less. So when we divide by the number of seconds in a day we still have a float, and round will round to the closest integer.
So if we go across DST, either way, we'll still round to the closest day. Even if you're in some weird timezone that changes more than an hour for DST.
in_days (Rails 6.1+)
Rails 6.1 introduces new ActiveSupport::Duration conversion methods like in_seconds, in_minutes, in_hours, in_days, in_weeks, in_months, and in_years.
As a result, now, your problem can be solved as:
date_1 = Time.parse('2020-10-18 00:00:00 UTC')
date_2 = Time.parse('2020-08-13 03:35:38 UTC')
(date_2 - date_1).seconds.in_days.to_i.abs
# => 65
Here is a link to the corresponding PR.
None of these answers will actually work if you don't want to estimate and you want to take daylight saving time into account.
For instance, between 10 AM on the Wednesday before the autumn clock change and 10 AM on the Wednesday after it, the elapsed time is 1 week and 1 hour. In the spring it is 1 week minus 1 hour.
To get the accurate number of days, you can use the following code:
def self.days_between_two_dates later_time, early_time
  days_between = (later_time.to_date-early_time.to_date).to_f
  later_time_time_of_day_in_seconds = later_time.hour*3600+later_time.min*60+later_time.sec
  early_time_time_of_day_in_seconds = early_time.hour*3600+early_time.min*60+early_time.sec
  days_between + (later_time_time_of_day_in_seconds - early_time_time_of_day_in_seconds)/1.0.day
end
