I need to subtract two DateTime objects in order to find out the difference in hours between them.
I try to do the following:
a = DateTime.new(2015, 6, 20, 16)
b = DateTime.new(2015, 6, 21, 16)
puts a - b
I get (-1/1), the object of class Rational.
So, the question is, how do I find out what the difference between the two dates is? In hours or days, or whatever.
And what does this Rational mean/represent when I subtract DateTimes just like that?
BTW:
When I subtract two DateTimes that are 1 year apart, I get (366/1), so (366/1).to_i gives me the number of days. But when I tried subtracting two DateTimes that are 1 hour apart, it gave me -1, the number of hours. So, how do I find out the unit of the returned value (hours, days, years, seconds)?
When you subtract two DateTimes, you get the difference in days, not hours.
The result is a Rational so that fractions of a day stay exact (many fractions, such as 1/24, cannot be represented exactly as floating-point numbers).
To get a number of hours, multiply the result by 24; for minutes, multiply by 24*60, and so on.
a = DateTime.new(2015, 6, 20, 16)
b = DateTime.new(2015, 6, 21, 16)
(a - b).to_i
# days
# => -1
((a - b)* 24).to_i
# hours
# => -24
# ...
Here's a link to the official doc
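To make the unit conversion concrete, here is a minimal sketch with two DateTimes that are an hour and a half apart. The variable names are only for illustration; the point is that the Rational is always a number of days, so the same multipliers work for any pair of values.

```ruby
require 'date'

a = DateTime.new(2015, 6, 20, 16)
b = DateTime.new(2015, 6, 20, 17, 30)

diff_days = b - a                            # Rational, measured in days
hours     = (diff_days * 24).to_f            # days -> hours
minutes   = (diff_days * 24 * 60).to_i       # days -> minutes
seconds   = (diff_days * 24 * 60 * 60).to_i  # days -> seconds

puts diff_days   # 1/16
puts hours       # 1.5
puts minutes     # 90
puts seconds     # 5400
```

Calling to_i directly on the Rational truncates toward zero, so (b - a).to_i would be 0 here; multiply first, then convert.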
If you do subtraction on them as a Time object it will return the result in seconds and then you can multiply accordingly to get minutes/hours/days/whatever.
a = DateTime.new(2015, 6, 20, 16)
b = DateTime.new(2015, 6, 21, 16)
diff = b.to_time - a.to_time # => 86400.0 (Time - Time returns a Float number of seconds)
hours = diff / 60 / 60 # => 24.0
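If you want the difference broken down into hours, minutes and seconds rather than a single number, divmod does the splitting. This is only a sketch; the two example values are made up for illustration.

```ruby
require 'date'

a = DateTime.new(2015, 6, 20, 16)
b = DateTime.new(2015, 6, 21, 17, 30, 15)

total_secs = (b.to_time - a.to_time).round   # Float seconds -> Integer
hours, rest      = total_secs.divmod(3600)   # whole hours and leftover seconds
minutes, seconds = rest.divmod(60)           # whole minutes and leftover seconds

puts "#{hours}h #{minutes}m #{seconds}s"     # 25h 30m 15s
```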
I am trying to write a simple script, where the input would be a start date, end date and a total amount of hours (150) and the script would generate a simple report containing random date-time intervals (with ideally weekdays) that would sum the entered amount of hours.
This is what I am trying to achieve:
Start: 2020-01-01
End: 2020-01-31
Total hours: 150
Report:
Jan 1, 2019, 08:02:20 – Jan 1, 2019, 08:55:00: sub time -> 52:40 (52 minutes 40 seconds)
Jan 1, 2019, 09:00:00 – Jan 1, 2019, 09:38:13: sub time -> 38:13 (38 minutes 13 seconds)
...
Jan 3, 2019, 13:15:00 – Jan 3, 2019, 14:45:13: sub time -> 01:30:13 (1 hour 30 minutes 13 seconds)
...
TOTAL TIME: 150 hours (or in minutes)
How do I generate time intervals where the total amount of minutes/hours would be equal to a given number of hours?
I assume the question is loosely-worded in the sense that "random" is not meant in a probability sense; that is, the intent is not to select a set of intervals (that total a given number of hours in length) with a mechanism that ensures all possible sets of such intervals have an equal likelihood of being selected. Rather, I understand that a set of intervals is to be chosen (e.g., for testing purposes) in a way that incorporates elements of randomness.
I have assumed the intervals are to be non-overlapping and the number of intervals is to be specified. I don't understand what "with ideally weekdays" means so I have disregarded that.
The heart of the approach I will propose is the following method.
def rnd_lengths(tot_secs, target_nbr)
  max_secs = 2 * tot_secs/target_nbr - 1
  arr = []
  loop do
    break(arr) if tot_secs.zero?
    l = [(0.5 + max_secs * rand).round, tot_secs].min
    arr << l
    tot_secs -= l
  end
end
The method generates an array of integers (lengths of intervals), measured in seconds, ideally having target_nbr elements. tot_secs is the required combined length of the "random" intervals (e.g., 150*3600).
Each element of the array is drawn randomly from a uniform distribution that ranges from zero to max_secs (to be computed). This is done sequentially until tot_secs is reached. Should the last random value cause the total to exceed tot_secs, it is reduced to make the total equal tot_secs.
Suppose tot_secs equals 100 and we wish to generate 4 random intervals (target_nbr = 4). That means the average length of the intervals would be 25. As we are using a uniform distribution having an average of (1 + max_secs)/2, we may derive the value of max_secs from the expression
target_nbr * (1 + max_secs)/2 = tot_secs
which is
max_secs = 2 * tot_secs/target_nbr - 1
the first line of the method. For the example I mentioned, this would be
max_secs = 2 * 100/4 - 1
#=> 49
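As a quick sanity check of that derivation, the following sketch draws many values with the same expression used in rnd_lengths and confirms that their average is close to 25. The seeded Random instance and the sample size are my own choices so that the check is repeatable; they are not part of the method above.

```ruby
tot_secs   = 100
target_nbr = 4
max_secs   = 2 * tot_secs / target_nbr - 1   # => 49

rng   = Random.new(42)                       # seeded only for repeatability
draws = Array.new(100_000) { (0.5 + max_secs * rng.rand).round }
mean  = draws.sum.to_f / draws.size

puts max_secs
puts mean.round(1)   # should be close to tot_secs / target_nbr, i.e. 25
```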
Let's try it.
rnd_lengths(100, 4)
#=> [49, 36, 15]
As you see the array that is returned sums to 100, as required, but it contains only 3 elements. That's why I named the argument target_nbr, as there is no assurance the array returned will have that number of elements. What to do? Try again!
rnd_lengths(100, 4)
#=> [14, 17, 26, 37, 6]
Still not 4 elements, so keep trying:
rnd_lengths(100, 4)
#=> [11, 37, 39, 13]
Success! It may take a few tries to get the correct number of elements, but for parameters likely to be used, and the nature of the probability distribution employed, I wouldn't expect that to be a problem.
Let's put this in a method.
def rdm_intervals(tot_secs, nbr_intervals)
  loop do
    arr = rnd_lengths(tot_secs, nbr_intervals)
    break(arr) if arr.size == nbr_intervals
  end
end
intervals = rdm_intervals(100, 4)
#=> [29, 26, 7, 38]
We can compute random gaps between intervals in the same way. Suppose the intervals fall within a range of 175 seconds (the number of seconds between the start time and end time). Then:
gaps = rdm_intervals(175-100, 5)
#=> [26, 5, 19, 4, 21]
As seen, the gaps sum to 75, as required. We can disregard the last element.
We can now form the intervals. The first interval begins at 26 seconds and ends at 26+29 #=> 55 seconds. The second interval begins at 55+5 #=> 60 seconds and ends at 60+26 #=> 86 seconds, and so on. We therefore find the intervals (each in ranges of seconds from zero) to be:
[26..55, 60..86, 105..112, 116..154]
Note that 175 - 154 = 21, the last element of gaps.
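The bookkeeping just described can be sketched as follows, using the two arrays from the example. The variable names pos and ranges are only for illustration.

```ruby
intervals = [29, 26, 7, 38]
gaps      = [26, 5, 19, 4, 21]

pos = 0   # end of the previous interval, in seconds from the start
ranges = intervals.each_with_index.map do |len, i|
  start = pos + gaps[i]   # skip the gap that precedes this interval
  pos   = start + len     # the interval ends len seconds later
  start..pos
end

p ranges   # [26..55, 60..86, 105..112, 116..154]
```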
If one is uncomfortable with the fact that the last elements of intervals and gaps are generally constrained in size (each is whatever remains after the earlier draws), one could of course randomly reposition those elements within their respective arrays.
One might not care if the number of intervals is exactly target_nbr. It would be simpler and faster to just use the first array of interval lengths produced. That's fine, but we still need the above methods to compute the random gaps, as their number must equal the number of intervals plus one:
gaps = rdm_intervals(175-100, intervals.size + 1)
We can now use these two methods to construct a method that will return the desired result. The argument tot_secs of this method equals the total number of seconds spanned by the array of intervals returned (e.g., 3600 * 150). The method returns an array containing nbr_intervals non-overlapping ranges of Time objects that fall between the given start and end dates.
require 'date'
def construct_intervals(start_date_str, end_date_str, tot_secs, nbr_intervals)
  start_time = Date.strptime(start_date_str, '%Y-%m-%d').to_time
  secs_in_period = Date.strptime(end_date_str, '%Y-%m-%d').to_time - start_time
  intervals = rdm_intervals(tot_secs, nbr_intervals)
  gaps = rdm_intervals(secs_in_period - tot_secs, nbr_intervals+1)
  nbr_intervals.times.with_object([]) do |_,arr|
    start_time += gaps.shift
    end_time = start_time + intervals.shift
    arr << (start_time..end_time)
    start_time = end_time
  end
end
See Date::strptime.
Let's try an example.
start_date_str = '2020-01-01'
end_date_str = '2020-01-31'
tot_secs = 3600*150
#=> 540000
construct_intervals(start_date_str, end_date_str, tot_secs, 4)
#=> [2020-01-06 18:05:04 -0800..2020-01-09 03:48:00 -0800,
# 2020-01-09 06:44:16 -0800..2020-01-11 23:33:44 -0800,
# 2020-01-20 20:30:21 -0800..2020-01-21 17:27:44 -0800,
# 2020-01-27 19:08:38 -0800..2020-01-28 01:38:51 -0800]
construct_intervals(start_date_str, end_date_str, tot_secs, 8)
#=> [2020-01-03 18:43:36 -0800..2020-01-04 10:49:14 -0800,
# 2020-01-08 07:55:44 -0800..2020-01-08 08:17:18 -0800,
# 2020-01-11 00:54:36 -0800..2020-01-11 23:00:53 -0800,
# 2020-01-14 05:20:14 -0800..2020-01-14 22:48:45 -0800,
# 2020-01-16 18:28:28 -0800..2020-01-17 22:50:24 -0800,
# 2020-01-22 02:59:31 -0800..2020-01-22 22:33:08 -0800,
# 2020-01-23 00:36:59 -0800..2020-01-24 12:15:37 -0800,
# 2020-01-29 11:22:21 -0800..2020-01-29 21:46:10 -0800]
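To turn ranges like these into report lines similar to the ones in the question, something along the following lines should do. This is a sketch only: format_duration and report_lines are hypothetical helper names, and the strftime pattern is my reading of the layout shown in the question.

```ruby
# Format a number of seconds as HH:MM:SS.
def format_duration(secs)
  hours, rest      = secs.round.divmod(3600)
  minutes, seconds = rest.divmod(60)
  format('%02d:%02d:%02d', hours, minutes, seconds)
end

# Build one report line per range of Time objects.
def report_lines(ranges)
  pattern = '%b %-d, %Y, %H:%M:%S'
  ranges.map do |r|
    "#{r.begin.strftime(pattern)} - #{r.end.strftime(pattern)}: " \
      "sub time -> #{format_duration(r.end - r.begin)}"
  end
end

example = [Time.utc(2020, 1, 3, 13, 15, 0)..Time.utc(2020, 1, 3, 14, 45, 13)]
puts report_lines(example)
# Jan 3, 2020, 13:15:00 - Jan 3, 2020, 14:45:13: sub time -> 01:30:13
```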
START -xxx----xxx--x----xxxxx---xx--xx---xx-xx-x-xxx-- END
We need to fill a timespan with alternating periods of ON and OFF. This can be
denoted by a list of timestamps. Let's say that the period always starts with
an OFF period for simplicity's sake.
From the start/end of the timespan and the total seconds in ON state, we
gather useful facts:
the timespan's total size in seconds total_seconds
the second totals of both the ON (on_total_seconds) and the OFF (off_total_seconds) periods
Once we know these, a workable algorithm looks more or less like this - pardon
the functions without implementation:
# this can be a parameter as well
MIN_PERIODS = 10
MAX_PERIODS = 100
def fill_periods(start_date, end_date, on_total_seconds = 150*60*60)
  total_seconds = get_total_seconds(start_date, end_date)
  off_total_seconds = total_seconds - on_total_seconds
  # establish two buckets to pull from alternately in populating our array of durations
  on_bucket = on_total_seconds
  off_bucket = off_total_seconds
  result = []
  # populate `result` with durations in seconds. `result` will sum to `total_seconds`
  while on_bucket > 0 || off_bucket > 0 do
    # Kernel#rand takes a single argument, so the bounds are passed as a range.
    # each slice is capped at what is left in its bucket so the totals stay exact.
    off_slice = [rand((off_total_seconds / MAX_PERIODS / 2)..(off_total_seconds / MIN_PERIODS / 2)).to_i, off_bucket].min
    off_bucket -= off_slice
    on_slice = [rand((on_total_seconds / MAX_PERIODS / 2)..(on_total_seconds / MIN_PERIODS / 2)).to_i, on_bucket].min
    on_bucket -= on_slice
    # randomness being random, we're going to hit 0 in one bucket before the
    # other. when this happens, just add this (off, on) pair to the last one.
    if off_slice == 0 || on_slice == 0
      last_off, last_on = result.pop(2)
      result << last_off + off_slice << last_on + on_slice
    else
      result << off_slice << on_slice
    end
  end
  # build up an array of datetimes by progressively adding seconds to the last timestamp.
  datetimes = result.each_with_object([start_date]) do |period, memo|
    memo << add_seconds(memo.last, period)
  end
  # we want a list of datetime pairs denoting ON periods. since we know our
  # timespan starts with OFF, we start our list of pairs with the second element.
  datetimes.slice(1..-1).each_slice(2).to_a
end
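The two helpers left without implementation could look like this if start_date and end_date are Time objects. That is an assumption on my part; the method above does not say which class it expects, and with Date or DateTime the arithmetic would be in days rather than seconds.

```ruby
# Assumes both arguments are Time objects; Time - Time is a Float of seconds.
def get_total_seconds(start_time, end_time)
  (end_time - start_time).to_i
end

# Assumes time is a Time object; Time + Numeric adds seconds.
def add_seconds(time, seconds)
  time + seconds
end

start_time = Time.utc(2020, 1, 1)
end_time   = Time.utc(2020, 1, 31)
puts get_total_seconds(start_time, end_time)   # 2592000
puts add_seconds(start_time, 90)               # 2020-01-01 00:01:30 UTC
```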
Assuming each month always has 30 days, I'd like to calculate the days between two given dates.
FROM 05/04/2020
TO 20/12/2020
result: 256 days (NOT 259 days if we considered months with 31 days)
With a simple subtraction between the dates I get the wrong result:
(Date.new(2019,12,20) - Date.new(2019,4,5)).floor
=> 259
To overcome this I had to create a pretty complex algorithm:
days += inclusive_days_in_range(
position_data[:workFrom],
position_data[:workFrom].at_end_of_month
)
months = inclusive_months_in_range(
position_data[:workFrom].at_beginning_of_month.next_month,
position_data[:workTo].at_end_of_month.prev_month
)
days += months * MAX_DAYS_IN_MONTHS
days += inclusive_days_in_range(
position_data[:workTo].at_beginning_of_month,
position_data[:workTo]
)
Is there a simple way?
Similar to @CarySwoveland's answer but uses dot product:
require 'matrix'
def ndays str
Vector[*str.split('/').map(&:to_i)].dot [1,30,360]
end
> ndays('20/12/2020') - ndays('05/04/2020') + 1
=> 256
Add +1 since it seems like you want the number of days, inclusive.
Another approach would be to count the number of months, multiply by 30, then subtract the days into the month of the FROM date, and add in the days of the TO date.
Counting months has already been answered on Stack Overflow here: Find number of months between two Dates in Ruby on Rails
so I'll use that as a reference to get the months. Then it's just a matter of addition and subtraction:
from_date = Date.new(2019,4,5)
to_date = Date.new(2019,12,20)
num_months = (12*(to_date.year-from_date.year))+(to_date.month-from_date.month)
# We add 1 to make it inclusive, otherwise you get 255
num_days = (num_months*30) + to_date.day - from_date.day + 1
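Wrapped in a method and run against the dates from the question, the same arithmetic looks like this. The method name days_30_per_month and the inclusive keyword are hypothetical, added only for illustration; like the snippet above, it does nothing special for a day-of-month of 31.

```ruby
require 'date'

# Day count assuming every month has 30 days.
def days_30_per_month(from_date, to_date, inclusive: true)
  num_months = 12 * (to_date.year - from_date.year) + (to_date.month - from_date.month)
  num_days   = num_months * 30 + to_date.day - from_date.day
  inclusive ? num_days + 1 : num_days
end

from_date = Date.new(2020, 4, 5)
to_date   = Date.new(2020, 12, 20)

puts days_30_per_month(from_date, to_date)                    # 256
puts days_30_per_month(from_date, to_date, inclusive: false)  # 255
```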
def days_from_zero(date_str)
d, m, y = date_str.split('/').map(&:to_i)
d + 30*(m + 12*y)
end
days_from_zero("05/04/2020") - days_from_zero("4/04/2020") #=> 1
days_from_zero("20/12/2020") - days_from_zero("05/04/2020") #=> 255
days_from_zero("05/04/2020") - days_from_zero("20/12/2020") #=> -255
days_from_zero("05/04/2020") - days_from_zero("3/6/20") #=> 719942
There is an SQL function with date as argument
f(p_date) = mod(to_char(p_date,'mm')+1,2)*39 + to_char(p_date,'dd')
The values of f(p_date) repeat themselves with a period of 2 months, i.e.
f(Feb 7th) = 46
f(Feb 8th) = 47
...
f(Apr 7th) = 46
...
f(Jun 7th) = 46
...
I don't catch a pattern here. Why is the multiplier equal to 39? Where do the 2 months come from?
What I need, is eventually same sort of function, but with a period of 40 days (or 1.5 months):
f(Feb 7th) = 46
..
f(Mar 19th) = 46
..
f(Apr 28th) = 46, etc
Thanks for any help.
Why is the multiplier equal to 39?
The modulo expression evaluates to 0 for odd months and 1 for even months. Multiplied by 39, this is either 0 or 39. With the day added, the function returns the day for odd months and 39+day for even months.
Thus,
odd (january)
1, 2, 3, ..., last-of-month
even (february)
40, 41, 42, ... 39+last-of-month
Where do the 2 months come from?
The 2 is the second argument of the modulus function (its divisor). The modulus function returns the remainder of the division, so for the inputs 1, 2, 3, 4, 5, ... it returns the sequence 1, 0, 1, 0, 1, ... and so on. It is used to create the odd/even periodicity.
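The same arithmetic written out in Ruby may make the pattern easier to see. This is just a transliteration of the SQL expression for illustration, taking the month and day as plain integers.

```ruby
# Mirrors mod(month + 1, 2) * 39 + day from the SQL function.
def f(month, day)
  ((month + 1) % 2) * 39 + day
end

puts f(1, 7)   # 7   (odd month: just the day)
puts f(2, 7)   # 46  (even month: 39 + day)
puts f(2, 8)   # 47
puts f(4, 7)   # 46
puts f(6, 7)   # 46
```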
@AlexeyKryuchkov, can you give more background about what you're trying to achieve and why?
1.5 months does not map to 40 days (or to any fixed number of days).
If you're trying to define a "40-day month", the easiest solution is to convert a date into an absolute day, then mod by 40.
I wrote a Q&A recently about the complexity of working with calendars: https://stackoverflow.com/a/48611348/9129668.
And adapting some of the code in that answer (which is based on SQL Server, not Oracle), the function you may be looking for would be something like:
((((DATEDIFF(DD, CONVERT(DATETIME2(0),'0001-01-01',102), p_date) + 1) - 1) % 40) + 1) AS day_of_40_day_mth
But if you give me a bit more explanation, I might be able to be more specific.
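For illustration, the "absolute day, then mod 40" idea can be sketched in Ruby as below. The method name is hypothetical, and I use the Julian Day Number as the absolute day count, which is an assumption; any fixed origin works, it only shifts where the cycle starts. The values will therefore not be 46 for the dates in the question, but dates 40 days apart do get the same value.

```ruby
require 'date'

# Position (1..40) of a date within a repeating 40-day cycle.
def day_of_40_day_cycle(date)
  (date.jd % 40) + 1   # jd is the Julian Day Number, an absolute day count
end

dates = [Date.new(2018, 2, 7), Date.new(2018, 3, 19), Date.new(2018, 4, 28)]
values = dates.map { |d| day_of_40_day_cycle(d) }
p values   # three identical numbers, since the dates are 40 days apart
```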
I have a community matrix (samples x species of animals). I sampled the animals weekly over many years (in this example, three years). I want to figure out how sampling timing (start week and duration a.k.a. number of weeks) affects species richness. Here is an example data set:
Data <- data.frame(
Year = rep(c('1996', '1997', '1998'), each = 5),
Week = rep(c('1', '2', '3', '4', '5'), 3),
Species1 =sample(0:5, 15, replace=T),
Species2 =sample(0:5, 15, replace=T),
Species3 =sample(0:5, 15, replace=T)
)
The outcome that I want is something along the lines of:
Year StartWeek Duration(weeks) SpeciesRichness
1996 1 1 2
1996 1 2 3
1996 1 3 1
...
1998 5 1 1
I had tried doing this via a combination of rollapply and vegan's specnumber, but got a sample x species matrix instead of a vector of Species Richness. Weird.
For example, I thought that this should give me species richness for sampling windows of two weeks:
test<-rollapply(Data[3:5],width=2,specnumber,align="right")
Thank you for your help!
I figured it out by breaking up the task into two parts:
1. Summing up species abundances using rollapplyr, as implemented in a dplyr mutate_each thingamabob
2. Calculating species richness using vegan.
I did this for each sampling duration window separately.
Here is the bare bones version (I just did this successively for each sampling duration that I wanted by changing the width argument):
weeksum2 <- function(x) {rollapply(x, width = 2, align = 'left', sum, fill=NA)}
sum2weeks<-Data%>%
arrange(Year, Week)%>%
group_by(Year)%>%
mutate_each(funs(weeksum2), -Year, -Week)
weeklyspecnumber2<-specnumber(sum2weeks[,3:ncol(sum2weeks)],
groups = interaction(sum2weeks$Week, sum2weeks$Year))
weeklyspecnumber2<-unlist(weeklyspecnumber2)
weeklyspecnumber2<-as.data.frame(weeklyspecnumber2)
weeklyspecnumber2$WeekYear<-as.factor(rownames(weeklyspecnumber2))
weeklyspecnumber2<-tidyr::separate(weeklyspecnumber2, WeekYear, into = c('Week', 'Year'), sep = '[.]')
The problem I'm trying to solve: calculate the current average velocity of some data series where the data points are unevenly spread. For example, calculating the current speed of an upload, where the 'amount uploaded' signals arrive unevenly:
t = 0, sent = 0
t = 5, sent = 10
t = 6, sent = 12
t = 9, sent = 20
(last - first) / (time delta between first and last)
And that would be exactly the average velocity.
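As a sketch with the sample data from the question (each sample is a [t, sent] pair; the method name is only illustrative):

```ruby
# Average velocity over the whole series: only the first and last samples matter.
def average_velocity(samples)
  t_first, sent_first = samples.first
  t_last, sent_last   = samples.last
  (sent_last - sent_first).to_f / (t_last - t_first)
end

samples = [[0, 0], [5, 10], [6, 12], [9, 20]]
puts average_velocity(samples).round(2)   # 2.22
```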
Unless you forgot to tell us some details, you do not need the data points in the middle.
You can calculate the average per time unit by taking the delta between the new values and the previous values.
And if you want the average over multiple points, you can calculate the averages between several points, and then take the average of those averages.
For example:
Current average:
t34 = 9 - 6 = 3
sent34 = 20 - 12 = 8
average34 = 8 / 3 = 2.67
Average of last two time slots:
t23 = 6 - 5 = 1
sent23 = 12 - 10 = 2
average23 = 2 / 1 = 2
average234 = (2 + 2.67) / 2 = 2.33
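The same calculation as a small sketch, with hypothetical method names. Note that this is a plain average of the per-slot speeds, exactly as in the example above, so a short slot counts as much as a long one.

```ruby
# Speed for each pair of consecutive [t, sent] samples.
def segment_speeds(samples)
  samples.each_cons(2).map do |(t0, sent0), (t1, sent1)|
    (sent1 - sent0).to_f / (t1 - t0)
  end
end

# Plain average of the last n segment speeds.
def mean_of_last(speeds, n)
  recent = speeds.last(n)
  recent.sum / recent.size
end

samples = [[0, 0], [5, 10], [6, 12], [9, 20]]
speeds  = segment_speeds(samples)

puts speeds.last.round(2)               # 2.67
puts mean_of_last(speeds, 2).round(2)   # 2.33
```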
Just rescale latest results
For your example:
t = 0, sent = 0
t = 5, sent = 10
t = 6, sent = 12
t = 9, sent = 20
CurrentSpeed = (20 -12) / (9 - 6) = 8/3 = 2.666666
You may use a different (for example, longer) rescale interval to slow down how quickly the reported velocity changes, e.g. when the connection is "lost" and then "restored".
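One way to read "rescale interval" is as a time window ending at the latest sample; that interpretation and the method name below are my assumptions. A minimal sketch:

```ruby
# Speed between the latest sample and the oldest sample that still lies
# within `window` time units of it. Returns nil if there is no such sample.
def windowed_speed(samples, window)
  t_last, sent_last = samples.last
  t_ref, sent_ref = samples.find { |t, _sent| t >= t_last - window && t < t_last }
  return nil if t_ref.nil?
  (sent_last - sent_ref).to_f / (t_last - t_ref)
end

samples = [[0, 0], [5, 10], [6, 12], [9, 20]]
puts windowed_speed(samples, 3).round(2)   # 2.67  (uses t = 6)
puts windowed_speed(samples, 4).round(2)   # 2.5   (uses t = 5)
```

A wider window reacts more slowly to sudden changes; a narrow one follows the latest samples closely.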
The standard way of calculating a velocity from noisy data is to apply a Kalman filter.