mktime shifts a time by one hour - windows

I ran into an interesting problem with the mktime function. I use the Russian time zone (UTC+03:00) Volgograd, Moscow, Saint Petersburg (RTZ 2) and try to construct a time_t for "7.01.2009 00:00:00":
tm localTM;
localTM.tm_sec = 0;
localTM.tm_min = 0;
localTM.tm_hour = 0;
localTM.tm_mday = 7;
localTM.tm_mon = 0;
localTM.tm_year = 109;
time_t t = mktime(&localTM);
After mktime executes, the date and time are changed to "6.01.2009 23:00:00".
I have no problems when I construct the time for "06.01.2009 00:00:00" or "08.01.2009 00:00:00".
If I switch to another time zone, I get no problems with "7.01.2009 00:00:00".
What could be the reason for this oddity, and how can I work around the issue?

When performing conversion to time_t, mktime needs to guess if the input is DST (Daylight Saving Time) or not.
It uses the tm_isdst field for that. From man mktime:
tm_isdst A flag that indicates whether daylight saving time is in
effect at the time described. The value is positive if daylight
saving time is in effect, zero if it is not, and negative if the
information is not available.
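Since your code never initializes tm_isdst, its value is indeterminate: localTM is an uninitialized local variable, so the field holds whatever happened to be in memory. If that value is positive, mktime assumes the time is in DST and, for a date outside the DST period, may shift the result by an hour.
To fix it, set the field before calling mktime:
localTM.tm_isdst = -1;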
Note - that logic is necessary as for some moments in time just the "wallclock" information stored in tm is not sufficient to determine the exact time.
And yes, the fact that the default behavior is like that is a bit messed up :)
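As a rough illustration of the same idea outside C++, here is a minimal sketch in Python, whose time.mktime wraps the platform's C mktime. The helper name local_midnight_epoch is made up for this example. It passes -1 in the isdst slot so the C library decides whether DST applies, then converts back with localtime to check that the wall-clock fields survive the round trip.

```python
import time

def local_midnight_epoch(year, month, day, isdst=-1):
    """Return the epoch seconds for local midnight on the given date.

    Hypothetical helper for illustration only.
    """
    # Field order: year, mon, mday, hour, min, sec, wday, yday, isdst.
    # mktime ignores wday and yday; isdst=-1 asks the C library to work
    # out from the time zone rules whether DST is in effect.
    fields = (year, month, day, 0, 0, 0, 0, 0, isdst)
    return time.mktime(fields)

t = local_midnight_epoch(2009, 1, 7)
back = time.localtime(t)
print(back.tm_mday, back.tm_hour)
```

With -1 the day and hour that come back should match what went in, as long as that local midnight exists in the configured time zone.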

Related

How to convert a hex TimeDateStamp DWORD value into human readable format?

Can anyone explain how to convert a Hex TimeDateStamp DWORD value into human readable format?
I'm just curious as to how a value such as 0x62444DB4 is converted into
"Wednesday, 30 March 2022 10:31:48 PM"
I tried googling of course and could not find any explanation. But there are online converters available.
But I'm just interested in converting these values for myself.
Your value is a 32-bit Unix timestamp: the number of seconds since 1970-01-01 00:00:00 UTC.
See https://unixtime.org/
In most programming languages you can work with the hexadecimal notation directly.
Do not implement the full conversion yourself, since a lot of engineering goes into it. Leap years, leap seconds, time zones, daylight saving time, UTC... all of these need to be addressed when working with a timestamp.
I have added my rough calculation below as a demonstration, but definitely use an existing package or library to work with timestamps.
See the JavaScript code below.
There I multiply your value by 1000 because JavaScript works in milliseconds. Otherwise the same applies to other systems.
let timestamp = 0x62444DB4;
let dateTime = new Date(timestamp * 1000);
console.log('Timestamp in seconds:', timestamp);
console.log('Human-Readable:', dateTime.toDateString() + ' ' + dateTime.toTimeString());
// Rough output, just for the time.
// Year month and day get really messy with timezones, leap years, etc.
let hours = Math.floor(timestamp/3600) % 24;
let minutes = Math.floor(timestamp/60) % 60;
let seconds = Math.floor(timestamp) % 60;
console.log('Using our own time calculation:', hours + ':' + minutes + ':' + seconds);
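For comparison, here is a minimal sketch of the same conversion in Python (the function name decode_timestamp is just for this example). It decodes the value in UTC with the standard library and also shows the plain integer arithmetic for the time-of-day part. Note that the result is in UTC; the "10:31:48 PM" in the question corresponds to a UTC+10 local time.

```python
from datetime import datetime, timezone

def decode_timestamp(raw):
    """Interpret raw as seconds since 1970-01-01 00:00:00 UTC."""
    return datetime.fromtimestamp(raw, tz=timezone.utc)

raw = 0x62444DB4                    # hex literal, equal to 1648643508
moment = decode_timestamp(raw)
print(moment.isoformat())           # 2022-03-30T12:31:48+00:00

# Manual arithmetic for the time of day (UTC), ignoring leap seconds.
days, rem = divmod(raw, 86400)      # whole days since the epoch
hours, rem = divmod(rem, 3600)
minutes, seconds = divmod(rem, 60)
print(days, hours, minutes, seconds)
```

The date part (year, month, day) is where hand-written arithmetic gets messy, which is why the library call is the safer route.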

Why there is no inverse function for gmtime in libc?

libc has two functions that convert from system time to calendar time, gmtime and localtime, but only localtime has an inverse function, mktime. Why is there no inverse function for gmtime, and if there shouldn't be one, why does gmtime exist?
I've found that this piece of code works satisfactorily:
namespace std {
time_t timegm(tm* _Tm)
{
auto t = mktime(_Tm);
return t + (mktime(localtime(&t)) - mktime(gmtime(&t)));
}
}
which satisfies the test:
auto t1 = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
auto t2 = std::timegm(std::gmtime(&t1));
EXPECT_EQ(t1, t2);
To explain the existence of gmtime(), some context is required:
gmtime() will convert a timestamp representation (number of seconds since 1970-01-01 00:00:00) to broken-down time representation (aka, struct tm), assuming that the timestamp timezone is UTC:
The gmtime() function converts the calendar time timep to
broken-down time representation, expressed in Coordinated Universal
Time (UTC). It may return NULL when the year does not fit into an
integer. The return value points to a statically allocated struct
which might be overwritten by subsequent calls to any of the date
and time functions.
On the other hand, localtime() takes the [local] system timezone into account (including daylight saving):
The localtime() function converts the calendar time timep to
broken-down time representation, expressed relative to the user's
specified timezone. The function acts as if it called tzset(3) and
sets the external variables tzname with information about the
current timezone, timezone with the difference between Coordinated
Universal Time (UTC) and local standard time in seconds, and
daylight to a nonzero value if daylight savings time rules apply
during some part of the year.
Note that the number of seconds since a local 1970-01-01 00:00:00 differs from timezone to timezone (when it was 1970-01-01 00:00:00 in New York, it clearly wasn't in, for instance, Tokyo).
The mktime() converts a struct tm to a time_t value (number of seconds since 1970-01-01 00:00:00) based on the [local] system timezone, and should not be interpreted as the inverse of any particular function (such as localtime() or gmtime()), as the inverse term may be [wrongly] interpreted as a safe cross-system conversion:
The mktime() function converts a broken-down time structure,
expressed as local time, to calendar time representation. The
function ignores the values supplied by the caller in the tm_wday
and tm_yday fields. The value specified in the tm_isdst field informs
mktime() whether or not daylight saving time (DST) is in effect
for the time supplied in the tm structure: a positive value means DST
is in effect;
There is also a non-portable function (for GNU and BSD systems) called timegm(), which assumes a UTC timezone, such as gmtime() does.
References
Blockquoted text is retrieved from parts of release 3.74 of the Linux man-pages project.
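As an aside, some standard libraries do ship a UTC inverse. The sketch below uses Python's calendar.timegm, which its documentation describes as the inverse of time.gmtime; the wrapper name utc_fields_to_epoch is only for this example. It shows the round trip the C++ test above is checking, without depending on the local time zone.

```python
import calendar
import time

def utc_fields_to_epoch(fields):
    """Convert broken-down UTC time (as from time.gmtime) to epoch seconds."""
    # calendar.timegm reads only the first six fields and never consults
    # the local time zone or the DST flag.
    return calendar.timegm(fields)

t1 = 1648643508                              # arbitrary sample timestamp
t2 = utc_fields_to_epoch(time.gmtime(t1))
print(t1 == t2)
```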

What is the value of the ISO 8601 duration `P1M` (in seconds)?

Suppose I have an ISO 8601 duration, expressed as "P1M". Phrased colloquially, this means "one month." Is there a standard rule for converting this into a number of seconds, assuming the start date is not known?
For 30-day months, it might be 2,592,000.
For 31-day months, it might be 2,678,400.
In February, it might be 2,419,200 or it might be 2,505,600.
My gut says there's no way to resolve "one month" to an exact number of seconds without knowing context, and where those seconds are laid out on the calendar. But are there standard rules/conventions to calculate these durations in an abstract way?
From ISO 8601 documentation that I found (page 6 - http://xml.coverpages.org/ISO-FDIS-8601.pdf), it seems you are correct in that the number of seconds in a month cannot definitively be determined. However it does note that "In certain applications a month is regarded as a unit of time of 30 days", so depending on your application this may be a valid approach.
The distinction between "Calendar Time" (Years, Months, etc) and "Absolute Time" (Hours, Minutes, Seconds, etc) is sometimes an important one. As an example, some people might complain about having 13 mortgage payments some years if they paid every 30 days as opposed to every month.
You are right, an ISO 8601 duration depends on the context.
A duration is a period/an interval of time between two dates.
Example :
2020-01-01/2020-02-01 = P1M = P31D
2020-02-01/2020-03-01 = P1M = P29D
2019-02-01/2019-03-01 = P1M = P28D
If you want a fixed duration independent of the context, use the day notation P30D, P60D, P90D... instead.
The same applies for years :
2019-01-01/2020-01-01 = P1Y = P12M = P365D
2020-01-01/2021-01-01 = P1Y = P12M = P366D
If you have no context information about a duration, for example a P1M retrieved from a database or given by user input, default to today's date as the context.
//What is a duration of one month in seconds ?
P1M = ? (no context)
//Use default context
Today = 2020-03-31
2020-03-31/P1M = 2020-03-31/2020-04-30
=> P1M = P30D
//A month contains 2 592 000 seconds
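The examples above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names add_one_month and p1m_seconds are invented here, the day of the month is clamped to the last day of the target month (one common convention, not something ISO 8601 mandates), and every day is counted as 86,400 seconds, so DST changes and leap seconds are ignored.

```python
import calendar
from datetime import date

def add_one_month(start):
    """Return the date one calendar month after start (day clamped)."""
    year = start.year + start.month // 12      # roll over after December
    month = start.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))

def p1m_seconds(start):
    """Length of P1M in seconds when it begins on the given date."""
    return (add_one_month(start) - start).days * 86400

print(p1m_seconds(date(2020, 1, 1)))    # 2678400 (31 days)
print(p1m_seconds(date(2020, 2, 1)))    # 2505600 (29 days)
print(p1m_seconds(date(2019, 2, 1)))    # 2419200 (28 days)
print(p1m_seconds(date(2020, 3, 31)))   # 2592000 (30 days)
```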

Build fixed interval dataset from random interval dataset using stale data

Update: I've provided a brief analysis of the three answers at the bottom of the question text and explained my choices.
My Question: What is the most efficient method of building a fixed interval dataset from a random interval dataset using stale data?
Some background: The above is a common problem in statistics. Frequently, one has a sequence of observations occurring at random times. Call it Input. But one wants a sequence of observations occurring say, every 5 minutes. Call it Output. One of the most common methods to build this dataset is using stale data, i.e. set each observation in Output equal to the most recently occurring observation in Input.
So, here is some code to build example datasets:
TInput = 100;
TOutput = 50;
InputTimeStamp = 730486 + cumsum(0.001 * rand(TInput, 1));
Input = [InputTimeStamp, randn(TInput, 1)];
OutputTimeStamp = 730486.002 + (0:0.001:TOutput * 0.001 - 0.001)';
Output = [OutputTimeStamp, NaN(TOutput, 1)];
Both datasets start at close to midnight at the turn of the millennium. However, the timestamps in Input occur at random intervals while the timestamps in Output occur at fixed intervals. For simplicity, I have ensured that the first observation in Input always occurs before the first observation in Output. Feel free to make this assumption in any answers.
Currently, I solve the problem like this:
sMax = size(Output, 1);
tMax = size(Input, 1);
s = 1;
t = 2;
%#Loop over input data
while t <= tMax
if Input(t, 1) > Output(s, 1)
%#If current obs in Input occurs after current obs in output then set current obs in output equal to previous obs in input
Output(s, 2:end) = Input(t-1, 2:end);
s = s + 1;
%#Check if we've filled out all observations in output
if s > sMax
break
end
%#This step is necessary in case we need to use the same input observation twice in a row
t = t - 1;
end
t = t + 1;
if t > tMax
%#If all remaining observations in output occur after last observation in input, then use last obs in input for all remaining obs in output
Output(s:end, 2:end) = Input(end, 2:end);
break
end
end
Surely there is a more efficient, or at least, more elegant way to solve this problem? As I mentioned, this is a common problem in statistics. Perhaps Matlab has some in-built function I'm not aware of? Any help would be much appreciated as I use this routine a LOT for some large datasets.
THE ANSWERS: Hi all, I've analyzed the three answers, and as they stand, Angainor's is the best.
ChthonicDaemon's answer, while clearly the easiest to implement, is really slow. This is true even when the conversion to a timeseries object is done outside of the speed test. I'm guessing the resample function has a lot of overhead at the moment. I am running 2011b, so it is possible Mathworks have improved it in the intervening time. Also, this method needs an additional line for the case where Output ends more than one observation after Input.
Rody's answer runs only slightly slower than Angainor's (unsurprising given they both employ the histc approach), however, it seems to have some problems. First, the method of assigning the last observation in Output is not robust to the last observation in Input occurring after the last observation in Output. This is an easy fix. But there is a second problem which I think stems from having InputTimeStamp as the first input to histc instead of the OutputTimeStamp adopted by Angainor. The problem emerges if you change OutputTimeStamp = 730486.002 + (0:0.001:TOutput * 0.001 - 0.001)'; to OutputTimeStamp = 730486.002 + (0:0.0001:TOutput * 0.0001 - 0.0001)'; when setting up the example inputs.
Angainor's appears robust to everything I threw at it, plus it was the fastest.
I did a lot of speed tests for different input specifications - the following numbers are fairly representative:
My naive loop: Elapsed time is 8.579535 seconds.
Angainor: Elapsed time is 0.661756 seconds.
Rody: Elapsed time is 0.913304 seconds.
ChthonicDaemon: Elapsed time is 22.916844 seconds.
I'm +1-ing Angainor's solution and marking the question solved.
This "stale data" approach is known as a zero order hold in signal and timeseries fields. Searching for this quickly brings up many solutions. If you have Matlab 2012b, this is all built in to the timeseries class by using the resample function, so you would simply do
TInput = 100;
TOutput = 50;
InputTimeStamp = 730486 + cumsum(0.001 * rand(TInput, 1));
InputData = randn(TInput, 1);
InputTimeSeries = timeseries(InputData, InputTimeStamp);
OutputTimeStamp = 730486.002 + (0:0.001:TOutput * 0.001 - 0.001);
OutputTimeSeries = resample(InputTimeSeries, OutputTimeStamp, 'zoh'); % zoh stands for zero order hold
Here is my take on the problem. histc is the way to go:
% find Output timestamps in Input bins
N = histc(Output(:,1), Input(:,1));
% find counts in the non-empty bins
counts = N(find(N));
% find Input signal value associated with every bin
val = Input(find(N),2);
% now, replicate every entry in val
% as many times as specified in counts
index = zeros(1,sum(counts));
index(cumsum([1 counts(1:end-1)'])) = 1;
index = cumsum(index);
val_rep = val(index);
% finish the signal with last entry from Input, as needed
val_rep(end+1:size(Output,1)) = Input(end,2);
% done
Output(:,2) = val_rep;
I checked against your procedure for a few different input models (I changed the number of Output timestamps) and the results are the same. However, I am still not sure I understood your problem, so if something is wrong here let me know.
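For readers outside MATLAB, the stale-data (zero order hold) lookup can also be sketched with a binary search. This is a minimal Python illustration, not a port of either answer; the function name zero_order_hold is made up, and it assumes the input timestamps are sorted in ascending order. For each output timestamp it picks the last input observation at or before that time.

```python
from bisect import bisect_right

def zero_order_hold(input_times, input_values, output_times):
    """Hold the most recent input value at each output timestamp.

    Assumes input_times is sorted ascending. Output times that precede
    the first input observation get None.
    """
    held = []
    for t in output_times:
        # Index of the last input observation with timestamp <= t.
        i = bisect_right(input_times, t) - 1
        held.append(input_values[i] if i >= 0 else None)
    return held

times = [0.0, 1.5, 4.0]
values = [10, 20, 30]
print(zero_order_hold(times, values, [1, 2, 3, 4, 5]))   # [10, 20, 20, 30, 30]
```

Each lookup is O(log n), and output times after the last input observation simply reuse the final value, which matches the special case handled at the end of the loop in the question.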

GetDateFormat() fails on dates before 1/1/1601

I am trying to format a date using the Windows GetDateFormat API function:
nResult = GetDateFormat(
localeId, //0x409 for en-US, or LOCALE_USER_DEFAULT if you're not testing
0, //flags
dt, //a SYSTEMTIME structure
"M/d/yyyy", //the format we require
null, //the output buffer to contain string (null for now while we get the length)
0); //the length of the output buffer (zero while we get the length)
Now we pass it a date/time:
SYSTEMTIME dt;
dt.wYear = 1600;
dt.wMonth = 12;
dt.wDay = 31;
In this case nResult returns zero:
The function returns 0 if it does not succeed. To get extended error information, the application can call GetLastError, which can return one of the following error codes:
ERROR_INSUFFICIENT_BUFFER. A supplied buffer size was not large enough, or it was incorrectly set to NULL.
ERROR_INVALID_FLAGS. The values supplied for flags were not valid.
ERROR_INVALID_PARAMETER. Any of the parameter values was invalid.
If, however, I pass a date one day later:
SYSTEMTIME dt;
dt.wYear = 1601;
dt.wMonth = 1;
dt.wDay = 1;
Then it works.
What am I doing wrong? How do I format dates?
e.g. the date of the birth of Christ:
12/25/0000
or the date when the universe started:
-10/22/4004 6:00 PM
or the date Caesar died:
-3/15/44
Bonus Reading
Sorting It All Out: GetDateFormat is Gregorian based
GetDateFormatEx function
This is actually a limitation on SystemTime.
...year/month/day/hour/minute/second/milliseconds value since 1 January 1601 00:00:00 UT... to 31 December 30827 23:59:59.999
I spent some time looking up how to get around this limitation, but since GetDateFormat() takes a SystemTime you'll probably have to bite the bullet and write your own format() method.
The SYSTEMTIME struct is valid only from year 1601 through 30827, because on Windows machines system time is counted as intervals elapsed since 1.1.1601 00:00. See the
Wikipedia article.
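To give an idea of what "write your own format() method" could look like, here is a minimal sketch in Python rather than against the Windows API. The function name format_mdy is hypothetical and it only mimics the "M/d/yyyy" picture from the question. Python's date type uses the proleptic Gregorian calendar and covers years 1 through 9999, so it handles 1600 but not the BCE examples.

```python
from datetime import date

def format_mdy(d):
    """Format a date as M/d/yyyy: no padding on month/day, 4-digit year."""
    return f"{d.month}/{d.day}/{d.year:04d}"

print(format_mdy(date(1600, 12, 31)))   # 12/31/1600
print(format_mdy(date(1601, 1, 1)))     # 1/1/1601
```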

Resources