Time as a Signed Integer - time

I've been reading up on the Y2038 problem and I understand that time_t will eventually revert to the lowest representable negative number because it'll try to "increment" the sign bit.
According to that Wikipedia page, changing time_t to an unsigned integer cannot be done because it would break programs that handle early dates. (Which makes sense.)
However, I don't understand why it wasn't made an unsigned integer in the first place. Why not just store January 1, 1970 as zero rather than some ridiculous negative number?

Because letting it start at signed −2,147,483,648 is equivalent to letting it start at unsigned 0. It doesn't change the range of values a 32-bit integer can hold - a 32-bit integer can hold 4,294,967,296 different states. The problem isn't the starting point, the problem is the maximum value which can be held by the integer. The only way to mitigate the problem is to upgrade to 64-bit integers.
Also (as I just realized): 1970 was set as 0 so we could reach back in time as well (reaching back to 1901 seemed to be sufficient at the time). If they had gone unsigned, the epoch would have had to begin at 1901 to still reach that far back, and we would have hit the same limit at roughly the same date anyway.
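To make the arithmetic concrete, here is a minimal C sketch (assuming a platform whose time_t is wide enough and whose gmtime() accepts negative values, which is not guaranteed everywhere) printing the two ends of a signed 32-bit seconds counter:
#include <stdio.h>
#include <stdint.h>
#include <time.h>

static void print_utc(const char *label, time_t t) {
    struct tm *tm = gmtime(&t);   /* may return NULL if t is out of range */
    if (tm != NULL)
        printf("%s %s", label, asctime(tm));
}

int main(void) {
    /* A signed 32-bit count of seconds around the 1970 epoch covers roughly
       1901-12-13 to 2038-01-19: 4,294,967,296 states, wherever zero sits. */
    print_utc("earliest:", (time_t)INT32_MIN);   /* -2,147,483,648 */
    print_utc("latest:  ", (time_t)INT32_MAX);   /*  2,147,483,647 */
    return 0;
}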

There's a more fundamental problem here than using unsigned values. If we used unsigned values, then we'd get only one more bit of timekeeping. This would have a definite positive impact - it would double the amount of time we could keep - but then we'd have a problem much later on in the future. More generally, for any fixed-precision integer value, we'd have a problem along these lines.
When UNIX was being developed in the 1970s, having a roughly 68-year clock sounded fine, though clearly a 136-year clock would have been better. If they had used more bits, then we'd have a much longer clock - say 1000 years - but after that much time elapsed we'd be right back in the same bind and would probably think back and say "why didn't they use more bits?"

Because not all systems have to deal purely with "past" and "future" values. Even in the 70s, when Unix was created and the time system defined, they had to deal with dates back in the 60s or earlier. So, a signed integer made sense.
Once everyone switches to 64-bit time_t's, we won't have to worry about a Y2038-type problem for another 2 billion or so 136-year periods.
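For a rough sense of scale, here is a back-of-the-envelope check in C (a sketch, nothing platform-specific) of how far a signed 64-bit seconds counter reaches forward from 1970:
#include <stdio.h>
#include <stdint.h>

int main(void) {
    const double secs_per_year = 365.25 * 24 * 3600;
    double years = (double)INT64_MAX / secs_per_year;     /* ~2.9e11 years  */
    printf("%.2e years, i.e. %.2e 136-year periods\n",
           years, years / 136.0);                         /* ~2.1e9 periods */
    return 0;
}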

Related

Length of time representation in Go

Under Unix, I'm working on a program that needs to behave differently depending on whether time is 32-bit (will wrap in 2038) or 64-bit.
I presume Go time is not magic and will wrap in 2038 on a platform with a 32-bit time_t. If this is false and it is somehow always 64-bit, clue me in because that will prevent much grief.
What's the simplest way in Go to write a test for the platform's time_t size? Is there any way simpler than the obvious hack with cgo?
If you really want to find the size of time_t, you can use cgo to link to time.h. Then the sizeof time_t will be available as C.sizeof_time_t. It doesn't get much simpler.
package main

// #include <time.h>
import "C"

import "fmt"

func main() {
    fmt.Println(C.sizeof_time_t)
}
Other than trying to set the system time to increasingly distant dates, which is not very polite to anything else running on that system, I don't know of any way to directly query the limits of the hardware clock in a portable fashion in any programming language. C simply hard codes the size of time_t in a file provided by the operating system (on OS X it's /usr/include/i386/_types.h), so you're probably best off taking advantage of that information by querying the size of time_t via cgo.
But there are very few reasons to do this. Go does not use time_t and does not appear to suffer from 2038 issues unless you actually plan to have code running on a 32-bit machine in 2038. If that's your plan, I'd suggest finding a better plan.
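If you would rather not pull in cgo at all, the same information is available from a one-line C program compiled on the target platform (a sketch; it prints 4 where time_t is 32-bit and 8 where it is 64-bit):
#include <stdio.h>
#include <time.h>

int main(void) {
    /* time_t's width is fixed by the platform's C headers. */
    printf("sizeof(time_t) = %zu\n", sizeof(time_t));
    return 0;
}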
I presume Go time is not magic and will wrap in 2038 on a platform with a 32-bit time_t. If this is false and it is somehow always 64-bit, clue me in because that will prevent much grief.
Most of the Year 2038 Problem is programs assuming that the time since 1970 will fit in a 32-bit signed integer. This affects time and date functions, as well as network and data formats which choose to represent time as a 32-bit signed integer since 1970. This is not some hardware limit (except if it's actually 2038, see below), but rather a design limitation of older programming languages and protocols. There's nothing stopping you from using 64-bit integers to represent time, or choosing a different epoch. And that's exactly what newer programming languages do, no magic required.
Go was first released in 2009, long after issues such as Unicode, concurrency, and 32-bit time (i.e. the Year 2038 Problem) were acknowledged as issues any programming language would have to tackle. Given how many issues there are with C's time library, I highly doubt that Go is using it at all. A quick skim of the source code confirms this.
While I can't find any explicit mention in the Go documentation of the limits of its Time representation, it appears to be completely disconnected from C's time.h structures such as time_t. Since Time uses 64-bit integers, it seems to be clear of 2038 problems unless you're asking for actual clock time.
Digging into the Go docs for Time, we find that its zero value is well outside the range of a 32-bit time_t, which covers 1901 to 2038.
The zero value of type Time is January 1, year 1, 00:00:00.000000000 UTC
time.Unix takes seconds and nanoseconds as 64-bit integers, leaving no doubt that it is divorced from the size of time_t.
time.Parse will parse a year "in the range 0000..9999", again well outside the range of a 32-bit time_t.
And so on. The only limitation I could find is that a Duration is limited to about 290 years, because it has nanosecond resolution and 290 years is about 63 bits' worth of nanoseconds.
Of course, you should test your code on a machine with a 32-bit time_t.
One side issue of the 2038 Problem is time zones. Computers calculate time zone information from a time zone database, usually the IANA time zone database. This allows one to get the time offset for a certain location at a certain time.
Computers have their own copy of the time zone database installed. Unfortunately it's difficult to know where it is located or when it was last updated. To avoid this issue, most programming languages supply their own copy of the time zone database. Go does as well.
The only real limitation on a machine with 32-bit time is the limits of its hardware clock. This tells the software what time it is right now. A 32-bit clock only becomes an issue if your program is still running on a 32-bit machine in 2038. There isn't much point in mitigating that, because everything on that machine will have the same problem and it's unlikely they took it into account. You're better off decommissioning that hardware before 2038.
Ordinarily, time.Time uses 63 bits to represent wall clock seconds elapsed since January 1, year 1 00:00:00 UTC, up through 219250468-12-04 15:30:09.147483647 +0000 UTC. For example,
package main

import (
    "fmt"
    "time"
)

func main() {
    var t time.Time
    fmt.Println(t)
    t = time.Unix(1<<63-1, 1<<31-1)
    fmt.Println(t)
}
Playground: https://play.golang.org/p/QPs1m6eMPH
Output:
0001-01-01 00:00:00 +0000 UTC
219250468-12-04 15:30:09.147483647 +0000 UTC
If a time.Time carries a monotonic clock reading (as returned by time.Now()), it uses 33 bits to represent wall clock seconds, covering the years 1885 through 2157.
References:
Package time
Proposal: Monotonic Elapsed Time Measurements in Go

Bit Shift Operator '<<' creates Extra 0xffff?

I am currently stuck on a simple bit-shifting problem. When I assign a value to a short variable and shift it with << 8, the result contains an extra 0xffff (2 extra bytes) when I store it back into the short. With long, however, it is fine. I am wondering why this happens.
I mean, a short isn't supposed to hold more than 2 bytes, but my short values clearly end up with an extra 2 bytes set to 0xffff.
I'm seeking your wisdom. :)
The image describes the problem: when bit 15 (the sign bit) of the short is set to 1 after the shift, the 2 bytes in front of it show up as 0xffff. 127 (0x7f) passes the test, but 129 (0x81) does not, because its upper bit lands in bit 15 (the sign bit) after the shift. 257 (0x101), on the other hand, does not set bit 15 after shifting, so it turns out fine.
There are several problems with your code.
First, you are doing bit shift operations on signed variables, and this may have unexpected results. Use unsigned short instead of short for bit shifting, unless you are sure of what you are doing.
You are also explicitly casting a short to unsigned short and then storing the result back into a variable of type short. It's not clear what you expect that to achieve; it is pointless and prevents nothing.
The issue you are seeing is related to that. 129 << 8 is 33024, a value too big to fit in a signed short. You are accidentally setting the sign bit, causing the number to become negative. You would see that if you printed it as %d instead of %x.
Because short is implicitly promoted to int when passed as a parameter to printf(), you see the 32-bit version of this negative number, which has its 16 most significant bits set accordingly. This is where the leading ffff comes from.
You don't have this problem with long because, even though it's signed, long is still large enough to store 33024 without setting the sign bit.
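Here is a minimal sketch of the effect (assuming the usual 16-bit short and 32-bit int, and using the value from the question); note that handing a negative int to %x is exactly what the original code does, and it is what produces the ffff:
#include <stdio.h>

int main(void) {
    short s = 129;                    /* 0x81 */
    unsigned short u = 129;

    s = (short)(s << 8);              /* 33024 doesn't fit: the sign bit ends up set */
    u = (unsigned short)(u << 8);     /* unsigned: simply 0x8100 */

    /* Both are promoted to int when passed to printf; the negative short is
       sign-extended, which is where the leading ffff comes from. */
    printf("short:          %x (%d)\n", s, s);   /* ffff8100 (-32512) */
    printf("unsigned short: %x (%d)\n", u, u);   /* 8100 (33024)      */
    return 0;
}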

Does Kernel::srand have a maximum input value?

I'm trying to seed a random number generator with the output of a hash. Currently I'm computing a SHA-1 hash, converting it to a giant integer, and feeding it to srand to initialize the RNG. This is so that I can get a predictable set of random numbers for an infinite set of Cartesian coordinates (I'm hashing the coordinates).
I'm wondering whether Kernel::srand actually has a maximum value that it'll take, after which it truncates it in some way. The docs don't really make this obvious - they just say "a number".
I'll try to figure it out myself, but I'm assuming somebody out there has run into this already.
Knowing what programmers are like, it probably just calls libc's srand(). Either way, it's probably limited to 2^32-1, 2^31-1, 2^16-1, or 2^15-1.
There's also a danger that the value is clipped when cast from a biginteger to a C int/long, instead of only taking the low-order bits.
An easy test is to seed with 1 and take the first output. Then, seed with 2^i+1 for i in [1..64] or so, take the first output of each, and compare. If you get a match for some i=n and all greater i, then it's probably doing arithmetic modulo 2^n.
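The same test can be illustrated against libc's own srand(), whose seed parameter is an unsigned int; in this sketch (assuming a 32-bit unsigned int), two seeds that differ only above bit 31 collapse to the same value, so their first outputs match:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    srand(1u);
    int a = rand();

    /* The conversion to unsigned int silently reduces the seed modulo 2^32. */
    srand((unsigned int)((1ULL << 32) + 1));
    int b = rand();

    printf("%d %d -> %s\n", a, b,
           a == b ? "seeding is modulo 2^32" : "seeds are distinguished");
    return 0;
}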
Note that the random number generator is almost certainly limited to 32 or 48 bits of entropy anyway, so there's little point seeding it with a huge value, and an attacker can reasonably easily predict future outputs given past outputs (and an "attacker" could simply be a player on a public nethack server).
EDIT: So I was wrong.
According to the docs for Kernel::rand(),
Ruby currently uses a modified Mersenne Twister with a period of 2**19937-1.
This means it's not just a call to libc's rand(). The Mersenne Twister is statistically superior (but not cryptographically secure). But anyway.
Testing using Kernel::srand(0); Kernel::sprintf("%x",Kernel::rand(2**32)) for various output sizes (2**16, 2**32, 2**36, 2**60, 2**64, 2**32+1, 2**35, 2**34+1), a few things are evident:
It figures out how many bits it needs (number of bits in max-1).
It generates output in groups of 32 bits, most-significant-bits-first, and drops the top bits (i.e. 0x[r0][r1][r2][r3][r4] with the top bits masked off).
If it's not less than max, it does some sort of retry. It's not obvious what this is from the output.
If it is less than max, it outputs the result.
I'm not sure why 2**32+1 and 2**64+1 are special (they produce the same output from Kernel::rand(2**1024) so probably have the exact same state) - I haven't found another collision.
The good news is that it doesn't simply clip to some arbitrary maximum (i.e. passing in huge numbers isn't equivalent to passing in 2**31-1), which is the most obvious thing that can go wrong. Kernel::srand() also returns the previous seed, which appears to be 128-bit, so it seems likely to be safe to pass in something large.
EDIT 2: Of course, there's no guarantee that the output will be reproducible between different Ruby versions (the docs merely say what it "currently uses"; apparently this was initially committed in 2002). Java has several portable deterministic PRNGs (SecureRandom.getInstance("SHA1PRNG","SUN"), albeit slow); I'm not aware of something similar for Ruby.

Determining Millisecond Time Intervals In Cocoa

Just as background, I'm building an application in Cocoa. This application existed originally in C++ in another environment. I'd like to do as much as possible in Objective-C.
My questions are:
1)
How do I compute, as an integer, the number of milliseconds between now and the previous time I remembered as now?
2)
When used in an Objective-C program that includes time.h, what are the units of
clock()
Thank you for your help.
You can use CFAbsoluteTimeGetCurrent() but bear in mind the clock can change between two calls and can screw you over. If you want to protect against that you should use CACurrentMediaTime().
The return types of these are CFAbsoluteTime and CFTimeInterval respectively, which are both double by default. So they return the number of seconds with double precision. If you really want an integer you can use mach_absolute_time(), found in #include <mach/mach_time.h>, which returns a 64-bit integer. This needs a bit of unit conversion, so check out this link for example code. This is what CACurrentMediaTime() uses internally so it's probably best to stick with that.
Computing the difference between two calls is obviously just a subtraction, use a variable to remember the last value.
For the clock function see the documentation here: clock(). Basically you need to divide the return value by CLOCKS_PER_SEC to get the actual time.
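Here is a minimal sketch of the mach_absolute_time() route described above; it is plain C, so it drops straight into an Objective-C file, and mach_timebase_info() supplies the fraction that converts ticks to nanoseconds:
#include <stdio.h>
#include <stdint.h>
#include <mach/mach_time.h>

int main(void) {
    mach_timebase_info_data_t timebase;
    mach_timebase_info(&timebase);          /* numer/denom: ticks -> nanoseconds */

    uint64_t start = mach_absolute_time();
    /* ... work being measured ... */
    uint64_t end = mach_absolute_time();

    uint64_t elapsed_ns = (end - start) * timebase.numer / timebase.denom;
    uint64_t elapsed_ms = elapsed_ns / 1000000;

    printf("%llu ms\n", (unsigned long long)elapsed_ms);
    return 0;
}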
How do I compute, as an integer, the number of milliseconds between now and the previous time I remembered as now?
Is there any reason you need it as an integral number of milliseconds? Asking NSDate for the time interval since another date will give you a floating-point number of seconds. If you really do need milliseconds, you can simply multiply that by 1000 to get a floating-point number of milliseconds. If you really do need an integer, you can round or truncate the floating-point value.
If you'd like to do it with integers from start to finish, use either UpTime or mach_absolute_time to get the current time in absolute units, then use AbsoluteToNanoseconds to convert that to a real-world unit. Obviously, you'll have to divide that by 1,000,000 to get milliseconds.
QA1398 suggests mach_absolute_time, but UpTime is easier, since it returns the same type AbsoluteToNanoseconds uses (no “pointer fun” as shown in the technote).
AbsoluteToNanoseconds returns an UnsignedWide, which is a structure. (This stuff dates back to before Mac machines could handle scalar 64-bit values.) Use the UnsignedWideToUInt64 function to convert it to a scalar. That just leaves the subtraction, which you'll do the normal way.

Can dbms_utility.get_time rollover?

I'm having problems with a mammoth legacy PL/SQL procedure which has the following logic:
l_elapsed := dbms_utility.get_time - l_timestamp;
where l_elapsed and l_timestamp are of type PLS_INTEGER and l_timestamp holds the result of a previous call to get_time
This line suddenly started failing during a batch run with a ORA-01426: numeric overflow
The documentation on get_time is a bit vague, possibly deliberately so, but it strongly suggests that the return value has no absolute significance and can be pretty much any numeric value. So I was suspicious to see it being assigned to a PLS_INTEGER, which can only hold 32-bit integers. However, the interweb is replete with examples of people doing exactly this kind of thing.
The smoking gun turns up when I invoke get_time manually: it returns a value of -214512572, which is suspiciously close to the minimum value of a 32-bit signed integer. I'm wondering whether, during the time elapsed between the first call to get_time and the next, Oracle's internal counter rolled over from its maximum value to its minimum value, resulting in an overflow when trying to subtract one from the other.
Is this a likely explanation? If so, is this an inherent flaw in the get_time function? I could just wait and see if the batch fails again tonight, but I'm keen to get an explanation for this behaviour before then.
Maybe late, but this may benefit someone searching on the same question.
The underlying implementation is a simple 32 bit binary counter, which is incremented every 100th of a second, starting from when the database was last started.
This binary counter is being mapped onto a PL/SQL BINARY_INTEGER type - which is a signed 32-bit integer (there is no sign of it being changed to 64-bit on 64-bit machines).
So, presuming the clock starts at zero, it will hit the positive integer limit after about 248 days, then flip over to the most negative value and climb back up towards zero.
The good news is that, provided both numbers have the same sign, you can do a simple subtraction to find the duration - otherwise you can take the remainder modulo 2^32:
IF SIGN(:now) = SIGN(:then) THEN
  RETURN :now - :then;
ELSE
  RETURN MOD(:now - :then + POWER(2,32), POWER(2,32));
END IF;
Edit: This code will blow the integer limit and fail if the gap between the times is too large (248 days), but you shouldn't be using GET_TIME to compare durations measured in days anyway (see below).
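For comparison, this is the wraparound arithmetic that unsigned subtraction gives you for free in C; a sketch with made-up tick values straddling the flip from the largest positive reading to the most negative one:
#include <stdio.h>
#include <stdint.h>

/* Elapsed ticks between two readings of a 32-bit counter that ticks every
   1/100 s. Unsigned subtraction is already modulo 2^32, so a single wrap
   between the readings needs no sign check. */
static uint32_t elapsed_ticks(uint32_t then, uint32_t now) {
    return now - then;
}

int main(void) {
    uint32_t then = 0x7FFFFFF0u;   /* just before the signed counter flips */
    uint32_t now  = 0x80000010u;   /* just after it flips                  */

    printf("elapsed: %u ticks (%.2f s)\n",
           elapsed_ticks(then, now), elapsed_ticks(then, now) / 100.0);
    return 0;
}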
Lastly - there's the question of why you would ever use GET_TIME.
Historically, it was the only way to get a sub-second time, but since the introduction of SYSTIMESTAMP, the only reason you would ever use GET_TIME is because it's fast - it is a simple mapping of a 32-bit counter, with no real type conversion, and doesn't make any hit on the underlying OS clock functions (SYSTIMESTAMP seems to).
As it only measures relative time, its only use is for measuring the duration between two points. For any task that takes a significant amount of time (you know, over 1/1000th of a second or so) the cost of using a timestamp instead is insignificant.
The number of occasions where it is actually useful is minimal (the only one I've found is checking the age of data in a cache, where doing a clock hit for every access becomes significant).
From the 10g doc:
Numbers are returned in the range -2147483648 to 2147483647 depending on platform and machine, and your application must take the sign of the number into account in determining the interval. For instance, in the case of two negative numbers, application logic must allow that the first (earlier) number will be larger than the second (later) number which is closer to zero. By the same token, your application should also allow that the first (earlier) number be negative and the second (later) number be positive.
So while it is safe to assign the result of dbms_utility.get_time to a PLS_INTEGER it is theoretically possible (however unlikely) to have an overflow during the execution of your batch run. The difference between the two values would then be greater than 2^31.
If your job takes a lot of time (therefore increasing the chance that the overflow will happen), you may want to switch to a TIMESTAMP datatype.
Assigning a negative value to your PLS_INTEGER variable does raise an ORA-01426:
SQL> l
1 declare
2 a pls_integer;
3 begin
4 a := -power(2,33);
5* end;
SQL> /
declare
*
ERROR at line 1:
ORA-01426: numeric overflow
ORA-06512: at line 4
However, you seem to suggest that -214512572 is close to -2^31, but it's not, unless you forgot to type a digit. Are we looking at a smoking gun?
Regards,
Rob.

Resources