Is there any sort of advantage (performance, indexes, size, etc) to storing dates in MongoDB as an ISODate() vs. storing as a regular UNIX timestamp?
The overhead of an ISODate compared to a time_t is trivial compared to the advantages of the former.
An ISO 8601 format date is human readable, it can be used to express dates prior to January 1, 1970, and most importantly, it isn't prey to the Y2038 problem.
This last bit can't be stressed enough. In 1960, it seemed ludicrous that wasting an octet or two on a century number could yield any benefit, as the turn of the century was impossibly far off. We know how wrong that turned out to be. The year 2038 will be here sooner than you expect, and a 32-bit time_t is already insufficient for representing, for example, the schedule of payments on a 30-year contract.
MongoDB's built-in Date type is very similar to a Unix timestamp stored in a time_t. The only difference is that a Date is a 64-bit field storing milliseconds since Jan 1 1970, rather than a 32-bit field storing seconds since the same epoch. The only downside is that current releases treat the count as unsigned, so they can't handle dates before 1970 correctly. This will be fixed in MongoDB 2.0, scheduled for release in about a month.
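To make the millisecond representation concrete, here is a minimal Python sketch (not MongoDB driver code). The helper name from_epoch_millis is made up for illustration, and the count is treated as signed, which is an assumption about the fixed behaviour rather than the unsigned handling described above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_millis(millis: int) -> datetime:
    """Convert a millisecond count since the Unix epoch to an aware UTC datetime.

    Hypothetical helper for illustration; negative counts mean "before 1970".
    """
    return EPOCH + timedelta(milliseconds=millis)


# One second before the epoch, expressed as a signed millisecond count.
print(from_epoch_millis(-1000))  # 1969-12-31 23:59:59+00:00

# Rough reach of a signed 64-bit millisecond count, in years either side of 1970.
years_each_side = 2**63 / 1000 / 86400 / 365.25
print(f"about {years_each_side / 1e6:.0f} million years")  # about 292 million years
```

Adding a timedelta to the epoch, rather than calling datetime.fromtimestamp, avoids platform differences in how negative timestamps are handled.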
A possible point of confusion is the name "ISODate". It is just a helper function in the shell that wraps JavaScript's horrible Date constructor. If you call either "ISODate()" or "new Date()" you will get back the exact same Date object; we just changed how it prints. You are still free to use normal ISO date strings or time_t ints without using our constructors, but you won't get nice Date objects back in your language of choice.
This is the value which I have,
Sun Mar 29 2020 02:55:00 GMT+0530
and I want to get, for example,
Asia/Calcutta
as output. Thanks in advance.
Offset does not indicate zone
get TimeZone value using time stamp
No.
You cannot determine a time zone from an offset.
Many time zones can share the same offset-from-UTC (the number of hours-minutes-seconds ahead or behind the prime meridian).
See the list of time zone names in Wikipedia. Click on the column header to sort by offset. Notice how often several zones share the same offset.
Specific to your example, notice how we currently have two zones that coincidentally share an offset of five and a half hours ahead of UTC:
Asia/Kolkata (India)
Asia/Colombo (Sri Lanka)
So, without further input, there is no way to know if the author of your input string intended India time or Sri Lanka time.
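As a rough illustration in Python (the question does not say which language is in use, so this is only a sketch), parsing the example string recovers the offset and nothing more. The format string is my own guess at how to match this particular input.

```python
from datetime import datetime, timedelta

raw = "Sun Mar 29 2020 02:55:00 GMT+0530"

# %z consumes the "+0530" part; "GMT" is matched as literal text.
parsed = datetime.strptime(raw, "%a %b %d %Y %H:%M:%S GMT%z")

print(parsed.isoformat())  # 2020-03-29T02:55:00+05:30
print(parsed.utcoffset())  # 5:30:00
print(parsed.tzname())     # UTC+05:30
```

The resulting tzinfo is a fixed offset. Nothing in it says whether the author meant Asia/Kolkata or Asia/Colombo.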
By the way, the name Asia/Calcutta has been changed to Asia/Kolkata. If your system has no such name, then your tzdata is several years out of date. Always keep all the copies of tzdata up-to-date in OSes, database servers such as Postgres, and runtimes such as Java.
Another complication: politicians frequently change the offset used in their jurisdictions.
So while all of India today uses the same offset of +05:30, that has not always been the case, nor is it likely to always be true in the future (based on the history of how often zones change around the world).
ISO 8601
The ISO 8601 standard defines many sensible formats for representing date-time values as text.
2020-01-23T12:34:56.123456789+05:30
The java.time framework built into Java 8 and later wisely extends one of those formats by appending the name of the time zone in square brackets. I suggest using this format if feasible.
2020-01-23T12:34:56.123456789+05:30[Asia/Kolkata]
It occurred to me that I'm not aware of a mechanism to store dates before 1970 jan. 1 as Unix timestamps. Since that date is the Unix "epoch" this isn't much of a surprise.
But even though it's not designed for that, I still wish to store dates in the far past in Unix format. I need this for reasons.
So my question is: how would one go about making unix-timestamps contain "invalid" but still working dates? Would storing a negative amount of seconds work? Can we even store negative amounts of seconds in a unix-timestamp? I mean isn't it unsigned?
Also, if I'm correct, I could only store dates as far back as 1901-12-13 20:45:52. Could this be extended any further back in history by any means?
Unix Time is usually a 32-bit number of whole seconds from the first moment of 1970 in UTC, the epoch being 1 January 1970 00:00:00 UTC. That means a range of about 136 years with about half on either side of the epoch. Negative numbers are earlier, zero is the epoch, and positive are later. For a signed 32-bit integer, the values range from 1901-12-13 to 2038-01-19 03:14:07 UTC.
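The two limits of a signed 32-bit count can be checked with a few lines of Python. This is only a sketch of the arithmetic, not a statement about any particular system's time_t.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

# Negative counts land before the epoch, positive counts after it.
earliest = EPOCH + timedelta(seconds=INT32_MIN)
latest = EPOCH + timedelta(seconds=INT32_MAX)

print(earliest.isoformat())  # 1901-12-13T20:45:52+00:00
print(latest.isoformat())    # 2038-01-19T03:14:07+00:00
```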
This is not written in stone. Well, it is written, but in a bunch of different stones. Older ones say 32-bit, newer ones 64-bit. Some specifications say that the meaning is "implementation-defined". Some Unix systems use an unsigned int to extend only into the future past the epoch, but usual practice has been a signed number. Some use a float rather than an integer. For details, see the Wikipedia article on Unix Time, and this Question.
So, basically, your Question makes no sense. You have to know the context of your programming language (standard C, other C, Java, etc.), environment (POSIX-compliant), particular software library, or database store, or application.
Avoid Count-From-Epoch
Add to this lack of specificity the fact that a couple dozen other epochs have been used by various software systems, some extremely popular and common. Examples include January 1, 1601 for NTFS file system & COBOL, January 1, 1980 for various FAT file systems, January 1, 2001 for Apple Cocoa, and January 0, 1900 for Excel & Lotus 1-2-3 spreadsheets.
Further add the fact that different granularities of count have been used. Besides whole seconds, some systems use milliseconds, microseconds, or nanoseconds.
I recommend against tracking date-time as a count-from-epoch. Instead use specific data types where available in your programming language or database.
ISO 8601
When data types are not available, or when exchanging data, follow the ISO 8601 standard which defines sensible string formats for various kinds of date-time values.
Date
2015-07-29
A date-time with an offset from UTC (Z is zero/Zulu for UTC) (note padding zero on offset)
2015-07-29T14:59:08Z
2001-02-13T12:34:56.123+05:30
Week (with or without day of week)
2015-W31
2015-W31-3
Ordinal date (day-of-year)
2015-210
Interval
"2007-03-01T13:00:00Z/2008-05-11T15:30:00Z"
Duration (format of PnYnMnDTnHnMnS)
P3Y6M4DT12H30M5S = "period of three years, six months, four days, twelve hours, thirty minutes, and five seconds"
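Several of the formats listed above can be produced with Python's standard datetime module. The sketch below reuses the example values from the list; the variable names are my own.

```python
from datetime import date, datetime, timedelta, timezone

d = date(2015, 7, 29)
iso_year, iso_week, iso_weekday = d.isocalendar()

print(d.isoformat())                                # 2015-07-29
print(f"{iso_year}-W{iso_week:02d}")                # 2015-W31
print(f"{iso_year}-W{iso_week:02d}-{iso_weekday}")  # 2015-W31-3
print(d.strftime("%Y-%j"))                          # 2015-210 (ordinal date)

# Date-time with an offset of five and a half hours ahead of UTC.
offset = timezone(timedelta(hours=5, minutes=30))
stamp = datetime(2001, 2, 13, 12, 34, 56, 123000, tzinfo=offset)
print(stamp.isoformat(timespec="milliseconds"))     # 2001-02-13T12:34:56.123+05:30
```

Note that isoformat() writes UTC as +00:00 rather than Z, so the Z form needs separate handling if you require it.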
Search StackOverflow.com for many more Questions and Answers on these topics.
I am planning on starting a project that will need to record timestamps of incoming transactions. I appreciate that Unix Time is an integer value and I can use this type of functionality to my advantage. However, Unix Time only measures in seconds. As a minimal requirement I need to record transaction times at the millisecond level.
I know that there are ways that I could get around this issue, but I was wondering if there was another standardized way of representing time data that also represented milliseconds (or, some factor of sub-milliseconds) in the time value that is fully expressed as an integer value since epoch.
Does such a time format exist? FYI, so long as the date data-type is standardized, I don't care what system this is native in. I can code my own implementation, however, I would like to use an existing date/time format, rather than create my own.
One place where such a standard is used is ECMAScript / Javascript. Javascript date objects use milliseconds since January 1, 1970, midnight UTC for their numerical integer representation. This is detailed here.
You can test this using your browser's console:
var d = new Date();
console.log(d.getTime()); // yields integer milliseconds since epoch
So yes, there is prior art for such a use.
date +%s
outputs timestamp in seconds
date +%s%N
returns timestamp in nanoseconds
To get milliseconds divide the nanoseconds by 1 000 000
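The same division can be sketched in Python, where time.time_ns() returns an integer count of nanoseconds since the epoch. The helper name nanos_to_millis is made up for this example.

```python
import time


def nanos_to_millis(nanos: int) -> int:
    """Truncate a nanosecond count to whole milliseconds (integer division)."""
    return nanos // 1_000_000


now_ms = nanos_to_millis(time.time_ns())
print(now_ms)  # integer milliseconds since the Unix epoch
```

Integer division keeps the value an exact integer, avoiding the rounding that a float seconds value multiplied by 1000 can introduce.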
UNIX time is not appropriate for time stamping transactions because it does not count leap seconds. When a leap second is inserted, a UNIX timestamp value is typically repeated (or the clock is stepped or smeared), so you cannot reliably add and subtract timestamps that span it, nor sort transactions recorded around it by timestamp.
A more appropriate standard for timestamps is TAI https://www.nist.gov/pml/time-and-frequency-division/nist-time-frequently-asked-questions-faq#tai . TAI is stored in the same way as UNIX time, as a number of seconds, microseconds, or nanoseconds since the UNIX epoch; however, it is the true count, with no leap seconds added or removed. This means that you can actually add and subtract TAI timestamps to get elapsed time, and TAI timestamps are always sortable. Unfortunately, support for TAI timestamps is somewhat limited. For example, Linux added support for TAI timestamps only in version 3.10, and Python added this support only in version 3.9 (see the documentation for the time module).
I'm looking for some best practices to handle and store static time values.
A static time is usually the time of a recurring event, e.g. the activities in a sport centre, the opening times of a restaurant, the time a TV show is aired every day.
These time values are not bound to a specific date, and should not be affected by daylight saving time. For example, a restaurant will open at 11:00am both in winter and summer.
What's the best way to handle this situation? How should these kinds of values be stored?
I'm mainly interested in issues with automatic TimeZone and DST adjustments (that should be avoided), and in keeping the time values independent of any specific date.
The best strategies I've found so far are:
store the time as an integer number of seconds since midnight,
store the time as a string.
I did read this question, but it's mostly about the normal time values and not the use cases I described.
Update
The library I'm working on: github
Regarding database storage, consider the following in order from most preferred to least preferred option:
Use a TIME type if your database supports it, such as in SQL Server (2008 and greater), MySQL, and Postgres, or INTERVAL HOUR TO SECOND in Oracle.
Use separate integer fields for Hours and Minutes (and Seconds if you need them). Consider using a custom user-defined type to bind these together if your DB supports it.
Use string in 24-hour format with a leading zero, such as "01:23:00", "12:00:00" or "23:59:00". If you include seconds, then always include seconds. You want to keep the strings lexicographically sortable. Don't mix and match formatting. Be consistent.
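For the string option, a short Python sketch shows why the leading zero and consistent formatting matter. The sample values are the ones from the list above, plus two deliberately unpadded ones.

```python
from datetime import time

stored = ["23:59:00", "01:23:00", "12:00:00"]

by_text = sorted(stored)                           # plain string comparison
by_value = sorted(stored, key=time.fromisoformat)  # compare as times of day

print(by_text)              # ['01:23:00', '12:00:00', '23:59:00']
print(by_text == by_value)  # True

# Without the leading zero, text order no longer matches time order.
print(sorted(["9:00:00", "10:00:00"]))  # ['10:00:00', '9:00:00']
```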
Regarding the approach of storing a whole number of minutes (or seconds) elapsed since midnight, I recommend avoiding it. That works great when you are actually storing an elapsed duration of time, but not so great when storing a time of day. Consider:
Not every day has a midnight. In some time zones (ex: Brazil), on the day of the spring-forward DST transition, the clocks go from 23:59:59 to 01:00:00.
In any time zone that has DST, the "time elapsed since midnight" could be lying to you. Even when midnight exists, if you save 10:00 as "10 hours", then that's potentially a false statement. There may have been 9 hours or 11 hours elapsed since midnight, if you consider the two days per-year involved in DST transitions.
At some point in your application, you'll likely be applying this time-of-day value to some particular date. When you do, if you are using "elapsed time" semantics, you might be tempted to simply add the elapsed time to midnight of the date in question. That will lead to errors on DST transition days, for the reasons I just mentioned. If you are instead representing a "time of day" in your storage, you'll be more likely to combine them together properly. Of course, this is highly dependent on what language and API you are using.
With any of these, be careful when using recurrence patterns. Say you store a time of "02:00:00" when a bar closes every night. When DST springs forward, that time might not exist, and when it falls back, it will exist twice. You need to be prepared to check for this condition when you apply the time to any particular date.
What you should do is entirely up to your use case. In many situations, the sensible thing to do is to jump forward one hour in the spring-forward gap, and to pick the first of the two points in the fall-back overlap. But YMMV.
See also, the DST tag wiki.
Per comments, it looks like the "tod" gem will suffice for your Ruby code.
The question seems a little vague, but I will have a try.
Generally speaking, using an integer seems good enough for me. It is easy to compare, easy to add or subtract a duration (of seconds), and is space- and time-efficient. You can consider wrapping it in a class if you are using an object-oriented language.
As far as I know, there are no existing classes for your needs in C or C++.
In the .NET world, the TimeSpan class may be useful for your purpose. It has some conveniences, like: you can get the TimeSpan value from DateTime.TimeOfDay; you can add the TimeSpan with an interval (a TimeSpan); you can get the hours, minutes, and seconds components separately; etc.
If you use Python, datetime.time is also a good candidate. It is designed exactly for usages like yours.
I do not know other good candidates in other languages.
Speaking for Java:
In Java, the use cases you describe are not covered well by the old java.util.Date (which is a global timestamp despite its name) or java.util.GregorianCalendar (which is a kind of combination of date, time, zone, etc.), but:
In Java 8 you have the new built-in class java.time.LocalTime, which covers your use cases well. Its predecessor is the equally named class LocalTime in the popular external Java library JodaTime, which has worked since Java 5. Furthermore, in my own alpha-state library I have the type net.time4j.PlainTime, which is similar but also offers 24:00 support (good, for example, for shop opening times). All in all, Java is a well-suited language with interesting time libraries that can mostly do what you wish. In detail:
a) TimeZone and DST adjustments are not handled by the Java classes mentioned above. Instead they are only handled if you convert such a plain wall time to another type like org.joda.time.DateTime which contains a reference to a timezone.
b) Indeed these time classes are completely independent from calendar date, too.
c) The internal storage strategy is for JSR-310 (Java 8):
private final byte hour;
private final byte minute;
private final byte second;
private final int nano;
JodaTime uses the other strategy of local milliseconds instead (elapsed time since midnight).
You cannot represent a time unless you also know the day/month/year. There is no such thing as "should not be affected by daylight saving time" as there are many complicated issues to deal with, including leap seconds and so on. Time, as a human sees it, is a complicated thing that cannot easily be dealt with mathematically.
If you really need to store "11am" without any date associated, then that's what you should store. Just store 11am (or perhaps just 11, use 24 hour time).
Then, if you need to do any math you must apply a date before doing any operations on the time.
I would also refrain from storing "11am" as "x seconds from midnight". You really should just use 11 hours, since that is what the user sees, and then have a good date/time library convert it to a useful format. For example, telling the user if the restaurant is open right now you'd pass it to a date library with today's date.
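A minimal Python sketch of that last step, assuming a restaurant that opens at 11:00 and closes at 22:00 (the closing time and the function name is_open are made up for illustration):

```python
from datetime import datetime, time

OPENS = time(11, 0)   # stored as a plain time of day, no date attached
CLOSES = time(22, 0)  # hypothetical closing time


def is_open(now: datetime) -> bool:
    """Apply the stored times of day to the date of `now`, then compare."""
    opens_at = datetime.combine(now.date(), OPENS, tzinfo=now.tzinfo)
    closes_at = datetime.combine(now.date(), CLOSES, tzinfo=now.tzinfo)
    return opens_at <= now < closes_at


print(is_open(datetime(2024, 1, 15, 12, 0)))   # True  (winter, midday)
print(is_open(datetime(2024, 7, 15, 10, 59)))  # False (summer, one minute early)
```

The stored values never change with the season; the date is only attached at the moment a comparison is needed. This sketch does not handle closing times past midnight.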
I see people storing / getting the server time and times relative to it using date or getTime which can be kept in the database as a string of the sorts: "July 21, 1983 01:15:00".
Up until now I stored my server time as the difference between NOW and 1 january 2013. This would return a number value (in minutes), rounded down between 1 jan 2013 and right now, which I keep as internal server time.
The advantages of this are that:
- querying the server implies a simple numeric comparison, while (I make an educated guess) comparing two dates implies internal conversion to objects and heavier comparison operations.
- storing a number of that size is more lightweight than a string of ~25 characters.
- converting back to "real" time is done by adding 1 Jan 2013, but second and millisecond values are lost due to the initial rounding.
But still, other fellow programmers insist that using the string version
- is easy to read as a human.
- it's a universal format for most languages (especially nodejs, mongodb and as3, which this project uses).
I am uncertain which is better for large scale databases and specifically, for a multiplayer socket based game. I am sure others with real experience in this could shed some light on my issue.
So which is better and why?
Store them as Mongo Date objects. Mongo stores dates as 8-byte integers counting milliseconds since the Unix epoch [1], and displays them in human-readable format. You are NOT storing 25 characters!
Therefore, all comparisons are just as fast. There is no string parsing except for when you're querying, which is a one-time operation per query.
Your difference is stored as an int of 4 bytes. So you're saving ONLY 4 bytes over normal MongoDB date storage. That's a very small saving, considering the average size of your Mongo objects.
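The sizes can be compared with a small Python sketch. It packs the raw values with struct rather than going through a MongoDB driver, so it shows only the payload bytes and not the full BSON element. The sample moment is arbitrary.

```python
import struct
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
CUSTOM_EPOCH = datetime(2013, 1, 1, tzinfo=timezone.utc)
moment = datetime(2013, 6, 1, 12, 0, tzinfo=timezone.utc)  # arbitrary example

millis = (moment - UNIX_EPOCH) // timedelta(milliseconds=1)
minutes = (moment - CUSTOM_EPOCH) // timedelta(minutes=1)

as_int64 = struct.pack("<q", millis)   # date payload: 64-bit millisecond count
as_int32 = struct.pack("<i", minutes)  # the "minutes since 2013" scheme
as_text = "July 21, 1983 01:15:00".encode("utf-8")

print(len(as_int64), len(as_int32), len(as_text))  # 8 4 22
```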
Consider all the disadvantages of your "offset since January 2013" method:
Time spent writing extra logic to offset the dates when updating or querying.
Time spent dealing with bugs that arise from having forgotten to offset a date.
Time spent shifting dates by hand or in your head when inspecting database output (when diagnosing a problem), instead of seeing the actual date right away.
Inability to use date operators in the MongoDB aggregations without extra work (e.g. $dayOfMonth, extra work being a projection to shift your dates internally to ).
Basically, more code and more headache and more time spent, all to save 4 bytes on objects in a database where the same 4 bytes can be saved by renaming your field from "updated" to "upd"? I don't think that's a wise tradeoff.
Also,
Best way to store date/time in mongodb
Premature optimization is the root of all evil. Don't optimize unless you've determined something to be a problem.
1 - http://bsonspec.org/#/specification