Comparing millisecond timestamps in HDFS - hadoop

I have 2 timestamp columns stored in HDFS that I can access through Impala, Hive, etc.
The timestamps that I need to compare may look like this example:
2014-04-08 00:23:21.687000000
2014-04-08 00:23:21.620000000
The differences are in the milliseconds, and I need to build a new column that in this example should have a value of 0.067000.
I've tried using Impala's built-in time functions, but none of them seem to make the cut.
I've tried:
casting the string to a timestamp and then subtracting the two values. This returns the error "AnalysisException: Arithmetic operation requires numeric operands".
using the unix_timestamp function. This truncates the values to an int representing seconds, so sub-second values are lost.

While writing this question I found the answer :)
The way to do it was using a double cast:
cast(cast(time_stamp as timestamp) as double)
This turns the time_stamp into a number without truncating sub-second values.
Once there it becomes a trivial arithmetic operation.
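For reference, a minimal sketch of the full expression in Impala SQL; the column names ts_start and ts_end and the table name my_table are made up for illustration:

-- ts_start, ts_end and my_table are placeholder names.
-- Cast each string to TIMESTAMP, then to DOUBLE (seconds since epoch with
-- the fractional part kept), and subtract; for the example values above this yields 0.067.
select cast(cast(ts_end as timestamp) as double)
     - cast(cast(ts_start as timestamp) as double) as diff_seconds
from my_table;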

Related

What field type would give better index performance in Oracle DB?

I have a field that contains the time of order creation (order_time). Naturally, the best data type for that field is TIMESTAMP, but I want to create an index and I'm not sure that a TIMESTAMP index would perform better than a numerical index. What's the best practice here?
I'm using an Oracle database.
Always use the most appropriate data-type for the data:
If the data has date and time components and has a time-zone then use TIMESTAMP WITH TIME ZONE;
If the data has date and time components with fractional seconds and no time-zone then use TIMESTAMP;
If the data has date and time components with no fractional seconds and no time-zone then use DATE; and
If your data is an instant measured, for example, as the number of milliseconds (or seconds) since 1970-01-01 00:00:00 UTC and you almost entirely use it in its numeric form (i.e. you never, or very rarely, convert it to a human readable format such as YYYY-MM-DD HH:MI:SS.FF) then you may want to store it as a number. However, if you want to format it so it is readable or compare it to dates then you should prefer the TIMESTAMP (or DATE) data type.
Never use an inappropriate data-type for your column. The index performance between the different data-types should be mostly irrelevant and the overheads of converting from an inappropriate data-type to an appropriate one are likely to be a much more significant cost.
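As a minimal sketch of that advice, assuming a hypothetical orders table, the order_time column is simply declared with the appropriate type and indexed directly:

-- orders is a made-up example table
create table orders (
  order_id   number primary key,
  order_time timestamp        -- date + time with fractional seconds, no time zone
);
create index orders_time_ix on orders (order_time);

-- A range query like this can use the index with no numeric conversion:
select * from orders
where order_time >= timestamp '2024-01-01 00:00:00'
  and order_time <  timestamp '2024-02-01 00:00:00';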

ClickHouse DateTime with milliseconds

ClickHouse doesn't yet support DateTime with milliseconds.
I saw two possible suggestions for fields like 2019-03-17T14:00:32.296Z:
multiply by 100 and store it in UInt32/64. How do I multiply by 100 and store it as UInt32?
store the milliseconds separately. Is there a way to remove the milliseconds from 2019-03-17T14:00:32.296Z => 2019-03-17 14:00:32?
Thanks for your help!
You should use the DateTime64 type - https://clickhouse.com/docs/en/sql-reference/data-types/datetime64/
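For example, a minimal sketch of a table using DateTime64 with millisecond precision (the table and column names are made up):

-- events, event_time and payload are placeholder names
CREATE TABLE events
(
    event_time DateTime64(3, 'UTC'),   -- 3 = millisecond precision
    payload    String
)
ENGINE = MergeTree
ORDER BY event_time;

INSERT INTO events VALUES ('2019-03-17 14:00:32.296', 'example');

-- toDateTime() truncates the value back to whole seconds if needed:
SELECT event_time, toDateTime(event_time) AS event_time_seconds FROM events;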
In my mind, the main reason ClickHouse does not support milliseconds in DateTime is worse compression.
Long story short: use DateTime with second precision. If you want to store milliseconds, you can go two ways:
Store the milliseconds separately, so you will have a DateTime with your date that you can use in all the DateTime functions, as well as in primary keys, and put the millisecond part in a separate column of type UInt16. You have to prepare the data accordingly before storing it. Depending on what language you use to preprocess the data, there are different ways to do it. In Go it could be done as:
time.Now().UnixNano() / 1e6 % 1e3
Another way is to store the whole value as a timestamp. This means you convert your date to a Unix timestamp with milliseconds yourself and put it into ClickHouse as a UInt64. It also depends on what you use to prepare the inserts. In Go it could look like:
time.Now().UnixNano() / 1e6
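For illustration, a rough sketch of the two table layouts described above, with made-up table and column names:

-- Approach 1: DateTime plus a separate millisecond column (placeholder names)
CREATE TABLE events_split
(
    event_time DateTime,   -- second precision, usable in DateTime functions and the primary key
    event_ms   UInt16      -- 0-999, prepared by the client before insert
)
ENGINE = MergeTree
ORDER BY (event_time, event_ms);

-- Approach 2: the whole value as epoch milliseconds (placeholder names)
CREATE TABLE events_epoch
(
    event_ts_ms UInt64     -- Unix timestamp in milliseconds, prepared by the client
)
ENGINE = MergeTree
ORDER BY event_ts_ms;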

Oracle DB: Convert string (timestamp) into number (minutes)

So, I am trying to build a query against the RMAN catalogue (using RC_RMAN_BACKUP_JOB_DETAILS) to compare the most recent backup duration (TIME_TAKEN_DISPLAY) for each database (DB_NAME) with its historical average backup duration.
How do I convert TIME_TAKEN_DISPLAY (an HH:MM:SS value stored as VARCHAR2) into minutes, i.e. a number only, so that I can run the query against the entire RC_RMAN_BACKUP_JOB_DETAILS view and compare the average time taken in the past with the time taken by the last backup for each DB?
One thing that may work is converting the string (TIME_TAKEN_DISPLAY) to a time value and then to a number of minutes, but that seems highly inefficient.
The solution can be pretty simple or fairly complex depending on the requirements:
One simple solution is:
select avg(substr(TIME_TAKEN_DISPLAY, 0,2)*60 + substr(TIME_TAKEN_DISPLAY, 4,2) + substr(TIME_TAKEN_DISPLAY, 7,2)/60) from RC_RMAN_BACKUP_JOB_DETAILS;
Using type-casting functions:
Cast TIME_TAKEN_DISPLAY into a time value using TO_TIMESTAMP and then convert it with TO_NUMBER, but I did not want to take this approach as I plan to run my scripts against all databases logged in the view, and the repeated casting would be highly inefficient.
But as per Alex Poole's comment, I will be using the ELAPSED_SECONDS field instead, as it is readily available in seconds and already has a NUMBER data type.
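For illustration, a rough sketch of such a comparison using ELAPSED_SECONDS; the column names come from the RC_RMAN_BACKUP_JOB_DETAILS view, but any filtering by backup type or status is left out and should be adapted:

-- last backup duration vs. historical average, in minutes, per database
select db_name,
       max(elapsed_seconds) keep (dense_rank last order by start_time) / 60 as last_backup_minutes,
       avg(elapsed_seconds) / 60                                            as avg_backup_minutes
from rc_rman_backup_job_details
group by db_name;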

Inserting BigDecimal => Varchar2 column vs BigDecimal => Number column

I was doing some tests where I inserted some Java BigDecimal records into a VARCHAR2 column in Oracle.
What I actually wanted to do was insert the Java BigDecimal into a NUMBER column in Oracle.
I am wondering how the two work differently and what interim conversion steps Oracle takes in these scenarios:
BigDecimal => VARCHAR2 column
BigDecimal => NUMBER column
Can I still use the findings from my previous tests? I am mostly looking at latency, throughput, etc.
Remember the golden rule: you should never, ever, under any circumstances store numbers in VARCHAR columns.
Storing numbers in character columns will give you a lot of trouble in the long run.
Always store numbers as numbers.
To store the numbers, use a PreparedStatement and use the setBigDecimal() method to send the number to the database. This will take care of any conversion and will guarantee that the correct value is stored in the database and you don't have to worry e.g. about different decimal separators in different locales when sending a number as a string to the database.
I did not find any measurable performance difference. This was just a test of a prototype, so I can use the results.

How does SCN_TO_TIMESTAMP work?

Does the SCN itself encode a timestamp, or is it a lookup from some table?
An AskTom post explains that the timestamp, accurate to within +/- 3 seconds, is stored in a raw field in smon_scn_time. Is that where the function gets the value?
If so, when is that table purged, if ever? And what triggers that purge?
If it is purged, does that make it impossible to translate old SCNs to timestamps?
If it's impossible, then that eliminates any long-term uses of that field (read: auditing).
If I put that function in a query, would joining to that table be faster?
If so, does anyone know how to convert that RAW column?
The SCN does not encode a time value. I believe it is an autoincrementing number.
I would guess that SMON is inserting a row into SMON_SCN_TIME (or whatever table underlies it) every time it increments the SCN, including the current timestamp.
I queried for the minimum recorded timestamp in several databases and they all go back about 5 days and have a little under 1500 rows in the table. So it is less than the instance lifetime.
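That check can be reproduced with a query along these lines (it reads a SYS-owned table, so the necessary privileges are assumed):

select min(time_dp) as oldest_mapped_time,
       max(time_dp) as newest_mapped_time,
       count(*)     as mapping_rows
from sys.smon_scn_time;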
I imagine the lower bound on how long the data is kept might be determined by the DB_FLASHBACK_RETENTION_TARGET parameter, which defaults to 1 day.
I would recommend using the function; they've probably provided it so they can change the internals at will.
No idea what the RAW column TIM_SCN_MAP contains, but the TIME_DP and SCN columns would appear to give you the mapping.
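For completeness, a quick sketch of using the function directly; my_table and its id column are made up, and an SCN older than the retained mapping will typically raise an error rather than return a timestamp:

select current_scn, scn_to_timestamp(current_scn) as approx_time
from v$database;

-- my_table and id are placeholder names; ORA_ROWSCN is the SCN of the row's last change
select ora_rowscn, scn_to_timestamp(ora_rowscn) as last_change_time
from my_table
where id = 42;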
