Impala unable to read Dateless timestamp from Parquet file - time

Impala v2.11.0+ (CDH v5.11.1+) cannot read timestamps that contain only a time value from a Parquet file.
create table TT2(t timestamp) STORED AS PARQUET;
insert into TT2 (t) values ("10:00:00");
select * from TT2;
+------+
| t |
+------+
| NULL |
+------+
WARNINGS: Parquet file 'hdfs://localhost:20500/test-warehouse/tt2/714d741212df3180-cd4e670800000000_226739479_data.0.parq' column 't' contains an out of range timestamp. The valid date range is 1400-01-01..9999-12-31.
https://issues.apache.org/jira/browse/IMPALA-5942
Even though the select statement returns null, the metastore manager shows that the column has the value
4714-12-30 10:00:00.0
What I'm looking for is an alternative way to query this data to get the time value instead of manually finding and converting all the dateless timestamp columns to string.
I've tried
SELECT cast(t as string) FROM TT2
SELECT date_part('hour', t) FROM TT2
SELECT from_timestamp(t, "HH:mm:ss") FROM TT2
SELECT extract(t, "hour") FROM TT2
SELECT extract(cast(t as string), "hour") FROM TT2

Since you are only interested in the time part, I would suggest replacing the date part with some acceptable date. Impala provides functions to extract the time from a timestamp, and all your further queries on the data can make use of them. One possible solution:
1) Create a temporary table where the timestamp column is a string.
2) You will now get values like "4714-12-30 10:00:00.0".
3) Now you have to do two things:
a) use split_part to extract 10:00:00.0
b) concat the extracted part with "2018-11-12"
so that you get values in the format "2018-11-12 10:00:00.0".
4) The result of the previous step can now be cast to a timestamp.
5) Use select [column names col1,2,3,..], cast(concat("2018-11-12 ", split_part(col, delim, index)) as timestamp) to insert the data into the original table from the temporary table.
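The steps above could be sketched roughly as follows. This is only an illustration: tt2_str and its column t_str are hypothetical names for the temporary string table, and how that table gets populated is not shown here. It relies on Impala's split_part, which uses 1-based indexes.

```sql
-- Hypothetical staging table tt2_str(t_str STRING) holding values such as
-- '4714-12-30 10:00:00.0' (the out-of-range date plus the real time).
SELECT
  t_str,
  -- Take the part after the space (the time) and prepend a placeholder date
  -- that falls inside Impala's valid range, then cast back to TIMESTAMP.
  CAST(CONCAT('2018-11-12 ', split_part(t_str, ' ', 2)) AS TIMESTAMP) AS t_fixed
FROM tt2_str;
```

The same SELECT expression can feed an INSERT ... SELECT into the original table, and the hour or other time fields can then be extracted from t_fixed as usual.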

Related

changing Date format in query

In one part of my program I want to run a SQL query and get the result, which is a date, in a format like %Y/%m/%d %H:%M:%S:
SELECT MAX(created_at)
FROM HOT_FILES_LOGS
WHERE FILE_NAME = 'test'
The date in the created_at column is stored like 04/03/2021 15:45:30 (it is populated with SYSDATE),
but when I run this query I get just 04.03.21.
What should I do to fix it?
Apply TO_CHAR with appropriate format mask:
select to_char(max(created_at), 'yyyy.mm.dd hh24:mi:ss') as created_at
from hot_files_logs
where file_name = 'test'
Oracle does not store dates or timestamps in any display format. They are stored in an internal structure, the same for every date in every Oracle database since at least 8i and probably earlier. This structure consists of 7 1-byte integers (timestamps use a similar but larger structure). How a date is displayed, or how a string is converted to a date, is controlled by the format string specified in the to_char or to_date function or, if no format string is given, by the NLS_DATE_FORMAT setting. To get a glimpse of the internal storage, run the following:
create table td( d date);
insert into td(d) values(sysdate);
select d "The Date" , dump(d) "Stored As" from td;
See example. The last format used there is not practical; it is purely for demonstration. Well, I guess you could use it to seed a repeatable random sequence.
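To make the point about display formats concrete, here is a small sketch that shows the same stored value rendered in different ways. It assumes the td table created above; the format masks are only illustrative choices.

```sql
-- The stored value is identical in both columns; only the text rendering differs.
select to_char(d, 'yyyy/mm/dd hh24:mi:ss') as full_form,
       to_char(d, 'dd.mm.rr')              as short_form
from td;

-- Changing the session default only affects how a DATE is displayed
-- when no explicit format mask is given.
alter session set nls_date_format = 'yyyy/mm/dd hh24:mi:ss';
select d from td;
```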

Is expression based partitioning supported in hive?

I have a table with a column. Can I create a partition based on an expression that uses that column?
I read that IBM's Big SQL technology has this feature.
I also know we can partition in Hive by a column, but what about an expression?
In this case I am doing a cast, but it could be any expression.
CREATE TABLE INVENTORY_A (
trans_id int,
product varchar(50),
trans_ts timestamp
)
PARTITIONED BY (
cast(trans_ts as date) AS date_part
)
I expect the records to be partitioned by the date value. So I expect that when a user writes a query like
select * from INVENTORY_A where trans_ts BETWEEN timestamp '2016-06-23 14:00:00.000' AND timestamp '2016-06-23 14:59:59.000'
the query will be smart enough to reduce the timestamp to its date and filter only on the date.
You can use dynamic partitioning and cast the column in the select query that loads the table.
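A minimal sketch of that suggestion, assuming the rows are first available in an unpartitioned table (inventory_staging is a hypothetical name) and that the partition column is declared as a plain column rather than as an expression:

```sql
-- Allow partitions to be created from the values produced by the SELECT.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

CREATE TABLE inventory_a (
  trans_id INT,
  product  VARCHAR(50),
  trans_ts TIMESTAMP
)
PARTITIONED BY (date_part DATE);

-- inventory_staging is a hypothetical table with the same three columns.
-- The dynamic partition column must come last in the SELECT list.
INSERT INTO TABLE inventory_a PARTITION (date_part)
SELECT trans_id, product, trans_ts, CAST(trans_ts AS DATE) AS date_part
FROM inventory_staging;
```

With this layout a filter on trans_ts alone will most likely not restrict the partitions that are read; adding a predicate on date_part to the query (for example date_part = '2016-06-23') is what lets Hive prune partitions.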

Insert data convert to date from an existing table

I have a table TEST1 that contains a timestamp number column (TIMESTAMP), and I want to create another table TEST2 that holds (TIMESTAMP, TIME) from the first table, where the TIME field is the timestamp converted to a date. I tried this:
insert into TEST2
values (TEST1.TIMESTAMP,to_date('1970-01-01 ','yyyy-mm-dd ') +
(TEST1.TIMESTAMP)/60/60/24 ,'YYYY-MM-DD');
Can anyone help?
Assuming the TIMESTAMP column is really in seconds, your code is almost right. However: to_date('1970-01-01', 'yyyy-mm-dd') is correct, but to it you must add a number - plain and simple. That is: + TIMESTAMP/60/60/24. Just like that!
Or, to avoid rounding errors, you could do something like
to_date('1970-01-01', 'yyyy-mm-dd') + interval '1' second * test1.timestamp
Note that the fixed date can also be written simply as
date '1970-01-01'
Finally: if you want to store only the date portion, wrap the result of that addition within TRUNC(...), which truncates to the beginning of the day. Or you can use the arithmetic expression, without INTERVAL and without TRUNC around the result, and instead add TRUNC(TEST1.TIMESTAMP/60/60/24), which truncates the seconds to a whole number of days.
Note that dates in Oracle do not have a "format" (format only applies to text representations of dates, not to dates themselves).
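As a quick worked example of the three variants, the following sketch uses the literal 1500000000 as an illustrative epoch value (seconds since 1970-01-01). That value corresponds to 2017-07-14 02:40:00, so the two date-only columns should both land on 2017-07-14.

```sql
-- 1500000000 is an illustrative value, not taken from the question's data.
select date '1970-01-01' + interval '1' second * 1500000000        as with_time,
       trunc(date '1970-01-01' + interval '1' second * 1500000000) as date_only,
       date '1970-01-01' + trunc(1500000000/60/60/24)              as date_only_arith
from dual;
```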
EDIT: It seems you also need help with the INSERT statement. When you insert into a table using data from another table, you don't INSERT ... VALUES, you INSERT ... SELECT. Something like (using your poorly chosen column names - poorly chosen because they are Oracle keywords):
insert into test2 (timestamp, time)
select timestamp, date '1970-01-01' + trunc(timestamp/60/60/24)
from test1
;
Notice that there are no calls whatsoever to either TO_CHAR or TO_DATE; and there is no format model like 'yyyy-mm-dd'. One of the good things about the SQL Standard date literal, date '1970-01-01', is that it accepts only one format, 'yyyy-mm-dd' (even the dashes are mandatory; / would be rejected).
If you want to see what is now in table TEST2:
select timestamp, time from test2;
And if you don't like how the date is displayed, you can control that:
select timestamp, to_char(time, 'yyyy-mm-dd') as time from test2;

How to write date condition on where clause in oracle

I have data in the date column as below.
reportDate
21-Jan-17
02-FEB-17
I want to write a query to fetch data for 01/21/2017.
The query below is not working in Oracle:
SELECT * FROM tablename where reportDate=to_date('01/21/2017','mm/dd/yyyy')
What is the data type of reportDate? It may be DATE or VARCHAR2 and there is no way to know by just looking at it.
Run describe table_name (where table_name is the name of the table that contains this column) and see what it says.
If it's a VARCHAR2 then you need to convert it to a date as well. Use the proper format model: 'dd-Mon-rr'.
If it's DATE, it is possible it has time-of-day component; you could apply trunc() to it, but it is better to avoid calling functions on your columns if you can avoid it, for speed. In this case (if it's really DATE data type) the where condition should be
where report_date >= to_date('01/21/2017','mm/dd/yyyy')
and report_date < to_date('01/21/2017','mm/dd/yyyy') + 1
Note that the date on the right-hand side can also be written, better, as
date '2017-01-21'
(this is the ANSI standard date literal, which requires the key word date and exactly the format shown, since it doesn't use a format model; use - as separator and the format yyyy-mm-dd.)
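Putting the two cases together, the queries might look like the sketch below. The table and column names are taken from the question; which variant applies depends on what describe reports for the column.

```sql
-- If reportDate is a DATE (possibly with a time-of-day component):
select *
from tablename
where reportDate >= date '2017-01-21'
  and reportDate <  date '2017-01-21' + 1;

-- If reportDate is a VARCHAR2 holding text such as '21-Jan-17':
select *
from tablename
where to_date(reportDate, 'dd-Mon-rr') = date '2017-01-21';
```

The first form leaves the column untouched, so an index on reportDate can still be used; the second has to convert every row's text before comparing.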
The query should be something like this
SELECT *
FROM table_name
WHERE TRUNC(column_name) = TO_DATE('21-JAN-17', 'DD-MON-RR');
The TRUNC function removes the time-of-day component from the column value, so the comparison is made on the date alone.
The output I got when I executed this in SQL Developer:
https://i.stack.imgur.com/blDCw.png

How to store date value in hive timestamp?

I am trying to store date and timestamp values in a timestamp column using Hive. The source file contains either dates or, sometimes, timestamps.
Is there a way to read both date and timestamp values using the timestamp data type in Hive?
Input:
2015-01-01
2015-10-10 12:00:00.232
2016-02-01
Output which I am getting:
null
2015-10-10 12:00:00.232
null
Is it possible to read both values using the timestamp data type?
DDL:
create external table mytime(id string ,t timestamp) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://xxx/data/dev/ind/'
I was able to think of a workaround. I tried this with a small set of data:
Load the data with the inconsistent date values into a Hive table, say table1, declaring the column as a string.
Now create another table, table2, with the timestamp datatype for the required column, and load the data from table1 into table2 using the transformation INSERT OVERWRITE TABLE table2 select id,if(length(tsstr) > 10, tsstr, concat(tsstr,' 00:00:00')) from table1;
This should load the data in the required format.
Code as below:
create table table1
(
id int,
tsstr string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION '/user/cloudera/hive/table1.tb';
Data:
1,2015-04-15 00:00:00
2,2015-04-16 00:00:00
3,2015-04-17
LOAD DATA LOCAL INPATH '/home/cloudera/data/tsstr' INTO TABLE table1;
create table table2
(
id int,
mytimestamp timestamp
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION '/user/cloudera/hive/table2.tb';
INSERT INTO TABLE table2 select id,if(length(tsstr) > 10, tsstr, concat(tsstr,' 00:00:00')) from table1;
Result shows up as expected:
Hive is similar to any other database in terms of datatype mapping, and hence requires uniform values in a column for it to be stored under a given datatype. The data in the second column of your file is non-uniform, i.e. some values are in date format while others are in timestamp format.
In order not to lose the dates, as suggested by @Kishore, make sure the file has a uniform datatype, with values such as 2016-01-01 00:00:00 wherever there are only dates.

Resources