In a Pig script I generated a datetime column whose value is CurrentTime().
When I read the output of the Pig script through a Hive table, the column shows as NULL.
Is there any way to load the current datetime column from Pig so that it shows correctly in the Hive table?
The data in the file looks like 2020-07-24T14:38:26.748-04:00, and in the Hive table the column is of timestamp datatype.
A Hive timestamp should be in 'yyyy-MM-dd HH:mm:ss.SSS' format (without the 'T' and without the timezone offset -04:00), so:
1. Define the Hive column as STRING
2. Transform the string into a format compatible with a Hive timestamp
If you do not need milliseconds:
--use your string column instead of literal
from_unixtime(unix_timestamp('2020-07-24T14:38:26.748-04:00',"yyyy-MM-dd'T'HH:mm:ss.SSSX"))
Returns:
2020-07-24 18:38:26
If you need milliseconds, then additionally extract the milliseconds and concatenate them with the transformed timestamp:
select concat(from_unixtime(unix_timestamp('2020-07-24T14:38:26.748-04:00',"yyyy-MM-dd'T'HH:mm:ss.SSSX")),
'.',regexp_extract('2020-07-24T14:38:26.748-04:00','\\.(\\d{3})',1))
Result:
2020-07-24 18:38:26.748
Both results are compatible with the Hive timestamp type and, if necessary, can be cast explicitly to it using CAST(str AS TIMESTAMP), though comparing these strings with timestamps or inserting them into a timestamp column works without an explicit cast.
Alternatively, you can format the timestamp in Pig as 'yyyy-MM-dd HH:mm:ss.SSS'. I do not have Pig available and cannot check how ToString works.
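For reference, a minimal Pig Latin sketch (untested, since I don't have Pig here; the relation name and output path are placeholders) that formats CurrentTime() the way Hive expects might look like:

```pig
-- 'data' is a hypothetical existing relation; adjust field names as needed
data_with_ts = FOREACH data GENERATE
    *,
    ToString(CurrentTime(), 'yyyy-MM-dd HH:mm:ss.SSS') AS load_ts;
-- hypothetical output location
STORE data_with_ts INTO '/path/to/pig/output' USING PigStorage(',');
```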
Also, for LazySimpleSerDe, alternative timestamp formats can be supported by providing the format in the SerDe property "timestamp.formats" (as of release 1.2.0, with HIVE-9298). Try "yyyy-MM-dd'T'HH:mm:ss.SSSX".
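For example, a table definition along these lines (the table name, column names, and location are placeholders) should let LazySimpleSerDe parse the Pig output directly into a TIMESTAMP column:

```sql
CREATE EXTERNAL TABLE pig_output (   -- hypothetical table name
  load_ts TIMESTAMP
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = ',',
  'timestamp.formats' = "yyyy-MM-dd'T'HH:mm:ss.SSSX"
)
STORED AS TEXTFILE
LOCATION '/path/to/pig/output';      -- hypothetical location
```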
Related
I have the following CSV file located at path/to/file in my HDFS store.
1842,10/1/2017 0:02
7424,10/1/2017 4:06
I'm trying to create a table using the below command:
create external table t
(
number string,
reported_time timestamp
)
ROW FORMAT delimited fields terminated BY ','
LOCATION 'path/to/file';
I can see in the Impala query editor that the reported_time column in table t is always NULL. I guess this is due to the fact that my timestamp wasn't in an accepted timestamp format.
Question:
How can I specify that the timestamp column is in dd/MM/yyyy HH:mm format so that the timestamp is parsed correctly?
You can't customize timestamp parsing (as per my experience), but you can create the table with a STRING data type and then convert the string to a timestamp as below:
select number,
       reported_time,
       from_unixtime(unix_timestamp(reported_time, 'dd/MM/yyyy HH:mm')) as reported_time_ts
from t;
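If you want a real TIMESTAMP column rather than converting on every read, one option (a sketch; the final table name is an assumption) is to materialize the converted value into a second table:

```sql
CREATE TABLE t_final (            -- hypothetical final table
  number        STRING,
  reported_time TIMESTAMP
);

INSERT INTO TABLE t_final
SELECT number,
       from_unixtime(unix_timestamp(reported_time, 'dd/MM/yyyy HH:mm'))
FROM t;
```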
I'm new to the Big Data/Hadoop ecosystem and have noticed that dates are not always handled in a standard way across technologies. I plan to ingest data from Oracle into Hive tables on HDFS using Sqoop with the Avro and Parquet file formats. Hive keeps importing my dates as BIGINT values; I'd prefer TIMESTAMPs. I've tried using the "--map-column-hive" overrides... but it still does not work.
Looking for suggestions on the best way to handle dates for this use case.
Parquet File Format
If you use Sqoop to convert RDBMS data to Parquet, be careful with interpreting any resulting values from DATE, DATETIME, or TIMESTAMP columns. The underlying values are represented as the Parquet INT64 type, which is represented as BIGINT in the Impala table. The Parquet values represent the time in milliseconds, while Impala interprets BIGINT as the time in seconds. Therefore, if you have a BIGINT column in a Parquet table that was imported this way from Sqoop, divide the values by 1000 when interpreting as the TIMESTAMP type.
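For instance, assuming created_ms is such a Sqoop-imported BIGINT millisecond column (the column and table names here are made up), the adjustment looks like:

```sql
-- created_ms holds milliseconds since the epoch; divide by 1000 to get seconds
SELECT from_unixtime(CAST(created_ms / 1000 AS BIGINT)) AS created_ts
FROM parquet_tbl;   -- hypothetical table
```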
Avro File Format
Currently, Avro tables cannot contain TIMESTAMP columns. If you need to store date and time values in Avro tables, as a workaround you can use a STRING representation of the values, convert the values to BIGINT with the UNIX_TIMESTAMP() function, or create separate numeric columns for individual date and time fields using the EXTRACT() function.
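As an illustration of the STRING workaround (the column, table name, and format pattern are assumptions), you store the value as text in the Avro table and convert when querying:

```sql
-- event_time is stored as STRING in the Avro table
SELECT event_time,
       unix_timestamp(event_time, 'yyyy-MM-dd HH:mm:ss') AS event_time_epoch
FROM avro_tbl;
```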
You can also use a Hive query like this to get the result in your desired TIMESTAMP format:
FROM_UNIXTIME(CAST(SUBSTR(timestamp_column, 1,10) AS INT)) AS timestamp_column;
Another workaround is to import the data using --query in the Sqoop command, where you can cast your column to timestamp format.
Example
--query 'SELECT CAST (INSERTION_DATE AS TIMESTAMP) FROM tablename WHERE $CONDITIONS'
If your SELECT query gets a bit long, you can use an options file to shorten the length of the command-line call.
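Put together, a full Sqoop call might look roughly like this (the connection string, credentials, target directory, and column names are all placeholders):

```shell
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --query 'SELECT id, CAST(insertion_date AS TIMESTAMP) AS insertion_date FROM tablename WHERE $CONDITIONS' \
  --target-dir /user/hive/warehouse/tablename \
  --as-parquetfile \
  -m 1
```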
Hive date format not supported in Impala.
I created a partition on a date column in a Hive table, but when I access the same table from the Hive metadata in Impala, it shows:
CAUSED BY: TableLoadingException: Failed to load metadata for table
'employee_part' because of unsupported partition-column type 'DATE' in
partition column 'hiredate'.
Please let me know which date format Hive and Impala commonly support.
I used the date format yyyy-mm-dd in Hive.
Impala doesn't support the Hive DATE type.
You have to use a TIMESTAMP (which means that you will always carry a time component, but it will be 00:00:00.000). Then, depending on the tool you use afterwards, you unfortunately have to make a conversion again.
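For example (a sketch; the table here is the one from the question, the alias is made up): declare the column as TIMESTAMP, and when you need the plain date back in Hive, strip the time part:

```sql
-- hiredate is declared TIMESTAMP instead of DATE
SELECT to_date(hiredate) AS hiredate_day   -- yields the 'yyyy-MM-dd' part
FROM employee_part;
```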
Hi, I have data in HDFS as the string '2015-03-26T00:00:00+00:00'. If I want to load this data into a Hive table (column as timestamp), I am not able to load it and I get NULL values.
If I specify the column as string, I get the data into the Hive table,
but if I specify the column as timestamp, I am not able to load the data and I get all NULL values in that column.
Eg: HDFS - '2015-03-26T00:00:00+00:00'
Hive table - create table t1(my_date string)
I can get the output as '2015-03-26T00:00:00+00:00'
If I specify create table t1(my_date timestamp), I see all NULL values.
Can anyone help me with this?
Timestamps in text files have to use the format yyyy-mm-dd hh:mm:ss[.f...]. If they are in another format declare them as the appropriate type (INT, FLOAT, STRING, etc.) and use a UDF to convert them to timestamps.
Go through the link below:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-Timestamps
You have to use a staging table. In the staging table, load the value as a STRING, and in the final table use a UDF like the one below to convert the string value to a TIMESTAMP (adjust the pattern to match your data; 'dd-MM-yyyy HH:mm' is just an example):
from_unixtime(unix_timestamp(column_name, 'dd-MM-yyyy HH:mm'))
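For the format in this question, the pattern has to match the 'T' and the +00:00 offset ('XXX' in SimpleDateFormat notation); a sketch of the staging-to-final conversion, with assumed table names, could be:

```sql
-- t1_staging.my_date is a STRING like '2015-03-26T00:00:00+00:00'
INSERT INTO TABLE t1_final
SELECT from_unixtime(unix_timestamp(my_date, "yyyy-MM-dd'T'HH:mm:ssXXX"))
FROM t1_staging;
```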
In one requirement, Informatica fetches data from a flat file as the source and inserts records into a temporary table temp in a DB2 database. The flat file has one column of datetime datatype (YYYY/MM/DD HH:MM:SS). However, Informatica reads this column as a string datatype (since the Informatica date format differs from this column's and from DB2's). So before loading into the temp table in DB2, I need to convert this column back into datetime format.
I can do this with an Expression transformation, but I don't know how. There is a TO_DATE conversion function (TO_DATE(FIELD, 'YYYY/MM/DD HH:MM:SS')), but it takes care of the date part only (YYYY/MM/DD); it does not handle the time (HH:MM:SS), and because of this, records are not inserted into the temp table.
How can I convert the datetime from a string datatype to the DB2 datetime format (YYYY/MM/DD HH:MM:SS)?
You tried to use the month format string (i.e. MM) for the minutes part of the date.
You need to use MI:
TO_DATE(FIELD, 'YYYY/MM/DD HH:MI:SS')