How to automatically get the current date and time in a column using HIVE - hadoop

Hey, I have two columns in my Hive table, for example:
c1 : name
c2 : age
Now, while creating the table, I want to declare two more columns that automatically get the current date and time when the row is loaded.
e.g.: John 24 26/08/2015 11:15
How can this be done?

Hive currently does not support default values in column definitions when creating a table. Please refer to the link for the complete Hive CREATE TABLE syntax:
Hive Create Table specification
An alternative workaround is to load the data into a temporary table first and then use an INSERT OVERWRITE TABLE statement to add the current date and time while inserting into the main table.
The example below may help:
1. Create a temporary table
create table EmpInfoTmp(name string, age int);
2. Insert data using a file or existing table into the EmpInfoTmp table:
name|age
Alan|28
Sue|32
Martha|26
3. Create a table which will contain your final data:
create table EmpInfo(name string, age int, createDate string, createTime string);
4. Insert data from the temporary table, filling the two extra columns with the current date and time:
insert overwrite table empinfo select name, age, FROM_UNIXTIME( UNIX_TIMESTAMP(), 'dd/MM/yyyy' ), FROM_UNIXTIME( UNIX_TIMESTAMP(), 'HH:mm' ) from empinfotmp;
5. The end result looks like this:
name|age|createdate|createtime
Alan|28|26/08/2015|03:56
Martha|26|26/08/2015|03:56
Sue|32|26/08/2015|03:56
Please note that the creation date and time values will only be accurate if you insert the data into your final table as soon as it arrives in the temp table.
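As a side note, on Hive 1.2 or later (an assumption, since the question does not state a version) the built-in current_timestamp constant and date_format() function can replace the UNIX_TIMESTAMP() calls above:
-- A minimal sketch, assuming Hive 1.2+ (current_timestamp and date_format were added in 1.2.0):
insert overwrite table empinfo
select name, age,
       date_format(current_timestamp, 'dd/MM/yyyy'),
       date_format(current_timestamp, 'HH:mm')
from empinfotmp;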

Note: you can't set more than one column's default to CURRENT_TIMESTAMP. This is how you can set CURRENT_TIMESTAMP on one column (note that this is MySQL syntax, not HiveQL):
SQL:
CREATE TABLE IF NOT EXISTS `hive` (
`id` int(11) NOT NULL,
`name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`age` int(11) DEFAULT '0',
`datecreated` timestamp NULL DEFAULT CURRENT_TIMESTAMP
);
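With that definition, an insert that omits datecreated gets the current timestamp filled in automatically, for example (a hypothetical row):
-- datecreated is populated automatically by the DEFAULT CURRENT_TIMESTAMP clause:
INSERT INTO `hive` (`id`, `name`, `age`) VALUES (1, 'John', 24);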

Hey, I found a way to do it using a shell script. Here's how:
echo "$(date +"%Y-%m-%d-%T") $(wc -l /home/hive/landing/$line ) $dir " >> /home/hive/recon/fileinfo.txt
Here I get the date without spaces. At the end I upload the text file to my Hive table.
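For the upload step, something like the following should work (a sketch; the target table name fileinfo is an assumption):
-- Load the generated file into a Hive table (table name assumed):
LOAD DATA LOCAL INPATH '/home/hive/recon/fileinfo.txt' INTO TABLE fileinfo;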

Related

fast comparison of a list with itself

I have a giant list (100k entries) in my database. Each entry contains an id, a text, and a date.
I created a function to compare two texts for similarity. What it looks like is not important right now.
Is there a "good" way to remove "duplicates" (as far as possible) from the list by text?
Currently I'm looping through the list twice and comparing each entry with every other entry, skipping only itself by id.
If your question is about the moment you insert a row into the table, you can include a unique constraint.
Postgresql
CREATE TABLE table1 (
id serial PRIMARY KEY,
txt VARCHAR (50),
dt timestamp,
UNIQUE(txt)
);
Oracle
CREATE TABLE table1
( id numeric(10) NOT NULL,
txt varchar2(50) NOT NULL,
dt timestamp,
CONSTRAINT txt_unique UNIQUE (txt)
);
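With either definition, inserting the same text twice is rejected. For example, with the Postgres table above, the second of these inserts fails with a unique-constraint violation (Oracle raises ORA-00001 in the same situation):
INSERT INTO table1 (txt, dt) VALUES ('hello', current_timestamp);
-- Running the same insert again fails, e.g. in Postgres:
-- ERROR: duplicate key value violates unique constraint
INSERT INTO table1 (txt, dt) VALUES ('hello', current_timestamp);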

oracle add column to existing table

I have already a table in oracle defined as below:
CREATE TABLE GENERAL_STATISTICS.PPLP_LOAD_GENSTAT3
(
NAME VARCHAR2(100 BYTE),
START_TIME DATE,
END_TIME DATE
)
What I would like to achieve is to add an extra column at the end (as a 4th column of the table). I execute:
ALTER TABLE PPLP_LOAD_GENSTAT3
ADD
(
ROWS_LOADED varchar2(100 BYTE)
);
I receive an error "ORA-01735: invalid ALTER TABLE option"
What would be the correct way to achieve this?
Thank you,
Best Regards
It's probably the data type of the field; try changing it to another data type.
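For what it's worth, the single-column form without parentheses is also valid Oracle syntax and may be worth trying (a sketch, not a confirmed fix for this particular ORA-01735):
-- Same column addition without the parenthesized list:
ALTER TABLE PPLP_LOAD_GENSTAT3 ADD ROWS_LOADED VARCHAR2(100 BYTE);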

Oracle converted a varchar field to number; how to convert it back to varchar

Until today we had only entered numeric values into the varchar column, so Oracle converted that varchar field to a numeric field.
Now, when we try to insert a character value, it throws ORA-01722 (invalid number).
Could anyone help me convert it back to a varchar field?
The commenters are correct: the database does not change column types on its own. In general, you must create a new column, copy the old data over to the new column, drop the original column, and rename the new column.
drop table deleteme_table;
-- zippy s/b varchar2(30), not integer
CREATE TABLE deleteme_table
(
adate DATE
, zippy INTEGER
);
-- Add the correct type to the table as a new column
ALTER TABLE deleteme_table
ADD (tempcol VARCHAR2 (30));
-- Copy the old values to the new column
UPDATE deleteme_table
SET tempcol = TO_CHAR(zippy);
-- Get rid of the original column
ALTER TABLE deleteme_table
DROP COLUMN zippy;
-- Rename to original column name
alter table deleteme_table rename column tempcol to zippy;

Hadoop and Hive optimisation

I need help on the following scenario:
1) Memo is the source table in Hive.
It has 5,493,656,359 records. Its description is as follows:
load_ts timestamp
memo_ban bigint
memo_id bigint
sys_creation_date timestamp
sys_update_date timestamp
operator_id bigint
application_id varchar(6)
dl_service_code varchar(5)
dl_update_stamp bigint
memo_date timestamp
memo_type varchar(4)
memo_subscriber varchar(20)
memo_system_txt varchar(180)
memo_manual_txt varchar(2000)
memo_source varchar(1)
data_dt string
market_cd string
Partition information:
data_dt string
market_cd string
2)
This is the target table:
CREATE TABLE IF NOT EXISTS memo_temprushi (
load_ts TIMESTAMP,
ban BIGINT,
attuid BIGINT,
application VARCHAR(6),
system_note INT,
user_note INT,
note_time INT,
date TIMESTAMP)
PARTITIONED BY (data_dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");
3)
This is the initial load statement from the source table Memo into the
target table memo_temprushi. It loads all records up to 2015-12-14:
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
INSERT INTO TABLE memo_temprushi PARTITION (DATA_DT)
SELECT LOAD_TS,MEMO_BAN, OPERATOR_ID, APPLICATION_ID,
CASE WHEN LENGTH(MEMO_SYSTEM_TXT)=0 THEN 0 ELSE 1 END,
CASE WHEN LENGTH(MEMO_MANUAL_TXT)=0 THEN 0 ELSE 1 END,
HOUR(MEMO_DATE), MEMO_DATE, DATA_DT
FROM tlgmob_gold.MEMO WHERE LOAD_TS < DATE('2015-12-15');
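As an aside, dynamic-partition inserts like the two in this question generally also require the following session settings (assuming they are not already set cluster-wide):
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;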
4)
For the incremental load I want to insert the rest of the records, i.e. from 2015-12-15 onward. I'm using the following query:
INSERT INTO TABLE memo_temprushi PARTITION (DATA_DT)
SELECT MS.LOAD_TS,MS.MEMO_BAN, MS.OPERATOR_ID, MS.APPLICATION_ID,
CASE WHEN LENGTH(MS.MEMO_SYSTEM_TXT)=0 THEN 0 ELSE 1 END,
CASE WHEN LENGTH(MS.MEMO_MANUAL_TXT)=0 THEN 0 ELSE 1 END,
HOUR(MS.MEMO_DATE), MS.MEMO_DATE, MS.DATA_DT
FROM tlgmob_gold.MEMO MS JOIN (select max(load_ts) max_load_ts from memo_temprushi) mt
ON 1=1
WHERE
ms.load_ts > mt.max_load_ts;
It launches 2 jobs. Initially it gives a warning about one stage being a cross product.
The first job completes quite soon, but the second job remains stuck at reduce 33%.
The log shows: [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: EventFetcher is interrupted.. Returning
It shows that the number of reducers is 1.
I tried to increase the number of reducers with set mapreduce.job.reduces, but it's not working.
Thanks
You can try this:
Run "select max(load_ts) max_load_ts from memo_temprushi", put the resulting value directly into the WHERE condition of the query, and remove the join.
If that works, you can develop a shell script in which the first query fetches the max value and the second query then runs without the join.
Here is a sample shell script:
max_date=$(hive -e "select max(order_date) from orders" 2>/dev/null)
hive -e "select order_date from orders where order_date >= date_sub('$max_date', 7);"
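Adapted to the tables in the question, the script might look like this (a sketch using the names from the question; untested):
# Fetch the high-water mark, then run the incremental load without the join:
max_load_ts=$(hive -e "select max(load_ts) from memo_temprushi" 2>/dev/null)
hive -e "INSERT INTO TABLE memo_temprushi PARTITION (DATA_DT)
SELECT LOAD_TS, MEMO_BAN, OPERATOR_ID, APPLICATION_ID,
CASE WHEN LENGTH(MEMO_SYSTEM_TXT)=0 THEN 0 ELSE 1 END,
CASE WHEN LENGTH(MEMO_MANUAL_TXT)=0 THEN 0 ELSE 1 END,
HOUR(MEMO_DATE), MEMO_DATE, DATA_DT
FROM tlgmob_gold.MEMO WHERE LOAD_TS > '$max_load_ts';"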

Hive update all values in a column

I have an external partitioned Hive table. One of its columns is a string named OLDDATE that holds the date in a different format (DD-MM-YY). I want to update the column to store dates in YYYY-MM-DD format. All years are 20XX.
So I thought of this
select CONCAT('20',SPLIT(OLDDATE ,'-')[2],'-',SPLIT(OLDDATE ,'-')[1],'-',SPLIT(OLDDATE ,'-')[0]) from table
This gives me the dates in the format I want. Now how do I overwrite the old date with this new date?
You can effect an update by overwriting the table with its own contents, just with the date field changed according to your transformation, like this pseudo-code:
INSERT OVERWRITE table
SELECT
col1
, col2
...
, CONCAT('20',SPLIT(OLDDATE ,'-')[2],'-',SPLIT(OLDDATE ,'-')[1],'-',SPLIT(OLDDATE ,'-')[0]) AS olddate
...
, coln
FROM table;
#user2441441
To overwrite a partitioned table:
INSERT OVERWRITE table PARTITION (p_col)
SELECT
col1
, col2
...
, CONCAT('20',SPLIT(OLDDATE ,'-')[2],'-',SPLIT(OLDDATE ,'-')[1],'-',SPLIT(OLDDATE ,'-')[0]) AS olddate
...
, coln
, p_col
FROM table;
Since it's a partitioned table, the folder names are created from the date values, so you cannot update the values in place.
One workaround would be to create a new table, run your query above, and insert the data into the new table (see the sketch below).
After that you can drop your existing table and treat the new table as your required table.
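A minimal sketch of that workaround (the names new_table/old_table, col1/col2, and the partition column p_col are assumptions standing in for your real schema):
-- Clone the schema (including partitioning), then rewrite the data with the reformatted date;
-- the dynamic-partition settings shown earlier in this thread are required:
CREATE TABLE new_table LIKE old_table;
INSERT OVERWRITE TABLE new_table PARTITION (p_col)
SELECT col1, col2,
       CONCAT('20',SPLIT(OLDDATE,'-')[2],'-',SPLIT(OLDDATE,'-')[1],'-',SPLIT(OLDDATE,'-')[0]) AS olddate,
       p_col
FROM old_table;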
