Add partition in Hive table based on a subquery - hadoop

I am trying to add a partition to a Hive table (partitioned by date).
My problem is that the date needs to be fetched from another table.
My query looks like:
ALTER TABLE my_table ADD IF NOT EXISTS PARTITION(server_date = (SELECT max(server_date) FROM processed_table));
When I run the query, Hive throws the following error:
Error: Error while compiling statement: FAILED: ParseException line 1:84 cannot recognize input near '(' 'SELECT' 'max' in constant (state=42000,code=40000)

Hive does not allow functions/UDFs to be used for the partition value; in ALTER TABLE ... ADD PARTITION the value must be a constant.
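For reference, this is the constant-only form Hive does accept (the date literal here is purely illustrative):
ALTER TABLE my_table ADD IF NOT EXISTS PARTITION (server_date = '2016-01-01');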
Approach 1:
To achieve this, you can run the first query, store the result in a shell variable, and then execute the second query with it.
server_date=$(hive -e "set hive.cli.print.header=false; select max(server_date) from processed_table;")
hive -hiveconf server_date="$server_date" -f your_hive_script.hql
Inside your script you can use the following statement (note the quotes around the substituted value, since server_date is a date string):
ALTER TABLE my_table ADD IF NOT EXISTS PARTITION (server_date = '${hiveconf:server_date}');
For more information, refer to the Hive documentation on variable substitution.
Approach 2:
In this approach, you will need to create a temporary table if the data you want to partition is not already loaded in another partitioned table.
This assumes your data does not have the server_date column.
Load the data into the temporary table and enable dynamic partitioning:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
Execute the below query (the subquery reads from processed_table, as in the original statement, and tmp_table stands for the temporary table loaded above):
INSERT OVERWRITE TABLE my_table PARTITION (server_date)
SELECT b.column1, b.column2, ..., a.server_date AS server_date
FROM (SELECT max(server_date) AS server_date FROM processed_table) a, tmp_table b;
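Putting Approach 2 together, a minimal end-to-end sketch (the staging table tmp_table, its columns, and the load path are illustrative assumptions):
CREATE TABLE tmp_table (column1 STRING, column2 STRING);
-- stage the raw, unpartitioned data
LOAD DATA INPATH '/path/to/raw/data' INTO TABLE tmp_table;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- every row receives max(server_date) from processed_table as its partition value
INSERT OVERWRITE TABLE my_table PARTITION (server_date)
SELECT b.column1, b.column2, a.server_date
FROM (SELECT max(server_date) AS server_date FROM processed_table) a
CROSS JOIN tmp_table b;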

Related

No partition predicate found for Alias even when the partition predicate is present in the query

I have a table pos.pos_inv in HDFS which is partitioned by yyyymm. Below is the query:
select DATE_ADD(to_date(from_unixtime(unix_timestamp(Inv.actvydt, 'MM/dd/yyyy'))),5),
to_date(from_unixtime(unix_timestamp(Inv.actvydt, 'MM/dd/yyyy'))),yyyymm
from pos.pos_inv inv
INNER JOIN pos.POSActvyBrdg Brdg ON Brdg.EIS_POSActvyBrdgId = Inv.EIS_POSActvyBrdgId
where to_date(from_unixtime(unix_timestamp(Inv.nrmlzdwkenddt, 'MM/dd/yyyy')))
BETWEEN DATE_SUB(to_date(from_unixtime(unix_timestamp(Inv.actvydt, 'MM/dd/yyyy'))),6)
and DATE_ADD(to_date(from_unixtime(unix_timestamp(Inv.actvydt, 'MM/dd/yyyy'))),6)
and inv.yyyymm=201501
I have provided the partition value in the query as 201501, but I still get the error:
Error while compiling statement: FAILED: SemanticException [Error 10041]: No partition predicate found for Alias "inv" Table "pos_inv"
(Schema: the partition column yyyymm is of int type, and actvydt is a date stored as a string.)
This happens because Hive is set to strict mode, which requires a query on a partitioned table to restrict itself to specific partitions (folders) in HDFS.
set hive.mapred.mode=nonstrict;
With this setting it will work (note the valid value is nonstrict, not unstrict).
The error in your query says: No partition predicate found for Alias "inv" Table "pos_inv".
So you must put a WHERE predicate on the partition field of the partitioned table (pos_inv) itself, not only on columns of the other table, as you've done.
set hive.mapred.mode=nonstrict allows you to access the whole table rather than particular partitions. In some cases reading the whole dataset is necessary, for example with window functions such as rank() over.
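A short sketch of the difference, using the pos.pos_inv table from the question:
-- strict mode: a full scan of a partitioned table is rejected at compile time
SET hive.mapred.mode=strict;
SELECT count(*) FROM pos.pos_inv;                        -- fails: no partition predicate
SELECT count(*) FROM pos.pos_inv WHERE yyyymm = 201501;  -- OK
-- nonstrict mode: full scans are allowed
SET hive.mapred.mode=nonstrict;
SELECT count(*) FROM pos.pos_inv;                        -- allowed, reads all partitions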
This happens when the two tables have the same column name (possibly the same partition column). Try to filter each table separately, with its own WHERE condition, like below:
WITH tableA AS
(
-- partition predicate and any other filters for pos.pos_inv go here
SELECT * FROM pos.pos_inv WHERE yyyymm = 201501
),
tableB AS
(
-- filters for the other table go here
SELECT * FROM pos.POSActvyBrdg
)
SELECT tableA.*, tableB.*
FROM tableA
JOIN tableB ON tableA.EIS_POSActvyBrdgId = tableB.EIS_POSActvyBrdgId;

Apache Hive - Single Insert Date Value

I'm trying to insert a date into a date column using Hive. So far, here's what I've tried:
INSERT INTO table1 (EmpNo, DOB)
VALUES ('Clerk#0008000', cast(substring(from_unixtime(unix_timestamp(cast('2016-01-01' as string), 'yyyy-MM-dd')),1,10) as date));
AND
INSERT INTO table table1 values('Clerk#0008000', cast(substring(from_unixtime(unix_timestamp(cast('2016-01-01' as string), 'yyyy-MM-dd')),1,10) as date));
AND
INSERT INTO table1 SELECT
'Clerk#0008000', cast(substring(from_unixtime(unix_timestamp(cast('2016-01-01' as string), 'yyyy-MM-dd')),1,10) as date);
But I still get:
FAILED: SemanticException [Error 10293]: Unable to create temp file for insert values Expression of type TOK_FUNCTION not supported in insert/values
OR
FAILED: ParseException line 2:186 Failed to recognize predicate '<EOF>'. Failed rule: 'regularBody' in statement
Hive ACID has been enabled on the ORC-based table, and simple inserts without dates are working.
I think I'm missing something really simple, but I can't put my finger on it.
OK, I found it. I feel like a doofus now.
It was as simple as:
INSERT INTO table1 VALUES ('Clerk#0008000', '2016-01-01');
Hive implicitly converts the 'yyyy-MM-dd' string literal to a DATE on insert, so no conversion function is needed in the VALUES clause (and, per the TOK_FUNCTION error above, VALUES would not accept one anyway).
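For context, "Hive ACID has been enabled on the ORC based table" implies a table shaped roughly like this (illustrative DDL, since the question doesn't show it; pre-3.0 ACID tables must be bucketed, stored as ORC, and marked transactional):
CREATE TABLE table1 (EmpNo STRING, DOB DATE)
CLUSTERED BY (EmpNo) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');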

Excluding the partition field from select queries in Hive

Suppose I have a table definition as follows in Hive (the actual table has around 65 columns):
CREATE EXTERNAL TABLE S.TEST (
COL1 STRING,
COL2 STRING
)
PARTITIONED BY (extract_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\007'
LOCATION 'xxx';
Once the table is created, when I run hive -e "describe s.test", I see extract_date as one of the columns on the table. Doing a select * from s.test also returns extract_date column values. Is it possible to exclude this virtual(?) column when running select queries in Hive?
Change this property
set hive.support.quoted.identifiers=none;
and run the query as
SELECT `(extract_date)?+.+` FROM <table_name>;
I tested it and it works fine.
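Applied to the table from the question, that would be:
set hive.support.quoted.identifiers=none;
-- the backquoted pattern is a regex column specification matching every column except extract_date
SELECT `(extract_date)?+.+` FROM s.test;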

Hive: Insert into hive table with column using select 1

Let's say I have a hive table test_entry with column called entry_id.
hive> desc test_entry;
OK
entry_id int
Time taken: 0.4 seconds, Fetched: 1 row(s)
Suppose I need to insert one row into the table above using select 1 (which returns 1), i.e. a syntax that looks like the below:
hive> insert into table test_entry select 1;
But I get the below error:
FAILED: NullPointerException null
So effectively, I would like to insert one row for entry_id whose value will be 1, with such a select statement (without referring to another table).
How can this be done?
Hive does not support what you're trying to do. Inserts into ORC-based tables were introduced in Hive 0.13.
Prior to that, you have to specify a FROM clause if you're doing an INSERT ... SELECT.
A workaround might be to create an external table with one row and do the following:
INSERT .. SELECT 1 FROM table_with_one_row
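A minimal sketch of that workaround (a plain managed table with LOAD DATA is used here for brevity instead of an external one; the local file path is an assumption, and the file should contain a single line):
CREATE TABLE table_with_one_row (dummy STRING);
LOAD DATA LOCAL INPATH '/tmp/one_line.txt' INTO TABLE table_with_one_row;
INSERT INTO TABLE test_entry SELECT 1 FROM table_with_one_row;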

Hive and Sqoop partition

I have Sqooped data from a Netezza table and the output file is in HDFS, but one column is a timestamp and I want to load it as a date column in my Hive table. Using that column I want to create a partition on date. How can I do that?
Example: in HDFS the data looks like 2013-07-30 11:08:36.
In Hive I want to load only the date (2013-07-30), not the timestamp, and I want to partition on that column DAILY.
How can I pass the partition-by column dynamically?
I have tried loading the data into one table as a source. In the final table I will do insert overwrite table partition by (date_column=dynamic date) select * from table1.
Set these 2 properties -
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
And the query can look like this (final_table is a placeholder for your target table):
INSERT OVERWRITE TABLE final_table PARTITION (DATE_STR)
SELECT
  -- ... your other columns ...
  -- the partition column must be the last column in the select list
  to_date(date_column) AS DATE_STR
FROM table1;
You can also explore the two hive-import partition options of Sqoop; if it is an incremental import, you will be able to get the current day's partition.
--hive-partition-key
--hive-partition-value
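A hedged sketch of how those flags fit into a Sqoop command (connection string, credentials, table, and partition value are all illustrative):
sqoop import \
  --connect jdbc:netezza://nz-host:5480/mydb \
  --username user -P \
  --table SRC_TABLE \
  --hive-import \
  --hive-table my_hive_table \
  --hive-partition-key server_date \
  --hive-partition-value '2013-07-30'
Note that --hive-partition-value takes a single static value, so for a truly dynamic daily partition the INSERT OVERWRITE approach above is the more flexible route.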
You can just load the EMP_HISTORY table from EMP by enabling dynamic partitioning and converting the timestamp to a date using the to_date function.
The code might look something like this:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE EMP_HISTORY PARTITION (join_date)
SELECT e.name AS name, e.age AS age, e.salary AS salary, e.loc AS loc, to_date(e.join_date) AS join_date FROM EMP e;
