How to fix a partition predicate error without using set hive.mapred.mode=nonstrict - hadoop

Is there any way to get a Hive query over partitioned tables to compile in strict mode without using set hive.mapred.mode=nonstrict?
I have two tables, both partitioned by dte. When I union the two tables and filter with two WHERE conditions, I get a partition predicate error.
Query
with a as (select "table1" as table_name,column1,column2,column3,dte from table_one
union all
select "table2" as table_name,column1,column2,column3,dte from table_two)
select * from a where dte='2017-08-01' and table_name='table1';
Error
Error: Error while compiling statement: FAILED: SemanticException
[Error 10041]: No partition predicate found for Alias "table_one"
Table "table_one" (state=42000,code=10041)

Related

Does Impala support a Java UDF in the WHERE clause?

I can use a Java-based UDF in Hive and Impala, but Impala throws a ClassNotFound error when the UDF is called in the WHERE clause.
The UDF cannot be used when referenced in the WHERE clause, but works properly when it is referenced only in the SELECT list, with Impala 2.9.0-cdh5.12.1.
In Hive, select udfjson(memo,state) from tableA where udfjson(memo,state) = 0 and name = 'test' works properly, but not in Impala.
Executing select udfjson(memo,state) from tableA where name = 'test' in Impala is OK. The UDF works in Impala only when it is not in the WHERE clause.
Here is the error:
Error(255): Unknown error 255
Root cause: NoClassDefFoundError: org/apache/hadoop/hdfs/DFSInputStream$ByteArrayStrategy
Is it possible to reference a UDF in the WHERE clause with Impala?
Use sub-query:
select * from
(
select udfjson(memo,state) as state from tableA where name = 'test'
)s
where s.state=0

Global temporary table with Hibernate

I'm trying to replace a giant IN clause (hundreds of values) with a JOIN for performance reasons, so I created a global temp table (Oracle) hoping that may be a viable alternative:
CREATE GLOBAL TEMPORARY TABLE TMP_USR_GUID (
usr_guid varchar(20)
)
ON COMMIT DELETE ROWS
When I run my SQL manually, it works fine:
INSERT ALL
INTO ent.tmp_usr_guid VALUES ('00JD49W7IJ93ZU5MBWBQ')
-- as many INTO statements as I would have IN parameters
SELECT * FROM DUAL;
SELECT u.guid, u.first_name, u.last_name, ...
FROM usr u
JOIN ...
JOIN ...
JOIN tmp_usr_guid tug ON u.guid = tug.usr_guid
When I try running it as a native SQL statement using Hibernate (5.2.12.FINAL) it throws:
org.hibernate.QueryException: unexpected char: ';' [INSERT ALL INTO
ent.tmp_usr_guid VALUES ('00JD49W7IJ93ZU5MBWBQ') SELECT * FROM DUAL;
SELECT u.guid,
Any thoughts on the correct approach to take?
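A likely cause is that Hibernate's createNativeQuery expects a single statement, so the semicolon separating the two statements trips the parser. One approach, sketched below, is to run each statement (without a trailing semicolon) as its own native query inside the same transaction; staying in one transaction matters because ON COMMIT DELETE ROWS empties the table at commit:
-- first native query: populate the temporary table (no trailing semicolon)
INSERT ALL
INTO ent.tmp_usr_guid VALUES ('00JD49W7IJ93ZU5MBWBQ')
-- as many INTO rows as needed
SELECT * FROM DUAL
-- second native query, in the same transaction: join against the temporary table
SELECT u.guid, u.first_name, u.last_name
FROM usr u
JOIN tmp_usr_guid tug ON u.guid = tug.usr_guid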

Add a partition to a Hive table based on a sub-query

I am trying to add a partition to a Hive table (partitioned by date).
My problem is that the date needs to be fetched from another table.
My query looks like:
ALTER TABLE my_table ADD IF NOT EXISTS PARTITION(server_date = (SELECT max(server_date) FROM processed_table));
When I run the query, Hive throws the following error:
Error: Error while compiling statement: FAILED: ParseException line 1:84 cannot recognize input near '(' 'SELECT' 'max' in constant (state=42000,code=40000)
Hive does not allow functions/UDFs (or sub-queries) to be used for the partition column value.
Approach 1:
To achieve this, you can run the first query, store the result in a shell variable, and then execute the ALTER statement using that variable.
server_date=$(hive -e "set hive.cli.print.header=false; select max(server_date) from processed_table;")
hive -hiveconf server_date="$server_date" -f your_hive_script.hql
Inside your script you can use the following statement:
ALTER TABLE my_table ADD IF NOT EXISTS PARTITION(server_date =${hiveconf:server_date});
For more information on Hive variable substitution, refer to the Hive documentation on variable substitution.
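Hive also supports user-defined variables via --hivevar; a sketch of the script body under that approach (the quotes around the substituted value assume server_date is a string- or date-typed partition column and can be dropped for an int partition):
-- invoked as: hive --hivevar server_date="$server_date" -f your_hive_script.hql
ALTER TABLE my_table ADD IF NOT EXISTS PARTITION (server_date='${hivevar:server_date}');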
Approach 2:
In this approach you need to create a temporary table, assuming the partition data you expect is not already loaded in some other partitioned table.
Considering that your data doesn't have the server_date column:
Load the data into a temporary table (referred to as temp_table below).
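A minimal sketch of that staging step; the table name temp_table, the column list, and the input path are placeholders for illustration:
-- placeholder columns; mirror my_table's columns except server_date
CREATE TABLE temp_table (column1 STRING, column2 STRING);
-- hypothetical input location
LOAD DATA INPATH '/path/to/input' INTO TABLE temp_table;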
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
Execute the query below:
INSERT OVERWRITE TABLE my_table PARTITION (server_date)
SELECT b.column1, b.column2,........, a.server_date AS server_date
FROM temp_table b
CROSS JOIN (SELECT max(server_date) AS server_date FROM processed_table) a;

No partition predicate found for Alias even when the partition predicate is present in the query

I have a table pos.pos_inv in HDFS which is partitioned by yyyymm. Below is the query:
select DATE_ADD(to_date(from_unixtime(unix_timestamp(Inv.actvydt, 'MM/dd/yyyy'))),5),
to_date(from_unixtime(unix_timestamp(Inv.actvydt, 'MM/dd/yyyy'))),yyyymm
from pos.pos_inv inv
INNER JOIN pos.POSActvyBrdg Brdg ON Brdg.EIS_POSActvyBrdgId = Inv.EIS_POSActvyBrdgId
where to_date(from_unixtime(unix_timestamp(Inv.nrmlzdwkenddt, 'MM/dd/yyyy')))
BETWEEN DATE_SUB(to_date(from_unixtime(unix_timestamp(Inv.actvydt, 'MM/dd/yyyy'))),6)
and DATE_ADD(to_date(from_unixtime(unix_timestamp(Inv.actvydt, 'MM/dd/yyyy'))),6)
and inv.yyyymm=201501
I have provided the partition value in the query as 201501, but I still get the error:
Error while compiling statement: FAILED: SemanticException [Error 10041]: No partition predicate found for Alias "inv" Table "pos_inv"
(Schema) The partition column yyyymm is of int type, and actvydt is a date stored as string type.
This happens because Hive is set to strict mode.
Strict mode forces a query on a partitioned table to access only the relevant partition/folder in HDFS.
set hive.mapred.mode=nonstrict; will make it work.
The error in your query says: No partition predicate found for Alias "inv" Table "pos_inv".
So you must put the WHERE clause on the partition fields of the partitioned table (pos_inv), and not on the other one, as you have done.
set hive.mapred.mode=nonstrict allows you to access the whole table rather than particular partitions. In some cases reading the whole dataset is necessary, for example when using rank() over.
This can also happen when the two tables have the same column name (possibly the same partition column). Try handling the tables separately, each with its own WHERE condition, as below:
WITH tableA as
(
-- All your where clause here
),
tableB AS
(
-- All your where clause here
)
select tableA.*, tableB.* from tableA join tableB on <join condition>;
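Applied to the query above, a sketch (it assumes yyyymm is pos_inv's partition column; if pos.POSActvyBrdg is partitioned as well, give it its own partition filter inside its block):
WITH inv AS (
SELECT EIS_POSActvyBrdgId, actvydt, nrmlzdwkenddt, yyyymm
FROM pos.pos_inv
WHERE yyyymm = 201501
),
brdg AS (
SELECT EIS_POSActvyBrdgId
FROM pos.POSActvyBrdg
-- add POSActvyBrdg's own partition filter here if that table is partitioned
)
SELECT DATE_ADD(to_date(from_unixtime(unix_timestamp(inv.actvydt, 'MM/dd/yyyy'))),5),
to_date(from_unixtime(unix_timestamp(inv.actvydt, 'MM/dd/yyyy'))),
inv.yyyymm
FROM inv
INNER JOIN brdg ON brdg.EIS_POSActvyBrdgId = inv.EIS_POSActvyBrdgId
WHERE to_date(from_unixtime(unix_timestamp(inv.nrmlzdwkenddt, 'MM/dd/yyyy')))
BETWEEN DATE_SUB(to_date(from_unixtime(unix_timestamp(inv.actvydt, 'MM/dd/yyyy'))),6)
AND DATE_ADD(to_date(from_unixtime(unix_timestamp(inv.actvydt, 'MM/dd/yyyy'))),6);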

INSERT OVERWRITE for multiple inserts in Hive which have the same partition with the same static value

Hi Guys,
So I am trying to do multiple inserts, and I am able to do it successfully, but if two of the queries have the same partition with the same static value assigned, it gives me the following error:
:15:02:22 [EXPLAIN - 0 row(s), 0.000 secs] [Error Code: 10087, SQL State:
42000] Error while compiling statement: FAILED: SemanticException
[Error 10087]: The same output cannot be present multiple times:
table_name#id=0
Here the first insert happens successfully, but because the second insert has the same value assigned for id,
which is 0, it gives the above error. Please let me know a
workaround. Thanks :)
FROM (
Select * from Table_Name
)Query
INSERT OVERWRITE TABLE Table_Name PARTITION(id=0)
select column1,column2,column3
GROUP BY column1,column2,column3
INSERT OVERWRITE TABLE Table_Name PARTITION(id=0)
select column1,count(*) as column2
GROUP BY column1
Instead of multiple inserts, you can just do one insert with a union of the two queries.
FROM (
Select * from Table_Name
)Query
INSERT OVERWRITE TABLE Table_Name PARTITION(id=0)
select <query 1>
UNION ALL
select <query 2>
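A sketch of what that could look like using the columns from the question. UNION ALL requires both sides to return the same number and types of columns, so the second select is padded here; the padding column, the STRING cast for the NULL, and any casts needed to align the two branches' column types are assumptions made for illustration:
INSERT OVERWRITE TABLE Table_Name PARTITION(id=0)
select column1, column2, column3
from (
select column1, column2, column3 from Table_Name group by column1, column2, column3
UNION ALL
select column1, count(*) as column2, CAST(NULL AS STRING) as column3 from Table_Name group by column1
) u;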
