Hadoop - error message when declaring variable within query - hadoop

I have tried the following query within HUE's Beeswax Query Editor:
SET MAXDATE=(SELECT MAX(DATA_DAY) FROM DB1.DESTINATION_TABLE);
SELECT COUNT(*) FROM DB2.SOURCE_TABLE
WHERE YEAR(DATA_DAY) >= '2015'
AND DATA_DAY > ${HIVECONF:MAXDATE};
This query will not run and produces the following error message:
FAILED: ParseException line 1:4 missing KW_ROLE at 'MAXDATE' near 'MAXDATE' line 1:11 missing EOF at '=' near 'MAXDATE'
Any advice on what the problem is? I don't understand what the KW_ROLE message means.
I come from a SQL Server background and would just run the following within SQL Server, but am trying to find a functional Hadoop/Hive equivalent.
SELECT COUNT(*) FROM DB2.SOURCE_TABLE
WHERE YEAR(DATA_DAY) >= '2015'
AND DATA_DAY > (SELECT MAX(DATA_DAY) FROM DB1.DESTINATION_TABLE)

Query which you have tried contains syntax issue. HiveConf should surrounded by single quotes.
SET MAXDATE=(SELECT MAX(DATA_DAY) FROM DB1.DESTINATION_TABLE);
SELECT COUNT(*) FROM DB2.SOURCE_TABLE
WHERE YEAR(DATA_DAY) >= '2015'
AND DATA_DAY > '${HIVECONF:MAXDATE}';

As far as I know, hive support the following syntax too.
SELECT COUNT(*) FROM DB2.SOURCE_TABLE a
JOIN
(SELECT MAX(DATA_DAY) AS max_date FROM DB1.DESTINATION_TABLE) b
WHERE YEAR(a.DATA_DAY) >= '2015'
AND a.DATA_DAY > b.max_date;
But it's not a good implementation if there are bunches of data on DB1.DESTINATION_TABLE.
In such case each query would task lots of sub-querys in SELECT MAX(DATA_DAY) FROM DB1.DESTINATION_TABLE.
If possible, you could store your SELECT MAX(DATA_DAY) FROM DB1.DESTINATION_TABLE result in another table, maybe Max_table.
Then the sql would be like this:
SELECT COUNT(*) FROM DB2.SOURCE_TABLE
JOIN Max_table
WHERE YEAR(DB2.SOURCE_TABLE.DATA_DAY) >= '2015' and
DB2.SOURCE_TABLE.DATA_DAY > (Max_table.DATA_DAY)

Related

Select Statement Using Dual Table As A Sub Query In Oracle Generates "From keyword not found" Error

I get below error message while executing following queries using a select statement from dual table as a sub query:
Error: ORA-00923: FROM keyword not found where expected
Query 1:
select a.dt_1, a.dt_2, a.dt_1=a.dt_2 as "match_type" from
(select to_date(replace('2020-05-14 00:00:00',' 00:00:00',''), 'yyyy/mm/dd') as "dt_1", to_date('14/05/2020','dd/mm/yyyy') as "dt_2" from dual) a
Query 2:
select a.dt_1, a.dt_2, a.dt_1=a.dt_2 as match_type from
(select to_date(replace('2020-05-14 00:00:00',' 00:00:00',''), 'yyyy/mm/dd') as dt_1, to_date('14/05/2020','dd/mm/yyyy') as dt_2 from dual) a
When I individually run sub query it executes as expected, however when I run the whole statement it generates error.
Any help is appreciated.
Your match_type column is generating the error. Oracle doesn't support relational operator matching. You may try below query -
SELECT a.dt_1,
a.dt_2,
CASE WHEN a.dt_1=a.dt_2 THEN 'TRUE' ELSE 'FLASE' END AS "match_type"
FROM (SELECT TO_DATE(REPLACE('2020-05-14 00:00:00',' 00:00:00',''), 'yyyy/mm/dd') as "dt_1",
TO_DATE('14/05/2020','dd/mm/yyyy') as "dt_2"
FROM DUAL) a;

Convert Oracle (Cross Join?) to Netezza when using comma separated table list instead of JOIN keywords

Below is is some Oracle PL/SQL code to join tables without using actual JOIN keywords. This looks like a cross join? How would I convert to Netezza SQL code? That's where I'm stuck.
SELECT COUNT(*)
FROM TABLE_A A, TABLE_A B
WHERE A.X = 'Y' AND A.PATH LIKE '/A/A/A'
AND B.X = 'Z' AND B.PATH LIKE '/B/B/B';
Oracle Cross Join:
http://www.sqlguides.com/sql_cross_join.php
Here's what I tried so far:
SELECT *
from TABLE_A A
cross join (
select * from TABLE_A
) B
WHERE
A.X = 'Y' AND A.PATH LIKE '/A/A/A'
AND B.X = 'Z' AND B.PATH LIKE '/B/B/B';
EDIT:
a_horse_with_no_name:
When I use either syntax in Netezza for the COUNT(*) in the very beginning, it works and returns a count of 60, which matches the first query above when running in Oracle. Without the WHERE clause in Netezza returns 125316 results, which matches the first query above when running in Oracle. When I use either syntax in Netezza for the SELECT * in the very beginning, I get error
ERROR [HY000] ERROR: Record size 70418 exceeds internal limit of 65535 bytes'
Had to use explicit columns in Netezza when doing a CROSS JOIN. Using SELECT * throws the error as indicated in my question EDIT. Also had to escape the '%' character by escaping nothing. Thank you a_horse_with_no_name. Cheers! "Where everybody knows your name." ;-)
select A.CODE, B.CODE, LOWER(A.DIM), LOWER(B.DIM)
FROM TABLE_A A
cross join TABLE_A B
WHERE A.PATH LIKE '\A\A\A%' ESCAPE '' AND A.X = 'Y'
AND B.PATH LIKE '\B\B\B%' ESCAPE '' AND B.X = 'Y'

ORA-00979 Not a GROUP BY Expression with complex EXTRACTVALUE

I'm trying to execute a rather simple Oracle query against a table with a XMLTYPE column:
Select POB_CODPOL, CODSIS From (
Select T1.POB_CODPOL, EXTRACTVALUE(T1.POB_XMLPOL, '/Polbas/polfun[nomfun="filterBySystem"]/extpar[codele="codsis"]/valele/text()') CODSIS
From TDTX_POLITICA_CLOB T1
Where T1.POB_CODEMP = '001840'
)
Group By POB_CODPOL, CODSIS
This throws a ORA-00979 Not a GROUP BY Expression, which I don't really understand.
Even worse: when I execute the exact same query, but with a simplified XPATH query does work:
Select POB_CODPOL, CODSIS From (
Select T1.POB_CODPOL, EXTRACTVALUE(T1.POB_XMLPOL, 'Polbas/codpol/text()') CODSIS
From TDTX_POLITICA_CLOB T1
Where T1.POB_CODEMP = '001840'
)
Group By POB_CODPOL, CODSIS
It looks like Oracle doesn't like conditions like [nomfun="filterBySystem"] when using a GROUP BY (without the grouping clause, everything works fine).
Any idea on why this can be happening?
Edit: the result of the inner query is rather simple:
EXTRACTVALUE is deprecated.
Oracle recommends to use XMLQUERY, XMLTABLE for it.
This one should work:
WITH t as
(SELECT T1.POB_CODPOL, x.CODSIS
FROM TDTX_POLITICA_CLOB T1
NATURAL JOIN XMLTABLE('/Polbas/polfun[nomfun="filterBySystem"]/extpar[codele="codsis"]/valele‌​'
PASSING POB_XMLPOL COLUMNS
CODSIS VARCHAR2(50) PATH '/') x
Where T1.POB_CODEMP = '001840')
SELECT POB_CODPOL, CODSIS
FROM t
GROUP BY POB_CODPOL, CODSIS;
There's a Bug 28588011 : "ORA-00979: NOT A GROUP BY EXPRESSION" WHEN TABLE FORMAT IS BINARY XML.
Severity 2 - Severe Loss of Service
Created 02-Sep-2018
Status 11 - Code/Hardware Bug (Response/Resolution)
WORKAROUND No
I guess they won't fix it. But the hint NO_XML_QUERY_REWRITE (that I use for other XML related bugs, too) worked for me.

How do I store the result of a query into a variable in HiveQL and then use it in another select statement?

How do I store the result of a query into a variable in HiveQL and then use it in another select statement?
For example, whenever I store a normal variable and use it in a select statement it works just fine.
SET a=1; SELECT CASE WHEN b > ${hiveconf:a} THEN NULL ELSE 1 from my_table
But when I try and put a query into the variable, its seems to store the query instead of running it and storing the result. This then results in an error.
SET a=SELECT MAX(num) FROM my_other_table; SELECT CASE WHEN b > ${hiveconf:a} THEN NULL ELSE 1 from my_table
The error being: cannot recognize input near 'select' 'max' '(' in select clause
Does anyone know a work around to this? I am using Hive 0.13
You cant do that only by hive.
If your hive query is controlled by outer script like shell or python.You can perform the first query, get the output and then put it in the next sql.
Or you can change your sql to use join.Your example code can be changed to
select case when b > t.a then NULL else 1 from my_table
join (select max(num) a from my_other_table) t

Why can't hive recognize alias named in select part?

Here's the scenario: When I invoke hql as follows, it tells me that it cannot find alias for u1.
hive> select user as u1, url as u2 from rank_test where u1 != "";
FAILED: SemanticException [Error 10004]: Line 1:50 Invalid table alias or column reference 'u1': (possible column names are: user, url)
This problem is the same as when I try to use count(*) as cnt. Could anyone give me some hint on how to use alias in where clause? Thanks a lot!
hive> select user, count(*) as cnt from rank_test where cnt >= 2 group by user;
FAILED: ParseException line 1:58 missing EOF at 'where' near 'user'
The where clause is evaluated before the select clause, which is why you can't refer to select aliases in your where clause.
You can however refer to aliases from a derived table.
select * from (
select user as u1, url as u2 from rank_test
) t1 where u1 <> "";
select * from (
select user, count(*) as cnt from rank_test group by user
) t1 where cnt >= 2;
Side note: a more efficient way to write the last query would be
select user, count(*) as cnt from rank_test group by user
having count(*) >= 2
If I remember correctly, you can refer to the alias in having i.e. having cnt >= 2
I was able to use Alias in my Hive select statement using backtick symbol ``.
SELECT COL_01 AS `Column_A`;
The above solution worked for Hive version 1.2.1.
reference link

Resources