SemanticException [Error 10007]: Ambiguous column reference _c1 - hadoop

I'm facing an issue when using four levels of nesting in a Hive query. Below is the query I'm executing:
SELECT *,
SUM(qtod.amount) OVER (PARTITION BY qtod.id, qtod.year_begin_date ORDER BY qtod.tran_date)
FROM (SELECT *,
SUM(mtod.amount) OVER (PARTITION BY mtod.id, mtod.quarter_begin_date ORDER BY mtod.tran_date)
FROM (SELECT *,
SUM(wtod.amount) OVER (PARTITION BY wtod.id, wtod.month_begin_date ORDER BY wtod.tran_date)
FROM (select id,
year_begin_date,
quarter_begin_date,
month_begin_date,
week_begin_date,
tran_date,
amount,
SUM(amount)
OVER (PARTITION BY id,week_begin_date ORDER BY tran_date) FROM table_name)wtod)mtod)qtod;
If I exclude the fourth level of nesting it works fine, but when I include it I get the error below:
FAILED: SemanticException [Error 10007]: Ambiguous column reference _c1 in qtod
To avoid the nesting, I tried doing it another way:
SELECT * FROM
(SELECT id,year_begin_date,tran_date,amount,SUM(amount) OVER (PARTITION BY id,year_begin_date ORDER BY tran_date) FROM yeartodate)ytod
JOIN
(SELECT *, SUM(mtod.amount) OVER (PARTITION BY mtod.id, mtod.quarter_begin_date ORDER BY mtod.tran_date)
FROM (SELECT *, SUM(wtod.amount) OVER (PARTITION BY wtod.id, wtod.month_begin_date ORDER BY wtod.tran_date)
FROM (select id,
year_begin_date,
quarter_begin_date,
month_begin_date,
week_begin_date,
tran_date,
amount,
SUM(amount)
OVER (PARTITION BY id,week_begin_date ORDER BY tran_date) FROM table_name)wtod)mtod)qtod
ON qtod.id=ytod.id AND qtod.tran_date=ytod.tran_date;
I still get the same error.
After searching the web, I found a JIRA issue reporting this as a bug in Hive itself.
Since that JIRA is now fixed and the patch is available in Hive 14, I tried running the query on Hive 14 (HDP).
I still get the same error.
Please share your suggestions.

Non-aliased function calls within a SELECT are given generated column names such as _c0, _c1, _c2, and so on. In this case each outer SELECT has a single non-aliased window call right after the *, so several levels end up generating the same name, _c1.
The issue is that each level does a SELECT * from the sub-query below it, which already carries a _c1 column, and then appends another function call that also maps to _c1. The same column name now appears twice, hence the ambiguous column reference error.
The solution is to alias every function call so that none of them uses the default _c1 name. Note that in HiveQL the alias goes after the OVER (...) clause, not between the function and OVER:
SELECT * FROM
(SELECT id,year_begin_date,tran_date,amount,SUM(amount) OVER (PARTITION BY id,year_begin_date ORDER BY tran_date) AS ytod_amount_sum FROM yeartodate)ytod
JOIN
(SELECT *, SUM(mtod.amount) OVER (PARTITION BY mtod.id, mtod.quarter_begin_date ORDER BY mtod.tran_date) AS mtod_amount_sum
FROM (SELECT *, SUM(wtod.amount) OVER (PARTITION BY wtod.id, wtod.month_begin_date ORDER BY wtod.tran_date) AS wtod_amount_sum
FROM (select id,
year_begin_date,
quarter_begin_date,
month_begin_date,
week_begin_date,
tran_date,
amount,
SUM(amount)
OVER (PARTITION BY id,week_begin_date ORDER BY tran_date) AS amount_sum FROM table_name)wtod)mtod)qtod
ON qtod.id=ytod.id AND qtod.tran_date=ytod.tran_date;
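To try the aliasing pattern without a Hive cluster, here is a minimal sketch that runs two of the nested levels (week-to-date, then month-to-date) in SQLite through Python's sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later); the table name txn, the reduced column set and the sample rows are hypothetical. SQLite is not Hive and does not reproduce the _c1 error itself, but the query shows the idea: every window call carries its own alias, so the wtod.* pulled up from the level below never collides with the new column.

```python
import sqlite3


def running_totals(rows):
    """Week-to-date and month-to-date running sums, one alias per window call.

    rows: (id, month_begin_date, week_begin_date, tran_date, amount) tuples.
    The table name `txn` and this column subset are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE txn (id INTEGER, month_begin_date TEXT, "
        "week_begin_date TEXT, tran_date TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT wtod.*,
               -- outer level: month-to-date sum, aliased so it cannot clash
               SUM(wtod.amount) OVER (PARTITION BY wtod.id, wtod.month_begin_date
                                      ORDER BY wtod.tran_date) AS mtd_sum
        FROM (SELECT id, month_begin_date, week_begin_date, tran_date, amount,
                     -- inner level: week-to-date sum with its own alias
                     SUM(amount) OVER (PARTITION BY id, week_begin_date
                                       ORDER BY tran_date) AS wtd_sum
              FROM txn) wtod
        ORDER BY id, tran_date
    """
    cur = conn.execute(query)
    names = [col[0] for col in cur.description]  # output column names
    result = cur.fetchall()
    conn.close()
    return names, result


if __name__ == "__main__":
    sample = [
        (1, "2024-01-01", "2024-01-01", "2024-01-02", 10),
        (1, "2024-01-01", "2024-01-01", "2024-01-03", 5),
        (1, "2024-01-01", "2024-01-08", "2024-01-09", 7),
    ]
    names, result = running_totals(sample)
    print(names)
    for row in result:
        print(row)
```

Adding a third or fourth level works the same way: wrap the query, select the wrapper's * plus one more SUM(...) OVER (...) with a fresh alias.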

Related

Hive: How to output total row count as a variable

I have a dataset that I'm de-duping with the following code:
select session_id, sol_id, id, session_context_code, date
from (
select *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id, date) as rn,
substr(case_id,2,9) as id
from df.t1_data
)undup
where undup.rn =1
order by session_id, sol_id, date
I want to add a variable that stores the total count of rows after de-duplication, so I tried count(*):
select session_id, sol_id, id, session_context_code, date,count(*) as total
from (
select *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id,date) as rn,
substr(case_id,2,9) as id
from df.t1_data
)undup
where undup.rn =1
order by session_id, sol_id, date
The error I received:
ERROR: Execute error: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10025]: Line 1:44 Expression not in GROUP BY key 'session_id'
I just want to output a count, as a variable, of all distinct records by session_id and sol_id after de-duplicating them by row number. How do I incorporate that into the code?
I tried Gomz's suggestion but received this error:
ERROR: Execute error: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseException line 1:614 missing EOF at 'group' near 'nifi_date'
Code:
select session_id, solicit_id, nifi_date,id, session_context_code,count(*) as total
from (
select *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id) as rn,
substr(case_id,2,9) as id
from df.t1_data
)undup
where undup.rn =1 and
session_context_code in ("4","3") and
order by session_id, sol_id, nifi_date
group by session_id, sol_id, nifi_date,id, session_context_code
When a Hive query selects COUNT(*) together with other columns, every non-aggregated column in the SELECT clause must also appear in a GROUP BY clause.
Some samples:
SELECT COUNT(*) FROM employees;
SELECT id, name, COUNT(*) FROM employees GROUP BY id, name;
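In your scenario, the query should look like the one below. Note that GROUP BY must come before ORDER BY, that ORDER BY can only use grouped columns, and that the WHERE clause must not end with a dangling AND: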
select session_id, sol_id, id, session_context_code, count(*) as total
from (
select *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id,date) as rn,
substr(case_id,2,9) as id
from df.t1_data
)undup
where undup.rn =1
GROUP BY session_id, sol_id, id, session_context_code
order by session_id, sol_id
Update: If you only want to count distinct records by session_id and sol_id, the query can be as follows:
select session_id, sol_id, count(*) as total
from (
select *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id,date) as rn,
substr(case_id,2,9) as id
from df.t1_data
)undup
where undup.rn =1
GROUP BY session_id, sol_id
order by session_id, sol_id;
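A minimal way to check the de-dup-then-count logic locally is to run it in SQLite from Python, assuming SQLite 3.25 or later for ROW_NUMBER(). The table mirrors df.t1_data but keeps only three hypothetical columns. The per-group query follows the answer (every non-aggregated column grouped, ORDER BY on grouped columns only); the single overall total, which is what the question first asked for, needs no GROUP BY at all.

```python
import sqlite3


def dedup_counts(rows):
    """De-duplicate with ROW_NUMBER, then count the surviving rows.

    rows: (session_id, sol_id, date) tuples; the reduced column set is
    hypothetical and stands in for df.t1_data from the question.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1_data (session_id TEXT, sol_id TEXT, date TEXT)")
    conn.executemany("INSERT INTO t1_data VALUES (?, ?, ?)", rows)
    # Keep one row per (session_id, sol_id, date), as in the question.
    dedup = """
        SELECT session_id, sol_id, date
        FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id, date) AS rn
              FROM t1_data) undup
        WHERE undup.rn = 1
    """
    # Count per group: non-aggregated columns are all in GROUP BY,
    # and ORDER BY only uses grouped columns.
    per_group = conn.execute(
        f"SELECT session_id, sol_id, COUNT(*) AS total FROM ({dedup}) d "
        "GROUP BY session_id, sol_id ORDER BY session_id, sol_id"
    ).fetchall()
    # A single overall total of de-duplicated rows.
    overall = conn.execute(f"SELECT COUNT(*) FROM ({dedup}) d").fetchone()[0]
    conn.close()
    return per_group, overall


if __name__ == "__main__":
    sample = [
        ("s1", "a", "2024-01-01"),
        ("s1", "a", "2024-01-01"),  # exact duplicate, removed by rn = 1
        ("s1", "a", "2024-01-02"),
        ("s2", "b", "2024-01-01"),
    ]
    print(dedup_counts(sample))
```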
In other words, put only the columns you are counting by in SELECT and GROUP BY.
If you need more columns in the result than the ones you count by, you can create a temporary table holding just the grouped columns and their count, and then join it back to the original table. For example, if you need columns c, d, e and f from the table but the count only by columns a and b, you can do something like this:
CREATE TABLE tmp AS
SELECT a, b, count(*) AS cnt
FROM table1
GROUP BY a,b;
Then do a JOIN between tmp and table1 on columns a and b:
SELECT y.a, y.b, y.cnt, x.c, x.d, x.e, x.f
FROM tmp y
JOIN table1 x
ON y.a=x.a
AND y.b=x.b;
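The temporary-table-and-join approach can be sketched in SQLite from Python as follows, assuming SQLite 3.25 or later. table1 and its columns a, b, c are the answer's placeholder names; the sample rows are made up. The block also computes COUNT(*) OVER (PARTITION BY a, b), a window-function alternative not mentioned in the answer, which attaches the same per-group count to every detail row without a temporary table.

```python
import sqlite3


def counts_with_detail(rows):
    """Attach the (a, b) group count to every row of table1, two ways.

    rows: (a, b, c) tuples; table1/a/b/c are placeholder names.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (a INTEGER, b INTEGER, c TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    # Step 1: grouped counts, with the count aliased so it can be selected later.
    conn.execute(
        "CREATE TEMP TABLE tmp AS "
        "SELECT a, b, COUNT(*) AS cnt FROM table1 GROUP BY a, b"
    )
    # Step 2: join the counts back to every detail row.
    joined = conn.execute(
        "SELECT x.a, x.b, x.c, y.cnt FROM tmp y "
        "JOIN table1 x ON y.a = x.a AND y.b = x.b "
        "ORDER BY x.a, x.b, x.c"
    ).fetchall()
    # Alternative: a window COUNT gives the same result in a single query.
    windowed = conn.execute(
        "SELECT a, b, c, COUNT(*) OVER (PARTITION BY a, b) AS cnt "
        "FROM table1 ORDER BY a, b, c"
    ).fetchall()
    conn.close()
    return joined, windowed


if __name__ == "__main__":
    joined, windowed = counts_with_detail([(1, 1, "x"), (1, 1, "y"), (2, 1, "z")])
    print(joined)
    print(windowed)
```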
Hope this helps!

How to get records from select statement based on one column distinct value in Oracle?

Please help me with the following problem:
And the result should be:
filtered to distinct iban_code values
You can use the ROW_NUMBER analytic function:
Select * from
(Select t.*,
Row_number()
over (partition by per_id, iban_code
order by main_bank_account desc) as rn
From your_table t)
Where rn=1;
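To see how the ORDER BY inside OVER (...) decides which row survives, here is a small sketch run in SQLite from Python (assuming SQLite 3.25 or later, since the original data is Oracle). your_table is the answer's placeholder; the label column, the 1/0 encoding of main_bank_account and the sample rows are hypothetical. Sorting by main_bank_account DESC puts the main account first, so it is the one kept for each (per_id, iban_code) pair.

```python
import sqlite3


def first_per_iban(rows):
    """Keep one row per (per_id, iban_code), preferring the main account.

    rows: (per_id, iban_code, main_bank_account, label) tuples;
    main_bank_account is assumed to be 1 for the main account, else 0.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE your_table (per_id INTEGER, iban_code TEXT, "
        "main_bank_account INTEGER, label TEXT)"
    )
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT per_id, iban_code, main_bank_account, label
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY per_id, iban_code
                                        ORDER BY main_bank_account DESC) AS rn
              FROM your_table t)
        WHERE rn = 1
        ORDER BY per_id, iban_code
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        (1, "IBAN1", 0, "old"),
        (1, "IBAN1", 1, "main"),
        (1, "IBAN2", 0, "other"),
    ]
    print(first_per_iban(sample))
```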
Cheers!!

Issue with SELECT subquery in MySQL 8.0

Why is the statement below invalid in MySQL? It works perfectly in Oracle.
SELECT originalAmount,fees,id FROM
(SELECT originalAmount,fees,id, ROW_NUMBER() OVER (PARTITION BY transaction_number ORDER BY eventdate ASC) RANK FROM kir_records where customerid= 1704)
WHERE RANK = 1;
I get a syntax error as soon as I paste this into MySQL Workbench.
Error:
Select is invalid at this position. Expecting '(' at first select.
Is there a workaround to make this work ?
Try using this query:
SELECT originalAmount,fees,id FROM
((SELECT originalAmount,fees,id, ROW_NUMBER() OVER (PARTITION BY transaction_number ORDER BY eventdate ASC) RANK FROM kir_records where customerid= 1704))
WHERE RANK = 1;
It looks like RANK is a reserved word in MySQL 8.0. I put backquotes (``) around RANK and it worked as expected. The other thing to take care of is that every derived table (a.k.a. a subquery in the FROM clause) must have an alias.
Here is the query that worked for me:
SELECT originalAmount,fees,id FROM
(SELECT originalAmount,fees,id, ROW_NUMBER() OVER (PARTITION BY transaction_number ORDER BY eventdate ASC) `RANK` FROM kir_records where customerid= 1704) AS SomeAlias
WHERE `RANK` = 1;

Getting Error "ORA-01732: data manipulation operation not legal on this view" while deleting from a table in oracle

The same query runs in Db2, but in Oracle it gives an error.
Please help. Thanks in advance.
delete from (SELECT
EMP_ID,
SAL,
ROW_NUMBER() OVER (PARTITION BY EMP_ID ORDER BY SAL DESC) As RN
FROM FPM.FACT_PL_BS
WHERE MEASUREMENT_PERIOD_ID=20170811
AND SCENARIO_ID=1) A where RN>1;
Check "Notes on Updatable Views" in the Oracle documentation:
The view must not contain any of the following constructs:
A set operator
A DISTINCT operator
An aggregate or analytic function
A GROUP BY, ORDER BY, MODEL, CONNECT BY, or START WITH clause
A collection expression in a SELECT list
A subquery in a SELECT list
A subquery designated WITH READ ONLY
Joins, with some exceptions, as documented in Oracle Database Administrator's Guide
ROW_NUMBER is an analytic function, so DML on this inline view (including DELETE) is not permitted.
I think this one should work (not tested):
delete from FPM.FACT_PL_BS
WHERE ROWID =ANY
(SELECT ROW_ID
FROM
(SELECT ROWID as ROW_ID,
EMP_ID,
SAL,
ROW_NUMBER() OVER (PARTITION BY EMP_ID ORDER BY SAL DESC) As RN
FROM FPM.FACT_PL_BS
WHERE MEASUREMENT_PERIOD_ID=20170811
AND SCENARIO_ID=1)
WHERE RN > 1);
Or maybe:
delete from FPM.FACT_PL_BS
WHERE MEASUREMENT_PERIOD_ID=20170811
AND SCENARIO_ID=1
AND ROWID <>ALL
(select MAX(ROWID) KEEP (DENSE_RANK FIRST ORDER BY SAL DESC) OVER (PARTITION BY EMP_ID)
FROM FPM.FACT_PL_BS
WHERE MEASUREMENT_PERIOD_ID=20170811
AND SCENARIO_ID=1);
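The first workaround (delete by row identifier instead of through the analytic view) can be tried locally in SQLite from Python, assuming SQLite 3.25 or later. SQLite's rowid plays the role of Oracle's ROWID here; the two are analogous but not the same thing. The table is a reduced, hypothetical stand-in for FPM.FACT_PL_BS, and the MEASUREMENT_PERIOD_ID/SCENARIO_ID filters are omitted for brevity. The highest salary per employee survives because the window orders by sal DESC and only rn > 1 is deleted.

```python
import sqlite3


def delete_lower_salaries(rows):
    """Delete every row except the highest-salary one per emp_id.

    rows: (emp_id, sal) tuples; fact_pl_bs is a hypothetical stand-in table.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact_pl_bs (emp_id INTEGER, sal INTEGER)")
    conn.executemany("INSERT INTO fact_pl_bs VALUES (?, ?)", rows)
    # Rank rows inside a subquery, then delete the base table by rowid,
    # so the DELETE never targets a view containing an analytic function.
    conn.execute(
        """
        DELETE FROM fact_pl_bs
        WHERE rowid IN (
            SELECT row_id
            FROM (SELECT rowid AS row_id,
                         ROW_NUMBER() OVER (PARTITION BY emp_id
                                            ORDER BY sal DESC) AS rn
                  FROM fact_pl_bs)
            WHERE rn > 1)
        """
    )
    remaining = conn.execute(
        "SELECT emp_id, sal FROM fact_pl_bs ORDER BY emp_id"
    ).fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    print(delete_lower_salaries([(1, 100), (1, 300), (1, 200), (2, 50)]))
```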

Error on Partition over row number

I am creating a sub-query to select distinct entries on a certain column, DIS_COL, then return all other columns for those distinct entries, arbitrarily selecting the first row.
To do this I'm creating a sub-query that keeps only the first rows using ROW_NUMBER() OVER (PARTITION BY ...), and then selecting from that sub-query.
However, my code fails with "ORA-00923: FROM keyword not found where expected".
My code is below:
select *
from (
select *,
row_number() over (partition by DIS_COL order by COL_2) as row_number --ORDER BY FIELD DETERMINES WHICH ROW IS THE FIRST ROW AND THUS WHICH ONE IS SELECTED.
from MY_TABLE
) as rows
where row_number = 1
AND CRITERIA_COL = 'CRIT_1'
OR CRITERIA_COL_2 = 'CRIT_2';
How can I correct my code to achieve the desired result?
I am working on an Oracle database.
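Remove as rows. Oracle does not accept AS before a table or subquery alias; AS is only valid for column aliases. Also qualify the * with a table alias, and put parentheses around the OR condition so it does not bypass row_number = 1: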
select *
from (
select T.*,
row_number() over (partition by DIS_COL order by COL_2) as row_number --ORDER BY FIELD DETERMINES WHICH ROW IS THE FIRST ROW AND THUS WHICH ONE IS SELECTED.
from MY_TABLE t
)
where row_number = 1
AND (CRITERIA_COL = 'CRIT_1'
OR CRITERIA_COL_2 = 'CRIT_2');
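The parentheses around the OR matter because AND binds tighter than OR. This sketch, run in SQLite from Python (assuming SQLite 3.25 or later), compares both forms of the filter. my_table and the column names come from the question, but the sample rows are hypothetical and the row-number alias is shortened to rn. Without parentheses, a second row for DIS_COL 'A' leaks through because it matches CRIT_2, so the result is no longer one row per DIS_COL.

```python
import sqlite3


def first_rows(rows, parenthesize):
    """Return (dis_col, col_2) pairs that pass the filter in either form."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (dis_col TEXT, col_2 INTEGER, "
        "criteria_col TEXT, criteria_col_2 TEXT)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)
    if parenthesize:
        condition = "rn = 1 AND (criteria_col = 'CRIT_1' OR criteria_col_2 = 'CRIT_2')"
    else:
        # Parsed as (rn = 1 AND criteria_col = 'CRIT_1') OR criteria_col_2 = 'CRIT_2'
        condition = "rn = 1 AND criteria_col = 'CRIT_1' OR criteria_col_2 = 'CRIT_2'"
    query = f"""
        SELECT dis_col, col_2
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY dis_col ORDER BY col_2) AS rn
              FROM my_table t)
        WHERE {condition}
        ORDER BY dis_col, col_2
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("A", 1, "CRIT_1", "x"), ("A", 2, "y", "CRIT_2"), ("B", 1, "z", "z")]
    print(first_rows(sample, parenthesize=True))
    print(first_rows(sample, parenthesize=False))
```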
It's not the ROW_NUMBER, it's the *. In Oracle a bare * cannot be combined with other select-list items, so qualify it with a table alias. The subquery alias must also be written without AS:
select *
from (
select T.*, -- here
row_number() over (partition by DIS_COL order by COL_2) as row_number --ORDER BY FIELD DETERMINES WHICH ROW IS THE FIRST ROW AND THUS WHICH ONE IS SELECTED.
from MY_TABLE T
) sub -- and here
where row_number = 1
AND (CRITERIA_COL = 'CRIT_1'
OR CRITERIA_COL_2 = 'CRIT_2');
