I have a dataset that I'm de-duping with the following code:
select session_id, sol_id, id, session_context_code, date
from (
select *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id, date) as rn,
substr(case_id,2,9) as id
from df.t1_data
)undup
where undup.rn =1
order by session_id, sol_id, date
I want to add a variable that stores the total count of rows after dedup, and I tried with count(*):
select session_id, sol_id, id, session_context_code, date,count(*) as total
from (
select *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id,date) as rn,
substr(case_id,2,9) as id
from df.t1_data
)undup
where undup.rn =1
order by session_id, sol_id, date
The error I received:
ERROR: Execute error: org.apache.hive.service.cli.HiveSQLException:
Error while compiling statement: FAILED: SemanticException
[Error 10025]: Line 1:44 Expression not in GROUP BY key 'session_id'
I just want to output a count, as a variable, of all distinct records by session_id and sol_id after de-duping by row number. How do I incorporate that into the code?
I tried Gomz's suggestion, but received this error:
ERROR: Execute error: org.apache.hive.service.cli.HiveSQLException:
Error while compiling statement: FAILED: ParseException line
1:614 missing EOF at 'group' near 'nifi_date'
Code:
select session_id, solicit_id, nifi_date,id, session_context_code,count(*) as total
from (
select *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id) as rn,
substr(case_id,2,9) as id
from df.t1_data
)undup
where undup.rn =1 and
session_context_code in ("4","3") and
order by session_id, sol_id, nifi_date
group by session_id, sol_id, nifi_date,id, session_context_code
In Hive, when the SELECT list contains COUNT(*) alongside other columns, every one of those non-aggregated columns must appear in a GROUP BY clause.
Some samples:
SELECT COUNT(*) FROM employees;
SELECT id, name, COUNT(*) FROM employees GROUP BY id, name;
In your case, the query should look like this:
select session_id, sol_id, id, session_context_code, count(*) as total
from (
select *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id,date) as rn,
substr(case_id,2,9) as id
from df.t1_data
)undup
where undup.rn =1
GROUP BY session_id, sol_id, id, session_context_code
order by session_id, sol_id
Update: If you want to count all distinct records only by session_id and sol_id, then the query can be as follows,
select session_id, sol_id, count(*) as total
from (
select *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id,date) as rn,
substr(case_id,2,9) as id
from df.t1_data
)undup
where undup.rn =1
GROUP BY session_id, sol_id
order by session_id, sol_id;
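If you want to check the de-dup-then-count pattern without a Hive cluster, here is a minimal sketch that runs it on SQLite through Python's sqlite3 module. SQLite is only a stand-in for Hive here (an assumption, and it needs SQLite 3.25 or later for window functions). The sample rows are made up, the date column is renamed dt for the example, and an ORDER BY is added inside the window so the kept row is deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1_data (session_id TEXT, sol_id TEXT, dt TEXT, case_id TEXT)"
)
conn.executemany(
    "INSERT INTO t1_data VALUES (?, ?, ?, ?)",
    [
        ("s1", "a", "2020-01-01", "X000000001"),
        ("s1", "a", "2020-01-01", "X000000001"),  # duplicate, dropped by rn = 1
        ("s1", "a", "2020-01-02", "X000000002"),
        ("s2", "b", "2020-01-01", "X000000003"),
    ],
)

# De-dup on (session_id, sol_id, dt), then count the surviving rows
# per (session_id, sol_id). Only grouped columns appear in SELECT and ORDER BY.
rows = conn.execute(
    """
    SELECT session_id, sol_id, COUNT(*) AS total
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY session_id, sol_id, dt ORDER BY case_id
               ) AS rn
        FROM t1_data AS t
    ) AS undup
    WHERE undup.rn = 1
    GROUP BY session_id, sol_id
    ORDER BY session_id, sol_id
    """
).fetchall()

print(rows)
```

With this sample data, s1/a has two distinct dates after de-duping and s2/b has one.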
As discussed, SELECT and GROUP BY should list only the columns you want to count by.
If you need more columns in the result than the ones you group by, you can create a temporary table that holds just the grouped columns and their count, then join it back to the original table. For example, if you need columns c, d, e, f from the table but want the count per (a, b), you can do something like this:
CREATE TABLE tmp AS
SELECT a, b, count(*) AS total
FROM table1
GROUP BY a,b;
Then join tmp with table1 on columns a and b:
SELECT y.a, y.b, y.total, x.c, x.d, x.e, x.f
FROM tmp y, table1 x
WHERE y.a=x.a
AND y.b=x.b;
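A minimal sketch of the temporary-table approach, again on SQLite via Python's sqlite3 as a stand-in for Hive (an assumption). The rows are invented, only column c is carried along to keep it short, and the count column gets the hypothetical alias total so it can be selected after the join. The second query is a different technique, a window count, shown only for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a TEXT, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [("x", "1", "c1"), ("x", "1", "c2"), ("y", "2", "c3")],
)

# Step 1: one row per (a, b) with its count, stored in a helper table.
conn.execute(
    "CREATE TABLE tmp AS SELECT a, b, COUNT(*) AS total FROM table1 GROUP BY a, b"
)

# Step 2: join the counts back so every original row carries its group's count.
joined = conn.execute(
    """
    SELECT x.a, x.b, x.c, y.total
    FROM table1 AS x
    JOIN tmp AS y ON y.a = x.a AND y.b = x.b
    ORDER BY x.a, x.b, x.c
    """
).fetchall()

# Alternative for comparison: a window count, no helper table needed.
windowed = conn.execute(
    """
    SELECT a, b, c, COUNT(*) OVER (PARTITION BY a, b) AS total
    FROM table1
    ORDER BY a, b, c
    """
).fetchall()

print(joined)
print(windowed)
```

Both queries return the same rows here, so which one to use mostly depends on whether you need the counts again later.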
Hope this helps!
Please help me with the following problem:
And the result should be:
filtered by iban_code distinct
You can use the ROW_NUMBER analytic function.
Select * from
(Select t.*,
Row_number()
over (partition by per_id, iban_code
order by main_bank_account desc) as rn
From your_table t)
Where rn=1;
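To see what this does, here is a small sketch run on SQLite through Python's sqlite3 (a stand-in chosen for illustration, not the asker's database). The rows are invented, and main_bank_account is assumed to be a 'Y'/'N' flag, so ordering it descending puts the 'Y' row first in each partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (per_id INTEGER, iban_code TEXT, main_bank_account TEXT)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        (1, "IB1", "N"),
        (1, "IB1", "Y"),  # same person and IBAN; this row wins ('Y' sorts first DESC)
        (1, "IB2", "N"),
        (2, "IB3", "Y"),
    ],
)

# Number the rows inside each (per_id, iban_code) group and keep only the first.
kept = conn.execute(
    """
    SELECT per_id, iban_code, main_bank_account
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY per_id, iban_code
                   ORDER BY main_bank_account DESC
               ) AS rn
        FROM your_table AS t
    ) AS numbered
    WHERE rn = 1
    ORDER BY per_id, iban_code
    """
).fetchall()

print(kept)
```

Each (per_id, iban_code) pair appears exactly once in the result.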
Cheers!!
Why is the statement below invalid in MySQL? It works perfectly in Oracle.
SELECT originalAmount,fees,id FROM
(SELECT originalAmount,fees,id, ROW_NUMBER() OVER (PARTITION BY transaction_number ORDER BY eventdate ASC) RANK FROM kir_records where customerid= 1704)
WHERE RANK = 1;
I get a syntax error as soon as I paste this into MySQL Workbench.
Error:
Select is invalid at this position. Expecting '(' at first select.
Is there a workaround to make this work?
Try using this query.
SELECT originalAmount,fees,id FROM
((SELECT originalAmount,fees,id, ROW_NUMBER() OVER (PARTITION BY transaction_number ORDER BY eventdate ASC) RANK FROM kir_records where customerid= 1704))
WHERE RANK = 1;
It looks like RANK is a reserved word in MySQL. I put backticks (`) around RANK and it worked as expected. Also note that every derived table (a subquery in the FROM clause) must have an alias in MySQL.
Here is the query which worked for me :
SELECT originalAmount,fees,id FROM
(SELECT originalAmount,fees,id, ROW_NUMBER() OVER (PARTITION BY transaction_number ORDER BY eventdate ASC) `RANK` FROM kir_records where customerid= 1704) AS SomeAlias
WHERE `RANK` = 1;
The same query runs in Db2, but in Oracle it gives an error.
Please help. Thanks in advance.
delete from (SELECT
EMP_ID,
SAL,
ROW_NUMBER() OVER (PARTITION BY EMP_ID ORDER BY SAL DESC) As RN
FROM FPM.FACT_PL_BS
WHERE MEASUREMENT_PERIOD_ID=20170811
AND SCENARIO_ID=1) A where RN>1
Check the documentation, Notes on Updatable Views:
The view must not contain any of the following constructs:
A set operator
A DISTINCT operator
An aggregate or analytic function
A GROUP BY, ORDER BY, MODEL, CONNECT BY, or START WITH clause
A collection expression in a SELECT list
A subquery in a SELECT list
A subquery designated WITH READ ONLY
Joins, with some exceptions, as documented in Oracle Database Administrator's Guide
ROW_NUMBER is an analytic function, so a DELETE through this inline view is not permitted.
I think this one should work (not tested):
delete from FPM.FACT_PL_BS
WHERE ROWID =ANY
(SELECT ROW_ID
FROM
(SELECT ROWID as ROW_ID,
EMP_ID,
SAL,
ROW_NUMBER() OVER (PARTITION BY EMP_ID ORDER BY SAL DESC) As RN
FROM FPM.FACT_PL_BS
WHERE MEASUREMENT_PERIOD_ID=20170811
AND SCENARIO_ID=1)
WHERE RN > 1);
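or maybe (also untested; the subquery repeats the filters and orders by SAL DESC so that the row kept is the one with RN = 1 in the first version):
delete from FPM.FACT_PL_BS
WHERE MEASUREMENT_PERIOD_ID=20170811
AND SCENARIO_ID=1
AND ROWID <>ALL
(select MAX(ROWID) KEEP (DENSE_RANK FIRST ORDER BY SAL DESC) OVER (PARTITION BY EMP_ID)
FROM FPM.FACT_PL_BS
WHERE MEASUREMENT_PERIOD_ID=20170811
AND SCENARIO_ID=1)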
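The delete-by-row-identifier idea can be tried out on SQLite, which also exposes a rowid for ordinary tables. This is only a sketch under that assumption: SQLite's rowid stands in for Oracle's ROWID, the table is a cut-down copy of FACT_PL_BS with invented rows, and IN is used where the Oracle version uses =ANY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_pl_bs ("
    "emp_id INTEGER, sal INTEGER, measurement_period_id INTEGER, scenario_id INTEGER)"
)
conn.executemany(
    "INSERT INTO fact_pl_bs VALUES (?, ?, ?, ?)",
    [
        (1, 100, 20170811, 1),
        (1, 300, 20170811, 1),  # highest SAL for emp 1 in this period: kept
        (1, 200, 20170811, 1),
        (2, 500, 20170811, 1),  # only row for emp 2: kept
        (1, 999, 20170711, 1),  # other period: outside the filter, must survive
    ],
)

# Rank rows per employee inside the filtered slice, then delete every row
# whose rowid belongs to a rank greater than 1.
conn.execute(
    """
    DELETE FROM fact_pl_bs
    WHERE rowid IN (
        SELECT row_id
        FROM (
            SELECT rowid AS row_id,
                   ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY sal DESC) AS rn
            FROM fact_pl_bs
            WHERE measurement_period_id = 20170811
              AND scenario_id = 1
        ) AS ranked
        WHERE rn > 1
    )
    """
)

remaining = conn.execute(
    "SELECT emp_id, sal, measurement_period_id FROM fact_pl_bs "
    "ORDER BY emp_id, measurement_period_id, sal"
).fetchall()

print(remaining)
```

Note that the filter lives inside the ranking subquery, so rows from other periods are never ranked and never deleted.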
I am creating a sub-query to select distinct entries on a certain column, DIS_COL, then return all other columns for those distinct entries, arbitrarily selecting the first row.
To do this I'm creating a sub-query that selects only first rows using over - partition by, then selecting from that sub-query.
There is an error with my code however; "ORA-00923: FROM keyword not found where expected".
My code is below:
select *
from (
select *,
row_number() over (partition by DIS_COL order by COL_2) as row_number --ORDER BY FIELD DETERMINES WHICH ROW IS THE FIRST ROW AND THUS WHICH ONE IS SELECTED.
from MY_TABLE
) as rows
where row_number = 1
AND CRITERIA_COL = 'CRIT_1'
OR CRITERIA_COL_2 = 'CRIT_2';
How can I correct my code to achieve the desired result?
I am working on an Oracle database.
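Remove as rows. Oracle does not accept AS before a table or subquery alias; AS is only valid for column aliases. The * also has to be qualified (T.*) when other expressions share the select list, and the OR conditions need parentheses so that row_number = 1 applies to both of them: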
select *
from (
select T.*,
row_number() over (partition by DIS_COL order by COL_2) as row_number --ORDER BY FIELD DETERMINES WHICH ROW IS THE FIRST ROW AND THUS WHICH ONE IS SELECTED.
from MY_TABLE t
)
where row_number = 1
AND (CRITERIA_COL = 'CRIT_1'
OR CRITERIA_COL_2 = 'CRIT_2');
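The parentheses around the OR matter because AND binds more tightly than OR. Here is a small sketch of the difference, run on SQLite via Python's sqlite3 rather than Oracle (an assumption for illustration). The rows are invented, and the row-number column is aliased rn instead of row_number to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    "dis_col TEXT, col_2 INTEGER, criteria_col TEXT, criteria_col_2 TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        ("d1", 1, "CRIT_1", "other"),  # rn = 1, matches CRIT_1
        ("d1", 2, "other", "CRIT_2"),  # rn = 2, matches CRIT_2
        ("d2", 1, "other", "other"),   # rn = 1, matches neither
    ],
)

def first_rows(where_clause):
    """Run the row-number query with the given outer WHERE clause."""
    sql = f"""
        SELECT dis_col, col_2
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY dis_col ORDER BY col_2) AS rn
            FROM my_table AS t
        ) AS numbered
        WHERE {where_clause}
        ORDER BY dis_col, col_2
    """
    return conn.execute(sql).fetchall()

# Without parentheses this parses as (rn = 1 AND crit_1) OR crit_2,
# so the rn = 2 row leaks through.
loose = first_rows(
    "rn = 1 AND criteria_col = 'CRIT_1' OR criteria_col_2 = 'CRIT_2'"
)

# With parentheses, rn = 1 applies to both criteria.
strict = first_rows(
    "rn = 1 AND (criteria_col = 'CRIT_1' OR criteria_col_2 = 'CRIT_2')"
)

print(loose)
print(strict)
```

Only the parenthesized version guarantees one row per DIS_COL value.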
It's not the ROW_NUMBER, it's the *. In Oracle, * must be qualified with a table alias when other expressions appear in the same select list, and AS is not allowed before a subquery alias. Alias the table and drop the "as":
select *
from (
select T.*, -- here
row_number() over (partition by DIS_COL order by COL_2) as row_number --ORDER BY FIELD DETERMINES WHICH ROW IS THE FIRST ROW AND THUS WHICH ONE IS SELECTED.
from MY_TABLE T -- and here
) R -- subquery alias without AS
where row_number = 1
AND (CRITERIA_COL = 'CRIT_1'
OR CRITERIA_COL_2 = 'CRIT_2');