Aggregation, mathematical function & GROUP BY in a single query in Hive - hadoop

I have Table T1 with the below schema:
| job_id   | job_name | queue  | memory | cores | start_time | end_time   |
| job_1234 | ABC      | A_user | 51200  | 20    | 22-02-2018 | 22-02-2018 |
| job_2345 | ABC      | A_user | 71680  | 30    | 22-02-2018 | 23-02-2018 |
I want the output to be:
| ID | f_queue | f_job_name | f_memory | f_cores | f_start_time | f_end_time | process_month |
| 1  | A_user  | ABC        | 120      | 50      | 22-02-2018   | 23-02-2018 | 201702        |
Where memory = (51200+71680)/1024, cores = (20+30), and ID and process_month are static variables that I am passing to the Hive script.
Is the query below correct?
select
${ID},
job_id as f_queue,
job_name as f_job_name,
sum(memory)/1024 as f_memory,
sum(f_cores) as f_cores,
min(start_time) as f_start_time,
max(end_time) as f_end_time,
${process_month} as process_month
from T1 group by f_job_name,f_queue;

In GROUP BY you need to refer to the actual column names (i.e. job_id, job_name), not the alias names (i.e. f_queue, f_job_name).
Your query selects job_id as f_queue, so it effectively groups by job_id and job_name. When Hive groups by both job_id and job_name, the two records above fall into different groups (job_1234 and job_2345).
You need to change your query to:
select
queue as f_queue,
job_name as f_job_name,
sum(memory)/1024 as f_memory,
sum(cores) as f_cores,
min(start_time) as f_start_time,
max(end_time) as f_end_time
from T1 group by job_name,queue;
The query above groups by queue and job_name, which results in:
+----------+-------------+-----------+----------+---------------+-------------+--+
| f_queue | f_job_name | f_memory | f_cores | f_start_time | f_end_time |
+----------+-------------+-----------+----------+---------------+-------------+--+
| A_user | ABC | 120.0 | 50.0 | 22-02-2018 | 23-02-2018 |
+----------+-------------+-----------+----------+---------------+-------------+--+
For comparison, here is your query with job_name and job_id in the GROUP BY clause:
select
job_id as f_queue,
job_name as f_job_name,
sum(memory)/1024 as f_memory,
sum(cores) as f_cores,
min(start_time) as f_start_time,
max(end_time) as f_end_time
from T1 group by job_name,job_id;
Output:
+-----------+-------------+-----------+----------+---------------+-------------+--+
| f_queue | f_job_name | f_memory | f_cores | f_start_time | f_end_time |
+-----------+-------------+-----------+----------+---------------+-------------+--+
| job_1234 | ABC | 50.0 | 20.0 | 22-02-2018 | 22-02-2018 |
| job_2345 | ABC | 70.0 | 30.0 | 22-02-2018 | 23-02-2018 |
+-----------+-------------+-----------+----------+---------------+-------------+--+
The query you need is probably something like:
select
${ID} as ID,
queue as f_queue,
job_name as f_job_name,
sum(memory)/1024 as f_memory,
sum(cores) as f_cores,
min(start_time) as f_start_time,
max(end_time) as f_end_time,
${process_month} as process_month
from T1 group by queue,job_name;
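As a rough sketch of how the two static variables could be supplied, the script below defines them with SET hivevar inside the script itself. That is an assumption for illustration, since the question does not say how the variables are passed. The CAST calls are also an addition, only there to get whole numbers like the 120 and 50 in the desired output.

```sql
-- Assumption: variables are defined in-script; the question passes them in from outside.
SET hivevar:ID=1;
SET hivevar:process_month=201702;

SELECT
    ${hivevar:ID}                      AS id,
    queue                              AS f_queue,
    job_name                           AS f_job_name,
    CAST(SUM(memory) / 1024 AS BIGINT) AS f_memory,  -- (51200 + 71680) / 1024 = 120
    CAST(SUM(cores) AS BIGINT)         AS f_cores,   -- 20 + 30 = 50
    MIN(start_time)                    AS f_start_time,
    MAX(end_time)                      AS f_end_time,
    ${hivevar:process_month}           AS process_month
FROM T1
GROUP BY queue, job_name;  -- source column names, not the f_* aliases
```

If start_time and end_time are stored as strings in dd-MM-yyyy form, MIN and MAX compare them as text, which happens to work for the sample rows but would order dates incorrectly across different months.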

Related

Delete query in Oracle DB is fast but select query runs too long

I have the table below with 160,000 rows. When I run SELECT ID FROM mytable WHERE id NOT IN (SELECT max(id) FROM mytable GROUP BY user_id); the query runs for a very long time and does not finish (I waited for 1 hour), but when I run DELETE FROM mytable WHERE id NOT IN (SELECT max(id) FROM mytable GROUP BY user_id); it finishes in 0.5 seconds. Why?
---------------------------------------------------------------------------------------------------
| id | MyTimestamp | Name | user_id ...
----------------------------------------------------------------------------------------------------
| 0 | 1657640396 | John | 123581 ...
| 1 | 1657638832 | Tom | 168525 ...
| 2 | 1657640265 | Tom | 168525 ...
| 3 | 1657640292 | John | 123581 ...
| 4 | 1657640005 | Jack | 896545 ...
-----------------------------------------------------------------------------------------

Pull interlinked records based on rank and latest timestamp

I have a table like below.
myTable:
---------------------------------------------------------------------------------
id | ref | type | status | update_dt
---------------------------------------------------------------------------------
id1 | m1123 | 10 | 1 | 03-NOV-22 10.44.64.104000000 AM
id1 | m2123 | 10 | 2 | 03-NOV-22 10.44.64.104000000 AM
id1 | s1123 | 20 | | 03-NOV-22 10.44.64.104000000 AM
id1 | s2123 | 20 | | 03-NOV-22 10.44.54.104000000 AM
id1 | p1123 | 30 | | 03-NOV-22 10.44.54.104000000 AM
id2 | m1234 | 10 | | 02-NOV-22 10.44.64.104000000 AM
id2 | s1234 | 20 | | 02-NOV-22 10.44.54.104000000 AM
id2 | s2234 | 20 | | 02-NOV-22 10.44.54.104000000 AM
id3 | m1345 | 10 | 1 | 01-NOV-22 10.44.64.104000000 AM
id3 | s1345 | 20 | | 01-NOV-22 10.44.64.104000000 AM
id3 | s2345 | 20 | | 01-NOV-22 10.44.54.104000000 AM
---------------------------------------------------------------------------------
My requirements look pretty complex to me. I have made some progress but am not completely there. Here are my requirements:
1. From the table, I have to pull records of type 10 and 20 only, with type 10 records having a status of either null or 1.
2. For the type 10 comparison, I need to convert update_dt to epoch and pull all the type 10 records above a specific epoch.
3. Type 10 records are linked to type 20 records by the id. They have the same id.
4. For all the records pulled in step 2, I need to pull their corresponding type 20 records, but only the latest one based on update_dt.
5. If multiple type 20 records from step 4 have the same update_dt, any one of them can be picked.
Given the above requirements, for a sample epoch corresponding to Nov 1 2022, 11 AM (1667300400), I need a result like this:
-----------------------------------------------------------------------------------------------
ref1 | ref2 | ref1_update_dt | ref2_update_dt
-----------------------------------------------------------------------------------------------
m1123 | s1123 | 03-NOV-22 10.44.64.104000000 AM | 03-NOV-22 10.44.64.104000000 AM
m1234 | s2234 | 02-NOV-22 10.44.64.104000000 AM | 02-NOV-22 10.44.54.104000000 AM
-----------------------------------------------------------------------------------------------
I tried the query below, but didn't quite get there.
WITH cte_latest AS
(
SELECT
t1.ref ref1,
t2.ref ref2,
t1.update_dt ref1_update_dt,
t2.update_dt ref2_update_dt,
RANK() OVER(ORDER BY t2.update_dt DESC) rank_temp
FROM
myTable t1
JOIN myTable t2 ON
t1.id = t2.id
WHERE
t1.type = 10
AND (t1.status IS NULL
OR t1.status = 1)
AND t2.type = 20
AND (CAST(t1.update_dt AS DATE) - TO_DATE('01/01/1970', 'DD/MM/YYYY')) * 24 * 60 * 60 > '1667300400')
SELECT
ref1,
ref2,
ref1_update_dt,
ref2_update_dt
FROM
cte_latest
WHERE
rank_temp = 1
ORDER BY
ref1_update_dt;
Please help.
RANK will return the same number when there are multiple type 20 records that have the same update_dt. So, you will want to use ROW_NUMBER instead. That will ensure that each type 20 row gets a unique number to break any ties - per rule #5.
Also, you will need to partition the ROW_NUMBER by the id of the type 10 records. That will cause the numbering to reset to 1 for each type 10 record id. Without partitioning, every row in the result set would get a unique number, so only a single row overall would have the value 1.
ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY t2.update_dt DESC)
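Putting the two changes together, the question's query might look roughly like the sketch below. It reuses the table, columns, and epoch filter from the question as-is; the only changes are ROW_NUMBER with PARTITION BY in place of RANK, the rn alias (a name chosen here), and comparing against a numeric literal instead of a quoted string.

```sql
WITH cte_latest AS (
    SELECT
        t1.ref       AS ref1,
        t2.ref       AS ref2,
        t1.update_dt AS ref1_update_dt,
        t2.update_dt AS ref2_update_dt,
        -- Numbering restarts at 1 for each type 10 id; ties are broken arbitrarily.
        ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY t2.update_dt DESC) AS rn
    FROM myTable t1
    JOIN myTable t2
      ON t1.id = t2.id
    WHERE t1.type = 10
      AND (t1.status IS NULL OR t1.status = 1)
      AND t2.type = 20
      -- Epoch conversion copied from the question.
      AND (CAST(t1.update_dt AS DATE) - TO_DATE('01/01/1970', 'DD/MM/YYYY')) * 24 * 60 * 60 > 1667300400
)
SELECT ref1, ref2, ref1_update_dt, ref2_update_dt
FROM cte_latest
WHERE rn = 1          -- keep only the latest type 20 row per id
ORDER BY ref1_update_dt;
```

If several qualifying type 10 rows can share the same id, partitioning by t1.id alone keeps only one pair per id; adding t1.ref to the PARTITION BY list would keep one pair per type 10 row instead.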

Oracle SQL: Display single columns from multiple rows of a single table with Logic

Oracle SQL
I would like to look-up data from 2 rows of the same column from the same table together in an existing long query with multiple joins.
Current table set-up (single table):
Table: ACCOUNT_DETAILS
| TRX_ID | TYPE | FAC_ID | ACC_ID |
| ------ | ---- | ------ | ------ |
| 1234 | CRDR | ABC123 | AB1234 |
| 1234 | DBTR | XYZ222 | XY9800 |
| 9876 | CRDR | EFG999 | EF7659 |
| 9876 | DBTR | ABC123 | AB9900 |
Expected Result:
Table: REPORT
| TRX_ID | Counterparty FAC_ID | Counterparty ACC_ID |
| ------ | ------------------- | ------------------- |
| 1234 | XYZ222 | XY9800 |
| 9876 | EFG999 | EF7659 |
Logic needed:
If FAC_ID NOT LIKE 'ABC%' then refer to the Counterparty FAC_ID (e.g. for TRX_ID = 1234, it will refer to the DBTR FAC_ID and DBTR ACC_ID; TRX_ID = 9876, it will refer to the CRDR FAC_ID AND CRDR ACC_ID)
Example:
SELECT (CASE WHEN TYPE = 'DBTR' AND FAC_ID LIKE 'ABC%' THEN (SELECT FAC_ID FROM ACCOUNT_DETAILS WHERE TYPE = 'CRDR')
ELSE (SELECT FAC_ID FROM ACCOUNT_DETAILS WHERE TYPE = 'DBTR') END)
FROM ACCOUNT_DETAILS
I've tried options such as JOINs, UNIONs and subqueries, but none of them worked. I would like to have the Counterparty FAC_ID and Counterparty ACC_ID as separate single lines in the query, as I will include them in a long query that I already have.
We can use CASE in a CTE to flag the rows where FAC_ID does not start with "ABC" and then use the flag in the WHERE clause.
with cte as
(
select
TRX_ID,
FAC_ID,
ACC_ID,
CASE WHEN FAC_ID LIKE 'ABC%' THEN 0 ELSE 1 END ordinal
FROM ACCOUNT_DETAILS)
SELECT
TRX_ID,
FAC_ID,
ACC_ID
FROM CTE
WHERE ordinal = 1;
TRX_ID | FAC_ID | ACC_ID
-----: | :----- | :-----
1234 | XYZ222 | XY9800
9876 | EFG999 | EF7659

When I select, only one column is checked without duplicates

I have two tables like this:
first table
+------------+---------------+--------+
| pk | user_one |user_two|
+------------+---------------+--------+
second table
+------------+---------------+--------+----------------+----------------+
| pk | sender |receiver|fk of firsttable|content |
+------------+---------------+--------+----------------+----------------+
First and second table have one to many(1:N) relations.
There are many records in second table:
| pk | sender|receiver|fk of firsttable|content |
|120 |car224 |car223 |1 |test message1 to 223
|121 |car224 |car223 |1 |test message2 to 223
|122 |car224 |car225 |21 |test message1 to 225
|123 |car224 |car225 |21 |test message2 to 225
|124 |car224 |car225 |21 |test message3 to 225
|125 |car224 |car225 |21 |test message4 to 225
Among rows that share the same fk value, I want only the row with the largest pk.
I've changed the above column name to make it easier to understand.
Here is the actual sql I've tried so far:
select *
from (select rownum rn,
mr.mrno,
mr.user_one,
mr.user_two,
m.mno,
m.content
from tbl_messagerelation mr,
tbl_message m
where (mr.user_one = 'car224' or
mr.user_two='car224') and
m.rowid in (select max(rowid)
from tbl_message
group by m.mno) and
rownum <= 1*20)
where rn > (1-1) * 20
And this is the result:
+---------+-------+----------+----------+-------------------------+----------------------+
| rn | mrno | user_one | user_two | mno(pk of second table) | content |
+---------+-------+----------+----------+-------------------------+----------------------+
| 1 | 1 | car224 | car223 | 125 | test message4 to 225 |
| 2 | 21 | car224 | car225 | 125 | test message4 to 225 |
+---------+-------+----------+----------+-------------------------+----------------------+
My desired result is something like this:
+---------+---------+----------+--------------------+----------------------+
| fk | sender | receiver | pk of second table | content |
+---------+---------+----------+--------------------+----------------------+
| 1 | car224 | car223 | 121 | test message2 to 223 |
| 21 | car224 | car223 | 125 | test message4 to 225 |
+---------+---------+----------+--------------------+----------------------+
Your table description is confusing when compared to your query. However, from what I could understand, you are probably looking for row_number().
An important piece of advice: use the standard explicit JOIN syntax rather than the outdated a,b syntax for joins. The join keys were not clear to me, so replace them appropriately in your final query.
select * from
(
select mr.*, m.*, row_number() over ( partition by m.fk order by m.pk desc ) as rn
from tbl_messagerelation mr join tbl_message m on mr.? = m.?
) where rn =1
Or perhaps you don't need that join at all
select * from
(
select m.*, row_number() over ( partition by m.fk order by m.pk desc ) as rn
from tbl_message m
) where rn =1
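As a small self-contained check of the second form, the sketch below inlines the six sample rows from the question as a CTE. The simplified column names (pk, sender, receiver, fk, content) are the ones used in the question's description, not the real ones in tbl_message.

```sql
WITH tbl_message AS (
    SELECT 120 AS pk, 'car224' AS sender, 'car223' AS receiver, 1 AS fk,
           'test message1 to 223' AS content FROM dual UNION ALL
    SELECT 121, 'car224', 'car223', 1,  'test message2 to 223' FROM dual UNION ALL
    SELECT 122, 'car224', 'car225', 21, 'test message1 to 225' FROM dual UNION ALL
    SELECT 123, 'car224', 'car225', 21, 'test message2 to 225' FROM dual UNION ALL
    SELECT 124, 'car224', 'car225', 21, 'test message3 to 225' FROM dual UNION ALL
    SELECT 125, 'car224', 'car225', 21, 'test message4 to 225' FROM dual
)
SELECT fk, sender, receiver, pk, content
FROM (
    -- Number the rows within each fk, highest pk first.
    SELECT m.*, ROW_NUMBER() OVER (PARTITION BY m.fk ORDER BY m.pk DESC) AS rn
    FROM tbl_message m
)
WHERE rn = 1
ORDER BY fk;
-- Returns two rows:
--   1  | car224 | car223 | 121 | test message2 to 223
--   21 | car224 | car225 | 125 | test message4 to 225
```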

Oracle plan_baseline is ignored

I have a query with bind variables that comes from an external application.
The optimizer uses an unwanted index, and I want to force it to use another plan.
So I generated the good plan using an index hint, then created the baseline with the plans,
attached the wanted plan to the query's sql_id, and changed the fixed attribute to 'YES'.
I executed the DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE function,
and the output shows that the wanted plan is marked as fixed=yes.
So why does the query still run with the bad plan?
The code:
-- Query
SELECT DISTINCT t_01.puid
FROM PWORKSPACEOBJECT t_01 , PPOM_APPLICATION_OBJECT t_02
WHERE ( ( UPPER(t_01.pobject_type) IN ( UPPER( :1 ) , UPPER( :2 ) )
AND ( t_02.pcreation_date >= :3 ) ) AND ( t_01.puid = t_02.puid ) )
-- get the text
select sql_fulltext
from v$sqlarea
where sql_id = '21pts328r2nb7' and rownum = 1;
-- prepare the explain plan
explain plan for
SELECT DISTINCT t_01.puid
FROM PWORKSPACEOBJECT t_01 , PPOM_APPLICATION_OBJECT t_02
WHERE ( ( UPPER(t_01.pobject_type) IN ( UPPER( :1 ) , UPPER( :2 ) )
AND ( t_02.pcreation_date >= :3 ) ) AND ( t_01.puid = t_02.puid ) ) ;
-- we can see that there is no use of index - PIPIPWORKSPACEO_2
select * from table(dbms_xplan.display);
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10382 | 517K| 61553 |
| 1 | HASH UNIQUE | | 10382 | 517K| 61553 |
| 2 | HASH JOIN | | 158K| 7885K| 61549 |
| 3 | INLIST ITERATOR | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID| PWORKSPACEOBJECT | 158K| 4329K| 52689 |
| 5 | INDEX RANGE SCAN | PIPIPWORKSPACEO_3 | 158K| | 534 |
| 6 | INDEX RANGE SCAN | DBTAO_IX1_PPOM | 3402K| 74M| 2911 |
------------------------------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
-- generate plan with the wanted index
explain plan for
select /*+ index(t_01 PIPIPWORKSPACEO_2)*/ distinct t_01.puid
from pworkspaceobject t_01 , ppom_application_object t_02
where ( ( upper(t_01.pobject_type) in ( upper( :1 ) , upper( :2 ) )
and ( t_02.pcreation_date >= :3 ) ) and ( t_01.puid = t_02.puid ) ) ;
-- the index working - the index used
select * from table(dbms_xplan.display);
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10382 | 517K| 223K|
| 1 | HASH UNIQUE | | 10382 | 517K| 223K|
| 2 | HASH JOIN | | 158K| 7885K| 223K|
| 3 | TABLE ACCESS BY INDEX ROWID| PWORKSPACEOBJECT | 158K| 4329K| 214K|
| 4 | INDEX FULL SCAN | PIPIPWORKSPACEO_2 | 158K| | 162K|
| 5 | INDEX RANGE SCAN | DBTAO_IX1_PPOM | 3402K| 74M| 2911 |
-----------------------------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
-- get the sql_id of the query with the good index
-- 7t72qvghr0zqh
select sql_id from v$sqlarea where sql_text like 'select /*+ index(t_01 PIPIPWORKSPACEO_2)%';
-- get the plan hash value of the good plan by the sql_id
--4040955653
select plan_hash_value from v$sql_plan where sql_id = '7t72qvghr0zqh';
-- get the plan hash value of the bad plan by the sql_id
--1044780890
select plan_hash_value from v$sql_plan where sql_id = '21pts328r2nb7';
-- load the source plan
begin
dbms_output.put_line(
dbms_spm.load_plans_from_cursor_cache
( sql_id => '21pts328r2nb7' )
);
END;
-- the new base line created with the bad plan
select * from dba_sql_plan_baselines;
-- load the good plan of the second sql_id (with the wanted index)
-- and bind it to the sql_handle of the source query
begin
dbms_output.put_line(
DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE
( sql_id => '7t72qvghr0zqh',
plan_hash_value => 4040955653,
sql_handle => 'SQL_4afac4211aa3317d' )
);
end;
-- now there are 2 plans bound to the same sql_handle and sql_text
select * from dba_sql_plan_baselines;
-- alter the good one to be fixed
begin
dbms_output.put_line(
dbms_spm.alter_sql_plan_baseline
( sql_handle =>
'SQL_4afac4211aa3317d',
PLAN_NAME => 'SQL_PLAN_4pyq444da6cbxf7c97cc7',
ATTRIBUTE_NAME => 'fixed',
ATTRIBUTE_VALUE => 'YES'
)) ;
end;
-- check the good plan - fixed = yes
select * from table(
dbms_xplan.display_sql_plan_baseline (
sql_handle => 'SQL_4afac4211aa3317d',
plan_name => 'SQL_PLAN_4pyq444da6cbxf7c97cc7',
format => 'ALL'));
--------------------------------------------------------------------------------
SQL handle: SQL_4afac4211aa3317d
SQL text: SELECT DISTINCT t_01.puid FROM PWORKSPACEOBJECT t_01 ,
PPOM_APPLICATION_OBJECT t_02 WHERE ( ( UPPER(t_01.pobject_type) IN (
UPPER( :1 ) , UPPER( :2 ) ) AND ( t_02.pcreation_date >= :3 ) ) AND (
t_01.puid = t_02.puid ) )
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Plan name: SQL_PLAN_4pyq444da6cbxf7c97cc7 Plan id: 4157177031
Enabled: YES Fixed: YES Accepted: YES Origin: MANUAL-LOAD
--------------------------------------------------------------------------------
Plan hash value: 4040955653
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10382 | 517K| | 223K (1)| 00:44:37 |
| 1 | HASH UNIQUE | | 10382 | 517K| | 223K (1)| 00:44:37 |
|* 2 | HASH JOIN | | 158K| 7885K| 6192K| 223K (1)| 00:44:37 |
| 3 | TABLE ACCESS BY INDEX ROWID| PWORKSPACEOBJECT | 158K| 4329K| | 214K (1)| 00:42:50 |
|* 4 | INDEX FULL SCAN | PIPIPWORKSPACEO_2 | 158K| | | 162K (1)| 00:32:25 |
|* 5 | INDEX RANGE SCAN | DBTAO_IX1_PPOM | 3402K| 74M| | 2911 (1)| 00:00:35 |
-----------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
3 - SEL$1 / T_01#SEL$1
4 - SEL$1 / T_01#SEL$1
5 - SEL$1 / T_02#SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T_01"."PUID"="T_02"."PUID")
4 - filter(UPPER("POBJECT_TYPE")=UPPER(:1) OR UPPER("POBJECT_TYPE")=UPPER(:2))
5 - access("T_02"."PCREATION_DATE">=:3)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) "T_01"."PUID"[VARCHAR2,15]
2 - (#keys=1) "T_01"."PUID"[VARCHAR2,15]
3 - "T_01"."PUID"[VARCHAR2,15]
4 - "T_01".ROWID[ROWID,10]
5 - "T_02"."PUID"[VARCHAR2,15]
Note
-----
- 'PLAN_TABLE' is old version
-- run explain plan for the query
-- need to use the new plan
declare
v_string clob;
begin
select sql_fulltext
into v_string
from v$sqlarea
where sql_id = '21pts328r2nb7' and rownum = 1;
execute immediate 'explain plan for ' || v_string using '1','1',sysdate;
end;
-- check the plan - still the unwanted index and plan
select * from table(dbms_xplan.display);
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10382 | 517K| 61553 |
| 1 | HASH UNIQUE | | 10382 | 517K| 61553 |
| 2 | HASH JOIN | | 158K| 7885K| 61549 |
| 3 | INLIST ITERATOR | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID| PWORKSPACEOBJECT | 158K| 4329K| 52689 |
| 5 | INDEX RANGE SCAN | PIPIPWORKSPACEO_3 | 158K| | 534 |
| 6 | INDEX RANGE SCAN | DBTAO_IX1_PPOM | 3402K| 74M| 2911 |
------------------------------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
From a read through of your test case, I suspect the problem is that you're interpreting the FIXED attribute incorrectly.
If you list all the plans for your baseline, you will probably find the original and the loaded cursor plan are both ENABLED and ACCEPTED at the moment. I think what you need to do (based on my own usage of these calls) is use the ENABLED attribute. Set ENABLED to NO for the unwanted plan.
Try:
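-- alter_sql_plan_baseline is a function, so its return value must be consumed
begin
dbms_output.put_line( -- prints the number of plans altered
dbms_spm.alter_sql_plan_baseline(
sql_handle=>'SQL_...' -- baseline to update
,plan_name=>'SQL_PLAN_...' -- unwanted plan signature to disable
,attribute_name=>'ENABLED',attribute_value=>'NO'));
end;
/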
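To see which plans are attached to the baseline and how each is flagged, before and after changing the attribute, a query along these lines can help. It uses the sql_handle from the question and assumes access to the DBA_SQL_PLAN_BASELINES view.

```sql
-- List every plan in the baseline with its status flags.
SELECT sql_handle,
       plan_name,
       origin,
       enabled,
       accepted,
       fixed
FROM   dba_sql_plan_baselines
WHERE  sql_handle = 'SQL_4afac4211aa3317d'  -- handle taken from the question
ORDER  BY plan_name;
```

The plan_name of the unwanted plan from this listing is the value to pass when setting ENABLED to NO.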
