Query taking time despite adding session settings - hadoop
Following is the ETL-generated query:
SELECT infaHiveSysTimestamp('SS') as a0, 7991 as a1, single_use_subq30725.a1 as a2, SUBSTR(SUBSTR(single_use_subq30725.a2, 0, 5), 0, 5) as a3, CAST(1 AS SMALLINT) as a4, single_use_subq30725.a3 as a5, single_use_subq30725.a4 as a6, SUBSTR(SUBSTR(SUBSTR(single_use_subq30725.a8, (CASE WHEN 12 < (- LENGTH(single_use_subq30725.a8)) THEN 0 ELSE 12 END), 104857600), 0, 20), 0, 20) as a7, infaNativeUDFCallString('TO_CHAR', single_use_subq30725.a5) as a8, infaHiveSysTimestamp('SS') as a9, CAST(infaNativeUDFCallDate('TRUNC', single_use_subq30725.a6, 'DD') AS DATE) as a10 FROM (SELECT (CASE WHEN 1 = t1.a1 THEN t1.a0 ELSE CAST(NULL AS TIMESTAMP) END) as a0, infaNativeUDFCallDate('TRUNC', (CASE WHEN 1 = t1.a1 THEN t1.a0 ELSE CAST(NULL AS TIMESTAMP) END), 'DD') as a1
FROM
(
SELECT MAX(t1.a0) as a0, MAX(t1.a1) as a1
FROM (
SELECT mstr_load_audit.last_run_ts as a0, 1 as a1 FROM mstr_etl.mstr_load_audit WHERE interface_name='m_CTM_RAWTLogData_target_tbl'
) t1
)t1
) single_use_subq39991
JOIN (
SELECT w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.CREATE_TS as a0, CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.txACTION_ID AS STRING) as a1, SUBSTR(SUBSTR(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.STORE_NUM AS DECIMAL(18, 0))), (CASE WHEN 0 < (- LENGTH(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.STORE_NUM AS DECIMAL(18, 0))))) THEN 0 ELSE 0 END), 10), (CASE WHEN 0 < (- LENGTH(SUBSTR(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.STORE_NUM AS DECIMAL(18, 0))), (CASE WHEN 0 < (- LENGTH(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.STORE_NUM AS DECIMAL(18, 0))))) THEN 0 ELSE 0 END), 10))) THEN 0 ELSE 0 END), 10) as a2, SUBSTR(SUBSTR(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.LANE_NUM AS DECIMAL(18, 0))), (CASE WHEN 0 < (- LENGTH(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.LANE_NUM AS DECIMAL(18, 0))))) THEN 0 ELSE 0 END), 10), (CASE WHEN 0 < (- LENGTH(SUBSTR(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.LANE_NUM AS DECIMAL(18, 0))), (CASE WHEN 0 < (- LENGTH(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.LANE_NUM AS DECIMAL(18, 0))))) THEN 0 ELSE 0 END), 10))) THEN 0 ELSE 0 END), 10) as a3, SUBSTR(SUBSTR(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.tx_NUM AS DECIMAL(18, 0))), (CASE WHEN 0 < (- LENGTH(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.tx_NUM AS DECIMAL(18, 0))))) THEN 0 ELSE 0 END), 20), (CASE WHEN 0 < (- LENGTH(SUBSTR(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.tx_NUM AS DECIMAL(18, 0))), (CASE WHEN 0 < (- LENGTH(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.tx_NUM AS DECIMAL(18, 0))))) THEN 0 ELSE 0 END), 20))) THEN 0 ELSE 0 END), 20) as a4, CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.LOYALTY_DEV_NUM AS DECIMAL(28, 0)) as a5, CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.tx_DT AS TIMESTAMP) as a6, CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.ETL_LOAD_DT AS TIMESTAMP) as a7, w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.tx_TS as a8
FROM
sourcedb.W5883634877684653839_Read_lcl_tlog_raw_2_VIEW__m_CTM_RAWTLogData_target_tbl
)
single_use_subq30725
WHERE (single_use_subq39991.a0 < single_use_subq30725.a0) AND (single_use_subq39991.a1 <= single_use_subq30725.a7)
As this query is generated in Hive pushdown mode, we have added the following settings in the environment SQL:
SET hive.vectorized.execution.enabled=true;
SET hive.vectorized.execution.reduce.enabled=true;
SET hive.cbo.enable=true;
SET hive.compute.query.using.stats=true;
set hive.exec.orc.split.strategy=BI;
set hive.merge.tezfiles=true;
But we did not see any significant gains.
We do have the option of moving this job to traditional batch mode, where we run it via a shell script.
Is there any scope for changing the query there to reduce the execution time? I am sure we can get rid of all the type conversions and cut the execution time that way.
Are there any additional things we can try?
This join in your query:
JOIN (
SELECT
... SKIPPED ...
)
single_use_subq30725
WHERE (single_use_subq39991.a0 < single_use_subq30725.a0) AND (single_use_subq39991.a1 <= single_use_subq30725.a7)
works as a CROSS JOIN because no ON condition is specified.
After this CROSS JOIN, the dataset is filtered using the WHERE clause: WHERE (single_use_subq39991.a0 < single_use_subq30725.a0) AND (single_use_subq39991.a1 <= single_use_subq30725.a7)
Actually, it does not multiply rows and should work as a MAP JOIN, because the first subquery returns at most one row:
SELECT MAX(t1.a0) as a0, MAX(t1.a1) as a1
Add this setting to enable map-join: set hive.auto.convert.join=true;
Check that the map join appears in the EXPLAIN output.
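As a minimal check, reusing only the audit table and source view from your generated query (a sketch, not the full statement; the projection is reduced to one column):

set hive.auto.convert.join=true;

EXPLAIN
SELECT f.CREATE_TS
FROM (SELECT MAX(last_run_ts) AS a0
      FROM mstr_etl.mstr_load_audit
      WHERE interface_name = 'm_CTM_RAWTLogData_target_tbl') w
JOIN sourcedb.W5883634877684653839_Read_lcl_tlog_raw_2_VIEW__m_CTM_RAWTLogData_target_tbl f
WHERE w.a0 < f.CREATE_TS;

If the conversion happened, the plan shows a Map Join Operator instead of a common (shuffle) join.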
But the biggest problem is not this CROSS (MAP?) join itself: it prevents predicate push-down from working before the join, when the table in the second subquery is read.
I suggest removing the join altogether: calculate the first query once and provide a0 and a1 as parameters in the WHERE clause. That way you eliminate the unnecessary join, and predicate push-down can work directly.
For example, PPD could then be applied to this column: w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.CREATE_TS as a0
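A sketch of that two-step rewrite (the hivevar names are illustrative, the projection list is elided, and the Informatica UDF calls are replaced with plain Hive equivalents):

-- Step 1: compute the watermark once; capture both values in the shell
-- wrapper and pass them back with --hivevar last_run_ts=... last_run_dt=...
SELECT MAX(last_run_ts) AS last_run_ts,
       TO_DATE(MAX(last_run_ts)) AS last_run_dt
FROM mstr_etl.mstr_load_audit
WHERE interface_name = 'm_CTM_RAWTLogData_target_tbl';

-- Step 2: the main query with the join removed; the filters are now
-- literals on base columns, so PPD can prune while reading the table.
SELECT ...   -- SKIPPED: same projections as the generated query
FROM sourcedb.W5883634877684653839_Read_lcl_tlog_raw_2_VIEW__m_CTM_RAWTLogData_target_tbl
WHERE CREATE_TS > '${hivevar:last_run_ts}'
  AND CAST(ETL_LOAD_DT AS TIMESTAMP) >= '${hivevar:last_run_dt}';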
Check PPD and other performance settings: https://stackoverflow.com/a/48296562/2700344
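For instance, these standard PPD-related options are worth verifying (a sketch; defaults and effects vary by Hive version, so confirm them in your environment):

set hive.optimize.ppd=true;           -- predicate push-down itself
set hive.optimize.ppd.storage=true;   -- push predicates down to the storage layer
set hive.optimize.index.filter=true;  -- push filters into ORC readers for row-group pruning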
Related
How to get the data from Oracle for the following requirement?
The table is like this:

bh  sl   productdate
a1  100  2022-1-1
a1  220  2022-1-2
a1  220  2022-1-3
a2  200  2022-1-1
a2  350  2022-1-2
a2  350  2022-1-3

The result should be like this:

bh  sl_q(sl_before)  sl_h(sl_after)  sl_b(changeValue)  productdate
a1  100              220             120                2022-1-2
a2  200              350             150                2022-1-2

Rule: within the same bh, when the field sl changes, get that record.
We can use a ROW_NUMBER trick here:

WITH cte AS (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY bh ORDER BY productdate) rn1,
           ROW_NUMBER() OVER (PARTITION BY bh ORDER BY productdate DESC) rn2
    FROM yourTable t
)
SELECT bh,
       MAX(CASE WHEN rn1 = 1 THEN sl END) AS sl_q,
       MAX(CASE WHEN rn2 = 1 THEN sl END) AS sl_h,
       MAX(CASE WHEN rn2 = 1 THEN sl END) - MAX(CASE WHEN rn1 = 1 THEN sl END) AS sl_b
FROM cte
GROUP BY bh;
How to use ClickHouse partition value in SQL query?
I have a table with tuple partitions: (0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), (3, 0), ...

CREATE TABLE my_table
(
    id Int32,
    a Int32,
    b Float32,
    c Int32
)
ENGINE = MergeTree
PARTITION BY (intDiv(id, 1000000), a < 20000 AND b > 0.6 AND c >= 100)
ORDER BY id;

I need only rows with partition (<any number>, 1), and I'm looking for a way to use the partition value in a query, like:

SELECT * FROM my_table WHERE my_table.partition[2] == 1;

Does ClickHouse have such a feature?
In version 21.6, the virtual columns _partition_id and _partition_value were added, which can help you:

SELECT *, _partition_id, _partition_value
FROM my_table
WHERE (_partition_value.2) = 1
And what is the problem with where (a < 20000 AND b > 0.6 AND c >= 100) = 1?

insert into my_table select 1, 3000000, 0, 0 from numbers(100000000);
insert into my_table select 1, 0, 10, 200 from numbers(100);

SET send_logs_level = 'debug';
set force_index_by_date=1;

select sum(id) from my_table where (a < 20000 AND b > 0.6 AND c >= 100) = 1;

...Selected 1/7 parts by partition key...

┌─sum(id)─┐
│     100 │
└─────────┘

1 rows in set. Elapsed: 0.002 sec.

Though (_partition_value.2) = 1 will be faster, because it does not require reading columns a, b, and c for filtering.
Oracle Segment Does Not Equal Extents?
For a given tablespace, why doesn't the sum of bytes in dba_extents equal the sum of bytes in dba_segments? (Additional questions after the sample script.)

SQL> with "SEG" as (
         select 'segment_bytes' what
              , to_char(sum(bytes), '9,999,999,999,999') bytes
         from dba_segments
         where tablespace_name = 'MYDATA'
     )
     , "EXT" as (
         select 'extent_bytes' what
              , to_char(sum(bytes), '9,999,999,999,999') bytes
         from dba_extents
         where tablespace_name = 'MYDATA'
     )
     , "FS" as (
         select tablespace_name
              , sum(bytes) free_bytes
         from dba_free_space
         where tablespace_name = 'MYDATA'
         group by tablespace_name
     )
     , "DF" as (
         select tablespace_name
              , sum(bytes) alloc_bytes
              , sum(user_bytes) user_bytes
         from dba_data_files
         where tablespace_name = 'MYDATA'
         group by tablespace_name
     )
     select what, bytes from SEG
     union all
     select 'datafile_bytes-freespace' what
          , to_char(alloc_bytes - nvl(free_bytes, 0), '9,999,999,999,999') used_file_bytes
     from DF left join FS on DF.tablespace_name = FS.tablespace_name
     union all
     select 'datafile_userbytes-freespace' what
          , to_char(user_bytes - nvl(free_bytes, 0), '9,999,999,999,999') used_user_bytes
     from DF left join FS on DF.tablespace_name = FS.tablespace_name
     union all
     select what, bytes from EXT;

WHAT                         BYTES
---------------------------- ------------------
segment_bytes                 2,150,514,819,072
datafile_bytes-freespace      2,150,528,540,672
datafile_userbytes-freespace  2,150,412,845,056
extent_bytes                  2,150,412,845,056

4 rows selected.

I would have expected segment_bytes to equal either extent_bytes or datafile_bytes-freespace, but it falls somewhere in between. Is segment_bytes more than extent_bytes due to segment "overhead" (keeping track of all of the extents)? If so, then is it also true that this segment "overhead" is part of the datafile "overhead"? Oracle 19.1 Enterprise Edition. Thanks in advance.
For example, the difference between dba_segments and dba_extents might be in the objects from the recyclebin. Please look at the results from my test database:

with seg as (
    select segment_name, sum(bytes) b1
    from dba_segments
    group by segment_name
)
, ext as (
    select segment_name, sum(bytes) b2
    from dba_extents
    group by segment_name
)
select seg.segment_name seg1
     , ext.segment_name seg2
     , b1, b2
from seg
full outer join ext on seg.segment_name = ext.segment_name
where lnnvl(b1 = b2)
order by 1, 2;

Results:

SEG1                           SEG2                                   B1         B2
------------------------------ ------------------------------ ---------- ----------
BIN$xi7yNJwFcIrgUwIAFaxDaA==$0                                      65536
BIN$xi7yNJwGcIrgUwIAFaxDaA==$0                                      65536
_SYSSMU10_2262159254$          _SYSSMU10_2262159254$                   0    4325376
_SYSSMU1_3588498444$           _SYSSMU1_3588498444$                    0    3276800
_SYSSMU2_2971032042$           _SYSSMU2_2971032042$                    0    2228224
_SYSSMU3_3657342154$           _SYSSMU3_3657342154$                    0    2228224
_SYSSMU4_811969446$            _SYSSMU4_811969446$                     0    2293760
_SYSSMU5_3018429039$           _SYSSMU5_3018429039$                    0    3276800
_SYSSMU6_442110264$            _SYSSMU6_442110264$                     0    2228224
_SYSSMU7_2728255665$           _SYSSMU7_2728255665$                    0    2097152
_SYSSMU8_801938064$            _SYSSMU8_801938064$                     0    2228224
_SYSSMU9_647420285$            _SYSSMU9_647420285$                     0    3276800

12 rows selected.

As you can see, the first 2 rows are objects from the recyclebin, so you can run the same query and check whether your objects are in the recyclebin too. They are not visible in dba_extents because they are filtered out by segment_flags:

select text_vc from dba_views where view_name = 'DBA_EXTENTS';

select ds.owner, ds.segment_name, ds.partition_name, ds.segment_type, ds.tablespace_name,
       e.ext#, f.file#, e.block#, e.length * ds.blocksize, e.length, e.file#
from sys.uet$ e, sys.sys_dba_segs ds, sys.file$ f
where e.segfile# = ds.relative_fno
  and e.segblock# = ds.header_block
  and e.ts# = ds.tablespace_id
  and e.ts# = f.ts#
  and e.file# = f.relfile#
  and bitand(NVL(ds.segment_flags, 0), 1) = 0
  and bitand(NVL(ds.segment_flags, 0), 65536) = 0
union all
select ds.owner, ds.segment_name, ds.partition_name, ds.segment_type, ds.tablespace_name,
       e.ktfbueextno, f.file#, e.ktfbuebno, e.ktfbueblks * ds.blocksize, e.ktfbueblks, e.ktfbuefno
from sys.sys_dba_segs ds, sys.x$ktfbue e, sys.file$ f
where e.ktfbuesegfno = ds.relative_fno
  and e.ktfbuesegbno = ds.header_block
  and e.ktfbuesegtsn = ds.tablespace_id
  and ds.tablespace_id = f.ts#
  and e.ktfbuefno = f.relfile#
  and bitand(NVL(ds.segment_flags, 0), 1) = 1
  and bitand(NVL(ds.segment_flags, 0), 65536) = 0;

So if we comment out those predicates (bitand(NVL(segment_flags,0), ...)) and check our difference (the BIN$... and _SYSSMU... objects), we will find which predicates filter them out:

with my_dba_extents (OWNER, SEGMENT_NAME, PARTITION_NAME, SEGMENT_TYPE, TABLESPACE_NAME,
                     EXTENT_ID, FILE_ID, BLOCK_ID, BYTES, BLOCKS, RELATIVE_FNO, segment_flags) as (
    select ds.owner, ds.segment_name, ds.partition_name, ds.segment_type, ds.tablespace_name,
           e.ext#, f.file#, e.block#, e.length * ds.blocksize, e.length, e.file#, segment_flags
    from sys.uet$ e, sys.sys_dba_segs ds, sys.file$ f
    where e.segfile# = ds.relative_fno
      and e.segblock# = ds.header_block
      and e.ts# = ds.tablespace_id
      and e.ts# = f.ts#
      and e.file# = f.relfile#
      -- and bitand(NVL(ds.segment_flags,0), 1) = 0
      -- and bitand(NVL(ds.segment_flags,0), 65536) = 0
    union all
    select ds.owner, ds.segment_name, ds.partition_name, ds.segment_type, ds.tablespace_name,
           e.ktfbueextno, f.file#, e.ktfbuebno, e.ktfbueblks * ds.blocksize, e.ktfbueblks, e.ktfbuefno, segment_flags
    from sys.sys_dba_segs ds, sys.x$ktfbue e, sys.file$ f
    where e.ktfbuesegfno = ds.relative_fno
      and e.ktfbuesegbno = ds.header_block
      and e.ktfbuesegtsn = ds.tablespace_id
      and ds.tablespace_id = f.ts#
      and e.ktfbuefno = f.relfile#
      -- and bitand(NVL(ds.segment_flags, 0), 1) = 1
      -- and bitand(NVL(ds.segment_flags,0), 65536) = 0
)
select segment_name
     , bitand(NVL(segment_flags, 0), 1) as predicate_1
     , bitand(NVL(segment_flags, 0), 65536) as predicate_2
     , case when bitand(NVL(segment_flags, 0), 1) = 0 then 'y' else 'n' end pred_1_res
     , case when bitand(NVL(segment_flags, 0), 65536) = 0 then 'y' else 'n' end pred_2_res
from my_dba_extents e
where e.segment_name like 'BIN%'
   or e.segment_name like '_SYSSMU%';

SEGMENT_NAME                   PREDICATE_1 PREDICATE_2 PRED_1_RES PRED_2_RES
------------------------------ ----------- ----------- ---------- ----------
_SYSSMU1_3588498444$                     1           0 n          y
_SYSSMU1_3588498444$                     1           0 n          y
_SYSSMU1_3588498444$                     1           0 n          y
_SYSSMU1_3588498444$                     1           0 n          y
_SYSSMU1_3588498444$                     1           0 n          y
_SYSSMU2_2971032042$                     1           0 n          y
_SYSSMU2_2971032042$                     1           0 n          y
...
_SYSSMU10_2262159254$                    1           0 n          y
_SYSSMU10_2262159254$                    1           0 n          y
_SYSSMU10_2262159254$                    1           0 n          y
BIN$xi7yNJwGcIrgUwIAFaxDaA==$0           1       65536 n          n
BIN$xi7yNJwFcIrgUwIAFaxDaA==$0           1       65536 n          n

Re "datafile_bytes-freespace": don't forget that each datafile has its own header, so neither dba_segments nor dba_extents should count it.

PS. The other 10 rows are undo segments, but that is not your case, since your query checks just your MYDATA tablespace, not UNDO.
How do I pick values depending upon other values?
I have a table with the data shown below:

no   s  d
100  I  D
100  C  D
101  C  null
101  I  null
102  C  D
102  I  null

Then I'm using this query to partition:

create table pinky nologging as
select no, status, dead
from (select no, status, dead,
             row_number() over(partition by no order by dead desc) seq
      from PINK) d
where seq = 1;

I'm getting these results:

100  I  D
101  C  null
102  I  null

But I want the data shown below:

100  C  D
101  I  NULL
102  I  NULL

That is, for an I and C combination where both d values are D, pick C; for an I and C combination where both d values are null, pick I; and for an I and C combination where the d values are null and D, pick the row whose d is null.
Assuming that there can be only one dead is null record for each no:

with
-- your data, remove it when running the query
-- in your environment ...
pink (no, status, dead) as (
    select 100, 'I', 'D' from dual union
    select 100, 'C', 'D' from dual union
    select 101, 'C', null from dual union
    select 101, 'I', null from dual union
    select 102, 'C', 'D' from dual union
    select 102, 'I', null from dual
),
-- ... end of your data
temp as (
    -- a temporary table (CTE) where we make some
    -- preliminary calculations
    select pink.*,
           -- count rows with status = 'I' for this no
           sum(case when status = 'I' then 1 else 0 end) over(partition by no) ni,
           -- count rows with status = 'C' for this no
           sum(case when status = 'C' then 1 else 0 end) over(partition by no) nc,
           -- count rows with dead = 'D' for this no
           sum(case when dead = 'D' then 1 else 0 end) over(partition by no) nd,
           -- total number of rows (in case it's not always = 2)
           count(*) over(partition by no) n
    from pink
)
select no, status, dead
from temp
where
      -- pick 'C' if there's also 'I' and all dead = 'D'
      status = 'C' and ni > 0 and nd = n
      -- pick 'I' if there's also 'C' and all dead is null
   or status = 'I' and nc > 0 and nd = 0
      -- pick dead is null if there are I's and C's and
      -- all other dead's = 'D'
   or dead is null and ni > 0 and nc > 0 and n - nd = 1;
calculate percentage of two select counts
I have a query like:

select count(1) from table_a where state=1;         -- it gives 20
select count(1) from table_a where state in (1,2);  -- it gives 25

I would like to have a query that extracts the percentage 80% (which will be 20*100/25). Is it possible to have this in only one query?
I think, without testing, that the following SQL command can do that:

SELECT SUM(CASE WHEN STATE = 1 THEN 1 ELSE 0 END)
     / SUM(CASE WHEN STATE IN (1,2) THEN 1 ELSE 0 END) as PERCENTAGE
FROM TABLE_A

or the following:

SELECT S1 / (S1 + S2) as S1_PERCENTAGE
FROM (
    SELECT SUM(CASE WHEN STATE = 1 THEN 1 ELSE 0 END) as S1
         , SUM(CASE WHEN STATE = 2 THEN 1 ELSE 0 END) as S2
    FROM TABLE_A
)

or the following:

SELECT S1 / T as S1_PERCENTAGE
FROM (
    SELECT SUM(CASE WHEN STATE = 1 THEN 1 ELSE 0 END) as S1
         , SUM(CASE WHEN STATE IN (1,2) THEN 1 ELSE 0 END) as T
    FROM TABLE_A
)

You have the choice between performance and readability!
Just as a slight variation on #schlebe's first query, you can continue to use count() by making it conditional:

select count(case when state = 1 then state end)
     / count(case when state in (1, 2) then state end) as result
from table_a

or multiplying by 100 to get a percentage instead of a decimal:

select 100 * count(case when state = 1 then state end)
           / count(case when state in (1,2) then state end) as percentage
from table_a

Count ignores nulls, and both of the case expressions default to null if their conditions are not met (you could have else null to make it explicit too).

Quick demo with a CTE for dummy data:

with table_a(state) as (
    select 1 from dual connect by level <= 20
    union all
    select 2 from dual connect by level <= 5
    union all
    select 3 from dual connect by level <= 42
)
select 100 * count(case when state = 1 then state end)
           / count(case when state in (1,2) then state end) as percentage
from table_a;

PERCENTAGE
----------
        80
Why the plsql tag? Regardless, I think what you need is:

select (select count(1) from table_a where state = 1) * 100
     / (select count(1) from table_a where state in (1,2)) as percentage
from dual;