Query taking time despite adding session settings - hadoop

The following is the ETL-generated query:
SELECT infaHiveSysTimestamp('SS') as a0, 7991 as a1, single_use_subq30725.a1 as a2, SUBSTR(SUBSTR(single_use_subq30725.a2, 0, 5), 0, 5) as a3, CAST(1 AS SMALLINT) as a4, single_use_subq30725.a3 as a5, single_use_subq30725.a4 as a6, SUBSTR(SUBSTR(SUBSTR(single_use_subq30725.a8, (CASE WHEN 12 < (- LENGTH(single_use_subq30725.a8)) THEN 0 ELSE 12 END), 104857600), 0, 20), 0, 20) as a7, infaNativeUDFCallString('TO_CHAR', single_use_subq30725.a5) as a8, infaHiveSysTimestamp('SS') as a9, CAST(infaNativeUDFCallDate('TRUNC', single_use_subq30725.a6, 'DD') AS DATE) as a10 FROM (SELECT (CASE WHEN 1 = t1.a1 THEN t1.a0 ELSE CAST(NULL AS TIMESTAMP) END) as a0, infaNativeUDFCallDate('TRUNC', (CASE WHEN 1 = t1.a1 THEN t1.a0 ELSE CAST(NULL AS TIMESTAMP) END), 'DD') as a1
FROM
(
SELECT MAX(t1.a0) as a0, MAX(t1.a1) as a1
FROM (
SELECT mstr_load_audit.last_run_ts as a0, 1 as a1 FROM mstr_etl.mstr_load_audit WHERE interface_name='m_CTM_RAWTLogData_target_tbl'
) t1
)t1
) single_use_subq39991
JOIN (
SELECT w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.CREATE_TS as a0, CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.txACTION_ID AS STRING) as a1, SUBSTR(SUBSTR(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.STORE_NUM AS DECIMAL(18, 0))), (CASE WHEN 0 < (- LENGTH(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.STORE_NUM AS DECIMAL(18, 0))))) THEN 0 ELSE 0 END), 10), (CASE WHEN 0 < (- LENGTH(SUBSTR(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.STORE_NUM AS DECIMAL(18, 0))), (CASE WHEN 0 < (- LENGTH(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.STORE_NUM AS DECIMAL(18, 0))))) THEN 0 ELSE 0 END), 10))) THEN 0 ELSE 0 END), 10) as a2, SUBSTR(SUBSTR(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.LANE_NUM AS DECIMAL(18, 0))), (CASE WHEN 0 < (- LENGTH(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.LANE_NUM AS DECIMAL(18, 0))))) THEN 0 ELSE 0 END), 10), (CASE WHEN 0 < (- LENGTH(SUBSTR(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.LANE_NUM AS DECIMAL(18, 0))), (CASE WHEN 0 < (- LENGTH(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.LANE_NUM AS DECIMAL(18, 0))))) THEN 0 ELSE 0 END), 10))) THEN 0 ELSE 0 END), 10) as a3, SUBSTR(SUBSTR(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.tx_NUM AS DECIMAL(18, 0))), (CASE WHEN 0 < (- LENGTH(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.tx_NUM AS DECIMAL(18, 0))))) THEN 0 ELSE 0 END), 20), (CASE WHEN 0 < (- LENGTH(SUBSTR(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.tx_NUM AS DECIMAL(18, 0))), (CASE WHEN 0 < (- LENGTH(infaNativeUDFCallString('TO_CHAR', CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.tx_NUM AS DECIMAL(18, 0))))) THEN 0 ELSE 0 END), 20))) THEN 0 ELSE 0 END), 20) as a4, CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.LOYALTY_DEV_NUM AS DECIMAL(28, 0)) as a5, CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.tx_DT AS TIMESTAMP) as a6, CAST(w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.ETL_LOAD_DT AS TIMESTAMP) as a7, w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.tx_TS as a8
FROM
sourcedb.W5883634877684653839_Read_lcl_tlog_raw_2_VIEW__m_CTM_RAWTLogData_target_tbl
)
single_use_subq30725
WHERE (single_use_subq39991.a0 < single_use_subq30725.a0) AND (single_use_subq39991.a1 <= single_use_subq30725.a7)
As this query is generated in Hive pushdown mode, we have added the following settings in the environment SQL:
SET hive.vectorized.execution.enabled=true;
SET hive.vectorized.execution.reduce.enabled=true;
SET hive.cbo.enable=true;
SET hive.compute.query.using.stats=true;
set hive.exec.orc.split.strategy=BI;
set hive.merge.tezfiles=true;
But we did not see any significant gains.
We do have the option of moving this job to traditional batch mode, where we would run it via a shell script.
Is there any scope for changing the query there to reduce the execution time? I am sure we can get rid of all the type conversions and reduce the execution time that way.
Are there any additional things we can try?

This join in your query:
JOIN (
SELECT
... SKIPPED ...
)
single_use_subq30725
WHERE (single_use_subq39991.a0 < single_use_subq30725.a0) AND (single_use_subq39991.a1 <= single_use_subq30725.a7)
works as a CROSS JOIN because no ON condition is specified.
After this CROSS JOIN, the dataset is filtered using WHERE (single_use_subq39991.a0 < single_use_subq30725.a0) AND (single_use_subq39991.a1 <= single_use_subq30725.a7).
Actually it does not multiply rows and should work as a MAP JOIN, because the first subquery returns at most one row:
SELECT MAX(t1.a0) as a0, MAX(t1.a1) as a1
Add this setting to enable map joins: set hive.auto.convert.join=true;
Check that a map join appears in the EXPLAIN output.
But the biggest problem is not this CROSS (MAP?) join itself: it prevents predicate push-down from working before the join when reading the table in the second subquery.
I suggest removing the join altogether: calculate the first query once and provide a0 and a1 as parameters in the WHERE clause. That way you eliminate the unnecessary join, and predicate push-down can work directly.
For example, PPD could be applied to this column: w5883634877684653839_read_lcl_tlog_raw_2_view__m_ctm_rawtlogdata_target_tbl.CREATE_TS as a0
Check PPD and other performance settings: https://stackoverflow.com/a/48296562/2700344
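A minimal sketch of that two-step approach, assuming the job is driven from a shell script using Beeline with hiveconf variable substitution (the variable names last_run_ts and last_run_day, and the file name main_query.hql, are hypothetical):
-- Step 1: compute the watermark once; it returns at most one row.
SELECT MAX(last_run_ts)
FROM mstr_etl.mstr_load_audit
WHERE interface_name = 'm_CTM_RAWTLogData_target_tbl';
-- Step 2: capture the two values in the shell and pass them back in, e.g.
--   beeline --hiveconf last_run_ts='2024-01-01 10:15:00' \
--           --hiveconf last_run_day='2024-01-01 00:00:00' -f main_query.hql
-- Inside main_query.hql, filter the source table directly, so predicate
-- push-down can prune rows while reading instead of after a join:
SELECT t.*  -- in practice, the same projections the ETL tool generates
FROM sourcedb.W5883634877684653839_Read_lcl_tlog_raw_2_VIEW__m_CTM_RAWTLogData_target_tbl t
WHERE t.CREATE_TS > CAST('${hiveconf:last_run_ts}' AS TIMESTAMP)
  AND CAST(t.ETL_LOAD_DT AS TIMESTAMP) >= CAST('${hiveconf:last_run_day}' AS TIMESTAMP);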

Related

How to get the data from Oracle for the following requirement?

The table looks like this:
bh   sl   productdate
a1   100  2022-1-1
a1   220  2022-1-2
a1   220  2022-1-3
a2   200  2022-1-1
a2   350  2022-1-2
a2   350  2022-1-3
The result should look like this:
bh   sl_q (sl_before)   sl_h (sl_after)   sl_b (changeValue)   productdate
a1   100                220               120                  2022-1-2
a2   200                350               150                  2022-1-2
Rule: for the same bh, when the field sl changes, get that record.
We can use a ROW_NUMBER trick here:
WITH cte AS (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY bh ORDER BY productdate) rn1,      -- 1 = earliest row per bh
           ROW_NUMBER() OVER (PARTITION BY bh ORDER BY productdate DESC) rn2  -- 1 = latest row per bh
    FROM yourTable t
)
SELECT bh,
       MAX(CASE WHEN rn1 = 1 THEN sl END) AS sl_q,  -- first value
       MAX(CASE WHEN rn2 = 1 THEN sl END) AS sl_h,  -- last value
       MAX(CASE WHEN rn2 = 1 THEN sl END) -
       MAX(CASE WHEN rn1 = 1 THEN sl END) AS sl_b   -- change
FROM cte
GROUP BY bh;
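If you also need the productdate on which the value changed, as in the expected output, a variant using LAG works too, assuming sl changes at most once per bh:
WITH diffs AS (
    SELECT t.*,
           LAG(sl) OVER (PARTITION BY bh ORDER BY productdate) AS prev_sl  -- previous row's sl
    FROM yourTable t
)
SELECT bh,
       prev_sl      AS sl_q,  -- value before the change
       sl           AS sl_h,  -- value after the change
       sl - prev_sl AS sl_b,  -- change amount
       productdate            -- date the change appeared
FROM diffs
WHERE prev_sl IS NOT NULL
  AND sl <> prev_sl;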

How to use ClickHouse partition value in SQL query?

I have a table with tuple partitions: (0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), (3, 0), ...
CREATE TABLE my_table
(
id Int32,
a Int32,
b Float32,
c Int32
)
ENGINE = MergeTree
PARTITION BY
(
intDiv(id, 1000000),
a < 20000 AND b > 0.6 AND c >= 100
)
ORDER BY id;
I need only rows with partition (<any number>, 1), and I'm looking for a way to use the partition value in a query like
SELECT *
FROM my_table
WHERE my_table.partition[2] == 1;
Does ClickHouse have such a feature?
Virtual columns _partition_id and _partition_value were added in version 21.6; they can help you:
SELECT
*,
_partition_id,
_partition_value
FROM my_table
WHERE (_partition_value.2) = 1
And what is the problem with
where (a < 20000 AND b > 0.6 AND c >= 100) = 1
???
insert into my_table select 1, 3000000, 0, 0 from numbers(100000000);
insert into my_table select 1, 0, 10, 200 from numbers(100);
SET send_logs_level = 'debug';
set force_index_by_date=1;
select sum(id) from my_table where (a < 20000 AND b > 0.6 AND c >= 100) = 1;
...Selected 1/7 parts by partition key...
┌─sum(id)─┐
│ 100 │
└─────────┘
1 rows in set. Elapsed: 0.002 sec.
Though (_partition_value.2) = 1 will be faster, because it does not require reading columns a, b, and c for filtering.
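A quick way to confirm the pruning, reusing the debug logging from the demo above (expect a "Selected N/M parts by partition key" line in the logs):
SET send_logs_level = 'debug';
SELECT sum(id)
FROM my_table
WHERE (_partition_value.2) = 1;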

Oracle Segment Does Not Equal Extents?

For a given tablespace, why doesn't the sum of bytes in dba_extents equal the sum of bytes in dba_segments? (Additional questions after the sample script.)
SQL> with
"SEG" as
( select 'segment_bytes' what
, to_char(sum(bytes), '9,999,999,999,999') bytes
from dba_segments
where tablespace_name = 'MYDATA'
)
, "EXT" as
( select 'extent_bytes' what
, to_char(sum(bytes), '9,999,999,999,999') bytes
from dba_extents
where tablespace_name = 'MYDATA'
)
, "FS" as
( select tablespace_name
, sum(bytes) free_bytes
from dba_free_space
where tablespace_name = 'MYDATA'
group by tablespace_name
),
"DF" as
( select tablespace_name
, sum(bytes) alloc_bytes
, sum(user_bytes) user_bytes
from dba_data_files
where tablespace_name = 'MYDATA'
group by tablespace_name
)
select what, bytes from SEG
union all select 'datafile_bytes-freespace' what
, to_char(alloc_bytes - nvl(free_bytes, 0), '9,999,999,999,999') used_file_bytes
from DF
left join FS
on DF.tablespace_name = FS.tablespace_name
union all select 'datafile_userbytes-freespace' what
, to_char(user_bytes - nvl(free_bytes, 0), '9,999,999,999,999') used_user_bytes
from DF
left join FS
on DF.tablespace_name = FS.tablespace_name
union all select what, bytes from EXT
;
WHAT BYTES
---------------------------- ------------------
segment_bytes 2,150,514,819,072
datafile_bytes-freespace 2,150,528,540,672
datafile_userbytes-freespace 2,150,412,845,056
extent_bytes 2,150,412,845,056
4 rows selected.
I would have expected segment_bytes to equal either extent_bytes or datafile_bytes-freespace, but it falls somewhere in between.
Is segment_bytes more than extent_bytes due to segment "overhead" (keeping track of all of the extents)?
If so, then is it also true that this segment "overhead" is part of the datafile "overhead"?
Oracle 19.1 Enterprise Edition. Thanks in advance.
For example, the difference between dba_segments and dba_extents might be in objects from the recyclebin; please look at the results from my test database:
with
seg as (
select segment_name,sum(bytes) b1
from dba_segments
group by segment_name
)
,ext as (
select segment_name,sum(bytes) b2
from dba_extents
group by segment_name
)
select
seg.segment_name seg1
,ext.segment_name seg2
,b1,b2
from seg full outer join ext on seg.segment_name=ext.segment_name
where lnnvl(b1=b2)
order by 1,2;
Results:
SEG1 SEG2 B1 B2
------------------------------ ------------------------------ ---------- ----------
BIN$xi7yNJwFcIrgUwIAFaxDaA==$0 65536
BIN$xi7yNJwGcIrgUwIAFaxDaA==$0 65536
_SYSSMU10_2262159254$ _SYSSMU10_2262159254$ 0 4325376
_SYSSMU1_3588498444$ _SYSSMU1_3588498444$ 0 3276800
_SYSSMU2_2971032042$ _SYSSMU2_2971032042$ 0 2228224
_SYSSMU3_3657342154$ _SYSSMU3_3657342154$ 0 2228224
_SYSSMU4_811969446$ _SYSSMU4_811969446$ 0 2293760
_SYSSMU5_3018429039$ _SYSSMU5_3018429039$ 0 3276800
_SYSSMU6_442110264$ _SYSSMU6_442110264$ 0 2228224
_SYSSMU7_2728255665$ _SYSSMU7_2728255665$ 0 2097152
_SYSSMU8_801938064$ _SYSSMU8_801938064$ 0 2228224
_SYSSMU9_647420285$ _SYSSMU9_647420285$ 0 3276800
12 rows selected.
As you can see, the first two rows are objects from the recyclebin, so you can run the same query and check whether your objects are in the recyclebin too. They are not visible in dba_extents because they are filtered out by segment_flags:
select text_vc from dba_views where view_name='DBA_EXTENTS';
select ds.owner, ds.segment_name, ds.partition_name, ds.segment_type,
ds.tablespace_name,
e.ext#, f.file#, e.block#, e.length * ds.blocksize, e.length, e.file#
from sys.uet$ e, sys.sys_dba_segs ds, sys.file$ f
where e.segfile# = ds.relative_fno
and e.segblock# = ds.header_block
and e.ts# = ds.tablespace_id
and e.ts# = f.ts#
and e.file# = f.relfile#
and bitand(NVL(ds.segment_flags,0), 1) = 0
and bitand(NVL(ds.segment_flags,0), 65536) = 0
union all
select
ds.owner, ds.segment_name, ds.partition_name, ds.segment_type,
ds.tablespace_name,
e.ktfbueextno, f.file#, e.ktfbuebno,
e.ktfbueblks * ds.blocksize, e.ktfbueblks, e.ktfbuefno
from sys.sys_dba_segs ds, sys.x$ktfbue e, sys.file$ f
where e.ktfbuesegfno = ds.relative_fno
and e.ktfbuesegbno = ds.header_block
and e.ktfbuesegtsn = ds.tablespace_id
and ds.tablespace_id = f.ts#
and e.ktfbuefno = f.relfile#
and bitand(NVL(ds.segment_flags, 0), 1) = 1
and bitand(NVL(ds.segment_flags,0), 65536) = 0;
So if we comment out those predicates (the bitand(NVL(segment_flags,0), ...) conditions) and check our difference (the BIN$... and _SYSSMU... objects), we will find which predicates filter them out:
with
my_dba_extents(
OWNER,SEGMENT_NAME,PARTITION_NAME
,SEGMENT_TYPE,TABLESPACE_NAME,EXTENT_ID,FILE_ID
,BLOCK_ID,BYTES,BLOCKS,RELATIVE_FNO
,segment_flags)
as (
select ds.owner, ds.segment_name, ds.partition_name, ds.segment_type,
ds.tablespace_name,
e.ext#, f.file#, e.block#, e.length * ds.blocksize, e.length, e.file#
,segment_flags
from sys.uet$ e, sys.sys_dba_segs ds, sys.file$ f
where e.segfile# = ds.relative_fno
and e.segblock# = ds.header_block
and e.ts# = ds.tablespace_id
and e.ts# = f.ts#
and e.file# = f.relfile#
-- and bitand(NVL(ds.segment_flags,0), 1) = 0
-- and bitand(NVL(ds.segment_flags,0), 65536) = 0
union all
select
ds.owner, ds.segment_name, ds.partition_name, ds.segment_type,
ds.tablespace_name,
e.ktfbueextno, f.file#, e.ktfbuebno,
e.ktfbueblks * ds.blocksize, e.ktfbueblks, e.ktfbuefno
,segment_flags
from sys.sys_dba_segs ds, sys.x$ktfbue e, sys.file$ f
where e.ktfbuesegfno = ds.relative_fno
and e.ktfbuesegbno = ds.header_block
and e.ktfbuesegtsn = ds.tablespace_id
and ds.tablespace_id = f.ts#
and e.ktfbuefno = f.relfile#
-- and bitand(NVL(ds.segment_flags, 0), 1) = 1
-- and bitand(NVL(ds.segment_flags,0), 65536) = 0
)
select
segment_name
,bitand(NVL(segment_flags, 0), 1) as predicate_1
,bitand(NVL(segment_flags,0), 65536) as predicate_2
,case when bitand(NVL(segment_flags,0), 1) = 0 then 'y' else 'n' end pred_1_res
,case when bitand(NVL(segment_flags,0), 65536) = 0 then 'y' else 'n' end pred_2_res
from my_dba_extents e
where e.segment_name like 'BIN%'
or e.segment_name like '_SYSSMU%';
SEGMENT_NAME PREDICATE_1 PREDICATE_2 PRED_1_RES PRED_2_RES
------------------------------ ----------- ----------- -------------- --------------
_SYSSMU1_3588498444$ 1 0 n y
_SYSSMU1_3588498444$ 1 0 n y
_SYSSMU1_3588498444$ 1 0 n y
_SYSSMU1_3588498444$ 1 0 n y
_SYSSMU1_3588498444$ 1 0 n y
_SYSSMU2_2971032042$ 1 0 n y
_SYSSMU2_2971032042$ 1 0 n y
...
_SYSSMU10_2262159254$ 1 0 n y
_SYSSMU10_2262159254$ 1 0 n y
_SYSSMU10_2262159254$ 1 0 n y
BIN$xi7yNJwGcIrgUwIAFaxDaA==$0 1 65536 n n
BIN$xi7yNJwFcIrgUwIAFaxDaA==$0 1 65536 n n
Re "datafile_bytes-freespace": Don't forget that each datafile has own header, so nor dba_segments, nor dba_extents should not count it.
PS. Other 10 rows are undo segments, but that is not your case since your query checks just your MYDATA tablespace, not UNDO.
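If the recyclebin does turn out to explain your difference, a hedged follow-up (assuming DBA access; the 8192 block size is an assumption, so check your tablespace's actual block size):
-- list recyclebin objects in the tablespace and their space usage
select owner, original_name, object_name,
       space * 8192 as bytes  -- SPACE is in blocks; 8192 assumes an 8K block size
from dba_recyclebin
where ts_name = 'MYDATA';
-- if appropriate, reclaim the space
purge tablespace mydata;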

How do I pick values depending on other values?

I have a table with the data shown below:
no   s   d
100  I   D
100  C   D
101  C   null
101  I   null
102  C   D
102  I   null
Then I'm using this query to partition:
create table pinky nologging as
select no, status, dead
from (select no, status, dead,
             row_number() over (partition by no order by dead desc) seq
      from PINK) d
where seq = 1;
I'm getting these results:
100  I   D
101  C   null
102  I   null
but I want the data shown below:
100  C   D
101  I   NULL
102  I   NULL
i.e., for an I and C combination:
when both d values are 'D', pick the C row;
when both d values are null, pick the I row;
when one d is null and the other is 'D', pick the row whose d is null.
Assuming that there can be only one 'dead is null' record for each no:
with
-- your data, remove it when running the query
-- in your environment ...
pink (no, status, dead) as
(select 100, 'I', 'D' from dual union
select 100, 'C', 'D' from dual union
select 101, 'C', null from dual union
select 101, 'I', null from dual union
select 102, 'C', 'D' from dual union
select 102, 'I', null from dual
),
-- ... end of your data
--
temp as ( -- a temporary table (CTE) where we make some
-- preliminary calculations
select pink.*,
-- count rows with status = 'I' for this no
sum(case when status = 'I' then 1 else 0 end) over(partition by no) ni,
-- count rows with status = 'C' for this no
sum(case when status = 'C' then 1 else 0 end) over(partition by no) nc,
-- count rows with dead = 'D' for this no
sum(case when dead = 'D' then 1 else 0 end) over(partition by no) nd,
-- total number of rows (in case it's not always = 2)
count(*) over(partition by no) n
from pink
)
select no, status, dead
from temp
where -- pick 'C' if there's also 'I' and all dead = 'D'
status = 'C' and ni > 0 and nd = n
-- pick 'I' if there's also 'C' and all dead is null
or status = 'I' and nc > 0 and nd = 0
-- pick dead is null if there are I's and C's and
-- all other dead's = 'D'
or dead is null and ni > 0 and nc > 0 and n - nd = 1;
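For the sample data above, this returns exactly the requested rows:
no   status   dead
100  C        D
101  I        (null)
102  I        (null)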

calculate percentage of two select counts

I have a query like
select count(1) from table_a where state=1;
It gives 20.
select count(1) from table_a where state in (1,2);
It gives 25.
I would like a query that extracts the percentage 80% (that is, 20*100/25).
Is it possible to do this in only one query?
I think, without testing, that the following SQL command can do it:
SELECT SUM(CASE WHEN STATE = 1 THEN 1 ELSE 0 END)
/SUM(CASE WHEN STATE IN (1,2) THEN 1 ELSE 0 END)
as PERCENTAGE
FROM TABLE_A
or the following
SELECT S1 / (S1 + S2) as S1_PERCENTAGE
FROM
(
SELECT SUM(CASE WHEN STATE = 1 THEN 1 ELSE 0 END) as S1
,SUM(CASE WHEN STATE = 2 THEN 1 ELSE 0 END) as S2
FROM TABLE_A
)
or the following
SELECT S1 / T as S1_PERCENTAGE
FROM
(
SELECT SUM(CASE WHEN STATE = 1 THEN 1 ELSE 0 END) as S1
,SUM(CASE WHEN STATE IN (1,2) THEN 1 ELSE 0 END) as T
FROM TABLE_A
)
You have the choice between performance and readability!
Just as a slight variation on #schlebe's first query, you can continue to use count() by making that conditional:
select count(case when state = 1 then state end)
/ count(case when state in (1, 2) then state end) as result
from table_a
or multiplying by 100 to get a percentage instead of a decimal:
select 100 * count(case when state = 1 then state end)
/ count(case when state in (1,2) then state end) as percentage
from table_a
Count ignores nulls, and both of the case expressions default to null if their conditions are not met (you could have else null to make it explicit too).
Quick demo with a CTE for dummy data:
with table_a(state) as (
select 1 from dual connect by level <= 20
union all select 2 from dual connect by level <= 5
union all select 3 from dual connect by level <= 42
)
select 100 * count(case when state = 1 then state end)
/ count(case when state in (1,2) then state end) as percentage
from table_a;
PERCENTAGE
----------
80
Why the plsql tag? Regardless, I think what you need is:
select (select count(1) from table_a where state = 1) * 100
       / (select count(1) from table_a where state in (1, 2)) as percentage
from dual;
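As a hedged refinement, in case the denominator can ever be zero (no rows with state 1 or 2), you can guard the division with NULLIF so the query returns null instead of raising ORA-01476:
select (select count(1) from table_a where state = 1) * 100
       / nullif((select count(1) from table_a where state in (1, 2)), 0) as percentage
from dual;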
