Query to find Index bloat - greenplum

I need a query to find whether index bloat on a table. I saw some queries where they are comparing table size with index size. If there is any other approach, please share the query.
I am using Greenplum 4.3 (which is based Postgres 8.2)

Bloat Score Query
The following SQL query will examine each table in the XML schema and identify dead rows (tuples) that are wasting disk space.
SELECT schemaname || '.' || relname as tblnam,
n_dead_tup,
(n_dead_tup::float / n_live_tup::float) * 100 as pfrag
FROM pg_stat_user_tables
WHERE schemaname = 'xml' and n_dead_tup > 0 and n_live_tup > 0 order by pfrag desc;
If this query returns a high percentage ( pfrag ) of dead tuples, the VACUUM command may be used to reclaim space.
7 Considered to be high
From wiki.postgres.org
SELECT
current_database(), schemaname, tablename, /*reltuples::bigint, relpages::bigint, otta,*/
ROUND((CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages::float/otta END)::numeric,1) AS tbloat,
CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::BIGINT END AS wastedbytes,
iname, /*ituples::bigint, ipages::bigint, iotta,*/
ROUND((CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages::float/iotta END)::numeric,1) AS ibloat,
CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes
FROM (
SELECT
schemaname, tablename, cc.reltuples, cc.relpages, bs,
CEIL((cc.reltuples*((datahdr+ma-
(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta,
COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols
FROM (
SELECT
ma,bs,schemaname,tablename,
(datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,
(maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
FROM (
SELECT
schemaname, tablename, hdr, ma, bs,
SUM((1-null_frac)*avg_width) AS datawidth,
MAX(null_frac) AS maxfracsum,
hdr+(
SELECT 1+count(*)/8
FROM pg_stats s2
WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename
) AS nullhdr
FROM pg_stats s, (
SELECT
(SELECT current_setting('block_size')::numeric) AS bs,
CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,
CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
) AS constants
GROUP BY 1,2,3,4,5
) AS foo
) AS rs
JOIN pg_class cc ON cc.relname = rs.tablename
JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema'
LEFT JOIN pg_index i ON indrelid = cc.oid
LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml
ORDER BY wastedbytes DESC

Related

Oracle: Value from main query is not available in subquery

I have this query, and one of its column is a subquery that should be bringing a list of values using a listagg function. This list has its starting point as the S.ID_ORGAO_INTELIGENCIA value. The list is a should be, it always has values.
The listagg function is consuming an inline view that uses a window function to create the list.
select *
from (
SELECT DISTINCT S.ID_SOLICITACAO,
S.NR_PROTOCOLO_SOLICITACAO,
S.DH_INCLUSAO,
S.ID_USUARIO,
U.NR_CPF,
OI.ID_MODULO,
OI.ID_ORGAO_INTELIGENCIA,
OI.NO_ORGAO_INTELIGENCIA,
R.ID_ATRIBUICAO,
P.ID_PERMISSAO,
1 AS TIPO_NOTIFICACAO,
(
select LISTAGG(oc6.ID_ORGAO_INTELIGENCIA || '-' || oc6.ord || '-', '; ') WITHIN GROUP (ORDER BY oc6.ord) eai
from (
SELECT oc1.ID_ORGAO_INTELIGENCIA,
oc1.ID_ORGAO_INTELIGENCIA_PAI,
oc1.SG_ORGAO_INTELIGENCIA,
rownum as ord
FROM TB_ORGAO_INTERNO oc1
WHERE oc1.DH_EXCLUSAO is null
-- THE VALUE FROM S.ID_ORGAO_INTELIGENCIA IS NOT AVAILBLE HERE
START WITH oc1.ID_ORGAO_INTELIGENCIA = S.ID_ORGAO_INTELIGENCIA
CONNECT BY prior oc1.ID_ORGAO_INTELIGENCIA_PAI = oc1.ID_ORGAO_INTELIGENCIA
) oc6) aproPrec
FROM TB_SOLICITACAO S
INNER JOIN TB_ORGAO_INTERNO OI ON S.ID_ORGAO_INTELIGENCIA = OI.ID_ORGAO_INTELIGENCIA
INNER JOIN TB_RELACIONAMENTO_ATRIBUICAO R
ON (R.ID_MODULO = OI.ID_MODULO AND R.ID_ORGAO_INTELIGENCIA IS NULL AND
R.ID_SOLICITACAO IS NULL)
INNER JOIN TB_PERMISSAO P
ON (P.ID_USUARIO = :usuario AND P.ID_ORGAO_INTELIGENCIA = :orgao AND
P.ID_ATRIBUICAO = R.ID_ATRIBUICAO)
INNER JOIN TB_USUARIO U ON (U.ID_USUARIO = S.ID_USUARIO)
WHERE 1 = 1
AND U.DH_EXCLUSAO IS NULL
AND P.DH_EXCLUSAO IS NULL
AND S.DH_EXCLUSAO IS NULL
AND OI.DH_EXCLUSAO IS NULL
AND R.ID_ATRIBUICAO IN :atribuicoes
AND P.ID_STATUS_PERMISSAO = 7
AND OI.ID_MODULO = 1
AND S.ID_STATUS_SOLICITACAO IN (1, 2, 5, 6)
and s.ID_ORGAO_INTELIGENCIA in (SELECT DISTINCT o.ID_ORGAO_INTELIGENCIA
FROM TB_ORGAO_INTERNO o
WHERE o.DH_EXCLUSAO IS NULL
START WITH o.ID_ORGAO_INTELIGENCIA = 3
CONNECT BY PRIOR o.ID_ORGAO_INTELIGENCIA = o.ID_ORGAO_INTELIGENCIA_PAI)
);
The problem is that the aproPrec column is always returning null as its result.
If I force the criteria to have the S.ID_ORGAO_INTELIGENCIA hardcoded, the list returns its true value.
If I chance this:
START WITH oc1.ID_ORGAO_INTELIGENCIA = S.ID_ORGAO_INTELIGENCIA
To this:
START WITH oc1.ID_ORGAO_INTELIGENCIA = 311
where 311 is the value that the S.ID_ORGAO_INTELIGENCIA column really has.
Is there a way to make this query works as 'I think' it should work?
To make it work, I changed the subquery by this another one:
(
select qt_.*
from (
SELECT QRY_NAME.*,
rownum as ord
FROM (
SELECT oc1.ID_ORGAO_INTELIGENCIA,
oc1.ID_ORGAO_INTELIGENCIA_PAI,
connect_by_root (oc1.ID_ORGAO_INTELIGENCIA) as root
FROM TB_ORGAO_INTERNO oc1
CONNECT BY NOCYCLE PRIOR oc1.ID_ORGAO_INTELIGENCIA_PAI = oc1.ID_ORGAO_INTELIGENCIA
) QRY_NAME
WHERE root = s.ID_ORGAO_INTELIGENCIA
) qt_
)

performance of a self join on oracle db

I have this self join that is very slow on oracle DB. I have put indexes on all fields concerned. Does anybody have advice on how to increase performance?
select count(tNew.idtariffa) CONT
from tariffe tAtt
join tariffe tNew on tAtt.idtariffa = tNew.idtariffa
where (tAtt.stato_attivo = 't')
and (tNew.stato_attivo = 'f')
and (tAtt.validity_date < tNew.validity_date)
and (tAtt.dataimport < tNew.dataimport)
and (tNew.validity_date < to_date('2017-6-26','YYYY-MM-DD'))
Try PUSH_PRED hint :
select /*+ NO_MERGE(tNew) PUSH_PRED(tNew) */
count(tNew.idtariffa) CONT
from tariffe tAtt
join tariffe tNew on tAtt.idtariffa = tNew.idtariffa
where (tAtt.stato_attivo = 't')
and (tNew.stato_attivo = 'f')
and (tAtt.validity_date < tNew.validity_date)
and (tAtt.dataimport < tNew.dataimport)
and (tNew.validity_date < to_date('2017-6-26','YYYY-MM-DD'))
Exists version is worth of try:
select count(1) cont
from tariffe n
where stato_attivo = 'f'
and validity_date < date '2017-06-26'
and exists ( select null
from tariffe
where idtariffa = n.idtariffa
and stato_attivo = 't'
and validity_date < n.validity_date
and dataimport < n.dataimport )
Performance tuning without details like data volumes, data skew, index defintions, explain plan, etc is just guessing.
So here are some more guesses :)
Your driving table should be tariffe tNew as that's the one you use to top the result set.
tNew.validity_date < to_date('2017-6-26','YYYY-MM-DD'))
Now, unless tNew.stato_attivo = 'f' is extremely selective you're going to be retrieving a large chunk of all the rows in the table (depending on how far back the records go) so a Full Table Scan would be the most efficient way of grabbing those records.
The join on tariffe tAtt is problematic because idtariffa is not a unique column. So the join is a set of tNew records against a set of tAtt records. These will be filtered in memory using the criteria in the WHERE clause.
" I have put indexes on all fields concerned"
Single column indexes won't help here. You might get some joy from a compound index on all the pertinent columns:
tariffe (stato_attivo , validity_date, idtariffa, dataimport)
This would be worth building if you run this query very often.
Any other guesses? Subquery factoring to hit the main table once. Doing a Full Table Scan just once would speed things up if tariffe has a lot of columns.
with cte as (
select stato_attivo , validity_date, idtariffa, dataimport
from tariffe
where validity_date < to_date('2017-6-26','YYYY-MM-DD'
)
select count(tNew.idtariffa) CONT
from cte tNew
join cte tAtt on tAtt.idtariffa = tNew.idtariffa
where (tAtt.stato_attivo = 't')
and (tNew.stato_attivo = 'f')
and (tAtt.validity_date < tNew.validity_date)
and (tAtt.dataimport < tNew.dataimport)

ORA-01427: Subquery returns more than one row

When I execute the query below, I get the message like this: "ORA-01427: Sub-query returns more than one row"
Define REN_RunDate = '20160219'
Define MOP_ADJ_RunDate = '20160219'
Define RID_RunDate = '20160219'
Define Mbr_Err_RunDate = '20160219'
Define Clm_Err_RunDate = '20160219'
Define EECD_RunDate = '20160219'
select t6.Member_ID, (Select 'Y' from MBR_ERR t7 where t7.Member_ID = t6.Member_ID and t7.Rundate = &Mbr_Err_RunDate ) Mbr_Err,
NVL(Claim_Sent_Amt,0) Sent_Claims, Rejected_Claims,Orphan_Claim_Amt,Claims_Accepted, MOP_Adj_Sent Sent_MOP_Adj,Net_Sent,
(Case
When Net_Sent < 45000 then 0
When Net_Sent > 25000 then 20500
Else
Net_Sent - 45000
End
)Net_Sent_RI,
' ' Spacer,
Total_Paid_Claims CMS_Paid_Claims, MOP_Adjustment CM_MOP_Adj, MOP_Adjusted_Paid_claims CM_Net_Claims, Estimated_RI_Payment CM_RI_Payment
from
(
select NVL(t3.Member_ID,t5.Member_ID)Member_ID, t3.Claim_Sent_Amt, NVL(t4.Reject_Claims_Amt,0) Rejected_Claims, NVL( t8.Orphan_Amt,0) Orphan_Claim_Amt,
(t3.Claim_Sent_Amt - NVL(t4.Reject_Claims_Amt,0) - NVL(t8.Orphan_Amt,0)) Claims_Accepted,
NVL(t2.MOP_Adj_Amt,0) MOP_Adj_Sent ,
( (t3.Claim_Sent_Amt - NVL(t4.Reject_Claims_Amt,0)) - NVL(t2.MOP_Adj_Amt,0) - NVL(t8.Orphan_Amt,0) ) Net_Sent,
t5.Member_ID CMS_Mbr_ID,t5.Total_Paid_Claims,t5.MOP_Adjustment, t5.MOP_Adjusted_Paid_Claims, t5.Estimated_RI_Payment
From
(
Select t1.Member_ID, Sum( t1.Paid_Amount) Claim_Sent_Amt
From RENS t1
where t1.rundate = &REN_RunDate
group by t1.Member_ID
) t3
Left Join MOP_ADJ t2
on (t3.Member_ID = t2.Member_ID and t2.rundate = &MOP_ADJ_RunDate)
Left Join
(select Member_ID, sum(Claim_Total_Paid_Amount) Reject_Claims_Amt from CLAIM_ERR
where Rundate = &Claim_Err_RunDate
and Claim_Total_Paid_Amount != 0
Group by member_ID
)t4
on (t4.Member_ID = t3.Member_ID )
Full Outer Join
(
select distinct Member_ID,Total_Paid_Claims,MOP_Adjustment,MOP_Adjusted_Paid_Claims, Estimated_RI_Payment
from RID
where Rundate = &RID_RunDate
and Estimated_RI_Payment != 0
)t5
On(t5.Member_ID = t3.Member_ID)
Left Outer Join
(
select Member_ID, Sum(Claim_Paid_Amount) Orphan_Amt
From EECD
where RunDate = &EECD_RunDate
group by Member_ID
)t8
On(t8.Member_ID = t3.Member_ID)
)t6
order by Member_ID
You have this expression among the select columns (at the top of your code):
(Select 'Y' from MBR_ERR t7 where t7.Member_ID = t6.Member_ID
and t7.Rundate = &Mbr_Err_RunDate ) Mbr_Err
If you want to select the literal 'Y', then just select 'Y' as Mbr_Err. If you want to select either 'Y' or null, depending on whether the the subquery returns exactly one row or zero rows, then write it that way.
I suspect this subquery (or perhaps another one in your code, used in a similar way) returns more than one row - in which case you will get exactly the error you got.

Finding SP chain calls in oracle

I got these three giant schema in Oracle which I call them db layers (L3, L2, L1).
In each layer I got many SPs which might call some procedures from their underlying layers. Now for documentation purposes I need to draw something like a tree to show these chain calls. Well I'm not interested to get involved in the drudgery of extracting this data manually.
The question is, is there an automated way to do this? like a query to find out who calls who.
I was just fiddling around a little. So maybe like a starting point.
Replace DBA_OBJECTS.OWNER IN ('HUSQVIK') with your schemas.
WITH leafs AS (
SELECT
DBA_OBJECTS.OWNER, DBA_OBJECTS.OBJECT_NAME NAME,
CASE WHEN COUNT(PARENT_REFERENCES.REFERENCED_NAME) > 0 THEN 1 ELSE 0 END IS_REFERENCED,
CASE WHEN COUNT(CHILD_REFERENCES.NAME) > 0 THEN 1 ELSE 0 END HAS_REFERENCES
FROM
DBA_OBJECTS
LEFT JOIN DBA_DEPENDENCIES PARENT_REFERENCES ON DBA_OBJECTS.OWNER = PARENT_REFERENCES.REFERENCED_OWNER AND DBA_OBJECTS.OBJECT_NAME = PARENT_REFERENCES.REFERENCED_NAME
LEFT JOIN DBA_DEPENDENCIES CHILD_REFERENCES ON DBA_OBJECTS.OWNER = CHILD_REFERENCES.OWNER AND DBA_OBJECTS.OBJECT_NAME = CHILD_REFERENCES.NAME
WHERE
OBJECT_TYPE IN ('PACKAGE BODY', 'FUNCTION', 'PROCEDURE')
AND DBA_OBJECTS.OWNER IN ('HUSQVIK')
GROUP BY
DBA_OBJECTS.OWNER, DBA_OBJECTS.OBJECT_NAME
)
SELECT 'Entry point -> ' || OWNER || '.' || NAME DEPENDENCY_PATH, 1 MAX_STACK_DEPTH FROM leafs WHERE leafs.IS_REFERENCED = 0 AND leafs.HAS_REFERENCES = 0
UNION ALL
SELECT
DEPENDENCY_PATH, STACK_DEPTH
FROM (
SELECT
'Entry point -> ' ||
CONNECT_BY_ROOT DBA_DEPENDENCIES.OWNER || '.' || CONNECT_BY_ROOT DBA_DEPENDENCIES.NAME ||
SYS_CONNECT_BY_PATH(DBA_DEPENDENCIES.REFERENCED_OWNER || '.' || DBA_DEPENDENCIES.REFERENCED_NAME, ' -> ') DEPENDENCY_PATH,
CONNECT_BY_ISLEAF ISLEAF,
LEVEL + 1 STACK_DEPTH
FROM
DBA_DEPENDENCIES
LEFT JOIN
(SELECT * FROM leafs WHERE leafs.IS_REFERENCED = 0) roots
ON roots.OWNER = DBA_DEPENDENCIES.OWNER AND roots.NAME = DBA_DEPENDENCIES.NAME
WHERE
DBA_DEPENDENCIES.REFERENCED_TYPE IN ('PACKAGE BODY', 'FUNCTION', 'PROCEDURE')
START WITH
roots.NAME IS NOT NULL
CONNECT BY NOCYCLE
PRIOR DBA_DEPENDENCIES.REFERENCED_OWNER = DBA_DEPENDENCIES.OWNER AND
PRIOR DBA_DEPENDENCIES.REFERENCED_NAME = DBA_DEPENDENCIES.NAME)
WHERE ISLEAF = 1
Get all dependencies of your schema with this query:
select * from all_dependencies where owner = 'your_schema_name'
Export result of query to JSON (or any other format).
Process JSON of dependencies to generate tree(s).

Counting the total number of rows depending on a column value

I have a query that should count the total number of rows returned depending on a column value. For example:
As you can see, the M field should display the total number of rows returned which should be 5 because the FT_LOT are all the same value. Here is the query that I have so far:
SELECT DISTINCT
VBATCH_ID, MAXIM_PN, BAGNUMBER, FT_LOT
, m
, level as n
FROM
(
SELECT
VBATCH_ID, MAXIM_PN, BAGNUMBER, FT_LOT, QTY, DC, PRINTDATE, WS_GREEN, WS_PNR, WS_PCN, MSL, BAKETIME, EXPTIME
, una
, dulo
, (dulo - una) + 1 AS m
FROM
(
SELECT c.containername VBATCH_ID
,pb.productname MAXIM_PN
,bn.wipdatavalue BAGNUMBER
,ln.wipdatavalue FT_LOT
,aw.wipdatavalue QTY
,DECODE(ln.wipdatavalue,la.attr_081,la.attr_083
,la.attr_085,la.attr_087
,la.attr_089,la.attr_091
,la.attr_093,la.attr_095
,la.attr_097,la.attr_099
,la.attr_101,la.attr_103
,la.attr_105,la.attr_107
,la.attr_109,la.attr_111
,la.attr_113,la.attr_116
,la.attr_117,la.attr_119
) DC
,TO_CHAR(SYSDATE,'MM/DD/YYYY HH:MI:SS PM') PRINTDATE
,DECODE(UPPER(la.attr_158),'GREEN','HF',NULL) WS_GREEN
,DECODE(la.attr_140,NULL,NULL,'PNR') WS_PNR
,DECODE(la.Attr_080,NULL,NULL,'PCN') WS_PCN
,p.attr_011 MSL
,P.attr_013 BAKETIME
,p.attr_014 EXPTIME
, CASE
WHEN INSTR(bn.wipdatavalue, '-') = 0 THEN
bn.wipdatavalue
ELSE
SUBSTR(bn.wipdatavalue, 1, INSTR(bn.wipdatavalue, '-')-1)
END AS una
, CASE
WHEN INSTR(bn.wipdatavalue, '-') = 0 THEN
bn.wipdatavalue
ELSE
SUBSTR(bn.wipdatavalue, INSTR(bn.wipdatavalue, '-') + 1)
END AS dulo
FROM Container C
JOIN a_lotattributes la ON c.lotattributesid = la.lotattributesid
JOIN product p ON c.productid=p.productid
JOIN productbase pb ON p.productbaseid=pb.productbaseid
JOIN a_adhocwipdatarecord a ON a.objectrefid=c.containerid
JOIN a_adhocwipdatarecorddetails bn ON a.adhocwipdatarecordid=bn.adhocwipdatarecordid AND bn.wipdatanamename ='TR_BAG_NUMBER'
LEFT JOIN a_adhocwipdatarecorddetails ln ON a.adhocwipdatarecordid=ln.adhocwipdatarecordid AND ln.wipdatanamename ='TR_FT_LOT NUMBER'
LEFT JOIN a_adhocwipdatarecorddetails aw ON A.adhocwipdatarecordid=aw.adhocwipdatarecordid AND aw.wipdatanamename ='TR_FT LOT QTY'
WHERE ln.wipdatavalue = :ftlot AND bn.wipdatavalue LIKE :wip
)
) WHERE level LIKE :n
CONNECT BY LEVEL <= m
ORDER BY BAGNUMBER
Thanks for helping out guys.
Actually GROUP BY is not the solution. Having looked again at your desired output I have realised that what you want is an analytic count.
Your posted query is a bit of a mess and, sorry ,but I'm not prepared to invest time in it. This is the sort of structure you need:
select vbatch_id, maxim_pn, bagnumber, ft_lot
, count(*) over (partition by ft_lot) m
from whatever ...
Find out more.
Not sure why you need the DISTINCT. DISTINCT almost always indicates a failure to get the WHERE clause right.

Resources