Slow query performance due to NOT EXISTS in Oracle

This is my query; it takes a long time to execute. Can anyone make it faster?
I think the NOT EXISTS is what makes it slow, but I don't know how to convert it to a LEFT OUTER JOIN when there are additional conditions involved. I have rewritten it several times, but each time the results changed.
Thanks in advance.

As a basic tuning principle, use EXISTS or NOT EXISTS when the subquery returns a large amount of data; if it does not, use IN or NOT IN instead.
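For illustration, the two anti-join forms look like this (taxpayer and estab are hypothetical stand-ins for your tables):
-- NOT IN: the subquery result is treated as a value list
SELECT t.tax_payer_no
FROM taxpayer t
WHERE t.tax_payer_no NOT IN (SELECT e.tax_payer_no FROM estab e);
-- NOT EXISTS: a correlated probe, usually preferable when the subquery is large
SELECT t.tax_payer_no
FROM taxpayer t
WHERE NOT EXISTS (
    SELECT 1 FROM estab e WHERE e.tax_payer_no = t.tax_payer_no);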
Also, remove the DISTINCT from SELECT DISTINCT t.tax_payer_no, taxestab.estab_no, move the query into a CTE instead, and see how much time that saves:
with data as (
    SELECT t.tax_payer_no tax_payer_no, taxestab.estab_no estab_no .. rest of your query
)
select count(1), tax_payer_no, estab_no
from data
group by tax_payer_no, estab_no
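Since the question asks how to convert NOT EXISTS to a LEFT OUTER JOIN without changing the results, the general pattern is below (table, column, and filter names are hypothetical). The key point is that every condition from the subquery must move into the ON clause, not the WHERE clause, otherwise the anti-join semantics change:
-- Anti-join with NOT EXISTS
SELECT t.tax_payer_no
FROM taxpayer t
WHERE NOT EXISTS (
    SELECT 1
    FROM estab e
    WHERE e.tax_payer_no = t.tax_payer_no
      AND e.status = 'ACTIVE');
-- Equivalent LEFT OUTER JOIN: subquery conditions go into ON,
-- and the WHERE clause keeps only the unmatched rows
SELECT t.tax_payer_no
FROM taxpayer t
LEFT OUTER JOIN estab e
    ON e.tax_payer_no = t.tax_payer_no
   AND e.status = 'ACTIVE'
WHERE e.tax_payer_no IS NULL;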

Related

Replacing NOT IN with NOT EXISTS and an OUTER JOIN in Oracle Database 12c

I understand that query performance improves when we use EXISTS and NOT EXISTS in place of IN and NOT IN; however, is performance improved further when we replace NOT IN with an OUTER JOIN as opposed to NOT EXISTS?
For example, the following query selects all models from a PRODUCT table that are not in another table called PC. For the record, no model values in the PRODUCT or PC tables are null:
select model
from product
where not exists(
select *
from pc
where product.model = pc.model);
The following OUTER JOIN will display the same results:
select product.model
from product left join pc
on pc.model = product.model
where pc.model is null;
Seeing as these both return the same values, which option should we use to better improve the performance of our queries?
The query plan will tell you; it will depend on the data and the tables. In the case of the OUTER JOIN and NOT EXISTS, they are the same.
However, regarding your opening sentence: NOT IN and NOT EXISTS are not the same if NULL is accepted on model. In this case you say model cannot be NULL, so you might find they all have the same plan anyway. But when relying on that assumption, the database must be told there cannot be NULLs (with a NOT NULL constraint), as opposed to there simply not being any. If you don't, it will build different plans for each query, which may result in different performance depending on your actual data. This is generally true, and particularly true for Oracle, which does not index NULLs.
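As a quick illustration of the NULL pitfall: if pc.model could contain a NULL, the NOT IN version would return no rows at all, because model NOT IN (..., NULL) can never evaluate to TRUE, while the NOT EXISTS version would still return the expected rows:
-- Returns zero rows whenever the subquery produces a NULL
SELECT model
FROM product
WHERE model NOT IN (SELECT model FROM pc);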
Check out EXPLAIN PLAN
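In Oracle that looks like:
EXPLAIN PLAN FOR
SELECT model
FROM product
WHERE NOT EXISTS (
    SELECT * FROM pc WHERE product.model = pc.model);
-- Display the plan that was just explained
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);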

Why are subqueries in selects slow compared to joins?

I have two queries. The first retrieves an aggregate from another table as a column, using a subquery in the SELECT (it returns a string concatenation of a column over several rows).
The second query does the same with a subselect in the FROM clause, joining the results. That second query aggregates the complete table before joining, yet it is much faster (286 ms vs 7645 ms).
I don't understand why the subquery is so much slower, when the second query aggregates a table of 175k rows (on PostgreSQL 9.5). A subselect is much easier to integrate in a query builder, so I would like to use it, and the second query will slow down as the number of records grows. Is there a way to speed up the subselect?
Query 1:
select kp_No,
(select string_agg(description,E'\n') from (select nt_Text as description from fgeNote where nt_kp_No=fgeContact.kp_No order by nt_No DESC limit 3) as subquery) as description
from fgeContact
where kp_k_No=729;
Explain: https://explain.depesz.com/s/8sL
Query 2:
select kp_No, NoteSummary
from fgeContact
LEFT JOIN
(select nt_kp_No, string_agg(nt_Text,E'\n') as NoteSummary
from
(select nt_kp_No, nt_Text from fgeNote ORDER BY nt_No DESC) as sortquery
group by nt_kp_No) as joinquery
ON joinquery.nt_kp_No=kp_No
where kp_k_No=729;
Explain: https://explain.depesz.com/s/yk9W
This is because the second query retrieves all rows in a single scan, while in the first one the correlated subquery is executed once per selected row of the master table, so the table has to be scanned again each time.
Even with index scans, that is usually more expensive than scanning the whole table once, even sequentially, and picking out only the interesting rows (in fact, a sequential scan is much faster than an index scan when selecting more than a few rows, because indexed access carries some overhead).
But that depends on the actual data distribution too. It is perfectly possible that, for a different value of kp_k_No that selects only one or a few rows, the first query would turn out to be the faster one.
It's a matter of testing against the different situations that will actually occur.
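If you want to keep the subselect form, one thing worth trying (an assumption about your schema, not something tested against your data) is an index that matches both the correlation predicate and the ORDER BY, so each per-row probe becomes a short index scan instead of a sort:
-- Hypothetical index: lets each correlated probe fetch the top 3
-- notes for a contact directly in nt_No order
CREATE INDEX idx_fgenote_kp_nt ON fgeNote (nt_kp_No, nt_No DESC);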

Full outer joins or UNION: which is faster? (My table has a million rows)

I have to join 3 tables to retrieve data, and a full outer join looks like a potential solution, but when I tried it the query took more than an hour to execute.
Any alternatives would be helpful.
Thank you.
I'm not sure what your query looks like, but add indexes on these tables if they are newly created.
To answer your question, though: UNION ALL will be faster, as it simply passes through the first SELECT statement and then appends the results of the second SELECT to the end of the output. Even a normal UNION is typically faster than a join.
The UNIONs will also make better use of indexes, which could result in a faster query.
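As a sketch of the kind of rewrite this suggests, a FULL OUTER JOIN of two tables can be emulated with UNION ALL (a and b are hypothetical stand-ins; the same idea extends to three tables):
-- All rows from a (with matches from b where they exist) ...
SELECT a.id, a.val AS a_val, b.val AS b_val
FROM a
LEFT JOIN b ON b.id = a.id
UNION ALL
-- ... plus the rows that exist only in b
SELECT b.id, NULL, b.val
FROM b
WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.id = b.id);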

SQL Azure - query with row_number() executes slow if columns with nvarchar of big size are included

I have the following query (generated by Entity Framework with standard paging; this is the inner query, and I added the TOP 438 part):
SELECT TOP 438 [Extent1].[Id] AS [Id],
[Extent1].[MemberType] AS [MemberType],
[Extent1].[FullName] AS [FullName],
[Extent1].[Image] AS [Image],
row_number() OVER (ORDER BY [Extent1].[FullName] ASC) AS [row_number]
FROM [dbo].[ShowMembers] AS [Extent1]
WHERE 3 = CAST( [Extent1].[MemberType] AS int)
The ShowMembers table has about 11K rows, but only 438 with MemberType == 3. The Image column is of type nvarchar(2000) and holds the URL of the image on a CDN. If I include this column in the query (only in the SELECT part), the query chokes and takes anywhere between 2 and 30 seconds (it varies across runs). If I comment out that column, the query runs fast, as expected. If I include the Image column but comment out the row_number column, the query also runs fast.
Obviously I was too liberal with the size of the URL, so I started playing around with it. I found that if I set the Image column to nvarchar(884), the query runs fast as expected; if I set it to 885, it is slow again.
This is not tied to one particular column but to the combined size of all columns in the SELECT statement; increasing the size by just one makes the performance difference obvious.
I am not a DB expert, so any advice is welcomed.
PS In local SQL Server 2012 Express there are no performance issues.
PPS Running the query with OFFSET 0 ROWS FETCH NEXT 438 ROWS ONLY (without the row_number column, of course) is also slow.
ROW_NUMBER has to sort all the rows to give you things in the order you want. Adding a larger column to the result set means it all gets sorted too, so the query is much slower and does more IO. You can see this, by the way, if you enable "set statistics io on" and "set statistics time on" in SSMS when debugging problems like this. It will give you some insight into the number of IOs and other operations happening at runtime in the query:
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-statistics-io-transact-sql?view=sql-server-2017
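For example, run these in the same SSMS session before the problem query; the Messages tab will then report logical reads and CPU/elapsed time per statement:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;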
In terms of what you can do to make this query faster, I encourage you to consider some changes to your database schema.
First, consider whether you actually need the rows sorted in a specific order at all. If you don't, it is measurably cheaper to iterate over them without the row_number. If you just want to conceptually visit each entry once, you can order by something more static that is still monotonic, such as the identity column.
Second, if you do need things in sorted order, consider whether the order changes frequently or infrequently. If it is infrequent, it may be possible to compute and persist a column in each row holding the relative order you want (and update it each time you modify the table). You could then index the new column and request rows in that order via a top-level ORDER BY in the query; no row_number needed.
If you do need the order computed dynamically, as you are doing, and you need an exact order all the time, your final option is to move the URL to a second table and join with it after the row_number. This avoids the sort being "wide" in the computation of row_number.
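A sketch of that last option, here written as a join back to the same table to keep the example self-contained (the suggested separate URL table works the same way):
-- Compute row_number over narrow columns only, then attach the
-- wide Image column after the sort has already happened
WITH numbered AS (
    SELECT [Id], [MemberType], [FullName],
           ROW_NUMBER() OVER (ORDER BY [FullName] ASC) AS [row_number]
    FROM [dbo].[ShowMembers]
    WHERE 3 = CAST([MemberType] AS int)
)
SELECT TOP 438 n.[Id], n.[MemberType], n.[FullName],
       s.[Image], n.[row_number]
FROM numbered n
JOIN [dbo].[ShowMembers] s ON s.[Id] = n.[Id]
ORDER BY n.[row_number];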
Best of luck to you

Data densification Oracle query inefficient and slow

I'm trying to handle data densification for reporting purposes. I created two dimension tables (time & skills) and one data table (calls). Since there are periods with no calls in the data table, I do not get a time series that includes all days. I have studied many samples on the Internet of how to handle data densification and came up with the solution below.
The query works as intended; it just takes quite long, and I have the feeling it is quite inefficient. Could you please advise me how to speed up the query execution time?
Thank you and best regards,
Alex
SELECT DISTINCT
DAY_ID,
DAY_SHORT,
WEEK_ID,
MONTH_ID,
QUARTER_ID,
YEAR_ID,
AREA,
FIRMA,
PRODUCT,
PRODUCT_FAMILY,
PRODUCT_WFM,
LANGUAGE,
NVL(NCO,0) NCO,
NVL(NCH,0) NCH,
NVL(NCH60,0) NCH60,
NVL(LOST,0) LOST
FROM (
SELECT
DS.AREA,
DS.FIRMA,
DS.PRODUCT,
DS.PRODUCT_FAMILY,
DS.PRODUCT_WFM,
DS.LANGUAGE,
SUM(NVL(CH.HANDLED,0)+NVL(CH.LOST,0)) AS NCO,
SUM(CH.HANDLED) AS NCH,
SUM(CH.HANDLED_IN_SL) AS NCH60,
SUM(CH.LOST) AS LOST,
CH.DELIVER_DATE,
CH.SKILL_NAME
FROM
WFM.WFM_TBL_DIMENSION_SKILL DS
LEFT JOIN
OPS.VW_CALL_HISTORY CH
ON
DS.SPLIT_NAME=CH.SKILL_NAME
GROUP BY
DS.AREA,
DS.FIRMA,
DS.PRODUCT,
DS.PRODUCT_FAMILY,
DS.PRODUCT_WFM,
DS.LANGUAGE,
CH.DELIVER_DATE,
CH.SKILL_NAME
) temp_values
PARTITION BY
(
temp_values.AREA,
temp_values.FIRMA,
temp_values.PRODUCT,
temp_values.PRODUCT_FAMILY,
temp_values.PRODUCT_WFM,
temp_values.LANGUAGE,
temp_values.DELIVER_DATE,
temp_values.SKILL_NAME
)
RIGHT OUTER JOIN (
SELECT
DAY_ID,
DAY_SHORT,
WEEK_ID,
MONTH_ID,
QUARTER_ID,
YEAR_ID
FROM
WFM.WFM_TBL_DIMENSION_TIME
WHERE
DAY_ID BETWEEN (SELECT MIN(DELIVER_DATE) FROM OPS.VW_CALL_HISTORY) AND TRUNC(sysdate-1)
) temp_time
ON
temp_values.DELIVER_DATE=temp_time.DAY_ID
Have a look at the execution plan and check which steps take the longest; use EXPLAIN PLAN to get it. Look for full table scans and see if indexes could help. Make sure you have up-to-date statistics on the tables.
Since you are talking about dimension tables, this code presumably belongs to a data warehouse. If so, do you use partitioning? Parallel DML? Are you on Enterprise Edition?
I reduced the arguments in PARTITION BY () to a single key (temp_values.SKILL_NAME) and joined the missing information from the skill dimension with a LEFT OUTER JOIN at the end of the query described above. That way no more identical duplicates are produced, which let me reduce SELECT DISTINCT to a plain SELECT.
Additionally I added foreign and primary keys and let the query run in parallel mode.
This reduced execution time by over 80%, which is sufficient. Thanks guys!
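For reference, a skeleton of the reshaped query might look roughly like this (abbreviated, and assuming SPLIT_NAME identifies a skill uniquely; this is a sketch of the described change, not the exact final query):
SELECT
    temp_time.DAY_ID, DS.AREA, DS.FIRMA, DS.PRODUCT,
    DS.PRODUCT_FAMILY, DS.PRODUCT_WFM, DS.LANGUAGE,
    NVL(temp_values.NCO, 0) NCO, NVL(temp_values.NCH, 0) NCH,
    NVL(temp_values.NCH60, 0) NCH60, NVL(temp_values.LOST, 0) LOST
FROM (
    SELECT CH.SKILL_NAME, CH.DELIVER_DATE,
           SUM(NVL(CH.HANDLED,0) + NVL(CH.LOST,0)) AS NCO,
           SUM(CH.HANDLED) AS NCH,
           SUM(CH.HANDLED_IN_SL) AS NCH60,
           SUM(CH.LOST) AS LOST
    FROM OPS.VW_CALL_HISTORY CH
    GROUP BY CH.SKILL_NAME, CH.DELIVER_DATE
) temp_values
PARTITION BY (temp_values.SKILL_NAME)
RIGHT OUTER JOIN (
    SELECT DAY_ID, DAY_SHORT, WEEK_ID, MONTH_ID, QUARTER_ID, YEAR_ID
    FROM WFM.WFM_TBL_DIMENSION_TIME
    WHERE DAY_ID BETWEEN (SELECT MIN(DELIVER_DATE) FROM OPS.VW_CALL_HISTORY)
                     AND TRUNC(SYSDATE - 1)
) temp_time
ON temp_values.DELIVER_DATE = temp_time.DAY_ID
-- Densified rows keep the partition key, so the skill attributes
-- can still be joined back afterwards
LEFT OUTER JOIN WFM.WFM_TBL_DIMENSION_SKILL DS
    ON DS.SPLIT_NAME = temp_values.SKILL_NAME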
