Why isn't it using the index? - performance

Hello kind people of the internet.
I am racking my brain trying to figure out why the optimiser isn't using my index for my query on Amazon Aurora. The query is generated dynamically from a report users have built through the application's UI, so I can't change the query per se.
The query uses these qualifiers:
WHERE
table_in_question.deleted = 0
ORDER BY
table_in_question.date_modified DESC,
table_in_question.id DESC
I have an index, "my_index", on exactly these fields of table_in_question (deleted, date_modified, id), but MySQL doesn't use it.
The query takes approx 1200 ms to run. If I add FORCE INDEX (my_index) it takes about 120 ms, roughly 10x faster, but unless I force the index it isn't used.
Around 1 million rows are returned according to EXPLAIN, so I don't think the optimiser is skipping the index because too few rows would be returned.
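For reference, a sketch of the index as described and of the forced variant (index and column names as given above; the SELECT list is simplified for illustration):
-- Composite index: the equality column first, then the ORDER BY columns
CREATE INDEX my_index ON table_in_question (deleted, date_modified, id);

-- Forcing the index, which cuts the runtime from ~1200 ms to ~120 ms
SELECT t.id, t.date_modified
FROM table_in_question t FORCE INDEX (my_index)
WHERE t.deleted = 0
ORDER BY t.date_modified DESC, t.id DESC;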
The full query is
SELECT
case when some_table.id IS NOT NULL then some_table.id else "" end my_favorite,
table_in_question.date_entered,
table_in_question.name,
table_in_question.description,
table_in_question.pr_is_read,
table_in_question.pr_is_approved,
table_in_question.parent_type,
table_in_question.parent_id,
table_in_question.id,
table_in_question.date_modified,
table_in_question.assigned_user_id,
table_in_question.created_by
FROM
table_in_question
INNER JOIN (
SELECT
tst.team_user_is_member_of
FROM
team_sets_teams tst
INNER JOIN team_memberships team_membershipstable_in_question ON (
team_membershipstable_in_question.team_id = tst.team_id
)
AND (team_membershipstable_in_question.user_id = 'UUID')
AND (team_membershipstable_in_question.deleted = 0)
GROUP BY
tst.team_user_is_member_of
) table_in_question_tf ON table_in_question_tf.team_user_is_member_of = table_in_question.team_user_is_member_of
LEFT JOIN systemfavourites sf_table_in_question ON (sf_table_in_question.module = 'table_in_question')
AND (sf_table_in_question.record_id = table_in_question.id)
AND (sf_table_in_question.assigned_user_id = 'UUID')
AND (sf_table_in_question.deleted = '0')
INNER JOIN opportunities jt1_table_in_question ON (table_in_question.opportunity_id = jt1_table_in_question.id)
AND (jt1_table_in_question.deleted = 0)
LEFT JOIN another_table jt1_table_in_question_cstm ON jt1_table_in_question_cstm.id_c = jt1_table_in_question.id
LEFT JOIN systemfavourites table_in_question_favorite ON (table_in_question.id = table_in_question_favorite.record_id)
AND (table_in_question_favorite.deleted = '0')
AND (table_in_question_favorite.module = 'table_in_question')
AND (table_in_question_favorite.created_by = 'UUID')
LEFT JOIN users some_table ON (
some_table.id = table_in_question_favorite.modified_user_id
)
AND (some_table.deleted = 0)
WHERE
table_in_question.deleted = 0
ORDER BY
table_in_question.date_modified DESC,
table_in_question.id DESC
;
EXPLAIN shows this (columns with no value reported as NULL):
id | select_type | table             | partitions | type | possible_keys                 | key  | key_len | ref  | rows   | filtered | Extra
1  | PRIMARY     | table_in_question | NULL       | ALL  | idx_table_in_question_tmst_id | NULL | NULL    | NULL | 968234 | 10.0     | Using where; Using temporary; Using filesort
Can anyone explain how I can make an index that the optimiser will actually use by default?
Thanks.

Related

MAX DATE with Multiple tables/Inner Joins in Toad/Oracle

I have only been using Toad/Oracle for a few weeks, so I am still learning the coding. I have knowledge of SQL in Access and am now trying to learn Oracle.
I need to return the max date from UCMRBILDAT in table /BIC/AZUCDMO0100, but only for contracts which are contained in the linked table LH_DAT.
I have also tried HAVING MAX(UCMRBILDAT), but this didn't work.
UCMRBILDAT (/BIC/AZUCDMO0100)
UC_MRESULT (/BIC/AZUCDMO0100)
UC_MRSTAT (/BIC/AZUCDMO0100)
UC_MRCAT (/BIC/AZUCDMO0100)
CONTRACT_NUMBER (LH_DAT)
UC_MR_NUMB (/BIC/AZUCDMO0100) + (/BIC/AZUCDMO0200)
SELECT UCMRBILDAT,
       UC_MRESULT,
       UC_MRSTAT,
       UC_MRCAT
FROM LH_DAT,
     ( SELECT CONTRACT_NUMBER, MAX(UCMRBILDAT) MXBD
       FROM SAPSR3."/BIC/AZUCDMO0100"
       GROUP BY CONTRACT_NUMBER) GMR
LEFT OUTER JOIN SAPSR3."/BIC/AZUCDMO0200"
    ON (CONTRACT_NUMBER = UCCONTRACT)
INNER JOIN SAPSR3."/BIC/AZUCDMO0100" MR
    ON ("/BIC/AZUCDMO0200".UC_MR_NUMB = "/BIC/AZUCDMO0100".UC_MR_NUMB)
WHERE CONTRACT_NUMBER = '2000014420'
AND UCMRBILDAT = MXBD
AND MR.CONTRACT_NUMBER = GMR.CONTRACT_NUMBER
Max bill date from BIC/AZUCDMO0100 but only for contracts contained in table LH_DAT
EDIT: I need the max date for UCMRBILDAT in the script below.
SELECT CONTRACT_NUMBER,
UCMRBILDAT,
UC_MRESULT,
UC_MRCAT
FROM LH_DAT
LEFT OUTER JOIN SAPSR3."/BIC/AZUCDMO0200"
ON (CONTRACT_NUMBER = UCCONTRACT)
INNER JOIN SAPSR3."/BIC/AZUCDMO0100"
ON ("/BIC/AZUCDMO0200".UC_MR_NUMB = "/BIC/AZUCDMO0100".UC_MR_NUMB)
WHERE CONTRACT_NUMBER = '2000014420'
AND "/BIC/AZUCDMO0200".SOURSYSTEM = 'SP'
AND "/BIC/AZUCDMO0200".UCDELE_IND <> 'X'
To get the max value of "/BIC/AZUCDMO0100".UCMRBILDAT where there's a linked value from LH_DAT, you'd want to use:
SELECT MAX(ba.UCMRBILDAT)
FROM SAPSR3."/BIC/AZUCDMO0100" ba
INNER JOIN LH_DAT ld
    ON ld.some_field = ba.some_field  -- placeholder: substitute the real linking column(s)
There must be fields which link "/BIC/AZUCDMO0100" and LH_DAT together, but in your query they're not specified. Find those fields, plug them into the query above, and you should get the result you're looking for.
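For instance, if CONTRACT_NUMBER turns out to be the linking column (both tables appear to carry it in the queries above, so this is an assumption worth checking), the sketch becomes:
SELECT MAX(ba.UCMRBILDAT)
FROM SAPSR3."/BIC/AZUCDMO0100" ba
INNER JOIN LH_DAT ld
    ON ld.CONTRACT_NUMBER = ba.CONTRACT_NUMBER  -- assumed join column
WHERE ld.CONTRACT_NUMBER = '2000014420';        -- optional: one contract, as in the original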

performance of a self join on oracle db

I have this self join that is very slow on an Oracle DB. I have put indexes on all the fields concerned. Does anybody have advice on how to improve its performance?
select count(tNew.idtariffa) CONT
from tariffe tAtt
join tariffe tNew on tAtt.idtariffa = tNew.idtariffa
where (tAtt.stato_attivo = 't')
and (tNew.stato_attivo = 'f')
and (tAtt.validity_date < tNew.validity_date)
and (tAtt.dataimport < tNew.dataimport)
and (tNew.validity_date < to_date('2017-6-26','YYYY-MM-DD'))
Try the PUSH_PRED hint:
select /*+ NO_MERGE(tNew) PUSH_PRED(tNew) */
count(tNew.idtariffa) CONT
from tariffe tAtt
join tariffe tNew on tAtt.idtariffa = tNew.idtariffa
where (tAtt.stato_attivo = 't')
and (tNew.stato_attivo = 'f')
and (tAtt.validity_date < tNew.validity_date)
and (tAtt.dataimport < tNew.dataimport)
and (tNew.validity_date < to_date('2017-6-26','YYYY-MM-DD'))
An EXISTS version is worth a try:
select count(1) cont
from tariffe n
where stato_attivo = 'f'
and validity_date < date '2017-06-26'
and exists ( select null
from tariffe
where idtariffa = n.idtariffa
and stato_attivo = 't'
and validity_date < n.validity_date
and dataimport < n.dataimport )
Performance tuning without details like data volumes, data skew, index definitions, explain plan, etc. is just guessing.
So here are some more guesses :)
Your driving table should be tariffe tNew, as that's the one whose predicates trim the result set:
tNew.validity_date < to_date('2017-6-26','YYYY-MM-DD')
Now, unless tNew.stato_attivo = 'f' is extremely selective, you're going to be retrieving a large chunk of all the rows in the table (depending on how far back the records go), so a Full Table Scan would be the most efficient way of grabbing those records.
The join on tariffe tAtt is problematic because idtariffa is not a unique column. So the join is a set of tNew records against a set of tAtt records. These will be filtered in memory using the criteria in the WHERE clause.
" I have put indexes on all fields concerned"
Single column indexes won't help here. You might get some joy from a compound index on all the pertinent columns:
tariffe (stato_attivo, validity_date, idtariffa, dataimport)
This would be worth building if you run this query very often.
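A minimal sketch of that compound index (the index name is assumed):
CREATE INDEX ix_tariffe_stato_val_id_imp
    ON tariffe (stato_attivo, validity_date, idtariffa, dataimport);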
Any other guesses? Subquery factoring to hit the main table once. Doing a Full Table Scan just once would speed things up if tariffe has a lot of columns.
with cte as (
    select stato_attivo, validity_date, idtariffa, dataimport
    from tariffe
    where validity_date < to_date('2017-6-26','YYYY-MM-DD')
)
select count(tNew.idtariffa) CONT
from cte tNew
join cte tAtt on tAtt.idtariffa = tNew.idtariffa
where (tAtt.stato_attivo = 't')
and (tNew.stato_attivo = 'f')
and (tAtt.validity_date < tNew.validity_date)
and (tAtt.dataimport < tNew.dataimport)

Simple condition break down query optimizer and its performance

I have a simple query:
select top 10 *
FROM Revision2UploadLocations r2l
inner join Revisions r on r2l.RevisionId = r.Id
INNER JOIN [Databases] [D] on [R].[DatabaseId] = [D].[Id]
INNER JOIN [SqlServers] [S] on [D].[InstanceId] = [S].[Id]
where --r.ValidationStatus in (2, 3) and
r2l.[ChecksumWasSent] = 0 AND r2l.Status = 2
This query usually executes in about 0.5 s.
But the same query with the condition uncommented executes in 5 s (!!!) and has a very strange execution plan: Revisions and SqlServers are joined although they have no linked columns, and the most selective condition, r2l.[ChecksumWasSent] = 0 AND r2l.Status = 2, is applied at the end of query processing.
ValidationStatus is an ordinary non-nullable integer column.
Columns Revision2UploadLocations.RevisionId, Revisions.DatabaseId, Databases.InstanceId are indexed.
Here is description of tables:
CREATE TABLE [SqlServers]
(
[Id] int identity(1,1) NOT NULL CONSTRAINT PK_SqlServers PRIMARY KEY,
...
)
CREATE TABLE [Databases](
[Id] int identity(1,1) NOT NULL CONSTRAINT PK_Databases PRIMARY KEY,
[InstanceId] int NOT NULL,
[Name] nvarchar(128) NOT NULL,
...
CONSTRAINT FK_Databases_SqlServers FOREIGN KEY ([InstanceId]) REFERENCES [SqlServers]([Id])
)
CREATE INDEX [IX_Databases_DatabaseId] ON [Databases] ([InstanceId] ASC)
CREATE TABLE [Revisions]
(
[Id] int identity(1, 1) NOT NULL,
[DatabaseId] int NOT NULL,
[BackupStatus] tinyint NOT NULL,
[ValidationStatus] tinyint NOT NULL,
...
CONSTRAINT PK_Revisions PRIMARY KEY([Id]),
CONSTRAINT FK_Revisions_Databases FOREIGN KEY ([DatabaseId]) REFERENCES [Databases]([Id])
)
CREATE INDEX [IX_Revisions_DatabaseId] ON [Revisions] ([DatabaseId] ASC)
CREATE TABLE [Revision2UploadLocations]
(
[Id] int NOT NULL IDENTITY (1, 1) CONSTRAINT PK_Revision2UploadLocations PRIMARY KEY,
[Status] int NOT NULL,
RevisionId int NOT NULL,
[ChecksumWasSent] bit NOT NULL,
CONSTRAINT FK_r2l_Revisions FOREIGN KEY ([RevisionId]) REFERENCES [Revisions]([Id])
)
CREATE INDEX [IX_Revision2UploadLocations_RevisionId] ON [Revision2UploadLocations] ([RevisionId] ASC)
How can I improve the performance of this query?
EDIT: Now I have some more details.
Some tables (SqlServers and Databases) have 1-10 records, but Revisions and Revision2UploadLocations have 500K+ records, so the query optimizer decides to use a full scan instead of an index seek on the small tables and processes them first.
Query Performance Tuning (SQL Server Compact):
A small table is one whose contents fit in one or just a few data pages. Avoid indexing very small tables because it is typically more efficient to do a table scan.
As a temporary solution I tried the FORCE ORDER query hint (see Query Hint (SQL Server Compact)),
and the response time decreased from 5 s to 0.5 s.
But I don't think that it's a good solution.
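For reference, a sketch of how that hint is applied, assuming the OPTION clause syntax from the linked Query Hint documentation:
select top 10 *
FROM Revision2UploadLocations r2l
inner join Revisions r on r2l.RevisionId = r.Id
INNER JOIN [Databases] [D] on [R].[DatabaseId] = [D].[Id]
INNER JOIN [SqlServers] [S] on [D].[InstanceId] = [S].[Id]
where r.ValidationStatus in (2, 3)
and r2l.[ChecksumWasSent] = 0 AND r2l.Status = 2
OPTION (FORCE ORDER)  -- join the tables in the order they are written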
Geoffrey's solution doesn't give you the expected result.
The first statement selects 10 rows without any guarantee that their r.ValidationStatus is 2 or 3. So finally, you can get fewer than 10 rows (or even no rows at all).
I think you can rewrite your query like this:
SELECT top 10 *
FROM Revisions r
INNER JOIN Revision2UploadLocations r2l
ON r2l.RevisionId = r.Id
AND r2l.[ChecksumWasSent] = 0
AND r2l.Status = 2
INNER JOIN [Databases] [D] on [D].[Id] = [R].[DatabaseId]
INNER JOIN [SqlServers] [S] on [S].[Id] = [D].[InstanceId]
WHERE r.ValidationStatus in (2, 3)
And if the r2l.[ChecksumWasSent] datatype is bit (boolean), then with:
many more 0s than 1s, you can create an index on (RevisionId, Status);
many more 1s than 0s, you can create an index on (RevisionId, ChecksumWasSent, Status).
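A sketch of those two indexes (the index names are assumed):
CREATE INDEX IX_R2L_RevisionId_Status
    ON Revision2UploadLocations (RevisionId, Status);

CREATE INDEX IX_R2L_RevisionId_Checksum_Status
    ON Revision2UploadLocations (RevisionId, ChecksumWasSent, Status);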
I have found in the past that if I first insert the result of the first part of your query into a temp table, including the field you want to filter on further (ValidationStatus), and then query the temp table, the performance/speed is much better.
So the initial query would be this:
select r2l.*, r.ValidationStatus  -- a plain "select *" fails in SELECT ... INTO here,
                                  -- since Id appears in every joined table
into #tmp
FROM Revision2UploadLocations r2l
inner join Revisions r on r2l.RevisionId = r.Id
INNER JOIN [Databases] [D] on [R].[DatabaseId] = [D].[Id]
INNER JOIN [SqlServers] [S] on [D].[InstanceId] = [S].[Id]
where r2l.[ChecksumWasSent] = 0 AND r2l.Status = 2
then the final select would be:
select * from #tmp
where ValidationStatus in (2,3)
No need for indexes, and I know it's weird how the optimizer doesn't always work, but this approach has been useful to me several times in the past.

How can I optimize these Postgres queries and DB performance?

I need some help optimizing the following two queries, which are almost identical but select slightly different data. Here is my table definition:
CREATE TABLE public.rates (
    rate_id bigserial NOT NULL,
    prefix varchar(50),
    rate_name varchar(30),
    rate numeric(8,6),
    intrastate_cost numeric(8,6),
    interstate_cost numeric(8,6),
    status char(3) DEFAULT 'act'::bpchar,
    min_duration integer,
    call_increment integer,
    connection_cost numeric(8,6),
    rate_type varchar(3) DEFAULT 'lcr'::character varying,
    owner_type varchar(10),
    start_date timestamp WITHOUT TIME ZONE,
    end_date timestamp WITHOUT TIME ZONE,
    rev integer,
    ratecard_id integer,
    /* Keys */
    CONSTRAINT rates_pkey
        PRIMARY KEY (rate_id)
) WITH (
    OIDS = FALSE
);
and the two queries I am using. The first:
SELECT
rates.* ,
rc.ratecard_prefix ,
rc.default_lrn ,
rc.lrn_lookup_method ,
customers.customer_id ,
customers.balance ,
customers.channels AS customer_channels ,
customers.cps AS customer_cps ,
customers.balance AS customer_balance
FROM
rates
JOIN ratecards rc
ON rc.card_type = 'customer' AND
rc.ratecard_id = rates.ratecard_id
JOIN customers
ON rc.customer_id = customers.customer_id
WHERE
customers.status = 'act' AND
rc.status = 'act' AND
rc.customer_id = 'AB8KA191' AND
owner_type = 'customer' AND
'17606109973' LIKE concat (rc.ratecard_prefix, rates.prefix, '%') AND
rates.status = 'act' AND
now() BETWEEN rates.start_date AND rates.end_date AND
customers.balance > 0
ORDER BY
LENGTH(PREFIX) DESC LIMIT 1;
and the second one,
SELECT
*
FROM
rates
JOIN ratecards rc
ON rc.card_type = 'carrier' AND
rc.ratecard_id = rates.ratecard_id
JOIN carriers
ON rc.carrier_id = carriers.carrier_id
JOIN carrier_switches cswitch
ON carriers.carrier_id = cswitch.carrier_id
WHERE
rates.intrastate_cost < 0.011648 AND
owner_type = 'carrier' AND
'16093960411' LIKE concat (rates.prefix, '%') AND
rates.status = 'act' AND
carriers.status = 'act' AND
now() BETWEEN rates.start_date AND
rates.end_date AND
rates.intrastate_cost <> -1 AND
cswitch.enabled = 't' AND
rates.rate_type = 'lrn' AND
rates.min_duration >= 6
ORDER BY
rates.intrastate_cost ASC,
LENGTH(rates.prefix) DESC,
cswitch.priority DESC
I created an index on the owner_type field (not shown in the schema above), but the query performance is not what I expected. CPU usage becomes too high on the DB server and everything starts to slow down. The EXPLAIN output for the first query is here and for the second one here.
When the table has few records things work fine, naturally, but as the number of records increases the CPU usage climbs. I currently have around 341,821 records in the table.
How can I improve the query execution or possibly change the query in order to speed things up?
I have set enable_bitmapscan = off because I think this gives me better performance. If set to on, every index scan is followed up with a Bitmap heap scan.
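For context, a minimal sketch of trying that setting per session instead of globally, so its effect can be compared safely:
-- discourage bitmap scans for the current session only
SET enable_bitmapscan = off;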
Things did ease up a little bit by changing the query so that the LIKE against the concatenated prefix becomes an IN list over every leading substring of the dialled number:
SELECT
rates.*,
rc.ratecard_prefix,
rc.default_lrn,
rc.lrn_lookup_method,
customers.customer_id,
customers.balance,
customers.channels AS customer_channels,
customers.cps AS customer_cps,
customers.balance AS customer_balance
FROM
rates
JOIN ratecards rc
ON rc.card_type = 'customer' AND
rc.ratecard_id = rates.ratecard_id
JOIN customers
ON rc.customer_id = customers.customer_id
WHERE
customers.status = 'act' AND
rc.status = 'act' AND
rc.customer_id = 'AB8KA191' AND
owner_type = 'customer' AND
(CONCAT (rc.ratecard_prefix, rates.prefix) IN ('16026813306',
'1602681330',
'160268133',
'16026813',
'1602681',
'160268',
'16026',
'1602',
'160',
'16',
'1')) AND
rates.status = 'act' AND
now() BETWEEN rates.start_date AND
rates.end_date AND
customers.balance > 0
ORDER BY
LENGTH(PREFIX) DESC LIMIT 1
My postgresql.conf is here.
But each Postgres process still takes around 25%+ CPU. I am now also using pgbouncer for connection pooling, but it is still not helping.

Using LINQ to get distinct items that do not join

I'm having problems writing a LINQ query across two tables that returns the rows which don't match.
TB_AvailableProducts
-Prod_ID
-Name
....
TB_Purchases
-Cust_ID
-Prod_ID
Is there a way to get all distinct products that a customer has not purchased using one LINQ query, or do I have to run two separate queries, one for all products and one for purchased products, and compare the two?
This query will return all products which do not have a related record in the purchases table:
int customerID = 1;

// Group-join products to this customer's purchases; a product whose
// purchase group is empty has never been bought (a left anti join).
var query = from ap in context.TB_AvailableProducts
            join p in context.TB_Purchases.Where(x => x.Cust_ID == customerID)
                on ap.Prod_ID equals p.Prod_ID into g
            where !g.Any()
            select ap;
I don't think you need Distinct here if you don't have duplicated records in your products table.
The generated SQL query will look like:
SELECT ap.Prod_ID, ap.Name
FROM TB_AvailableProducts AS ap
WHERE NOT EXISTS (SELECT
1 AS C1
FROM TB_Purchases AS p
WHERE (1 = p.Cust_ID) AND (ap.Prod_ID = p.Prod_ID)
)
