Why is count(*) slow even with an index? - Oracle

This is my query:
select count(*)
FROM TB_E2V_DOCUMENTOS_CICLO D
WHERE (D.TIPOCLIENTE = null or null is null)
AND (D.TIPODOCUMENTOCLIENTE = null or null is null)
AND (D.NUMDOCUMENTOCLIENTE = null or null is null)
AND (D.BA = null or null is null)
AND (D.FA = null or null is null)
AND (D.NOMBRECLIENTE = null or null is null)
AND (D.NUMTELEFONO = null or null is null)
AND (D.NUMSUSCRIPCION = null or null is null)
AND (D.TIPORECIBO in ('Recibo'))
AND (D.NUMRECIBO = null or null is null)
AND (TO_DATE(D.FECHAEMISION, 'yyyy/MM/dd') BETWEEN TO_DATE('2019-5-1', 'yyyy-MM-dd') AND TO_DATE('2020-2-18', 'yyyy-MM-dd'))
AND (D.MONTORECIBO = null or null is null)
AND (D.NUMPAGINAS = 0 or 0 = 0)
AND (D.NOMBREARCHIVO = null or null is null)
AND (D.NEGOCIO = null or null is null)
AND (D.NOMBREMETADATACARGA = null or null is null)
AND (D.FECHACARGA = TO_DATE(null) or TO_DATE(null) is null);
This query returns a result, but when I run an explain plan for it, the cost is very high even though the query uses the index. The query takes approximately 10 seconds.
How can I improve the performance of the query?
I'm using Oracle 12c

Notes: All of the "AND (column = null OR null IS NULL)" predicates will always evaluate to true, because the "null IS NULL" part is true. The "column = null" comparison, on the other hand, never evaluates to true: Oracle treats NULL as unknown, so NULL does not equal NULL (or anything else); if you want to check for NULL, use "IS NULL".
select * from dual where null = null; -- returns no rows
select * from dual where not (null <> null); -- returns no rows
select * from dual where null is null; -- returns 1 row
select * from dual where not(null is not null); -- returns 1 row
As far as indexing goes, you need an index that is selective (i.e. returns far fewer rows) and whose columns appear in the WHERE clause predicates. In this case it looks like a function-based index on TO_DATE(D.FECHAEMISION, 'yyyy/MM/dd')
along with D.TIPORECIBO is in order. The INDEX SKIP SCAN is used in this case probably because D.TIPORECIBO is not the leading column of the existing index; an INDEX SKIP SCAN is slower than an INDEX RANGE SCAN because it needs to read more index blocks.

There are a few factors involved here:
First, this query is using the second (or third) part of a composite index, resulting in the SKIP SCAN.
Take a look at all indexes on the table and see what kind of index is on TIPORECIBO.
It is likely that this isn't the leading column. You might improve the performance by creating an index with TIPORECIBO as leading column, but it is unlikely--this appears to be a "type" column that might have only a few values, and not a good candidate for an index.
The second issue is that Oracle uses the index to get a set of candidate rows, then goes to the data blocks themselves to get the rows for further filtering.
A select count(*) will perform much better if Oracle doesn't need to fetch the data blocks. This can be achieved by creating an index that contains all of the data needed for the filter.
In your case, an index on TIPORECIBO and FECHAEMISION would mean that Oracle could go to the index alone without needing to access the data blocks.
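For example (a minimal sketch; the index name is made up), a two-column index like the one below could satisfy the count(*) from the index alone, provided FECHAEMISION can be compared directly without the TO_DATE conversion discussed next:
CREATE INDEX TB_E2V_DOC_CICLO_IX1
    ON TB_E2V_DOCUMENTOS_CICLO (TIPORECIBO, FECHAEMISION);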
The third issue is that you are applying TO_DATE to the FECHAEMISION column. If this is a DATE datatype, then you don't need the conversion and it is causing you trouble. If you do need the conversion, an option would be a function-based index on TO_DATE(D.FECHAEMISION, 'yyyy/MM/dd').
To tune this particular query, you can try a function-based composite index:
CREATE INDEX TB_E2V_DOCUMENTOS_CICLO_FX1 ON TB_E2V_DOCUMENTOS_CICLO(FECHAEMISION, TO_DATE(FECHAEMISION, 'yyyy/MM/dd'));
Finally, this query is clearly being generated from code:
lines like AND (D.BA = null or null is null) seem to be a way of excluding portions of the WHERE clause when the front-end passes a NULL. This would possibly be AND (D.BA = 'X' or 'X' is null) if a value were provided for that parameter.
As such, be careful when tuning for the current set of parameters, as any change in what generated this query will impact the effectiveness of your tuning.
If you have a way to influence how this query is generated, it would be nice to simply exclude those non-event filters when the values are not provided, though Oracle ought to be able to handle them as-is.
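For example, with only TIPORECIBO and the date range actually supplied, the generated statement could be reduced to a sketch like this (the same filters as above, minus the always-true predicates):
SELECT COUNT(*)
FROM TB_E2V_DOCUMENTOS_CICLO D
WHERE D.TIPORECIBO IN ('Recibo')
  AND TO_DATE(D.FECHAEMISION, 'yyyy/MM/dd') BETWEEN TO_DATE('2019-5-1', 'yyyy-MM-dd') AND TO_DATE('2020-2-18', 'yyyy-MM-dd');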

Related

Insert into table two and update table two for BigQuery in one query

I am using Standard SQL in BigQuery. I am writing a scheduled query which inserts records into table (2). Now, given that it's scheduled, I am trying to figure out how to update records in table (2) from the scheduled query, which always inserts records into table (2).
In particular, when there is a record in table (2) that was not generated by my query, I want to update table (2) and set a boolean column to No.
Below is my query; where in the query would I add the update logic for table (2)?
INSERT INTO record (airport_name, icao_address, arrival, flight_number, origin_airport_icao, destination_airport_icao)
WITH
  planes_stopped_in_airport AS (
    SELECT
      p.IATA_airport_code,
      p.airport_name,
      p.airport_ISO_country_code,
      p.ICAO_airport_code,
      timestamp,
      a.icao_address,
      a.latitude,
      a.longitude,
      a.altitude_baro,
      a.speed,
      heading,
      callsign,
      source,
      a.collection_type,
      vertical_rate,
      squawk_code,
      icao_actype,
      flight_number,
      origin_airport_icao,
      destination_airport_icao,
      scheduled_departure_time_utc,
      scheduled_arrival_time_utc,
      estimated_arrival_time_utc,
      tail_number,
      ingestion_time
    FROM
      `updates` a
    JOIN
      Polygons p
    ON
      1 = 1
    WHERE
      a.timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 20 MINUTE) AND a.timestamp <= CURRENT_TIMESTAMP()
      AND (latitude IS NULL
           AND longitude IS NULL
           AND callsign IS NULL
           AND speed IS NULL
           AND heading IS NULL
           AND altitude_baro IS NULL) IS FALSE
      AND ST_DWithin(ST_GeogFromText(polygon),
                     ST_GeogPoint(a.longitude,
                                  a.latitude),
                     10)
      AND a.collection_type = '1' -- and speed < 50
      AND (origin_airport_icao IS NULL
           AND destination_airport_icao IS NULL) IS FALSE )
SELECT
  p.airport_name,
  icao_address,
  MIN(timestamp) AS Arrival,
  flight_number,
  origin_airport_icao,
  destination_airport_icao
FROM
  planes_stopped_in_airport p
WHERE
  flight_number NOT IN (SELECT DISTINCT flight_number
                        FROM `table(2)`)
GROUP BY
  icao_address,
  p.airport_name,
  flight_number,
  origin_airport_icao,
  destination_airport_icao
HAVING
  flight_number IS NOT NULL
ORDER BY
  airport_name,
  arrival
You can probably do it with a MERGE statement; see the details at https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_statement.
If I understood your requirements correctly, you need something like:
MERGE dataset.Destination T
USING (SELECT * ...) S
ON T.key = S.key
WHEN MATCHED THEN
UPDATE SET T.foo = S.foo, T.bool_flag = FALSE
WHEN NOT MATCHED THEN
INSERT ...
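The rows that exist in table (2) but are not produced by the query can be handled with a WHEN NOT MATCHED BY SOURCE clause, which BigQuery's MERGE also supports. A rough sketch, where dataset.table_2, the is_current column, and new_arrivals (standing in for the SELECT built from planes_stopped_in_airport above) are all assumptions:
MERGE dataset.table_2 T
USING (
  SELECT airport_name, icao_address, arrival, flight_number,
         origin_airport_icao, destination_airport_icao
  FROM new_arrivals  -- stand-in for the full SELECT from the question
) S
ON T.flight_number = S.flight_number
WHEN NOT MATCHED THEN
  INSERT (airport_name, icao_address, arrival, flight_number, origin_airport_icao, destination_airport_icao)
  VALUES (S.airport_name, S.icao_address, S.arrival, S.flight_number, S.origin_airport_icao, S.destination_airport_icao)
WHEN NOT MATCHED BY SOURCE THEN
  UPDATE SET is_current = FALSE  -- is_current is an assumed name for the boolean column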

Constraint to check that one column or the other is not null, but not both at the same time

I have a school project to work on, which has some tables, and one table needs a constraint that is not working out for me.
There are some tables like QUESTION, ANSWER and REACTION.
A reaction belongs to either a question or an answer, but not both at the same time.
Therefore I have 2 columns:
question_id NUMBER,
answer_id NUMBER,
Either column can be null, but not both at the same time.
I already made a constraint, but it isn't working:
/* CHECK if reaction belongs to an question or a answer NOT WORKING YET*/
CONSTRAINT CHECK_question_or_answer CHECK((answer_id != NULL AND question_id = NULL) OR (answer_id = NULL OR question_id != NULL))
I already tested the constraint, and I can still insert a row without a question_id or an answer_id.
I hope it's a bit clear; if not, I am happy to try to explain myself better.
(Still a newbie at SQL.)
Thanks.
Your constraint:
CONSTRAINT CHECK_question_or_answer CHECK((answer_id != NULL AND question_id = NULL) OR (answer_id = NULL OR question_id != NULL))
This condition never evaluates to TRUE: every comparison with NULL yields NULL (unknown), and since a CHECK constraint only rejects rows whose condition evaluates to FALSE, this check never blocks anything.
You need to use IS NULL or IS NOT NULL instead, like:
CONSTRAINT CHECK_question_or_answer CHECK((answer_id IS NOT NULL AND question_id IS NULL) OR (answer_id IS NULL AND question_id IS NOT NULL))
This is because the comparison operators !=, =, >, and <, when combined with NULL, produce NULL (unknown), which is treated as false in a WHERE clause and never satisfies a condition.
Demo (the first query returns a row, the second returns no rows):
SELECT 1
FROM dual
WHERE 1 IS NOT NULL;
SELECT 1
FROM dual
WHERE 1 != NULL;
From doc:
NULL values represent missing or unknown data. NULL values are used as
placeholders or as the default entry in columns to indicate that no
actual data is present. The NULL is untyped in SQL, meaning that it is
not an integer, a character, or any other specific data type.
Note that NULL is not the same as an empty data string or the
numerical value '0'. While NULL indicates the absence of a value, the
empty string and numerical zero both represent actual values.
While a NULL value can be assigned, it can not be equated with
anything, including itself.
Because NULL does not represent or equate to a data type, you cannot
test for NULL values with any comparison operators, such as =, <, or
<>.
The IS NULL and IS NOT NULL operators are used to test for NULL
values.
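Put into a full table definition, the corrected constraint might look like this minimal sketch (the column list is assumed from the question; foreign keys omitted for brevity):
CREATE TABLE reaction (
    id          NUMBER PRIMARY KEY,
    question_id NUMBER,
    answer_id   NUMBER,
    text        VARCHAR2(4000),
    CONSTRAINT check_question_or_answer CHECK (
        (answer_id IS NOT NULL AND question_id IS NULL) OR
        (answer_id IS NULL AND question_id IS NOT NULL)
    )
);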
Do it the other way around. Put the id of the main table in the other tables, like this:
question table
--------------
id
text
...
answers table
-------------
id
question_id
text
...
reactions table
---------------
id
question_id
text
...
And question_id is never null. Then you can use a left join to get the results from both tables - one of them will have no results.
select *
from questions q
left join answers a on a.question_id = q.id
left join reactions r on r.question_id = q.id
While lad2025's answer is good for two columns, if you want to extend the method to more than two it can get a bit cumbersome.
A flexible alternative is:
check ((case when answer_id is null then 0 else 1 end +
case when question_id is null then 0 else 1 end ) = 1)
It extends well to checking for a particular count of null (or non-null) values for an arbitrary number of columns.
For example, if you had column_1, column_2, column_3, and column_4, and wanted at least 1 of them to be non-null, then:
check ((case when column_1 is null then 0 else 1 end +
case when column_2 is null then 0 else 1 end +
case when column_3 is null then 0 else 1 end +
case when column_4 is null then 0 else 1 end ) >= 1)
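As a quick sanity check, assuming a REACTION table with id, question_id, answer_id and text columns and either form of the constraint above, the first two inserts succeed and the last two are rejected by the check:
-- exactly one of the two ids is set: allowed
INSERT INTO reaction (id, question_id, answer_id, text) VALUES (1, 10, NULL, 'reaction to a question');
INSERT INTO reaction (id, question_id, answer_id, text) VALUES (2, NULL, 20, 'reaction to an answer');
-- both NULL, or both set: violates the check constraint
INSERT INTO reaction (id, question_id, answer_id, text) VALUES (3, NULL, NULL, 'rejected');
INSERT INTO reaction (id, question_id, answer_id, text) VALUES (4, 10, 20, 'rejected');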

How to compare multiple columns under same row in a table?

The following pairs of columns in a table should be compared against each other, and only rows where they are not equal should pass my WHERE clause:
cd_delivery_address
cd_mail_delivery_address
cd_st_code
cd_mail_st_code
cd_zip
cd_mail_zip
Please find my code snippet to achieve this:
select * from table cd
where
(
(cd_mail_delivery_address <> cd_delivery_address or
(cd_mail_delivery_address is null and cd_delivery_address is not null) or
(cd_mail_delivery_address is not null and cd_delivery_address is null)
)
and (
cd.cd_city <> cd.cd_mail_city or
(cd.cd_city is null and cd_mail_city is not null) or
(cd_city is not null and cd_mail_city is null))
and (
cd.st_code <> cd.cd_mail_st_code or
(cd.st_code is null and cd_mail_st_code is not null) or
(st_code is not null and cd_mail_st_code is null)
)
and (
cd.cd_zip <> cd.cd_mail_zip or
(cd.cd_zip is null and cd_mail_zip is not null) or
(cd_zip is not null and cd_mail_zip is null)
)
)
All columns are VARCHAR2 and I get the correct output from this code. But is there a better way to compare multiple columns in PL/SQL? Can I improve this code? Any suggestion would be helpful.
You could replace your null checks with the NVL function, something like this:
...
NVL(cd_mail_delivery_address,'_') <> NVL(cd_delivery_address,'_')
...
It's definitely more readable, but I'm not sure about the query efficiency.
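Spelled out for all of the column pairs in the question, the WHERE clause might look like the sketch below (your_table stands in for the real table name, and '_' is assumed to be a placeholder value that never appears in the data):
SELECT *
FROM your_table cd
WHERE NVL(cd_mail_delivery_address, '_') <> NVL(cd_delivery_address, '_')
  AND NVL(cd_mail_city, '_') <> NVL(cd_city, '_')
  AND NVL(cd_mail_st_code, '_') <> NVL(st_code, '_')
  AND NVL(cd_mail_zip, '_') <> NVL(cd_zip, '_');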
I have done it for two columns using a join:
select a.cd_delivery_address, b.cd_mail_delivery_address
from cd a inner join cd b
  on a.cd_delivery_address = b.cd_delivery_address
where a.cd_delivery_address <> b.cd_mail_delivery_address
Here the null-checking conditions are omitted, which reduces the number of conditions, but there is a performance impact since a join is involved.

Change DataSet Designer parameter inference

I'm using the VS2010 DataSet designer to make some select queries with optional parameters similar to this:
SELECT CustomerID, FirstName, JoinDate, etc
FROM tblCustomers
WHERE (
    (@CustomerID IS NULL OR CustomerID = @CustomerID) AND
    (@FirstName IS NULL OR FirstName = @FirstName) AND
    (@JoinedBefore IS NULL OR JoinDate < @JoinedBefore) AND
    (@JoinedAfter IS NULL OR JoinDate > @JoinedAfter) AND
    .. etc ..
)
The inferred data types and allow-DB-null settings for these parameters are almost always wrong. I end up with string types set for date-time parameters and vice versa, and over half the parameters are marked as non-nullable.
That obviously wreaks havoc on my queries. I can manually change these inferences, but every time I have to update the TableAdapter, it resets them all to what it thinks is best! Does anyone know how to either a) get the inferences right, or b) override them in a permanent way?
It seems VS infers the data type based on the first occurrence of the parameter in the query. Because I put my "@Parameter IS NULL OR ..." check first, that confused the designer and caused it to infer wrongly a lot of the time. I swapped the order in my query and now it infers perfectly:
SELECT CustomerID, FirstName, JoinDate, etc
FROM tblCustomers
WHERE (
    (CustomerID = @CustomerID OR @CustomerID IS NULL) AND
    (FirstName = @FirstName OR @FirstName IS NULL) AND
    (JoinDate < @JoinedBefore OR @JoinedBefore IS NULL) AND
    (JoinDate > @JoinedAfter OR @JoinedAfter IS NULL) AND
    .. etc ..
)

Convert this SQL query to Linq (Not Exists + sub query)

I would like this SQL to be converted to LINQ. (It should select rows from Input which do not exist in table Production, based on 3 columns. If a column in both tables contains NULL, it should be considered as having the same value.)
SELECT i.* FROM INPUT AS i
WHERE NOT EXISTS
(SELECT p.Agent FROM Production AS p
WHERE ISNULL(i.CustID,'') <> ISNULL(p.CustID,'')
AND ISNULL(i.CustName,'') <> ISNULL(p.CustName,'')
AND ISNULL(i.household,'') <> ISNULL(p.Household,''))
First of all - this is not a good SQL query. Every column is wrapped in a non-sargable function which means that the engine won't be able to take advantage of any indexes on any of those columns (assuming you have any).
Let's start by rewriting this as a semi-decent SQL query:
SELECT i.*
FROM Input i
LEFT JOIN Production p
ON (p.CustID = i.CustID OR (p.CustID IS NULL AND i.CustID IS NULL))
AND (p.CustName = i.CustName OR (p.CustName IS NULL AND i.CustName IS NULL))
AND (p.Household = i.Household OR
(p.Household IS NULL AND i.Household IS NULL))
WHERE p.CustID IS NULL
Now having said this, LEFT JOIN / IS NULL is not great for efficiency either, but we don't have much choice here because we're comparing on multiple columns. Based on your column names, I'm starting to wonder if the schema is properly normalized. A CustID should most likely be associated with one and only one CustName - the fact that you have to compare both of these seems a bit odd. And Household - I'm not sure what that is, but if it's a varchar(x)/nvarchar(x) column then I wonder if it might also have a 1:1 relationship with the customer.
If I'm speculating too much here then feel free to dismiss this paragraph; but just in case, I want to say that if this data isn't properly normalized, normalizing it would make it much easier and faster to query on:
SELECT *
FROM Input
WHERE CustID NOT IN (SELECT CustID FROM Production)
Anyway, going back to the first query, since that's what we have to work with for now. Unfortunately it's impossible to create a join on those specific conditions in Linq, so we need to rewrite the SQL query as something slightly worse (because we now have to read from Input twice):
SELECT *
FROM Input
WHERE <Primary Key> NOT IN
(
SELECT i.<Primary Key>
FROM Input i
INNER JOIN Production p
ON (p.CustID = i.CustID OR (p.CustID IS NULL AND i.CustID IS NULL))
AND (p.CustName = i.CustName OR (p.CustName IS NULL AND i.CustName IS NULL))
AND (p.Household = i.Household OR
(p.Household IS NULL AND i.Household IS NULL))
)
Now we have something we can finally translate to Linq syntax. We still can't do the join explicitly, which would be best, but we go old-school, start from the cartesian join and toss the join conditions into the WHERE segment, and the server will still be able to sort it out:
var excluded =
    from i in input
    from p in production
    where
        ((p.CustID == i.CustID) || ((p.CustID == null) && (i.CustID == null))) &&
        ((p.CustName == i.CustName) ||
            ((p.CustName == null) && (i.CustName == null))) &&
        ((p.Household == i.Household) ||
            ((p.Household == null) && (i.Household == null)))
    select i.PrimaryKey;
var results =
    from i in input
    where !excluded.Contains(i.PrimaryKey)
    select i;
I'm assuming here that you have some sort of primary key on the table. If you don't, you've got other problems, but you can get around this particular problem using EXCEPT:
var excluded =
    from i in input
    from p in production
    where
        ((p.CustID == i.CustID) || ((p.CustID == null) && (i.CustID == null))) &&
        ((p.CustName == i.CustName) ||
            ((p.CustName == null) && (i.CustName == null))) &&
        ((p.Household == i.Household) ||
            ((p.Household == null) && (i.Household == null)))
    select i;
var results = input.Except(excluded);
