How do I optimize the following SQL query for performance? - performance

How do I optimize the following SQL query for performance?
select * from Employee where CNIC = 'some-CNIC-number'
Will using alias help making it a little faster?
I am using Microsoft SQL Server.

This is better if you tell us what RDBMS you are using, but...
1 - Don't do SELECT *. Specify which columns you need. Less data = faster query
2 - For indexing, make sure you have an index on CNIC. You also want a good clustered index on a primary key (preferably something like an ID number)
3 - You put the number in single quotes ' ' which indicates you may have it as a varchar column. If it will always be NUMERIC, it should be an int/bigint data type. This takes up less space and will be faster to retrieve and index by.

Create an index on CNIC:
CREATE INDEX ix_employee_cnic ON employee (cnic)

First thing, as I see this column will be used for storing Id card nos, then you can make your coulmn of type int rather than varchar or nvarchar as searching will faster on an integer type as compared to varchar or nvarchar.
Second, use with (no lock), like
select * from Employee with (nolock) where CNIC = 'some-CNIC-number'
This is to minimize the chances of a deadlock.

Related

how to get all values from LowCardinality(String) datatype more optimal way?

i am new to clickhouse,and amazed by clickhosue's performance.
there have one thing bother me, i have a table use LowCardinality(String) type.
DDL like
create table test.table1(code LowCardinality(String),xxx,xxx,xxx ) engine=MergeTree xxx
i want get all distinct value from table.
there is my select query
select distinct code from test.table1;
but the mount of data is quite large, about 14 billion which may take 40s to complete,and is processed all rows in table,
so i wonder if there have any method to get data from LowCardinality datatype?
Try this query:
select code /*, count() */
from test.table1
group by code

Datatype difference in procedure parameter and sql query inside it

In my backend procedure i have a varchar2 parameter and i am using it in the SQL query to search with number column. Will this cause any kind of performance issues ?
for ex:
Proc (a varchar)
is
select * from table where deptno = a;
end
Here deptno is number column in table and a is varchar .
It might do. The database will resolve the differences in datatype by casting DEPTNO to a VARCHAR2. This will prevent the optimizer from using any (normal) index you have on that column. Depending on the data volumes and distribution, an indexed read may not always be the most efficient access path, in which case the data conversion doesn't matter.
So it does depend. But what are your options if it does matter (you have a highly selective index on that column)?
One solution would be to apply an explicit data conversion in your query:
select * from table
where deptno = to_number(a);
This will cause the query to fail if A contains a value which won't convert to a number.
A better solution would be to change the datatype of A so that the calling program can only pass a numeric value. This throws the responsibility for duff data where it properly belongs.
The least attractive solution is to keep the procedure's signature and the query as is, and build a function-based index on the column:
create index emp_deptchar_fbi on emp(to_char(deptno));
Read the documentation to find out more about function-based indexes.

Stumped - Oracle won't use index when value is specified but will when function returns same value

I'm currently working with a database that has two indexes for a specific table. The index I want has two columns "Name" (varchar2) and "Time" (number). When I write the query
SELECT SOMETHING
FROM MYTABLE
WHERE NAME = 'SOME-NAME'
AND TIME BETWEEN STARTVALUE AND ENDVALUE
(where STARTVALUE and ENDVALUE are numbers) it does not use the index. However if I use the following query instead
SELECT SOMETHING
FROM MYTABLE
WHERE NAME = 'SOME-NAME'
AND TIME BETWEEN MY_FUNC('STARTQUAL') AND MY_FUNC('ENDQUAL')
it does.
The only difference I can think of is that MY_FUNC explicitly returns a value of type NUMBER - is it possible that the query optimizer is confused about the data type for STARTVALUE and ENDVALUE specified explicitly and is refusing to use the index (I saw some similar threads that mentioned a type conflict was the cause).
Note:
The value being returned by MY_FUNC is EXACTLY the same value that I am specifying in the first query.
The index in question is UNDOUBTEDLY (absolutely no question) the correct index to be using and execution times are orders of magnitude faster when it does.
I have even specified a query hint with the first query and it refuses to use the index.
I know there must be something silly / simple that I'm overlooking but I just can't see it.
Thanks in advance for your assistance.
Alternatively, Oracle could be optimizing the queries differently based on whether the query involves literal values or bound values.
SELECT SOMETHING
FROM MYTABLE
WHERE NAME = 'SOME-NAME'
AND TIME BETWEEN 7 AND 41;
I'll bet Oracle knows something about the distribution of data in the TIME column, and is making a guess - perhaps using outdated statistics - as to what percentage of rows and blocks (i.e. the selectivity) of that column is. Check to see if there's a histogram on that column.
However, a query like this:
SELECT SOMETHING
FROM MYTABLE
WHERE NAME = 'SOME-NAME'
AND TIME BETWEEN MY_FUNC('7') AND MY_FUNC('41');
is likely to be optimized as semantically equivalent to:
SELECT SOMETHING
FROM MYTABLE
WHERE NAME = 'SOME-NAME'
AND TIME BETWEEN :some_bind AND :some_other_bind;
Because Oracle doesn't know what MY_FUNC('7') does - or even that MY_FUNC('7') will always return the same value of 7 - unless you've told Oracle the function's deterministic. So my experience is that Oracle takes a stab in the dark, for the most part, and tends to prefer an index with a high clustering factor. It seems to guess that even if the index isn't the best choice, at least it minimizes the downside risk by visiting as few data blocks as possible.
My recommendation is to find out for yourself why it's behaving differently - take a 10053 trace of each query:
alter session set events = '10053 trace name context forever;
run sql
alter session set events = '10053 trace name context off;
SELECT SOMETHING
FROM MYTABLE
WHERE NAME = 'SOME-NAME'
AND TIME BETWEEN STARTVALUE AND ENDVALUE
Here, you have TIME which is a NUMBER, and STARTVALUE and ENDVALUE which are strings (according to your comment). Therefore, an implicit conversion is done - i.e. your query is effectively:
SELECT SOMETHING
FROM MYTABLE
WHERE NAME = 'SOME-NAME'
AND TO_CHAR(TIME) BETWEEN STARTVALUE AND ENDVALUE
Unless you have a function-based index on TO_CHAR(TIME), it won't use an index.
Therefore, you must tell Oracle that you always expect the string parameters to be convertable to numbers, i.e.:
SELECT SOMETHING
FROM MYTABLE
WHERE NAME = 'SOME-NAME'
AND TIME BETWEEN TO_NUMBER(STARTVALUE) AND TO_NUMBER(ENDVALUE)
(It's always good practice to avoid implicit conversions, especially in queries, anyway)

Getting count from large tables

I was trying to get the count from a table with millions of entries. My query looks somewhat like this:
Select count(*)
from Users
where status = 'A' and office_id = '000111' and user_type = 'C'
Status can be A or C, User Type can be C or R.
Status, Office_id and User_type are Strings
The result has around 10 million rows, and its taking a lot of time. I just want the total count.
Would appreciate if anyone could tell me why its taking this much time, and workaround if any.
Do let me know in case of any more details required.
The database engine is Oracle 11g
Edit: I Added index for all three columnns. Still theres no improvement. Also tried the below query, but it always returns the total count in the table without checking the conditions.
SELECT COUNT(office_id_key)
FROM Users
WHERE EXISTS (SELECT * FROM Users WHERE status = 'A' AND office_id = '000111' AND user_type = 'C')
Why not just simply create indexes on the table on age and place this way your search will be faster then simply scanning the entire table for these values.
CREATE INDEX age_index ON Employee(age);
CREATE INDEX place_index ON Employee(place);
This should speed up the process.
AMENDED BASED ON QUERY CHANGE
CREATE INDEX status_index ON Users(status);
CREATE INDEX office_id_index ON Users(office_id);
CREATE INDEX user_type_index ON Users(user_type);
You'll want to create the following multi-column index on the Users table to improve the query:
(office_id, status, user_type)
The database can use a "covering" index with COUNT(*). Create the index with the columns in that order, due to cardinality.
After adding the indexes, I think changing where to where exists and a subquery may help as well.
Edit2: removed exists as it was returning all valid, usually the subquery has multiple joins, but I guess the case with one table returns all true. I read that count is optimized to act similar to exists when it has only one table and no where clause, so I treat the results as a table. Hopefully, this will have the same quick results.
select count(1) from
(select 1 from Employee where age = '25' and place = 'bricksgate')
Edit: When you use 'where exists' the db server doesn't load your data into memory and also takes advantage of the indexes because you will be reading values from the indexes not doing costly table lookups. You may also want to change count(*) to count(place) - that way it will limit the fields to an indexed field as well.
In your original query, your data was doing table lookups and then loading them into memory just to be counted.
count(1) works faster than count(*)

Oracle - Understanding the no_index hint

I'm trying to understand how no_index actually speeds up a query and haven't been able to find documentation online to explain it.
For example I have this query that ran extremely slow
select *
from <tablename>
where field1_ like '%someGenericString%' and
field1_ <> 'someSpecificString' and
Action_='_someAction_' and
Timestamp_ >= trunc(sysdate - 2)
And one of our DBAs was able to speed it up significantly by doing this
select /*+ NO_INDEX(TAB_000000000019) */ *
from <tablename>
where field1_ like '%someGenericString%' and
field1_ <> 'someSpecificString' and
Action_='_someAction_' and
Timestamp_ >= trunc(sysdate - 2)
And I can't figure out why? I would like to figure out why this works so I can see if I can apply it to another query (this one a join) to speed it up because it's taking even longer to run.
Thanks!
** Update **
Here's what I know about the table in the example.
It's a 'partitioned table'
TAB_000000000019 is the table not a column in it
field1 is indexed
Oracle's optimizer makes judgements on how best to run a query, and to do this it uses a large number of statistics gathered about the tables and indexes. Based on these stats, it decides whether or not to use an index, or to just do a table scan, for example.
Critically, these stats are not automatically up-to-date, because they can be very expensive to gather. In cases where the stats are not up to date, the optimizer can make the "wrong" decision, and perhaps use an index when it would actually be faster to do a table scan.
If this is known by the DBA/developer, they can give hints (which is what NO_INDEX is) to the optimizer, telling it not to use a given index because it's known to slow things down, often due to out-of-date stats.
In your example, TAB_000000000019 will refer to an index or a table (I'm guessing an index, since it looks like an auto-generated name).
It's a bit of a black art, to be honest, but that's the gist of it, as I understand things.
Disclaimer: I'm not a DBA, but I've dabbled in that area.
Per your update: If field1 is the only indexed field, then the original query was likely doing a fast full scan on that index (i.e. reading through every entry in the index and checking against the filter conditions on field1), then using those results to find the rows in the table and filter on the other conditions. The conditions on field1 are such that an index unique scan or range scan (i.e. looking up specific values or ranges of values in the index) would not be possible.
Likely the optimizer chose this path because there are two filter predicates on field1. The optimizer would calculate estimated selectivity for each of these and then multiply them to determine their combined selectivity. But in many cases this will significantly underestimate the number of rows that will match the condition.
The NO_INDEX hint eliminates this option from the optimizer's consideration, so it essentially goes with the plan it thinks is next best -- possibly in this case using partition elimination based on one of the other filter conditions in the query.
Using an index degrades query performance if it results in more disk IO compared to querying the table with an index.
This can be demonstrated with a simple table:
create table tq84_ix_test (
a number(15) primary key,
b varchar2(20),
c number(1)
);
The following block fills 1 Million records into this table. Every 250th record is filled with a rare value in column b while all the others are filled with frequent value:
declare
rows_inserted number := 0;
begin
while rows_inserted < 1000000 loop
if mod(rows_inserted, 250) = 0 then
insert into tq84_ix_test values (
-1 * rows_inserted,
'rare value',
1);
rows_inserted := rows_inserted + 1;
else
begin
insert into tq84_ix_test values (
trunc(dbms_random.value(1, 1e15)),
'frequent value',
trunc(dbms_random.value(0,2))
);
rows_inserted := rows_inserted + 1;
exception when dup_val_on_index then
null;
end;
end if;
end loop;
end;
/
An index is put on the column
create index tq84_index on tq84_ix_test (b);
The same query, but once with index and once without index, differ in performance. Check it out for yourself:
set timing on
select /*+ no_index(tq84_ix_test) */
sum(c)
from
tq84_ix_test
where
b = 'frequent value';
select /*+ index(tq84_ix_test tq84_index) */
sum(c)
from
tq84_ix_test
where
b = 'frequent value';
Why is it? In the case without the index, all database blocks are read, in sequential order. Usually, this is costly and therefore considered bad. In normal situation, with an index, such a "full table scan" can be reduced to reading say 2 to 5 index database blocks plus reading the one database block that contains the record that the index points to. With the example here, it is different altogether: the entire index is read and for (almost) each entry in the index, a database block is read, too. So, not only is the entire table read, but also the index. Note, that this behaviour would differ if c were also in the index because in that case Oracle could choose to get the value of c from the index instead of going the detour to the table.
So, to generalize the issue: if the index does not pick few records then it might be beneficial to not use it.
Something to note about indexes is that they are precomputed values based on the row order and the data in the field. In this specific case you say that field1 is indexed and you are using it in the query as follows:
where field1_ like '%someGenericString%' and
field1_ <> 'someSpecificString'
In the query snippet above the filter is on both a variable piece of data since the percent (%) character cradles the string and then on another specific string. This means that the default Oracle optimization that doesn't use an optimizer hint will try to find the string inside the indexed field first and also find if the data it is a sub-string of the data in the field, then it will check that the data doesn't match another specific string. After the index is checked the other columns are then checked. This is a very slow process if repeated.
The NO_INDEX hint proposed by the DBA removes the optimizer's preference to use an index and will likely allow the optimizer to choose the faster comparisons first and not necessarily force index comparison first and then compare other columns.
The following is slow because it compares the string and its sub-strings:
field1_ like '%someGenericString%'
While the following is faster because it is specific:
field1_ like 'someSpecificString'
So the reason to use the NO_INDEX hint is if you have comparisons on the index that slow things down. If the index field is compared against more specific data then the index comparison is usually faster.
I say usually because when the indexed field contains more redundant data like in the example #Atish mentions above, it will have to go through a long list of comparison negatives before a positive comparison is returned. Hints produce varying results because both the database design and the data in the tables affect how fast a query performs. So in order to apply hints you need to know if the individual comparisons you hint to the optimizer will be faster on your data set. There are no shortcuts in this process. Applying hints should happen after proper SQL queries have been written because hints should be based on the real data.
Check out this hints reference: http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm
To add to what Rene' and Dave have said, this is what I have actually observed in a production situation:
If the condition(s) on the indexed field returns too many matches, Oracle is better off doing a Full Table Scan.
We had a report program querying a very large indexed table - the index was on a region code and the query specified the exact region code, so Oracle CBO uses the index.
Unfortunately, one specific region code accounted for 90% of the tables entries.
As long as the report was run for one of the other (minor) region codes, it completed in less than 30 minutes, but for the major region code it took many hours.
Adding a hint to the SQL to force a full table scan solved the problem.
Hope this helps.
I had read somewhere that using a % in front of query like '%someGenericString%' will lead to Oracle ignoring the INDEX on that field. Maybe that explains why the query is running slow.

Resources