How to reduce cost on select statement? - performance

I have a table in oracle 10g with around 51 columns and 25 Million number of records in it. When I execute a simple select query on the table to extract 3 columns I am getting the cost too high around 182k. So I need to reduce the cost effect. Is there any possible way to reduce it?
Query:
select a,b,c
from X
a - char
b - varchar2
c - varchar2
TIA

In cases like this it's difficult to give good advice without knowing why you would need to query 25 million records. As #Ryan says, normally you'd have a WHERE clause; or, perhaps you're extracting the results into another table or something?
A covering index (i.e. over a,b,c) would probably be the only way to make any difference to the performance - the query could then do a fast full index scan, and would get many more records per block retrieved.

Well...if you know you only need a subset of those values, throwing a WHERE clause on there would obviously help out quite a bit. If you truly need all 25 million records, and the table is properly indexed, then I'd say there's really not much you can do.

Yes, better telling the purpose of the select as jeffrey Kemp said.
If normal select, you just need to give index to your fields that mostly you can do, provide table statistic on index (DBMS_STATS.GATHER_TABLE_STATS), check the statistic of each field to be sure your index is right (Read: http://bit.ly/qR12Ul).
If need to load to another table, use cursor, limit the records of each executing and load to the table via the bulk insert (FORALL technique).

Related

SQL Azure - query with row_number() executes slow if columns with nvarchar of big size are included

I have the following query (generated by Entity Framework with standard paging. This is the inner query and I added the TOP 438 part):
SELECT TOP 438 [Extent1].[Id] AS [Id],
[Extent1].[MemberType] AS [MemberType],
[Extent1].[FullName] AS [FullName],
[Extent1].[Image] AS [Image],
row_number() OVER (ORDER BY [Extent1].[FullName] ASC) AS [row_number]
FROM [dbo].[ShowMembers] AS [Extent1]
WHERE 3 = CAST( [Extent1].[MemberType] AS int)
ShowMembers table has about 11K rows, but only 438 with MemberType == 3. The 'Image' column is of type nvarchar(2000) that holds the URL to the image on a CDN. If I include this column in the query (only in SELECT part), the query chokes up somehow and generates result in a range between 2-30 seconds (it differs in different runs). If I comment out that column, query runs fast as expected. If I include the 'Image' column, but comment out the row_number column, query also runs fast as expected.
Obviously, I've been too liberal with the size of the URL, so I started playing around with the size. I found out that if I set the Image column to nvarchar(884), then the query runs fast as expected. If I set it up to 885 it's slow again.
This is not bound to one column, but to the size of all columns in the SELECT statement. If I just increase the size by one, performance differences are obvious.
I am not a DB expert, so any advice is welcomed.
PS In local SQL Server 2012 Express there are no performance issues.
PPS Running the query with OFFSET 0 ROWS FETCH NEXT 438 ROWS ONLY (without the row_count column of course) is also slow.
Row_number has to sort all the rows to get you things in the order you want. Adding a larger column into the result set implies that it all get sorted and thus is much slower/does more IO. You can see this, btw, if you enable "set statistics io on" and "set statistics time on" in SSMS when debugging problems like this. It will give you some insight into the number of IOs and other operations happening at runtime in the query:
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-statistics-io-transact-sql?view=sql-server-2017
In terms of what you can do to make this query go faster, I encourage you to think about some things that may change the design of your database schema a bit. First, consider whether you actually need the rows sorted in a specific order at all. If you don't need things in order, it will be cheaper to iterate over them without the row_number (by a measurable amount). So, if you just want to conceptually iterate over each entry once, you can do that by doing an order by something more static that is still monotonic such as the identity column. Second, if you do need to have things in sorted order, then consider whether they are changing frequently/infrequently. If it is infrequent, it may be possible to just compute and persist a column value into each row that has the relative order that you want (and update it each time you modify the table). In this model, you could index the new column and then request things in that order (in the top-level order by in the query - row_number not needed). If you do need things dynamically computed like you are doing and you need things in an exact order all the time, your final option is to move the URL to a second table and join with it after the row_number. This will avoid the sort being "wide" in the computation of row_number.
Best of luck to you

after alter system flush shared_pool low performance Oracle

We did refactoring and replaced 2 similar requests with parameterized request
a.isGood = :1
after that request that used this parameter with parameter 'Y' was executed longer that usually (become almost the same with parameter 'N'). We used alter system flush shared_pool command and request for parameter 'Y' has completed fast (as before refactoring) while request with parameter 'N' hangs for a long time.
As you could understand number of lines in data base with parameter 'N' much more then with 'Y'
Oracle 10g
Why it happened?
I assume that you have an index on that column, otherwise the performance would be the same regardless of the Y/N combination. I have seen this happening quite bit on 10g+ due to Oracle's optimizer Bind Peeking combined to histograms on columns with skewed data distribution. The histograms get created automatically when one gathers tables statistics using the parameter method_opt with 'FOR ALL COLUMNS SIZE AUTO' (among other values). Oracle optimizes the query for the value in the bind variables provided in the very first execution of that query. If you run the query with Y the first time, Oracle might want to use an index instead of a full table scan, since Y will return a small quantity of rows. The next time you run the query with N, then Oracle will repeat the first execution plan, which happens to be a poor choice for N, since it will return the vast majority of rows.
The execution plans are cached in the SGA. Once you flush it, you get a brand new execution plan the very first time the query runs again.
My suggestion is:
Obtain the explain plan of both original queries (one with a hardcoded Y and one with a hardcode N). Investigate if the two plans use different indexes or one has a much higher Cost than the other. I have the feeling that one uses a full table scan and the other uses an index. The first one should be faster for N and the second should be faster for Y.
Try to remove the statistics on the table and see if it makes a difference on the query that has the bind variable. Later you need to gather statistics again for the table or other queries on that table might suffer.
You can also gather statistics for that one table using method_opt => FOR ALL COLUMNS SIZE 1. That will keep the statistics without the histograms on any columns of that table.
A bitmap index on this column might fix the issue as well. Indexes on a column that have only two possible values (Y and N) are not exactly very efficient.
If column isGood has 99,000 'N' values and 1,000 'Y' values and you run with the condition isGood = 'Y', then it may be appropriate to use an index to find the results: you are returning 1% of the rows. If you run the query with the condition isGood = 'N', a full table scan would be more appropriate since you are returning most of the table anyway. If you were to use an index for the N condition, you would be doing an extra index lookup for every data item lookup.
Although the general rule is that bind parameters are good, it can be problematic in this kind of instance if really two different plans are required for the query. With the bind parameter scenario:
SELECT * FROM x WHERE isGood = :1
The statement will be parsed and a plan computed and saved in the sql cache. The same plan will be used for both query scenarios which is not desirable. But:
SELECT * FROM x WHERE isGood = 'Y'
SELECT * FROM x WHERE isGood = 'N'
will result in two plans being stored in the sql cache, hopefully each with the appropriate plan for the query. Version 11g avoids this problem with adaptive cursor sharing, which can use different plans for different bind variable values.
You need to look at your plans (EXPLAIN PLAN) to see what is happening in your case. Flush the cache, try one method, examine the plan; try the other, examine the plan. It might give you an idea what is happening in your case. There are a bunch of other topics you might follow up on that may help, for example:
using a hint to force the use of an index
cursor_sharing parameter
histograms on statistics

different columns on select query results different costs

There is an index at table invt_item_d on (item_id & branch_id & co_id) columns.
The plan results for the first query are TABLE ACCESS FULL and cost is 528,
results for the second query are INDEX FAST FULL SCAN (my index) and cost is 27.
The only difference is, as you can see, the selected column is used in index on the second query.
Is there something wrong with this? And please, can you tell me what should I do to fix this at db administration level?
select d.qty
from invt_item_d d
where d.item_id = 999
and d.branch_id = 888
and d.co_id = 777
select d.item_id
from invt_item_d d
where d.item_id = 999
and d.branch_id = 888
and d.co_id = 777
EDIT:
i made a new query and this query's cost is 529, with TABLE ACCESS FULL.
select qty from invt_item_d
so it doesn't matter if i use an index or not. Some says this is normal, is this a normal behaviour really?
In the first case, the table must be accessed, since the "qty" column is only stored in the table.
In the second case, all the columns used in the query can be read from the index, skipping the table read altogether.
You can add another index on columns (item_id, branch_id, co_id, qty) and it will most probably be used in the first query.
From the Oracle documentation: http://docs.oracle.com/cd/E11882_01/server.112/e25789/indexiot.htm
A fast full index scan is a full index scan in which the database
accesses the data in the index itself without accessing the table, and
the database reads the index blocks in no particular order.
Fast full index scans are an alternative to a full table scan when
both of the following conditions are met:
The index must contain all columns needed for the query.
A row containing all nulls must not appear in the query result set. For this result to be guaranteed, at least one column in the
index must have either:
A NOT NULL constraint
A predicate applied to it that prevents nulls from being considered in the query result set
This is exactly the main purpose of using index -- make search faster.
Querying columns with indexes are faster compared to querying columns without indexes.
Its basic oracle knowledge.
I am adding another answer because it seems to be more convinient.
First:
" i doesn't hit the index because there are 34000 rows, not millions". This is COMPLETELY WRONG and a dangerous understanding.
What I meant was, if there are a few thousand rows, and the index is not hit(oracle engine does a full table scan(TABLE ACCESS FULL) then), its not a big deal. Oracle is fast enough to read few thousand rows in a matter of a second(even without indexes) , and hence you wont feel the difference.The query is still slower(than the occasion when there is an index) , but its is so minimally slower that you wont feel the difference.
But, if there are millions of rows, the execution of the query will be much, much slower without index ( as this time it will scan millions of rows in a full table scan)and your performance will be hit.
Second: Why on earth do you have to loop over a table with 34000 rows, that too 4000 times???
Thats a terrible approach. Avoid loops as much as possible.There has to be a better approach!
Third:
You can force the oracle optimiser to hit the index by using the index hint.You will need to know the name of the index for that.
select /*+ index(invt_item_d <index_name>) */
d.qty
from invt_item_d d
where d.item_id = 999
and d.branch_id = 888
and d.co_id = 777
Here is the link to a stack overflow question on index hint

Real Time issues: Oracle Performance tuning (types / indexes / plsql / queries)

I am looking for a real time solution...
Below are my DB columns. I am using Oracle10g. Please help me in defining table types / indexes and tuned PLSQL / query (both) for the updates and insertion
Insert and Update queries are simple but here we need to take care of the performance because my system will execute such 200 times per second.
Let me know... should I use procedures or simple queries? It is requested to write tuned plsql and query with proper DB table types / indexes.
I would really like to see the performance of my system after continuous 200 updates per second
DB table (columns) (I can change the structure if required so please let me know...)
Play ID - ID
Type - Song or Message
Count - Summation of total play
Retries - Summation of total play, if failed.
Duration - Total Duration
Last Updated - Late Updated Date Time
Thanks in advance ... let me know in case of any confusion...
You've not really given a lot of detail about WHAT you are updating etc.
As a basis for you to write your update statements, don't use PL/SQL unless you cannot achieve what you want to do in SQL as the context switching alone will hurt your performance before you even get round to processing any records.
If you are able to create indexes specifically for the update then index the columns that will appear in your update statement's WHERE clause so the records can be found quickly before being updated.
As for inserting, look up the benefits of the /*+ append */ hint for inserting records to see if it will benefit your particular case.
Finally, the table structure you will use will depend on may factors that you haven't even begun to touch on with the details you've supplied, I suggest you either do some research on DB structure or ask your DBA's for a 101 class in it.
Best of luck...
EDIT:
In response to:
Play ID - ID ( here id would be song name like abc.wav something..so may be VARCHAR2, yet not decided..whats your openion...is that fine if primary key is of type VARCHAR2....any suggesstions are most welcome...... ) Type - Song or Message ( varchar2) Count - Summation of total play ( Integer) Retries - Summation of total play, if failed. ( Integer) Duration - Total Duration ( Integer) Last Updated - Late Updated Date Time ( DateTime )
There is nothing wrong with having a PRIMARY KEY as a VARCHAR2 data type (though there is often debate about the value of having a non-specific PK, i.e. a sequence). You must, however, ensure your PK is unique, if you can't guarentee this then it would be worth having a sequence as your PK over having to introduce another columnn to maintain uniqueness.
As for declaring your table columns as INTEGER, they eventually will be resolved to NUMBER anyway so I'd just create the table column as a number (unless you have a very specific reason for creating them as INTEGER).
Finally, the DATETIME column, you only need decare it as a DATE datatype unless you need real precision in your time portion, in which case declare it as a TIMESTAMP datatype.
As for helping you with the structure of the table itself (i.e. which columns you want etc.) then that is not something I can help you with as I know nothing of your reporting requirements, application requirements or audit requirements, company best practice, naming conventions etc. I'm afraid that is something for you to decide for yourself.
For performance though, keep indexes to a minumum (i.e. only index columns that will aid your UPDATE WHERE clause search), only update the minimum data possible and, as suggested before, research the APPEND hint for inserts it may help in your case but you will have to test it for yourself.

Oracle - Understanding the no_index hint

I'm trying to understand how no_index actually speeds up a query and haven't been able to find documentation online to explain it.
For example I have this query that ran extremely slow
select *
from <tablename>
where field1_ like '%someGenericString%' and
field1_ <> 'someSpecificString' and
Action_='_someAction_' and
Timestamp_ >= trunc(sysdate - 2)
And one of our DBAs was able to speed it up significantly by doing this
select /*+ NO_INDEX(TAB_000000000019) */ *
from <tablename>
where field1_ like '%someGenericString%' and
field1_ <> 'someSpecificString' and
Action_='_someAction_' and
Timestamp_ >= trunc(sysdate - 2)
And I can't figure out why? I would like to figure out why this works so I can see if I can apply it to another query (this one a join) to speed it up because it's taking even longer to run.
Thanks!
** Update **
Here's what I know about the table in the example.
It's a 'partitioned table'
TAB_000000000019 is the table not a column in it
field1 is indexed
Oracle's optimizer makes judgements on how best to run a query, and to do this it uses a large number of statistics gathered about the tables and indexes. Based on these stats, it decides whether or not to use an index, or to just do a table scan, for example.
Critically, these stats are not automatically up-to-date, because they can be very expensive to gather. In cases where the stats are not up to date, the optimizer can make the "wrong" decision, and perhaps use an index when it would actually be faster to do a table scan.
If this is known by the DBA/developer, they can give hints (which is what NO_INDEX is) to the optimizer, telling it not to use a given index because it's known to slow things down, often due to out-of-date stats.
In your example, TAB_000000000019 will refer to an index or a table (I'm guessing an index, since it looks like an auto-generated name).
It's a bit of a black art, to be honest, but that's the gist of it, as I understand things.
Disclaimer: I'm not a DBA, but I've dabbled in that area.
Per your update: If field1 is the only indexed field, then the original query was likely doing a fast full scan on that index (i.e. reading through every entry in the index and checking against the filter conditions on field1), then using those results to find the rows in the table and filter on the other conditions. The conditions on field1 are such that an index unique scan or range scan (i.e. looking up specific values or ranges of values in the index) would not be possible.
Likely the optimizer chose this path because there are two filter predicates on field1. The optimizer would calculate estimated selectivity for each of these and then multiply them to determine their combined selectivity. But in many cases this will significantly underestimate the number of rows that will match the condition.
The NO_INDEX hint eliminates this option from the optimizer's consideration, so it essentially goes with the plan it thinks is next best -- possibly in this case using partition elimination based on one of the other filter conditions in the query.
Using an index degrades query performance if it results in more disk IO compared to querying the table with an index.
This can be demonstrated with a simple table:
create table tq84_ix_test (
a number(15) primary key,
b varchar2(20),
c number(1)
);
The following block fills 1 Million records into this table. Every 250th record is filled with a rare value in column b while all the others are filled with frequent value:
declare
rows_inserted number := 0;
begin
while rows_inserted < 1000000 loop
if mod(rows_inserted, 250) = 0 then
insert into tq84_ix_test values (
-1 * rows_inserted,
'rare value',
1);
rows_inserted := rows_inserted + 1;
else
begin
insert into tq84_ix_test values (
trunc(dbms_random.value(1, 1e15)),
'frequent value',
trunc(dbms_random.value(0,2))
);
rows_inserted := rows_inserted + 1;
exception when dup_val_on_index then
null;
end;
end if;
end loop;
end;
/
An index is put on the column
create index tq84_index on tq84_ix_test (b);
The same query, but once with index and once without index, differ in performance. Check it out for yourself:
set timing on
select /*+ no_index(tq84_ix_test) */
sum(c)
from
tq84_ix_test
where
b = 'frequent value';
select /*+ index(tq84_ix_test tq84_index) */
sum(c)
from
tq84_ix_test
where
b = 'frequent value';
Why is it? In the case without the index, all database blocks are read, in sequential order. Usually, this is costly and therefore considered bad. In normal situation, with an index, such a "full table scan" can be reduced to reading say 2 to 5 index database blocks plus reading the one database block that contains the record that the index points to. With the example here, it is different altogether: the entire index is read and for (almost) each entry in the index, a database block is read, too. So, not only is the entire table read, but also the index. Note, that this behaviour would differ if c were also in the index because in that case Oracle could choose to get the value of c from the index instead of going the detour to the table.
So, to generalize the issue: if the index does not pick few records then it might be beneficial to not use it.
Something to note about indexes is that they are precomputed values based on the row order and the data in the field. In this specific case you say that field1 is indexed and you are using it in the query as follows:
where field1_ like '%someGenericString%' and
field1_ <> 'someSpecificString'
In the query snippet above the filter is on both a variable piece of data since the percent (%) character cradles the string and then on another specific string. This means that the default Oracle optimization that doesn't use an optimizer hint will try to find the string inside the indexed field first and also find if the data it is a sub-string of the data in the field, then it will check that the data doesn't match another specific string. After the index is checked the other columns are then checked. This is a very slow process if repeated.
The NO_INDEX hint proposed by the DBA removes the optimizer's preference to use an index and will likely allow the optimizer to choose the faster comparisons first and not necessarily force index comparison first and then compare other columns.
The following is slow because it compares the string and its sub-strings:
field1_ like '%someGenericString%'
While the following is faster because it is specific:
field1_ like 'someSpecificString'
So the reason to use the NO_INDEX hint is if you have comparisons on the index that slow things down. If the index field is compared against more specific data then the index comparison is usually faster.
I say usually because when the indexed field contains more redundant data like in the example #Atish mentions above, it will have to go through a long list of comparison negatives before a positive comparison is returned. Hints produce varying results because both the database design and the data in the tables affect how fast a query performs. So in order to apply hints you need to know if the individual comparisons you hint to the optimizer will be faster on your data set. There are no shortcuts in this process. Applying hints should happen after proper SQL queries have been written because hints should be based on the real data.
Check out this hints reference: http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm
To add to what Rene' and Dave have said, this is what I have actually observed in a production situation:
If the condition(s) on the indexed field returns too many matches, Oracle is better off doing a Full Table Scan.
We had a report program querying a very large indexed table - the index was on a region code and the query specified the exact region code, so Oracle CBO uses the index.
Unfortunately, one specific region code accounted for 90% of the tables entries.
As long as the report was run for one of the other (minor) region codes, it completed in less than 30 minutes, but for the major region code it took many hours.
Adding a hint to the SQL to force a full table scan solved the problem.
Hope this helps.
I had read somewhere that using a % in front of query like '%someGenericString%' will lead to Oracle ignoring the INDEX on that field. Maybe that explains why the query is running slow.

Resources