What to do when a SQL index does not solve the performance problem? - oracle

This query ONE
SELECT * FROM TEST_RANDOM WHERE EMPNO >= '236400' AND EMPNO <= '456000';
runs in the Oracle database with a cost of 1927.
And this query TWO:
SELECT * FROM TEST_RANDOM WHERE EMPNO = '236400';
runs with a cost of 1924.
The table TEST_RANDOM has 1,000,000 rows; I created it like this:
Create table test_normal (empno varchar2(10), ename varchar2(30), sal number(10), faixa varchar2(10));
Begin
For i in 1..1000000
Loop
Insert into test_normal values(
to_char(i), dbms_random.string('U',30),
dbms_random.value(1000,7000), 'ND'
);
If mod(i, 10000) = 0 then
Commit;
End if;
End loop;
End;
Create table test_random
as
select /*+ append */ * from test_normal order by dbms_random.random;
I created a B-tree index on the column EMPNO like this:
CREATE INDEX IDX_RANDOM_1 ON TEST_RANDOM (EMPNO);
After this, the query TWO improved, and the cost changed to 4.
But query ONE did not improve, because Oracle ignored the index; for some reason it decided that an execution plan using the index was not worth it for this query...
My question is: what can we do to improve the performance of query ONE? The index did not solve the problem, and its cost remains high...

For this query, Oracle does not use an index because the optimizer correctly estimated the number of rows and correctly decided that a full table scan would be faster or more efficient.
B-Tree indexes are generally only useful when they can be used to return a small percentage of rows, and your first query returns about 25% of the rows. It's hard to say what the ideal percentage of rows is, but 25% is almost always too large. On my system, the execution plan changes from full table scan to index range scan when the query returns 1723 rows - but that number will likely be different for you.
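If you want to see where that crossover sits on your own system, one rough way (a sketch, using the table and index names from the question) is to compare the plan the optimizer chooses on its own with the plan it produces when an INDEX hint forces it onto IDX_RANDOM_1, and look at the estimated rows and cost of each:
EXPLAIN PLAN FOR
SELECT * FROM TEST_RANDOM WHERE EMPNO >= '236400' AND EMPNO <= '456000';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- Same query, forced onto the index purely for comparison
EXPLAIN PLAN FOR
SELECT /*+ INDEX(TEST_RANDOM IDX_RANDOM_1) */ *
FROM TEST_RANDOM WHERE EMPNO >= '236400' AND EMPNO <= '456000';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
If the hinted plan comes out with a higher cost, the optimizer's choice of a full table scan is at least internally consistent.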
There are several reasons why full table scans are better than indexes for retrieving a large percentage of rows:
Single-block versus multi-block: In Oracle, like in almost all computer systems, it can be significantly faster to retrieve multiple chunks of data at a time (sequential access) instead of retrieving one random chunk of data at a time (random access).
Clustering factor: Oracle stores all rows in blocks, which are usually 8KB large and are analogous to pages. If the index is very inefficient, for example if it is built on randomly sorted data and two sequential index reads rarely hit the same table block, then reading 25% of all the rows through the index may still require reading 100% of the table blocks (a query to check this follows these points).
Algorithmic complexity: A full table scan reads the data as a simple heap, which is O(N). A single index access is much faster, at O(LOG(N)). But as the number of index accesses increases, the benefit wears off, until eventually using the index is O(N * LOG(N)).
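For the clustering factor point above, a quick sanity check (a sketch; the names are the ones from the question, and it assumes statistics have been gathered) is to compare CLUSTERING_FACTOR with the table's block and row counts:
-- If CLUSTERING_FACTOR is close to NUM_ROWS rather than to BLOCKS, almost every
-- row fetched through the index lands in a different table block.
SELECT i.index_name, i.clustering_factor, t.blocks, t.num_rows
FROM   user_indexes i
JOIN   user_tables  t ON t.table_name = i.table_name
WHERE  i.index_name = 'IDX_RANDOM_1';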
Some things you can do to improve performance without indexes:
Partitioning: Partitioning is the ideal solution for retrieving a large percentage of data from a table (but the option must be licensed). With partitioning, Oracle splits the logical table into multiple physical segments, and the query reads only from the required partitions. This keeps the benefit of multi-block reads while still limiting the amount of data scanned (a sketch of this, and of re-ordering the data, follows this list).
Parallelism: Make Oracle work harder instead of smarter. But parallelism probably isn't worth the trouble for such a small table.
Materialized views: Create tables that only store exactly what you need.
Ordering the data: Improve the index clustering factor by sorting the table data by the relevant column instead of doing it randomly. In your case, replace order by dbms_random.random with order by empno. Depending on your version and platform, you may be able to use a materialized zone map to keep the table sorted.
Compression: Shrink the table to make it faster to read the whole thing.
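As a rough illustration of the partitioning and ordering suggestions above (a sketch only; the partition boundaries are arbitrary, and note that EMPNO is a VARCHAR2, so both the original query and these ranges compare values as strings):
-- Ordering the data: rebuild the table sorted by EMPNO instead of randomly
CREATE TABLE test_ordered AS
SELECT * FROM test_random ORDER BY empno;
-- Partitioning (requires the Partitioning option): range-partition on EMPNO
CREATE TABLE test_part
PARTITION BY RANGE (empno) (
  PARTITION p1 VALUES LESS THAN ('3'),
  PARTITION p2 VALUES LESS THAN ('6'),
  PARTITION p3 VALUES LESS THAN (MAXVALUE)
)
AS SELECT * FROM test_random;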
That's quite a lot of information for what is possibly a minor performance problem. Before you go down this rabbit hole, it might be worth asking whether you actually have a significant performance problem as measured by wall-clock time or resource consumption, or whether you are just fishing for performance problems by looking at the somewhat meaningless cost metric.

Related

ClickHouse: Should I optimize a MergeTree table manually?

I have a table like:
create table test (id String, timestamp DateTime, somestring String) ENGINE = MergeTree ORDER BY (id, timestamp)
I inserted 100 records, then inserted another 100 records, and ran a select query:
select * from test
ClickHouse returned 2 parts, each containing 100 rows, ordered within themselves. Then I ran optimize table test and it started returning 1 part of 200 ordered rows. So should I run optimize after every insert, and does it improve select performance, for example for select count(*) from test where id = 'foo'?
Merges are eventual and may never happen. It depends on the number of inserts that happen afterwards, the number of parts in the partition, and the size of the parts. If the total size of the input parts is greater than the maximum part size, they will never be merged.
It is very unreasonable to constantly merge down to one part.
The merger does not have that goal. On the contrary, the goal is to have the minimum number of parts within the smallest number of merges, because merges consume a huge amount of disk and processor resources.
It makes no sense to spend 3 hours merging two 300GB parts into one 600GB part: the merger has to read and decompress 600GB, merge it, compress it, and write it back, and after all that the performance of selects will not grow at all, or will grow only minimally.
Usually not; you can rely on ClickHouse's background merges.
Also, ClickHouse has no intention of merging all the data in a partition into one part file, because such "over-optimization" can affect performance too.
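If you just want to see how the background merges are progressing rather than forcing them, one option (a sketch, assuming the table from the example above lives in the current database) is to look at the active parts:
-- Each active row here is one part that a SELECT will have to read
SELECT partition, name, rows
FROM system.parts
WHERE table = 'test' AND active
ORDER BY partition, name;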

Oracle performance as database size increases

I have a high-level question. Say I have a SQL query that takes 30 ms to complete, running against an indexed column on a table with 1 million records. If the table grows to 5 million records, should I expect the query to take 5 times as long (since 5 times as many index entries have to be searched), i.e. 150 ms? I apologise if this is too simplistic. I have a program that runs 10 indexed queries against a table that is going to grow by this factor; the queries currently take 300 ms and I am concerned this would increase to 1.5 s. Any help would be appreciated!
You can think of an index lookup as a search through a binary tree followed by a fetch of the page with the appropriate data. Typically, the index fits in memory and the search through it is quite fast. Multiplying the data size by 10 would increase the depth of the tree by 3 or 4; with in-memory comparison operations this would not be noticeable for most queries. (There are other types of indexes besides B-trees, but this is a convenient model for thinking about performance.)
The data fetch then could incur the overhead of reading a page from disk. That should still be quite fast.
So, the easy answer to your question is: no. However, this assumes that the query is something like:
select t.*
from table t
where t.indexcol = CONSTANTVALUE
And, it assumes that the query only returns one row. Things that might affect the performance as the table size increases include:
The size of the returned data set increases with the size of the table. Returning more values necessarily takes longer. For some queries, the performance is more dependent on the mechanism for returning values than calculating/fetching the data.
The query contains a join or a group by.
The statistics of the table are not up-to-date, so the optimizer accidentally chooses a full table scan rather than an index lookup.
You are in a memory-constrained environment where the index doesn't fit in memory, or where the entire table fits in memory when it is small but starts incurring cache misses as it grows.
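If you want to check the tree-depth argument above on your own index before and after the growth, a sketch (the index name is a placeholder, and the figures come from the optimizer statistics):
-- BLEVEL is the number of branch levels between the root block and the leaf blocks;
-- growing from 1 million to 5 million rows will typically add at most one level.
SELECT index_name, blevel, leaf_blocks, num_rows
FROM   user_indexes
WHERE  index_name = 'YOUR_INDEX_NAME';  -- placeholder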

Performance tuning - Query on partitioned vs non-partitioned table

I have two queries; one involves a partitioned table, while the other is identical except that it uses the non-partitioned equivalent of that table. The original (non-partitioned) query performs better than its partitioned counterpart, and I am not sure how to isolate the problem. Looking at the execution plans, I find that the indexes used are the same between the two queries and that the new query shows a PARTITION RANGE step in its plan, which I took to mean that partition pruning is taking place. The query is of the following form:
Select rownum, <some columns>
from partTabA
inner join tabB on condition1
inner join tabC on condition2
where partTabA.column1=<value> and <other conditions>
and partTabA.column2 in (select columns from tabD where conditions)
where partTabA is the partitioned table and partTabA.column1 is the partitioning key (range partition). In the original query, this table is replaced by its non-partitioned equivalent. What should I look at to find out why the new query performs badly? The only tool I have is Oracle SQL Developer.
PARTITION RANGE ITERATOR does not necessarily mean that partition pruning is happening.
You'll also want to look at the Pstart and Pstop columns in the explain plan to see which partitions are actually being read (a sketch of how to display them follows this answer).
There are several potential reasons the partitioned query will be slower, even though it's reading the same data. (Assuming that the partitioned query isn't properly pruning, and is reading from the whole table.)
Reading from multiple local indexes may be much less efficient than reading from a single, larger index.
There may be a lot of wasted space from large initial segment sizes, a large number of partitions, etc. Compare the segment sizes with this: select * from dba_segments where segment_name in ('PARTTABA', 'TABA'); If that's the issue, you may want to look into your tablespace settings or use deferred segment creation.
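A sketch of how to see the Pstart/Pstop columns mentioned above (partTabA and the predicate are stand-ins from the question):
EXPLAIN PLAN FOR
SELECT *
FROM   partTabA
WHERE  partTabA.column1 = 'some value';  -- placeholder predicate
-- The default output includes Pstart/Pstop for partitioned objects: a literal range
-- means pruning, KEY means the partitions are resolved at run time, and a range of
-- 1 to the total number of partitions means the whole table is being read.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);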
I believe you're dealing with partitioning overhead: with a partitioned table, Oracle first has to work out which partitions to scan.
Could you paste both execution plans here? How large are the tables? How selective are the indexes used here?
Did you try to gather statistics?
You may also want to look at a trace file to see what's going on (a sketch of both suggestions follows below).
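A rough sketch of those last two suggestions (the table name is a placeholder; tkprof is then run on the resulting trace file):
-- Refresh optimizer statistics on the partitioned table and its indexes
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'PARTTABA',   -- placeholder
    cascade => TRUE);
END;
/
-- Trace the current session, run the slow query, then stop tracing
BEGIN
  DBMS_MONITOR.SESSION_TRACE_ENABLE(waits => TRUE, binds => TRUE);
END;
/
-- ... run the slow query here ...
BEGIN
  DBMS_MONITOR.SESSION_TRACE_DISABLE;
END;
/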

Oracle : table with unused columns impact performance?

I have a table in my Oracle DB with 100 columns. 50 of the columns in this table are not used by the program accessing it (i.e. the select queries select only the relevant columns and do NOT use '*').
My question is this :
If I recreate the same table with only the columns I need, will it improve query performance for the same queries I used with the original table (remember that only the relevant columns are selected)?
It is well worth mentioning that the program runs these queries a fair number of times per second!
P.S. :
This is an existing project I am working on, and the table design was made a long time ago for other products as well (that's why we have unused columns now).
The effect of this will be that the average row will be smaller, assuming the extra columns hold data that will no longer be in the table. The table can therefore be smaller, and not only will it use less space on disk, it will use less memory in the SGA and caching will be more efficient (a quick way to check the current row and segment sizes is sketched at the end of this answer).
Therefore, if you access the table via a full table scan then it will be faster to read the segment, but if you use index-based access mechanisms then the only performance improvement is likely to be through an improved chance of fetching the block from cache.
[Edited]
This SO thread suggests that "it always pulls a tuple...". Hence, you are likely to see some performance improvement, though whether it is major or minor is hard to say, as already mentioned.
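If you want to put a number on it before recreating the table, a sketch (the table name is a placeholder, and the statistics columns require gathered statistics):
-- Average row length and block count according to the optimizer statistics
SELECT table_name, num_rows, avg_row_len, blocks
FROM   user_tables
WHERE  table_name = 'YOUR_TABLE';  -- placeholder
-- Space actually allocated to the table segment
SELECT segment_name, ROUND(bytes / 1024 / 1024) AS mb
FROM   user_segments
WHERE  segment_name = 'YOUR_TABLE';  -- placeholder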

LOW_VALUE and HIGH_VALUE in USER_TAB_COLUMNS

I have a question regarding the columns LOW_VALUE and HIGH_VALUE in the view USER_TAB_COLUMNS (or equivalent).
I was just wondering whether these values are always correct. For example, if you have a column with 500k rows with the value 1, 500k rows with the value 5, and 1 row with the value 1000, then LOW_VALUE should be 1 (after you convert the raw figure) and HIGH_VALUE should be 1000 (after you convert the raw figure). However, are there any circumstances where Oracle would 'miss' this outlier value and instead report 5 as the HIGH_VALUE?
Also, what is the purpose of these 2 values?
Thanks
As with all optimizer-related statistics, these values are estimates with varying degrees of accuracy from whenever statistics were gathered on the table. As such, it is entirely expected that they would be close but not completely accurate and entirely possible that they would be wildly incorrect.
When you gather statistics, you specify a percentage of the rows (or blocks) that should be sampled. It is possible to specify a 100% sample size, in which case Oracle would examine every row, but it is relatively rare to ask for a sample size nearly that large. It is much more efficient to ask for a much smaller sample size (either explicitly or by letting Oracle automatically determine the sample size). If your sample of rows happens not to include the one row with a value of 1000, the HIGH_VALUE would not be 1000; it would be 5, assuming that is the largest value the sample saw.
Statistics are also a snapshot in time. By default, 11g will gather statistics every night on objects that have undergone enough change since the last statistics gathering to warrant refreshing them, though you can disable that job or change its parameters. So if you gather statistics today with a 100% sample size in order to get a HIGH_VALUE of 1000, and then insert one row with a value of 3000 and never modify the table again, it's likely that Oracle would never gather statistics on that table again (unless you explicitly requested it to) and that the HIGH_VALUE would remain 1000 forever.
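A sketch of how to see when and how thoroughly the column statistics were last gathered, and how to force an exact pass if you need one (the table name is a placeholder):
SELECT column_name, sample_size, num_distinct, last_analyzed
FROM   user_tab_col_statistics
WHERE  table_name = 'YOUR_TABLE';  -- placeholder
-- Gather with a 100% sample so LOW_VALUE/HIGH_VALUE reflect every row right now
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => USER,
    tabname          => 'YOUR_TABLE',   -- placeholder
    estimate_percent => 100);
END;
/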
Assuming that there is no histogram on the column (which is another whole discussion), Oracle uses the LOW_VALUE and HIGH_VALUE to estimate how selective a particular predicate would be. If the LOW_VALUE is 1, the HIGH_VALUE is 1000, there are 1,000,000 rows in the table, there is no histogram on the column, and you run a query like
SELECT *
FROM some_table
WHERE column_name BETWEEN 100 and 101
Oracle will guess that the data is uniformly distributed between 1 and 1000 so that this query would return 1,000 rows (multiplying the number of rows in the table (1 million) by the fraction of the range the query covers (1/1000)). This selectivity estimate, in turn, would drive the optimizer's determination of whether it would be more efficient to use an index or to do a table scan, what join methods to use, what order to evaluate the various predicates, etc. If you have a non-uniform distribution of data, however, you'll likely end up with a histogram on the column which gives Oracle more detailed information about the distribution of data in the column than the LOW_VALUE and HIGH_VALUE provide.
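If you want to see the decoded LOW_VALUE and HIGH_VALUE for a NUMBER column, a sketch (table and column names are placeholders; other datatypes use the corresponding DBMS_STATS.CONVERT_RAW_VALUE overloads):
SET SERVEROUTPUT ON
DECLARE
  v_low  NUMBER;
  v_high NUMBER;
BEGIN
  FOR r IN (SELECT low_value, high_value
            FROM   user_tab_columns
            WHERE  table_name  = 'SOME_TABLE'     -- placeholder
            AND    column_name = 'SOME_COLUMN')   -- placeholder
  LOOP
    -- LOW_VALUE and HIGH_VALUE are stored as RAW; DBMS_STATS converts them back
    DBMS_STATS.CONVERT_RAW_VALUE(r.low_value,  v_low);
    DBMS_STATS.CONVERT_RAW_VALUE(r.high_value, v_high);
    DBMS_OUTPUT.PUT_LINE('LOW_VALUE = ' || v_low || ', HIGH_VALUE = ' || v_high);
  END LOOP;
END;
/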
