Can Oracle merge bitmap indexes during fast full scan?

I have a large fact table with 300M rows and 50 columns. There are multiple reports over this table, and each report uses only a couple of the 50 columns.
Each column in the fact table is indexed with a BITMAP INDEX. The idea is to use these indexes as one-column versions of the original table, assuming that Oracle can merge BITMAP INDEXes easily.
If I use several columns from the table in the WHERE clause, I can see that Oracle merges these indexes effectively. There is a BITMAP AND operation in the execution plan, as expected.
If I use several columns from the table in the SELECT list, then depending on column selectivity Oracle either performs an unneeded TABLE ACCESS, or performs BITMAP CONVERSION [to rowids] and then a HASH JOIN of these conversions.
Is there any way to eliminate the HASH JOIN when joining several BITMAP INDEXes? Is there any hint in Oracle to force BITMAP MERGE when columns appear in the SELECT list rather than in WHERE?
Intuitively, the HASH JOIN for BITMAP INDEXes seems unneeded in the SELECT list, given that it is indeed unneeded in the WHERE clause. But I couldn't find any evidence that Oracle can avoid it.
Here are some examples:
SELECT a, b, c /* 3 BITMAP CONVERSIONs [to rowids] and then 2 unneeded HASH JOINS */
FROM fact;
SELECT a, b, c, d, e /* TABLE ACCESS [full] instead of reading all the data from indexes */
FROM fact;
SELECT a /* BITMAP INDEX [fast full scan] as expected*/
FROM fact
WHERE b = 1 and c = 2; /* BITMAP AND over two BITMAP INDEX [single value] as expected */
Are there any hints to optimize examples #1 and #2?
In production I use Oracle 11g, but I tried similar queries on Oracle 12c and both versions appear to behave the same.

After some research, it looks like Oracle 12c is incapable of efficiently joining BITMAP INDEXes when they are used in the SELECT clause.
There is no dedicated access path to join BITMAP INDEXes in the SELECT clause, so a HASH JOIN is used in this case.
Oracle cannot use the BITMAP MERGE access path in this case, as it performs an OR operation between two bitmaps:
How Bitmap Merge Works
A merge uses an OR operation between two bitmaps.
The resulting bitmap selects all rows from the first bitmap,
plus all rows from every subsequent bitmap.
Detailed analysis showed that only HASH JOIN was considered by the cost optimizer in my case. I wasn't able to find any evidence that BITMAP INDEXes could be used efficiently in the SELECT list. Oracle documentation suggests using BITMAP INDEXes only in the WHERE clause or when joining a fact table to dimensions.
And either of the following are true:
The indexed column will be restricted in queries (referenced in the
WHERE clause).
or
The indexed column is a foreign key for a dimension table. In this
case, such an index will make star transformation more likely.
In my case it is neither of the two.

I think what you are seeing is essentially the "index join access path" in action :) Oracle needs to join the data from both scans on ROWID to stitch the rows together, and the hash join is the only method open to it. The fact that you are using bitmap indexes is actually irrelevant; you see the same behaviour with b-tree indexes:
---------------------------------------------------------------------------------------------
| Id  | Operation                | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                  |  1973K|    43M|   137K (30)| 00:00:06 |
|   1 |  VIEW                    | index$_join$_001 |  1973K|    43M|   137K (30)| 00:00:06 |
|*  2 |   HASH JOIN              |                  |       |       |            |          |
|*  3 |    INDEX FAST FULL SCAN  | IO               |  1973K|    43M| 17201  (78)| 00:00:01 |
|*  4 |    INDEX FAST FULL SCAN  | IT               |  1973K|    43M| 17201  (78)| 00:00:01 |
---------------------------------------------------------------------------------------------
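To make the difference between the two cases concrete, here is a minimal toy model in Python. It is only a sketch: Python sets stand in for bitmaps, the four rows are made-up data, and `build_index` and `index_join` are hypothetical helpers, not anything Oracle exposes. A WHERE filter only needs to intersect two rowid sets, whereas returning values from two indexes means pairing every entry by rowid, which is a join.

```python
# Toy fact table: rowid -> (a, b, c). The data is made up for illustration.
rows = {
    0: (10, 1, 2),
    1: (11, 1, 3),
    2: (10, 2, 2),
    3: (12, 1, 2),
}

def build_index(col):
    """Single-column 'bitmap index': value -> set of rowids holding that value."""
    index = {}
    for rowid, values in rows.items():
        index.setdefault(values[col], set()).add(rowid)
    return index

idx_a, idx_b, idx_c = build_index(0), build_index(1), build_index(2)

# WHERE b = 1 AND c = 2: one set intersection, the analogue of BITMAP AND.
matching = idx_b[1] & idx_c[2]

def index_join(left, right):
    """SELECT left_col, right_col: pair up the two indexes by rowid (a hash join)."""
    # Build phase: hash table keyed on rowid from the first index.
    rowid_to_left = {rid: value for value, rids in left.items() for rid in rids}
    # Probe phase: look up every rowid of the second index.
    joined = []
    for value, rids in right.items():
        for rid in rids:
            joined.append((rid, rowid_to_left[rid], value))
    return sorted(joined)

print(matching)                   # rowids satisfying both predicates
print(index_join(idx_a, idx_b))   # (rowid, a, b) for every row
```

The intersection never needs to know which value belongs to which row, but the projection does, so some form of join on rowid is unavoidable in this model.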

Related

How can we uniquely identify each record in a Vertica database in the absence of a primary key?

There are some tables in which the primary key, unique key, or composite key is not specified.
How can we uniquely identify each record in Vertica?
Vertica, in this respect, is a real relational database: If there is no declared identifier, there is no identifier.
And it is columnar.
Row based DBMSs have a way to physically identify the location of a certain row. That's where, for example, the ROWID of Oracle comes from.
Vertica being columnar, you cannot even locate the actual position of the first column in the sort order:
If, for example, gender is the first column in the sort order and the column uses Run-Length Encoding (RLE), then for a table of 1000 rows the file that contains the column might hold just the value 'F' with an integer of 502 and the value 'M' with an integer of 498.
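The RLE layout described above can be sketched in a few lines of Python. This is an illustration of run-length encoding in general, not Vertica's actual storage format, and `rle_encode` is a hypothetical helper name.

```python
from itertools import groupby

def rle_encode(sorted_values):
    """Collapse each run of equal values into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(sorted_values)]

# A sorted gender column for a 1000-row table, as in the example above.
column = ["F"] * 502 + ["M"] * 498
encoded = rle_encode(column)
print(encoded)
```

Once the column is stored as two (value, count) pairs, there is no per-row entry left that could serve as a physical row address.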
You could calculate the hash of all columns (with a small risk of hash collisions), but if you have two rows like this:
42 | Arthur Dent | 2022-01-25
42 | Arthur Dent | 2022-01-25
There is no way of discerning one row from the other.
Even if you apply a
ROW_NUMBER() OVER(PARTITION BY <all_columns_of_the_table> )
which would lead one of the above rows to get a 1 and the other to get a 2, there would be no way of determining which number was assigned to which row.
These are the two, not completely satisfactory, ways of working around this behaviour, which is not a problem but a behavioural feature.
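The limitation of the hashing workaround is easy to demonstrate outside the database. The sketch below uses Python's `hashlib` purely as a stand-in for whatever hash function you would apply in SQL; `row_hash` and the third sample row are assumptions made for the illustration.

```python
import hashlib

def row_hash(row):
    """Hash all column values of a row; identical rows give identical hashes."""
    joined = "\x1f".join(str(value) for value in row)  # unit separator between columns
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

rows = [
    (42, "Arthur Dent", "2022-01-25"),
    (42, "Arthur Dent", "2022-01-25"),   # exact duplicate of the first row
    (43, "Ford Prefect", "2022-01-25"),  # made-up extra row for contrast
]
hashes = [row_hash(row) for row in rows]

print(hashes[0] == hashes[1])  # the duplicates cannot be told apart
print(hashes[0] == hashes[2])  # distinct rows can
```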

Trying to figure out max length of Rowid in Oracle

As per my design I want to fetch rowid as in
select rowid r from table_name;
into a C variable. I was wondering what is the max size / length in characters of the rowid.
Currently, in one of the biggest tables in my DB, the max length is 18, and it's 18 throughout the table for rowid.
Thanks in advance.
Edit:
Currently the block of code below is iterated and used for multiple tables, so in order to keep the code flexible without having to define every table's PK in the query, we use ROWID.
select rowid from table_name ... where ....;
delete from table_name where rowid = selectedrowid;
I think as the rowid is picked and used then and there without storing it for future, it is safe to use in this particular scenario.
Please refer to below answer:
Is it safe to use ROWID to locate a Row/Record in Oracle?
I'd say no. This could be safe if for instance the application stores ROWID temporarily(say generating a list of select-able items, each identified with ROWID, but the list is routinely regenerated and not stored). But if ROWID is used in any persistent way it's not safe.
A physical ROWID has a fixed size in a given Oracle version; it does not depend on the number of rows in a table. It consists of the number of the datafile, the number of the block within this file, and the number of the row within this block. Therefore it is unique in the whole database and allows direct access to the block and row without any further lookup.
As things in the IT world continue to grow, it is safe to assume that the format will change in the future.
Besides volume there are also structural changes, like the advent of transportable tablespaces, which made it necessary to store the object number (= internal number of the table/partition/subpartition) inside the ROWID.
Or the advent of index-organized tables (mentioned by @ibre5041), which look like a table but are in reality just an index without such a physical address (because things are moving constantly in an index). This made it necessary to introduce UROWIDs, which can store physical and index-based ROWIDs.
Please be aware that a ROWID can change, for instance if the row moves from one table partition to another, or if the table is defragmented to fill the holes left by many DELETEs.
According to the documentation, a ROWID has a length of 10 bytes:
Rowids of Row Pieces
A rowid is effectively a 10-byte physical address of a row.
Every row in a heap-organized table has a rowid unique to this table
that corresponds to the physical address of a row piece. For table
clusters, rows in different tables that are in the same data block can
have the same rowid.
Oracle also documents the (current) format; see Rowid Format.
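As a rough illustration of why the displayed value is 18 characters long, the sketch below decodes an extended rowid string in Python. It assumes the documented extended rowid layout (6 characters for the data object number, 3 for the relative file number, 6 for the block number, 3 for the row number, in a base-64 alphabet of A-Z, a-z, 0-9, + and /). The function name and the sample value are illustrative, and a UROWID of an index-organized table does not follow this layout and may be longer.

```python
import string

# Base-64 alphabet used for displaying extended rowids (assumed as documented).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def decode_extended_rowid(rowid):
    """Split an 18-character extended rowid into its four numeric components."""
    if len(rowid) != 18:
        raise ValueError("expected an 18-character extended rowid")

    def to_int(chars):
        value = 0
        for ch in chars:
            value = value * 64 + ALPHABET.index(ch)
        return value

    return {
        "data_object_number": to_int(rowid[0:6]),
        "relative_file_number": to_int(rowid[6:9]),
        "block_number": to_int(rowid[9:15]),
        "row_number": to_int(rowid[15:18]),
    }

print(decode_extended_rowid("AAAPecAAFAAAABSAAA"))
```

For the C variable in the question, 18 characters plus a terminating NUL would cover this format only; that sizing is an assumption that breaks as soon as the format changes or a UROWID is returned.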
In general you could use the ROWID in your application, provided the affected rows are locked!
Thus your statement may look like this:
CURSOR ... IS
select rowid from table_name ... where .... FOR UPDATE;
delete from table_name where rowid = selectedrowid;
See SELECT FOR UPDATE and FOR UPDATE Cursors.
Oracle even provides a shortcut. Instead of where rowid = selectedrowid you can use WHERE CURRENT OF ...

When does a VIEW calculate or change its value?

I created a view like this:
CREATE VIEW a AS SELECT b.kcu_id, sum(b.price) FROM b GROUP BY b.kcu_id
I created the view because my table contains too many rows, 10000 or more, and it is too costly to sum that many rows every time the get operation is called.
I use Spring Data JPA to get the data from the view. What I want to ask is: when I use the getPrice method to get the sum of prices, is the sum calculated when I call the get method, or does the database calculate it when the underlying data in table b changes?
For your info, the price column rarely changes in my case.
If it's just a "regular" view like you have in your example, the data will be calculated anew every time you query it. A view, after all, is just a slightly modified view of the data in the table at any given point.
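A quick way to convince yourself that a regular view is re-evaluated on every query is the small experiment below. It is a sketch that uses Python's built-in `sqlite3` module as a stand-in database, so the table contents and the `total` alias are assumptions, but an ordinary view behaves the same way in this respect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE b (kcu_id INTEGER, price INTEGER)")
conn.executemany("INSERT INTO b VALUES (?, ?)", [(1, 10), (1, 20), (2, 5)])

# A regular (non-materialised) view: only the query text is stored.
conn.execute(
    "CREATE VIEW a AS SELECT kcu_id, SUM(price) AS total FROM b GROUP BY kcu_id"
)

before = conn.execute("SELECT kcu_id, total FROM a ORDER BY kcu_id").fetchall()

# Change a price in the base table, then query the view again.
conn.execute("UPDATE b SET price = 25 WHERE kcu_id = 1 AND price = 20")
after = conn.execute("SELECT kcu_id, total FROM a ORDER BY kcu_id").fetchall()

print(before)  # sums computed at the time of the first query
print(after)   # sums recomputed, reflecting the update
```

The view never stores the sums, so each call pays for the aggregation again and always sees the current data.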
You can have what they call "materialised views", which are more like a physical table that's updated from the underlying table periodically, but you generally have to create them differently than with a normal "create view" command.
With PostgreSQL, the commands you're looking for are:
create materialized view
refresh materialized view
The former creates a materialised view in pretty much the same way as your create view and also populates the view with data (unless you've used the with no data clause). It also remembers the underlying query used to create the view (like any view does) so that you can update the data at some later point (which is what the refresh command above does).
By way of example, the following PostgreSQL code:
create table below (val integer);
insert into below values (42);
create materialized view above as select * from below;
insert into below values (99);
select * from below;
select * from above;
refresh materialized view above;
select * from above;
will materialise the view when the table contains only the 42 and later refresh it to include the 99 as well:
Underlying table 'below' with both data items:
| val |
|-----|
| 42 |
| 99 |
Materialised view 'above', created before the insert of 99:
| val |
|-----|
| 42 |
Materialised view 'above' after refreshing the view:
| val |
|-----|
| 42 |
| 99 |
Provided you're willing to live with the possibility that the data may be a little out of date, that's probably the best way to do it. Given your comment that the "price column is rarely change[d]", that may not be an issue.
However, I'm actually quite surprised that 10,000 rows is causing you a problem; it's not really that big a table. Hence you may want to look at other possible fixes, such as ensuring you have an index on the kcu_id column.

Oracle 11g XE slow query execution

I face a problem regarding a select query I have created. The query is the following:
SELECT --184.791
C.MSISDN
FROM
CONTACTS_HISTORY C
INNER JOIN WAVECONTACTS_HISTORY WC
ON C.CONTACTSID = WC.CONTACTSID
WHERE
C.CAMPAIGNSID = 472;
The C.CAMPAIGNSID, C.CONTACTSID and WC.CONTACTSID columns are indexed, and WC.CONTACTSID is a foreign key to C.CONTACTSID.
The CONTACTS_HISTORY table has 3.000.000 records and the WAVECONTACTS_HISTORY table 2.000.000 records.
When I include the join in the query the execution is too slow.
The execution plan from SQL Developer has a total cost of 3.
I cannot understand why the execution is so slow. Is this because of the limitations of the XE edition?
The Oracle DB is installed on my laptop, an Intel Core i3 with 8 GB RAM (but I am aware that this edition is limited to 1 CPU and 1 GB RAM).
OPERATION OBJECT_NAME OPTIONS COST
SELECT STATEMENT 3
NESTED LOOPS
NESTED LOOPS 3
TABLE ACCESS WAVECONTACTS_HISTORY FULL 2
INDEX IX_CONTACTS_HISTORY_CMPSID RANGE SCAN 1
Access Predicates
TABLE ACCESS CONTACTS_HISTORY BY INDEX ROWID 1
Filter Predicates
WC.CONTACTSID=C.CONTACTSID
All of the costs are out of whack, especially this one:
TABLE ACCESS WAVECONTACTS_HISTORY FULL 2
That's a FULL TABLE SCAN of a two-million-row table.
Most likely your statistics are stale. Gather fresh stats on the tables and indexes, and you should see a much smarter and more efficient execution plan.
That may not be the whole solution. Tuning is dependent on a number of factors, such as skew and distribution. For instance, if you have relatively few Campaigns in your Contact History and they're spread across the table the index on C.CAMPAIGNSID won't work any magic for you. If this is a query you're going to run a lot you should consider a compound index on (CAMPAIGNSID, CONTACTSID), in that order.
Alternatively, as you don't actually use any columns from WAVECONTACTS_HISTORY you could replace the join with an IN or EXISTS sub-query.
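The EXISTS rewrite and the compound index can be tried out on a tiny data set. The sketch below runs on Python's built-in `sqlite3` rather than Oracle, with made-up rows and hypothetical index names, so it says nothing about Oracle's plans or costs. It does show one thing worth checking before swapping the queries: if a contact appears more than once in WAVECONTACTS_HISTORY, the join returns it once per match while EXISTS returns it once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts_history (
    contactsid  INTEGER PRIMARY KEY,
    campaignsid INTEGER,
    msisdn      TEXT
);
CREATE TABLE wavecontacts_history (
    id         INTEGER PRIMARY KEY,
    contactsid INTEGER REFERENCES contacts_history (contactsid)
);
-- Compound index in the suggested column order (index names are hypothetical).
CREATE INDEX ix_ch_cmp_contact ON contacts_history (campaignsid, contactsid);
CREATE INDEX ix_wch_contact ON wavecontacts_history (contactsid);

INSERT INTO contacts_history VALUES
    (1, 472, '3069000001'), (2, 472, '3069000002'), (3, 100, '3069000003');
INSERT INTO wavecontacts_history (contactsid) VALUES (1), (1), (3);
""")

join_rows = conn.execute("""
    SELECT c.msisdn
    FROM contacts_history c
    INNER JOIN wavecontacts_history wc ON c.contactsid = wc.contactsid
    WHERE c.campaignsid = 472
""").fetchall()

exists_rows = conn.execute("""
    SELECT c.msisdn
    FROM contacts_history c
    WHERE c.campaignsid = 472
      AND EXISTS (SELECT 1 FROM wavecontacts_history wc
                  WHERE wc.contactsid = c.contactsid)
""").fetchall()

print(join_rows)    # contact 1 appears once per matching wave row
print(exists_rows)  # contact 1 appears once
```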

Oracle 10g - optimize WHERE IS NOT NULL

We have Oracle 10g and we need to query 1 table (no joins) and filter out rows where 1 of the columns is null. When we do this - WHERE OurColumn IS NOT NULL - we get a full table scan on a very large table - BAD BAD BAD. The column has an index on it but it gets ignored in this instance. Are there any solutions to this?
Thanks
The optimizer thinks that the full table scan will be better.
If there are just a few NULL rows, the optimizer is right.
If you are absolutely sure that the index access will be faster (that is, more than 75% of the rows have col1 IS NULL), then hint your query:
SELECT /*+ INDEX (t index_name_on_col1) */
*
FROM mytable t
WHERE col1 IS NOT NULL
Why 75%?
Because using an INDEX SCAN to retrieve values not covered by the index implies a hidden join on ROWID, which costs about 4 times as much as a table scan.
If the index range includes more than 25% of rows, the table scan is usually faster.
As mentioned by Tony Andrews, the clustering factor is a more accurate way to measure this, but 25% is still a good rule of thumb.
The optimiser will make its decision based on the relative cost of the full table scan and using the index. This mainly comes down to how many blocks will have to be read to satisfy the query. The 25%/75% rule of thumb mentioned in another answer is simplistic: in some cases a full table scan will make sense even to get 1% of the rows - i.e. if those rows happen to be spread around many blocks.
For example, consider this table:
SQL> create table t1 as select object_id, object_name from all_objects;
Table created.
SQL> alter table t1 modify object_id null;
Table altered.
SQL> update t1 set object_id = null
2 where mod(object_id,100) != 0
3 /
84558 rows updated.
SQL> analyze table t1 compute statistics;
Table analyzed.
SQL> select count(*) from t1 where object_id is not null;
COUNT(*)
----------
861
As you can see, only approximately 1% of the rows in T1 have a non-null object_id. But due to the way I built the table, these 861 rows will be spread more or less evenly around the table. Therefore, the query:
select * from t1 where object_id is not null;
is likely to visit almost every block in T1 to get data, even if the optimiser used the index. It makes sense then to dispense with the index and go for a full table scan!
A key statistic to help identify this situation is the index clustering factor:
SQL> select clustering_factor from user_indexes where index_name='T1_IDX';
CLUSTERING_FACTOR
-----------------
460
This value 460 is quite high (compared to the 861 rows in the index), and suggests that a full table scan will be used. See this DBAZine article on clustering factors.
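For intuition about what the clustering factor measures, here is a small Python sketch. It assumes the usual description of the statistic (walk the index in key order and count how often the table block changes from one entry to the next); the block layout and numbers are made up, and `clustering_factor` is a hypothetical helper, not an Oracle function.

```python
def clustering_factor(index_entries):
    """Count table-block changes while walking (key, block_no) pairs in key order."""
    factor = 0
    previous_block = None
    for _key, block in index_entries:
        if block != previous_block:
            factor += 1
            previous_block = block
    return factor

ROWS_PER_BLOCK = 100  # assumed block capacity for the illustration

# Well clustered: consecutive keys sit in the same block (1000 rows, 10 blocks).
clustered = [(key, key // ROWS_PER_BLOCK) for key in range(1000)]

# Scattered: consecutive keys always land in a different block.
scattered = [(key, key % 10) for key in range(1000)]

print(clustering_factor(clustered))  # close to the number of blocks
print(clustering_factor(scattered))  # close to the number of rows
```

A value near the number of table blocks means an index range scan touches each block about once; a value near the number of rows means almost every index entry sends the scan to a different block.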
If you are doing a select *, then it would make sense to do a table scan rather than using the index. If you know which columns you are interested in, you could create a covered index with those columns plus the one you are applying the IS NOT NULL condition to.
It can depend on the type of index you have on the table.
Most B-tree indexes do not store null entries. Bitmap indexes do store null entries.
So, if you have:
select * from mytable
where mycolumn is null
and you have a standard B-tree index on mycolumn, then the query can't use the index as the "null" isn't in the index.
(If the index is against multiple columns, and one of the indexed columns is not null then there will be an entry in the index.)
Create an index on that column.
To make sure the index is used, it should cover that column and the other columns in the WHERE clause.
ocdecio answered:
If you are doing a select *, then it would make sense to do a table scan rather than using the index.
That's not strictly true; an index will be used if there is an index that fits your where clause, and the query optimizer decides using that index would be faster than doing a table scan. If there is no index, or no suitable index, only then must a table scan be done.
It's also worth checking whether Oracle's statistics on the table are up to date. It may not know that a full table scan will be slower.
Oracle Database doesn't index null values at all in regular (b-tree) indexes, so it can't use the index, nor can you force Oracle to use it.
BR
Using hints should be done only as a work around rather than a solution.
As mentioned in other answers, the null value is not available in B-TREE indexes.
Since you know that you have mostly null values in this column, would you be able to replace the null check with a range, for instance?
That really depends on your column and the nature of your data but typically, if your column is a date type for instance:
where mydatecolumn is not null
Can be translated in a rule saying: I want all rows which have a date.
Then you can most definitely do this:
where mydatecolumn <=sysdate (in oracle)
This will return all rows with a date and omit null values, while taking advantage of the index on that column without using any hints.
See http://www.oracloid.com/2006/05/using-index-for-is-null/
If your index is on one single field, it will NOT be used. Try to add a dummy field or a constant in the index:
create index tind on t(field_to_index, 1);
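Why the constant helps can be shown with a toy model of B-tree index entries. The sketch assumes the rule mentioned in the answers above, namely that an entry is left out only when every indexed column is NULL; the sample rows and the `index_entries` helper are made up for the illustration.

```python
# Toy table: two rows have a value, two rows are NULL (None).
rows = [
    {"field_to_index": 7},
    {"field_to_index": None},
    {"field_to_index": 3},
    {"field_to_index": None},
]

def index_entries(table_rows, key_parts):
    """Build (key, rowid) entries, skipping rows whose whole key is NULL."""
    entries = []
    for rowid, row in enumerate(table_rows):
        key = tuple(part(row) for part in key_parts)
        if all(value is None for value in key):
            continue  # an entirely NULL key gets no index entry
        entries.append((key, rowid))
    return entries

# Index on (field_to_index): NULL rows are missing from the index.
single = index_entries(rows, [lambda r: r["field_to_index"]])

# Index on (field_to_index, 1): the constant keeps every key non-NULL.
with_constant = index_entries(rows, [lambda r: r["field_to_index"], lambda r: 1])

print(len(single))         # only the non-NULL rows
print(len(with_constant))  # every row, including the NULL ones
```

With the constant in the key, the index contains an entry for every row, so it can in principle answer IS NULL as well as IS NOT NULL conditions on that column.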

Resources