Oracle JDBC Thin Client - Unique index with VARCHAR2 not used

First some basics.
Java 6
OJDBC6
Oracle 10.2.0.4 (also the same result in 11g version)
I am seeing a SQL statement behave differently when executed from Java with the OJDBC6 thin client than when run from the tool SQL Gate, which probably uses a native/OCI driver. For some reason the optimizer chooses a hash join for the statement executed from Java, but not for the other.
Here is the table:
CREATE TABLE DPOWNERA.XXX_CHIP (
xxxCH_ID NUMBER(22) NOT NULL,
xxxCHP_ID NUMBER(22) NOT NULL,
xxxSP_ID NUMBER(22) NULL,
xxxCU_ID NUMBER(22) NULL,
xxxFT_ID NUMBER(22) NULL,
UEMTE_ID NUMBER(38) NULL,
xxxCH_CHIPID VARCHAR2(30) NOT NULL
)
The index:
ALTER TABLE DPOWNERA.XXX_CHIP ADD
(
CONSTRAINT IX_AK1_XXX_CHIPV2
UNIQUE ( XXXCH_CHIPID )
USING INDEX
TABLESPACE DP_DATA01
PCTFREE 10
INITRANS 2
MAXTRANS 255
STORAGE (
INITIAL 128 K
NEXT 128 K
MINEXTENTS 1
MAXEXTENTS UNLIMITED
)
);
Here is the SQL I used:
SELECT *
FROM (SELECT m2.*,
rownum rnum
FROM (SELECT m_chip.xxxch_id,
m_chip.xxxch_chipid
FROM xxx_chip m_chip
ORDER BY m_chip.xxxch_chipid) m2
WHERE rownum < 101)
WHERE rnum >= 1;
And finally excerpts from the explain plan:
SQL Tool Query:
OPERATION OBJECT_NAME COST CARDINALITY CPU_COST
---------------- ------------------- ----- ----------- ----------
SELECT STATEMENT NULL 2 10 11740
VIEW NULL 2 10 11740
COUNT NULL NULL NULL NULL
VIEW NULL 2 10 11740
NESTED LOOPS NULL 2 10 11740
TABLE ACCESS XXX_CHIP 1 1000000 3319
INDEX IX_AK1_XXX_CHIPV2 1 10 2336
TABLE ACCESS XXX_CUSTOMER 1 1 842
INDEX IX_PK_XXX_CUSTOMER 1 1 105
SQL Java Query, OJDBC Thin client:
OPERATION OBJECT_NAME COST CARDINALITY CPU_COST
---------------- ------------------- ----- ----------- ----------
SELECT STATEMENT NULL 15100 100 1538329415
VIEW NULL 15100 100 1538329415
COUNT NULL NULL NULL NULL
VIEW NULL 15100 1000000 1538329415
SORT NULL 15100 1000000 1538329415
HASH JOIN NULL 1639 1000000 424719850
VIEW index$_join$_004 3 3 2268646
HASH JOIN NULL NULL NULL NULL
INDEX IX_AK1_XXX_CUSTOMER 1 3 965
INDEX IX_PK_XXX_CUSTOMER 1 3 965
TABLE ACCESS xxx_CHIP 1614 1000000 320184788
So, I am at a loss as to why the hash join is chosen by the optimizer.
My guess is that the VARCHAR2 is treated differently.

I found an answer, and it was simpler than I thought. It all has to do with the VARCHAR2 datatype of the index column. My database was set to language and country "en", "US", but locally
I have another language and region. The client locale gave the session a linguistic sort order instead of a binary one, so the optimizer rightly discarded the index: a binary index cannot satisfy an ORDER BY under a different linguistic sort.
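A quick way to confirm this is to query the session NLS parameters from both clients; if NLS_SORT is not BINARY on the JDBC session, the discarded index is explained:
SELECT parameter, value
FROM nls_session_parameters
WHERE parameter IN ('NLS_LANGUAGE', 'NLS_TERRITORY', 'NLS_SORT', 'NLS_COMP');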
So what I did to test it was to start my Eclipse with some extra -D parameters entered in my eclipse.ini file:
-Duser.language=en
-Duser.country=US
-Duser.region=US
Then, in the Data Source Explorer in Eclipse, I created a connection, ran my statement, and it worked like a charm.
So the lesson learned is to always make sure that the client and database are compatible language-wise. We will probably change to UTF-8 in the database so it is the same for every installation; otherwise you have to configure it for every installation depending on country and language.
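If changing the JVM locale is not an option, the sort order can also be pinned per session on the database side; a minimal sketch (to be run at connection initialization):
ALTER SESSION SET NLS_SORT = BINARY;
ALTER SESSION SET NLS_COMP = BINARY;
With a binary sort, the ORDER BY on the VARCHAR2 column can again be satisfied by the unique index.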
Hope this will help someone. If the answer is unclear, please post a comment.

Related

Need to ignore "start with" sequence for identity column when comparing schema in oracle db

We have an Oracle database (18c) on several servers and need to sync the schema from dev to prod servers. Since it is only the schema that needs to be synced, and not the content of the tables, we do not need to know the next sequence number of primary key columns. (And we certainly do not want to update the prod servers with this sequence number.)
I have tried both SQL Developer's Diff Tool and dbForge Schema Compare for Oracle, but they both list tables where only this sequence number differs as tables that need to be updated.
I have not found a setting in SQL Developer's Diff Tool that handles this. dbForge Schema Compare for Oracle has an Ignore START WITH in sequences option, but it does not seem to work as I thought, since it still marks tables that are equal except for the sequence number as tables that need an update.
For new tables that only exist in the source db - the sync script will be like this:
CREATE TABLE TEST (
ID NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY(
START WITH 102),
TEXT VARCHAR2(4000 BYTE),
CONSTRAINT TEST_ID_PK PRIMARY KEY (ID))
LOGGING;
We need that script without the (START WITH 102) part in it.
For a table that exists in both source and target db (with no change other than the sequence number), the sync script will be like this:
ALTER TABLE TEST
MODIFY(ID GENERATED BY DEFAULT ON NULL AS IDENTITY(
START WITH 114
INCREMENT BY 1
MAXVALUE 9999999999999999999999999999
MINVALUE 1
CACHE 20
NOCYCLE
NOORDER));
The reality here is that this table does not need an update; I thought that Ignore START WITH in sequences would handle this, but apparently not.
Anyone out there have a solution for us?
Well, I believe it is a very bad idea to use SQL Developer, or any other IDE tool for that matter, to create scripts to be deployed on production. You are describing a clear case of lacking real version control software, like Git or SVN. You shouldn't need to compare databases unless something is wrong, and certainly not for creating DDL scripts.
In this specific case, I would use DBMS_METADATA to create the DDLs.
Example
SQL> create table t ( c1 number generated by default on null as identity ( start with 1 increment by 1 ) , c2 number ) ;
Table created.
SQL> insert into t values ( null , 1 ) ;
1 row created.
SQL> r
1* insert into t values ( null , 1 )
1 row created.
SQL> r
1* insert into t values ( null , 1 )
1 row created.
SQL> r
1* insert into t values ( null , 1 )
1 row created.
SQL> select * from t ;
C1 C2
---------- ----------
1 1
2 1
3 1
4 1
In this case SQL developer shows start with 5, because that is the next value of the identity column. You can use DBMS_METADATA.GET_DDL to get the right DDL without this clause.
SQL> begin
DBMS_METADATA.set_transform_param (DBMS_METADATA.session_transform, 'SQLTERMINATOR', true);
DBMS_METADATA.set_transform_param (DBMS_METADATA.session_transform, 'PRETTY', true);
end;
/
PL/SQL procedure successfully completed.
SQL> select dbms_metadata.get_ddl('TABLE','T') from dual ;
DBMS_METADATA.GET_DDL('TABLE','T')
--------------------------------------------------------------------------------
CREATE TABLE "SYS"."T"
( "C1" NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY MINVALUE 1 MAXVALUE 99
99999999999999999999999999 INCREMENT BY 1 START WITH 1 CACHE 20 NOORDER NOCYCLE
NOKEEP NOSCALE NOT NULL ENABLE,
"C2" NUMBER
) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
NOCOMPRESS LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "SYSTEM" ;
There are several transform options, for example to omit the storage attributes. I always use this one:
SQL> BEGIN
  2    DBMS_METADATA.set_transform_param (DBMS_METADATA.session_transform, 'SQLTERMINATOR', true);
  3    DBMS_METADATA.set_transform_param (DBMS_METADATA.session_transform, 'PRETTY', true);
  4    DBMS_METADATA.set_transform_param (DBMS_METADATA.session_transform, 'SEGMENT_ATTRIBUTES', true);
  5    DBMS_METADATA.set_transform_param (DBMS_METADATA.session_transform, 'STORAGE', false);
  6  END;
  7  /
PL/SQL procedure successfully completed.
SQL> select dbms_metadata.get_ddl('TABLE','T') from dual ;
DBMS_METADATA.GET_DDL('TABLE','T')
--------------------------------------------------------------------------------
CREATE TABLE "SYS"."T"
( "C1" NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY MINVALUE 1 MAXVALUE 99
99999999999999999999999999 INCREMENT BY 1 START WITH 1 CACHE 20 NOORDER NOCYCLE
NOKEEP NOSCALE NOT NULL ENABLE,
"C2" NUMBER
) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
NOCOMPRESS LOGGING
TABLESPACE "SYSTEM" ;
SQL>
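Note that in the output above GET_DDL still shows START WITH 1 (the definition value, rather than the next value that SQL Developer shows). If the generated script must not contain the clause at all, the DDL can be post-processed; a hedged sketch, stripping it with a deliberately simple pattern:
SELECT REGEXP_REPLACE(
         DBMS_METADATA.GET_DDL('TABLE', 'T'),
         'START WITH \d+ ') AS ddl_no_start_with
FROM dual;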
For comparison purposes, you might want to have a look at DBMS_COMPARISON:
https://docs.oracle.com/database/121/ARPLS/d_comparison.htm#ARPLS868

MariaDB not optimizing query

First question here ever, so be nice :-)
I used to use MySQL on all of my servers, and since CentOS 7 now comes with MariaDB instead, I gave it a try. All seems to be good except for one query, which MySQL performs in milliseconds and MariaDB takes seconds on :(
SELECT * FROM (
SELECT id, date_start FROM matches
WHERE matches.type =5409
AND matches.status =10
AND matches.date_start >= '2016-02-01'
AND matches.date_start <= '2016-02-08'
) AS tmp
INNER JOIN seat ON tmp.id = seat.match_id
The table seat has 5.4 million entries, matches has a third of that. For every match played there are 3 seats.
Now MySQL cleverly derives the tmp table first and then joins on the mere 112 matches found for the given time frame:
id select_type table type possible_keys key len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 112
1 PRIMARY seat ref match_id match_id 8 tmp.id 3
2 DERIVED matches ALL NULL NULL NULL NULL 1919638 Using where
MariaDB on the other hand does the JOIN first on the whole matches table. Not so smart for 5 million entries:
id select_type table type possible_keys key len ref rows Extra
1 SIMPLE seat ALL match_id NULL NULL NULL 5462345
1 SIMPLE matches eq_ref PRIMARY PRIMARY 8 seat.match_id 1 Using where
match_id is a key in the seat table.
I tried different approaches and nothing worked. I don't want to give up on MariaDB yet, because I have read, and also noticed myself, that it's somewhat faster on other queries. But this is actually a show stopper...
So any help would be much appreciated!
For matches:
INDEX(type, status, date_start)
will speed up both MySQL and MariaDB. (type and status can be swapped, but date_start needs to be third.)
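In concrete form (the index name is illustrative):
ALTER TABLE matches ADD INDEX ix_type_status_date (type, status, date_start);
With this index the derived table can be resolved by a range scan instead of the full scan shown in the DERIVED row of the MySQL plan.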
The reason for the difference is that MySQL is ahead of MariaDB on certain optimizations, and you hit one such.

Why is my date dimension table useless? (Confusion over PostgreSQL storage...)

I have looked over this about 4 times and am still perplexed with these results.
Take a look at the following (which I originally posted here)
Date dimension table --
-- Some output omitted
DROP TABLE IF EXISTS dim_calendar CASCADE;
CREATE TABLE dim_calendar (
id SMALLSERIAL PRIMARY KEY,
day_id DATE NOT NULL,
year SMALLINT NOT NULL, -- 2000 to 2024
month SMALLINT NOT NULL, -- 1 to 12
day SMALLINT NOT NULL, -- 1 to 31
quarter SMALLINT NOT NULL, -- 1 to 4
day_of_week SMALLINT NOT NULL, -- 0 (Sunday) to 6 (Saturday)
day_of_year SMALLINT NOT NULL, -- 1 to 366
week_of_year SMALLINT NOT NULL, -- 1 to 53
CONSTRAINT con_month CHECK (month >= 1 AND month <= 12),
CONSTRAINT con_day_of_year CHECK (day_of_year >= 1 AND day_of_year <= 366), -- 366 allows for leap years
CONSTRAINT con_week_of_year CHECK (week_of_year >= 1 AND week_of_year <= 53),
UNIQUE(day_id)
);
INSERT INTO dim_calendar (day_id, year, month, day, quarter, day_of_week, day_of_year, week_of_year) (
SELECT ts,
EXTRACT(YEAR FROM ts),
EXTRACT(MONTH FROM ts),
EXTRACT(DAY FROM ts),
EXTRACT(QUARTER FROM ts),
EXTRACT(DOW FROM ts),
EXTRACT(DOY FROM ts),
EXTRACT(WEEK FROM ts)
FROM generate_series('2000-01-01'::timestamp, '2024-01-01', '1 day'::interval) AS t(ts)
);
/* ==> [ INSERT 0 8767 ] */
Tables for testing --
DROP TABLE IF EXISTS just_dates CASCADE;
DROP TABLE IF EXISTS just_date_ids CASCADE;
CREATE TABLE just_dates AS
SELECT a_date AS some_date
FROM some_table;
/* ==> [ SELECT 769411 ] */
CREATE TABLE just_date_ids AS
SELECT d.id
FROM just_dates jd
INNER JOIN dim_calendar d
ON d.day_id = jd.some_date;
/* ==> [ SELECT 769411 ] */
ALTER TABLE just_date_ids ADD CONSTRAINT jdfk FOREIGN KEY (id) REFERENCES dim_calendar (id);
Confusion --
pocket=# SELECT pg_size_pretty(pg_relation_size('dim_calendar'));
pg_size_pretty
----------------
448 kB
(1 row)
pocket=# SELECT pg_size_pretty(pg_relation_size('just_dates'));
pg_size_pretty
----------------
27 MB
(1 row)
pocket=# SELECT pg_size_pretty(pg_relation_size('just_date_ids'));
pg_size_pretty
----------------
27 MB
(1 row)
Why is a table consisting of a bunch of smallints the same size as a table consisting of a bunch of dates? And I should mention that before, when dim_calendar.id was a normal SERIAL, it gave the same 27MB result.
Also, and more importantly -- WHY does a table with 769411 records with a single smallint field have a size of 27MB, which is > 32bytes/record???
P.S. Yes, I will have billions (or at a minimum hundreds of millions) of records, and am trying to add performance and space optimizations wherever possible.
EDIT
This might have something to do with it, so throwing it out there --
pocket=# select count(id) from just_date_ids group by id;
count
--------
409752
359659
(2 rows)
In tables with one or two columns, the biggest part of the size is always the tuple header.
Have a look at http://www.postgresql.org/docs/current/interactive/storage-page-layout.html, which explains how the data is stored. I'm quoting the part of that page most relevant to your question:
All table rows are structured in the same way. There is a fixed-size header (occupying 23 bytes on most machines), followed by an optional null bitmap, an optional object ID field, and the user data.
This mostly explains the question
WHY does a table with 769411 records with a single smallint field have a size of 27MB, which is > 32bytes/record???
The other part of your question has to do with the byte alignment of Postgres data. Smallints are aligned at 2-byte offsets, but ints (and dates of course... a date is an int4 after all) are aligned at 4-byte offsets. So the order in which the table columns are declared plays a significant role.
Having a table with smallint, date, smallint needs 12 bytes of user data per row (not counting the overhead), while declaring smallint, smallint, date needs only 8 bytes. See a great (and surprisingly not accepted) answer here: Calculating and saving space in PostgreSQL
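A quick way to see the padding for yourself; a small sketch (table names are illustrative, and the totals include the per-row header, so treat the values as indicative):
CREATE TABLE pad_wide (a smallint, d date, b smallint);   -- 2 padding bytes inserted before d
CREATE TABLE pad_packed (a smallint, b smallint, d date); -- no padding needed
INSERT INTO pad_wide VALUES (1, '2000-01-01', 2);
INSERT INTO pad_packed VALUES (1, 2, '2000-01-01');
SELECT pg_column_size(t.*) AS row_bytes FROM pad_wide t;   -- header + 12 bytes of user data
SELECT pg_column_size(t.*) AS row_bytes FROM pad_packed t; -- header + 8 bytes of user data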

Performance issues only on the production database

I'm having some performance issues when querying a table in a production database. While the query runs in 2.1 seconds on the test database (returning 8640 of 28 million records), in production it takes 2.05 minutes (returning 8640 of 31 million records). I'm having a hard time finding the problem since I'm not an Oracle expert.
Since the explain plan in both databases shows the correct index usage, I'm inclined to think that the problem resides in the table/index creation.
I've noticed some small differences between the SQL scripts used for the table creation:
Test database:
create table TB_PONTO_ENE
(
cd_ponto NUMBER(10) not null,
cd_fonte NUMBER(10),
cd_medidor NUMBER(10),
cd_usuario NUMBER(10),
dt_hr_insercao DATE,
dt_hr_instante DATE not null,
dt_hr_hora DATE,
dt_hr_dia DATE,
dt_hr_mes DATE,
dt_hr_instante_hv DATE,
dt_hr_hora_hv DATE,
dt_hr_dia_hv DATE,
dt_hr_mes_hv DATE,
vl_eneat_del FLOAT,
vl_eneat_rec FLOAT,
vl_enere_del FLOAT,
vl_enere_rec FLOAT,
vl_eneat_del_cp FLOAT,
vl_eneat_rec_cp FLOAT,
vl_enere_del_cp FLOAT,
vl_enere_rec_cp FLOAT
)
tablespace TELEMEDICAO
pctfree 10
initrans 1
maxtrans 255
storage
(
initial 64K
minextents 1
maxextents unlimited
);
alter table TB_PONTO_ENE
add constraint CP_TB_PONTO_ENE primary key (CD_PONTO, DT_HR_INSTANTE)
using index
tablespace TELEMEDICAO
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
minextents 1
maxextents unlimited
);
alter table TB_PONTO_ENE
add constraint CE_PENE_CD_FONTE foreign key (CD_FONTE)
references TB_FONTE (CD_FONTE) on delete set null;
alter table TB_PONTO_ENE
add constraint CE_PENE_CD_MEDIDOR foreign key (CD_MEDIDOR)
references TB_MEDIDOR (CD_MEDIDOR) on delete set null;
alter table TB_PONTO_ENE
add constraint CE_PENE_CD_PONTO foreign key (CD_PONTO)
references TB_PONTO (CD_PONTO) on delete cascade;
alter table TB_PONTO_ENE
add constraint CE_PENE_CD_USUARIO foreign key (CD_USUARIO)
references TB_USUARIO (CD_USUARIO) on delete set null
disable;
Production database:
create table TB_PONTO_ENE
(
cd_ponto NUMBER(10) not null,
cd_fonte NUMBER(10),
cd_medidor NUMBER(10),
cd_usuario NUMBER(10),
dt_hr_insercao DATE,
dt_hr_instante DATE not null,
dt_hr_hora DATE,
dt_hr_dia DATE,
dt_hr_mes DATE,
dt_hr_instante_hv DATE,
dt_hr_hora_hv DATE,
dt_hr_dia_hv DATE,
dt_hr_mes_hv DATE,
vl_eneat_del FLOAT,
vl_eneat_rec FLOAT,
vl_enere_del FLOAT,
vl_enere_rec FLOAT,
vl_eneat_del_cp FLOAT,
vl_eneat_rec_cp FLOAT,
vl_enere_del_cp FLOAT,
vl_enere_rec_cp FLOAT
)
tablespace TELEMEDICAO
pctfree 10
initrans 1
maxtrans 255
storage
(
initial 64K
next 5M
minextents 1
maxextents unlimited
pctincrease 0
);
alter table TB_PONTO_ENE
add constraint CP_TB_PONTO_ENE primary key (CD_PONTO, DT_HR_INSTANTE)
using index
tablespace MEDICAO_NDX
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
pctincrease 0
);
alter table TB_PONTO_ENE
add constraint CE_PENE_CD_FONTE foreign key (CD_FONTE)
references TB_FONTE (CD_FONTE) on delete set null;
alter table TB_PONTO_ENE
add constraint CE_PENE_CD_MEDIDOR foreign key (CD_MEDIDOR)
references TB_MEDIDOR (CD_MEDIDOR) on delete set null;
alter table TB_PONTO_ENE
add constraint CE_PENE_CD_PONTO foreign key (CD_PONTO)
references TB_PONTO (CD_PONTO) on delete cascade;
alter table TB_PONTO_ENE
add constraint CE_PENE_CD_USUARIO foreign key (CD_USUARIO)
references TB_USUARIO (CD_USUARIO) on delete set null;
The production database puts the indexes in another tablespace. Another difference is the NEXT 5M in the storage clause (no value is defined in the test database).
When looking at the index properties, I also see some differences:
Test database:
AVG_DATA_BLOCKS_PER_KEY 1
AVG_LEAF_BLOCKS_PER_KEY 1
BLEVEL 2
BUFFER_POOL DEFAULT
CLUSTERING_FACTOR 611494
COMPRESSION DISABLED
DEGREE 1
DISTINCT_KEYS 28568389
DROPPED NO
GENERATED N
GLOBAL_STATS YES
INDEX_NAME CP_TB_PONTO_ENE
INDEX_TYPE NORMAL
INITIAL_EXTENT 65536
INI_TRANS 2
INSTANCES 1
IOT_REDUNDANT_PKEY_ELIM NO
JOIN_INDEX NO
LAST_ANALYZED 21/07/2010 22:08:34
LEAF_BLOCKS 85809
LOGGING YES
MAX_EXTENTS 2147483645
MAX_TRANS 255
MIN_EXTENTS 1
NUM_ROWS 28568389
PARTITIONED NO
PCT_FREE 10
SAMPLE_SIZE 377209
SECONDARY N
STATUS VALID
TABLESPACE_NAME TELEMEDICAO
TABLE_NAME TB_PONTO_ENE
TABLE_TYPE TABLE
TEMPORARY N
UNIQUENESS UNIQUE
USER_STATS NO
Production database:
AVG_DATA_BLOCKS_PER_KEY 1
AVG_LEAF_BLOCKS_PER_KEY 1
BLEVEL 2
BUFFER_POOL DEFAULT
CLUSTERING_FACTOR 10154395
COMPRESSION DISABLED
DEGREE 1
DISTINCT_KEYS 14004395
GENERATED N
GLOBAL_STATS YES
INDEX_NAME CP_TB_PONTO_ENE
INDEX_TYPE NORMAL
INITIAL_EXTENT 65536
INI_TRANS 2
INSTANCES 1
JOIN_INDEX NO
LAST_ANALYZED 05/03/2010 08:45:19
LEAF_BLOCKS 42865
LOGGING YES
MAX_EXTENTS 2147483645
MAX_TRANS 255
MIN_EXTENTS 1
NEXT_EXTENT 1048576
NUM_ROWS 14004395
PARTITIONED NO
PCT_FREE 10
PCT_INCREASE 0
SAMPLE_SIZE 2800879
SECONDARY N
STATUS VALID
TABLESPACE_NAME MEDICAO_NDX
TABLE_NAME TB_PONTO_ENE
TABLE_TYPE TABLE
TEMPORARY N
UNIQUENESS UNIQUE
USER_STATS NO
Two other things have come to my attention: the explain plan for select count(*) from thetable shows that the index is used on the test database, but shows a full table scan on the production database. Which led me to another observation: the test database index is 160MB while the production one is more than 1GB (and we don't do deletes on this table).
Can anyone point me to the solution?
UPDATE
Here are the execution plans:
Test database:
Execution Plan
----------------------------------------------------------
Plan hash value: 1441290166
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 18767 (4)| 00:03:46 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| IDX_HV_TB_PONTO_ENE | 28M| 18767 (4)| 00:03:46 |
-------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
111 recursive calls
0 db block gets
83586 consistent gets
83533 physical reads
0 redo size
422 bytes sent via SQL*Net to client
399 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
1 rows processed
Production database
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=RULE
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'TB_PONTO_ENE'
Statistics
----------------------------------------------------------
1 recursive calls
3 db block gets
605327 consistent gets
603698 physical reads
180 redo size
201 bytes sent via SQL*Net to client
242 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
UPDATE 2
The production server is running Oracle 9.2.0.
UPDATE 3
Here are the statistics for the execution with the optimizer mode set to CHOOSE:
SQL> SELECT dt_hr_instante, vl_eneat_del,vl_eneat_rec,vl_enere_del, vl_enere_rec FROM tb_ponto_ene WHERE cd_ponto = 31 AND dt_hr_instante BETWEEN to_date('01/06/2010 00:05:00','dd/mm/yyyy hh24:mi:ss') AND to_date('01/07/2010 00:00:00', 'dd/mm/yyyy hh24:mi:ss');
8640 rows selected.
Elapsed: 00:01:49.51
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=36)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TB_PONTO_ENE' (Cost=4 Card=1 Bytes=36)
2 1 INDEX (RANGE SCAN) OF 'CP_TB_PONTO_ENE' (UNIQUE) (Cost=3 Card=1)
Statistics
----------------------------------------------------------
119 recursive calls
0 db block gets
9169 consistent gets
7438 physical reads
0 redo size
308524 bytes sent via SQL*Net to client
4267 bytes received via SQL*Net from client
577 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
8640 rows processed
The test database index properties include the IOT_REDUNDANT_PKEY_ELIM and DROPPED columns, but the production index properties do not. Those columns were added in Oracle 10g.
Is the production database perhaps running under the old 9i version and the test database under 10g? If so, I'd consider that a more significant difference than anything else.
That said, if select count(*) from thetable is not using a primary key index, that is very odd. The index stats are very out of date (14,004,395 rows when you suggest there are over 30 million, last gathered in March). If the table has doubled in size in the last six months and its stats are even older, that could well be the issue.
The autotrace plan for production says RULE optimizer. If you look at the Oracle 9i tuning documentation, section "RBO Path 15: Full Table Scan", it clearly states that a full table scan will be used.
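Under OPTIMIZER_MODE=CHOOSE, the presence of fresh statistics is what moves a query from the rule-based to the cost-based optimizer, so regathering them is the obvious first step; a minimal sketch (adjust the owner as needed):
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,            -- or the owning schema
    tabname => 'TB_PONTO_ENE',
    cascade => TRUE);           -- gather the index statistics as well
END;
/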

How to index a date column with null values?

How should I index a date column when some rows have null values?
We have to select rows within a date range as well as rows with null dates.
We use Oracle 9.2 and higher.
Options I found
Using a bitmap index on the date column
Using an index on date column and an index on a state field which value is 1 when the date is null
Using an index on date column and an other granted not null column
My thoughts on the options are:
to 1: the table has too many distinct values to use a bitmap index
to 2: I would have to add a field just for this purpose and change the query whenever I want to retrieve the null date rows
to 3: looks tricky to add a field to an index where it is not really needed
What is the best practice for this case?
Thanks in advance
Some info I have read:
Oracle Date Index
When does Oracle index null column values?
Edit
Our table has 300,000 records. 1,000 to 10,000 records are inserted and deleted every day. 280,000 records have a null delivered_at date. It is a kind of picking buffer.
Our structure (translated to English) is:
create table orders
(
orderid VARCHAR2(6) not null,
customerid VARCHAR2(6) not null,
compartment VARCHAR2(8),
externalstorage NUMBER(1) default 0 not null,
created_at DATE not null,
last_update DATE not null,
latest_delivery DATE not null,
delivered_at DATE,
delivery_group VARCHAR2(9),
fast_order NUMBER(1) default 0 not null,
order_type NUMBER(1) default 0 not null,
produkt_group VARCHAR2(30)
)
In addition to Tony's excellent advice, there is also an option to index your column in such a way that you don't need to adjust your queries. The trick is to add a constant value to the index.
A demonstration:
Create a table with 10,000 rows out of which only 6 contain a NULL value for the a_date column.
SQL> create table mytable (id,a_date,filler)
2 as
3 select level
4 , case when level < 9995 then date '1999-12-31' + level end
5 , lpad('*',1000,'*')
6 from dual
7 connect by level <= 10000
8 /
Table created.
First I'll show that if you just create an index on the a_date column, the index is not used when you use the predicate "where a_date is null":
SQL> create index i1 on mytable (a_date)
2 /
Index created.
SQL> exec dbms_stats.gather_table_stats(user,'mytable',cascade=>true)
PL/SQL procedure successfully completed.
SQL> set autotrace on
SQL> select id
2 , a_date
3 from mytable
4 where a_date is null
5 /
ID A_DATE
---------- -------------------
9995
9996
9997
9998
9999
10000
6 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=72 Card=6 Bytes=72)
1 0 TABLE ACCESS (FULL) OF 'MYTABLE' (Cost=72 Card=6 Bytes=72)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
720 consistent gets
0 physical reads
0 redo size
285 bytes sent via SQL*Net to client
234 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
6 rows processed
720 consistent gets and a full table scan.
Now change the index to include the constant 1, and repeat the test:
SQL> set autotrace off
SQL> drop index i1
2 /
Index dropped.
SQL> create index i1 on mytable (a_date,1)
2 /
Index created.
SQL> exec dbms_stats.gather_table_stats(user,'mytable',cascade=>true)
PL/SQL procedure successfully completed.
SQL> set autotrace on
SQL> select id
2 , a_date
3 from mytable
4 where a_date is null
5 /
ID A_DATE
---------- -------------------
9995
9996
9997
9998
9999
10000
6 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=6 Bytes=72)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'MYTABLE' (Cost=2 Card=6 Bytes=72)
2 1 INDEX (RANGE SCAN) OF 'I1' (NON-UNIQUE) (Cost=2 Card=6)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
6 consistent gets
0 physical reads
0 redo size
285 bytes sent via SQL*Net to client
234 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
6 rows processed
6 consistent gets and an index range scan.
Regards,
Rob.
"Our table has 300,000 records....
280,000 records have a null
delivered_at date. "
In other words almost the entire table satisfies a query which searches on where DELIVERED_AT is null. An index is completely inappropriate for that search. A full table scan is much the best approach.
If you have an Enterprise Edition license and you have the CPUs to spare, using a parallel query would reduce the elapsed time.
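For example (the hint syntax is standard; the degree of 4 is illustrative):
SELECT /*+ FULL(o) PARALLEL(o, 4) */ *
FROM orders o
WHERE delivered_at IS NULL;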
Do you mean that your queries will be like this?
select ...
from mytable
where (datecol between :from and :to
or datecol is null);
It would only be worth indexing the nulls if they were relatively few in the table - otherwise a full table scan may be the most efficient way to find them. Assuming it is worth indexing them you could create a function-based index like this:
create index mytable_fbi on mytable (case when datecol is null then 1 end);
Then change your query to:
select ...
from mytable
where (datecol between :from and :to
or case when datecol is null then 1 end = 1);
You could wrap the case in a function to make it slicker:
create or replace function isnull (p_date date) return varchar2
DETERMINISTIC
is
begin
return case when p_date is null then 'Y' end;
end;
/
create index mytable_fbi on mytable (isnull(datecol));
select ...
from mytable
where (datecol between :from and :to
or isnull(datecol) = 'Y');
I made sure the function returns NULL when the date is not null so that only the null dates are stored in the index. Also I had to declare the function as DETERMINISTIC. (I changed it to return 'Y' instead of 1 merely because to me the name "isnull" suggests it should; feel free to ignore my preference!)
Avoid the table lookup and create the index like this:
create index i1 on mytable (a_date, id);
