Why is the NLSSORT index not used for this query?

In our application we have case-insensitive semantics configured at session level:
alter session set NLS_COMP=LINGUISTIC;
alter session set NLS_SORT=BINARY_AI;
but I also want a table with a NAME column that uses binary semantics, so I defined a function-based index accordingly:
create table RAW_SCREEN (
ID number(10) constraint RSCR_PK primary key,
NAME nvarchar2(256) not null
);
create unique index RSCR_IDX on RAW_SCREEN (nlssort(NAME, 'NLS_SORT=BINARY'));
I would have expected the query below to take advantage of the function-based index:
select * from RAW_SCREEN where
nlssort(NAME, 'NLS_SORT=BINARY') = nlssort(N'raw_screen1', 'NLS_SORT=BINARY');
but it doesn't. The query plan shows a table scan. While experimenting, I've discovered that a simple index on NAME does the trick:
create unique index RSCR_IDX2 on RAW_SCREEN (NAME);
When running the query again, the RSCR_IDX2 index was used successfully.
Now, that is not very surprising, but I can't understand why the first function-based index was not used by the optimizer. The indexed expression matches exactly the expression used in the WHERE condition. Do you have any idea why it wasn't used?
NOTE: This was run on Oracle 10.2
Here's a full test script if you want to try it out:
alter session set NLS_COMP=LINGUISTIC;
alter session set NLS_SORT=BINARY_AI;
create table RAW_SCREEN (
ID number(10) constraint RSCR_PK primary key,
NAME nvarchar2(256) not null
);
create unique index RSCR_IDX on RAW_SCREEN (nlssort(NAME, 'NLS_SORT=BINARY'));
--create unique index RSCR_IDX2 on RAW_SCREEN (NAME);
begin
for i in 1..10000
loop
insert into RAW_SCREEN values (i, 'raw_screen' || i);
end loop;
end;
/
commit;
select * from RAW_SCREEN where nlssort(NAME, 'NLS_SORT=BINARY') = nlssort(N'raw_screen1000', 'NLS_SORT=BINARY');

Expressions are converted according to the NLS session settings in DML, but not in DDL.
This is arguably a bug with the behavior of NLSSORT(char, 'NLS_SORT=BINARY').
From the manual: "If you specify BINARY, then this function returns char."
But that is not true for the index. Normally it is very convenient that the index expression does not undergo any transformation; if it depended on session settings
then tools like DBMS_METADATA.GET_DDL would have to return many ALTER SESSION statements. But in this case it means that you can create an index that will never
be used.
The explain plan shows the real expression. Here's how Oracle applies NLSSORT in such a session even when it is not written explicitly in the query:
alter session set nls_comp=linguistic;
alter session set nls_sort=binary_ai;
drop table raw_screen;
create table raw_screen (
id number(10) constraint rscr_pk primary key,
name nvarchar2(256) not null
);
create unique index idx_binary_ai
on raw_screen (nlssort(name, 'nls_sort=binary_ai'));
explain plan for select * from raw_screen where name = n'raw_screen1000';
select * from table(dbms_xplan.display(format=>'basic predicate'));
Plan hash value: 2639454581
-----------------------------------------------------
| Id | Operation | Name |
-----------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID| RAW_SCREEN |
|* 2 | INDEX UNIQUE SCAN | IDX_BINARY_AI |
-----------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(NLSSORT("NAME",'nls_sort=''BINARY_AI''')=HEXTORAW('0072006
10077005F00730063007200650065006E003100300030003000'))
This example shows that nlssort(char, 'nls_sort=binary') is dropped by the DML:
alter session set nls_comp=linguistic;
alter session set nls_sort=binary_ai;
drop table raw_screen;
create table raw_screen (
id number(10) constraint rscr_pk primary key,
name nvarchar2(256) not null
);
create unique index idx_binary_ai on
raw_screen (nlssort(name, 'nls_sort=binary_ai'));
explain plan for select * from raw_screen where
nlssort(name,'nls_sort=binary') = nlssort(N'raw_screen1000','nls_sort=binary');
select * from table(dbms_xplan.display(format=>'basic predicate'));
Plan hash value: 237065300
----------------------------------------
| Id | Operation | Name |
----------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | TABLE ACCESS FULL| RAW_SCREEN |
----------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("NAME"=U'raw_screen1000')
In summary - index DDL needs to exactly match the transformed expressions, which can depend on session settings and the unusual behavior of binary.
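Following that summary, here is a minimal sketch (reusing the table from the second example above) of an index whose expression does match the transformed predicate: since the optimizer drops nlssort(..., 'nls_sort=binary') and leaves a plain equality on NAME, a plain index on NAME is the one that can serve this query. The expected plan below is an assumption based on the question's observation with its second index, not captured output:
create unique index rscr_idx2 on raw_screen (name);
explain plan for select * from raw_screen where
nlssort(name,'nls_sort=binary') = nlssort(N'raw_screen1000','nls_sort=binary');
select * from table(dbms_xplan.display(format=>'basic predicate'));
-- Expect an INDEX UNIQUE SCAN on RSCR_IDX2 with access("NAME"=U'raw_screen1000').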

When you apply a function to a column in the where clause of a query, any corresponding indexes on this column must also include the function for Oracle to make use of them when executing the query. The NLSSORT function can be applied to strings in your where clause automatically by Oracle if you set NLS_COMP and NLS_SORT appropriately.
To enable case-insensitive searching, we must convert the strings stored in the table by applying a function such as upper(), lower(), etc. We must then also create a function-based index on the column with the same function we use in our queries.
By changing the NLS_COMP parameter to ANSI and the NLS_SORT parameter to BINARY_CI for a session, however, Oracle will automatically apply the NLSSORT function to strings in your query. In this case, you don't have to change your query, as Oracle does this for you behind the scenes.
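A minimal sketch of that approach, using NLS_COMP=LINGUISTIC as in the question rather than ANSI; the table name, column size and index name here are made up for illustration:
alter session set NLS_COMP=LINGUISTIC;
alter session set NLS_SORT=BINARY_CI;
create table people (name varchar2(100));
create index people_name_ci_idx on people (nlssort(name, 'NLS_SORT=BINARY_CI'));
-- With these session settings the equality below is rewritten to
-- NLSSORT("NAME",'nls_sort=BINARY_CI') = NLSSORT('Smith',...), which matches the
-- index expression, so the optimizer can use PEOPLE_NAME_CI_IDX case-insensitively.
select * from people where name = 'Smith';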

Related

View Performance

I have a requirement to perform some calculation on a column of a table with a large data set (300 GB) and return that value.
Basically I need to create a view on that table. The table has 21 years of data and is partitioned on a date column (daily). We cannot put a date condition in the view's query; the user will apply the filter at runtime when executing the view.
For example:
Create view v_view as
select * from table;
Now I want to query the view like:
select * from v_view where ts_date between '1-Jan-19' and '1-Jan-20'
How does Oracle execute the above statement internally? Will it execute the view query first and then apply the date filter to the result?
If so, will there not be a performance issue? And how do I resolve this?
Oracle first generates the view and then applies the filter. You can create a function whose input is supplied by the user. The function returns a CREATE VIEW statement, and if you run that statement the view is created. Just run:
create or replace function fnc_x(where_condition in varchar2)
return varchar2
as
begin
return ' CREATE OR REPLACE VIEW sup_orders AS
SELECT suppliers.supplier_id, orders.quantity, orders.price
FROM suppliers
INNER JOIN orders
ON suppliers.supplier_id = orders.supplier_id
'||where_condition||' ';
end fnc_x;
This function should be run first. The input to the function is a string like this:
WHERE suppliers.supplier_name = 'Microsoft'
Then you should run a block like this to execute the function's result:
cl scr
set SERVEROUTPUT ON
declare
szSql varchar2(3000);
crte_vw varchar2(3000);
begin
szSql := 'select fnc_x(''WHERE suppliers.supplier_name = ''''Microsoft'''''') from dual';
dbms_output.put_line(szSql);
execute immediate szSql into crte_vw; -- generate 'create view' command that is depended on user's where_condition
dbms_output.put_line(crte_vw);
execute immediate crte_vw ; -- create the view
end;
In this manner, you just need to receive the where_condition from the user.
Oracle can "push" the predicates inside simple views and can then use those predicates to enable partition pruning for optimal performance. You almost never need to worry about what Oracle will run first - it will figure out the optimal order for you. Oracle does not need to mindlessly build the first step of a query, and then send all of the results to the second step. The below sample schema and queries demonstrate how only the minimal amount of partitions are used when a view on a partitioned table is queried.
--drop table table1;
--Create a daily-partitioned table.
create table table1(id number, ts_date date)
partition by range(ts_date)
interval (numtodsinterval(1, 'day'))
(
partition p1 values less than (date '2000-01-01')
);
--Insert 1000 values, each in a separate day and partition.
insert into table1
select level, date '2000-01-01' + level
from dual
connect by level <= 1000;
--Create a simple view on the partitioned table.
create or replace view v_view as select * from table1;
The following explain plan shows "Pstart" and "Pstop" set to 3 and 4, which means that only 2 of the many partitions are used for this query.
--Generate an explain plan for a simple query on the view.
explain plan for
select * from v_view where ts_date between date '2000-01-02' and date '2000-01-03';
--Show the explain plan.
select * from table(dbms_xplan.display(format => 'basic +partition'));
Plan hash value: 434062308
-----------------------------------------------------------
| Id | Operation | Name | Pstart| Pstop |
-----------------------------------------------------------
| 0 | SELECT STATEMENT | | | |
| 1 | PARTITION RANGE ITERATOR| | 3 | 4 |
| 2 | TABLE ACCESS FULL | TABLE1 | 3 | 4 |
-----------------------------------------------------------
However, partition pruning and predicate pushing do not always work when we think they should. One thing we can do to help the optimizer is to use date literals instead of strings that look like dates. For example, replace
'1-Jan-19' with date '2019-01-01'. When we use ANSI date literals there is no ambiguity, and Oracle is more likely to use partition pruning, as in the rewrite below.
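For example, here is a hedged rewrite of the question's own query against the view, assuming ts_date is a DATE column as in the sample schema above:
select * from v_view
where ts_date between date '2019-01-01' and date '2020-01-01';
-- Unambiguous ANSI date literals let the optimizer compare the values directly
-- against the partition boundaries and prune to only the relevant partitions.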

Validation checks in PL/SQL

As per my project requirement, I need to store all validation check queries in one table and validate all records of another table and update each record with its validation status.
For example, I have two tables called EMP and VALIDATIONS
The VALIDATIONS table has two columns, as below:
Validation_desc   Validation_sql
---------------   --------------
EID_IS_NULL       related SQL should be here
SAL_HIGH          related SQL should be here
The EMP table has normal columns like eid, ename, sal, dept, is_valid, val_desc.
I should write PL/SQL code which will fetch all validation SQL statements from the VALIDATIONS table, check each record of the EMP table, and validate them. If the first record passes all validations available in the VALIDATIONS table, then that record's IS_VALID column in EMP should be updated with 1 and its Validation_desc should be set to null. If the second record fails 2 checks, then that record's IS_VALID column should be updated with 0 and its Validation_desc should be updated with those Validation_desc values, comma separated. Likewise it should check all validations for all records of the EMP table.
I have tried the code below to fetch all details from both tables, but I am not able to write the logic for the validations.
CREATE PROCEDURE P_VALIDATION
as
TYPE REC_TYPE IS RECORD( Validation_desc VARCHAR2(4000),
Validation_sql VARCHAR2(4000));
TYPE VAL_CHECK_TYPE IS TABLE OF REC_TYPE;
LV_VAL_CHECK VAL_CHECK_TYPE;
CURSOR CUR_FEED_DATA IS SELECT * FROM EMP;
LV_FEED_DATA EMP%ROWTYPE;
BEGIN
SELECT Validation_desc, Validation_sql
BULK COLLECT INTO LV_VAL_CHECK FROM VALIDATIONS;
OPEN CUR_FEED_DATA;
LOOP
FETCH CUR_FEED_DATA INTO LV_FEED_DATA;
EXIT WHEN CUR_FEED_DATA%NOTFOUND;
FOR I IN LV_VAL_CHECK.FIRST .. LV_VAL_CHECK.LAST LOOP
----SOME VALIDATIONS LOGIC HERE--
END LOOP;
END LOOP;
CLOSE CUR_FEED_DATA;
END;
There is no single type of validation. Validations can be is null, is not null, is true/false, returns rows/no rows, etc. There are a couple of ways to tackle this:
write your own assertion package with a procedure for each type of validation. In your table you'd store the type of validation and the expression to be evaluated. This is quite a bit of work (a rough sketch of this manual route, adapted to the question's tables, follows this list).
leverage an open source testing framework like utPLSQL and modify it a bit to suit your needs. All kinds of validations have already been implemented there. If you decide to go this route, note that there are major differences between version 2 and 3.
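A rough sketch of that manual route, assuming (this is an assumption, not something stated in the question) that each Validation_sql is stored as a query with a single :id bind variable that returns rows only when the given EMP record fails the check, for example select 1 from emp where eid = :id and sal > 50000:
declare
  l_fail_count pls_integer;
  l_desc_list  varchar2(4000);
begin
  for e in (select eid from emp) loop
    l_desc_list := null;
    for v in (select validation_desc, validation_sql from validations) loop
      -- Run the stored check for this employee; a returned row means the check failed.
      execute immediate 'select count(*) from (' || v.validation_sql || ')'
        into l_fail_count
        using e.eid;
      if l_fail_count > 0 then
        l_desc_list := l_desc_list
                       || case when l_desc_list is not null then ',' end
                       || v.validation_desc;
      end if;
    end loop;
    update emp
       set is_valid = case when l_desc_list is null then 1 else 0 end,
           val_desc = l_desc_list
     where eid = e.eid;
  end loop;
end;
/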
First off, pay attention to KoenLostrie's opening sentence: There is no single type of validation. This basically says that no single process will solve the problem. But using the database's built-in validations should be your first line of attack. Both of the example validations simply cease to be necessary with simple predefined constraints:
Use a constraint to define ... a rule that restricts the values in a database. Oracle Database lets you create six types of constraints ...
For the table you outlined, constraints can handle the validations you listed and a couple of extras:
+-------------+--------+----------------------------------+-----------------------------------------------------+
| Validation | Column | Constraint Type | Description |
+-------------+--------+----------------------------------+-----------------------------------------------------+
| Eid_is_Null | eid | Primary Key | Guarantees eid is not null and is unique in table |
+-------------+--------+----------------------------------+-----------------------------------------------------+
| Sal_High | salary | Check (salary <= 50000) | Guarantees salary column is not greater than 50,000 |
+-------------+--------+----------------------------------+-----------------------------------------------------+
| Dept_OK | dept | not null | Ensures column is present |
+ +----------------------------------+-----------------------------------------------------+
| | | Foreign key to Departments table | Guarantees value exists in reference table |
+-------------+--------+----------------------------------+-----------------------------------------------------+
| Name_Ok | ename | not null | Ensures column is present |
+-------------+--------+----------------------------------+-----------------------------------------------------+
For those listed above, and any other constraints added, you do not need a query to validate - the database simply will not allow an invalid row to exist. Of course your code now needs to handle the resulting exceptions.
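A hedged sketch of the corresponding DDL, using the question's column names (eid, ename, sal, dept); the 50,000 salary limit and the DEPARTMENTS table with its dept_id column are assumptions for illustration:
alter table emp add constraint emp_pk primary key (eid);
alter table emp add constraint emp_sal_high_chk check (sal <= 50000);
alter table emp modify (dept not null);
alter table emp add constraint emp_dept_fk
    foreign key (dept) references departments (dept_id);
alter table emp modify (ename not null);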
Unfortunately constraints cannot handle every validation, so you will still need validation routines to record those. However, you should not create a comma-separated list (not only does it violate first normal form, it is always more trouble than it is worth - how do you update the list when an error is resolved?). Instead create a new table, emp_violations:
create table emp_violations (
emp_vio_id integer generated always as identity
, emp_id integer -- or the type of emp_id
, vio_id integer -- or the type of vio_id (the PK of the violations table)
, date_created date
, date_resolved date
, constraint emp_violations_pk
primary key (emp_vio_id)
, constraint emp_violations_2_emp_fk
foreign key (emp_id)
references emp(emp_id)
, constraint emp_violations_2_violations_fk
foreign key (vio_id)
references violations(vio_id)
);
With this it is easy to see which non-constraint violations have existed or currently exist, and when they were resolved. Also remove the columns Validation_desc (no longer needed) and is_valid (derivable from unresolved emp_violations, requiring no additional column maintenance).
If you absolutely must get is_valid and a comma-separated list of violations, then create a view with the LISTAGG function, along the lines of the sketch below.
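A minimal sketch of such a view; the view name, the violation_desc column on the violations table, and the join from emp_violations.emp_id to emp.eid are assumptions to adapt to your schema:
create or replace view emp_validation_status as
select e.eid,
       case when count(ev.emp_vio_id) = 0 then 1 else 0 end as is_valid,
       listagg(v.violation_desc, ',') within group (order by v.violation_desc) as val_desc
  from emp e
  left join emp_violations ev
         on ev.emp_id = e.eid            -- adjust key column names to your schema
        and ev.date_resolved is null     -- only unresolved violations count
  left join violations v
         on v.vio_id = ev.vio_id
 group by e.eid;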

How to alter a column (changing its size) if the table was created with partitions?

I have created a table with a partition:
CREATE TABLE edw_src.pageviewlog_dev
(
accessurl character varying(1000),
msisdn character varying(1000),
customerid integer
)
WITH (
OIDS=FALSE
)
DISTRIBUTED BY (msisdn)
PARTITION BY RANGE(customerid)
(
PARTITION customerid START (0) END (200)
)
Now I want to change the size of accessurl from 1000 to 3000. I am not able to change it; whenever I try I get this error:
ERROR: "pageviewlog_dev_1_prt_customerid" is a member of a partitioning configuration
HINT: Perform the operation on the master table.
I am able to change it if I update the datatype via pg_attribute. Is there any other way to change the size of an existing column other than pg_attribute?
I have found the solution for this. Sorry for replying late. Below is the way to do it whenever we face this kind of problem in PostgreSQL and Greenplum:
UPDATE pg_attribute SET atttypmod = 300+4
WHERE attrelid = 'edw_src.ivs_hourly_applog_events'::regclass
AND attname = 'adtransactionid';
Greenplum isn't Postgresql so please don't confuse people by asking a Greenplum question with PostgreSQL in the title.
Don't modify catalog objects like pg_attribute. That will cause lots of problems and isn't supported.
The Admin Guide has the syntax for changing column datatypes and this is all you need to do:
ALTER TABLE edw_src.pageviewlog_dev
ALTER COLUMN accessurl TYPE character varying(3000);
Here is the working example with your table:
CREATE SCHEMA edw_src;
CREATE TABLE edw_src.pageviewlog_dev
(
accessurl character varying(1000),
msisdn character varying(1000),
customerid integer
)
WITH (
OIDS=FALSE
)
DISTRIBUTED BY (msisdn)
PARTITION BY RANGE(customerid)
(
PARTITION customerid START (0) END (200)
);
Output:
NOTICE: CREATE TABLE will create partition "pageviewlog_dev_1_prt_customerid" for table "pageviewlog_dev"
Query returned successfully with no result in 47 ms.
And now alter the table:
ALTER TABLE edw_src.pageviewlog_dev
ALTER COLUMN accessurl TYPE character varying(3000);
Output:
Query returned successfully with no result in 62 ms.
Proof in psql:
\d edw_src.pageviewlog_dev
Table "edw_src.pageviewlog_dev"
Column | Type | Modifiers
------------+-------------------------+-----------
accessurl | character varying(3000) |
msisdn | character varying(1000) |
customerid | integer |
Number of child tables: 1 (Use \d+ to list them.)
Distributed by: (msisdn)
If you are unable to alter the table it is probably because the catalog is corrupted after you updated pg_attribute directly. You can try dropping the table and recreating it or you can open a support ticket to have them attempt to correct the catalog corruption.

Optimizer using an index not present in the current schema

CONNECT alll/all
SELECT /*+ FIRST_ROWS(25) */ employee_id, department_id
FROM hr.employees
WHERE department_id > 50;
Execution Plan
Plan hash value: 2056577954
| Id | Operation | Name | Rows | Bytes |
| 0 | SELECT STATEMENT | | 25 | 200
| 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 25 | 200
|* 2 | INDEX RANGE SCAN | EMP_DEPARTMENT_IX | |
SQL> select * from user_indexes where index_name = 'EMP_DEPARTMENT_IX';
no rows selected
NOTE: There is an index with the same name on the DEPARTMENT column of the EMPLOYEES table in some other schema. And when that index is dropped a FULL TABLE SCAN on EMPLOYEES is performed.
Can the optimizer use that other index from some other schema over here?
You're connected as user ALLL, but you're querying a table in the HR schema:
SELECT /*+ FIRST_ROWS(25) */ employee_id, department_id
FROM hr.employees
WHERE department_id > 50;
You stressed other schema in the question, but seem to have overlooked that the table you're querying is also in another schema. The employees table won't appear in user_tables either.
The index being used is associated with that table, so it's likely to be in the same HR schema. You can see it in all_indexes or dba_indexes; the optimiser will use it even if you can't see it though. And it doesn't have to be in the same schema as the table, though it usually will be; in those views you might notice separate owner and table owner columns.
The schema model would break down if you could only utilise indexes in your own schema when accessing a table in someone else's. Every user would have to create their own copies of the indexes, which would be untenable.
You don't even necessarily have to be able to see the table - if you query a view that hides the underlying table from you (so you have select privs on the view only) the index will still be used in the background. And you might not always be explicitly using the schema prefix, if there is a synonym for the table, or you change your default schema.
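For instance, a quick way to see who owns the index and the table it belongs to, assuming your user has enough privileges for the HR objects to appear in ALL_INDEXES:
select owner, index_name, table_owner, table_name
from all_indexes
where index_name = 'EMP_DEPARTMENT_IX';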
Try looking in ALL_INDEXES (or DBA_INDEXES if you have the privilege), which list indexes regardless of owner:
select * from all_indexes where index_name = 'EMP_DEPARTMENT_IX';
Sounds like you are not the owner of the index, as you have noted. As long as your user can access the table data, then the index should be used by the optimizer.

Oracle partition key

I have many tables with a large amount of data. The PK is the column TAB_ID, which has data type RAW(16). I created hash partitions with TAB_ID as the partition key.
My issue is: the SQL statement (select * from my_table where tab_id = 'aas1df') does not use partition pruning. If I change the column datatype to varchar2(32), partition pruning works.
Why doesn't partition pruning work with a partition key that has datatype RAW(16)?
I'm just guessing: try select * from my_table where 'aas1df' = tab_id.
Probably the datatype conversion works the other way than expected. In any case you should use the function UTL_RAW.CAST_TO_RAW.
Edited:
Is your table partitioned by TAB_ID? If yes, then there is something wrong with your design; you usually partition a table by some useful business value, NOT by a surrogate key.
If you know the PK value you do not need partition pruning at all. When Oracle traverses the PK index it gets a ROWID value. This ROWID contains the file number, the block ID and the row number within the block, so Oracle can access the row directly.
HEXTORAW enables partition pruning.
In the sample below the Pstart and Pstop are literal numbers, implying partition pruning occurs.
create table my_table
(
TAB_ID raw(16),
a number,
constraint my_table_pk primary key (tab_id)
)
partition by hash(tab_id) partitions 16;
explain plan for
select *
from my_table
where tab_id = hextoraw('1BCDB0E06E7C498CBE42B72A1758B432');
select * from table(dbms_xplan.display(format => 'basic +partition'));
Plan hash value: 1204448714
--------------------------------------------------------------------------
| Id | Operation | Name | Pstart| Pstop |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | |
| 1 | TABLE ACCESS BY GLOBAL INDEX ROWID| MY_TABLE | 2 | 2 |
| 2 | INDEX UNIQUE SCAN | MY_TABLE_PK | | |
--------------------------------------------------------------------------
