ORA-01722: invalid number only on select * from view, not when running the view's query directly - oracle

I'm getting ORA-01722: invalid number, but only when I run select * from theView, not when I run the view's query directly (the SQL inside the view's CREATE OR REPLACE...).
(I have run into and understood this error before, along with the pitfalls of running aggregates against NULL values, the fact that numbers shouldn't be stored in VARCHAR columns, etc., but I am struggling to understand this case.)

Usually you get it when Oracle evaluates the predicate that uses the column as a number before the predicate that is supposed to restrict the rows to numeric values.
Simple example:
create table t as
select '1' x, 'num' xtype from dual union all
select 'A' x, 'str' xtype from dual
/
create index t_ind on t(x);
In this very simple example we get ORA-01722 even though the filter xtype='num' is specified (inside the inline view) before x > 0:
select x
from (
select x
from t
where xtype='num'
) v
where v.x > 0;
ERROR:
ORA-01722: invalid number
Execution plan:
Plan hash value: 1601196873
---------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
|* 1 | TABLE ACCESS FULL| T | 1 | 6 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$F5BB74E1 / T#SEL$2
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter((TO_NUMBER("X")>0 AND "XTYPE"='num'))
As you can see from the plan, the inline view was merged and both predicates are at the same level. Oracle evaluates TO_NUMBER("X")>0 first, and that fails on the row where x = 'A'.
Now compare with this:
select/*+
no_merge(v)
opt_param('_optimizer_filter_pushdown' 'false')
*/
x
from (
select x
from t
where xtype='num'
) v
where v.x > 0;
X
-
1
Execution plan:
Plan hash value: 3578092569
----------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
|* 1 | VIEW | | 1 | 3 | 3 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| T | 1 | 6 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$2 / V#SEL$1
2 - SEL$2 / T#SEL$2
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(TO_NUMBER("V"."X")>0)
2 - filter("XTYPE"='num')
Read more about this: http://orasql.org/2013/06/10/too-many-function-executions/
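The following toy Python model makes the ordering problem concrete. It is not Oracle's executor. Each row is tested against a list of predicates from left to right with short-circuiting, and the hypothetical `to_number` helper raises on non-numeric text, the way TO_NUMBER raises ORA-01722. Once the view is merged, nothing forces the xtype test to run first, so evaluating the conversion first fails on the 'A' row.

The guarded predicate mirrors a common SQL workaround: move the conversion inside a CASE, for example CASE WHEN xtype = 'num' THEN TO_NUMBER(x) END > 0. Oracle evaluates CASE with short-circuit semantics.

```python
# Toy model of predicate evaluation order; NOT how Oracle executes SQL internally.

ROWS = [("1", "num"), ("A", "str")]  # same data as table t above: (x, xtype)


def to_number(text):
    """Stand-in for TO_NUMBER: raises ValueError on non-numeric input (like ORA-01722)."""
    return float(text)


def is_num(x, xtype):
    return xtype == "num"


def x_gt_zero(x, xtype):
    return to_number(x) > 0


def guarded_x_gt_zero(x, xtype):
    # Mirrors CASE WHEN xtype = 'num' THEN TO_NUMBER(x) END > 0:
    # the conversion is only attempted for rows that passed the type check.
    value = to_number(x) if xtype == "num" else None
    return value is not None and value > 0


def execute(predicates):
    """Apply the predicates left to right; all() stops at the first False."""
    try:
        return [x for x, xtype in ROWS if all(p(x, xtype) for p in predicates)]
    except ValueError:
        return "ORA-01722: invalid number"


# Order written in the view: type check first, conversion second.
print(execute([is_num, x_gt_zero]))             # ['1']
# Order in the merged plan: TO_NUMBER("X")>0 AND "XTYPE"='num'.
print(execute([x_gt_zero, is_num]))             # ORA-01722: invalid number
# The guarded conversion works whatever order the predicates end up in.
print(execute([guarded_x_gt_zero, is_num]))     # ['1']
print(execute([is_num, guarded_x_gt_zero]))     # ['1']
```

On 12.2 and later, TO_NUMBER(x DEFAULT NULL ON CONVERSION ERROR) is another way to make the conversion itself safe, independent of predicate order.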

Related

Oracle CASE expression in query plan

The background to this question is that I am currently investigating the query plans generated when Oracle VPD column masking policies are active. I would assume that the underlying rewrite is expressed as a CASE expression, e.g. SELECT CASE WHEN ss_quantity > 13 THEN ss_quantity ELSE NULL END ss_quantity FROM store_sales to represent the cell-level policy ss_quantity > 13 on the ss_quantity column of the store_sales table.
The goal is to be able to see and verify where in the query execution plan the CASE expression is executed. For instance, in a query such as:
select ss_quantity
from store_sales, date_dim
where ss_sold_date_sk = d_date_sk
and d_year = 1998;
However, the execution plan generated by DBMS_XPLAN does not tell me where the CASE expression is executed. The plan is below. From this plan I cannot tell whether the CASE expression is evaluated as part of the projection of the HASH JOIN (1) or as part of the projection of the TABLE ACCESS FULL (3).
Does anyone know a way to get this information?
PLAN_TABLE_OUTPUT
Plan hash value: 2770377741
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 117K| 5966K| 11576 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 117K| 5966K| 11576 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| DATE_DIM | 15 | 390 | 377 (1)| 00:00:01 |
| 3 | TABLE ACCESS FULL| STORE_SALES | 3035K| 75M| 11191 (1)| 00:00:01 |
----------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$F5BB74E1
2 - SEL$F5BB74E1 / "DATE_DIM"#"SEL$1"
3 - SEL$F5BB74E1 / "STORE_SALES"#"SEL$2"
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("SS_SOLD_DATE_SK"="D_DATE_SK")
2 - filter("D_YEAR"=1998)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1; rowset=256) "SS_QUANTITY"[NUMBER,22]
2 - (rowset=256) "D_DATE_SK"[NUMBER,22]
3 - (rowset=256) "SS_SOLD_DATE_SK"[NUMBER,22], "SS_QUANTITY"[NUMBER,22]
Note
-----
- dynamic statistics used: dynamic sampling (level=2)

Improving query performance to search in clob column in big table

I have a big table (about 4.6 million records) with many columns. I concatenated some of the columns and stored the result in a CLOB column, and that CLOB column has a CTXSYS.CONTEXT index. One column holds aliases of the name, so a name can appear in different forms, and it is included in the CLOB as well.
I search with the CONTAINS function using the fuzzy operator. For performance, I added two more columns to filter on, each with a bitmap index. I also analyzed the table with DBMS_STATS.GATHER_TABLE_STATS(), altered the index to use parallel degree 4, and increased SORT_AREA_SIZE to 8300000.
My problem is that a search takes from 2 to 5 minutes to execute.
Is there any way to improve performance and reduce execution time? For example, another approach to speed up searching, or changing the structure of my table by adding columns and searching in multiple columns.
Here is my query:
SELECT first_name,
last_name,
countries,
category,
aliases
FROM (SELECT first_name,
last_name,
countries,
category,
aliases,
rr
FROM (SELECT T.u_id,
T.first_name,
T.last_name,
T.countries,
T.category,
T.aliases,
ROWNUM rr,
all_data
FROM tbl_rsk_list_world T
WHERE t.countries = 'SPAIN'
AND category = 'Eng')
WHERE Contains(all_data, 'fuzzy(JOSE,60,,weight)', 1) > 0)
WHERE rr BETWEEN 1 AND 500
The execution plan is:
SQL> select * from TABLE(DBMS_XPLAN.DISPLAY);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2747287528
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20651 | 109M| 5724 (1
|* 1 | VIEW | | 20651 | 109M| 5724 (1
| 2 | COUNT | | | |
| 3 | PX COORDINATOR | | | |
| 4 | PX SEND QC (RANDOM)| :TQ10000 | 20651 | 4638K| 5724 (1
| 5 | PX BLOCK ITERATOR | | 20651 | 4638K| 5724 (1
|* 6 | TABLE ACCESS FULL| TBL_RSK_LIST_WORLD | 20651 | 4638K| 5724 (1
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CTXSYS"."CONTAINS"("ALL_DATA",'fuzzy(jose,60,,weight)',1)>0 AND "
AND "from$_subquery$_002"."RR">=1)
6 - filter("COUNTRIES"='SPAIN' AND "CATEGORY"='Eng')
20 rows selected
When I use the FIRST_ROWS and DOMAIN_INDEX_NO_SORT hints, the execution plan is:
SQL> select * from TABLE(DBMS_XPLAN.DISPLAY);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1488722846
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50 | 173K|
|* 1 | VIEW | | 50 | 173K|
| 2 | COUNT | | | |
| 3 | TABLE ACCESS BY INDEX ROWID| TBL_RSK_LIST_WORLD | 50 | 11500 |
|* 4 | DOMAIN INDEX | NDX_RSK_LIST_WORLD_CTX | | |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CATEGORY"='Eng' AND "COUNTRIES"='SPAIN' AND
"from$_subquery$_002"."RR"<=500 AND "from$_subquery$_002"."RR">=1)
4 - access("CTXSYS"."CONTAINS"("W"."ALL_DATA",'fuzzy(jose,60,,weight)',1)>0)
18 rows selected
but the performance is still bad :\

Optimizer always full-scans the table even though fetching only 3 rows

I have a table foo which was created like this.
CREATE TABLE foo AS SELECT * FROM all_objects;
CREATE INDEX foo_I1 ON foo(owner,object_type,status);
exec dbms_stats.gather_table_stats('hr','foo',method_opt=>'FOR ALL COLUMNS size AUTO');
I created an index on 3 columns and am running queries like the ones below:
select * from foo where status='INVALID';
select * from foo where status='VALID';
status='VALID' fetches about 71000 rows in a table of 71780 rows, and it does a full table scan, which is understandable. But status='INVALID', which fetches only 3 rows, also does a full table scan. The A-Rows and E-Rows values are also very different.
PLAN: same for both queries.
SQL_ID gdhy9j91gu9sm, child number 0
select /*+gather_plan_statistics */ * from foo where status='VALID'
Plan hash value: 1245013993
------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 50 |00:00:00.01 | 4 |
|* 1 | TABLE ACCESS FULL| FOO | 1 | 71773 | 50 |00:00:00.01 | 4 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("STATUS"='VALID')
Please explain this behaviour. Database version: Oracle 11.2.
A missing histogram is probably causing the full table scan. With SIZE AUTO, histograms are usually only created if the data is skewed and the column has been used in a relevant predicate.
Sometimes you need to run a query before gathering statistics, to let Oracle know that this column is important enough to deserve a histogram.
select * from foo where status='INVALID';
exec dbms_stats.gather_table_stats('hr','foo',method_opt=>'FOR ALL COLUMNS size AUTO');
Re-run the SELECT and now it can use the histogram. With the histogram Oracle knows that INVALID returns a small number of rows, and an index would be useful:
explain plan for select * from foo where status='INVALID';
select * from table(dbms_xplan.display);
Plan hash value: 1520589999
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 134 | 217 (0)| 00:00:01|
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| FOO | 1 | 134 | 217 (0)| 00:00:01|
|* 2 | INDEX SKIP SCAN | FOO_I1 | 1 | | 216 (0)| 00:00:01|
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("STATUS"='INVALID')
filter("STATUS"='INVALID')
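The following simplified Python model shows why the histogram matters. It assumes that STATUS holds only two values: 71777 'VALID' rows and 3 'INVALID' rows. The question suggests this split but does not state it exactly.

- **Without a histogram**, the basic equality estimate assumes a uniform spread (num_rows / num_distinct), so every literal gets the same estimate.
- **With a frequency histogram**, the optimizer can use the per-value counts.

Oracle's real calculation has more cases, for example values missing from the histogram, which this sketch ignores. The figures are illustrative and do not reproduce the E-Rows value in the question's plan.

```python
from collections import Counter

# Hypothetical STATUS column: two distinct values, heavily skewed.
STATUS = ["VALID"] * 71777 + ["INVALID"] * 3


def estimate_without_histogram(values, literal):
    # Uniform-distribution assumption: every distinct value gets an equal share,
    # so the literal itself does not influence the estimate.
    num_rows = len(values)
    num_distinct = len(set(values))
    return num_rows / num_distinct


def estimate_with_frequency_histogram(values, literal):
    # A frequency histogram stores a row count per distinct value.
    return Counter(values).get(literal, 0)


for literal in ("VALID", "INVALID"):
    print(literal,
          estimate_without_histogram(STATUS, literal),
          estimate_with_frequency_histogram(STATUS, literal))
```

Without the histogram, both literals are estimated at about 35890 rows, so a full scan looks reasonable for both. With the histogram, 'INVALID' drops to 3 rows, which makes an index path attractive.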

How is the cardinality determined in a query?

I have two queries which return the same result set, but when reviewing the execution plans they have different values of cardinality.
The queries are:
select acq_cod
, prp
, df_val
, descr
from acqdefprp
where (prp like '%pswd%' or prp like '%Pswd%')
and prp not like '%kno%'
and prp not like '%encr%';
and
select acq_cod
, prp
, df_val
, descr
from acqdefprp
where regexp_instr(prp, 'pswd', 1,1,0,'i' ) > 0
and regexp_instr(prp, '(encr)|(kno)', 1,1,0,'i' ) = 0;
The first query has the following explain plan:
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 65 | 4485 | 6 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| acqdefprp | 65 | 4485 | 6 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(("PRP" LIKE '%pswd%' OR "PRP" LIKE '%Pswd%')
AND "PRP" NOT LIKE '%kno%'
AND "PRP" NOT LIKE '%encr%')
And the explain plan for the second query is:
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 69 | 6 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| acqdefprp | 1 | 69 | 6 (0)| 00:00:01 |
--------------------------------------------------------------------------------
1 - filter(REGEXP_INSTR ("PRP",'(encr)|(kno)',1,1,0,'i') = 0
AND REGEXP_INSTR ("PRP",'pswd',1,1,0,'i') > 0 )
My question is: why is the cardinality different between the two execution plans? For the first plan the cardinality (Rows) is 65 and for the second it's 1.
My assumption is that this cardinality is the maximum number of rows that each condition would return if it were evaluated separately, all based on table statistics. That would explain why this assumed maximum is 65 for my first query, since its WHERE conditions are a little more permissive. It would also explain why the cardinality for the second query is 1, since REGEXP_INSTR is more restrictive.
If my assumptions are not correct, I'd really like to know what determines this cardinality number.
Thank you in advance for any help
In your case the expressions are too complex for the optimizer to use basic statistics to estimate the cardinality. In these cases (it doesn't seem that you use histograms, which might affect LIKE predicates) a fixed selectivity is used:
equality operator: 1%
inequality operator: 5%
So your:
LIKE example is approximately (5% + 5% - 5% * 5%) * 95% * 95% ≈ 8.8% of the table's rows. The - 5% * 5% term removes the intersection of the two OR branches, and each NOT LIKE keeps 100% - 5% = 95%.
REGEXP example is 1% * 5% = 0.05% of the table's rows (= 0 is an equality, > 0 an inequality).
Oracle also supports extended statistics where you can compute statistics and histograms for specific expressions or correlated columns.
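The selectivity arithmetic above can be checked with a few lines of Python. The 1% and 5% defaults are the values this answer quotes. Treating NOT LIKE as 95% and combining the OR branches by inclusion-exclusion follow the same formula. The 1000-row table and the `estimated_rows` helper are hypothetical illustrations; the question does not state the table size.

```python
EQ_SEL = 0.01    # default selectivity the answer quotes for '=' on a complex expression
INEQ_SEL = 0.05  # default selectivity the answer quotes for inequalities / LIKE


def sel_or(a, b):
    # Inclusion-exclusion for two independent predicates joined with OR.
    return a + b - a * b


def sel_not(a):
    return 1 - a


# (prp LIKE '%pswd%' OR prp LIKE '%Pswd%') AND NOT LIKE '%kno%' AND NOT LIKE '%encr%'
like_sel = sel_or(INEQ_SEL, INEQ_SEL) * sel_not(INEQ_SEL) * sel_not(INEQ_SEL)

# REGEXP_INSTR(...) = 0 (equality) AND REGEXP_INSTR(...) > 0 (inequality)
regexp_sel = EQ_SEL * INEQ_SEL


def estimated_rows(num_rows, selectivity):
    # Rough plan-style estimate: rounded, but never below one row.
    return max(1, round(num_rows * selectivity))


print(f"LIKE:   {like_sel:.4%}")    # LIKE:   8.7994%
print(f"REGEXP: {regexp_sel:.4%}")  # REGEXP: 0.0500%
print(estimated_rows(1000, like_sel), estimated_rows(1000, regexp_sel))  # 88 1
```

The one-row floor is why the REGEXP plan shows Rows = 1 even though the raw product is far below one row for small tables.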
You are comparing plans with plain WHERE conditions against plans with REGEXP_INSTR functions. Actually, it makes no difference which function you use: it is very difficult for Oracle to give a realistic estimate without executing the function.
For example, we can create a function -
CREATE OR REPLACE FUNCTION f_check(str IN VARCHAR2)
RETURN NUMBER IS
BEGIN
IF str LIKE 'A%' THEN
RETURN 1;
END IF;
RETURN -1;
END;
/
First select -
SELECT *
FROM tmptxt
WHERE dsc LIKE 'A%'
Plan hash value: 2928917536
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 121 | 4356 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TMPTXT | 121 | 4356 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DSC" LIKE 'A%')
and with function -
SELECT *
FROM tmptxt
WHERE f_check(dsc) = 1
Plan hash value: 2928917536
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 36 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TMPTXT | 1 | 36 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("F_CHECK"("DSC")=1)
These two queries give the same result, but the plan estimates differ. That is not very important here (a full scan in the first case and a full scan in the second). You just need to evaluate the whole plan rather than dwell on the numbers.
My assumption is that this cardinality is the maximum number of rows that will be returned by each condition....
No, cardinality is the CBO's estimate of how many rows the operation will return (technically always >= 1).
The cardinality is calculated either from the object statistics stored in the data dictionary or by dynamic sampling.
Dynamic sampling is more costly (the sample is taken at each hard parse) but can return much more precise results.
So one possible workaround to get a better estimate is to use dynamic sampling. Here is a small demo with level 10. That level is extreme and for demonstration only, as the whole table is scanned in the parsing step; but that is not a problem for a 779-row table, and the cardinality is exact:
create table tst as
select ltrim(to_char(rownum,'09999')) prp from dual connect by level <= 999999;
select count(*) from tst where prp like '%999%';
280
select count(*) from tst where regexp_instr(prp, '999', 1,1,0,'i' ) > 0;
280
Alter session set optimizer_dynamic_sampling=10;
EXPLAIN PLAN SET STATEMENT_ID = 'jara1' into plan_table FOR
select * from tst where prp like '%999%';
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 280 | 1400 | 467 (2)| 00:00:06 |
|* 1 | TABLE ACCESS FULL| TST | 280 | 1400 | 467 (2)| 00:00:06 |
--------------------------------------------------------------------------
1 - filter("PRP" IS NOT NULL AND "PRP" LIKE '%999%')
EXPLAIN PLAN SET STATEMENT_ID = 'jara1' into plan_table FOR
select * from tst where regexp_instr(prp, '999', 1,1,0,'i' ) > 0;
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 280 | 1400 | 479 (5)| 00:00:06 |
|* 1 | TABLE ACCESS FULL| TST | 280 | 1400 | 479 (5)| 00:00:06 |
--------------------------------------------------------------------------
1 - filter( REGEXP_INSTR ("PRP",'999',1,1,0,'i')>0)
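As a cross-check of the demo data, the following Python sketch counts how many generated strings contain '999'. It rests on two assumptions:

- ltrim(to_char(n,'09999')) yields a five-digit zero-padded string.
- Values above 99999, which do not fit the mask, come out as '#' characters.

Under those assumptions only 00001 to 99999 can match. That gives the 280 rows that both counts and both dynamic-sampling estimates report.

```python
def to_char_09999(n):
    """Model of LTRIM(TO_CHAR(n, '09999')) for positive integers.

    Five digits, zero-padded; numbers that do not fit the mask are shown
    as '#' characters (assumed six: mask width plus the sign position).
    """
    if n <= 99999:
        return f"{n:05d}"
    return "#" * 6


def count_matches(max_level, pattern="999"):
    # Equivalent of: select count(*) from tst where prp like '%999%'
    return sum(pattern in to_char_09999(n) for n in range(1, max_level + 1))


print(count_matches(999999))  # 280
```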

Using DECODE on a parameter in WHERE clause will shortcircuit using the index?

I have a stored procedure with multiple mandatory parameters and a SELECT statement inside it which has multiple conditions in its WHERE clause, like below:
SELECT *
FROM TABLE
WHERE column_1 = param_1
AND column_2 = param_2
AND column_3 = param_3;
This query works fine and it uses the indexes on the table correctly. But a change in requirements means the procedure must accept fewer parameters (maybe just the first two), and we want it to keep working with minimal changes to the stored procedure.
One of my suggestions was to use a DECODE function to handle each possibly NULL parameter, like this:
SELECT *
FROM TABLE
WHERE column_1 = param_1
AND column_2 = param_2
AND column_3 = DECODE(param_3, null, column_3, param_3);
My reasoning is that, because the function is not applied to the table column, the index will still be used. I have run some tests, and the query still works and uses the indexes in this situation.
But our architect still contradicts me (with no further explanation), saying that the query will not use the index because I'm using a function in the WHERE clause.
I'm not sure if my change is enough proof that it will always use the index, or if there are other situations which I should check and in which the index might not be used because of the DECODE function.
Any help / suggestions / information will be very much appreciated.
You are right. Test it and prove it.
Setup
SQL> CREATE TABLE t AS SELECT LEVEL id FROM dual CONNECT BY LEVEL <=10;
Table created.
SQL>
SQL> CREATE INDEX id_indx ON t(ID);
Index created.
Test case
Normal query, without any function:
SQL> set autot on explain
SQL>
SQL> SELECT * FROM t WHERE ID = 5;
ID
----------
5
Execution Plan
----------------------------------------------------------
Plan hash value: 1629656632
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ID_INDX | 1 | 3 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=5)
Using DECODE on the value (not on the column):
SQL> SELECT * FROM t WHERE ID = decode(5, NULL, 3, 5);
ID
----------
5
Execution Plan
----------------------------------------------------------
Plan hash value: 1629656632
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ID_INDX | 1 | 3 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=5)
Using NVL on the value (not on the column):
SQL> SELECT * FROM t WHERE ID = nvl(5, 3);
ID
----------
5
Execution Plan
----------------------------------------------------------
Plan hash value: 1629656632
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ID_INDX | 1 | 3 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=5)
In all three cases above, the index is used.
DECODE on the column:
SQL> SELECT * FROM t WHERE decode(ID, NULL, 3, 5) = 5;
ID
----------
1
2
3
4
5
6
7
8
9
10
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 3 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(DECODE(TO_CHAR("ID"),NULL,3,5)=5)
NVL on the column:
SQL> SELECT * FROM t WHERE nvl(ID, 3) = 3;
ID
----------
3
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 3 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(NVL("ID",3)=3)
SQL>
As expected, the index is not used, because you are applying a function to a column that only has a regular index. For that you would need a function-based index.
So you are right: you don't have to worry about index usage when the function is applied to the parameter value rather than to the column.
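Index usage aside, the DECODE approach is worth checking for NULL semantics. The following toy Python model uses None for SQL NULL and encodes two documented Oracle rules:

- DECODE treats two NULLs as equal, and returns NULL when nothing matches and no default is given.
- A comparison with = is never true if either side is NULL.

The sample values for column_3 are hypothetical. The model shows that DECODE needs its fourth argument (param_3) to return anything when a value is passed. Even the corrected form silently drops rows whose column_3 is NULL when param_3 is NULL. A commonly used alternative that keeps those rows is (param_3 IS NULL OR column_3 = param_3); whether it still uses the index should be tested the same way as above.

```python
# Toy model of Oracle NULL handling; None stands for SQL NULL.


def decode(expr, search, result, default=None):
    # DECODE considers two NULLs equal; with no match and no default it returns NULL.
    if expr == search:  # None == None is True, matching DECODE's rule
        return result
    return default


def sql_equals(a, b):
    # '=' with a NULL operand is UNKNOWN (None), which a WHERE clause treats as not true.
    if a is None or b is None:
        return None
    return a == b


COLUMN_3 = [10, 20, None]  # hypothetical column values


def where_decode_no_default(param_3):
    # column_3 = DECODE(param_3, NULL, column_3)
    return [c for c in COLUMN_3 if sql_equals(c, decode(param_3, None, c)) is True]


def where_decode_with_default(param_3):
    # column_3 = DECODE(param_3, NULL, column_3, param_3)
    return [c for c in COLUMN_3 if sql_equals(c, decode(param_3, None, c, param_3)) is True]


print(where_decode_no_default(20))      # []
print(where_decode_with_default(20))    # [20]
print(where_decode_with_default(None))  # [10, 20]
```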
