Oracle 12c Inline View Evaluation - oracle

A long time ago in a database far, far away, a developer wrote a query that relied on the order in which its predicates were written.
For example,
select x
from a, b
where NOT REGEXP_LIKE (a.column, '[^[:digit:]]')
and a.char_column = b.numeric_column;
(explain plan suggests a to_number conversion will be applied to a.char_column)
I think that, more by chance than by design, this "just works" in Oracle 11g. However, Oracle 12c does not adhere to the order of the predicates, so the query breaks with an invalid number exception. I'm aware that I could try to force 12c to evaluate the predicates in order by using the ORDERED_PREDICATES hint as follows:
select /*+ ORDERED_PREDICATES */ x
from a, b
where NOT REGEXP_LIKE (a.column, '[^[:digit:]]')
and a.char_column = b.numeric_column
... or cast one of the values using to_char for the comparison. The downside is that to_char could then operate on, say, a million rows. I think the following inline view is probably a better solution. Am I guaranteed that the inline view will be evaluated first?
select x
from b
inner join (
select only_rows_with_numeric_values as numeric_column
from a
where NOT REGEXP_LIKE (a.column, '[^[:digit:]]')
) c
on c.numeric_column = b.numeric_column;

About predicate order - look at https://jonathanlewis.wordpress.com/2015/06/02/predicate-order-2/
You should rewrite your last query as follows, adding rownum to the inline view, as described in the documentation (https://docs.oracle.com/database/121/SQLRF/queries008.htm#SQLRF52358):
select x
from b
inner join (
select only_rows_with_numeric_values as numeric_column,
rownum
from a
where NOT REGEXP_LIKE (a.column, '[^[:digit:]]')
) c
on c.numeric_column = b.numeric_column;
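Selecting rownum in the inline view should stop the optimizer from merging the view into the outer query, so its filter is applied before the join. Alternatively, simply use a hint such as /*+ no_unnest */ (for an inline view, the hint that controls view merging is /*+ NO_MERGE */).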
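If relying on how the optimizer treats the inline view feels fragile, one alternative that is sometimes suggested is to put the digit check and the conversion into a single CASE expression, since Oracle documents short-circuit evaluation for CASE. A minimal sketch, reusing the question's placeholder names (x, a.char_column, b.numeric_column) and assuming the regular expression is meant to test the same column that is joined on; treat it as an illustration rather than a tested fix for this schema:

```sql
-- Convert a.char_column only when it contains nothing but digits.
-- Non-numeric (and NULL) values yield NULL and so never match the join.
select x
from a
inner join b
  on case
       when not regexp_like(a.char_column, '[^[:digit:]]')
       then to_number(a.char_column)
     end = b.numeric_column;
```

The explicit to_number also removes the dependence on implicit conversion, whichever order the optimizer chooses for the remaining predicates.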

Related

how to compare one value against 2 values in Oracle

I want to compare a value against 2 values without using OR or DECODE. The value I want to compare with two values is the return code of a function. If I use OR or DECODE, I have to call the function twice, which causes a performance hit. Currently I am coding as below:
select *
from table1 t1, table2 t2
where t1.empid = t2.empid
and (myfunction(t2.balance) = t1.total OR myfunction(t2.balance) = -1)
Please suggest if there is a way to call function once and compare with 2 values.
To shorten your code you could use IN operator which acts like OR.
select *
from table1 t1
join table2 t2 on
t1.empid = t2.empid
and myfunction(t2.balance) in (t1.total, -1)
I've also replaced the old-fashioned join syntax in the WHERE clause with the JOIN keyword, which I'd advise you to use in your future SQL journeys.
It's also worth knowing that, even though the function is written twice, the database may be able to evaluate it only once per row, so I wouldn't be too concerned about it. Check the execution plan if in doubt.
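If the double call does turn out to matter, the function result can be computed once per table2 row in an inline view and then compared against both values. This is only a sketch: it reuses the question's names (myfunction, table1, table2), the alias fn_result is made up here, and the optimizer is free to merge the inline view, so the execution plan should still be checked.

```sql
select *
from table1 t1
join (
       -- fn_result is a hypothetical alias for the function's return code
       select t.*, myfunction(t.balance) as fn_result
       from table2 t
     ) t2
  on t1.empid = t2.empid
where t2.fn_result in (t1.total, -1);
```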

Nested select in hiveQL

In one of my use cases, I have two tables, flow and conf. The flow table contains a list of all flight data, with columns creationdate, datafilename and aircraftid. The conf table contains configuration information, with columns configdate, aircraftid and configurationame. Multiple versions of a configuration are created for one aircraft type. So, when we process a datafilename, we need to identify the aircraftid from the flow table and pick up the configuration from the conf table that was created just before the datafilename was created. I tried this:
FROM (
SELECT
F_FILE_CREATION_DATE,
F_FILE_ARCHIVED_RELATIVE_PATH,
F_FILE_ARCHIVED_NAME,
K_AIRCRAFT
from T_FLOW f )x left join
(
select c.config_date, c.aircraft_id, c.configuration from t_conf c
) y on y.aircraft_id = x.K_AIRCRAFT
select
x.F_FILE_CREATION_DATE,
x.F_FILE_ARCHIVED_RELATIVE_PATH,
x.F_FILE_ARCHIVED_NAME,
x.K_AIRCRAFT,
y.config_date,
y.aircraft_id,
y.configuration;
This picks up all the configurations created for the aircraft, which is expected, as there is no condition to check conf.config_date < flow.f_file_creation_date. I tried to include this condition like this:
FROM (
SELECT
F_FILE_CREATION_DATE,
F_FILE_ARCHIVED_RELATIVE_PATH,
F_FILE_ARCHIVED_NAME,
K_AIRCRAFT
from T_FLOW f )x join
(
select c.config_date, c.aircraft_id, c.FILEFILTER from t_conf c
) y on y.aircraft_id = x.K_AIRCRAFT where y.config_date < x.f_file_creation_date
select
x.F_FILE_CREATION_DATE,
x.F_FILE_ARCHIVED_RELATIVE_PATH,
x.F_FILE_ARCHIVED_NAME,
x.K_AIRCRAFT,
y.config_date,
y.aircraft_id,
y.filefilter;
This time it failed with the error
required (...)+ loop did not match anything at input 'where' in statement
Can someone give me a hint or two about where I am going wrong and how to fix this?
select f.f_file_creation_date
,f.f_file_archived_relative_path
,f.f_file_archived_name
,f.k_aircraft
,c.config_date
,c.aircraft_id
,c.filefilter
from t_flow as f
join (select config_date
,aircraft_id
,filefilter
,lead (config_date,1,date '3000-01-01') over
(
partition by aircraft_id
order by config_date
) as next_config_date
from t_conf
) c
on c.aircraft_id =
f.k_aircraft
where f.f_file_creation_date >= c.config_date
and f.f_file_creation_date < c.next_config_date
Please read carefully
Posting a question
When you post a data related question -
Supply a data sample: source data + required results.
It is going to be more clear than any explanation you give.
It will also supply a common background for further discussions and a way for you and others to verify the correctness of the given solutions.
Supply the size properties (records/volume) of the tables.
It is important for performance considerations and might impact the given solution.
SQL
Hive currently does not support any JOIN condition type other than equijoin (e.g. t1.X = t2.X and t1.Y = t2.Y). This is why you get an error.
If you are doing an inner join (and not outer join) then you can move the non-equijoin conditions to the WHERE clause.
Stick to ISO SQL standard. There is a conventional order for SQL clauses: SELECT-FROM-WHERE...
You gain nothing from esoteric syntax except for esoteric error messages.
There is no reason whatsoever to use sub-queries in order to narrow the column list.
Just to make it perfectly clear: there isn't any performance gain in doing that. Moreover, if it worked as you assume (and it does not), the performance would be worse, not better.
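The point about non-equijoin conditions can be sketched with the question's own tables. This is a minimal illustration, assuming the t_flow and t_conf columns shown in the question, and it only applies to inner joins, as noted above:

```sql
-- Keep only the equality condition in ON and move the
-- inequality to WHERE, in the conventional clause order.
select f.f_file_archived_name
      ,c.config_date
      ,c.filefilter
from t_flow f
join t_conf c
  on c.aircraft_id = f.k_aircraft
where c.config_date < f.f_file_creation_date;
```

Note that this still returns every earlier configuration for each file; picking only the most recent one needs something like the LEAD window shown in the first query of this answer.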
I can't reproduce your error, so I guess your query is valid.
Which version of Hive do you use? I tested this query with Hive 2.1.1.
DROP TABLE IF EXISTS t_flow;
CREATE TABLE IF NOT EXISTS t_flow (
f_file_creation_date DATE
, f_file_archived_relative_path STRING
, f_file_archived_name STRING
, k_aircraft STRING
);
-- Conf table contains configuration information.
-- It has columns configdate, aircraftid, configurationame
DROP TABLE IF EXISTS t_conf;
CREATE TABLE IF NOT EXISTS t_conf (
config_date DATE
, aircraft_id STRING
, filefilter STRING
);
SELECT
x.f_file_creation_date,
x.f_file_archived_relative_path,
x.f_file_archived_name,
x.k_aircraft,
y.config_date,
y.aircraft_id,
y.filefilter
FROM
(SELECT
f_file_creation_date,
f_file_archived_relative_path,
f_file_archived_name,
k_aircraft
FROM t_flow f) x
JOIN
(SELECT
c.config_date,
c.aircraft_id,
c.filefilter
FROM t_conf c) y
ON y.aircraft_id = x.k_aircraft
WHERE y.config_date < x.f_file_creation_date;

Can we perform nonequi join with outer join?

Can we perform an outer join with inequality operators?
When I tried, I got the expected result for the right outer join, but it's not working for the left outer join.
SELECT EMP.ENAME,EMP.SALARY,SALG.SALARY_GRADE
FROM EMPLOYEE EMP , SALARY_GRADES SALG
WHERE EMP.SAL BETWEEN SALG.FROM_RANGE(+) AND SALG.TO_RANGE
The above query generates the same result as an inner join, whereas the query below works fine.
SELECT EMP.ENAME,EMP.SALARY,SALG.SALARY_GRADE
FROM EMPLOYEE EMP , SALARY_GRADES SALG
WHERE EMP.SAL(+) BETWEEN SALG.FROM_RANGE AND SALG.TO_RANGE
I meant to say that the right outer join is working fine but the left outer join is not.
Ummm, yes. Did you create a simple test case to demonstrate? Please always do this.
Both LEFT and RIGHT JOINs work fine. Given the following schema:
create table a (
id number
, val number );
insert all
into a values (1, 1)
into a values (2, 2)
into a values (3, 5)
select * from dual;
create table b (
id number
, min_val number
, max_val number );
insert all
into b values (1, 1, 1)
into b values (2, 1, 6)
into b values (3, 4, 6)
into b values (3, 10, 12)
select * from dual;
These two queries return the expected data. Please note my use of ANSI joins.
select *
from a
left outer join b
on a.val between b.min_val and b.max_val;
select *
from a
right outer join b
on a.val between b.min_val and b.max_val;
Here's the proof.
If you're ever in any doubt as to whether there is a problem with the database or your code you should assume either that your code is incorrect or that the data in your database simply does not exist. It's highly unlikely to be the database itself.
A very good way to test this is to do as I have done, create a very simple example. A short, self-contained, correct example that demonstrates the concepts you're using. You can then apply this to your own code to work out where you might have been going wrong.
You've commented:
Thanks for your answer but......when I insert one more record with 4
as id and 50 as value using insert into a values(4,50); then if i
query using oracle proprietary syntax like select * from a, b where
a.val between b.min_val(+) and b.max_val; I am not getting inserted
record in the result...? It is working with ansi syntax but not with
traditional syntax.....
So, this implies that your query using the Oracle proprietary syntax is incorrect. I much prefer the ANSI standard, as it's extremely obvious if you've done something wrong and it's portable. However, if you want to use the Oracle syntax, the reason it fails is that you've turned the query into an INNER JOIN by not stating that both items in the BETWEEN are part of the OUTER JOIN. Mark both with (+):
select *
from a
, b
where a.val between b.min_val(+) and b.max_val(+);
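To see why the partially marked version behaves like an inner join, it may help to expand BETWEEN into its two comparisons. The sketch below uses the tables a and b from above; the ANSI form is my reading of what the partially marked query amounts to, not output taken from a real run.

```sql
-- BETWEEN is shorthand for two comparisons; only the first carries (+).
select *
from a
   , b
where a.val >= b.min_val(+)
and   a.val <= b.max_val;

-- Roughly equivalent ANSI form: the unmarked comparison acts as a filter
-- applied after the outer join, so rows of a without a match in b
-- (where b.max_val is NULL) are discarded again.
select *
from a
left outer join b
  on a.val >= b.min_val
where a.val <= b.max_val;
```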

Join in Cursor query or two cursors, which is faster?

I need some suggestions for my cursor which is expected to run against millions of records. Here is my cursor query.
CURSOR items_cursor IS -- Brings only records that need to be updated
SELECT a.*, b.* FROM
( SELECT DataId, Name, VersionNum, OwnerId, SubType, LEVEL Lev FROM DTree
START WITH ParentId = startFrom CONNECT BY PRIOR DataId= ABS(ParentId) -- Brings ABS of ParentId
)a,
(
SELECT o.DataId pDataId, o.Permissions OwnerPerm, p.Permissions PublicPerm FROM DTreeAcl o, DTreeAcl p WHERE
o.DataId=p.Dataid AND o.AclType=1 AND p.AclType=3 AND (o.Permissions != ownerPerm OR p.Permissions != publicPerm)
)b
WHERE a.Lev >= 1 AND a.Lev <= 3 AND a.DataId = b.pDataId;
Is it better to fetch the data from the second table in another cursor inside the first cursor, rather than joining everything in the first cursor itself?
A database is built to join. In the vast majority of cases, you're better off letting the database do the join in SQL rather than trying to write your own in PL/SQL.
The only way you'd be better off writing the join in PL/SQL would be if you know that you want a nested loop join and the Oracle optimizer chooses a much less efficient plan. In that case, though, you'd be better off getting the optimizer to give you the plan you want rather than writing a nested loop join in PL/SQL.
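As a rough illustration of getting the optimizer to give you the plan you want, Oracle's LEADING and USE_NL hints can request a nested loop join between two row sources. The sketch below uses a simplified version of the DTreeAcl self-join from the question (the permission filters are left out) and is not tuned for the real data; whether a nested loop is actually the better plan has to be confirmed from the execution plan.

```sql
-- Ask for DTreeAcl rows "o" to drive a nested loop join into "p".
SELECT /*+ LEADING(o) USE_NL(p) */
       o.DataId      AS pDataId,
       o.Permissions AS OwnerPerm,
       p.Permissions AS PublicPerm
FROM   DTreeAcl o
JOIN   DTreeAcl p
  ON   p.DataId = o.DataId
WHERE  o.AclType = 1
AND    p.AclType = 3;
```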

Does Oracle implicit conversion depend on joined tables or views

I'm facing a weird problem. The query itself is huge, so I'm not going to post it here (though I could post it in case someone needs to see it). I have a table, TABLE1, with a CHAR(1) column, COL1. This column is queried as part of my query. When I filter the recordset on this column I write:
WHERE TAB1.COL1=1
This way the query runs and returns a very big resultset. I've recently updated one of the subqueries to speed up the query. But after this when I write WHERE TAB1.COL1=1 it does not return anything, but if I change it to WHERE TAB1.COL1='1' it gives me the records I need. Notice the WHERE clause with quotes and w/o them. So to make it more clear, before updating one of the sub-queries I did not have to put quotes to check against COL1 value, but after updating I have to. What feature of Oracle is it that I'm not aware of?
EDIT: I'm posting the two versions of the query in case someone finds them useful.
Version 1:
SELECT p.ssn,
pss.pin,
pd.doc_number,
p.surname,
p.name,
p.patronymic,
to_number(p.sex, '9') as sex,
citiz_c.short_name citizenship,
p.birth_place,
p.birth_day as birth_date,
coun_c.short_name as country,
di.name as leg_city,
trim( pa.settlement
|| ' '
|| pa.street) AS leg_street,
pd.issue_date,
pd.issuing_body,
irs.irn,
irs.tpn,
irs.reg_office,
to_number(irs.insurer_type, '9') as insurer_type,
TO_CHAR(sa.REG_CODE)
||CONVERT_INT_TO_DOUBLE_LETTER(TO_NUMBER(SUBSTR(TO_CHAR(sa.DOSSIER_NR, '0999999'), 2, 3)))
||SUBSTR(TO_CHAR(sa.DOSSIER_NR, '0999999'), 5, 4) CONVERTED_SSN_DOSSIER_NR,
fa.snr
FROM
(SELECT pss_t.pin,
pss_t.ssn
FROM EHDIS_INSURANCE.pin_ssn_status pss_t
WHERE pss_t.difference_status < 5
) pss
INNER JOIN SSPF_CENTRE.file_archive fa
ON fa.ssn = pss.ssn
INNER JOIN SSPF_CENTRE.persons p
ON p.ssn = fa.ssn
INNER JOIN
(SELECT pd_2.ssn,
pd_2.type,
pd_2.series,
pd_2.doc_number,
pd_2.issue_date,
pd_2.issuing_body
FROM
--The changed subquery starts here
(SELECT ssn,
MIN(type) AS type
FROM SSPF_CENTRE.person_documents
GROUP BY ssn
) pd_1
INNER JOIN SSPF_CENTRE.person_documents pd_2
ON pd_2.type = pd_1.type
AND pd_2.ssn = pd_1.ssn
) pd
--The changed subquery ends here
ON pd.ssn = p.ssn
INNER JOIN SSPF_CENTRE.ssn_archive sa
ON p.ssn = sa.ssn
INNER JOIN SSPF_CENTRE.person_addresses pa
ON p.ssn = pa.ssn
INNER JOIN
(SELECT i_t.irn,
irs_t.ssn,
i_t.tpn,
i_t.reg_office,
(
CASE i_t.insurer_type
WHEN '4'
THEN '1'
ELSE i_t.insurer_type
END) AS insurer_type
FROM sspf_centre.irn_registered_ssn irs_t
INNER JOIN SSPF_CENTRE.insurers i_t
ON i_t.irn = irs_t.new_irn
OR i_t.old_irn = irs_t.old_irn
WHERE irs_t.is_registration IS NOT NULL
AND i_t.is_real IS NOT NULL
) irs ON irs.ssn = p.ssn
LEFT OUTER JOIN SSPF_CENTRE.districts di
ON di.code = pa.city
LEFT OUTER JOIN SSPF_CENTRE.countries citiz_c
ON p.citizenship = citiz_c.numeric_code
LEFT OUTER JOIN SSPF_CENTRE.countries coun_c
ON pa.country_code = coun_c.numeric_code
WHERE pa.address_flag = '1'--Here's the column value with quotes
AND fa.form_type = 'Q3';
And Version 2:
SELECT p.ssn,
pss.pin,
pd.doc_number,
p.surname,
p.name,
p.patronymic,
to_number(p.sex, '9') as sex,
citiz_c.short_name citizenship,
p.birth_place,
p.birth_day as birth_date,
coun_c.short_name as country,
di.name as leg_city,
trim( pa.settlement
|| ' '
|| pa.street) AS leg_street,
pd.issue_date,
pd.issuing_body,
irs.irn,
irs.tpn,
irs.reg_office,
to_number(irs.insurer_type, '9') as insurer_type,
TO_CHAR(sa.REG_CODE)
||CONVERT_INT_TO_DOUBLE_LETTER(TO_NUMBER(SUBSTR(TO_CHAR(sa.DOSSIER_NR, '0999999'), 2, 3)))
||SUBSTR(TO_CHAR(sa.DOSSIER_NR, '0999999'), 5, 4) CONVERTED_SSN_DOSSIER_NR,
fa.snr
FROM
(SELECT pss_t.pin,
pss_t.ssn
FROM EHDIS_INSURANCE.pin_ssn_status pss_t
WHERE pss_t.difference_status < 5
) pss
INNER JOIN SSPF_CENTRE.file_archive fa
ON fa.ssn = pss.ssn
INNER JOIN SSPF_CENTRE.persons p
ON p.ssn = fa.ssn
INNER JOIN
--The changed subquery starts here
(SELECT ssn,
type,
series,
doc_number,
issue_date,
issuing_body
FROM
(SELECT ssn,
type,
series,
doc_number,
issue_date,
issuing_body,
ROW_NUMBER() OVER (partition BY ssn order by type) rn
FROM SSPF_CENTRE.person_documents
)
WHERE rn = 1
) pd --
--The changed subquery ends here
ON pd.ssn = p.ssn
INNER JOIN SSPF_CENTRE.ssn_archive sa
ON p.ssn = sa.ssn
INNER JOIN SSPF_CENTRE.person_addresses pa
ON p.ssn = pa.ssn
INNER JOIN
(SELECT i_t.irn,
irs_t.ssn,
i_t.tpn,
i_t.reg_office,
(
CASE i_t.insurer_type
WHEN '4'
THEN '1'
ELSE i_t.insurer_type
END) AS insurer_type
FROM sspf_centre.irn_registered_ssn irs_t
INNER JOIN SSPF_CENTRE.insurers i_t
ON i_t.irn = irs_t.new_irn
OR i_t.old_irn = irs_t.old_irn
WHERE irs_t.is_registration IS NOT NULL
AND i_t.is_real IS NOT NULL
) irs ON irs.ssn = p.ssn
LEFT OUTER JOIN SSPF_CENTRE.districts di
ON di.code = pa.city
LEFT OUTER JOIN SSPF_CENTRE.countries citiz_c
ON p.citizenship = citiz_c.numeric_code
LEFT OUTER JOIN SSPF_CENTRE.countries coun_c
ON pa.country_code = coun_c.numeric_code
WHERE pa.address_flag = 1--Here's the column value without quotes
AND fa.form_type = 'Q3';
I've put separating comments for the changed subqueries and the WHERE clause in both queries. Both versions of the subqueries return the same result, one of them is just slower, which is why I decided to update it.
With the most simplistic example I can't reproduce your problem on 11.2.0.3.0 or 11.2.0.1.0.
SQL> create table tmp_test ( a char(1) );
Table created.
SQL> insert into tmp_test values ('1');
1 row created.
SQL> select *
2 from tmp_test
3 where a = 1;
A
-
1
If I then insert a non-numeric value into the table I can confirm Chris' comment "that Oracle will rewrite tab1.col1 = 1 to to_number(tab1.col1) = 1", which implies that you only have numeric characters in the column.
SQL> insert into tmp_test values ('a');
1 row created.
SQL> select *
2 from tmp_test
3 where a = 1;
ERROR:
ORA-01722: invalid number
no rows selected
If you're interested in tracking this down you should gradually reduce the complexity of the query until you have found a minimal, reproducible, example. Oracle can pre-compute a conversion to be used in a JOIN, which as your query is complex seems like a possible explanation of what's happening.
Oracle explicitly recommends against relying on implicit conversion, so it's wiser not to use it at all, as you're finding out. For a start, there's no guarantee that your indexes will be used correctly.
Oracle recommends that you specify explicit conversions, rather than rely on implicit or automatic conversions, for these reasons:
SQL statements are easier to understand when you use explicit data type conversion functions.
Implicit data type conversion can have a negative impact on performance, especially if the data type of a column value is converted to that of a constant rather than the other way around.
Implicit conversion depends on the context in which it occurs and may not work the same way in every case. For example, implicit conversion from a datetime value to a VARCHAR2 value may return an unexpected year depending on the value of the NLS_DATE_FORMAT parameter.
Algorithms for implicit conversion are subject to change across software releases and among Oracle products. Behavior of explicit conversions is more predictable.
If you do only have numeric characters in the column I would highly recommend changing this to a NUMBER(1) column and I would always recommend explicit conversion to avoid a lot of pain in the longer run.
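A small sketch of the explicit alternative, using the tmp_test table from above. Comparing the CHAR column with a character value avoids converting the column at all, so the non-numeric row no longer gets in the way:

```sql
-- Character-to-character comparison: no TO_NUMBER is applied to column a.
select *
from tmp_test
where a = '1';

-- If the value arrives as a number, convert the value rather than the column.
select *
from tmp_test
where a = to_char(1);
```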
It's hard to tell without the actual query. What I would expect is that TAB1.COL1 is in some way different before and after the refactoring.
Candidate differences are NUMBER vs. CHAR(1) vs. CHAR(x>1) vs. VARCHAR2.
It is easy to introduce differences like this with subqueries where you join two tables which have different types in the join column and you return different columns in your subquery.
To hunt the issue down, you might want to check the exact datatypes in your query. I'm not sure how to do that right now, but one idea would be to put it in a view and use the SQL*Plus desc command on it.
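The view idea might look like the sketch below. It assumes the tmp_test table from the other answer as a stand-in for the real subquery, and v_type_check is a made-up name:

```sql
-- v_type_check is a hypothetical name; replace the select with the
-- subquery whose column types you want to inspect.
create or replace view v_type_check as
select a            as col_as_stored,
       to_number(a) as col_as_number
from tmp_test;

-- SQL*Plus command: lists each view column with its datatype.
desc v_type_check
```

Running the same check against the old and new versions of the changed subquery should show whether a column's datatype differs between them.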
