CREATE TABLE ak_temp
(
misc_1 varchar2(4000),
misc_2 varchar2(4000),
time_of_insert timestamp
);
Query 1(Original) :
select *
from ak_temp
where misc_1 in('ankush')
or misc_2 in('ankush')
Query 2 :
select *
from ak_temp
where 'ankush' in (misc_1,misc_2)
Hi, I have somewhat similar problem with the queries, i want to avoid Query 1 since the cost in our live environment is coming bit on higher side so i have figured out Query 2 with less cost, are these both functionally equivalent?
The two are functionally equivalent and should produce the same query plan, so one should not have a performance advantage over the other (or any such advantage would be miniscule). Oracle might be smart enough to take advantage of two indexes, one on ak_temp(misc_1) and one on ak_temp(misc_2).
However, you might also consider:
select t.*
from ak_temp t
where misc_1 = 'ankush'
union
select t.*
from ak_temp t
where misc_2 = 'ankush';
This version will definitely take advantage of indexes, but the union can slow things down if many rows match the two conditions.
EDIT:
To avoid the union, you can do:
select t.*
from ak_temp t
where misc_1 = 'ankush'
union all
select t.*
from ak_temp t
where misc_2 = 'ankush' and (misc_1 <> 'ankush' and misc_1 is not null); -- or use `lnnvl()`
Related
I'm using jOOQ to delete a variable number of rows from an Oracle database:
List<Integer> ids = Lists.newArrayList(1, 2, 3, 4);
db.deleteFrom(MESSAGE)
.where(MESSAGE.ID.in(ids))
.execute();
However, this means that a variable number of bind variables is used. This leads to the problem that Oracle always does a hard parse.
I have tried using the unnest or table function to create a statement with only one bind variable. Unfortunately, this does not seem to work. jOOQ creates statements with multiple bind variables and union all statements:
db.deleteFrom(MESSAGE)
.where(MESSAGE.ID.in(
select(field("*", Long.class))
.from(table(ids))
))
.execute();
LoggerListener DEBUG - Executing query : delete from "MESSAGE" where "MESSAGE"."ID" in (select * from ((select ? "COLUMN_VALUE" from dual) union all (select ? "COLUMN_VALUE" from dual) union all (select ? "COLUMN_VALUE" from dual) union all (select ? "COLUMN_VALUE" from dual)) "array_table")
LoggerListener DEBUG - -> with bind values : delete from "MESSAGE" where "MESSAGE"."ID" in (select * from ((select 1 "COLUMN_VALUE" from dual) union all (select 2 "COLUMN_VALUE" from dual) union all (select 3 "COLUMN_VALUE" from dual) union all (select 4 "COLUMN_VALUE" from dual)) "array_table")
The javadoc of the unnest function recommends using the table function for Oracle
Create a table from an array of values.
This is equivalent to the TABLE function for H2, or the UNNEST function in HSQLDB and Postgres
For Oracle, use table(ArrayRecord) instead, as Oracle knows only typed arrays
In all other dialects, unnesting of arrays is emulated using several UNION ALL connected subqueries.
Altough I use the table function, I still get the emulated UNION ALL statement.
TABLE operator specifics
Oracle's TABLE(collection) function doesn't accept arbitrary arrays, only SQL table types, such as:
CREATE TYPE t AS TABLE OF NUMBER
You could create such a type and use the code generator to create a TRecord, which you could pass to that TABLE operator:
from(table(new TRecord(ids)))
Unfortunately, such a type is required in Oracle. In its absence, the UNION ALL emulation seems to be the best way, but that doesn't solve your problem.
Note the TABLE operator has its own caveats, mainly that it produces poor cardinality estimates, so for small tables, it may not be the best choice. Though, do measure yourself! I've documented this here in a blog post
Avoiding cursor cache contention
jOOQ has a nice feature called IN list padding, which mitigates most of the dynamic SQL problems that arise from such hard parses. In short, it prevents arbitrary IN list sizes by repeating the last element to pad the list up to 2^n elements, getting you:
id IN (?) -- For 1 element
id IN (?, ?) -- For 2 elements
id IN (?, ?, ?, ?) -- For 3-4 elements
id IN (?, ?, ?, ?, ?, ?, ?, ?) -- For 5-8 elements
It's a hack for when the TABLE operator is too complex. It works by reducing the number of possible predicates logarithmically, which might just be good enough.
Other alternatives
You can always also use:
Temp tables, and batch insert the IDs in there
Use an actual semi join: ID IN (SELECT original_id FROM original_table) (this is usually the best approach, if possible)
I'm wondering if it's possible to pass one or more parameters to a WITH clause query; in a very simple way, doing something like this (taht, obviously, is not working!):
with qq(a) as (
select a+1 as increment
from dual
)
select qq.increment
from qq(10); -- should get 11
Of course, the use I'm going to do is much more complicated, since the with clause should be in a subquery, and the parameter I'd pass are values taken from the main query....details upon request... ;-)
Thanks for any hint
OK.....here's the whole deal:
select appu.* from
(<quite a complex query here>) appu
where not exists
(select 1
from dual
where appu.ORA_APP IN
(select slot from
(select distinct slots.inizio,slots.fine from
(
with
params as (select 1900 fine from dual)
--params as (select app.ora_fine_attivita fine
-- where app.cod_agenda = appu.AGE
-- and app.ora_fine_attivita = appu.fine_fascia
--and app.data_appuntamento = appu.dataapp
--)
,
Intervals (inizio, EDM) as
( select 1700, 20 from dual
union all
select inizio+EDM, EDM from Intervals join params on
(inizio <= fine)
)
select * from Intervals join params on (inizio <= fine)
) slots
) slots
where slots.slot <= slots.fine
)
order by 1,2,3;
Without going in too deep details, the where condition should remove those records where 'appu.ORA_APP' match one of the records that are supposed to be created in the (outer) 'slots' table.
The constants used in the example are good for a subset of records (a single 'appu.AGE' value), that's why I should parametrize it, in order to use the commented 'params' table (to be replicated, then, in the 'Intervals' table.
I know thats not simple to analyze from scratch, but I tried to make it as clear as possible; feel free to ask for a numeric example if needed....
Thanks
i am trying to better a query. I have a dataset of ticket opened. Every ticket has different rows, every row rappresent an update of the ticket. There is a field (dt_update) that differs it every row.
I have this indexs in the st_remedy_full_light.
IDX_ASSIGNMENT (ASSIGNMENT)
IDX_REMEDY_INC_ID (REMEDY_INC_ID)
IDX_REMDULL_LIGHT_DTUPD (DT_UPDATE)
Now, the query is performed in 8 second. Is high for me.
WITH last_ticket AS
( SELECT *
FROM st_remedy_full_light a
WHERE a.dt_update IN
( SELECT MAX(dt_update)
FROM st_remedy_full_light
WHERE remedy_inc_id = a.remedy_inc_id
)
)
SELECT remedy_inc_id, ASSIGNMENT FROM last_ticket
This is the plan
How i could to better this query?
P.S. This is just a part of a big query
Additional information:
- The table st_remedy_full_light contain 529.507 rows
You could try:
WITH last_ticket AS
( SELECT remedy_inc_id, ASSIGNMENT,
rank() over (partition by remedy_inc_id order by dt_update desc) rn
FROM st_remedy_full_light a
)
SELECT remedy_inc_id, ASSIGNMENT FROM last_ticket
where rn = 1;
The best alternative query, which is also much easier to execute, is this:
select remedy_inc_id
, max(assignment) keep (dense_rank last order by dt_update)
from st_remedy_full_light
group by remedy_inc_id
This will use only one full table scan and a (hash/sort) group by, no self joins.
Don't bother about indexed access, as you'll probably find a full table scan is most appropriate here. Unless the table is really wide and a composite index on all columns used (remedy_inc_id,dt_update,assignment) would be significantly quicker to read than the table.
I have two hive tables (t1 and t2) that I would like to compare. The second table has 5 additional columns that are not in the first table. Other than the five disjoint fields, the two tables should be identical. I am trying to write a query to check this. Here is what I have so far:
SELECT * FROM t1
UNION ALL
select * from t2
GROUP BY some_value
HAVING count(*) == 2
If the tables are identical, this should return 0 records. However, since the second table contains 5 extra fields, I need to change the second select statement to reflect this. There are almost 60 column names so I would really hate to write it like this:
SELECT * FROM t1
UNION ALL
select field1, field2, field3,...,fieldn from t2
GROUP BY some_value
HAVING count(*) == 2
I have looked around and I know there is no select * EXCEPT syntax, but is there a way to do this query without having to explicity name each column that I want included in the final result?
You should have used UNION DISTINCT for the logic you are applying.
However, the number and names of columns returned by each select_statement have to be the same otherwise a schema error is thrown.
You could have a look at this Python program that handles such comparisons of Hive tables (comparing all the rows and all the columns), and would show you in a webpage the differences that might appear: https://github.com/bolcom/hive_compared_bq
To skip the 5 extra fields, you could use the "--ignore-columns" option.
For a pair of cursors where the total number of rows in the resultset is required immediately after the first FETCH, ( after some trial-and-error ) I came up with the query below
SELECT
col_a,
col_b,
col_c,
COUNT(*) OVER( PARTITION BY 1 ) AS rows_in_result
FROM
myTable JOIN theirTable ON
myTable.col_a = theirTable.col_z
GROUP BY
col_a, col_b, col_c
ORDER BY
col_b
Now when the output of the query is X rows, rows_in_result reflects this accurately.
What does PARTITION BY 1 mean?
I think it probably tells the database to partition the results into pieces of 1-row each
It is an unusual use of PARTITION BY. What it does is put everything into the same partition so that if the query returns 123 rows altogether, then the value of rows_in_result on each row will be 123 (as its alias implies).
It is therefore equivalent to the more concise:
COUNT(*) OVER ()
Databases are quite free to add restrictions to the OVER() clause. Sometimes, either PARTITION BY [...] and/or ORDER BY [...] are mandatory clauses, depending on the aggregate function. PARTITION BY 1 may just be a dummy clause used for syntax integrity. The following two are usually equivalent:
[aggregate function] OVER ()
[aggregate function] OVER (PARTITION BY 1)
Note, though, that Sybase SQL Anywhere and CUBRID interpret this 1 as being a column index reference, similar to what is possible in the ORDER BY [...] clause. This might appear to be a bit surprising as it imposes an evaluation order to the query's projection. In your case, this would then mean that the following are equivalent
COUNT(*) OVER (PARTITION BY 1)
COUNT(*) OVER (PARTITION BY col_a)
This curious deviation from other databases' interpretation allows for referencing more complex grouping expressions.