Using rownum in subquery - oracle

In an algorithm, the user passes a query, for instance:
SELECT o_orderdate, o_orderpriority FROM h_orders WHERE rownum <= 5
The query returns the following:
1996-01-02 5-LOW
1996-12-01 1-URGENT
1993-10-14 5-LOW
1995-10-11 5-LOW
1994-07-30 5-LOW
The algorithm needs the count for the select attributes (o_orderdate, o_orderpriority in the above example) and therefore it rewrites the query to:
SELECT o_orderdate, count(o_orderdate) FROM
(SELECT o_orderdate, o_orderpriority FROM h_orders WHERE rownum <= 5)
GROUP BY o_orderdate
This query returns the following:
1992-01-01 5
However the intended result is:
1996-12-01 1
1995-10-11 1
1994-07-30 1
1996-01-02 1
1993-10-14 1
Any idea how I could rewrite the parsing stage or how the user could pass a syntactically different query to receive the above results?

The rows returned by the inner query are essentially non-deterministic, as they depend on the order in which the optimiser identifies rows as part of the required data set. A change in execution plan due to modified predicates might change the order in which the rows come back, and new rows added to the table can also change which rows are included.
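One way to make the inner row set repeatable is to sort in an inline view before applying rownum. This is only a sketch: the question does not say which five rows are wanted, so the ORDER BY on the selected columns is an assumption, as is the alias order_count.

```sql
-- Sketch: sort first, then cut with ROWNUM, then aggregate.
-- The ORDER BY columns are an assumption; the question does not define an order.
SELECT o_orderdate, COUNT(o_orderdate) AS order_count
FROM (
  SELECT o_orderdate, o_orderpriority
  FROM (
    SELECT o_orderdate, o_orderpriority
    FROM h_orders
    ORDER BY o_orderdate, o_orderpriority
  )
  WHERE ROWNUM <= 5   -- applied to the already sorted rows
)
GROUP BY o_orderdate;
```

Because the sort covers every selected column, any rows that tie are identical, so the five rows kept are the same on every run as long as the table data does not change.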

If you always want n rows, you can either use DISTINCT o_orderdate in the inner query, which makes the GROUP BY redundant,
or add another outer select with rownum to get n of the grouped rows, like this:
select o_orderdate, counter from
(
SELECT o_orderdate, count(o_orderdate) as counter FROM
(SELECT o_orderdate, o_orderpriority FROM h_orders)
GROUP BY o_orderdate
)
WHERE rownum <= 5
However, the results will most likely be useless because they are non-deterministic (as mentioned by David Aldridge).

As your outer query makes no use of "o_orderpriority", why not just get rid of the subquery and simply query like this:
SELECT o_orderdate, count(o_orderdate) AS order_count
FROM h_orders
WHERE rownum <= 5
GROUP BY o_orderdate

Related

Hadoop Hive MAX gives multiple results

I am trying to get the maximum value of a count while selecting two columns, srcip and the maximum. But every time I include srcip, I have to add GROUP BY srcip at the end, and the result looks as if the max wasn't even there.
When I write the query like this it gives me the correct max value but I want to select srcip as well.
Select max(count1) as maximum
from (SELECT srcip,count(srcip) as count1 from data group by srcip)t;
But when I include srcip in the select, I get a result as if there were no max function:
Select srcip,max(count1) as maximum
from (SELECT srcip,count(srcip) as count1 from data group by srcip)t
group by srcip;
I would expect a single result from this, but I get multiple rows.
Does anyone have any ideas?
You can use ORDER BY count1 DESC with LIMIT 1 to get the srcip with the maximum count.
SELECT srcip, count(srcip) as count1
from data group by srcip
ORDER BY count1 DESC LIMIT 1
Let's consider you have data like this.
Table
Let's see what happens to the data when you run the following query.
Query
SELECT srcip,count(srcip) as count1 from data group by srcip
Output: table1
Now let's see what happens when you run your outer query on the above table.
Select srcip,max(count1) as maximum from table1 group by srcip
Same Output
The reason is that your query says to select srcip and the maximum count from each group of srcip. We have 3 groups, so we get 3 rows.
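To make this concrete, here is a small worked sketch. The rows listed in the comments are hypothetical sample data chosen to give three groups; they are not taken from the question.

```sql
-- Hypothetical rows in `data` (illustrative only):
--   srcip: 10.0.0.1, 10.0.0.1, 10.0.0.1, 10.0.0.2, 10.0.0.2, 10.0.0.3

-- Inner query: collapses the table to one row per srcip.
SELECT srcip, count(srcip) AS count1
FROM data
GROUP BY srcip;
-- 10.0.0.1  3
-- 10.0.0.2  2
-- 10.0.0.3  1     (row order is not guaranteed)

-- Outer query: grouping by srcip again leaves each group with exactly one row,
-- so max(count1) is simply that row's count1 and all three rows come back.
SELECT srcip, max(count1) AS maximum
FROM (SELECT srcip, count(srcip) AS count1 FROM data GROUP BY srcip) t
GROUP BY srcip;
-- 10.0.0.1  3
-- 10.0.0.2  2
-- 10.0.0.3  1
```

The maximum only becomes a single row when the outer query stops grouping by srcip, or when the rows are ranked and cut to one.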
The query below returns exactly one row with the maximum count and the associated srcip. It is written for the expected result; you may want to read more about SQL and the earlier comments, then move on to Hive analytical queries.
Some people could argue that there is a better way to optimize this query for your expected result, but it should motivate you to look further into Hive analytical queries.
select srcip, count1 as maximum from (select srcip, count1, row_number() over (order by count1 desc) as row_num from (select srcip, count(srcip) as count1 from data group by srcip) t) q1 where row_num = 1;

Passing a parameter to a WITH clause query in Oracle

I'm wondering if it's possible to pass one or more parameters to a WITH clause query, in a very simple way, doing something like this (that, obviously, is not working!):
with qq(a) as (
select a+1 as increment
from dual
)
select qq.increment
from qq(10); -- should get 11
Of course, my actual use is much more complicated, since the WITH clause would be in a subquery, and the parameters I'd pass are values taken from the main query....details upon request... ;-)
Thanks for any hint
OK.....here's the whole deal:
select appu.* from
(<quite a complex query here>) appu
where not exists
(select 1
from dual
where appu.ORA_APP IN
(select slot from
(select distinct slots.inizio,slots.fine from
(
with
params as (select 1900 fine from dual)
--params as (select app.ora_fine_attivita fine
-- where app.cod_agenda = appu.AGE
-- and app.ora_fine_attivita = appu.fine_fascia
--and app.data_appuntamento = appu.dataapp
--)
,
Intervals (inizio, EDM) as
( select 1700, 20 from dual
union all
select inizio+EDM, EDM from Intervals join params on
(inizio <= fine)
)
select * from Intervals join params on (inizio <= fine)
) slots
) slots
where slots.slot <= slots.fine
)
order by 1,2,3;
Without going into too much detail, the where condition should remove those records where 'appu.ORA_APP' matches one of the records that are supposed to be created in the (outer) 'slots' table.
The constants used in the example are good for a subset of records (a single 'appu.AGE' value); that's why I need to parametrize it, in order to use the commented 'params' table (to be replicated, then, in the 'Intervals' table).
I know that's not simple to analyze from scratch, but I tried to make it as clear as possible; feel free to ask for a numeric example if needed....
Thanks

Oracle SELECT * FROM LARGE_TABLE - takes minutes to respond

So I have a simple table with 5 or so columns, one of which is a clob containing some JSON data.
I am running
1. SELECT * FROM BIG_TABLE
2. SELECT * FROM BIG_TABLE WHERE ROWNUM < 2
3. SELECT * FROM BIG_TABLE WHERE ROWNUM = 1
4. SELECT * FROM BIG_TABLE WHERE ID=x
I expect that any fractionally intelligent relational database would return the data immediately. We are not imposing order by/group by clauses, so why not return the data as and when you find it?
Of all the forms of SELECT statement above, only 4. returned in under a second. This is unexpected for 1-3, which take between 1 and 10 minutes before the query shows any response in SQL Developer. SQL Developer has the standard SQL Array Fetch Size of 50 (JDBC fetch size of 50 rows), so at a minimum it is taking 1-10 minutes to return 50 rows from a simple table with no joins on a super high-performance RAC cluster backed by a fancy 4-tiered EMC disk subsystem.
Explain plans show a table scan. Fine, but why should I wait 1-10 minutes for the results with rownum in the WHERE clause?
What is going on here?
OK - I found the issue. ROWNUM does not operate the way I thought it did, and in the code above it never stops the full table scan.
This is because:
ROWNUM is assigned during predicate evaluation (the WHERE clause) and incremented only afterwards, i.e. a row makes it into the result set and then gets its rownum assigned.
In order to filter by rownum, it needs to already exist, something like ...
SELECT * FROM (SELECT * FROM BIG_TABLE) WHERE ROWNUM <= 1
In effect what this means is that there is no way to filter out the top 5 rows from a table without having first selected the entire table if no other filter criteria are involved.
I solved my problem like this...
SELECT * FROM (SELECT * FROM BIG_TABLE WHERE
DATE_COL BETWEEN :Date1 AND :Date2) WHERE ROWNUM < :x;
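The assignment rule described above is easiest to see with a lower bound on ROWNUM. This is a minimal sketch against the same BIG_TABLE; the alias rn is a name introduced only for illustration.

```sql
-- Returns no rows: the first candidate row is tentatively ROWNUM 1, fails
-- ROWNUM > 1 and is discarded, so the next candidate is again ROWNUM 1.
SELECT * FROM BIG_TABLE WHERE ROWNUM > 1;

-- Returns every row except the first one fetched: ROWNUM is captured as an
-- ordinary column (rn, a hypothetical alias) in the inline view, so the
-- outer filter sees values that already exist.
SELECT *
FROM (SELECT t.*, ROWNUM AS rn FROM BIG_TABLE t)
WHERE rn > 1;
```

Note that the second form has to number every row of the table before the outer filter can be applied, so it is not a way to make the query faster.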

Oracle pagination ROWNUM column>=value challenge

Having some trouble with Oracle pagination. Case:
Table with > 1 billion rows:
Measurement(Id Number, Classification VARCHAR, Value NUMBER)
Index:
ON Measurement(Value)
I need a query that gets the first match and the following 2000 matches ordered by Value. I also would like to use the index.
First idea:
SELECT * FROM Measurement WHERE Value >= 1234567890
AND ROWNUM <= 2000 ORDER BY Value ASC
Result:
The query just returns the first 2000 cases it can find in the table, starting from the top, where Value is higher or equal to 1234567890, and then orders that resultset ascending.
Second idea:
SELECT * FROM
(SELECT * FROM Measurement WHERE Value >= 1234567890 ORDER BY Value ASC)
WHERE ROWNUM <= 2000
Result:
Oracle does not understand that ROWNUM should limit the amount from the inner query, so Oracle decides to get all rows where Value is greater than or equal to 1234567890 first, and then order that giant resultset before returning the first 2000 rows. Because Oracle is guessing that most of the data in the table will be returned, it ignores any use of the index as well.
Neither of these approaches is acceptable, as the first one gives the wrong results and the second one takes hours.
Is pagination supported at all in Oracle?
You can use the following
SELECT * FROM
(SELECT Id, Classification, Value, ROWNUM Rank FROM Measurement WHERE Value >= 1234567890)
WHERE Rank <= 2000
order by Rank
You do not need to order in the sub-query; it is simply unnecessary.
The above is not pagination, just the first page, I would suppose.
Not sure if you have found a solution to your problem, but here are my two cents:
The first query will not meet your requirements, as it fetches 2000 arbitrary records that satisfy the filter and only then does the ORDER BY.
Coming to the second query:
Oracle will first execute the inner query and only then move to the outer query. So the rownum filter is applied only after the inner query has executed.
You can try the approach below, which does an INDEX FAST FULL SCAN. I have tested it on a table with 2.76 million rows, and it has a lower cost than the other approach:
SELECT * from Measurement
where value in ( SELECT VALUE FROM
(SELECT Value FROM Measurement
WHERE Value >= 1234567890 ORDER BY Value ASC)
WHERE ROWNUM <= 2000)
Hope it Helps
Vishad
I think I have found a potential solution. However, it's not a query.
declare
cursor c is
SELECT * FROM Measurement WHERE Value >= 1234567890 ORDER BY Value ASC;
l_rec c%rowtype;
begin
open c;
for i in 1 .. 2000
loop
fetch c into l_rec;
exit when c%notfound;
end loop;
close c;
end;
/
Kindly experiment with more options
SELECT *
FROM( SELECT /*+ FIRST_ROWS(2000) */
Id,
Classification,
Value,
ROW_NUMBER() OVER (ORDER BY Value) AS rn
FROM Measurement
where Value > 1234567889
)
WHERE rn <=2000;
Update 1: Force the use of the index on Value. Here IDX_ON_VALUE is the name of the index on Value in Measurement.
SELECT * FROM
(SELECT /*+ INDEX(a IDX_ON_VALUE) */ * FROM Measurement a
WHERE a.Value >= 1234567890
ORDER BY a.Value ASC)
WHERE ROWNUM <= 2000
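For pages beyond the first, a commonly used shape nests the ordered query twice so that both an upper and a lower row bound can be applied. The following is a sketch only: the bind variables :first_row and :last_row and the alias rn are hypothetical names, and whether the index on Value is used still depends on the optimizer.

```sql
SELECT Id, Classification, Value
FROM (
  SELECT q.*, ROWNUM AS rn              -- number the rows after sorting
  FROM (
    SELECT Id, Classification, Value
    FROM Measurement
    WHERE Value >= 1234567890
    ORDER BY Value ASC
  ) q
  WHERE ROWNUM <= :last_row             -- upper bound, e.g. 4000
)
WHERE rn >= :first_row;                 -- lower bound, e.g. 2001
```

The upper bound is applied directly with ROWNUM, which gives the optimizer a chance to stop early, while the lower bound has to go through the materialised rn column because a predicate such as ROWNUM >= 2001 would never match any row.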

How to get records randomly from the oracle database?

I need to select rows randomly from an Oracle DB.
Ex: Assume a table with 100 rows, how I can randomly return 20 of those records from the entire 100 rows.
SELECT *
FROM (
SELECT *
FROM table
ORDER BY DBMS_RANDOM.RANDOM)
WHERE rownum < 21;
SAMPLE() is not guaranteed to give you exactly 20 rows, but might be suitable (and may perform significantly better than a full query + sort-by-random for large tables):
SELECT *
FROM table SAMPLE(20);
Note: the 20 here is an approximate percentage, not the number of rows desired. In this case, since you have 100 rows, to get approximately 20 rows you ask for a 20% sample.
SELECT * FROM table SAMPLE(10) WHERE ROWNUM <= 20;
This is more efficient as it doesn't need to sort the Table.
SELECT column FROM
( SELECT column, dbms_random.value FROM table ORDER BY 2 )
where rownum <= 20;
In summary, two ways were introduced:
1) using an ORDER BY DBMS_RANDOM.VALUE clause
2) using the SAMPLE(percent) clause
The first way has the advantage of correctness: you will never fail to get a result if one actually exists. With the second way, you may get no result even though there are rows satisfying the query condition, since information is reduced during sampling.
The second way has the advantage of efficiency: you get the result faster and put a lighter load on your database.
I was warned by a DBA that my query using the first way puts load on the database.
You can choose one of the two ways according to your needs!
For huge tables, the standard way of sorting by dbms_random.value is not efficient, because you need to scan the whole table, and dbms_random.value is a pretty slow function that requires context switches. For such cases, there are 3 additional methods:
1: Use sample clause:
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/SELECT.html#GUID-CFA006CA-6FF1-4972-821E-6996142A51C6
for example:
select *
from s1 sample block(1)
order by dbms_random.value
fetch first 1 rows only
i.e. get 1% of all blocks, then sort them randomly and return just 1 row.
2: If you have an index/primary key on a column with evenly distributed values, you can get the min and max values, generate a random value in this range, and get the first row with a value greater than or equal to that randomly generated value.
Example:
-- big table with 1 million rows and a primary key on ID with evenly distributed values:
Create table s1(id primary key,padding) as
select level, rpad('x',100,'x')
from dual
connect by level<=1e6;
select *
from s1
where id>=(select
dbms_random.value(
(select min(id) from s1),
(select max(id) from s1)
)
from dual)
order by id
fetch first 1 rows only;
3: get random table block, generate rowid and get row from the table by this rowid:
select *
from s1
where rowid = (
select
DBMS_ROWID.ROWID_CREATE (
1,
objd,
file#,
block#,
1)
from
(
select/*+ rule */ file#,block#,objd
from v$bh b
where b.objd in (select o.data_object_id from user_objects o where object_name='S1' /* table_name */)
order by dbms_random.value
fetch first 1 rows only
)
);
To randomly select 20 rows I think you'd be better off selecting the lot of them randomly ordered and selecting the first 20 of that set.
Something like:
Select *
from (select *
from table
order by dbms_random.value) -- you can also use DBMS_RANDOM.RANDOM
where rownum < 21;
Best used for small tables to avoid selecting large chunks of data only to discard most of it.
Here's how to pick a random sample out of each group:
SELECT GROUPING_COLUMN,
MIN (COLUMN_NAME) KEEP (DENSE_RANK FIRST ORDER BY DBMS_RANDOM.VALUE)
AS RANDOM_SAMPLE
FROM TABLE_NAME
GROUP BY GROUPING_COLUMN
ORDER BY GROUPING_COLUMN;
I'm not sure how efficient it is, but if you have a lot of categories and sub-categories, this seems to do the job nicely.
-- Q. How do you select a random 50% of the records from a table?
Use this when you want a random sample by percentage:
SELECT *
FROM (
SELECT *
FROM table_name
ORDER BY DBMS_RANDOM.RANDOM)
WHERE rownum <= (select count(*) from table_name) * 50/100;
