How to select a certain number of random values with ODI 12c? - random

How can I make the following query work in odi 12c?
I need your support on which components I need to use and how.
SELECT *
FROM (
SELECT *
FROM table_name
ORDER BY DBMS_RANDOM.RANDOM)
WHERE rownum <= n;
I used the Sort component, but it gives me the warning shown in this image.
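For what it's worth, if an approximate sample is acceptable, Oracle's SAMPLE clause avoids the full random sort entirely; note it returns a percentage of rows rather than an exact count (the table name here is a placeholder):

```sql
-- Roughly 5% of the rows, chosen randomly; not an exact row count.
-- For an exact top-n random sample, the ORDER BY DBMS_RANDOM pattern
-- above is still needed.
SELECT * FROM my_table SAMPLE (5);
```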

Related

How can you check query performance with small data set

All the Oracles out here,
I have an Oracle PL/SQL procedure, but only a very small data set for the query to run against. I suspect that when the data gets large, the query might start performing badly. Are there ways I can check performance and take corrective measures even before the data builds up? If I wait for the data to build up, it might be too late.
Do you have any general and practical suggestions for me? Searching the internet did not get me anything convincing.
Better to build yourself some test data to get an idea of how things will perform. It's easy to get started, e.g.:
create table MY_TEST as select * from all_objects;
gives you approximately 50,000 rows, typically. You can scale that up easily with:
create table MY_TEST as select a.* from all_objects a ,
( select 1 from dual connect by level <= 10);
Now you have 500,000 rows
create table MY_TEST as select a.* from all_objects a ,
( select 1 from dual connect by level <= 10000);
Now you have 500,000,000 rows!
If you want unique values per row, then add rownum, e.g.:
create table MY_TEST as select rownum r, a.* from all_objects a ,
( select 1 from dual connect by level <= 10000);
If you want (say) 100,000 distinct values in a column, then apply TRUNC or MOD to that rownum. You can also use DBMS_RANDOM to generate random numbers, strings, etc.
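For example, combining those ideas (the column names here are just illustrative):

```sql
-- Sketch: ~500,000 rows with a unique key, 100,000 distinct values
-- in one column, and some random numeric and string data.
create table MY_TEST as
select rownum                            as r,          -- unique per row
       mod(rownum, 100000)               as bucket_id,  -- 100,000 distinct values
       trunc(dbms_random.value(1, 1000)) as rand_num,   -- random integer 1..999
       dbms_random.string('U', 10)       as rand_str,   -- random 10-char string
       a.*
from   all_objects a,
       ( select 1 from dual connect by level <= 10 );
```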
Also check out Morten's test data generator
https://github.com/morten-egan/testdata_ninja
for some domain-specific data, and also the Oracle sample schemas on GitHub, which can also be scaled using the techniques above.
https://github.com/oracle/db-sample-schemas

Oracle SQL parallel spooling automatically

I have a heavy query which spools data into a CSV file that is sent to users. I have manually created parallel sessions and am executing the query with a different filter condition in each, so that I can join all the spooled files into one single file at the end, thus reducing the time to generate the data (usually it takes about 10 hours; with parallel sessions it takes 2.5-3 hours).
My question is: how can I automate this, so that the script finds max(agreementid) and then distributes the work into X spool calls generating X files, where each file has at most, say, 100000 records?
Additional Explanation: I guess my question was not very clear. I will try and explain again.
I have a table/view with large amount of data.
I need to spool this data into a CSV file.
It takes a humongous amount of time to spool the CSV file.
I run parallel spools by doing the following:
a) Select .... from ... where agreementid between 1 and 1000000;
b) Select .... from ... where agreementid between 1000001 and 2000000;
and so on, and then spooling them individually in multiple sessions.
This helps me generate multiple files which I can then stitch together and share with users.
I need a script (I guess DOS-based or AIX-based) which will find the min and max agreementid in my table, create the spooling scripts automatically, and execute them through separate SQL sessions so that the files are generated automatically.
Not sure whether I could make myself clear enough.
Thanks, guys, for replying to my earlier query, but that was not what I was looking for.
A bit unclear what you want, but I think you want a query to find the low/high range of agreement_ids for x groups of ids (buckets). If so, try something like this (using 4 buckets in this example):
select bucket, min(agreement_id), max(agreement_id), count(1)
from (
select agreement_id, ntile(4) over (order by agreement_id) bucket
from my_table
)
group by bucket;
Edit: If your problem is the mess of spooling multiple queries and combining the results, I would rather opt for creating a single materialized view (using PARALLEL in the underlying query on the driving table) and refreshing it (complete, atomic_refresh=>false) when needed. Once refreshed, simply extract from the snapshot table (to CSV or whatever format you want).
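A sketch of that materialized-view approach (the names are hypothetical and the parallel degree is arbitrary):

```sql
-- One snapshot of the heavy query, built in parallel.
create materialized view mv_extract
  build deferred
  refresh complete on demand
as
select /*+ parallel(t 8) */ *
from   my_table t;

-- Complete refresh using truncate+insert rather than delete+insert:
exec dbms_mview.refresh('MV_EXTRACT', method => 'C', atomic_refresh => false);
```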
There might be a simpler way, but this generates four 'buckets' of IDs, and you could plug the min and max values into your parametrized filter condition:
select bucket, min(agreementid) as min_id, max(agreementid) as max_id
from (
select agreementid,
case when rn between 1 and cn / 4 then 1
when rn between (cn / 4) - 1 and 2 * (cn / 4) then 2
when rn between (2 * cn / 4) - 1 and 3 * (cn / 4) then 3
when rn between (3 * cn / 4) - 1 and cn then 4
end as bucket
from (
select agreementid, rank() over (order by agreementid) as rn,
count(*) over () as cn from agreements
)
)
group by bucket;
If you wanted an upper limit for each bucket rather than a fixed number of buckets then you could do:
select floor(rn / 100000), min(agreementid) as min_id, max(agreementid) as max_id
from (
select agreementid, rank() over (order by agreementid) as rn
from agreements
)
group by floor(rn / 100000);
And then pass each min/max to a SQL script, e.g. from a shell script calling SQL*Plus. The bucket number could be passed as well and be used as part of the spool file name, via a positional parameter.
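As a sketch of that idea (all names and parameters are hypothetical), a parametrized SQL*Plus script invoked once per bucket could look like:

```sql
-- extract_bucket.sql: run once per bucket, e.g.
--   sqlplus -s user/pwd @extract_bucket.sql 1 1 1000000
-- &1 = bucket number (used in the spool file name), &2 = min id, &3 = max id
set pagesize 0 linesize 32767 feedback off heading off trimspool on
spool extract_&1..csv
select agreementid || ',' || col1 || ',' || col2
from   agreements
where  agreementid between &2 and &3;
spool off
exit
```

A shell script would then launch one such sqlplus call per bucket in the background and concatenate the files once all sessions finish.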
I'm curious about what you've identified as the bottleneck though; have you tried running it as a parallel query inside the database, with a /*+ PARALLEL */ hint?

Compare two tables with minus operation in oracle

Some tables' data need to be updated (or deleted, or inserted) in my system,
but I want to know which data were updated, deleted, and inserted.
So before the data are changed, I will back up the table in a different schema,
just like this:
create table backup_table as select * from schema1.testtable
After the data are changed, I want to find the difference between backup_table
and testtable, and I want to save that difference into a table in the backup schema.
The SQL I will run is like this:
CREATE TABLE TEST_COMPARE_RESULT
AS
SELECT 'BEFORE' AS STATUS, T1.*
FROM (
SELECT * FROM backup_table
MINUS
SELECT * FROM schema1.testtable
) T1
UNION ALL
SELECT 'AFTER' AS STATUS, T2.*
FROM (
SELECT * FROM schema1.testtable
MINUS
SELECT * FROM backup_table
) T2
What I am worried about is that I have heard the MINUS operation uses
a lot of system resources. In my system, some tables will be over 700M in size. So I want to
know how Oracle will read the 700M of data: in memory (PGA?) or in the temporary tablespace?
And how should I make sure that the resources are enough for the compare operation?
MINUS is indeed a resource-intensive operation. It needs to read both tables and do sorts in order to compare them. However, Oracle has advanced techniques for doing this. It won't load both tables into memory (SGA) if it can't; it will instead use temporary space for the sorts. But I would recommend you just have a try: run the query and see what happens. The database won't suffer, and you can always stop the execution of the statement.
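To see how much memory and temp space the comparison actually used, one option (a sketch; it requires SELECT access to the V$ views, and the SQL_ID is a placeholder you look up for your statement) is:

```sql
-- After running the MINUS query, inspect its sort work areas;
-- LAST_TEMPSEG_SIZE is non-null when the sort spilled to temp.
select operation_type,
       round(last_memory_used / 1024 / 1024, 1)  as mem_mb,
       round(last_tempseg_size / 1024 / 1024, 1) as temp_mb
from   v$sql_workarea
where  sql_id = '&your_sql_id';
```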
What you can do to improve the performance of the query:
First, if there are columns that you are sure won't change, don't include them.
So it is better to write:
select a, b from t1
minus
select a, b from t2
than to use select * from t, if there are more than these two columns, because there is less work to do.
Second, if the amount of data to compare is really big for your system (too small a temp space), you could try to compare them in chunks:
select a, b from t1 where col between val1 and val2
minus
select a, b from t2 where col between val1 and val2
Sure, another possibility besides MINUS is to have some log columns, let's say updated_date. Selecting the rows where updated_date is greater than the start of the process will show you the updated records. But this depends on whether you can alter the data model and the ETL code.
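A sketch of that log-column approach (it assumes the ETL maintains an updated_date column; the names are hypothetical):

```sql
-- Rows touched since the process began; :process_start_time is a
-- bind variable captured before the ETL run starts.
select *
from   schema1.testtable
where  updated_date >= :process_start_time;
```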

hibernate oracle rownum issue

SELECT * FROM (
select *
from tableA
where ColumnA = 'randomText'
ORDER BY columnL ASC
) WHERE ROWNUM <= 25
On execution of this query, due to some Oracle optimization, the query takes about 14 minutes to execute. If I remove the WHERE clause, the query executes in seconds. Most of the columns of the table have indexes on them, including the ones mentioned above. I do not have much flexibility in the structure of the query, as I use Hibernate.
This query returns results instantly too, with the correct result:
SELECT *
FROM (
select *
from tableA,
dual
where ColumnA = 'randomText'
ORDER BY columnL ASC
) WHERE ROWNUM <= 25
Is there something I can do, using Hibernate?
UPDATE: I use EntityManager.createQuery(), and I use setMaxResults(25) and setFirstResult() too. The query above is what Hibernate's query looks like, based on the logs.
I can't get explain plans exactly matching your queries, but it seems Oracle is using a different index for the two queries.
Can you create an index containing columnA and columnL?
If you have an index only containing columnA, you MIGHT be able to drop that without a large effect on performance of other queries.
An alternative would be to add a hint to use the index used in the faster query. But this would require you to use native sql.
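A sketch of the composite index suggested above (the index name is hypothetical; the table and column names come from the question):

```sql
-- With an index on (ColumnA, columnL), Oracle can range-scan on
-- ColumnA = 'randomText', read rows already sorted by columnL,
-- and stop after the first 25 rows (COUNT STOPKEY) instead of
-- sorting the whole filtered set.
create index tableA_colA_colL_ix on tableA (ColumnA, columnL);
```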
Does this mean you are using Hibernate/JPA? If so, I guess you are using EntityManager.createNativeQuery() to create the query? Try removing your WHERE restriction and using .setMaxResults(25) on the Query instead.
Anyway, why do you need the outer select? Wouldn't
select *
from tableA
where ColumnA = 'randomText'
AND ROWNUM <= 25
ORDER BY columnL ASC
produce the desired results?

Does Oracle's ROWNUM build the whole table before it extract the rows you want?

I need to make a navigation panel that shows only a subset of a possibly large result set. This subset is the 20 records before and the 20 records after the current position. As I navigate the results through the navigation panel, I'll apply a sliding-window design using ROWNUM to get the next subset. My question is: does Oracle's ROWNUM build the whole table before it extracts the rows you want? Or is it intelligent enough to generate only the rows I need? I googled and couldn't find an explanation of this.
The pre-analytic-function method for doing this would be:
select col1, col2 from (
select col1, col2, rownum rn from (
select col1, col2 from the_table order by sort_column
)
where rownum <= 20
)
where rn > 10
The Oracle optimizer will recognize in this case that it only needs to get the top 20 rows to satisfy the inner query. It will likely have to look at all the rows (unless, say, the sort column is indexed in a way that lets it avoid the sort altogether) but it will not need to do a full sort of all the rows.
Your solution will not work (as Bob correctly pointed out) but you can use row_number() to do what you want:
SELECT col1,
col2
FROM (
SELECT col1,
col2,
row_number() over (order by some_column) as rn
FROM your_table
) t
WHERE rn BETWEEN 10 AND 20
Note that this solution has the added benefit that you can order the final result on a different criteria if you want to.
Edit: forgot to answer your initial question:
With the above solution, yes, Oracle will have to build the full result in order to work out the correct numbering.
With 11g and above you might improve your query using the query result cache.
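A minimal sketch of the result cache hint (11g+; it assumes the result cache is enabled on the instance, and the table and column names are placeholders):

```sql
-- Repeated executions of the same paging query can be served from
-- the server result cache until the underlying table changes.
select /*+ RESULT_CACHE */ col1, col2
from   your_table
order  by some_column;
```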
Concerning the question's title.
See http://www.orafaq.com/wiki/ROWNUM and this in-depth explanation by Tom Kyte.
Concerning the question's goal.
This should be what you're looking for: Paging with Oracle
I don't think your design is quite going to work out as you've planned. Oracle assigns values to ROWNUM in the order that rows are produced by the query: the first row produced is assigned ROWNUM=1, the second ROWNUM=2, and so on. Notice that in order for ROWNUM=21 to be assigned, the query must first return the first twenty rows; thus if you write a query which says
SELECT *
FROM MY_TABLE
WHERE ROWNUM >= 21 AND
ROWNUM <= 40
no rows will be returned, because in order for there to be rows with ROWNUM >= 21, the query must first return all the rows with ROWNUM <= 20.
I hope this helps.
It's an old question but you should try this - http://www.inf.unideb.hu/~gabora/pagination/results.html
