inserting big random data into blob column - oracle

I need to debug some flow and to do that, I need to generate at least some 100k rows, using say:
insert into a.b(DATA, ID, ORDER_ID)
SELECT UTL_RAW.CAST_TO_RAW('too short!'),
raw_to_guid(sys_guid()),
level
FROM dual CONNECT BY level <= 5;
but that data are too small! I don't care what data is it, it can be all ones. I can generate random bytes base64 encoded, but I have no idea how to pass it into oracle, but it would be even better to fix somehow the code above and generate it on db side. Can someone advice?

You can define a function to generate a BLOB from random data; and since Oracle 12c that can be an on-the-fly function defined in a with clause (read more):
insert /*+ WITH_PLSQL */ into b (data, id, order_id)
with
function generate_blob return blob as
l_blob blob;
begin
dbms_lob.createtemporary(l_blob, false);
for i in 1..50 loop
dbms_lob.append(
l_blob,
to_blob(utl_raw.cast_to_raw(dbms_random.string('p', 2000)))
);
end loop;
return l_blob;
end generate_blob;
select generate_blob,
raw_to_guid(sys_guid()),
level
from dual connect by level <= 5;
which inserts five rows with a random 100-byte BLOB value; this query shows the length and the first 20 bytes to show they are random:
select order_id, id, lengthb(data), rawtohex(dbms_lob.substr(data, 20, 1)) from b
ORDER_ID
ID
LENGTHB(DATA)
RAWTOHEX(DBMS_LOB.SUBSTR(DATA,20,1))
1
AAFAB7EF-11F8-EED5-E053-182BA8C00F30
100000
773C2F707E7D406D712D575B3C3D3135273F276B
2
AAFAB7EF-12F8-EED5-E053-182BA8C00F30
100000
676E282367235D455E2F4F547872315A59705A31
3
AAFAB7EF-13F8-EED5-E053-182BA8C00F30
100000
422D5C503F64733A623C215A49696D4640643B3F
4
AAFAB7EF-14F8-EED5-E053-182BA8C00F30
100000
5426716867607C203F6D253A2843666F4E203472
5
AAFAB7EF-15F8-EED5-E053-182BA8C00F30
100000
2E353E5228723356407D566026374E4B5E617447
fiddle
You can make the BLOBs much bigger if you want, just by change the loop's upper bound from 50 to something larger.

Related

Randomly select 10 subjects and retain all of their observations

I am stuck with a the following problem in SAS. I have a dataset of this format:
The dataet consists of 500ids with different number of observations per ID. I'm trying to randomly select 5id's and at the same time retain all of their observations. I built a random generator in the first place saving a vector with 10 numbers in the interval [1,500]. However it became clumpsy when I tried to use this vector in order to select the ids correspoding to the vector with the random numbers. To be more clear, I want my net result to be a dataset which includes all observations correspoding to ID 1,10,43, 22, 67, or any other sequence of 5 numbers.
Any tip will be more than appreciated!
From your question, I assume you already have your 10 random numbers. If they are saved in a table/dataset, you can run a left join between them and your original dataset, by id. This will pull out all the original observations with the same id.
Let's say that your ramdonly selected numbers are saved in a table called "random_ids". Then, you can do:
proc sql;
create table want as
select distinct
t1.id,
t2.*
from random_ids as t1
left join have as t2 on t1.id = t2.id;
quit;
If your random numbers are not saved in a dataset, you may simply copy them to a where statement, like:
proc sql;
create table want as
select distinct
*
from have
where id in (1 10 43 22 67) /*here you put the ids you want*/
quit;
Best,
Proc SURVEYSELECT is your friend.
data have;
call streaminit(123);
do _n_ = 1 to 500;
id = rand('integer', 1e6);
do seq = 1 to rand('integer', 35);
output;
end;
end;
run;
proc surveyselect noprint data=have sampsize=5 out=want;
cluster id;
run;
proc sql noprint;
select count(distinct id) into :id_count trimmed from want;
%put NOTE: &=id_count;
If you don't have the procedure as part of your SAS license, you can do sample selection per k/n algorithm. NOTE: Earliest archived post for k/n is May 1996 SAS-L message which has code based on a 1995 SAS Observations magazine article.
proc sql noprint;
select count(distinct id) into :N trimmed from have;
proc sort data=have;
by id;
data want_kn;
retain N &N k 5;
if _n_ = 1 then call streaminit(123);
keep = rand('uniform') < k / N;
if keep then k = k - 1;
do until (last.id);
set have;
by id;
if keep then output;
end;
if k = 0 then stop;
N = N - 1;
drop k N keep;
run;
proc sql noprint;
select count(distinct id) into :id_count trimmed from want_kn;
%put NOTE: &=id_count;

How to save just one column values of a table into an array?

My query returns something like this:
I want to save each value into array. I know how to save each column of a row using SELECT INTO, but I don't know how to save rows of a table with just one column.
I would like to get this:
my_array(1) = 11111
my_array(2) = 22222
my_array(3) = 33333
....
Length of an array is 6. I know that my query will not return more than 6 rows.
If query returns less than 6 rows is it possible to put NULL into array element where there is no rows for it?
As suggested in a comment you may use BULK COLLECT
select your_col BULK COLLECT
INTO your_collection from my_array where some_condition = 'something';
Regarding your question
if query returns less than 6 rows is it possible to put NULL into
array element where there is no rows for it
You have not said why it's required but a BULK COLLECT will create as many elements in the array as the number of rows present in the query result. In case you need NULL elements, you may use count and extend collection methods to check and allocate null elements until the count is 6.
DECLARE
TYPE myarrtype is table of integer;
my_array myarrtype;
BEGIN
select level bulk collect into my_array from dual
connect by level <= 4; --generates 4 integers 1-4 and loads into array.
dbms_output.put_line('OLD COUNT '||my_array.count);
if my_array.count <= 6 then
my_array.extend(6 - my_array.count); --This appends required number of
--NULL elements
dbms_output.put_line('NEW COUNT '||my_array.count);
end if;
END;
/
Output
1 rows affected
dbms_output:
OLD COUNT 4
NEW COUNT 6
Demo

Constant-time index for string column on Oracle database

I have an orders table. The table belongs to a multi-tenant application, so there are orders from several merchants in the same table. The table stores hundreds of millions of records. There are two relevant columns for this question:
MerchantID, an integer storing the merchant's unique ID
TransactionID, a string identifying the transaction
I want to know whether there is an efficient index to do the following:
Enforce a unique constraint on Transaction ID for each Merchant ID. The constraint should be enforced in constant time.
Do constant time queries involving exact matches on both columns (for instance, SELECT * FROM <table> WHERE TransactionID = 'ff089f89feaac87b98a' AND MerchantID = 24)
Further info:
I am using Oracle 11g. Maybe this Oracle article is relevant to my question?
I cannot change the column's data type.
constant time means an index performing in O(1) time complexity. Like a hashmap.
Hash clusters can provide O(1) access time, but not O(1) constraint enforcement time. However, in practice the constant access time of a hash cluster is worse than the O(log N) access time of a regular b-tree index. Also, clusters are more difficult to configure and do not scale well for some operations.
Create Hash Cluster
drop table orders_cluster;
drop cluster cluster1;
create cluster cluster1
(
MerchantID number,
TransactionID varchar2(20)
)
single table hashkeys 10000; --This number is important, choose wisely!
create table orders_cluster
(
id number,
MerchantID number,
TransactionID varchar2(20)
) cluster cluster1(merchantid, transactionid);
--Add 1 million rows. 20 seconds.
begin
for i in 1 .. 10 loop
insert into orders_cluster
select rownum + i * 100000, mod(level, 100)+ i * 100000, level
from dual connect by level <= 100000;
commit;
end loop;
end;
/
create unique index orders_cluster_idx on orders_cluster(merchantid, transactionid);
begin
dbms_stats.gather_table_stats(user, 'ORDERS_CLUSTER');
end;
/
Create Regular Table (For Comparison)
drop table orders_table;
create table orders_table
(
id number,
MerchantID number,
TransactionID varchar2(20)
) nologging;
--Add 1 million rows. 2 seconds.
begin
for i in 1 .. 10 loop
insert into orders_table
select rownum + i * 100000, mod(level, 100)+ i * 100000, level
from dual connect by level <= 100000;
commit;
end loop;
end;
/
create unique index orders_table_idx on orders_table(merchantid, transactionid);
begin
dbms_stats.gather_table_stats(user, 'ORDERS_TABLE');
end;
/
Trace Example
SQL*Plus Autotrace is a quick way to find the explain plan and track I/O activity per statement. The number of I/O requests is labeled as "consistent gets" and is a decent way of measuring the amount of work done. This code demonstrates how the numbers were generated for other sections. The queries often need to be run more than once to warm things up.
SQL> set autotrace on;
SQL> select * from orders_cluster where merchantid = 100001 and transactionid = '2';
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 621801084
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 16 | 1 (0)| 00:00:01 |
|* 1 | TABLE ACCESS HASH| ORDERS_CLUSTER | 1 | 16 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("MERCHANTID"=100001 AND "TRANSACTIONID"='2')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
31 consistent gets
0 physical reads
0 redo size
485 bytes sent via SQL*Net to client
540 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
SQL>
Find Optimal Hashkeys, Trade-Offs
For optimal read performance all the hash collisions should fit in one block (all Oracle I/O is done per block, usually 8K). Getting the ideal storage right is tricky and requires knowing the hash algorithm, storage size (not the same as the block size), and number of hash keys (the buckets). Oracle has a default algorithm and size so it is possible to focus on only one attribute, the number of hash keys.
More hash keys leads to fewer collisions. This is good for TABLE ACCESS HASH performance as there is only one block to read. Below are the number of consistent gets for different hashkey sizes. For comparison an index access is also included. With enough hashkeys the number of blocks decreases to the optimal number, 1.
Method Consistent Gets (for transactionid = 1, 20, 300, 4000, and 50000)
Index 4, 3, 3, 3, 3
Hashkeys 100 1, 31, 31, 31, 31
Hashkeys 1000 1, 3, 4, 4, 4
Hashkeys 10000 1, 1, 1, 1, 1
More hash keys also lead to more buckets, more wasted space, and a slower TABLE ACCESS FULL operation.
Table type Space in MB
HeapTable 24MB
Hashkeys 100 26MB
hashkeys 1000 30MB
hashkeys 10000 81MB
To reproduce my results, use a sample query like select * from orders_cluster where merchantid = 100001 and transactionid = '1'; and change the last value to 1, 20, 300, 4000, and 50000.
Performance Comparison
Consistent gets are predictable and easy to measure, but at the end of the day only the wall clock time matters. Surprisingly, the index access with 4 times more
consistent gets is still faster than the optimal hash cluster scenario.
--3.5 seconds for b-tree access.
declare
v_count number;
begin
for i in 1 .. 100000 loop
select count(*)
into v_count
from orders_table
where merchantid = 100000 and transactionid = '1';
end loop;
end;
/
--3.8 seconds for hash cluster access.
declare
v_count number;
begin
for i in 1 .. 100000 loop
select count(*)
into v_count
from orders_cluster
where merchantid = 100000 and transactionid = '1';
end loop;
end;
/
I also tried the test with variable predicates but the results were similar.
Does it Scale?
No, hash clusters do not scale. Despite the O(1) time complexity of TABLE ACCESS HASH, and the O(log n) time complexity of INDEX UNIQUE SCAN, hash clusters never seem to outperform b-tree indexes.
I tried the above sample code with 10 million rows. The hash cluster was painfully slow to load, and still under-performed the index on SELECT performance. I tried to scale it up to 100 million rows but the insert was going to take 11 days.
The good news is that b*trees scale well. Adding 100 million rows to the above example only require 3 levels in the index. I looked at all DBA_INDEXES for a large database environment (hundreds of databases and a petabyte of data) - the worst index had only 7 levels. And that was a pathological index on VARCHAR2(4000) columns. In most cases your b-tree indexes will stay shallow regardless of the table size.
In this case, O(log n) beats O(1).
But WHY?
Poor hash cluster performance is perhaps a victim of Oracle's attempt to simplify things and hide the kind of details necessary to make a hash cluster work well. Clusters are difficult to setup and use properly and would rarely provide a significant benefit anyway. Oracle has not put a lot of effort into them in the past few decades.
The commenters are correct that a simple b-tree index is best. But it's not obvious why that should be true and it's good to think about the algorithms used in the database.

Using the data from a for loop as a table in the From in a Select

I am converting some code in Access over to Oracle, and one of the queries in Access uses a table that I am unable to use in Oracle. I am unable to create new tables, so I am trying to figure out a way to use the logic behind the table in the FOR section of my select.
The logic of the table is similar to:
FOR i = 1 To 100
number = number + 1
.AddNew
!tbl_number = number
NEXT i
I'm trying to convert this to oracle, and so far I have:
FOR i in 1 .. 100 LOOP
number := number + 1;
--This is where I am stuck; How do I simulate the table part
END LOOP;
I was thinking a cursor or a record would be the answer, but I can't seem to figure out how to implement that. In the end I basically want to have:
SELECT
table.number
FROM
(
--My for loop logic
) table
EDIT
The calculation is a bit more complicated; that was just an example. They aren't actually sequential, and there isn't really a pattern to rows.
EDIT
Here is a more complicated version of the for loop which is closer to what I'm actually doing:
FOR i in 1 .. 100 LOOP
number1 := number1 + 7;
number2 := (number2 + 8) / number1;
--This is where I am stuck; How do I simulate the table part
END LOOP;
You could use a recursive query (assuming you are on Oralce 11gR2 or later):
with example(idx, number1, number2) as (
-- Anchor Section
select 1
, 1 -- initial value
, 2 -- initial value
from dual
union all
-- Recursive Section
select prev.idx + 1
, prev.number1 + 7
, (prev.number2 + 8) / prev.number1
from example prev
where prev.idx < 100 -- The Guard
)
select * from example;
In the Anchor section set all the values for your first record. Then in the Recursive section setup the logic to determine the next records values as a function of the prior records values.
The Anchor section could select the initial values from some other table rather than being hard coded as in my example.
The recursive section needs to select from the named subquery (in this case example) but may also join to other tables as needed.
You need to generate a set with sequential integer numbers. Maybe you can use this (for Oracle 10g and above):
SELECT
ROWNUM NUM
FROM
DUAL D1,
DUAL D2
CONNECT BY
(D1.DUMMY = D2.DUMMY AND ROWNUM <= 100)

How to get records randomly from the oracle database?

I need to select rows randomly from an Oracle DB.
Ex: Assume a table with 100 rows, how I can randomly return 20 of those records from the entire 100 rows.
SELECT *
FROM (
SELECT *
FROM table
ORDER BY DBMS_RANDOM.RANDOM)
WHERE rownum < 21;
SAMPLE() is not guaranteed to give you exactly 20 rows, but might be suitable (and may perform significantly better than a full query + sort-by-random for large tables):
SELECT *
FROM table SAMPLE(20);
Note: the 20 here is an approximate percentage, not the number of rows desired. In this case, since you have 100 rows, to get approximately 20 rows you ask for a 20% sample.
SELECT * FROM table SAMPLE(10) WHERE ROWNUM <= 20;
This is more efficient as it doesn't need to sort the Table.
SELECT column FROM
( SELECT column, dbms_random.value FROM table ORDER BY 2 )
where rownum <= 20;
In summary, two ways were introduced
1) using order by DBMS_RANDOM.VALUE clause
2) using sample([%]) function
The first way has advantage in 'CORRECTNESS' which means you will never fail get result if it actually exists, while in the second way you may get no result even though it has cases satisfying the query condition since information is reduced during sampling.
The second way has advantage in 'EFFICIENT' which mean you will get result faster and give light load to your database.
I was given an warning from DBA that my query using the first way gives loads to the database
You can choose one of two ways according to your interest!
In case of huge tables standard way with sorting by dbms_random.value is not effective because you need to scan whole table and dbms_random.value is pretty slow function and requires context switches. For such cases, there are 3 additional methods:
1: Use sample clause:
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/SELECT.html#GUID-CFA006CA-6FF1-4972-821E-6996142A51C6
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/SELECT.html#GUID-CFA006CA-6FF1-4972-821E-6996142A51C6
for example:
select *
from s1 sample block(1)
order by dbms_random.value
fetch first 1 rows only
ie get 1% of all blocks, then sort them randomly and return just 1 row.
2: if you have an index/primary key on the column with normal distribution, you can get min and max values, get random value in this range and get first row with a value greater or equal than that randomly generated value.
Example:
--big table with 1 mln rows with primary key on ID with normal distribution:
Create table s1(id primary key,padding) as
select level, rpad('x',100,'x')
from dual
connect by level<=1e6;
select *
from s1
where id>=(select
dbms_random.value(
(select min(id) from s1),
(select max(id) from s1)
)
from dual)
order by id
fetch first 1 rows only;
3: get random table block, generate rowid and get row from the table by this rowid:
select *
from s1
where rowid = (
select
DBMS_ROWID.ROWID_CREATE (
1,
objd,
file#,
block#,
1)
from
(
select/*+ rule */ file#,block#,objd
from v$bh b
where b.objd in (select o.data_object_id from user_objects o where object_name='S1' /* table_name */)
order by dbms_random.value
fetch first 1 rows only
)
);
To randomly select 20 rows I think you'd be better off selecting the lot of them randomly ordered and selecting the first 20 of that set.
Something like:
Select *
from (select *
from table
order by dbms_random.value) -- you can also use DBMS_RANDOM.RANDOM
where rownum < 21;
Best used for small tables to avoid selecting large chunks of data only to discard most of it.
Here's how to pick a random sample out of each group:
SELECT GROUPING_COLUMN,
MIN (COLUMN_NAME) KEEP (DENSE_RANK FIRST ORDER BY DBMS_RANDOM.VALUE)
AS RANDOM_SAMPLE
FROM TABLE_NAME
GROUP BY GROUPING_COLUMN
ORDER BY GROUPING_COLUMN;
I'm not sure how efficient it is, but if you have a lot of categories and sub-categories, this seems to do the job nicely.
-- Q. How to find Random 50% records from table ?
when we want percent wise randomly data
SELECT *
FROM (
SELECT *
FROM table_name
ORDER BY DBMS_RANDOM.RANDOM)
WHERE rownum <= (select count(*) from table_name) * 50/100;

Resources