What is the difference between NOPARALLEL and PARALLEL 1 in Oracle?

What is the difference between NOPARALLEL and PARALLEL 1? If I create three tables like so:
CREATE TABLE t0 (i NUMBER) NOPARALLEL;
CREATE TABLE t1 (i NUMBER) PARALLEL 1;
CREATE TABLE t2 (i NUMBER) PARALLEL 2;
They show up in the data dictionary as
SELECT table_name, degree FROM user_tables WHERE table_name IN ('T0','T1','T2');
TABLE_NAME DEGREE
T0 1 <==
T1 1 <==
T2 2
The documentation, however, states quite clearly
NOPARALLEL: Specify NOPARALLEL for serial execution. This is the default.
PARALLEL integer: Specification of integer indicates the degree of parallelism, which is the number of parallel threads used in the parallel operation. Each parallel thread may use one or two parallel execution servers.
So, NOPARALLEL is definitely serial, while PARALLEL 1 uses one thread, which may use one or two parallel servers? But how can Oracle distinguish between the two when the data dictionary stores the same value 1 for both?
BTW, the CREATE TABLE sys.tab$ statement in ?/rdbms/admin/dcore.bsq has the comment
/*
* Legal values for degree, instances:
* NULL (used to represent 1 on disk/dictionary and implies noparallel), or
* 2 thru EB2MAXVAL-1 (user supplied values), or
* EB2MAXVAL (implies use default value)
*/
degree number, /* number of parallel query slaves per instance */
instances number, /* number of OPS instances for parallel query */

There is no difference between NOPARALLEL and PARALLEL 1 - those options are stored the same way and behave the same way. This is a documentation bug because Oracle will never use two parallel execution servers for PARALLEL 1. We can test this situation by looking at V$PX_PROCESS and by understanding the producer/consumer model of parallelism.
How to Test Parallelism
There are many ways to measure the amount of parallelism, such as the execution plan or looking at GV$SQL.USERS_EXECUTING. But one of the best ways is to use the view GV$PX_PROCESS. The following query will show all the parallel servers currently being used:
select *
from gv$px_process
where status <> 'AVAILABLE';
Producer/Consumer Model
The Using Parallel Execution chapter of the VLDB and Partitioning Guide is worth reading if you want to fully understand Oracle parallelism. In particular, read the Producer/Consumer Model section of the manual to understand when Oracle will double the number of parallel servers.
In short: each operation is executed in parallel separately, but the operations need to feed data into each other. A full table scan may use 4 parallel servers to read the data, but a GROUP BY or ORDER BY operation needs another 4 parallel servers to hash or sort that data. While the degree of parallelism is 4, the number of parallel servers is 8. This is what the SQL Language Reference means by the sentence "Each parallel thread may use one or two parallel execution servers."
Oracle doesn't just randomly double the number of servers. The doubling only happens for certain operations like an ORDER BY, which lets us test precisely when Oracle is enabling parallelism. The below tests demonstrate that Oracle will not double 1 parallel thread to 2 parallel servers.
Tests
Create these three tables:
create table table_noparallel noparallel as select level a from dual connect by level <= 1000000;
create table table_parallel_1 parallel 1 as select level a from dual connect by level <= 1000000;
create table table_parallel_2 parallel 2 as select level a from dual connect by level <= 1000000;
Run the queries below, and while they are running, use a separate session to run the previous query against GV$PX_PROCESS. An IDE can be helpful here, because it retrieves only the first N rows and keeps the cursor open, so the session still counts as using the parallel servers.
--0 rows:
select * from table_noparallel;
--0 rows:
select * from table_noparallel order by 1;
--0 rows:
select * from table_parallel_1;
--0 rows:
select * from table_parallel_1 order by 1;
--2 "IN USE":
select * from table_parallel_2;
--4 "IN USE":
select * from table_parallel_2 order by 1;
Notice that the NOPARALLEL and the PARALLEL 1 table work exactly the same way and neither of them use any parallel servers. But the PARALLEL 2 table will cause the number of parallel execution servers to double when the results are ordered.
Why is PARALLEL 1 Even Allowed?
Why doesn't Oracle just force the PARALLEL clause to only accept numbers larger than one and avoid this ambiguity? After all, the compiler already enforces a limit; the clause PARALLEL 0 will raise the error "ORA-12813: value for PARALLEL or DEGREE must be greater than 0".
I would guess that allowing a numeric value to mean "no parallelism" can make some code simpler. For example, I've written programs where the DOP was calculated and passed as a variable. If only numbers are used, the dynamic SQL is as simple as:
v_sql := 'create table test1(a number) parallel ' || v_dop;
If we had to use NOPARALLEL, the code gets a bit uglier:
if v_dop = 1 then
v_sql := 'create table test1(a number) noparallel';
else
v_sql := 'create table test1(a number) parallel ' || v_dop;
end if;

Related

ORA-12839 when I run parallel DML in my ATP instance?

I am testing ATP with my application and get the following error:
ORA-12839 Cannot Modify An Object In Parallel After Modifying It.
Is there any way to disable the parallel DML on the ATP without making changes to the application code?
DROP TABLE objects PURGE;
CREATE TABLE objects
AS
SELECT *
FROM user_objects;
UPDATE /*+ parallel (objects) */ objects
SET
object_id = object_id + 1000;
SELECT *
FROM objects;
Do NOT use the HIGH or MEDIUM service, where parallelism is built-in and configured out of the box without you actively enabling it.
You should either use the transactional services (LOW, TP, TPURGENT) or you can disable parallel DML using “alter session disable parallel dml”.
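A minimal sketch of the session-level switch quoted above (note that applying it without touching application code would need something like a logon trigger, which is an assumption on my part, not part of the original answer):

```sql
-- Disable parallel DML for the current session only;
-- queries can still run in parallel.
alter session disable parallel dml;

-- If parallel queries should be avoided too:
alter session disable parallel query;
```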
Here is the same script, running on the LOW service -
select sys_context('userenv', 'service_name') from dual;
DROP TABLE objects PURGE;
CREATE TABLE objects
AS
SELECT *
FROM user_objects;
UPDATE /*+ parallel (objects) */ objects
SET
object_id = object_id + 1000;
SELECT *
FROM objects;
But wait, what are these 'LOW' or 'HIGH' services?
(Docs)
Note the word 'parallel' in the descriptions -
The basic characteristics of these consumer groups are:
HIGH: Highest resources, lowest concurrency. Queries run in parallel.
MEDIUM: Less resources, higher concurrency. Queries run in parallel.
You can modify the MEDIUM service concurrency limit. See Change MEDIUM
Service Concurrency Limit for more information.
LOW: Least resources, highest concurrency. Queries run serially.

Running PLSQL in parallel

Running PLSQL script to generate load
For some reasons (reproducing errors, ...) I would like to generate some load (with specific actions) in a PL SQL script.
What I would like to do:
A) Insert 1,000,000 rows into Schema-A.Table-1
B) In a loop, ideally in parallel (2 or 3 concurrent executions):
1) read one row from Schema-A.Table-1 with locking
2) insert it into Schema-B.Table-2
3) delete the row from Schema-A.Table-1
Is there a way to run the B-task in parallel from within a PL/SQL script?
How would this look?
It's usually better to parallelize SQL statements inside a PL/SQL block, instead of trying to parallelize the entire PL/SQL block:
begin
execute immediate 'alter session enable parallel dml';
insert /*+ append parallel */ into schemaA.Table1 ...
commit;
insert /*+ append parallel */ into schemaB.Table2 ...
commit;
delete /*+ parallel */ from schemaA.Table1 where ...
commit;
dbms_stats.gather_table_stats('SCHEMAA', 'TABLE1', degree => 8);
dbms_stats.gather_table_stats('SCHEMAB', 'TABLE2', degree => 8);
end;
/
Large parallel DML statements usually require less code and run faster than creating your own parallelism in PL/SQL. Here are a few things to look out for:
You must have Enterprise Edition, large tables, decent hardware, and a sane configuration to run parallel SQL.
Setting the DOP is difficult. Using the hint /*+ parallel */ lets Oracle decide but you might want to play around with it by specifying a number, such as /*+ parallel(8) */.
Direct-path writes (the append hint) can be significantly faster. But they lock the entire table and the new results won't be recoverable until after the next backup.
Check the execution plan to ensure that direct-path writes are used - look for the operation LOAD AS SELECT instead of LOAD TABLE CONVENTIONAL. Tuning parallel SQL statements is best done with the Real-Time SQL Monitoring reports, found in select dbms_sqltune.report_sql_monitor(sql_id => 'SQL_ID') from dual;
You might want to read through the Parallel Execution Concepts chapter of the manual. Oracle parallelism can be tricky, but it can also make your processes run orders of magnitude faster if you're careful.
If the objective is a fast load and parallelism is just a means to that end, then consider this instead:
Create table newtemp as select from the old table (CTAS).
Then create table old_remaining as select the rows from the old table that do not exist in newtemp.
Then drop the old table and rename the new tables as needed. These CTAS operations will use the parallel options set at the database level.
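The steps above can be sketched as follows (table and column names, the filter criteria, and the DOP of 8 are illustrative assumptions, not from the original post):

```sql
-- Load the rows to move with a parallel, direct-path CTAS.
create table newtemp parallel 8 nologging as
select * from old_table where load_date >= date '2024-01-01';

-- Keep the rows that were not moved.
create table old_remaining parallel 8 nologging as
select o.*
from old_table o
where not exists (select 1 from newtemp n where n.id = o.id);

-- Swap the tables; a rename is a quick dictionary operation.
drop table old_table purge;
rename old_remaining to old_table;
```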

Query cost: Global Temporary Tables vs. Collections (Virtual Arrays)

I have a query whose results are stored in a GTT (Global Temporary Table) and in a Collection.
Selecting the data from the GTT again, I get a very small cost: 103.
SELECT
...
FROM my_table_gtt
JOIN table2 ...
JOIN table3 ...
But when switching this from a GTT to a Collection (VA - Virtual Array), the cost skyrockets (78,000), yet the difference in execution times between the two is very small.
SELECT
...
FROM TABLE(CAST(my_table_va as my_table_tt))
JOIN table2 ...
JOIN table3 ...
My question is why is there such a big difference in cost between the two approaches? From my knowledge, GTTs don't store table statistics, so why is it returning a better cost than the VA?
Global temporary tables can have statistics like any other table. In fact they are like any other table: they have data segments, just in the temporary tablespace.
In 11g the statistics are global, so they sometimes cause issues with execution plans. In 12c they are session-based, so each session gets its own proper statistics (if available).
The cardinality estimate for a collection type is based on the DB block size, and for the default 8 kB block it is 8168. Collection content is stored in the PGA. It's quite common to hint the cardinality when using collection types in complex queries to guide the optimizer. You can also use the Extensible Optimizer interface to implement your own cost calculation.
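One common way to supply that cardinality is the undocumented CARDINALITY hint (the names below reuse the collection example from the question; the estimate of 100 rows is an arbitrary illustration):

```sql
-- Tell the optimizer the collection holds ~100 rows instead of the
-- block-size-based default of 8168.
select /*+ cardinality(t 100) */ *
from table(cast(my_table_va as my_table_tt)) t;
```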
Edit - added tests:
CREATE TYPE STRINGTABLE IS TABLE OF VARCHAR2(255);
CREATE GLOBAL TEMPORARY TABLE TMP (VALUE VARCHAR2(255));
INSERT INTO TMP SELECT 'Value' || LEVEL FROM DUAL CONNECT BY LEVEL <= 1000000;
DECLARE
x STRINGTABLE;
cnt NUMBER;
BEGIN
SELECT VALUE BULK COLLECT INTO x FROM TMP;
DBMS_OUTPUT.PUT_LINE(TO_CHAR(SYSTIMESTAMP, 'MI:SS.FF3'));
SELECT SUM(LENGTH(VALUE)) INTO cnt FROM TMP;
DBMS_OUTPUT.PUT_LINE(TO_CHAR(SYSTIMESTAMP, 'MI:SS.FF3'));
SELECT SUM(LENGTH(COLUMN_VALUE)) INTO cnt FROM TABLE(x);
DBMS_OUTPUT.PUT_LINE(TO_CHAR(SYSTIMESTAMP, 'MI:SS.FF3'));
END;
In this case, access to the GTT is about twice as fast as access to the collection, roughly 200 ms vs. 400 ms on my test machine. When I increased the number of rows to 10,000,000, I got ORA-22813: operand value exceeds system limits on the second query.
The most important difference between collections and GTTs in SQL is that the CBO (cost-based optimizer) has limitations for the TABLE() function (the internal kokbf$... collection table); for example, JPPD (join predicate push-down) doesn't work with TABLE() functions.
Some workarounds: http://orasql.org/2019/05/30/workarounds-for-jppd-with-view-and-tablekokbf-xmltable-or-json_table-functions/

Oracle SQL parallel spooling automatically

I have a heavy query which spools data into a CSV file that is sent to users. I have manually created parallel sessions, each executing the query with a different filter condition, so that I can join all the spooled files at the end into one single file, reducing the time to generate the data (it usually takes about 10 hours; with parallel sessions it takes 2.5-3 hours).
My question is: how can I automate this so that the script finds max(agreementid) and then distributes the work into X spool calls, generating X files where each file has at most, say, 100000 records?
Additional Explanation: I guess my question was not very clear. I will try and explain again.
I have a table/view with large amount of data.
I need to spool this data into a CSV file.
It takes humongous amount of time to spool the CSV file.
I run parallel spools by doing below.
a) Select .... from ... where agreementid between 1 and 1000000;
b) Select .... from ... where agreementid between 1000001 and 2000000;
and so on, and then spooling them individually in multiple sessions.
This helps me generate multiple files which I can then stitch together and share with users.
I need a script (I guess DOS- or AIX-based) which will find the min and max agreementid from my table, create the spooling scripts automatically, and execute them through separate SQL*Plus sessions so that the files are generated automatically.
Not sure whether I could make myself clear enough.
Thanks guys for replying to my earlier query, but that was not what I was looking for.
A bit unclear what you want, but I think you want a query to find a low/high range of agreement_ids for x groups of ids (buckets). If so, try something like (using 4 buckets in this example):
select bucket, min(agreement_id), max(agreement_id), count(1)
from (
select agreement_id, ntile(4) over (order by agreement_id) bucket
from my_table
)
group by bucket;
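Building on the bucket query, you could even have SQL generate the filter predicates for the spool scripts (a sketch; the table and column names follow the example above and are assumptions about your schema):

```sql
-- Emit one ready-to-paste WHERE clause per bucket.
select 'where agreement_id between ' || min(agreement_id)
       || ' and ' || max(agreement_id) as predicate
from (
  select agreement_id, ntile(4) over (order by agreement_id) bucket
  from my_table
)
group by bucket
order by bucket;
```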
Edit: If your problem is in messing with spooling multiple queries and combining, I would rather opt for creating a single materialized view (using parallel in the underlying query on the driving table) and refresh (complete, atomic_refresh=>false) when needed. Once refreshed, simply extract from the snapshot table (to a csv or whatever format you want).
There might be a simpler way, but this generates four 'buckets' of IDs, and you could plug the min and max values into your parametrized filter condition:
select bucket, min(agreementid) as min_id, max(agreementid) as max_id
from (
select agreementid,
case when rn between 1 and cn / 4 then 1
when rn between (cn / 4) + 1 and 2 * (cn / 4) then 2
when rn between (2 * cn / 4) + 1 and 3 * (cn / 4) then 3
when rn between (3 * cn / 4) + 1 and cn then 4
end as bucket
from (
select agreementid, rank() over (order by agreementid) as rn,
count(*) over () as cn from agreements
)
)
group by bucket;
If you wanted an upper limit for each bucket rather than a fixed number of buckets then you could do:
select floor(rn / 100000), min(agreementid) as min_id, max(agreementid) as max_id
from (
select agreementid, rank() over (order by agreementid) as rn
from agreements
)
group by floor(rn / 100000);
And then pass each min/max to a SQL script, e.g. from a shell script calling SQL*Plus. The bucket number could be passed as well and be used as part of the spool file name, via a positional parameter.
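A parametrized spool script driven by positional parameters might look like this (a sketch; the select list, credentials, and formatting options are placeholders, while the agreements table matches the queries above):

```sql
-- spool_bucket.sql: &1 = bucket number, &2 = min id, &3 = max id
-- Invoked once per bucket, e.g.:
--   sqlplus -s user/pass @spool_bucket.sql 1 1 100000
set heading off feedback off pagesize 0 linesize 1000 trimspool on
spool bucket_&1..csv
select agreementid  /* full CSV select list here */
from agreements
where agreementid between &2 and &3;
spool off
exit
```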
I'm curious about what you've identified as the bottleneck though; have you tried running it as a parallel query inside the database, with a /*+ PARALLEL */ hint?

oracle : how to ensure that a function in the where clause will be called only after all the remaining where clauses have filtered the result?

I am writing a query to this effect:
select *
from players
where player_name like '%K%'
and player_rank<10
and check_if_player_is_eligible(player_name) > 1;
Now, the function check_if_player_is_eligible() is heavy and, therefore, I want the query to filter the search results sufficiently and then only run this function on the filtered results.
How can I ensure that all the filtering happens before the function is executed, so that it runs the minimum number of times?
Here are two methods to trick Oracle into not evaluating your function before all the other WHERE clauses have been evaluated:
Using rownum
Using the pseudo-column rownum in a subquery will force Oracle to "materialize" the subquery. See for example this askTom thread for examples.
SELECT *
FROM (SELECT *
FROM players
WHERE player_name LIKE '%K%'
AND player_rank < 10
AND ROWNUM >= 1)
WHERE check_if_player_is_eligible(player_name) > 1
Here's the documentation reference "Unnesting of Nested Subqueries":
The optimizer can unnest most subqueries, with some exceptions. Those exceptions include hierarchical subqueries and subqueries that contain a ROWNUM pseudocolumn, one of the set operators, a nested aggregate function, or a correlated reference to a query block that is not the immediate outer query block of the subquery.
Using CASE
Using CASE you can force Oracle to only evaluate your function when the other conditions are evaluated to TRUE. Unfortunately it involves duplicating code if you want to make use of the other clauses to use indexes as in:
SELECT *
FROM players
WHERE player_name LIKE '%K%'
AND player_rank < 10
AND CASE
WHEN player_name LIKE '%K%'
AND player_rank < 10
THEN check_if_player_is_eligible(player_name)
END > 1
There is also the NO_PUSH_PRED hint, which achieves this without involving ROWNUM (though that is a good trick anyway):
SELECT /*+NO_PUSH_PRED(v)*/*
FROM (
SELECT *
FROM players
WHERE player_name LIKE '%K%'
AND player_rank < 10
) v
WHERE check_if_player_is_eligible(player_name) > 1
You usually want to avoid forcing a specific order of execution. If the data or the query changes, your hints and tricks may backfire. It's usually better to provide useful metadata to Oracle so it can make the correct decisions for you.
In this case, you can provide better optimizer statistics about the function with ASSOCIATE STATISTICS.
For example, if your function is very slow because it has to read 50 blocks each time it is called:
associate statistics with functions
check_if_player_is_eligible default cost(1000 /*cpu*/, 50 /*IO*/, 0 /*network*/);
By default Oracle assumes that a function will select a row 1/20th of the time. Oracle wants to eliminate as many rows as early as possible, so raising the selectivity should make the function less likely to be executed first:
associate statistics with functions
check_if_player_is_eligible default selectivity 90;
But this raises some other issues. You have to pick a selectivity for ALL possible conditions, and 90% certainly won't always be accurate. The IO cost is the number of blocks fetched, but the CPU cost is measured in "machine instructions used" - what exactly does that mean?
There are more advanced ways to customize statistics, for example using the Oracle Data Cartridge Extensible Optimizer interface. But Data Cartridge is probably one of the most difficult Oracle features.
You didn't specify whether players.player_name is unique or not. If it is, the database has to call the function at least once per result record.
But if players.player_name is not unique, you would want to minimize the calls down to count(distinct player_name) times. As (Ask)Tom shows in Oracle Magazine, the scalar subquery cache is an efficient way to do this.
You would have to wrap your function call in a scalar subquery in order to make use of the scalar subquery cache:
SELECT *
FROM players
WHERE player_name LIKE '%K%'
AND player_rank < 10
AND (SELECT check_if_player_is_eligible(player_name) FROM dual) > 1
Put the original query in a derived table, then apply the additional predicate in the where clause of the outer query:
select *
from (
select *
from players
where player_name like '%K%'
and player_rank<10
) derived_tab1
where check_if_player_is_eligible(player_name) > 1;
