Is restrict(amp) more restrictive than CUDA kernel code?

In C++ AMP, kernel functions or lambdas are marked with restrict(amp), which imposes severe restrictions on the allowed subset of C++ (listed here). Does CUDA allow any more freedom on the subset of C or C++ in kernel functions?

As of Visual Studio 11 and CUDA 4.1, restrict(amp) functions are more restrictive than CUDA's analogous __device__ functions. Most noticeably, AMP is more restrictive about how pointers can be used. This is a natural consequence of AMP's DirectX11 computational substrate, which disallows pointers in HLSL (graphics shader) code. By contrast, CUDA's lower-level IR is PTX, which is more general purpose than HLSL.
Here's a line by line comparison:
| VS 11 AMP restrict(amp) functions | CUDA 4.1 sm_2x __device__ functions |
|-----------------------------------|-------------------------------------|
| Can only call functions that have the restrict(amp) clause | Can only call functions that have the __device__ decoration |
| The function must be inlinable | Need not be inlined |
| The function can declare only POD variables | Class types are allowed |
| Lambda functions cannot capture by reference and cannot capture pointers | Lambdas are not supported, but user functors can hold pointers |
| References and single-indirection pointers are supported only as local variables and function arguments | References and multiple-indirection pointers are supported |
| No recursion | Recursion OK |
| No volatile variables | Volatile variables OK |
| No virtual functions | Virtual functions OK |
| No pointers to functions | Pointers to functions OK |
| No pointers to member functions | Pointers to member functions OK |
| No pointers in structures | Pointers in structures OK |
| No pointers to pointers | Pointers to pointers OK |
| No goto statements | goto statements OK |
| No labeled statements | Labeled statements OK |
| No try, catch, or throw statements | No try, catch, or throw statements |
| No global variables | Global __device__ variables OK |
| Static variables through tile_static | Static variables through __shared__ |
| No dynamic_cast | No dynamic_cast |
| No typeid operator | No typeid operator |
| No asm declarations | asm declarations (inline PTX) OK |
| No varargs | No varargs |
You can read more about restrict(amp)'s restrictions here. You can read about C++ support in CUDA __device__ functions in Appendix D of the CUDA C Programming Guide.

Related

Oracle 19c performance issue - long operation

I have an application that runs automated process against the database on a very regular basis. Unfortunately, the statement was crafted inside an ORM and can't be rewritten.
The statement is as follows:
SELECT t0.id FROM SCHEMA.CMTS t0, SCHEMA.CMNTSRCH t3, SCHEMA.CONT t1, SCHEMA.YU t2 WHERE (
(
t0.YUtypeyn = 0
OR
t0.YUtypeyn = 1
)
AND
(
(
t1.contcd IN ('FD')
OR
t1.contcd IS NULL
)
AND
(
t2.TRNSMDCD = 'RD'
AND
t2.spid = 1
AND
t0.compyn = NULL
AND
(
NOT
(
upper(t3.curryuloc) LIKE upper('ABC%') ESCAPE '\'
)
AND
NOT
(
upper(t3.curryuloc) LIKE upper('DEF%') ESCAPE '\'
)
AND
NOT
(
upper(t3.curryuloc) LIKE upper('GHI%') ESCAPE '\'
)
AND
NOT
(
upper(t3.curryuloc) LIKE upper('%JKL%') ESCAPE '\'
)
AND
NOT
(
upper(t3.curryuloc) LIKE upper('%MNO%') ESCAPE '\'
)
AND
NOT
(
upper(t3.curryuloc) LIKE upper('%PQR%') ESCAPE '\'
)
AND
NOT
(
upper(t3.curryuloc) LIKE upper('%STU%') ESCAPE '\'
)
AND
NOT
(
upper(t3.curryuloc) LIKE upper('%VWX%') ESCAPE '\'
)
AND
NOT
(
upper(t3.curryuloc) LIKE upper('%YZ%') ESCAPE '\'
)
)
AND
t0.cancelledyn = NULL
)
)
)
AND
t0.CMNTSRCHid = t3.id
AND
t0.contcd = t1.contcd
AND
t0.YUid = t2.id(+)
Performance issues have been noticed while this statement is running. When monitoring the v$session_longops, I noticed a full table scan against CMNTSRCH.
2640 seconds elapsed so far for a full table scan - which raised concerns.
Explain plan is as follows:
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 59 | 0 (0)| |
|* 1 | FILTER | | | | | |
| 2 | NESTED LOOPS | | 1 | 59 | 26677 (1)| 00:00:02 |
| 3 | NESTED LOOPS | | 1 | 59 | 26677 (1)| 00:00:02 |
| 4 | MERGE JOIN CARTESIAN | | 1 | 27 | 26674 (1)| 00:00:02 |
| 5 | NESTED LOOPS | | 1 | 13 | 24369 (1)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | CONT_PK | 1 | 5 | 0 (0)| 00:00:01 |
|* 7 | TABLE ACCESS FULL | CMNTSRCH | 1 | 8 | 24369 (1)| 00:00:01 |
| 8 | BUFFER SORT | | 36824 | 503K| 2305 (1)| 00:00:01 |
|* 9 | INDEX FAST FULL SCAN | IDX_YU20220408 | 36824 | 503K| 2305 (1)| 00:00:01 |
|* 10 | INDEX RANGE SCAN | CMNT_013 | 1 | | 2 (0)| 00:00:01 |
|* 11 | TABLE ACCESS BY INDEX ROWID| CMNT | 1 | 32 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
Reviewing the statement, I decided to create a function based index on t3.id and t3.curryuloc and reviewed the explain plan - it appeared to improve CPU cost and now performs an Index Fast Full scan as opposed to a Table Scan.
New explain plan (after Function Based index creation):
----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 59 | 0 (0)| |
|* 1 | FILTER | | | | | |
| 2 | NESTED LOOPS | | 1 | 59 | 3975 (1)| 00:00:01 |
| 3 | NESTED LOOPS | | 1 | 59 | 3975 (1)| 00:00:01 |
| 4 | MERGE JOIN CARTESIAN | | 1 | 27 | 3972 (1)| 00:00:01 |
| 5 | NESTED LOOPS | | 1 | 13 | 1667 (2)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | CONT_PK | 1 | 5 | 0 (0)| 00:00:01 |
|* 7 | INDEX FAST FULL SCAN | IDX_CMNTSRCH_20220407 | 1 | 8 | 1667 (2)| 00:00:01 |
| 8 | BUFFER SORT | | 36824 | 503K| 2305 (1)| 00:00:01 |
|* 9 | INDEX FAST FULL SCAN | IDX_YU20220408 | 36824 | 503K| 2305 (1)| 00:00:01 |
|* 10 | INDEX RANGE SCAN | CMNT_013 | 1 | | 2 (0)| 00:00:01 |
|* 11 | TABLE ACCESS BY INDEX ROWID| CMNT | 1 | 32 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------
However, this operation is also taking an incredibly long time.
Monitoring v$session_longops, I can see when a new session runs the statement. SOFAR starts at about 4000 blocks and gets slower and slower, almost grinding to a halt by the time it gets to 5500 blocks. Forcing the use of the function-based index doesn't appear to have made any difference to the time it takes for the operation to complete.
Oddly though, if I run the statement manually with the same arguments the app binds to the parameters, it takes almost no time to execute (there are no rows returned).
What should be my next steps to troubleshoot this issue?

You have a whole mess of WHERE clauses like this:
AND NOT (upper(t3.curryuloc) LIKE upper('ABC%') ESCAPE '\')
See the upper() function applied to the column CMNTSRCH.curryuloc? Because of it, Oracle can't use an ordinary index on curryuloc for random access.
But Oracle has function-based indexes. You may be able to get better performance if you create an index like this:
CREATE INDEX whatever_1 ON CMNTSRCH (UPPER(curryuloc), id);
or, reversing the column order, like this:
CREATE INDEX whatever_2 ON CMNTSRCH (id, UPPER(curryuloc));
It's worth a try.
But, it has to be said, that cascade of AND...NOT...LIKE clauses isn't written for efficiency.
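One way to check whether a function-based index like the ones above is actually picked up is to explain a cut-down version of the predicate and read the plan. This is only a minimal sketch: the simplified query is an assumption made for illustration and is not the ORM's statement. Only the names SCHEMA.CMNTSRCH, curryuloc and id come from the question.

```sql
-- Hypothetical cut-down query: just one of the t3 predicates from the original statement.
EXPLAIN PLAN FOR
SELECT t3.id
FROM   SCHEMA.CMNTSRCH t3
WHERE  NOT (UPPER(t3.curryuloc) LIKE UPPER('ABC%') ESCAPE '\');

-- Show the plan that was just explained.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the plan reads the new index instead of showing TABLE ACCESS FULL on CMNTSRCH, the optimizer considers the index usable for this predicate. A NOT ... LIKE filter is rarely selective, so a full scan of the index is a more realistic expectation than a range scan.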

what is significance of tempspc in oracle explain plan and the measure of this is M=MB and K=KB?

What is the significance of TempSpc in an Oracle explain plan, and are its units M = MB and K = KB?
Is there any way to reduce this TempSpc utilization in an Oracle query?
Please find a sample plan below:
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 11966 | 3552K| | 623 (1)| 00:00:01 |
|* 1 | VIEW | | 11966 | 3552K| | 623 (1)| 00:00:01 |
|* 2 | WINDOW SORT PUSHED RANK | | 11966 | 1332K| 1968K| 623 (1)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| BAl_ACTIVITY | 11966 | 1332K| | 311 (1)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IDX_FIN_ACTVT_BAL_ID | 11966 | | | 37 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------------------

I believe the metrics are expressed in bytes unless a unit specifier is provided.
TEMP tablespace is used for large SQL operations that can't be completed in memory and need to spill to disk. For example:
Sorting
Merging
Parallel operations
To reduce temp usage:
Increase the PGA to give the SQL operations more memory and thus reduce the likelihood of spilling over to temp.
Rewrite or tune the query to use less expensive operations.
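To see whether work areas are actually spilling to temp, the dynamic view v$sql_workarea can be queried, assuming you have access to it. The sketch below is illustrative: the :sql_id bind is a placeholder you would fill in, and the 2G target is an arbitrary example value, not a recommendation.

```sql
-- One-pass and multi-pass executions mean the work area did not fit in memory.
SELECT operation_type,
       optimal_executions,
       onepass_executions,
       multipasses_executions,
       last_tempseg_size      -- bytes of temp used by the last execution, NULL if it did not spill
FROM   v$sql_workarea
WHERE  sql_id = :sql_id;      -- placeholder: SQL_ID of the statement under study

-- Example value only: raise the instance-wide PGA target.
-- Needs the ALTER SYSTEM privilege; SCOPE = BOTH needs an spfile.
ALTER SYSTEM SET pga_aggregate_target = 2G SCOPE = BOTH;
```

A statement whose work areas all show optimal executions is not using temp, so raising the PGA would not help it.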

Why is Oracle's query planner adding a filter predicate that replicates a constraint?

I have a simple Oracle query with a plan that doesn't make sense.
SELECT
u.custid AS "custid",
l.listid AS "listid"
FROM
users u
INNER JOIN lists l ON u.custid = l.custid
And here’s what the autotrace explain tells me it has for a plan
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1468K| 29M| | 11548 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 1468K| 29M| 7104K| 11548 (1)| 00:00:01 |
| 2 | INDEX FAST FULL SCAN| USERS_PK | 404K| 2367K| | 266 (2)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | LISTS | 1416K| 20M| | 9110 (1)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("U"."CUSTID"="L"."CUSTID")
3 - filter(TRUNC("SORT_TYPE")>=1 AND TRUNC("SORT_TYPE")<=16)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- this is an adaptive plan
- 1 Sql Plan Directive used for this statement
What concerns me is predicate 3. sort_type does not appear in the query, and is not indexed at all. It seems to me that sort_type should not be involved in this query at all.
There is a constraint on lists.sort_type: (Yes, I know we could have sort_type be an INTEGER not a NUMBER)
sort_type NUMBER DEFAULT 2 NOT NULL,
CONSTRAINT lists_sort_type CHECK ( sort_type BETWEEN 1 AND 16 AND TRUNC(sort_type) = sort_type )
It looks to me like that filter on sort_type is basically a tautology. Every row in lists must pass that filter because of that constraint.
If I drop the constraint, the filter no longer shows up in the plan, and the estimated cost goes down a little bit. If I add the constraint back, the plan uses the filter again. There's no significant difference in execution speed one way or the other.
I'm concerned because I discovered this filter in a much larger, more complex query that I was trying to optimize down from a couple of minutes of runtime.
Why is Oracle adding that filter, and is it a problem and/or pointing to another problem?
EDIT: If I change the constraint on sort_type to not have the TRUNC part, the filter disappears. If I split the constraint into two different constraints, the filter comes back.

Generally speaking, Oracle generates predicates based on your CHECK constraints whenever doing so will give the optimizer more information to generate a good plan. It is not always smart enough to recognize when those are redundant. Here is a short example in Oracle 12c using your table names:
-- Create the CUSTS table
CREATE TABLE custs ( custid number not null );
CREATE INDEX custs_u1 on custs (custid);
-- Create the LISTS table
CREATE TABLE lists
( listid number not null,
sort_type number not null,
custid number,
constraint lists_c1 check ( sort_type between 1 and 16 and
trunc(sort_type) = sort_Type )
);
-- Explain a join
EXPLAIN PLAN FOR
SELECT /*+ USE_HASH(u) */
u.custid AS "custid",
l.listid AS "listid"
FROM custs u
INNER JOIN lists l ON u.custid = l.custid;
-- Show the plan
select * from table(dbms_xplan.display);
Plan hash value: 2443150416
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 39 | 3 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 39 | 3 (0)| 00:00:01 |
| 2 | INDEX FULL SCAN | CUSTS_U1 | 1 | 13 | 1 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| LISTS | 1 | 26 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("U"."CUSTID"="L"."CUSTID")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
So far, nothing weird. No questionable predicates added.
Now, let's tell the Oracle optimizer that the distribution of data on TRUNC(sort_type) might matter...
declare
x varchar2(30);
begin
x := dbms_stats.create_extended_stats ( user, 'LISTS', '(TRUNC(SORT_TYPE))');
dbms_output.put_line('Extension name = ' || x);
end;
... and, now, let's explain that same query again...
-- Re-explain the same join as before
EXPLAIN PLAN FOR
SELECT /*+ USE_HASH(u) */
u.custid AS "custid",
l.listid AS "listid"
FROM custs u
INNER JOIN lists l ON u.custid = l.custid;
-- Show the new plan
select * from table(dbms_xplan.display);
Plan hash value: 2443150416
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | 3 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 52 | 3 (0)| 00:00:01 |
| 2 | INDEX FULL SCAN | CUSTS_U1 | 1 | 13 | 1 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| LISTS | 1 | 39 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("U"."CUSTID"="L"."CUSTID")
3 - filter(TRUNC("SORT_TYPE")>=1 AND TRUNC("SORT_TYPE")<=16)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Now, Oracle has added the predicate, because the CBO might benefit from it. Does it really benefit from it? No, but Oracle is only smart enough to know that it might and that it doesn't(*) hurt anything.
(*) there have been numerous bugs in previous versions where this _has_ hurt things by messing up the selectivity estimated by the CBO.
The presence of extended statistics is only one example of why Oracle might think it could benefit from this predicate. To find out if that is the reason in your case, you can look for extended statistics in your database like this:
SELECT * FROM dba_stat_extensions where table_name = 'LISTS';
Keep in mind, the Oracle CBO can create stat extensions on its own. So there could be extended stats that you didn't realize were there.
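To tell the extensions you created apart from the ones Oracle created on its own, the same dictionary view can be queried with its CREATOR column. This is a small variation on the query above, reusing the LISTS table name from the question.

```sql
-- CREATOR is USER for manually created extensions and SYSTEM for ones Oracle created itself.
SELECT extension_name,
       extension,
       creator,
       droppable
FROM   dba_stat_extensions
WHERE  table_name = 'LISTS';
```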

Oracle 11gR2 - View Function Columns Evaluation

I seem to have an odd issue regarding an Oracle view that has functions defined for columns and when those functions are evaluated.
Let's say I have the following view and function definition:
CREATE OR REPLACE VIEW test_view_one AS
SELECT column_one,
a_package.function_that_returns_a_value(column_one) function_column
FROM a_table;
CREATE OR REPLACE PACKAGE BODY a_package AS
FUNCTION function_that_returns_a_value(p_key VARCHAR2) RETURN VARCHAR2 IS
CURSOR a_cur IS
SELECT value
FROM table_b
WHERE key = p_key;
p_temp VARCHAR2(30);
BEGIN
-- Code here to write into a temp table. The function call is autonomous.
OPEN a_cur;
FETCH a_cur INTO p_temp;
CLOSE a_cur;
RETURN p_temp;
END function_that_returns_a_value;
END a_package;
In general, I would expect that if function_column is included in a query then for every row brought back by that query, the function would be run. This seems to be true in some circumstances but not for others.
For example, let's say I have the following:
SELECT pageouter.*
FROM(WITH page_query AS (SELECT *
FROM test_view_one
ORDER BY column_one)
SELECT page_query.*, ROWNUM as innerrownum
FROM page_query
WHERE rownum <= 25) pageouter WHERE pageouter.innerrownum >= 1
In this scenario, that inner query (the one querying test_view_one) brings back around 90,000 records.
If I define the function as inserting into a temporary table then I can tell that the function ran 25 times, once for each row brought back. Exactly what I would expect.
However, if I add a significant where clause on to that inner query, e.g.
SELECT pageouter.*
FROM(WITH page_query AS (SELECT *
FROM test_view_one
WHERE EXISTS (SELECT 'x' FROM some_table WHERE ...)
AND NOT EXISTS (SELECT 'x' FROM some_other_table WHERE ...)
AND EXISTS (SELECT 'x' FROM another_table WHERE ...)
ORDER BY column_one)
SELECT page_query.*, ROWNUM as innerrownum
FROM page_query
WHERE rownum <= 25) pageouter WHERE pageouter.innerrownum >= 1
Then the number of rows being brought back by the inner query is 60,000 and if I then query the temporary table, I can tell the function has run 60,000 times. Unsurprisingly, this pretty much destroys performance of the query.
The queries above are run as part of a paging implementation which is why we only ever bring back 25 rows and is why we only ever need the functions to be run for those 25 rows.
I should add, if I change the WHERE clause (i.e. I remove some of the conditions) then the query goes back to behaving itself, only running the functions for the 25 rows that are actually brought back.
Does anyone have any idea as to when functions in views are evaluated? Or any way of determining what causes it, or of identifying when the functions are evaluated (I've checked the explain plan and there's nothing in there which seems to give it away)? If I knew that then I could hopefully find a solution to the problem, but there seems to be little documentation other than "They'll run for each row brought back", which is clearly not the case in some scenarios.
I fully appreciate it's difficult to work out what's going on without a working schema, but if you need any more info then please feel free to ask.
Many Thanks
Additional Info as Requested.
Below is the actual explain plan that I get out of the production environment. The table names don't match the above query (in fact there's considerably more tables involved but they're all joined by NOT EXISTS statements within the WHERE clause.)
The DEMISE table is the equivalent of the A_TABLE in the above query.
It's worth noting that stats were gathered just before I ran the explain plan to make it as accurate as possible.
My understanding of this is that the VIEW row is where the functions would be evaluated, which occurs AFTER the rows have been filtered down. My understanding is obviously flawed!
So this is the bad plan, the one that calls the function 60,000 times...
Execution Plan
----------------------------------------------------------
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 10230 | 984 (1)|
| 1 | FAST DUAL | | 1 | | 2 (0)|
| 2 | FAST DUAL | | 1 | | 2 (0)|
|* 3 | VIEW | | 5 | 10230 | 984 (1)|
|* 4 | COUNT STOPKEY | | | | |
| 5 | VIEW | | 5 | 10165 | 984 (1)|
|* 6 | SORT ORDER BY STOPKEY | | 5 | 340 | 984 (1)|
| 7 | COUNT | | | | |
|* 8 | FILTER | | | | |
|* 9 | HASH JOIN RIGHT OUTER | | 5666 | 376K| 767 (1)|
|* 10 | INDEX RANGE SCAN | USERDATAI1 | 1 | 12 | 2 (0)|
|* 11 | HASH JOIN RIGHT ANTI | | 5666 | 309K| 765 (1)|
|* 12 | INDEX FAST FULL SCAN | TNNTMVINI1 | 1 | 17 | 35 (0)|
|* 13 | HASH JOIN RIGHT ANTI | | 6204 | 236K| 729 (1)|
|* 14 | INDEX RANGE SCAN | CODESGENI3 | 1 | 10 | 2 (0)|
|* 15 | INDEX FULL SCAN | DEMISEI4 | 6514 | 184K| 727 (1)|
| 16 | NESTED LOOPS | | 1 | 25 | 3 (0)|
| 17 | NESTED LOOPS | | 1 | 25 | 3 (0)|
|* 18 | INDEX RANGE SCAN | PROPERTY_GC | 1 | 15 | 2 (0)|
|* 19 | INDEX UNIQUE SCAN | CODESGENI1 | 1 | | 0 (0)|
|* 20 | TABLE ACCESS BY INDEX ROWID| CODESGEN | 1 | 10 | 1 (0)|
| 21 | TABLE ACCESS FULL | QCDUAL | 1 | | 3 (0)|
|* 22 | INDEX RANGE SCAN | DMSELEASI4 | 1 | 21 | 2 (0)|
|* 23 | INDEX RANGE SCAN | TNNTMVINI1 | 1 | 17 | 1 (0)|
| 24 | TABLE ACCESS FULL | QCDUAL | 1 | | 3 (0)|
-------------------------------------------------------------------------------------------
This is the good plan. This calls the function 25 times but has some of the not exists statements removed from the where clause.
Execution Plan
----------------------------------------------------------
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 54200 | 144 (0)|
| 1 | FAST DUAL | | 1 | | 2 (0)|
| 2 | FAST DUAL | | 1 | | 2 (0)|
|* 3 | VIEW | | 25 | 54200 | 144 (0)|
|* 4 | COUNT STOPKEY | | | | |
| 5 | VIEW | | 26 | 56030 | 144 (0)|
| 6 | COUNT | | | | |
|* 7 | FILTER | | | | |
| 8 | NESTED LOOPS ANTI | | 30 | 3210 | 144 (0)|
| 9 | NESTED LOOPS OUTER | | 30 | 2580 | 114 (0)|
| 10 | NESTED LOOPS ANTI | | 30 | 2220 | 84 (0)|
| 11 | NESTED LOOPS ANTI | | 32 | 1824 | 52 (0)|
| 12 | TABLE ACCESS BY INDEX ROWID| DEMISE | 130K| 5979K| 18 (0)|
| 13 | INDEX FULL SCAN | DEMISEI4 | 34 | | 3 (0)|
|* 14 | INDEX RANGE SCAN | CODESGENI3 | 1 | 10 | 1 (0)|
|* 15 | INDEX RANGE SCAN | TNNTMVINI1 | 1 | 17 | 1 (0)|
|* 16 | INDEX RANGE SCAN | USERDATAI1 | 1 | 12 | 1 (0)|
|* 17 | INDEX RANGE SCAN | DMSELEASI4 | 1 | 21 | 1 (0)|
| 18 | TABLE ACCESS FULL | QCDUAL | 1 | | 3 (0)|
----------------------------------------------------------------------------------------
I fully appreciate the second plan is doing less but that doesn't explain why the functions aren't being evaluated... at least not that I can work out.

Pagination with ROWNUM may be performed in two ways:
A) a full scan of the row source with an optimized sort (limited to the top N rows), or
B) index access to the row source with no sort at all.
Here is a simplified example of case A:
SELECT *
FROM
(SELECT a.*,
ROWNUM rnum
FROM
( SELECT * FROM test_view_one ORDER BY id
) a
WHERE ROWNUM <= 25
)
WHERE rnum >= 1
The corresponding execution plan looks as follows (note that I also present part of the column projection; I will soon explain why):
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 975 | | 1034 (1)| 00:00:01 |
|* 1 | VIEW | | 25 | 975 | | 1034 (1)| 00:00:01 |
|* 2 | COUNT STOPKEY | | | | | | |
| 3 | VIEW | | 90000 | 2285K| | 1034 (1)| 00:00:01 |
|* 4 | SORT ORDER BY STOPKEY| | 90000 | 439K| 1072K| 1034 (1)| 00:00:01 |
| 5 | TABLE ACCESS FULL | TEST | 90000 | 439K| | 756 (1)| 00:00:01 |
-----------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
...
3 - "A"."ID"[NUMBER,22], "A"."FUNCTION_COLUMN"[NUMBER,22]
4 - (#keys=1) "ID"[NUMBER,22], "MY_PACKAGE"."MY_FUNCTION"("ID")[22]
5 - "ID"[NUMBER,22]
During execution the table is accessed with a FULL SCAN, i.e. all records are read.
The optimization is in the SORT operation: SORT ORDER BY STOPKEY means that not all rows are sorted; only the top 25 are kept and sorted.
Here is the execution plan for case B:
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 975 | 2 (0)| 00:00:01 |
|* 1 | VIEW | | 25 | 975 | 2 (0)| 00:00:01 |
|* 2 | COUNT STOPKEY | | | | | |
| 3 | VIEW | | 26 | 676 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN| TEST_IDX | 26 | 130 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Here only the required 25 rows are accessed, and therefore the function can't be called more than N times.
Now the important consideration: in case A, the function can, but need not, be called for each row. How do we see it?
The answer is in the column projection in the explain plan.
4 - (#keys=1) "ID"[NUMBER,22], "MY_PACKAGE"."MY_FUNCTION"("ID")[22]
The relevant line 4 shows that the function is called in the SORT operation and therefore for each row (the sort receives all the rows).
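The column projection section is not printed by default. As a sketch, assuming the same simplified query as in case A, it can be requested through the format argument of DBMS_XPLAN.DISPLAY:

```sql
EXPLAIN PLAN FOR
SELECT *
FROM  (SELECT a.*, ROWNUM rnum
       FROM  (SELECT * FROM test_view_one ORDER BY id) a
       WHERE  ROWNUM <= 25)
WHERE  rnum >= 1;

-- '+PROJECTION' adds the "Column Projection Information" section to the output.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, NULL, 'TYPICAL +PROJECTION'));
```

The plan line on which the function expression first appears in the projection is where it is evaluated.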
Conclusion
My test on 11.2 shows that in case A (FULL SCAN with SORT ORDER BY STOPKEY) the view function is called once for each row.
I guess the only workaround is to select only the ID, limit the result, and then join back to the original view to get the function value.
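The workaround could look roughly like the sketch below. It assumes, as in the simplified example, that test_view_one exposes a unique ID column and is built on a table named TEST. Both names are placeholders for your own schema, and the statement has not been run against the asker's data.

```sql
-- Step 1: sort and limit the keys only, without touching the function column.
-- Step 2: join the 25 surviving keys back to the view to get the function value.
SELECT v.*
FROM  (SELECT id
       FROM  (SELECT id FROM test ORDER BY id)
       WHERE  ROWNUM <= 25) p
JOIN   test_view_one v ON v.id = p.id
ORDER  BY v.id;
```

Whether the function is then evaluated only for the joined rows still depends on the plan the optimizer chooses, so it is worth confirming with the column projection output or with the temp-table counter from the question.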
Final notes
I tested this in 12.1 as well; see below the shift in the column projection.
The function is calculated first in the VIEW (line 3), i.e. both cases work fine.
Column Projection Information (identified by operation id):
-----------------------------------------------------------
...
3 - "A"."ID"[NUMBER,22], "A"."FUNCTION_COLUMN"[NUMBER,22]
4 - (#keys=1) "ID"[NUMBER,22]
5 - "ID"[NUMBER,22]
And of course in 12c the new feature of OFFSET - FETCH NEXT could be used.
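For reference, a minimal form of the 12c row-limiting clause applied to the same simplified view (assuming the ID column from the example above):

```sql
-- Oracle 12c and later: row-limiting clause instead of nested ROWNUM filters.
SELECT *
FROM   test_view_one
ORDER  BY id
OFFSET 0 ROWS FETCH NEXT 25 ROWS ONLY;
```

For later pages only the OFFSET value changes, for example OFFSET 25 ROWS for the second page.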
Good Luck!

Oracle slow query performance with PARALLEL optimization plan

I have a very simple query:
SELECT
A
FROM table
where B = 'X'
The explain plan for it looks like this:
| 0 | SELECT STATEMENT | | 2 | 16 | 4 (0)| 00:00:01 | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10000 | 2 | 16 | 4 (0)| 00:00:01 | Q1,00 | P->S | QC (RAND) |
| 3 | PX BLOCK ITERATOR | | 2 | 16 | 4 (0)| 00:00:01 | Q1,00 | PCWC | |
|* 4 | INDEX FAST FULL SCAN| TABLE_UNIQUE_ROLES_KEY1 | 2 | 16 | 4 (0)| 00:00:01 | Q1,00 | PCWP | |
It appears to me that Oracle is choosing a PARALLEL execution plan, but I do not understand why it would do that. It significantly slows down the query, and if I run
SELECT /*+ NO_PARALLEL */
A
FROM table
where B = 'X'
it runs fast, and the plan is:
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 16 | 4 (0)| 00:00:01 |
|* 1 | INDEX FAST FULL SCAN| TABLE_UNIQUE_ROLES_KEY1 | 2 | 16 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
What causes parallelism in the first scenario?
The degree on the table is set to 1 but the degree on the TABLE_UNIQUE_ROLES_KEY1 (and the other indexes on the table) are all set to 4. I don't have privileges to query v$parameter so I can't see how parallelism is configured for the database.
TABLE_UNIQUE_ROLES_KEY1 is a covering index for the query-- it is defined on the columns (a, b, c, d) where a is the column I'm selecting, b is the column that I'm filtering on and c and d are not involved in the query.

The immediate cause is that someone has told Oracle that it should use parallel query (the degree for your indexes has all been set to 4). That tends to make the optimizer think that full scanning the index in parallel will be relatively cheap which is why the optimizer is picking that plan.
You can change the parallel setting on your index
ALTER INDEX TABLE_UNIQUE_ROLES_KEY1 NOPARALLEL
which should stop the optimizer from choosing this plan (you may have to set other indexes to noparallel as well to prevent the optimizer from picking a different index to full scan in parallel). But I'd hesitate to do that until I understood what person or process set the degree on your indexes to 4-- if you don't understand the root cause, it's likely that you'll end up either breaking something else or in an endless battle where that person/ process sets your indexes to use parallelism and you set them back.
There are two likely candidates for what caused the indexes to have a degree of 4. Either someone (a developer or a DBA) was trying to get parallel query to kick in for some other query, or the DBA is running an (almost certainly unnecessary) script that periodically rebuilds indexes in parallel, without realizing that this changes the degree setting on the index and makes it likely that parallel query kicks in. So you probably need to have a chat with the other developers and/or the other DBAs to figure out whether setting the index to noparallel will negatively affect them and whether there are other processes that will be changing the setting on you.
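The degree settings, and a hint about when the indexes were last touched by DDL, can be read from the data dictionary without access to v$parameter. In this sketch MY_TABLE is a hypothetical stand-in for the real table name. If the table belongs to another schema, the ALL_ views with an OWNER filter would be needed instead.

```sql
-- Degree recorded on the table and on each of its indexes.
SELECT table_name, degree
FROM   user_tables
WHERE  table_name = 'MY_TABLE';

SELECT index_name, degree
FROM   user_indexes
WHERE  table_name = 'MY_TABLE';

-- LAST_DDL_TIME changes when an index is rebuilt or altered,
-- which can help spot a periodic rebuild job.
SELECT object_name, last_ddl_time
FROM   user_objects
WHERE  object_type = 'INDEX'
AND    object_name = 'TABLE_UNIQUE_ROLES_KEY1';
```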
