Oracle: How to implement an *almost* deterministic function? [duplicate]

This question already has answers here:
Is there a PL/SQL pragma similar to DETERMINISTIC, but for the scope of one single SQL SELECT?
(4 answers)
Closed 8 years ago.
I have a project which, in essence, is a batch of computations. It depends on quite a few parameters that, while they are not constants (they might change over time), are never going to change within the context of a single batch.
To make myself clear, think of a VAT rate: it might change over time, but when one closes an accounting period, it behaves like a constant as far as the closing itself is concerned.
Because these parameters are all over the place, I would like to find a way to limit DB lookups as much as possible. Ideally, I would implement a DETERMINISTIC function, but this is out of the question, as the relevant documentation makes clear.
Any ideas / suggestions?
Thank you in advance.
EDIT:
Also keep in mind that these values are stored in the database - we keep the VAT rate, for instance, so that we know its value at any given point in time. Though it wouldn't be expected, it is possible that a batch concerning some previous period will run again - and it will need to know the value of its parameters as they were then.
A DETERMINISTIC function, with its guarantee of consistent results (the same input always gives the same output), is what I would use if these values were constants and I didn't want to keep track of them. But the documentation states clearly that a function that does DB lookups must never be DETERMINISTIC.

You can't create an "almost" deterministic function, but you can create a deterministic function if you call it correctly. If we assume you're creating a simple function to calculate the amount of VAT, you can do it in two ways; firstly by referencing the table directly in the function:
create or replace function calculate_vat (
P_Sale_Value in number ) return number is
l_vat number;
begin
select trunc(vat_rate * P_Sale_Value, 2) into l_vat
from vat_table
where ... ;
return l_vat;
end;
/
This would be called something like this:
select sale_value, calculate_vat(sale_value)
from sales_table
You cannot create this function as deterministic because the value in the table might change. As the documentation says:
Do not specify this clause to define a function that uses package variables or that accesses the database in any way that might affect the return result of the function
However, you can create the function differently if you pass in the VAT value as a parameter:
create or replace function calculate_vat (
P_Sale_Value in number
, P_VAT_Rate in number
) return number deterministic is
begin
return trunc(P_VAT_Rate * P_Sale_Value, 2);
end;
/
You can then call it with a JOIN on the VAT table, which gives you a valid, deterministic function.
select s.sale_value, calculate_vat(s.sale_value, v.vat_rate)
from sales_table s
join vat_table v
on ...
where ...

Related

Is there a way to tag a Snowflake view as "safe" for result reuse?

Reading the Snowflake documentation for Using Persisted Query Results, one of the conditions that must be met for result reuse is the following:
The query does not include user-defined functions (UDFs) or external functions.
After some experiments with this trivial UDF and view:
create function trivial_function(x number)
returns number
as
$$
x
$$
;
create view redo as select trivial_function($1) as col1, $2 as col2
from values (1,2), (3,4);
I've verified that this condition also applies to queries over views that use UDFs in their definitions.
The thing is, I have a complex view that would benefit much from result reuse, but employs a lot of UDFs for the sake of clarity. Some UDFs could be inlined, but that would make the view much more difficult to read. And some UDFs (those written in Javascript) are impossible to inline.
And yet, I know that all the UDFs are "pure" in the functional programming sense: for the same inputs, they always return the same outputs. They don't check the current timestamp, generate random values, or reference some other table that might change between invocations.
Is there some way to "convince" the query planner that a view is safe for result reuse, despite the presence of UDFs?
There is a parameter called IMMUTABLE:
CREATE FUNCTION
VOLATILE | IMMUTABLE
Specifies the behavior of the UDF when returning results:
VOLATILE: UDF might return different values for different rows, even for the same input (e.g. due to non-determinism and statefulness).
IMMUTABLE: UDF assumes that the function, when called with the same inputs, will always return the same result. This guarantee is not checked. Specifying IMMUTABLE for a UDF that returns different values for the same input will result in undefined behavior.
Default: VOLATILE
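Applied to the trivial UDF from the question, that would look something like this (a minimal sketch; whether the persisted result is then actually reused for queries over the view is something you would still need to verify in your own account):
create or replace function trivial_function(x number)
returns number
immutable
as
$$
x
$$
;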

pl/sql query optimization with function call in where clause

I am trying to optimize a query where I am using a function() call in the where clause.
The function() simply changes the timezone of the date.
When I call the function as part of the SELECT, it executes extremely fast (< 0.09 sec against table of many hundreds of thousands of rows)
select
id,
fn_change_timezone (date_time, 'UTC', 'US/Central') AS tz_date_time,
value
from a_table_view
where id = 'keyvalue'
and date_time = to_date('01-10-2014','mm-dd-yyyy')
However, this version runs "forever" [meaning I stop it after umpteen minutes]
select id, date_time, value
from a_table_view
where id = 'keyvalue'
and fn_change_timezone (date_time, 'UTC', 'US/Central') = to_date('01-10-2014','mm-dd-yyyy')
(I know I'd have to change the date being compared, its just for example)
So my question is two-fold:
If the function is so fast outside of the where clause, why is it so much slower than, say, using TRUNC() or other functions? (Obviously trunc() doesn't do a table lookup like my function does - but still, the function is very fast outside the where clause.)
What are alternate ways of accomplishing this outside of the where clause ?
I tried this as an alternative, which did not seem any better, it still ran until I stopped the query:
select
tz.date_time,
v.id,
v.value
from
(select
fn_change_timezone(to_date('01/10/2014-00:00:00', 'mm/dd/yyyy-hh24:mi:ss'), 'UTC', 'US/Central') as date_time
from dual
) tz
inner join
(
select
id,
fn_change_timezone (date_time, 'UTC', 'US/Central') AS v_date_time,
value
from a_table_view
where id = 'keyvalue'
) v ON
v.v_date_time = tz.date_time
Hopefully I am explaining the issue well.
There are at least four potential issues with using functions in the WHERE clause:
Functions applied to a column may prevent the use of indexes on that column. A function-based index can solve this issue (see the sketch below).
Functions may prevent partition pruning. Hard-coding values or maybe virtual column partitioning are possible solutions, although neither is likely helpful in this case.
Functions may run slowly. Even if the function itself is cheap, it is often very expensive to switch between SQL and PL/SQL. Some possible solutions are DETERMINISTIC, PARALLEL_ENABLE, function result caching, defining the logic purely in SQL, or, with 12c, defining the function inline in the SQL statement itself.
Functions may cause bad cardinality estimates. It's hard enough for the optimizer to guess the result of normal conditions, adding procedural code makes it even more difficult. Using ASSOCIATE STATISTICS it is possible to provide some information to the optimizer about the cost and cardinality of the function.
Without more information, such as an explain plan, it is difficult to know what the specific issue is with this query.
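By way of illustration only, here is a minimal sketch of the first and fourth points. It assumes the view sits over a hypothetical table a_table, that fn_change_timezone could honestly be declared DETERMINISTIC (which, since it does its own table lookup, it currently cannot be), and the cost and selectivity figures are made-up placeholders:
-- Function-based index: only legal if the function is declared DETERMINISTIC.
create index a_table_tz_ix
on a_table (fn_change_timezone(date_time, 'UTC', 'US/Central'));

-- Give the optimizer a cost and selectivity for the function.
associate statistics with functions fn_change_timezone
default cost (10000, 5, 0) default selectivity 1;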
Function calls in the WHERE clause are a Bad Thing. The problem is that the function may be called for every row in the table, which may be many more than the selected set. This can be a real performance killer (don't ask me how I know :-). In the first version with the function call in the SELECT list the function will only be called when a row has been chosen and is being added to the result set - in the second version the function may well be called for every row in the table. Also, depending on the version of Oracle you're using there may be significant overhead to calling a user function from SQL, but I think this penalty has been largely eliminated in versions since 10g.
Best of luck.
Share and enjoy.

Why is PostgreSQL calling my STABLE/IMMUTABLE function multiple times?

I'm trying to optimise a complex query in PostgreSQL 9.1.2, which calls some functions. These functions are marked STABLE or IMMUTABLE and are called several times with the same arguments in the query. I assumed PostgreSQL would be smart enough to only call them once for each set of inputs - after all, that's the point of STABLE and IMMUTABLE, isn't it? But it appears that the functions are being called multiple times. I wrote a simple function to test this, which confirms it:
CREATE OR REPLACE FUNCTION test_multi_calls1(one integer)
RETURNS integer
AS $BODY$
BEGIN
RAISE NOTICE 'Called with %', one;
RETURN one;
END;
$BODY$ LANGUAGE plpgsql IMMUTABLE;
WITH data AS
(
SELECT 10 AS num
UNION ALL SELECT 10
UNION ALL SELECT 20
)
SELECT test_multi_calls1(num)
FROM data;
Output:
NOTICE: Called with 10
NOTICE: Called with 10
NOTICE: Called with 20
Why is this happening and how can I get it to only execute the function once?
The following extension of your test code is informative:
CREATE OR REPLACE FUNCTION test_multi_calls1(one integer)
RETURNS integer
AS $BODY$
BEGIN
RAISE NOTICE 'Immutable called with %', one;
RETURN one;
END;
$BODY$ LANGUAGE plpgsql IMMUTABLE;
CREATE OR REPLACE FUNCTION test_multi_calls2(one integer)
RETURNS integer
AS $BODY$
BEGIN
RAISE NOTICE 'Volatile called with %', one;
RETURN one;
END;
$BODY$ LANGUAGE plpgsql VOLATILE;
WITH data AS
(
SELECT 10 AS num
UNION ALL SELECT 10
UNION ALL SELECT 20
)
SELECT test_multi_calls1(num)
FROM data
where test_multi_calls2(40) = 40
and test_multi_calls1(30) = 30
OUTPUT:
NOTICE: Immutable called with 30
NOTICE: Volatile called with 40
NOTICE: Immutable called with 10
NOTICE: Volatile called with 40
NOTICE: Immutable called with 10
NOTICE: Volatile called with 40
NOTICE: Immutable called with 20
Here we can see that while in the select-list the immutable function was called multiple times, in the where clause it was called once, while the volatile was called thrice.
The important thing isn't that PostgreSQL will only call a STABLE or IMMUTABLE function once with the same data - your example clearly shows that this is not the case - it's that it may call it only once. Or perhaps it will call it twice when it would have to call a volatile version 50 times, and so on.
There are different ways in which stability and immutability can be taken advantage of, with different costs and benefits. To provide the sort of saving you are suggesting it should make with select-lists, it would have to cache the results and then look up each argument (or list of arguments) in this cache before either returning the cached result or calling the function on a cache-miss. This would be more expensive than calling your function, even in the case where there was a high percentage of cache-hits (there could be 0% cache hits, meaning this "optimisation" did extra work for absolutely no gain). It could store maybe just the last parameter and result, but again that could be completely useless.
This is especially so considering that stable and immutable functions are often the lightest functions.
With the where clause however, the immutability of test_multi_calls1 allows PostgreSQL to actually restructure the query from the plain meaning of the SQL given:
For every row calculate test_multi_calls1(30) and if the result is
equal to 30 continue processing the row in question
To a different query plan entirely:
Calculate test_multi_calls1(30) and if it is equal to 30 then
continue with the query otherwise return a zero row result-set without
any further calculation
This is the sort of use that PostgreSQL makes of STABLE and IMMUTABLE - not the caching of results, but the rewriting of queries into different queries which are more efficient but give the same results.
Note also that test_multi_calls1(30) is called before test_multi_calls2(40) no matter what order they appear in the where clause. This means that if the first call results in no rows being returned (replace = 30 with = 31 to test) then the volatile function won't be called at all - again, regardless of which is on which side of the and.
This particular sort of rewriting depends upon immutability or stability. With where test_multi_calls1(30) != num query re-writing will happen for immutable but not for merely stable functions. With where test_multi_calls1(num) != 30 it won't happen at all (multiple calls) though there are other optimisations possible:
Expressions containing only STABLE and IMMUTABLE functions can be used with index scans. Expressions containing VOLATILE functions cannot. The number of calls may or may not decrease, but much more importantly the results of the calls will then be used in a much more efficient way in the rest of the query (only really matters on large tables, but then it can make a massive difference).
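As a minimal illustration (big_table is a hypothetical table, reusing the test function from above): an expression index is only allowed on IMMUTABLE functions, and a predicate built from STABLE/IMMUTABLE expressions can then be satisfied by an index scan:
CREATE TABLE big_table (num integer);   -- hypothetical table for illustration
CREATE INDEX big_table_fn_idx ON big_table (test_multi_calls1(num));

-- This predicate can now use the index; with a VOLATILE function it could not:
SELECT * FROM big_table WHERE test_multi_calls1(num) = 10;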
In all, don't think of volatility categories in terms of memoisation, but rather in terms of giving PostgreSQL's query planner opportunities to restructure entire queries in ways that are logically equivalent (same results) but much more efficient.
According to the documentation, IMMUTABLE functions will return the same value given the same arguments. Since you are feeding in dynamic arguments (and not even the same ones), the optimizer has no reason to believe that it will get the same results and hence calls the function. A better question is: why is your query invoking the function multiple times if it doesn't need to?

How to inline a variable in PL/SQL?

The Situation
I have some trouble with my query execution plan for a medium-sized query over a large amount of data in Oracle 11.2.0.2.0. In order to speed things up, I introduced a range filter that does roughly something like this:
PROCEDURE DO_STUFF(
org_from VARCHAR2 := NULL,
org_to VARCHAR2 := NULL)
-- [...]
JOIN organisations org
ON (cust.org_id = org.id
AND ((org_from IS NULL) OR (org_from <= org.no))
AND ((org_to IS NULL) OR (org_to >= org.no)))
-- [...]
As you can see, I want to restrict the JOIN of organisations using an optional range of organisation numbers. Client code can call DO_STUFF with (supposed to be fast) or without (very slow) the restriction.
The Trouble
The trouble is, PL/SQL will create bind variables for the above org_from and org_to parameters, which is what I would expect in most cases:
-- [...]
JOIN organisations org
ON (cust.org_id = org.id
AND ((:B1 IS NULL) OR (:B1 <= org.no))
AND ((:B2 IS NULL) OR (:B2 >= org.no)))
-- [...]
The Workaround
Only in this case, I measured the query execution plan to be a lot better when I just inline the values, i.e. when the query executed by Oracle is actually something like
-- [...]
JOIN organisations org
ON (cust.org_id = org.id
AND ((10 IS NULL) OR (10 <= org.no))
AND ((20 IS NULL) OR (20 >= org.no)))
-- [...]
By "a lot", I mean 5-10x faster. Note that the query is executed very rarely, i.e. once a month. So I don't need to cache the execution plan.
My questions
How can I inline values in PL/SQL? I know about EXECUTE IMMEDIATE, but I would prefer to have PL/SQL compile my query, and not do string concatenation.
Did I just measure something that happened by coincidence or can I assume that inlining variables is indeed better (in this case)? The reason why I ask is because I think that bind variables force Oracle to devise a general execution plan, whereas inlined values would allow for analysing very specific column and index statistics. So I can imagine that this is not just a coincidence.
Am I missing something? Maybe there is an entirely other way to achieve query execution plan improvement, other than variable inlining (note I have tried quite a few hints as well but I'm not an expert on that field)?
In one of your comments you said:
"Also I checked various bind values.
With bind variables I get some FULL
TABLE SCANS, whereas with hard-coded
values, the plan looks a lot better."
There are two paths. If you pass in NULL for the parameters then you are selecting all records. Under those circumstances a Full Table Scan is the most efficient way of retrieving data. If you pass in values then indexed reads may be more efficient, because you're only selecting a small subset of the information.
When you formulate the query using bind variables the optimizer has to take a decision: should it presume that most of the time you'll pass in values or that you'll pass in nulls? Difficult. So look at it another way: is it more inefficient to do a full table scan when you only need to select a sub-set of records, or to do indexed reads when you need to select all records?
It seems as though the optimizer has plumped for full table scans as being the least inefficient operation to cover all eventualities.
Whereas when you hard code the values the Optimizer knows immediately that 10 IS NULL evaluates to FALSE, and so it can weigh the merits of using indexed reads to find the desired sub-set of records.
So, what to do? As you say this query is only run once a month I think it would only require a small change to business processes to have separate queries: one for all organisations and one for a sub-set of organisations.
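A minimal sketch of that split, reusing the << >> placeholders for the parts elided in the question (result_cur is a hypothetical ref cursor, and for brevity it assumes callers pass either both bounds or neither):
IF org_from IS NULL AND org_to IS NULL THEN
  OPEN result_cur FOR
    SELECT <<columns>>
    FROM <<some other table>> cust
    JOIN organisations org ON cust.org_id = org.id;
ELSE
  OPEN result_cur FOR
    SELECT <<columns>>
    FROM <<some other table>> cust
    JOIN organisations org ON cust.org_id = org.id
    WHERE org.no BETWEEN org_from AND org_to;
END IF;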
"Btw, removing the :R1 IS NULL clause
doesn't change the execution plan
much, which leaves me with the other
side of the OR condition, :R1 <=
org.no where NULL wouldn't make sense
anyway, as org.no is NOT NULL"
Okay, so the thing is you have a pair of bind variables which specify a range. Depending on the distribution of values, different ranges might suit different execution plans. That is, this range would (probably) suit an indexed range scan...
WHERE org.id BETWEEN 10 AND 11
...whereas this is likely to be more fitted to a full table scan (depending on the distribution of values, of course)...
WHERE org.id BETWEEN 10 AND 1199999
That is where Bind Variable Peeking comes into play.
Since the query plans are actually consistently different, that implies that the optimizer's cardinality estimates are off for some reason. Can you confirm from the query plans that the optimizer expects the conditions to be insufficiently selective when bind variables are used? Since you're using 11.2, Oracle should be using adaptive cursor sharing, so it shouldn't be a bind variable peeking issue (assuming you are calling the version with bind variables many times with different NO values in your testing).
Are the cardinality estimates on the good plan actually correct? I know you said that the statistics on the NO column are accurate but I would be suspicious of a stray histogram that may not be updated by your regular statistics gathering process, for example.
You could always use a hint in the query to force a particular index to be used (though using a stored outline or optimizer plan stability would be preferable from a long-term maintenance perspective). Any of those options would be preferable to resorting to dynamic SQL.
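For example (org_no_ix is a hypothetical index name):
SELECT /*+ INDEX(org org_no_ix) */ <<columns>>
FROM <<some other table>> cust
JOIN organisations org
ON (cust.org_id = org.id
AND ((org_from IS NULL) OR (org_from <= org.no))
AND ((org_to IS NULL) OR (org_to >= org.no)))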
One additional test to try, however, would be to replace the SQL 99 join syntax with Oracle's old syntax, i.e.
SELECT <<something>>
FROM <<some other table>> cust,
organisations org
WHERE cust.org_id = org.id
AND ( ((org_from IS NULL) OR (org_from <= org.no))
AND ((org_to IS NULL) OR (org_to >= org.no)))
That obviously shouldn't change anything, but there have been parser issues with the SQL 99 syntax so that's something to check.
It smells like Bind Peeking, but I am only on Oracle 10, so I can't claim the same issue exists in 11.
This looks a lot like a need for Adaptive Cursor Sharing, combined with SQLPlan stability.
I think what is happening is that the optimizer_capture_sql_plan_baselines parameter is true, and the same for optimizer_use_sql_plan_baselines. If this is true, the following is happening:
The first time that a query started it is parsed, it gets a new plan.
The second time, this plan is stored in the sql_plan_baselines as an accepted plan.
All following runs of this query use this plan, regardless of what the bind variables are.
If Adaptive Cursor Sharing is already active, the optimizer will generate a new/better plan and store it in the sql_plan_baselines, but it is not able to use it until someone accepts this newer plan as an acceptable alternative plan. Check dba_sql_plan_baselines and see if your query has entries with accepted = 'NO' and verified = null.
You can use dbms_spm.evolve to evolve the new plan and have it automatically accepted if the performance of the plan is at least 1.5 times better than without the new plan.
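A minimal sketch of that check-and-evolve step (the sql_handle literal is only a placeholder for whatever the first query returns for your statement):
-- Look for plans the optimizer generated but is not yet allowed to use.
select sql_handle, plan_name, enabled, accepted
from dba_sql_plan_baselines
where accepted = 'NO';

-- Ask SPM to verify the new plan and accept it if it is clearly better.
declare
  l_report clob;
begin
  l_report := dbms_spm.evolve_sql_plan_baseline(
                sql_handle => 'SQL_xxxxxxxxxxxxxxxx',  -- placeholder: handle from the query above
                verify     => 'YES',
                commit     => 'YES');
  dbms_output.put_line(dbms_lob.substr(l_report, 4000, 1));
end;
/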
I hope this helps.
I added this as a comment, but will offer up here as well. Hope this isn't overly simplistic, and looking at the detailed responses I may be misunderstanding the exact problem, but anyway...
Seems your organisations table has a column no (org.no) that is defined as a number. In your hardcoded example, you use numbers to do the compares.
JOIN organisations org
ON (cust.org_id = org.id
AND ((10 IS NULL) OR (10 <= org.no))
AND ((20 IS NULL) OR (20 >= org.no)))
In your procedure, you are passing in varchar2:
PROCEDURE DO_STUFF(
org_from VARCHAR2 := NULL,
org_to VARCHAR2 := NULL)
So to compare varchar2 to number, Oracle will have to do implicit conversions, and this may cause the full scans.
Solution: change the procedure to pass in numbers.
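In other words, something along these lines:
PROCEDURE DO_STUFF(
org_from NUMBER := NULL,
org_to NUMBER := NULL)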

Oracle (PL/SQL): Is UPDATE RETURNING concurrent?

I'm using a table with a counter to ensure unique ids on a child element.
I know it is usually better to use a sequence, but I can't use one because I have a lot of counters: a customer can create a couple of buckets and each of them needs its own counter, starting with 1 (it's a requirement, my customer needs "human readable" keys).
I'm creating records (let's call them items) that have a prikey (bucket_id, num = counter).
I need to guarantee that the bucket_id / num combination is unique (so using a sequence as prikey won't fix my problem).
The creation of rows doesn't happen in pl/sql, so I need to claim the number (btw: it's not against the requirements to have gaps).
My solution was:
UPDATE bucket
SET counter = counter + 1
WHERE id = param_id
RETURNING counter INTO num_forprikey;
PL/SQL returns var_num_forprikey so the item record can be created.
Question:
Will I always get unique num_forprikey even if the user concurrently asks for new items in a bucket?
Will I always get unique num_forprikey
even if the user concurrently asks for
new items in a bucket?
Yes, at least up to a point. The first user to issue that update gets a lock on the row. So no other user can successfully issue that same statement until user numero uno commits (or rolls back). So uniqueness is guaranteed.
Obviously, the cavil is regarding concurrency. Your access to the row is serialized, so there is no way for two users to get a new PRIKEY simultaneously. This is not necessarily a problem. It depends on how many users you have creating new Items, and how often they do it. One user peeling off numbers in the same session won't notice a thing.
I seem to recall this problem from many years back, working on (of all things) an INGRES database. There were no sequences in those days, so a lot of effort was put into finding the best scaling solution for this problem by the top INGRES minds of the day. I was fortunate enough to be working alongside them, so that even though my mind is pitifully smaller than any of theirs, proximity = residual effect and I retained something. This was one of the things. Let me see if I can remember.
1) for each counter you need a row in a work table.
2) each time you need a number
a) lock the row
b) update it
c) get its new value (you use returning for this which I avoid like the plague)
d) commit the update to release your lock on the row
The reason for the commit is to try to get some kind of scalability. There will always be a limit, but you do not serialize on getting a number for any length of time.
In the Oracle world we would improve the situation by using a function defined as an AUTONOMOUS_TRANSACTION in order to acquire the next number. If you think about it, this solution requires that gaps be allowed, which you said is OK. By committing the number update independently of the main transaction, you gain scalability but you introduce gaps.
You will have to accept the fact that your scalability will drop dramatically in this scenario. This is due to at least two reasons:
1) the update/select/commit sequence does its best to reduce the time during which the KEY row is locked, but it is still not zero. Under heavy load, you will serialize and eventually be limited.
2) you are committing on every key get. A commit is an expensive operation, requiring many memory and file management actions on the part of the database. This will limit you also.
In the end you are likely looking at a drop of three or more orders of magnitude in concurrent transaction load because you are not using sequences. I base this on my experience of the past.
But if your customer requires it, what can you do, right?
Good luck. I have not tested the code for syntax errors, I leave that to you.
create or replace function get_next_key (key_name_p in varchar2) return number is
pragma autonomous_transaction;
key_v number;
begin
update key_table set key = key + 1 where key_name = key_name_p;
select key into key_v from key_table where key_name = key_name_p;
commit;
return (key_v);
end;
/
show errors
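A quick way to exercise it (the counter name 'BUCKET_42' is hypothetical, and a matching row must already exist in key_table):
declare
  v_num number;
begin
  v_num := get_next_key('BUCKET_42');
  dbms_output.put_line(v_num);
end;
/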
You can still use sequences, just use the row_number() analytic function to please your users. I described it here in more detail: http://rwijk.blogspot.com/2008/01/sequence-within-parent.html
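In outline (items being a hypothetical child table whose surrogate key item_id comes from one ordinary sequence), the per-bucket, human-readable number is derived when the data is read:
select bucket_id,
       row_number() over (partition by bucket_id order by item_id) as num,
       item_id
from items;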
Regards,
Rob.
I'd figure out how to make sequences work. They're the only guarantee, though an exception clause could be coded: http://www.orafaq.com/forum/t/83382/0/
The benefit of sequences (and they can be created dynamically) is that you can specify NOCACHE and guarantee order.
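For example, a per-bucket sequence (hypothetical name) could look like this:
create sequence bucket_42_seq start with 1 increment by 1 nocache order;
select bucket_42_seq.nextval from dual;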
