Performance of User Defined Functions in SQL Server 2000 and 2005

We have two tables, Customer and CustomerEvent, each containing a few million rows. On SQL Server 2000, we deployed a UDF called fn_CustomerEvent which returns TRUE or FALSE based on two parameters, CustomerID and EventCode, e.g.
SELECT dbo.fn_CustomerEvent(1345678, 'Music')
The UDF code is:
CREATE FUNCTION [dbo].[fn_CustomerEvent](@CustomerID INT, @EviCode NVARCHAR(10))
RETURNS NVARCHAR(10)
AS
BEGIN
    DECLARE @List NVARCHAR(10)
    SELECT @List = CASE
                       WHEN COUNT(*) > 0 THEN 'TRUE'
                       ELSE 'FALSE'
                   END
    FROM CustomerEvent
    WHERE CustomerID = @CustomerID
      AND EviCode = @EviCode
    RETURN @List
END
The performance on SQL Server 2000 was great: a query returning the TOP 5000 rows finished within 3 seconds. For example,
SELECT TOP 5000
CustomerID, dbo.fn_CustomerEvent(1345678, 'Music')
FROM [Table1]
But now we are moving to SQL Server 2005. Same code, same UDF, but performance drops dramatically, from 3 seconds to 1 minute 20 seconds.
Can anyone point me in the right direction on where to start optimizing the performance?

The scalar UDF is evaluated for each row (i.e. 5000 times). You could either call it once and store the result in a variable
DECLARE @Result nvarchar(10)
SELECT @Result = dbo.fn_CustomerEvent(1345678, 'Music')
SELECT TOP 5000
CustomerID, @Result
FROM [Table1]
or you can use an inline TVF (and I would also use EXISTS instead of COUNT)
CREATE FUNCTION CustomerEvent (@CustomerID INT,
                               @EviCode NVARCHAR(10))
RETURNS TABLE
AS
RETURN
(SELECT CASE
          WHEN EXISTS(SELECT *
                      FROM CustomerEvent
                      WHERE CustomerID = @CustomerID
                        AND EviCode = @EviCode) THEN 'TRUE'
          ELSE 'FALSE'
        END AS Result) -- an inline TVF needs a name for every column
See "Scalar functions, inlining, and performance: An entertaining title for a boring post" for more about this technique.
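To make the "evaluated for each row" point concrete, here is a minimal sketch in Python that uses SQLite as a stand-in for SQL Server (an assumption made purely so the example runs anywhere; the two engines differ in many ways). The function fn_customer_event and the table table1 are hypothetical stand-ins that only count calls. It compares the call count when the scalar function sits in the select list with the count when the result is computed once and passed in as a value.

```python
import sqlite3


def count_udf_calls(n_rows=5000):
    """Return (per_row_calls, hoisted_calls) for the two query shapes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (customer_id INTEGER)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?)", ((i,) for i in range(n_rows))
    )

    calls = {"n": 0}

    def fn_customer_event(customer_id, evi_code):
        # Hypothetical stand-in for dbo.fn_CustomerEvent: it only counts calls.
        calls["n"] += 1
        return "TRUE"

    # Registered as non-deterministic (the default), so SQLite does not
    # hoist the constant-argument call out of the row loop.
    conn.create_function("fn_customer_event", 2, fn_customer_event)

    # Shape 1: the scalar function is in the select list.
    conn.execute(
        "SELECT customer_id, fn_customer_event(1345678, 'Music') FROM table1"
    ).fetchall()
    per_row_calls = calls["n"]

    # Shape 2: evaluate once, then pass the result in as a plain value.
    calls["n"] = 0
    result = conn.execute(
        "SELECT fn_customer_event(1345678, 'Music')"
    ).fetchone()[0]
    conn.execute("SELECT customer_id, ? FROM table1", (result,)).fetchall()
    hoisted_calls = calls["n"]

    conn.close()
    return per_row_calls, hoisted_calls


if __name__ == "__main__":
    print(count_udf_calls())
```

The absolute timings will not resemble SQL Server's, but the call counts show why hoisting the call helps when the arguments are constants.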

There is one big problem with UDFs: they don't work with indexes. If you want code re-use while maintaining performance, I normally build either a computed column (which can be indexed) or a view.

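CREATE FUNCTION CustomerEvent (@CustomerID INT,
                               @EviCode NVARCHAR(10))
RETURNS TABLE
AS
RETURN
(SELECT COALESCE((SELECT TOP 1 'TRUE' -- TOP 1 keeps the scalar subquery to a single row
                  FROM CustomerEvent
                  WHERE CustomerID = @CustomerID
                    AND EviCode = @EviCode)
                 , 'FALSE') AS Result) -- an inline TVF needs a name for every column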
Check the indexes, rebuild them, and update your statistics.


MIN function behavior changed on Oracle databases after SAS Upgrade to 9.4M7

I have a program that has been working for years. Today, we upgraded from SAS 9.4M3 to 9.4M7.
proc setinit
Current version: 9.04.01M7P080520
Since then, I am not able to get the same results as before the upgrade.
Please note that I am querying on Oracle databases directly.
Trying to replicate the issue with a minimal, reproducible SAS table example, I found that the issue disappears when querying a SAS table instead of the Oracle databases.
Let's say I have the following dataset:
data have;
infile datalines delimiter="|";
input name :$8. id $1. value :$8. t1 :$10.;
datalines;
Joe|A|TLO
Joe|B|IKSK
Joe|C|Yes
;
Using the temporary table:
proc sql;
create table want as
select name,
min(case when id = "A" then value else "" end) as A length 8
from have
group by name;
quit;
Results:
name A
Joe TLO
However, when running the very same query on the Oracle database directly, I get a missing value instead:
proc sql;
create table want as
select name,
min(case when id = "A" then value else "" end) as A length 8
from have_oracle
group by name;
quit;
name A
Joe
As per documentation, the min() function is behaving properly when used on the SAS table
The MIN function returns a missing value (.) only if all arguments are missing.
I believe this happens when Oracle doesn't understand the function that SAS is passing it: the min functions in SAS and Oracle are very different, and the Oracle equivalent of the SAS function would be LEAST().
So my guess is that the upgrade messed up how it translates the SAS min function to Oracle, but it remains a guess. Has anyone run into this type of behavior?
EDIT: @Richard's comment
options sastrace=',,,d' sastraceloc=saslog nostsuffix;
proc sql;
create table want as
select t1.name,
min(case when id = 'A' then value else "" end) as A length 8
from oracle_db.names t1 inner join oracle_db.ids t2 on (t1.tid = t2.tid)
group by t1.name;
ORACLE_26: Prepared: on connection 0
SELECT * FROM NAMES
ORACLE_27: Prepared: on connection 1
SELECT UI.INDEX_NAME, UIC.COLUMN_NAME FROM USER_INDEXES UI,USER_IND_COLUMNS UIC WHERE UI.TABLE_NAME='NAMES' AND
UIC.TABLE_NAME='NAMES' AND UI.INDEX_NAME=UIC.INDEX_NAME
ORACLE_28: Executed: on connection 1
SELECT statement ORACLE_27
ORACLE_29: Prepared: on connection 0
SELECT * FROM IDS
ORACLE_30: Prepared: on connection 1
SELECT UI.INDEX_NAME, UIC.COLUMN_NAME FROM USER_INDEXES UI,USER_IND_COLUMNS UIC WHERE UI.TABLE_NAME='IDS' AND
UIC.TABLE_NAME='IDS' AND UI.INDEX_NAME=UIC.INDEX_NAME
ORACLE_31: Executed: on connection 1
SELECT statement ORACLE_30
ORACLE_32: Prepared: on connection 0
select t1."NAME", MIN(case when t2."ID" = 'A' then t1."VALUE" else ' ' end) as A from
NAMES t1 inner join IDS t2 on t1."TID" = t2."TID" group by t1."NAME"
ORACLE_33: Executed: on connection 0
SELECT statement ORACLE_32
ACCESS ENGINE: SQL statement was passed to the DBMS for fetching data.
NOTE: Table WORK.SELECTED_ATTR created, with 1 row and 2 columns.
! quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.34 seconds
cpu time 0.09 seconds
Use the SASTRACE= system option to log SQL statements sent to the DBMS.
options SASTRACE=',,,d';
will provide the most detailed logging.
From the prepared statement you can see why you are getting a blank from the Oracle query.
select
t1."NAME"
, MIN ( case
when t2."ID" = 'A' then t1."VALUE"
else ' '
end
) as A
from
NAMES t1 inner join IDS t2 on t1."TID" = t2."TID"
group by
t1."NAME"
The SQL MIN () aggregate function will exclude null values from consideration.
In SAS SQL, a blank value is also interpreted as null.
In SAS your SQL query returns the min non-null value TLO
In the query as transformed for Oracle, the SAS blank '' becomes ' ', a single blank character, which is not null; thus ' ' < 'TLO' and you get the blank result.
The actual MIN you want to force in Oracle is min(case when id = "A" then value else null end), which @Tom has shown is possible by omitting the else clause.
The only way to see the actual difference is to run the query with trace in the prior SAS version, or if lucky, see the explanation in the (ignored by many) "What's New" documents.
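The difference between a NULL and a single blank in the ELSE branch can be reproduced outside SAS and Oracle. The following is a minimal sketch in Python using SQLite as a stand-in engine (an assumption for illustration only; unlike Oracle, SQLite does not treat '' as NULL, so only the NULL and ' ' cases are compared). The helper min_with_else is hypothetical.

```python
import sqlite3


def min_with_else(else_sql):
    """Run the MIN(CASE ...) query with the given ELSE expression; return A."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE have (name TEXT, id TEXT, value TEXT)")
    conn.executemany(
        "INSERT INTO have VALUES (?, ?, ?)",
        [("Joe", "A", "TLO"), ("Joe", "B", "IKSK"), ("Joe", "C", "Yes")],
    )
    # else_sql is a fixed literal supplied by the calls below, not user input.
    sql = (
        "SELECT name, MIN(CASE WHEN id = 'A' THEN value ELSE "
        + else_sql
        + " END) AS a FROM have GROUP BY name"
    )
    row = conn.execute(sql).fetchone()
    conn.close()
    return row[1]


if __name__ == "__main__":
    # NULL is skipped by the aggregate, a single blank is not.
    print(repr(min_with_else("NULL")))
    print(repr(min_with_else("' '")))
```

With NULL in the ELSE branch the aggregate only sees 'TLO'; with a single blank, the blank sorts before 'TLO' and wins.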
Why are you using ' ' or '' as the ELSE value? Perhaps Oracle is treating a string with blanks in it differently than a null string.
Why not use null in the ELSE clause?
or just leave off the ELSE clause and let it default to null?
libname mylib oracle .... ;
proc sql;
create table want as
select name
, min(case when id = "A" then value else null end) as A length 8
from mylib.have_oracle
group by name
;
quit;
Also try running the Oracle code yourself, instead of using implicit pass thru.
proc sql;
connect to oracle ..... ;
create table want as
select * from connection to oracle
(
select name,
min(case when id = 'A' then value else null end) as A
from have_oracle
group by name
)
;
quit;
When I try to reproduce this in Oracle I get the result you are looking for so I suspect it has something to do with SAS (which I'm not familiar with).
with t as (
select 'Joe' name, 'A' id, 'TLO' value from dual union all
select 'Joe' name, 'B' id, 'IKSK' value from dual union all
select 'Joe' name, 'C' id, 'Yes' value from dual
)
select name
, min(case when id = 'A' then value else '' end) as a
from t
group by name;
NAME A
---- ----
Joe TLO
Unrelated, if you are only interested in id = 'A' then a better query would be:
select name
, min(value) as a
from t
where id = 'A'
group by name;

Return Boolean value when table has data in the specified range

I need a query that returns a boolean indicating whether the table has data in the given range.
Assume table
Customer
[User ID, Name, Date, Products_Purchased]
I'm trying to do:
select case when exists(
select Date, count(*)
from Customer
where date between '2015-08-03' and '2015-08-05'
)
then cast(1 as BIT)
else cast(0 as BIT) end;
This throws an error near "select Date".
However, the weird part is that the inner query runs perfectly fine on its own.
I'm wondering if I'm missing something here!
What about something more straightforward e.g.
select case when count(*) >0 then 1 else 0 end as HIT
from ... where ...
That way you don't have to bother about Hive assuming that EXISTS implies a correlated sub-query, automagically translated into a MapJoin, i.e. a Java HashMap shuffled to the 2nd line of Mappers jobs, etc. Not exactly your use case.
Then it's not useful to compute the exact count, so the query could be refined as
select case when count(*) >0 then 1 else 0 end as HIT
from
(select ... from ... where ... limit 1) X
[Edit] There is no "bit" datatype in Hive. But the default "int" should be OK if you just want a return flag (zero / non-zero)
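The refined query shape (a COUNT over a subquery capped at one row) can be tried out locally. This is a minimal sketch in Python with SQLite standing in for Hive, which is an assumption for illustration; the table layout, sample rows, and the helper names has_rows_in_range and demo_connection are hypothetical.

```python
import sqlite3


def has_rows_in_range(conn, start, end):
    """Return 1 if any customer row has a date in [start, end], else 0."""
    sql = """
        SELECT CASE WHEN COUNT(*) > 0 THEN 1 ELSE 0 END AS hit
        FROM (SELECT 1 FROM customer
              WHERE date BETWEEN ? AND ?
              LIMIT 1) x
    """
    return conn.execute(sql, (start, end)).fetchone()[0]


def demo_connection():
    """Build an in-memory table with two sample rows (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (user_id INTEGER, name TEXT, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [(1, "Ann", "2015-08-04"), (2, "Bob", "2015-09-01")],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(has_rows_in_range(conn, "2015-08-03", "2015-08-05"))
    print(has_rows_in_range(conn, "2016-01-01", "2016-01-31"))
```

The inner LIMIT 1 means the outer COUNT never sees more than one row, so the flag is computed without counting the whole range.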

SQL Tuning for IN Clause

The below Teradata query is taking around 18 seconds to complete.
The highlighted values passed in the IN clause come from another Oracle database, so I am not able to implement a join with that table.
SELECT distinct sec.SerialNum esn, ef.EngineFamilyCd family, em.EngineModelCd model,
es.EngineSeriesCd series, sac.AircraftTailNum tailNumRef, sec.EnginePositionNum enginePosition,
o1.OrganizationId ownerOrgId, o2.OrganizationId operatorOrgId,
sec.EngineInstallationDttm installedDate, sec.EngineRemovalDttm removalDate,
sec.HardwareConfigNm hardwareConfig, sec.EngineControlNm engineControl,
sec.ApplicationSelectorNm appSelector, sec.EngineMonitorInd engineMonitorInd,
sec.EngineThrustRatingFctr engineThrustRating, sec.StatusDesc engineStatus, sec.n1modifiernum n1modifier
FROM DB_MASTER_BV.SZEngineCurrent sec,
DB_MASTER_BV.EngineSeries es,
DB_MASTER_BV.EngineModel em,
DB_Master_BV.EngineFamily ef,
DB_MASTER_BV.SZAircraftCurrent sac,
DB_MASTER_BV.Organization o1,
DB_MASTER_BV.Organization o2
WHERE sec.EngineSeriesCd = es.EngineSeriesCd
and es.EngineModelCd = em.EngineModelCd
and em.EngineFamilyCd = ef.EngineFamilyCd
and sec.MasterAircraftId = sac.MasterAircraftId
and o1.MasterOrganizationId = sec.OwnerMasterOrganizationId
and o2.MasterOrganizationId = sec.OperatorMasterOrganizationId
AND (sec.SerialNum in('733276','193283','690168','741471','876374','873383','193386','906397','804314','900116','785670','900399','724321','193488','811373','779917','193699','994688',
'779410','575169','A59299','900206','193297','575484','896359','367230','810105','876485','906385','876484','707149','811222','706801','193596','731949','697881',
'889697','804626','575194','707159','706129','900230','900231','706834','811352','900229','785748','193460','888221','906272','906266','906264','906263','994356',
'194431','731966','892417','811341','577413','741572','575564','889262','706956','876157','900257','900153','706958','706957','960436','892429','892427','900354',
'697138','645655','193352','994337','707189','697833','959190','900246','811317','577437','193643','697976','890692','193229','965579','900137','900135','894897',
'697723','193363','193367','785505','907077','959184','811311','706526','577302','706529','994332','702792','706663','779834','731931','960127','193371','876183',
'741563','193235','803843','577320','994318','907087','741460','907086','959170','994462','900464','193626','877503','643711','811202','811201','704585','193504',
'193500','875246','704876','725834','699783','699780','802380','900304','706885','906191','577773','959152','872574','811435','697388','699381','892485','577698',
'907035','811445','907039','894999','894857','894595','697273','894597','959139','577894','874898','706959','900424','193337','577697','907011','875696','699555',
'699554','575629','906149','906150','193452','962968','811264','811266','962970','875395','699543','575638','906153','857962','896247','858349','779746','906161',
'906928','802857','779640','193424','550309','424520','550305','575608','872517','906169','892196','811386','811385','906173','907220','959234','876666','959231',
'876662','893785','875914','802649','550218','550315','906111','741984','550319','906405','906501','550118','643371','785254','550116','550117','802946','906629',
'907145','550325','550324','906837','550320','906838','702591','550220','550227','906415','690289','906517','704416','731431','550125','959201','906413','994176',
'550333','550140','550337','891651','550141','550338','906746','907269','550132','550137','550138','892914','550342','906123','550153','550345','950923','906129',
'873188','906850','906953','690270','890713','645352','893127','697590','874826','424439','893126','907110','550144','856305','690269','892824','550256','550257',
'906867','907186','960852','720754','960851','906866','888607','805573','811530','960756','872352','550266','550267','550264','811518','888896','906730','994958',
'892247','960970','875186','906987','424124','550232','A59303','702660','875885','811609','888626','424219','906897','994981','731502','697496','695345','962996',
'894371','907153','805541','907154','424337','906613','906615','900512','906610','956141','994611','804582','994718','888648','575219','888756','896973','424395',
'872117','A59227','697616','731380','697614','900161','690410','994213','956155','956154','779492','994231','702876','577248','994727','193818','890879','722243',
'906499','577354','888560','645121','896972','960823','804279','900175','888853','193724','550285','550282','906469','994803','906466','888299','877141','890984',
'695688','994533','888327','A59348','A59346','994410','733116','550296','550290','550292','906478','731763','725658','896408','645145','994751','731654','740358',
'906441','550158','193849','906543','906448','994262','575824','424186','906345','643663','888305','906243','906244','702963','906453','906452','956119','906451',
'956116','950489','550166','906454','367457','896764','575833','994268','906252','994127','733236','906258','956123','550178','994777','956126','956127','956128',
'906786','906788','906687','643290','994631','956225','994632','888574','906365','804228','731599','643682','550182','804369','994784','550186','550183','888826',
'575127','906439','890482','906438','906691','890472','994509','193147','575718','804215','575276','994793','897257'))
and END(sec.EngineValidPd) is until_changed
order by esn
Also, if there are more than 1000 records, I am implementing the IN clause as follows:
AND (sec.SerialNum in( first 999 records) OR sec.SerialNum in( next 999 records)… OR sec.SerialNum in( remaining records))
Please suggest a solution that would be faster than the above query and that will not cause issues with more than 1000 records in the IN clause.
What is your Teradata release?
In TD14 there's a built-in table function to split a string of values, you can simply pass all values within a single string:
AND sec.SerialNum IN
(
SELECT token
FROM TABLE (STRTOK_SPLIT_TO_TABLE(1, '733276,193283,690168,741471,876374', ',')
RETURNS (outkey INTEGER,
tokennum INTEGER,
token VARCHAR(20) CHARACTER SET UNICODE)
) AS d
)

Oracle and possible constant predicates in "WHERE" clause

I have a common problem with Oracle, shown in the following example code:
create or replace procedure usp_test
(
p_customerId number,
p_eventTypeId number,
p_out OUT SYS_REFCURSOR
)
as
begin
open p_out for
select e.Id from eventstable e
where
(p_customerId is null or e.CustomerId = p_customerId)
and
(p_eventTypeId is null or e.EventTypeId = p_eventTypeId)
order by Id asc;
end usp_test;
The "OR" in "(p_customerId is null or e.CustomerId = p_customerId)" kills the procedure's performance, because the optimizer will not use the index on the "CustomerId" column optimally (I was hoping for an index seek) and scans instead of seeking. The index on "CustomerId" has plenty of distinct values.
When working with MSSQL 2008 R2 (latest SP) or MSSQL 2012 I can hint the query with "option(recompile)", which will:
Recompile just this query
Resolve values for all variables (they are known after the sproc is called)
Replace all resolved variables with constants and eliminate constant predicate parts
For example: if I pass p_customerId = 1000, then the "1000 is null" expression will always be false, so the optimizer will ignore it.
This will add some CPU overhead, but it is used mostly for rarely called massive reports procedures, so no problems here.
Is there any way to do that in Oracle? Dynamic-SQL is not an option.
Addition:
The same procedure without "p_customerId is null" and "p_eventTypeId is null" runs in ~0.041 seconds, while the one above runs in ~0.448 seconds (I have ~5.000.000 rows).
CREATE INDEX IX_YOURNAME1 ON eventstable (NVL(p_customerId, 'x'));
CREATE INDEX IX_YOURNAME2 ON eventstable (NVL(p_eventTypeId, 'x'));
create or replace procedure usp_test
(
p_customerId number,
p_eventTypeId number,
p_out OUT SYS_REFCURSOR
)
as
begin
open p_out for
select e.Id from eventstable e
where
(NVL(p_customerId, 'x') = e.CustomerId OR NVL(p_customerId, 'x') = 'x')
AND (NVL(p_eventTypeId, 'x') = e.EventTypeId OR NVL(p_eventTypeId, 'x') = 'x')
order by Id asc;
end usp_test;
One column index can't help as it's not stored in index definition.
Is creating an index on (CustomerId, EventTypeId, Id) allowed? That way all the needed columns are in the index.

How to put more than 1000 values into an Oracle IN clause [duplicate]

Is there any way to get around the Oracle 10g limitation of 1000 items in a static IN clause? I have a comma-delimited list of many IDs that I want to use in an IN clause. Sometimes this list can exceed 1000 items, at which point Oracle throws an error. The query is similar to this...
select * from table1 where ID in (1,2,3,4,...,1001,1002,...)
Put the values in a temporary table and then do a select where id in (select id from temptable)
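A minimal sketch of the temporary-table approach from client code, written in Python with SQLite standing in for Oracle (an assumption so the example is self-contained; with Oracle you would use your driver and typically a global temporary table). The names temp_ids, select_by_ids, and demo_connection are hypothetical.

```python
import sqlite3


def select_by_ids(conn, ids):
    """Return rows of table1 whose id is in ids, via a temporary table."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS temp_ids (id INTEGER PRIMARY KEY)"
    )
    conn.execute("DELETE FROM temp_ids")  # start from an empty id list
    # One bound parameter per row, so the list length is not limited
    # by the size of a literal IN list. Duplicate ids are skipped.
    conn.executemany(
        "INSERT OR IGNORE INTO temp_ids (id) VALUES (?)",
        ((i,) for i in ids),
    )
    return conn.execute(
        "SELECT * FROM table1 WHERE id IN (SELECT id FROM temp_ids) "
        "ORDER BY id"
    ).fetchall()


def demo_connection(n_rows=5000):
    """Build an in-memory table1 with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?)",
        ((i, f"row{i}") for i in range(1, n_rows + 1)),
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(len(select_by_ids(conn, range(1, 2501))))
```

The query text stays the same no matter how many ids are supplied, which also avoids re-parsing a different statement for every list length.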
select column_X, ... from my_table
where ('magic', column_X ) in (
('magic', 1),
('magic', 2),
('magic', 3),
('magic', 4),
...
('magic', 99999)
) ...
I am almost sure you can split values across multiple INs using OR:
select * from table1 where ID in (1,2,3,4,...,1000) or
ID in (1001,1002,...,2000)
You may try to use the following form:
select * from table1 where ID in (1,2,3,4,...,1000)
union all
select * from table1 where ID in (1001,1002,...)
Where do you get the list of ids from in the first place? Since they are IDs in your database, did they come from some previous query?
When I have seen this in the past it has been because:-
a reference table is missing and the correct way would be to add the new table, put an attribute on that table and join to it
a list of ids is extracted from the database, and then used in a subsequent SQL statement (perhaps later or on another server or whatever). In this case, the answer is to never extract it from the database. Either store in a temporary table or just write one query.
I think there may be better ways to rework this code than just getting this SQL statement to work. If you provide more details you might get some ideas.
Use ...from table(... :
create or replace type numbertype
as object
(nr number(20,10) )
/
create or replace type number_table
as table of numbertype
/
create or replace procedure tableselect
( p_numbers in number_table
, p_ref_result out sys_refcursor)
is
begin
open p_ref_result for
select *
from employees , (select /*+ cardinality(tab 10) */ tab.nr from table(p_numbers) tab) tbnrs
where id = tbnrs.nr;
end;
/
This is one of the rare cases where you need a hint, else Oracle will not use the index on column id. One of the advantages of this approach is that Oracle doesn't need to hard parse the query again and again. Using a temporary table is slower most of the time.
edit 1 simplified the procedure (thanks to jimmyorr) + example
create or replace procedure tableselect
( p_numbers in number_table
, p_ref_result out sys_refcursor)
is
begin
open p_ref_result for
select /*+ cardinality(tab 10) */ emp.*
from employees emp
, table(p_numbers) tab
where tab.nr = id;
end;
/
Example:
set serveroutput on
create table employees ( id number(10),name varchar2(100));
insert into employees values (3,'Raymond');
insert into employees values (4,'Hans');
commit;
declare
l_number number_table := number_table();
l_sys_refcursor sys_refcursor;
l_employee employees%rowtype;
begin
l_number.extend;
l_number(1) := numbertype(3);
l_number.extend;
l_number(2) := numbertype(4);
tableselect(l_number, l_sys_refcursor);
loop
fetch l_sys_refcursor into l_employee;
exit when l_sys_refcursor%notfound;
dbms_output.put_line(l_employee.name);
end loop;
close l_sys_refcursor;
end;
/
This will output:
Raymond
Hans
I wound up here looking for a solution as well.
Depending on the high-end number of items you need to query against, and assuming your items are unique, you could split your query into batched queries of 1000 items each, and combine the results on your end instead (pseudocode here):
//remove dupes
items = items.RemoveDuplicates();
//how to break the items into 1000 item batches
batches = new batch list;
batch = new batch;
for (int i = 0; i < items.Count; i++)
{
    if (batch.Count == 1000)
    {
        batches.Add(batch);
        //start a fresh batch; clearing the old one would also empty the one just added
        batch = new batch;
    }
    batch.Add(items[i]);
    if (i == items.Count - 1)
    {
        //add the final batch (it has at most 1000 items).
        batches.Add(batch);
    }
}
// now go query the db for each batch
results = new results;
foreach(batch in batches)
{
    results.Add(query(batch));
}
This may be a good trade-off in the scenario where you don't typically have over 1000 items, as having over 1000 items would be your "high end" edge-case scenario. For example, in the event that you have 1500 items, two queries of (1000, 500) wouldn't be so bad. This also assumes that each query isn't particularly expensive in its own right.
This wouldn't be appropriate if your typical number of expected items got to be much larger - say, in the 100000 range - requiring 100 queries. If so, then you should probably look more seriously into using the global temporary tables solution provided above as the most "correct" solution. Furthermore, if your items are not unique, you would need to resolve duplicate results in your batches as well.
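The batching idea above can be written as a short runnable sketch in Python. Everything here is an assumption for illustration: batched and query_in_batches are hypothetical helper names, and run_query stands for whatever function sends one IN-list query to the database and returns its rows.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Sequence[T], size: int = 1000) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def query_in_batches(
    items: Iterable[T],
    run_query: Callable[[Sequence[T]], Iterable[R]],
    size: int = 1000,
) -> List[R]:
    """Run `run_query` once per batch and concatenate the results."""
    # Drop duplicates while keeping first-seen order, so that no item
    # is queried twice across batches.
    unique_items = list(dict.fromkeys(items))
    results: List[R] = []
    for batch in batched(unique_items, size):
        results.extend(run_query(batch))
    return results


if __name__ == "__main__":
    # Fake query that just echoes the batch back, to show the batch sizes.
    sizes = []

    def fake_query(batch):
        sizes.append(len(batch))
        return list(batch)

    query_in_batches(range(1500), fake_query)
    print(sizes)
```

Slicing produces a new list per batch, so there is no shared batch object to clear between iterations.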
Yes, this is a very weird situation in Oracle.
If you specify 2000 ids inside the IN clause, it will fail.
This fails:
select ...
where id in (1,2,....2000)
But if you simply put the 2000 ids in another table (a temp table, for example), the query below will work:
select ...
where id in (select userId
from temptable_with_2000_ids )
Alternatively, you can split the ids into groups of 1000 and execute the query group by group.
Here is some Perl code that tries to work around the limit by creating an inline view and then selecting from it. The statement text is compressed by using rows of twelve items each instead of selecting each item from DUAL individually, then uncompressed by unioning together all columns. UNION or UNION ALL in decompression should make no difference here as it all goes inside an IN which will impose uniqueness before joining against it anyway, but in the compression, UNION ALL is used to prevent a lot of unnecessary comparing. As the data I'm filtering on are all whole numbers, quoting is not an issue.
#
# generate the innards of an IN expression with more than a thousand items
#
use English '-no_match_vars';
sub big_IN_list{
@_ < 13 and return join ', ',@_;
my $padding_required = (12 - (@_ % 12)) % 12;
# get first dozen and make length of @_ an even multiple of 12
my ($a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l) = splice @_,0,12, ( ('NULL') x $padding_required );
my @dozens;
local $LIST_SEPARATOR = ', '; # how to join elements within each dozen
while(@_){
push @dozens, "SELECT @{[ splice @_,0,12 ]} FROM DUAL"
};
$LIST_SEPARATOR = "\n union all\n "; # how to join @dozens
return <<"EXP";
WITH t AS (
select $a A, $b B, $c C, $d D, $e E, $f F, $g G, $h H, $i I, $j J, $k K, $l L FROM DUAL
union all
@dozens
)
select A from t union select B from t union select C from t union
select D from t union select E from t union select F from t union
select G from t union select H from t union select I from t union
select J from t union select K from t union select L from t
EXP
}
One would use that like so:
my $bases_list_expr = big_IN_list(list_your_bases());
$dbh->do(<<"UPDATE");
update bases_table set belong_to = 'us'
where id in ($bases_list_expr)
UPDATE
Instead of using an IN clause, can you try a JOIN with the other table, the one the IDs are fetched from? That way we don't need to worry about the limit. Just a thought from my side.
Instead of SELECT * FROM table1 WHERE ID IN (1,2,3,4,...,1000);
Use this :
SELECT * FROM table1 WHERE ID IN (SELECT rownum AS ID FROM dual connect BY level <= 1000);
*Note that you need to be sure the ID does not refer to any other foreign IDs if this is a dependency. To ensure that only existing IDs are used:
SELECT * FROM table1 WHERE ID IN (SELECT distinct(ID) FROM tablewhereidsareavailable);
Cheers
