MIN function behavior changed on Oracle databases after SAS Upgrade to 9.4M7 - oracle

I have a program that has been working for years. Today, we upgraded from SAS 9.4M3 to 9.4M7.
proc setinit
Current version: 9.04.01M7P080520
Since then, I am not able to get the same results as before the upgrade.
Please note that I am querying on Oracle databases directly.
Trying to replicate the issue with a minimal, reproducible SAS table example, I found that the issue disappear when querying on a SAS table instead of on Oracle databases.
Let's say I have the following dataset:
data have;
infile datalines delimiter="|";
input name :$8. id $1. value :$8. t1 :$10.;
datalines;
Joe|A|TLO
Joe|B|IKSK
Joe|C|Yes
;
Using the temporary table:
proc sql;
create table want as
select name,
min(case when id = "A" then value else "" end) as A length 8
from have
group by name;
quit;
Results:
name A
Joe TLO
However, when running the very same query on the oracle database directly I get a missing value instead:
proc sql;
create table want as
select name,
min(case when id = "A" then value else "" end) as A length 8
from have_oracle
group by name;
quit;
name A
Joe
As per documentation, the min() function is behaving properly when used on the SAS table
The MIN function returns a missing value (.) only if all arguments are missing.
I believe this happens when Oracle don't understand the function that SAS is passing it - the min functions in SAS and Oracle are very different and the equivalent in SAS would be LEAST().
So my guess is that the upgrade messed up how is translates the SAS min function to Oracle, but it remains a guess. Does anyone ran into this type of behavior?
EDIT: #Richard's comment
options sastrace=',,,d' sastraceloc=saslog nostsuffix;
proc sql;
create table want as
select t1.name,
min(case when id = 'A' then value else "" end) as A length 8
from oracle_db.names t1 inner join oracle_db.ids t2 on (t1.tid = t2.tid)
group by t1.name;
ORACLE_26: Prepared: on connection 0
SELECT * FROM NAMES
ORACLE_27: Prepared: on connection 1
SELECT UI.INDEX_NAME, UIC.COLUMN_NAME FROM USER_INDEXES UI,USER_IND_COLUMNS UIC WHERE UI.TABLE_NAME='NAMES' AND
UIC.TABLE_NAME='NAMES' AND UI.INDEX_NAME=UIC.INDEX_NAME
ORACLE_28: Executed: on connection 1
SELECT statement ORACLE_27
ORACLE_29: Prepared: on connection 0
SELECT * FROM IDS
ORACLE_30: Prepared: on connection 1
SELECT UI.INDEX_NAME, UIC.COLUMN_NAME FROM USER_INDEXES UI,USER_IND_COLUMNS UIC WHERE UI.TABLE_NAME='IDS' AND
UIC.TABLE_NAME='IDS' AND UI.INDEX_NAME=UIC.INDEX_NAME
ORACLE_31: Executed: on connection 1
SELECT statement ORACLE_30
ORACLE_32: Prepared: on connection 0
select t1."NAME", MIN(case when t2."ID" = 'A' then t1."VALUE" else ' ' end) as A from
NAMES t1 inner join IDS t2 on t1."TID" = t2."TID" group by t1."NAME"
ORACLE_33: Executed: on connection 0
SELECT statement ORACLE_32
ACCESS ENGINE: SQL statement was passed to the DBMS for fetching data.
NOTE: Table WORK.SELECTED_ATTR created, with 1 row and 2 columns.
! quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.34 seconds
cpu time 0.09 seconds

Use the SASTRACE= system option to log SQL statements sent to the DBMS.
options SASTRACE=',,,d';
will provide the most detailed logging.
From the prepared statement you can see why you are getting a blank from the Oracle query.
select
t1."NAME"
, MIN ( case
when t2."ID" = 'A' then t1."VALUE"
else ' '
end
) as A
from
NAMES t1 inner join IDS t2 on t1."TID" = t2."TID"
group by
t1."NAME"
The SQL MIN () aggregate function will exclude null values from consideration.
In SAS SQL, a blank value is also interpreted as null.
In SAS your SQL query returns the min non-null value TLO
In Oracle transformed query, the SAS blank '' is transformed to ' ' a single blank character, which is not-null, and thus ' ' < 'TLO' and you get the blank result.
The actual MIN you want to force in Oracle is min(case when id = "A" then value else null end) which #Tom has shown is possible by omitting the else clause.
The only way to see the actual difference is to run the query with trace in the prior SAS version, or if lucky, see the explanation in the (ignored by many) "What's New" documents.

Why are you using ' ' or '' as the ELSE value? Perhaps Oracle is treating a string with blanks in it differently than a null string.
Why not use null in the ELSE clause?
or just leave off the ELSE clause and let it default to null?
libname mylib oracle .... ;
proc sql;
create table want as
select name
, min(case when id = "A" then value else null end) as A length 8
from mylib.have_oracle
group by name
;
quit;
Also try running the Oracle code yourself, instead of using implicit pass thru.
proc sql;
connect to oracle ..... ;
create table want as
select * from connection to oracle
(
select name,
min(case when id = "A" then value else null end) as A length 8
from have_oracle
group by name
)
;
quit;

When I try to reproduce this in Oracle I get the result you are looking for so I suspect it has something to do with SAS (which I'm not familiar with).
with t as (
select 'Joe' name, 'A' id, 'TLO' value from dual union all
select 'Joe' name, 'B' id, 'IKSK' value from dual union all
select 'Joe' name, 'C' id, 'Yes' value from dual
)
select name
, min(case when id = 'A' then value else '' end) as a
from t
group by name;
NAME A
---- ----
Joe TLO
Unrelated, if you are only interested in id = 'A' then a better query would be:
select name
, min(value) as a
from t
where id = 'A'
group by name;

Related

duplicates with connect by in oracle

select RP.COUNTRYID,RP.PRDCODE,
RP.REPID,
RP.CHANNELID,
RP.CUSTOMERID,
RP.DIVISION,
RP.WWCOGS_GAUSS,
RP.WWCOGS_SAP,
RP.WWCOGS_BusLine,
RP.CURRID ,
ADD_MONTHS(to_date('01-01-'||rp.year,'DD-MM-YYYY'),level-1) KFDATE
from
(SELECT CP.COUNTRYID,
CP.PRDCODE,
CP.REPID,
CP.CHANNELID,
CP.CUSTOMERID,
IP.DIVISION,
CP.WWCOGS_GAUSS,
CP.WWCOGS_SAP,
CP.WWCOGS_BusLine,
CP.year,
CP.CURRID from
(select distinct IBC.COUNTRYID ,
decode(ls.dmdunit,null,mwa.ProductName,ls.dmdunit) PRDCODE,
ReportingUnit REPID,
'99' CHANNELID,
IBC.COUNTRYID CUSTOMERID ,
(
CASE
WHEN MWA.COGSSourceCF LIKE '%Gauss%'
THEN MWA.COGSPriceCF * MWA.ExchRate
ELSE NULL
END) WWCOGS_GAUSS,
(
CASE
WHEN MWA.COGSSourceCF LIKE '%SAP%'
THEN MWA.COGSPriceCF * MWA.ExchRate
ELSE NULL
END) WWCOGS_SAP,
(
CASE
WHEN MWA.COGSSourceCF NOT LIKE '%Gauss%'
AND MWA.COGSSourceCF NOT LIKE '%SAP%'
THEN MWA.COGSPriceCF * MWA.ExchRate
ELSE NULL
END) WWCOGS_BusLine,
mwa.year,
IRC.CURRID
from BAM.M_WWCOGS_AREA MWA,
MICSTAG.M_IBP_REPUNIT_CURRENCY IRC,
micstag.M_IBP_BDREPORTINGCOUNTRY IBC
,MICSTAG.M_LOCALPRODUCT_STAG LS
WHERE MWA.ReportingUnit=IRC.REPID
--and mwa.productname='FR21030390085'
--and MWA.GaussCountry='BE BELGIUM'
AND IBC.COUNTRYPLANNINGGROUP =MWA.GaussCountry
and IBC.businessdivision=31
and MWA.COGSPriceCF <>0
and ls.REPORTINGUNITID(+)=mwa.ReportingUnit
and ls.IBPLOCALPRDID(+)= mwa.ProductName ) CP, micstag.M_IBP_PRODUCT IP
where CP.PRDCODE=IP.PRDID) RP
CONNECT BY level <= 12 ;
the above query is getting unwanted duplicate, if i use distinct the query is running forever.
req. duplicate the records based on year in result set of rp
consider value of year is 2019 than 12 records should came from 1-jan-2019 to 1-dec-2019.
more than one value of year are possible
The CONNECT BY LEVEL <= 12 trick works nicely with dual because that table returns one row. Things are messier when they base result set returns more than one row, because the CONNECT BY generates a product. This is why you're getting duplicates.
What you need to do is specify some additional criteria for the connection. Ideally you will have a primary key in the projection - I'm going to assume it's REPID, so if it's something different you'll need to tweak this. Anyway, you'll need something like this:
) RP
CONNECT BY level <= 12
and rp.repid = prior rp.repid
and prior sys_guid() is not null
The prior sys_guid() bit prevents ORA-01436: CONNECT BY loop in user data.

Translate hierarchical Oracle query to DB2 query

I work primarily with SAS and Oracle and am still new to DB2. Im faced with needing a hierarchical query to separate a clob into chunks that can be pulled into sas. SAS has a limit of 32K for character variables so I cant just pull the dataset in normally.
I found an old stackoverflow question about the best way to pull a clob into a sas data set but it is written in Oracle.
Import blob through SAS from ORACLE DB
Since I am new to DB2 and the syntax for this type of join seems very different I was hoping to find someone that could help convert it and explain the syntax. I find the Oracle syntax to be much easier to understand. I'm not sure in DB2 if you would use a CTE recursion like this https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/apsg/src/tpc/db2z_xmprecursivecte.html or if you would use hierarchical queries like this https://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_71/sqlp/rbafyrecursivequeries.htm
Here is the Oracle query.
SELECT
id
, level as chunk_id
, regexp_substr(clob_value, '.{1,32767}', 1, level, 'n') as clob_chunk
FROM (
SELECT id, clob_value
FROM schema.table
WHERE id = 1
)
CONNECT BY LEVEL <= regexp_count(clob_value, '.{1,32767}',1,'n')
order by id, chunk_id;
The table has two fields the id and the clob_value and would look like this.
ID CLOB_VALUE
1 really large clob
2 medium clob
3 another large clob
The thought is I would want this result. I would only ever be doing this one row at a time where id= which ever row I am processing.
ID CHUNK_ID CLOB
1 1 clob_chunk1of3
1 2 clob_chunk2of3
1 3 clob_chunk3of3
Thanks for any time spent reading and helping.
Here is a solution that should work in DB2 with few changes (but please be advised that I don't know DB2 at all; I am just using Oracle features that are in the SQL Standard, so they should be implemented identically - or almost so - in DB2).
Below I create a table with your sample data; then I show how to chunk it into substrings of length at most 8 characters. Although the strings are short, I defined the column as CLOB and I am using CLOB tools; this should work on much larger CLOBs.
You can make both the chunk size and the id into bind parameters, if needed. In my demo below I hardcoded the chunk size and I show the result for all IDs in the table. In case the CLOB is NULL, I do return one chunk (which is NULL, of course).
Note that touching CLOBs in a query is very expensive; so most of the work is done without touching the CLOBs. I only work on them as little as possible.
PREP WORK
drop table tbl purge; -- If needed
create table tbl (id number, clob_value clob);
insert into tbl (id, clob_value)
select 1, 'really large clob' from dual union all
select 2, 'medium clob' from dual union all
select 3, 'another large clob' from dual union all
select 4, null from dual -- added to check handling
;
commit;
QUERY
with
prep(id, len) as (
select id, dbms_lob.getlength(clob_value)
from tbl
)
, rec(id, len, ord, pos) as (
select id, len, 1, 1
from prep
union all
select id, len, ord + 1, pos + 8
from rec
where len >= pos + 8
)
select id, ord, dbms_lob.substr(clob_value, 8, pos)
from tbl inner join rec using (id)
order by id, ord
;
ID ORD CHUNK
---- ---- --------
1 1 really l
1 2 arge clo
1 3 b
2 1 medium c
2 2 lob
3 1 another
3 2 large cl
3 3 ob
4 1
Another option is to enable the Oracle compatibility in Db2 and just issue the hierarchical query.
This GitHub repository has background information on SQL recursion in DB2, including the Oracle-style syntax and a side by side example (both work against the Db2 sample database):
-- both queries are against the SAMPLE database
-- and should return the same result
SELECT LEVEL, CAST(SPACE((LEVEL - 1) * 4) || '/' || DEPTNAME
AS VARCHAR(40)) AS DEPTNAME
FROM DEPARTMENT
START WITH DEPTNO = 'A00'
CONNECT BY NOCYCLE PRIOR DEPTNO = ADMRDEPT;
WITH tdep(level, deptname, deptno) as (
SELECT 1, CAST( DEPTNAME AS VARCHAR(40)) AS DEPTNAME, deptno
FROM department
WHERE DEPTNO = 'A00'
UNION ALL
SELECT t.LEVEL+1, CAST(SPACE(t.LEVEL * 4) || '/' || d.DEPTNAME
AS VARCHAR(40)) AS DEPTNAME, d.deptno
FROM DEPARTMENT d, tdep t
WHERE d.admrdept=t.deptno and d.deptno<>'A00')
SELECT level, deptname
FROM tdep;

Return Boolean value when table has data in the specified range

I need a query to return boolean when there's table has data in the given range.
Assume table
Customer
[User ID, Name, Date, Products_Purchased]
I'm trying to do:
select case when exists(
select Date, count(*)
from Customer
where date between '2015-08-03' and '2015-08-05'
)
then cast(1 as BIT)
else case(0 as BIT)end;
This is throwing an error near "select Date".
However, weird part is the inner query is running perfectly fine.
Im wondering if im missing out something here !
What about something more straightforward e.g.
select case when count(*) >0 then 1 else 0 end as HIT
from ... where ...
That way you don't have to bother about Hive assuming that EXISTS implies a correlated sub-query, automagically translated into a MapJoin, i.e. a Java HashMap shuffled to the 2nd line of Mappers jobs, etc. Not exactly your use case.
Then it's not useful to compute the exact count, so the query could be refined as
select case when count(*) >0 then 1 else 0 end as HIT
from
(select ... from ... where ... limit 1) X
[Edit] There is no "bit" datatype in Hive. But the default "int" should be OK if you just want a return flag (zero / non-zero)

pl-sql include column names in query

A weird request maybe but. My boss wants me to create an admin version of a page we have that displays data from an oracle query in a table.
The admin page, instead of displaying the data (query returns 1 row), needs to return the table name and column name
Ex: Instead of:
Name Initial
==================
Bob A
I want:
Name Initial
============================
Users.FirstName Users.MiddleInitial
I realize I can do this in code but would rather just modify the query to return the data I want so I can leave the report generation code mostly alone.
I don't want to do it in a stored procedure.
So when I spit out the data in the report using something like:
blah blah = MyDataRow("FirstName")
I can leave that as is but instead of it displaying "BOB" it would display "Users.FirstName"
And I want to do the query using select * if possible instead of listing all the columns
So for each of the columns I am querying in the * , I want to get (instead of the column value) the tablename.ColumnName or tablename|columnName
hope you are following- I am confusing myself...
pseudo:
select tablename + '.' + Columnname as WhateverTheColumnNameIs
from Table1
left join Table2 on whatever...
Join Table_Names on blah blah
Whew- after writing all this I think I will just do it on the code side.
But if you are up for it maybe a fun challenge
Oracle does not provide an authentic way(there is no pseudocolumn) to get the column name of a table as a result of a query against that table. But you might consider these two approaches:
Extract column name from an xmltype, formed by passing cursor expression(your query) in the xmltable() function:
-- your table
with t1(first_name, middle_name) as(
select 1,2 from dual
), -- your query
t2 as(
select * -- col1 as "t1.col1"
--, col2 as "t1.col2"
--, col3 as "t1.col3"
from hr.t1
)
select *
from ( select q.object_value.getrootelement() as col_name
, rownum as rn
from xmltable('//*'
passing xmltype(cursor(select * from t2 where rownum = 1))
) q
where q.object_value.getrootelement() not in ('ROWSET', 'ROW')
)
pivot(
max(col_name) for rn in (1 as "name", 2 as "initial")
)
Result:
name initial
--------------- ---------------
FIRST_NAME MIDDLE_NAME
Note: In order for column names to be prefixed with table name, you need to list them
explicitly in the select list of a query and supply an alias, manually.
PL/SQL approach. Starting from Oracle 11g you could use dbms_sql() package and describe_columns() procedure specifically to get the name of columns in the cursor(your select).
This might be what you are looking for, try selecting from system views USER_TAB_COLS or ALL_TAB_COLS.

Performance of User Defined Functions in SQL Server 2000 and 2005

We have two tables, Customer and CustomerEvent both contains few million rows. On SQL Server 2000, we deployed an UDF called fn_CustomerEvent which returns TRUE or FALSE based on two parameters CustomerID and EventCode, e.g.
SELECT dbo.fn_CustomerEvent(1345678, 'Music')
The UDF code is:
CREATE FUNCTION [dbo].[fn_CustomerEvent](#CustomerID INT, #EviCode NVARCHAR(10))
RETURNS NVARCHAR(10)
AS
BEGIN
DECLARE #List NVARCHAR(10)
SELECT #List = CASE
WHEN COUNT(*) > 0 THEN 'TRUE'
ELSE 'FALSE'
END
FROM CustomerEvent
WHERE
CustomerID = #CustomerID
AND EviCode = #EviCode
RETURN #List
END
The performance on SQL Server 2000 was great. Return TOP 5000 rows within 3 seconds. For example,
SELECT TOP 5000
CustomerID, dbo.fn_CustomerEvent(1345678, 'Music')
FROM [Table1]
But now, we are moving to SQL Server 2005. Same code, same UDF, but performance drops dramatically from 3 seconds to 1 minutes 20 seconds.
Can anyone point me a right direction on where should I start to optimize the performance?
The scalar UDF is evaluated for each row (i.e. 5000 times). You could either call it once and store the result in a variable
DECLARE #Result nvarchar(10)
SELECT #Result = dbo.fn_CustomerEvent(1345678, 'Music')
SELECT TOP 5000
CustomerID, #Result
FROM [Table1]
or you can use an inline TVF (and I would also use EXISTS instead of COUNT)
CREATE FUNCTION CustomerEvent (#CustomerID INT,
#EviCode NVARCHAR(10))
RETURNS TABLE
AS
RETURN
(SELECT CASE
WHEN EXISTS(SELECT *
FROM CustomerEvent
WHERE CustomerID = #CustomerID
AND EviCode = #EviCode) THEN 'TRUE'
ELSE 'FALSE'
END)
See Scalar functions, inlining, and performance: An entertaining title for a boring post for more about this technique.
There is one big problem with UDF's: they don't work with indexes. If you want to get code re-use and maintain performance, I will normally build either a computed column (which can be indexed) or a view.
CREATE FUNCTION CustomerEvent (#CustomerID INT,
#EviCode NVARCHAR(10))
RETURNS TABLE
AS
RETURN
(SELECT COALESCE((SELECT 'TRUE' FROM CustomerEvent
WHERE
CustomerID = #CustomerID
AND EviCode = #EviCode)
, 'FALSE'))
Check for Indexes, Rebuild Them and Update your statistics.

Resources