I've always operated under the assumption that for small to medium data sets I would use table variables, and once the data sets get large (relatively speaking - I know this is somewhat subjective), I would move to #temp tables.
I have some code that returns 159 rows very quickly - 0.00 seconds. I see 159 rows and assume a table variable would be fine. However, when I insert into the table variable, the code spins for 10 minutes. When I switch to a #temp table, boom - back to 0.00 seconds.
Can someone tell me why this would occur or something I can do to figure this out?
The environment is SQL Server 2017 Enterprise, plenty of memory and disk space.
DECLARE @Year INT = 2017
DECLARE @Month INT = NULL
DECLARE @TempTableVariable TABLE
(
Test BIGINT
)
DROP TABLE IF EXISTS tempdb.dbo.#TempTableTest
CREATE TABLE #TempTableTest (Test BIGINT)
--INSERT INTO @TempTableVariable
INSERT INTO #TempTableTest
SELECT
idfHumanCase
FROM dbo.tlbHumanCase
WHERE YEAR(COALESCE(datOnSetDate, datFinalDiagnosisDate, datTentativeDiagnosisDate, datEnteredDate)) = @Year
AND (MONTH(COALESCE(datOnSetDate, datFinalDiagnosisDate, datTentativeDiagnosisDate, datEnteredDate)) = @Month OR @Month IS NULL)
AND LegacyCaseID IN
(
SELECT LegacyCaseID
FROM dbo.tlbHumanCase
WHERE YEAR(COALESCE(datOnSetDate, datFinalDiagnosisDate, datTentativeDiagnosisDate, datEnteredDate)) = @Year
AND (MONTH(COALESCE(datOnSetDate, datFinalDiagnosisDate, datTentativeDiagnosisDate, datEnteredDate)) = @Month OR @Month IS NULL)
GROUP BY LegacyCaseID HAVING COUNT(*) > 1
)
SELECT * FROM #TempTableTest
--SELECT * FROM @TempTableVariable
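One way to dig into this (a hedged suggestion, not a confirmed diagnosis): capture timings, I/O, and the actual execution plans for both inserts and compare them. One documented difference is that a plan that modifies a table variable can never run in parallel, while the insert into the #temp table can; on a query that otherwise parallelizes well, that alone can produce a gap like this.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- Re-run the INSERT INTO @TempTableVariable and INSERT INTO #TempTableTest
-- statements from above here, with "Include Actual Execution Plan" enabled,
-- and compare the two plans' parallelism and estimated vs. actual row counts.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;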
I have a program that has been working for years. Today, we upgraded from SAS 9.4M3 to 9.4M7.
proc setinit
Current version: 9.04.01M7P080520
Since then, I am not able to get the same results as before the upgrade.
Please note that I am querying Oracle databases directly.
Trying to replicate the issue with a minimal, reproducible SAS table example, I found that the issue disappears when querying a SAS table instead of an Oracle database.
Let's say I have the following dataset:
data have;
infile datalines delimiter="|";
input name :$8. id $1. value :$8.;
datalines;
Joe|A|TLO
Joe|B|IKSK
Joe|C|Yes
;
Querying the temporary SAS table:
proc sql;
create table want as
select name,
min(case when id = "A" then value else "" end) as A length 8
from have
group by name;
quit;
Results:
name A
Joe TLO
However, when running the very same query on the Oracle database directly, I get a missing value instead:
proc sql;
create table want as
select name,
min(case when id = "A" then value else "" end) as A length 8
from have_oracle
group by name;
quit;
name A
Joe
As per the documentation, the min() function behaves properly when used on the SAS table:
The MIN function returns a missing value (.) only if all arguments are missing.
I believe this happens when Oracle doesn't understand the function that SAS is passing it - the MIN functions in SAS and Oracle are very different, and the Oracle equivalent of SAS's MIN would be LEAST().
So my guess is that the upgrade changed how SAS translates its MIN function to Oracle, but it remains a guess. Has anyone run into this type of behavior?
EDIT: @Richard's comment
options sastrace=',,,d' sastraceloc=saslog nostsuffix;
proc sql;
create table want as
select t1.name,
min(case when id = 'A' then value else "" end) as A length 8
from oracle_db.names t1 inner join oracle_db.ids t2 on (t1.tid = t2.tid)
group by t1.name;
ORACLE_26: Prepared: on connection 0
SELECT * FROM NAMES
ORACLE_27: Prepared: on connection 1
SELECT UI.INDEX_NAME, UIC.COLUMN_NAME FROM USER_INDEXES UI,USER_IND_COLUMNS UIC WHERE UI.TABLE_NAME='NAMES' AND
UIC.TABLE_NAME='NAMES' AND UI.INDEX_NAME=UIC.INDEX_NAME
ORACLE_28: Executed: on connection 1
SELECT statement ORACLE_27
ORACLE_29: Prepared: on connection 0
SELECT * FROM IDS
ORACLE_30: Prepared: on connection 1
SELECT UI.INDEX_NAME, UIC.COLUMN_NAME FROM USER_INDEXES UI,USER_IND_COLUMNS UIC WHERE UI.TABLE_NAME='IDS' AND
UIC.TABLE_NAME='IDS' AND UI.INDEX_NAME=UIC.INDEX_NAME
ORACLE_31: Executed: on connection 1
SELECT statement ORACLE_30
ORACLE_32: Prepared: on connection 0
select t1."NAME", MIN(case when t2."ID" = 'A' then t1."VALUE" else ' ' end) as A from
NAMES t1 inner join IDS t2 on t1."TID" = t2."TID" group by t1."NAME"
ORACLE_33: Executed: on connection 0
SELECT statement ORACLE_32
ACCESS ENGINE: SQL statement was passed to the DBMS for fetching data.
NOTE: Table WORK.SELECTED_ATTR created, with 1 row and 2 columns.
! quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.34 seconds
cpu time 0.09 seconds
Use the SASTRACE= system option to log SQL statements sent to the DBMS.
options SASTRACE=',,,d';
will provide the most detailed logging.
From the prepared statement you can see why you are getting a blank from the Oracle query.
select
t1."NAME"
, MIN ( case
when t2."ID" = 'A' then t1."VALUE"
else ' '
end
) as A
from
NAMES t1 inner join IDS t2 on t1."TID" = t2."TID"
group by
t1."NAME"
The SQL MIN() aggregate function excludes null values from consideration.
In SAS SQL, a blank value is also interpreted as null.
In SAS, your SQL query returns the min non-null value, TLO.
In the Oracle-transformed query, the SAS blank '' is transformed into ' ', a single blank character, which is non-null; thus ' ' < 'TLO' and you get the blank result.
The actual MIN you want to force in Oracle is min(case when id = "A" then value else null end), which @Tom has shown is possible by omitting the else clause.
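To see the two behaviors side by side, here is a hypothetical check run directly in Oracle (dual is just a convenient one-row table):
-- A single blank is a real, non-null value, so it participates in MIN:
select min(v) from (
select 'TLO' v from dual union all
select ' ' v from dual
);
-- returns ' '
-- Null is excluded from MIN:
select min(v) from (
select 'TLO' v from dual union all
select null v from dual
);
-- returns 'TLO'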
The only way to see the actual difference is to run the query with trace in the prior SAS version, or if lucky, see the explanation in the (ignored by many) "What's New" documents.
Why are you using ' ' or '' as the ELSE value? Perhaps Oracle is treating a string with blanks in it differently than a null string.
Why not use null in the ELSE clause?
Or just leave off the ELSE clause and let it default to null?
libname mylib oracle .... ;
proc sql;
create table want as
select name
, min(case when id = "A" then value else null end) as A length 8
from mylib.have_oracle
group by name
;
quit;
Also try running the Oracle code yourself, instead of using implicit pass-through.
proc sql;
connect to oracle ..... ;
create table want as
select * from connection to oracle
(
select name,
min(case when id = "A" then value else null end) as A length 8
from have_oracle
group by name
)
;
quit;
When I try to reproduce this in Oracle I get the result you are looking for, so I suspect it has something to do with SAS (which I'm not familiar with).
with t as (
select 'Joe' name, 'A' id, 'TLO' value from dual union all
select 'Joe' name, 'B' id, 'IKSK' value from dual union all
select 'Joe' name, 'C' id, 'Yes' value from dual
)
select name
, min(case when id = 'A' then value else '' end) as a
from t
group by name;
NAME A
---- ----
Joe TLO
Unrelated: if you are only interested in id = 'A', then a better query would be:
select name
, min(value) as a
from t
where id = 'A'
group by name;
I have a procedure, and a particular query in it is consuming about 50 GB of temp space, causing the exception below after a few executions:
SQL state [72000]; error code [1652]; ORA-01652: unable to extend temp segment by 128 in tablespace TEMP
The DBAs are pointing to the query below in the stored procedure as the one that needs to be rewritten. The tables used in the query are each smaller than 0.1 GB, but the query generates 50 GB of temp space!
SELECT tab1.ORDID ID1, tab2.ORDID ID2
FROM (
SELECT
OT.ORDID,
CONNECT_BY_ROOT OT.UNIQ_ORIG_KEY ORIG_UID
FROM order_tab OT, status_tab ST
WHERE OT.otype IN ('A','B')
AND OT.order_uid IS NULL
AND OT.BATCH_ID = ST.BATCH_ID
AND ST.CT_DATE = :A1
AND ST.BSTATUS = 1
CONNECT BY PRIOR OT.UNIQ_KEY = OT.UNIQ_ORIG_KEY
) tab1 , order_tab tab2
WHERE tab2.ORD_VERID = 1
AND tab1.ORIG_UID = tab2.UNIQ_KEY
ORDER BY ID1;
Could someone please help rewrite the query efficiently so that temp space utilization is reduced? The database is Oracle 12c.
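For what it's worth, note that the CONNECT BY in the inline view has no START WITH clause, so Oracle treats every joined row as a root and re-walks the hierarchy once per row; that multiplication, plus the final ORDER BY, is exactly the kind of intermediate result that spills to TEMP. Below is only a hedged sketch of a possible shape for the rewrite - the START WITH condition (rows that are their own origin) is an assumption that must be verified against the real data before use:
WITH filtered AS (
  SELECT OT.ORDID, OT.UNIQ_KEY, OT.UNIQ_ORIG_KEY
  FROM order_tab OT
  JOIN status_tab ST ON ST.BATCH_ID = OT.BATCH_ID
  WHERE OT.otype IN ('A','B')
    AND OT.order_uid IS NULL
    AND ST.CT_DATE = :A1
    AND ST.BSTATUS = 1
)
SELECT tab1.ORDID ID1, tab2.ORDID ID2
FROM (
  SELECT ORDID, CONNECT_BY_ROOT UNIQ_ORIG_KEY AS ORIG_UID
  FROM filtered
  START WITH UNIQ_ORIG_KEY = UNIQ_KEY   -- assumed root condition; verify
  CONNECT BY PRIOR UNIQ_KEY = UNIQ_ORIG_KEY
) tab1
JOIN order_tab tab2
  ON tab2.UNIQ_KEY = tab1.ORIG_UID
 AND tab2.ORD_VERID = 1
ORDER BY ID1;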
I am receiving information in a CSV file from one department to compare with the same information from a different department to check for discrepancies (about 3/4 of a million rows of data with 44 columns in each row). After I have the data in a table, I have a program that takes the data and sends reports based on an HQ. I feel like the way I am going about this is not the most efficient. I am using Oracle for this comparison.
Here is what I have:
I have a vb.net program that parses the data and inserts it into an extract table
I run a procedure to do a full outer join on the two tables into a new table, with the fields from one department suffixed with '_c'
I run another procedure to compare the old/new data and update 2 different tables with detail and summary information. Here is code from inside the procedure:
DECLARE
CURSOR Cur_Comp IS SELECT * FROM T.AEC_CIS_COMP;
BEGIN
FOR compRow in Cur_Comp LOOP
--If service pipe exists in CIS but not in FM and the service pipe has status of retired in CIS, ignore the variance
If(compRow.pipe_num = '' AND cis_status_c = 'R')
continue
END IF
--If there is not a summary record for this HQ in the table for this run, create one
INSERT INTO t.AEC_CIS_SUM (HQ, RUN_DATE)
SELECT compRow.HQ, to_date(sysdate, 'DD/MM/YYYY') from dual WHERE NOT EXISTS
(SELECT null FROM t.AEC_CIS_SUM WHERE HQ = compRow.HQ AND RUN_DATE = to_date(sysdate, 'DD/MM/YYYY'))
-- Check fields and update the tables accordingly
If (compRow.cis_loop <> compRow.cis_loop_c) Then
--Insert information into the details table
INSERT INTO T.AEC_CIS_DET( Fac_id, Pipe_Num, Hq, Address, AutoUpdatedFl,
DateTime, Changed_Field, CIS_Value, FM_Value)
VALUES(compRow.Fac_ID, compRow.Pipe_Num, compRow.Hq, compRow.Street_Num || ' ' || compRow.Street_Name,
'Y', sysdate, 'Cis_Loop', compRow.cis_loop, compRow.cis_loop_c);
-- Update information into the summary table
UPDATE AEC_CIS_SUM
SET cis_loop = cis_loop + 1
WHERE Hq = compRow.Hq
AND Run_Date = to_date(sysdate, 'DD/MM/YYYY')
End If;
END LOOP;
END;
Any suggestions for an easier way of doing this, rather than an if statement for all 44 columns of the table? (This is run once a week, if it matters.)
Update: Just to clarify, there are 88 columns of data (44 pairs to compare, one of each pair suffixed with _c). One table lists each differing field in its own row, so one compared row can mean 30+ records written to that table. The other table keeps a tally of the number of discrepancies for each week.
First of all, I believe your task can (and actually should) be implemented with straight SQL. No fancy cursors, no loops; just selects, inserts and updates. I would start by unpivoting your source data (it is not clear whether you have a primary key to join the two sets, but I'll assume you do):
Col0_PK Col1 Col2 Col3 Col4
----------------------------------------
Row1_val A B C D
Row2_val E F G H
Above is your source data. Using the UNPIVOT clause, we convert it to:
Col0_PK Col_Name Col_Value
------------------------------
Row1_val Col1 A
Row1_val Col2 B
Row1_val Col3 C
Row1_val Col4 D
Row2_val Col1 E
Row2_val Col2 F
Row2_val Col3 G
Row2_val Col4 H
I think you get the idea. Say we have table1 with one set of data and an identically structured table2 with the second set of data. It is a good idea to use index-organized tables.
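For the layout above, a hypothetical Oracle UNPIVOT (using the placeholder names from the illustration) could look like this:
select Col0_PK, Col_Name, Col_Value
from table1
unpivot include nulls
  (Col_Value for Col_Name in (Col1, Col2, Col3, Col4));
INCLUDE NULLS keeps rows whose value is null, so a field populated on one side and missing on the other still shows up in the comparison.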
The next step is comparing rows to each other and storing the difference details. Something like:
insert into diff_details(some_service_info_columns_here)
select some_service_info_columns_here_along_with_data_difference
from table1 t1 inner join table2 t2
on t1.Col0_PK = t2.Col0_PK
and t1.Col_name = t2.Col_name
and nvl(t1.Col_value, 'Dummy') <> nvl(t2.Col_value, 'Dummy'); -- same sentinel on both sides, so two nulls compare as equal
And in the last step we update the difference summary table:
insert into diff_summary(summary_columns_here)
select diff_row_id, count(*) as diff_count
from diff_details
group by diff_row_id;
This is just a rough draft to show my approach; I'm sure there are many more details that should be taken into account. To summarize, I suggest two things:
UNPIVOT data
Use SQL statements instead of cursors
You have several issues in your code:
If(compRow.pipe_num = '' AND cis_status_c = 'R')
continue
END IF
"cis_status_c" is not declared. Is it a variable or a column in AEC_CIS_COMP?
In case it is a column, just put the condition into the cursor, i.e. SELECT * FROM T.AEC_CIS_COMP WHERE NOT (pipe_num IS NULL AND cis_status_c = 'R') (note that in Oracle '' is null, so pipe_num = '' would never be true anyway)
to_date(sysdate, 'DD/MM/YYYY')
That's nonsense: you convert a date into a date. Simply use TRUNC(SYSDATE).
Anyway, I think you can use three single statements instead of a cursor:
INSERT INTO t.AEC_CIS_SUM (HQ, RUN_DATE)
SELECT comp.HQ, trunc(sysdate)
from AEC_CIS_COMP comp
WHERE NOT EXISTS
(SELECT null FROM t.AEC_CIS_SUM WHERE HQ = comp.HQ AND RUN_DATE = trunc(sysdate));
INSERT INTO T.AEC_CIS_DET( Fac_id, Pipe_Num, Hq, Address, AutoUpdatedFl, DateTime, Changed_Field, CIS_Value, FM_Value)
select comp.Fac_ID, comp.Pipe_Num, comp.Hq, comp.Street_Num || ' ' || comp.Street_Name, 'Y', sysdate, 'Cis_Loop', comp.cis_loop, comp.cis_loop_c
from T.AEC_CIS_COMP comp
where comp.cis_loop <> comp.cis_loop_c;
UPDATE AEC_CIS_SUM
SET cis_loop = cis_loop + 1
WHERE Hq IN (Select Hq from T.AEC_CIS_COMP)
AND trunc(Run_Date) = trunc(sysdate);
They are not tested, but they should give you a hint of how to do it.
We have two tables, Customer and CustomerEvent, both containing a few million rows. On SQL Server 2000, we deployed a UDF called fn_CustomerEvent which returns TRUE or FALSE based on two parameters, CustomerID and EviCode, e.g.
SELECT dbo.fn_CustomerEvent(1345678, 'Music')
The UDF code is:
CREATE FUNCTION [dbo].[fn_CustomerEvent](@CustomerID INT, @EviCode NVARCHAR(10))
RETURNS NVARCHAR(10)
AS
BEGIN
DECLARE @List NVARCHAR(10)
SELECT @List = CASE
WHEN COUNT(*) > 0 THEN 'TRUE'
ELSE 'FALSE'
END
FROM CustomerEvent
WHERE
CustomerID = @CustomerID
AND EviCode = @EviCode
RETURN @List
END
The performance on SQL Server 2000 was great: returning the TOP 5000 rows took about 3 seconds. For example,
SELECT TOP 5000
CustomerID, dbo.fn_CustomerEvent(1345678, 'Music')
FROM [Table1]
But now we are moving to SQL Server 2005. Same code, same UDF, but performance drops dramatically, from 3 seconds to 1 minute 20 seconds.
Can anyone point me in the right direction on where I should start optimizing the performance?
The scalar UDF is evaluated for each row (i.e. 5000 times). You could either call it once and store the result in a variable
DECLARE @Result nvarchar(10)
SELECT @Result = dbo.fn_CustomerEvent(1345678, 'Music')
SELECT TOP 5000
CustomerID, @Result
FROM [Table1]
or you can use an inline TVF (and I would also use EXISTS instead of COUNT)
-- Named tf_CustomerEvent so it does not clash with the CustomerEvent table;
-- note that the CASE column needs an alias in an inline TVF
CREATE FUNCTION dbo.tf_CustomerEvent (@CustomerID INT,
@EviCode NVARCHAR(10))
RETURNS TABLE
AS
RETURN
(SELECT CASE
WHEN EXISTS(SELECT *
FROM CustomerEvent
WHERE CustomerID = @CustomerID
AND EviCode = @EviCode) THEN 'TRUE'
ELSE 'FALSE'
END AS Result)
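A hedged usage sketch, assuming [Table1] carries a CustomerID column to feed the function:
SELECT TOP 5000
t.CustomerID, ce.Result
FROM [Table1] t
CROSS APPLY dbo.tf_CustomerEvent(t.CustomerID, 'Music') ce
Because the function is an inline TVF, the optimizer expands it into the outer query and can use indexes on CustomerEvent, instead of invoking a scalar UDF once per row.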
See Scalar functions, inlining, and performance: An entertaining title for a boring post for more about this technique.
There is one big problem with scalar UDFs: they don't work with indexes. When I want code re-use without giving up performance, I normally build either a computed column (which can be indexed) or a view.
-- Same renaming and column alias as above; TOP (1) guards against the
-- subquery returning more than one row for a customer/event pair
CREATE FUNCTION dbo.tf_CustomerEvent (@CustomerID INT,
@EviCode NVARCHAR(10))
RETURNS TABLE
AS
RETURN
(SELECT COALESCE((SELECT TOP (1) 'TRUE' FROM CustomerEvent
WHERE
CustomerID = @CustomerID
AND EviCode = @EviCode)
, 'FALSE') AS Result)
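As a sketch of the view idea mentioned above (the view and column names are placeholders, not the poster's actual schema):
CREATE VIEW dbo.vw_CustomerEventFlag
AS
SELECT DISTINCT CustomerID, EviCode
FROM dbo.CustomerEvent;
GO
-- Usage: a LEFT JOIN plus a CASE stands in for the TRUE/FALSE flag
SELECT TOP 5000
t.CustomerID,
CASE WHEN f.CustomerID IS NULL THEN 'FALSE' ELSE 'TRUE' END AS HasEvent
FROM [Table1] t
LEFT JOIN dbo.vw_CustomerEventFlag f
ON f.CustomerID = t.CustomerID
AND f.EviCode = 'Music';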
Check your indexes, rebuild them, and update your statistics.
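For example (a generic sketch; substitute the real table and index names):
ALTER INDEX ALL ON dbo.CustomerEvent REBUILD;
UPDATE STATISTICS dbo.CustomerEvent WITH FULLSCAN;
Rebuilding an index refreshes that index's statistics with a full scan; the UPDATE STATISTICS call also refreshes column statistics.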
With TOP 100000 (100K), this query finishes in about 3 seconds.
With TOP 1000000 (1 million), it finishes in about 2 minutes:
SELECT TOP 1000000
db_id = IDENTITY(int, 1, 1), *
INTO dbo.tablename
FROM dbname.dbo.tablename
The actual execution plan is always:
clustered index scan 4% cost
top
top
compute scalar
insert (96% cost)
select into
The table has 1.3 million rows and an int primary key on the first column.
Can I speed it up somehow? I'm using SQL Server 2008 R2.
The results showed that 100,000 records takes 159 ms and 1,000,000 records takes 1,435 ms, on RAID 1 OS, RAID 1 data, RAID 1 log, and RAID 1 tempdb, all on separate drives. This is our dev environment.
The results showed that 100,000 records takes 113 ms and 1,000,000 records takes 996 ms, on my laptop with a single SSD (Samsung 840 250GB). SSDs rock!!!
The results showed that 100,000 records takes 188 ms and 1,000,000 records takes 1,880 ms, on RAID 1 OS, RAID 10 data, RAID 10 log, and RAID 1 tempdb, all separate drives, under a production load.
Here is a complete script that shows that the 1 million takes less than ten times as long as 100,000. Your situation is likely slightly different, but this shows that the fundamentals are not the issue.
The results show that 100,000 records takes 146 ms, and 1,000,000 records takes 1,315 ms.
These results are from my desktop. If someone else could run the script and post their results, that would be very useful.
Rob
USE master;
GO
-- Drop database SourceDB
IF EXISTS (SELECT * FROM sys.databases WHERE name = 'SourceDB') ALTER DATABASE SourceDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
IF EXISTS (SELECT * FROM sys.databases WHERE name = 'SourceDB') DROP DATABASE SourceDB;
GO
-- Create database SourceDB
CREATE DATABASE SourceDB;
ALTER DATABASE SourceDB SET RECOVERY SIMPLE;
GO
USE SourceDB;
GO
-- Create table SourceDB.dbo.SourceTable
CREATE TABLE dbo.SourceTable (
ColID int PRIMARY KEY
);
GO
-- Populate table SourceDB.dbo.SourceTable
DECLARE @i int = 0;
WHILE @i < 1300000
BEGIN
SET @i += 1;
INSERT INTO dbo.SourceTable (ColID) VALUES (@i);
END;
GO
-- Drop database Test1
IF EXISTS (SELECT * FROM sys.databases WHERE name = 'Test1') ALTER DATABASE Test1 SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
IF EXISTS (SELECT * FROM sys.databases WHERE name = 'Test1') DROP DATABASE Test1;
GO
-- Create database Test1
CREATE DATABASE Test1;
ALTER DATABASE Test1 SET RECOVERY SIMPLE;
ALTER DATABASE Test1 MODIFY FILE (NAME = Test1, SIZE = 3000MB, MAXSIZE = 8TB);
ALTER DATABASE Test1 MODIFY FILE (NAME = Test1_log, SIZE = 3000MB, MAXSIZE = 2TB);
GO
USE Test1;
GO
IF EXISTS (SELECT * FROM sys.tables WHERE [OBJECT_ID] = OBJECT_ID('dbo.DestinationTable1')) DROP TABLE dbo.DestinationTable1;
IF EXISTS (SELECT * FROM sys.tables WHERE [OBJECT_ID] = OBJECT_ID('dbo.DestinationTable2')) DROP TABLE dbo.DestinationTable2;
GO
DECLARE @n int = 100000;
DECLARE @t1 datetime2 = SYSDATETIME();
SELECT TOP (@n) db_id = IDENTITY(int, 1, 1), *
INTO dbo.DestinationTable1
FROM SourceDB.dbo.SourceTable;
SELECT DATEDIFF(ms, @t1, SYSDATETIME()) AS ElapsedMs;
GO
DECLARE @n int = 1000000;
DECLARE @t1 datetime2 = SYSDATETIME();
SELECT TOP (@n) db_id = IDENTITY(int, 1, 1), *
INTO dbo.DestinationTable2
FROM SourceDB.dbo.SourceTable;
SELECT DATEDIFF(ms, @t1, SYSDATETIME()) AS ElapsedMs;
GO