Oracle table from SAS dataset

I was trying to create Oracle tables from SAS datasets. I have been successful in many cases, but am stuck on one particular dataset. The log file is below. I am working with SAS 9 and Oracle 11.2.0.1.0 on Linux.
Any suggestions?
1 libname dibsdata "/data2/dibyendu/Jan_9/on_demand/";
NOTE: Libref DIBSDATA was successfully assigned as follows:
Engine: V9
Physical Name: /data2/dibyendu/Jan_9/on_demand
2 libname myora oracle user=sasuser password=XXXXXXXXXX path=CIOEDATA ;
NOTE: Libref MYORA was successfully assigned as follows:
Engine: ORACLE
Physical Name: CIOEDATA
3 data myora.on_demand;
4 set dibsdata.on_demand;
5 run;
NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables.
ERROR: Error attempting to CREATE a DBMS table. ERROR: ORACLE execute error: ORA-00904: : invalid identifier..
NOTE: The DATA step has been abnormally terminated.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: SAS set option OBS=0 and will continue to check statements. This might cause NOTE: No observations in data set.
WARNING: The data set MYORA.ON_DEMAND may be incomplete. When this step was stopped there were 0 observations and 48 variables.
ERROR: ROLLBACK issued due to errors for data set MYORA.ON_DEMAND.DATA.
NOTE: DATA statement used (Total process time):
real time 0.06 seconds
cpu time 0.00 seconds
ERROR: Errors printed on page 1.
2 The SAS System 17:00 Wednesday, January 9, 2013
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
real time 1.24 seconds
cpu time 0.04 seconds

Oracle error ORA-00904 means you are trying to create a table with an invalid column name. Most likely you have a SAS variable whose name is longer than 30 characters or is an Oracle reserved word. For example, both variables in this SAS dataset are illegal in Oracle:
data a;
column_name_too_long_for_oracle = 1;
date = today(); /* This is a reserved word */
run;
Here is the Oracle 11g Reserved Words list. Check the variable names in your SAS dataset and rename them to something legal in Oracle. For example, if the offender is a SAS variable named DATE, you might try this:
data myora.on_demand;
set dibsdata.on_demand(rename=(DATE=PROJ_DATE));
run;
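If you want to find over-long names programmatically, a minimal sketch (assuming the source dataset is DIBSDATA.ON_DEMAND, as in the log above) is to query DICTIONARY.COLUMNS for names exceeding Oracle's 30-character identifier limit:
proc sql;
select name, length(name) as name_length
from dictionary.columns
where libname = 'DIBSDATA'
and memname = 'ON_DEMAND'
and length(name) > 30;
quit;
Reserved-word collisions still have to be checked by eye against the Oracle list, since DICTIONARY.COLUMNS can only tell you about lengths.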

Related

SAS: sort error (by variable not sorted properly)

This question is a follow-up to another I asked here: SAS: Data step view -> error: by variable not sorted properly. I am opening a new question because the desired solution is slightly different: as I loop through several input files, one of the raw files is not properly sorted, and I wonder what I can do to make my program skip that particular input file and just continue.
Quote:
I am using a macro to loop through files based on their names and extract data, which works fine in the majority of cases; however, from time to time I experience
ERROR: BY variables are not properly sorted on data set CQ.CQM_20141113.
where CQM_20141113 is the file I am extracting data from. In fact, my macro loops through CQ.CQM_2014: and it works up until 20141113. Because of this single failure, the output file is then not created.
I am using a data step view to "initialize" the data, and then a further data step that reads from the view (code sample with shortened where conditions):
%let taq_ds = cq.cqm_2014:;
data _v_&tables / view=_v_&tables;
set &taq_ds;
by sym_root date time_m; * <= added by statement;
format sym_root date time_m;
where sym_root = &stock;
run;
data xtemp2_&stockfiname (keep = sym_root year date iprice);
retain sym_root year date iprice;
set _v_&tables;
by sym_root date time_m;
/* some conditions */
run;
When I see the error in the log file and run the step again, it works (sometimes I need a few tries).
I was thinking of a proc sort, but how do I do that when using a data step view?
Please note the cqm-files are very large (which could also be the root of the problem).
End Quote:
Edit: I tried your code (and deleted the by statement in the data step view); however, I am getting this error:
NOTE: Line generated by the macro variable "TAQ_DS".
152 cq.cqm_2013:
_
22
200
ERROR 22-322: Syntax error, expecting one of the following: a name, ;, (, ',',
ANSIMISS, AS, CROSS, EXCEPT, FULL, GROUP, HAVING, INNER,
INTERSECT, JOIN, LEFT, NATURAL, NOMISS, ORDER, OUTER, RIGHT,
UNION, USING, WHERE.
ERROR 200-322: The symbol is not recognized and will be ignored.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of
statements.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.02 seconds
cpu time 0.00 seconds
ERROR: File WORK._V_CQM_2013.DATA does not exist.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: SAS set option OBS=0 and will continue to check statements.
This might cause NOTE: No observations in data set.
WARNING: The data set WORK.XTEMP2_OXY may be incomplete. When this step was
stopped there were 0 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.00 seconds
Do you need the by statement in the view creation?
If not, then sort from the view into a temporary data set:
proc sort data=_v_&tables out=__temp;
by sym_root date time_m;
run;
data xtemp2_&stockfiname (keep = sym_root year date iprice);
retain sym_root year date iprice;
set __temp;
by sym_root date time_m;
/* some conditions */
run;
Another option would be to create the view in PROC SQL and add the sort order:
proc sql noprint;
create view _v_&tables as
select <whatever>
from &taq_ds
where <clause>
order by sym_root, date, time_m;
quit;
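One caveat, and probably what the error in the edit above is showing: the colon member-name wildcard (cq.cqm_2014:) is accepted by the DATA step SET statement but not by PROC SQL's FROM clause, so &taq_ds cannot be passed straight into CREATE VIEW. A sketch of a workaround, with hypothetical view names _v_all and _v_sorted: let a DATA step view do the concatenation, then have PROC SQL order its output.
/* DATA step view: the colon wildcard in &taq_ds expands here */
data _v_all / view=_v_all;
set &taq_ds;
where sym_root = &stock;
run;
/* SQL view on top of it supplies the sort order for downstream BY processing */
proc sql noprint;
create view _v_sorted as
select *
from _v_all
order by sym_root, date, time_m;
quit;
The downstream data step can then SET _v_sorted and keep its BY statement.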

What is the fastest way to take a dump of a table in Oracle?

I'm trying to take a dump of a table onto a remote disk mounted on my server; the command I've used is below.
The export started, and after 6 hours the ORA errors below were thrown.
Looking for a better way:
ORA-02354: error in exporting/importing data
ORA-01555: snapshot too old: rollback segment number 17 with name "_SYSSMU17$" too small
Command used:
expdp user/password TABLES=TABLE_NAME DIRECTORY=TEST_DIR DUMPFILE=DUMP.dmp LOGFILE=LOG.log
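For what it's worth, Data Pump can split a large table export across parallel workers (Enterprise Edition only), which is usually the biggest single speed-up; a sketch, keeping the same placeholder names (the %U in DUMPFILE makes Data Pump generate one file per worker):
expdp user/password TABLES=TABLE_NAME DIRECTORY=TEST_DIR DUMPFILE=DUMP_%U.dmp LOGFILE=LOG.log PARALLEL=4
The ORA-01555 itself suggests the undo retention is too small to cover a 6-hour export; raising UNDO_RETENTION (a DBA-side change) or shortening the export time both address it.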

Insufficient memory error in proc sort

My data is stored in the Oracle table MY_DATA. This table contains only 2 rows and 7 columns. But when I execute this step:
proc sort data=oraclelib.MY_DATA nodupkey out=SORTED_DATA;
by client_number;
run;
the following error appears:
ERROR: The SAS System stopped processing this step because of insufficient memory.
If I comment out the nodupkey option, the error disappears. If I copy the dataset into the work library and run proc sort on it, everything is OK too.
My memory options:
SORTSIZE=1073741824
SUMSIZE=0
MAXMEMQUERY=268435456
LOADMEMSIZE=0
MEMSIZE=31565617920
REALMEMSIZE=0
What can be the root of the problem and how can I fix it?
My Oracle password was in its grace period, and when I changed it the issue disappeared.
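For anyone who hits this before finding the real cause, the workaround mentioned above (copying into WORK first) looks like this sketch:
/* Pull the Oracle table into WORK, then sort locally */
data work.my_data;
set oraclelib.MY_DATA;
run;
proc sort data=work.my_data nodupkey out=SORTED_DATA;
by client_number;
run;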

How to update an Oracle Table from SAS efficiently?

The problem I am trying to solve:
I have a SAS dataset work.testData (in the work library) that contains 8 columns and around 1 million rows. All columns are text (i.e. no numeric data). This SAS dataset is around 100 MB in file size. My objective is to have a step that pushes this entire SAS dataset into Oracle, i.e. something like a "copy and paste" of the SAS dataset from the SAS platform to the Oracle platform. The rationale is that, on a daily basis, this table in Oracle gets "replaced" by the one in SAS, which enables downstream Oracle processes.
My approach to solve the problem:
One-off initial setup in Oracle:
In Oracle, I created a table called testData with a table structure pretty much identical to the SAS dataset testData. (i.e. Same table name, same number of columns, same column names, etc.).
On-going repeating process:
In SAS, do a SQL pass-through to truncate ora.testData (i.e. remove all rows whilst keeping the table structure). This ensures ora.testData is empty before inserting from SAS.
In SAS, use a LIBNAME statement to assign the Oracle database as a SAS library (called ora), so I can "see" what's in Oracle and perform reads/updates from SAS.
In SAS, use a PROC SQL step to "insert" the data from the SAS dataset work.testData into the Oracle table ora.testData.
Sample code
One-off initial setup in Oracle:
Step 1: Run this Oracle SQL script in Oracle SQL Developer (to create the table structure for testData, with 0 rows of data to begin with):
DROP TABLE testData;
CREATE TABLE testData
(
NODENAME VARCHAR2(64) NOT NULL,
STORAGE_NAME VARCHAR2(100) NOT NULL,
TS VARCHAR2(10) NOT NULL,
STORAGE_TYPE VARCHAR2(12) NOT NULL,
CAPACITY_MB VARCHAR2(11) NOT NULL,
MAX_UTIL_PCT VARCHAR2(12) NOT NULL,
AVG_UTIL_PCT VARCHAR2(12) NOT NULL,
JOBRUN_START_TIME VARCHAR2(19) NOT NULL
)
;
COMMIT;
On-going repeating process:
Steps 2, 3 and 4: Run this SAS code in SAS
******************************************************;
******* On-going repeatable process starts here ******;
******************************************************;
*** Step 2: Truncate the temporary Oracle transaction dataset;
proc sql;
connect to oracle (user=XXX password=YYY path=ZZZ);
execute (
truncate table testData
) by oracle;
execute (
commit
) by oracle;
disconnect from oracle;
quit;
*** Step 3: Assign Oracle DB as a libname;
LIBNAME ora Oracle user=XXX password=YYY path=ZZZ dbcommit=100000;
*** Step 4: Insert data from SAS to Oracle;
PROC SQL;
insert into ora.testData
select NODENAME length=64,
STORAGE_NAME length=100,
TS length=10,
STORAGE_TYPE length=12,
CAPACITY_MB length=11,
MAX_UTIL_PCT length=12,
AVG_UTIL_PCT length=12,
JOBRUN_START_TIME length=19
from work.testData;
QUIT;
******************************************************;
**** On-going repeatable process ends here *****;
******************************************************;
The limitation / problem to my approach:
The Proc SQL step (that transfer 100 MB of data from SAS to Oracle) takes around 5 hours to perform - the job takes too long to run!
The Question:
Is there a more sensible way to perform data transfer from SAS to Oracle? (i.e. updating an Oracle table from SAS).
First off, you can do the drop/recreate from SAS if that's a necessity. I wouldn't drop and recreate each time - a truncate seems easier to get the same results - but if you have other reasons then that's fine; either way, you can use execute (truncate table xyz) by oracle or similar through a pass-through connection.
Second, assuming there are no constraints or indexes on the table - which seems likely given you are dropping and recreating it - you may not be able to improve this, because it may be bound by network latency. However, there is one area you should look at in the connection settings (which you don't provide): how often SAS commits the data.
There are two ways to control this: the DBCOMMIT setting and the BULKLOAD setting. The former controls how frequently commits are executed (so if DBCOMMIT=100 then a commit is executed every 100 rows). More frequent commits = less data is lost if a random failure occurs, but much slower execution. DBCOMMIT defaults to 0 for PROC SQL INSERT, which means just make one commit (the fastest option assuming no errors), so this is less likely to be helpful unless you're overriding it.
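If you do want to pin this down explicitly, DBCOMMIT can also be set as a data set option on the insert target rather than on the whole libname; a sketch:
proc sql;
insert into ora.testData (dbcommit=0) /* one commit at the end of the load */
select * from work.testData;
quit;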
BULKLOAD is probably my recommendation; that uses SQL*Loader (SQLLDR) to load your data, i.e., it batches the whole lot over to Oracle and then says 'load this please, thanks.' It only works with certain settings and certain kinds of queries, but it ought to work here (subject to other conditions - read the SAS/ACCESS documentation on BULKLOAD).
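A minimal sketch of that route, reusing the same connection placeholders (BULKLOAD=YES is a SAS/ACCESS for Oracle data set option, and PROC APPEND keeps it short):
libname ora oracle user=XXX password=YYY path=ZZZ;
/* Load via SQL*Loader instead of row-by-row inserts */
proc append base=ora.testData (bulkload=yes) data=work.testData;
run;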
If you're using BULKLOAD, then you may be up against network latency. 5 hours for 100 MB seems slow, but I've seen all sorts of things in my (relatively short) day. If BULKLOAD didn't work, I would probably bring in the Oracle DBAs and have them troubleshoot this, starting from a .csv file and a SQL*Loader command file (which should be basically identical to what SAS is doing with BULKLOAD); they should know how to troubleshoot that and at least be able to monitor performance of the database itself. If there are constraints on other tables that are problematic here (i.e., other tables that too frequently recalculate themselves based on your inserts or whatever), they should be able to find out and recommend solutions.
You could look into PROC DBLOAD, which is sometimes faster than inserts in SQL (though all in all it shouldn't really be, and it is an 'older' procedure not used much anymore). You could also look into whether you can avoid doing a complete flush and fill (i.e., whether there is a way to transfer less data across the network), or even simply shrink the column sizes.
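For completeness, a rough outline of the PROC DBLOAD route (statement names as I recall them from the older SAS/ACCESS interface; verify against your documentation before relying on this):
proc dbload dbms=oracle data=work.testData;
user=XXX; orapw=YYY; path=ZZZ;
table=testData;
load;
run;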

Oracle Vendor Code 17002 Large Select Columns

When executing a select that returns a large number of columns across several tables, the error "Vendor code 17002" is received. The query returns only one result. When the number of columns returned is less than 635, the query works. When another column is added, the error is seen.
The following was seen in a dump file:
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0x45] [PC:0x35797B4, _kkqstcrf()+1342]
DDE: Problem Key 'ORA 7445 [kkqstcrf()+1342]' was flood controlled (0x6) (incident: 10825)
ORA-07445: exception encountered: core dump [kkqstcrf()+1342] [ACCESS_VIOLATION] [ADDR:0x45] [PC:0x35797B4] [UNABLE_TO_READ] []
Dump file c:\app\7609179\diag\rdbms\orcl\orcl\trace\orcl_s001_9928.trc
Thu Feb 07 15:10:56 2013
ORACLE V11.2.0.1.0 - Production vsnsta=0
vsnsql=16 vsnxtr=3
Dumping diagnostics for abrupt exit from ksedmp
Windows 7, Oracle 11.2.0.1.0 Enterprise Edition, SQL Developer; the same result occurs from a Java application.
ORA-07445 is a generic error which Oracle uses to signal unexpected behaviour in the OS, i.e. a bug.
There should be some additional information in that trace file:
c:\app\7609179\diag\rdbms\orcl\orcl\trace\orcl_s001_9928.trc
Have you looked in it?
Unfortunately, the nature of ORA-07445 means that the underlying problem is usually specific to the combination of platform, OS, and database versions. Oracle has published some advice on diagnosis, but most routes lead to calling Oracle Support.
At least you know the immediate cause. So if you don't have a Support contract, there is a workaround: change your application so you don't have to select that 635th column. That is an awful lot of columns to have in a single query.
There isn't an actual limit on the number of columns permitted in a query's projection, but it's possible that the total length of the statement exceeds a limit. That limit varies according to several factors and isn't specified in the docs. How long (how many characters) is the statement with and without that pesky additional column? Perhaps shortening some column names will do the trick.
