Efficient macro looping in SAS to get to Oracle Stored Procedure - oracle

I'm using SAS to access an Oracle database. The problem is that the function / stored procedure lives on one server in Oracle - which is fine when my data lives there too - but when the data is on a different server I still want to use that function. So I loaded some macros with the personal id's to pass them to the function in a loop. It works, but it's painfully slow. I don't need 'optimal', just 'reasonable'...my datasets will max around 100,000 rows. I've read that creating a dataset is one of the most resource intensive jobs in SAS, so I'm experimenting with creating an empty table and insert into, but I haven't noticed much gain yet.
So the question is - can I use the Oracle stored procedures for data on a different server in a reasonable amount of time within SAS? (Either by improving my existing approach or something completely different)
My first attempt (around 25 minutes for 13,000 personal id's):
%MACRO STATE() ;
options nosource nonotes;
%* 2. get macro max loop n;
proc sql noprint;
select left(put(count(distinct pidm),10.)) into :loopn from examp
;quit;
%* 3. load macros with the pidms of interest;
proc sql noprint;
select distinct pidm into :pidm1 - :pidm&loopn from examp order by pidm;
quit;
%Do i = 1 %TO &loopn ; /*build em */
%* %put **************LOOP &i OF &loopn *********************;
proc sql noprint;
connect to oracle as mycon(user=xxxxxx password=xxxxxxx path='PROD') ;
create table subsetdat&i as
select * from connection to mycon
(select %quote(&&pidm&i) as pidm ,UILIB.ADDR.STATE(&&pidm&i, 'MA') as state
from dual);
disconnect from mycon ;
; quit;
%END;
data state; set subsetdat1-subsetdat&loopn ; /*stack 'em */
%Do j = 1 %TO &loopn ; /*drop 'em */
proc sql ;
drop table subsetdat&j
;
%END;
options source notes;
%MEND STATE ;
options nomprint;
%STATE() ;

Move the loop inside a single PROC SQL step. That removes the overhead of creating one dataset per pass-through query: build one pass-through query and use union all to 'stack' the individual query results together.
%MACRO STATE() ;
options nosource nonotes;
/* 2. get macro max loop n; */
proc sql noprint;
select left(put(count(distinct pidm),10.)) into :loopn from examp
;quit;
/* 3. load macros with the pidms of interest; */
proc sql noprint;
select distinct pidm into :pidm1 - :pidm&loopn from examp order by pidm;
quit;
/* Build single pass-thru query with multiple select ... union all select ... etc */
proc sql noprint;
connect to oracle as mycon(user=xxxxxx password=xxxxxxx path='PROD') ;
create table state as
select * from connection to mycon
(%DO I = 1 %TO &loopn ; /*build em */
select %quote(&&pidm&i) as pidm ,UILIB.ADDR.STATE(&&pidm&i, 'MA') as state from dual
%IF &I lt &LOOPN %THEN %DO ; /* if not last iteration do a `union all` */
union all
%END ;
%END ;
) ;
disconnect from mycon ;
quit;
options source notes;
%MEND STATE ;
options nomprint;
%STATE() ;
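To see what the macro loop above actually sends to Oracle, here is a minimal Python sketch that builds the same kind of query text. It is only an illustration of the generated string, not part of the SAS solution; the function name build_union_query and the sample ids are made up, and it assumes the pidm values are numeric.

```python
def build_union_query(pidms, state_code="MA"):
    """Build one SELECT per id and join them with UNION ALL.

    Hypothetical helper: mirrors the text the %DO loop generates.
    Assumes pidm values are integers (int() rejects anything else).
    """
    selects = [
        f"select {int(p)} as pidm, "
        f"UILIB.ADDR.STATE({int(p)}, '{state_code}') as state from dual"
        for p in pidms
    ]
    # No trailing "union all": the separator only goes between selects.
    return "\nunion all\n".join(selects)


query = build_union_query([101, 102, 103])
print(query)
```

The join puts union all only between statements, which is what the %IF &I lt &LOOPN test does in the macro.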

Related

Assign a consistent random number to id in SAS across datasets

I have two datasets data1 and data2 with an id column. I want to assign a random id to each id, but this random number needs to be consistent across datasets. (rand_id for id=1 must be the same in both datasets). The objective is to get:
data1
id rand_id
1 0.4212
2 0.5124
3 0.1231
data2
id rand_id
1 0.4212
3 0.1231
2 0.5124
4 0.9102
Note that Id's do not need to be ordered, and some Id's might appear in one dataset but not at the other one. I thought
DATA data1;
SET data1;
CALL STREAMINIT(id);
rand_id=RAND('uniform');
RUN;
and the same for data2 would do the job, but it does not. It just takes the first id as the seed and generates a sequence of random numbers from it.
From the STREAMINIT documentation, it seems it only takes effect once per data step. I'd like it to be called on every row. Is this possible?
The idea is to create a table random_values with an associated random id for each id that we later join on the two tables.
*assign random seed;
%let random_seed = 71514218;
*list of unique id;
proc sql;
create table unique_id as
select distinct id
from (
select id from have1
union all
select id from have2
)
;
quit;
*add random values;
data random_values;
set unique_id;
call streaminit(&random_seed.);
rand = rand('uniform', 0, 1);
run;
*join back on have1;
proc sql;
create table have1 as
select t1.id, t2.rand as rand_id
from have1 t1 left join random_values t2
on t1.id = t2.id
;
quit;
*join back on have2;
proc sql;
create table have2 as
select t1.id, t2.rand as rand_id
from have2 t1 left join random_values t2
on t1.id = t2.id
;
quit;
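The same idea (one random value per distinct id, then a join back) can be sketched in a few lines of Python. This is only an illustration of the logic, not a replacement for the SAS code; the function name assign_random_ids and the sample id lists are assumptions.

```python
import random


def assign_random_ids(*datasets, seed=71514218):
    """Give every distinct id one random value, then look it up per dataset."""
    rng = random.Random(seed)
    # Union of ids across all datasets, sorted so the result is reproducible.
    unique_ids = sorted({i for ds in datasets for i in ds})
    lookup = {i: rng.random() for i in unique_ids}
    # "Join back": keep each dataset's own row order.
    return [[(i, lookup[i]) for i in ds] for ds in datasets]


have1 = [1, 2, 3]
have2 = [1, 3, 2, 4]
out1, out2 = assign_random_ids(have1, have2)
print(out1)
print(out2)
```

Because the value is drawn once per id and only looked up afterwards, row order and missing ids in either dataset do not affect the result.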
Why not use a lookup dataset? You could create/update it using a HASH object.
First make an empty dataset:
data rand_id;
set one(keep=id);
rand_id=.;
stop;
run;
Then process the first dataset. Adding the new RAND_ID variable to that dataset and also populating the RAND_ID dataset with all of the unique ID values.
data one_random;
if _n_=1 then do;
declare hash h(dataset:'rand_id');
rc=h.definekey('id');
rc=h.definedata('id','rand_id');
rc=h.definedone();
end;
if eof then rc=h.output(dataset:'rand_id');
set one end=eof;
if h.find() then do;
rand_id=rand('uniform');
rc=h.add();
end;
drop rc;
run;
Repeat for any other datasets that share the same ID variable.
data two_random;
if _n_=1 then do;
declare hash h(dataset:'rand_id');
rc=h.definekey('id');
rc=h.definedata('id','rand_id');
rc=h.definedone();
end;
if eof then rc=h.output(dataset:'rand_id');
set two end=eof;
if h.find() then do;
rand_id=rand('uniform');
rc=h.add();
end;
drop rc;
run;
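Conceptually, the hash object here behaves like a dictionary that is carried from one dataset to the next and grows whenever a new id appears. A minimal Python sketch of that behaviour, with made-up id lists and a hypothetical helper name add_rand_id:

```python
import random


def add_rand_id(ids, lookup, rng):
    """Return (id, rand_id) rows, adding unseen ids to the shared lookup."""
    rows = []
    for i in ids:
        if i not in lookup:          # same role as "if h.find() then ..."
            lookup[i] = rng.random() # same role as h.add()
        rows.append((i, lookup[i]))
    return rows


rng = random.Random(7)  # arbitrary seed, chosen for the example
lookup = {}             # plays the part of the RAND_ID dataset
one_random = add_rand_id([1, 2, 3], lookup, rng)
two_random = add_rand_id([1, 3, 2, 4], lookup, rng)
print(sorted(lookup))
```

Only id 4 gets a new value in the second pass; ids 1 to 3 reuse what the first pass stored.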
Simplest way to do this in my opinion is to create a format dataset. Tom's hash example is fine also, but this is probably easier if you don't know hash tables.
Do NOT seed the random number from the ID itself - this is not random anymore.
data forfmt;
set data1;
call streaminit(7);
label = put(rand('Uniform'),12.9);
start = id;
fmtname = 'RANDIDF';
output;
if _n_ eq 1 then do;
hlo='o';
label='.';
output;
end;
run;
proc format cntlin=forfmt;
quit;
Then you can use put(id,randidf.) to assign the random ID. If you want the result to be numeric, make it an informat instead (that's handled via type='i') and use input instead of put; the argument to input must be character, or be turned into character via put. No sorting required, very fast lookup most of the time.
Solved:
DATA data1;
SET data1;
seed = id;
CALL RANUNI(seed,rand_id);
DROP seed;
RUN;
Generates the desired result.

Randomly select 10 subjects and retain all of their observations

I am stuck with the following problem in SAS. I have a dataset of this format:
The dataset consists of 500 ids with a different number of observations per ID. I'm trying to randomly select 5 ids and at the same time retain all of their observations. I first built a random generator that saves a vector of 10 numbers in the interval [1,500]. However, it became clumsy when I tried to use this vector to select the ids corresponding to the random numbers. To be clearer, I want my net result to be a dataset that includes all observations corresponding to IDs 1, 10, 43, 22, 67, or any other sequence of 5 numbers.
Any tip will be more than appreciated!
From your question, I assume you already have your 10 random numbers. If they are saved in a table/dataset, you can run a left join between them and your original dataset, by id. This will pull out all the original observations with the same id.
Let's say that your randomly selected numbers are saved in a table called "random_ids". Then, you can do:
proc sql;
create table want as
select distinct
t1.id,
t2.*
from random_ids as t1
left join have as t2 on t1.id = t2.id;
quit;
If your random numbers are not saved in a dataset, you may simply copy them to a where statement, like:
proc sql;
create table want as
select distinct
*
from have
where id in (1 10 43 22 67); /*here you put the ids you want*/
quit;
Proc SURVEYSELECT is your friend.
data have;
call streaminit(123);
do _n_ = 1 to 500;
id = rand('integer', 1e6);
do seq = 1 to rand('integer', 35);
output;
end;
end;
run;
proc surveyselect noprint data=have sampsize=5 out=want;
cluster id;
run;
proc sql noprint;
select count(distinct id) into :id_count trimmed from want;
%put NOTE: &=id_count;
If you don't have the procedure as part of your SAS license, you can do sample selection per k/n algorithm. NOTE: Earliest archived post for k/n is May 1996 SAS-L message which has code based on a 1995 SAS Observations magazine article.
proc sql noprint;
select count(distinct id) into :N trimmed from have;
proc sort data=have;
by id;
data want_kn;
retain N &N k 5;
if _n_ = 1 then call streaminit(123);
keep = rand('uniform') < k / N;
if keep then k = k - 1;
do until (last.id);
set have;
by id;
if keep then output;
end;
if k = 0 then stop;
N = N - 1;
drop k N keep;
run;
proc sql noprint;
select count(distinct id) into :id_count trimmed from want_kn;
%put NOTE: &=id_count;
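The k/n logic of the data step above can be sketched in Python to make the invariant visible: each id is kept with probability k/N, where k is the number still needed and N the number of ids still to be examined, so exactly k ids are selected. The function name select_k_of_n and the toy data are assumptions for illustration only.

```python
import random


def select_k_of_n(ids, k, seed=123):
    """Selection sampling: pick exactly k of the distinct ids in one pass."""
    rng = random.Random(seed)
    remaining = len(ids)        # N: ids not yet examined
    chosen = set()
    for i in ids:
        if k == 0:              # same role as "if k = 0 then stop"
            break
        if rng.random() < k / remaining:
            chosen.add(i)
            k -= 1
        remaining -= 1
    return chosen


# Toy data: 500 ids with 1 to 4 observations each.
have = [(i, seq) for i in range(1, 501) for seq in range(1 + i % 4)]
ids = sorted({i for i, _ in have})
chosen = select_k_of_n(ids, 5)
# Keep every observation that belongs to a chosen id.
want = [row for row in have if row[0] in chosen]
print(sorted(chosen), len(want))
```

When k equals the number of ids left, the probability becomes 1, which is why the pass can never end with fewer than k ids.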

Data Calculation for joining two tables

I am studying Foxpro to create a simple application for manipulating data from two tables A and B (size of tableB >> size of tableA). The data from an Excel spreadsheet is imported into these two tables.
tableA
id balance load state
1 10 null l
2 22 null l
3 31 null l
tableB
Load id id ord fact type 1st value rounded value state
1 1 1 0.09 1 null null l
2 1 2 0.02 0 null null l
3 1 3 0.13 1 null null l
4 1 4 -0.05 0 null null l
5 2 1 0.01 1 null null l
6 2 2 0.092 1 null null l
7 2 3 0.03 0 null null l
8 3 1 0.14 1 null null l
9 3 2 0.12 0 null null l
10 3 3 -0.02 0 null null l
My friend wants me to write Foxpro code to do the following things: first, create empty tableA and tableB containing the columns shown above. Each column will be loaded with (hundreds of thousands of) values from an Excel spreadsheet every day. Second, for each unique id, the code updates the 3 columns 1st value, rounded value and load with the given formulas:
1st value[i] = If(Type[i]=0, load[i-1]*fact[i], load[i-1]*fact[i]/(1-fact[i]))
1st value[1] = If(Type[1]=0, balance[1]*fact[1], balance[1]*fact[1]/(1-fact[1]))
rounded value[i] = If(1st value[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2))
load[i+1] = load[i] + rounded value[i+1] (i >= 1)
load[1] = balance[1] + rounded value[1]
I think I have to create a table like the following to store the calculation above for this step:
Calculation Table
balance id ord 1st value rounded value load
10 1 1 0.989 0.90 10.9 (= 10 + 0.9)
10.9 1 2 0.218 0.20 11.1 (= 10.9 + 0.2)
11.1 1 3 1.658 1.60 12.7 (= 11.1 + 1.6)
12.7 1 4 -0.635 -0.64 12.06 (= 12.7 + (-0.64))
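The recurrence in the formulas can be checked by hand for id 1 with a short Python sketch. It assumes Excel semantics for the rounding (ROUNDDOWN toward zero, ROUNDUP away from zero) and uses the decimal module to avoid binary floating-point noise; the function name run_loads is made up for the example.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_UP


def run_loads(balance, rows):
    """Apply the 1st value / rounded value / load formulas row by row."""
    load = Decimal(balance)          # load[0] is the starting balance
    steps = []
    for fact, typ in rows:
        fact = Decimal(fact)
        first = load * fact if typ == 0 else load * fact / (1 - fact)
        if first > 0:
            rounded = first.quantize(Decimal("0.1"), rounding=ROUND_DOWN)
        else:
            rounded = first.quantize(Decimal("0.01"), rounding=ROUND_UP)
        load += rounded              # becomes the base for the next row
        steps.append((first, rounded, load))
    return steps


# fact and type for id 1, ord 1..4, taken from tableB in the question
steps = run_loads("10", [("0.09", 1), ("0.02", 0), ("0.13", 1), ("-0.05", 0)])
for first, rounded, load in steps:
    print(round(first, 3), rounded, load)
```

With these inputs the rounded values come out as 0.9, 0.2, 1.6 and -0.64, and the running load as 10.9, 11.1, 12.7 and 12.06.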
Desired output
Using results in Calculation Table, we update the original tableA and tableB as follows:
tableB
Load id id ord 1st value rounded value state
1 1 1 0.989 0.90 calculated
2 1 2 0.218 0.20 calculated
3 1 3 1.658 1.60 calculated
4 1 4 -0.635 -0.64 calculated
5 2 1 ... .... calculated
6 2 2 ... .... calculated
tableA (Note: for each value in `load id`, the `load` column only stores the **last** value in the `calculation` table which corresponds to maximum `ord`)
id balance load state
1 10 12.06 calculated
2 22 ... calculated
3 31 ... calculated
Can anyone please help me with the syntax for creating tableB, computing and store results for columns 1st value, rounded value and load into a calculation table with Inner Join function on id column between tableA and tableB , and update tableB?
My attempt:
First step (Creating two tables A and B with column fields shown above)
CREATE TABLE tableA;
( id int, ;
balance double, ;
load C(240), ;
state C(240), ;)
CREATE TABLE tableB;
( Load id int, ;
id int, ;
ord int, ;
fact double, ;
type binary (not sure....) ;
1st value C(240),;
rounded value C(240), ;
state C(240), ;)
(adding as another answer just because others got too long to read)
can you try your code with this dataset
(drive.google.com/open?id=1uCWwt5ubd2_F8w2gsh3v4VDpibWz7PAz) to see if
you will get the two output tables from your code, each similar to the
one shown in the previous Excel worksheet I uploaded for you?
I downloaded that spreadsheet and here is what I needed to change:
Your ranges were C8:F35 and H8:O62 for tableA and B. Also your "balance" was named "base". New code (downloaded to d:\temp\workbook2.xlsx) edited to match ranges and "balance" to "base":
* Get the data from given excel filename and ranges
* first range is tableA, second one is tableB
GetDataFromExcel("d:\temp\WorkBook2.xlsx", "Sheet1$C8:F35", "Sheet1$H8:O62")
* Now the data is in cursors crsA and crsB; do the calculation in these
DoCalculation()
* Done. Show the results selecting and browsing the crsA and B
Select crsA
Browse
Select crsB
Browse
* Get specific fields only from crsB
Select loadId, id, ord, firstValue, roundVal, state ;
from crsB ;
into cursor crsBCustom ;
nofilter
browse
* Check data from both cursors (join)
* I chose the fields as I see fit
* ta and tb are local aliases for crsA and crsB
* helping to write shorter SQL in this case
Select tb.LoadId, tb.Id, ta.base, ta.load, ;
tb.firstValue, tb.roundVal, ;
ta.State as StateA, tb.State as StateB ;
from crsA ta ;
inner join crsB tb on ta.Id = tb.Id ;
order by tb.Id, tb.Ord ;
into cursor crsBoth ;
NoFilter
browse
* Does the specific calculations on specific data
Procedure DoCalculation
*1st value[1] = If(Type[1]=0, Base[1]*fact[1], Base[1]*fact[1]/(1-fact[1]))
*rounded value[i] = If(1st value[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2)
*rounded value[1] = If(1st value[1]>0, rounddown(1st value[1], 1), roundup(1st value[1],2)
*load[1] = Base[1] + rounded value[1]
* i > 1 - ord > 1
*1st value[i] = If(Type[i]=0, load[i-1]*fact[i], load[i-1]*fact[i]/(1-fact[i]))
*rounded value[i] = If(1st value[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2)
*load[i+1] = load[i] + rounded value[i+1] (i >= 1)
*declare local variable
Local lnBase
* select crsB and create an index there
Select CrsB
Index On Padl(Id,10,'0')+Padl(ord,10,'0') Tag ALinkB
* select crsA as parent and link to crsB
* using the "id" part of index
Select crsA
Set Relation To Padl(Id,10,'0') Into CrsB
* start looping the rows
Scan
* working with a new Id (1, 2, ...)
* save base value to m.lnBase
lnBase = crsA.Base
* select crsB and start looping the rows there
* because of the index in effect and the relation created
* pointer would be on the first crsB row with a matching Id
* and since Ord is also part of the index the first row of
* given Id
* Limit the looping in crsB (child table) to Id in crsA
* using WHILE clause
Select CrsB
Scan While Id = crsA.Id
* do replacing starting on first row of this Id (Ord=1)
* we don't have any scope clauses in replace, thus
* we are doing "single row" updates
Replace ;
firstValue With m.lnBase*fact / Iif(!Type, 1, 1-fact), ;
roundVal With Iif(firstValue > 0, ;
roundDown(firstValue,1), ;
roundUp(firstValue, 2))
* after each replace update m.lnBase value
* to use in next row
lnBase = m.lnBase + CrsB.roundVal
Endscan
* completed updating crsB
* select crsA and also update crsA.Load with the final 'load' value
Select crsA
Replace Load With m.lnBase
Endscan
* Update state to 'Calculated'
Update crsA set state = 'Calculated'
Update crsB set state = 'Calculated'
Endproc
* Get data from excel with given filename and ranges
* This code is not generic and expects the
* data to be in a specific format.
* Does not do any error check
Procedure GetDataFromExcel(tcExcelFileName, tcTableARange, tcTableBRange)
* declare and define the connection string to excel
Local lcConStr
lcConStr = ;
'Provider=Microsoft.ACE.OLEDB.12.0;'+;
'Data Source='+Fullpath(m.tcExcelFileName)+';'+;
'Extended Properties="Excel 12.0;HDR=Yes"'
* Declare and define the 2 SQL needed to get data for A and B
* rename the fields in SQL for easier handling
Local lcSQLA, lcSQLB
TEXT to lcSQLA textmerge noshow
Select [id], [base], [load], [state]
from [<< m.tcTableARange >>]
ENDTEXT
TEXT to m.lcSQLB textmerge noshow
select
[Load Id] as LoadId,
[Id], [Ord], [Fact], [Type],
[1st value] as firstValue,
[Rounded value] as roundVal,
[State]
from [<< m.tcTableBRange >>]
ENDTEXT
* Execute the queries and place results in given cursors
ADOQuery(m.lcConStr, m.lcSQLA, "crsTableA")
ADOQuery(m.lcConStr, m.lcSQLB, "crsTableB")
* Sanitize the cursors a bit
* (OledB query would assign rather generic datatypes)
Select Cast(Id As Int) As Id, Cast(Base As Double) As Base, ;
Cast(Load As Double) As Load, Cast(State As c(50)) As State ;
from crsTableA ;
into Cursor crsA ;
readwrite
Select Cast(LoadId As Int) As LoadId, ;
Cast(Id As Int) As Id, Cast(ord As Int) As ord, ;
Cast(fact As Double) As fact, Cast(Type As logical) As Type, ;
Cast(firstValue As Double) As firstValue, ;
Cast(roundVal As Double) As roundVal, ;
Cast(State As c(50)) As State From crsTableB ;
into Cursor CrsB ;
readwrite
Use In (Select('crsTableA'))
Use In (Select('crsTableB'))
Endproc
* roundUp and down custom functions
* RoundUp and Down excel style
* Not correct math wise IMHO
Procedure roundUp(tnValue, tnPlaces)
Local lnResult, lnValue
lnValue = Abs(m.tnValue)
If Round(m.lnValue, m.tnPlaces) != m.lnValue
lnValue = Round(m.lnValue+((10^-(m.tnPlaces+1))*5), m.tnPlaces)
Endif
Return Sign(m.tnValue) * m.lnValue
Endproc
Procedure roundDown(tnValue, tnPlaces)
Local lnResult, lnValue
lnValue = Abs(m.tnValue)
If Round(m.lnValue, m.tnPlaces) != m.lnValue
lnValue = Round(m.lnValue-((10^-(m.tnPlaces+1))*5), m.tnPlaces)
Endif
Return Sign(m.tnValue) * m.lnValue
Endproc
* Generic function to query a given data source
* and place results in a cursor
Procedure ADOQuery(tcConStr,tcQuery,tcCursorName)
Local oConn As 'ADODB.Connection'
Local oRS As ADODB.RecordSet
oConn = Createobject('ADODB.Connection')
oConn.Mode= 1 && adModeRead
oConn.Open( m.tcConStr )
oRS = oConn.Execute(m.tcQuery)
RS2Cursor(oRS,m.tcCursorName)
oRS.Close
oConn.Close
Endproc
* Helper function to ADOQuery to convert
* an ADODB.Recordset to a VFP cursor
Procedure RS2Cursor(toRS, tcCursorName) && simple single cursor - not intended for complex ones
tcCursorName = Iif(Empty(m.tcCursorName),'ADORs',m.tcCursorName)
Local xDOM As 'MSXML.DOMDocument'
xDOM = Createobject('MSXML.DOMDocument')
toRS.Save(xDOM, 1)
Xmltocursor(xDOM.XML, m.tcCursorName)
Endproc
This is the whole code. Just changing the filepath and name to yours, select all the code, right click and execute selection to see results. Or save it as a prg, say ImportMyExcel.prg and run it:
ImportMyExcel()
You could see the results I have so I didn't upload any results.
Also, is Procedure RS2Cursor(toRS, tcCursorName) intended to generate
the 2 output tables? Why do we need this procedure though: Procedure
ADOQuery(tcConStr,tcQuery,tcCursorName)?
Well, those procedures are a little tricky for a newcomer (maybe not). I think you would need to know the history of VFP, cursors, cursor adapters, converting an ADO recordset to a cursor, etc. (probably advanced level). I don't know; those were the procedures I came up with and also published on the foxite link that I gave you. Just think of them as black-boxed functions (like built-in ones) doing their work. ADOQuery's job is simply to query an OLEDB source and return the result as a cursor. With a CursorAdapter you might not need such a procedure, but that procedure was designed before CursorAdapter existed.
Two more questions please: 1) where does the m come from in
m.lnBalance?
m. explicitly tells the compiler that it is a memory variable. It is referred to as MDOT. There are developers who claim it is not needed, and it generally leads to long-running discussions (and you would likely find my name in those discussions). To this day nobody has been able to show or demonstrate to me why we shouldn't use it or don't need it. If you trust me, it is not a preference but something you should use.
2) Don't we need to define crsTableA? Or you meant we can use the
CREATE Table tableA in your previous code to make crsTableA valid?
No. There is no table in that code. We read the data from excel into a cursor (crsTableA and crsTableB initially) and then sanitize into 2 cursors crsA and crsB. All of them are cursors. Cursors are like tables but are not persisted on disk. They may even spend all their life in memory and are gone when you close them. Here I preferred cursors because without harming any real data you could run N times and check your results. When you are satisfied persisting the data is as simple as a "Select ... into" or "insert into ..." (there are more ways too) a table. Even in the case of a table you don't need to use "Create Table ...". A "select Into ..." command can select the data from a source and save it to a table by creating it (like a combined 'create table ...' and then 'insert into ...').
Also, I saw that B9:E12 does not match the range of tableA or tableB
in the Excel spreadsheet I uploaded for you before. Am I missing
something here?
It matched your original samples if you think data starts at B9 and G9 respectively.
I have another question: can you please clarify on what these lines
do: Select CrsB Index On Padl(Id,10,'0')+Padl(ord,10,'0') Tag
ALinkB Select crsA Set Relation To Padl(Id,10,'0') Into CrsB.
I think I explained this part in the previous question. I will soon comment the code itself.
Adding as another answer to prevent clutter. I can do further explanations if you need to. Here I used the Excel ranges that would match to sample data. You would replace the range with the actual one (as well as the excel filename):
GetDataFromExcel("c:\myFolder\myExcel.xlsx", "B9:E12", "G9:N19")
DoCalculation()
Select crsA
Browse
Select crsB
Browse
Procedure DoCalculation
*1st value[1] = If(Type[1]=0, balance[1]*fact[1], balance[1]*fact[1]/(1-fact[1]))
*rounded value[i] = If(1st value[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2)
*rounded value[1] = If(1st value[1]>0, rounddown(1st value[1], 1), roundup(1st value[1],2)
*load[1] = balance[1] + rounded value[1]
* i > 1 - ord > 1
*1st value[i] = If(Type[i]=0, load[i-1]*fact[i], load[i-1]*fact[i]/(1-fact[i]))
*rounded value[i] = If(1st value[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2)
*load[i+1] = load[i] + rounded value[i+1] (i >= 1)
Local lnBalance
Select CrsB
Index On Padl(Id,10,'0')+Padl(ord,10,'0') Tag ALinkB
Select crsA
Set Relation To Padl(Id,10,'0') Into CrsB
Scan
lnBalance = crsA.Balance
Select CrsB
Scan While Id = crsA.Id
Replace ;
firstValue With m.lnBalance*fact / Iif(!Type, 1, 1-fact), ;
roundVal With Iif(firstValue > 0, ;
roundDown(firstValue,1), ;
roundUp(firstValue, 2))
lnBalance = m.lnBalance + CrsB.roundVal
Endscan
Select crsA
Replace Load With m.lnBalance
Endscan
Endproc
Procedure GetDataFromExcel(tcExcelFileName, tcTableARange, tcTableBRange)
Local lcConStr
lcConStr = ;
'Provider=Microsoft.ACE.OLEDB.12.0;'+;
'Data Source='+Fullpath(m.tcExcelFileName)+';'+;
'Extended Properties="Excel 12.0;HDR=Yes"'
Local lcSQLA, lcSQLB
TEXT to lcSQLA textmerge noshow
Select [id], [balance], [load], [state]
from [Sheet1$<< m.tcTableARange >>]
ENDTEXT
TEXT to m.lcSQLB textmerge noshow
select
[Load Id] as LoadId,
[Id], [Ord], [Fact], [Type],
[1st value] as firstValue,
[Rounded value] as roundVal,
[State]
from [Sheet1$<< m.tcTableBRange >>]
ENDTEXT
ADOQuery(m.lcConStr, m.lcSQLA, "crsTableA")
ADOQuery(m.lcConStr, m.lcSQLB, "crsTableB")
Select Cast(Id As Int) As Id, Cast(Balance As Double) As Balance, ;
Cast(Load As Double) As Load, Cast(State As c(1)) As State ;
from crsTableA ;
into Cursor crsA ;
readwrite
Select Cast(LoadId As Int) As LoadId, ;
Cast(Id As Int) As Id, Cast(ord As Int) As ord, ;
Cast(fact As Double) As fact, Cast(Type As logical) As Type, ;
Cast(firstValue As Double) As firstValue, ;
Cast(roundVal As Double) As roundVal, ;
Cast(State As c(1)) As State From crsTableB ;
into Cursor CrsB ;
readwrite
Use In (Select('crsTableA'))
Use In (Select('crsTableB'))
Endproc
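Procedure roundUp(tnValue, tnPlaces)
* Excel-style ROUNDUP: round away from zero, so work on the absolute value
Local lnValue
lnValue = Abs(m.tnValue)
If Round(m.lnValue, m.tnPlaces) != m.lnValue
lnValue = Round(m.lnValue+((10^-(m.tnPlaces+1))*5), m.tnPlaces)
Endif
Return Sign(m.tnValue) * m.lnValue
Endproc
Procedure roundDown(tnValue, tnPlaces)
* Excel-style ROUNDDOWN: round toward zero, so work on the absolute value
Local lnValue
lnValue = Abs(m.tnValue)
If Round(m.lnValue, m.tnPlaces) != m.lnValue
lnValue = Round(m.lnValue-((10^-(m.tnPlaces+1))*5), m.tnPlaces)
Endif
Return Sign(m.tnValue) * m.lnValue
Endproc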
Procedure ADOQuery(tcConStr,tcQuery,tcCursorName)
Local oConn As 'ADODB.Connection'
Local oRS As ADODB.RecordSet
oConn = Createobject('ADODB.Connection')
oConn.Mode= 1 && adModeRead
oConn.Open( m.tcConStr )
oRS = oConn.Execute(m.tcQuery)
RS2Cursor(oRS,m.tcCursorName)
oRS.Close
oConn.Close
Endproc
Procedure RS2Cursor(toRS, tcCursorName) && simple single cursor - not intended for complex ones
tcCursorName = Iif(Empty(m.tcCursorName),'ADORs',m.tcCursorName)
Local xDOM As 'MSXML.DOMDocument'
xDOM = Createobject('MSXML.DOMDocument')
toRS.Save(xDOM, 1)
Xmltocursor(xDOM.XML, m.tcCursorName)
Endproc
EDIT: I edited the other answer for the comments beneath it. Now for your questions:
Shouldn't GetDataFromExcel("c:\myFolder\myExcel.xlsx", "B9:E12", "G9:N19") get called after the Procedure
GetDataFromExcel(tcExcelFileName, tcTableARange, tcTableBRange)??
No. Procedures are always placed after normal execution code in a prg file. IOW if your PRG has:
Do Something
* ...
Procedure SomeProcedure
* ...
endproc
Procedure Something
endproc
Code starts by calling Something and then executes the lines after that until it reaches the first PROCEDURE statement (or FUNCTION, DEFINE CLASS). Something might be a procedure (as in the sample) or a separate prg.
Shouldn't Procedure roundUp and Procedure roundDown get called before roundDown(firstValue,1), ; roundUp(firstValue, 2))??
No, same as the above. What you say more looks like the rules of core C.
Does the left ID on this line Scan While Id = crsA.Id come from CrsB?? Also, why is there the change from crsA to CrsA? Is this a
typo? – user177196 5 mins ago
Yes. it comes from crsB. But in a sense, you are right I should be explicit and include the alias there as:
Scan while crsB.Id = crsA.Id
In VFP if you don't include an alias, then the one that is current is assumed.
We are scanning crsA in outer loop. Then we are switching to crsB and scanning there, after we are done switching back to crsA (actually scan command remembers the alias it is associated and does this switch when it hits endscan implicitly but I prefer to be explicit).
EDIT:
Select CrsB
Index On Padl(Id,10,'0')+Padl(ord,10,'0') Tag ALinkB
Select crsA
Set Relation To Padl(Id,10,'0') Into CrsB
On the first two lines we select the crsB cursor and create an index on it. The index expression contains both the Id and Ord fields. VFP doesn't support multiple column names in an index key, but it supports expressions. By left-padding both fields with zeros to 10 characters we create keys like:
Id, Ord: 2,3 as an example has a key 00000000020000000003
We could make it smaller, but since I don't know how big Id and Ord can get, I made each part 10 characters long to fit any 32-bit integer value.
Then on 3rd, 4th lines we are selecting cursor crsA and then setting relation from crsA into crsB via the expression Padl(Id,10,'0') - Id padded with 10 zeros. From crsA Id:1 has a relation key of 0000000001 then (matching all index keys that start with 0000000001 whatever the Ord part is - BTW having Ord in index too makes sure that they are ordered by Ord).
In effect, when the record pointer points to Id:1 in crsA, in crsB automatically those with Id:1 are matched (best observed with a browse - browse crsB then select crsA and browse. As you navigate in crsA, you would see the browse window for crsB would show only the rows with matching Id). Conceptually it looks like this controlling the record pointer in both cursors:
crsA (id) crsB (Id, Ord)
1 ----+------- 1,1
+------- 1,2
+------- 1,3
+------- 1,4
2 ----+------- 2,1
+------- 2,2
+------- 2,3
I used that because it is a powerful feature of VFP and was an easier way to express what you want. The same could be achieved with SQL Update too; however, VFP's SQL is not that powerful and it would be much more complex to write (for [1] it is easy, but for the > 1 case it gets complex. It was not so easy in other backends in the distant past either, but over time backends like postgreSQL, MS SQL server, etc. have gained much more support for such queries).
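The composite key and the prefix match behind the relation can be mimicked in Python to see why the child rows line up per Id and in Ord order. This is only a conceptual sketch, not how VFP implements indexes; padl, children and the sample rows are written for the example.

```python
def padl(value, width=10, fill="0"):
    """Left-pad a value to a fixed width, like VFP's Padl()."""
    return str(value).rjust(width, fill)


def key(row):
    """Composite index key: padded Id followed by padded Ord."""
    return padl(row[0]) + padl(row[1])


# (Id, Ord) rows of the child cursor, deliberately unordered
rows_b = [(2, 3), (1, 2), (1, 1), (2, 1), (1, 3), (2, 2), (1, 4)]
index = sorted(rows_b, key=key)   # what the index tag orders by


def children(parent_id):
    """Rows whose key starts with the parent's padded Id."""
    prefix = padl(parent_id)
    return [row for row in index if key(row).startswith(prefix)]


print(key((2, 3)))
print(children(1))
```

Fixed-width padding is what makes string order agree with numeric order, so Id 10 sorts after Id 2 instead of before it.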
Well you have a long question, containing multiple questions within. I will try to reply in pieces (editing my answer in between), since it would be a long answer (might even be good to divide into multiple answers).
First, your create table syntax was close but incorrect. VFP (it is VFP, not VFB, by the way) does not support spaces in field names (unless it is a long field name). Using field names with spaces would just be asking for trouble, so prefer not to use them. It would look like:
CREATE TABLE tableA;
( id int, ;
balance double, ;
load C(240), ;
state C(240))
CREATE TABLE tableB;
( LoadId int, ;
id int, ;
ord int, ;
fact double, ;
type int, ;
firstValue C(240),;
roundedVal C(240), ;
state C(240))
Note that there is no comma after the final field, and that ; in VFP means the command continues on the next line (so it is removed from the last field definition line). I also changed the 2 field names to be compatible with a free table's field naming (max 10 in length, must start with a letter, no spaces). It is easier to use the tables this way.
If you want to use longfieldnames then you can do that just as you do with free tables but the table needs to be part of a database. It would also work for cursors provided you do that in one shot and do not attempt to alter the structure afterwards.
While I added code there to create TableA, TableB, you are saying those tables' data would come from Excel. You didn't really give detailed information about the Excel part of it (how data is represented-is that as a data ranges?). There is a great probability that you create these two tables simply by selecting the data from Excel using ODBC/OLEDB directly.
For getting data from Excel I posted some detailed information on Foxite, you can check the post in this link. I am not giving any sample code here as I don't yet know the Excel part really.
Assuming we got the data from Excel, let's check the other parts (BTW in table B id is called a Foreign Key, not primary. It links the rows in TableB to TableA).
1st value[i] = If(Type[i]=0, balance[i]*fact[i], balance[i]*fact[i]/(1-fact[i]))
We can use either the REPLACE command (an xBase command) or the SQL Update command to accomplish this. Let's not dwell on the differences here (not really worth it) and choose SQL Update to do the job (the syntax would be reusable in other databases too, say MS SQL server, postgreSQL, mySQL ...).
Update tableB ;
set firstValue = iif( type = 0, ;
tableA.balance * fact, ;
tableA.balance * fact/(1-fact)) ;
from tableA ;
where tableA.Id = tableB.Id
Or slightly simplified:
Update tableB ;
set firstValue = tableA.balance * fact / ;
iif( type = 0, 1, (1-fact)) ;
from tableA ;
where tableA.Id = tableB.Id
Note that VFP would execute this expression per row so we don't need the [i] (array identifier) that you have in your pseudocode.
Next one:
rounded value[i] = If(1st value[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2))
Would be translated in the same manner:
Update tableB ;
set roundVal = iif(firstValue > 0, ;
rounddown(firstValue,1), ;
roundup(firstValue,2)) ;
from tableA ;
where tableA.Id = tableB.Id
However, VFP doesn't have roundup and rounddown functions; I only wrote these as a conceptual translation. What you can do is create two custom functions that do RoundUp and RoundDown. There are multiple ways to write these functions, and IMHO the easiest is to write them as 2 separate .prg files that are in your search path when you execute the above SQL command:
RoundUp.prg
Lparameters tnValue, tnPlaces
If Round(m.tnValue, m.tnPlaces) = m.tnValue
Return m.tnValue
Else
Return Round(m.tnValue+((10^-(m.tnPlaces+1))*5), m.tnPlaces)
Endif
RoundDown.prg
Lparameters tnValue, tnPlaces
If Round(m.tnValue, m.tnPlaces) = m.tnValue
Return m.tnValue
Else
Return Round(m.tnValue-((10^-(m.tnPlaces+1))*5), m.tnPlaces)
Endif
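To see what these two functions are expected to return, here is a minimal Python sketch of the same idea. It is an assumption on my part that the VFP versions behave like a ceiling and a floor at the given number of decimal places, which is what adding or subtracting half a unit before Round() amounts to; the names round_up and round_down are illustrative only.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR


def _quantum(places):
    # places=2 -> Decimal('0.01'), places=0 -> Decimal('1')
    return Decimal(1).scaleb(-places)


def round_up(value, places):
    """Smallest number with `places` decimals that is >= value."""
    return Decimal(str(value)).quantize(_quantum(places), rounding=ROUND_CEILING)


def round_down(value, places):
    """Largest number with `places` decimals that is <= value."""
    return Decimal(str(value)).quantize(_quantum(places), rounding=ROUND_FLOOR)


print(round_up(1.21, 1))       # 1.3
print(round_down(1.29, 1))     # 1.2
print(round_up(333.3333, 2))   # 333.34
```

Note that Excel's ROUNDUP and ROUNDDOWN round away from and toward zero, so for negative inputs they differ from a ceiling/floor. If negative values can occur, that case is worth testing separately.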
The functions in the link you provided don't seem right to me for the job (but they were not easy to understand and test, so I didn't spend time checking them thoroughly).
I am not sure if one sheet containing both tables is good. I don't remember off the top of my head whether the Tables collection is a member of the WorkSheet or the WorkBook. If WorkSheet, then that would do. I can check and write sample code for that later (possibly tomorrow).
You could use the datatype LOGICAL (L) for Type. In MS SQL Server and other backends it corresponds to bit (1 or 0). Internally it is stored as a boolean, but in expressions it is used as .T./.F. (the true\false symbolic representation in VFP). In code you could simply use it as:
iif( type, ...
same as saying iif(type = .T., ...) - as in Type > 0. And:
iif( !type, ...
same as saying iif( type = .F., ...) or iif( type NOT equal to .T., ... - as in Type = 0.
I didn't use an inner join in this case, because "from TableA where ..." is sufficient here (the same holds in other backends, although the general tendency is to write it using a join).
EDIT: Added the code as another answer.
As per your questions:
An inner join does not need to be defined explicitly; there is an implicit join there. Instead of writing an SQL update, I preferred to utilize VFP's xBase capabilities and used scan...endscan (it could be done with SQL but would be more complex).
Yes, it means putting the two files RoundUp.prg and RoundDown.prg into the same directory as the main code file above, BUT only if that directory is the current directory or is in the search path. To make it clearer, consider:
c:\SomeFolder\RoundUp.prg
c:\SomeFolder\RoundDown.prg
c:\ANOTHERFolder\Main.prg
and you are in:
c:\YetAnotherFolder
If you call main.prg like this:
do ('c:\ANOTHERFolder\Main.prg')
It needs to find RoundUp, RoundDown and it can if c:\Somefolder is included in SET('PATH') - ie:
Set path to c:\SomeFolder;c:\VFPHomeFolderMaybe
Or, if you don't want to think about paths, you could include the RoundUp\RoundDown code as procedures in the main code (as I did in the code in the other answer). Note that in VFP there is no difference between a PROCEDURE and a FUNCTION; you are free to choose either one. Some developers prefer to use FUNCTION for those that return a value, but in fact any PROCEDURE\FUNCTION returns a value, so let's say for those that are called for their return value.
I don't think logical type mean "1" or "0" automatically, correct? If
that's the case, I would have to leave it as int type, because the
input is always defined as 1 or 0 for type column.
Well, that is hard to answer formally. In VFP the boolean data type is defined by the literals .F. and .T. You can cast(aBoolean as int) and you get 0 and 1 respectively. Or you can cast(1 as logical) to get .T. IOW 1\0 and .T.\.F. are interchangeable in a sense. It all depends on where you want to use it. If data is coming from an external source, it would come in as 1\0. Just by casting it or putting it into a column of datatype logical (implicit cast), it is treated as .T.\.F. And if you are sending data from a logical to an external source (say XML, MS SQL Server, PostgreSQL, or another OLEDB\ODBC data source), then .T.\.F. is cast as 1\0.

VFP8: Check if query returned a result

I am importing a bunch of tables and have found data errors in some of them. These errors were introduced when the tables were created, years ago. I want to create a simple alert to notify me that I should manually check the table.
The following works, but it pops up the query results, which I don't want.
procedure checkForBadRecord
select * ;
from table_x ;
where field_x = 'thing used to determine it's bad'
if _tally > 0 then
messagebox("Check the table for errors!")
endif
endproc
Is there a way to check if a table has any rows that meet a condition without showing the actual rows?
I am using Visual FoxPro 8.
You could add "INTO ARRAY dummyCursorName" after the WHERE clause:
select * ;
from table_x ;
where field_x = "thing used to determine it's bad" ;
INTO ARRAY dummyCursorName
_TALLY will still report the statistic and no annoying browse window to deal with.
To prevent the result from being shown, just specify a target for it. "into array" or "into cursor" would do.
According to your current code, you are not interested in the row(s) returned, so you could simply get the count instead (you also had a typo in the code: the apostrophe inside the single-quoted string). ie:
procedure checkForBadRecord
local array laBadCount[1]
select count(*) ;
from table_x ;
where field_x = "thing used to determine it's bad" ;
into array laBadCount
use in (select('table_x'))
if laBadCount[1] > 0 then
messagebox("Check the table for errors!")
endif
endproc
Instead of writing such a single-purpose procedure, you would probably want to write a more generic version:
if checkForBadRecord('table_x', 'field_x', "thing used to determine it's bad")
messagebox("Check the table table_x for errors!")
endif
procedure checkForBadRecord(tcTableName, tcFieldToCheck, tuValueToCheck)
local array laBadCount[1]
select count(*) ;
from &tcTableName ;
where &tcFieldToCheck = m.tuValueToCheck ;
into array laBadCount
use in (select(m.tcTableName))
return laBadCount[1] > 0
endproc
Note: You could use "To Screen" as well to suppress the results and get the count via _Tally. ie:
procedure checkForBadRecord
set console OFF
select * ;
from table_x ;
where field_x = "thing used to determine it's bad" ;
to SCREEN
set console ON
use in (select('table_x'))
if _Tally > 0 then
messagebox("Check the table for errors!")
endif
endproc
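The same "count instead of fetch" idea carries over to other SQL engines. Below is a minimal Python sketch using the built-in sqlite3 module as a stand-in for the VFP table; the in-memory database, the has_bad_record name and the sample rows are assumptions for illustration only.

```python
import sqlite3


def has_bad_record(conn, table_name, field_name, bad_value):
    """Return True if any row matches, without fetching the rows themselves."""
    # Identifiers cannot be bound as parameters, so only pass trusted names here.
    query = f'SELECT COUNT(*) FROM "{table_name}" WHERE "{field_name}" = ?'
    (bad_count,) = conn.execute(query, (bad_value,)).fetchone()
    return bad_count > 0


# Hypothetical sample data standing in for table_x
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (field_x TEXT)")
conn.executemany(
    "INSERT INTO table_x VALUES (?)",
    [("fine",), ("thing used to determine it's bad",)],
)

if has_bad_record(conn, "table_x", "field_x", "thing used to determine it's bad"):
    print("Check the table table_x for errors!")
```

Binding the value as a parameter also sidesteps the apostrophe problem from the original query.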

Proper syntax for SAS macro date in Oracle query

I am trying to query the past week's additions to an Oracle database overnight and need to use macros to populate the dates. I am able to run the query below if I hard-code the actual dates. I've tried double and single quotes on the macro vars &sd and &ed. Please advise.
data _null_;
sd = dhms(today()-7,00,00,00);
ed = dhms(today()-1,23,59,59);
call symput("sd", put(sd, datetime20.));
call symput("ed", put(ed, datetime20.));
run;
%put &sd &ed;
proc sql;
connect to oracle (user=x password=x path=x);
create table weekly_test as
select * from connection to oracle
(select * from x.Estimates
where state_fips_code = '41'
and altered_date between
to_date('&sd','DDMONYYYY:HH24:MI:SS')
and to_date('&ed','DDMONYYYY:HH24:MI:SS'));
disconnect from oracle;
quit;
This gives the error:
ORACLE execute error: ORA-01858: a non-numeric character was found where a numeric was expected.
and with double quotes
and altered_date between
to_date("&sd",'DDMONYYYY:HH24:MI:SS')
and to_date("&ed",'DDMONYYYY:HH24:MI:SS'));
I get this error:
ERROR: ORACLE prepare error: ORA-00904: " 21MAR2012:23:59:59": invalid identifier. SQL
statement: select * from X.Estimates where state_fips_code = '41' and altered_date
between to_date(" 15MAR2012:00:00:00",'DDMONYYYY:HH24:MI:SS') and to_date("
21MAR2012:23:59:59",'DDMONYYYY:HH24:MI:SS').
The best bet is to define your macro variable with single quotes around the values. In fact, I don't think it's necessary to format it as a datetime literal; just construct a normal ANSI date string (YYYY-MM-DD) and you can also get rid of the TO_DATE function call.
For example, try these two statements:
%let SD=%str(%')%sysfunc( putn( %sysfunc(intnx(day,%sysfunc(today()) ,-7)),yymmdd10.))%str(%');
%let ED=%str(%')%sysfunc( putn( %sysfunc(intnx(day,%sysfunc(today()) ,-1)),yymmdd10.))%str(%');
Those define SD as today()-7 and ED as today()-1 (using pure macro code rather than a data step). Then, in your query, reference these macro variables unquoted:
proc sql;
connect to oracle (user=x password=x path=x);
create table weekly_test as
select * from connection to oracle
(select * from x.Estimates
where state_fips_code = '41'
and altered_date between &sd and &ed
);
disconnect from oracle;
quit;
Many thanks Bob. I tried the code you posted and got ORA-01861: literal does not match format string. Anyway, you got me thinking on the right path. I just added code to put single quotes around my dates in the data step and it worked. For anyone with similar problems, the code is below.
data _null_;
sd = dhms(today()-7,00,00,00);
ed = dhms(today()-1,23,59,59);
call symput('sd',"'"|| trim(left(put(sd, datetime20.)))||"'");
call symput('ed', "'"||trim(left(put(ed, datetime20.)))||"'");
run;
%put &sd &ed;
proc sql;
connect to oracle (user=x password=x path=x);
create table weekly_test as
select * from connection to oracle
(select * from x.Estimates
where state_fips_code = '41'
and altered_date between
to_date(&sd,'DDMONYYYY:HH24:MI:SS')
and to_date(&ed,'DDMONYYYY:HH24:MI:SS'));
disconnect from oracle;
quit;
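For reference, the reason the quotes have to go into the macro variable itself: SAS does not resolve macro variables inside single quotes, so '&sd' reaches Oracle as literal text (hence ORA-01858), while double quotes are read by Oracle as an identifier (hence ORA-00904). The strings Oracle needs to receive can be sketched in Python as below; the function names are made up for illustration, and the reference date of 22 March 2012 is inferred from the dates in the error message above.

```python
from datetime import date, datetime, time, timedelta

# Fixed month abbreviations so the result does not depend on the locale
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def quoted_datetime(moment):
    """Format like SAS datetime20. (trimmed) and wrap it in single quotes."""
    stamp = (f"{moment.day:02d}{MONTHS[moment.month - 1]}{moment.year:04d}"
             f":{moment.strftime('%H:%M:%S')}")
    return f"'{stamp}'"


def week_window(today):
    """Start of the day 7 days ago through the last second of yesterday."""
    start = datetime.combine(today - timedelta(days=7), time(0, 0, 0))
    end = datetime.combine(today - timedelta(days=1), time(23, 59, 59))
    return quoted_datetime(start), quoted_datetime(end)


sd, ed = week_window(date(2012, 3, 22))
print(f"to_date({sd},'DDMONYYYY:HH24:MI:SS')")
print(f"to_date({ed},'DDMONYYYY:HH24:MI:SS')")
```

With the quotes already part of the value, the pass-through query can reference &sd and &ed bare, exactly as in the working code above.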
This works....
%LET SD = %SYSFUNC(intnx(day,"&SYSDATE9"d,-7,b),date9.) ;
%LET ED = %SYSFUNC(intnx(day,"&SYSDATE9"d,-1,b),date9.) ;
%PUT &SD &ED ;
proc sql ;
connect to oracle (user=x password=x path=x);
create table weekly_test as
select * from connection to oracle
(select * from x.Estimates
where state_fips_code = '41'
and altered_date between %BQUOTE('&SD') and %BQUOTE('&ED')
);
disconnect from oracle ;
quit ;

Resources