SAS: join (or insert) a small table into a big table

I have a small problem. I have a big table and a few small tables, where each small table contains a subset of the big table's fields. How can I insert (or union) the tables so that fields with matching names take the data from the small table, and fields the small table does not have are set to null/0 in the big table?
Example:
data temp1;
infile DATALINES dsd missover;
input a b c d e f g;
CARDS;
1, 2, 3, 4,5,6
2, 3, , 5
3, 3
4,,3,2,3,
;
run;
data temp2;
infile DATALINES dsd missover;
input a c e g;
CARDS;
5, 2, 3, 4
6, 3, , 5
7, 3
;
run;
Is there an elegant method so that, when I insert temp2 into temp1, the fields missing from temp2 are set to null in temp1?
Thank you for your help!

That is exactly what SAS does by default.
data want ;
set have1 have2;
run;
It will match the variables by name and any variables that do not exist (in either source) will have missing values.
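To make the name-matching behaviour concrete, here is a minimal sketch in Python (not SAS). The helper `concat_by_name` is hypothetical and only mimics the idea: columns are lined up by name and anything a table lacks becomes missing (`None`).

```python
def concat_by_name(*tables):
    """Stack rows from several tables; columns are matched by name."""
    columns = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Every output row gets every column; absent values become None.
    return [
        {name: row.get(name) for name in columns}
        for table in tables
        for row in table
    ]


# One row from each example table in the question.
temp1 = [{"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6, "g": None}]
temp2 = [{"a": 5, "c": 2, "e": 3, "g": 4}]

want = concat_by_name(temp1, temp2)
print(want[1])
# {'a': 5, 'b': None, 'c': 2, 'd': None, 'e': 3, 'f': None, 'g': 4}
```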
For better performance when appending a small table to a large table, use PROC APPEND instead of a data step so that SAS does not have to make a new copy of the large dataset. That is more like an "insert". The FORCE option allows the two datasets to have different structures. But since the new data is added to the existing dataset, any extra variables that appear only in HAVE2 are ignored and their values are lost.
proc append base=have1 data=have2 force ;
run;
If you really did have to generate an actual INSERT statement (perhaps you are actually trying to generate SQL code to run in a foreign database) you might want to compare the metadata of the two datasets and find the common variables.
proc contents data=have1 out=cont1 noprint; run;
proc contents data=have2 out=cont2 noprint; run;
proc sql noprint;
select a.name into :varlist separated by ','
from cont2 a
inner join cont1 b
on upcase(a.name) = upcase(b.name)
;
...
insert into have1 (&varlist) select &varlist from have2 ;

It is not very clear to me what operation you intend to do but some initial thoughts are:
To compare columns between two datasets (and check whether a value exists in one of them) it is good practice to use an outer join. You can do joins with a MERGE statement in a data step or, more elegantly, with PROC SQL.
However, using either approach you will have to specify which two rows in temp1 and temp2 shall be compared - you are typically joining on a column that is available in both tables.
To help us resolve your issue, could you possibly provide the correct output for your desired operation, if you perform it on temp1 and temp2? This would show what options you've explored and what needs to be fixed there.

You should try PROC APPEND. It will be more efficient because it does not read your big table again, unlike the data step below:
/*reads temp1 which is big table and temp2*/
data temp3;
set temp1 temp2;
run;
/* this does pretty much the same as the code above but does not re-read your big table,
so it is more efficient */
proc append base=temp1 data=temp2 force;
run;
more on proc append in documentation http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#n19kwc3onglzh2n1l2k4e39edv3x.htm

Related

Assign a consistent random number to id in SAS across datasets

I have two datasets data1 and data2 with an id column. I want to assign a random id to each id, but this random number needs to be consistent across datasets. (rand_id for id=1 must be the same in both datasets). The objective is to get:
id  rand_id
1   0.4212
2   0.5124
3   0.1231

id  rand_id
1   0.4212
3   0.1231
2   0.5124
4   0.9102
Note that the ids do not need to be ordered, and some ids might appear in one dataset but not in the other. I thought
DATA data1;
SET data1;
CALL STREAMINIT(id);
rand_id=RAND('uniform');
RUN;
and the same for data2 would do the job, but it does not. It just takes the first id as the seed and generates a sequence of random numbers from it.
From the STREAMINIT documentation, it seems it is only called once per data step. I'd like it to be called for every row. Is this possible?
The idea is to create a table random_values with an associated random id for each id that we later join on the two tables.
*assign random seed;
%let random_seed = 71514218;
*list of unique id;
proc sql;
create table unique_id as
select distinct id
from (
select id from have1
union all
select id from have2
)
;
quit;
*add random values;
data random_values;
set unique_id;
call streaminit(&random_seed.);
rand = rand('uniform', 0, 1);
run;
*join back on have1;
proc sql;
create table have1 as
select t1.id, t2.rand as rand_id
from have1 t1 left join random_values t2
on t1.id = t2.id
;
quit;
*join back on have2;
proc sql;
create table have2 as
select t1.id, t2.rand as rand_id
from have2 t1 left join random_values t2
on t1.id = t2.id
;
quit;
Why not use a lookup dataset? You could create and update it using a HASH object.
First make an empty dataset:
data rand_id;
set one(keep=id);
rand_id=.;
stop;
run;
Then process the first dataset, adding the new RAND_ID variable to that dataset and also populating the RAND_ID dataset with all of the unique ID values.
data one_random;
if _n_=1 then do;
declare hash h(dataset:'rand_id');
rc=h.definekey('id');
rc=h.definedata('id','rand_id');
rc=h.definedone();
end;
if eof then rc=h.output(dataset:'rand_id');
set one end=eof;
if h.find() then do;
rand_id=rand('uniform');
rc=h.add();
end;
drop rc;
run;
Repeat for any other datasets that share the same ID variable.
data two_random;
if _n_=1 then do;
declare hash h(dataset:'rand_id');
rc=h.definekey('id');
rc=h.definedata('id','rand_id');
rc=h.definedone();
end;
if eof then rc=h.output(dataset:'rand_id');
set two end=eof;
if h.find() then do;
rand_id=rand('uniform');
rc=h.add();
end;
drop rc;
run;
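The lookup idea is independent of SAS. As a rough sketch in Python (the function `assign_rand_ids` and the sample id lists are made up for illustration), one shared dictionary plays the role of the RAND_ID dataset: an id that has been seen before reuses its stored value, and a new id gets a fresh random number that is then remembered.

```python
import random


def assign_rand_ids(ids, lookup, rng):
    """Return (id, rand_id) pairs, reusing values already in the lookup."""
    out = []
    for key in ids:
        if key not in lookup:
            # First time this id is seen: draw a value and remember it.
            lookup[key] = rng.random()
        out.append((key, lookup[key]))
    return out


rng = random.Random(71514218)   # fixed seed, only for reproducibility
lookup = {}                     # shared across all datasets

one_random = assign_rand_ids([1, 2, 3], lookup, rng)
two_random = assign_rand_ids([1, 3, 2, 4], lookup, rng)
```

Because both calls share `lookup`, id 1 gets the same `rand_id` in both results, while id 4 only appears in the second.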
Simplest way to do this in my opinion is to create a format dataset. Tom's hash example is fine also, but this is probably easier if you don't know hash tables.
Do NOT seed the random number from the ID itself - this is not random anymore.
data forfmt;
set data1;
call streaminit(7);
label = put(rand('Uniform'),12.9);
start = id;
fmtname = 'RANDIDF';
output;
if _n_ eq 1 then do;
hlo='o';
label='.';
output;
end;
run;
proc format cntlin=forfmt;
quit;
Then you can use put(id,randidf.) to assign the random ID. If you want it to be numeric, use input instead of put and make it an informat; that is handled via type='i', and the input needs to be character or converted to character via put. No sorting is required, and the lookup is very fast most of the time.
Solved:
DATA data1;
SET data1;
seed = id;
CALL RANUNI(seed,rand_id);
DROP seed;
RUN;
Generates the desired result.

How to update a column based on count of another column from another table using any condition in visual foxpro?

SET STEP ON
Close Databases
Cd e:\ksv\Data
Use ohd IN 0 shared
Use cus IN 0 shared
SELECT * FROM cus inTO TABLE tempcus
ALTER table tempcus ADD COLUMN totalsold int
UPDATE tempcus SET totalsold=RECCOUNT(ohd.status='5') WHERE tempcus.customer=ohd.customer
SELECT * FROM tempcus INTO CURSOR cur
BROWSE
I have tried the above code and I am getting an error saying "invalid table number". Can someone help me with this?
RECCOUNT() function only gives you a record count for a workarea# or alias, e.g. RECCOUNT("ohd") will give total record count of ohd table.
You want something like:
SELECT COUNT(*) totalsold,cus.customer FROM cus JOIN ohd ON cus.customer=ohd.customer WHERE ohd.status='5' GROUP BY cus.customer INTO CURSOR cur
BROWSE
In VFP, there is a REPLACE command which lets you replace one or more fields with whatever values you like, whether fixed values or variables holding results from other queries. It works on the currently selected work area and on the row the record pointer is on, unless you apply a scope clause (FOR condition).
Sample only for context of the REPLACE command:
use SomeOtherTable in 0 shared
select SomeOtherTable
replace SomeNumberField with 1.234, SomeStringField with 'Hello'  && etc.
or with a condition (bogus, just to show you can apply it to multiple rows):
replace SomeNumberField with SomeNumberField * 3 for StatusField = 'X'
Now, back to your original content. It appears you are trying to get a result temporary table with a total number of records from the OHD table where the status = 5. VFP allows you to run SQL-Select into temporary read-write "cursor" tables, that when closed will delete themselves, yet allows them to be modified (such as browse, or other direct manipulation such as with REPLACE command).
You can get the counts you are looking for with a left-join to a query result set. To help you see the pieces individually, I will do in steps so you can follow, then join into one final.
First, you want a count of all records in the OHD table with status = 5 per customer... the "o" and "c" are ALIAS references in the SQL queries below
SET STEP ON
Close Databases
Cd e:\ksv\Data
Use ohd IN 0 shared
Use cus IN 0 shared
select ;
o.customer, ;
count(*) NumberOfRecords ;
from ;
OHD o ;
where ;
o.status = '5' ;
group by ;
o.customer ;
into ;
cursor C_JustCountsPerCustomer READWRITE
The "into cursor" part above will create a workable table and give it the name of "C_JustCountsPerCustomer". I have always tried to use "C_" as a prefix to the table name for the sole purpose to know it is a temporary "CURSOR" result and not a real final table, but that is just my historical naming convention applied.
Now, if you did a browse of this result, you would see each customer's ID and how many with status = '5'. The resulting table "cursor" is like any other table opened and you could index as you need and browse, etc. But this only will give records that HAD status of '5'. But you could have more customers that never had a '5' status record.
Now, getting all your customers and their respective counts into one result table "cursor". I can take the above query and use within a SQL-Select via a LEFT-JOIN meaning, give me everything from the first table (left-side), regardless of a matching record found in the second table (right-side). But if there is a match to the right side, give me those values too.
select ;
c.*, ;
NVL( C_tmpResult.NumberOfRecords, 0000 ) as NumberOfRecords ;
from;
CUS c ;
LEFT JOIN ;
(select ;
o.customer, ;
count(*) NumberOfRecords ;
from ;
OHD o ;
where ;
o.status = '5' ;
group by ;
o.customer ) C_tmpResult ;
ON ;
c.customer = C_tmpResult.customer ;
into ;
cursor C_CusWithCounts readwrite
So, you can see the left-join uses the first query to get the counts, but the primary part of the query gets records from the customer table (alias "c") and is joined on the common customer id column. The "NVL()" says: if there IS a value in the C_tmpResult table for the given customer, use it; if not, assume a count of 0. Yes, I explicitly wrote 0000 to force a minimum width of 4 digits in the result, in case the first customer does not have any records and that makes the column only 1 digit wide.
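The same count-then-left-join logic can be sketched outside VFP. The Python below is only an illustration with made-up customer and order rows: it counts status '5' orders per customer, then walks every customer and falls back to 0 when there is no count, which is the role NVL() plays in the query.

```python
from collections import Counter

# Hypothetical sample data standing in for the cus and ohd tables.
cus = [{"customer": "A"}, {"customer": "B"}, {"customer": "C"}]
ohd = [
    {"customer": "A", "status": "5"},
    {"customer": "A", "status": "5"},
    {"customer": "A", "status": "3"},
    {"customer": "B", "status": "5"},
    {"customer": "C", "status": "1"},
]

# Inner query: count of status '5' rows per customer.
counts = Counter(o["customer"] for o in ohd if o["status"] == "5")

# Left join: keep every customer, default the count to 0 when missing.
cus_with_counts = [
    dict(c, NumberOfRecords=counts.get(c["customer"], 0)) for c in cus
]

for row in cus_with_counts:
    print(row["customer"], row["NumberOfRecords"])
```

Customer C has orders but none with status '5', so it still appears in the result with a count of 0.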
Anyhow, at the end, you would have your result temporary table (cursor) with the customer information AND the count I think you are looking for. You should be able to do a browse and good to go.

Data Calculation for joining two tables

I am studying Foxpro to create a simple application for manipulating data from two tables A and B (size of tableB >> size of tableA). The data from an Excel spreadsheet is imported into these two tables.
tableA
id  balance  load  state
1   10       null  l
2   22       null  l
3   31       null  l
tableB
Load id  id  ord  fact    type  1st value  rounded value  state
1        1   1    0.09    1     null       null           l
2        1   2    0.02    0     null       null           l
3        1   3    0.13    1     null       null           l
4        1   4    -0.05   0     null       null           l
5        2   1    0.01    1     null       null           l
6        2   2    0.092   1     null       null           l
7        2   3    0.03    0     null       null           l
8        3   1    0.14    1     null       null           l
9        3   2    0.12    0     null       null           l
10       3   3    -0.02   0     null       null           l
My friend wants me to write FoxPro code to do the following things. First, create empty tableA and tableB containing the columns shown above. Each column will be loaded with data (hundreds of thousands of rows) from an Excel spreadsheet every day. Second, for each unique id, the code updates the 3 columns 1st value, rounded value and load with the given formulas:
1st value[i] = If(Type[i]=0, load[i-1]*fact[i], load[i-1]*fact[i]/(1-fact[i]))
1st value[1] = If(Type[1]=0, balance[1]*fact[1], balance[1]*fact[1]/(1-fact[1]))
rounded value[i] = If(1st value[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2))
load[i+1] = load[i] + rounded value[i+1] (i >= 1)
load[1] = balance[1] + rounded value[1]
I think I have to create a table like the following to store the calculation above for this step:
Calculation Table
balance  id  ord  1st value  rounded value  load
10       1   1    0.989      0.90           10.9 (= 10 + 0.9)
10.9     1   2    0.218      0.20           11.1 (= 10.9 + 0.2)
11.1     1   3    1.658      1.60           12.7 (= 11.1 + 1.6)
12.7     1   4    -0.635     -0.64          12.06 (= 12.7 + (-0.64))
Desired output
Using results in Calculation Table, we update the original tableA and tableB as follows:
tableB
Load id  id  ord  1st value  rounded value  state
1        1   1    0.989      0.90           calculated
2        1   2    0.218      0.20           calculated
3        1   3    1.658      1.60           calculated
4        1   4    -0.635     -0.64          calculated
5        2   1    ...        ....           calculated
6        2   2    ...        ....           calculated
tableA (Note: for each value in `load id`, the `load` column only stores the **last** value in the `calculation` table which corresponds to maximum `ord`)
id  balance  load  state
1   10       9.5   calculated
2   22       ...   calculated
3   31       ...   calculated
Can anyone please help me with the syntax for creating tableB, computing and storing the results for the columns 1st value, rounded value and load in a calculation table using an inner join on the id column between tableA and tableB, and then updating tableB?
My attempt:
First step (Creating two tables A and B with column fields shown above)
CREATE TABLE tableA;
( id int, ;
balance double, ;
load C(240), ;
state C(240), ;)
CREATE TABLE tableB;
( Load id int, ;
id int, ;
ord int, ;
fact double, ;
type binary (not sure....) ;
1st value C(240),;
rounded value C(240), ;
state C(240), ;)
(adding as another answer just because others got too long to read)
can you try your code with this dataset
(drive.google.com/open?id=1uCWwt5ubd2_F8w2gsh3v4VDpibWz7PAz) to see if
you will get the two output tables from your code, each similar to the
one shown in the previous Excel worksheet I uploaded for you?
I downloaded that spreadsheet and here is what I needed to change:
Your ranges were C8:F35 and H8:O62 for tableA and B. Also your "balance" was named "base". New code (downloaded to d:\temp\workbook2.xlsx) edited to match ranges and "balance" to "base":
* Get the data from given excel filename and ranges
* first range is tableA, second one is tableB
GetDataFromExcel("d:\temp\WorkBook2.xlsx", "Sheet1$C8:F35", "Sheet1$H8:O62")
* Now the data is in cursors crsA and crsB; do the calculation on them
DoCalculation()
* Done. Show the results selecting and browsing the crsA and B
Select crsA
Browse
Select crsB
Browse
* Get specific fields only from crsB
Select loadId, id, ord, firstValue, roundVal, state ;
from crsB ;
into cursor crsBCustom ;
nofilter
browse
* Check data from both cursors (join)
* I chose the fields as I see fit
* ta and tb are local aliases for crsA and crsB
* helping to write shorter SQL in this case
Select tb.LoadId, tb.Id, ta.base, ta.load, ;
tb.firstValue, tb.roundVal, ;
ta.State as StateA, tb.State as StateB ;
from crsA ta ;
inner join crsB tb on ta.Id = tb.Id ;
order by tb.Id, tb.Ord ;
into cursor crsBoth ;
NoFilter
browse
* Does the specific calculations on specific data
Procedure DoCalculation
*1st value[1] = If(Type[1]=0, Base[1]*fact[1], Base[1]*fact[1]/(1-fact[1]))
*rounded value[i] = If(1st value[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2)
*rounded value[1] = If(1st value[1]>0, rounddown(1st value[1], 1), roundup(1st value[1],2)
*load[1] = Base[1] + rounded value[1]
* i > 1 - ord > 1
*1st value[i] = If(Type[i]=0, load[i-1]*fact[i], load[i-1]*fact[i]/(1-fact[i]))
*rounded value[i] = If(1st value[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2)
*load[i+1] = load[i] + rounded value[i+1] (i >= 1)
*declare local variable
Local lnBase
* select crsB and create an index there
Select CrsB
Index On Padl(Id,10,'0')+Padl(ord,10,'0') Tag ALinkB
* select crsA as parent and link to crsB
* using the "id" part of index
Select crsA
Set Relation To Padl(Id,10,'0') Into CrsB
* start looping the rows
Scan
* working with a new Id (1, 2, ...)
* save base value to m.lnBase
lnBase = crsA.Base
* select crsB and start looping the rows there
* because of the index in effect and the relation created
* pointer would be on the first crsB row with a matching Id
* and since Ord is also part of the index the first row of
* given Id
* Limit the looping in crsB (child table) to Id in crsA
* using WHILE clause
Select CrsB
Scan While Id = crsA.Id
* do replacing starting on first row of this Id (Ord=1)
* we don't have any scope clauses in replace, thus
* we are doing "single row" updates
Replace ;
firstValue With m.lnBase*fact / Iif(!Type, 1, 1-fact), ;
roundVal With Iif(firstValue > 0, ;
roundDown(firstValue,1), ;
roundUp(firstValue, 2))
* after each replace update m.lnBase value
* to use in next row
lnBase = m.lnBase + CrsB.roundVal
Endscan
* completed updating crsB
* select crsA and also update crsA.load with final 'load' value
Select crsA
Replace Load With m.lnBase
Endscan
* Update state to 'Calculated'
Update crsA set state = 'Calculated'
Update crsB set state = 'Calculated'
Endproc
* Get data from excel with given filename and ranges
* This code is not generic and expects the
* data to be in a specific format.
* Does not do any error check
Procedure GetDataFromExcel(tcExcelFileName, tcTableARange, tcTableBRange)
* declare and define the connection string to excel
Local lcConStr
lcConStr = ;
'Provider=Microsoft.ACE.OLEDB.12.0;'+;
'Data Source='+Fullpath(m.tcExcelFileName)+';'+;
'Extended Properties="Excel 12.0;HDR=Yes"'
* Declare and define the 2 SQL needed to get data for A and B
* rename the fields in SQL for easier handling
Local lcSQLA, lcSQLB
TEXT to lcSQLA textmerge noshow
Select [id], [base], [load], [state]
from [<< m.tcTableARange >>]
ENDTEXT
TEXT to m.lcSQLB textmerge noshow
select
[Load Id] as LoadId,
[Id], [Ord], [Fact], [Type],
[1st value] as firstValue,
[Rounded value] as roundVal,
[State]
from [<< m.tcTableBRange >>]
ENDTEXT
* Execute the queries and place results in given cursors
ADOQuery(m.lcConStr, m.lcSQLA, "crsTableA")
ADOQuery(m.lcConStr, m.lcSQLB, "crsTableB")
* Sanitize the cursors a bit
* (OledB query would assign rather generic datatypes)
Select Cast(Id As Int) As Id, Cast(Base As Double) As Base, ;
Cast(Load As Double) As Load, Cast(State As c(50)) As State ;
from crsTableA ;
into Cursor crsA ;
readwrite
Select Cast(LoadId As Int) As LoadId, ;
Cast(Id As Int) As Id, Cast(ord As Int) As ord, ;
Cast(fact As Double) As fact, Cast(Type As logical) As Type, ;
Cast(firstValue As Double) As firstValue, ;
Cast(roundVal As Double) As roundVal, ;
Cast(State As c(50)) As State From crsTableB ;
into Cursor CrsB ;
readwrite
Use In (Select('crsTableA'))
Use In (Select('crsTableB'))
Endproc
* roundUp and down custom functions
* RoundUp and Down excel style
* Not correct math wise IMHO
Procedure roundUp(tnValue, tnPlaces)
Local lnResult, lnValue
lnValue = Abs(m.tnValue)
If Round(m.lnValue, m.tnPlaces) != m.lnValue
lnValue = Round(m.lnValue+((10^-(m.tnPlaces+1))*5), m.tnPlaces)
Endif
Return Sign(m.tnValue) * m.lnValue
Endproc
Procedure roundDown(tnValue, tnPlaces)
Local lnResult, lnValue
lnValue = Abs(m.tnValue)
If Round(m.lnValue, m.tnPlaces) != m.lnValue
lnValue = Round(m.lnValue-((10^-(m.tnPlaces+1))*5), m.tnPlaces)
Endif
Return Sign(m.tnValue) * m.lnValue
Endproc
* Generic function to query a given data source
* and place results in a cursor
Procedure ADOQuery(tcConStr,tcQuery,tcCursorName)
Local oConn As 'ADODB.Connection'
Local oRS As ADODB.RecordSet
oConn = Createobject('ADODB.Connection')
oConn.Mode= 1 && adModeRead
oConn.Open( m.tcConStr )
oRS = oConn.Execute(m.tcQuery)
RS2Cursor(oRS,m.tcCursorName)
oRS.Close
oConn.Close
Endproc
* Helper function to ADOQuery to convert
* an ADODB.Recordset to a VFP cursor
Procedure RS2Cursor(toRS, tcCursorName) && simple single cursor - not intended for complex ones
tcCursorName = Iif(Empty(m.tcCursorName),'ADORs',m.tcCursorName)
Local xDOM As 'MSXML.DOMDocument'
xDOM = Createobject('MSXML.DOMDocument')
toRS.Save(xDOM, 1)
Xmltocursor(xDOM.XML, m.tcCursorName)
Endproc
This is the whole code. Just change the file path and name to yours, select all the code, right click and choose execute selection to see the results. Or save it as a prg, say ImportMyExcel.prg, and run it:
ImportMyExcel()
You could see the results I have so I didn't upload any results.
Also, is Procedure RS2Cursor(toRS, tcCursorName) intended to generate
the 2 output tables? Why do we need this procedure though: Procedure
ADOQuery(tcConStr,tcQuery,tcCursorName)?
Well, those procedures are a little tricky for a newcomer (maybe not). To follow them you would need to know the history of VFP, cursors, cursor adapters, converting an ADO recordset to a cursor etc. (probably advanced level). Those are the procedures I came up with and also published on the Foxite link that I gave you. Just think of them as black-box functions (like built-in ones) doing their work. ADOQuery's work is simply to query an OLEDB source and return the result as a cursor. With a CursorAdapter you might not need such a procedure, but that procedure was designed before CursorAdapter existed.
Two more questions please: 1) where does the m come from in
m.lnBalance?
m. explicitly tells the compiler that the name is a memory variable. It is referred to as MDOT. There are developers who claim it is not needed, and it generally leads to long-running discussions (and you would likely find my name in those discussions). Up until today nobody has been able to show or demonstrate to me why we shouldn't use it or don't need it. If you ask me, it is not a preference but something you should use.
2) Don't we need to define crsTableA? Or you meant we can use the
CREATE Table tableA in your previous code to make crsTableA valid?
No. There is no table in that code. We read the data from excel into a cursor (crsTableA and crsTableB initially) and then sanitize into 2 cursors crsA and crsB. All of them are cursors. Cursors are like tables but are not persisted on disk. They may even spend all their life in memory and are gone when you close them. Here I preferred cursors because without harming any real data you could run N times and check your results. When you are satisfied persisting the data is as simple as a "Select ... into" or "insert into ..." (there are more ways too) a table. Even in the case of a table you don't need to use "Create Table ...". A "select Into ..." command can select the data from a source and save it to a table by creating it (like a combined 'create table ...' and then 'insert into ...').
Also, I saw that B9:E12 does not match the range of tableA or tableB
in the Excel spreadsheet I uploaded for you before. Am I missing
something here?
It matched your original samples if you think data starts at B9 and G9 respectively.
I have another question: can you please clarify on what these lines
do: Select CrsB Index On Padl(Id,10,'0')+Padl(ord,10,'0') Tag
ALinkB Select crsA Set Relation To Padl(Id,10,'0') Into CrsB.
I think I explained this part in the previous question. I will soon comment the code itself.
Adding as another answer to prevent clutter. I can do further explanations if you need to. Here I used the Excel ranges that would match to sample data. You would replace the range with the actual one (as well as the excel filename):
GetDataFromExcel("c:\myFolder\myExcel.xlsx", "B9:E12", "G9:N19")
DoCalculation()
Select crsA
Browse
Select crsB
Browse
Procedure DoCalculation
*1st value[1] = If(Type[1]=0, balance[1]*fact[1], balance[1]*fact[1]/(1-fact[1]))
*rounded value[i] = If(1st value[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2)
*rounded value[1] = If(1st value[1]>0, rounddown(1st value[1], 1), roundup(1st value[1],2)
*load[1] = balance[1] + rounded value[1]
* i > 1 - ord > 1
*1st value[i] = If(Type[i]=0, load[i-1]*fact[i], load[i-1]*fact[i]/(1-fact[i]))
*rounded value[i] = If(1st value[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2)
*load[i+1] = load[i] + rounded value[i+1] (i >= 1)
Local lnBalance
Select CrsB
Index On Padl(Id,10,'0')+Padl(ord,10,'0') Tag ALinkB
Select crsA
Set Relation To Padl(Id,10,'0') Into CrsB
Scan
lnBalance = crsA.Balance
Select CrsB
Scan While Id = crsA.Id
Replace ;
firstValue With m.lnBalance*fact / Iif(!Type, 1, 1-fact), ;
roundVal With Iif(firstValue > 0, ;
roundDown(firstValue,1), ;
roundUp(firstValue, 2))
lnBalance = m.lnBalance + CrsB.roundVal
Endscan
Select crsA
Replace Load With m.lnBalance
Endscan
Endproc
Procedure GetDataFromExcel(tcExcelFileName, tcTableARange, tcTableBRange)
Local lcConStr
lcConStr = ;
'Provider=Microsoft.ACE.OLEDB.12.0;'+;
'Data Source='+Fullpath(m.tcExcelFileName)+';'+;
'Extended Properties="Excel 12.0;HDR=Yes"'
Local lcSQLA, lcSQLB
TEXT to lcSQLA textmerge noshow
Select [id], [balance], [load], [state]
from [Sheet1$<< m.tcTableARange >>]
ENDTEXT
TEXT to m.lcSQLB textmerge noshow
select
[Load Id] as LoadId,
[Id], [Ord], [Fact], [Type],
[1st value] as firstValue,
[Rounded value] as roundVal,
[State]
from [Sheet1$<< m.tcTableBRange >>]
ENDTEXT
ADOQuery(m.lcConStr, m.lcSQLA, "crsTableA")
ADOQuery(m.lcConStr, m.lcSQLB, "crsTableB")
Select Cast(Id As Int) As Id, Cast(Balance As Double) As Balance, ;
Cast(Load As Double) As Load, Cast(State As c(1)) As State ;
from crsTableA ;
into Cursor crsA ;
readwrite
Select Cast(LoadId As Int) As LoadId, ;
Cast(Id As Int) As Id, Cast(ord As Int) As ord, ;
Cast(fact As Double) As fact, Cast(Type As logical) As Type, ;
Cast(firstValue As Double) As firstValue, ;
Cast(roundVal As Double) As roundVal, ;
Cast(State As c(1)) As State From crsTableB ;
into Cursor CrsB ;
readwrite
Use In (Select('crsTableA'))
Use In (Select('crsTableB'))
Endproc
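Procedure roundUp(tnValue, tnPlaces)
* Excel-style ROUNDUP: round the magnitude away from zero, then restore the sign
Local lnValue
lnValue = Abs(m.tnValue)
If Round(m.lnValue, m.tnPlaces) != m.lnValue
lnValue = Round(m.lnValue+((10^-(m.tnPlaces+1))*5), m.tnPlaces)
Endif
Return Sign(m.tnValue) * m.lnValue
Endproc
Procedure roundDown(tnValue, tnPlaces)
* Excel-style ROUNDDOWN: round the magnitude toward zero, then restore the sign
Local lnValue
lnValue = Abs(m.tnValue)
If Round(m.lnValue, m.tnPlaces) != m.lnValue
lnValue = Round(m.lnValue-((10^-(m.tnPlaces+1))*5), m.tnPlaces)
Endif
Return Sign(m.tnValue) * m.lnValue
Endproc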
Procedure ADOQuery(tcConStr,tcQuery,tcCursorName)
Local oConn As 'ADODB.Connection'
Local oRS As ADODB.RecordSet
oConn = Createobject('ADODB.Connection')
oConn.Mode= 1 && adModeRead
oConn.Open( m.tcConStr )
oRS = oConn.Execute(m.tcQuery)
RS2Cursor(oRS,m.tcCursorName)
oRS.Close
oConn.Close
Endproc
Procedure RS2Cursor(toRS, tcCursorName) && simple single cursor - not intended for complex ones
tcCursorName = Iif(Empty(m.tcCursorName),'ADORs',m.tcCursorName)
Local xDOM As 'MSXML.DOMDocument'
xDOM = Createobject('MSXML.DOMDocument')
toRS.Save(xDOM, 1)
Xmltocursor(xDOM.XML, m.tcCursorName)
Endproc
EDIT: I edited the other answer for the comments beneath it. Now for your questions:
Shouldn't GetDataFromExcel("c:\myFolder\myExcel.xlsx", "B9:E12", "G9:N19") get called after the Procedure
GetDataFromExcel(tcExcelFileName, tcTableARange, tcTableBRange)??
No. Procedures are always placed after normal execution code in a prg file. IOW if your PRG has:
Do Something
* ...
Procedure SomeProcedure
* ...
endproc
Procedure Something
endproc
Code starts by calling Something and executes the lines after that up until it reaches the first Procedure declaration (or FUNCTION, DEFINE CLASS). Something might be a procedure (as in the sample) or a separate prg.
Shouldn't Procedure roundUp and Procedure roundDown get called before roundDown(firstValue,1), ; roundUp(firstValue, 2))??
No, same as the above. What you describe looks more like the rules of C.
Does the left ID on this line Scan While Id = crsA.Id come from CrsB?? Also, why is there the change from crsA to CrsA? Is this a
typo?
Yes. it comes from crsB. But in a sense, you are right I should be explicit and include the alias there as:
Scan while crsB.Id = crsA.Id
In VFP if you don't include an alias, then the one that is current is assumed.
We are scanning crsA in the outer loop. Then we switch to crsB and scan there, and when done we switch back to crsA. (Actually the scan command remembers the alias it is associated with and makes this switch implicitly when it hits endscan, but I prefer to be explicit.)
EDIT:
Select CrsB
Index On Padl(Id,10,'0')+Padl(ord,10,'0') Tag ALinkB
Select crsA
Set Relation To Padl(Id,10,'0') Into CrsB
On the first two lines we select the crsB cursor and create an index on it. The index expression contains both the Id and Ord fields. VFP doesn't support multiple column names in an index key, but it supports expressions. By left-padding both fields with zeros to a width of 10 we create keys like:
Id, Ord: 2,3 as an example has a key 00000000020000000003
We could make it smaller, but since I don't know how big Id and Ord could get, I made each part 10 characters long to fit any 32-bit integer value.
Then on 3rd, 4th lines we are selecting cursor crsA and then setting relation from crsA into crsB via the expression Padl(Id,10,'0') - Id padded with 10 zeros. From crsA Id:1 has a relation key of 0000000001 then (matching all index keys that start with 0000000001 whatever the Ord part is - BTW having Ord in index too makes sure that they are ordered by Ord).
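As a rough illustration of the key construction (in Python rather than VFP; `link_key` is a hypothetical helper, and `str.rjust` only approximates PADL since it never truncates), zero-padding makes string order agree with numeric order, and the parent key is simply a prefix of every matching child key:

```python
def link_key(id_, ord_=None):
    """Build a zero-padded key: 10 chars for id, optionally 10 more for ord."""
    key = str(id_).rjust(10, "0")
    if ord_ is not None:
        key += str(ord_).rjust(10, "0")
    return key


# (id, ord) pairs in arbitrary order, standing in for rows of crsB.
crs_b = [(2, 3), (1, 2), (1, 1), (2, 1)]

# The "index": keys kept in sorted order, so rows group by id, then ord.
index = sorted(link_key(i, o) for i, o in crs_b)

# The "relation": a parent key for id 1 matches child keys by prefix.
matches = [k for k in index if k.startswith(link_key(1))]

print(link_key(2, 3))   # 00000000020000000003
print(matches)          # ['00000000010000000001', '00000000010000000002']
```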
In effect, when the record pointer points to Id:1 in crsA, in crsB automatically those with Id:1 are matched (best observed with a browse - browse crsB then select crsA and browse. As you navigate in crsA, you would see the browse window for crsB would show only the rows with matching Id). Conceptually it looks like this controlling the record pointer in both cursors:
crsA (id) crsB (Id, Ord)
1 ----+------- 1,1
+------- 1,2
+------- 1,3
+------- 1,4
2 ----+------- 2,1
+------- 2,2
+------- 2,3
I used that because it is a powerful feature of VFP and was an easier way to express what you want. The same could be achieved with SQL Update too; however, VFP's SQL is not that powerful, and it would be much more complex to write (for [1] it is easy, but for the > 1 case it gets complex. It was not so easy in other backends either in the distant past, but over time backends like PostgreSQL, MS SQL Server etc. have gained much more support for such queries).
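To check the arithmetic of the nested loop independently of VFP, here is a small Python sketch. It is an illustration only: the function `calculate` and the dictionary layout are assumptions, `Decimal` is used so the rounding is exact, and the sample rows are the id = 1 rows from the question.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_UP


def calculate(table_a, table_b):
    """For each parent id, walk its child rows in ord order, carrying the load."""
    rows_by_id = {}
    for row in sorted(table_b, key=lambda r: (r["id"], r["ord"])):
        rows_by_id.setdefault(row["id"], []).append(row)

    for parent in table_a:
        load = parent["balance"]          # running value, starts at the balance
        for row in rows_by_id.get(parent["id"], []):
            first = load * row["fact"]
            if row["type"] != 0:
                first = first / (1 - row["fact"])
            if first > 0:                 # rounddown(x, 1): toward zero
                rounded = first.quantize(Decimal("0.1"), rounding=ROUND_DOWN)
            else:                         # roundup(x, 2): away from zero
                rounded = first.quantize(Decimal("0.01"), rounding=ROUND_UP)
            row["first_value"], row["rounded_value"] = first, rounded
            load += rounded
        parent["load"] = load             # last load of this id


D = Decimal
table_a = [{"id": 1, "balance": D("10")}]
table_b = [
    {"id": 1, "ord": 3, "fact": D("0.13"), "type": 1},
    {"id": 1, "ord": 1, "fact": D("0.09"), "type": 1},
    {"id": 1, "ord": 4, "fact": D("-0.05"), "type": 0},
    {"id": 1, "ord": 2, "fact": D("0.02"), "type": 0},
]
calculate(table_a, table_b)
print(table_a[0]["load"])   # 12.06
```

The rounded values come out as 0.9, 0.2, 1.6 and -0.64, so the running load goes 10.9, 11.1, 12.7, 12.06.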
Well you have a long question, containing multiple questions within. I will try to reply in pieces (editing my answer in between), since it would be a long answer (might even be good to divide into multiple answers).
First, your create table syntax was close but incorrect. VFP (it is VFP, not VFB, by the way) does not support spaces in field names (unless it is a long field name). Using field names with spaces would just be asking for trouble, so prefer not to use them. It would look like:
CREATE TABLE tableA;
( id int, ;
balance double, ;
load C(240), ;
state C(240))
CREATE TABLE tableB;
( loadId int, ;
id int, ;
ord int, ;
fact double, ;
type int, ;
firstValue C(240),;
roundedVal C(240), ;
state C(240))
Note that there is no comma after the final field, and ; in VFP means the command continues on the next line (so it is removed from the last field definition line). I also changed the field names to be compatible with a free table's field naming (max 10 in length, must start with a letter, no spaces). It is easier to use the tables this way.
If you want to use longfieldnames then you can do that just as you do with free tables but the table needs to be part of a database. It would also work for cursors provided you do that in one shot and do not attempt to alter the structure afterwards.
While I added code there to create TableA and TableB, you are saying those tables' data would come from Excel. You didn't really give detailed information about the Excel part of it (how is the data represented, as data ranges?). Most likely you can create these two tables simply by selecting the data from Excel directly using ODBC/OLEDB.
For getting data from Excel I posted some detailed information on Foxite, you can check the post in this link. I am not giving any sample code here as I don't yet know the Excel part really.
Assuming we got the data from Excel, let's check the other parts (BTW, in TableB id is called a Foreign Key, not a primary key. It links the rows in TableB to TableA).
1st value[i] = If(Type[i]=0, balance[i]*fact[i], balance[i]*fact[i]/(1-fact[i]))
We can use either the REPLACE command (an xBase command) or the SQL Update command to accomplish this. Let's not dwell on the differences here (not really worth it) and choose SQL Update to do the job (the syntax would be reusable in other databases too, say MS SQL Server, PostgreSQL, MySQL ...).
Update tableB ;
set firstValue = iif( type = 0, ;
tableA.balance * fact, ;
tableA.balance * fact/(1-fact)) ;
from tableA ;
where tableA.Id = tableB.Id
Or slightly simplified:
Update tableB ;
set firstValue = tableA.balance * fact / ;
iif( type = 0, 1, (1-fact)) ;
from tableA ;
where tableA.Id = tableB.Id
Note that VFP would execute this expression per row so we don't need the [i] (array identifier) that you have in your pseudocode.
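As a cross-check that the two UPDATE forms above compute the same thing, here is a minimal Python sketch of the per-row arithmetic. The function names are hypothetical and the sample numbers are made up for illustration; it mirrors only the formula, not how VFP executes the statement.

```python
def first_value(balance, fact, type_):
    """Per-row formula as written in the first UPDATE."""
    if type_ == 0:
        return balance * fact
    return balance * fact / (1 - fact)


def first_value_simplified(balance, fact, type_):
    """Same formula, dividing by 1 when type is 0 (second UPDATE)."""
    divisor = 1 if type_ == 0 else (1 - fact)
    return balance * fact / divisor


if __name__ == "__main__":
    # Illustrative inputs only: balance 100, fact 0.5, for both type values.
    for t in (0, 1):
        print(t, first_value(100, 0.5, t), first_value_simplified(100, 0.5, t))
```

Both functions return the same value for each row, which is why the shorter UPDATE can replace the longer one.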
Next one:
rounded value[i] = If(Type[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2))
Would be translated in the same manner:
Update tableB ;
set roundedVal = iif(type > 0, ;
rounddown(firstValue,1), ;
roundup(firstValue,2)) ;
from tableA ;
where tableA.Id = tableB.Id
However, VFP doesn't have roundup and rounddown functions; I only wrote these as a conceptual translation. What you can do is create two custom functions that do RoundUp and RoundDown. There are multiple ways to write these functions, and IMHO the easiest is to write them as 2 separate .prg files that are in your search path when you execute the above SQL command:
RoundUp.prg
Lparameters tnValue, tnPlaces
If Round(m.tnValue, m.tnPlaces) = m.tnValue
Return m.tnValue
Else
Return Round(m.tnValue+((10^-(m.tnPlaces+1))*5), m.tnPlaces)
Endif
RoundDown.prg
Lparameters tnValue, tnPlaces
If Round(m.tnValue, m.tnPlaces) = m.tnValue
Return m.tnValue
Else
Return Round(m.tnValue-((10^-(m.tnPlaces+1))*5), m.tnPlaces)
Endif
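To sanity-check what the two .prg functions are meant to return, here is a small Python sketch that rounds toward +infinity and toward -infinity at a given number of decimal places. It is not a translation of the VFP code: it uses Python's decimal module instead of the "add half a unit, then Round()" trick, and the names round_up / round_down are hypothetical. Note that Excel's ROUNDUP and ROUNDDOWN round away from and toward zero, so they agree with this sketch for positive values only.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR


def _unit(places):
    """Smallest step at the given number of decimals, e.g. 0.1 for places=1."""
    return Decimal(1).scaleb(-places)


def round_up(value, places):
    """Round toward +infinity; values already at that precision are unchanged."""
    return Decimal(str(value)).quantize(_unit(places), rounding=ROUND_CEILING)


def round_down(value, places):
    """Round toward -infinity; values already at that precision are unchanged."""
    return Decimal(str(value)).quantize(_unit(places), rounding=ROUND_FLOOR)


if __name__ == "__main__":
    print(round_up("1.21", 1), round_down("1.29", 1))
```

Passing the value as a string (or a Decimal) avoids binary floating-point surprises at the rounding boundary.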
The functions in the link you provided don't seem right to me for the job (but they were not easy to understand and test, so I didn't spend time checking them thoroughly).
I am not sure if one sheet containing both tables is good. I don't remember off the top of my head, if Tables collection was a member of the WorkSheet or WorkBook. If WorkSheet then that would do. I can check and write sample code for that later (possibly tomorrow).
You could use datatype LOGICAL (L) for Type. In MS SQL Server and other backends it corresponds to bit (1 or 0). Internally it is stored as a boolean, but in expressions it is used as .T./.F. (the symbolic true/false representation in VFP). In code you could simply use it as:
iif( type, ...
same as saying iif(type = .T., ...) - as in Type > 0. And:
iif( !type, ...
same as saying iif( type = .F., ...) or iif( type NOT equal to .T., ... - as in Type = 0.
I didn't use inner join in this case, because it is sufficient to use a from TableA where here (same in other backends, although general tendency is to write that using join).
EDIT: Added the code as another answer.
As per your questions:
An inner join does not need to be defined explicitly; there is an implicit join there. Instead of writing an SQL update, I preferred to utilize VFP's xBase capabilities and used scan...endscan instead (it could be done with SQL but would be more complex).
Yes, it means putting those 2 files, RoundUp.prg and RoundDown.prg, into the same directory as the main code file above, BUT that only works if the main code file is in the current directory or in the search path. To make it clearer, consider:
c:\SomeFolder\RoundUp.prg
c:\SomeFolder\RoundDown.prg
c:\ANOTHERFolder\Main.prg
and you are in:
c:\YetAnotherFolder
If you call main.prg like this:
do ('c:\ANOTHERFolder\Main.prg')
It needs to find RoundUp, RoundDown and it can if c:\Somefolder is included in SET('PATH') - ie:
Set path to c:\SomeFolder;c:\VFPHomeFolderMaybe
Or if you don't want to think of pathing you could include those RoundUp\Down code as procedure in the code (as I did in the code in the other answer - note that in VFP there is no difference between a PROCEDURE and a FUNCTION. You are free to choose either one. Some developers prefer to use FUNCTION for those that return a value - but in fact any PROCEDURE\FUNCTION returns a value so let's say those that are used for a return value.)
I don't think logical type mean "1" or "0" automatically, correct? If
that's the case, I would have to leave it as int type, because the
input is always defined as 1 or 0 for type column.
Well, that is hard to answer formally. In VFP the boolean data type is defined by the literals .F. and .T. You can cast(aBoolean as int) and you get 0 and 1 respectively, or you can cast(1 as logical) to get .T. IOW, 1\0 and .T.\.F. are interchangeable in a sense. It all depends on where you want to use it. If data is coming from an external source, it would come in as 1\0. Just by casting it or getting it into a column of datatype logical (an implicit cast) it is treated as .T.\.F. Or, if you are sending data from a logical column to an external source (say XML, MS SQL Server, PostgreSQL, or another OLEDB\ODBC data source), then .T.\.F. is cast as 1\0.

PL/SQL array manipulation function

I'm new in PL/SQL. I have a matrix stored in the DB as a nested table. Something like,
the matrix is stored as a TABLE of objects (and objects are t1 number, t2 number, ... t100 number)
To get the matrix it would be select x.* from test t, table(t.matrix) x where... , returning
|T1|T2|T3|...|T100|
I want to create a function that returns the sum over the row to be called using SQL only, something equivalent to
select sum(x.T1),sum(x.T2)...sum(x.T100) from test t, table(t.matrix) x where ...
Something like select bigsum(x.*) from test t, table(t.matrix) x
It will be called several times, and I don't want to write the 100 columns every time.
If you want to sum the values from 100 different columns, you're going to have to explicitly list those 100 columns at some point. You can encapsulate that logic for that expression in a view or a function or a pipelined table function or some other construct so that you don't have to repeat the expression many times, you just have to reference the abstraction you've created (i.e. call the function that sums the 100 values).
Although it would likely complicate the problem rather than simplifying it, you could potentially create a solution that uses dynamic SQL to generate the 100 columns names and the expression to add them together if you really, really want to avoid writing out 100 column names. It is highly unlikely, however, that the extra complexity of resorting to dynamic SQL would be beneficial unless there are substantial requirements that you haven't mentioned here that make writing out the column names more than a bit repetitive.
" it'll be called several times, and don't want to write the 100
columns every time"
Why not create a view? Write it once, call it as many times as you like:
create or replace view bigsum as
select t.whatever
, sum(x.T1) as sum_t1
, sum(x.T2) as sum_t2
...
, sum(x.T100) as sum_t100
from test t
, table(t.matrix) x
group by t.whatever
You would need to include identifying columns from TEST to allow you to join the view to other tables. This approach would give you something close to what you want:
select *
from bigsum
where whatever = 23
You can reduce the amount of typing further by processing a result set from the data dictionary view USER_TYPE_ATTRS (or a SQL*Plus description) in a decent text editor with a regex search'n'replace.
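As an alternative to the text-editor route, the repetitive part of the view can be generated by a few lines of script. The sketch below is in Python purely for illustration; the function name build_bigsum_view is hypothetical, and it assumes the columns really are named T1..T100 and that the identifying column is called whatever, as in the view above. It only prints the DDL text; you would still review it and run it yourself.

```python
def build_bigsum_view(n_columns=100, key="whatever"):
    """Return the text of the bigsum view with one sum() per matrix column."""
    sums = [f"sum(x.T{i}) as sum_t{i}" for i in range(1, n_columns + 1)]
    lines = ["create or replace view bigsum as", f"select t.{key}"]
    lines += [f"     , {s}" for s in sums]
    lines += [
        "  from test t",
        "     , table(t.matrix) x",
        f" group by t.{key}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_bigsum_view())
```

The column list is written once, by the script, and every later query simply selects from bigsum.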
You can create a function in the form given below, depending on your condition. If you require parameters, you can add them when creating the function and use them in the condition:
create or replace function bigsum
return number
as
sumall number;
begin
select sum(x.T1) + sum(x.T2) + ... + sum(x.T100) into sumall
from test t, table(t.matrix) x where .(your condition).. ;
return sumall;
end;
/
and call it in the manner
select bigsum from dual;

UPDATE on INSERT duplicate primary key in Oracle?

I have a simple INSERT query where I need to use UPDATE instead when the primary key is a duplicate. In MySQL this seems easier, in Oracle it seems I need to use MERGE.
All examples I could find of MERGE had some sort of "source" and "target" tables, in my case, the source and target is the same table. I was not able to make sense of the examples to create my own query.
Is MERGE the only way or maybe there's a better solution?
INSERT INTO movie_ratings
VALUES (1, 3, 5)
It's basically this and the primary key is the first 2 values, so an update would be like this:
UPDATE movie_ratings
SET rating = 8
WHERE mid = 1 AND aid = 3
I thought of using a trigger that would automatically execute the UPDATE statement when the INSERT was called but only if the primary key is a duplicate. Is there any problem doing it this way? I need some help with triggers though as I'm having some difficulty trying to understand them and doing my own.
MERGE is the 'do INSERT or UPDATE as appropriate' statement in Standard SQL, and probably therefore in Oracle SQL too.
Yes, you need a 'table' to merge from, but you can almost certainly create that table on the fly:
MERGE INTO Movie_Ratings M
USING (SELECT 1 AS mid, 3 AS aid, 8 AS rating FROM dual) N
ON (M.mid = N.mid AND M.aid = N.aid)
WHEN MATCHED THEN UPDATE SET M.rating = N.rating
WHEN NOT MATCHED THEN INSERT( mid, aid, rating)
VALUES(N.mid, N.aid, N.rating);
(Syntax not verified.)
A typical way of doing this is one of the following:
performing the INSERT, catching DUP_VAL_ON_INDEX, and then performing an UPDATE instead
performing the UPDATE first and, if SQL%ROWCOUNT = 0, performing an INSERT
You can't write a row-level trigger on a table that performs another operation on the same table. That causes an Oracle error (mutating table, ORA-04091).
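The two patterns listed above can be sketched as follows. This is an illustration only, written in Python against an in-memory SQLite database as a stand-in for Oracle: cursor.rowcount plays the role of SQL%ROWCOUNT and sqlite3.IntegrityError the role of DUP_VAL_ON_INDEX. The helper names are hypothetical; the table and column names follow the question.

```python
import sqlite3


def make_table():
    """Create an in-memory movie_ratings table keyed on (mid, aid)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE movie_ratings ("
        "mid INTEGER, aid INTEGER, rating INTEGER, PRIMARY KEY (mid, aid))"
    )
    return conn


def upsert_update_first(conn, mid, aid, rating):
    """UPDATE first; INSERT only when no row was touched."""
    cur = conn.execute(
        "UPDATE movie_ratings SET rating = ? WHERE mid = ? AND aid = ?",
        (rating, mid, aid),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO movie_ratings (mid, aid, rating) VALUES (?, ?, ?)",
            (mid, aid, rating),
        )


def upsert_insert_first(conn, mid, aid, rating):
    """INSERT first; fall back to UPDATE on a duplicate-key error."""
    try:
        conn.execute(
            "INSERT INTO movie_ratings (mid, aid, rating) VALUES (?, ?, ?)",
            (mid, aid, rating),
        )
    except sqlite3.IntegrityError:
        conn.execute(
            "UPDATE movie_ratings SET rating = ? WHERE mid = ? AND aid = ?",
            (rating, mid, aid),
        )


if __name__ == "__main__":
    conn = make_table()
    upsert_update_first(conn, 1, 3, 5)  # no row yet, so this inserts
    upsert_insert_first(conn, 1, 3, 8)  # duplicate key, so this updates
    print(conn.execute("SELECT * FROM movie_ratings").fetchall())
```

Which variant is cheaper depends on whether inserts or updates are the common case: put the more likely operation first so the fallback rarely runs.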
I'm a T-SQL guy but a trigger in this case is not a good solution. Most triggers are not good solutions. In T-SQL, I would simply perform an IF EXISTS (SELECT * FROM dbo.Table WHERE ...) but in Oracle, you have to select the count...
DECLARE
cnt NUMBER;
BEGIN
SELECT COUNT(*)
INTO cnt
FROM mytable
WHERE id = 12345;
IF( cnt = 0 )
THEN
...
ELSE
...
END IF;
END;
It would appear that MERGE is what you need in this case:
MERGE INTO movie_ratings mr
USING (
SELECT 1 AS mid, 3 AS aid, 8 AS rating
FROM dual) mri
ON (mr.mid = mri.mid AND mr.aid = mri.aid)
WHEN MATCHED THEN
UPDATE SET mr.rating = mri.rating
WHEN NOT MATCHED THEN
INSERT (mid, aid, rating)
VALUES (mri.mid, mri.aid, mri.rating)
Like I said, I'm a T-SQL guy, but the basic idea here is to "join" the movie_ratings table against a one-row source holding the new values. If there's no performance hit on using the "if exists" example, I'd use it for readability.

Resources