Data Calculation for joining two tables - visual-foxpro

I am studying Foxpro to create a simple application for manipulating data from two tables A and B (size of tableB >> size of tableA). The data from an Excel spreadsheet is imported into these two tables.
tableA
id balance load state
1 10 null l
2 22 null l
3 31 null l
tableB
Load id id ord fact type 1st value rounded value state
1 1 1 0.09 1 null null l
2 1 2 0.02 0 null null l
3 1 3 0.13 1 null null l
4 1 4 -0.05 0 null null l
5 2 1 0.01 1 null null l
6 2 2 0.092 1 null null l
7 2 3 0.03 0 null null l
8 3 1 0.14 1 null null l
9 3 2 0.12 0 null null l
10 3 3 -0.02 0 null null l
My friend wants me to write a Foxpro code to do the following things: first, create empty tableA and tableB containing the columns shown above. Each columns will be loaded by (hundreds of thousands) of data from an excel spreadsheet everyday. Second, for each unique id, the code updates the 3 columns 1st value, rounded value and load with given formulas:
1st value[i] = If(Type[i]=0, load[i-1]*fact[i], load[i-1]*fact[i]/(1-fact[i]))
1st value[1] = If(Type[1]=0, balance[1]*fact[1], balance[1]*fact[1]/(1-fact[1]))
rounded value[i] = If(1st value[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2)
load[i+1] = load[i] + rounded value[i+1] (i >= 1)
load[1] = balance[1] + rounded value[1]
I think I have to create a table like the following to store the calculation above for this step:
Calculation Table
balance id ord 1st value rounded value load
10 1 1 0.989 0.90 10.9 (= 10 + 0.9)
10.9 1 2 0.218 0.20 11.1 (= 10.9 + 0.2)
11.1 1 3 1.658 1.60 12.7 (= 11.1 + 1.6)
11.06 1 4 -0.635 -0.64 11.06 (=12.7 + (-0.64))
Desired output
Using results in Calculation Table, we update the original tableA and tableB as follows:
tableB
Load id id ord 1st value rounded value state
1 1 1 0.989 0.90 calculated
2 1 2 0.218 0.20 calculated
3 1 3 1.658 1.60 calculated
4 1 4 -0.635 -0.64 calculated
5 2 1 ... .... calculated
6 2 2 ... .... calculated
tableA (Note: for each value in `load id`, the `load` column only stores the **last** value in the `calculation` table which corresponds to maximum `ord`)
id balance load state
1 10 9.5 calculated
2 22 ... calculated
3 31 ... calculated
Can anyone please help me with the syntax for creating tableB, computing and store results for columns 1st value, rounded value and load into a calculation table with Inner Join function on id column between tableA and tableB , and update tableB?
My attempt:
First step (Creating two tables A and B with column fields shown above)
CREATE TABLE tableA;
( id int, ;
balance double, ;
load C(240), ;
state C(240), ;)
CREATE TABLE tableB;
( Load id int, ;
id int, ;
ord int, ;
fact double, ;
type binary (not sure....) ;
1st value C(240),;
rounded value C(240), ;
state C(240), ;)

(adding as another answer just because others got too long to read)
can you try your code with this dataset
(drive.google.com/open?id=1uCWwt5ubd2_F8w2gsh3v4VDpibWz7PAz) to see if
you will get the two output tables from your code, each similar to the
one shown in the previous Excel worksheet I uploaded for you?
I downloaded that spreadsheet and here is what I needed to change:
Your ranges were C8:F35 and H8:O62 for tableA and B. Also your "balance" was named "base". New code (downloaded to d:\temp\workbook2.xlsx) edited to match ranges and "balance" to "base":
* Get the data from given excel filename and ranges
* first range is tableA, second one is tableB
GetDataFromExcel("d:\temp\WorkBook2.xlsx", "Sheet1$C8:F35", "Sheet1$H8:O62")
* Now data is in cursors csrA and crsB do the calculation in these
DoCalculation()
* Done. Show the results selecting and browsing the crsA and B
Select crsA
Browse
Select crsB
Browse
* Get specific fields only from crsB
Select loadId, id, ord, firstVal, roundedVal, state ;
from crsB ;
into cursor crsBCustom ;
nofilter
browse
* Check data from both cursors (join)
* I chose the fields as I see fit
* ta and tb are local aliases for crsA and crsB
* helping to write shorter SQL in this case
Select tb.LoadId, tb.Id, ta.base, ta.load, ;
tb.firstValue, tb.roundVal, ;
ta.State as StateA, tb.State as StateB ;
from crsA ta ;
inner join crsB tb on ta.Id = tb.Id ;
order by tb.Id, tb.Ord ;
into cursor crsBoth ;
NoFilter
browse
* Does the specific calculations on specific data
Procedure DoCalculation
*1st value[1] = If(Type[1]=0, Base[1]*fact[1], Base[1]*fact[1]/(1-fact[1]))
*rounded value[i] = If(1st value[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2)
*rounded value[1] = If(1st value[1]>0, rounddown(1st value[1], 1), roundup(1st value[1],2)
*load[1] = Base[1] + rounded value[1]
* i > 1 - ord > 1
*1st value[i] = If(Type[i]=0, load[i-1]*fact[i], load[i-1]*fact[i]/(1-fact[i]))
*rounded value[i] = If(1st value[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2)
*load[i+1] = load[i] + rounded value[i+1] (i >= 1)
*declare local variable
Local lnBase
* select crsB and create an index there
Select CrsB
Index On Padl(Id,10,'0')+Padl(ord,10,'0') Tag ALinkB
* select crsA as parent and link to crsB
* using the "id" part of index
Select crsA
Set Relation To Padl(Id,10,'0') Into CrsB
* start looping the rows
Scan
* working with a new Id (1, 2, ...)
* save base value to m.lnBase
lnBase = crsA.Base
* select crsB and start looping the rows there
* because of the index in effect and the relation created
* pointer would be on the first crsB row with a matching Id
* and since Ord is also part of the index the first row of
* given Id
* Limit the looping in crsB (child table) to Id in crsA
* using WHILE clause
Select CrsB
Scan While Id = crsA.Id
* do replacing starting on first row of this Id (Ord=1)
* we don't have any scope clauses in replace, thus
* we are doing "single row" updates
Replace ;
firstValue With m.lnBase*fact / Iif(!Type, 1, 1-fact), ;
roundVal With Iif(firstValue > 0, ;
roundDown(firstValue,1), ;
roundUp(firstValue, 2))
* after each replace update m.lnBase value
* to use in next row
lnBase = m.lnBase + CrsB.roundVal
Endscan
* completed updating crsB
* select crsA and also update crsA.base with final 'load' value
Select crsA
Replace Load With m.lnBase
Endscan
* Update state to 'Calculated'
Update crsA set state = 'Calculated'
Update crsB set state = 'Calculated'
Endproc
* Get data from excel with given filename and ranges
* This code is not generic and expects the
* data to be in a specific format.
* Does not do any error check
Procedure GetDataFromExcel(tcExcelFileName, tcTableARange, tcTableBRange)
* declare and define the connection string to excel
Local lcConStr
lcConStr = ;
'Provider=Microsoft.ACE.OLEDB.12.0;'+;
'Data Source='+Fullpath(m.tcExcelFileName)+';'+;
'Extended Properties="Excel 12.0;HDR=Yes"'
* Declare and define the 2 SQL needed to get data for A and B
* rename the fields in SQL for easier handling
Local lcSQLA, lcSQLB
TEXT to lcSQLA textmerge noshow
Select [id], [base], [load], [state]
from [<< m.tcTableARange >>]
ENDTEXT
TEXT to m.lcSQLB textmerge noshow
select
[Load Id] as LoadId,
[Id], [Ord], [Fact], [Type],
[1st value] as firstValue,
[Rounded value] as roundVal,
[State]
from [<< m.tcTableBRange >>]
ENDTEXT
* Execute the queries and place results in given cursors
ADOQuery(m.lcConStr, m.lcSQLA, "crsTableA")
ADOQuery(m.lcConStr, m.lcSQLB, "crsTableB")
* Sanitize the cursors a bit
* (OledB query would assign rather generic datatypes)
Select Cast(Id As Int) As Id, Cast(Base As Double) As Base, ;
Cast(Load As Double) As Load, Cast(State As c(50)) As State ;
from crsTableA ;
into Cursor crsA ;
readwrite
Select Cast(LoadId As Int) As LoadId, ;
Cast(Id As Int) As Id, Cast(ord As Int) As ord, ;
Cast(fact As Double) As fact, Cast(Type As logical) As Type, ;
Cast(firstValue As Double) As firstValue, ;
Cast(roundVal As Double) As roundVal, ;
Cast(State As c(50)) As State From crsTableB ;
into Cursor CrsB ;
readwrite
Use In (Select('crsTableA'))
Use In (Select('crsTableB'))
Endproc
* roundUp and down custom functions
* RoundUp and Down excel style
* Not correct math wise IMHO
Procedure roundUp(tnValue, tnPlaces)
Local lnResult, lnValue
lnValue = Abs(m.tnValue)
If Round(m.lnValue, m.tnPlaces) != m.lnValue
lnValue = Round(m.lnValue+((10^-(m.tnPlaces+1))*5), m.tnPlaces)
Endif
Return Sign(m.tnValue) * m.lnValue
Endproc
Procedure roundDown(tnValue, tnPlaces)
Local lnResult, lnValue
lnValue = Abs(m.tnValue)
If Round(m.lnValue, m.tnPlaces) != m.lnValue
lnValue = Round(m.lnValue-((10^-(m.tnPlaces+1))*5), m.tnPlaces)
Endif
Return Sign(m.tnValue) * m.lnValue
Endproc
* Generic function to query a given data source
* and place results in a cursor
Procedure ADOQuery(tcConStr,tcQuery,tcCursorName)
Local oConn As 'ADODB.Connection'
Local oRS As ADODB.RecordSet
oConn = Createobject('ADODB.Connection')
oConn.Mode= 1 && adModeRead
oConn.Open( m.tcConStr )
oRS = oConn.Execute(m.tcQuery)
RS2Cursor(oRS,m.tcCursorName)
oRS.Close
oConn.Close
Endproc
* Helper function to ADOQuery to convert
* an ADODB.Recordset to a VFP cursor
Procedure RS2Cursor(toRS, tcCursorName) && simple single cursor - not intended for complex ones
tcCursorName = Iif(Empty(m.tcCursorName),'ADORs',m.tcCursorName)
Local xDOM As 'MSXML.DOMDocument'
xDOM = Createobject('MSXML.DOMDocument')
toRS.Save(xDOM, 1)
Xmltocursor(xDOM.XML, m.tcCursorName)
Endproc
This is the whole code. Just changing the filepath and name to yours, select all the code, right click and execute selection to see results. Or save it as a prg, say ImportMyExcel.prg and run it:
ImportMyExcel()
You could see the results I have so I didn't upload any results.
Also, is Procedure RS2Cursor(toRS, tcCursorName) intended to generate
the 2 output tables? Why do we need this procedure though: Procedure
ADOQuery(tcConStr,tcQuery,tcCursorName)?
Well those procedures are a little tricky for a newcomer (maybe not). I think you should know the history of VFP, cursors, cursor adapters, converting ADO recordset to a cursor etc (probably advanced level). I don't know, those were the procedures I came up with and published also on the foxite link that I gave to you. Just think they are black boxed (like a built-in one) functions doing they are work. ADOQuery's work is to simply query an OLEDB source and return the result as a cursor. With a cursorAdapter you might not need such a procedure but that procedure was designed before CursorAdapter existence.
Two more questions please: 1) where does the m come from in
m.lnBalance?
m. explicitly notifies the compiler that it is a memory variable. It is referred to as MDOT. There are developers who claim it is not needed and generally it leads to long running discussions (and likely you would find my name in those discussions). Up until today nobody could show and\or demonstrate me why we shouldn't or we don't need to use it. If you believe me it is not a preference but a thing that you should use.
2) Don't we need to define crsTableA? Or you meant we can use the
CREATE Table tableA in your previous code to make crsTableA valid?
No. There is no table in that code. We read the data from excel into a cursor (crsTableA and crsTableB initially) and then sanitize into 2 cursors crsA and crsB. All of them are cursors. Cursors are like tables but are not persisted on disk. They may even spend all their life in memory and are gone when you close them. Here I preferred cursors because without harming any real data you could run N times and check your results. When you are satisfied persisting the data is as simple as a "Select ... into" or "insert into ..." (there are more ways too) a table. Even in the case of a table you don't need to use "Create Table ...". A "select Into ..." command can select the data from a source and save it to a table by creating it (like a combined 'create table ...' and then 'insert into ...').
Also, I saw that B9:E12 does not match the range of tableA or tableB
in the Excel spreadsheet I uploaded for you before. Am I missing
something here?
It matched your original samples if you think data starts at B9 and G9 respectively.
I have another question: can you please clarify on what these lines
do: Select CrsB Index On Padl(Id,10,'0')+Padl(ord,10,'0') Tag
ALinkB Select crsA Set Relation To Padl(Id,10,'0') Into CrsB.
I think I explained this part in the previous question. I will soon comment the code itself.

Adding as another answer to prevent clutter. I can do further explanations if you need to. Here I used the Excel ranges that would match to sample data. You would replace the range with the actual one (as well as the excel filename):
GetDataFromExcel("c:\myFolder\myExcel.xlsx", "B9:E12", "G9:N19")
DoCalculation()
Select crsA
Browse
Select crsB
Browse
Procedure DoCalculation
*1st value[1] = If(Type[1]=0, balance[1]*fact[1], balance[1]*fact[1]/(1-fact[1]))
*rounded value[i] = If(1st value[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2)
*rounded value[1] = If(1st value[1]>0, rounddown(1st value[1], 1), roundup(1st value[1],2)
*load[1] = balance[1] + rounded value[1]
* i > 1 - ord > 1
*1st value[i] = If(Type[i]=0, load[i-1]*fact[i], load[i-1]*fact[i]/(1-fact[i]))
*rounded value[i] = If(1st value[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2)
*load[i+1] = load[i] + rounded value[i+1] (i >= 1)
Local lnBalance
Select CrsB
Index On Padl(Id,10,'0')+Padl(ord,10,'0') Tag ALinkB
Select crsA
Set Relation To Padl(Id,10,'0') Into CrsB
Scan
lnBalance = crsA.Balance
Select CrsB
Scan While Id = crsA.Id
Replace ;
firstValue With m.lnBalance*fact / Iif(!Type, 1, 1-fact), ;
roundVal With Iif(firstValue > 0, ;
roundDown(firstValue,1), ;
roundUp(firstValue, 2))
lnBalance = m.lnBalance + CrsB.roundVal
Endscan
Select crsA
Replace Load With m.lnBalance
Endscan
Endproc
Procedure GetDataFromExcel(tcExcelFileName, tcTableARange, tcTableBRange)
Local lcConStr
lcConStr = ;
'Provider=Microsoft.ACE.OLEDB.12.0;'+;
'Data Source='+Fullpath(m.tcExcelFileName)+';'+;
'Extended Properties="Excel 12.0;HDR=Yes"'
Local lcSQLA, lcSQLB
TEXT to lcSQLA textmerge noshow
Select [id], [balance], [load], [state]
from [Sheet1$<< m.tcTableARange >>]
ENDTEXT
TEXT to m.lcSQLB textmerge noshow
select
[Load Id] as LoadId,
[Id], [Ord], [Fact], [Type],
[1st value] as firstValue,
[Rounded value] as roundVal,
[State]
from [Sheet1$<< m.tcTableBRange >>]
ENDTEXT
ADOQuery(m.lcConStr, m.lcSQLA, "crsTableA")
ADOQuery(m.lcConStr, m.lcSQLB, "crsTableB")
Select Cast(Id As Int) As Id, Cast(Balance As Double) As Balance, ;
Cast(Load As Double) As Load, Cast(State As c(1)) As State ;
from crsTableA ;
into Cursor crsA ;
readwrite
Select Cast(LoadId As Int) As LoadId, ;
Cast(Id As Int) As Id, Cast(ord As Int) As ord, ;
Cast(fact As Double) As fact, Cast(Type As logical) As Type, ;
Cast(firstValue As Double) As firstValue, ;
Cast(roundVal As Double) As roundVal, ;
Cast(State As c(1)) As State From crsTableB ;
into Cursor CrsB ;
readwrite
Use In (Select('crsTableA'))
Use In (Select('crsTableB'))
Endproc
Procedure roundUp(tnValue, tnPlaces)
If Round(m.tnValue, m.tnPlaces) = m.tnValue
Return m.tnValue
Else
Return Round(m.tnValue+((10^-(m.tnPlaces+1))*5), m.tnPlaces)
Endif
Endproc
Procedure roundDown(tnValue, tnPlaces)
If Round(m.tnValue, m.tnPlaces) = m.tnValue
Return m.tnValue
Else
Return Round(m.tnValue-((10^-(m.tnPlaces+1))*5), m.tnPlaces)
Endif
Endproc
Procedure ADOQuery(tcConStr,tcQuery,tcCursorName)
Local oConn As 'ADODB.Connection'
Local oRS As ADODB.RecordSet
oConn = Createobject('ADODB.Connection')
oConn.Mode= 1 && adModeRead
oConn.Open( m.tcConStr )
oRS = oConn.Execute(m.tcQuery)
RS2Cursor(oRS,m.tcCursorName)
oRS.Close
oConn.Close
Endproc
Procedure RS2Cursor(toRS, tcCursorName) && simple single cursor - not intended for complex ones
tcCursorName = Iif(Empty(m.tcCursorName),'ADORs',m.tcCursorName)
Local xDOM As 'MSXML.DOMDocument'
xDOM = Createobject('MSXML.DOMDocument')
toRS.Save(xDOM, 1)
Xmltocursor(xDOM.XML, m.tcCursorName)
Endproc
EDIT: I edited the other answer for the comments beneath it. Now for your questions:
Shouldn't GetDataFromExcel("c:\myFolder\myExcel.xlsx", "B9:E12", "G9:N19") get called after the Procedure Procedure
GetDataFromExcel(tcExcelFileName, tcTableARange, tcTableBRange)??
No. Procedures are always placed after normal execution code in a prg file. IOW if your PRG has:
Do Something
* ...
Procedure SomeProcedure
* ...
endproc
Procedure Something
endproc
Code starts with calling Something and executes the lines after that up until it sees the first Procedure call (or FUNCTION, DEFINE CLASS). Something might be a procedure (as in the sample) or a separate prg.
Shouldn't Procedure roundUp and Procedure roundDown get called before roundDown(firstValue,1), ; roundUp(firstValue, 2))??
No, same as the above. What you say more looks like the rules of core C.
Does the left ID on this line Scan While Id = crsA.Id come from CrsB?? Also, why is there the change from crsA to CrsA? Is this a
typo? – user177196 5 mins ago
Yes. it comes from crsB. But in a sense, you are right I should be explicit and include the alias there as:
Scan while crsB.Id = crsA.Id
In VFP if you don't include an alias, then the one that is current is assumed.
We are scanning crsA in outer loop. Then we are switching to crsB and scanning there, after we are done switching back to crsA (actually scan command remembers the alias it is associated and does this switch when it hits endscan implicitly but I prefer to be explicit).
EDIT:
Select CrsB
Index On Padl(Id,10,'0')+Padl(ord,10,'0') Tag ALinkB
Select crsA
Set Relation To Padl(Id,10,'0') Into CrsB
On first two lines we are selecting crsB cursor and creating an index on it. Index expression contains both the Id and Old fields. VFP doesn't support multiple column names in an index key, but it supports expressions. Padding both fields with 10 zeros we are creating keys like:
Id, Ord: 2,3 as an example has a key 00000000020000000003
We could make it smaller but anyway since not knowing how much big the Id,Ord could be made it 10 in length to fit any 32 bits integer value.
Then on 3rd, 4th lines we are selecting cursor crsA and then setting relation from crsA into crsB via the expression Padl(Id,10,'0') - Id padded with 10 zeros. From crsA Id:1 has a relation key of 0000000001 then (matching all index keys that start with 0000000001 whatever the Ord part is - BTW having Ord in index too makes sure that they are ordered by Ord).
In effect, when the record pointer points to Id:1 in crsA, in crsB automatically those with Id:1 are matched (best observed with a browse - browse crsB then select crsA and browse. As you navigate in crsA, you would see the browse window for crsB would show only the rows with matching Id). Conceptually it looks like this controlling the record pointer in both cursors:
crsA (id) crsB (Id, Ord)
1 ----+------- 1,1
+------- 1,2
+------- 1,3
+------- 1,4
2 ----+------- 2,1
+------- 2,2
+------- 2,3
I used that because it is a powerful feature of VFP was an easier way to express what you want. The same could be achieved by using SQL Update too, however, VFP's SQL is not that much powerful and would be much more complex to write (For [1] easy but for > 1 case it gets complex - it was also not so easy in other backends too in distant past but in time, backends like postgreSQL, MS SQL server ... etc have gained much more support for such queries).

Well you have a long question, containing multiple questions within. I will try to reply in pieces (editing my answer in between), since it would be a long answer (might even be good to divide into multiple answers).
First, your create table syntax was close but incorrect. VFP (it is not VFB but V FP by the way), does not support spaces in field names (unless it is a long fieldname). Using field names with spaces would just be asking for trouble. So prefer not using them. It would look like:
CREATE TABLE tableA;
( id int, ;
balance double, ;
load C(240), ;
state C(240))
CREATE TABLE tableB;
( Load id int, ;
id int, ;
ord int, ;
fact double, ;
type int ;
firstValue C(240),;
roundedVal C(240), ;
state C(240))
Note that after final field you don't have comma and ; in VFP means continue the command on next line (so removed in last field definition lines). I also changed the 2 field names to be compatible with a free table's field naming (max 10 in length and must start with a letter, no spaces). Easier to use the tables this way.or cursors provided you do it in one shot and do not try to change the structure later.
If you want to use longfieldnames then you can do that just as you do with free tables but the table needs to be part of a database. It would also work for cursors provided you do that in one shot and do not attempt to alter the structure afterwards.
While I added code there to create TableA, TableB, you are saying those tables' data would come from Excel. You didn't really give detailed information about the Excel part of it (how data is represented-is that as a data ranges?). There is a great probability that you create these two tables simply by selecting the data from Excel using ODBC/OLEDB directly.
For getting data from Excel I posted some detailed information on Foxite, you can check the post in this link. I am not giving any sample code here as I don't yet know the Excel part really.
Assuming we got the data from Excel let's check other parts (BTW in table B id is called a Foreign Key, not primary. It links the rows in TableB top TableA).
1st value[i] = If(Type[i]=0, balance[i]*fact[i], balance[i]*fact[i]/(1-fact[i]))
We can use either REPLACE command (xBase command) or SQL Update command to accomplish this. Let's do not think about the differences here (not worth really) and choose SQL Update to do job (the syntax would be reusable in other databases too - say MS SQL server, postgreSQL, mySQL ...).
Update tableB ;
set firstValue = iif( type = 0, ;
tableA.balance * fact, ;
tableA.balance * fact/(1-fact)) ;
from tableA ;
where tableA.Id = tableB.Id
Or slightly simplified:
Update tableB ;
set firstValue = tableA.balance * fact / ;
iif( type = 0, 1, (1-fact)) ;
from tableA ;
where tableA.Id = tableB.Id
Note that VFP would execute this expression per row so we don't need the [i] (array identifier) that you have in your pseudocode.
Next one:
rounded value[i] = If(Type[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2)
Would be translated in the same manner:
Update tableB ;
set roundVal = iif(type > 0, ;
rounddown(firstValue,1), ;
roundup(firstValue,2)) ;
from tableA ;
where tableA.Id = tableB.Id
However, VFP doesn't have roundup and rounddown functions, I only wrote these as a conceptual translation. What you can do is to create two custom functions that does RoundUp and RoundDown. There are multiple ways to write these functions and IMHO the easiest would be to write them as 2 separate .prg files where those prg files are in your search path when you execute the above SQL command:
RoundUp.prg
Lparameters tnValue, tnPlaces
If Round(m.tnValue, m.tnPlaces) = m.tnValue
Return m.tnValue
Else
Return Round(m.tnValue+((10^-(m.tnPlaces+1))*5), m.tnPlaces)
Endif
RoundDown.prg
Lparameters tnValue, tnPlaces
If Round(m.tnValue, m.tnPlaces) = m.tnValue
Return m.tnValue
Else
Return Round(m.tnValue-((10^-(m.tnPlaces+1))*5), m.tnPlaces)
Endif
The functions in the link you provided doesn't seem right to me for the job (but was not easy to understand and test so didn't spend time on checking thoroughly).
I am not sure if one sheet containing both tables is good. I don't remember off the top of my head, if Tables collection was a member of the WorkSheet or WorkBook. If WorkSheet then that would do. I can check and write sample code for that later (possibly tomorrow).
You could use datatype LOGICAL (l) for Type. In MS SQL server and other backends it correspond to bit (1 or 0). Internally stored as boolean but in expressions used as .T./.F. (true\false symbolic representation in VFP. On code you could simply use it as:
iif( type, ...
same as saying iif(type = .T., ...) - as in Type > 0. And:
iif( !type, ...
same as saying iif( type = .F., ...) or iif( type NOT equal to .T., ... - as in Type = 0.
I didn't use inner join in this case, because it is sufficient to use a from TableA where here (same in other backends, although general tendency is to write that using join).
EDIT: Added the code as another answer.
As per your questions:
Inner join is not needed to be explicitly defined, there is an implicit join there. Instead of writing an SQL update, I preferred to utilize VFP's xBase capabilities and used scan...endscan instead (could do with SQL but would be more complex).
Yes it means putting those 2 RoundUp.prg and RoundDown.prg files into the same directory path of our main file code above BUT only if main file code is in current directory or in search path. To make it more clear, consider:
c:\SomeFolder\RoundUp.prg
c:\SomeFolder\RoundDown.prg
c:\ANOTHERFolder\Main.prg
and you are in:
c:\YetAnotherFolder
If you call main.prg like this:
do ('c:\ANOTHERFolder\Main.prg')
It needs to find RoundUp, RoundDown and it can if c:\Somefolder is included in SET('PATH') - ie:
Set path to c:\SomeFolder;c:\VFPHomeFolderMaybe
Or if you don't want to think of pathing you could include those RoundUp\Down code as procedure in the code (as I did in the code in the other answer - note that in VFP there is no difference between a PROCEDURE and a FUNCTION. You are free to choose either one. Some developers prefer to use FUNCTION for those that return a value - but in fact any PROCEDURE\FUNCTION returns a value so let's say those that are used for a return value.)
I don't think logical type mean "1" or "0" automatically, correct? If
that's the case, I would have to leave it as int type, because the
input is always defined as 1 or 0 for type column.
Well, that is hard to answer formally. In VFP boolean data
type is defined by literals .F. and .T. You can cast(aBoolean to int) and you get 0 and 1 respectively. Or you can cast(1 as logical) to get .T. IOW 1\0 and .T..F. are interchangeable in a sense. It all depends where you want to use it. If data is coming from external source, it would come in as 1\0. Just by casting or getting it into column of datatype logical (implicit cast) it is treated as .T..F. Or you are sending data from a logical to an external source (say an XML, MS SQL server, postgreSql, other OLEDB\ODBC datasource) then .T..F. is casted as 1\0.

Related

How to update a column based on count of another column from another table using any condition in visual foxpro?

SET STEP ON
Close Databases
Cd e:\ksv\Data
Use ohd IN 0 shared
Use cus IN 0 shared
SELECT * FROM cus inTO TABLE tempcus
ALTER table tempcus ADD COLUMN totalsold int
UPDATE tempcus SET totalsold=RECCOUNT(ohd.status='5') WHERE tempcus.customer=ohd.customer
SELECT * FROM tempcus INTO CURSOR cur
BROWSE
I have tried the above code and i am getting an error saying invalid table number , can someone help me with this.
RECCOUNT() function only gives you a record count for a workarea# or alias, e.g. RECCOUNT("ohd") will give total record count of ohd table.
You want something like:
SELECT COUNT(*) totalsold,cus.customer FROM cus JOIN ohd ON cus.customer=ohd.customer WHERE ohd.cstatus='5' INTO CURSOR cur GROUP BY cus.customer
BROWSE
In VFP, there is a REPLACE command which allows you to replace one or more fields based on whatever values, even if variable results from other queries... or fixed values. Ex: This works on whatever table is the current selected work area and whatever row it is on, unless you apply a scope clause (for condition).
Sample only for context of REPLACE command
use SomeOtherTable in 0 shared
select SomeOtherTable
replace SomeNumberField with 1.234, SomeStringField with 'Hello', etc...
or with condition (bogus, just to show you can apply to multiple rows.
replace SomeNumberField with SomeNumberField * 3 for StatusField = 'X'
Now, back to your original content. It appears you are trying to get a result temporary table with a total number of records from the OHD table where the status = 5. VFP allows you to run SQL-Select into temporary read-write "cursor" tables, that when closed will delete themselves, yet allows them to be modified (such as browse, or other direct manipulation such as with REPLACE command).
You can get the counts you are looking for with a left-join to a query result set. To help you see the pieces individually, I will do in steps so you can follow, then join into one final.
First, you want a count of all records in the OHD table with status = 5 per customer... the "o" and "c" are ALIAS references in the SQL queries below
SET STEP ON
Close Databases
Cd e:\ksv\Data
Use ohd IN 0 shared
Use cus IN 0 shared
select ;
o.customer, ;
count(*) NumberOfRecords ;
from ;
OHD o ;
where ;
o.status = '5' ;
group by ;
o.customer ;
into ;
cursor C_JustCountsPerCustomer READWRITE
The "into cursor" part above will create a workable table and give it the name of "C_JustCountsPerCustomer". I have always tried to use "C_" as a prefix to the table name for the sole purpose to know it is a temporary "CURSOR" result and not a real final table, but that is just my historical naming convention applied.
Now, if you did a browse of this result, you would see each customer's ID and how many with status = '5'. The resulting table "cursor" is like any other table opened and you could index as you need and browse, etc. But this only will give records that HAD status of '5'. But you could have more customers that never had a '5' status record.
Now, getting all your customers and their respective counts into one result table "cursor". I can take the above query and use within a SQL-Select via a LEFT-JOIN meaning, give me everything from the first table (left-side), regardless of a matching record found in the second table (right-side). But if there is a match to the right side, give me those values too.
select ;
c.*, ;
NVL( C_tmpResult.NumberOfRecords, 0000 ) as NumberOfRecords ;
from;
CUS c ;
LEFT JOIN ;
(select ;
o.customer, ;
count(*) NumberOfRecords ;
from ;
OHD o ;
where ;
o.status = '5' ;
group by ;
o.customer ) C_tmpResult ;
ON ;
c.customer = C_tmpResult.customer ;
into ;
cursor C_CusWithCounts readwrite
So, you can see the left-join uses the first query to get the counts, but the primary part of the query gets records from the customer table (alias "c") and is joined on the common customer id column. The "NVL()" states if there IS a value in the C_tmpResult table for the given customer, grab that. If not, assume a count of 0. Yes, I explicitly have 0000 to force a minimum final width to 4 digits in the result in case the first customer does not have any and it make the column only 1 digit wide.
Anyhow, at the end, you would have your result temporary table (cursor) with the customer information AND the count I think you are looking for. You should be able to do a browse and good to go.

Why a fast function becomes slow when inside ANY?

I have a function that returns an array of integers. If I call:
select getmun_gaz(12554)
it returns an 'integer[]' in 33 ms:
{1547,1564}
Exactly the same result I get with:
select array[1547,1564]
The same type (integer[]), the same values ({1547,1564}), only a slight time difference. So, why this works instantly:
select *
from loc
where id = any(array[1547,1564])
but this takes more than 2 minutes?
select *
from loc
where id = any(getmun_gaz(12554))
I will post the contents of the function and the structure of the tables if needed. The table from where this data comes is indeed large, but this made no difference in the beginning, so why it does in the end?
The function:
CREATE OR REPLACE FUNCTION getmun_gaz(gaz_id integer)
RETURNS integer[] AS
$BODY$
with recursive locpais as (
select l.id, l.nome, l.tipo tid, lt.tipo, lp.pai, 1 as profund
from loc l
left join locpai lp on lp.loc = l.id
left join loctipo lt on lt.id = l.tipo
where l.id = gaz_id
union
select l.id, l.nome, l.tipo tid, lt.tipo, lp.pai, profund+1
from loc l
left join locpai lp on lp.loc = l.id
left join loctipo lt on lt.id = l.tipo
join locpais p on (l.id = p.pai)
)
select array_agg(id) from locpais
where tid = 8
$BODY$
LANGUAGE sql VOLATILE
COST 100;
I'll add the table structures later, if needed. Gotta go now.
It's because of your function's volatility.
Declaring a function as VOLATILE tells Postgres that it has side-effects, and/or its value changes every time you call it (e.g. random()). This means that in your query, Postgres has to call it once for every single row in the loc table, and can't use an index on id to speed things up.
If your function only contains SELECTs, with no calls to other VOLATILE functions, then you can declare it as STABLE. This tells Postgres that the result will stay the same when called multiple times within a single query; the query planner can now call it once at the start and re-use the value for each id comparison, and can also optimimise away most of these comparisons by using an index.
The problem with second one is if function isnt IMMUTABLE db need to calculate his value for each row
At difference with any(array[1547,1564]) is a constant.

PL SQL Query taking too long to execute

Person table have 2 Million rows and 3 Million rows in Finance and on HR schema respectively. Now I'm going to update the Person status on Finance schema with the person_status on the HR schema. Query is running on M3000 Server with 32 GB RAM with Solaris 10 and Oracle 11.2.2. It took 117 hours and still runnning. How can I optimize this query. I have created index on person_no.
DECLARE
CURSOR c IS
SELECT p1.person_no AS "PersonNo1"
, p2.person_no AS "PersonNo2"
, p2.person_status AS "P2_PERSON_STATUS"
FROM person p1
, hr.person p2
WHERE (lower(p1.person_no) = lower(p2.person_no)
OR substr(p1.person_no, instr(p1.person_no, '-') + 1, length(p1.person_no))
= substr(p2.person_no, instr(p2.person_no, '-') + 1, length(p2.person_no)));
BEGIN
FOR i IN c LOOP
UPDATE person
SET person_status_id = decode(lower(i.P2_PERSON_STATUS), 'y', 1, 'n', 2)
WHERE (lower(person_no) = lower(i.PersonNo2)
OR substr(person_no, instr(person_no, '-') + 1, length(person_no))
= substr(i.PersonNo2, instr(i.PersonNo2, '-') + 1, length(i.PersonNo2)));
COMMIT;
END LOOP;
END;
/
There is several things you can do to optimize this. It would be helpful to see a query plan.
You can use a merge statement. Then you have only one statement and you can work optimizing this statement. Something like this
merge into person
using hr.person person_hr
on (person.person_no=person_hr.person_no)
when matched then
update set person_status_id=decode(lower(person_status),'y',1,'n',2);
You have to adjust the on part to match your where statement.
This might then be the source of best optimization. Maybe you have to create an index for lower and also for substr.
Something like
CREATE INDEX person_idx
ON person (lower(person_no))
Also of course for the substrings etc. Hope this helps.
In the code provided as question i can see there are two aspects.
1) Query Optimization
2) ROW BY ROW Update --> Never Recommended for such a huge records count..
Approach to Optmization.
1) Use Merge Statement as it will Bundle up the block into pure SQL thus will enhance your query.
2) Use Function based Index as i can see there are lot of functions like "LOWER" is used in the query "WHERE" conditions.
NOTE : [Over INDEXING may also cause deterioration in the performance}
3) Last but not the least if you have to go by Anonymous block only then avoid using ROW BY ROW Update. Try using Bulk collect Options as illustrated below.
Since i do not have workstation handy with me please bear with any Syntax Errors.
Let me know if this helps.
DECLARE
TYPE p1
IS
TABLE OF <table_name>.<COLUMN_NAME>%TYPE;
PersonNo1 p1;
TYPE p2
IS
TABLE OF <table_name>.<COLUMN_NAME>%TYPE;
PersonNo2 p2;
TYPE status
IS
TABLE OF <table_name>.<COLUMN_NAME>%TYPE;
P2_PERSON_STATUS status;
BEGIN
SELECT p1.person_no AS "PersonNo1" ,
p2.person_no AS "PersonNo2" ,
p2.person_status AS "P2_PERSON_STATUS" BULK COLLECT
INTO PersonNo1,
PersonNo2,
P2_PERSON_STATUS
FROM person p1 ,
hr.person p2
WHERE (lower(p1.person_no) = lower(p2.person_no)
OR SUBSTR(p1.person_no, instr(p1.person_no, '-') + 1, LENGTH(p1.person_no)) = SUBSTR(p2.person_no, instr(p2.person_no, '-') + 1, LENGTH(p2.person_no)));
FORALL I IN PersonNo1.FIRST..PersonNo1.LAST
UPDATE person
SET person_status_id = DECODE(lower(P2_PERSON_STATUS(I)), 'y', 1, 'n', 2)
WHERE (lower(person_no) = lower(PersonNo2(I))
OR SUBSTR(person_no, instr(person_no, '-') + 1, LENGTH(person_no)) = SUBSTR(PersonNo2(I), instr(PersonNo2(I), '-') + 1, LENGTH(PersonNo2(I))));
COMMIT;
END;

Using the data from a for loop as a table in the From in a Select

I am converting some code in Access over to Oracle, and one of the queries in Access uses a table that I am unable to use in Oracle. I am unable to create new tables, so I am trying to figure out a way to use the logic behind the table in the FOR section of my select.
The logic of the table is similar to:
FOR i = 1 To 100
number = number + 1
.AddNew
!tbl_number = number
NEXT i
I'm trying to convert this to oracle, and so far I have:
FOR i in 1 .. 100 LOOP
number := number + 1;
--This is where I am stuck; How do I simulate the table part
END LOOP;
I was thinking a cursor or a record would be the answer, but I can't seem to figure out how to implement that. In the end I basically want to have:
SELECT
table.number
FROM
(
--My for loop logic
) table
EDIT
The calculation is a bit more complicated; that was just an example. They aren't actually sequential, and there isn't really a pattern to rows.
EDIT
Here is a more complicated version of the for loop which is closer to what I'm actually doing:
FOR i in 1 .. 100 LOOP
number1 := number1 + 7;
number2 := (number2 + 8) / number1;
--This is where I am stuck; How do I simulate the table part
END LOOP;
You could use a recursive query (assuming you are on Oralce 11gR2 or later):
with example(idx, number1, number2) as (
-- Anchor Section
select 1
, 1 -- initial value
, 2 -- initial value
from dual
union all
-- Recursive Section
select prev.idx + 1
, prev.number1 + 7
, (prev.number2 + 8) / prev.number1
from example prev
where prev.idx < 100 -- The Guard
)
select * from example;
In the Anchor section set all the values for your first record. Then in the Recursive section setup the logic to determine the next records values as a function of the prior records values.
The Anchor section could select the initial values from some other table rather than being hard coded as in my example.
The recursive section needs to select from the named subquery (in this case example) but may also join to other tables as needed.
You need to generate a set with sequential integer numbers. Maybe you can use this (for Oracle 10g and above):
SELECT
ROWNUM NUM
FROM
DUAL D1,
DUAL D2
CONNECT BY
(D1.DUMMY = D2.DUMMY AND ROWNUM <= 100)

How to put more than 1000 values into an Oracle IN clause [duplicate]

This question already has answers here:
SQL IN Clause 1000 item limit
(5 answers)
Closed 8 years ago.
Is there any way to get around the Oracle 10g limitation of 1000 items in a static IN clause? I have a comma delimited list of many of IDs that I want to use in an IN clause, Sometimes this list can exceed 1000 items, at which point Oracle throws an error. The query is similar to this...
select * from table1 where ID in (1,2,3,4,...,1001,1002,...)
Put the values in a temporary table and then do a select where id in (select id from temptable)
select column_X, ... from my_table
where ('magic', column_X ) in (
('magic', 1),
('magic', 2),
('magic', 3),
('magic', 4),
...
('magic', 99999)
) ...
I am almost sure you can split values across multiple INs using OR:
select * from table1 where ID in (1,2,3,4,...,1000) or
ID in (1001,1002,...,2000)
You may try to use the following form:
select * from table1 where ID in (1,2,3,4,...,1000)
union all
select * from table1 where ID in (1001,1002,...)
Where do you get the list of ids from in the first place? Since they are IDs in your database, did they come from some previous query?
When I have seen this in the past it has been because:-
a reference table is missing and the correct way would be to add the new table, put an attribute on that table and join to it
a list of ids is extracted from the database, and then used in a subsequent SQL statement (perhaps later or on another server or whatever). In this case, the answer is to never extract it from the database. Either store in a temporary table or just write one query.
I think there may be better ways to rework this code that just getting this SQL statement to work. If you provide more details you might get some ideas.
Use ...from table(... :
create or replace type numbertype
as object
(nr number(20,10) )
/
create or replace type number_table
as table of numbertype
/
create or replace procedure tableselect
( p_numbers in number_table
, p_ref_result out sys_refcursor)
is
begin
open p_ref_result for
select *
from employees , (select /*+ cardinality(tab 10) */ tab.nr from table(p_numbers) tab) tbnrs
where id = tbnrs.nr;
end;
/
This is one of the rare cases where you need a hint, else Oracle will not use the index on column id. One of the advantages of this approach is that Oracle doesn't need to hard parse the query again and again. Using a temporary table is most of the times slower.
edit 1 simplified the procedure (thanks to jimmyorr) + example
create or replace procedure tableselect
( p_numbers in number_table
, p_ref_result out sys_refcursor)
is
begin
open p_ref_result for
select /*+ cardinality(tab 10) */ emp.*
from employees emp
, table(p_numbers) tab
where tab.nr = id;
end;
/
Example:
set serveroutput on
create table employees ( id number(10),name varchar2(100));
insert into employees values (3,'Raymond');
insert into employees values (4,'Hans');
commit;
declare
l_number number_table := number_table();
l_sys_refcursor sys_refcursor;
l_employee employees%rowtype;
begin
l_number.extend;
l_number(1) := numbertype(3);
l_number.extend;
l_number(2) := numbertype(4);
tableselect(l_number, l_sys_refcursor);
loop
fetch l_sys_refcursor into l_employee;
exit when l_sys_refcursor%notfound;
dbms_output.put_line(l_employee.name);
end loop;
close l_sys_refcursor;
end;
/
This will output:
Raymond
Hans
I wound up here looking for a solution as well.
Depending on the high-end number of items you need to query against, and assuming your items are unique, you could split your query into batches queries of 1000 items, and combine the results on your end instead (pseudocode here):
//remove dupes
items = items.RemoveDuplicates();
//how to break the items into 1000 item batches
batches = new batch list;
batch = new batch;
for (int i = 0; i < items.Count; i++)
{
if (batch.Count == 1000)
{
batches.Add(batch);
batch.Clear()
}
batch.Add(items[i]);
if (i == items.Count - 1)
{
//add the final batch (it has < 1000 items).
batches.Add(batch);
}
}
// now go query the db for each batch
results = new results;
foreach(batch in batches)
{
results.Add(query(batch));
}
This may be a good trade-off in the scenario where you don't typically have over 1000 items - as having over 1000 items would be your "high end" edge-case scenario. For example, in the event that you have 1500 items, two queries of (1000, 500) wouldn't be so bad. This also assumes that each query isn't particularly expensive in of its own right.
This wouldn't be appropriate if your typical number of expected items got to be much larger - say, in the 100000 range - requiring 100 queries. If so, then you should probably look more seriously into using the global temporary tables solution provided above as the most "correct" solution. Furthermore, if your items are not unique, you would need to resolve duplicate results in your batches as well.
Yes, very weird situation for oracle.
if you specify 2000 ids inside the IN clause, it will fail.
this fails:
select ...
where id in (1,2,....2000)
but if you simply put the 2000 ids in another table (temp table for example), it will works
below query:
select ...
where id in (select userId
from temptable_with_2000_ids )
what you can do, actually could split the records into a lot of 1000 records and execute them group by group.
Here is some Perl code that tries to work around the limit by creating an inline view and then selecting from it. The statement text is compressed by using rows of twelve items each instead of selecting each item from DUAL individually, then uncompressed by unioning together all columns. UNION or UNION ALL in decompression should make no difference here as it all goes inside an IN which will impose uniqueness before joining against it anyway, but in the compression, UNION ALL is used to prevent a lot of unnecessary comparing. As the data I'm filtering on are all whole numbers, quoting is not an issue.
#
# generate the innards of an IN expression with more than a thousand items
#
use English '-no_match_vars';
sub big_IN_list{
#_ < 13 and return join ', ',#_;
my $padding_required = (12 - (#_ % 12)) % 12;
# get first dozen and make length of #_ an even multiple of 12
my ($a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l) = splice #_,0,12, ( ('NULL') x $padding_required );
my #dozens;
local $LIST_SEPARATOR = ', '; # how to join elements within each dozen
while(#_){
push #dozens, "SELECT #{[ splice #_,0,12 ]} FROM DUAL"
};
$LIST_SEPARATOR = "\n union all\n "; # how to join #dozens
return <<"EXP";
WITH t AS (
select $a A, $b B, $c C, $d D, $e E, $f F, $g G, $h H, $i I, $j J, $k K, $l L FROM DUAL
union all
#dozens
)
select A from t union select B from t union select C from t union
select D from t union select E from t union select F from t union
select G from t union select H from t union select I from t union
select J from t union select K from t union select L from t
EXP
}
One would use that like so:
my $bases_list_expr = big_IN_list(list_your_bases());
$dbh->do(<<"UPDATE");
update bases_table set belong_to = 'us'
where id in ($bases_list_expr)
UPDATE
Instead of using IN clause, can you try using JOIN with the other table, which is fetching the id. that way we don't need to worry about limit. just a thought from my side.
Instead of SELECT * FROM table1 WHERE ID IN (1,2,3,4,...,1000);
Use this :
SELECT * FROM table1 WHERE ID IN (SELECT rownum AS ID FROM dual connect BY level <= 1000);
*Note that you need to be sure the ID does not refer any other foreign IDS if this is a dependency. To ensure only existing ids are available then :
SELECT * FROM table1 WHERE ID IN (SELECT distinct(ID) FROM tablewhereidsareavailable);
Cheers

Resources