Parsing a large table into smaller tables - sorting

I'm attempting to take a table that contains numerous nested tables and put those nested tables into an order based on their Y-coordinate and object-type values. I want to take this "master" table and sort its entries by Y-location first, then by type (text or lines).
Right now, all I can think of doing is to build two tables consisting of the objects above and below point 200 on the Y-axis, and from those, two more tables split by object type: lines and text.
I cannot seem to get past the point in my code where I repeat the same for loop for each table. I do this to preserve a "top to bottom" order for the objects of each type. Ideally, I want to maintain the following order for my table(s) (and hopefully have them placed back into the larger table for use):
< 201 for text
< 201 for lines
> 200 for text
> 200 for lines
Here is what I have so far, where objTable is my master table containing all of the objects, each of which is a table of its own:
local offset = 0
local upperObjTbl, lowerObjTbl, upperLineTbl, lowerLineTbl = {}, {}, {}, {}
for objKey, object in pairs(objTable) do
    if tonumber(object.y) < 201 and object.object ~= "line" then
        offset = offset + object.offset
        table.insert(lowerObjTbl, object)
    end
end
for objKey, object in pairs(objTable) do
    if tonumber(object.y) < 201 and object.object == "line" then
        offset = offset + object.offset
        table.insert(lowerLineTbl, object)
    end
end
for objKey, object in pairs(objTable) do
    if tonumber(object.y) > 200 and object.object ~= "line" then
        offset = offset + object.offset
        table.insert(upperObjTbl, object)
    end
end
for objKey, object in pairs(objTable) do
    if tonumber(object.y) > 200 and object.object == "line" then
        offset = offset + object.offset
        table.insert(upperLineTbl, object)
    end
end
Ideally, I would like to compact this into a single, better loop that, no matter what type of object it is or where it is located on the Y-axis, places the objects in Y-axis order first (lowest to highest), with text before lines.
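A minimal sketch of one way to collapse those four loops (assuming, as above, that every entry has a numeric y and an object field): copy the entries into one sequence, then sort it once with a comparator that encodes the four buckets listed above and keeps ascending Y within each bucket.

-- which of the four desired buckets an object falls into:
-- 1: y < 201 text, 2: y < 201 lines, 3: y > 200 text, 4: y > 200 lines
local function bucket(o)
    local lower = tonumber(o.y) < 201
    local isText = o.object ~= "line"
    if lower then
        return isText and 1 or 2
    end
    return isText and 3 or 4
end

local ordered = {}
for _, object in pairs(objTable) do
    ordered[#ordered + 1] = object
end

table.sort(ordered, function(a, b)
    local ba, bb = bucket(a), bucket(b)
    if ba ~= bb then
        return ba < bb
    end
    return tonumber(a.y) < tonumber(b.y) -- "top to bottom" within a bucket
end)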

Related

In Visual FoxPro, how does one incorporate a SUM REST command into a SCAN loop?

I am trying to complete a mortality table, using loops in Visual FoxPro. I have run into one difficulty where the math operation involves doing a sum of all data in a column for the remaining rows; this needs to be incorporated into a loop. The strategy I thought would work, nesting a SUM REST function into the SCAN REST function, was not successful, and I haven't found a good alternative approach.
In FoxPro, I can successfully use the SCAN function as follows, say:
Go 1
Replace survivors WITH 1000000
SCATTER NAME oprev
SKIP
SCAN rest
    replace survivors WITH (1 - oprev.prob) * oprev.survivors
    SCATTER NAME oprev
ENDSCAN
(to take the mortality rates in a table and use them to compute the number of survivors at each age)
Or, say:
Replace Yearslived WITH 0
SCATTER NAME oprev1
SKIP
SCAN rest
    replace Yearslived WITH (oprev1.survivors + survivors) * 0.5
    SCATTER NAME oprev1
ENDSCAN
In order to complete a mortality table I want to use the Yearslived and survivors data (which were produced using the SCANs above) to get life expectancy data as follows. Say we have the simplified table:
SURVIVORS  YEARSLIVED  LIFEEXP
      100           0        ?
       80          90        ?
       60          70        ?
       40          50        ?
       20          30        ?
        0          10        ?
Then each LIFEEXP record should be the sum of the remaining YEARSLIVED records divided by the corresponding SURVIVORS record, i.e.:
LIFEEXP (1) = (90+70+50+30+10)/100
LIFEEXP (2) = (70+50+30+10)/80
...and so on.
I attempted to do this with a similar SCAN approach - see below:
Go 1
SCATTER NAME Oprev2
SCAN rest
    replace lifeexp WITH ((SUM yearslived Rest) - oprev2.yearslived) / oprev2.survivors
    SCATTER NAME oprev2
ENDSCAN
But here I get the error message "Function name is missing)." Help tells me this is probably because the function contains too many arguments.
So I then also tried to break things down and first use SCAN just to get all of my SUM REST data, as follows:
SCAN rest
    SUM yearslived REST
ENDSCAN
... in the hope that I could get this data, define it as a variable, and create a simpler SCAN function above. However, I seem to be doing something wrong here as well, as instead of getting all necessary sums (first the sum of rows 2 to end, then 3 to end, etc.), I only get one sum, of all the yearslived data. In other words, using the sample data, I am given just 250, instead of the list 250, 160, 90, 40, 10.
What am I doing wrong? And more generally, how can I create a loop in FoxPro that includes a function where you sum up all remaining data in a specific column over and over again (first the 2nd through last record, then the 3rd through last record, and so on)?
Any help will be much appreciated!
TM
Well, you are really hiding the important details: your table's structure, sample data, and desired output. Without them it is mostly guesswork, though this guess has a good chance of being right.
You seem to be trying to do something like this:
Create Cursor Mortality (Survivors i, YearsLived i, LifeExp b)
Local ix, Survivors
For ix = 100 To 0 Step -20
    Insert Into Mortality (Survivors, YearsLived) Values (m.ix, 0)
Endfor
Locate
Survivors = Mortality.Survivors
Skip
Scan Rest
    Replace YearsLived With (m.Survivors + Mortality.Survivors) * 0.5
    Survivors = Mortality.Survivors
Endscan

*** Here is the part that deals with your sum problem
Local nRecNo, nSum
Scan
    * Save the current record number
    nRecNo = Recno()
    Skip
    * Sum REST after skipping to the next row
    Sum YearsLived Rest To nSum
    * Position back to the row where we started
    Go m.nRecNo
    * Do the replacement
    Replace LifeExp With Iif(Survivors = 0, 0, m.nSum / Survivors)
    * ENDSCAN implicitly moves to the next record
Endscan
* We are done. Go to the first record and browse
Locate
Browse
While there are N ways to do this in VFP, this is one xBase approach, and IMHO a relatively simple one to understand.
Where did you go wrong?
Well, you tried to use SUM as if it were a function, but it is a command. There is a SUM() aggregate function in SQL, but here you are using the xBase SUM command.
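To make the distinction concrete, here is a small sketch against the Mortality cursor created above (nTotal and laTotal are illustrative names):

* xBase: SUM is a command with an optional scope and a TO clause
SUM YearsLived TO nTotal
* SQL: SUM() is an aggregate function inside a SELECT
SELECT Sum(YearsLived) FROM Mortality INTO ARRAY laTotal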
EDIT: And BTW in this code:
SCAN rest
    SUM yearslived REST
ENDSCAN
What you are doing there is starting a SCAN with a scope of REST; inside the loop you are using another scoped command:
SUM yearslived REST
This effectively does the summing on the REST of the records and leaves the record pointer at the bottom of the table. ENDSCAN then advances it to EOF(), so the loop body only runs once, for the first record.

How to subtract or add time series data of a CombiTimeTable in Modelica?

I have a text file that is used in a CombiTimeTable. The text file looks as follows:
#1
double tab1(5,2) # comment line
0 0
1 1
2 4
3 9
4 16
The first column is time and the second one is my data. My goal is to add each datum to the previous one, starting from the second row.
model example
  Modelica.Blocks.Sources.CombiTimeTable Tsink(fileName = "C:/Tin.txt", tableName = "tab1", tableOnFile = true, timeScale = 60) annotation(
    Placement(visible = true, transformation(origin = {-70, 30}, extent = {{-10, -10}, {10, 10}}, rotation = 0)));
equation
end example;
Tsink.y[1] is column 2 of the table, but I do not know how to access it or how to implement an operation on it. Thanks for your help.
You can't use the ModelicaStandardTables blocks here: they are only meant for interpolation and hence do not expose the sample points to the Modelica model. However, you can use the Modelica library ExternData to easily read the array from a CSV file and do the required operations on the read data. For example,
model Example "Example model to read array and operate on it"
  parameter ExternData.CSVFile dataSource(
    fileName="C:/Tin.csv") "Data source"
    annotation(Placement(transformation(extent={{-60,60},{-40,80}})));
  parameter Integer n = 5 "Number of rows (must be known)";
  parameter Real a[n,2] = dataSource.getRealArray2D(n, 2) "Array from CSV file";
  parameter Real y[n - 1] = {a[i,2] + a[i + 1,2] for i in 1:n - 1} "Vector";
  annotation(uses(ExternData(version="2.6.1")));
end Example;
where Tin.csv is a CSV file with comma as the delimiter:
0,0
1,1
2,4
3,9
4,16
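With this data and n = 5, the vector in the example evaluates to y = {0 + 1, 1 + 4, 4 + 9, 9 + 16} = {1, 5, 13, 25}, i.e. each datum added to the one before it, starting from the second row.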

CSV - Processing each group of contiguous rows having the same values for certain fields

I have a large CSV file with the following headers: "sku", "year", "color", "price", "discount", "inventory", "published_on", "rate", "demographic" and "tags".
I would like to perform various calculations for each contiguous group of rows having the same values for "sku", "year" and "color". I will refer to this partition of the file as each group of rows. For example, if the file looked like this:
sku,year,color,price,discount,...
100,2019,white,24.61,2.3,...
100,2019,white,29.11,2.1,...
100,2019,white,33.48,2.9,...
100,2019,black,58.12,1.3,...
200,2018,brown,44.15,3.1,...
200,2018,brown,53.07,3.2,...
100,2019,white,16.91,2.9,...
there would be four groups of rows: rows 1, 2 and 3 (after the header row), row 4 alone, rows 5 and 6, and row 7 alone. Notice that the last row is not included in the first group even though it has the same values for the first three fields. That is because it is not contiguous with the first group.
An example of a calculation that might be performed for each group of rows would be to determine the total inventory for the group. In general, the measure to be computed is some function of the values contained in all the rows of the group of rows. The specific calculations for each group of rows is not central to my question. Let us simply assume that each group of rows is passed to some method which returns the measure of interest.
I wish to return an array containing one element per group of rows, each element (perhaps an array or hash) containing the common values of "sku", "year" and "color" and the calculated measure of interest.
Because the file is large it must be read line-by-line, rather than gulping it into an array.
What's the best way to do this?
Enumerable#chunk is perfect for this.
CSV.foreach('path/to/csv', headers: true).
  chunk { |row| row.values_at('sku', 'year', 'color') }.
  each do |(sku, year, color), rows|
    # process `rows` with the current `[sku, year, color]` combination
  end
Obviously, that last each can be replaced by map or flat_map, as needed.
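For instance, here is a sketch of the total-inventory measure mentioned in the question (assuming the inventory column parses as a number; the path and column names are from the question):

require 'csv'

totals = CSV.foreach('path/to/csv', headers: true).
  chunk { |row| row.values_at('sku', 'year', 'color') }.
  map do |(sku, year, color), rows|
    # one element per contiguous group: the common key values
    # plus the measure computed from all rows in the group
    { 'sku' => sku, 'year' => year, 'color' => color,
      'total_inventory' => rows.sum { |row| row['inventory'].to_f } }
  end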
Here is an example of how that might be done. I will read the CSV file line-by-line to minimize memory requirements.
Code
require 'csv'
def doit(fname, common_headers)
  CSV.foreach(fname, headers: true).
    slice_when { |csv1, csv2| csv1.values_at(*common_headers) !=
                              csv2.values_at(*common_headers) }.
    each_with_object({}) { |arr, h|
      h[arr.first.to_h.slice(*common_headers)] = calc(arr) }
end

def calc(arr)
  arr.sum { |csv| csv['price'].to_f }.fdiv(arr.size).round(2)
end
The method calc needs to be customized for the application. Here I am computing the average price for each contiguous group of records having the same values for "sku", "year" and "color".
See CSV::foreach, Enumerable#slice_when, CSV::Row#values_at, CSV::Row#to_h and Hash#slice.
Example
Now let's construct a CSV file.
str = <<~END
sku,year,color,price
1,2015,red,22.41
1,2015,red,33.61
1,2015,red,12.15
1,2015,blue,36.18
2,2015,yellow,9.08
2,2015,yellow,13.71
END
fname = 't.csv'
File.write(fname, str)
#=> 129
The common headers must be given:
common_headers = ['sku', 'year', 'color']
The average prices are obtained by executing doit:
doit(fname, common_headers)
#=> {{"sku"=>"1", "year"=>"2015", "color"=>"red"}=>22.72,
# {"sku"=>"1", "year"=>"2015", "color"=>"blue"}=>36.18,
# {"sku"=>"2", "year"=>"2015", "color"=>"yellow"}=>11.4}
Note:
((22.41 + 33.61 + 12.15)/3).round(2)
#=> 22.72
((36.18)/1).round(2)
#=> 36.18
((9.08 + 13.71)/2).round(2)
#=> 11.4
The methods foreach and slice_when both return enumerators. Therefore, for each contiguous block of lines from the file having the same values for the keys in common_headers, memory is acquired, calculations are performed for those lines and then that memory is released (by Ruby). In addition, memory is needed to hold the hash that is returned at the end.

Get Capped Maximum Value From List

I have a list of values that range anywhere from 500-1000. I have a second list of values that denote relevant breakpoints in the 500-1000 range (500, 520, 540, 600, etc). I need to return the highest value in the second list that is less than the value in a given number from the first list. I noticed the "N" functions let you set a conditional on them, so for example if I do:
List.Max(List.FirstN(SomeTable[Breakpoints], each _ < 530))
It correctly returns 520 to me. However if I put this inside an AddColumn function and change the 530 to a local field reference:
Table.AddColumn(MyTable, "MinValue", each List.Max(List.FirstN(SomeTable[Breakpoints], each _ < [SomeNumbers])))
Then I get a "We cannot apply field access to the type Number" error. Is what I'm trying to do possible and I'm just formatting it wrong? I always get confused with scope and references in PQ, so it may just be that.
After each, [SomeNumbers] by itself is short for _[SomeNumbers] (which is what you see when filtering a column). In the List.FirstN call, however, _ refers to a number in the list instead of a row in a table: the value of _ is tied to the closest each, where closeness is measured by the number of layers of nesting between _ and the appearance of each. Therefore, in your code, [SomeNumbers] is trying to find a column named SomeNumbers on a number, which doesn't exist.
There are a couple ways to fix this:
You can use a let...in statement to store the current value of the SomeNumbers column to use it for later, like so:
each
  let
    currentNumber = [SomeNumbers],
    result = List.Max(List.FirstN(SomeTable[Breakpoints], each _ < currentNumber))
  in
    result
You can explicitly define a function with the (x) => ... syntax instead of using each twice, like so:
each List.Max(List.FirstN(SomeTable[Breakpoints], (point) => point < [SomeNumbers]))
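Assembled back into the original call, the first fix reads (nothing new here, just the two pieces from above combined):

Table.AddColumn(MyTable, "MinValue", each
    let
        currentNumber = [SomeNumbers],
        result = List.Max(List.FirstN(SomeTable[Breakpoints], each _ < currentNumber))
    in
        result)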

How would I find an unknown pattern in an array of bytes?

I am building a tool to help me reverse engineer database files. I am targeting my tool towards fixed record length flat files.
What I know:
1) Each record has an index(ID).
2) Each record is separated by a delimiter.
3) Each record is fixed width.
4) Each column in each record is separated by at least one x00 byte.
5) The file header is at the beginning (I say this because the header does not contain the delimiter).
Delimiters I have found in other files are ( xFAxFA, xFExFE, xFDxFD ), but this is kind of irrelevant, considering that I may use the tool on a different database in the future. So I will need something that will be able to pick out a 'pattern' regardless of how many bytes it is made of. Probably no more than 6 bytes? It would probably eat up too much data if it were more. But my experience doing this is limited.
So I guess my question is: how would I find UNKNOWN delimiters in a large file? I feel that, given 'what I know', I should be able to program something; I just don't know where to begin...
# Really loose pseudo code
def begin_some_how
  # THIS IS THE PART I NEED HELP WITH...
  # find all non-zero non-ascii sets of 2 or more bytes that repeat more than twice.
end

def check_possible_record_lengths
  possible_delimiter = begin_some_how
  # test if any of the above are always the same number of bytes apart
  # from each other (except one instance, the header...)
  possible_records = file.split(possible_delimiter)
  rec_length_count = possible_records.map { |record| record.length }.uniq.count
  if rec_length_count == 2 # The header will most likely not be the same size.
    puts "Success! We found the fixed record delimiter: #{possible_delimiter}"
  else
    puts "Wrong delimiter found"
  end
end
possible = [",", "."]
result = [0, ""]
possible.each do |delimiter|
  # split on the candidate and look at the sizes of the resulting chunks
  sizes = file.split(delimiter).map { |record| record.size }
  next if sizes.size < 2
  # mean chunk size; for the right delimiter this is the record length
  average = sizes.sum.to_f / sizes.size
  # total squared deviation from that mean (near 0 for regular records)
  deviation = sizes.sum { |x| (x - average)**2 }
  # the more regular the chunk sizes, the higher the score
  matching_value = average / (deviation**2)
  if matching_value > result[0]
    result[0] = matching_value
    result[1] = delimiter
  end
end
Take advantage of the fact that the records have constant size. Take every possible delimiter and check how much each record deviates from the usual record length. If the header is small enough compared to the rest of the file, this should work.
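For the begin_some_how part, here is a minimal sketch of one way to generate candidates (an assumption on my part, not tested against your files): count every two-byte sequence whose bytes are non-zero and non-ASCII, and keep the ones that repeat more than twice, most frequent first.

def candidate_delimiters(data)
  counts = Hash.new(0)
  # slide a two-byte window over the file contents
  data.each_byte.each_cons(2) do |a, b|
    # non-ASCII implies non-zero, matching the delimiters seen so far
    counts[[a, b]] += 1 if a > 0x7F && b > 0x7F
  end
  counts.select { |_, n| n > 2 }.
         sort_by { |_, n| -n }.
         map { |pair, _| pair.pack('C2') }
end

# feed each candidate into check_possible_record_lengths in turn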
