Rowspan "clean-up" algorithm - algorithm

I have a structure that resembles an HTML table's colspan/rowspan feature:
[
[1,4], [4,1]
[2,2], [2,2]
[1,1], [1,1], [1,1], [1,1]
]
is like
<tr>
<td rowspan=4></td>
<td colspan=4></td>
</tr>
<tr>
<td rowspan=2 colspan=2></td>
<td colspan=2 rowspan=2></td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
----------------
|  |           |
|  |-----------|
|  |     |     |
|  |-----------|
|  |  |  |  |  |
----------------
The second row (and the first cell of the first row, which spans all 4 rows) can be collapsed, giving
[
[1,3], [4,1]
[2,1], [2,1]
[1,1], [1,1], [1,1], [1,1]
]
and the "topology" of the table remains the same.
However a table like
[
[1,4], [4,1]
[2,2], [2,1]
[2,1],
[1,1], [1,1], [1,1], [1,1]
]
----------------
|  |           |
|  |-----------|
|  |     |     |
|  |     |-----|
|  |     |     |
|  |-----------|
|  |  |  |  |  |
----------------
is not "collapsible"
What is an efficient algorithm to perform this transformation, or to leave the table as is when it cannot be collapsed? Any programming language would work.
Assume that the structure is valid (no missing cells, the table is rectangular) if it simplifies the task.

Convert the cell extents to coordinates.
  0  1  2  3  4  5
0 ----------------
  |  |           |
1 |  |-----------|
  |  |     |     |
  |  |     |     |
  |  |     |     |
3 |  |-----------|
  |  |  |  |  |  |
4 ----------------
Compute the sorted set of y-coordinates (0, 1, 3, 4). Map each coordinate to its index in the set (0: 0, 1: 1, 3: 2, 4: 3).
Compute the sorted set of x-coordinates (0, 1, 2, 3, 4, 5). Map each coordinate to its index in the set (identity map).
  0  1  2  3  4  5
0 ----------------
  |  |           |
1 |  |-----------|
  |  |     |     |
  |  |     |     |
  |  |     |     |
2 |  |-----------|
  |  |  |  |  |  |
3 ----------------
Convert the cell coordinates back to extents.
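The steps above can be sketched in Python. This is only a minimal sketch under stated assumptions: every cell is assumed to be already placed on the grid and given as an `(x, y, colspan, rowspan)` tuple, and the names `compress_axis` and `collapse_spans` are made up for illustration.

```python
def compress_axis(edges):
    """Map each distinct coordinate to its index in the sorted set."""
    return {value: rank for rank, value in enumerate(sorted(set(edges)))}


def collapse_spans(cells):
    """Collapse redundant rows/columns.

    cells: iterable of (x, y, colspan, rowspan) tuples (assumed input format).
    Returns a list of tuples in the same format and the same order.
    """
    cells = list(cells)
    # Step 1: extents -> coordinates (collect both edges of every cell per axis).
    xmap = compress_axis([e for x, _, w, _ in cells for e in (x, x + w)])
    ymap = compress_axis([e for _, y, _, h in cells for e in (y, y + h)])
    collapsed = []
    for x, y, w, h in cells:
        # Step 2: replace each coordinate by its rank.
        x0, x1 = xmap[x], xmap[x + w]
        y0, y1 = ymap[y], ymap[y + h]
        # Step 3: coordinates -> extents.
        collapsed.append((x0, y0, x1 - x0, y1 - y0))
    return collapsed


# First table from the question: the row boundary at y = 2 is never used.
table = [
    (0, 0, 1, 4), (1, 0, 4, 1),
    (1, 1, 2, 2), (3, 1, 2, 2),
    (1, 3, 1, 1), (2, 3, 1, 1), (3, 3, 1, 1), (4, 3, 1, 1),
]
collapsed = collapse_spans(table)
```

A table in which every coordinate is used as a boundary (such as the "not collapsible" one in the question) comes back unchanged, because the coordinate-to-rank map is then the identity. Sorting dominates the cost, so the sketch runs in O(k log k) for k cells.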

Related

Subtract value row by row in matlab

I have a 1 column matrix with the following values:
*-------*
| 6 |
| 4 |
| 3 |
| 1 |
| 1 |
*-------*
With this function, starting from the first value, I subtract the value in the following row and place 0 at the end. This is the result:
Delta = Ctv_ds_universal(1:(end-1),1)-Ctv_ds_universal(2:end,1);
Delta(end+1)=0;
*-----------*
| 2 (6-4) |
| 1 (4-3) |
| 2 (3-1) |
| 0 (1-1) |
| 0 |
*-----------*
Now I would like to reverse the direction: subtract from the bottom up (each value minus the one above it) and place 0 at the beginning. How can I modify the function?
*------------*
| 0 |
| -2 (4-6) |
| -1 (3-4) |
| -2 (1-3) |
| 0 (1-1) |
*------------*
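% Prepend 0, then subtract from each value the one above it
% (this is the same as [0; diff(Ctv_ds_universal(:,1))]).
Delta = [0; Ctv_ds_universal(2:end,1) - Ctv_ds_universal(1:end-1,1)];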
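For readers who want to check the arithmetic outside MATLAB, here is a small Python sketch of the same computation on a plain list. The function name `backward_delta` is hypothetical, and a non-empty input is assumed.

```python
def backward_delta(values):
    """Return [0, v[1]-v[0], v[2]-v[1], ...] for a non-empty list."""
    # zip pairs every value with its successor; the leading 0 fills row 1.
    return [0] + [below - above for above, below in zip(values, values[1:])]


delta = backward_delta([6, 4, 3, 1, 1])
```

With the column from the question this gives 0, -2, -1, -2, 0, matching the desired table.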

Another Combination

Very similar to my last question, but now I want only the "full combination" for each group, in order of priority. So, from this source table:
+-------+-------+----------+
| GROUP | State | Priority |
+-------+-------+----------+
| 1 | MI | 1 |
| 1 | IA | 2 |
| 1 | CA | 3 |
| 1 | ND | 4 |
| 1 | AZ | 5 |
| 2 | IA | 2 |
| 2 | NJ | 1 |
| 2 | NH | 3 |
And so on...
I need a query that returns:
+-------+---------------------+
| GROUP | COMBINATION |
+-------+---------------------+
| 1 | MI, IA, CA, ND, AZ |
| 2 | NJ, IA, NH |
+-------+---------------------+
Thanks for the help, again!
Use listagg() ordering by priority within the group.
SELECT "GROUP",
       listagg("STATE", ', ') WITHIN GROUP (ORDER BY "PRIORITY") AS "COMBINATION"
  FROM "ELBAT"
 GROUP BY "GROUP";
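To make explicit what the ordered aggregation does, here is a minimal Python sketch of the same grouping logic. It is an illustration only, not a replacement for the query; the function name `combine_by_priority` and the `(group, state, priority)` row layout are assumptions.

```python
from collections import defaultdict


def combine_by_priority(rows):
    """rows: iterable of (group, state, priority) tuples (assumed layout)."""
    groups = defaultdict(list)
    for group, state, priority in rows:
        groups[group].append((priority, state))
    # Sort each group's states by priority, then join them like listagg does.
    return {
        group: ", ".join(state for _, state in sorted(items))
        for group, items in sorted(groups.items())
    }


rows = [
    (1, "MI", 1), (1, "IA", 2), (1, "CA", 3), (1, "ND", 4), (1, "AZ", 5),
    (2, "IA", 2), (2, "NJ", 1), (2, "NH", 3),
]
combined = combine_by_priority(rows)
```

For the sample rows, group 1 becomes "MI, IA, CA, ND, AZ" and group 2 becomes "NJ, IA, NH".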

Enumerating Cartesian product while minimizing repetition

Given two sets, e.g.:
{A B C}, {1 2 3 4 5 6}
I want to generate the Cartesian product in an order that puts as much space as possible between equal elements. For example, [A1, A2, A3, A4, A5, A6, B1…] is no good because all the As are next to each other. An acceptable solution would be going "down the diagonals" and then every time it wraps offsetting by one, e.g.:
[A1, B2, C3, A4, B5, C6, A2, B3, C4, A5, B6, C1, A3…]
Expressed visually:
| | A | B | C | A | B | C | A | B | C | A | B | C | A | B | C | A | B | C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | | | | | | | | | | | | | | | | | |
| 2 | | 2 | | | | | | | | | | | | | | | | |
| 3 | | | 3 | | | | | | | | | | | | | | | |
| 4 | | | | 4 | | | | | | | | | | | | | | |
| 5 | | | | | 5 | | | | | | | | | | | | | |
| 6 | | | | | | 6 | | | | | | | | | | | | |
| 1 | | | | | | | | | | | | | | | | | | |
| 2 | | | | | | | 7 | | | | | | | | | | | |
| 3 | | | | | | | | 8 | | | | | | | | | | |
| 4 | | | | | | | | | 9 | | | | | | | | | |
| 5 | | | | | | | | | | 10| | | | | | | | |
| 6 | | | | | | | | | | | 11| | | | | | | |
| 1 | | | | | | | | | | | | 12| | | | | | |
| 2 | | | | | | | | | | | | | | | | | | |
| 3 | | | | | | | | | | | | | 13| | | | | |
| 4 | | | | | | | | | | | | | | 14| | | | |
| 5 | | | | | | | | | | | | | | | 15| | | |
| 6 | | | | | | | | | | | | | | | | 16| | |
| 1 | | | | | | | | | | | | | | | | | 17| |
| 2 | | | | | | | | | | | | | | | | | | 18|
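or, in compact form without repeating the rows/columns (this grid shifts the diagonal down by one row after every pass through A, B, C, so its numbering differs from the list above):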
| | A | B | C |
|---|----|----|----|
| 1 | 1 | 17 | 15 |
| 2 | 4 | 2 | 18 |
| 3 | 7 | 5 | 3 |
| 4 | 10 | 8 | 6 |
| 5 | 13 | 11 | 9 |
| 6 | 16 | 14 | 12 |
I imagine there are other solutions too, but that's the one I found easiest to think about. But I've been banging my head against the wall trying to figure out how to express it generically—it's a convenient thing that the cardinality of the two sets are multiples of each other, but I want the algorithm to do The Right Thing for sets of, say, size 5 and 7. Or size 12 and 69 (that's a real example!).
Are there any established algorithms for this? I keep getting distracted thinking of how rational numbers are mapped onto the set of natural numbers (to prove that they're countable), but the path it takes through ℕ×ℕ doesn't work for this case.
It so happens the application is being written in Ruby, but I don't care about the language. Pseudocode, Ruby, Python, Java, Clojure, Javascript, CL, a paragraph in English—choose your favorite.
Proof-of-concept solution in Python (soon to be ported to Ruby and hooked up with Rails):
import sys

letters = sys.argv[1]  # e.g. "ABC"
MAX_NUM = 6
letter_pos = 0
for i in range(MAX_NUM):
    for j in range(len(letters)):
        # Shift the starting number by one on every pass through the letters.
        num = ((i + j) % MAX_NUM) + 1
        symbol = letters[letter_pos % len(letters)]
        print("[%s %s]" % (symbol, num))
        letter_pos += 1
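String letters = "ABC";
int MAX_NUM = 6;
int letterPos = 0;
for (int i = 0; i < MAX_NUM; ++i) {
    // The inner loop runs over the letters, as in the Python version above;
    // looping to MAX_NUM here would emit every pair twice.
    for (int j = 0; j < letters.length(); ++j) {
        int num = ((i + j) % MAX_NUM) + 1;
        char symbol = letters.charAt(letterPos % letters.length());
        String output = symbol + "" + num;
        System.out.println(output);
        ++letterPos;
    }
}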
What about using something fractal/recursive? This implementation divides a rectangular range into four quadrants then yields points from each quadrant. This means that neighboring points in the sequence differ at least by quadrant.
#python3
import sys
import itertools

def interleave(*iters):
    # Round-robin over the iterators, skipping the padding of exhausted ones.
    for elements in itertools.zip_longest(*iters):
        for element in elements:
            if element is not None:
                yield element

def scramblerange(begin, end):
    # Scramble a 1-D range by recursively interleaving its two halves.
    width = end - begin
    if width == 1:
        yield begin
    else:
        first = scramblerange(begin, int(begin + width/2))
        second = scramblerange(int(begin + width/2), end)
        yield from interleave(first, second)

def scramblerectrange(top=0, left=0, bottom=1, right=1, width=None, height=None):
    if width is not None and height is not None:
        yield from scramblerectrange(bottom=height, right=width)
        # "raise StopIteration" inside a generator is a RuntimeError since Python 3.7.
        return
    if right - left == 1:
        if bottom - top == 1:
            yield (left, top)
        else:
            for y in scramblerange(top, bottom):
                yield (left, y)
    else:
        if bottom - top == 1:
            for x in scramblerange(left, right):
                yield (x, top)
        else:
            # Split into four quadrants and take one point from each in turn.
            halfx = int(left + (right - left)/2)
            halfy = int(top + (bottom - top)/2)
            quadrants = [
                scramblerectrange(top=top, left=left, bottom=halfy, right=halfx),
                reversed(list(scramblerectrange(top=top, left=halfx, bottom=halfy, right=right))),
                scramblerectrange(top=halfy, left=left, bottom=bottom, right=halfx),
                reversed(list(scramblerectrange(top=halfy, left=halfx, bottom=bottom, right=right)))
            ]
            yield from interleave(*quadrants)

if __name__ == '__main__':
    letters = 'abcdefghijklmnopqrstuvwxyz'
    output = []
    indices = dict()
    for i, pt in enumerate(scramblerectrange(width=11, height=5)):
        indices[pt] = i
        x, y = pt
        output.append(letters[x] + str(y))
    table = [[indices[x,y] for x in range(11)] for y in range(5)]
    print(', '.join(output))
    print()
    pad = lambda i: ' ' * (2 - len(str(i))) + str(i)
    header = ' |' + ' '.join(map(pad, letters[:11]))
    print(header)
    print('-' * len(header))
    for y, row in enumerate(table):
        print(pad(y)+'|', ' '.join(map(pad, row)))
Outputs:
a0, i1, a2, i3, e0, h1, e2, g4, a1, i0, a3, k3, e1,
h0, d4, g3, b0, j1, b2, i4, d0, g1, d2, h4, b1, j0,
b3, k4, d1, g0, d3, f4, c0, k1, c2, i2, c1, f1, a4,
h2, k0, e4, j3, f0, b4, h3, c4, j2, e3, g2, c3, j4,
f3, k2, f2
| a b c d e f g h i j k
-----------------------------------
0| 0 16 32 20 4 43 29 13 9 25 40
1| 8 24 36 28 12 37 21 5 1 17 33
2| 2 18 34 22 6 54 49 39 35 47 53
3| 10 26 50 30 48 52 15 45 3 42 11
4| 38 44 46 14 41 31 7 23 19 51 27
If your sets X and Y have sizes n and m respectively, and Xi is the index of the element from X that's in the ith pair in your Cartesian product (and similarly Yi for Y), then
Xi = i mod n;
Yi = (i mod n + i div n) mod m;
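The two formulas translate directly into a generator. The sketch below is illustrative only: it takes n as the size of X and m as the size of Y, which is what the formulas need for both indices to stay in range, and the name `diagonal_product` is hypothetical.

```python
def diagonal_product(xs, ys):
    """Yield every (x, y) pair once, walking the grid along wrapped diagonals."""
    n, m = len(xs), len(ys)
    for i in range(m * n):
        xi = i % n                  # Xi = i mod n
        yi = (i % n + i // n) % m   # Yi = (i mod n + i div n) mod m
        yield xs[xi], ys[yi]


pairs = list(diagonal_product("ABC", [1, 2, 3, 4, 5, 6]))
```

Every block of n consecutive indices visits each element of X once, and the starting element of Y advances by one per block, so each pair appears exactly once for any sizes, including 5 and 7 or 12 and 69.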
You could get your diagonals a little more spread out by filling out your matrix like this:
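// matrix has m rows and n columns, all initialised to 0 (0 = not yet filled)
for (int i = 0; i < m*n; i++) {
    int xi = i % n;
    int yi = i % m;
    // cell already taken: move down the column (wrapping) to the next free one
    while (matrix[yi][xi] != 0) {
        yi = (yi+1) % m;
    }
    matrix[yi][xi] = i+1;
}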
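As a quick check of what this loop produces, here is a Python port. It is a sketch with a hypothetical function name, `spread_fill`, and it assumes the matrix has m rows and n columns.

```python
def spread_fill(n, m):
    """Fill an m-row, n-column grid with 1..m*n in visiting order."""
    matrix = [[0] * n for _ in range(m)]  # 0 marks an empty cell
    for i in range(m * n):
        xi = i % n
        yi = i % m
        # Cell already taken: move down the column (wrapping) to a free one.
        while matrix[yi][xi] != 0:
            yi = (yi + 1) % m
        matrix[yi][xi] = i + 1
    return matrix


grid = spread_fill(3, 6)
```

For n = 3 and m = 6 the grid is numbered in the order A1, B2, C3, A4, B5, C6, A2, B3, ..., which is the sequence listed in the question. Each column is visited exactly m times, so the inner loop always finds a free cell.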

Selecting rows in DataFrame by/after grouping using DataFramesMeta in Julia

I am trying to select certain data rows in a DataFrame using @linq macros:
using DataFrames, DataFramesMeta
df=DataFrame(x = ["a", "a", "a", "b", "b", "b"],
y = [1, 2, 3, 2, 3, 4],
z = [100, 200, 300, 456, 345, 234])
| Row | x | y | z |
|-----|-----|---|-----|
| 1 | "a" | 1 | 100 |
| 2 | "a" | 2 | 200 |
| 3 | "a" | 3 | 300 |
| 4 | "b" | 2 | 456 |
| 5 | "b" | 3 | 345 |
| 6 | "b" | 4 | 234 |
I am trying to select those rows that have the maximum y for a given type of x, that is
| Row | x | y | z |
|-----|-----|---|-----|
| 1 | "a" | 3 | 300 |
| 2 | "b" | 4 | 234 |
So, I am grouping by column x and adding a column with the maxima
@linq df |> @by(:x, maxY = maximum(:y))
which gives
| Row | x | maxY |
|-----|-----|------|
| 1 | "a" | 3 |
| 2 | "b" | 4 |
but I don't see how to put the corresponding z entries back in. Probably, it would be join but I don't see how to do that or get the result in another, simple way.
You can do it in one line by joining on=[:x, :y], but for this to work you need to name the maximum(:y) column y, not maxY:
df2 = @linq df |> by(:x, y=maximum(:y)) |> join(df, on=[:x, :y])
You can later rename that column to the intended maxY:
rename!(df2, :y, :maxY)
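The idea behind the join (compute the maximum per group, then keep the rows that match on both x and y) can be sketched without any DataFrame library. The Python below is only an illustration of that logic; the function name `rows_with_group_max` and the `(x, y, z)` tuple layout are assumptions.

```python
def rows_with_group_max(rows):
    """rows: list of (x, y, z) tuples; keep rows whose y is the maximum for their x."""
    max_y = {}
    for x, y, _ in rows:
        # Per-group maximum, the analogue of by(:x, y=maximum(:y)).
        if x not in max_y or y > max_y[x]:
            max_y[x] = y
    # Keeping rows that match on (x, y) is the analogue of the join.
    return [row for row in rows if row[1] == max_y[row[0]]]


data = [
    ("a", 1, 100), ("a", 2, 200), ("a", 3, 300),
    ("b", 2, 456), ("b", 3, 345), ("b", 4, 234),
]
selected = rows_with_group_max(data)
```

As with the join, a group with several rows tied for the maximum y keeps all of them.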

Confused: would correlation be "--" in Statsample?

I am very new to statsample and having some basic questions. With this sample data:
[[1, 2, 3, 3],[2, 3, 3, 5],[4, 1, 3, 4]]
I create a 4x4 statsample dataset called ds and get the following output for each call:
puts ds.summary
gets
= Dataset 1
Cases: 3
Element:[actuals]
== Vector 3
n :3
n valid:3
factors:3
mode: 3
Distribution
+---+---+---------+
| 3 | 3 | 100.00% |
+---+---+---------+
Element:[mids]
== Vector 2
n :3
n valid:3
factors:1,2,3
mode: 2
Distribution
+---+---+--------+
| 1 | 1 | 33.33% |
| 2 | 1 | 33.33% |
| 3 | 1 | 33.33% |
+---+---+--------+
Element:[predicteds]
== Vector 4
n :3
n valid:3
factors:3,4,5
mode: 3
Distribution
+---+---+--------+
| 3 | 1 | 33.33% |
| 4 | 1 | 33.33% |
| 5 | 1 | 33.33% |
+---+---+--------+
Element:[prediction_error]
== Vector 5
n :3
n valid:3
factors:0,1,2
mode: 0
Distribution
+---+---+--------+
| 0 | 1 | 33.33% |
| 1 | 1 | 33.33% |
| 2 | 1 | 33.33% |
+---+---+--------+
Element:[uids]
== Vector 1
n :3
n valid:3
factors:1,2,4
mode: 1
Distribution
+---+---+--------+
| 1 | 1 | 33.33% |
| 2 | 1 | 33.33% |
| 4 | 1 | 33.33% |
+---+---+--------+
Which seems reasonable but then:
cm = ds.correlation_matrix
puts cm.summary
gets this, which is confusing:
Correlation Matrix
+------------------+---------+-------+------------+------------------+-------+
| | actuals | mids | predicteds | prediction_error | uids |
+------------------+---------+-------+------------+------------------+-------+
| actuals | 1.000 | -- | -- | -- | -- |
| mids | -- | 1.000 | -- | -- | -- |
| predicteds | -- | -- | 1.000 | -- | -- |
| prediction_error | -- | -- | -- | 1.000 | -- |
| uids | -- | -- | -- | -- | 1.000 |
+------------------+---------+-------+------------+------------------+-------+
You created a dataset with nominal vectors, not scale (numeric) ones, so the correlation between these non-numeric vectors is always 0.
