Avoid accuracy problems while computing the permanent using the Ryser formula - algorithm

Task
I want to calculate the permanent P of an NxN matrix for N up to 100. I can exploit the fact that the matrix has only M=4 (or slightly more) distinct rows and columns. The matrix might look like
A1 ... A1 B1 ... B1 C1 ... C1 D1 ... D1 |
... | r1 identical rows
A1 ... A1 B1 ... B1 C1 ... C1 D1 ... D1 |
A2 ... A2 B2 ... B2 C2 ... C2 D2 ... D2
...
A2 ... A2 B2 ... B2 C2 ... C2 D2 ... D2
A3 ... A3 B3 ... B3 C3 ... C3 D3 ... D3
...
A3 ... A3 B3 ... B3 C3 ... C3 D3 ... D3
A4 ... A4 B4 ... B4 C4 ... C4 D4 ... D4
...
A4 ... A4 B4 ... B4 C4 ... C4 D4 ... D4
---------
c1 identical cols
and c and r are the multiplicities of the columns and rows. All values in the matrix lie between 0 and 1 and are stored as double-precision floating-point numbers.
Algorithm
I tried to use the Ryser formula to calculate the permanent. For the formula, one needs to first calculate the sum of each row and multiply all the row sums. For the matrix above this yields
S0 = (c1 * A1 + c2 * B1 + c3 * C1 + c4 * D1)^r1 * ...
* (c1 * A4 + c2 * B4 + c3 * C4 + c4 * D4)^r4
As a next step the same is done with col 1 deleted
S1 = ((c1-1) * A1 + c2 * B1 + c3 * C1 + c4 * D1)^r1 * ...
* ((c1-1) * A4 + c2 * B4 + c3 * C4 + c4 * D4)^r4
and this number is subtracted from S0.
The algorithm continues with all possible ways to delete single columns and groups of columns. The products of the row sums of the remaining matrix are added (even number of columns deleted) or subtracted (odd number of columns deleted).
The task can be solved relatively efficiently if one makes use of the identical columns (for example, the result S1 will pop up exactly c1 times).
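To make the bookkeeping above concrete, here is a minimal Python sketch of the Ryser sum grouped by column type. The names grouped_ryser, expand, brute_force, vals, row_mult and col_mult are made up for illustration. It uses exact fractions only so that the grouped sum can be checked against a brute-force permanent on a small matrix; it says nothing about floating-point accuracy.

```python
from fractions import Fraction
from itertools import permutations, product
from math import comb


def grouped_ryser(vals, row_mult, col_mult):
    """Permanent of a matrix with repeated rows and columns.

    vals[j][i] is the entry shared by row type j and column type i;
    row_mult[j] and col_mult[i] are the multiplicities r_j and c_i.
    """
    n = sum(col_mult)
    assert n == sum(row_mult), "matrix must be square"
    total = Fraction(0)
    # k[i] = number of columns of type i that are kept (not deleted)
    for k in product(*(range(c + 1) for c in col_mult)):
        deleted = n - sum(k)
        # number of column subsets that lead to this same k
        weight = 1
        for c, ki in zip(col_mult, k):
            weight *= comb(c, ki)
        term = Fraction(weight)
        for row, r in zip(vals, row_mult):
            row_sum = sum(Fraction(ki) * a for ki, a in zip(k, row))
            term *= row_sum ** r
        # add for an even number of deleted columns, subtract for odd
        total += -term if deleted % 2 else term
    return total


def expand(vals, row_mult, col_mult):
    """Build the full matrix from the distinct entries and multiplicities."""
    return [[a for a, c in zip(row, col_mult) for _ in range(c)]
            for row, r in zip(vals, row_mult) for _ in range(r)]


def brute_force(matrix):
    """Permanent by summing over all permutations (small matrices only)."""
    n = len(matrix)
    total = Fraction(0)
    for p in permutations(range(n)):
        term = Fraction(1)
        for i in range(n):
            term *= matrix[i][p[i]]
        total += term
    return total


vals = [[Fraction(1, 2), Fraction(1, 3)],
        [Fraction(1, 4), Fraction(1, 5)]]
print(grouped_ryser(vals, [2, 1], [1, 2]) == brute_force(expand(vals, [2, 1], [1, 2])))
```

The loop visits (c1+1)*(c2+1)*... combinations instead of 2^N column subsets, which is where the saving from identical columns comes from.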
Problem
Even if the final result is small, the intermediate results S0, S1, ... can reach values up to N^N. A double can hold numbers of this size, but its absolute rounding error at that magnitude is on the order of, or larger than, the expected overall result. The expected result P is on the order of c1!*c2!*c3!*c4! (actually I am interested in P/(c1!*c2!*c3!*c4!), which should lie between 0 and 1).
I tried to arrange the additions and subtractions of the values S so that the partial sums stay around 0. This helps in the sense that I avoid partial sums exceeding N^N, but it improves things only a little. I also thought about using logarithms for the intermediate results to keep the absolute numbers down - but the relative accuracy is still bounded by the floating-point encoding, so I think I would run into the same problem. If possible, I want to avoid variable-precision arithmetic data types for performance reasons (currently I am using MATLAB).

Related

Clusterization algorithm

I have a problem with clustering clients.
I have a dataset with columns such as name, address, email, phone, etc. (A, B, C in the example). Each row has a unique identifier (ID). I need to assign a CLUSTER_ID (X) to each row. Within a cluster, every row shares at least one attribute value with some other row of that cluster. So if clients with ID=1,2,3 have the same A attribute and clients with ID=3,10 have the same B attribute, then ID=1,2,3,10 should be in the same cluster.
How can I solve this problem using SQL?
If that is not possible, how would I write the algorithm (pseudocode)?
Performance is very important, because the dataset contains millions of rows.
Sample Input:
ID A B C
1 A1 B3 C1
2 A1 B2 C5
3 A1 B10 C10
4 A2 B1 C5
5 A2 B8 C1
6 A3 B1 C4
7 A4 B6 C3
8 A4 B3 C5
9 A5 B7 C2
10 A6 B10 C3
11 A8 B5 C4
Sample Output:
ID A B C X
1 A1 B3 C1 1
2 A1 B2 C5 1
3 A1 B10 C10 1
4 A2 B1 C5 1
5 A2 B8 C1 1
6 A3 B1 C4 1
7 A4 B6 C3 1
8 A4 B3 C5 1
9 A5 B7 C2 2
10 A6 B10 C3 1
11 A8 B5 C4 1
Thanks for any help.
A possible way is to repeatedly update the rows whose X is still empty.
Start with cluster ID 1, for example by using a variable.
SET @CurrentClusterID = 1
Take the top 1 record and update its X to 1.
Now loop an update over all records with an empty X
that can be linked to a record with X = 1 through the same A or B or C.
Disclaimer:
The statement will vary depending on the RDBMS.
This is just intended as pseudo-code.
WHILE (<<some check to see if there were records updated>>)
BEGIN
    UPDATE yourtable t
    SET t.X = @CurrentClusterID
    WHERE t.X IS NULL
      AND EXISTS (
        SELECT 1 FROM yourtable d
        WHERE d.X = @CurrentClusterID
          AND (d.A = t.A OR d.B = t.B OR d.C = t.C)
      );
END
Loop that until it updates 0 records.
Now repeat the method for the other clusters until there are no more empty X values in the table.
1) Increase @CurrentClusterID by 1
2) Update the next top 1 record with an empty X to the new @CurrentClusterID
3) Loop the update until no more updates are made.
An example test on db<>fiddle here for MS Sql Server.
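Outside the database, the same grouping can be computed as connected components. The following Python sketch uses a union-find structure, which is a different technique from the UPDATE loop above. The function name assign_clusters and the tuple layout (id, A, B, C) are assumptions for illustration, and NULL or empty attribute values are not treated specially.

```python
def assign_clusters(rows):
    """rows: list of (id, a, b, c) tuples. Returns {id: cluster_id}."""
    parent = {}

    def find(x):
        # follow parent links to the representative, shortening the path
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    first_seen = {}  # (column index, value) -> first row id with that value
    for row_id, *attrs in rows:
        parent[row_id] = row_id
        for col, value in enumerate(attrs):
            key = (col, value)
            if key in first_seen:
                union(first_seen[key], row_id)
            else:
                first_seen[key] = row_id

    # number the clusters 1, 2, ... in order of first appearance
    cluster_of_root = {}
    result = {}
    for row_id, *_ in rows:
        root = find(row_id)
        if root not in cluster_of_root:
            cluster_of_root[root] = len(cluster_of_root) + 1
        result[row_id] = cluster_of_root[root]
    return result


sample = [
    (1, "A1", "B3", "C1"), (2, "A1", "B2", "C5"), (3, "A1", "B10", "C10"),
    (4, "A2", "B1", "C5"), (5, "A2", "B8", "C1"), (6, "A3", "B1", "C4"),
    (7, "A4", "B6", "C3"), (8, "A4", "B3", "C5"), (9, "A5", "B7", "C2"),
    (10, "A6", "B10", "C3"), (11, "A8", "B5", "C4"),
]
print(assign_clusters(sample))
```

On the sample input from the question this puts ID 9 in cluster 2 and every other ID in cluster 1, in a single pass over the rows.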

Uniqueness in Permutation and Combination

I am trying to create some pseudocode to generate possible outcomes for this scenario:
There is a tournament taking place in which, each round, every player is placed in a group with players from other teams.
Given x teams, each with exactly n players, what are the possible groupings into groups of size r such that each group contains at most one player from each team AND no player has already played with any other player of the group in a previous round?
Example: 4 teams (A-D), 4 players each team, 4 players each grouping.
Possible groupings are: (correct team constraint)
A1, B1, C1, D1
A1, B3, C1, D2
But not: (violates same team constraint)
A1, A3, C2, D2
B3, C2, D4, B1
However, the uniqueness constraint comes into play in this grouping
A1, B1, C1, D1
A1, B3, C1, D2
While it does follow the constraint of playing with different teams, it breaks the uniqueness rule of playing with different players: in this case A1 is grouped with C1 twice.
At the end of the day the pseudocode should be able to create something like the following
Round 1    Round 2    Round 3    Round 4
a1 b1      a1 d4      a1 c2      a1 c4
c1 d1      b2 c3      b4 d3      d2 b3
a2 b2      a2 d1      a2 c3      a2 c1
c2 d2      b3 c4      b1 d4      d3 b4
a3 b3      a3 d2      a3 c4      a3 c2
c3 d3      b4 c1      b2 d1      d4 b1
a4 b4      a4 d3      a4 c1      a4 c3
c4 d4      b1 c2      b3 d2      d1 b2
In the example you can see that in each round no player is grouped with a player they were already grouped with in a previous round.
If the number of players on a team is a prime power (2, 3, 4, 5, 7, 8, 9, 11, 13, 16, 17, 19, etc.), then here's an algorithm that creates a schedule with the maximum number of rounds, based on a finite affine plane.
We work in the finite field GF(n), where n is the number of players on a team. GF(n) has its own notion of multiplication; when n is a prime, it's multiplication mod n, and when n is a higher power of some prime, it's multiplication of univariate polynomials mod some irreducible polynomial of the appropriate degree. Each team is identified by a nonzero element of GF(n); let the set of team identifiers be T. Each team member is identified by a pair in T×GF(n). For each nonzero element r of GF(n), the groups for round r are
{{(t, r*t + c) | t in T} | c in GF(n)},
where * and + denote multiplication and addition respectively in GF(n).
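As a minimal sketch of the construction above, restricted to prime n so that GF(n) arithmetic is ordinary arithmetic mod n (prime powers such as 4, 8 or 9 would need polynomial arithmetic and are not handled here). The function names affine_schedule and is_valid are illustrative assumptions.

```python
from itertools import combinations


def affine_schedule(n):
    """Rounds for prime n, with teams T = {1, ..., n-1} and players (t, y), y in 0..n-1."""
    teams = range(1, n)
    rounds = []
    for r in range(1, n):  # one round per nonzero r
        # one group per c: the points (t, r*t + c) for every team t
        groups = [[(t, (r * t + c) % n) for t in teams] for c in range(n)]
        rounds.append(groups)
    return rounds


def is_valid(rounds):
    """True if no two players ever share a group more than once."""
    seen = set()
    for groups in rounds:
        for group in groups:
            for pair in combinations(group, 2):
                if pair in seen:
                    return False
                seen.add(pair)
    return True


schedule = affine_schedule(5)
print(len(schedule), is_valid(schedule))
for group in schedule[0]:
    print(group)
```

For n = 5 this yields 4 rounds of 5 groups, each group holding one player from each of the 4 teams.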
Implementation in Python 3
This problem is very closely related to the Social Golfer Problem. The Social Golfer Problem asks, given n players who each play once a day in g groups of size s (n = g×s), how many days can they be scheduled such that no player plays with any other player more than once?
The algorithms for finding solutions to instances of Social Golfer problems are a patchwork of constraint solvers and mathematical constructions, which together don't address very many cases satisfactorily. If the number of players on a team is equal to the group size, then solutions to this problem can be derived by interpreting the first day's schedule as the team assignments and then using the rest of the schedule. There may be other constructions.

Cumulating row data over last 12 months in powerquery

I am creating a dashboard using Excel Power Query (a.k.a. M), in which I need to create a measure that rolls up the values of the last 12 months across two dimensions.
Example:
Input:
D1 | D2 | MonthYear(D3) | Value
A1 | B1 | Mar2016       | 1
A2 | B1 | Mar2016       | 2
A3 | B1 | Mar2016       | 3
A1 | B1 | Apr2016       | 4
A2 | B1 | Apr2016       | 5
A3 | B1 | Apr2016       | 6
A1 | B1 | May2016       | 7
A2 | B1 | May2016       | 8
A3 | B1 | May2016       | 9
Output:
D1 | D2 | MonthYear(D3) | Value
A1 | B1 | Mar2016       | 1
A2 | B1 | Mar2016       | 2
A3 | B1 | Mar2016       | 3
A1 | B1 | Apr2016       | 4+1
A2 | B1 | Apr2016       | 5+2
A3 | B1 | Apr2016       | 6+3
A1 | B1 | May2016       | 7+4+1
A2 | B1 | May2016       | 8+5+2
A3 | B1 | May2016       | 9+6+3
Also, the sum should cover only the last 12 months if more data is available. Any help is appreciated.
I covered a very similar scenario to this in my demo file: Power Query demo - Running Total.xlsx
You can download it from my OneDrive and review the steps:
https://1drv.ms/f/s!AGLFDsG7h6JPgw4
Basically you add an Index, Group By the "group columns" (in your scenario D1 and D2) and create an "All Rows" Aggregate column. Then you Copy the "All Rows" column, Expand both "All Rows" columns, Filter and finally Group By and Sum to create the Running Total.
The only bit of code is the Added column to produce a true/false column for the filter, e.g.
[Index] >= [#"All Rows - Copy.Index"]
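The same group, self-join, filter and sum idea can be sketched in plain Python, here with the 12-month window from the question added as an extra condition on the filter. This is only an illustration of the logic, not Power Query code; the names rolling_sum, month_index and MONTHS are assumptions, and the month labels are assumed to be English three-letter abbreviations followed by the year.

```python
from collections import defaultdict

MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}


def month_index(month_year):
    """Turn 'Mar2016' into a running month number so months can be compared."""
    return int(month_year[3:]) * 12 + MONTHS[month_year[:3]]


def rolling_sum(rows, window=12):
    """rows: list of (d1, d2, month_year, value). Returns rows with rolled-up values."""
    # collect all (month, value) pairs per (D1, D2) group
    by_group = defaultdict(list)
    for d1, d2, month_year, value in rows:
        by_group[(d1, d2)].append((month_index(month_year), value))

    result = []
    for d1, d2, month_year, _ in rows:
        current = month_index(month_year)
        # keep rows of the same group that fall in the last `window` months
        total = sum(v for m, v in by_group[(d1, d2)]
                    if current - window < m <= current)
        result.append((d1, d2, month_year, total))
    return result


data = [
    ("A1", "B1", "Mar2016", 1), ("A2", "B1", "Mar2016", 2), ("A3", "B1", "Mar2016", 3),
    ("A1", "B1", "Apr2016", 4), ("A2", "B1", "Apr2016", 5), ("A3", "B1", "Apr2016", 6),
    ("A1", "B1", "May2016", 7), ("A2", "B1", "May2016", 8), ("A3", "B1", "May2016", 9),
]
for row in rolling_sum(data):
    print(row)
```

The comparison on the month number plays the role of the true/false filter column; the lower bound is what limits the sum to 12 months.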

I want to define f^2 to be 1 but leave f undefined

I want, for example, for Mathematica to generate 7 + 5f if I write the expression (2+f) (3+f). I always want f^2 to be computed as 1 (or any other value I assign to it) but for f to be a special undefined symbol. If I define f^2:=1 I get a Tag Power is protected error message.
I am a Mathematica newbie, self taught, so please try to answer this in as elementary fashion as possible.
For the record, I am trying to define Clifford algebra operations in n-dimensional space-time and being able to make an assignment like this would tremendously simplify the task.
Generalized to all symbols e1,e2,e3,...,en
x = (a + a1 e1 + a2 e2 + a3 e3 + a4 e1 e2 - a5 e1 e3 + a6 e2 e3 +
a7 e1 e2 e3);
y = (b + b1 e1 + b2 e2 + b3 e3 + b4 e1 e2 - b5 e1 e3 + b6 e2 e3 +
b7 e1 e2 e3);
ReplaceAll[
Expand[x y],
Power[e_, 2] /; First[Characters[ToString[e]]] === "e" -> 1
]
This way, which I have just learned from @Edmund, is more elegant:
Expand[(2 + e1) (3 + e2)] /. Power[s_Symbol, 2] /; StringStartsQ["e"]@SymbolName[s] -> 1
6 + 3 e1 + 2 e2 + e1 e2
ReplaceAll[Expand[(2 + f) (3 + f)], Power[f, 2] -> 1]
7 + 5 f

Jface tableviewer multi sort columns

I am implementing a TableViewer that can sort values depending on their column order.
e.g. column1-column2-columnX
sorts the rows first on the values of column 1, then column 2, then column X, and so on.
Therefore I want to use a ColumnViewerSorter, especially the method
"int doCompare(Viewer viewer, Object e1, Object e2);"
Inside this method I want to compare against other TableViewer rows/cells. The difficulty is that the JFace TableViewer sorts in the view only, so I have to "ask" the TableViewer itself for the current value of e.g. "column 1, row 20".
Using the function "viewer.getElementAt(index)" inside "doCompare" would be OK, but inside doCompare I have no reference to the positions of the objects e1 and e2 in the TableViewer.
How could I achieve that?
Thank you very much in advance for helping me.
best regards,
Malcom
You could iterate through all items in the table viewer and see where the objects e1 and e2 are, of course.
BUT... and I hope I understand your problem correctly... why do you want to implement multisorting?
Let's say you have 3 columns:
Col1 Col2 Col3
-------------------------
a2 b1 c4
a1 b2 c1
a2 b1 c3
To obtain the sorting order Col1-Col2-Col3, the user can click on Col3, then on Col2, and in the end Col1:
Col1 Col2 _Col3_
-------------------------
a1 b2 c1
a2 b1 c3
a2 b1 c4
Col1 _Col2_ Col3
-------------------------
a2 b1 c3
a2 b1 c4
a1 b2 c1
_Col1_ Col2 Col3
-------------------------
a1 b2 c1
a2 b1 c3
a2 b1 c4
This might not be the best example, but to obtain "multi sorting", the user just has to sort the desired columns in the opposite order.
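The click sequence above works because each sort keeps the previous order among equal values, i.e. it behaves like a stable sort. A small Python sketch of that idea (Python's list.sort is stable; the helper name multi_sort is made up, and the sketch does not model how a JFace viewer applies its sorter):

```python
from operator import itemgetter

rows = [("a2", "b1", "c4"),
        ("a1", "b2", "c1"),
        ("a2", "b1", "c3")]


def multi_sort(rows, column_order):
    """Sort by several columns via repeated stable sorts, least significant column first."""
    result = list(rows)
    for col in reversed(column_order):
        result.sort(key=itemgetter(col))
    return result


# Col1-Col2-Col3 order: sort by Col3, then Col2, then Col1
print(multi_sort(rows, [0, 1, 2]))
# [('a1', 'b2', 'c1'), ('a2', 'b1', 'c3'), ('a2', 'b1', 'c4')]
```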

Resources