Functional data structure

I'm currently trying to make a non-trivial calculator like Maple, Wolfram Alpha and the like, just for fun. But I have made the constraint that it has to be in a pure strict functional language. That means no lazy evaluation and no mutable structures like arrays.
The question is simply: what would be an efficient data structure for vectors and matrices? The "easy" go-to answer would of course be lists, but I find them highly inefficient when it comes to products of matrices. To formalize even more, the vectors and matrices should be of arbitrary size.

You can leave it abstract by representing vectors and matrices as a product type (e.g. record or tuple) of the dimensions and a function from index (or row and column) to element value. Note that you could also accumulate symbolic vector-matrix expressions this way and simplify them before evaluation in order to eliminate as many temporaries as possible.
If your vectors and matrices are sparse then you might want to use a dictionary for the concrete representation. If they are dense then you might want to use an immutable array.
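A minimal sketch of this representation in OCaml, a strict functional language (all names here are mine for illustration, not from any particular library):

type matrix = { rows : int; cols : int; get : int -> int -> float }

(* Product type of dimensions and an index function, as described
   above. Multiplication just composes index functions; no element
   is computed until it is demanded. *)
let mul a b =
  assert (a.cols = b.rows);
  { rows = a.rows;
    cols = b.cols;
    get =
      (fun i j ->
        let rec sum k acc =
          if k = a.cols then acc
          else sum (k + 1) (acc +. (a.get i k *. b.get k j))
        in
        sum 0 0.0) }

(* Force into a dense concrete representation. OCaml arrays are
   mutable; a pure language would use its immutable-array type here,
   or a dictionary in the sparse case. *)
let force m =
  let data =
    Array.init m.rows (fun i -> Array.init m.cols (fun j -> m.get i j))
  in
  { m with get = (fun i j -> data.(i).(j)) }

Note that a chain of muls recomputes the inner sums on every element access, so in practice you would force intermediate results, or simplify the accumulated symbolic expression first, as suggested above.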

Thanks Jon.
I found a better structure, or rather a similar one.
I use a binary tree as the structure, where every left child is the subtree whose root holds the next entry of the matrix column, and the right child is the next node in the row. So it would look like
1 -- 2 -- 3
|    |    |
4 -- 5 -- 6
|    |    |
7 -- 8 -- 9
where | points to the left child and -- to the right child of the tree. It is even possible to avoid representing the same node twice.
So if we have the matrix
1 2 3
4 5 6
7 8 9
where Node(Left, 1, Right) would be the root of the tree, and Left is the submatrix
4 5 6
7 8 9
where Node(_, 4, _) is the root of this matrix and Right is the submatrix
2 3
5 6
8 9
with root Node(_, 2, _).
In the worst case this is as good as arrays if we analyse the runtime.
In some cases it is faster, for example if we wish to get the submatrix
5 6
8 9
We simply go one step left (down) and then one step right from the root, and we have the whole subtree.
We get the same persistence properties we have with a singly linked list: we can create a new matrix or vector (a 1 x m or m x 1 matrix) from an existing one simply by adding the new nodes and setting their children to the corresponding nodes in the old matrix, and still have the old matrix.
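In OCaml, for example, the structure and a sharing construction could look like this (a sketch; Nil marks the edge of the matrix and all names are my own):

type 'a mat =
  | Nil
  | Node of 'a mat * 'a * 'a mat   (* down (left child), value, right child *)

(* Build one row of nodes, given the already-built nodes of the row
   below; node j points down to the j-th node below and right to its
   own successor, so substructure is shared, never duplicated. *)
let rec build_row vals below =
  match vals with
  | [] -> []
  | v :: vs ->
      let down, below' =
        match below with [] -> Nil, [] | d :: ds -> d, ds
      in
      let rest = build_row vs below' in
      let right = match rest with [] -> Nil | n :: _ -> n in
      Node (down, v, right) :: rest

let of_rows rows =
  let rec build = function
    | [] -> []
    | r :: rs -> build_row r (build rs)
  in
  match build rows with [] -> Nil | top :: _ -> top

(* Element access walks i steps down, then j steps right: O(i + j). *)
let rec get m i j =
  match m with
  | Nil -> invalid_arg "out of bounds"
  | Node (down, v, right) ->
      if i > 0 then get down (i - 1) j
      else if j > 0 then get right i (j - 1)
      else v

let m = of_rows [ [1; 2; 3]; [4; 5; 6]; [7; 8; 9] ]

(* The submatrix [5 6; 8 9]: one step down, then one step right. *)
let sub = match m with Node (Node (_, _, s), _, _) -> s | _ -> Nil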

Related

Algorithm to find largest identical-row square in matrix

I have a matrix of 100x100 size and need to find the largest set of rows and columns that create a square having equal rows. Example:
  A B C D E F            C D E
a 0 1 2 3 4 5          a 2 3 4
b 2 9 7 9 8 2
c 9 0 6 8 9 7    ==>
d 8 9 2 3 4 8          d 2 3 4
e 7 2 2 3 4 5          e 2 3 4
f 0 3 6 8 7 2
Currently I am using this algorithm:
candidates = []  // element type is {rows, cols}
foreach row
    foreach col
        candidates[] = {[row], [col]}
do
    retval = candidates.first
    foreach candidates as candidate
        foreach newRow > candidate.rows.max
            foreach newCol > candidate.cols.max
                // compare matrix cells in candidate to newRow and newCol
                if (newCandidateHasEqualRows)
                    newCandidates[] = {candidate.rows+newRow, candidate.cols+newCol}
    candidates = newCandidates
while candidates.count
return retval
Has anyone else come across a problem similar to this? And is there a better algorithm to solve it?
Here's the NP-hardness reduction I mentioned, from biclique. Given a bipartite graph, make a matrix with a row for each vertex in part A and a column for each vertex in part B. For every edge that is present, put a 0 in the corresponding matrix entry. Put a unique positive integer in each other matrix entry. For all s > 1, there is a K(s,s) subgraph if and only if there is a square of size s (which necessarily is all zeros).
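For concreteness, a small sketch of that construction in OCaml (hypothetical names; edge i j says whether vertex i of part A is adjacent to vertex j of part B):

(* Build the reduction matrix: 0 where an edge is present, a unique
   positive integer everywhere else. *)
let reduction_matrix na nb edge =
  let counter = ref 0 in
  Array.init na (fun i ->
      Array.init nb (fun j ->
          if edge i j then 0
          else begin incr counter; !counter end))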
Given a fixed set of rows, the optimal set of columns is easily determined. You could try the Apriori algorithm on sets of rows, where a set of rows is considered frequent iff there exist at least as many columns that, together with the rows, form a valid square.
I've implemented a branch and bound solver for this problem in C++ at http://pastebin.com/J1ipWs5b. To my surprise, it actually solves randomly-generated puzzles of size up to 100x100 quite quickly: on one problem with each matrix cell chosen randomly from 0-9, an optimal 4x4 solution is found in about 750ms on my old laptop. As the range of cell entries is reduced down to just 0-1, the solution times get drastically longer -- but still, at 157s (for the one problem I tried, which had an 8x8 optimal solution), this isn't terrible. It seems to be very sensitive to the size of the optimal solution.
At any point in time, we have a partial solution consisting of a set of rows that are definitely included, and a set of rows that are definitely excluded. (The inclusion status of the remaining rows is yet to be determined.) First, we pick a remaining row to "try". We try including the row; then (if necessary; see below) we try excluding it. "Trying" here means recursively solving the corresponding subproblem. We record the set of columns that are identical across all rows that are definitely included in the solution. As rows are added to the partial solution, this set of columns can only shrink.
There are a couple of improvements beyond the standard B&B idea of pruning the search when we determine that we can't develop the current partial solution into a better (i.e. larger) complete solution than some complete solution we have already found:
A dominance rule. If there are any rows that can be added to the current partial solution without shrinking the set of identical columns at all, then we can safely add them immediately, and we never have to consider not adding them. This potentially saves a lot of branching, especially if there are many similar rows in the input.
We can reorder the remaining (not definitely included or definitely excluded) rows arbitrarily. So in particular, we can always pick as the next row to consider the row that would most shrink the set of identical columns: this (perhaps counterintuitive) strategy has the effect of eliminating bad combinations of rows near the top of the search tree, which speeds up the search a lot. It also happens to complement the dominance rule above, because it means that if there are ever two rows X and Y such that X preserves a strict subset of the identical columns that Y preserves, then X will be added to the solution first, which in turn means that whenever X is included, Y will be forced in by the dominance rule and we don't need to consider the possibility of including X but excluding Y.
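Here is a compact, illustrative sketch of the same scheme in OCaml (the actual solver linked above is C++; everything below, names included, is a simplified reconstruction). It returns only the size of the best square, tracking the included rows, the columns still identical across them, and the rows still undecided; it assumes a non-empty matrix:

let largest_square (m : int array array) =
  let ncols = Array.length m.(0) in
  (* columns of [cols] on which row [r] agrees with reference row [r0] *)
  let agree r0 r cols =
    List.filter (fun c -> m.(r0).(c) = m.(r).(c)) cols
  in
  let best = ref 0 in
  let rec go included cols remaining =
    let n = List.length included in
    let value = min n (List.length cols) in
    if value > !best then best := value;
    (* bound: prune unless including every undecided row could still win *)
    if min (n + List.length remaining) (List.length cols) > !best then
      match included with
      | [] ->
          (match remaining with
           | [] -> ()
           | r :: rs ->
               go [ r ] cols rs;   (* include the first row... *)
               go [] cols rs)      (* ...or exclude it *)
      | r0 :: _ ->
          (* dominance rule: rows that shrink nothing are added for free *)
          let free, rest =
            List.partition
              (fun r -> List.length (agree r0 r cols) = List.length cols)
              remaining
          in
          if free <> [] then go (free @ included) cols rest
          else (
            match rest with
            | [] -> ()
            | first :: others ->
                (* branch on the row that shrinks the column set the most *)
                let r =
                  List.fold_left
                    (fun a b ->
                      if List.length (agree r0 b cols)
                         < List.length (agree r0 a cols)
                      then b else a)
                    first others
                in
                let rest' = List.filter (( <> ) r) rest in
                go (r :: included) (agree r0 r cols) rest';
                go included cols rest')
  in
  go [] (List.init ncols (fun c -> c)) (List.init (Array.length m) (fun r -> r));
  !best

A real solver would also record the best row and column sets rather than just the size.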

Algorithm to find adjacent cells in a matrix

For example, consider a non-wraparound 4x4 matrix:
1 2 5 1
5 2 5 2
9 3 1 7
2 9 0 3
If I wanted to find the neighbours of, say, the 5 in the first row, they would be 2, 5, 1. Is there a more efficient solution than doing two for loops and adding a bunch of if conditions?
Yes. If you really need to find the neighbors, then you have the option of using graphs.
A graph is basically a set of vertex objects together with their adjacent vertices, each adjacency forming an edge. We can see here that 2 forms an edge with 5, 1 forms an edge with 5, etc.
If you're going to need to know the neighbors VERY frequently (because this is inefficient if you're not), then implement your own vertex class, wrapping the value (5) in a generic T val field. Keep a hashtable of adjacent vertices and their respective distances (1 in this case; if you need to find the neighbors of 2, then you're going to need to assign those as well) by calling add(vertex, distance) on the hashtable.
Later on, simply iterate through the hashtable for the neighbors.
However, for an array this simple, there isn't much overhead in just doing a for loop and using "a bunch of if statements". In reality you only need one boundary check per direction (of which there are 4), as in the sketch below.
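A minimal sketch of that loop in OCaml (names are mine): collect the values of the up-to-4 neighbours of cell (i, j), with one bounds check per direction:

let neighbours m i j =
  let rows = Array.length m and cols = Array.length m.(0) in
  List.filter_map
    (fun (di, dj) ->
      let i' = i + di and j' = j + dj in
      if 0 <= i' && i' < rows && 0 <= j' && j' < cols
      then Some m.(i').(j')
      else None)
    [ (-1, 0); (1, 0); (0, -1); (0, 1) ]

On the example matrix, neighbours m 0 2 returns [5; 2; 1]: the 5 below and the 2 and 1 beside it.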
Hopefully this helps.

Hungarian algorithm matching one set to itself

I'm looking for a variation on the Hungarian algorithm (I think) that will pair N people with each other, excluding self-pairs and reverse-pairs, where N is even.
E.g. given N0 - N5 and a matrix C of costs for each pair, how can I obtain the set of 3 lowest-cost pairs?
C = [ [ - 5 6 1 9 4 ]
      [ 5 - 4 8 6 2 ]
      [ 6 4 - 3 7 6 ]
      [ 1 8 3 - 8 9 ]
      [ 9 6 7 8 - 5 ]
      [ 4 2 6 9 5 - ] ]
In this example, the resulting pairs would be:
N0, N3
N1, N4
N2, N5
Having typed this out I'm now wondering if I can just increase the cost values in the "bottom half" of the matrix... or even better, remove them.
Is there a variation of Hungarian that works on a non-square matrix?
Or, is there another algorithm that solves this variation of the problem?
Increasing the values of the bottom half can result in an incorrect solution. You can see this because the corner coordinates of the upper half (in your example coordinates 0,1 and 5,6) will always be considered to be in the minimum X pairs, where X is the size of the matrix.
My Solution for finding the minimum X pairs
Take the standard Hungarian algorithm
You can set the diagonal to a value greater than the sum of the elements in the unaltered matrix; this step may allow you to speed up your implementation, depending on how your implementation handles nulls (a sketch of this step follows the steps below).
1) The first step of the standard algorithm is to go through each row, and then each column, reducing each row and column individually such that the minimum of each row and column is zero. This is unchanged.
The general principle of this solution is to mirror every subsequent step of the original algorithm around the diagonal.
2) The next step of the algorithm is to select rows and columns so that every zero is included within the selection, using the minimum number of rows and columns.
My alteration to the algorithm means that when selecting a row/column, also select the column/row mirrored around that diagonal, but count it as one row or column selection for all purposes, including counting the diagonal (which will be the intersection of these mirrored row/column selection pairs) as only being selected once.
3) The next step is to check if you have the right solution, which in the standard algorithm means checking if the number of rows and columns selected is equal to the size of the matrix; in your example, if six rows and columns have been selected.
For this variation however, when calculating when to end the algorithm treat each row/column mirrored pair of selections as a single row or column selection. If you have the right solution then end the algorithm here.
4) If the number of rows and columns is less than the size of the matrix, then find the smallest unselected element, and call it k. Subtract k from all uncovered elements, and add it to all elements that are covered twice (again, counting the mirrored row/column selection as a single selection).
My alteration of the algorithm means that when altering values, you will alter their mirrored values identically (this should happen naturally as a result of the mirrored selection process).
Then go back to step 2 and repeat steps 2-4 until step 3 indicates the algorithm is finished.
This will result in pairs of mirrored answers (these are coordinates; to get their values, refer back to the original matrix); you can safely delete half of each pair arbitrarily.
To alter this algorithm to find the minimum R pairs, where R is less than the size of the matrix, reduce the stopping point in step 3 to R. This alteration is essential to answering your question.
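A sketch of that preparation step in OCaml (names are mine; I assume the "-" diagonal entries are stored as some placeholder number): copy the cost matrix and set the diagonal to a value larger than the total of all entries, so a self-pair can never appear in a minimum-cost solution:

let block_diagonal c =
  let n = Array.length c in
  (* any value strictly greater than the sum of all entries will do *)
  let total =
    Array.fold_left (fun acc row -> Array.fold_left (+.) acc row) 0.0 c
  in
  Array.init n (fun i ->
      Array.init n (fun j -> if i = j then total +. 1.0 else c.(i).(j)))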
As @Niklas B stated, you are solving the weighted perfect matching problem.
Take a look at this; here is part of a document describing the primal-dual algorithm for weighted perfect matching.
Please read it all and let me know if it is useful to you.
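Since N = 6 here, it is also worth noting that for small N you do not need the full primal-dual/blossom machinery: dynamic programming over subsets (bitmask DP) finds the exact minimum-cost perfect matching in O(2^N * N) time. A sketch in OCaml (names are mine):

let min_pairing c =
  let n = Array.length c in
  let full = (1 lsl n) - 1 in
  let memo = Hashtbl.create 1024 in
  (* minimum cost of pairing up everyone not yet matched in [mask] *)
  let rec solve mask =
    if mask = full then 0.0
    else
      match Hashtbl.find_opt memo mask with
      | Some v -> v
      | None ->
          (* always pair the lowest-numbered unmatched person i ... *)
          let i = ref 0 in
          while mask land (1 lsl !i) <> 0 do incr i done;
          let i = !i in
          let best = ref infinity in
          (* ... with every later unmatched partner j *)
          for j = i + 1 to n - 1 do
            if mask land (1 lsl j) = 0 then
              best :=
                min !best
                  (c.(i).(j) +. solve (mask lor (1 lsl i) lor (1 lsl j)))
          done;
          Hashtbl.add memo mask !best;
          !best
  in
  solve 0

Because only partners j > i are ever considered, the "bottom half" of the matrix and the diagonal are never consulted, which sidesteps the question of altering them.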

Finding good heuristic for A* search

I'm trying to find the optimal solution for a little puzzle game called Twiddle (an applet with the game can be found here). The game has a 3x3 matrix with the numbers from 1 to 9. The goal is to bring the numbers into the correct order using the minimum number of moves. In each move you can rotate a 2x2 square either clockwise or counterclockwise.
I.e. if you have this state
6 3 9
8 7 5
1 2 4
and you rotate the upper left 2x2 square clockwise you get
8 6 9
7 3 5
1 2 4
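For reference, the move itself is easy to write down. A sketch in OCaml, with the state as a row-major int array of length 9 (the encoding and names are mine):

(* Rotate clockwise the 2x2 square whose top-left corner is (r, c),
   returning a new state. *)
let rotate_cw state r c =
  let s = Array.copy state in
  let idx i j = (3 * i) + j in
  s.(idx r c)             <- state.(idx (r + 1) c);        (* TL <- BL *)
  s.(idx r (c + 1))       <- state.(idx r c);              (* TR <- TL *)
  s.(idx (r + 1) (c + 1)) <- state.(idx r (c + 1));        (* BR <- TR *)
  s.(idx (r + 1) c)       <- state.(idx (r + 1) (c + 1));  (* BL <- BR *)
  s

rotate_cw [|6; 3; 9; 8; 7; 5; 1; 2; 4|] 0 0 yields the second state above.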
I'm using an A* search to find the optimal solution. My f() is simply the number of rotations needed. My heuristic function already leads to the optimal solution (if I modify it; see the notice at the end) but I don't think it's the best one you can find. My current heuristic takes each corner, looks at the number at the corner, and calculates the Manhattan distance to the position this number will have in the solved state (which gives me the number of rotations needed to bring the number to this position), and sums all these values. I.e., take the above example:
6 3 9
8 7 5
1 2 4
and this end state
1 2 3
4 5 6
7 8 9
then the heuristic does the following
6 is currently at index 0 and should be at index 5: 3 rotations needed
9 is currently at index 2 and should be at index 8: 2 rotations needed
1 is currently at index 6 and should be at index 0: 2 rotations needed
4 is currently at index 8 and should be at index 3: 3 rotations needed
h = 3 + 2 + 2 + 3 = 10
Additionally, if h is 0 but the state is not completely ordered, then h = 1.
But there is the problem that you rotate 4 elements at once, so there are rare cases where you can do two (or more) of these estimated rotations in one move. This means this heuristic overestimates the distance to the solution.
My current workaround is to simply exclude one of the corners from the calculation, which solves this problem at least for my test cases. I've done no research on whether it really solves the problem or whether this heuristic still overestimates in some edge cases.
So my question is: What is the best heuristic you can come up with?
(Disclaimer: This is for a university project, so this is a bit of homework. But I'm free to use any resource I can come up with, so it's okay to ask you guys. Also I will credit Stackoverflow for helping me ;) )
Simplicity is often most effective. Consider the nine digits (in the rows-first order) as forming a single integer. The solution is represented by the smallest possible integer i(g) = 123456789. Hence I suggest the following heuristic h(s) = i(s) - i(g). For your example, h(s) = 639875124 - 123456789.
You can get an admissible (i.e., not overestimating) heuristic from your approach by taking all numbers into account, and dividing by 4 and rounding up to the next integer.
To improve the heuristic, you could look at pairs of numbers. If e.g. in the top left the numbers 1 and 2 are swapped, you need at least 3 rotations to fix them both up, which is a better value than 1+1 from considering them separately. In the end, you still need to divide by 4. You can pair up numbers arbitrarily, or even try all pairs and find the best division into pairs.
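A sketch of the admissible heuristic in OCaml (state as a row-major int array of length 9; names are illustrative). Each rotation moves exactly four tiles, each by Manhattan distance 1, so the total Manhattan distance divided by 4 and rounded up never overestimates:

let heuristic state =
  let dist pos v =
    let goal = v - 1 in   (* tile v belongs at index v - 1 *)
    abs ((pos / 3) - (goal / 3)) + abs ((pos mod 3) - (goal mod 3))
  in
  let total = ref 0 in
  Array.iteri (fun pos v -> total := !total + dist pos v) state;
  (!total + 3) / 4   (* ceiling division by 4 *)

For the example state above the per-tile distances sum to 18, giving h = 5.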
All elements should be taken into account when calculating the distance, not just the corner elements. Imagine that all corner elements 1, 3, 7, 9 are at their home positions, but all the others are not.
It could be argued that elements that are neighbors in the final state should tend to move closer together during each step, so neighbor distance could also be part of the heuristic, but probably with weaker influence than the distance of elements to their final state.

Binary search on unsorted arrays?

I came across this document, Binary Search Revisited, where the authors prove/explain that binary search can be used on unsorted arrays (lists) as well. I haven't grokked much of the document on a first reading.
Have any of you already explored this?
I've just read the paper. To me the author uses the term binary search to refer to the bisection method used to find the zeros of a continuous function.
The examples in the paper are clearly inspired by problems like finding the zero of a function within an interval (with a translation on the y axis) or finding the max/min of a function in tabular data.
The arrays the essay considers are not randomly filled ones; there is a rule for constructing them (the rule is tied to the function used to produce them).
That said, it is a good chance to tinker with different algorithms belonging to a common family in order to find similarities and differences, and a good chance to expand your experience.
Definitely not a new concept or an undervalued one.
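For reference, the bisection method alluded to above, as a sketch in OCaml: given a continuous f where f lo and f hi have opposite signs, halve the interval until it is small enough:

let rec bisect f lo hi eps =
  let mid = (lo +. hi) /. 2.0 in
  if hi -. lo < eps then mid
  else if f lo *. f mid <= 0.0 then bisect f lo mid eps   (* zero in [lo, mid] *)
  else bisect f mid hi eps                                (* zero in [mid, hi] *)

For example, bisect (fun x -> (x *. x) -. 2.0) 0.0 2.0 1e-9 approximates the square root of 2. Binary search on a sorted array is the discrete analogue: the rule used to construct the array plays the role of continuity.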
Looking for 3 in this unsorted list with binary search or bisection:
L = 1 5 2 9 38 11 3
1- Take the midpoint of the whole list L: 9
3 < 9, so delete the right part of the list (38 11 3)
here you can already see that you will never find 3
2- Take the midpoint of the remaining list 1 5 2: 5
3 < 5, so delete the right part of the list (5 2)
1 remains
Result: 3 not found
Two remarks:
1- The binary or bisection algorithm treats right and left as an indication of the order,
so I have crudely applied the usual algorithm considering right is high and left is low.
If you consider the opposite, i.e. right is low and left is high, then trying to find 3 in this slightly different list also leads to "3 not found":
L' = 1 5 2 9 3 38 11
3 < 9, take the right part: 3 38 11
midpoint 38
3 < 38, take the right part: 11
3 not found
2- If you accept systematically reapplying the algorithm to the dropped part of the list, then it amounts to searching the element in a list of n elements, and the complexity will be O(n), exactly the same as scanning the whole list from beginning to end for your value.
The search time could be slightly shorter, though.
Why? Suppose you look one by one, from beginning to end, for the value 100000 in a sorted list. You will find it at the end of your list! :-)
If instead the list is unordered and your value 100000 happens to be, for example, exactly at the midpoint... bingo!
Binary search can also be implemented on a rotated sorted array/list, as in the sketch below.
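That works because one half is always sorted, so each probe can still discard half of the array. A sketch in OCaml, assuming a rotated sorted array without duplicates (names are mine):

let search_rotated a x =
  let rec go lo hi =
    if lo > hi then None
    else
      let mid = (lo + hi) / 2 in
      if a.(mid) = x then Some mid
      else if a.(lo) <= a.(mid) then
        (* left half [lo, mid] is sorted *)
        if a.(lo) <= x && x < a.(mid) then go lo (mid - 1)
        else go (mid + 1) hi
      else
        (* right half [mid, hi] is sorted *)
        if a.(mid) < x && x <= a.(hi) then go (mid + 1) hi
        else go lo (mid - 1)
  in
  go 0 (Array.length a - 1)

For example, search_rotated [|4; 5; 6; 7; 0; 1; 2|] 0 returns Some 4.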
