An alldifferent for tuples - prolog

I'm trying to solve a sudoku with the viewpoint that every number has 9 positions. This is the representation for my sudoku:
From the table you can read that number 5 has the following positions (Row,Col) in the sudoku: (2,8), (4,2), (6,5).
When I mention a row in my explanation, I mean a row like this:
For example, the above row is row 1.
What I have done is the following:
For every row check if all ROW-Values in that row are different using alldifferent from ic_global.
Do the same as above but then for the COLUMN-Values.
For every row, check if the square numbers are different (calculated using a row and col value each time), using alldifferent again.
The above things work fine and I get a solution for the sudoku, but not the correct one. This is because I have to check one more thing: every position must be different. In the current state of my solver I could get a solution that puts multiple numbers on the same position, e.g. 2 and 3 could both be at position (5,7), because I don't check whether all positions are different.
How would I tackle this problem?
I tried to get ALL the positions in one list in tuple form and then check if all tuples are different but I have been struggling for hours and I'm getting really desperate. I hope I can find a solution here.

As you already know, all_different/1 and related constraints work on integers. Also, in your case, you are actually interested in a special case of tuples, namely pairs consisting of rows and columns.
So, your question can actually be reduced to:
How can I injectively map pairs of integers to integers?
Suppose you have pairs of the form A-B, where both A and B are constrained to 1..9.
Such pairs can be put in one-to-one correspondence with integers in several ways. A very easy function that does this is 9×A + B. Think about it!
Thus, I recommend you map such positions to integers in this way or a similar one, and then post all_different/1 on these integers.
Exercise: Think about other possible mappings and their properties. Then generalize them to work on tuples.
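For instance, here is a quick sanity check of that mapping (in Python, purely to illustrate the arithmetic; the decode name is mine):

# (A, B) -> 9*A + B is injective for A, B in 1..9:
codes = {9 * a + b for a in range(1, 10) for b in range(1, 10)}
assert len(codes) == 81   # all 81 pairs map to distinct integers

# It is also invertible: code - 1 = 9*A + (B - 1), so divmod recovers A and B.
def decode(code):
    a, rem = divmod(code - 1, 9)
    return a, rem + 1

assert all(decode(9 * a + b) == (a, b)
           for a in range(1, 10) for b in range(1, 10))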

Related

Amount of arrays with unique numbers

I have been wondering if there is a better solution to this problem:
Let's assume that there are n containers (they might not have the same length). In each of them we have some numbers. How many n-length arrays can be created by taking one element from every container? The numbers in a newly formed array must be unique (e.g. (2,3,3) cannot be created, but (2,4,3) can).
Here is an example:
n=3
c1=(1,6,7)
c2=(1,6,7)
c3=(6,7)
The correct answer is 4, because we can create these four arrays: (1,6,7), (1,7,6), (6,1,7), (7,1,6).
Edit: None of the n containers contain duplicates and all the elements in the new arrays must have the same order as the order of the containers they belong to.
So my question is: Is there any better way to calculate the number of those arrays than just by generating every single possibility and checking if it has no repetitions?
You do not need to generate each possibility and then check whether it has repetitions: you can check before adding the would-be duplicate element, saving a lot of wasted work further down the line. But yes, given the requirement that
all the elements in the new arrays must have the same order as the order of the containers they belong to
you cannot simply count permutations, or combinations of m-over-n, which would have been much quicker (as there is a closed formula for those).
Therefore, a good algorithm is probably a backtracking approach with a set to avoid duplicates while building partial answers, counting the number of valid answers found.
The problem looks somewhat like counting possible answers to a 1-dimensional sudoku: choose one element from each region, ensuring no duplicates. In many cases there may be 0 answers: imagine n=4, c=[[1,2],[2,3],[3,1],[2,3]]. In general, if there are fewer than k unique elements across some subset of k containers, no answer is possible (here, four containers share only the three elements 1, 2, 3).
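A minimal sketch of that backtracking count in Python (the function name and container representation are mine):

def count_unique_arrays(containers):
    # Count n-length arrays that take one element from each container,
    # in container order, with all chosen elements distinct.
    def backtrack(i, used):
        if i == len(containers):
            return 1
        total = 0
        for x in containers[i]:
            if x not in used:      # prune before recursing, not afterwards
                used.add(x)
                total += backtrack(i + 1, used)
                used.remove(x)
        return total
    return backtrack(0, set())

print(count_unique_arrays([(1, 6, 7), (1, 6, 7), (6, 7)]))  # prints 4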

n-place mastermind variation algorithm

A few days ago I came across the following problem at a contest my uni was holding:
Given the history of guesses in a mastermind game using digits instead of colors, in the form of pairs (x, y) where x is the guess and y is how many digits were placed correctly, guess the correct number. Each input is guaranteed to have a solution.
Example for a 5-place game:
(90342, 2)
(70794, 0)
(39458, 2)
(34109, 1)
(51545, 2)
(12531, 1)
Should yield:
39542
Create an algorithm to correctly guess the result in an n-place mastermind given the history.
So the only idea I had was to track, for each position, the probability of each digit being correct, based on the scores of the given guesses, and then generate the most probable number, then the next one, and so on. For example, we'd have 9 being 40% likely for the first place (because the first guess has 2/5 = 40% correct) and 7 being impossible, and so on. Then we do the same for the other places in the number and finally generate the number with the highest probability to test against all the guesses.
The problem with this approach, though, is that generating the next possible number, and the next, and so on (as we probably won't score a home run in the first try) is really non-trivial (or at least I don't see an easy way of implementing this) and since this contest had something like a 90 minute timeframe and this wasn't the only problem, I don't think something so elaborate was the anticipated approach.
So how could one do it easier?
An approach that comes to mind is to write a routine that can generally filter an enumeration of combinations based on a particular try and its score.
So for your example, you would initially pick one of the most constrained tries (one of the ones with a score of 2) as a filter and then enumerate all combinations that satisfy it.
The output from that enumeration is then used as input to a filter run for the next unprocessed try, and so on, until the list of tries is exhausted.
The candidate try that comes out of the final enumeration is the solution.
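A possible sketch of this filtering in Python (the function name and digit-string representation are my own; for simplicity it processes the tries in the given order rather than starting with the most constrained one):

from itertools import product

def solve(history, places=5):
    # Start from every possible code, then filter by one try at a time.
    candidates = ["".join(p) for p in product("0123456789", repeat=places)]
    for guess, score in history:
        g = str(guess).zfill(places)
        candidates = [c for c in candidates
                      if sum(a == b for a, b in zip(c, g)) == score]
    return candidates

history = [(90342, 2), (70794, 0), (39458, 2),
           (34109, 1), (51545, 2), (12531, 1)]
print(solve(history))  # ['39542']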
Probability does not apply here. In this case a number is either right or wrong. There is no "partially right".
For 5 digits you can just test all 100,000 possible numbers against the given history and throw out the ones where the matches are incorrect. This approach becomes impractical for larger numbers at some point. You will be left with a list of numbers that meet the criteria. If there is exactly one in the list, then you have solved it.
Python code, where matches counts the matching digits of its two parameters:
for k in range(0, 100000):
    if (matches(k, 90342) == 2 and matches(k, 70794) == 0 and
            matches(k, 39458) == 2 and matches(k, 34109) == 1 and
            matches(k, 51545) == 2 and matches(k, 12531) == 1):
        print(k)
prints:
39542
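For completeness, a possible matches, assuming both arguments are compared as zero-padded 5-digit strings (this helper is not part of the original answer):

def matches(a, b):
    # Count positions where the 5-digit decimal forms of a and b agree.
    return sum(x == y for x, y in zip("%05d" % a, "%05d" % b))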

Custom heuristic in ECLiPSe CLP

Consider the following puzzle:
A cell is either marked or unmarked. Numbers along the right and bottom side of the puzzle denote the total sum for a certain row or column. Cells contribute (if marked) to the sum in its row and column: a cell in position (i,j) contributes i to the column sum and j to the row sum. For example, in the first row in the picture above, the 1st, 2nd and 5th cell are marked. These contribute 1 + 2 + 5 to the row sum (thus totalling 8), and 1 each to their column sum.
I have a solver in ECLiPSe CLP for this puzzle and I am trying to write a custom heuristic for it.
The easiest cells to start with, I think, are those for which the column and row hints are as low as possible. In general, the lower N is, the fewer possibilities there are to write N as a sum of natural numbers between 1 and N. In the context of this puzzle, it means the cell with the lowest column hint + row hint has the lowest odds of being wrong, so there is less backtracking.
In the implementation I have an NxN array that represents the board, and two lists of size N that represent the hints (the numbers to the side and on the bottom).
I see two options:
Write a custom selection predicate for search/6. However, if I understand correctly, it only receives two parameters, so there is no way to compute the row + column sum for a given variable: I would need to pass in the hint lists as well, i.e. four parameters in total.
Ignore search/6 and write my own labelling method. That is how I have it right now; see the code below.
It takes the board (the NxN array containing all decision variables) and both lists of hints, and returns a list containing all variables, sorted according to their row + column sum.
However, this could hardly be more cumbersome, as you can see. To be able to sort, I need to attach the sum to each variable, but in order to do that, I first need to convert each variable to a term that also contains its coordinates, so that I can convert back to the variable as soon as sorting is done...
lowest_hints_first(Board,RowArr,ColArr,Out) :-
    dim(Board,[N,N]),
    dim(OutBoard,[N,N]),
    ( multifor([I,J],[1,1],[N,N]), foreach(Term,Terms), param(RowArr,ColArr) do
        RowHint is ColArr[I],
        ColHint is RowArr[J],
        TotalSum is RowHint + ColHint,
        Term = field(I,J,TotalSum)
    ),
    sort(3,<,Terms,SortedTerms),          % sort based on TotalSum
    terms_to_vars(SortedTerms,Board,Out), % convert fields back to vars...
    ( foreach(Var,Out) do
        indomain(Var,max)
    ).

terms_to_vars([],_,[]).
terms_to_vars([field(I,J,_TotalSum)|RestTerms],Vars,[Out|RestOut]) :-
    terms_to_vars(RestTerms,Vars,RestOut),
    Out is Vars[I,J].
In the end this heuristic is barely faster than input_order. I suspect it's due to the awful way it's implemented. Any ideas on how to do it better? Or is my feeling that this heuristic should be a huge improvement incorrect?
I see you are already happy with the improvement suggested by Joachim; however, as you ask for further improvements to your heuristic, consider that there is only one way to get 0 as a sum, and likewise only one way to get 15.
There is only one way to get 1 or 14, one way to get 2 or 13, and two ways to get 3 or 12.
In general, if there are K ways to get sum N, there are also K ways to get 15-N: complementing a set of marked cells turns a sum of N into a sum of 15-N.
So the difficult sums are not the large ones; they are the middle ones.
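A quick way to see this, assuming a 5x5 board (so a fully marked row contributes 1+2+3+4+5 = 15), is to count the subsets of {1..5} that produce each sum:

from itertools import combinations

# ways[s] = number of subsets of {1..5} summing to s
ways = [0] * 16
for r in range(6):
    for subset in combinations(range(1, 6), r):
        ways[sum(subset)] += 1
print(ways)  # [1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 2, 2, 1, 1, 1]

The counts are symmetric (complementing a marked set maps sum N to 15-N) and peak in the middle.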

How to enumerate all states in the 8-puzzle?

I am solving the 8-puzzle. It is a problem which looks like this:
Image courtesy of https://ece.uwaterloo.ca/~dwharder/aads/Algorithms/N_puzzles/ (see that page for a more detailed description of the 8-puzzle). The user can move a square adjacent to the blank into the blank. The task is to restore the arrangement shown in the picture, starting from some arbitrary arrangement.
Now, of course, the state can be described as a permutation of 9 digits. In the case of the picture shown, the permutation is:
1 2 3 4 5 6 7 8 0
However, not all permutations are reachable from the shown configuration. Therefore, I have the following questions.
What is the number of permutations obtainable from the shown initial configuration by sliding tiles into the blank?
Call the answer to the above N. Now, I want a 1-1 mapping from integers from 1 to N to permutations. That is, I want to have a function that takes a permutation and returns an appropriate integer as well as a function that takes an integer and returns the permutation. The mapping has to be a bijection (i.e. an imperfect hash is not enough).
181440, i.e. 9!/2: exactly half of the 9! = 362880 permutations are reachable from the configuration shown.
Stick them in an array and sort it, e.g. lexicographically. Then converting integers to permutations is O(1), and going the other way is O(log n).
Well, if you just want to enumerate the different states that can be reached, you can do a depth-first search from your initial state. It is easy to generate the valid next states for a given state: for example, moving a tile down into the empty space is the same as swapping the 0 tile with the tile 3 positions before it in the permutation, if there is one. So you do a DFS and keep a hash set of all the permutations seen so far as your visited set; each permutation can be stored as an int or a string. There are only 9! = 362880 possible permutations, so this is cheap. If you need a 1-1 mapping to the integers, make the hash set a hash table and, every time you find a new state, add it at the next free index. You could also find the shortest solution by doing a breadth-first search instead and breaking when you find the solved state.
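A sketch of that search in Python, using BFS and giving each newly found state the next free integer (the tuple representation and function name are my own):

from collections import deque

def enumerate_states(start=(1, 2, 3, 4, 5, 6, 7, 8, 0)):
    # Map every permutation reachable by sliding tiles (0 = blank)
    # to a unique integer, in discovery order.
    index = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        z = state.index(0)                   # position of the blank
        r, c = divmod(z, 3)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < 3 and 0 <= nc < 3:  # neighbour is inside the grid
                nz = nr * 3 + nc
                s = list(state)
                s[z], s[nz] = s[nz], s[z]    # slide that tile into the blank
                t = tuple(s)
                if t not in index:
                    index[t] = len(index)    # next free integer
                    queue.append(t)
    return index

print(len(enumerate_states()))  # 181440

Keeping the states in a list in discovery order gives the inverse mapping from integers back to permutations.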

Find the "largest" dense sub matrix in a large sparse matrix

Given a large sparse matrix (say 10k+ by 1M+), I need to find a subset, not necessarily contiguous, of the rows and columns that forms a dense matrix (all non-zero elements). I want this submatrix to be as large as possible (not the largest sum, but the largest number of elements) within some aspect ratio constraints.
Are there any known exact or approximate solutions to this problem?
A quick scan on Google seems to give a lot of close-but-not-exactly results. What terms should I be looking for?
edit: Just to clarify, the submatrix need not be contiguous. In fact the row and column order is completely arbitrary, so adjacency is irrelevant.
A thought based on Chad Okere's idea:
1. Order the rows from largest count to smallest count (not necessary, but it might help performance)
2. Select two rows that have a "large" overlap
3. Add all other rows that won't reduce the overlap
4. Record that set
5. Add whatever row reduces the overlap by the least
6. Repeat at step 3 until the result gets too small
7. Start over at step 2 with a different starting pair
8. Continue until you decide the result is good enough
I assume you want something like this. You have a matrix like
1100101
1110101
0100101
You want columns 1,2,5,7 and rows 1 and 2, right? That submatrix would be 4x2 with 8 elements. Or you could go with columns 2,5,7 and rows 1,2,3, which would be a 3x3 matrix.
If you want an 'approximate' method, you could start with a single non-zero element, then go on to find another non-zero element and add its row and column to your list of rows and columns. At some point you'll run into a non-zero element whose row and column, if added to your collection, would mean the collection is no longer entirely non-zero.
So for the above matrix, if you added (1,1) and (2,2) you would have rows 1,2 and columns 1,2 in your collection. If you tried to add (3,7) it would cause a problem, because (3,1) is zero, so you couldn't add it. You could add (2,5) and (2,7) though, creating the 4x2 submatrix.
You would basically iterate until you can't find any more rows and columns to add. That gets you to a locally maximal solution. You could store the result and start again from another starting point (perhaps one that didn't fit into your current solution).
Then just stop when you can't find any more after a while.
That, obviously, would take a long time, but I don't know if you'll be able to do it any more quickly.
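A minimal sketch of that greedy growth in Python (0-indexed; the deterministic scan order stands in for "find another non-zero element"):

def grow(M, rows, cols):
    # Keep adding rows/columns whose intersection with the current
    # selection is all non-zero, until nothing more fits.
    changed = True
    while changed:
        changed = False
        for i in range(len(M)):
            if i not in rows and all(M[i][j] for j in cols):
                rows.add(i); changed = True
        for j in range(len(M[0])):
            if j not in cols and all(M[i][j] for i in rows):
                cols.add(j); changed = True
    return rows, cols

M = [[1,1,0,0,1,0,1],
     [1,1,1,0,1,0,1],
     [0,1,0,0,1,0,1]]
print(grow(M, {0}, {0}))  # ({0, 1}, {0, 1, 4, 6}), the 4x2 example above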
I know you aren't working on this anymore, but I thought someone might have the same question as me in the future.
So, after realizing this is an NP-hard problem (by reduction from MAX-CLIQUE) I decided to come up with a heuristic that has worked well for me so far:
Given an N x M binary/boolean matrix, find a large dense submatrix:
Part I: Generate reasonable candidate submatrices
Consider each of the N rows to be an M-dimensional binary vector, v_i, where i = 1 to N
Compute a distance matrix for the N vectors using the Hamming distance
Use the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) algorithm to cluster vectors
Initially, each of the v_i vectors is a singleton cluster. Step 3 above (clustering) gives the order in which the vectors should be combined into submatrices, so each internal node in the hierarchical clustering tree is a candidate submatrix.
Part II: Score and rank candidate submatrices
For each candidate submatrix, calculate D, the number of elements that remain after eliminating every column that contains one or more zeros.
Select the submatrix that maximizes D
I also had some considerations regarding the minimum number of rows that needed to be preserved from the initial full matrix, and I would discard any candidate submatrix that did not meet this criterion before selecting the submatrix with the maximum D value.
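A sketch of the whole heuristic using SciPy, where 'average' linkage is exactly UPGMA (the function name is my own, and A is assumed to be a dense 0/1 NumPy array small enough to cluster):

import numpy as np
from scipy.cluster.hierarchy import linkage

def best_dense_submatrix(A):
    n = A.shape[0]
    # Part I: UPGMA over Hamming distances between the row vectors.
    Z = linkage(A, method="average", metric="hamming")
    clusters = {i: [i] for i in range(n)}   # leaves are singleton clusters
    best_rows, best_score = None, -1
    for k, merge in enumerate(Z):
        a, b = int(merge[0]), int(merge[1])
        rows = clusters[a] + clusters[b]    # internal node = candidate submatrix
        clusters[n + k] = rows
        # Part II: drop every column containing a zero, score by element count.
        dense_cols = int(np.all(A[rows], axis=0).sum())
        score = len(rows) * dense_cols
        if score > best_score:
            best_rows, best_score = rows, score
    return best_rows, best_score

Any extra filters, such as a minimum number of rows, would go just before the score comparison.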
Is this a Netflix problem?
MATLAB or some other sparse matrix libraries might have ways to handle it.
Is your intent to write your own?
Maybe the 1D approach for each row would help you. The algorithm might look like this:
Loop over each row:
Find the index of the first non-zero element.
Find the index of the last non-zero element, and store both; the distance between them is that row's span between non-zero columns.
Sort the rows from largest to smallest span between non-zero columns.
At this point I start getting fuzzy (sorry, not an algorithm designer). I'd try looping over each row, lining up the indexes of the starting points, and looking for the maximum non-zero run of column indexes that I could find.
You don't specify whether or not the dense matrix has to be square. I'll assume not.
I don't know how efficient this is or what its Big-O behavior would be. But it's a brute force method to start with.
EDIT: This is NOT the same as the problem below. My bad. But based on the last comment below, it might be equivalent to the following:
Find the furthest vertically separated pair of zero points that have no zero point between them.
Find the furthest horizontally separated pair of zero points that have no zeros between them?
Then the horizontal region you're looking for is the rectangle that fits between these two pairs of points?
This exact problem is discussed in a gem of a book called "Programming Pearls" by Jon Bentley, and, as I recall, although there is a solution in one dimension, there is no easy answer for the 2-d or higher dimensional variants ...
The 1-D problem is, effectively, to find the largest sum of a contiguous subset of a list of numbers:
Iterate through the elements, keeping a running total that starts from some previous element, plus the maximum subtotal seen so far (and the start and end elements that generated it). At each element, if the running subtotal exceeds the maximum seen so far, the maximum and its end element are updated. If the running total goes below zero, the start element is reset to the current element and the running total is reset to zero.
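That 1-D algorithm (commonly attributed to Kadane) as a short Python sketch:

def max_subarray(xs):
    # Largest sum of a contiguous subset, plus the (start, end) indices.
    best, best_range = xs[0], (0, 0)
    running, start = 0, 0
    for i, x in enumerate(xs):
        if running < 0:            # a negative prefix can never help
            running, start = 0, i
        running += x
        if running > best:
            best, best_range = running, (start, i)
    return best, best_range

print(max_subarray([2, -5, 3, -1, 4, -2]))  # (6, (2, 4))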
The 2-D problem came from an attempt at a visual image processing algorithm, which was trying to find, within a stream of brightness values representing the pixels of a 2-color image, the "brightest" rectangular area within the image, i.e. the contained 2-D submatrix with the highest sum of brightness values, where "brightness" was measured as the difference between a pixel's brightness value and the overall average brightness of the entire image (so many elements had negative values).
EDIT: To look up the 1-D solution I dredged up my copy of the 2nd edition of this book, and in it, Jon Bentley says "The 2-D version remains unsolved as this edition goes to print..." which was in 1999.
