A good randomizer for puzzle-15 - algorithm

I have implemented a puzzle 15 for people to compete online. My current randomizer works by starting from the good configuration and moving tiles around for 100 moves (arbitrary number)
Everything is fine, however, once in a little while the tiles are shuffled too easy and it takes only a few moves to solve the puzzle, therefore the game is really unfair for some people reaching better scores in a much higher speed.
What would be a good way to randomize the initial configuration so it is not "too easy"?

You can generate a completely random configuration (that is solvable) and then use some solver to determine the optimal sequence of moves. If the sequence is long enough for you, good, otherwise generate a new configuration and repeat.
Update & details
There is an article on Wikipedia about the 15-puzzle and when it is (and isn't) solvable. In short, if the empty square is in the lower-right corner, then the puzzle is solvable if and only if the number of inversions (an inversion is a swap of two elements in the sequence, not necessarily adjacent elements) with respect to the goal permutation is even.
You can then easily generate a solvable start state by doing an even number of inversions, which may lead to a not-so-easy-to-solve state far quicker than by doing regular moves, and it is guaranteed that it will remain solvable.
In fact, you don't need to use a search algorithm as I mentioned above, but an admissible heuristic. Such one always underestimates never overestimates the number of moves needed to solve the puzzle, i.e. you are guaranteed that it will not take less moves that the heuristic tells you.
A good heuristic is the sum of manhattan distances of each number to its goal position.
Summary
In short, a possible (very simple) algorithm for generating starting positions might look like this:
1: current_state <- goal_state
2: swap two arbitrary (randomly selected) pieces
3: swap two arbitrary (randomly selected) pieces again (to ensure solvability)
4: h <- heuristic(current_state)
5: if h > desired threshold
6: return current_state
7: else
8: go to 2.
To be absolutely certain about how difficult a state is, you need to find the optimal solution using some solver. Heuristics will give you only an estimate.

I would do this
start from solution (just like you did)
make valid turn in random direction
so you must keep track where the gap is and generate random direction (N,E,S,W) and do the move. I think this part you have done too.
compute the randomness of your placements
So compute some coefficient dependent on the order of the array. So ordered (solved) solutions will have low values and random will have high values. The equation for the coefficiet however is a matter of trial and error. Here some ideas what to use:
correlation coefficient
sum of average difference of value and its neighbors
1 2 4
3 6 5
9 8 7
coeff(6)= (|6-3|+|6-5|+|6-2|+|6-8|)/4
coeff=coeff(1)+coeff(2)+...coeff(15)
abs distance from ordered array
You can combine more approaches together. You can divide this to separated rows and columns and then combine the sub coefficients together.
loop #2 unit coefficient from #3 is high enough (treshold)
The treshold can be used also to change the difficulty.

Related

Minimal data structure to prevent 2D-grid traveler from repeat itself

I'm sorry if this is a duplicate of some thread, but I'm really not sure how to describe the question.
I'm wondering what is the minimal data structure to prevent 2D-grid traveler from repeating itself (i.e. travel to some point it already traveled before). The traveler can only move horizontally or vertically 1 step each time. For my special case (below), the 2D-grid is actually a lower-left triagle where one coordinate never exceeds another.
For example, with 1D case, this can be simply done by recording the direction of last travel. If direction changes, it's repeating itself.
For 2D case it becomes complicated. The most trivial way would be creating a list recording the points traveled before, but I'm wondering are there more efficient ways to do that?
I'm implementing a more-or-less "4-finger" algorithm for 4-sum where the 2 fingers in the middle moves in two directions (namely i, j, k, and l):
i=> <=j=> <=k=> <=l
1 2 3 ... 71 72 ... 123 124 ... 201 202 203
The directions fingers travel are decided (or suggested) by some algorithm but might lead to forever-loop. Therefore, I have to force not to take some suggestion if the 2 fingers in the middle starts to repeat history position.
EDIT
Among these days, I found 2 solutions. None of them is ideal solution to this problem, but they're at least somewhat usable:
As #Sorin mentioned below, one solution would be saving a bit array representing state of all cells. For the triangular-grid example here, we can even condense the array to cut memory cost by half (though requiring k^2 time to compute the bit position where k is the degree of freedom i.e. 2 here. A standard array would use only linear time).
Another solution would be directly avoid backward-travelling. Set up the algorithm such that j and k only move in one direction (this is probably greedy).
But still since the 2D-grid traveler have the nice property that it moves along axis 1 step each time, I'm wondering are there more "specialized" representation
for this kind of movement.
Thanks for your help!
If you are looking for optimal lookup complexity, then a hashset is the best thing. You need O(N) memory but all lookups & insertions will be O(1).
If it's often that you visit most of the cells then you can even skip the hash part and store a bit array. That is store one bit for every cell and just check if the corresponding bit is 0 or 1. This is much more compact in memory (at least 32x, one bit vs. one int, but likely more as you also skip storing some pointers internal to the datastructure, 64 bits).
If this still take too much space, you could use a bloom filter (link), but that will give you some false positives (tells you that you've visited a cell, but in fact you didn't). If that's something you can live with the space savings are fairly huge.
Other structures like BSP or Kd-trees could work as well. Once you reach a point where everything is either free or occupied (ignoring the unused cells in the upper triangle) you can store all that information in a single node.
This is hard to recommend because of it's complexity and that it will likely also use O(N) memory in many cases, but with a larger constant. Also all checks will be O(logN).

Markov chains and Random walks on top of biological data

I'm coming from biology's field and thus I have some difficulties in understanding (intuitively?) some of the ideas of that paper. I really tried my best to decipher it step by step by using a lot of google and youtube, but now I feel, it's the time to refer to the professionals in that field.
Before filling out the whole universe with (unordered) questions, let me put the whole thing down and try to introduce you to the subject while at the same time explain to you what I got so far from my research on that.
Microarrays
For those that do not have any idea of what this is, you can imagine, that it is literally an array (matrix) where each cell of it contains a probe for a specific gene. Making the long story short, by the end of the microarray experiment, you have a matrix (in computational terms) with each column representing a sample, each line a different gene while the contents of the matrix represent the expression values of the genes for each sample.
Pathways
In biology pathway / gene-set they call a set of genes that interact with each other forming a small network responsible for a specific function.These pathways are not isolated but they talk/interact with each other too. What that paper does on the first hand, is to expand the initial pathway (let us call it target pathway), by including some other genes from other pathways that might interact with that.
Procedure
1.
Let's assume now that we have a matrix G x S. Where G for genes and S for Samples. We construct a gene co-expression network (G x G) using as weights the Pearson's correlation coefficients between genes' pairs (a). This could also be represented as an undirected weighted graph. .
2.
For each gene (row OR column) we calculate the weighted degree (d) which is nothing more than the sum of all correlation coefficients of that gene.
3.
From the two previous matrices, they construct the transition matrix producing the probabilities (P) to transit from one gene to another by using the
formula
Q1. Why do they call this transition probability? Is there any intuitive way to see this as a probability in the biological context?
4.
Since we have the whole transition matrix, we can define a subnetwork of the initial one, that we want to expand it and it consisted out of let's say 15 genes. In that step, they used formula number 3 (on the paper) which transforms the values of the initial transition matrix as it says. They set the probability of 1 on the nodes that are part of the selected subnetwork because they define them as absorbing states.
Q2. In that same formula (3), I cannot understand what the second condition does. When should the probability be 0? Intuitively, in my opinion, all nodes that didn't exist in subnetwork, should have the P_ij value as a probability.
5.
After that, the newly constructed transition matrix is showed at formula (4) in the paper and I managed to understand it using this excellent article.
6.
Here is where everything is getting more blur for me and where I need the most of the help. What I imagine at that step, is that the algorithm starts randomly from one node and keep walking around the network. In order to construct a relevance function (What that exactly means?), they firstly calculate a probability called joint probability of visiting one node/edge E(i,j) and noted as :
From the other hand they seem to calculate another probability called probability of a walk of length L starting in x and denoted as :
7.
In the next step, they divide the previously calculated probabilities and calculate the number of times a random walk starts in x using the transition from i to j that I don't really understand what this means.
After that step, I lost their reasoning at all :-P.
I'm not expecting an expert to come open my mind and give me understand that procedure. What I'm expecting is some guidelines, hints, ideas, useful resources or more intuitive approaches to understanding the whole procedure. Then when I fully understand it I will try to implement it on R or python.
So any idea / critics is welcome.
Thanks.

Mutually Overlapping Subset of Activites

I am prepping for a final and this was a practice problem. It is not a homework problem.
How do I go about attacking this? Also, more generally, how do I know when to use Greedy vs. Dynamic programming? Intuitively, I think this is a good place to use greedy. I'm also thinking that if I could somehow create an orthogonal line and "sweep" it, checking the #of intersections at each point and updating a global max, then I could just return the max at the end of the sweep. I'm not sure how to plane sweep algorithmically though.
a. We are given a set of activities I1 ... In: each activity Ii is represented by its left-point Li and its right-point Ri. Design a very efficient algorithm that finds the maximum number of mutually overlapping subset of activities (write your solution in English, bullet by bullet).
b. Analyze the time complexity of your algorithm.
Proposed solution:
Ex set: {(0,2) (3,7) (4,6) (7,8) (1,5)}
Max is 3 from interval 4-5
1) Split start and end points into two separate arrays and sort them in non-decreasing order
Start points: [0,1,3,4,7] (SP)
End points: [2,5,6,7,8] (EP)
I know that I can use two pointers to sort of simulate the plane sweep, but I'm not exactly sure how. I'm stuck here.
I'd say your idea of a sweep is good.
You don't need to worry about planar sweeping, just use the start/end points. Put the elements in a queue. In every step take the smaller element from the queue front. If it's a start point, increment current tasks count, otherwise decrement it.
Since you don't need to point which tasks are overlapping - just the count of them - you don't need to worry about specific tasks duration.
Regarding your greedy vs DP question, in my non-professional opinion greedy may not always provide valid answer, whereas DP only works for problem that can be divided into smaller subproblems well. In this case, I wouldn't call your sweep-solution either.

Ask for an algorithm

I want to know what algorithm the following problem can refer to. I guess it can be represented in CSP but the actions are randomized.
Assume I am playing monopoly. I can choose either 1, 2, or 3 dice for movement in each round. My goal is to skip other players' buildings and also go to a specific range of grids. What is a good algorithm to
select number of dice in each round
subject to
1. minimize number of rounds
2. skip some grids
3. move to some grids
3 dice is a pretty small number, so brute force should work well. Assign a value to each square (maybe -$50 to a property with rent $50 if an opponent owns it), then compute the expected value of each roll (1/6 of each of the next six squares for 1 die, 1/36 of two squares ahead + 1/18 of three squares ahead + ... + 1/36 of 12 squares ahead, etc.).
Sounds like you need some sort of combination of A* or Dijkstra and MiniMax (atleast if you want have a better result than just what can be derived from the current state).
You need to assign weighting (risk or priorities) to each of your constraints, and this depends on how risk averse you are. You would assign a positive value for a benefit and a negative value for a penalty. So, for example, element 1. would always suggest using 3 dice, but you need to assign a relative benefit for each of the 3 options. Then for the other 2 you would, for each choice of dice calculate the probability of landing on a beneficial grid and similarly a penalty grid, multiple these by your weighting, and add them all together, choosing the most positive outcome. But note that different weightings will give different answers. If you give option 1 an absurdly high weighting it will always chose 3 dice. Otherwise any outcome is possible.

Sorting a list of numbers with modified cost

First, this was one of the four problems we had to solve in a project last year and I couldn’t find a suitable algorithm so we handle in a brute force solution.
Problem: The numbers are in a list that is not sorted and supports only one type of operation. The operation is defined as follows:
Given a position i and a position j the operation moves the number at position i to position j without altering the relative order of the other numbers. If i > j, the positions of the numbers between positions j and i - 1 increment by 1, otherwise if i < j the positions of the numbers between positions i+1 and j decreases by 1. This operation requires i steps to find a number to move and j steps to locate the position to which you want to move it. Then the number of steps required to move a number of position i to position j is i+j.
We need to design an algorithm that given a list of numbers, determine the optimal (in terms of cost) sequence of moves to rearrange the sequence.
Attempts:
Part of our investigation was around NP-Completeness, we make it a decision problem and try to find a suitable transformation to any of the problems listed in Garey and Johnson’s book: Computers and Intractability with no results. There is also no direct reference (from our point of view) to this kind of variation in Donald E. Knuth’s book: The art of Computer Programing Vol. 3 Sorting and Searching. We also analyzed algorithms to sort linked lists but none of them gives a good idea to find de optimal sequence of movements.
Note that the idea is not to find an algorithm that orders the sequence, but one to tell me the optimal sequence of movements in terms of cost that organizes the sequence, you can make a copy and sort it to analyze the final position of the elements if you want, in fact we may assume that the list contains the numbers from 1 to n, so we know where we want to put each number, we are just concerned with minimizing the total cost of the steps.
We tested several greedy approaches but all of them failed, divide and conquer sorting algorithms can’t be used because they swap with no cost portions of the list and our dynamic programing approaches had to consider many cases.
The brute force recursive algorithm takes all the possible combinations of movements from i to j and then again all the possible moments of the rest of the element’s, at the end it returns the sequence with less total cost that sorted the list, as you can imagine the cost of this algorithm is brutal and makes it impracticable for more than 8 elements.
Our observations:
n movements is not necessarily cheaper than n+1 movements (unlike swaps in arrays that are O(1)).
There are basically two ways of moving one element from position i to j: one is to move it directly and the other is to move other elements around i in a way that it reaches the position j.
At most you make n-1 movements (the untouched element reaches its position alone).
If it is the optimal sequence of movements then you didn’t move the same element twice.
This problem looks like a good candidate for an approximation algorithm but that would only give us a good enough answer. Since you want the optimal answer, this is what I'd do to improve on the brute force approach.
Instead of blindly trying every permutations, I'd use a backtracking approach that would maintain the best solution found and prune any branches that exceed the cost of our best solution. I would also add a transposition table to avoid redoing searches on states that were reached by previous branches using different move permutations.
I would also add a few heuristics to explore moves that are more likely to reach good results before any other moves. For example, prefer moves that have a small cost first. I'd need to experiment before I can tell which heuristics would work best if any.
I would also try to find the longest increasing subsequence of numbers in the original array. This will give us a sequence of numbers that don't need to be moved which should considerably cut the number of branches we need to explore. This also greatly speeds up searches on list that are almost sorted.
I'd expect these improvements to be able to handle lists that are far greater than 8 but when dealing with large lists of random numbers, I'd prefer an approximation algorithm.
By popular demand (1 person), this is what I'd do to solve this with a genetic algorithm (the meta-heuristique I'm most familiar with).
First, I'd start by calculating the longest increasing subsequence of numbers (see above). Every item that is not part of that set has to be moved. All we need to know now is in what order.
The genomes used as input for the genetic algorithm, is simply an array where each element represents an item to be moved. The order in which the items show up in the array represent the order in which they have to be moved. The fitness function would be the cost calculation described in the original question.
We now have all the elements needed to plug the problem in a standard genetic algorithm. The rest is just tweaking. Lots and lots of tweaking.

Resources