Sorting a list of numbers with modified cost - algorithm

First, this was one of the four problems we had to solve in a project last year and I couldn’t find a suitable algorithm so we handle in a brute force solution.
Problem: The numbers are in a list that is not sorted and supports only one type of operation. The operation is defined as follows:
Given a position i and a position j the operation moves the number at position i to position j without altering the relative order of the other numbers. If i > j, the positions of the numbers between positions j and i - 1 increment by 1, otherwise if i < j the positions of the numbers between positions i+1 and j decreases by 1. This operation requires i steps to find a number to move and j steps to locate the position to which you want to move it. Then the number of steps required to move a number of position i to position j is i+j.
We need to design an algorithm that given a list of numbers, determine the optimal (in terms of cost) sequence of moves to rearrange the sequence.
Attempts:
Part of our investigation was around NP-Completeness, we make it a decision problem and try to find a suitable transformation to any of the problems listed in Garey and Johnson’s book: Computers and Intractability with no results. There is also no direct reference (from our point of view) to this kind of variation in Donald E. Knuth’s book: The art of Computer Programing Vol. 3 Sorting and Searching. We also analyzed algorithms to sort linked lists but none of them gives a good idea to find de optimal sequence of movements.
Note that the idea is not to find an algorithm that orders the sequence, but one to tell me the optimal sequence of movements in terms of cost that organizes the sequence, you can make a copy and sort it to analyze the final position of the elements if you want, in fact we may assume that the list contains the numbers from 1 to n, so we know where we want to put each number, we are just concerned with minimizing the total cost of the steps.
We tested several greedy approaches but all of them failed, divide and conquer sorting algorithms can’t be used because they swap with no cost portions of the list and our dynamic programing approaches had to consider many cases.
The brute force recursive algorithm takes all the possible combinations of movements from i to j and then again all the possible moments of the rest of the element’s, at the end it returns the sequence with less total cost that sorted the list, as you can imagine the cost of this algorithm is brutal and makes it impracticable for more than 8 elements.
Our observations:
n movements is not necessarily cheaper than n+1 movements (unlike swaps in arrays that are O(1)).
There are basically two ways of moving one element from position i to j: one is to move it directly and the other is to move other elements around i in a way that it reaches the position j.
At most you make n-1 movements (the untouched element reaches its position alone).
If it is the optimal sequence of movements then you didn’t move the same element twice.

This problem looks like a good candidate for an approximation algorithm but that would only give us a good enough answer. Since you want the optimal answer, this is what I'd do to improve on the brute force approach.
Instead of blindly trying every permutations, I'd use a backtracking approach that would maintain the best solution found and prune any branches that exceed the cost of our best solution. I would also add a transposition table to avoid redoing searches on states that were reached by previous branches using different move permutations.
I would also add a few heuristics to explore moves that are more likely to reach good results before any other moves. For example, prefer moves that have a small cost first. I'd need to experiment before I can tell which heuristics would work best if any.
I would also try to find the longest increasing subsequence of numbers in the original array. This will give us a sequence of numbers that don't need to be moved which should considerably cut the number of branches we need to explore. This also greatly speeds up searches on list that are almost sorted.
I'd expect these improvements to be able to handle lists that are far greater than 8 but when dealing with large lists of random numbers, I'd prefer an approximation algorithm.
By popular demand (1 person), this is what I'd do to solve this with a genetic algorithm (the meta-heuristique I'm most familiar with).
First, I'd start by calculating the longest increasing subsequence of numbers (see above). Every item that is not part of that set has to be moved. All we need to know now is in what order.
The genomes used as input for the genetic algorithm, is simply an array where each element represents an item to be moved. The order in which the items show up in the array represent the order in which they have to be moved. The fitness function would be the cost calculation described in the original question.
We now have all the elements needed to plug the problem in a standard genetic algorithm. The rest is just tweaking. Lots and lots of tweaking.

Related

A good randomizer for puzzle-15

I have implemented a puzzle 15 for people to compete online. My current randomizer works by starting from the good configuration and moving tiles around for 100 moves (arbitrary number)
Everything is fine, however, once in a little while the tiles are shuffled too easy and it takes only a few moves to solve the puzzle, therefore the game is really unfair for some people reaching better scores in a much higher speed.
What would be a good way to randomize the initial configuration so it is not "too easy"?
You can generate a completely random configuration (that is solvable) and then use some solver to determine the optimal sequence of moves. If the sequence is long enough for you, good, otherwise generate a new configuration and repeat.
Update & details
There is an article on Wikipedia about the 15-puzzle and when it is (and isn't) solvable. In short, if the empty square is in the lower-right corner, then the puzzle is solvable if and only if the number of inversions (an inversion is a swap of two elements in the sequence, not necessarily adjacent elements) with respect to the goal permutation is even.
You can then easily generate a solvable start state by doing an even number of inversions, which may lead to a not-so-easy-to-solve state far quicker than by doing regular moves, and it is guaranteed that it will remain solvable.
In fact, you don't need to use a search algorithm as I mentioned above, but an admissible heuristic. Such one always underestimates never overestimates the number of moves needed to solve the puzzle, i.e. you are guaranteed that it will not take less moves that the heuristic tells you.
A good heuristic is the sum of manhattan distances of each number to its goal position.
Summary
In short, a possible (very simple) algorithm for generating starting positions might look like this:
1: current_state <- goal_state
2: swap two arbitrary (randomly selected) pieces
3: swap two arbitrary (randomly selected) pieces again (to ensure solvability)
4: h <- heuristic(current_state)
5: if h > desired threshold
6: return current_state
7: else
8: go to 2.
To be absolutely certain about how difficult a state is, you need to find the optimal solution using some solver. Heuristics will give you only an estimate.
I would do this
start from solution (just like you did)
make valid turn in random direction
so you must keep track where the gap is and generate random direction (N,E,S,W) and do the move. I think this part you have done too.
compute the randomness of your placements
So compute some coefficient dependent on the order of the array. So ordered (solved) solutions will have low values and random will have high values. The equation for the coefficiet however is a matter of trial and error. Here some ideas what to use:
correlation coefficient
sum of average difference of value and its neighbors
1 2 4
3 6 5
9 8 7
coeff(6)= (|6-3|+|6-5|+|6-2|+|6-8|)/4
coeff=coeff(1)+coeff(2)+...coeff(15)
abs distance from ordered array
You can combine more approaches together. You can divide this to separated rows and columns and then combine the sub coefficients together.
loop #2 unit coefficient from #3 is high enough (treshold)
The treshold can be used also to change the difficulty.

Mutually Overlapping Subset of Activites

I am prepping for a final and this was a practice problem. It is not a homework problem.
How do I go about attacking this? Also, more generally, how do I know when to use Greedy vs. Dynamic programming? Intuitively, I think this is a good place to use greedy. I'm also thinking that if I could somehow create an orthogonal line and "sweep" it, checking the #of intersections at each point and updating a global max, then I could just return the max at the end of the sweep. I'm not sure how to plane sweep algorithmically though.
a. We are given a set of activities I1 ... In: each activity Ii is represented by its left-point Li and its right-point Ri. Design a very efficient algorithm that finds the maximum number of mutually overlapping subset of activities (write your solution in English, bullet by bullet).
b. Analyze the time complexity of your algorithm.
Proposed solution:
Ex set: {(0,2) (3,7) (4,6) (7,8) (1,5)}
Max is 3 from interval 4-5
1) Split start and end points into two separate arrays and sort them in non-decreasing order
Start points: [0,1,3,4,7] (SP)
End points: [2,5,6,7,8] (EP)
I know that I can use two pointers to sort of simulate the plane sweep, but I'm not exactly sure how. I'm stuck here.
I'd say your idea of a sweep is good.
You don't need to worry about planar sweeping, just use the start/end points. Put the elements in a queue. In every step take the smaller element from the queue front. If it's a start point, increment current tasks count, otherwise decrement it.
Since you don't need to point which tasks are overlapping - just the count of them - you don't need to worry about specific tasks duration.
Regarding your greedy vs DP question, in my non-professional opinion greedy may not always provide valid answer, whereas DP only works for problem that can be divided into smaller subproblems well. In this case, I wouldn't call your sweep-solution either.

Optimized Algorithm: Fastest Way to Derive Sets

I'm writing a program for a competition and I need to be faster than all the other competitors. For this I need a little algorithm help; ideally I'd be using the fastest algorithm.
For this problem I am given 2 things. The first is a list of tuples, each of which contains exactly two elements (strings), each of which represents an item. The second is an integer, which indicates how many unique items there are in total. For example:
# of items = 3
[("ball","chair"),("ball","box"),("box","chair"),("chair","box")]
The same tuples can be repeated/ they are not necessarily unique.) My program is supposed to figure out the maximum number of tuples that can "agree" when the items are sorted into two groups. This means that if all the items are broken into two ideal groups, group 1 and group 2, what are the maximum number of tuples that can have their first item in group 1 and their second item in group 2.
For example, the answer to my earlier example would be 2, with "ball" in group 1 and "chair" and "box" in group 2, satisfying the first two tuples. I do not necessarily need know what items go in which group, I just need to know what the maximum number of satisfied tuples could be.
At the moment I'm trying a recursive approach, but its running on (n^2), far too inefficient in my opinion. Does anyone have a method that could produce a faster algorithm?
Thanks!!!!!!!!!!
Speed up approaches for your task:
1. Use integers
Convert the strings to integers (store the strings in an array and use the position for the tupples.
String[] words = {"ball", "chair", "box"};
In tuppls ball now has number 0 (pos 0 in array) , chair 1, box 2.
comparing ints is faster than Strings.
2. Avoid recursion
Recursion is slow, due the recursion overhead.
For example look at binarys search algorithm in a recursive implementatiion, then look how java implements binSearch() (with a while loop and iteration)
Recursion is helpfull if problems are so complex that a non recursive implementation is to complex for a human brain.
An iterataion is faster, but not in the case when you mimick recursive calls by implementing your own stack.
However you can start implementing using a recursiove algorithm, once it works and it is a suited algo, then try to convert to a non recursive implementation
3. if possible avoid objects
if you want the fastest, the now it becomes ugly!
A tuppel array can either be stored in as array of class Point(x,y) or probably faster,
as array of int:
Example:
(1,2), (2,3), (3,4) can be stored as array: (1,2,2,3,3,4)
This needs much less memory because an object needs at least 12 bytes (in java).
Less memory becomes faster, when the array are really big, then your structure will hopefully fits in the processor cache, while the objects array does not.
4. Programming language
In C it will be faster than in Java.
Maximum cut is a special case of your problem, so I doubt you have a quadratic algorithm for it. (Maximum cut is NP-complete and it corresponds to the case where every tuple (A,B) also appears in reverse as (B,A) the same number of times.)
The best strategy for you to try here is "branch and bound." It's a variant of the straightforward recursive search you've probably already coded up. You keep track of the value of the best solution you've found so far. In each recursive call, you check whether it's even possible to beat the best known solution with the choices you've fixed so far.
One thing that may help (or may hurt) is to "probe": for each as-yet-unfixed item, see if putting that item on one of the two sides leads only to suboptimal solutions; if so, you know that item needs to be on the other side.
Another useful trick is to recurse on items that appear frequently both as the first element and as the second element of your tuples.
You should pay particular attention to the "bound" step --- finding an upper bound on the best possible solution given the choices you've fixed.

Minimum ordering of a loop of values

Given a sequence like:
1,2,1,2,1,1,1,2,1,2,1,3,1,2,1,2,1,3
What is an efficient way to obtain the minimum ordering:
1,1,1,2,1,2,1,3,1,2,1,2,1,3,1,2,1,2
The brute force approach is obvious so please don't recommend it -- unless to provide a convincing proof that the brute force approach is the only way to do it!
More detail
I have an algorithm that generates a list of numbers. The output of the algorithm is a list / array, but logically the numbers represent a loop, where only the relative order of the elements matters. In order to store these loops for later comparison I want to put store them in such a way that the one-dimensional list of numbers stored represents the minimum ordering of the elements in the loop. A picture would be most helpful:
This loop describes the path around the T tetromino, where 1 is move forward, 2 is turn right, and 3 is turn left. It doesn't matter where you start or even which direction you go, following this sequence of 18 moves will get you a T tetromino. The output of the algorithm which produces this loop will return the elements with an arbitrary starting point and direction. So the returned array could be:
Arbitrary initial ordering:
1,2,1,2,1,1,1,2,1,2,1,3,1,2,1,2,1,3
There is one minimum ordering, though. It can be obtained from two different circuits, reflecting the fact that the T tetromino is symmetrical:
Minimum ordering:
1,1,1,2,1,2,1,3,1,2,1,2,1,3,1,2,1,2
Brute force
The obvious brute force method is to construct all possible orderings and take the minimum. My question is whether there is a more clever and efficient way of doing this.
This is a well-studied problem called the lexicographically minimal string rotation.
The better algorithms all run in O(n) time as opposed to O(n*n) for the naive algorithm.

Sort array in ascending order while minimizing "cost"

I'm taking comp 2210 (Data Structures) next semester and I've been doing the homework for the summer semester that is posted online. Until now, I've had no problems doing the assignments. Take a look at assignment 4 below, and see if you can give me a hint as to how to approach it. Please don't provide a complete algorithm, just an approach. Thanks!
A “costed sort” is an algorithm in which a sequence of values must be arranged in ascending order. The sort is
carried out by interchanging the position of two values one at a time until the sequence is in the proper order. Each
interchange incurs a cost, which is calculated as the sum of the two values involved in the interchange. The total
cost of the sort is the sum of the cost of the interchanges.
For example, suppose the starting
sequence were {3, 2, 1}. One possible
series of interchanges is
Interchange 1: {3, 1, 2} interchange cost = 0
Interchange 2: {1, 3, 2} interchange cost = 4
Interchange 3: {1, 2, 3} interchange cost = 5,
given a total cost of 9
You are to write a program that determines the minimal cost to arrange a specific sequence of numbers.
Edit: The professor does not allow brute forcing.
If you want to surprise your professor, you could use Simulated Annealing. Then again, if you manage that, you can probably skip a few courses :). Note that this algorithm will only give an approximate answer.
Otherwise: try a Backtracking algorithm, or Branch and Bound. These will both find the optimal answer.
What do you mean "brute forcing?" Do you mean "try all possible combinations and select the cheapest?" Just checking.
I think "branch and bound" is what you're looking for - check any source on algorithms. It is "like" brute force, except as you try a sequence of moves, as soon as that sequence of moves is less optimal than any other sequence of moves tried so far, you can abandon the sequence that got you to that point - the cost. This is one flavor of the "backtracking" mentioned above.
My preferred language for doing this would be Prolog but I'm weird.
Simulated Annealing is a PROBABLISTIC algorithm - if the solution space has local minima, then you may be trapped in one and get what you think is the right answer but isn't. There are ways around that and the literature all about that can be found but I don't agree that it's the tool you want.
You could try the related genetic algorithms too if that's the way you want to go.
Have you learned trees? You could create a tree with all possible changes leading to the desired result. The trick, of course, is to avoid creating the whole tree -- particularly when a part of it is obviously not the best solution, right?
I think the appropriate approach is to think hard about what defining properties a minimal "cost" sort has. Then figure out the cost by simulating this ideal sort. The key element here is you don't have to implement a general minimal cost sorting algorithm.
For example, let's say the defining property of a minimal cost sort is that every exchange puts at least one of the exchanged element in it's sorted position (I don't know if this is true). Every exchange based sort would love to be able to have this property, but it's not easy(possible?) in the general case. However You can easily create a program that takes an unsorted array, takes the sorted version (which itself can be generated by an unoptimal algorithm), and then using this information decides the minimum cost to achieve the sorted array from the unsorted array.
Description
I think the cheapest way to do this is to swap the cheapest misplaced item with the item that belongs in its spot. I believe this reduces cost by moving the most expensive things just once. If there are n-elements that are out of place, then there will be at most n-1 swaps to put them in place, at a cost of n-1 * cost of least item + cost of all other out of place.
If the globally cheapest element is not misplaced and the spread between this cheapest one and the cheapest misplaced one is great enough, it can be cheaper to swap the cheapest one in its correct place with the cheapest misplaced one. The cost then is n-1 * cheapest + cost of all out of place + cost of cheapest out of place.
Example
For [4,1,2,3], this algorithm exchanges (1,2) to produce:
[4,2,1,3]
and then swaps (3,1) to produce:
[4,2,3,1]
and then swaps (4,1) to produce:
[1,2,3,4]
Notice that each misplaced item in [2,3,4] is moved only once and is swapped with the lowest cost item.
Code
Ooops: "Please don't provide a complete algorithm, just an approach." Removed my code.
In an effort to only get you going on it this may not make complete sense.
Determine all possible moves, and the cost for each move and store those somehow, perform the least expensive move, then determine the moves that can be performed from this variation, storing those with the rest of your stored moves, perform the least expensive etc until the array is sorted.
I love solving things like this.
This problem is also known as the Silly Sort problem in some ACM contests. Take a look at this solution using Divide & Conquer.
Try different sorting algorithms on the same input data and print the minimum.

Resources