Shortest Path: Picking up cards without duplicates - algorithm

I have a homework question in my algorithms class that asks the following:
You have a game board and a path to the end. You move one step at a
time. At each 'position' you step to there is a stack of cards (a
subset of the standard 52 card deck). There could be 1 card, 2 cards,
3 cards, etc. No duplicates, and there is at least one card.
The purpose of the game is to pick a card at each position. You
cannot select the same card twice. By the time you reach the end,
you want the total face value of the cards to be minimal.
Devise an algorithm that, given how many positions there are and which set of cards is at each position, finds the minimal-value combination of cards to pick up.
I don't really know where to start. I could do an exhaustive search, but I fear that would not be efficient enough. I know that it's not as simple as just picking the smallest card at each position: since you cannot pick the same card twice, you might encounter a situation where it's optimal to pick a slightly higher-value card early so that a much cheaper card stays available later (e.g. if position 1 holds {2♣, 5♣} and position 2 holds {2♣, K♦}, greedily taking 2♣ first forces you to take the king for a total of 15, whereas 5♣ then 2♣ costs only 7). I considered creating a 'decision tree', but that wouldn't help with time complexity either.

Use backtracking to find all possible paths, and then choose the path that yields the minimum value.
You can pre-process the data with two (possibly more) rules:
If one stack of cards in the path has a card that no other stack has, you can remove from that stack every card with a higher value than that unique card (any solution that used such a card could swap it for the unique card at no extra cost).
If any stack in the path has only one card, you must pick that card there, so you can remove that same card from every other stack.
You're still looking at a worst case of roughly 52! if the path is very long.
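As a baseline, here is a minimal backtracking sketch in Python (the data layout and names are mine, not part of the answer): each card is a (face value, card id) pair so that two cards of the same rank but different suits stay distinct.

# Minimal backtracking sketch: each card is a (face_value, card_id) pair and
# each position holds a set of cards.  Returns the smallest achievable total,
# or None if no valid selection exists.
def min_card_sum(stacks, used=frozenset()):
    if not stacks:
        return 0
    best = None
    first, rest = stacks[0], stacks[1:]
    for value, cid in sorted(first):
        if cid in used:
            continue                          # this exact card was already taken
        tail = min_card_sum(rest, used | {cid})
        if tail is not None and (best is None or value + tail < best):
            best = value + tail
    return best

stacks = [{(2, "2C"), (5, "5C")}, {(2, "2C"), (13, "KD")}]
print(min_card_sum(stacks))                   # 7: take the 5 first, then the 2

The two pre-processing rules above would simply shrink the stacks before this search starts.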

After looking into this for quite a while, I have not found an algorithm with a worst-case time complexity better than O(s!), where s is the number of stacks. For the pure theorist the problem can be solved in constant time, as the number of cards and stacks has an upper limit; it is just a very ... big constant. Big-O notation only makes sense if the input size has no upper limit (we could meet that requirement by allowing a custom card set whose face values run from 1 to n, with n variable).
Still I would like to pass on some thoughts that can help write an algorithm that performs well in many (but not all) configurations:
Definitions
A card's id combines its rank (1..13) and suit (clubs, hearts, diamonds, spades): id 0 = Ace of Clubs, id 1 = Ace of Hearts, ..., id 4 = Two of Clubs, ..., id 7 = Two of Spades, ..., id 51 = King of Spades, the last one.
s is the number of stacks
nᵢ is the number of cards on stack number i.
Algorithm
Mark all cards with the number of the stack they are in;
Sort the cards on each stack by increasing id (so with the lowest id at the top of the stack), and also create a map (by id) so the position of a card in the sorted stack can be found quickly from its id.
Create another list which holds all cards together, also sorted by increasing id. Also create buckets by id for this overall sorted list, to allow direct lookup by id. As cards are not unique across stacks, there can be duplicate id values.
Clean-up 1: For every stack i with nᵢ > s: remove nᵢ - s cards from the bottom of that stack so that the resulting stack size is at most s. No removed card could be part of an optimal solution: whatever is picked from the other s-1 stacks blocks at most s-1 of the top s cards of this stack, so at least one of those top s cards is always available, and since the stack is sorted it is at least as good as anything further down. Make sure to replicate the removal of cards in the overall list of step 3. After this step there are at most s² cards in the configuration.
Clean-up 2: Go through the sorted list of all cards: whenever a unique id X is found (i.e. it is alone in its bucket), get the stack number of that card X and remove from that stack every card below card X. None of these removed cards can contribute to an optimal solution: if such a card were picked, it could be replaced by the unique card X for an equal or better solution (i.e. a lower sum of ranks). Make sure to replicate the removal in the overall list of step 3. This step can make cards unique that were not unique before; as the step progresses through all cards in order, these newly unique cards are treated the same way. Once this step is completed, every card on every stack is a duplicate of a card on another stack, except possibly one card per stack, which is then the card at the bottom of that stack.
Now we can pick a card.
If there is a stack with no card then this branch in the search tree offered no solution. Backtrack as we should have picked a card from this stack earlier on.
If there is a stack with just one card, pick that card.
Otherwise: picking a card with minimal id (i.e. at the start of the overall sorted list) will lead to an optimal solution. Imagine you built a solution without picking a card with this minimal id; you would then have taken some other card from one of the stacks holding a minimal-id card, and you could simply improve the solution by replacing that card with the minimal-id card in the same stack. If there is only one such minimal-id card, take it. Otherwise the algorithm has to branch off into as many branches as there are duplicates of this minimal-id card. This represents a node in the search tree this algorithm has to walk through.
As you pick a card, remove all other cards of that stack (applying the change to the overall list as well), and the stack itself, thereby reducing s by 1.
If at this point s is 0, then we have a "solution", but maybe not the best one. If it is better than the one we had so far, register this as the best solution. Backtrack in order to visit other branches in the search tree, which might still have better solutions. If, on the other hand, s > 0, repeat from step 4 onwards (with this decreased s).
Note that when backtracking (in steps 6 or 7), you need also to restore the data structure (removed cards should be added again to the stacks and in the overall list in their sorted position).
The above algorithm still performs badly when there are a lot of duplicates.
Prune branches in search tree
If you have a way to compute a lower bound on the value sum that can still be reached in the current branch of the search tree, you can use it to backtrack at an earlier stage (i.e. "prune" a branch): as soon as this lower bound is equal to or higher than the best solution found so far, there is no use in continuing the search in that branch; it can never lead to a better solution. The better (i.e. higher) you can make this lower bound, the better the algorithm will perform.
Here are a few ideas for calculating a lower bound for the sum:
First of all, this lower-bound sum would obviously include the sum of the ranks of the cards that have already been taken at this point in the search. To this, minimum values for the cards that still need to be picked should be added. I suggest two ways to do that:
Add the ranks of the cards at the tops of the remaining stacks. Some of these tops may be duplicates of each other and so cannot all end up in an actual solution, but whatever card is eventually taken from a stack has a rank at least that of its top card, so this sum is a lower bound.
Alternatively, identify the s lowest distinct id values still available in the overall sorted list and take the sum of their ranks. Any lower sum would have to use duplicates, which is not allowed, so this too is a lower bound. An actual solution might still have a greater sum, because this bound may count several cards from the same stack, which is not allowed either.
The calculation should be done incrementally, i.e. avoid recalculating it from scratch again and again. Instead, as each step of the algorithm is performed and cards are removed or picked, adapt the lower bound accordingly, which is more efficient.
A smarter combination of the two methods above could be used. However, the smarter you make it, the more time it takes to calculate, and the algorithm may or may not gain overall performance from it.
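To make the pruning concrete, here is a rough Python sketch of both bounds; the data layout (each remaining stack sorted by increasing id, the overall sorted list of (card id, stack number) pairs, and the running sum of picked ranks) is my own, assumed to match the structures of steps 2 and 3.

# Rough sketch of the two lower bounds; picked_sum is the rank total of the
# cards taken so far, stacks holds the remaining stacks (each sorted by
# increasing id), all_cards is the overall sorted list of (card_id, stack_no).
def rank(card_id):
    return card_id // 4 + 1                   # ranks 1..13 from ids 0..51

def lower_bound_tops(picked_sum, stacks):
    # Bound 1: the cheapest (top) card of every remaining stack, ignoring the
    # fact that some of these tops may duplicate each other.
    return picked_sum + sum(rank(stack[0]) for stack in stacks if stack)

def lower_bound_distinct(picked_sum, stacks, all_cards):
    # Bound 2: the s cheapest *distinct* ids still available anywhere.
    s, total, seen = len(stacks), picked_sum, set()
    for card_id, _stack_no in all_cards:      # already sorted by id
        if card_id not in seen:
            seen.add(card_id)
            total += rank(card_id)
            if len(seen) == s:
                break
    return total

def should_prune(picked_sum, stacks, all_cards, best_so_far):
    bound = max(lower_bound_tops(picked_sum, stacks),
                lower_bound_distinct(picked_sum, stacks, all_cards))
    return best_so_far is not None and bound >= best_so_far

As noted above, a real implementation would maintain these sums incrementally rather than recomputing them at every node.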
So far my conclusions. The most difficult configurations have many duplicates. For instance a configuration with 52 stacks, where all stacks have almost all cards (just a few cards taken out here and there), will be difficult to solve quickly.

Related

Algorithmic help needed (N bags and items distributed randomly)

I have encountered an algorithmic problem but am not able to figure out anything better than brute force or to reduce it to a better-known problem. Any hints?
There are N bags of variable sizes and N types of items. Each type of item belongs to one bag. There are lots of items of each type and each item may have a different size. Initially, these items are distributed across all the bags randomly. We have to place the items in their respective bags. However, we can only operate on a pair of bags at one time, exchanging items (as much as possible) and proceeding to the next pair. The aim is to reduce the total number of pairs. Edit: The aim is to find a sequence of transfers that minimizes the total number of bag pairs involved.
Clarification:
The bags are not arbitrarily large (you can assume the bag and item sizes to be integers between 0 and 1000 if it helps). You'll frequently encounter scenarios where all the items between two bags cannot be swapped due to the limited capacity of one of the bags. This is where the algorithm needs to make an optimisation: perhaps, if another pair of bags were swapped first, the current swap could be done in one go. To illustrate this, let's consider bags A, B and C and their items 1, 2, 3 respectively. The number in brackets is the size.
A(10) : 3(8)
B(10): 1(2), 1(3)
C(10): 1(4)
The swap orders can be AB, AC, AB or AC, AB. The latter is optimal as the number of swaps is smaller.
Since I cannot come up with an algorithm that will always find an optimal answer, and an approximation of the solution's fitness (the number of swaps) is also fine, I suggest a stochastic local search algorithm with pruning.
Given a random starting configuration, this algorithm considers all possible swaps, and makes a weighed decision based on chance: the better a swap is, the more likely it is chosen.
The value of a swap would be the sum of the values of the item transfers it performs: a transfer is worth zero if the item does not end up in the bag it belongs to, and positive if it does. The value increases with the item's size (the idea being that a large item is harder to move around repeatedly than several small ones). This fitness function can be replaced by any other; its effectiveness is unknown until shown empirically.
Since any configuration can be the consequence of many different preceding swaps, we keep track of which configurations we have seen before, along with a fitness (based on how many items are in their correct bag; this fitness is not related to the value of a swap) and the list of preceding swaps. If the fitness of a configuration is the number of items in their correct bags, then a configuration whose fitness equals the total number of items is a solution.
A swap is not possible if:
Either of the affected bags would hold more than its capacity after the potential swap.
The new swap brings you back to the configuration you were in before your last swap (i.e. it reverses that swap).
When we identify potential swaps, we look in our list of previously seen configurations (use a hash table for O(1) lookup). Then we either set its list of preceding swaps to ours (if our list is shorter than its) or set ours to its (if its list is shorter). We can do this because it does not matter which swaps we did, as long as the number of swaps is as small as possible.
If there are no more possible swaps left in a configuration, you're stuck. Local search then tells you to 'reset', which you can do in many ways, for instance:
Reset to a previously seen state (maybe the best one you've seen so far?)
Reset to a new valid random solution
Note
Since the algorithm only allows you to do valid swaps, all constraints will be met for each configuration.
The algorithm is not guaranteed to stop out of the box; you can impose a maximum number of iterations (swaps).
The algorithm is not guaranteed to find a correct solution, as it only tries to find a better configuration at each iteration. However, since a perfect solution (set of swaps) should look close to an almost-perfect one, a human might be able to finish what the local search could not when it ends in an incomplete configuration (where not every item is in its correct bag).
The fitness functions and strategies used here are very likely not the most efficient out there. You could look around for better ones; a more efficient fitness function / strategy should produce a good solution faster (in fewer iterations).
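A skeleton of the search loop might look as follows (Python; the bag layout, the simplified all-or-nothing exchange and the plain random pair choice are my own illustrations, and the weighted swap choice and seen-configuration table described above are left out for brevity):

import random

# A bag is (capacity, list_of_items); an item (t, size) belongs to bag t.
def exchange(bags, a, b):
    # Simplified exchange: swap all misplaced items between bags a and b if
    # both capacities allow it (a fuller version would also do partial swaps).
    load = lambda i: sum(s for _, s in bags[i][1])
    size = lambda items: sum(s for _, s in items)
    to_b = [it for it in bags[a][1] if it[0] == b]
    to_a = [it for it in bags[b][1] if it[0] == a]
    if not (to_a or to_b):
        return False
    if (load(a) - size(to_b) + size(to_a) > bags[a][0] or
            load(b) - size(to_a) + size(to_b) > bags[b][0]):
        return False
    for it in to_b:
        bags[a][1].remove(it)
        bags[b][1].append(it)
    for it in to_a:
        bags[b][1].remove(it)
        bags[a][1].append(it)
    return True

def fitness(bags):
    # Number of items already in their own bag.
    return sum(1 for i, (_, items) in enumerate(bags) for t, _ in items if t == i)

def local_search(bags, max_iters=1000):
    total_items = sum(len(items) for _, items in bags)
    swaps = []
    for _ in range(max_iters):
        if fitness(bags) == total_items:
            return swaps                      # every item is home
        a, b = random.sample(range(len(bags)), 2)  # weighted choice in a real run
        if exchange(bags, a, b):
            swaps.append((a, b))
    return None                               # stuck: reset or restart here

# The A/B/C example from the question (0 = A, 1 = B, 2 = C):
bags = [(10, [(2, 8)]), (10, [(0, 2), (0, 3)]), (10, [(0, 4)])]
print(local_search(bags))                     # e.g. [(0, 2), (0, 1)], i.e. AC then AB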

Maximizing expected gain in a social network with probability

I am required to solve a specific problem.
I'm given a representation of a social network.
Each node is a person, each edge is a connection between two persons. The graph is undirected (as you would expect).
Each person has a personal "affinity" for buying a product (to simplify things, let's say there's only one product involved in this whole problem).
In each "step" in time, each person, independently, chooses whether to buy the product or not.
There's probability involved here. A few parameters are taken into account:
His personal affinity for the product,
The percentage of his friends that already bought the product
The gain for a person buying the product is 1 dollar.
The problem is to pick the X persons (say, 5) who will receive the product at step 0 so as to maximize the total expected gain after Y steps (say, 10).
The network is very large. It's not possible to simulate all the options in a naive way.
What tool / library / algorithm should I be using?
Thank you.
P.S.
When investigating this matter in google and wikipedia, a few terms kept popping up:
Dynamic network analysis
Epidemic model
but they didn't help me find an answer.
Generally, people who have the most neighbours have the most influence when they buy something.
So my heuristic would be to order people first by the number of neighbours they have (in decreasing order), then by the number of neighbours that each of those neighbours has (in order from highest to lowest), and so on. You will need at most Y levels of neighbour counts, though fewer may suffice in practice. Then simply take the first X people on this list.
This is only a heuristic, because e.g. if a person has many neighbours but most or all of them are likely to have already bought the product through other connections, then it may give a higher expectation to select a different person having fewer neighbours, but whose neighbours are less likely to already own the product.
You do not need to construct the entire list and then sort it; you can construct the list and then insert each item into a heap, and then just extract the highest-scoring X people. This will be much faster if X is small.
If X and Y are as low as you suggest then this calculation will be pretty fast, so it would be worth doing repeated runs in which instead of starting with the first X people owning the product, for each run you randomly select the initial X owners according to a probability that depends on their position in the list (the further down the list, the lower the probability).
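A sketch of the first level of this heuristic (neighbour counts only, using a heap instead of a full sort); the adjacency-list representation is assumed:

import heapq

# Pick the X people with the most neighbours; deeper levels of the heuristic
# would break ties by neighbour-of-neighbour counts, and so on.
def pick_seeds(graph, X):
    # nlargest keeps only X candidates in a heap, cheaper than a full sort
    return heapq.nlargest(X, graph, key=lambda p: len(graph[p]))

graph = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice"},
    "carol": {"alice", "dave"},
    "dave": {"alice", "carol"},
}
print(pick_seeds(graph, 2))                   # e.g. ['alice', 'carol']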
Check out the concept of submodularity, a pretty powerful mathematical concept. In particular, check out slide 19, where submodularity is used to answer the question "Given a social graph, who should get free cell phones?". If you have access, also read the corresponding paper. That should get you started.

Optimally reordering cards in a wallet?

I was out buying groceries the other day and needed to search through my wallet to find my credit card, my customer rewards (loyalty) card, and my photo ID. My wallet has dozens of other cards in it (work ID, other credit cards, etc.), so it took me a while to find everything.
My wallet has six slots in it where I can put cards, with only the first card in each slot initially visible at any one time. If I want to find a specific card, I have to remember which slot it's in, then look at all the cards in that slot one at a time to find it. The closer it is to the front of a slot, the easier it is to find it.
It occurred to me that this is pretty much a data structures question. Suppose that you have a data structure consisting of k linked lists, each of which can store an arbitrary number of elements. You want to distribute elements into the linked lists in a way that minimizes the cost of looking them up. You can use whatever system you want for distributing elements into the different lists, and can reorder lists whenever you'd like. Given this setup, is there an optimal way to order the lists, under either of the assumptions:
You are given the probabilities of accessing each element in advance and accesses are independent, or
You have no knowledge in advance what elements will be accessed when?
The informal system I use in my wallet is to "hash" cards into different slots based on use case (IDs, credit cards, loyalty cards, etc.), then keep elements within each slot roughly sorted by access frequency. However, maybe there's a better way to do this (for example, storing the k most frequently-used elements at the front of each slot regardless of their use case).
Is there a known system for solving this problem? Is this a well-known problem in data structures? If so, what's the optimal solution?
(In case this doesn't seem programming-related: I could imagine an application in which the user has several drop-down lists of commonly-used items, and wants to keep those items ordered in a way that minimizes the time required to find a particular item.)
Although not a full answer for general k, this 1985 paper by Sleator and Tarjan gives a helpful analysis of the amortised complexity of several dynamic list update algorithms for the case k=1. It turns out that move-to-front is very good: assuming fixed access probabilities for each item, it never requires more than twice the number of steps (moves and swaps) that would be required by the optimal (static) algorithm, in which all elements are listed in nonincreasing order of probability.
Interestingly, a couple of other plausible heuristics -- namely swapping with the previous element after finding the desired element, and maintaining order according to explicit frequency counts -- don't share this desirable property. OTOH, on p. 2 they mention that an earlier paper by Rivest showed that the expected amortised cost of any access under swap-with-previous is <= the corresponding cost under move-to-front.
I've only read the first few pages, but it looks relevant to me. Hope it helps!
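For illustration, move-to-front for the k = 1 case takes only a few lines (the wallet contents are made up):

# Search cost is the position of the card; the accessed card then moves to
# the front, so frequently used cards drift towards the front over time.
def access(wallet, card):
    cost = wallet.index(card) + 1             # cards inspected before finding it
    wallet.remove(card)
    wallet.insert(0, card)                    # move to front
    return cost

wallet = ["work ID", "credit card", "loyalty card", "photo ID"]
total = sum(access(wallet, c) for c in ["photo ID", "photo ID", "credit card"])
print(wallet, total)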
You need to look at skip lists. There is a similar problem in arranging stations for a train system with express and regular trains: an express train stops only at express stations, while regular trains stop at both regular and express stations. Where should the express stops be placed so as to minimize the average number of stops when travelling from a start station to any other station?
The solution is to place the express stations at the triangular numbers (i.e. at 1, 3, 6, 10, etc., where T_n = n * (n + 1) / 2).
This is assuming all stops (or cards) are equally likely to be accessed.
If you know the access probabilities of your n cards in advance and you have k wallet slots and accesses are independent, isn't it fairly clear that the greedy solution is optimal? That is, the most frequently-accessed k cards go at the front of the pockets, next-most-frequently accessed k go immediately behind, and so forth? (You never want a lower-probability card ranked before a higher-probability card.)
If you don't know the access probabilities, but you do know they exist and that card accesses are independent, I imagine sorting the cards similarly, but by number-of-accesses-seen-so-far instead is asymptotically optimal. (Move-to-front is cool too, but I don't see an obvious reason to use it here.)
Perhaps you get something interesting if you penalise card moves as well; otherwise, given any known probability distribution on card accesses, independent or not, I would just greedily re-sort the cards after every access.
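A sketch of the greedy layout for the known-probabilities case above: sort the cards by access probability and deal them across the k slots, so the r-th most likely card sits at depth r // k (the card names and probabilities are made up).

# Deal cards into k slots in decreasing order of access probability.
def arrange(cards_with_prob, k):
    ranked = sorted(cards_with_prob, key=lambda cp: -cp[1])
    slots = [[] for _ in range(k)]
    for r, (card, _prob) in enumerate(ranked):
        slots[r % k].append(card)
    return slots

cards = [("photo ID", 0.4), ("credit card", 0.3), ("loyalty card", 0.2),
         ("work ID", 0.1)]
print(arrange(cards, 2))   # [['photo ID', 'loyalty card'], ['credit card', 'work ID']]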

Sorting a list of numbers with modified cost

First, this was one of the four problems we had to solve in a project last year, and I couldn't find a suitable algorithm, so we handed in a brute-force solution.
Problem: The numbers are in a list that is not sorted and supports only one type of operation. The operation is defined as follows:
Given a position i and a position j, the operation moves the number at position i to position j without altering the relative order of the other numbers. If i > j, the positions of the numbers between positions j and i-1 increase by 1; if i < j, the positions of the numbers between positions i+1 and j decrease by 1. The operation requires i steps to find the number to move and j steps to locate the position to move it to, so moving a number from position i to position j costs i + j steps.
We need to design an algorithm that, given a list of numbers, determines the optimal (in terms of cost) sequence of moves to rearrange the sequence.
Attempts:
Part of our investigation was around NP-completeness: we turned it into a decision problem and tried to find a suitable transformation to any of the problems listed in Garey and Johnson's book Computers and Intractability, with no results. There is also no direct reference (as far as we can see) to this kind of variation in Donald E. Knuth's The Art of Computer Programming, Vol. 3: Sorting and Searching. We also analyzed algorithms for sorting linked lists, but none of them gave a good idea for finding the optimal sequence of movements.
Note that the idea is not to find an algorithm that sorts the sequence, but one that tells me the optimal sequence of movements, in terms of cost, that sorts it. You can make a copy and sort it to analyze the final position of the elements if you want; in fact, we may assume that the list contains the numbers from 1 to n, so we know where we want to put each number. We are just concerned with minimizing the total cost of the steps.
We tested several greedy approaches, but all of them failed. Divide-and-conquer sorting algorithms can't be used because they swap portions of the list at no cost, and our dynamic programming approaches had to consider too many cases.
The brute-force recursive algorithm tries all possible movements from i to j, then recursively all possible movements of the remaining elements, and at the end returns the cheapest sequence that sorts the list. As you can imagine, the cost of this algorithm is brutal and makes it impracticable for more than 8 elements.
Our observations:
n movements is not necessarily cheaper than n+1 movements (unlike swaps in arrays that are O(1)).
There are basically two ways of moving an element from position i to position j: move it directly, or move other elements around it so that it ends up at position j.
At most you make n-1 movements (the last untouched element reaches its position by itself).
In an optimal sequence of movements, no element is moved twice.
This problem looks like a good candidate for an approximation algorithm but that would only give us a good enough answer. Since you want the optimal answer, this is what I'd do to improve on the brute force approach.
Instead of blindly trying every permutation, I'd use a backtracking approach that maintains the best solution found so far and prunes any branch whose cost exceeds it. I would also add a transposition table to avoid re-searching states that were already reached by previous branches through different move orders.
I would also add a few heuristics to explore moves that are more likely to reach good results before any others. For example, prefer low-cost moves first. I'd need to experiment before I can tell which heuristics, if any, work best.
I would also find the longest increasing subsequence of numbers in the original array. This gives a set of numbers that never need to be moved, which should considerably cut the number of branches to explore. It also greatly speeds up searches on lists that are almost sorted.
I'd expect these improvements to handle lists far larger than 8 elements, but for large lists of random numbers I'd still prefer an approximation algorithm.
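For the longest-increasing-subsequence part, a standard O(n log n) sketch (the surrounding backtracking search is not shown):

import bisect

# Longest increasing subsequence; the returned elements can stay where they
# are while everything else is moved around them.
def lis(seq):
    tails, tails_idx, prev = [], [], [None] * len(seq)
    for i, x in enumerate(seq):
        pos = bisect.bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)
            tails_idx.append(i)
        else:
            tails[pos] = x
            tails_idx[pos] = i
        prev[i] = tails_idx[pos - 1] if pos > 0 else None
    out, i = [], tails_idx[-1] if tails_idx else None
    while i is not None:                      # walk back along the best chain
        out.append(seq[i])
        i = prev[i]
    return out[::-1]

print(lis([3, 1, 4, 5, 2]))                   # [1, 4, 5] -> only 3 and 2 must move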
By popular demand (1 person), this is what I'd do to solve this with a genetic algorithm (the metaheuristic I'm most familiar with).
First, I'd calculate the longest increasing subsequence of numbers (see above). Every item that is not part of that subsequence has to be moved; all we need to decide is in what order.
The genome used as input for the genetic algorithm is simply an array in which each element represents an item to be moved; the order in which items appear in the array is the order in which they are moved. The fitness function is the cost calculation described in the original question.
We now have all the elements needed to plug the problem in a standard genetic algorithm. The rest is just tweaking. Lots and lots of tweaking.

Looking for a multidimensional optimization algorithm

Problem description
There are different categories, each containing an arbitrary number of elements.
There are three attributes A, B and C. Each element has its own distribution of these attributes, expressed as positive integer values. For example, element 1 has the attributes A: 42, B: 1337, C: 18. The sum of these attributes is not the same for every element; some elements have more than others.
Now the problem:
We want to choose exactly one element from each category so that
We hit a certain threshold on attributes A and B (going over it is also possible, but not necessary)
while getting a maximum amount of C.
Example: we want to hit at least 80 A and 150 B in sum over all chosen elements and want as many C as possible.
I've thought about this problem and cannot come up with an efficient solution. The sample sizes are about 15 categories, each containing up to ~30 elements, so brute-forcing doesn't seem feasible since there are potentially 30^15 possibilities.
My model is a tree whose depth is the number of categories. Each level represents a category and offers the choice of one element from that category. When passing a node, we add the attributes of the represented element to the sums we want to optimize.
If we hit the same attribute combination multiple times on the same level, we merge them, so that already-computed values are not recomputed. If we reach a level where one path is worse than another in all three attributes, we stop following it.
However, in the worst case this tree still has ~30^15 nodes in it.
Can anybody think of an algorithm that might help me solve this problem? Or could you explain why you think no such algorithm exists?
This question is very similar to a variation of the knapsack problem. I would start by looking at solutions for this problem and see how well you can apply it to your stated problem.
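One way to make the knapsack connection concrete (my own sketch, not taken from either answer): because only reaching the A and B thresholds matters, the remaining shortfalls can be capped at the thresholds, giving a dynamic program over (category, A still needed, B still needed) with roughly 15 × 81 × 151 states for the figures in the question.

from functools import lru_cache

# State = (category index, A still needed, B still needed); value = maximum C
# achievable from here, or None if the thresholds cannot be met.
def best_c(categories, thresh_a, thresh_b):
    @lru_cache(maxsize=None)
    def solve(i, need_a, need_b):
        if i == len(categories):
            return 0 if need_a == 0 and need_b == 0 else None
        best = None
        for a, b, c in categories[i]:
            tail = solve(i + 1, max(0, need_a - a), max(0, need_b - b))
            if tail is not None and (best is None or c + tail > best):
                best = c + tail
        return best
    return solve(0, thresh_a, thresh_b)

categories = [
    [(42, 100, 18), (10, 60, 40)],            # (A, B, C) per element
    [(50, 60, 5), (70, 20, 30)],
]
print(best_c(categories, 80, 150))            # 23: only (42,100,18)+(50,60,5) meets both thresholds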
My first inclination is to try branch-and-bound. You can do it breadth-first or depth-first; I prefer depth-first because I think it's cleaner.
To express it simply, you have a tree-walk procedure walk that can enumerate all possibilities (maybe it is just a nested loop, one level per category). It is augmented with two things:
At every step of the way, it keeps track of the cost at that point, where the cost can only increase. (If the cost can also decrease, it becomes more like a minimax game tree search.)
The procedure has an argument budget, and it does not search any branches where the cost can exceed the budget.
Then you have an outer loop:
for (budget = 0; budget < ...; budget++) {
    walk(budget);
    // if walk finds a solution within the budget, halt
}
The amount of time it takes is exponential in the budget, so easier cases will take less time. The fact that you are re-doing the search doesn't matter much because each level of the budget takes as much or more time than all the previous levels combined.
Combine this with some sort of heuristic about the order in which you consider branches, and it may give you a workable solution for typical problems you give it.
If that doesn't work, you can fall back on basic heuristic programming: do some cases by hand, pay attention to how you did it, and then program it the same way.
I hope that helps.
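Here is one way the budgeted walk could be instantiated for this particular problem (my own framing, mirroring the outer loop above in Python): treat the C you give up, relative to the best C in each category, as the cost, so that minimising cost maximises total C. Incrementing the budget by 1 is wasteful when the C values are large; it is only meant to illustrate the idea.

# walk() refuses to exceed the current budget; the outer loop raises the
# budget until a selection meeting both thresholds is found.
def solve(categories, thresh_a, thresh_b):
    max_c = [max(c for _, _, c in cat) for cat in categories]

    def walk(i, need_a, need_b, cost, budget):
        if i == len(categories):
            return [] if need_a == 0 and need_b == 0 else None
        for a, b, c in categories[i]:
            extra = max_c[i] - c              # C given up by this choice
            if cost + extra > budget:
                continue                      # prune: budget exceeded
            tail = walk(i + 1, max(0, need_a - a), max(0, need_b - b),
                        cost + extra, budget)
            if tail is not None:
                return [(a, b, c)] + tail
        return None

    max_cost = sum(max(c for _, _, c in cat) - min(c for _, _, c in cat)
                   for cat in categories)
    for budget in range(max_cost + 1):        # easy instances stop early
        picks = walk(0, thresh_a, thresh_b, 0, budget)
        if picks is not None:
            return picks
    return None                               # no selection meets the thresholds

categories = [
    [(42, 100, 18), (10, 60, 40)],            # (A, B, C) per element
    [(50, 60, 5), (70, 20, 30)],
]
print(solve(categories, 80, 150))             # [(42, 100, 18), (50, 60, 5)]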

Resources