Apologies in advance if the wording of my question is confusing. I've been having lots of trouble trying to explain it.
Basically, I'm trying to write an algorithm that takes a set of items (for example, the letters of the alphabet) and a combination size limit (1, 2, 3, 4, ...), and produces all the possible combinations up to that size limit.
So, for example, let's say our set of items is the chars A, B, C, D, E and my combination size limit is 3; the result would be:
A,
AB, AC, AD, AE,
ABC, ABD, ABE, ACD, ACE, ADE,
B,
BC, BD, BE,
BCD, BCE, BDE,
C,
CD, CE,
CDE,
D,
DE,
E
Hopefully that makes sense.
For context, I want to use this in my game to generate army compositions with limits on how many different types of units they can be composed of. I don't want to have to do it manually!
Could I please get some advice?
Recursion can do the job. The idea is to choose a letter, print it as a possibility, and combine it with all the letters after it:
#include <iostream>
#include <string>
using namespace std;

string letters[] = {"A", "B", "C", "D", "E"};
int alphabetSize = 5;
int combSizeLim = 3;

void gen(int index = 0, int combSize = 0, string comb = ""){
    if (combSize > combSizeLim) return;
    if (!comb.empty()) cout << comb << endl; // don't print the empty combination
    for (int i = index; i < alphabetSize; i++){
        gen(i + 1, combSize + 1, comb + letters[i]);
    }
}

int main(){
    gen();
    return 0;
}
OUTPUT:
A
AB
ABC
ABD
ABE
AC
ACD
ACE
AD
ADE
AE
B
BC
BCD
BCE
BD
BDE
BE
C
CD
CDE
CE
D
DE
E
Here's a simple recursive solution. (The recursion depth is limited to the length of the set, and that cannot be too big or there will be too many combinations. But if you think it will be a problem, it's not that hard to convert it to an iterative solution by using your own stack, again of the same size as the set.)
I'm using a subset of Python as pseudo-code here. In real Python, I would have written a generator instead of passing collection through the recursion.
def gen_helper(collection, elements, curr_element, max_elements, prefix):
    if curr_element == len(elements) or max_elements == 0:
        collection.append(prefix)
    else:
        gen_helper(collection, elements, curr_element + 1,
                   max_elements - 1, prefix + [elements[curr_element]])
        gen_helper(collection, elements, curr_element + 1,
                   max_elements, prefix)

def generate(elements, max_elements):
    collection = []
    gen_helper(collection, elements, 0, max_elements, [])
    return collection
The working of the recursive function (gen_helper) is really simple. It is given a prefix of elements already selected, the index of an element to consider, and the number of elements still allowed to be selected.
If it can't select any more elements, it simply adds the current prefix to the accumulated result. That happens if:
The scan has reached the end of the list of elements, or
The number of elements allowed to be added has reached 0.
Otherwise, it has precisely two options: either it selects the current element or it doesn't. If it chooses to select, it must continue the scan with a reduced allowable count (since it has used up one possible element). If it chooses not to select, it must continue the scan with the same count.
Since we want all possible combinations (as opposed to, say, a random selection of valid combinations), we need to make both choices, one after the other.
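As mentioned above, in real Python this is more natural as a generator. Here is a minimal sketch of that variant (my own rewrite of the same recursion; it yields each combination, including the empty one, as a tuple):

def gen_combinations(elements, max_elements, curr_element=0):
    # Same two-way recursion as gen_helper, but yielding results
    # instead of appending them to a collection.
    if curr_element == len(elements) or max_elements == 0:
        yield ()
        return
    # Option 1: select the current element and reduce the allowance
    for rest in gen_combinations(elements, max_elements - 1, curr_element + 1):
        yield (elements[curr_element],) + rest
    # Option 2: skip the current element
    yield from gen_combinations(elements, max_elements, curr_element + 1)

for comb in gen_combinations(['A', 'B', 'C', 'D', 'E'], 3):
    print(''.join(comb))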
I am looking at this challenge:
Given a tree with N nodes and N-1 edges. Each edge on the tree is labelled by a string of lowercase letters from the Latin alphabet. Given Q queries, consisting of two nodes u and v, check if it is possible to make a palindrome string which uses all the characters that belong to the string labelled on the edges in the path from node u to node v.
Characters can be used in any order.
N is of the order of 10^5 and Q is of the order of 10^6
Input:
N=3
u=1 v=3 weight=bc
u=1 v=2 weight=aba
Q=4
u=1 v=2
u=2 v=3
u=3 v=1
u=3 v=3
Output:
YES
YES
NO
NO
What I thought was to compute the LCA of the two nodes in O(1), using a sparse table for range minimum queries over the Euler tour, and then walk the paths from the LCA to node u and from the LCA to node v, accumulating the character frequencies. If the sum of the frequencies of all characters is odd, we check that the frequency of every character except one is even; if the sum is even, we check that the frequency of every character is even. But this process will surely time out, because Q can be up to 10^6.
Is there anyone with a better algorithm?
Preparation Step
Prepare your data structure as follows:
For each node get the path to the root, get all letters on the path, and only retain a letter when it occurs an odd number of times on that path. Finally encode that string with unique letters as a bit pattern, where bit 0 is set when there is an "a", bit 1 is set when there is a "b", ... bit 25 is set when there is a "z". Store this pattern with the node.
This preprocessing can be done with a depth-first recursive procedure, where the current node's pattern is passed down to the children, which can apply the edge's information to that pattern to create their own pattern. So this preprocessing can run in linear time in terms of the total number of characters in the tree, or more precisely O(N+S), where S represents that total number of characters.
Query Step
When a query is done perform the bitwise XOR on the two involved patterns. If the result is 0 or it has only one bit set, return "YES", else return "NO". So the query will not visit any other nodes than just the two ones that are given, look up the two patterns and perform their XOR and make the bit test. All this happens in constant time for one query.
The last query given in the question shows that the result should be "NO" when the two nodes are the same node. This is a boundary case, as it is debatable whether an empty string is a palindrome or not. The above XOR algorithm would return "YES", so you would need a specific test for this boundary case, and return "NO" instead.
Explanation
This works because if we look at the paths both nodes have to the root, they may share a part of their path. The characters on that common path should not be considered, and the XOR will make sure they aren't. Where the paths differ, we actually have the edges on the path from the one node to the other. There we see the characters that should contribute to a palindrome.
If a character appears an even number of times in those edges, it poses no problem for creating a palindrome. The XOR makes sure those characters "disappear".
If a character appears an odd number of times, all but one can mirror each other like in the even case. The remaining one can only be used in an odd-length palindrome, and only in the centre position of it. So there can only be one such character. This translates to the test that the XOR result is allowed to have 1 bit set (but not more).
Implementation
Here is an implementation in JavaScript. The example run uses the input as provided in the question. I did not bother to turn the query results from boolean to NO/YES:
function prepare(edges) {
    // edges: array of [u, v, weight] triplets
    // Build adjacency list from the list of edges
    let adjacency = {};
    for (let [u, v, weight] of edges) {
        // convert weight to pattern, as we don't really need to
        // store the strings
        let pattern = 0;
        for (let i = 0; i < weight.length; i++) {
            let ascii = weight.charCodeAt(i) - 97;
            pattern ^= 1 << ascii; // toggle bit that corresponds to letter
        }
        if (v in adjacency && u in adjacency) throw "Cycle detected!";
        if (!(v in adjacency)) adjacency[v] = {};
        if (!(u in adjacency)) adjacency[u] = {};
        adjacency[u][v] = pattern;
        adjacency[v][u] = pattern;
    }
    // Prepare the consolidated path-pattern for each node
    let patterns = {}; // This is the information to return
    function dfs(u, parent, pathPattern) {
        patterns[u] = pathPattern;
        for (let v in adjacency[u]) {
            // recurse into the "children" (the parent is not revisited)
            if (v !== parent) dfs(v, u, adjacency[u][v] ^ pathPattern);
        }
    }
    // Start a DFS from an arbitrary node as root
    // (use a string key so the parent check above compares equal types)
    dfs(String(edges[0][0]), null, 0);
    return patterns;
}

function query(nodePatterns, u, v) {
    if (u === v) return false; // Boundary case.
    let pattern = nodePatterns[u] ^ nodePatterns[v];
    // "smart" test to verify that at most 1 bit is set
    return pattern === (pattern & -pattern);
}

// Example:
let edges = [[1, 3, "bc"], [1, 2, "aba"]];
let queries = [[1, 2], [2, 3], [3, 1], [3, 3]];
let nodePatterns = prepare(edges);
for (let [u, v] of queries) {
    console.log(u, v, query(nodePatterns, u, v));
}
First of all, let's choose a root. Now imagine that each edge points to the node that is deeper in the tree. Instead of keeping the strings on the edges, put each string on the vertex its edge points to. Now only the root has no string. For each vertex, calculate and store the count of each letter in its string.
From now on, we'll handle each letter separately.
Using DFS, calculate for each node v the number of occurrences of each letter on the vertices on the path from v to the root. You'll also need the LCA, so you may precompute RMQ, or find the LCA in O(log n) if you like. Let Letters[v][c] be the number of occurrences of letter c on the path from v to the root. Then, to find the number of occurrences of letter c on the path from u to v, just use Letters[v][c] + Letters[u][c] - 2 * Letters[LCA(u, v)][c]. You can check the count of a single letter in O(1) (or O(log n) if you're not using RMQ), so in 26 * O(1) you can check every possible letter.
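The query step might then look like the following Python sketch, assuming Letters has been filled by the DFS and an lca function has been precomputed (both are placeholders here):

def can_make_palindrome(u, v, Letters, lca):
    # Count, for each of the 26 letters, how often it occurs on the
    # path from u to v, and check that at most one count is odd.
    w = lca(u, v)
    odd_counts = sum(
        (Letters[u][c] + Letters[v][c] - 2 * Letters[w][c]) % 2
        for c in range(26)
    )
    return odd_counts <= 1

As with the XOR approach above, the u == v query would need special handling if an empty string is not considered a palindrome.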
I bought three boxes of tea bags with different flavors (A, B, C).
I wish to mix them in such a way that:
- there are never two consecutive bags of the same flavor (ABCCAB is avoided);
- the mix is as random as possible, i.e. it avoids patterns such as ABCABCABC... or ABABAB...BCBCBC...CACACA.
Is there a known algorithm for this kind of mix?
Presently, I randomly shuffle many copies of "ABC" and concatenate the results, swapping the first two letters of a shuffle when the last letter of the previous shuffle is the same as its first letter (...ABCCAB => ...ABCACB).
I guess I could improve this algorithm by pre-computing the permutations of ABC and drawing one from among those that do not begin with the last letter of the previous permutation.
I tried to "google" this problem, but as a native French speaker I probably missed the appropriate keywords.
PS: I previously posted this question on scicomp.stackexchange.com and was advised to repost it here.
Something like this should work. A flavour becomes mandatory as soon as more than half of the remaining picks would have to be of it: it must then be picked immediately, or it can no longer be spread out without repetition:

amount_of_teabags_per_flavour = x
choices = {
    A : amount_of_teabags_per_flavour,
    B : amount_of_teabags_per_flavour,
    C : amount_of_teabags_per_flavour
}
previous_choice = none
picks_left = amount_of_teabags_per_flavour * choices.size

function select_available_choices() :
    mandatory_choices = [ key from choices where key != previous_choice and value > picks_left / 2 ]
    if mandatory_choices == [] :
        available_choices = [ key from choices where key != previous_choice and value > 0 ]
    otherwise
        available_choices = mandatory_choices

result = []
select_available_choices()
while available_choices != [] :
    choice = pick_randomly_from(available_choices)
    result.append(choice)
    previous_choice = choice
    choices[choice]--
    picks_left--
    select_available_choices()
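For reference, a runnable Python version of this sketch (the function name and the equal per-flavour counts are my own choices):

import random

def mix_flavours(per_flavour=10, flavours="ABC"):
    counts = {f: per_flavour for f in flavours}
    picks_left = per_flavour * len(flavours)
    result = []
    previous = None
    while picks_left > 0:
        # A flavour is mandatory when more than half of the remaining
        # picks must be of it; otherwise any non-repeating flavour works.
        mandatory = [f for f in counts
                     if f != previous and counts[f] > picks_left / 2]
        available = mandatory or [f for f in counts
                                  if f != previous and counts[f] > 0]
        choice = random.choice(available)
        result.append(choice)
        counts[choice] -= 1
        previous = choice
        picks_left -= 1
    return "".join(result)

print(mix_flavours(4))  # e.g. BACBCABCACBA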
You can start with one of the three letters A, B or C at random for the first flavor, then compute the next one from one of the two other possibilities (since you don't want flavors to repeat).
Quick script in python3:
from random import randint

rand = lambda x: randint(0, x)
flavors = ['A', 'B', 'C']
freq = [0, 0, 0]
index = rand(2)
mix = flavors[index]
freq[index] += 1  # count the first pick as well
for i in range(1000):
    index = (index + 1 + rand(1)) % 3  # step to one of the two other flavors
    mix += flavors[index]
    freq[index] += 1
print(mix)
print(freq)
You can see that the randomness is good (given the frequencies, and a wide enough range) and no character is repeated twice in a row.
Avoiding repetition at the boundary is a graph problem. You have a directed edge from each node (permutation) to every other node, except where that would double a letter at the boundary. You need a Hamiltonian path through the graph; solutions are readily available online. Then duplicate or truncate that path as much as needed to match the quantity of bags in each box.
If you want to simplify the problem, then generate edges to connect each node only with a node that starts with the second bag of the previous one. For instance, with your 3-box design ...
ABC -> BAC, BCA
CBA -> BAC, BCA
BAC -> ABC, ACB
CAB -> ABC, ACB
ACB -> CAB, CBA
BCA -> CAB, CBA
Any graph built from a full permutation set has a Hamiltonian path, and it's much faster to find. A simple recursive algorithm (with backtracking) should be easy to code. If you sort your choices by second-order availability, I think the problem solves itself without backtracking.
For instance, if you are at node ABC, look at the available choices starting with B: BAC and BCA. Look at the second letters of those: A gives you only one remaining choice (ABC has already been used), but C still has two choices. Therefore, you move to BCA. If your top choices are tied, decide any way you wish, preserving your ability to return to the starting node at the end.
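Here is a minimal backtracking sketch in Python over the simplified graph described above (my own illustration, not a definitive implementation):

from itertools import permutations

def hamiltonian_path(edges, path):
    # Depth-first search with backtracking for a Hamiltonian path.
    if len(path) == len(edges):
        return path
    for nxt in edges[path[-1]]:
        if nxt not in path:
            result = hamiltonian_path(edges, path + [nxt])
            if result:
                return result
    return None

nodes = ["".join(p) for p in permutations("ABC")]
# edge u -> v when v starts with the second bag of u
edges = {u: [v for v in nodes if v[0] == u[1]] for u in nodes}
print(hamiltonian_path(edges, ["ABC"]))
# ['ABC', 'BAC', 'ACB', 'CBA', 'BCA', 'CAB']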
Does that solve your problem?
I have a list of elements, each one identified with a type, and I need to reorder the list to maximize the minimum distance between elements of the same type.
The set is small (10 to 30 items), so performance is not really important.
There's no limit on the quantity of items per type or the quantity of types, and the data can be considered random.
For example, if I have a list of:
5 items of A
3 items of B
2 items of C
2 items of D
1 item of E
1 item of F
I would like to produce something like:
A, B, C, A, D, F, B, A, E, C, A, D, B, A
A has at least 2 items between occurrences
B has at least 4 items between occurrences
C has 6 items between occurrences
D has 6 items between occurrences
Is there an algorithm to achieve this?
-Update-
After exchanging some comments, I came to a definition of a secondary goal:
main goal: maximize the minimum distance between elements of the same type, considering only the type(s) with the smallest distance.
secondary goal: maximize the minimum distance between elements of every type. I.e.: if a combination increases the minimum distance of a certain type without decreasing that of another, then choose it.
-Update 2-
About the answers.
There were a lot of useful answers, although none is a solution for both goals, especially the second one, which is tricky.
Some thoughts about the answers:
PengOne: Sounds good, although it doesn't provide a concrete implementation, and it doesn't always lead to the best result according to the secondary goal.
Evgeny Kluev: Provides a concrete implementation of the main goal, but it doesn't lead to the best result according to the secondary goal.
tobias_k: I liked the random approach; it doesn't always lead to the best result, but it's a good approximation and cost-effective.
I tried a combination of Evgeny Kluev's algorithm, backtracking, and tobias_k's formula, but it needed too much time to get the result.
Finally, at least for my problem, I considered tobias_k's to be the most adequate algorithm, for its simplicity and good results in a timely fashion. It could probably be improved using simulated annealing.
First, you don't have a well-defined optimization problem yet. If you want to maximize the minimum distance between two items of the same type, that's well defined. If you want to maximize the minimum distance between two A's and between two B's and ... and between two Z's, then that's not well defined. How would you compare two solutions:
A's are at least 4 apart, B's at least 4 apart, and C's at least 2 apart
A's at least 3 apart, B's at least 3 apart, and C's at least 4 apart
You need a well-defined measure of "good" (or, more accurately, "better"). I'll assume for now that the measure is: maximize the minimum distance between any two of the same item.
Here's an algorithm that achieves a minimum distance of ceiling(N/n(A)) where N is the total number of items and n(A) is the number of items of instance A, assuming that A is the most numerous.
Order the item types A1, A2, ..., Ak such that n(Ai) >= n(A{i+1}).
Initialize the list L to be empty.
For j from k down to 1, distribute the items of type Aj as uniformly as possible in L.
Example: Given the distribution in the question, the algorithm produces:
F
E, F
D, E, D, F
D, C, E, D, C, F
B, D, C, E, B, D, C, F, B
A, B, D, A, C, E, A, B, D, A, C, F, A, B
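Here is a rough Python sketch of this procedure; the even-spacing rule (step = (new_len - 1) // (count - 1)) is my own interpretation of "as uniformly as possible", and ties are broken differently than in the example above:

def spread(counts):
    # Place types from least to most numerous; each new type's items
    # are spaced as evenly as possible over the grown list.
    layout = []
    for t in sorted(counts, key=lambda t: counts[t]):
        c = counts[t]
        new_len = len(layout) + c
        step = (new_len - 1) // (c - 1) if c > 1 else new_len
        slots = {i * step for i in range(c)}
        old = iter(layout)
        layout = [t if i in slots else next(old) for i in range(new_len)]
    return layout

print(spread({'A': 5, 'B': 3, 'C': 2, 'D': 2, 'E': 1, 'F': 1}))
# ['A', 'B', 'D', 'A', 'C', 'F', 'A', 'B', 'E', 'A', 'C', 'D', 'A', 'B']

This is only a sketch of the idea; for very dense types the simple step rule can still place items closer together than the optimum.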
This sounded like an interesting problem, so I just gave it a try. Here's my super-simplistic randomized approach, done in Python:
import random

def optimize(items, quality_function, stop=1000):
    no_improvement = 0
    best = 0
    while no_improvement < stop:
        i = random.randint(0, len(items) - 1)
        j = random.randint(0, len(items) - 1)
        copy = items[::]
        copy[i], copy[j] = copy[j], copy[i]
        q = quality_function(copy)
        if q > best:
            items, best = copy, q
            no_improvement = 0
        else:
            no_improvement += 1
    return items
As already discussed in the comments, the really tricky part is the quality function, passed as a parameter to the optimizer. After some experimenting I came up with one that almost always yields optimal results. Thanks to pmoleri for pointing out how to make this a whole lot more efficient.
def quality_maxmindist(items):
    s = 0
    for item in set(items):
        indcs = [i for i in range(len(items)) if items[i] == item]
        if len(indcs) > 1:
            s += sum(1. / (indcs[i + 1] - indcs[i]) for i in range(len(indcs) - 1))
    return 1. / s
And here is a sample result:
>>> print(optimize(items, quality_maxmindist))
['A', 'B', 'C', 'A', 'D', 'E', 'A', 'B', 'F', 'C', 'A', 'D', 'B', 'A']
Note that, passing another quality function, the same optimizer could be used for different list-rearrangement tasks, e.g. as a (rather silly) randomized sorter.
Here is an algorithm that only maximizes the minimum distance between elements of the same type and does nothing beyond that. The following list is used as an example:
AAAAA BBBBB CCCC DDDD EEEE FFF GG
Sort the element sets by the number of elements of each type, in descending order. Actually, only the largest sets (A & B) need to be placed at the head of the list, along with those sets that have one element less (C & D & E); the other sets may stay unsorted.
Reserve the last R positions in the array for one element from each of the largest sets, then divide the remaining array evenly between the S - 1 remaining elements of the largest sets. This gives the optimal distance: K = (N - R) / (S - 1). Represent the target array as a 2D matrix with K columns and L = N / K full rows (and possibly one partial row with N % K elements). For the example sets we have R = 2, S = 5, N = 27, K = 6, L = 4.
If the matrix has S - 1 full rows, fill the first R columns of this matrix with elements of the largest sets (A & B); otherwise, sequentially fill all columns, starting from the last one.
For our example this gives:
AB....
AB....
AB....
AB....
AB.
If we try to fill the remaining columns with other sets in the same order, there is a problem:
ABCDE.
ABCDE.
ABCDE.
ABCE..
ABD
The last 'E' is only 5 positions apart from the first 'E'.
Instead, sequentially fill all columns, starting from the last one.
For our example this gives:
ABFEDC
ABFEDC
ABFEDC
ABGEDC
ABG
Returning to a linear array, we have:
ABFEDCABFEDCABFEDCABGEDCABG
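For what it's worth, here is a Python sketch of this matrix-filling procedure as I read it; it reproduces the example above, but details such as the exact column fill order are my own interpretation:

from math import ceil

def arrange(sets):
    order = sorted(sets, key=lambda t: -sets[t])     # largest sets first
    S = sets[order[0]]                               # size of the largest sets
    largest = [t for t in order if sets[t] == S]
    R = len(largest)
    N = sum(sets.values())
    K = (N - R) // (S - 1)                           # optimal distance
    rows = ceil(N / K)
    grid = {}
    # first R columns: one element of each largest set per existing row
    for c, t in enumerate(largest):
        for r in range(rows):
            if r * K + c < N:
                grid[(r, c)] = t
    # remaining elements fill the other columns, starting from the last one
    rest = [t for t in order if t not in largest for _ in range(sets[t])]
    cells = [(r, c) for c in range(K - 1, R - 1, -1)
             for r in range(rows) if r * K + c < N]
    for cell, t in zip(cells, rest):
        grid[cell] = t
    return "".join(grid[(i // K, i % K)] for i in range(N))

print(arrange({'A': 5, 'B': 5, 'C': 4, 'D': 4, 'E': 4, 'F': 3, 'G': 2}))
# ABFEDCABFEDCABFEDCABGEDCABG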
Here is an attempt to use simulated annealing for this problem (C sources): http://ideone.com/OGkkc.
I believe you could see your problem as a bunch of particles that physically repel each other. You could iterate until you reach a 'stable' configuration.
Basic pseudo-code:
force(x, y) = 0 if x.type == y.type
              1 / distance(x, y) otherwise

nextposition(x, force) = coined?(x) => same
                         else       => x + force

notconverged(row, newrow) = row != newrow   // simplistically

row = [a, b, a, b, b, b, a, e]
newrow = nextposition(row)
while( notconverged(row, newrow) )
    row = newrow
    newrow = nextposition(row)
I don't know if it converges, but it's an idea :)
I'm sure there may be a more efficient solution, but here is one possibility for you:
First, note that it is very easy to find an ordering which produces a minimum-distance-between-items-of-same-type of 1. Just use any random ordering, and the MDBIOST will be at least 1, if not more.
So, start off with the assumption that the MDBIOST will be 2. Do a recursive search of the space of possible orderings, based on the assumption that MDBIOST will be 2. There are a number of conditions you can use to prune branches from this search. Terminate the search if you find an ordering which works.
If you find one that works, try again under the assumption that the MDBIOST will be 3. Then 4... and so on, until the search fails.
UPDATE: It would actually be better to start with a high number, because that will constrain the possible choices more. Then gradually reduce the number, until you find an ordering which works.
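A sketch of this search in Python (my own illustration; "distance" here means the difference in positions, and the feasibility pre-check is an extra pruning condition):

def ordering_with_min_dist(counts, k):
    # Backtracking: place items left to right; an item may only be placed
    # at least k positions after the previous item of the same type.
    n = sum(counts.values())
    if any((c - 1) * k > n - 1 for c in counts.values()):
        return None                      # this k is impossible, prune
    out, last = [], {}
    def rec(i):
        if i == n:
            return True
        for t in counts:
            if counts[t] > 0 and i - last.get(t, -k) >= k:
                counts[t] -= 1
                prev, last[t] = last.get(t), i
                out.append(t)
                if rec(i + 1):
                    return True
                out.pop()
                if prev is None:
                    del last[t]
                else:
                    last[t] = prev
                counts[t] += 1
        return False
    return list(out) if rec(0) else None

def best_ordering(counts):
    # Start from a high assumed distance and reduce it (see the UPDATE).
    for k in range(sum(counts.values()), 0, -1):
        result = ordering_with_min_dist(dict(counts), k)
        if result is not None:
            return k, result

print(best_ordering({'A': 5, 'B': 3, 'C': 2, 'D': 2, 'E': 1, 'F': 1}))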
Here's another approach.
If every item must be kept at least k places from every other item of the same type, then write down items from left to right, keeping track of the number of items left of each type. At each point put down an item with the largest number left that you can legally put down.
This will work for N items if there are no more than ceil(N / k) items of the same type, as it preserves this property: after putting down k items, we have k fewer items left, and we have put down at least one of each type that started with ceil(N / k) items.
Given a clutch of mixed items you could work out the largest k you can support and then lay out the items to solve for this k.
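A quick sketch of this greedy layout (my own illustration):

from collections import Counter

def greedy_layout(items, k):
    # Always put down the type with the most items left that has not
    # been used in the previous k-1 positions.
    counts = Counter(items)
    out = []
    for _ in range(len(items)):
        recent = set(out[-(k - 1):]) if k > 1 else set()
        legal = [t for t in counts if counts[t] > 0 and t not in recent]
        if not legal:
            return None          # k is too large for this multiset
        t = max(legal, key=counts.get)
        out.append(t)
        counts[t] -= 1
    return out

print(greedy_layout(list("AAAAABBBCCDDEF"), 3))
# ['A', 'B', 'C', 'A', 'B', 'D', 'A', 'B', 'C', 'A', 'D', 'E', 'A', 'F']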
I have a symmetric matrix, like the one shown in the image attached below.
I've made up the notation A.B which represents the value at grid point (A, B). Furthermore, writing A.B.C gives me the minimum grid point value like so: MIN((A,B), (A,C), (B,C)).
As another example A.B.D gives me MIN((A,B), (A,D), (B,D)).
My goal is to find the minimum values for ALL combinations of letters (not repeating) for one row at a time, e.g. for this example I need to find the min values with respect to row A, which are given by the calculations:
A.B = 6
A.C = 8
A.D = 4
A.B.C = MIN(6,8,6) = 6
A.B.D = MIN(6, 4, 4) = 4
A.C.D = MIN(8, 4, 2) = 2
A.B.C.D = MIN(6, 8, 4, 6, 4, 2) = 2
I realize that certain calculations can be reused, which becomes increasingly important as the matrix size increases, but the problem is finding the most efficient way to implement this reuse.
Can anyone point me in the right direction to an efficient algorithm/data structure I can use for this problem?
You'll want to think about the lattice of subsets of the letters, ordered by inclusion. Essentially, you have a value f(S) given for every subset S of size 2 (that is, every off-diagonal element of the matrix - the diagonal elements don't seem to occur in your problem), and the problem is to find, for each subset T of size greater than two, the minimum f(S) over all S of size 2 contained in T. (And then you're interested only in sets T that contain a certain element "A" - but we'll disregard that for the moment.)
First of all, note that if you have n letters, this amounts to asking Omega(2^n) questions, roughly one for each subset. (Excluding the zero- and one-element subsets, and those that don't include "A", saves you n + 1 sets and a factor of two, respectively, which is allowed for under big Omega.) So if you want to store all these answers for even moderately large n, you'll need a lot of memory. If n is large in your applications, it might be best to store some collection of pre-computed data and do some computation whenever you need a particular data point; I haven't thought about what would work best, but, for example, computing data only for a binary tree contained in the lattice would not necessarily gain you anything over precomputing nothing at all.
With these things out of the way, let's assume you actually want all the answers computed and stored in memory. You'll want to compute these "layer by layer", that is, starting with the three-element subsets (since the two-element subsets are already given by your matrix), then four-element, then five-element, etc. This way, for a given subset S, when we're computing f(S) we will already have computed all f(T) for T strictly contained in S. There are several ways you can make use of this, but I think the easiest might be to use two such subsets: let t1 and t2 be two different elements of T that you may select however you like; let S be the subset of T that you get when you remove t1 and t2. Write S1 for S plus t1 and write S2 for S plus t2. Now every pair of letters contained in T is either fully contained in S1, or fully contained in S2, or it is {t1, t2}. Look up f(S1) and f(S2) in your previously computed values, look up f({t1, t2}) directly in the matrix, and store f(T) = the minimum of these 3 numbers.
If you never select "A" for t1 or t2, then you can compute everything you're interested in while not computing f for any set T that doesn't contain "A". (This is possible because the steps outlined above only apply when T contains at least three elements.) Good! This leaves just one question: how to store the computed values f(T). What I would do is use a 2^(n-1)-sized array; represent each subset-of-your-alphabet-that-includes-"A" by the (n-1)-bit number where the ith bit is 1 whenever the (i+1)th letter is in that set (so 0010110, which has bits 1, 2, and 4 set, represents the subset {"A", "C", "D", "F"} out of the alphabet "A" .. "H"; note I'm counting bits starting at 0 from the right, and letters starting at "A" = 0). This way, you can actually iterate through the sets in numerical order and don't need to think about how to iterate through all k-element subsets of an n-element set. (You do need to include a special case for when the set under consideration has 0 or 1 elements, in which case you do nothing, or 2 elements, in which case you just copy the value from the matrix.)
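Here is a Python sketch of this scheme, assuming the matrix is given as a full symmetric 2D list M with letter 0 playing the role of "A" (the names here are mine):

def min_pair_values(M):
    # f[p]: minimum pair value over the subset {A} plus the letters in p,
    # where bit i of p represents letter i + 1 ("B" = bit 0, and so on).
    n = len(M)
    f = [0] * (1 << (n - 1))
    for p in range(1, 1 << (n - 1)):        # increasing order: subsets come first
        bits = [i for i in range(n - 1) if p >> i & 1]
        if len(bits) == 1:
            f[p] = M[0][bits[0] + 1]        # two-element set {A, x}
        else:
            t1, t2 = bits[0], bits[1]       # never "A", as required
            f[p] = min(f[p & ~(1 << t1)],   # f(S2) = f(T minus t1)
                       f[p & ~(1 << t2)],   # f(S1) = f(T minus t2)
                       M[t1 + 1][t2 + 1])   # the pair {t1, t2} itself
    return f

# The matrix from the question, over A, B, C, D (diagonal unused):
M = [[0, 6, 8, 4],
     [6, 0, 6, 4],
     [8, 6, 0, 2],
     [4, 4, 2, 0]]
f = min_pair_values(M)
print(f[0b011])   # A.B.C = 6
print(f[0b111])   # A.B.C.D = 2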
Well, it looks simple to me, but perhaps I misunderstand the problem. I would do it like this:
let P be a pattern string in your notation X1.X2. ... .Xn, where Xi is a column in your matrix
first compute the array CS = [ (X1, X2), (X1, X3), ... (X1, Xn) ], which contains all combinations of X1 with every other element in the pattern; CS has n-1 elements, and you can easily build it in O(n)
now you must compute min(CS), i.e. find the minimum value of the matrix elements corresponding to the combinations in CS; again, you can easily find the minimum value in O(n)
done.
Note: since your matrix is symmetric, given P you just need to compute CS by combining the first element of P with all other elements: (X1, Xi) is equal to (Xi, X1)
If your matrix is very large, and you want to do some optimization, you may consider prefixes of P: let me explain with an example
when you have solved the problem for P = X1.X2.X3, store the result in an associative map, where X1.X2.X3 is the key
later on, when you solve a problem P' = X1.X2.X3.X7.X9.X10.X11 you search for the longest prefix of P' in your map: you can do this by starting with P' and removing one component (Xi) at a time from the end until you find a match in your map or you end up with an empty string
if you find a prefix of P' in your map, then you already know the solution for that sub-problem, so you just have to find the solution for the problem resulting from combining the first element of the prefix with the suffix, and then compare the two results: in our example the prefix is X1.X2.X3, and so you just have to solve the problem for
X1.X7.X9.X10.X11, and then compare the two values and choose the min (don't forget to update your map with the new pattern P')
if you don't find any prefix, then you must solve the entire problem for P' (and again don't forget to update the map with the result, so that you can reuse it in the future)
This technique is essentially a form of memoization.
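Here is a rough sketch of this memoized scheme in Python, following this answer's reading of the problem (only pairs involving X1 matter; the names are mine):

def min_for_pattern(M, pattern, memo):
    # pattern: list of column indices [X1, X2, ..., Xn]; the value is the
    # minimum over the pairs (X1, Xi). Solved patterns are cached in memo.
    key = tuple(pattern)
    if key in memo:
        return memo[key]
    # look for the longest already-solved prefix, dropping one
    # component at a time from the end
    for end in range(len(pattern) - 1, 1, -1):
        prefix = tuple(pattern[:end])
        if prefix in memo:
            rest = [pattern[0]] + pattern[end:]
            memo[key] = min(memo[prefix], min_for_pattern(M, rest, memo))
            return memo[key]
    # no known prefix: solve directly with the O(n) scan
    memo[key] = min(M[pattern[0]][x] for x in pattern[1:])
    return memo[key]

M = [[0, 6, 8, 4],
     [6, 0, 6, 4],
     [8, 6, 0, 2],
     [4, 4, 2, 0]]
memo = {}
print(min_for_pattern(M, [0, 1, 2], memo))     # min(M[0][1], M[0][2]) = 6
print(min_for_pattern(M, [0, 1, 2, 3], memo))  # reuses the cached prefix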