tea bag flavors mixing algorithm

tea bag flavors mixing algorithm - algorithm

I bought tree boxes of tea bags with differents flavors (A, B, C).
I wish to mix them in such a way that - there is never two consecutive bags of the same flavor (ABCCAB is avoided) ; - the mixing is the "most" random, i.e. avoid patterns such as ABCABCABC... or ABABAB...BCBCBC...CACACA.
Is there a known algorithm for this mix ?
Presently I randomly shuffle many "ABC" and concatenate the results, swapping the first letters if the latest letter of the previous shuffle is the same than the beginning of the new shuffle (...ABCCAB => ...ABCACB).
I guess I could improve this algorithm by pre-computing the permutations of ABC, and draw one permutations among the ones who do not begin with the same letter than the previous permutation.
I tried to "google" this problem but as a French native speaker, I probably miss the appropriate key-words.
PS : I posted this question on scicomp.stackexchange.com previously, and being advised to duplicate it here.

Something like this should work :
amount_of_teabags_per_flavour = x
choices = {
A : amount_of_teabags_per_flavour,
B : amount_of_teabags_per_flavour,
C : amount_of_teabags_per_flavour
}
previous_choice = 0
picks_left = amount_of_teabags_per_flavour * choices.size
function select_available_choices() :
mandatory_choice = [ key from choices where key != previous_choice and value <= picks_left/2 ]
if mandatory_choice == [] :
available_choices = [ key from choices where key != previous_choice and value > 0 ]
otherwise
available_choices = mandatory_choice
result = []
select_available_choices()
while available_choices != [] :
choice = pick_randomly_from(available_choices)
result[last] = choice
previous_choice = choice
choices[choice]--
picks_left--
select_available_choices()

You can start with one of the three letters A, B or C at random for the first flavor, then compute the next one from one of the two other possibilities (since you don't want flavors to repeat).
Quick script in python3:
from random import randint
rand = lambda x: randint(0,x)
flavors = ['A', 'B', 'C']
freq = [0, 0, 0]
index = rand(2)
mix = flavors[index]
for i in range(1000):
index = (index + 1 + rand(1)) % 3
mix += flavors[index]
freq[index] += 1
print(mix)
print(freq)
You can see that the randomness is good (given the frequencies, and a wide enough range) and no character is repeated twice in a row.

Avoiding repetition at the boundary, is a graph problem. You have a directed edge from each node (permutation) to every other node ... except for the doubled boundary. You need a Hamiltonian path through the graph; solutions are readily available on line. Then duplicate or truncate that path as much as needed to match the quantity of bags in each box.
If you want to simplify the problem, then generate edges to connect each node only with a node that starts with the second bag of the previous one. For instance, with your 3-box design ...
ABC -> BAC, BCA
CBA -> BAC, BCA
BAC -> ABC, ACB
CAB -> ABC, ACB
ACB -> CAB, CBA
BCA -> CAB, CBA
Any graph built from a full permutation set has a Hamiltonian path, and it's much faster to find. A simple recursive algorithm (with backtracking) should be easy to code. If you sort your choices by second-order availability, I think the problem solves itself without backtracking.
For instance, if you are at node ABC, look at the available choices starting with B: BAC and BCA Look at the second letters of those: A gives you only one remaining choice (ABC already got used), but C still has two choices. Therefore, you move to BCA. If your top choices are tied, then decide any way you wish ... preserving your ability to return to the starting node at the end.
Does that solve your problem?

Related

Creating a list of all possible combinations from a set of items for n combination sizes

Apologies in advance if the wording of my question is confusing. I've been having lots of trouble trying to explain it.
Basically I'm trying to write an algorithm that will take in a set of items, for example, the letters in the alphabet and a combination size limit (1,2,3,4...) and will produce all the possible combinations for each size limit.
So for example lets say our set of items was chars A,B,C,D,E and my combination limit was 3, the result I would have would be:
A,
AB, AC, AD, AE,
ABC, ABD, ABE, ACD, ACE, ADE,
B,
BC, BD, BE,
BCD, BCE, BDE,
C,
CD, CE,
CDE,
D,
DE,
E
Hopefully that makes sense.
For the context, I want to use this for my game to generate army compositions with limits to how many different types of units they will be composed of. I don't want to have to do it manually!
Could I please gets some advice?

A recursion can do the job. The idea is to choose a letter, print it as a possibility and combine it with all letters after it:
#include <bits/stdc++.h>
using namespace std;
string letters[] = {"A", "B", "C", "D", "E"};
int alphabetSize = 5;
int combSizeLim = 3;
void gen(int index = 0, int combSize = 0, string comb = ""){
if(combSize > combSizeLim) return;
cout<<comb<<endl;
for(int i = index; i < alphabetSize; i++){
gen(i + 1, combSize + 1, comb + letters[i]);
}
}
int main(){
gen();
return 0;
}
OUTPUT:
A
AB
ABC
ABD
ABE
AC
ACD
ACE
AD
ADE
AE
B
BC
BCD
BCE
BD
BDE
BE
C
CD
CDE
CE
D
DE
E

Here's a simple recursive solution. (The recursion depth is limited to the length of the set, and that cannot be too big or there will be too many combinations. But if you think it will be a problem, it's not that hard to convert it to an iterative solution by using your own stack, again of the same size as the set.)
I'm using a subset of Python as pseudo-code here. In real Python, I would have written a generator instead of passing collection through the recursion.
def gen_helper(collection, elements, curr_element, max_elements, prefix):
if curr_element == len(elements) or max_elements == 0:
collection.append(prefix)
else:
gen_helper(collection, elements, curr_element + 1,
max_elements - 1, prefix + [elements[curr_element]])
gen_helper(collection, elements, curr_element + 1,
max_elements, prefix)
def generate(elements, max_elements):
collection = []
gen_helper(collection, elements, 0, max_elements, [])
return collection
The working of the recursive function (gen_helper) is really simple. It is given a prefix of elements already selected, the index of an element to consider, and the number of elements still allowed to be selected.
If it can't select any more elements, it must choose to just add the current prefix to the accumulated result. That will happen if:
The scan has reached the end of the list of elements, or
The number of elements allowed to be added has reached 0.
Otherwise, it has precisely two options: either it selects the current element or it doesn't. If it chooses to select, it must continue the scan with a reduced allowable count (since it has used up one possible element). If it chooses not to select, it must continue the scan with the same count.
Since we want all possible combinations (as opposed to, say, a random selection of valid combinations), we need to make both choices, one after the other.

Implementing cartesian product, such that it can skip iterations

I want to implement a function which will return cartesian product of set, repeated given number. For example
input: {a, b}, 2
output:
aa
ab
bb
ba
input: {a, b}, 3
aaa
aab
aba
baa
bab
bba
bbb
However the only way I can implement it is firstly doing cartesion product for 2 sets("ab", "ab), then from the output of the set, add the same set. Here is pseudo-code:
function product(A, B):
result = []
for i in A:
for j in B:
result.append([i,j])
return result
function product1(chars, count):
result = product(chars, chars)
for i in range(2, count):
result = product(result, chars)
return result
What I want is to start computing directly the last set, without computing all of the sets before it. Is this possible, also a solution which will give me similar result, but it isn't cartesian product is acceptable.
I don't have problem reading most of the general purpose programming languages, so if you need to post code you can do it in any language you fell comfortable with.

Here's a recursive algorithm that builds S^n without building S^(n-1) "first". Imagine an infinite k-ary tree where |S| = k. Label with the elements of S each of the edges connecting any parent to its k children. An element of S^m can be thought of as any path of length m from the root. The set S^m, in that way of thinking, is the set of all such paths. Now the problem of finding S^n is a problem of enumerating all paths of length n - and we can name a path by considering the sequence of edge labels from beginning to end. We want to directly generate S^n without first enumerating all of S^(n-1), so a depth-first search modified to find all nodes at depth n seems appropriate. This is essentially how the below algorithm works:
// collection to hold generated output
members = []
// recursive function to explore product space
Products(set[1...n], length, current[1...m])
// if the product we're working on is of the
// desired length then record it and return
if m = length then
members.append(current)
return
// otherwise we add each possible value to the end
// and generate all products of the desired length
// with the new vector as a prefix
for i = 1 to n do
current.addLast(set[i])
Products(set, length, current)
currents.removeLast()
// reset the result collection and request the set be generated
members = []
Products([a, b], 3, [])
Now, a breadth-first approach is no less efficient than a depth-first one, and if you think about it would be no different from exactly what you're already doing. Indeed, and approach that generates S^n must necessarily generate S^(n-1) at least once, since that can be found in a solution to S^n.

Algorithm to separate items of the same type

I have a list of elements, each one identified with a type, I need to reorder the list to maximize the minimum distance between elements of the same type.
The set is small (10 to 30 items), so performance is not really important.
There's no limit about the quantity of items per type or quantity of types, the data can be considered random.
For example, if I have a list of:
5 items of A
3 items of B
2 items of C
2 items of D
1 item of E
1 item of F
I would like to produce something like:
A, B, C, A, D, F, B, A, E, C, A, D, B, A
A has at least 2 items between occurences
B has at least 4 items between occurences
C has 6 items between occurences
D has 6 items between occurences
Is there an algorithm to achieve this?
-Update-
After exchanging some comments, I came to a definition of a secondary goal:
main goal: maximize the minimum distance between elements of the same type, considering only the type(s) with less distance.
secondary goal: maximize the minimum distance between elements on every type. IE: if a combination increases the minimum distance of a certain type without decreasing other, then choose it.
-Update 2-
About the answers.
There were a lot of useful answers, although none is a solution for both goals, specially the second one which is tricky.
Some thoughts about the answers:
PengOne: Sounds good, although it doesn't provide a concrete implementation, and not always leads to the best result according to the second goal.
Evgeny Kluev: Provides a concrete implementation to the main goal, but it doesn't lead to the best result according to the secondary goal.
tobias_k: I liked the random approach, it doesn't always lead to the best result, but it's a good approximation and cost effective.
I tried a combination of Evgeny Kluev, backtracking, and tobias_k formula, but it needed too much time to get the result.
Finally, at least for my problem, I considered tobias_k to be the most adequate algorithm, for its simplicity and good results in a timely fashion. Probably, it could be improved using Simulated annealing.

First, you don't have a well-defined optimization problem yet. If you want to maximized the minimum distance between two items of the same type, that's well defined. If you want to maximize the minimum distance between two A's and between two B's and ... and between two Z's, then that's not well defined. How would you compare two solutions:
A's are at least 4 apart, B's at least 4 apart, and C's at least 2 apart
A's at least 3 apart, B's at least 3 apart, and C's at least 4 apart
You need a well-defined measure of "good" (or, more accurately, "better"). I'll assume for now that the measure is: maximize the minimum distance between any two of the same item.
Here's an algorithm that achieves a minimum distance of ceiling(N/n(A)) where N is the total number of items and n(A) is the number of items of instance A, assuming that A is the most numerous.
Order the item types A1, A2, ... , Ak where n(Ai) >= n(A{i+1}).
Initialize the list L to be empty.
For j from k to 1, distribute items of type Ak as uniformly as possible in L.
Example: Given the distribution in the question, the algorithm produces:
F
E, F
D, E, D, F
D, C, E, D, C, F
B, D, C, E, B, D, C, F, B
A, B, D, A, C, E, A, B, D, A, C, F, A, B

This sounded like an interesting problem, so I just gave it a try. Here's my super-simplistic randomized approach, done in Python:
def optimize(items, quality_function, stop=1000):
no_improvement = 0
best = 0
while no_improvement < stop:
i = random.randint(0, len(items)-1)
j = random.randint(0, len(items)-1)
copy = items[::]
copy[i], copy[j] = copy[j], copy[i]
q = quality_function(copy)
if q > best:
items, best = copy, q
no_improvement = 0
else:
no_improvement += 1
return items
As already discussed in the comments, the really tricky part is the quality function, passed as a parameter to the optimizer. After some trying I came up with one that almost always yields optimal results. Thank to pmoleri, for pointing out how to make this a whole lot more efficient.
def quality_maxmindist(items):
s = 0
for item in set(items):
indcs = [i for i in range(len(items)) if items[i] == item]
if len(indcs) > 1:
s += sum(1./(indcs[i+1] - indcs[i]) for i in range(len(indcs)-1))
return 1./s
And here some random result:
>>> print optimize(items, quality_maxmindist)
['A', 'B', 'C', 'A', 'D', 'E', 'A', 'B', 'F', 'C', 'A', 'D', 'B', 'A']
Note that, passing another quality function, the same optimizer could be used for different list-rearrangement tasks, e.g. as a (rather silly) randomized sorter.

Here is an algorithm that only maximizes the minimum distance between elements of the same type and does nothing beyond that. The following list is used as an example:
AAAAA BBBBB CCCC DDDD EEEE FFF GG
Sort element sets by number of elements of each type in descending order. Actually only largest sets (A & B) should be placed to the head of the list as well as those element sets that have one element less (C & D & E). Other sets may be unsorted.
Reserve R last positions in the array for one element from each of the largest sets, divide the remaining array evenly between the S-1 remaining elements of the largest sets. This gives optimal distance: K = (N - R) / (S - 1). Represent target array as a 2D matrix with K columns and L = N / K full rows (and possibly one partial row with N % K elements). For example sets we have R = 2, S = 5, N = 27, K = 6, L = 4.
If matrix has S - 1 full rows, fill first R columns of this matrix with elements of the largest sets (A & B), otherwise sequentially fill all columns, starting from last one.
For our example this gives:
AB....
AB....
AB....
AB....
AB.
If we try to fill the remaining columns with other sets in the same order, there is a problem:
ABCDE.
ABCDE.
ABCDE.
ABCE..
ABD
The last 'E' is only 5 positions apart from the first 'E'.
Sequentially fill all columns, starting from last one.
For our example this gives:
ABFEDC
ABFEDC
ABFEDC
ABGEDC
ABG
Returning to linear array we have:
ABFEDCABFEDCABFEDCABGEDCABG
Here is an attempt to use simulated annealing for this problem (C sources): http://ideone.com/OGkkc.

I believe you could see your problem like a bunch of particles that physically repel eachother. You could iterate to a 'stable' situation.
Basic pseudo-code:
force( x, y ) = 0 if x.type==y.type
1/distance(x,y) otherwise
nextposition( x, force ) = coined?(x) => same
else => x + force
notconverged(row,newrow) = // simplistically
row!=newrow
row=[a,b,a,b,b,b,a,e];
newrow=nextposition(row);
while( notconverged(row,newrow) )
newrow=nextposition(row);
I don't know if it converges, but it's an idea :)

I'm sure there may be a more efficient solution, but here is one possibility for you:
First, note that it is very easy to find an ordering which produces a minimum-distance-between-items-of-same-type of 1. Just use any random ordering, and the MDBIOST will be at least 1, if not more.
So, start off with the assumption that the MDBIOST will be 2. Do a recursive search of the space of possible orderings, based on the assumption that MDBIOST will be 2. There are a number of conditions you can use to prune branches from this search. Terminate the search if you find an ordering which works.
If you found one that works, try again, under the assumption that MDBIOST will be 3. Then 4... and so on, until the search fails.
UPDATE: It would actually be better to start with a high number, because that will constrain the possible choices more. Then gradually reduce the number, until you find an ordering which works.

Here's another approach.
If every item must be kept at least k places from every other item of the same type, then write down items from left to right, keeping track of the number of items left of each type. At each point put down an item with the largest number left that you can legally put down.
This will work for N items if there are no more than ceil(N / k) items of the same type, as it will preserve this property - after putting down k items we have k less items and we have put down at least one of each type that started with at ceil(N / k) items of that type.
Given a clutch of mixed items you could work out the largest k you can support and then lay out the items to solve for this k.

string of integers puzzle

I apologize for not have the math background to put this question in a more formal way.
I'm looking to create a string of 796 letters (or integers) with certain properties.
Basically, the string is a variation on a De Bruijn sequence B(12,4), except order and repetition within each n-length subsequence are disregarded.
i.e. ABBB BABA BBBA are each equivalent to {AB}.
In other words, the main property of the string involves looking at consecutive groups of 4 letters within the larger string
(i.e. the 1st through 4th letters, the 2nd through 5th letters, the 3rd through 6th letters, etc)
And then producing the set of letters that comprise each group (repetitions and order disregarded)
For example, in the string of 9 letters:
A B B A C E B C D
the first 4-letter groups is: ABBA, which is comprised of the set {AB}
the second group is: BBAC, which is comprised of the set {ABC}
the third group is: BACE, which is comprised of the set {ABCE}
etc.
The goal is for every combination of 1-4 letters from a set of N letters to be represented by the 1-4-letter resultant sets of the 4-element groups once and only once in the original string.
For example, if there is a set of 5 letters {A, B, C, D, E} being used
Then the possible 1-4 letter combinations are:
A, B, C, D, E,
AB, AC, AD, AE, BC, BD, BE, CD, CE, DE,
ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE,
ABCD, ABCE, ABDE, ACDE, BCDE
Here is a working example that uses a set of 5 letters {A, B, C, D, E}.
D D D D E C B B B B A E C C C C D A E E E E B D A A A A C B D D B
The 1st through 4th elements form the set: D
The 2nd through 5th elements form the set: DE
The 3rd through 6th elements form the set: CDE
The 4th through 7th elements form the set: BCDE
The 5th through 8th elements form the set: BCE
The 6th through 9th elements form the set: BC
The 7th through 10th elements form the set: B
etc.
* I am hoping to find a working example of a string that uses 12 different letters (a total of 793 4-letter groups within a 796-letter string) starting (and if possible ending) with 4 of the same letter. *
Here is a working solution for 7 letters:
AAAABCDBEAAACDECFAAADBFBACEAGAADEFBAGACDFBGCCCCDGEAFAGCBEEECGFFBFEGGGGFDEEEEFCBBBBGDCFFFFDAGBEGDDDDBE

Beware that in order to attempt exhaustive search (answer in VB is trying a naive version of that) you'll first have to solve the problem of generating all possible expansions while maintaining lexicographical order. Just ABC, expands to all perms of AABC, plus all perms of ABBC, plus all perms of ABCC which is 3*4! instead of just AABC. If you just concatenate AABC and AABD it would cover just 4 out of 4! perms of AABC and even that by accident. Just this expansion will bring you exponential complexity - end of game. Plus you'll need to maintain association between all explansions and the set (the set becomes a label).
Your best bet is to use one of known efficient De Bruijn constuctors and try to see if you can put your set-equivalence in there. Check out
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.674&rep=rep1&type=pdf
and
http://www.dim.uchile.cl/~emoreno/publicaciones/FINALES/copyrighted/IPL05-De_Bruijn_sequences_and_De_Bruijn_graphs_for_a_general_language.pdf
for a start.
If you know graphs, another viable option is to start with De Bruijn graph and formulate your set-equivalence as a graph rewriting. 2nd paper does De Bruijn graph partitioning.
BTW, try VB answer just for A,B,AB (at least expansion is small) - it will make AABBAB and construct ABBA or ABBAB (or throw in a decent language) both of which are wrong. You can even prove that it will always miss with 1st lexical expansions (that's what AAB, AAAB etc. are) just by examining first 2 passes (it will always miss 2nd A for NxA because (N-1)xA+B is in the string (1st expansion of {AB}).
Oh and if we could establish how many of each letters an optimal soluton should have (don't look at B(5,2) it's too easy and regular :-) a random serch would be feasible - you generate candidates with provable traits (like AAAA, BBBB ... are present and not touching and is has n1 A-s, n2 B-s ...) and random arrangement and then test whether they are solutions (checking is much faster than exhaustive search in this case).

Cool problem. Just a draft/psuedo algo:
dim STR-A as string = getall(ABCDEFGHIJKL)
//custom function to generate concat list of all 793 4-char combos.
//should be listed side-by-side to form 3172 character-long string.
//different ordering may ultimately produce different results.
//brute-forcing all orders of combos is too much work (793! is a big #).
//need to determine how to find optimal ordering, for this particular
//approach below.
dim STR-B as string = "" // to hold the string you're searching for
dim STR-C as string = "" // to hold the sub-string you are searching in
dim STR-A-NEW as string = "" //variable to hold your new string
dim MATCH as boolean = false //variable to hold matching status
while len(STR-A) > 0
//check each character in STR-A, which will be shorted by 1 char on each
//pass.
MATCH = false
STR-B = left(STR-A, 4)
STR-B = reduce(STR-B)
//reduce(str) is a custom re-usable function to sort & remove duplicates
for i as integer = 1 to len((STR-A) - 1)
STR-C = substr(STR-A, i, 4)
//gives you the 4-character sequence beginning at position i
STR-C = reduce(STR-C)
IF STR-B = STR-C Then
MATCH = true
exit for
//as long as there is even one match, you can throw-away the first
//letter
END IF
i = i+1
next
IF match = false then
//if you didn't find a match, then the first letter should be saved
STR-A-NEW += LEFT(STR-B, 1)
END IF
MATCH = false //re-init MATCH
STR-A = RIGHT(STR-A, LEN(STR-A) - 1) //re-init STR_A
wend
Anyway -- there could be problems at this, and you'd need to write another function to parse your result string (STR-A-NEW) to prove that it's a viable answer...

I've been thinking about this one and I'm sketching out a solution.
Let's call a string of four symbols a word and we'll write S(w) to denote the set of symbols in word w.
Each word abcd has "follow-on" words bcde where a,...,e are all symbols.
Let succ(w) be the set of follow-on words v for w such that S(w) != S(v). succ(w) is the set of successor words that can follow on from the first symbol in w if w is in a solution.
For each non-empty set of symbols s of cardinality at most four, let words(s) be the set of words w such that S(w) = s. Any solution must contain exactly one word in words(s) for each such set s.
Now we can do a reasonable search. The basic idea is this: say we are exploring a search path ending with word w. The follow-on word must be a non-excluded word in succ(w). A word v is excluded if the search path contains some word w such that v in words(S(w)).
You can be slightly more cunning: if we track the possible "predecessor" words to a set s (i.e., words w with a successor v such that v in words(s)) and reach a point where every predecessor of s is excluded, then we know we have reached a dead end, since we'll never be able to obtain s from any extension of the current search path.
Code to follow after the weekend, with a bit of luck...

Here is my proposal. I'll admit upfront this is a performance and memory hog.
This may be overkill, but have a class We'll call it UniqueCombination This will contain a unique 1-4 char reduced combination of the input set (i.e. A,AB,ABC,...) This will also contain a list of possible combination (AB {AABB,ABAB,BBAA,...}) this will need a method that determines if any possible combination overlaps any possible combination of another UniqueCombination by three characters. Also need a override that takes a string as well.
Then we start with the string "AAAA" then we find all of the UniqueCombinations that overlap this string. Then we find how many uniqueCombinations those possible matches overlap with. (we could be smart at this point an store this number.) Then we pick the one with the least number of overlaps greater than 0. Use up the ones with the least possible matches first.
Then we find a specific combination for the chosen UniqueCombination and add it to the final string. Remove this UniqueCombination from the list, then as we find overlaps for current string. rinse and repeat. (we could be smart and on subsequent runs while searching for overlaps we could remove any of the unreduced combination that are contained in the final string.)
Well that's my plan I will work on the code this weekend. Granted this does not guarantee that the final 4 characters will be 4 of the same letter (it might actually be trying to avoid that but I will look into that as well.)

If there is a non-exponential solution at all it may need to be formulated in terms of a recursive "growth" from a problem with a smaller size i.e to contruct B(N,k) from B(N-1,k-1) or from B(N-1,k) or from B(N,k-1).
Systematic construction for B(5,2) - one step at the time :-) It's bound to get more complex latter [card stands for cardinality, {AB} has card=2, I'll also call them 2-s, 3-s etc.] Note, 2-s and 3-s will be k-1 and k latter (I hope).
Initial. Start with k-1 result and inject symbols for singletons
(unique expansion empty intersection):
ABCDE -> AABBCCDDEE
mark used card=2 sets: AB,BC,CD,DE
Rewriting. Form card=3 sets to inject symbols into marked card=2.
1st feasible lexicographic expansion fires (may have to backtrack for k>2)
it's OK to use already marked 2-s since they'll all get replaced
but may have to do a verification pass for higher k
AB->ACB, BC->BCD, CD->CED, DE->DAE ==> AACBBDCCEDDAEEB
mark/verify used 2s
normally keep marking/unmarking during the construction but also keep keep old
mark list
marking/unmarking can get expensive if there's backtracking in #3
Unused: AB, BE
For higher k may need several recursive rewriting passes
possibly partitioning new sets into classes
Finalize: unused 2-s should overlap around the edge (that's why it's cyclic)
ABE - B can go to the begining or and: AACBBDCCEDDAEEB
Note: a step from B(N-1,k) to B(N,k) may need injection of pseudo-signletons, like doubling or trippling A
B(5,2) -> B(5,3) - B(5,4)
Initial. same: - ABCDE -> AAACBBBDCCCEDDDAEEEB
no use of marking 3-sets since they are all going to be chenged
Rewriting.
choose systematic insertion positions
AAA_CBBB_DCCC_EDDD_AEEE_B
mark all 2-s released by this: AC,AD,BD,BE,CE
use marked 2-s to decide inserted symbols - totice total regularity:
AxCB D -> ADCB
BxDC E -> BEDC
CxED A -> CAED
DxAE B => DBAE
ExBA C -> ECBA
Verify that 3-s are all used (marked inserted symbols just for fun)
AAA[D]CBBB[E]DCCC[A]EDDD[B]AEEE[C]B
Note: Systematic choice if insertion point deterministically dictated insertions (only AD can fit 1st, AC would create duplicate 2-set (AAC, ACC))
Note: It's not going to be so nice for B(6,2) and B(6,3) since number of 2-s will exceede 2x the no of 1-s. This is important since 2-s sit naturally on the sides of 1-s like CBBBE and the issue is how to place them when you run out of 1-s.
B(5,3) is so symetrical that just repeating #1 produces B(5.4):
AAAADCBBBBEDCCCCAEDDDDBAEEEECB

Shortest path to transform one word into another

For a Data Structures project, I must find the shortest path between two words (like "cat" and "dog"), changing only one letter at a time. We are given a Scrabble word list to use in finding our path. For example:
cat -> bat -> bet -> bot -> bog -> dog
I've solved the problem using a breadth first search, but am seeking something better (I represented the dictionary with a trie).
Please give me some ideas for a more efficient method (in terms of speed and memory). Something ridiculous and/or challenging is preferred.
I asked one of my friends (he's a junior) and he said that there is no efficient solution to this problem. He said I would learn why when I took the algorithms course. Any comments on that?
We must move from word to word. We cannot go cat -> dat -> dag -> dog. We also have to print out the traversal.

NEW ANSWER
Given the recent update, you could try A* with the Hamming distance as a heuristic. It's an admissible heuristic since it's not going to overestimate the distance
OLD ANSWER
You can modify the dynamic-program used to compute the Levenshtein distance to obtain the sequence of operations.
EDIT: If there are a constant number of strings, the problem is solvable in polynomial time. Else, it's NP-hard (it's all there in wikipedia) .. assuming your friend is talking about the problem being NP-hard.
EDIT: If your strings are of equal length, you can use Hamming distance.

With a dictionary, BFS is optimal, but the running time needed is proportional to its size (V+E). With n letters, the dictionary might have ~a^n entires, where a is alphabet size. If the dictionary contains all words but the one that should be on the end of chain, then you'll traverse all possible words but won't find anything. This is graph traversal, but the size might be exponentially large.
You may wonder if it is possible to do it faster - to browse the structure "intelligently" and do it in polynomial time. The answer is, I think, no.
The problem:
You're given a fast (linear) way to check if a word is in dictionary, two words u, v and are to check if there's a sequence u -> a1 -> a2 -> ... -> an -> v.
is NP-hard.
Proof: Take some 3SAT instance, like
(p or q or not r) and (p or not q or r)
You'll start with 0 000 00 and are to check if it is possible to go to 2 222 22.
The first character will be "are we finished", three next bits will control p,q,r and two next will control clauses.
Allowed words are:
Anything that starts with 0 and contains only 0's and 1's
Anything that starts with 2 and is legal. This means that it consists of 0's and 1's (except that the first character is 2, all clauses bits are rightfully set according to variables bits, and they're set to 1 (so this shows that the formula is satisfable).
Anything that starts with at least two 2's and then is composed of 0's and 1's (regular expression: 222* (0+1)*, like 22221101 but not 2212001
To produce 2 222 22 from 0 000 00, you have to do it in this way:
(1) Flip appropriate bits - e.g. 0 100 111 in four steps. This requires finding a 3SAT solution.
(2) Change the first bit to 2: 2 100 111. Here you'll be verified this is indeed a 3SAT solution.
(3) Change 2 100 111 -> 2 200 111 -> 2 220 111 -> 2 222 111 -> 2 222 211 -> 2 222 221 -> 2 222 222.
These rules enforce that you can't cheat (check). Going to 2 222 22 is possible only if the formula is satisfable, and checking that is NP-hard. I feel it might be even harder (#P or FNP probably) but NP-hardness is enough for that purpose I think.
Edit: You might be interested in disjoint set data structure. This will take your dictionary and group words that can be reached from each other. You can also store a path from every vertex to root or some other vertex. This will give you a path, not neccessarily the shortest one.

There are methods of varying efficiency for finding links - you can construct a complete graph for each word length, or you can construct a BK-Tree, for example, but your friend is right - BFS is the most efficient algorithm.
There is, however, a way to significantly improve your runtime: Instead of doing a single BFS from the source node, do two breadth first searches, starting at either end of the graph, and terminating when you find a common node in their frontier sets. The amount of work you have to do is roughly half what is required if you search from only one end.

You can make it a little quicker by removing the words that are not the right length, first. More of the limited dictionary will fit into the CPU's cache. Probably all of it.
Also, all of the strncmp comparisons (assuming you made everything lowercase) can be memcmp comparisons, or even unrolled comparisons, which can be a speedup.
You could use some preprocessor magic and hard-compile the task for that word-length, or roll a few optimized variations of the task for common word lengths. All of those extra comparisons can 'go away' for pure unrolled fun.

This is a typical dynamic programming problem. Check for the Edit Distance problem.

What you are looking for is called the Edit Distance. There are many different types.
From (http://en.wikipedia.org/wiki/Edit_distance): "In information theory and computer science, the edit distance between two strings of characters is the number of operations required to transform one of them into the other."
This article about Jazzy (the java spell check API) has a nice overview of these sorts of comparisons (it's a similar problem - providing suggested corrections) http://www.ibm.com/developerworks/java/library/j-jazzy/

You could find the longest common subsequence, and therefore finding the letters that must be changed.

My gut feeling is that your friend is correct, in that there isn't a more efficient solution, but that is assumming you are reloading the dictionary every time. If you were to keep a running database of common transitions, then surely there would be a more efficient method for finding a solution, but you would need to generate the transitions beforehand, and discovering which transitions would be useful (since you can't generate them all!) is probably an art of its own.

bool isadjacent(string& a, string& b)
{
int count = 0; // to store count of differences
int n = a.length();
// Iterate through all characters and return false
// if there are more than one mismatching characters
for (int i = 0; i < n; i++)
{
if (a[i] != b[i]) count++;
if (count > 1) return false;
}
return count == 1 ? true : false;
}
// A queue item to store word and minimum chain length
// to reach the word.
struct QItem
{
string word;
int len;
};
// Returns length of shortest chain to reach 'target' from 'start'
// using minimum number of adjacent moves. D is dictionary
int shortestChainLen(string& start, string& target, set<string> &D)
{
// Create a queue for BFS and insert 'start' as source vertex
queue<QItem> Q;
QItem item = {start, 1}; // Chain length for start word is 1
Q.push(item);
// While queue is not empty
while (!Q.empty())
{
// Take the front word
QItem curr = Q.front();
Q.pop();
// Go through all words of dictionary
for (set<string>::iterator it = D.begin(); it != D.end(); it++)
{
// Process a dictionary word if it is adjacent to current
// word (or vertex) of BFS
string temp = *it;
if (isadjacent(curr.word, temp))
{
// Add the dictionary word to Q
item.word = temp;
item.len = curr.len + 1;
Q.push(item);
// Remove from dictionary so that this word is not
// processed again. This is like marking visited
D.erase(temp);
// If we reached target
if (temp == target)
return item.len;
}
}
}
return 0;
}
// Driver program
int main()
{
// make dictionary
set<string> D;
D.insert("poon");
D.insert("plee");
D.insert("same");
D.insert("poie");
D.insert("plie");
D.insert("poin");
D.insert("plea");
string start = "toon";
string target = "plea";
cout << "Length of shortest chain is: "
<< shortestChainLen(start, target, D);
return 0;
}
Copied from: https://www.geeksforgeeks.org/word-ladder-length-of-shortest-chain-to-reach-a-target-word/

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio