This is a question on combinatorics from a non-mathematician, so please try to bear with me!
Given an array of n distinct characters, I want to generate subsets of k characters in a minimal-change order, i.e. an order in which generation i+1 contains exactly one character that was not in generation i. That's not too hard in itself. However, I also want to maximise the number of cases in which the character that is swapped out in generation i+1 is the same character that was swapped in in generation i. To illustrate, for n=7, k=3:
abc abd abe* abf* abg* afg aeg* adg* acg* acd ace* acf* aef adf* ade
bde bdf bef bcf* bce bcd* bcg* bdg beg* bfg* cfg ceg* cdg* cde cdf* cef def deg dfg efg
The asterisked strings indicate the case I want to maximise; e.g. the e that is new in generation 3, abe, replaces a d that was new in generation 2, abd. It doesn't seem possible to have this happen in every generation, but I want it to happen as often as possible.
Typical array sizes that I use are 20-30 and subset sizes around 5-8.
I'm using an odd language, Icon (or actually its derivative Unicon), so I don't expect anyone to post code that I can use directly. But I will be grateful for answers or hints in pseudo-code, and will do my best to translate C etc. Also, I have noticed that problems of this kind are often discussed in terms of arrays of integers, and I can certainly apply solutions posted in such terms to my own problem.
Thanks
Kim Bastin
Edit 15 June 2010:
I do seem to have got into deeper water than I thought, and while I'm grateful for all answers, not all of them have been relevant. As an example of a solution which is NOT adequate, let me post my own Unicon procedure for generating k-ary subsets of a character set s in a minimal change order. Things you need to know to understand the code are: a preposed * means the size of a structure, so if s is a string, *s means the size of s (the number of characters it contains). || is a string concatenation operation. A preposed ! produces each element of a structure, e.g. each character of a string, in turn on successive passes. And the 'suspend' control structure returns a result from a procedure, but leaves the procedure 'in suspense', with all local variables in place, so that new results can be produced if the procedure is called in a loop.
procedure revdoor(s, k)
# Produces all k-subsets of a string or character set s in a 'revolving
# door' order. Each column except the first traverses the characters
# available to it in alphabetical and reverse alphabetical order
# alternately. The order of the input string is preserved.
# If called in a loop as revdoor("abcdefg", 3),
# the order of production is: abc, abd, abe, abf, abg, acg, acf, ace, acd,
# ade, adf, adg, aeg, aef, afg, bfg, bef, beg, bdg, bdf, bde, bcd, bce,
# bcf, bcg, cdg, cdf, cde, cef, ceg, cfg, dfg, deg, def, efg
local i
static Ctl
if /Ctl then { # this means 'if Ctl doesn't exist'
if k = 0 then return ""
Ctl := list(k, 1) # a list of k elements, each initialised to 1.
}
if Ctl[k] = 1 then {
if k = 1 then suspend !s else
every i := 1 to *s-k+1 do {
suspend s[i] || revdoor(s[i+1:0], k-1)
}
} else {
if k = 1 then suspend !reverse(s) else
every i := -k to -*s by -1 do {
suspend s[i] || revdoor(s[i+1:0], k-1)
}
}
# the following line multiplies element k of Ctl by -1 if k < size of Ctl
# (this controls the order of generation of characters),
# and destroys Ctl on final exit from the procedure.
if k < *Ctl then Ctl[k] *:= -1 else Ctl := &null
end
Note that the output of the above procedure is not optimal in my sense. One result of my investigations so far is that the maximum 'swapping score' for a list of k-ary subsets of n elements is not less than comb(n-1, k), or in the case of n=7, k=3, the maximum score is at least comb(6, 3) = 20. I define the 'swapping score' of a list as the number of items in the list whose new element replaces an element in the previous item which was itself new. I haven't got the mathematical equipment to prove this, but it is easy to see with k=1 or k=2. For certain (n,k) a slightly higher score is possible, as in the case of n=7, k=3:
abc abd abe abf abg
acg adg aeg afg
efg dfg cfg bfg
beg bdg bcg
bcd bce bcf
bdf bef
def cef aef
adf acf
acd ace
ade
bde cde
cdf cdg
ceg
deg (swapping score = 21)
It may be noted that the above list is in 'strong minimal change order' (like word golf: the new character is always in the same position as the character it replaces), which may indicate the direction my own work is taking. I hope to post something more in a few days.
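To make the 'swapping score' concrete, here is a small Python sketch (the function names are mine; each subset is assumed to be given as a string of characters, as in the lists above) that checks the minimal-change property and counts the asterisked cases:

def is_minimal_change(subsets):
    # consecutive equal-size subsets must differ in exactly one element
    return all(len(set(a) - set(b)) == 1 for a, b in zip(subsets, subsets[1:]))

def swapping_score(subsets):
    # count the generations whose swapped-out element is exactly the element
    # that was swapped in by the previous generation (the asterisked cases)
    score = 0
    for i in range(2, len(subsets)):
        swapped_in_prev = set(subsets[i - 1]) - set(subsets[i - 2])
        swapped_out_now = set(subsets[i - 1]) - set(subsets[i])
        if swapped_out_now == swapped_in_prev:
            score += 1
    return score

Applied to the 35-subset ordering just above, it should report the swapping score of 21 stated there.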
Kim
It's fairly straightforward. In order to maximise replacement just think of the characters as numbers and increment the string by one till you have reached the upper limit.
Then check to see that you don't use the same character twice in the string.
I think this would work:
#include <cstdio>
char c[] = {'a', 'b', 'c', 'd', 'e'};
const int n = 5;
const int k = 3;
char s[k];
void print()
{
for( int i = 0; i < k; ++i )
putchar(c[s[i]]);
putchar('\n');
}
bool increment( int m )
{
// reached the limit?
if( ++s[m] == n && m == 0 )
return false;
next:
for( int i = 0; i < m; ++i )
{
if( s[m] == n )
{
// carry
s[m] = 0;
if( !increment( m-1 ))
return false;
goto next;
}
else if( s[i] == s[m] )
{
// the character is already used
++s[m];
goto next;
}
}
return true;
}
int main(int, char**)
{
// initialise
for( int i = 0; i < k; ++i )
s[i] = i;
// enumerate all combinations
do
print();
while(increment(k-1));
}
Kim, your problem description sounds very much like a (homework) attempt to describe the simplest solution for enumerating all k-combinations of a set of n elements - without giving the actual solution away too easily. Anyway, see below for my shot. I used Java but the important parts are not different from C.
import java.math.BigInteger;
import java.util.*;

public class Homework
{
/**
* Prints all k-combinations of a set of n elements. Answer to this
* question: http://stackoverflow.com/questions/2698551
*/
public static void main(String[] args)
{
Combinations combinations = new Combinations(7, 3);
System.out.printf(
"Printing all %d %d-combinations of a set with %d elements:\n",
combinations.size(), combinations.k, combinations.n);
for (int[] c : combinations)
System.out.println(Arrays.toString(c));
}
/**
* Provides an iterator for all k-combinations of a set of n elements.
*/
static class Combinations implements Iterable<int[]>
{
public final int n, k;
public Combinations(int n, int k)
{
if (n < 1 || n < k)
throw new IllegalArgumentException();
this.n = n;
this.k = k;
}
@Override
public Iterator<int[]> iterator()
{
return new Iterator<int[]>()
{
private int[] c;
@Override
public void remove() { throw new UnsupportedOperationException(); }
@Override
public int[] next()
{
if (c == null)
{
c = new int[k];
for (int i = 0; i < k; i++)
c[i] = i;
}
else
{
int i = c.length - 1;
while (i >= 0 && c[i] == n - k + i)
i--;
if (i < 0)
throw new NoSuchElementException();
c[i]++;
for (int j = i + 1; j < c.length; j++)
c[j] = c[i] + j - i;
}
return c.clone(); // remove defensive copy if performance is more important
}
@Override
public boolean hasNext() { return c == null || c[0] < n - k; }
};
}
/**
* Returns number of combinations: n! / (k! * (n - k)!).
*/
public BigInteger size()
{
BigInteger s = BigInteger.valueOf(n);
for (int i = n - 1; i > n - k; i--)
s = s.multiply(BigInteger.valueOf(i));
for (int i = k; i > 1; i--)
s = s.divide(BigInteger.valueOf(i));
return s;
}
}
}
Here is the output for your example:
Printing all 35 3-combinations of a set with 7 elements:
[0, 1, 2] [0, 1, 3] [0, 1, 4] [0, 1, 5] [0, 1, 6] [0, 2, 3] [0, 2, 4] [0, 2, 5] [0, 2, 6] [0, 3, 4]
[0, 3, 5] [0, 3, 6] [0, 4, 5] [0, 4, 6] [0, 5, 6] [1, 2, 3] [1, 2, 4] [1, 2, 5] [1, 2, 6] [1, 3, 4]
[1, 3, 5] [1, 3, 6] [1, 4, 5] [1, 4, 6] [1, 5, 6] [2, 3, 4] [2, 3, 5] [2, 3, 6] [2, 4, 5] [2, 4, 6]
[2, 5, 6] [3, 4, 5] [3, 4, 6] [3, 5, 6] [4, 5, 6]
Rather than start with an algorithm, I've tried to think of a way to find a form for the maximum "swapping score", so that you know what to shoot for. Often an algorithm for producing the desired structure can emerge from such a proof.
It's been a long time since university, but I've tried to think of a combinatorial model that will help to figure this out, without very much luck.
I started by imagining the set of combinations as vertices in a graph, with edges corresponding to the "adjacency" (only one element difference) of the combinations. So:
"n choose k" vertices
each vertex has degree k(n-k)
number of edges = "n choose k" * k(n-k) / 2 = "n choose 2" * "n-2 choose k-1"
There's a lot of symmetry to these graphs. The graph is the same for any given {n,k} as it is for {n,n-k}. If k=1 or k=n-1 it's the complete graph on n vertices (each combinations differs from all the others by only one character). I can't see an obvious algorithm from this, though.
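A quick brute-force check of these counts, for anyone who wants to experiment (Python sketch; function name is mine):

from itertools import combinations
from math import comb

def check_graph(n, k):
    verts = list(combinations(range(n), k))
    # two k-subsets are adjacent when they differ in exactly one element,
    # i.e. their symmetric difference has size 2
    deg = {v: sum(len(set(v) ^ set(w)) == 2 for w in verts if w != v) for v in verts}
    edges = sum(deg.values()) // 2
    assert len(verts) == comb(n, k)
    assert all(d == k * (n - k) for d in deg.values())
    assert edges == comb(n, 2) * comb(n - 2, k - 1)

check_graph(7, 3)   # 35 vertices, each of degree 12, 210 edges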
Edit: My next thought was to conceive the graph with a slightly different interpretation. You can think of each {n,k}-combination as a sequence of n bits where there are k 1s. The position of the 1s corresponds to which of the n characters is present in the combination. So for n=7 k=3, abc is 1110000, adg is 1001001, efg is 0000111. With this you can also imagine the points lying at the corners of an n-dimensional hypercube. So for a given subsequence, the edges match your "minimal swapping" criteria if they are co-planar: I think of them as "cutting planes" through the hypercube.
You are looking for a Hamiltonian path through this graph of combinations, one that meets your special criteria.
Another way to think of your problem is to minimize the number of times in the sequence that you do change which character in the combination is being altered.
For a good answer, would computing the list of combinations all at once be acceptable, or do you need to compute them one at a time? In other words, do you need a function:
Combination nextCombo();
or would
vector<Combination> allCombinations();
be acceptable?
If computing the combinations in batch is permissible, it is possible that an iterative-deepening A* search (or just an A* search but I suspect it'd run out of memory) would work. With an admissible heuristic, A* is guaranteed to give the optimum. I'm short of time, so I decided to post this partial answer and edit the post if I get time to write code.
A* is a graph search algorithm. In this case, the nodes are lists of combinations used so far (no duplicates allowed in the list). My plan was to use a bit-string representation for the nodes. n=30 would fit into a 32 bit integer. We can arbitrarily permute any solution so that the first combination begins with 0's and ends in 1's, i.e. 000...1111. A node with a shorter list is connected to a longer one if the two lists are the same up until the last element and the last element differs only in having one 0 bit flipped to a 1 and one 1 bit flipped to a 0. The edge cost between the two is 0 if the step was a "swap" (the element dropped is the one the previous step added) and 1 otherwise.
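To illustrate the node expansion and the 0/1 edge cost, here is a small bitmask sketch (names are mine; in the real search each node would also carry the list of combinations used so far, and successors already in that list would be rejected):

def neighbours(mask, n, last_in=None):
    # mask is the current k-combination as an n-bit mask; last_in is the bit
    # index that the previous step swapped in (None for the first step).
    # Yields (next_mask, cost): cost 0 when the bit we drop is exactly the bit
    # that was just added (the 'swap' case), cost 1 otherwise.
    for out in range(n):
        if not (mask >> out) & 1:
            continue
        for inn in range(n):
            if (mask >> inn) & 1:
                continue
            cost = 0 if out == last_in else 1
            yield (mask & ~(1 << out)) | (1 << inn), cost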
A second representation for each combination is a sorted list of the bits that are turned on. One possible admissible heuristic for this graph is to use this index list. Each time (in the list of combinations) an index is used at a particular position in the index list, mark it off. The number of positions with un-used indices - the current last changed index is (I believe) the minimal number of swaps that need to happen.
To illustrate this heuristic, consider the sequence abc abd abe* abf* abg afg from above. (the letters would be numbers in my treatment, but that is a minor difference). This sequence (which would be one node in the search-graph) would have the following places marked:
    1    2    3
   *a
    b   *b
    c    c   *c
    d    d   *d
    e    e   *e
        *f   *f
             *g
Thus the heuristic would predict that there is at least one swap required (since there are no unmarked elements in position 3 and the current position is 2).
If I get the time, I'll edit this to post code and performance of the algorithm.
Re: the NP completeness result (in a comment to Zac Thompson's answer). The graph on which we are searching for a minimal cost Hamiltonian path has a very special structure. For example, the normally NP-complete Hamiltonian Path problem can be solved in O(n) time with the "enumerate all combinations" algorithm - with n being the number of nodes in the graph. This structure makes it possible that, though on a general graph, vertex cover is hard, on your graph it may be polynomial (even linear or quadratic). Of course, since the graph has a lot of nodes for e.g. n=30, k=8 you may still have a lot of computation ahead of you.
I worked on this problem in 2010 but failed to find a solution then. A few days ago I had another look at some of my notes from that time and suspected I had been very close to a solution. A few minutes later I had the key.
To recapitulate: the requirement is a strict minimal change ordering of the k-subsets of a string s such that LIFO (last in first out) is maximised. I refer to this as maximised ‘swapping’ in earlier posts.
I call the algorithm maxlifo (maximised LIFO) after the key requirement. It takes two parameters, a string s, which must not contain duplicated characters, and a positive integer k not greater than the size of s. The algorithm is recursive, i.e. maxlifo(s, k) uses the output of maxlifo(s, k-1) down to k=1. Output is returned as a list.
Below I give an informal explanation, with examples, using the string "abcdefg" and various values of k. This is followed by an example of implementation as a Unicon procedure. (I’m not fluent in any of the more commonly used languages.)
The case k=1 is trivial — it returns the elements of s in order from first to last.
For k>1, apply the following rules to the output of maxlifo(s, k-1):
(1) For each element of the output of maxlifo(s, k-1), list in a row the k-subsets built from that element with each missing character of s in turn. The order of characters in the subsets is as in s.
(2) Working from the second row down, substitute blank ‘placeholders’ for all occurrences of subsets that appear in an earlier row. Each k-subset of s now appears just once.
(3) In each non-blank row, mark with an initial ! each subset such that there is a placeholder at the same position in the next row. This marking means ‘first’. Exactly one subset will be so marked in each non-blank row.
(4) Delete all rows that are completely blank (contain only placeholders).
(5) In each remaining row except the last, mark with a final ! the subset in the position corresponding to the subset marked ‘first’ in the next lower row. This marking means ‘last’.
Now we can list the final maxlifo ordering of the subsets. Each row from top to bottom is ordered and its elements added in that order to the output list.
(6) In each row from the top down:
(6.1) Remove all blank placeholders.
(6.2) Add to the output list the subset marked ‘first’ (initial !) and remove it from the row.
(6.3) If there are still subsets remaining in the row, either the leftmost or the rightmost subset will be marked ‘last’ (final !). If the rightmost subset is marked ‘last’, add the subsets to the output list in order from left to right, otherwise in order from right to left.
(7) After processing all rows, return the output list.
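For readers who do not use Icon/Unicon, rules (1)-(7) translate fairly directly into the following Python sketch (an illustration rather than a reference implementation; it assumes 1 <= k <= len(s) and represents subsets as strings ordered as in s). It should reproduce the orderings worked through below:

def maxlifo(s, k):
    if k == 1:
        return list(s)
    rows, seen = [], set()
    for sub in maxlifo(s, k - 1):                       # rule (1)
        row = []
        for ch in s:
            if ch in sub:
                continue
            cand = "".join(c for c in s if c in sub or c == ch)
            row.append("" if cand in seen else cand)    # rule (2): placeholders
            seen.add(cand)
        rows.append(row)
    width = len(rows[0])
    first = []                                          # rule (3): column of 'first' in each row
    for i, row in enumerate(rows):
        below = rows[i + 1] if i + 1 < len(rows) else [""] * width
        first.append(next((j for j, c in enumerate(row) if c and below[j] == ""), None))
    keep = [i for i, row in enumerate(rows) if any(row)]            # rule (4)
    rows, first = [rows[i] for i in keep], [first[i] for i in keep]
    last = first[1:] + [None]                           # rule (5): column of 'last' in each row
    out = []
    for row, fj, lj in zip(rows, first, last):          # rule (6)
        rest = [(j, c) for j, c in enumerate(row) if c and j != fj]
        out.append(row[fj])
        if rest and not (lj is None or rest[-1][0] == lj):
            rest.reverse()                              # right-to-left unless 'last' is rightmost
        out.extend(c for _, c in rest)
    return out                                          # rule (7)

For example, maxlifo("abcdefg", 2) and maxlifo("abcdefg", 3) should give the 21- and 35-element orderings shown below.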
Example using maxlifo("abcdefg", 2):
Col1 contains the output of maxlifo("abcdefg", 1). The rows of Col2 contain the cliques formed with the remaining characters of s:
Col1 Col2
---- ----------------------------
a ab ac ad ae af ag
b ab bc bd be bf bg
c ac bc cd ce cf cg
d ad bd cd de df dg
e ae be ce de ef eg
f af bf cf df ef fg
g ag bg cg dg eg fg
Blank out subsets that appear in an earlier row:
a ab ac ad ae af ag
b bc bd be bf bg
c cd ce cf cg
d de df dg
e ef eg
f fg
g
Mark the ‘first’ subset in each row (the one with a blank below it):
a !ab ac ad ae af ag
b !bc bd be bf bg
c !cd ce cf cg
d !de df dg
e !ef eg
f !fg
g
Delete all completely blank rows (only one in this case):
a !ab ac ad ae af ag
b !bc bd be bf bg
c !cd ce cf cg
d !de df dg
e !ef eg
f !fg
Mark the ’last’ subset in each row (the one with a ‘first’ subset below it).
a !ab ac! ad ae af ag
b !bc bd! be bf bg
c !cd ce! cf cg
d !de df! dg
e !ef eg!
f !fg
Output each row in the order described above: ‘first’, unmarked, ’last’:
Ordered rows:
a !ab ac! ad ae af ag ab ag af ae ad ac
b !bc bd! be bf bg bc bg bf be bd
c !cd ce! cf cg cd cg cf ce
d !de df! dg de dg df
e !ef eg! ef eg
f !fg fg
Output: [ab ag af ae ad ac bc bg bf be bd cd cg cf ce df dg de ef eg fg]
Examples for 3 <= k <= 6 are given in less detail. The blank rows deleted in step 4 are left in place.
maxlifo("abcdefg", 3):
Ordered rows:
ab !abc abd abe abf abg! abc abd abe abf abg
ag acg adg aeg! !afg afg acg adg aeg
af acf adf! !aef aef acf adf
ae ace! !ade ade ace
ad !acd! acd
ac
bc !bcd bce bcf bcg! bcd bce bcf bcg
bg bdg beg! !bfg bfg bdg beg
bf bdf! !bef bef bdf
be !bde! bde
bd
cd !cde cdf cdg! cde cdf cdg
cg ceg! !cfg cfg ceg
cf !cef! cef
ce
de !def deg! def deg
dg !dfg! dfg
df
ef !efg efg
eg
fg
Output: [abc abd abe abf abg afg acg adg aeg aef acf adf ade ace acd
bcd bce bcf bcg bfg bdg beg bef bdf bde
cde cdf cdg cfg ceg cef
def deg dfg
efg]
maxlifo("abcdefg", 4):
Ordered rows:
abc !abcd abce! abcf abcg abcd abcg abcf abce
abd !abde abdf! abdg abde abdg abdf
abe !abef abeg! abef abeg
abf !abfg! abfg
abg
afg acfg! adfg !aefg aefg adfg acfg
acg !acdg aceg! acdg aceg
adg !adeg! adeg
aeg
aef acef! !adef adef acef
acf !acdf! acdf
adf
ade !acde! acde
ace
acd
bcd !bcde bcdf! bcdg bcde bcdg bcdf
bce !bcef bceg! bcef bceg
bcf !bcfg! bcfg
bcg
bfg bdfg! !befg befg bdfg
bdg !bdeg! bdeg
beg
bef !bdef! bdef
bdf
bde
cde !cdef cdeg! cdef cdeg
cdf !cdfg! cdfg
cdg
cfg !cefg! cefg
ceg
cef
def !defg defg
deg
dfg
efg
Output: [abcd abcg abcf abce abde abdg abdf abef abeg abfg aefg adfg acfg acdg aceg adeg adef acef acdf acde
bcde bcdg bcdf bcef bceg bcfg befg bdfg bdeg bdef
cdef cdeg cdfg cefg
defg]
maxlifo("abcdefg", 5):
Ordered rows:
abcd !abcde abcdf abcdg! abcde abcdf abcdg
abcg abceg! !abcfg abcfg abceg
abcf !abcef! abcef
abce
abde !abdef abdeg! abdef abdeg
abdg !abdfg! abdfg
abdf
abef !abefg! abefg
abeg
abfg
aefg acefg! !adefg adefg acefg
adfg !acdfg! acdfg
acfg
acdg !acdeg! acdeg
aceg
adeg
adef !acdef! acdef
acef
acdf
acde
bcde !bcdef bcdeg! bcdef bcdeg
bcdg !bcdfg! bcdfg
bcdf
bcef !bcefg! bcefg
bceg
bcfg
befg !bdefg! bdefg
bdfg
bdeg
bdef
cdef !cdefg cdefg
cdeg
cdfg
cefg
defg
Output: [abcde abcdf abcdg abcfg abceg abcef abdef abdeg abdfg abefg adefg acefg acdfg acdeg acdef
bcdef bcdeg bcdfg bcefg bdefg
cdefg]
maxlifo("abcdefg", 6):
Ordered rows:
abcde !abcdef abcdeg! abcdef abcdeg
abcdf !abcdfg! abcdfg
abcdg
abcfg !abcefg! abcefg
abceg
abcef
abdef !abdefg! abdefg
abdeg
abdfg
abefg
adefg
acefg !acdefg! acdefg
acdfg
acdeg
acdef
bcdef !bcdefg bcdefg
bcdeg
bcdfg
bcefg
bdefg
cdefg
Output: [abcdef abcdeg abcdfg abcefg abdefg acdefg bcdefg]
Unicon implementation:
procedure maxlifo(s:string, k:integer)
# A solution to my combinatorics problem from 2010.
# Return a list of the k subsets of the characters of a string s
# in a minimal change order such that last-in first-out is maximised.
# String s must not contain duplicate characters and in the present
# implementation must not contain "!", which is used as a marker.
local ch, cand, Hit, inps, i, j, K, L, Outp, R, S
# Errors
if *cset(s) ~= *s then
stop("Duplicate characters in set in maxlifo(", s, ", ", k, ")")
if find("!", s) then
stop("Illegal character in set in maxlifo(", s, ", ", k, ")")
if k > *s then
stop("Subset size larger than set size in maxlifo(", s, ", ", k, ")")
# Special cases
if k = 0 then return []
if k = *s then return [s]
Outp := []
if k = 1 then {
every put(Outp, !s)
return Outp
}
# Default case
S := set()
K := []
# Build cliques from output of maxlifo(s, k-1) with the remaining
# characters in s, substituting empty strings as placeholders for
# subsets already listed.
every inps := !maxlifo(s, k-1) do {
R := []
every ch := !s do
if not find(ch, inps) then {
cand := reorder(inps ++ ch, s)
if member(S, cand) then cand := "" else insert(S, cand)
put(R, cand)
}
put(K, R)
}
# Mark ‘first’ subset in each row with initial "!"
every i := 1 to *K - 1 do {
every j := 1 to *K[i] do
if K[i, j] ~== "" & K[i+1, j] == "" then {
K[i, j] := "!" || K[i, j]
break
}
}
# Remove rows containing only placeholders
every i := *K to 1 by -1 do {
every if !K[i] ~== "" then break next
delete(K, i)
}
# Mark ‘last’ subset in each row with final "!"
every i := 1 to *K - 1 do
every j := 1 to *K[i] do
if K[i+1, j][1] == "!" then {
K[i, j] ||:= "!"
break
}
# Build output list
every R := !K do {
# Delete placeholders from row (no longer needed and in the way)
every j := *R to 1 by -1 do if R[j] == "" then delete(R, j)
# Handle ‘first’ subset and remove from row
# N.B. ‘First’ subset will be leftmost or rightmost in row
if R[1][1] == "!" then
put(Outp, trim(get(R), '!', 0))
else put(Outp, trim(pull(R), '!', 0))
# Handle any remaining subsets, ‘last’ subset last, stripping '!' markers
# N.B. ‘Last’ subset will be leftmost or rightmost in row after removal
# of ‘first’ subset.
if R[-1][-1] == "!" then while put(Outp, trim(get(R), '!', 0)) else
while put(Outp, trim(pull(R), '!', 0))
}
return Outp
end
procedure reorder(cs:cset, s:string)
# Reorder cset cs according to string s
local r
# If no s, return set in alphabetical order
if /s then return string(cs)
r := ""
s ? while tab(upto(cs)) do r ||:= move(1)
return r
end
The following parameters are passed to the generator:
x - the word number;
N - the size of the alphabet;
L - the length of the output word.
The task is to implement a non-recursive algorithm that returns the word determined by these three parameters.
The alphabet consists of the Latin letters in alphabetical order, upper case.
For N = 5, L = 3 we construct a correspondence of x to words:
0: ABC
1: ABD
2: ABE
3: ACB
4: ACD
5: ACE
6: ADB
7: ADC
8: ADE
9: AEB
10: AEC
11: AED
12: BAC
...
My implementation of the algorithm works for L = 1 and L = 2, but errors appear for L = 3. The algorithm is based on shifts when indexing into the alphabet. The array h stores the indices of the letters in the reduced alphabet (from which the characters that have already entered the word are excluded). The array A maps the indices h back into the original alphabet (adding an offset for each character already removed from the alphabet to its left). Thus, in the end, array A stores permutations without repetition.
private static String getS(int x, int N, int L) {
    String s = "ABCDEFGHJKLMNOPQ";
    String out = "";
    int[] h = new int[N];
    int[] A = new int[N];
    for (int i = 0; i < L; i++) {
        // factory(k) presumably computes k!
        h[i] = (x / (factory(N - 1 - i) / factory(N - L))) % (N - i);
        int sum = h[i];
        for (int j = 0; j < i; j++)
            sum += ((h[i] >= h[j]) ? 1 : 0);
        A[i] = sum;
        out += s.charAt(A[i]);
    }
    return out;
}
To generate a random word of length L: keep the alphabet in an array of size N, swap the i'th element with a random element in positions i to N-1 for each i in [0, L-1], and read off the first L elements.
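A minimal Python sketch of that partial shuffle (the function name is mine):

import random

def random_word(alphabet, L):
    a = list(alphabet)
    for i in range(L):
        # swap the i'th element with a random element from positions i..N-1
        j = random.randrange(i, len(a))
        a[i], a[j] = a[j], a[i]
    return "".join(a[:L])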
To generate the x'th word of length L in alphabetical order: Note that for a word of size L made up of distinct letters from an alphabet of size N, there are (N-1)! / (N-L)! words starting with any given letter.
E.g., N=5, L=3, alphabet = ABCDE. The number of words starting with A (or any letter) is 4! / 2! = 12. These are all the ordered (L-1)-length arrangements of the remaining N-1 letters.
So the first letter of word(x, N, L) is the x / ((N-1)! / (N-L)!) letter of the alphabet (zero-indexed).
You can then build your word recursively.
E.g., word(15, 5, 3, ABCDE): The first letter is 15 / (4! / 2!) = 15 / 12 = 1, so B.
We get the second letter recursively: word(15 % (4! / 2!), 4, 2, ACDE) = word(3, 4, 2, ACDE). Since 3 / (3! / 2!) = 3 / 3 = 1, the second letter is C.
Third letter: word(3%3, 3, 1, ADE) = word(0, 3, 1, ADE) = A.
0. ABC
1. ABD
2. ABE
3. ACB
4. ACD
5. ACE
6. ADB
7. ADC
8. ADE
9. AEB
10. AEC
11. AED
12. BAC
13. BAD
14. BAE
15. BCA
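A short Python sketch of the recursion just described (the function name word and its signature are mine):

from math import factorial

def word(x, N, L, alphabet):
    if L == 0:
        return ""
    # each choice of first letter accounts for (N-1)!/(N-L)! words
    block = factorial(N - 1) // factorial(N - L)
    i = x // block                     # index of the first letter
    rest = list(alphabet)
    first = rest.pop(i)
    return first + word(x % block, N - 1, L - 1, rest)

word(15, 5, 3, "ABCDE") returns "BCA", matching entry 15 in the table above, and [word(x, 5, 3, "ABCDE") for x in range(13)] reproduces entries 0 to 12.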
A different approach. You have a list of five letters: [ABCDE] and you have some words made from three of those letters with no repeats. Hence each letter is either included (1) or not included (0) in the word. That maps each word onto a five-bit integer with only three bits set. In more general terms, each word maps onto an N-bit integer with L bits set.
That suggests running through the N-bit integers, counting the number of set bits. Keep track of how many integers have L bits set. When you reach the required position, translate the integer back into a word: 22 -> 10110 -> ACD.
There are various tricks to speed up counting set bits using some logic operations if the simple approach isn't fast enough.
ETA: I should have made clear that you scan in reverse order from 0b11111 down to 0b00000. That matches with alphabetical order. ABC (11100) comes before CDE (00111).
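A small Python sketch of that scan (names are mine):

def combos_by_bits(alphabet, L):
    N = len(alphabet)
    words = []
    # scan from 0b111...1 down to 0 so that ABC (11100) comes before CDE (00111)
    for mask in range(2 ** N - 1, -1, -1):
        if bin(mask).count("1") == L:
            # the highest bit corresponds to the first letter of the alphabet
            words.append("".join(alphabet[i] for i in range(N)
                                 if mask >> (N - 1 - i) & 1))
    return words

combos_by_bits("ABCDE", 3)[0] is "ABC", and the mask 22 == 0b10110 decodes to "ACD" as above; the x'th combination in this order is simply element x of the returned list.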
I recently encountered a much more difficult variation of this problem, but realized I couldn't generate a solution for this very simple case. I searched Stack Overflow but couldn't find a resource that previously answered this.
You are given a triangle ABC, and you must compute the number of paths of a certain length that start and end at 'A'. Say our function f(3) is called; it must return the number of paths of length 3 that start and end at A: 2 (ABA, ACA).
I'm having trouble formulating an elegant solution. Right now, I've written a solution that generates all possible paths, but for larger lengths, the program is just too slow. I know there must be a nice dynamic programming solution that reuses sequences that we've previously computed but I can't quite figure it out. All help greatly appreciated.
My dumb code:
def paths(n, sequence):
    t = ['A', 'B', 'C']
    if len(sequence) < n:
        for node in set(t) - set(sequence[-1]):
            paths(n, sequence + node)
    else:
        if sequence[0] == 'A' and sequence[-1] == 'A':
            print sequence
Let PA(n) be the number of paths from A back to A in exactly n steps.
Let P!A(n) be the number of paths from B (or C) to A in exactly n steps.
Then:
PA(1) = 1
PA(n) = 2 * P!A(n - 1)
P!A(1) = 0
P!A(2) = 1
P!A(n) = P!A(n - 1) + PA(n - 1)
= P!A(n - 1) + 2 * P!A(n - 2) (for n > 2) (substituting for PA(n-1))
We can solve the difference equations for P!A analytically, as we do for Fibonacci, by noting that (-1)^n and 2^n are both solutions of the difference equation, and then finding coefficients a, b such that P!A(n) = a*2^n + b*(-1)^n.
We end up with the equation P!A(n) = 2^n/6 + (-1)^n/3, and PA(n) being 2^(n-1)/3 - 2(-1)^n/3.
This gives us code:
def PA(n):
    return (pow(2, n-1) + 2*pow(-1, n-1)) / 3

for n in xrange(1, 30):
    print n, PA(n)
Which gives output:
1 1
2 0
3 2
4 2
5 6
6 10
7 22
8 42
9 86
10 170
11 342
12 682
13 1366
14 2730
15 5462
16 10922
17 21846
18 43690
19 87382
20 174762
21 349526
22 699050
23 1398102
24 2796202
25 5592406
26 11184810
27 22369622
28 44739242
29 89478486
The trick is not to try to generate all possible sequences. The number of them increases exponentially so the memory required would be too great.
Instead, let f(n) be the number of sequences of length n beginning and ending with A, and let g(n) be the number of sequences of length n beginning with A but ending with B. To get things started, clearly f(1) = 1 and g(1) = 0. For n > 1 we have f(n) = 2g(n - 1), because the penultimate letter will be B or C and there are equal numbers of each. We also have g(n) = f(n - 1) + g(n - 1), because if a sequence begins with A and ends with B, the penultimate letter is either A or C.
These rules allow you to compute the numbers really quickly using memoization.
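A memoized version of these two recurrences (Python sketch):

from functools import lru_cache

@lru_cache(maxsize=None)
def f(n):          # length-n sequences beginning and ending with A
    return 1 if n == 1 else 2 * g(n - 1)

@lru_cache(maxsize=None)
def g(n):          # length-n sequences beginning with A and ending with B
    return 0 if n == 1 else f(n - 1) + g(n - 1)

f(3) gives 2 and f(5) gives 6, matching the counts quoted elsewhere in this thread.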
My method is like this:
Define DP(l, end) = # of paths end at end and having length l
Then DP(l,'A') = DP(l-1, 'B') + DP(l-1,'C'), similar for DP(l,'B') and DP(l,'C')
For the base case, i.e. l = 1, I return 0 if end is not 'A' and 1 otherwise, so that all larger states only count paths that start at 'A'.
The answer is simply DP(n, 'A'), where n is the length.
Below is a sample code in C++, you can call it with 3 which gives you 2 as answer; call it with 5 which gives you 6 as answer:
ABCBA, ACBCA, ABABA, ACACA, ABACA, ACABA
#include <bits/stdc++.h>
using namespace std;
int dp[500][500], n;
int DP(int l, int end){
if(l<=0) return 0;
if(l==1){
if(end != 'A') return 0;
return 1;
}
if(dp[l][end] != -1) return dp[l][end];
if(end == 'A') return dp[l][end] = DP(l-1, 'B') + DP(l-1, 'C');
else if(end == 'B') return dp[l][end] = DP(l-1, 'A') + DP(l-1, 'C');
else return dp[l][end] = DP(l-1, 'A') + DP(l-1, 'B');
}
int main() {
memset(dp,-1,sizeof(dp));
scanf("%d", &n);
printf("%d\n", DP(n, 'A'));
return 0;
}
EDITED
To answer OP's comment below:
Firstly, DP(dynamic programming) is always about state.
Remember here our state is DP(l,end), represents the # of paths having length l and ends at end. So to implement states using programming, we usually use array, so DP[500][500] is nothing special but the space to store the states DP(l,end) for all possible l and end (That's why I said if you need a bigger length, change the size of array)
But then you may ask, I understand the first dimension which is for l, 500 means l can be as large as 500, but how about the second dimension? I only need 'A', 'B', 'C', why using 500 then?
Here is another trick (of C/C++): the char type can be used as an int type by default, whose value is equal to its ASCII code. I do not remember the whole ASCII table of course, but I know that around 300 will be more than enough to cover all the ASCII characters, including A (65), B (66), C (67).
So I just declare any size large enough to represent 'A','B','C' in the second dimension (that means actually 100 is more than enough, but I just do not think that much and declare 500 as they are almost the same, in terms of order)
So you asked what DP[3][1] means: it means nothing, as I never need or calculate the second dimension when it is 1. (Or one can think of it as the state dp(3,1) not having any physical meaning in our problem.)
In fact, I am always using 65, 66, 67.
so DP[3][65] means the # of paths of length 3 and ends at char(65) = 'A'
You can do better than the dynamic programming/recursion solution others have posted, for the given triangle and more general graphs. Whenever you are trying to compute the number of walks in a (possibly directed) graph, you can express this in terms of the entries of powers of a transfer matrix. Let M be a matrix whose entry m[i][j] is the number of paths of length 1 from vertex i to vertex j. For a triangle, the transfer matrix is
0 1 1
1 0 1
1 1 0
Then M^n is a matrix whose i,j entry is the number of paths of length n from vertex i to vertex j. If A corresponds to vertex 1, you want the 1,1 entry of M^n.
Dynamic programming and recursion for the counts of paths of length n in terms of the paths of length n-1 are equivalent to computing M^n with n multiplications, M * M * M * ... * M, which can be fast enough. However, if you want to compute M^100, instead of doing 100 multiplies, you can use repeated squaring: Compute M, M^2, M^4, M^8, M^16, M^32, M^64, and then M^64 * M^32 * M^4. For larger exponents, the number of multiplies is about c log_2(exponent).
Instead of using that a path of length n is made up of a path of length n-1 and then a step of length 1, this uses that a path of length n is made up of a path of length k and then a path of length n-k.
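A small Python sketch of the transfer-matrix computation with repeated squaring (names are mine; note that the question counts letters, so an n-letter word corresponds to n-1 edges, and the answer's 1,1 entry is [0][0] in 0-based indexing):

def mat_mult(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_pow(M, e):
    R = [[int(i == j) for j in range(3)] for i in range(3)]   # identity
    while e:                                                  # repeated squaring
        if e & 1:
            R = mat_mult(R, M)
        M = mat_mult(M, M)
        e >>= 1
    return R

M = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]

mat_pow(M, 3 - 1)[0][0] gives 2 and mat_pow(M, 5 - 1)[0][0] gives 6, in agreement with the other answers.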
We can solve this with a for loop, although Anonymous described a closed form for it.
function f(n){
var as = 0, abcs = 1;
for (n=n-3; n>0; n--){
as = abcs - as;
abcs *= 2;
}
return 2*(abcs - as);
}
Here's why:
Look at one strand of the decision tree (the other one is symmetrical):
Level 1:  A
Level 2:  B                      (the C branch is symmetrical)
Level 3:  A C
Level 4:  B C  A B
Level 5:  A C A B  B C A C
Level 6:  B C A B B C A C  A C A B B C A B
Num A's Num ABC's (starting with first B on the left)
0 1
1 (1-0) 2
1 (2-1) 4
3 (4-1) 8
5 (8-3) 16
11 (16-5) 32
Clearly, we can't use the strands that end with the A's...
You can write a recursive brute force solution and then memoize it (aka top down dynamic programming). Recursive solutions are more intuitive and easy to come up with. Here is my version:
from functools import lru_cache

# search space (we have a triangle with nodes)
nodes = ["A", "B", "C"]
n = 5  # desired path length, counted in letters; the answer is recurse(1, 0)

@lru_cache(maxsize=None)  # memoize!
def recurse(length, steps):
    # 'steps' is the net displacement around the triangle; we start at "A"
    # with steps = 0, so the current node is "A" exactly when steps % 3 == 0.
    # If the length of the path is n and the last node is "A", then it's
    # a valid path and we count it.
    if length == n:
        return 1 if steps % 3 == 0 else 0
    # we don't want paths having len > n.
    if length > n:
        return 0
    # from each position, we have two possibilities: either go to the next
    # node or the previous node. Total paths will be the sum of both
    # possibilities. We do this recursively.
    return recurse(length+1, steps+1) + recurse(length+1, steps-1)
I'm considering all permutations of 0, ..., n-1 in lexicographic order. I'm given two ranks, i and j, and asked to find the rank of the permutation that results from applying the i'th permutation to the j'th permutation.
A couple examples for n=3:
p(3) = [1, 2, 0], p(4) = [2, 0, 1], result = [0, 1, 2], rank = 0
Given i = j = 4, we get [2, 0, 1] applied to itself is [1, 2, 0], rank = 3.
What I've come up with so far: I convert the ranks to their respective permutations via Lehmer codes, calculate the desired permutation, and convert back to rank via Lehmer codes.
Can anyone suggest a way to get the rank of the desired permutation from the other two ranks, without having to actually calculate the permutations? Storing the n! x n! array is not an option.
-edit- Note that I'm not wedded to lexicographic order if some other ordering would enable this.
-edit- Here are the n! by n! grids for n=3 & 4, for lexicographic ranks. Row i is indexed into column j to get the output. Note that the n=3 grid is identical to the top-left corner of the n=4 grid.
00|01|02|03|04|05|
01|00|03|02|05|04|
02|04|00|05|01|03|
03|05|01|04|00|02|
04|02|05|00|03|01|
05|03|04|01|02|00|
00|01|02|03|04|05|06|07|08|09|10|11|12|13|14|15|16|17|18|19|20|21|22|23|
01|00|03|02|05|04|07|06|09|08|11|10|13|12|15|14|17|16|19|18|21|20|23|22|
02|04|00|05|01|03|08|10|06|11|07|09|14|16|12|17|13|15|20|22|18|23|19|21|
03|05|01|04|00|02|09|11|07|10|06|08|15|17|13|16|12|14|21|23|19|22|18|20|
04|02|05|00|03|01|10|08|11|06|09|07|16|14|17|12|15|13|22|20|23|18|21|19|
05|03|04|01|02|00|11|09|10|07|08|06|17|15|16|13|14|12|23|21|22|19|20|18|
06|07|12|13|18|19|00|01|14|15|20|21|02|03|08|09|22|23|04|05|10|11|16|17|
07|06|13|12|19|18|01|00|15|14|21|20|03|02|09|08|23|22|05|04|11|10|17|16|
08|10|14|16|20|22|02|04|12|17|18|23|00|05|06|11|19|21|01|03|07|09|13|15|
09|11|15|17|21|23|03|05|13|16|19|22|01|04|07|10|18|20|00|02|06|08|12|14|
10|08|16|14|22|20|04|02|17|12|23|18|05|00|11|06|21|19|03|01|09|07|15|13|
11|09|17|15|23|21|05|03|16|13|22|19|04|01|10|07|20|18|02|00|08|06|14|12|
12|18|06|19|07|13|14|20|00|21|01|15|08|22|02|23|03|09|10|16|04|17|05|11|
13|19|07|18|06|12|15|21|01|20|00|14|09|23|03|22|02|08|11|17|05|16|04|10|
14|20|08|22|10|16|12|18|02|23|04|17|06|19|00|21|05|11|07|13|01|15|03|09|
15|21|09|23|11|17|13|19|03|22|05|16|07|18|01|20|04|10|06|12|00|14|02|08|
16|22|10|20|08|14|17|23|04|18|02|12|11|21|05|19|00|06|09|15|03|13|01|07|
17|23|11|21|09|15|16|22|05|19|03|13|10|20|04|18|01|07|08|14|02|12|00|06|
18|12|19|06|13|07|20|14|21|00|15|01|22|08|23|02|09|03|16|10|17|04|11|05|
19|13|18|07|12|06|21|15|20|01|14|00|23|09|22|03|08|02|17|11|16|05|10|04|
20|14|22|08|16|10|18|12|23|02|17|04|19|06|21|00|11|05|13|07|15|01|09|03|
21|15|23|09|17|11|19|13|22|03|16|05|18|07|20|01|10|04|12|06|14|00|08|02|
22|16|20|10|14|08|23|17|18|04|12|02|21|11|19|05|06|00|15|09|13|03|07|01|
23|17|21|11|15|09|22|16|19|05|13|03|20|10|18|04|07|01|14|08|12|02|06|00|
Here are the factoradics for n=4. I left off the last digit, which is always zero, for compactness.
000|001|010|011|020|021|100|101|110|111|120|121|200|201|210|211|220|221|300|301|310|311|320|321|
001|000|011|010|021|020|101|100|111|110|121|120|201|200|211|210|221|220|301|300|311|310|321|320|
010|020|000|021|001|011|110|120|100|121|101|111|210|220|200|221|201|211|310|320|300|321|301|311|
011|021|001|020|000|010|111|121|101|120|100|110|211|221|201|220|200|210|311|321|301|320|300|310|
020|010|021|000|011|001|120|110|121|100|111|101|220|210|221|200|211|201|320|310|321|300|311|301|
021|011|020|001|010|000|121|111|120|101|110|100|221|211|220|201|210|200|321|311|320|301|310|300|
100|101|200|201|300|301|000|001|210|211|310|311|010|011|110|111|320|321|020|021|120|121|220|221|
101|100|201|200|301|300|001|000|211|210|311|310|011|010|111|110|321|320|021|020|121|120|221|220|
110|120|210|220|310|320|010|020|200|221|300|321|000|021|100|121|301|311|001|011|101|111|201|211|
111|121|211|221|311|321|011|021|201|220|301|320|001|020|101|120|300|310|000|010|100|110|200|210|
120|110|220|210|320|310|020|010|221|200|321|300|021|000|121|100|311|301|011|001|111|101|211|201|
121|111|221|211|321|311|021|011|220|201|320|301|020|001|120|101|310|300|010|000|110|100|210|200|
200|300|100|301|101|201|210|310|000|311|001|211|110|320|010|321|011|111|120|220|020|221|021|121|
201|301|101|300|100|200|211|311|001|310|000|210|111|321|011|320|010|110|121|221|021|220|020|120|
210|310|110|320|120|220|200|300|010|321|020|221|100|301|000|311|021|121|101|201|001|211|011|111|
211|311|111|321|121|221|201|301|011|320|021|220|101|300|001|310|020|120|100|200|000|210|010|110|
220|320|120|310|110|210|221|321|020|300|010|200|121|311|021|301|000|100|111|211|011|201|001|101|
221|321|121|311|111|211|220|320|021|301|011|201|120|310|020|300|001|101|110|210|010|200|000|100|
300|200|301|100|201|101|310|210|311|000|211|001|320|110|321|010|111|011|220|120|221|020|121|021|
301|201|300|101|200|100|311|211|310|001|210|000|321|111|320|011|110|010|221|121|220|021|120|020|
310|210|320|110|220|120|300|200|321|010|221|020|301|100|311|000|121|021|201|101|211|001|111|011|
311|211|321|111|221|121|301|201|320|011|220|021|300|101|310|001|120|020|200|100|210|000|110|010|
320|220|310|120|210|110|321|221|300|020|200|010|311|121|301|021|100|000|211|111|201|011|101|001|
321|221|311|121|211|111|320|220|301|021|201|011|310|120|300|020|101|001|210|110|200|010|100|000|
I found an algorithm to convert between permutations and ranks in linear time. That's not quite what I want, but is probably good enough. It turns out that the fact that I don't care about lexicographic order is important. The ranking this uses is weird. I'm going to give two functions, one that converts from a rank to a permutation, and one that does the inverse.
First, to unrank (go from rank to permutation)
Initialize:
n = length(permutation)
r = desired rank
p = identity permutation of n elements [0, 1, ..., n-1]
unrank(n, r, p)
if n > 0 then
swap(p[n-1], p[r mod n])
unrank(n-1, floor(r/n), p)
fi
end
Next, to rank:
Initialize:
p = input permutation
q = inverse input permutation (in linear time, q[p[i]] = i for 0 <= i < n)
n = length(p)
rank(n, p, q)
if n=1 then return 0 fi
s = p[n-1]
swap(p[n-1], p[q[n-1]])
swap(q[s], q[n-1])
return s + n * rank(n-1, p, q)
end
That's the pseudocode. For my project I'll be careful to work with a copy of p so I don't mutate it when calculating its rank.
The running time of both of these is O(n).
There's a nice, readable paper explaining why this works: Ranking & Unranking Permutations in Linear Time, by Myrvold & Ruskey, Information Processing Letters Volume 79, Issue 6, 30 September 2001, Pages 281–284.
http://webhome.cs.uvic.ca/~ruskey/Publications/RankPerm/MyrvoldRuskey.pdf
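A direct Python transcription of the pseudocode above (a sketch; both routines mutate their list arguments, as noted):

def unrank(n, r, p):
    # p starts as the identity permutation [0, 1, ..., n-1]
    if n > 0:
        p[n - 1], p[r % n] = p[r % n], p[n - 1]
        unrank(n - 1, r // n, p)

def rank(n, p, q):
    # q is the inverse of p: q[p[i]] == i
    if n == 1:
        return 0
    s = p[n - 1]
    p[n - 1], p[q[n - 1]] = p[q[n - 1]], p[n - 1]
    q[s], q[n - 1] = q[n - 1], q[s]
    return s + n * rank(n - 1, p, q)

Round-tripping for n = 4: for every r in range(24), unranking r into p, rebuilding q from p, and calling rank(4, p, q) gives r back.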
If, in addition to R, you are not wedded to a particular P either, we could redefine the permutation function to facilitate a possible answer. The function, newPerm, below would permute a list in relation to R with the same consistency as the permuting function that "indexes into."
The example below is not optimized for efficiency (e.g., ranking/unranking can be done in O(n)). The last two lines of output compare the redefined permuting function to the "indexing" permuting function - as you can see, they both generate the same number of unique permutations when mapped to the permutation set. The function, f, would be the answer to the question.
Haskell code:
import Data.List (sort,permutations)
import Data.Maybe (fromJust)
sortedPermutations = sort $ permutations [0,1,2,3,4,5,6]
rank p = fromJust (lookup p rs) where rs = zip sortedPermutations [0..]
unrank r = fromJust (lookup r ps) where ps = zip [0..] sortedPermutations
tradPerm p s = foldr (\a b -> s!!a : b) [] p
newPerm p s = unrank (f (rank p) (rank s))
f r1 r2 = let l = r1 - r2 in if l < 0 then length sortedPermutations + l else l
Output:
*Main Data.List> unrank 3
[0,1,2,3,5,6,4]
*Main Data.List> unrank 8
[0,1,2,4,5,3,6]
*Main Data.List> f 3 8
5035
*Main Data.List> newPerm [0,1,2,3,5,6,4] [0,1,2,4,5,3,6]
[6,5,4,3,0,2,1]
*Main Data.List> rank [6,5,4,3,0,2,1]
5035
*Main Data.List> length $ group $ sort $ map (tradPerm [1,2,5,0,4,3,6]) sortedPermutations
5040
*Main Data.List> length $ group $ sort $ map (newPerm [1,2,5,0,4,3,6]) sortedPermutations
5040
I've got another interesting programming/mathematical problem.
For a given natural number q from the interval [2; 10000], find the number n
which is equal to the sum of the q-th powers of its digits modulo 2^64.
For example: for q=3, n=153; for q=5, n=4150.
I wasn't sure if this problem fits more to math.se or stackoverflow, but this was a programming task which my friend told me quite a long time ago. Now I remembered that and would like to know how such things can be done. How to approach this?
There are two key points:
the range of possible solutions is bounded,
any group of numbers whose digits are the same up to permutation can contain at most one solution.
Let us take a closer look at the case q = 2. If a d-digit number n is equal to the sum of the squares of its digits, then
n >= 10^(d-1) // because it's a d-digit number
n <= d*9^2 // because each digit is at most 9
and the condition 10^(d-1) <= d*81 is easily translated to d <= 3, or n < 1000. That's not many numbers to check; a brute force over those is fast. For q = 3, the condition 10^(d-1) <= d*729 yields d <= 4, still not many numbers to check.
We could find smaller bounds by analysing further. For q = 2, the sum of the squares of at most three digits is at most 243, so a solution must be less than 244. The maximal sum of squares of digits in that range is reached for 199: 1² + 9² + 9² = 163; continuing, one can easily find that a solution must be less than 100. (The only solution for q = 2 is 1.) For q = 3, the maximal sum of four cubes of digits is 4*729 = 2916; continuing, we can see that all solutions for q = 3 are less than 1000.
But that sort of improvement of the bound is only useful for small exponents, due to the modulus requirement: when the sum of the powers of the digits can exceed the modulus, it breaks down. Therefore I stop at finding the maximal possible number of digits.
Now, without the modulus, for the sum of the q-th powers of the digits, the bound on the number of digits would be approximately
q - (q/20) + 1
so for larger q, the range of possible solutions obtained from that is huge.
But two points come to the rescue here, first the modulus, which limits the solution space to 2 <= n < 2^64, at most 20 digits, and second, the permutation-invariance of the (modular) digital power sum.
The permutation invariance means that we only need to construct monotonous sequences of d digits, calculate the sum of the q-th powers and check whether the number thus obtained has the correct digits.
Since the number of monotonous d-digit sequences is comparably small, a brute-force using that becomes feasible. In particular if we ignore digits not contributing to the sum (0 for all exponents, 8 for q >= 22, also 4 for q >= 32, all even digits for q >= 64).
The number of monotonous sequences of length d using s symbols is
binom(s+d-1, d)
s is for us at most 9, d <= 20, summing from d = 1 to d = 20, there are at most 10015004 sequences to consider for each exponent. That's not too much.
Still, doing that for all q under consideration amounts to a long time, but if we take into account that for q >= 64, for all even digits x^q % 2^64 == 0, we need only consider sequences composed of odd digits, and the total number of monotonous sequences of length at most 20 using 5 symbols is binom(20+5,20) - 1 = 53129. Now, that looks good.
Summary
We consider a function f mapping digits to natural numbers and are looking for solutions of the equation
n == (sum [f(d) | d <- digits(n)] `mod` 2^64)
where digits maps n to the list of its digits.
From f, we build a function F from lists of digits to natural numbers,
F(list) = sum [f(d) | d <- list] `mod` 2^64
Then we are looking for fixed points of G = F ∘ digits. Now n is a fixed point of G if and only if digits(n) is a fixed point of H = digits ∘ F. Hence we may equivalently look for fixed points of H.
But F is permutation-invariant, so we can restrict ourselves to sorted lists and consider K = sort ∘ digits ∘ F.
Fixed points of H and of K are in one-to-one correspondence. If list is a fixed point of H, then sort(list) is a fixed point of K, and if sortedList is a fixed point of K, then H(sortedList) is a permutation of sortedList, hence H(H(sortedList)) = H(sortedList), in other words, H(sortedList) is a fixed point of K, and sort resp. H are bijections between the set of fixed points of H and K.
A further improvement is possible if some f(d) are 0 (modulo 2^64). Let compress be a function that removes digits with f(d) mod 2^64 == 0 from a list of digits and consider the function L = compress ∘ K.
Since F ∘ compress = F, if list is a fixed point of K, then compress(list) is a fixed point of L. Conversely, if clist is a fixed point of L, then K(clist) is a fixed point of K, and compress resp. K are bijections between the sets of fixed points of L resp. K. (And H(clist) is a fixed point of H, and compress ∘ sort resp. H are bijections between the sets of fixed points of L resp. H.)
The space of compressed sorted lists of at most d digits is small enough to brute-force for the functions f under consideration, namely power functions.
So the strategy is:
Find the maximal number d of digits to consider (bounded by 20 due to the modulus, smaller for small q).
Generate the compressed monotonic sequences of up to d digits.
Check whether the sequence is a fixed point of L, if it is, F(sequence) is a fixed point of G, i.e. a solution of the problem.
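Before the full Haskell program below, here is a compact Python sketch of the same strategy (function names are mine); for small exponents it should reproduce the lists shown with the Haskell output further down, e.g. [1, 153, 370, 371, 407] for q = 3:

from itertools import combinations_with_replacement

MOD = 2 ** 64

def max_digits(biggest):
    # largest d with 10**(d-1) <= d * (maximal single-digit contribution),
    # capped at 20 because of the modulus
    d = 1
    while d < 20 and 10 ** d <= (d + 1) * biggest:
        d += 1
    return d

def solutions(q):
    # digits whose q-th power is nonzero mod 2^64 (the 'compress' step)
    digits = [d for d in range(1, 10) if pow(d, q, MOD) != 0]
    biggest = max(pow(d, q, MOD) for d in digits)
    found = set()
    for d in range(1, max_digits(biggest) + 1):
        for seq in combinations_with_replacement(digits, d):
            n = sum(pow(x, q, MOD) for x in seq) % MOD
            kept = tuple(sorted(c for c in map(int, str(n)) if c in digits))
            if n and kept == seq:   # seq is a fixed point of L, so n solves the problem
                found.add(n)
    return sorted(found)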
Code
Fortunately, you haven't specified a language, so I went for the option of simplest code, i.e. Haskell:
{-# LANGUAGE CPP #-}
module Main (main) where

import Data.List
import Data.Array.Unboxed
import Data.Word
import Text.Printf

#include "MachDeps.h"

#if WORD_SIZE_IN_BITS == 64
type UINT64 = Word
#else
type UINT64 = Word64
#endif

maxDigits :: UINT64 -> Int
maxDigits mx = min 20 $ go d0 (10^(d0-1)) start
  where
    d0 = floor (log (fromIntegral mx) / log 10) + 1
    mxi :: Integer
    mxi = fromIntegral mx
    start = mxi * fromIntegral d0
    go d p10 mmx
      | p10 > mmx = d-1
      | otherwise = go (d+1) (p10*10) (mmx+mxi)

sortedDigits :: UINT64 -> [UINT64]
sortedDigits = sort . digs
  where
    digs 0 = []
    digs n = case n `quotRem` 10 of
               (q,r) -> r : digs q

generateSequences :: Int -> [a] -> [[a]]
generateSequences 0 _
    = [[]]
generateSequences d [x]
    = [replicate d x]
generateSequences d (x:xs)
    = [replicate k x ++ tl | k <- [d,d-1 .. 0], tl <- generateSequences (d-k) xs]
generateSequences _ _ = []

fixedPoints :: (UINT64 -> UINT64) -> [UINT64]
fixedPoints digFun = sort . map listNum . filter okSeq $
                     [ds | d <- [1 .. mxdigs], ds <- generateSequences d contDigs]
  where
    funArr :: UArray UINT64 UINT64
    funArr = array (0,9) [(i,digFun i) | i <- [0 .. 9]]
    mxval = maximum (elems funArr)
    contDigs = filter ((/= 0) . (funArr !)) [0 .. 9]
    mxdigs = maxDigits mxval
    listNum = sum . map (funArr !)
    numFun = listNum . sortedDigits
    listFun = inter . sortedDigits . listNum
    inter = go contDigs
      where
        go cds@(c:cs) dds@(d:ds)
          | c < d = go cs dds
          | c == d = c : go cds ds
          | otherwise = go cds ds
        go _ _ = []
    okSeq ds = ds == listFun ds

solve :: Int -> IO ()
solve q = do
    printf "%d:\n " q
    print (fixedPoints (^q))

main :: IO ()
main = mapM_ solve [2 .. 10000]
It's not optimised, but as is, it finds all solutions for 2 <= q <= 10000 in a little below 50 minutes on my box, starting with
2:
[1]
3:
[1,153,370,371,407]
4:
[1,1634,8208,9474]
5:
[1,4150,4151,54748,92727,93084,194979]
6:
[1,548834]
7:
[1,1741725,4210818,9800817,9926315,14459929]
8:
[1,24678050,24678051,88593477]
9:
[1,146511208,472335975,534494836,912985153]
10:
[1,4679307774]
11:
[1,32164049650,32164049651,40028394225,42678290603,44708635679,49388550606,82693916578,94204591914]
And ending with
9990:
[1,12937422361297403387,15382453639294074274]
9991:
[1,16950879977792502812]
9992:
[1,2034101383512968938]
9993:
[1]
9994:
[1,9204092726570951194,10131851145684339988]
9995:
[1]
9996:
[1,10606560191089577674,17895866689572679819]
9997:
[1,8809232686506786849]
9998:
[1]
9999:
[1]
10000:
[1,11792005616768216715]
The exponents from about 10 to 63 take longest (individually, not cumulatively); there's a remarkable speedup from exponent 64 on, due to the reduced search space.
Here is a brute force solution that will solve for all such n, including 1 and any other n greater than the first within whatever range you choose (in this case I chose base^q as my range limit). You could modify to ignore the special case of 1 and also to return after the first result. It's in C#, but might look nicer in a language with a ** exponentiation operator. You could also pass in your q and base as parameters.
int q = 5;
int radix = 10;
for (int input = 1; input < (int)Math.Pow(radix, q); input++)
{
int sum = 0;
for (int i = 1; i < (int)Math.Pow(radix, q); i *= radix)
{
int x = input / i % radix; //get current digit
sum += (int)Math.Pow(x, q); //x**q;
}
if (sum == input)
{
Console.WriteLine("Hooray: {0}", input);
}
}
So, for q = 5 the results are:
Hooray: 1
Hooray: 4150
Hooray: 4151
Hooray: 54748
Hooray: 92727
Hooray: 93084
Given a list L of n character strings, and an input character string S, what is an efficient way to find the character string in L that contains the most characters that exist in S? We want to find the string in L that is most-closely made up of the letters contained in S.
The obvious answer is to loop through all n strings and check how many characters in the current string exist in S. However, this algorithm will be run frequently, and the list L of n strings will be stored in a database... looping manually through all n strings would require something like big-O of n*m^2, where n is the number of strings in L, and m is the max length of any string in L, as well as the max length of S... in this case m is actually a constant of 150.
Is there a better way than just a simple loop? Is there a data structure I can load the n strings into that would give me fast search ability? Is there an algorithm that uses the pre-calculated meta-data about each of the n strings that would perform better than a loop?
I know there are a lot of geeks out there that are into the algorithms. So please help!
Thanks!
If you are after substrings, a Trie or Patricia trie might be a good starting point.
If you don't care about the order, just about the number of each symbol or letter, I would calculate the histogram of all strings and then compare them with the histogram of the input.
ABCDEFGHIJKLMNOPQRSTUVWXYZ
Hello World => ...11..1...3..2..1....1...
This will lower the costs to O(26 * m + n) plus the preprocessing once if you consider only case-insensitive latin letters.
If m is constant, you could interpret the histogram as a 26 dimensional vector on a 26 dimensional unit sphere by normalizing it. Then you could just calculate the Dot Product of two vectors yielding the cosine of the angle between the two vectors, and this value should be proportional to the similarity of the strings.
Assuming m = 3, an alphabet A = { 'U', 'V', 'W' } of size three only, and the following list of strings.
L = { "UUU", "UVW", "WUU" }
The histograms are the following.
H = { (3, 0, 0), (1, 1, 1), (2, 0, 1) }
A histogram h = (x, y, z) is normalized to h' = (x/r, y/r, z/r) with r the Euclidean norm of the histogram h, that is r = sqrt(x² + y² + z²).
H' = { (1.000, 0.000, 0.000), (0.577, 0.577, 0.577), (0.894, 0.000, 0.447) }
The input S = "VVW" has the histogram hs = (0, 2, 1) and the normalized histogram hs' = (0.000, 0.894, 0.447).
Now we can calculate the similarity of two histograms h1 = (a, b, c) and h2 = (x, y, z) as the Euclidean distance of both histograms.
d(h1, h2) = sqrt((a - x)² + (b - y)² + (c - z)²)
For the example we obtain.
d((3, 0, 0), (0, 2, 1)) = 3.742
d((1, 1, 1), (0, 2, 1)) = 1.414
d((2, 0, 1), (0, 2, 1)) = 2.828
Hence "UVW" is closest to "VVW" (smaller numbers indicate higher similarity).
Using the normalized histograms h1' = (a', b', c') and h2' = (x', y', z') we can calculate the distance as the dot product of both histograms.
d'(h1', h2') = a'x' + b'y' + c'z'
For the example we obtain.
d'((1.000, 0.000, 0.000), (0.000, 0.894, 0.447)) = 0.000
d'((0.577, 0.577, 0.577), (0.000, 0.894, 0.447)) = 0.774
d'((0.894, 0.000, 0.447), (0.000, 0.894, 0.447)) = 0.200
Again "UVW" is determined to be closest to "VVW" (larger numbers indicate higher similarity).
Both versions yield different numbers, but the results are always the same. One could also use other norms - Manhattan distance (L1 norm) for example - but this will only change the numbers because norms in finite dimensional vector spaces are all equivalent.
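A compact Python sketch of this histogram idea (cosine version; names are mine, and in practice you would precompute and store the histogram of every string in L once):

import math
from collections import Counter

def histogram(s):
    # 26-dimensional letter-count vector, case-insensitive Latin letters only
    counts = Counter(c for c in s.lower() if "a" <= c <= "z")
    return [counts.get(chr(ord("a") + i), 0) for i in range(26)]

def cosine(h1, h2):
    dot = sum(a * b for a, b in zip(h1, h2))
    norm = math.sqrt(sum(a * a for a in h1)) * math.sqrt(sum(b * b for b in h2))
    return dot / norm if norm else 0.0

def closest(L, S):
    hs = histogram(S)
    return max(L, key=lambda w: cosine(histogram(w), hs))

closest(["UUU", "UVW", "WUU"], "VVW") returns "UVW", matching the worked example above.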
Sounds like you need a trie. Tries are used to search for words similar to the way a spell checker will work. So if the String S has the characters in the same order as the Strings in L then this may work for you.
If however, the order of the characters in S is not relevant - like a set of scrabble tiles and you want to search for the longest word - then this is not your solution.
What you want is a BK-Tree. It's a bit unintuitive, but very cool - and it makes it possible to search for elements within a Levenshtein (edit) distance threshold in O(log n) time.
If you care about ordering in your input strings, use them as is. If you don't you can sort the individual characters before inserting them into the BK-Tree (or querying with them).
I believe what you're looking for can be found here: Fuzzy Logic Based Search Technique
It's pretty heavy, but so is what you're asking for. It talks about word similarities, and character misplacement.
i.e:
L I N E A R T R N A S F O R M
L I N A E R T R A N S F O R M
L E N E A R T R A N S F R M
It seems to me that the order of the characters is not important in your problem, but you are searching for "near-anagrams" of the word S.
If that's so, then you can represent every word in the set L as an array of 26 integers (assuming your alphabet has 26 letters). You can represent S similarly as an array of 26 integers; now to find the best match you just run once through the set L and calculate a distance metric between the S-vector and the current L-vector, however you want to define the distance metric (e.g. Euclidean / sum-of-squares or Manhattan / sum of absolute differences). This is an O(n) algorithm because the vectors have constant length.
Here is a T-SQL function that has been working great for me; it gives you the edit distance:
Example:
SELECT TOP 1 [StringValue], edit_distance([StringValue], 'Input Value')
FROM [SomeTable]
ORDER BY edit_distance([StringValue], 'Input Value')
The Function:
CREATE FUNCTION edit_distance(@s1 nvarchar(3999), @s2 nvarchar(3999))
RETURNS int
AS
BEGIN
    DECLARE @s1_len int, @s2_len int, @i int, @j int, @s1_char nchar, @c int, @c_temp int,
            @cv0 varbinary(8000), @cv1 varbinary(8000)
    SELECT @s1_len = LEN(@s1), @s2_len = LEN(@s2), @cv1 = 0x0000, @j = 1, @i = 1, @c = 0
    WHILE @j <= @s2_len
        SELECT @cv1 = @cv1 + CAST(@j AS binary(2)), @j = @j + 1
    WHILE @i <= @s1_len
    BEGIN
        SELECT @s1_char = SUBSTRING(@s1, @i, 1), @c = @i, @cv0 = CAST(@i AS binary(2)), @j = 1
        WHILE @j <= @s2_len
        BEGIN
            SET @c = @c + 1
            SET @c_temp = CAST(SUBSTRING(@cv1, @j+@j-1, 2) AS int) +
                CASE WHEN @s1_char = SUBSTRING(@s2, @j, 1) THEN 0 ELSE 1 END
            IF @c > @c_temp SET @c = @c_temp
            SET @c_temp = CAST(SUBSTRING(@cv1, @j+@j+1, 2) AS int)+1
            IF @c > @c_temp SET @c = @c_temp
            SELECT @cv0 = @cv0 + CAST(@c AS binary(2)), @j = @j + 1
        END
        SELECT @cv1 = @cv0, @i = @i + 1
    END
    RETURN @c
END