Overlapping sequences - algorithm

There are many sequences of start-end pairs. How to find all the ranges which are contained in all of the sequences? Start and end are integers and they may be far away, so making bit-fields of the sequences and &-ing them is not feasible. The ranges (i.e. start-end pairs) on one "row" (i.e. one sequence) do not overlap, if that helps. And there is lower and upper bound for start and end, 32-bits integers would be enough I think (i.e., 0 <= values <= 65535).
Let me draw an example:
|----------| |---------------------| |----------|
|----------------------| |-------|
|---------------------| |--|
The result should be:
|--------| |--|
The example above would read, approximately:
row1 = (100, 200), (300, 600), (800, 900)
row2 = (140, 450), (780, 860)
row3 = (280, 580), (820, 860)
result = (300, 450), (820, 860)
Also, is there any known algorithm for this? I mean, does this problem have a name?

That should not be to hard assuming the ranges in each sequence do not overlap. In this case it is just a matter of iterating over all points and tracking when you enter or leave a range.
Throw all your points from all your sequences into one list, sort it and remember for each point if it is a start or end point.
100 S ---
140 S | ---
200 E --- |
280 S | ---
300 S --- | |
450 E | --- |
580 E | ---
600 E ---
780 S ---
800 S --- |
820 S | | ---
860 E | --- |
860 E | ---
900 E ---
Now you iterate over this list and every time you encounter a start point you increment a counter, every time you encounter an end point you decrement the counter.
0
100 S 1
140 S 2
200 E 1
280 S 2
300 S 3 <--
450 E 2 <--
580 E 1
600 E 0
780 S 1
800 S 2
820 S 3 <--
860 E 2 <--
860 E 1
900 E 0
When the counter is equal to the number of sequences - three in your example - you have found the start of one range and the next point is the end of this range.
Note that it is not even required to build the list explicitly if the ranges in each sequence are sorted by start or can be sorted by start. In this case you can just iterate over all sequences in parallel by keeping a pointers to the current range in each sequence.
Here the whole thing in C# - the class for ranges.
internal sealed class Range
{
private readonly Int32 start = 0;
private readonly Int32 end = 0;
public Range(Int32 start, Int32 end)
{
this.start = start;
this.end = end;
}
internal Int32 Start
{
get { return this.start; }
}
internal Int32 End
{
get { return this.end; }
}
}
The class for points with a flag to discriminate between start and end points.
internal sealed class Point
{
private readonly Int32 position = 0;
private readonly Boolean isStartPoint = false;
public Point(Int32 position, Boolean isStartPoint)
{
this.position = position;
this.isStartPoint = isStartPoint;
}
internal Int32 Position
{
get { return this.position; }
}
internal Boolean IsStartPoint
{
get { return this.isStartPoint; }
}
}
And finally the algorithm and test program.
internal static class Program
{
private static void Main()
{
var s1 = new List<Range> { new Range(100, 200), new Range(300, 600), new Range(800, 900) };
var s2 = new List<Range> { new Range(140, 450), new Range(780, 860) };
var s3 = new List<Range> { new Range(280, 580), new Range(820, 860) };
var sequences = new List<List<Range>> { s1, s2, s3 };
var startPoints = sequences.SelectMany(sequence => sequence)
.Select(range => new Point(range.Start, true));
var endPoints = sequences.SelectMany(sequence => sequence)
.Select(range => new Point(range.End, false));
var points = startPoints.Concat(endPoints).OrderBy(point => point.Position);
var counter = 0;
foreach (var point in points)
{
if (point.IsStartPoint)
{
counter++;
if (counter == sequences.Count)
{
Console.WriteLine("Start {0}", point.Position);
}
}
else
{
if (counter == sequences.Count)
{
Console.WriteLine("End {0}", point.Position);
Console.WriteLine();
}
counter--;
}
}
Console.ReadLine();
}
}
The output is as desired the following.
Start 300
End 450
Start 820
End 860

I think you can do that rather simply by fusing the sequences 2 by 2.
Each fusion should be doable in linear time of the number of intervals in the considered sequences (if the sequences are sorted), and M-1 fusions are required (with M number of sequences)
Taking your example and adding an extra sequence:
|----------| |---------------------| |----------|
|----------------------| |-------|
|---------------------| |--|
|-----------------------------------| |-----|
Fuse by pair of sequences:
|-----| |--------| |-----|
|---------------------| |--|
Fuse again:
|--------| |--|
But you may be able to find a faster way to do it. The worst case has O(N log M) runtime (N total number of intervals).
Edit: Pseudocode for fusion
Take s1 and s2 an iterator on each sequence
While there are still intervals in both sequences
Compare the intervals:
If s1.begin < s2.begin
If s2.begin < s1.end
If s2.end > s1.end
Add [s2.begin,s1.end] to the fused sequence
Increment s1
Else
Add [s2.begin,s2.end] to the fused sequence
Increment s2
Else
Increment s1
Else
Same thing with s1 and s2 reversed

It's called the longest common substring. I can be solved using suffix trees. There is a pretty clean Java implementation of it available on this blog and it works with more than just two source strings.
I don't know if you are dealing with characters but I'm sure you can probably adapt it if you don't.

Related

Compact way to produce a large sequence of strings in lexical order

I want to generate a sequence of strings with the following properties:
Lexically ordered
Theoretically infinite
Compact over a realistic range
Generated by a simple process of incrementation
Matches the regexp /\w+/
The obvious way to generate a lexically-ordered sequence is to choose a string length and pad the strings with a base value like this: 000000, 000001, etc. This approach poses a trade-off between the number of permutations and compactness: a string long enough to yield many permutations will be filled many zeros along the way. Plus, the length I choose sets an upper bound on the total number of permutations unless I have some mechanism for expanding the string when it maxes out.
So I came up with a sequence that works like this:
Each string consists of a "head", which is a base-36 number, followed by an underscore, and then the "tail", which is also a base-36 number padded by an increasing number of zeros
The first cycle goes from 0_0 to 0_z
The second cycle goes from 1_00 to 1_zz
The third cycle goes from 2_000 to 2_zzz, and so on
Once the head has reached z and the tail consists of 36 zs, the first "supercycle" has ended. Now the whole sequence starts over, except the z remains at the beginning, so the new cycle starts with z0_0, then continues to z1_00, and so on
The second supercycle goes zz0_0, zz1_00, and so on
Although the string of zs in the head could become unwieldy over the long run, a single supercycle contains over 10^56 permutations, which is far more than I ever expect to use. The sequence is theoretically infinite but very compact within a realistic range. For instance, the trillionth permutation is a succinct 7_bqd55h8s.
I can generate the sequence relatively simply with this javascript function:
function genStr (n) {
n = BigInt(n);
let prefix = "",
cycle = 0n,
max = 36n ** (cycle + 1n);
while (n >= max) {
n -= max;
if (cycle === 35n) {
prefix += "z";
cycle = 0n;
} else {
cycle++;
}
max = 36n ** (cycle + 1n);
}
return prefix
+ cycle.toString(36)
+ "_"
+ n.toString(36).padStart(Number(cycle) + 1, 0);
}
The n parameter is a number that I increment and pass to the function to get the next member of the sequence. All I need to keep track of is a simple integer, making the sequence very easy to use.
So obviously I spent a lot of time on this and I think it's pretty good, but I'm wondering if there is a better way. Is there a good algorithm for generating a sequence along the lines of the one I'm looking for?
A close idea to yours. (more rafined than my first edit...).
Let our alphabet be A = {0,1,2,3}.
Let |2| mean we iterate from 0 to 2 and |2|^2 mean we generate the cartesian product in a lexically sorted manner (00,01,10,11).
We start with
0 |3|
So we have a string of length 2. We "unshift" the digit 1 which "factorizes" since any 0|3|... is less than 1|3|^2.
1 |3|^2
Same idea: unshift 2, and make words of length 4.
2 |3|^3
Now we can continue and generate
3 |2| |3|^3
Notice |2| and not |3|. Now our maximum number becomes 32333. And as you did, we can now add the carry and start a new supercycle:
33 0|3|
This is a slight improvement, since _ can now be part of our alphabet: we don't need to reserve it as a token separator.
In our case we can represent in a supercycle:
n + n^2 + ... + n^(n-1) + (n-1) * n^(n-1)
\-----------------------/\--------------/
geometric special
In your case, the special part would be n^n (with the nuance that you have theorically one char less so replace n with n-1 everywhere)
The proposed supercycle is of length :
P = (n \sum_{k = 0}^{n-2} n^k) + (n-1) * n^(n-1)
P = (n \sum_{k = 0}^{n-3} n^k) + n^n
P = n(n^{n-2} - 1)/(n-1) + n^n
Here is an example diff with alphabet A={0,1,2}
my genStr(grandinero)
,00 0_0
,01 0_1
,02 0_2
,100 1_00
,101 1_01
,102 1_02
,110 1_10
,111 1_11
,112 1_12
,120 1_20
,121 1_21
,122 1_22
,2000 2_000
,2001 2_001
,2002 2_002
,2010 2_010
,2011 2_011
,2012 2_012
,2020 2_020
,2021 2_021
,2022 2_022
,2100 2_100
,2101 2_101
,2102 2_102
,2110 2_110
,2111 2_111
,2112 2_112
,2120 2_120
,2121 2_121
,2122 2_122
22,00 2_200 <-- end of my supercycle if no '_' allowed
22,01 2_201
22,02 2_202
22,100 2_210
22,101 2_211
22,102 2_212
22,110 2_220
22,111 2_221
22,112 2_222 <-- end of yours
22,120 z0_0
That said, for a given number x, we can can count how many supercycles (E(x / P)) there are, each supercycle making two leading e (e being the last char of A).
e.g: A = {0,1,2} and x = 43
e = 2
P = n(n^{n-2} - 1)/(n-1) + n^n = 3(3^1 -1)/2 + 27 = 30
// our supercycle is of length 30
E(43/30) = 1 // 43 makes one supercycle and a few more "strings"
r = x % P = 13 // this is also x - (E(43/30) * 30) (the rest of the euclidean division by P)
Then for the left over (r = x % P) two cases to consider:
either we fall in the geometric sequence
either we fall in the (n-1) * n^(n-1) part.
1. Adressing the geometric sequence with cumulative sums (x < S_w)
Let S_i be the cumsum of n, n^2,..
S_i = n\sum_{k = 0}^{i-1} n^k
S_i = n/(n-1)*(n^i - 1)
which gives S_0 = 0, S_1 = n, S_2 = n + n^2...
So basically, if x < S_1, we get 0(x), elif x < S_2, we get 1(x-S_1)
Let S_w = S_{n-1} the count of all the numbers we can represent.
If x <= S_w then we want the i such that
S_i < x <= S_{i+1} <=> n^i < (n-1)/n * x + 1 <= n^{i+1}
We can then apply some log flooring (base(n)) to get that i.
We can then associate the string: A[i] + base_n(x - S_i).
Illustration:
This time with A = {0,1,2,3}.
Let x be 17.
Our consecutive S_i are:
S_0 = 0
S_1 = 4
S_2 = S_1 + 4^2 = 20
S_3 = S_2 + 4^3 = 84
S_w = S_{4-1} = S_3 = 84
x=17 is indeed less than 84, we will be able to affect it to one of the S_i ranges.
In particular S_1==4 < x==17 <= S_2==20.
We remove the strings encoded by the leading 0(there are a number S_1 of those strings).
The position to encode with the leading 1 is
x - 4 = 13.
And we conclude the thirteen's string generated with a leading 1 is base_4(13) = '31' (idem string -> '131')
Should we have had x = 21, we would have removed the count of S_2 so 21-20 = 1, which in turn gives with a leading 2 the string '2001'.
2. Adressing x in the special part (x >= S_w)
Let's consider study case below:
with A = {0,1,2}
The special part is
2 |1| |2|^2
that is:
2 0 00
2 0 01
2 0 02
2 0 10
2 0 11
2 0 12
2 0 20
2 0 21
2 0 22
2 1 20
2 1 21
2 1 22
2 1 10
2 1 11
2 1 12
2 1 20
2 1 21
2 1 22
Each incremented number of the second column (here 0 to 1 (specified from |1|)) gives 3^2 combination.
This is similar to the geometric series except that here each range is constant. We want to find the range which means we know which string to prefix.
We can represent it as the matrix
20 (00,01,02,10,11,12,20,21,22)
21 (00,01,02,10,11,12,20,21,22)
The portion in parenthesis is our matrix.
Every item in a row is simply its position base_3 (left-padded with 0).
e.g: n=7 has base_3 value '21'. (7=2*3+1).
'21' does occur in position 7 in the row.
Assuming we get some x (relative to that special part).
E(x / 3^2) gives us the row number (here E(7/9) = 0 so prefix is '20')
x % 3^2 give us the position in the row (here base_3(7%9)='21' giving us the final string '2021')
If we want to observe it remember that we substracted S_w=12 before to get x = 7, so we would call myGen(7+12)
Some code
Notice the same output as long as we stand in the "geometric" range, without supercycle.
Obviously, when carry starts to appear, it depends on whether I can use '_' or not. If yes, my words get shorter otherwise longer.
// https://www.cs.sfu.ca/~ggbaker/zju/math/int-alg.html
// \w insensitive could give base64
// but also éè and other accents...
function base_n(x, n, A) {
const a = []
while (x !== 0n) {
a.push(A[Number(x % n)])
x = x / n // auto floor with bigInt
}
return a.reverse().join('')
}
function mygen (A) {
const n = A.length
const bn = BigInt(n)
const A_last = A[A.length-1]
const S = Array(n).fill(0).map((x, i) => bn * (bn ** BigInt(i) - 1n) / (bn - 1n))
const S_w = S[n-1]
const w = S_w + (bn - 1n) * bn ** (bn - 1n)
const w2 = bn ** (bn - 1n)
const flog_bn = x => {
// https://math.stackexchange.com/questions/1627914/smart-way-to-calculate-floorlogx
let L = 0
while (x >= bn) {
L++
x /= bn
}
return L
}
return function (x) {
x = BigInt(x)
let r = x % w
const q = (x - r) / w
let s
if (r < S_w) {
const i = flog_bn(r * (bn - 1n) / bn + 1n)
const r2 = r - S[i]
s = A[i] + base_n(r2, bn, A).padStart(i+1, '0')
} else {
const n2 = r - S_w
const r2 = n2 % w2
const q2 = (n2 - r2 ) / w2
s = A_last + A[q2] + base_n(r2, bn, A).padStart(n-1, '0')
}
// comma below __not__ necessary, just to ease seeing cycles
return A_last.repeat(2*Number(q)) +','+ s
}
}
function genStr (A) {
A = A.filter(x => x !== '_')
const bn_noUnderscore = BigInt(A.length)
return function (x) {
x = BigInt(x);
let prefix = "",
cycle = 0n,
max = bn_noUnderscore ** (cycle + 1n);
while (x >= max) {
x -= max;
if (cycle === bn_noUnderscore - 1n) {
prefix += "z";
cycle = 0n;
} else {
cycle++;
}
max = bn_noUnderscore ** (cycle + 1n);
}
return prefix
+ base_n(cycle, bn_noUnderscore, A)
+ "_"
+ base_n(x, bn_noUnderscore, A).padStart(Number(cycle) + 1, 0);
}
}
function test(a, b, x){
console.log(a(x), b(x))
}
{
console.log('---my supercycle is shorter if underscore not used. Plenty of room for grandinero')
const A = '0123456789abcdefghijklmnopqrstuvwxyz'.split('').sort((a,b)=>a.localeCompare(b))
let my = mygen(A)
const grandinero = genStr(A)
test(my, grandinero, 1e4)
test(my, grandinero, 1e12)
test(my, grandinero, 106471793335560744271846581685593263893929893610517909620n) // cycle ended for me (w variable value)
}
{
console.log('---\n my supercycle is greater if underscore is used in my alphabet (not grandinero since "forbidden')
// underscore used
const A = '0123456789abcdefghijklmnopqrstuvwxyz_'.split('').sort((a,b)=>a.localeCompare(b))
let my = mygen(A)
const grandinero = genStr(A)
test(my, grandinero, 1e12)
test(my, grandinero, 106471793335560744271846581685593263893929893610517909620n) // cycle ended for me (w variable value)
test(my, grandinero, 1e57) // still got some place in the supercycle
}
After considering the advice provided by #kaya3 and #grodzi and reviewing my original code, I have made some improvements. I realized a few things:
There was a bug in my original code. If one cycle ends at z_z (actually 36 z's after the underscore, but you get the idea) and the next one begins at z0_0, then lexical ordering is broken because _ comes after 0. The separator (or "neck") needs to be lower in lexical order than the lowest possible value of the head.
Though I was initially resistant to the idea of rolling a custom baseN generator so that more characters can be included, I have now come around to the idea.
I can squeeze more permutations out of a given string length by also incrementing the neck. For example, I can go from A00...A0z to A10...A1z, and so on, thus increasing the number of unique strings I can generate with A as the head before I move on to B.
With that in mind, I have revised my code:
// this is the alphabet used in standard baseN conversions:
let baseAlpha = "0123456789abcdefghijklmnopqrstuvwxyz";
// this is a factory for creating a new string generator:
function sequenceGenerator (config) {
let
// alphabets for the head, neck and body:
headAlpha = config.headAlpha,
neckAlpha = config.neckAlpha,
bodyAlpha = config.bodyAlpha,
// length of the body alphabet corresponds to the
// base of the numbering system:
base = BigInt(bodyAlpha.length),
// if bodyAlpha is identical to an alphabet that
// would be used for a standard baseN conversion,
// then use the built-in method, which should be
// much faster:
convertBody = baseAlpha.startsWith(bodyAlpha)
? (n) => n.toString(bodyAlpha.length)
// otherwise, roll a custom baseN generator:
: function (n) {
let s = "";
while (n > 0n) {
let i = n % base;
s = bodyAlpha[i] + s;
n = n / base;
}
return s;
},
// n is used to cache the last iteration and is
// incremented each time you call `getNext`
// it can optionally be initialized to a value other
// than 0:
n = BigInt(config.start || 0),
// see below:
headCycles = [0n],
cycleLength = 0n;
// the length of the body increases by 1 each time the
// head increments, meaning that the total number of
// permutations increases geometrically for each
// character in headAlpha
// here we cache the maximum number of permutations for
// each length of the body
// since we know these values ahead of time, calculating
// them in advance saves time when we generate a new
// string
// more importantly, it saves us from having to do a
// reverse calculation involving Math.log, which requires
// converting BigInts to Numbers, which breaks the
// program on larger numbers:
for (let i = 0; i < headAlpha.length; i++) {
// the maximum number of permutations depends on both
// the string length (i + 1) and the number of
// characters in neckAlpha, since the string length
// remains the same while the neck increments
cycleLength += BigInt(neckAlpha.length) * base ** BigInt(i + 1);
headCycles.push(cycleLength);
}
// given a number n, this function searches through
// headCycles to find where the total number of
// permutations exceeds n
// this is how we avoid the reverse calculation with
// Math.log to determine which head cycle we are on for
// a given permutation:
function getHeadCycle (n) {
for (let i = 0; i < headCycles.length; i++) {
if (headCycles[i] > n) return i;
}
}
return {
cycleLength: cycleLength,
getString: function (n) {
let cyclesDone = Number(n / cycleLength),
headLast = headAlpha[headAlpha.length - 1],
prefix = headLast.repeat(cyclesDone),
nn = n % cycleLength,
headCycle = getHeadCycle(nn),
head = headAlpha[headCycle - 1],
nnn = nn - headCycles[headCycle - 1],
neckCycleLength = BigInt(bodyAlpha.length) ** BigInt(headCycle),
neckCycle = nnn / neckCycleLength,
neck = neckAlpha[Number(neckCycle)],
body = convertBody(nnn % neckCycleLength);
body = body.padStart(headCycle , bodyAlpha[0]);
return prefix + head + neck + body;
},
getNext: function () { return this.getString(n++); }
};
}
let bodyAlpha = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz",
getStr = sequenceGenerator({
// achieve more permutations within a supercycle
// with a larger headAlpha:
headAlpha: "123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
// the highest value of neckAlpha must be lower than
// the lowest value of headAlpha:
neckAlpha: "0",
bodyAlpha: bodyAlpha
});
console.log("---supercycle length:");
console.log(Number(getStr.cycleLength));
console.log("---first two values:")
console.log(getStr.getNext());
console.log(getStr.getNext());
console.log("---arbitrary large value (1e57):");
console.log(getStr.getString(BigInt(1e57)));
console.log("");
// here we use a shorter headAlpha and longer neckAlpha
// to shorten the maximum length of the body, but this also
// decreases the number of permutations in the supercycle:
getStr = sequenceGenerator({
headAlpha: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
neckAlpha: "0123456789",
bodyAlpha: bodyAlpha
});
console.log("---supercycle length:");
console.log(Number(getStr.cycleLength));
console.log("---first two values:");
console.log(getStr.getNext());
console.log(getStr.getNext());
console.log("---arbitrary large value (1e57):");
console.log(getStr.getString(BigInt(1e57)));
EDIT
After further discussion with #grodzi, I have made some more improvements:
I realized that the "neck" or separator wasn't providing much value, so I have gotten rid of it. Later edit: actually, the separator is necessary. I am not sure why I thought it wasn't. Without the separator, the beginning of each new supercycle will lexically precede the end of the previous supercycle. I haven't changed my code below, but anyone using this code should include a separator. I have also realized that I was wrong to use an underscore as the separator. The separator must be a character, such as the hyphen, which lexically precedes the lowest digit used in the sequence (0).
I have taken #grodzi's suggestion to allow the length of the tail to continue growing indefinitely.
Here is the new code:
let baseAlpha = "0123456789abcdefghijklmnopqrstuvwxyz";
function sequenceGenerator (config) {
let headAlpha = config.headAlpha,
tailAlpha = config.tailAlpha,
base = BigInt(tailAlpha.length),
convertTail = baseAlpha.startsWith(tailAlpha)
? (n) => n.toString(tailAlpha.length)
: function (n) {
if (n === 0n) return "0";
let s = "";
while (n > 0n) {
let i = n % base;
s = tailAlpha[i] + s;
n = n / base;
}
return s;
},
n = BigInt(config.start || 0);
return {
getString: function (n) {
let cyclesDone = 0n,
headCycle = 0n,
initLength = 0n,
accum = 0n;
for (;; headCycle++) {
let _accum = accum + base ** (headCycle + 1n + initLength);
if (_accum > n) {
n -= accum;
break;
} else if (Number(headCycle) === headAlpha.length - 1) {
cyclesDone++;
initLength += BigInt(headAlpha.length);
headCycle = -1n;
}
accum = _accum;
}
let headLast = headAlpha[headAlpha.length - 1],
prefix = headLast.repeat(Number(cyclesDone)),
head = headAlpha[Number(headCycle)],
tail = convertTail(n),
tailLength = Number(headCycle + initLength);
tail = tail.padStart(tailLength, tailAlpha[0]);
return prefix + head + tail;
},
getNext: function () { return this.getString(n++); }
};
}
let alpha = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz",
genStr = sequenceGenerator({headAlpha: alpha, tailAlpha: alpha});
console.log("--- first string:");
console.log(genStr.getString(0n));
console.log("--- 1e+57");
console.log(genStr.getString(BigInt(1e+57)));
console.log("--- end of first supercycle:");
console.log(genStr.getString(63n*(1n-(63n**63n))/(1n-63n)-1n));
console.log("--- start of second supercycle:");
console.log(genStr.getString(63n*(1n-(63n**63n))/(1n-63n)));

Display all the possible numbers having its digits in ascending order

Write a program that can display all the possible numbers in between given two numbers, having its digits in ascending order.
For Example:-
Input: 5000 to 6000
Output: 5678 5679 5689 5789
Input: 90 to 124
Output: 123 124
Brute force approach can make it count to all numbers and check of digits for each one of them. But I want approaches that can skip some numbers and can bring complexity lesser than O(n). Do any such solution(s) exists that can give better approach for this problem?
I offer a solution in Python. It is efficient as it considers only the relevant numbers. The basic idea is to count upwards, but handle overflow somewhat differently. While we normally set overflowing digits to 0, here we set them to the previous digit +1. Please check the inline comments for further details. You can play with it here: http://ideone.com/ePvVsQ
def ascending( na, nb ):
assert nb>=na
# split each number into a list of digits
a = list( int(x) for x in str(na))
b = list( int(x) for x in str(nb))
d = len(b) - len(a)
# if both numbers have different length add leading zeros
if d>0:
a = [0]*d + a # add leading zeros
assert len(a) == len(b)
n = len(a)
# check if the initial value has increasing digits as required,
# and fix if necessary
for x in range(d+1, n):
if a[x] <= a[x-1]:
for y in range(x, n):
a[y] = a[y-1] + 1
break
res = [] # result set
while a<=b:
# if we found a value and add it to the result list
# turn the list of digits back into an integer
if max(a) < 10:
res.append( int( ''.join( str(k) for k in a ) ) )
# in order to increase the number we look for the
# least significant digit that can be increased
for x in range( n-1, -1, -1): # count down from n-1 to 0
if a[x] < 10+x-n:
break
# digit x is to be increased
a[x] += 1
# all subsequent digits must be increased accordingly
for y in range( x+1, n ):
a[y] = a[y-1] + 1
return res
print( ascending( 5000, 9000 ) )
Sounds like task from Project Euler. Here is the solution in C++. It is not short, but it is straightforward and effective. Oh, and hey, it uses backtracking.
// Higher order digits at the back
typedef std::vector<int> Digits;
// Extract decimal digits of a number
Digits ExtractDigits(int n)
{
Digits digits;
while (n > 0)
{
digits.push_back(n % 10);
n /= 10;
}
if (digits.empty())
{
digits.push_back(0);
}
return digits;
}
// Main function
void PrintNumsRec(
const Digits& minDigits, // digits of the min value
const Digits& maxDigits, // digits of the max value
Digits& digits, // digits of current value
int pos, // current digits with index greater than pos are already filled
bool minEq, // currently filled digits are the same as of min value
bool maxEq) // currently filled digits are the same as of max value
{
if (pos < 0)
{
// Print current value. Handle leading zeros by yourself, if need
for (auto pDigit = digits.rbegin(); pDigit != digits.rend(); ++pDigit)
{
if (*pDigit >= 0)
{
std::cout << *pDigit;
}
}
std::cout << std::endl;
return;
}
// Compute iteration boundaries for current position
int first = minEq ? minDigits[pos] : 0;
int last = maxEq ? maxDigits[pos] : 9;
// The last filled digit
int prev = digits[pos + 1];
// Make sure generated number has increasing digits
int firstInc = std::max(first, prev + 1);
// Iterate through possible cases for current digit
for (int d = firstInc; d <= last; ++d)
{
digits[pos] = d;
if (d == 0 && prev == -1)
{
// Mark leading zeros with -1
digits[pos] = -1;
}
PrintNumsRec(minDigits, maxDigits, digits, pos - 1, minEq && (d == first), maxEq && (d == last));
}
}
// High-level function
void PrintNums(int min, int max)
{
auto minDigits = ExtractDigits(min);
auto maxDigits = ExtractDigits(max);
// Make digits array of the same size
while (minDigits.size() < maxDigits.size())
{
minDigits.push_back(0);
}
Digits digits(minDigits.size());
int pos = digits.size() - 1;
// Placeholder for leading zero
digits.push_back(-1);
PrintNumsRec(minDigits, maxDigits, digits, pos, true, true);
}
void main()
{
PrintNums(53, 297);
}
It uses recursion to handle arbitrary amount of digits, but it is essentially the same as the nested loops approach. Here is the output for (53, 297):
056
057
058
059
067
068
069
078
079
089
123
124
125
126
127
128
129
134
135
136
137
138
139
145
146
147
148
149
156
157
158
159
167
168
169
178
179
189
234
235
236
237
238
239
245
246
247
248
249
256
257
258
259
267
268
269
278
279
289
Much more interesting problem would be to count all these numbers without explicitly computing it. One would use dynamic programming for that.
There is only a very limited number of numbers which can match your definition (with 9 digits max) and these can be generated very fast. But if you really need speed, just cache the tree or the generated list and do a lookup when you need your result.
using System;
using System.Collections.Generic;
namespace so_ascending_digits
{
class Program
{
class Node
{
int digit;
int value;
List<Node> children;
public Node(int val = 0, int dig = 0)
{
digit = dig;
value = (val * 10) + digit;
children = new List<Node>();
for (int i = digit + 1; i < 10; i++)
{
children.Add(new Node(value, i));
}
}
public void Collect(ref List<int> collection, int min = 0, int max = Int16.MaxValue)
{
if ((value >= min) && (value <= max)) collection.Add(value);
foreach (Node n in children) if (value * 10 < max) n.Collect(ref collection, min, max);
}
}
static void Main(string[] args)
{
Node root = new Node();
List<int> numbers = new List<int>();
root.Collect(ref numbers, 5000, 6000);
numbers.Sort();
Console.WriteLine(String.Join("\n", numbers));
}
}
}
Why the brute force algorithm may be very inefficient.
One efficient way of encoding the input is to provide two numbers: the lower end of the range, a, and the number of values in the range, b-a-1. This can be encoded in O(lg a + lg (b - a)) bits, since the number of bits needed to represent a number in base-2 is roughly equal to the base-2 logarithm of the number. We can simplify this to O(lg b), because intuitively if b - a is small, then a = O(b), and if b - a is large, then b - a = O(b). Either way, the total input size is O(2 lg b) = O(lg b).
Now the brute force algorithm just checks each number from a to b, and outputs the numbers whose digits in base 10 are in increasing order. There are b - a + 1 possible numbers in that range. However, when you represent this in terms of the input size, you find that b - a + 1 = 2lg (b - a + 1) = 2O(lg b) for a large enough interval.
This means that for an input size n = O(lg b), you may need to check in the worst case O(2 n) values.
A better algorithm
Instead of checking every possible number in the interval, you can simply generate the valid numbers directly. Here's a rough overview of how. A number n can be thought of as a sequence of digits n1 ... nk, where k is again roughly log10 n.
For a and a four-digit number b, the iteration would look something like
for w in a1 .. 9:
for x in w+1 .. 9:
for y in x+1 .. 9:
for x in y+1 .. 9:
m = 1000 * w + 100 * x + 10 * y + w
if m < a:
next
if m > b:
exit
output w ++ x ++ y ++ z (++ is just string concatenation)
where a1 can be considered 0 if a has fewer digits than b.
For larger numbers, you can imagine just adding more nested for loops. In general, if b has d digits, you need d = O(lg b) loops, each of which iterates at most 10 times. The running time is thus O(10 lg b) = O(lg b) , which is a far better than the O(2lg b) running time you get by checking if every number is sorted or not.
One other detail that I have glossed over, which actually does affect the running time. As written, the algorithm needs to consider the time it takes to generate m. Without going into the details, you could assume that this adds at worst a factor of O(lg b) to the running time, resulting in an O(lg2 b) algorithm. However, using a little extra space at the top of each for loop to store partial products would save lots of redundant multiplication, allowing us to preserve the originally stated O(lg b) running time.
One way (pseudo-code):
for (digit3 = '5'; digit3 <= '6'; digit3++)
for (digit2 = digit3+1; digit2 <= '9'; digit2++)
for (digit1 = digit2+1; digit1 <= '9'; digit1++)
for (digit0 = digit1+1; digit0 <= '9'; digit0++)
output = digit3 + digit2 + digit1 + digit0; // concatenation

Manacher's algorithm (algorithm to find longest palindrome substring in linear time)

After spending about 6-8 hours trying to digest the Manacher's algorithm, I am ready to throw in the towel. But before I do, here is one last shot in the dark: can anyone explain it? I don't care about the code. I want somebody to explain the ALGORITHM.
Here seems to be a place that others seemed to enjoy in explaining the algorithm:
http://www.leetcode.com/2011/11/longest-palindromic-substring-part-ii.html
I understand why you would want to transform the string, say, 'abba' to #a#b#b#a#
After than I'm lost. For example, the author of the previously mentioned website says the key part of the algorithm is:
if P[ i' ] ≤ R – i,
then P[ i ] ← P[ i' ]
else P[ i ] ≥ P[ i' ]. (Which we have to expand past
the right edge (R) to find P[ i ])
This seems wrong, because he/she says at one point that P[i] equals 5 when P[i'] = 7 and P[i] is not less or equal to R - i.
If you are not familiar with the algorithm, here are some more links: http://tristan-interview.blogspot.com/2011/11/longest-palindrome-substring-manachers.html (I've tried this one, but the terminology is awful and confusing. First, some things are not defined. Also, too many variables. You need a checklist to recall what variable is referring to what.)
Another is: http://www.akalin.cx/longest-palindrome-linear-time (good luck)
The basic gist of the algorithm is to find the longest palindrome in linear time. It can be done in O(n^2) with a minimum to medium amount of effort. This algorithm is supposed to be quite "clever" to get it down to O(n).
I agree that the logic isn't quite right in the explanation of the link. I give some details below.
Manacher's algorithm fills in a table P[i] which contains how far the palindrome centered at i extends. If P[5]=3, then three characters on either side of position five are part of the palindrome. The algorithm takes advantage of the fact that if you've found a long palindrome, you can fill in values of P on the right side of the palindrome quickly by looking at the values of P on the left side, since they should mostly be the same.
I'll start by explaining the case you were talking about, and then I'll expand this answer as needed.
R indicates the index of the right side of the palindrome centered at C. Here is the state at the place you indicated:
C=11
R=20
i=15
i'=7
P[i']=7
R-i=5
and the logic is like this:
if P[i']<=R-i: // not true
else: // P[i] is at least 5, but may be greater
The pseudo-code in the link indicates that P[i] should be greater than or equal to P[i'] if the test fails, but I believe it should be greater than or equal to R-i, and the explanation backs that up.
Since P[i'] is greater than R-i, the palindrome centered at i' extends past the palindrome centered at C. We know the palindrome centered at i will be at least R-i characters wide, because we still have symmetry up to that point, but we have to search explicitly beyond that.
If P[i'] had been no greater than R-i, then the largest palindrome centered at i' is within the largest palindrome centered at C, so we would have known that P[i] couldn't be any larger than P[i']. If it was, we would have a contradiction. It would mean that we would be able to extend the palindrome centered at i beyond P[i'], but if we could, then we would also be able to extend the palindrome centered at i' due to the symmetry, but it was already supposed to be as large as possible.
This case is illustrated previously:
C=11
R=20
i=13
i'=9
P[i']=1
R-i=7
In this case, P[i']<=R-i. Since we are still 7 characters away from the edge of the palindrome centered at C, we know that at least 7 characters around i are the same as the 7 characters around i'. Since there was only a one character palindrome around i', there is a one character palindrome around i as well.
j_random_hacker noticed that the logic should be more like this:
if P[i']<R-i then
P[i]=P[i']
else if P[i']>R-i then
P[i]=R-i
else P[i]=R-i + expansion
If P[i'] < R-i, then we know that P[i]==P[i'], since we're still inside the palindrome centered at C.
If P[i'] > R-i, then we know that P[i]==R-i, because otherwise the palindrome centered at C would have extended past R.
So the expansion is really only necessary in the special case where P[i']==R-i, so we don't know if the palindrome at P[i] may be longer.
This is handled in the actual code by setting P[i]=min(P[i'],R-i) and then always expanding. This way of doing it doesn't increase the time complexity, because if no expansion is necessary, the time taken to do the expansion is constant.
I have found one of the best explanation so far at the following link:
http://tarokuriyama.com/projects/palindrome2.php
It also has a visualization for the same string example (babcbabcbaccba) used at the first link mentioned in the question.
Apart from this link, i also found the code at
http://algs4.cs.princeton.edu/53substring/Manacher.java.html
I hope it will be helpful to others trying hard to understand the crux of this algorithm.
The Algorithm on this site seems understandable to the certain point
http://www.akalin.cx/longest-palindrome-linear-time
To understand this particular approach the best is to try to solving the problem on paper and catching the tricks you can implement to avoid checking for the palindrome for each possible center.
First answer yourself - when you find a palindrome of a given length, let's say 5 - can't you as a next step just jump to the end of this palindrome (skipping 4 letters and 4 mid-letters)?
If you try to create a palindrome with length 8 and place another palindrome with length > 8, which center is in the right side of the first palindrome you will notice something funny. Try it out:
Palindrome with length 8 - WOWILIKEEKIL - Like + ekiL = 8
Now in most cases you would be able to write down the place between two E's as a center and number 8 as the length and jump after the last L to look for the center of the bigger palindrome.
This approach is not correct, which the center of bigger palindrome can be inside ekiL and you would miss it if you would jump after the last L.
After you find LIKE+EKIL you place 8 in the array that these algos use and this looks like:
[0,1,0,3,0,1,0,1,0,3,0,1,0,1,0,1,8]
for
[#,W,#,O,#,W,#,I,#,L,#,I,#,K,#,E,#]
The trick is that you already know that most probably next 7 (8-1) numbers after 8 will be the same as on the left side, so the next step is to automatically copy 7 numbers from left of 8 to right of 8 keeping in mind they are not yet final.
The array would look like this
[0,1,0,3,0,1,0,1,0,3,0,1,0,1,0,1,8,1,0,1,0,1,0,3] (we are at 8)
for
[#,W,#,O,#,W,#,I,#,L,#,I,#,K,#,E,#,E,#,K,#,I,#,L]
Let's make an example, that such jump would destroy our current solution and see what we can notice.
WOWILIKEEKIL - lets try to make bigger palindrome with the center somewhere within EKIL.
But its not possible - we need to change word EKIL to something that contain palindrome.
What? OOOOOh - thats the trick.
The only possibility to have a bigger palindrome with the center in the right side of our current palindrome is that it is already in the right (and left) side of palindrome.
Let's try to build one based on WOWILIKEEKIL
We would need to change EKIL to for example EKIK with I as a center of the bigger palindrome - remember to change LIKE to KIKE as well.
First letters of our tricky palindrome will be:
WOWIKIKEEKIK
as said before - let the last I be the center of the bigger pallindrome than KIKEEKIK:
WOWIKIKEEKIKEEKIKIW
let's make the array up to our old pallindrom and find out how to laverage the additional info.
for
[_ W _ O _ W _ I _ K _ I _ K _ E _ E _ K _ I _ K _ E _ E _ K _ I _ K _ I _ W ]
it will be
[0,1,0,3,0,1,0,1,0,3,0,3,0,1,0,1,8
we know that the next I - a 3rd will be the longest pallindrome, but let's forget about it for a bit. lets copy the numbers in the array from the left of 8 to the right (8 numbers)
[0,1,0,3,0,1,0,1,0,3,0,3,0,1,0,1,8,1,0,1,0,3,0,3]
In our loop we are at between E's with number 8. What is special about I (future middle of biggest pallindrome) that we cannot jump right to K (the last letter of currently biggest pallindrome)?
The special thing is that it exceeds the current size of the array ... how?
If you move 3 spaces to the right of 3 - you are out of array. It means that it can be the middle of the biggest pallindrome and the furthest you can jump is this letter I.
Sorry for the length of this answer - I wanted to explain the algorythm and can assure you - #OmnipotentEntity was right - I understand it even better after explaining to you :)
Full Article: http://www.zrzahid.com/longest-palindromic-substring-in-linear-time-manachers-algorithm/
First of all lets observe closely to a palindrome in order to find some interesting properties. For example, S1 = "abaaba" and S2="abcba", both are palindrome but what is the non-trivial (i.e. not length or characters) difference between them? S1 is a palindrome centered around the invisible space between i=2 and i=3 (non-existent space!). On the other hand S2 is centered around character at i=2 (ie. c). In order to graciously handle the center of a palindrome irrespective of the odd/even length, lets transform the palindrome by inserting special character $ in between characters. Then S1="abba" and S2="abcba" will be transformed into T1="$a$b$a$a$b$a$" centered at i=6 and T2="$a$b$c$b$a$" centered at i=5. Now, we can see that centers are existent and lengths are consistent 2*n+1, where n=length of original string. For example,
i' c i
-----------------------------------------------------
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12|
-----------------------------------------------------
T1=| $ | a | $ | b | $ | a | $ | a | $ | b | $ | a | $ |
-----------------------------------------------------
Next, observe that from the symmetric property of a (transformed) palindrome T around the center c, T[c-k] = T[c+k] for 0<= k<= c. That is positions c-k and c+k are mirror to each other. Let's put it another way, for an index i on the right of center c, the mirror index i' is on the left of c such that c-i'=i-c => i'=2*c-i and vice versa. That is,
For each position i on the right of center c of a palindromic substring, the mirror position of i is, i'=2*c-i, and vice versa.
Let us define an array P[0..2*n] such that P[i] equals to the length of the palindrome centered at i. Note that, length is actually measured by number of characters in the original string (by ignoring special chars $). Also let min and max be respectively the leftmost and rightmost boundary of a palindromic substring centered at c. So, min=c-P[c] and max=c+P[c]. For example, for palindrome S="abaaba", the transformed palindrome T, mirror center c=6, length array P[0..12], min=c-P[c]=6-6=0, max=c+P[c]=6+6=12 and two sample mirrored indices i and i' are shown in the following figure.
min i' c i max
-----------------------------------------------------
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12|
-----------------------------------------------------
T=| $ | a | $ | b | $ | a | $ | a | $ | b | $ | a | $ |
-----------------------------------------------------
P=| 0 | 1 | 0 | 3 | 0 | 5 | 6 | 1 | 0 | 3 | 0 | 1 | 0 |
-----------------------------------------------------
With such a length array P, we can find the length of longest palindromic substring by looking into the max element of P. That is,
P[i] is the length of a palindromic substring with center at i in the transformed string T, ie. center at i/2 in the original string S; Hence the longest palindromic substring would be the substring of length P[imax] starting from index (imax-P[imax])/2 such that imax is the index of maximum element in P.
Let us draw a similar figure in the following for our non-palindromic example string S="babaabca".
min c max
|----------------|-----------------|
--------------------------------------------------------------------
idx= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16|
---------------------------------------------------------------------
T=| $ | b | $ | a | $ | b | $ | a | $ | a | $ | b | $ | c | $ | a | $ |
---------------------------------------------------------------------
P=| 0 | 1 | 0 | 3 | 0 | 3 | 0 | 1 | 4 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
---------------------------------------------------------------------
Question is how to compute P efficiently. The symmetric property suggests the following cases that we could potentially use to compute P[i] by using previously computed P[i'] at the mirror index i' on the left, hence skipping a lot of computations. Let's suppose that we have a reference palindrome (first palindrome) to start with.
A third palindrome whose center is within the right side of a first palindrome will have exactly the same length as that of a second palindrome anchored at the mirror center on the left side, if the second palindrome is within the bounds of the first palindrome by at least one character.
For example in the following figure with the first palindrome centered at c=8 and bounded by min=4 and max=12, length of the third palindrome centered at i=9 (with mirror index i'= 2*c-i = 7) is, P[i] = P[i'] = 1. This is because the second palindrome centered at i' is within the bounds of first palindrome. Similarly, P[10] = P[6] = 0.
|----3rd----|
|----2nd----|
|-----------1st Palindrome---------|
min i' c i max
|------------|---|---|-------------|
--------------------------------------------------------------------
idx= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16|
---------------------------------------------------------------------
T=| $ | b | $ | a | $ | b | $ | a | $ | a | $ | b | $ | c | $ | a | $ |
---------------------------------------------------------------------
P=| 0 | 1 | 0 | 3 | 0 | 3 | 0 | 1 | 4 | ? | ? | ? | ? | ? | ? | ? | ? |
---------------------------------------------------------------------
Now, question is how to check this case? Note that, due to symmetric property length of segment [min..i'] is equals to the length of segment [i..max]. Also, note that 2nd palindrome is completely within 1st palindrome iff left edge of the 2nd palindrome is inside the left boundary, min of the 1st palindrome. That is,
i'-P[i'] >= min
=>P[i']-i' &lt -min (negate)
=>P[i'] &lt i'-min
=>P[i'] &lt max-i [(max-i)=(i'-min) due to symmetric property].
Combining all the facts in case 1,
P[i] = P[i'], iff (max-i) &gt P[i']
If the second palindrome meets or extends beyond the left bound of the first palindrome, then the third palindrome is guaranteed to have at least the length from its own center to the right outermost character of the first palindrome. This length is the same from the center of the second palindrome to the left outermost character of the first palindrome.
For example in the following figure, second palindrome centered at i=5 extends beyond the left bound of the first palindrome. So, in this case we can't say P[i]=P[i']. But length of the third palindrome centered at i=11, P[i] is at least the length from its center i=11 to the right bound max=12 of first palindrome centered at c. That is, P[i]>=1. This means third palindrome could be extended past max if and only if next immediate character past max matches exactly with the mirrored character, and we continue this check beyond. For example, in this case P[13]!=P[9] and it can't be extended. So, P[i] = 1.
|-------2nd palindrome------| |----3rd----|---?
|-----------1st Palindrome---------|
min i' c i max
|----|-----------|-----------|-----|
--------------------------------------------------------------------
idx= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16|
---------------------------------------------------------------------
T=| $ | b | $ | a | $ | b | $ | a | $ | a | $ | b | $ | c | $ | a | $ |
---------------------------------------------------------------------
P=| 0 | 1 | 0 | 3 | 0 | 3 | 0 | 1 | 4 | 1 | 0 | ? | ? | ? | ? | ? | ? |
---------------------------------------------------------------------
So, how to check this case? This is simply the failed check for case 1. That is, second palindrome will extend past left edge of first palindrome iff,
i'-P[i'] &lt min
=>P[i']-i' >= -min [negate]
=>P[i'] >= i'-min
=>P[i'] >= max-i [(max-i)=(i'-min) due to symmetric property].
That is, P[i] is at least (max-i) iff (max-i) P[i]>=(max-i), iff (max-i)
Now, if the third palindrome does extend beyond max then we need to update the center and the boundary of the new palindrome.
If the palindrome centered at i does expand past max then we have new (extended) palindrome, hence a new center at c=i. Update max to the rightmost boundary of the new palindrome.
Combining all the facts in case 1 and case 2, we can come up with a very beautiful little formulae:
Case 1: P[i] = P[i'], iff (max-i) > P[i']
Case 2: P[i]>=(max-i), iff (max-i) = min(P[i'], max-i).
That is, P[i]=min(P[i'], max-i) when the third palindrome is not extendable past max. Otherwise, we have new third palindrome at center at c=i and new max=i+P[i].
Neither the first nor second palindrome provides information to help determine the palindromic length of a fourth palindrome whose center is outside the right side of the first palindrome.
That is, we can't determine preemptively P[i] if i>max. That is,
P[i] = 0, iff max-i &lt 0
Combining all the cases, we conclude the formulae:
P[i] = max>i ? min(P[i'], max-i) : 0. In case we can expand beyond max then we expand by matching characters beyond max with the mirrored character with respect to new center at c=i. Finally when we have a mismatch we update new max=i+P[i].
Reference: algorithm description in wiki page
This material is of great help for me to understand it:
http://solutionleetcode.blogspot.com/2013/07/leetcode-longest-palindromic-substring.html
Define T as the length of the longest palindromic substrings centered at each of the characters.
The key thing is, when smaller palindromes are completely embedded within the longer palindrome, T[i] should also be symmetric within the longer palindrome.
Otherwise, we will have to compute T[i] from scratch, rather than induce from the symmetric left part.
class Palindrome
{
private int center;
private int radius;
public Palindrome(int center, int radius)
{
if (radius < 0 || radius > center)
throw new Exception("Invalid palindrome.");
this.center = center;
this.radius = radius;
}
public int GetMirror(int index)
{
int i = 2 * center - index;
if (i < 0)
return 0;
return i;
}
public int GetCenter()
{
return center;
}
public int GetLength()
{
return 2 * radius;
}
public int GetRight()
{
return center + radius;
}
public int GetLeft()
{
return center - radius;
}
public void Expand()
{
++radius;
}
public bool LargerThan(Palindrome other)
{
if (other == null)
return false;
return (radius > other.radius);
}
}
private static string GetFormatted(string original)
{
if (original == null)
return null;
else if (original.Length == 0)
return "";
StringBuilder builder = new StringBuilder("#");
foreach (char c in original)
{
builder.Append(c);
builder.Append('#');
}
return builder.ToString();
}
private static string GetUnFormatted(string formatted)
{
if (formatted == null)
return null;
else if (formatted.Length == 0)
return "";
StringBuilder builder = new StringBuilder();
foreach (char c in formatted)
{
if (c != '#')
builder.Append(c);
}
return builder.ToString();
}
public static string FindLargestPalindrome(string str)
{
string formatted = GetFormatted(str);
if (formatted == null || formatted.Length == 0)
return formatted;
int[] radius = new int[formatted.Length];
try
{
Palindrome current = new Palindrome(0, 0);
for (int i = 0; i < formatted.Length; ++i)
{
radius[i] = (current.GetRight() > i) ?
Math.Min(current.GetRight() - i, radius[current.GetMirror(i)]) : 0;
current = new Palindrome(i, radius[i]);
while (current.GetLeft() - 1 >= 0 && current.GetRight() + 1 < formatted.Length &&
formatted[current.GetLeft() - 1] == formatted[current.GetRight() + 1])
{
current.Expand();
++radius[i];
}
}
Palindrome largest = new Palindrome(0, 0);
for (int i = 0; i < radius.Length; ++i)
{
current = new Palindrome(i, radius[i]);
if (current.LargerThan(largest))
largest = current;
}
return GetUnFormatted(formatted.Substring(largest.GetLeft(), largest.GetLength()));
}
catch (Exception ex)
{
Console.WriteLine(ex);
}
return null;
}
Fast Javascript Solution to finding the longest palindrome in a string:
const lpal = str => {
let lpal = ""; // to store longest palindrome encountered
let pal = ""; // to store new palindromes found
let left; // to iterate through left side indices of the character considered to be center of palindrome
let right; // to iterate through left side indices of the character considered to be center of palindrome
let j; // to iterate through all characters and considering each to be center of palindrome
for (let i=0; i<str.length; i++) { // run through all characters considering them center of palindrome
pal = str[i]; // initializing current palindrome
j = i; // setting j as index at the center of palindorme
left = j-1; // taking left index of j
right = j+1; // taking right index of j
while (left >= 0 && right < str.length) { // while left and right indices exist
if(str[left] === str[right]) { //
pal = str[left] + pal + str[right];
} else {
break;
}
left--;
right++;
}
if(pal.length > lpal.length) {
lpal = pal;
}
pal = str[i];
j = i;
left = j-1;
right = j+1;
if(str[j] === str[right]) {
pal = pal + str[right];
right++;
while (left >= 0 && right < str.length) {
if(str[left] === str[right]) {
pal = str[left] + pal + str[right];
} else {
break;
}
left--;
right++;
}
if(pal.length > lpal.length) {
lpal = pal;
}
}
}
return lpal;
}
Example Input
console.log(lpal("gerngehgbrgregbeuhgurhuygbhsbjsrhfesasdfffdsajkjsrngkjbsrjgrsbjvhbvhbvhsbrfhrsbfsuhbvsuhbvhvbksbrkvkjb"));
Output
asdfffdsa
I went through the same frustration/struggle and I found the solution on this page, https://www.hackerearth.com/practice/algorithms/string-algorithm/manachars-algorithm/tutorial/, to be easiest to understand.
I tried to implement this solution in my own style, and I think I can understand the algorithm now. I also tried to stuff as many explanations in the code as possible to explain the algo. Hope this help!
#Manacher's Algorithm
def longestPalindrome(s):
s = s.lower()
#Insert special characters, #, between characters
#Insert another special in the front, $, and at the end, #, of string to avoid bound checking.
s1 = '$#'
for c in s:
s1 += c + '#'
s1 = s1+'#'
#print(s, " -modified into- ", s1)
#Palin[i] = length of longest palindrome start at center i
Palin = [0]*len(s1)
#THE HARD PART: THE MEAT of the ALGO
#c and r help keeping track of the expanded regions.
c = r = 0
for i in range(1,len(s1)-1): #NOTE: this algo always expands around center i
#if we already expanded past i, we can retrieve partial info
#about this location i, by looking at the mirror from left side of center.
if r > i: #---nice, we look into memory of the past---
#look up mirror from left of center c
mirror = c - (i-c)
#mirror's largest palindrome = Palin[mirror]
#case1: if mirror's largest palindrome expands past r, choose r-i
#case2: if mirror's largest palindrome is contains within r, choose Palin[mirror]
Palin[i] = min(r-i, Palin[mirror])
#keep expanding around center i
#NOTE: instead of starting to expand from i-1 and i+1, which is repeated work
#we start expanding from Palin[i],
##which is, if applicable, updated in previous step
while s1[i+1+Palin[i]] == s1[i-1-Palin[i]]:
Palin[i] += 1
#if expanded past r, update r and c
if i+Palin[i] > r:
c = i
r = i + Palin[i]
#the easy part: find the max length, remove special characters, and return
max_center = max_length = 0
for i in range(len(s1)):
if Palin[i] > max_length:
max_length = Palin[i]
max_center = i
output = s1[max_center-max_length : max_center+max_length]
output = ''.join(output.split('#'))
return output # for the (the result substring)
using namespace std;
class Palindrome{
public:
Palindrome(string st){
s = st;
longest = 0;
maxDist = 0;
//ascii: 126(~) - 32 (space) = 94
// for 'a' to 'z': vector<vector<int>> v(26,vector<int>(0));
vector<vector<int>> v(94,vector<int>(0)); //all ascii
mDist.clear();
vPos = v;
bDebug = true;
};
string s;
string sPrev; //previous char
int longest; //longest palindrome size
string sLongest; //longest palindrome found so far
int maxDist; //max distance been checked
bool bDebug;
void findLongestPal();
int checkIfAnchor(int iChar, int &i);
void checkDist(int iChar, int i);
//store char positions in s pos[0] : 'a'... pos[25] : 'z'
// 0123456
// i.e. "axzebca" vPos[0][0]=0 (1st. position of 'a'), vPos[0][1]=6 (2nd pos. of 'a'),
// vPos[25][0]=2 (1st. pos. of 'z').
vector<vector<int>> vPos;
//<anchor,distance to check>
// i.e. abccba anchor = 3: position of 2nd 'c', dist = 3
// looking if next char has a dist. of 3 from previous one
// i.e. abcxcba anchor = 4: position of 2nd 'c', dist = 4
map<int,int> mDist;
};
//check if current char can be an anchor, if so return next distance to check (3 or 4)
// i.e. "abcdc" 2nd 'c' is anchor for sub-palindrome "cdc" distance = 4 if next char is 'b'
// "abcdd: 2nd 'd' is anchor for sub-palindrome "dd" distance = 3 if next char is 'c'
int Palindrome::checkIfAnchor(int iChar, int &i){
if (bDebug)
cout<<"checkIfAnchor. i:"<<i<<" iChar:"<<iChar<<endl;
int n = s.size();
int iDist = 3;
int iSize = vPos[iChar].size();
//if empty or distance to closest same char > 2
if ( iSize == 0 || vPos[iChar][iSize - 1] < (i - 2)){
if (bDebug)
cout<<" .This cannot be an anchor! i:"<<i<<" : iChar:"<<iChar<<endl;
//store char position
vPos[iChar].push_back(i);
return -1;
}
//store char position of anchor for case "cdc"
vPos[iChar].push_back(i);
if (vPos[iChar][iSize - 1] == (i - 2))
iDist = 4;
//for case "dd" check if there are more repeated chars
else {
int iRepeated = 0;
while ((i+1) < n && s[i+1] == s[i]){
i++;
iRepeated++;
iDist++;
//store char position
vPos[iChar].push_back(i);
}
}
if (bDebug)
cout<<" .iDist:"<<iDist<<" i:"<<i<<endl;
return iDist;
};
//check distance from previous same char, and update sLongest
void Palindrome::checkDist(int iChar, int i){
if (bDebug)
cout<<"CheckDist. i:"<<i<<" iChar:"<<iChar<<endl;
int iDist;
int iSize = vPos[iChar].size();
bool b1stOrLastCharInString;
bool bDiffDist;
//checkAnchor will add this char position
if ( iSize == 0){
if (bDebug)
cout<<" .1st time we see this char. Assign it INT_MAX Dist"<<endl;
iDist = INT_MAX;
}
else {
iDist = i - vPos[iChar][iSize - 1];
}
//check for distances being check, update them if found or calculate lengths if not.
if (mDist.size() == 0) {
if (bDebug)
cout<<" .no distances to check are registered, yet"<<endl;
return;
}
int i2ndMaxDist = 0;
for(auto it = mDist.begin(); it != mDist.end();){
if (bDebug)
cout<<" .mDist. anchor:"<<it->first<<" . dist:"<<it->second<<endl;
b1stOrLastCharInString = false;
bDiffDist = it->second == iDist; //check here, because it can be updated in 1st. if
if (bDiffDist){
if (bDebug)
cout<<" .Distance checked! :"<<iDist<<endl;
//check if it's the first char in the string
if (vPos[iChar][iSize - 1] == 0 || i == (s.size() - 1))
b1stOrLastCharInString = true;
//we will continue checking for more...
else {
it->second += 2; //update next distance to check
if (it->second > maxDist) {
if (bDebug)
cout<<" .previous MaxDist:"<<maxDist<<endl;
maxDist = it->second;
if (bDebug)
cout<<" .new MaxDist:"<<maxDist<<endl;
}
else if (it->second > i2ndMaxDist) {//check this...hmtest
i2ndMaxDist = it->second;
if (bDebug)
cout<<" .second MaxDist:"<<i2ndMaxDist<<endl;
}
it++;
}
}
if (!bDiffDist || b1stOrLastCharInString) {
if (bDebug && it->second != iDist)
cout<<" .Dist diff. Anchor:"<<it->first<<" dist:"<<it->second<<" iDist:"<<iDist<<endl;
else if (bDebug)
cout<<" .Palindrome found at the beggining or end of the string"<<endl;
//if we find a closest same char.
if (!b1stOrLastCharInString && it->second > iDist){
if (iSize > 1) {
if (bDebug)
cout<<" . < Dist . looking further..."<<endl;
iSize--;
iDist = i - vPos[iChar][iSize - 1];
continue;
}
}
if (iDist == maxDist) {
maxDist = 0;
if (bDebug)
cout<<" .Diff. clearing Max Dist"<<endl;
}
else if (iDist == i2ndMaxDist) {
i2ndMaxDist = 0;
if (bDebug)
cout<<" .clearing 2nd Max Dist"<<endl;
}
int iStart;
int iCurrLength;
//first char in string
if ( b1stOrLastCharInString && vPos[iChar].size() > 0 && vPos[iChar][iSize - 1] == 0){
iStart = 0;
iCurrLength = i+1;
}
//last char in string
else if (b1stOrLastCharInString && i == (s.size() - 1)){
iStart = i - it->second;
iCurrLength = it->second + 1;
}
else {
iStart = i - it->second + 1;
iCurrLength = it->second - 1; //"xabay" anchor:2nd. 'a'. Dist from 'y' to 'x':4. length 'aba':3
}
if (iCurrLength > longest){
if (bDebug)
cout<<" .previous Longest!:"<<sLongest<<" length:"<<longest<<endl;
longest = iCurrLength;
sLongest = s.substr(iStart, iCurrLength);
if (bDebug)
cout<<" .new Longest!:"<<sLongest<<" length:"<<longest<<endl;
}
if (bDebug)
cout<<" .deleting iterator for anchor:"<<it->first<<" dist:"<<it->second<<endl;
mDist.erase(it++);
}
}
//check if we need to get new max distance
if (maxDist == 0 && mDist.size() > 0){
if (bDebug)
cout<<" .new maxDist needed";
if (i2ndMaxDist > 0) {
maxDist = i2ndMaxDist;
if (bDebug)
cout<<" .assigned 2nd. max Dist to max Dist"<<endl;
}
else {
for(auto it = mDist.begin(); it != mDist.end(); it++){
if (it->second > maxDist)
maxDist = it->second;
}
if (bDebug)
cout<<" .new max dist assigned:"<<maxDist<<endl;
}
}
};
void Palindrome::findLongestPal(){
int n = s.length();
if (bDebug){
cout<<"01234567891123456789212345"<<endl<<"abcdefghijklmnopqrstuvwxyz"<<endl<<endl;
for (int i = 0; i < n;i++){
if (i%10 == 0)
cout<<i/10;
else
cout<<i;
}
cout<<endl<<s<<endl;
}
if (n == 0)
return;
//process 1st char
int j = 0;
//for 'a' to 'z' : while (j < n && (s[j] < 'a' && s[j] > 'z'))
while (j < n && (s[j] < ' ' && s[j] > '~'))
j++;
if (j > 0){
s.substr(j);
n = s.length();
}
// for 'a' to 'z' change size of vector from 94 to 26 : int iChar = s[0] - 'a';
int iChar = s[0] - ' ';
//store char position
vPos[iChar].push_back(0);
for (int i = 1; i < n; i++){
if (bDebug)
cout<<"findLongestPal. i:"<<i<<" "<<s.substr(0,i+1)<<endl;
//if max. possible palindrome would be smaller or equal
// than largest palindrome found then exit
// (n - i) = string length to check
// maxDist: max distance to check from i
int iPossibleLongestSize = maxDist + (2 * (n - i));
if ( iPossibleLongestSize <= longest){
if (bDebug)
cout<<" .PosSize:"<<iPossibleLongestSize<<" longest found:"<<iPossibleLongestSize<<endl;
return;
}
//for 'a' to 'z' : int iChar = s[i] - 'a';
int iChar = s[i] - ' ';
//for 'a' to 'z': if (iChar < 0 || iChar > 25){
if (iChar < 0 || iChar > 94){
if (bDebug)
cout<<" . i:"<<i<<" iChar:"<<s[i]<<" skipped!"<<endl;
continue;
}
//check distance to previous char, if exist
checkDist(iChar, i);
//check if this can be an anchor
int iDist = checkIfAnchor(iChar,i);
if (iDist == -1)
continue;
//append distance to check for next char.
if (bDebug)
cout<<" . Adding anchor for i:"<<i<<" dist:"<<iDist<<endl;
mDist.insert(make_pair(i,iDist));
//check if this is the only palindrome, at the end
//i.e. "......baa" or "....baca"
if (i == (s.length() - 1) && s.length() > (iDist - 2)){
//if this is the longest palindrome!
if (longest < (iDist - 1)){
sLongest = s.substr((i - iDist + 2),(iDist - 1));
}
}
}
};
int main(){
string s;
cin >> s;
Palindrome p(s);
p.findLongestPal();
cout<<p.sLongest;
return 0;
}

How to find list of possible words from a letter matrix [Boggle Solver]

Lately I have been playing a game on my iPhone called Scramble. Some of you may know this game as Boggle. Essentially, when the game starts you get a matrix of letters like so:
F X I E
A M L O
E W B X
A S T U
The goal of the game is to find as many words as you can that can be formed by chaining letters together. You can start with any letter, and all the letters that surround it are fair game, and then once you move on to the next letter, all the letters that surround that letter are fair game, except for any previously used letters. So in the grid above, for example, I could come up with the words LOB, TUX, SEA, FAME, etc. Words must be at least 3 characters, and no more than NxN characters, which would be 16 in this game but can vary in some implementations. While this game is fun and addictive, I am apparently not very good at it and I wanted to cheat a little bit by making a program that would give me the best possible words (the longer the word the more points you get).
(source: boggled.org)
I am, unfortunately, not very good with algorithms or their efficiencies and so forth. My first attempt uses a dictionary such as this one (~2.3MB) and does a linear search trying to match combinations with dictionary entries. This takes a very long time to find the possible words, and since you only get 2 minutes per round, it is simply not adequate.
I am interested to see if any Stackoverflowers can come up with more efficient solutions. I am mostly looking for solutions using the Big 3 Ps: Python, PHP, and Perl, although anything with Java or C++ is cool too, since speed is essential.
CURRENT SOLUTIONS:
Adam Rosenfield, Python, ~20s
John Fouhy, Python, ~3s
Kent Fredric, Perl, ~1s
Darius Bacon, Python, ~1s
rvarcher, VB.NET, ~1s
Paolo Bergantino, PHP (live link), ~5s (~2s locally)
My answer works like the others here, but I'll post it because it looks a bit faster than the other Python solutions, from setting up the dictionary faster. (I checked this against John Fouhy's solution.) After setup, the time to solve is down in the noise.
grid = "fxie amlo ewbx astu".split()
nrows, ncols = len(grid), len(grid[0])
# A dictionary word that could be a solution must use only the grid's
# letters and have length >= 3. (With a case-insensitive match.)
import re
alphabet = ''.join(set(''.join(grid)))
bogglable = re.compile('[' + alphabet + ']{3,}$', re.I).match
words = set(word.rstrip('\n') for word in open('words') if bogglable(word))
prefixes = set(word[:i] for word in words
for i in range(2, len(word)+1))
def solve():
for y, row in enumerate(grid):
for x, letter in enumerate(row):
for result in extending(letter, ((x, y),)):
yield result
def extending(prefix, path):
if prefix in words:
yield (prefix, path)
for (nx, ny) in neighbors(path[-1]):
if (nx, ny) not in path:
prefix1 = prefix + grid[ny][nx]
if prefix1 in prefixes:
for result in extending(prefix1, path + ((nx, ny),)):
yield result
def neighbors((x, y)):
for nx in range(max(0, x-1), min(x+2, ncols)):
for ny in range(max(0, y-1), min(y+2, nrows)):
yield (nx, ny)
Sample usage:
# Print a maximal-length word and its path:
print max(solve(), key=lambda (word, path): len(word))
Edit: Filter out words less than 3 letters long.
Edit 2: I was curious why Kent Fredric's Perl solution was faster; it turns out to use regular-expression matching instead of a set of characters. Doing the same in Python about doubles the speed.
The fastest solution you're going to get will probably involve storing your dictionary in a trie. Then, create a queue of triplets (x, y, s), where each element in the queue corresponds to a prefix s of a word which can be spelled in the grid, ending at location (x, y). Initialize the queue with N x N elements (where N is the size of your grid), one element for each square in the grid. Then, the algorithm proceeds as follows:
While the queue is not empty:
Dequeue a triple (x, y, s)
For each square (x', y') with letter c adjacent to (x, y):
If s+c is a word, output s+c
If s+c is a prefix of a word, insert (x', y', s+c) into the queue
If you store your dictionary in a trie, testing if s+c is a word or a prefix of a word can be done in constant time (provided you also keep some extra metadata in each queue datum, such as a pointer to the current node in the trie), so the running time of this algorithm is O(number of words that can be spelled).
[Edit] Here's an implementation in Python that I just coded up:
#!/usr/bin/python
class TrieNode:
def __init__(self, parent, value):
self.parent = parent
self.children = [None] * 26
self.isWord = False
if parent is not None:
parent.children[ord(value) - 97] = self
def MakeTrie(dictfile):
dict = open(dictfile)
root = TrieNode(None, '')
for word in dict:
curNode = root
for letter in word.lower():
if 97 <= ord(letter) < 123:
nextNode = curNode.children[ord(letter) - 97]
if nextNode is None:
nextNode = TrieNode(curNode, letter)
curNode = nextNode
curNode.isWord = True
return root
def BoggleWords(grid, dict):
rows = len(grid)
cols = len(grid[0])
queue = []
words = []
for y in range(cols):
for x in range(rows):
c = grid[y][x]
node = dict.children[ord(c) - 97]
if node is not None:
queue.append((x, y, c, node))
while queue:
x, y, s, node = queue[0]
del queue[0]
for dx, dy in ((1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)):
x2, y2 = x + dx, y + dy
if 0 <= x2 < cols and 0 <= y2 < rows:
s2 = s + grid[y2][x2]
node2 = node.children[ord(grid[y2][x2]) - 97]
if node2 is not None:
if node2.isWord:
words.append(s2)
queue.append((x2, y2, s2, node2))
return words
Example usage:
d = MakeTrie('/usr/share/dict/words')
print(BoggleWords(['fxie','amlo','ewbx','astu'], d))
Output:
['fa', 'xi', 'ie', 'io', 'el', 'am', 'ax', 'ae', 'aw', 'mi', 'ma', 'me', 'lo', 'li', 'oe', 'ox', 'em', 'ea', 'ea', 'es', 'wa', 'we', 'wa', 'bo', 'bu', 'as', 'aw', 'ae', 'st', 'se', 'sa', 'tu', 'ut', 'fam', 'fae', 'imi', 'eli', 'elm', 'elb', 'ami', 'ama', 'ame', 'aes', 'awl', 'awa', 'awe', 'awa', 'mix', 'mim', 'mil', 'mam', 'max', 'mae', 'maw', 'mew', 'mem', 'mes', 'lob', 'lox', 'lei', 'leo', 'lie', 'lim', 'oil', 'olm', 'ewe', 'eme', 'wax', 'waf', 'wae', 'waw', 'wem', 'wea', 'wea', 'was', 'waw', 'wae', 'bob', 'blo', 'bub', 'but', 'ast', 'ase', 'asa', 'awl', 'awa', 'awe', 'awa', 'aes', 'swa', 'swa', 'sew', 'sea', 'sea', 'saw', 'tux', 'tub', 'tut', 'twa', 'twa', 'tst', 'utu', 'fama', 'fame', 'ixil', 'imam', 'amli', 'amil', 'ambo', 'axil', 'axle', 'mimi', 'mima', 'mime', 'milo', 'mile', 'mewl', 'mese', 'mesa', 'lolo', 'lobo', 'lima', 'lime', 'limb', 'lile', 'oime', 'oleo', 'olio', 'oboe', 'obol', 'emim', 'emil', 'east', 'ease', 'wame', 'wawa', 'wawa', 'weam', 'west', 'wese', 'wast', 'wase', 'wawa', 'wawa', 'boil', 'bolo', 'bole', 'bobo', 'blob', 'bleo', 'bubo', 'asem', 'stub', 'stut', 'swam', 'semi', 'seme', 'seam', 'seax', 'sasa', 'sawt', 'tutu', 'tuts', 'twae', 'twas', 'twae', 'ilima', 'amble', 'axile', 'awest', 'mamie', 'mambo', 'maxim', 'mease', 'mesem', 'limax', 'limes', 'limbo', 'limbu', 'obole', 'emesa', 'embox', 'awest', 'swami', 'famble', 'mimble', 'maxima', 'embolo', 'embole', 'wamble', 'semese', 'semble', 'sawbwa', 'sawbwa']
Notes: This program doesn't output 1-letter words, or filter by word length at all. That's easy to add but not really relevant to the problem. It also outputs some words multiple times if they can be spelled in multiple ways. If a given word can be spelled in many different ways (worst case: every letter in the grid is the same (e.g. 'A') and a word like 'aaaaaaaaaa' is in your dictionary), then the running time will get horribly exponential. Filtering out duplicates and sorting is trivial to due after the algorithm has finished.
For a dictionary speedup, there is one general transformation/process you can do to greatly reduce the dictionary comparisons ahead of time.
Given that the above grid contains only 16 characters, some of them duplicate, you can greatly reduce the number of total keys in your dictionary by simply filtering out entries that have unattainable characters.
I thought this was the obvious optimization but seeing nobody did it I'm mentioning it.
It reduced me from a dictionary of 200,000 keys to only 2,000 keys simply during the input pass. This at the very least reduces memory overhead, and that's sure to map to a speed increase somewhere as memory isn't infinitely fast.
Perl Implementation
My implementation is a bit top-heavy because I placed importance on being able to know the exact path of every extracted string, not just the validity therein.
I also have a few adaptions in there that would theoretically permit a grid with holes in it to function, and grids with different sized lines ( assuming you get the input right and it lines up somehow ).
The early-filter is by far the most significant bottleneck in my application, as suspected earlier, commenting out that line bloats it from 1.5s to 7.5s.
Upon execution it appears to think all the single digits are on their own valid words, but I'm pretty sure thats due to how the dictionary file works.
Its a bit bloated, but at least I reuse Tree::Trie from cpan
Some of it was inspired partially by the existing implementations, some of it I had in mind already.
Constructive Criticism and ways it could be improved welcome ( /me notes he never searched CPAN for a boggle solver, but this was more fun to work out )
updated for new criteria
#!/usr/bin/perl
use strict;
use warnings;
{
# this package manages a given path through the grid.
# Its an array of matrix-nodes in-order with
# Convenience functions for pretty-printing the paths
# and for extending paths as new paths.
# Usage:
# my $p = Prefix->new(path=>[ $startnode ]);
# my $c = $p->child( $extensionNode );
# print $c->current_word ;
package Prefix;
use Moose;
has path => (
isa => 'ArrayRef[MatrixNode]',
is => 'rw',
default => sub { [] },
);
has current_word => (
isa => 'Str',
is => 'rw',
lazy_build => 1,
);
# Create a clone of this object
# with a longer path
# $o->child( $successive-node-on-graph );
sub child {
my $self = shift;
my $newNode = shift;
my $f = Prefix->new();
# Have to do this manually or other recorded paths get modified
push #{ $f->{path} }, #{ $self->{path} }, $newNode;
return $f;
}
# Traverses $o->path left-to-right to get the string it represents.
sub _build_current_word {
my $self = shift;
return join q{}, map { $_->{value} } #{ $self->{path} };
}
# Returns the rightmost node on this path
sub tail {
my $self = shift;
return $self->{path}->[-1];
}
# pretty-format $o->path
sub pp_path {
my $self = shift;
my #path =
map { '[' . $_->{x_position} . ',' . $_->{y_position} . ']' }
#{ $self->{path} };
return "[" . join( ",", #path ) . "]";
}
# pretty-format $o
sub pp {
my $self = shift;
return $self->current_word . ' => ' . $self->pp_path;
}
__PACKAGE__->meta->make_immutable;
}
{
# Basic package for tracking node data
# without having to look on the grid.
# I could have just used an array or a hash, but that got ugly.
# Once the matrix is up and running it doesn't really care so much about rows/columns,
# Its just a sea of points and each point has adjacent points.
# Relative positioning is only really useful to map it back to userspace
package MatrixNode;
use Moose;
has x_position => ( isa => 'Int', is => 'rw', required => 1 );
has y_position => ( isa => 'Int', is => 'rw', required => 1 );
has value => ( isa => 'Str', is => 'rw', required => 1 );
has siblings => (
isa => 'ArrayRef[MatrixNode]',
is => 'rw',
default => sub { [] }
);
# Its not implicitly uni-directional joins. It would be more effient in therory
# to make the link go both ways at the same time, but thats too hard to program around.
# and besides, this isn't slow enough to bother caring about.
sub add_sibling {
my $self = shift;
my $sibling = shift;
push #{ $self->siblings }, $sibling;
}
# Convenience method to derive a path starting at this node
sub to_path {
my $self = shift;
return Prefix->new( path => [$self] );
}
__PACKAGE__->meta->make_immutable;
}
{
package Matrix;
use Moose;
has rows => (
isa => 'ArrayRef',
is => 'rw',
default => sub { [] },
);
has regex => (
isa => 'Regexp',
is => 'rw',
lazy_build => 1,
);
has cells => (
isa => 'ArrayRef',
is => 'rw',
lazy_build => 1,
);
sub add_row {
my $self = shift;
push #{ $self->rows }, [#_];
}
# Most of these functions from here down are just builder functions,
# or utilities to help build things.
# Some just broken out to make it easier for me to process.
# All thats really useful is add_row
# The rest will generally be computed, stored, and ready to go
# from ->cells by the time either ->cells or ->regex are called.
# traverse all cells and make a regex that covers them.
sub _build_regex {
my $self = shift;
my $chars = q{};
for my $cell ( #{ $self->cells } ) {
$chars .= $cell->value();
}
$chars = "[^$chars]";
return qr/$chars/i;
}
# convert a plain cell ( ie: [x][y] = 0 )
# to an intelligent cell ie: [x][y] = object( x, y )
# we only really keep them in this format temporarily
# so we can go through and tie in neighbouring information.
# after the neigbouring is done, the grid should be considered inoperative.
sub _convert {
my $self = shift;
my $x = shift;
my $y = shift;
my $v = $self->_read( $x, $y );
my $n = MatrixNode->new(
x_position => $x,
y_position => $y,
value => $v,
);
$self->_write( $x, $y, $n );
return $n;
}
# go through the rows/collums presently available and freeze them into objects.
sub _build_cells {
my $self = shift;
my #out = ();
my #rows = #{ $self->{rows} };
for my $x ( 0 .. $#rows ) {
next unless defined $self->{rows}->[$x];
my #col = #{ $self->{rows}->[$x] };
for my $y ( 0 .. $#col ) {
next unless defined $self->{rows}->[$x]->[$y];
push #out, $self->_convert( $x, $y );
}
}
for my $c (#out) {
for my $n ( $self->_neighbours( $c->x_position, $c->y_position ) ) {
$c->add_sibling( $self->{rows}->[ $n->[0] ]->[ $n->[1] ] );
}
}
return \#out;
}
# given x,y , return array of points that refer to valid neighbours.
sub _neighbours {
my $self = shift;
my $x = shift;
my $y = shift;
my #out = ();
for my $sx ( -1, 0, 1 ) {
next if $sx + $x < 0;
next if not defined $self->{rows}->[ $sx + $x ];
for my $sy ( -1, 0, 1 ) {
next if $sx == 0 && $sy == 0;
next if $sy + $y < 0;
next if not defined $self->{rows}->[ $sx + $x ]->[ $sy + $y ];
push #out, [ $sx + $x, $sy + $y ];
}
}
return #out;
}
sub _has_row {
my $self = shift;
my $x = shift;
return defined $self->{rows}->[$x];
}
sub _has_cell {
my $self = shift;
my $x = shift;
my $y = shift;
return defined $self->{rows}->[$x]->[$y];
}
sub _read {
my $self = shift;
my $x = shift;
my $y = shift;
return $self->{rows}->[$x]->[$y];
}
sub _write {
my $self = shift;
my $x = shift;
my $y = shift;
my $v = shift;
$self->{rows}->[$x]->[$y] = $v;
return $v;
}
__PACKAGE__->meta->make_immutable;
}
use Tree::Trie;
sub readDict {
my $fn = shift;
my $re = shift;
my $d = Tree::Trie->new();
# Dictionary Loading
open my $fh, '<', $fn;
while ( my $line = <$fh> ) {
chomp($line);
# Commenting the next line makes it go from 1.5 seconds to 7.5 seconds. EPIC.
next if $line =~ $re; # Early Filter
$d->add( uc($line) );
}
return $d;
}
sub traverseGraph {
my $d = shift;
my $m = shift;
my $min = shift;
my $max = shift;
my #words = ();
# Inject all grid nodes into the processing queue.
my #queue =
grep { $d->lookup( $_->current_word ) }
map { $_->to_path } #{ $m->cells };
while (#queue) {
my $item = shift #queue;
# put the dictionary into "exact match" mode.
$d->deepsearch('exact');
my $cword = $item->current_word;
my $l = length($cword);
if ( $l >= $min && $d->lookup($cword) ) {
push #words,
$item; # push current path into "words" if it exactly matches.
}
next if $l > $max;
# put the dictionary into "is-a-prefix" mode.
$d->deepsearch('boolean');
siblingloop: foreach my $sibling ( #{ $item->tail->siblings } ) {
foreach my $visited ( #{ $item->{path} } ) {
next siblingloop if $sibling == $visited;
}
# given path y , iterate for all its end points
my $subpath = $item->child($sibling);
# create a new path for each end-point
if ( $d->lookup( $subpath->current_word ) ) {
# if the new path is a prefix, add it to the bottom of the queue.
push #queue, $subpath;
}
}
}
return \#words;
}
sub setup_predetermined {
my $m = shift;
my $gameNo = shift;
if( $gameNo == 0 ){
$m->add_row(qw( F X I E ));
$m->add_row(qw( A M L O ));
$m->add_row(qw( E W B X ));
$m->add_row(qw( A S T U ));
return $m;
}
if( $gameNo == 1 ){
$m->add_row(qw( D G H I ));
$m->add_row(qw( K L P S ));
$m->add_row(qw( Y E U T ));
$m->add_row(qw( E O R N ));
return $m;
}
}
sub setup_random {
my $m = shift;
my $seed = shift;
srand $seed;
my #letters = 'A' .. 'Z' ;
for( 1 .. 4 ){
my #r = ();
for( 1 .. 4 ){
push #r , $letters[int(rand(25))];
}
$m->add_row( #r );
}
}
# Here is where the real work starts.
my $m = Matrix->new();
setup_predetermined( $m, 0 );
#setup_random( $m, 5 );
my $d = readDict( 'dict.txt', $m->regex );
my $c = scalar #{ $m->cells }; # get the max, as per spec
print join ",\n", map { $_->pp } #{
traverseGraph( $d, $m, 3, $c ) ;
};
Arch/execution info for comparison:
model name : Intel(R) Core(TM)2 Duo CPU T9300 # 2.50GHz
cache size : 6144 KB
Memory usage summary: heap total: 77057577, heap peak: 11446200, stack peak: 26448
total calls total memory failed calls
malloc| 947212 68763684 0
realloc| 11191 1045641 0 (nomove:9063, dec:4731, free:0)
calloc| 121001 7248252 0
free| 973159 65854762
Histogram for block sizes:
0-15 392633 36% ==================================================
16-31 43530 4% =====
32-47 50048 4% ======
48-63 70701 6% =========
64-79 18831 1% ==
80-95 19271 1% ==
96-111 238398 22% ==============================
112-127 3007 <1%
128-143 236727 21% ==============================
More Mumblings on that Regex Optimization
The regex optimization I use is useless for multi-solve dictionaries, and for multi-solve you'll want a full dictionary, not a pre-trimmed one.
However, that said, for one-off solves, its really fast. ( Perl regex are in C! :) )
Here is some varying code additions:
sub readDict_nofilter {
my $fn = shift;
my $re = shift;
my $d = Tree::Trie->new();
# Dictionary Loading
open my $fh, '<', $fn;
while ( my $line = <$fh> ) {
chomp($line);
$d->add( uc($line) );
}
return $d;
}
sub benchmark_io {
use Benchmark qw( cmpthese :hireswallclock );
# generate a random 16 character string
# to simulate there being an input grid.
my $regexen = sub {
my #letters = 'A' .. 'Z' ;
my #lo = ();
for( 1..16 ){
push #lo , $_ ;
}
my $c = join '', #lo;
$c = "[^$c]";
return qr/$c/i;
};
cmpthese( 200 , {
filtered => sub {
readDict('dict.txt', $regexen->() );
},
unfiltered => sub {
readDict_nofilter('dict.txt');
}
});
}
s/iter unfiltered filtered
unfiltered 8.16 -- -94%
filtered 0.464 1658% --
ps: 8.16 * 200 = 27 minutes.
You could split the problem up into two pieces:
Some kind of search algorithm that will enumerate possible strings in the grid.
A way of testing whether a string is a valid word.
Ideally, (2) should also include a way of testing whether a string is a prefix of a valid word – this will allow you to prune your search and save a whole heap of time.
Adam Rosenfield's Trie is a solution to (2). It's elegant and probably what your algorithms specialist would prefer, but with modern languages and modern computers, we can be a bit lazier. Also, as Kent suggests, we can reduce our dictionary size by discarding words that have letters not present in the grid. Here's some python:
def make_lookups(grid, fn='dict.txt'):
# Make set of valid characters.
chars = set()
for word in grid:
chars.update(word)
words = set(x.strip() for x in open(fn) if set(x.strip()) <= chars)
prefixes = set()
for w in words:
for i in range(len(w)+1):
prefixes.add(w[:i])
return words, prefixes
Wow; constant-time prefix testing. It takes a couple of seconds to load the dictionary you linked, but only a couple :-) (notice that words <= prefixes)
Now, for part (1), I'm inclined to think in terms of graphs. So I'll build a dictionary that looks something like this:
graph = { (x, y):set([(x0,y0), (x1,y1), (x2,y2)]), }
i.e. graph[(x, y)] is the set of coordinates that you can reach from position (x, y). I'll also add a dummy node None which will connect to everything.
Building it's a bit clumsy, because there's 8 possible positions and you have to do bounds checking. Here's some correspondingly-clumsy python code:
def make_graph(grid):
root = None
graph = { root:set() }
chardict = { root:'' }
for i, row in enumerate(grid):
for j, char in enumerate(row):
chardict[(i, j)] = char
node = (i, j)
children = set()
graph[node] = children
graph[root].add(node)
add_children(node, children, grid)
return graph, chardict
def add_children(node, children, grid):
x0, y0 = node
for i in [-1,0,1]:
x = x0 + i
if not (0 <= x < len(grid)):
continue
for j in [-1,0,1]:
y = y0 + j
if not (0 <= y < len(grid[0])) or (i == j == 0):
continue
children.add((x,y))
This code also builds up a dictionary mapping (x,y) to the corresponding character. This lets me turn a list of positions into a word:
def to_word(chardict, pos_list):
return ''.join(chardict[x] for x in pos_list)
Finally, we do a depth-first search. The basic procedure is:
The search arrives at a particular node.
Check if the path so far could be part of a word. If not, don't explore this branch any further.
Check if the path so far is a word. If so, add to the list of results.
Explore all children not part of the path so far.
Python:
def find_words(graph, chardict, position, prefix, results, words, prefixes):
""" Arguments:
graph :: mapping (x,y) to set of reachable positions
chardict :: mapping (x,y) to character
position :: current position (x,y) -- equals prefix[-1]
prefix :: list of positions in current string
results :: set of words found
words :: set of valid words in the dictionary
prefixes :: set of valid words or prefixes thereof
"""
word = to_word(chardict, prefix)
if word not in prefixes:
return
if word in words:
results.add(word)
for child in graph[position]:
if child not in prefix:
find_words(graph, chardict, child, prefix+[child], results, words, prefixes)
Run the code as:
grid = ['fxie', 'amlo', 'ewbx', 'astu']
g, c = make_graph(grid)
w, p = make_lookups(grid)
res = set()
find_words(g, c, None, [], res, w, p)
and inspect res to see the answers. Here's a list of words found for your example, sorted by size:
['a', 'b', 'e', 'f', 'i', 'l', 'm', 'o', 's', 't',
'u', 'w', 'x', 'ae', 'am', 'as', 'aw', 'ax', 'bo',
'bu', 'ea', 'el', 'em', 'es', 'fa', 'ie', 'io', 'li',
'lo', 'ma', 'me', 'mi', 'oe', 'ox', 'sa', 'se', 'st',
'tu', 'ut', 'wa', 'we', 'xi', 'aes', 'ame', 'ami',
'ase', 'ast', 'awa', 'awe', 'awl', 'blo', 'but', 'elb',
'elm', 'fae', 'fam', 'lei', 'lie', 'lim', 'lob', 'lox',
'mae', 'maw', 'mew', 'mil', 'mix', 'oil', 'olm', 'saw',
'sea', 'sew', 'swa', 'tub', 'tux', 'twa', 'wae', 'was',
'wax', 'wem', 'ambo', 'amil', 'amli', 'asem', 'axil',
'axle', 'bleo', 'boil', 'bole', 'east', 'fame', 'limb',
'lime', 'mesa', 'mewl', 'mile', 'milo', 'oime', 'sawt',
'seam', 'seax', 'semi', 'stub', 'swam', 'twae', 'twas',
'wame', 'wase', 'wast', 'weam', 'west', 'amble', 'awest',
'axile', 'embox', 'limbo', 'limes', 'swami', 'embole',
'famble', 'semble', 'wamble']
The code takes (literally) a couple of seconds to load the dictionary, but the rest is instant on my machine.
My attempt in Java. It takes about 2 s to read file and build trie, and around 50 ms to solve the puzzle. I used the dictionary linked in the question (it has a few words that I didn't know exist in English such as fae, ima)
0 [main] INFO gineer.bogglesolver.util.Util - Reading the dictionary
2234 [main] INFO gineer.bogglesolver.util.Util - Finish reading the dictionary
2234 [main] INFO gineer.bogglesolver.Solver - Found: FAM
2234 [main] INFO gineer.bogglesolver.Solver - Found: FAME
2234 [main] INFO gineer.bogglesolver.Solver - Found: FAMBLE
2234 [main] INFO gineer.bogglesolver.Solver - Found: FAE
2234 [main] INFO gineer.bogglesolver.Solver - Found: IMA
2234 [main] INFO gineer.bogglesolver.Solver - Found: ELI
2234 [main] INFO gineer.bogglesolver.Solver - Found: ELM
2234 [main] INFO gineer.bogglesolver.Solver - Found: ELB
2234 [main] INFO gineer.bogglesolver.Solver - Found: AXIL
2234 [main] INFO gineer.bogglesolver.Solver - Found: AXILE
2234 [main] INFO gineer.bogglesolver.Solver - Found: AXLE
2234 [main] INFO gineer.bogglesolver.Solver - Found: AMI
2234 [main] INFO gineer.bogglesolver.Solver - Found: AMIL
2234 [main] INFO gineer.bogglesolver.Solver - Found: AMLI
2234 [main] INFO gineer.bogglesolver.Solver - Found: AME
2234 [main] INFO gineer.bogglesolver.Solver - Found: AMBLE
2234 [main] INFO gineer.bogglesolver.Solver - Found: AMBO
2250 [main] INFO gineer.bogglesolver.Solver - Found: AES
2250 [main] INFO gineer.bogglesolver.Solver - Found: AWL
2250 [main] INFO gineer.bogglesolver.Solver - Found: AWE
2250 [main] INFO gineer.bogglesolver.Solver - Found: AWEST
2250 [main] INFO gineer.bogglesolver.Solver - Found: AWA
2250 [main] INFO gineer.bogglesolver.Solver - Found: MIX
2250 [main] INFO gineer.bogglesolver.Solver - Found: MIL
2250 [main] INFO gineer.bogglesolver.Solver - Found: MILE
2250 [main] INFO gineer.bogglesolver.Solver - Found: MILO
2250 [main] INFO gineer.bogglesolver.Solver - Found: MAX
2250 [main] INFO gineer.bogglesolver.Solver - Found: MAE
2250 [main] INFO gineer.bogglesolver.Solver - Found: MAW
2250 [main] INFO gineer.bogglesolver.Solver - Found: MEW
2250 [main] INFO gineer.bogglesolver.Solver - Found: MEWL
2250 [main] INFO gineer.bogglesolver.Solver - Found: MES
2250 [main] INFO gineer.bogglesolver.Solver - Found: MESA
2250 [main] INFO gineer.bogglesolver.Solver - Found: MWA
2250 [main] INFO gineer.bogglesolver.Solver - Found: MWA
2250 [main] INFO gineer.bogglesolver.Solver - Found: LIE
2250 [main] INFO gineer.bogglesolver.Solver - Found: LIM
2250 [main] INFO gineer.bogglesolver.Solver - Found: LIMA
2250 [main] INFO gineer.bogglesolver.Solver - Found: LIMAX
2250 [main] INFO gineer.bogglesolver.Solver - Found: LIME
2250 [main] INFO gineer.bogglesolver.Solver - Found: LIMES
2250 [main] INFO gineer.bogglesolver.Solver - Found: LIMB
2250 [main] INFO gineer.bogglesolver.Solver - Found: LIMBO
2250 [main] INFO gineer.bogglesolver.Solver - Found: LIMBU
2250 [main] INFO gineer.bogglesolver.Solver - Found: LEI
2250 [main] INFO gineer.bogglesolver.Solver - Found: LEO
2250 [main] INFO gineer.bogglesolver.Solver - Found: LOB
2250 [main] INFO gineer.bogglesolver.Solver - Found: LOX
2250 [main] INFO gineer.bogglesolver.Solver - Found: OIME
2250 [main] INFO gineer.bogglesolver.Solver - Found: OIL
2250 [main] INFO gineer.bogglesolver.Solver - Found: OLE
2250 [main] INFO gineer.bogglesolver.Solver - Found: OLM
2250 [main] INFO gineer.bogglesolver.Solver - Found: EMIL
2250 [main] INFO gineer.bogglesolver.Solver - Found: EMBOLE
2250 [main] INFO gineer.bogglesolver.Solver - Found: EMBOX
2250 [main] INFO gineer.bogglesolver.Solver - Found: EAST
2250 [main] INFO gineer.bogglesolver.Solver - Found: WAF
2250 [main] INFO gineer.bogglesolver.Solver - Found: WAX
2250 [main] INFO gineer.bogglesolver.Solver - Found: WAME
2250 [main] INFO gineer.bogglesolver.Solver - Found: WAMBLE
2250 [main] INFO gineer.bogglesolver.Solver - Found: WAE
2250 [main] INFO gineer.bogglesolver.Solver - Found: WEA
2250 [main] INFO gineer.bogglesolver.Solver - Found: WEAM
2250 [main] INFO gineer.bogglesolver.Solver - Found: WEM
2250 [main] INFO gineer.bogglesolver.Solver - Found: WEA
2250 [main] INFO gineer.bogglesolver.Solver - Found: WES
2250 [main] INFO gineer.bogglesolver.Solver - Found: WEST
2250 [main] INFO gineer.bogglesolver.Solver - Found: WAE
2250 [main] INFO gineer.bogglesolver.Solver - Found: WAS
2250 [main] INFO gineer.bogglesolver.Solver - Found: WASE
2250 [main] INFO gineer.bogglesolver.Solver - Found: WAST
2250 [main] INFO gineer.bogglesolver.Solver - Found: BLEO
2250 [main] INFO gineer.bogglesolver.Solver - Found: BLO
2250 [main] INFO gineer.bogglesolver.Solver - Found: BOIL
2250 [main] INFO gineer.bogglesolver.Solver - Found: BOLE
2250 [main] INFO gineer.bogglesolver.Solver - Found: BUT
2250 [main] INFO gineer.bogglesolver.Solver - Found: AES
2250 [main] INFO gineer.bogglesolver.Solver - Found: AWA
2250 [main] INFO gineer.bogglesolver.Solver - Found: AWL
2250 [main] INFO gineer.bogglesolver.Solver - Found: AWE
2250 [main] INFO gineer.bogglesolver.Solver - Found: AWEST
2250 [main] INFO gineer.bogglesolver.Solver - Found: ASE
2250 [main] INFO gineer.bogglesolver.Solver - Found: ASEM
2250 [main] INFO gineer.bogglesolver.Solver - Found: AST
2250 [main] INFO gineer.bogglesolver.Solver - Found: SEA
2250 [main] INFO gineer.bogglesolver.Solver - Found: SEAX
2250 [main] INFO gineer.bogglesolver.Solver - Found: SEAM
2250 [main] INFO gineer.bogglesolver.Solver - Found: SEMI
2250 [main] INFO gineer.bogglesolver.Solver - Found: SEMBLE
2250 [main] INFO gineer.bogglesolver.Solver - Found: SEW
2250 [main] INFO gineer.bogglesolver.Solver - Found: SEA
2250 [main] INFO gineer.bogglesolver.Solver - Found: SWA
2250 [main] INFO gineer.bogglesolver.Solver - Found: SWAM
2250 [main] INFO gineer.bogglesolver.Solver - Found: SWAMI
2250 [main] INFO gineer.bogglesolver.Solver - Found: SWA
2250 [main] INFO gineer.bogglesolver.Solver - Found: SAW
2250 [main] INFO gineer.bogglesolver.Solver - Found: SAWT
2250 [main] INFO gineer.bogglesolver.Solver - Found: STU
2250 [main] INFO gineer.bogglesolver.Solver - Found: STUB
2250 [main] INFO gineer.bogglesolver.Solver - Found: TWA
2250 [main] INFO gineer.bogglesolver.Solver - Found: TWAE
2250 [main] INFO gineer.bogglesolver.Solver - Found: TWA
2250 [main] INFO gineer.bogglesolver.Solver - Found: TWAE
2250 [main] INFO gineer.bogglesolver.Solver - Found: TWAS
2250 [main] INFO gineer.bogglesolver.Solver - Found: TUB
2250 [main] INFO gineer.bogglesolver.Solver - Found: TUX
Source code consists of 6 classes. I'll post them below (if this is not the right practice on StackOverflow, please tell me).
gineer.bogglesolver.Main
package gineer.bogglesolver;
import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Logger;
public class Main
{
private final static Logger logger = Logger.getLogger(Main.class);
public static void main(String[] args)
{
BasicConfigurator.configure();
Solver solver = new Solver(4,
"FXIE" +
"AMLO" +
"EWBX" +
"ASTU");
solver.solve();
}
}
gineer.bogglesolver.Solver
package gineer.bogglesolver;
import gineer.bogglesolver.trie.Trie;
import gineer.bogglesolver.util.Constants;
import gineer.bogglesolver.util.Util;
import org.apache.log4j.Logger;
public class Solver
{
private char[] puzzle;
private int maxSize;
private boolean[] used;
private StringBuilder stringSoFar;
private boolean[][] matrix;
private Trie trie;
private final static Logger logger = Logger.getLogger(Solver.class);
public Solver(int size, String puzzle)
{
trie = Util.getTrie(size);
matrix = Util.connectivityMatrix(size);
maxSize = size * size;
stringSoFar = new StringBuilder(maxSize);
used = new boolean[maxSize];
if (puzzle.length() == maxSize)
{
this.puzzle = puzzle.toCharArray();
}
else
{
logger.error("The puzzle size does not match the size specified: " + puzzle.length());
this.puzzle = puzzle.substring(0, maxSize).toCharArray();
}
}
public void solve()
{
for (int i = 0; i < maxSize; i++)
{
traverseAt(i);
}
}
private void traverseAt(int origin)
{
stringSoFar.append(puzzle[origin]);
used[origin] = true;
//Check if we have a valid word
if ((stringSoFar.length() >= Constants.MINIMUM_WORD_LENGTH) && (trie.containKey(stringSoFar.toString())))
{
logger.info("Found: " + stringSoFar.toString());
}
//Find where to go next
for (int destination = 0; destination < maxSize; destination++)
{
if (matrix[origin][destination] && !used[destination] && trie.containPrefix(stringSoFar.toString() + puzzle[destination]))
{
traverseAt(destination);
}
}
used[origin] = false;
stringSoFar.deleteCharAt(stringSoFar.length() - 1);
}
}
gineer.bogglesolver.trie.Node
package gineer.bogglesolver.trie;
import gineer.bogglesolver.util.Constants;
class Node
{
Node[] children;
boolean isKey;
public Node()
{
isKey = false;
children = new Node[Constants.NUMBER_LETTERS_IN_ALPHABET];
}
public Node(boolean key)
{
isKey = key;
children = new Node[Constants.NUMBER_LETTERS_IN_ALPHABET];
}
/**
Method to insert a string to Node and its children
#param key the string to insert (the string is assumed to be uppercase)
#return true if the node or one of its children is changed, false otherwise
*/
public boolean insert(String key)
{
//If the key is empty, this node is a key
if (key.length() == 0)
{
if (isKey)
return false;
else
{
isKey = true;
return true;
}
}
else
{//otherwise, insert in one of its child
int childNodePosition = key.charAt(0) - Constants.LETTER_A;
if (children[childNodePosition] == null)
{
children[childNodePosition] = new Node();
children[childNodePosition].insert(key.substring(1));
return true;
}
else
{
return children[childNodePosition].insert(key.substring(1));
}
}
}
/**
Returns whether key is a valid prefix for certain key in this trie.
For example: if key "hello" is in this trie, tests with all prefixes "hel", "hell", "hello" return true
#param prefix the prefix to check
#return true if the prefix is valid, false otherwise
*/
public boolean containPrefix(String prefix)
{
//If the prefix is empty, return true
if (prefix.length() == 0)
{
return true;
}
else
{//otherwise, check in one of its child
int childNodePosition = prefix.charAt(0) - Constants.LETTER_A;
return children[childNodePosition] != null && children[childNodePosition].containPrefix(prefix.substring(1));
}
}
/**
Returns whether key is a valid key in this trie.
For example: if key "hello" is in this trie, tests with all prefixes "hel", "hell" return false
#param key the key to check
#return true if the key is valid, false otherwise
*/
public boolean containKey(String key)
{
//If the prefix is empty, return true
if (key.length() == 0)
{
return isKey;
}
else
{//otherwise, check in one of its child
int childNodePosition = key.charAt(0) - Constants.LETTER_A;
return children[childNodePosition] != null && children[childNodePosition].containKey(key.substring(1));
}
}
public boolean isKey()
{
return isKey;
}
public void setKey(boolean key)
{
isKey = key;
}
}
gineer.bogglesolver.trie.Trie
package gineer.bogglesolver.trie;
public class Trie
{
Node root;
public Trie()
{
this.root = new Node();
}
/**
Method to insert a string to Node and its children
#param key the string to insert (the string is assumed to be uppercase)
#return true if the node or one of its children is changed, false otherwise
*/
public boolean insert(String key)
{
return root.insert(key.toUpperCase());
}
/**
Returns whether key is a valid prefix for certain key in this trie.
For example: if key "hello" is in this trie, tests with all prefixes "hel", "hell", "hello" return true
#param prefix the prefix to check
#return true if the prefix is valid, false otherwise
*/
public boolean containPrefix(String prefix)
{
return root.containPrefix(prefix.toUpperCase());
}
/**
Returns whether key is a valid key in this trie.
For example: if key "hello" is in this trie, tests with all prefixes "hel", "hell" return false
#param key the key to check
#return true if the key is valid, false otherwise
*/
public boolean containKey(String key)
{
return root.containKey(key.toUpperCase());
}
}
gineer.bogglesolver.util.Constants
package gineer.bogglesolver.util;
public class Constants
{
public static final int NUMBER_LETTERS_IN_ALPHABET = 26;
public static final char LETTER_A = 'A';
public static final int MINIMUM_WORD_LENGTH = 3;
public static final int DEFAULT_PUZZLE_SIZE = 4;
}
gineer.bogglesolver.util.Util
package gineer.bogglesolver.util;
import gineer.bogglesolver.trie.Trie;
import org.apache.log4j.Logger;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class Util
{
private final static Logger logger = Logger.getLogger(Util.class);
private static Trie trie;
private static int size = Constants.DEFAULT_PUZZLE_SIZE;
/**
Returns the trie built from the dictionary. The size is used to eliminate words that are too long.
#param size the size of puzzle. The maximum lenght of words in the returned trie is (size * size)
#return the trie that can be used for puzzle of that size
*/
public static Trie getTrie(int size)
{
if ((trie != null) && size == Util.size)
return trie;
trie = new Trie();
Util.size = size;
logger.info("Reading the dictionary");
final File file = new File("dictionary.txt");
try
{
Scanner scanner = new Scanner(file);
final int maxSize = size * size;
while (scanner.hasNext())
{
String line = scanner.nextLine().replaceAll("[^\\p{Alpha}]", "");
if (line.length() <= maxSize)
trie.insert(line);
}
}
catch (FileNotFoundException e)
{
logger.error("Cannot open file", e);
}
logger.info("Finish reading the dictionary");
return trie;
}
static boolean[] connectivityRow(int x, int y, int size)
{
boolean[] squares = new boolean[size * size];
for (int offsetX = -1; offsetX <= 1; offsetX++)
{
for (int offsetY = -1; offsetY <= 1; offsetY++)
{
final int calX = x + offsetX;
final int calY = y + offsetY;
if ((calX >= 0) && (calX < size) && (calY >= 0) && (calY < size))
squares[calY * size + calX] = true;
}
}
squares[y * size + x] = false;//the current x, y is false
return squares;
}
/**
Returns the matrix of connectivity between two points. Point i can go to point j iff matrix[i][j] is true
Square (x, y) is equivalent to point (size * y + x). For example, square (1,1) is point 5 in a puzzle of size 4
#param size the size of the puzzle
#return the connectivity matrix
*/
public static boolean[][] connectivityMatrix(int size)
{
boolean[][] matrix = new boolean[size * size][];
for (int x = 0; x < size; x++)
{
for (int y = 0; y < size; y++)
{
matrix[y * size + x] = connectivityRow(x, y, size);
}
}
return matrix;
}
}
I think you will probably spend most of your time trying to match words that can't possibly be built by your letter grid. So, the first thing I would do is try to speed up that step and that should get you most of the way there.
For this, I would re-express the grid as a table of possible "moves" that you index by the letter-transition you are looking at.
Start by assigning each letter a number from your entire alphabet (A=0, B=1, C=2, ... and so forth).
Let's take this example:
h b c d
e e g h
l l k l
m o f p
And for now, lets use the alphabet of the letters we have (usually you'd probably want to use the same whole alphabet every time):
b | c | d | e | f | g | h | k | l | m | o | p
---+---+---+---+---+---+---+---+---+---+----+----
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11
Then you make a 2D boolean array that tells whether you have a certain letter transition available:
| 0 1 2 3 4 5 6 7 8 9 10 11 <- from letter
| b c d e f g h k l m o p
-----+--------------------------------------
0 b | T T T T
1 c | T T T T T
2 d | T T T
3 e | T T T T T T T
4 f | T T T T
5 g | T T T T T T T
6 h | T T T T T T T
7 k | T T T T T T T
8 l | T T T T T T T T T
9 m | T T
10 o | T T T T
11 p | T T T
^
to letter
Now go through your word list and convert the words to transitions:
hello (6, 3, 8, 8, 10):
6 -> 3, 3 -> 8, 8 -> 8, 8 -> 10
Then check if these transitions are allowed by looking them up in your table:
[6][ 3] : T
[3][ 8] : T
[8][ 8] : T
[8][10] : T
If they are all allowed, there's a chance that this word might be found.
For example the word "helmet" can be ruled out on the 4th transition (m to e: helMEt), since that entry in your table is false.
And the word hamster can be ruled out, since the first (h to a) transition is not allowed (doesn't even exist in your table).
Now, for the probably very few remaining words that you didn't eliminate, try to actually find them in the grid the way you're doing it now or as suggested in some of the other answers here. This is to avoid false positives that result from jumps between identical letters in your grid. For example the word "help" is allowed by the table, but not by the grid.
Some further performance improvement tips on this idea:
Instead of using a 2D array, use a 1D array and simply compute the index of the second letter yourself. So, instead of a 12x12 array like above, make a 1D array of length 144. If you then always use the same alphabet (i.e. a 26x26 = 676x1 array for the standard english alphabet), even if not all letters show up in your grid, you can pre-compute the indices into this 1D array that you need to test to match your dictionary words. For example, the indices for 'hello' in the example above would be
hello (6, 3, 8, 8, 10):
42 (from 6 + 3x12), 99, 104, 128
-> "hello" will be stored as 42, 99, 104, 128 in the dictionary
Extend the idea to a 3D table (expressed as a 1D array), i.e. all allowed 3-letter combinations. That way you can eliminate even more words immediately and you reduce the number of array lookups for each word by 1: For 'hello', you only need 3 array lookups: hel, ell, llo. It will be very quick to build this table, by the way, as there are only 400 possible 3-letter-moves in your grid.
Pre-compute the indices of the moves in your grid that you need to include in your table. For the example above, you need to set the following entries to 'True':
(0,0) (0,1) -> here: h, b : [6][0]
(0,0) (1,0) -> here: h, e : [6][3]
(0,0) (1,1) -> here: h, e : [6][3]
(0,1) (0,0) -> here: b, h : [0][6]
(0,1) (0,2) -> here: b, c : [0][1]
.
:
Also represent your game grid in a 1-D array with 16 entries and have the table pre-computed in 3. contain the indices into this array.
I'm sure if you use this approach you can get your code to run insanely fast, if you have the dictionary pre-computed and already loaded into memory.
BTW: Another nice thing to do, if you are building a game, is to run these sort of things immediately in the background. Start generating and solving the first game while the user is still looking at the title screen on your app and getting his finger into position to press "Play". Then generate and solve the next game as the user plays the previous one. That should give you a lot of time to run your code.
(I like this problem, so I'll probably be tempted to implement my proposal in Java sometime in the next days to see how it would actually perform... I'll post the code here once I do.)
UPDATE:
Ok, I had some time today and implemented this idea in Java:
class DictionaryEntry {
public int[] letters;
public int[] triplets;
}
class BoggleSolver {
// Constants
final int ALPHABET_SIZE = 5; // up to 2^5 = 32 letters
final int BOARD_SIZE = 4; // 4x4 board
final int[] moves = {-BOARD_SIZE-1, -BOARD_SIZE, -BOARD_SIZE+1,
-1, +1,
+BOARD_SIZE-1, +BOARD_SIZE, +BOARD_SIZE+1};
// Technically constant (calculated here for flexibility, but should be fixed)
DictionaryEntry[] dictionary; // Processed word list
int maxWordLength = 0;
int[] boardTripletIndices; // List of all 3-letter moves in board coordinates
DictionaryEntry[] buildDictionary(String fileName) throws IOException {
BufferedReader fileReader = new BufferedReader(new FileReader(fileName));
String word = fileReader.readLine();
ArrayList<DictionaryEntry> result = new ArrayList<DictionaryEntry>();
while (word!=null) {
if (word.length()>=3) {
word = word.toUpperCase();
if (word.length()>maxWordLength) maxWordLength = word.length();
DictionaryEntry entry = new DictionaryEntry();
entry.letters = new int[word.length() ];
entry.triplets = new int[word.length()-2];
int i=0;
for (char letter: word.toCharArray()) {
entry.letters[i] = (byte) letter - 65; // Convert ASCII to 0..25
if (i>=2)
entry.triplets[i-2] = (((entry.letters[i-2] << ALPHABET_SIZE) +
entry.letters[i-1]) << ALPHABET_SIZE) +
entry.letters[i];
i++;
}
result.add(entry);
}
word = fileReader.readLine();
}
return result.toArray(new DictionaryEntry[result.size()]);
}
boolean isWrap(int a, int b) { // Checks if move a->b wraps board edge (like 3->4)
return Math.abs(a%BOARD_SIZE-b%BOARD_SIZE)>1;
}
int[] buildTripletIndices() {
ArrayList<Integer> result = new ArrayList<Integer>();
for (int a=0; a<BOARD_SIZE*BOARD_SIZE; a++)
for (int bm: moves) {
int b=a+bm;
if ((b>=0) && (b<board.length) && !isWrap(a, b))
for (int cm: moves) {
int c=b+cm;
if ((c>=0) && (c<board.length) && (c!=a) && !isWrap(b, c)) {
result.add(a);
result.add(b);
result.add(c);
}
}
}
int[] result2 = new int[result.size()];
int i=0;
for (Integer r: result) result2[i++] = r;
return result2;
}
// Variables that depend on the actual game layout
int[] board = new int[BOARD_SIZE*BOARD_SIZE]; // Letters in board
boolean[] possibleTriplets = new boolean[1 << (ALPHABET_SIZE*3)];
DictionaryEntry[] candidateWords;
int candidateCount;
int[] usedBoardPositions;
DictionaryEntry[] foundWords;
int foundCount;
void initializeBoard(String[] letters) {
for (int row=0; row<BOARD_SIZE; row++)
for (int col=0; col<BOARD_SIZE; col++)
board[row*BOARD_SIZE + col] = (byte) letters[row].charAt(col) - 65;
}
void setPossibleTriplets() {
Arrays.fill(possibleTriplets, false); // Reset list
int i=0;
while (i<boardTripletIndices.length) {
int triplet = (((board[boardTripletIndices[i++]] << ALPHABET_SIZE) +
board[boardTripletIndices[i++]]) << ALPHABET_SIZE) +
board[boardTripletIndices[i++]];
possibleTriplets[triplet] = true;
}
}
void checkWordTriplets() {
candidateCount = 0;
for (DictionaryEntry entry: dictionary) {
boolean ok = true;
int len = entry.triplets.length;
for (int t=0; (t<len) && ok; t++)
ok = possibleTriplets[entry.triplets[t]];
if (ok) candidateWords[candidateCount++] = entry;
}
}
void checkWords() { // Can probably be optimized a lot
foundCount = 0;
for (int i=0; i<candidateCount; i++) {
DictionaryEntry candidate = candidateWords[i];
for (int j=0; j<board.length; j++)
if (board[j]==candidate.letters[0]) {
usedBoardPositions[0] = j;
if (checkNextLetters(candidate, 1, j)) {
foundWords[foundCount++] = candidate;
break;
}
}
}
}
boolean checkNextLetters(DictionaryEntry candidate, int letter, int pos) {
if (letter==candidate.letters.length) return true;
int match = candidate.letters[letter];
for (int move: moves) {
int next=pos+move;
if ((next>=0) && (next<board.length) && (board[next]==match) && !isWrap(pos, next)) {
boolean ok = true;
for (int i=0; (i<letter) && ok; i++)
ok = usedBoardPositions[i]!=next;
if (ok) {
usedBoardPositions[letter] = next;
if (checkNextLetters(candidate, letter+1, next)) return true;
}
}
}
return false;
}
// Just some helper functions
String formatTime(long start, long end, long repetitions) {
long time = (end-start)/repetitions;
return time/1000000 + "." + (time/100000) % 10 + "" + (time/10000) % 10 + "ms";
}
String getWord(DictionaryEntry entry) {
char[] result = new char[entry.letters.length];
int i=0;
for (int letter: entry.letters)
result[i++] = (char) (letter+97);
return new String(result);
}
void run() throws IOException {
long start = System.nanoTime();
// The following can be pre-computed and should be replaced by constants
dictionary = buildDictionary("C:/TWL06.txt");
boardTripletIndices = buildTripletIndices();
long precomputed = System.nanoTime();
// The following only needs to run once at the beginning of the program
candidateWords = new DictionaryEntry[dictionary.length]; // WAAAY too generous
foundWords = new DictionaryEntry[dictionary.length]; // WAAAY too generous
usedBoardPositions = new int[maxWordLength];
long initialized = System.nanoTime();
for (int n=1; n<=100; n++) {
// The following needs to run again for every new board
initializeBoard(new String[] {"DGHI",
"KLPS",
"YEUT",
"EORN"});
setPossibleTriplets();
checkWordTriplets();
checkWords();
}
long solved = System.nanoTime();
// Print out result and statistics
System.out.println("Precomputation finished in " + formatTime(start, precomputed, 1)+":");
System.out.println(" Words in the dictionary: "+dictionary.length);
System.out.println(" Longest word: "+maxWordLength+" letters");
System.out.println(" Number of triplet-moves: "+boardTripletIndices.length/3);
System.out.println();
System.out.println("Initialization finished in " + formatTime(precomputed, initialized, 1));
System.out.println();
System.out.println("Board solved in "+formatTime(initialized, solved, 100)+":");
System.out.println(" Number of candidates: "+candidateCount);
System.out.println(" Number of actual words: "+foundCount);
System.out.println();
System.out.println("Words found:");
int w=0;
System.out.print(" ");
for (int i=0; i<foundCount; i++) {
System.out.print(getWord(foundWords[i]));
w++;
if (w==10) {
w=0;
System.out.println(); System.out.print(" ");
} else
if (i<foundCount-1) System.out.print(", ");
}
System.out.println();
}
public static void main(String[] args) throws IOException {
new BoggleSolver().run();
}
}
Here are some results:
For the grid from the picture posted in the original question (DGHI...):
Precomputation finished in 239.59ms:
Words in the dictionary: 178590
Longest word: 15 letters
Number of triplet-moves: 408
Initialization finished in 0.22ms
Board solved in 3.70ms:
Number of candidates: 230
Number of actual words: 163
Words found:
eek, eel, eely, eld, elhi, elk, ern, erupt, erupts, euro
eye, eyer, ghi, ghis, glee, gley, glue, gluer, gluey, glut
gluts, hip, hiply, hips, his, hist, kelp, kelps, kep, kepi
kepis, keps, kept, kern, key, kye, lee, lek, lept, leu
ley, lunt, lunts, lure, lush, lust, lustre, lye, nus, nut
nuts, ore, ort, orts, ouph, ouphs, our, oust, out, outre
outs, oyer, pee, per, pert, phi, phis, pis, pish, plus
plush, ply, plyer, psi, pst, pul, pule, puler, pun, punt
punts, pur, pure, puree, purely, pus, push, put, puts, ree
rely, rep, reply, reps, roe, roue, roup, roups, roust, rout
routs, rue, rule, ruly, run, runt, runts, rupee, rush, rust
rut, ruts, ship, shlep, sip, sipe, spue, spun, spur, spurn
spurt, strep, stroy, stun, stupe, sue, suer, sulk, sulker, sulky
sun, sup, supe, super, sure, surely, tree, trek, trey, troupe
troy, true, truly, tule, tun, tup, tups, turn, tush, ups
urn, uts, yeld, yelk, yelp, yelps, yep, yeps, yore, you
your, yourn, yous
For the letters posted as the example in the original question (FXIE...)
Precomputation finished in 239.68ms:
Words in the dictionary: 178590
Longest word: 15 letters
Number of triplet-moves: 408
Initialization finished in 0.21ms
Board solved in 3.69ms:
Number of candidates: 87
Number of actual words: 76
Words found:
amble, ambo, ami, amie, asea, awa, awe, awes, awl, axil
axile, axle, boil, bole, box, but, buts, east, elm, emboli
fame, fames, fax, lei, lie, lima, limb, limbo, limbs, lime
limes, lob, lobs, lox, mae, maes, maw, maws, max, maxi
mesa, mew, mewl, mews, mil, mile, milo, mix, oil, ole
sae, saw, sea, seam, semi, sew, stub, swam, swami, tub
tubs, tux, twa, twae, twaes, twas, uts, wae, waes, wamble
wame, wames, was, wast, wax, west
For the following 5x5-grid:
R P R I T
A H H L N
I E T E P
Z R Y S G
O G W E Y
it gives this:
Precomputation finished in 240.39ms:
Words in the dictionary: 178590
Longest word: 15 letters
Number of triplet-moves: 768
Initialization finished in 0.23ms
Board solved in 3.85ms:
Number of candidates: 331
Number of actual words: 240
Words found:
aero, aery, ahi, air, airt, airth, airts, airy, ear, egest
elhi, elint, erg, ergo, ester, eth, ether, eye, eyen, eyer
eyes, eyre, eyrie, gel, gelt, gelts, gen, gent, gentil, gest
geste, get, gets, gey, gor, gore, gory, grey, greyest, greys
gyre, gyri, gyro, hae, haet, haets, hair, hairy, hap, harp
heap, hear, heh, heir, help, helps, hen, hent, hep, her
hero, hes, hest, het, hetero, heth, hets, hey, hie, hilt
hilts, hin, hint, hire, hit, inlet, inlets, ire, leg, leges
legs, lehr, lent, les, lest, let, lethe, lets, ley, leys
lin, line, lines, liney, lint, lit, neg, negs, nest, nester
net, nether, nets, nil, nit, ogre, ore, orgy, ort, orts
pah, pair, par, peg, pegs, peh, pelt, pelter, peltry, pelts
pen, pent, pes, pest, pester, pesty, pet, peter, pets, phi
philter, philtre, phiz, pht, print, pst, rah, rai, rap, raphe
raphes, reap, rear, rei, ret, rete, rets, rhaphe, rhaphes, rhea
ria, rile, riles, riley, rin, rye, ryes, seg, sel, sen
sent, senti, set, sew, spelt, spelter, spent, splent, spline, splint
split, stent, step, stey, stria, striae, sty, stye, tea, tear
teg, tegs, tel, ten, tent, thae, the, their, then, these
thesp, they, thin, thine, thir, thirl, til, tile, tiles, tilt
tilter, tilth, tilts, tin, tine, tines, tirl, trey, treys, trog
try, tye, tyer, tyes, tyre, tyro, west, wester, wry, wryest
wye, wyes, wyte, wytes, yea, yeah, year, yeh, yelp, yelps
yen, yep, yeps, yes, yester, yet, yew, yews, zero, zori
For this I used the TWL06 Tournament Scrabble Word List, since the link in the original question no longer works. This file is 1.85MB, so it's a little bit shorter. And the buildDictionary function throws out all words with less than 3 letters.
Here are a couple of observations about the performance of this:
It's about 10 times slower than the reported performance of Victor Nicollet's OCaml implementation. Whether this is caused by the different algorithm, the shorter dictionary he used, the fact that his code is compiled and mine runs in a Java virtual machine, or the performance of our computers (mine is an Intel Q6600 # 2.4MHz running WinXP), I don't know. But it's much faster than the results for the other implementations quoted at the end of the original question. So, whether this algorithm is superior to the trie dictionary or not, I don't know at this point.
The table method used in checkWordTriplets() yields a very good approximation to the actual answers. Only 1 in 3-5 words passed by it will fail the checkWords() test (See number of candidates vs. number of actual words above).
Something you can't see above: The checkWordTriplets() function takes about 3.65ms and is therefore fully dominant in the search process. The checkWords() function takes up pretty much the remaining 0.05-0.20 ms.
The execution time of the checkWordTriplets() function depends linearly on the dictionary size and is virtually independent of board size!
The execution time of checkWords() depends on the board size and the number of words not ruled out by checkWordTriplets().
The checkWords() implementation above is the dumbest first version I came up with. It is basically not optimized at all. But compared to checkWordTriplets() it is irrelevant for the total performance of the application, so I didn't worry about it. But, if the board size gets bigger, this function will get slower and slower and will eventually start to matter. Then, it would need to be optimized as well.
One nice thing about this code is its flexibility:
You can easily change the board size: Update line 10 and the String array passed to initializeBoard().
It can support larger/different alphabets and can handle things like treating 'Qu' as one letter without any performance overhead. To do this, one would need to update line 9 and the couple of places where characters are converted to numbers (currently simply by subtracting 65 from the ASCII value)
Ok, but I think by now this post is waaaay long enough. I can definitely answer any questions you might have, but let's move that to the comments.
Surprisingly, no one attempted a PHP version of this.
This is a working PHP version of John Fouhy's Python solution.
Although I took some pointers from everyone else's answers, this is mostly copied from John.
$boggle = "fxie
amlo
ewbx
astu";
$alphabet = str_split(str_replace(array("\n", " ", "\r"), "", strtolower($boggle)));
$rows = array_map('trim', explode("\n", $boggle));
$dictionary = file("C:/dict.txt");
$prefixes = array(''=>'');
$words = array();
$regex = '/[' . implode('', $alphabet) . ']{3,}$/S';
foreach($dictionary as $k=>$value) {
$value = trim(strtolower($value));
$length = strlen($value);
if(preg_match($regex, $value)) {
for($x = 0; $x < $length; $x++) {
$letter = substr($value, 0, $x+1);
if($letter == $value) {
$words[$value] = 1;
} else {
$prefixes[$letter] = 1;
}
}
}
}
$graph = array();
$chardict = array();
$positions = array();
$c = count($rows);
for($i = 0; $i < $c; $i++) {
$l = strlen($rows[$i]);
for($j = 0; $j < $l; $j++) {
$chardict[$i.','.$j] = $rows[$i][$j];
$children = array();
$pos = array(-1,0,1);
foreach($pos as $z) {
$xCoord = $z + $i;
if($xCoord < 0 || $xCoord >= count($rows)) {
continue;
}
$len = strlen($rows[0]);
foreach($pos as $w) {
$yCoord = $j + $w;
if(($yCoord < 0 || $yCoord >= $len) || ($z == 0 && $w == 0)) {
continue;
}
$children[] = array($xCoord, $yCoord);
}
}
$graph['None'][] = array($i, $j);
$graph[$i.','.$j] = $children;
}
}
function to_word($chardict, $prefix) {
$word = array();
foreach($prefix as $v) {
$word[] = $chardict[$v[0].','.$v[1]];
}
return implode("", $word);
}
function find_words($graph, $chardict, $position, $prefix, $prefixes, &$results, $words) {
$word = to_word($chardict, $prefix);
if(!isset($prefixes[$word])) return false;
if(isset($words[$word])) {
$results[] = $word;
}
foreach($graph[$position] as $child) {
if(!in_array($child, $prefix)) {
$newprefix = $prefix;
$newprefix[] = $child;
find_words($graph, $chardict, $child[0].','.$child[1], $newprefix, $prefixes, $results, $words);
}
}
}
$solution = array();
find_words($graph, $chardict, 'None', array(), $prefixes, $solution);
print_r($solution);
Here is a live link if you want to try it out. Although it takes ~2s in my local machine, it takes ~5s on my webserver. In either case, it is not very fast. Still, though, it is quite hideous so I can imagine the time can be reduced significantly. Any pointers on how to accomplish that would be appreciated. PHP's lack of tuples made the coordinates weird to work with and my inability to comprehend just what the hell is going on didn't help at all.
EDIT: A few fixes make it take less than 1s locally.
Not interested in VB? :) I couldn't resist. I've solved this differently than many of the solutions presented here.
My times are:
Loading the dictionary and word prefixes into a hashtable: .5 to 1 seconds.
Finding the words: averaging under 10 milliseconds.
EDIT: Dictionary load times on the web host server are running about 1 to 1.5 seconds longer than my home computer.
I don't know how badly the times will deteriorate with a load on the server.
I wrote my solution as a web page in .Net. myvrad.com/boggle
I'm using the dictionary referenced in the original question.
Letters are not reused in a word. Only words 3 characters or longer are found.
I'm using a hashtable of all unique word prefixes and words instead of a trie. I didn't know about trie's so I learned something there. The idea of creating a list of prefixes of words in addition to the complete words is what finally got my times down to a respectable number.
Read the code comments for additional details.
Here's the code:
Imports System.Collections.Generic
Imports System.IO
Partial Class boggle_Default
'Bob Archer, 4/15/2009
'To avoid using a 2 dimensional array in VB I'm not using typical X,Y
'coordinate iteration to find paths.
'
'I have locked the code into a 4 by 4 grid laid out like so:
' abcd
' efgh
' ijkl
' mnop
'
'To find paths the code starts with a letter from a to p then
'explores the paths available around it. If a neighboring letter
'already exists in the path then we don't go there.
'
'Neighboring letters (grid points) are hard coded into
'a Generic.Dictionary below.
'Paths is a list of only valid Paths found.
'If a word prefix or word is not found the path is not
'added and extending that path is terminated.
Dim Paths As New Generic.List(Of String)
'NeighborsOf. The keys are the letters a to p.
'The value is a string of letters representing neighboring letters.
'The string of neighboring letters is split and iterated later.
Dim NeigborsOf As New Generic.Dictionary(Of String, String)
'BoggleLetters. The keys are mapped to the lettered grid of a to p.
'The values are what the user inputs on the page.
Dim BoggleLetters As New Generic.Dictionary(Of String, String)
'Used to store last postition of path. This will be a letter
'from a to p.
Dim LastPositionOfPath As String = ""
'I found a HashTable was by far faster than a Generic.Dictionary
' - about 10 times faster. This stores prefixes of words and words.
'I determined 792773 was the number of words and unique prefixes that
'will be generated from the dictionary file. This is a max number and
'the final hashtable will not have that many.
Dim HashTableOfPrefixesAndWords As New Hashtable(792773)
'Stores words that are found.
Dim FoundWords As New Generic.List(Of String)
'Just to validate what the user enters in the grid.
Dim ErrorFoundWithSubmittedLetters As Boolean = False
Public Sub BuildAndTestPathsAndFindWords(ByVal ThisPath As String)
'Word is the word correlating to the ThisPath parameter.
'This path would be a series of letters from a to p.
Dim Word As String = ""
'The path is iterated through and a word based on the actual
'letters in the Boggle grid is assembled.
For i As Integer = 0 To ThisPath.Length - 1
Word += Me.BoggleLetters(ThisPath.Substring(i, 1))
Next
'If my hashtable of word prefixes and words doesn't contain this Word
'Then this isn't a word and any further extension of ThisPath will not
'yield any words either. So exit sub to terminate exploring this path.
If Not HashTableOfPrefixesAndWords.ContainsKey(Word) Then Exit Sub
'The value of my hashtable is a boolean representing if the key if a word (true) or
'just a prefix (false). If true and at least 3 letters long then yay! word found.
If HashTableOfPrefixesAndWords(Word) AndAlso Word.Length > 2 Then Me.FoundWords.Add(Word)
'If my List of Paths doesn't contain ThisPath then add it.
'Remember only valid paths will make it this far. Paths not found
'in the HashTableOfPrefixesAndWords cause this sub to exit above.
If Not Paths.Contains(ThisPath) Then Paths.Add(ThisPath)
'Examine the last letter of ThisPath. We are looking to extend the path
'to our neighboring letters if any are still available.
LastPositionOfPath = ThisPath.Substring(ThisPath.Length - 1, 1)
'Loop through my list of neighboring letters (representing grid points).
For Each Neighbor As String In Me.NeigborsOf(LastPositionOfPath).ToCharArray()
'If I find a neighboring grid point that I haven't already used
'in ThisPath then extend ThisPath and feed the new path into
'this recursive function. (see recursive.)
If Not ThisPath.Contains(Neighbor) Then Me.BuildAndTestPathsAndFindWords(ThisPath & Neighbor)
Next
End Sub
Protected Sub ButtonBoggle_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles ButtonBoggle.Click
'User has entered the 16 letters and clicked the go button.
'Set up my Generic.Dictionary of grid points, I'm using letters a to p -
'not an x,y grid system. The values are neighboring points.
NeigborsOf.Add("a", "bfe")
NeigborsOf.Add("b", "cgfea")
NeigborsOf.Add("c", "dhgfb")
NeigborsOf.Add("d", "hgc")
NeigborsOf.Add("e", "abfji")
NeigborsOf.Add("f", "abcgkjie")
NeigborsOf.Add("g", "bcdhlkjf")
NeigborsOf.Add("h", "cdlkg")
NeigborsOf.Add("i", "efjnm")
NeigborsOf.Add("j", "efgkonmi")
NeigborsOf.Add("k", "fghlponj")
NeigborsOf.Add("l", "ghpok")
NeigborsOf.Add("m", "ijn")
NeigborsOf.Add("n", "ijkom")
NeigborsOf.Add("o", "jklpn")
NeigborsOf.Add("p", "klo")
'Retrieve letters the user entered.
BoggleLetters.Add("a", Me.TextBox1.Text.ToLower.Trim())
BoggleLetters.Add("b", Me.TextBox2.Text.ToLower.Trim())
BoggleLetters.Add("c", Me.TextBox3.Text.ToLower.Trim())
BoggleLetters.Add("d", Me.TextBox4.Text.ToLower.Trim())
BoggleLetters.Add("e", Me.TextBox5.Text.ToLower.Trim())
BoggleLetters.Add("f", Me.TextBox6.Text.ToLower.Trim())
BoggleLetters.Add("g", Me.TextBox7.Text.ToLower.Trim())
BoggleLetters.Add("h", Me.TextBox8.Text.ToLower.Trim())
BoggleLetters.Add("i", Me.TextBox9.Text.ToLower.Trim())
BoggleLetters.Add("j", Me.TextBox10.Text.ToLower.Trim())
BoggleLetters.Add("k", Me.TextBox11.Text.ToLower.Trim())
BoggleLetters.Add("l", Me.TextBox12.Text.ToLower.Trim())
BoggleLetters.Add("m", Me.TextBox13.Text.ToLower.Trim())
BoggleLetters.Add("n", Me.TextBox14.Text.ToLower.Trim())
BoggleLetters.Add("o", Me.TextBox15.Text.ToLower.Trim())
BoggleLetters.Add("p", Me.TextBox16.Text.ToLower.Trim())
'Validate user entered something with a length of 1 for all 16 textboxes.
For Each S As String In BoggleLetters.Keys
If BoggleLetters(S).Length <> 1 Then
ErrorFoundWithSubmittedLetters = True
Exit For
End If
Next
'If input is not valid then...
If ErrorFoundWithSubmittedLetters Then
'Present error message.
Else
'Else assume we have 16 letters to work with and start finding words.
Dim SB As New StringBuilder
Dim Time As String = String.Format("{0}:{1}:{2}:{3}", Date.Now.Hour.ToString(), Date.Now.Minute.ToString(), Date.Now.Second.ToString(), Date.Now.Millisecond.ToString())
Dim NumOfLetters As Integer = 0
Dim Word As String = ""
Dim TempWord As String = ""
Dim Letter As String = ""
Dim fr As StreamReader = Nothing
fr = New System.IO.StreamReader(HttpContext.Current.Request.MapPath("~/boggle/dic.txt"))
'First fill my hashtable with word prefixes and words.
'HashTable(PrefixOrWordString, BooleanTrueIfWordFalseIfPrefix)
While fr.Peek <> -1
Word = fr.ReadLine.Trim()
TempWord = ""
For i As Integer = 0 To Word.Length - 1
Letter = Word.Substring(i, 1)
'This optimization helped quite a bit. Words in the dictionary that begin
'with letters that the user did not enter in the grid shouldn't go in my hashtable.
'
'I realize most of the solutions went with a Trie. I'd never heard of that before,
'which is one of the neat things about SO, seeing how others approach challenges
'and learning some best practices.
'
'However, I didn't code a Trie in my solution. I just have a hashtable with
'all words in the dicitonary file and all possible prefixes for those words.
'A Trie might be faster but I'm not coding it now. I'm getting good times with this.
If i = 0 AndAlso Not BoggleLetters.ContainsValue(Letter) Then Continue While
TempWord += Letter
If Not HashTableOfPrefixesAndWords.ContainsKey(TempWord) Then
HashTableOfPrefixesAndWords.Add(TempWord, TempWord = Word)
End If
Next
End While
SB.Append("Number of Word Prefixes and Words in Hashtable: " & HashTableOfPrefixesAndWords.Count.ToString())
SB.Append("<br />")
SB.Append("Loading Dictionary: " & Time & " - " & String.Format("{0}:{1}:{2}:{3}", Date.Now.Hour.ToString(), Date.Now.Minute.ToString(), Date.Now.Second.ToString(), Date.Now.Millisecond.ToString()))
SB.Append("<br />")
Time = String.Format("{0}:{1}:{2}:{3}", Date.Now.Hour.ToString(), Date.Now.Minute.ToString(), Date.Now.Second.ToString(), Date.Now.Millisecond.ToString())
'This starts a path at each point on the grid an builds a path until
'the string of letters correlating to the path is not found in the hashtable
'of word prefixes and words.
Me.BuildAndTestPathsAndFindWords("a")
Me.BuildAndTestPathsAndFindWords("b")
Me.BuildAndTestPathsAndFindWords("c")
Me.BuildAndTestPathsAndFindWords("d")
Me.BuildAndTestPathsAndFindWords("e")
Me.BuildAndTestPathsAndFindWords("f")
Me.BuildAndTestPathsAndFindWords("g")
Me.BuildAndTestPathsAndFindWords("h")
Me.BuildAndTestPathsAndFindWords("i")
Me.BuildAndTestPathsAndFindWords("j")
Me.BuildAndTestPathsAndFindWords("k")
Me.BuildAndTestPathsAndFindWords("l")
Me.BuildAndTestPathsAndFindWords("m")
Me.BuildAndTestPathsAndFindWords("n")
Me.BuildAndTestPathsAndFindWords("o")
Me.BuildAndTestPathsAndFindWords("p")
SB.Append("Finding Words: " & Time & " - " & String.Format("{0}:{1}:{2}:{3}", Date.Now.Hour.ToString(), Date.Now.Minute.ToString(), Date.Now.Second.ToString(), Date.Now.Millisecond.ToString()))
SB.Append("<br />")
SB.Append("Num of words found: " & FoundWords.Count.ToString())
SB.Append("<br />")
SB.Append("<br />")
FoundWords.Sort()
SB.Append(String.Join("<br />", FoundWords.ToArray()))
'Output results.
Me.LiteralBoggleResults.Text = SB.ToString()
Me.PanelBoggleResults.Visible = True
End If
End Sub
End Class
As soon as I saw the problem statement, I thought "Trie". But seeing as several other posters made use of that approach, I looked for another approach just to be different. Alas, the Trie approach performs better. I ran Kent's Perl solution on my machine and it took 0.31 seconds to run, after adapting it to use my dictionary file. My own perl implementation required 0.54 seconds to run.
This was my approach:
Create a transition hash to model the legal transitions.
Iterate through all 16^3 possible three letter combinations.
In the loop, exclude illegal transitions and repeat visits to the
same square. Form all the legal 3-letter sequences and store them in a hash.
Then loop through all words in the dictionary.
Exclude words that are too long or short
Slide a 3-letter window across each word and see if it is among the 3-letter combos from step 2. Exclude words that fail. This eliminates most non-matches.
If still not eliminated, use a recursive algorithm to see if the word can be formed by making paths through the puzzle. (This part is slow, but called infrequently.)
Print out the words I found.
I tried 3-letter and 4-letter sequences, but 4-letter sequences slowed the program down.
In my code, I use /usr/share/dict/words for my dictionary. It comes standard on MAC OS X and many Unix systems. You can use another file if you want. To crack a different puzzle, just change the variable #puzzle. This would be easy to adapt for larger matrices. You would just need to change the %transitions hash and %legalTransitions hash.
The strength of this solution is that the code is short, and the data structures simple.
Here is the Perl code (which uses too many global variables, I know):
#!/usr/bin/perl
use Time::HiRes qw{ time };
sub readFile($);
sub findAllPrefixes($);
sub isWordTraceable($);
sub findWordsInPuzzle(#);
my $startTime = time;
# Puzzle to solve
my #puzzle = (
F, X, I, E,
A, M, L, O,
E, W, B, X,
A, S, T, U
);
my $minimumWordLength = 3;
my $maximumPrefixLength = 3; # I tried four and it slowed down.
# Slurp the word list.
my $wordlistFile = "/usr/share/dict/words";
my #words = split(/\n/, uc(readFile($wordlistFile)));
print "Words loaded from word list: " . scalar #words . "\n";
print "Word file load time: " . (time - $startTime) . "\n";
my $postLoad = time;
# Define the legal transitions from one letter position to another.
# Positions are numbered 0-15.
# 0 1 2 3
# 4 5 6 7
# 8 9 10 11
# 12 13 14 15
my %transitions = (
-1 => [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
0 => [1,4,5],
1 => [0,2,4,5,6],
2 => [1,3,5,6,7],
3 => [2,6,7],
4 => [0,1,5,8,9],
5 => [0,1,2,4,6,8,9,10],
6 => [1,2,3,5,7,9,10,11],
7 => [2,3,6,10,11],
8 => [4,5,9,12,13],
9 => [4,5,6,8,10,12,13,14],
10 => [5,6,7,9,11,13,14,15],
11 => [6,7,10,14,15],
12 => [8,9,13],
13 => [8,9,10,12,14],
14 => [9,10,11,13,15],
15 => [10,11,14]
);
# Convert the transition matrix into a hash for easy access.
my %legalTransitions = ();
foreach my $start (keys %transitions) {
my $legalRef = $transitions{$start};
foreach my $stop (#$legalRef) {
my $index = ($start + 1) * (scalar #puzzle) + ($stop + 1);
$legalTransitions{$index} = 1;
}
}
my %prefixesInPuzzle = findAllPrefixes($maximumPrefixLength);
print "Find prefixes time: " . (time - $postLoad) . "\n";
my $postPrefix = time;
my #wordsFoundInPuzzle = findWordsInPuzzle(#words);
print "Find words in puzzle time: " . (time - $postPrefix) . "\n";
print "Unique prefixes found: " . (scalar keys %prefixesInPuzzle) . "\n";
print "Words found (" . (scalar #wordsFoundInPuzzle) . ") :\n " . join("\n ", #wordsFoundInPuzzle) . "\n";
print "Total Elapsed time: " . (time - $startTime) . "\n";
###########################################
sub readFile($) {
my ($filename) = #_;
my $contents;
if (-e $filename) {
# This is magic: it opens and reads a file into a scalar in one line of code.
# See http://www.perl.com/pub/a/2003/11/21/slurp.html
$contents = do { local( #ARGV, $/ ) = $filename ; <> } ;
}
else {
$contents = '';
}
return $contents;
}
# Is it legal to move from the first position to the second? They must be adjacent.
sub isLegalTransition($$) {
my ($pos1,$pos2) = #_;
my $index = ($pos1 + 1) * (scalar #puzzle) + ($pos2 + 1);
return $legalTransitions{$index};
}
# Find all prefixes where $minimumWordLength <= length <= $maxPrefixLength
#
# $maxPrefixLength ... Maximum length of prefix we will store. Three gives best performance.
sub findAllPrefixes($) {
my ($maxPrefixLength) = #_;
my %prefixes = ();
my $puzzleSize = scalar #puzzle;
# Every possible N-letter combination of the letters in the puzzle
# can be represented as an integer, though many of those combinations
# involve illegal transitions, duplicated letters, etc.
# Iterate through all those possibilities and eliminate the illegal ones.
my $maxIndex = $puzzleSize ** $maxPrefixLength;
for (my $i = 0; $i < $maxIndex; $i++) {
my #path;
my $remainder = $i;
my $prevPosition = -1;
my $prefix = '';
my %usedPositions = ();
for (my $prefixLength = 1; $prefixLength <= $maxPrefixLength; $prefixLength++) {
my $position = $remainder % $puzzleSize;
# Is this a valid step?
# a. Is the transition legal (to an adjacent square)?
if (! isLegalTransition($prevPosition, $position)) {
last;
}
# b. Have we repeated a square?
if ($usedPositions{$position}) {
last;
}
else {
$usedPositions{$position} = 1;
}
# Record this prefix if length >= $minimumWordLength.
$prefix .= $puzzle[$position];
if ($prefixLength >= $minimumWordLength) {
$prefixes{$prefix} = 1;
}
push #path, $position;
$remainder -= $position;
$remainder /= $puzzleSize;
$prevPosition = $position;
} # end inner for
} # end outer for
return %prefixes;
}
# Loop through all words in dictionary, looking for ones that are in the puzzle.
sub findWordsInPuzzle(#) {
my #allWords = #_;
my #wordsFound = ();
my $puzzleSize = scalar #puzzle;
WORD: foreach my $word (#allWords) {
my $wordLength = length($word);
if ($wordLength > $puzzleSize || $wordLength < $minimumWordLength) {
# Reject word as too short or too long.
}
elsif ($wordLength <= $maximumPrefixLength ) {
# Word should be in the prefix hash.
if ($prefixesInPuzzle{$word}) {
push #wordsFound, $word;
}
}
else {
# Scan through the word using a window of length $maximumPrefixLength, looking for any strings not in our prefix list.
# If any are found that are not in the list, this word is not possible.
# If no non-matches are found, we have more work to do.
my $limit = $wordLength - $maximumPrefixLength + 1;
for (my $startIndex = 0; $startIndex < $limit; $startIndex ++) {
if (! $prefixesInPuzzle{substr($word, $startIndex, $maximumPrefixLength)}) {
next WORD;
}
}
if (isWordTraceable($word)) {
# Additional test necessary: see if we can form this word by following legal transitions
push #wordsFound, $word;
}
}
}
return #wordsFound;
}
# Is it possible to trace out the word using only legal transitions?
sub isWordTraceable($) {
my $word = shift;
return traverse([split(//, $word)], [-1]); # Start at special square -1, which may transition to any square in the puzzle.
}
# Recursively look for a path through the puzzle that matches the word.
sub traverse($$) {
my ($lettersRef, $pathRef) = #_;
my $index = scalar #$pathRef - 1;
my $position = $pathRef->[$index];
my $letter = $lettersRef->[$index];
my $branchesRef = $transitions{$position};
BRANCH: foreach my $branch (#$branchesRef) {
if ($puzzle[$branch] eq $letter) {
# Have we used this position yet?
foreach my $usedBranch (#$pathRef) {
if ($usedBranch == $branch) {
next BRANCH;
}
}
if (scalar #$lettersRef == $index + 1) {
return 1; # End of word and success.
}
push #$pathRef, $branch;
if (traverse($lettersRef, $pathRef)) {
return 1; # Recursive success.
}
else {
pop #$pathRef;
}
}
}
return 0; # No path found. Failed.
}
I know I'm super late but I made one of these a while ago in PHP - just for fun too...
http://www.lostsockdesign.com.au/sandbox/boggle/index.php?letters=fxieamloewbxastu
Found 75 words (133 pts) in 0.90108 seconds
F.........X..I..............E...............
A......................................M..............................L............................O...............................
E....................W............................B..........................X
A..................S..................................................T.................U....
Gives some indication of what the program is actually doing - each letter is where it starts looking through the patterns while each '.' shows a path that it has tried to take. The more '.' there are the further it has searched.
Let me know if you want the code... it is a horrible mix of PHP and HTML that was never meant to see the light of day so I dare not post it here :P
I spent 3 months working on a solution to the 10 best point dense 5x5 Boggle boards problem.
The problem is now solved and laid out with full disclosure on 5 web pages. Please contact me with questions.
The board analysis algorithm uses an explicit stack to pseudo-recursively traverse the board squares through a Directed Acyclic Word Graph with direct child information, and a time stamp tracking mechanism. This may very well be the world's most advanced lexicon data structure.
The scheme evaluates some 10,000 very good boards per second on a quad core. (9500+ points)
Parent Web Page:
DeepSearch.c - http://www.pathcom.com/~vadco/deep.html
Component Web Pages:
Optimal Scoreboard - http://www.pathcom.com/~vadco/binary.html
Advanced Lexicon Structure - http://www.pathcom.com/~vadco/adtdawg.html
Board Analysis Algorithm - http://www.pathcom.com/~vadco/guns.html
Parallel Batch Processing - http://www.pathcom.com/~vadco/parallel.html
-
This comprehensive body of work will only interest a person who demands the very best.
Does your search algorithm continually decrease the word list as your search continues?
For instance, in the search above there are only 13 letters that your words can start with (effectively reducing to half as many starting letters).
As you add more letter permutations it would further decrease the available word sets decreasing the searching necessary.
I'd start there.
I'd have to give more thought to a complete solution, but as a handy optimisation, I wonder whether it might be worth pre-computing a table of frequencies of digrams and trigrams (2- and 3-letter combinations) based on all the words from your dictionary, and use this to prioritise your search. I'd go with the starting letters of words. So if your dictionary contained the words "India", "Water", "Extreme", and "Extraordinary", then your pre-computed table might be:
'IN': 1
'WA': 1
'EX': 2
Then search for these digrams in the order of commonality (first EX, then WA/IN)
First, read how one of the C# language designers solved a related problem:
http://blogs.msdn.com/ericlippert/archive/2009/02/04/a-nasality-talisman-for-the-sultana-analyst.aspx.
Like him, you can start with a dictionary and the canonacalize words by creating a dictionary from an array of letters sorted alphabetically to a list of words that can be spelled from those letters.
Next, start creating the possible words from the board and looking them up. I suspect that will get you pretty far, but there are certainly more tricks that might speed things up.
I suggest making a tree of letters based on words. The tree would be composed of a letter structs, like this:
letter: char
isWord: boolean
Then you build up the tree, with each depth adding a new letter. In other words, on the first level there'd be the alphabet; then from each of those trees, there'd be another another 26 entries, and so on, until you've spelled out all the words. Hang onto this parsed tree, and it'll make all possible answers faster to look up.
With this parsed tree, you can very quickly find solutions. Here's the pseudo-code:
BEGIN:
For each letter:
if the struct representing it on the current depth has isWord == true, enter it as an answer.
Cycle through all its neighbors; if there is a child of the current node corresponding to the letter, recursively call BEGIN on it.
This could be sped up with a bit of dynamic programming. For example, in your sample, the two 'A's are both next to an 'E' and a 'W', which (from the point they hit them on) would be identical. I don't have enough time to really spell out the code for this, but I think you can gather the idea.
Also, I'm sure you'll find other solutions if you Google for "Boggle solver".
Just for fun, I implemented one in bash.
It is not super fast, but reasonable.
http://dev.xkyle.com/bashboggle/
Hilarious. I nearly posted the same question a few days ago due to the same damn game! I did not however because just searched google for boggle solver python and got all the answers I could want.
I realize this question's time has come and gone, but since I was working on a solver myself, and stumbled onto this while googling about, I thought I should post a reference to mine as it seems a bit different from some of the others.
I chose to go with a flat array for the game board, and to do recursive hunts from each letter on the board, traversing from valid neighbor to valid neighbor, extending the hunt if the current list of letters if a valid prefix in an index. While traversing the notion of the current word is list of indexes into board, not letters that make up a word. When checking the index, the indexes are translated to letters and the check done.
The index is a brute force dictionary that's a bit like a trie, but allows for Pythonic queries of the index. If the words 'cat' and 'cater' are in the list, you'll get this in the dictionary:
d = { 'c': ['cat','cater'],
'ca': ['cat','cater'],
'cat': ['cat','cater'],
'cate': ['cater'],
'cater': ['cater'],
}
So if the current_word is 'ca' you know that it is a valid prefix because 'ca' in d returns True (so continue the board traversal). And if the current_word is 'cat' then you know that it is a valid word because it is a valid prefix and 'cat' in d['cat'] returns True too.
If felt like this allowed for some readable code that doesn't seem too slow. Like everyone else the expense in this system is reading/building the index. Solving the board is pretty much noise.
The code is at http://gist.github.com/268079. It is intentionally vertical and naive with lots of explicit validity checking because I wanted to understand the problem without crufting it up with a bunch of magic or obscurity.
I wrote my solver in C++. I implemented a custom tree structure. I'm not sure it can be considered a trie but it's similar. Each node has 26 branches, 1 for each letter of the alphabet. I traverse the branches of the boggle board in parallel with the branches of my dictionary. If the branch does not exist in the dictionary, I stop searching it on the Boggle board. I convert all the letters on the board to ints. So 'A' = 0. Since it's just arrays, lookup is always O(1). Each node stores if it completes a word and how many words exist in its children. The tree is pruned as words are found to reduce repeatedly searching for the same words. I believe pruning is also O(1).
CPU: Pentium SU2700 1.3GHz
RAM: 3gb
Loads dictionary of 178,590 words in < 1 second.
Solves 100x100 Boggle (boggle.txt) in 4 seconds. ~44,000 words found.
Solving a 4x4 Boggle is too fast to provide a meaningful benchmark. :)
Fast Boggle Solver GitHub Repo
Given a Boggle board with N rows and M columns, let's assume the following:
N*M is substantially greater than the number of possible words
N*M is substantially greater than the longest possible word
Under these assumptions, the complexity of this solution is O(N*M).
I think comparing running times for this one example board in many ways misses the point but, for the sake of completeness, this solution completes in <0.2s on my modern MacBook Pro.
This solution will find all possible paths for each word in the corpus.
#!/usr/bin/env ruby
# Example usage: ./boggle-solver --board "fxie amlo ewbx astu"
autoload :Matrix, 'matrix'
autoload :OptionParser, 'optparse'
DEFAULT_CORPUS_PATH = '/usr/share/dict/words'.freeze
# Functions
def filter_corpus(matrix, corpus, min_word_length)
board_char_counts = Hash.new(0)
matrix.each { |c| board_char_counts[c] += 1 }
max_word_length = matrix.row_count * matrix.column_count
boggleable_regex = /^[#{board_char_counts.keys.reduce(:+)}]{#{min_word_length},#{max_word_length}}$/
corpus.select{ |w| w.match boggleable_regex }.select do |w|
word_char_counts = Hash.new(0)
w.each_char { |c| word_char_counts[c] += 1 }
word_char_counts.all? { |c, count| board_char_counts[c] >= count }
end
end
def neighbors(point, matrix)
i, j = point
([i-1, 0].max .. [i+1, matrix.row_count-1].min).inject([]) do |r, new_i|
([j-1, 0].max .. [j+1, matrix.column_count-1].min).inject(r) do |r, new_j|
neighbor = [new_i, new_j]
neighbor.eql?(point) ? r : r << neighbor
end
end
end
def expand_path(path, word, matrix)
return [path] if path.length == word.length
next_char = word[path.length]
viable_neighbors = neighbors(path[-1], matrix).select do |point|
!path.include?(point) && matrix.element(*point).eql?(next_char)
end
viable_neighbors.inject([]) do |result, point|
result + expand_path(path.dup << point, word, matrix)
end
end
def find_paths(word, matrix)
result = []
matrix.each_with_index do |c, i, j|
result += expand_path([[i, j]], word, matrix) if c.eql?(word[0])
end
result
end
def solve(matrix, corpus, min_word_length: 3)
boggleable_corpus = filter_corpus(matrix, corpus, min_word_length)
boggleable_corpus.inject({}) do |result, w|
paths = find_paths(w, matrix)
result[w] = paths unless paths.empty?
result
end
end
# Script
options = { corpus_path: DEFAULT_CORPUS_PATH }
option_parser = OptionParser.new do |opts|
opts.banner = 'Usage: boggle-solver --board <value> [--corpus <value>]'
opts.on('--board BOARD', String, 'The board (e.g. "fxi aml ewb ast")') do |b|
options[:board] = b
end
opts.on('--corpus CORPUS_PATH', String, 'Corpus file path') do |c|
options[:corpus_path] = c
end
opts.on_tail('-h', '--help', 'Shows usage') do
STDOUT.puts opts
exit
end
end
option_parser.parse!
unless options[:board]
STDERR.puts option_parser
exit false
end
unless File.file? options[:corpus_path]
STDERR.puts "No corpus exists - #{options[:corpus_path]}"
exit false
end
rows = options[:board].downcase.scan(/\S+/).map{ |row| row.scan(/./) }
raw_corpus = File.readlines(options[:corpus_path])
corpus = raw_corpus.map{ |w| w.downcase.rstrip }.uniq.sort
solution = solve(Matrix.rows(rows), corpus)
solution.each_pair do |w, paths|
STDOUT.puts w
paths.each do |path|
STDOUT.puts "\t" + path.map{ |point| point.inspect }.join(', ')
end
end
STDOUT.puts "TOTAL: #{solution.count}"
This solution also gives the direction to search in the given board
Algo:
1. Uses trie to save all the word in the english to fasten the search
2. The uses DFS to search the words in Boggle
Output:
Found "pic" directions from (4,0)(p) go → →
Found "pick" directions from (4,0)(p) go → → ↑
Found "pickman" directions from (4,0)(p) go → → ↑ ↑ ↖ ↑
Found "picket" directions from (4,0)(p) go → → ↑ ↗ ↖
Found "picked" directions from (4,0)(p) go → → ↑ ↗ ↘
Found "pickle" directions from (4,0)(p) go → → ↑ ↘ →
Code:
from collections import defaultdict
from nltk.corpus import words
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
english_words = words.words()
# If you wan to remove stop words
# stop_words = set(stopwords.words('english'))
# english_words = [w for w in english_words if w not in stop_words]
boggle = [
['c', 'n', 't', 's', 's'],
['d', 'a', 't', 'i', 'n'],
['o', 'o', 'm', 'e', 'l'],
['s', 'i', 'k', 'n', 'd'],
['p', 'i', 'c', 'l', 'e']
]
# Instead of X and Y co-ordinates
# better to use Row and column
lenc = len(boggle[0])
lenr = len(boggle)
# Initialize trie datastructure
trie_node = {'valid': False, 'next': {}}
# lets get the delta to find all the nighbors
neighbors_delta = [
(-1,-1, "↖"),
(-1, 0, "↑"),
(-1, 1, "↗"),
(0, -1, "←"),
(0, 1, "→"),
(1, -1, "↙"),
(1, 0, "↓"),
(1, 1, "↘"),
]
def gen_trie(word, node):
"""udpates the trie datastructure using the given word"""
if not word:
return
if word[0] not in node:
node[word[0]] = {'valid': len(word) == 1, 'next': {}}
# recursively build trie
gen_trie(word[1:], node[word[0]])
def build_trie(words, trie):
"""Builds trie data structure from the list of words given"""
for word in words:
gen_trie(word, trie)
return trie
def get_neighbors(r, c):
"""Returns the neighbors for a given co-ordinates"""
n = []
for neigh in neighbors_delta:
new_r = r + neigh[0]
new_c = c + neigh[1]
if (new_r >= lenr) or (new_c >= lenc) or (new_r < 0) or (new_c < 0):
continue
n.append((new_r, new_c, neigh[2]))
return n
def dfs(r, c, visited, trie, now_word, direction):
"""Scan the graph using DFS"""
if (r, c) in visited:
return
letter = boggle[r][c]
visited.append((r, c))
if letter in trie:
now_word += letter
if trie[letter]['valid']:
print('Found "{}" {}'.format(now_word, direction))
neighbors = get_neighbors(r, c)
for n in neighbors:
dfs(n[0], n[1], visited[::], trie[letter], now_word, direction + " " + n[2])
def main(trie_node):
"""Initiate the search for words in boggle"""
trie_node = build_trie(english_words, trie_node)
# print the board
print("Given board")
for i in range(lenr):print (boggle[i])
print ('\n')
for r in range(lenr):
for c in range(lenc):
letter = boggle[r][c]
dfs(r, c, [], trie_node, '', 'directions from ({},{})({}) go '.format(r, c, letter))
if __name__ == '__main__':
main(trie_node)
I have implemented a solution in OCaml. It pre-compiles a dictionary as a trie, and uses two-letter sequence frequencies to eliminate edges that could never appear in a word to further speed up processing.
It solves your example board in 0.35ms (with an additional 6ms start-up time which is mostly related to loading the trie into memory).
The solutions found:
["swami"; "emile"; "limbs"; "limbo"; "limes"; "amble"; "tubs"; "stub";
"swam"; "semi"; "seam"; "awes"; "buts"; "bole"; "boil"; "west"; "east";
"emil"; "lobs"; "limb"; "lime"; "lima"; "mesa"; "mews"; "mewl"; "maws";
"milo"; "mile"; "awes"; "amie"; "axle"; "elma"; "fame"; "ubs"; "tux"; "tub";
"twa"; "twa"; "stu"; "saw"; "sea"; "sew"; "sea"; "awe"; "awl"; "but"; "btu";
"box"; "bmw"; "was"; "wax"; "oil"; "lox"; "lob"; "leo"; "lei"; "lie"; "mes";
"mew"; "mae"; "maw"; "max"; "mil"; "mix"; "awe"; "awl"; "elm"; "eli"; "fax"]
A Node.JS JavaScript solution. Computes all 100 unique words in less than a second which includes reading dictionary file (MBA 2012).
Output:
["FAM","TUX","TUB","FAE","ELI","ELM","ELB","TWA","TWA","SAW","AMI","SWA","SWA","AME","SEA","SEW","AES","AWL","AWE","SEA","AWA","MIX","MIL","AST","ASE","MAX","MAE","MAW","MEW","AWE","MES","AWL","LIE","LIM","AWA","AES","BUT","BLO","WAS","WAE","WEA","LEI","LEO","LOB","LOX","WEM","OIL","OLM","WEA","WAE","WAX","WAF","MILO","EAST","WAME","TWAS","TWAE","EMIL","WEAM","OIME","AXIL","WEST","TWAE","LIMB","WASE","WAST","BLEO","STUB","BOIL","BOLE","LIME","SAWT","LIMA","MESA","MEWL","AXLE","FAME","ASEM","MILE","AMIL","SEAX","SEAM","SEMI","SWAM","AMBO","AMLI","AXILE","AMBLE","SWAMI","AWEST","AWEST","LIMAX","LIMES","LIMBU","LIMBO","EMBOX","SEMBLE","EMBOLE","WAMBLE","FAMBLE"]
Code:
var fs = require('fs')
var Node = function(value, row, col) {
this.value = value
this.row = row
this.col = col
}
var Path = function() {
this.nodes = []
}
Path.prototype.push = function(node) {
this.nodes.push(node)
return this
}
Path.prototype.contains = function(node) {
for (var i = 0, ii = this.nodes.length; i < ii; i++) {
if (this.nodes[i] === node) {
return true
}
}
return false
}
Path.prototype.clone = function() {
var path = new Path()
path.nodes = this.nodes.slice(0)
return path
}
Path.prototype.to_word = function() {
var word = ''
for (var i = 0, ii = this.nodes.length; i < ii; ++i) {
word += this.nodes[i].value
}
return word
}
var Board = function(nodes, dict) {
// Expects n x m array.
this.nodes = nodes
this.words = []
this.row_count = nodes.length
this.col_count = nodes[0].length
this.dict = dict
}
Board.from_raw = function(board, dict) {
var ROW_COUNT = board.length
, COL_COUNT = board[0].length
var nodes = []
// Replace board with Nodes
for (var i = 0, ii = ROW_COUNT; i < ii; ++i) {
nodes.push([])
for (var j = 0, jj = COL_COUNT; j < jj; ++j) {
nodes[i].push(new Node(board[i][j], i, j))
}
}
return new Board(nodes, dict)
}
Board.prototype.toString = function() {
return JSON.stringify(this.nodes)
}
Board.prototype.update_potential_words = function(dict) {
for (var i = 0, ii = this.row_count; i < ii; ++i) {
for (var j = 0, jj = this.col_count; j < jj; ++j) {
var node = this.nodes[i][j]
, path = new Path()
path.push(node)
this.dfs_search(path)
}
}
}
Board.prototype.on_board = function(row, col) {
return 0 <= row && row < this.row_count && 0 <= col && col < this.col_count
}
Board.prototype.get_unsearched_neighbours = function(path) {
var last_node = path.nodes[path.nodes.length - 1]
var offsets = [
[-1, -1], [-1, 0], [-1, +1]
, [ 0, -1], [ 0, +1]
, [+1, -1], [+1, 0], [+1, +1]
]
var neighbours = []
for (var i = 0, ii = offsets.length; i < ii; ++i) {
var offset = offsets[i]
if (this.on_board(last_node.row + offset[0], last_node.col + offset[1])) {
var potential_node = this.nodes[last_node.row + offset[0]][last_node.col + offset[1]]
if (!path.contains(potential_node)) {
// Create a new path if on board and we haven't visited this node yet.
neighbours.push(potential_node)
}
}
}
return neighbours
}
Board.prototype.dfs_search = function(path) {
var path_word = path.to_word()
if (this.dict.contains_exact(path_word) && path_word.length >= 3) {
this.words.push(path_word)
}
var neighbours = this.get_unsearched_neighbours(path)
for (var i = 0, ii = neighbours.length; i < ii; ++i) {
var neighbour = neighbours[i]
var new_path = path.clone()
new_path.push(neighbour)
if (this.dict.contains_prefix(new_path.to_word())) {
this.dfs_search(new_path)
}
}
}
var Dict = function() {
this.dict_array = []
var dict_data = fs.readFileSync('./web2', 'utf8')
var dict_array = dict_data.split('\n')
for (var i = 0, ii = dict_array.length; i < ii; ++i) {
dict_array[i] = dict_array[i].toUpperCase()
}
this.dict_array = dict_array.sort()
}
Dict.prototype.contains_prefix = function(prefix) {
// Binary search
return this.search_prefix(prefix, 0, this.dict_array.length)
}
Dict.prototype.contains_exact = function(exact) {
// Binary search
return this.search_exact(exact, 0, this.dict_array.length)
}
Dict.prototype.search_prefix = function(prefix, start, end) {
if (start >= end) {
// If no more place to search, return no matter what.
return this.dict_array[start].indexOf(prefix) > -1
}
var middle = Math.floor((start + end)/2)
if (this.dict_array[middle].indexOf(prefix) > -1) {
// If we prefix exists, return true.
return true
} else {
// Recurse
if (prefix <= this.dict_array[middle]) {
return this.search_prefix(prefix, start, middle - 1)
} else {
return this.search_prefix(prefix, middle + 1, end)
}
}
}
Dict.prototype.search_exact = function(exact, start, end) {
if (start >= end) {
// If no more place to search, return no matter what.
return this.dict_array[start] === exact
}
var middle = Math.floor((start + end)/2)
if (this.dict_array[middle] === exact) {
// If we prefix exists, return true.
return true
} else {
// Recurse
if (exact <= this.dict_array[middle]) {
return this.search_exact(exact, start, middle - 1)
} else {
return this.search_exact(exact, middle + 1, end)
}
}
}
var board = [
['F', 'X', 'I', 'E']
, ['A', 'M', 'L', 'O']
, ['E', 'W', 'B', 'X']
, ['A', 'S', 'T', 'U']
]
var dict = new Dict()
var b = Board.from_raw(board, dict)
b.update_potential_words()
console.log(JSON.stringify(b.words.sort(function(a, b) {
return a.length - b.length
})))
So I wanted to add another PHP way of solving this, since everyone loves PHP.
There's a little bit of refactoring I would like to do, like using a regexpression match against the dictionary file, but right now I'm just loading the entire dictionary file into a wordList.
I did this using a linked list idea. Each Node has a character value, a location value, and a next pointer.
The location value is how I found out if two nodes are connected.
1 2 3 4
11 12 13 14
21 22 23 24
31 32 33 34
So using that grid, I know two nodes are connected if the first node's location equals the second nodes location +/- 1 for the same row, +/- 9, 10, 11 for the row above and below.
I use recursion for the main search. It takes a word off the wordList, finds all the possible starting points, and then recursively finds the next possible connection, keeping in mind that it can't go to a location it's already using (which is why I add $notInLoc).
Anyway, I know it needs some refactoring, and would love to hear thoughts on how to make it cleaner, but it produces the correct results based on the dictionary file I'm using. Depending on the number of vowels and combinations on the board, it takes about 3 to 6 seconds. I know that once I preg_match the dictionary results, that will reduce significantly.
<?php
ini_set('xdebug.var_display_max_depth', 20);
ini_set('xdebug.var_display_max_children', 1024);
ini_set('xdebug.var_display_max_data', 1024);
class Node {
var $loc;
function __construct($value) {
$this->value = $value;
$next = null;
}
}
class Boggle {
var $root;
var $locList = array (1, 2, 3, 4, 11, 12, 13, 14, 21, 22, 23, 24, 31, 32, 33, 34);
var $wordList = [];
var $foundWords = [];
function __construct($board) {
// Takes in a board string and creates all the nodes
$node = new Node($board[0]);
$node->loc = $this->locList[0];
$this->root = $node;
for ($i = 1; $i < strlen($board); $i++) {
$node->next = new Node($board[$i]);
$node->next->loc = $this->locList[$i];
$node = $node->next;
}
// Load in a dictionary file
// Use regexp to elimate all the words that could never appear and load the
// rest of the words into wordList
$handle = fopen("dict.txt", "r");
if ($handle) {
while (($line = fgets($handle)) !== false) {
// process the line read.
$line = trim($line);
if (strlen($line) > 2) {
$this->wordList[] = trim($line);
}
}
fclose($handle);
} else {
// error opening the file.
echo "Problem with the file.";
}
}
function isConnected($node1, $node2) {
// Determines if 2 nodes are connected on the boggle board
return (($node1->loc == $node2->loc + 1) || ($node1->loc == $node2->loc - 1) ||
($node1->loc == $node2->loc - 9) || ($node1->loc == $node2->loc - 10) || ($node1->loc == $node2->loc - 11) ||
($node1->loc == $node2->loc + 9) || ($node1->loc == $node2->loc + 10) || ($node1->loc == $node2->loc + 11)) ? true : false;
}
function find($value, $notInLoc = []) {
// Returns a node with the value that isn't in a location
$current = $this->root;
while($current) {
if ($current->value == $value && !in_array($current->loc, $notInLoc)) {
return $current;
}
if (isset($current->next)) {
$current = $current->next;
} else {
break;
}
}
return false;
}
function findAll($value) {
// Returns an array of nodes with a specific value
$current = $this->root;
$foundNodes = [];
while ($current) {
if ($current->value == $value) {
$foundNodes[] = $current;
}
if (isset($current->next)) {
$current = $current->next;
} else {
break;
}
}
return (empty($foundNodes)) ? false : $foundNodes;
}
function findAllConnectedTo($node, $value, $notInLoc = []) {
// Returns an array of nodes that are connected to a specific node and
// contain a specific value and are not in a certain location
$nodeList = $this->findAll($value);
$newList = [];
if ($nodeList) {
foreach ($nodeList as $node2) {
if (!in_array($node2->loc, $notInLoc) && $this->isConnected($node, $node2)) {
$newList[] = $node2;
}
}
}
return (empty($newList)) ? false : $newList;
}
function inner($word, $list, $i = 0, $notInLoc = []) {
$i++;
foreach($list as $node) {
$notInLoc[] = $node->loc;
if ($list2 = $this->findAllConnectedTo($node, $word[$i], $notInLoc)) {
if ($i == (strlen($word) - 1)) {
return true;
} else {
return $this->inner($word, $list2, $i, $notInLoc);
}
}
}
return false;
}
function findWord($word) {
if ($list = $this->findAll($word[0])) {
return $this->inner($word, $list);
}
return false;
}
function findAllWords() {
foreach($this->wordList as $word) {
if ($this->findWord($word)) {
$this->foundWords[] = $word;
}
}
}
function displayBoard() {
$current = $this->root;
for ($i=0; $i < 4; $i++) {
echo $current->value . " " . $current->next->value . " " . $current->next->next->value . " " . $current->next->next->next->value . "<br />";
if ($i < 3) {
$current = $current->next->next->next->next;
}
}
}
}
function randomBoardString() {
return substr(str_shuffle(str_repeat("abcdefghijklmnopqrstuvwxyz", 16)), 0, 16);
}
$myBoggle = new Boggle(randomBoardString());
$myBoggle->displayBoard();
$x = microtime(true);
$myBoggle->findAllWords();
$y = microtime(true);
echo ($y-$x);
var_dump($myBoggle->foundWords);
?>
I know I am really late at the party but I have implemented, as a coding exercise, a boggle solver in several programming languages (C++, Java, Go, C#, Python, Ruby, JavaScript, Julia, Lua, PHP, Perl) and I thought that someone might be interested in those, so I leave link here:
https://github.com/AmokHuginnsson/boggle-solvers
Here is the solution Using Predefined words in NLTK toolkit
NLTK has nltk.corpus package in that we have package called words and it contains more than 2Lakhs English words you can simply use all into your program.
Once creating your matrix convert it into a character array and perform this code
import nltk
from nltk.corpus import words
from collections import Counter
def possibleWords(input, charSet):
for word in input:
dict = Counter(word)
flag = 1
for key in dict.keys():
if key not in charSet:
flag = 0
if flag == 1 and len(word)>5: #its depends if you want only length more than 5 use this otherwise remove that one.
print(word)
nltk.download('words')
word_list = words.words()
# prints 236736
print(len(word_list))
charSet = ['h', 'e', 'l', 'o', 'n', 'v', 't']
possibleWords(word_list, charSet)
Output:
eleven
eleventh
elevon
entente
entone
ethene
ethenol
evolve
evolvent
hellhole
helvell
hooven
letten
looten
nettle
nonene
nonent
nonlevel
notelet
novelet
novelette
novene
teenet
teethe
teevee
telethon
tellee
tenent
tentlet
theelol
toetoe
tonlet
toothlet
tootle
tottle
vellon
velvet
velveteen
venene
vennel
venthole
voeten
volent
volvelle
volvent
voteen
I hope you get it.
Here is my java implementation: https://github.com/zouzhile/interview/blob/master/src/com/interview/algorithms/tree/BoggleSolver.java
Trie build took 0 hours, 0 minutes, 1 seconds, 532 milliseconds
Word searching took 0 hours, 0 minutes, 0 seconds, 92 milliseconds
eel eeler eely eer eke eker eld eleut elk ell
elle epee epihippus ere erept err error erupt eurus eye
eyer eyey hip hipe hiper hippish hipple hippus his hish
hiss hist hler hsi ihi iphis isis issue issuer ist
isurus kee keek keeker keel keeler keep keeper keld kele
kelek kelep kelk kell kelly kelp kelper kep kepi kept
ker kerel kern keup keuper key kyl kyle lee leek
leeky leep leer lek leo leper leptus lepus ler leu
ley lleu lue lull luller lulu lunn lunt lunule luo
lupe lupis lupulus lupus lur lure lurer lush lushly lust
lustrous lut lye nul null nun nupe nurture nurturer nut
oer ore ort ouphish our oust out outpeep outpeer outpipe
outpull outpush output outre outrun outrush outspell outspue outspurn outspurt
outstrut outstunt outsulk outturn outusure oyer pee peek peel peele
peeler peeoy peep peeper peepeye peer pele peleus pell peller
pelu pep peplus pepper pepperer pepsis per pern pert pertussis
peru perule perun peul phi pip pipe piper pipi pipistrel
pipistrelle pipistrellus pipper pish piss pist plup plus plush ply
plyer psi pst puerer pul pule puler pulk pull puller
pulley pullus pulp pulper pulu puly pun punt pup puppis
pur pure puree purely purer purr purre purree purrel purrer
puru purupuru pus push puss pustule put putt puture ree
reek reeker reeky reel reeler reeper rel rely reoutput rep
repel repeller repipe reply repp reps reree rereel rerun reuel
roe roer roey roue rouelle roun roup rouper roust rout
roy rue ruelle ruer rule ruler rull ruller run runt
rupee rupert rupture ruru rus rush russ rust rustre rut
shi shih ship shipper shish shlu sip sipe siper sipper
sis sish sisi siss sissu sist sistrurus speel speer spelk
spell speller splurt spun spur spurn spurrer spurt sput ssi
ssu stre stree streek streel streeler streep streke streperous strepsis
strey stroup stroy stroyer strue strunt strut stu stue stull
stuller stun stunt stupe stupeous stupp sturnus sturt stuss stut
sue suer suerre suld sulk sulker sulky sull sully sulu
sun sunn sunt sunup sup supe super superoutput supper supple
supplely supply sur sure surely surrey sus susi susu susurr
susurrous susurrus sutu suture suu tree treey trek trekker trey
troupe trouper trout troy true truer trull truller truly trun
trush truss trust tshi tst tsun tsutsutsi tue tule tulle
tulu tun tunu tup tupek tupi tur turn turnup turr
turus tush tussis tussur tut tuts tutu tutulus ule ull
uller ulu ululu unreel unrule unruly unrun unrust untrue untruly
untruss untrust unturn unurn upper upperer uppish uppishly uppull uppush
upspurt upsun upsup uptree uptruss upturn ure urn uro uru
urus urushi ush ust usun usure usurer utu yee yeel
yeld yelk yell yeller yelp yelper yeo yep yer yere
yern yoe yor yore you youl youp your yourn yoy
Note:
I used the dictionary and character matrix at the beginning of this thread. The code was run on my MacBookPro, below is some information about the machine.
Model Name: MacBook Pro
Model Identifier: MacBookPro8,1
Processor Name: Intel Core i5
Processor Speed: 2.3 GHz
Number Of Processors: 1
Total Number Of Cores: 2
L2 Cache (per core): 256 KB
L3 Cache: 3 MB
Memory: 4 GB
Boot ROM Version: MBP81.0047.B0E
SMC Version (system): 1.68f96
I solved this too, with Java. My implementation is 269 lines long and pretty easy to use. First you need to create a new instance of the Boggler class and then call the solve function with the grid as a parameter. It takes about 100 ms to load the dictionary of 50 000 words on my computer and it finds the words in about 10-20 ms. The found words are stored in an ArrayList, foundWords.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
public class Boggler {
private ArrayList<String> words = new ArrayList<String>();
private ArrayList<String> roundWords = new ArrayList<String>();
private ArrayList<Word> foundWords = new ArrayList<Word>();
private char[][] letterGrid = new char[4][4];
private String letters;
public Boggler() throws FileNotFoundException, IOException, URISyntaxException {
long startTime = System.currentTimeMillis();
URL path = GUI.class.getResource("words.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(new File(path.toURI()).getAbsolutePath()), "iso-8859-1"));
String line;
while((line = br.readLine()) != null) {
if(line.length() < 3 || line.length() > 10) {
continue;
}
this.words.add(line);
}
}
public ArrayList<Word> getWords() {
return this.foundWords;
}
public void solve(String letters) {
this.letters = "";
this.foundWords = new ArrayList<Word>();
for(int i = 0; i < letters.length(); i++) {
if(!this.letters.contains(letters.substring(i, i + 1))) {
this.letters += letters.substring(i, i + 1);
}
}
for(int i = 0; i < 4; i++) {
for(int j = 0; j < 4; j++) {
this.letterGrid[i][j] = letters.charAt(i * 4 + j);
}
}
System.out.println(Arrays.deepToString(this.letterGrid));
this.roundWords = new ArrayList<String>();
String pattern = "[" + this.letters + "]+";
for(int i = 0; i < this.words.size(); i++) {
if(this.words.get(i).matches(pattern)) {
this.roundWords.add(this.words.get(i));
}
}
for(int i = 0; i < this.roundWords.size(); i++) {
Word word = checkForWord(this.roundWords.get(i));
if(word != null) {
System.out.println(word);
this.foundWords.add(word);
}
}
}
private Word checkForWord(String word) {
char initial = word.charAt(0);
ArrayList<LetterCoord> startPoints = new ArrayList<LetterCoord>();
int x = 0;
int y = 0;
for(char[] row: this.letterGrid) {
x = 0;
for(char letter: row) {
if(initial == letter) {
startPoints.add(new LetterCoord(x, y));
}
x++;
}
y++;
}
ArrayList<LetterCoord> letterCoords = null;
for(int initialTry = 0; initialTry < startPoints.size(); initialTry++) {
letterCoords = new ArrayList<LetterCoord>();
x = startPoints.get(initialTry).getX();
y = startPoints.get(initialTry).getY();
LetterCoord initialCoord = new LetterCoord(x, y);
letterCoords.add(initialCoord);
letterLoop: for(int letterIndex = 1; letterIndex < word.length(); letterIndex++) {
LetterCoord lastCoord = letterCoords.get(letterCoords.size() - 1);
char currentChar = word.charAt(letterIndex);
ArrayList<LetterCoord> letterLocations = getNeighbours(currentChar, lastCoord.getX(), lastCoord.getY());
if(letterLocations == null) {
return null;
}
for(int foundIndex = 0; foundIndex < letterLocations.size(); foundIndex++) {
if(letterIndex != word.length() - 1 && true == false) {
char nextChar = word.charAt(letterIndex + 1);
int lastX = letterCoords.get(letterCoords.size() - 1).getX();
int lastY = letterCoords.get(letterCoords.size() - 1).getY();
ArrayList<LetterCoord> possibleIndex = getNeighbours(nextChar, lastX, lastY);
if(possibleIndex != null) {
if(!letterCoords.contains(letterLocations.get(foundIndex))) {
letterCoords.add(letterLocations.get(foundIndex));
}
continue letterLoop;
} else {
return null;
}
} else {
if(!letterCoords.contains(letterLocations.get(foundIndex))) {
letterCoords.add(letterLocations.get(foundIndex));
continue letterLoop;
}
}
}
}
if(letterCoords != null) {
if(letterCoords.size() == word.length()) {
Word w = new Word(word);
w.addList(letterCoords);
return w;
} else {
return null;
}
}
}
if(letterCoords != null) {
Word foundWord = new Word(word);
foundWord.addList(letterCoords);
return foundWord;
}
return null;
}
public ArrayList<LetterCoord> getNeighbours(char letterToSearch, int x, int y) {
ArrayList<LetterCoord> neighbours = new ArrayList<LetterCoord>();
for(int _y = y - 1; _y <= y + 1; _y++) {
for(int _x = x - 1; _x <= x + 1; _x++) {
if(_x < 0 || _y < 0 || (_x == x && _y == y) || _y > 3 || _x > 3) {
continue;
}
if(this.letterGrid[_y][_x] == letterToSearch && !neighbours.contains(new LetterCoord(_x, _y))) {
neighbours.add(new LetterCoord(_x, _y));
}
}
}
if(neighbours.isEmpty()) {
return null;
} else {
return neighbours;
}
}
}
class Word {
private String word;
private ArrayList<LetterCoord> letterCoords = new ArrayList<LetterCoord>();
public Word(String word) {
this.word = word;
}
public boolean addCoords(int x, int y) {
LetterCoord lc = new LetterCoord(x, y);
if(!this.letterCoords.contains(lc)) {
this.letterCoords.add(lc);
return true;
}
return false;
}
public void addList(ArrayList<LetterCoord> letterCoords) {
this.letterCoords = letterCoords;
}
#Override
public String toString() {
String outputString = this.word + " ";
for(int i = 0; i < letterCoords.size(); i++) {
outputString += "(" + letterCoords.get(i).getX() + ", " + letterCoords.get(i).getY() + ") ";
}
return outputString;
}
public String getWord() {
return this.word;
}
public ArrayList<LetterCoord> getList() {
return this.letterCoords;
}
}
class LetterCoord extends ArrayList {
private int x;
private int y;
public LetterCoord(int x, int y) {
this.x = x;
this.y = y;
}
public int getX() {
return this.x;
}
public int getY() {
return this.y;
}
#Override
public boolean equals(Object o) {
if(!(o instanceof LetterCoord)) {
return false;
}
LetterCoord lc = (LetterCoord) o;
if(this.x == lc.getX() &&
this.y == lc.getY()) {
return true;
}
return false;
}
#Override
public int hashCode() {
int hash = 7;
hash = 29 * hash + this.x;
hash = 24 * hash + this.y;
return hash;
}
}
I solved this in c. It takes around 48 ms to run on my machine (with around 98% of the time spent loading the dictionary from disk and creating the trie). The dictionary is /usr/share/dict/american-english which has 62886 words.
Source code
I solved this perfectly and very fast. I put it into an android app. View the video at the play store link to see it in action.
Word Cheats is an app that "cracks" any matrix style word game. This app was built
to to help me cheat at word scrambler. It can be used for word searches,
ruzzle, words, word finder, word crack, boggle, and more!
It can be seen here
https://play.google.com/store/apps/details?id=com.harris.wordcracker
View the app in action in the video
https://www.youtube.com/watch?v=DL2974WmNAI

algorithm to sum up a list of numbers for all combinations

I have a list of numbers and I want to add up all the different combinations.
For example:
number as 1,4,7 and 13
the output would be:
1+4=5
1+7=8
1+13=14
4+7=11
4+13=17
7+13=20
1+4+7=12
1+4+13=18
1+7+13=21
4+7+13=24
1+4+7+13=25
Is there a formula to calculate this with different numbers?
A simple way to do this is to create a bit set with as much bits as there are numbers.
In your example 4.
Then count from 0001 to 1111 and sum each number that has a 1 on the set:
Numbers 1,4,7,13:
0001 = 13=13
0010 = 7=7
0011 = 7+13 = 20
1111 = 1+4+7+13 = 25
Here's how a simple recursive solution would look like, in Java:
public static void main(String[] args)
{
f(new int[] {1,4,7,13}, 0, 0, "{");
}
static void f(int[] numbers, int index, int sum, String output)
{
if (index == numbers.length)
{
System.out.println(output + " } = " + sum);
return;
}
// include numbers[index]
f(numbers, index + 1, sum + numbers[index], output + " " + numbers[index]);
// exclude numbers[index]
f(numbers, index + 1, sum, output);
}
Output:
{ 1 4 7 13 } = 25
{ 1 4 7 } = 12
{ 1 4 13 } = 18
{ 1 4 } = 5
{ 1 7 13 } = 21
{ 1 7 } = 8
{ 1 13 } = 14
{ 1 } = 1
{ 4 7 13 } = 24
{ 4 7 } = 11
{ 4 13 } = 17
{ 4 } = 4
{ 7 13 } = 20
{ 7 } = 7
{ 13 } = 13
{ } = 0
The best-known algorithm requires exponential time. If there were a polynomial-time algorithm, then you would solve the subset sum problem, and thus the P=NP problem.
The algorithm here is to create bitvector of length that is equal to the cardinality of your set of numbers. Fix an enumeration (n_i) of your set of numbers. Then, enumerate over all possible values of the bitvector. For each enumeration (e_i) of the bitvector, compute the sum of e_i * n_i.
The intuition here is that you are representing the subsets of your set of numbers by a bitvector and generating all possible subsets of the set of numbers. When bit e_i is equal to one, n_i is in the subset, otherwise it is not.
The fourth volume of Knuth's TAOCP provides algorithms for generating all possible values of the bitvector.
C#:
I was trying to find something more elegant - but this should do the trick for now...
//Set up our array of integers
int[] items = { 1, 3, 5, 7 };
//Figure out how many bitmasks we need...
//4 bits have a maximum value of 15, so we need 15 masks.
//Calculated as:
// (2 ^ ItemCount) - 1
int len = items.Length;
int calcs = (int)Math.Pow(2, len) - 1;
//Create our array of bitmasks... each item in the array
//represents a unique combination from our items array
string[] masks = Enumerable.Range(1, calcs).Select(i => Convert.ToString(i, 2).PadLeft(len, '0')).ToArray();
//Spit out the corresponding calculation for each bitmask
foreach (string m in masks)
{
//Get the items from our array that correspond to
//the on bits in our mask
int[] incl = items.Where((c, i) => m[i] == '1').ToArray();
//Write out our mask, calculation and resulting sum
Console.WriteLine(
"[{0}] {1}={2}",
m,
String.Join("+", incl.Select(c => c.ToString()).ToArray()),
incl.Sum()
);
}
Outputs as:
[0001] 7=7
[0010] 5=5
[0011] 5+7=12
[0100] 3=3
[0101] 3+7=10
[0110] 3+5=8
[0111] 3+5+7=15
[1000] 1=1
[1001] 1+7=8
[1010] 1+5=6
[1011] 1+5+7=13
[1100] 1+3=4
[1101] 1+3+7=11
[1110] 1+3+5=9
[1111] 1+3+5+7=16
Here is a simple recursive Ruby implementation:
a = [1, 4, 7, 13]
def add(current, ary, idx, sum)
(idx...ary.length).each do |i|
add(current + [ary[i]], ary, i+1, sum + ary[i])
end
puts "#{current.join('+')} = #{sum}" if current.size > 1
end
add([], a, 0, 0)
Which prints
1+4+7+13 = 25
1+4+7 = 12
1+4+13 = 18
1+4 = 5
1+7+13 = 21
1+7 = 8
1+13 = 14
4+7+13 = 24
4+7 = 11
4+13 = 17
7+13 = 20
If you do not need to print the array at each step, the code can be made even simpler and much faster because no additional arrays are created:
def add(ary, idx, sum)
(idx...ary.length).each do |i|
add(ary, i+1, sum + ary[i])
end
puts sum
end
add(a, 0, 0)
I dont think you can have it much simpler than that.
Mathematica solution:
{#, Total##}& /# Subsets[{1, 4, 7, 13}] //MatrixForm
Output:
{} 0
{1} 1
{4} 4
{7} 7
{13} 13
{1,4} 5
{1,7} 8
{1,13} 14
{4,7} 11
{4,13} 17
{7,13} 20
{1,4,7} 12
{1,4,13} 18
{1,7,13} 21
{4,7,13} 24
{1,4,7,13} 25
This Perl program seems to do what you want. It goes through the different ways to choose n items from k items. It's easy to calculate how many combinations there are, but getting the sums of each combination means you have to add them eventually. I had a similar question on Perlmonks when I was asking How can I calculate the right combination of postage stamps?.
The Math::Combinatorics module can also handle many other cases. Even if you don't want to use it, the documentation has a lot of pointers to other information about the problem. Other people might be able to suggest the appropriate library for the language you'd like to you.
#!/usr/bin/perl
use List::Util qw(sum);
use Math::Combinatorics;
my #n = qw(1 4 7 13);
foreach my $count ( 2 .. #n ) {
my $c = Math::Combinatorics->new(
count => $count, # number to choose
data => [#n],
);
print "combinations of $count from: [" . join(" ",#n) . "]\n";
while( my #combo = $c->next_combination ){
print join( ' ', #combo ), " = ", sum( #combo ) , "\n";
}
}
You can enumerate all subsets using a bitvector.
In a for loop, go from 0 to 2 to the Nth power minus 1 (or start with 1 if you don't care about the empty set).
On each iteration, determine which bits are set. The Nth bit represents the Nth element of the set. For each set bit, dereference the appropriate element of the set and add to an accumulated value.
ETA: Because the nature of this problem involves exponential complexity, there's a practical limit to size of the set you can enumerate on. If it turns out you don't need all subsets, you can look up "n choose k" for ways of enumerating subsets of k elements.
PHP: Here's a non-recursive implementation. I'm not saying this is the most efficient way to do it (this is indeed exponential 2^N - see JasonTrue's response and comments), but it works for a small set of elements. I just wanted to write something quick to obtain results. I based the algorithm off Toon's answer.
$set = array(3, 5, 8, 13, 19);
$additions = array();
for($i = 0; $i < pow(2, count($set)); $i++){
$sum = 0;
$addends = array();
for($j = count($set)-1; $j >= 0; $j--) {
if(pow(2, $j) & $i) {
$sum += $set[$j];
$addends[] = $set[$j];
}
}
$additions[] = array($sum, $addends);
}
sort($additions);
foreach($additions as $addition){
printf("%d\t%s\n", $addition[0], implode('+', $addition[1]));
}
Which will output:
0
3 3
5 5
8 8
8 5+3
11 8+3
13 13
13 8+5
16 13+3
16 8+5+3
18 13+5
19 19
21 13+8
21 13+5+3
22 19+3
24 19+5
24 13+8+3
26 13+8+5
27 19+8
27 19+5+3
29 13+8+5+3
30 19+8+3
32 19+13
32 19+8+5
35 19+13+3
35 19+8+5+3
37 19+13+5
40 19+13+8
40 19+13+5+3
43 19+13+8+3
45 19+13+8+5
48 19+13+8+5+3
For example, a case for this could be a set of resistance bands for working out. Say you get 5 bands each having different resistances represented in pounds and you can combine bands to sum up the total resistance. The bands resistances are 3, 5, 8, 13 and 19 pounds. This set gives you 32 (2^5) possible configurations, minus the zero. In this example, the algorithm returns the data sorted by ascending total resistance favoring efficient band configurations first, and for each configuration the bands are sorted by descending resistance.
This is not the code to generate the sums, but it generates the permutations. In your case:
1; 1,4; 1,7; 4,7; 1,4,7; ...
If I have a moment over the weekend, and if it's interesting, I can modify this to come up with the sums.
It's just a fun chunk of LINQ code from Igor Ostrovsky's blog titled "7 tricks to simplify your programs with LINQ" (http://igoro.com/archive/7-tricks-to-simplify-your-programs-with-linq/).
T[] arr = …;
var subsets = from m in Enumerable.Range(0, 1 << arr.Length)
select
from i in Enumerable.Range(0, arr.Length)
where (m & (1 << i)) != 0
select arr[i];
You might be interested in checking out the GNU Scientific Library if you want to avoid maintenance costs. The actual process of summing longer sequences will become very expensive (more-so than generating a single permutation on a step basis), most architectures have SIMD/vector instructions that can provide rather impressive speed-up (I would provide examples of such implementations but I cannot post URLs yet).
Thanks Zach,
I am creating a Bank Reconciliation solution. I dropped your code into jsbin.com to do some quick testing and produced this in Javascript:
function f(numbers,ids, index, sum, output, outputid, find )
{
if (index == numbers.length){
var x ="";
if (find == sum) {
y= output + " } = " + sum + " " + outputid + " }<br/>" ;
}
return;
}
f(numbers,ids, index + 1, sum + numbers[index], output + " " + numbers[index], outputid + " " + ids[index], find);
f(numbers,ids, index + 1, sum, output, outputid,find);
}
var y;
f( [1.2,4,7,13,45,325,23,245,78,432,1,2,6],[1,2,3,4,5,6,7,8,9,10,11,12,13], 0, 0, '{','{', 24.2);
if (document.getElementById('hello')) {
document.getElementById('hello').innerHTML = y;
}
I need it to produce a list of ID's to exclude from the next matching number.
I will post back my final solution using vb.net
v=[1,2,3,4]#variables to sum
i=0
clis=[]#check list for solution excluding the variables itself
def iterate(lis,a,b):
global i
global clis
while len(b)!=0 and i<len(lis):
a=lis[i]
b=lis[i+1:]
if len(b)>1:
t=a+sum(b)
clis.append(t)
for j in b:
clis.append(a+j)
i+=1
iterate(lis,a,b)
iterate(v,0,v)
its written in python. the idea is to break the list in a single integer and a list for eg. [1,2,3,4] into 1,[2,3,4]. we append the total sum now by adding the integer and sum of remaining list.also we take each individual sum i.e 1,2;1,3;1,4. checklist shall now be [1+2+3+4,1+2,1+3,1+4] then we call the new list recursively i.e now int=2,list=[3,4]. checklist will now append [2+3+4,2+3,2+4] accordingly we append the checklist till list is empty.
set is the set of sums and list is the list of the original numbers.
Its Java.
public void subSums() {
Set<Long> resultSet = new HashSet<Long>();
for(long l: list) {
for(long s: set) {
resultSet.add(s);
resultSet.add(l + s);
}
resultSet.add(l);
set.addAll(resultSet);
resultSet.clear();
}
}
public static void main(String[] args) {
// this is an example number
long number = 245L;
int sum = 0;
if (number > 0) {
do {
int last = (int) (number % 10);
sum = (sum + last) % 9;
} while ((number /= 10) > 0);
System.err.println("s = " + (sum==0 ? 9:sum);
} else {
System.err.println("0");
}
}

Resources