Any algorithm to find the double trouble number? - algorithm

I was trying to code the double trouble number problem, but I have not been able to settle on an algorithm yet.
Does anybody have an idea?
Problem Statement -
The numbers have the following property -
Whenever you would right-rotate the number (that is, take away the
last digit and put it in front of the number), you would end up with
double the original number. Numbers possessing this property were
called double-trouble numbers. For example, X = 421052631578947368 is
a double-trouble number, since 2X = 842105263157894736 which is a
right rotation of X.
The number X is a double-trouble number in the number system with base 10. Any number system with base p >= 2, however, has many such double-trouble numbers. In the binary number system (base p = 2), for example, we have the double-trouble numbers 01 and 0101. Notice that the leading zeros are necessary here in order to obtain the proper number after right rotation.
In the binary number system the smallest double-trouble number is 01. In the decimal (p = 10) number system, the smallest double-trouble number is 052631578947368421. I need to write a program that computes, for a given base p of a number system, the smallest double-trouble number in that system.

Here's the brute force solution in JavaScript.
It starts with a digit, then prepends the double of the previous digit (plus carry).
After each iteration it tests whether the digits form a double-trouble number (it also tries prepending "0", which is the corner/ambiguous case).
This implementation is only for base 10; you'll have to understand the algorithm and modify the code to create an arbitrary-base abstraction.
Double Trouble Solver for base 10
// (digits * 2) == digits[n]:digits[1..n-1]
function isDT(digits) {
var times2 = "";
var carry = false;
for(var i = digits.length-1; i >= 0; i--) {
var d = parseInt(digits.charAt(i));
var d2 = "" + (d * 2 + (carry ? 1 : 0));
carry = d2.length > 1;
times2 = d2.charAt(d2.length > 1 ? 1 : 0) + times2;
}
if(carry) { times2 = "1" + times2; }
return times2 == (digits.charAt(digits.length -1) + digits.substring(0, digits.length -1));
}
// generate a double-trouble number from a starting digit
function makeDT(digits, carry) {
var carry = carry || false;
var digits = "" + digits;
if(carry && isDT("1" + digits)) {
return "1" + digits;
} else if(isDT(digits)) {
return digits;
} else if(isDT("0" + digits)) {
return "0" + digits;
}
var d = digits.charAt(0);
var d2 = "" + (d * 2 + (carry ? 1 : 0));
carry = d2.length > 1;
digits = d2.charAt(d2.length > 1 ? 1 : 0) + digits;
return makeDT(digits, carry);
}
//
alert(makeDT("9"));
alert(makeDT("8"));
alert(makeDT("7"));
alert(makeDT("6"));
alert(makeDT("5"));
alert(makeDT("4"));
alert(makeDT("3"));
alert(makeDT("2"));
alert(makeDT("1"));
EDIT Here's the jsfiddle http://jsfiddle.net/avbfae0w/
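The base-10 restriction above can be lifted with a little bookkeeping. What follows is a minimal sketch, not the code from the fiddle: the function names are invented for illustration, and it assumes that "smallest" means smallest numeric value, with leading zeros allowed. It builds the digits from right to left as described (next digit = double the previous digit plus carry, in base p) and stops when the digit/carry pair returns to its starting state, at which point doubling the number reproduces its right rotation.

```javascript
// Hypothetical helper: digits (most significant first) of the
// double-trouble number in base p whose last digit is d.
function doubleTroubleEndingIn(p, d) {
  const digits = [];
  let digit = d, carry = 0;
  do {
    digits.unshift(digit);
    const t = 2 * digit + carry;   // double the current digit, add carry
    digit = t % p;                 // next digit to the left
    carry = Math.floor(t / p);     // carry for the following step
  } while (digit !== d || carry !== 0); // cycle closed: rotation matches
  return digits;
}

// Numeric value of a digit array, as a BigInt (assumption: used only to
// compare candidates that may have different lengths).
function digitsToBigInt(digits, p) {
  return digits.reduce((acc, x) => acc * BigInt(p) + BigInt(x), 0n);
}

// Try every possible last digit and keep the candidate with the smallest value.
function smallestDoubleTrouble(p) {
  let best = null, bestValue = null;
  for (let d = 1; d < p; d++) {
    const candidate = doubleTroubleEndingIn(p, d);
    const value = digitsToBigInt(candidate, p);
    if (bestValue === null || value < bestValue) {
      best = candidate;
      bestValue = value;
    }
  }
  return best;
}

console.log(smallestDoubleTrouble(10).join('')); // 052631578947368421
console.log(smallestDoubleTrouble(2).join(''));  // 01
```

The result is returned as an array of digit values rather than a string, so that bases above 10 do not need a digit alphabet.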

Related

Compact way to produce a large sequence of strings in lexical order

I want to generate a sequence of strings with the following properties:
Lexically ordered
Theoretically infinite
Compact over a realistic range
Generated by a simple process of incrementation
Matches the regexp /\w+/
The obvious way to generate a lexically-ordered sequence is to choose a string length and pad the strings with a base value like this: 000000, 000001, etc. This approach poses a trade-off between the number of permutations and compactness: a string long enough to yield many permutations will be filled with many zeros along the way. Plus, the length I choose sets an upper bound on the total number of permutations unless I have some mechanism for expanding the string when it maxes out.
So I came up with a sequence that works like this:
Each string consists of a "head", which is a base-36 number, followed by an underscore, and then the "tail", which is also a base-36 number padded by an increasing number of zeros
The first cycle goes from 0_0 to 0_z
The second cycle goes from 1_00 to 1_zz
The third cycle goes from 2_000 to 2_zzz, and so on
Once the head has reached z and the tail consists of 36 zs, the first "supercycle" has ended. Now the whole sequence starts over, except the z remains at the beginning, so the new cycle starts with z0_0, then continues to z1_00, and so on
The second supercycle goes zz0_0, zz1_00, and so on
Although the string of zs in the head could become unwieldy over the long run, a single supercycle contains over 10^56 permutations, which is far more than I ever expect to use. The sequence is theoretically infinite but very compact within a realistic range. For instance, the trillionth permutation is a succinct 7_bqd55h8s.
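The size claim can be checked with BigInt arithmetic. This is a small sketch (supercycleSize is a name introduced here, not part of my generator), assuming that cycle k of a supercycle contributes base^(k+1) strings, as in the scheme described above:

```javascript
// Count the strings in one supercycle: base^1 + base^2 + ... + base^base.
function supercycleSize(base) {
  const b = BigInt(base);
  let total = 0n;
  for (let k = 1n; k <= b; k++) {
    total += b ** k;
  }
  return total;
}

console.log(supercycleSize(36) > 10n ** 56n); // true
console.log(supercycleSize(3));               // 39n (3 + 9 + 27)
```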
I can generate the sequence relatively simply with this javascript function:
function genStr (n) {
n = BigInt(n);
let prefix = "",
cycle = 0n,
max = 36n ** (cycle + 1n);
while (n >= max) {
n -= max;
if (cycle === 35n) {
prefix += "z";
cycle = 0n;
} else {
cycle++;
}
max = 36n ** (cycle + 1n);
}
return prefix
+ cycle.toString(36)
+ "_"
+ n.toString(36).padStart(Number(cycle) + 1, 0);
}
The n parameter is a number that I increment and pass to the function to get the next member of the sequence. All I need to keep track of is a simple integer, making the sequence very easy to use.
So obviously I spent a lot of time on this and I think it's pretty good, but I'm wondering if there is a better way. Is there a good algorithm for generating a sequence along the lines of the one I'm looking for?
An idea close to yours (more refined than my first edit...).
Let our alphabet be A = {0,1,2,3}.
Let |2| mean we iterate from 0 to 2 and |2|^2 mean we generate the cartesian product in a lexically sorted manner (00,01,02,10,...,22).
We start with
0 |3|
So we have a string of length 2. We "unshift" the digit 1 which "factorizes" since any 0|3|... is less than 1|3|^2.
1 |3|^2
Same idea: unshift 2, and make words of length 4.
2 |3|^3
Now we can continue and generate
3 |2| |3|^3
Notice |2| and not |3|. Now our maximum number becomes 32333. And as you did, we can now add the carry and start a new supercycle:
33 0|3|
This is a slight improvement, since _ can now be part of our alphabet: we don't need to reserve it as a token separator.
In our case we can represent in a supercycle:
n + n^2 + ... + n^(n-1) + (n-1) * n^(n-1)
\-----------------------/\--------------/
geometric special
In your case, the special part would be n^n (with the nuance that you theoretically have one char less, so replace n with n-1 everywhere)
The proposed supercycle is of length :
P = (n \sum_{k = 0}^{n-2} n^k) + (n-1) * n^(n-1)
P = (n \sum_{k = 0}^{n-3} n^k) + n^n
P = n(n^{n-2} - 1)/(n-1) + n^n
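As a sanity check on the algebra, the closed form can be compared with the term-by-term sum. This is only a sketch; both function names are invented for the illustration.

```javascript
// Supercycle length as the literal sum:
// n + n^2 + ... + n^(n-1), plus the special part (n-1) * n^(n-1).
function supercycleBySum(n) {
  const b = BigInt(n);
  let geometric = 0n;
  for (let k = 1n; k <= b - 1n; k++) {
    geometric += b ** k;
  }
  return geometric + (b - 1n) * b ** (b - 1n);
}

// Supercycle length from the closed form P = n(n^(n-2) - 1)/(n-1) + n^n.
// The BigInt division is exact because n^(n-2) - 1 is a multiple of n - 1.
function supercycleClosedForm(n) {
  const b = BigInt(n);
  return b * (b ** (b - 2n) - 1n) / (b - 1n) + b ** b;
}

console.log(supercycleBySum(3), supercycleClosedForm(3)); // 30n 30n
```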
Here is an example diff with alphabet A={0,1,2}
my genStr(grandinero)
,00 0_0
,01 0_1
,02 0_2
,100 1_00
,101 1_01
,102 1_02
,110 1_10
,111 1_11
,112 1_12
,120 1_20
,121 1_21
,122 1_22
,2000 2_000
,2001 2_001
,2002 2_002
,2010 2_010
,2011 2_011
,2012 2_012
,2020 2_020
,2021 2_021
,2022 2_022
,2100 2_100
,2101 2_101
,2102 2_102
,2110 2_110
,2111 2_111
,2112 2_112
,2120 2_120
,2121 2_121
,2122 2_122
22,00 2_200 <-- end of my supercycle if no '_' allowed
22,01 2_201
22,02 2_202
22,100 2_210
22,101 2_211
22,102 2_212
22,110 2_220
22,111 2_221
22,112 2_222 <-- end of yours
22,120 z0_0
That said, for a given number x, we can count how many supercycles (E(x / P)) there are, each supercycle making two leading e (e being the last char of A).
e.g: A = {0,1,2} and x = 43
e = 2
P = n(n^{n-2} - 1)/(n-1) + n^n = 3(3^1 -1)/2 + 27 = 30
// our supercycle is of length 30
E(43/30) = 1 // 43 makes one supercycle and a few more "strings"
r = x % P = 13 // this is also x - (E(43/30) * 30) (the remainder of the Euclidean division by P)
Then for the leftover (r = x % P) there are two cases to consider:
either we fall in the geometric sequence,
or we fall in the (n-1) * n^(n-1) part.
1. Addressing the geometric sequence with cumulative sums (x < S_w)
Let S_i be the cumsum of n, n^2,..
S_i = n\sum_{k = 0}^{i-1} n^k
S_i = n/(n-1)*(n^i - 1)
which gives S_0 = 0, S_1 = n, S_2 = n + n^2...
So basically, if x < S_1, we get 0(x), elif x < S_2, we get 1(x-S_1)
Let S_w = S_{n-1} the count of all the numbers we can represent.
If x < S_w then we want the i such that
S_i <= x < S_{i+1} <=> n^i <= (n-1)/n * x + 1 < n^{i+1}
We can then apply some log flooring (base(n)) to get that i.
We can then associate the string: A[i] + base_n(x - S_i).
Illustration:
This time with A = {0,1,2,3}.
Let x be 17.
Our consecutive S_i are:
S_0 = 0
S_1 = 4
S_2 = S_1 + 4^2 = 20
S_3 = S_2 + 4^3 = 84
S_w = S_{4-1} = S_3 = 84
x=17 is indeed less than 84, so we will be able to assign it to one of the S_i ranges.
In particular S_1==4 <= x==17 < S_2==20.
We remove the strings encoded by the leading 0 (there are S_1 of those strings).
The position to encode with the leading 1 is
x - 4 = 13.
And we conclude that the string at position 13 among those with a leading 1 is base_4(13) = '31' (i.e. the full string is '131').
Should we have had x = 21, we would have removed the count of S_2 so 21-20 = 1, which in turn gives with a leading 2 the string '2001'.
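The illustration above can be reproduced with a short lookup that walks the blocks one by one instead of using the logarithm. This is only a sketch under the same 0-based convention (x = 0 is the first string); geometricString is a hypothetical helper and not the function given further down.

```javascript
// Map a position x in the geometric part to its string.
// Block i has head alphabet[i] and holds n^(i+1) strings.
function geometricString(alphabet, x) {
  const n = BigInt(alphabet.length);
  let r = BigInt(x);
  for (let i = 0; i < alphabet.length - 1; i++) {
    const blockSize = n ** BigInt(i + 1);
    if (r < blockSize) {
      // Write r in base n with exactly i + 1 digits.
      let tail = '';
      for (let k = 0; k <= i; k++) {
        tail = alphabet[Number(r % n)] + tail;
        r /= n; // BigInt division truncates
      }
      return alphabet[i] + tail;
    }
    r -= blockSize; // skip this block (this is the subtraction of S_i)
  }
  return null; // x >= S_w: the position belongs to the special part
}

console.log(geometricString('0123', 17)); // 131
console.log(geometricString('0123', 21)); // 2001
```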
2. Addressing x in the special part (x >= S_w)
Let's consider the case study below:
with A = {0,1,2}
The special part is
2 |1| |2|^2
that is:
2 0 00
2 0 01
2 0 02
2 0 10
2 0 11
2 0 12
2 0 20
2 0 21
2 0 22
2 1 00
2 1 01
2 1 02
2 1 10
2 1 11
2 1 12
2 1 20
2 1 21
2 1 22
Each incremented number of the second column (here 0 to 1 (specified from |1|)) gives 3^2 combination.
This is similar to the geometric series except that here each range is constant. We want to find the range which means we know which string to prefix.
We can represent it as the matrix
20 (00,01,02,10,11,12,20,21,22)
21 (00,01,02,10,11,12,20,21,22)
The portion in parenthesis is our matrix.
Every item in a row is simply its position base_3 (left-padded with 0).
e.g: n=7 has base_3 value '21'. (7=2*3+1).
'21' does occur in position 7 in the row.
Assuming we get some x (relative to that special part).
E(x / 3^2) gives us the row number (here E(7/9) = 0 so prefix is '20')
x % 3^2 give us the position in the row (here base_3(7%9)='21' giving us the final string '2021')
If we want to observe it, remember that we subtracted S_w=12 beforehand to get x = 7, so we would call myGen(7+12)
Some code
Notice the same output as long as we stand in the "geometric" range, without supercycle.
Obviously, when carry starts to appear, it depends on whether I can use '_' or not. If yes, my words get shorter otherwise longer.
// https://www.cs.sfu.ca/~ggbaker/zju/math/int-alg.html
// \w insensitive could give base64
// but also éè and other accents...
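function base_n(x, n, A) {
// zero would otherwise produce an empty string (e.g. a missing head in genStr below)
if (x === 0n) return A[0]
const a = []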
while (x !== 0n) {
a.push(A[Number(x % n)])
x = x / n // auto floor with bigInt
}
return a.reverse().join('')
}
function mygen (A) {
const n = A.length
const bn = BigInt(n)
const A_last = A[A.length-1]
const S = Array(n).fill(0).map((x, i) => bn * (bn ** BigInt(i) - 1n) / (bn - 1n))
const S_w = S[n-1]
const w = S_w + (bn - 1n) * bn ** (bn - 1n)
const w2 = bn ** (bn - 1n)
const flog_bn = x => {
// https://math.stackexchange.com/questions/1627914/smart-way-to-calculate-floorlogx
let L = 0
while (x >= bn) {
L++
x /= bn
}
return L
}
return function (x) {
x = BigInt(x)
let r = x % w
const q = (x - r) / w
let s
if (r < S_w) {
const i = flog_bn(r * (bn - 1n) / bn + 1n)
const r2 = r - S[i]
s = A[i] + base_n(r2, bn, A).padStart(i+1, '0')
} else {
const n2 = r - S_w
const r2 = n2 % w2
const q2 = (n2 - r2 ) / w2
s = A_last + A[q2] + base_n(r2, bn, A).padStart(n-1, '0')
}
// comma below __not__ necessary, just to ease seeing cycles
return A_last.repeat(2*Number(q)) +','+ s
}
}
function genStr (A) {
A = A.filter(x => x !== '_')
const bn_noUnderscore = BigInt(A.length)
return function (x) {
x = BigInt(x);
let prefix = "",
cycle = 0n,
max = bn_noUnderscore ** (cycle + 1n);
while (x >= max) {
x -= max;
if (cycle === bn_noUnderscore - 1n) {
prefix += "z";
cycle = 0n;
} else {
cycle++;
}
max = bn_noUnderscore ** (cycle + 1n);
}
return prefix
+ base_n(cycle, bn_noUnderscore, A)
+ "_"
+ base_n(x, bn_noUnderscore, A).padStart(Number(cycle) + 1, 0);
}
}
function test(a, b, x){
console.log(a(x), b(x))
}
{
console.log('---my supercycle is shorter if underscore not used. Plenty of room for grandinero')
const A = '0123456789abcdefghijklmnopqrstuvwxyz'.split('').sort((a,b)=>a.localeCompare(b))
let my = mygen(A)
const grandinero = genStr(A)
test(my, grandinero, 1e4)
test(my, grandinero, 1e12)
test(my, grandinero, 106471793335560744271846581685593263893929893610517909620n) // cycle ended for me (w variable value)
}
{
console.log('---\n my supercycle is greater if underscore is used in my alphabet (not grandinero since "forbidden')
// underscore used
const A = '0123456789abcdefghijklmnopqrstuvwxyz_'.split('').sort((a,b)=>a.localeCompare(b))
let my = mygen(A)
const grandinero = genStr(A)
test(my, grandinero, 1e12)
test(my, grandinero, 106471793335560744271846581685593263893929893610517909620n) // cycle ended for me (w variable value)
test(my, grandinero, 1e57) // still got some place in the supercycle
}
After considering the advice provided by @kaya3 and @grodzi and reviewing my original code, I have made some improvements. I realized a few things:
There was a bug in my original code. If one cycle ends at z_z (actually 36 z's after the underscore, but you get the idea) and the next one begins at z0_0, then lexical ordering is broken because _ comes after 0. The separator (or "neck") needs to be lower in lexical order than the lowest possible value of the head.
Though I was initially resistant to the idea of rolling a custom baseN generator so that more characters can be included, I have now come around to the idea.
I can squeeze more permutations out of a given string length by also incrementing the neck. For example, I can go from A00...A0z to A10...A1z, and so on, thus increasing the number of unique strings I can generate with A as the head before I move on to B.
With that in mind, I have revised my code:
// this is the alphabet used in standard baseN conversions:
let baseAlpha = "0123456789abcdefghijklmnopqrstuvwxyz";
// this is a factory for creating a new string generator:
function sequenceGenerator (config) {
let
// alphabets for the head, neck and body:
headAlpha = config.headAlpha,
neckAlpha = config.neckAlpha,
bodyAlpha = config.bodyAlpha,
// length of the body alphabet corresponds to the
// base of the numbering system:
base = BigInt(bodyAlpha.length),
// if bodyAlpha is identical to an alphabet that
// would be used for a standard baseN conversion,
// then use the built-in method, which should be
// much faster:
convertBody = baseAlpha.startsWith(bodyAlpha)
? (n) => n.toString(bodyAlpha.length)
// otherwise, roll a custom baseN generator:
: function (n) {
let s = "";
while (n > 0n) {
let i = n % base;
s = bodyAlpha[i] + s;
n = n / base;
}
return s;
},
// n is used to cache the last iteration and is
// incremented each time you call `getNext`
// it can optionally be initialized to a value other
// than 0:
n = BigInt(config.start || 0),
// see below:
headCycles = [0n],
cycleLength = 0n;
// the length of the body increases by 1 each time the
// head increments, meaning that the total number of
// permutations increases geometrically for each
// character in headAlpha
// here we cache the maximum number of permutations for
// each length of the body
// since we know these values ahead of time, calculating
// them in advance saves time when we generate a new
// string
// more importantly, it saves us from having to do a
// reverse calculation involving Math.log, which requires
// converting BigInts to Numbers, which breaks the
// program on larger numbers:
for (let i = 0; i < headAlpha.length; i++) {
// the maximum number of permutations depends on both
// the string length (i + 1) and the number of
// characters in neckAlpha, since the string length
// remains the same while the neck increments
cycleLength += BigInt(neckAlpha.length) * base ** BigInt(i + 1);
headCycles.push(cycleLength);
}
// given a number n, this function searches through
// headCycles to find where the total number of
// permutations exceeds n
// this is how we avoid the reverse calculation with
// Math.log to determine which head cycle we are on for
// a given permutation:
function getHeadCycle (n) {
for (let i = 0; i < headCycles.length; i++) {
if (headCycles[i] > n) return i;
}
}
return {
cycleLength: cycleLength,
getString: function (n) {
let cyclesDone = Number(n / cycleLength),
headLast = headAlpha[headAlpha.length - 1],
prefix = headLast.repeat(cyclesDone),
nn = n % cycleLength,
headCycle = getHeadCycle(nn),
head = headAlpha[headCycle - 1],
nnn = nn - headCycles[headCycle - 1],
neckCycleLength = BigInt(bodyAlpha.length) ** BigInt(headCycle),
neckCycle = nnn / neckCycleLength,
neck = neckAlpha[Number(neckCycle)],
body = convertBody(nnn % neckCycleLength);
body = body.padStart(headCycle , bodyAlpha[0]);
return prefix + head + neck + body;
},
getNext: function () { return this.getString(n++); }
};
}
let bodyAlpha = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz",
getStr = sequenceGenerator({
// achieve more permutations within a supercycle
// with a larger headAlpha:
headAlpha: "123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
// the highest value of neckAlpha must be lower than
// the lowest value of headAlpha:
neckAlpha: "0",
bodyAlpha: bodyAlpha
});
console.log("---supercycle length:");
console.log(Number(getStr.cycleLength));
console.log("---first two values:")
console.log(getStr.getNext());
console.log(getStr.getNext());
console.log("---arbitrary large value (1e57):");
console.log(getStr.getString(BigInt(1e57)));
console.log("");
// here we use a shorter headAlpha and longer neckAlpha
// to shorten the maximum length of the body, but this also
// decreases the number of permutations in the supercycle:
getStr = sequenceGenerator({
headAlpha: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
neckAlpha: "0123456789",
bodyAlpha: bodyAlpha
});
console.log("---supercycle length:");
console.log(Number(getStr.cycleLength));
console.log("---first two values:");
console.log(getStr.getNext());
console.log(getStr.getNext());
console.log("---arbitrary large value (1e57):");
console.log(getStr.getString(BigInt(1e57)));
EDIT
After further discussion with @grodzi, I have made some more improvements:
I realized that the "neck" or separator wasn't providing much value, so I have gotten rid of it. Later edit: actually, the separator is necessary. I am not sure why I thought it wasn't. Without the separator, the beginning of each new supercycle will lexically precede the end of the previous supercycle. I haven't changed my code below, but anyone using this code should include a separator. I have also realized that I was wrong to use an underscore as the separator. The separator must be a character, such as the hyphen, which lexically precedes the lowest digit used in the sequence (0).
I have taken @grodzi's suggestion to allow the length of the tail to continue growing indefinitely.
Here is the new code:
let baseAlpha = "0123456789abcdefghijklmnopqrstuvwxyz";
function sequenceGenerator (config) {
let headAlpha = config.headAlpha,
tailAlpha = config.tailAlpha,
base = BigInt(tailAlpha.length),
convertTail = baseAlpha.startsWith(tailAlpha)
? (n) => n.toString(tailAlpha.length)
: function (n) {
if (n === 0n) return "0";
let s = "";
while (n > 0n) {
let i = n % base;
s = tailAlpha[i] + s;
n = n / base;
}
return s;
},
n = BigInt(config.start || 0);
return {
getString: function (n) {
let cyclesDone = 0n,
headCycle = 0n,
initLength = 0n,
accum = 0n;
for (;; headCycle++) {
let _accum = accum + base ** (headCycle + 1n + initLength);
if (_accum > n) {
n -= accum;
break;
} else if (Number(headCycle) === headAlpha.length - 1) {
cyclesDone++;
initLength += BigInt(headAlpha.length);
headCycle = -1n;
}
accum = _accum;
}
let headLast = headAlpha[headAlpha.length - 1],
prefix = headLast.repeat(Number(cyclesDone)),
head = headAlpha[Number(headCycle)],
tail = convertTail(n),
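// each block holds base ** (headCycle + 1 + initLength) strings,
// so the tail must be padded to that many digits:
tailLength = Number(headCycle + 1n + initLength);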
tail = tail.padStart(tailLength, tailAlpha[0]);
return prefix + head + tail;
},
getNext: function () { return this.getString(n++); }
};
}
let alpha = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz",
genStr = sequenceGenerator({headAlpha: alpha, tailAlpha: alpha});
console.log("--- first string:");
console.log(genStr.getString(0n));
console.log("--- 1e+57");
console.log(genStr.getString(BigInt(1e+57)));
console.log("--- end of first supercycle:");
console.log(genStr.getString(63n*(1n-(63n**63n))/(1n-63n)-1n));
console.log("--- start of second supercycle:");
console.log(genStr.getString(63n*(1n-(63n**63n))/(1n-63n)));
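The ordering concern raised in the edit notes (a separator must sort before the lowest digit) can be checked mechanically. The sketch below is an assumption-laden illustration: firstOrderViolation is a name made up here, the sample strings are hand-written around a supercycle boundary rather than produced by the generator, and "lexical order" is taken to mean the code-unit order used by JavaScript's `<` on strings (a locale-aware comparison may sort differently).

```javascript
// Return the index of the first string that is not strictly greater than
// its predecessor, or -1 when the whole list is in increasing order.
function firstOrderViolation(strings) {
  for (let i = 1; i < strings.length; i++) {
    if (!(strings[i - 1] < strings[i])) return i;
  }
  return -1;
}

// Hand-written sample around a boundary where the head gains a character.
const withUnderscore = ['2_221', '2_222', '20_0', '20_1'];
const withHyphen = withUnderscore.map(s => s.replace('_', '-'));

console.log(firstOrderViolation(withUnderscore)); // 2, because '_' sorts after '0'
console.log(firstOrderViolation(withHyphen));     // -1, because '-' sorts before '0'
```

Running such a check over a few thousand consecutive outputs of a generator is a cheap way to catch ordering breaks at cycle boundaries.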

Fastest way to check if a number is a vampire number?

A vampire number is defined here https://en.wikipedia.org/wiki/Vampire_number. A number V is a vampire number if:
It can be expressed as X*Y such that X and Y have N/2 digits each where N is the number of digits in V
X and Y must not both have trailing zeros
X & Y together should have the same digits as V
I came up with a solution,
strV = sort(toString(V))
for factor <- pow(10, N/2 - 1) to sqrt(V)
    if factor divides V
        X <- factor
        Y <- V/factor
        if X and Y both have trailing zeros
            continue
        checkStr = sort(toString(X) + toString(Y))
        if checkStr equals strV return true
return false
Another possible solution is to permute the string represented by V, split it in half, and check whether the two halves multiply to V. Which one is the best way to do so?
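For reference, the divisor-scanning idea can be written as runnable JavaScript along these lines. It is a sketch only: findFangsByTrialDivision and sortedDigits are names chosen for this example, the scan starts at the smallest number with N/2 digits, and the input is assumed to be a positive integer within JavaScript's safe integer range.

```javascript
// Sort the characters of a digit string so two digit multisets can be compared.
function sortedDigits(s) {
  return s.split('').sort().join('');
}

// Return [X, Y] if v is a vampire number, otherwise null.
function findFangsByTrialDivision(v) {
  const s = String(v);
  if (s.length % 2 !== 0) return null;          // odd digit count
  const half = s.length / 2;
  const target = sortedDigits(s);
  const lo = Math.pow(10, half - 1);            // smallest number with N/2 digits
  const hi = Math.floor(Math.sqrt(v));
  for (let x = lo; x <= hi; x++) {
    if (v % x !== 0) continue;
    const y = v / x;
    if (String(y).length !== half) continue;    // cofactor has the wrong length
    if (x % 10 === 0 && y % 10 === 0) continue; // both end in zero
    if (sortedDigits(String(x) + String(y)) === target) return [x, y];
  }
  return null;
}

console.log(findFangsByTrialDivision(1260));   // [ 21, 60 ]
console.log(findFangsByTrialDivision(126000)); // null
```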
The algorithm I propose here will not go through all permutations of digits. It will eliminate possibilities as fast as possible so that only a fraction of permutations will actually be tested.
Algorithm explained by example
Here is how it works based on example number 125460. If you are fine with reading the code directly, then you can skip this (long) part:
At first the two fangs (i.e. vampire factors) are obviously not known, and the problem can be represented as follows:
?**
X ?**
-------
=125460
For the left most digit of the first factor (marked with ?) we could choose any of the digits 0,1,2,5,4, or 6. But on closer analysis 0 would not be a viable possibility, as the product would never reach more than a 5-digit number. So it would be a waste of time to go through all permutations of digits that start with a zero.
For the left most digit of the second factor (also marked with ?), the same is true. However, when looking at the combinations, we can again filter out some pairs that cannot contribute to reaching the target product. For instance, this combination should be discarded:
1**
X 2**
-------
=125460
The greatest number that can be achieved with these digits is 199x299 = 59501 (ignoring the fact that we don't even have a 9), which is not even half of the desired number. So we should reject the combination (1, 2). For the same reason, the pair (1, 5) can be discarded for taking these positions. Similarly, the pairs (4, 5), (4, 6), and (5, 6) can be rejected as well, because they yield a too large product (>= 200000). I will call this kind of a test -- where it is determined whether the target number is within reach for a certain chosen digit pair, the "range test".
At this stage there is no difference between the first and the second fang, so we should also not have to investigate pairs where the second digit is smaller than the first, because they mirror a pair that would already have been investigated (or rejected).
So of all the possible pairs that could take up this first position (there are 30 possibilities to take 2 digits from a set of 6 digits), only the following 4 need to be investigated:
(1, 6), (2, 4), (2, 5), (2, 6)
In a more elaborate notation this means we are limiting the search to these number patterns:
1** 2** 2** 2**
X 6** X 4** X 5** X 6**
------- ------- ------- -------
=125460 =125460 =125460 =125460
A B C D
It is clear that this reduction of possibilities before even looking at the other positions greatly reduces the search tree.
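The range test for the first digit pair can be sketched as follows. This is a simplified illustration with invented names (passesRangeTest, firstDigitPairs), not the implementation shown later, which uses slightly different bounds. It assumes the target has an even number of digits and that all products stay within the safe integer range.

```javascript
// prefixA, prefixB: leading digits chosen so far for each fang.
// remaining: number of digits still to be placed in each fang.
function passesRangeTest(target, prefixA, prefixB, remaining) {
  const scale = Math.pow(10, remaining);
  const min = (prefixA * scale) * (prefixB * scale);      // rest all 0s
  const max = (prefixA * scale + scale - 1) *
              (prefixB * scale + scale - 1);              // rest all 9s
  return min <= target && target <= max;
}

// List the first-digit pairs (a <= b, mirrored pairs skipped) that survive.
function firstDigitPairs(target) {
  const digits = String(target).split('').map(Number);
  const remaining = digits.length / 2 - 1;
  const seen = new Set();
  const pairs = [];
  for (let i = 0; i < digits.length; i++) {
    for (let j = 0; j < digits.length; j++) {
      if (i === j) continue;                  // a digit can be used only once
      const a = digits[i], b = digits[j];
      if (a > b) continue;                    // mirrored pair
      const key = a + ',' + b;
      if (seen.has(key)) continue;            // already considered
      seen.add(key);
      if (passesRangeTest(target, a, b, remaining)) pairs.push([a, b]);
    }
  }
  return pairs.sort((p, q) => p[0] - q[0] || p[1] - q[1]);
}

console.log(JSON.stringify(firstDigitPairs(125460))); // [[1,6],[2,4],[2,5],[2,6]]
```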
The algorithm will take each of these 4 possibilities in order, and for each will check the possibilities for the next digit position. So first configuration A is analysed:
1?*
X 6?*
-------
=125460
The pairs that are available for the ?-marked positions are these 12:
(0, 2), (0, 4), (0, 5)
(2, 0), (2, 4), (2, 5)
(4, 0), (4, 2), (4, 5)
(5, 0), (5, 2), (5, 4)
Again, we can eliminate pairs by applying the range test. Let's take for instance the pair (5, 4). This would mean we had factors 15* and 64* (where * is an unknown digit at this point). The product of these two will be maximised with 159 * 649, i.e. 103191 (again ignoring the fact we do not even have a 9 available): this is too low for reaching the target, so this pair can be ignored. By further applying the range test, all these 12 pairs can be discarded, and so the search within configuration A stops here: there is no solution there.
Then the algorithm moves to configuration B:
2?*
X 4?*
-------
=125460
Again, the range test is applied to the possible pairs for the second position, and again it turns out none of these pairs passes the test: for instance (5, 6) can never represent a greater product than 259 * 469 = 121471, which is (only just) too small.
Then the algorithm moves to option C:
2?*
X 5?*
-------
=125460
Of all 12 possible pairs, only the following survive the range test: (4, 0), (4, 1), (6, 0), (6, 1). So now we have the following second-level configurations:
24* 24* 26* 26*
X 50* X 51* X 50* X 51*
------- ------- ------- -------
=125460 =125460 =125460 =125460
Ca Cb Cc Cd
In configuration Ca, there is no pair that passes the range test.
In configuration Cb, the pair (6, 0) passes, and leads to a solution:
246
X 510
-------
=125460
At this point the algorithm stops searching. The outcome is clear. In total the number of configurations looked at is very small compared to a brute force permutation checking algorithm. Here is a visualisation of the search tree:
*-+-- (1, 6)
|
+-- (2, 4)
|
+-- (2, 5) -+-- (4, 0)
| |
| +-- (4, 1) ---- (6, 0) = success: 246 * 510
/ /
| +-- (6, 0)
| |
| +-- (6, 1)
|
+-- (2, 6) ---- (0, 1) ---- (4, 5) = success: 204 * 615
The variants below / are only for showing what else the algorithm would have done, if there had not been a solution found. But in this actual case, that part of the search tree was actually never followed.
I have no clear idea of the time complexity, but it seems to run quite well for larger numbers, showing that the elimination of digits at an early stage makes the width of the search tree quite narrow.
Here is a live JavaScript implementation, which also runs some test cases when it is activated (and it has a few other optimisations -- see code comments).
/*
Function: vampireFangs
Arguments:
vampire: number to factorise into two fangs, if possible.
Return value:
Array with two fangs if indeed the argument is a vampire number.
Otherwise false (not a vampire number) or null (argument too large to
compute)
*/
function vampireFangs(vampire) {
/* Function recurse: for the recursive part of the algorithm.
prevA, prevB: partial, potential fangs based on left-most digits of the given
number
counts: array of ten numbers representing the occurrence of still
available digits
divider: power of 100, is divided by 100 each next level in the search tree.
Determines the number of right-most digits of the given number that
are ignored at first in the algorithm. They will be considered in
deeper levels of recursion.
*/
function recurse(vampire, prevA, prevB, counts, divider) {
if (divider < 1) { // end of recursion
// Product of fangs must equal original number and fangs must not both
// end with a 0.
return prevA * prevB === vampire && (prevA % 10 + prevB % 10 > 0)
? [prevA, prevB] // Solution found
: false; // It's not a solution
}
// Get left-most digits (multiple of 2) of potential vampire number
var v = Math.floor(vampire/divider);
// Shift decimal digits of partial fangs to the left to make room for
// the next digits
prevA *= 10;
prevB *= 10;
// Calculate the min/max A digit that can potentially contribute to a
// solution
var minDigA = Math.floor(v / (prevB + 10)) - prevA;
var maxDigA = prevB ? Math.floor((v + 1) / prevB) - prevA : 9;
if (maxDigA > 9) maxDigA = 9;
for (var digA = minDigA; digA <= maxDigA; digA++) {
if (!counts[digA]) continue; // this digit is not available
var fangA = prevA + digA;
counts[digA]--;
// Calculate the min/max B digit that can potentially contribute to
// a solution
var minDigB = Math.floor(v / (fangA + 1)) - prevB;
var maxDigB = fangA ? (v + 1) / fangA - prevB : 9;
// Don't search mirrored A-B digits when both fangs are equal until now.
if (prevA === prevB && digA > minDigB) minDigB = digA;
if (maxDigB > 9) maxDigB = 9;
for (var digB = minDigB; digB <= Math.min(maxDigB, 9); digB++) {
if (!counts[digB]) continue; // this digit is not available
var fangB = prevB + digB;
counts[digB]--;
// Recurse by considering the next two digits of the potential
// vampire number, for finding the next digits to append to
// both partial fangs.
var result = recurse(vampire, fangA, fangB, counts, divider / 100);
// When one solution is found: stop searching & exit search tree.
if (result) return result; // solution found
// Restore counts
counts[digB]++;
}
counts[digA]++;
}
}
// Validate argument
if (typeof vampire !== 'number') return false;
if (vampire < 0 || vampire % 1 !== 0) return false; // not positive and integer
if (vampire > 9007199254740991) return null; // beyond JavaScript precision
var digits = vampire.toString(10).split('').map(Number);
// A vampire number has an even number of digits
if (!digits.length || digits.length % 2 > 0) return false;
// Register per digit (0..9) the frequency of that digit in the argument
var counts = [0,0,0,0,0,0,0,0,0,0];
for (var i = 0; i < digits.length; i++) {
counts[digits[i]]++;
}
return recurse(vampire, 0, 0, counts, Math.pow(10, digits.length - 2));
}
function Timer() {
function now() { // try performance object, else use Date
return typeof performance !== 'undefined' ? performance.now() : new Date().getTime();
}
var start = now();
this.spent = function () { return Math.round(now() - start); }
}
// I/O
var button = document.querySelector('button');
var input = document.querySelector('input');
var output = document.querySelector('pre');
button.onclick = function () {
var str = input.value;
// Convert to number
var vampire = parseInt(str);
// Measure performance
var timer = new Timer();
// Input must be valid number
var result = vampire.toString(10) !== str ? null
: vampireFangs(vampire);
output.textContent = (result
? 'Vampire number. Fangs are: ' + result.join(', ')
: result === null
? 'Input is not an integer or too large for JavaScript'
: 'Not a vampire number')
+ '\nTime spent: ' + timer.spent() + 'ms';
}
// Tests (numbers taken from wiki page)
var tests = [
// Negative test cases:
[1, 999, 126000, 1023],
// Positive test cases:
[1260, 1395, 1435, 1530, 1827, 2187, 6880,
102510, 104260, 105210, 105264, 105750, 108135,
110758, 115672, 116725, 117067, 118440,
120600, 123354, 124483, 125248, 125433, 125460, 125500,
13078260,
16758243290880,
24959017348650]
];
tests.forEach(function (vampires, shouldBeVampire) {
vampires.forEach(function (vampire) {
var isVampire = vampireFangs(vampire);
if (!isVampire !== !shouldBeVampire) {
output.textContent = 'Unexpected: vampireFangs('
+ vampire + ') returns ' + JSON.stringify(isVampire);
throw 'Test failed';
}
});
});
output.textContent = 'All tests passed.';
N: <input value="1047527295416280"><button>Vampire Check</button>
<pre></pre>
As JavaScript uses a 64-bit floating point representation, the above snippet only accepts numbers up to 2^53 - 1. Above that limit there would be loss of precision and consequently unreliable results.
As Python does not have such limitation, I also put a Python implementation on eval.in. That site has a limitation on execution times, so you'd have to run it elsewhere if that becomes an issue.
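The precision limit mentioned above is easy to observe directly. The following lines are a small illustration using standard built-ins only; isWithinSafeRange is a wrapper name introduced here.

```javascript
// 2^53 - 1 is the largest integer a Number can represent without gaps.
const limit = Number.MAX_SAFE_INTEGER;

// Wrapper around the built-in check, named for readability in this example.
function isWithinSafeRange(v) {
  return Number.isSafeInteger(v);
}

console.log(limit === Math.pow(2, 53) - 1);       // true
console.log(limit + 1 === limit + 2);             // true: neighbours collapse
console.log(isWithinSafeRange(1047527295416280)); // true
console.log(BigInt(limit) + 2n);                  // 9007199254740993n
```

BigInt arithmetic has no such limit, at the cost of rewriting the numeric parts of the algorithm with BigInt operators.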
In pseudocode:
if digitcount is odd return false
if digitcount is 2 return false
for A = each permutation of length digitcount/2 selected from all the digits,
for B = each permutation of the remaining digits,
if either A or B starts with a zero, continue
if both A and B end in a zero, continue
if A*B == the number, return true
There are a number of optimizations that could still be performed here, mostly in terms of ensuring that each possible pair of factors is tried only once. In other words, how to best check for repeating digits when selecting permutations?
But that's the gist of the algorithm I would use.
P.S.: You're not looking for primes, so why use a primality test? You just care about whether these are vampire numbers; there are only a very few possible factors. No need to check all the numbers up to sqrt(number).
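The pseudocode above can be sketched in Python roughly as follows. This is only an illustration under my own assumptions: the name vampire_fangs is made up, and repeated permutations are skipped with a set, which is simple but not memory-friendly for very long numbers.

```python
from itertools import permutations


def vampire_fangs(n):
    """Return one pair of fangs (a, b) with a * b == n, or None."""
    s = str(n)
    # Vampire numbers have an even digit count of at least 4.
    if n < 0 or len(s) < 4 or len(s) % 2 == 1:
        return None
    half = len(s) // 2
    target = sorted(s)
    seen = set()
    for perm in permutations(s, half):
        # Skip leading zeros and permutations already tried (repeated digits).
        if perm[0] == "0" or perm in seen:
            continue
        seen.add(perm)
        a = int("".join(perm))
        if n % a != 0:
            continue
        b = n // a
        if a > b:
            continue  # try each pair of factors only once
        if a % 10 == 0 and b % 10 == 0:
            continue  # both fangs may not end in zero
        # Together the fangs must use exactly the digits of n.
        if sorted(str(a) + str(b)) == target:
            return (a, b)
    return None
```

Only candidate factors built from the digits are ever tested for divisibility, which is the point made in the P.S. above.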
Here are some suggestions:
First a simple improvement: if the number of digits is < 4 or odd return false (or if v is negative too).
You don't need to sort v, it is enough to count how many times each digit occurs O(n).
You don't have to check each number, only the combinations that are possible with the digits. This could be done by backtracking and significantly reduces the amount of numbers that have to be checked.
The final sort to check if all digits were used isn't needed either, just add up the used digits of both numbers and compare with the occurrences in v.
Here is the code for a JS-like language with integers that never overflow, the V parameter is an integer string without leading 0s:
Edit: As it turns out the code is not only JS-like, but valid JS code and it had no problem to decide that 1047527295416280 is indeed a vampire number (jsfiddle).
var V, v, isVmp, digits, len;
function isVampire(numberString) {
V = numberString;
if (V.length < 4 || V.length % 2 == 1 )
return false;
v = parseInt(V);
if (v < 0)
return false;
digits = countDigits(V);
len = V.length / 2;
isVmp = false;
checkNumbers();
return isVmp;
}
function countDigits(s) {
var offset = "0".charCodeAt(0);
var ret = [0,0,0,0,0,0,0,0,0,0];
for (var i = 0; i < s.length; i++)
ret[s.charCodeAt(i) - offset]++;
return ret;
}
function checkNumbers(number, depth) {
if (isVmp)
return;
if (typeof number == 'undefined') {
for (var i = 1; i < 10; i++) {
if (digits[i] > 0) {
digits[i]--;
checkNumbers(i, len - 1);
digits[i]++;
}
}
} else if (depth == 0) {
if (v % number == 0) {
var b = v / number;
if (number % 10 != 0 || b % 10 != 0) {
var d = countDigits('' + b);
if (d[0] == digits[0] && d[1] == digits[1] && d[2] == digits[2] &&
d[3] == digits[3] && d[4] == digits[4] && d[5] == digits[5] &&
d[6] == digits[6] && d[7] == digits[7] && d[8] == digits[8] &&
d[9] == digits[9])
isVmp = true;
}
}
} else {
for (var i = 0; i < 10; i++) {
if (digits[i] > 0) {
digits[i]--;
checkNumbers(number * 10 + i, depth - 1);
digits[i]++;
}
}
}
}

Take exact root of value

I'd like to know how you can take the exact n-th root of a number (in any programming language). When I use a physical calculator, I can type something like sqrt(12) (nicely formatted of course) and get as a result 2 sqrt(3). How can I achieve this not only with square roots but any type of root when representing a number as numerator and denominator. Of course, I would have to use another representation, but I don't have any idea how this works in general.
Thanks in advance.
I doubt this is an efficient way, but it would work. Assuming you want to take the nth root of some number m:
Calculate the prime factorization m = p1^a1 * p2^a2 * ... * px^ax.
For each 1 <= i <= x let ki = ai div n and ri = ai mod n.
The part that gets factored out is then p1^k1 * p2^k2 * ... * px^kx.
The part that remains "under the root" is p1^r1 * p2^r2 * ... * px^rx.
The first step is the only tricky one. Once you have found all prime factors of m it is just a matter of looping over those factors and dividing out the multiples of n.
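As a rough sketch of the steps above, here is a Python version that uses plain trial division for the factorization. The function name simplify_root and the (outside, inside) return shape are my own choices, and it assumes m >= 1 and n >= 1.

```python
def simplify_root(m, n):
    """Split the n-th root of m into outside * (inside)^(1/n)."""
    outside, inside = 1, 1
    p = 2
    while p * p <= m:
        # a is the exponent of p in m (0 if p does not divide m).
        a = 0
        while m % p == 0:
            m //= p
            a += 1
        k, r = divmod(a, n)    # k = a div n, r = a mod n
        outside *= p ** k      # part that gets factored out
        inside *= p ** r       # part that stays under the root
        p += 1
    if m > 1:
        # Whatever is left is a single prime with exponent 1.
        k, r = divmod(1, n)
        outside *= m ** k
        inside *= m ** r
    return outside, inside
```

For example, simplify_root(12, 2) gives (2, 3), which is the 2 sqrt(3) from the question.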
To simplify the n-th root of a number, the algorithm shouldn't do prime factorisation, but rather "n-th power factorisation", i.e. look for the largest n-th power inside the root, which you can then move outside the root. For example: the 3rd root of 250 equals the third root of 2 x 125; since 125 is the third power of 5, you can move it out of the root and get: 5 times the third root of 2.
Algorithm: take the floating-point n-th root of the number, and round it down, then check this and all smaller integers until you find the largest integer whose n-th power divides the number; then divide the number by the n-th power and move the integer out of the root.
This JavaScript example shows a basic implementation; you could clean it up further by printing 1^(1/root) simply as 1; further optimisation is undoubtedly possible.
function integerRoot(number, root) {
var base = number, factor = 1;
var max = Math.floor(Math.pow(base, 1/root));
for (var i = max; i > 1; i--) {
var power = Math.pow(i, root);
if (base % power == 0) {
base /= power;
factor *= i;
break;
}
}
document.write(number + "<SUP>1/" + root + "</SUP> = " +
factor + " × " + base + "<SUP>1/" + root + "</SUP><BR>");
}
integerRoot(25, 3);
integerRoot(27, 3);
integerRoot(81, 3);
integerRoot(135, 3);
integerRoot(375, 3);
integerRoot(8*27*64*17, 3);
UPDATE: This is a more efficient version; I haven't yet taken negative numbers into account, though, so there's definitely room for further improvement.
function simplifyRoot(radicand, degree) {
var factor = 1, base = 1, power;
while ((power = Math.pow(++base, degree)) <= radicand) {
while (radicand % power == 0) {
factor *= base;
radicand /= power;
}
}
return {factor: factor, radicand: radicand, degree: degree};
}
var radicand = 8*27*36*64*125*216, degree = 3;
var simplified = simplifyRoot(radicand, degree);
document.write(radicand + "<SUP>1/" + degree + "</SUP> = " +
simplified.factor + " × " + simplified.radicand + "<SUP>1/" + simplified.degree + "</SUP><BR>");

Number of 1s in the two's complement binary representations of integers in a range

This problem is from the 2011 Codesprint (http://csfall11.interviewstreet.com/):
One of the basics of Computer Science is knowing how numbers are represented in 2's complement. Imagine that you write down all numbers between A and B inclusive in 2's complement representation using 32 bits. How many 1's will you write down in all ?
Input:
The first line contains the number of test cases T (<1000). Each of the next T lines contains two integers A and B.
Output:
Output T lines, one corresponding to each test case.
Constraints:
-2^31 <= A <= B <= 2^31 - 1
Sample Input:
3
-2 0
-3 4
-1 4
Sample Output:
63
99
37
Explanation:
For the first case, -2 contains 31 1's followed by a 0, -1 contains 32 1's and 0 contains 0 1's. Thus the total is 63.
For the second case, the answer is 31 + 31 + 32 + 0 + 1 + 1 + 2 + 1 = 99
I realize that you can use the fact that the number of 1s in -X is equal to the number of 0s in the complement of (-X) = X-1 to speed up the search. The solution claims that there is a O(log X) recurrence relation for generating the answer but I do not understand it. The solution code can be viewed here: https://gist.github.com/1285119
I would appreciate it if someone could explain how this relation is derived!
Well, it's not that complicated...
The single-argument solve(int a) function is the key. It is short, so I will cut&paste it here:
long long solve(int a)
{
if(a == 0) return 0 ;
if(a % 2 == 0) return solve(a - 1) + __builtin_popcount(a) ;
return ((long long)a + 1) / 2 + 2 * solve(a / 2) ;
}
It only works for non-negative a, and it counts the number of 1 bits in all integers from 0 to a inclusive.
The function has three cases:
a == 0 -> returns 0. Obviously.
a even -> returns the number of 1 bits in a plus solve(a-1). Also pretty obvious.
The final case is the interesting one. So, how do we count the number of 1 bits from 0 to an odd number a?
Consider all of the integers between 0 and a, and split them into two groups: The evens, and the odds. For example, if a is 5, you have two groups (in binary):
000 (aka. 0)
010 (aka. 2)
100 (aka. 4)
and
001 (aka 1)
011 (aka 3)
101 (aka 5)
Observe that these two groups must have the same size (because a is odd and the range is inclusive). To count how many 1 bits there are in each group, first count all but the last bits, then count the last bits.
All but the last bits looks like this:
00
01
10
...and it looks like this for both groups. The number of 1 bits here is just solve(a/2). (In this example, it is the number of 1 bits from 0 to 2. Also, recall that integer division in C/C++ rounds down.)
The last bit is zero for every number in the first group and one for every number in the second group, so those last bits contribute (a+1)/2 one bits to the total.
So the third case of the recursion is (a+1)/2 + 2*solve(a/2), with appropriate casts to long long to handle the case where a is INT_MAX (and thus a+1 overflows).
This is an O(log N) solution. To generalize it to solve(a,b), you just compute solve(b) - solve(a - 1) (so that a itself is still counted), plus the appropriate logic for worrying about negative numbers. That is what the two-argument solve(int a, int b) is doing.
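To make the range version concrete, here is a small Python sketch of the same recurrence plus one possible way of handling negative numbers. The names ones_upto and ones_in_range are mine, and the negative branch is my own reading of the idea (zeros in x equal the ones in -x - 1), not a transcription of the gist.

```python
def ones_upto(a):
    """Number of 1 bits in all integers 0..a inclusive (0 if a <= 0)."""
    if a <= 0:
        return 0
    if a % 2 == 0:
        return ones_upto(a - 1) + bin(a).count("1")
    # Odd a: the last bits contribute (a + 1) / 2 ones, the remaining
    # bits appear twice, once for the evens and once for the odds.
    return (a + 1) // 2 + 2 * ones_upto(a // 2)


def ones_in_range(a, b, bits=32):
    """1 bits written for a..b inclusive in two's complement."""
    total = 0
    if b >= 0:
        lo = max(a, 0)
        total += ones_upto(b) - ones_upto(lo - 1)
    if a < 0:
        hi = min(b, -1)
        count = hi - a + 1
        # Zeros in negative x equal the ones in -x - 1, so subtract the
        # ones found in the range (-hi - 1)..(-a - 1) from bits * count.
        zeros = ones_upto(-a - 1) - ones_upto(-hi - 2)
        total += bits * count - zeros
    return total
```

With the sample input, ones_in_range(-2, 0), ones_in_range(-3, 4) and ones_in_range(-1, 4) give 63, 99 and 37.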
Cast the array into a series of integers. Then for each integer do:
/* Takes uint32_t (from <stdint.h>) so the shifts and the multiply are well defined. */
int NumberOfSetBits(uint32_t i)
{
i = i - ((i >> 1) & 0x55555555);
i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}
Also this is portable, unlike __builtin_popcount
See here: How to count the number of set bits in a 32-bit integer?
When a is positive, a better explanation has already been posted.
If a is negative, every number from a up to -1 is written with 32 bits, and the number of 0 bits in a negative x equals the number of 1 bits in -x - 1. So the count of 1 bits is 32 times how many numbers there are, less the number of 1 bits in the range from 0 to -a - 1.
So, putting both cases together:
long long solve(int a) {
if (a >= 0){
if (a == 0) return 0;
else if ((a %2) == 0) return solve(a - 1) + noOfSetBits(a);
else return (2 * solve( a / 2)) + ((long long)a + 1) / 2;
}else {
a++;
return ((long long)(-a) + 1) * 32 - solve(-a);
}
}
In the following code, the bitsum of x is defined as the count of 1 bits in the two's complement representation of the numbers between 0 and x (inclusive), where Integer.MIN_VALUE <= x <= Integer.MAX_VALUE.
For example:
bitsum(0) is 0
bitsum(1) is 1
bitsum(2) is 2
bitsum(3) is 4
..etc
10987654321098765432109876543210 i % 10 for 0 <= i <= 31
00000000000000000000000000000000 0
00000000000000000000000000000001 1
00000000000000000000000000000010 2
00000000000000000000000000000011 3
00000000000000000000000000000100 4
00000000000000000000000000000101 ...
00000000000000000000000000000110
00000000000000000000000000000111 (2^i)-1
00000000000000000000000000001000 2^i
00000000000000000000000000001001 (2^i)+1
00000000000000000000000000001010 ...
00000000000000000000000000001011 x, 011 = x & (2^i)-1 = 3
00000000000000000000000000001100
00000000000000000000000000001101
00000000000000000000000000001110
00000000000000000000000000001111
00000000000000000000000000010000
00000000000000000000000000010001
00000000000000000000000000010010 18
...
01111111111111111111111111111111 Integer.MAX_VALUE
The formula of the bitsum is:
bitsum(x) = bitsum((2^i)-1) + 1 + x - 2^i + bitsum(x & (2^i)-1 )
Note that x - 2^i = x & (2^i)-1
Negative numbers are handled slightly differently than positive numbers. In this case the number of zeros is subtracted from the total number of bits:
Integer.MIN_VALUE <= x < -1
Total number of bits: 32 * -x.
The number of zeros in a negative number x is equal to the number of ones in -x - 1.
public class TwosComplement {
//t[i] is the bitsum of (2^i)-1 for i in 0 to 31.
private static long[] t = new long[32];
static {
t[0] = 0;
t[1] = 1;
int p = 2;
for (int i = 2; i < 32; i++) {
t[i] = 2*t[i-1] + p;
p = p << 1;
}
}
//count the bits between x and y inclusive
public static long bitsum(int x, int y) {
if (y > x && x > 0) {
return bitsum(y) - bitsum(x-1);
}
else if (y >= 0 && x == 0) {
return bitsum(y);
}
else if (y == x) {
return Integer.bitCount(y);
}
else if (x < 0 && y == 0) {
return bitsum(x);
} else if (x < 0 && x < y && y < 0 ) {
return bitsum(x) - bitsum(y+1);
} else if (x < 0 && x < y && 0 < y) {
return bitsum(x) + bitsum(y);
}
throw new RuntimeException(x + " " + y);
}
//count the bits between 0 and x
public static long bitsum(int x) {
if (x == 0) return 0;
if (x < 0) {
if (x == -1) {
return 32;
} else {
long y = -(long)x;
return 32 * y - bitsum((int)(y - 1));
}
} else {
int n = x;
int sum = 0; //x & (2^i)-1
int j = 0;
int i = 1; //i = 2^j
int lsb = n & 1; //least significant bit
n = n >>> 1;
while (n != 0) {
sum += lsb * i;
lsb = n & 1;
n = n >>> 1;
i = i << 1;
j++;
}
long tot = t[j] + 1 + sum + bitsum(sum);
return tot;
}
}
}
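For non-negative x, the bitsum formula above can be cross-checked with a few lines of Python. This is a sketch under my own assumptions: it uses the closed form i * 2^(i-1) for the bitsum of (2^i)-1 instead of the precomputed table t, and the function name simply mirrors the Java one.

```python
def bitsum(x):
    """1 bits in 0..x inclusive, for x >= 0."""
    if x <= 0:
        return 0
    i = x.bit_length() - 1      # position of the highest set bit
    low = x - (1 << i)          # same as x & ((1 << i) - 1)
    full_block = i * (1 << i) // 2   # bitsum((2^i) - 1) = i * 2^(i-1)
    # full block below 2^i, the top bit of 2^i..x, then the low bits
    return full_block + (1 + low) + bitsum(low)
```

The recursion strips one set bit per call, so it makes at most as many calls as x has bits.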

How do I generate a random string of up to a certain length?

I would like to generate a random string (or a series of random strings, repetitions allowed) of length between 1 and n characters from some (finite) alphabet. Each string should be equally likely (in other words, the strings should be uniformly distributed).
The uniformity requirement means that an algorithm like this doesn't work:
alphabet = "abcdefghijklmnopqrstuvwxyz"
len = rand(1, n)
s = ""
for(i = 0; i < len; ++i)
s = s + alphabet[rand(0, 25)]
(pseudo code, rand(a, b) returns an integer between a and b, inclusive, each integer equally likely)
This algorithm generates strings with uniformly distributed lengths, but the actual distribution should be weighted toward longer strings (there are 26 times as many strings with length 2 as there are with length 1, and so on.) How can I achieve this?
What you need to do is generate your length and then your string as two distinct steps. You will need to first choose the length using a weighted approach. You can calculate the number of strings of a given length l for an alphabet of k symbols as k^l. Sum those up and you have the total number of strings of any length; your first step is to generate a random number between 1 and that value and then bin it accordingly. Modulo off-by-one errors, you would break at the cumulative counts 26, 26 + 26^2, 26 + 26^2 + 26^3 and so on. The logarithm based on the number of symbols would be useful for this task.
Once you have your length then you can generate the string as you have above.
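Here is a minimal Python sketch of that two-step idea. Python integers do not overflow, so the total count can be used directly; the helper names nth_string and random_string are mine, and instead of a second round of random calls the leftover index is decoded as base-k digits.

```python
import random


def nth_string(alphabet, max_len, r):
    """Map an index 0 <= r < k + k^2 + ... + k^max_len to a distinct string."""
    k = len(alphabet)
    length = 1
    # Walk the bins: the first k indices are length 1, the next k^2 length 2, ...
    while r >= k ** length:
        r -= k ** length
        length += 1
    assert length <= max_len, "index out of range"
    # r is now an index among the k^length strings of this length.
    chars = []
    for _ in range(length):
        r, digit = divmod(r, k)
        chars.append(alphabet[digit])
    return "".join(chars)


def random_string(alphabet, max_len, rng=random):
    """Pick uniformly among all strings of length 1..max_len."""
    k = len(alphabet)
    total = sum(k ** length for length in range(1, max_len + 1))
    return nth_string(alphabet, max_len, rng.randrange(total))
```

Because every index maps to a different string, a uniform index gives a uniform string.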
Okay, there are 26 possibilities for a 1-character string, 26^2 for a 2-character string, and so on up to 26^26 possibilities for a 26-character string.
That means there are 26 times as many possibilities for an (N)-character string than there are for an (N-1)-character string. You can use that fact to select your length:
def getlen(maxlen):
sz = maxlen
while sz != 1:
if rnd(27) != 1:
return sz
sz = sz - 1
return 1
I use 27 in the above code since the total sample space for selecting strings from "ab" is the 26 1-character possibilities and the 26^2 2-character possibilities. In other words, the ratio is 1:26 so 1-character has a probability of 1/27 (rather than 1/26 as I first answered).
This solution isn't perfect since you're calling rnd multiple times and it would be better to call it once with a possible range of 26^N + 26^(N-1) + ... + 26^1 and select the length based on where your returned number falls within there, but it may be difficult to find a random number generator that'll work on numbers that large (10 characters gives you a possible range of 26^10 + ... + 26^1 which, unless I've done the math wrong, is 146,813,779,479,510).
If you can limit the maximum size so that your rnd function will work in the range, something like this should be workable:
def getlen(chars,maxlen):
assert maxlen >= 1
range = chars
sampspace = 0
for i in 1 .. maxlen:
sampspace = sampspace + range
range = range * chars
range = range / chars
val = rnd(sampspace)
sz = maxlen
while val < sampspace - range:
sampspace = sampspace - range
range = range / chars
sz = sz - 1
return sz
Once you have the length, I would then use your current algorithm to choose the actual characters to populate the string.
Explaining it further:
Let's say our alphabet only consists of "ab". The possible sets up to length 3 are [ab] (2), [ab][ab] (4) and [ab][ab][ab] (8). So there is a 8/14 chance of getting a length of 3, 4/14 of length 2 and 2/14 of length 1.
The 14 is the magic figure: it's the sum of all 2^n for n = 1 to the maximum length. So, testing that pseudo-code above with chars = 2 and maxlen = 3:
assert maxlen >= 1 [okay]
range = chars [2]
sampspace = 0
for i in 1 .. 3:
i = 1:
sampspace = sampspace + range [0 + 2 = 2]
range = range * chars [2 * 2 = 4]
i = 2:
sampspace = sampspace + range [2 + 4 = 6]
range = range * chars [4 * 2 = 8]
i = 3:
sampspace = sampspace + range [6 + 8 = 14]
range = range * chars [8 * 2 = 16]
range = range / chars [16 / 2 = 8]
val = rnd(sampspace) [number from 0 to 13 inclusive]
sz = maxlen [3]
while val < sampspace - range: [see below]
sampspace = sampspace - range
range = range / chars
sz = sz - 1
return sz
So, from that code, the first iteration of the final loop will exit with sz = 3 if val is greater than or equal to sampspace - range [14 - 8 = 6]. In other words, for the values 6 through 13 inclusive, 8 of the 14 possibilities.
Otherwise, sampspace becomes sampspace - range [14 - 8 = 6] and range becomes range / chars [8 / 2 = 4].
Then the second iteration of the final loop will exit with sz = 2 if val is greater than or equal to sampspace - range [6 - 4 = 2]. In other words, for the values 2 through 5 inclusive, 4 of the 14 possibilities.
Otherwise, sampspace becomes sampspace - range [6 - 4 = 2] and range becomes range / chars [4 / 2 = 2].
Then the third iteration of the final loop will exit with sz = 1 if val is greater than or equal to sampspace - range [2 - 2 = 0]. In other words, for the values 0 through 1 inclusive, 2 of the 14 possibilities (this iteration will always exit since the value must be greater than or equal to zero).
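The walkthrough above can be replayed with a direct Python transcription of the pseudo-code. To keep it deterministic, this sketch takes val as a parameter instead of calling rnd (that parameter, and the name width for what the pseudo-code calls range, are my changes).

```python
def getlen(chars, maxlen, val):
    """Length bin for val, where 0 <= val < chars + chars^2 + ... + chars^maxlen."""
    assert maxlen >= 1
    width = chars
    sampspace = 0
    for _ in range(maxlen):
        sampspace += width
        width *= chars
    width //= chars              # back to chars^maxlen
    assert 0 <= val < sampspace
    sz = maxlen
    # Peel off the top bin until val falls inside it.
    while val < sampspace - width:
        sampspace -= width
        width //= chars
        sz -= 1
    return sz


# chars = 2, maxlen = 3: values 0-1 map to 1, 2-5 to 2 and 6-13 to 3.
lengths = [getlen(2, 3, val) for val in range(14)]
```

Counting the entries of lengths reproduces the 2/14, 4/14 and 8/14 split.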
In retrospect, that second solution is a bit of a nightmare. In my personal opinion, I'd go for the first solution for its simplicity and to avoid the possibility of rather large numbers.
Building on my comment posted as a reply to the OP:
I'd consider it an exercise in base
conversion. You're simply generating a
"random number" in "base 26", where
a=0 and z=25. For a random string of
length n, generate a number between 1
and 26^n. Convert from base 10 to base
26, using symbols from your chosen
alphabet.
Here's a PHP implementation. I won't guarantee that there isn't an off-by-one error or two in here, but any such error should be minor:
<?php
$n = 5;
var_dump(randstr($n));
function randstr($maxlen) {
$dict = 'abcdefghijklmnopqrstuvwxyz';
$rand = rand(0, pow(strlen($dict), $maxlen) - 1);
$str = base_convert($rand, 10, 26);
//base_convert returns base 26 using the digits 0-9 and the 16 letters a-p;
//we must convert those to our own set of symbols
return strtr($str, '0123456789abcdefghijklmnop', $dict);
}
Instead of picking a length with uniform distribution, weight it according to how many strings are a given length. If your alphabet is size m, there are m^x strings of size x, and (1-m^(n+1))/(1-m) strings of length n or less. The probability of choosing a string of length x should be m^x*(1-m)/(1-m^(n+1)).
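To see the weights concretely, the probabilities can be computed exactly with Python's Fraction type. This is only a small sketch; the function name length_probabilities is mine, and like the formula it counts the empty string (length 0) as one of the possibilities.

```python
from fractions import Fraction


def length_probabilities(m, n):
    """Exact probability of each length 0..n for an alphabet of size m."""
    # Total number of strings of length n or less, including the empty
    # string; for m > 1 this equals (1 - m**(n + 1)) / (1 - m).
    total = sum(m ** x for x in range(n + 1))
    return [Fraction(m ** x, total) for x in range(n + 1)]


probs = length_probabilities(2, 3)   # lengths 0, 1, 2, 3 over 15 strings
```

For m = 2 and n = 3 there are 15 strings in total, so the weights come out as 1/15, 2/15, 4/15 and 8/15.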
Edit:
Regarding overflow - using floating point instead of integers will expand the range, so for a 26-character alphabet and single-precision floats, direct weight calculation shouldn't overflow for n<26.
A more robust approach is to deal with it iteratively. This should also minimize the effects of underflow:
int randomLength() {
for(int i = n; i > 0; i--) {
double d = Math.random();
// (m - 1) / (m - m^-i) is the probability of length i given length <= i
if(d < (m - 1) / (m - Math.pow(m, -i))) {
return i;
}
}
return 0;
}
To make this more efficient by calculating fewer random numbers, we can reuse them by splitting intervals in more than one place:
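int randomLength() {
int i = n;
// Handle five lengths per random number
for(; i >= 5; i -= 5) {
double d = Math.random();
// Probability of length i given length <= i
double c = (m - 1) / (m - Math.pow(m, -i));
for(int j = 0; j < 5; j++) {
if(d < c) {
return i - j;
}
d -= c; // move on to the next sub-interval
c /= m; // each shorter length is m times less likely
}
}
// Remaining lengths (fewer than five) one at a time
for(; i > 0; i--) {
double d = Math.random();
if(d < (m - 1) / (m - Math.pow(m, -i))) {
return i;
}
}
return 0;
}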
Edit: This answer isn't quite right. See the bottom for a disproof. I'll leave it up for now in the hope someone can come up with a variant that fixes it.
It's possible to do this without calculating the length separately - which, as others have pointed out, requires raising a number to a large power, and generally seems like a messy solution to me.
Proving that this is correct is a little tough, and I'm not sure I trust my expository powers to make it clear, but bear with me. For the purposes of the explanation, we're generating strings of length at most n from an alphabet a of |a| characters.
First, imagine you have a maximum length of n, and you've already decided you're generating a string of at least length n-1. It should be obvious that there are |a|+1 equally likely possibilities: we can generate any of the |a| characters from the alphabet, or we can choose to terminate with n-1 characters. To decide, we simply pick a random number x between 0 and |a| (inclusive); if x is |a|, we terminate at n-1 characters; otherwise, we append the xth character of a to the string. Here's a simple implementation of this procedure in Python:
def pick_character(alphabet):
x = random.randrange(len(alphabet) + 1)
if x == len(alphabet):
return ''
else:
return alphabet[x]
Now, we can apply this recursively. To generate the kth character of the string, we first attempt to generate the characters after k. If our recursive invocation returns anything, then we know the string should be at least length k, and we generate a character of our own from the alphabet and return it. If, however, the recursive invocation returns nothing, we know the string is no longer than k, and we use the above routine to select either the final character or no character. Here's an implementation of this in Python:
def uniform_random_string(alphabet, max_len):
if max_len == 1:
return pick_character(alphabet)
suffix = uniform_random_string(alphabet, max_len - 1)
if suffix:
# String contains characters after ours
return random.choice(alphabet) + suffix
else:
# String contains no characters after our own
return pick_character(alphabet)
If you doubt the uniformity of this function, you can attempt to disprove it: suggest a string for which there are two distinct ways to generate it, or none. If there are no such strings - and alas, I do not have a robust proof of this fact, though I'm fairly certain it's true - and given that the individual selections are uniform, then the result must also select any string with uniform probability.
As promised, and unlike every other solution posted thus far, no raising of numbers to large powers is required; no arbitrary length integers or floating point numbers are needed to store the result, and the validity, at least to my eyes, is fairly easy to demonstrate. It's also shorter than any fully-specified solution thus far. ;)
If anyone wants to chip in with a robust proof of the function's uniformity, I'd be extremely grateful.
Edit: Disproof, provided by a friend:
dato: so imagine alphabet = 'abc' and n = 2
dato: you have 9 strings of length 2, 3 of length 1, 1 of length 0
dato: that's 13 in total
dato: so probability of getting a length 2 string should be 9/13
dato: and probability of getting a length 1 or a length 0 should be 4/13
dato: now if you call uniform_random_string('abc', 2)
dato: that transforms itself into a call to uniform_random_string('abc', 1)
dato: which is an uniform distribution over ['a', 'b', 'c', '']
dato: the first three of those yield all the 2 length strings
dato: and the latter produce all the 1 length strings and the empty strings
dato: but 0.75 > 9/13
dato: and 0.25 < 4/13
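The disproof can also be checked mechanically. The sketch below is my own illustration, not part of the original answer: it computes the exact output distribution of the recursive function with fractions instead of sampling, mirroring pick_character and uniform_random_string branch by branch.

```python
from collections import defaultdict
from fractions import Fraction


def pick_character_dist(alphabet):
    """Exact distribution of pick_character: each character or '' equally."""
    p = Fraction(1, len(alphabet) + 1)
    dist = {c: p for c in alphabet}
    dist[""] = p
    return dist


def string_dist(alphabet, max_len):
    """Exact distribution of uniform_random_string(alphabet, max_len)."""
    if max_len == 1:
        return pick_character_dist(alphabet)
    out = defaultdict(Fraction)
    for suffix, p in string_dist(alphabet, max_len - 1).items():
        if suffix:
            # random.choice(alphabet) + suffix
            for c in alphabet:
                out[c + suffix] += p / len(alphabet)
        else:
            # fall back to pick_character(alphabet)
            for s, q in pick_character_dist(alphabet).items():
                out[s] += p * q
    return dict(out)


dist = string_dist("abc", 2)
p_len2 = sum(p for s, p in dist.items() if len(s) == 2)
```

Here p_len2 comes out as 3/4 rather than 9/13, and a single two-character string gets 1/12 while a single one-character string gets 1/16.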
// Note space as an available char
alphabet = "abcdefghijklmnopqrstuvwxyz "
result_string = ""
for( ;; )
{
s = ""
for( i = 0; i < n; i++ )
s += alphabet[rand(0, 26)]
first_space = n;
for( i = 0; i < n; i++ )
if( s[ i ] == ' ' )
{
first_space = i;
break;
}
ok = true;
// Reject "duplicate" shorter strings
for( i = first_space + 1; i < n; i++ )
if( s[ i ] != ' ' )
{
ok = false;
break;
}
if( !ok )
continue;
// Extract the short version of the string
for( i = 0; i < first_space; i++ )
result_string += s[ i ];
break;
}
Edit: I forgot to disallow 0-length strings, that will take a bit more code which I don't have time to add now.
Edit: After considering how my answer doesn't scale to large n (takes too long to get lucky and find an accepted string), I like paxdiablo's answer much better. Less code too.
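For reference, the rejection idea can be written quite compactly in Python. This is a sketch under my own assumptions: it uses None as the padding symbol instead of a space, it also rejects the empty string, and the function name rejection_string is made up. As noted above, the acceptance rate drops quickly as n grows.

```python
import random


def rejection_string(alphabet, n, rng=random):
    """Uniform string of length 1..n by rejecting badly padded draws."""
    symbols = list(alphabet) + [None]    # None plays the role of the space
    while True:
        draw = [rng.choice(symbols) for _ in range(n)]
        # The string ends at the first padding symbol.
        length = draw.index(None) if None in draw else n
        if length == 0:
            continue                     # disallow the empty string
        if any(c is not None for c in draw[length:]):
            continue                     # padding must be trailing only
        return "".join(draw[:length])
```

Every accepted draw corresponds to exactly one string and all draws are equally likely, which is why the accepted strings are uniform.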
Personally I'd do it like this:
Let's say your alphabet has Z characters. Then the number of possible strings for each length L is:
L | Z ^ L (for Z = 26)
--------------------------
1 | 26
2 | 676 (= 26 * 26)
3 | 17576 (= 26 * 26 * 26)
...and so on.
Now let's say your maximum desired length is N. Then the total number of possible strings from length 1 to N that your function could generate would be the sum of a geometric sequence:
(Z ^ (N + 1) - Z) / (Z - 1)
Let's call this value S. Then the probability of generating a string of any length L should be:
(Z ^ L) / S
OK, fine. This is all well and good; but how do we generate a random number given a non-uniform probability distribution?
The short answer is: you don't. Get a library to do that for you. I develop mainly in .NET, so one I might turn to would be Math.NET.
That said, it's really not so hard to come up with a rudimentary approach to doing this on your own.
Here's one way: take a generator that gives you a random value within a known uniform distribution, and assign ranges within that distribution of sizes dependent on your desired distribution. Then interpret the random value provided by the generator by determining which range it falls into.
Here's an example in C# of one way you could implement this idea (scroll to the bottom for example output):
RandomStringGenerator class
public class RandomStringGenerator
{
private readonly Random _random;
private readonly char[] _alphabet;
public RandomStringGenerator(string alphabet)
{
if (string.IsNullOrEmpty(alphabet))
throw new ArgumentException("alphabet");
_random = new Random();
_alphabet = alphabet.Distinct().ToArray();
}
public string NextString(int maxLength)
{
// Get a value randomly distributed between 0.0 and 1.0 --
// this is approximately what the System.Random class provides.
double value = _random.NextDouble();
// This is where the magic happens: we "translate" the above number
// to a length based on our computed probability distribution for the given
// alphabet and the desired maximum string length.
int length = GetLengthFromRandomValue(value, _alphabet.Length, maxLength);
// The rest is easy: allocate a char array of the length determined above...
char[] chars = new char[length];
// ...populate it with a bunch of random values from the alphabet...
for (int i = 0; i < length; ++i)
{
chars[i] = _alphabet[_random.Next(0, _alphabet.Length)];
}
// ...and return a newly constructed string.
return new string(chars);
}
static int GetLengthFromRandomValue(double value, int alphabetSize, int maxLength)
{
// Looping really might not be the smartest way to do this,
// but it's the most obvious way that immediately springs to my mind.
for (int length = 1; length <= maxLength; ++length)
{
Range r = GetRangeForLength(length, alphabetSize, maxLength);
if (r.Contains(value))
return length;
}
return maxLength;
}
static Range GetRangeForLength(int length, int alphabetSize, int maxLength)
{
int L = length;
int Z = alphabetSize;
int N = maxLength;
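// Number of strings of length 1..N: Z + Z^2 + ... + Z^N
double possibleStrings = (Math.Pow(Z, N + 1) - Z) / (Z - 1);
double stringsOfGivenLength = Math.Pow(Z, L);
// Number of strings of length 1..L-1 (zero when L is 1)
double possibleSmallerStrings = (Math.Pow(Z, L) - Z) / (Z - 1);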
double probabilityOfGivenLength = ((double)stringsOfGivenLength / possibleStrings);
double probabilityOfShorterLength = ((double)possibleSmallerStrings / possibleStrings);
double startPoint = probabilityOfShorterLength;
double endPoint = probabilityOfShorterLength + probabilityOfGivenLength;
return new Range(startPoint, endPoint);
}
}
Range struct
public struct Range
{
public readonly double StartPoint;
public readonly double EndPoint;
public Range(double startPoint, double endPoint)
: this()
{
this.StartPoint = startPoint;
this.EndPoint = endPoint;
}
public bool Contains(double value)
{
return this.StartPoint <= value && value <= this.EndPoint;
}
}
Test
static void Main(string[] args)
{
const int N = 5;
const string alphabet = "acegikmoqstvwy";
int Z = alphabet.Length;
var rand = new RandomStringGenerator(alphabet);
var strings = new List<string>();
for (int i = 0; i < 100000; ++i)
{
strings.Add(rand.NextString(N));
}
Console.WriteLine("First 10 results:");
for (int i = 0; i < 10; ++i)
{
Console.WriteLine(strings[i]);
}
// sanity check
double sumOfProbabilities = 0.0;
for (int i = 1; i <= N; ++i)
{
double probability = Math.Pow(Z, i) / ((1 - (Math.Pow(Z, N + 1))) / (1 - Z));
int numStrings = strings.Count(str => str.Length == i);
Console.WriteLine("# strings of length {0}: {1} (probability = {2:0.00%})", i, numStrings, probability);
sumOfProbabilities += probability;
}
Console.WriteLine("Probabilities sum to {0:0.00%}.", sumOfProbabilities);
Console.ReadLine();
}
Output:
First 10 results:
wmkyw
qqowc
ackai
tokmo
eeiyw
cakgg
vceec
qwqyq
aiomt
qkyav
# strings of length 1: 1 (probability = 0.00%)
# strings of length 2: 38 (probability = 0.03%)
# strings of length 3: 475 (probability = 0.47%)
# strings of length 4: 6633 (probability = 6.63%)
# strings of length 5: 92853 (probability = 92.86%)
Probabilities sum to 100.00%.
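The same range lookup can be sketched more compactly in Python with cumulative counts and a binary search. This is an illustration under my own assumptions, not a port of the class above: length_for_value is a made-up name, and it scales the random value up to the integer counts instead of building Range objects.

```python
from bisect import bisect_right
from itertools import accumulate


def length_for_value(value, alphabet_size, max_length):
    """Translate a uniform value in [0, 1) into a string length."""
    # Cumulative number of strings of length 1, 1..2, ..., 1..max_length.
    cumulative = list(accumulate(alphabet_size ** length
                                 for length in range(1, max_length + 1)))
    total = cumulative[-1]
    # Find the first range whose upper end lies above the scaled value.
    index = bisect_right(cumulative, value * total)
    # Guard against rounding pushing the index past the last range.
    return min(index, max_length - 1) + 1
```

A caller would pass something like random.random() as value and then draw that many characters from the alphabet.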
My idea regarding this is as follows:
You want strings of length 1 to n. There are 26 possible strings of length 1, 26*26 strings of length 2, and so on.
You can find out what percentage of the total number of possible strings each length accounts for. For example, the percentage for strings of length 1 is
((26/(TOTAL_POSSIBLE_STRINGS_OF_ALL_LENGTH))*100).
Similarly you can find the percentage for the other lengths.
Mark them on a number line from 0 to 100. For example, suppose the percentage for length 1 is 3 and for length 2 is 6; then on the number line length 1 occupies 0-3, length 2 occupies 3-9, and so on.
Now take a random number between 0 and 100 and find the range in which it lies. For example, if the number you have randomly chosen is 2, it lies in 0-3, so go for a string of length 1; if the random number chosen is 7, go for a string of length 2.
In this fashion the chance of choosing each length is proportional to the share of all possible strings that have that length.
Hope I am clear.
Disclaimer: I have not gone through the above solutions except one or two, so if this matches someone else's solution it is purely by chance.
Also, I welcome all advice and constructive criticism; please correct me if I am wrong.
Thanks and regards,
Mawia
Matthieu: Your idea doesn't work because strings with blanks are still more likely to be generated. In your case, with n=4, you could have the string 'ab' generated as 'a' + 'b' + '' + '' or '' + 'a' + 'b' + '', or other combinations. Thus not all the strings have the same chance of appearing.

Resources