I'd like to know how you can take the exact n-th root of a number (in any programming language). When I use a physical calculator, I can type something like sqrt(12) (nicely formatted of course) and get as a result 2 sqrt(3). How can I achieve this not only with square roots but any type of root when representing a number as numerator and denominator. Of course, I would have to use another representation, but I don't have any idea how this works in general.
Thanks in advance.
I doubt this is an efficient way, but it would work. Assuming you want to take the nth root of some number m:
Calculate the prime factorization m = p1^a1 * p2^a2 * ... * px^ax.
For each 1 <= i <= x let ki = ai div n and ri = ai mod n.
The part that gets factored out is then p1^k1 * p2^k2 * ... * px^kx.
The part that remains "under the root" is p1^r1 * p2^r2 * ... * px^rx.
The first step is the only tricky one. Once you have found all prime factors of m it is just a matter of looping over those factors and dividing out the multiples of n.
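The four steps above can be sketched in Python roughly as follows. This is only an illustration: the function names are made up for this example, and the factorization uses plain trial division, which is fine for small m but slow for large ones.

```python
def prime_factorization(m):
    """Return {prime: exponent} for an integer m >= 1 (trial division)."""
    factors = {}
    p = 2
    while p * p <= m:
        while m % p == 0:
            factors[p] = factors.get(p, 0) + 1
            m //= p
        p += 1
    if m > 1:
        # whatever is left after trial division is itself prime
        factors[m] = factors.get(m, 0) + 1
    return factors


def simplify_root(m, n):
    """Return (outside, inside) with m^(1/n) == outside * inside^(1/n)."""
    outside, inside = 1, 1
    for p, a in prime_factorization(m).items():
        k, r = divmod(a, n)   # k = a div n, r = a mod n
        outside *= p ** k     # the part that gets factored out
        inside *= p ** r      # the part that stays under the root
    return outside, inside


print(simplify_root(12, 2))   # square root of 12
print(simplify_root(250, 3))  # cube root of 250
```

For a fraction you could apply the same function to the numerator and the denominator separately.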
To simplify the n-th root of a number, the algorithm shouldn't do prime factorisation, but rather "n-th power factorisation", i.e. look for the largest n-th power inside the root, which you can then move outside the root. For example: the 3rd root of 250 equals the third root of 2 x 125; since 125 is the third power of 5, you can move it out of the root and get: 5 times the third root of 2.
Algorithm: take the floating-point n-th root of the number, and round it down, then check this and all smaller integers until you find the largest integer whose n-th power divides the number; then divide the number by the n-th power and move the integer out of the root.
This JavaScript example shows a basic implementation; you could clean it up further by printing 1^(1/root) simply as 1; further optimisation is undoubtedly possible.
function integerRoot(number, root) {
    var base = number, factor = 1;
    // Math.pow can land slightly below an exact integer root, so round rather than floor;
    // overshooting by one is harmless because that power will not divide base
    var max = Math.round(Math.pow(base, 1/root));
    for (var i = max; i > 1; i--) {
        var power = Math.pow(i, root);
        if (base % power == 0) {
            base /= power;
            factor *= i;
            break;
        }
    }
    document.write(number + "<SUP>1/" + root + "</SUP> = " +
        factor + " × " + base + "<SUP>1/" + root + "</SUP><BR>");
}
integerRoot(25, 3);
integerRoot(27, 3);
integerRoot(81, 3);
integerRoot(135, 3);
integerRoot(375, 3);
integerRoot(8*27*64*17, 3);
UPDATE: This is a more efficient version; I haven't yet taken negative numbers into account, though, so there's definitely room for further improvement.
function simplifyRoot(radicand, degree) {
var factor = 1, base = 1, power;
while ((power = Math.pow(++base, degree)) <= radicand) {
while (radicand % power == 0) {
factor *= base;
radicand /= power;
}
}
return {factor: factor, radicand: radicand, degree: degree};
}
var radicand = 8*27*36*64*125*216, degree = 3;
var simplified = simplifyRoot(radicand, degree);
document.write(radicand + "<SUP>1/" + degree + "</SUP> = " +
simplified.factor + " × " + simplified.radicand + "<SUP>1/" + simplified.degree + "</SUP><BR>");
I have a very large set (billions of elements or more; it's expected to grow exponentially up to some level), and I want to generate seemingly random elements from it without repeating. I know I can pick a random number, record the elements I have already generated, and retry on a repeat, but that takes more and more memory as numbers are generated, and wouldn't be practical after a couple of million elements.
I mean, I could say 1, 2, 3 up to billions and each would be constant time without remembering all the previous, or I can say 1,3,5,7,9 and on then 2,4,6,8,10, but is there a more sophisticated way to do that and eventually get a seemingly random permutation of that set?
Update
1, The set does not change size in the generation process. I meant when the user's input increases linearly, the size of the set increases exponentially.
2, In short, the set is like the set of every integer from 1 to 10 billions or more.
3, In long, it goes up to 10 billion because each element carries the information of many independent choices. For example, imagine an RPG character that has 10 attributes, each of which can go from 1 to 100 (for my problem different choices can have different ranges), so there are 10^20 possible characters; the number "10873456879326587345" would correspond to a character that has "11, 88, 35...". I would like an algorithm that generates them one by one without repeating, but makes it look random.
Thanks for the interesting question. You can create a "pseudorandom"* (cyclic) permutation with a few bytes using modular exponentiation. Say we have n elements. Search for a prime p that's bigger than n+1. Then find a primitive root g modulo p. Basically by definition of a primitive root, the action x --> (g * x) % p is a cyclic permutation of {1, ..., p-1}. And so x --> ((g * (x+1))%p) - 1 is a cyclic permutation of {0, ..., p-2}. We can get a cyclic permutation of {0, ..., n-1} by applying the previous permutation again whenever it gives a value greater than or equal to n.
I implemented this idea as a Go package. https://github.com/bwesterb/powercycle
package main
import (
"fmt"
"github.com/bwesterb/powercycle"
)
func main() {
var x uint64
cycle := powercycle.New(10)
for i := 0; i < 10; i++ {
fmt.Println(x)
x = cycle.Apply(x)
}
}
This outputs something like
0
6
4
1
2
9
3
5
8
7
but that might of course vary depending on the generator chosen.
It's fast, but not super-fast: on my five year old i7 it takes less than 210ns to compute one application of a cycle on 1000000000000000 elements. More details:
BenchmarkNew10-8 1000000 1328 ns/op
BenchmarkNew1000-8 500000 2566 ns/op
BenchmarkNew1000000-8 50000 25893 ns/op
BenchmarkNew1000000000-8 200000 7589 ns/op
BenchmarkNew1000000000000-8 2000 648785 ns/op
BenchmarkApply10-8 10000000 170 ns/op
BenchmarkApply1000-8 10000000 173 ns/op
BenchmarkApply1000000-8 10000000 172 ns/op
BenchmarkApply1000000000-8 10000000 169 ns/op
BenchmarkApply1000000000000-8 10000000 201 ns/op
BenchmarkApply1000000000000000-8 10000000 204 ns/op
Why did I say "pseudorandom"? Well, we are always creating a very specific kind of cycle: namely one that uses modular exponentiation. It looks pretty pseudorandom though.
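For readers who don't use Go, here is a rough Python sketch of the same idea. It is not the powercycle package's code: the helper names are invented for this example, and the prime and primitive-root searches use naive trial division, so treat it as an illustration for modest n rather than something tuned for 10^15 elements.

```python
def is_prime(k):
    """Naive primality test by trial division."""
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True


def distinct_prime_factors(k):
    """Distinct prime factors of k, by trial division."""
    found, p = [], 2
    while p * p <= k:
        if k % p == 0:
            found.append(p)
            while k % p == 0:
                k //= p
        p += 1
    if k > 1:
        found.append(k)
    return found


def find_primitive_root(p):
    """Smallest g whose powers reach every value in {1, ..., p-1} modulo the prime p."""
    qs = distinct_prime_factors(p - 1)
    for g in range(2, p):
        # g is a primitive root iff g^((p-1)/q) != 1 for every prime q dividing p-1
        if all(pow(g, (p - 1) // q, p) != 1 for q in qs):
            return g
    return 1  # only reached for p == 2


def make_cycle(n):
    """Return a function that maps x to its successor in a cyclic permutation of 0..n-1."""
    p = n + 2                      # search for a prime bigger than n + 1
    while not is_prime(p):
        p += 1
    g = find_primitive_root(p)

    def apply(x):
        while True:
            x = (g * (x + 1)) % p - 1   # cyclic permutation of {0, ..., p-2}
            if x < n:                   # skip values outside the set
                return x
    return apply


step = make_cycle(10)
x = 0
for _ in range(10):
    print(x)
    x = step(x)
```

The only state you need to keep between calls is the current value x, which is what makes this attractive for very large sets.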
I would use a random number and swap it with an element at the beginning of the set.
Here's some pseudo code
set = [1, 2, 3, 4, 5, 6]
picked = 0
Function PickNext(set, picked)
    If picked > Len(set) - 1 Then
        Return Nothing
    End If
    // random number between picked (inclusive) and length (exclusive)
    r = RandomInt(picked, Len(set))
    // swap the picked element to the beginning of the set
    result = set[r]
    set[r] = set[picked]
    set[picked] = result
    // update picked
    picked++
    // return your next random element
    Return result
End Function
Every time you pick an element there is one swap and the only extra memory being used is the picked variable. The swap can happen if the elements are in a database or in memory.
EDIT Here's a jsfiddle of a working implementation http://jsfiddle.net/sun8rw4d/
JavaScript
var set = [];
set.picked = 0;
function pickNext(set) {
if(set.picked > set.length - 1) { return null; }
var r = set.picked + Math.floor(Math.random() * (set.length - set.picked));
var result = set[r];
set[r] = set[set.picked];
set[set.picked] = result;
set.picked++;
return result;
}
// testing
for(var i=0; i<100; i++) {
set.push(i);
}
while(pickNext(set) !== null) { }
document.body.innerHTML += set.toString();
EDIT 2 Finally, a random binary walk of the set. This can be accomplished with O(log2(N)) stack space (memory), which for 10 billion is only about 33 levels. There's no shuffling or swapping involved. Using ternary instead of binary might yield even better pseudo-random results.
// on the fly set generator
var count = 0;
var maxValue = 64;
function nextElement() {
// restart the generation
if(count == maxValue) {
count = 0;
}
return count++;
}
// code to pseudo randomly select elements
var current = 0;
var stack = [0, maxValue - 1];
function randomBinaryWalk() {
if(stack.length == 0) { return null; }
var high = stack.pop();
var low = stack.pop();
var mid = ((high + low) / 2) | 0;
// pseudo randomly choose the next path
if(Math.random() > 0.5) {
if(low <= mid - 1) {
stack.push(low);
stack.push(mid - 1);
}
if(mid + 1 <= high) {
stack.push(mid + 1);
stack.push(high);
}
} else {
if(mid + 1 <= high) {
stack.push(mid + 1);
stack.push(high);
}
if(low <= mid - 1) {
stack.push(low);
stack.push(mid - 1);
}
}
// how many elements to skip
var toMid = (current < mid ? mid - current : (maxValue - current) + mid);
// skip elements
for(var i = 0; i < toMid - 1; i++) {
nextElement();
}
current = mid;
// get result
return nextElement();
}
// test
var result;
var list = [];
do {
result = randomBinaryWalk();
list.push(result);
} while(result !== null);
document.body.innerHTML += '<br/>' + list.toString();
Here are the results from a couple of runs with a small set of 64 elements. JSFiddle http://jsfiddle.net/yooLjtgu/
30,46,38,34,36,35,37,32,33,31,42,40,41,39,44,45,43,54,50,52,53,51,48,47,49,58,60,59,61,62,56,57,55,14,22,18,20,19,21,16,15,17,26,28,29,27,24,25,23,6,2,4,5,3,0,1,63,10,8,7,9,12,11,13
30,14,22,18,16,15,17,20,19,21,26,28,29,27,24,23,25,6,10,8,7,9,12,13,11,2,0,63,1,4,5,3,46,38,42,44,45,43,40,41,39,34,36,35,37,32,31,33,54,58,56,55,57,60,59,61,62,50,48,49,47,52,51,53
As I mentioned in my comment, unless you have an efficient way to skip to a specific point in your "on the fly" generation of the set this will not be very efficient.
If it is enumerable, then use a pseudo-random integer generator adjusted to the period 0 .. 2^n - 1, where the upper bound is just greater than the size of your set, and generate pseudo-random integers, discarding those that are not smaller than the size of your set. Use those integers to index items from your set.
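One way to get such a generator is a linear congruential generator with a power-of-two modulus. The sketch below is only an example under my own assumptions: the function name and the constants a = 5 and c = 3 are chosen for illustration. They satisfy the usual full-period conditions for a power-of-two modulus (c odd, a - 1 divisible by 4), so every value appears exactly once per cycle, but the output will not look very random; a better-mixed full-period generator could be dropped in the same way.

```python
def lcg_permutation(set_size, seed=0, a=5, c=3):
    """Yield every index in range(set_size) exactly once, in a scrambled order."""
    # smallest power of two that is >= set_size (at least 2)
    bits = max(1, (set_size - 1).bit_length())
    modulus = 1 << bits
    state = seed % modulus
    for _ in range(modulus):           # one full period of the generator
        if state < set_size:           # discard values outside the set
            yield state
        state = (a * state + c) % modulus


print(list(lcg_permutation(10)))
```

Because the modulus is less than twice the set size, on average fewer than half of the generated values are thrown away.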
Precompute a series of indices (e.g. in a file) that has the properties you need, then randomly choose a start index for your enumeration and use the series in a round-robin manner.
The length of your precomputed series should be > the maximum size of the set.
If you combine this (depending on your programming language etc.) with file mappings, your final nextIndex(INOUT state) function is (nearly) as simple as return mappedIndices[state++ % PERIOD];, if you have a fixed size for each entry (e.g. 8 bytes -> uint64_t).
Of course, the returned value could be > your current set size. Simply draw indices until you get one which is <= your set's current size.
Update (In response to question-update):
There is another option to achieve your goal if it is about creating 10 billion unique characters in your RPG: generate a GUID and write yourself a function which computes your number from the GUID. See man uuid if you are on a Unix system; otherwise search for it. Some parts of a UUID are not random but contain meta-info; other parts are either systematic (such as your network card's MAC address) or random, depending on the generator algorithm. But they are very likely to be unique. So, whenever you need a new unique number, generate a UUID and transform it to your number by means of some algorithm which basically maps the UUID bytes to your number in a non-trivial way (e.g. use hash functions).
I'm working on a Free Code Camp problem - http://www.freecodecamp.com/challenges/bonfire-no-repeats-please
The problem description is as follows -
Return the number of total permutations of the provided string that
don't have repeated consecutive letters. For example, 'aab' should
return 2 because it has 6 total permutations, but only 2 of them don't
have the same letter (in this case 'a') repeating.
I know I can solve this by writing a program that creates every permutation and then filters out the ones with repeated characters.
But I have this gnawing feeling that I can solve this mathematically.
First question then - Can I?
Second question - If yes, what formula could I use?
To elaborate further -
The example given in the problem is "aab" which the site says has six possible permutations, with only two meeting the non-repeated character criteria:
aab aba baa aab aba baa
The problem sees each character as unique so maybe "aab" could better be described as "a1a2b"
The tests for this problem are as follows (returning the number of permutations that meet the criteria)-
"aab" should return 2
"aaa" should return 0
"abcdefa" should return 3600
"abfdefa" should return 2640
"zzzzzzzz" should return 0
I have read through a lot of posts about combinatorics and permutations and just seem to be digging a deeper hole for myself. But I really want to try to resolve this problem efficiently rather than brute force through an array of all possible permutations.
I posted this question on math.stackexchange - https://math.stackexchange.com/q/1410184/264492
The maths to resolve the case where only one character is repeated is pretty trivial - Factorial of total number of characters minus number of spaces available multiplied by repeated characters.
"aab" = 3! - 2! * 2! = 2
"abcdefa" = 7! - 6! * 2! = 3600
But trying to figure out the formula for the instances where more than one character is repeated has eluded me. e.g. "abfdefa"
This is a mathematical approach, that doesn't need to check all the possible strings.
Let's start with this string:
abfdefa
To find the solution we have to calculate the total number of permutations (without restrictions), and then subtract the invalid ones.
TOTAL OF PERMUTATIONS
We have to fill a number of positions, that is equal to the length of the original string. Let's consider each position a small box.
So, if we have
abfdefa
which has 7 characters, there are seven boxes to fill. We can fill the first with any of the 7 characters, the second with any of the remaining 6, and so on. So the total number of permutations, without restrictions, is:
7 * 6 * 5 * 4 * 3 * 2 * 1 = 7! (= 5,040)
INVALID PERMUTATIONS
Any permutation with two equal characters side by side is not valid. Let's see how many of those we have.
To calculate them, we'll treat any run of identical characters sitting side by side as occupying a single box. As they have to be together, why not consider them a kind of "compound" character?
Our example string has two repeated characters: the 'a' appears twice, and the 'f' also appears twice.
Number of permutations with 'aa'
Now we have only six boxes, as one of them will be filled with 'aa':
6 * 5 * 4 * 3 * 2 * 1 = 6!
We also have to consider that the two 'a' can be themselves permuted in 2! (as we have two 'a') ways.
So, the total number of permutations with two 'a' together is:
6! * 2! (= 1,440)
Number of permutations with 'ff'
Of course, as we also have two 'f', the number of permutations with 'ff' will be the same as the ones with 'aa':
6! * 2! (= 1,440)
OVERLAPS
If we had only one character repeated, the problem is finished, and the final result would be TOTAL - INVALID permutations.
But, if we have more than one repeated character, we have counted some of the invalid strings twice or more times.
We have to notice that some of the permutations with two 'a' together, will also have two 'f' together, and vice versa, so we need to add those back.
How do we count them?
As we have two repeated characters, we will consider two "compound" boxes: one for occurrences of 'aa' and other for 'ff' (both at the same time).
So now we have to fill 5 boxes: one with 'aa', other with 'ff', and 3 with the remaining 'b', 'd' and 'e'.
Also, each of those 'aa' and 'ff' can be permuted in 2! ways. So the total number of overlaps is:
5! * 2! * 2! (= 480)
FINAL SOLUTION
The final solution to this problem will be:
TOTAL - INVALID + OVERLAPS
And that's:
7! - (2 * 6! * 2!) + (5! * 2! * 2!) = 5,040 - 2 * 1,440 + 480 = 2,640
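The same reasoning extends to any number of doubled letters, and it can be written as a short Python sketch. Note the assumption: it only covers strings in which no letter occurs more than twice, since that is the case worked through above; letters occurring three or more times need a more involved count. The function names are made up for this example, and the brute-force counter is included only as a cross-check.

```python
from collections import Counter
from itertools import permutations
from math import comb, factorial


def count_by_inclusion_exclusion(s):
    """Count arrangements with no equal neighbours; every letter may occur at most twice."""
    counts = Counter(s)
    if any(v > 2 for v in counts.values()):
        raise ValueError("this sketch only handles letters that occur at most twice")
    n = len(s)
    d = sum(1 for v in counts.values() if v == 2)   # number of doubled letters
    # Choose k of the d pairs to glue together: n - k boxes remain,
    # and each glued pair can be ordered in 2! ways.
    return sum((-1) ** k * comb(d, k) * factorial(n - k) * 2 ** k
               for k in range(d + 1))


def count_by_brute_force(s):
    """Check every permutation (positions treated as distinct), for small inputs only."""
    return sum(all(x != y for x, y in zip(p, p[1:])) for p in permutations(s))


for word in ("aab", "abcdefa", "abfdefa"):
    print(word, count_by_inclusion_exclusion(word), count_by_brute_force(word))
```

For "abfdefa" the sum is exactly the TOTAL - INVALID + OVERLAPS expression above, with k = 0, 1 and 2 glued pairs.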
It seemed like a straightforward enough problem, but I spent hours on the wrong track before finally figuring out the correct logic. To find all permutations of a string with one or multiple repeated characters, while keeping identical characters separated:
Start with a string like:
abcdabc
Separate the first occurrences from the repeats:
firsts: abcd
repeats: abc
Find all permutations of the firsts:
abcd abdc adbc adcb ...
Then, one by one, insert the repeats into each permutation, following these rules:
Start with the repeated character whose original comes first in the firsts
e.g. when inserting abc into dbac, use b first
Put the repeat two places or more after the first occurrence
e.g. when inserting b into dbac, results are dbabc and dbacb
Then recurse for each result with the remaining repeated characters
I've seen this question with one repeated character, where the number of permutations of abcdefa in which the two a's are kept separate is given as 3600. However, this way of counting considers abcdefa and abcdefa to be two distinct permutations, "because the a's are swapped". In my opinion, this is just one permutation and its double, and the correct answer is 1800; the algorithm below will return both results.
function seperatedPermutations(str) {
var total = 0, firsts = "", repeats = "";
for (var i = 0; i < str.length; i++) {
var char = str.charAt(i);
if (str.indexOf(char) == i) firsts += char; else repeats += char;
}
var firsts = stringPermutator(firsts);
for (var i = 0; i < firsts.length; i++) {
insertRepeats(firsts[i], repeats);
}
alert("Permutations of \"" + str + "\"\ntotal: " + (Math.pow(2, repeats.length) * total) + ", unique: " + total);
// RECURSIVE CHARACTER INSERTER
function insertRepeats(firsts, repeats) {
var pos = -1;
for (var i = 0; i < firsts.length && pos < 0; i++) {
pos = repeats.indexOf(firsts.charAt(i));
}
var char = repeats.charAt(pos);
for (var i = firsts.indexOf(char) + 2; i <= firsts.length; i++) {
var combi = firsts.slice(0, i) + char + firsts.slice(i);
if (repeats.length > 1) {
insertRepeats(combi, repeats.slice(0, pos) + repeats.slice(pos + 1));
} else {
document.write(combi + "<BR>");
++total;
}
}
}
// STRING PERMUTATOR (after Filip Nguyen)
function stringPermutator(str) {
var fact = [1], permutations = [];
for (var i = 1; i <= str.length; i++) fact[i] = i * fact[i - 1];
for (var i = 0; i < fact[str.length]; i++) {
var perm = "", temp = str, code = i;
for (var pos = str.length; pos > 0; pos--) {
var sel = Math.floor(code / fact[pos - 1]);
perm += temp.charAt(sel);
code = code % fact[pos - 1];
temp = temp.substring(0, sel) + temp.substring(sel + 1);
}
permutations.push(perm);
}
return permutations;
}
}
seperatedPermutations("abfdefa");
A calculation based on this logic of the number of results for a string like abfdefa, with 5 "first" characters and 2 repeated characters (A and F) , would be:
The 5 "first" characters create 5! = 120 permutations
Each character can be in 5 positions, with 24 permutations each:
A**** (24)
*A*** (24)
**A** (24)
***A* (24)
****A (24)
For each of these positions, the repeat character has to come at least 2 places after its "first", so that makes 4, 3, 2 and 1 places respectively (for the last position, a repeat is impossible). With the repeated character inserted, this makes 240 permutations:
A***** (24 * 4)
*A**** (24 * 3)
**A*** (24 * 2)
***A** (24 * 1)
In each of these cases, the second character that will be repeated could be in 6 places, and the repeat character would have 5, 4, 3, 2, and 1 place to go. However, the second (F) character cannot be in the same place as the first (A) character, so one of the combinations is always impossible:
A****** (24 * 4 * (0+4+3+2+1)) = 24 * 4 * 10 = 960
*A***** (24 * 3 * (5+0+3+2+1)) = 24 * 3 * 11 = 792
**A**** (24 * 2 * (5+4+0+2+1)) = 24 * 2 * 12 = 576
***A*** (24 * 1 * (5+4+3+0+1)) = 24 * 1 * 13 = 312
And 960 + 792 + 576 + 312 = 2640, the expected result.
Or, for any string like abfdefa with 2 repeats:
where F is the number of "firsts".
To calculate the total without identical permutations (which I think makes more sense) you'd divide this number by 2^R, where R is the number of repeats.
Here's one way to think about it, which still seems a bit complicated to me: subtract the count of possibilities with disallowed neighbors.
For example abfdefa:
There are 6 ways to place "aa" or "ff" between the 5! ways to arrange the other five
letters, so altogether 5! * 6 * 2, multiplied by their number of permutations (2).
Based on the inclusion-exclusion principle, we subtract those possibilities that include
both "aa" and "ff" from the count above: 3! * (2 + 4 - 1) choose 2 ways to place both
"aa" and "ff" around the other three letters, and we must multiply by the permutation
counts within (2 * 2) and between (2).
So altogether,
7! - (5! * 6 * 2 * 2 - 3! * (2 + 4 - 1) choose 2 * 2 * 2 * 2) = 2640
I used the formula for multiset combinations for the count of ways to place the letter pairs between the rest.
A generalizable way that might achieve some improvement over the brute force solution is to enumerate the ways to interleave the letters with repeats and then multiply by the ways to partition the rest around them, taking into account the spaces that must be filled. The example, abfdefa, might look something like this:
afaf / fafa => (5 + 3 - 1) choose 3 // all ways to partition the rest
affa / faaf => 1 + 4 + (4 + 2 - 1) choose 2 // all three in the middle; two in the middle, one anywhere else; one in the middle, two anywhere else
aaff / ffaa => 3 + 1 + 1 // one in each required space, the other anywhere else; two in one required space, one in the other (x2)
Finally, multiply by the permutation counts, so altogether:
2 * 2! * 2! * 3! * ((5 + 3 - 1) choose 3 + 1 + 4 + (4 + 2 - 1) choose 2 + 3 + 1 + 1) = 2640
Well I won't have any mathematical solution for you here.
I gather from your question that you know backtracking, so you can use it to generate all permutations, skipping a particular permutation whenever a repeat is encountered. This method is called backtracking with pruning.
Let n be the length of the solution string, say (a1, a2, ..., an).
During backtracking, when only a partial solution has been formed, say (a1, a2, ..., ak), compare the values at ak and a(k-1).
Obviously you need to maintain a reference to the previous letter (here a(k-1)).
If both are the same, abandon the partial solution without extending it to the end, and move on to the next candidate.
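A minimal Python sketch of that pruning idea is below. The function name is made up for this example, and each character position is treated as distinct (so "aab" counts 2), which matches the way the challenge counts. It is still exponential in the worst case; the pruning only avoids finishing permutations that are already invalid.

```python
def count_no_adjacent_repeats(s):
    """Count permutations of s (positions distinct) with no equal neighbouring letters."""
    used = [False] * len(s)

    def extend(previous, depth):
        if depth == len(s):
            return 1                    # a complete valid permutation
        total = 0
        for i, ch in enumerate(s):
            if used[i] or ch == previous:
                continue                # prune: position taken, or it would repeat a(k-1)
            used[i] = True
            total += extend(ch, depth + 1)
            used[i] = False             # backtrack
        return total

    return extend(None, 0)


for word in ("aab", "aaa", "abcdefa", "abfdefa"):
    print(word, count_no_adjacent_repeats(word))
```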
Thanks Lurai for great suggestion. It took a while and is a bit lengthy but here's my solution (it passes all test cases at FreeCodeCamp after converting to JavaScript of course) - apologies for crappy variables names (learning how to be a bad programmer too ;)) :D
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
public class PermAlone {
public static int permAlone(String str) {
int length = str.length();
int total = 0;
int invalid = 0;
int overlap = 0;
ArrayList<Integer> vals = new ArrayList<>();
Map<Character, Integer> chars = new HashMap<>();
// obtain individual characters and their frequencies from the string
for (int i = 0; i < length; i++) {
char key = str.charAt(i);
if (!chars.containsKey(key)) {
chars.put(key, 1);
}
else {
chars.put(key, chars.get(key) + 1);
}
}
// if one character repeated set total to 0
if (chars.size() == 1 && length > 1) {
total = 0;
}
// otherwise calculate total, invalid permutations and overlap
else {
// calculate total
total = factorial(length);
// calculate invalid permutations
for (char key : chars.keySet()) {
int len = 0;
int lenPerm = 0;
int charPerm = 0;
int val = chars.get(key);
int check = 1;
// if val > 1 there will be more invalid permutations to calculate
if (val > 1) {
check = val;
vals.add(val);
}
while (check > 1) {
len = length - check + 1;
lenPerm = factorial(len);
charPerm = factorial(check);
invalid = lenPerm * charPerm;
total -= invalid;
check--;
}
}
// calculate overlaps
if (vals.size() > 1) {
overlap = factorial(chars.size());
for (int val : vals) {
overlap *= factorial(val);
}
}
total += overlap;
}
return total;
}
// helper function to calculate factorials - not recursive as I was running out of memory on the platform :?
private static int factorial(int num) {
    // 0! and 1! are both 1, so start from 1 and multiply upward
    int result = 1;
    for (int i = 2; i <= num; i++) {
        result *= i;
    }
    return result;
}
public static void main(String[] args) {
System.out.printf("For %s: %d\n\n", "aab", permAlone("aab")); // expected 2
System.out.printf("For %s: %d\n\n", "aaa", permAlone("aaa")); // expected 0
System.out.printf("For %s: %d\n\n", "aabb", permAlone("aabb")); // expected 8
System.out.printf("For %s: %d\n\n", "abcdefa", permAlone("abcdefa")); // expected 3600
System.out.printf("For %s: %d\n\n", "abfdefa", permAlone("abfdefa")); // expected 2640
System.out.printf("For %s: %d\n\n", "zzzzzzzz", permAlone("zzzzzzzz")); // expected 0
System.out.printf("For %s: %d\n\n", "a", permAlone("a")); // expected 1
System.out.printf("For %s: %d\n\n", "aaab", permAlone("aaab")); // expected 0
System.out.printf("For %s: %d\n\n", "aaabb", permAlone("aaabb")); // expected 12
System.out.printf("For %s: %d\n\n", "abbc", permAlone("abbc")); //expected 12
}
}
I am currently building an app in Xcode and there is something I'm stuck on. For example, if the total of a question came to 15, how do you separate the "1" and the "5", add those two numbers and get six? I only want to display the six for my app user to see.
9+6 = 15
but instead I want it to display as 9+6 = 15/6
The wording of your post is a little confusing. Are you asking how to separate numbers into their individual digits, and then do things with those digits?
Not sure exactly what language you're writing in here, but in C:
int firstDigit = 0;
int secondDigit = 0;
int result = 0;
int num = 15;
firstDigit = num % 10; // 15 % 10 = 5
num /= 10; // 15 / 10 = 1
secondDigit = num % 10; // 1 % 10 = 1
result = firstDigit + secondDigit; // 5 + 1 = 6
Taking a number modulo 10 allows you to easily isolate the trailing digit.
You could even throw the above logic (isolate trailing digit, chop off trailing digit) into a loop to deal with arbitrarily-long numbers (within reason, of course).
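As a sketch of that loop (written in Python here rather than C, and with a function name picked just for this example), repeatedly take the trailing digit with modulo 10 and chop it off with integer division:

```python
def digit_sum(num):
    """Add up the decimal digits of an integer."""
    num = abs(num)          # ignore the sign; only the digits matter
    total = 0
    while num > 0:
        total += num % 10   # isolate the trailing digit
        num //= 10          # chop off the trailing digit
    return total


print(digit_sum(15))
```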
I would like to generate a random string (or a series of random strings, repetitions allowed) of length between 1 and n characters from some (finite) alphabet. Each string should be equally likely (in other words, the strings should be uniformly distributed).
The uniformity requirement means that an algorithm like this doesn't work:
alphabet = "abcdefghijklmnopqrstuvwxyz"
len = rand(1, n)
s = ""
for(i = 0; i < len; ++i)
s = s + alphabet[rand(0, 25)]
(pseudo code, rand(a, b) returns an integer between a and b, inclusive, each integer equally likely)
This algorithm generates strings with uniformly distributed lengths, but the actual distribution should be weighted toward longer strings (there are 26 times as many strings with length 2 as there are with length 1, and so on.) How can I achieve this?
What you need to do is generate your length and then your string as two distinct steps. You will need to first choose the length using a weighted approach. You can calculate the number of strings of a given length l for an alphabet of k symbols as k^l. Sum those up and you have the total number of strings of any length; your first step is to generate a random number between 1 and that value and then bin it accordingly. Modulo off-by-one errors, you would break at 26, 26 + 26^2, 26 + 26^2 + 26^3 and so on. The logarithm based on the number of symbols would be useful for this task.
Once you have your length, you can generate the string as you have above.
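Here is a rough Python sketch of that two-step approach. The function names are invented for this example, and it leans on Python's arbitrary-precision integers, so the total k + k^2 + ... + k^n can be computed exactly; in a language with fixed-width integers you would need a big-integer type or a small n.

```python
import random


def length_for_pick(k, n, pick):
    """Map pick in [0, k + k^2 + ... + k^n) to a length, weighting length l by k^l."""
    for length in range(1, n + 1):
        bucket = k ** length          # number of strings of this length
        if pick < bucket:
            return length
        pick -= bucket
    raise ValueError("pick is outside the sample space")


def random_string(alphabet, n, rng=random):
    """Return a string of length 1..n, with every such string equally likely."""
    k = len(alphabet)
    total = sum(k ** length for length in range(1, n + 1))
    length = length_for_pick(k, n, rng.randrange(total))
    return "".join(rng.choice(alphabet) for _ in range(length))


print(random_string("abcdefghijklmnopqrstuvwxyz", 5))
```

With a two-letter alphabet and n = 3, the 14 possible picks split 2 / 4 / 8 between lengths 1, 2 and 3, which is the weighting you are after.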
Okay, there are 26 possibilities for a 1-character string, 26^2 for a 2-character string, and so on up to 26^26 possibilities for a 26-character string.
That means there are 26 times as many possibilities for an (N)-character string as there are for an (N-1)-character string. You can use that fact to select your length:
def getlen(maxlen):
    sz = maxlen
    while sz != 1:
        if rnd(27) != 1:
            return sz
        sz = sz - 1
    return 1
I use 27 in the above code since the total sample space for selecting strings of up to two characters is the 26 1-character possibilities and the 26^2 2-character possibilities. In other words, the ratio is 1:26, so 1-character has a probability of 1/27 (rather than 1/26 as I first answered).
This solution isn't perfect since you're calling rnd multiple times, and it would be better to call it once with a possible range of 26^N + 26^(N-1) + ... + 26^1 and select the length based on where your returned number falls within there, but it may be difficult to find a random number generator that'll work on numbers that large (10 characters gives you a possible range of 26^10 + ... + 26^1 which, unless I've done the math wrong, is 146,813,779,479,510).
If you can limit the maximum size so that your rnd function will work in the range, something like this should be workable:
def getlen(chars,maxlen):
    assert maxlen >= 1
    range = chars
    sampspace = 0
    for i in 1 .. maxlen:
        sampspace = sampspace + range
        range = range * chars
    range = range / chars
    val = rnd(sampspace)
    sz = maxlen
    while val < sampspace - range:
        sampspace = sampspace - range
        range = range / chars
        sz = sz - 1
    return sz
Once you have the length, I would then use your current algorithm to choose the actual characters to populate the string.
Explaining it further:
Let's say our alphabet only consists of "ab". The possible sets up to length 3 are [ab] (2), [ab][ab] (4) and [ab][ab][ab] (8). So there is a 8/14 chance of getting a length of 3, 4/14 of length 2 and 2/14 of length 1.
The 14 is the magic figure: it's the sum of all 2^n for n = 1 to the maximum length. So, testing that pseudo-code above with chars = 2 and maxlen = 3:
assert maxlen >= 1 [okay]
range = chars [2]
sampspace = 0
for i in 1 .. 3:
i = 1:
sampspace = sampspace + range [0 + 2 = 2]
range = range * chars [2 * 2 = 4]
i = 2:
sampspace = sampspace + range [2 + 4 = 6]
range = range * chars [4 * 2 = 8]
i = 3:
sampspace = sampspace + range [6 + 8 = 14]
range = range * chars [8 * 2 = 16]
range = range / chars [16 / 2 = 8]
val = rnd(sampspace) [number from 0 to 13 inclusive]
sz = maxlen [3]
while val < sampspace - range: [see below]
sampspace = sampspace - range
range = range / chars
sz = sz - 1
return sz
So, from that code, the first iteration of the final loop will exit with sz = 3 if val is greater than or equal to sampspace - range [14 - 8 = 6]. In other words, for the values 6 through 13 inclusive, 8 of the 14 possibilities.
Otherwise, sampspace becomes sampspace - range [14 - 8 = 6] and range becomes range / chars [8 / 2 = 4].
Then the second iteration of the final loop will exit with sz = 2 if val is greater than or equal to sampspace - range [6 - 4 = 2]. In other words, for the values 2 through 5 inclusive, 4 of the 14 possibilities.
Otherwise, sampspace becomes sampspace - range [6 - 4 = 2] and range becomes range / chars [4 / 2 = 2].
Then the third iteration of the final loop will exit with sz = 1 if val is greater than or equal to sampspace - range [2 - 2 = 0]. In other words, for the values 0 through 1 inclusive, 2 of the 14 possibilities (this iteration will always exit since the value must be greater than or equal to zero).
In retrospect, that second solution is a bit of a nightmare. In my personal opinion, I'd go for the first solution for its simplicity and to avoid the possibility of rather large numbers.
Building on my comment posted as a reply to the OP:
I'd consider it an exercise in base
conversion. You're simply generating a
"random number" in "base 26", where
a=0 and z=25. For a random string of
length n, generate a number between 1
and 26^n. Convert from base 10 to base
26, using symbols from your chosen
alphabet.
Here's a PHP implementation. I won't guarantee that there isn't an off-by-one error or two in here, but any such error should be minor:
<?php
$n = 5;
var_dump(randstr($n));
function randstr($maxlen) {
$dict = 'abcdefghijklmnopqrstuvwxyz';
// rand() is inclusive at both ends, so the largest value is 26^maxlen - 1
$rand = rand(0, pow(strlen($dict), $maxlen) - 1);
$str = base_convert($rand, 10, 26);
// base_convert returns base 26 using the digits 0-9 and the 16 letters a-p;
// we must convert those to our own set of symbols
return strtr($str, '0123456789abcdefghijklmnop', $dict);
}
Instead of picking a length with uniform distribution, weight it according to how many strings are a given length. If your alphabet is size m, there are m^x strings of size x, and (1 - m^(n+1))/(1 - m) strings of length n or less. The probability of choosing a string of length x should be m^x * (1 - m)/(1 - m^(n+1)).
Edit:
Regarding overflow - using floating point instead of integers will expand the range, so for a 26-character alphabet and single-precision floats, direct weight calculation shouldn't overflow for n<26.
A more robust approach is to deal with it iteratively. This should also minimize the effects of underflow:
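// n is the maximum length and m is the alphabet size
int randomLength() {
    for(int i = n; i > 0; i--) {
        double d = Math.random();
        // (m - 1) / (m - m^-i) is the chance of length i among lengths 0..i
        if(d < (m - 1) / (m - Math.pow(m, -i))) {
            return i;
        }
    }
    return 0;
}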
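The iterative idea relies on the fact that, among strings of length at most i, the share with length exactly i is (m - 1) / (m - m^-i). As a sanity check, here is a small Python sketch with exact fractions (length_distribution is a made-up helper, not part of the answer) that computes the length distribution you get when you stop at length i with that probability, working down from n:

```python
from fractions import Fraction

def length_distribution(m, n):
    """Exact probability of each length 0..n when stopping at i with chance (m-1)/(m - m**-i)."""
    probs = {}
    remaining = Fraction(1)  # probability that no length has been chosen yet
    for i in range(n, 0, -1):
        stop = Fraction(m - 1) / (m - Fraction(1, m ** i))
        probs[i] = remaining * stop
        remaining *= 1 - stop
    probs[0] = remaining  # whatever is left goes to the empty string
    return probs

dist = length_distribution(3, 2)
print(dist[2], dist[1], dist[0])  # 9/13 3/13 1/13
```

For a 3-letter alphabet and n = 2 there are 9 + 3 + 1 = 13 strings, so 9/13, 3/13 and 1/13 are exactly the weights a uniform pick over all strings would give.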
To make this more efficient by calculating fewer random numbers, we can reuse them by splitting intervals in more than one place:
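int randomLength() {
    int i = n;
    // Handle lengths in blocks of 5, using one random number per block
    for(; i >= 5; i -= 5) {
        double d = Math.random();
        // p is the chance of length i - j among lengths 0..i
        double p = (m - 1) / (m - Math.pow(m, -i));
        // c is the cumulative chance of lengths i, i - 1, ..., i - j
        double c = p;
        for(int j = 0; j < 5; j++) {
            if(d < c) {
                return i - j;
            }
            p /= m;
            c += p;
        }
    }
    // Handle the remaining n % 5 lengths one at a time
    for(; i > 0; i--) {
        double d = Math.random();
        if(d < (m - 1) / (m - Math.pow(m, -i))) {
            return i;
        }
    }
    return 0;
}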
Edit: This answer isn't quite right. See the bottom for a disproof. I'll leave it up for now in the hope someone can come up with a variant that fixes it.
It's possible to do this without calculating the length separately - which, as others have pointed out, requires raising a number to a large power, and generally seems like a messy solution to me.
Proving that this is correct is a little tough, and I'm not sure I trust my expository powers to make it clear, but bear with me. For the purposes of the explanation, we're generating strings of length at most n from an alphabet a of |a| characters.
First, imagine you have a maximum length of n, and you've already decided you're generating a string of at least length n-1. It should be obvious that there are |a|+1 equally likely possibilities: we can generate any of the |a| characters from the alphabet, or we can choose to terminate with n-1 characters. To decide, we simply pick a random number x between 0 and |a| (inclusive); if x is |a|, we terminate at n-1 characters; otherwise, we append the xth character of a to the string. Here's a simple implementation of this procedure in Python:
import random
def pick_character(alphabet):
    x = random.randrange(len(alphabet) + 1)
    if x == len(alphabet):
        return ''
    else:
        return alphabet[x]
Now, we can apply this recursively. To generate the kth character of the string, we first attempt to generate the characters after k. If our recursive invocation returns anything, then we know the string should be at least length k, and we generate a character of our own from the alphabet and return it. If, however, the recursive invocation returns nothing, we know the string is no longer than k, and we use the above routine to select either the final character or no character. Here's an implementation of this in Python:
def uniform_random_string(alphabet, max_len):
    if max_len == 1:
        return pick_character(alphabet)
    suffix = uniform_random_string(alphabet, max_len - 1)
    if suffix:
        # String contains characters after ours
        return random.choice(alphabet) + suffix
    else:
        # String contains no characters after our own
        return pick_character(alphabet)
If you doubt the uniformity of this function, you can attempt to disprove it: suggest a string for which there are two distinct ways to generate it, or none. If there are no such strings - and alas, I do not have a robust proof of this fact, though I'm fairly certain it's true - and given that the individual selections are uniform, then the result must also select any string with uniform probability.
As promised, and unlike every other solution posted thus far, no raising of numbers to large powers is required; no arbitrary length integers or floating point numbers are needed to store the result, and the validity, at least to my eyes, is fairly easy to demonstrate. It's also shorter than any fully-specified solution thus far. ;)
If anyone wants to chip in with a robust proof of the function's uniformity, I'd be extremely grateful.
Edit: Disproof, provided by a friend:
dato: so imagine alphabet = 'abc' and n = 2
dato: you have 9 strings of length 2, 3 of length 1, 1 of length 0
dato: that's 13 in total
dato: so probability of getting a length 2 string should be 9/13
dato: and probability of getting a length 1 or a length 0 should be 4/13
dato: now if you call uniform_random_string('abc', 2)
dato: that transforms itself into a call to uniform_random_string('abc', 1)
dato: which is an uniform distribution over ['a', 'b', 'c', '']
dato: the first three of those yield all the 2 length strings
dato: and the latter produce all the 1 length strings and the empty strings
dato: but 0.75 > 9/13
dato: and 0.25 < 4/13
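The numbers in that disproof can be reproduced without sampling. The sketch below (my own helper, recursive_length_distribution, assuming an alphabet of size a) computes the exact length distribution of the recursive procedure with fractions, so it can be compared against the 9/13 and 4/13 that a uniform pick would need:

```python
from fractions import Fraction

def recursive_length_distribution(a, max_len):
    """Exact length distribution of the recursive generator for an alphabet of size a."""
    stop = Fraction(1, a + 1)      # chance that pick_character returns ''
    dist = {0: stop, 1: 1 - stop}  # base case: max_len == 1
    for _ in range(max_len - 1):
        # An empty suffix leads to one more pick_character call
        new = {0: dist[0] * stop, 1: dist[0] * (1 - stop)}
        # A non-empty suffix always gets one more character in front
        for length, p in dist.items():
            if length > 0:
                new[length + 1] = p
        dist = new
    return dist

dist = recursive_length_distribution(3, 2)
print(dist[2], dist[1] + dist[0])  # 3/4 1/4
```

So for 'abc' and n = 2 the procedure yields length 2 with probability 3/4 rather than 9/13, which agrees with the chat log above.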
// Note space as an available char
alphabet = "abcdefghijklmnopqrstuvwxyz "
result_string = ""
for( ;; )
{
    s = ""
    for( i = 0; i < n; i++ )
        s += alphabet[rand(0, 26)]
    first_space = n;
    for( i = 0; i < n; i++ )
        if( s[ i ] == ' ' )
        {
            first_space = i;
            break;
        }
    ok = true;
    // Reject "duplicate" shorter strings
    for( i = first_space + 1; i < n; i++ )
        if( s[ i ] != ' ' )
        {
            ok = false;
            break;
        }
    if( !ok )
        continue;
    // Extract the short version of the string
    for( i = 0; i < first_space; i++ )
        result_string += s[ i ];
    break;
}
Edit: I forgot to disallow 0-length strings, that will take a bit more code which I don't have time to add now.
Edit: After considering how my answer doesn't scale to large n (takes too long to get lucky and find an accepted string), I like paxdiablo's answer much better. Less code too.
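For reference, the same rejection idea can be written quite compactly in Python. This is just a sketch of the pseudocode above, with the space as the padding character; the function name and the pad parameter are my own, and like the pseudocode it can still return the empty string.

```python
import random

def rejection_random_string(n, alphabet="abcdefghijklmnopqrstuvwxyz", pad=" "):
    """Draw n symbols (letters or pad) and accept only if pads appear solely at the end."""
    symbols = alphabet + pad
    while True:
        s = "".join(random.choice(symbols) for _ in range(n))
        stripped = s.rstrip(pad)
        # Reject candidates with a pad followed by a letter ("duplicate" shorter strings)
        if pad not in stripped:
            return stripped

print(repr(rejection_random_string(5)))
```

Each accepted result corresponds to exactly one padded string of length n, which is why the accepted strings are equally likely. The share of rejected candidates is what makes this slow for large n.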
Personally I'd do it like this:
Let's say your alphabet has Z characters. Then the number of possible strings for each length L is:
L | Z ^ L (for Z = 26)
--------------------------
1 | 26
2 | 676 (= 26 * 26)
3 | 17576 (= 26 * 26 * 26)
...and so on.
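Now let's say your maximum desired length is N. Then the total number of possible strings from length 1 to N that your function could generate is approximately the sum of a geometric sequence (the formula below also counts the one empty string, which makes little difference for a realistic Z and N):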
(1 - (Z ^ (N + 1))) / (1 - Z)
Let's call this value S. Then the probability of generating a string of any length L should be:
(Z ^ L) / S
OK, fine. This is all well and good; but how do we generate a random number given a non-uniform probability distribution?
The short answer is: you don't. Get a library to do that for you. I develop mainly in .NET, so one I might turn to would be Math.NET.
That said, it's really not so hard to come up with a rudimentary approach to doing this on your own.
Here's one way: take a generator that gives you a random value within a known uniform distribution, and assign ranges within that distribution of sizes dependent on your desired distribution. Then interpret the random value provided by the generator by determining which range it falls into.
Here's an example in C# of one way you could implement this idea (scroll to the bottom for example output):
RandomStringGenerator class
using System;
using System.Linq;
public class RandomStringGenerator
{
    private readonly Random _random;
    private readonly char[] _alphabet;
    public RandomStringGenerator(string alphabet)
    {
        if (string.IsNullOrEmpty(alphabet))
            throw new ArgumentException("alphabet");
        _random = new Random();
        _alphabet = alphabet.Distinct().ToArray();
    }
    public string NextString(int maxLength)
    {
        // Get a value randomly distributed between 0.0 and 1.0 --
        // this is approximately what the System.Random class provides.
        double value = _random.NextDouble();
        // This is where the magic happens: we "translate" the above number
        // to a length based on our computed probability distribution for the given
        // alphabet and the desired maximum string length.
        int length = GetLengthFromRandomValue(value, _alphabet.Length, maxLength);
        // The rest is easy: allocate a char array of the length determined above...
        char[] chars = new char[length];
        // ...populate it with a bunch of random values from the alphabet...
        for (int i = 0; i < length; ++i)
        {
            chars[i] = _alphabet[_random.Next(0, _alphabet.Length)];
        }
        // ...and return a newly constructed string.
        return new string(chars);
    }
    static int GetLengthFromRandomValue(double value, int alphabetSize, int maxLength)
    {
        // Looping really might not be the smartest way to do this,
        // but it's the most obvious way that immediately springs to my mind.
        for (int length = 1; length <= maxLength; ++length)
        {
            Range r = GetRangeForLength(length, alphabetSize, maxLength);
            if (r.Contains(value))
                return length;
        }
        return maxLength;
    }
    static Range GetRangeForLength(int length, int alphabetSize, int maxLength)
    {
        int L = length;
        int Z = alphabetSize;
        int N = maxLength;
        // Same formula as S above: (1 - Z ^ (N + 1)) / (1 - Z)
        double possibleStrings = (1 - Math.Pow(Z, N + 1)) / (1 - Z);
        double stringsOfGivenLength = Math.Pow(Z, L);
        double possibleSmallerStrings = (1 - Math.Pow(Z, L)) / (1 - Z);
        double probabilityOfGivenLength = ((double)stringsOfGivenLength / possibleStrings);
        double probabilityOfShorterLength = ((double)possibleSmallerStrings / possibleStrings);
        double startPoint = probabilityOfShorterLength;
        double endPoint = probabilityOfShorterLength + probabilityOfGivenLength;
        return new Range(startPoint, endPoint);
    }
}
Range struct
public struct Range
{
    public readonly double StartPoint;
    public readonly double EndPoint;
    public Range(double startPoint, double endPoint)
        : this()
    {
        this.StartPoint = startPoint;
        this.EndPoint = endPoint;
    }
    public bool Contains(double value)
    {
        return this.StartPoint <= value && value <= this.EndPoint;
    }
}
Test
static void Main(string[] args)
{
    const int N = 5;
    const string alphabet = "acegikmoqstvwy";
    int Z = alphabet.Length;
    var rand = new RandomStringGenerator(alphabet);
    var strings = new List<string>();
    for (int i = 0; i < 100000; ++i)
    {
        strings.Add(rand.NextString(N));
    }
    Console.WriteLine("First 10 results:");
    for (int i = 0; i < 10; ++i)
    {
        Console.WriteLine(strings[i]);
    }
    // sanity check
    double sumOfProbabilities = 0.0;
    for (int i = 1; i <= N; ++i)
    {
        double probability = Math.Pow(Z, i) / ((1 - (Math.Pow(Z, N + 1))) / (1 - Z));
        int numStrings = strings.Count(str => str.Length == i);
        Console.WriteLine("# strings of length {0}: {1} (probability = {2:0.00%})", i, numStrings, probability);
        sumOfProbabilities += probability;
    }
    Console.WriteLine("Probabilities sum to {0:0.00%}.", sumOfProbabilities);
    Console.ReadLine();
}
Output:
First 10 results:
wmkyw
qqowc
ackai
tokmo
eeiyw
cakgg
vceec
qwqyq
aiomt
qkyav
# strings of length 1: 1 (probability = 0.00%)
# strings of length 2: 38 (probability = 0.03%)
# strings of length 3: 475 (probability = 0.47%)
# strings of length 4: 6633 (probability = 6.63%)
# strings of length 5: 92853 (probability = 92.86%)
Probabilities sum to 100.00%.
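If you'd rather not loop over ranges, the same "which range does the value fall into" lookup can be done with a binary search over cumulative counts. Below is a minimal Python sketch of that variation; length_from_value is a hypothetical name, and I'm assuming lengths 1 to N only, so the empty string is not counted.

```python
import bisect
import random
from itertools import accumulate

def length_from_value(value, alphabet_size, max_length):
    """Map a uniform value in [0.0, 1.0) to a length in 1..max_length."""
    # Number of strings of each length 1..max_length
    counts = [alphabet_size ** k for k in range(1, max_length + 1)]
    # Upper boundary of each length's range, e.g. [2, 6, 14] for Z=2, N=3
    boundaries = list(accumulate(counts))
    target = value * boundaries[-1]
    index = bisect.bisect_right(boundaries, target)
    # The clamp guards against floating-point rounding at the very top of the range
    return min(index + 1, max_length)

print(length_from_value(0.0, 2, 3), length_from_value(0.2, 2, 3), length_from_value(0.99, 2, 3))
# 1 2 3
print(length_from_value(random.random(), 14, 5))
```

With the 14-letter alphabet and N = 5 from the test above, roughly 93% of the calls should come back as 5, in line with the output shown.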
My idea for this is as follows:
You have strings of length 1 to n. There are 26 possible strings of length 1, 26*26 strings of length 2, and so on.
You can find each length's percentage of the total number of possible strings. For example, the percentage for strings of length 1 is
((26/(TOTAL_POSSIBLE_STRINGS_OF_ALL_LENGTH))*100).
Similarly, you can find the percentage for the other lengths.
Mark them on a number line from 0 to 100. For example, suppose the percentage for strings of length 1 is 3 and for strings of length 2 is 6; then on the number line, length 1 covers 0-3, length 2 covers 3-9, and so on.
Now take a random number between 0 and 100 and find the range in which it lies. For example, if the random number is 2, it lies in 0-3, so generate a string of length 1; if it is 7, generate a string of length 2.
In this fashion, the chance of choosing each length is proportional to the share of all possible strings that have that length.
Hope I am clear.
Disclaimer: I have only read one or two of the solutions above, so if this matches someone else's solution, it is purely by chance.
Also, I welcome any advice and constructive criticism; please correct me if I am wrong.
Thanks and regards
Mawia
Matthieu: Your idea doesn't work because strings with blanks are still more likely to be generated. In your case, with n=4, you could have the string 'ab' generated as 'a' + 'b' + '' + '' or '' + 'a' + 'b' + '', or other combinations. Thus not all the strings have the same chance of appearing.