Split int to separate digits so as to loop over each digit - go

I am new to Go, so I am doing LeetCode problems each day. One of them was Subtract the Product and Sum of Digits of an Integer. Initially I thought of splitting the integer into individual digits and then adding and multiplying them to get the output, but I was unable to do that because I do not yet understand the type conversions well enough. After much trial and error, I gave up on that approach and used division and modulo to peel off the last digit each time. Here's what I did:
func subtractProductAndSum(n int) int {
sum, prod := 0, 1
for {
if n < 10 {
sum += n
prod *= n
break
}
sum += n % 10
prod *= n % 10
n = n / 10
}
return prod - sum
}
This worked, but among the other answers I found one that is based on my first approach (splitting and conquering):
func subtractProductAndSum(n int) int {
p := 1
s := 0
strN := strconv.Itoa(n)
for _, val := range strN {
intVal := int(val - '0')
p = p * intVal
s = s + intVal
}
return p - s
}
In this approach I could not understand intVal := int(val - '0'). It certainly produces the desired output. I think val is being converted to int, but I don't understand what the - '0' is for.
Hoping somebody can help.

Your strN holds the decimal string form of n. In the for loop, val is a character (a rune in Go), not a digit value, and converting it with int(val) alone would give you the character's code, not the digit.
The variable val may contain any of the following characters: {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'}.
So if you subtract '0' from any digit character, you get the distance between '0' and that character, which is exactly the int value you are looking for.
The subtraction is simply the difference between the ASCII values of the two characters.
For example, if you subtract '0' from '5', you get the integer value 5.
The ASCII values of '5' and '0' are 53 and 48, so '5' - '0' is actually 53 - 48, which is 5.
The surrounding int(...) then only changes the type of that result from rune to int. This is how the conversion in your code works.
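The same arithmetic can be tried outside Go. The following is a minimal Python sketch, used here only for illustration; the helper names digits_of and subtract_product_and_sum are made up for this example. In Python, ord() plays the role of reading the character's code.

```python
def digits_of(n: int) -> list:
    """Split a non-negative integer into its decimal digits via its string form."""
    # ord(ch) is the character's code point. The characters '0'..'9' are
    # contiguous, so subtracting ord('0') leaves the numeric value of the digit.
    return [ord(ch) - ord("0") for ch in str(n)]


def subtract_product_and_sum(n: int) -> int:
    """Product of the digits minus sum of the digits."""
    product, total = 1, 0
    for d in digits_of(n):
        product *= d
        total += d
    return product - total


print(ord("5"), ord("0"), ord("5") - ord("0"))  # 53 48 5
print(subtract_product_and_sum(234))            # 24 - 9 = 15
```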

Related

static_cast use to convert int to char

I have written this code to convert Decimal to binary:
string Solution::findDigitsInBinary(int A) {
if(A == 0 )
return "0" ;
else
{
string bin = "";
while(A > 0)
{
int rem = (A % 2);
bin.push_back(static_cast<char>(A % 2));
A = A/2 ;
}
reverse(bin.begin(),bin.end()) ;
return bin ;
}
}
But I am not getting the desired result using static_cast.
I have seen something related that gives the desired result:
(char)('0'+ rem).
What's the difference from static_cast? Why am I not getting the correct binary output?
With:
(char) '0' + rem;
The important difference is not the cast, but that the remainder, which is always 0 or 1, is added to the character '0', which means that you are adding the character '0' or '1' to your string.
In your version you are adding the integer 0 or 1, but the characters '0' and '1' have the codes 48 and 49. Adding the remainder of 0 or 1 to '0' gives a value of either 48 (character '0') or 49 (character '1').
If you do the same thing in your code it will also work.
string findDigitsInBinary(int A) {
if (A == 0)
return "0";
else
{
string bin = "";
while (A > 0)
{
int rem = (A % 2);
bin.push_back(static_cast<char>(rem + '0')); // Remainder + '0'
A = A / 2;
}
reverse(bin.begin(), bin.end());
return bin;
}
}
Basically you should be adding characters to the string, and not numbers. So you shouldn't be adding 0 and 1 to the string, you should be adding the numbers 48 (character 0) and 49 (character 1).
An ASCII chart illustrates this: the digit character '0' has the decimal value 48. Say you wanted to add the digit 4 to the string; because decimal 48 is '0', you would actually add the decimal value 52, that is 48 + 4. This is what '0' + rem does. It is done automatically for you if you insert a character literal, that is, if you do:
mystring += 'A';
It will add an 'A' character to your string, but what it's actually doing in reality is converting that 'A' to decimal 65 and adding it to the string. What you have in your code is you're adding decimal numbers/integers 0 and 1, and these aren't characters in the Unicode/ASCII representation.
Now that you understand how characters are encoded, to cast an integer to a char does not change the decimal/integer to its character representation, but it changes the data type from int to char, a 4-byte data type (most likely) to a 1-byte data type. Your cast did the following:
After the modulo % operation you got a result of either 1 or 0 as an integer, let's just say you got a 1 remainder, it would look like this as an int:
00000000 00000000 00000000 00000001
After the cast to a char it would convert it to a one-byte data type, which would make it look like this:
00000001 // Now it's a one-byte data type
Whereas a '1' digit encoded as a character is 49, which looks like this:
00110001
As for the difference between static_cast and c-style cast, the static_cast does compile-time checks and allows casts between certain types based on particular rules, whereas a c-style cast isn't as restrictive.
char a = 5;
int* p = static_cast<int*>(&a); // Will not compile
int* p2 = (int*)&a; // Will compile and run, but is discouraged as there are risks.
*p2 = 7; // You've written past the single byte char into 3 extra bytes, which is an access violation, or undefined behaviour.
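To see the "'0' + remainder" step in isolation, here is a small Python sketch of the same decimal-to-binary conversion. It is only an illustration of the idea, not the original C++ code; to_binary is a name chosen for this example, and chr()/ord() stand in for the char arithmetic.

```python
def to_binary(a: int) -> str:
    """Binary digits of a non-negative integer, built one character at a time."""
    if a == 0:
        return "0"
    chars = []
    while a > 0:
        rem = a % 2
        # chr(rem) would be the control character with code 0 or 1;
        # chr(ord("0") + rem) is the printable character "0" or "1".
        chars.append(chr(ord("0") + rem))
        a //= 2
    return "".join(reversed(chars))


print(to_binary(10))                           # 1010
print(repr(chr(1)), repr(chr(ord("0") + 1)))   # '\x01' '1'
```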

Sum of numbers with approximation and no repetition

For an app I'm working on, I need to process an array of numbers and return a new array such that the sum of its elements is as close as possible to a target sum. This is similar to the coin-counting problem, with two differences:
Each element of the new array has to come from the input array (i.e. no repetition/duplication)
The algorithm should stop when it finds an array whose sum falls within X of the target number (e.g., given [10, 12, 15, 23, 26], a target of 35, and a sigma of 5, a result of [10, 12, 15] (sum 37) is OK but a result of [15, 26] (sum 41) is not).
I was considering the following algorithm (in pseudocode) but I doubt that this is the best way to do it.
function (array, goal, sigma)
var A = []
for each element E in array
if (E + (sum of rest of A) < goal +/- sigma)
A.push(E)
return A
For what it's worth, the language I'm using is Javascript. Any advice is much appreciated!
This is not intended as the best answer possible, just maybe something that will work well enough. All remarks/input is welcome.
Also, this takes into account the answers from the comments: the input is the length of songs (usually 100 - 600), the length of the input array is between 5 and 50, and the goal is anywhere between 100 and 7200.
The idea:
1. Start by finding the average value of the input, and then work out a guess at the number of input values you're going to need. Let's say that comes out as x.
2. Order your input.
3. Take the first x-1 values and substitute the smallest one with any other to get to your goal (somewhere in the range). If none exists, find a number so you're still lower than the goal.
4. Repeat step #3 using backtracking or something like that. Maybe limit the number of trials you're going to spend there.
5. x++ and go back to step #3.
I would use some kind of divide and conquer and a recursive implementation. Here is a prototype in Smalltalk
SequenceableCollection>>subsetOfSum: s plusOrMinus: d
"check if a singleton matches"
self do: [:v | (v between: s - d and: s + d) ifTrue: [^{v}]].
"nope, engage recursion with a smaller collection"
self keysAndValuesDo: [:i :v |
| sub |
sub := (self copyWithoutIndex: i) subsetOfSum: s-v plusOrMinus: d.
sub isNil ifFalse: [^sub copyWith: v]].
"none found"
^nil
Used like this:
#(10 12 15 23 26) subsetOfSum: 62 plusOrMinus: 3.
gives:
#(23 15 12 10)
With limited input, this problem is a good candidate for dynamic programming, with time complexity O((Sum + Sigma) * ArrayLength).
Delphi code:
function FindCombination(const A: array of Integer; Sum, Sigma: Integer): string;
var
Sums: array of Integer;
Value, idx: Integer;
begin
Result := '';
SetLength(Sums, Sum + Sigma + 1); //zero-initialized array
Sums[0] := 1; //just non-zero
for Value in A do begin
idx := Sum + Sigma;
while idx >= Value do begin
if (Sums[idx] = 0) and (Sums[idx - Value] <> 0) then begin //idx not recorded yet, and (idx-Value) sum can be formed from array
Sums[idx] := Value; //value is included in this sum
if idx >= Sum - Sigma then begin //bingo!
while idx > 0 do begin //unwind and extract all values for this sum
Result := Result + IntToStr(Sums[idx]) + ' ';
idx := idx - Sums[idx];
end;
Exit;
end;
end;
Dec(idx); //idx--
end;
end;
end;
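The same table-based idea can be sketched in Python. This is an illustrative version under stated assumptions, not a translation guaranteed to match the Delphi code line for line: all inputs are assumed to be positive integers, target - sigma is assumed to be positive, and the names find_combination, reachable and last are made up for this example. Each sum records the element that first made it reachable, and a sum is never re-recorded, so unwinding cannot use an element twice.

```python
def find_combination(values, target, sigma):
    """Return a list of distinct elements of `values` whose sum lies in
    [target - sigma, target + sigma], or None if no such subset exists."""
    limit = target + sigma
    # last[s] is the element that first made sum s reachable.
    last = [None] * (limit + 1)
    reachable = [False] * (limit + 1)
    reachable[0] = True  # the empty subset sums to 0
    for v in values:
        # Walk downwards so that v is never combined with itself.
        for s in range(limit, v - 1, -1):
            if reachable[s - v] and not reachable[s]:
                reachable[s] = True
                last[s] = v
                if s >= target - sigma:
                    # Unwind the chain of recorded elements.
                    result = []
                    while s > 0:
                        result.append(last[s])
                        s -= last[s]
                    return result
    return None


print(find_combination([10, 12, 15, 23, 26], 35, 5))  # [15, 12, 10]
```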
Here's one commented algorithm in JavaScript:
var arr = [9, 12, 20, 23, 26];
var target = 35;
var sigma = 5;
var n = arr.length;
// sort the numbers in ascending order
arr.sort(function(a,b){return a-b;});
// initialize the recursion
var stack = [[0,0,[]]];
while (stack[0] !== undefined){
var params = stack.pop();
var i = params[0]; // index
var s = params[1]; // sum so far
var r = params[2]; // accumulating list of numbers
// if the sum is within range, output the list of numbers
if (s >= target - sigma && s <= target + sigma){
console.log(r);
break;
// since the numbers are sorted, if the current
// number makes the sum too large, abandon this thread
} else if (s + arr[i] > target + sigma){
continue;
}
// there are still enough numbers left to skip this one
if (i < n - 1){
stack.push([i + 1,s,r]);
}
// there are still enough numbers left to add this one
if (i < n){
var _r = r.slice();
_r.push(arr[i]);
stack.push([i + 1,s + arr[i],_r]);
}
}
/* [9,23] */

Display all the possible numbers having its digits in ascending order

Write a program that displays all the numbers between two given numbers whose digits are in ascending order.
For Example:-
Input: 5000 to 6000
Output: 5678 5679 5689 5789
Input: 90 to 124
Output: 123 124
A brute-force approach can count through all the numbers and check the digits of each one. But I want an approach that can skip some numbers and bring the complexity below O(n). Does any such solution exist?
I offer a solution in Python. It is efficient as it considers only the relevant numbers. The basic idea is to count upwards, but handle overflow somewhat differently. While we normally set overflowing digits to 0, here we set them to the previous digit +1. Please check the inline comments for further details. You can play with it here: http://ideone.com/ePvVsQ
def ascending( na, nb ):
    assert nb>=na
    # split each number into a list of digits
    a = list( int(x) for x in str(na))
    b = list( int(x) for x in str(nb))
    d = len(b) - len(a)
    # if both numbers have different length add leading zeros
    if d>0:
        a = [0]*d + a # add leading zeros
    assert len(a) == len(b)
    n = len(a)
    # check if the initial value has increasing digits as required,
    # and fix if necessary
    for x in range(d+1, n):
        if a[x] <= a[x-1]:
            for y in range(x, n):
                a[y] = a[y-1] + 1
            break
    res = [] # result set
    while a<=b:
        # if we found a value and add it to the result list
        # turn the list of digits back into an integer
        if max(a) < 10:
            res.append( int( ''.join( str(k) for k in a ) ) )
        # in order to increase the number we look for the
        # least significant digit that can be increased
        for x in range( n-1, -1, -1): # count down from n-1 to 0
            if a[x] < 10+x-n:
                break
        # digit x is to be increased
        a[x] += 1
        # all subsequent digits must be increased accordingly
        for y in range( x+1, n ):
            a[y] = a[y-1] + 1
    return res
print( ascending( 5000, 9000 ) )
Sounds like a task from Project Euler. Here is a solution in C++. It is not short, but it is straightforward and effective. Oh, and hey, it uses backtracking.
#include <algorithm>
#include <iostream>
#include <vector>
// Higher order digits at the back
typedef std::vector<int> Digits;
// Extract decimal digits of a number
Digits ExtractDigits(int n)
{
Digits digits;
while (n > 0)
{
digits.push_back(n % 10);
n /= 10;
}
if (digits.empty())
{
digits.push_back(0);
}
return digits;
}
// Main function
void PrintNumsRec(
const Digits& minDigits, // digits of the min value
const Digits& maxDigits, // digits of the max value
Digits& digits, // digits of current value
int pos, // current digits with index greater than pos are already filled
bool minEq, // currently filled digits are the same as of min value
bool maxEq) // currently filled digits are the same as of max value
{
if (pos < 0)
{
// Print current value. Handle leading zeros yourself, if needed
for (auto pDigit = digits.rbegin(); pDigit != digits.rend(); ++pDigit)
{
if (*pDigit >= 0)
{
std::cout << *pDigit;
}
}
std::cout << std::endl;
return;
}
// Compute iteration boundaries for current position
int first = minEq ? minDigits[pos] : 0;
int last = maxEq ? maxDigits[pos] : 9;
// The last filled digit
int prev = digits[pos + 1];
// Make sure generated number has increasing digits
int firstInc = std::max(first, prev + 1);
// Iterate through possible cases for current digit
for (int d = firstInc; d <= last; ++d)
{
digits[pos] = d;
if (d == 0 && prev == -1)
{
// Mark leading zeros with -1
digits[pos] = -1;
}
PrintNumsRec(minDigits, maxDigits, digits, pos - 1, minEq && (d == first), maxEq && (d == last));
}
}
// High-level function
void PrintNums(int min, int max)
{
auto minDigits = ExtractDigits(min);
auto maxDigits = ExtractDigits(max);
// Make digits array of the same size
while (minDigits.size() < maxDigits.size())
{
minDigits.push_back(0);
}
Digits digits(minDigits.size());
int pos = digits.size() - 1;
// Placeholder for leading zero
digits.push_back(-1);
PrintNumsRec(minDigits, maxDigits, digits, pos, true, true);
}
int main()
{
PrintNums(53, 297);
}
It uses recursion to handle an arbitrary number of digits, but it is essentially the same as the nested loops approach. Here is the output for (53, 297):
056
057
058
059
067
068
069
078
079
089
123
124
125
126
127
128
129
134
135
136
137
138
139
145
146
147
148
149
156
157
158
159
167
168
169
178
179
189
234
235
236
237
238
239
245
246
247
248
249
256
257
258
259
267
268
269
278
279
289
A much more interesting problem would be to count all these numbers without explicitly generating them. One would use dynamic programming for that.
There is only a very limited number of numbers which can match your definition (with 9 digits max) and these can be generated very fast. But if you really need speed, just cache the tree or the generated list and do a lookup when you need your result.
using System;
using System.Collections.Generic;
namespace so_ascending_digits
{
class Program
{
class Node
{
int digit;
int value;
List<Node> children;
public Node(int val = 0, int dig = 0)
{
digit = dig;
value = (val * 10) + digit;
children = new List<Node>();
for (int i = digit + 1; i < 10; i++)
{
children.Add(new Node(value, i));
}
}
public void Collect(ref List<int> collection, int min = 0, int max = Int16.MaxValue)
{
if ((value >= min) && (value <= max)) collection.Add(value);
foreach (Node n in children) if (value * 10 < max) n.Collect(ref collection, min, max);
}
}
static void Main(string[] args)
{
Node root = new Node();
List<int> numbers = new List<int>();
root.Collect(ref numbers, 5000, 6000);
numbers.Sort();
Console.WriteLine(String.Join("\n", numbers));
}
}
}
Why the brute force algorithm may be very inefficient.
One efficient way of encoding the input is to provide two numbers: the lower end of the range, a, and the number of values in the range, b - a + 1. This can be encoded in O(lg a + lg (b - a)) bits, since the number of bits needed to represent a number in base-2 is roughly equal to the base-2 logarithm of the number. We can simplify this to O(lg b), because intuitively if b - a is small, then a = O(b), and if b - a is large, then b - a = O(b). Either way, the total input size is O(2 lg b) = O(lg b).
Now the brute force algorithm just checks each number from a to b, and outputs the numbers whose digits in base 10 are in increasing order. There are b - a + 1 possible numbers in that range. However, when you represent this in terms of the input size, you find that b - a + 1 = 2^(lg (b - a + 1)) = 2^O(lg b) for a large enough interval.
This means that for an input size n = O(lg b), you may need to check in the worst case O(2^n) values.
A better algorithm
Instead of checking every possible number in the interval, you can simply generate the valid numbers directly. Here's a rough overview of how. A number n can be thought of as a sequence of digits n_1 ... n_k, where k is again roughly log_10 n.
For a and a four-digit number b, the iteration would look something like
for w in a_1 .. 9:
    for x in w+1 .. 9:
        for y in x+1 .. 9:
            for z in y+1 .. 9:
                m = 1000 * w + 100 * x + 10 * y + z
                if m < a:
                    next
                if m > b:
                    exit
                output w ++ x ++ y ++ z (++ is just string concatenation)
where a_1 can be considered 0 if a has fewer digits than b.
For larger numbers, you can imagine just adding more nested for loops. In general, if b has d digits, you need d = O(lg b) loops, each of which iterates at most 10 times. The running time is thus O(10 lg b) = O(lg b), which is far better than the O(2^(lg b)) running time you get by checking whether every number is sorted or not.
There is one other detail that I have glossed over, which actually does affect the running time. As written, the algorithm needs to consider the time it takes to generate m. Without going into the details, you could assume that this adds at worst a factor of O(lg b) to the running time, resulting in an O(lg^2 b) algorithm. However, using a little extra space at the top of each for loop to store partial products would save lots of redundant multiplication, allowing us to preserve the originally stated O(lg b) running time.
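Direct generation can also be sketched without hand-written nested loops. The following Python version is one possible illustration, assuming 1 <= lo <= hi; ascending_numbers is a name made up here. It relies on itertools.combinations, which yields selections from "123456789" in increasing order, so each selection is already a number with strictly increasing digits.

```python
from itertools import combinations


def ascending_numbers(lo, hi):
    """All integers in [lo, hi] whose decimal digits strictly increase."""
    found = []
    for length in range(len(str(lo)), len(str(hi)) + 1):
        # A leading zero is not allowed and zero cannot follow a larger digit,
        # so only the digits 1..9 can appear. Lengths above 9 yield nothing.
        for combo in combinations("123456789", length):
            value = int("".join(combo))
            if lo <= value <= hi:
                found.append(value)
    return found


print(ascending_numbers(5000, 6000))  # [5678, 5679, 5689, 5789]
print(ascending_numbers(90, 124))     # [123, 124]
```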
One way (pseudo-code):
for (digit3 = '5'; digit3 <= '6'; digit3++)
for (digit2 = digit3+1; digit2 <= '9'; digit2++)
for (digit1 = digit2+1; digit1 <= '9'; digit1++)
for (digit0 = digit1+1; digit0 <= '9'; digit0++)
output = digit3 + digit2 + digit1 + digit0; // concatenation

Fastest way to get the number in tail of a string

Given strings of this kind, "XXXXXXX XXXXX 756", "XXXXX XXXXXX35665" (X is a character), which is the fastest way to get the number at the end of the string?
EDIT: well, this is a just-for-fun question. Solving this is quite simple, but I want to know the fastest algorithm to achieve it. Rock on!
In C, a quick O(n), one-pass algorithm (does not see negative signs) is :
int suffixedNumber(char* string) {
int result = 0;
char ch;
while (ch = *string++)
// Check whether <= '9' first, because most characters are > '9'.
result = (ch <= '9' && ch >= '0') ? 10*result + (ch - '0') : 0;
return result;
}
If you're alright with gotos, you can get a ≈20% faster algorithm that (in order of importance) :
returns -1 when there is no number at the end of string
avoids checking for end-of-string when ch >= '0'
avoids resetting result to zero when ch is nonnumeric
avoids multiplying result by ten when a number starts
avoids setting result to zero at the beginning
int suffixedNumber(char* string) {
int result;
char ch;
nonnumber: // STATE: Waiting for the start of a number.
ch = *string++;
if (ch > '9') goto nonnumber; // Decide this boundary first (> '9' most frequent)
if (ch < '0') { // Decide this boundary next
if (ch == '\0') return -1; // Decide this boundary last ('\0' least frequent)
goto nonnumber;
}
result = ch - '0';
number: // STATE: In the middle of a number.
ch = *string++;
if (ch > '9') goto nonnumber; // Decide this boundary first (> '9' most frequent)
if (ch < '0') { // Decide this boundary next
if (ch == '\0') return result; // Decide this boundary last ('\0' least frequent)
goto nonnumber;
}
result = 10*result + (ch - '0');
goto number;
}
Assuming the text can be streamed in reverse order (a reasonable assumption since strings in most languages are backed by an array of characters with O(1) access), construct the number by reading the text backwards until you hit a character that is not a digit or the text has been consumed entirely.
numDigits = 0
number = 0
while(numDigits <> length and characterAt[length - numDigits] is a digit)
number = number + (parseCharacterAt[length - numDigits] * (10 ^ numDigits))
numDigits = numDigits + 1
end while
if(numDigits is 0)
Error ("No digits at the end")
else return number
Note: (10 ^ numDigits) can be trivially optimized with another variable.
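A runnable Python sketch of this backward scan follows, with the power replaced by a running place value as the note suggests. It assumes 0-based indexing and plain ASCII digits; trailing_number is a name chosen for this example, and it returns None rather than raising an error when there are no digits.

```python
def trailing_number(text):
    """Return the integer formed by the run of digits at the end of `text`,
    or None when the text does not end in a digit."""
    number = 0
    place = 1  # 1, 10, 100, ... stands in for 10 ^ numDigits
    i = len(text) - 1
    while i >= 0 and "0" <= text[i] <= "9":
        number += (ord(text[i]) - ord("0")) * place
        place *= 10
        i -= 1
    if i == len(text) - 1:
        return None  # no digit was consumed
    return number


print(trailing_number("XXXXXXX XXXXX 756"))   # 756
print(trailing_number("XXXXX XXXXXX35665"))   # 35665
```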
Without knowing language or context, if the number of digits or characters is fixed length a simple substring would do, otherwise a regex matching consecutive digits (i.e. /\d+/).
There is probably some faster algorithm if you drop down to C++ levels, but I favour expressiveness.
I would just call lastIndexOf('X') on your String object and proceed from there. No backward-looping mess.
Use a regular expression:
/^.+?(\d+)$/
and take the first capture group (\d+)
Edit:
If you can't use regular expressions, the fastest way will be something like this:
i = string.len
while i > 0:
break if string[i].isNotNum
i--
end
out = substring(string, i,string.len)
Stating the problem:
We want to find the index of the last non-digit character
Reflection:
This implies that we check that each character after this point is a digit, which means we will need to perform at least O(k) comparisons, where k is the number of digits at the end of the string
Implementation:
Linear backward search, possibly involving bitwise trickery to "vectorize" the operations (comparing multiple characters at once) or leveraging a multithreading effort.
Definitely don't use regex, it's way too slow. Just loop backwards until you find the first non-numeric character:
string s = "XXXXX XXXXXX35665";
int i = s.Length;
while (--i >= 0 && Char.IsNumber(s[i]));
s=s.Substring(i + 1);
That should do the trick.
Extracting the tail number:
str = "XXXXXx....XXXX333";
parseInt(str.match(/\d*$/)[0], 10)
If a floating-point number is wanted, then:
parseFloat(str.match(/[\d.]*$/)[0])
Incrementing the tail number:
str.replace(/(\d+)$/, function(match){ return parseInt(match, 10) + 1; })

How do I generate a random string of up to a certain length?

I would like to generate a random string (or a series of random strings, repetitions allowed) of length between 1 and n characters from some (finite) alphabet. Each string should be equally likely (in other words, the strings should be uniformly distributed).
The uniformity requirement means that an algorithm like this doesn't work:
alphabet = "abcdefghijklmnopqrstuvwxyz"
len = rand(1, n)
s = ""
for(i = 0; i < len; ++i)
s = s + alphabet[rand(0, 25)]
(pseudo code, rand(a, b) returns an integer between a and b, inclusively, each integer equally likely)
This algorithm generates strings with uniformly distributed lengths, but the actual distribution should be weighted toward longer strings (there are 26 times as many strings with length 2 as there are with length 1, and so on.) How can I achieve this?
What you need to do is generate your length and then your string as two distinct steps. You first need to choose the length using a weighted approach. You can calculate the number of strings of a given length l for an alphabet of k symbols as k^l. Sum those up and you have the total number of strings of any length; your first step is to generate a random number between 1 and that value and then bin it accordingly. Modulo off-by-one errors, you would break at 26, 26^2, 26^3, 26^4 and so on. The logarithm to the base of the number of symbols would be useful for this task.
Once you have your length, you can generate the string as you have above.
Okay, there are 26 possibilities for a 1-character string, 26^2 for a 2-character string, and so on up to 26^26 possibilities for a 26-character string.
That means there are 26 times as many possibilities for an (N)-character string as there are for an (N-1)-character string. You can use that fact to select your length:
def getlen(maxlen):
    sz = maxlen
    while sz != 1:
        if rnd(27) != 1:
            return sz
        sz = sz - 1
    return 1
I use 27 in the above code since the total sample space for selecting strings from "ab" is the 26 1-character possibilities and the 26^2 2-character possibilities. In other words, the ratio is 1:26 so 1-character has a probability of 1/27 (rather than 1/26 as I first answered).
This solution isn't perfect since you're calling rnd multiple times; it would be better to call it once with a possible range of 26^N + 26^(N-1) + ... + 26^1 and select the length based on where the returned number falls within that range, but it may be difficult to find a random number generator that works on numbers that large (10 characters gives you a possible range of 26^10 + ... + 26^1 which, unless I've done the math wrong, is 146,813,779,479,510).
If you can limit the maximum size so that your rnd function will work in the range, something like this should be workable:
def getlen(chars,maxlen):
    assert maxlen >= 1
    range = chars
    sampspace = 0
    for i in 1 .. maxlen:
        sampspace = sampspace + range
        range = range * chars
    range = range / chars
    val = rnd(sampspace)
    sz = maxlen
    while val < sampspace - range:
        sampspace = sampspace - range
        range = range / chars
        sz = sz - 1
    return sz
Once you have the length, I would then use your current algorithm to choose the actual characters to populate the string.
Explaining it further:
Let's say our alphabet only consists of "ab". The possible sets up to length 3 are [ab] (2), [ab][ab] (4) and [ab][ab][ab] (8). So there is an 8/14 chance of getting a length of 3, 4/14 of length 2 and 2/14 of length 1.
The 14 is the magic figure: it's the sum of all 2^n for n = 1 to the maximum length. So, testing that pseudo-code above with chars = 2 and maxlen = 3:
assert maxlen >= 1 [okay]
range = chars [2]
sampspace = 0
for i in 1 .. 3:
i = 1:
sampspace = sampspace + range [0 + 2 = 2]
range = range * chars [2 * 2 = 4]
i = 2:
sampspace = sampspace + range [2 + 4 = 6]
range = range * chars [4 * 2 = 8]
i = 3:
sampspace = sampspace + range [6 + 8 = 14]
range = range * chars [8 * 2 = 16]
range = range / chars [16 / 2 = 8]
val = rnd(sampspace) [number from 0 to 13 inclusive]
sz = maxlen [3]
while val < sampspace - range: [see below]
sampspace = sampspace - range
range = range / chars
sz = sz - 1
return sz
So, from that code, the first iteration of the final loop will exit with sz = 3 if val is greater than or equal to sampspace - range [14 - 8 = 6]. In other words, for the values 6 through 13 inclusive, 8 of the 14 possibilities.
Otherwise, sampspace becomes sampspace - range [14 - 8 = 6] and range becomes range / chars [8 / 2 = 4].
Then the second iteration of the final loop will exit with sz = 2 if val is greater than or equal to sampspace - range [6 - 4 = 2]. In other words, for the values 2 through 5 inclusive, 4 of the 14 possibilities.
Otherwise, sampspace becomes sampspace - range [6 - 4 = 2] and range becomes range / chars [4 / 2 = 2].
Then the third iteration of the final loop will exit with sz = 1 if val is greater than or equal to sampspace - range [2 - 2 = 0]. In other words, for the values 0 through 1 inclusive, 2 of the 14 possibilities (this iteration will always exit since the value must be greater than or equal to zero).
In retrospect, that second solution is a bit of a nightmare. In my personal opinion, I'd go for the first solution for its simplicity and to avoid the possibility of rather large numbers.
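For comparison, the single-draw idea is short in a language with arbitrary-precision integers, where the large-number worry does not apply. The following Python sketch is an illustration only; pick_length and uniform_random_string are names invented here, and the optional rng parameter is an assumption added so the length selection can be checked deterministically.

```python
import random


def pick_length(alphabet_size, max_len, rng=random):
    """Choose a length in 1..max_len with probability proportional to the
    number of strings of that length (alphabet_size ** length)."""
    total = sum(alphabet_size ** k for k in range(1, max_len + 1))
    val = rng.randrange(total)  # uniform in 0 .. total - 1
    for length in range(1, max_len + 1):
        count = alphabet_size ** length
        if val < count:
            return length
        val -= count
    raise AssertionError("unreachable: val is always below total")


def uniform_random_string(alphabet, max_len, rng=random):
    """Draw the length first, then fill in characters uniformly."""
    length = pick_length(len(alphabet), max_len, rng)
    return "".join(rng.choice(alphabet) for _ in range(length))


print(uniform_random_string("abcdefghijklmnopqrstuvwxyz", 10))
```

With the alphabet "ab" and a maximum length of 3, the 14 possible draws map to lengths 1, 2 and 3 in the ratio 2 : 4 : 8, matching the walkthrough above.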
Building on my comment posted as a reply to the OP:
I'd consider it an exercise in base conversion. You're simply generating a "random number" in "base 26", where a=0 and z=25. For a random string of length n, generate a number between 1 and 26^n. Convert from base 10 to base 26, using symbols from your chosen alphabet.
Here's a PHP implementation. I won't guarantee that there isn't an off-by-one error or two in here, but any such error should be minor:
<?php
$n = 5;
var_dump(randstr($n));
function randstr($maxlen) {
$dict = 'abcdefghijklmnopqrstuvwxyz';
$rand = rand(0, pow(strlen($dict), $maxlen) - 1);
$str = base_convert($rand, 10, 26);
//base_convert returns base 26 using the digits 0-9 and the 16 letters a-p
//we must convert those to our own set of symbols
return strtr($str, '0123456789abcdefghijklmnop', $dict);
}
Instead of picking a length with uniform distribution, weight it according to how many strings there are of a given length. If your alphabet is of size m, there are m^x strings of size x, and (1 - m^(n+1))/(1 - m) strings of length n or less. The probability of choosing a string of length x should be m^x * (1 - m)/(1 - m^(n+1)).
Edit:
Regarding overflow - using floating point instead of integers will expand the range, so for a 26-character alphabet and single-precision floats, direct weight calculation shouldn't overflow for n<26.
A more robust approach is to deal with it iteratively. This should also minimize the effects of underflow:
int randomLength() {
    for(int i = n; i > 0; i--) {
        double d = Math.random();
        // (m - 1) / (m - m^-i) is the chance that a string of length <= i has length exactly i
        if(d < (m - 1) / (m - Math.pow(m, -i))) {
            return i;
        }
    }
    return 0;
}
To make this more efficient by calculating fewer random numbers, we can reuse them by splitting intervals in more than one place:
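int randomLength() {
    for(int i = n; i >= 5; i -= 5) {
        double d = Math.random();
        // c: chance that a string of length <= i is shorter than i
        double c = (1 - Math.pow(m, -i)) / (m - Math.pow(m, -i));
        for(int j = 0; j < 5; j++) {
            if(d > c) {
                return i - j;
            }
            c /= m; // approximate chance of being shorter still
        }
    }
    // handle the remaining n % 5 lengths one at a time
    for(int i = n % 5; i > 0; i--) {
        double d = Math.random();
        if(d < (m - 1) / (m - Math.pow(m, -i))) {
            return i;
        }
    }
    return 0;
}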
Edit: This answer isn't quite right. See the bottom for a disproof. I'll leave it up for now in the hope someone can come up with a variant that fixes it.
It's possible to do this without calculating the length separately - which, as others have pointed out, requires raising a number to a large power, and generally seems like a messy solution to me.
Proving that this is correct is a little tough, and I'm not sure I trust my expository powers to make it clear, but bear with me. For the purposes of the explanation, we're generating strings of length at most n from an alphabet a of |a| characters.
First, imagine you have a maximum length of n, and you've already decided you're generating a string of at least length n-1. It should be obvious that there are |a|+1 equally likely possibilities: we can generate any of the |a| characters from the alphabet, or we can choose to terminate with n-1 characters. To decide, we simply pick a random number x between 0 and |a| (inclusive); if x is |a|, we terminate at n-1 characters; otherwise, we append the xth character of a to the string. Here's a simple implementation of this procedure in Python:
import random

def pick_character(alphabet):
    x = random.randrange(len(alphabet) + 1)
    if x == len(alphabet):
        return ''
    else:
        return alphabet[x]
Now, we can apply this recursively. To generate the kth character of the string, we first attempt to generate the characters after k. If our recursive invocation returns anything, then we know the string should be at least length k, and we generate a character of our own from the alphabet and return it. If, however, the recursive invocation returns nothing, we know the string is no longer than k, and we use the above routine to select either the final character or no character. Here's an implementation of this in Python:
def uniform_random_string(alphabet, max_len):
    if max_len == 1:
        return pick_character(alphabet)
    suffix = uniform_random_string(alphabet, max_len - 1)
    if suffix:
        # String contains characters after ours
        return random.choice(alphabet) + suffix
    else:
        # String contains no characters after our own
        return pick_character(alphabet)
If you doubt the uniformity of this function, you can attempt to disprove it: suggest a string for which there are two distinct ways to generate it, or none. If there are no such strings - and alas, I do not have a robust proof of this fact, though I'm fairly certain it's true - and given that the individual selections are uniform, then the result must also select any string with uniform probability.
As promised, and unlike every other solution posted thus far, no raising of numbers to large powers is required; no arbitrary length integers or floating point numbers are needed to store the result, and the validity, at least to my eyes, is fairly easy to demonstrate. It's also shorter than any fully-specified solution thus far. ;)
If anyone wants to chip in with a robust proof of the function's uniformity, I'd be extremely grateful.
Edit: Disproof, provided by a friend:
dato: so imagine alphabet = 'abc' and n = 2
dato: you have 9 strings of length 2, 3 of length 1, 1 of length 0
dato: that's 13 in total
dato: so probability of getting a length 2 string should be 9/13
dato: and probability of getting a length 1 or a length 0 should be 4/13
dato: now if you call uniform_random_string('abc', 2)
dato: that transforms itself into a call to uniform_random_string('abc', 1)
dato: which is an uniform distribution over ['a', 'b', 'c', '']
dato: the first three of those yield all the 2 length strings
dato: and the latter produce all the 1 length strings and the empty strings
dato: but 0.75 > 9/13
dato: and 0.25 < 4/13
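The numbers in the disproof can be checked mechanically. The sketch below is not part of the original answer: exact_distribution and pick_character_dist are hypothetical helpers that mirror uniform_random_string and pick_character, but carry exact probabilities (as Fraction values) instead of sampling. It assumes the alphabet has no repeated characters.

```python
from collections import defaultdict
from fractions import Fraction


def pick_character_dist(alphabet):
    # Each character, or the empty string, with probability 1 / (|a| + 1).
    p = Fraction(1, len(alphabet) + 1)
    dist = {c: p for c in alphabet}
    dist[''] = p
    return dist


def exact_distribution(alphabet, max_len):
    # Same branching as uniform_random_string, tracking probabilities.
    if max_len == 1:
        return pick_character_dist(alphabet)
    dist = defaultdict(Fraction)
    for suffix, p in exact_distribution(alphabet, max_len - 1).items():
        if suffix:
            # A non-empty suffix gets a uniformly chosen character in front.
            for c in alphabet:
                dist[c + suffix] += p / len(alphabet)
        else:
            # An empty suffix falls back to pick_character.
            for s, q in pick_character_dist(alphabet).items():
                dist[s] += p * q
    return dict(dist)


dist = exact_distribution('abc', 2)
p_len2 = sum(p for s, p in dist.items() if len(s) == 2)
print(p_len2)                           # 3/4, not the 9/13 a uniform choice needs
print(dist['ab'], dist['a'], dist[''])  # 1/12 1/16 1/16
```

Every length-2 string comes out with probability 1/12 while the shorter ones get 1/16, so the function is indeed not uniform.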
// Note space as an available char
alphabet = "abcdefghijklmnopqrstuvwxyz "
result_string = ""
for( ;; )
{
s = ""
for( i = 0; i < n; i++ )
s += alphabet[rand(0, 26)]
first_space = n;
for( i = 0; i < n; i++ )
if( s[ i ] == ' ' )
{
first_space = i;
break;
}
ok = true;
// Reject "duplicate" shorter strings
for( i = first_space + 1; i < n; i++ )
if( s[ i ] != ' ' )
{
ok = false;
break;
}
if( !ok )
continue;
// Extract the short version of the string
for( i = 0; i < first_space; i++ )
result_string += s[ i ];
break;
}
Edit: I forgot to disallow 0-length strings; that will take a bit more code, which I don't have time to add now.
Edit: After considering how my answer doesn't scale to large n (takes too long to get lucky and find an accepted string), I like paxdiablo's answer much better. Less code too.
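For reference, the rejection idea in the pseudocode above could be sketched in Python roughly as follows. This is only an illustration: rejection_sample and the rng parameter are names made up here, and the space is assumed not to be part of the real alphabet. Like the pseudocode, it can still return the empty string.

```python
import random


def rejection_sample(alphabet, n, rng=random):
    pad = ' '                 # padding symbol, assumed absent from alphabet
    symbols = alphabet + pad
    while True:
        # Draw n symbols uniformly from the alphabet plus the padding symbol.
        s = ''.join(rng.choice(symbols) for _ in range(n))
        # Accept only if every pad is trailing, i.e. nothing but padding
        # follows the first pad. Each short string then has exactly one
        # accepted padded form, so all accepted strings are equally likely.
        stripped = s.rstrip(pad)
        if pad not in stripped:
            return stripped


print(rejection_sample('abcdefghijklmnopqrstuvwxyz', 5))
```

As the edit above notes, the acceptance rate drops as n grows, so this is mainly useful for small n.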
Personally I'd do it like this:
Let's say your alphabet has Z characters. Then the number of possible strings for each length L is:
L | Number of strings (Z^L, for Z = 26)
--------------------------
1 | 26
2 | 676 (= 26 * 26)
3 | 17576 (= 26 * 26 * 26)
...and so on.
Now let's say your maximum desired length is N. Then the total number of possible strings from length 1 to N that your function could generate would be the sum of a geometric sequence:
Z * (1 - (Z ^ N)) / (1 - Z)
Let's call this value S. Then the probability of generating a string of any length L should be:
(Z ^ L) / S
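As a quick sanity check of these probabilities, here is a small Python sketch. length_probabilities is a hypothetical helper, not something from the answer; it sums Z^L directly over lengths 1 to N instead of using a closed form, and keeps the arithmetic exact with Fraction.

```python
from fractions import Fraction


def length_probabilities(Z, N):
    # Number of strings of each length 1..N over an alphabet of Z characters.
    counts = [Z ** L for L in range(1, N + 1)]
    S = sum(counts)  # total number of strings of length 1..N
    return [Fraction(c, S) for c in counts]


# Example values chosen for illustration: 14 characters, maximum length 5.
for length, p in enumerate(length_probabilities(14, 5), start=1):
    print(length, f"{float(p):.2%}")
```

The longest length dominates heavily, which is why a uniform choice over strings almost always returns a string of length N.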
OK, fine. This is all well and good; but how do we generate a random number given a non-uniform probability distribution?
The short answer is: you don't. Get a library to do that for you. I develop mainly in .NET, so one I might turn to would be Math.NET.
That said, it's really not so hard to come up with a rudimentary approach to doing this on your own.
Here's one way: take a generator that gives you a random value within a known uniform distribution, and assign ranges within that distribution of sizes dependent on your desired distribution. Then interpret the random value provided by the generator by determining which range it falls into.
Here's an example in C# of one way you could implement this idea (scroll to the bottom for example output):
RandomStringGenerator class
public class RandomStringGenerator
{
private readonly Random _random;
private readonly char[] _alphabet;
public RandomStringGenerator(string alphabet)
{
if (string.IsNullOrEmpty(alphabet))
throw new ArgumentException("alphabet");
_random = new Random();
_alphabet = alphabet.Distinct().ToArray();
}
public string NextString(int maxLength)
{
// Get a value randomly distributed between 0.0 and 1.0 --
// this is approximately what the System.Random class provides.
double value = _random.NextDouble();
// This is where the magic happens: we "translate" the above number
// to a length based on our computed probability distribution for the given
// alphabet and the desired maximum string length.
int length = GetLengthFromRandomValue(value, _alphabet.Length, maxLength);
// The rest is easy: allocate a char array of the length determined above...
char[] chars = new char[length];
// ...populate it with a bunch of random values from the alphabet...
for (int i = 0; i < length; ++i)
{
chars[i] = _alphabet[_random.Next(0, _alphabet.Length)];
}
// ...and return a newly constructed string.
return new string(chars);
}
static int GetLengthFromRandomValue(double value, int alphabetSize, int maxLength)
{
// Looping really might not be the smartest way to do this,
// but it's the most obvious way that immediately springs to my mind.
for (int length = 1; length <= maxLength; ++length)
{
Range r = GetRangeForLength(length, alphabetSize, maxLength);
if (r.Contains(value))
return length;
}
return maxLength;
}
static Range GetRangeForLength(int length, int alphabetSize, int maxLength)
{
int L = length;
int Z = alphabetSize;
int N = maxLength;
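// Count only strings of length 1..N (the empty string is excluded).
double possibleStrings = Z * (1 - Math.Pow(Z, N)) / (1 - Z);
double stringsOfGivenLength = Math.Pow(Z, L);
// Strings of length 1..L-1; this is 0 when L == 1.
double possibleSmallerStrings = Z * (1 - Math.Pow(Z, L - 1)) / (1 - Z);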
double probabilityOfGivenLength = ((double)stringsOfGivenLength / possibleStrings);
double probabilityOfShorterLength = ((double)possibleSmallerStrings / possibleStrings);
double startPoint = probabilityOfShorterLength;
double endPoint = probabilityOfShorterLength + probabilityOfGivenLength;
return new Range(startPoint, endPoint);
}
}
Range struct
public struct Range
{
public readonly double StartPoint;
public readonly double EndPoint;
public Range(double startPoint, double endPoint)
: this()
{
this.StartPoint = startPoint;
this.EndPoint = endPoint;
}
public bool Contains(double value)
{
return this.StartPoint <= value && value <= this.EndPoint;
}
}
Test
static void Main(string[] args)
{
const int N = 5;
const string alphabet = "acegikmoqstvwy";
int Z = alphabet.Length;
var rand = new RandomStringGenerator(alphabet);
var strings = new List<string>();
for (int i = 0; i < 100000; ++i)
{
strings.Add(rand.NextString(N));
}
Console.WriteLine("First 10 results:");
for (int i = 0; i < 10; ++i)
{
Console.WriteLine(strings[i]);
}
// sanity check
double sumOfProbabilities = 0.0;
for (int i = 1; i <= N; ++i)
{
double probability = Math.Pow(Z, i) / (Z * (1 - Math.Pow(Z, N)) / (1 - Z));
int numStrings = strings.Count(str => str.Length == i);
Console.WriteLine("# strings of length {0}: {1} (probability = {2:0.00%})", i, numStrings, probability);
sumOfProbabilities += probability;
}
Console.WriteLine("Probabilities sum to {0:0.00%}.", sumOfProbabilities);
Console.ReadLine();
}
Output:
First 10 results:
wmkyw
qqowc
ackai
tokmo
eeiyw
cakgg
vceec
qwqyq
aiomt
qkyav
# strings of length 1: 1 (probability = 0.00%)
# strings of length 2: 38 (probability = 0.03%)
# strings of length 3: 475 (probability = 0.47%)
# strings of length 4: 6633 (probability = 6.63%)
# strings of length 5: 92853 (probability = 92.86%)
Probabilities sum to 100.00%.
My idea regarding this is like:
You want strings of length 1 to n. There are 26 possible strings of length 1, 26*26 of length 2, and so on.
You can work out what percentage of all possible strings each length accounts for. For example, the percentage for length-1 strings is
((26/(TOTAL_POSSIBLE_STRINGS_OF_ALL_LENGTH))*100).
Similarly, you can find the percentage for every other length.
Mark them on a number line from 0 to 100. For example, if length-1 strings account for 3% and length-2 strings for 6%, then length 1 covers 0-3 on the number line, length 2 covers 3-9, and so on.
Now take a random number between 0 and 100 and find the range it falls in. For example, if the number is 2, it lies between 0 and 3, so generate a length-1 string; if it is 7, generate a length-2 string.
In this way, each length is chosen with probability proportional to the share of all possible strings that have that length.
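A rough Python sketch of this number-line idea is below. It is an illustration under some assumptions, not a definitive implementation: random_string_up_to and rng are names invented here, and instead of percentages on a 0-100 line it uses the exact string counts as range widths, which avoids rounding but is otherwise the same lookup.

```python
import bisect
import itertools
import random


def random_string_up_to(alphabet, max_len, rng=random):
    z = len(alphabet)
    # Width of each range = number of strings of that length (1..max_len).
    counts = [z ** length for length in range(1, max_len + 1)]
    # Right-hand end of each range on the number line.
    cumulative = list(itertools.accumulate(counts))
    # Pick a point on the line: 0 <= ticket < total number of strings.
    ticket = rng.randrange(cumulative[-1])
    # Find which range the point falls into; index 0 means length 1.
    length = bisect.bisect_right(cumulative, ticket) + 1
    return ''.join(rng.choice(alphabet) for _ in range(length))


print(random_string_up_to('abcdefghijklmnopqrstuvwxyz', 4))
```

Python integers are arbitrary precision, so the counts stay exact even for large max_len; in a language with fixed-width integers the sums would need more care.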
Hope I am clear.
Disclaimer: I have only read one or two of the solutions above, so if this matches one of them, it is purely by chance.
Also, I welcome any advice and constructive criticism; please correct me if I am wrong.
Thanks and regards
Mawia
Matthieu: Your idea doesn't work because strings with blanks are still more likely to be generated. In your case, with n=4, you could have the string 'ab' generated as 'a' + 'b' + '' + '' or '' + 'a' + 'b' + '', or other combinations. Thus not all the strings have the same chance of appearing.
