What is the best algorithm to find whether an anagram is of a palindrome? - algorithm

In this problem we consider only strings of lower-case English letters (a-z).
A string is a palindrome if it has exactly the same sequence of characters when traversed left-to-right as right-to-left. For example, the following strings are palindromes:
"kayak"
"codilitytilidoc"
"neveroddoreven"
A string A is an anagram of a string B if it consists of exactly the same characters, but possibly in another order. For example, the following strings are each other's anagrams:
A="mary" B="army" A="rocketboys" B="octobersky" A="codility" B="codility"
Write a function
int isAnagramOfPalindrome(String S);
which returns 1 if the string s is a anagram of some palindrome, or returns 0 otherwise.
For example your function should return 1 for the argument "dooernedeevrvn", because it is an anagram of a palindrome "neveroddoreven". For argument "aabcba", your function should return 0.

'Algorithm' would be too big word for it.
You can construct a palindrome from the given character set if each character occurs in that set even number of times (with possible exception of one character).
For any other set, you can easily show that no palindrome exists.
Proof is simple in both cases, but let me know if that wasn't clear.

In a palindrome, every character must have a copy of itself, a "twin", on the other side of the string, except in the case of the middle letter, which can act as its own twin.
The algorithm you seek would create a length-26 array, one for each lowercase letter, and start counting the characters in the string, placing the quantity of character n at index n of the array. Then, it would pass through the array and count the number of characters with an odd quantity (because one letter there does not have a twin). If this number is 0 or 1, place that single odd letter in the center, and a palindrome is easily generated. Else, it's impossible to generate one, because two or more letters with no twins exist, and they can't both be in the center.

I came up with this solution for Javascript.
This solution is based on the premise that a string is an anagram of a palindrome if and only if at most one character appears an odd number of times in it.
function solution(S) {
var retval = 0;
var sorted = S.split('').sort(); // sort the input characters and store in
// a char array
var array = new Array();
for (var i = 0; i < sorted.length; i++) {
// check if the 2 chars are the same, if so copy the 2 chars to the new
// array
// and additionally increment the counter to account for the second char
// position in the loop.
if ((sorted[i] === sorted[i + 1]) && (sorted[i + 1] != undefined)) {
array.push.apply(array, sorted.slice(i, i + 2));
i = i + 1;
}
}
// if the original string array's length is 1 or more than the length of the
// new array's length
if (sorted.length <= array.length + 1) {
retval = 1;
}
//console.log("new array-> " + array);
//console.log("sorted array-> " + sorted);
return retval;
}

i wrote this code in java. i don't think if its gonna be a good one ^^,
public static int isAnagramOfPalindrome(String str){
ArrayList<Character> a = new ArrayList<Character>();
for(int i = 0; i < str.length(); i++){
if(a.contains(str.charAt(i))){
a.remove((Object)str.charAt(i));
}
else{
a.add(str.charAt(i));
}
}
if(a.size() > 1)
return 0;
return 1;
}

Algorithm:
Count the number of occurrence of each character.
Only one character with odd occurrence is allowed since in a palindrome the maximum number of character with odd occurrence can be '1'.
All other characters should occur in an even number of times.
If (2) and (3) fail, then the given string is not a palindrome.

This adds to the other answers given. We want to keep track of the count of each letter seen. If we have more than one odd count for a letter then we will not be able to form a palindrome. The odd count would go in the middle, but only one odd count can do so.
We can use a hashmap to keep track of the counts. The lookup for a hashmap is O(1) so it is fast. We are able to run the whole algorithm in O(n). Here's it is in code:
if __name__ == '__main__':
line = input()
dic = {}
for i in range(len(line)):
ch = line[i]
if ch in dic:
dic[ch] += 1
else:
dic[ch] = 1
chars_whose_count_is_odd = 0
for key, value in dic.items():
if value % 2 == 1:
chars_whose_count_is_odd += 1
if chars_whose_count_is_odd > 1:
print ("NO")
else:
print ("YES")

I have a neat solution in PHP posted in this question about complexities.
class Solution {
// Function to determine if the input string can make a palindrome by rearranging it
static public function isAnagramOfPalindrome($S) {
// here I am counting how many characters have odd number of occurrences
$odds = count(array_filter(count_chars($S, 1), function($var) {
return($var & 1);
}));
// If the string length is odd, then a palindrome would have 1 character with odd number occurrences
// If the string length is even, all characters should have even number of occurrences
return (int)($odds == (strlen($S) & 1));
}
}
echo Solution :: isAnagramOfPalindrome($_POST['input']);
It uses built-in PHP functions (why not), but you can make it yourself, as those functions are quite simple. First, the count_chars function generates a named array (dictionary in python) with all characters that appear in the string, and their number of occurrences. It can be substituted with a custom function like this:
$count_chars = array();
foreach($S as $char) {
if array_key_exists($char, $count_chars) {
$count_chars[$char]++;
else {
$count_chars[$char] = 1;
}
}
Then, an array_filter with a count function is applied to count how many chars have odd number of occurrences:
$odds = 0;
foreach($count_chars as $char) {
$odds += $char % 2;
}
And then you just apply the comparison in return (explained in the comments of the original function).
return ($odds == strlen($char) % 2)

This runs in O(n). For all chars but one, must be even. the optional odd character can be any odd number.
e.g.
abababa
def anagram_of_pali(str):
char_list = list(str)
map = {}
nb_of_odds = 0
for char in char_list:
if char in map:
map[char] += 1
else:
map[char] = 1
for char in map:
if map[char] % 2 != 0:
nb_of_odds += 1
return True if nb_of_odds <= 1 else False

You just have to count all the letters and check if there are letters with odd counts. If there are more than one letter with odd counts the string does not satisfy the above palindrome condition.
Furthermore, since a string with an even number letters must not have a letter with an odd count it is not necessary to check whether string length is even or not. It will take O(n) time complexity:
Here's the implementation in javascript:
function canRearrangeToPalindrome(str)
{
var letterCounts = {};
var letter;
var palindromeSum = 0;
for (var i = 0; i < str.length; i++) {
letter = str[i];
letterCounts[letter] = letterCounts[letter] || 0;
letterCounts[letter]++;
}
for (var letterCount in letterCounts) {
palindromeSum += letterCounts[letterCount] % 2;
}
return palindromeSum < 2;
}

All right - it's been a while, but as I was asked such a question in a job interview I needed to give it a try in a few lines of Python. The basic idea is that if there is an anagram that is a palindrome for even number of letters each character occurs twice (or something like 2n times, i.e. count%2==0). In addition, for an odd number of characters one character (the one in the middle) may occur only once (or an uneven number - count%2==1).
I used a set in python to get the unique characters and then simply count and break the loop once the condition cannot be fulfilled. Example code (Python3):
def is_palindrome(s):
letters = set(s)
oddc=0
fail=False
for c in letters:
if s.count(c)%2==1:
oddc = oddc+1
if oddc>0 and len(s)%2==0:
fail=True
break
elif oddc>1:
fail=True
break
return(not fail)

def is_anagram_of_palindrome(S):
L = [ 0 for _ in range(26) ]
a = ord('a')
length = 0
for s in S:
length += 1
i = ord(s) - a
L[i] = abs(L[i] - 1)
return length > 0 and sum(L) < 2 and 1 or 0

While you can detect that the given string "S" is a candidate palindrome using the given techniques, it is still not very useful. According to the implementations given,
isAnagramOfPalindrome("rrss") would return true but there is no actual palindrome because:
A palindrome is a word, phrase, number, or other sequence of symbols or elements, whose meaning may be interpreted the same way in either forward or reverse direction. (Wikipedia)
And Rssr or Srrs is not an actual word or phrase that is interpretable. Same with it's anagram. Aarrdd is not an anagram of radar because it is not interpretable.
So, the solutions given must be augmented with a heuristic check against the input to see if it's even a word, and then a verification (via the implementations given), that it is palindrome-able at all. Then there is a heuristic search through the collected buckets with n/2! permutations to search if those are ACTUALLY palindromes and not garbage. The search is only n/2! and not n! because you calculate all permutations of each repeated letter, and then you mirror those over (in addition to possibly adding the singular pivot letter) to create all possible palindromes.
I disagree that algorithm is too big of a word, because this search can be done pure recursively, or using dynamic programming (in the case of words with letters with occurrences greater than 2) and is non trivial.

Here's some code: This is same as the top answer that describes algorithm.
1 #include<iostream>
2 #include<string>
3 #include<vector>
4 #include<stack>
5
6 using namespace std;
7
8 bool fun(string in)
9 {
10 int len=in.size();
11 int myints[len ];
12
13 for(int i=0; i<len; i++)
14 {
15 myints[i]= in.at(i);
16 }
17 vector<char> input(myints, myints+len);
18 sort(input.begin(), input.end());
19
20 stack<int> ret;
21
22 for(int i=0; i<len; i++)
23 {
24 if(!ret.empty() && ret.top()==input.at(i))
25 {
26 ret.pop();
27 }
28 else{
29 ret.push(input.at(i));
30 }
31 }
32
33 return ret.size()<=1;
34
35 }
36
37 int main()
38 {
39 string input;
40 cout<<"Enter word/number"<<endl;
41 cin>>input;
42 cout<<fun(input)<<endl;
43
44 return 0;
45 }

Related

Deleting duplicate characters from array

Got asked this question in an interview and couldn't find a solution.
Given an array of characters delete all the characters that got repeated k or more times consecutively and add '#' in the end of the array for every deleted character.
Example:
"xavvvarrrt"->"xaat######"
O(1) memory and O(n) time without writing to the same cell twice.
The tricky part for me was that I am not allowed to overwrite a cell more than once, which means I need to know exactly where each character will move after deleting the duplicates.
The best I could come up with is iterating once on the array and saving in a map the occurrences of each character, and when iterating again and checking if the current character is not deleted then move it to the new position according to the offset, if it is deleted then update an offset variable.
The problem with this approach is that it won't work in this scenario:
"aabbaa" because 'a' appears at two different places.
So when I thought about saving an array of occurrences in the map but now it won't use O(1) memory.
Thanks
This seems to work with your examples, although it seems a little complicated to me :) I wonder if we could simplify it. The basic idea is to traverse from left to right, keeping a record of how many places in the current block of duplicates are still available to replace, while the right pointer looks for more blocks to shift over.
JavaScript code:
function f(str){
str = str.split('')
let r = 1
let l = 0
let to_fill = 0
let count = 1
let fill = function(){
while (count > 0 && (to_fill > 0 || l < r)){
str[l] = str[r - count]
l++
count--
to_fill--
}
}
for (; r<str.length; r++){
if (str[r] == str[r-1]){
count++
} else if (count < 3){
if (to_fill)
fill()
count = 1
if (!to_fill)
l = r
} else if (!to_fill){
to_fill = count
count = 1
} else {
count = 1
}
}
if (count < 3)
fill()
while (l < str.length)
str[l++] = '#'
return str.join('')
}
var str = "aayyyycbbbee"
console.log(str)
console.log(f(str)) // "aacee#######"
str = "xavvvarrrt"
console.log(str)
console.log(f(str)) // "xaat######"
str = "xxaavvvaarrrbbsssgggtt"
console.log(str)
console.log(f(str))
Here is a version similar to the other JS answer, but a bit simpler:
function repl(str) {
str = str.split("");
var count = 1, write = 0;
for (var read = 0; read < str.length; read++) {
if (str[read] == str[read+1])
count++;
else {
if (count < 3) {
for (var i = 0; i < count; i++)
str[write++] = str[read];
}
count = 1;
}
}
while (write < str.length)
str[write++] = '#';
return str.join("");
}
function demo(str) {
console.log(str + " ==> " + repl(str));
}
demo("a");
demo("aa");
demo("aaa");
demo("aaaaaaa");
demo("aayyyycbbbee");
demo("xavvvarrrt");
demo("xxxaaaaxxxaaa");
demo("xxaavvvaarrrbbsssgggtt");
/*
Output:
a ==> a
aa ==> aa
aaa ==> ###
aaaaaaa ==> #######
aayyyycbbbee ==> aacee#######
xavvvarrrt ==> xaat######
xxxaaaaxxxaaa ==> #############
xxaavvvaarrrbbsssgggtt ==> xxaaaabbtt############
*/
The idea is to keep the current index for reading the next character and one for writing, as well as the number of consecutive repeated characters. If the following character is equal to the current, we just increase the counter. Otherwise we copy all characters below a count of 3, increasing the write index appropriately.
At the end of reading, anything from the current write index up to the end of the array is the number of repeated characters we have skipped. We just fill that with hashes now.
As we only store 3 values, memory consumption is O(1); we read each array cell twice, so O(n) time (the extra reads on writing could be eliminated by another variable); and each write index is accessed exactly once.

Permutations excluding repeated characters

I'm working on a Free Code Camp problem - http://www.freecodecamp.com/challenges/bonfire-no-repeats-please
The problem description is as follows -
Return the number of total permutations of the provided string that
don't have repeated consecutive letters. For example, 'aab' should
return 2 because it has 6 total permutations, but only 2 of them don't
have the same letter (in this case 'a') repeating.
I know I can solve this by writing a program that creates every permutation and then filters out the ones with repeated characters.
But I have this gnawing feeling that I can solve this mathematically.
First question then - Can I?
Second question - If yes, what formula could I use?
To elaborate further -
The example given in the problem is "aab" which the site says has six possible permutations, with only two meeting the non-repeated character criteria:
aab aba baa aab aba baa
The problem sees each character as unique so maybe "aab" could better be described as "a1a2b"
The tests for this problem are as follows (returning the number of permutations that meet the criteria)-
"aab" should return 2
"aaa" should return 0
"abcdefa" should return 3600
"abfdefa" should return 2640
"zzzzzzzz" should return 0
I have read through a lot of post about Combinatorics and Permutations and just seem to be digging a deeper hole for myself. But I really want to try to resolve this problem efficiently rather than brute force through an array of all possible permutations.
I posted this question on math.stackexchange - https://math.stackexchange.com/q/1410184/264492
The maths to resolve the case where only one character is repeated is pretty trivial - Factorial of total number of characters minus number of spaces available multiplied by repeated characters.
"aab" = 3! - 2! * 2! = 2
"abcdefa" = 7! - 6! * 2! = 3600
But trying to figure out the formula for the instances where more than one character is repeated has eluded me. e.g. "abfdefa"
This is a mathematical approach, that doesn't need to check all the possible strings.
Let's start with this string:
abfdefa
To find the solution we have to calculate the total number of permutations (without restrictions), and then subtract the invalid ones.
TOTAL OF PERMUTATIONS
We have to fill a number of positions, that is equal to the length of the original string. Let's consider each position a small box.
So, if we have
abfdefa
which has 7 characters, there are seven boxes to fill. We can fill the first with any of the 7 characters, the second with any of the remaining 6, and so on. So the total number of permutations, without restrictions, is:
7 * 6 * 5 * 4 * 3 * 2 * 1 = 7! (= 5,040)
INVALID PERMUTATIONS
Any permutation with two equal characters side by side is not valid. Let's see how many of those we have.
To calculate them, we'll consider that any character that has the same character side by side, will be in the same box. As they have to be together, why don't consider them something like a "compound" character?
Our example string has two repeated characters: the 'a' appears twice, and the 'f' also appears twice.
Number of permutations with 'aa'
Now we have only six boxes, as one of them will be filled with 'aa':
6 * 5 * 4 * 3 * 2 * 1 = 6!
We also have to consider that the two 'a' can be themselves permuted in 2! (as we have two 'a') ways.
So, the total number of permutations with two 'a' together is:
6! * 2! (= 1,440)
Number of permutations with 'ff'
Of course, as we also have two 'f', the number of permutations with 'ff' will be the same as the ones with 'aa':
6! * 2! (= 1,440)
OVERLAPS
If we had only one character repeated, the problem is finished, and the final result would be TOTAL - INVALID permutations.
But, if we have more than one repeated character, we have counted some of the invalid strings twice or more times.
We have to notice that some of the permutations with two 'a' together, will also have two 'f' together, and vice versa, so we need to add those back.
How do we count them?
As we have two repeated characters, we will consider two "compound" boxes: one for occurrences of 'aa' and other for 'ff' (both at the same time).
So now we have to fill 5 boxes: one with 'aa', other with 'ff', and 3 with the remaining 'b', 'd' and 'e'.
Also, each of those 'aa' and 'bb' can be permuted in 2! ways. So the total number of overlaps is:
5! * 2! * 2! (= 480)
FINAL SOLUTION
The final solution to this problem will be:
TOTAL - INVALID + OVERLAPS
And that's:
7! - (2 * 6! * 2!) + (5! * 2! * 2!) = 5,040 - 2 * 1,440 + 480 = 2,640
It seemed like a straightforward enough problem, but I spent hours on the wrong track before finally figuring out the correct logic. To find all permutations of a string with one or multiple repeated characters, while keeping identical characters seperated:
Start with a string like:
abcdabc
Seperate the first occurances from the repeats:
firsts: abcd
repeats: abc
Find all permutations of the firsts:
abcd abdc adbc adcb ...
Then, one by one, insert the repeats into each permutation, following these rules:
Start with the repeated character whose original comes first in the firsts
e.g. when inserting abc into dbac, use b first
Put the repeat two places or more after the first occurance
e.g. when inserting b into dbac, results are dbabc and dbacb
Then recurse for each result with the remaining repeated characters
I've seen this question with one repeated character, where the number of permutations of abcdefa where the two a's are kept seperate is given as 3600. However, this way of counting considers abcdefa and abcdefa to be two distinct permutations, "because the a's are swapped". In my opinion, this is just one permutation and its double, and the correct answer is 1800; the algorithm below will return both results.
function seperatedPermutations(str) {
var total = 0, firsts = "", repeats = "";
for (var i = 0; i < str.length; i++) {
char = str.charAt(i);
if (str.indexOf(char) == i) firsts += char; else repeats += char;
}
var firsts = stringPermutator(firsts);
for (var i = 0; i < firsts.length; i++) {
insertRepeats(firsts[i], repeats);
}
alert("Permutations of \"" + str + "\"\ntotal: " + (Math.pow(2, repeats.length) * total) + ", unique: " + total);
// RECURSIVE CHARACTER INSERTER
function insertRepeats(firsts, repeats) {
var pos = -1;
for (var i = 0; i < firsts.length, pos < 0; i++) {
pos = repeats.indexOf(firsts.charAt(i));
}
var char = repeats.charAt(pos);
for (var i = firsts.indexOf(char) + 2; i <= firsts.length; i++) {
var combi = firsts.slice(0, i) + char + firsts.slice(i);
if (repeats.length > 1) {
insertRepeats(combi, repeats.slice(0, pos) + repeats.slice(pos + 1));
} else {
document.write(combi + "<BR>");
++total;
}
}
}
// STRING PERMUTATOR (after Filip Nguyen)
function stringPermutator(str) {
var fact = [1], permutations = [];
for (var i = 1; i <= str.length; i++) fact[i] = i * fact[i - 1];
for (var i = 0; i < fact[str.length]; i++) {
var perm = "", temp = str, code = i;
for (var pos = str.length; pos > 0; pos--) {
var sel = code / fact[pos - 1];
perm += temp.charAt(sel);
code = code % fact[pos - 1];
temp = temp.substring(0, sel) + temp.substring(sel + 1);
}
permutations.push(perm);
}
return permutations;
}
}
seperatedPermutations("abfdefa");
A calculation based on this logic of the number of results for a string like abfdefa, with 5 "first" characters and 2 repeated characters (A and F) , would be:
The 5 "first" characters create 5! = 120 permutations
Each character can be in 5 positions, with 24 permutations each:
A**** (24)
*A*** (24)
**A** (24)
***A* (24)
****A (24)
For each of these positions, the repeat character has to come at least 2 places after its "first", so that makes 4, 3, 2 and 1 places respectively (for the last position, a repeat is impossible). With the repeated character inserted, this makes 240 permutations:
A***** (24 * 4)
*A**** (24 * 3)
**A*** (24 * 2)
***A** (24 * 1)
In each of these cases, the second character that will be repeated could be in 6 places, and the repeat character would have 5, 4, 3, 2, and 1 place to go. However, the second (F) character cannot be in the same place as the first (A) character, so one of the combinations is always impossible:
A****** (24 * 4 * (0+4+3+2+1)) = 24 * 4 * 10 = 960
*A***** (24 * 3 * (5+0+3+2+1)) = 24 * 3 * 11 = 792
**A**** (24 * 2 * (5+4+0+2+1)) = 24 * 2 * 12 = 576
***A*** (24 * 1 * (5+4+3+0+1)) = 24 * 1 * 13 = 312
And 960 + 792 + 576 + 312 = 2640, the expected result.
Or, for any string like abfdefa with 2 repeats:
where F is the number of "firsts".
To calculate the total without identical permutations (which I think makes more sense) you'd divide this number by 2^R, where R is the number or repeats.
Here's one way to think about it, which still seems a bit complicated to me: subtract the count of possibilities with disallowed neighbors.
For example abfdefa:
There are 6 ways to place "aa" or "ff" between the 5! ways to arrange the other five
letters, so altogether 5! * 6 * 2, multiplied by their number of permutations (2).
Based on the inclusion-exclusion principle, we subtract those possibilities that include
both "aa" and "ff" from the count above: 3! * (2 + 4 - 1) choose 2 ways to place both
"aa" and "ff" around the other three letters, and we must multiply by the permutation
counts within (2 * 2) and between (2).
So altogether,
7! - (5! * 6 * 2 * 2 - 3! * (2 + 4 - 1) choose 2 * 2 * 2 * 2) = 2640
I used the formula for multiset combinations for the count of ways to place the letter pairs between the rest.
A generalizable way that might achieve some improvement over the brute force solution is to enumerate the ways to interleave the letters with repeats and then multiply by the ways to partition the rest around them, taking into account the spaces that must be filled. The example, abfdefa, might look something like this:
afaf / fafa => (5 + 3 - 1) choose 3 // all ways to partition the rest
affa / faaf => 1 + 4 + (4 + 2 - 1) choose 2 // all three in the middle; two in the middle, one anywhere else; one in the middle, two anywhere else
aaff / ffaa => 3 + 1 + 1 // one in each required space, the other anywhere else; two in one required space, one in the other (x2)
Finally, multiply by the permutation counts, so altogether:
2 * 2! * 2! * 3! * ((5 + 3 - 1) choose 3 + 1 + 4 + (4 + 2 - 1) choose 2 + 3 + 1 + 1) = 2640
Well I won't have any mathematical solution for you here.
I guess you know backtracking as I percieved from your answer.So you can use Backtracking to generate all permutations and skipping a particular permutation whenever a repeat is encountered. This method is called Backtracking and Pruning.
Let n be the the length of the solution string, say(a1,a2,....an).
So during backtracking when only partial solution was formed, say (a1,a2,....ak) compare the values at ak and a(k-1).
Obviously you need to maintaion a reference to a previous letter(here a(k-1))
If both are same then break out from the partial solution, without reaching to the end and start creating another permutation from a1.
Thanks Lurai for great suggestion. It took a while and is a bit lengthy but here's my solution (it passes all test cases at FreeCodeCamp after converting to JavaScript of course) - apologies for crappy variables names (learning how to be a bad programmer too ;)) :D
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
public class PermAlone {
public static int permAlone(String str) {
int length = str.length();
int total = 0;
int invalid = 0;
int overlap = 0;
ArrayList<Integer> vals = new ArrayList<>();
Map<Character, Integer> chars = new HashMap<>();
// obtain individual characters and their frequencies from the string
for (int i = 0; i < length; i++) {
char key = str.charAt(i);
if (!chars.containsKey(key)) {
chars.put(key, 1);
}
else {
chars.put(key, chars.get(key) + 1);
}
}
// if one character repeated set total to 0
if (chars.size() == 1 && length > 1) {
total = 0;
}
// otherwise calculate total, invalid permutations and overlap
else {
// calculate total
total = factorial(length);
// calculate invalid permutations
for (char key : chars.keySet()) {
int len = 0;
int lenPerm = 0;
int charPerm = 0;
int val = chars.get(key);
int check = 1;
// if val > 0 there will be more invalid permutations to calculate
if (val > 1) {
check = val;
vals.add(val);
}
while (check > 1) {
len = length - check + 1;
lenPerm = factorial(len);
charPerm = factorial(check);
invalid = lenPerm * charPerm;
total -= invalid;
check--;
}
}
// calculate overlaps
if (vals.size() > 1) {
overlap = factorial(chars.size());
for (int val : vals) {
overlap *= factorial(val);
}
}
total += overlap;
}
return total;
}
// helper function to calculate factorials - not recursive as I was running out of memory on the platform :?
private static int factorial(int num) {
int result = 1;
if (num == 0 || num == 1) {
result = num;
}
else {
for (int i = 2; i <= num; i++) {
result *= i;
}
}
return result;
}
public static void main(String[] args) {
System.out.printf("For %s: %d\n\n", "aab", permAlone("aab")); // expected 2
System.out.printf("For %s: %d\n\n", "aaa", permAlone("aaa")); // expected 0
System.out.printf("For %s: %d\n\n", "aabb", permAlone("aabb")); // expected 8
System.out.printf("For %s: %d\n\n", "abcdefa", permAlone("abcdefa")); // expected 3600
System.out.printf("For %s: %d\n\n", "abfdefa", permAlone("abfdefa")); // expected 2640
System.out.printf("For %s: %d\n\n", "zzzzzzzz", permAlone("zzzzzzzz")); // expected 0
System.out.printf("For %s: %d\n\n", "a", permAlone("a")); // expected 1
System.out.printf("For %s: %d\n\n", "aaab", permAlone("aaab")); // expected 0
System.out.printf("For %s: %d\n\n", "aaabb", permAlone("aaabb")); // expected 12
System.out.printf("For %s: %d\n\n", "abbc", permAlone("abbc")); //expected 12
}
}

Display all the possible numbers having its digits in ascending order

Write a program that can display all the possible numbers in between given two numbers, having its digits in ascending order.
For Example:-
Input: 5000 to 6000
Output: 5678 5679 5689 5789
Input: 90 to 124
Output: 123 124
Brute force approach can make it count to all numbers and check of digits for each one of them. But I want approaches that can skip some numbers and can bring complexity lesser than O(n). Do any such solution(s) exists that can give better approach for this problem?
I offer a solution in Python. It is efficient as it considers only the relevant numbers. The basic idea is to count upwards, but handle overflow somewhat differently. While we normally set overflowing digits to 0, here we set them to the previous digit +1. Please check the inline comments for further details. You can play with it here: http://ideone.com/ePvVsQ
def ascending( na, nb ):
assert nb>=na
# split each number into a list of digits
a = list( int(x) for x in str(na))
b = list( int(x) for x in str(nb))
d = len(b) - len(a)
# if both numbers have different length add leading zeros
if d>0:
a = [0]*d + a # add leading zeros
assert len(a) == len(b)
n = len(a)
# check if the initial value has increasing digits as required,
# and fix if necessary
for x in range(d+1, n):
if a[x] <= a[x-1]:
for y in range(x, n):
a[y] = a[y-1] + 1
break
res = [] # result set
while a<=b:
# if we found a value and add it to the result list
# turn the list of digits back into an integer
if max(a) < 10:
res.append( int( ''.join( str(k) for k in a ) ) )
# in order to increase the number we look for the
# least significant digit that can be increased
for x in range( n-1, -1, -1): # count down from n-1 to 0
if a[x] < 10+x-n:
break
# digit x is to be increased
a[x] += 1
# all subsequent digits must be increased accordingly
for y in range( x+1, n ):
a[y] = a[y-1] + 1
return res
print( ascending( 5000, 9000 ) )
Sounds like task from Project Euler. Here is the solution in C++. It is not short, but it is straightforward and effective. Oh, and hey, it uses backtracking.
// Higher order digits at the back
typedef std::vector<int> Digits;
// Extract decimal digits of a number
Digits ExtractDigits(int n)
{
Digits digits;
while (n > 0)
{
digits.push_back(n % 10);
n /= 10;
}
if (digits.empty())
{
digits.push_back(0);
}
return digits;
}
// Main function
void PrintNumsRec(
const Digits& minDigits, // digits of the min value
const Digits& maxDigits, // digits of the max value
Digits& digits, // digits of current value
int pos, // current digits with index greater than pos are already filled
bool minEq, // currently filled digits are the same as of min value
bool maxEq) // currently filled digits are the same as of max value
{
if (pos < 0)
{
// Print current value. Handle leading zeros by yourself, if need
for (auto pDigit = digits.rbegin(); pDigit != digits.rend(); ++pDigit)
{
if (*pDigit >= 0)
{
std::cout << *pDigit;
}
}
std::cout << std::endl;
return;
}
// Compute iteration boundaries for current position
int first = minEq ? minDigits[pos] : 0;
int last = maxEq ? maxDigits[pos] : 9;
// The last filled digit
int prev = digits[pos + 1];
// Make sure generated number has increasing digits
int firstInc = std::max(first, prev + 1);
// Iterate through possible cases for current digit
for (int d = firstInc; d <= last; ++d)
{
digits[pos] = d;
if (d == 0 && prev == -1)
{
// Mark leading zeros with -1
digits[pos] = -1;
}
PrintNumsRec(minDigits, maxDigits, digits, pos - 1, minEq && (d == first), maxEq && (d == last));
}
}
// High-level function
void PrintNums(int min, int max)
{
auto minDigits = ExtractDigits(min);
auto maxDigits = ExtractDigits(max);
// Make digits array of the same size
while (minDigits.size() < maxDigits.size())
{
minDigits.push_back(0);
}
Digits digits(minDigits.size());
int pos = digits.size() - 1;
// Placeholder for leading zero
digits.push_back(-1);
PrintNumsRec(minDigits, maxDigits, digits, pos, true, true);
}
void main()
{
PrintNums(53, 297);
}
It uses recursion to handle arbitrary amount of digits, but it is essentially the same as the nested loops approach. Here is the output for (53, 297):
056
057
058
059
067
068
069
078
079
089
123
124
125
126
127
128
129
134
135
136
137
138
139
145
146
147
148
149
156
157
158
159
167
168
169
178
179
189
234
235
236
237
238
239
245
246
247
248
249
256
257
258
259
267
268
269
278
279
289
Much more interesting problem would be to count all these numbers without explicitly computing it. One would use dynamic programming for that.
There is only a very limited number of numbers which can match your definition (with 9 digits max) and these can be generated very fast. But if you really need speed, just cache the tree or the generated list and do a lookup when you need your result.
using System;
using System.Collections.Generic;
namespace so_ascending_digits
{
class Program
{
class Node
{
int digit;
int value;
List<Node> children;
public Node(int val = 0, int dig = 0)
{
digit = dig;
value = (val * 10) + digit;
children = new List<Node>();
for (int i = digit + 1; i < 10; i++)
{
children.Add(new Node(value, i));
}
}
public void Collect(ref List<int> collection, int min = 0, int max = Int16.MaxValue)
{
if ((value >= min) && (value <= max)) collection.Add(value);
foreach (Node n in children) if (value * 10 < max) n.Collect(ref collection, min, max);
}
}
static void Main(string[] args)
{
Node root = new Node();
List<int> numbers = new List<int>();
root.Collect(ref numbers, 5000, 6000);
numbers.Sort();
Console.WriteLine(String.Join("\n", numbers));
}
}
}
Why the brute force algorithm may be very inefficient.
One efficient way of encoding the input is to provide two numbers: the lower end of the range, a, and the number of values in the range, b-a-1. This can be encoded in O(lg a + lg (b - a)) bits, since the number of bits needed to represent a number in base-2 is roughly equal to the base-2 logarithm of the number. We can simplify this to O(lg b), because intuitively if b - a is small, then a = O(b), and if b - a is large, then b - a = O(b). Either way, the total input size is O(2 lg b) = O(lg b).
Now the brute force algorithm just checks each number from a to b, and outputs the numbers whose digits in base 10 are in increasing order. There are b - a + 1 possible numbers in that range. However, when you represent this in terms of the input size, you find that b - a + 1 = 2lg (b - a + 1) = 2O(lg b) for a large enough interval.
This means that for an input size n = O(lg b), you may need to check in the worst case O(2 n) values.
A better algorithm
Instead of checking every possible number in the interval, you can simply generate the valid numbers directly. Here's a rough overview of how. A number n can be thought of as a sequence of digits n1 ... nk, where k is again roughly log10 n.
For a and a four-digit number b, the iteration would look something like
for w in a1 .. 9:
for x in w+1 .. 9:
for y in x+1 .. 9:
for x in y+1 .. 9:
m = 1000 * w + 100 * x + 10 * y + w
if m < a:
next
if m > b:
exit
output w ++ x ++ y ++ z (++ is just string concatenation)
where a1 can be considered 0 if a has fewer digits than b.
For larger numbers, you can imagine just adding more nested for loops. In general, if b has d digits, you need d = O(lg b) loops, each of which iterates at most 10 times. The running time is thus O(10 lg b) = O(lg b) , which is a far better than the O(2lg b) running time you get by checking if every number is sorted or not.
One other detail that I have glossed over, which actually does affect the running time. As written, the algorithm needs to consider the time it takes to generate m. Without going into the details, you could assume that this adds at worst a factor of O(lg b) to the running time, resulting in an O(lg2 b) algorithm. However, using a little extra space at the top of each for loop to store partial products would save lots of redundant multiplication, allowing us to preserve the originally stated O(lg b) running time.
One way (pseudo-code):
for (digit3 = '5'; digit3 <= '6'; digit3++)
for (digit2 = digit3+1; digit2 <= '9'; digit2++)
for (digit1 = digit2+1; digit1 <= '9'; digit1++)
for (digit0 = digit1+1; digit0 <= '9'; digit0++)
output = digit3 + digit2 + digit1 + digit0; // concatenation

How would you find the initial letter with the most occurrences using recursion?

Given a sentence that is spread over a linked list where each item in the list is a word, for example:
Hello -> Everybody -> How -> Are -> You -> Feeling -> |
Given that this list is sorted, eg:
Are -> Everybody -> Feeling -> Hello -> How -> You -> |
How would you write the recursion that will find the initial letter that appears the most in the sentence (in this example the letter H from Hello & How) ?
Edit: I have update the code to recursion version.
In order to run it you call
GetMostLetterRecursion(rootNode , '0', 0, '0', 0)
The code itself look like this:
public char GetMostLetterRecursion(LinkedListNode<String> node, char currentChar, int currentCount, char maxChar, int maxCount)
{
if (node == null) return maxChar;
char c = node.Value[0];
if (c == currentChar)
{
return GetMostLetterRecursion(node.Next, currentChar, currentCount++, maxChar, maxCount);
}
if(currentCount > maxCount)
{
return GetMostLetterRecursion(node.Next, c, 1, currentChar, currentCount);
}
return GetMostLetterRecursion(node.Next, c, 1, maxChar, maxCount);
}
Solution 1
Loop over the words, keeping a tally of how many words start with each letter. Return the most popular letter according to the tally (easy if you used a priority queue for the tally).
This takes O(n) time (the number of words) and O(26) memory (the number of letters in alphabet).
Solution 2
Sort the words alphabetically. Loop over the words. Keep a record of the current letter and its frequency, as well as the most popular letter so far and its frequency. At the end of the loop, that's the most popular letter over the whole list.
This takes O(n log n) time and O(1) memory.
Keep an array to store the count of occurrences and Go through the linked list once to count it. Finally loop through the array to find the highest one.
Rough sketch in C:
int count[26]={0};
While ( head->next != NULL)
{
count[head->word[0] - 'A']++; // Assuming 'word' is string in each node
head = head->next;
}
max = count[0];
for (i=0;i<26;i++)
{
if(max<a[i])
max = a[i];
}
You can modify it to use recursion and handle lower case letters.
Here is a pure recursive implementation in Python. I haven't tested it, but it should work modulo typos or syntax errors. I used a Dictionary to store counts, so it will work with Unicode words too. The problem is split into two functions: one to count the occurrences of each letter, and another to find the maximum recursively.
# returns a dictionary where dict[letter] contains the count of letter
def count_first_letters(words):
def count_first_letters_rec(words, count_so_far):
if len(words) == 0:
return count_so_far
first_letter = words[0][0]
# could use defaultdict but this is an exercise :)
try:
count_so_far[first_letter] += 1
except KeyError:
count_so_far[first_letter] = 1
# recursive call
return count_first_letters_rec(words[1:], count_so_far)
return count_first_letters(words, {})
# takes a list of (item, count) pairs and returns the item with largest count.
def argmax(item_count_pairs):
def argmax_rec(item_count_pairs, max_so_far, argmax_so_far):
if len(item_count_pairs) == 0:
return argmax_so_far
item, count = item_count_pairs[0]
if count > max_so_far:
max_so_far = count
argmax_so_far = item
# recursive call
return argmax_rec(item_count_pairs[1:], max_so_far, argmax_so_far)
return argmax_rec(item_count_pairs, 0, None)
def most_common_first_letter(words);
counts = count_first_letters(words)
# this returns a dictionary, but we need to convert to
# a list of (key, value) tuples because recursively iterating
# over a dictionary is not so easy
kvpairs = counts.items()
# counts.iteritems() for Python 2
return argmax(kvpairs)
I have an array with the length of 26 (as English letters, so index 1 is for 'a' and 2 for 'b' and so on. ). Each time a letter occurs, I increment it's value in the array. if the value becomes more than max amount, then I update the max and take that letter as most occurred one.then I call the method for the next node.
This is the code in Java:
import java.util.LinkedList;
public class MostOccurance {
char mostOccured;
int maxOccurance;
LinkedList<String> list= new LinkedList<String>();
int[] letters= new int[26];
public void start(){
findMostOccuredChar( 0, '0', 0);
}
public char findMostOccuredChar ( int node, char most, int max){
if(node>=list.size())
return most;
String string=list.get(node);
if (string.charAt(0)== most)
{max++;
letters[Character.getNumericValue(most)-10]++;
}
else{
letters[Character.getNumericValue(most)-10]++;
if (letters[Character.getNumericValue(most)-10]++>max){
max=letters[Character.getNumericValue(most)-10];
most=string.charAt(0);
}
}
findMostOccuredChar( node++, most, max);
return most;
}
}
of course, you have to add each word to your link list. I didn't do that, because I was just showing an example.

How do I generate a random string of up to a certain length?

I would like to generate a random string (or a series of random strings, repetitions allowed) of length between 1 and n characters from some (finite) alphabet. Each string should be equally likely (in other words, the strings should be uniformly distributed).
The uniformity requirement means that an algorithm like this doesn't work:
alphabet = "abcdefghijklmnopqrstuvwxyz"
len = rand(1, n)
s = ""
for(i = 0; i < len; ++i)
s = s + alphabet[rand(0, 25)]
(pseudo code, rand(a, b) returns a integer between a and b, inclusively, each integer equally likely)
This algorithm generates strings with uniformly distributed lengths, but the actual distribution should be weighted toward longer strings (there are 26 times as many strings with length 2 as there are with length 1, and so on.) How can I achieve this?
What you need to do is generate your length and then your string as two distinct steps. You will need to first chose the length using a weighted approach. You can calculate the number of strings of a given length l for an alphabet of k symbols as k^l. Sum those up and then you have the total number of strings of any length, your first step is to generate a random number between 1 and that value and then bin it accordingly. Modulo off by one errors you would break at 26, 26^2, 26^3, 26^4 and so on. The logarithm based on the number of symbols would be useful for this task.
Once you have you length then you can generate the string as you have above.
Okay, there are 26 possibilities for a 1-character string, 262 for a 2-character string, and so on up to 2626 possibilities for a 26-character string.
That means there are 26 times as many possibilities for an (N)-character string than there are for an (N-1)-character string. You can use that fact to select your length:
def getlen(maxlen):
sz = maxlen
while sz != 1:
if rnd(27) != 1:
return sz
sz--;
return 1
I use 27 in the above code since the total sample space for selecting strings from "ab" is the 26 1-character possibilities and the 262 2-character possibilities. In other words, the ratio is 1:26 so 1-character has a probability of 1/27 (rather than 1/26 as I first answered).
This solution isn't perfect since you're calling rnd multiple times and it would be better to call it once with an possible range of 26N+26N-1+261 and select the length based on where your returned number falls within there but it may be difficult to find a random number generator that'll work on numbers that large (10 characters gives you a possible range of 2610+...+261 which, unless I've done the math wrong, is 146,813,779,479,510).
If you can limit the maximum size so that your rnd function will work in the range, something like this should be workable:
def getlen(chars,maxlen):
assert maxlen >= 1
range = chars
sampspace = 0
for i in 1 .. maxlen:
sampspace = sampspace + range
range = range * chars
range = range / chars
val = rnd(sampspace)
sz = maxlen
while val < sampspace - range:
sampspace = sampspace - range
range = range / chars
sz = sz - 1
return sz
Once you have the length, I would then use your current algorithm to choose the actual characters to populate the string.
Explaining it further:
Let's say our alphabet only consists of "ab". The possible sets up to length 3 are [ab] (2), [ab][ab] (4) and [ab][ab][ab] (8). So there is a 8/14 chance of getting a length of 3, 4/14 of length 2 and 2/14 of length 1.
The 14 is the magic figure: it's the sum of all 2n for n = 1 to the maximum length. So, testing that pseudo-code above with chars = 2 and maxlen = 3:
assert maxlen >= 1 [okay]
range = chars [2]
sampspace = 0
for i in 1 .. 3:
i = 1:
sampspace = sampspace + range [0 + 2 = 2]
range = range * chars [2 * 2 = 4]
i = 2:
sampspace = sampspace + range [2 + 4 = 6]
range = range * chars [4 * 2 = 8]
i = 3:
sampspace = sampspace + range [6 + 8 = 14]
range = range * chars [8 * 2 = 16]
range = range / chars [16 / 2 = 8]
val = rnd(sampspace) [number from 0 to 13 inclusive]
sz = maxlen [3]
while val < sampspace - range: [see below]
sampspace = sampspace - range
range = range / chars
sz = sz - 1
return sz
So, from that code, the first iteration of the final loop will exit with sz = 3 if val is greater than or equal to sampspace - range [14 - 8 = 6]. In other words, for the values 6 through 13 inclusive, 8 of the 14 possibilities.
Otherwise, sampspace becomes sampspace - range [14 - 8 = 6] and range becomes range / chars [8 / 2 = 4].
Then the second iteration of the final loop will exit with sz = 2 if val is greater than or equal to sampspace - range [6 - 4 = 2]. In other words, for the values 2 through 5 inclusive, 4 of the 14 possibilities.
Otherwise, sampspace becomes sampspace - range [6 - 4 = 2] and range becomes range / chars [4 / 2 = 2].
Then the third iteration of the final loop will exit with sz = 1 if val is greater than or equal to sampspace - range [2 - 2 = 0]. In other words, for the values 0 through 1 inclusive, 2 of the 14 possibilities (this iteration will always exit since the value must be greater than or equal to zero.
In retrospect, that second solution is a bit of a nightmare. In my personal opinion, I'd go for the first solution for its simplicity and to avoid the possibility of rather large numbers.
Building on my comment posted as a reply to the OP:
I'd consider it an exercise in base
conversion. You're simply generating a
"random number" in "base 26", where
a=0 and z=25. For a random string of
length n, generate a number between 1
and 26^n. Convert from base 10 to base
26, using symbols from your chosen
alphabet.
Here's a PHP implementation. I won't guaranty that there isn't an off-by-one error or two in here, but any such error should be minor:
<?php
$n = 5;
var_dump(randstr($n));
function randstr($maxlen) {
$dict = 'abcdefghijklmnopqrstuvwxyz';
$rand = rand(0, pow(strlen($dict), $maxlen));
$str = base_convert($rand, 10, 26);
//base convert returns base 26 using 0-9 and 15 letters a-p(?)
//we must convert those to our own set of symbols
return strtr($str, '1234567890abcdefghijklmnopqrstuvwxyz', $dict);
}
Instead of picking a length with uniform distribution, weight it according to how many strings are a given length. If your alphabet is size m, there are mx strings of size x, and (1-mn+1)/(1-m) strings of length n or less. The probability of choosing a string of length x should be mx*(1-m)/(1-mn+1).
Edit:
Regarding overflow - using floating point instead of integers will expand the range, so for a 26-character alphabet and single-precision floats, direct weight calculation shouldn't overflow for n<26.
A more robust approach is to deal with it iteratively. This should also minimize the effects of underflow:
int randomLength() {
for(int i = n; i > 0; i--) {
double d = Math.random();
if(d > (m - 1) / (m - Math.pow(m, -i))) {
return i;
}
}
return 0;
}
To make this more efficient by calculating fewer random numbers, we can reuse them by splitting intervals in more than one place:
int randomLength() {
for(int i = n; i > 0; i -= 5) {
double d = Math.random();
double c = (m - 1) / (m - Math.pow(m, -i))
for(int j = 0; j < 5; j++) {
if(d > c) {
return i - j;
}
c /= m;
}
}
for(int i = n % 0; i > 0; i--) {
double d = Math.random();
if(d > (m - 1) / (m - Math.pow(m, -i))) {
return i;
}
}
return 0;
}
Edit: This answer isn't quite right. See the bottom for a disproof. I'll leave it up for now in the hope someone can come up with a variant that fixes it.
It's possible to do this without calculating the length separately - which, as others have pointed out, requires raising a number to a large power, and generally seems like a messy solution to me.
Proving that this is correct is a little tough, and I'm not sure I trust my expository powers to make it clear, but bear with me. For the purposes of the explanation, we're generating strings of length at most n from an alphabet a of |a| characters.
First, imagine you have a maximum length of n, and you've already decided you're generating a string of at least length n-1. It should be obvious that there are |a|+1 equally likely possibilities: we can generate any of the |a| characters from the alphabet, or we can choose to terminate with n-1 characters. To decide, we simply pick a random number x between 0 and |a| (inclusive); if x is |a|, we terminate at n-1 characters; otherwise, we append the xth character of a to the string. Here's a simple implementation of this procedure in Python:
def pick_character(alphabet):
x = random.randrange(len(alphabet) + 1)
if x == len(alphabet):
return ''
else:
return alphabet[x]
Now, we can apply this recursively. To generate the kth character of the string, we first attempt to generate the characters after k. If our recursive invocation returns anything, then we know the string should be at least length k, and we generate a character of our own from the alphabet and return it. If, however, the recursive invocation returns nothing, we know the string is no longer than k, and we use the above routine to select either the final character or no character. Here's an implementation of this in Python:
def uniform_random_string(alphabet, max_len):
if max_len == 1:
return pick_character(alphabet)
suffix = uniform_random_string(alphabet, max_len - 1)
if suffix:
# String contains characters after ours
return random.choice(alphabet) + suffix
else:
# String contains no characters after our own
return pick_character(alphabet)
If you doubt the uniformity of this function, you can attempt to disprove it: suggest a string for which there are two distinct ways to generate it, or none. If there are no such strings - and alas, I do not have a robust proof of this fact, though I'm fairly certain it's true - and given that the individual selections are uniform, then the result must also select any string with uniform probability.
As promised, and unlike every other solution posted thus far, no raising of numbers to large powers is required; no arbitrary length integers or floating point numbers are needed to store the result, and the validity, at least to my eyes, is fairly easy to demonstrate. It's also shorter than any fully-specified solution thus far. ;)
If anyone wants to chip in with a robust proof of the function's uniformity, I'd be extremely grateful.
Edit: Disproof, provided by a friend:
dato: so imagine alphabet = 'abc' and n = 2
dato: you have 9 strings of length 2, 3 of length 1, 1 of length 0
dato: that's 13 in total
dato: so probability of getting a length 2 string should be 9/13
dato: and probability of getting a length 1 or a length 0 should be 4/13
dato: now if you call uniform_random_string('abc', 2)
dato: that transforms itself into a call to uniform_random_string('abc', 1)
dato: which is an uniform distribution over ['a', 'b', 'c', '']
dato: the first three of those yield all the 2 length strings
dato: and the latter produce all the 1 length strings and the empty strings
dato: but 0.75 > 9/13
dato: and 0.25 < 4/13
// Note space as an available char
alphabet = "abcdefghijklmnopqrstuvwxyz "
result_string = ""
for( ;; )
{
s = ""
for( i = 0; i < n; i++ )
s += alphabet[rand(0, 26)]
first_space = n;
for( i = 0; i < n; i++ )
if( s[ i ] == ' ' )
{
first_space = i;
break;
}
ok = true;
// Reject "duplicate" shorter strings
for( i = first_space + 1; i < n; i++ )
if( s[ i ] != ' ' )
{
ok = false;
break;
}
if( !ok )
continue;
// Extract the short version of the string
for( i = 0; i < first_space; i++ )
result_string += s[ i ];
break;
}
Edit: I forgot to disallow 0-length strings, that will take a bit more code which I don't have time to add now.
Edit: After considering how my answer doesn't scale to large n (takes too long to get lucky and find an accepted string), I like paxdiablo's answer much better. Less code too.
Personally I'd do it like this:
Let's say your alphabet has Z characters. Then the number of possible strings for each length L is:
L | Z
--------------------------
1 | 26
2 | 676 (= 26 * 26)
3 | 17576 (= 26 * 26 * 26)
...and so on.
Now let's say your maximum desired length is N. Then the total number of possible strings from length 1 to N that your function could generate would be the sum of a geometric sequence:
(1 - (Z ^ (N + 1))) / (1 - Z)
Let's call this value S. Then the probability of generating a string of any length L should be:
(Z ^ L) / S
OK, fine. This is all well and good; but how do we generate a random number given a non-uniform probability distribution?
The short answer is: you don't. Get a library to do that for you. I develop mainly in .NET, so one I might turn to would be Math.NET.
That said, it's really not so hard to come up with a rudimentary approach to doing this on your own.
Here's one way: take a generator that gives you a random value within a known uniform distribution, and assign ranges within that distribution of sizes dependent on your desired distribution. Then interpret the random value provided by the generator by determining which range it falls into.
Here's an example in C# of one way you could implement this idea (scroll to the bottom for example output):
RandomStringGenerator class
public class RandomStringGenerator
{
private readonly Random _random;
private readonly char[] _alphabet;
public RandomStringGenerator(string alphabet)
{
if (string.IsNullOrEmpty(alphabet))
throw new ArgumentException("alphabet");
_random = new Random();
_alphabet = alphabet.Distinct().ToArray();
}
public string NextString(int maxLength)
{
// Get a value randomly distributed between 0.0 and 1.0 --
// this is approximately what the System.Random class provides.
double value = _random.NextDouble();
// This is where the magic happens: we "translate" the above number
// to a length based on our computed probability distribution for the given
// alphabet and the desired maximum string length.
int length = GetLengthFromRandomValue(value, _alphabet.Length, maxLength);
// The rest is easy: allocate a char array of the length determined above...
char[] chars = new char[length];
// ...populate it with a bunch of random values from the alphabet...
for (int i = 0; i < length; ++i)
{
chars[i] = _alphabet[_random.Next(0, _alphabet.Length)];
}
// ...and return a newly constructed string.
return new string(chars);
}
static int GetLengthFromRandomValue(double value, int alphabetSize, int maxLength)
{
// Looping really might not be the smartest way to do this,
// but it's the most obvious way that immediately springs to my mind.
for (int length = 1; length <= maxLength; ++length)
{
Range r = GetRangeForLength(length, alphabetSize, maxLength);
if (r.Contains(value))
return length;
}
return maxLength;
}
static Range GetRangeForLength(int length, int alphabetSize, int maxLength)
{
int L = length;
int Z = alphabetSize;
int N = maxLength;
double possibleStrings = (1 - (Math.Pow(Z, N + 1)) / (1 - Z));
double stringsOfGivenLength = Math.Pow(Z, L);
double possibleSmallerStrings = (1 - Math.Pow(Z, L)) / (1 - Z);
double probabilityOfGivenLength = ((double)stringsOfGivenLength / possibleStrings);
double probabilityOfShorterLength = ((double)possibleSmallerStrings / possibleStrings);
double startPoint = probabilityOfShorterLength;
double endPoint = probabilityOfShorterLength + probabilityOfGivenLength;
return new Range(startPoint, endPoint);
}
}
Range struct
public struct Range
{
public readonly double StartPoint;
public readonly double EndPoint;
public Range(double startPoint, double endPoint)
: this()
{
this.StartPoint = startPoint;
this.EndPoint = endPoint;
}
public bool Contains(double value)
{
return this.StartPoint <= value && value <= this.EndPoint;
}
}
Test
static void Main(string[] args)
{
const int N = 5;
const string alphabet = "acegikmoqstvwy";
int Z = alphabet.Length;
var rand = new RandomStringGenerator(alphabet);
var strings = new List<string>();
for (int i = 0; i < 100000; ++i)
{
strings.Add(rand.NextString(N));
}
Console.WriteLine("First 10 results:");
for (int i = 0; i < 10; ++i)
{
Console.WriteLine(strings[i]);
}
// sanity check
double sumOfProbabilities = 0.0;
for (int i = 1; i <= N; ++i)
{
double probability = Math.Pow(Z, i) / ((1 - (Math.Pow(Z, N + 1))) / (1 - Z));
int numStrings = strings.Count(str => str.Length == i);
Console.WriteLine("# strings of length {0}: {1} (probability = {2:0.00%})", i, numStrings, probability);
sumOfProbabilities += probability;
}
Console.WriteLine("Probabilities sum to {0:0.00%}.", sumOfProbabilities);
Console.ReadLine();
}
Output:
First 10 results:
wmkyw
qqowc
ackai
tokmo
eeiyw
cakgg
vceec
qwqyq
aiomt
qkyav
# strings of length 1: 1 (probability = 0.00%)
# strings of length 2: 38 (probability = 0.03%)
# strings of length 3: 475 (probability = 0.47%)
# strings of length 4: 6633 (probability = 6.63%)
# strings of length 5: 92853 (probability = 92.86%)
Probabilities sum to 100.00%.
My idea regarding this is like:
you have 1-n length string.there 26 possible 1 length string,26*26 2 length string and so on.
you can find out the percentage of each length string of the total possible strings.for example percentage of single length string is like
((26/(TOTAL_POSSIBLE_STRINGS_OF_ALL_LENGTH))*100).
similarly you can find out the percentage of other length strings.
Mark them on a number line between 1 to 100.ie suppose percentage of single length string is 3 and double length string is 6 then number line single length string lies between 0-3 while double length string lies between 3-9 and so on.
Now take a random number between 1 to 100.find out the range in which this number lies.I mean suppose for examplethe number you have randomly chosen is 2.Now this number lies between 0-3 so go 1 length string or if the random number chosen is 7 then go for double length string.
In this fashion you can see that length of each string choosen will be proportional to the percentage of the total number of that length string contribute to the all possible strings.
Hope I am clear.
Disclaimer: I have not gone through above solution except one or two.So if it matches with some one solution it will be purely a chance.
Also,I will welcome all the advice and positive criticism and correct me if I am wrong.
Thanks and regard
Mawia
Matthieu: Your idea doesn't work because strings with blanks are still more likely to be generated. In your case, with n=4, you could have the string 'ab' generated as 'a' + 'b' + '' + '' or '' + 'a' + 'b' + '', or other combinations. Thus not all the strings have the same chance of appearing.

Resources