How can I find all possible letter combinations of a string? - algorithm

I am given a string and I need to find all possible letter combinations of this string. What is the best way to achieve this?
example:
abc
result:
abc
acb
bca
bac
cab
cba
I have nothing so far. I am not asking for code, just for the best way to do it: an algorithm, pseudocode, or maybe a discussion.

You can sort it and then use std::next_permutation.
Take a look at the example: http://www.cplusplus.com/reference/algorithm/next_permutation/

Do you want combinations or permutations? For example, if your string is "abbc" do you want to see "bbac" once or twice?
If you actually want permutations you can use std::next_permutation and it'll take care of all the work for you.
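To make the "once or twice" distinction concrete, here is a minimal Python sketch. The function names all_orderings and distinct_orderings are illustrative only and do not come from the thread. It counts the orderings of "abbc" with and without treating the two b's as different letters.

```python
from itertools import permutations

def all_orderings(s):
    """Every ordering of the characters, counting repeated letters separately."""
    return ["".join(p) for p in permutations(s)]

def distinct_orderings(s):
    """Each distinct ordering once, in sorted order."""
    return sorted(set(all_orderings(s)))

print(len(all_orderings("abbc")))       # 24
print(len(distinct_orderings("abbc")))  # 12
print(distinct_orderings("abc"))
```

std::next_permutation behaves like the second function: starting from a sorted string, it visits each distinct ordering exactly once.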

If you want the combinations (order independent), you can use a combination-finding algorithm such as that found either here or here. Alternatively, you can use this (a Java implementation of a combination generator, with an example demonstrating what you want).
Alternatively, if you want what you have listed in your post (the permutations), then you can (for C++) use std::next_permutation, found in <algorithm>. You can find more information on std::next_permutation here.
Hope this helps. :)

In C++, std::next_permutation:
std::string s = "abc"; // must start in sorted order to visit every permutation
do
{
    std::cout << s << std::endl;
} while (std::next_permutation(s.begin(), s.end()));

Copied from an old Wikipedia article:
For every number k, with 0 ≤ k < n!, the following algorithm generates a unique permutation on sequence s.
function permutation(k, s) {
    for j = 2 to length(s) {
        swap s[(k mod j) + 1] with s[j]; // note that our array is indexed starting at 1
        k := k / j; // integer division cuts off the remainder
    }
    return s;
}
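The pseudocode above translates to Python roughly as follows. This is a sketch, assuming 0-based indexing in place of the 1-based indexing of the original; the variable names are mine.

```python
from math import factorial

def permutation(k, s):
    """Return the k-th permutation of s under the swap scheme above (0 <= k < n!)."""
    items = list(s)
    for j in range(2, len(items) + 1):
        # 0-based version of: swap s[(k mod j) + 1] with s[j]
        a, b = k % j, j - 1
        items[a], items[b] = items[b], items[a]
        k //= j  # integer division cuts off the remainder
    return "".join(items)

results = [permutation(k, "abc") for k in range(factorial(3))]
print(results)
```

The permutations do not come out in lexicographic order, but each k in range yields a different one, so looping k from 0 to n! - 1 enumerates them all.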

Related

Find best adjacent pair such that to maximize the sum of the first element

I was asked this question in an interview, but couldn't figure it out and would like to know the answer.
Suppose we have a list like this:
1 7 8 6 1 1 5 0
I need to find an algorithm such that it pairs adjacent numbers. The goal is to maximize the benefit but such that only the first number in the pair is counted.
e.g in the above, the optimal solution is:
{7,8} {6,1} {5,0}
so when taking only the first one:
7 + 6 + 5 = 18.
I tried various greedy solutions, but they often pick on {8,6} which leads to a non-optimal solution.
Thoughts?
First, observe that it never makes sense to skip more than one number *. Then, observe that the answer to this problem can be constructed by comparing two numbers:
The answer to the subproblem where you skip the first number, and
The answer to the subproblem where you keep the first number
Finally, observe that the answer to a problem with the sequence of only one number is zero, and the solution to the problem with only two numbers is the first number of the two.
With this information in hand, you can construct a recursive memoized solution to the problem, or a dynamic programming solution that starts at the back and goes back deciding on whether to include the previous number or not.
* Proof: assume that you have a selection of pairs that produces the max sum and that skips two adjacent numbers in the original sequence. Then you can add the pair that you skipped, and the sum does not get worse (assuming no negative numbers).
A simple dynamic programming problem. Starting from one specific index, we can either make a pair at the current index or skip to the next index:
int[] dp;        // Stores the result of each sub-problem (allocate data.length entries)
boolean[] check; // Marks sub-problems that are already solved (allocate data.length entries)
public int solve(int index, int[] data) {
    if (index + 1 >= data.length) { // Special case: no pair can be created
        return 0;
    }
    if (check[index]) { // If this sub-problem was solved before, return the stored value
        return dp[index];
    }
    check[index] = true;
    // Either make a pair at this index, or skip to the next index
    int result = Math.max(data[index] + solve(index + 2, data), solve(index + 1, data));
    return dp[index] = result;
}
It's a dynamic programming problem, and the table can be optimised away.
def best_pairs(xs):
    if len(xs) < 2:
        return 0  # no pair can be formed
    b0, b1 = 0, max(0, xs[0])
    for i in range(2, len(xs)):
        b0, b1 = b1, max(b1, xs[i-1] + b0)
    return b1
print(best_pairs(list(map(int, '1 7 8 6 1 1 5 0'.split()))))
After each iteration, b1 is the best solution using elements up to and including i, and b0 is the best solution using elements up to and including i-1.
This is my solution in Java, hope it helps.
public static int getBestSolution(int[] a, int offset) {
    if (a.length - offset <= 1)
        return 0;
    if (a.length - offset == 2)
        return a[offset];
    return Math.max(a[offset] + getBestSolution(a, offset + 2),
                    getBestSolution(a, offset + 1));
}
Here is a DP formulation for an O(N) solution:
MaxPairSum(i) = Max(arr[i] + MaxPairSum(i+2), MaxPairSum(i+1))
where MaxPairSum(i) is the max sum for the subarray (i, N), and MaxPairSum(i) = 0 once fewer than two elements remain.
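The recurrence can also be filled in from the back and then walked forward to recover which pairs were chosen. This is a sketch only; the name max_pair_sum and the tie-breaking rule (take the pair when both options are equal) are my own choices, not part of the answers above.

```python
def max_pair_sum(arr):
    """Return (best sum, list of chosen pairs) for the adjacent-pair problem."""
    n = len(arr)
    # best[i] = MaxPairSum(i); the two extra zero slots cover the base cases.
    best = [0] * (n + 2)
    for i in range(n - 2, -1, -1):
        best[i] = max(arr[i] + best[i + 2], best[i + 1])
    # Walk forward and replay the decisions to recover the pairs.
    pairs = []
    i = 0
    while i + 1 < n:
        if best[i] == arr[i] + best[i + 2]:
            pairs.append((arr[i], arr[i + 1]))
            i += 2
        else:
            i += 1
    return best[0], pairs

print(max_pair_sum([1, 7, 8, 6, 1, 1, 5, 0]))
```

For the list in the question this reports a sum of 18 with the pairs {7,8}, {6,1}, {5,0}.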

Lexicographical sorting

I'm doing a problem that says "concatenate the words to generate the lexicographically lowest possible string." from a competition.
Take for example this string: jibw ji jp bw jibw
The actual output turns out to be: bw jibw jibw ji jp
When I do sorting on this, I get: bw ji jibw jibw jp.
Does this mean that this is not sorting? If it is sorting, does "lexicographic" sorting take into consideration pushing the shorter strings to the back or something?
I've been doing some reading on lexicographical order and I don't see any point to it or any scenarios in which it is used. Do you have any?
It seems that what you're looking for is a better understanding of the question, so let me just make it clear. The usual sorting on strings is lexicographic sorting. If you sort the strings [jibw, ji, jp, bw, jibw] into lexicographic order, the sorted sequence is [bw, ji, jibw, jibw, jp], which is what you got. So your problem is not with understanding the word "lexicographic"; you already understand it correctly.
Your problem is that you're misreading the question. The question doesn't ask you to sort the strings in lexicographic order. (If it did, the answer you got by sorting would be correct.) Instead, it asks you to produce one string, got by concatenating the input strings in some order (i.e., making one string without spaces), so that the resulting single string is lexicographically minimal.
To illustrate the difference, consider the string you get by concatenating the sorted sequence, and the answer string:
bwjijibwjibwjp //Your answer
bwjibwjibwjijp //The correct answer
Now when you compare these two strings — note that you're just comparing two 14-character strings, not two sequences of strings — you can see that the correct answer is indeed lexicographically smaller than your answer: your answer starts with "bwjij", while the correct answer starts with "bwjib", and "bwjib" comes before "bwjij" in lexicographic order.
Hope you understand the question now. It is not a sorting question at all. (That is, it is not a problem of sorting the input strings. You could do sorting on all possible strings got by permuting and concatenating the input strings; this is one way of solving the problem if the number of input strings is small.)
You can convert this into a trivial sorting problem by comparing word1 + word2 against word2 + word1. In Python:
def cmp_concatenate(word1, word2):
    c1 = word1 + word2
    c2 = word2 + word1
    if c1 < c2:
        return -1
    elif c1 > c2:
        return 1
    else:
        return 0
Using this comparison function with the standard sort solves the problem.
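In Python 3, sort no longer accepts a comparison function directly, so it has to be wrapped with functools.cmp_to_key. A minimal sketch of the full solution under that assumption (the function names are illustrative):

```python
from functools import cmp_to_key

def compare_concatenated(word1, word2):
    """Negative if word1 should come first, positive if word2 should."""
    c1, c2 = word1 + word2, word2 + word1
    return (c1 > c2) - (c1 < c2)

def smallest_concatenation(words):
    return "".join(sorted(words, key=cmp_to_key(compare_concatenated)))

print(smallest_concatenation("jibw ji jp bw jibw".split()))  # bwjibwjibwjijp
```

The output matches the expected answer from the question: bw jibw jibw ji jp, written without spaces.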
I've been using F# in this Facebook hacker cup. Learned quite a bit in this competition. Since the documentation on F# on the web is still rare, I think I might as well share a bit here.
This problem requests you to sort a list of strings based on a customized comparison method. Here is my code snippet in F#.
let comparer (string1:string) (string2:string) =
    String.Compare(string1 + string2, string2 + string1)
// Assume words is an array of strings that you read from the input
// Do inplace sorting there
Array.sortInPlaceWith comparer words
// result contains the string for output
let result = Array.fold (+) "" words
//Use this block of code to print lexicographically sorted characters of an array or it can be used in many ways.
#include<stdio.h>
#include<conio.h>
void combo(int,int,char[],char[],int*,int*,int*);
void main()
{
char a[4]={'a','b','c'};
char a1[10];
int i=0,no=0;
int l=0,j=0;
combo(0,3,a,a1,&j,&l,&no);
printf("%d",no);
getch();
}
void combo(int ctr,int n,char a[],char a1[],int*j,int*l,int*no)
{
int i=0;
if(ctr==n)
{
for(i=0;i<n;i++)
printf("%c",a1[i]);
printf("\n");
(*no)++;
(*j)++;
if((*j)==n)
{
*l=0;
*j=0;
}
else
*l=1;
getch();
}
else
for(i=0;i<n;i++)
{
if(*l!=1)
*j=i;
a1[ctr]=a[*j];
combo(ctr+1,n,a,a1,j,l,no);
}
}
The example you posted shows that mere sorting would not generate the lexicographically lowest string.
For the given problem, you would need to apply some additional trick to determine which string should come before which (as of now, I can't think of the exact method).
The actual output does not violate the condition for lexicographically lowest word.
The sort command on linux also does Lexicographic sorting and generates the output in the order bw ji jibw jibw jp
Check what happened here:
If you just apply a lexicographic sort you'll get bw ji jibw jibw jp
but if you analyze it token by token you'll find that "bwjibw" (bw, jibw) is lexicographically lower than "bwjijibw" (bw, ji, jibw). That's why the answer is bw jibw jibw ji jp: first you should append bwjibwjibw, and after that you can concatenate ji and jp to get the lowest string.
A simple trick involving only sorting, which would work for this problem as the max string length is specified, would be to pad all strings up to max length with the first letter in the string. Then you sort the padded strings, but output the original unpadded ones. For ex. for string length 2 and inputs b and ba you would sort bb and ba which would give you ba and bb, and hence you should output bab.
Prasun's trick works if you instead pad with a special "placeholder" character that could be weighted to be greater than "z" in a string sort function. The result would give you the order of lowest lexicographic combination.
The contest is over so I am posting a possible solution, not the most efficient but one way of doing it
#include <algorithm>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main()
{
    ofstream myfile("output.txt");
    int numTestCases;
    cin >> numTestCases;
    for (int t = 0; t < numTestCases; t++)
    {
        int numStrings;
        cin >> numStrings;
        vector<string> words(numStrings);
        for (int i = 0; i < numStrings; i++)
        {
            cin >> words[i];
        }
        // Start from sorted order so that next_permutation visits every ordering.
        sort(words.begin(), words.end());
        string best;
        bool first = true;
        do
        {
            string candidate;
            for (int i = 0; i < numStrings; i++)
            {
                candidate.append(words[i]);
            }
            // Keep the smallest concatenation seen so far.
            if (first || candidate < best)
                best = candidate;
            first = false;
        } while (next_permutation(words.begin(), words.end()));
        cout << best << endl;
        myfile << best << endl;
    }
    return 0;
}
I am using algorithms from the C++ standard library; next_permutation steps through the orderings of the words in lexicographic order, and the smallest concatenation seen is kept.

C/C++/Java/C#: help parsing numbers

I've got a real problem (it's not homework, you can check my profile). I need to parse data whose formatting is not under my control.
The data look like this:
6,852:6,100,752
So there's first a number made of up to 9 digits, followed by a colon.
Then I know for sure that, after the colon:
there's at least one valid combination of numbers that add up to the number before the colon
I know exactly how many numbers add up to the number before the colon (two in this case, but it can go as high as ten numbers)
In this case, 6852 is 6100 + 752.
My problem: I need to find these numbers (in this example, 6100 + 752).
It is unfortunate that in the data I'm forced to parse, the separator between the numbers (the comma) is also the separator used inside the number themselves (6100 is written as 6,100).
Once again: that unfortunate formatting is not under my control and, once again, this is not homework.
I need to solve this for up to 10 numbers that need to add up.
Here's an example with three numbers adding up to 6855:
6,855:360,6,175,320
I fear that there are cases where there would be two possible different solutions. HOWEVER if I get a solution that works "in most cases" it would be enough.
How do you typically solve such a problem in a C-style bracket language?
Well, I would start with the brute force approach and then apply some heuristics to prune the search space. Just split the list on the right by commas and iterate over all possible ways to group them into n terms (where n is the number of terms in the solution). You can use the following two rules to skip over invalid possibilities.
(1) You know that any group of 1 or 2 digits must begin a term.
(2) You know that no candidate term in your comma delimited list can be greater than the total on the left. (This also tells you the maximum number of digit groups that any candidate term can have.)
Recursive implementation (pseudo code):
int total; // The total read before the colon
// Takes the list of tokens as integers after the colon
// tokens is the set of tokens left to analyse,
// partialList is the partial list of numbers built so far
// sum is the sum of numbers in partialList
// Aggregate takes 2 ints XXX and YYY and returns XXX,YYY (= XXX*1000+YYY)
function getNumbers(tokens, sum, partialList) =
    if isEmpty(tokens)
        if sum = total return partialList
        else return null // Got to the end with the wrong sum
    // Option 1: tokens[0] is a number on its own
    var result1 = getNumbers(tokens[1:end], sum+tokens[0], Append(partialList, tokens[0]))
    if result1 <> null return result1
    // Option 2: tokens[0] and tokens[1] together form one number
    if length(tokens) >= 2
        var result2 = getNumbers(tokens[2:end], sum+Aggregate(tokens[0], tokens[1]), Append(partialList, Aggregate(tokens[0], tokens[1])))
        if result2 <> null return result2
    return null // No solution overall
You can do a lot better from different points of view, like tail recursion, pruning (you can have XXX,YYY only if YYY has 3 digits)... but this may work well enough for your app.
Divide-and-conquer would make for a nice improvement.
I think you should try all possible ways to parse the string and calculate the sum and return a list of those results that give the correct sum. This should be only one result in most cases unless you are very unlucky.
One thing to note that reduces the number of possibilities is that there is only an ambiguity if you have aa,bbb and bbb is exactly 3 digits. If you have aa,bb there is only one way to parse it.
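The "try every grouping" idea, with the pruning rules mentioned in the answers above, can be sketched in Python as follows. The function name split_terms and its signature are hypothetical, and the sketch assumes non-negative numbers written with a comma before every group of three digits.

```python
def split_terms(line, count):
    """Return every way to read `count` numbers after the colon that add up to the total."""
    left, right = line.split(":")
    total = int(left.replace(",", ""))
    groups = right.split(",")
    solutions = []

    def search(pos, terms, running):
        if len(terms) > count:
            return
        if pos == len(groups):
            if len(terms) == count and running == total:
                solutions.append(list(terms))
            return
        # A group with a leading zero (e.g. "052") can only continue a number.
        if len(groups[pos]) > 1 and groups[pos][0] == "0":
            return
        value = int(groups[pos])
        end = pos + 1
        # No term may push the running sum past the total.
        while running + value <= total:
            terms.append(value)
            search(end, terms, running + value)
            terms.pop()
            # Only a full three-digit group can extend the current number.
            if end == len(groups) or len(groups[end]) != 3 or value == 0:
                break
            value = value * 1000 + int(groups[end])
            end += 1

    search(0, [], 0)
    return solutions

print(split_terms("6,852:6,100,752", 2))
print(split_terms("6,855:360,6,175,320", 3))
```

Returning all solutions rather than the first one makes the ambiguous cases visible: if the list has more than one entry, the input cannot be decoded uniquely.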
Reading in C++:
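#include <iostream>
#include <sstream>
#include <utility>
#include <vector>
// Reads "total:tokens"; the total itself may be written with commas.
std::pair<int,std::vector<int> > read_numbers(std::istream& is)
{
    std::pair<int,std::vector<int> > result; // result.first starts at 0
    bool before_colon = true;
    for(;;) {
        int i;
        if(!(is >> i)) {
            if(is.eof()) return result;
            throw "bar!";
        }
        // Before the colon the digit groups form the total; after it they are raw tokens.
        if(before_colon) result.first = result.first * 1000 + i;
        else result.second.push_back(i);
        char ch;
        if(is >> ch) {
            if(ch == ':' && before_colon) before_colon = false;
            else if(ch != ',') throw "foobar!";
        }
        is >> std::ws;
    }
}
// Not shown (assumed signature): finds the grouping of tokens that adds up to the total.
std::vector<int> get_winning_combination(int total,
                                         std::vector<int>::const_iterator begin,
                                         std::vector<int>::const_iterator end);
void f()
{
    std::istringstream iss("6,852:6,100,752");
    std::pair<int,std::vector<int> > foo = read_numbers(iss);
    std::vector<int> result = get_winning_combination( foo.first
                                                     , foo.second.begin()
                                                     , foo.second.end() );
    for( std::vector<int>::const_iterator i=result.begin(); i!=result.end(); ++i)
        std::cout << *i << " ";
}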
The actual cracking of the numbers is left as an exercise to the reader. :)
I think your main problem is deciding how to actually parse the numbers. The rest is just rote work with strings->numbers and iteration over combinations.
For instance, in the examples you gave, you could heuristically decide that a single-digit number followed by a three-digit number is, in fact, a four-digit number. Does a heuristic such as this hold true over a larger dataset? If not, you're also likely to have to iterate over the possible input parsing combinations, which means the naive solution is going to have a high polynomial complexity (O(n^x), where x > 4).
Actually checking for which numbers add up is easy to do using a recursive search.
// Requires: using System.Collections.Generic; using System.Linq;
List<int> GetSummands(int total, int numberOfElements, List<int> values)
{
    if (numberOfElements == 0)
    {
        // An empty list means success; null indicates no solution.
        return total == 0 ? new List<int>() : null;
    }
    if (total < 0)
    {
        return null; // Indicate no solution.
    }
    for (int i = 0; i < values.Count; ++i)
    {
        List<int> summands = GetSummands(
            total - values[i], numberOfElements - 1, values.Skip(i + 1).ToList());
        if (summands != null)
        {
            // Found solution.
            summands.Add(values[i]);
            return summands;
        }
    }
    return null; // No combination of the remaining values works.
}

Finding perfect numbers between 1 and 100

How can I generate all perfect numbers between 1 and 100?
A perfect number is a positive integer that is equal to the sum of its proper divisors. For example, 6(=1+2+3) is a perfect number.
So I suspect Frank is looking for an answer in Prolog, and yes it does smell rather homeworky...
For fun I decided to write up my answer. It took me about 50 lines.
So here is the outline of what my predicates look like. Maybe it will help get you thinking the Prolog way.
is_divisor(+Num,+Factor)
divisors(+Num,-Factors)
divisors(+Num,+N,-Factors)
sum(+List,-Total)
sum(+List,+Sofar,-Total)
is_perfect(+N)
perfect(+N,-List)
The + and - are not really part of the parameter names. They are a documentation clue about what the author expects to be instantiated.(NB) "+Foo" means you expect Foo to have a value when the predicate is called. "-Foo" means you expect to Foo to be a variable when the predicate is called, and give it a value by the time it finishes. (kind of like input and output, if it helps to think that way)
Whenever you see a pair of predicates like sum/2 and sum/3, odds are the sum/2 one is like a wrapper to the sum/3 one which is doing something like an accumulator.
I didn't bother to make it print them out nicely. You can just query it directly in the Prolog command line:
?- perfect(100,L).
L = [28, 6] ;
fail.
Another thing that might be helpful, that I find with Prolog predicates, is that there are generally two kinds. One is one that simply checks if something is true. For this kind of predicate, you want everything else to fail. These don't tend to need to be recursive.
Others will want to go through a range (of numbers or a list) and always return a result, even if it is 0 or []. For these types of predicates you need to use recursion and think about your base case.
HTH.
NB: This is called "mode", and you can actually specify them and the compiler/interpreter will enforce them, but I personally just use them in documentation. Also tried to find a page with info about Prolog mode, but I can't find a good link. :(
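For readers who do not use Prolog, the same decomposition can be sketched in Python. The function names mirror the predicate outline above, but this is my own illustration, not a translation of the 50-line Prolog answer, which was never posted.

```python
def divisors(num):
    """Proper divisors of num (every divisor except num itself)."""
    return [d for d in range(1, num // 2 + 1) if num % d == 0]

def is_perfect(n):
    """True when n equals the sum of its proper divisors."""
    return n > 1 and sum(divisors(n)) == n

def perfect(limit):
    """All perfect numbers from 1 up to and including limit."""
    return [n for n in range(1, limit + 1) if is_perfect(n)]

print(perfect(100))  # [6, 28]
```

As in the outline, is_perfect is a pure yes/no check, while divisors and perfect walk a range and always return a list, even if it is empty.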
I'm not sure if this is what you were looking for, but you could always just print out "6, 28"...
Well, it looks like you need to loop up to n/2. Divide the number by each candidate and, if there is no remainder, include the candidate in the total. Once you have exhausted the candidates up to n/2, check whether the total equals the number you are testing.
For instance:
#include <iostream>
using namespace std;
int main(void)
{
    int total = 0;
    for (int i = 1; i <= 100; i++)
    {
        // Sum the proper divisors of i; none can exceed i/2.
        for (int j = 1; j <= i / 2; j++)
        {
            if (!(i % j))
            {
                total += j;
            }
        }
        if (i == total)
        {
            cout << i << " is perfect" << endl;
        }
        total = 0; // reset for the next candidate
    }
    return 0;
}

String Tiling Algorithm

I'm looking for an efficient algorithm to do string tiling. Basically, you are given a list of strings, say BCD, CDE, ABC, A, and the resulting tiled string should be ABCDE, because BCD aligns with CDE yielding BCDE, which is then aligned with ABC yielding the final ABCDE.
Currently, I'm using a slightly naïve algorithm, that works as follows. Starting with a random pair of strings, say BCD and CDE, I use the following (in Java):
public static String tile(String first, String second) {
for (int i = 0; i < first.length() || i < second.length(); i++) {
// "right" tile (e.g., "BCD" and "CDE")
String firstTile = first.substring(i);
// "left" tile (e.g., "CDE" and "BCD")
String secondTile = second.substring(i);
if (second.contains(firstTile)) {
return first.substring(0, i) + second;
} else if (first.contains(secondTile)) {
return second.substring(0, i) + first;
}
}
return EMPTY;
}
System.out.println(tile("CDE", "ABCDEF")); // ABCDEF
System.out.println(tile("BCD", "CDE")); // BCDE
System.out.println(tile("CDE", "ABC")); // ABCDE
System.out.println(tile("ABC", tile("BCX", "XYZ"))); // ABCXYZ
Although this works, it's not very efficient, as it iterates over the same characters over and over again.
So, does anybody know a better (more efficient) algorithm to do this ? This problem is similar to a DNA sequence alignment problem, so any advice from someone in this field (and others, of course) are very much welcome. Also note that I'm not looking for an alignment, but a tiling, because I require a full overlap of one of the strings over the other.
I'm currently looking for an adaptation of the Rabin-Karp algorithm, in order to improve the asymptotic complexity of the algorithm, but I'd like to hear some advice before delving any further into this matter.
Thanks in advance.
For situations where there is ambiguity -- e.g., {ABC, CBA} which could result in ABCBA or CBABC --, any tiling can be returned. However, this situation seldom occurs, because I'm tiling words, e.g. {This is, is me} => {This is me}, which are manipulated so that the aforementioned algorithm works.
Similar question: Efficient Algorithm for String Concatenation with Overlap
Order the strings by the first character, then length (smallest to largest), and then apply the adaptation to KMP found in this question about concatenating overlapping strings.
I think this should work for the tiling of two strings, and be more efficient than your current implementation using substring and contains. Conceptually I loop across the characters in the 'left' string and compare them to a character in the 'right' string. If the two characters match, I move to the next character in the right string. Depending on which string the end is first reached of, and if the last compared characters match or not, one of the possible tiling cases is identified.
I haven't thought of anything to improve the time complexity of tiling more than two strings. As a small note for multiple strings, the algorithm below is easily extended to checking the tiling of a single 'left' string with multiple 'right' strings at once, which might save a bit of extra looping over the strings if you're trying to find out whether to do ("ABC", "BCX", "XYZ") or ("ABC", "XYZ", "BCX") by just trying all the possibilities. A bit.
string Tile(string a, string b)
{
    // Try both orderings of a and b,
    // since TileLeftToRight is not commutative.
    string ab = TileLeftToRight(a, b);
    if (ab != "")
        return ab;
    return TileLeftToRight(b, a);
    // Alternatively you could return whichever
    // of the two results is longest, for cases
    // like ("ABC" "BCABC").
}
string TileLeftToRight(string left, string right)
{
    int i = 0;
    int j = 0;
    while (true)
    {
        if (left[i] != right[j])
        {
            // Mismatch: restart one position after where this attempt began,
            // otherwise a partial match would be carried over incorrectly.
            i = i - j + 1;
            j = 0;
            if (i >= left.Length)
                return "";
        }
        else
        {
            i++;
            j++;
            if (i >= left.Length)
                return left + right.Substring(j);
            if (j >= right.Length)
                return left;
        }
    }
}
If Open Source code is acceptable, then you should check the genome benchmarks in Stanford's STAMP benchmark suite: it does pretty much exactly what you're looking for. Starting with a bunch of strings ("genes"), it looks for the shortest string that incorporates all the genes. So for example if you have ATGC and GCAA, it'll find ATGCAA. There's nothing about the algorithm that limits it to a 4-character alphabet, so this should be able to help you.
The first thing to ask is whether you want to find a tiling of {CDB, CDA}. There is no single tiling.
Interesting problem. You need some kind of backtracking. For example if you have:
ABC, BCD, DBC
Combining DBC with BCD results in:
ABC, DBCD
Which is not solvable. But combining ABC with BCD results in:
ABCD, DBC
Which can be combined to:
ABCDBC.
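The backtracking idea can be sketched in Python like this. The helper names merge and tile_all are hypothetical, and the search is plain brute force over pairs, so it is only meant for small inputs.

```python
def merge(left, right):
    """Tile right onto the end of left, or return None if they do not overlap."""
    if right in left:
        return left
    # Try the longest overlap first: a suffix of left that is a prefix of right.
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None

def tile_all(strings):
    """Return one string that tiles every input, or None if no merge order works."""
    if len(strings) == 1:
        return strings[0]
    for i in range(len(strings)):
        for j in range(len(strings)):
            if i == j:
                continue
            merged = merge(strings[i], strings[j])
            if merged is None:
                continue
            rest = [s for k, s in enumerate(strings) if k not in (i, j)]
            result = tile_all(rest + [merged])
            if result is not None:
                return result  # this merge order worked
            # Otherwise backtrack and try a different pair.
    return None

print(tile_all(["ABC", "BCD", "DBC"]))       # ABCDBC
print(tile_all(["BCD", "CDE", "ABC", "A"]))  # ABCDE
```

A dead end such as {ABC, DBCD} simply makes the recursive call return None, and the loop moves on to the next pair.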

Resources