Selection sort on strings - sorting

How does selection sort work with strings? I've done some searching and can't seem to find a definitive answer. If I had 4 names [Rob, Adam, Tom, Thomas], how would a selection sort order them? Would it simply sort by the first letter? If so, would the result be [Adam, Rob, Thomas, Tom]?
Thanks.

Comparison-based sorting algorithms such as selection sort use some kind of comparison function to determine the order of elements. That function is generally independent of the particular sorting algorithm you select.
Most languages pick a default comparison function based on the type of the data being sorted. For example, comparison on numbers simply checks which number is greater. Comparison on strings uses dictionary (lexicographic) order: it compares letters position by position until it finds a difference, and a string that is a prefix of another one sorts first. Some examples (GT - greater than, LT - less than):
Compare numbers:
> compare 1 2
LT
letters:
> compare 'R' 'A'
GT
strings (it compares letters internally, think how):
> compare "Rob" "Adam"
GT
The sorting function uses these comparisons internally ([1,2,3] is a list of three numbers). You don't need to know which sorting algorithm is used internally: as long as the same comparison function is used, the results shouldn't vary:
> sort [3,1,2]
[1,2,3]
> sort ['t', 'h', 'o', 'm', 'a', 's']
['a', 'h', 'm', 'o', 's', 't']
> sort ["Rob", "Adam", "Tom", "Thomas"]
["Adam","Rob","Thomas","Tom"]
You can even define your own comparison function, to sort by some more sophisticated criteria:
Sort list of numbers by count of prime divisors.
First custom comparison function:
> numOfPrimeDivs 30
3
> numOfPrimeDivs 6
2
> let compareNumOfPrimeDivs n1 n2 = compare (numOfPrimeDivs n1) (numOfPrimeDivs n2)
> compareNumOfPrimeDivs 30 6
GT
> sortBy compareNumOfPrimeDivs [2,210,30,2310,6]
[2,6,30,210,2310]
Sort by length of strings
Comparison function:
> length "Rob"
3
> length "Adam"
4
> let compareLength s1 s2 = compare (length s1) (length s2)
> compareLength "Rob" "Adam"
LT
> sortBy compareLength ["Rob", "Adam", "Tom", "Thomas"]
["Rob","Tom","Adam","Thomas"]
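To tie this back to the question, here is a minimal Python sketch of selection sort with a pluggable key. The names selection_sort and num_of_prime_divs are made up for illustration, and num_of_prime_divs assumes that numOfPrimeDivs above counts distinct prime divisors (which matches 30 giving 3 and 6 giving 2). The algorithm itself never looks inside the strings; it only asks which of two keys is smaller.

```python
def selection_sort(items, key=lambda x: x):
    """Return a sorted copy of items using selection sort.

    The algorithm only ever asks "is key(a) < key(b)?"; how that is
    answered depends on the type of the keys, not on the algorithm.
    """
    result = list(items)
    for i in range(len(result)):
        smallest = i
        for j in range(i + 1, len(result)):
            if key(result[j]) < key(result[smallest]):
                smallest = j
        # Move the smallest remaining element to position i.
        result[i], result[smallest] = result[smallest], result[i]
    return result


def num_of_prime_divs(n):
    """Count distinct prime divisors by trial division (assumed meaning)."""
    count = 0
    p = 2
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:  # strip every copy of this prime
                n //= p
        p += 1
    if n > 1:  # whatever is left is one more prime factor
        count += 1
    return count


names = ["Rob", "Adam", "Tom", "Thomas"]
print(selection_sort(names))           # ['Adam', 'Rob', 'Thomas', 'Tom']
print(selection_sort(names, key=len))  # ['Rob', 'Tom', 'Adam', 'Thomas']
print(selection_sort([2, 210, 30, 2310, 6], key=num_of_prime_divs))
```

Python compares strings lexicographically by default, so "Thomas" comes before "Tom" because 'h' < 'o' at the second position. One caveat: selection sort is not stable, so for other inputs, elements with equal keys may come out in a different order than a stable sort would give.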

Related

Algorithm exercise

I'm working on this algorithm exercise but I don't understand completely the formulation. Here is the exercise:
Given a string str and array of pairs that indicates which indices in
the string can be swapped, return the lexicographically largest string
that results from doing the allowed swaps. You can swap indices any
number of times.
Example
For str = "abdc" and pairs = [[1, 4], [3, 4]], the output should be
swapLexOrder(str, pairs) = "dbca".
By swapping the given indices, you get the strings: "cbda", "cbad",
"dbac", "dbca". The lexicographically largest string in this list is
"dbca".
Input/Output
[execution time limit] 4 seconds (js)
[input] string str
A string consisting only of lowercase English letters.
Guaranteed constraints: 1 ≤ str.length ≤ 10^4.
[input] array.array.integer pairs
An array containing pairs of indices that can be swapped in str
(1-based). This means that for each pairs[i], you can swap elements in
str that have the indices pairs[i][0] and pairs[i][1].
Guaranteed constraints: 0 ≤ pairs.length ≤ 5000, pairs[i].length = 2.
[output] string
My question is: why is "abcd" not a possible answer (just swapping indices 3 and 4 in the original string "abdc")? The example says
By swapping the given indices, you get the strings: "cbda", "cbad",
"dbac", "dbca". The lexicographically largest string in this list is
"dbca"
I understand that even if "abcd" is a possible answer, "dbca" is lexicographically larger, so the answer is the same. But if I don't understand why "abcd" is not a possible answer, I think I'm misunderstanding the task.
You are reading the question correctly, and their description is broken. Both "abcd" and "abdc" can be produced with the allowed swaps ("abdc" by making no swap at all, or by swapping the same pair twice), yet neither appears in their list.
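If you want to check this yourself, here is a small Python sketch. reachable_strings brute-forces every string you can reach on a tiny input, and swap_lex_order is one common way to get the largest string directly (group indices that are connected through pairs, then sort the letters of each group in descending order). Both function bodies are my own illustration, not taken from the exercise, and the second one relies on the fact that swaps along a connected set of indices can produce any arrangement of that set.

```python
from collections import defaultdict, deque


def reachable_strings(s, pairs):
    """Breadth-first search over every string reachable by allowed swaps.

    Exponential in general; only meant for tiny inputs like the example.
    """
    seen = {s}
    queue = deque([s])
    while queue:
        current = queue.popleft()
        for i, j in pairs:
            chars = list(current)
            # Pairs are 1-based in the problem statement.
            chars[i - 1], chars[j - 1] = chars[j - 1], chars[i - 1]
            swapped = "".join(chars)
            if swapped not in seen:
                seen.add(swapped)
                queue.append(swapped)
    return seen


def swap_lex_order(s, pairs):
    """Largest reachable string: sort each connected group of indices."""
    parent = list(range(len(s)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i - 1)] = find(j - 1)

    groups = defaultdict(list)
    for idx in range(len(s)):
        groups[find(idx)].append(idx)  # indices are appended in ascending order

    result = list(s)
    for indices in groups.values():
        # Give the earliest positions of the group its largest letters.
        letters = sorted((s[i] for i in indices), reverse=True)
        for idx, ch in zip(indices, letters):
            result[idx] = ch
    return "".join(result)


print(sorted(reachable_strings("abdc", [[1, 4], [3, 4]])))
print(swap_lex_order("abdc", [[1, 4], [3, 4]]))
```

The brute-force set contains six strings, including "abcd" and "abdc", and its maximum agrees with swap_lex_order.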

Given a dictionary of words and an array of letters, find the maximum number of dictionary words which can be created using those letters

Each letter can be used only once. There may be more than one instance of the same letter in the array.
We can assume that each word in the dict can be spelled using the letters. The goal is to return the maximum number of words.
Example 1:
arr = ['a', 'b', 'z', 'z', 'z', 'z']
dict = ['ab', 'azz', 'bzz']
// returns 2 ( for [ 'azz', 'bzz' ])
Example 2:
arr = ['g', 't', 'o', 'g', 'w', 'r', 'd', 'e', 'a', 'b']
dict = ['we', 'bag', 'got', 'word']
// returns 3 ( for ['we', 'bag', 'got'] )
EDIT for clarity to adhere to SO guidelines:
Looking for a solution. I was given this problem during an interview. My solution is below, but it was rejected as too slow.
1.) For each word in dict, w
- Remove w's letters from the arr.
- With the remaining letters, count how many other words could be spelled.
Put that # as w's "score"
2.) With every word "scored", select the word with the highest score,
remove that word and its letters from the input arrays.
3.) Repeat this process until no more words can be spelled from the remaining
set of letters.
This is a fairly generic packing problem with up to 26 resources. If I were trying to solve this problem in practice, I would formulate it as an integer program and apply an integer program solver. Here's an example formulation for the given instance:
maximize x_ab + x_azz + x_bzz
subject to
constraint a: x_ab + x_azz <= 1
constraint b: x_ab + x_bzz <= 1
constraint z: 2 x_azz + 2 x_bzz <= 4
x_ab, x_azz, x_bzz in {0, 1} (or integer >= 0 depending on the exact variant)
The solver will solve the linear relaxation of this program and in the process put a price on each letter indicating how useful it is to make words, which guides the solver quickly to a provably optimal solution on surprisingly large instances (though this is an NP-hard problem for arbitrary-size alphabets, so don't expect much on artificial instances such as those resulting from NP-hardness reductions).
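To make the formulation concrete without pulling in a solver, here is a Python sketch that simply enumerates every 0/1 assignment for the three-word instance and keeps the best feasible one. This is brute force, not what an integer program solver does internally, and the variable names are only illustrative.

```python
from itertools import product

words = ["ab", "azz", "bzz"]
supply = {"a": 1, "b": 1, "z": 4}  # letter counts from arr in Example 1

best_value, best_choice = 0, ()
for x in product((0, 1), repeat=len(words)):
    # One constraint per letter: total usage must not exceed the supply.
    feasible = all(
        sum(x_w * w.count(letter) for x_w, w in zip(x, words)) <= limit
        for letter, limit in supply.items()
    )
    if feasible and sum(x) > best_value:
        best_value, best_choice = sum(x), x

chosen = [w for x_w, w in zip(best_choice, words) if x_w]
print(best_value, chosen)  # 2 ['azz', 'bzz']
```

With n words this loop visits 2^n assignments, which is exactly the blow-up a real solver tries to avoid.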
I don't know what your interviewer was looking for -- maybe a dynamic program whose states are multisets of unused letters.
One possible dynamic programming recurrence is the following:
WordCount(dict, i, listOfRemainingLetterCounts) =
    max(WordCount(dict, i-1, listOfRemainingLetterCounts),
        1 + WordCount(dict, i-1, listOfRemainingLetterCountsAfterReducingCountOfWordDict[i]))
The second branch applies only when dict[i] can be spelled from the remaining letters, and WordCount is 0 once no words are left.
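A memoized Python sketch of a recurrence along these lines, assuming each dictionary word may be used at most once and counting 1 for every word that is taken. The function name max_words and the tuple encoding of the remaining letter counts are my own choices.

```python
from collections import Counter
from functools import lru_cache


def max_words(letters, words):
    """Maximum number of words (each used at most once) spellable from letters."""
    alphabet = sorted(set(letters))
    index = {ch: i for i, ch in enumerate(alphabet)}
    start = tuple(letters.count(ch) for ch in alphabet)
    needs = [Counter(w) for w in words]

    @lru_cache(maxsize=None)
    def best(i, remaining):
        if i < 0:
            return 0  # no words left to consider
        skip = best(i - 1, remaining)
        need = needs[i]
        # Word i can only be taken if every letter it needs is still available.
        if any(ch not in index or remaining[index[ch]] < c
               for ch, c in need.items()):
            return skip
        after = list(remaining)
        for ch, c in need.items():
            after[index[ch]] -= c
        return max(skip, 1 + best(i - 1, tuple(after)))

    return best(len(words) - 1, start)


print(max_words(list("abzzzz"), ["ab", "azz", "bzz"]))             # 2
print(max_words(list("gtogwrdeab"), ["we", "bag", "got", "word"]))  # 3
```

The number of distinct states can still be very large, since the state includes the whole vector of remaining letter counts, so this is not guaranteed to be fast on big inputs.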
I see it as a multidimensional problem. Was the interviewer impressed by your answer?
Turn the list of letters into a set of letter-occurrence pairs, where the occurrence number is incremented each time the same letter appears again in the list; e.g. aba becomes the set a-1 b-1 a-2.
Translate each word in the dictionary, independently, in the same manner, so the word coo becomes the set c-1 o-1 o-2.
A word is accepted if the set of its letter-occurrences is a subset of the set generated from the original list of letters.
For a fixed alphabet and bounded letter frequencies, this could be implemented quite quickly using bitsets, but, again, how fast is fast enough?
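A short Python sketch of that acceptance test, using plain sets of (letter, occurrence) tuples instead of bitsets. The names occurrence_set and can_spell are made up for illustration.

```python
from collections import Counter


def occurrence_set(chars):
    """'aba' -> {('a', 1), ('b', 1), ('a', 2)}"""
    seen = Counter()
    pairs = set()
    for ch in chars:
        seen[ch] += 1
        pairs.add((ch, seen[ch]))
    return pairs


def can_spell(word, letters):
    """A word is accepted if its occurrence set is a subset of the letters' set."""
    return occurrence_set(word) <= occurrence_set(letters)


print(can_spell("coo", "cool"))  # True
print(can_spell("coo", "col"))   # False: only one 'o' available
```

Note that this only decides whether a single word fits; choosing which words to combine is still the hard part of the problem.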

How to generate all unique 4 character permutations of a seed string? [duplicate]

Possible Duplicate:
Algorithm to return all combinations of k elements from n
Generate Distinct Combinations PHP
I have an array containing a number of characters/letters, e.g.:
$seed = array('a','b','c','d','e','f',.....,'z','1','2','3',...'9');
I want to get all possible unique 4 character combinations/permutations from the seed, for example:
abcd, azxy, ag12, aaaa, etc
What's the best way to accomplish this?
I have thought about dividing the seed array into 4 letter groups, then go through each group and generate all possible combinations of that group, but that will leave out many combinations (i.e it will process abcd and wxyz, but not abyz and wxcd)
For each character in the array, write that character followed by each of the unique 3 character strings either from the characters after it (if you actually mean combinations) or from all the characters (which is what I think you mean).
How to generate all unique 3 character permutations of a seed string?
See this very similar question.
You may also want to read about recursion.
Python code
>>> def product(chars, n):
...     if n == 0:
...         yield ''
...     else:
...         for c in chars:
...             for result in product(chars, n - 1):  # Recursive call
...                 yield c + result
...
>>> list(product(['a', 'b', 'c'], 2))
['aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
(Note: in real Python code you should use itertools.product rather than writing it yourself.)
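For the 4-character case in the question, the itertools version might look like the sketch below. The seed here is an assumption on my part (lowercase letters plus the digits 1 to 9), since the question elides part of the array.

```python
import string
from itertools import combinations, islice, product

# Assumed seed: 'a'..'z' and '1'..'9' (35 symbols).
seed = string.ascii_lowercase + "123456789"

# Every length-4 string with repetition allowed, so 'aaaa' is included.
all_words = ("".join(t) for t in product(seed, repeat=4))
print(list(islice(all_words, 3)))  # peek at the first few without building them all
print(len(seed) ** 4)              # total number of such strings

# If each symbol may appear at most once and order does not matter,
# combinations() is the matching tool instead.
first_combo = "".join(next(combinations(seed, 4)))
print(first_combo)
```

Both functions return lazy iterators, so you can loop over the results without holding all of them in memory at once.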
Generating permutations is like summing up numbers. This is beautifully explained in the freely available book Higher Order Perl, page 128

Python: all possible words (permutations) of fixed length in mini-alphabet

Let's say I have a string like so:
abcdefghijklmnopqrstuvwxyz1234567890!##$%^&*()-_+={}[]\:;"'?/>.<,`~|€
This is basically a list of all the characters on my keyboard. How could I get all possible combinations for, let's say, a "word" made up of 8 of these chars? I know there are going to be millions of possibilities.
Cheers!
Difference between permutations and combinations
You are either looking for a permutation or a combination.
'abc' and 'bac' are different permutations, but they are the same combination {a,b,c}.
Permutations of 'abc': '', 'a', 'b', 'c', 'ab', 'ba', 'ac', 'ca', 'bc', 'cb', 'abc', 'acb', 'bac', 'bca', 'cab', 'cba'
Combinations of 'abc': {}, {'a'}, {'b'}, {'c'}, {'a','b'}, {'b','c'}, {'a','c'}, {'a','b','c'}
In python
Use from itertools import * (since the functions there really should be in the default namespace), or import itertools if you'd like.
If you care about permutations:
permutations(yourString, 8)
If you care about combinations :
combinations(yourString, 8)
In other languages
In other languages, there are simple recursive or iterative algorithms to generate these. See wikipedia or stackoverflow. e.g. http://en.wikipedia.org/wiki/Permutation#Systematic_generation_of_all_permutations
Important note
Do note that the number of full-length permutations of N items is N!, and the counts for shorter lengths grow quickly too. For example, your 69-character string would have
(69 choose 8) ≈ 8.36 billion combinations of length 8, and therefore...
(69 choose 8) * 8! ~= 3.37 × 10^14 permutations of length 8.
You'll run out of memory if you are storing every permutation. Even if you don't (because you're reducing them), it'll take a long time to run, maybe somewhere between 1-10 days on a modern computer.
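The figures above can be reproduced with the standard library (math.comb and math.perm need Python 3.8 or later). This sketch also shows that itertools hands out results lazily, so you can look at a few of them without storing everything; the 10-letter sample string is just for illustration.

```python
import math
from itertools import islice, permutations

n, k = 69, 8
combos = math.comb(n, k)  # order ignored
perms = math.perm(n, k)   # order matters, no character reused
print(f"{combos:,} combinations, {perms:.3g} permutations")

# Take only the first two permutations of a smaller sample alphabet.
sample = ["".join(p) for p in islice(permutations("abcdefghij", 8), 2)]
print(sample)
```

If a character is allowed to repeat within a word, itertools.product(yourString, repeat=8) is the relevant function instead, and the count becomes 69**8.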

Generating Balls in Boxes

Given two sorted vectors a and b, find all vectors which are sums of a and some permutation of b, and which are unique once sorted.
You can create one of the sought vectors in the following way:
Take vector a and a permutation of vector b.
Sum them together so c[i]=a[i]+b[i].
Sort c.
I'm interested in finding the set of b-permutations that yield the entire set of unique c vectors.
Example 0: a='ccdd' and b='xxyy'
Gives the summed vectors: 'cycydxdx', 'cxcxdydy', 'cxcydxdy'.
Notice that the permutations of b: 'xyxy' and 'yxyx' are equal, because in both cases the "box c" and the "box d" both get exactly one 'x' and one 'y'.
I guess this is similar to putting M balls in M boxes (one in each) with some groups of balls and boxes being identical.
Update: Given a string a='aabbbcdddd' and b='xxyyzzttqq' your problem will be 10 balls in 4 boxes. There are 4 distinct boxes of size 2, 3, 1 and 4. The balls are pairwise indistinguishable.
Example 1: Given strings are a='xyy' and b='kkd'.
Possible solution: 'kkd', 'dkk'.
Reason: All unique permutations of b are 'kkd', 'kdk' and 'dkk'. However, under our constraints, the first two permutations are considered equal, as the indices at which they differ map to the same char 'y' in string a.
Example 2: Given strings are a='xyy' and b='khd'.
Possible solution: 'khd', 'dkh', 'hkd'.
Example 3: Given strings are a='xxxx' and b='khhd'.
Possible solution: 'khhd'.
I can solve the problem of generating unique candidate b permutations using Narayana Pandita's algorithm as described on Wikipedia/Permutation.
The second part seems harder. My best shot is to join the two strings pairwise into a list, sort it, and use it as a key in a lookup set. ('xx'+'hd' join→'xh','xd' sort→'xd','xh').
As my M is often very big, and as similarities in the strings are common, I currently generate far more b permutations than actually make it through the set filter. I would love to have an algorithm that generates the correct ones directly. Any improvement is welcome.
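For reference, the filtering approach described above could be sketched in Python like this. It uses set(itertools.permutations(b)) to get the distinct permutations, which is a stand-in for Narayana Pandita's algorithm and is only practical for short strings; the function name is made up.

```python
from itertools import permutations


def unique_b_permutations(a, b):
    """Keep one permutation of b per distinct sorted pairing with a."""
    seen_keys = set()
    kept = []
    for perm in sorted(set(permutations(b))):
        # Pair a[i] with perm[i], then sort the pairs so that order
        # inside a "box" (equal letters of a) no longer matters.
        key = tuple(sorted(zip(a, perm)))
        if key not in seen_keys:
            seen_keys.add(key)
            kept.append("".join(perm))
    return kept


print(unique_b_permutations("xyy", "kkd"))
print(unique_b_permutations("xyy", "khd"))
print(unique_b_permutations("xxxx", "khhd"))
```

The representatives it keeps may differ from the ones listed in the examples (any member of an equivalence class is as good as another), but the number of results matches: 2, 3 and 1.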
To generate k-combinations of possibly repeated elements (multiset), the following could be useful: A Gray Code for Combinations of a Multiset (1995).
For a recursive solution you can try the following:
Count the number of times each character appears. Say they are x1 x2 ... xm, corresponding to m distinct characters.
Then you need to find all possible tuples (y1 y2 ... ym) such that
0 <= yi <= xi
and Sum yi = k.
Here yi is the number of times character i appears.
The idea is, fix the number of times char 1 appears (y1). Then recursively generate all combinations of k-y1 from the remaining.
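pseudocode:
// IsEmpty, Min, Prepend and List.Of are assumed helpers.
List Generate (int [] x /* array index starting at 1 */,
               int k /* size of set */) {
    list = List.Empty;
    if (Sum(x) < k) return list;   // not enough characters left
    // Base case: no characters left (so k == 0); the only choice is the empty tuple.
    if (x.IsEmpty()) return List.Of(EmptyTuple);
    // Remove first element; the rest is the same for every i.
    remaining = x.Remove(1);
    for (int i = 0; i <= Min(x[1], k); i++) {
        // Use character 1 exactly i times and generate subsets of size k-i from the rest.
        list_i = Generate(remaining, k-i);
        // Prepend i to each tuple in list_i; an empty list_i means this i is infeasible.
        list = list + Prepend(i, list_i);
    }
    return list;
}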
PRIOR TO EDITS:
If I understood it correctly, you need to look at string a, see the symbols that appear exactly once. Say there are k such symbols. Then you need to generate all possible permutations of b, which contain k elements and map to those symbols at the corresponding positions. The rest you can ignore/fill in as you see fit.
I remember posting C# code for that here: How to find permutation of k in a given length?
I am assuming xxyy will give only 1 unique string and the ones that appear exactly once are the 'distinguishing' points.
Eg in case of a=xyy, b=add
distinguishing point is x
So you select permutations of 'add' of length 1. That gives you a and d.
Thus add and dad (or dda) are the ones you need.
For a=xyyz b=good
distinguishing points are x and z
So you generate permutations of b of length 2 giving
go
og
oo
od
do
gd
dg
giving you 7 unique permutations.
Does that help? Is my understanding correct?
Ok, I'm sorry I never was able to clearly explain the problem, but here is a solution.
We need two functions, combinations and runvector(v). combinations(s,k) generates the unique combinations of length k of a multiset s. For s='xxyy' and k=2 these would be ['xx','xy','yy']. runvector(v) transforms a multiset represented as a sorted vector into a simpler structure, the run vector: runvector('cddeee')=[1,2,3].
To solve the problem, we will use recursive generators. We run through all the combinations that fit in box 1 and then recurse on the rest of the boxes, banning the values we have already chosen. To accomplish the banning, combinations will maintain a bit array across calls.
In python the approach looks like this:
def fillrest(banned,out,rv,b,i):
    if i == len(rv):
        yield None
        return
    for comb in combinations(b,rv[i],banned):
        out[i] = comb
        for rest in fillrest(banned,out,rv,b,i+1):
            yield None

def balls(a,b):
    rv = runvector(a)
    banned = [False for _ in b]
    out = [None for _ in rv]
    # fillrest fills out in place; copy it for each complete assignment
    for _ in fillrest(banned,out,rv,b,0):
        yield out[:]
>>> print(list(balls('abbccc','xyyzzz')))
[['x', 'yy', 'zzz'],
['x', 'yz', 'yzz'],
['x', 'zz', 'yyz'],
['y', 'xy', 'zzz'],
['y', 'xz', 'yzz'],
['y', 'yz', 'xzz'],
['y', 'zz', 'xyz'],
['z', 'xy', 'yzz'],
['z', 'xz', 'yyz'],
['z', 'yy', 'xzz'],
['z', 'yz', 'xyz'],
['z', 'zz', 'xyy']]
The output is in 'box' format, but can easily be merged back into simple strings: 'xyyzzz', 'xyzyzz'...
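Since combinations and runvector are not shown above, here is a self-contained Python 3 sketch of the same idea. It is an assumption about how they might look, not the original code: instead of a banned bit array it tracks the unused balls in a collections.Counter, and the helper that plays the role of combinations is called multiset_combinations here.

```python
from collections import Counter
from itertools import groupby


def runvector(v):
    """Run lengths of a sorted sequence: 'cddeee' -> [1, 2, 3]."""
    return [len(list(group)) for _, group in groupby(v)]


def multiset_combinations(counts, k):
    """Yield each distinct size-k selection from the multiset counts (a Counter)."""
    symbols = sorted(ch for ch, c in counts.items() if c > 0)

    def pick(pos, remaining):
        if remaining == 0:
            yield ""
            return
        if pos == len(symbols):
            return  # ran out of symbols before filling the box
        ch = symbols[pos]
        for take in range(min(counts[ch], remaining), -1, -1):
            for rest in pick(pos + 1, remaining - take):
                yield ch * take + rest

    return pick(0, k)


def balls(a, b):
    """Yield every distinct way to distribute the balls b over the boxes of a."""
    rv = runvector(sorted(a))
    out = [None] * len(rv)

    def fill(unused, i):
        if i == len(rv):
            yield out[:]
            return
        for comb in multiset_combinations(unused, rv[i]):
            out[i] = comb
            # Counter subtraction removes the balls just placed in box i.
            yield from fill(unused - Counter(comb), i + 1)

    return fill(Counter(b), 0)


result = list(balls('abbccc', 'xyyzzz'))
print(len(result))
print(result[0], result[-1])
```

For 'abbccc' and 'xyyzzz' this produces the same 12 assignments as the listing above, starting with ['x', 'yy', 'zzz'] and ending with ['z', 'zz', 'xyy'].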

Resources