Print array length for each element of an array - data-structures

Given a string array of variable length, print the length of each element in the array.
For example, given:
string[] ex = {"abc", "adf", "df", "ergd", "adfdfd"};
The output should be:
2 3 4 6
One possibility I'm considering is to use a linked list to save each string length, and sort while inserting and finally display the results.
Any other suggestions for efficient solutions to this problem?

Whenever you want to maintain a collection of distinct things (i.e., filter out duplicates), you probably want a set.
There are many different data structures for storing sets. Some of these, like search trees, will also "sort" the values for you. You could try using one of the many forms of binary search trees.
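To make the binary-search-tree suggestion concrete, here is a minimal sketch in Python. The names Node, insert, in_order and distinct_sorted_lengths are hypothetical, and it assumes the goal is the sorted distinct lengths shown in the question's expected output. The tree is unbalanced, so for large or adversarial input a balanced tree (or a library sorted set) would be the safer choice.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Insert key into the (unbalanced) tree; duplicates are ignored, like a set."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key == node.key:
            return root  # already present: keep only one copy
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def in_order(node: Optional[Node]) -> Iterator[int]:
    """Yield keys in ascending order: left subtree, node, right subtree."""
    if node is not None:
        yield from in_order(node.left)
        yield node.key
        yield from in_order(node.right)


def distinct_sorted_lengths(strings: list[str]) -> list[int]:
    root: Optional[Node] = None
    for s in strings:
        root = insert(root, len(s))
    return list(in_order(root))


if __name__ == "__main__":
    ex = ["abc", "adf", "df", "ergd", "adfdfd"]
    print(*distinct_sorted_lengths(ex))  # prints: 2 3 4 6
```

Because the in-order traversal of a search tree visits keys in ascending order, the "sorting" falls out of the data structure for free.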

What you are doing now (or the given answer) is called insertion sort. It compares the length of the string to insert with the lengths of the strings already inserted. Then, when printing, the length of the string at the current position is compared with the lengths of the strings before and after it; if either has the same length, it is not printed!
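The insertion idea can be sketched in Python as below. This is a minimal sketch with hypothetical helper names, assuming the desired output is the sorted distinct lengths (2 3 4 6 in the question): instead of the printing-time neighbour check described above, it keeps one copy of each length by skipping a length that is already stored.

```python
def insert_sorted_unique(sorted_lengths: list[int], length: int) -> None:
    """Insertion step: walk past smaller lengths, insert unless already present."""
    i = 0
    while i < len(sorted_lengths) and sorted_lengths[i] < length:
        i += 1
    if i == len(sorted_lengths) or sorted_lengths[i] != length:
        sorted_lengths.insert(i, length)


def lengths_by_insertion(strings: list[str]) -> list[int]:
    result: list[int] = []
    for s in strings:
        insert_sorted_unique(result, len(s))  # list stays sorted after every insert
    return result


if __name__ == "__main__":
    ex = ["abc", "adf", "df", "ergd", "adfdfd"]
    print(*lengths_by_insertion(ex))  # prints: 2 3 4 6
```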
Another approach is bubble sort: it compares two neighbouring strings at a time, swaps them if they are in the wrong order, then moves on to the next pair...
The printing is the most important part of your program; which sorting algorithm you use doesn't really matter.
Here's an algorithm for the bubble sort and the printing process. It's VB, so just convert it...
Dim YourString(3) As String 'Four elements, indexes 0 to 3
YourString(0) = "12345" 'Will not be printed
YourString(1) = "12345" 'Will not be printed
YourString(2) = "123" 'Will be printed
YourString(3) = "1234" 'Will be printed
Dim RoundLimit As Integer = YourString.Length - 2
'Outer loop: Length - 1 passes over the array...
For CycleCounter As Integer = 0 To RoundLimit
'Inner loop to compare neighbouring strings; each pass moves the longest remaining string to the end
For CompareCounter As Integer = 0 To RoundLimit - CycleCounter
'Compare lengths... If the first is greater, swap! Note: this is ascending
If YourString(CompareCounter).Length > YourString(CompareCounter + 1).Length Then
'Swapping process...
Dim TempString As String = YourString(CompareCounter)
YourString(CompareCounter) = YourString(CompareCounter + 1)
YourString(CompareCounter + 1) = TempString
End If
Next
Next
'Cycles = Array length - 1, so we have 3 cycles here
'First Cycle!!!
'"12345","12345","123","1234" Compare 1: index 0 and 1 no changes
'"12345","123","12345","1234" Compare 2: index 1 and 2 changed
'"12345","123","1234","12345" Compare 3: index 2 and 3 changed
'Second Cycle!!!
'"123","12345","1234","12345" Compare 1: index 0 and 1 changed
'"123","1234","12345","12345" Compare 2: index 1 and 2 changed
'Third Cycle!!!
'"123","1234","12345","12345" Compare 1: index 0 and 1 no changes
'No more cycles!
'Now print it! Or use a message box...
For CycleCounter As Integer = 0 To YourString.Length - 1
'If the length is equal to that of the next or the preceding string, do not print...
If CycleCounter > 0 Then 'Check if the previous index exists
If YourString(CycleCounter).Length = YourString(CycleCounter - 1).Length Then
Continue For 'The length is not unique, skip to the next iteration...
End If
End If
If CycleCounter < YourString.Length - 1 Then 'Check if the next index exists
If YourString(CycleCounter).Length = YourString(CycleCounter + 1).Length Then
Continue For 'The length is not unique, skip to the next iteration...
End If
End If
'All tests passed, the length is unique, show a dialog with it!
MsgBox(YourString(CycleCounter).Length)
Next

The question as stated doesn't say anything about sorting or removing duplicates from the results. It is only the given output that implies the sorting and duplicate removal. It doesn't say anything about optimisation for speed or space or writing for maintainability.
So there really isn't enough information for a "best" solution.
If you want a solution that will work in most languages, you should probably stick with an array. Put the lengths in a new array, sort it, then print them in a loop that remembers the last value to skip duplicates. I wouldn't want to use a language that couldn't cope with that.
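A minimal Python sketch of that language-neutral recipe (lengths into a new array, sort it, then emit values while remembering the last one); the function name unique_sorted_lengths is hypothetical.

```python
def unique_sorted_lengths(strings: list[str]) -> list[int]:
    lengths = sorted(len(s) for s in strings)  # new array of lengths, sorted
    result: list[int] = []
    last = None  # the last value emitted, used to skip duplicates
    for n in lengths:
        if n != last:
            result.append(n)
            last = n
    return result


if __name__ == "__main__":
    ex = ["abc", "adf", "df", "ergd", "adfdfd"]
    print(" ".join(str(n) for n in unique_sorted_lengths(ex)))  # prints: 2 3 4 6
```

Dropping the `if n != last` test is the one-line change that keeps duplicates.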
If a language is specified, you might be able to take advantage of set or associative array data structures to handle the duplicates and/or sorting automatically. E.g., in Java you could pick a collection class that automatically ignores duplicates and sorts, and you could structure your code so that a one-line change to use a different class would let you keep duplicates, or not sort. If you are using C#, you could probably write the whole thing as a one-line LINQ statement...

Here is a C++ solution:
#include <set>
#include <vector>
#include <string>
#include <iostream>
using namespace std;
int main()
{
string strarr[] = {"abc", "adf", "df", "ergd", "adfsgf"};
vector< string > vstr(strarr, strarr + 5);
set< size_t > s;
for (size_t i = 0; i < vstr.size(); i++)
{
s.insert( vstr[i].size() );
}
for (set<size_t>::iterator ii = s.begin(); ii != s.end(); ii++)
cout << *ii << " ";
cout << endl;
return 0;
}
Output:
$ g++ -o set-str set-str.cpp
$ ./set-str
2 3 4 6
A set is used because (quoting from a C++ reference for std::set):
Sets are a kind of associative container that stores unique elements,
and in which the elements themselves are the keys.
Associative containers are containers especially designed to be
efficient accessing its elements by their key (unlike sequence
containers, which are more efficient accessing elements by their
relative or absolute position).
Internally, the elements in a set are always sorted from lower to
higher following a specific strict weak ordering criterion set on
container construction.
Sets are typically implemented as binary search trees.
For details on std::vector and std::string, see their reference pages as well.

Depending on the language, the easiest way might be to iterate through the array using a for loop
for (i=0;i<array.length;i++){
print array[i].length;
}
Do you need to print them in order?

Related

Make a macro to sort a row using a custom list in LibreOffice Calc

I need to sort a column containing cells with the following format: "TITLE text". I know the list of possible titles, but not the texts, so what I would like to do is sort the titles in a custom order (for example: PLA, ARG, FHI, BRT) that is not alphabetical. The problem is that the title and the text are in the same cell.
So, for example, here is a screenshot of data I might want to work on: (screenshot not available)
How can I sort this if the cells don't exactly match the list members?
And, if possible, how can I do that with a macro rather than manually?
It's not very difficult. I will try to explain how this is done.
First of all, we need to figure out a way to transfer the range of cells to be sorted to the macro. There are different ways - write the address directly in the macro code, pass it as a parameter to the UDF, get it from the current selection. We use the third method - it is not the easiest to code, but it will work with any data sets.
The main difficulty when using the current selection is that the selection can be one single cell (nothing to sort), a range of cells (and may be several columns - how to sort this?) or several ranges of cells (this is if you hold down the CTRL key and select several unconnected ranges).
A good macro should handle each of these situations. But here we are not writing a good macro; we are getting acquainted with the principle of solving such problems (since Stack Overflow is a resource for programmers, the answers here help you write code yourself rather than hand out ready-made programs for free). Therefore, if a single cell or multiple ranges are selected, we will just stop the execution of the macro. Likewise, if there is more than one column in the selected range, we will not do anything either.
Also, in case a full column is selected, we restrict the range to be sorted to the used area. This will sort the real data, but not the million empty cells.
The code that does all this looks like this:
Sub SortByTitles()
Dim oCurrentSelection As Variant
Dim oSortRange As Variant
Dim oSheet As Variant
Dim oCursor As Variant
Dim oDataArray As Variant
Dim sList As String
sList = "PLA,ARG,FHI,BRT"
oCurrentSelection = ThisComponent.getCurrentSelection()
Rem Is it one single cell?
If oCurrentSelection.supportsService("com.sun.star.sheet.SheetCell") Then Exit Sub
Rem Is it several ranges of cells?
If oCurrentSelection.supportsService("com.sun.star.sheet.SheetCellRanges") Then Exit Sub
Rem Is this one range of cells? (It can be a graphic item or a control.
Rem Or it may not even be a Calc spreadsheet at all)
If Not oCurrentSelection.supportsService("com.sun.star.sheet.SheetCellRange") Then Exit Sub
Rem Is there only one column selected?
If oCurrentSelection.getColumns().getCount() <> 1 Then Exit Sub
Rem Is the current selection outside of the used area?
oSheet = oCurrentSelection.getSpreadsheet()
oCursor = oSheet.createCursor()
oCursor.gotoEndOfUsedArea(True)
oSortRange = oCursor.queryIntersection(oCurrentSelection.getRangeAddress())
If oSortRange.getCount() <> 1 Then Exit Sub
Rem Reduce oSortRange to a single range (not a collection of ranges)
oSortRange = oSortRange.getByIndex(0)
Rem Get data from oSortRange
oDataArray = oSortRange.getDataArray()
Rem Paste sorted data to the same place:
oSortRange.setDataArray(getSorted(oDataArray, Split(sList,",")))
End Sub
The getSorted() function, which is called in the last line of the procedure, must take two arrays as parameters (the values of the cells to be sorted and the sort list) and return one array of sorted values.
One aspect of working with data from cell ranges should be mentioned here. In Excel, reading data from a range gives you a two-dimensional array, but in OpenOffice/LibreOffice you get a one-dimensional "array of arrays", each element of which is a one-dimensional array of the cell values of one row. Writing to a range uses exactly the same structure, an "array of arrays". The first argument passed to getSorted(), oDataArray, is just such an array of arrays, and this has to be taken into account when processing the data.
What will the getSorted() function do? It will build a "tree" of the oDataArray values, sorted by title. In fact, this is not a tree: it is an ascending sorted array of all titles, each paired with a sorted array of all the values that have that title. Then the function will take from the tree, in list order, the titles that appear in the list and remove them from the tree. If any elements still remain in the tree after that, they are output at the very end.
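For readers who want to check this grouping logic outside Calc, here is a rough Python sketch. It is not a translation of the Basic code: it assumes the cell values are plain strings, uses Python's ordinal string comparison (StarBasic's comparison rules may differ, for example in case handling), and the name sort_by_titles and the sample data are hypothetical.

```python
from collections import defaultdict


def sort_by_titles(values: list[str], order: list[str]) -> list[str]:
    # Group the values by title: the text before the first space,
    # or the whole value when it contains no space.
    groups: dict[str, list[str]] = defaultdict(list)
    for v in values:
        groups[v.split(" ", 1)[0]].append(v)

    result: list[str] = []
    # Titles from the custom list come first, in list order.
    for title in order:
        if title in groups:
            result.extend(sorted(groups.pop(title)))
    # Titles that are not in the list follow, in ascending order.
    for title in sorted(groups):
        result.extend(sorted(groups[title]))
    return result


if __name__ == "__main__":
    sample = ["ARG second", "PLA zeta", "XYZ other", "PLA alpha", "BRT last", "ABC extra"]  # hypothetical data
    for line in sort_by_titles(sample, "PLA,ARG,FHI,BRT".split(",")):
        print(line)
```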
The function will accumulate the result in a separate array of the same size as the original one. In other words, the algorithm will use three times more memory than the original sorted range - source data, a tree and result array.
You can try to save resources and write the results directly to the original array. But I strongly advise against doing this.
The fact is that an array element may hold not a value but a reference to a value, and if you code carelessly, you will end up not with a large sorted array but with a large array of the same value (the last cell).
I deliberately do not comment on all the following code - if you can read and understand this without comment, then you will understand how actions are programmed to process data from ranges:
Function getSorted(aData As Variant, aList As Variant) As Variant
Dim aRes As Variant
Dim i As Long, pos As Long, j As Long, k As Long, m As Long, uB As Long
Dim aTemp As Variant
aTemp = Array()
ReDim aRes(LBound(aData) To UBound(aData))
For i = LBound(aData) To UBound(aData)
pos = InStr(aData(i)(0), " ")
If pos > 0 Then
AddToArray(Left(aData(i)(0),pos-1), aData(i)(0), aTemp)
Else
AddToArray(aData(i)(0), aData(i)(0), aTemp)
EndIf
Next i
m = LBound(aData) - 1
For i = LBound(aList) To UBound(aList)
k = getIndex(aList(i), aTemp)
If k > -1 Then
uB = UBound(aTemp) - 1
For j = LBound(aTemp(k)(1)) To UBound(aTemp(k)(1))
m = m + 1
aRes(m) = Array(aTemp(k)(1)(j))
Next j
For j = k To uB
aTemp(j) = aTemp(j+1)
Next j
ReDim Preserve aTemp(uB)
EndIf
Next i
For k = LBound(aTemp) To UBound(aTemp)
For j = LBound(aTemp(k)(1)) To UBound(aTemp(k)(1))
m = m + 1
aRes(m) = Array(aTemp(k)(1)(j))
Next j
Next k
getSorted = aRes
End Function
To build the sorted tree, two subroutines are used: AddToArray() and InsertToArray(). They are very similar: the first part is a normal binary search, and the remaining 10-12 lines handle the cases where the element is not found and belongs at the end of the array, where it is found, and where it is not found and belongs in the middle of the array:
Sub AddToArray(key As Variant, value As Variant, aData As Variant)
Dim l&, r&, m&, N&, i&
l=LBound(aData)
r=UBound(aData)+1
N=r
While (l<r)
m=l+Int((r-l)/2)
If aData(m)(0)<key Then
l=m+1
Else
r=m
EndIf
Wend
If r=N Then
ReDim Preserve aData(0 To N)
aData(N) = Array(key, Array(value))
ElseIf aData(r)(0)=key Then
InsertToArray(value, aData(r)(1))
Else
ReDim Preserve aData(0 To N)
For i = N-1 To r Step -1
aData(i+1)=aData(i)
Next i
aData(r) = Array(key, Array(value))
EndIf
End Sub
Sub InsertToArray(key As Variant, aData As Variant)
Dim l&, r&, m&, N&, i&
l=LBound(aData)
r=UBound(aData)+1
N=r
While (l<r)
m=l+Int((r-l)/2)
If aData(m)<key Then
l=m+1
Else
r=m
EndIf
Wend
If r=N Then
ReDim Preserve aData(0 To N)
aData(N) = key
Else
ReDim Preserve aData(0 To N)
For i = N-1 To r Step -1
aData(i+1)=aData(i)
Next i
aData(r) = key
EndIf
End Sub
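The binary search shared by these routines is a lower-bound search: it finds the first position whose element is not less than the key. A Python sketch of that search and of the InsertToArray() behaviour follows; the names are hypothetical, and in real Python code the standard bisect.bisect_left and bisect.insort functions do the same job.

```python
def lower_bound(keys: list, key) -> int:
    """Return the first index i with keys[i] >= key, or len(keys) if there is none."""
    lo, hi = 0, len(keys)  # like l = LBound and r = UBound + 1 in the Basic code
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return lo


def insert_sorted(values: list, value) -> None:
    """Keep values sorted while inserting (duplicates allowed, like InsertToArray)."""
    pos = lower_bound(values, value)
    values.insert(pos, value)  # shifts the tail right, like the For ... Step -1 loop


if __name__ == "__main__":
    vals: list = []
    for v in [5, 1, 3, 3]:
        insert_sorted(vals, v)
    print(vals)  # prints: [1, 3, 3, 5]
```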
The getIndex() function uses the same binary search. It will return the index of the element in the array if it can find it, or -1 otherwise:
Function getIndex(key As Variant, aData As Variant) As Long
Dim l&, r&, m&, N&
l=LBound(aData)
r=UBound(aData)+1
N=r
While (l<r)
m=l+Int((r-l)/2)
If aData(m)(0)<key Then
l=m+1
Else
r=m
EndIf
Wend
If r=N Then
getIndex = -1
ElseIf aData(r)(0)=key Then
getIndex = r
Else
getIndex = -1
EndIf
End Function
And that's all that is needed to solve the task:
Demo file with code - SortByTitle.ods

How to populate an array with incrementally increasing values Ruby

I'm attempting to solve http://projecteuler.net/problem=1.
I want to create a method which takes in an integer and then creates an array of all the integers preceding it and the integer itself as values within the array.
Below is what I have so far. Code doesn't work.
def make_array(num)
numbers = Array.new num
count = 1
numbers.each do |number|
numbers << number = count
count = count + 1
end
return numbers
end
make_array(10)
(1..num).to_a is all you need to do in Ruby.
1..num will create a Range object that starts at 1 and ends at whatever value num is. Range objects have a to_a method that blows them up into real Arrays by enumerating each element within the range.
For most purposes, you won't actually need the Array - Range will work fine. That includes iteration (which is what I assume you want, given the problem you're working on).
That said, knowing how to create such an Array "by hand" is a valuable learning experience, so you might want to keep working on it a bit. Hint: you want to start with an empty array ([]) instead of Array.new num, then iterate num.times and add numbers to the Array. If you start with an Array of size num and then push num elements onto it, you'll end up with twice num elements. And if, as in your case, you're adding elements while you're iterating over the array, the loop never exits, because for each element you process, you add another one. It's like chasing a metal ball with the repelling side of a magnet.
To answer the Euler Question:
(1 ... 1000).to_a.select{|x| x%3==0 || x%5==0}.reduce(:+) # => 233168
Sometimes a one-liner is more readable than more detailed code, I think.
Assuming you are learning Ruby through the Project Euler examples, I'll explain what the line does:
(1 ... 1000).to_a
will create an array with the numbers 1 to 999, since the Euler question wants numbers below 1000. Using three dots in a Range creates it without the end value itself.
.select{|x| x%3==0 || x%5==0}
chooses only the elements that are divisible by 3 or 5, i.e. the multiples of 3 or 5. The other values are discarded. The result of this operation is a new Array containing only multiples of 3 or 5.
.reduce(:+)
Finally, this operation sums up all the numbers in the array, reducing it to a single number: the sum you need for the solution.
What I want to illustrate: many methods you would otherwise write by hand every day are already built into Ruby, since it is a language from programmers for programmers. Be pragmatic ;)

simple method to keep last n elements in a queue for vb6?

I am trying to keep the last n elements from a changing list of x elements (where x >> n)
I found out about the deque method, with a fixed length, in other programming languages. I was wondering if there is something similar for VB6
Create a class that encapsulates (wraps) a Collection.
Add items at the end (without a key), and retrieve & remove them from the beginning (index 1). As part of adding, check your MaxDepth property setting (or hard-code it if you like), and if Collection.Count exceeds it, remove the extra item from the beginning.
Or just hard code it all inline if a Class is a stumper for you.
This is pretty routine.
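A sketch of the Collection-wrapper idea, written in Python for illustration rather than VB6 (the class name LastN and its methods are hypothetical). It also shows collections.deque with maxlen, the fixed-length deque the question mentions, which drops the item at the opposite end when a full deque is appended to.

```python
from collections import deque


class LastN:
    """Keep only the most recent max_depth items, dropping the oldest first."""

    def __init__(self, max_depth: int) -> None:
        if max_depth < 1:
            raise ValueError("max_depth must be at least 1")
        self.max_depth = max_depth
        self._items: list = []

    def add(self, item) -> None:
        self._items.append(item)  # add at the end
        if len(self._items) > self.max_depth:
            self._items.pop(0)  # remove the oldest item from the beginning

    def items(self) -> list:
        return list(self._items)


if __name__ == "__main__":
    q = LastN(3)
    d = deque(maxlen=3)  # fixed-length deque: appending when full drops the leftmost item
    for x in range(10):
        q.add(x)
        d.append(x)
    print(q.items(), list(d))  # prints: [7, 8, 9] [7, 8, 9]
```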
The only thing I can think of is possibly looping through the last 5 values of the dynamic array using something like:
For i = UBound(arr) - 4 To UBound(arr) 'the last 5 elements of the dynamic array arr
'Code to store or do the desired with these values
Next i
Sorry I don't have a definite answer, but hopefully that might help.
Here's my simplest solution to this:
For i = n - 1 To 1 Step -1
arrayX(i) = arrayX(i - 1)
Next i
arrayX(0) = latestX
Where:
arrayX = array of values
n = # of array elements
latestX = latest value of interest (assumes entire code block is also within another loop)

Is there an easy way to have a "mode" function on an array of singles in vb6?

I need to run "mode" (which value occurs most frequently) on an array of singles in vb6. Is there a quick way to do this on large arrays?
Have a look online for a decent implementation of a sort algorithm for VB6 (I can't believe it doesn't have one built in!), sort the array, and then go through it counting the occurrences (which will be straightforward as you've all the same items together in the array) - keep a track of the most frequently occurring item on your way through and you're done. This should be O(n ln(n)) - that is, fast enough - if you've used a decent sort algorithm (quicksort or similar).
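The sort-and-count approach, sketched in Python for illustration (the name mode_by_sorting is hypothetical). Ties go to the smallest value, which is an arbitrary choice.

```python
def mode_by_sorting(values: list[float]) -> float:
    if not values:
        raise ValueError("mode of an empty sequence")
    ordered = sorted(values)  # O(n log n); equal values end up next to each other
    best_value, best_count = ordered[0], 0
    run_value, run_count = ordered[0], 0
    for v in ordered:
        if v == run_value:
            run_count += 1  # same value as the current run
        else:
            run_value, run_count = v, 1  # a new run starts
        if run_count > best_count:
            best_value, best_count = run_value, run_count
    return best_value


if __name__ == "__main__":
    print(mode_by_sorting([3.0, 1.0, 3.0, 2.0]))  # prints: 3.0
```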
You could use a hash table. Hash all of the elements of your array (which is O(n)). You'll need a back-end data structure to hold the unique values that each hash bin contains and the number of occurrences (some sort of associative memory similar to the C++ std::map). As long as you can guarantee that there will be no more than a constant number m of collisions (for dissimilar hash input values) in any given bin, this is O(m log m) per element, but since m is constant, this is really O(1). This assumption may not be reasonable, but the key is to get a good enough spread for your input values.
To pull out the mode, examine all of the elements in the hash table, which will be the values that occur in your original input array and the number of times they occur. Find the value with the largest number of occurrences (again O(n)). Total complexity is O(n) if you can find a suitable hash function. Worst-case performance will be O(n log n) if the hash function doesn't give you good collision performance.
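A Python sketch of the hash-table approach, using collections.Counter (a dict-based counter) in place of a hand-written hash table; the name mode_by_hashing is hypothetical. It compares values exactly rather than rounding them first, so you may want to round the singles before counting, as the VB6 answer does.

```python
from collections import Counter


def mode_by_hashing(values: list[float]) -> float:
    if not values:
        raise ValueError("mode of an empty sequence")
    counts = Counter(values)  # one pass; each distinct value is a hash-table key
    value, _count = counts.most_common(1)[0]  # entry with the largest count
    return value


if __name__ == "__main__":
    print(mode_by_hashing([1.5, 2.0, 1.5]))  # prints: 1.5
```

On ties, Counter.most_common returns the value encountered first.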
On another note, .NET provides a large runtime library that might make this easier. If it's feasible, you might want to consider using a newer version of VB.
Included a reference to Microsoft Scripting Runtime, and used a Dictionary object to keep tally of frequency, then looked for index highest frequency and the corresponding key is the mode. Not the quickest/most elegant solution, but I just needed something up fast that worked.
Function fnModeSingle(ByRef pValues() As Single) As Single
Dim dict As Dictionary
Set dict = New Dictionary
dict.CompareMode = BinaryCompare
Dim i As Long
Dim pCurVal As Single
For i = LBound(pValues) To UBound(pValues)
'limit the values that have to be analyzed to desired precision'
pCurVal = Round(pValues(i), 2)
If (pCurVal > 0) Then
'this will create a dictionary entry if it doesn't exist
dict.Item(pCurVal) = dict.Item(pCurVal) + 1
End If
Next
'find index of first largest frequency'
Dim KeyArray, itemArray
KeyArray = dict.Keys
itemArray = dict.Items
Dim pCount As Long
pCount = 0
Dim pModeIdx As Long
'find index of mode'
For i = 0 To UBound(itemArray)
If (itemArray(i) > pCount) Then
pCount = itemArray(i)
pModeIdx = i
End If
Next
'get value corresponding to selected mode index'
fnModeSingle = KeyArray(pModeIdx)
Set dict = Nothing
End Function

How to find all brotherhood strings?

I have a string, and another text file which contains a list of strings.
We call 2 strings "brotherhood strings" when they're exactly the same after sorting alphabetically.
For example, "abc" and "cba" will be sorted into "abc" and "abc", so the original two are brotherhood. But "abc" and "aaa" are not.
So, is there an efficient way to pick out all brotherhood strings from the text file, according to the one string provided?
For example, we have "abc" and a text file which writes like this:
abc
cba
acb
lalala
then "abc", "cba", "acb" are the answers.
Of course, "sort & compare" is a nice try, but by "efficient" I mean a way to determine whether a candidate string is a brother of the original one after a single pass over it.
This is the most efficient way, I think. After all, you cannot determine the answer without reading the candidate strings at all. Sorting, most of the time, needs more than one pass over the candidate string. So a hash table might be a good solution, but I've no idea what hash function to choose.
Most efficient algorithm I can think of:
Set up a hash table for the original string. Let each letter be the key, and the number of times the letter appears in the string be the value. Call this hash table inputStringTable
Parse the input string, and each time you see a character, increment the value of the hash entry by one
for each string in the file
create a new hash table. Call this one brotherStringTable.
for each character in the string, add one to its count in brotherStringTable. If brotherStringTable[character] > inputStringTable[character], this string is not a brother (one character shows up too many times)
once string is parsed, compare each inputStringTable value with the corresponding brotherStringTable value. If one is different, then this string is not a brother string. If all match, then the string is a brother string.
This will be O(nk), where n is the length of the input string (any strings longer than the input string can be discarded immediately) and k is the number of strings in the file. Any sort based algorithm will be O(nk lg n), so in certain cases, this algorithm is faster than a sort based algorithm.
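A minimal Python sketch of the two-table counting algorithm above, using collections.Counter as the hash table (the name find_brothers is hypothetical). The length check comes first, as the complexity note suggests.

```python
from collections import Counter


def find_brothers(original: str, candidates: list[str]) -> list[str]:
    target = Counter(original)  # inputStringTable
    result = []
    for word in candidates:
        if len(word) != len(original):
            continue  # a different length can never be a brother
        seen: Counter = Counter()  # brotherStringTable
        ok = True
        for ch in word:
            seen[ch] += 1
            if seen[ch] > target[ch]:  # missing keys count as 0 in a Counter
                ok = False  # this character shows up too many times
                break
        if ok and seen == target:
            result.append(word)
    return result


if __name__ == "__main__":
    print(find_brothers("abc", ["abc", "cba", "acb", "lalala"]))  # prints: ['abc', 'cba', 'acb']
```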
Sorting each string, then comparing it, works out to something like O(k log k + N*S log S), where N is the number of strings, k is the search key length, and S is the average string length.
It seems like counting the occurrences of each character might be a possible way to go here (assuming the strings are of a reasonable length). That gives you O(k+N*S). Whether that's actually faster than the sort & compare is obviously going to depend on the values of k, N, and S.
I think that in practice, the cache-thrashing effect of re-writing all the strings in the sorting case will kill performance, compared to any algorithm that doesn't modify the strings...
Iterate, sort, compare. That shouldn't be too hard, right?
Let's assume your alphabet is from 'a' to 'z' and you can index an array based on the characters. Then, for each element in a 26 element array, you store the number of times that letter appears in the input string.
Then you go through the set of strings you're searching, and iterate through the characters in each string. You can decrement the count associated with each letter in (a copy of) the array of counts from the key string.
If you finish your loop through the candidate string without having to stop, and you have seen the same number of characters as there were in the input string, it's a match.
This allows you to skip the sorts in favor of a constant-time array copy and a single iteration through each string.
EDIT: Upon further reflection, this is effectively sorting the characters of the first string using a bucket sort.
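A Python sketch of the 26-entry count array, assuming (as the answer does) that every character is a lowercase letter a to z; letter_counts and is_brother are hypothetical names.

```python
def letter_counts(key: str) -> list[int]:
    """Count each letter a..z of the key string."""
    counts = [0] * 26
    for ch in key:
        i = ord(ch) - ord("a")
        if not 0 <= i < 26:
            raise ValueError(f"{ch!r} is outside the assumed a-z alphabet")
        counts[i] += 1
    return counts


def is_brother(key_counts: list[int], key_len: int, candidate: str) -> bool:
    if len(candidate) != key_len:
        return False  # must see the same number of characters as the key
    counts = key_counts.copy()  # constant-size copy of the key's counts
    for ch in candidate:
        i = ord(ch) - ord("a")
        if not 0 <= i < 26:
            return False  # character outside the assumed alphabet
        counts[i] -= 1
        if counts[i] < 0:
            return False  # letter appears more often than in the key
    return True


if __name__ == "__main__":
    kc = letter_counts("abc")
    print([w for w in ["abc", "cba", "acb", "lalala"] if is_brother(kc, 3, w)])
```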
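I think what will help you is a test of whether two strings are anagrams. Here is how you can do it. I am assuming for now that the strings can contain any of the 256 possible byte values.
#include <stdbool.h>
#include <string.h>

#define NUM_ALPHABETS 256

bool isAnagram(const char *src, const char *dest) {
    int alphabets[NUM_ALPHABETS] = {0}; /* one counter per byte value */
    size_t len1 = strlen(src);
    size_t len2 = strlen(dest);
    size_t i;
    if (len1 != len2)
        return false;
    /* count the characters of src; the cast avoids negative indexes for signed char */
    for (i = 0; i < len1; i++)
        alphabets[(unsigned char)src[i]]++;
    /* consume the counts with dest; a negative count means dest has an extra character */
    for (i = 0; i < len2; i++) {
        if (--alphabets[(unsigned char)dest[i]] < 0)
            return false;
    }
    return true;
}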
This will run in O(mn) if you have 'm' strings in the file of average length 'n'
Sort your query string
Iterate through the Collection, doing the following:
Sort current string
Compare against query string
If it matches, this is a "brotherhood" match, save it/index/whatever you want
That's pretty much it. If you're doing lots of searching, presorting all of your collection will make the routine a lot faster (at the cost of extra memory). If you are doing this even more, you could pre-sort and save a dictionary (or some hashed collection) based off the first character, etc, to find matches much faster.
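The "presort the collection" variant can be sketched in Python by keying a dictionary on each word's sorted characters, so each later query costs one sort of the query plus one lookup. This is a sketch under that assumption; build_index and brothers are hypothetical names.

```python
from collections import defaultdict


def build_index(words: list[str]) -> dict[str, list[str]]:
    """Group words by their sorted characters (done once, up front)."""
    index: dict[str, list[str]] = defaultdict(list)
    for w in words:
        index["".join(sorted(w))].append(w)
    return index


def brothers(index: dict[str, list[str]], query: str) -> list[str]:
    """Sort the query once and look it up; unknown keys give an empty list."""
    return index.get("".join(sorted(query)), [])


if __name__ == "__main__":
    idx = build_index(["abc", "cba", "acb", "lalala"])
    print(brothers(idx, "bca"))  # prints: ['abc', 'cba', 'acb']
```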
It's fairly obvious that each brotherhood string will have the same histogram of letters as the original. It is trivial to construct such a histogram, and fairly efficient to test whether the input string has the same histogram as the test string (you have to increment or decrement counters for twice the length of the input string).
The steps would be:
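construct a histogram of the test string (zero an array int histogram[128] and increment the position for each character in the test string)
for each input string of the same length as the test string (a string of a different length cannot match)
for each character c in the input string, test whether histogram[c] is zero. If it is, the string is a non-match: restore the histogram and go on to the next input string.
otherwise, decrement histogram[c]
to restore the histogram, traverse the input string back to its start, incrementing rather than decrementing (this is also needed after a match, so that the next input string starts from the full histogram)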
At most, it requires two increments/decrements of an array for each character in the input.
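A Python sketch of the histogram-with-restore steps, assuming 7-bit ASCII input as the 128-entry array implies. The length check matters: without it, a shorter string such as "ab" would pass against "abc". make_histogram and matches are hypothetical names.

```python
def make_histogram(test: str) -> list[int]:
    hist = [0] * 128  # one counter per 7-bit ASCII code
    for ch in test:
        if ord(ch) >= 128:
            raise ValueError(f"{ch!r} is not 7-bit ASCII")
        hist[ord(ch)] += 1
    return hist


def matches(hist: list[int], test_len: int, word: str) -> bool:
    """Check word against hist in place; hist is restored before returning."""
    if len(word) != test_len:
        return False
    for i, ch in enumerate(word):
        c = ord(ch)
        if c >= 128 or hist[c] == 0:
            for prev in word[:i]:  # non-match: undo the decrements made so far
                hist[ord(prev)] += 1
            return False
        hist[c] -= 1
    for ch in word:  # match: all counters are 0 now, so restore for the next word
        hist[ord(ch)] += 1
    return True


if __name__ == "__main__":
    h = make_histogram("abc")
    print([w for w in ["abc", "cba", "acb", "lalala", "ab", "abd"] if matches(h, 3, w)])
```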
The most efficient answer will depend on the contents of the file. Any algorithm we come up with will have complexity proportional to N (number of words in file) and L (average length of the strings) and possibly V (variety in the length of strings)
If this were a real world situation, I would start with KISS and not try to overcomplicate it. Checking the length of the target string is simple but could help avoid lots of nlogn sort operations.
target = sort_characters("target string")
count = 0
foreach (word in inputfile){
if target.len == word.len && target == sort_characters(word){
count++
}
}
I would recommend:
for each string in text file :
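compare size with "source string" (size of brotherhood strings should be equal)
compare hashes (the hash must not depend on character order, since brothers are permutations of each other; a CRC or default framework hash of the raw string would differ, so compute it from the character counts)
if they are equal, do a finer compare with the sorted strings.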
It's not the fastest algorithm but it will work for any alphabet/encoding.
Here's another method, which works if you have a relatively small set of possible "letters" in the strings, or good support for large integers. Basically consists of writing a position-independent hash function...
Assign a different prime number for each letter:
prime['a']=2;
prime['b']=3;
prime['c']=5;
Write a function that runs through a string, repeatedly multiplying the prime associated with each letter into a running product
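long long key(const char *string)
{
    long long product = 1;
    /* multiply in the prime of every character; stop at the terminating NUL */
    for (; *string; string++) {
        product *= prime[(unsigned char)*string];
    }
    return product;
}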
This function will return a unique integer for any multiset of letters, independent of the order in which they appear in the string (by unique prime factorization, as long as the product does not overflow). Once you've got the value of the "key", you can go through the list of strings to match and perform the same operation.
The time complexity of this is O(N), of course. You can even regenerate the (sorted) search string by factoring the key. The disadvantage, of course, is that the keys get large pretty quickly (and overflow a long long) if you have a large alphabet or long strings.
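The same prime-product key in Python, where integers have arbitrary precision, so the overflow problem of long long disappears (the products simply grow). The prime table covers only the lowercase letters a to z, which is an assumption; the name key mirrors the C function.

```python
# One distinct prime per lowercase letter (assumed alphabet a..z).
PRIMES = dict(zip("abcdefghijklmnopqrstuvwxyz",
                  [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
                   43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]))


def key(s: str) -> int:
    """Order-independent key: the product of the primes of all characters."""
    product = 1
    for ch in s:
        product *= PRIMES[ch]  # Python ints never overflow
    return product


if __name__ == "__main__":
    target = key("abc")  # 2 * 3 * 5 = 30
    print([w for w in ["abc", "cba", "acb", "lalala"] if key(w) == target])
```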
Here's an implementation. It creates a dict of the letters of the master, and a string version of the same, since string comparisons will be done at C speed. When creating a dict of the letters in a trial string, it checks against the master dict in order to fail at the first possible moment: if it finds a letter not in the original, or more of that letter than in the original, it will fail. You could replace the strings with integer-based hashes (as per one answer regarding base 26) if that proves quicker. Currently the hash for comparison looks like a3b1c2 for abacca.
This should work out O(N log( min(M,K) )) for N strings of length M and a reference string of length K, and requires the minimum number of lookups of the trial string.
master = "abc"
wordset = "def cba accb aepojpaohge abd bac ajghe aegage abc".split()


def dictmaster(s):
    """Count how often each character occurs in the master string."""
    charmap = {}
    for char in s:
        charmap[char] = charmap.get(char, 0) + 1
    return charmap


def dicttrial(s, mastermap):
    """Count the characters of a trial string, failing as early as possible."""
    trialmap = {}
    for char in s:
        if char not in mastermap:
            return False  # letter not in the master at all
        trialmap[char] = trialmap.get(char, 0) + 1
        if trialmap[char] > mastermap[char]:
            return False  # more incidences than in the master
    return trialmap


def dicttostring(charmap):
    """Serialise a count map, e.g. 'a3b1c2'; sorted keys make equal maps give equal strings."""
    if charmap is False:
        return False
    return "".join(char + str(charmap[char]) for char in sorted(charmap))


def testtrial(s, master, mastermap, masterhashstring):
    if len(master) != len(s):
        return False
    trialhashstring = dicttostring(dicttrial(s, mastermap))
    return trialhashstring is not False and trialhashstring == masterhashstring


mastermap = dictmaster(master)
masterhashstring = dicttostring(mastermap)
for word in wordset:
    if testtrial(word, master, mastermap, masterhashstring):
        print(word)
