how many Nested words on the length of the words within l are there? - computation-theory

Related

An algorithm reading N words and printing all anagrams

The original problem is here:
Design an O(N log N) algorithm to read in a list of words and print out all anagrams. For example, the strings "comedian" and "demoniac" are anagrams of each other. Assume there are N words and each word contains at most 20 letters. Designing an O(N^2) algorithm should not be too difficult, but getting it down to O(N log N) requires some cleverness.
I am confused, since the length of a word does not depend on N; it is bounded by the constant 20. So I thought we could multiply the running time for one word by N, which would give O(N). However, it seems I am missing something.
If they insist on hitting that O(n log n) algorithm (because I think you can do better), a way is as follows:
Iterate over the array and sort each word individually
Keep a one-to-one map from each sorted word to its original (in the form of a tuple, for example)
Sort that newly created list by the sorted words.
Iterate over the sorted list, extract the words with equal sorted counterparts (they are adjacent now) and print them.
Example:
array = ["abc", "gba", "bca"]
Sorting each word individually and keeping the original word gives:
new_array = [("abc", "abc"), ("abg", "gba"), ("abc", "bca")]
Sorting the whole array by the first element gives:
new_array = [("abc", "abc"), ("abc", "bca"), ("abg", "gba")]
Now we iterate over the above array and extract words with equal first elements in the tuple, which gives
[("abc", "abc"), ("abc", "bca")] => ["abc", "bca"]
Time complexity analysis:
Looping over the original array is linear in n.
Sorting each individual word takes constant time because a word never exceeds 20 characters: roughly 20 * log2(20) comparisons, which is under 100.
Sorting the whole array is linearithmic in n.
The resulting time complexity is O(n log n), where n is the length of the input array.
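The four steps above can be sketched in Python as follows. This is only a minimal illustration; the function name anagram_groups_by_sorting is made up for this example, and it keeps only groups that contain at least two words.

```python
def anagram_groups_by_sorting(words):
    # Pair each word with its letters in sorted order (the key).
    keyed = [("".join(sorted(w)), w) for w in words]
    # Sort by key so that anagrams become adjacent: O(n log n) comparisons.
    keyed.sort()

    groups = []
    current_key = None
    for key, word in keyed:
        if key == current_key:
            # Same key as the previous entry: same anagram class.
            groups[-1].append(word)
        else:
            # New key: start a new group.
            groups.append([word])
            current_key = key

    # Keep only groups that actually contain anagrams.
    return [g for g in groups if len(g) > 1]


print(anagram_groups_by_sorting(["abc", "gba", "bca"]))
# prints [['abc', 'bca']]
```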
An algorithm
We are going to find an "identifier" for every class of anagrams. An identifier should be something that:
is unique to this class: no two classes have the same identifier;
can be computed when we're given a single word of the class: given two different words of the same class, we should compute the same identifier.
Once we've done that, all we have to do is group together the words that have the same identifier. There are several different ways of grouping words that have the same identifier; the main two ways are:
sorting the list of words, using the identifier as a comparison key;
using a "map" data structure, for instance a hash table or a binary tree.
Can you think of a good identifier?
An identifier I can think of is the list of letters of the words, in alphabetical order. For instance:
comedian --> acdeimno
dog --> dgo
god --> dgo
hello --> ehllo
hole --> ehlo
demoniac --> acdeimno
Implementation in python
words = 'comedian dog god hello hole demoniac'.split()
d = {}
for word in words:
    d.setdefault(''.join(sorted(word)), []).append(word)
print(list(d.values()))
[['comedian', 'demoniac'], ['dog', 'god'], ['hello'], ['hole']]
The explanation
The most important thing here is that for each word, we computed ''.join(sorted(word)). That's the identifier I mentioned earlier. In fact, I didn't write the earlier example by hand; I printed it with python using the following code:
for word in words:
    print(word, ' --> ', ''.join(sorted(word)))
comedian --> acdeimno
dog --> dgo
god --> dgo
hello --> ehllo
hole --> ehlo
demoniac --> acdeimno
So what is this? For each class of anagrams, we've made up a unique word to represent that class. "comedian" and "demoniac" both belong to the same class, represented by "acdeimno".
Once we've managed to do that, all that is left is to group the words which have the same representative. There are a few different ways to do that. In the python code, I have used a python dict, a dictionary, which is effectively a hashtable mapping the representative to the list of corresponding words.
Another way, if you don't know about map data structures, is to sort the list, which takes O(N log N) operations, using the representative as the comparison key:
print( sorted(words, key=lambda word:''.join(sorted(word))) )
['comedian', 'demoniac', 'dog', 'god', 'hello', 'hole']
Now, all words that belong to the same class of anagrams are adjacent. All that's left for you to do is iterate through this sorted list and group the elements which have the same key. This is only O(N), so the longest part of the algorithm is sorting the list.
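One possible way to do that final grouping pass is itertools.groupby, which groups consecutive elements sharing a key. This is just a sketch; the helper name sorted_letters is introduced here for illustration.

```python
from itertools import groupby


def sorted_letters(word):
    # The representative of the word's anagram class.
    return "".join(sorted(word))


words = "comedian dog god hello hole demoniac".split()

# Sort so that words with the same representative are adjacent.
ordered = sorted(words, key=sorted_letters)

# groupby only merges consecutive items, which is why the sort comes first.
groups = [list(group) for _, group in groupby(ordered, key=sorted_letters)]
print(groups)
# prints [['comedian', 'demoniac'], ['dog', 'god'], ['hello'], ['hole']]
```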
You can do it in O(n) by using a hash table (or a dict in Python).
Then you add a loop "for i in 1..sqrt(n)" at each step.
This is the best way to make an O(n sqrt(n)) algorithm where an O(n) algorithm exists.
Let d = dict(), a Python dictionary.
Iterate over the array:
    for each word w, let s = w with its letters sorted in increasing order
    if s is already in d, append w to d[s]
    if not, set d[s] = [w]
    for i in 1..sqrt(n): do nothing  // needed to slow down from O(n) to O(n sqrt(n))
Print anagrams:
    foreach (k, l) in d
        if len(l) > 1, print l

Given a collection of strings, can we make all the strings equal

The allowed operations are removal of character from a string and adding that character to another string. We can repeat the operation as many times as we want.
Given list = ['CAA', 'CBB'].
We can remove 'A' from the first string and add it to the second string.
'CA', 'CBBA'.
Now, we can remove 'B' from the second string and add it in the middle of the string 'CA'.
So, we have 'CBA' and 'CBA'
Step-1: Compute the total frequency of each character across all the strings of the list; let's call the list stringList.
Step-2: Compute the length of stringList and call it length. (length is equal to the number of strings in the list.)
Step-3: For each character, check whether its total frequency is divisible by length. If any frequency is not divisible by length, then it is not possible to make the strings equal.
If it is possible:
Just distribute the characters equally among the strings to get an answer list.
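A minimal Python sketch of these steps, assuming the input is a list of plain strings. The function name equalize is made up for this example, and the characters of each resulting string are placed in alphabetical order, which is only one of many valid arrangements.

```python
from collections import Counter


def equalize(strings):
    """Return a list of equal strings, or None if that is impossible."""
    length = len(strings)
    if length == 0:
        return []

    # Step 1: total frequency of each character across all strings.
    freq = Counter("".join(strings))

    # Step 3: every total must be divisible by the number of strings.
    if any(count % length != 0 for count in freq.values()):
        return None

    # Give each string an equal share of every character.
    share = "".join(ch * (freq[ch] // length) for ch in sorted(freq))
    return [share] * length


print(equalize(["CAA", "CBB"]))  # prints ['ABC', 'ABC']
print(equalize(["AB", "A"]))     # prints None
```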

More efficient way to find phrases in a string?

I have a list that contains 100,000+ words/phrases sorted by length
let list = ["string with spaces", "another string", "test", ...]
I need to find the longest element in the list above that is inside a given sentence. This is my initial solution
for item in list {
    if sentence == item
        || sentence.startsWith(item + " ")
        || sentence.contains(" " + item + " ")
        || sentence.endsWith(" " + item) {
        ...
        break
    }
}
The issue I am running into is that this is too slow for my application. Is there a different approach I could take to make this faster?
You could build an Aho-Corasick searcher from the list and then run this on the sentence. According to https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm "The complexity of the algorithm is linear in the length of the strings plus the length of the searched text plus the number of output matches. Note that because all matches are found, there can be a quadratic number of matches if every substring matches (e.g. dictionary = a, aa, aaa, aaaa and input string is aaaa). "
I would break the given sentence up into a list of words and then compute all possible contiguous sublists (i.e. phrases). Given a sentence of n words, there are n * (n + 1) / 2 possible phrases that can be found inside it.
If you now replace your list of search phrases (["string with spaces", "another string", "test", ...]) with an (amortized) constant-time lookup data structure such as a hash set, you can walk over the list of phrases you computed in the previous step and check whether each one is in the set in roughly constant time.
The overall time complexity of this algorithm scales quadratically in the size of the sentence, and is roughly independent of the size of the set of search terms.
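A rough Python sketch of this idea, under the assumption that phrases and the sentence are separated by single spaces and that "longest" means the most characters. The function name longest_phrase is made up for this example.

```python
def longest_phrase(sentence, phrases):
    # Hash set gives (amortized) constant-time membership checks.
    phrase_set = set(phrases)
    words = sentence.split()

    best = None
    # Enumerate every contiguous run of words: n * (n + 1) / 2 candidates.
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            candidate = " ".join(words[start:end])
            if candidate in phrase_set and (best is None or len(candidate) > len(best)):
                best = candidate
    return best


phrases = ["string with spaces", "another string", "test"]
print(longest_phrase("this is another string with spaces here", phrases))
# prints string with spaces
```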
The solution I decided to use was a trie (https://en.wikipedia.org/wiki/Trie). Each node in the trie is a word, and all I do is tokenize the input sentence (by word) and traverse the trie.
This improved performance from ~140 seconds to ~5 seconds.
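The answer does not show its code, so the following is only a guess at what a word-level trie could look like in Python, using nested dicts. The names build_trie, longest_match and END are all hypothetical, and "longest" is assumed to mean the most characters.

```python
END = None  # sentinel key; cannot collide with a word, which is always a str


def build_trie(phrases):
    # Each edge is one word; a node holding END marks a complete phrase.
    root = {}
    for phrase in phrases:
        node = root
        for word in phrase.split():
            node = node.setdefault(word, {})
        node[END] = phrase
    return root


def longest_match(sentence, root):
    words = sentence.split()
    best = None
    # Try to walk the trie starting from every word position.
    for start in range(len(words)):
        node = root
        for word in words[start:]:
            node = node.get(word)
            if node is None:
                break  # no phrase continues with this word
            match = node.get(END)
            if match is not None and (best is None or len(match) > len(best)):
                best = match
    return best


trie = build_trie(["string with spaces", "another string", "test"])
print(longest_match("this is another string with spaces here", trie))
# prints string with spaces
```

The trie is built once and reused for every sentence, so the cost per sentence depends on the sentence length and the depth of the trie rather than on the number of phrases.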

How to sort a list of strings by a part of the string in prolog

If I have a list containing 123.1234,abc and 321.321,qwe, how can I sort the list by only the float value?
I have tried to use sort/4, but it doesn't work. Is it possible to use some other sort like keysort?
?- sort(["10.29127443524318,F","5.968607804131937,A"],LL).
LL = ["10.29127443524318,F", "5.968607804131937,A"].
When I run the above sort, it gives me the wrong answer.
Sorting is numeric for ints and floats, and alphabetic for strings (so in your example the sorting is correct, because "1" is alphabetically before "5").
You can convert string to numbers, but this results in an error if there are non-numeric parts in the string (such as ,F in your example).
Therefore you need to split the string first, and sort the resulting lists of substrings. For a list of lists sorting is performed on the first element of the inner list, and then the second element if the first element is equal, etc.
After sorting the substrings can be concatenated again. Here the conversion from numbers back to strings can be performed implicitly during the concatenation.
Splitting, number conversion and recombination can be done in the basic Prolog recursive fashion.
MWE:
sort_compound(L1,LL) :-
    split_compound(L1,L2),
    sort(L2,L3),
    recombine(L3,LL).
split_compound([H|T],[[N,L2]|T2]) :-
    split_string(H,",","",[H1,L2]),
    number_string(N,H1),
    split_compound(T,T2).
split_compound([],[]).
recombine([[A,B]|T],[C|T2]) :-
    atomics_to_string([A,',',B],C),
    recombine(T,T2).
recombine([],[]).
Result:
?- sort_compound(["10.29127443524318,F","5.968607804131937,A"],LL).
LL = ["5.968607804131937,A", "10.29127443524318,F"].
Note that several SWI-Prolog specific predicates are used for data type conversion, so for other interpreters this needs to be adjusted.

Number of possible palindrome anagrams for a given word

I have to find the number of palindrome anagrams that are possible for a given word.
Suppose the word is aaabbbb. My approach is:
Prepare a hash map that contains the number of times each letter appears.
For my example it will be
a--->3
b--->4
If the length of the string is even, then the number of occurrences of each letter must be even to form a palindrome of the given word; otherwise the number of palindrome anagrams is 0.
If the length of the string is odd, then at most one letter can have an odd number of occurrences and the others must be even.
The two steps above determine whether a given word can form a palindrome or not.
Now, for finding the number of palindrome anagrams, what approach should I follow?
The first thing to notice is that if the word has odd length, then there must be exactly one character with an odd number of occurrences. If the word has even length, then there must be no characters with an odd number of occurrences. In either case, only the first half of the palindrome is free to choose (the second half is its mirror image, and any odd character goes in the middle), so you're looking for how many ways you can arrange the pairs of characters. You're looking for the number of permutations since order matters:
n = number of character pairs (aaaabbb would have 3 pairs, aabbcccc would have 4 pairs)
n! / (number_of_a_pairs! * number_of_b_pairs! * etc.)
So in the aaaabbb case, you're finding the permutations of aab:
3! / (2! * 1!) = 3
baa = baabaab
aba = abababa
aab = aabbbaa
And in the aabbcccc case, you're finding the permutations of abcc:
4! / (1! * 1! * 2!) = 12:
abcc
acbc
accb
bacc
bcac
bcca
cabc
cacb
cbac
cbca
ccab
ccba
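The counting formula above can be sketched in Python as follows. The function name count_palindrome_anagrams is made up for this example; it returns 0 when more than one character has an odd count, which covers both the even-length and the odd-length case.

```python
from collections import Counter
from math import factorial


def count_palindrome_anagrams(word):
    counts = Counter(word)

    # A palindrome allows at most one character with an odd count.
    odd = sum(1 for c in counts.values() if c % 2 == 1)
    if odd > 1:
        return 0

    # Number of pairs contributed by each character.
    pairs = [c // 2 for c in counts.values()]

    # Multinomial coefficient: n! / (p1! * p2! * ...), with n = total pairs.
    result = factorial(sum(pairs))
    for p in pairs:
        result //= factorial(p)
    return result


print(count_palindrome_anagrams("aaaabbb"))   # prints 3
print(count_palindrome_anagrams("aabbcccc"))  # prints 12
```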
