Finding number of anagrams [closed] - algorithm

The following question was asked in one of my interviews.
We know the anagrams of eat are tea and ate.
The question is:
We have a program. We feed a list of 10 thousand words to this program.
We run the program.
Now, at run time, we provide a word to this program, e.g. "eat".
The program should return the number of anagrams of that word that exist in the list of 10 thousand words. Hence for an input of "eat", it should return 2.
What would be a good strategy for storing those 10 thousand words so that finding the number of anagrams becomes easy?

Sort the letters of each word into a canonical order (e.g. alphabetically), so tea becomes aet.
Then put these sorted keys in a (hash) map from key to count (both tea and ate map to aet, so we'll have (aet, 2) in the map).
Then, when you get a word, sort its letters the same way and look up the count.
Running time:
Assuming n words in the list, with an average word length of m...
Expected O(nm log m) preprocessing, expected O(m log m) per query.
It's m log m on the assumption we just do a simple sort of the letters of a word.
The time taken per query is expected to be unaffected by the number of words in the list (i.e. hash maps give expected O(1) lookup time).
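The approach above can be sketched in Python roughly as follows. The function names build_index and count_anagrams and the sample word list are illustrative assumptions, not part of the original answer.

```python
from collections import Counter


def build_index(words):
    """Preprocessing: map each sorted-letter key to how many words share it."""
    return Counter("".join(sorted(w)) for w in words)


def count_anagrams(index, word):
    """Query: sort the letters the same way and look up the count."""
    return index["".join(sorted(word))]  # Counter returns 0 for a missing key


# Hypothetical sample list standing in for the 10 thousand words.
index = build_index(["tea", "ate", "tan", "nat", "bat"])
print(count_anagrams(index, "eat"))  # 2
```

Note that if the query word itself appears in the list, this sketch counts it too; subtract one in that case if only other words should be counted.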

Related

Which number appeared once? [closed]

Given a list of 2n-1 numbers: all between 1 and n, all but one occur twice. Determine the number that occurs only once. Multiple ways preferred.
I think the problem is at fault: how can you determine which number without knowing the list of numbers?
[O(1) space, O(n) time]: Just take the XOR of all the numbers. Since every number except one occurs twice, each pair XORs to zero and the single occurring number will be the result.
[O(1) space, O(n) time]: As user3386109 said in the comments, we can sum all the given numbers and compare that to twice the sum of the numbers in the range [1, n], which is n*(n+1) (since all numbers are supposed to occur twice). The difference between the two is the answer.
[O(n) space, O(n) time]: Create an array of size n and keep the count of all the elements in the array at their corresponding positions. At the end, traverse the array, and find the number whose count is only 1.
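A minimal Python sketch of the three approaches, assuming the input really does satisfy the stated conditions (values in 1..n, exactly one of them occurring once). The function names are made up for illustration.

```python
from functools import reduce
from operator import xor


def single_by_xor(nums):
    # Pairs cancel out under XOR, leaving the unpaired number.
    return reduce(xor, nums, 0)


def single_by_sum(nums, n):
    # n * (n + 1) is the sum if every number in 1..n occurred twice.
    return n * (n + 1) - sum(nums)


def single_by_count(nums, n):
    # counts[x] holds how often x occurs; index 0 is unused and stays 0.
    counts = [0] * (n + 1)
    for x in nums:
        counts[x] += 1
    return counts.index(1)


nums = [3, 1, 2, 3, 1]  # n = 3, the number 2 occurs once
print(single_by_xor(nums), single_by_sum(nums, 3), single_by_count(nums, 3))
```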

Powers of a half that sum to one [closed]

Call every subunitary ratio with its denominator a power of 2 a perplex.
Number 1 can be written in many ways as a sum of perplexes.
Call every sum of perplexes a zeta.
Two zetas are distinct if and only if one of the zetas has at least one perplex that the other does not have. In the image shown above, the last two zetas are considered to be the same.
Find the number of ways 1 can be written as a zeta with N perplexes. Because this number can be big, calculate it modulo 100003.
Please don't post the code, but rather the algorithm. Be as precise as you can.
This problem was given at a contest and the official solution, written in the Romanian language, has been uploaded at https://www.dropbox.com/s/ulvp9of5b3bfgm0/1112_descr_P2_fractii2.docx?dl=0 , as a docx file. (you can use google translate)
I do not understand what the author of the solution meant to say there.
Well, this reminds me of BFS (breadth-first search) algorithms, where you radiate out from a single point to find multiple solutions with different permutations.
Here you can use recursion, with the base case reached when N perplexes are present in the current call of the recursive function.
So you can say:
function(int N <-- perplexes, ArrayList<Double> currentNumbers, double dividedNum)
if N == 0, then you're done - sort currentNumbers, enter it into a hashtable and return
clone the currentNumbers ArrayList as cloneNumbers
remove one dividedNum from cloneNumbers and add two copies of dividedNum/2
for every number x in cloneNumbers, call function(N - 1, cloneNumbers, x)
This is a rough, very inefficient but short way to do it. There are obviously many ways to prune the algorithm (reduce the number of duplicates going into the hashtable, avoid cloning as much as possible, etc.), but because this enumerates every sequence of splits and then enters each resulting sequence into a hashtable, the hashtable will use its equals() comparison to see that a sequence already exists (such as your last 2 zetas) and reject the duplicate. That way, you'll be left with the answer you want.
The efficiency of the current algorithm: O(|E|^N), where |E| is the number of numbers in the array at the end of all insertions, and N is the number of insertions (or, as you said, the number of perplexes). Obviously this isn't the fastest approach, but it does work.
Hope this helps!
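A rough Python sketch of this brute-force idea, done level by level instead of recursively. It rests on several assumptions that the question does not confirm: a perplex is taken to be 1/2^k with k >= 1 (as the title suggests), two zetas are treated as the same when they contain the same perplexes regardless of order, and the name count_zetas is made up. Each perplex is stored as its exponent k to avoid floating point, and splitting 1/2^k yields two copies of 1/2^(k+1). This is only practical for small N and is not the official contest solution.

```python
def count_zetas(n, mod=100003):
    """Count multisets of n perplexes (1/2^k, k >= 1) that sum to 1, modulo mod."""
    if n < 2:
        return 0  # assumption: 1 itself is not a perplex, so at least two are needed
    level = {(1, 1)}  # the only zeta with two perplexes: 1/2 + 1/2
    for _ in range(n - 2):
        next_level = set()
        for zeta in level:
            for e in set(zeta):  # splitting equal perplexes gives the same result
                rest = list(zeta)
                rest.remove(e)  # take out one copy of 1/2^e ...
                rest += [e + 1, e + 1]  # ... and add two copies of 1/2^(e+1)
                next_level.add(tuple(sorted(rest)))  # sorted tuple = canonical form
        level = next_level
    return len(level) % mod


print([count_zetas(n) for n in range(2, 7)])  # [1, 1, 2, 3, 5]
```

The set of sorted tuples plays the role of the hashtable: a zeta reached by different sequences of splits is stored only once.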

Is O(nk(log(k))) algorithm same as O(n(log(k))) [closed]

I was asked to give an algorithm that was supposed to be O(n(log(k))).
k is the number of arrays and n is the total number of elements in all of them. I had to sort the arrays.
Minus the details, I came up with an algorithm that does the job in k log(k) times the total number of elements, i.e. O(nk(log(k))).
Also, in this case k is much smaller than n, so it won't be n^2(log n) (as it would be if k and n were almost the same), right?
Well, no, it's not the same. If k is a variable (as opposed to a constant) in the complexity expression then O(nk(log(k))) > O(n(log(k))).
That is because there is no constant C such that Cn(log(k)) >= kn(log(k)) for every n, k: dividing both sides by n(log(k)) would require C >= k for every k.
The way you describe the question both k and n are input parameters. If that is the case then the answer to your question is
'No, O(n*k *log(k)) is not the same as O(n*log(k))'.
It is not that hard to see that the first one grows faster than the second one, but it is even more obvious if you fix the value of n. Consider n being a constant, say 1. Then it is more obvious that O(k*log(k)) is not the same as O(log(k)).
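As a small numeric illustration (not a proof), the ratio between the two expressions can be computed directly. The helper name ratio and the sample values are arbitrary.

```python
import math


def ratio(n, k):
    """n*k*log(k) divided by n*log(k); requires k >= 2 so that log(k) > 0."""
    return (n * k * math.log(k)) / (n * math.log(k))


for k in (2, 16, 1024):
    # The ratio tracks k and does not depend on n.
    print(k, ratio(1000, k))
```

Whatever constant C is proposed, choosing k larger than C makes the ratio exceed it, so no single C can bound nk(log(k)) by Cn(log(k)).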

Figure out the order of a list of chars [closed]

English has 26 chars (a,b,c,d,...,z) and they have an order: b comes after a, c comes after b, etc.
Suppose we have another language. In the language, we also have a number of chars. All chars have an order, just like chars in English.
However, we don't know the total order of all chars yet.
We are given a list of words; in each word, the chars are already sorted.
Please use a data structure and algorithm to deduce the total order of all chars.
For example,
we have chars #, £, $, %. We don't know the order of these in a language.
We are given a list of words
£ %
# %
$ #
£ $
Then we can get the total order £ $ # %.
Construct a directed graph containing all characters as vertices.
Create an edge from each character to each character directly following that character in any word. For example, if you have a word # % ^, you'd have edges # -> % and % -> ^.
Run a topological sort on the graph to get the correct order.
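The three steps above might look like this in Python, using Kahn's algorithm as one possible topological sort. The function name char_order is an assumption, and each word is given as a list of characters. When the words do not determine a unique order, this sketch returns just one of the valid orders.

```python
from collections import defaultdict, deque


def char_order(words):
    """Return one character order consistent with every (already sorted) word."""
    successors = defaultdict(set)  # edge a -> b: a comes before b
    indegree = {}
    for word in words:
        for ch in word:
            indegree.setdefault(ch, 0)
        for a, b in zip(word, word[1:]):
            if a != b and b not in successors[a]:  # skip duplicate edges
                successors[a].add(b)
                indegree[b] += 1

    # Kahn's algorithm: repeatedly emit a vertex with no remaining predecessors.
    queue = deque(ch for ch, deg in indegree.items() if deg == 0)
    order = []
    while queue:
        ch = queue.popleft()
        order.append(ch)
        for nxt in successors[ch]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    if len(order) != len(indegree):
        raise ValueError("contradictory input: the graph contains a cycle")
    return order


words = [["£", "%"], ["#", "%"], ["$", "#"], ["£", "$"]]
print(char_order(words))  # ['£', '$', '#', '%']
```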

two whole texts similarity using levenshtein distance [closed]

I have two text files which I'd like to compare. What I did is:
I've split both of them into sentences.
I've measured the Levenshtein distance between each of the sentences from one file and each of the sentences from the second file.
I'd like to calculate the average similarity between those two text files, however I have trouble producing any meaningful value - obviously the arithmetic mean (sum of all the distances [normalized] divided by the number of comparisons) is a bad idea.
How to interpret such results?
edit:
Distance values are normalized.
The Levenshtein distance has a maximum value, namely the length of the longer of the two input strings. It cannot get worse than that. So a normalized similarity index (0=bad, 1=match) for two strings a and b can be calculated as 1 - distance(a,b)/max(a.length, b.length).
Take one sentence from File A. You said you'd compare this to each sentence of File B. I guess you are looking for a sentence out of B which has the smallest distance (i.e. the highest similarity index).
Simply calculate the average of all those best-match similarity indexes (one per sentence of File A, each corresponding to the minimum distance). This should give you a rough estimation of the similarity of two texts.
But what makes you think that two texts which are similar might have their sentences shuffled? My personal opinion is that you should also introduce stop word lists, synonyms and all that.
Nevertheless: Please also check trigram matching which might be another good approach to what you are looking for.
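The averaging scheme described above could be sketched in Python like this. The function names are illustrative, the Levenshtein routine is the standard dynamic-programming version, and both sentence lists are assumed to be non-empty.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Normalized similarity index: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def text_similarity(sentences_a, sentences_b):
    """Average, over sentences of A, of the best similarity to any sentence of B."""
    best = [max(similarity(s, t) for t in sentences_b) for s in sentences_a]
    return sum(best) / len(best)


print(text_similarity(["the cat sat", "a dog ran"], ["a dog ran", "the cat sat"]))  # 1.0
```

This measure is not symmetric: swapping File A and File B can give a different value, so averaging both directions may be worth considering.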
