C(n) = (1/(n+1)) * C(2n, n)
The above formula gives the number of possible binary search trees on n keys. I want to know the name of this sequence and for which other purposes it can be used efficiently.
The name is the Catalan numbers (the Catalan sequence).
I think this thread contains the information you want.
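As a quick illustration, here is a tiny Python sketch (the helper name is illustrative) that reproduces the first few Catalan numbers; catalan(3) == 5 matches the five distinct binary search trees on 3 keys.

    from math import comb

    def catalan(n):
        # nth Catalan number: C(n) = (1/(n+1)) * C(2n, n)
        return comb(2 * n, n) // (n + 1)

    print([catalan(n) for n in range(6)])   # [1, 1, 2, 5, 14, 42]
    # catalan(3) == 5: there are 5 distinct binary search trees on 3 keys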
Provided a list of valid words and a search word, I want to find whether the search word is a valid word, allowing up to 2 typo characters.
What would be a good data structure to store a dictionary of words (assume it contains a million words), and what algorithm would find whether a word exists in the dictionary, allowing up to 2 typo characters?
If no typo characters were allowed, then a trie would be a good way to store the words, but I am not sure it stays the best way to store the dictionary when typos are allowed. I am also not sure what the complexity of a backtracking algorithm (to search for a word in a trie allowing 2 typos) would be. Any idea about it?
You might want to check out the Directed Acyclic Word Graph, or DAWG. It has more of an automaton structure than a tree or graph structure. Having multiple possible transitions from one place may provide you with your solution.
If there is no need to also store all mistyped words, I would consider using a two-step approach for this problem.
Build a set containing the hashes of all valid words (not including typos). We are probably talking about some 10,000 entries here, which should still allow quite fast lookups with a binary search. If the hash of a word is found in the set, it is typed correctly.
If a word's hash is not found in the set, the word is probably mistyped. So calculate the Damerau-Levenshtein distance between the word and all known words to figure out what the user might have meant. To gain some performance here, modify the DL algorithm to abort the calculation once the distance gets bigger than your allowed threshold of 2 typos.
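A rough Python sketch of that second step, using the restricted Damerau-Levenshtein (optimal string alignment) variant with an early abort; the function name and the cap of 2 are illustrative.

    def osa_distance_capped(a, b, max_dist=2):
        # Restricted Damerau-Levenshtein (optimal string alignment) distance,
        # returning max_dist + 1 as soon as the result is guaranteed to exceed it.
        if abs(len(a) - len(b)) > max_dist:          # length gap alone exceeds the cap
            return max_dist + 1
        prev2 = None                                 # row i-2, needed for transpositions
        prev = list(range(len(b) + 1))               # row 0
        for i in range(1, len(a) + 1):
            curr = [i] + [0] * len(b)
            for j in range(1, len(b) + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                curr[j] = min(prev[j] + 1,           # deletion
                              curr[j - 1] + 1,       # insertion
                              prev[j - 1] + cost)    # substitution
                if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                        and a[i - 2] == b[j - 1]):
                    curr[j] = min(curr[j], prev2[j - 2] + 1)   # transposition
            if min(curr) > max_dist:                 # no cell can get back under the cap
                return max_dist + 1
            prev2, prev = prev, curr
        return prev[-1]

    print(osa_distance_capped("receive", "recieve"))   # 1 (one transposition)
    print(osa_distance_capped("trie", "hashmap"))      # 3, i.e. more than 2 typos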
My problem statement is that I am given millions of strings, and I have to find which of them contain a given substring.
For example, given "xyzoverflowasxs", "werstackweq", etc., searching for the substring "stack" should return "werstackweq". What kind of data structure can we use to solve this problem?
I think we can use a suffix tree for this, but I wanted some more suggestions for this problem.
I think the way to go is with a dictionary holding the actual words, and another data structure pointing to entries within this dictionary. One way to go would be with suffix trees and their variants, as mentioned in the question and the comments. I think the following is a far simpler (heuristic) alternative.
Say you choose some integer k. For each of your strings, computing the Rabin fingerprints of all of its length-k substrings should be efficient and easy (most languages have a rolling-hash implementation).
So, for a given k, you could hold two data structures:
A dictionary of the words, say a hash table based on collision lists
A dictionary mapping each fingerprint to an array of the linked-list node pointers in the first data structure.
Given a word of length k or greater, you would choose a length-k subword, calculate its Rabin fingerprint, find the words that contain this fingerprint, and check whether they indeed contain this word.
The question is which k to use, and whether to use multiple such k. I would try this experimentally (starting with a few small values of k simultaneously, say 1, 2, and 3, and also a couple of larger ones). The performance of this heuristic will in any case depend on the distribution of your dictionary and queries.
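A rough sketch of the two data structures for a single k, using a simple polynomial rolling hash as a stand-in for a proper Rabin fingerprint, and a plain set of strings in place of the linked-list node pointers; all names and constants are illustrative.

    from collections import defaultdict

    BASE, MOD = 257, (1 << 61) - 1          # illustrative rolling-hash parameters

    def kgram_hashes(s, k):
        # Rolling hash of every length-k substring of s.
        h, power = 0, pow(BASE, k - 1, MOD)
        for i, ch in enumerate(s):
            h = (h * BASE + ord(ch)) % MOD
            if i >= k - 1:
                yield h
                h = (h - ord(s[i - k + 1]) * power) % MOD

    def build_index(strings, k):
        # Second data structure: fingerprint -> the stored strings containing it.
        index = defaultdict(set)
        for s in strings:
            for h in kgram_hashes(s, k):
                index[h].add(s)
        return index

    def search(index, query, k):
        # Fingerprint one length-k subword of the query, then verify candidates.
        probe = next(kgram_hashes(query[:k], k))
        return [s for s in index.get(probe, ()) if query in s]

    idx = build_index(["xyzoverflowasxs", "werstackweq"], k=3)
    print(search(idx, "stack", k=3))        # ['werstackweq']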
I have no idea how to Google this, so I would like to ask you guys: is there an algorithm that determines what signs you need to put between numbers in order to get a given result?
For example, you input 4 numbers: 8 9 2 1. The last number is the result, so the answer would be 8-9+2=1. Is there such an algorithm? Maybe you know its name, so I could Google it and read about it.
Yes, it's called Brute-force search.
You generate all possible candidates for the solution and check each of them to see whether they satisfy the constraints.
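A minimal brute-force sketch in Python, assuming only '+' and '-' signs and left-to-right evaluation; the function name is illustrative.

    from itertools import product

    def find_sign_placements(numbers, target, ops="+-"):
        # Generate every candidate sign placement and keep those that hit target.
        first, rest = numbers[0], numbers[1:]
        solutions = []
        for signs in product(ops, repeat=len(rest)):
            value = first
            for op, n in zip(signs, rest):
                value = value + n if op == "+" else value - n
            if value == target:
                expr = str(first) + "".join(f"{op}{n}" for op, n in zip(signs, rest))
                solutions.append(f"{expr}={target}")
        return solutions

    print(find_sign_placements([8, 9, 2], 1))   # ['8-9+2=1']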
I am trying to understand the trie data structure, and I've understood that tries are used in spell checkers, auto-suggest/auto-correct features, etc., i.e. especially in the context of language dictionaries. I wonder if there are any other possible use cases for a trie (as it is or in any augmented form).
Thanks in advance.
PS: This is not a homework problem; I am just trying to better understand possible use cases for a trie data structure, and that's it.
Tries are integral in routing systems.
Most routers store IP addresses in the form of a trie (a Patricia tree), which is well suited for prefix lookups.
Tries are useful as a lookup structure where you are dealing with strings (of bytes/bits etc).
Suffix trees are essentially tries and have a wide range of string-related applications, such as substring checks, finding repeated substrings, palindrome finding, etc.
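To make the routing use case concrete, here is a toy longest-prefix-match sketch over a one-bit-per-node binary trie; real routers use compressed (Patricia) tries, and the prefixes and next-hop names below are made up.

    class Node:
        def __init__(self):
            self.children = [None, None]    # child for bit 0 and bit 1
            self.next_hop = None            # set if a route terminates here

    def insert(root, prefix_bits, next_hop):
        node = root
        for bit in prefix_bits:
            if node.children[bit] is None:
                node.children[bit] = Node()
            node = node.children[bit]
        node.next_hop = next_hop

    def longest_prefix_match(root, addr_bits):
        node, best = root, None
        for bit in addr_bits:
            if node.next_hop is not None:
                best = node.next_hop        # longest match seen so far
            node = node.children[bit]
            if node is None:
                return best
        return node.next_hop or best

    def bits(ip, length=32):
        # First `length` bits of a dotted-quad IPv4 address.
        n = 0
        for part in ip.split("."):
            n = (n << 8) | int(part)
        return [(n >> (31 - i)) & 1 for i in range(length)]

    root = Node()
    insert(root, bits("10.0.0.0", 8), "hop-A")             # route 10.0.0.0/8
    insert(root, bits("10.1.0.0", 16), "hop-B")            # route 10.1.0.0/16
    print(longest_prefix_match(root, bits("10.1.2.3")))    # hop-B (more specific)
    print(longest_prefix_match(root, bits("10.9.9.9")))    # hop-A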
Here are a couple of algorithm puzzles for you to try out.
Given an nxn binary matrix (of zeroes and ones), eliminate the duplicate rows.
Given n numbers, find two numbers x,y among those such that x XOR y (the exclusive OR) is maximum among all the n^2 possibilities.
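For the second puzzle, one possible approach is a binary trie over the numbers' bits, walked greedily while preferring the opposite bit at each level; a rough sketch assuming non-negative integers that fit in 32 bits, with illustrative names.

    BITS = 32    # assumes non-negative integers below 2**32

    def max_xor_pair(nums):
        # Build a binary trie of every number's bits (most significant first).
        root = {}
        for x in nums:
            node = root
            for i in range(BITS - 1, -1, -1):
                node = node.setdefault((x >> i) & 1, {})
        # For each number, walk the trie greedily preferring the opposite bit.
        best = 0
        for x in nums:
            node, acc = root, 0
            for i in range(BITS - 1, -1, -1):
                want = 1 - ((x >> i) & 1)       # opposite bit maximizes the XOR
                if want in node:
                    acc |= 1 << i
                    node = node[want]
                else:
                    node = node[1 - want]       # x's own bit path always exists
            best = max(best, acc)
        return best

    print(max_xor_pair([3, 10, 5, 25, 2, 8]))   # 28 (from 25 XOR 5)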
I was asked the question in an interview. The interviewer told me to assume that there exists a function say getNextWord() to return the next word in a given document. My task was to design a data structure to implement the task, and give an algorithm that constructs a list of all words with their frequencies.
Being from a C++ background, my answer was to create a multimap of strings, insert all the words into it, and later display each word's count. I was, however, told to do this in a more generic way. By generic he meant that he didn't want me to use a library feature. Also, I guess a multimap is implemented internally as a 2-3 tree or so, so for the multimap solution to be generic I would need to code the 2-3 tree as well.
Although tries did come to mind, implementing one during an interview was out of the question for me. So I just wanted to know if there are better ways of achieving this, or whether there is a way to implement it in a smooth manner using tries.
Any histogram-based algorithm would be both efficient and generic here. The idea is simple: build a histogram from the data. A generic interface for a histogram is a Map<String,Integer>.
Iterate over the document once (with your getNextWord() method), while maintaining the histogram.
The best implementation of this interface, in terms of big-O notation, would probably be a trie where each node at which a word terminates keeps a counter of occurrences.
Getting the actual (word, count) pairs from the trie is done by a simple DFS on the trie.
This solution gives you O(n * |S|) time complexity, where |S| is the average length of a word.
The insertion algorithm for each word:
Each time you add a new word, check if it already exists: if it does, increase its counter; if not, add the word to the dictionary with a counter value of 1.
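A rough sketch of this trie histogram in Python, where a dict-of-dicts stands in for the child links and a plain word list stands in for repeated getNextWord() calls; names are illustrative.

    COUNT = "#"    # key under which a node stores its terminal counter

    def add_word(root, word):
        node = root
        for ch in word:
            node = node.setdefault(ch, {})     # walk/extend the path for `word`
        node[COUNT] = node.get(COUNT, 0) + 1   # unseen words start at 1

    def frequencies(root, prefix=""):
        # Simple DFS over the trie, yielding (word, count) pairs.
        for key, child in root.items():
            if key == COUNT:
                yield prefix, child
            else:
                yield from frequencies(child, prefix + key)

    root = {}
    for w in ["the", "cat", "the", "trie", "cat", "the"]:   # stands in for getNextWord()
        add_word(root, w)
    print(sorted(frequencies(root)))   # [('cat', 2), ('the', 3), ('trie', 1)]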
I'd try to implement a B-tree (or something quite similar) to store all the words. Then I could easily look up the next word, and if it is already there, increase the counter stored in its node, or otherwise just insert a new one.
The time complexity in that case would be O(n log n), where n is the total number of words and log n is the cost of a lookup or insertion in such a tree.
I think the simplest solution would be a trie. O(N) is achievable in this case (both for insertion and for getting the count). Just store the count in an additional field at every node.
Basically, each node in the trie contains 26 links to 26 possible children (one for each letter) plus one counter (for the words that terminate at the current node).
Just look at the link for a graphic image of a trie.
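A bare-bones version of the node layout described above, assuming lowercase a-z words; the class and helper names are made up.

    class TrieNode:
        def __init__(self):
            self.children = [None] * 26   # one link per letter a-z
            self.count = 0                # words terminating at this node

    def increment(root, word):
        # Walk/extend the path for `word` and bump the counter at its last node.
        node = root
        for ch in word:
            i = ord(ch) - ord("a")
            if node.children[i] is None:
                node.children[i] = TrieNode()
            node = node.children[i]
        node.count += 1
        return node.count

    root = TrieNode()
    for w in ("to", "be", "or", "not", "to", "be"):
        increment(root, w)
    print(increment(root, "to"))   # 3: "to" has now been seen three times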