As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
Asked in a recent interview:
What data structure would you use to implement spell correction in a document. The goal is to find if a given word typed by the user is in the dictionary or not (no need to correct it).
What is the complexity?
I would use a Radix (also called a Patricia) tree to index the dictionary. See https://secure.wikimedia.org/wikipedia/en/wiki/Radix_tree, which includes an example of its use to index dictionary words as well as a useful discussion of its complexity.
If I'm understanding the question correctly, you are given a dictionary (a list of "correct" words) and are asked to determine whether an input word is in the dictionary. So you're looking for a data structure with very fast lookup times. I would go with a hash table.
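A minimal sketch of the hash-table approach in Python, using the built-in set (the sample words are placeholders):

```python
# Hypothetical dictionary; in practice this would be loaded from a word list.
dictionary = {"apple", "banana", "cherry"}

def is_correct(word):
    # Average-case O(1) set membership test (plus O(L) to hash a
    # word of length L), independent of dictionary size.
    return word.lower() in dictionary

print(is_correct("Apple"))  # True
print(is_correct("aple"))   # False
```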
I would use a DAWG (Directed Acyclic Word Graph) which is basically a compressed Trie.
These are commonly used in algorithms for Scrabble and other words games, like Boggle.
I've done this before. The TWL06 Scrabble dictionary with 170,000 words fits in a 700 KB structure both on disk and in RAM.
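A DAWG adds suffix sharing on top of a trie and takes more code to build, but the membership test it supports is the same as a plain trie's, which can be sketched like this (class names are illustrative):

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word):
        # O(L) for a word of length L, independent of dictionary size.
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word
```

A DAWG would merge identical subtrees (e.g. the shared "-ing" endings), which is how a 170,000-word dictionary can fit in a few hundred kilobytes.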
The Levenshtein distance tells you how many single-character edits are needed to turn one string into another; by finding the dictionary word with the fewest edits you can suggest correct words (see also the Damerau-Levenshtein distance).
To improve performance you should not compute the distance against your whole dictionary; constrain the search with a heuristic, for instance to words that start with the same first letter.
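A minimal sketch of this approach in Python (the first-letter heuristic and the function names are illustrative):

```python
def levenshtein(a, b):
    # Classic O(len(a) * len(b)) dynamic program, kept to one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

def suggest(word, dictionary):
    # Heuristic from the answer above: only compare against words
    # sharing the first letter, then pick the closest one.
    candidates = [w for w in dictionary if w[:1] == word[:1]]
    return min(candidates, key=lambda w: levenshtein(word, w), default=None)
```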
Bloom filter. False positives are possible, but false negatives are not. Since you know the dictionary in advance, you can eliminate the false positives by using a perfect hash for the dictionary. Or you can use the Bloom filter as an auxiliary data structure in front of your actual dictionary data structure.
Edit: Of course, lookup complexity for a Bloom filter is O(1) with respect to dictionary size.
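A minimal Bloom-filter sketch in Python (the bit-array size, hash count, and the trick of slicing k positions out of one SHA-256 digest are illustrative choices, not a tuned design):

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, word):
        # Derive k bit positions from one SHA-256 digest.
        digest = hashlib.sha256(word.encode()).digest()
        for i in range(self.num_hashes):
            chunk = digest[4 * i: 4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.size

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, word):
        # False means definitely absent; True means probably present.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(word))
```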
Closed 9 years ago.
SHOW_SCHEDULE(START_CITY, START_STATE , HOURS)
This function looks at the current set of campaign stops stored in the system and creates a schedule for the candidate. The schedule includes a subset of the stored campaign stops and the route information between them. The schedule must include the maximum number of campaign stops that can be accommodated within a given number of hours. START_CITY and START_STATE together denote the first city in the schedule; HOURS denotes the number of hours for which the schedule is being made.
What would be the best algorithm for this function?
You could look at this answer, which discusses Dijkstra's algorithm for routing (you would probably need to define your graph accordingly).
Basically, model your stops as vertices; a route is then a walk through these vertices.
Now, since you bring in the time dimension, the routing problem becomes somewhat non-static. Have a look at distance-vector routing, again as suggested in the answer mentioned above.
The links below should give some more insight into, and comparisons of, routing algorithms:
Wikipedia Journey Planner
This paper compares other algorithms that are faster than Dijkstra's algorithm.
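As a sketch of the shortest-path building block mentioned above, here is Dijkstra's algorithm in Python (the adjacency-dict graph shape, with edge weights as travel hours, is an assumption for illustration):

```python
import heapq

def dijkstra(graph, start):
    # graph: {node: [(neighbor, travel_hours), ...]}
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, already relaxed via a shorter path
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Note that Dijkstra alone gives shortest travel times between stops; maximizing the number of stops within an hour budget is a harder (orienteering-style) selection problem layered on top of it.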
Closed 10 years ago.
Recently I came across a question which asked how many bits are sufficient to hash a webpage, given these assumptions:
There are 1 billion web pages
The average length of web pages is 300 words
We have 250,000 words in English
The pages are in ASCII
Apparently there is no single right answer to this problem, but the aim of the question is to see how the general method works.
You haven't defined what it means to “hash a webpage”; that phrase appears in this question and in a couple of other pages on the Internet, where it is used to mean computing a checksum (for example with sha1sum) to verify that content is intact. If that's what you mean, then you need all the bits of any page that's to be “hashed”; on average, that is 300 * 8 * the average English word length. The question doesn't specify the average English word length, but if it is five letters plus a space, the average number of bits per page is 6 * 300 * 8, or 14,400.
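If “hash a webpage” does mean a checksum, the idea can be sketched in Python, using hashlib's SHA-1 as a stand-in for sha1sum (the function name is illustrative):

```python
import hashlib

def page_checksum(content: bytes) -> str:
    # Every byte of the page feeds the digest, which is why verifying
    # integrity requires reading all ~14,400 bits of an average page.
    return hashlib.sha1(content).hexdigest()

print(page_checksum(b"hello world"))
# 2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
```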
If you instead mean putting all the words of all the webpages into an index structure, so that a search can find every webpage containing a given set of words, one answer is about 10^13 bits: there are 300 billion word references in a billion pages, and each reference takes log2(10^9) bits, or about 30 bits, if references are stored naively; hence about 9 trillion bits, roughly 10^13. You can also work out that naive storage for a billion URLs is at least an order of magnitude smaller than that, i.e. at most 10^12 bits. Special methods might reduce reference storage by a couple of orders of magnitude, but because URLs are easier to compress or store compactly (via, e.g., a trie), reference storage is likely to remain far larger than what is needed for storing URLs.
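The arithmetic above can be checked directly (the variable names are illustrative):

```python
import math

pages = 10 ** 9                # 1 billion web pages
words_per_page = 300           # average page length

references = pages * words_per_page            # 3 * 10^11 word references
bits_per_reference = math.ceil(math.log2(pages))  # ~30 bits to name a page

index_bits = references * bits_per_reference
print(index_bits)  # 9000000000000, i.e. about 10^13 bits
```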
Closed 11 years ago.
This is a broad question, but I would like to know the views of experts.
I came across the document Suffix arrays – a contest approach, and also found comments saying that participants should have such data structures ready in hand. Nowadays a lot of online programming puzzles come with a time bound, so I would like to know which other data structures and algorithms one should have ready.
I have been competing for about 10 years now and have built a not-so-bad library myself. Most of the really good competitors have blogs (for instance the legendary Petr Mitrichev) where they explain ideas they used on competitive problems. Reading these can help you: if you see a nice idea, implement it and keep it stored.
I add algorithms to my library when I see a problem that involves them. That way I can verify that my implementation is correct: I only add an algorithm after I have passed at least one problem with it.
Here is a list with some of the algorithms I have:
I have a large geometrical library with classes representing points, lines, polygons, segments, and circles, and some operations on them (for instance intersection, convex hull of a set of points, etc.)
Tarjan's algorithm for strongly connected components
Dinitz flow algorithm
Bipartite matching implementation
Min cost max flow implementation
Aho-Corasick string searching algorithm
Knuth-Morris-Pratt string searching algorithm
Rabin-Karp string searching algorithm
Linear-time suffix tree construction using Ukkonen's algorithm
Fast exponentiation
Polynomial implementation
Big integer implementation
Fractional numbers implementation
Matrix class implementation
Prime factorization
Eratosthenes Sieve
Segment Tree
Hungarian algorithm
2-Sat algorithm. For this I use Tarjan's algorithm mentioned above.
You will notice that some of the most basic algorithms (like BFS, DFS, and Dijkstra) are not mentioned above, and that is because I don't have them implemented. These algorithms cannot easily be generalized in a way that lets you simply copy and paste them and have everything work. It also takes me less than 5 minutes to write them; I usually put in my library only algorithms that are either hard to implement or easy to get wrong.
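As an illustration of how little code those basic algorithms take, here is a BFS sketch in Python (the adjacency-dict graph shape is an assumption):

```python
from collections import deque

def bfs(graph, start):
    # graph: {node: [neighbors]}; returns hop distance from start.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```

Something this short is easier to retype under contest conditions than to adapt from a stored, generalized version.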
Check out the featured articles at TopCoder. They are really cool.
While you are at it, I suggest taking part in the programming contests at TopCoder, because the best way to improve is to practice and keep entering such contests.
Project Euler is also really addictive.
Also, take a look at the Programming Challenges book, it's a great reference on the subject - it presents the topics necessary for succeeding in a programming contest, backed by an online judge.
Closed 11 years ago.
I was recently reading an article which mentioned:
For God's sake, don't try sorting a linked list during the interview.
Is there any reason why the author wrote this? The reason is not immediately clear. I am aware that merge sort works on linked lists in O(n log n) time, so what's wrong with that? Am I missing something obvious?
EDIT:
Any reason why this question is being voted to close? I'm honestly curious and merely looking for answers or interesting points.
I have no way of knowing why the author of the blog wrote what he did. If I had to guess, I'd say what was really meant was something along the lines of:
Don't assume that efficiently sorting a linked list would be as easy as sorting a data structure that provides random access to its elements. If you do end up relying on being able to sort a linked list, be prepared to explain what a suitable algorithm might be, and to discuss its complexity.
I think you'll find that, although it's possible to sort a linked list using merge sort, the code to do so efficiently is somewhat involved. It's not something you'd want to develop while standing at the white board in the middle of an interview.
Most sorting algorithms rely on getting and setting elements at specific indices, and those operations need to be fast for the sort to be fast. They are O(1) for, say, an array-backed list, but for a linked list random access is O(n), which makes index-based sorting terribly inefficient. Perhaps this captures the reasoning behind your quote.
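For reference, merge sort sidesteps random access entirely: it splits the list with slow/fast pointers and merges by relinking nodes. A Python sketch (the Node class is hypothetical):

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def merge_sort(head):
    if head is None or head.next is None:
        return head
    # Find the middle with slow/fast pointers, then cut the list in two.
    slow, fast = head, head.next
    while fast and fast.next:
        slow, fast = slow.next, fast.next.next
    mid, slow.next = slow.next, None
    left, right = merge_sort(head), merge_sort(mid)
    # Merge the two sorted halves by relinking nodes (no index access).
    dummy = tail = Node(None)
    while left and right:
        if left.value <= right.value:
            tail.next, left = left, left.next
        else:
            tail.next, right = right, right.next
        tail = tail.next
    tail.next = left or right
    return dummy.next
```

Even this "textbook" version has several easy-to-fumble details (the split, the relinking, the leftover tail), which supports the point about not attempting it cold at a whiteboard.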
Closed 11 years ago.
Can someone give a brief list of mathematics areas (like functions, calculus, etc.) to learn in order to understand algorithm-analysis books (like Introduction to Algorithms)?
I would start with discrete mathematics. That would probably give you the best computational basis and intuition for what computer algorithms are about in terms of working with sets and discrete numbers in general. Something on data structures and algorithms would help as well; it would give you a good background on things like sorting arrays, efficient searching, etc. You could then move on to books on artificial intelligence (my best guess), but by this time you should definitely be ready to read some algorithms books. IMO, that is.
UPDATE
Also, calculus never hurts if you're working with minimization/maximization/optimization problems. Whether it's needed depends on the specific algorithms you'd like to work with.
To start with:
number theory, especially induction
basic set theory, sets and functions
basic calculus, limits
logarithms
discrete math (combinations, permutations, etc.)
generating functions (advanced discrete math)
For Introduction to Algorithms the only things you really need to know are induction and some basic set theory. For the more advanced parts you also need to know some linear algebra and probability theory.