This question is about how to best approach a coding interview from a data structures point of view.
The way I see it, there are two different approaches: I could implement a specific DS from scratch, initialise it, and then use it to solve my problem, or I could simply use a library (I'm talking about Node.js here, but I guess this applies to other languages as well, at least those with some built-in support for DS) without worrying about the implementation, focusing only on how to use it to solve the problem.
In the first case I'm also demonstrating that I can implement a specific DS from scratch, but it costs more time and adds complexity. Using a library instead would leave me more time to solve the actual problem, but some companies might take a dim view of that approach.
I know there's no silver bullet, and different companies will have different views, but what approach would you take if you could only pick one, and why?
In general it's best to use the library, but it's also important to know how the common library functions work, at least the basic ones.
For example, in many interviews you are asked to implement binary search rather than just use the library function. This is because knowing the implementation teaches a concept that is useful in general problem solving, e.g. the same idea underlies other divide-and-conquer algorithms.
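For instance, a minimal binary search sketch (shown in Python for brevity; the idea is language-agnostic):

    def binary_search(items, target):
        # Return the index of target in the sorted list items, or -1 if absent.
        lo, hi = 0, len(items) - 1
        while lo <= hi:
            mid = (lo + hi) // 2        # split the search space in half
            if items[mid] == target:
                return mid
            elif items[mid] < target:
                lo = mid + 1            # discard the lower half
            else:
                hi = mid - 1            # discard the upper half
        return -1

The same "discard half the search space each step" idea reappears all over divide-and-conquer problem solving.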
In production-level code we always look for fail-safe, properly tested library code.
You should pick available libraries first. If needed, customize the behaviour of an already available library.
I'm working through a small problem to get a better understanding in my compilers class. The problem is as follows: assume I have a bunch of files to compile, like so:
a depends on nothing
b depends on c
c depends on f
d depends on a
e depends on b
f depends on nothing
So in this case a compilation order in which the files compile successfully is a, f, c, b, d, e. I want to write my own algorithm that outputs such a dependency order, just as an exercise. I know the linker does this automatically in C++ etc., but this is purely a personal exercise. How can I go about solving this problem? Any references to algorithms/readings are much appreciated, since I'm fairly new to this.
Based on the comment of @ajb, a quick Google search brings up the Wikipedia article on Topological Sorting. However, it seems to me that if you're going to go to the trouble of building a graph to represent the problem, there's a really easy way to do this.
First, make a node for each file you're compiling. Then add an edge from each node to its dependency, and an edge to a special node for files that have no dependency. Once that is done, all you have to do is reverse the edges and compile in breadth-first order from that special node.
If you need to worry about circular dependencies or any of that jazz, then it gets a lot more complicated, but it's still doable.
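If it helps, here is a minimal sketch of that idea in Python, using the file list from the question. (This is essentially Kahn's algorithm: start from the files with no dependencies and peel the graph apart breadth-first.)

    from collections import deque

    # depends[x] lists the files x depends on, from the example above.
    depends = {'a': [], 'b': ['c'], 'c': ['f'], 'd': ['a'], 'e': ['b'], 'f': []}

    # Reverse the edges (dependency -> dependents) and count in-degrees.
    dependents = {f: [] for f in depends}
    indegree = {f: len(deps) for f, deps in depends.items()}
    for f, deps in depends.items():
        for d in deps:
            dependents[d].append(f)

    # Start from the files with no dependencies and peel them off breadth-first.
    queue = deque(f for f, n in indegree.items() if n == 0)
    order = []
    while queue:
        f = queue.popleft()
        order.append(f)
        for g in dependents[f]:
            indegree[g] -= 1
            if indegree[g] == 0:
                queue.append(g)

    print(order)  # e.g. ['a', 'f', 'd', 'c', 'b', 'e'], one valid compile order

Note that several orders can be valid; any output where every file appears after its dependencies is correct.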
Since you're asking for literature: there is a book called Data Structures and Algorithms in C++ that goes over all kinds of data structures and algorithms (what a surprise!), including graph algorithms in chapter 13.
Up until now I've mostly concentrated on how to design code properly, making it as readable and as maintainable as possible. So I've always chosen to learn about the higher-level aspects of programming, such as class interactions, API design, etc.
Algorithms I never found particularly interesting. As a result, even though I can come up with a good design for my programs, and even if I can come up with a solution to a given problem, it is rarely the most efficient one.
Is there a particular way of thinking about problems that helps you come up with a solution that is as efficient as possible, or is it simply a matter of practice and/or memorization?
Also, what online resources can you recommend that teach you various efficient algorithms for different problems?
Data dominates. If you design your program around the right abstract data types (ADTs), you often get a clean design, the algorithms follow quite naturally, and when performance is lacking you should be able to "plug in" more efficient ones.
A strong background in maths and logic helps here, as it allows you to visualize your program at a high level as the interaction between functions, sets, graphs, sequences, etc. You then decide whether the sets need to be ordered (balanced BST, O(lg n) operations) or not (hash tables, O(1) operations), which operations need to be supported on sequences (vector-like or list-like), etc.
If you want to learn some algorithms, get a good book such as Cormen et al. and try to implement the main data structures (a minimal sketch of the first follows the list):
binary search trees
generic binary search trees (that work on more than just ints or strings)
hash tables
priority queues/heaps
dynamic arrays
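To make the exercise concrete, here is a minimal BST sketch in Python (unbalanced, insertion and lookup only; Cormen et al. cover the balanced variants). In a dynamically typed language the "generic" variant comes almost for free, since any mutually comparable keys work:

    class Node:
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None

    class BST:
        # Unbalanced binary search tree: O(lg n) average, O(n) worst case.
        def __init__(self):
            self.root = None

        def insert(self, key):
            parent, cur = None, self.root
            while cur is not None:                 # walk down to a free slot
                parent = cur
                cur = cur.left if key < cur.key else cur.right
            if parent is None:
                self.root = Node(key)
            elif key < parent.key:
                parent.left = Node(key)
            else:
                parent.right = Node(key)

        def contains(self, key):
            cur = self.root
            while cur is not None and cur.key != key:
                cur = cur.left if key < cur.key else cur.right
            return cur is not None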
Introduction to Algorithms is a great book to get you thinking about the efficiency of different algorithms and data structures.
The authors of the book also teach an algorithms course at MIT, and you can find most of the lectures online.
I would say that coming up with good algorithms (which is actually part of good design, IMHO) requires developing a way of thinking, and this is best done by studying algorithm design. By study I don't mean just knowing all the common algorithms covered in a textbook, but actually understanding how and why they work, and being able to apply the basic ideas behind them to the actual problems you are trying to solve.
I would suggest reading a good book on algorithms (my favourite is CLRS). For an online resource I would recommend the series of articles in the TopCoder Algorithm Tutorials.
I do not understand why you mention practice and memorization in the same breath. Memorization won't help you at all (you probably already know this), but practice is essential. If you cannot apply what you have learned, it's not really learning. You can practise at various online programming contest/puzzle sites like SPOJ, Project Euler and PythonChallenge.
Recommendations:
First of all, I recommend the book Introduction to Algorithms, Second Edition by Cormen et al.; it's a great book that has most (if not all) of the algorithms you will need. (Some of the more important topics are sorting algorithms, shortest paths, dynamic programming, and many data structures like BSTs, hash maps and heaps.)
Another great way to learn algorithms is http://ace.delos.com/usacogate, which is great practice once you're past the beginning.
As to your question: you will simply get used to writing good, fast-running code; after a little practice you just won't want to write inefficient code anymore.
While I think @larsmans is correct inasmuch as understanding logic and maths is a fast way to understanding how to choose useful ADTs for solving a given problem, studying existing solutions may be more instructive for those of us who struggle with those topics. In particular, reviewing the code of established open-source software (OSS) that solves a problem similar to the one you're interested in.
I find that a particularly good way to study is to review the unit tests of such a project. Apache Lucene, for example, has a source control repository containing numerous examples. While the tests don't reveal the underlying algorithms, they help you trace your way to the particular functionality that solves a certain problem. This leads to an opportunity to study its innards, i.e. an interesting algorithm. In Lucene's case, inverted indices come to mind.
While this does not guarantee that the algorithm you discover is the best, it's likely one that has received a lot of scrutiny and probably comes from a project with an active mailing list that may answer your questions. So it's a good resource for finding a solution that is probably better than what most of us would come up with on our own.
What are some simple algorithm or data structure related "white boarding" problems that you find effective during the candidate screening process?
I have some simple ones that I use to validate problem-solving skills; they can be expressed simply but still offer some opportunity for applying heuristics.
One of the basics that I use for junior developers is:
Write a C# method that takes a string containing a set of words (a sentence) and rotates those words X places to the right. When a word in the last position of the sentence is rotated, it should show up at the front of the resulting string.
When a candidate answers this question, I look to see whether they use the available .NET data structures and methods (string.Join, string.Split, List, etc.) to solve the problem. I also look for them to identify special cases for optimization, like the fact that the number of times the words actually need to be rotated isn't X, it's X % (number of words).
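The question asks for C#, but to illustrate the shape of a good answer, here is the same approach sketched in Python (split/join plus the X % n observation):

    def rotate_words(sentence, x):
        # Rotate the words in sentence x places to the right.
        words = sentence.split()
        if not words:
            return sentence
        x %= len(words)          # rotating len(words) times is a no-op
        if x == 0:
            return ' '.join(words)
        return ' '.join(words[-x:] + words[:-x])

    print(rotate_words('the quick brown fox', 1))  # 'fox the quick brown'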
What are some of the whiteboard problems that you use to interview a candidate, and what are some of the things you look for in an answer? (No need to post the actual answer.)
I enjoy the classic "what's the difference between a LinkedList and an ArrayList (or between a linked list and an array/vector) and why would you choose one or the other?"
The kind of answer I hope for is one that includes discussion of:
insertion performance
iteration performance
memory allocation/reallocation impact
impact of removing elements from the beginning/middle/end
how knowing (or not knowing) the maximum size of the list can affect the decision
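As a rough illustration of the trade-offs (sketched in Python, where list is array-backed like an ArrayList and collections.deque is the closest linked-structure analogue):

    from collections import deque

    arr = list(range(100_000))    # array-backed: O(1) indexing, O(n) front insert
    lnk = deque(range(100_000))   # linked blocks: O(1) at both ends, O(n) indexing

    arr.insert(0, -1)    # shifts every element one slot to the right: O(n)
    lnk.appendleft(-1)   # just relinks the head: O(1)

    x = arr[50_000]      # direct offset into the array: O(1)
    y = lnk[50_000]      # must walk toward the middle: O(n)

The point of the question is exactly this asymmetry: neither structure wins in general, and a good candidate can say which workloads favour which.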
Once when I was interviewing for Microsoft in college, the guy asked me how to detect a cycle in a linked list.
Having discussed the optimal solution to the problem in class the week before, I started to tell him.
He told me, "No, no, everybody gives me that solution. Give me a different one."
I argued that my solution was optimal. He said, "I know it's optimal. Give me a sub-optimal one."
At the same time, it's a pretty good problem.
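For reference, the answer "everybody gives" is presumably Floyd's tortoise-and-hare; a minimal sketch in Python, with a hypothetical Node type:

    class Node:
        def __init__(self, value, next=None):
            self.value, self.next = value, next

    def has_cycle(head):
        # Floyd's tortoise-and-hare: O(n) time, O(1) extra space.
        slow = fast = head
        while fast is not None and fast.next is not None:
            slow = slow.next          # one step
            fast = fast.next.next     # two steps
            if slow is fast:          # the pointers can only meet inside a cycle
                return True
        return False

A sub-optimal answer of the kind the interviewer wanted might be, for example, remembering visited nodes in a set, trading O(n) extra space for the same time bound.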
When interviewing recently, I was often asked to implement a data structure, usually LinkedList or HashMap. Both of these are easy enough to be doable in a short time, and difficult enough to eliminate the clueless.
This doesn't necessarily touch on OOP capabilities, but in our last set of interviews we used a selection of buggy code from the Bug of the Month list. Watching the candidates find the bugs shows their analytical capabilities and shows whether they know how to interpret somebody else's code.
Write a method that takes a string and returns true if that string is a number. (Anything with regex is the most effective answer for an interview.)
Please write an abstract factory method that doesn't contain a switch and returns types with the base type "X". (Looking for patterns, looking for reflection, looking for them not to sidestep it with an if/else-if chain.)
Please split the string "every;thing|;|else|;|in|;|he;re" by the token "|;|". (Multi-character tokens are not allowed, at least in .NET, so I'm looking for creativity; the best solution is a total hack.)
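For the first of these, a sketch of the regex approach in Python (what exactly counts as "a number" is the interesting follow-up discussion; this pattern is just one possible definition):

    import re

    # One possible definition of "a number": optional sign, digits, optional
    # decimal part. Tighten or loosen it (exponents, leading '.', etc.) as
    # the interviewer's definition demands.
    NUMBER_RE = re.compile(r'^[+-]?\d+(\.\d+)?$')

    def is_number(s):
        return NUMBER_RE.match(s) is not None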
Graphs are tough, because most non-trivial graph problems tend to require a decent amount of actual code to implement, if more than a sketch of an algorithm is required. A lot of it tends to come down to whether or not the candidate knows the shortest-path and graph-traversal algorithms, is familiar with cycle types and cycle detection, and whether they know the complexity bounds. I think a lot of questions about this stuff come down to trivia more than to on-the-spot creative thinking ability.
I think problems related to trees tend to cover most of the difficulties of graph questions, but without as much code complexity.
I like the Project Euler problem that asks you to find the most expensive path down a tree (18/67); common ancestor is a good warm-up, but a lot of people have seen it. Asking somebody to design a tree class, perform traversals, and then work out from which traversals they could rebuild a tree also gives some insight into data structure and algorithm implementation. The Stern-Brocot programming challenge is also interesting and quick to develop on a board (http://online-judge.uva.es/p/v100/10077.html).
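That Project Euler problem (maximum path sum down a triangle) has a tidy bottom-up dynamic programming answer; a minimal sketch in Python:

    def max_path_sum(triangle):
        # triangle is a list of rows, e.g. [[3], [7, 4], [2, 4, 6], [8, 5, 9, 3]].
        best = list(triangle[-1])               # best totals, starting from the bottom row
        for row in reversed(triangle[:-1]):     # fold each row upward
            best = [v + max(best[i], best[i + 1])   # take the better child below
                    for i, v in enumerate(row)]
        return best[0]

    print(max_path_sum([[3], [7, 4], [2, 4, 6], [8, 5, 9, 3]]))  # 23

Candidates who try top-down brute force first give you a natural opening to discuss exponential blow-up versus the O(n) DP.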
Follow up any question like this with: "How could you improve this code so the developer who maintains it can figure out how it works easily?"
Implement a function that, given a linked list that may be circular, swaps the first two elements, the third with the fourth, etc...
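One possible sketch in Python, swapping stored values rather than relinking nodes, with a guard so a circular list terminates after one lap (Node is a hypothetical minimal type):

    class Node:
        def __init__(self, value, next=None):
            self.value, self.next = value, next

    def swap_pairs(head):
        # Swap elements pairwise by exchanging stored values; if the list is
        # circular, stop once we arrive back at the head after one lap.
        cur = head
        while cur is not None and cur.next is not None:
            cur.value, cur.next.value = cur.next.value, cur.value
            cur = cur.next.next
            if cur is head:
                break
        return head

What the "etc..." should mean for an odd-length circular list is itself a good discussion point with the candidate.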
I like to go over code the person actually wrote and have them explain it to me.
Ask them to write a recursive algorithm for a well-known iterative solution (e.g. Fibonacci; we give them the iterative function if needed) and then have them work out its running time.
Many times the recursive function involves a tree structure. The number of times candidates have failed to recognize that baffles me. It remains slightly difficult to calculate the running time until you can see that the calls form a tree...
I find that this problem covers many areas: code-reading ability (if they are given the iterative function), code-writing ability (since they write a recursive function), algorithms, and data structures (for the running time)...
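For example, with Fibonacci the naive recursion branches into a binary tree of calls, which is exactly the structure that makes its running time exponential, versus O(n) for the iterative version:

    def fib_recursive(n):
        # Each call branches into two more calls, so the call graph is a
        # binary tree of depth ~n and the running time is exponential.
        if n < 2:
            return n
        return fib_recursive(n - 1) + fib_recursive(n - 2)

    def fib_iterative(n):
        # The iterative starting point: O(n) time, O(1) space.
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a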
A trivial one is to ask them to code up a breadth-first search of a tree from scratch. Yeah, if you know what you're doing it is trivial. But a lot of programmers don't know how to tackle it.
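A minimal sketch, assuming a simple node type with a list of children:

    from collections import deque

    class TreeNode:
        def __init__(self, value, children=None):
            self.value = value
            self.children = children or []

    def bfs(root):
        # Visit nodes level by level; the FIFO queue is the whole trick.
        if root is None:
            return []
        order, queue = [], deque([root])
        while queue:
            node = queue.popleft()
            order.append(node.value)
            queue.extend(node.children)
        return order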
One that I find more useful still is the following. I've given this in a number of languages; here is a Perl version. First I give them the following code sample:
    # @a and @b are two arrays which are already populated.
    my @int;
    OUTER: for my $x (@a) {
        for my $y (@b) {
            if ($x eq $y) {
                push @int, $x;
                next OUTER;
            }
        }
    }
Then I ask them the following questions. I ask them slowly, give people time to think, and am willing to give them nudges:
What is in @int when this code is done?
This code is put into production and there is a performance problem that is tracked back to this code. Explain the potential performance problem. (If they are struggling, I'll ask how many comparisons it takes if @a and @b each have 100,000 elements. I am not looking for specific terminology, just a back-of-the-envelope estimate.)
Without writing code, suggest how to make this faster. (If they propose a direction that is easy to code, I'll ask them to code it. If they think of a solution that would change @int in any way (commonly, its order), I'll push to see whether they realize that they shouldn't code the fix before checking whether that matters.)
If they come up with a slightly (or very) wrong solution, the following silly data set will catch most of the mistakes you'll run across:
    @a = qw(
        hello
        world
        hello
        goodbye
        earthlings
    );
    @b = qw(
        earthlings
        say
        hello
        earthlings
    );
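For reference, the fix usually being fished for here replaces the O(n*m) nested loop with a hash lookup, which preserves @int's contents and order exactly. Sketched in Python (a Perl version would build a hash from @b the same way):

    a = ['hello', 'world', 'hello', 'goodbye', 'earthlings']
    b = ['earthlings', 'say', 'hello', 'earthlings']

    b_set = set(b)                          # O(m) to build, O(1) membership tests
    result = [x for x in a if x in b_set]   # one O(n) pass; order and duplicates of a preserved
    print(result)                           # ['hello', 'hello', 'earthlings']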
I'd guess that about 2/3 of candidates fail this question. I have yet to encounter a competent programmer who had trouble with it. I've found that people with good common sense and very little programming background do better on this than average programmers with a few years of experience.
I would suggest using these questions as filters. Don't hire someone because they can answer these. But if they can't answer these, then don't hire them.
I wonder how many of you have implemented one of computer science's "classical algorithms", like Dijkstra's algorithm, or data structures (e.g. binary search trees) in a real-world, non-academic project?
Is there a benefit to our dayjobs in knowing these algorithms and data structures when there are tons of libraries, frameworks and APIs which give you the same functionality?
The library doesn't know what your problem domain is and won't be able to choose the correct algorithm for the job. That is why I think it is important to know about these algorithms: then YOU can make the correct choice of algorithm to solve YOUR problem.
Knowing, or being able to understand, these algorithms is important; these are the tools of your trade. It does not mean you have to be able to implement A* in an hour from memory. But you should be able to figure out what the advantages of using a red-black tree over a normal unbalanced tree are, so you can decide whether you need one. You need to be able to judge the fitness of an algorithm for solving your problem.
This might sound too schoolmasterish, but these "classical algorithms" were not invented to give college students exam questions; they were invented to solve problems or improve on existing solutions. Just as the array, the linked list and the stack are building blocks for writing a program, so are some of these. And just as in maths you move from addition and subtraction to integration and differentiation, these are advanced techniques that will help you solve the problems that are out there.
They might not be directly applicable to your problems or your work situation, but in the long run, knowing of them will help you as a professional software engineer.
To answer your question, I did an implementation of A* recently for a game.
Is there a benefit to understanding your tools, rather than simply knowing that they exist?
Yes, of course there is. To take a trivial example, don't you think there's a benefit to knowing the difference between List (or your language's equivalent dynamic array implementation) and LinkedList (or your language's equivalent)? It's pretty important to know that one has constant random-access time while the other's is linear, and that one requires N copies if you insert a value in the middle of the sequence while the other can do it in constant time.
Don't you think there's an advantage to understanding that the same sorting algorithm isn't always optimal? That quicksort, for example, sucks on almost-sorted data? Naively calling Sort() and hoping for the best can become ridiculously expensive if you don't understand what's happening under the hood.
Of course there are a lot of algorithms you probably won't need, but even so, just understanding how they work may make it easier for yourself to come up with efficient algorithms to solve other, unrelated, problems.
Well, someone has to write the libraries. While working at a mapping software company, I implemented Dijkstra's algorithm, as well as binary search trees, B-trees, n-ary trees, BK-trees and hidden Markov models.
Besides, if all you want is a single 'well known' algorithm, and you also want the freedom to specialise it and optimise it if it becomes critical to performance, including a whole library seems like a poor choice.
We use a home-grown implementation of a pseudo-random number generator from Knuth's Seminumerical Algorithms as an aid in some statistical processing.
In my previous workplace, an EDA company, we implemented versions of Prim's and Dijkstra's algorithms, disjoint-set data structures, A* search and more. All of these had real-world significance. I believe this depends on the problem domain: some domains are more algorithm-intensive and some less so.
Having said that, there is a fine line to walk. I see no business reason for re-implementing the STL or Java generics. In many cases, a standard library is better than reinventing the wheel. The closer you are to your core application, the more likely it is that you'll need to implement a textbook algorithm or data structure.
If you never work with performance-critical code, consider yourself lucky. However, I consider this scenario unrealistic. Performance problems can occur anywhere, and then it's necessary to know how to fix them. Obviously, merely knowing a few algorithm names isn't enough here, unless you want to implement them all and try them out one after the other.
No, knowing (at least some of) the inner workings of different algorithms is important for gauging their strengths and weaknesses and for analyzing how they would handle your situation.
Obviously, if there's a library already implementing exactly what you need, you're incredibly lucky. But let's face it, even if there is such a library, using it is often not completely straightforward (at the very least, interfaces and data representation often have to be adapted) so it's still good to know what to expect.
A* for a Pac-Man clone. It took me weeks to really get it, but to this day I consider it a thing of beauty.
I've had to implement some of the classical algorithms from numerical analysis. It was easier to write my own than to connect to an existing library. Also, I've had to write variations on classical algorithms because the textbook case didn't fit my application.
For classical data structures, I nearly always use the standard libraries, such as STL for C++. The one time recently when I thought STL didn't have the structure I needed (a heap) I rolled my own, only to have someone point out almost immediately that I didn't need to do that.
Classical algorithms I have used in actual work:
A topological sort
A red-black tree (although I will confess that I only had to implement insertion for that application, and it only got used in a prototype); this got used to implement an 'ordered dict' type structure in Python
A priority queue
State machines of various sorts
Probably one or two others I can't remember.
As to the second part of the question:
An understanding of how the algorithms work, their complexity and semantics gets used on a fairly regular basis. They also inform the design of systems. Occasionally one has to do things involving parsing or protocol handling, or some computation that's slightly clever. Having a working knowledge of what the algorithms do, how they work, how expensive they are and where one might find them lying around in library code goes a long way to knowing how to avoid reinventing the wheel poorly.
I use the Levenshtein distance algorithm to help implement a 'Did you mean [suggested word]?' feature in our website search.
Works quite well when combined with our 'tagging' system, which allows us to associate extra words (other than those in the title/description/etc.) with items in the database.
It's not perfect by any means, but it's way better than most corporate site searches, if I do say so myself ;)
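For the curious, the distance itself is a classic dynamic program; a minimal sketch of the core recurrence in Python (a production version would layer thresholds and caching on top):

    def levenshtein(s, t):
        # Minimum number of single-character edits (insert, delete, substitute)
        # needed to turn s into t, computed row by row in O(len(s) * len(t)).
        prev = list(range(len(t) + 1))       # distances from '' to each prefix of t
        for i, cs in enumerate(s, 1):
            cur = [i]                        # distance from s[:i] to ''
            for j, ct in enumerate(t, 1):
                cur.append(min(prev[j] + 1,                  # deletion
                               cur[j - 1] + 1,               # insertion
                               prev[j - 1] + (cs != ct)))    # substitution
            prev = cur
        return prev[-1]

    print(levenshtein('kitten', 'sitting'))  # 3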
Classical algorithms are usually associated with something glamorous, like games, or Web search, or scientific computation. However, I had to use some of the classical algorithms for a mere enterprise application.
I was building a metadata migration tool, and I had to use topological sort for dependency resolution, various forms of graph traversal for queries on the metadata, and a modified variant of Tarjan's union-find data structure to partition forest-like structured metadata into trees.
That was a really satisfying experience. Most of those algorithms had been implemented before, but their implementations lacked something that I needed for my task. That's why it's important to understand their internals.