LCA problem at interview - algorithm

Sometimes I come across interview questions like this: "Find the common parent of any 2 nodes in a tree". I have noticed that LCA (lowest common ancestor) questions are also asked at Google, Amazon, etc.
As Wikipedia says, the LCA can be found by intersecting the paths from the given nodes to the root, which takes O(H), where H is the height of the tree. There are also more advanced algorithms that preprocess the tree in O(N) and then answer LCA queries in O(1).
I wonder what exactly interviewers want to learn about candidates by asking this LCA question. The first algorithm, path intersection, seems trivial. Do they expect candidates to remember the preprocessing algorithms? Do they expect candidates to invent these algorithms and data structures on the fly?

They want to see what you're made of. They want to see how you think, how you tackle a problem, and how you handle stress from a deadline. I suppose that if you solve the problem because you already know the solution, they will just pose you another problem. It's not the solution they want, it's your brainz.

With many algorithm questions in an interview, I don't need (or want) you to remember the answer. I want you to be able to derive the answer from the basic principles you know.
Realistically: I couldn't care less whether you already know the answer to "how to do X", as long as you are able to quickly construct an answer on the fly. Ultimately: in the real world, I can't necessarily assume you have experience tackling problems of domain X, but if you run across a problem in said domain, I will certainly hope you have the analytical ability to figure out a reasonable answer, based on the general knowledge you hold.
A tree is a common data structure that it's safe to assume most candidates know. If you know the answer offhand, you should be able to explain it. If you don't know the answer, but understand the data structure, you should be able to derive the answer fairly easily.
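For reference, the path-intersection approach mentioned in the question can be derived in a few lines. The following is only a minimal sketch in Python, assuming every node stores a pointer to its parent; the `Node` class and the `lowest_common_ancestor` function are illustrative names, not part of any library.

```python
class Node:
    """Tree node with a parent pointer (hypothetical structure for illustration)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def lowest_common_ancestor(a, b):
    """Return the deepest node that is an ancestor of both a and b, in O(H) time."""
    # Collect every node on the path from a up to the root.
    ancestors = set()
    node = a
    while node is not None:
        ancestors.add(node)
        node = node.parent
    # Walk up from b; the first node already on a's path is the LCA.
    node = b
    while node is not None:
        if node in ancestors:
            return node
        node = node.parent
    return None  # a and b do not share a root
```

Note that a node counts as its own ancestor here, so the LCA of a node and one of its descendants is the node itself.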

Related

NN vs Greedy Search

Both NN and Greedy Search algorithms are greedy in nature, and both tend towards the lowest cost/distance (my understanding may be incorrect though). But what makes them different enough that each one is classified into a distinct algorithm group is somewhat unclear to me.
For instance, if I can solve a particular problem using NN, I can surely solve it with a Greedy Search algorithm as well, especially if the goal is minimization. I came to this conclusion because when I start coding them I come across very similar implementations, although the general concept behind each might be different. Sometimes I can't even tell whether an implementation follows NN or Greedy Search.
I have done my homework well and searched enough on Google, but couldn't find a decent explanation on what distinguishes them from one another. Any such explanation is indeed appreciated.
Hmm, at a very high level they are both driven by heuristics in order to evaluate a given solution against an ideal solution. But, whilst a greedy search algo outputs a solution for a given input, the NN trains a model that will generate solutions for given inputs. So at a very very high level, you can think that the NN generates a solution finder, whereas the greedy search is a hard-coded solution finder.
In other words, the NN will generate "code" (i.e. the model (aka the weights)) that finds solutions to the problem when provided to the same network topology. The greedy search is you actually writing the code that finds the solution to the problem. This is quite wishy-washy though; I'm sure there is a much more concise, academically sound way of expressing what I've just said.
All of what I've just said is based on the assumption that by "Greedy search" you meant the algorithms used to solve problems such as the travelling salesman problem.
Another way to think of it is:
In greedy search, you write an algorithm that solves a search problem (find me the graph that best describes the relationship, based on provided heuristic(s), between data point A and data point B).
When you write a neural network, you declare a network topology, provide some initially "random" weights and some heuristics to measure output errors, and then train the network's weights via a plethora of different methods (back prop, GAN etc). These weights can then be used as a solver for novel problems.
For what it's worth, I don't think an NN would be a good approach to generate a solver for the travelling salesman problem. You would be far better off just using a common graph search algorithm.

Precalculate Result of A*

I am currently learning about the A* search algorithm and using it to find the quickest solution to the N-Puzzle. For some random seeds of the initial state, the puzzle may be unsolvable, which results in extremely long wait times until the algorithm has searched the entire search space and determined that there is no solution for the given start state.
I was wondering if there is a method of precalculating whether the A* algorithm will fail, so that I can avoid such a scenario. I've read a bit about how it is possible but can't find a direct answer as to how to do it.
Any guidance or options are appreciated.
I think A* does not offer you a mechanism to know whether or not a problem is solvable. Specifically for N-Puzzle, I think this could help you to check if it can be solved or not:
http://www.geeksforgeeks.org/check-instance-8-puzzle-solvable/
It seems that, for the 8-puzzle, if a state has an odd number of inversions, you know for sure that the puzzle is unsolvable from that permutation.
For the N-puzzle specifically, there are only two possible parities, so you just need to check which parity the current puzzle has.
There is an in-depth explanation of how to do this on Math Stack Exchange.
For general A* problems, no, there is no way to pre-compute whether the problem is solvable.
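The parity check described above can be sketched as follows. This is a hedged illustration rather than a definitive implementation: it assumes the board is given as a flat row-major list with 0 for the blank, and that the goal state is 1, 2, ..., N followed by the blank. The function name `is_solvable` is made up for this example. For odd board widths (such as the 8-puzzle) only the inversion count matters; for even widths (such as the 15-puzzle) the row of the blank also enters the parity.

```python
def is_solvable(tiles, width):
    """Check whether a sliding puzzle can reach the goal 1, 2, ..., N, blank.

    tiles: flat row-major list of tile numbers, with 0 marking the blank.
    width: number of columns on the board.
    """
    numbers = [t for t in tiles if t != 0]
    # An inversion is a pair of tiles that appear in the wrong relative order.
    inversions = sum(
        1
        for i in range(len(numbers))
        for j in range(i + 1, len(numbers))
        if numbers[i] > numbers[j]
    )
    if width % 2 == 1:
        # Odd width: solvable exactly when the inversion count is even.
        return inversions % 2 == 0
    # Even width: the blank's row (counted from the bottom, 1-based) matters too.
    height = len(tiles) // width
    blank_row_from_bottom = height - tiles.index(0) // width
    return (inversions + blank_row_from_bottom) % 2 == 1
```

Running this before A* costs O(N^2) in the number of tiles, which is negligible compared with exhausting the search space of an unsolvable instance.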

Solving puzzle with tree data structure

I am currently following an algorithms class and we have to solve a sudoku.
I already have a working solution with naive backtracking, but I'm much more interested in solving this puzzle with a tree data structure.
My problem is that I don't quite understand how it works. Can anyone explain to me the basics of puzzle solving with a tree?
I am not seeking optimization. I am looking for an explanation of algorithms like the genetic algorithm or something similar. My purpose is only to learn at this point. I have a hard time taking what I read in scientific articles and translating it into a real implementation.
I hope I've made my question clearer.
Thank you very much!
EDIT: I edited the post to be more precise.
I don't know how to solve Sudoku "with a tree", but you should try to fill in as many cells as possible by deduction before resorting to guessing and backtracking. Therefore check out this question.
As in most search problems, in Sudoku you may also associate a tree with the problem. In this case, nodes are partial assignments to the variables and branches correspond to extensions of the assignment in the parent node. However, it is not practical (or even feasible) to store these trees in memory; you can think of them as a conceptual tool. Backtracking actually navigates such a tree, using variable ordering to avoid exploring hopeless branches. A better (faster) solution can be obtained by constraint propagation techniques, since you know that all rows, columns, and 3x3 boxes must contain different values. A simple algorithm for that is AC-3, and it is covered in most introductory AI books, including Russell & Norvig 2010.
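To make the "conceptual tree" concrete, here is a minimal backtracking sketch in Python. It is only an illustration under stated assumptions: the grid is a 9x9 list of lists with 0 for empty cells, and the function names `candidates` and `solve` are invented for this example. Each recursive call corresponds to one node of the tree (a partial assignment), each value tried is a branch, and the tree itself is never stored. The variable ordering used is "fewest candidates first"; this is plain backtracking, not AC-3.

```python
def candidates(grid, r, c):
    """Values that fit in cell (r, c) without clashing with its row, column, or box."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [v for v in range(1, 10) if v not in used]


def solve(grid):
    """Depth-first search over partial assignments; fills grid in place (0 = empty)."""
    empties = [(r, c) for r in range(9) for c in range(9) if grid[r][c] == 0]
    if not empties:
        return True  # leaf of the search tree: a complete assignment
    # Variable ordering: branch on the cell with the fewest candidates.
    r, c = min(empties, key=lambda rc: len(candidates(grid, *rc)))
    for v in candidates(grid, r, c):
        grid[r][c] = v  # descend to a child node
        if solve(grid):
            return True
        grid[r][c] = 0  # backtrack to the parent node
    return False  # no branch from this node leads to a solution
```

A cell with no candidates makes the loop body run zero times, so a hopeless branch is abandoned immediately.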

I need an algorithm to find the best path

I need an algorithm to find the best solution of a path finding problem. The problem can be stated as:
At the starting point I can proceed along multiple different paths.
At each step there are another multiple possible choices where to proceed.
There are two operations possible at each step:
A boundary condition that determines whether a path is acceptable or not.
A condition that determines whether the path has reached the final destination and can be selected as the best one.
At each step a number of paths can be eliminated, letting only the "good" paths grow.
I hope this sufficiently describes my problem, and also a possible brute force solution.
My question is: is brute force the best/only solution to the problem? I would also like some hints about the best code structure for the algorithm.
Take a look at A*, and use the length as boundary condition.
http://en.wikipedia.org/wiki/A%2a_search_algorithm
You are looking for some kind of state space search algorithm. Without knowing more about the particular problem, it is difficult to recommend one over another.
If your space is open-ended (infinite tree search), or nearly so (chess, for example), you want an algorithm that prunes unpromising paths, as well as selects promising ones. The alpha-beta algorithm (used by many OLD chess programs) comes immediately to mind.
The A* algorithm can give good results. The key to getting good results out of A* is choosing a good heuristic (weighting function) to evaluate the current node and the various successor nodes, to select the most promising path. Simple path length is probably not good enough.
Elaine Rich's AI textbook (oldie but goodie) spent a fair amount of time on various search algorithms. Full Disclosure: I was one of the guinea pigs for the text, during my undergraduate days at UT Austin.
Did you try breadth-first search (BFS)? That is, if length is the criterion for the best path.
You will also have to modify the algorithm to disregard "unacceptable" paths.
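A BFS modified in that way might look like the sketch below. It is a rough illustration, not a definitive implementation: the function name `bfs_shortest_path` and its callback parameters (`successors`, `is_acceptable`, `is_goal`) are made up for this example, and it assumes that acceptability depends only on the current state rather than on the whole path taken so far.

```python
from collections import deque


def bfs_shortest_path(start, successors, is_acceptable, is_goal):
    """Return the shortest acceptable path from start to a goal state, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        state = path[-1]
        if is_goal(state):
            return path
        for nxt in successors(state):
            # Boundary condition: prune unacceptable or already-seen states.
            if nxt in visited or not is_acceptable(nxt):
                continue
            visited.add(nxt)
            queue.append(path + [nxt])
    return None  # every acceptable path was exhausted without reaching a goal
```

Because BFS expands paths in order of length, the first path that reaches the goal is a shortest one in terms of the number of steps.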
If your problem is exactly as you describe it, you have two choices: depth-first search and breadth-first search.
Depth-first search considers a possible path, pursues it all the way to the end (or as far as it is acceptable), and only then is it compared with other paths.
Breadth-first search is probably more appropriate: at each junction you consider all possible next steps and use some score to rank the order in which each possible step is taken (strictly speaking, ranking steps by a score makes this a best-first search). This allows you to prioritise your search and find good solutions faster (but proving that you have found the best solution takes just as long as depth-first searching, and it is less easy to parallelise).
However, your problem may also be suitable for Dijkstra's algorithm depending on the details of your problem. If it is, that is a much better approach!
This would also be a good starting point to develop your own algorithm that performs much better than iterative searching (if such an algorithm is actually possible, which it may not be!)
A* plus flood fill and dynamic programming. It is hard to implement, too hard to describe in a simple post, and too valuable to just give away, so sorry, I can't provide more. Searching on flood fill and dynamic programming will put you on the path if you want to go that route.

How do 20 questions AI algorithms work?

There are simple online games of 20 questions powered by an eerily accurate AI.
How do they guess so well?
You can think of it as the Binary Search Algorithm.
In each iteration, we ask a question, which should eliminate roughly half of the possible word choices. If there are a total of N words, then we can expect to get an answer after about log2(N) questions.
With 20 questions, we should optimally be able to find a word among 2^20 (about 1 million) words.
One easy way to eliminate outliers (wrong answers) would probably be to use something like RANSAC. This would mean that, instead of taking into account all the questions which have been answered, you randomly pick a smaller subset that is enough to give you a single answer. Now you repeat that a few times with different random subsets of questions, until you see that most of the time you are getting the same result. You then know you have the right answer.
Of course this is just one way of many ways of solving this problem.
I recommend reading about the game here: http://en.wikipedia.org/wiki/Twenty_Questions
In particular the Computers section:
The game suggests that the information (as measured by Shannon's entropy statistic) required to identify an arbitrary object is about 20 bits. The game is often used as an example when teaching people about information theory. Mathematically, if each question is structured to eliminate half the objects, 20 questions will allow the questioner to distinguish between 2^20 or 1,048,576 subjects. Accordingly, the most effective strategy for Twenty Questions is to ask questions that will split the field of remaining possibilities roughly in half each time. The process is analogous to a binary search algorithm in computer science.
A decision tree supports this kind of application directly. Decision trees are commonly used in artificial intelligence.
Here, a decision tree is a binary tree that asks "the best" question at each branch to distinguish between the collections represented by its left and right children. The best question is determined by some learning algorithm that the creators of the 20 questions application use to build the tree. Then, as other posters point out, a tree 20 levels deep can distinguish about a million things.
A simple way to define "the best" question at each point is to look for a property that most evenly divides the collection in half. That way, when you get a yes/no answer to that question, you get rid of about half of the collection at each step. This way you can approximate binary search.
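As a rough sketch of that selection rule (and not a description of how any real 20 questions service works), the snippet below picks the question whose "yes" set is closest to half of the remaining candidates. The data layout and the name `best_question` are assumptions made for this example: each candidate object maps to the set of questions it would answer "yes" to.

```python
def best_question(candidates, questions):
    """Pick the question whose yes/no answer splits the candidates most evenly.

    candidates: dict mapping object name -> set of questions answered "yes".
    questions: list of question strings still available to ask.
    """
    half = len(candidates) / 2

    def imbalance(question):
        # Distance between the "yes" count and a perfect 50/50 split.
        yes = sum(1 for props in candidates.values() if question in props)
        return abs(yes - half)

    return min(questions, key=imbalance)
```

After each answer, the candidates that disagree with it are dropped and the function is called again on what remains, which is what builds the tree one level at a time.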
Wikipedia gives a more complete example:
http://en.wikipedia.org/wiki/Decision_tree_learning
And some general background:
http://en.wikipedia.org/wiki/Decision_tree
20q.net bills itself as "the neural net on the internet", and therein lies the key. It likely stores the question/answer probabilities in a sparse matrix. Using those probabilities, it's able to use a decision tree algorithm to deduce which question to ask next in order to best narrow down the possible answers. Once it has narrowed the number of possible answers to a few dozen, or if it has reached 20 questions already, it starts reading off the most likely ones.
The really intriguing aspect of 20q.net is that unlike most decision tree and neural network algorithms I'm aware of, 20q supports a sparse matrix and incremental updates.
Edit: Turns out the answer's been on the net this whole time. Robin Burgener, the inventor, described his algorithm in detail in his 2005 patent filing.
It is using a learning algorithm.
k-NN is a good example of one of these.
Wikipedia: k-Nearest Neighbor Algorithm
