This is an exam practice question I've been working on. I know of methods to do this, but as the question states, I don't know which would be the most efficient.
You are given a telephone book listing the surnames of people in alphabetical order. Describe the fastest method (clearly explain what you have to do) you can use to find a given surname. If there are n people listed in the telephone book, what is the Big O complexity of your fastest method (and explain why)?
In this case you know the phone book entries are already in order, so a binary search is probably your best bet. It works by comparing the target with the middle entry and discarding the half that cannot contain it, so the number of entries left to search is cut in half on each iteration. After at most about log2(n) halvings only one entry remains, which gives O(log n) time. It only works if your data is already sorted, however. Check out this website for time complexity in Big O notation: http://bigocheatsheet.com
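As a minimal sketch of the binary search described above (the function name find_surname and the sample book are made up for illustration), assuming the book is a sorted list of strings:

```python
from typing import Optional, Sequence


def find_surname(book: Sequence[str], target: str) -> Optional[int]:
    """Return the index of target in the sorted book, or None if absent."""
    lo, hi = 0, len(book) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # open the book in the middle
        if book[mid] == target:
            return mid
        if book[mid] < target:
            lo = mid + 1              # target can only be in the later half
        else:
            hi = mid - 1              # target can only be in the earlier half
    return None


# Hypothetical sample data, already in alphabetical order.
book = ["Adams", "Baker", "Clark", "Davis", "Evans", "Young"]
print(find_surname(book, "Davis"))
print(find_surname(book, "Zed"))
```

Each pass through the loop halves the range between lo and hi, which is where the logarithmic bound comes from.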
Assume that I have an array of Twitter users and their followers, and I want to identify the 5 users who together have the largest number of unique followers, so that if I ask them to retweet an advertisement for my product, it reaches the most users.
I do not have formal programming or computer science training. However I do understand algorithms and basic CS concepts. It would be great if a solution can be provided in a way a layman could follow.
This is the "Maximum coverage problem", which belongs to a class of problems thought to be difficult to solve efficiently (so-called NP-hard problems). You can read about the problem on Wikipedia: https://en.wikipedia.org/wiki/Maximum_coverage_problem
A simple algorithm to solve it is to enumerate all subsets of size 5 of your users and measure the size of the union of their followers. It is not an efficient solution: if you've got n users, there are "n choose 5" subsets of size 5, which grows on the order of n^5 when n is large.
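A minimal sketch of that brute-force enumeration, assuming the data is held as a dictionary mapping each user to a set of follower names (the function name best_group and the sample data are hypothetical). The example uses groups of 2 rather than 5 to keep it small:

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def best_group(followers: Dict[str, Set[str]], k: int = 5) -> Tuple[Tuple[str, ...], int]:
    """Try every group of k users and keep the one with the largest follower union."""
    best: Tuple[str, ...] = ()
    best_reach = -1
    for group in combinations(followers, k):
        # Unique followers reached if everyone in the group retweets.
        reach = len(set().union(*(followers[user] for user in group)))
        if reach > best_reach:
            best, best_reach = group, reach
    return best, best_reach


# Hypothetical sample data.
followers = {
    "ann": {"u1", "u2", "u3"},
    "bob": {"u2", "u3"},
    "cat": {"u4", "u5"},
    "dan": {"u1"},
}
print(best_group(followers, k=2))
```

This is only practical for small inputs, since the number of groups examined grows roughly like n^5 for k = 5.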
If you want a solution that's feasible to code and may be reasonably efficient in real-world cases, you might look at representing the problem as an "integer linear program" (ILP) and using a solver such as GLPK. The details of how to represent max-coverage as an ILP are given on the Wikipedia page. Getting it working will require some effort though, and it may still not work well if your problem is large.
I have a folder full of images. There are too many to just 'rank'. I made a program that shows two at a time and lets the user pick which of the two is better. At the end I would like all of the photos to be ordered from best to worst.
I am purely trying to optimize for the fewest comparisons possible. I don't care if the program runs in n cubed time. I've read the other similar questions here, but I'm looking for something more advanced.
I'm thinking maybe some sort of algorithm that based on what comparisons you've already made, the program chooses two images to compare that will offer the most information. Maybe even an algorithm that makes complex connections to help determine the orders and potential orders.
Like I said, I don't care if it is slow; I'm purely trying to minimize comparisons.
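If a total order exists, any comparison-based method needs at least log2(n!) comparisons in the worst case, which is roughly n*log2(n). This can be proved mathematically: there are n! possible orderings and each comparison can at best halve the number still consistent with the answers. There is no way around it. So regular sorting algorithms that use O(n log n) comparisons will do the job.
What you are trying to do is called 'topological sort'. Google it and read about it on Wikipedia. You can achieve partial sorts with fewer comparisons. It's a kind of gradual sort: the more comparisons you get, the better the result will be.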
However, what do you do if no total order exists? Humans are not able to generate a total order for subjective tasks.
For example picture 1 is better than 2, 2 is better than 3 but 3 is better than 1.
In this case no sorting algorithm can produce a permutation that matches all the decisions. During a topological sort, you can detect those inconsistent decisions and get rid of them.
You are looking for a sorting algorithm - pick one. Most algorithms just need a comparison function (a < b?). This is when you show the user two pictures and he has to choose the better one.
You might want to read through some of the algorithms and choose the best one for you. E.g. with quicksort, you would pick a random picture and the user would have to compare this picture against all the other pictures in the first round, which might be too boring from the end user's perspective.
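As a rough sketch of plugging a "which is better?" answer into an ordinary sort, the snippet below wraps a preference function with functools.cmp_to_key and counts how often it is called. The names rank_images, prefer and the hidden scores are assumptions for illustration; in a real program prefer would show two pictures and wait for the user's choice:

```python
import math
from functools import cmp_to_key


def rank_images(images, prefer):
    """Sort images best-first using prefer(a, b) -> True if a is better than b.

    Returns the ordered list and the number of comparisons that were asked.
    """
    asked = 0

    def compare(a, b):
        nonlocal asked
        asked += 1
        return -1 if prefer(a, b) else 1   # better images sort first

    ordered = sorted(images, key=cmp_to_key(compare))
    return ordered, asked


# Stand-in for the user: hypothetical hidden scores decide each comparison.
scores = {"a.jpg": 3, "b.jpg": 9, "c.jpg": 5, "d.jpg": 1}
ordered, asked = rank_images(list(scores), lambda a, b: scores[a] > scores[b])

# Worst-case lower bound for any comparison-based ranking of n items.
lower_bound = math.ceil(math.log2(math.factorial(len(scores))))
print(ordered, asked, lower_bound)
```

Counting the calls this way makes it easy to compare different sorting strategies by the number of questions they put to the user.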
I was asked an interview question which asked me to return the number with the biggest repetition within an array, for example, {1,1,2,3,4} returns 1.
I first proposed a method in hashtable, which requires space complexity O(n).
Then I suggested sorting the array first and then scanning through it to find the number, which requires O(N log N) time.
The interviewer was still not satisfied.
Any optimization?
Thanks.
Interviewers aren't always looking for solutions per se. Sometimes they're looking to find out your capacity to do other things. You should have asked if there were any constraints on the data such as:
is it already sorted?
are the values limited to a certain range?
This establishes your ability to think about problems rather than just blindly vomit forth stuff that you've read in textbooks. For example, if it's already sorted, you can do it in O(n) time, O(1) space simply by looking for the run with the largest size.
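A minimal sketch of the already-sorted case, scanning once for the longest run of equal values (the function name most_repeated_sorted is made up here, and ties go to the earliest run, which is an assumption):

```python
def most_repeated_sorted(values):
    """Return the value with the longest run in a sorted, non-empty sequence."""
    if not values:
        raise ValueError("values must not be empty")
    best_value, best_len = values[0], 1
    run_len = 1
    for i in range(1, len(values)):
        # Extend the current run, or start a new one when the value changes.
        run_len = run_len + 1 if values[i] == values[i - 1] else 1
        if run_len > best_len:
            best_value, best_len = values[i], run_len
    return best_value


print(most_repeated_sorted([1, 1, 2, 3, 4]))
```

Only a few counters are kept, so the extra space stays constant regardless of the input size.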
If it's unsorted but limited to the values 1..100, you can still do it in O(n) time, O(1) space by creating a count of each possible value, initially all set to zero, then incrementing for each item.
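A sketch of the bounded-range case, assuming every value is an integer between low and high inclusive (the function name and the tie rule, smallest value wins, are assumptions):

```python
def most_repeated_bounded(values, low=1, high=100):
    """Return the most frequent value, given that all values lie in low..high."""
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * (high - low + 1)    # one counter per possible value
    for v in values:
        counts[v - low] += 1
    # Index of the largest counter; max() keeps the first one on ties.
    best_index = max(range(len(counts)), key=counts.__getitem__)
    return low + best_index


print(most_repeated_bounded([4, 1, 3, 1, 2]))
```

The counter array has a fixed size that depends only on the range, not on the number of items, which is why the space counts as O(1).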
Then ask the interviewer other things like:
What sort of behaviour do they want if there are two numbers with the same count?
If they're not satisfied with your provided solutions, try to get a clue as to where they're thinking. Do they think it can be done in O(log N) or O(1)? Interviews are never a one-way street.
There are boundless other "solutions" like stashing the whole thing into a class so that you can perform other optimisations (such as caching the information, or using a different data structures which makes the operation faster). Discussing these with your interviewer will give them a chance to see you in action.
As an aside, I tell my children to always show working out in their school assignments. If they just plop down the answer and it's wrong, they'll get nothing. However, if they show their working out and get the wrong answer, the teacher can at least see that they had the right idea (they probably just made one little mistake along the way).
It's exactly the same thing here. If you simply say "hashtable" and the interviewer has a different idea, that'll be a pretty short interview question.
However, saying "based on unsorted arrays, no possibility of keeping the data in a different data structure, and no limitations on the data values, it would appear hashtables are the most efficient way, BUT, if there was some other information I'm not yet privy to, there might be a better method" will show that you've given it some thought, and possibly open a dialogue with the interviewer that will help you out.
Bottom line, when an interviewer asks you a question, don't always assume it's as straightforward as you initially think. Apart from tech knowledge, they may be looking to see how you approach problem solving, how you handle Kobayashi-Maru-type problems, how you'll work in a team, how you'll treat difficult customers, whether you're a closet psychopath and endless other possibilities.
I was reading about sorting a presorted list in which a few numbers are out of order. Someone said that the Cook-Kim algorithm is best for such cases. I googled it but found no relevant links.
Please let me know if anyone knows about it.
Thank you
The authors are Curtis R. Cook and Do Jin Kim. The paper you want is called "Best sorting algorithm for nearly sorted lists", and can be found in Communications of the ACM, 23:620–624, 1980.
I can't find anywhere to download it for free; the publisher is vigilant, and it costs $15 from ACM themselves.
To answer your question, it's a combination of an insertion sort and a quicksort, optimised for reordering mostly ordered data, i.e. bringing a previously sorted list back into sorted form after some alterations.
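The description above gives no detail, and the following is not taken from the paper. It is only a rough sketch of one way such a hybrid could be arranged, under these assumptions: out-of-order pairs are pulled out of the list, the pulled-out items are sorted (insertion sort when there are few, with Python's built-in sorted standing in for the quicksort otherwise), and the two sorted parts are merged. The function names and the threshold of 16 are made up:

```python
def insertion_sort(items):
    """Sort a list in place; fast when the list is short or nearly ordered."""
    for i in range(1, len(items)):
        x = items[i]
        j = i - 1
        while j >= 0 and items[j] > x:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = x
    return items


def sort_nearly_sorted(values, small=16):
    kept, removed = [], []
    for x in values:
        if kept and kept[-1] > x:
            # An out-of-order pair: take both items out so kept stays sorted.
            removed.append(kept.pop())
            removed.append(x)
        else:
            kept.append(x)
    # Sort the (hopefully few) removed items; sorted() stands in for quicksort.
    removed = insertion_sort(removed) if len(removed) <= small else sorted(removed)
    # Merge the two sorted lists.
    merged, i, j = [], 0, 0
    while i < len(kept) and j < len(removed):
        if kept[i] <= removed[j]:
            merged.append(kept[i])
            i += 1
        else:
            merged.append(removed[j])
            j += 1
    merged.extend(kept[i:])
    merged.extend(removed[j:])
    return merged


print(sort_nearly_sorted([1, 2, 3, 10, 4, 5, 6, 0, 7, 8]))
```

When only a few items are out of place, almost all the work is the single scan and the final merge, both linear in the list length.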
There's one research paper by them. You can view it if you have an ACM account.
Simple online games of 20 questions powered by an eerily accurate AI.
How do they guess so well?
You can think of it as the Binary Search Algorithm.
In each iteration, we ask a question that should eliminate roughly half of the possible word choices. If there are a total of N words, then we can expect to get an answer after about log2(N) questions.
With 20 questions, we should optimally be able to find a word among 2^20 = 1,048,576 (about 1 million) words.
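To make the arithmetic concrete, here is a small sketch that treats the N words as indices 0..N-1 and asks "is it in the first half?" until one candidate is left. The function name guess and the chosen secret are made up for illustration:

```python
def guess(secret_index, n):
    """Find secret_index among n candidates using yes/no halving questions.

    Returns the index found and the number of questions asked.
    """
    lo, hi = 0, n - 1
    questions = 0
    while lo < hi:
        mid = (lo + hi) // 2
        questions += 1
        if secret_index <= mid:   # "Is it in the first half?" -> yes
            hi = mid
        else:                     # -> no
            lo = mid + 1
    return lo, questions


print(guess(123456, 2**20))
```

With exactly 2^20 candidates every question splits the remainder evenly, so exactly 20 questions are asked whatever the secret is.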
One easy way to eliminate outliers (wrong answers) would probably be to use something like RANSAC. This would mean that, instead of taking into account all the questions which have been answered, you randomly pick a smaller subset that is enough to give you a single answer. You then repeat that a few times with different random subsets of questions, until you see that most of the time you are getting the same result. You then know you have the right answer.
Of course this is just one way of many ways of solving this problem.
I recommend reading about the game here: http://en.wikipedia.org/wiki/Twenty_Questions
In particular the Computers section:
The game suggests that the information (as measured by Shannon's entropy statistic) required to identify an arbitrary object is about 20 bits. The game is often used as an example when teaching people about information theory. Mathematically, if each question is structured to eliminate half the objects, 20 questions will allow the questioner to distinguish between 2^20 or 1,048,576 subjects. Accordingly, the most effective strategy for Twenty Questions is to ask questions that will split the field of remaining possibilities roughly in half each time. The process is analogous to a binary search algorithm in computer science.
A decision tree supports this kind of application directly. Decision trees are commonly used in artificial intelligence.
A decision tree is a binary tree that asks "the best" question at each branch to distinguish between the collections represented by its left and right children. The best question is determined by some learning algorithm that the creators of the 20 questions application use to build the tree. Then, as other posters point out, a tree 20 levels deep gives you a million things.
A simple way to define "the best" question at each point is to look for a property that most evenly divides the collection into half. That way when you get a yes/no answer to that question, you get rid of about half of the collection at each step. This way you can approximate binary search.
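A minimal sketch of that "most even split" rule, assuming each candidate object is stored with a yes/no answer for every question (the function name best_question and the toy data are hypothetical, and this is not how any particular 20 questions site is known to work):

```python
def best_question(candidates, questions):
    """Pick the question whose yes/no answers split the candidates most evenly."""

    def imbalance(question):
        yes = sum(1 for answers in candidates.values() if answers[question])
        no = len(candidates) - yes
        return abs(yes - no)      # 0 means a perfect half/half split

    return min(questions, key=imbalance)


# Hypothetical toy knowledge base: object -> {question: answer}.
candidates = {
    "cat":   {"animal": True, "flies": False, "has_fur": True},
    "dog":   {"animal": True, "flies": False, "has_fur": True},
    "eagle": {"animal": True, "flies": True,  "has_fur": False},
    "bat":   {"animal": True, "flies": True,  "has_fur": True},
}
print(best_question(candidates, ["animal", "flies", "has_fur"]))
```

After each answer, the candidates that disagree with it would be dropped and the rule applied again to those that remain.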
Wikipedia gives a more complete example:
http://en.wikipedia.org/wiki/Decision_tree_learning
And some general background:
http://en.wikipedia.org/wiki/Decision_tree
It bills itself as "the neural net on the internet", and therein lies the key. It likely stores the question/answer probabilities in a sparse matrix. Using those probabilities, it's able to use a decision tree algorithm to deduce which question to ask next to best narrow down the possible answers. Once it has narrowed the number of possible answers to a few dozen, or if it has reached 20 questions already, it starts reading off the most likely ones.
The really intriguing aspect of 20q.net is that unlike most decision tree and neural network algorithms I'm aware of, 20q supports a sparse matrix and incremental updates.
Edit: Turns out the answer's been on the net this whole time. Robin Burgener, the inventor, described his algorithm in detail in his 2005 patent filing.
It is using a learning algorithm.
k-NN is a good example of one of these.
Wikipedia: k-Nearest Neighbor Algorithm