I was asked an interview question: return the number that occurs most often in an array. For example, {1,1,2,3,4} returns 1.
I first proposed a hashtable-based method, which requires O(n) space.
Then I said to sort the array first and scan through it to find the answer, which takes O(n log n) time.
The interviewer was still not satisfied.
Any optimization?
Thanks.
Interviewers aren't always looking for solutions per se. Sometimes they're looking to find out your capacity to do other things. You should have asked if there were any constraints on the data, such as:
is it already sorted?
are the values limited to a certain range?
This establishes your ability to think about problems rather than just blindly vomit forth stuff that you've read in textbooks. For example, if it's already sorted, you can do it in O(n) time, O(1) space simply by looking for the run with the largest size.
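For the sorted case, a minimal sketch of that scan (assuming a non-empty array; the function name is mine):

```python
def most_frequent_sorted(arr):
    """Most frequent value in a sorted array: O(n) time, O(1) space."""
    best_val, best_len = arr[0], 1
    cur_len = 1
    for prev, x in zip(arr, arr[1:]):
        cur_len = cur_len + 1 if x == prev else 1
        if cur_len > best_len:
            best_val, best_len = x, cur_len
    return best_val

print(most_frequent_sorted([1, 1, 2, 3, 4]))  # -> 1
```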
If it's unsorted but limited to the values 1..100, you can still do it in O(n) time, O(1) space by creating a count of each possible value, initially all set to zero, then incrementing for each item.
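A sketch of that counting approach, with the assumed 1..100 range made explicit as parameters:

```python
def most_frequent_bounded(arr, lo=1, hi=100):
    """Most frequent value when every element lies in [lo, hi].
    O(n) time; the count array is O(1) space since its size is a fixed constant."""
    counts = [0] * (hi - lo + 1)
    for x in arr:
        counts[x - lo] += 1
    # Index of the largest count, mapped back to the original value range.
    return lo + max(range(len(counts)), key=counts.__getitem__)
```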
Then ask the interviewer other things like:
What sort of behaviour do they want if there are two numbers with the same count?
If they're not satisfied with your provided solutions, try to get a clue as to where they're thinking. Do they think it can be done in O(log N) or O(1)? Interviews are never a one-way street.
There are boundless other "solutions", like stashing the whole thing into a class so that you can perform other optimisations (such as caching the information, or using a different data structure that makes the operation faster). Discussing these with your interviewer will give them a chance to see you in action.
As an aside, I tell my children to always show their working in their school assignments. If they just plop down the answer and it's wrong, they'll get nothing. However, if they show their working and get the wrong answer, the teacher can at least see that they had the right idea (they probably just made one little mistake along the way).
It's exactly the same thing here. If you simply say "hashtable" and the interviewer has a different idea, that'll be a pretty short interview question.
However, saying "based on unsorted arrays, no possibility of keeping the data in a different data structure, and no limitations on the data values, it would appear hashtables are the most efficient way, BUT, if there was some other information I'm not yet privy to, there might be a better method" will show that you've given it some thought, and possibly open a dialogue with the interviewer that will help you out.
Bottom line, when an interviewer asks you a question, don't always assume it's as straightforward as you initially think. Apart from tech knowledge, they may be looking to see how you approach problem solving, how you handle Kobayashi-Maru-type problems, how you'll work in a team, how you'll treat difficult customers, whether you're a closet psychopath and endless other possibilities.
I'm taking a data-structure class, and the lecturer made the following assertion:
the number of attempts needed to insert n keys in a hash table with linear probing is independent of their order.
No proof was given, so I tried to get one myself. However, I'm stuck.
My approach at the moment: I try to show that if I swap two adjacent keys the number of attempts doesn't change. I get the idea behind it, and I think it's going in the right direction, but I can't manage to make it into a rigorous proof.
As an aside, does this fact also hold for other probing techniques, such as quadratic probing or double hashing?
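For illustration, a quick empirical sanity check of the claim (a sketch, not a proof; the table size, hash function, and key choice here are arbitrary assumptions of mine):

```python
import itertools
import random

def probes_to_insert_all(keys, size):
    """Insert keys into a linear-probing table; return the total probe count."""
    table = [None] * size
    total = 0
    for k in keys:
        i = k % size                  # simple modular hash
        while table[i] is not None:   # each occupied slot inspected is one probe
            total += 1
            i = (i + 1) % size
        table[i] = k
        total += 1                    # the final, successful probe
    return total

keys = random.sample(range(1000), 8)
totals = {probes_to_insert_all(p, 11) for p in itertools.permutations(keys)}
print(totals)  # a single value: the total is the same for every insertion order
```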
Like the title says, really: can you give an example where a linked list is the BEST data structure to use? I have been struggling to think of one, and in my code I pretty much always just use hashmaps or lists etc.
http://bigocheatsheet.com/ has a cheat sheet of big-O costs for various operations. A linked list is no better than a stack or a queue in terms of complexity, so when might someone use a linked list over these? A perfect answer would say: "Imagine I was trying to do XYZ. If I did it with an array it would look like this {enter some code}; if I do it with a linked list, it will look like this {enter more code}, and the time or space complexity is substantially better for the linked list."
I don't want an answer where someone tells me WHAT a linked list is. I know what a linked list is and how they are implemented.
Thanks
Consider a line-up of people where you want to add a lot of people somewhere in the middle. If you used a conventional ArrayList, you would need to shift every element after the insertion point, so each insertion is O(N). In a LinkedList, each insertion is O(1) once you have paid O(N) to reach the middle: linked lists are very quick at adding elements in the middle because you don't reindex anything, you just adjust a couple of local pointers.
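A bare-bones sketch of that pointer adjustment, using a hypothetical minimal Node class of my own:

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def insert_after(node, value):
    """O(1): rewire two pointers; no shifting or reindexing of later elements."""
    node.next = Node(value, node.next)

# Build a -> c, then splice b into the middle without touching anything else.
a = Node("a", Node("c"))
insert_after(a, "b")  # a -> b -> c
```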
Someone did a survey of the C++ standard template library and found that the linked list was the least used of all the common basic structures, so you're right that they are not much used. They're useful when you don't need random access, when you don't know N (or a reasonably tight upper bound on N) in advance, and when insertions and deletions are common and time-critical. An insertion in the middle is O(N), as with an array, but the actual operation is a lot cheaper (a pointer dereference rather than shifting memory); insertions at the beginning are O(1), and at the end too if you keep a tail pointer.
I have a folder full of images. There are too many to just 'rank'. I made a program that shows two at a time and lets the user pick which of the two is better. At the end I would like all of the photos to be ordered from best to worst.
I am purely trying to minimize the number of comparisons. I don't care if the program runs in n-cubed time. I've read the similar questions here, but I'm looking for something more advanced.
I'm thinking of some sort of algorithm that, based on the comparisons already made, chooses the two images whose comparison will offer the most information. Maybe even an algorithm that makes complex connections to help determine the order and potential orders.
Like I said, I don't care if it is slow; I'm purely trying to minimize comparisons.
If a total order exists, you need at least log2(n!) ≈ n·log2(n) comparisons in the worst case. This can easily be proved mathematically, and there is no way around it. So a regular O(n log n) sorting algorithm will do the job.
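The bound follows from information theory: a comparison sort must distinguish all n! possible orderings, and each comparison yields at most one bit. With the standard estimate n! ≥ (n/e)^n:

```latex
\#\text{comparisons} \;\ge\; \log_2(n!) \;=\; \sum_{k=1}^{n} \log_2 k \;\ge\; n \log_2 \frac{n}{e} \;=\; n \log_2 n - n \log_2 e
```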
What you are trying to do is called 'topological sort'. Google it and read about it on Wikipedia. You can achieve partial sorts with fewer comparisons; it's a kind of gradual sort: the more comparisons you make, the better the result will be.
However, what do you do if no total order exists? Humans are not able to generate a total order for subjective tasks.
For example picture 1 is better than 2, 2 is better than 3 but 3 is better than 1.
In this case no sorting algorithm can produce a permutation that matches all the decisions. During a topological sort, you can detect those inconsistent decisions and get rid of them.
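A sketch of how that detection could look, treating each decision as a 'winner -> loser' edge and running Kahn's topological sort (the function and variable names are mine):

```python
from collections import defaultdict, deque

def order_pictures(items, beats):
    """beats = [(winner, loser), ...]. Returns (order, cyclic): a best-to-worst
    order for the consistent items, plus items trapped in inconsistency cycles."""
    succ, indeg = defaultdict(set), {x: 0 for x in items}
    for w, l in beats:
        if l not in succ[w]:
            succ[w].add(l)
            indeg[l] += 1
    queue = deque(x for x in items if indeg[x] == 0)
    order = []
    while queue:
        x = queue.popleft()
        order.append(x)
        for y in succ[x]:
            indeg[y] -= 1
            if indeg[y] == 0:
                queue.append(y)
    cyclic = [x for x in items if x not in order]  # e.g. 1 > 2 > 3 > 1
    return order, cyclic

print(order_pictures([1, 2, 3], [(1, 2), (2, 3), (3, 1)]))  # ([], [1, 2, 3])
```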
You are looking for a sorting algorithm - pick one. Most algorithms just need a comparison function (is a < b?). That comparison is where you show the user two pictures and have them choose the better one.
You might want to read through some of the algorithms and choose the one that fits best. E.g. with quicksort, you would pick a random picture as the pivot, and in the first round the user would have to compare it against all the other pictures - which might be too boring from the end user's perspective.
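A sketch of the idea with merge sort instead, which stays close to the n·log2(n) comparison minimum (the prompt format and function names are my own):

```python
def ask(a, b):
    """Show pictures a and b; return True if the user says a is better."""
    return input(f"Is {a} better than {b}? [y/n] ").strip().lower() == "y"

def rank(pics, better=ask):
    """Merge sort, best first, using roughly n*log2(n) user comparisons."""
    if len(pics) <= 1:
        return pics
    mid = len(pics) // 2
    left, right = rank(pics[:mid], better), rank(pics[mid:], better)
    merged = []
    while left and right:
        merged.append(left.pop(0) if better(left[0], right[0]) else right.pop(0))
    return merged + left + right
```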
I wasn't entirely sure the best way to ask this question (or do the research to see if it has been previously answered).
Given a data set where each entry has a Point value and a Dollar value, I'm looking to generate a list of length N entries that yields the highest aggregate Point value whilst staying within budget B.
Example data set:
Item    Points  Dollars
Apple   3.0     $1.00
Pear    2.5     $0.75
Peach   2.8     $0.88
And with this (small) data set, say my budget (B) is $2.25, and list length (N) must be 2. You MUST use the fixed list length, but are not required to use ALL of the budget.
Obviously the example provided is easy to do in one's head, but given a much larger data set, and both higher N and B values, I'm looking for an algorithm that can generate the list. Having a hard time wrapping my head around this one.
Just looking for a pseudo-algorithm, but if you prefer any given language feel free to respond with that!
I am quite positive that an NP-complete problem can be reduced to this, which makes it NP-hard, so it's not really worth trying to develop a procedure that always gives the 'correct' answer: many people have tried and failed to do that efficiently over large data sets. However, you can use a much more efficient approximation technique; while it won't guarantee the correct answer, popular approximation algorithms can achieve a high degree of accuracy.
Hope this helps you out :)
This problem is NP-complete (it is in NP and is NP-hard), meaning that no known algorithm solves it in an amount of time polynomial in the input size. If you found such an algorithm, you would have solved one of the greatest problems in computer science (P = NP), which would bring you at least a million-dollar reward.
If you are satisfied with an approximation, I would recommend the Greedy-Algorithm:
https://en.wikipedia.org/wiki/Greedy_algorithm
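A sketch of how a greedy heuristic might look for this specific variant, picking by points-per-dollar ratio until the list is full. The tie-breaking and the requirement that prices be positive are assumptions of mine, and the example shows the greedy answer is not always optimal:

```python
def greedy_pick(items, n, budget):
    """items: (name, points, dollars) tuples with dollars > 0.
    Greedily choose exactly n items within budget by points-per-dollar ratio.
    An approximation only: it can miss the optimum, or fail to fill n slots
    even when a valid list exists."""
    by_ratio = sorted(items, key=lambda it: it[1] / it[2], reverse=True)
    chosen, spent = [], 0.0
    for name, pts, cost in by_ratio:
        if len(chosen) < n and spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen if len(chosen) == n else None

data = [("Apple", 3.0, 1.00), ("Pear", 2.5, 0.75), ("Peach", 2.8, 0.88)]
print(greedy_pick(data, 2, 2.25))  # ['Pear', 'Peach'] (5.3 points), although
                                   # Apple+Peach (5.8 points) also fits the budget
```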
I have a problem that I have a number of questions about. First, I'm mostly looking for help describing and understanding the problem at hand. Solutions are always welcome, but most importantly I could use some advice from someone more experienced than I. Now, to the problem at hand:
I have a set of orders that each require some number of items. I also have several groupings of items that each contain some number of some items (call them groups). The goal is to find a subset of the orders that can be fulfilled using as few groups as possible and where the total number of items contained within the orders is between n and N.
Edit: The constraints on the number of items contained in the orders (n and N) are chosen independently.
To me at least, that's a really complicated way of stating the problem, so I've been trying to re-phrase it as a knapsack problem (I suspect this might reduce to subset-sum). To aid my conceptual understanding, I've started using the following definitions:
First, let's say that a dimension exists for each possible item, and something's 'length' in that dimension is the number of that particular type of item it either has or requires.
From this, an order becomes an 'n-dimensional object' where its value in each dimension corresponds to the number of that item that it requires.
In addition, a group can be seen as an 'n-dimensional box' that has space in each dimension corresponding to the number of items it provides.
An object's value is equal to the sum of its lengths in all dimensions.
Boxes can be combined.
Given the above I've rephrased the problem to this:
What is the smallest combination of boxes that can hold a combination of objects with total value between n and N?
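Encoding those definitions directly, with orders and groups represented as item -> count mappings (a sketch with names of my own):

```python
from collections import Counter

def combine(vecs):
    """'Boxes can be combined': the combination is the per-dimension sum."""
    total = Counter()
    for v in vecs:
        total.update(v)
    return total

def fits(objects, boxes):
    """Do the combined boxes have enough length in every dimension?"""
    need, have = combine(objects), combine(boxes)
    return all(have[item] >= count for item, count in need.items())

def value(objects):
    """An object's value is the sum of its lengths over all dimensions."""
    return sum(combine(objects).values())

orders = [Counter(apples=2, pears=1), Counter(apples=1)]
groups = [Counter(apples=3), Counter(pears=2)]
print(fits(orders, groups), value(orders))  # True 4
```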
Question #1: Is this a correct/useful way to express the problem? Does it seem like I've missed anything obvious?
As I see it, since there are two combinations that I'm looking for I need to break the problem into two parts. So far I think breaking the problem up like this is a good step:
How many objects can box (or combination of boxes) X hold?
Check all (or preferably some small subset of) the possible combinations of boxes and pick the 'best'.
That makes it a little more manageable, but I'm still struggling with the details.
Question #2: Solved. To solve the first part, I think it's appropriate to say that the cost of an object is equal to the sum of its lengths in all dimensions, as is its value. That places me in a subset-sum problem, right? Obviously it's a special case, but does this problem have a name?
Question #3: Solved I've been looking into subset-sum solutions a lot, but I don't understand how to apply them to something like this in multiple dimensions. I assume it's been done before, but I'm unsure where to start my research. Could someone either describe the principles at work or point me in a research direction?
Edit: After looking at everyone's feedback and digging into the terms, I think I've found a good algorithm to implement for part 1. Since I will have a very large number of dimensions compared to the number of items, a 'primal effective capacity heuristic (PECH)' looks like a good fit. I'd be interested in hearing someone's thoughts if they have experience with this algorithm.
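I can't vouch for the published PECH details, but the general greedy shape for part 1 might look like the following sketch. Since cost equals value here, it simply keeps adding the largest object that still fits in every dimension (all names are mine; this is a naive greedy, not PECH itself):

```python
def greedy_pack(objects, capacity):
    """objects: dicts mapping dimension -> required length.
    capacity: dict mapping dimension -> available length (the combined boxes).
    Naive greedy: take objects largest-first while they fit in every dimension."""
    remaining = dict(capacity)
    packed = []
    for obj in sorted(objects, key=lambda o: sum(o.values()), reverse=True):
        if all(remaining.get(d, 0) >= need for d, need in obj.items()):
            for d, need in obj.items():
                remaining[d] -= need
            packed.append(obj)
    return packed

orders = [{"apples": 2, "pears": 1}, {"apples": 1}, {"pears": 2}]
print(greedy_pack(orders, {"apples": 3, "pears": 2}))
```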
Question #4: For the second part, performance is a concern and I doubt brute force will be realistic. So I intend to treat all combinations of boxes as a really big tree of solutions: compute part 1 for all combinations of M-1 boxes (where M is the total number of boxes), somehow determine the 'best' few box combinations from that set, and do the same for their child nodes in the tree. Does this sound like it would get me close to optimal? How would I choose the 'best' box combinations?
Thanks for reading! Suggestions for edits and clarifications are welcome.