Two Sum and its Time Complexity - algorithm

I was going over both solutions to Two Sum on leetcode and I noticed that the n^2 solution is to basically test all combinations of two numbers and see if they sum to Target.
I understand the naive solution iterates over each element of the array( or more precisely n-1 times because we can't compare the last element to itself) to grab the first addend and then another loop to grab all of the following elements. This second loop needs to iterate n-1-i times where i is the index of the first addend. I can see that (n-1)(n-1-i) is O(n^2).
The problem comes when I googled "algorithm for finding combinations" and it lead to this thread where the accepted answer talks about Gray Codes which goes way above my head.
Now I'm unsure whether my assumption was correct, the naive solution is a version of a Gray Code, or something else.
If Two Sum is a combinations problem then it's Time Complexity would be O(n!/ ((n-k)! k!)) or O(nCk) but I don't see how that reduces to O(n^2)

I read the Two Sum question and it states that:
Given an array of integers nums and an integer target, return indices
of the two numbers such that they add up to target.
It is a combinations problem. However, on closer inspection you will find that here the value of k is fixed.
You need to find two numbers from a list of given numbers that
add up to a particular target.
Any two numbers from n numbers can be selected in nC2 ways.
nC2 = n!/((n-2)! * 2!)
= n*(n-1)*(n-2)!/((n-2)!*2)
= n*(n-1)/2
= (n^2 - n)/2
Ignoring n and the constant 2 as it will hardly matter when n tends to infinity. The expressions finally results in a complexity of O(n^2).
Hence, a naïve solution of Two Sum has a complexity of O(n^2). Check this article for more information on your question.
https://www.geeksforgeeks.org/given-an-array-a-and-a-number-x-check-for-pair-in-a-with-sum-as-x/

Related

Given n numbers find minimum perimeter triangle

Encountered this problem in coding contest. Could think only O(n^2 log(n)) solution. I guess the expected was O(n log n).
I am given n numbers, I have to find 3 numbers that follow triangle inequality and have the smallest sum.
I hope this is quite easy to understand.
Eg.
10,2,5,1,8,20
Answer is 23 = (5+8+10)
The longest side should be the successor of the second longest; otherwise, we could shrink the longest and thus the perimeter. Now you can use your binary search to find the third side over O(n) possibilities instead of O(n^2) (and actually, you don't even need to search if you iterate from small to large, though the sort will still cost you).
I think the answer is something like this, assuming no duplicates in the numbers.
Sort the numbers. Then scan the numbers and take the first number that is smaller than the sum of the preceding two numbers. Call that x(n) . . . the nth position of the sorted series.
x(n) is one of the numbers, and so far we are O(n log(n)).
Then there are a limited number of previous choices. Then x(n-1) has to be one of the numbers, because x(n-2) + x(n-3) < x(n-1) < x(n). Then it is a simple scan up to x(n-1) to find the smallest number that matches. This could be at the beginning of the series, as in 2, 3, 8, 15, 16.
I think the analysis is essentially the same with duplicates.

Finding the m Largest Numbers

This is a problem from the Cormen text, but I'd like to see if there are any other solutions.
Given an array with n distinct numbers, you need to find the m largest ones in the array, and have
them in sorted order. Assume n and m are large, but grow differently. In particular, you need
to consider below the situations where m = t*n, where t is a small number, say 0.1, and then the
possibility m = √n.
The solution given in the book offers 3 options:
Sort the array and return the top m-long segment
Convert the array to a max-heap and extract the m elements
Select the m-th largest number, partition the array about it, and sort the segment of larger entries.
These all make sense, and they all have their pros and cons, but I'm wondering, is there another way to do it? It doesn't have to be better or faster, I'm just curious to see if this is a common problem with more solutions, or if we are limited to those 3 choices.
The time complexities of the three approaches you have mentioned are as follows.
O(n log n)
O(n + m log n)
O(n + m log m)
So option (3) is definitely better than the others in terms of asymptotic complexity, since m <= n. When m is small, the difference between (2) and (3) is so small it would have little practical impact.
As for other ways to solve the problem, there are infinitely many ways you could, so the question is somewhat poor in this regard. Another approach I can think of as being practically simple and performant is the following.
Extract the first m numbers from your list of n into an array, and sort it.
Repeatedly grab the next number from your list and insert it into the correct location in the array, shifting all the lesser numbers over by one and pushing one out.
I would only do this if m was very small though. Option (2) from your original list is also extremely easy to implement if you have a max-heap implementation and will work great.
A different approach.
Take the first m numbers, and turn them into a min heap. Run through the array, if its value exceeds the min of the top m then you extract the min value and insert the new one. When you reach the end of the array you can then extract the elements into an array and reverse it.
The worst case performance of this version is O(n log(m)) placing it between the first and second methods for efficiency.
The average case is more interesting. On average only O(m log(n/m)) of the elements are going to pass the first comparison test, each time incurring O(log(m)) work so you get O(n + m log(n/m) log(m)) work, which puts it between the second and third methods. But if n is many orders of magnitude greater than m then the O(n) piece dominates, and the O(n) median select in the third approach has worse constants than the one comparison per element in this approach, so in this case this is actually the fastest!

Partial selection sort vs Mergesort to find "k largest in array"

I was wondering if my line of thinking is correct.
I'm preparing for interviews (as a college student) and one of the questions I came across was to find the K largest numbers in an array.
My first thought was to just use a partial selection sort (e.g. scan the array from the first element and keep two variables for the lowest element seen and its index and swap with that index at the end of the array and continue doing so until we've swapped K elements and return a copy of the first K elements in that array).
However, this takes O(K*n) time. If I simply sorted the array using an efficient sorting method like Mergesort, it would only take O(n*log(n)) time to sort the entire array and return the K largest numbers.
Is it good enough to discuss these two methods during an interview (comparing log(n) and K of the input and going with the smaller of the two to compute the K largest) or would it be safe to assume that I'm expected to give a O(n) solution for this problem?
There exists an O(n) algorithm for finding the k'th smallest element, and once you have that element, you can simply scan through the list and collect the appropriate elements. It's based on Quicksort, but the reasoning behind why it works are rather hairy... There's also a simpler variation that probably will run in O(n). My answer to another question contains a brief discussion of this.
Here's a general discussion of this particular interview question found from googling:
http://www.geeksforgeeks.org/k-largestor-smallest-elements-in-an-array/
As for your question about interviews in general, it probably greatly depends on the interviewer. They usually like to see how you think about things. So, as long as you can come up with some sort of initial solution, your interviewer would likely ask questions from there depending on what they were looking for exactly.
IMHO, I think the interviewer wouldn't be satisfied with either of the methods if he says the dataset is huge (say a billion elements). In this case, if K to be returned is huge (nearing a billion) your partial selection would almost result in an O(n^2). I think it entirely depends on the intricacies of the question proposed.
EDIT: Aasmund Eldhuset's answer shows you how to achieve the O(n) time complexity.
If you want to find K (so for K = 5 you'll get five results - five highest numbers ) then the best what you can get is O(n+klogn) - you can build prority queue in O(n) and then invoke pq.Dequeue() k times. If you are looking for K biggest number then you can get it with O(n) quicksort modification - it's called k-th order statistics. Pseudocode looks like that: (it's randomized algorithm, avg time is approximately O(n) however worst case is O(n^2))
QuickSortSelection(numbers, currentLength, k) {
if (currentLength == 1)
return numbers[0];
int pivot = random number from numbers array;
int newPivotIndex = partitionAroundPivot(numbers) // check quicksort algorithm for more details - less elements go left to the pivot, bigger elements go right
if ( k == newPivotIndex )
return pivot;
else if ( k < newPivotIndex )
return QuickSortSelection(numbers[0..newPivotIndex-1], newPivotIndex, k)
else
return QuickSortSelection(numbers[newPivotIndex+1..end], currentLength-newPivotIndex+1, k-newPivotIndex);
}
As i said this algorithm is O(n^2) worst case because pivot is chosen at random (however probability of running time of ~n^2 is something like 1/2^n). You can convert it deterministic algorithm with same running time worst case using for instance median of three median as a pivot - but it is slower in practice (due to constant).

Finding largest and second largest of n numbers in average n + log n comparisons

We know that the easy way to find the smallest number of a list would simply be n comparisons, and if we wanted the 2nd smallest number we could go through it again or just keep track of another variable during the first iteration. Either way, this would take 2n comparisons to find both numbers.
So suppose that I had a list of n distinct elements, and I wanted to find the smallest and the 2nd smallest. Yes, the optimal algorithm takes at most n + ceiling(lg n) - 2 comparisons. (Not interested in the optimal way though)
But suppose then that you're forced to use the easy algorithm, the one that takes 2n comparisons. In the worst case, it'd take 2n comparisons. But what about the average? What would be the average number of comparisons it'd take to find the smallest and the 2nd smallest using the easy brute force algorithm?
EDIT: It'd have to be smaller than 2n -- (copied and pasted from my comment below) I compare the index I am at to the tmp2 variable keeping track of 2nd smallest. I don't need to make another comparison to tmp1 variable keeping track of smallest unless the value at my current index is smaller than tmp2. So you can reduce the number of comparisons from 2n. It'd still take more than n though. Yes in worst case this would still take 2n comparisons. But on average if everything is randomly put in...
I'd guess that it'd be n + something comparisons, but I can't figure out the 2nd part. I'd imagine that there would be some way to involve log n somehow, but any ideas on how to prove that?
(Coworker asked me this at lunch, and I got stumped. Sorry) Once again, I'm not interested in the optimal algorithm since that one is kinda common knowledge.
As you pointed out in the comment, there is no need for a second comparison if the current element in the iteration is larger than the second smallest found so far. What is the probability for a second comparison if we look at the k-th element ?
I think this can be rephrased as follows "What is the probability that the k-th element is in the subset containing the 2 smallest elements of the first k elements?"
This should be 2/k for uniformly distributed elements, because if we think of the first k elements as an ordered list, every position has equal probability 1/k for the k-th element, but only two, the smallest and second smallest position, cause a second comparison. So the number of 2nd comparisons should be sum_k=1^n (2/k) = 2 H_n (the n-th harmonic number). This is actually the calculation of the expected value for second comparisons, where the random number represents the event that a second comparison has to be done, it is 1 if a second comparison has to be done and 0 if just one comparison has to be done.
If this is correct, the overall number of comparisons in the average case is C(n) = n + 2 H_n and afaik H_n = theta(log(n)), C(n) = theta(n + log(n)) = theta(n)

Is it possible to find two numbers whose difference is minimum in O(n) time

Given an unsorted integer array, and without making any assumptions on
the numbers in the array:
Is it possible to find two numbers whose
difference is minimum in O(n) time?
Edit: Difference between two numbers a, b is defined as abs(a-b)
Find smallest and largest element in the list. The difference smallest-largest will be minimum.
If you're looking for nonnegative difference, then this is of course at least as hard as checking if the array has two same elements. This is called element uniqueness problem and without any additional assumptions (like limiting size of integers, allowing other operations than comparison) requires >= n log n time. It is the 1-dimensional case of finding the closest pair of points.
I don't think you can to it in O(n). The best I can come up with off the top of my head is to sort them (which is O(n * log n)) and find the minimum difference of adjacent pairs in the sorted list (which adds another O(n)).
I think it is possible. The secret is that you don't actually have to sort the list, you just need to create a tally of which numbers exist. This may count as "making an assumption" from an algorithmic perspective, but not from a practical perspective. We know the ints are bounded by a min and a max.
So, create an array of 2 bit elements, 1 pair for each int from INT_MIN to INT_MAX inclusive, set all of them to 00.
Iterate through the entire list of numbers. For each number in the list, if the corresponding 2 bits are 00 set them to 01. If they're 01 set them to 10. Otherwise ignore. This is obviously O(n).
Next, if any of the 2 bits is set to 10, that is your answer. The minimum distance is 0 because the list contains a repeated number. If not, scan through the list and find the minimum distance. Many people have already pointed out there are simple O(n) algorithms for this.
So O(n) + O(n) = O(n).
Edit: responding to comments.
Interesting points. I think you could achieve the same results without making any assumptions by finding the min/max of the list first and using a sparse array ranging from min to max to hold the data. Takes care of the INT_MIN/MAX assumption, the space complexity and the O(m) time complexity of scanning the array.
The best I can think of is to counting sort the array (possibly combining equal values) and then do the sorted comparisons -- bin sort is O(n + M) (M being the number of distinct values). This has a heavy memory requirement, however. Some form of bucket or radix sort would be intermediate in time and more efficient in space.
Sort the list with radixsort (which is O(n) for integers), then iterate and keep track of the smallest distance so far.
(I assume your integer is a fixed-bit type. If they can hold arbitrarily large mathematical integers, radixsort will be O(n log n) as well.)
It seems to be possible to sort unbounded set of integers in O(n*sqrt(log(log(n))) time. After sorting it is of course trivial to find the minimal difference in linear time.
But I can't think of any algorithm to make it faster than this.
No, not without making assumptions about the numbers/ordering.
It would be possible given a sorted list though.
I think the answer is no and the proof is similar to the proof that you can not sort faster than n lg n: you have to compare all of the elements, i.e create a comparison tree, which implies omega(n lg n) algorithm.
EDIT. OK, if you really want to argue, then the question does not say whether it should be a Turing machine or not. With quantum computers, you can do it in linear time :)

Resources