Arithmetic sequence in a list of numbers - algorithm

For a given set of numbers
3 5 3 6 3 4 10 4 5 2
I wish to find all the **triplets** which form an arithmetic progression,
like (3,3,3), (3,4,5), (6,4,2), (3,4,5).
I have a trivial O(n^3) solution. I was wondering if it can be done in time O(n^2) or less.
Any help is highly appreciated.

O(n^2 log n) can be achieved by:
Sort the array - O(n log n).
Iterate over all pairs (O(n^2) of those), and for each pair (x,y) do a binary search to see whether max{x,y} + |x-y| or min{x,y} - |x-y| is an element.
Special care should be taken for pairs where x == y, but that can easily be handled within the same time complexity.
Note that this solution will give you one occurrence of each triplet (no duplicates).
(EDIT: by using a hash table (a histogram, if you care about the number of triplets) and looking in it instead of sorting the array and doing binary searches, you can reduce the time to O(n^2) on average, at the cost of O(n) additional space.)
Without the one-occurrence restriction it cannot be done better than O(n^3), because there can be Θ(n^3) such triplets: in the array [1,1,1,...,1], for example, every choice of three indices works, giving C(n,3) triplets.
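A minimal sketch of the hash-based variant in Python (my illustration, not code from the answer; the name ap_triplets is mine). It sorts the distinct values, tries to extend each pair to the right, and handles the all-equal case via a multiplicity count:

```python
from collections import Counter

def ap_triplets(nums):
    # distinct values in ascending order; a set gives O(1) average lookups
    values = sorted(set(nums))
    present = set(values)
    counts = Counter(nums)

    triplets = set()
    # the x == y special case: (v, v, v) needs a value occurring >= 3 times
    for v, c in counts.items():
        if c >= 3:
            triplets.add((v, v, v))

    # for each pair x < y, check whether the progression extends to 2y - x;
    # the "extend to the left" case is found later via the pair (x - d, x)
    for i, x in enumerate(values):
        for y in values[i + 1:]:
            if 2 * y - x in present:
                triplets.add((x, y, 2 * y - x))
    return triplets

print(ap_triplets([3, 5, 3, 6, 3, 4, 10, 4, 5, 2]))
# {(3, 3, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (2, 4, 6), (2, 6, 10)} (in some order)
```

Checking only the right extension is enough: every strictly increasing triplet (a, b, c) is discovered from its first two elements when the loop reaches the pair (a, b).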

One can use hashing to solve it in O(n^2), by choosing a middle element and then finding the matching first and last elements in O(n).
This is the classic problem of finding two numbers in an array whose sum is fixed: for a triplet (a, b, c) in arithmetic progression, a + c must equal 2b.
Therefore, for each candidate middle element b, look for a and c such that a + c = 2b.
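A quick sketch of that idea (mine; it assumes we only want to detect whether such a triplet of values exists at distinct indices). Fix the middle index j, then do the classic two-sum scan with a hash set, O(n) per middle element and O(n^2) overall:

```python
def has_ap_triplet(nums):
    n = len(nums)
    for j in range(n):                    # nums[j] plays the middle element b
        target = 2 * nums[j]              # we need a + c == 2b
        seen = set()
        for i in range(n):
            if i == j:
                continue
            if target - nums[i] in seen:  # classic two-sum check in O(1)
                return True
            seen.add(nums[i])
    return False

print(has_ap_triplet([3, 5, 3, 6, 3, 4, 10, 4, 5, 2]))  # True, e.g. (3, 4, 5)
```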

Related

Why is the time complexity of the selection sort algorithm O(n^2)

Let's take an array of 5 numbers for example and we wish to sort them using the selection sort algorithm.
On the first iteration, we will iterate over the 5 numbers and swap the lowest number found with the element at index 0.
The next iteration starts at index 1 and iterates over the other 4 numbers doing the same; the next starts at index 2 and iterates over 3 numbers, etc.
There will therefore be 5 + 4 + 3 + 2 = 14 index checks. If the time complexity is stated as O(n^2), then shouldn't this be 25? I understand the bubble sort algorithm having time complexity O(n^2), but not this.
I think you don't understand the real meaning of big-O notation; the example that you provided is not valid. Big-O notation "is used to classify algorithms according to how their run time or space requirements grow as the input size grows". Moreover, it does not give an exact count of the steps taken on small inputs. For more info you can visit this link: https://en.wikipedia.org/wiki/Big_O_notation
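To make the distinction concrete, here is a short counting sketch (mine, not from the answer above). Selection sort performs exactly n(n-1)/2 comparisons, which is Θ(n^2) even though it is not n^2 exactly:

```python
def selection_sort_with_count(a):
    a = list(a)
    comparisons = 0
    n = len(a)
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):   # (n-1) + (n-2) + ... + 1 checks in total
            comparisons += 1
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return a, comparisons

print(selection_sort_with_count([5, 2, 4, 1, 3]))
# ([1, 2, 3, 4, 5], 10) -- n(n-1)/2 = 10 for n = 5, while n^2 = 25
```

Whether you count 10, 14, or 25 steps for n = 5, all of these grow like n^2 up to a constant factor, which is all that O(n^2) claims.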

Binary search with Random element

I know that binary search has time complexity O(log n) for searching an element in a sorted array. But let's say that instead of selecting the middle element we select a random element; how would that impact the time complexity? Will it still be O(log n), or will it be something else?
For example:
A traditional binary search in an array of size 18 will narrow down like 18 -> 9 -> 4 ...
My modified binary search picks a random element and discards the right part or the left part based on its value.
My attempt:
Let C(N) be the average number of comparisons required by a search among N elements. For simplicity, we assume that the algorithm only terminates when there is a single element left (no early termination on strict equality with the key).
As the pivot value is chosen at random, the probabilities of the remaining sizes are uniform and we can write the recurrence
C(N) = 1 + (1/N) · (C(1) + C(2) + ... + C(N)).
Then
N·C(N) - (N-1)·C(N-1) = 1 + C(N),
and
C(N) - C(N-1) = 1/(N-1).
The solution of this recurrence is the harmonic series, hence the behavior is indeed logarithmic:
C(N) ~ ln(N-1) + γ.
Note that this is the natural logarithm, which is better than the base-2 logarithm by a factor of 1.44!
My bet is that adding the early-termination test would further improve the log base (and keep the logarithmic behavior), but at the same time double the number of comparisons per step, so that globally it would be worse in terms of comparisons.
Let us assume we have an array of size 18 and the number I am looking for is in the 1st spot. In the worst case, I always randomly pick the highest element (18 -> 17 -> 16 ...), effectively eliminating only one element per iteration. The search degenerates into a linear search: O(n) time.
The recursion in the answer of @Yves Daoust relies on the assumption that the target element is located either at the beginning or at the end of the array. In general, the position of the target within the array changes after each recursive call, which makes the recurrence difficult to write down and solve. Here is another argument that proves an O(log n) bound on the expected number of recursive calls.
Let T be the (random) number of elements checked by the randomized version of binary search. We can write T = Σ_i I{element i is checked}, where the sum runs over i from 1 to n and I{element i is checked} is an indicator variable. Our goal is to asymptotically bound E[T] = Σ_i Pr{element i is checked}. For the algorithm to check element i, this element must at some point be selected uniformly at random from a subarray of size at least |j - i| + 1, where j is the index of the element we are searching for. This is because smaller subarrays simply cannot contain the element at index i, while the element at index j is contained in the subarray during every recursive call. Thus, the probability that the algorithm checks the element at index i is at most 1/(|j - i| + 1); in fact, with a bit more effort one can show that this probability is exactly 1/(|j - i| + 1). Thus we have
E[T] = Σ_i Pr{element i is checked} ≤ Σ_i 1/(|j - i| + 1) = O(log n),
where the last equality follows by summing the harmonic series.
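A simple simulation (my sketch; note that, unlike the no-early-termination model in the first analysis, this version does stop on equality) makes the logarithmic behavior easy to check empirically:

```python
import math
import random

def randomized_binary_search_checks(a, key):
    # returns how many elements were checked while searching for `key`
    lo, hi = 0, len(a) - 1
    checked = 0
    while lo <= hi:
        mid = random.randint(lo, hi)   # random pivot instead of the midpoint
        checked += 1
        if a[mid] == key:
            return checked
        elif a[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return checked                      # key not present

n = 1 << 16
a = list(range(n))
trials = 10_000
avg = sum(randomized_binary_search_checks(a, random.randrange(n))
          for _ in range(trials)) / trials
print(avg, math.log2(n))  # the average stays O(log n); log2(n) = 16 here
```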

Median of medians algorithm: why divide the array into blocks of size 5

In the median-of-medians algorithm, we need to divide the array into chunks of size 5. I am wondering how the inventors of the algorithm came up with the magic number 5 and not, say, 7 or 9 or something else.
The number has to be larger than 3 (and odd, obviously) for the algorithm to work. 5 is the smallest odd number larger than 3, so 5 was chosen.
I think that if you check the "Proof of O(n) running time" section of the wiki page for the median-of-medians algorithm, it will become clear:
The median-calculating recursive call does not exceed worst-case linear behavior, because the list of medians is 20% of the size of the list, while the other recursive call recurses on at most 70% of the list, making the running time
T(n) ≤ T(n/5) + T(7n/10) + cn.
The O(n) term cn is for the partitioning work (we visit each element a constant number of times, in order to form them into n/5 groups and take each median in O(1) time).
From this, using induction, one can easily show that T(n) ≤ 10cn, i.e., T(n) = O(n).
That should help you to understand why.
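Spelling out that induction (my worked step, with c the constant from the recurrence above): guess T(n) ≤ 10cn, then

```latex
T(n) \le T(n/5) + T(7n/10) + cn
     \le 10c \cdot \tfrac{n}{5} + 10c \cdot \tfrac{7n}{10} + cn
     = 2cn + 7cn + cn = 10cn ,
```

which closes precisely because 1/5 + 7/10 < 1.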
You can also use blocks of size 3 or 4, as shown in the paper Select with groups of 3 or 4 by K. Chen and A. Dumitrescu (2015). The idea is to use the "median of medians" algorithm twice and partition only after that. This lowers the quality of the pivot but is faster.
So instead of:
T(n) <= T(n/3) + T(2n/3) + O(n)
T(n) = O(nlogn)
one gets:
T(n) <= T(n/9) + T(7n/9) + O(n)
T(n) = Theta(n)
See this explanation on Brilliant.org. Basically, five is the smallest sublist size that maintains linear time, and the median of an n = 5 sublist is easy to find in effectively constant time by sorting it. Apologies for the LaTeX:
Why 5?
The median-of-medians divides a list into sublists of length five to
get an optimal running time. Remember, finding the median of small
lists by brute force (sorting) takes a small amount of time, so the
length of the sublists must be fairly small. However, adjusting the
sublist size to three, for example, does change the running time for
the worse.
If the algorithm divided the list into sublists of length three, p
would be greater than approximately n/3 elements and it would be
smaller than approximately n/3 elements. This would cause a worst
case of 2n/3 recursions, yielding the recurrence
T(n) = T(n/3) + T(2n/3) + O(n),
which by the master theorem is O(n log n), which is slower than
linear time.
In fact, for any recurrence of the form T(n) ≤ T(an) + T(bn) + cn,
if a + b < 1, the recurrence will solve to O(n), and if a + b > 1,
the recurrence is usually equal to Ω(n log n). [3]
The median-of-medians algorithm could use a sublist size greater than
5—for example, 7—and maintain a linear running time. However, we need
to keep the sublist size as small as we can so that sorting the
sublists can be done in what is effectively constant time.
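For reference, a compact (and unoptimized) sketch of groups-of-5 selection; the function name and structure are mine, not the original paper's:

```python
def select(a, k):
    """Return the k-th smallest element of a (k is 0-based)."""
    if len(a) <= 5:
        return sorted(a)[k]
    # medians of the groups of 5 (the last group may be smaller)
    groups = [a[i:i + 5] for i in range(0, len(a), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]
    # recursively take the median of medians as the pivot
    pivot = select(medians, len(medians) // 2)
    lows = [x for x in a if x < pivot]
    pivots = [x for x in a if x == pivot]
    highs = [x for x in a if x > pivot]
    if k < len(lows):
        return select(lows, k)
    elif k < len(lows) + len(pivots):
        return pivot
    else:
        return select(highs, k - len(lows) - len(pivots))

print(select([3, 5, 3, 6, 3, 4, 10, 4, 5, 2], 5))  # 6th smallest element: 4
```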

Finding largest and second largest of n numbers in average n + log n comparisons

We know that the easy way to find the smallest number in a list would simply be n comparisons, and if we wanted the 2nd smallest number we could go through it again or just keep track of another variable during the first iteration. Either way, this would take 2n comparisons to find both numbers.
So suppose that I had a list of n distinct elements, and I wanted to find the smallest and the 2nd smallest. Yes, the optimal algorithm takes at most n + ceiling(lg n) - 2 comparisons. (Not interested in the optimal way though)
But suppose then that you're forced to use the easy algorithm, the one that takes 2n comparisons. In the worst case, it'd take 2n comparisons. But what about the average? What would be the average number of comparisons it'd take to find the smallest and the 2nd smallest using the easy brute force algorithm?
EDIT: It'd have to be smaller than 2n. (Copied and pasted from my comment below:) I compare the element at my current index to the tmp2 variable keeping track of the 2nd smallest. I don't need to make another comparison against the tmp1 variable keeping track of the smallest unless the value at my current index is smaller than tmp2. So you can reduce the number of comparisons from 2n. It'd still take more than n, though. Yes, in the worst case this would still take 2n comparisons, but on average, if everything is randomly arranged...
I'd guess that it'd be n + something comparisons, but I can't figure out the 2nd part. I'd imagine that there would be some way to involve log n somehow, but any ideas on how to prove that?
(Coworker asked me this at lunch, and I got stumped. Sorry) Once again, I'm not interested in the optimal algorithm since that one is kinda common knowledge.
As you pointed out in the comment, there is no need for a second comparison if the current element of the iteration is larger than the second smallest found so far. So what is the probability of a second comparison when we look at the k-th element?
I think this can be rephrased as follows: "What is the probability that the k-th element is among the 2 smallest of the first k elements?"
This should be 2/k for uniformly distributed elements, because if we think of the first k elements as an ordered list, every position is equally likely (probability 1/k) for the k-th element, but only two of them, the smallest and the second smallest position, cause a second comparison. So the expected number of second comparisons is Σ_{k=1}^{n} 2/k = 2·H_n, where H_n is the n-th harmonic number. This is the expected value of a sum of indicator variables, each being 1 if a second comparison has to be done for that element and 0 otherwise.
If this is correct, the overall number of comparisons in the average case is C(n) = n + 2·H_n, and since H_n = Θ(log n), C(n) = Θ(n + log n) = Θ(n).
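A quick simulation (my sketch) of the "easy" algorithm with the short-circuit, counting comparisons; on random inputs it tracks the predicted ~ n + 2 ln n quite closely:

```python
import math
import random

def two_smallest_count(a):
    comparisons = 1                      # one comparison to order the first two
    if a[0] <= a[1]:
        s1, s2 = a[0], a[1]              # s1 = smallest, s2 = second smallest
    else:
        s1, s2 = a[1], a[0]
    for x in a[2:]:
        comparisons += 1                 # always compare against s2 first
        if x < s2:
            comparisons += 1             # second comparison only in this case
            if x < s1:
                s1, s2 = x, s1
            else:
                s2 = x
    return s1, s2, comparisons

n, trials = 100_000, 20
avg = sum(two_smallest_count(random.sample(range(10 * n), n))[2]
          for _ in range(trials)) / trials
print(avg, n + 2 * math.log(n))          # empirical average vs. ~ n + 2 ln n
```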

Search the biggest and second biggest number in O(n+logn) time [duplicate]

Possible Duplicate:
Find the 2nd largest element in an array with the minimum # of comparisons
May I know how to search for the biggest and the second biggest number in O(n+logn) time?
Thank you in advance.
Best regards,
Pidig
Note that O(n+logn) = O(n), and iterating twice over the list is O(n):
Iterate once to find the max and remove/mark it,
then iterate a second time to find the new max (the second biggest element).
Because it iterates over the array a constant number of times, this algorithm is O(n); a minimal sketch follows.
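(My sketch of the two passes; it marks the max by index rather than removing it, and assumes the list has at least two elements.)

```python
def biggest_two(a):
    biggest = max(a)                     # pass 1: find the maximum
    i = a.index(biggest)                 # mark one occurrence of it
    second = max(a[:i] + a[i + 1:])      # pass 2: maximum of the rest
    return biggest, second

print(biggest_two([3, 5, 3, 6, 3, 4, 10, 4, 5, 2]))  # (10, 6)
```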
For the k largest elements in general: you can do it using a min-heap in O(nlogk), or with a selection algorithm in O(n), as described in this answer - but for the 2 greatest elements these methods are overkill.
I guess you mean n + log(n) - 2 comparisons.
Here is how you do it.
Compare elements in groups of two, i.e. make n/2 pairs and keep the winner of each.
Continue in this way with n/4, n/8, n/16 and so on until you get the first (largest) element; this tournament costs n - 1 comparisons in total.
Now the next-largest element has to be among the elements that lost directly to the largest one in this method, and there are log(n) of those (for n a power of two), so finding their maximum costs log(n) - 1 more comparisons.
Altogether this takes precisely n + log(n) - 2 comparisons; see the sketch below.
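A sketch of the tournament (my implementation; it records, for each winner, the values that lost to it directly):

```python
def biggest_two_tournament(a):
    assert len(a) >= 2
    # each player is (value, list of values that lost directly to it)
    players = [(x, []) for x in a]
    while len(players) > 1:
        winners = []
        for i in range(0, len(players) - 1, 2):   # pairwise matches
            v1, l1 = players[i]
            v2, l2 = players[i + 1]
            if v1 >= v2:
                winners.append((v1, l1 + [v2]))
            else:
                winners.append((v2, l2 + [v1]))
        if len(players) % 2 == 1:                 # odd player gets a bye
            winners.append(players[-1])
        players = winners
    champion, losers = players[0]                 # n - 1 comparisons so far
    return champion, max(losers)                  # ~log(n) - 1 more comparisons

print(biggest_two_tournament([3, 5, 3, 6, 3, 4, 10, 4, 5, 2]))  # (10, 6)
```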
You can actually do that in O(n) time: Selection Algorithm
