Running time: finding the k smallest elements using Selection Sort

I suppose the answer is kn? But when I tried drawing the tree, the result looked different, so I must have done something wrong in the more detailed analysis?

First, your work list has length k+2 when it should probably have length k. My guess is that you meant to run from n down to n-(k-1) = n-k+1.
Now if you want to sum consecutive numbers, the easiest is to remember (or derive) the formula
1 + 2 + ... + a = a(a+1)/2
Use this to figure out that the sum you're after is
n(n+1)/2 - (n-k)(n-k+1)/2 = nk + (k-k^2)/2
as you correctly found. Now, think about big O. Since n > k, we know nk > k^2, so the latter is a lower-order term, and the whole thing is O(nk).
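As a sanity check, here is a minimal Python sketch (not from the original post) of a selection sort stopped after k passes, counting comparisons. Each pass over a suffix of length m makes m-1 comparisons, so the count comes out exactly k below the n + (n-1) + ... + (n-k+1) work model above, which does not change the O(nk) bound.

import random

def select_k_smallest(a, k):
    # Partial selection sort: afterwards a[:k] holds the k smallest
    # elements in ascending order. Returns the number of comparisons.
    comparisons = 0
    n = len(a)
    for i in range(k):                # only k passes, not n
        m = i
        for j in range(i + 1, n):     # scan the unsorted suffix
            comparisons += 1
            if a[j] < a[m]:
                m = j
        a[i], a[m] = a[m], a[i]
    return comparisons

n, k = 100, 10
print(select_k_smallest(random.sample(range(1000), n), k))  # 945
print(n * k + (k - k * k) // 2 - k)                         # 945 as well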

Binary vs Linear searches for unsorted N elements

I'm trying to work out a formula for when we should sort first. For instance, we have an array with N = 1_000_000 elements. If we will search it only once, we should use a simple linear search, but if we'll search it 10 times, we should sort the array in O(n log n) and then use binary search. How can I find the threshold: for which number of searches and which input size should I sort first and then use binary search?
You want to solve an inequality that roughly might be described as
t * n > C * n * log(n) + t * log(n)
where t is the number of searches and C is some constant for the sort implementation (to be determined experimentally). Once you have estimated this constant, you can solve the inequality numerically (with some uncertainty, of course).
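Solving that inequality for t gives t > C*n*log(n) / (n - log(n)). A minimal Python sketch of the numeric solution, with C = 5 as a placeholder value you would have to measure for your own sort:

import math

def min_searches(n, C):
    # Smallest t satisfying t*n > C*n*log(n) + t*log(n),
    # i.e. t > C*n*log(n) / (n - log(n)).
    log_n = math.log2(n)
    return math.ceil(C * n * log_n / (n - log_n))

print(min_searches(1_000_000, C=5))  # 100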
Like you already pointed out, it depends on the number of searches you want to do. A good threshold can come out of the following statement:
n*log[b](n) + x*log[2](n) <= x*n/2
where x is the number of searches, n the input size, and b the base of the logarithm for the sort, depending on the partitioning you use.
When this statement evaluates to true, you should switch methods from linear search to sort and search.
Generally speaking, a linear search through an unordered array will take n/2 steps on average, though this average will only play a big role once x approaches n. If you want to stick with big Omicron or big Theta notation then you can omit the /2 in the above.
Assuming n elements and m searches, with crude approximations
the cost of the sort will be C0.n.log n,
the cost of the m binary searches C1.m.log n,
the cost of the m linear searches C2.m.n,
with C2 ~ C1 < C0.
Now you compare
C0.n.log n + C1.m.log n vs. C2.m.n
or
C0.n.log n / (C2.n - C1.log n) vs. m
For reasonably large n, the breakeven point is about C0.log n / C2.
For instance, taking C0 / C2 = 5, n = 1000000 gives m = 100.
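A small Python sketch comparing this exact breakeven with the approximation; the constants C0 = 5, C1 = C2 = 1 are illustrative, not measured:

import math

def breakeven_m(n, C0, C1, C2):
    # Exact breakeven m from C0*n*log n + C1*m*log n = C2*m*n.
    log_n = math.log2(n)
    return C0 * n * log_n / (C2 * n - C1 * log_n)

n = 1_000_000
print(breakeven_m(n, C0=5, C1=1, C2=1))  # ~99.7 searches
print(5 * math.log2(n))                  # approximation C0*log(n)/C2, also ~99.7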
You should plot the complexities of both operations.
Linear search: O(n)
Sort and binary search: O(n log n + log n)
In the plot, you will see for which values of n it makes sense to choose the one approach over the other.
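A minimal matplotlib sketch of such a plot. Note that for a single search the sorted approach never wins; the crossover only appears once the linear cost is multiplied by the number of searches, as the other answers point out.

import numpy as np
import matplotlib.pyplot as plt

n = np.arange(2, 10_000)
plt.plot(n, n, label="linear search: n")
plt.plot(n, n * np.log2(n) + np.log2(n), label="sort + binary search: n log n + log n")
plt.xlabel("n")
plt.ylabel("operations (unit cost)")
plt.legend()
plt.show()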
This actually turned into an interesting question for me as I looked into the expected runtime of a quicksort-like algorithm when the expected split at each level is not 50/50.
The first question I wanted to answer was: for random data, what is the average split at each level? It surely must be greater than 50% (for the larger subdivision). Given an array of size N of random values, the smallest value gives a subdivision of (1, N-1), the second smallest a subdivision of (2, N-2), and so on. I put this in a quick script:
split = 0.0
for x in range(10000):
    split += max(x, 10000 - x) / 10000   # fraction in the larger half when the pivot has rank x
split /= 10000                           # average over all pivot ranks
print(split)                             # 0.75
And got exactly 0.75 as an answer. I'm sure I could show that this is always the exact answer, but I wanted to move on to the harder part.
Now, let's assume that even a 25/75 split follows an n log n progression for some unknown logarithm base. That means that num_comparisons(n) = n * log_b(n), and the question is to find b via statistical means (since I don't expect that model to be exact at every step). We can do this with a clever application of least-squares fitting, after we use a logarithm identity to get:
C(n) = n * log(n) / log(b)
where now the logarithm can have any base, as long as log(n) and log(b) use the same base. This is a linear equation just waiting for some data! So I wrote another script that filled an array of xs with n*log(n) and an array of ys with measured C(n), and used numpy to tell me the slope of that least-squares fit, which I expect to equal 1/log(b). I ran the script and got b inside [2.16, 2.3], depending on how high I set n (I varied n from 100 to 100,000,000). The fact that b seems to vary with n shows that my model isn't exact, but I think that's okay for this example.
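The original script isn't shown, so here is a hypothetical reconstruction: count the comparisons of a simple first-element-pivot quicksort and fit C(n) = slope * n*log(n). The fitted b depends heavily on the quicksort variant; this naive one lands near sqrt(e) ≈ 1.65 rather than in the [2.16, 2.3] range quoted above, so treat it only as an illustration of the fitting idea.

import math
import random
import numpy as np

def quicksort_comparisons(a):
    # Comparison count of a simple quicksort using the first element as pivot.
    if len(a) <= 1:
        return 0
    pivot, rest = a[0], a[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return len(rest) + quicksort_comparisons(left) + quicksort_comparisons(right)

sizes = [2**k for k in range(8, 16)]
xs = np.array([n * math.log(n) for n in sizes])
ys = np.array([quicksort_comparisons([random.random() for _ in range(n)])
               for n in sizes], dtype=float)
slope = np.linalg.lstsq(xs[:, None], ys, rcond=None)[0][0]  # C(n) ~ slope * n*log(n)
print(math.exp(1 / slope))  # b = e^(1/slope)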
To actually answer your question now, with these assumptions, we can solve for the cutoff point where N * n/2 = n*log_2.3(n) + N * log_2.3(n). I'm just assuming that the binary search will have the same logarithm base as the sorting method for a 25/75 split. Isolating N you get:
N = n*log_2.3(n) / (n/2 - log_2.3(n))
If your number of searches N exceeds the quantity on the RHS (where n is the size of the array in question) then it will be more efficient to sort once and use binary searches on that.
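Plugging in numbers, a short sketch of this cutoff (with b = 2.3 taken from the fit above):

import math

def cutoff_searches(n, b=2.3):
    # N = n*log_b(n) / (n/2 - log_b(n)); sort first when you expect more searches.
    log_n = math.log(n, b)
    return n * log_n / (n / 2 - log_n)

print(cutoff_searches(1_000_000))  # ~33 searches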

Big O Recursive Method

I have a method called BinarySum:
Algorithm BinarySum(A, i, n):
    Input: An array A and integers i and n
    Output: The sum of the n integers in A starting at index i
    if n = 1 then
        return A[i]
    return BinarySum(A, i, n/2) + BinarySum(A, i + n/2, n/2)
Ignoring the fact that this makes a simple problem complicated, I have been asked to find the big O. Here is my thought process: for an array of size N, I will be making 1 + 2 + 4 + ... + N recursive calls. This is close to half the sum from 1 to N, so I will say it is about N(N+1)/4. After making this many calls, I now need to add the results together, so once again I need to perform about N(N+1)/4 additions. Adding them together, we are left with N^2 as the dominant term.
So would the big O of this algorithm be O(N^2)? Or am I doing something wrong? It feels strange to have binary recursion and not have a 2^n or log n in the final answer.
There are in fact 2^n and log n terms in the final result... sort of.
For each call on a sub-array of length n, two recursive calls are made on the two halves of that array, plus a constant amount of work (the if-statement, an addition, pushing onto the call stack, etc.). Thus the recurrence relation is given by:
T(n) = 2*T(n/2) + c, with T(1) = c
At this point we could just use the Master theorem to arrive directly at the final result, O(n). But let's instead derive it by repeated expansion:
T(n) = 2*T(n/2) + c
     = 4*T(n/4) + 2c + c
     = ...
     = 2^m * T(n/2^m) + c*(2^(m-1) + ... + 2 + 1)
     = 2^m * T(n/2^m) + c*(2^m - 1)    (*)
The stopping condition n = 1 gives the maximum value of m (ignoring rounding): n/2^m = 1, i.e. m = log2(n). Substituting back:
T(n) = n*T(1) + c*(n - 1) = 2cn - c = O(n)
In step (*) we used the standard formula for geometric series. So as you can see the answer does involve log n and 2^n terms in a sense, but they "cancel" out to give a simple linear term, which is the same as for a simple loop.
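A quick empirical check, assuming a direct Python translation of the pseudocode (with n - n/2 in the second call so that odd lengths also work):

def binary_sum(A, i, n, counter):
    # Sum of the n elements of A starting at index i, by halving.
    # counter[0] tracks the total number of calls.
    counter[0] += 1
    if n == 1:
        return A[i]
    half = n // 2
    return (binary_sum(A, i, half, counter)
            + binary_sum(A, i + half, n - half, counter))

for n in [1, 16, 1000, 4096]:
    calls = [0]
    assert binary_sum(list(range(n)), 0, n, calls) == n * (n - 1) // 2
    print(n, calls[0])  # calls = 2n - 1: linear in n, as derived above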

Stuck at Algorithm pseudocode generation

I do not know what to do next (or even whether my approach so far is correct) in the following problem:
[Problem statement images: Part 1 and Part 2]
I have just figured out that a possible MNT (for part a) is to get a jar and test whether it breaks from height h: if so, that's the answer; if not, increase the height by 1 and keep looping.
For part b: since we know the max height equals n, we start from the top (current height = n). We go from top to bottom, adding to our broken-jar count (the jars are supposed to break if you start from the top), until the jars stop breaking. The answer is then current height + 1 (because we need to go back one step).
For part c, I don't even know what my approach would be, since I am assuming that the order of the algorithm is O(n^c) where c is a fraction. I also know that O(n^c) with c < 1 grows more slowly than O(n).
I also noted that there is a problem similar to this one online, but it talks about rungs instead of a robotic arm. Maybe it is similar? Here is the link
Do you have any recommendations/clues? Any help will be appreciated.
Thank you for your time and help in advance.
Cheers!
This is an answer for part (c).
The idea is to find some number k and apply the following scheme:
Drop a jar from height k:
If it breaks, drop the other one from k-1 down to 1 until we find the height at which it breaks, using no more than k tries in total.
If it doesn't break, drop it again from height k + (k-1). Again, if it breaks, drop the other one from (k+(k-1)-1) down to k+1; otherwise continue to (k + (k-1) + (k-2)).
Continue this until you find the height.
(of course if at some point you need to jump to a height greater than n, you just jump to n).
This scheme ensures we'll use at most k tries. So now the question is how to find a minimal k (as a function of n), for which the scheme will work. Since, at every step, we reduce by 1 our height advancement, the following equation must hold:
k + (k-1) + (k-2) + ... + 1 >= n
Otherwise we'll "run out" of steps before reaching n. We want to find the smallest k for which the inequality holds.
There's a formula to the sum:
1 + 2 + ... + k = k(k+1)/2
Using that we get the equation:
k(k+1)/2 = n ===> k^2 + k - 2n = 0
Solving this (and taking the ceiling if the result isn't integral) will give us k. A quadratic equation has two solutions, but ignoring the negative one you get:
k = (-1 + sqrt(1 + 8n))/2
Looking at the complexity, we can ignore everything but the n, which has an exponent of 1/2 (since we're taking its square root). That is actually better than the requested complexity of n to the power of 2/3.
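A minimal Python sketch of this scheme; breaks(h) is a hypothetical stand-in for actually dropping a jar from height h:

import math

def find_breaking_height(n, breaks):
    # Two jars: jump in decreasing steps k, k-1, ..., then scan the last
    # gap linearly with the second jar. O(sqrt(n)) drops in total.
    k = math.ceil((-1 + math.sqrt(1 + 8 * n)) / 2)  # smallest k with k(k+1)/2 >= n
    prev, h = 0, k
    while h < n and not breaks(h):   # first jar: decreasing jumps
        prev = h
        k -= 1
        h += k
    h = min(h, n)
    if not breaks(h):
        return None                  # never breaks, even at height n
    for x in range(prev + 1, h):     # second jar: linear scan of the gap
        if breaks(x):
            return x
    return h

print(find_breaking_height(100, lambda h: h >= 37))  # 37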
For part (a) you can use binary search over the height. Pseudocode for it is below:
lo = 0
hi = n
while (lo < hi) {
    mid = lo + (hi - lo + 1)/2;   // round up, so the loop always terminates
    if (glass_breaks(mid)) {
        hi = mid - 1;             // breaks: the answer is below mid
    } else {
        lo = mid;                 // survives: the answer is mid or above
    }
}
'lo' will contain the maximum height the glass survives, found in the minimum possible number of trials. It takes log(n) steps in the worst case, whereas your approach may take n steps in the worst case.
For part (b), you can use your approach from (a): start from the minimum height and increase it by 1 until the glass breaks. This breaks at most one glass to determine the required height.

Computing number of permutations of two values, with a restriction on runs

I was thinking about ways to solve this other question about counting the number of values whose digits sum to a target, and decided to try the case where the range is of the form [0, base^N). So essentially you get N independent digits to work with, which is a simpler problem.
The number of ways N natural numbers can sum to a target T is easy to compute. If you think of it as placing N-1 dividers among T sticks, you should see the answer is (T+N-1)!/(T!(N-1)!).
However, our N natural numbers are restricted to [0, base) and so there will be fewer possibilities. I want to find a simple formula for this case as well.
The first thing I considered was subtracting the number of possibilities in which 'base' of the sticks had been replaced with a single 'big stick'. Unfortunately, some possibilities are double-counted, because they have multiple places where a 'big stick' could be inserted.
Any ideas?
You can use generating functions.
Assuming that the order matters, you are looking for the coefficient of x^T in
(1 + x + x^2 + ... + x^b)(1 + x + x^2 + ... + x^b) ... n times
= (x^(b+1) - 1)^n / (x - 1)^n
Using the binomial theorem (which works even for the exponent -n), you should be able to write your answer as a sum of products of binomial coefficients.
Let b+1 = B.
Using binomial theorem we have
(x^(b+1) - 1)^n = Sum_{r=0}^{n} (-1)^(n-r)* (n choose r) x^(Br)
1/(x-1)^n = (-1)^n * Sum (n+s-1 choose s) x^s
Multiplying the two series (the two (-1)^n factors cancel), the answer we need is:
Sum (-1)^r * (n choose r) * (n+s-1 choose s)
for any r and s subject to the condition that
Br + s = T.
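A small Python sketch of this sum, checked against brute force. Note the digits here range over [0, b] inclusive, matching the generating function; set b = base - 1 for the question's [0, base) digits:

from math import comb
from itertools import product

def count_digit_sums(n, b, T):
    # Coefficient of x^T in ((x^(b+1) - 1)/(x - 1))^n, via the sum above.
    B = b + 1
    return sum((-1)**r * comb(n, r) * comb(n + (T - B*r) - 1, T - B*r)
               for r in range(min(n, T // B) + 1))

n, b, T = 3, 9, 13
brute = sum(1 for t in product(range(b + 1), repeat=n) if sum(t) == T)
print(count_digit_sums(n, b, T), brute)  # 75 75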

O(NlogN) finding 3 numbers that have a sum of any arbitrary T in an array

Given an array A of integers, find any 3 of them that sum to any given T.
I saw this in some online post, which claims it has an O(N log N) solution.
For 2 numbers, I know a hash table gives O(N), but for 3 numbers I cannot find one.
I also feel this problem sounds similar to some hard problems, but I cannot recall the name and therefore cannot google for it. (The worst case is obviously O(N^3), and with the solution for 2 numbers it is really O(N^2).)
It does not really solve anything in the real world; it just bugs me.
Any idea?
I think your problem is equivalent to the 3SUM problem.
For the 3SUM problem, no solution substantially better than O(n^2) is known. You can refer to http://en.wikipedia.org/wiki/List_of_unsolved_problems_in_computer_science
The 2SUM problem can be solved in O(n log n) time.
First sort the array, which takes O(n log n) operations. Then, at the i-th iteration, pick the element a[i] and look for the element -a[i] (or T - a[i] for a general target T) in the remaining part of the array (i.e. from i+1 to n-1); this search can be done with binary search in at most log n time. Overall this takes O(n log n) operations.
But the 3SUM problem can't be solved this way in O(n log n) time; we can reduce it to O(n^2).
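A short Python sketch of that 2SUM approach (one sort plus a binary search per element), generalized to an arbitrary target:

from bisect import bisect_left

def two_sum(arr, target):
    # Sort, then binary-search target - x to the right of each x: O(n log n).
    a = sorted(arr)
    for i, x in enumerate(a):
        j = bisect_left(a, target - x, i + 1)  # search only indices > i
        if j < len(a) and a[j] == target - x:
            return x, target - x
    return None

print(two_sum([8, -2, 5, 11, 3], 9))  # (-2, 11)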
Sounds like a homework question...
If you can find two values that sum to T but you want to extend the search to three values, couldn't you, for each value M in the set, look for two values that sum to (T - M)? If each such two-value search could be done in O(log N) time, the whole thing would be O(N log N); on a sorted array each search actually takes O(N) with two pointers, giving O(N^2) overall.
I think this is just the subset sum problem
If so, it is NP-Complete.
EDIT: Never mind, it is 3sum, as stated in another answer.
The standard solution is the O(n^2) quadratic algorithm below (FFT-based approaches exist, but they only help when the input integers come from a bounded range). Here is the general idea, lifted directly from https://en.wikipedia.org/wiki/3SUM:
sort(S);
for i = 0 to n-3 do
    a = S[i];
    start = i+1;
    end = n-1;
    while (start < end) do
        b = S[start];
        c = S[end];
        if (a+b+c == 0) then
            output a, b, c;
            // Continue search for all triplet combinations summing to zero.
            start = start + 1
            end = end - 1
        else if (a+b+c > 0) then
            end = end - 1;
        else
            start = start + 1;
        end
    end
end
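A Python version of that quadratic scan, adapted to an arbitrary target T rather than zero, and returning one triple instead of enumerating all of them:

def three_sum(arr, T):
    # Sort, then for each first element sweep two pointers inward: O(n^2).
    S = sorted(arr)
    for i in range(len(S) - 2):
        start, end = i + 1, len(S) - 1
        while start < end:
            total = S[i] + S[start] + S[end]
            if total == T:
                return S[i], S[start], S[end]
            elif total > T:
                end -= 1      # sum too big: shrink from the right
            else:
                start += 1    # sum too small: grow from the left
    return None

print(three_sum([1, 4, 45, 6, 10, 8], 22))  # (4, 8, 10)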
