Find common keys in two sorted arrays that prints them in o(sqrt n) - algorithm

How do I write pseudocode for the following problem?
Given two sorted arrays X and Y with lg n and n keys respectively. I need a space and time efficient algorithm that finds the keys common to X and Y and prints them. It should run in o(sqrt n), i.e. (small 'o')time.
My attempt: Do you think binary search would be an option ?

Here is what I make of it. There are two sorted arrays with log(n) and n elements. You need to find the elements common to both.
You can iterate through X and binary search for each element of X in Y. If the search is successful, print the element. That would be in f(x) = c * (log(n))^2 time, where c is some constant.
As for every k > 0, you can find a constant a such that f(x) < k * sqrt(n) holds for all x > a, hence this solution is o(sqrt(n)).
EDIT: Here is the pseudocode (pretty simple):
input X
input Y
n = number of elements in Y
for i = 0 to n:
if(binary_search(X[i] in Y) = found)print X[i]

For Shikhar Gupta's solution,I have one improvement. Shikhar's solution doesn't make use of the fact that X is sorted too.So through every iteration,we can reduce the lower or higher boundary of Y.This can reduce the run time as well.
In order to prove O((log(n))^2) < O(sqrt(n)),we only need to prove the derivative of the first is smaller than the second.Which means 2log(n)/n < 1/sqrt(n).Then we have to prove log(n) < sqrt(n).This is pretty tricky.

Related

What kind of search algorithm could you use to solve this problem?

Given an array of n integers A[0…n−1], such that ∀i,0≤i<n, we have that |A[i]−A[i+1]|≤1, and given x=A[0], y=A[n−1], we have that x<y. Locate, in O(log n), any index j such that A[j]=z, for a given value of z, x≤ z ≤y.
As I see it, the problem restricts that the -difference or "distance"- between the elements that are adjacent to each other is at most 1, but as I understand it, that does not mean that the array is ordered, it is true that there are sub-sequences, of any size, which can follow an ascending or descending order but precisely for this reason I do not see a clear way to use an algorithm that can search for the index in O(log n).
I was thinking of examples that would meet the constraints:
[1,2,1,2]
[-1,0,1,2,2,1,0,1,2,3]
[4,5,6,7,8,9,10,11,12,13,14,13,12,11,10]
[4,3,2,1,0,1,2,3,4,5]
I really don't know if I'm understanding the problem correctly :C.Could someone give me a hint please?
Even though the array is not sorted, you can solve it similarly to binary search, by checking in the middle (let it be m), and then:
If z == m - you are done.
If z < m - divide and conquer: check on the first half of the array since x <= z < m
If m < z - divide and conquer: check on the second half of the array since m < z <= y
It is easy to prove this works by using the invariant that in each iteration x <= z <= y, and that the algorithm is guaranteed to end (since size is shrinking) - giving you a guaranteed end when x == z == y (at most).

Given a number N and an array A. Check if N can be expressed as a product of one or more array elements

Given a number N (where N <= 10^18) and an array A(consisting of at most 20 elements). I have to tell if it is possible to form N by multiplying some elements of the array. Note that, I can use any element multiple times.
Example: N = 8 and A = {2, 3}. Here, 8 = 2 * 2 * 2. So the answer is YES. But if N = 15, then I can't form 15 as a product of one or more elements using them any number of times. So in this case the answer is NO.
How can I approach this problem?
Simple pseudocode:
A_divisors = set()
for x in A:
if num % x == 0:
A_divisors.add(x)
candidates = A_divisors.clone()
seen = set()
while(candidates.size()):
size = divisors.size()
new_candidates = set()
for y in candidates:
for x in A_divisors:
if num % (x * y) == 0 and (x * y) not in seen:
new_candidates.add(x * y)
seen.add(x * y)
if x * y == num:
return true
candidates = new_candidates
return false
Complexity: O(|A| * k * log k), with k being amount of divisors. The log k would be the cost of adding and checking if element is present in the set. With a hash based approach it would be O(1) and can be removed. I am also assuming %, * operations to be O(1).
Since you show no code or algorithm, I'll just give one idea. If you want more help, please show more of your own work on the problem.
Note that N can be at most 60 bits long. This is small enough that N could be decomposed into its prime factors pretty quickly. So first work up a good factoring algorithm for numbers of that size.
Your algorithm would factor N and each of the elements in your array A. If there is any prime factor of N that does not divide into any element of A then your answer is NO. This is the case in your example of N = 15.
Now you work with the prime factors and their exponents in N and in the elements of A. Now you want to find a subset (or, more properly, a sub-multiset) of A where the exponents for each prime add up to that in N. This greatly reduces the sizes of your numbers thus makes the problem easier.
That last part is not trivial. Work more on this problem and show us some of your work, then we can continue helping you.
You can follow below approach:
Form 2 queues: Q2 and Q3.
Add 2 in Q2 and 3 in Q3.
Get the minimum of the head of both queues, lets say h. Remove h from the corresponding queue. Check if it is equal to the number N. If yes, return true. If it is greater than N, return false.
If it is less than N, then add 2*h in Q2 and 3*h in Q3. Repeat steps 3 to 4.
Please note that when the minimum h comes from Q3, you need not to add 2*h into Q2. That is because you already have added that element in Q3 before. (I will leave it for you to deduce}. Keep on doing this procedure until your h is greater than N.
If you have more such numbers, you can form there queues as well. I think this is an optimal solution in case you have more numbers to process.
Can you guess the time and space complexity of this?

Binary vs Linear searches for unsorted N elements

I try to understand a formula when we should use quicksort. For instance, we have an array with N = 1_000_000 elements. If we will search only once, we should use a simple linear search, but if we'll do it 10 times we should use sort array O(n log n). How can I detect threshold when and for which size of input array should I use sorting and after that use binary search?
You want to solve inequality that rougly might be described as
t * n > C * n * log(n) + t * log(n)
where t is number of checks and C is some constant for sort implementation (should be determined experimentally). When you evaluate this constant, you can solve inequality numerically (with uncertainty, of course)
Like you already pointed out, it depends on the number of searches you want to do. A good threshold can come out of the following statement:
n*log[b](n) + x*log[2](n) <= x*n/2 x is the number of searches; n the input size; b the base of the logarithm for the sort, depending on the partitioning you use.
When this statement evaluates to true, you should switch methods from linear search to sort and search.
Generally speaking, a linear search through an unordered array will take n/2 steps on average, though this average will only play a big role once x approaches n. If you want to stick with big Omicron or big Theta notation then you can omit the /2 in the above.
Assuming n elements and m searches, with crude approximations
the cost of the sort will be C0.n.log n,
the cost of the m binary searches C1.m.log n,
the cost of the m linear searches C2.m.n,
with C2 ~ C1 < C0.
Now you compare
C0.n.log n + C1.m.log n vs. C2.m.n
or
C0.n.log n / (C2.n - C1.log n) vs. m
For reasonably large n, the breakeven point is about C0.log n / C2.
For instance, taking C0 / C2 = 5, n = 1000000 gives m = 100.
You should plot the complexities of both operations.
Linear search: O(n)
Sort and binary search: O(nlogn + logn)
In the plot, you will see for which values of n it makes sense to choose the one approach over the other.
This actually turned into an interesting question for me as I looked into the expected runtime of a quicksort-like algorithm when the expected split at each level is not 50/50.
the first question I wanted to answer was for random data, what is the average split at each level. It surely must be greater than 50% (for the larger subdivision). Well, given an array of size N of random values, the smallest value has a subdivision of (1, N-1), the second smallest value has a subdivision of (2, N-2) and etc. I put this in a quick script:
split = 0
for x in range(10000):
split += float(max(x, 10000 - x)) / 10000
split /= 10000
print split
And got exactly 0.75 as an answer. I'm sure I could show that this is always the exact answer, but I wanted to move on to the harder part.
Now, let's assume that even 25/75 split follows an nlogn progression for some unknown logarithm base. That means that num_comparisons(n) = n * log_b(n) and the question is to find b via statistical means (since I don't expect that model to be exact at every step). We can do this with a clever application of least-squares fitting after we use a logarithm identity to get:
C(n) = n * log(n) / log(b)
where now the logarithm can have any base, as long as log(n) and log(b) use the same base. This is a linear equation just waiting for some data! So I wrote another script to generate an array of xs and filled it with C(n) and ys and filled it with n*log(n) and used numpy to tell me the slope of that least squares fit, which I expect to equal 1 / log(b). I ran the script and got b inside of [2.16, 2.3] depending on how high I set n to (I varied n from 100 to 100'000'000). The fact that b seems to vary depending on n shows that my model isn't exact, but I think that's okay for this example.
To actually answer your question now, with these assumptions, we can solve for the cutoff point of when: N * n/2 = n*log_2.3(n) + N * log_2.3(n). I'm just assuming that the binary search will have the same logarithm base as the sorting method for a 25/75 split. Isolating N you get:
N = n*log_2.3(n) / (n/2 - log_2.3(n))
If your number of searches N exceeds the quantity on the RHS (where n is the size of the array in question) then it will be more efficient to sort once and use binary searches on that.

Find The number of sides of triangle

I am giving a Array A consists of integer, I have to choose any two integer from array A and any third integer from range [L,R] , such that all three integer forms a valid triangle.
I have to find the number of integer in range [L,R] which can be used to form a valid triangle, by choosing any two values from array A.
I know if i know two sides then third side must be range a-b<x<a+b
where a and b are any two integer from A.
How to find the number of valid integer in [L,R] in O(N) N= Size of A time.
L and R and be very large upto 10^20
To my understanding, complete enumeration will yield an efficient solution by the following argument. The maximum number of possible solutions occurs in the case where any selection of elements of A yields a consistent choice of side lengths for a triangle. Such an input, for instance, would be an input of N times the value 1. However, the number of triples chosen from A can be bounded by
(n choose 3) = n!/(3!(n-3))
= n!/(6(n-3)!)
= (1/6)*(n-2)*(n-1)*n
<= n^3
(where choose is meant to denote the binomial coefficient) which is a polynomial number of choices. Any choice can be checked for validity in constant time, as only 3 values are involved.
Now the contest has ended, so here is my way to solve it in the contest.
The problem is asking how many number x in [L,R]can form triangle with any pair (a_i, a_j) in A.
Naive method is to brute all pairs which is O(N^2 * (R-L+1)) = O(N^2 * (R-L))
But indeed, we do not need to test all O(N^2) pairs, we only need to test O(N) pairs IF A is sorted, namely, all adjacent pair (a_i-1, a_i) for i > 0
Why? Because in sorted A:
If there is some pair (a_j, a_i) where a_j < a_i and j != i-1, which can form triangle with x
Then (a_i-1, a_i) must form a triangle with x too:
a_i-1 + a_i > a_j + a_i > x
x + a_i-1 > x + a_j > a_i
x + a_i > x + a_i-1 > a_j
Therefore checking all (a_i-1, a_i) is sufficient, this is the first core idea to solve this problem.
So now we have a O(NlgN + N*(R-L+1)) = O(N*(R-L)) algorithm.
For the original contest problem, it is still too slow as R-L+1 is as large as 10^18, so we need another tricks.
Note that actually by the triangle inequality, for a pair (a_i-1, a_i), indeed we can find a range of x which can form triangle with this pair:
a_i-1 + a_i > x > a_i - a_i-1 (By above 1 & 2)
For example, (4,5) can form triangle with all 1 < x < 9
So instead of iterating all x in [L,R], we try to find range of x for each pair, which can be done in O(N) as we know the range for x of a pair in O(1). Beware that x must fall in the range of [L,R].
After that we have O(N) ranges / segments of x which then we can union them and the size of the result set is the desired answer.
Union O(N) segments can be done easily in O(N) as well, I leave you as a homework :)
Combine both tricks, the algorithm is O(N lg N + N + N) = O(N lg N)

Find pairs with given difference

Given n, k and n number of integers. How would you find the pairs of integers for which their difference is k?
There is a n*log n solution, but I cannot figure it out.
You can do it like this:
Sort the array
For each item data[i], determine its two target pairs, i.e. data[i]+k and data[i]-k
Run a binary search on the sorted array for these two targets; if found, add both data[i] and data[targetPos] to the output.
Sorting is done in O(n*log n). Each of the n search steps take 2 * log n time to look for the targets, for the overall time of O(n*log n)
For this problem exists the linear solution! Just ask yourself one question. If you have a what number should be in the array? Of course a+k or a-k (A special case: k = 0, required an alternative solution). So, what now?
You are creating a hash-set (for example unordered_set in C++11) with all values from the array. O(1) - Average complexity for each element, so it's O(n).
You are iterating through the array, and check for each element Is present in the array (x+k) or (x-k)?. You check it for each element, in set in O(1), You check each element once, so it's linear (O(n)).
If you found x with pair (x+k / x-k), it is what you are looking for.
So it's linear (O(n)). If you really want O(n lg n) you should use a set on tree, with checking is_exist in (lg n), then you have O(n lg n) algorithm.
Apposition: No need to check x+k and x-k, just x+k is sufficient. Cause if a and b are good pair then:
if a < b then
a + k == b
else
b + k == a
Improvement: If you know a range, you can guarantee linear complexity, by using bool table (set_tab[i] == true, when i is in table.).
Solution similar to one above:
Sort the array
set variables i = 0; j = 1;
check the difference between array[i] and array[j]
if the difference is too small, increase j
if the difference is too big, increase i
if the difference is the one you're looking for, add it to results and increase j
repeat 3 and 4 until the end of array
Sorting is O(n*lg n), the next step is, if I'm correct, O(n) (at most 2*n comparisons), so the whole algorithm is O(n*lg n)

Resources