Find the middle element in merged arrays in O(logn) - algorithm

We have two sorted arrays of the same size n. Let's call the array a and b.
How can we find the middle element of the sorted array obtained by merging a and b?
Example:
n = 4
a = [1, 2, 3, 4]
b = [3, 4, 5, 6]
merged = [1, 2, 3, 3, 4, 4, 5, 6]
mid_element = merged[(0 + merged.length - 1) / 2] = merged[3] = 3
More complicated cases:
Case 1:
a = [1, 2, 3, 4]
b = [3, 4, 5, 6]
Case 2:
a = [1, 2, 3, 4, 8]
b = [3, 4, 5, 6, 7]
Case 3:
a = [1, 2, 3, 4, 8]
b = [0, 4, 5, 6, 7]
Case 4:
a = [1, 3, 5, 7]
b = [2, 4, 6, 8]
Time required: O(log n). Any ideas?

Look at the middle of both the arrays. Let's say one value is smaller and the other is bigger.
Discard the lower half of the array with the smaller value. Discard the upper half of the array with the higher value. Now we are left with half of what we started with.
Rinse and repeat until only one element is left in each array. Return the smaller of those two.
If the two middle values are the same, then pick arbitrarily.
Credits: Bill Li's blog
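As a rough illustration of an O(log n) search, here is a minimal sketch. It is a partition-based variant, not the exact halving procedure described above: it binary-searches for how many of the n smallest merged elements come from a. The function name lower_median is made up for this example, and it assumes both lists are sorted, non-empty, and of equal length.

```python
def lower_median(a, b):
    """Return merged[n - 1] for two sorted lists a and b of equal length n >= 1."""
    n = len(a)
    # i = how many of the n smallest merged elements are taken from a;
    # the remaining n - i are taken from b.
    lo, hi = 0, n
    while lo < hi:
        i = (lo + hi) // 2
        # If a[i] is smaller than the last element currently taken from b,
        # a[i] belongs among the n smallest, so take more from a.
        if a[i] < b[n - i - 1]:
            lo = i + 1
        else:
            hi = i
    i, j = lo, n - lo
    # The n-th smallest is the largest element inside the chosen prefixes.
    candidates = []
    if i > 0:
        candidates.append(a[i - 1])
    if j > 0:
        candidates.append(b[j - 1])
    return max(candidates)


print(lower_median([1, 2, 3, 4], [3, 4, 5, 6]))  # the example from the question
```

Each iteration halves the range of possible split points, so the loop runs O(log n) times with O(1) work per step.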

Quite an interesting task. I'm not sure about O(log n), but an O((log n)^2) solution is obvious to me.
If you know the position of some element in the first array, you can find how many elements in both arrays are smaller than this value (you already know how many smaller elements are in the first array, and you can count the smaller elements in the second array using binary search, so just sum these two numbers). If the number of smaller elements in both arrays is less than N, you should look in the upper half of the first array; otherwise, move to the lower half. So you get a general binary search with an internal binary search. The overall complexity is O((log n)^2).
Note: if you do not find the median in the first array, start the same search in the second array. This has no impact on the complexity.
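The nested binary search can be sketched roughly as follows. This is one possible reading of the idea, using Python's bisect module for the inner searches; the names find_rank_in and median_nested are invented for the example, and k = n - 1 is the 0-based index of the middle element as defined in the question.

```python
from bisect import bisect_left, bisect_right


def find_rank_in(first, second, k):
    """Look in `first` for a value occupying 0-based rank k of the merged array.

    Returns the value, or None if no element of `first` has that rank.
    """
    lo, hi = 0, len(first) - 1
    while lo <= hi:                      # outer binary search over `first`
        mid = (lo + hi) // 2
        x = first[mid]
        # Inner binary searches: x occupies merged ranks [smaller, not_greater).
        smaller = bisect_left(first, x) + bisect_left(second, x)
        not_greater = bisect_right(first, x) + bisect_right(second, x)
        if smaller <= k < not_greater:
            return x
        if not_greater <= k:
            lo = mid + 1                 # x is too small, look in the upper half
        else:
            hi = mid - 1                 # x is too large, look in the lower half
    return None


def median_nested(a, b):
    k = len(a) - 1
    found = find_rank_in(a, b, k)
    # If the median is not in a, repeat the same search in b.
    return found if found is not None else find_rank_in(b, a, k)


print(median_nested([1, 3, 5, 7], [2, 4, 6, 8]))
```

The outer loop runs O(log n) times and each step performs a constant number of O(log n) bisect calls, which gives the O((log n)^2) bound mentioned above.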

So, having
n = 4 and a = [1, 2, 3, 4] and b = [3, 4, 5, 6]
You know the position k of the result in the merged array in advance: it depends only on n and is equal to n.
The n-th element of the result could be in the first array or in the second.
Let's first assume that the element is in the first array, then
do a binary search, taking the middle element of [l, r]; at the beginning l = 0, r = 3.
Taking the middle element, you know how many elements in the same array are smaller, which is middle - 1.
Knowing that middle - 1 elements are smaller and that you need the n-th element, compare the middle element with the [n - (middle - 1)]-th element of the second array. If that element is greater and the one before it is smaller, the middle element is what you need; if it is greater and the previous one is also greater, set l = middle; if it is smaller, set r = middle.
Then do the same for the second array in case you did not find the solution in the first.
In total: log(n) + log(n).

Related

The best solution (considering time complexity) for the function implementation

A function does the following task:
For example L = [[1, 2, 3], [1, 2], [1, 2, 3, 5, 6, 8], [1, 8, 6, 10, 21], [1, 4, 6, 9], [22]]; (array of arrays)
Find the index in L of the sub-array whose numbers do not appear in any other sub-array. In this example, the function would return 5 (the index of [22]) because 22 appears only in this sub-array.
What would be the optimal solution in terms of time complexity?
The algorithm is to keep track of all the numbers you've seen so far (for example in a hashset), and process the sub-arrays one by one until you find one which matches your condition. In the worst case it's O(n) basic set operations, where n is the sum of the lengths of the subarrays of L. This is O(n) comparisons on average if you use a hashset.
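One way to make the hash-based idea concrete is a two-pass variant: first count in how many sub-arrays each number occurs, then return the first sub-array whose numbers all have a count of 1. This is a sketch under that interpretation of the question; the function name index_of_unique_subarray and the -1 return value for "not found" are assumptions made for the example.

```python
from collections import Counter


def index_of_unique_subarray(L):
    """Return the index of the first sub-array sharing no number with any other."""
    # Pass 1: count the number of sub-arrays each value appears in.
    # Using set(sub) means repeats inside one sub-array are counted once.
    counts = Counter()
    for sub in L:
        counts.update(set(sub))
    # Pass 2: a sub-array qualifies if every value in it was seen in it alone.
    for idx, sub in enumerate(L):
        if all(counts[x] == 1 for x in sub):
            return idx
    return -1  # assumed convention when no sub-array qualifies


L = [[1, 2, 3], [1, 2], [1, 2, 3, 5, 6, 8], [1, 8, 6, 10, 21], [1, 4, 6, 9], [22]]
print(index_of_unique_subarray(L))
```

Both passes touch every element a constant number of times, so the expected cost is linear in the total length of the sub-arrays.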

What does this line mean in Sorting Algo(Bubble example)

def bubbleSort(array):
    swapped = False
    for i in range(len(array)-1,0,-1):
        print(i)
        for j in range(i):
            print(j)
            if array[j]>array[j+1]:
                array[j], array[j+1] = array[j+1], array[j]
                swapped= True
        if swapped:
            swapped=False
        else:
            break
    print(array)
bubbleSort([5, 2, 1, 3])
How should I interpret this line: for i in range(len(array)-1,0,-1)? I'm particularly confused about the need for the 0 and -1 parameters.
That line has a couple of things happening, which I will give simplified explanations of (I'm assuming this code is written in Python).
First, for i in iterable will loop through iterable, meaning the code in the for loop will repeat as many times as there are elements in iterable, which could be an array, a list, a string etc, and each time it loops, i will be the next element of iterable, starting with the first. For example, for i in [1, 2, 3] will loop 3 times; the first time, i will be equal to 1; the second, 2, etc.
Next, the range function produces an iterable that is a range of numbers, for example from 0-9. With a single argument, range will produce a range from 0 to that number, but stopping just before it, e.g. range(5) will give you [0, 1, 2, 3, 4]. Thus if you were to use for i in range(5), your code would repeat 5 times, with i incrementing from 0 to 4.
With two arguments, the range will start at the first and stop before the second, which should be greater than the first. For example, range(3, 8) would give you [3, 4, 5, 6, 7]. range(8, 3), however, will give you an empty range, as the start number is already past the stop number. This means you cannot count down with only 2 arguments.
The third optional argument for range is the step size; how much you want the numbers to increase or decrease by each step. For example, range(0, 10, 2) will give you the output [0, 2, 4, 6, 8], stopping before 10. Here is where you can produce a descending range, by setting the step argument to a negative number. range(10, 0, -2) will give you [10, 8, 6, 4, 2], again stopping before the second argument, and range(10, 0, -1) will give you the full [10, 9, 8, 7, 6, 5, 4, 3, 2, 1].
Finally, the len(iterable) function will give you the length of whatever you give it, or the number of items contained in say a list. For example len("Hello!") will give you 6, and len([1, 2, 3, 4, 5]) will give you 5.
Putting this all together, the line for i in range(len(array)-1, 0, -1) will do the following:
the code will repeat as many times as there are items in a list, with i taking on each value in the list
that list is a range of numbers
the start number of the range is the length of array minus one
the range stops before 0, so 0 itself is not included
the range is descending, with a step size of -1
Thus if array were ["fish", "banana", "pineapple", "onion"], len(array) will return 4, so you will have for i in range(3, 0, -1), which will loop 3 times, with i being 3, then 2, then 1.
This was a rather simplified answer, so I suggest you find some tutorials on any functions you don't understand.
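The range behaviour described in this answer can be checked with a few lines. This is only a small illustration for Python 3, where range returns a lazy object, so list() is used to show the values; the variable names are made up for the example.

```python
array = ["fish", "banana", "pineapple", "onion"]

one_arg = list(range(5))             # counts from 0 and stops before 5
two_args = list(range(3, 8))         # starts at 3 and stops before 8
backwards_no_step = list(range(8, 3))  # start is already past stop: empty
stepping_down = list(range(10, 0, -2))  # negative step counts down
outer_loop = list(range(len(array) - 1, 0, -1))  # the line from the question

print(one_arg)
print(two_args)
print(backwards_no_step)
print(stepping_down)
print(outer_loop)
```

The last list holds the values that i takes in the outer loop of the bubble sort for a four-element array.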

How to find maximum sum of smallest and second smallest elements chosen from all possible subarrays

Given an array, find maximum sum of smallest and second smallest elements chosen from all possible subarrays. More formally, if we write all (nC2) subarrays of array of size >=2 and find the sum of smallest and second smallest, then our answer will be maximum sum among them.
Examples: Input : arr[] = [4, 3, 1, 5, 6] Output : 11
Subarrays with smallest and second smallest are,
[4, 3] smallest = 3 second smallest = 4
[4, 3, 1] smallest = 1 second smallest = 3
[4, 3, 1, 5] smallest = 1 second smallest = 3
[4, 3, 1, 5, 6] smallest = 1 second smallest = 3
[3, 1] smallest = 1 second smallest = 3
[3, 1, 5] smallest = 1 second smallest = 3
[3, 1, 5, 6] smallest = 1 second smallest = 3
[1, 5] smallest = 1 second smallest = 5
[1, 5, 6] smallest = 1 second smallest = 5
[5, 6] smallest = 5 second smallest = 6
Maximum sum among all above choices is, 5 + 6 = 11
This question is on GFG, but I didn't understand its explanation.
Could anybody give a solution with O(n) time complexity?
The question is:
Why are we guaranteed to always find the maximum sum if we only look at all subarrays of length two?
To answer that question, let's assume we have some array A. Inside that array, obviously, there has to be at least one subarray S for which the smallest and second smallest elements, let's call them X and Y, sum up to our result.
If these two elements are already next to each other, this means that there is a subarray of A of length two that will contain X and Y, and thus, if we only look at all the subarrays of length two, we will find X and Y and output X+Y.
However, the question is: Is there any way for our two elements X and Y to not be "neighbors" in S? Well, if this was the case, there obviously would need to be other numbers, let's call them Z0, Z1, ..., between them.
Obviously, for all these values, it would have to hold that Zi >= X and Zi >= Y, because in S, X and Y are the smallest and second smallest elements, so there can be no other numbers smaller than X or Y.
If any of the Zi were strictly bigger than both X and Y, there would be a subarray of A containing only this Zi plus one of its neighbors. In this subarray, Zi and its neighbor would be the smallest and second smallest elements, and, since the neighbor is at least as large as the smaller of X and Y, they would sum to more than X+Y, so S would not have been the subarray giving us our solution. This contradicts our definition of S, so this cannot happen.
So none of the Zi can be smaller than X or Y, and none can be bigger than both. This leaves only one possibility: every Zi is equal to the larger of X and Y (in particular, when X == Y, all of them are equal). But in this case, the smaller of X and Y has a direct neighbor equal to the larger one, so we again have a subarray of length 2 that sums to our correct result.
So, in all cases, we can show that there has to be a subarray of length two where both elements sum up to our result, which is why the algorithm is correct.
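The claim can also be checked empirically. The sketch below compares a brute-force search over all subarrays of length at least 2 with the adjacent-pair scan; both function names are invented for this example, and the input is assumed to have at least two elements.

```python
def max_sum_brute_force(arr):
    """Try every subarray of length >= 2 and sum its two smallest elements."""
    best = None
    for start in range(len(arr) - 1):
        for end in range(start + 2, len(arr) + 1):
            smallest, second = sorted(arr[start:end])[:2]
            if best is None or smallest + second > best:
                best = smallest + second
    return best


def max_sum_adjacent(arr):
    """Only look at subarrays of length two, i.e. adjacent pairs: O(n)."""
    return max(arr[i] + arr[i + 1] for i in range(len(arr) - 1))


arr = [4, 3, 1, 5, 6]
print(max_sum_brute_force(arr), max_sum_adjacent(arr))
```

For the example array both functions agree on the answer given in the question, 5 + 6 = 11.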
First of all, I think you misunderstood the question. If you consider all the sub-arrays carefully, you can see that they are all related; in other words, the result of the previous sub-array carries over into the current sub-array.
From each subarray:
1. we take the sum of the current pair of adjacent values, and if it is greater than the previous largest sum we replace it.
2. we repeat this step for every adjacent pair (in increasing index order).
3. at the end, we hold the largest sum of two neighboring values in the array.
So checking every pair of adjacent values gives an overall complexity of O(N).
int res = arr[0] + arr[1]; //O(1)+O(1)
for (int i=1; i<N-1; i++) // O(N-2) -> O(N)
    res = max(res, arr[i] + arr[i+1]); //O(1)+O(1)+O(1)
Overall complexity: O(N).

Finding minimum element to the right of an index in an array for all indices

Given an array, I wish to find the minimum element to the right of the current element at i, where 0 <= i < n, and store the index of the corresponding minimum element in another array.
For example, I have an array A = {1,3,6,7,8}.
The result array would contain R = {1,2,3,4} (the R array stores the indices of the minimum elements).
I could only think of an O(N^2) approach, where for each element in A, I would traverse the remaining elements to its right and find the minimum.
Is it possible to do this in O(N)? I want to use the solution to solve another problem.
You should be able to do this in O(n) by filling the array from the right hand side and maintaining the index of the current minimum, as per the following pseudo-code:
def genNewArray (oldArray):
    newArray = new array[oldArray.size]
    saveIndex = -1
    for i = newArray.size - 1 down to 0:
        newArray[i] = saveIndex
        if saveIndex == -1 or oldArray[i] < oldArray[saveIndex]:
            saveIndex = i
    return newArray
This passes through the array once, giving you the O(n) time complexity. It can do this because, once you've found a minimum beyond element N, it will only change for element N-1 if element N is less than the current minimum.
The following Python code shows this in action:
def genNewArray (oldArray):
    newArray = []
    saveIndex = -1
    # Walk from the last index down to 0.
    for i in range (len (oldArray) - 1, -1, -1):
        # Record the minimum seen so far, strictly to the right of i.
        newArray.insert (0, saveIndex)
        if saveIndex == -1 or oldArray[i] < oldArray[saveIndex]:
            saveIndex = i
    return newArray

oldList = [1,3,6,7,8,2,7,4]
x = genNewArray (oldList)
print ("idx", [0,1,2,3,4,5,6,7])
print ("old", oldList)
print ("new", x)
The output of this is:
idx [0, 1, 2, 3, 4, 5, 6, 7]
old [1, 3, 6, 7, 8, 2, 7, 4]
new [5, 5, 5, 5, 5, 7, 7, -1]
and you can see that the indexes at each element of the new array correctly point to the minimum value to the right of each element in the original array.
Note that I've taken one specific definition of "to the right of", meaning it doesn't include the current element. If your definition of "to the right of" includes the current element, just change the order of the insert and if statement within the loop so that the index is updated first:
idx [0, 1, 2, 3, 4, 5, 6, 7]
old [1, 3, 6, 7, 8, 2, 7, 4]
new [0, 5, 5, 5, 5, 5, 7, 7]
The code for that removes the check on saveIndex since you know that the minimum index for the last element can be found at the last element:
def genNewArray (oldArray):
    newArray = []
    saveIndex = len (oldArray) - 1
    for i in range (len (oldArray) - 1, -1, -1):
        if oldArray[i] < oldArray[saveIndex]:
            saveIndex = i
        newArray.insert (0, saveIndex)
    return newArray
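One caveat about the Python versions: inserting at the front of a Python list shifts every existing element, so each insert costs time proportional to the list length. A sketch that stays closer to the pseudo-code preallocates the result and assigns by index instead. The function name min_index_to_right is made up for this example; it uses the "strictly to the right" definition and -1 where no such element exists.

```python
def min_index_to_right(values):
    """For each i, return the index of the minimum of values[i+1:], or -1."""
    result = [-1] * len(values)   # preallocate so each write is O(1)
    save_index = -1               # index of the minimum seen so far
    for i in range(len(values) - 1, -1, -1):
        result[i] = save_index    # minimum strictly to the right of i
        if save_index == -1 or values[i] < values[save_index]:
            save_index = i
    return result


print(min_index_to_right([1, 3, 6, 7, 8, 2, 7, 4]))
```

The single backward pass with constant work per element is what gives the O(n) bound.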
Looks like HW. Let f(i) denote the index of the minimum element to the right of the element at i. Now consider walking backwards (filling in f(n-1), then f(n-2), f(n-3), ..., f(3), f(2), f(1)) and think about how information of f(i) can give you information of f(i-1).

Partitioning a superset and getting the list of original sets for each partition

Introduction
While trying to do some categorization on nodes in a graph (which will be rendered differently), I find myself confronted with the following problem:
The Problem
Given a superset of elements S = {0, 1, ... M} and a number n of non-disjoint subsets T_i thereof, with 0 <= i < n, what is the best algorithm to find out the partition of the set S called P?
P = S is the union of all disjoint partitions P_j of the original superset S, with 0 <= j < M, such that for all elements x in P_j, every x has the same list of "parents" among the "original" sets T_i.
Example
S = [1, 2, 3, 4, 5, 6, 8, 9]
T_1 = [1, 4]
T_2 = [2, 3]
T_3 = [1, 3, 4]
So all P_js would be:
P_1 = [1, 4] # all elements x have the same list of "parents": T_1, T_3
P_2 = [2] # all elements x have the same list of "parents": T_2
P_3 = [3] # all elements x have the same list of "parents": T_2, T_3
P_4 = [5, 6, 8, 9] # all elements x have the same list of "parents": S (so they're not in any of the T_i)
Questions
What are good functions/classes in the python packages to compute all P_js and the list of their "parents", ideally restricted to numpy and scipy? Perhaps there's already a function which does just that
What is the best algorithm to find those partitions P_js and for each one, the list of "parents"? Let's note T_0 = S
I think the brute force approach would be to generate all 2-combinations of T sets and split them in at most 3 disjoint sets, which would be added back to the pool of T sets and then repeat the process until all resulting Ts are disjoint, and thus we've arrived at our answer - the set of P sets. A little problematic could be caching all the "parents" on the way there.
I suspect a dynamic programming approach could be used to optimize the algorithm.
Note: I would have loved to write the math parts in latex (via MathJax), but unfortunately this is not activated :-(
The following should be linear time (in the number of the elements in the Ts).
from collections import defaultdict

S = [1, 2, 3, 4, 5, 6, 8, 9]
T_1 = [1, 4]
T_2 = [2, 3]
T_3 = [1, 3, 4]
Ts = [S, T_1, T_2, T_3]

parents = defaultdict(int)
for i, T in enumerate(Ts):
    for elem in T:
        parents[elem] += 2 ** i

children = defaultdict(list)
for elem, p in parents.items():
    children[p].append(elem)

print(list(children.values()))
Result (the order of the groups may differ depending on the Python version):
[[5, 6, 8, 9], [1, 4], [2], [3]]
The way I'd do this is to construct an M × n boolean array In where In(i, j) = Si ∈ Tj. You can construct that in O(Σj|Tj|), provided you can map an element of S onto its integer index in O(1), by scanning all of the sets T and marking the corresponding bit in In.
You can then read the "signature" of each element i directly from In by concatenating row i into a binary number of n bits. The signature is precisely the equivalence relationship of the partition you are seeking.
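The boolean-array idea can be sketched in plain Python as follows. Instead of packing each row into a binary number, this version uses the row itself (as a tuple) as the signature; the function name partition_by_signature and the use of a dict for the element-to-index mapping are assumptions made for the example.

```python
from collections import defaultdict


def partition_by_signature(S, Ts):
    """Group the elements of S by which of the sets in Ts contain them."""
    index_of = {elem: i for i, elem in enumerate(S)}   # O(1) element -> row
    # member[i][j] is True when S[i] is in Ts[j].
    member = [[False] * len(Ts) for _ in S]
    for j, T in enumerate(Ts):
        for elem in T:
            member[index_of[elem]][j] = True
    # Elements with identical rows have the same "parents".
    groups = defaultdict(list)
    for elem, row in zip(S, member):
        groups[tuple(row)].append(elem)
    return dict(groups)


S = [1, 2, 3, 4, 5, 6, 8, 9]
Ts = [[1, 4], [2, 3], [1, 3, 4]]
for signature, elems in partition_by_signature(S, Ts).items():
    print(signature, elems)
```

Each key doubles as the list of parents: position j of the tuple says whether the group belongs to the j-th set in Ts.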
By the way, I'm in total agreement with you about Math markup. Perhaps it's time to mount a new campaign.

Resources