What is the time complexity of the following merge algorithm?

You are given int[][] lists (an array of sorted int arrays).
You are to merge all of them into a single sorted list.
What's the time complexity?
I've tried doing this by dividing up the array into pairs and merging all the pairs in parallel.
public static List<Integer> merge(int[][] array) throws InterruptedException {
    // number of arrays remaining to be merged
    int remaining = array.length;
    while (remaining > 2) {
        List<Thread> threads = new LinkedList<>();
        for (int i = 0; i < remaining - 1; i += 2) {
            // DoMerge is a Runnable that merges two arrays
            // in O(n1 + n2) time (n1 and n2 are the lengths
            // of the two given arrays).
            // DoMerge also stores the merged array
            // at position i in the given array.
            Thread mergeThread = new Thread(new DoMerge(i, i + 1, array));
            threads.add(mergeThread);
            mergeThread.start();
        }
        // wait for them all to finish
        for (Thread t : threads) {
            t.join();
        }
        // move the newly merged arrays to the front
        for (int j = 1, i = 2; i < remaining; i += 2, ++j) {
            array[j] = array[i];
            array[i] = null;
        }
        remaining = (int) Math.ceil(remaining / 2.0);
    }
    // combine merges the final two arrays into a List<Integer>
    return combine(array[0], array[1]);
}
(Assume the number of processors >= array.length.)
I think the time complexity of this is k·log(n), where k is the maximum length of each array to be merged and n is the number of arrays.
Is this correct?

Unfortunately no, this is incorrect.
Let's assume a "best case" scenario where k = n and all initial arrays have size k, which would be the best partitioning possible. For simplicity, let's also assume n is a power of 2.
Every iteration is fully parallel thanks to the assumption about processor availability, so the first iteration has a time complexity of O(k + k) (this is the same as O(k), but hold on).
The second iteration works on arrays of size 2k, so its time complexity is O(2k + 2k); the next iteration is O(4k + 4k), and so on, up to the last iteration, which has a time complexity of O((n/2)k + (n/2)k). That is to be expected, since in the end you merge the last two halves and create the full array.
Let's sum all iterations: 2k + 4k + 8k + ... + nk = O(nk).
You can't go below nk since you must produce the full array of nk elements in the end, so k·log(n) isn't possible.
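For reference, the DoMerge runnable isn't shown in the question; a minimal sketch of what it's assumed to do (a plain two-way merge that stores the result back at the left position) might look like this:

// Hypothetical sketch of the DoMerge runnable the question assumes:
// merges array[left] and array[right] in O(n1 + n2) time and stores
// the merged result at array[left].
class DoMerge implements Runnable {
    private final int left, right;
    private final int[][] array;

    DoMerge(int left, int right, int[][] array) {
        this.left = left;
        this.right = right;
        this.array = array;
    }

    @Override
    public void run() {
        int[] a = array[left], b = array[right];
        int[] merged = new int[a.length + b.length];
        int i = 0, j = 0, out = 0;
        while (i < a.length && j < b.length) {
            merged[out++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        while (i < a.length) merged[out++] = a[i++];
        while (j < b.length) merged[out++] = b[j++];
        array[left] = merged;
    }
}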

Related

Choosing k out of n

I want to choose k elements uniformly at random out of a possible n without choosing the same number twice. There are two trivial approaches to this.
1. Make a list of all n possibilities. Shuffle them (you don't need to shuffle all n numbers, just k of them, by performing the first k steps of Fisher-Yates). Choose the first k. This approach takes O(k) time (assuming allocating an array of size n takes O(1) time) and O(n) space. This is a problem if k is very small relative to n.
2. Store a set of seen elements. Choose a number at random from [0, n-1]. While the element is in the set, choose a new number. This approach takes O(k) space. The run-time is a little more complicated to analyze. If k = Θ(n) then the run-time is O(k·lg(k)) = O(n·lg(n)) because it is the coupon collector's problem. If k is small relative to n then it takes slightly more than O(k) because of the (albeit low) probability of choosing the same number twice. This is better than the above solution in terms of space but worse in terms of run-time.
My question:
is there an O(k) time, O(k) space algorithm for all k and n?
With an O(1) hash table, the partial Fisher-Yates method can be made to run in O(k) time and space. The trick is simply to store only the changed elements of the array in the hash table.
Here's a simple example in Java:
public static int[] getRandomSelection(int k, int n, Random rng) {
    if (k > n) throw new IllegalArgumentException(
        "Cannot choose " + k + " elements out of " + n + "."
    );

    HashMap<Integer, Integer> hash = new HashMap<Integer, Integer>(2 * k);
    int[] output = new int[k];

    for (int i = 0; i < k; i++) {
        int j = i + rng.nextInt(n - i);
        output[i] = (hash.containsKey(j) ? hash.remove(j) : j);
        if (j > i) hash.put(j, (hash.containsKey(i) ? hash.remove(i) : i));
    }
    return output;
}
This code allocates a HashMap of 2×k buckets to store the modified elements (which should be enough to ensure that the hash table is never rehashed), and just runs a partial Fisher-Yates shuffle on it.
Here's a quick test on Ideone; it picks two elements out of three 30,000 times and counts the number of times each pair of elements gets chosen. For an unbiased shuffle, each ordered pair should appear approximately 5,000 (±100 or so) times, except for the impossible cases where both elements would be equal.
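For example, a quick usage sketch (assuming java.util.Random and java.util.Arrays are imported; the numbers are just illustrative):

// Pick 5 distinct values from [0, 1000000) without allocating an O(n) array.
Random rng = new Random();
int[] picks = getRandomSelection(5, 1000000, rng);
System.out.println(Arrays.toString(picks));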
Your second approach does not take Θ(k log k) time on average; it takes about n/(n-k+1) + n/(n-k+2) + ... + n/n operations, which is less than k·(n/(n-k)) since you have k terms, each smaller than n/(n-k). For k <= n/2, it takes under 2k operations on average. For k > n/2, you can choose a random subset of size n-k and take the complement. So this is already an O(k) average time and space algorithm.
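A rough Java sketch of that second approach, including the complement trick for k > n/2 (the helper name and the use of a HashSet are illustrative, not from the question):

// Rejection sampling: expected O(k) time and O(k) space for any k <= n.
static Set<Integer> sampleWithoutReplacement(int k, int n, Random rng) {
    boolean useComplement = k > n / 2;
    int target = useComplement ? n - k : k;
    Set<Integer> chosen = new HashSet<>();
    while (chosen.size() < target) {
        chosen.add(rng.nextInt(n));   // duplicates are simply rejected by the set
    }
    if (!useComplement) return chosen;
    // For k > n/2 we have n < 2k, so building the complement is still O(k).
    Set<Integer> result = new HashSet<>();
    for (int i = 0; i < n; i++) {
        if (!chosen.contains(i)) result.add(i);
    }
    return result;
}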
What you could use is the following algorithm (using javascript instead of pseudocode):
var k = 3;
var n = [1, 2, 3, 4, 5, 6];

// O(k) iterations
for (var i = 0, tmp; i < k; ++i) {
    // Random index O(1)
    var index = Math.floor(Math.random() * (n.length - i));
    // Output O(1)
    console.log(n[index]);
    // Swap and lookup O(1)
    tmp = n[index];
    n[index] = n[n.length - i - 1];
    n[n.length - i - 1] = tmp;
}
In short, you swap the selected value with the last item and in the next iteration sample from the reduced subset. This assumes your original set is wholly unique.
The storage is O(n); if you wish to retrieve the chosen numbers as a set, just take the last k entries of n.

Maximize sum of list with no more than k consecutive elements from input

I have an array of N numbers, and I want to remove only those elements from the list which, when removed, will create a new list where there are no more than K numbers adjacent to each other. There can be multiple lists that can be created with this restriction. I just want the list in which the sum of the remaining numbers is maximum, and as output print that sum only.
The algorithm that I have come up with so far has a time complexity of O(n^2). Is it possible to get better algorithm for this problem?
Link to the question.
Here's my attempt:
#include <stdio.h>

int main()
{
    // Total number of elements in the list
    int count = 6;
    // Maximum number of elements that can be together
    int maxTogether = 1;
    // The list of numbers
    int billboards[] = {4, 7, 2, 0, 8, 9};
    int maxSum = 0;
    for (int k = 0; k <= maxTogether; k++) {
        int sum = 0;
        int size = k;
        for (int i = 0; i < count; i++) {
            if (size != maxTogether) {
                sum += billboards[i];
                size++;
            } else {
                size = 0;
            }
        }
        printf("%i\n", sum);
        if (sum > maxSum) {
            maxSum = sum;
        }
    }
    return 0;
}
The O(NK) dynamic programming solution is fairly easy:
Let A[i] be the best sum of the elements to the left subject to the not-k-consecutive constraint (assuming we're removing the i-th element as well).
Then we can calculate A[i] by looking back K elements:
A[i] = 0
for j = 1 to k
    A[i] = max(A[i], A[i-j])
A[i] += input[i]
And, at the end, just look through the last k elements from A, adding the elements to the right to each and picking the best one.
But this is too slow.
Let's do better.
So A[i] finds the best from A[i-1], A[i-2], ..., A[i-K+1], A[i-K].
So A[i+1] finds the best from A[i], A[i-1], A[i-2], ..., A[i-K+1].
There's a lot of redundancy there - we already know the best from indices i-1 through i-K because of A[i]'s calculation, but then we find the best of all of those except i-K (with i) again in A[i+1].
So we can just store all of them in an ordered data structure and then remove A[i-K] and insert A[i]. My choice - A binary search tree to find the minimum, along with a circular array of size K+1 of tree nodes, so we can easily find the one we need to remove.
I swapped the problem around to make it slightly simpler - instead of finding the maximum of remaining elements, I find the minimum of removed elements and then return total sum - removed sum.
High-level pseudo-code:
for each i in input
    add (i + the smallest value in the BST) to the BST
    add the above node to the circular array
    if it wrapped around, remove the overwritten element from the BST

// now the remaining nodes in the BST are the last k elements
return (the total sum - the smallest value in the BST)
Running time:
O(n log k)
Java code:
int getBestSum(int[] input, int K)
{
    Node[] array = new Node[K + 1];
    TreeSet<Node> nodes = new TreeSet<Node>();
    Node n = new Node(0);
    nodes.add(n);
    array[0] = n;
    int arrPos = 0;
    int sum = 0;
    for (int i : input)
    {
        sum += i;
        Node oldNode = nodes.first();
        Node newNode = new Node(oldNode.value + i);
        arrPos = (arrPos + 1) % array.length;
        if (array[arrPos] != null)
            nodes.remove(array[arrPos]);
        array[arrPos] = newNode;
        nodes.add(newNode);
    }
    return sum - nodes.first().value;
}
getBestSum(new int[]{1,2,3,1,6,10}, 2) returns 21, as required.
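The Node type isn't defined in the answer; a minimal sketch of what it's assumed to be (a comparable wrapper with a tie-breaker, so the TreeSet can hold duplicate values) might look like this:

class Node implements Comparable<Node> {
    private static int nextId = 0;
    final int value;
    private final int id = nextId++;   // tie-breaker so equal values can coexist in the TreeSet

    Node(int value) { this.value = value; }

    @Override
    public int compareTo(Node other) {
        if (value != other.value) return Integer.compare(value, other.value);
        return Integer.compare(id, other.id);
    }
}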
Let f[i] be the maximum total value you can get with the first i numbers, given that you don't choose the last (i.e. the i-th) one. Then we have
f[i] = max{
    f[i-1],
    max{ f[j] + sum(j + 1, i - 1) | (i - j) <= k }
}
You can use a heap-like data structure to maintain the candidates and get the maximum one in O(log n) time; keep a global delta (or similar) for the running sums, and pay attention to the range constraint i - j <= k.
The following algorithm is of O(N*K) complexity.
Examine the first K elements (0 to K-1) of the array. There can be at most 1 gap in this region.
Reason: If there were two gaps, then there would be no reason to keep the earlier gap.
For each index i of these K gap options, the following holds true:
1. The sum up to i-1 is the present score of each option.
2. If the next gap is after a distance of d, then the options for d are (K - i) to K.
For every possible position of a gap, calculate the best sum up to that position among the options.
The latter part of the array can be traversed similarly independently from the past gap history.
Traverse the array further till the end.

How to increment all values in an array interval by a given amount

Suppose i have an array A of length L. I will be given n intervals(i,j) and i have to increment all values between A[i] and A[j].Which data structure would be most suitable for the given operations?
The intervals are known beforehand.
You can get O(N + M). Keep an extra increment array B of the same size as A, initially empty (filled with 0). If you need to increment the range (i, j) by value k, then do B[i] += k and B[j + 1] -= k.
Now do a partial sum transformation in B, considering you're indexing from 0:
for (int i = 1; i < N; ++i) B[i] += B[i - 1];
And now the final values of A are A[i] + B[i].
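A minimal Java sketch of that idea (names are illustrative; intervals are taken as inclusive [i, j] pairs):

// Apply M range increments of value k to an array of length N in O(N + M) total.
static void applyIncrements(int[] a, int[][] intervals, int k) {
    int n = a.length;
    int[] b = new int[n + 1];                 // difference array; extra slot handles j + 1 == n
    for (int[] interval : intervals) {
        int i = interval[0], j = interval[1];
        b[i] += k;
        b[j + 1] -= k;
    }
    int running = 0;
    for (int i = 0; i < n; i++) {             // partial-sum transform folded into A
        running += b[i];
        a[i] += running;
    }
}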
Break all intervals into start and end indexes: s_i, e_i for the i-th interval, which starts at s_i (inclusive) and ends at e_i (exclusive).
Sort all s_i-s as an array S.
Sort all e_i-s as an array E.
Set increment to zero.
Start a linear scan of the input and add increment to every element;
in each loop iteration, if the next s_i equals the current index, increment increment; if the next e_i equals the current index, decrement increment.
inc = 0
s = <PriorityQueue of interval start indexes>
e = <PriorityQueue of interval end indexes>

for (i = 0; i < n; i++) {
    if (inc == 0) {
        // skip adding zeros
        i = min(s.peek(), e.peek())
    }
    while (s.peek() == i) {
        s.pop();
        inc++;
    }
    while (e.peek() == i) {
        e.pop();
        inc--;
    }
    a[i] += inc;
}
Complexity (without skipping non-incremented elements): O(n + m·log(m)), where m is the number of intervals.
If n >> m then it's O(n).
Complexity when skipping elements: O(min(n, sum over intervals of length(I_i))), where length(I_i) = e_i - s_i.
There are three main approaches that I can think of:
Approach 1
This is the simplest one, where you just keep the array as is, and do the naive thing for increment.
Pros: Querying is constant time
Cons: Increment can be linear time (and hence pretty slow if L is big)
Approach 2
This one is a little more complicated, but is better if you plan on incrementing a lot.
Store the elements in a binary tree so that an in-order traversal accesses the elements in order. Each node (aside from the normal left and right children) also stores an extra int addOn, which means "add me when you query any node in this subtree".
For querying elements, do the normal binary search on index to find the element, adding up all of the values of the addOn variables as you go. Add those to the A[i] at the node you want, and that's your value.
For increments, traverse down into the tree, updating all of these new addOns as necessary. Note that if you add the incremented value to an addOn for one node, you do not update it for the two children. The runtime for each increment is then O(log L), since the only times you ever have to "branch off" into the children is when the first or last element in the interval is in your range. Hence, you branch off at most 2 log L times, and access a constant factor more in elements.
Pros: Increment is now O(log L), so now things are much faster than before if you increment a ton.
Cons: Queries take longer (also O(log L)), and the implementation is much trickier.
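For reference, one common way to get the same O(log L) bounds for both operations is a Fenwick (binary indexed) tree over the difference array; this is a sketch of that alternative, not of the exact addOn tree described above:

// Range increment and point query, each in O(log L),
// via a Fenwick tree built over the difference array.
class RangeIncrementBIT {
    private final long[] tree;   // 1-based Fenwick tree
    private final int size;

    RangeIncrementBIT(int size) {
        this.size = size;
        this.tree = new long[size + 1];
    }

    // point update on the underlying difference array
    private void add(int index, long delta) {
        for (int i = index + 1; i <= size; i += i & -i) tree[i] += delta;
    }

    // add delta to every element in the inclusive range [from, to]
    void incrementRange(int from, int to, long delta) {
        add(from, delta);
        if (to + 1 < size) add(to + 1, -delta);
    }

    // total increment applied so far at position index (prefix sum of differences)
    long get(int index) {
        long sum = 0;
        for (int i = index + 1; i > 0; i -= i & -i) sum += tree[i];
        return sum;
    }
}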
Approach 3
Use an interval tree.
Pros: Just like approach 2, this one can be much faster than the naive approach
Cons: Not doable if you don't know what the intervals are going to be beforehand. Also tricky to implement.
Solve the problem for a single interval. Then iterate over all intervals and apply the single-interval solution for each. The best data structure depends on the language. Here's a Java example:
public class Interval {
    int i;
    int j;
}

public void increment(int[] array, Interval interval) {
    for (int i = interval.i; i < interval.j; ++i) {
        ++array[i];
    }
}

public void increment(int[] array, Interval[] intervals) {
    for (Interval interval : intervals) {
        increment(array, interval);
    }
}
Obviously you could nest one loop inside the other if you wanted to reduce the amount of code. However, a single-interval method might be useful in its own right.
EDIT
If the intervals are known beforehand, then you can improve things a bit. You can modify the Interval structure to maintain an increment amount (which defaults to 1). Then preprocess the set of intervals S as follows:
Initialize a second set of intervals T to the empty set
For each interval I in S: if I does not overlap any interval in T, add I to T; otherwise:
For each interval J in T that overlaps I, remove J from T, form new intervals K1...Kn from I and J such that there are no overlaps (n can be from 1 to 3), and add K1...Kn to T
When this finishes, use the intervals in T with the earlier code (modified as described). Since there are no overlaps, no element of the array will be incremented more than once. For a fixed set of intervals, this is a constant time algorithm, regardless of the array length.
For N intervals, the splitting process can probably be designed to run in something close to O(N log N) by keeping T ordered by interval start index. But if the cost is amortized among many array increment operations, this isn't all that important to the overall complexity.
A possible implementation of the O(M+N) algorithm suggested by Adrian Budau:
import java.util.Scanner;

class Interval {
    int i;
    int j;
}

public class IncrementArray {
    public static void main(String[] args) {
        int k = 5;                                        // increase array elements by this value
        Scanner sc = new Scanner(System.in);
        int intervalNo = sc.nextInt();                    // specify no of intervals
        Interval[] interval = new Interval[intervalNo];   // array containing ranges/intervals
        System.out.println(">" + sc.nextLine() + "<");    // consume the rest of the first line
        for (int i = 0; i < intervalNo; i++) {
            interval[i] = new Interval();
            String s = sc.nextLine();                     // specify i and j separated by a space on one line
            String[] s1 = s.split(" ");
            interval[i].i = Integer.parseInt(s1[0]);
            interval[i].j = Integer.parseInt(s1[1]);
        }
        int[] arr = new int[10];                          // array whose values need to be incremented
        for (int i = 0; i < arr.length; ++i)
            arr[i] = i + 1;                               // initialising array
        int[] temp = new int[10];
        Interval run = interval[0];
        int i;
        for (i = 0; i < intervalNo; i++, run = interval[i < intervalNo ? i : 0]) // i<intervalNo?i:0 avoids an out-of-bounds access on the last iteration
        {
            temp[run.i] += k;
            if (run.j + 1 < 10)                           // incrementing temp within array bounds
                temp[run.j + 1] -= k;
        }
        for (i = 1; i < 10; ++i)
            temp[i] += temp[i - 1];
        for (i = 0; i < 10; i++) {
            arr[i] += temp[i];
            System.out.print(" " + arr[i]);               // printing results
        }
    }
}

Amortized Time Cost using Accounting Method

I wrote an algorithm to calculate the next lexicographic permutation of an array of integers (e.g. 123, 132, 213, 231, 312, 321). I don't think the code is necessary, but I included it below.
I think I have appropriately determined the worst-case time cost of O(n), where n is the number of elements in the array. I understand, however, that if you use amortized analysis you would find that the time cost can accurately be shown to be O(1) in the average case.
Question:
I would like to learn the accounting method in order to show this as O(1), but am having difficulty understanding how to apply a cost to each operation. Accounting method: Link: Accounting_Method_Explained
Thoughts:
I've thought about applying a cost to changing a value at a position, or applying the cost to a swap, but it really doesn't make much sense.
public static int[] getNext(int[] array) {
    int temp;
    int j = array.length - 1;
    int k = array.length - 1;

    // Find the largest index j with a[j] < a[j+1]
    // (the next adjacent pair of values not in descending order)
    do {
        j--;
        if (j < 0) {
            // Edge case: this is the largest permutation, return the reverse order
            for (int x = 0, y = array.length - 1; x < y; x++, y--) {
                temp = array[x];
                array[x] = array[y];
                array[y] = temp;
            }
            return array;
        }
    } while (array[j] > array[j + 1]);

    // Find index k such that a[k] is the smallest integer
    // greater than a[j] to the right of a[j]
    for (; array[j] > array[k]; k--);

    // Swap the two elements found at j and k
    temp = array[k];
    array[k] = array[j];
    array[j] = temp;

    // Reverse the elements to the right of j so they are in ascending order;
    // this ensures you get the next smallest permutation after swapping j and k
    int r = array.length - 1;
    int s = j + 1;
    while (r > s) {
        temp = array[s];
        array[s++] = array[r];
        array[r--] = temp;
    }
    return array;
} // end getNext
Measure running time in swaps, since the other work per iteration is worst-case O(#swaps).
The swap of array[j] and array[k] has virtual cost 2. The other swaps have virtual cost 0. Since at most one swap per iteration is costly, the running time per iteration is amortized constant (assuming that we don't go into debt).
To show that we don't go into debt, it suffices to show that, if the swap of array[j] and array[k] leaves a credit at position j, then every other swap involves a position with a credit available, which is consumed. Case analysis and induction reveal that, between iterations, if an item is larger than the one immediately following it, then it was put in its current position by a swap that left an as-yet unconsumed credit.
This problem is not a great candidate for the accounting method, given the comparatively simple potential function that can be used: number of indexes j such that array[j] > array[j + 1].
From the aggregate analysis, we see T(n) < n!·e < n!·3, so we pay $3 for each operation, and it's enough to cover the total of n! operations. Therefore it's an upper bound on the actual cost, so the total amortized cost is O(1) per operation.

Find the x smallest integers in a list of length n

You have a list of n integers and you want the x smallest. For example,
x_smallest([1, 2, 5, 4, 3], 3) should return [1, 2, 3].
I'll vote up unique runtimes within reason and will give the green check to the best runtime.
I'll start with O(n * x): Create an array of length x. Iterate through the list x times, each time pulling out the next smallest integer.
Edits
You have no idea how big or small these numbers are ahead of time.
You don't care about the final order, you just want the x smallest.
This is already being handled in some solutions, but let's say that while you aren't guaranteed a unique list, you aren't going to get a degenerate list either, such as [1, 1, 1, 1, 1].
You can find the k-th smallest element in O(n) time. This has been discussed on StackOverflow before. There are relatively simple randomized algorithms, such as QuickSelect, that run in O(n) expected time and more complicated algorithms that run in O(n) worst-case time.
Given the k-th smallest element you can make one pass over the list to find all elements less than the k-th smallest and you are done. (I assume that the result array does not need to be sorted.)
Overall run-time is O(n).
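A rough Java sketch of that approach, using a simple randomized quickselect (helper names are illustrative; assumes 0 < x <= input.length):

import java.util.Random;

class XSmallest {
    private static final Random RNG = new Random();

    // Returns the x smallest values of input (in no particular order), O(n) expected time.
    static int[] xSmallest(int[] input, int x) {
        int[] copy = input.clone();
        quickSelect(copy, 0, copy.length - 1, x - 1);   // x smallest end up in copy[0..x-1]
        int[] result = new int[x];
        System.arraycopy(copy, 0, result, 0, x);
        return result;
    }

    // Rearranges a so that a[k] holds the k-th smallest value,
    // with everything before it <= a[k] and everything after it >= a[k].
    private static void quickSelect(int[] a, int lo, int hi, int k) {
        while (lo < hi) {
            int p = partition(a, lo, hi, lo + RNG.nextInt(hi - lo + 1));
            if (p == k) return;
            if (k < p) hi = p - 1; else lo = p + 1;
        }
    }

    // Lomuto partition: places the chosen pivot at its final position and returns that index.
    private static int partition(int[] a, int lo, int hi, int pivotIndex) {
        int pivot = a[pivotIndex];
        swap(a, pivotIndex, hi);
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}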
Maintain the list of the x smallest so far, in sorted order, in a skip-list. Iterate through the array. For each element, find where it would be inserted in the skip list (log x time). If it lands in the interior of the list, it is one of the smallest x so far, so insert it and remove the element at the end of the list. Otherwise do nothing.
Time O(n*log(x))
Alternative implementation: maintain the collection of the x smallest so far in a max-heap, compare each new element with the top element of the heap, and pop + insert the new element only if it is less than the top element. Since comparison to the top element is O(1) and pop/insert is O(log x), this is also O(n log(x)).
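A minimal Java sketch of the max-heap variant (java.util.PriorityQueue with a reversed comparator; the method name is illustrative):

import java.util.Collections;
import java.util.PriorityQueue;

// Keep the x smallest values seen so far in a max-heap; O(n log x) overall.
static int[] xSmallestWithHeap(int[] input, int x) {
    PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
    for (int value : input) {
        if (maxHeap.size() < x) {
            maxHeap.offer(value);
        } else if (value < maxHeap.peek()) {
            maxHeap.poll();               // drop the largest of the current candidates
            maxHeap.offer(value);
        }
    }
    int[] result = new int[maxHeap.size()];
    for (int i = 0; i < result.length; i++) result[i] = maxHeap.poll();  // largest first
    return result;
}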
Add all n numbers to a heap and delete x of them. Complexity is O((n + x) log n). Since x is obviously less than n, it's O(n log n).
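A quick sketch of that in Java (PriorityQueue is a min-heap by default; the method name is illustrative):

import java.util.PriorityQueue;

// Build a heap of all n values, then delete the x smallest.
static int[] xSmallestFullHeap(int[] input, int x) {
    PriorityQueue<Integer> heap = new PriorityQueue<>();
    for (int value : input) heap.offer(value);               // O(n log n)
    int[] result = new int[x];
    for (int i = 0; i < x; i++) result[i] = heap.poll();     // O(x log n), ascending order
    return result;
}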
If the range of numbers (L) is known, you can do a modified counting sort.
given L, x, input[]

counts <- array[0..L]
for each number in input
    increment counts[number]
next

# populate the output
index <- 0
xIndex <- 0
while xIndex < x and index <= L
    if counts[index] > 0 then
        decrement counts[index]
        output[xIndex] = index
        increment xIndex
    else
        increment index
    end if
loop
This has a runtime of O(n + L) (with memory overhead of O(L)) which makes it pretty attractive if the range is small (L < n log n).
def x_smallest(items, x):
    result = sorted(items[:x])
    for i in items[x:]:
        if i < result[-1]:
            result[-1] = i
            j = x - 1
            while j > 0 and result[j] < result[j-1]:
                result[j-1], result[j] = result[j], result[j-1]
                j -= 1
    return result
Worst case is O(x*n), but will typically be closer to O(n).
Pseudocode:
def x_smallest(array<int> arr, int limit)
    array<int> ret = new array[limit]
    ret = {INT_MAX}                  // fill every slot with INT_MAX
    for i in arr
        for j in range(0..limit)
            if (i < ret[j])
                swap(i, ret[j])      // keep the displaced value and let it cascade to a later slot
            endif
        endfor
    endfor
    return ret
enddef
In pseudocode:
y = length of list / 2
if (x > y)
    iterate and pop off the (length - x) largest
else
    iterate and pop off the x smallest
O(n/2 * x) ?
sort array
slice array 0 x
Choose the best sort algorithm and you're done: http://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_of_algorithms
You can sort then take the first x values?
Java: with QuickSort O(n log n)
import java.util.Arrays;
import java.util.Random;

public class Main {
    public static void main(String[] args) {
        Random random = new Random(); // Random number generator
        int[] list = new int[1000];
        int length = 3;

        // Initialize array with positive random values
        for (int i = 0; i < list.length; i++) {
            list[i] = Math.abs(random.nextInt());
        }

        // Solution
        int[] output = findSmallest(list, length);

        // Display results
        for (int x : output)
            System.out.println(x);
    }

    private static int[] findSmallest(int[] list, int length) {
        // A tuned quicksort
        Arrays.sort(list);
        // Send back the first `length` elements
        return Arrays.copyOf(list, length);
    }
}
It's pretty fast.
private static int[] x_smallest(int[] input, int x)
{
    int[] output = new int[x];
    for (int i = 0; i < x; i++) { // O(x)
        output[i] = input[i];
    }

    for (int i = x; i < input.Length; i++) { // + O(n-x)
        int current = input[i];
        int temp;

        for (int j = 0; j < output.Length; j++) { // * O(x)
            if (current < output[j]) {
                temp = output[j];
                output[j] = current;
                current = temp;
            }
        }
    }

    return output;
}
Looking at the complexity:
O(x + (n-x) * x) -- assuming x is some constant, O(n)
What about using a splay tree? Because of the splay tree's unique approach to adaptive balancing, it makes for a slick implementation of the algorithm, with the added benefit of being able to enumerate the x items in order afterwards. Here is some pseudocode.
public SplayTree GetSmallest(int[] array, int x)
{
    var tree = new SplayTree();
    for (int i = 0; i < array.Length; i++)
    {
        int max = tree.GetLargest();
        if (array[i] < max || tree.Count < x)
        {
            if (tree.Count >= x)
            {
                tree.Remove(max);
            }
            tree.Add(array[i]);
        }
    }
    return tree;
}
The GetLargest and Remove operations have an amortized complexity of O(log(n)), but because the last accessed item bubbles to the top, they would normally be O(1). So the space complexity is O(x) and the runtime complexity is O(n·log(x)). If the array happens to already be ordered, then this algorithm achieves its best-case complexity of O(n) with either an ascending or descending ordered array. However, a very odd or peculiar ordering could result in O(n^2) complexity. Can you guess how the array would have to be ordered for that to happen?
In scala, and probably other functional languages, a no brainer:
scala> List (1, 3, 6, 4, 5, 1, 2, 9, 4) sortWith ( _<_ ) take 5
res18: List[Int] = List(1, 1, 2, 3, 4)
