Algorithms: Runtime Complexity

At the careercup site, there was this problem (https://careercup.com/question?id=6270877443293184):
Given an array, find the number of tuples such that
A[i] + A[j] + A[k] = A[l], where i < j < k < l.
The proposed solution there (below) works, but it states a runtime complexity of O(n^2). After analyzing the code, I don't think it can be done in less than O(n^2 * log n). My rationale is that it iterates through all elements of the 2D array (which is n^2) and then, for the list that contains the candidate indices, checks each one, which is O(n). Even using a TreeMap and doing a binary search can only reduce that to log n, not to constant time. Can someone confirm whether this can be done in O(n^2) and explain what is incorrect in my logic?
Proposed solution:
Fill two 2D arrays with
arr1[i][j] = a[i] + a[j]
arr2[i][j] = a[j] - a[i]
for j > i.
In a map<int, list<int>>, do map[arr1[i][j]].push_back(j).
For each element in arr2, search in the map.
Count all hits where j < k.

It's pretty easy to insert j in increasing order in map[arr1[i][j]].
If you enumerate the elements of arr2 in increasing k order, you don't have to do a binary search; you can just enumerate all the j.
Because you are going in increasing k order, for each list in the map you only have to remember the last position you looked at, so the map should rather be a map<int, pair<int, list<int>>>.
Since you only touch each j once, the complexity is O(n^2).
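A minimal C++ sketch of this idea (the function name and the exact bookkeeping are mine): k is processed in increasing order, and a pair (i, j) is added to the map only after j < k is guaranteed, so every lookup hit is already valid and the map only needs counts rather than lists of j. With an unordered_map this runs in O(n^2) expected time; the worst-case hashing caveat in the next answer still applies.

#include <bits/stdc++.h>
using namespace std;

// Sketch only: count tuples i < j < k < l with a[i] + a[j] + a[k] == a[l].
// Pairs (i, j) are inserted in increasing j order, and only pairs with
// j < k are present in the map when k is processed.
long long countTuples(const vector<int>& a) {
    int n = a.size();
    unordered_map<long long, long long> pairsWithSum;   // sum -> number of pairs (i, j) with j < current k
    long long count = 0;
    for (int k = 1; k + 1 < n; ++k) {
        // make all pairs whose larger index is k-1 available
        for (int i = 0; i < k - 1; ++i)
            ++pairsWithSum[(long long)a[i] + a[k - 1]];
        for (int l = k + 1; l < n; ++l) {
            auto it = pairsWithSum.find((long long)a[l] - a[k]);
            if (it != pairsWithSum.end()) count += it->second;
        }
    }
    return count;
}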

You are right, in the worst case this is not O(n^2).
The author obviously assumed that each list from map<int,list<int>> would contain only a few members; it is an assumption similar to the one we use when we state that the complexity of the find operation of a hash table is O(1). Recall that a hash table whose collision resolution is based on separate chaining has constant find complexity on average, but in the case where many elements hash to the same value it can degrade to linear complexity.
Implementation-wise, notice that the map map<int,list<int>> needs to be a hash table (i.e. std::unordered_map in C++, HashMap in Java), not std::map (or TreeMap in Java), because with std::map the find operation alone is O(log n).

Related

How do you sort an array in the most efficient way when given the largest value?

Let's say that I have an array of size n and the largest value in this array is k.
Let's assume that k = log(sqrt(n)) and I want to sort this array in the most efficient way possible. To do this, I've rearranged the equation to express n in terms of k, giving n = 2^(2k) as my array size.
Now if I apply any sorting algorithm of Θ(n^2), the time complexity will be Θ(2^(4k)), which in terms of n is Θ(n^2);
and if I apply a Θ(n log n) sorting algorithm I will have Θ(k * 2^(2k)), which in terms of n is Θ(n log(sqrt(n))), the most efficient time complexity. Did I do this right?
And if I assume k = n^n, can I use the same method as before?
For this I'm failing to express the array size in terms of k so that I can use the same method. Is there another way?
Knowing the largest element value k in an array you want to sort might help, especially if, as in your case, k < n. In this case you can use counting sort and will have a runtime of O(n + k).
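A minimal counting-sort sketch, assuming all values are integers in the range [0, k]:

#include <bits/stdc++.h>
using namespace std;

// Counting sort: O(n + k) time, O(k) extra space, assuming 0 <= a[i] <= k.
void countingSort(vector<int>& a, int k) {
    vector<int> cnt(k + 1, 0);
    for (int x : a) ++cnt[x];               // tally occurrences of each value
    int pos = 0;
    for (int v = 0; v <= k; ++v)            // rewrite the array in sorted order
        while (cnt[v]-- > 0) a[pos++] = v;
}

With k = log(sqrt(n)) this is O(n + log n) = O(n).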

Given an unsorted array A, check if A[i] = i exists efficiently

Given array A, check if A[i] = i for any i exists.
I'm supposed to solve this faster than linear time, which to me seems impossible. The solution I came up with is to first sort the array in n*log(n) time, and then you can easily check faster than linear time. However, since the array is given unsorted I can't see an "efficient" solution?
You can't have a correct algorithm with better than O(N) complexity for an arbitrary (unsorted) array.
Suppose you have a solution better than O(N). That means the algorithm has to skip some items of the array, since scanning all the items is already O(N).
Construct an array A such that A[i] != i for all i, then run the algorithm.
Let A[k] be an item that the algorithm did not examine. Now set A[k] = k and
run the algorithm again: it will report that no such item exists, even though index k now satisfies A[k] = k.
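For reference, a sketch of the straightforward sequential O(n) scan that this argument says cannot be beaten:

#include <bits/stdc++.h>
using namespace std;

// Linear scan: returns an index i with a[i] == i, or -1 if none exists.
int findFixedPoint(const vector<int>& a) {
    for (int i = 0; i < (int)a.size(); ++i)
        if (a[i] == i) return i;
    return -1;
}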
You'll get O(log n) with a parallel algorithm (you didn't restrict that). Just start N processors in log2(N) steps and let them check the array items in parallel.

Efficiently find order statistics of unsorted list prefixes?

A is an array of the integers from 1 to n in random order.
I need random access to the ith largest element of the first j elements in logarithmic time or better.
What I've come up with so far is an n x n matrix M, where the element in the (i, j) position is the ith largest of the first j. This gives me constant-time random access, but requires n^2 storage.
By construction, M is sorted by row and column. Further, each column differs from its neighbors by a single value.
Can anyone suggest a way to compress M down to n log(n) space or better, with log(n) or better random access time?
I believe you can perform the access in O(log(N)) time, given O(N log(N)) preprocessing time and O(N log(N)) extra space. Here's how.
You can augment a red-black tree to support a select(i) operation which retrieves the element at rank i in O(log(N)) time. For example, see this PDF or the appropriate chapter of Introduction to Algorithms.
You can implement a red-black tree (even one augmented to support select(i)) in a functional manner, such that the insert operation returns a new tree which shares all but O(log(N)) nodes with the old tree. See for example Purely Functional Data Structures by Chris Okasaki.
We will build an array T of purely functional augmented red-black trees, such that the tree T[j] stores the indexes 0 ... j-1 of the first j elements of A sorted largest to smallest.
Base case: At T[0] create an augmented red-black tree with just one node, whose data is the number 0, which is the index of the 0th largest element in the first 1 elements of your array A.
Inductive step: For each j from 1 to N-1, at T[j] create an augmented red-black tree by purely functionally inserting a new node with index j into the tree T[j-1]. This creates at most O(log(j)) new nodes; the remaining nodes are shared with T[j-1]. This takes O(log(j)) time.
The total time to construct the array T is O(N log(N)) and the total space used is also O(N log(N)).
Once T[j-1] is created, you can access the ith largest element of the first j elements of A by performing T[j-1].select(i). This takes O(log(j)) time. Note that you can create T[j-1] lazily the first time it is needed. If A is very large and j is always relatively small, this will save a lot of time and space.
Unless I misunderstand, you are just finding the k-th order statistic of an array which is the prefix of another array.
This can be done using an algorithm that I think is called 'quickselect' or something along those lines. Basically, it's like quicksort:
Take a random pivot
Swap around array elements so all the smaller ones are on one side
You now know the pivot is the (p+1)-th smallest element, where p is the number of smaller array elements
If p+1 = k, it's the solution! If p+1 > k, repeat on the 'smaller' subarray. If p+1 < k, repeat on the 'larger' subarray.
There's a (much) better description here under the Quickselect and Quicker Select headings, and also just generally on the internet if you search for k-th order quicksort solutions.
Although the worst-case time for this algorithm is O(n^2), like quicksort, its expected case is much better (also like quicksort) if you properly select your random pivots. I think the space complexity would just be O(n); you can just make one copy of your prefix to muck up the ordering for.
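A minimal quickselect sketch (names are mine), returning the k-th smallest element; note that the i-th largest of the first j elements is just the (j - i + 1)-th smallest of that prefix. The vector is passed by value, which is the "one copy of your prefix" mentioned above.

#include <bits/stdc++.h>
using namespace std;

// Quickselect: returns the k-th smallest element of v (1-based k).
// Expected O(n) time with random pivots, worst case O(n^2).
int quickselect(vector<int> v, int k) {
    int lo = 0, hi = (int)v.size() - 1, target = k - 1;
    mt19937 rng(42);
    while (lo < hi) {
        // choose a random pivot and partition the current window around it
        int p = uniform_int_distribution<int>(lo, hi)(rng);
        swap(v[p], v[hi]);
        int store = lo;
        for (int i = lo; i < hi; ++i)
            if (v[i] < v[hi]) swap(v[i], v[store++]);
        swap(v[store], v[hi]);              // pivot now sits at its final rank
        if (store == target) return v[store];
        if (store < target) lo = store + 1; else hi = store - 1;
    }
    return v[lo];
}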

Special Sorting

There is an external array of integers on which you can perform the following operations in O(1) time.
get(int i) - returns the value at the index 'i' in the external array.
reverse( int i, int j) - returns the reverse of the array between index positions i and j (including i and j).
example for reverse: consider an array {1,2,3,4,5}. reverse(0,2) will return {3,2,1,4,5} and reverse(1,4) will return {1,5,4,3,2}.
Write a code to sort the external array. Mention the time and space complexity for your code.
Obviously we can sort in O(n log n) using quicksort or merge sort. But given this scenario, can we do better?
To sort an array is to find the permutation, or shuffle, that restores it to a sorted state. In other words, your algorithm determines which of the n! possible permutations must be applied, and applies it. Since your algorithm explores the array by asking yes-no questions (Is cell i smaller or greater than cell j?) it follows an implicit decision tree that has depth log(n!) ~ n*log(n).
This means there will be O(n*log(n)) calls to get() to determine how to sort the array.
An interesting variant is to determine the smallest number of calls to reverse() necessary to sort the array, once you know what permutation you need. We know that this number is at most n-1, which can be achieved by using selection sort. Can the worst-case number be n-2 or smaller? I must say that I have no idea...
I'd try to reduce the problem to a classic swap()-based sorting algorithm.
In the following we assume without loss of generality that j >= i:
Note that swap(i,j) = reverse(i,j) for every j <= i+2: reversing a subarray of 3 or fewer elements just swaps its two ends.
Now, for any j > i+2, all you need is to reverse() the whole range, thereby swapping the ends, and then reverse the "middle" to restore it, so you get: swap(i,j) = reverse(i,j); reverse(i+1,j-1).
Using the just-built swap(), you can use any comparison-based algorithm that sorts via swaps, such as quicksort, which is O(n log n). The complexity remains O(n log n), since each swap() needs at most 2 reverse() ops, which is O(1). (A small sketch follows below.)
EDIT: Note: this solution fits the original question (before it was edited), which asked for a solution, not for something better than quicksort/mergesort.
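A small sketch of the swap construction above, where reverseRange stands in for the external reverse(i, j) and the names are mine:

#include <bits/stdc++.h>
using namespace std;

// Stand-in for the external reverse(i, j): reverses a[i..j] inclusive.
void reverseRange(vector<int>& a, int i, int j) {
    reverse(a.begin() + i, a.begin() + j + 1);
}

// swap(i, j) built from at most two reverse() calls, assuming i <= j.
void swapViaReverse(vector<int>& a, int i, int j) {
    reverseRange(a, i, j);                        // puts a[j] at i and a[i] at j
    if (j > i + 2) reverseRange(a, i + 1, j - 1); // restore the scrambled middle
}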
Assuming you want to minimize the number of external operations get and reverse:
read all integers into an internal array by calling get n times
do an internal sort (O(n log n) internal ops) and calculate the permutation
sort the external array by calling reverse at most n times (sketched after this answer)
This has O(n) time and O(n) space complexity.
Edit in response to anonymous downvotes:
When talking about time complexity, you always have to state which operations are to be counted. Here I assumed that only the external operations have a cost.
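A hedged sketch of those three steps (extReverse stands in for the external reverse(i, j); the names are mine). As noted above, only the external operations are counted; the internal bookkeeping below is kept deliberately simple and costs O(n^2) internal work.

#include <bits/stdc++.h>
using namespace std;

// Stand-in for the external reverse(i, j).
void extReverse(vector<int>& ext, int i, int j) {
    reverse(ext.begin() + i, ext.begin() + j + 1);
}

// Sorts 'ext' using n reads (the get() calls) and at most n-1 reverse() calls.
void sortExternal(vector<int>& ext) {
    int n = ext.size();
    vector<int> sorted(ext);                       // step 1: n calls to get()
    sort(sorted.begin(), sorted.end());            // step 2: internal O(n log n) sort
    for (int i = 0; i < n; ++i) {                  // step 3: selection-sort style pass
        int pos = find(ext.begin() + i, ext.end(), sorted[i]) - ext.begin();
        if (pos != i) extReverse(ext, i, pos);     // one reverse puts sorted[i] into slot i
    }
}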
Based only on get(int i) and reverse(int i, int j), we can't optimise the code; it will have the same complexity.

Testing if unsorted sets are disjoint in linear time. (homework problem)

Problem:
Two sets A and B have n elements each. Assume that each element is an integer in the range [0, n^100]. These sets are not necessarily sorted. Show how to check whether these two sets are disjoint in O(n) time. Your algorithm should use O(n) space.
My original idea for this problem was to create a hash table of set A and search this hash table for each of the elements in B. However, I'm not aware of any way to create a hash table of a data set with this range that only takes O(n) space. Should I be considering a completely different approach?
UPDATE:
I contacted the professor regarding this problem asking about implementing a hash table and his response was:
Please note that hashing takes O(1) time for the operations only on an average. We need a worst case O(n) time algorithm for this problem.
So it seems the problem is looking for a different approach...
Input: Arrays A[m], B[n]
Output: True if they are disjoint, False otherwise
1. Brute force: O(m*n) time, O(1) space
   1. Search for each element of A in B
   2. As soon as you get a match, break and return false
   3. If you reach the end, return true
   Advantage: doesn't modify the input
2. Sort both: O(m log m + n log n + m + n)
   1. Sort both arrays
   2. Scan linearly
   Disadvantage: modifies the input
3. Sort the smaller: O((m + n) log m)
   1. Say m < n; sort A
   2. Binary search for each element of B in A
   Disadvantage: modifies the input
4. Sort the larger: O((m + n) log n)
   1. Say n > m; sort B
   2. Binary search for each element of A in B
   Disadvantage: modifies the input
5. Hashing: O(m + n) time, O(m) or O(n) space
   Advantage: doesn't modify the input
Why not use a hash table? Aren't they O(n) to create (assuming the elements are all unique), then O(n) to search, giving O(2n) = O(n)?
A hash set will work fine. It's extremely common to assume hash sets/tables are constant time per operation even though that's not strictly true.
Note that hash sets/tables only use space proportional to the number of elements inserted, not to the total number of possible key values. You seem to have misunderstood that.
If "commonly assumed to be good enough" is unacceptable for some reason, you can use radix sort. It's linear in the total representation size of the input elements. (Caveat: that's slightly different from being linear in the number of elements.)
Honestly I didn't expect such answers from the SO community, but never mind. The question explicitly states that the algorithm should take O(n) space and time, therefore we can rule out algorithms involving hashing, since in the worst case hashing is not O(n).
Now I was going through some texts and found that the problem of deciding whether two sets are disjoint or not is reducible to the sorting problem. This is very standard when studying the lower bounds of many algorithms.
Actual lines from the book Design Methods and Analysis of Algorithms by S. K. Basu (2013).
Here the author clearly states that set disjointness is Omega(n log n).
#include <bits/stdc++.h>
using namespace std;

int main()
{
    // Values can be as large as n^100, so they are read and hashed as strings.
    unordered_map<string, int> m;
    int n, i;
    cin >> n;
    string a, b;
    for (i = 0; i < n; i++)        // insert every element of the first set
    {
        cin >> a;
        m[a] = 1;
    }
    for (i = 0; i < n; i++)        // probe with every element of the second set
    {
        cin >> b;
        if (m.count(b))
        {
            cout << "Not disjoint";
            return 0;
        }
    }
    cout << "Disjoint";
    return 0;
}
Time complexity : O(n)
Auxiliary space : O(n)
You can radix sort the inputs, in base n.
This will take 101 iterations through each array (because the input numbers are in the range 0 to n^100).
Once you've sorted the inputs, you can compare them in the obvious way in O(n) time.
Note: for the radix sort to run in O(n) time, you need to check that extracting the k'th digit (base n) of an input number is O(1). You can do that with (k-1) divisions by n and a modulo operation. Since k is at most 101, this is O(1).
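A hedged sketch of an LSD radix sort in base n. For illustration the values are assumed to fit in unsigned long long; the actual problem allows values up to n^100, which would need a big-integer representation and up to 101 passes, but each pass has the same structure.

#include <bits/stdc++.h>
using namespace std;

// LSD radix sort in base n (n = number of elements).  Each pass is a stable
// counting sort on one base-n digit, so each pass costs O(n).
void radixSortBaseN(vector<unsigned long long>& a) {
    if (a.empty()) return;
    const unsigned long long base = max<size_t>(a.size(), 2);
    const unsigned long long maxVal = *max_element(a.begin(), a.end());
    vector<unsigned long long> out(a.size());
    for (unsigned long long div = 1; ; div *= base) {
        vector<size_t> cnt(base + 1, 0);
        for (auto x : a) ++cnt[(x / div) % base + 1];            // histogram of this digit
        for (size_t d = 0; d < base; ++d) cnt[d + 1] += cnt[d];  // prefix sums -> start offsets
        for (auto x : a) out[cnt[(x / div) % base]++] = x;       // stable scatter
        a.swap(out);
        if (maxVal / div < base) break;   // no element has a higher nonzero digit
    }
}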
side note
I note that kennytm gave a similar answer in 2010, but the answer was deleted after commenters noted that "Radix sort is O(nk) time, where n is the number of keys, and k is the average key length. Since the max key value is n^100, the max key length would be 100 log n. So, this would still be O(n log n), same as all of the best sorting algorithms."
Note that this comment is incorrect: the maximum key length is 101, because the key is a sequence of base-n digits, not a sequence of bits.
