Special Sorting - algorithm

There is an external array of integers on which you can perform the following operations in O(1) time.
get(int i) - returns the value at index i in the external array.
reverse(int i, int j) - returns the array with the segment between index positions i and j (including i and j) reversed.
Example for reverse: consider the array {1,2,3,4,5}. reverse(0,2) will return {3,2,1,4,5} and reverse(1,4) will return {1,5,4,3,2}.
Write code to sort the external array, and state the time and space complexity of your code.
Obviously we can sort in O(n log n) using quicksort or merge sort. But given this scenario, can we do better?

To sort an array is to find the permutation, or shuffle, that restores it to sorted order. In other words, your algorithm determines which of the n! possible permutations must be applied, and applies it. Since your algorithm explores the array by asking yes/no questions (is cell i smaller or greater than cell j?), it follows an implicit decision tree of depth log(n!) ~ n log(n).
This means there will be O(n log n) calls to get() to determine how to sort the array.
An interesting variant is to determine the smallest number of calls to reverse() necessary to sort the array, once you know what permutation you need. We know that this number is at most n-1, which can be achieved with a selection-sort-style approach. Can the worst-case number be n-2 or smaller? I must say that I have no idea...

I'd try to reduce the problem to a classic swap()-based sorting algorithm.
In the following we assume, without loss of generality, j >= i.
Note that swap(i,j) = reverse(i,j) for any j <= i+2: reversing a sub-array of 3 or fewer elements only exchanges its end points.
Now, for any j > i+2, all you need is to reverse(i,j), which exchanges the end points, and then reverse the "middle" back to its original order, so you get: swap(i,j) = reverse(i,j); reverse(i+1,j-1)
Using the just-built swap(), you can use any comparison-based algorithm that sorts by swapping, such as quicksort, which is O(n log n). The complexity remains O(n log n), since each swap() needs at most 2 reverse() ops, which is O(1).
EDIT: Note: this solution fits the original question (before it was edited), which asked for a solution, not for something better than quicksort/mergesort.
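As a concrete illustration, here is a minimal Java sketch of this reduction. The ExternalArray interface is hypothetical (the question never names one); swap() is built from at most two reverse() calls and plugged into an ordinary Lomuto-partition quicksort, which makes O(n log n) get()/reverse() calls on average (a randomized pivot would be needed to guarantee that in expectation).

// Hypothetical interface for the external array described in the question.
interface ExternalArray {
    int get(int i);              // O(1) read of element i
    void reverse(int i, int j);  // O(1) in-place reversal of positions i..j
    int length();                // number of elements (assumed known)
}

public class ExternalQuickSort {

    // swap(i, j) built from at most two reverse() calls; assumes i <= j.
    static void swap(ExternalArray a, int i, int j) {
        if (i == j) return;
        a.reverse(i, j);              // end points are now exchanged
        if (j > i + 2) {
            a.reverse(i + 1, j - 1);  // put the middle back in its original order
        }
    }

    // Ordinary Lomuto-partition quicksort driven by get() and swap().
    static void quickSort(ExternalArray a, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a.get(hi);
        int store = lo;
        for (int k = lo; k < hi; k++) {
            if (a.get(k) < pivot) swap(a, store++, k);
        }
        swap(a, store, hi);
        quickSort(a, lo, store - 1);
        quickSort(a, store + 1, hi);
    }

    public static void sort(ExternalArray a) {
        quickSort(a, 0, a.length() - 1);
    }
}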

Assuming you want to minimize the number of external operations get and reverse:
read all integers into an internal array by calling get n times
do an internal sort (O(n log n) internal ops) and compute the permutation
sort the external array by calling reverse at most n times
This has O(n) time and O(n) space complexity.
Edit in response to anonymous downvotes:
When talking about time complexity, you always have to state which operations are to be counted. Here I assumed that only the external operations have a cost.
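A minimal sketch of this idea under the same cost model (only get and reverse are counted), reusing the hypothetical ExternalArray interface from the earlier sketch: n calls to get, an internal sort, then a selection-sort-style placement pass that uses at most n-1 calls to reverse. The internal bookkeeping is quadratic for simplicity; only the external operation counts matter for this answer.

import java.util.Arrays;

public class ExternalSelectionByReverse {

    // n calls to get(), at most n - 1 calls to reverse(); internal work is not counted here.
    public static void sort(ExternalArray ext) {
        int n = ext.length();
        int[] shadow = new int[n];               // internal mirror of the external array
        for (int i = 0; i < n; i++) shadow[i] = ext.get(i);

        int[] target = shadow.clone();
        Arrays.sort(target);                     // internal O(n log n) sort, zero external ops

        for (int i = 0; i < n; i++) {
            if (shadow[i] == target[i]) continue;
            int j = i + 1;
            while (shadow[j] != target[i]) j++;  // locate the element that belongs at position i
            ext.reverse(i, j);                   // one external op brings it into place
            reverseRange(shadow, i, j);          // keep the mirror in sync at no external cost
        }
    }

    private static void reverseRange(int[] a, int i, int j) {
        while (i < j) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; j--; }
    }
}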

Based on get(int i) and reverse(int i, int j) alone, we can't optimise the code. It will have the same complexity.

Related

Algorithms Runtime complexity:

At the careercup site, there was this problem (https://careercup.com/question?id=6270877443293184):
Given an array, find the number of tuples such that
A[i] + A[j] + A[k] = A[l], where i < j < k < l.
The proposed solution there (below) works, but states a runtime complexity of O(n^2). After analyzing the code, I don't think it can be done in less than O(n^2 log n). My rationale is that it iterates through all elements of a 2D array (which is n^2) and, for each of them, checks a list that contains the tuples, which is O(n). Even using a TreeMap and doing a binary search can only reduce that to O(log n), not to constant time. Can someone confirm whether this can be done in O(n^2) and explain what is incorrect in my logic?
Proposed solution:
Fill 2 2d arrays with
arr1[i][j]=a[i]+a[j]
arr2[i][j]=a[j]-a[i]
j>i
In a map<int,list<int>>, map[arr1[i][j]].push_back(j)
For each element in arr2, search in the map.
Count all hits where j < k
It's pretty easy to insert j in increasing order in map[arr1[][]].
If you enumerate the elements of arr2 in increasing k order, you don't have to do a binary search; you can just enumerate all the j.
Because you are going in increasing k order, for each list in the map you only have to remember the last position you looked at, so the map should rather be a map<int, pair<int, list<int>>>.
Since you only touch each j once, the complexity is only O(n^2).
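A rough Java sketch of the O(n^2)-on-average idea (not the careercup code verbatim; class and variable names are mine): walk k from left to right, keep a hash map from pair sums A[i]+A[j] (with i < j < k) to their counts, and for each k look up A[l] - A[k] for every l > k.

import java.util.HashMap;
import java.util.Map;

public class TupleCounter {
    // Counts tuples i < j < k < l with A[i] + A[j] + A[k] = A[l],
    // i.e. A[i] + A[j] = A[l] - A[k].
    public static long countTuples(int[] a) {
        int n = a.length;
        Map<Long, Integer> pairSums = new HashMap<>(); // sum A[i]+A[j] -> count of pairs with j < current k
        long count = 0;
        for (int k = 2; k + 1 < n; k++) {
            // bring in all pairs (i, k - 1) so the map now covers every pair with j < k
            for (int i = 0; i < k - 1; i++) {
                pairSums.merge((long) a[i] + a[k - 1], 1, Integer::sum);
            }
            // every l > k contributes the number of matching left-hand pairs
            for (int l = k + 1; l < n; l++) {
                count += pairSums.getOrDefault((long) a[l] - a[k], 0);
            }
        }
        return count;
    }
}

This is O(n^2) only if the hash map operations are O(1) on average, which is exactly the caveat raised in the next answer.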
You are right, in the worst case this is not O(n^2).
The author obviously assumed that the lists in map<int,list<int>> would contain only a few members; the assumption is similar to the one we make when we state that the complexity of a hash table's find operation is O(1). Recall that a hash table whose collision resolution is based on separate chaining has constant find complexity on average, but when many elements hash to the same value it can degrade to linear complexity.
Implementation-wise, notice that the map map<int,list<int>> needs to be a hash table (i.e. std::unordered_map in C++, HashMap in Java), not std::map (or TreeMap in Java), because with std::map just the find operation is O(log n).

Find Pair with Difference less than K with O(n) complexity on average

I have an unsorted array of n positive numbers and a parameter k. I need to find out whether there is a pair of numbers in the array whose difference is less than k, in O(n) time on average and O(n) space.
I believe it requires the use of a universal hash table, but I'm not sure how; any ideas?
This answer works even on unbounded integers and floats (making some assumptions about the niceness of the hashmap you'll be using - the Java implementation should work, for instance):
Keep a hashmap<int, float> all_divided_values. For each key y, if all_divided_values[y] exists, it contains a value v from the array such that floor(v/k) = y.
For each value v in the original array A, if floor(v/k) is among all_divided_values's keys, output (v, all_divided_values[floor(v/k)]) (they are less than k apart). Otherwise, store v in all_divided_values[floor(v/k)].
Once all_divided_values is filled, go through A again. For each v, test whether all_divided_values[floor(v/k) - 1] exists, and if so, output the pair (v, all_divided_values[floor(v/k) - 1]) if and only if abs(v - all_divided_values[floor(v/k) - 1]) < k.
Inserting into a hashmap is usually O(1) on average (with the Java HashMap, for instance), so the total time is O(n). But note that technically this could be false, for instance if your language's hashmap implementation does not use a sensible strategy.
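A Java sketch of this bucketing idea (names are mine, and it is a single-pass variant that checks both neighbouring buckets as it goes, rather than the two-pass version described above). Each value v goes into bucket floor(v/k); two values in the same bucket are automatically closer than k, and values in adjacent buckets need one explicit check. It assumes k > 0.

import java.util.HashMap;
import java.util.Map;

public class ClosePairFinder {
    // Returns a pair of array values whose difference is strictly less than k,
    // or null if no such pair exists. Expected O(n) time, O(n) space. Assumes k > 0.
    public static long[] findPairCloserThan(long[] a, long k) {
        Map<Long, Long> bucketRep = new HashMap<>(); // bucket index -> one representative value
        for (long v : a) {
            long b = Math.floorDiv(v, k);
            if (bucketRep.containsKey(b)) {
                return new long[] { v, bucketRep.get(b) };    // same bucket => difference < k
            }
            Long left = bucketRep.get(b - 1);                 // check the bucket just below
            if (left != null && v - left < k) return new long[] { v, left };
            Long right = bucketRep.get(b + 1);                // and the bucket just above
            if (right != null && right - v < k) return new long[] { v, right };
            bucketRep.put(b, v);
        }
        return null;
    }
}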
Simple solution:
1- Sort the array
2- Calculate the difference between consecutive elements
a) If the difference is smaller than k, return that pair
b) If no consecutive difference is smaller than k, then your array has no pair of numbers whose difference is smaller than k
Sorting is O(n log n), but if you only have integers of limited size, you can use counting sort, which is O(n).
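For reference, a minimal Java sketch of the sort-based version (O(n log n) with a comparison sort; a counting or radix sort would make the sorting step O(n) for bounded integers):

import java.util.Arrays;

public class ClosePairBySorting {
    // Returns true if some pair of values differs by less than k.
    public static boolean hasPairCloserThan(long[] a, long k) {
        long[] sorted = a.clone();
        Arrays.sort(sorted);                         // O(n log n)
        for (int i = 1; i < sorted.length; i++) {
            // the closest pair in a sorted array is always adjacent
            if (sorted[i] - sorted[i - 1] < k) return true;
        }
        return false;
    }
}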
You can look at it this way. The problem can be modeled as follows: consider each element (assuming integers) and convert it to the range (A[i]-K, A[i]+K).
Now you want to check whether any two of the intervals overlap.
The interval intersection problem without any sortedness is not solvable in O(n) (worst case). You need to sort them, and then in O(n) you can check whether they intersect.
The same goes for your logic: sort it and find it.

What are the rules for the "Ω(n log n) barrier" for sorting algorithms?

I wrote a simple program that sorts in O(n). It is highly memory inefficient, but that's not the point.
It uses the principle behind a HashMap for sorting:
public class NLogNBreak {
    public static class LinkedListBack {
        public Node first = null;

        public LinkedListBack(int val) {
            first = new Node();
            first.val = val;
        }

        // Prepend a value to the list.
        public void insert(int i) {
            Node n = new Node();
            n.val = i;
            n.next = first;
            first = n;
        }
    }

    private static class Node {
        public Node next = null;
        public int val;
    }

    // max > in[i] > 0
    // One bucket (linked list) per possible value; each input element is dropped into its bucket.
    public static LinkedListBack[] sorted(int[] in, int max) {
        LinkedListBack[] ar = new LinkedListBack[max + 1];
        for (int i = 0; i < in.length; i++) {
            int val = in[i];
            if (ar[val] == null) {
                ar[val] = new LinkedListBack(val);
            } else {
                ar[val].insert(val);
            }
        }
        return ar;
    }
}
So does this count as a sort of O(n), even though it returns the result in a funky format?
To directly answer your question:
Your sorting algorithm is technically not O(n) but rather O(n + max), since you need to create an array of size max, which takes O(max) time.
This isn't a problem; in fact, it's a special case of a well-known sorting algorithm that breaks the Ω(n log n) barrier.
So what is this Ω(n log n) barrier? Where does it come from? And how do you break it?
The Ω(n log n) Barrier
The Ω(n log n) barrier is the information-theoretic lower bound on the average-case speed of any comparison-based sorting algorithm. If the only operation you are permitted to apply to array elements in order to distinguish them is some sort of comparison, then your sorting algorithm can't do better than Ω(n log n) in the average case.
To understand why this is, let's think about the state of the algorithm at any point during its execution. As the algorithm is running, it can gain some amount of information about the way that the input elements were ordered. Let's say that if the algorithm has some set of information X about the original ordering of the input elements, then the algorithm is in state X.
The crux of the Ω(n log n) argument (and several related arguments, as I'll discuss later) is that the algorithm has to be able to get into a large number of different states based on what the input is. Let's assume for now that the input to the sorting algorithm is an array of n distinct values. Because the algorithm can't tell anything about those elements other than the way that they're ordered, it doesn't really matter what the values being sorted are. All that matters is the relative ordering of those n elements.
Now for the key step - let's suppose that there are f(n) unique ways of ordering the n input elements and that our sorting algorithm can't get into at least f(n) different states. If this is the case, then there have to be two different orderings of the elements in the array that the algorithm always groups together into the same state. If this happens, then the sorting algorithm can't possibly sort both of the two input arrays correctly. The reasoning is that because the algorithm treats the two arrays identically, whatever steps it uses to reorder the elements of the first array will be the same steps it uses to reorder the elements of the second array. Since the two arrays aren't the same, there has to be at least one element that ends up out of place in one of the two cases. Consequently, we know that the sorting algorithm has to be able to get into at least f(n) different states.
But how can the algorithm get into these different states? Well, let's think about this. Initially, the algorithm has no information at all about the ordering of the elements. When it makes its first comparison (say, between elements A[i] and A[j]), the algorithm can get into one of two states - one where A[i] < A[j] and one where A[i] > A[j]. More generally, every comparison that the algorithm makes can, in the best case, put the algorithm into one of two new states based on the result of the comparison. We can therefore think of a large binary tree structure describing the states that the algorithm can be in - each state has up to two children describing what state the algorithm gets into based on the result of the comparison that's made. If we take any path from the root of the tree down to a leaf, we get the series of comparisons that end up getting made by the algorithm on a particular input. In order to sort as quickly as possible, we want to make the fewest number of comparisons possible, and so we want this tree structure to have the smallest height possible.
Now, we know two things. First, we can think of all of the states the algorithm can get into as a binary tree. Second, that binary tree has to have at least f(n) different nodes in it. Given this, the smallest possible binary tree we can build has to have height at least Ω(log f(n)). This means that if there are f(n) different possible ways of ordering the array elements, we have to make at least Ω(log f(n)) comparisons on average, since otherwise we can't get into enough differing states.
To conclude the proof that you can't beat Ω(n log n), note that if the array has n distinct elements in it, then there are n! different possible ways of ordering the elements. Using Stirling's approximation, we have that log n! = Ω(n log n), and so we have to make at least Ω(n log n) comparisons in the average case to sort the input sequence.
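For completeness, that last step can also be checked without the full Stirling formula; a crude lower bound on log n! already gives the result:

\log_2(n!) \;=\; \sum_{i=1}^{n} \log_2 i \;\ge\; \sum_{i=\lceil n/2 \rceil}^{n} \log_2 i \;\ge\; \frac{n}{2}\,\log_2\frac{n}{2} \;=\; \Omega(n \log n).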
Exceptions to the Rule
In what we just saw above, we saw that if you have n array elements that are all distinct, you cannot sort them with a comparison sort any faster than Ω(n log n). However, this starting assumption isn't necessarily valid. Many arrays that we'd like to sort may have duplicated elements in them. For example, suppose that I want to sort arrays that are composed solely of zeros and ones, such as this array here:
0 1 0 1 1 1 0 0 1 1 1
In this case, it is not true that there are n! different arrays of zeros and ones of length n. In fact, there are only 2^n of them. From our result above, this means that we should be able to sort in Ω(log 2^n) = Ω(n) time using a purely comparison-based sorting algorithm. In fact, we absolutely can do this; here's a sketch of how we'd do it:
Look at the first element.
Copy all elements less than the first element into an array called 'less'
Copy all elements equal to the first element into an array called 'equal'
Copy all elements greater than the first element into an array called 'greater'
Concatenate all three of these arrays together in the order less, equal, greater.
To see that this works, if 0 is our first element, then the 'less' array will be empty, the 'equal' array will have all the zeros, and the 'greater' array will have all the ones. Concatenating them then puts all the zeros before all the ones. Otherwise, if 1 is our first element, then the less array will hold the zeros, the equal array will hold the ones, and the greater array will be empty. Their concatenation is thus all zeros followed by all ones, as required.
In practice, you wouldn't use this algorithm (you'd use a counting sort, as described below), but it shows that you can indeed beat Ω(n log n) with a comparison-based algorithm if the number of possible inputs to the algorithm is small.
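A small Java rendering of that sketch (just the three-way split described above, shown for illustration rather than practical use; the class name is mine):

import java.util.ArrayList;
import java.util.List;

public class ZeroOneSort {
    // Sorts an array known to contain only two distinct values (e.g. zeros and ones)
    // using comparisons against the first element only.
    public static int[] sort(int[] a) {
        if (a.length == 0) return a;
        int pivot = a[0];
        List<Integer> less = new ArrayList<>(), equal = new ArrayList<>(), greater = new ArrayList<>();
        for (int x : a) {
            if (x < pivot) less.add(x);
            else if (x == pivot) equal.add(x);
            else greater.add(x);
        }
        int[] out = new int[a.length];
        int idx = 0;
        for (int x : less) out[idx++] = x;     // all smaller values first
        for (int x : equal) out[idx++] = x;    // then everything equal to the pivot
        for (int x : greater) out[idx++] = x;  // then the larger values
        return out;
    }
}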
Some comparison-based sorting algorithms are known to work very quickly on inputs that have multiple duplicated values. For example, it is known that Quicksort with a special partitioning step can take advantage of duplicated elements in the input array.
Non-Comparison Sorts
All of this discussion has assumed that we're talking about comparison-based sorting, where the only permitted operation on array elements is a comparison. However, if we know more about what elements we're going to be sorting and can perform operations on those elements beyond simple comparisons, then none of the above bounds hold any more. We're breaking the starting assumptions that led us to construct a binary tree of all the states of the algorithm, and so there's no reason to suspect that those bounds will still hold.
For example, if you know that the input values are drawn from a universe that only has |U| elements in it, then you can sort in O(n + |U|) time using a clever algorithm. Start off by creating |U| different buckets into which we can place the elements from the original array. Then iterate across the array and distribute all of the array elements into the corresponding buckets. Finally, visit each of the buckets, starting with the bucket holding copies of the smallest element and ending with the bucket containing copies of the largest element, and concatenate together all of the values you find. For example, let's see how to sort arrays consisting of the values 1 to 5. If we have this starting array:
1 3 4 5 2 3 2 1 4 3 5
Then we can put those elements into buckets like this:
Bucket   1   2   3   4   5
--------------------------
         1   2   3   4   5
         1   2   3   4   5
                 3
Iterating across the buckets and concatenating their values together yields this:
1 1 2 2 3 3 3 4 4 5 5
which, sure enough, is a sorted version of our original array! The runtime here is O(n) time to go and distribute the original array elements into the buckets, then O(n + |U|) time to iterate across all the buckets putting the elements back together. Notice that if |U| = O(n), this runs in O(n) time, breaking the Ω(n log n) sorting barrier.
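A minimal counting-sort-style sketch of this bucket idea in Java, assuming the values are known to lie in the range 0 to U-1 (names are mine):

public class SmallUniverseSort {
    // O(n + U) time, O(U) extra space; every value must lie in [0, U).
    public static void sort(int[] a, int U) {
        int[] buckets = new int[U];
        for (int v : a) buckets[v]++;          // distribute: count copies of each value
        int idx = 0;
        for (int v = 0; v < U; v++) {          // walk buckets from smallest to largest value
            for (int c = 0; c < buckets[v]; c++) a[idx++] = v;
        }
    }
}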
If you are sorting integers, you can do much better than this by using radix sort, which runs in O(n lg |U|). If you're dealing with primitive ints, lg |U| is usually 32 or 64, so this is extremely fast. If you're willing to implement a particularly tricky data structure, you can use a van Emde Boas Tree to sort integers from 0 to U - 1 in time O(n lg lg U), again by exploiting the fact that integers consist of groups of bits that can be manipulated in blocks.
Similarly, if you know that your elements are strings, you can sort very quickly by building a trie out of the strings, then iterating across the trie to rebuild all the strings. Alternatively, you could consider the strings as numbers written in a large base (say, base 128 for ASCII text) and then use one of the integer sorting algorithms from above.
In each of these cases, the reason that you can beat the information-theoretic barrier is that you're breaking the barrier's starting assumption, namely that you can only apply comparisons. If you can treat the input elements as numbers, or as strings, or as anything else that reveals more structure, all bets are off and you can sort extremely efficiently.
That is called radix sort, and yes, it breaks the n log(n) barrier, which is only a barrier in the comparison model. On the Wikipedia page linked for the comparison model you can see a list of sorts that use it, and a few that do not.
Radix sort works by putting each element in a bucket based on its value and then concatenating all the buckets together again at the end. It only works with types like integers that have a finite number of possible values.
Normally a radix sort is done one byte or nibble at a time to reduce the number of buckets. See the Wikipedia article on it, or search for more info.
Yours can also be made to sort negative numbers, and it can be improved by allocating memory only for the buckets it actually uses.
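A sketch of the usual byte-at-a-time LSD radix sort for non-negative ints (four passes of 256 buckets each); handling negative values would need one extra trick, such as flipping the sign bit, which is omitted here:

public class ByteRadixSort {
    // LSD radix sort on non-negative ints: 4 passes, one per byte, 256 buckets each.
    public static void sort(int[] a) {
        int n = a.length;
        int[] src = a.clone(), dst = new int[n];
        for (int shift = 0; shift < 32; shift += 8) {
            int[] count = new int[257];
            for (int v : src) count[((v >>> shift) & 0xFF) + 1]++;
            for (int b = 0; b < 256; b++) count[b + 1] += count[b];   // prefix sums -> start offsets
            for (int v : src) dst[count[(v >>> shift) & 0xFF]++] = v; // stable scatter by current byte
            int[] tmp = src; src = dst; dst = tmp;                    // swap buffers for the next pass
        }
        System.arraycopy(src, 0, a, 0, n);
    }
}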

Testing if unsorted sets are disjoint in linear time. (homework problem)

Problem:
Two sets A and B have n elements each. Assume that each element is an integer in the range [0, n^100]. These sets are not necessarily sorted. Show how to check whether these two sets are disjoint in O(n) time. Your algorithm should use O(n) space.
My original idea for this problem was to create a hash table of set A and search this hash table for each of the elements in B. However, I'm not aware of any way to create a hash table of a data set with this range that only takes O(n) space. Should I be considering a completely different approach?
UPDATE:
I contacted the professor regarding this problem asking about implementing a hash table and his response was:
Please note that hashing takes O(1) time for the operations only on an average. We need a worst case O(n) time algorithm for this problem.
So it seems the problem is looking for a different approach...
Input: Arrays A[m], B[n]
Output: True if they are disjoint, False otherwise
1. Brute force: O(m*n) time, O(1) space
   1. Search for each element of A in B
   2. As soon as you get a match, break and return false
   3. If you reach the end, return true
   Advantage: doesn't modify the input
2. Sort both: O(m log m + n log n + m + n)
   1. Sort both arrays
   2. Scan linearly
   Disadvantage: modifies the input
3. Sort the smaller: O((m + n) log m)
   1. Say m < n; sort A
   2. Binary search for each element of B in A
   Disadvantage: modifies the input
4. Sort the larger: O((m + n) log n)
   1. Say n > m; sort B
   2. Binary search for each element of A in B
   Disadvantage: modifies the input
5. Hashing: O(m + n) time, O(m) or O(n) space
   Advantage: doesn't modify the input
Why not use a hash table? Isn't it O(n) to create (assuming the elements are all unique) and then O(n) to search, giving O(2n) = O(n)?
A hash set will work fine. It's extremely common to assume hash sets/tables are constant time per operation even though that's not strictly true.
Note that hash sets/tables absolutely only use space proportional to the elements inserted, not the potential total number of elements. You seem to have misunderstood that.
If "commonly assumed to be good enough" is unacceptable for some reason, you can use radix sort. It's linear in the total representation size of the input elements. (Caveat: that's slightly different from being linear in the number of elements.)
Honestly, I didn't expect such answers from the SO community, but never mind. The question explicitly states that the algorithm should take O(n) space and time, therefore we can rule out algorithms involving hashing, since in the worst case hashing is not O(n).
Now I was going through some texts and found that the problem of determining whether 2 sets are disjoint or not is reducible to the sorting problem. This is very standard when studying the lower bounds of many algorithms.
Actual lines from the book Design Methods and Analysis of Algorithms by S. K. Basu, 2013.
Here the author clearly states that set disjointness is Ω(n log n).
#include <bits/stdc++.h>
using namespace std;

int main()
{
    unordered_map<string, int> m;   // hash table keyed by the decimal string of each number
    int n, i;
    cin >> n;
    string a, b;                    // numbers up to n^100 don't fit in built-in types, so read them as strings
    for (i = 0; i < n; i++)         // insert every element of the first set
    {
        cin >> a;
        m[a] = 1;
    }
    for (i = 0; i < n; i++)         // probe with every element of the second set
    {
        cin >> b;
        if (m.count(b))
        {
            cout << "Not disjoint";
            exit(0);
        }
    }
    cout << "Disjoint";
    return 0;
}
Time complexity : O(n)
Auxiliary space : O(n)
You can radix sort the inputs, in base n.
This will take 101 iterations through each array (because the input numbers are in the range 0 to n^100).
Once you've sorted the inputs, you can compare them in the obvious way in O(n) time.
Note: for the radix sort to run in O(n) time, you need to check that extracting the k'th digit (base n) of an input number is O(1). You can do that with (k-1) divisions by n and a modulo operation. Since k is at most 101, this is O(1).
Side note
I note that kennytm gave a similar answer in 2010, but that answer was deleted after commenters noted that "Radix sort is O(nk) time, where n is the number of keys, and k is the average key length. Since the max key value is n^100, the max key length would be 100 log n. So, this would still be O(n log n), same as all of the best sorting algorithms."
Note that this comment is incorrect -- the maximum key length is 101, because the key is a sequence of digits in some base, and its length is not measured in bits.
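A rough Java sketch of the base-n approach described above (names are mine; it assumes the arrays have n >= 2 elements and uses BigInteger because values up to n^100 do not fit in primitive types). Each pass is a stable counting sort on one base-n digit, and a final merge-style scan over the two sorted arrays detects any common element:

import java.math.BigInteger;

public class RadixDisjoint {
    // LSD radix sort in base n: values in [0, n^100] have at most 101 base-n digits.
    static BigInteger[] radixSortBaseN(BigInteger[] a, int n) {
        BigInteger base = BigInteger.valueOf(n);
        BigInteger[] cur = a.clone();
        BigInteger[] next = new BigInteger[a.length];
        BigInteger divisor = BigInteger.ONE;
        for (int pass = 0; pass < 101; pass++) {
            int[] count = new int[n + 1];              // counting sort on the current digit
            int[] digit = new int[cur.length];
            for (int i = 0; i < cur.length; i++) {
                digit[i] = cur[i].divide(divisor).mod(base).intValue();
                count[digit[i] + 1]++;
            }
            for (int d = 0; d < n; d++) count[d + 1] += count[d];   // prefix sums -> positions
            for (int i = 0; i < cur.length; i++) next[count[digit[i]]++] = cur[i];
            BigInteger[] tmp = cur; cur = next; next = tmp;          // swap buffers
            divisor = divisor.multiply(base);
        }
        return cur;
    }

    // Once both arrays are sorted, a single merge-style scan finds any common element.
    public static boolean disjoint(BigInteger[] a, BigInteger[] b) {
        int n = a.length;                              // both sets have n elements per the problem
        BigInteger[] sa = radixSortBaseN(a, n), sb = radixSortBaseN(b, n);
        int i = 0, j = 0;
        while (i < sa.length && j < sb.length) {
            int cmp = sa[i].compareTo(sb[j]);
            if (cmp == 0) return false;
            if (cmp < 0) i++; else j++;
        }
        return true;
    }
}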

Number of different elements in an array

Is it possible to compute the number of different elements in an array in linear time and constant space? Let's say it's an array of long integers, and you cannot allocate one array slot per possible long value.
P.S. Not homework, just curious. I've got a book that sort of implies that it is possible.
This is the element uniqueness problem, for which the lower bound is Ω(n log n) in comparison-based models. The obvious hashing or bucket-sort solutions all require linear space too, so I'm not sure this is possible.
You can't use constant space. You can use O(number of different elements) space; that's what a HashSet does.
You can use any sorting algorithm and count the number of different adjacent elements in the array.
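For reference, a small Java sketch of that sort-and-count-adjacent approach (O(n log n) time; it sorts a copy, so it costs O(n) extra space rather than constant space):

import java.util.Arrays;

public class DistinctCounter {
    // Sorts a copy and counts positions where the value changes.
    public static int countDistinct(long[] a) {
        if (a.length == 0) return 0;
        long[] s = a.clone();
        Arrays.sort(s);
        int distinct = 1;                    // the first element is always a new value
        for (int i = 1; i < s.length; i++) {
            if (s[i] != s[i - 1]) distinct++;
        }
        return distinct;
    }
}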
I do not think this can be done in linear time. One algorithm that solves it in O(n log n) requires first sorting the array (then the comparisons become trivial).
If you are guaranteed that the numbers in the array are bounded above and below, say by a and b, then you could allocate an array of size b - a and use it to keep track of which numbers have been seen.
That is, you would move through your input array, take each number, and mark true in your target array at that spot. You would increment a counter of distinct numbers only when you encounter a number whose position in your storage array is still false.
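A sketch of that idea, assuming inclusive bounds a and b are known up front (the extra array makes the space O(b - a), which is constant only if the range is):

public class BoundedDistinctCounter {
    // Counts distinct values in 'in'; every value must lie in [a, b].
    // O(n + (b - a)) time, O(b - a) space.
    public static int countDistinct(int[] in, int a, int b) {
        boolean[] seen = new boolean[b - a + 1];
        int distinct = 0;
        for (int v : in) {
            if (!seen[v - a]) {          // first time we meet this value
                seen[v - a] = true;
                distinct++;
            }
        }
        return distinct;
    }
}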
Assuming we can partially destroy the input, here's an algorithm for n words of O(log n) bits.
Find the element of order sqrt(n) via linear-time selection. Partition the array using this element as a pivot (O(n)). Using brute force, count the number of different elements in the partition of length sqrt(n). (This is O(sqrt(n)^2) = O(n).) Now use an in-place radix sort on the rest, where each "digit" is log(sqrt(n)) = log(n)/2 bits and we use the first partition to store the digit counts.
If you consider streaming algorithms only ( http://en.wikipedia.org/wiki/Streaming_algorithm ), then it's impossible to get an exact answer with o(n) bits of storage via a communication complexity lower bound ( http://en.wikipedia.org/wiki/Communication_complexity ), but possible to approximate the answer using randomness and little space (Alon, Matias, and Szegedy).
This can be done with a bucket approach if you assume that there are only a constant number of different values. Make a flag for each value (still constant space). Traverse the list and flag the values that occur. If you happen to flag an already-flagged value, you've found a duplicate. You have to check the buckets for each element in the list, but that's still linear time.

Resources