Quicksort gives us a pretty nice O(nlogn). However, I was wondering: is there a way to sort an array with unique values that is faster than Quicksort?
Here are some of the fastest sorting algorithms and their runtimes:
Mergesort: O(nlogn)
Timsort: O(nlogn)
Heapsort: O(nlogn)
Radix sort: O(nk)
Counting sort: O(n + k)
Regarding sorting algorithms and techniques, Bilal's answer is quite helpful!
A workaround for the problem would still run in O(N*log(N)), but further calculations will be cheaper because the duplicated values have been removed.
So the idea is to read the values and insert them into a std::set, which automatically removes duplicates; if the duplicate counts are needed, you can store them while reading the input.
Sample code would look something like this:
#include <iostream>
#include <set>
using namespace std;

const int MAX_VAL = 1000000; // upper bound on the input values; adjust as needed

int n, x;
set<int> st;
int cnt[MAX_VAL + 1];

int main(){
    cin >> n;
    for (int i = 1; i <= n; i++){
        cin >> x;
        cnt[x]++;      // keep the count in case the duplicates are needed later
        st.insert(x);  // duplicates are discarded automatically
    }
    // Rest of your code
}
Without additional assumptions, the lower bound for the worst-case time complexity of any algorithm that uses comparisons is Θ(nlogn). Note that sorting a permutation p in fact computes the inverse of p. This means that if you are able to sort p({1,2,...,n}), then you are able to determine which permutation was applied to your data, out of all possible permutations.
The total number of permutations is n!, and for every information bit acquired your set is partitioned into two sets representing the outcomes consistent with that bit. Therefore you can represent the search for which permutation you use as a binary tree where every node is a set of permutations, the children of a node are the partitions of the parent set and the leaves are the outcome of your algorithm.
If your algorithm determines which permutation was used, the leaves are singletons, so you end up with n! leaves. The minimal height of a binary tree with n! leaves is log(n!), which is asymptotically nlog(n). http://lcm.csa.iisc.ernet.in/dsa/node208.html is a good reference for all of this.
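For a quick sanity check on that last step, Stirling's approximation gives log(n!) = n*log(n) - n*log(e) + O(log n), which is indeed Θ(n*log(n)).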
Related
This is a question I came across somewhere.
Given a list of numbers in random order write a linear time algorithm to find the 𝑘th smallest number in the list. Explain why your algorithm is linear.
I have searched a good part of the web, and what I learned is that a linear-time algorithm is one whose time complexity is O(n). (I may be wrong somewhere.)
We can solve the above question with different algorithms, e.g.:
Sort the array and select the element at index k-1 [O(n log n)]
Using a min-heap [O(n + k log n)]
etc.
Now the problem is that I couldn't find any algorithm with O(n) time complexity, i.e. one that is actually linear.
What can be the solution for this problem?
This is std::nth_element
From cppreference:
Notes
The algorithm used is typically introselect although other selection algorithms with suitable average-case complexity are allowed.
Given a list of numbers
although it is not compatible with std::list, only with std::vector, std::deque and std::array, as it requires a RandomAccessIterator.
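For completeness, here is a minimal usage sketch (the container and values are just illustrative): std::nth_element partially sorts the range so that the element at the given position is the one that would be there if the whole range were sorted, in O(n) average time.
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> v = {7, 4, 9, 1, 5, 3, 8};
    int k = 3;                                        // looking for the 3rd smallest
    nth_element(v.begin(), v.begin() + (k - 1), v.end());
    cout << "3rd smallest: " << v[k - 1] << "\n";     // prints 4 (sorted order: 1 3 4 5 7 8 9)
    return 0;
}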
A linear search remembering the k smallest values is O(n*k), but if k is considered a constant then it is O(n) time.
However, if k is not considered a constant, then using a histogram leads to O(n+m.log(m)) time and O(m) space complexity, where m is the number of possible distinct values (the range of your input data). The algorithm is like this:
create histogram counters for each possible value and set them to zero O(m)
process all the data and count the values O(n)
sort the histogram O(m.log(m))
pick the k-th element from the histogram O(1)
In case we are talking about unsigned integers from 0 to m-1, the histogram is computed like this:
int data[n] = { /* your data */ }, cnt[m], i;
for (i = 0; i < m; i++) cnt[i] = 0;        // clear the histogram
for (i = 0; i < n; i++) cnt[data[i]]++;    // count occurrences of each value
However, if your input data values do not comply with the above condition, you need to remap the range by interpolation or hashing. And if m is huge (or the range contains huge gaps), this is a no-go, as such a histogram either needs buckets (which is not usable for your problem) or a list of values, which no longer gives linear complexity.
So, putting all this together, your problem is solvable with linear complexity when:
n >= m.log(m)
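In the simple 0..m-1 case the histogram is already ordered by value, so no extra sort is needed and picking the k-th element is just a prefix-count walk. A rough sketch (the function and variable names are my own, not from the answer above):
#include <iostream>
#include <vector>
using namespace std;

// k is 1-based; values are assumed to lie in [0, m)
int kthSmallest(const vector<int>& data, int m, int k) {
    vector<int> cnt(m, 0);
    for (int v : data) cnt[v]++;              // O(n) counting pass
    for (int value = 0; value < m; value++) {
        if (k <= cnt[value]) return value;    // the k-th item falls inside this bucket
        k -= cnt[value];
    }
    return -1;                                // k exceeds the number of items
}

int main() {
    vector<int> data = {3, 1, 4, 1, 5, 9, 2, 6};
    cout << kthSmallest(data, 10, 3) << "\n"; // sorted: 1 1 2 3 4 5 6 9, so the 3rd smallest is 2
    return 0;
}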
At careercup site, there was this problem (https://careercup.com/question?id=6270877443293184):
Given an array, find the number of tuples such that
A[i] + A[j] + A[k] = A[l] in the array, where i < j < k < l.
The proposed solution (below) works, but the stated runtime complexity is O(n^2). After analyzing the code, I don't think it can be done in less than n^2 * log n. My rationale is that it iterates through all elements of the 2D array (which is n^2), and for each of them checks a list containing the tuples, which is O(n). Even using a TreeMap and doing a binary search only reduces that to log n, not to constant time. Can someone confirm whether this can be done in O(n^2) and explain what is incorrect in my logic?
Proposed solution:
Fill 2 2d arrays with
arr1[i][j]=a[i]+a[j]
arr2[i][j]=a[j]-a[i]
j>i
In a map<int,list<int>>, map[arr1[i][j]].push_back(j)
For each element in arr2, search in the map.
Count all hits where j < k
It's pretty easy to insert j in increasing order into map[arr1[][]].
If you enumerate the elements of arr2 in increasing k order, you don't have to do a binary search; you can just enumerate all the j.
Because you are going in increasing k order, for each list in the map you just have to remember the last position you saw. So the map should rather be a map<int, pair<int, list<int>>>.
Since you only touch each j once, your complexity is only O(n^2).
You are right, in the worst case this is not O(n^2).
The author obviously assumed that the lists in map<int,list<int>> would contain only a few members; it is an assumption similar to the one we use when we state that the complexity of the find operation of a hash table is O(1). Recall that a hash table whose collision resolution is based on separate chaining has constant find complexity on average, but in the case where many elements hash to the same value it can degrade to linear complexity.
Implementation-wise, notice that the map<int,list<int>> needs to be a hash table (i.e. std::unordered_map in C++, HashMap in Java), not std::map (or TreeMap in Java), because with std::map the find operation alone is O(logn).
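To make the intended O(n^2) average behaviour concrete, here is a hedged sketch along the same lines, using a single unordered_map of pair sums A[i]+A[j] (with j < k) instead of the 2D arrays and lists; the names are mine, not from the careercup post:
#include <iostream>
#include <unordered_map>
#include <vector>
using namespace std;

long long countTuples(const vector<int>& a) {
    int n = (int)a.size();
    // sums[s] = number of pairs (i, j) with i < j < current k and a[i] + a[j] == s
    unordered_map<long long, long long> sums;
    long long ans = 0;
    for (int k = 2; k + 1 < n; k++) {
        for (int i = 0; i < k - 1; i++)                  // add all pairs whose j == k-1
            sums[(long long)a[i] + a[k - 1]]++;
        for (int l = k + 1; l < n; l++) {                // every l > k looks up a[l] - a[k]
            auto it = sums.find((long long)a[l] - a[k]);
            if (it != sums.end()) ans += it->second;
        }
    }
    return ans;
}

int main() {
    vector<int> a = {1, 2, 3, 6, 7};
    cout << countTuples(a) << "\n";   // 1, from the tuple 1 + 2 + 3 = 6
    return 0;
}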
There is an external array of integers on which you can perform the following operations in O(1) time.
get(int i) - returns the value at the index 'i' in the external array.
reverse( int i, int j) - returns the reverse of the array between index positions i and j (including i and j).
example for reverse: consider an array {1,2,3,4,5}. reverse(0,2) will return {3,2,1,4,5} and reverse(1,4) will return {1,5,4,3,2}.
Write a code to sort the external array. Mention the time and space complexity for your code.
Obviously we can sort in O(nlogn) using quicksort or mergesort. But given the scenario, can we do better?
To sort an array is to find the permutation, or shuffle, that restores it to a sorted state. In other words, your algorithm determines which of the n! possible permutations must be applied, and applies it. Since your algorithm explores the array by asking yes-no questions (Is cell i smaller or greater than cell j?) it follows an implicit decision tree that has depth log(n!) ~ n*log(n).
This means there will be O(n*log(n)) calls to get() to determine how to sort the array.
An interesting variant is to determine the smallest number of calls to reverse() necessary to sort the array, once you know which permutation you need. We know that this number is at most n-1, which can be achieved by using selection sort. Can the worst-case number be n-2 or smaller? I must say that I have no idea...
I'd try to reduce the problem to a classic swaps() based sorting algorithm.
In the following we assume, without loss of generality, that j >= i:
Note that swap(i,j) = reverse(i,j) for every j <= i+2; reversing the sub-array just swaps the edges when there are 3 or fewer elements.
Now, for any j > i+2, all you need is to reverse() the range, which swaps the edges, and then reverse the "middle" to restore it, so you get: swap(i,j) = reverse(i,j); reverse(i+1,j-1)
Using the swap() just built, you can use any comparison-based algorithm that uses swaps, such as quicksort, which is O(nlogn). The complexity remains O(nlogn), since each swap() needs at most 2 reverse() ops, which is O(1).
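A minimal sketch of that swap() construction (the ExternalArray type here is a hypothetical in-memory stand-in for the external array, and I treat reverse() as mutating in place):
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

struct ExternalArray {                       // stand-in for the external array from the question
    vector<int> a;
    int get(int i) const { return a[i]; }
    void reverse(int i, int j) { std::reverse(a.begin() + i, a.begin() + j + 1); }
};

// swap(i,j) via reverse(): reversing the whole range exchanges the edges;
// reversing the interior afterwards restores it (only needed when j > i+2).
void swapViaReverse(ExternalArray& ea, int i, int j) {
    if (i > j) std::swap(i, j);
    ea.reverse(i, j);
    if (j > i + 2) ea.reverse(i + 1, j - 1);
}

int main() {
    ExternalArray ea{{1, 2, 3, 4, 5}};
    swapViaReverse(ea, 0, 4);                // expect 5 2 3 4 1
    for (int i = 0; i < 5; i++) cout << ea.get(i) << ' ';
    cout << '\n';
    return 0;
}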
EDIT: Note: This solution fits the original question (before it was edited), which asked for a solution, not for something better than quicksort/mergesort.
Assuming you want to minimize the number of external operations get and reverse:
read all integers into an internal array by calling get n times
do an internal sort (n log n internal ops) and calculate the permutation
sort the external array by calling reverse a maximum of n times
This has O(n) time and O(n) space complexity.
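A self-contained sketch of this, counting only external operations (the External type is a hypothetical in-memory stand-in, and for brevity the internal bookkeeping below is a quadratic selection scan rather than a real sort; the external cost is still n get() calls and at most n-1 reverse() calls):
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

struct External {                            // stand-in for the external array
    vector<int> a;
    int get(int i) const { return a[i]; }
    void reverse(int i, int j) { std::reverse(a.begin() + i, a.begin() + j + 1); }
};

void sortExternal(External& ea, int n) {
    vector<int> vals(n);
    for (int i = 0; i < n; i++) vals[i] = ea.get(i);           // n external reads
    for (int i = 0; i < n - 1; i++) {
        // locate the minimum of the remaining suffix in the internal copy
        int p = int(min_element(vals.begin() + i, vals.end()) - vals.begin());
        if (p != i) {
            ea.reverse(i, p);                                   // one external reverse per position
            std::reverse(vals.begin() + i, vals.begin() + p + 1); // keep the copy in sync
        }
    }
}

int main() {
    External ea{{3, 1, 4, 1, 5}};
    sortExternal(ea, 5);
    for (int i = 0; i < 5; i++) cout << ea.get(i) << ' ';       // 1 1 3 4 5
    cout << '\n';
    return 0;
}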
Edit in response to anonymous downvotes:
when talking about time complexity, you always have to state which operations are to be counted. Here I assumed only the external operations have a cost.
Based only on get(int i) and reverse(int i, int j), we can't optimise the code; it will have the same complexity.
Problem:
Two sets A and B have n elements each. Assume that each element is an integer in the range [0, n^100]. These sets are not necessarily sorted. Show how to check whether these two sets are disjoint in O(n) time. Your algorithm should use O(n) space.
My original idea for this problem was to create a hash table of set A and search this hash table for each of the elements in B. However, I'm not aware of any way to create a hash table of a data set with this range that only takes O(n) space. Should I be considering a completely different approach?
UPDATE:
I contacted the professor regarding this problem asking about implementing a hash table and his response was:
Please note that hashing takes O(1) time for the operations only on an average. We need a worst case O(n) time algorithm for this problem.
So it seems the problem is looking for a different approach...
Input: Arrays A[m], B[n]
Output: True if they are disjoint, False otherwise
1. Brute Force: O(m*n) time, O(1) space
1. Search for each element of A into B
2. As soon as you get a match break and return false
3. If you reach till end, return true
Advantage: Doesn't modify the input
2. Sort both O(mlogm + nlogn + m + n)
1. Sort both arrays
2. Scan both linearly (a sketch follows after this list)
Disadvantage: Modifies the input
3. Sort smaller O((m + n)logm)
1. Say, m < n, sort A
2. Binary search for each element of B into A
Disadvantage: Modifies the input
4. Sort larger O((m + n)logn)
1. Say n > m, sort B
2. Binary search for each element of A into B
Disadvantage: Modifies the input
5. Hashing O(m + n) time, O(m) or O(n) space
Advantage: Doesn't modify the input
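A small sketch of option 2 above (names are mine; plain int values are used for illustration, although the problem allows values up to n^100): sort copies of both arrays, then do a merge-style two-pointer scan.
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

bool disjoint(vector<int> A, vector<int> B) {   // taken by value so the caller's input is not modified
    sort(A.begin(), A.end());                   // O(m log m)
    sort(B.begin(), B.end());                   // O(n log n)
    size_t i = 0, j = 0;
    while (i < A.size() && j < B.size()) {      // O(m + n) merge-style scan
        if (A[i] == B[j]) return false;         // common element found
        if (A[i] < B[j]) i++; else j++;
    }
    return true;
}

int main() {
    cout << boolalpha << disjoint({3, 1, 7}, {2, 8, 5}) << "\n";  // true
    cout << boolalpha << disjoint({3, 1, 7}, {2, 7, 5}) << "\n";  // false
    return 0;
}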
Why not use a hash table? Aren't they O(n) to create (assuming the elements are all unique), then O(n) to search, giving O(2n) = O(n)?
A hash set will work fine. It's extremely common to assume hash sets/tables are constant time per operation even though that's not strictly true.
Note that hash sets/tables only use space proportional to the number of elements inserted, not to the potential total number of distinct values. You seem to have misunderstood that.
If "commonly assumed to be good enough" is unacceptable for some reason, you can use radix sort. It's linear in the total representation size of the input elements. (Caveat: that's slightly different from being linear in the number of elements.)
Honestly, I didn't expect such answers from the SO community, but never mind. The question explicitly states that the algorithm should take O(n) space and O(n) time, therefore we can rule out algorithms involving hashing, since in the worst case hashing is not O(n).
Now I was going through some texts and found that the problem of checking whether 2 sets are disjoint or not is reducible to the sorting problem. This is very standard when studying the lower bounds of many algorithms.
Actual lines from the book DESIGN METHODS AND ANALYSIS OF ALGORITHMS
By S. K. BASU · 2013.
Here the author clearly states that set disjointness is Omega(nlogn).
#include <bits/stdc++.h>
using namespace std;

int main()
{
    unordered_map<string,int> m; // store the numbers as strings, since they can be as large as n^100
    int n, i;
    cin >> n;
    string a, b;

    // read set A and mark each of its elements
    for (i = 0; i < n; i++)
    {
        cin >> a;
        m[a] = 1;
    }

    // read set B; any element already marked means the sets intersect
    for (i = 0; i < n; i++)
    {
        cin >> b;
        if (m.count(b))
        {
            cout << "Not disjoint";
            return 0;
        }
    }
    cout << "Disjoint";
    return 0;
}
Time complexity : O(n)
Auxiliary space : O(n)
You can radix sort the inputs, in base n.
This will take 101 iterations through each array (because the input numbers are in the range 0 to n^100).
Once you've sorted the inputs, you can compare them in the obvious way in O(n) time.
Note: for the radix sort to run in O(n) time, you need to check that extracting the k'th digit (base n) of an input number is O(1). You can do that with (k-1) divisions by n and a modulo operation. Since k is at most 101, this is O(1).
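A rough sketch of the idea (my own illustration, not the original answer's code): LSD radix sort in base n, one counting pass per digit. For readability the values here are 64-bit integers, whereas the actual problem allows values up to n^100 and would need a big-integer representation, but the per-digit pass has the same structure.
#include <cstdint>
#include <iostream>
#include <vector>
using namespace std;

void radixSortBaseN(vector<uint64_t>& a, int digits) {
    size_t n = a.size();
    if (n < 2) return;
    uint64_t base = n;                                          // base n: each counting pass is O(n)
    vector<uint64_t> out(n);
    uint64_t scale = 1;
    for (int d = 0; d < digits; d++, scale *= base) {
        vector<size_t> cnt(n + 1, 0);
        for (uint64_t v : a) cnt[(v / scale) % base + 1]++;     // histogram of digit d
        for (size_t i = 1; i <= n; i++) cnt[i] += cnt[i - 1];   // prefix sums give output positions
        for (uint64_t v : a) out[cnt[(v / scale) % base]++] = v; // stable scatter by digit d
        a.swap(out);
    }
}

int main() {
    vector<uint64_t> a = {170, 45, 75, 90, 802, 24, 2, 66};
    radixSortBaseN(a, 4);                                       // 4 digits in base 8 cover values < 8^4
    for (uint64_t v : a) cout << v << ' ';
    cout << '\n';
    return 0;
}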
Side note
I note that kennytm gave a similar answer in 2010, but the answer was deleted after commenters noted that "Radix sort is O(nk) time, where n is the number of keys, and k is the average key length. Since the max key value is n^100, the max key length would be 100 log n. So, this would still be O(n log n), same as all of the best sorting algorithms."
Note that this comment is incorrect: the maximum key length is 101, because the key is a sequence of base-n digits, not a sequence of bits.
Given an unsorted integer array, and without making any assumptions on
the numbers in the array:
Is it possible to find two numbers whose
difference is minimum in O(n) time?
Edit: Difference between two numbers a, b is defined as abs(a-b)
Find the smallest and largest element in the list. The difference smallest - largest will be the minimum.
If you're looking for the nonnegative difference, then this is of course at least as hard as checking whether the array has two equal elements. This is called the element uniqueness problem, and without any additional assumptions (like limiting the size of the integers or allowing operations other than comparison) it requires >= n log n time. It is the 1-dimensional case of finding the closest pair of points.
I don't think you can do it in O(n). The best I can come up with off the top of my head is to sort them (which is O(n * log n)) and then find the minimum difference of adjacent pairs in the sorted list (which adds another O(n)).
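A small sketch of this approach, using abs(a-b) as the difference (as defined in the edit above): sort a copy, then take the minimum adjacent gap.
#include <algorithm>
#include <climits>
#include <iostream>
#include <vector>
using namespace std;

int minDifference(vector<int> v) {            // O(n log n) overall; assumes at least two elements
    sort(v.begin(), v.end());                 // the dominant cost
    int best = INT_MAX;
    for (size_t i = 1; i < v.size(); i++)     // O(n) scan of adjacent pairs
        best = min(best, v[i] - v[i - 1]);    // differences are nonnegative after sorting
    return best;
}

int main() {
    cout << minDifference({4, 9, 1, 32, 13}) << "\n";   // 3 (between 1 and 4)
    return 0;
}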
I think it is possible. The secret is that you don't actually have to sort the list, you just need to create a tally of which numbers exist. This may count as "making an assumption" from an algorithmic perspective, but not from a practical perspective. We know the ints are bounded by a min and a max.
So, create an array of 2 bit elements, 1 pair for each int from INT_MIN to INT_MAX inclusive, set all of them to 00.
Iterate through the entire list of numbers. For each number in the list, if the corresponding 2 bits are 00 set them to 01. If they're 01 set them to 10. Otherwise ignore. This is obviously O(n).
Next, if any of the 2-bit pairs is set to 10, that is your answer: the minimum distance is 0, because the list contains a repeated number. If not, scan through the tally and find the minimum distance between set entries. Many people have already pointed out there are simple O(n) algorithms for this.
So O(n) + O(n) = O(n).
Edit: responding to comments.
Interesting points. I think you could achieve the same result without making any assumptions by finding the min/max of the list first and using a sparse array ranging from min to max to hold the data. That takes care of the INT_MIN/INT_MAX assumption, the space complexity, and the O(m) time complexity of scanning the array.
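A sketch of that idea (names are mine; it assumes the value range max-min is small enough to allocate a tally for, which is the same practical assumption as above): O(n + range) time, O(range) space.
#include <algorithm>
#include <climits>
#include <iostream>
#include <vector>
using namespace std;

int minAbsDifference(const vector<int>& v) {   // assumes v is non-empty
    int lo = *min_element(v.begin(), v.end());
    int hi = *max_element(v.begin(), v.end());
    vector<char> seen(hi - lo + 1, 0);         // the sparse tally from min to max
    for (int x : v) {
        if (seen[x - lo]) return 0;            // repeated value: minimum distance is 0
        seen[x - lo] = 1;
    }
    int best = INT_MAX, prev = -1;
    for (int i = 0; i <= hi - lo; i++) {       // one scan of the tally, O(range)
        if (!seen[i]) continue;
        if (prev >= 0) best = min(best, i - prev);
        prev = i;
    }
    return best;
}

int main() {
    cout << minAbsDifference({4, 9, 1, 32, 13}) << "\n";  // 3
    cout << minAbsDifference({5, 8, 5}) << "\n";          // 0
    return 0;
}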
The best I can think of is to counting sort the array (possibly combining equal values) and then do the sorted comparisons; bin sort is O(n + M), M being the size of the value range. This has a heavy memory requirement, however. Some form of bucket or radix sort would be intermediate in time and more efficient in space.
Sort the list with radixsort (which is O(n) for integers), then iterate and keep track of the smallest distance so far.
(I assume your integers are a fixed-width type. If they can hold arbitrarily large mathematical integers, radixsort will be O(n log n) as well.)
It seems to be possible to sort an unbounded set of integers in O(n*sqrt(log(log(n)))) time. After sorting, it is of course trivial to find the minimal difference in linear time.
But I can't think of any algorithm to make it faster than this.
No, not without making assumptions about the numbers/ordering.
It would be possible given a sorted list though.
I think the answer is no, and the proof is similar to the proof that you cannot sort faster than n lg n: you have to compare all of the elements, i.e. build a comparison tree, which implies an Omega(n lg n) algorithm.
EDIT. OK, if you really want to argue, then the question does not say whether it should be a Turing machine or not. With quantum computers, you can do it in linear time :)