I have an array of N integers with duplicate values, and I don't know the range of the values.
There are n/log n distinct values in this array; all the rest are duplicates.
Is there a way to sort it with time complexity O(n) and memory complexity O(n/log n)?
Maybe you can use this:
#include <iostream>
#include <algorithm>
#include <vector>
#include <map>
using namespace std;

int main()
{
    int n = 7;
    int v[] = {4, 1, 2, 3, 2, 4, 1}; // initial array
    map<int, int> mp;                // value -> number of occurrences
    vector<int> uni;                 // the unique values
    for (int i = 0; i < n; i++)
        if (++mp[v[i]] == 1)         // first time we see this value
            uni.push_back(v[i]);
    sort(uni.begin(), uni.end());    // sorting: O((n / log n) * log n) = O(n)
    int cur = 0;
    for (size_t i = 0; i < uni.size(); i++)
    {
        int cnt = mp[uni[i]];
        while (cnt--)                // write each value back cnt times
            v[cur++] = uni[i];
    }
    for (int i = 0; i < n; i++)      // print the final sorted array
        cout << v[i] << " ";
    cout << "\n";
    return 0;
}
Here the uni vector keeps the unique values, of which there are at most n/log n.
Then we use the STL sort() function, which has time complexity O(x log x) for x elements.
Since here the number of elements is x = n/log n, the complexity is O((n/log n) * log(n/log n)) <= O((n/log n) * log n) = O(n).
So the method above sorts in O(n) time with O(n/log n) extra memory (the map and uni each hold at most n/log n entries). One caveat: each map<> update costs O(log k) for k distinct keys, so the counting loop is strictly O(n log(n/log n)); replacing map<> with unordered_map<> makes that pass expected O(n).
Here a map<> is used to count the number of times each distinct value appears.
Simply keep key/value pairs:
map<int, int> counts;
for (int i = 0; i < n; i++)
{
    // if the value is already in the map, increment its count;
    // otherwise insert the key with count 1 (operator[] does both)
    ++counts[a[i]];
}
Now write the map's data to the output array. (A C++ std::map already iterates its keys in sorted order; with a hash map you would first sort the k distinct keys, e.g. with merge sort.)
Hope this solves your problem.
Bubble sort takes O(n^2) time and needs virtually no memory to execute (other than to store the original array) - see http://rosettacode.org/wiki/Sorting_algorithms/Bubble_sort . Quicksort is much faster, though - see http://rosettacode.org/wiki/Quick_Sort
Duplicates don't affect either of these algorithms.
Mergesort, e.g., is an O(n*lg(n)) time algorithm with O(n) space complexity.
You can use a hash map to extract the n/lg(n) unique items into their own array and note the number of times each item occurs; this is (expected) O(n) time and O(n/lg(n)) space. Now, you can run Mergesort on the new array, which is:
O(x*lg(x)) time with x=n/lg(n) ==
O(n/lg(n) * lg(n/lg(n))) ==
O(n/lg(n) * [lg(n) - lg(lg(n))]) ==
O(n - n*lg(lg(n))/lg(n))
<= O(n) time
and
O(x) space with x=n/lg(n) ==
O(n/lg(n)) space
Finally, expand the ordered array into the final result by duplicating elements based upon the duplication number noted in the hash map.
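For concreteness, here is a minimal C++ sketch of this approach (my illustration, not from the original answer; std::unordered_map plays the role of the hash map, making the counting pass expected O(n)):

#include <algorithm>
#include <iostream>
#include <unordered_map>
#include <vector>

// Sorts v in expected O(n) time when the number of distinct values
// is O(n / log n); uses O(n / log n) extra space beyond the input.
void sortWithFewDistinct(std::vector<int>& v) {
    std::unordered_map<int, int> freq;   // value -> count, expected O(1) per update
    for (int x : v) ++freq[x];
    std::vector<int> uniq;               // the distinct values, at most n / log n
    uniq.reserve(freq.size());
    for (const auto& kv : freq) uniq.push_back(kv.first);
    std::sort(uniq.begin(), uniq.end()); // O((n/log n) * log n) = O(n)
    size_t pos = 0;                      // expand the counts back into v
    for (int x : uniq)
        for (int c = freq[x]; c > 0; --c) v[pos++] = x;
}

int main() {
    std::vector<int> v = {4, 1, 2, 3, 2, 4, 1};
    sortWithFewDistinct(v);
    for (int x : v) std::cout << x << ' '; // prints: 1 1 2 2 3 4 4
    std::cout << '\n';
}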
Related
Design an algorithm that sorts n integers where there are duplicates. The total number of different numbers is k. Your algorithm should have time complexity O(n + k*log(k)). The expected time is enough. For which values of k does the algorithm become linear?
I am not able to come up with a sorting algorithm for integers that satisfies the condition that it must be O(n + k*log(k)). I am not a very advanced programmer, but in the problem before this one I was supposed to come up with an algorithm for all numbers xi in a list, 0 ≤ xi ≤ m, such that the algorithm was O(n+m), where n was the number of elements in the list and m was the value of the biggest integer in the list. I solved that problem easily by using counting sort, but I struggle with this one. The condition that makes it most difficult for me is the k*log(k) term in the big-O notation; if that were n*log(n) instead, I would be able to use merge sort, right? But that's not possible now, so any ideas would be very helpful.
Thanks in advance!
Here is a possible solution:
Using a hash table, count the number of unique values and the number of duplicates of each value. This has a complexity of O(n).
Enumerate the hash table, storing the unique values into a temporary array. Complexity is O(k).
Sort this array with a standard algorithm such as mergesort: complexity is O(k*log(k)).
Create the resulting array by replicating each element of the sorted array of unique values the number of times stored in the hash table. Complexity is O(n) + O(k).
Combined complexity is O(n + k*log(k)).
For example, if k is a small constant, sorting an array of n values converges toward linear time as n becomes larger and larger.
If during the first phase, where k is computed incrementally, it appears that k is not significantly smaller than n, drop the hash table and just sort the original array with a standard algorithm.
The runtime of O(n + k*log(k)) indicates (as addition in runtimes often does) that you have 2 subroutines, one that runs in O(n) and one that runs in O(k*log(k)).
You can first count the frequency of the elements in O(n) (for example with a HashMap; look this up if you're not familiar with it, it's very useful).
Then you sort just the unique elements, of which there are k. This sorting runs in O(k*log(k)); use any sorting algorithm you want.
At the end, replace each unique element with as many copies as it actually appeared, by looking this up in the map you created in step 1.
A possible Java solution (assuming the usual java.util imports) could look like this:
public List<Integer> sortArrayWithDuplicates(List<Integer> arr) {
    // O(n): collect the distinct values and count each value's frequency
    Set<Integer> set = new HashSet<>(arr);
    Map<Integer, Integer> freqMap = new HashMap<>();
    for (Integer i : arr) {
        freqMap.put(i, freqMap.getOrDefault(i, 0) + 1);
    }
    List<Integer> withoutDups = new ArrayList<>(set);
    // Sorting => O(k*log(k)), as there are k distinct elements
    Collections.sort(withoutDups); // Arrays.sort() does not accept a List
    List<Integer> result = new ArrayList<>();
    for (Integer i : withoutDups) {
        int c = freqMap.get(i);
        for (int j = 0; j < c; j++) {
            result.add(i);
        }
    }
    return result;
}
The time complexity of the above code is O(n + k*log(k)), and the solution is along the same lines as the answer above.
I'm a little confused about the space complexity.
int fn_sum(int a[], int n){
    int result = 0;
    for(int i = 0; i < n; i++){
        result += a[i];
    }
    return result;
}
In this case, is the space complexity O(n) or O(1)?
I think it uses only the result and i variables, so it is O(1). What's the answer?
(1) Space complexity: how much memory does your algorithm allocate as a function of the input size?
int fn_sum(int a[], int n){
    int result = 0; // here you have 1 variable allocated
    for(int i = 0; i < n; i++){
        result += a[i];
    }
    return result;
}
As the variable you created (result) is a single value (it's not a list, an array, etc.), your space complexity is O(1): the space usage is constant, meaning it doesn't change according to the size of the input; it's just a single, constant value.
(2) Time complexity: how does the number of operations of your algorithm relate to the size of the input?
int fn_sum(int a[], int n){     // the input is an array of size n
    int result = 0;             // 1 variable definition operation = O(1)
    for(int i = 0; i < n; i++){ // loop that runs n times whatever it has inside
        result += a[i];         // 1 sum operation = O(1), run n times = n * O(1) = O(n)
    }
    return result;              // 1 return operation = O(1)
}
All the operations you do take O(1) + O(n) + O(1) = O(n + 2) = O(n) time, following the rules of removing multiplicative and additive constants from the function.
I'll answer a bit differently:
Since the memory space consumed by int fn_sum(int a[], int n) doesn't correlate with the number of input items, its algorithmic complexity in this regard is O(1).
However, the runtime complexity is O(N), since it iterates over N items.
And yes, there are algorithms that consume more memory to get faster. A classic example is caching the results of operations.
https://en.wikipedia.org/wiki/Space_complexity
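As an illustration of that space-for-time trade (my example, not the answerer's): memoizing a recursive Fibonacci spends O(n) extra memory to turn an exponential-time computation into a linear-time one.

#include <iostream>
#include <vector>

// Naive recursion recomputes subproblems: O(2^n) time, O(n) stack.
// Caching each result once makes it O(n) time, at the price of O(n) memory.
long long fib(int n, std::vector<long long>& memo) {
    if (n < 2) return n;
    if (memo[n] != -1) return memo[n];   // reuse a cached result
    return memo[n] = fib(n - 1, memo) + fib(n - 2, memo);
}

int main() {
    int n = 50;
    std::vector<long long> memo(n + 1, -1); // the memory we trade for speed
    std::cout << fib(n, memo) << "\n";      // prints: 12586269025
}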
If int means the 32-bit signed integer type, the space complexity is O(1) since you always allocate, use and return the same number of bits.
If this is just pseudocode and int means integers represented in their binary representations with no leading zeroes and maybe an extra sign bit (imagine doing this algorithm by hand), the analysis is more complicated.
If negatives are allowed, the best case is alternating positive and negative numbers so that the result never grows beyond a constant size - O(1) space.
If zero is allowed, an equally good case is to put zero in the whole array. This is also O(1).
If only positive numbers are allowed, the best case is more complicated. I expect the best case will see some number repeated n times. For the best case, we'll want the smallest representable number for the number of bits involved; so, I expect the number to be a power of 2. We can work out the sum in terms of n and the repeated number:
result = n * val
result size = log(result) = log(n * val) = log(n) + log(val)
input size = n*log(val) + log(n)
As val grows without bound, the log(val) term dominates the result size and the n*log(val) term dominates the input size; the result size is then about 1/n of the input size, so this best case is also O(1).
The worst case should be had by choosing val to be as small as possible (we choose val = 1) and letting n grow without bound. In that case:
result = n
result size = log(n)
input size = 2 * log(n)
This time, the result size grows like half the input size as n grows. The worst-case space complexity is linear.
Another way to calculate space complexity is to analyze whether the memory required by your code scales/increases according to the input given.
Your input is int a[] with size being n. The only variable you have declared is result.
No matter what the size of n is, result is declared only once. It does not depend on the size of your input n.
Hence you can conclude your space complexity to be O(1).
What is the fastest way to find the k largest elements in an array in order (i.e. starting from the largest element to the kth largest element)?
One option would be the following:
Using a linear-time selection algorithm like median-of-medians or introselect, find the kth largest element and partition the array so that the k largest elements occupy the first k positions.
Sort those first k elements in descending order using a fast sorting algorithm like heapsort or quicksort.
Step (1) takes time O(n), and step (2) takes time O(k log k). Overall, the algorithm runs in time O(n + k log k), which is very, very fast.
Hope this helps!
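A compact C++ rendering of this idea (a sketch added for illustration; std::nth_element performs the selection step, typically via introselect):

#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>

// Returns the k largest elements in descending order: O(n + k log k).
std::vector<int> topK(std::vector<int> v, size_t k) {
    k = std::min(k, v.size());
    // Step 1: partition so the k largest occupy v[0..k): average O(n).
    std::nth_element(v.begin(), v.begin() + k, v.end(), std::greater<int>());
    // Step 2: sort just those k elements: O(k log k).
    std::sort(v.begin(), v.begin() + k, std::greater<int>());
    v.resize(k);
    return v;
}

int main() {
    std::vector<int> v = {4, 3, 7, 12, 23, 1, 8, 5, 9, 2};
    for (int x : topK(v, 3)) std::cout << x << ' '; // prints: 23 12 9
    std::cout << '\n';
}

A worst-case-linear selection such as median-of-medians could be substituted for std::nth_element if guaranteed bounds matter.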
C++ also provides the partial_sort algorithm, which solves the problem of selecting the smallest k elements (sorted), with a time complexity of O(n log k). No algorithm is provided for selecting the greatest k elements since this should be done by inverting the ordering predicate.
For Perl, the module Sort::Key::Top, available from CPAN, provides a set of functions to select the top n elements from a list using several orderings and custom key extraction procedures. Furthermore, the Statistics::CaseResampling module provides a function to calculate quantiles using quickselect.
Python's standard library (since 2.4) includes heapq.nsmallest() and nlargest(), returning sorted lists, the former in O(n + k log n) time, the latter in O(n log k) time.
Radix sort solution:
Sort the array in descending order, using radix sort;
Print first K elements.
Time complexity: O(N*L), where L = the number of digits in the largest element; for fixed-width machine integers we can assume L = O(1).
Space used: O(N) for radix sort.
However, radix sort has a costly constant factor, which makes its linear time complexity less attractive in practice.
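Here is a minimal LSD radix sort sketch (my illustration; it assumes non-negative 32-bit values, sorts ascending, and reads the top K from the back):

#include <array>
#include <cstdint>
#include <iostream>
#include <vector>

// LSD radix sort on bytes: 4 stable counting-sort passes, O(N) time, O(N) extra space.
void radixSort(std::vector<uint32_t>& v) {
    std::vector<uint32_t> buf(v.size());
    for (int shift = 0; shift < 32; shift += 8) {
        std::array<size_t, 257> count{};  // count[b + 1] = how many values have byte b
        for (uint32_t x : v) ++count[((x >> shift) & 0xFF) + 1];
        for (int i = 0; i < 256; ++i) count[i + 1] += count[i]; // prefix sums = bucket starts
        for (uint32_t x : v) buf[count[(x >> shift) & 0xFF]++] = x;
        v.swap(buf);
    }
}

int main() {
    std::vector<uint32_t> v = {4, 3, 7, 12, 23, 1, 8, 5, 9, 2};
    radixSort(v); // ascending order
    size_t k = 3;
    for (size_t i = 0; i < k; ++i)               // top k = last k, back to front
        std::cout << v[v.size() - 1 - i] << ' '; // prints: 23 12 9
    std::cout << '\n';
}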
1) Build a max heap in O(n)
2) Use extract-max k times to get the k maximum elements from the max heap: O(k log n)
Time complexity: O(n + k log n)
A C++ implementation using STL is given below:
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;

int main() {
    int arr[] = {4, 3, 7, 12, 23, 1, 8, 5, 9, 2};
    // Let's extract the 3 maximum elements
    int k = 3;
    // First copy the array into a vector to use the STL
    vector<int> vec(arr, arr + 10);
    // Build the heap in O(n)
    make_heap(vec.begin(), vec.end());
    // Extract the max k times
    for (int i = 0; i < k; i++) {
        cout << vec.front() << " ";        // current maximum
        pop_heap(vec.begin(), vec.end());  // move it to the back and re-heapify: O(log n)
        vec.pop_back();
    }
    return 0;
}
#templatetypedef's solution is probably the fastest one, assuming you can modify or copy the input.
Alternatively, you can use a heap or a BST (std::set in C++) to store the k largest elements seen so far, then read the array's elements one by one. While this is O(n lg k), it doesn't modify the input and uses only O(k) additional memory. It also works on streams (when you don't know all the data from the beginning).
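A sketch of the heap variant (my addition): a min-heap holding at most k elements always contains the k largest seen so far, because anything smaller than its top can never belong to the top k.

#include <algorithm>
#include <functional>
#include <iostream>
#include <queue>
#include <vector>

// One pass over the data, keeping the k largest in a min-heap of size k:
// O(n log k) time, O(k) extra memory; the input is never modified.
std::vector<int> topKStreaming(const std::vector<int>& stream, size_t k) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap; // min-heap
    for (int x : stream) {
        if (heap.size() < k) heap.push(x);
        else if (x > heap.top()) { // x beats the smallest of the current top k
            heap.pop();
            heap.push(x);
        }
    }
    std::vector<int> result; // pops come out ascending, so reverse at the end
    while (!heap.empty()) { result.push_back(heap.top()); heap.pop(); }
    std::reverse(result.begin(), result.end());
    return result;
}

int main() {
    std::vector<int> v = {4, 3, 7, 12, 23, 1, 8, 5, 9, 2};
    for (int x : topKStreaming(v, 3)) std::cout << x << ' '; // prints: 23 12 9
    std::cout << '\n';
}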
Here's a solution with O(N + k lg k) complexity.
int[] kLargest_Dremio(int[] A, int k) {
    int[] result = new int[k];
    shouldGetIndex = true; // field used by the helpers from the linked source file
    int q = AreIndicesValid(0, A.Length - 1)
        ? RandomizedSelet(0, A.Length - 1, A.Length - k + 1)
        : -1;
    Array.Copy(A, q, result, 0, k);
    Array.Sort(result, (a, b) => b.CompareTo(a)); // descending; a Comparison<int> must return an int
    return result;
}
AreIndicesValid and RandomizedSelet are defined in this github source file.
There was a question on performance & restricted resources.
Make a value class for the top 3 values. Use such an accumulator for reduction in a parallel stream. Limit the parallelism according to the context (memory, power).
class BronzeSilverGold {
    int[] values = new int[] {Integer.MIN_VALUE, Integer.MIN_VALUE, Integer.MIN_VALUE};

    // For reduction
    void add(int x) {
        ...
    }

    // For combining the results of two threads.
    void merge(BronzeSilverGold other) {
        ...
    }
}
The parallelism must be restricted for your setup, hence specify an N_THREADS in:
try {
    ForkJoinPool threadPool = new ForkJoinPool(N_THREADS);
    threadPool.submit(() -> {
        BronzeSilverGold result = IntStream.of(...).parallel().collect(
                BronzeSilverGold::new,
                BronzeSilverGold::add,     // accumulator: (bsg, n) -> bsg.add(n)
                BronzeSilverGold::merge);  // combiner: (bsg1, bsg2) -> bsg1.merge(bsg2)
        ...
    }).get();
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
}
I have 4 arrays A, B, C, D of size n. n is at most 4000. The elements of each array are 30-bit (positive/negative) numbers. I want to know the number of ways A[i]+B[j]+C[k]+D[l] = 0 can be formed, where 0 <= i,j,k,l < n.
The best algorithm I have derived is O(n^2 lg n); is there a faster algorithm?
OK, here is my O(n^2 lg(n^2)) algorithm -
Suppose there are four arrays A[], B[], C[], D[]. We want to find the number of ways A[i]+B[j]+C[k]+D[l] = 0 can be made, where 0 <= i,j,k,l < n.
So sum up every possible pair from A[] and B[] and place the sums in another array E[] that contains n*n elements:
int k = 0;
for (i = 0; i < n; i++)
{
    for (j = 0; j < n; j++)
    {
        E[k++] = A[i] + B[j];
    }
}
The complexity of the above code is O(n^2).
Do the same for C[] and D[]:
int l = 0;
for (i = 0; i < n; i++)
{
    for (j = 0; j < n; j++)
    {
        AUX[l++] = C[i] + D[j];
    }
}
The complexity of the above code is O(n^2).
Now sort AUX[] so that you can easily count the number of occurrences of each unique element in it.
The sorting complexity of AUX[] is O(n^2 lg(n^2)).
Now declare a structure -
struct myHash
{
    int value;
    int valueOccuredNumberOfTimes;
} F[MAXN]; // MAXN: an assumed bound on the number of unique values in AUX[]
Now place in F[] each unique element of AUX[] together with the number of times it appears.
This costs O(n^2).
possibleQuadruples = 0;
Now for each item of E[], do the following:
for (i = 0; i < k; i++)   // k = n*n, the number of elements in E[]
{
    x = E[i];
    // find -x in the structure F[] using binary search
    if (found at position j)
        possibleQuadruples += F[j].valueOccuredNumberOfTimes;
}
The loop over i performs n^2 iterations in total, and each iteration does a binary search costing lg(n^2) comparisons. So the overall complexity is O(n^2 lg(n^2)) = O(n^2 lg n).
The number of ways 0 can be reached is possibleQuadruples.
You could also use an STL map instead, but an STL map is slow, so it is better to use binary search on the sorted array.
Hope my explanation is clear enough to understand.
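Putting the whole answer together as one runnable C++ sketch (my consolidation; std::sort plus std::equal_range replaces the hand-rolled structure F[]):

#include <algorithm>
#include <iostream>
#include <vector>

// Counts quadruples with A[i]+B[j]+C[k]+D[l] == 0 in O(n^2 lg n).
long long countZeroQuadruples(const std::vector<int>& A, const std::vector<int>& B,
                              const std::vector<int>& C, const std::vector<int>& D) {
    std::vector<long long> E, AUX; // the n*n pairwise sums
    for (int a : A) for (int b : B) E.push_back((long long)a + b);
    for (int c : C) for (int d : D) AUX.push_back((long long)c + d);
    std::sort(AUX.begin(), AUX.end()); // O(n^2 lg(n^2))
    long long count = 0;
    for (long long x : E) { // count occurrences of -x by binary search
        auto range = std::equal_range(AUX.begin(), AUX.end(), -x);
        count += range.second - range.first;
    }
    return count;
}

int main() {
    std::vector<int> A = {1, -1}, B = {0, 2}, C = {-2, 0}, D = {1, -1};
    std::cout << countZeroQuadruples(A, B, C, D) << "\n"; // prints: 6
}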
I disagree that your solution is as efficient as you say. In your solution, populating E[] and AUX[] is O(N^2) each, so 2*N^2, and each will hold N^2 elements.
Generating x = O(N)
Sorting AUX = O((2N)*log(2N))
The binary search for each E[i] in AUX[] has N^2 elements to look for among N^2 elements.
Thus you are still doing N^4 work, plus the extra work of generating the intermediate arrays and sorting the N^2 elements in AUX[].
I have a solution (work in progress), but I find it very difficult to calculate how much work it is. I deleted my previous answer. I will post something when I am more sure of myself.
I need to find a way to compare O(X)+O(Z)+O(X^3)+O(X^2)+O(Z^3)+O(Z^2)+X*log(X)+Z*log(Z) to O(N^4), where X+Z = N.
It is clearly less than O(N^4)... but by how much? My math is failing me here...
The judgement is wrong. The supplied solution generates arrays of size N^2. It then operates on these arrays (sorting, etc.).
Therefore the order of work, which would normally be O(n^2*log(n)), should have n substituted with n^2. The result is therefore O((n^2)^2*log(n^2)).
How can we remove the median of a set with time complexity O(log n)? Any ideas?
If the set is sorted, finding the median requires O(1) item retrievals. If the items are in arbitrary order, it is not possible to identify the median with certainty without examining the majority of them. Examining most, but not all, of the items only lets one guarantee that the median lies within some range [if the list contains duplicates, the upper and lower bounds may coincide], and examining the majority of the items in a list implies O(n) item retrievals.
If one has the information in a collection that is not fully ordered, but where certain ordering relationships are known, then the time required may be anywhere between O(1) and O(n) item retrievals, depending upon the nature of the known ordering relation.
For unsorted lists, repeatedly perform O(n) partial sorts until the element located at the median position is known. This is at least O(n), though.
Is there any information about the elements being sorted?
For a general, unsorted set, it is impossible to reliably find the median in better than O(n) time. You can find the median of a sorted set in O(1), or you can trivially sort the set yourself in O(n log n) time and then find the median in O(1), giving an O(n log n) algorithm. Or, finally, there are more clever median-selection algorithms that work by partitioning instead of sorting and yield O(n) performance.
But if the set has no special properties and you are not allowed any pre-processing step, you will never get below O(n) by the simple fact that you will need to examine all of the elements at least once to ensure that your median is correct.
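As a quick illustration of the partition-based O(n) selection mentioned above (my sketch; std::nth_element is the standard library's selection routine):

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v = {9, 8, 7, 6, 5, 4, 3, 2, 1};
    // Place the median (the upper median for even sizes) at its sorted
    // position in O(n) average time, without fully sorting the vector.
    auto mid = v.begin() + v.size() / 2;
    std::nth_element(v.begin(), mid, v.end());
    std::cout << "median: " << *mid << "\n"; // prints: median: 5
}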
Here's a solution in Java, based on TreeSet:
import java.util.*;

public class SetWithMedian {
    private SortedSet<Integer> s = new TreeSet<Integer>();
    private Integer m = null;

    public boolean contains(int e) {
        return s.contains(e);
    }

    public Integer getMedian() {
        return m;
    }

    public void add(int e) {
        s.add(e);
        updateMedian();
    }

    public void remove(int e) {
        s.remove(e);
        updateMedian();
    }

    private void updateMedian() {
        if (s.size() == 0) {
            m = null;
        } else if (s.size() == 1) {
            m = s.first();
        } else {
            SortedSet<Integer> h = s.headSet(m);
            SortedSet<Integer> t = s.tailSet(m + 1);
            int x = 1 - s.size() % 2;
            if (h.size() < t.size() + x)
                m = t.first();
            else if (h.size() > t.size() + x)
                m = h.last();
        }
    }
}
Removing the median (i.e. "s.remove(s.getMedian())") then takes O(log n) tree operations. (Note, though, that size() on a TreeSet subset view is linear in the view's size in the stock JDK, so strictly achieving the O(log n) bound requires maintaining the two counts separately, e.g. with an order-statistics tree.)
Edit: To help understand the code, here's the invariant condition of the class attributes:
private boolean isGood() {
    if (s.isEmpty()) {
        return m == null;
    } else {
        return s.contains(m)
            && s.headSet(m).size() + s.size() % 2 == s.tailSet(m).size();
    }
}
In human-readable form:
If the set "s" is empty, then "m" must be null.
If the set "s" is not empty, then it must contain "m".
Let x be the number of elements strictly less than "m", and let y be the number of elements greater than or equal to "m". Then, if the total number of elements is even, x must be equal to y; otherwise, x+1 must be equal to y.
Try a red-black tree. It should work quite well, and with a binary search you get your log(n). It also has remove and insert times of log(n), and rebalancing is done in log(n) as well.
As mentioned in previous answers, there is no way to find the median without touching every element of the data structure. If the algorithm you look for must be executed sequentially, then the best you can do is O(n). The deterministic selection algorithm (median-of-medians) or BFPRT algorithm will solve the problem with a worst case of O(n). You can find more about that here: http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm
However, the median-of-medians algorithm can be made to run faster than O(n) by making it parallel. Due to its divide-and-conquer nature, the algorithm can be "easily" made parallel. For instance, when dividing the input array into groups of 5 elements, you could launch a thread for each sub-array, sort it and find its median within that thread. When this step finishes, the threads are joined and the algorithm is run again with the newly formed array of medians.
Note that such a design would only be beneficial on really large data sets. The additional overhead of spawning threads and joining them makes it unfeasible for smaller sets. This has a bit of insight: http://www.umiacs.umd.edu/research/EXPAR/papers/3494/node18.html
Note that you can find asymptotically faster algorithms out there, however they are not practical enough for daily use. Your best bet is the already mentioned sequential median-of-medians algorithm.
Master Yoda's randomized algorithm has, of course, a minimum complexity of n like any other, an expected complexity of n (not log n), and a maximum complexity of n squared, like Quicksort. It's still very good.
In practice, the "random" pivot choice might sometimes be a fixed location (without involving a RNG) because the initial array elements are known to be random enough (e.g. a random permutation of distinct values, or independent and identically distributed) or deduced from an approximate or exactly known distribution of input values.
I know a randomized algorithm with expected time complexity O(n).
Here is the algorithm:
Input: an array of n numbers A[1...n] [without loss of generality we can assume n is even]
Output: the (n/2)-th element of the sorted array.
Algorithm (A[1..n], k = n/2):
Pick a pivot index p uniformly at random from 1...n.
Partition the array into 2 parts:
    L - the elements <= A[p]
    R - the elements > A[p]
if (k == |L|), the pivot (at position |L| + 1) is the median; stop.
if (k < |L|), recurse on (L, k);
else recurse on (R, k - (|L| + 1)).
Complexity:
O(n) in expectation.
The proof is purely mathematical, about one page long. If you are interested, ping me.
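A runnable sketch of the algorithm above (my illustration with 0-based k; it trades the in-place partition for clarity and handles duplicates of the pivot):

#include <cstdlib>
#include <iostream>
#include <vector>

// Returns the k-th smallest element (k is 0-based): expected O(n) time.
int randomizedSelect(std::vector<int> a, size_t k) {
    while (true) {
        int pivot = a[std::rand() % a.size()]; // pivot chosen uniformly at random
        std::vector<int> L, R;
        size_t pivots = 0;
        for (int x : a) {
            if (x < pivot) L.push_back(x);
            else if (x > pivot) R.push_back(x);
            else ++pivots;                     // copies of the pivot itself
        }
        if (k < L.size()) a = std::move(L);                // answer lies in L
        else if (k < L.size() + pivots) return pivot;      // answer is the pivot
        else { k -= L.size() + pivots; a = std::move(R); } // answer lies in R
    }
}

int main() {
    std::vector<int> v = {9, 8, 7, 6, 5, 4, 3, 2, 1};
    std::cout << randomizedSelect(v, v.size() / 2) << "\n"; // median: prints 5
}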
To expand on rwong's answer, here is some example code:
// partial_sort example
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;

int main() {
    int myints[] = {9, 8, 7, 6, 5, 4, 3, 2, 1};
    vector<int> myvector(myints, myints + 9);
    vector<int>::iterator it;

    // sort the first 5 positions; the rest is left in unspecified order
    partial_sort(myvector.begin(), myvector.begin() + 5, myvector.end());

    // print out content:
    cout << "myvector contains:";
    for (it = myvector.begin(); it != myvector.end(); ++it)
        cout << " " << *it;
    cout << endl;

    return 0;
}
Output:
myvector contains: 1 2 3 4 5 9 8 7 6
The element in the middle (the fifth of the nine here) would be the median.