C++ map indices to sorted indices - algorithm

A standard problem in many languages is to sort an array and sort the indices as well. So for instance, if a = {4,1,3,2} the sorted array is b = {1,2,3,4} and the original indices moved would be {1,3,2,0}. This is easy to do by sorting a vector of pairs for instance.
What I want instead is an array c so that c[i] is the new position of element a[i] in the array b. So, in my example, c = {3,0,2,1} because 4 moves to position 3, 1 moved to position 0 and so on.
One way is to look up each element a[i] in b (perhaps using binary search to reduce lookup time) and then add the corresponding index in c. Is there a more efficient way?

Can you assume that you have the array of original indices moved? It's the only array above that you didn't assign to a variable. If so, one efficient way of solving this problem is to back-calculate c from that array of original indices.
You have that as {1,3,2,0}. All you need to do is march through it and put each value's index at the position the value indicates.
So index 0 holds a 1: that means at index 1 of the new array there should be a 0. Index 1 holds a 3, so at index 3 of the new array put a 1. You end up with your goal of {3,0,2,1}.
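This back-calculation is just inverting a permutation; a minimal Python sketch (the function name is mine):

```python
def invert_permutation(p):
    # p[i] is the original index of the element that ended up at
    # position i (the {1,3,2,0} array above); c[p[i]] = i then gives
    # each original element its new position
    c = [0] * len(p)
    for i, v in enumerate(p):
        c[v] = i
    return c

# invert_permutation([1, 3, 2, 0]) == [3, 0, 2, 1]
```

This is O(n), so it beats per-element binary search if the index array is already available.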


Algorithm for iterating over random permutation

I have a bag that has the following:
6 red marbles
5 green marbles
2 blue marbles
I want to remove a random marble from the bag, record its color, and repeat until no more marbles are left in the bag:
1. Sort the counts:
bag = {2:blue, 5:green, 6:red}
2. Compute the cumulative counts:
cumulative = {2:blue, 7:green, 13:red}
3. Pick a random number in [0, max cumulative count]:
rand(0, 13) = 3
4. Find the insertion-point index of this integer using binary search:
i = 1
5. Record the color corresponding to this index:
green
6. Reduce that count by 1:
bag = {2:blue, 4:green, 6:red}
7. Repeat until no more marbles are left in the bag.
Is this a good way to do this or are there more efficient ways in terms of time complexity?
Your algorithm is pretty good, but it could be optimized further:
You don't need to sort the colors! You can skip the first step.
Instead of recalculating the cumulative counts each time, you can update them incrementally by decreasing all values to the right of the selected one (including the selected color itself).
You also don't need the binary search: you can just decrease the cumulative counts from the end until you reach the correct number.
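A sketch of this optimized draw in Python (names are mine; counts and colors are parallel arrays, and only the cumulative array is maintained between draws):

```python
import random

def draw_all(counts, colors):
    # cum[i] = total number of marbles of colors[0..i]
    cum = []
    total = 0
    for c in counts:
        total += c
        cum.append(total)
    result = []
    while cum[-1] > 0:
        r = random.randint(1, cum[-1])
        # walk from the end, decrementing every cumulative count >= r;
        # the color just right of where we stop is the one drawn
        i = len(cum) - 1
        while i >= 0 and cum[i] >= r:
            cum[i] -= 1
            i -= 1
        result.append(colors[i + 1])
    return result
```

No sorting and no binary search; each draw costs O(number of colors).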
There is also another algorithm based on lists:
Create a list with all the items (0=red, 1=green, 2=blue): [0,0,0,0,0,0,1,1,1,1,1,2,2].
Get a random integer i between 0 and the size of the list - 1.
Remove the ith item from the list and add it to the result.
Repeat 2. and 3. until the list is empty.
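The list-based algorithm above can be sketched as (function name is mine):

```python
import random

def draw_by_extraction(counts, colors):
    # expand the bag into one entry per marble
    items = [c for c, n in zip(colors, counts) for _ in range(n)]
    result = []
    while items:
        i = random.randrange(len(items))  # random index in [0, len-1]
        result.append(items.pop(i))       # remove the ith item, record it
    return result
```

Note that list.pop(i) is O(n), so this is O(n^2) overall for a plain list.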
Instead of relying on extraction, you can shuffle the array in-place:
Like in maraca's answer, store the items individually in the array (citing it here: "Create a list with all the items (0=red, 1=green, 2=blue): [0,0,0,0,0,0,1,1,1,1,1,2,2].")
Iterate through the array and, for each position i, pick a random index j in [i, len-1] to swap places with.
At the end, just iterate over the array to get a shuffled order.
Something like
for(i=0 .. len-2) {
j=random(i .. len-1); // pick from the unshuffled tail (Fisher-Yates); picking from the whole array would bias the result
// swap them
aux=a[i]; a[i]=a[j]; a[j]=aux;
}
// now consume the array - it is as random as it can be
// without extracting from it on the way
Note: many programming languages will have libraries providing already implemented array/list shuffling functions
C++ - std::shuffle (std::random_shuffle is deprecated and was removed in C++17)
Java - Collections.shuffle
Python - random.shuffle

Trying to improve efficiency of this search in an array

Suppose I have an input array where all objects are non-equivalent - e.g. [13,2,36]. I want the output array to be [1,0,2], since 13 is greater than 2 so "1", 2 is greater than no element so "0", 36 is greater than both 13 and 2 so "2". How do I get the output array with efficiency better than O(n²)?
Edit 1: I also want to print the output in the same ordering. Please give C/C++ code if possible.
Seems like dynamic programming.
Maybe this can help:
Here is an O(n + k) algorithm, where k is the maximum possible value:
1. Declare an array arr of some max size, say 1000001.
2. Traverse the input and set arr[input[i]] = 1 for each element input[i].
3. Traverse arr and add each cell to the previous one, so that arr[i] records how many elements are less than or equal to i:
arr[i] += arr[i-1]
Example: if input[] = {12, 3, 36}
After step 2:
arr[3] = 1, arr[12] = 1, arr[36] = 1
After step 3:
arr[3] = 1, arr[4] = arr[4] + arr[3] = 1, and so on up to arr[11] = 1,
arr[12] = arr[11] + arr[12] = 2,
arr[13] = arr[14] = ... = arr[35] = 2, arr[36] = arr[35] + arr[36] = 3
4. Traverse the input array and print arr[input[i]] - 1 for each index i.
So arr[3] = 1, arr[12] = 2, arr[36] = 3; printing arr[input[i]] directly would give {2, 1, 3}, so we subtract 1 from each element to get the desired output {1, 0, 2}.
// pseudo code
int arr[1000001] = {0};
int input[size]; // size is the size of the input array
for (i = 0; i < size; i++) {
    input[i] = take_input(); // read the input
    arr[input[i]] = 1;       // mark the value input[i] as present
}
for (i = 1; i < 1000001; i++)
    arr[i] += arr[i-1];      // prefix sums
for (i = 0; i < size; i++)
    print(arr[input[i]] - 1); // subtract 1 so the smallest element prints 0
To understand the algorithm better, take pen and paper and do a dry run.
Hope it helps. Happy coding!
Clone the original array (keeping the original indexes of the elements somewhere) and quicksort the clone in descending order. The output value for an element is then length - 1 - i, where i is the element's index in the sorted array.
[13, 2, 36] - original
[36(2), 13(1), 2(0)] - sorted
[1, 0, 2] - substituted
def sort(array):
    temp = sorted(array)
    # assumes all elements are distinct, as the question states
    indexDict = {temp[i]: i for i in range(len(temp))}
    return [indexDict[i] for i in array]
I realize it's in Python, but it should nevertheless still help you.
Schwartzian transform: decorate, sort, undecorate.
Create a structure holding an object as well as an index. Create a new list of these structures from your list. Sort by the objects as planned. Create a list of the indices from the sorted list.
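The decorate/sort/undecorate idea can be sketched in Python (function name is mine):

```python
def ranks(a):
    # decorate: pair each value with its original index, then sort by value
    decorated = sorted((v, i) for i, v in enumerate(a))
    # undecorate: the sorted position becomes the answer for that index
    out = [0] * len(a)
    for rank, (_, i) in enumerate(decorated):
        out[i] = rank
    return out

# ranks([13, 2, 36]) == [1, 0, 2]
```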

Given 2 arrays, returns elements that are not included in both arrays

I had an interview, and did one of the questions described below:
Given two arrays, please calculate the result: get the union and then remove the intersection from the union. e.g.
int a[] = {1, 3, 4, 5, 7};
int b[] = {5, 3, 8, 10}; // it wasn't mentioned whether duplicate values can occur.
result = {1,4,7,8,10}
This is my idea:
Sort a, b.
Check each item of b in a using binary search ("dichotomy search"). If not found, pass; otherwise, remove this item from both a and b.
result = elements left in a + elements left in b
I know it is a lousy algorithm, but nonetheless it's better than nothing. Is there a better approach than this one?
There are many approaches to this problem. One approach is:
1. Construct a hash map from the distinct elements of array a, with the elements as keys and 1 as the value.
2. For every element e in array b:
   if e is in the hash map, set the value of that key to 0;
   else add e to the result array.
3. Add all keys from the hash map whose value is still 1 to the result array.
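A sketch of this hash-map approach in Python, using a dict (function name is mine):

```python
def sym_diff(a, b):
    seen = {x: 1 for x in a}     # step 1: every distinct element of a -> 1
    result = []
    for e in b:                  # step 2
        if e in seen:
            seen[e] = 0          # present in both arrays: exclude it
        else:
            result.append(e)
    # step 3: elements of a that never appeared in b
    result.extend(x for x in seen if seen[x] == 1)
    return result
```

Expected O(len(a) + len(b)) on average, at the cost of the hash map's memory.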
Another approach may be:
join both lists
sort the joined list
walk through the joined list and completely remove any element that occurs multiple times
This has one drawback: it does not work if the input lists already contain duplicates. But since we are talking about sets and set theory, I would also expect the inputs to be sets in the mathematical sense.
Another (in my opinion the best) approach:
You do not need to search through both lists; you can just iterate through them sequentially:
sort a and b
declare an empty result set
take iterators to both lists and repeat the following steps:
if the iterators' values are unequal: add the smaller value to the result set and increment the corresponding iterator
if the iterators' values are equal: increment both iterators without adding anything to the result set
if one iterator reaches the end: add all remaining elements of the other list to the result
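The merge-style iteration above can be sketched as (function name is mine):

```python
def sym_diff_sorted(a, b):
    a, b = sorted(a), sorted(b)
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:          # unequal: keep the smaller, advance it
            result.append(a[i]); i += 1
        elif b[j] < a[i]:
            result.append(b[j]); j += 1
        else:                    # equal: in the intersection, skip both
            i += 1; j += 1
    result.extend(a[i:])         # one of these tails is empty
    result.extend(b[j:])
    return result

# sym_diff_sorted([1, 3, 4, 5, 7], [5, 3, 8, 10]) == [1, 4, 7, 8, 10]
```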

Disperse Duplicates in an Array

Source : Google Interview Question
Write a routine to ensure that identical elements in the input are maximally spread in the output?
Basically, we need to place identical elements in such a way that the TOTAL spread is as large as possible.
Example:
Input: {1,1,2,3,2,3}
Possible Output: {1,2,3,1,2,3}
Total dispersion = Difference between position of 1's + 2's + 3's = 4-1 + 5-2 + 6-3 = 9 .
I am NOT AT ALL sure if there's an optimal polynomial-time algorithm available for this. Also, no other detail is provided for the question other than this.
What I thought is: calculate the frequency of each element in the input, then arrange them in the output, one distinct element at a time, until all the frequencies are exhausted.
I am not sure of my approach.
Any approaches/ideas, people?
I believe this simple algorithm would work:
count the number of occurrences of each distinct element.
make a new list
add one instance of all elements that occur more than once to the list (order within each group does not matter)
add one instance of all unique elements to the list
add one instance of all elements that occur more than once to the list
add one instance of all elements that occur more than twice to the list
add one instance of all elements that occur more than thrice to the list
...
Now, this will intuitively not give a good spread:
for {1, 1, 1, 1, 2, 3, 4} ==> {1, 2, 3, 4, 1, 1, 1}
for {1, 1, 1, 2, 2, 2, 3, 4} ==> {1, 2, 3, 4, 1, 2, 1, 2}
However, I think this is the best spread you can get given the scoring function provided.
Since the dispersion score counts the sum of the distances instead of the squared sum of the distances, you can have several duplicates close together, as long as you have a large gap somewhere else to compensate.
for a sum-of-squared-distances score, the problem becomes harder.
Perhaps the interview question hinged on the candidate recognizing this weakness in the scoring function?
In Perl:
@a = (9,9,9,2,2,2,1,1,1);
then make a hash table of the counts of the different numbers in the list, like a frequency table:
map { $x{$_}++ } @a;
then repeatedly walk through all the keys found, in a known order, appending one instance of each remaining number to an output list, until all the counts are exhausted:
@r = ();
$g = 1;
while ($g == 1) {
    $g = 0;
    for my $n (sort keys %x) {
        if ($x{$n} > 0) {
            push @r, $n;
            $x{$n}--;
            $g = 1;
        }
    }
}
I'm sure that this could be adapted to any programming language that supports hash tables
Python code for the algorithm suggested by Vorsprung and HugoRune:
from collections import Counter

def max_spread(data):
    cnt = Counter(data)
    res, num = [], list(cnt)
    while len(cnt) > 0:
        for i in num:
            if cnt[i] > 0:
                res.append(i)
                cnt[i] -= 1
                if cnt[i] == 0:
                    del cnt[i]
    return res

def calc_spread(data):
    d = {}
    for i, v in enumerate(data):
        d.setdefault(v, []).append(i)
    return sum(max(x) - min(x) for x in d.values())
HugoRune's answer takes some advantage of the unusual scoring function but we can actually do even better: suppose there are d distinct non-unique values, then the only thing that is required for a solution to be optimal is that the first d values in the output must consist of these in any order, and likewise the last d values in the output must consist of these values in any (i.e. possibly a different) order. (This implies that all unique numbers appear between the first and last instance of every non-unique number.)
The relative order of the first copies of non-unique numbers doesn't matter, and likewise nor does the relative order of their last copies. Suppose the values 1 and 2 both appear multiple times in the input, and that we have built a candidate solution obeying the condition I gave in the first paragraph that has the first copy of 1 at position i and the first copy of 2 at position j > i. Now suppose we swap these two elements. Element 1 has been pushed j - i positions to the right, so its score contribution will drop by j - i. But element 2 has been pushed j - i positions to the left, so its score contribution will increase by j - i. These cancel out, leaving the total score unchanged.
Now, any permutation of elements can be achieved by swapping elements in the following way: swap the element in position 1 with the element that should be at position 1, then do the same for position 2, and so on. After the ith step, the first i elements of the permutation are correct. We know that every swap leaves the scoring function unchanged, and a permutation is just a sequence of swaps, so every permutation also leaves the scoring function unchanged! This holds for the d elements at both ends of the output array.
When 3 or more copies of a number exist, only the position of the first and last copy contribute to the distance for that number. It doesn't matter where the middle ones go. I'll call the elements between the 2 blocks of d elements at either end the "central" elements. They consist of the unique elements, as well as some number of copies of all those non-unique elements that appear at least 3 times. As before, it's easy to see that any permutation of these "central" elements corresponds to a sequence of swaps, and that any such swap will leave the overall score unchanged (in fact it's even simpler than before, since swapping two central elements does not even change the score contribution of either of these elements).
This leads to a simple O(n log n) algorithm (or O(n) if you use bucket sort for the first step) to generate a solution array Y from a length-n input array X:
Sort the input array X.
Use a single pass through X to count the number of distinct non-unique elements. Call this d.
Set i, j and k to 0.
While i < n:
If X[i+1] == X[i], we have a non-unique element:
Set Y[j] = Y[n-j-1] = X[i].
Increment i twice, and increment j once.
While X[i] == X[i-1]:
Set Y[d+k] = X[i].
Increment i and k.
Otherwise we have a unique element:
Set Y[d+k] = X[i].
Increment i and k.
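The steps above can be rendered in Python like this (function name is mine):

```python
from collections import Counter

def disperse(X):
    X = sorted(X)                                # step 1
    n = len(X)
    cnt = Counter(X)
    d = sum(1 for v in cnt.values() if v > 1)    # step 2
    Y = [None] * n
    i = j = k = 0                                # step 3
    while i < n:                                 # step 4
        if i + 1 < n and X[i + 1] == X[i]:
            # non-unique element: one copy at each end of the output
            Y[j] = Y[n - j - 1] = X[i]
            i += 2
            j += 1
            while i < n and X[i] == X[i - 1]:
                Y[d + k] = X[i]                  # extra copies go in the middle
                i += 1
                k += 1
        else:
            Y[d + k] = X[i]                      # unique element goes in the middle
            i += 1
            k += 1
    return Y
```

For {1,1,2,3,2,3} this produces [1, 2, 3, 3, 2, 1], which also achieves the optimal spread of 9.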

How is counting sort a stable sort?

Suppose my input is (a,b and c to distinguish between equal keys)
1 6a 8 3 6b 0 6c 4
My counting sort will save as (discarding the a,b and c info!!)
0(1) 1(1) 3(1) 4(1) 6(3) 8(1)
which will give me the result
0 1 3 4 6 6 6 8
So, how is this stable sort?
I am not sure how it is "maintaining the relative order of records with equal keys."
Please explain.
To understand why counting sort is stable, you need to understand that counting sort can not only be used for sorting a list of integers; it can also be used for sorting a list of elements whose key is an integer while carrying additional information, and these elements will be sorted by their keys.
A counting sort example that sorts elements with additional information will help you to understand this. For instance, we want to sort three stocks by their prices:
[(GOOG 3), (CSCO 1), (MSFT 1)]
Here stock prices are integer keys, and stock names are their associated information.
Expected output for the sorting should be:
[(CSCO 1), (MSFT 1), (GOOG 3)]
(containing both stock price and its name,
and the CSCO stock should appear before MSFT so that it is a stable sort)
A counts array will be calculated for sorting this (let's say stock prices can only be 0 to 3):
counts array: [0, 2, 0, 1] (price "1" appear twice, and price "3" appear once)
If you are just sorting an integer array, you can go through the counts array and output "1" twice and "3" once and it is done, and the entire counts array will become an all-zero array after this.
But here we want to have stock names in sorting output as well. How can we obtain this additional information (it seems the counts array already discards this piece of information)? Well, the associated information is stored in the original unsorted array. In the unsorted array [(GOOG 3), (CSCO 1), (MSFT 1)], we have both the stock name and its price available. If we get to know which position (GOOG 3) should be in the final sorted array, we can copy this element to the sorted position in the sorted array.
To obtain the final position for each element in the sorted array, unlike sorting an integer array, you don't use the counts array directly to output the sorted elements. Instead, counting sort has an additional step which calculates the cumulative sum array from the counts array:
counts array: [0, 2, 2, 3] (i from 0 to 3: counts[i] = counts[i] + counts[i - 1])
This cumulative sum array tells us each value's current position in the final sorted array. For example, counts[1] == 2 means the next item with value 1 should be placed in the 2nd slot of the sorted array. Intuitively, because counts[i] is the cumulative sum from the left, it shows how many items have keys less than or equal to i, which tells you where the last such item should be placed.
When a $1 price stock is encountered for the first time, it should be output to the second position of the sorted array, and when a $3 price stock is encountered for the first time, it should be output to the third position. Whenever a $1 stock gets copied to the sorted array, we decrease its count in the counts array.
counts array: [0, 1, 2, 3]
(so that the second appearance of $1 price stock's position will be 1)
So we iterate the unsorted array backwards (this is important to ensure stability), look up each element's position in the sorted array according to the counts array, and copy it there.
sorted array: [null, null, null]
counts array: [0, 2, 2, 3]
iterate the stocks in the unsorted array backwards
1. the last stock (MSFT 1)
sorted array: [null, (MSFT 1), null] (copy to the second position because counts[1] == 2)
counts array: [0, 1, 2, 3] (decrease counts[1] by 1)
2. the middle stock (CSCO 1)
sorted array: [(CSCO 1), (MSFT 1), null] (copy to the first position because counts[1] == 1 now)
counts array: [0, 0, 2, 3] (decrease counts[1] by 1)
3. the first stock (GOOG 3)
sorted array: [(CSCO 1), (MSFT 1), (GOOG 3)] (copy to the third position because counts[3] == 3)
counts array: [0, 0, 2, 2] (decrease counts[3] by 1)
As you can see, after the array gets sorted, the counts array (which is [0, 0, 2, 2]) doesn't become an all-zero array like when sorting an array of integers. The counts array is not used to tell how many times an integer appears in the unsorted array; instead, it is used to tell which position each element should occupy in the final sorted array. Since we decrease the count every time we output an element, the next element with the same key lands in a smaller final position. That's why we need to iterate the unsorted array backwards to ensure stability.
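The whole procedure can be sketched in Python (function name and signature are mine; key extracts the integer key, and k is the maximum key value):

```python
def counting_sort(items, key, k):
    counts = [0] * (k + 1)
    for it in items:                 # histogram of keys
        counts[key(it)] += 1
    for v in range(1, k + 1):        # cumulative sums
        counts[v] += counts[v - 1]
    out = [None] * len(items)
    for it in reversed(items):       # backwards pass keeps the sort stable
        counts[key(it)] -= 1
        out[counts[key(it)]] = it
    return out
```

With the stock example, counting_sort([("GOOG", 3), ("CSCO", 1), ("MSFT", 1)], key=lambda s: s[1], k=3) places CSCO before MSFT, matching the walkthrough above.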
Conclusion:
Since each element contains not only an integer key but also some additional information, you can tell elements with equal keys apart using that information, and thus check whether the algorithm is stable (yes, counting sort is a stable sorting algorithm if implemented appropriately).
References:
Some good materials explaining counting sort and its stability:
http://www.algorithmist.com/index.php/Counting_sort (this article explains this question pretty well)
http://courses.csail.mit.edu/6.006/fall11/rec/rec07.pdf
http://rosettacode.org/wiki/Sorting_algorithms/Counting_sort (a list of counting sort implementations in different programming languages. If you compare them with the algorithm in the Wikipedia entry below, you will find that most of them don't implement the exact counting sort but only the integer sorting function, lacking the additional step that calculates the cumulative sum array. The 'Go' implementation in this link, however, provides two variants: one for sorting integers only and one for sorting elements carrying additional information)
http://en.wikipedia.org/wiki/Counting_sort
Simple, really: instead of a simple counter for each 'bucket', it's a linked list.
That is, instead of
0(1) 1(1) 3(1) 4(1) 6(3) 8(1)
You get
0(.) 1(.) 3(.) 4(.) 6(a,b,c) 8(.)
(here I use . to denote some item in the bucket).
Then just dump them back into one sorted list:
0 1 3 4 6a 6b 6c 8
That is, when you find an item with key x, knowing that it may carry other information that distinguishes it from other items with the same key, you don't just increment a counter for bucket x (which would discard all that extra information).
Instead, you have a linked list (or similarly ordered data structure with constant time amortized append) for each bucket, and you append that item to the end of the list for bucket x as you scan the input left to right.
So instead of using O(k) space for k counters, you have O(k) initially empty lists whose sum of lengths will be n at the end of the "counting" portion of the algorithm. This variant of counting sort will still be O(n + k) as before.
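A sketch of this bucket variant in Python (names are mine):

```python
def bucket_stable_sort(items, key, k):
    buckets = [[] for _ in range(k + 1)]
    for it in items:
        # appending preserves input order within each bucket,
        # which is exactly what makes the sort stable
        buckets[key(it)].append(it)
    # dump the buckets back into one sorted list
    return [it for bucket in buckets for it in bucket]
```

With the 6a/6b/6c example, the 6-bucket ends up holding a, b, c in input order, so the output is 0 1 3 4 6a 6b 6c 8.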
Your solution is not a full counting sort, and discards the associated values.
Here's the full counting sort algorithm.
After you have calculated the histogram:
0(1) 1(1) 3(1) 4(1) 6(3) 8(1)
you have to calculate the accumulated sums - each cell will contain how many elements are less than or equal to that value:
0(1) 1(2) 3(3) 4(4) 6(7) 8(8)
Now you start from the end of your original list and go backwards.
Last element is 4. There are 4 elements less than or equal to 4. So 4 will go on the 4th position. You decrement the counter for 4.
0(1) 1(2) 3(3) 4(3) 6(7) 8(8)
The next element is 6c. There are 7 elements less than or equal to 6. So 6c will go to the 7th position. Again, you decrement the counter for 6.
0(1) 1(2) 3(3) 4(3) 6(6) 8(8)
^ next 6 will go now to 6th position
As you can see, this algorithm is a stable sort. The order for the elements with the same key will be kept.
If your three "6" values are distinguishable, then your counting sort is wrong (it discards information about the values, which a true sort doesn't do, because a true sort only re-orders the values).
If your three "6" values are not distinguishable, then the sort is stable, because you have three indistinguishable "6"s in the input, and three in the output. It's meaningless to talk about whether they have or have not been "re-ordered": they're identical.
The concept of non-stability only applies when the values have some associated information which does not participate in the order. For instance if you were sorting pointers to those integers, then you could "tell the difference" between the three 6s by looking at their different addresses. Then it would be meaningful to ask whether any particular sort was stable. A counting sort based on the integer values then would not be sorting the pointers. A counting sort based on the pointer values would not order them by integer value, rather by address.
