Getting indices of sorting integer array - sorting

i have an array with integers, which i need to sort. however the result should not contain the integer values, but the indices. i.e. the new order of the old array.
for example: [10, 20, 30]
should result in: [2, 1, 0]
what is an optimized algorithm to achieve this?

You can achieve this with any sorting algorithm, if you convert each element to a tuple of (value, position) and sort this.
That is, [10, 20, 30] would become [(10, 0), (20, 1), (30, 2)]. You'd then sort this array using a comparator that looks at the first element of the tuples, giving you [(30, 2), (20, 1), (10, 0)]. From this, you can simply grab the second element of each tuple to get what you want, [2, 1, 0]. (Under the assumption you want reverse sorting.)

Won't be different from any other sorting algorithm, just modify it so that it builds or takes in an array of indices and then manipulates both the data and array of indices instead of just the data.

You could create a array of pointers to the original array of integers, perform a merge sort or what ever sorting algorithm you find most suiting (uses the value at the pointer) then just run down the list calculating the indicies based on each pointers relative address to the beginning of the allocated block containing the original array of integers.

Related

Sorting a list that have n fixed segments already sorted in ascending order

The question goes as follows:
Lists which consist of a small fixed number, n, of segments connected
end-to-end, each segment is already in ascending order.
I thought about using mergesort with base case being if equal to n then go back and merge them since we already know that they are sorted, but if I have 3 segments it won't work since I'm dividing by two and you can't divide 3 segments equally into two parts.
The other approach which is similar to merge sort. so I use n stacks for each segment, which we can identify if L[i] > L[i+1] since segments are in ascending order. But I need n comparisons to figure out which element comes first, and I don't know an efficient way of comparing n elements dynamically without using another data structure to compare the elements at the top of the stack.
Also, you are supposed to use the problem feature, segments already ordered, to get better results than conventional algorithms. i.e. complexity less than O(nlogn).
A pseudocode would be nice if you have an idea.
Edit:
An example would be [(14,20,22),(7,8,9),(1,2,3)] here we have 3 segments of 3 elements, even though the segments are sorted, the whole list isn't.
p.s. () is there to point out the segments only
I think maybe you've misunderstood mergesort. While usually you would split in half and sort each half before merging, it's really the merging part which makes the algorithm. You just need to merge on runs instead.
With your example of [(14,20,22),(7,8,9),(1,2,3)]
After first merge you have [(7, 8, 9, 14, 20, 22),(1, 2, 3)]
After second merge you have [(1, 2, 3, 7, 8, 9, 14, 20, 22)]
l = [14, 20, 22, 7, 8, 9, 1, 2, 3]
rl = [] # run list
sl = [l[0]] # temporary sublist
#split list into list of sorted sublists
for item in l[1:]:
if item > sl[-1]:
sl.append(item)
else:
rl.append(sl)
sl = [item]
rl.append(sl)
print(rl)
#function for merging two sorted lists
def merge(l1, l2):
l = [] #list we add into
while True:
if not l1:
# first list is empty, add second list onto new list
return l + l2
if not l2:
# second list is empty, add first list onto new list
return l + l1
if l1[0] < l2[0]:
# rather than deleting, you could increment an index
# which is likely to be faster, or reverse the list
# and pop off the end, or use a data structure which
# allows you to pop off the front
l.append(l1[0])
del l1[0]
else:
l.append(l2[0])
del l2[0]
# keep mergins sublists until only one remains
while len(rl) > 1:
rl.append(merge(rl.pop(), rl.pop()))
print(rl)
It's worth noting that unless this is simply an excercise, you are probably better off using whatever inbuilt sorting function your language of choice uses.

Get kth group of unsorted result list with arbitrary number of results per group

Okay so I have a huge array of unsorted elements of an unknown data type (all elements are of the same type, obviously, I just can't make assumptions as they could be numbers, strings, or any type of object that overloads the < and > operators. The only assumption I can make about those objects is that no two of them are the same, and comparing them (A < B) should give me which one should show up first if it was sorted. The "smallest" should be first.
I receive this unsorted array (type std::vector, but honestly it's more of an algorithm question so no language in particular is expected), a number of objects per "group" (groupSize), and the group number that the sender wants (groupNumber).
I'm supposed to return an array containing groupSize elements, or less if the group requested is the last one. (Examples: 17 results with groupSize of 5 would only return two of them if you ask for the fourth group. Also, the fourth group is group number 3 because it's a zero-indexed array)
Example:
Received Array: {1, 5, 8, 2, 19, -1, 6, 6.5, -14, 20}
Received pageSize: 3
Received pageNumber: 2
If the array was sorted, it would be: {-14, -1, 1, 2, 5, 6, 6.5, 8, 19, 20}
If it was split in groups of size 3: {{-14, -1, 1}, {2, 5, 6}, {6.5, 8, 19}, {20}}
I have to return the third group (pageNumber 2 in a 0-indexed array): {6.5, 8, 19}
The biggest problem is the fact that it needs to be lightning fast. I can't sort the array because it has to be faster than O(n log n).
I've tried several methods, but can never get under O(n log n).
I'm aware that I should be looking for a solution that doesn't fill up all the other groups, and skips a pretty big part of the steps shown in the example above, to create only the requested group before returning it, but I can't figure out a way to do that.
You can find the value of the smallest element s in the group in linear time using the standard C++ std::nth_element function (because you know it's index in the sorted array). You can find the largest element S in the group in the same way. After that, you need a linear pass to find all elements x such that s <= x <= S and return them. The total time complexity is O(n).
Note: this answer is not C++ specific. You just need an implementation of the k-th order statistics in linear time.

Sorting unspecified amount of numbers in Lua

I need to sort and unspecifed amount of numbers in Lua. For example if I have theese numbers 15,21,31,50,32,11,11. I need to lua to sort them so the first one is the biggest like this: 50,32,31,21,15,11,11.
What is the easiest way to do this? Remember it got to work with an unspecified amont of numbers. Thanks!
table.sort sorts a table in place. By default, it uses < to compare elements. To sort them with the bigger element before smaller element:
local t = {15, 21, 31, 50, 32, 11, 11}
table.sort(t, function(a, b) return a > b end)
The number of elements doesn't matter, as a table can hold as many elements as possible.

Searching through arrays in Ruby using a range

i've seen a lot of other questions touch on the subject but nothing as on topic as to provide an answer for my particular problem. Is there a way to search an array and return values within a given range...
for clarity I have one array = [0,5,12]
I would like to compare array to another array (array2) using a range of numbers.
Using array[0] as a starting point how would I return all values from array2 +/- 4 of array[0].
In this particular case the returned numbers from array2 will be within the range of -4 and 4.
Thanks for the help ninjas.
Build a Range that is your target ±4 and then use Enumerable#select (remember that Array includes Enumerable) and Range#include?.
For example, let us look for 11±4 in an array that contains the integers between 1 and 100 (inclusive):
a = (1..100).to_a
r = 11-4 .. 11+4
a.select { |i| r.include?(i) }
# [7, 8, 9, 10, 11, 12, 13, 14, 15]
If you don't care about preserving order in your output and you don't have any duplicates in your array you could do it this way:
a & (c-w .. c+w).to_a
Where c is the center of your interval and w is the interval's width. Using Array#& treats the arrays as sets so it will remove duplicates and is not guaranteed to preserver order.

How is counting sort a stable sort?

Suppose my input is (a,b and c to distinguish between equal keys)
1 6a 8 3 6b 0 6c 4
My counting sort will save as (discarding the a,b and c info!!)
0(1) 1(1) 3(1) 4(1) 6(3) 8(1)
which will give me the result
0 1 3 4 6 6 6 8
So, how is this stable sort?
I am not sure how it is "maintaining the relative order of records with equal keys."
Please explain.
To understand why counting sort is stable, you need to understand that counting sort can not only be used for sorting a list of integers, it can also be used for sorting a list of elements whose key is an integer, and these elements will be sorted by their keys while having additional information associated with each of them.
A counting sort example that sorts elements with additional information will help you to understand this. For instance, we want to sort three stocks by their prices:
[(GOOG 3), (CSCO 1), (MSFT 1)]
Here stock prices are integer keys, and stock names are their associated information.
Expected output for the sorting should be:
[(CSCO 1), (MSFT 1), (GOOG 3)]
(containing both stock price and its name,
and the CSCO stock should appear before MSFT so that it is a stable sort)
A counts array will be calculated for sorting this (let's say stock prices can only be 0 to 3):
counts array: [0, 2, 0, 1] (price "1" appear twice, and price "3" appear once)
If you are just sorting an integer array, you can go through the counts array and output "1" twice and "3" once and it is done, and the entire counts array will become an all-zero array after this.
But here we want to have stock names in sorting output as well. How can we obtain this additional information (it seems the counts array already discards this piece of information)? Well, the associated information is stored in the original unsorted array. In the unsorted array [(GOOG 3), (CSCO 1), (MSFT 1)], we have both the stock name and its price available. If we get to know which position (GOOG 3) should be in the final sorted array, we can copy this element to the sorted position in the sorted array.
To obtain the final position for each element in the sorted array, unlike sorting an integer array, you don't use the counts array directly to output the sorted elements. Instead, counting sort has an additional step which calculates the cumulative sum array from the counts array:
counts array: [0, 2, 2, 3] (i from 0 to 3: counts[i] = counts[i] + counts[i - 1])
This cumulative sum array tells us each value's position in the final sorted array currently. For example, counts[1]==2 means currently item with value 1 should be placed in the 2nd slot in the sorted array. Intuitively, because counts[i] is the cumulative sum from left, it shows how many smaller items are before the ith value, which tells you where the position should be for the ith value.
If a $1 price stock appears at the first time, it should be outputted to the second position of the sorted array and if a $3 price stock appears at the first time, it should be outputted to the third position of the sorted array. If a $1 stock appears and its element gets copied to the sorted array, we will decreased its count in the counts array.
counts array: [0, 1, 2, 3]
(so that the second appearance of $1 price stock's position will be 1)
So we can iterate the unsorted array from backwards (this is important to ensure the stableness), check its position in the sorted array according to the counts array, and copied it to the sorted array.
sorted array: [null, null, null]
counts array: [0, 2, 2, 3]
iterate stocks in unsorted stocks from backwards
1. the last stock (MSFT 1)
sorted array: [null, (MSFT 1), null] (copy to the second position because counts[1] == 2)
counts array: [0, 1, 2, 3] (decrease counts[1] by 1)
2. the middle stock (CSCO 1)
sorted array: [(CSCO 1), (MSFT 1), null] (copy to the first position because counts[1] == 1 now)
counts array: [0, 0, 2, 3] (decrease counts[1] by 1)
3. the first stock (GOOG 3)
sorted array: [(CSCO 1), (MSFT 1), (GOOG 3)] (copy to the third position because counts[3] == 3)
counts array: [0, 0, 2, 2] (decrease counts[3] by 1)
As you can see, after the array gets sorted, the counts array (which is [0, 0, 2, 2]) doesn't become an all-zero array like sorting an array of integers. The counts array is not used to tell how many times an integer appears in the unsorted array, instead, it is used to tell which position the element should be in the final sorted array. And since we decrease the count every time we output an element, we are essentially making the elements with same key's next appearance final position smaller. That's why we need to iterate the unsorted array from backwards to ensure its stableness.
Conclusion:
Since each element contains not only an integer as key, but also some additional information, even if their key is the same, you could tell each element is different by using the additional information, so you will be able to tell if it is a stable sorting algorithm (yes, it is a stable sorting algorithm if implemented appropriately).
References:
Some good materials explaining counting sort and its stableness:
http://www.algorithmist.com/index.php/Counting_sort (this article explains this question pretty well)
http://courses.csail.mit.edu/6.006/fall11/rec/rec07.pdf
http://rosettacode.org/wiki/Sorting_algorithms/Counting_sort (a list of counting sort implementations in different programming languages. If you compare them with the algorithm in wikipedia's entry below about counting sort, you will find most of which doesn't implement the exact counting sort correctly but implement only the integer sorting function and they don't have the additional step to calculate the cumulative sum array. But you could check out the implementation in 'Go' programming language in this link, which does provides two different implementations, one is used for sorting integers only and the other can be used for sorting elements containing additional information)
http://en.wikipedia.org/wiki/Counting_sort
Simple, really: instead of a simple counter for each 'bucket', it's a linked list.
That is, instead of
0(1) 1(1) 3(1) 4(1) 6(3) 8(1)
You get
0(.) 1(.) 3(.) 4(.) 6(a,b,c) 8(.)
(here I use . to denote some item in the bucket).
Then just dump them back into one sorted list:
0 1 3 4 6a 6b 6c 8
That is, when you find an item with key x, knowing that it may have other information that distinguishes it from other items with the same key, you don't just increment a counter for bucket x (which would discard all those extra information).
Instead, you have a linked list (or similarly ordered data structure with constant time amortized append) for each bucket, and you append that item to the end of the list for bucket x as you scan the input left to right.
So instead of using O(k) space for k counters, you have O(k) initially empty lists whose sum of lengths will be n at the end of the "counting" portion of the algorithm. This variant of counting sort will still be O(n + k) as before.
Your solution is not a full counting sort, and discards the associated values.
Here's the full counting sort algorithm.
After you calculated the histogram:
0(1) 1(1) 3(1) 4(1) 6(3) 8(1)
you have to calculate the accumulated sums - each cell will contain how many elements are less than or equal to that value:
0(1) 1(2) 3(3) 4(4) 6(7) 8(8)
Now you start from the end of your original list and go backwards.
Last element is 4. There are 4 elements less than or equal to 4. So 4 will go on the 4th position. You decrement the counter for 4.
0(1) 1(2) 3(3) 4(3) 6(7) 8(8)
The next element is 6c. There are 7 elements less than or equal to 6. So 6c will go to the 7th position. Again, you decrement the counter for 6.
0(1) 1(2) 3(3) 4(3) 6(6) 8(8)
^ next 6 will go now to 6th position
As you can see, this algorithm is a stable sort. The order for the elements with the same key will be kept.
If your three "6" values are distinguishable, then your counting sort is wrong (it discards information about the values, which a true sort doesn't do, because a true sort only re-orders the values).
If your three "6" values are not distinguishable, then the sort is stable, because you have three indistinguishable "6"s in the input, and three in the output. It's meaningless to talk about whether they have or have not been "re-ordered": they're identical.
The concept of non-stability only applies when the values have some associated information which does not participate in the order. For instance if you were sorting pointers to those integers, then you could "tell the difference" between the three 6s by looking at their different addresses. Then it would be meaningful to ask whether any particular sort was stable. A counting sort based on the integer values then would not be sorting the pointers. A counting sort based on the pointer values would not order them by integer value, rather by address.

Resources