Trying to improve efficiency of this search in an array - algorithm

Suppose I have an input array in which all elements are distinct, e.g. [13,2,36]. I want the output array to be [1,0,2]: 13 is greater than one element (2), so "1"; 2 is greater than no element, so "0"; 36 is greater than both 13 and 2, so "2". How do I get the output array with efficiency better than O(n^2)?
Edit 1: I also want to print the output in the same order as the input. Please give C/C++ code if possible.

Seems like a dynamic programming problem.
Maybe this can help.
Here is an O(n + M) algorithm, where M is the largest possible value (1000000 here). It assumes the inputs are non-negative integers no larger than M.
1. Declare an array arr of size M + 1, say 1000001, with every entry set to 0.
2. Traverse the input and set arr[input[n]]=1, where input[n] is the element.
3. Traverse arr and add the previous entry to each entry (so that arr[i] records how many input elements are less than or equal to i), like this:
arr[i]+=arr[i-1]
Example: if input[]={12,3,36}
After step 2
arr[12]=1,arr[3]=1,arr[36]=1;
After step 3
arr[3]=1,arr[4]=arr[3]+arr[4]=1(arr[4]=0,arr[3]=1),
arr[11]=arr[10]=arr[9]=arr[8]=arr[7]=arr[6]=arr[5]=arr[4]=1
arr[12]=arr[11]+arr[12]=2(arr[11]=1,arr[12]=1)
arr[36]=arr[35]+arr[36]=3(because arr[13],arr[14],...arr[35]=2 and arr[36]=1)
4. Traverse the input array and print arr[input[i]]-1, where i is the index.
So arr[3]=1,arr[12]=2,arr[36]=3;
If you print arr[input[i]] then output will be {2,1,3} so we need to subtract 1 from each element then the output becomes {1,0,2} which is your desired output.
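// C sketch of the steps above (assumed input format: size, then the elements)
#include <stdio.h>
#define MAXV 1000001
static int arr[MAXV];                // static storage, so zero-initialized
int main(void) {
    int size, i;
    scanf("%d", &size);              // size is the size of the input array
    int input[size];
    for (i = 0; i < size; i++) {
        scanf("%d", &input[i]);      // take input
        arr[input[i]] = 1;           // mark the value input[i] as present
    }
    for (i = 1; i < MAXV; i++)
        arr[i] += arr[i - 1];        // arr[i] = number of inputs <= i
    for (i = 0; i < size; i++)
        printf("%d ", arr[input[i]] - 1);  // subtract 1 so an element does not count itself
    return 0;
}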
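The same steps can be sketched in Python. This is only an illustration under the answer's assumptions (distinct, non-negative integers); the function name ranks_by_counting is made up here, and the array is sized from the largest input value rather than a fixed 1000001.

```python
def ranks_by_counting(values):
    """For each value, count how many of the other values are smaller.

    Assumes distinct, non-negative integers, as in the question.
    """
    if not values:
        return []
    present = [0] * (max(values) + 1)
    for v in values:
        present[v] = 1                      # step 2: mark each value that occurs
    for i in range(1, len(present)):
        present[i] += present[i - 1]        # step 3: running total of the marks
    # step 4: present[v] counts the values <= v, so subtract 1 for v itself
    return [present[v] - 1 for v in values]


print(ranks_by_counting([13, 2, 36]))  # [1, 0, 2]
```

The running time depends on the largest value as well as on the number of elements, so this only pays off when the values are reasonably small.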
To understand the algorithm better, take pen and paper and do a dry run.
Hope it helps
Happy Coding!!

Clone the original array (keeping the original index of each element somewhere) and quicksort it in descending order. The value for each element is then quicksorted.length - 1 - i, where i is the element's index in the sorted array.
[13, 2, 36] - original
[36(2), 13(1), 2(0)] - sorted
[1, 0, 2] - substituted

def sort(array):
    temp = sorted(array)
    # map each value to its position in the sorted order
    indexDict = {value: i for i, value in enumerate(temp)}
    return [indexDict[value] for value in array]
I realize it's in Python, but it should still help you.

Schwartzian transform: decorate, sort, undecorate.
Create a structure holding an object as well as an index. Create a new list of these structures from your list. Sort by the objects as planned. Create a list of the indices from the sorted list.
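A minimal Python sketch of decorate, sort, undecorate applied to the original question. The name ranks_by_sorting is hypothetical, and the last loop (writing each sorted position back at the element's original index) is one way to turn the sorted index list into the output the asker wants.

```python
def ranks_by_sorting(values):
    # Decorate: pair each value with its original index.
    decorated = [(v, i) for i, v in enumerate(values)]
    # Sort by value (the question guarantees the values are distinct).
    decorated.sort()
    # Undecorate: the element at sorted position `rank` came from index `i`,
    # so store `rank` at that original index.
    ranks = [0] * len(values)
    for rank, (_, i) in enumerate(decorated):
        ranks[i] = rank
    return ranks


print(ranks_by_sorting([13, 2, 36]))  # [1, 0, 2]
```

This runs in O(n log n) and, unlike a counting approach, does not depend on how large the values are.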

Related

C++ map indices to sorted indices

A standard problem in many languages is to sort an array and sort the indices as well. So for instance, if a = {4,1,3,2} the sorted array is b = {1,2,3,4} and the original indices moved would be {1,3,2,0}. This is easy to do by sorting a vector of pairs for instance.
What I want instead is an array c so that c[i] is the new position of element a[i] in the array b. So, in my example, c = {3,0,2,1} because 4 moves to position 3, 1 moved to position 0 and so on.
One way is to look up each element a[i] in b (perhaps using binary search to reduce lookup time) and then add the corresponding index in c. Is there a more efficient way?
Can you assume that you have the array of original indices moved? It's the only array above that you didn't assign to a variable. If so, one efficient way of solving this problem is to back-calculate it from that array.
You have that as {1,3,2,0}. All you need to do is march through it and put each value's index at the position the value indicates.
So index 0 has a 1. That means at index 1 of the new array there should be a 0. Index 1 has a 3, so at index 3 of the new array put a 1. Continuing this way, you get your goal of {3,0,2,1}.
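In Python, this back-calculation is a single pass that inverts the permutation. The function and variable names below are illustrative only.

```python
def invert_permutation(order):
    # order[j] is the original index of the element now at sorted position j.
    new_position = [0] * len(order)
    for sorted_pos, original_idx in enumerate(order):
        new_position[original_idx] = sorted_pos
    return new_position


print(invert_permutation([1, 3, 2, 0]))  # [3, 0, 2, 1]
```

This takes O(n) time once the array of original indices is available, so no per-element lookup in the sorted array is needed.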

Given 2 arrays, returns elements that are not included in both arrays

I had an interview, and did one of the questions described below:
Given two arrays, please calculate the result: get the union and then remove the intersection from the union. e.g.
int a[] = {1, 3, 4, 5, 7};
int b[] = {5, 3, 8, 10}; // didn't mention if has the same value.
result = {1,4,7,8,10}
This is my idea:
Sort a, b.
Check each item of b using binary search ('dichotomy search') in a. If not found, pass. Otherwise, remove this item from both a and b.
result = elements left in a + elements left in b
I know it is a lousy algorithm, but nonetheless it's better than nothing. Is there a better approach than this one?
There are many approaches to this problem. One approach is:
1. Construct a hash map from the distinct elements of array a, with the elements as keys and 1 as the value.
2. For every element e in array b:
   if e is in the hash map
      set the value of that key to 0
   else
      add e to the result array.
3. Add all keys from the hash map whose value is 1 to the result array.
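A rough Python sketch of these three steps, using a dict as the hash map. The name symmetric_difference_hash is made up for illustration, and the sketch assumes b has no repeated values (a value repeated in b but absent from a would be added more than once).

```python
def symmetric_difference_hash(a, b):
    seen = {x: 1 for x in a}        # step 1: keys are the distinct elements of a
    result = []
    for e in b:                     # step 2
        if e in seen:
            seen[e] = 0             # e is in both arrays, so exclude it
        else:
            result.append(e)
    # step 3: keep the elements of a that were never matched by b
    result.extend(k for k, flag in seen.items() if flag == 1)
    return result


print(sorted(symmetric_difference_hash([1, 3, 4, 5, 7], [5, 3, 8, 10])))
# [1, 4, 7, 8, 10]
```

The result is sorted only for display; the function itself needs no sorting and runs in expected O(n + m) time.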
Another approach may be:
join both lists
sort the joined list
walk through the joined list and completely remove any element that occurs multiple times
This has one drawback: it does not work if the input lists already contain duplicates. But since we are talking about sets and set theory, I would also expect the inputs to be sets in the mathematical sense.
Another (in my opinion the best) approach:
You do not need to search through both lists. You can just iterate through them sequentially:
sort a and b
declare an empty result set
take iterators to both lists and repeat the following steps:
if the iterators' values are unequal: add the smaller number to the result set and increment the corresponding iterator
if the iterators' values are equal: increment both iterators without adding anything to the result set
if one iterator reaches the end: add all remaining elements of the other list to the result
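The sequential walk could look like this in Python, with two indices standing in for the iterators. The function name is hypothetical, and like the approach itself it assumes neither input contains duplicates.

```python
def symmetric_difference_merge(a, b):
    a, b = sorted(a), sorted(b)
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            result.append(a[i])     # a[i] cannot occur in b any more
            i += 1
        elif a[i] > b[j]:
            result.append(b[j])     # b[j] cannot occur in a any more
            j += 1
        else:
            i += 1                  # equal values: skip both
            j += 1
    result.extend(a[i:])            # one list is exhausted; keep the rest
    result.extend(b[j:])
    return result


print(symmetric_difference_merge([1, 3, 4, 5, 7], [5, 3, 8, 10]))
# [1, 4, 7, 8, 10]
```

After the two sorts, the walk itself is linear in the combined length of the lists.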

Boolean values and sorting list logic. Intro to programming

My teacher recently wrote a program that sorts a list of numbers. I don't understand the boolean value statements, or the logic in the loop that sorts the numbers, so I would appreciate help explaining what he did. I have homework to do and sorting is part of it, so I'm just trying to understand this example. Thanks
d = [8, 14, 3, 5, 2, 23] # lists
size = len( d ) # size = number of elements in list
unsorted = True # what does this really mean and do?
while unsorted : # bassicly while true, but while what is true? what would make it false?
    unsorted = False #? did he just change the variable to false? if so, why, and
    # how is the "while unsorted" before statement still being met
    i = 0 # this bassiclly begins the indency number in the list below
    while i < size-1 : # so while the indency number is less than the list
        # element size it will loop through the rest
        if d[i] > d[i+1] : # if the number in d[i]is greater than the number after
            temp = d[i] # then variable temp gets assigned that number in d[i]
            d[i] = d[i+1] # this confuses me. whats the purpose of setting d[i] to d[i+1]?
            d[i+1] = temp # i think this has to do with the statement above, what does it
            unsorted = True # why is this suddenly turned back to true?
        i += 1 # adds 1 to to indency or i until it reaches the list size to stop loop
print(d)
Output ends up being a sorted list below
[2, 3, 5, 8, 14, 23]
Thanks
This is the Bubble sort sorting algorithm. To sort all elements of an array in ascending order this algorithm compares two neighbouring elements, and swaps their location if the successor i+1 of an element i has the smaller value.
Now let's comment on some of your comments ;-)
unsorted = True # what does this really mean and do?
This declares and initializes your boolean variable. If it were False, you would never enter the following while loop.
while unsorted : # bassicly while true, but while what is true? what would make it false?
unsorted = False #? did he just change the variable to false? if so, why, and
# how is the "while unsorted" before statement still being met
The condition for execution of a while loop only gets checked before entering a new "round". Please check how a while loop works, this is a fundamental construct! The variable unsorted is set to False so the program is able to leave the loop, when the array has been sorted entirely.
i = 0 # this bassiclly begins the indency number in the list below
Yes, indeed Python uses zero based indexing (another term you should look up). This means that the first element in an array comes with the index zero
while i < size-1 : # so while the indency number is less than the list
# element size it will loop through the rest
This lets you loop over all adjacent pairs in the array. The bound is correct as written:
while i < size-1
size-1 is the index of the last element in an array of length size. Since you always compare an element with its successor d[i+1], i must stop at size-2, which is exactly what i < size-1 ensures. The last element has no successor, so i < size would raise an IndexError, while i < size-2 would never compare the last pair.
temp = d[i] # then variable temp gets assigned that number in d[i]
d[i] = d[i+1] # this confuses me. whats the purpose of setting d[i] to d[i+1]?
d[i+1] = temp # i think this has to do with the statement above, what does it
This is the swapping I told you about. Elements d[i] and d[i+1] switch places. To do so you need temporary storage for one of the values.
unsorted = True # why is this suddenly turned back to true?
Because he had to change the order of the elements in the array. The program should only be allowed to leave the outer while loop when no more swapping is necessary and the array elements have been sorted.
This is a bubble sort.
The concept of a bubble sort is basically swapping larger numbers towards the end of the list. The Boolean variable is used for keeping track of whether or not the list is sorted.
We know that the list is sorted if we checked every number and we don't have to swap any of them. (That's basically what the code does and the reason we need the Boolean variable)
unsorted = True # what does this really mean and do?
This keeps track of whether or not the list is sorted. When this is False, we are done sorting and we can print the list. However, if it is True, we have to check the list and swap the numbers into the correct spots.
while unsorted : # bassicly while true, but while what is true? what would make it false?
As I mentioned, while unsorted: means that the list was not sorted the last time we checked, so we have to check the list again (i.e. run the code in the while loop).
unsorted = False
This might be the tricky part. We just assume that the list is sorted, unless we have to swap numbers. (The code below is the piece of code that does the swapping.)
if d[i] > d[i+1] :
    temp = d[i] # store the larger number in a temporary variable
    d[i] = d[i+1] # put the smaller number in the spot of the larger number
    d[i+1] = temp # put the larger number after the smaller number
    unsorted = True # we swapped a number, so this list might not be completely sorted
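For comparison, here is a compact Python 3 sketch of the same bubble sort. It is not the teacher's code: it wraps the logic in a hypothetical function bubble_sort, uses a for loop in place of the inner while, and swaps with tuple assignment so no temp variable is needed.

```python
def bubble_sort(items):
    d = list(items)                 # work on a copy of the input
    unsorted = True
    while unsorted:
        unsorted = False            # assume sorted until a swap proves otherwise
        for i in range(len(d) - 1): # i stops at the second-to-last index
            if d[i] > d[i + 1]:
                d[i], d[i + 1] = d[i + 1], d[i]   # swap the neighbours
                unsorted = True     # a swap happened, so check again
    return d


print(bubble_sort([8, 14, 3, 5, 2, 23]))  # [2, 3, 5, 8, 14, 23]
```

The outer loop ends after the first full pass in which no swap was needed.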

Calculate Median in An Array - Can someone tell me what is going on in this line of code?

This is a solution for calculating the median value in an array. I get the first three lines, duh ;), but the fourth line is where the magic is happening. Can someone explain how the 'sorted' variable is being used and why it's next to brackets, and why the other variable 'len' is enclosed in those parentheses and then brackets? It's almost like sorted is all of a sudden being used as an array? Thanks!
def median(array)
  sorted = array.sort
  len = sorted.length
  return ((sorted[(len - 1) / 2] + sorted[len / 2]) / 2.0).to_f
end
puts median([3,2,3,8,91])
puts median([2,8,3,11,-5])
puts median([4,3,8,11])
Consider this:
[1,2,2,3,4] and [1,2,3,4]. Both arrays are sorted, but they have an odd and an even number of elements respectively. That piece of code takes both cases into account.
sorted is indeed an array. You sort [2,3,1,4] and you get back [1,2,3,4]. Then you compute the two middle indices, (len - 1) / 2 and len / 2 (they coincide when the number of elements is odd), and average the elements at those indices.
Yes, array.sort is returning an array and it is assigned to sorted. You can then access it via array indices.
If you have an odd number of elements, say 5 elements as in the example, the indices come out to be:
(len-1)/2=(5-1)/2=2
len/2=5/2=2 --- (remember this is integer division, so the decimal gets truncated)
So you take the value at index 2, add it to itself, and then divide by 2, which gives back the value at index 2.
If you have an even number of elements, say 4,
(len-1)/2=(4-1)/2=1 --- (remember this is integer division, so the decimal gets truncated)
len/2=4/2=2
So in this case, you are effectively averaging the two middle elements, at indices 1 and 2, which is the definition of the median when you have an even number of elements.
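The index arithmetic can be checked with a small Python translation of the Ruby method. This is only a sketch; Python's // operator plays the role of Ruby's integer division, and the names ordered, lower, and upper are chosen here for readability.

```python
def median(array):
    ordered = sorted(array)
    n = len(ordered)
    lower = ordered[(n - 1) // 2]   # // truncates, like Ruby's / on integers
    upper = ordered[n // 2]         # same index as `lower` when n is odd
    return (lower + upper) / 2.0


print(median([3, 2, 3, 8, 91]))    # 3.0
print(median([2, 8, 3, 11, -5]))   # 3.0
print(median([4, 3, 8, 11]))       # 6.0
```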
It's almost like sorted is all of a sudden being used as an array?
Yes, it is. On line 2 it's being initialized as being an array with the same elements as the input, but in ascending order (default sort is ascending). On line 3 you have len which is initialized with the length of the sorted array, so yeah, sorted is being used as an array since then, because that's what it is.

How is counting sort a stable sort?

Suppose my input is (a,b and c to distinguish between equal keys)
1 6a 8 3 6b 0 6c 4
My counting sort will save as (discarding the a,b and c info!!)
0(1) 1(1) 3(1) 4(1) 6(3) 8(1)
which will give me the result
0 1 3 4 6 6 6 8
So, how is this stable sort?
I am not sure how it is "maintaining the relative order of records with equal keys."
Please explain.
To understand why counting sort is stable, you need to understand that counting sort can not only be used for sorting a list of integers, it can also be used for sorting a list of elements whose key is an integer, and these elements will be sorted by their keys while having additional information associated with each of them.
A counting sort example that sorts elements with additional information will help you to understand this. For instance, we want to sort three stocks by their prices:
[(GOOG 3), (CSCO 1), (MSFT 1)]
Here stock prices are integer keys, and stock names are their associated information.
Expected output for the sorting should be:
[(CSCO 1), (MSFT 1), (GOOG 3)]
(containing both stock price and its name,
and the CSCO stock should appear before MSFT so that it is a stable sort)
A counts array will be calculated for sorting this (let's say stock prices can only be 0 to 3):
counts array: [0, 2, 0, 1] (price "1" appears twice, and price "3" appears once)
If you are just sorting an integer array, you can go through the counts array and output "1" twice and "3" once and it is done, and the entire counts array will become an all-zero array after this.
But here we want to have stock names in sorting output as well. How can we obtain this additional information (it seems the counts array already discards this piece of information)? Well, the associated information is stored in the original unsorted array. In the unsorted array [(GOOG 3), (CSCO 1), (MSFT 1)], we have both the stock name and its price available. If we get to know which position (GOOG 3) should be in the final sorted array, we can copy this element to the sorted position in the sorted array.
To obtain the final position for each element in the sorted array, unlike sorting an integer array, you don't use the counts array directly to output the sorted elements. Instead, counting sort has an additional step which calculates the cumulative sum array from the counts array:
counts array: [0, 2, 2, 3] (i from 1 to 3: counts[i] = counts[i] + counts[i - 1])
This cumulative sum array tells us each value's current position in the final sorted array. For example, counts[1]==2 means that the next item with value 1 should currently be placed in the 2nd slot of the sorted array. Intuitively, because counts[i] is the cumulative sum from the left, it gives the number of items with keys less than or equal to i, which is the (1-based) position of the last item with key i.
If a $1 stock is encountered for the first time, it should be output to the second position of the sorted array, and if a $3 stock is encountered for the first time, it should be output to the third position. Once a $1 stock has been copied to the sorted array, we decrease its count in the counts array.
counts array: [0, 1, 2, 3]
(so that the second appearance of $1 price stock's position will be 1)
So we can iterate over the unsorted array backwards (this is important to ensure stability), look up each element's position in the sorted array using the counts array, and copy it to the sorted array.
sorted array: [null, null, null]
counts array: [0, 2, 2, 3]
iterate stocks in unsorted stocks from backwards
1. the last stock (MSFT 1)
sorted array: [null, (MSFT 1), null] (copy to the second position because counts[1] == 2)
counts array: [0, 1, 2, 3] (decrease counts[1] by 1)
2. the middle stock (CSCO 1)
sorted array: [(CSCO 1), (MSFT 1), null] (copy to the first position because counts[1] == 1 now)
counts array: [0, 0, 2, 3] (decrease counts[1] by 1)
3. the first stock (GOOG 3)
sorted array: [(CSCO 1), (MSFT 1), (GOOG 3)] (copy to the third position because counts[3] == 3)
counts array: [0, 0, 2, 2] (decrease counts[3] by 1)
As you can see, after the array gets sorted, the counts array (which is [0, 0, 2, 2]) doesn't become an all-zero array as it does when sorting plain integers. The counts array is not used to tell how many times an integer appears in the unsorted array; instead, it tells which position each element should take in the final sorted array. And since we decrease the count every time we output an element, the next element with the same key gets a smaller final position. That's why we need to iterate over the unsorted array backwards to ensure stability.
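The walkthrough above can be sketched in Python as follows. The function name counting_sort and the (name, key) tuple layout are assumptions made for this illustration; the key range 0..max_key must be known in advance, as in the stock example.

```python
def counting_sort(items, max_key):
    """Stable counting sort of (name, key) pairs with integer keys in 0..max_key."""
    counts = [0] * (max_key + 1)
    for _, key in items:
        counts[key] += 1                    # histogram of the keys
    for k in range(1, max_key + 1):
        counts[k] += counts[k - 1]          # cumulative sums
    result = [None] * len(items)
    for item in reversed(items):            # backwards pass keeps equal keys in order
        key = item[1]
        counts[key] -= 1                    # turn the 1-based position into a 0-based index
        result[counts[key]] = item
    return result


stocks = [("GOOG", 3), ("CSCO", 1), ("MSFT", 1)]
print(counting_sort(stocks, 3))
# [('CSCO', 1), ('MSFT', 1), ('GOOG', 3)]
```

CSCO stays ahead of MSFT in the output, matching their order in the input, which is what stability means here.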
Conclusion:
Since each element contains not only an integer as key, but also some additional information, even if their key is the same, you could tell each element is different by using the additional information, so you will be able to tell if it is a stable sorting algorithm (yes, it is a stable sorting algorithm if implemented appropriately).
References:
Some good materials explaining counting sort and its stableness:
http://www.algorithmist.com/index.php/Counting_sort (this article explains this question pretty well)
http://courses.csail.mit.edu/6.006/fall11/rec/rec07.pdf
http://rosettacode.org/wiki/Sorting_algorithms/Counting_sort (a list of counting sort implementations in different programming languages. If you compare them with the algorithm in Wikipedia's entry on counting sort below, you will find that most of them don't implement the exact counting sort: they implement only the integer sorting function and lack the additional step that calculates the cumulative sum array. But you could check out the implementation in the 'Go' programming language at this link, which provides two different implementations: one is used for sorting integers only and the other can be used for sorting elements containing additional information)
http://en.wikipedia.org/wiki/Counting_sort
Simple, really: instead of a simple counter for each 'bucket', it's a linked list.
That is, instead of
0(1) 1(1) 3(1) 4(1) 6(3) 8(1)
You get
0(.) 1(.) 3(.) 4(.) 6(a,b,c) 8(.)
(here I use . to denote some item in the bucket).
Then just dump them back into one sorted list:
0 1 3 4 6a 6b 6c 8
That is, when you find an item with key x, knowing that it may have other information that distinguishes it from other items with the same key, you don't just increment a counter for bucket x (which would discard all that extra information).
Instead, you have a linked list (or similarly ordered data structure with constant time amortized append) for each bucket, and you append that item to the end of the list for bucket x as you scan the input left to right.
So instead of using O(k) space for k counters, you have O(k) initially empty lists whose sum of lengths will be n at the end of the "counting" portion of the algorithm. This variant of counting sort will still be O(n + k) as before.
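A small Python sketch of this bucket variant, using ordinary Python lists in place of linked lists (appending to a list is amortized constant time). The function name and the key parameter are hypothetical; the key function here simply reads the leading digit of labels such as "6a".

```python
def bucket_counting_sort(items, max_key, key):
    # One initially empty bucket per possible key value 0..max_key.
    buckets = [[] for _ in range(max_key + 1)]
    for item in items:                      # scan the input left to right
        buckets[key(item)].append(item)     # append keeps arrival order within a bucket
    # Dump the buckets back into one list, in key order.
    return [item for bucket in buckets for item in bucket]


data = ["1", "6a", "8", "3", "6b", "0", "6c", "4"]
print(bucket_counting_sort(data, 8, key=lambda s: int(s[0])))
# ['0', '1', '3', '4', '6a', '6b', '6c', '8']
```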
Your solution is not a full counting sort, and discards the associated values.
Here's the full counting sort algorithm.
After you calculated the histogram:
0(1) 1(1) 3(1) 4(1) 6(3) 8(1)
you have to calculate the accumulated sums - each cell will contain how many elements are less than or equal to that value:
0(1) 1(2) 3(3) 4(4) 6(7) 8(8)
Now you start from the end of your original list and go backwards.
Last element is 4. There are 4 elements less than or equal to 4. So 4 will go on the 4th position. You decrement the counter for 4.
0(1) 1(2) 3(3) 4(3) 6(7) 8(8)
The next element is 6c. There are 7 elements less than or equal to 6. So 6c will go to the 7th position. Again, you decrement the counter for 6.
0(1) 1(2) 3(3) 4(3) 6(6) 8(8)
^ next 6 will go now to 6th position
As you can see, this algorithm is a stable sort. The order for the elements with the same key will be kept.
If your three "6" values are distinguishable, then your counting sort is wrong (it discards information about the values, which a true sort doesn't do, because a true sort only re-orders the values).
If your three "6" values are not distinguishable, then the sort is stable, because you have three indistinguishable "6"s in the input, and three in the output. It's meaningless to talk about whether they have or have not been "re-ordered": they're identical.
The concept of non-stability only applies when the values have some associated information which does not participate in the order. For instance if you were sorting pointers to those integers, then you could "tell the difference" between the three 6s by looking at their different addresses. Then it would be meaningful to ask whether any particular sort was stable. A counting sort based on the integer values then would not be sorting the pointers. A counting sort based on the pointer values would not order them by integer value, rather by address.
