Calculate Median in An Array - Can someone tell me what is going on in this line of code? - ruby

This is a solution for calculating the median value in an array. I get the first three lines, duh ;), but the fourth line is where the magic is happening. Can someone explain how the 'sorted' variable is being used and why it's next to brackets, and why the other variable 'len' is enclosed in those parentheses and then brackets? It's almost like sorted is all of a sudden being used as an array? Thanks!
def median(array)
  sorted = array.sort
  len = sorted.length
  return ((sorted[(len - 1) / 2] + sorted[len / 2]) / 2.0).to_f
end
puts median([3,2,3,8,91])
puts median([2,8,3,11,-5])
puts median([4,3,8,11])

Consider this:
[1,2,2,3,4] and [1,2,3,4]. Both arrays are sorted, but have odd and even numbers of elements respectively. So, that piece of code is taking into account these 2 cases.
sorted is indeed an array. You sort [2,3,1,4] and you get back [1,2,3,4]. Then you calculate the two middle indices, (len - 1) / 2 and len / 2 (the same index when the number of elements is odd, adjacent indices when it is even), and average the elements at those indices.

Yes, array.sort is returning an array and it is assigned to sorted. You can then access it via array indices.
If you have an odd number of elements, say 5 elements as in the example, the indices come out to be:
(len-1)/2=(5-1)/2=2
len/2=5/2=2 --- (remember this is integer division, so the decimal gets truncated)
So you take the value at index 2 twice, add the two copies together, and then divide by 2, which gives back the value at index 2.
If you have an even number of elements, say 4,
(len-1)/2=(4-1)/2=1 --- (remember this is integer division, so the decimal gets truncated)
len/2=4/2=2
So in this case, you are effectively averaging the two middle elements, at indices 1 and 2, which is the definition of the median when you have an even number of elements.
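The index arithmetic above can be checked directly in Ruby. This is only a minimal sketch: middle_indices is a hypothetical helper name introduced for illustration, not part of the original solution.

```ruby
# Hypothetical helper: the two indices that median() reads for a given length.
def middle_indices(len)
  [(len - 1) / 2, len / 2] # integer division truncates
end

def median(array)
  sorted = array.sort
  lo, hi = middle_indices(sorted.length)
  (sorted[lo] + sorted[hi]) / 2.0
end

p middle_indices(5)        # prints [2, 2]
p middle_indices(4)        # prints [1, 2]
p median([3, 2, 3, 8, 91]) # prints 3.0
p median([4, 3, 8, 11])    # prints 6.0
```

For an odd length both indices coincide, so the average is just the middle element; for an even length they are the two middle positions.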

It's almost like sorted is all of a sudden being used as an array?
Yes, it is. On line 2 it's being initialized as being an array with the same elements as the input, but in ascending order (default sort is ascending). On line 3 you have len which is initialized with the length of the sorted array, so yeah, sorted is being used as an array since then, because that's what it is.

Related

Code times out Ruby

The median of a set of numbers is defined on the sorted list of those numbers: it is the middle value when the number of elements is odd, and the average of the two middle values when the number of elements is even.
Below is code for calculating the "running" median of a list of numbers. "Running" median is a dynamic median which is re-calculated with the appearance of a new number as the list is scanned for all numbers that have appeared thus far. Input is an integer n followed by a list of n integers, and output should be the "running" median of the list as the list is scanned. For example,
3
4
1
5
should yield
4
2.5
4
because 4 is the median of [4], 2.5 ((1+4)/2.0) is the median of [4,1] and 4 again is the median of [4,1,5].
My program works correctly, but it times out on a certain test on very large inputs. I suspect that this copying step is the problem.
a=(a[0,(k=a.index(a.bsearch{|x|x>=t}))].push(t) + a[k,a.length-k])
But I am not sure, because as far as I know this copy is meant to be a shallow copy. Also, I am not doing a regular insert anywhere into the array that contains the numbers, which would involve shifting elements and thus slow down the code.
n=gets.chomp.to_i
a=[]
n.times do
  t=gets.chomp.to_i
  a==[]||(t<=a.first) ? a.unshift(t): t>=a.last ? a.push(t) : a=(a[0,(k=a.index(a.bsearch{|x|x>=t}))].push(t) + a[k,a.length-k])
  p (l=a.count)%2==0 ? ((a[l/2] + a[l/2-1])/2.0).round(1):a[(l-1)/2].round(1)
end
Can anybody point out where the problem could be? Thank you.
Here is a less obfuscated version.
n=gets.chomp.to_i
a=[]
n.times do
  t=gets.chomp.to_i
  if a==[]||(t<=a.first)
    a.unshift(t)
  else
    k=a.index(a.bsearch{|x|x>=t})
    if k.nil?
      k=a.length
    end
    a=a[0,k].push(t)+ a[k,a.length-k]
  end
  p (l=a.count)%2==0 ? ((a[l/2] + a[l/2-1])/2.0).round(1):a[(l-1)/2].round(1)
end
I think this line is the likely culprit:
a=(a[0,(k=a.index(a.bsearch{|x|x>=t}))].push(t) + a[k,a.length-k])
Because it creates a new array on every iteration, it is likely to become an expensive operation as the array gets bigger.
It might be better to mutate the original array instead:
a.insert((a.index{|x|x>t} || -1), t)
This also handles the edge cases of t being less than the first element or greater than the last, so you can remove those tests. It also works on the first pass (when a is empty).
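Note that a.index in both versions scans the array linearly. A possible refinement, sketched here under the assumption that Ruby 2.3 or later is available (Array#bsearch_index was added in that release), is to binary-search for the insertion position and mutate the array in place. The method names insert_sorted, median_of_sorted and running_medians are hypothetical, chosen only for this example.

```ruby
# Insert t into the already-sorted array a, keeping it sorted.
def insert_sorted(a, t)
  # First index whose element is greater than t, found by binary search;
  # nil means no such element exists, so t belongs at the end.
  position = a.bsearch_index { |x| x > t } || a.length
  a.insert(position, t)
end

# Median of an array that is already sorted.
def median_of_sorted(a)
  len = a.length
  (a[(len - 1) / 2] + a[len / 2]) / 2.0
end

# Running median: the median after each new number has been seen.
def running_medians(numbers)
  sorted = []
  numbers.map do |t|
    insert_sorted(sorted, t)
    median_of_sorted(sorted)
  end
end

p running_medians([4, 1, 5]) # prints [4.0, 2.5, 4.0]
```

The insert still has to shift elements, but it avoids both the linear search and the allocation of a new array on every step.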

Trying to improve efficiency of this search in an array

Suppose I have an input array where all objects are non-equivalent - e.g. [13,2,36]. I want the output array to be [1,0,2], since 13 is greater than 2 so "1", 2 is greater than no element so "0", 36 is greater than both 13 and 2 so "2". How do I get the output array with efficiency better than O(n^2)?
Edit 1: I also want to print the output in the same ordering. Please give C/C++ code if possible.
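This looks like a dynamic programming problem.
Maybe this can help.
Here is an O(n) algorithm (plus one pass over the value range, so it assumes the inputs are non-negative integers no larger than the chosen maximum):
1. Declare an array arr of some maximum size, say 1000001, with every entry set to 0.
2. Traverse all the elements and set arr[input[n]]=1, where input[n] is the element.
3. Traverse arr and add the previous entry to each entry (so that arr[i] records how many input elements are less than or equal to i), like this: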
arr[i]+=arr[i-1]
Example: if input[]={12,3,36}
After step 2
arr[12]=1,arr[3]=1,arr[36]=1;
After step 3
arr[3]=1,arr[4]=arr[3]+arr[4]=1(arr[4]=0,arr[3]=1),
arr[11]=arr[10]=arr[9]=arr[8]=arr[7]=arr[6]=arr[5]=arr[4]=1
arr[12]=arr[11]+arr[12]=2(arr[11]=1,arr[12]=1)
arr[36]=arr[35]+arr[36]=3(because arr[13],arr[14],...arr[35]=2 and arr[36]=1)
4. Traverse the input array and print arr[input[i]]-1, where i is the index.
So arr[3]=1,arr[12]=2,arr[36]=3;
If you print arr[input[i]] then output will be {2,1,3} so we need to subtract 1 from each element then the output becomes {1,0,2} which is your desired output.
//pseudo code
int arr[1000001];//all entries start at 0
int input[size];//size is the size of the input array
for(i=0;i<size;i++){
    input[i]=take input;//take input
    arr[input[i]]=1;//mark the value input[i] as present
}
for(i=1;i<1000001;i++)
    arr[i]+=arr[i-1];
for(i=0;i<size;i++)
    print arr[input[i]]-1;//arr[input[i]] counts input[i] itself, but you want the output to be 0 for the smallest element, so subtract 1
To understand the algorithm better, take paper and pen and do a dry run.
Hope it helps
Happy Coding!!
Clone the original array (keeping the original indexes of the elements somewhere) and quicksort it in descending order. The output value for each element is then quicksorted.length - 1 - i, where i is the index of that element in the quicksorted array.
[13, 2, 36] - original
[36(2), 13(1), 2(0)] - sorted
[1, 0, 2] - substituted
def sort(array):
    temp = sorted(array)
    indexDict = {temp[i]: i for i in range(len(temp))}
    return [indexDict[i] for i in array]
I realize it's in Python, but it should still help you.
Schwartzian transform: decorate, sort, undecorate.
Create a structure holding an object as well as an index. Create a new list of these structures from your list. Sort by the objects as planned. Create a list of the indices from the sorted list.
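The decorate, sort, undecorate idea could look roughly like this in Ruby. It is a sketch rather than a definitive implementation: the method name ranks is made up for the example, and it adds one step the description leaves implicit, namely writing each element's sorted position back at its original index so the output keeps the input ordering.

```ruby
# For each element, count how many other elements are smaller than it.
# Assumes all elements are distinct, as stated in the question.
def ranks(values)
  # Decorate: pair each value with its original index.
  decorated = values.each_with_index.to_a
  # Sort the pairs by value: O(n log n).
  sorted = decorated.sort_by { |value, _index| value }
  # Undecorate: the position in sorted order is the rank, stored at the original index.
  result = Array.new(values.length)
  sorted.each_with_index do |(_value, original_index), rank|
    result[original_index] = rank
  end
  result
end

p ranks([13, 2, 36]) # prints [1, 0, 2]
```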

What is the most efficient way to sort a number list into alternating low-high-low sequences?

Suppose you are given an unsorted list of positive integers, and you wish to order them in a manner such that the elements alternate as: (less than preceding element), (greater than preceding element), (less than preceding element), etc... The very first element in the output list may ignore the rule. So for example, suppose your list was: 1,4,9,2,7,5,3,8,6.
One correct output would be...
1,9,2,8,3,7,4,6,5
Another would be...
3,4,2,7,5,6,1,9,8
Assume that the list contains no duplicates, is arbitrarily large, and is not already sorted.
What is the most processing efficient algorithm to achieve this?
Now, the standard approach would be to simply sort the list in ascending order first, and then peel elements from the ends of the list in alternation. However, I'd like to know: Is there a more time-efficient way to do this without first sorting the list?
My reason for asking: (read this only if you care)
Apparently this is a question my sister's boyfriend poses to people at job interviews out in San Francisco. My sister asked me the question, and I immediately came up with the standard response. That's what everyone answers. However, apparently one girl came up with a completely different solution that does not require sorting the list, and it appears to work. My sister couldn't explain to me this solution, but the idea has been confounding me since last night. I'd appreciate any help! Thanks!
You can do this in O(n) by placing each element in turn at the end, or at the penultimate position based on a comparison with the current last element.
For example,
1,4,9,2,7,5,3,8,6
Place 1 at end, current list [1]
4>1 true so place 4 at end, current list [1,4]
9<4 false so place 9 at penultimate position [1,9,4]
2>4 false so place 2 at penultimate [1,9,2,4]
7<4 false so place 7 at penultimate [1,9,2,7,4]
5>4 true so place 5 at end [1,9,2,7,4,5]
3<5 true so place 3 at end [1,9,2,7,4,5,3]
8>3 true so place 8 at end [1,9,2,7,4,5,3,8]
6<8 true so place 6 at end [1,9,2,7,4,5,3,8,6]
Note that the comparisons alternate between > and <, and that we place the element at the end if the comparison is true, or at the penultimate position if it is not.
Example Python Code
A=[1,4,9,2,7,5,3,8,6]
B=[]
for i,a in enumerate(A):
    if i==0 or (i&1 and a>B[-1]) or (i&1==0 and a<B[-1]):
        B.insert(i,a)
    else:
        B.insert(i-1,a)
print(B)
One solution is this, given in pseudocode.
Assume nums has at least two elements and all elements in nums are distinct.
nums = [list of numbers]
if nums[0] < nums[1]: last_state = INCREASING else: last_state = DECREASING
for i = 2 to len(nums) - 1:
    if last_state == INCREASING:
        if nums[i] > nums[i-1]:
            swap (nums[i], nums[i-1])
        last_state = DECREASING
    else:
        if nums[i] < nums[i-1]:
            swap (nums[i], nums[i-1])
        last_state = INCREASING
Proof of correctness:
After each loop iteration, the elements up to index i in nums remain alternating, and last_state represents the order of the (i-1)th and ith elements.
Note that a swap happens only if the last 3 items considered are in order (increasing or decreasing). Therefore, if we swap the ith element with the (i-1)th element, the order of the (i-2)th and (i-1)th elements will not change.
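A Ruby rendering of the swap-based idea might look like the sketch below. The method names zigzag and alternating? are hypothetical, the input is assumed to contain distinct numbers, and the code works on a copy so the caller's array is left untouched (a choice made for this example, not something the answer specifies).

```ruby
# Rearrange nums so that consecutive comparisons alternate, in one O(n) pass.
def zigzag(nums)
  result = nums.dup
  return result if result.length < 3
  increasing = result[0] < result[1] # direction of the last adjacent pair
  (2...result.length).each do |i|
    # If the new element continues the current direction, swap it backwards
    # so that the direction flips instead.
    continues = increasing ? result[i] > result[i - 1] : result[i] < result[i - 1]
    result[i], result[i - 1] = result[i - 1], result[i] if continues
    increasing = !increasing
  end
  result
end

# True when every adjacent comparison is the opposite of the previous one.
def alternating?(list)
  directions = list.each_cons(2).map { |a, b| a < b }
  directions.each_cons(2).all? { |x, y| x != y }
end

out = zigzag([1, 4, 9, 2, 7, 5, 3, 8, 6])
p out               # prints [1, 9, 2, 7, 4, 5, 3, 8, 6]
p alternating?(out) # prints true
```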

DBC Exercise 9 - Calculate the Missing Number

Exercise 9 - 45 minutes
You have been given a list of sequential numbers from 1 to 10,000, but they are all out of order; furthermore, 1 number is missing from the list. The goal is to find which number is missing. Write out in plain English your strategy for solving this problem. Be as concise as possible.
Write Ruby code that takes this list of numbers as an argument, and returns the missing number.
My initial impression is that some sort of sort function will help me put the array into order, but then I reread the problem and it's not asking for a sorted sequence, it's asking for a missing number. The next step to consider is how to determine the number that comes next in a sequence. I think of the 99 bottles challenge in Chris Pine's book and realize that "n + 1" or "n - 1" will be a part of the solution, as will a 'range statement' that begins with 1 and ends with 10,000 (1..10000).
I next think about indexing and that I'll need to loop through the range using #upto or #each to determine the missing number as well as some sort of conditional statement that allows me to return the missing value. I'll be defining a method "missing_number" but what is the input?
Is it an array? Or is it a range? I am going to go with array since most of the time arrays are unsorted and when I test it I'll define the input as a range.
After doing a bit of research I came across a strategy that indicated the key step would be to sum all of the numbers in the array and subtract that sum from the sum of the full range. This makes a lot of sense as a good approach because the sum of the full range is a constant value, so I selected this approach to inform the code.
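def missing_number(array)
  # Sum of 1..n, where n is one more than the number of elements we were given
  grand_sum = (array.length + 1) * (array.length + 2) / 2
  sum = 0
  array.each {|n| sum += n}
  grand_sum - sum
end
x=(1..10000).to_a
x.delete rand(1..10000) # rand(10000) could return 0, which would remove nothing
puts missing_number(x)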

How is counting sort a stable sort?

Suppose my input is (a,b and c to distinguish between equal keys)
1 6a 8 3 6b 0 6c 4
My counting sort will save as (discarding the a,b and c info!!)
0(1) 1(1) 3(1) 4(1) 6(3) 8(1)
which will give me the result
0 1 3 4 6 6 6 8
So, how is this stable sort?
I am not sure how it is "maintaining the relative order of records with equal keys."
Please explain.
To understand why counting sort is stable, you need to understand that counting sort can not only be used for sorting a list of integers, it can also be used for sorting a list of elements whose key is an integer, and these elements will be sorted by their keys while having additional information associated with each of them.
A counting sort example that sorts elements with additional information will help you to understand this. For instance, we want to sort three stocks by their prices:
[(GOOG 3), (CSCO 1), (MSFT 1)]
Here stock prices are integer keys, and stock names are their associated information.
Expected output for the sorting should be:
[(CSCO 1), (MSFT 1), (GOOG 3)]
(containing both stock price and its name,
and the CSCO stock should appear before MSFT so that it is a stable sort)
A counts array will be calculated for sorting this (let's say stock prices can only be 0 to 3):
counts array: [0, 2, 0, 1] (price "1" appears twice, and price "3" appears once)
If you are just sorting an integer array, you can go through the counts array and output "1" twice and "3" once and it is done, and the entire counts array will become an all-zero array after this.
But here we want to have stock names in sorting output as well. How can we obtain this additional information (it seems the counts array already discards this piece of information)? Well, the associated information is stored in the original unsorted array. In the unsorted array [(GOOG 3), (CSCO 1), (MSFT 1)], we have both the stock name and its price available. If we get to know which position (GOOG 3) should be in the final sorted array, we can copy this element to the sorted position in the sorted array.
To obtain the final position for each element in the sorted array, unlike sorting an integer array, you don't use the counts array directly to output the sorted elements. Instead, counting sort has an additional step which calculates the cumulative sum array from the counts array:
counts array: [0, 2, 2, 3] (i from 1 to 3: counts[i] = counts[i] + counts[i - 1])
This cumulative sum array tells us each value's current position in the final sorted array. For example, counts[1]==2 means that the next item with value 1 should be placed in the 2nd slot of the sorted array. Intuitively, because counts[i] is the cumulative sum from the left, it shows how many items are less than or equal to the ith value, which tells you where the last item with that value should go.
When a $1 stock is first encountered, it should be output to the second position of the sorted array, and when a $3 stock is first encountered, it should be output to the third position. Once a $1 stock has been copied to the sorted array, we decrease its count in the counts array.
counts array: [0, 1, 2, 3]
(so that the position of the second $1 stock to be placed will be 1)
So we can iterate over the unsorted array backwards (this is important to ensure stability), look up each element's position in the sorted array from the counts array, and copy it to the sorted array.
sorted array: [null, null, null]
counts array: [0, 2, 2, 3]
iterate over the unsorted stocks backwards
1. the last stock (MSFT 1)
sorted array: [null, (MSFT 1), null] (copy to the second position because counts[1] == 2)
counts array: [0, 1, 2, 3] (decrease counts[1] by 1)
2. the middle stock (CSCO 1)
sorted array: [(CSCO 1), (MSFT 1), null] (copy to the first position because counts[1] == 1 now)
counts array: [0, 0, 2, 3] (decrease counts[1] by 1)
3. the first stock (GOOG 3)
sorted array: [(CSCO 1), (MSFT 1), (GOOG 3)] (copy to the third position because counts[3] == 3)
counts array: [0, 0, 2, 2] (decrease counts[3] by 1)
As you can see, after the array gets sorted, the counts array (which is [0, 0, 2, 2]) doesn't become an all-zero array as it does when sorting an array of integers. The counts array is not used to tell how many times an integer appears in the unsorted array; instead, it is used to tell which position the element should occupy in the final sorted array. And since we decrease the count every time we output an element, the next element with the same key gets a smaller final position. That's why we need to iterate over the unsorted array backwards to ensure stability.
Conclusion:
Since each element contains not only an integer as key, but also some additional information, even if their key is the same, you could tell each element is different by using the additional information, so you will be able to tell if it is a stable sorting algorithm (yes, it is a stable sorting algorithm if implemented appropriately).
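The walkthrough above can be written out as a short Ruby sketch. The method name counting_sort and its max_key parameter are assumptions made for this example, and each stock is represented as a [name, price] pair; the block passed to the method extracts the integer key.

```ruby
# Stable counting sort for items whose key is an integer in 0..max_key.
# The block passed to the method returns the key of an item.
def counting_sort(items, max_key)
  # Histogram of keys.
  counts = Array.new(max_key + 1, 0)
  items.each { |item| counts[yield(item)] += 1 }
  # Cumulative sums: counts[k] is now the number of items with key <= k.
  (1..max_key).each { |i| counts[i] += counts[i - 1] }
  # Walk the input backwards so that equal keys keep their original order.
  sorted = Array.new(items.length)
  items.reverse_each do |item|
    key = yield(item)
    counts[key] -= 1              # the 1-based slot becomes a 0-based index
    sorted[counts[key]] = item
  end
  sorted
end

stocks = [["GOOG", 3], ["CSCO", 1], ["MSFT", 1]]
p counting_sort(stocks, 3) { |(_name, price)| price }
# prints [["CSCO", 1], ["MSFT", 1], ["GOOG", 3]]
```

CSCO stays ahead of MSFT in the output because the backward pass places MSFT in the later of the two slots reserved for price 1.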
References:
Some good materials explaining counting sort and its stableness:
http://www.algorithmist.com/index.php/Counting_sort (this article explains this question pretty well)
http://courses.csail.mit.edu/6.006/fall11/rec/rec07.pdf
http://rosettacode.org/wiki/Sorting_algorithms/Counting_sort (a list of counting sort implementations in different programming languages. If you compare them with the algorithm in Wikipedia's counting sort entry below, you will find that most of them don't implement the full counting sort: they implement only the integer sorting function and lack the additional step that calculates the cumulative sum array. But you could check out the 'Go' implementation at this link, which does provide two different implementations: one sorts integers only, and the other can sort elements containing additional information)
http://en.wikipedia.org/wiki/Counting_sort
Simple, really: instead of a simple counter for each 'bucket', it's a linked list.
That is, instead of
0(1) 1(1) 3(1) 4(1) 6(3) 8(1)
You get
0(.) 1(.) 3(.) 4(.) 6(a,b,c) 8(.)
(here I use . to denote some item in the bucket).
Then just dump them back into one sorted list:
0 1 3 4 6a 6b 6c 8
That is, when you find an item with key x, knowing that it may have other information that distinguishes it from other items with the same key, you don't just increment a counter for bucket x (which would discard all that extra information).
Instead, you have a linked list (or similarly ordered data structure with constant time amortized append) for each bucket, and you append that item to the end of the list for bucket x as you scan the input left to right.
So instead of using O(k) space for k counters, you have O(k) initially empty lists whose sum of lengths will be n at the end of the "counting" portion of the algorithm. This variant of counting sort will still be O(n + k) as before.
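A rough Ruby sketch of this bucket variant follows. It uses Ruby arrays in place of linked lists (appending with << is amortized constant time), and the method name bucket_counting_sort, the max_key parameter and the [key, tag] item layout are all assumptions made for the example.

```ruby
# Stable bucket variant: each bucket keeps the items themselves, in arrival order.
# Assumes keys are integers in 0..max_key; the block extracts an item's key.
def bucket_counting_sort(items, max_key)
  buckets = Array.new(max_key + 1) { [] }
  # Scan left to right, appending each item to the bucket for its key.
  items.each { |item| buckets[yield(item)] << item }
  # Dump the buckets back into one list, lowest key first.
  buckets.flatten(1)
end

input = [[1, nil], [6, :a], [8, nil], [3, nil], [6, :b], [0, nil], [6, :c], [4, nil]]
sorted = bucket_counting_sort(input, 8) { |(key, _tag)| key }
p sorted.map { |key, tag| "#{key}#{tag}" }
# prints ["0", "1", "3", "4", "6a", "6b", "6c", "8"]
```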
Your solution is not a full counting sort, and discards the associated values.
Here's the full counting sort algorithm.
After you calculated the histogram:
0(1) 1(1) 3(1) 4(1) 6(3) 8(1)
you have to calculate the accumulated sums - each cell will contain how many elements are less than or equal to that value:
0(1) 1(2) 3(3) 4(4) 6(7) 8(8)
Now you start from the end of your original list and go backwards.
Last element is 4. There are 4 elements less than or equal to 4. So 4 will go on the 4th position. You decrement the counter for 4.
0(1) 1(2) 3(3) 4(3) 6(7) 8(8)
The next element is 6c. There are 7 elements less than or equal to 6. So 6c will go to the 7th position. Again, you decrement the counter for 6.
0(1) 1(2) 3(3) 4(3) 6(6) 8(8)
^ next 6 will go now to 6th position
As you can see, this algorithm is a stable sort. The order for the elements with the same key will be kept.
If your three "6" values are distinguishable, then your counting sort is wrong (it discards information about the values, which a true sort doesn't do, because a true sort only re-orders the values).
If your three "6" values are not distinguishable, then the sort is stable, because you have three indistinguishable "6"s in the input, and three in the output. It's meaningless to talk about whether they have or have not been "re-ordered": they're identical.
The concept of non-stability only applies when the values have some associated information which does not participate in the order. For instance if you were sorting pointers to those integers, then you could "tell the difference" between the three 6s by looking at their different addresses. Then it would be meaningful to ask whether any particular sort was stable. A counting sort based on the integer values then would not be sorting the pointers. A counting sort based on the pointer values would not order them by integer value, rather by address.

Resources