Bubble Sort method - ruby

I am just learning Ruby, and KevinC's response (in this link) makes sense to me with one exception. I don't understand why the code is wrapped in arr.each do |i| ... end with the while loop inside it. That part seems redundant to me, since the while loop is already hitting each of the positions. Can someone explain?

The inner loop finds a bubble and carries it up; if it finds another, lighter bubble, it switches them around and carries the lighter one. So you need several passes through the array to find all the bubbles and carry them to the correct place, since you can't float several bubbles at the same time.
EDIT:
The each is really misused in KevinC's code, since it is not used for its normal purpose: yielding elements of the collection. Instead of arr.each, it would be better to use arr.size.times - as it would be more informative to the reader. Redefining the i within the block is adding insult to injury. While none of this will cause the code to be wrong as such, it is misleading.
The other problem with the code is that it does not provide an early termination condition (the swapped flag in most other answers on that question). In theory, bubble sort could find the array sorted in the first pass, making the other size - 1 passes unnecessary. KevinC's code would keep churning through the already sorted array, never realising it is done.
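A minimal sketch combining both suggestions above: (arr.size - 1).times drives the outer passes (the element is never needed, so nothing pretends to iterate over elements), and a swapped flag ends the sort as soon as a full pass makes no swaps. The method name bubble_sort is hypothetical, and this is not KevinC's original code.

```ruby
# Bubble sort sketch: each outer pass carries the largest remaining
# element to the end; stop early once a pass makes no swaps.
def bubble_sort(arr)
  arr = arr.dup                      # leave the caller's array untouched
  (arr.size - 1).times do            # at most size - 1 passes
    swapped = false
    (arr.size - 1).times do |i|
      if arr[i] > arr[i + 1]
        arr[i], arr[i + 1] = arr[i + 1], arr[i]
        swapped = true
      end
    end
    break unless swapped             # already sorted: no more passes needed
  end
  arr
end

p bubble_sort([5, 3, 8, 1, 2])  # => [1, 2, 3, 5, 8]
```

On an already sorted input such as [1, 2, 3], the loop exits after the first pass instead of running all size - 1 passes.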
As for rewriting it into block-less code, it is certainly possible, but you need to understand that block syntax is very idiomatic in Ruby, and non-block loops are almost unheard of in the Ruby world. While Ruby has for, it is almost never used in idiomatic code. But...
arr.each do |i|
...
end
is equivalent to
for i in arr
...
end
which is, again, at least for the array case, equivalent to
index = 0
while index < arr.size
i = arr[index]
...
index += 1
end
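A small runnable check of the three equivalent forms above, using a hypothetical three-element array and collecting what each loop sees instead of the "..." body:

```ruby
arr = [10, 20, 30]

# 1. Block form (idiomatic Ruby)
from_each = []
arr.each do |i|
  from_each << i
end

# 2. for loop (valid, but rare in idiomatic Ruby)
from_for = []
for i in arr
  from_for << i
end

# 3. Explicit index with while
from_while = []
index = 0
while index < arr.size
  i = arr[index]
  from_while << i
  index += 1
end

p from_each, from_for, from_while  # prints [10, 20, 30] three times
```

One real difference: for does not introduce a new scope, so i is still defined after the for loop ends, whereas the block parameter of each disappears when the block finishes.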

Related

Why must we call to_a on an enumerator object?

The chaining of each_slice and to_a confuses me. I know that each_slice is a member of Enumerable and therefore can be called on enumerable objects like arrays, and chars does return an array of characters.
I also know that each_slice will slice the array in groups of n elements, which is 2 in the below example. And if a block is not given to each_slice, then it returns an Enumerator object.
'186A08'.chars.each_slice(2).to_a
But why must we call to_a on the enumerator object if each_slice has already grouped the array by n elements? Why doesn't ruby just evaluate what the enumerator object is (which is a collection of n elements)?
The purpose of enumerators is lazy evaluation. When you call each_slice, you get back an enumerator object. This object does not calculate the entire grouped array up front. Instead, it calculates each “slice” as it is needed. This helps save on memory, and also allows you quite a bit of flexibility in your code.
This stack overflow post has a lot of information in it that you’ll find useful:
What is the purpose of the Enumerator class in Ruby
To give you a cut-and-dried answer to your question “Why must I call to_a when...”: each_slice hasn’t grouped anything yet. It hasn’t looped through the array at all. So far it has just defined an object that says that, when it goes through the array, you’re going to want elements two at a time. You then have the freedom to either force it to do the calculation on all elements in the enumerable (by calling to_a), or to use next or each to go through it and stop partway (say, calculating only half of the slices instead of calculating all of them and throwing the second half away).
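A short illustration of the difference between holding the enumerator and forcing it, using the string from the question:

```ruby
slices = '186A08'.chars.each_slice(2)   # an Enumerator; nothing grouped yet

p slices.class     # => Enumerator
p slices.next      # => ["1", "8"]   (only the first slice is computed)
p slices.next      # => ["6", "A"]
p slices.first(2)  # => [["1", "8"], ["6", "A"]]   (stops after two slices)
p slices.to_a      # => [["1", "8"], ["6", "A"], ["0", "8"]]   (forces all)
```

next is external iteration with its own position, while first and to_a each run a fresh internal pass from the beginning, which is why to_a still returns all three slices.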
It’s similar to how the Range class does not build up the list of elements in the range. (1..100000) doesn’t make an array of 100000 numbers, but instead defines an object with a min and max and certain operations can be performed on that. For example (1..100000).cover?(5) doesn’t build a massive array to see if that number is in there, but instead just sees if 5 is greater than or equal to 1 and less than or equal to 100000.
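The Range behaviour described above can be checked directly. Because a Range stores only its endpoints, even an endless range like (1..Float::INFINITY) supports these calls; trying to_a on it would never finish.

```ruby
big = (1..100_000)
p big.cover?(5)     # => true: two comparisons, no array built
p big.size          # => 100000, computed from the endpoints

endless = (1..Float::INFINITY)
p endless.cover?(5)                        # => true
p endless.first(3)                         # => [1, 2, 3], stops after three
p endless.lazy.map { |n| n * 2 }.first(3)  # => [2, 4, 6]
```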
The purpose of this all is performance and flexibility.
It may be worth considering whether your implementation actually needs to make an array up front, or whether you can actually keep your RAM consumption down a bit by iterating over the enumerator. (If your real world scenario is as simple as you described, an enumerator won’t help much, but if the array actually is large, an enumerator could help you a lot).
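As a sketch of iterating over the enumerator instead of materializing it, assume (hypothetically) that '186A08' is a hex colour and each two-character slice should become a number. map is called straight on the enumerator, so no intermediate array of slices is built; each_char is used instead of chars so even the character array is skipped.

```ruby
hex = '186A08'

# Hypothetical use: convert each two-character slice to its integer value.
values = hex.each_char.each_slice(2).map { |pair| pair.join.to_i(16) }
p values  # => [24, 106, 8]
```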

What are .each iterator fetch order guarantees?

I am really baffled by something that led to hours of head-scratching. I have the following segment of code:
objectA.arrayA.each do |p|
  # do stuff with p
end
I thought this was fine: from this question I understood that, since I am using an array for the job, the order would be preserved. Unfortunately that was not the case, since the order in which each yielded the elements was not always the same. After hours of looking at other blocks for the issue, replacing the code above with this for loop solved the problem:
for i in 0...objectA.arrayA.length
  # do stuff with objectA.arrayA[i]
end
Does anyone know when the ordering of each is guaranteed?
The docs for Enumerable state
The Enumerable mixin provides collection classes with several
traversal and searching methods, and with the ability to sort. The
class must provide a method each, which yields successive members of
the collection. If Enumerable#max, #min, or #sort is used, the objects
in the collection must also implement a meaningful <=> operator, as
these methods rely on an ordering between members of the collection.
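So Array#each must also yield successive members to meet this contract, and for an array that means index order, from 0 to size - 1.
If an implementation doesn't do this, it is a bug in the implementation. If the order seems to change between runs, the array itself is most likely being filled in a different order each time.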
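A quick check that Array#each and the index-based for loop from the question visit elements in the same order, on a small hypothetical array:

```ruby
items = %w[c a b]

via_each = []
items.each { |item| via_each << item }

via_index = []
for i in 0...items.length
  via_index << items[i]
end

p via_each               # => ["c", "a", "b"]
p via_each == via_index  # => true: each yields index 0, 1, 2, ...
```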

How does JIT optimize branching while processing elements of collections? (in Scala)

This is a question about performance of code written in Scala.
Consider the following two code snippets, assume that x is some collection containing ~50 million elements:
def process[T](x: Traversable[T]) = {
  processFirst(x.head)
  x reduce processPair
  processLast(x.last)
}
Versus something like this (assume for now we have some way to determine if we're operating on the first element versus the last element):
def isFirstElement[T](x: T) = ???
def isLastElement[T](x: T) = ???
def process[T](x: Traversable[T]) = {
  x reduce {
    (left, right) =>
      if (isFirstElement(left))
        processFirst(left)
      else if (isLastElement(right))
        processLast(right)
      processPair(left, right)
  }
}
Which approach is faster, and for ~50 million elements, how much faster?
It seems to me that the first example would be faster because there are fewer conditional checks occurring for all but the first and last elements. However for the latter example there is some argument to suggest that the JIT might be clever enough to optimize away those additional head/last conditional checks that would otherwise occur for all but the first/last elements.
Is the JIT clever enough to perform such operations? The obvious advantage of the latter approach is that all business logic can be placed in the same function body, while in the former case it must be partitioned into three separate function bodies invoked separately.
** EDIT **
Thanks for all the great responses. While I am leaving the second code snippet above to illustrate its incorrectness, I want to revise the first approach slightly to reflect better the problem I am attempting to solve:
// x is some iterator
def process[T](x: Iterator[T]) = {
  if (x.hasNext)
  {
    var previous = x.next()
    var current: T = null.asInstanceOf[T]
    processFirst(previous)
    while (x.hasNext)
    {
      current = x.next()
      processPair(previous, current)
      previous = current
    }
    processLast(previous)
  }
}
While there are no additional checks occurring in the body, there is an additional reference assignment that appears to be unavoidable (previous = current). This is also a much more imperative approach that relies on nullable mutable variables. Implementing this in a functional yet high performance manner would be another exercise for another question.
How does this code snippet stack up against the second of the two examples above (the single-iteration block approach containing all the branches)? The other thing I realize is that the second example is also broken on collections containing fewer than two elements.
If your underlying collection has inexpensive head and last methods (not true for a generic Traversable), and the reduction operations are relatively inexpensive, then the second way takes about 10% longer (maybe a little less) than the first on my machine. (You can use a var to get first, and you can keep updating a second var with the right argument to obtain last, and then do the final operation outside of the loop.)
If you have an expensive last (i.e. you have to traverse the whole collection), then the first approach takes about 10% longer (maybe a little more).
Mostly you shouldn't worry too much about it and instead worry more about correctness. For instance, in a 2-element list your second code has a bug (because there is an else instead of a separate test). In a 1-element list, the second code never calls reduce's lambda at all, so again fails to work.
This argues that you should do it the first way unless you're sure last is really expensive in your case.
Edit: if you switch to a manual reduce-like-operation using an iterator, you might be able to shave off up to about 40% of your time compared to the expensive-last case (e.g. list). For inexpensive last, probably not so much (up to ~20%). (I get these values when operating on lengths of strings, for example.)
First of all, note that, depending on the concrete implementation of Traversable, doing something like x.last may be really expensive. Like, more expensive than all the rest of what's going on here.
Second, I doubt the cost of conditionals themselves is going to be noticeable, even on a 50 million collection, but actually figuring out whether a given element is the first or the last, might again, depending on implementation, get pricey.
Third, JIT will not be able to optimize the conditionals away: if there was a way to do that, you would have been able to write your implementation without conditionals to begin with.
Finally, if you are at a point where it starts looking like an extra if statement might affect performance, you might consider switching to Java or even C. Don't get me wrong, I love Scala; it is a great language with lots of power and useful features, but being super-fast just isn't one of them.

Ruby performance in .each loops

Consider the following two pieces of Ruby code:
Example 1
name = user.first_name
round_number = rounds.count
users.each do |u|
puts "#{name} beat #{u.first_name} in round #{round_number}"
end
Example 2
users.each do |u|
puts "#{user.first_name} beat #{u.first_name} in round #{rounds.count}"
end
For both pieces of code imagine
#user.rb
def first_name
name.split.first
end
In a classical analysis of algorithms, the first piece of code would be more efficient; however, in most modern compiled languages, the compiler would optimize the second piece of code to look like the first, eliminating the need to optimize code in such a manner.
Will Ruby optimize or cache values for this code before execution? Should my Ruby code look like example 1 or example 2?
Example 1 will run faster, as first_name() and rounds.count are only called once, and their values are stored in variables.
In Example 2 Ruby will not memoize this value automatically, since the value could have changed between iterations for the each() loop.
Therefore, expensive-to-calculate methods should be explicitly memoized if they are expected to be used more than once without the return value changing.
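A minimal sketch of explicit memoization with the common @ivar ||= idiom, assuming a hypothetical User class. The split_calls counter exists only to show how often the expensive part actually runs.

```ruby
# Hypothetical User class for illustration only.
class User
  attr_reader :name, :split_calls

  def initialize(name)
    @name = name
    @split_calls = 0
  end

  # Memoized: the split runs on the first call only. Caveats: the cached
  # value will not notice later changes to the name, and a nil result
  # would be recomputed on every call because ||= treats nil as unset.
  def first_name
    @first_name ||= begin
      @split_calls += 1
      name.split.first
    end
  end
end

user = User.new("Ada Lovelace")
3.times { user.first_name }
p user.first_name   # => "Ada"
p user.split_calls  # => 1
```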
Making use of Ruby's Benchmark module can be useful when making decisions like this. It will likely only be worth memoizing if there are a lot of values in users, or if first_name() is expensive to calculate.
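A sketch of how Benchmark.realtime could compare the two examples. Person, the sample names, and the 100,000-user list are hypothetical stand-ins for the question's objects; rounds.count is left out, and the strings are built rather than printed so the timing is not dominated by output.

```ruby
require 'benchmark'

# Hypothetical stand-in with the same first_name logic as user.rb above.
Person = Struct.new(:name) do
  def first_name
    name.split.first
  end
end

user  = Person.new("Ada Lovelace")
users = Array.new(100_000) { |n| Person.new("User#{n} Smith") }

hoisted = Benchmark.realtime do
  name = user.first_name             # Example 1: computed once
  users.each { |u| "#{name} beat #{u.first_name}" }
end

inline = Benchmark.realtime do       # Example 2: recomputed every iteration
  users.each { |u| "#{user.first_name} beat #{u.first_name}" }
end

puts format("hoisted: %.4fs  inline: %.4fs", hoisted, inline)
```

Absolute numbers depend on the machine and Ruby version; the point is to measure on your own data before deciding whether memoizing is worth it.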
A compiler can only perform this optimization if it can prove that the method has no side effects. This is even more difficult in Ruby than most languages, as everything is mutable and can be overridden at runtime. Whether it happens or not is implementation dependent, but since it's hard to do in Ruby, most do not. I actually don't know of any that do at the time of this posting.

Ruby Loops Question

C++:
for(i=0,j=0;i<0;i++,j++)
What's the equivalence to this in ruby?
Besides the normal for and while loops seen in C++, can someone name the other special loops Ruby has, such as .times or .each?
Thanks in advance.
If I understand your question (at least the first part of it), you are wondering how you can iterate two separate variables at the same time, such as i and j.
You can do that in Ruby using the for loop with multiple variables. For instance, if you wanted i to count up from 1 to 10, and j to count up from 10 to 19, you could do:
for i, j in (1..10).zip(10..20)
puts "#{i}, #{j}"
end
zip will produce, from two arrays, a single array of which each element is an array, with the first element taken from the corresponding position in the first array, and the second element taken from the corresponding position in the second array:
> [1, 2, 3].zip([4, 5, 6])
=> [[1, 4], [2, 5], [3, 6]]
And using i, j in your for loop will take i from the first element of each inner array, and j from the second element.
If you'd rather use each than for, you can just use a block with two parameters:
(1..10).zip(10..20).each { |i, j| puts "#{i}, #{j}" }
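Because zip pairs elements position by position and produces exactly as many pairs as the receiver has elements, j in the loops above stops at 19; the 20 from (10..20) is dropped. A quick check:

```ruby
pairs = (1..10).zip(10..20)

p pairs.first  # => [1, 10]
p pairs.last   # => [10, 19]
p pairs.size   # => 10
```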
As to the second part of your question, Ruby doesn't really have a fixed number of different iterators, since most iteration is done by passing a block to a method, and thus any class can define its own methods that allow iterating over its own contents. The most common is each, and any class that defines an each method can mix in the Enumerable module, which gives you a variety of different methods for iterating over elements, selecting elements, filtering, and so on. There are also times, upto, and downto defined on the Integer class, each_key, each_value, and each_pair on Hash, each_byte, each_char, and each_line on String, and so on. Just about any class that defines some sort of collection or sequence has methods for iterating over said collection or sequence.
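A few of the built-in iterators mentioned above, followed by a sketch of the "define each, mix in Enumerable" pattern. Countdown is a hypothetical class written only for this illustration.

```ruby
p 3.times.to_a                    # => [0, 1, 2]
p 1.upto(3).to_a                  # => [1, 2, 3]
p 3.downto(1).to_a                # => [3, 2, 1]
p({ a: 1, b: 2 }.each_pair.to_a)  # => [[:a, 1], [:b, 2]]
p "hi".each_char.to_a             # => ["h", "i"]

# Hypothetical collection class: defining each and including Enumerable
# is enough to get map, select, sort, and the rest.
class Countdown
  include Enumerable

  def initialize(from)
    @from = from
  end

  def each
    return to_enum(:each) unless block_given?  # enumerator when no block
    @from.downto(1) { |n| yield n }
    self
  end
end

p Countdown.new(3).map { |n| n * 10 }  # => [30, 20, 10]
p Countdown.new(5).select(&:even?)     # => [4, 2]
```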
Ruby is different to C++. In C++ you use a for loop to loop through anything, but in Ruby you'll find you're usually looping through an enumerable object, so it's more common to do something like:
monkeys.each do |monkey|
monkey.say 'ow!'
end
Don't try to look for too much equivalence between the two languages - they're built for different things. Obviously there are a lot of equivalent things, but you can't learn Ruby by producing a chart that shows C++ code on one side and the Ruby equivalent on the other. Try to learn the idiomatic way of doing things and you'll find it much easier.
If you want ways of looping through enumerable objects, check out all the methods in the Enumerable module: all? any? collect detect each_cons each_slice each_with_index entries enum_cons enum_slice enum_with_index find find_all grep include? inject map max member? min partition reject select sort sort_by to_a to_set zip (enum_cons, enum_slice, and enum_with_index existed only in Ruby 1.8). For most of these, the C++ equivalent would be a hand-written for loop.
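For comparison with a C++ for loop, here are a few of the listed methods on a small hypothetical array; each line replaces what would otherwise be a loop with a manual index or accumulator.

```ruby
nums = [3, 1, 4, 1, 5]

nums.each_with_index { |n, i| puts "#{i}: #{n}" }  # index without a counter
p nums.each_cons(2).to_a    # => [[3, 1], [1, 4], [4, 1], [1, 5]]
p nums.inject(:+)           # => 14
p nums.partition(&:odd?)    # => [[3, 1, 1, 5], [4]]
p nums.find { |n| n > 3 }   # => 4
p nums.sort_by { |n| -n }   # => [5, 4, 3, 1, 1]
```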
You can do:
(0..j).each do |i|
puts i
end
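I am not terribly familiar with C++, but AFAICS, since the condition i<0 is already false on the first check, the loop body never runs and only the initialization has any effect, so the equivalent Ruby code to the loop you posted is simply: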
i, j = 0, 0
Which shows once again the expressive power Ruby has. Anybody can figure out what this does, even if they have never seen Ruby before, while the equivalent C++ takes quite a while to figure out.
