Related
I was going through the exercises in Ruby Koans and I was struck by the following Ruby quirk that I found really unexplainable:
array = [:peanut, :butter, :and, :jelly]
array[0] #=> :peanut #OK!
array[0,1] #=> [:peanut] #OK!
array[0,2] #=> [:peanut, :butter] #OK!
array[0,0] #=> [] #OK!
array[2] #=> :and #OK!
array[2,2] #=> [:and, :jelly] #OK!
array[2,20] #=> [:and, :jelly] #OK!
array[4] #=> nil #OK!
array[4,0] #=> [] #HUH?? Why's that?
array[4,100] #=> [] #Still HUH, but consistent with previous one
array[5] #=> nil #consistent with array[4] #=> nil
array[5,0] #=> nil #WOW. Now I don't understand anything anymore...
So why is array[5,0] not equal to array[4,0]? Is there any reason why array slicing behaves this weird when you start at the (length+1)th position??
Slicing and indexing are two different operations, and inferring the behaviour of one from the other is where your problem lies.
The first argument in slice identifies not the element but the places between elements, defining spans (and not elements themselves):
:peanut :butter :and :jelly
0 1 2 3 4
4 is still within the array, just barely; if you request 0 elements, you get the empty end of the array. But there is no index 5, so you can't slice from there.
When you do index (like array[4]), you are pointing at elements themselves, so the indices only go from 0 to 3.
this has to do with the fact that slice returns an array, relevant source documentation from Array#slice:
* call-seq:
* array[index] -> obj or nil
* array[start, length] -> an_array or nil
* array[range] -> an_array or nil
* array.slice(index) -> obj or nil
* array.slice(start, length) -> an_array or nil
* array.slice(range) -> an_array or nil
which suggests to me that if you give the start that is out of bounds, it will return nil, thus in your example array[4,0] asks for the 4th element that exists, but asks to return an array of zero elements. While array[5,0] asks for an index out of bounds so it returns nil. This perhaps makes more sense if you remember that the slice method is returning a new array, not altering the original data structure.
EDIT:
After reviewing the comments I decided to edit this answer. Slice calls the following code snippet when the arg value is two:
if (argc == 2) {
if (SYMBOL_P(argv[0])) {
rb_raise(rb_eTypeError, "Symbol as array index");
}
beg = NUM2LONG(argv[0]);
len = NUM2LONG(argv[1]);
if (beg < 0) {
beg += RARRAY(ary)->len;
}
return rb_ary_subseq(ary, beg, len);
}
if you look in the array.c class where the rb_ary_subseq method is defined, you see that it is returning nil if the length is out of bounds, not the index:
if (beg > RARRAY_LEN(ary)) return Qnil;
In this case this is what is happening when 4 is passed in, it checks that there are 4 elements and thus does not trigger the nil return. It then goes on and returns an empty array if the second arg is set to zero. while if 5 is passed in, there are not 5 elements in the array, so it returns nil before the zero arg is evaluated. code here at line 944.
I believe this to be a bug, or at least unpredictable and not the 'Principle of Least Surprise'. When I get a few minutes I will a least submit a failing test patch to ruby core.
At least note that the behavior is consistent. From 5 on up everything acts the same; the weirdness only occurs at [4,N].
Maybe this pattern helps, or maybe I'm just tired and it doesn't help at all.
array[0,4] => [:peanut, :butter, :and, :jelly]
array[1,3] => [:butter, :and, :jelly]
array[2,2] => [:and, :jelly]
array[3,1] => [:jelly]
array[4,0] => []
At [4,0], we catch the end of the array. I'd actually find it rather odd, as far as beauty in patterns go, if the last one returned nil. Because of a context like this, 4 is an acceptable option for the first parameter so that the empty array can be returned. Once we hit 5 and up, though, the method likely exits immediately by nature of being totally and completely out of bounds.
This makes sense when you consider than an array slice can be a valid lvalue, not just an rvalue:
array = [:peanut, :butter, :and, :jelly]
# replace 0 elements starting at index 5 (insert at end or array):
array[4,0] = [:sandwich]
# replace 0 elements starting at index 0 (insert at head of array):
array[0,0] = [:make, :me, :a]
# array is [:make, :me, :a, :peanut, :butter, :and, :jelly, :sandwich]
# this is just like replacing existing elements:
array[3, 4] = [:grilled, :cheese]
# array is [:make, :me, :a, :grilled, :cheese, :sandwich]
This wouldn't be possible if array[4,0] returned nil instead of []. However, array[5,0] returns nil because it's out of bounds (inserting after the 4th element of a 4-element array is meaningful, but inserting after the 5th element of a 4 element array is not).
Read the slice syntax array[x,y] as "starting after x elements in array, select up to y elements". This is only meaningful if array has at least x elements.
This does make sense
You need to be able to assign to those slices, so they are defined in such a way that the beginning and the end of the string have working zero-length expressions.
array[4, 0] = :sandwich
array[0, 0] = :crunchy
=> [:crunchy, :peanut, :butter, :and, :jelly, :sandwich]
I found explanation by Gary Wright very helpful as well.
http://www.ruby-forum.com/topic/1393096#990065
The answer by Gary Wright is -
http://www.ruby-doc.org/core/classes/Array.html
The docs certainly could be more clear but the actual behavior is
self-consistent and useful.
Note: I'm assuming 1.9.X version of String.
It helps to consider the numbering in the following way:
-4 -3 -2 -1 <-- numbering for single argument indexing
0 1 2 3
+---+---+---+---+
| a | b | c | d |
+---+---+---+---+
0 1 2 3 4 <-- numbering for two argument indexing or start of range
-4 -3 -2 -1
The common (and understandable) mistake is too assume that the semantics
of the single argument index are the same as the semantics of the
first argument in the two argument scenario (or range). They are not
the same thing in practice and the documentation doesn't reflect this.
The error though is definitely in the documentation and not in the
implementation:
single argument: the index represents a single character position
within the string. The result is either the single character string
found at the index or nil because there is no character at the given
index.
s = ""
s[0] # nil because no character at that position
s = "abcd"
s[0] # "a"
s[-4] # "a"
s[-5] # nil, no characters before the first one
two integer arguments: the arguments identify a portion of the string to
extract or to replace. In particular, zero-width portions of the string
can also be identified so that text can be inserted before or after
existing characters including at the front or end of the string. In this
case, the first argument does not identify a character position but
instead identifies the space between characters as shown in the diagram
above. The second argument is the length, which can be 0.
s = "abcd" # each example below assumes s is reset to "abcd"
To insert text before 'a': s[0,0] = "X" # "Xabcd"
To insert text after 'd': s[4,0] = "Z" # "abcdZ"
To replace first two characters: s[0,2] = "AB" # "ABcd"
To replace last two characters: s[-2,2] = "CD" # "abCD"
To replace middle two characters: s[1..3] = "XX" # "aXXd"
The behavior of a range is pretty interesting. The starting point is the
same as the first argument when two arguments are provided (as described
above) but the end point of the range can be the 'character position' as
with single indexing or the "edge position" as with two integer
arguments. The difference is determined by whether the double-dot range
or triple-dot range is used:
s = "abcd"
s[1..1] # "b"
s[1..1] = "X" # "aXcd"
s[1...1] # ""
s[1...1] = "X" # "aXbcd", the range specifies a zero-width portion of
the string
s[1..3] # "bcd"
s[1..3] = "X" # "aX", positions 1, 2, and 3 are replaced.
s[1...3] # "bc"
s[1...3] = "X" # "aXd", positions 1, 2, but not quite 3 are replaced.
If you go back through these examples and insist and using the single
index semantics for the double or range indexing examples you'll just
get confused. You've got to use the alternate numbering I show in the
ascii diagram to model the actual behavior.
I agree that this seems like strange behavior, but even the official documentation on Array#slice demonstrates the same behavior as in your example, in the "special cases" below:
a = [ "a", "b", "c", "d", "e" ]
a[2] + a[0] + a[1] #=> "cab"
a[6] #=> nil
a[1, 2] #=> [ "b", "c" ]
a[1..3] #=> [ "b", "c", "d" ]
a[4..7] #=> [ "e" ]
a[6..10] #=> nil
a[-3, 3] #=> [ "c", "d", "e" ]
# special cases
a[5] #=> nil
a[5, 1] #=> []
a[5..10] #=> []
Unfortunately, even their description of Array#slice doesn't seem to offer any insight as to why it works this way:
Element Reference—Returns the element at index, or returns a subarray starting at start and continuing for length elements, or returns a subarray specified by range. Negative indices count backward from the end of the array (-1 is the last element). Returns nil if the index (or starting index) are out of range.
An explanation provided by Jim Weirich
One way to think about it is that index position 4 is at the very edge
of the array. When asking for a slice, you return as much of the
array that is left. So consider the array[2,10], array[3,10] and
array[4,10] ... each returns the remaining bits of the end of the
array: 2 elements, 1 element and 0 elements respectively. However,
position 5 is clearly outside the array and not at the edge, so
array[5,10] returns nil.
Consider the following array:
>> array=["a","b","c"]
=> ["a", "b", "c"]
You can insert an item to the begining (head) of the array by assigning it to a[0,0]. To put the element between "a" and "b", use a[1,0]. Basically, in the notation a[i,n], i represents an index and n a number of elements. When n=0, it defines a position between the elements of the array.
Now if you think about the end of the array, how can you append an item to its end using the notation described above? Simple, assign the value to a[3,0]. This is the tail of the array.
So, if you try to access the element at a[3,0], you will get []. In this case you are still in the range of the array. But if you try to access a[4,0], you'll get nil as return value, since you're not within the range of the array anymore.
Read more about it at http://mybrainstormings.wordpress.com/2012/09/10/arrays-in-ruby/ .
tl;dr: in the source code in array.c, different functions are called depending on whether you pass 1 or 2 arguments in to Array#slice resulting in the unexpected return values.
(First off, I'd like to point out that I don't code in C, but have been using Ruby for years. So if you're not familiar with C, but you take a few minutes to familiarize yourself with the basics of functions and variables it's really not that hard to follow the Ruby source code, as demonstrated below. This answer is based on Ruby v2.3, but is more or less the same back to v1.9.)
Scenario #1
array.length == 4; array.slice(4) #=> nil
If you look at the source code for Array#slice (rb_ary_aref), you see that when only one argument is passed in (lines 1277-1289), rb_ary_entry is called, passing in the index value (which can be positive or negative).
rb_ary_entry then calculates the position of the requested element from the beginning of the array (in other words, if a negative index is passed in, it computes the positive equivalent) and then calls rb_ary_elt to get the requested element.
As expected, rb_ary_elt returns nil when the length of the array len is less than or equal to the index (here called offset).
1189: if (offset < 0 || len <= offset) {
1190: return Qnil;
1191: }
Scenario #2
array.length == 4; array.slice(4, 0) #=> []
However when 2 arguments are passed in (i.e. the starting index beg, and length of the slice len), rb_ary_subseq is called.
In rb_ary_subseq, if the starting index beg is greater than the array length alen, nil is returned:
1208: long alen = RARRAY_LEN(ary);
1209:
1210: if (beg > alen) return Qnil;
Otherwise the length of the resulting slice len is calculated, and if it's determined to be zero, an empty array is returned:
1213: if (alen < len || alen < beg + len) {
1214: len = alen - beg;
1215: }
1216: klass = rb_obj_class(ary);
1217: if (len == 0) return ary_new(klass, 0);
So since the starting index of 4 is not greater than array.length, an empty array is returned instead of the nil value that one might expect.
Question answered?
If the actual question here isn't "What code causes this to happen?", but rather, "Why did Matz do it this way?", well you'll just have to buy him a cup of coffee at the next RubyConf and ask him.
I'm doing The Odin Project. The practice problem is: create a merge sort algorithm using recursion. The following is modified from someone's solution:
def merge_sort(arry)
# kick out the odds or kick out of the recursive splitting?
# I wasn't able to get the recombination to work within the same method.
return arry if arry.length == 1
arry1 = merge_sort(arry[0...arry.length/2])
arry2 = merge_sort(arry[arry.length/2..-1])
f_arry = []
index1 = 0 # placekeeper for iterating through arry1
index2 = 0 # placekeeper for iterating through arry2
# stops when f_arry is as long as combined subarrays
while f_arry.length < (arry1.length + arry2.length)
if index1 == arry1.length
# pushes remainder of arry2 to f_arry
# not sure why it needs to be flatten(ed)!
(f_arry << arry2[index2..-1]).flatten!
elsif index2 == arry2.length
(f_arry << arry1[index1..-1]).flatten!
elsif arry1[index1] <= arry2[index2]
f_arry << arry1[index1]
index1 += 1
else
f_arry << arry2 [index2]
index2 += 1
end
end
return f_arry
end
Is the first line return arry if arry.length == 1 kicking it out of the recursive splitting of the array(s) and then bypassing the recursive splitting part of the method to go back to the recombination section? It seems like it should then just keep resplitting it once it gets back to that section as it recurses through.
Why must it be flatten-ed?
The easiest way to understand the first line is to understand that the only contract that merge_sort is bound to is to "return a sorted array" - if the array has only one element (arry.length == 1) it is already sorted - so nothing needs to be done! Simply return the array itself.
In recursion, this is known as a "Stop condition". If you don't provide a stop condition - the recursion will never end (since it will always call itself - and never return)!
The result you need to flatten your result, is because you are pushing an array as an element in you resulting array:
arr = [1]
arr << [2, 3]
# => [1, [2, 3]]
If you try to flatten the resulting array only at the end of the iteration, and not as you are adding the elements, you'll have a problem, since its length will be skewed:
arr = [1, [2, 3]]
arr.length
# => 2
Although arr contains three numbers it has only two elements - and that will break your solution.
You want all the elements in your array to be numbers, not arrays. flatten! makes sure that all elements in your array are atoms, and if they are not, it adds the child array's elements to itself instead of the child array:
arr.flatten!
# => [1, 2, 3]
Another you option you might want to consider (and will be more efficient) is to use concat instead:
arr = [1]
arr.concat([2, 3])
# => [1, 2, 3]
This method add all the elements in the array passed as parameter to the array it is called on.
I wrote a Ruby code to get max and min values from an array. The code prints the max value (8) correct but it's not printing the minimum value (2). Please let me know what went wrong in my code.
class MaxMinArray
def MaxMinMethod()
array = [4,2,8,3,5]
maxNo = array[0]
minNo = array[0]
arrayLength = array.length
for i in 1..arrayLength
if array[i].to_i > maxNo
maxNo = array[i]
end
if array[i].to_i < minNo
minNo = array[i]
end
end
puts "Maximum no. in the given array: " + maxNo.to_s
puts "Minimum no. in the given array: " + minNo.to_s
end
end
MaxiMinArrayObj = MaxMinArray.new
MaxiMinArrayObj.MaxMinMethod()
It is the combination of two things.
First, you iterated over for i in 1..arrayLength, which iterates past the last element in array. After the last element, array[i] is nil.
Second, you have the condition if array[i].to_i < minNo, which can be satisfied even if array[i] is not a number.
Because of that, the nil returned by array[i] after the last element satisfies the condition due to nil.to_i being 0, and that nil is assigned to minNo.
I realize you're trying to learn how to code, but, as you do so, it's also important to learn to take advantage of pre-existing solutions. Reinventing wheels will waste your time debugging code.
I'd write the code like:
def max_min(ary)
[ary.max, ary.min]
end
max_min([1,2,4]) # => [4, 1]
But, then again, Ruby already has a good minmax method:
[1,2,4].minmax # => [1, 4]
so use it and focus your energy on more interesting things.
If you have to see the values in the opposite order, use:
[1,2,4].minmax.reverse # => [4, 1]
A more verbose/old-school way of doing it is:
FIXNUM_MAX = (2 ** (0.size * 8 - 2) - 1)
FIXNUM_MIN = -(2 ** (0.size * 8 - 2))
def max_min(ary)
return [nil, nil] if ary.empty?
minval = FIXNUM_MAX
maxval = FIXNUM_MIN
ary.each do |i|
minval = i if i < minval
maxval = i if i > maxval
end
[maxval, minval]
end
max_min([1,2,4]) # => [4, 1]
[1,2,4].minmax.reverse # => [4, 1]
That simply loops over the array, checks each value to see if it's either smaller or larger than the last minimum or maximum value, and, if so, remembers it. Once the array is exhausted the values are returned. It's a lot more concise because using each removes a lot of the hassle of trying to walk the array using index values. We almost never use for in Ruby, especially to walk through an array.
(Technically Ruby can hold values well beyond 4611686018427387903 and -4611686018427387904, which are what FIXNUM_MAX and FIXNUM_MIN are, but those suffice for most things we want to do.)
It's not a good practice to print inside methods as long as you might want to use the results for something else.
Also Ruby comes with all sorts of magic methods to get the maximum and minimum of an array:
results = [5, 23, 43, 2, 3, 0].minmax
puts "Maximum no. in the given array: " + results[1]
puts "Minimum no. in the given array: " + results[0]
You should iterate from 1 to arrayLength - 1 (it's an index of the last element). You can use three dots for this:
for i in 1...arrayLength
If I were not allowed to used Ruby's minmax method, than I would do it probably like this:
array = [4,2,8,3,5]
min, max = nil, nil
array.each do |element|
min = element if min.nil? || element < min
max = element if max.nil? || max < element
end
puts [min, max]
# => [2, 8]
I used this expression for the min and max within ruby, it's a stretch but it works
class RubyMinMax
def self.min_function(array=[])
puts "my array is the following #{array}"
puts "the length of the array is #{array.length}"
it = 0
while array.length > 1
array.fetch(it).to_i > array.fetch(it-1).to_i ? array.delete_at(it) : array.delete_at(it-1)
it = array.length-1
end
print array[0]
end
def self.max_function(array=[])
puts "my array is the following #{array}"
puts "the length of the array is #{array.length}"
it = 0
while array.length > 1
array.fetch(it).to_i < array.fetch(it-1).to_i ? array.delete_at(it) : array.delete_at(it-1)
it = array.length-1
end
print array[0]
end
end
RubyMinMax.min_function([18, 19, 17])
RubyMinMax.max_function([18, 19, 17])
In the simplest way you can use max and min method of array.
:001 > [1,4,1,3,4,5].max
=> 5
:002 > [1,4,1,3,4,5].min
=> 1
And if your array may contain nil the first compact it the use min max
For example
:003 > [1,4,1,3,4,5,nil].compact
=> [1, 4, 1, 3, 4, 5]
:004 > [1,4,1,3,4,5].max
=> 5
:005 > [1,4,1,3,4,5].min
=> 1
I tried to see how Array#[]= works, and played around:
enum[int] = obj → obj
enum[start, length] = obj → obj
enum[range] = obj → obj
Question 1
I have one array b holding nil at its 0 index.
b = []
b[0] # => nil
I tried to replace nil with integer 10 in the code below.
b[-1] = 10 # => IndexError: index -1 too small for array; minimum: 0
Why doesn't the code above work, but the ones below do? In case of an array with size 1, why are the indices 0 and -1 treated differently?
b[0] = 5 # => 5
b[-1] = 10 # => 10
Question 2
I created an array of size 2, and did the following:
a = [1,2]
a[-3] = 3 # => IndexError: index -3 too small for array; minimum: -2
a[-3] = [3] # => IndexError: index -3 too small for array; minimum: -2
a[-3..-4] = [3] # => RangeError: -3..-4 out of range
I believe that negative index never increases the size of an array, but I don't know why. Why did the code below succeed?
a[-2..-3] = [3,4] #=> [3, 4]
I would suggest you to take a look at the first para in Array documentation. It surprisingly says: “A negative index is assumed to be relative to the end of the array—that is, an index of -1 indicates the last element of the array, -2 is the next to last element in the array, and so on.”
That means, that you may set a[-N]th element if and only |N| <= a.size. That’s why a = [1,2] ; a[-3] = 3 fails (3 > 2).
On the other hand, there is [likely] not documented feature for ruby arrays: a[INBOUNDS_IDX..NONSENSE_IDX]=SMTH will insert SMTH before INBOUNDS_IDX index:
a=[1,2]
a[2..0]='a'
a[2..1]='b'
a[2..-100]='c'
# ⇒ [1, 2, "c", "b", "a"]
a[2..-1]='q'
# ⇒ [1, 2, "q"]
Nonsense here means “less than INBOUNDS_IDX, and not treatable as index in an negative notation” (that’s why a[2..-1] in the example above is treated as a[2..(a.size - 1)].)
Q1:
An empty array has 0 elements, so when you try to set its element 0, with negative index -1, it will give an error.
Because negative index cycles through the array from the end.
So
a = []; a[-1] = 3 makes it impossible to
a) get the element at last position, since its null
b) set its value. since it was never captured.
a[0] = 5 will work because you are telling the compiler to
a) grab the first element,
b) create one if not present, and then assign that to the value you requested.
See official api doc specifically mentioning positive index can grow size, negative index past the beginning of the array raises an error.
Q2:
The above explanation almost answers the second question as well.
Given a = [1,2]
a[-3] = 3 causes the first point of break. You are trying to access the 3rd element from the end, which does not exist. By design it breaks down.
While a[-2..-3] is within the capture range of the defined array.
You ask the interpreter to capture the second element from the last (1st from the front in this case), and try to invoke a range which is asking it to increase the array's size, and populate it with whatever you requested.
Happily, all is still well and as desired. Good to know.
Observation #1
The -1 index is related to the last element, if the array has no size, [], you can't use it until you initialize it with one or more elements.
Observation #2:
Yes, you are right, the negative index never increases size of the array, it only references a concrete existing position in the array.
Don't think the array is circular—the 0 index clued to the N-1 index—so you can't use any negative index thinking that it's valid.
I was going through the exercises in Ruby Koans and I was struck by the following Ruby quirk that I found really unexplainable:
array = [:peanut, :butter, :and, :jelly]
array[0] #=> :peanut #OK!
array[0,1] #=> [:peanut] #OK!
array[0,2] #=> [:peanut, :butter] #OK!
array[0,0] #=> [] #OK!
array[2] #=> :and #OK!
array[2,2] #=> [:and, :jelly] #OK!
array[2,20] #=> [:and, :jelly] #OK!
array[4] #=> nil #OK!
array[4,0] #=> [] #HUH?? Why's that?
array[4,100] #=> [] #Still HUH, but consistent with previous one
array[5] #=> nil #consistent with array[4] #=> nil
array[5,0] #=> nil #WOW. Now I don't understand anything anymore...
So why is array[5,0] not equal to array[4,0]? Is there any reason why array slicing behaves this weird when you start at the (length+1)th position??
Slicing and indexing are two different operations, and inferring the behaviour of one from the other is where your problem lies.
The first argument in slice identifies not the element but the places between elements, defining spans (and not elements themselves):
:peanut :butter :and :jelly
0 1 2 3 4
4 is still within the array, just barely; if you request 0 elements, you get the empty end of the array. But there is no index 5, so you can't slice from there.
When you do index (like array[4]), you are pointing at elements themselves, so the indices only go from 0 to 3.
this has to do with the fact that slice returns an array, relevant source documentation from Array#slice:
* call-seq:
* array[index] -> obj or nil
* array[start, length] -> an_array or nil
* array[range] -> an_array or nil
* array.slice(index) -> obj or nil
* array.slice(start, length) -> an_array or nil
* array.slice(range) -> an_array or nil
which suggests to me that if you give the start that is out of bounds, it will return nil, thus in your example array[4,0] asks for the 4th element that exists, but asks to return an array of zero elements. While array[5,0] asks for an index out of bounds so it returns nil. This perhaps makes more sense if you remember that the slice method is returning a new array, not altering the original data structure.
EDIT:
After reviewing the comments I decided to edit this answer. Slice calls the following code snippet when the arg value is two:
if (argc == 2) {
if (SYMBOL_P(argv[0])) {
rb_raise(rb_eTypeError, "Symbol as array index");
}
beg = NUM2LONG(argv[0]);
len = NUM2LONG(argv[1]);
if (beg < 0) {
beg += RARRAY(ary)->len;
}
return rb_ary_subseq(ary, beg, len);
}
if you look in the array.c class where the rb_ary_subseq method is defined, you see that it is returning nil if the length is out of bounds, not the index:
if (beg > RARRAY_LEN(ary)) return Qnil;
In this case this is what is happening when 4 is passed in, it checks that there are 4 elements and thus does not trigger the nil return. It then goes on and returns an empty array if the second arg is set to zero. while if 5 is passed in, there are not 5 elements in the array, so it returns nil before the zero arg is evaluated. code here at line 944.
I believe this to be a bug, or at least unpredictable and not the 'Principle of Least Surprise'. When I get a few minutes I will a least submit a failing test patch to ruby core.
At least note that the behavior is consistent. From 5 on up everything acts the same; the weirdness only occurs at [4,N].
Maybe this pattern helps, or maybe I'm just tired and it doesn't help at all.
array[0,4] => [:peanut, :butter, :and, :jelly]
array[1,3] => [:butter, :and, :jelly]
array[2,2] => [:and, :jelly]
array[3,1] => [:jelly]
array[4,0] => []
At [4,0], we catch the end of the array. I'd actually find it rather odd, as far as beauty in patterns go, if the last one returned nil. Because of a context like this, 4 is an acceptable option for the first parameter so that the empty array can be returned. Once we hit 5 and up, though, the method likely exits immediately by nature of being totally and completely out of bounds.
This makes sense when you consider than an array slice can be a valid lvalue, not just an rvalue:
array = [:peanut, :butter, :and, :jelly]
# replace 0 elements starting at index 5 (insert at end or array):
array[4,0] = [:sandwich]
# replace 0 elements starting at index 0 (insert at head of array):
array[0,0] = [:make, :me, :a]
# array is [:make, :me, :a, :peanut, :butter, :and, :jelly, :sandwich]
# this is just like replacing existing elements:
array[3, 4] = [:grilled, :cheese]
# array is [:make, :me, :a, :grilled, :cheese, :sandwich]
This wouldn't be possible if array[4,0] returned nil instead of []. However, array[5,0] returns nil because it's out of bounds (inserting after the 4th element of a 4-element array is meaningful, but inserting after the 5th element of a 4 element array is not).
Read the slice syntax array[x,y] as "starting after x elements in array, select up to y elements". This is only meaningful if array has at least x elements.
This does make sense
You need to be able to assign to those slices, so they are defined in such a way that the beginning and the end of the string have working zero-length expressions.
array[4, 0] = :sandwich
array[0, 0] = :crunchy
=> [:crunchy, :peanut, :butter, :and, :jelly, :sandwich]
I found explanation by Gary Wright very helpful as well.
http://www.ruby-forum.com/topic/1393096#990065
The answer by Gary Wright is -
http://www.ruby-doc.org/core/classes/Array.html
The docs certainly could be more clear but the actual behavior is
self-consistent and useful.
Note: I'm assuming 1.9.X version of String.
It helps to consider the numbering in the following way:
-4 -3 -2 -1 <-- numbering for single argument indexing
0 1 2 3
+---+---+---+---+
| a | b | c | d |
+---+---+---+---+
0 1 2 3 4 <-- numbering for two argument indexing or start of range
-4 -3 -2 -1
The common (and understandable) mistake is too assume that the semantics
of the single argument index are the same as the semantics of the
first argument in the two argument scenario (or range). They are not
the same thing in practice and the documentation doesn't reflect this.
The error though is definitely in the documentation and not in the
implementation:
single argument: the index represents a single character position
within the string. The result is either the single character string
found at the index or nil because there is no character at the given
index.
s = ""
s[0] # nil because no character at that position
s = "abcd"
s[0] # "a"
s[-4] # "a"
s[-5] # nil, no characters before the first one
two integer arguments: the arguments identify a portion of the string to
extract or to replace. In particular, zero-width portions of the string
can also be identified so that text can be inserted before or after
existing characters including at the front or end of the string. In this
case, the first argument does not identify a character position but
instead identifies the space between characters as shown in the diagram
above. The second argument is the length, which can be 0.
s = "abcd" # each example below assumes s is reset to "abcd"
To insert text before 'a': s[0,0] = "X" # "Xabcd"
To insert text after 'd': s[4,0] = "Z" # "abcdZ"
To replace first two characters: s[0,2] = "AB" # "ABcd"
To replace last two characters: s[-2,2] = "CD" # "abCD"
To replace middle two characters: s[1..3] = "XX" # "aXXd"
The behavior of a range is pretty interesting. The starting point is the
same as the first argument when two arguments are provided (as described
above) but the end point of the range can be the 'character position' as
with single indexing or the "edge position" as with two integer
arguments. The difference is determined by whether the double-dot range
or triple-dot range is used:
s = "abcd"
s[1..1] # "b"
s[1..1] = "X" # "aXcd"
s[1...1] # ""
s[1...1] = "X" # "aXbcd", the range specifies a zero-width portion of
the string
s[1..3] # "bcd"
s[1..3] = "X" # "aX", positions 1, 2, and 3 are replaced.
s[1...3] # "bc"
s[1...3] = "X" # "aXd", positions 1, 2, but not quite 3 are replaced.
If you go back through these examples and insist and using the single
index semantics for the double or range indexing examples you'll just
get confused. You've got to use the alternate numbering I show in the
ascii diagram to model the actual behavior.
I agree that this seems like strange behavior, but even the official documentation on Array#slice demonstrates the same behavior as in your example, in the "special cases" below:
a = [ "a", "b", "c", "d", "e" ]
a[2] + a[0] + a[1] #=> "cab"
a[6] #=> nil
a[1, 2] #=> [ "b", "c" ]
a[1..3] #=> [ "b", "c", "d" ]
a[4..7] #=> [ "e" ]
a[6..10] #=> nil
a[-3, 3] #=> [ "c", "d", "e" ]
# special cases
a[5] #=> nil
a[5, 1] #=> []
a[5..10] #=> []
Unfortunately, even their description of Array#slice doesn't seem to offer any insight as to why it works this way:
Element Reference—Returns the element at index, or returns a subarray starting at start and continuing for length elements, or returns a subarray specified by range. Negative indices count backward from the end of the array (-1 is the last element). Returns nil if the index (or starting index) are out of range.
An explanation provided by Jim Weirich
One way to think about it is that index position 4 is at the very edge
of the array. When asking for a slice, you return as much of the
array that is left. So consider the array[2,10], array[3,10] and
array[4,10] ... each returns the remaining bits of the end of the
array: 2 elements, 1 element and 0 elements respectively. However,
position 5 is clearly outside the array and not at the edge, so
array[5,10] returns nil.
Consider the following array:
>> array=["a","b","c"]
=> ["a", "b", "c"]
You can insert an item to the begining (head) of the array by assigning it to a[0,0]. To put the element between "a" and "b", use a[1,0]. Basically, in the notation a[i,n], i represents an index and n a number of elements. When n=0, it defines a position between the elements of the array.
Now if you think about the end of the array, how can you append an item to its end using the notation described above? Simple, assign the value to a[3,0]. This is the tail of the array.
So, if you try to access the element at a[3,0], you will get []. In this case you are still in the range of the array. But if you try to access a[4,0], you'll get nil as return value, since you're not within the range of the array anymore.
Read more about it at http://mybrainstormings.wordpress.com/2012/09/10/arrays-in-ruby/ .
tl;dr: in the source code in array.c, different functions are called depending on whether you pass 1 or 2 arguments in to Array#slice resulting in the unexpected return values.
(First off, I'd like to point out that I don't code in C, but have been using Ruby for years. So if you're not familiar with C, but you take a few minutes to familiarize yourself with the basics of functions and variables it's really not that hard to follow the Ruby source code, as demonstrated below. This answer is based on Ruby v2.3, but is more or less the same back to v1.9.)
Scenario #1
array.length == 4; array.slice(4) #=> nil
If you look at the source code for Array#slice (rb_ary_aref), you see that when only one argument is passed in (lines 1277-1289), rb_ary_entry is called, passing in the index value (which can be positive or negative).
rb_ary_entry then calculates the position of the requested element from the beginning of the array (in other words, if a negative index is passed in, it computes the positive equivalent) and then calls rb_ary_elt to get the requested element.
As expected, rb_ary_elt returns nil when the length of the array len is less than or equal to the index (here called offset).
1189: if (offset < 0 || len <= offset) {
1190: return Qnil;
1191: }
Scenario #2
array.length == 4; array.slice(4, 0) #=> []
However when 2 arguments are passed in (i.e. the starting index beg, and length of the slice len), rb_ary_subseq is called.
In rb_ary_subseq, if the starting index beg is greater than the array length alen, nil is returned:
1208: long alen = RARRAY_LEN(ary);
1209:
1210: if (beg > alen) return Qnil;
Otherwise the length of the resulting slice len is calculated, and if it's determined to be zero, an empty array is returned:
1213: if (alen < len || alen < beg + len) {
1214: len = alen - beg;
1215: }
1216: klass = rb_obj_class(ary);
1217: if (len == 0) return ary_new(klass, 0);
So since the starting index of 4 is not greater than array.length, an empty array is returned instead of the nil value that one might expect.
Question answered?
If the actual question here isn't "What code causes this to happen?", but rather, "Why did Matz do it this way?", well you'll just have to buy him a cup of coffee at the next RubyConf and ask him.