ruby special match variables confusion - ruby

This code produces the expected result:
def test_sub_is_like_find_and_replace
assert_equal "one t-three", "one two-three".sub(/(t\w*)/) { $1[0, 1] }
end
I understand that $1 is a variable for the first match, but I am not clear what the [0,1] is, or why it takes out the last two letters of "two".

This is covered in the String.[] documentation, in particular:
str[start, length] → new_str or nil
So, $1[0, 1] means "slice the string, returning the substring that starts at index 0 and is 1 character long."

The [0,1] can be applied to any string to find 1 character starting at index position 0:
>> "Hello"[0,1]
=> "H"
Just for fun, something other than 0 and 1:
>> "Hello World"[3,5]
=> "lo Wo"
Starts at index position 3, takes 5 characters.
In your case
"two"[0, 1]
you take one character at index 0, namely "t". It looks like it removed the last two characters; in reality it produced only the first.
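Here is a small sketch that unpacks the block so you can see what $1 holds and what the two-argument slice returns. The variable names result and captured are just for illustration:

```ruby
# /(t\w*)/ matches the first "t" followed by word characters: "two".
result = "one two-three".sub(/(t\w*)/) do
  captured = $1    # the text of capture group 1, here "two"
  captured[0, 1]   # start at index 0, take 1 character: "t"
end

puts result               # prints "one t-three"
puts "two"[0, 1]          # prints "t"
puts "Hello World"[3, 5]  # prints "lo Wo" (start at index 3, take 5 characters)
```

Only the matched text "two" is replaced, and it is replaced by its first character, so nothing is actually "removed" from the end.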

Related

Ruby - How to write a method that returns an array of strings?

I've tried different ways and this is probably the closest that I got to it. I am trying to write a method that takes in an array of strings and returns it containing the strings that are at least 5 characters long and end with "y".
I'm a beginner and this is my second problem I've come across with, and I've tried multiple if statements and using a while loop, however I could not get to it and now this is where I am at. Thank you!
def phrases(arr1, arr2)
  arr1 = ["funny", "tidy", "fish", "foogiliously"]
  arr2 = ["happily", "lovely", "hello", "multivitaminly"]
  if (arr1.length > 5 && arr1.length == "y")
    return arr1
  elsif (arr2.length > 5 && arr2.length == "y")
    return arr2
  end
end
puts phrases(["funny", "tidy", "fish", "foogiliously"])
puts phrases(["happily", "lovely", "hello", "multivitaminly"])
If I'm understanding your question correctly, you want to return a subset of the passed in array matching your conditions (length ≥ 5 and last character = 'y'). In that case:
def phrases(words)
words.grep(/.{4}y\z/)
end
What that regex does:
.{4} means 4 of any character
y is the letter y
\z is the end of the string, so we don't match in the middle of a long word
See the docs for Enumerable#grep (an Array is an Enumerable).
Output:
> phrases(["funny", "tidy", "fish", "foogiliously"])
=> ["funny", "foogiliously"]
> phrases(["happily", "lovely", "hello", "multivitaminly"])
=> ["happily", "lovely", "multivitaminly"]
If you only want word characters, rather than any character, you'd use this regex instead: /\A\w{4,}y\z/. In that case, \A means the start of the string, and \w{4,} means at least 4 word characters.
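A quick way to see the difference between the two patterns is to run both against a list that includes a phrase with a space in it. The sample words here are made up for illustration:

```ruby
words = ["funny", "tidy", "so shy", "lovely"]

# Any 4 characters (spaces included) followed by a final "y".
any_chars = words.grep(/.{4}y\z/)
# Anchored at both ends: 4 or more word characters, then a final "y".
word_chars = words.grep(/\A\w{4,}y\z/)

p any_chars   # ["funny", "so shy", "lovely"]
p word_chars  # ["funny", "lovely"]
```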
If, when given an array and an inclusion criterion, one wishes to construct an array that contains those elements of the first array that satisfy the inclusion criterion, one generally uses the method Array#select or Array#reject, whichever is more convenient.
Suppose arr is a variable that holds the given array and include_element? is a method that takes one argument, an element of arr, and returns true or false, depending on whether the inclusion criterion is satisfied for that element. For example, say the array comprises the integers 1 through 6 and the inclusion criterion is that the number is even (2, 4 and 6). We could write:
arr = [1,2,3,4,5,6]
def include_element?(e)
e.even?
end
include_element?(2)
#=> true
include_element?(3)
#=> false
arr.select { |e| include_element?(e) }
#=> [2, 4, 6]
The method include_element? is so short we probably would substitute it out and just write:
arr.select { |e| e.even? }
Array#select passes each element of its receiver, arr, to select's block, assigns the block variable e to that value and evaluates the expression in the block (which could be many lines, of course). Here that expression is just e.even?, which returns true or false. (See Integer#even? and Integer#odd?.)
If that expression evaluates as a truthy value, the element e is to be included in the array that is returned; if it evaluates as a falsy value, e is not to be included. Falsy values (logical false) are nil and false; truthy values (logical true) are all other Ruby objects, which of course includes true.
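Because only nil and false are falsy, values such as 0 and the empty string count as truthy in a select block, which can surprise readers coming from other languages. A small illustration with made-up values:

```ruby
values = [1, nil, false, 0, "", :a]

# The block returns each element itself, so select keeps the truthy ones.
kept = values.select { |v| v }
p kept  # [1, 0, "", :a]

# reject keeps the complement: the falsy elements.
dropped = values.reject { |v| v }
p dropped  # [nil, false]
```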
Notice that we could instead write:
arr.reject { |e| e.odd? }
Sometimes the inclusion criterion consists of a compound expression. For example, suppose the inclusion criterion were to keep elements of arr that are both even numbers and are at least 4. We would write:
arr.select { |e| e.even? && e >= 4 }
#=> [4, 6]
With other criteria we might write:
arr.select { |e| e.even? || e >= 4 }
#=> [2, 4, 5, 6]
or
arr.select { |e| e < 2 || (e > 3 && e < 6) }
#=> [1, 4, 5]
&& (logical 'and') and || (logical 'or') are operators. Most Ruby operators are actually methods, but these two are among the few that are not.
Your problem now reduces to the following:
arr.select { |str| <length of str is at least 5> && <last character of str is 'y'> }
You should be able to supply code for the <...> bits.
I think you are trying to write a function that works on a single array at a time. You are taking in an array and retaining only those elements that satisfy your conditions: at least 5 characters long, and ending with y. This is a filtering operation. Read about the methods available for Ruby's Array class in its documentation.
def phrases(array)
...
filtered_array
end
Now the condition you are using is this arr1.length > 5 && arr1.length == "y".
The first half should check whether each string's length is at least 5 (length >= 5), not the length of the array itself. The second half should be an indexing operation, and your code for that is incorrect: it compares a length (an Integer) with "y", which is never true. Basically, you want to check whether the last character in the string is y.
Usually strings are indexed in this manner: string[index]. In your case you can use string[-1]=='y' or string[string.length - 1]=='y'. This is because arrays and strings are zero-indexed in Ruby. The first element has index 0, the second has index 1, and the last one, therefore, has index length-1. If you use negative indexes, the string is indexed from the end, so string[-1] is a quick way to get to the last character.
Considering this, the function will take the following structure:
def phrases(array)
  filtered_array = [] # an empty array
  # loop through the input array
  array.each do |element|
    # for each element, check the condition
    if element.length >= 5 && element[-1] == 'y'
      filtered_array.push(element) # if true, push the element into filtered_array
    end
  end
  filtered_array # once the loop is done, return the filtered array
end
Read about Ruby arrays, and the methods push, filter and select, in the documentation linked above to get a better idea. I'd also recommend the Codecademy Ruby tutorial.
Edit: Both halves of the condition are incorrect. I had overlooked a mistake in my earlier answer. arr1.length refers to the length of the array, but you want to check the length of each string in the array. So inside your loop you should check whether the length of the loop variable is at least 5.
You may want to spend some time reading about the methods in the core library, especially String#end_with? and Enumerable#select. You could then write a method that'd contain something like this:
['abc', 'qwerty', 'asdfghjk', 'y'].select{|s| s.length >= 5}.select{|s| s.end_with? 'y'}
#=> ["qwerty"]
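Both conditions can also go into a single select block, which gives a complete method in one pass. This is a minimal sketch, assuming "at least 5 characters" means length >= 5 as stated in the question; the method name phrases is taken from the question:

```ruby
# Keep strings that are at least 5 characters long and end with "y".
def phrases(words)
  words.select { |s| s.length >= 5 && s.end_with?("y") }
end

p phrases(["funny", "tidy", "fish", "foogiliously"])
# ["funny", "foogiliously"]
p phrases(["happily", "lovely", "hello", "multivitaminly"])
# ["happily", "lovely", "multivitaminly"]
```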

Capitalize every nth character of each word in a string in Ruby

I need to capitalize every 'nth' character of each word in a string (every 4th character in this example, so characters 4, 8, 12, etc.).
I came up with the code below (not very elegant, I know!), but it only works for words whose length is < 8.
'capitalize every fourth character in this string'.split(' ').map do |word|
  word.split('').map.with_index do |l,idx|
    idx % 3 == 0 && idx > 0 ? word[idx].upcase : l
  end
  .join('')
end
.flatten.join(' ')
Could anybody show me how to capitalize every 4th character in words whose length is > 8?
Thanks!
str = 'capitalize every fourth character in this string'
idx = 0
str.gsub(/./) do |c|
  case c
  when ' '
    idx = 0
    c
  else
    idx += 1
    (idx % 4).zero? ? c.upcase : c
  end
end
#=> "capItalIze eveRy fouRth chaRactEr in thiS strIng"
As an option, you can just modify the nth character in the string if it exists by accessing the character by index:
'capitalizinga every fourth character in this string'.split(' ').map do |word|
  (3..word.length).step(4) do |x|
    c = word[x]
    word[x] = c.upcase if c
  end
  word
end.join(' ')
# capItalIzinGa eveRy fouRth chaRactEr in thiS strIng
Here the step method of the Range class is used to compute every fourth index: 3, 7, 11, etc.
I think the easiest way is to use a regex with substitution:
'capitalize every fourth character in this string'
.gsub(/([\w]{3})(\w)|([\w]{1,3})/) {
"#{$1}#{$2.to_s.upcase}#{$3}"
}
# => capItalIze eveRy fouRth chaRactEr in thiS strIng
This uses 2 alternatives with capture groups: the first alternative matches 4 characters and the second anything with 1 to 3 characters. Group $1 matches exactly three letters and group $2 the fourth letter within a 4-letter block, while group $3 matches the remainder of a longer word as well as words shorter than 4 characters.
You can then replace group $2 globally with gsub. Also, you need $2.to_s because $2 is nil whenever the second alternative matches (or you can catch that scenario with a ternary operator).
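To see what each group captures, String#scan can be run with the same pattern (written here as \w{3} instead of the equivalent [\w]{3}). For each match it returns the three groups, with nil for the groups of the alternative that did not match:

```ruby
pattern = /(\w{3})(\w)|(\w{1,3})/

groups = "capitalize".scan(pattern)
p groups
# [["cap", "i", nil], ["tal", "i", nil], [nil, nil, "ze"]]

# nil.to_s is "", which is why the replacement block calls $2.to_s.
p nil.to_s  # ""
```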
str.split(" ").map { |word|
  word.chars.each_with_index { |c, i|
    c.upcase! if (i > 0 && (i+1)%4 == 0)
  }.join
}.join(" ")
#=> "capItalIze eveRy fouRth chaRactEr in thiS strIng"
def capitalize_each_nth_char(str, n)
str.chars.each_slice(n).to_a.each { |arr| arr[-1] = arr[-1].upcase if arr.size == n }.join('')
end
Here is the explanation:
str.chars # will give array of characters
str.chars.each_slice(n) # will give an enumerator, #<Enumerator: ...>
str.chars.each_slice(n).to_a # will give an array of arrays
arr[-1].upcase # capitalizes the last element, i.e. the 4th element, of each array
if arr.size == n # prevents capitalizing the last element of a sub-array whose size is less than n (4 in our case)
str.chars.each_slice(n).to_a.each { |arr| arr[-1] = arr[-1].upcase if arr.size == n } # gives an array of sub-arrays in which the last element of every full-size sub-array is capitalized
str.chars.each_slice(n).to_a.each { |arr| arr[-1] = arr[-1].upcase if arr.size == n }.join('') # it will give the final result as, "capItalIze EverY foUrth chaRactEr iN thIs sTrinG"
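Note that this method counts characters across the whole string, spaces included, which is why its result ("EverY", "foUrth", ...) differs from the other answers. A minimal per-word variant, assuming words are separated by spaces, applies the same each_slice idea to each word separately. The method name is hypothetical:

```ruby
# Capitalize every nth character, restarting the count at each word.
def capitalize_each_nth_char_per_word(str, n)
  str.split(' ').map { |word|
    word.chars.each_slice(n).map { |slice|
      # Only full slices have an nth character to capitalize.
      slice[-1] = slice[-1].upcase if slice.size == n
      slice.join
    }.join
  }.join(' ')
end

puts capitalize_each_nth_char_per_word('capitalize every fourth character in this string', 4)
# capItalIze eveRy fouRth chaRactEr in thiS strIng
```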

Explain behaviour of ruby array when selecting inexistent elements [duplicate]

I was going through the exercises in Ruby Koans and I was struck by the following Ruby quirk that I found really unexplainable:
array = [:peanut, :butter, :and, :jelly]
array[0] #=> :peanut #OK!
array[0,1] #=> [:peanut] #OK!
array[0,2] #=> [:peanut, :butter] #OK!
array[0,0] #=> [] #OK!
array[2] #=> :and #OK!
array[2,2] #=> [:and, :jelly] #OK!
array[2,20] #=> [:and, :jelly] #OK!
array[4] #=> nil #OK!
array[4,0] #=> [] #HUH?? Why's that?
array[4,100] #=> [] #Still HUH, but consistent with previous one
array[5] #=> nil #consistent with array[4] #=> nil
array[5,0] #=> nil #WOW. Now I don't understand anything anymore...
So why is array[5,0] not equal to array[4,0]? Is there any reason why array slicing behaves this weird when you start at the (length+1)th position??
Slicing and indexing are two different operations, and inferring the behaviour of one from the other is where your problem lies.
The first argument in slice identifies not the element but the places between elements, defining spans (and not elements themselves):
  :peanut  :butter  :and  :jelly
0        1        2     3       4
4 is still within the array, just barely; if you request 0 elements, you get the empty end of the array. But there is no index 5, so you can't slice from there.
When you do index (like array[4]), you are pointing at elements themselves, so the indices only go from 0 to 3.
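One way to see the "places between elements" idea is to ask for a zero-length slice at every start position. Every position from 0 through array.length is a valid edge, and only positions past the last edge give nil, while single-argument indexing only finds elements at 0 through 3:

```ruby
array = [:peanut, :butter, :and, :jelly]

# Zero-length slices at start positions 0..5.
edges = (0..5).map { |i| array[i, 0] }
p edges  # [[], [], [], [], [], nil]

# Single-argument indexing only has elements at 0..3.
indexed = (0..5).map { |i| array[i] }
p indexed  # [:peanut, :butter, :and, :jelly, nil, nil]
```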
This has to do with the fact that slice returns an array. Here is the relevant source documentation from Array#slice:
* call-seq:
* array[index] -> obj or nil
* array[start, length] -> an_array or nil
* array[range] -> an_array or nil
* array.slice(index) -> obj or nil
* array.slice(start, length) -> an_array or nil
* array.slice(range) -> an_array or nil
Which suggests to me that if you give a start that is out of bounds, it will return nil. Thus, in your example, array[4,0] asks for a start position that exists but asks for an array of zero elements, while array[5,0] asks for a start index that is out of bounds, so it returns nil. This perhaps makes more sense if you remember that the slice method returns a new array rather than altering the original data structure.
EDIT:
After reviewing the comments I decided to edit this answer. Slice calls the following code snippet when two arguments are passed:
if (argc == 2) {
    if (SYMBOL_P(argv[0])) {
        rb_raise(rb_eTypeError, "Symbol as array index");
    }
    beg = NUM2LONG(argv[0]);
    len = NUM2LONG(argv[1]);
    if (beg < 0) {
        beg += RARRAY(ary)->len;
    }
    return rb_ary_subseq(ary, beg, len);
}
If you look in array.c, where the rb_ary_subseq function is defined, you see that it returns nil only when the start index is greater than the array's length (not when it is merely past the index of the last element):
if (beg > RARRAY_LEN(ary)) return Qnil;
So when 4 is passed in, the check sees that the array has 4 elements; 4 is not greater than the length, so the nil return is not triggered. It then goes on and returns an empty array because the second argument is zero. When 5 is passed in, 5 is greater than the array's length of 4, so it returns nil before the zero length is evaluated. The relevant code is at line 944.
I believe this to be a bug, or at least unpredictable and not the 'Principle of Least Surprise'. When I get a few minutes I will at least submit a failing test patch to ruby core.
At least note that the behavior is consistent. From 5 on up everything acts the same; the weirdness only occurs at [4,N].
Maybe this pattern helps, or maybe I'm just tired and it doesn't help at all.
array[0,4] => [:peanut, :butter, :and, :jelly]
array[1,3] => [:butter, :and, :jelly]
array[2,2] => [:and, :jelly]
array[3,1] => [:jelly]
array[4,0] => []
At [4,0], we catch the end of the array. I'd actually find it rather odd, as far as beauty in patterns goes, if the last one returned nil. Because of a context like this, 4 is an acceptable option for the first parameter so that the empty array can be returned. Once we hit 5 and up, though, the method likely exits immediately by nature of being totally and completely out of bounds.
This makes sense when you consider that an array slice can be a valid lvalue, not just an rvalue:
array = [:peanut, :butter, :and, :jelly]
# replace 0 elements starting at index 4 (insert at end of array):
array[4,0] = [:sandwich]
# replace 0 elements starting at index 0 (insert at head of array):
array[0,0] = [:make, :me, :a]
# array is [:make, :me, :a, :peanut, :butter, :and, :jelly, :sandwich]
# this is just like replacing existing elements:
array[3, 4] = [:grilled, :cheese]
# array is [:make, :me, :a, :grilled, :cheese, :sandwich]
This wouldn't be possible if array[4,0] returned nil instead of []. However, array[5,0] returns nil because it's out of bounds (inserting after the 4th element of a 4-element array is meaningful, but inserting after the 5th element of a 4 element array is not).
Read the slice syntax array[x,y] as "starting after x elements in array, select up to y elements". This is only meaningful if array has at least x elements.
This does make sense
You need to be able to assign to those slices, so they are defined in such a way that the beginning and the end of the array have working zero-length expressions.
array[4, 0] = :sandwich
array[0, 0] = :crunchy
array
=> [:crunchy, :peanut, :butter, :and, :jelly, :sandwich]
I found the explanation by Gary Wright very helpful as well.
http://www.ruby-forum.com/topic/1393096#990065
The answer by Gary Wright is:
http://www.ruby-doc.org/core/classes/Array.html
The docs certainly could be more clear but the actual behavior is
self-consistent and useful.
Note: I'm assuming 1.9.X version of String.
It helps to consider the numbering in the following way:
  -4  -3  -2  -1    <-- numbering for single argument indexing
   0   1   2   3
 +---+---+---+---+
 | a | b | c | d |
 +---+---+---+---+
 0   1   2   3   4  <-- numbering for two argument indexing or start of range
-4  -3  -2  -1
The common (and understandable) mistake is to assume that the semantics
of the single argument index are the same as the semantics of the
first argument in the two argument scenario (or range). They are not
the same thing in practice and the documentation doesn't reflect this.
The error though is definitely in the documentation and not in the
implementation:
single argument: the index represents a single character position
within the string. The result is either the single character string
found at the index or nil because there is no character at the given
index.
s = ""
s[0] # nil because no character at that position
s = "abcd"
s[0] # "a"
s[-4] # "a"
s[-5] # nil, no characters before the first one
two integer arguments: the arguments identify a portion of the string to
extract or to replace. In particular, zero-width portions of the string
can also be identified so that text can be inserted before or after
existing characters including at the front or end of the string. In this
case, the first argument does not identify a character position but
instead identifies the space between characters as shown in the diagram
above. The second argument is the length, which can be 0.
s = "abcd" # each example below assumes s is reset to "abcd"
To insert text before 'a': s[0,0] = "X" # "Xabcd"
To insert text after 'd': s[4,0] = "Z" # "abcdZ"
To replace first two characters: s[0,2] = "AB" # "ABcd"
To replace last two characters: s[-2,2] = "CD" # "abCD"
To replace middle two characters: s[1..2] = "XX" # "aXXd"
The behavior of a range is pretty interesting. The starting point is the
same as the first argument when two arguments are provided (as described
above) but the end point of the range can be the 'character position' as
with single indexing or the "edge position" as with two integer
arguments. The difference is determined by whether the double-dot range
or triple-dot range is used:
s = "abcd"
s[1..1] # "b"
s[1..1] = "X" # "aXcd"
s[1...1] # ""
s[1...1] = "X" # "aXbcd", the range specifies a zero-width portion of the string
s[1..3] # "bcd"
s[1..3] = "X" # "aX", positions 1, 2, and 3 are replaced.
s[1...3] # "bc"
s[1...3] = "X" # "aXd", positions 1, 2, but not quite 3 are replaced.
If you go back through these examples and insist on using the single
index semantics for the two-argument or range indexing examples, you'll just
get confused. You've got to use the alternate numbering I show in the
ASCII diagram to model the actual behavior.
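The slice assignments discussed above can be checked in a few lines. Each one starts from a fresh copy of "abcd"; the helper method with_abcd is made up for this sketch:

```ruby
# Yield a fresh, mutable copy of "abcd" and return it after the block runs.
def with_abcd
  s = "abcd".dup
  yield s
  s
end

p with_abcd { |s| s[0, 0] = "X" }    # "Xabcd"  insert before 'a'
p with_abcd { |s| s[4, 0] = "Z" }    # "abcdZ"  insert after 'd'
p with_abcd { |s| s[0, 2] = "AB" }   # "ABcd"   replace first two
p with_abcd { |s| s[-2, 2] = "CD" }  # "abCD"   replace last two
p with_abcd { |s| s[1..2] = "XX" }   # "aXXd"   replace middle two
p with_abcd { |s| s[1...1] = "X" }   # "aXbcd"  zero-width range inserts
p with_abcd { |s| s[1..3] = "X" }    # "aX"     positions 1, 2 and 3
p with_abcd { |s| s[1...3] = "X" }   # "aXd"    positions 1 and 2
```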
I agree that this seems like strange behavior, but even the official documentation on Array#slice demonstrates the same behavior as in your example, in the "special cases" below:
a = [ "a", "b", "c", "d", "e" ]
a[2] + a[0] + a[1] #=> "cab"
a[6] #=> nil
a[1, 2] #=> [ "b", "c" ]
a[1..3] #=> [ "b", "c", "d" ]
a[4..7] #=> [ "e" ]
a[6..10] #=> nil
a[-3, 3] #=> [ "c", "d", "e" ]
# special cases
a[5] #=> nil
a[5, 1] #=> []
a[5..10] #=> []
Unfortunately, even their description of Array#slice doesn't seem to offer any insight as to why it works this way:
Element Reference—Returns the element at index, or returns a subarray starting at start and continuing for length elements, or returns a subarray specified by range. Negative indices count backward from the end of the array (-1 is the last element). Returns nil if the index (or starting index) are out of range.
An explanation provided by Jim Weirich:
One way to think about it is that index position 4 is at the very edge
of the array. When asking for a slice, you return as much of the
array that is left. So consider the array[2,10], array[3,10] and
array[4,10] ... each returns the remaining bits of the end of the
array: 2 elements, 1 element and 0 elements respectively. However,
position 5 is clearly outside the array and not at the edge, so
array[5,10] returns nil.
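Jim Weirich's description can be checked directly against the array from the question:

```ruby
array = [:peanut, :butter, :and, :jelly]

# Each slice returns whatever is left after the start edge.
p array[2, 10]  # [:and, :jelly]
p array[3, 10]  # [:jelly]
p array[4, 10]  # [] (start is exactly at the end edge)
p array[5, 10]  # nil (start is past the end edge)
```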
Consider the following array:
>> array=["a","b","c"]
=> ["a", "b", "c"]
You can insert an item at the beginning (head) of the array by assigning it to a[0,0]. To put the element between "a" and "b", use a[1,0]. Basically, in the notation a[i,n], i represents an index and n a number of elements. When n=0, it defines a position between the elements of the array.
Now if you think about the end of the array, how can you append an item to its end using the notation described above? Simple, assign the value to a[3,0]. This is the tail of the array.
So, if you try to access the element at a[3,0], you will get []. In this case you are still in the range of the array. But if you try to access a[4,0], you'll get nil as return value, since you're not within the range of the array anymore.
Read more about it at http://mybrainstormings.wordpress.com/2012/09/10/arrays-in-ruby/ .
tl;dr: in the source code in array.c, different functions are called depending on whether you pass 1 or 2 arguments in to Array#slice resulting in the unexpected return values.
(First off, I'd like to point out that I don't code in C, but have been using Ruby for years. So if you're not familiar with C, but you take a few minutes to familiarize yourself with the basics of functions and variables it's really not that hard to follow the Ruby source code, as demonstrated below. This answer is based on Ruby v2.3, but is more or less the same back to v1.9.)
Scenario #1
array.length == 4; array.slice(4) #=> nil
If you look at the source code for Array#slice (rb_ary_aref), you see that when only one argument is passed in (lines 1277-1289), rb_ary_entry is called, passing in the index value (which can be positive or negative).
rb_ary_entry then calculates the position of the requested element from the beginning of the array (in other words, if a negative index is passed in, it computes the positive equivalent) and then calls rb_ary_elt to get the requested element.
As expected, rb_ary_elt returns nil when the length of the array len is less than or equal to the index (here called offset).
1189: if (offset < 0 || len <= offset) {
1190: return Qnil;
1191: }
Scenario #2
array.length == 4; array.slice(4, 0) #=> []
However when 2 arguments are passed in (i.e. the starting index beg, and length of the slice len), rb_ary_subseq is called.
In rb_ary_subseq, if the starting index beg is greater than the array length alen, nil is returned:
1208: long alen = RARRAY_LEN(ary);
1209:
1210: if (beg > alen) return Qnil;
Otherwise the length of the resulting slice len is calculated, and if it's determined to be zero, an empty array is returned:
1213: if (alen < len || alen < beg + len) {
1214: len = alen - beg;
1215: }
1216: klass = rb_obj_class(ary);
1217: if (len == 0) return ary_new(klass, 0);
So since the starting index of 4 is not greater than array.length, an empty array is returned instead of the nil value that one might expect.
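As a sanity check on this reading of the C code, here is a hypothetical Ruby model of the two-argument path. subseq_model is a name made up for this sketch, not Ruby's actual implementation; besides the checks quoted above it also applies the negative-start adjustment and returns nil for a negative length, which is what Array#[] does. It then compares the model with Array#[] over a range of inputs:

```ruby
# Hypothetical model of Array#[](start, length), following the checks above.
def subseq_model(ary, beg, len)
  alen = ary.length
  beg += alen if beg < 0                # negative start counts from the end
  return nil if beg < 0 || beg > alen   # start past the end edge => nil
  return nil if len < 0                 # negative length => nil
  len = alen - beg if beg + len > alen  # clamp to what is left
  return [] if len == 0                 # zero-length slice at a valid edge
  (beg...beg + len).map { |i| ary[i] }
end

array = [:peanut, :butter, :and, :jelly]
mismatches = []
(-6..6).each do |beg|
  (-1..5).each do |len|
    mismatches << [beg, len] unless array[beg, len] == subseq_model(array, beg, len)
  end
end
p mismatches  # [] when the model agrees with Array#[]
```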
Question answered?
If the actual question here isn't "What code causes this to happen?", but rather, "Why did Matz do it this way?", well you'll just have to buy him a cup of coffee at the next RubyConf and ask him.

Find the combinations of a given encoded string using Ruby

I was asked this question during an interview and I couldn't come up with a satisfactory solution for it. Would appreciate if anybody could give some pointers.
Given a mapping like
mapping = {"A" => 1, "B" => 2, "C" => 3..... "Z" => 26}
encode("A") == "1"
encode("BA") == "21"
encode("ABC") == "123"
encode("") == ""
decode("1") == ["A"] -> 1
decode("21") == ["BA", "V"] -> 2
decode("123") == ["ABC", "JC", "AX"] -> 3
decode("012") == [] -> 0
decode("") == [""] -> 1
decode("102") == ["JB"] -> 1
numDecode(X) == len(decode(X))
numDecode("1") == 1
numDecode("21") == 2
numDecode("123") == 3
numDecode("") == 1
numDecode("102") == 1
numDecode("012") == 0
We need a numDecode method which gives the length of unique solution array.
Updated :
Given a mapping like :
mapping = {"A" => 1, "B" => 2, "C" => 3..... "Z" => 26}
Suppose we are given the string "A"; then it can be encoded as "1".
encode("A") should return "1"
encode("BA") should return "21" because, treating mapping as a hash, B has a value of 2 and A has a value of 1.
encode("ABC") should return "123" as mapping["A"] is 1, mapping["B"] is 2, and mapping["C"] is 3.
encode("") should return "" as the empty string is not in mapping.
Now if decode("1") is called then it should return an array with one element i.e. ["A"] as key matching with 1 as value in mapping is "A".
decode("") should return an array with empty string i.e. [""].
decode("21") should return an array ["BA", "U"] as 2 is "B", 1 is "A" and "U" is 21 in mapping.
decode("012") should return an empty array as string starts with "0" which is not in mapping keys.
decode("102") should return an array as ["JB"] as "10" is J and "2" is B.
And finally numDecode should return the count of unique decoded strings in array. So,
numDecode(X) == len(decode(X))
numDecode("1") == 1
numDecode("21") == 2
numDecode("123") == 3
numDecode("") == 1
numDecode("102") == 1
numDecode("012") == 0
This is an interesting question, and the interview technique that goes with it is most likely to see how far the critical thinking goes. A good interviewer would probably not expect a single canonically correct answer.
If you get as far as a recursive decode solution that you then enumerate, then you are doing well IMO (at least I'd hire most candidates who could demonstrate clearly thinking through a piece of recursive code at interview!)
Having said that, one key hint is that the question asks for a num_decode function, not necessarily for implementations of encode and decode.
There is a deeper understanding and structure accessible here, that can be gained from analysing the permutations and combinations. It allows you to write a num_decode function that can handle long strings with millions of possible answers, without filling memory or taking hours to enumerate all possibilities.
First, note that separate ambiguous encodings multiply the number of possibilities for the whole string:
1920 -> 19 is ambiguous 'AI' or 'S' -> 'AIT' or 'ST'
192011 -> 11 is also ambiguous 'AA' or 'K' -> 'AITAA', 'AITK', 'STAA', 'STK'
Here 19 has two possible interpretations, and 11 also has two. A string with both of these separate instances of ambiguous codings has 2 * 2 == 4 valid combinations.
Each independent section of ambiguous coding multiplies the size of the whole set of decode values by the number of possibilities that it represents.
Next how to deal with longer ambiguous sections. What happens when you add an ambiguous digit to an ambiguous sequence:
11 -> 'AA' or 'K' -> 2
111 -> 'AAA', 'AK', 'KA' -> 3
1111 -> 'AAAA', 'AAK', 'AKA', 'KAA', 'KK' -> 5
11111 -> 'AAAAA', 'AAAK', 'AAKA', 'AKAA', 'AKK', 'KAAA', 'KAK', 'KKA' -> 8
2, 3, 5, 8 should look familiar: it is the Fibonacci sequence. What's going on? The answer is that adding one digit to the sequence allows all the previous combinations plus those of the sub-sequence before that. By adding a digit 1 to the sequence 1111, you can interpret the result either as 1111(1) or as 111(11), so you can add together the number of possibilities in 1111 and 111 to get the number of possibilities in 11111. That is, N(i) = N(i-1) + N(i-2), which is how the Fibonacci series is constructed.
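Written out, with N(k) the number of decodings of an ambiguous run of k digits such as 11...1, and with independent runs of lengths k_j multiplying as described above, the argument amounts to:

```latex
\begin{aligned}
N(k) &= N(k-1) + N(k-2), \qquad N(1) = 1,\; N(2) = 2 \\
N(k) &= F_{k+1}, \qquad F_1 = F_2 = 1 \qquad (\text{e.g. } N(5) = F_6 = 8) \\
N_{\text{total}} &= \prod_j N(k_j) \qquad (\text{e.g. } 192011:\; N(2) \cdot N(2) = 4)
\end{aligned}
```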
So, if we can detect ambiguous coding sequences, and get their length, we can now calculate the number of possible decodes, without actually doing the decode:
# A caching Fibonacci sequence generator
def fib n
  @fibcache ||= []
  return @fibcache[n] if @fibcache[n]
  a = b = 1
  n.times do |i|
    a, b = b, a + b
    @fibcache[i+1] = a
  end
  @fibcache[n]
end
def num_decode encoded
  # Check that we don't have invalid sequences, raising here, but you
  # could technically return 0 and be correct according to question
  if encoded.match(/[^0-9]/) || encoded.match(/(?<![12])0/)
    raise ArgumentError, "Not a valid encoded sequence"
  end
  # The look-ahead assertion ensures we don't match
  # a '1' or '2' that is needed by a '10' or '20'
  ambiguous = encoded.scan(/[12]*1[789]|[12]+[123456](?![0])/)
  ambiguous.inject(1) { |n,s| n * fib(s.length) }
end
# A few examples:
num_decode('') # => 1
num_decode('1') # => 1
num_decode('12') # => 2
num_decode('120') # => 1
num_decode('12121212') # => 34
num_decode('1212121212121212121212121211212121212121') # => 165580141
It is relatively short strings like the last one which foil attempts to enumerate
the possibilities directly by decoding.
The regex in the scan took a little experimentation to get right. Adding 7,8 or 9 is ambiguous after a 1, but not after a 2. You also want to avoid counting a 1 or 2 directly before a 0 as part of an ambiguous sequence because 10 or 20 have no other interpretations. I think I made about a dozen attempts at the regex before settling on the current version (which I believe to be correct, but I did keep finding exceptions to correct values most times I tested the first versions).
Finally, as an exercise, it should be possible to use this code as the basis from which to write a decoder that directly output the Nth possible decoding (or even one that enumerated them lazily from any starting point, without requiring excessive memory or CPU time).
Here's a recursive solution:
$mapping = Hash[(0..25).map { |i| [('A'.ord+i).chr,i+1] }]
$itoa = Hash[$mapping.to_a.map { |pair| pair.reverse.map(&:to_s) }]
def decode( str )
  return [''] if str.empty?
  return $itoa.key?(str) ? [$itoa[str]] : nil if str.length == 1
  retval = []
  0.upto(str.length-1) do |i|
    word = $itoa[str[0..i]] or next
    tails = decode(str[i+1..-1]) or next
    retval.concat tails.map { |tail| word + tail }
  end
  return retval
end
Some sample output:
p decode('1') # ["A"]
p decode('21') # ["BA", "U"]
p decode('123') # ["ABC", "AW", "LC"]
p decode('012') # []
p decode('') # [""]
p decode('102') # ["JB"]
p decode('12345') # ["ABCDE", "AWDE", "LCDE"]
Note the differences between this output and the question: e.g. the 21st letter of the alphabet is "U", not "V".
@he = Hash[("A".."Z").to_a.zip((1..26).to_a.map(&:to_s))]
# => {"A"=>"1", "B"=>"2",...,"Z"=>"26"}
@hd = @he.invert # => {"1"=>"A", "2"=>"B",.., "26"=>"Z"}
def decode(str, s = '', arr = [])
  return arr << s if str.empty?
  # Return if the first character of str is not a key of @hd
  return arr unless (c = @hd[str[0]])
  # Recurse with str less the first char, s with c appended and arr
  arr = decode(str[1..-1], s + c, arr)
  # If the first two chars of str are a key of @hd (with value c),
  # recurse with str less the first two chars, s with c appended and arr
  arr = decode(str[2..-1], s + c, arr) if str.size > 1 && (c = @hd[str[0..1]])
  arr
end
def num_decode(str) decode(str).size end
decode('1') # => ["A"]
decode('') # => [""]
decode('21') # => ["BA", "U"]
decode('012') # => []
decode('102') # => ["JB"]
decode('123') # => ["ABC", "AW", "LC"]
decode('12345') # => ["ABCDE", "AWDE", "LCDE"]
decode('120345') # => ["ATCDE"]
decode('12720132') # => ["ABGTACB", "ABGTMB", "LGTACB", "LGTMB"]
Any more? Yes, I see a hand back there. The gentleman with the red hat wants to see '12121212':
decode('12121212')
# => ["ABABABAB", "ABABABL", "ABABAUB", "ABABLAB", "ABABLL", "ABAUBAB",
#     "ABAUBL", "ABAUUB", "ABLABAB", "ABLABL", "ABLAUB", "ABLLAB",
#     "ABLLL", "AUBABAB", "AUBABL", "AUBAUB", "AUBLAB", "AUBLL",
#     "AUUBAB", "AUUBL", "AUUUB", "LABABAB", "LABABL", "LABAUB",
#     "LABLAB", "LABLL", "LAUBAB", "LAUBL", "LAUUB", "LLABAB",
#     "LLABL", "LLAUB", "LLLAB", "LLLL"]
num_decode('1') # => 1
num_decode('21') # => 2
num_decode('12121212') # => 34
num_decode('12912912') # => 8
This looks like a combinatorics problem, but it's also a parsing problem.
(You asked for pointers, so I'm doing this in English rather than dusting off my Ruby.)
I would do something like this:
1. If X is an empty string, return 1.
2. If X is not a string composed of digits starting with a nonzero digit, return 0.
3. If X contains no 1's or 2's, return 1 (there's only one possible parsing).
4. If X contains 1's or 2's, it gets a bit more complicated:
   - Every 1 that is not the last character in X matches both "A" and the first digit of one of the letters "J" through "S".
   - Every 2 that is not the last character in X and is followed by a digit less than 7 matches both "B" and the first digit of one of the letters "T" through "Z".
   Count up your 1's and 2's that meet those criteria. Let that number be Y. You have 2^Y combinations of those, so the answer should be 2^Y, but you have to subtract 1 for every time you have a 1 and a 2 next to each other.
So, if you reach Step 4 above, count up your 1's that aren't the last character in X, and the 2's that both aren't the last character in X and aren't followed by a 7, 8, or 9. Let the sum of those counts be called Y.
Now count every instance where those 1's and 2's are neighbors; let that sum be called Z.
The number of possible parsings is (2^Y) - Z.
In the spirit of giving “some pointers”, instead of writing an actual implementation of numDecode, let me say that the most logically straightforward way to tackle this problem is with recursion. If the string passed to numDecode is longer than one character, look at the beginning of the string and, based on what you see, use one or two (or zero) recursive calls to find the correct value.
At the risk of revealing too much, numDecode("1122") should make recursive calls to numDecode("122") and numDecode("22").
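Following that hint, one possible recursive version, spelled `num_decode` in Ruby style, might look like the sketch below. It assumes the usual mapping of "1".."26" to "A".."Z"; the `memo` hash is my own addition (not part of the hint) so that a suffix such as "22" is only counted once.

```ruby
# Hypothetical helper: counts the ways a digit string can be decoded,
# with "1" => "A" ... "26" => "Z".
def num_decode(str, memo = {})
  return 1 if str.empty?             # exactly one way to decode nothing
  return 0 if str.start_with?("0")   # no letter's code starts with 0
  return memo[str] if memo.key?(str) # suffix already counted

  # One recursive call for taking a single digit (1-9)...
  count = num_decode(str[1..-1], memo)
  # ...and a second one if the first two digits form a number from 10 to 26.
  count += num_decode(str[2..-1], memo) if str.size > 1 && str[0, 2].to_i <= 26
  memo[str] = count
end

num_decode("1122")     # => 5
num_decode("12121212") # => 34
num_decode("2022")     # => 2
```

For "1122" the top-level call recurses on "122" (taking "1") and on "22" (taking "11"), which is exactly the pair of calls described above.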
# Just look for all valid singles and doubles as you go down, and keep repeating this.
# If you get to the end, where the string is 1 or 2 digits long, you count 1.
# i.e. for "121":
# 1 that's good, 2 that's good, 1 that's good: if all good then count + 1
# 12 that's good, 1 that's good ... no more doubles: if all good then count + 1
# 1 that's good, 21 that's good: if all good then count + 1
# Test this on other cases.
$str = "2022"
$count = 0
def decode(str)
  # A single digit from 1 to 9 is a valid letter
  if str[0].to_i.between?(1, 9)
    $count += 1 if str.length == 1
    decode(str[1..-1])
  end
  # A two-digit number from 10 to 26 is a valid letter
  if str[0..1].to_i.between?(10, 26)
    $count += 1 if str.length == 2
    decode(str[2..-1])
  end
end
decode($str)
puts "count is #{$count}"

Unexpected behavior of Ruby Array? [duplicate]

I was going through the exercises in Ruby Koans and I was struck by the following Ruby quirk that I found really unexplainable:
array = [:peanut, :butter, :and, :jelly]
array[0] #=> :peanut #OK!
array[0,1] #=> [:peanut] #OK!
array[0,2] #=> [:peanut, :butter] #OK!
array[0,0] #=> [] #OK!
array[2] #=> :and #OK!
array[2,2] #=> [:and, :jelly] #OK!
array[2,20] #=> [:and, :jelly] #OK!
array[4] #=> nil #OK!
array[4,0] #=> [] #HUH?? Why's that?
array[4,100] #=> [] #Still HUH, but consistent with previous one
array[5] #=> nil #consistent with array[4] #=> nil
array[5,0] #=> nil #WOW. Now I don't understand anything anymore...
So why is array[5,0] not equal to array[4,0]? Is there any reason why array slicing behaves so weirdly when you start at the (length+1)th position?
Slicing and indexing are two different operations, and inferring the behaviour of one from the other is where your problem lies.
The first argument in slice identifies not the element but the places between elements, defining spans (and not elements themselves):
  :peanut   :butter   :and   :jelly
0         1         2      3       4
4 is still within the array, just barely; if you request 0 elements, you get the empty end of the array. But there is no index 5, so you can't slice from there.
When you do index (like array[4]), you are pointing at elements themselves, so the indices only go from 0 to 3.
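A quick way to see those between-element positions is to ask for a zero-length slice, and for a single element, at every start from 0 to one past the end. This is just a check in plain Ruby using the question's array:

```ruby
array = [:peanut, :butter, :and, :jelly]

# Compare the two-argument form (gap positions) with single-argument indexing
# (element positions) for every start from 0 up to array.length + 1.
(0..array.length + 1).each do |start|
  puts "array[#{start}, 0] => #{array[start, 0].inspect}   " \
       "array[#{start}] => #{array[start].inspect}"
end
```

Starts 0 through 4 all give `[]` for the zero-length slice, while `array[4]` is already `nil`; only at start 5 does the slice return `nil` as well.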
This has to do with the fact that slice returns an array. Here is the relevant source documentation from Array#slice:
* call-seq:
* array[index] -> obj or nil
* array[start, length] -> an_array or nil
* array[range] -> an_array or nil
* array.slice(index) -> obj or nil
* array.slice(start, length) -> an_array or nil
* array.slice(range) -> an_array or nil
This suggests to me that if you give a start that is out of bounds, it will return nil. Thus in your example, array[4,0] asks for a start position that exists, but asks to return an array of zero elements, while array[5,0] asks for a start that is out of bounds, so it returns nil. This perhaps makes more sense if you remember that the slice method returns a new array rather than altering the original data structure.
EDIT:
After reviewing the comments, I decided to edit this answer. Slice calls the following code snippet when it is given two arguments:
if (argc == 2) {
    if (SYMBOL_P(argv[0])) {
        rb_raise(rb_eTypeError, "Symbol as array index");
    }
    beg = NUM2LONG(argv[0]);
    len = NUM2LONG(argv[1]);
    if (beg < 0) {
        beg += RARRAY(ary)->len;
    }
    return rb_ary_subseq(ary, beg, len);
}
If you look in array.c, where the rb_ary_subseq function is defined, you see that it returns nil when the starting index is greater than the array's length:
if (beg > RARRAY_LEN(ary)) return Qnil;
That is what happens here: when 4 is passed in, the array has 4 elements, so 4 is not greater than the length and the nil return is not triggered. It then goes on to return an empty array because the second argument is zero. When 5 is passed in, 5 is greater than the array's length, so it returns nil before the zero-length argument is even considered (code here at line 944).
I believe this to be a bug, or at least unpredictable and not the 'Principle of Least Surprise'. When I get a few minutes, I will at least submit a failing test patch to Ruby core.
At least note that the behavior is consistent. From 5 on up everything acts the same; the weirdness only occurs at [4,N].
Maybe this pattern helps, or maybe I'm just tired and it doesn't help at all.
array[0,4] => [:peanut, :butter, :and, :jelly]
array[1,3] => [:butter, :and, :jelly]
array[2,2] => [:and, :jelly]
array[3,1] => [:jelly]
array[4,0] => []
At [4,0], we catch the end of the array. I'd actually find it rather odd, as far as beauty in patterns go, if the last one returned nil. Because of a context like this, 4 is an acceptable option for the first parameter so that the empty array can be returned. Once we hit 5 and up, though, the method likely exits immediately by nature of being totally and completely out of bounds.
This makes sense when you consider that an array slice can be a valid lvalue, not just an rvalue:
array = [:peanut, :butter, :and, :jelly]
# replace 0 elements starting at index 4 (insert at end of array):
array[4,0] = [:sandwich]
# replace 0 elements starting at index 0 (insert at head of array):
array[0,0] = [:make, :me, :a]
# array is [:make, :me, :a, :peanut, :butter, :and, :jelly, :sandwich]
# this is just like replacing existing elements:
array[3, 4] = [:grilled, :cheese]
# array is [:make, :me, :a, :grilled, :cheese, :sandwich]
This wouldn't be possible if array[4,0] returned nil instead of []. However, array[5,0] returns nil because it's out of bounds (inserting after the 4th element of a 4-element array is meaningful, but inserting after the 5th element of a 4-element array is not).
Read the slice syntax array[x,y] as "starting after x elements in array, select up to y elements". This is only meaningful if array has at least x elements.
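That reading can be checked against `drop` and `take`: for every `x` up to `array.length`, `array[x, y]` gives the same result as `array.drop(x).take(y)`. The comparison below is my own illustration, not part of the answer; note that `drop` never returns `nil`, which is exactly where the two part ways.

```ruby
array = [:peanut, :butter, :and, :jelly]

# "Skip x elements, then take up to y": holds for every x from 0 to array.length.
(0..array.length).each do |x|
  (0..6).each do |y|
    raise "mismatch at #{x}, #{y}" unless array[x, y] == array.drop(x).take(y)
  end
end

# Past the end there are fewer than x elements to skip, so slicing gives nil
# while drop/take still returns an empty array.
p array[5, 0]           # => nil
p array.drop(5).take(0) # => []
```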
This does make sense
You need to be able to assign to those slices, so they are defined in such a way that the beginning and the end of the array have working zero-length expressions.
array[4, 0] = :sandwich
array[0, 0] = :crunchy
# array is now [:crunchy, :peanut, :butter, :and, :jelly, :sandwich]
I found the explanation by Gary Wright very helpful as well.
http://www.ruby-forum.com/topic/1393096#990065
The answer by Gary Wright is -
http://www.ruby-doc.org/core/classes/Array.html
The docs certainly could be more clear but the actual behavior is
self-consistent and useful.
Note: I'm assuming 1.9.X version of String.
It helps to consider the numbering in the following way:
  -4  -3  -2  -1    <-- numbering for single argument indexing
   0   1   2   3
 +---+---+---+---+
 | a | b | c | d |
 +---+---+---+---+
 0   1   2   3   4  <-- numbering for two argument indexing or start of range
-4  -3  -2  -1
The common (and understandable) mistake is to assume that the semantics
of the single argument index are the same as the semantics of the
first argument in the two argument scenario (or range). They are not
the same thing in practice and the documentation doesn't reflect this.
The error though is definitely in the documentation and not in the
implementation:
single argument: the index represents a single character position
within the string. The result is either the single character string
found at the index or nil because there is no character at the given
index.
s = ""
s[0] # nil because no character at that position
s = "abcd"
s[0] # "a"
s[-4] # "a"
s[-5] # nil, no characters before the first one
two integer arguments: the arguments identify a portion of the string to
extract or to replace. In particular, zero-width portions of the string
can also be identified so that text can be inserted before or after
existing characters including at the front or end of the string. In this
case, the first argument does not identify a character position but
instead identifies the space between characters as shown in the diagram
above. The second argument is the length, which can be 0.
s = "abcd" # each example below assumes s is reset to "abcd"
To insert text before 'a': s[0,0] = "X" # "Xabcd"
To insert text after 'd': s[4,0] = "Z" # "abcdZ"
To replace first two characters: s[0,2] = "AB" # "ABcd"
To replace last two characters: s[-2,2] = "CD" # "abCD"
To replace middle two characters: s[1..2] = "XX" # "aXXd"
The behavior of a range is pretty interesting. The starting point is the
same as the first argument when two arguments are provided (as described
above) but the end point of the range can be the 'character position' as
with single indexing or the "edge position" as with two integer
arguments. The difference is determined by whether the double-dot range
or triple-dot range is used:
s = "abcd"
s[1..1] # "b"
s[1..1] = "X" # "aXcd"
s[1...1] # ""
s[1...1] = "X" # "aXbcd", the range specifies a zero-width portion of the string
s[1..3] # "bcd"
s[1..3] = "X" # "aX", positions 1, 2, and 3 are replaced.
s[1...3] # "bc"
s[1...3] = "X" # "aXd", positions 1, 2, but not quite 3 are replaced.
If you go back through these examples and insist on using the single
index semantics for the two-argument or range indexing examples, you'll just
get confused. You've got to use the alternate numbering I show in the
ASCII diagram to model the actual behavior.
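To check the two numberings in that diagram against a real string, the loop below (plain Ruby, my own illustration) prints the single-index result and the zero-width two-argument result for every position from -5 to 5 on "abcd":

```ruby
s = "abcd"

(-5..5).each do |i|
  # s[i]    uses the character numbering (top of the diagram): valid for -4..3
  # s[i, 0] uses the edge numbering (bottom of the diagram): valid for -4..4
  puts format("%3d  s[i] = %-5s s[i, 0] = %s", i, s[i].inspect, s[i, 0].inspect)
end
```

Position 4 prints `nil` for `s[i]` but `""` for `s[i, 0]`, and both forms give `nil` at -5 and 5.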
I agree that this seems like strange behavior, but even the official documentation on Array#slice demonstrates the same behavior as in your example, in the "special cases" below:
a = [ "a", "b", "c", "d", "e" ]
a[2] + a[0] + a[1] #=> "cab"
a[6] #=> nil
a[1, 2] #=> [ "b", "c" ]
a[1..3] #=> [ "b", "c", "d" ]
a[4..7] #=> [ "e" ]
a[6..10] #=> nil
a[-3, 3] #=> [ "c", "d", "e" ]
# special cases
a[5] #=> nil
a[5, 1] #=> []
a[5..10] #=> []
Unfortunately, even their description of Array#slice doesn't seem to offer any insight as to why it works this way:
Element Reference—Returns the element at index, or returns a subarray starting at start and continuing for length elements, or returns a subarray specified by range. Negative indices count backward from the end of the array (-1 is the last element). Returns nil if the index (or starting index) are out of range.
An explanation provided by Jim Weirich
One way to think about it is that index position 4 is at the very edge
of the array. When asking for a slice, you return as much of the
array that is left. So consider the array[2,10], array[3,10] and
array[4,10] ... each returns the remaining bits of the end of the
array: 2 elements, 1 element and 0 elements respectively. However,
position 5 is clearly outside the array and not at the edge, so
array[5,10] returns nil.
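Jim Weirich's examples can be run directly against the question's array; this is a small illustration, not part of his explanation:

```ruby
array = [:peanut, :butter, :and, :jelly]

# Ask for "up to 10 elements" from each start near the end of the array.
results = (2..5).map { |start| [start, array[start, 10]] }
results.each { |start, slice| puts "array[#{start}, 10] => #{slice.inspect}" }
# array[2, 10] => [:and, :jelly]
# array[3, 10] => [:jelly]
# array[4, 10] => []
# array[5, 10] => nil
```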
Consider the following array:
>> array=["a","b","c"]
=> ["a", "b", "c"]
You can insert an item at the beginning (head) of the array by assigning it to a[0,0]. To put the element between "a" and "b", use a[1,0]. Basically, in the notation a[i,n], i represents an index and n a number of elements. When n=0, it defines a position between the elements of the array.
Now if you think about the end of the array, how can you append an item to its end using the notation described above? Simple, assign the value to a[3,0]. This is the tail of the array.
So, if you try to access the element at a[3,0], you will get []. In this case you are still in the range of the array. But if you try to access a[4,0], you'll get nil as return value, since you're not within the range of the array anymore.
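The insertion positions described above can be tried directly. This sketch is my own; each assignment works on a fresh copy of the three-element array so the results don't interfere with each other:

```ruby
base = ["a", "b", "c"]

head = base.dup
head[0, 0] = "x"   # before "a"
middle = base.dup
middle[1, 0] = "x" # between "a" and "b"
tail = base.dup
tail[3, 0] = "x"   # after "c": position 3 is the tail of the array

p head   # => ["x", "a", "b", "c"]
p middle # => ["a", "x", "b", "c"]
p tail   # => ["a", "b", "c", "x"]

# Reading a zero-length slice: position 3 is still in range, position 4 is not.
p base[3, 0] # => []
p base[4, 0] # => nil
```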
Read more about it at http://mybrainstormings.wordpress.com/2012/09/10/arrays-in-ruby/ .
tl;dr: in the source code in array.c, different functions are called depending on whether you pass 1 or 2 arguments in to Array#slice resulting in the unexpected return values.
(First off, I'd like to point out that I don't code in C, but have been using Ruby for years. So if you're not familiar with C, but you take a few minutes to familiarize yourself with the basics of functions and variables it's really not that hard to follow the Ruby source code, as demonstrated below. This answer is based on Ruby v2.3, but is more or less the same back to v1.9.)
Scenario #1
array.length == 4; array.slice(4) #=> nil
If you look at the source code for Array#slice (rb_ary_aref), you see that when only one argument is passed in (lines 1277-1289), rb_ary_entry is called, passing in the index value (which can be positive or negative).
rb_ary_entry then calculates the position of the requested element from the beginning of the array (in other words, if a negative index is passed in, it computes the positive equivalent) and then calls rb_ary_elt to get the requested element.
As expected, rb_ary_elt returns nil when the length of the array len is less than or equal to the index (here called offset), around line 1189:
if (offset < 0 || len <= offset) {
    return Qnil;
}
Scenario #2
array.length == 4; array.slice(4, 0) #=> []
However when 2 arguments are passed in (i.e. the starting index beg, and length of the slice len), rb_ary_subseq is called.
In rb_ary_subseq (around line 1208), if the starting index beg is greater than the array length alen, nil is returned:
long alen = RARRAY_LEN(ary);

if (beg > alen) return Qnil;
Otherwise the length of the resulting slice len is calculated, and if it's determined to be zero, an empty array is returned (around line 1213):
if (alen < len || alen < beg + len) {
    len = alen - beg;
}
klass = rb_obj_class(ary);
if (len == 0) return ary_new(klass, 0);
So since the starting index of 4 is not greater than array.length, an empty array is returned instead of the nil value that one might expect.
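To make the two code paths easier to compare side by side, here is a rough pure-Ruby model of them. `model_entry` and `model_subseq` are hypothetical names of my own; they only mirror the checks quoted above for non-negative arguments (the negative-index handling in the C source is left out) and are not how Ruby actually implements `Array#slice`.

```ruby
# Hypothetical model of the one-argument path (rb_ary_entry -> rb_ary_elt),
# for non-negative offsets only.
def model_entry(ary, offset)
  return nil if offset < 0 || ary.length <= offset # same test as rb_ary_elt
  ary.fetch(offset)
end

# Hypothetical model of rb_ary_subseq, for non-negative beg and len only.
def model_subseq(ary, beg, len)
  alen = ary.length
  return nil if beg > alen                           # start past the end: nil
  len = alen - beg if alen < len || alen < beg + len # clamp to what is left
  return [] if len == 0                              # empty slice: []
  (beg...beg + len).map { |i| ary.fetch(i) }
end

array = [:peanut, :butter, :and, :jelly]
p model_entry(array, 4)     # => nil  (Scenario #1)
p model_subseq(array, 4, 0) # => []   (Scenario #2)
p model_subseq(array, 5, 0) # => nil

# Compare the models with the real Array#[] on a grid of small arguments.
(0..6).each do |beg|
  raise "entry differs at #{beg}" unless model_entry(array, beg) == array[beg]
  (0..6).each do |len|
    unless model_subseq(array, beg, len) == array[beg, len]
      raise "subseq differs at #{beg}, #{len}"
    end
  end
end
```

The only difference between the two paths at index 4 is which comparison is used: `len <= offset` in the one-argument case versus `beg > alen` in the two-argument case.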
Question answered?
If the actual question here isn't "What code causes this to happen?", but rather, "Why did Matz do it this way?", well you'll just have to buy him a cup of coffee at the next RubyConf and ask him.

Resources