Inconsistent Ruby .split behavior [duplicate] - ruby

This question already has an answer here:
How do I avoid trailing empty items being removed when splitting strings?
(1 answer)
Closed 5 years ago.
Suppose I have this:
a = "|hello"
if I do:
a.split("|") #=> ["", "hello"]
Now say I have:
b = "hello|"
if I do:
b.split("|") #=> ["hello"]
Why is this happening? I expected the result to be ["hello", ""], similar to the first example. Is the split method working inconsistently, or is there something about its inner workings that I'm not aware of?

This behaviour is described in the documentation:
If the limit parameter is omitted, trailing null fields are suppressed.
If you want to keep the trailing empty string, pass a positive or negative limit, as the documentation suggests:
"hello|".split('|', 2)
#=> ["hello", ""]
"hello|||".split('|', -1)
#=> ["hello", "", "", ""]
Note
If negative, there is no limit to the number of fields returned, and trailing null fields are not suppressed.
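A small sketch of why a negative limit is usually the safer choice (the input "hello|world|" is only an illustration): a positive limit caps the number of fields, so a limit of 2 keeps the trailing empty string only when the string has exactly two fields.

```ruby
s = "hello|world|"

# No limit: trailing empty fields are suppressed
p s.split("|")      # ["hello", "world"]

# Positive limit: at most that many fields; the last one holds the rest of the string
p s.split("|", 2)   # ["hello", "world|"]

# Negative limit: no cap on the number of fields, and trailing empty fields are kept
p s.split("|", -1)  # ["hello", "world", ""]
```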

Related

is there a function to capitalize an obj accessed with str[i] in Ruby?

print str[i].upcase is not working, and I have to capitalize specific letters determined by an index. Can someone help me with this?
def mumble_letters
  str = nil
  print "Please write a string : "
  str = gets.to_str
  # puts str.length
  while str.length == 1
    print "Please write a string : "
    str = gets.to_str
  end
  for i in 0..str.length
    print str[i].upcase!
    i.times{ print str[i].capitalize}
    if i != str.length - 1
      print"-"
    end
  end
end
mumble_letters
the error I get is : undefined method `upcase' for nil:NilClass (NoMethodError)
Did you mean? case
Problem
str[i] returns a new one-character String, so str[i].upcase! upcases that copy rather than the character inside your original String. At least on Ruby 2.7.1, the original String object won't actually change until you assign the modified character back to the index you want changed. For example:
str[i] = str[i].upcase
However, the approach above won't work with frozen strings, which are fairly common in certain core methods, libraries, and frameworks. As a result, you may encounter the FrozenError exception with the index-assignment approach.
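A minimal sketch of both points, assuming Ruby 2.5 or later (where FrozenError exists); the variable names are illustrative only:

```ruby
str = String.new("foobar")   # String.new returns an unfrozen, mutable String

# String#[] with an Integer index returns a new one-character String
letter = str[3]
letter.upcase!               # changes only that new String
p letter                     # "B"
p str                        # "foobar" (original is unchanged)

# Assigning back through the index is what modifies the original
str[3] = str[3].upcase
p str                        # "fooBar"

# The same index assignment raises FrozenError on a frozen String
frozen = "foobar".freeze
error_class = nil
begin
  frozen[3] = "B"
rescue FrozenError => e
  error_class = e.class
  puts "#{e.class}: #{e.message}"
end
```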
Solution
There's more than one way to solve this, but one way is to:
split your String object into an Array of characters,
modify the letter at the desired indexes,
rejoin the characters into a single String, and then
re-assign the modified String to your original variable.
For example, showing some intermediate steps:
# convert String to Array of characters
str = "foobar"
chars = str.chars
# contents of your chars Array
chars
#=> ["f", "o", "o", "b", "a", "r"]
# - convert char in place at given index in Array
# - don't rely on the return value of the bang method
# to be a letter
# - safe navigation handles various nil-related errors
chars[3]&.upcase!
#=> "B"
# re-join Array of chars into String
chars.join
#=> "fooBar"
# re-assign to original variable
str = chars.join
str
#=> "fooBar"
If you want, you can perform the same operation on multiple indexes of your chars Array before re-joining the elements. That should yield the results you're looking for.
More concisely:
str = "foobar"
chars = str.chars
chars[3]&.upcase!
p str = chars.join
#=> "fooBar"
Personally, I find operating on an Array of characters more intuitive and easier to troubleshoot than making in-place changes through repeated assignments to indexes within the original String. Furthermore, it avoids exceptions raised when trying to modify indexes within a frozen String. However, your design choices may vary.
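To apply the Array-of-characters approach to several indexes at once, a sketch could wrap it in a small helper. The name upcase_at is hypothetical, not part of Ruby; it returns a new String, so it also works when the input is frozen:

```ruby
# Hypothetical helper: upcase the characters at several indexes in one pass
def upcase_at(str, indexes)
  chars = str.chars                      # new Array of new, unfrozen one-char Strings
  indexes.each { |i| chars[i]&.upcase! } # &. skips indexes past the end (nil)
  chars.join                             # build a fresh String; str itself is untouched
end

p upcase_at("foobar", [0, 3])          # "FooBar"
p upcase_at("foobar".freeze, [3, 10])  # "fooBar" (index 10 is out of range and skipped)
```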
str[i].upcase returns the upcased letter, but does not modify it in place. Assign it back to the string for it to work.
str = 'abcd'
str[2] = str[2].upcase #=> "C"
str #=> "abCd"
I can see two problems with your code...
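First, an empty string has a length of 0 (keep in mind that gets returns the line including its trailing newline, so call chomp on the input first), so what you wanted to write is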
while str.length == 0
Secondly, when you do...
for i in 0..str.length
You are iterating up to and INCLUDING the string length. If the string has five characters, its valid indexes are 0 through 4, but you are iterating from 0 through 5. str[5] doesn't exist, so it returns nil, and you cannot call upcase! on nil.
To handle that common situation, Ruby has the triple-dot (exclusive) range operator:
for i in 0...str.length
...which will stop at the integer before the length, which is what you want.
It's also more idiomatic Ruby to write:
(0...str.length).each do |i|
  # ...
end
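Combining the fixes from these answers, a sketch might look like the following. It assumes the intended output for "abcd" is "A-Bb-Ccc-Dddd" (each letter upcased, then repeated i more times in lowercase, joined with "-"); mumble is a hypothetical name, and it returns the String instead of printing so the result is easy to check. When reading the input, gets.chomp strips the trailing newline that gets keeps.

```ruby
# Hypothetical helper; assumes "abcd" should become "A-Bb-Ccc-Dddd"
def mumble(str)
  # 0...str.length excludes str.length, so str[i] is never nil inside the block
  (0...str.length).map do |i|
    # upcase/downcase (no bang) return new Strings; upcase! returns nil when nothing changes
    str[i].upcase + str[i].downcase * i
  end.join("-")
end

puts mumble("abcd")   # prints A-Bb-Ccc-Dddd
```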

What does the ruby ? method do? [duplicate]

This question already has answers here:
what is "?" in ruby
(3 answers)
Closed 3 years ago.
I ran across a code snippet today that used the ? operator to quote the next character. I have no idea where the documentation is for this method and really no idea what it's actually doing.
I've looked at the ruby docs, but haven't found it.
?1
=> "1"
?1"23abc"
=> "123abc"
? is not a method in this case but part of Ruby's literal syntax: in this context it starts a character literal.
Docs Excerpt:
There is also a character literal notation to represent single character strings, which syntax is a question mark (?) followed by a single character or escape sequence that corresponds to a single codepoint in the script encoding:
?a #=> "a"
?abc #=> SyntaxError
?\n #=> "\n"
?\s #=> " "
?\\ #=> "\\"
?\u{41} #=> "A"
?\C-a #=> "\x01"
?\M-a #=> "\xE1"
?\M-\C-a #=> "\x81"
?\C-\M-a #=> "\x81", same as above
?あ #=> "あ"
You have also found another fun little feature of the parser: two string literals placed side by side (with or without whitespace between them) are concatenated, e.g.
"1" "234"
#=> "1234"
"1""234"
#=> "1234"
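A short sketch tying both mechanisms together. Note that the side-by-side joining is done by the parser, so it only applies to literals; for Strings held in variables you need + or interpolation (the variable names here are illustrative):

```ruby
# ?1 is a character literal: it evaluates to the one-character String "1"
one = ?1
p one.class          # String
p one                # "1"

# A character literal followed by a string literal is joined by the parser
combined = ?1"23abc"
p combined           # "123abc"

# Adjacent-literal joining does not apply to variables; use + or interpolation
rest = "23abc"
p one + rest         # "123abc"
p "#{one}#{rest}"    # "123abc"
```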

Use regex to match only sequential (but not necessarily consecutive) matches [duplicate]

This question already has answers here:
regex - matching non-necessarily consecutive occurrences
(4 answers)
Closed 3 months ago.
I'm trying to match a regex with a string as long as possible. This is the string to look in:
"xxaxxbxxbxbxxbxxbxxbxxdxx"
The pattern to match is:
"bcda"
The pattern is to be interpreted as follows:
b: There are several of them in the string. The first one should match.
c: There isn't one in the string, so nothing is returned.
d: There is just one near the end of the string. It should be returned.
a: There is one at the beginning of the string. Since b, c, and d were sought first and results are returned, a will not be returned.
The expected return is:
"bd"
It may be that regex match is not the correct way to accomplish this, but I'd like to ask for assistance with one. The basic question is this: can I use regex to generically find a substring that represents as much of an ordered, but not necessarily consecutive, sequence of candidate characters as it possibly can? If so, how?
As @sawa explained, you cannot do this with a single regex. Here is a recursive solution.
def consecutive_matches(str, pattern)
  return '' if str.empty? || pattern.empty?
  ch, pat = pattern[0], pattern[1..-1]
  i = str.index(ch)
  if i
    ch + consecutive_matches(str[i+1..-1], pat)
  else
    consecutive_matches(str, pat)
  end
end
str = "xxaxxbxxbxbxxbxxbxxbxxdxx"
consecutive_matches(str, "bcda") #=> "bd"
consecutive_matches(str, "abcd") #=> "abd"
consecutive_matches(str, "dabc") #=> "d"
consecutive_matches(str, "cfgh") #=> ""
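The same greedy left-to-right scan can also be written without recursion by remembering where the previous match ended and passing that offset to String#index. This is a sketch; ordered_matches is a hypothetical name, and it is intended to return the same results as consecutive_matches for the four examples above:

```ruby
# Iterative sketch of the same greedy scan (hypothetical name)
def ordered_matches(str, pattern)
  result = +""   # unary plus: a mutable String even with frozen string literals
  pos = 0        # search only to the right of the previous match
  pattern.each_char do |ch|
    i = str.index(ch, pos)
    next unless i      # character not found further right: skip it
    result << ch
    pos = i + 1
  end
  result
end

str = "xxaxxbxxbxbxxbxxbxxbxxdxx"
p ordered_matches(str, "bcda")  # "bd"
p ordered_matches(str, "abcd")  # "abd"
```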
It is impossible to do that with a single regex match. A capture in a regex must be a substring of the original string. bd here is not, so there is no way to match that as a single capture.

Why is splitting strings inconsistent?

Examine:
"test.one.two".split(".") # => ["test", "one", "two"]
Right, perfect. Exactly what we want.
"test..two".split(".") # => ["test", "", "two"]
I replaced one with the empty string, so that makes sense.
"test".split(".") # => ["test"]
That's what I would expect, no problems here.
".test".split(".") # => ["", "test"]
Yep, my string has one . so I got two sections as a result.
"test.".split(".") # => ["test"]
What? There's a . in my string, it should have been split into two sections. I didn't ask to get rid of empty strings; it didn't get rid of empty strings back in tests 2 or 4.
I would have expected ["test", ""]
"".split(".") # => []
WHAT? This should operate almost exactly like test 3 and return [""]. But now I can't call any String methods on result[0].
Why is this inconsistent for splits that occur on the edges, or for the empty string?
The documentation explains this well: http://ruby-doc.org/core-2.2.0/String.html#method-i-split
If the limit parameter is omitted, trailing null fields are suppressed. If limit is a positive number, at most that number of fields will be returned (if limit is 1, the entire string is returned as the only entry in an array). If negative, there is no limit to the number of fields returned, and trailing null fields are not suppressed.
So, this does what you'd expect:
"test.".split(".", -1)
=> ["test", ""]
The rest is there in the docs.
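Running all six cases from the question with and without a limit of -1 makes the rule visible. Note that an empty String still splits to [] even with -1, because it has no fields at all; if you need result[0] to always be a String, a small wrapper (split_fields is a hypothetical name, not a Ruby method) can handle that case:

```ruby
cases = ["test.one.two", "test..two", "test", ".test", "test.", ""]

cases.each do |s|
  # Without a limit, trailing empty fields are dropped; with -1 they are kept
  puts "#{s.inspect}: #{s.split('.').inspect} vs #{s.split('.', -1).inspect}"
end

# Hypothetical helper: keeps trailing empty fields and never returns [],
# so result[0] is always a String
def split_fields(s, sep)
  fields = s.split(sep, -1)
  fields.empty? ? [""] : fields
end

p split_fields("test.", ".")  # ["test", ""]
p split_fields("", ".")       # [""]
```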

How does this regexp split on the first vowel?

This code splits a word into two strings at the first vowel. Why?
word = "banana"
parts = word.split(/([aeiou].*)/)
The key here is the regular expression (or regex) between the two slashes:
[aeiou] matches any one of those characters; because the regex is tried from left to right, the first match begins at the first vowel.
. matches any single character
* modifies the previous thing to mean match 0 or more of it
(...) means capture everything enclosed between the parentheses
Translated to English, this regular expression might read something like "Given a string, find the first vowel that is followed by zero or more characters. Collect that vowel and its following characters and set them aside."
The slightly more confusing part is how the regex interacts with the split method. The text the regex matches is 'anana', yet calling split with the string 'anana' doesn't give the same result:
'banana'.split('anana') #=> ["b"]
But when split is called with a regular expression that contains a capture group (parentheses, (...)), anything matched by that capture group is also included in the result of the split. That is why:
'banana'.split /([aeiou].*)/ #=> ["b", "anana"]
If you want to learn more about how regular expressions work (particularly in ruby), Rubular is a great resource to fiddle with - http://www.rubular.com/r/XEUgPhOdlH
This is actually a bit tricky. This regexp
/[aeiou].*/
matches the string from the first vowel to the end of the string i.e. "anana". But if you were to split on that, you would only get the first letter since split doesn't include the splitting pattern:
"banana".split /[aeiou].*/
# ["b"]
But according to the String#split docs, if the splitting pattern is a regexp with a capture group, the capture groups are included in the result as well. Since the whole pattern is wrapped in a capture group, the result is that the string splits before the first vowel.
For example, if you change the regexp to have two capture groups, it splits further:
"banana".split /([aeiou])(.*)/
# ["b", "a", "nana"]
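A few edge cases help confirm this reading of the capture-group rule (the words "apple" and "rhythm" are just illustrations). String#partition is shown for comparison as a different method that always returns exactly three parts:

```ruby
pattern = /([aeiou].*)/

# The text matched by the capture group is added to the result
p "banana".split(pattern)  # ["b", "anana"]

# Word that starts with a vowel: the field before the match is an empty String
p "apple".split(pattern)   # ["", "apple"]

# No vowel: nothing matches, so the whole word is the only element
p "rhythm".split(pattern)  # ["rhythm"]

# For comparison, String#partition returns [before, match, after]
p "banana".partition(/[aeiou]/)  # ["b", "a", "nana"]
```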
ANSWER FOR OLD TITLE
It's not really Ruby syntax; it's standard regular expression syntax, which Ruby also implements.
* means zero or more of previous item
. means any character
[aeiou] means any one of the characters inside the brackets
() means capture it
So that regex means: capture everything from the first a, e, i, o, or u to the end of the string.
word.split(/([aeiou].*)/) means: split the word variable on anything that starts with the letter a, e, i, o, or u.
See here for more information.
ANSWER FOR NEW TITLE
Why does it split on the first vowel? That's not quite what happens. What it does is split on anything that starts with a vowel and also capture that part (the text starting at the vowel). Here are more examples:
word = 'banana'
word.split /[aeiou]/ # split by vowels
#=> ["b", "n", "n"]
word.split /([aeiou])/ # split by vowels and capture the vowels
#=> ["b", "a", "n", "a", "n", "a"]
word.split /[aeiou].*/ # split on anything that starts with a vowel
#=> ["b"]
word.split /([aeiou].*)/ # split on anything that starts with a vowel, and also capture that part
#=> ["b", "anana"]
ANSWER FOR OLD TITLE
If the * symbol appears outside a regular expression literal (//), it has other meanings in Ruby, for example:
multiplication 2 * 3 == 6, 'na' * 3 == 'nanana' # batman!
splat operation [*(1..4)] == [1,2,3,4], see more info here
