Ruby regexp returning "" vs nil - ruby

What is the reason behind different results between the following regexp statements:
"abbcccddddeeee"[/z*/] # => ""
And these that return nil:
"some matching content"[/missing/] # => nil
"start end"[/\Aend/] # => nil

What's happening is that /z*/ matches zero or more occurrences of z, so it can match the empty string.
If you use /z+/, which requires one or more, you'll see it returns nil as expected.

The regular expression /z*/ matches 0 or more z characters, so it also matches an empty string at the beginning of your string. Consider this:
"abbcccddddeeee" =~ /z*/
# => 0
Thus String#[] returns the matched empty string.
In your second example the expressions /missing/ and /\Aend/ don't match anything so nil is returned.

The * quantifier stands for 0 or more matches, so even if z is not present you get an empty-string match. On the other hand, you can use + for 1 or more matches and ? for zero or one match.
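To make the difference between the three quantifiers concrete, here is a minimal sketch. The helper name quantifier_results is hypothetical and exists only for this illustration. Note that the match is always the leftmost one, so /z*/ returns "" even when the string contains z characters further along.

```ruby
# Hypothetical helper (not part of Ruby): collects what String#[] returns
# for each quantifier applied to the character "z".
def quantifier_results(str)
  {
    star:     str[/z*/], # zero or more: can match the empty string
    plus:     str[/z+/], # one or more: needs at least one "z"
    optional: str[/z?/]  # zero or one: can also match the empty string
  }
end

no_z   = quantifier_results("abbcccddddeeee")
# no_z[:star] is "", no_z[:plus] is nil, no_z[:optional] is ""

late_z = quantifier_results("abzzz")
# late_z[:star] is "" (empty match at index 0 wins), late_z[:plus] is "zzz"

p no_z
p late_z
```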

Related

Ruby : get value between parenthesis (without those parenthesis)

I have strings like "untitled", "untitled(1)" or "untitled(2)".
I want to get the last integer value between parentheses, if there is one. So far I have tried a lot of regexes; the ones that make sense to me (I am new to regex) look like this:
number= string[/\([0-9]+\)/]
number= string[/\(([0-9]+)\)/]
but these still return the value together with the parentheses.
If there is no pair of opening and closing parentheses, getting an empty string (or nil) would be nice. For a case such as "untitled($)", getting the '$' char, nil, or an empty string would do the trick. For "untitled(3) animal(4)", I want to get 4.
I have looked at a lot of topics about how to do that, but it never seems to work. What am I missing here?
/(?<=\()\w+(?=\)$)/ matches one or more word characters (letter, number, underscore) within parenthesis, right before the end of line:
words = %w[
untitled
untitled(1)
untitled(2)
untitled(foo)
unti(tle)d
]
p words.map { |word| word[/(?<=\()\w+(?=\)$)/] }
# => [nil, "1", "2", "foo", nil]
When you use a Regexp as the argument to String#[], you can optionally pass the index of a capture group to extract that group. The documentation says:
If a Regexp is supplied, the matching portion of the string is returned. If a capture, which may be a capture group index or name, follows the regular expression, that component of the MatchData is returned instead.
string = "untitled(1)"
number = string[/\(([0-9]+)\)/, 1]
puts number
#=> 1
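For the "untitled(3) animal(4)" case from the question, one possible approach is to collect every parenthesized number with String#scan and take the last one. This is only a sketch: the helper name last_parenthesized_number is hypothetical, and it assumes that only digits count as a value (so "untitled($)" gives nil).

```ruby
# Hypothetical helper: returns the digits inside the last "(...)" pair,
# or nil when no parenthesized number is present.
def last_parenthesized_number(string)
  # scan returns one array per match, each holding the captured group
  string.scan(/\((\d+)\)/).flatten.last
end

p last_parenthesized_number("untitled(3) animal(4)") # "4"
p last_parenthesized_number("untitled")              # nil
p last_parenthesized_number("untitled($)")           # nil

# String#[] also accepts a capture group name instead of an index.
p "untitled(1)"[/\((?<num>\d+)\)/, "num"]            # "1"
```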

Why does this evaluate to false? "S" == /[S]/ => 0

I'm a ruby newbie and I'm having trouble understanding why "S" == /[S]/ evaluates to false.
Can anyone explain this? I've also tried
"S" =~ /[S]/
#=> 0
"S" =~ /[^S]/
#=> nil
which is baffling to me.
"S" == /[S]/ is false because == in Ruby doesn't evaluate whether a regexp matches, it just determines whether two objects are equal. The string "S" and the regexp /[S]/ are of course completely different objects and not equal.
=~ (which is a correct way to match a regexp against a string in Ruby) returns the match position. In your first example the match position is the beginning of the string, 0. In the second example there is no match, so =~ returns nil.
"S" == /[S]/
(Almost) everything in Ruby is an object. In this case you are checking for equality between an instance of String, "S", and an instance of Regexp, /[S]/. They are two different objects, hence the expression returns false. Rather than checking for equality with ==, you should use =~
"S" =~ /[S]/
When you use the match operator =~, it returns the index of the first match found in the string. Remember that indexing in Ruby starts from 0. In your example the first character in the provided string is matched. The first character has index 0, and that is what the statement returns.
"S" =~ /[^S]/
By using a caret ^ you are telling Ruby to match anything but what is between the square brackets (this is only true inside square brackets; outside [], ^ indicates the beginning of a line). In your case it is anything but S. Ruby does not find a match and returns nil.
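The comparison and matching operators can be put side by side in a short sketch. The helper name compare is hypothetical; Regexp#match? is assumed to be available, which is the case on Ruby 2.4 and later.

```ruby
# Hypothetical helper: shows equality, match position, and a plain
# true/false match result for the same string and pattern.
def compare(str, pattern)
  {
    equal:   str == pattern,      # object equality, never a regexp match
    index:   str =~ pattern,      # index of the first match, or nil
    matched: pattern.match?(str)  # true or false (Ruby 2.4+)
  }
end

hit  = compare("S", /[S]/)
# hit[:equal] is false, hit[:index] is 0, hit[:matched] is true

miss = compare("S", /[^S]/)
# miss[:equal] is false, miss[:index] is nil, miss[:matched] is false

p hit
p miss
```

Because 0 is truthy in Ruby, the result of =~ can be used directly in a condition even when the match is at the start of the string.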

How to do with the string method `[]` in ruby

2.0.0p247 :069 > str[str.length].class
=> NilClass
2.0.0p247 :071 > str[str.length, 1].class
=> String
2.0.0p247 :072 > str[str.length, 2].class
=> String
2.0.0p247 :073 > str[str.length+ 1, 2].class
=> NilClass
The first line returns NilClass, while the second line returns String. The Ruby method String#[n] returns a single-character string, and String#[m, n] returns a substring of the string. Does that mean the single-character string is different from the substrings?
Does that mean the single-character string is different from the substrings?
No. It means that String#[] behaves differently depending on the arguments passed to it.
You are trying to access past the last character of the string.
str[str.length]
returns nil because there is no character there.
The documentation states:
Returns nil if the initial index falls outside the string or the length is negative.
str[-1]
returns the last character, and...
str[-1].class
returns String.
Similarly...
str[str.length, 1]
returns the empty string "".
Again, the documentation states (emphasis mine):
If passed a start index and a length, returns a substring containing length characters starting at the index.
Since there are no more characters past the end of str, this substring is empty.
Follow the code below :
s = "abc"
s[s.size] # => nil
s[s.size,1] # => ""
s.size # => 3
Documentation of String#[]:
Element Reference:
(a) If passed a single index, returns a substring of one character at that index.
(b) If passed a start index and a length, returns a substring containing length characters starting at the index.
This applies to both of the above: if an index is negative, it is counted from the end of the string. For the start and range cases the starting index is just before a character and an index matching the string's size.
(c) Additionally, an empty string is returned when the starting index for a character range is at the end of the string.
(d) Returns nil if the initial index falls outside the string or the length is negative.
Why s[s.size] # => nil ?
Because there is no character at index 3, so it returns nil (applying rule a). Rule a says: return the character at the specified index if present, or nil if not found.
Why s[s.size,1] # => "" ?
Because this falls directly under rule c.
Why s[s.size+1,1] # => nil ?
Because rule d says so.
That said, nil is an instance of NilClass and the empty string '' is an instance of String. Thus everything you got is valid.
s = "abc"
s[s.size].class # => NilClass
s[s.size,1].class # => String
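The boundary rules can be checked in one place with a small sketch. The helper name boundary_results is hypothetical; the calls themselves are plain String#[] with the argument forms discussed above.

```ruby
# Hypothetical helper: probes String#[] at and around the end of a string.
def boundary_results(s)
  n = s.size
  {
    at_end:          s[n],        # single index at the end: nil
    at_end_len:      s[n, 1],     # start at the end plus a length: ""
    past_end_len:    s[n + 1, 1], # start beyond the end: nil
    last:            s[-1],       # negative index counts from the end
    negative_length: s[0, -1]     # negative length: nil
  }
end

r = boundary_results("abc")
# r[:at_end] is nil, r[:at_end_len] is "", r[:past_end_len] is nil,
# r[:last] is "c", r[:negative_length] is nil
p r
```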

String containment

Is there a way to check in Ruby whether the string "1:/2" is contained within a larger string str, besides iterating over all positions of str?
You can use the include? method
str = "wdadwada1:/2wwedaw"
# => "wdadwada1:/2wwedaw"
str.include? "1:/2"
# => true
A regular expression will do that.
s =~ /1:\/2/
This will return either nil if s does not contain the string, or the integer position if it does. Since nil is falsy and an integer is truthy, you can use this expression in an if statement:
if s =~ /1:\/2/
...
end
The regular expression is normally delimited by /, which is why the slash within the regular expression is escaped as \/
It is possible to use a different delimiter to avoid having to escape the /:
s =~ %r"1:/2"
You could use other characters than " with this syntax, if you want.
The simplest and most straightforward way is to ask the string whether it contains the substring:
"...the string 1:/2 is contained..."['1:/2']
# => "1:/2"
!!"...the string 1:/2 is contained..."['1:/2']
# => true
The documentation has the full scoop; Look at the last two examples.
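The approaches above can be compared in a short sketch. The helper names contains? and position_of are hypothetical. Regexp.escape is used here on the assumption that the substring might come from user input and could contain regexp metacharacters, which is not something the original question states.

```ruby
# Hypothetical helper: plain substring test, no regexp involved.
def contains?(haystack, needle)
  haystack.include?(needle)
end

# Hypothetical helper: position of the substring via a regexp built from
# the escaped needle, or nil if it is absent.
def position_of(haystack, needle)
  haystack =~ Regexp.new(Regexp.escape(needle))
end

str = "wdadwada1:/2wwedaw"
p contains?(str, "1:/2")    # true
p position_of(str, "1:/2")  # 8
p str.index("1:/2")         # 8, String#index gives the same position
p position_of(str, "3:/4")  # nil
```

If only a yes/no answer is needed, include? is probably the clearest choice; the regexp forms are mainly useful when the position or a pattern is required.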

Regex for matching capitals

def normalized?
matches = match(/[^A-Z]*/)
return matches.size == 0
end
This is my function operating on a string, checking whether the string contains only uppercase letters. It works fine for ruling out non-matches, but when I call it on a string like "ABC" it says no match, because apparently matches.size is 1 and not zero. There seems to be an empty element in it.
Can anybody explain why?
Your regex is wrong - if you want it to match ONLY uppercase strings, use /^[A-Z]+$/.
Your regular expression is incorrect. /[^A-Z]*/ means "match zero or more characters that are not between A and Z, anywhere in the string". The string ABC has zero characters that are not between A and Z, so it matches the regular expression.
Change your regular expression to /^[^A-Z]+$/. This means "match one or more characters that are not between A and Z, and make sure that every character between the beginning and end of the string is not between A and Z". Then the string ABC will not match, and you can check matches[0].size or similar, as per sepp2k's answer.
MatchData#size returns the number of capturing groups in the regex plus one, so that md[i] will access a valid group iff i < md.size. So the value returned by size only depends on the regex, not the matched string, and will never be 0.
You want matches.to_s.size or matches[0].size.
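A small sketch shows why MatchData#size is not the length of the matched text. The helper name match_summary is hypothetical.

```ruby
# Hypothetical helper: reports the MatchData size next to the matched text.
def match_summary(str, pattern)
  md = str.match(pattern)
  return nil unless md # String#match returns nil when nothing matches

  {
    size:        md.size,    # number of capture groups plus one
    text:        md[0],      # the whole matched text
    text_length: md[0].size  # length of the matched text
  }
end

p match_summary("ABC", /[^A-Z]*/) # size 1, text "", text_length 0
p match_summary("ab", /(a)(b)/)   # size 3, text "ab", text_length 2
p match_summary("ABC", /[^A-Z]+/) # nil, no match at all
```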
ruby-1.9.2-p180> def normalized? s
ruby-1.9.2-p180?> s.match(/^[[:upper:]]+$/) ? true : false
ruby-1.9.2-p180?> end
=> nil
ruby-1.9.2-p180> normalized? "asdf"
=> false
ruby-1.9.2-p180> normalized? "ASDF"
=> true
The * in your regular expression means that it matches any number of non-uppercase characters, including zero. So it always matches, even if only the empty string. The fix is to remove the *; then it will fail to match a string containing only uppercase characters. (Although you would need a different test if zero-length strings are not permitted.)
If you want to know that the input string entirely consists of English uppercase letters, i.e. A-Z, then you must remove the Kleene Star as it will match before and after every single character in any input string (zero length match). The statement !s[/[^A-Z]/] tells you if there's no match of non-A-to-Z characters:
irb(main):001:0> def normalized? s
irb(main):002:1> return !s[/[^A-Z]/]
irb(main):003:1> end
=> nil
irb(main):004:0> normalized? "ABC"
=> true
irb(main):005:0> normalized? "AbC"
=> false
irb(main):006:0> normalized? ""
=> true
irb(main):007:0> normalized? "abc"
=> false
Of the variants below, only one regular expression matches exactly the strings that consist entirely of capitals:
def onlyupper(s)
(s =~ /^[A-Z]+$/) != nil
end
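One caveat worth illustrating: in Ruby, ^ and $ anchor to the start and end of a line, while \A and \z anchor to the start and end of the whole string. The sketch below, with hypothetical helper names, shows how the two differ on multi-line input. Whether multi-line input matters depends on where the strings come from, which the question does not say.

```ruby
# Hypothetical helper: line anchors, true if ANY line is all capitals.
def all_caps_line?(s)
  s.match?(/^[A-Z]+$/)
end

# Hypothetical helper: string anchors, true only if the WHOLE string
# is all capitals.
def all_caps_string?(s)
  s.match?(/\A[A-Z]+\z/)
end

p all_caps_line?("ABC")        # true
p all_caps_string?("ABC")      # true
p all_caps_line?("ABC\nabc")   # true, the first line alone matches
p all_caps_string?("ABC\nabc") # false
```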
Truth table:
/[^A-Z]*/:
Testing 'asdf' matched 'asdf' length 4
Testing 'HHH' matched '' length 0
Testing '' matched '' length 0
Testing '-=AAA' matched '-=' length 2
--------
/[^A-Z]+/:
Testing 'asdf' matched 'asdf' length 4
Testing 'HHH' matched nil
Testing '' matched nil
Testing '-=AAA' matched '-=' length 2
--------
/^[^A-Z]*$/:
Testing 'asdf' matched 'asdf' length 4
Testing 'HHH' matched nil
Testing '' matched '' length 0
Testing '-=AAA' matched nil
--------
/^[^A-Z]+$/:
Testing 'asdf' matched 'asdf' length 4
Testing 'HHH' matched nil
Testing '' matched nil
Testing '-=AAA' matched nil
--------
/^[A-Z]*$/:
Testing 'asdf' matched nil
Testing 'HHH' matched 'HHH' length 3
Testing '' matched '' length 0
Testing '-=AAA' matched nil
--------
/^[A-Z]+$/:
Testing 'asdf' matched nil
Testing 'HHH' matched 'HHH' length 3
Testing '' matched nil
Testing '-=AAA' matched nil
--------
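A table like the one above can be regenerated with a few lines of Ruby. This is a sketch of how such output might be produced, not necessarily the script the answer used; the helper names table_row and truth_table are hypothetical.

```ruby
# Hypothetical helper: formats one row in the style of the table above.
def table_row(input, pattern)
  matched = input[pattern] # String#[] returns the matched text or nil
  if matched.nil?
    "Testing '#{input}' matched nil"
  else
    "Testing '#{input}' matched '#{matched}' length #{matched.length}"
  end
end

# Hypothetical helper: builds all lines for every pattern and input.
def truth_table(patterns, inputs)
  patterns.flat_map do |pattern|
    ["#{pattern.inspect}:"] +
      inputs.map { |input| table_row(input, pattern) } +
      ["--------"]
  end
end

patterns = [/[^A-Z]*/, /[^A-Z]+/, /^[^A-Z]*$/, /^[^A-Z]+$/, /^[A-Z]*$/, /^[A-Z]+$/]
inputs   = ["asdf", "HHH", "", "-=AAA"]
puts truth_table(patterns, inputs)
```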
This question needs a clearer answer. tchrist pointed this out in a comment, and I wish he had posted it as an answer. The "Regex for matching capitals" is:
/\p{Uppercase}/
As tchrist mentions "is distinct from the general category \p{Uppercase_Letter} aka \p{Lu}. That’s because there exist non-Letters that count as Uppercase"

Resources