How does this Ruby code work - if statement with ranges?

I'm currently learning Ruby and I can't seem to wrap my head around what if /start/../end/ does... Help?
while gets
print if /start/../end/
end

Since you mentioned that you're new to Ruby, it's first worth taking note that you're dealing with Regular Expressions (regex) in the example - anything that is delimited between two forward slashes:
/start/ # a regular expression literal
Regular Expressions are a powerful way of matching a certain combination of letters from a larger string.
"To start means to begin." =~ /start/ #=> 3 (truthy), the index where 'start' begins in the string.
The double-dot notation here is the flip-flop operator, a controversial construct probably inherited from Perl and usually discouraged because it can lead to confusion.
It means the following:
The whole expression evaluates to false until the left-hand operand is true, at which point it evaluates to true. It stays true up to and including the evaluation where the right-hand operand is true; after that it evaluates to false again.
Using your above example therefore:
while gets
print if /start/../end/
end
Until a line containing 'start' is entered, the entire expression is false, and nothing is printed.
When a line containing 'start' is input, the entire expression becomes true, so that line and EVERYTHING input after it is printed (even lines that do not contain 'start').
The line containing 'end' is the last one printed: once it has been seen, the expression goes back to false and nothing after it is printed, at least until another 'start' line comes along.
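To try the behavior without typing into standard input, here is a minimal sketch that runs the same flip-flop over an array. The helper name flip_flop_lines and the sample lines are made up for illustration, and the tests are written out as line =~ /.../ instead of relying on the implicit match against the last line read by gets.

```ruby
# Collect the lines that a `print if /start/../end/` loop would print.
# `flip_flop_lines` is a hypothetical helper, not part of Ruby itself.
def flip_flop_lines(lines)
  printed = []
  lines.each do |line|
    # The flip-flop switches on when the left test passes and switches
    # off after the right test passes, so both boundary lines are kept.
    printed << line if (line =~ /start/)..(line =~ /end/)
  end
  printed
end

p flip_flop_lines(["intro", "start here", "middle", "the end", "outro"])
# => ["start here", "middle", "the end"]
```

Note that both the 'start' line and the 'end' line show up in the result.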

It's called the flip-flop operator. You can read more at "Ruby flip-flop operator".

Related

Find exact word in string and not partial

I have the following string
str = "feminino blue"
I need to know if there is a string called "mini" inside this string.
When I use the include? method, the return is true because "feMINIno" contains "mini".
Is there a way to search for the exact word that is passed as param?
Thanks
Sounds like a use case for regular expressions, which can match all kinds of more complex string patterns. You can read through that page for all the specifics (and it's very valuable to learn, not just as a Ruby concept; Regexes are used in almost every modern language), but this should cover your use case.
/\bmini\b/ =~ str
\b means "match a word boundary", so exactly one of the things to the left or right should be a word character (a letter, digit, or underscore) and the other side should not (e.g. it is whitespace, punctuation, or the beginning/end of the string).
This will return nil if there's no match or the index of the match if there is one. Since nil is falsy and all numbers are truthy, this return value is safe to use in an if statement if all you need is a yes/no answer.
If the string you're working with is not constant and is instead in a variable called, say, my_word, you can interpolate it.
/\b#{Regexp.quote(my_word)}\b/ =~ str
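Putting the two pieces together, a small sketch might look like the following. The method name contains_word? is invented for this example; Regexp.escape is the same method as Regexp.quote, and match? returns a plain true/false instead of an index.

```ruby
# Hypothetical helper: true only when `word` appears as a whole word in `text`.
def contains_word?(text, word)
  # Escape the word so characters like "." or "+" are matched literally.
  /\b#{Regexp.escape(word)}\b/.match?(text)
end

str = "feminino blue"
p str.include?("mini")          # => true  (substring match)
p contains_word?(str, "mini")   # => false (no word boundary around "mini")
p contains_word?(str, "blue")   # => true
```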

What does /anystring/ mean in ruby?

I came across this: /sera/ === coursera. What does /sera/ mean? Please tell me. I do not understand the meaning of the expression above.
It's a regular expression. The more formal version of same is this:
coursera.match(/sera/)
Or:
/sera/.match(coursera)
These are both functionally similar. Either a string matches a regular expression, or a regular expression can be tested for matches against a string.
The long explanation of your original code is: can the characters sera be found in the string held in the variable coursera?
If you do this:
"coursera".match(/sera/)
# => #<MatchData "sera">
You get a MatchData result which means it matched. For more complicated expressions you can capture parts of the string using arbitrary patterns and so on. The general rule here is regular expressions in Ruby look like /.../ or vaguely like %r[...] in form.
You may also see the =~ operator used which is something Ruby inherited from Perl. It also means match.
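As a quick side-by-side, here is a sketch of the three spellings, assuming coursera is a variable holding the string "coursera" (the original snippet does not show its value). They differ only in what they return.

```ruby
coursera = "coursera"   # assumed value; the question never shows it

case_eq = (/sera/ === coursera)    # true or false
data    = coursera.match(/sera/)   # MatchData, or nil when nothing matches
index   = (coursera =~ /sera/)     # index of the match, or nil

p case_eq   # => true
p data[0]   # => "sera"
p index     # => 4
p(/xyz/ === coursera)  # => false
```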

Ruby regular expression utilizing OR within a character class

While going through the ruby-doc for regular expressions, I came across this example for implementing the && operator:
/[a-w&&[^c-g]z]/ # ([a-w] AND ([^c-g] OR z))
# This is equivalent to:
/[abh-w]/
I understand that
/[a-w&&[^c-g]]/
would equate to
/[abh-w]/
because the "^" denotes symbols that should be excluded from the regular expression.
However, I am wondering about why "z" is not also included? Why was the equivalent regular expression NOT:
/[abh-wz]/
I am very new to regular expressions, much less any specifics for regular expressions within Ruby, so any help is greatly appreciated!
The page explicitly says:
/[a-w&&[^c-g]z]/ # ([a-w] AND ([^c-g] OR z))
# This is equivalent to:
/[abh-w]/
"z" is not included in the left "AND" term, so it can't be matched.
See: "All things that are both apples, and also either apples or pears, at the same time" does not include pears. Only apples are both apples and (apples or pears). Likewise, a is in both a-w and [^c-g]z, so it matches; z is not in the left side, so "AND" is not satisfied, thus the whole expression fails.
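One way to convince yourself is to test every lowercase letter against both patterns. This is only a sketch; the variable names are made up, and depending on the Ruby version and warning settings the first pattern may print a warning about a duplicated range, since z is already covered by [^c-g].

```ruby
intersection = /[a-w&&[^c-g]z]/   # pattern from the docs
simplified   = /[abh-w]/          # claimed equivalent

letters = ("a".."z").to_a
by_intersection = letters.select { |c| intersection.match?(c) }
by_simplified   = letters.select { |c| simplified.match?(c) }

p by_intersection == by_simplified   # => true
p by_intersection.include?("z")      # => false, z fails the a-w side of &&
```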

How do look-ahead and look-behind relate to zero-width assertions in Ruby regex?

I just went through the concept of zero-width assertions in the documentation, and some quick questions come to mind:
Why the name "zero-width assertions"?
How do the look-ahead and look-behind concepts support this zero-width assertions concept?
What do the four symbols ?=s, ?!s, ?<=s and ?<!s instruct inside the pattern? Can you help me understand what is actually going on?
I also tried some tiny snippets to understand the logic, but I'm not that confident about their output:
irb(main):001:0> "foresight".sub(/(?!s)ight/, 'ee')
=> "foresee"
irb(main):002:0> "foresight".sub(/(?=s)ight/, 'ee')
=> "foresight"
irb(main):003:0> "foresight".sub(/(?<=s)ight/, 'ee')
=> "foresee"
irb(main):004:0> "foresight".sub(/(?<!s)ight/, 'ee')
=> "foresight"
Can anyone help me here to understand?
EDIT
Here I have tried two snippets, one using the "Zero-Width Assertions" concept, as below:
irb(main):002:0> "foresight".sub(/(?!s)ight/, 'ee')
=> "foresee"
and the other is without "Zero-Width Assertions" concepts as below:
irb(main):003:0> "foresight".sub(/ight/, 'ee')
=> "foresee"
Both of the above produce the same output. Internally, how does each regexp move through the string to produce it? Could you help me visualize this?
Thanks
Regular expressions match from left to right, and move a sort of "cursor" along the string as they go. If your regex contains a regular character like a, this means: "if there's a letter a in front of the cursor, move the cursor ahead one character, and keep going. Otherwise, something's wrong; back up and try something else." So you might say that a has a "width" of one character.
A "zero-width assertion" is just that: it asserts something about the string (i.e., doesn't match if some condition doesn't hold), but it doesn't move the cursor forwards, because its "width" is zero.
You're probably already familiar with some simpler zero-width assertions, like ^ and $. These match the start and end of a line (in Ruby, \A and \z are the anchors for the start and end of the whole string). If the cursor isn't at the start or end when it sees those symbols, the regex engine will fail, back up, and try something else. But they don't actually move the cursor forwards, because they don't match characters; they only check where the cursor is.
Lookahead and lookbehind work the same way. When the regex engine tries to match them, it checks around the cursor to see if the right pattern is ahead of or behind it, but in case of a match, it doesn't move the cursor.
Consider:
/(?=foo)foo/.match 'foo'
This will match! The regex engine goes like this:
Start at the beginning of the string: |foo.
The first part of the regex is (?=foo). This means: only match if foo appears after the cursor. Does it? Well, yes, so we can proceed. But the cursor doesn't move, because this is zero-width. We still have |foo.
Next is f. Is there an f in front of the cursor? Yes, so proceed, and move the cursor past the f: f|oo.
Next is o. Is there an o in front of the cursor? Yes, so proceed, and move the cursor past the o: fo|o.
Same thing again, bringing us to foo|.
We reached the end of the regex, and nothing failed, so the pattern matches.
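The walk-through above can be checked by looking at where the match starts and ends. In this small sketch, the lookahead on its own produces an empty match, which is what "zero-width" means in practice.

```ruby
m = /(?=foo)foo/.match("foo")
p m[0]        # => "foo"  (only the plain foo consumed characters)
p m.begin(0)  # => 0
p m.end(0)    # => 3

lookahead_only = /(?=foo)/.match("foo")
p lookahead_only[0]      # => ""  (asserted, but consumed nothing)
p lookahead_only.end(0)  # => 0   (the cursor never moved)
```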
On your four assertions in particular:
(?=...) is "lookahead"; it asserts that ... does appear after the cursor.
1.9.3p125 :002 > 'jump june'.gsub(/ju(?=m)/, 'slu')
=> "slump june"
The "ju" in "jump" matches because an "m" comes next. But the "ju" in "june" doesn't have an "m" next, so it's left alone.
Since it doesn't move the cursor, you have to be careful when putting anything after it. (?=a)b will never match anything, because it checks that the next character is a, then also checks that the same character is b, which is impossible.
(?<=...) is "lookbehind"; it asserts that ... does appear before the cursor.
1.9.3p125 :002 > 'four flour'.gsub(/(?<=f)our/, 'ive')
=> "five flour"
The "our" in "four" matches because there's an "f" immediately before it, but the "our" in "flour" has an "l" immediately before it so it doesn't match.
Like above, you have to be careful with what you put before it. a(?<=b) will never match, because it checks that the next character is a, moves the cursor, then checks that the previous character was b.
(?!...) is "negative lookahead"; it asserts that ... does not appear after the cursor.
1.9.3p125 :003 > 'child children'.gsub(/child(?!ren)/, 'kid')
=> "kid children"
"child" matches, because what comes next is a space, not "ren". "children" doesn't.
This is probably the one I get the most use out of; finely controlling what can't come next comes in handy.
(?<!...) is "negative lookbehind"; it asserts that ... does not appear before the cursor.
1.9.3p125 :004 > 'foot root'.gsub(/(?<!r)oot/, 'eet')
=> "feet root"
The "oot" in "foot" is fine, since there's no "r" before it. The "oot" in "root" clearly has an "r".
As an additional restriction, most regex engines require that ... has a fixed length in a lookbehind. So you generally can't use ?, +, *, or {n,m} there.
You can also nest these and otherwise do all kinds of crazy things. I use them mainly for one-offs I know I'll never have to maintain, so I don't have any great examples of real-world applications handy; honestly, they're weird enough that you should try to do what you want some other way first. :)
Afterthought: The syntax comes from Perl regular expressions, which used (? followed by various symbols for a lot of extended syntax because ? on its own is invalid. So <= doesn't mean anything by itself; (?<= is one entire token, meaning "this is the start of a lookbehind". It's like how += and ++ are separate operators, even though they both start with +.
They're easy to remember, though: = indicates looking forwards (or, really, "here"), < indicates looking backwards, and ! has its traditional meaning of "not".
Regarding your later examples:
irb(main):002:0> "foresight".sub(/(?!s)ight/, 'ee')
=> "foresee"
irb(main):003:0> "foresight".sub(/ight/, 'ee')
=> "foresee"
Yes, these produce the same output. This is that tricky bit with using lookahead:
The regex engine has tried some things, but they haven't worked, and now it's at fores|ight.
It checks (?!s). Is the character after the cursor s? No, it's i! So that part matches and the matching continues, but the cursor doesn't move, and we still have fores|ight.
It checks ight. Does ight come after the cursor? Well, yes, it does, so move the cursor: foresight|.
We're done!
The cursor moved over the substring ight, so that's the full match, and that's what gets replaced.
Doing (?!a)b is useless, since you're saying: the next character must not be a, and it must be b. But that's the same as just matching b!
This can be useful sometimes, but you need a more complex pattern: for example, (?!3)\d will match any digit that isn't a 3.
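Here is a short sketch of both points, using made-up sample strings: the redundant lookahead changes nothing, while the lookahead in front of \d actually filters.

```ruby
# (?!a)b behaves exactly like plain b.
p "ab".sub(/(?!a)b/, "X")   # => "aX"
p "ab".sub(/b/, "X")        # => "aX"

# (?!3)\d keeps every digit except 3.
p "1234335".scan(/(?!3)\d/) # => ["1", "2", "4", "5"]
```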
This is what you want:
1.9.3p125 :001 > "foresight".sub(/(?<!s)ight/, 'ee')
=> "foresight"
This asserts that s doesn't come before ight.
Zero-width assertions are difficult to understand until you realize that regex matches positions as well as characters.
When you see the string "foo" you naturally read three characters. But there are also four positions, marked here by pipes: "|f|o|o|". A lookahead or lookbehind (collectively, lookarounds) matches a position where the characters after or before it match the expression.
The difference between a zero-width expression and other expressions is that the zero-width expression only matches (or "consumes") the position. So, for example:
/(app)apple/
will fail to match "apple" because it's trying to match "app" twice. But
/(?=app)apple/
will succeed because the lookahead only matches the position where "app" follows. It doesn't actually consume the "app" characters, allowing the next expression to consume them.
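Both claims are easy to check in a few lines. In this sketch, the third call uses an invented input, "appapple", to show what the capturing version really needs in order to match.

```ruby
p(/(app)apple/.match("apple"))        # => nil ("app" would have to appear twice)
p(/(?=app)apple/.match("apple")[0])   # => "apple"
p(/(app)apple/.match("appapple")[0])  # => "appapple"
```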
LOOKAROUND DESCRIPTIONS
Positive Lookahead: (?=s)
Imagine you are a drill sergeant and you are performing an inspection. You begin at the front of the line with the intention of walking past each private and ensuring they meet expectations. But, before doing so, you look ahead one by one to make sure they have lined up in the proper order. The privates' names are "A", "B", "C", "D" and "E". /(?=ABCDE)...../.match('ABCDE'). Yep, they are all present and accounted for.
Negative Lookahead: (?!s)
You perform the inspection down the line and have finally walked past private E. Now you are going to look ahead to make sure that "F" from the other company has not, yet again, accidentally slipped into the wrong formation. /.....(?!F)/.match('ABCDE'). Nope, he hasn't slipped in this time, so all is well.
Positive Lookbehind: (?<=s)
After completing the inspection, the sergeant is at the end of the formation. He turns and scans back to make sure no one has snuck away. /.....(?<=ABCDE)/.match('ABCDE'). Yep, everyone is present and accounted for.
Negative Lookbehind: (?<!s)
Finally, the drill sergeant takes one last look to make sure that privates A and B have not, once again, switched places (because they like KP). /.....(?<!BACDE)/.match('ABCDE'). Nope, they haven't, so all is well.
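The four inspections can be run together as a rough check. The hash labels and variable names below are just for this sketch; the last line swaps A and B to show the negative lookbehind rejecting the formation.

```ruby
squad = "ABCDE"
checks = {
  "positive lookahead"  => /(?=ABCDE)...../,
  "negative lookahead"  => /.....(?!F)/,
  "positive lookbehind" => /.....(?<=ABCDE)/,
  "negative lookbehind" => /.....(?<!BACDE)/
}

# Every inspection passes for the correctly ordered squad.
checks.each do |name, pattern|
  puts "#{name}: #{pattern.match?(squad)}"
end

# If A and B really had swapped places, the last check would object.
p(/.....(?<!BACDE)/.match?("BACDE"))  # => false
```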
The meaning of a zero-width assertion is an expression that consumes zero characters while matching. For example, in this example,
"foresight".sub(/sight/, 'ee')
what is matched is
foresight
    ^^^^^
and thus the result would be
foreee
However, in this example,
"foresight".sub(/(?<=s)ight/, 'ee')
what is matched is
foresight
     ^^^^
and therefore the result would be
foresee
Another example of a zero-width assertion is the word-boundary character, \b. For example, to match a complete word, you might try surrounding the word with spaces, e.g.
"flight light plight".sub(/\slight\s/, 'dark')
to get
flightdarkplight
But you see how matching the spaces removes them during substitution? Using a word boundary gets around this problem:
"flight light plight".sub(/\blight\b/, 'dark')
The \b matches the beginning or end of a word, but does not actually match a character: it's zero-width.
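Running the two substitutions side by side shows the difference; this is a small sketch using the same sample string.

```ruby
text = "flight light plight"

# \s consumes the surrounding spaces, so they are replaced too.
p text.sub(/\slight\s/, "dark")  # => "flightdarkplight"

# \b only asserts a position, so the spaces survive.
p text.sub(/\blight\b/, "dark")  # => "flight dark plight"
```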
Maybe the most succinct answer to your question is this: Lookahead and lookbehind assertions are one kind of zero-width assertions. All lookahead and lookbehind assertions are zero-width assertions.
Here are explanations of your examples:
irb(main):001:0> "foresight".sub(/(?!s)ight/, 'ee')
=> "foresee"
Above, you're saying, "Match where the next character is not an s, and then an i." This is always true for an i, since an i is never an s, so the substitution succeeds.
irb(main):002:0> "foresight".sub(/(?=s)ight/, 'ee')
=> "foresight"
Above, you're saying, "Match where the next character is an s, and then an i." This is never true, since an i is never an s, so the substitution fails.
irb(main):003:0> "foresight".sub(/(?<=s)ight/, 'ee')
=> "foresee"
Above, already explained. (This is the correct one.)
irb(main):004:0> "foresight".sub(/(?<!s)ight/, 'ee')
=> "foresight"
Above, should be clear by now. In this case, "firefight" would substitute to "firefee", but not "foresight" to "foresee".

What does (?m:\s*) mean in Regex jargon?

What would this mean in an expression?
(?m:.*?)
or this
(?m:\s*)
I mean, it appears to be something to do with whitespace but I'm unsure.
ADDITIONAL DETAILS:
The full expression I'm looking at is:
\A((?m:\s*)((\/\*(?m:.*?)\*\/)|(\#\#\# (?m:.*?)\#\#\#)|(\/\/ .* \n?)+|(\# .* \n?)+))+
(?...) is a way of applying modifiers to the regular expression inside the parentheses.
(?:...) allows you to treat the part between the parentheses as a group, without affecting the set of strings captured by the matching engine. But you can add option letters between the ? and the :, in which case the part of the regular expression between the parentheses behaves as if you had included those option letters when creating the regular expression. That is, /(?m:...)/ behaves the same as /.../m.
The m, in turn, enables "multiline" mode.
CORRECTED:
Here's where I got confused in the original answer, because this option has different meanings in different environments.
This question is tagged Ruby, in which "multiline mode" causes the dot character (.) to match newlines, whereas normally that's the one character it doesn't match:
irb(main):001:0> "a\nb" =~ /a.b/
=> nil
irb(main):002:0> "a\nb" =~ /a.b/m
=> 0
irb(main):003:0> "a\nb" =~ /(?m:a.b)/
=> 0
So your first regular expression, (?m:.*?) will match any number (including zero) of any characters (including newlines). Basically, it will match anything at all, including nothing.
In the second regular expression, (?m:\s*), the modifier has no effect at all because there are no dots in the contained expression to modify.
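To see that the modifier is scoped to its own group, here is a small sketch with made-up inputs: only the dot inside (?m:...) may cross a newline, and \s matches a newline with or without the modifier.

```ruby
# The dot inside the group matches "\n"; the bare dot after b does not.
p("a\nb"    =~ /a(?m:.)b/)           # => 0
p("a\nb\nc" =~ /a(?m:.)b.c/)         # => nil
p("a\nb\nc" =~ /a(?m:.)b(?m:.)c/)    # => 0

# \s already matches newlines, so (?m:...) adds nothing here.
p("a \n b" =~ /a\s*b/)               # => 0
p("a \n b" =~ /a(?m:\s*)b/)          # => 0
```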
Back to the first expression. As Ωmega says, the ? after the * means that it is a non-greedy match. If that were the whole expression, or if there were no captures, it wouldn't matter. But when something follows that section and there are captures, you get different results. Without the ?, the longest possible match wins:
irb(main):001:0> /<(.*)>/.match("<a><b>")[1]
=> "a><b"
With the ?, you get the shortest one instead:
irb(main):002:0> /<(.*?)>/.match("<a><b>")[1]
=> "a"
Finally, about the above-mentioned /m confusion (though if you want to avoid becoming confused yourself, this might be a good place to stop reading):
In Perl 5 (which is the source of most regular expression extensions beyond the basic syntax), the behavior triggered by /m in Ruby is instead triggered by the /s option (Ruby has no option with that meaning; a trailing s on a Ruby regex literal is accepted, but it selects an encoding rather than changing what the dot matches). In Perl, /m, despite still being called "multiline mode", has a completely different effect: it causes the ^ and $ anchors to match at newlines within the string as well as at the beginning and end of the whole string respectively. But in Ruby, that behavior is the default, and there's not even an option to change it; the \A and \z anchors are what match only at the beginning and end of the whole string.
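The default anchor behavior in Ruby can be seen in a short sketch with a two-line sample string: ^ and $ work per line without any option, while \A and \z refer to the whole string.

```ruby
text = "first\nsecond"

p(text =~ /^second$/)    # => 6   (^ and $ match at the inner line boundary)
p(text =~ /\Asecond\z/)  # => nil (\A and \z only match at the string's ends)
p(text =~ /\Afirst$/)    # => 0   ($ matches just before the newline)
```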
Pattern .*? will match any string, but as short a string as possible, because of the lazy quantifier ?.
Pattern \s* will match white-space characters (zero or more).
(?m) enables "multi-line mode". In Perl-style engines, this mode makes the caret and dollar match before and after newlines in the subject string; in Ruby, it instead makes the dot match newlines, since ^ and $ already match at line boundaries. To apply this mode to a sub-pattern only, the syntax (?m:...) is used, where ... is a matching pattern.
For more information read http://www.regular-expressions.info/modifiers.html

Resources