What does /anystring/ mean in ruby? - ruby

I came across this: /sera/ === coursera. What does /sera/ mean? Please tell me. I do not understand the meaning of the expression above.

It's a regular expression literal. A more explicit way to write the same test is:
coursera.match(/sera/)
Or:
/sera/.match(coursera)
These are both functionally similar. Either a string matches a regular expression, or a regular expression can be tested for matches against a string.
The long explanation of your original code is: Can the characters sera be found in the variable coursera?
If you do this:
"coursera".match(/sera/)
# => #<MatchData "sera">
You get a MatchData result which means it matched. For more complicated expressions you can capture parts of the string using arbitrary patterns and so on. The general rule here is regular expressions in Ruby look like /.../ or vaguely like %r[...] in form.
You may also see the =~ operator used which is something Ruby inherited from Perl. It also means match.
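To make the differences between these forms concrete, here is a minimal sketch. It assumes coursera holds the string "coursera", which the question never actually shows. The forms all answer "does it match?", but they return different things: === gives true or false, match gives a MatchData or nil, and =~ gives the index where the match starts or nil.

```ruby
# Assumed value; the question does not show what coursera contains.
coursera = "coursera"

# Regexp#=== answers with a plain boolean.
case_eq = /sera/ === coursera

# String#match (or Regexp#match) returns a MatchData object, or nil on failure.
md = coursera.match(/sera/)

# =~ returns the index where the match starts, or nil on failure.
index = coursera =~ /sera/

puts case_eq  # true
puts md[0]    # sera
puts index    # 4
```

Because nil and false are the only falsy values in Ruby, any of the three can be used directly in an if condition.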

Related

Using a ruby regular expression

I'm completely new to Ruby so I was just wondering if someone could help me out.
I have the following String:
"<planKey><key>OR-J8U</key></planKey>"
What is the regex I have to write to get the center part OR-J8U?
Use the following:
str = "<planKey><key>OR-J8U</key></planKey>"
str[/(?<=\<key\>).*(?=\<\/key\>)/]
#=> "OR-J8U"
This captures anything in between opening and closing 'key' tags using lookahead and lookbehinds
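If lookarounds feel heavy, a capture group should give the same result here. The sketch below compares the two; the variable names are just for illustration, and it relies on String#[] accepting a capture index as a second argument.

```ruby
str = "<planKey><key>OR-J8U</key></planKey>"

# Lookaround version: the tags are asserted but not consumed,
# so the whole match is already the text between them.
with_lookaround = str[/(?<=<key>).*(?=<\/key>)/]

# Capture-group version: the tags are part of the match,
# and the second argument asks for capture group 1 only.
with_capture = str[/<key>(.*?)<\/key>/, 1]

puts with_lookaround  # OR-J8U
puts with_capture     # OR-J8U
```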
If you want to get the string OR-J8U then you could simply use that string in the regular expression. The - character only needs escaping inside a character class, so the backslash below is optional:
/OR\-J8U/
Though, I believe you want any string that is enclosed within <planKey><key> and </key></planKey>. In that case ice's answer is useful if you allow for an empty string:
/(?<=\<key\>).*(?=\<\/key\>)/
If you don't allow for an empty string, replace the * with +:
/(?<=\<key\>).+(?=\<\/key\>)/
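A quick way to see the difference between the two quantifiers is to try both on an empty element. The input string here is made up for the illustration.

```ruby
# Hypothetical input with nothing between the key tags.
empty = "<planKey><key></key></planKey>"

# With *, zero characters between the tags is an acceptable match.
star = empty[/(?<=<key>).*(?=<\/key>)/]

# With +, at least one character is required, so there is no match.
plus = empty[/(?<=<key>).+(?=<\/key>)/]

p star  # ""
p plus  # nil
```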
If you prefer a more general approach (any string enclosed within any tags), then I believe the common opinion is not to use a regular expression. Instead consider using an HTML parser. On SO you can find some questions and answers in that regard.

Tokenize (lex? parse?) a regular expression

Using Ruby I'd like to take a Regexp object (or a String representing a valid regex; your choice) and tokenize it so that I may manipulate certain parts.
Specifically, I'd like to take a regex/string like this:
regex = /var (\w+) = '([^']+)';/
parts = ["foo","bar"]
and create a replacement string that replaces each capture with a literal from the array:
"var foo = 'bar';"
A naïve regex-based approach to parsing the regex, such as:
i = -1
result = regex.source.gsub(/\([^)]+\)/){ parts[i+=1] }
…would fail for things like nested capture groups, or non-capturing groups, or a regex that had a parenthesis inside a character class. Hence my desire to properly break the regex into semantically-valid pieces.
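To illustrate the failure mode, here is a small sketch that wraps the naive substitution in a helper (naive_fill is a name invented for this example) and runs it against the original regex and against a variant with a non-capturing group.

```ruby
parts = ["foo", "bar"]

# Hypothetical helper wrapping the naive approach from the question:
# replace every parenthesised chunk of the source with the next part.
def naive_fill(regex, parts)
  i = -1
  regex.source.gsub(/\([^)]+\)/) { parts[i += 1] }
end

simple = /var (\w+) = '([^']+)';/
puts naive_fill(simple, parts)        # var foo = 'bar';

# The non-capturing group (?:var|let) is also treated as a capture,
# so the parts are consumed in the wrong positions.
with_non_capturing = /(?:var|let) (\w+) = '([^']+)';/
puts naive_fill(with_non_capturing, parts)
```

The second call no longer produces "var foo = 'bar';", because the parts are assigned by counting parentheses rather than actual capture groups.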
Is there an existing Regex parser available for Ruby? Is there a (horror of horrors) known regex that cleanly matches regexes? Is there a gem I've not found?
The motivation for this question is a desire to find a clean and simple answer to this question.
I have a JavaScript project on GitHub called: Dynamic (?:Regex Highlighting)++ with Javascript! you may want to look at. It parses PCRE compatible regular expressions written in both free-spacing and non-free-spacing modes. Since the regexes are written in the less-feature-rich JavaScript syntax, these regexes could be easily converted to Ruby.
Note that regular expressions may contain arbitrarily nested parentheses structures and JavaScript has no recursive regex features, so the code must parse the tree of nested parens from the inside out. It's a bit tricky but works quite well. Be sure to try it out on the highlighter demo page, where you can input and dynamically highlight any regex. The JavaScript regular expressions used to parse regular expressions are documented here.

How do I make part of a regular expression optional in Ruby?

To match the following:
On Mar 3, 2011 11:05 AM, "mr person"
wrote:
I have the following regular expression:
/(On.* (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, [12]\d{3}.* at \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:)/m
Is there a way to make the "at" optional, so that if it's there, great, and if not, it still matches?
Sure. Put it in parentheses and put a question mark after it. Include one of the spaces, since otherwise you'll be trying to match two spaces if the "at" is missing: (at )? (or, as someone else suggested, (?:at )? to avoid it being captured).
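Applied to the regex from the question, the change might look like the sketch below. The two test strings are assumptions based on the sample in the question: one as posted, one with " at " added before the time.

```ruby
# The question's regex with " at " replaced by " (?:at )?".
pattern = /(On.* (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, [12]\d{3}.* (?:at )?\d{1,2}:\d{1,2} (?:AM|PM),.*wrote:)/m

# Assumed inputs: the sample from the question, and a variant containing "at".
without_at = "On Mar 3, 2011 11:05 AM, \"mr person\"\nwrote:"
with_at    = "On Mar 3, 2011 at 11:05 AM, \"mr person\"\nwrote:"

matches_without = !pattern.match(without_at).nil?
matches_with    = !pattern.match(with_at).nil?

puts matches_without  # true
puts matches_with     # true
```

The space lives inside the optional group, so there is exactly one space before the time whether or not "at" is present.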
Don't forget (?:) to make sure the bracketed expression doesn't get captured, and keep the trailing space inside the group:
(?:at )?
Sure, you just need to group the optional part...
(at )*
And, ok, I guess that will match at at at at, so you might want to just do:
(at )?
Others got your answer. This is just an aside re: Regular Expressions.
When you say "conditions" in regular expressions, it refers to the regex language. Like any language, it's a branch in code execution, but the code is a different regular expression path, the "code" of regular expressions.
So in pseudocode: if (evaluation is true) do this regular sub-expression, else do this other sub-expression.
This conditional exists in advanced regular expression engines ... Perl.
Perl uses the most advanced regular expression engine that exists. In version 6 and beyond it will be an integral part of the language, where code and expression intermingle seamlessly.
Perl 5.10 has this construct:
(?(condition)yes-pattern|no-pattern).
Edit: Just a warning that where Perl goes, every other language tends to follow as far as regular expressions are concerned.

What is this regex doing? match = /^plus_([0-9]+)$/.match(m.to_s)

What is this RUBY regex doing:
match = /^plus_([0-9]+)$/.match(m.to_s)
It seems to be matching 'plus_' and then a number.
But what is the .match(m.to_s) part doing? Is it chaining to itself? I don't understand.
Sorry, it's Ruby.
Calling .match(s) on a regex runs the regex against s and returns a MatchData object. m.to_s simply means "call the to_s method of m" (i.e. convert it to a string).
Haha, nice edit! I thought it was Ruby to begin with - my answer still holds.
You are correct about what the regex matches. However, .match() is the method used to match the regex against strings. It returns a MatchData object which you can then use to find out information about the match.
So /^plus_([0-9]+)$/ creates a regex object, .match(m.to_s) matches it against m as a string, and the resulting MatchData is stored in match.
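A short sketch of what the resulting MatchData gives you. The value of m is not shown in the question, so the symbol :plus_42 is an assumption made for the example.

```ruby
# Assumed value; the question does not say what m is.
m = :plus_42

match = /^plus_([0-9]+)$/.match(m.to_s)

puts match[0]  # plus_42  (the whole match)
puts match[1]  # 42       (the first capture group, still a string)

# When the string does not fit the pattern, match returns nil instead.
no_match = /^plus_([0-9]+)$/.match("minus_42")
p no_match     # nil
```

Since a failed match returns nil, code that uses match[1] usually checks that match is not nil first.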
See the Regexp documentation.
I've never looked at ruby, but based on what others have said here, this seems to be calling the "match" method of the regex that was just defined, against the string "m".
So the regex itself becomes an "object", to which the method "match" is called, and the argument is the result of (m.to_s), which is just a string. The result of the "match()" method is then returned into a variable named "match".
I think the fact that the method call is the same name as the return variable is what's making this seem weird.
Now this could be 100% wrong, as I've NEVER looked at ruby, but based on what others have said, this is what it looks like.

Help with Regex statement in Ruby

I have a string called 'raw'. I am trying to parse it in ruby in the following way:
raw = "HbA1C ranging 8.0—10.0%"
raw.scan /\d*\.?\d+[ ]*(-+|\342\200\224)[ ]*\d*\.?\d+/
The output from the above is []. I think it should be: ["8.0—10.0"].
Does anyone have any insight into what is wrong with the above regex statement?
Note: \342\200\224 is equal to — (em-dash, U+2014).
The piece that is not working is:
(-+|\342\200\224)
I think it should be equivalent to saying, match on 1 or more - OR match on the string \342\200\224.
Any help would be greatly appreciated!
The original regex works for me (ruby 1.8.7); it just needs the group to be non-capturing so that scan outputs the entire match. Or switch to String#[] or String#match instead of String#scan and leave the regex unchanged.
raw = "HbA1C ranging 8.0—10.0%"
raw.scan /\d*\.?\d+[ ]*(?:-+|\342\200\224)[ ]*\d*\.?\d+/
# => ["8.0—10.0"]
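The effect of the group on scan can be seen side by side in the sketch below. It assumes a modern Ruby with UTF-8 strings and writes the em-dash as the escape \u2014 rather than the octal bytes used above, so behaviour on 1.8.7 may differ.

```ruby
# \u2014 is the em-dash (U+2014) from the original string.
raw = "HbA1C ranging 8.0\u201410.0%"

# With a capturing group, scan returns the captures for each match,
# so only the separator comes back.
capturing = raw.scan(/\d*\.?\d+[ ]*(-+|\u2014)[ ]*\d*\.?\d+/)

# With a non-capturing group, scan returns each whole match.
non_capturing = raw.scan(/\d*\.?\d+[ ]*(?:-+|\u2014)[ ]*\d*\.?\d+/)

p capturing      # one inner array holding just the em-dash
p non_capturing  # one string: the full range including the em-dash
```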
For testing/building regular expressions in Ruby there's a fantastic tool over at http://rubular.com that makes it a lot easier. http://rubular.com/r/b1318BBimb is the edited regex with a few test cases to make sure it works against them.
raw = "HbA1C ranging 8.0—10.0%"
raw.scan(/\d+\.\d+.+\d+\.\d+/)
#=> ["8.0\342\200\22410.0"]

Resources