Help with Regex statement in Ruby

I have a string called 'raw'. I am trying to parse it in ruby in the following way:
raw = "HbA1C ranging 8.0—10.0%"
raw.scan /\d*\.?\d+[ ]*(-+|\342\200\224)[ ]*\d*\.?\d+/
The output from the above is []. I think it should be: ["8.0—10.0"].
Does anyone have any insight into what is wrong with the above regex statement?
Note: \342\200\224 is equal to — (em-dash, U+2014).
The piece that is not working is:
(-+|\342\200\224)
I think it should be equivalent to saying, match on 1 or more - OR match on the string \342\200\224.
Any help would be greatly appreciated!

The original regex works for me (Ruby 1.8.7); it just needs the group to be non-capturing so that scan outputs the entire match. Alternatively, leave the regex unchanged and use String#[] or String#match instead of String#scan.
raw = "HbA1C ranging 8.0—10.0%"
raw.scan(/\d*\.?\d+[ ]*(?:-+|\342\200\224)[ ]*\d*\.?\d+/)
# => ["8.0—10.0"]
For testing/building regular expressions in Ruby there's a fantastic tool over at http://rubular.com that makes it a lot easier. http://rubular.com/r/b1318BBimb is the edited regex with a few test cases to make sure it works against them.
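A minimal sketch of the same fix for Ruby 1.9 and later, assuming a UTF-8 source file: it spells the em-dash as \u2014 instead of the octal byte escapes and puts the capturing and non-capturing versions side by side. Whenever the pattern contains a capturing group, String#scan returns only the group captures, which is why the group has to become (?:...).

```ruby
raw = "HbA1C ranging 8.0\u201410.0%"

# With a capturing group, scan returns one array per match holding only the
# captures (here just the separator), not the whole matched range.
capturing = /\d*\.?\d+[ ]*(-+|\u2014)[ ]*\d*\.?\d+/
p raw.scan(capturing)

# With a non-capturing group (?:...), scan returns the whole matched text.
non_capturing = /\d*\.?\d+[ ]*(?:-+|\u2014)[ ]*\d*\.?\d+/
p raw.scan(non_capturing)

# String#[] returns the whole first match regardless of any groups.
p raw[capturing]
```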

raw = "HbA1C ranging 8.0—10.0%"
raw.scan(/\d+\.\d+.+\d+\.\d+/)
#=> ["8.0\342\200\22410.0"]

Related

What does /anystring/ mean in ruby?

I came across this: /sera/ === coursera. What does /sera/ mean? I do not understand the expression above.
It's a regular expression literal. A more explicit way of writing the same test is:
coursera.match(/sera/)
Or:
/sera/.match(coursera)
These are functionally similar: either a string is matched against a regular expression, or a regular expression is tested for matches against a string. The difference from === is the return value: === gives true or false, while match gives a MatchData object or nil.
In plain terms, your original code asks: can the characters sera be found in the string held by the variable coursera?
If you do this:
"coursera".match(/sera/)
# => #<MatchData "sera">
You get a MatchData result, which means it matched; without a match you get nil. For more complicated expressions you can capture parts of the string using arbitrary patterns and so on. As a general rule, regular expressions in Ruby are written as /.../ or, alternatively, with the %r syntax such as %r[...].
You may also see the =~ operator, which Ruby inherited from Perl. It also performs a match, returning the index where the match starts, or nil if there is none.
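A small sketch comparing these forms on the question's example. Here coursera is assumed to be a local variable holding the string "coursera", since the question does not show how it is defined.

```ruby
coursera = "coursera"

# Regexp#=== returns true or false; case/when uses it to test each when clause.
p(/sera/ === coursera)

# String#match (or Regexp#match) returns a MatchData object, or nil on no match.
m = coursera.match(/sera/)
p m[0]

# =~ returns the index where the match starts, or nil.
p(coursera =~ /sera/)

# %r with bracket delimiters builds the same regular expression as /sera/.
p(%r[sera] == /sera/)

case coursera
when /sera/ then puts "matched in case/when"
end
```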

Changing "word" to "Word" using a RegEx like [A-Z]([a-z]*)\b

The title sums up my conundrum pretty well. I've been searching around the net for a while, and being new to Ruby and Regular Expressions as a whole, I'm stuck trying to figure out how to alter the case of a single word string using a RegEx "filter" such as [A-Z]([a-z]*)\b.
Basically I want the flow to be
input: woRD
filter: [A-Z]([a-z]*)\b
output: Word
I already have the words filtered into a list, so I don't need to match words; I only need to filter the case of the word using a RegEx filter.
I do not want to use standard capitalization methods, I want this to be done using Regular Expressions.
You can use
"woRD".downcase.capitalize
Ruby provides predefined methods for this kind of functionality. Try to use them instead of a regex; it saves coding time!
Well, for some reason you want to use regexps. Here you go:
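# Build lookup hashes for gsub: to_upper maps 'a'..'z' to 'A'..'Z', and to_down is its inverse
to_down = (to_upper = Hash[('a'..'z').zip('A'..'Z')]).invert
# Convert to lowercase: every uppercase letter is replaced by its value in to_down
downcased = 'woRD'.gsub(/[A-Z]/, to_down)
# => 'word'
# Uppercase the first word character via to_upper
titlecased = downcased.gsub(/^\w/, to_upper)
# => 'Word'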
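Hope it helps. Note the use of the String#gsub(pattern, hash) form: each matched substring is looked up as a key in the hash, and the corresponding value becomes the replacement.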
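Another regex-driven variant, as a sketch assuming the input is a single word made of ASCII letters: String#sub with a block lets you rebuild the word from its capture groups. titlecase_word is a hypothetical helper name, not a built-in method.

```ruby
# Hypothetical helper: title-cases a single ASCII word with one regex substitution.
def titlecase_word(word)
  # \A and \z anchor the pattern to the whole string; the i flag lets [a-z]
  # match either case. Group 1 is the first letter, group 2 is the rest.
  word.sub(/\A([a-z])([a-z]*)\z/i) { "#{$1.upcase}#{$2.downcase}" }
end

p titlecase_word("woRD")
p titlecase_word("WORD")
p titlecase_word("word")
```

Input that does not match the pattern (for example, a string containing a space) is returned unchanged, because sub only replaces when the regex matches.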
You can't do this kind of case change with a regex alone.
Please read this topic carefully: How to change case of letters in string using regex in Ruby.
The best way to solve your problem is to use:
"woRD".downcase.capitalize
or
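name_of_your_variable.downcase!
name_of_your_variable.capitalize!
if you want to modify the string in your variable in place, without assigning the result to another variable. Call the two bang methods separately rather than chaining them: each returns nil when it makes no change, so downcase!.capitalize! raises NoMethodError on a string that is already lowercase.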
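A short illustration of why the bang methods should be called separately; name is just an illustrative variable.

```ruby
name = "word"

# downcase! returns nil when it changes nothing (the string is already
# lowercase), so chaining name.downcase!.capitalize! would call capitalize! on nil.
p name.downcase!

# Separate calls modify the string in place whether or not each one changes it.
name.downcase!
name.capitalize!
p name
```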

Capture float in string using Ruby short-hand regex syntax

I have a Ruby string which contains a dollar amount that I would like to convert into a float. I found a shorthand syntax for extracting the float from the string:
"$123.45"[/\d+\.\d+/].to_f
# => 123.45
Now I realize that it does not work when there is a comma in the number:
"$1,023.45"[/\d+\.\d+/].to_f
# => 23.45
How do I change the syntax of this regex to exclude the comma while still keeping the syntax as concise as possible?
You can delete the commas first using String#delete
"$1,023.45".delete(",")[/\d+\.\d+/].to_f
#=> 1023.45
"$1,023.45".gsub(/[\$,]/, '').to_f
# => 1023.45
p "$1,023.45".delete(",$").to_f #=> 1023.45
This regex should do the job: [/\d+[,.]\d+/]
[,.] means that either a , or a . may appear at that position.
Update: I thought you meant a comma used instead of the decimal point, as Europeans write it, so this might not work for you. As the others stated, you should delete the comma first to avoid problems with a number like 1,234.56. This cannot be solved with a regex directly.
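A sketch that keeps the shorthand style while handling thousands separators: let the regex accept commas inside the number, then delete them before calling to_f. dollars_to_f is a hypothetical helper name, and the pattern assumes US-style formatting (comma for thousands, period for decimals).

```ruby
# Digits, optionally interleaved with commas, followed by an optional decimal part.
AMOUNT = /\d[\d,]*(?:\.\d+)?/

# One-liner form, close to the original shorthand:
p "$1,023.45"[AMOUNT].delete(",").to_f

# Hypothetical helper that also copes with strings containing no number.
def dollars_to_f(str)
  match = str[AMOUNT]
  match && match.delete(",").to_f
end

p dollars_to_f("$123.45")
p dollars_to_f("$1,023.45")
p dollars_to_f("no amount here")
```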

Ruby Regular Expression lookahead to Split at pipe unless contained in brackets

I'm trying to decode the following string:
body = '{type:paragaph|class:red|content:[class:intro|body:This is the introduction paragraph.][body:This is the second paragraph.]}'
body << '{type:image|class:grid|content:[id:1|title:image1][id:2|title:image2][id:3|title:image3]}'
I need the string to split at the pipes, but not where a pipe is contained within square brackets. To do this, I think I need to perform a lookahead as described here: How to split string by ',' unless ',' is within brackets using Regex?
My attempt (still splits at every pipe):
x = self.body.scan(/\{(.*?)\}/).map {|m| m[0].split(/ *\|(?!\]) */)}
->
[
["type:paragaph", "class:red", "content:[class:intro", "body:This is the introduction paragraph.][body:This is the second paragraph.]"],
["type:image", "class:grid", "content:[id:1", "title:image1][id:2", "title:image2][id:3", "title:image3]"]
]
Expecting:
->
[
["type:paragaph", "class:red", "content:[class:intro|body:This is the introduction paragraph.][body:This is the second paragraph.]"],
["type:image", "class:grid", "content:[id:1|title:image1][id:2|title:image2][id:3|title:image3]"]
]
Does anyone know the regex required here?
Is it possible to adapt the regex from Regular Expression to match underscores not surrounded by brackets? I can't seem to modify it correctly.
I modified the answer here Split string in Ruby, ignoring contents of parentheses? to get:
self.body.scan(/\{(.*?)\}/).map {|m| m[0].split(/\|\s*(?=[^\[\]]*(?:\[|$))/)}
Seems to do the trick, though I'm not sure whether it has any shortfalls.
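A runnable version of that split, as a sketch: it uses a local body instead of self.body, builds the string with + rather than <<, and swaps $ for \z (end of string rather than end of line), which only matters if the input contains newlines. It assumes square brackets are never nested. The lookahead accepts a pipe only when the next square bracket after it is an opening one, or when no bracket follows at all; a pipe inside [...] always meets a closing ] first.

```ruby
body = '{type:paragaph|class:red|content:[class:intro|body:This is the introduction paragraph.][body:This is the second paragraph.]}' +
       '{type:image|class:grid|content:[id:1|title:image1][id:2|title:image2][id:3|title:image3]}'

# Split on "|" only if, scanning ahead over non-bracket characters, the next
# thing is "[" or the end of the string. Inside [...] the next bracket is "]",
# so the lookahead fails and that pipe is not a split point.
SPLIT_OUTSIDE_BRACKETS = /\|\s*(?=[^\[\]]*(?:\[|\z))/

# scan with one capture group yields [["inner text"], ...]; m[0] is the text
# between a pair of braces.
records = body.scan(/\{(.*?)\}/).map { |m| m[0].split(SPLIT_OUTSIDE_BRACKETS) }

records.each { |fields| p fields }
```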
Dealing with nested structures that have identical syntax is going to make things difficult for you.
You could try a recursive descent parser (a quick Google search turned up https://github.com/Ragmaanir/grammy; not sure if it's any good).
Personally, I'd go for something really hacky: some gsubs that convert your string into JSON, then parse it with a JSON parser :-). That's not particularly easy either, but here goes:
require 'json'
b1 = body.gsub(/([^\[\|\]\:\}\{]+)/,'"\1"').gsub(':[',':[{').gsub('][','},{').gsub(']','}]').gsub('}{','},{').gsub('|',',')
JSON.parse('[' + b1 + ']')
It wasn't easy because the string format apparently uses [foo:bar][baz:bam] to represent an array of hashes. If you have a chance to modify the serialised format to make it easier, I would take it.

String.scan returning empty array in Ruby

I've written a very basic regex in Ruby for scraping email addresses off the web. It looks like the following:
/\b\w+(\.\w+)*#\w+\.\w+(\.\w+)*\b/
When I load this into irb or Rubular, I test it against the following string:
"example#live.com"
When I run the Regexp.match(string) command in irb, I get this:
regexp.match(string) =>#<MatchData "example#live.com" 1:nil 2:nil>
So the match seems to be recorded in the MatchData object. However, when I run the String.scan(regex) command (which is what I'm primarily interested in), I get the following:
string.scan(regex) => [[nil, nil]]
Why isn't scan returning the matched email address? Is it a problem with the regular expression? Or is it a nuance of String.scan/Regexp/MatchData that somebody could make me aware of?
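The main issue is that your capturing groups (the parts matched by whatever is in parentheses) aren't capturing what you want. When a pattern contains capturing groups, String#scan returns only the groups' captures for each match, not the whole match. In "example#live.com" neither optional (\.\w+)* group matches anything, so each comes back as nil, giving [[nil, nil]].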
Let's say you want just the username and domain. You should use something along the lines of /\b(\w+(?:\.\w+)*)#(\w+(?:\.\w+)*)\.\w+\b/. As it stands, your pattern matches the input text, but the groups don't actually capture any text.
Also, why not just use /([\w\.]+)#([\w\.]+)\.\w+/? (I'm not too familiar with Ruby's regex engine, but that should be about right; you don't even need to check for word boundaries if you're using greedy quantifiers.)
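A sketch showing the scan behaviour described above on a made-up sample string (the addresses use # in place of @, as in the question). Making the groups non-capturing gives whole matches; alternatively, keep the groups and read Regexp.last_match(0) inside a scan block, where Ruby sets the match variables for each match.

```ruby
text = "contact example#live.com or first.last#mail.example.org"

# Original pattern: two capturing groups, so scan returns [group1, group2] per
# match, and a group that matched zero times comes back as nil.
capturing = /\b\w+(\.\w+)*#\w+\.\w+(\.\w+)*\b/
p text.scan(capturing)

# Same pattern with non-capturing groups (?:...): scan returns each whole match.
non_capturing = /\b\w+(?:\.\w+)*#\w+\.\w+(?:\.\w+)*\b/
p text.scan(non_capturing)

# Keep the groups but collect full matches via Regexp.last_match in a scan block.
full = []
text.scan(capturing) { full << Regexp.last_match(0) }
p full
```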
