Ruby scan regex will not match optional - ruby

Take this string.
a = "real-ab(+)real-bc(+)real-cd-xy"
a.scan(/[a-z_0-9]+\-[a-z_0-9]+[\-\[a-z_0-9]+\]?/)
=> ["real-ab", "real-bc", "real-cd-xy"]
But how come this next string gets nothing?
a = "real-a(+)real-b(+)real-c"
a.scan(/[a-z_0-9]+\-[a-z_0-9]+[\-\[a-z_0-9]+\]?/)
=> []
How can I make both strings produce a three-element array?

You've confused parentheses (used for grouping) and square brackets (used for character classes). You want
a.scan(/[a-z_0-9]+-[a-z_0-9]+(?:-[a-z_0-9]+)?/)
(?:...) creates a non-capturing group which is what you need here.
Furthermore, unless you want to disallow uppercase letters explicitly, you can write \w as a shorthand for "a letter, digit or underscore":
a.scan(/\w+-\w+(?:-\w+)?/)
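A quick sketch of the fix applied to both strings from the question. The constant and method names are made up for this illustration. It also shows why the group should be non-capturing: with a capturing group, scan returns the groups instead of the whole match.

```ruby
# Hypothetical names, chosen only for this illustration.
PART = /\w+-\w+(?:-\w+)?/          # optional third segment, non-capturing
PART_CAPTURING = /\w+-\w+(-\w+)?/  # same pattern with a capturing group

def parts(text)
  text.scan(PART)
end

p parts("real-ab(+)real-bc(+)real-cd-xy") # => ["real-ab", "real-bc", "real-cd-xy"]
p parts("real-a(+)real-b(+)real-c")       # => ["real-a", "real-b", "real-c"]

# With a capturing group, scan yields one array of groups per match.
p "real-ab(+)real-cd-xy".scan(PART_CAPTURING) # => [[nil], ["-xy"]]
```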

a.scan(/[a-z_0-9]+\-[a-z_0-9]+/)

Why not simply?
a.scan(/[a-z_0-9\-]+/)

Related

Method gsub does not work as expected

I want to change "#" to "\40" in a string, but I am not able to do so.
a = "srikanth#in.com"
a.gsub("#", "\40")
# => "srikanth in.com"
It replaces "#" with a space instead of \40. Any idea how to implement this?
Another solution:
puts a.gsub("#") {"\\40"}
# => srikanth\40in.com
"\\40" doesn't work as a replacement string because \4 is read as a back-reference to a capture group. From the docs:
If replacement is a String it will be substituted for the matched
text. It may contain back-references to the pattern’s capture groups
of the form \\d, where d is a group number ...
You can use gsub's hash syntax instead:
If the second argument is a Hash, and the matched text is one of its keys, the corresponding value is the replacement string.
Example:
a.gsub('#', '#' => '\\40')
#=> "srikanth\\40in.com"
Backslashes have a special meaning in the second parameter of gsub: they refer to matched regex groups. I tried escaping, but couldn't get it to work. It works this way, though:
s = "srikanth#in.com"
s['#'] = '\\40'
s # => "srikanth\\40in.com"
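For comparison, here is a sketch that puts the variants from this thread side by side, plus one that escapes the backslash twice (once for the Ruby string literal, once for gsub). The constant names are invented for the example.

```ruby
# Hypothetical constant names, used only for this comparison.
ADDRESS = "srikanth#in.com"

# Double-quoted "\40" is an octal escape for a space, so no backslash survives.
OCTAL = ADDRESS.gsub("#", "\40")

# "\\40" reaches gsub as \40, where \4 is treated as a group reference.
BACKREF = ADDRESS.gsub("#", "\\40")

# Three ways to get a literal backslash into the result:
ESCAPED = ADDRESS.gsub("#", "\\\\40")            # \\ in a replacement is a literal backslash
BLOCK   = ADDRESS.gsub("#") { "\\40" }           # block results are inserted as-is
HASH    = ADDRESS.gsub("#", { "#" => "\\40" })   # hash values are inserted as-is

puts ESCAPED # prints srikanth\40in.com
```

The block and hash forms avoid the double escaping because their results are not scanned for back-references.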

String gsub - Replace characters between two elements, but leave surrounding elements

Suppose I have the following string:
mystring = "start/abc123/end"
How can you splice out the abc123 with something else, while leaving the "start/" and "/end" elements intact?
I had the following to match for the pattern, but it replaces the entire string. I was hoping to just have it replace the abc123 with 123abc.
mystring.gsub(/start\/(.*)\/end/,"123abc") #=> "123abc"
Edit: The characters between the start & end elements can be any combination of alphanumeric characters, I changed my example to reflect this.
You can do it using the character class [^\/] (anything that is not a slash) and lookarounds:
mystring.gsub(/(?<=start\/)[^\/]+(?=\/end)/,"7")
For your example, you could perhaps use:
mystring.gsub(/\/(.*?)\//,"/7/")
This matches the two slashes around the string you're replacing and puts them back in the substitution.
Alternatively, you could capture the pieces of the string you want to keep and interpolate them around your replacement; this turns out to be much more readable than lookaheads/lookbehinds:
irb(main):010:0> mystring.gsub(/(start)\/.*\/(end)/, "\\1/7/\\2")
=> "start/7/end"
\\1 and \\2 here refer to the numbered captures inside of your regular expression.
The problem is that you're replacing the entire matched string, "start/8/end", with "7". You need to include the matched characters you want to persist:
mystring.gsub(/start\/(.*)\/end/, "start/7/end")
Alternatively, just match the digits:
mystring.gsub(/\d+/, "7")
You can do this by grouping the start and end elements in the regular expression and then referring to these groups in the substitution string (named groups are referenced as \k<name>):
mystring.gsub(/(?<start>start\/).*(?<end>\/end)/, "\\k<start>7\\k<end>")
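To compare the approaches from this thread on the question's own input, here is a small sketch. The constant names are invented; Ruby's syntax for a named back-reference inside a replacement string is \k<name>.

```ruby
# Hypothetical constant names, used only for this comparison.
PATH = "start/abc123/end"

# Lookarounds: only the middle part is matched, so only it is replaced.
LOOKAROUND = PATH.gsub(/(?<=start\/)[^\/]+(?=\/end)/, "123abc")

# Numbered captures: match everything, then put the kept parts back.
NUMBERED = PATH.gsub(/(start)\/.*\/(end)/, "\\1/123abc/\\2")

# Named captures use \k<name> in the replacement string.
NAMED = PATH.gsub(/(?<start>start\/).*(?<end>\/end)/, "\\k<start>123abc\\k<end>")

# Block form: $1 and $2 are available, and no backslash escaping is needed.
BLOCK = PATH.gsub(/(start)\/.*\/(end)/) { "#{$1}/123abc/#{$2}" }

puts LOOKAROUND # prints start/123abc/end
```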

Ruby regular expression

Apparently I still don't understand exactly how it works ...
Here is my problem: I'm trying to match numbers in strings such as:
910 -6.258000 6.290
That string should give me an array like this:
[910, -6.258000, 6.290]
while the string
blabla9999 some more text 1.1
should not be matched.
The regex I'm trying to use is
/([-]?\d+[.]?\d+)/
but it doesn't do exactly that. Could someone help me?
It would be great if the answer could clarify the use of the parentheses in the matching.
Here's a pattern that works:
/^[^\d]+?\d+[^\d]+?\d+[\.]?\d+$/
Note that [^\d]+ means at least one non-digit character.
On second thought, here's a more generic solution that needs only a trivial regular expression:
str.gsub(/[^\d.-]+/, " ").split.collect{|d| d.to_f}
Example:
str = "blabla9999 some more text -1.1"
Parsed:
[9999.0, -1.1]
The different kinds of brackets have different meanings.
[] defines a character class: exactly one character is matched, and it must be a member of the class.
() defines a capturing group: the text matched by the part inside the parentheses is stored so that you can refer to it later.
You did not define any anchors so your pattern will match your second string
blabla9999 some more text 1.1
      ^^^^ here           ^^^ and here
Maybe this is more what you wanted
^(\s*-?\d+(?:\.\d+)?\s*)+$
^ anchors the pattern to the start of the string and $ to the end.
It allows whitespace (\s) before and after each number and an optional fraction part, (?:\.\d+)?. This whole number pattern must match at least once.
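One caveat worth sketching: in Ruby, ^ and $ anchor at line boundaries, so a multi-line string can pass the check if any one line matches. The variant below swaps in \A and \z, which anchor at the ends of the whole string. The constant and method names are invented for this example.

```ruby
# Hypothetical names, used only for this illustration.
LINE_ANCHORED   = /^(\s*-?\d+(?:\.\d+)?\s*)+$/
STRING_ANCHORED = /\A(\s*-?\d+(?:\.\d+)?\s*)+\z/

# True only when the whole string consists of whitespace-separated numbers.
def numbers_only?(text)
  text.match?(STRING_ANCHORED)
end

p numbers_only?("910 -6.258000 6.290")            # => true
p numbers_only?("blabla9999 some more text 1.1")  # => false

# The first line alone satisfies the line-anchored pattern.
p "910 1.5\nblabla".match?(LINE_ANCHORED)         # => true
p numbers_only?("910 1.5\nblabla")                # => false
```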
Maybe /(-?\d+(\.\d+)?)+/
irb(main):010:0> "910 -6.258000 6.290".scan(/(\-?\d+(\.\d+)?)+/).map{|x| x[0]}
=> ["910", "-6.258000", "6.290"]
str = " 910 -6.258000 6.290"
str.scan(/-?\d+(?:\.\d+)?/).map(&:to_f)
# => [910.0, -6.258, 6.29]
If you don't want integers to be converted to floats, try this:
str = " 910 -6.258000 6.290"
str.scan(/-?\d+(?:\.\d+)?/).map do |ns|
  ns[/\./] ? ns.to_f : ns.to_i
end
# => [910, -6.258, 6.29]
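One detail none of the answers spells out: the pattern from the question, \d+[.]?\d+, needs at least two digits, so it silently skips single-digit numbers. The sketch below contrasts it with a pattern whose fraction part is optional as a unit. The names are invented, and the question's pattern is used without its outer capturing group so that scan returns plain strings.

```ruby
# Hypothetical names, used only for this illustration.
QUESTION_PATTERN = /[-]?\d+[.]?\d+/   # the question's pattern, minus the outer group
NUMBER_TOKEN     = /-?\d+(?:\.\d+)?/  # integer part, then an optional ".digits"

# Returns Integers for whole numbers and Floats for numbers with a fraction.
def extract_numbers(text)
  text.scan(NUMBER_TOKEN).map { |token| token.include?(".") ? token.to_f : token.to_i }
end

p "5 12".scan(QUESTION_PATTERN)            # => ["12"]
p extract_numbers("5 12")                  # => [5, 12]
p extract_numbers("910 -6.258000 6.290")   # => [910, -6.258, 6.29]
```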

How to get the particular part of string matching regexp in Ruby?

I've got a string Unnecessary:12357927251data and I need to select everything that follows the colon and the digits. I want to do it with a Regexp.
string.scan(/:\d+.+$/)
This will give me :12357927251data, but can I select only the part matched by .+ (data)?
Anything in parentheses in a regexp will be captured as a group, which you can access in $1, $2, etc. or by using [] on a match object:
string.match(/:\d+(.+)$/)[1]
If you use scan with capturing groups, you will get an array of arrays of the groups:
"Unnecessary:123data\nUnnecessary:5791next".scan(/:\d+(.+)$/)
=> [["data"], ["next"]]
Use parentheses in your regular expression and the result will be broken out into an array. For example:
x='Unnecessary:12357927251data'
x.scan(/(:\d+)(.+)$/)
=> [[":12357927251", "data"]]
x.scan(/:\d+(.+$)/).flatten
=> ["data"]
Assuming that you are trying to get the string 'data' from your string, then you can use:
string.match(/.*:\d*(.*)/)[1]
String#match returns a MatchData object. You can then index into that MatchData object to find the part of the string that you want.
(The first element of MatchData, [0], is the entire matched text; the second element, [1], is the part of the string captured by the parentheses.)
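A short sketch of what the MatchData indexes hold, using the string from the question. The method name is invented for the example. Note that match returns nil when nothing matches, so indexing the result directly would raise in that case.

```ruby
# Hypothetical helper, used only for this illustration.
# Returns [whole match, first capture, text before the match], or nil.
def match_parts(text)
  m = text.match(/:\d+(.+)$/)
  m && [m[0], m[1], m.pre_match]
end

p match_parts("Unnecessary:12357927251data")
# => [":12357927251data", "data", "Unnecessary"]
p match_parts("no colon here") # => nil
```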
Try this: /(?<=:)\d+.+$/
It turns the colon into a positive look-behind so that it does not appear in the output. The colon is not a regex metacharacter, so it needs no backslash. Note that the digits are still part of the match, so this returns 12357927251data rather than data.
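The look-behind pattern still leaves the digits in the result. As a sketch of two alternatives that return only data without post-processing: String#[] accepts a capture index, and \K discards everything matched before it. The constant names are invented for the example.

```ruby
# Hypothetical constant names, used only for this comparison.
SAMPLE = "Unnecessary:12357927251data"

# The look-behind hides the colon but keeps the digits.
LOOKBEHIND = SAMPLE[/(?<=:)\d+.+$/]

# String#[] with a capture index returns just that group (or nil).
VIA_INDEX = SAMPLE[/:\d+(.+)$/, 1]

# \K drops what was matched so far from the reported match.
VIA_KEEP = SAMPLE.scan(/:\d+\K.+$/)

p LOOKBEHIND # => "12357927251data"
p VIA_INDEX  # => "data"
p VIA_KEEP   # => ["data"]
```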
Using IRB
irb(main):004:0> "Unnecessary:12357927251data".scan(/:\d+(.+)$/)
=> [["data"]]

ruby parametrized regular expression

I have a string like "{some|words|are|here}" or "{another|set|of|words}"
So in general the string consists of an opening curly bracket, words delimited by a pipe, and a closing curly bracket.
What is the most efficient way to get a selected word out of that string?
I would like to do something like this:
my_string = "{this|is|a|test|case}"
my_string.get_column(0) # => "this"
my_string.get_column(2) # => "a"
my_string.get_column(4) # => "case"
What should the method get_column contain?
So this is the solution I like right now:
class String
  def get_column(n)
    self =~ /\A\{(?:\w*\|){#{n}}(\w*)(?:\|\w*)*\}\Z/ && $1
  end
end
We use a regular expression to make sure that the string is of the correct format, while simultaneously grabbing the correct column.
Explanation of regex:
\A is the beginning of the string and \Z is the end (or just before a final newline), so this regex matches the entire string.
Since curly braces have a special meaning we escape them as \{ and \} to match the curly braces at the beginning and end of the string.
Next, we want to skip the first n columns - we don't care about them.
A previous column is some number of letters followed by a vertical bar, so we use the standard \w to match a word-like character (includes numbers and underscore, but why not) and * to match any number of them. Vertical bar has a special meaning, so we have to escape it as \|. Since we want to group this, we enclose it all inside non-capturing parens (?:\w*\|) (the ?: makes it non-capturing).
Now we have n of the previous columns, so we tell the regex to match the column pattern n times using the count quantifier: just put a number in curly braces after a pattern. We use standard string interpolation, so we just put in {#{n}} to mean "match the previous pattern exactly n times".
The first non-skipped column after that is the one we care about, so we put that in capturing parens: (\w*)
Then we skip the rest of the columns, if any exist: (?:\|\w*)*.
Capturing the column puts it into $1, so we return that value if the regex matched. If not, we return nil, since this String has no nth column.
In general, if you wanted to have more than just words in your columns (like "{a phrase or two|don't forget about punctuation!|maybe some longer strings that have\na newline or two?}"), then just replace all the \w in the regex with [^|{}] so you can have each column contain anything except a curly-brace or a vertical bar.
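Here is a sketch of that generalized version, written as a standalone method instead of a String patch. The name column_at and the guard against negative indexes are additions for this example, not part of the answer above.

```ruby
# Hypothetical helper: returns the n-th column (0-based) of "{a|b|c}",
# or nil when the string is malformed or has no such column.
def column_at(text, n)
  count = Integer(n)
  return nil if count.negative?

  # [^|{}] lets a column hold anything except pipes and curly braces.
  text[/\A\{(?:[^|{}]*\|){#{count}}([^|{}]*)(?:\|[^|{}]*)*\}\z/, 1]
end

p column_at("{this|is|a|test|case}", 2) # => "a"
p column_at("{a phrase or two|don't forget about punctuation!|x}", 1)
# => "don't forget about punctuation!"
p column_at("{this|is|a|test|case}", 5) # => nil
```

String#[] with a capture index returns the group or nil, which replaces the =~ and $1 combination.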
Here's my previous solution
class String
  def get_column(n)
    raise "not a column string" unless self =~ /\A\{\w*(?:\|\w*)*\}\Z/
    self[1 .. -2].split('|')[n]
  end
end
We use a similar regex to make sure the String contains a set of columns or raise an error. Then we strip the curly braces from the front and back (using self[1 .. -2] to limit to the substring starting at the second character and ending at the next to last), split the columns using the pipe character (using .split('|') to create an array of columns), and then find the n'th column (using standard Array lookup with [n]).
I just figured as long as I was using the regex to verify the string, I might as well use it to capture the column.
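One difference between the two solutions, sketched here under the assumption that empty columns should count: split drops trailing empty fields by default, so the split version loses a trailing empty column unless a negative limit is passed. The method name columns is invented for the example.

```ruby
# Hypothetical helper: validates the format, then returns all columns.
def columns(text)
  raise ArgumentError, "not a column string" unless text =~ /\A\{\w*(?:\|\w*)*\}\z/

  # A negative limit keeps trailing empty fields.
  text[1..-2].split("|", -1)
end

p "{a|b|}"[1..-2].split("|")            # => ["a", "b"]
p columns("{a|b|}")                     # => ["a", "b", ""]
p columns("{this|is|a|test|case}")[2]   # => "a"
```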
