Split Ruby regex over multiple lines - ruby

This might not be quite the question you're expecting! I don't want a regex that will match over line-breaks; instead, I want to write a long regex that, for readability, I'd like to split onto multiple lines of code.
Something like:
"bar" =~ /(foo|
bar)/ # Doesn't work!
# => nil. Would like => 0
Can it be done?

Using %r with the x option is the preferred way to do this.
See this example from the GitHub Ruby style guide:
regexp = %r{
start # some text
\s # white space char
(group) # first group
(?:alt1|alt2) # some alternation
end
}x
regexp.match? "start groupalt2end"
https://github.com/github/rubocop-github/blob/master/STYLEGUIDE.md#regular-expressions

You need to use the /x modifier, which enables free-spacing mode.
In your case:
"bar" =~ /(foo|
bar)/x
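A quick check of what free-spacing mode changes, as a minimal sketch (the variable names plain and spaced are illustrative, not from the question):

```ruby
# Without /x the line break is part of the pattern, so "bar" alone does not match.
plain = /(foo|
bar)/

# With /x, unescaped whitespace (including the newline) is ignored.
spaced = /(foo|
bar)/x

p "bar" =~ plain   # => nil
p "bar" =~ spaced  # => 0

# Under /x a literal space has to be written explicitly, for example as "\ ".
p "foo bar" =~ /foo\ bar/x  # => 0
p "foo bar" =~ /foo bar/x   # => nil (the pattern is effectively /foobar/)
```

The last two lines are the main caveat: once /x is on, any space you actually want to match must be escaped or put in a character class.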

You can also turn on free-spacing mode inline with (?x):
"bar" =~ /(?x)foo|
bar/

Rather than cutting the regex mid-expression, I suggest breaking it into parts:
full_rgx = /This is a message\. A phone number: \d{10}\. A timestamp: \d*?/
msg = /This is a message\./
phone = /A phone number: \d{10}\./
tstamp = /A timestamp: \d*?/
/#{msg} #{phone} #{tstamp}/
I do the same for long strings.
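To see that the interpolated parts really behave like the single long regex, here is a small sketch. The sample text is made up for illustration; the three sub-patterns are the ones from above.

```ruby
msg    = /This is a message\./
phone  = /A phone number: \d{10}\./
tstamp = /A timestamp: \d*?/

# Interpolating a Regexp embeds its source, wrapped in a group that carries
# its own flags, so the parts combine into one pattern.
full = /#{msg} #{phone} #{tstamp}/

text = "This is a message. A phone number: 5551234567. A timestamp: 1700000000"
p full.match?(text)                  # => true
p full.match?("This is a message.")  # => false
```

Because each part keeps its own flags, a fragment written with /i stays case-insensitive even if the combined regex is not.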

regexp = %r{^
WRITE
EXPRESSION
HERE
$}x

Related

How to remove "(2002)" (without quotes) from a string in Ruby?

I have a string like this
This is some text; Awesome! (2002)
I want to remove the "(2002)" part from it using Ruby. How is this done? I know in unix it'd be
sed -e 's/([0-9]*)//g'
To remove any amount of whitespace followed by a (, one or more digits, and a ) at the end of the string, use sub with the /\s*\(\d+\)\z/ regex:
s = "This is some text; Awesome! (2002)"
s = s.sub(/\s*\(\d+\)\z/,"") # => This is some text; Awesome!
or
s[/\s*\(\d+\)\z/] = "" # => This is some text; Awesome!
See Ruby demo
If you mean a literal 2002, use it instead of \d+.
NOTE: When you use s[...] = "" approach, you still get a string as the return type, you can check it with s.class.
NOTE2: If you need to obtain the 2002 value separately, use s[/\s*\((\d+)\)\z/, 1] where 1 is passed to the matching method to return the contents of Group 1 only.
NOTE3: To split the string at the last space and get the ["This is some text; Awesome!", "2002"] as a result, use either Cary's suggestion with the regex containing a capturing group around \d+ - [s.sub(/\s*\((\d+)\)\z/,''), $1] (as $1 variable will hold the capture group 1 contents after sub executes), or s.split(/\s*\((\d+)\)\z/) where the result holds the substring from the start up to our pattern, and the digits that are wrapped with a (...) capturing group (after splitting, these values are placed into the result, not discarded).
And finally, /\([^)]*\)/ matches anything inside (...) (\( matches an open parenthesis, [^)]* matches 0 or more chars other than ) and \) matches a closing parenthesis).
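The variants from the notes above can be compared side by side. This is only a sketch; the name pattern is introduced here for convenience.

```ruby
s = "This is some text; Awesome! (2002)"
pattern = /\s*\((\d+)\)\z/   # capturing group around the digits

p s.sub(pattern, "")          # => "This is some text; Awesome!"
p s[pattern, 1]               # => "2002"
p [s.sub(pattern, ""), $1]    # => ["This is some text; Awesome!", "2002"]
p s.split(pattern)            # => ["This is some text; Awesome!", "2002"]
```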
If I wanted to remove something, I'd use:
foo = 'This is some text; Awesome! (2002)'
foo['(2002)'] = ''
foo # => "This is some text; Awesome! "
You can also use regex instead of the fixed string. Either way, assigning '' to the match will remove it.
foo[/\(2002\)/] = ''
foo # => "This is some text; Awesome! "
or:
foo[/\(\d+\)/] = ''
foo # => "This is some text; Awesome! "
This is documented in String's []= method.
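One difference worth knowing, shown as a small sketch: element assignment raises IndexError when nothing matches, whereas sub simply returns the string unchanged. The square-bracket pattern below is just an example of something that is not in the string.

```ruby
foo = 'This is some text; Awesome! (2002)'.dup

# sub is a no-op when the pattern is absent.
p foo.sub(/\(\d+\)/, '')   # => "This is some text; Awesome! "
p foo.sub(/\[\d+\]/, '')   # => "This is some text; Awesome! (2002)"

# Element assignment raises IndexError instead.
begin
  foo[/\[\d+\]/] = ''
rescue IndexError => e
  puts "no match: #{e.message}"
end
```

So s[...] = '' is fine when you know the text is there; use sub when it might not be.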
The regex I showed you on a different question can be modified for use here:
str = "something (capture) something (capture2)"
regex = /(\(\w+\))/
str.scan(regex).flatten(1) # => ["(capture)", "(capture2)"]
The only change is the addition of \( and \) in the match group.
You can plug this regex into gsub to remove all matches:
str.gsub(regex, "")
# => "something  something "

Regex to grab full firstname and first letter of last name

I have a list of users grabbed by the Etc Ruby library:
Thomas_J_Perkins
Jennifer_Scanner
Amanda_K_Loso
Aaron_Cole
Mark_L_Lamb
What I need to do is grab the full first name, skip the middle name (if given), and grab the first character of the last name. The output should look like this:
Thomas P
Jennifer S
Amanda L
Aaron C
Mark L
I'm not sure how to do this. I've tried grabbing all of the characters with /\w+/, but that grabs everything.
You don't always need regular expressions.
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. (Jamie Zawinski)
You can do it with some simple Ruby code
string = "Mark_L_Lamb"
string.split('_').first + ' ' + string.split('_').last[0]
=> "Mark L"
I think it's simpler without regex:
array = "Thomas_J_Perkins".split("_") # split at _
array.first + " " + array.last[0] # .first returns the first name; .last[0] returns the first char of the last name
#=> "Thomas P"
You can use
^([^\W_]+)(?:_[^\W_]+)*_([^\W_])[^\W_]*$
And replace with \1 \2 (a space between the groups, to match the requested output). See the regex demo
The [^\W_] matches a letter or a digit. If you want to only match letters, replace [^\W_] with \p{L}.
^(\p{L}+)(?:_\p{L}+)*_(\p{L})\p{L}*$
See updated demo
The point is to match and capture the first chunk of letters up to the first _ (with (\p{L}+)), then match 0+ sequences of _ + letters inside (with (?:_\p{L}+)*_) and then match and capture the last word first letter (with (\p{L})) and then match the rest of the string (with \p{L}*).
NOTE: replace ^ with \A and $ with \z if you have independent strings (as in Ruby ^ matches the start of a line and $ matches the end of the line).
Ruby code:
s.sub(/^(\p{L}+)(?:_\p{L}+)*_(\p{L})\p{L}*$/, "\\1 \\2")
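The note about anchors is easy to verify. In this sketch the multi-line input is invented to show the difference; the two patterns differ only in their anchors.

```ruby
multi = "junk\nThomas_J_Perkins"

line_anchored   = /^(\p{L}+)(?:_\p{L}+)*_(\p{L})\p{L}*$/
string_anchored = /\A(\p{L}+)(?:_\p{L}+)*_(\p{L})\p{L}*\z/

p line_anchored.match?(multi)                 # => true  (matches the second line)
p string_anchored.match?(multi)               # => false (the whole string must match)
p string_anchored.match?("Thomas_J_Perkins")  # => true
```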
I'm in the don't-use-a-regex-for-this camp.
str1 = "Alexander_Graham_Bell"
str2 = "Sylvester_Grisby"
"#{str1[0...str1.index('_')]} #{str1[str1.rindex('_')+1]}"
#=> "Alexander B"
"#{str2[0...str2.index('_')]} #{str2[str2.rindex('_')+1]}"
#=> "Sylvester G"
or
first, last = str1.split(/_.+_|_/)
#=> ["Alexander", "Bell"]
first+' '+last[0]
#=> "Alexander B"
first, last = str2.split(/_.+_|_/)
#=> ["Sylvester", "Grisby"]
first+' '+last[0]
#=> "Sylvester G"
but if you insist...
r = /
(.+?) # match any characters non-greedily in capture group 1
(?=_) # match an underscore in a positive lookahead
(?:.*) # match any characters greedily in a non-capture group
(?:_) # match an underscore in a non-capture group
(.) # match any character in capture group 2
/x # free-spacing regex definition mode
str1 =~ r
$1+' '+$2
#=> "Alexander B"
str2 =~ r
$1+' '+$2
#=> "Sylvester G"
You can of course write
r = /(.+?)(?=_)(?:.*)(?:_)(.)/
This is my attempt:
/([a-zA-Z]+)_([a-zA-Z]+_)?([a-zA-Z])/
See demo
Let's see if this works:
/^([^_]+)(?:_\w)?_(\w)/
And then you'll have to combine the first and second matches into the format you want. I don't know Ruby, so I can't help you there.
And another attempt using a replacement method:
result = subject.gsub(/^([^_]+)(?:_[^_])?_([^_])[^_]+$/, '\1 \2')
We match the entire string, with the relevant parts in capturing groups, then replace it with just the two captured groups.
Using the split method is much better:
full_names.map do |full_name|
parts = full_name.split('_').values_at(0,-1)
parts.last.slice!(1..-1)
parts.join(' ')
end
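For reference, here is a variant of the split approach run over the names from the question. It is a sketch that assumes every name contains at least one underscore; the anonymous splat is my own shortcut for skipping middle names.

```ruby
full_names = %w[Thomas_J_Perkins Jennifer_Scanner Amanda_K_Loso Aaron_Cole Mark_L_Lamb]

short_names = full_names.map do |full_name|
  first, *, last = full_name.split('_')  # the splat discards any middle names
  "#{first} #{last[0]}"
end

p short_names
# => ["Thomas P", "Jennifer S", "Amanda L", "Aaron C", "Mark L"]
```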
/^[A-Za-z]{5,15}\s[A-Za-z]{1}$/i
This will have the following criteria:
5-15 characters for first name then a whitespace and finally a single character for last name.

When Ruby regex doesn't fit on line

When I have a very long regex, like a cucumber step definition, what would be the best way to line wrap it?
For example, I would like something like:
When /^I have a very long step definition here in my step definition file$/ do
...
end
to be broken up into two lines (this doesn't work):
When /^I have a very long step definition here in /\
/my step definition file$/ do
...
end
2018 update
If you're here specifically for Cucumber, Cucumber expressions are a great alternative to regexes.
You can use a verbose regex with the /x modifier, but then you need to make spaces explicit because they will otherwise be ignored. Another advantage is that this allows you to comment your regex (which, if it's long, might be a good idea):
/^ # Match start of string
I[ ]have[ ]a[ ]very[ ]long[ ]
step[ ]definition[ ]here[ ]
in[ ]my[ ]step[ ]definition[ ]file
$ # Match end of string
/x
What about building the regex from a string? You'll lose syntax highlighting, but I'd say it isn't too bad:
When( Regexp.new(
'^My long? regex(?:es) that are '\
'(?:maybe ) broken into (\d+) lines '\
'because they (?:might )be non-readable otherwise$'
)) do |lines|
...
end
# => /^My long? regex(?:es) that are (?:maybe ) broken into (\d+) lines because they (?:might )be non-readable otherwise$/
What about
/a\
b/
# => /ab/
I know you want it in Ruby, but I can give you an example of how it can be done in Perl. I really think you can use the same idea in Ruby.
my $re1 = "I have a very long step definition here in";
my $re2 = "my step definition file";
if ( $line =~ m/^$re1 $re2$/i ) {
...
}
The idea is to save the Regex into a variable and write the variable inside the regex.
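A rough Ruby equivalent of that idea might look like the following. The names part1, part2 and step are hypothetical, and Regexp.escape is an extra precaution that the Perl version does not have.

```ruby
# Hypothetical fragment names; each holds part of the step text.
part1 = "I have a very long step definition here in"
part2 = "my step definition file"

# Regexp.escape guards against fragments that contain regex metacharacters.
step = /^#{Regexp.escape(part1)} #{Regexp.escape(part2)}$/i

p step.match?("I have a very long step definition here in my step definition file")  # => true
p step.match?("I have a very long step definition")                                  # => false
```

Drop the Regexp.escape calls if the fragments are meant to contain regex syntax themselves.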

Ruby remove everything except some characters?

How can I remove from a string all characters except white spaces, numbers, and some others?
Something like this:
oneLine.gsub(/[^ULDR0-9\<\>\s]/i,'')
I need only: 0-9 l d u r < > <space>
Also, is there a good document about the use of regex in Ruby, like a list of special characters with examples?
The regex you have is already working correctly. However, you do need to assign the result back to the string you're operating on. Otherwise, you're not changing the string (.gsub() does not modify the string in-place).
You can improve the regex a bit by adding a '+' quantifier (so consecutive characters can be replaced in one go). Also, you don't need to escape angle brackets:
oneLine = oneLine.gsub(/[^ULDR0-9<>\s]+/i, '')
A good resource with special consideration of Ruby regexes is the Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan. A good online tutorial by the same author is here.
Good old String#delete does this without a regular expression. The ^ means 'NOT'.
str = "12eldabc8urp pp"
p str.delete('^0-9ldur<> ') #=> "12ld8ur "
Just for completeness: you don't need a regular expression for this particular task, this can be done using simple string manipulation:
irb(main):005:0> "asdasd123".tr("^ULDRuldr0-9<>\t\r\n ", '')
=> "dd123"
There's also the tr! method if you want to replace the old value:
irb(main):009:0> oneLine = 'UasdL asd 123'
irb(main):010:0> oneLine.tr!("^ULDRuldr0-9<>\t\r\n ", '')
irb(main):011:0> oneLine
=> "UdL d 123"
This should be a bit faster as well (but performance shouldn't be a big concern in Ruby :)
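To compare the regex and non-regex approaches on the same input, here is a short sketch. The sample string is invented; the character set is written in double quotes so that \t, \r and \n are real whitespace characters.

```ruby
one_line = "Up 3 <left> x9!\tD"

kept_gsub   = one_line.gsub(/[^ULDR0-9<>\s]+/i, '')
kept_delete = one_line.delete("^ULDRuldr0-9<>\t\r\n ")

p kept_gsub                 # => "U 3 <l> 9\tD"
p kept_gsub == kept_delete  # => true
p one_line                  # => "Up 3 <left> x9!\tD" (neither call modifies the original)
```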

How to strip leading and trailing quote from string, in Ruby

I want to strip leading and trailing quotes, in Ruby, from a string. The quote character will occur 0 or 1 time. For example, all of the following should be converted to foo,bar:
"foo,bar"
"foo,bar
foo,bar"
foo,bar
You could also use the chomp function, but unfortunately it only works at the end of the string. Assuming there were a reverse chomp (rchomp), you could write:
'"foo,bar"'.rchomp('"').chomp('"')
Implementing rchomp is straightforward:
class String
def rchomp(sep = $/)
self.start_with?(sep) ? self[sep.size..-1] : self
end
end
Note that you could also do it inline, with the slightly less efficient version:
'"foo,bar"'.chomp('"').reverse.chomp('"').reverse
EDIT: Since Ruby 2.5, rchomp(x) is available under the name delete_prefix, and chomp(x) is available as delete_suffix, meaning that you can use
'"foo,bar"'.delete_prefix('"').delete_suffix('"')
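Running that over the four inputs from the question (a quick sketch, requires Ruby 2.5 or later):

```ruby
inputs = ['"foo,bar"', '"foo,bar', 'foo,bar"', 'foo,bar']

# Each call removes at most one quote, and only at that end of the string.
stripped = inputs.map { |s| s.delete_prefix('"').delete_suffix('"') }

p stripped  # => ["foo,bar", "foo,bar", "foo,bar", "foo,bar"]
```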
I can use gsub to search for the leading or trailing quote and replace it with an empty string:
s = "\"foo,bar\""
s.gsub!(/^\"|\"?$/, '')
As suggested by comments below, a better solution is:
s.gsub!(/\A"|"\Z/, '')
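Note that \Z and \z are not the same anchor, which matters if the input may end with a newline. A small sketch with a made-up input:

```ruby
s = "\"foo,bar\"\n"   # quoted value followed by a newline

p s.gsub(/\A"|"\Z/, '')  # => "foo,bar\n"    (\Z also matches just before a final newline)
p s.gsub(/\A"|"\z/, '')  # => "foo,bar\"\n"  (\z is the absolute end of the string)
```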
As usual everyone grabs regex from the toolbox first. :-)
As an alternative I'll recommend looking into .tr('"', '') (AKA "translate") which, in this use, strips every quote character in the string, not only the leading and trailing ones.
Another approach would be
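def remove_quotations(str)
  if str.start_with?('"')
    str = str.slice(1..-1)
  end
  if str.end_with?('"')
    str = str.slice(0..-2)
  end
  str # return the result explicitly; otherwise the method returns nil when the last if does not run
end
remove_quotations('"foo,bar"') # => "foo,bar"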
It is without RegExps and start_with?/end_with? are nicely readable.
It frustrates me that strip only works on whitespace. I need to strip all kinds of characters! Here's a String extension that will fix that:
class String
def trim sep=/\s/
sep_source = sep.is_a?(Regexp) ? sep.source : Regexp.escape(sep)
pattern = Regexp.new("\\A(#{sep_source})*(.*?)(#{sep_source})*\\z")
self[pattern, 2]
end
end
Output
'"foo,bar"'.trim '"' # => "foo,bar"
'"foo,bar'.trim '"' # => "foo,bar"
'foo,bar"'.trim '"' # => "foo,bar"
'foo,bar'.trim '"' # => "foo,bar"
' foo,bar'.trim # => "foo,bar"
'afoo,bare'.trim /[aeiou]/ # => "foo,bar"
Assuming that quotes can only appear at the beginning or end, you could just remove all quotes, without any custom method:
'"foo,bar"'.delete('"')
I wanted the same but for slashes in url path, which can be /test/test/test/ (so that it has the stripping characters in the middle) and eventually came up with something like this to avoid regexps:
'/test/test/test/'.split('/').reject { |i| i.empty? }.join('/')
Which in this case translates obviously to:
'"foo,bar"'.split('"').select{|i| i != ""}.join('"')
or
'"foo,bar"'.split('"').reject{|i| i.empty?}.join('"')
Regexes can be pretty heavy and lead to some funky errors. If you are not dealing with massive strings and the data is pretty uniform, you can use a simpler approach.
If you know the strings have leading and trailing quotes, you can slice them off:
string = "'This has quotes!'"
trimmed = string[1..-2]
puts trimmed # "This has quotes!"
This can also be turned into a simple function:
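# In this case, 34 is " and 39 is '; add other character codes as needed.
def trim_chars(string, char_codes = [34, 39])
  # string[0] is a one-character String on Ruby 1.9+, so compare byte values instead
  if char_codes.include?(string.getbyte(0)) && char_codes.include?(string.getbyte(-1))
    string[1..-2]
  else
    string
  end
end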
You can strip non-optional quotes with scan:
'"foo"bar"'.scan(/"(.*)"/)[0][0]
# => "foo\"bar"

Resources