extract with regex, one liner in ruby - ruby

I would like to extract the word after "=". For "GENEINFO=AGRN:" in a document, I can use the regex /GENEINFO=(.*?):/ to match the required text. However, the value I want returned is just "AGRN". Is there a one-liner that I can use for this task?

Try using a lookbehind and a lookahead:
/(?<=GENEINFO=).*?(?=:)/

You could also use match:
'GENEINFO=AGRN:'.match(/GENEINFO=(.*?):/)[1]
#=> "AGRN"
Which could also be written using the String#[] method:
'GENEINFO=AGRN:'[/GENEINFO=(.*?):/, 1]
#=> "AGRN"

"GENEINFO=AGRN:"[/(?<==).*(?=:)/]
# => "AGRN"

You want a lookbehind and lookahead.
pattern = /(?<=GENEINFO=)(.*?)(?=:)/
value = "GENEINFO=AGRN:".scan(pattern)
#=> [["AGRN"]]
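
The answers above behave differently once a line holds more than one field. A minimal sketch, assuming a made-up sample line (SAMPLE_LINE and both helper names are hypothetical, not from the question): the pattern anchored on GENEINFO= with a lazy quantifier stops at the first colon, while the bare lookbehind on "=" with a greedy .* runs on to the last colon.

```ruby
# Hypothetical sample line with two fields; not taken from the question.
SAMPLE_LINE = "GENEINFO=AGRN:375790;OTHER=x:y"

# Anchored on the key and lazy, so it stops at the first colon.
def gene_name(line)
  line[/GENEINFO=(.*?):/, 1]
end

# Bare lookbehind on "=" with a greedy .* runs to the last colon.
def greedy_value(line)
  line[/(?<==).*(?=:)/]
end

puts gene_name(SAMPLE_LINE)    # AGRN
puts greedy_value(SAMPLE_LINE) # AGRN:375790;OTHER=x
```

String#[] returns nil when nothing matches, so the one-liner is safe to call on lines that lack the field.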

Related

Ruby regular expression to get users names after # symbol

How can I get the username without the # symbol?
That's everything between # and any non-word character.
message = <<-MESSAGE
From #victor with love,
To #andrea,
and CC goes to #ghost
MESSAGE
Using a Ruby regular expression, I tried
username_pattern = /#\w+/
I would like to get the following output
message.scan(username_pattern)
#=> ["victor", "andrea", "ghost"]
Use a lookbehind:
(?<=#)\w+
This leaves the # symbol out of the match.
I would go with:
message.scan(/(?<=#)\w+/)
#=> ["victor","andrea","ghost"]
You might want to read about look-behind regexp.
You could match the # and then capture one or more times a word character in a capturing group
#(\w+)
username_pattern = /#(\w+)/
Try this
irb(main):010:0> message.scan(/#(\w+)/m).flatten
=> ["victor", "andrea", "ghost"]
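
Both approaches from this thread can be put side by side. This is only a sketch: the method names are made up for illustration, and the squiggly heredoc (<<~) assumes Ruby 2.3 or later.

```ruby
MESSAGE = <<~TEXT
  From #victor with love,
  To #andrea,
  and CC goes to #ghost
TEXT

# Lookbehind: the # is asserted but is not part of the match.
def usernames_lookbehind(text)
  text.scan(/(?<=#)\w+/)
end

# Capture group: scan returns one array per match, so flatten the result.
def usernames_capture(text)
  text.scan(/#(\w+)/).flatten
end

p usernames_lookbehind(MESSAGE)
p usernames_capture(MESSAGE)
```

The lookbehind version avoids the flatten call; the capture-group version may be easier to read for people less familiar with lookaround.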

Can I use positive lookbehind to return a match in Ruby?

Suppose that I want to find all words in a given string that start with b and end with ing. However, I only want to return the portion of the word that precedes the ing. Thus, if the word is bailing, I should only match and return bail.
The below Ruby regex will certainly match:
\bb[a-zA-Z]*ing\b
but it doesn't return just the "bail" portion. Can I use some kind of lookahead or lookbehind assertion? If not, what is a good way to do this in Ruby?
words = "booster bailings balling failing"
words.scan /(?<=\b)b\w*?(?=ing\b)/
#⇒ ["ball"]
Here are two ways to extract the desired information.
str = "blathering fumbling blinging bérgering blings"
str.scan(/\bb[[:alpha:]]*(?=ing\b)/)
#=> ["blather", "bling", "bérger"]
str.scan(/\b(b[[:alpha:]]*)ing\b/).flatten
#=> ["blather", "bling", "bérger"]
whereas
str.scan(/\bb[a-zA-Z]*(?=ing\b)/)
#=> ["blather", "bling"]
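
Applied to the asker's own example word, the lookahead form can be wrapped as below. This is a sketch only; b_ing_stems is a hypothetical helper name and the input sentence is made up.

```ruby
# Returns the part before "ing" for every word that starts with b
# and ends with ing. The lookahead checks for "ing" without consuming it.
def b_ing_stems(text)
  text.scan(/\bb[[:alpha:]]*(?=ing\b)/)
end

p b_ing_stems("bailing boating sing bring")
```

Note that a short word such as "bring" still qualifies and yields "br", since the pattern sets no minimum stem length.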

What would the regex be to detect this string?

I need to determine if a given string has the sequence dash-alpha-alpha-dash.
Example strings:
114888-ZV-209897
409-II-224858
86296-MO-184080
2459-ND-217906
What would be the regex to determine that?
I'm using Ruby 1.9.3, FWIW.
if subject =~ /-[A-Z]{2}-/
# Successful match
else
# Match attempt failed
end
That [A-Z] thingy is a character class.
It's a simple pattern:
/-[A-Z]{2}-/
will do it.
Your regex is available at: http://rubular.com/r/6hn8BLc7rF
For instance:
"114888-ZV-209897"[/-[A-Z]{2}-/]
=> "-ZV-"
So use:
if "114888-ZV-209897"[/-[A-Z]{2}-/] ...
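
If only a true/false answer is needed, the check can be wrapped in a predicate. A minimal sketch, assuming Ruby 2.4 or later for Regexp#match? (on 1.9.3, as in the question, =~ does the same job); the constant and method names are hypothetical.

```ruby
DASH_ALPHA_ALPHA_DASH = /-[A-Z]{2}-/

# True when the string contains dash, two uppercase letters, dash.
def dash_code?(string)
  DASH_ALPHA_ALPHA_DASH.match?(string)
end

%w[114888-ZV-209897 409-II-224858 114888-Z-209897].each do |sample|
  puts "#{sample}: #{dash_code?(sample)}"
end
```

The character class [A-Z] only covers uppercase ASCII letters; adding the i flag would be one way to accept lowercase as well.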

Replacing regex capture with the same capture and an extra string

I am trying to escape certain characters in a string. In particular, I want to turn
abc/def.ghi into abc\/def\.ghi
I tried to use the following syntax:
1.9.3p125 :076 > "abc/def.ghi".gsub(/([\/.])/, '\\\1')
=> "abc\\1def\\1ghi"
Hmm. This behaves as if capture replacements didn't work. Yet, when I tried this:
1.9.3p125 :075 > "abc/def.ghi".gsub(/([\/.])/, '\1')
=> "abc/def.ghi"
... I got the replacement to work, but, of course, my prefixes weren't part of it.
What is the correct syntax to do something like this?
This should be easier
gsub(/(?=[.\/])/, "\\")
If you are trying to prepare a string to be used as a regex pattern, use the right tool:
Regexp.escape('abc/def.ghi')
=> "abc/def\\.ghi"
You can then use the resulting string to create a regex:
/#{ Regexp.escape('abc/def.ghi') }/
=> /abc\/def\.ghi/
or:
Regexp.new(Regexp.escape('abc/def.ghi'))
=> /abc\/def\.ghi/
From the docs:
Escapes any characters that would have special meaning in a regular expression. Returns a new escaped string, or self if no characters are escaped. For any string, Regexp.new(Regexp.escape(str))=~str will be true.
Regexp.escape('\*?{}.') #=> \\\*\?\{\}\.
You can pass a block to gsub:
>> "abc/def.ghi".gsub(/([\/.])/) {|m| "\\#{m}"}
=> "abc\\/def\\.ghi"
Not nearly as elegant as @sawa's answer, but it was the only way I could find to get it to work if you need the replacing string to contain the captured group/backreference (rather than inserting the replacement before the look-ahead).
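
For the string-replacement form the question originally tried, the backslash needs one more level of escaping. A sketch of both variants follows; the method names are made up for illustration.

```ruby
# Block form: the block's return value is inserted as-is,
# so gsub does no backreference processing on it.
def escape_with_block(string)
  string.gsub(/[\/.]/) { |char| "\\#{char}" }
end

# String form: the single-quoted literal holds three backslashes and a 1.
# gsub reads the first two as one literal backslash and \1 as group 1.
def escape_with_backreference(string)
  string.gsub(/([\/.])/, '\\\\\1')
end

puts escape_with_block("abc/def.ghi")         # abc\/def\.ghi
puts escape_with_backreference("abc/def.ghi") # abc\/def\.ghi
```

The attempt in the question used '\\\1', which Ruby reads as two backslashes and a 1; gsub then sees an escaped backslash followed by a plain 1, which is why the output contained \1 rather than the captured character.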

Remove character from string if it starts with that character?

How can I remove the very first "1" from any string if that string starts with a "1"?
"1hello world" => "hello world"
"112345" => "12345"
I'm thinking of doing
string.sub!('1', '') if string =~ /^1/
but I'm wondering if there's a better way. Thanks!
Why not just include the regex in the sub! method?
string.sub!(/^1/, '')
As of Ruby 2.5 you can use delete_prefix or delete_prefix! to achieve this in a readable manner.
In this case "1hello world".delete_prefix("1").
More info here:
https://blog.jetbrains.com/ruby/2017/10/10-new-features-in-ruby-2-5/
https://bugs.ruby-lang.org/issues/12694
'invisible'.delete_prefix('in') #=> "visible"
'pink'.delete_prefix('in') #=> "pink"
N.B. you can also use this to remove items from the end of a string with delete_suffix and delete_suffix!
'worked'.delete_suffix('ed') #=> "work"
'medical'.delete_suffix('ed') #=> "medical"
https://bugs.ruby-lang.org/issues/13665
I've answered in a little more detail (with benchmarks) here: What is the easiest way to remove the first character from a string?
If you're going to use a regex for the match, you may as well use it for the replacement:
string.sub!(%r{^1},"")
BTW, the %r{} is just an alternate syntax for regular expressions. You can use %r followed by any character e.g. %r!^1!.
Be careful with sub!(/^1/, ''): if the string doesn't match /^1/, it returns nil. You should probably use sub (without the bang).
This answer might be more optimised: What is the easiest way to remove the first character from a string?
string[0] = '' if string[0] == '1'
I'd like to post a tiny improvement to the otherwise excellent answer by Zach. The ^ matches the beginning of every line in Ruby regex. This means there can be multiple matches per string. Kenji asked about the beginning of the string which means they have to use this regex instead:
string.sub!(/\A1/, '')
With a multi-line string, /^1/ can match at the start of every line, while /\A1/ matches only at the very start of the string.
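
The difference between the two anchors, and the nil return of sub!, can be seen in a short sketch. The method names and the sample string are hypothetical.

```ruby
# Removes a leading "1" only at the very start of the string.
def strip_leading_one(string)
  string.sub(/\A1/, '')
end

# Same idea with ^, which also matches right after any newline.
def strip_line_leading_one(string)
  string.sub(/^1/, '')
end

multi_line = "hello\n1world"
p strip_leading_one(multi_line)      # "hello\n1world"
p strip_line_leading_one(multi_line) # "hello\nworld"

# sub! returns nil when nothing was replaced, so avoid chaining on it.
p "hello".dup.sub!(/\A1/, '')        # nil
```

Using the non-bang sub in the helpers means they always return a string, whether or not a replacement happened.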
