regexp to match string1 unless preceded by string2 - ruby

Using Ruby, how can I use a single regex to match all occurrences of 'y' in "xy y ay xy +y" that are NOT preceded by x (y, ay, +y)?
/[^x]y/ matches the preceding character too, so I need an alternative...

You need a zero-width negative look-behind assertion. Try /(?<!x)y/, which does exactly what you're looking for: it finds every 'y' not preceded by 'x' without including the preceding character in the match (that is the zero-width part).
Edited to add: Apparently this is only supported from Ruby 1.9 and up.
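A minimal sketch of the look-behind in use, assuming Ruby 1.9 or later. The variable names are illustrative only. String#scan with a block is used here to record where each match starts, and the last lines show the same construct with a longer fixed-length prefix ("abb" is just an example).

```ruby
str = "xy y ay xy +y"
pattern = /(?<!x)y/   # a 'y' not immediately preceded by 'x'

# Only the 'y' itself is matched; the look-behind consumes nothing.
matches = str.scan(pattern)          # => ["y", "y", "y"]

# Record the start offset of every match.
offsets = []
str.scan(pattern) { offsets << Regexp.last_match.begin(0) }
# offsets => [3, 6, 12]

# Replace only the matched 'y' characters.
replaced = str.gsub(pattern, "Y")    # => "xy Y aY xy +Y"

# A look-behind may also check several characters, as long as its length is fixed.
long_offsets = []
"abby y crabby bally +y".scan(/(?<!abb)y/) { long_offsets << Regexp.last_match.begin(0) }
# long_offsets => [5, 18, 21]
```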

Negative look-behind is not supported until Ruby 1.9, but you can do something similar using scan:
"xy y ay xy +y".scan(/([^x])(y)/) # => [[" ", "y"], ["a", "y"], ["+", "y"]]
"xy y ay xy +y".scan(/([^x])(y)/).map {|match| match[1]} # => ["y", "y", "y"]
Of course, this is much more difficult if you want to avoid more than a single character before the y. Then you'd have to do something like:
"abby y crabby bally +y".scan(/(.*?)(y)/).reject {|str| str[0] =~ /ab/} # => [[" ", "y"], [" ball", "y"], [" +", "y"]]
"abby y crabby bally +y".scan(/(.*?)(y)/).reject {|str| str[0] =~ /ab/}.map {|match| match[1]} # => ["y", "y", "y"]

Ruby before 1.9 unfortunately doesn't support negative lookbehind, so you'll have trouble if you need to look for more than a single character. For just one character, you can take care of this by capturing the match:
/[^x](y)/
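One caveat with the capturing approach: [^x] has to consume a character, so a 'y' at the very start of the string is not found. The sketch below uses a made-up input to show the gap and one possible workaround, which is to allow the start of the string as an alternative.

```ruby
str = "y xy ay"

# [^x] needs a real character before the 'y', so the leading 'y' is missed.
plain = str.scan(/[^x](y)/).flatten              # => ["y"]

# Accepting the start of the string (\A) as an alternative picks it up.
anchored = str.scan(/(?:\A|[^x])(y)/).flatten    # => ["y", "y"]
```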

In PCRE, you use a negative look-behind:
(?<!x)y
Not sure if this is supported by Ruby, but you can always look it up.

It can be done with negative look behind, (?<!x)y

Related

How to split on a character and keep value

I have a string:
"N8383"
I want to split on the character and maintain it to get:
["N", "8383"]
I tried the following:
"N8383".split(/[A-Z]/)
which gives me:
["", "8383"]
I want to match some more example strings like:
N344 344N S555 555S
String#split is a bad fit for this problem for the reasons others have stated. I would approach it like this, using String#scan instead:
str_parts = "N8383".scan(/[[:alpha:]]+/)
num_parts = "N8383".scan(/[[:digit:]]+/)
This will give you something to work with if the strings contain multiple string parts and/or multiple numeric parts.
This expression:
%w[N344 344N S555 555S].map do |str|
next str.scan(/[[:alpha:]]+/), str.scan(/[[:digit:]]+/)
end
Will return:
[
[["N"], ["344"]],
[["N"], ["344"]],
[["S"], ["555"]],
[["S"], ["555"]]
]
Although you are scanning each string twice, I think it's a better solution than 1. trying to come up with a complex regex that backtracks to return the parts in the right order, or 2. reprocessing the results to put the parts in the right order. Especially if the strings are as short as they are in the examples you've provided. That being said, if scanning each string twice really rankles you, here's another way to do it:
str_parts, num_parts = str.scan(/([[:alpha:]]+)|([[:digit:]]+)/).transpose.each(&:compact!)
Okay given the examples you could use the following regex
/(?=[A-Z])|(?<=[A-Z])/
This will look ahead (?=) for a single character [A-Z] or look behind (?<=) for a single character [A-Z]. Since these are zero-length assertions, the split is placed between the characters rather than consuming a character, e.g.
%w{N8383 N344 344N S555 555S}.map {|s| s.split(/(?=[A-Z])|(?<=[A-Z])/) }
#=> [["N", "8383"], ["N", "344"], ["344", "N"], ["S", "555"], ["555", "S"]]
However, this regex is specific to the given cases and does not handle any real deviation from them. For example, I have no idea what the desired output for "N344S" is, but right now it will be ["N", "344", "S"], and worse yet "NSS344S" will be ["N", "S", "S", "344", "S"].
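Another way to keep the separator, sketched here as an alternative rather than as part of the answer above: when the pattern passed to String#split contains a capture group, the captured text is included in the result. The method name split_keeping_letters is made up for illustration, and grouping runs of letters with [A-Z]+ is an assumption about the desired output.

```ruby
def split_keeping_letters(str)
  # The capture group makes split return the separators as well;
  # a leading empty string appears when the input starts with a letter.
  str.split(/([A-Z]+)/).reject(&:empty?)
end

split_keeping_letters("N8383")    # => ["N", "8383"]
split_keeping_letters("344N")     # => ["344", "N"]
split_keeping_letters("NSS344S")  # => ["NSS", "344", "S"]
```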
def doit(str)
str.scan(/\d+|\p{L}+/)
end
doit "N123" #=> ["N", "123"]
doit "123N" #=> ["123", "N"]
doit "N123M" #=> ["N", "123", "M"]
doit "N12M3P" #=> ["N", "12", "M", "3", "P"]
doit "123" #=> ["123"]
doit "NMN" #=> ["NMN"]
doit "" #=> []

How do I write a regex that matches the beginning of the line or NOT a character?

I'm using Ruby 2.4. How do I write a regular expression that matches a string whose last character is a dash, where that dash is either at the beginning of the line or preceded by a character that is not a dash? So this expression should match
"-"
as should
"ab-"
but this should not
"---"
I tried the below but I'm not matching anything
2.4.0 :012 > word = "abc-"
=> "abc-"
2.4.0 :013 > word =~ /(^|\^\-)\-$/
=> nil
Here is my go at it:
regex = /[^-\A]-\z/
%w(- ab- ---).map { |s| s =~ regex }
=> [nil, 1, nil]
Not 100% sure I got your requirements right, but this does seem to do the trick:
regex = /(^|(?!-).*)-$/
%w(- ab- ---).map { |s| s =~ regex }
#=> [0, 0, nil]
Check it out on Rubular with some test cases.
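A negative look-behind is another possible way to express the requirement, shown here as a sketch rather than a tested drop-in. It passes both when the preceding character is not a dash and when there is no preceding character at all, so "-" on its own matches. The extra input "abc-" is the string from the question's irb session.

```ruby
# A dash at the very end of the string that is not preceded by another dash.
regex = /(?<!-)-\z/

results = %w(- ab- --- abc-).map { |s| s =~ regex }
# => [0, 2, nil, 3]
```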

gsub method and regex (case sensitive and case insensitive)

In ruby, I want to substitute some letters in a string, is there a better way of doing this?
string = "my random string"
string.gsub(/a/, "#").gsub(/i/, "1").gsub(/o/, "0")
And if I want to substitute both "a" and "A" with a "#", I know I can do .gsub(/a/i, "#"), but what if I want to substitute every "a" with an "e" and every "A" with an "E"? Is there a way of abstracting it instead of specifying both like .gsub(/a/, "e").gsub(/A/, "E")?
You can use a Hash. eg:
h = {'a' => '#', 'b' => 'B', 'A' => 'E'}
"aAbBcC".gsub(/[abA]/, h)
# => "#EBBcC"
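To avoid listing both cases by hand, the table can be derived from lowercase-only rules. This is a rough sketch: the lower hash and its contents are hypothetical, and it assumes every key has a distinct uppercase form.

```ruby
lower = { "a" => "e", "i" => "o" }   # hypothetical lowercase-only rules

# Add an uppercase counterpart for every rule.
table = {}
lower.each do |from, to|
  table[from] = to
  table[from.upcase] = to.upcase
end

pattern = Regexp.union(table.keys)
result = "banAna sIti".gsub(pattern, table)   # => "benEne sOto"
```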
Not really an answer to your question, but another way to proceed: use translation (String#tr):
'aaAA'.tr('aA', 'eE')
# => 'eeEE'
For the same transformation, you can also use the ascii table:
'aaAA'.gsub(/a/i) {|c| (c.ord + 4).chr}
# => 'eeEE'
Another example (when the replacement string is shorter, its last character is used for the remaining characters):
'aAaabbXXX'.tr('baA', 'B#')
# => '####BBXXX'
Here are two variants of @Santosh's answer:
str ="aAbBcC"
h = {'a' => '#', 'b' => 'B', 'A' => 'E'}
#1
str.gsub(/[#{h.keys.join}]/, h) #=> "#EBBcC"
#2
h.default_proc = ->(_,k) { k }
str.gsub(/./, h) #=> "#EBBcC"
These offer better maintainability should h change in the future.
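Variant #1 interpolates the keys into a character class, which assumes single-character keys with no special meaning inside brackets. If that assumption might not hold, Regexp.union is a possible alternative, since it escapes each key and also accepts multi-character keys. The hash below is made up to show keys that would be awkward inside [...].

```ruby
h = { "a" => "#", "-" => "_", "]" => ")" }

# Regexp.union escapes each string before joining them with |.
pattern = Regexp.union(h.keys)

result = "a-b]".gsub(pattern, h)   # => "#_b)"
```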
You can also pass gsub a block
"my random string".gsub(/[aoi]/) do |match|
case match; when "a"; "#"; when "o"; "0"; when "i"; "I" end
end
# => "my r#nd0m strIng"
The use of a hash is of course much more elegant in this case, but if you have complex rules of substitution it can come in handy to dedicate a class to it.
"my random string".gsub(/[aoi]/) {|match| Substitute.new(match).result}
# => "my raws0m strAINng"

Scanning through a hash and return a value if true

Based on my hash, I want to match it if it's in the string:
def conv
str = "I only have one, two or maybe sixty"
hash = {:one => 1, :two => 2, :six => 6, :sixty => 60 }
str.match( Regexp.union( hash.keys.to_s ) )
end
puts conv # => <blank>
The above does not work but this only matches "one":
str.match( Regexp.union( hash[0].to_s ) )
Edited:
Any idea how to match "one", "two" and sixty in the string exactly?
If my string has "sixt" it returns "6", and that should not happen based on @Cary's answer.
You need to convert each element of hash.keys to a string, rather than converting the array hash.keys to a string, and you should use String#scan rather than String#match. You may also need to play around with the regex until it returns everything you want and nothing you don't want.
Let's first look at your example:
str = "I only have one, two or maybe sixty"
hash = {:one => 1, :two => 2, :six => 6, :sixty => 60}
We might consider constructing the regex with word breaks (\b) before and after each word we wish to match:
r0 = Regexp.union(hash.keys.map { |k| /\b#{k.to_s}\b/ })
#=> /(?-mix:\bone\b)|(?-mix:\btwo\b)|(?-mix:\bsix\b)|(?-mix:\bsixty\b)/
str.scan(r0)
#=> ["one", "two", "sixty"]
Without the word breaks, scan would return ["one", "two", "six"], as "sixty" in str would match "six". (Word breaks are zero-width. One before a string requires that the string be preceded by a non-word character or be at the beginning of the string. One after a string requires that the string be followed by a non-word character or be at the end of the string.)
Depending on your requirements, word breaks may not be sufficient or suitable. Suppose, for example (with hash above):
str = "I only have one, two, twenty-one or maybe sixty"
and we do not wish to match "twenty-one". However,
str.scan(r0)
#=> ["one", "two", "one", "sixty"]
One option would be to use a regex that demands that matches be preceded by whitespace or be at the beginning of the string, and be followed by whitespace or be at the end of the string:
r1 = Regexp.union(hash.keys.map { |k| /(?<=^|\s)#{k.to_s}(?=\s|$)/ })
str.scan(r1)
#=> ["sixty"]
(?<=^|\s) is a positive lookbehind; (?=\s|$) is a positive lookahead.
Well, that avoided the match of "twenty-one" (good), but we no longer matched "one" or "two" (bad) because of the comma following each of those words in the string.
Perhaps the solution here is to first remove punctuation, which allows us to then apply either of the above regexes:
str.tr('.,?!:;-','')
#=> "I only have one two twentyone or maybe sixty"
str.tr('.,?!:;-','').scan(r0)
#=> ["one", "two", "sixty"]
str.tr('.,?!:;-','').scan(r1)
#=> ["one", "two", "sixty"]
You may also want to change / at the end of the regex to /i to make the match insensitive to case. [1]
[1] Historical note for readers who want to know why 'a' is called lower case and 'A' is called upper case.
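As a possible alternative to stripping punctuation first, the look-arounds can reject a hyphen or word character on either side, and the matched words can then be mapped back to the hash values. This is a sketch under that assumption; r2, found and values are names introduced here for illustration.

```ruby
str  = "I only have one, two, twenty-one or maybe sixty"
hash = { :one => 1, :two => 2, :six => 6, :sixty => 60 }

words = Regexp.union(hash.keys.map(&:to_s))

# Reject a match that touches a word character or a hyphen on either side.
r2 = /(?<![\w\-])#{words}(?![\w\-])/

found  = str.scan(r2)                       # => ["one", "two", "sixty"]
values = found.map { |w| hash[w.to_sym] }   # => [1, 2, 60]
```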

putting enumeration with spaces in rails collection

irb(main):001:0> t = %w{this is a test}
=> ["this", "is", "a", "test"]
irb(main):002:0> t.size
=> 4
irb(main):003:0> t = %w{"this is" a test}
=> ["\"this", "is\"", "a", "test"]
irb(main):004:0> t.size
=> 4
In the end I expected t.size to be 3.
As suggested, each space has to be escaped ...which turns out to be a lot of work. What other options are there? I have a list of about 30 words that I need to put in a collection because I am showing them as checkboxes using simple_form
Why not just use a normal array so no one has to visually parse all the escaping to figure out what's going on? This is pretty clear:
t = [
'this is',
'a',
'test'
]
and the people maintaining your code won't hate you for using %w{} when it isn't appropriate or when they mess things up because they didn't see your escaped whitespace.
You need to escape the space with a '\', like t = %w{this\ is a test}, if you don't want that space to be a splitter.
Escape the space using \:
%w{this\ is a test}
You can escape the space %w{this\ is a test} to get ['this is', 'a', 'test'], but in general I wouldn't use %w unless the intention is to split on whitespace.
As others have pointed out use the %w{} construct when spaces are the separator for the words. If you have items that must be quoted and still want to use the construct you can do:
> %w{a test here}.unshift("This is")
=> ["This is", "a", "test", "here"]
require 'csv'
str = '"this is" a test'
p CSV.parse_line(str, col_sep: ' ')
#=> ["this is", "a", "test"]
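The standard library's Shellwords module is another option that may fit here, since it splits on whitespace while keeping quoted phrases together. The second half of the sketch is a simple assumption about how a longer list could be kept readable: one entry per line, with no quoting or escaping at all.

```ruby
require 'shellwords'

# Shellwords.split follows shell-style quoting, so "this is" stays one item.
items = Shellwords.split('"this is" a test')   # => ["this is", "a", "test"]

# For a longer list, one entry per line avoids quoting altogether.
list = "this is\na\ntest".split("\n")          # => ["this is", "a", "test"]
```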
