GSUB replace 3 or more repeating characters - ruby

A little help would be appreciated with a gsub regex in ruby. I need to replace 3 or more forward slashes "//////" with just 2 forward slashes "//" in a string of text. However, a single forward slash and double forward slashes should be skipped and left as is.
my data looks like this jeep/grand cherokee////////hyundai/////harley davidson//bmw and should be converted to jeep/grand cherokee//hyundai//harley davidson//bmw
I don't have much experience with using gsub regex's, here's a few things I've tried but they either strip out all forward slashes or add in too many.
vehicles = vehicles.gsub(/[\/\\1{3,}]/, "")
vehicles = vehicles.gsub(/[\/\2+]/, "//")
vehicles = vehicles.gsub(/[\/{3,}]/,"//")

When you enclose the whole pattern inside square brackets, you make it match a single char.
Your regexps mean:
[\/\\1{3,}] - a single char, /, \, 1, {, 3, , or }
[\/\2+] - /, \u0002 char or +
[\/{3,}] - /, {, 3, , or }
You can use
s.gsub(/\/{3,}/, '//')
See the Ruby demo online.

Related

How do I match a regex in which the next non-space character is not a "/"?

How do I express in regex the letter "s" whose next non-space character is not a "/"?
These should match: "s", "str"
These should not: "s/m", "s /n"
I tried this
"str" =~ /s[^[[:space:]]]^\// #=> nil
but it does not even match the simple use case.
It seems you need to match any s that is not followed with any 0+ whitespace chars and a / after them.
Use
/s(?![[:space:]]*\/)/
See the Rubular demo.
Details
s - the letter s
(?![[:space:]]*\/) - a negative lookahead that fails the match if, immediately to the right of the current location, there are
[[:space:]]* - 0+ whitespaces
\/ - a /.
If you merely want to know the number of 's' characters that are not followed by zero or more spaces and then a forward slash (as opposed to their indices in the string), you don't have to use a regular expression.
"sea shells /by the sea s/hore".delete(" ").gsub("s/", "").count("s")
#=> 3
If you only want to know if there is at least one such 's' you could replace count("s") with include?("s").
I'm not arguing that this is preferable to the use of a regular expression.

Regular Expression replacement to convert Less mixins to Scss

I'm looking to convert Less mixin calls to their equivalents in Scss:
.mixin(); should become #mixin();
.mixin(0); should become #mixin(0);
.mixin(0; 1; 2); should become #mixin(0, 1, 2);
I'm having the most difficulty with the third example, as I essentially need to match n groups separated by semicolons, and replace those with the same groups separated by commas. I suppose this relies on some sort of repeating groups functionality in regexes that I'm not familiar with.
It's not simply enough to simply replace semicolons within paren - I need a regex that will only match the \.[\w\-]+\(.*\) format of mixins, but obviously with some magic in the second match group to handle the 3rd example above.
I'm doing this in Ruby, so if you're able to provide replacement syntax that's compatible with gsub, that would be awesome. I would like a single regex replacement, something that doesn't require multiple passes to clean up the semicolons.
I suggest adding two capturing groups round the subvalues you need and using an additional gsub in the first gsub block to replace the ; with , only in the 2nd group.
See
s = ".mixin(0; 1; 2);"
puts s.gsub(/\.([\w\-]+)(\(.*\))/) { "##{$1}#{$2.gsub(/;/, ',')}" }
# => #mixin(0, 1, 2);
The pattern details:
\. - a literal dot
([\w\-]+) - Group 1 capturing 1 or more word chars ([a-zA-Z0-9_]) or -
(\(.*\)) - Group 2 capturing a (, then any 0+ chars other than linebreak symbols as many as possible up to the last ) and the last ). NOTE: if there are multiple values, use lazy matching - (\(.*?\)) - here.
Here you go:
less_style = ".mixin(0; 1; 2);"
# convert the first period to #
less_style.gsub! /^\./, '#'
# convert the inner semicolons to commas
scss_style = less_style.gsub /(?<=[\(\d]);/, ','
scss_style
# => "#mixin(0, 1, 2);"
The second regex is using positive lookbehinds. You can read about those here: http://www.regular-expressions.info/lookaround.html
I also use this neat web app to play around with regexes: http://rubular.com/
This will get you a single pass through gsub:
".mixin(0; 1; 2);".gsub(/(?<!\));|\./, ";" => ",", "." => "#")
=> "#mixin(0, 1, 2);"
It's an OR regex with a hash for the replacement parameters.
Assuming from your example that you just want to replace semicolons not following close parens(negative lookbehind): (?<!\));
You can modify/build on this with other expressions. Even add more OR conditions to the regex.
Also, you can use the block version of gsub if you need more options.

How exactly does this work string.split(/\?|\.|!/).size?

I know, or at least I think I know, what this does (string.split(/\?|\.|!/).size); splits the string at every ending punctuation into an array and then gets the size of the array.
The part I am confused with is (/\?|\.|!/).
Thank you for your explanation.
Regular expressions are surrounded by slashes / /
The backslash before the question mark and dot means use those characters literally (don't interpret them as special instructions)
The vertical pipes are "or"
So you have / then question mark \? then "or" | then period \. then "or" | then exclamation point ! then / to end the expression.
/\?|\.|!/
It's a Regular Expression. That particular one matches any '?', '.' or '!' in the target string.
You can learn more about them here: http://regexr.com/
A regular expression splitting on the char "a" would look like this: /a/. A regular expression splitting on "a" or "b" is like this: /a|b/. So splitting on "?", "!" and "." would look like /?|!|./ - but it does not. Unfortunately, "?", and "." have special meaning in regexps which we do not want in this case, so they must be escaped, using "\".
A way to avoid this is to use Regexp.union("?","!",".") which results in /\?|!|\./
(/\?|\.|!/)
Working outside in:
The parentheses () captures everything enclosed.
The // tell Ruby you're using a Regular Expression.
\? Matches any ?
\. Matches any .
! Matches any !
The preceding \ tells Ruby we want to find these specific characters in the string, rather than using them as special characters.
Special characters (that need to be escaped to be matched) are:
. | ( ) [ ] { } + \ ^ $ * ?.
There is a nice guide to Ruby RegEx at:
http://rubular.com/ & http://www.tutorialspoint.com/ruby/ruby_regular_expressions.htm
For SO answers that involve regular expressions, I often use the "extended" mode, which makes them self-documenting. This one would be:
r = /
\? # match a question mark
| # or
\. # match a period
| # or
! # match an explamation mark
/x # extended mode
str = "Out, damn'd spot! out, I say!—One; two: why, then 'tis time to " +
"do't.—Hell is murky.—Fie, my lord, fie, a soldier, and afeard?"
str.split(r)
#=> ["Out, damn'd spot",
# " out, I say",
# "—One; two: why, then 'tis time to do't",
# "—Hell is murky",
# "—Fie, my lord, fie, a soldier, and afeard"]
str.split(r).size #=> 5
#steenslag mentioned Regexp::union. You could also use Regexp::new to write (with single quotes):
r = Regexp.new('\?|\.|!')
#=> /\?|\.|!/
but it really doesn't buy you anything here. You might find it useful in other situations, however.

Ruby Regex Match Between "foo" and "bar"

I have unfortunately wandered into a situation where I need regex using Ruby. Basically I want to match this string after the underscore and before the first parentheses. So the end result would be 'table salt'.
_____ table salt (1) [F]
As usual I tried to fight this battle on my own and with rubular.com. I got the first part
^_____ (Match the beginning of the string with underscores ).
Then I got bolder,
^_____(.*?) ( Do the first part of the match, then give me any amount of words and letters after it )
Regex had had enough and put an end to that nonsense and crapped out. So I was wondering if anyone on stackoverflow knew or would have any hints on how to say my goal to the Ruby Regex parser.
EDIT: Thanks everyone, this is the pattern I ended up using after creating it with rubular.
ingredientNameRegex = /^_+([^(]*)/;
Everything got better once I took a deep breath, and thought about what I was trying to say.
str = "_____ table salt (1) [F]"
p str[ /_{3}\s(.+?)\s+\(/, 1 ]
#=> "table salt"
That says:
Find at least three underscores
and a whitespace character (\s)
and then one or more (+) of any character (.), but as little as possible (?), up until you find
one or more whitespace characters,
and then a literal (
The parens in the middle save that bit, and the 1 pulls it out.
Try this: ^[_]+([^(]*)\(
It will match lines starting with one or more underscores followed by anything not equal to an opening bracket: http://rubular.com/r/vthpGpVr4y
Here's working regex:
str = "_____ table salt (1) [F]"
match = str.match(/_([^_]+?)\(/)
p match[1].strip # => "table salt"
You could use
^_____\s*([^(]+?)\s*\(
^_____ match the underscore from the beginning of string
\s* matches any whitespace character
( grouping start
[^(]+ matches all non ( character at least once
? matches the shortest possible string (non greedy)
) grouping end
\s* matches any whitespace character
\( find the (
"_____ table salt (1) [F]".gsub(/[_]\s(.+)\s\(/, ' >>>\1<<< ')
# => "____ >>>table salt<<< 1) [F]"
It seems to me the simplest regex to do what you want is:
/^_____ ([\w\s]+) /
That says:
leading underscores, space, then capture any combination of word chars or spaces, then another space.

How to remove the first 4 characters from a string if it matches a pattern in Ruby

I have the following string:
"h3. My Title Goes Here"
I basically want to remove the first four characters from the string so that I just get back:
"My Title Goes Here".
The thing is I am iterating over an array of strings and not all have the h3. part in front so I can't just ditch the first four characters blindly.
I checked the docs and the closest thing I could find was chomp, but that only works for the end of a string.
Right now I am doing this:
"h3. My Title Goes Here".reverse.chomp(" .3h").reverse
This gives me my desired output, but there has to be a better way. I don't want to reverse a string twice for no reason. Is there another method that will work?
To alter the original string, use sub!, e.g.:
my_strings = [ "h3. My Title Goes Here", "No h3. at the start of this line" ]
my_strings.each { |s| s.sub!(/^h3\. /, '') }
To not alter the original and only return the result, remove the exclamation point, i.e. use sub. In the general case you may have regular expressions that you can and want to match more than one instance of, in that case use gsub! and gsub—without the g only the first match is replaced (as you want here, and in any case the ^ can only match once to the start of the string).
You can use sub with a regular expression:
s = 'h3. foo'
s.sub!(/^h[0-9]+\. /, '')
puts s
Output:
foo
The regular expression should be understood as follows:
^ Match from the start of the string.
h A literal "h".
[0-9] A digit from 0-9.
+ One or more of the previous (i.e. one or more digits)
\. A literal period.
A space (yes, spaces are significant by default in regular expressions!)
You can modify the regular expression to suit your needs. See a regular expression tutorial or syntax guide, for example here.
A standard approach would be to use regular expressions:
"h3. My Title Goes Here".gsub /^h3\. /, '' #=> "My Title Goes Here"
gsub means globally substitute and it replaces a pattern by a string, in this case an empty string.
The regular expression is enclosed in / and constitutes of:
^ means beginning of the string
h3 is matched literally, so it means h3
\. - a dot normally means any character so we escape it with a backslash
is matched literally

Resources