Regex to select all the commas from string that do not have any white space around them - ruby

I want to select all the commas in a string that do not have any whitespace around them. Suppose I have this string:
"He,she, They"
I want to select only the comma between He and she. I tried this in Rubular and came up with this regex:
(,[^(,\s)(\s,)])
This selects the comma I want, but it also selects the s that follows it.

In your regex (,[^(,\s)(\s,)]) you capture a comma followed by a negated character class, which matches any single character that is not one of the characters listed inside the brackets. Parentheses have no special meaning inside a character class and repeated characters are ignored, so it could also be written as (,[^)(,\s]). That is why it captures, for example, ,s in a group.
What you could do instead is use a positive lookbehind and a positive lookahead to check that what is on the left and what is on the right of the comma is a non-whitespace character \S:
(?<=\S),(?=\S)
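A minimal Ruby sketch of the lookaround approach, using the string from the question. The helper name match_indices is hypothetical; it relies on the common enum_for(:scan, ...) idiom so that Regexp.last_match is available for each match.

```ruby
# Comma with a non-whitespace character on each side; the lookarounds
# check the neighbours without including them in the match
COMMA_NO_SPACE = /(?<=\S),(?=\S)/

str = "He,she, They"

# scan returns only the matched text, so each match is just ","
p str.scan(COMMA_NO_SPACE)            # => [","]

# Hypothetical helper: the start index of every match
def match_indices(s, re)
  s.enum_for(:scan, re).map { Regexp.last_match.begin(0) }
end

p match_indices(str, COMMA_NO_SPACE)  # => [2]

# Because only the comma itself is matched, it can be replaced directly
p str.gsub(COMMA_NO_SPACE, ", ")      # => "He, she, They"
```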

In Ruby, you may use [[:space:]] to match any (Unicode) whitespace and [^[:space:]] to match any char other than whitespace. Using these character classes inside lookarounds solves the problem:
/(?<=[^[:space:]]),(?=[^[:space:]])/
See the Rubular demo
Here,
(?<=[^[:space:]]) - a positive lookbehind that matches a location that is immediately preceded by a non-whitespace char (if the string start position should also be matched, replace with (?<![[:space:]]))
, - a comma
(?=[^[:space:]]) - a positive lookahead that matches a location that is immediately followed by a non-whitespace char (if the string end position should also be matched, replace with (?![[:space:]])).
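A small sketch of why [[:space:]] can matter, assuming UTF-8 strings: in Ruby, \s only covers ASCII whitespace, while [[:space:]] is Unicode-aware. The sample strings (one with a NO-BREAK SPACE, one with commas at both ends) are made up for illustration, and the last pattern is the negative-lookaround variant mentioned in the list above.

```ruby
ASCII_VERSION   = /(?<=\S),(?=\S)/
UNICODE_VERSION = /(?<=[^[:space:]]),(?=[^[:space:]])/
EDGE_VERSION    = /(?<![[:space:]]),(?![[:space:]])/

# "x", NO-BREAK SPACE (U+00A0), ",", "y"
nbsp = "x\u00A0,y"

# \s covers ASCII whitespace only, so \S accepts U+00A0
p nbsp.scan(ASCII_VERSION)    # => [","]
# [[:space:]] treats U+00A0 as whitespace
p nbsp.scan(UNICODE_VERSION)  # => []

# The negative-lookaround variant also accepts commas at the very start or end
edges = ",a, b,"
p edges.scan(UNICODE_VERSION).size  # => 0
p edges.scan(EDGE_VERSION).size     # => 2
```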

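Check the regex below and try the code; I hope it helps!
re = /[^\s](,)[^\s]/
str = 'check ,my,domain, qwe,sd'
# Print each captured comma (match is the array of capture groups).
# The characters on either side of the comma are consumed by the match,
# so in a string like "a,b,c" only the first comma is found.
str.scan(re) do |match|
  puts match.first
end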
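A quick comparison, on a made-up string with adjacent commas, of the consuming pattern above and the lookaround pattern from the earlier answer. It suggests the lookaround version is the safer choice when commas can be one character apart.

```ruby
CONSUMING  = /[^\s](,)[^\s]/
LOOKAROUND = /(?<=\S),(?=\S)/

adjacent = "a,b,c"

# "a,b" is consumed by the first match, so the "b" is no longer available
# as the left neighbour of the second comma
p adjacent.scan(CONSUMING).size   # => 1
# Lookarounds only inspect the neighbours, so both commas are found
p adjacent.scan(LOOKAROUND).size  # => 2

# The lookaround pattern matches the comma alone, which keeps gsub simple
p "check ,my,domain, qwe,sd".gsub(LOOKAROUND, ";")  # => "check ,my;domain, qwe;sd"
```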

Related

How do I match a regex in which the next non-space character is not a "/"?

How do I express in regex the letter "s" whose next non-space character is not a "/"?
These should match: "s", "str"
These should not: "s/m", "s /n"
I tried this
"str" =~ /s[^[[:space:]]]^\// #=> nil
but it does not even match the simple use case.
It seems you need to match any s that is not followed by zero or more whitespace chars and then a /.
Use
/s(?![[:space:]]*\/)/
See the Rubular demo.
Details
s - the letter s
(?![[:space:]]*\/) - a negative lookahead that fails the match if, immediately to the right of the current location, there are
[[:space:]]* - 0+ whitespaces
\/ - a /.
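A short check of the negative lookahead against the cases listed in the question (String#match? assumes Ruby 2.4 or later):

```ruby
S_NOT_BEFORE_SLASH = /s(?![[:space:]]*\/)/

# Cases from the question: "s" and "str" should match, "s/m" and "s /n" should not
["s", "str", "s/m", "s /n"].each do |t|
  puts "#{t.inspect}: #{t.match?(S_NOT_BEFORE_SLASH)}"
end

# Counting qualifying letters in a longer string
p "sea shells /by the sea s/hore".scan(S_NOT_BEFORE_SLASH).size  # => 3
```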
If you merely want to know the number of 's' characters that are not followed by zero or more spaces and then a forward slash (as opposed to their indices in the string), you don't have to use a regular expression.
"sea shells /by the sea s/hore".delete(" ").gsub("s/", "").count("s")
#=> 3
If you only want to know if there is at least one such 's' you could replace count("s") with include?("s").
I'm not arguing that this is preferable to the use of a regular expression.
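A sketch that wraps both approaches in hypothetical helper methods and compares them. They agree on the sample string, but they can differ on other whitespace: delete(" ") removes only ASCII spaces, whereas [[:space:]] also covers tabs.

```ruby
# Regex-based count (hypothetical helper name)
def count_regex(str)
  str.scan(/s(?![[:space:]]*\/)/).size
end

# String-method count: drop spaces, drop "s/" pairs, count the remaining "s"
def count_no_regex(str)
  str.delete(" ").gsub("s/", "").count("s")
end

sample = "sea shells /by the sea s/hore"
p count_regex(sample)     # => 3
p count_no_regex(sample)  # => 3

# A tab between "s" and "/" is not removed by delete(" ")
tabbed = "s\t/"
p count_regex(tabbed)     # => 0
p count_no_regex(tabbed)  # => 1
```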

Matching the word without space and has to include certain start of the word

I am trying to match
driver. in
def fun
  driver.find_element(:link_text, "Standard Menu Rates").click
  driver.find_element(:id, "jpform:fromStation").send_keys("HOSUR - HSRA")
  #driver.find_element(:id, "jpform:toStation").send_keys("SATUR - SRT")
So I have written the following regular expression:
^driver.
But driver. has some spaces in front of it, so it doesn't match. How can I allow for those spaces while still anchoring driver to the start of the line, so that #driver or any other word does not match?
Input
def fun
  driver.find_element(:link_text, "Standard Menu Rates").click
  driver.find_element(:id, "jpform:fromStation").send_keys("HOSUR - HSRA")
  #driver.find_element(:id, "jpform:toStation").send_keys("SATUR - SRT")
Output
driver.find_element(:link_text, "Standard Menu Rates").click
driver.find_element(:id, "jpform:fromStation").send_keys("HOSUR - HSRA")
I also know how to match the text inside the double quotes, but how would I match the text outside them?
Input
# 0 = {String#3546} "Policy Duration (Days)"
# 1 = {String#3547} "Related Proposal Nr."
Output
# 0 = {String#3546}
# 1 = {String#3547}
As per your comments, you want to match the start of the line, then any number of whitespaces on the same line, then driver and then a dot.
You need to use [[:blank:]]* (it will match any 0+ Unicode horizontal whitespace chars). Note also that the . must be escaped to match a literal dot.
Use
/^[[:blank:]]*driver\./
See the Rubular demo
Details
^ - start of a line
[[:blank:]]* - 0+ horizontal whitespace chars
driver - a literal substring
\. - a dot.
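A minimal sketch applying the pattern line by line to the code from the question. The indentation in the heredoc is an assumption based on the question's description of leading spaces.

```ruby
code = <<~RUBY
  def fun
    driver.find_element(:link_text, "Standard Menu Rates").click
    driver.find_element(:id, "jpform:fromStation").send_keys("HOSUR - HSRA")
    #driver.find_element(:id, "jpform:toStation").send_keys("SATUR - SRT")
RUBY

# ^ anchors at the start of each line; [[:blank:]]* allows leading spaces/tabs
DRIVER_LINE = /^[[:blank:]]*driver\./

# grep keeps the matching lines; strip removes the indentation and newline
matches = code.lines.grep(DRIVER_LINE).map(&:strip)
puts matches
```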
As for the second part, you may remove the trailing "..." substrings from the strings using
s.gsub(/[[:blank:]]*"[^"]*"$/, '')
See this Rubular demo
Alternatively, if you want to match a line part up to the first ", you may use
/^[^"\r\n]+/
See this Rubular demo
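Both options applied to the sample lines from the question. Note that the second option keeps the space before the first quote, so this sketch adds rstrip.

```ruby
lines = [
  '# 0 = {String#3546} "Policy Duration (Days)"',
  '# 1 = {String#3547} "Related Proposal Nr."'
]

# Option 1: delete the trailing quoted part and the blanks before it
stripped = lines.map { |s| s.gsub(/[[:blank:]]*"[^"]*"$/, '') }

# Option 2: keep everything up to the first double quote, then trim the space
prefixes = lines.map { |s| s[/^[^"\r\n]+/].rstrip }

p stripped  # => ["# 0 = {String#3546}", "# 1 = {String#3547}"]
p prefixes  # => ["# 0 = {String#3546}", "# 1 = {String#3547}"]
```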
You can use the regex
^\s*\bdriver\.
where \b represents a word boundary. Check the regex101 demo.
For the second part, you can replace the quoted string with an empty string; what remains is the text you want. See the regex101 demo.
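One caveat worth checking: in Ruby, \s also matches newlines, so ^\s* can start on an earlier blank line and pull the newline into the match. The input string below is made up to show this; [[:blank:]]* stays on one line.

```ruby
src = "def fun\n\n  driver.find_element(:id, \"x\")\n"

with_s     = src.scan(/^\s*\bdriver\./)
with_blank = src.scan(/^[[:blank:]]*driver\./)

p with_s      # => ["\n  driver."]
p with_blank  # => ["  driver."]
```

For a simple yes/no test this makes no difference, but it matters if you use the matched text or its position.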

Regex matching chars around text

I have a string containing special markers, and I would like to match only the markers around a piece of text.
"This is a [1]test[/1] string. And [2]test[/2]"
Rubular http://rubular.com/r/f2Xwe3zPzo
Currently, the code in the link matches the text inside the special chars. How can I change it?
Update
To clarify my question: it should only match if the opening and closing markers have the same number.
"[2]first[/2] [1]second[/2]"
In the code above, only first should match and not second. The text inside the special chars (first) should be ignored.
Try this:
(\[[0-9]\]).+?(\[\/[0-9]\])
Permalink to the example on Rubular.
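A quick check of this pattern against the string from the question's update suggests it does not enforce that the numbers agree: the mismatched [1]...[/2] pair is matched as well.

```ruby
LOOSE = /(\[[0-9]\]).+?(\[\/[0-9]\])/

str = "[2]first[/2] [1]second[/2]"

# Each element holds the opening and closing marker of one match
p str.scan(LOOSE)  # => [["[2]", "[/2]"], ["[1]", "[/2]"]]
```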
Update
Since you want to remove the 'special' characters, try this instead:
foo = "This is a [1]test[/1] string. And [2]test[/2]"
foo.gsub /\[\/?\d\]/, ""
# => "This is a test string. And test"
Update, Part II
You only want to remove the 'special' characters when the surrounding tags match, so what about this:
foo = "This is a [1]test[/1] string. And [2]test[/2], but not [3]test[/2]"
foo.gsub /(?:\[(?<number>\d)\])(?<content>.+?)(?:\[\/\k<number>\])/, '\k<content>'
# => "This is a test string. And test, but not [3]test[/2]"
\[([0-9])\].+?\[\/\1\]
([0-9]) is a capture since it is surrounded with parentheses. The \1 tells it to use the result of that capture. If you had more than one capture, you could reference them as well, \2, \3, etc.
Rubular
You can also use a named capture, rather than \1 to make it a little less cryptic. As in: \[(?<number>[0-9])\].+?\[\/\k<number>\]
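Both forms applied to the string from the question's update; only the pair whose numbers agree is matched. String#scan returns the capture groups, while String#[] returns the whole first match or a named group.

```ruby
NUMBERED = /\[([0-9])\].+?\[\/\1\]/
NAMED    = /\[(?<number>[0-9])\].+?\[\/\k<number>\]/

str = "[2]first[/2] [1]second[/2]"

# scan returns only the capture group, i.e. the digit of each balanced pair
p str.scan(NUMBERED)    # => [["2"]]
p str.scan(NAMED)       # => [["2"]]

# String#[] returns the whole first match, or a named group when given its name
p str[NUMBERED]         # => "[2]first[/2]"
p str[NAMED, "number"]  # => "2"
```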
Here's a way to do it that uses the form of String#gsub that takes a block. The idea is to pull strings such as "[1]test[/1]" into the block, and there remove the unwanted bits.
str = "This is a [1]test[/1] string. And [2]test[/2], plus [3]test[/99]"
r = /
\[ # match a left bracket
(\d+) # capture one or more digits in capture group 1
\] # match a right bracket
.+? # match one or more characters lazily
\[\/ # match a left bracket and forward slash
\1 # match the contents of capture group 1
\] # match a right bracket
/x
str.gsub(r) { |s| s[/(?<=\]).*?(?=\[)/] }
#=> "This is a test string. And test, plus [3]test[/99]"
Aside: When I first heard of named capture groups, they seemed like a great idea, but now I wonder if they really make regexes easier to read than \1, \2 and so on.

Regex remove a first period

I'm trying to remove the period before the "#" symbol in an email address. I have:
array[0][2].gsub(/\./, '').strip
which removes both periods; "an.email#test.com" becomes "anemail#testcom", while I'm looking for it to become "anemail#test.com". I can't remove just the single period by itself. What am I doing wrong?
Whether there are no periods before the # or several of them, you can use this regex:
email = "my.very.long.email#me.com"
email.gsub(/\.(?=[^#]*\#)/, '')
# => "myverylongemail#me.com"
Regex explanation: a period followed by zero or more characters other than #, followed by a #.
If only the first period before the # has to be removed, you can use the same regex with sub instead of gsub.
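The gsub and sub variants side by side, using the address from the question and the one from this answer:

```ruby
PERIOD_BEFORE_HASH = /\.(?=[^#]*\#)/

# gsub removes every period that has a # somewhere after it
p "an.email#test.com".gsub(PERIOD_BEFORE_HASH, '')          # => "anemail#test.com"
p "my.very.long.email#me.com".gsub(PERIOD_BEFORE_HASH, '')  # => "myverylongemail#me.com"

# sub removes only the first such period
p "my.very.long.email#me.com".sub(PERIOD_BEFORE_HASH, '')   # => "myvery.long.email#me.com"
```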
result = subject.gsub(/\.(?=\S+#)/, '')
Explanation
\. matches a period
the (?=\S+#) lookahead asserts that what follows is one or more non-whitespace chars followed by a #
we replace with the empty string
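The two lookaheads behave the same on a single address, but they can differ when a string holds several space-separated addresses. The input below is made up for illustration: [^#]* can run across the space into the next address, while \S+ cannot.

```ruby
text = "a.b#x.com c.d#y.com"

# The period in "x.com" is removed because a later # exists further along
p text.gsub(/\.(?=[^#]*\#)/, '')  # => "ab#xcom cd#y.com"
# \S+ stops at the space, so the domain part keeps its period
p text.gsub(/\.(?=\S+#)/, '')     # => "ab#x.com cd#y.com"
```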
Reference
Lookahead and Lookbehind Zero-Length Assertions
Mastering Lookahead and Lookbehind
Don't make this more complicated by trying to make it short. Just write it the way you mean it:
a, b = address.split('#')
cleaned = [a.delete('.'), b].join('#')

how to remove leading and trailing non-alphabetic characters in ruby

I want to remove any leading and trailing non-alphabetic character in my string.
For example, from ":----- pt-br:-" I want "pt-br".
Thanks
result = subject.gsub(/\A[\d_\W]+|[\d_\W]+\Z/, '')
will remove non-letters from the start and end of the string.
\A and \Z anchor the regex at the start/end of the string (^/$ would also match after/before a newline which is probably not what you want - but that might not matter in this case);
[\d_\W]+ matches one or more digits, the underscore or anything else that is not an alphanumeric character, leaving only letters.
| is the alternation operator.
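A sketch of this regex on a few inputs. One caveat, under the assumption of UTF-8 strings: Ruby's \W is ASCII-only, so an accented letter counts as a non-word character. The [^[:alpha:]] variant below is a swapped-in alternative, not part of the answer, that treats Unicode letters as letters.

```ruby
ASCII_TRIM   = /\A[\d_\W]+|[\d_\W]+\Z/
UNICODE_TRIM = /\A[^[:alpha:]]+|[^[:alpha:]]+\Z/

p ":----- pt-br:-".gsub(ASCII_TRIM, '')  # => "pt-br"
p "123abc_456".gsub(ASCII_TRIM, '')      # => "abc"

# \W matches the non-ASCII letter, so the final letter is trimmed too
p "caf\u00e9!".gsub(ASCII_TRIM, '')      # => "caf"
p "caf\u00e9!".gsub(UNICODE_TRIM, '')    # => "café"
```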
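In Ruby 1.9.1:
":----- pt-br:-".partition( /[a-zA-Z](...)[a-zA-Z]/ )[1]
partition searches for the pattern in the string and returns the part before it, the match, and the part after it. Note that (...) matches exactly three characters, so as written this only works when the text between the first and last letters is three characters long, as in pt-br.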
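A sketch comparing the fixed-width pattern with a variable-width alternative, /[a-zA-Z].*[a-zA-Z]/ (not from the answer), on a longer made-up string. The variant still needs at least two letters; without a match, partition returns the whole string followed by two empty strings.

```ruby
FIXED_WIDTH = /[a-zA-Z](...)[a-zA-Z]/
ANY_WIDTH   = /[a-zA-Z].*[a-zA-Z]/

# partition returns [before, match, after]
p ":----- pt-br:-".partition(FIXED_WIDTH)     # => [":----- ", "pt-br", ":-"]

# The fixed-width pattern stops after five characters
p "--hello world--".partition(FIXED_WIDTH)[1]  # => "hello"
# Greedy .* backtracks to the last letter
p "--hello world--".partition(ANY_WIDTH)[1]    # => "hello world"
```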
result = subject.gsub(/^[^a-zA-Z]+/, '').gsub(/[^a-zA-Z]+$/, '')
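Because this version uses ^ and $, it trims every line of a multi-line string rather than only the ends of the whole string. A made-up two-line input shows the difference from the \A/\Z answer:

```ruby
text = "--a--\n--b--"

# ^ and $ match at every line boundary, so each line is trimmed
per_line = text.gsub(/^[^a-zA-Z]+/, '').gsub(/[^a-zA-Z]+$/, '')
# \A and \Z only match at the start and end of the whole string
whole    = text.gsub(/\A[\d_\W]+|[\d_\W]+\Z/, '')

p per_line  # => "a\nb"
p whole     # => "a--\n--b"
```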
