How to scan for substrings with specific characters in them - ruby

This is a follow-up to this question. How to scan and return a set of words with specific characters in them in Ruby
We want to scan for words starting with a certain set of letters and then return them in an array. Something like this:
b="h ARCabc s and other ARC12".scan(/\w+ARC*\w+/)
and get back:
["ARCabc","ARC12"]
How would I do this (and I know this is very similar to what I asked yesterday)?

Just use the following regex:
\bARC\w*\b
or (to exclude underscores from matching)
\bARC[[:alnum:]]*\b
See regex demo
The regex matches:
\b - a word boundary (ARC at the start of a word only)
ARC - a fixed sequence of characters
\w* - 0 or more letters, digits or underscores. NOTE: if you want to limit the matches to letters and digits only, replace \w* with [[:alnum:]]*.
\b - end of word (trailing) boundary.
See IDEONE demo here (output: ARCabc and ARC12).
NOTE2: If you plan to match Unicode strings, consider using either of the following regexps:
\bARC\p{Word}*\b - this variation will match words with underscores after ARC
\bARC[\p{L}\p{M}\d]*\b - this regex will match words that only have digits and Unicode letters after ARC.
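To see how the two ASCII patterns differ, here is a minimal sketch. The sample string is an assumption: it extends the one from the question with two extra words (xARCy and ARC_1) that are not in the original.

```ruby
# Sample input; "xARCy" and "ARC_1" are added purely for illustration.
str = "h ARCabc s and other ARC12 xARCy ARC_1"

# \w* allows letters, digits and underscores after ARC.
with_underscores = str.scan(/\bARC\w*\b/)

# [[:alnum:]]* allows only letters and digits after ARC.
alnum_only = str.scan(/\bARC[[:alnum:]]*\b/)

p with_underscores  # ["ARCabc", "ARC12", "ARC_1"]
p alnum_only        # ["ARCabc", "ARC12"]
```

"xARCy" is skipped by both patterns because there is no word boundary between x and A. "ARC_1" is skipped by the second pattern because the underscore is a word character, so no trailing \b can be found right after ARC.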

For good readability, you could split the string into words and then select the ones you want:
str = "h ARCabc s and other ARC12"
target = "ARC"
str.split.select { |w| w.include?(target) }
#=> ["ARCabc", "ARC12"]
If the words must begin with target:
str.split.select { |w| w.start_with?(target) }

Related

Regex to select all the commas from string that do not have any white space around them

I want to select all the commas in a string that do not have any white space around. Suppose I have this string:
"He,she, They"
I want to select only the comma between he and she. I tried this in rubular and came up with this regex:
(,[^(,\s)(\s,)])
This selects the comma that I want, but also selects an s which is a character after it.
In your regex (,[^(,\s)(\s,)]) you capture a comma followed by a negated character class, which matches any single character that is not listed in it. It could also be written as (,[^)(,\s]), and it will capture, for example, ,s in a group.
What you could do instead is use a positive lookbehind and a positive lookahead to check that what is on the left and what is on the right is a non-whitespace character (\S):
(?<=\S),(?=\S)
Regex demo
In Ruby, you may use [[:space:]] to match any (Unicode) whitespace and [^[:space:]] to match any char other than whitespace. Using these character classes inside lookarounds solves the problem:
/(?<=[^[:space:]]),(?=[^[:space:]])/
See the Rubular demo
Here,
(?<=[^[:space:]]) - a positive lookbehind that matches a location that is immediately preceded with a non-whitespace char (if the string start position should also be matched, replace with (?<![[:space:]]))
, - a comma
(?=[^[:space:]]) - a positive lookahead that matches a location that is immediately followed with a non-whitespace char (if the string end position should also be matched, replace with (?![[:space:]])).
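As a rough illustration of the lookaround pattern, the sketch below finds the position of each matching comma and then replaces only those commas. The replacement character ";" and the variable names are assumptions made for the example.

```ruby
str = "He,she, They"
re  = /(?<=[^[:space:]]),(?=[^[:space:]])/

# Collect the index of every comma that has no whitespace on either side.
positions = []
str.scan(re) { positions << Regexp.last_match.begin(0) }

# Lookarounds consume nothing, so only the comma itself is replaced.
replaced = str.gsub(re, ";")

p positions  # [2]
p replaced   # "He;she, They"
```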
Try the regex below; I hope it helps:
re = /[^\s](,)[^\s]/m
str = 'check ,my,domain, qwe,sd'
# Print the match result
str.scan(re) do |match|
  puts match.to_s
end

How do I match a regex in which the next non-space character is not a "/"?

How do I express in regex the letter "s" whose next non-space character is not a "/"?
These should match: "s", "str"
These should not: "s/m", "s /n"
I tried this
"str" =~ /s[^[[:space:]]]^\// #=> nil
but it does not even match the simple use case.
It seems you need to match any s that is not followed with any 0+ whitespace chars and a / after them.
Use
/s(?![[:space:]]*\/)/
See the Rubular demo.
Details
s - the letter s
(?![[:space:]]*\/) - a negative lookahead that fails the match if, immediately to the right of the current location, there are
[[:space:]]* - 0+ whitespaces
\/ - a /.
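A quick way to check the pattern is to run it against the four strings from the question. This is only a sketch; the hash name results is an assumption.

```ruby
re = /s(?![[:space:]]*\/)/

samples = ["s", "str", "s/m", "s /n"]

# Map each sample to whether it contains an "s" whose next
# non-space character is not a slash.
results = samples.to_h { |sample| [sample, re.match?(sample)] }

p results
# {"s"=>true, "str"=>true, "s/m"=>false, "s /n"=>false}
```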
If you merely want to know the number of 's' characters that are not followed by zero or more spaces and then a forward slash (as opposed to their indices in the string), you don't have to use a regular expression.
"sea shells /by the sea s/hore".delete(" ").gsub("s/", "").count("s")
#=> 3
If you only want to know if there is at least one such 's' you could replace count("s") with include?("s").
I'm not arguing that this is preferable to the use of a regular expression.

Splitting the content of brackets without separating the brackets ruby

I am currently working on a ruby program to calculate terms. It works perfectly fine except for one thing: brackets. I need to filter the content or at least, to put the content into an array, but I have tried for an hour to come up with a solution. Here is my code:
splitted = term.split(/\(+|\)+/)
I need an array instead of the brackets, for example:
"1-(2+3)" #=>["1", "-", ["2", "+", "3"]]
I already tried this:
/(\((?<=.*)\))/
but it returned:
Invalid pattern in look-behind.
Can someone help me with this?
UPDATE
I forgot to mention, that my program will split the term, I only need the content of the brackets to be an array.
If you need to keep track of the hierarchy of parentheses with arrays, you won't manage it just with regular expressions. You'll need to parse the string word by word, and keep a stack of expressions.
Pseudocode:
Expressions = new stack
Add new array on stack
while word in string:
if word is "(": Add new array on stack
Else if word is ")": Remove the last array from the stack and add it to the (next) last array of the stack
Else: Add the word to the last array of the stack
When exiting the loop, there should be only one array in the stack (if not, you have inconsistent opening/closing parentheses).
Note: If your ultimate goal is to evaluate the expression, you could save time and parse the string in Postfix aka Reverse-Polish Notation.
Also consider using off-the-shelf libraries.
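The pseudocode above could be sketched in Ruby roughly as follows. The method name nest_parentheses and the tokenizer regex (integers plus the characters + - * / and parentheses) are assumptions for illustration; adapt the tokenizer to whatever your terms actually contain.

```ruby
# Turn a flat term into nested arrays, one array per pair of parentheses.
def nest_parentheses(term)
  # Hypothetical tokenizer: integers, the four basic operators, parentheses.
  tokens = term.scan(/\d+|[-+*\/()]/)

  stack = [[]] # the bottom array collects the top-level expression
  tokens.each do |token|
    case token
    when "("
      stack.push([])              # start a new sub-expression
    when ")"
      raise ArgumentError, "unbalanced ')'" if stack.size < 2
      finished = stack.pop        # close the current sub-expression
      stack.last << finished      # and attach it to its parent
    else
      stack.last << token
    end
  end

  raise ArgumentError, "unbalanced '('" unless stack.size == 1
  stack.first
end

p nest_parentheses("1-(2+3)")
# ["1", "-", ["2", "+", "3"]]
```

Nested input such as "(1+(2*3))-4" is handled the same way, since each "(" simply pushes another array.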
A solution depends on the pattern you expect between the parentheses, which you have not specified. (For example, for "(st12uv)" you might want ["st", "12", "uv"], ["st12", "uv"], ["st1", "2uv"] and so on). If, as in your example, it is a natural number followed by a +, followed by another natural number, you could do this:
str = "1-( 2+ 3)"
r = /
\(\s* # match a left parenthesis followed by >= 0 whitespace chars
(\d+) # match one or more digits in a capture group
\s* # match >= 0 whitespace chars
(\+) # match a plus sign in a capture group
\s* # match >= 0 whitespace chars
(\d+) # match one or more digits in a capture group
\s* # match >= 0 whitespace chars
\) # match a right parenthesis
/x
str.scan(r).first
#=> ["2", "+", "3"]
Suppose instead + could be +, -, * or /. Then you could change:
(\+)
to:
([-+*\/])
Note that, in a character class, + needn't be escaped and - needn't be escaped if it is the first or last character of the class (as in those cases it would not signify a range).
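Put together, the pattern with the wider operator class might look like this. It is a compact, single-line sketch of the same regex; the name op_regex and the sample strings are assumptions.

```ruby
# Same structure as before, but the operator group accepts +, -, * or /.
op_regex = /\(\s*(\d+)\s*([-+*\/])\s*(\d+)\s*\)/

product    = "1-( 2* 3)".scan(op_regex).first
difference = "4+(10 - 7)".scan(op_regex).first

p product     # ["2", "*", "3"]
p difference  # ["10", "-", "7"]
```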
Incidentally, you received the error message "Invalid pattern in look-behind" because Ruby's lookbehinds cannot contain unbounded variable-length patterns such as .* (lookaheads have no such restriction). With positive lookbehinds you can get around that by using \K instead. For example,
r = /
\d+ # match one or more digits
\K # forget everything previously matched
[a-z]+ # match one or more lowercase letters
/x
"123abc"[r] #=> "abc"

Regex Replacing Everything But Specific String Regex

Using regex, how would I take a string like "ratings-small star rating-4 field_stars_rating csm_review" and, with gsub, have it return only "rating-4", where 4 could be any digit? Everything I have tried replaces only part of the string.
gsub is the wrong choice here. It would make much more sense to do something like this:
"ratings-small star rating-4 field_stars_rating csm_review".match(/\brating-\d\b/).to_s
Because you're looking for a specific part of the string, it makes more sense to search directly for that.
To just get the number after the hyphen, use this:
"ratings-small star rating-4 field_stars_rating csm_review".match(/\brating-(\d)\b/)[1]
Since you are trying to keep a bit of the string, instead of thinking how you can remove anything else to leave only the interesting bit, you should think how to extract the relevant part of the string. The String#[] method with a regexp argument would be my choice:
string = "ratings-small star rating-4 field_stars_rating csm_review"
string[/\brating-\d\b/]
# => "rating-4"
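String#[] also accepts a capture group index as a second argument, which gives a short way to pull out just the digit. A small sketch (the variable names are assumptions):

```ruby
string = "ratings-small star rating-4 field_stars_rating csm_review"

# Whole match.
full = string[/\brating-\d\b/]

# Second argument 1 returns the first capture group instead of the whole match.
digit = string[/\brating-(\d)\b/, 1]

# Both forms return nil when nothing matches, so no exception is raised.
missing = "no rating here"[/\brating-(\d)\b/, 1]

p full     # "rating-4"
p digit    # "4"
p missing  # nil
```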
Instead of trying to replace everything up to the position of the word or after the position of the digit you want matched, a better approach would be to match that subpattern throughout your string.
string.match(/\b[a-z]+-\d+\b/i)
Explanation:
A word boundary does not consume any characters. It asserts that on one side there is a word character, and on the other side there is not.
\b # the boundary between a word char (\w) and not a word char
[a-z]+ # any character of: 'a' to 'z' (1 or more times)
- # '-'
\d+ # digits (0-9) (1 or more times)
\b # the boundary between a word char (\w) and not a word char
I wouldn't go with pure regex for this as it would make it pretty hard to read:
string = "ratings-small star rating-4 field_stars_rating csm_review"
string.split.select {|s| s =~ /^rating-\d$/}.join(' ')
If you expect only one element:
string[/\brating-\d\b/]

Using Regexp to check whether a string starts with a consonant

Is there a better way to write the following regular expression in Ruby? The first regex matches a string that begins with a (lower case) consonant, the second with a vowel.
I'm trying to figure out if there's a way to write a regular expression that matches the negative of the second expression, versus writing the first expression with several ranges.
string =~ /\A[b-df-hj-np-tv-z]/
string =~ /\A[aeiou]/
The statement
string =~ /\A[^aeiou]/
will test whether the string starts with a non-vowel character, which includes digits, punctuation, whitespace and control characters. That is fine if you know beforehand that the string begins with a letter, but to check that it starts with a consonant you can use two lookaheads to test that it starts with both a letter and a non-vowel, like this
string =~ /\A(?=[^aeiou])(?=[a-z])/i
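The difference between the two tests shows up as soon as the input does not start with a letter. A small sketch, with sample words chosen only for illustration:

```ruby
non_vowel = /\A[^aeiou]/i                  # anything except a vowel
consonant = /\A(?=[^aeiou])(?=[a-z])/i     # a letter that is not a vowel

words = ["ball", "apple", "1word", " space"]

starts_with_non_vowel = words.select { |w| w.match?(non_vowel) }
starts_with_consonant = words.select { |w| w.match?(consonant) }

p starts_with_non_vowel  # ["ball", "1word", " space"]
p starts_with_consonant  # ["ball"]
```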
To match an arbitrary number of consonants, you can use the sub-expression (?i:(?![aeiou])[a-z]) to match a single consonant. It is a self-contained group, so you can put a repetition count like {3} right after it. For example, this program finds all the strings in a list that begin with three consonants in a row (note the \A anchor)
list = %w/ aab bybt xeix axei AAsE SAEE eAAs xxsa Xxsr /
puts list.select { |word| word =~ /\A(?i:(?![aeiou])[a-z]){3}/ }
output
bybt
xxsa
Xxsr
I modified the answer provided by @Alexander Cherednichenko in order to get rid of the if statements.
/^[^aeiou\W]/i.match(s) != nil
If you want to catch a string that starts with a consonant rather than a vowel, you can use the code below. It returns true if a string starts with any letter other than A, E, I, O, U. Here s is any string we pass to the function.
if /^[^aeiou\W]/i.match(s) == nil
  return false
else
  return true
end
The i flag at the end makes the regular expression case insensitive.
\W inside the negated class excludes any non-word character, for example a string that starts with punctuation such as "!something". Digits and underscores are word characters, so a string like "1something" still matches.
[^aeiou] means any character except a, e, i, o, u.
The ^ placed before the [ indicates that the following class [^aeiou\W] applies to the first character of a line.
Note that the ^[^aeiou\W] pattern is not correct because it also matches a line that starts with a digit or an underscore. Borodin's solution works well, but there is one more possible solution without lookaheads, based on character class intersection (&&) and using the more contemporary Regexp#match?:
/\A[a-z&&[^aeiou]]/i.match?(word)
See the Rubular demo.
Details
\A - start of a string (^ in Ruby is start of any line)
[a-z&&[^aeiou]] - an a-z character range matching any ASCII letter (/i flag makes it case insensitive) except for the aeiou chars.
See the Ruby demo:
test = %w/ 1word _word ball area programming /
puts test.select { |w| /\A[a-z&&[^aeiou]]/i.match?(w) }
# => ['ball', 'programming']

Resources