Ruby: Understanding Regex - ruby

I am trying to write a script that will match all words in a string and will strip all non words (IE: . [dot], ampersands, colons, etc) out and will replace them with a hyphen.
Example String:
L. L. Cool J & Sons: The Cool Kats
Example Output:
L-L-Cool-J-Sons-The-Cool-Kats
Here is some code I am working with:
str = "L. L. Cool J & Sons: The Cool Kats"
str.scan(/\w+/)
Thanks for all the help! I am still pretty new to regex

In a single line, find all bits of text that aren't "word characters" and replace with a dash:
str.gsub(/\W+/, '-')
Note that "word characters" includes numbers and underscores. To just allow letters you could use the following:
str.gsub(/[^A-Za-z]+/, '-')

Update: I just noticed that the two calls can be expressed as one:
str.gsub(/\W+/, '-')
=> "L-L-Cool-J-Sons-The-Cool-Kats"
...which results to the same as Narendra's answer or my original answer:
# 1st gsub: replace all non-words with hyphens
# 2nd gsub: replace multiple hyphens with a single one
str.gsub(/\W/,'-').gsub(/-+/, '-')
=> "L-L-Cool-J-Sons-The-Cool-Kats"

str.gsub(/\W/, '-') #=> replaces all non-words with space
# OR
str.gsub(/[^A-Za-z\s]/, ' ') #=> replace all non letters/spaces with space
str.gsub(/\s+/, '-') #=> replaces all groups of spaces to hypens
Input:
L. L. Cool J & Sons: The Cool Kats
Output:
L-L-Cool-J-Sons-The-Cool-Kats

You can use \W for non word characters. It should do your job. \w stands for word characters. I don't work much with ruby but it should look something like this
result = subject.gsub(/\W+/, '-') .

Related

Delete all the whitespaces that occur after a word in ruby

I have a string " hello world! How is it going?"
The output I need is " helloworld!Howisitgoing?"
So all the whitespaces after hello should be removed. I am trying to do this in ruby using regex.
I tried strip and delete(' ') methods but I didn't get what I wanted.
some_string = " hello world! How is it going?"
some_string.delete(' ') #deletes all spaces
some_string.strip #removes trailing and leading spaces only
Please help. Thanks in advance!
There are numerous ways this could be accomplished without without a regular expressions, but using them could be the "cleanest" looking approach without taking sub-strings, etc. The regular expression I believe you are looking for is /(?!^)(\s)/.
" hello world! How is it going?".gsub(/(?!^)(\s)/, '')
#=> " helloworld!Howisitgoing?"
The \s matched any whitespace character (including tabs, etc), and the ^ is an "anchor" meaning the beginning of the string. The ! indicates to reject a match with following criteria. Using those together to your goal can be accomplished.
If you are not familiar with gsub, it is very similar to replace, but takes a regular expression. It additionally has a gsub! counter-part to mutate the string in place without creating a new altered copy.
Note that strictly speaking, this isn't all whitespace "after a word" to quote the exact question, but I gathered from your examples that your intentions were "all whitespace except beginning of string", which this will do.
def remove_spaces_after_word(str, word)
i = str.index(/\b#{word}\b/i)
return str if i.nil?
i += word.size
str.gsub(/ /) { Regexp.last_match.begin(0) >= i ? '' : ' ' }
end
remove_spaces_after_word("Hey hello world! How is it going?", "hello")
#=> "Hey helloworld!Howisitgoing?"

How to remove strings that end with a particular character in Ruby

Based on "How to Delete Strings that Start with Certain Characters in Ruby", I know that the way to remove a string that starts with the character "#" is:
email = email.gsub( /(?:\s|^)#.*/ , "") #removes strings that start with "#"
I want to also remove strings that end in ".". Inspired by "Difference between \A \z and ^ $ in Ruby regular expressions" I came up with:
email = email.gsub( /(?:\s|$).*\./ , "")
Basically I used gsub to remove the dollar sign for the carrot and reversed the order of the part after the closing parentheses (making sure to escape the period). However, it is not doing the trick.
An example I'd like to match and remove is:
"a8&23q2aas."
You were so close.
email = email.gsub( /.*\.\s*$/ , "")
The difference lies in the fact that you didn't consider the relationship between string of reference and the regex tokens that describe the condition you wish to trigger. Here, you are trying to find a period (\.) which is followed only by whitespace (\s) or the end of the line ($). I would read the regex above as "Any characters of any length followed by a period, followed by any amount of whitespace, followed by the end of the line."
As commenters pointed out, though, there's a simpler way: String#end_with?.
I'd use:
words = %w[#a day in the life.]
# => ["#a", "day", "in", "the", "life."]
words.reject { |w| w.start_with?('#') || w.end_with?('.') }
# => ["day", "in", "the"]
Using a regex is overkill for this if you're only concerned with the starting or ending character, and, in fact, regular expressions will slow your code in comparison with using the built-in methods.
I would really like to stick to using gsub....
gsub is the wrong way to remove an element from an array. It could be used to turn the string into an empty string, but that won't remove that element from the array.
def replace_suffix(str,suffix)
str.end_with?(suffix)? str[0, str.length - suffix.length] : str
end

how to prepend each letter with a symbol using Regex

I have a string time format like this: d-m-Y H:i. I want to format it like this: %d-%m-%Y %H:%i.
How do I prepend each letter with % using regular expressions?
This is pretty basic using String#gsub:
str = "d-m-Y H:i"
str.gsub(/[a-z]/i, '%\0')
# => "%d-%m-%Y %H:%i"
In the replacement string '%\0', \0 represents the entire match, which in this case is the matched letter, so this says, "Replace each letter with a % followed by the letter."
sorted 'd-m-Y H:i'.gsub(/[a-zA-Z]+/) { |sym| "%#{sym}" }
'd-m-Y H:i'.gsub(/(?=[a-z])/i, '%')
#=> "%d-%m-%Y %H:%i"
This reads, "replace every empty string followed by a lowercase or uppercase letter with the character '%'". (?=[a-z]) is a positive lookahead.

Regex to grab full firstname and first letter of last name

I have a list of users grabbed by the Etc Ruby library:
Thomas_J_Perkins
Jennifer_Scanner
Amanda_K_Loso
Aaron_Cole
Mark_L_Lamb
What I need to do is grab the full first name, skip the middle name (if given), and grab the first character of the last name. The output should look like this:
Thomas P
Jennifer S
Amanda L
Aaron C
Mark L
I'm not sure how to do this, I've tried grabbing all of the characters: /\w+/ but that will grab everything.
You don't always need regular expressions.
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems. Jamie Zawinski
You can do it with some simple Ruby code
string = "Mark_L_Lamb"
string.split('_').first + ' ' + string.split('_').last[0]
=> "Mark L"
I think its simpler without regex:
array = "Thomas_J_Perkins".split("_") # split at _
array.first + " " + array.last[0] # .first prints first name .last[0] prints first char of last name
#=> "Thomas P"
You can use
^([^\W_]+)(?:_[^\W_]+)*_([^\W_])[^\W_]*$
And replace with \1_\2. See the regex demo
The [^\W_] matches a letter or a digit. If you want to only match letters, replace [^\W_] with \p{L}.
^(\p{L}+)(?:_\p{L}+)*_(\p{L})\p{L}*$
See updated demo
The point is to match and capture the first chunk of letters up to the first _ (with (\p{L}+)), then match 0+ sequences of _ + letters inside (with (?:_\p{L}+)*_) and then match and capture the last word first letter (with (\p{L})) and then match the rest of the string (with \p{L}*).
NOTE: replace ^ with \A and $ with \z if you have independent strings (as in Ruby ^ matches the start of a line and $ matches the end of the line).
Ruby code:
s.sub(/^(\p{L}+)(?:_\p{L}+)*_(\p{L})\p{L}*$/, "\\1_\\2")
I'm in the don't-use-a-regex-for-this camp.
str1 = "Alexander_Graham_Bell"
str2 = "Sylvester_Grisby"
"#{str1[0...str1.index('_')]} #{str1[str1.rindex('_')+1]}"
#=> "Alexander B"
"#{str2[0...str2.index('_')]} #{str2[str2.rindex('_')+1]}"
#=> "Sylvester G"
or
first, last = str1.split(/_.+_|_/)
#=> ["Alexander", "Bell"]
first+' '+last[0]
#=> "Alexander B"
first, last = str2.split(/_.+_|_/)
#=> ["Sylvester", "Grisby"]
first+' '+last[0]
#=> "Sylvester G"
but if you insist...
r = /
(.+?) # match any characters non-greedily in capture group 1
(?=_) # match an underscore in a positive lookahead
(?:.*) # match any characters greedily in a non-capture group
(?:_) # match an underscore in a non-capture group
(.) # match any character in capture group 2
/x # free-spacing regex definition mode
str1 =~ r
$1+' '+$2
#=> "Alexander B"
str2 =~ r
$1+' '+$2
#=> "Sylvester G"
You can of course write
r = /(.+?)(?=_)(?:.*)(?:_)(.)/
This is my attempt:
/([a-zA-Z]+)_([a-zA-Z]+_)?([a-zA-Z])/
See demo
Let's see if this works:
/^([^_]+)(?:_\w)?_(\w)/
And then you'll have to combine the first and second matches into the format you want. I don't know Ruby, so I can't help you there.
And another attempt using a replacement method:
result = subject.gsub(/^([^_]+)(?:_[^_])?_([^_])[^_]+$/, '\1 \2')
We capture the entire string, with the relevant parts in capturing groups. Then just return the two captured groups
using the split method is much better
full_names.map do |full_name|
parts = full_name.split('_').values_at(0,-1)
parts.last.slice!(1..-1)
parts.join(' ')
end
/^[A-Za-z]{5,15}\s[A-Za-z]{1}]$/i
This will have the following criteria:
5-15 characters for first name then a whitespace and finally a single character for last name.

Ruby Regular expression not matching properly

I am trying to creat a RegEx to find words that contains any vowel.
so far i have tried this
/(.*?\S[aeiou].*?[\s|\.])/i
but i have not used RegEx much so its not working properly.
for example if i input "test is 1234 and sky fly test1234"
it should match test , is, and, test1234 but showing
test, is,1234 and
if put something else then different output.
Alternatively you can also do something like:
"test is 1234 and sky fly test1234".split.find_all { |a| a =~ /[aeiou]/ }
# => ["test", "is", "and", "test1234"]
You could use the below regex.
\S*[aeiou]\S*
\S* matches zero or more non-space characters.
or
\w*[aeiou]\w*
It will solve:
\b\w*[aeiou]+\w*\b
https://www.debuggex.com/r/O-fU394iC5ErcSs7
or you can substitute \w by \S
\b\S*[aeiou]+\S*\b
https://www.debuggex.com/r/RNE6Y6q1q5yPJbe-
\b - a word boundary
\w - same as [_a-zA-Z0-9]
\S - a non-whitespace character
Try this:
\b\w*[aeiou]\w*\b
\b denotes a word boundry, so this regexp matches word bounty, zero or more letters, a vowel, zero or more letters and another word boundry

Resources