Ruby Regex to eliminate non word characters - ruby

Hello, I would like to remove non-word characters from a string using a regex in Ruby.
Let's say that I have:
pal1 = "a#b?a"
pal1 = /[a-z0-9]/.match(pal1)
When I put this in http://www.rubular.com/, it says that the Match result is:
aba
But when I run the code in Ruby, I get a different result: it gives only "a".
How can I change my regex so that pal1 ends up as "aba"?
Thanks in advance for your time.

You can use gsub to remove these characters.
pal1 = 'a#b?a'
pal1.gsub(/[^a-z0-9]/i, '')
# => "aba"
You can also use scan to match these characters and join them together.
pal1 = 'a#b?a'
pal1.scan(/[a-z0-9]/i).join
# => "aba"

You can do either of:
pal1.gsub!( /[^a-z\d]/i, '' ) # Kill all characters that don't match
pal1 = pal1.scan(/[a-z\d]/i).join # Find all the matching characters as array
# and then join them all into one string.
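The original attempt returns only "a" because match stops at the first match, whereas Rubular highlights every match in the test string. A minimal sketch of the difference follows; the variable names other than pal1 are illustrative, and String#delete is shown as a regex-free alternative that the answers above do not mention.

```ruby
pal1 = "a#b?a"

# match stops at the first match and returns a MatchData object
first = /[a-z0-9]/.match(pal1)
first_char = first[0]                    # "a"

# scan collects every match into an array
all_chars = pal1.scan(/[a-z0-9]/i)       # ["a", "b", "a"]

# gsub removes every character outside the class
cleaned = pal1.gsub(/[^a-z0-9]/i, "")    # "aba"

# String#delete takes a character-set spec (a leading ^ negates it), no regex needed
deleted = pal1.delete("^a-zA-Z0-9")      # "aba"
```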

Related

extract number in string with regex in ruby

I have this string
url = "#AppDashboardPlace:p=io.transporterapp.deep.test1&appid=4975603106871514996"
I would like to get 4975603106871514996
I have tried that
url.to_s[/\appid\=(.*?)\&/, 1]
=> nil
Your regex doesn't match because there's no & after the appid value. (Also, \a in a Ruby regex is the escape for the bell character, not a literal a.) Try this:
url.to_s[/appid=(\d+)/,1]
If you left the matching part as .*? with nothing after it, it would match the minimum amount of the string possible, which is the empty string. If you know that the appid is the very end of the string, then you could use .* without the ?, but it's best to be precise and specify that what you're looking for is a series of one or more (+) decimal digits (\d).
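The lazy versus greedy behaviour described above can be checked directly. This is a small sketch; the variable names lazy, greedy and digits are only for illustration.

```ruby
url = "#AppDashboardPlace:p=io.transporterapp.deep.test1&appid=4975603106871514996"

# Lazy quantifier with nothing after it: matches as little as possible
lazy = url[/appid=(.*?)/, 1]     # ""

# Greedy quantifier: runs to the end of the line
greedy = url[/appid=(.*)/, 1]    # "4975603106871514996"

# Precise: one or more decimal digits
digits = url[/appid=(\d+)/, 1]   # "4975603106871514996"
```

The greedy form only works here because appid happens to be the last parameter; the \d+ form keeps working if another parameter follows it.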
You could use String#match with the \d regex matcher, matching on \d+, which means one or more digits.
url = "#AppDashboardPlace:p=io.transporterapp.deep.test1&appid=4975603106871514996"
match = url.match(/appid\=(\d+)/)
# => #<MatchData "appid=4975603106871514996" 1:"4975603106871514996">
match[0]
# => "appid=4975603106871514996"
match[1]
# => "4975603106871514996"
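One thing to watch for is that both String#match and string[/regex/, 1] return nil when the pattern is absent. A possible way to wrap this, assuming you want an Integer back, is sketched below; extract_appid is a hypothetical helper name, not something from the original posts.

```ruby
# extract_appid is a hypothetical helper, used here only for illustration
def extract_appid(url)
  id = url[/appid=(\d+)/, 1]   # nil when there is no appid parameter
  id && Integer(id)            # convert only when something was captured
end

found   = extract_appid("#AppDashboardPlace:p=io.transporterapp.deep.test1&appid=4975603106871514996")
missing = extract_appid("#AppDashboardPlace:p=io.transporterapp.deep.test1")
middle  = extract_appid("#AppDashboardPlace:appid=42&p=io.transporterapp.deep.test1")
```

Here found is 4975603106871514996, missing is nil, and middle is 42, since the pattern does not depend on where the parameter appears.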

Regex to grab full firstname and first letter of last name

I have a list of users grabbed by the Etc Ruby library:
Thomas_J_Perkins
Jennifer_Scanner
Amanda_K_Loso
Aaron_Cole
Mark_L_Lamb
What I need to do is grab the full first name, skip the middle name (if given), and grab the first character of the last name. The output should look like this:
Thomas P
Jennifer S
Amanda L
Aaron C
Mark L
I'm not sure how to do this. I've tried grabbing all of the characters with /\w+/, but that grabs everything.
You don't always need regular expressions.
"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." (Jamie Zawinski)
You can do it with some simple Ruby code
string = "Mark_L_Lamb"
string.split('_').first + ' ' + string.split('_').last[0]
=> "Mark L"
I think it's simpler without regex:
array = "Thomas_J_Perkins".split("_") # split at _
array.first + " " + array.last[0] # .first prints first name .last[0] prints first char of last name
#=> "Thomas P"
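Applied to the whole list from the question, the split approach might look like the sketch below. The helper name short_name is an assumption made for illustration.

```ruby
users = %w[Thomas_J_Perkins Jennifer_Scanner Amanda_K_Loso Aaron_Cole Mark_L_Lamb]

# short_name is an illustrative helper, not part of the original answers
def short_name(user)
  parts = user.split("_")               # e.g. ["Thomas", "J", "Perkins"]
  "#{parts.first} #{parts.last[0]}"     # first name, space, first letter of last name
end

short_names = users.map { |u| short_name(u) }
# short_names is ["Thomas P", "Jennifer S", "Amanda L", "Aaron C", "Mark L"]
```

Note that a name with no underscore at all would come back as the name followed by its own first letter, so input of that shape needs a separate check.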
You can use
^([^\W_]+)(?:_[^\W_]+)*_([^\W_])[^\W_]*$
And replace with \1 \2 (a space between the groups gives the "Thomas P" format the question asks for).
The [^\W_] matches a letter or a digit. If you want to only match letters, replace [^\W_] with \p{L}:
^(\p{L}+)(?:_\p{L}+)*_(\p{L})\p{L}*$
The point is to match and capture the first chunk of letters up to the first _ (with (\p{L}+)), then match zero or more sequences of _ followed by letters, plus the final _ (with (?:_\p{L}+)*_), then match and capture the first letter of the last word (with (\p{L})), and finally match the rest of the string (with \p{L}*).
NOTE: replace ^ with \A and $ with \z if you have independent strings (in Ruby, ^ matches the start of a line and $ matches the end of a line).
Ruby code:
s.sub(/^(\p{L}+)(?:_\p{L}+)*_(\p{L})\p{L}*$/, "\\1 \\2")
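As a rough check of the \p{L} pattern against the names in the question, here is a sketch that uses the \A and \z anchors suggested in the note and a space in the replacement, since the desired output is space-separated. The variable names are illustrative.

```ruby
users = %w[Thomas_J_Perkins Jennifer_Scanner Amanda_K_Loso Aaron_Cole Mark_L_Lamb]

# Whole-string anchors, because each name is an independent string
pattern = /\A(\p{L}+)(?:_\p{L}+)*_(\p{L})\p{L}*\z/

# Single quotes keep \1 and \2 as backreferences in the replacement
result = users.map { |u| u.sub(pattern, '\1 \2') }
# result is ["Thomas P", "Jennifer S", "Amanda L", "Aaron C", "Mark L"]

# A string that does not fit the pattern is returned unchanged by sub
untouched = "no underscore here".sub(pattern, '\1 \2')
```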
I'm in the don't-use-a-regex-for-this camp.
str1 = "Alexander_Graham_Bell"
str2 = "Sylvester_Grisby"
"#{str1[0...str1.index('_')]} #{str1[str1.rindex('_')+1]}"
#=> "Alexander B"
"#{str2[0...str2.index('_')]} #{str2[str2.rindex('_')+1]}"
#=> "Sylvester G"
or
first, last = str1.split(/_.+_|_/)
#=> ["Alexander", "Bell"]
first+' '+last[0]
#=> "Alexander B"
first, last = str2.split(/_.+_|_/)
#=> ["Sylvester", "Grisby"]
first+' '+last[0]
#=> "Sylvester G"
but if you insist...
r = /
(.+?) # match any characters non-greedily in capture group 1
(?=_) # match an underscore in a positive lookahead
(?:.*) # match any characters greedily in a non-capture group
(?:_) # match an underscore in a non-capture group
(.) # match any character in capture group 2
/x # free-spacing regex definition mode
str1 =~ r
$1+' '+$2
#=> "Alexander B"
str2 =~ r
$1+' '+$2
#=> "Sylvester G"
You can of course write
r = /(.+?)(?=_)(?:.*)(?:_)(.)/
This is my attempt:
/([a-zA-Z]+)_([a-zA-Z]+_)?([a-zA-Z])/
Let's see if this works:
/^([^_]+)(?:_\w)?_(\w)/
And then you'll have to combine the first and second matches into the format you want. I don't know Ruby, so I can't help you there.
And another attempt using a replacement method:
result = subject.gsub(/^([^_]+)(?:_[^_])?_([^_])[^_]+$/, '\1 \2')
We match the entire string, with the relevant parts in capturing groups, and then return just the two captured groups.
Using the split method is much better:
full_names.map do |full_name|
  parts = full_name.split('_').values_at(0, -1)
  parts.last.slice!(1..-1)
  parts.join(' ')
end
/^[A-Za-z]{5,15}\s[A-Za-z]{1}$/i
This pattern applies the following criteria:
5 to 15 letters for the first name, then a whitespace character, and finally a single letter for the last name.

Ruby search a string for matching character pairs

I want to match character pairs in a string. Let's say the string is:
"zttabcgqztwdegqf". Both "zt" and "gq" are matching pairs of characters in the string.
The following code finds the "zt" matching pair, but not the "gq" pair:
#!/usr/bin/env ruby
string = "zttabcgqztwdegqf"
puts string.scan(/.{1,2}/).detect{ |c| string.count(c) > 1 }
The scan only produces pairs that start at even indices (0&1, 2&3, 4&5, ...) and never pairs that start at odd indices (1&2, 3&4, 5&6, etc.):
zt
ta
bc
gq
zt
wd
eg
qf
I'm not sure regex in Ruby is the best way to go. But I want to use Ruby for the solution.
You can do your search with a single regex:
puts string.scan(/(?=(.{2}).*\1)/)
Output
zt
gq
Regex Breakdown
(?= # Start a lookahead
(.{2}) # Search any couple of char and group it in \1
.*\1 # Search ahead in the string for another \1 to validate
) # Close lookahead
Note
Putting all the checks inside a lookahead ensures that the regex engine does not consume the pair while validating it.
So it also works with overlapping pairs, as in the string abcabc: the output will correctly be ab, bc.
Oddity
If the regex engine does not consume any characters, how can it reach the end of the string?
Internally, after a zero-width match, Onigmo (the Ruby regex engine) automatically advances one character. Most regex flavours behave this way, but in JavaScript, for example, the programmer may need to increment the last match index manually.
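The behaviour described in the note can be tried out with a few small inputs. This is a sketch; flatten is used because scan returns one array per match when the pattern has a capture group, and uniq is an addition for inputs where a pair occurs three or more times.

```ruby
pair_re = /(?=(.{2}).*\1)/

pairs       = "zttabcgqztwdegqf".scan(pair_re).flatten   # ["zt", "gq"]
overlapping = "abcabc".scan(pair_re).flatten             # ["ab", "bc"]

# Zero-width matches: scan advances one character after each match
steps = "abc".scan(/(?=(.))/).flatten                    # ["a", "b", "c"]

# A pair that occurs three times is reported at each position that has
# a later occurrence, so uniq removes the repeats
repeated = "ababab".scan(pair_re).flatten                # ["ab", "ba", "ab"]
deduped  = repeated.uniq                                 # ["ab", "ba"]
```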
str = "ztcabcgqzttwtcdegqf"
r = /
(.) # match any character in capture group 1
(?= # begin a positive lookahead
(.) # match any character in capture group 2
.* # match >= 0 characters (.+ would miss adjacent repeats such as "abab")
\1 # match capture group 1
\2 # match capture group 2
) # close positive lookahead
/x # extended/free-spacing regex definition mode
str.scan(r).map(&:join)
#=> ["zt", "tc", "gq"]
Here is one way to do this without using regex:
string = "zttabcgqztwdegqf"
p string.split('').each_cons(2).map(&:join).select {|i| string.scan(i).size > 1 }.uniq
#=> ["zt", "gq"]
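Another regex-free variant counts each adjacent pair in a single pass instead of rescanning the string for every pair. This is a sketch with illustrative variable names. Unlike the scan-based count above, it counts overlapping occurrences, so "aaa" would report "aa" twice.

```ruby
string = "zttabcgqztwdegqf"

# Count every adjacent pair of characters in one pass
counts = Hash.new(0)
string.chars.each_cons(2) { |a, b| counts[a + b] += 1 }

# Keep the pairs seen more than once (hashes preserve insertion order)
repeated = counts.select { |_, n| n > 1 }.keys
# repeated is ["zt", "gq"]
```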

Regex to find strings with only letters or numbers or both

I am searching for strings with only letters or numbers or both. How could I write a regex for that?
You can use the following regex to check whether the string contains only letters and/or digits:
^[a-zA-Z0-9]+$
Explanation
^: Start of the line
[]: Character class
a-zA-Z: Matches any ASCII letter
0-9: Matches any digit
+: Matches the preceding class one or more times
$: End of the line
"abc&#*(2743438" !~ /[^a-z0-9]/i # => false
"abc2743438" !~ /[^a-z0-9]/i # => true
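Avoid the line anchors ^ and $ here: in Ruby they match at the start and end of each line rather than of the whole string, which may present a security risk when validating input. It's better to use \A and \z (Rails' format validation will otherwise ask you to add the :multiline => true option).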
Only letters and numbers:
/\A[a-zA-Z0-9]+\z/
Or, if you also want to allow the - and _ characters:
/\A[a-zA-Z0-9_\-]+\z/
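To see why the choice of anchors matters, consider input that contains a newline. The sketch below uses String#match? (available since Ruby 2.4); the sample input is made up for illustration.

```ruby
input = "abc123\n<script>alert(1)</script>"

# ^ and $ are line anchors, so the first line alone satisfies the pattern
line_anchored = input.match?(/^[a-zA-Z0-9]+$/)          # true

# \A and \z anchor to the whole string, so the second line is rejected
string_anchored = input.match?(/\A[a-zA-Z0-9]+\z/)      # false

clean = "abc123".match?(/\A[a-zA-Z0-9]+\z/)             # true

# \Z (capital) tolerates one trailing newline, \z does not
upper_z = "abc123\n".match?(/\A[a-zA-Z0-9]+\Z/)         # true
lower_z = "abc123\n".match?(/\A[a-zA-Z0-9]+\z/)         # false
```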

How do you capture part of a regex to a variable in Ruby?

I know about "string"[/regex/], which returns the part of the string that matches. But what if I want to return only the captured part(s) of a string?
I have the string "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3". I want to store in the variable title the text The_Case_of_the_Gold_Ring.
I can capture this part with the regex /\d_(?!.*\d_)(.*).mp3$/i. But writing the Ruby "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"[/\d_(?!.*\d_)(.*).mp3$/i] returns 0_The_Case_of_the_Gold_Ring.mp3 which isn't what I want.
I can get what I want by writing
"1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3" =~ /\d_(?!.*\d_)(.*).mp3$/i
title = $~.captures[0]
But this seems sloppy. Surely there's a proper way to do this?
(I'm aware that someone can probably write a simpler regex to target the text I want that lets the "string"[/regex/] method work, but this is just an example to illustrate the problem, the specific regex isn't the issue.)
You can pass the capture group number as a second argument, string[/regexp/, index]:
string = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"
string[/\d_(?!.*\d_)(.*).mp3$/i, 1]
# => "The_Case_of_the_Gold_Ring"
string[/\d_(?!.*\d_)(.*).mp3$/i, 0]
# => "0_The_Case_of_the_Gold_Ring.mp3"
Have a look at the match method:
string = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"
regexp = /\d_(?!.*\d_)(.*).mp3$/i
matches = regexp.match(string)
matches[1]
#=> "The_Case_of_the_Gold_Ring"
Here matches[0] returns the whole match, and matches[1] and the following indices return the individual captures:
matches.to_a
#=> ["0_The_Case_of_the_Gold_Ring.mp3", "The_Case_of_the_Gold_Ring"]
Read more examples: http://ruby-doc.org/core-2.1.4/MatchData.html#method-i-5B-5D
You can use named captures
"1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3" =~ /\d_(?!.*\d_)(?<title>.*).mp3$/i
and $~[:title] will give you what you want.
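Named captures can be read in a few different ways. The sketch below shows some of them, with the dot before mp3 escaped and \z used as the end anchor; the group name episode_title and the variable names are assumptions for illustration.

```ruby
str = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"
re  = /\d_(?!.*\d_)(?<title>.*)\.mp3\z/i

m = re.match(str)
by_symbol = m[:title]             # MatchData accepts the group name as a symbol
by_string = str[re, "title"]      # String#[] accepts the group name as a string
as_hash   = m.named_captures      # {"title"=>"The_Case_of_the_Gold_Ring"}

# With a regex literal on the left of =~, each named group is also
# assigned to a local variable of the same name
if /\d_(?!.*\d_)(?<episode_title>.*)\.mp3\z/i =~ str
  from_local = episode_title
end
```

The local-variable form only applies when the regex is a literal on the left-hand side and contains no interpolation.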
Meditate on this:
Here's the source string to be parsed:
str = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"
Patterns can be defined as strings:
DATE_REGEX = '\d{4}-[A-Z]{3}-\d{2}'
SERIAL_REGEX = '\d{2}'
TITLE_REGEX = '.+'
Then interpolated into a regexp:
regex = /^(#{ DATE_REGEX })_(#{ SERIAL_REGEX })_(#{ TITLE_REGEX })/
# => /^(\d{4}-[A-Z]{3}-\d{2})_(\d{2})_(.+)/
The advantage to that is it's easier to maintain because the pattern is really several smaller ones.
str.match(regex) # => #<MatchData "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3" 1:"1952-FEB-21" 2:"70" 3:"The_Case_of_the_Gold_Ring.mp3">
regex.match(str) # => #<MatchData "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3" 1:"1952-FEB-21" 2:"70" 3:"The_Case_of_the_Gold_Ring.mp3">
are equivalent because both Regexp and String implement match.
We can retrieve what was captured as an array:
regex.match(str).captures # => ["1952-FEB-21", "70", "The_Case_of_the_Gold_Ring.mp3"]
regex.match(str).captures.last # => "The_Case_of_the_Gold_Ring.mp3"
We can also name the captures and access them like we would a hash:
regex = /^(?<date>#{ DATE_REGEX })_(?<serial>#{ SERIAL_REGEX })_(?<title>#{ TITLE_REGEX })/
matches = regex.match(str)
matches[:date] # => "1952-FEB-21"
matches[:serial] # => "70"
matches[:title] # => "The_Case_of_the_Gold_Ring.mp3"
Of course, it's not necessary to mess with that rigamarole at all. We can split the string on underscores ('_'):
str = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"
str.split('_') # => ["1952-FEB-21", "70", "The", "Case", "of", "the", "Gold", "Ring.mp3"]
split can take a limit parameter saying how many times it should split the string. Passing in 3 gives us:
str.split('_', 3) # => ["1952-FEB-21", "70", "The_Case_of_the_Gold_Ring.mp3"]
Grabbing the last element returns:
str.split('_', 3).last # => "The_Case_of_the_Gold_Ring.mp3"
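The split result above still carries the .mp3 extension. One possible way to drop it, assuming the string is a plain file name, is to strip the extension with File.basename before splitting, as in this sketch:

```ruby
str = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"

# ".*" tells File.basename to strip whatever extension is present
base = File.basename(str, ".*")

# Limit of 3 keeps the underscores inside the title intact
date, serial, title = base.split("_", 3)
# date is "1952-FEB-21", serial is "70", title is "The_Case_of_the_Gold_Ring"
```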
I believe it would be easiest to use a capture group here, but I'd like to present some possibilities that do not, for illustrative purposes. All employ the same positive lookahead ((?=\.mp3$)). All but one use a positive lookbehind; the other uses \K to "forget" everything matched up to that point, so that only what follows is returned. Some permit the matched string to contain digits (.+); others do not ([^\d]+).
str = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"
1 # match follows last digit followed by underscore, cannot contain digits
str[/(?<=\d_)[^\d]+(?=\.mp3$)/]
#=> "The_Case_of_the_Gold_Ring"
2 # same as 1, as `\K` disregards match to that point
str[/\d_\K[^\d]+(?=\.mp3$)/]
#=> "The_Case_of_the_Gold_Ring"
3 # match follows underscore, two digits, underscore, may contain digits
str[/(?<=_\d\d_).+(?=\.mp3$)/]
#=> "The_Case_of_the_Gold_Ring"
4 # match follows string having specific pattern, may contain digits
str[/(?<=\d{4}-[A-Z]{3}-\d{2}_\d{2}_).+(?=\.mp3$)/]
#=> "The_Case_of_the_Gold_Ring"
5 # match follows digit, any 12 characters, another digit and underscore,
# may contain digits
str[/(?<=\d.{12}\d_).+(?=\.mp3$)/]
#=> "The_Case_of_the_Gold_Ring"
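The difference between the variants that allow digits and those that do not only shows up when the title itself contains a digit. The sketch below uses a made-up file name to illustrate this.

```ruby
# Hypothetical file name whose title contains digits
with_digits = "1952-FEB-21_70_Route_66.mp3"

# Variants 1 and 2 use [^\d]+, so they cannot span the "66" and find nothing
lookbehind_no_digits = with_digits[/(?<=\d_)[^\d]+(?=\.mp3$)/]   # nil
keep_out_no_digits   = with_digits[/\d_\K[^\d]+(?=\.mp3$)/]      # nil

# Variant 3 uses .+, so digits in the title are fine
lookbehind_any = with_digits[/(?<=_\d\d_).+(?=\.mp3$)/]          # "Route_66"
```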

Resources