Regex to leave desired string remaining and others removed - ruby

In Ruby, what regex will strip out all but a desired string if present in the containing string? I know about /[^abc]/ for characters, but what about strings?
Say I have the string "group=4&type_ids[]=2&type_ids[]=7&saved=1" and want to keep only the pattern group=\d, if it is present in the string. How can I do that using only a regex?
Currently, I am splitting on & and then doing a select with matching condition =~ /group=\d/ on the resulting enumerable collection. It works fine, but I'd like to know the regex to do this more directly.

Simply:
part = str[/group=\d+/]
If you want only the numbers, then:
group_str = str[/group=(\d+)/,1]
If you want only the numbers as an integer, then:
group_num = str[/group=(\d+)/,1].to_i
Warning: String#[] will return nil if no match occurs, and blindly calling nil.to_i always returns 0.
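Here is a minimal sketch of a nil-safe version. It assumes you want nil, rather than 0, when the parameter is missing. `group_number` is a hypothetical helper name, and the `&.` operator requires Ruby 2.3 or later:

```ruby
# Hypothetical helper: returns the group number as an Integer, or nil when
# the string contains no "group=<digits>" parameter.
def group_number(str)
  digits = str[/group=(\d+)/, 1] # capture group 1, or nil if there is no match
  digits&.to_i                   # safe navigation skips .to_i when digits is nil
end

p group_number("group=4&type_ids[]=2&type_ids[]=7&saved=1") # 4
p group_number("type_ids[]=2&saved=1")                       # nil
```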

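You can try a substitution that replaces the whole string with the captured group:
str.sub(/.*(group=\d+).*/, '\1')
If nothing matches, sub returns the string unchanged.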

Typically I wouldn't really worry too much about a complex regex. Simply break the string down into smaller parts and it becomes easier:
asdf = "group=4&type_ids[]=2&type_ids[]=7&saved=1"
asdf.split('&').select{ |q| q['group'] } # => ["group=4"]
Otherwise, you can use a regex in a bunch of different ways. Here are two ways I tend to use:
asdf.scan(/group=\d+/) # => ["group=4"]
asdf[/(group=\d+)/, 1] # => "group=4"
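This particular string is a URL query string, so another option is to let Ruby's standard `uri` library split it into key/value pairs. That is a different technique from the regexes above. The sketch assumes the input is always `&`-separated `key=value` pairs:

```ruby
require 'uri'

query = "group=4&type_ids[]=2&type_ids[]=7&saved=1"

# decode_www_form returns an array of [key, value] pairs and keeps
# repeated keys such as "type_ids[]".
pairs = URI.decode_www_form(query)

group = pairs.assoc("group") # first pair whose key is "group", or nil
puts group&.join("=")        # group=4
```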

Try:
str.match(/group=\d+/)[0]


How do I remove a common substring using Ruby?

I have read "How do I remove substring after a certain character in a string using Ruby?". This is close, but different.
I have these emails with a mask:
email1 = 'giovanna.macedo#lojas100.com.br-215000695716b.ct.domain.com.br'
email2 = 'alvaro-neves#stockshop.com-215000695716b.ct.domain.com.br'
email3 = 'filiallojas123#filiallojas.net-215000695716b.ct.domain.com.br'
I want to remove the substrings that come after .br, .com and .net. The result must be:
email1 = 'giovanna.macedo#lojas100.com.br'
email2 = 'alvaro-neves#stockshop.com'
email3 = 'filiallojas123#filiallojas.net'
You can do that with the method String#[] with an argument that is a regular expression.
r = /.*?\.(?:rb|com|net|br)(?!\.br)/
'giovanna.macedo#lojas100.com.br-215000695716b.ct.domain.com.br'[r]
#=> "giovanna.macedo#lojas100.com.br"
'alvaro-neves#stockshop.com-215000695716b.ct.domain.com.br'[r]
#=> "alvaro-neves#stockshop.com"
'filiallojas123#filiallojas.net-215000695716b.ct.domain.com.br'[r]
#=> "filiallojas123#filiallojas.net"
The regular expression reads as follows: "Match zero or more characters non-greedily (?), followed by a period, followed by 'rb' or 'com' or 'net' or 'br', which is not followed by .br." (?!\.br) is a negative lookahead.
Alternatively, the regular expression can be written in free-spacing mode to make it self-documenting:
r = /
.*? # match zero or more characters non-greedily
\. # match '.'
(?: # begin a non-capture group
rb # match 'rb'
| # or
com # match 'com'
| # or
net # match 'net'
| # or
br # match 'br'
) # end non-capture group
(?! # begin a negative lookahead
\.br # match '.br'
) # end negative lookahead
/x # invoke free-spacing regex definition mode
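To see what the negative lookahead contributes, you can compare the same pattern with and without it on the first email. This is only an illustration:

```ruby
email1 = 'giovanna.macedo#lojas100.com.br-215000695716b.ct.domain.com.br'

with_lookahead    = /.*?\.(?:rb|com|net|br)(?!\.br)/
without_lookahead = /.*?\.(?:rb|com|net|br)/

# The lookahead rejects the ".com" that is immediately followed by ".br",
# so matching continues to the ".br" that ends the real domain.
puts email1[with_lookahead]    # giovanna.macedo#lojas100.com.br
# Without it, the non-greedy .*? stops at the first ".com".
puts email1[without_lookahead] # giovanna.macedo#lojas100.com
```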
This should work for your scenario:
expr = /^(.+\.(?:br|com|net))-[^']+(')$/
str = "email = 'giovanna.macedo#lojas100.com.br-215000695716b.ct.domain.com.br'"
str.gsub(expr, '\1\2')
Use the String#delete_suffix Method
This was tested with Ruby 3.0.2. String#delete_suffix and its bang method were added in Ruby 2.5, and the _1 numbered block parameter used below requires Ruby 2.7 or later, so older versions won't run this as written. Since you're trying to remove the exact same suffix from all your emails, you can simply call #delete_suffix! on each of your strings. For example:
common_suffix = "-215000695716b.ct.domain.com.br".freeze
emails = [email1, email2, email3]
emails.each { _1.delete_suffix! common_suffix }
You can then validate your results with:
emails
#=> ["giovanna.macedo#lojas100.com.br", "alvaro-neves#stockshop.com", "filiallojas123#filiallojas.net"]
email1
#=> "giovanna.macedo#lojas100.com.br"
email2
#=> "alvaro-neves#stockshop.com"
email3
#=> "filiallojas123#filiallojas.net"
The array shows each modified value. Because #delete_suffix! modifies the strings in place, you can also check each of the original variables individually to confirm that they have changed.
String Methods are Usually Faster, But Your Mileage May Vary
Since you're calling String methods instead of matching regular expressions, this solution is likely to be faster at scale, although I didn't benchmark all the solutions to compare them. Processing these strings this way took only 0.000062s on my system, measured with IRB's measure command, and String methods are generally faster than regular expressions at scale. If performance is a core concern, though, you'll need to benchmark larger samples more thoroughly.
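If you want numbers for your own data, here is a rough sketch using the standard `benchmark` library. The sample size is arbitrary and timings depend on your Ruby version and machine, so no results are shown here. The regex is the String#[] pattern from earlier in this thread:

```ruby
require 'benchmark'

SUFFIX = "-215000695716b.ct.domain.com.br".freeze
REGEX  = /.*?\.(?:rb|com|net|br)(?!\.br)/

samples = [
  'giovanna.macedo#lojas100.com.br-215000695716b.ct.domain.com.br',
  'alvaro-neves#stockshop.com-215000695716b.ct.domain.com.br',
  'filiallojas123#filiallojas.net-215000695716b.ct.domain.com.br'
]
emails = samples * 10_000 # 30,000 references to the three sample strings

# Sanity check: all three approaches must agree before timing them.
samples.each do |e|
  raise "mismatch for #{e}" unless e.delete_suffix(SUFFIX) == e.chomp(SUFFIX) &&
                                   e.chomp(SUFFIX) == e[REGEX]
end

# Non-bang methods are used so the shared sample strings are never mutated.
Benchmark.bm(16) do |x|
  x.report("delete_suffix")   { emails.each { |e| e.delete_suffix(SUFFIX) } }
  x.report("chomp")           { emails.each { |e| e.chomp(SUFFIX) } }
  x.report("regex String#[]") { emails.each { |e| e[REGEX] } }
end
```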
Making the Call Shorter
You can even make the call shorter if you want. I left it a bit verbose above so you could see the intent at each step, but you can trim it to a single method chain:
# one method chain, just wrapped to prevent scrolling
[email1, email2, email3].
each { _1.delete_suffix! "-215000695716b.ct.domain.com.br" }
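Watch out if you collect the return values of a bang method, for example with map. delete_suffix! returns the receiver when it removes something and nil when it doesn't, so running the same chain twice produces nils. Here is a small illustration:

```ruby
suffix = "-215000695716b.ct.domain.com.br"
# Unary + guarantees a mutable (unfrozen) string.
email  = +"alvaro-neves#stockshop.com-215000695716b.ct.domain.com.br"

first  = email.delete_suffix!(suffix) # the modified receiver itself
second = email.delete_suffix!(suffix) # nil: the suffix is already gone

p first  # "alvaro-neves#stockshop.com"
p second # nil
p email  # "alvaro-neves#stockshop.com"
```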
Caveats
You Need Fixed-String Suffixes
The main caveat here is that this solution will only work when you know the suffix (or set of suffixes) you want to remove. If you can't rely on the suffixes to be fixed, then you'll likely need to pursue a regex solution in one way or another, even if it's just to collect a set of suffixes.
Dealing with Frozen Strings
Another caveat is that if your strings are frozen (for example, because your code uses frozen string literals), you'll need to avoid attempting in-place changes to them. There's more than one way to do this, but a simple multiple (destructuring) assignment is probably the easiest to follow given your small code sample. Consider the following:
# assume that the strings in email1 etc. are frozen, but the array
# itself is not; you can't change the strings in-place, but you can
# re-assign new strings to the same variables or the same array
emails = [email1, email2, email3]
email1, email2, email3 =
emails.map { _1.delete_suffix "-215000695716b.ct.domain.com.br" }
There are certainly other ways to work around frozen strings, but the point is this: while the now-common # frozen_string_literal: true magic comment can improve VM performance or memory usage in large programs, it isn't always the best option for string-mangling code. Keep that in mind, as tools like RuboCop love to enforce frozen strings, and not everyone stops to consider whether such generic advice suits the problem domain at hand.
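Here is a small sketch of what happens with a frozen string, plus two ways around it. The non-bang method returns a new string. Unary plus (String#+@) returns an unfrozen copy that you can then modify in place. The string is frozen explicitly here rather than through the magic comment:

```ruby
suffix = "-215000695716b.ct.domain.com.br"
email  = "alvaro-neves#stockshop.com-215000695716b.ct.domain.com.br".freeze

begin
  email.delete_suffix!(suffix) # in-place edit of a frozen string
rescue FrozenError => e
  puts "in-place edit refused: #{e.class}"
end

trimmed = email.delete_suffix(suffix) # non-bang: returns a new String
mutable = +email                      # unfrozen copy of the frozen string
mutable.delete_suffix!(suffix)        # safe to modify in place now

puts trimmed # alvaro-neves#stockshop.com
puts mutable # alvaro-neves#stockshop.com
puts email   # the original, still frozen and unchanged
```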
I would just use the chomp(string) method like so:
mask = "-215000695716b.ct.domain.com.br"
email1.chomp(mask)
#=> "giovanna.macedo#lojas100.com.br"
email2.chomp(mask)
#=> "alvaro-neves#stockshop.com"
email3.chomp(mask)
#=> "filiallojas123#filiallojas.net"
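To apply chomp to a whole list, map it over an array. chomp(mask) returns a new string, so the originals are left alone, and a string that doesn't end with the mask comes back with its contents unchanged. The last address below is a hypothetical example without the mask:

```ruby
mask = "-215000695716b.ct.domain.com.br"
emails = [
  'giovanna.macedo#lojas100.com.br-215000695716b.ct.domain.com.br',
  'alvaro-neves#stockshop.com-215000695716b.ct.domain.com.br',
  'filiallojas123#filiallojas.net-215000695716b.ct.domain.com.br',
  'already.clean#example.com' # hypothetical address with no mask
]

# Strip the trailing mask from each address; originals are not modified.
cleaned = emails.map { |e| e.chomp(mask) }
p cleaned
```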

How to use gsubstitution with more letters

I've written this code in Ruby:
string = "hahahah"
print string.gsub("a","b")
How do I add more letter replacements to gsub? Neither
string.gsub("a","b")("h","l") nor string.gsub("a","b";"h","l")
worked.
Update: I have also tried this, without success:
letters = {
"a" => "l"
"b" => "n"
...
"z" => "f"
}
string = "hahahah"
print string.gsub(\/w\,letters)
You're overcomplicating. As with most method calls in Ruby, you can simply chain #gsub calls together, one after the other:
str = 'adfh'
print str.gsub("a","b").gsub("h","l") #=> 'bdfl'
What you're doing here is applying the second #gsub to the result of the first one.
Of course, that gets a bit long-winded if you do too many of them. So, when you find yourself stringing too many together, you'll want to look for a regex solution. Rubular is a great place to tinker with them.
The way to use your hash trick with #gsub and a regex is to provide a hash covering all possible matches. This has the same result as the two #gsub calls:
print str.gsub(/[ah]/, {'a'=>'b', 'h'=>'l'}) #=> 'bdfl'
The regex matches either a or h (/[ah]/), and the hash is saying what to substitute for each of them.
All that said, str.tr('ah', 'bl') is the simplest way to solve your problem as specified, as some commenters have mentioned, so long as you are working with single letters. If you need to work with two or more characters per substitution, you'll need to use #gsub.
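Here is a short sketch of both routes. String#tr handles single characters. For multi-character replacements, one option is to build a single alternation from the hash keys with Regexp.union and pass the same hash to gsub. The `replacements` mapping is hypothetical:

```ruby
str = 'adfh'

# String#tr: each character in the first set is replaced by the character
# at the same position in the second set ('a' -> 'b', 'h' -> 'l').
p str.tr('ah', 'bl') # "bdfl"

# Multi-character replacements: one regex built from the hash keys, and
# gsub looks each match up in the hash.
replacements = { 'ha' => 'lo', 'd' => 'x' } # hypothetical mapping
pattern = Regexp.union(replacements.keys)    # matches "ha" or "d"
p 'hahad'.gsub(pattern, replacements)        # "lolox"
```

The alternation tries its branches from left to right. If one key is a prefix of another (say 'h' and 'ha'), list the longer key first.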

Regex to capture string into ruby method params

I'm looking for a regex to capture these example strings:
first_paramenter, first_hash_key: 'class1 class2', second_hash_key: true
first_argument, single_hash_key: 'class1 class2'
first_argument_without_second_argument
The pattern rules are:
The string must start with a word (the first parameter): /^(\w+)/
The second parameter is optional
If a second parameter is provided, there must be a comma after the first parameter
The second argument is a hash, with keys and values. Values can be true, false or a string enclosed in quotes
The hash keys must start with a letter
I'm using this regex, but it matches only the second example:
^(\w+),(\s[a-z]{1}[a-z_]+:\s'?[\w\s]+'?,?)$
I'd go with something like:
^(\w+)(?:, ([a-z]\w+): ('[^']*')(?:, ([a-z]\w+): (\w+))?)?
Here's a Rubular example of it.
(?:...) creates a non-capturing group, which we can make optional with a trailing ?. That makes it easy to handle optional chunks.
([a-z]\w+) is an easy way to say "it must start with a letter" while allowing normal alpha, digits and "_".
As far as testing for "Values can be true, false or a string enclosed by quotes", I'd do that in code after capturing. It's way too easy to create a complex pattern and then be unable to maintain it later. It's better to use simple patterns and then check whether you got what you expected than to try to enforce every rule inside the regex.
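Following that advice, here is a sketch that captures with the pattern above and then checks the value rules in plain Ruby. `PARAMS_RE` and `parse_params` are hypothetical names. The returned `[first, hash]` shape is an assumption about what you want to pass on:

```ruby
# The pattern from above: a first word, then optionally a quoted value and
# optionally a second key with a bare-word value.
PARAMS_RE = /^(\w+)(?:, ([a-z]\w+): ('[^']*')(?:, ([a-z]\w+): (\w+))?)?/

# Hypothetical helper: returns [first_param, options_hash], or nil.
def parse_params(str)
  m = PARAMS_RE.match(str)
  # Reject input the regex did not consume completely.
  return nil unless m && m.pre_match.empty? && m.post_match.empty?
  first, key1, val1, key2, val2 = m.captures
  # Value rule checked in code rather than in the regex.
  return nil if val2 && !%w[true false].include?(val2)

  options = {}
  options[key1.to_sym] = val1[1..-2] if key1  # drop the surrounding quotes
  options[key2.to_sym] = (val2 == 'true') if key2
  [first, options]
end

p parse_params("first_argument, single_hash_key: 'class1 class2'")
p parse_params("first_parameter, first_hash_key: 7") # nil
```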
In the third example, your regex returns 5 matches. It would be better if it returned only one. Is that possible?
I'm not sure what you're asking. This will return a single capture for each, but why you'd want that makes no sense to me if you're capturing parameters to send to a method:
/^(\w+(?:, [a-z]\w+: '[^']*'(?:, [a-z]\w+: \w+)?)?)/
http://rubular.com/r/GLVuSOieI6
There is frequently a choice to be made between attacking an entire string with a single regex or breaking the string up with one or more String methods, and then going after each piece separately. The latter approach often makes debugging and testing easier, and may also make the code intelligible to mere mortals. It's always a judgement call, of course, but I think this problem lends itself well to the divide and conquer approach. This is how I'd do it.
Code
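def match?(str)
  a = str.split(',')
  # The first piece must be a single word: the first parameter.
  return false unless a.shift.strip =~ /^\w+$/
  # Every remaining piece must be "key: value", where the key starts with
  # a lowercase letter and the value is a quoted string, true or false.
  a.each do |s|
    return false unless ((key_val = s.split(':')).size == 2) &&
      key_val.first.strip =~ /^[a-z]\w*$/ &&
      key_val.last.strip =~ /^(\'.*?\'|true|false)$/
  end
  true
end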
Examples
match?("first_paramenter, first_hash_key: 'class1 class2',
second_hash_key: true")
#=>true
match?("first_argument, single_hash_key: 'class1 class2'")
#=>true
match?("first_argument_without_second_argument")
#=>true
match?("first_parameter, first_hash_key: 7")
#=>false
match?("dogs and cats, first_hash_key: 'class1 class2'")
#=>false
match?("first_paramenter, first_hash_key: 'class1 class2',
second_hash_key: :true")
#=>false
You've got the basic idea, but you have a bunch of small mistakes in there:
/^(\w+)(,\s[a-z][a-z_]+:\s('[^']*'|true|false))*$/
explained:
/^(\w+) # starts with a word
(
,\s # the comma goes _inside_ the parens since it's optional
[a-z][a-z_]+:\s # {1} is completely redundant
( # use | in a capture group to allow different possible keys
'[^']*' | # note that '? doesn't make sure that the quotes always match
true |
false
)
)*$/x # can have 0 or more hash keys after the first word

Using regex to find an exact pattern match in Ruby

How would I go about testing for an exact match using a regex?
"car".match(/[ca]+/) returns a match (a truthy MatchData for "ca").
How would I get the above statement to return false since the regex pattern doesn't contain an "r"? Any string that contains any characters other than "c" and "a" should return false.
"acacaccc" should return true
"acacacxcc" should return false
Add some anchors to it:
/\A[ca]+\z/
You just need anchors.
"car".match(/\A[ca]+\z/)
This'll force the entire string to be composed of "c" or "a", since \A and \z mean the start and end of the string. Without them, the regex will succeed as long as it matches any portion of the string. (In Ruby, ^ and $ mean the start and end of a line, so /^[ca]+$/ would still accept a multi-line string such as "ca\nr".)
Turn your logic around and look for bad things:
string.match(/[^ca]/)
string.index(/[^ca]/)
If either of the above is non-nil, then you have a bad string. If you just want to test and don't care about where it matches, then:
if string.index(/[^ca]/).nil?
  # You have a good string
else
  # You have a bad string
end
For example:
>> "car".index(/[^ca]/).nil?
=> false
>> "caaaacaac".index(/[^ca]/).nil?
=> true
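Here is a sketch that combines anchors with a boolean result. `only_c_and_a?` is a hypothetical name. String#match? (Ruby 2.4+) returns true or false without building a MatchData. The two approaches differ on the empty string: the negated-class check treats "" as good, while [ca]+ requires at least one character:

```ruby
# True only if the whole string is one or more "c"/"a" characters.
def only_c_and_a?(s)
  s.match?(/\A[ca]+\z/) # \A and \z anchor to the whole string, not a line
end

p only_c_and_a?("acacaccc")  # true
p only_c_and_a?("acacacxcc") # false
p only_c_and_a?("car")       # false
p only_c_and_a?("ca\nr")     # false, although /^[ca]+$/ would accept it
```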
Try this:
"car".match(/\A(a|c)+\z/)
Try this:
"car".match(/\A(?:c|a)+\z/)

Ruby doesn't recognize the g flag for regex

Is it implied by default in str.scan? Is it off by default in str[regex]?
Yes, how often the regex is applied depends on the method used, not on the regex's flags.
scan will return an array containing (or iterate over) all matches of the regex. match and String#[] will return the first match. =~ will return the index of the first match. gsub will replace all occurrences of the regex, and sub will replace the first occurrence.
smotchkkiss:~$ irb
>> 'Foobar does not like food because he is a fool'.gsub(/foo/i, 'zim')
=> "zimbar does not like zimd because he is a ziml"
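The same sentence run through each method shows the difference side by side. The regex and its /i flag never change; only the method decides how many matches are used:

```ruby
s = 'Foobar does not like food because he is a fool'

p s.scan(/foo/i)        # ["Foo", "foo", "foo"]  every match
p s[/foo/i]             # "Foo"                  first match only
p s =~ /foo/i           # 0                      index of the first match
p s.sub(/foo/i, 'zim')  # "zimbar does not like food because he is a fool"
p s.gsub(/foo/i, 'zim') # "zimbar does not like zimd because he is a ziml"
```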
