Regex to capture string into ruby method params - ruby

I'm looking for a regex to capture strings like these examples:
first_paramenter, first_hash_key: 'class1 class2', second_hash_key: true
first_argument, single_hash_key: 'class1 class2'
first_argument_without_second_argument
The pattern rules are:
The string must start with a word (the first parameter): /^(\w+)/
The second parameter is optional
If the second parameter is provided, there must be a comma after the first parameter
The second argument is a hash, with keys and values. Values can be true, false or a string enclosed in quotes
The hash keys must start with a letter
I'm using this regex, but it matches only the second example:
^(\w+),(\s[a-z]{1}[a-z_]+:\s'?[\w\s]+'?,?)$

I'd go with something like:
^(\w+)(?:, ([a-z]\w+): ('[^']*')(?:, ([a-z]\w+): (\w+))?)?
Here's a Rubular example of it.
(?:...) creates a non-capturing group, and a trailing ? makes the whole group optional. That makes it easy to handle optional chunks.
([a-z]\w+) is an easy way to say "it must start with a letter" while allowing normal alpha, digits and "_".
As far as testing for "Values can be true, false or an string enclosed by quotes", I'd do that in code after capturing. It's way too easy to create a complex pattern, and then be unable to maintain it later. It's better to use simple ones, then look to see whether you got what you expected, than to try to enforce it inside the regex.
In the third example, your regex returns 5 matches. It would be better if it returned only one. Is that possible?
I'm not sure what you're asking. This will return a single capture for each, but why you'd want that makes no sense to me if you're capturing parameters to send to a method:
/^(\w+(?:, [a-z]\w+: '[^']*'(?:, [a-z]\w+: \w+)?)?)/
http://rubular.com/r/GLVuSOieI6
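As a rough sketch of the "capture first, validate in code" idea, the first pattern above could be wrapped like this. The names PARAMS and capture_params are invented here for illustration and are not part of the answer.

```ruby
# Hypothetical names (PARAMS, capture_params); the pattern is the one from the answer.
PARAMS = /^(\w+)(?:, ([a-z]\w+): ('[^']*')(?:, ([a-z]\w+): (\w+))?)?/

# Returns the five captures (nil for parts that are absent), or nil on no match.
def capture_params(str)
  m = PARAMS.match(str)
  m && m.captures
end

first, key1, val1, key2, val2 = capture_params(
  "first_paramenter, first_hash_key: 'class1 class2', second_hash_key: true"
)

# The "true or false" rule is checked in plain Ruby rather than inside the regex.
second_value_ok = val2.nil? || %w[true false].include?(val2)
```

Missing optional parts simply come back as nil, so the caller can decide how strict to be.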

There is frequently a choice to be made between attacking an entire string with a single regex or breaking the string up with one or more String methods, and then going after each piece separately. The latter approach often makes debugging and testing easier, and may also make the code intelligible to mere mortals. It's always a judgement call, of course, but I think this problem lends itself well to the divide and conquer approach. This is how I'd do it.
Code
def match?(str)
  a = str.split(',')
  return false unless a.shift.strip =~ /^\w+$/
  a.each do |s|
    return false unless ((key_val = s.split(':')).size == 2) &&
      key_val.first.strip =~ /^[a-z]\w*$/ &&
      key_val.last.strip =~ /^(\'.*?\'|true|false)$/
  end
  true
end
Examples
match?("first_paramenter, first_hash_key: 'class1 class2',
second_hash_key: true")
#=>true
match?("first_argument, single_hash_key: 'class1 class2'")
#=>true
match?("first_argument_without_second_argument")
#=>true
match?("first_parameter, first_hash_key: 7")
#=>false
match?("dogs and cats, first_hash_key: 'class1 class2'")
#=>false
match?("first_paramenter, first_hash_key: 'class1 class2',
second_hash_key: :true")
#=>false

You've got the basic idea, but there are a few small mistakes in there:
/^(\w+)(,\s[a-z][a-z_]+:\s('[^']*'|true|false))*$/
explained:
/^(\w+)              # starts with a word
(
  ,\s                # the comma goes _inside_ the parens since it's optional
  [a-z][a-z_]+:\s    # {1} is completely redundant
  (                  # use | in a capture group to allow different possible values
    '[^']*' |        # note that '? doesn't make sure that the quotes always match
    true |
    false
  )
)*$/x                # can have 0 or more hash keys after the first word
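A quick way to check the corrected pattern is to run it against the strings from the question plus one that should fail. This is only a sketch; HASH_PARAMS and samples are names made up for the example.

```ruby
# HASH_PARAMS is a hypothetical name; the pattern is the one explained above.
HASH_PARAMS = /^(\w+)(,\s[a-z][a-z_]+:\s('[^']*'|true|false))*$/

samples = [
  "first_paramenter, first_hash_key: 'class1 class2', second_hash_key: true",
  "first_argument, single_hash_key: 'class1 class2'",
  "first_argument_without_second_argument",
  "first_parameter, first_hash_key: 7" # 7 is not a quoted string, true or false
]

# true for each string that satisfies the rules, false otherwise
results = samples.map { |s| HASH_PARAMS.match?(s) }
```

Because the key/value group is repeated with *, its capture only keeps the last repetition, so this form suits validation better than extraction.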

Related

How do I remove a common substring using Ruby?

I have read How do I remove substring after a certain character in a string using Ruby?. This is close, but different.
I have these emails with a mask:
email1 = 'giovanna.macedo#lojas100.com.br-215000695716b.ct.domain.com.br'
email2 = 'alvaro-neves#stockshop.com-215000695716b.ct.domain.com.br'
email3 = 'filiallojas123#filiallojas.net-215000695716b.ct.domain.com.br'
I want to remove the substrings that are after .br, .com and .net. The return must be:
email1 = 'giovanna.macedo#lojas100.com.br'
email2 = 'alvaro-neves#stockshop.com'
email3 = 'filiallojas123#filiallojas.net'
You can do that with the method String#[] with an argument that is a regular expression.
r = /.*?\.(?:rb|com|net|br)(?!\.br)/
'giovanna.macedo#lojas100.com.br-215000695716b.ct.domain.com.br'[r]
#=> "giovanna.macedo#lojas100.com.br"
'alvaro-neves#stockshop.com-215000695716b.ct.domain.com.br'[r]
#=> "alvaro-neves#stockshop.com"
'filiallojas123#filiallojas.net-215000695716b.ct.domain.com.br'[r]
#=> "filiallojas123#filiallojas.net"
The regular expression reads as follows: "Match zero or more characters non-greedily (?), followed by a period, followed by 'rb' or 'com' or 'net' or 'br', which is not followed by '.br'." (?!\.br) is a negative lookahead.
Alternatively the regular expression can be written in free-spacing mode to make it self-documenting:
r = /
.*? # match zero or more characters non-greedily
\. # match '.'
(?: # begin a non-capture group
rb # match 'rb'
| # or
com # match 'com'
| # or
net # match 'net'
| # or
br # match 'br'
) # end non-capture group
(?! # begin a negative lookahead
\.br # match '.br'
) # end negative lookahead
/x # invoke free-spacing regex definition mode
This should work for your scenario:
expr = /^(.+\.(?:br|com|net))-[^']+(')$/
str = "email = 'giovanna.macedo#lojas100.com.br-215000695716b.ct.domain.com.br'"
str.gsub(expr, '\1\2')
#=> "email = 'giovanna.macedo#lojas100.com.br'"
Use the String#delete_suffix Method
This was tested with Ruby 3.0.2. Older versions may not support everything used here: String#delete_suffix and its bang counterpart were added in Ruby 2.5, and the _1 numbered block parameter in Ruby 2.7. Since you're trying to remove the exact same suffix from all your emails, you can simply invoke #delete_suffix! on each of your strings. For example:
common_suffix = "-215000695716b.ct.domain.com.br".freeze
emails = [email1, email2, email3]
emails.each { _1.delete_suffix! common_suffix }
You can then validate your results with:
emails
#=> ["giovanna.macedo#lojas100.com.br", "alvaro-neves#stockshop.com", "filiallojas123#filiallojas.net"]
email1
#=> "giovanna.macedo#lojas100.com.br"
email2
#=> "alvaro-neves#stockshop.com"
email3
#=> "filiallojas123#filiallojas.net"
You can see that the array now holds the trimmed values, or you can inspect each of the original variables individually if you want to check that the strings have actually been modified in place.
String Methods are Usually Faster, But Your Mileage May Vary
Since you're dealing with String methods instead of regular expressions, this solution is likely to be faster at scale, although I didn't benchmark all the solutions to compare. If you care about performance, you can measure larger samples using IRB's new measure command. It took only 0.000062s to process the strings this way on my system, and String methods generally work faster than regular expressions at large scales. You'll need to do more extensive benchmarking if performance is a core concern, though.
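If you'd rather measure than guess, a minimal timing sketch might look like the following. The sample addresses, the time_it helper and the anchored SUFFIX_RE pattern are all assumptions made for the example, and the numbers will differ from machine to machine.

```ruby
SUFFIX = '-215000695716b.ct.domain.com.br'
# Hypothetical regex equivalent: the escaped suffix anchored to the end of the string.
SUFFIX_RE = /#{Regexp.escape(SUFFIX)}\z/

# Made-up sample data in the same shape as the masked emails.
SAMPLE = Array.new(10_000) { |i| "user#{i}#example.com#{SUFFIX}" }

# Runs the block once and returns [result, elapsed_seconds].
def time_it
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

by_method, method_secs = time_it { SAMPLE.map { |s| s.delete_suffix(SUFFIX) } }
by_regex, regex_secs = time_it { SAMPLE.map { |s| s.sub(SUFFIX_RE, '') } }

puts format('delete_suffix: %.6fs, sub with regex: %.6fs', method_secs, regex_secs)
```

A single run like this is noisy, so repeat it a few times before drawing conclusions.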
Making the Call Shorter
You can even make the call shorter if you want. I left it a bit verbose above so you could see what the intent was at each step, but you can trim this to a single one-liner with the following block:
# one method chain, just wrapped to prevent scrolling
[email1, email2, email3].
map { _1.delete_suffix! "-215000695716b.ct.domain.com.br" }
Caveats
You Need Fixed-String Suffixes
The main caveat here is that this solution will only work when you know the suffix (or set of suffixes) you want to remove. If you can't rely on the suffixes to be fixed, then you'll likely need to pursue a regex solution in one way or another, even if it's just to collect a set of suffixes.
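When there is a small, known set of suffixes rather than a single one, the same idea can be sketched as below. KNOWN_SUFFIXES and strip_known_suffix are hypothetical names, and the second suffix is invented purely to show the lookup.

```ruby
# Hypothetical list; only the first entry comes from the question.
KNOWN_SUFFIXES = [
  '-215000695716b.ct.domain.com.br',
  '-0000000000.mask.example'
].freeze

# Removes the first known suffix the string ends with; returns the string
# unchanged when none of them applies.
def strip_known_suffix(str, suffixes = KNOWN_SUFFIXES)
  suffix = suffixes.find { |s| str.end_with?(s) }
  suffix ? str.delete_suffix(suffix) : str
end

cleaned = strip_known_suffix('alvaro-neves#stockshop.com-215000695716b.ct.domain.com.br')
```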
Dealing with Frozen Strings
Another caveat is that if you've created your code with frozen string literals, you'll need to adjust your code to avoid attempting in-place changes to frozen strings. There's more than one way to do this, but a simple destructuring assignment is probably the easiest to follow given your small code sample. Consider the following:
# assume that the strings in email1 etc. are frozen, but the array
# itself is not; you can't change the strings in-place, but you can
# re-assign new strings to the same variables or the same array
emails = [email1, email2, email3]
email1, email2, email3 =
emails.map { _1.delete_suffix "-215000695716b.ct.domain.com.br" }
There are certainly other ways to work around frozen strings, but the point is that while the now-common use of the # frozen_string_literal: true magic comment can improve VM performance or memory usage in large programs, it isn't always the best option for string-mangling code. Just keep that in mind, as tools like RuboCop love to enforce frozen strings, and not everyone stops to consider the consequences of such generic advice to the given problem domain.
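To see the difference concretely, here is a small sketch that freezes one of the strings by hand. The variable names are made up for the example.

```ruby
suffix = '-215000695716b.ct.domain.com.br'
frozen = 'alvaro-neves#stockshop.com-215000695716b.ct.domain.com.br'.freeze

# The bang method tries to modify the receiver, which a frozen string forbids.
in_place_failed =
  begin
    frozen.delete_suffix!(suffix)
    false
  rescue FrozenError
    true
  end

# The non-bang method returns a new string and leaves the frozen one alone.
trimmed = frozen.delete_suffix(suffix)
```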
I would just use the chomp(string) method like so:
mask = "-215000695716b.ct.domain.com.br"
email1.chomp(mask)
#=> "giovanna.macedo#lojas100.com.br"
email2.chomp(mask)
#=> "alvaro-neves#stockshop.com"
email3.chomp(mask)
#=> "filiallojas123#filiallojas.net"

How to use gsubstitution with more letters

I've written this code in Ruby:
string = "hahahah"
print string.gsub("a","b")
How do I add more letter replacements into gsub?
string.gsub("a","b")("h","l") and string.gsub("a","b";"h","l")
didn't work...
Update: I have tried this too, but without any success:
letters = {
"a" => "l"
"b" => "n"
...
"z" => "f"
}
string = "hahahah"
print string.gsub(\/w\,letters)
You're overcomplicating. As with most method calls in Ruby, you can simply chain #gsub calls together, one after the other:
str = 'adfh'
print str.gsub("a","b").gsub("h","l") #=> 'bdfl'
What you're doing here is applying the second #gsub to the result of the first one.
Of course, that gets a bit long-winded if you do too many of them. So, when you find yourself stringing too many together, you'll want to look for a regex solution. Rubular is a great place to tinker with them.
The way to use your hash trick with #gsub and a regex is to provide a hash for all possible matches. This has the same result as the two #gsub calls:
print str.gsub(/[ah]/, {'a'=>'b', 'h'=>'l'}) #=> 'bdfl'
The regex matches either a or h (/[ah]/), and the hash is saying what to substitute for each of them.
All that said, str.tr('ah', 'bl') is the simplest way to solve your problem as specified, as some commenters have mentioned, so long as you are working with single letters. If you need to work with two or more characters per substitution, you'll need to use #gsub.
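For substitutions longer than one character, one possible sketch builds the pattern from the hash keys with Regexp.union. The helper name replace_all and the sample replacements are invented for illustration.

```ruby
# Hypothetical helper: replaces every key of `replacements` found in `str`
# with its value, in a single pass over the string.
def replace_all(str, replacements)
  # Regexp.union escapes each key and joins them with |, in hash order,
  # so list longer keys before shorter ones that they start with.
  str.gsub(Regexp.union(replacements.keys), replacements)
end

multi = replace_all('hahahah', { 'ha' => 'lo', 'h' => '!' })
single = replace_all('adfh', { 'a' => 'b', 'h' => 'l' })
```

Since everything happens in one pass, a replacement is never re-scanned and substituted again, which can happen with chained #gsub calls.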

Removing all whitespace from a string in Ruby

How can I remove all newlines and spaces from a string in Ruby?
For example, if we have a string:
"123\n12312313\n\n123 1231 1231 1"
It should become this:
"12312312313123123112311"
That is, all whitespace should be removed.
You can use something like:
var_name.gsub!(/\s+/, '')
Or, if you want to return the changed string, instead of modifying the variable,
var_name.gsub(/\s+/, '')
This will also let you chain it with other methods (e.g. something_else = var_name.gsub(...).to_i to strip the whitespace and then convert it to an integer). gsub! will edit it in place, so you'd have to write var_name.gsub!(...); something_else = var_name.to_i. Strictly speaking, as long as at least one change is made, gsub! will return the new version (i.e. the same thing gsub would return), but if you happen to get a string with no whitespace, it'll return nil and things will break. Because of that, I'd prefer gsub if you're chaining methods.
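The nil return is easy to see in a short sketch. The variable names are made up, and .dup is used only so the example also works where string literals are frozen.

```ruby
with_spaces = "123\n12312313\n\n123 1231 1231 1".dup
without_spaces = '12312312313'.dup

changed = with_spaces.gsub!(/\s+/, '')      # something was replaced: returns the string
unchanged = without_spaces.gsub!(/\s+/, '') # nothing to replace: returns nil

# The non-bang form always returns a string, so it is safe to chain.
number = '4 2'.gsub(/\s+/, '').to_i
```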
gsub works by replacing any matches of the first argument with the contents of the second argument. In this case, it matches any sequence of consecutive whitespace characters (or just a single one) with the regex /\s+/, then replaces those with an empty string. There's also a block form if you want to do some processing on the matched part, rather than just replacing directly; see String#gsub for more information about that.
The Ruby docs for the class Regexp are a good starting point to learn more about regular expressions -- I've found that they're useful in a wide variety of situations where a couple of milliseconds here or there don't count and you don't need to match things that can be nested arbitrarily deeply.
As Gene suggested in his comment, you could also use tr:
var_name.tr(" \t\r\n", '')
It works in a similar way, but instead of matching a regex, it replaces every instance of the nth character of the first argument in the string it's called on with the nth character of the second argument. If the second argument is shorter, it is padded with its last character; if it is empty, as here, the matched characters are simply removed. See String#tr for more information.
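A few small calls illustrate how tr pairs up the two arguments. The sample strings are arbitrary.

```ruby
# Same-length arguments: characters are mapped position by position.
one_to_one = 'hello'.tr('el', 'ip')

# Shorter second argument: it is padded with its last character.
padded = 'hello'.tr('el', 'x')

# Empty second argument: the listed characters are deleted.
deleted = "1 2\t3\r\n4".tr(" \t\r\n", '')
```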
You could also use String#delete:
str = "123\n12312313\n\n123 1231 1231 1"
str.delete "\s\n"
#=> "12312312313123123112311"
You could use String#delete! to modify str in place, but note that delete! returns nil if no change is made.
Alternatively you could scan the string for digits /\d+/ and join the result:
string = "123\n\n12312313\n\n123 1231 1231 1\n"
string.scan(/\d+/).join
#=> "12312312313123123112311"
Please note that this would also remove alphabetical characters, dashes, symbols, basically everything that is not a digit.

Regex for series of four digits each up to 100

I'm trying to write a regex that validates a string and accepts only a series of four comma-separated numbers, each up to 100. Something like this would be valid:
20,30,40,50
and these invalid:
120,0,20,0
20,30,40,ss
invalid_string
Any thoughts?
They're used for CMYK colours. We just need to store them here, not use them.
Number Range and Subroutine
In Ruby 2+, for a compact regex, use this:
^([0-9]|[1-9][0-9]|100)(?:,\g<1>){3}$
Explanation
The ^ anchor asserts that we are at the beginning of the string
The parentheses around ([0-9]|[1-9][0-9]|100) match a number from 0 to 100 and define subroutine #1
(?:,\g<1>) matches one comma and the expression defined by subroutine #1
The {3} quantifier repeats that three times
The $ anchor asserts that we are at the end of the string
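The pattern can be tried against the strings from the question with a sketch like this one. The constant and method names are invented, and the CMYK_STRICT variant is an assumption added here: in Ruby, ^ and $ match at line boundaries, so \A and \z are used when the whole string must match.

```ruby
# The pattern from the answer; ^ and $ anchor to a line, not the whole string.
CMYK = /^([0-9]|[1-9][0-9]|100)(?:,\g<1>){3}$/

# Assumed stricter variant: \A and \z anchor to the whole string.
CMYK_STRICT = /\A([0-9]|[1-9][0-9]|100)(?:,\g<1>){3}\z/

# Hypothetical helper wrapping the strict pattern.
def cmyk?(str)
  CMYK_STRICT.match?(str)
end

checks = ['20,30,40,50', '120,0,20,0', '20,30,40,ss', 'invalid_string'].map { |s| cmyk?(s) }
```

The difference only shows up with multi-line input, where the ^/$ form accepts a valid first line followed by anything.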
I'd save myself the headache of using regex for a number-related problem. Also, the validation message will look awkward, so it's better to write your own:
validate :that_string_has_only_4_numbers_upto_100
def that_string_has_only_4_numbers_upto_100
  # Require exactly four parts, each made only of digits and within 0..100
  parts = str.split(',', -1)
  valid = parts.size == 4 && parts.all? { |n| n.match?(/\A\d+\z/) && (0..100).cover?(n.to_i) }
  errors.add(:str, 'is not valid.') unless valid
end
Unless you are a regex Jedi guru like @zx81 :p.
Try this (note that \d{1,2} accepts only one- or two-digit numbers, so 100 itself is rejected):
^(?:\d{1,2},){3}\d{1,2}$

Regex to leave desired string remaining and others removed

In Ruby, what regex will strip out all but a desired string if present in the containing string? I know about /[^abc]/ for characters, but what about strings?
Say I have the string "group=4&type_ids[]=2&type_ids[]=7&saved=1" and want to retain the pattern group=\d, if it is present in the string using only a regex?
Currently, I am splitting on & and then doing a select with matching condition =~ /group=\d/ on the resulting enumerable collection. It works fine, but I'd like to know the regex to do this more directly.
Simply:
part = str[/group=\d+/]
If you want only the numbers, then:
group_str = str[/group=(\d+)/,1]
If you want only the numbers as an integer, then:
group_num = str[/group=(\d+)/,1].to_i
Warning: String#[] will return nil if no match occurs, and blindly calling nil.to_i always returns 0.
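One way to avoid the silent 0 is to convert only when something was captured. This is a sketch, and group_number is a made-up helper name.

```ruby
# Hypothetical helper: returns the group number as an Integer,
# or nil when the string has no group=<digits> part.
def group_number(str)
  str[/group=(\d+)/, 1]&.to_i
end

found = group_number('group=4&type_ids[]=2&type_ids[]=7&saved=1')
missing = group_number('type_ids[]=2&saved=1')
```

The safe navigation operator (&.) skips the to_i call when String#[] returns nil, so a missing group stays distinguishable from group=0.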
You can try replacing the whole string with just the captured group:
str.sub(/.*(group=\d+).*/, '\1')
Typically I wouldn't really worry too much about a complex regex. Simply break the string down into smaller parts and it becomes easier:
asdf = "group=4&type_ids[]=2&type_ids[]=7&saved=1"
asdf.split('&').select{ |q| q['group'] } # => ["group=4"]
Otherwise, you can use regex in a bunch of different ways. Here are two ways I tend to use:
asdf.scan(/group=\d+/) # => ["group=4"]
asdf[/(group=\d+)/, 1] # => "group=4"
Try:
str.match(/group=\d+/)[0]

Resources