How do I improve regex to eliminate unnecessary method chaining?

How do I improve regex to eliminate unnecessary method chaining? - ruby

This functional method takes a number and returns the same value separated with commas, as is the common convention in the US.
The only way I could get it to work with regex was to reverse the string before and after the expression. Is there a regex that can help me eliminate the need to call String#reverse twice for method functionality?
def separate_comma(number)
raise "You must enter a number." if number.is_a?(Numeric) == false
number.to_s.reverse.gsub(/(\d{3})(?=\d{1,3})/, "\\1,").reverse
end

Other libraries have already solved this problem - ActiveSupport for one.
require "active_support/number_helper"
ActiveSupport::NumberHelper.number_to_delimited(1234567890)
#=> "1,234,567,890"
You can even change the delimiter if you wish:
ActiveSupport::NumberHelper.number_to_delimited(1234567890, delimiter: "|")
#=> "1|234|567|890"

"1234556".gsub(/\d(?=\d{3}+\b)/,'\\0,')
# => "1,234,556"
This doesn't handle long fractional values, but this wasn't a concern for the OP's regex either.

The established way to do it is:
string.gsub(/(?<=\d)(?=(?:\d{3})+\z)/, ",")
If you want to do it with floats:
string.gsub(/(?<=\d)(?=(?:\d{3})+[.\z])/, ",")

Related

opposite of sub in ruby

I want to replace the content (or delete it) that does not match with my filter.
I think the perfect description would be an opposite sub. I cannot find anything similar in the docs, and I'm not sure how to invert the regex, but I think a method would probably be the more convenient.
An example of how it would work (I've just changed the words to make it more clear)
"bird.cats.dogs".opposite_sub(/(dogs|cats)\.(dogs|cats)/, '')
#"cats.dogs"
I hope it's easy enough to understand.
Thanks in advance.

String#[] can take a regular expression as its parameter:
▶ "bird.cats.dogs"[/(dogs|cats)\.(dogs|cats)/]
#⇒ "cats.dogs"
For multiple matches one can use String#scan:
▶ "bird.cats.dogs.bird.cats.dogs".scan /(?:dogs|cats)\.(?:dogs|cats)/
#⇒ ["cats.dogs", "cats.dogs"]

So you want to extract the part that matches your regex?
You can use String#slice, for example:
"bird.cats.dogs".slice(/(dogs|cats)\.(dogs|cats)/)
#=> "cats.dogs"
And String#[] does the same.
"bird.cats.dogs"[/(dogs|cats)\.(dogs|cats)/]
#=> "cats.dogs"

You cannot have a single replacement string because the part of the string that matches the regex might not be at the beginning or end of the string, in which case it's not clear whether the replacement string should precede or follow the matching string. I've therefore written the following with two replacement strings, one for pre-match, the other for post_match. I've made this a method of the String class as that's what you've asked for (though I've given the method a less-perfect name :-) )
class String
def replace_non_matching(regex, replace_before, replace_after)
first, match, last = partition(regex)
replace_before + match + replace_after
end
end
r = /(dogs|cats)\.(dogs|cats)/
"birds.cats.dogs.pigs".replace_non_matching(r, "", "")
#=> "cats.dogs"
"birds.cats.dogs".replace_non_matching(r, "snakes.", ".hens")
#=> "snakes.cats.dogs.hens"
"birds.cats.dogs.mice.cats.dogs.bats".replace_non_matching(r, "snakes.", ".hens")
#=> "snakes.cats.dogs.hens"
Regarding the last example, the method could be modified to replace "birds.", ".mice." and ".bats", but in that case three replacement strings would be needed. In general, determining in advance the number of replacement strings needed could be problematic.

how can i consolidate this expression?

I'm removing the initial "The" and spaces of band names for concatenating into a url.
I have this, but it's ugly and I'd like to consolidate into one expression.
#artist.sub!(/[Tt]he/, '')
#artist.gsub!(/\s+/, '')

Try:
#artist.gsub!(/(\A[Tt]he)|(\s+)/, '')

You can of course chain #sub and #gsub expressions; e.g.,
#artist = #artist.sub(/^[Tt]he/, '').gsub(/\s+/, '')
Any more compact and I would hesitate to call it elegant—just clever (and unclear).
Note the use of #sub and #gsub instead of #sub! and #gsub!. Per #pguardiario's comment, the second two will return nil if there is no match, causing a NoMethodError exception. Also, note that this has an anchor to prevent "The" from being removed from the middle of the string.
If you're trying to create a slug for use in URLs, you might be better going with a method in a library.

I'd go with:
#artist = #artist.sub(/\Athe\b/i, '').strip

Optimising ruby regexp -- lots of match groups

I'm working on a ruby baser lexer. To improve performance, I joined up all tokens' regexps into one big regexp with match group names. The resulting regexp looks like:
/\A(?<__anonymous_-1038694222803470993>(?-mix:\n+))|\A(?<__anonymous_-1394418499721420065>(?-mix:\/\/[\A\n]*))|\A(?<__anonymous_3077187815313752157>(?-mix:include\s+"[\A"]+"))|\A(?<LET>(?-mix:let\s))|\A(?<IN>(?-mix:in\s))|\A(?<CLASS>(?-mix:class\s))|\A(?<DEF>(?-mix:def\s))|\A(?<DEFM>(?-mix:defm\s))|\A(?<MULTICLASS>(?-mix:multiclass\s))|\A(?<FUNCNAME>(?-mix:![a-zA-Z_][a-zA-Z0-9_]*))|\A(?<ID>(?-mix:[a-zA-Z_][a-zA-Z0-9_]*))|\A(?<STRING>(?-mix:"[\A"]*"))|\A(?<NUMBER>(?-mix:[0-9]+))/
I'm matching it to my string producing a MatchData where exactly one token is parsed:
bigregex =~ "\n ... garbage"
puts $~.inspect
Which outputs
#<MatchData
"\n"
__anonymous_-1038694222803470993:"\n"
__anonymous_-1394418499721420065:nil
__anonymous_3077187815313752157:nil
LET:nil
IN:nil
CLASS:nil
DEF:nil
DEFM:nil
MULTICLASS:nil
FUNCNAME:nil
ID:nil
STRING:nil
NUMBER:nil>
So, the regex actually matched the "\n" part. Now, I need to figure the match group where it belongs (it's clearly visible from #inspect output that it's _anonymous-1038694222803470993, but I need to get it programmatically).
I could not find any option other than iterating over #names:
m.names.each do |n|
if m[n]
type = n.to_sym
resolved_type = (n.start_with?('__anonymous_') ? nil : type)
val = m[n]
break
end
end
which verifies that the match group did have a match.
The problem here is that it's slow (I spend about 10% of time in the loop; also 8% grabbing the #input[#pos..-1] to make sure that \A works as expected to match start of string (I do not discard input, just shift the #pos in it).
You can check the full code at GH repo.
Any ideas on how to make it at least a bit faster? Is there any option to figure the "successful" match group easier?

You can do this using the regexp methods .captures() and .names():
matching_string = "\n ...garbage" # or whatever this really is in your code
#input = matching_string.match bigregex # bigregex = your regex
arr = #input.captures
arr.each_with_index do |value, index|
if not value.nil?
the_name_you_want = #input.names[index]
end
end
Or if you expect multiple successful values, you could do:
success_names_arr = []
success_names_arr.push(#input.names[index]) #within the above loop
Pretty similar to your original idea, but if you're looking for efficiency .captures() method should help with that.

I may have misunderstood this completely but but I'm assuming that all but one token is not nil and that's the one your after?
If so then, depending on the flavour of regex you're using, you could use a negative lookahead to check for a non-nil value
([^\n:]+:(?!nil)[^\n\>]+)
This will match the whole token ie NAME:value.

how do I use String.delete to remove '<em>' from a string in Ruby?

I'm sure I can do this with a regex, but I can't find any explanation for this behavior using just normal delete!:
#1.9.2
>> "helllom<em>".delete!"<em>"
=> "hlllo"
The docs don't have anything to say about this. Seems to me that it's treating '<em>' as a set. Where is this documented?
Edit: in my defense I was looking for special treatment of < and > in the docs under delete. Didn't see anything about it and tried google, which also didn't have anything to say about that -- because it doesn't exist.

String#delete is one of those unfortunate methods that is difficult to explain (I have no idea what the use case is). In practice, I've always used gsub with an empty string as the second argument.
'helllom<em>'.gsub '<em>', '' # => "helllom"
Note that String#gsub! also has weirdness such that you should not depend on its return value, it will return nil if it does not alter the string, so it is best to use gsub if you depend on the return value, or if you want to mutate the string, then use gsub! but and don't use anything else on that line.

You cannot use String#delete to remove substrings.
Check the API. It removes all the characters from given parameters from the given string.
I your case it removes all occurrences of e, m, < and >.

Straight from the docs:
delete([other_str]+) → new_str
Returns a copy of str with all characters in the intersection of its
arguments deleted. Uses the same rules for building the set of
characters as String#count.
ex:
"hello".delete "l","lo" #=> "heo"
"hello".delete "lo" #=> "he"
"hello".delete "aeiou", "^e" #=> "hell"
"hello".delete "ej-m" #=> "ho"
So every character in the intersection of the two strings is removed.

Remove character from string if it starts with that character?

How can I remove the very first "1" from any string if that string starts with a "1"?
"1hello world" => "hello world"
"112345" => "12345"
I'm thinking of doing
string.sub!('1', '') if string =~ /^1/
but I' wondering there's a better way. Thanks!

Why not just include the regex in the sub! method?
string.sub!(/^1/, '')

As of Ruby 2.5 you can use delete_prefix or delete_prefix! to achieve this in a readable manner.
In this case "1hello world".delete_prefix("1").
More info here:
https://blog.jetbrains.com/ruby/2017/10/10-new-features-in-ruby-2-5/
https://bugs.ruby-lang.org/issues/12694
'invisible'.delete_prefix('in') #=> "visible"
'pink'.delete_prefix('in') #=> "pink"
N.B. you can also use this to remove items from the end of a string with delete_suffix and delete_suffix!
'worked'.delete_suffix('ed') #=> "work"
'medical'.delete_suffix('ed') #=> "medical"
https://bugs.ruby-lang.org/issues/13665
I've answered in a little more detail (with benchmarks) here: What is the easiest way to remove the first character from a string?

if you're going to use regex for the match, you may as well use it for the replacement
string.sub!(%r{^1},"")
BTW, the %r{} is just an alternate syntax for regular expressions. You can use %r followed by any character e.g. %r!^1!.

Careful using sub!(/^1/,'') ! In case the string doesn't match /^1/ it will return nil. You should probably use sub (without the bang).

This answer might be more optimised: What is the easiest way to remove the first character from a string?
string[0] = '' if string[0] == '1'

I'd like to post a tiny improvement to the otherwise excellent answer by Zach. The ^ matches the beginning of every line in Ruby regex. This means there can be multiple matches per string. Kenji asked about the beginning of the string which means they have to use this regex instead:
string.sub!(/\A1/, '')
Compare this - multiple matches with this - one match.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

How do I improve regex to eliminate unnecessary method chaining? - ruby

"1234556".gsub(/\d(?=\d{3}+\b)/,'\\0,') # => "1,234,556" This doesn't handle long fractional values, but this wasn't a concern for the OP's regex either.

The established way to do it is: string.gsub(/(?<=\d)(?=(?:\d{3})+\z)/, ",") If you want to do it with floats: string.gsub(/(?<=\d)(?=(?:\d{3})+[.\z])/, ",")

Related

opposite of sub in ruby

how can i consolidate this expression?

Optimising ruby regexp -- lots of match groups

how do I use String.delete to remove '<em>' from a string in Ruby?

Remove character from string if it starts with that character?

Categories

Resources