I want to replace (or delete) the content that does not match my filter.
I think the perfect description would be an opposite sub. I cannot find anything similar in the docs, and I'm not sure how to invert the regex, but I think a method would probably be the most convenient.
An example of how it would work (I've just changed the words to make it clearer):
"bird.cats.dogs".opposite_sub(/(dogs|cats)\.(dogs|cats)/, '')
#"cats.dogs"
I hope it's easy enough to understand.
Thanks in advance.
String#[] can take a regular expression as its parameter:
▶ "bird.cats.dogs"[/(dogs|cats)\.(dogs|cats)/]
#⇒ "cats.dogs"
For multiple matches one can use String#scan:
▶ "bird.cats.dogs.bird.cats.dogs".scan /(?:dogs|cats)\.(?:dogs|cats)/
#⇒ ["cats.dogs", "cats.dogs"]
So you want to extract the part that matches your regex?
You can use String#slice, for example:
"bird.cats.dogs".slice(/(dogs|cats)\.(dogs|cats)/)
#=> "cats.dogs"
And String#[] does the same.
"bird.cats.dogs"[/(dogs|cats)\.(dogs|cats)/]
#=> "cats.dogs"
You cannot have a single replacement string because the part of the string that matches the regex might not be at the beginning or end of the string, in which case it's not clear whether the replacement string should precede or follow the matching string. I've therefore written the following with two replacement strings, one for pre-match, the other for post_match. I've made this a method of the String class as that's what you've asked for (though I've given the method a less-perfect name :-) )
class String
  def replace_non_matching(regex, replace_before, replace_after)
    # partition returns [pre-match, match, post-match]; only the match is kept
    _before, match, _after = partition(regex)
    replace_before + match + replace_after
  end
end
r = /(dogs|cats)\.(dogs|cats)/
"birds.cats.dogs.pigs".replace_non_matching(r, "", "")
#=> "cats.dogs"
"birds.cats.dogs".replace_non_matching(r, "snakes.", ".hens")
#=> "snakes.cats.dogs.hens"
"birds.cats.dogs.mice.cats.dogs.bats".replace_non_matching(r, "snakes.", ".hens")
#=> "snakes.cats.dogs.hens"
Regarding the last example, the method could be modified to replace "birds.", ".mice." and ".bats", but in that case three replacement strings would be needed. In general, determining in advance the number of replacement strings needed could be problematic.
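One way around the "how many replacement strings" problem is to apply a single replacement to every non-matching segment. The following is only a sketch under that assumption: the name replace_all_non_matching is hypothetical, it is written as a plain method rather than a String patch, and it assumes the regex never matches an empty string.

```ruby
# Hypothetical helper: keeps every match of regex and substitutes the same
# replacement for each non-empty stretch of text between or around matches.
def replace_all_non_matching(str, regex, replacement)
  result = +""
  rest = str
  loop do
    before, match, after = rest.partition(regex)
    result << replacement unless before.empty?
    break if match.empty?   # no (further) match: everything left was "before"
    result << match
    rest = after
  end
  result
end

r = /(dogs|cats)\.(dogs|cats)/
replace_all_non_matching("birds.cats.dogs.mice.cats.dogs.bats", r, "*")
#=> "*cats.dogs*cats.dogs*"
replace_all_non_matching("birds.cats.dogs.pigs", r, "")
#=> "cats.dogs"
```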
I have a string as given below,
./component/unit
and need to split it to get component/unit as the result, which I will use as a key when inserting into a hash.
I tried .split(/.\//).last, but it gives only unit, not component/unit.
I think, this should help you:
string = './component/unit'
string.split('./')
#=> ["", "component/unit"]
string.split('./').last
#=> "component/unit"
Your regex was almost fine:
split(/\.\//)
You need to escape both . (any character) and / (regex delimiter).
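To see why the unescaped dot matters here, a small comparison may help (the variable names are only for illustration). With an unescaped ".", the "t/" in "component/" also counts as a separator:

```ruby
str = './component/unit'

# Unescaped: "." matches any character, so both "./" and "t/" split the string.
unescaped = str.split(/.\//)
#=> ["", "componen", "unit"]

# Escaped: only the literal "./" is a separator.
escaped = str.split(/\.\//)
#=> ["", "component/unit"]
```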
As an alternative, you could just remove the first './' substring :
'./component/unit'.sub('./','')
#=> "component/unit"
All the other answers are fine, but I think you are not really dealing with a String here but with a URI or Pathname, so I would advise you to use these classes if you can. If so, please adjust the title, as it is not about do-it-yourself-regexes, but about proper use of the available libraries.
Link to the ruby doc:
https://docs.ruby-lang.org/en/2.1.0/URI.html
and
https://ruby-doc.org/stdlib-2.1.0/libdoc/pathname/rdoc/Pathname.html
An example with Pathname is:
require 'pathname'
pathname = Pathname.new('./component/unit')
puts pathname.cleanpath # => "component/unit"
# pathname.to_s # => "component/unit"
Whether this is a good idea (and/or using URI would be cool too) also depends on what your real problem is, i.e. what you want to do with the extracted String. As stated, I doubt a bit that you are really interested in Strings.
Using a positive lookbehind, you could use this regex:
reg = /(?<=\.\/)[\w+\/]+\w+\z/
Demo
str = './component'
str2 = './component/unit'
str3 = './component/unit/ruby'
str4 = './component/unit/ruby/regex'
[str, str2, str3, str4].each { |s| puts s[reg] }
#component
#component/unit
#component/unit/ruby
#component/unit/ruby/regex
This functional method takes a number and returns the same value separated with commas, as is the common convention in the US.
The only way I could get it to work with regex was to reverse the string before and after the expression. Is there a regex that can help me eliminate the need to call String#reverse twice for method functionality?
def separate_comma(number)
  raise "You must enter a number." unless number.is_a?(Numeric)
  number.to_s.reverse.gsub(/(\d{3})(?=\d{1,3})/, "\\1,").reverse
end
Other libraries have already solved this problem - ActiveSupport for one.
require "active_support/number_helper"
ActiveSupport::NumberHelper.number_to_delimited(1234567890)
#=> "1,234,567,890"
You can even change the delimiter if you wish:
ActiveSupport::NumberHelper.number_to_delimited(1234567890, delimiter: "|")
#=> "1|234|567|890"
"1234556".gsub(/\d(?=\d{3}+\b)/,'\\0,')
# => "1,234,556"
This doesn't handle long fractional values, but this wasn't a concern for the OP's regex either.
The established way to do it is:
string.gsub(/(?<=\d)(?=(?:\d{3})+\z)/, ",")
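If you want to do it with floats (note that this pattern only matches when a decimal point follows the integer part, so a string of digits with no "." is left unchanged):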
string.gsub(/(?<=\d)(?=(?:\d{3})+[.\z])/, ",")
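If a single method has to cope with both integers and floats, one option is to split off the fractional part first and delimit only the integer part. This is just a sketch: the name delimit is hypothetical, and it assumes number.to_s yields plain decimal notation (very large or very small floats print in exponent form and are not handled).

```ruby
# Hypothetical helper: inserts commas into the integer part only,
# without reversing the string.
def delimit(number)
  raise ArgumentError, "You must enter a number." unless number.is_a?(Numeric)

  integer, fraction = number.to_s.split(".", 2)
  # Insert a comma at every position followed by a multiple of three digits.
  integer = integer.gsub(/(?<=\d)(?=(?:\d{3})+\z)/, ",")
  fraction ? "#{integer}.#{fraction}" : integer
end

delimit(1234567890)   #=> "1,234,567,890"
delimit(1234567.891)  #=> "1,234,567.891"
delimit(-1234)        #=> "-1,234"
```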
Say I have a string : "hEY "
I want to convert it to "Hey "
string.gsub!(/([a-z])([A-Z]+ )/, '\1'.upcase)
That is the idea I have, but it seems like the upcase method does nothing when I use it within the gsub method. Why is that?
EDIT: I came up with this method:
string.gsub!(/([a-z])([A-Z]+ )/) { |str| str.downcase!.capitalize! }
Is there a way to do this within the regex though? I don't really understand the '\1' '\2' thing. Is that backreferencing? How does that work
@sawa has the simple answer, and you've edited your question with another mechanism. However, to answer two of your questions:
Is there a way to do this within the regex though?
No, Ruby's regex does not support a case-changing feature as some other regex flavors do. You can "prove" this to yourself by reviewing the official Ruby regex docs for 1.9 and 2.0 and searching for the word "case":
https://github.com/ruby/ruby/blob/ruby_1_9_3/doc/re.rdoc
https://github.com/ruby/ruby/blob/ruby_2_0_0/doc/re.rdoc
I don't really understand the '\1' '\2' thing. Is that backreferencing? How does that work?
Your use of \1 is a kind of backreference. A backreference can also appear in the search pattern itself. For example, the regular expression /f(.)\1/ will find the letter f, followed by any character, followed by that same character (e.g. "foo" or "f!!").
In this case, within a replacement string passed to a method like String#gsub, the backreference does refer to the previous capture. From the docs:
"If replacement is a String it will be substituted for the matched text. It may contain back-references to the pattern’s capture groups of the form \d, where d is a group number, or \k<n>, where n is a group name. If it is a double-quoted string, both back-references must be preceded by an additional backslash."
In practice, this means:
"hello world".gsub( /([aeiou])/, '_\1_' ) #=> "h_e_ll_o_ w_o_rld"
"hello world".gsub( /([aeiou])/, "_\1_" ) #=> "h_\u0001_ll_\u0001_ w_\u0001_rld"
"hello world".gsub( /([aeiou])/, "_\\1_" ) #=> "h_e_ll_o_ w_o_rld"
Now, you have to understand when code runs. In your original code…
string.gsub!(/([a-z])([A-Z]+ )/, '\1'.upcase)
…what you are doing is calling upcase on the string '\1' (which has no effect) and then calling the gsub! method, passing in a regex and a string as parameters.
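The order of evaluation can be checked directly. In this small illustration (the variable names are mine), the replacement string is built before gsub ever sees the input:

```ruby
# upcase runs first, on the two-character string backslash + "1".
# There are no letters in it, so it comes back unchanged.
replacement = '\1'.upcase
replacement == '\1'   #=> true

# gsub therefore receives a plain '\1' and substitutes the first capture as-is.
result = "hEY ".gsub(/([a-z])([A-Z]+ )/, replacement)
#=> "h"
```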
Finally, another way to achieve this same goal is with the block form like so:
# Take your pick of which you prefer:
string.gsub!(/([a-z])([A-Z]+ )/){ $1.upcase << $2.downcase }
string.gsub!(/([a-z])([A-Z]+ )/){ [$1.upcase,$2.downcase].join }
string.gsub!(/([a-z])([A-Z]+ )/){ "#{$1.upcase}#{$2.downcase}" }
In the block form of gsub the captured patterns are set to the global variables $1, $2, etc. and you can use those to construct the replacement string.
I don't know why you are trying to do it in a complicated way, but the usual way is:
"hEY".capitalize # => "Hey"
If you insist on using a regex and upcase, then you would also need downcase:
"hEY".downcase.sub(/\w/){$&.upcase} # => "Hey"
If you really want to just swap the case of every letter in the string, you can avoid the complexity of regex entirely because There's A Method For That™.
"hEY".swapcase # => "Hey"
"HellO thERe".swapcase # => "hELLo THerE"
There's also swapcase! to do it destructively.
I'm sure I can do this with a regex, but I can't find any explanation for this behavior using just normal delete!:
#1.9.2
>> "helllom<em>".delete!"<em>"
=> "hlllo"
The docs don't have anything to say about this. Seems to me that it's treating '<em>' as a set. Where is this documented?
Edit: in my defense I was looking for special treatment of < and > in the docs under delete. Didn't see anything about it and tried google, which also didn't have anything to say about that -- because it doesn't exist.
String#delete is one of those unfortunate methods that is difficult to explain (I have no idea what the use case is). In practice, I've always used gsub with an empty string as the second argument.
'helllom<em>'.gsub '<em>', '' # => "helllom"
Note that String#gsub! has a quirk of its own: it returns nil if it does not alter the string, so you should not depend on its return value. Use gsub if you need the return value; if you want to mutate the string, use gsub! but don't do anything else with its result on that line.
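A quick illustration of that return-value difference (variable names are only for the example):

```ruby
with_tag = "helllom<em>".dup
without_tag = "hello".dup

# gsub! returns the modified receiver when something was replaced...
bang_hit = with_tag.gsub!("<em>", "")
#=> "helllom"

# ...but nil when nothing matched, so chaining onto it is risky.
bang_miss = without_tag.gsub!("<em>", "")
#=> nil

# gsub always returns a string.
plain_miss = "hello".gsub("<em>", "")
#=> "hello"
```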
You cannot use String#delete to remove substrings.
Check the API. It removes all the characters from given parameters from the given string.
In your case it removes all occurrences of e, m, < and >.
Straight from the docs:
delete([other_str]+) → new_str
Returns a copy of str with all characters in the intersection of its
arguments deleted. Uses the same rules for building the set of
characters as String#count.
ex:
"hello".delete "l","lo" #=> "heo"
"hello".delete "lo" #=> "he"
"hello".delete "aeiou", "^e" #=> "hell"
"hello".delete "ej-m" #=> "ho"
So every character in the intersection of the two strings is removed.
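Applied to the string from the question, the difference between the character-set semantics of delete and a substring replacement looks like this (a small sketch; String#count is used because it follows the same set rules):

```ruby
str = "helllom<em>"

# delete treats "<em>" as the set of characters <, e, m and >.
by_set = str.delete("<em>")
#=> "hlllo"

# count uses the same set rules: two e's, two m's, one < and one >.
removed = str.count("<em>")
#=> 6

# gsub (or sub) with a String pattern removes the literal substring instead.
by_substring = str.gsub("<em>", "")
#=> "helllom"
```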
How can I remove the very first "1" from any string if that string starts with a "1"?
"1hello world" => "hello world"
"112345" => "12345"
I'm thinking of doing
string.sub!('1', '') if string =~ /^1/
but I'm wondering if there's a better way. Thanks!
Why not just include the regex in the sub! method?
string.sub!(/^1/, '')
As of Ruby 2.5 you can use delete_prefix or delete_prefix! to achieve this in a readable manner.
In this case "1hello world".delete_prefix("1").
More info here:
https://blog.jetbrains.com/ruby/2017/10/10-new-features-in-ruby-2-5/
https://bugs.ruby-lang.org/issues/12694
'invisible'.delete_prefix('in') #=> "visible"
'pink'.delete_prefix('in') #=> "pink"
N.B. you can also use this to remove items from the end of a string with delete_suffix and delete_suffix!
'worked'.delete_suffix('ed') #=> "work"
'medical'.delete_suffix('ed') #=> "medical"
https://bugs.ruby-lang.org/issues/13665
I've answered in a little more detail (with benchmarks) here: What is the easiest way to remove the first character from a string?
If you're going to use a regex for the match, you may as well use it for the replacement:
string.sub!(%r{^1},"")
BTW, the %r{} is just an alternate syntax for regular expressions. You can use %r followed by any character e.g. %r!^1!.
Careful using sub!(/^1/,'') ! In case the string doesn't match /^1/ it will return nil. You should probably use sub (without the bang).
This answer might be more optimised: What is the easiest way to remove the first character from a string?
string[0] = '' if string[0] == '1'
I'd like to post a tiny improvement to the otherwise excellent answer by Zach. The ^ matches the beginning of every line in Ruby regex. This means there can be multiple matches per string. Kenji asked about the beginning of the string which means they have to use this regex instead:
string.sub!(/\A1/, '')
Compare this - multiple matches with this - one match.
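The difference only shows up with multi-line strings. A short sketch with made-up input:

```ruby
text = "1abc\n1def"

# ^ matches at the start of every line, so gsub finds two matches.
every_line = text.gsub(/^1/, '')
#=> "abc\ndef"

# \A matches only at the very start of the string.
string_start = text.gsub(/\A1/, '')
#=> "abc\n1def"

# With sub, ^ can still strip a "1" that is not at the start of the string.
second_line = "x\n1abc".sub(/^1/, '')
#=> "x\nabc"
anchored = "x\n1abc".sub(/\A1/, '')
#=> "x\n1abc"
```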