I have a regexp that sets $1: it corresponds to the text between ( and ) in the_beginning(.*)the_end.
I want to replace only the value corresponding to $1 with something else, not the whole match.
In my actual code, my_string contains:
/* MyKey */ = { [code_missing]; MY_VALUE = "123456789"; [code_missing]; }
I want to replace "123456789" (with "987654321", for example).
And this is my regexp:
"/\\* MyKey \\*/ = {[^}]*MY_VALUE = \"(.*)\";"
I'm still not sure exactly what you want, but here's some code that should help you:
str = "Hello this is the_beginning that comes before the_end of the string"
p str.sub(/the_beginning(.+?)the_end/, 'new_beginning\1new_end')
#=> "Hello this is new_beginning that comes before new_end of the string"
p str.sub(/(the_beginning).+?(the_end)/, '\1new middle\2')
#=> "Hello this is the_beginningnew middlethe_end of the string"
Edit:
theDoc = '/* MyKey */ = { [code_missing]; MY_VALUE = "123456789";'
regex = %r{/\* MyKey \*/ = {[^}]*MY_VALUE = "(.*)";}
p theDoc[ regex, 1 ] # extract the captured group
#=> "123456789"
newDoc = theDoc.sub( regex, 'var foo = \1' )
#=> "var foo = 123456789" # replace, saving the captured information
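To replace only the captured group in place, which is what the question asks for, String#[]= accepts a regex plus a capture index and overwrites just that group's text. A minimal sketch, assuming the sample string above (the [code_missing] placeholder is kept as literal text) and the same pattern rewritten with slash delimiters:

```ruby
# Sample input from the question; .dup gives a mutable copy even if
# string literals are frozen.
doc = '/* MyKey */ = { [code_missing]; MY_VALUE = "123456789";'.dup

# Group 1 captures only the quoted value; [^"]* stops at the closing quote.
regex = /\/\* MyKey \*\/ = \{[^}]*MY_VALUE = "([^"]*)";/

# String#[]=(regexp, capture, new_str) replaces just the text of group 1.
doc[regex, 1] = "987654321"

p doc
#=> "/* MyKey */ = { [code_missing]; MY_VALUE = \"987654321\";"
```

If the pattern does not match, String#[]= raises IndexError, so check with =~ first when the key may be absent.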
Edit #2: Getting access to information before/after a match
regex = /\d+/
match = regex.match( theDoc )
p match.pre_match, match[0], match.post_match
#=> "/* MyKey */ = { [code_missing]; MY_VALUE = \""
#=> "123456789"
#=> "\";"
newDoc = "#{match.pre_match}HELLO#{match.post_match}"
#=> "/* MyKey */ = { [code_missing]; MY_VALUE = \"HELLO\";"
Note that this requires a regex that matches only the text to be replaced, not the text before or after it.
If you need to specify the limits, and not the contents, you can use zero-width lookbehind/lookahead:
regex = /(?<=the_beginning).+?(?=the_end)/
m = regex.match(str)
"#{m.pre_match}--new middle--#{m.post_match}"
#=> "Hello this is the_beginning--new middle--the_end of the string"
…but now this is clearly more work than just capturing and using \1 and \2. I'm not sure I fully understand what you are looking for, or why you think it would be easier.
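Because lookarounds are zero-width, the match itself covers only the middle text, so the same regex can also be handed straight to String#sub without the pre_match/post_match bookkeeping. A small sketch reusing the sample string from above; note that Ruby's lookbehind generally needs a fixed length, so a variable prefix such as [^}]* cannot go inside it:

```ruby
str = "Hello this is the_beginning that comes before the_end of the string"

# The lookbehind and lookahead only assert what surrounds the match,
# so only the middle text is consumed (and therefore replaced).
regex = /(?<=the_beginning).+?(?=the_end)/

new_str = str.sub(regex, '--new middle--')
p new_str
#=> "Hello this is the_beginning--new middle--the_end of the string"
```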
Is it possible to create/use a regular expression pattern in Ruby that is based on the value of a variable?
For instance, we all know we can do the following with Ruby strings:
str = "my string"
str2 = "This is #{str}" # => "This is my string"
I'd like to do the same thing with regular expressions:
var = "Value"
str = "a test Value"
str.gsub( /#{var}/, 'foo' ) # => "a test foo"
Obviously that doesn't work as listed; I only put it there as an example to show what I'd like to do. I need to match a regexp based on the content of a variable.
The code you think doesn't work, does:
var = "Value"
str = "a test Value"
p str.gsub( /#{var}/, 'foo' ) # => "a test foo"
Things get more interesting if var can contain regular expression metacharacters. If it does and you want those metacharacters to do what they usually do in a regular expression, then the same gsub will work:
var = "Value|a|test"
str = "a test Value"
str.gsub( /#{var}/, 'foo' ) # => "foo foo foo"
However, if your search string contains metacharacters and you do not want them interpreted as metacharacters, then use Regexp.escape like this:
var = "*This*"
str = "*This* is a string"
p str.gsub( /#{Regexp.escape(var)}/, 'foo' )
# => "foo is a string"
Or just give gsub a string instead of a regular expression. In MRI >= 1.8.7, gsub treats a string pattern (its first argument) as a plain string, not a regular expression:
var = "*This*"
str = "*This* is a string"
p str.gsub(var, 'foo' ) # => "foo is a string"
(It used to be that a string pattern argument to gsub was automatically converted to a regular expression. I know it was that way in 1.6. I don't recall which version introduced the change.)
As noted in other answers, you can use Regexp.new as an alternative to interpolation:
var = "*This*"
str = "*This* is a string"
p str.gsub(Regexp.new(Regexp.escape(var)), 'foo' )
# => "foo is a string"
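A quick check, on the same sample input, that the escape-safe approaches above agree; it also shows what Regexp.escape actually produces:

```ruby
var = "*This*"
str = "*This* is a string"

# Regexp.escape backslash-escapes the metacharacters (here the two "*").
escaped = Regexp.escape(var)
p escaped  #=> "\\*This\\*"

a = str.gsub(/#{escaped}/, 'foo')         # interpolate the escaped text
b = str.gsub(var, 'foo')                  # a String pattern is matched literally
c = str.gsub(Regexp.new(escaped), 'foo')  # build the Regexp explicitly

p [a, b, c]  #=> ["foo is a string", "foo is a string", "foo is a string"]
```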
It works, but you need to use gsub! or assign the return value to another variable:
var = "Value"
str = "a test Value"
str.gsub!( /#{var}/, 'foo' ) # Or this: new_str = str.gsub( /#{var}/, 'foo' )
puts str
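The difference is that gsub returns a new string and leaves the receiver untouched, while gsub! modifies the receiver itself. A small sketch (the .dup is only there so the literal is safe to mutate):

```ruby
var = "Value"
str = "a test Value".dup

new_str = str.gsub(/#{var}/, 'foo')  # returns a new string
p str      #=> "a test Value"  (original unchanged)
p new_str  #=> "a test foo"

str.gsub!(/#{var}/, 'foo')           # modifies str in place
p str      #=> "a test foo"
```

gsub! returns nil when nothing was replaced, so avoid chaining further calls on its return value.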
Yes
str.gsub Regexp.new(var), 'foo'
You can also keep the regular expression itself in a variable in Ruby:
var = /Value/
str = "a test Value"
str.gsub( /#{var}/, 'foo' )
str.gsub( Regexp.new("#{var}"), 'foo' )
The regex pattern works when I try it on regexr.com:
pattern = %r{(["'])(?:\\\1|[^\1])*?\1}
string = %Q(var msg = 'hello' + 'world')
string.gsub(pattern, '<span>\1</span>')
I am expecting the output to be:
"var msg = <span>'hello'</span> + <span>'world'</span>"
But instead I am getting:
# => "var msg = <span>'</span> + <span>'</span>"
I think you are capturing and printing the wrong thing: in your pattern, \1 holds only the opening quote. It should be:
pattern = %r{(["'])(\\\1|[^\1]*?)\1}
# Group 2 is now a capturing group, so \2 holds the text between
# the quotes (e.g. hello).
string = %Q(var msg = 'hello' + 'world')
string.gsub(pattern, '<span>\1\2\1</span>')
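Another option is to keep the whole match and reference it with \0 in the replacement, so no extra group is needed. A sketch; the pattern here is an assumption of mine rather than the asker's: it uses a negative lookahead (?!\1) to stop at the matching quote, since a backreference cannot be used inside a character class (so [^\1] does not mean "anything but the opening quote"):

```ruby
string = %Q(var msg = 'hello' + 'world')

# Group 1 is the opening quote. (?:\\.|(?!\1).)* consumes either an
# escaped character or any character that is not the opening quote,
# then \1 matches the closing quote.
pattern = /(["'])(?:\\.|(?!\1).)*\1/

# \0 in the replacement is the whole match, quotes included.
result = string.gsub(pattern, '<span>\0</span>')
p result
#=> "var msg = <span>'hello'</span> + <span>'world'</span>"
```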
I think it would be clearer to use a regex that does not have capture groups.
r = /
(?:'.+?') # match one or more of any character, lazily, surrounded by
# single quotes, in a non-capture group
| # or
(?:".+?") # match one or more of any character, lazily, surrounded by
# double quotes, in a non-capture group
/x # free-spacing regex definition mode
which you would normally see written
r = /(?:'.+?')|(?:".+?")/
Then
"var msg = 'hello' + 'world'".gsub(r) { |s| "<span>%s</span>" % s }
#=> "var msg = <span>'hello'</span> + <span>'world'</span>"
"var msg = \"hello\" + \"world\"".gsub(r) { |s| "<span>#{s}</span>" }
#=> "var msg = <span>\"hello\"</span> + <span>\"world\"</span>"
"var msg = 'It is my' + 'friend Bubba'".gsub(r) { |s| "<span>"+s+"</span>" }
#=> "var msg = <span>'It is my'</span> + <span>'friend Bubba'</span>"
I'm having some trouble trying to find an appropriate method for string substitution. I would like to replace every character in a string 'except' for a selection of words or set of strings (provided in an array). I know there's a gsub method, but I guess what I'm trying to achieve is its reverse. For example...
My string: "Part of this string needs to be substituted"
Keywords: ["this string", "substituted"]
Desired output: "**** ** this string ***** ** ** substituted"
ps. It's my first question ever, so your help will be greatly appreciated!
Here's a different approach. First, do the reverse of what you ultimately want: redact what you want to keep. Then compare this redacted string to your original character by character, and if the characters are the same, redact, and if they are not, keep the original.
class String
# Returns a string with all words except those passed in as keepers
# redacted.
#
# "Part of this string needs to be substituted".gsub_except(["this string", "substituted"], '*')
# # => "**** ** this string ***** ** ** substituted"
def gsub_except keep, mark
reverse_keep = self.dup
keep.each_with_object(Hash.new(0)) { |e, a| a[e] = mark * e.length }
.each { |word, redacted| reverse_keep.gsub! word, redacted }
reverse_keep.chars.zip(self.chars).map do |redacted, original|
redacted == original && original != ' ' ? mark : original
end.join
end
end
You can use something like:
str="Part of this string needs to be substituted"
keep = ["this","string", "substituted"]
str.split(" ").map { |word| keep.include?(word) ? word : "*" * word.length }.join(" ")
but this only works for keeping single words, not phrases.
This might be a little more understandable than my last answer (it is written in JavaScript):
const s = "Part of this string needs to be substituted"
const k = ["this string", "substituted"]
let tmp = s
for (const key of k) {
  tmp = tmp.replace(key, function(x){ return "*".repeat(x.length) })
}
const res = s.split("")
for (let charIdx = 0; charIdx < s.length; charIdx++) {
  if (tmp[charIdx] != "*" && tmp[charIdx] != " ") {
    res[charIdx] = "*"
  } else {
    res[charIdx] = s.charAt(charIdx)
  }
}
const finalResult = res.join("")
Explanation:
This builds on my previous idea of using the positions of the keywords to decide which portions of the string to replace with stars. First off:
For each keyword, we replace it with as many stars as it has characters. So:
s.replace("this string", function(x){
  return "*".repeat(x.length)
})
replaces the portion of s that matches "this string" with x.length asterisks.
We do this for each key. For completeness, you should make the replacement global rather than stopping at the first match (e.g. /this string/g). I didn't do that in the answer, but you can build such a pattern with new RegExp.
Next up, we split a copy of the original string into an array. If you're a visual person, it should make sense to think of this as a weird sort of character addition:
"Part of this string needs to be substituted"
"Part of *********** needs to be substituted" +
---------------------------------------------
**** ** this string ***** ** ** ***********
is what we're going for. So where our tmp variable has stars, we bring over the original character; otherwise, we replace the character with a *.
This is easily done with an if statement. And to make it like your example in the question, we also bring over the original character if it's a space. Lastly, we join the array back into a string via .join("") so that you can work with a string again.
Makes sense?
You can use the following approach: collect the substrings that you need to turn into asterisks, and then perform this replacement:
str="Part of this string needs to be substituted"
arr = ["this string", "substituted"]
arr_to_remove = str.split(Regexp.new("\\b(?:" + arr.map { |x| Regexp.escape(x) }.join('|') + ")\\b|\\s+")).reject { |s| s.empty? }
arr_to_remove.each do |s|
str = str.gsub(s, "*" * s.length)
end
puts str
Output of the demo program:
**** ** this string ***** ** ** substituted
str = "Part of this string needs to be substituted"
keywords = ["this string", "substituted"]
pattern = /(#{keywords.join('|')})/
str.split(pattern).map {|i| keywords.include?(i) ? i : i.gsub(/\S/,"*")}.join
#=> "**** ** this string ***** ** ** substituted"
A more readable version of the same code:
str = "Part of this string needs to be substituted"
keywords = ["this string", "substituted"]
# Use a regexp pattern to split the string around the keywords.
pattern = /(#{keywords.join('|')})/ #pattern => /(this string|substituted)/
parts = str.split(pattern) #=> ["Part of ", "this string", " needs to be ", "substituted"]
redacted = parts.map do |i|
  if keywords.include?(i)
    i
  else
    i.gsub(/\S/,"*") # replace all non-whitespace characters with "*"
  end
end
# redacted => ["**** ** ", "this string", " ***** ** ** ", "substituted"]
redacted.join #=> "**** ** this string ***** ** ** substituted"
You can do that using the form of String#split that uses a regex with a capture group.
Code
def sub_some(str, keywords)
str.split(/(#{keywords.join('|')})/)
.map {|s| keywords.include?(s) ? s : s.gsub(/./) {|c| (c==' ') ? c : '*'}}
.join
end
Example
str = "Part of this string needs to be substituted"
keywords = ["this string", "substituted"]
sub_some(str, keywords)
#=> "**** ** this string ***** ** ** substituted"
Explanation
r = /(#{keywords.join('|')})/
#=> /(this string|substituted)/
a = str.split(r)
#=> ["Part of ", "this string", " needs to be ", "substituted"]
e = a.map
#=> #<Enumerator: ["Part of ", "this string", " needs to be ",
# "substituted"]:map>
s = e.next
#=> "Part of "
keywords.include?(s) ? s : s.gsub(/./) { |c| (c==' ') ? c : '*' }
#=> s.gsub(/./) { |c| (c==' ') ? c : '*' }
#=> "Part of ".gsub(/./) { |c| (c==' ') ? c : '*' }
#=> "**** ** "
s = e.next
#=> "this string"
keywords.include?(s) ? s : s.gsub(/./) { |c| (c==' ') ? c : '*' }
#=> s
#=> "this string"
and so on... Lastly,
["**** ** ", "this string", " ***** ** ** ", "substituted"].join
#=> "**** ** this string ***** ** ** substituted"
Note that, prior to v.1.9.3, Enumerable#map did not return an enumerator when no block is given. The calculations are the same, however.
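The split-with-capture approach above builds its regex by joining the raw keywords, so a keyword containing metacharacters (parentheses, dots, etc.) would be misread. A sketch of the same idea that passes the keywords through Regexp.union, which escapes each one; redact_except is a hypothetical helper name, and /\S/ keeps all whitespace (not just spaces):

```ruby
# Hypothetical helper: same idea as sub_some, but Regexp.union escapes
# every keyword so it is matched literally.
def redact_except(str, keywords)
  pattern = /(#{Regexp.union(keywords)})/
  str.split(pattern)
     .map { |piece| keywords.include?(piece) ? piece : piece.gsub(/\S/, '*') }
     .join
end

p redact_except("Part of this string needs to be substituted", ["this string", "substituted"])
#=> "**** ** this string ***** ** ** substituted"
p redact_except("call f(x) now", ["f(x)"])
#=> "**** f(x) ***"
```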
I'd like to replace/duplicate a substring between two delimiters, e.g.:
"This is (the string) I want to replace"
I'd like to strip out everything between the characters ( and ) and assign that substring to a variable. Is there a built-in function to do this?
I would just do:
my_string = "This is (the string) I want to replace"
p my_string.split(/[()]/) #=> ["This is ", "the string", " I want to replace"]
p my_string.split(/[()]/)[1] #=> "the string"
Here are two more ways to do it:
/\((?<inside_parenthesis>.*?)\)/ =~ my_string
p inside_parenthesis #=> "the string"
my_new_var = my_string[/\((.*?)\)/,1]
p my_new_var #=> "the string"
Edit - Examples to explain the last method:
my_string = 'hello there'
capture = /h(e)(ll)o/
p my_string[capture] #=> "hello"
p my_string[capture, 1] #=> "e"
p my_string[capture, 2] #=> "ll"
var = "This is (the string) I want to replace"[/(?<=\()[^)]*(?=\))/]
var # => "the string"
str = "This is (the string) I want to replace"
str.match(/\((.*)\)/)
some_var = $1 # => "the string"
As I understand it, you want to remove or replace a substring as well as set a variable equal to that substring (sans the parentheses). There are many ways to do this, some of which are slight variants of the other answers. Here's another way that also allows for multiple substrings within parentheses, picking up on @sawa's comments:
def doit(str, repl)
  vars = []
  new_str = str.gsub(/\(.*?\)/) { |m| vars << m[1..-2]; repl }
  [new_str, vars]
end
new_str, vars = doit("This is (the string) I want to replace", '')
new_str # => "This is  I want to replace"
vars # => ["the string"]
new_str, vars = doit("This is (the string) I (really) want (to replace)", '')
new_str # => "This is  I  want "
vars # => ["the string", "really", "to replace"]
new_str, vars = doit("This (short) string is a () keeper", "hot dang")
new_str # => "This hot dang string is a hot dang keeper"
vars # => ["short", ""]
In the regex, the ? in .*? makes .* "lazy". gsub passes each match m to the block; the block strips the parens and adds it to vars, then returns the replacement string. This regex also works:
/\([^)]*\)/
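An alternative sketch that keeps the two jobs separate: String#scan collects the contents of every parenthesized group and String#gsub removes the groups. It uses the negated class [^)]*, which stops at the first closing parenthesis just as the lazy .*? does; like the other answers, it does not handle nested parentheses:

```ruby
str = "This is (the string) I (really) want (to replace)"

# scan with one capture group returns [["the string"], ...]; flatten it.
vars    = str.scan(/\(([^)]*)\)/).flatten
# Remove every (...) pair, parentheses included.
new_str = str.gsub(/\([^)]*\)/, '')

p vars     #=> ["the string", "really", "to replace"]
p new_str  #=> "This is  I  want "
```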
How do I add an apostrophe at the beginning and end of a string?
string = "1,2,3,4"
I would like that string to be:
'1','2','3','4'
Not sure if this is what you want:
>> s = "1,2,3,4"
>> s.split(',').map { |x| "'#{x}'" }.join(',')
=> "'1','2','3','4'"
result = []
"1,2,3,4".split(',').each do |c|
  result << "'#{c[/\d+/]}'"
end
puts result.join(',')
# prints: '1','2','3','4'
We can use a regular expression to find the digits:
string = "1,2,3,4"
string.gsub(/(\d+)/, '\'\1\'')
#=> "'1','2','3','4'"
str.insert(0, 'x')
str.insert(str.length, 'x')
After seeing your edit:
q = "1,2,3,4"
ar = q.split(',')
ar.each{|i| i.insert(0, "'").insert(-1, "'")}
q = ar.join(',')
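A variation that wraps every comma-separated value, whether or not it is numeric, by matching runs of non-comma characters and reusing the whole match (\0) in the replacement. A short sketch with a made-up sample string:

```ruby
string = "1,22,abc"

# [^,]+ matches each run of non-comma characters; "\\0" is the whole match.
quoted = string.gsub(/[^,]+/, "'\\0'")
p quoted  #=> "'1','22','abc'"
```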