Unable to highlight quoted string using regex in ruby - ruby

The regex pattern works when I try it in regexr.com
pattern = %r{(["'])(?:\\\1|[^\1])*?\1}
string = %Q(var msg = 'hello' + 'world')
string.gsub(pattern, '<span>\1</span>')
I am expecting the output to be:
"var msg = <span>'hello'</span> + <span>'world'</span>"
But instead I am getting:
# => "var msg = <span>'</span> + <span>'</span>"

You are capturing and substituting the wrong group: \1 holds only the opening quote, so the contents must be captured in a second group. It should be
pattern = %r{(["'])(\\\1|[^\1]*?)\1}
# ^^
# ||
#(This will capture the string hello etc. which are after quotes)
string = %Q(var msg = 'hello' + 'world')
string.gsub(pattern, '<span>\1\2\1</span>')
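To see why the original attempt printed only the quotes: in the replacement string, \1 refers to capture group 1, which holds just the opening quote, while \0 refers to the whole match. A minimal sketch follows, assuming a slightly different pattern from the ones above (the helper name highlight_quoted is made up for illustration). It uses a negative lookahead, (?!\1)., in place of [^\1], since as far as I know a backreference is not treated as one inside a character class.

```ruby
# Hypothetical helper: wraps every quoted string literal in a <span>.
def highlight_quoted(source)
  # An opening quote, then escaped characters or any character that is not
  # the opening quote, then the matching closing quote.
  quoted = /(["'])(?:\\.|(?!\1).)*\1/
  # \0 is the entire match; \1 would be only the opening quote character.
  source.gsub(quoted, '<span>\0</span>')
end

puts highlight_quoted(%q(var msg = 'hello' + 'world'))
# prints: var msg = <span>'hello'</span> + <span>'world'</span>
```

With this approach the single capture group is only needed to pair up the quotes, so the replacement never has to reassemble the match from pieces.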

I think it would be clearer to use a regex that does not have capture groups.
r = /
(?:'.+?') # match one or more of any character, lazily, surrounded by
# single quotes, in a non-capture group
| # or
(?:".+?") # match one or more of any character, lazily, surrounded by
# double quotes, in a non-capture group
/x # free-spacing regex definition mode
which you would normally see written
r = /(?:'.+?')|(?:".+?")/
Then
"var msg = 'hello' + 'world'".gsub(r) { |s| "<span>%s</span>" % s }
#=> "var msg = <span>'hello'</span> + <span>'world'</span>"
"var msg = \"hello\" + \"world\"".gsub(r) { |s| "<span>#{s}</span>" }
#=> "var msg = <span>\"hello\"</span> + <span>\"world\"</span>"
"var msg = 'It is my' + 'friend Bubba'".gsub(r) { |s| "<span>"+s+"</span>" }
#=> "var msg = <span>'It is my'</span> + <span>'friend Bubba'</span>"
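One caveat with .+? is that it needs at least one character between the quotes, so an empty literal such as '' can pair up with a quote further along the line. A possible variation, offered as a sketch rather than a drop-in replacement, uses negated character classes instead (the helper name wrap_quoted is made up here, and escaped quotes inside a literal are not handled):

```ruby
def wrap_quoted(source)
  # '[^']*' : a single quote, zero or more non-quote characters, a single quote
  # "[^"]*" : the same for double quotes
  source.gsub(/'[^']*'|"[^"]*"/) { |s| "<span>#{s}</span>" }
end

puts wrap_quoted("var msg = '' + 'world'")
# prints: var msg = <span>''</span> + <span>'world'</span>
```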

Related

How to write a regex in a single line

I have this code:
str = 'printf("My name is %s and age is %0.2d", name, age);'
SPECIFIERS = 'diuXxofeEgsc'
format_specifiers = /((?:%(?:\*?([-+]?\d*\.?\d+)*(?:[#{SPECIFIERS}]))))/i
variables = /([.[^"]]*)\);$/
format = str.scan(format_specifiers)
var = str.scan(variables).first.first.split(/,/)
Is there any way a single regex can do that in a couple of lines?
My desired output is:
%s, name
%0.2d, age
I'm a big believer in keeping regular expressions as simple as possible; they can too quickly mushroom into unwieldy, unmaintainable messes. I'd start with something like this, then tweak as necessary:
str = 'printf("My name is %s and age is %0.2d", name, age);'
formats = str.scan(/%[a-z0-9.]+/) # => ["%s", "%0.2d"]
str[/,(.+)\);$/] # => ", name, age);"
vars = str[/,(.+)\);$/].scan(/[a-z]+/) # => ["name", "age"]
puts formats.zip(vars).map{ |a| a.join(', ')}
# >> %s, name
# >> %0.2d, age
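The same idea can be wrapped in a small method. This is only a sketch: the name pair_specifiers_with_args is made up, and it assumes the call has a single format string, no literal %% and no commas inside the arguments.

```ruby
def pair_specifiers_with_args(call)
  # Format specifiers: a % followed by letters, digits or dots.
  formats = call.scan(/%[a-z0-9.]+/i)
  # Everything between the comma after the closing quote and the final ");".
  arg_list = call[/"\s*,(.+)\);\s*\z/, 1].to_s
  args = arg_list.split(',').map(&:strip)
  formats.zip(args)
end

call = 'printf("My name is %s and age is %0.2d", name, age);'
pair_specifiers_with_args(call).each { |pair| puts pair.join(', ') }
# prints:
# %s, name
# %0.2d, age
```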
Your question has two parts:
Q1: Is it possible to do this with a single regex?
Q2: Can this be done in one or two lines of code?
The answer to both questions is "yes".
format_specifiers = /
%[^\s"]+ # match % followed by > 0 characters other than
# whitespace or a double-quote
/x # free-spacing regex definition mode
variables = /
,\s* # match comma followed by >= 0 whitespaces
\K # forget matches so far
[a-zA-Z] # match a letter
\w* # match >= 0 word characters
/x
You can decide, after testing, if these two regexes do their jobs adequately. For testing, refer to Kernel#sprintf.
r = /
(?:#{format_specifiers}) # match format_specifiers in a non-capture group
| # or
(?:#{variables}) # match variables in a non-capture group
/x
#=> /
(?:(?x-mi:
%[^\s"]+ # match % followed by > 0 characters other than
# whitespace or a double-quote
)) # match format_specifiers in a non-capture group
| # or
(?:(?x-mi:
,\s* # match comma followed by >= 0 whitespaces
\K # forget matches so far
[a-zA-Z] # match a letter
\w* # match >= 0 word characters
)) # match variables in a non-capture group
/x
r can of course also be written:
/(?:(?x-mi:%[^\s"]+))|(?:(?x-mi:,\s*\K[a-zA-Z]\w*))/
One advantage of constructing r from two regexes is that each of the latter can be tested separately.
str = 'printf("My name is %s and age is %0.2d", name, age);'
arr = str.scan(r)
#=> ["%s", "%0.2d", "name", "age"]
arr.each_slice(arr.size/2).to_a.transpose.map { |s| s.join(', ') }
#=> ["%s, name", "%0.2d, age"]
I have five lines of code. We could reduce this to two by simply substituting out r in str.scan(r). We could make it a single line by writing:
str.scan(r).tap { |a|
a.replace(a.each_slice(a.size/2).to_a.transpose.map { |s| s.join(', ') }) }
#=> ["%s, name", "%0.2d, age"]
with r substituted out.
The steps here are as follows:
a = str.scan(r)
#=> ["%s", "%0.2d", "name", "age"]
b = a.each_slice(a.size/2)
#=> a.each_slice(2)
#=> #<Enumerator: ["%s", "%0.2d", "name", "age"]:each_slice(2)>
c = b.to_a
#=> [["%s", "%0.2d"], ["name", "age"]]
d = c.transpose
#=> [["%s", "name"], ["%0.2d", "age"]]
e = d.map { |s| s.join(', ') }
#=> ["%s, name", "%0.2d, age"]
a.replace(e)
#=> ["%s, name", "%0.2d, age"]
The methods used (aside from Array#size) are String#scan, Enumerable#each_slice, Enumerable#to_a, Enumerable#map, Array#transpose and Array#replace.
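Slicing the array in half relies on there being exactly as many specifiers as variables. A variation that avoids the arithmetic, sketched here under the assumption that every specifier starts with % and no variable does, is to split the scan results with Enumerable#partition (the method name format_pairs is made up for illustration):

```ruby
def format_pairs(str)
  # Two alternatives in the spirit of r above, written inline: a format
  # specifier, or an identifier that follows a comma.
  r = /%[^\s"]+|,\s*\K[a-zA-Z]\w*/
  formats, vars = str.scan(r).partition { |s| s.start_with?('%') }
  formats.zip(vars).map { |pair| pair.join(', ') }
end

str = 'printf("My name is %s and age is %0.2d", name, age);'
p format_pairs(str)
# prints: ["%s, name", "%0.2d, age"]
```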

Replace a string in Ruby by gsub

I have a string
path = "MT_Store_0 /47/47/47/opt/47/47/47/data/47/47/47/FCS/47/47/47/oOvt4wCtSuODh8r9RuQT3w"
I want to remove the part of the string starting at the first /47 using gsub.
path.gsub! '/47/', '/'
Expected output:
"MT_Store_0 "
Actual output:
"MT_Store_0 /47/opt/47/data/47/FCS/47/oOvt4wCtSuODh8r9RuQT3w"
path.gsub! /\/47.*/, ''
In the regex, \/47.* matches /47 and any characters following it.
Or, you can write the regex using %r to avoid escaping the forward slashes:
path.gsub! %r{/47.*}, ''
If the output has to be MT_Store_0 (without the trailing space), then gsub(/\/47.*/, '').strip is what you want.
Here are two solutions that employ neither String#gsub nor String#gsub!.
Use String#index
def extract(str)
ndx = str.index /\/47/
ndx ? str[0, ndx] : str
end
str = "MT_Store_0 /47/47/oOv"
str = extract str
#=> "MT_Store_0 "
extract "MT_Store_0 cat"
#=> "MT_Store_0 cat"
Use a capture group
R = /
(.+?) # match one or more of any character, lazily, in capture group 1
(?: # start a non-capture group
\/47 # match characters
| # or
\z # match end of string
) # end non-capture group
/x # extended mode for regex definition
def extract(str)
str[R, 1]
end
str = "MT_Store_0 /47/47/oOv"
str = extract str
#=> "MT_Store_0 "
extract "MT_Store_0 cat"
#=> "MT_Store_0 cat"
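Another gsub-free possibility is String#partition, which splits a string around the first occurrence of a separator and returns the whole string as the first element when the separator is absent. A short sketch (the method name before_marker and its marker parameter are made up for illustration):

```ruby
def before_marker(str, marker = '/47')
  # partition returns [text_before, separator, text_after]
  str.partition(marker).first
end

p before_marker("MT_Store_0 /47/47/oOv")  # prints "MT_Store_0 "
p before_marker("MT_Store_0 cat")         # prints "MT_Store_0 cat"
```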

How to substitute all characters in a string except for some (in Ruby)

I'm having some trouble trying to find an appropriate method for string substitution. I would like to replace every character in a string except for a selection of words or phrases (provided in an array). I know there's a gsub method, but I guess what I'm trying to achieve is its reverse. For example...
My string: "Part of this string needs to be substituted"
Keywords: ["this string", "substituted"]
Desired output: "**** ** this string ***** ** ** substituted"
ps. It's my first question ever, so your help will be greatly appreciated!
Here's a different approach. First, do the reverse of what you ultimately want: redact what you want to keep. Then compare this redacted string to your original character by character, and if the characters are the same, redact, and if they are not, keep the original.
class String
  # Returns a string with all words except those passed in as keepers
  # redacted.
  #
  #   "Part of this string needs to be substituted".gsub_except(["this string", "substituted"], '*')
  #   # => "**** ** this string ***** ** ** substituted"
  def gsub_except keep, mark
    reverse_keep = self.dup
    keep.each_with_object(Hash.new(0)) { |e, a| a[e] = mark * e.length }
        .each { |word, redacted| reverse_keep.gsub! word, redacted }
    reverse_keep.chars.zip(self.chars).map do |redacted, original|
      redacted == original && original != ' ' ? mark : original
    end.join
  end
end
You can use something like:
str="Part of this string needs to be substituted"
keep = ["this","string", "substituted"]
str.split(" ").map{|word| keep.include?(word) ? word : word.split("").map{|w| "*"}.join}.join(" ")
but this will work only to keep words, not phrases.
This might be a little more understandable than my last answer (the code is JavaScript, but the same idea carries over to Ruby):
s = "Part of this string needs to be substituted"
k = ["this string", "substituted"]
tmp = s
for(key in k) {
tmp = tmp.replace(k[key], function(x){ return "*".repeat(x.length)})
}
res = s.split("")
for(charIdx in s) {
if(tmp[charIdx] != "*" && tmp[charIdx] != " ") {
res[charIdx] = "*"
} else {
res[charIdx] = s.charAt(charIdx)
}
}
var finalResult = res.join("")
Explanation:
This builds on my previous idea of using the positions of the keywords to decide which portions of the string to replace with stars. First:
For each of the keywords we replace it with stars, of the same length as it. So:
s.replace("this string", function(x){
return "*".repeat(x.length)
})
replaces the portion of s that matches "this string" with x.length *'s
We do this for each key. For completeness, you should make sure that the replacement is global and not just the first match found, for example with /this string/g. I didn't do that in the answer, but you should be able to figure out how to build the pattern with new RegExp yourself.
Next up, we split a copy of the original string into an array. If you're a visual person, it should make sense to think of this as a weird sort of character addition:
"Part of this string needs to be substituted"
"Part of *********** needs to be substituted" +
---------------------------------------------
**** ** this string ***** ** ** ***********
is what we're going for. So if our tmp variable has stars, then we want to bring over the original string, and otherwise we want to replace the character with a *
This is easily done with an if statement. And to make it like your example in the question, we also bring over the original character if it's a space. Lastly, we join the array back into a string via .join("") so that you can work with a string again.
Makes sense?
You can use the following approach: collect the substrings that you need to turn into asterisks, and then perform this replacement:
str="Part of this string needs to be substituted"
arr = ["this string", "substituted"]
arr_to_remove = str.split(Regexp.new("\\b(?:" + arr.map { |x| Regexp.escape(x) }.join('|') + ")\\b|\\s+")).reject { |s| s.empty? }
arr_to_remove.each do |s|
str = str.gsub(s, "*" * s.length)
end
puts str
Output of the demo program:
**** ** this string ***** ** ** substituted
str = "Part of this string needs to be substituted"
keywords = ["this string", "substituted"]
pattern = /(#{keywords.join('|')})/
str.split(pattern).map {|i| keywords.include?(i) ? i : i.gsub(/\S/,"*")}.join
#=> "**** ** this string ***** ** ** substituted"
A more readable version of the same code
str = "Part of this string needs to be substituted"
keywords = ["this string", "substituted"]
#Use regexp pattern to split string around keywords.
pattern = /(#{keywords.join('|')})/ #pattern => /(this string|substituted)/
str = str.split(pattern) #=> ["Part of ", "this string", " needs to be ", "substituted"]
redacted = str.map do |i|
if keywords.include?(i)
i
else
i.gsub(/\S/,"*") # replace all non-whitespace characters with "*"
end
end
# redacted => ["**** ** ", "this string", " ***** ** ** ", "substituted"]
redacted.join
You can do that using the form of String#split that uses a regex with a capture group.
Code
def sub_some(str, keywords)
str.split(/(#{keywords.join('|')})/)
.map {|s| keywords.include?(s) ? s : s.gsub(/./) {|c| (c==' ') ? c : '*'}}
.join
end
Example
str = "Part of this string needs to be substituted"
keywords = ["this string", "substituted"]
sub_some(str, keywords)
#=> "**** ** this string ***** ** ** substituted"
Explanation
r = /(#{keywords.join('|')})/
#=> /(this string|substituted)/
a = str.split(r)
#=> ["Part of ", "this string", " needs to be ", "substituted"]
e = a.map
#=> #<Enumerator: ["Part of ", "this string", " needs to be ",
# "substituted"]:map>
s = e.next
#=> "Part of "
keywords.include?(s) ? s : s.gsub(/./) { |c| (c==' ') ? c : '*' }
#=> s.gsub(/./) { |c| (c==' ') ? c : '*' }
#=> "Part of ".gsub(/./) { |c| (c==' ') ? c : '*' }
#=> "**** ** "
s = e.next
#=> "this string"
keywords.include?(s) ? s : s.gsub(/./) { |c| (c==' ') ? c : '*' }
#=> s
#=> "this string"
and so on... Lastly,
["**** ** ", "this string", " ***** ** ** ", "substituted"].join
#=> "**** ** this string ***** ** ** substituted"
Note that, prior to v.1.9.3, Enumerable#map did not return an enumerator when no block is given. The calculations are the same, however.
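One detail the split-based versions leave open is keywords that contain regex metacharacters. A possible variation, shown only as a sketch (the method name redact_except is made up), builds the alternation with Regexp.union, which escapes each keyword, and does the whole job in one gsub: every match is either a keyword, which is kept, or a single non-whitespace character, which becomes a star.

```ruby
def redact_except(str, keywords)
  keep = Regexp.union(keywords) # escapes metacharacters in the keywords
  # Alternation is tried left to right, so at each position a keyword wins
  # over the single-character fallback \S.
  str.gsub(/(#{keep})|\S/) { $1 || '*' }
end

puts redact_except("Part of this string needs to be substituted",
                   ["this string", "substituted"])
# prints: **** ** this string ***** ** ** substituted
```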

Replacing a char in Ruby with another character

I'm trying to replace all spaces in a string with '%20', but it's not producing the result I want.
I'm splitting the string, then going through each character. If the character is " " I want to replace it with '%20', but for some reason it is not being replaced. What am I doing wrong?
def twenty(string)
letters = string.split("")
letters.each do |char|
if char == " "
char = '%20'
end
end
letters.join
end
p twenty("Hello world is so played out")
Use URI.escape(...) for proper URI encoding (URI.escape was deprecated and then removed in Ruby 3.0, so this works only on older versions):
require 'uri'
URI.escape('a b c') # => "a%20b%20c"
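On Ruby 3.0 and later, where URI.escape is no longer available, the standard library still offers percent-encoding helpers. A brief sketch of two of them; which one fits depends on whether spaces should become %20 or +:

```ruby
require 'erb'
require 'uri'

# ERB::Util.url_encode percent-encodes spaces as %20.
puts ERB::Util.url_encode('a b c')           # prints a%20b%20c

# URI.encode_www_form_component targets form data, where a space becomes +.
puts URI.encode_www_form_component('a b c')  # prints a+b+c
```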
Or, if you want to roll your own as a fun learning exercise, here's my solution:
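def uri_escape(str, encode=/\W/)
  # percent-encode each byte as two hex digits, so "\n" becomes "%0A" rather than "%a"
  str.gsub(encode) { |c| c.bytes.map { |b| format('%%%02X', b) }.join }
end
uri_escape('a b!c') # => "a%20b%21c"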
Finally, to answer your specific question, your snippet doesn't behave as expected because assigning to the block variable inside each only rebinds that local variable; it does not change the element stored in the array. Try using map with assignment (or map!) instead:
def twenty(string)
letters = string.split('')
letters.map! { |c| (c == ' ') ? '%20' : c }
letters.join
end
twenty('a b c') # => "a%20b%20c"
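The difference between rebinding the block variable and changing the array can be seen directly. The sketch below is illustrative only (the method name twenty_in_place is made up). It also shows that each can be made to work if the string object itself is mutated with String#replace, although map is the more idiomatic choice.

```ruby
letters = 'a b'.split('')
letters.each { |char| char = '%20' if char == ' ' } # rebinds the local only
p letters # prints ["a", " ", "b"]

def twenty_in_place(string)
  letters = string.split('')
  # String#replace changes the contents of the object held by the array.
  letters.each { |char| char.replace('%20') if char == ' ' }
  letters.join
end

p twenty_in_place('a b c') # prints "a%20b%20c"
```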
If you want to first split the string on spaces, you could do this:
def twenty(string)
string.split(' ').join('%20')
end
p twenty("Hello world is so played out")
#=> "Hello%20world%20is%20so%20played%20out"
Note that this is not the same as
def twenty_with_gsub(string)
string.gsub(' ', '%20')
end
because if
string = 'hi   there' # three spaces between the words
then
twenty(string)
#=> "hi%20there"
twenty_with_gsub(string)
#=> "hi%20%20%20there"

String replacing with regexp

I have a regexp that sets $1: it corresponds to the text between ( and ) in the_beginning(.*)the_end.
I want to replace only the value corresponding to $1 with something else, not the whole match.
In the real context, my_string contains:
/* MyKey */ = { [code_missing]; MY_VALUE = "123456789"; [code_missing]; }
I want to replace "123456789" (with "987654321", for example).
And this is my regexp:
"/\\* MyKey \\*/ = {[^}]*MY_VALUE = \"(.*)\";"
I'm still not sure exactly what you want, but here's some code that should help you:
str = "Hello this is the_beginning that comes before the_end of the string"
p str.sub /the_beginning(.+?)the_end/, 'new_beginning\1new_end'
#=> "Hello this is new_beginning that comes before new_end of the string"
p str.sub /(the_beginning).+?(the_end)/, '\1new middle\2'
#=> "Hello this is the_beginningnew middlethe_end of the string"
Edit:
theDoc = '/* MyKey */ = { [code_missing]; MY_VALUE = "123456789";'
regex = %r{/\* MyKey \*/ = {[^}]*MY_VALUE = "(.*)";}
p theDoc[ regex, 1 ] # extract the captured group
#=> "123456789"
newDoc = theDoc.sub( regex, 'var foo = \1' )
#=> "var foo = 123456789" # replace, saving the captured information
Edit #2: Getting access to information before/after a match
regex = /\d+/
match = regex.match( theDoc )
p match.pre_match, match[0], match.post_match
#=> "/* MyKey */ = { [code_missing]; MY_VALUE = \""
#=> "123456789"
#=> "\";"
newDoc = "#{match.pre_match}HELLO#{match.post_match}"
#=> "/* MyKey */ = { [code_missing]; MY_VALUE = \"HELLO\";"
Note that this requires a regex that does not actually match the pre/post text.
If you need to specify the limits, and not the contents, you can use zero-width lookbehind/lookahead:
regex = /(?<=the_beginning).+?(?=the_end)/
m = regex.match(str)
"#{m.pre_match}--new middle--#{m.post_match}"
#=> "Hello this is the_beginning--new middle--the_end of the string"
…but now this is clearly more work than just capturing and using \1 and \2. I'm not sure I fully understand what you are looking for, or why you think it would be easier.
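For the MyKey example in the question, one way to replace only the captured value is to capture the text on either side of it and rebuild the match in a block. This is a sketch under assumptions: the method name replace_my_value is made up, and the value is assumed not to contain a double quote. The block form is used because a replacement string such as '\1987654321' would put digits directly after the backreference, which is easy to misread.

```ruby
def replace_my_value(text, new_value)
  # Group 1: everything from the key up to and including the opening quote.
  # Group 2: the closing quote and semicolon.
  pattern = /(\/\* MyKey \*\/ = \{[^}]*MY_VALUE = ")[^"]*(";)/
  text.sub(pattern) { "#{$1}#{new_value}#{$2}" }
end

doc = '/* MyKey */ = { [code_missing]; MY_VALUE = "123456789"; [code_missing]; }'
puts replace_my_value(doc, '987654321')
# prints: /* MyKey */ = { [code_missing]; MY_VALUE = "987654321"; [code_missing]; }
```

If the pattern does not match, sub returns the text unchanged.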
