How to replace partial content with ruby regular expressions - ruby

I have a string containing this
some_string = "[quote=\"user.name, post:1, topic:14\"] some other content here"
And I'm looking to replace the post and topic numbers, something like this:
"[quote=\"user.name, post:#{a}, topic:#{b}\"] whatevercontent"
How can I achieve this?

Use positive lookbehind
>> some_string = "[quote=\"user.name, post:1, topic:14\"] some other content here"
=> "[quote=\"user.name, post:1, topic:14\"] some other content here"
>> some_string.sub(/(?<=post:)[^,"]+/, 'aaa').sub(/(?<=topic:)[^,"]+/, 'bbb')
=> "[quote=\"user.name, post:aaa, topic:bbb\"] some other content here"
Explanation:
/(?<=post:)[^,"]+/
This matches a run of non-comma, non-double-quote characters preceded by post:. We replace that run with aaa, using the sub method.
Then we do the same for the characters preceded by topic:, replacing that piece with bbb.
I assume that the parts you want to replace are those between the colon and either a comma or a double quote; adjust those characters if necessary.
Another approach is to not worry about a regex and invoke split to break up what you have into key value pairs and put everything back together with the new values. But if your use case is restricted enough, the regex approach above can work.
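The question asked for values held in variables a and b rather than fixed strings. A minimal sketch of the same lookbehind approach with variable values, assuming both are integers (the values 2 and 15 are made up for illustration):

```ruby
some_string = "[quote=\"user.name, post:1, topic:14\"] some other content here"

a = 2   # hypothetical new post number
b = 15  # hypothetical new topic number

# to_s is needed because sub expects a String replacement
result = some_string
  .sub(/(?<=post:)[^,"]+/, a.to_s)
  .sub(/(?<=topic:)[^,"]+/, b.to_s)

puts result
```

If the new values could ever contain backslashes, the block form of sub would be the safer choice, since a block's return value is inserted literally.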
ADDENDUM
The OP wants to make sure that the replacement happens only within the bracketed part of the string and not anywhere else. Here is how that can be done, assuming no square brackets inside the quote part (and therefore no nesting):
>> s = 'post:no change, [quote="user.name, post:1, topic:14"] topic:no change,'
=> "post:no change, [quote=\"user.name, post:1, topic:14\"] topic:no change,"
>> quote_part = s.scan(/\[quote[^\]]+\]/)[0]
=> "[quote=\"user.name, post:1, topic:14\"]"
>> new_quote_part = quote_part.sub(/(?<=post:)[^,"]+/, 'aaa').sub(/(?<=topic:)[^,"]+/, 'bbb')
=> "[quote=\"user.name, post:aaa, topic:bbb\"]"
>> s.sub(quote_part, new_quote_part)
=> "post:no change, [quote=\"user.name, post:aaa, topic:bbb\"] topic:no change,"
The last line has replacements only within the bracketed quote part.
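The scan-then-sub steps above could probably also be folded into a single gsub call with a block. This is only a sketch under the same assumption (no square brackets inside the quote part); unlike the version above, it would rewrite every bracketed quote in the string, not just the first one:

```ruby
s = 'post:no change, [quote="user.name, post:1, topic:14"] topic:no change,'

# The block receives each bracketed quote part; its return value replaces it.
result = s.gsub(/\[quote[^\]]+\]/) do |quote_part|
  quote_part
    .sub(/(?<=post:)[^,"]+/, 'aaa')
    .sub(/(?<=topic:)[^,"]+/, 'bbb')
end

puts result
```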

How about:
some_string = "[quote=\"user.name, post:1, topic:14\"] some other content here"
new_post = "2"
new_topic = "15"
some_string = some_string.sub(/post:\d+/, "post:#{new_post}").sub(/topic:\d+/, "topic:#{new_topic}")
puts some_string

Related

How remove "(2002)" (without quotes) from string in Ruby?

I have a string like this
This is some text; Awesome! (2002)
I want to remove the "(2002)" part from it using Ruby. How is this done? I know in unix it'd be
sed -e 's/([0-9]*)//g'
To remove any amount of whitespace followed by a (, one or more digits, and a ) at the end of the string, use sub with the /\s*\(\d+\)\z/ regex:
s = "This is some text; Awesome! (2002)"
s = s.sub(/\s*\(\d+\)\z/,"") # => This is some text; Awesome!
or
s[/\s*\(\d+\)\z/] = "" # => This is some text; Awesome!
If you mean a literal 2002, use it instead of \d+.
NOTE: When you use the s[...] = "" approach, the string is modified in place and s is still a String; you can check it with s.class.
NOTE2: If you need to obtain the 2002 value separately, use s[/\s*\((\d+)\)\z/, 1] where 1 is passed to the matching method to return the contents of Group 1 only.
NOTE3: To split the string at the last space and get the ["This is some text; Awesome!", "2002"] as a result, use either Cary's suggestion with the regex containing a capturing group around \d+ - [s.sub(/\s*\((\d+)\)\z/,''), $1] (as $1 variable will hold the capture group 1 contents after sub executes), or s.split(/\s*\((\d+)\)\z/) where the result holds the substring from the start up to our pattern, and the digits that are wrapped with a (...) capturing group (after splitting, these values are placed into the result, not discarded).
And finally, /\([^)]*\)/ matches anything inside (...) (\( matches an open parenthesis, [^)]* matches 0 or more chars other than ) and \) matches a closing parenthesis).
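A short sketch that puts the variants from this answer side by side. The second sample string is made up for illustration:

```ruby
s = "This is some text; Awesome! (2002)"

# Remove the trailing "(digits)" together with the whitespace before it.
stripped = s.sub(/\s*\(\d+\)\z/, "")

# Pull out only the digits via capture group 1.
year = s[/\((\d+)\)\z/, 1]

# Remove every parenthesized chunk, whatever it contains.
no_parens = "a (x) b (2002)".gsub(/\([^)]*\)/, "")

p stripped   # "This is some text; Awesome!"
p year       # "2002"
p no_parens  # "a  b "
```

Note that the last pattern leaves the surrounding spaces in place, which is why two spaces remain between a and b.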
If I wanted to remove something, I'd use:
foo = 'This is some text; Awesome! (2002)'
foo['(2002)'] = ''
foo # => "This is some text; Awesome! "
You can also use regex instead of the fixed string. Either way, assigning '' to the match will remove it.
foo[/\(2002\)/] = ''
foo # => "This is some text; Awesome! "
or:
foo[/\(\d+\)/] = ''
foo # => "This is some text; Awesome! "
This is documented in String's []= method.
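Two properties of this approach may be worth knowing: the assignment changes the string in place, and it raises IndexError when nothing matches. A small sketch (the \s* prefix is an addition here, to also drop the space before the parenthesis):

```ruby
foo = 'This is some text; Awesome! (2002)'.dup  # dup so the string is mutable

# The assignment edits foo in place, removing the first match only.
foo[/\s*\(\d+\)/] = ''
puts foo.inspect   # "This is some text; Awesome!"

# A second attempt finds nothing to replace and raises IndexError.
begin
  foo[/\(\d+\)/] = ''
rescue IndexError => e
  puts "no match: #{e.message}"
end
```

When a missing match is a normal case rather than an error, sub is likely the more convenient choice, since it simply returns the string unchanged.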
The regex I showed you on a different question can be modified for use here:
str = "something (capture) something (capture2)"
regex = /(\(\w+\))/
str.scan(regex).flatten(1) # => ["(capture)", "(capture2)"]
The only change is the addition of \( and \) in the match group.
You can plug this regex into gsub to remove all matches:
str.gsub(regex, "")
# => "something  something "

How would this complicated search-and-replace operation be done in Ruby?

I have a big text file. Within this text file, I want to replace all mentions of the word 'pizza' with 'spinach', 'Pizza' with 'Spinach', and 'pizzing' with 'spinning' -- unless those words occur anywhere within curly braces. So {pizza}, {giant.pizza} and {hot-pizza-oven} should remain unchanged.
My best proposed solution so far is to iterate over the file line-by-line, issuing a regex that detects everything before an { or after an }, and using regexes on each of those strings. But that gets really complex and unwieldy and I want to know if there's a proper solution for this problem.
This can be done in a few steps. I'd iterate through the file line by line, and pass each line to this method:
def spinachize(line)
  # list of words to swap
  swaps = {
    'pizza' => 'spinach',
    'Pizza' => 'Spinach',
    'pizzing' => 'spinning'
  }
  # random placeholder for bracketed text
  placeholder = 'fdjfafdlskdsfajkldfas'
  # save all instances of bracketed text
  bracketed_text = line.scan(/\{.*?\}/)
  # replace bracketed text with the placeholder (on a copy, so the caller's string is not modified)
  line = line.gsub(/\{.*?\}/, placeholder)
  # replace all swaps
  swaps.each do |original_text, new_text|
    line = line.gsub(original_text, new_text)
  end
  # re-insert bracketed text
  line.gsub(placeholder) { bracketed_text.shift }
end
The comments above explain things as we go. Here are a couple of examples:
spinachize "Pizza is good, but more pizza is better"
=> "Spinach is good, but more spinach is better"
spinachize "Leave bracketed instances of {pizza} or {this.pizza} alone"
=> "Leave bracketed instances of {pizza} or {this.pizza} alone"
As you can see, you can specify the items you want swapped, or modify the method to pull the list from a database or flat file somewhere. The placeholder just needs to be something unique that wouldn't come up in the source file naturally.
The process is this: remove bracketed text from the original line, and remember it for later. Swap all text that needs swapping, then add back the bracketed text. It's not a one-liner, but it works well and is readable and easy to update.
The last line of the method might need some clarification. Not many people know that the "gsub" method can take a block instead of a second parameter. That block then determines what gets put in place of the original text. In this case, every time the block is called I remove the first item off our saved bracket list, and use that.
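The block form of gsub can be seen in isolation in this small sketch. The PLACEHOLDER token and the sample line are made up for illustration:

```ruby
saved = ["{pizza}", "{this.pizza}"]
line  = "Leave PLACEHOLDER or PLACEHOLDER alone"

# The block runs once per match; each call hands back the next saved item.
restored = line.gsub("PLACEHOLDER") { saved.shift }

puts restored  # Leave {pizza} or {this.pizza} alone
```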
rules = {'pizza' => 'spinach','Pizza' => 'Spinach','pizzing' => 'spinning'}
regexp = /\{[^{}]*\}|#{rules.keys.join('|')}/m
puts(file.read.gsub(regexp) { |s| rules[s] || s })
This constructs a regular expression that matches either bracketed strings or the strings to replace. We then run it through a block that replaces strings with the given value and leaves bracketed strings unchanged. The /m flag only changes how . matches, and this pattern has no .; the negated class [^{}]* already matches newlines inside the brackets, so the flag can be dropped. Either way, no need to iterate line by line.
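A self-contained sketch of this single-pass idea, using an in-memory string in place of file.read (the sample text is made up) and Regexp.union instead of join to build the alternation:

```ruby
rules = { 'pizza' => 'spinach', 'Pizza' => 'Spinach', 'pizzing' => 'spinning' }

# Braced text is listed first so it wins over the plain words.
# The negated class [^{}] also matches newlines.
regexp = Regexp.union(/\{[^{}]*\}/, *rules.keys)

# Hypothetical stand-in for file.read
text = "Pizza {pizza}\nis {hot-\npizza-oven} pizzing"

# Words found in rules are swapped; braced matches fall through unchanged.
result = text.gsub(regexp) { |s| rules[s] || s }
puts result
```

Regexp.union escapes string arguments, which would matter if a key ever contained a regex metacharacter such as a dot.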
str = "Pizza {pizza} with spinach is not pizzing."
swaps = {'{pizza}' => '{pizza}',
         '{Pizza}' => '{Pizza}',
         '{pizzing}' => '{pizzing}',
         'pizza' => 'spinach',
         'Pizza' => 'Spinach',
         'pizzing' => 'spinning'}
regex = Regexp.union(swaps.keys)
p str.gsub(regex, swaps) # => "Spinach {pizza} with spinach is not spinning."
I would call the following method for each line of the file.
Code
def doit(line)
  replace = {'pizza'=>'spinach', 'Pizza'=>'Spinach', 'pizzing'=>'spinning'}
  r = /\{.*?\}/
  # limit -1 keeps a trailing empty string, so a line ending in {...} still works
  arr = line.split(r, -1).map { |str|
    str.gsub(/\b(?:pizza|Pizza|pizzing)\b/, replace) }
  line.scan(r).each_with_object(arr.shift) { |str,res|
    res << str << arr.shift }
end
Examples
doit("Pizza Primastrada's {pizza} is the best {pizzing} pizza in town.")
#=> "Spinach Primastrada's {pizza} is the best {pizzing} spinach in town."
doit("{Pizza Primastrada}'s pizza is the best pizzing {pizza} in town.")
#=> "{Pizza Primastrada}'s spinach is the best spinning {pizza} in town."
Explanation
line = "Pizza Primastrada's {pizza} is the best {pizzing} pizza in town."
replace = {'pizza'=>'spinach', 'Pizza'=>'Spinach', 'pizzing'=>'spinning'}
r = /\{.*?\}/
a = line.split(r)
#=> ["Pizza Primastrada's ", " is the best ", " pizza in town."]
b = a.map { |str| str.gsub(/\b(?:pizza|Pizza|pizzing)\b/, replace) }
#=> ["Spinach Primastrada's ", " is the best ", " spinach in town."]
keepers = line.scan(r)
#=> ["{pizza}", "{pizzing}"]
keepers.each_with_object(b.shift) { |str,res| res << str << b.shift }
#=> "Spinach Primastrada's {pizza} is the best {pizzing} spinach in town."
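One edge case in the split/scan approach may be worth checking. By default String#split drops trailing empty strings, so for a line that ends with a braced part a plain split(r) yields one piece fewer than the reassembly step needs. A minimal sketch of the difference, using a made-up sample line:

```ruby
r = /\{.*?\}/
line = "more pizza {pizza}"

default_parts = line.split(r)      # trailing empty string is dropped
all_parts     = line.split(r, -1)  # a negative limit keeps it

p default_parts  # ["more pizza "]
p all_parts      # ["more pizza ", ""]
```

With the negative limit there is always exactly one more piece than there are braced parts, so the pieces and the braces can be interleaved without running out.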
Nested braces
If you wish to permit nested braces, change the regex to:
r = /\{[^{}]*?(?:\{.*?\})*?[^{}]*?\}/
doit("Pizza Primastrada's {{great {great} pizza} is the best pizza.")
#=> "Spinach Primastrada's {{great {great} pizza} is the best spinach."
You referred to the string
{words,salad,#{1,2,3} pizza|}
in a comment. If that is part of a string enclosed in single quotes, not a problem. If enclosed in double quotes, however, # will raise a syntax error. Again, no problem, if the pound character is escaped (\#).

How do you place a \n in a regex replacement string?

I'd like to do something like:
string.gsub(/(whatever)/,'\n\1\n')
But I don't want "whatever" to be replaced with the literal "\nwhatever\n"
I want the \n to actually correspond to a new line.
I think you need double quotes:
string.gsub(/(whatever)/,"\n\\1\n")
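A quick sketch of the difference between the two quoting styles, using a made-up sample string. In double quotes \n becomes a real newline, while the backreference needs its backslash doubled so that gsub still sees \1:

```ruby
string = "before whatever after"

# Double quotes: real newlines around the captured text.
with_newlines = string.gsub(/(whatever)/, "\n\\1\n")

# Single quotes: the backslash and the n stay as two literal characters.
with_literals = string.gsub(/(whatever)/, '\n\1\n')

p with_newlines  # "before \nwhatever\n after"
p with_literals  # "before \\nwhatever\\n after"
```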
\n is a newline character; that is what the escape means.
How it shows up depends on how you print it, so:
puts "\nwhatever\n".inspect
=> "\nwhatever\n"
however:
puts "\nwhatever\n"
=>
=> whatever
=>
Unless I misunderstand the question.
If you wanted to split it into a list, do this:
puts "\nwhatever\n".split(?\n).inspect
=> ["", "whatever"]

Replace single quote with backslash single quote

I have a very large string that needs to escape all the single quotes in it, so I can feed it to JavaScript without upsetting it.
I have no control over the external string, so I can't change the source data.
Example:
Cote d'Ivoir -> Cote d\'Ivoir
(the actual string is very long and contains many single quotes)
I'm trying to do this by using gsub on the string, but can't get this to work:
a = "Cote d'Ivoir"
a.gsub("'", "\\\'")
but this gives me:
=> "Cote dIvoirIvoir"
I also tried:
a.gsub("'", 92.chr + 39.chr)
but got the same result; I know it's something to do with regular expressions, but I never get those.
The %q delimiters come in handy here:
# %q(a string) is equivalent to a single-quoted string
puts "Cote d'Ivoir".gsub("'", %q(\\\')) #=> Cote d\'Ivoir
The problem is that \' in a gsub replacement means "part of the string after the match".
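A small sketch of that behavior, comparing a plain replacement string with the block form, whose return value is inserted without any backslash processing:

```ruby
s = "Cote d'Ivoir"

# "\\'" is the two characters \' which gsub reads as "text after the match".
post_match = s.gsub("'", "\\'")

# A block's return value is inserted literally.
escaped = s.gsub("'") { "\\'" }

puts post_match  # Cote dIvoirIvoir
puts escaped     # Cote d\'Ivoir
```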
You're probably best to use either the block syntax:
a = "Cote d'Ivoir"
a.gsub(/'/) {|s| "\\'"}
# => "Cote d\\'Ivoir"
or the Hash syntax:
a.gsub(/'/, {"'" => "\\'"})
There's also the hacky workaround:
a.gsub(/'/, '\#').gsub(/#/, "'")
# prepare a text file containing [ abcd\'efg ]
require "pathname"
backslashed_text = Pathname("/path/to/the/text/file.txt").readlines.first.strip
# puts backslashed_text => abcd\'efg
unslashed_text = "abcd'efg"
unslashed_text.gsub("'", Regexp.escape(%q|\'|)) == backslashed_text # true
# puts unslashed_text.gsub("'", Regexp.escape(%q|\'|)) => abcd\'efg

Ruby Regexp: How do I replace doubly escaped characters such as \\n with \n

So, I have
puts "test\\nstring".gsub(/\\n/, "\n")
and that works.
But how do I write one statement that replaces \n, \r, and \t with their correctly escaped counterparts?
You have to use backreferences. Try
puts "test\\nstring".gsub(/(\\[nrt])/) { $1 }
gsub sets $n (where n is the number of the corresponding group in the regular expression) to the text matched by that group. The value is only available while gsub is matching, so $1 has to be read inside a block; passed as a plain second argument it is evaluated before gsub runs.
EDIT:
I modified the regexp, now the output should be:
test\nstring
The \n won't be interpreted as a newline by puts.
Those aren't escaped characters, those are literal characters that are only represented as being escaped so they're human readable. What you need to do is this:
escapes = {
'n' => "\n",
'r' => "\r",
't' => "\t"
}
"test\\nstring".gsub(/\\([nrt])/) { escapes[$1] }
# => "test\nstring"
You will have to add other escape characters as required, and this still won't accommodate some of the more obscure ones if you really need to interpret them all. A potentially dangerous but really simple solution is to wrap the text in double quotes and eval it:
eval('"' + "test\\nstring" + '"')
This is only acceptable if you can be sure the input doesn't contain things like #{ ... } or a double quote, either of which would allow injecting arbitrary Ruby. That may be the case for a one-shot repair of some damaged encoding, but not for untrusted input.
Update
There might be a mis-understanding as to what these backslashes are. Here's an example:
"\n".bytes.to_a
# => [10]
"\\n".bytes.to_a
# => [92, 110]
You can see these are two entirely different things. \n is a representation of ASCII character 10, a linefeed.
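The same distinction can be checked on whole strings. In this sketch the single-quoted literal holds a backslash followed by n, while the double-quoted one holds a single linefeed:

```ruby
escaped = 'test\nstring'   # backslash + n, two characters
real    = "test\nstring"   # one linefeed character

puts escaped.length  # 12
puts real.length     # 11

# Replacing the two-character sequence with a real linefeed makes them equal.
puts escaped.gsub(/\\n/, "\n") == real  # true
```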
Through the help of #tadman and #black, I've discovered the solution:
>> escapes = {'\\n' => "\n", '\\r' => "\r", '\\t' => "\t"}
=> {"\\n"=>"\n", "\\r"=>"\r", "\\t"=>"\t"}
>> "test\\nstri\\tng".gsub(/\\([nrt])/) { |s| escapes[s] }
=> "test\nstri\tng"
>> puts "test\\nstri\\tng".gsub(/\\([nrt])/) { |s| escapes[s] }
test
stri ng
=> nil
As it turns out, you just map the \\ to \ and all is good. Also, you need to use puts for the terminal to output the whitespace correctly.
escapes = {'\\n' => "\n", '\\r' => "\r", '\\t' => "\t"}
puts "test\\nstri\\tng".gsub(/\\([nrt])/) { |s| escapes[s] }
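A possible variation on the same idea: gsub also accepts a Hash as its second argument and looks up each matched text as a key, so the block can be dropped. This is a sketch with a made-up input string; every sequence the regex can match needs an entry in the hash, because a missing key is replaced with an empty string:

```ruby
# Single-quoted keys: each is a backslash followed by a letter.
escapes = { '\n' => "\n", '\r' => "\r", '\t' => "\t" }

input  = 'test\nstri\tng\r'
result = input.gsub(/\\[nrt]/, escapes)

p result  # "test\nstri\tng\r"
```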

Resources