How do you place a \n in a regex replacement string? - ruby

I'd like to do something like:
string.gsub(/(whatever)/,'\n\1\n')
But I don't want "whatever" to be replaced with the literal "\nwhatever\n"
I want the \n to actually correspond to a new line.

I think you need double quotes:
string.gsub(/(whatever)/,"\n\\1\n")
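To make the difference concrete, here is a minimal sketch comparing the two quoting styles. The sample sentence is made up for illustration.

```ruby
s = "say whatever now"

# Single quotes: \n stays a literal backslash followed by n,
# while \1 is still treated as a backreference by gsub.
single = s.gsub(/(whatever)/, '\n\1\n')

# Double quotes: \n becomes a real newline, and the backslash of the
# backreference has to be doubled so that gsub still sees \1.
double = s.gsub(/(whatever)/, "\n\\1\n")

p single # prints "say \\nwhatever\\n now"
p double # prints "say \nwhatever\n now"
```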

\n is a newline; that is what the escape means inside a double-quoted string.
Whether you see an actual line break depends on how you print the string:
puts "\nwhatever\n".inspect
=> "\nwhatever\n"
however:
puts "\nwhatever\n"
=>
=> whatever
=>
Unless I misunderstand the question.
If you wanted to split it into a list, do this:
puts "\nwhatever\n".split(?\n).inspect
=> ["", "whatever"]

Related

How to replace partial content with ruby regular expressions

I have a string containing this
some_string = "[quote=\"user.name, post:1, topic:14\"] some other content here"
I'm looking to replace the post and topic numbers, something like this:
"[quote=\"user.name, post:#{a}, topic:#{b}\"] whatevercontent"
How can I achieve this?
Use positive lookbehind
>> some_string = "[quote=\"user.name, post:1, topic:14\"] some other content here"
=> "[quote=\"user.name, post:1, topic:14\"] some other content here"
>> some_string.sub(/(?<=post:)[^,"]+/, 'aaa').sub(/(?<=topic:)[^,"]+/, 'bbb')
=> "[quote=\"user.name, post:aaa, topic:bbb\"] some other content here"
Explanation:
/(?<=post:)[^,"]+/
This matches a run of non-comma, non-double-quote characters preceded by post:. We replace that run with aaa using the sub method.
Then we do the same for the characters preceded by topic:, replacing that piece with bbb.
I assume that the parts you want to replace are those between the colon and either a comma or a double quote; adjust those characters if necessary.
Another approach is to not worry about a regex and invoke split to break up what you have into key value pairs and put everything back together with the new values. But if your use case is restricted enough, the regex approach above can work.
ADDENDUM
The OP wants to make sure that the replacement happens only within the bracketed part of the string and not anywhere else. Here is how that can be done, assuming no square brackets inside the quote part (and therefore no nesting):
>> s = 'post:no change, [quote="user.name, post:1, topic:14"] topic:no change,'
=> "post:no change, [quote=\"user.name, post:1, topic:14\"] topic:no change,"
>> quote_part = s.scan(/\[quote[^\]]+\]/)[0]
=> "[quote=\"user.name, post:1, topic:14\"]"
>> new_quote_part = quote_part.sub(/(?<=post:)[^,"]+/, 'aaa').sub(/(?<=topic:)[^,"]+/, 'bbb')
=> "[quote=\"user.name, post:aaa, topic:bbb\"]"
>> s.sub(quote_part, new_quote_part)
=> "post:no change, [quote=\"user.name, post:aaa, topic:bbb\"] topic:no change,"
The last line has replacements only within the bracketed quote part.
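As a possible variation on the steps above, the scan and the final sub can be folded into one call by giving sub a block. This is only a sketch under the same assumption (no square brackets inside the quote part); the variable name result is made up here.

```ruby
s = 'post:no change, [quote="user.name, post:1, topic:14"] topic:no change,'

# The block receives only the bracketed quote part, so the inner
# substitutions cannot touch anything outside the brackets.
result = s.sub(/\[quote[^\]]+\]/) do |quote_part|
  quote_part.sub(/(?<=post:)[^,"]+/, 'aaa').sub(/(?<=topic:)[^,"]+/, 'bbb')
end

puts result
```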
How about:
some_string = "[quote=\"user.name, post:1, topic:14\"] some other content here"
new_post = "2"
new_topic = "15"
some_string = some_string.sub(/post:\d+/, "post:#{new_post}").sub(/topic:\d+/, "topic:#{new_topic}")
puts some_string
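If both numbers should change in one pass, a single gsub with a capture group is one option. This is a sketch rather than the answerer's code: the new_values hash is a hypothetical name, and it assumes the values are always plain digits.

```ruby
some_string = "[quote=\"user.name, post:1, topic:14\"] some other content here"
new_values = { "post" => "2", "topic" => "15" } # hypothetical replacement values

updated = some_string.gsub(/(post|topic):\d+/) do
  key = Regexp.last_match(1) # "post" or "topic"
  "#{key}:#{new_values[key]}"
end

puts updated
```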

How to scan text/string for certain words and gsub them out with desired words

I want to create a 'swearscan' that scans user text and swaps the swear words out for 'censored'. I thought I coded it properly, but obviously not; here is what's happening. Someone please help!
And since this is Stack Overflow, we'll substitute other words for the swear words.
puts "Input your sentence here: "
text = gets.downcase.strip
swear_words = {'cat' => 'censored', 'dog' => 'censored', 'cow' => 'censored'}
clean_text = swear_words.each do |word, clean|
text.gsub(word,clean)
end
puts clean_text
When I ran this program (with the actual swearwords) all it would return is the hash like so: catcensoreddogcensoredcowcensored. What is wrong with my code that it's returning the hash and not the clean_text with everything substituted out?
This works for me:
puts "Input your sentence here: "
text = gets.downcase.strip
swear_words = {'cat' => 'censored', 'dog' => 'censored', 'cow' => 'censored'}
swear_words.each do |word, clean| # No need to copy here
text.gsub!(word,clean) # Changed from gsub
end
puts text # Changed from clean_text
The problem is that gsub does not change the original string; it returns a new string, which your code discards. gsub! changes the original string in place. Also, each does not return the result of its block: it returns the receiver, which here is the hash, so clean_text ends up being swear_words. Just refer to text at the end to get the replaced string.
By the way, if the replacement strings are all the same 'censored', then it does not make sense to use a hash there. You should just have an array of the swear words, and put the replacement string in the gsub! method directly (or define it as a constant in some other place).
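A minimal sketch of that array-based suggestion follows. The word-boundary anchors (\b) are an extra assumption, not part of the question: they stop words such as "category" from being censored, so drop them if substring matches are wanted.

```ruby
swear_words = %w[cat dog cow] # stand-ins, as in the question
replacement = 'censored'

# Regexp.union builds one alternation (cat|dog|cow) and escapes each word.
pattern = /\b#{Regexp.union(swear_words)}\b/

text = "the cat chased the dog past a category sign"
clean_text = text.gsub(pattern, replacement)

puts clean_text
```

Because gsub returns a new string, assigning its result to clean_text works here without mutating text.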

Get content between { } braces

How can I get the content in between "{ }" in Ruby? For example,
I love {you}
How can I fetch the element "you"? If I want to replace the content, say change "you" to "her", how should I do that? Probably using gsub?
replacements = {
'you' => 'her',
'angels' => 'demons',
'ice cream' => 'puppies',
}
my_string = "I love {you}.\nYour voice is like {angels} singing.\nI would love to eat {ice cream} with you sometime!"
replacements.each do |source, replacement|
my_string.gsub! "{#{source}}", replacement
end
puts my_string
# => I love her.
# => Your voice is like demons singing.
# => I would love to eat puppies with you sometime!
The simple way to get the content from the inside of the {...} is:
str = 'I love {you}'
str[/{(.+)}/, 1] # => "you"
That basically says, "grab everything from a leading { to a trailing }". It's not very sophisticated: .+ is greedy, so it can be fooled by nested or multiple {} pairs.
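To illustrate the limitation, here is a short sketch with two placeholders in one string (the sample text is made up). The character class [^{}]* is one possible way to stop at the first closing brace.

```ruby
str = 'I love {you} and {her}'

# Greedy .+ runs from the first { to the last }.
greedy = str[/{(.+)}/, 1]

# Excluding braces inside the group keeps each match to one pair;
# scan returns every match, flatten unwraps the capture arrays.
all = str.scan(/\{([^{}]*)\}/).flatten

p greedy # prints "you} and {her"
p all    # prints ["you", "her"]
```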
Replacing the target string can be done various ways:
replace_str = 'her'
'I love {you}'.sub('you', replace_str) # => "I love {her}"
A simple sub will replace the first occurrence of the target string with the replacement text.
You could use a regex instead of the string:
'I love you {you}'.sub(/you/, replace_str) # => "I love her {you}"
If there are multiple occurrences of the target string then use a bit more text to locate it. This uses the wrapping delimiters to locate it, and then replaces them also. There are other ways to do this, but I'd do it like:
'I love you {you}'.sub(/{.+}/, "{#{ replace_str }}") # => "I love you {her}"
Alex Wayne's answer came close but didn't go all the way: Ruby's gsub has a really nice feature, where you can pass it a regex and a hash, and it will replace all the occurrences of the regex matches with the values in the hash:
hash = {
'I' => 'She',
'love' => 'loves',
'you' => 'me'
}
str.gsub(Regexp.union(hash.keys), hash) # => "She loves {me}"
That's really powerful when you want to take a template and quickly replace all the placeholders in it.
You can always use .index:
a = 'I love {bill gates}'
a[a.index('{')+1..a.index('}')-1]
The last line just says get 'a' from right after the first occurrence of '{' and right before the first occurrence of '}'. It is important to note, however, that this will only get the text between the first occurrences of {}. So it will work for your above example.
I would use indexing also to add something new between the {}s.
That would look something like:
a[0..a.index('{')] + 'Steve Jobs' + a[a.index('}')..-1]
Again this only works for the first occurrence of '{' and '}'.
Michael G.
why not use some template engine like: https://github.com/defunkt/mustache
note that ruby can do this for %{}:
"foo = %{foo}" % { :foo => 'bar' }
#=> "foo = bar"
and finally do not forget to check existing ruby template engines - do not reinvent the wheel!
Regular expressions are the way to go with gsub. Something like:
existingString.gsub(/\{(.*?)\}/) { "her" }
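The block form can also look up the captured name instead of returning a fixed string. The following is a sketch only; the replacements hash and the template text are illustrative, and leaving unknown placeholders untouched is a design choice assumed here.

```ruby
replacements = { 'you' => 'her', 'angels' => 'demons' }
template = "I love {you}. Your voice is like {angels} singing. {unknown} stays."

result = template.gsub(/\{(.*?)\}/) do |match|
  # Fall back to the original "{...}" text when no replacement is defined.
  replacements.fetch(Regexp.last_match(1), match)
end

puts result
```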

Ruby Regexp: How do I replace doubly escaped characters such as \\n with \n

So, I have
puts "test\\nstring".gsub(/\\n/, "\n")
and that works.
But how do I write one statement that replaces \n, \r, and \t with their correctly escaped counterparts?
You have to use backreferences. Try
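puts "test\\nstring".gsub(/(\\[nrt])/, '\1')
gsub sets $1, $2, and so on to the content matched by the corresponding group of the regular expression, but only once a match has happened. $1 therefore cannot be passed as the replacement argument, because it is evaluated before gsub runs. Use the backreference '\1' in the replacement string, or $1 inside a block.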
EDIT:
I modified the regexp, now the output should be:
test\nstring
The \n won't be interpreted as a newline by puts.
Those aren't escaped characters, those are literal characters that are only represented as being escaped so they're human readable. What you need to do is this:
escapes = {
'n' => "\n",
'r' => "\r",
't' => "\t"
}
"test\\nstring".gsub(/\\([nrt])/) { escapes[$1] }
# => "test\nstring"
You will have to add other escape characters as required, and this still won't accommodate some of the more obscure ones if you really need to interpret them all. A potentially dangerous but really simple solution is to just eval it:
eval("\"test\\nstring\"")
This is fine only as long as you can be sure that your input doesn't contain things like #{ ... } that would allow injecting arbitrary Ruby, for example when this is a one-shot repair to fix some damaged encoding.
Update
There might be a mis-understanding as to what these backslashes are. Here's an example:
"\n".bytes.to_a
# => [10]
"\\n".bytes.to_a
# => [92, 110]
You can see these are two entirely different things. \n is a representation of ASCII character 10, a linefeed.
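The hash lookup from this answer can also be handed to gsub directly, since gsub accepts a hash as its second argument and looks up the matched text as the key. A minimal sketch, assuming only \n, \r and \t need handling; note the keys are the full two-character sequences, written in single quotes.

```ruby
# In single quotes, '\n' is two characters: a backslash and an n.
escapes = { '\n' => "\n", '\r' => "\r", '\t' => "\t" }

input = 'test\nstri\tng\r'
output = input.gsub(/\\[nrt]/, escapes)

p output                    # prints "test\nstri\tng\r"
p output.bytes.include?(92) # prints false (no backslashes remain)
```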
through the help of #tadman, and #black, I've discovered the solution:
>> escapes = {'\\n' => "\n", '\\t' => "\t"}
=> {"\\t"=>"\t", "\\n"=>"\n"}
>> "test\\nstri\\tng".gsub(/\\([nrt])/) { |s| escapes[s] }
=> "test\nstri\tng"
>> puts "test\\nstri\\tng".gsub(/\\([nrt])/) { |s| escapes[s] }
test
stri ng
=> nil
As it turns out, ya just map the \\ to \ and all is good. Also, you need to use puts for the terminal to output the whitespace correctly.
escapes = {'\\n' => "\n", '\\t' => "\t"}
puts "test\\nstri\\tng".gsub(/\\([nrt])/) { |s| escapes[s] }
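One caveat with this version: the pattern also matches \r, which has no entry in the hash, so the block returns nil and the sequence is silently dropped. A small sketch of the effect, with fetch as one possible way to keep unmapped sequences (the sample input is made up):

```ruby
escapes = { '\\n' => "\n", '\\t' => "\t" }
input = "a\\rb\\nc" # contains a literal \r and a literal \n

# No entry for \r: escapes[s] is nil, which gsub turns into "".
lossy = input.gsub(/\\([nrt])/) { |s| escapes[s] }

# fetch with a default leaves unmapped sequences as they were.
kept = input.gsub(/\\([nrt])/) { |s| escapes.fetch(s, s) }

p lossy # prints "ab\nc"
p kept  # prints "a\\rb\nc"
```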

How do I remove carriage returns with Ruby?

I thought this code would work, but the regular expression doesn't ever match the \r\n. I have viewed the data I am reading in a hex editor and verified there really is a hex D and hex A pattern in the file.
I have also tried the regular expressions /\xD\xA/m and /\x0D\x0A/m but they also didn't match.
This is my code right now:
lines2 = lines.gsub( /\r\n/m, "\n" )
if ( lines == lines2 )
print "still the same\n"
else
print "made the change\n"
end
In addition to alternatives, it would be nice to know what I'm doing wrong (to facilitate some learning on my part). :)
Use String#strip
Returns a copy of str with leading and trailing whitespace removed.
e.g
" hello ".strip #=> "hello"
"\tgoodbye\r\n".strip #=> "goodbye"
Using gsub
string = string.gsub(/\r/," ")
string = string.gsub(/\n/," ")
Generally when I deal with stripping \r or \n, I'll look for both by doing something like
lines.gsub(/\r\n?/, "\n");
I've found that depending on how the data was saved (the OS used, editor used, Jupiter's relation to Io at the time) there may or may not be the newline after the carriage return. It does seem weird that you see both characters in hex mode. Hope this helps.
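A quick sketch of what that pattern does on input with mixed line endings (the sample string is made up):

```ruby
mixed = "windows\r\nold mac\runix\n"

# \r\n? matches a carriage return with or without a following newline.
normalized = mixed.gsub(/\r\n?/, "\n")

p normalized # prints "windows\nold mac\nunix\n"
```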
If you are using Rails, there is a squish method
"\tgoodbye\r\n".squish => "goodbye"
"\tgood \t\r\nbye\r\n".squish => "good bye"
What do you get when you do puts lines? That will give you a clue.
On Windows, File.open opens the file in text mode by default, so \r\n sequences are automatically converted to \n when reading. That may be why lines is always equal to lines2. To prevent Ruby from translating the line endings, use the rb mode:
C:\> copy con lala.txt
a
file
with
many
lines
^Z
C:\> irb
irb(main):001:0> text = File.open('lala.txt').read
=> "a\nfile\nwith\nmany\nlines\n"
irb(main):002:0> bin = File.open('lala.txt', 'rb').read
=> "a\r\nfile\r\nwith\r\nmany\r\nlines\r\n"
irb(main):003:0>
But from your question and code I see you simply need to open the file with the default modifier. You don't need any conversion and may use the shorter File.read.
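For cases where the raw bytes are needed, here is a sketch of reading in binary mode and converting explicitly. It writes its own temporary file so that it can run anywhere; the file name prefix and contents are made up.

```ruby
require 'tempfile'

raw, converted = Tempfile.create('crlf-demo') do |f|
  f.binmode # write the bytes exactly as given
  f.write("a\r\nfile\r\n")
  f.flush

  bytes = File.binread(f.path) # binary read: no line-ending translation
  [bytes, bytes.gsub(/\r\n/, "\n")]
end

p raw       # prints "a\r\nfile\r\n"
p converted # prints "a\nfile\n"
```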
modified_string = string.gsub(/\s+/, ' ').strip
lines2 = lines.split.join("\n")
"still the same\n".chomp
or
"still the same\n".chomp!
http://www.ruby-doc.org/core-1.9.3/String.html#method-i-chomp
How about the following?
irb(main):003:0> my_string = "Some text with a carriage return \r"
=> "Some text with a carriage return \r"
irb(main):004:0> my_string.gsub(/\r/,"")
=> "Some text with a carriage return "
irb(main):005:0>
Or...
irb(main):007:0> my_string = "Some text with a carriage return \r\n"
=> "Some text with a carriage return \r\n"
irb(main):008:0> my_string.gsub(/\r\n/,"\n")
=> "Some text with a carriage return \n"
irb(main):009:0>
I think your regex is almost complete - here's what I would do:
lines2 = lines.gsub(/[\r\n]+/m, "\n")
In the above, I've put \r and \n into a character class (that way it doesn't matter in which order they might appear) and added the "+" quantifier (so that "\r\n\r\n\r\n" would also match once, and the whole thing is replaced with "\n").
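One consequence worth seeing in a sketch: because the class with "+" swallows whole runs of line endings, blank lines disappear. The comparison below uses a made-up sample and the \r\n? pattern from an earlier answer.

```ruby
text = "one\r\n\r\ntwo\r\n"

collapsed = text.gsub(/[\r\n]+/, "\n") # runs of line endings become one \n
preserved = text.gsub(/\r\n?/, "\n")   # each line ending converted separately

p collapsed # prints "one\ntwo\n"
p preserved # prints "one\n\ntwo\n"
```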
Just another variant:
lines.delete(" \n")
Why not read the file in text mode, rather than binary mode?
lines.map(&:strip).join(" ")
You can use this :
my_string.strip.gsub(/\s+/, ' ')
def dos2unix(input)
input.each_byte.map { |c| c.chr unless c == 13 }.join
end
remove_all_the_carriage_returns = dos2unix(some_blob)
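For plain strings, String#delete appears to give the same result with less work. A sketch, with a made-up sample blob, comparing it against the byte-wise version:

```ruby
some_blob = "line one\r\nline two\r\n"

# delete removes every occurrence of the listed characters.
stripped = some_blob.delete("\r")

# Same expression as in dos2unix above, for comparison.
bytewise = some_blob.each_byte.map { |c| c.chr unless c == 13 }.join

p stripped             # prints "line one\nline two\n"
p stripped == bytewise # prints true
```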
