Replace words in a string - Ruby

I have a string in Ruby:
sentence = "My name is Robert"
How can I replace any one word in this sentence easily without using complex code or a loop?

sentence.sub! 'Robert', 'Joe'
This won't raise an exception if the word to replace isn't in the sentence (the []= variant will); sub! simply returns nil in that case.
How to replace all instances?
The above replaces only the first instance of "Robert".
To replace all instances, use gsub/gsub! (i.e. "global substitution"):
sentence.gsub! 'Robert', 'Joe'
The above will replace all instances of Robert with Joe.
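A quick sketch of the difference, using a made-up sentence with two occurrences. The variable names are only illustrative. Note that the bang variants return nil when nothing was replaced, so it is safer not to chain on their result:

```ruby
sentence = "Robert met Robert"

# sub replaces only the first occurrence; gsub replaces every occurrence.
first_only = sentence.sub("Robert", "Joe")   # "Joe met Robert"
everywhere = sentence.gsub("Robert", "Joe")  # "Joe met Joe"

# The bang variants modify the receiver and return nil if nothing matched.
copy = sentence.dup
untouched = copy.sub!("Alice", "Joe")        # nil, and copy is unchanged

puts first_only, everywhere, untouched.inspect
```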

If you're dealing with natural-language text and need to replace a whole word, not just part of a string, add a pinch of regular expressions to your gsub, because a plain-text substitution can lead to disastrous results:
'mislocated cat, vindicating'.gsub('cat', 'dog')
=> "mislodoged dog, vindidoging"
Regular expressions have word-boundary anchors such as \b, which matches at the start or end of a word. Thus,
'mislocated cat, vindicating'.gsub(/\bcat\b/, 'dog')
=> "mislocated dog, vindicating"
In Ruby, unlike some other languages such as JavaScript, word boundaries are UTF-8-compatible, so you can use them for languages with non-Latin or extended Latin alphabets:
'сіль у кисіль, для весіль'.gsub(/\bсіль\b/, 'цукор')
=> "цукор у кисіль, для весіль"
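When the word to replace comes from a variable, the same idea can be wrapped in a small helper. This is only a sketch: replace_word is a hypothetical name, and it assumes the word starts and ends with a word character (otherwise \b will not sit where you expect).

```ruby
# Hypothetical helper: replace whole-word occurrences of `word` in `text`.
def replace_word(text, word, replacement)
  # Regexp.escape keeps characters such as "." from acting as metacharacters.
  pattern = /\b#{Regexp.escape(word)}\b/
  # The block form inserts the replacement literally (no \1-style expansion).
  text.gsub(pattern) { replacement }
end

puts replace_word("mislocated cat, vindicating", "cat", "dog")
# => mislocated dog, vindicating
```

The block form is a deliberate choice here: with a string replacement, backslash sequences such as \1 would be interpreted as back-references.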

You can try it this way:
sentence["Robert"] = "Roger"
The sentence then becomes:
sentence # => "My name is Roger" (Robert is replaced with Roger)
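A short sketch of how the []= form behaves, including the failure case. The unary + is only there so the example also works when string literals are frozen; the names are illustrative.

```ruby
sentence = +"My name is Robert"   # unary + returns an unfrozen string

sentence["Robert"] = "Roger"      # replaces the first occurrence in place
sentence[/\bRoger\b/] = "Rita"    # a Regexp can be used as the index too

begin
  sentence["Alice"] = "Joe"       # no such substring
rescue IndexError => e
  error_class = e.class           # []= raises instead of silently doing nothing
end

puts sentence
```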

First, you don't declare the type in Ruby, so you don't need the first string.
To replace a word in a string, use sentence.gsub(/match/, "replacement").

Related

Replace special character with its index

I need to replace all special characters within a string with their index.
For example,
"I-need_to#change$all%special^characters^"
should become:
"I1need6to9change16all20special28characters39"
The index of each special character differs.
I have looked at many links about replacing all characters with a single character, or replacing occurrences of one character.
I found a very similar question, but I don't want to adopt its approach, because I need to replace every special character with its own index.
I have also tried something like this:
str.gsub!(/[^0-9A-Za-z]/, '')
Here str is my example string.
This removes all the special characters, but I want each one replaced by its index instead. Either all of the special characters or these seven:
\/*[]:?
I mainly need to replace these seven, but it would be OK to replace all of them.
I need a simpler way.
Thanks in advance.

You can use the global variable $` (the part of the string before the current match) and the block form of gsub:
irb> str = "I-need_to#change$all%special^characters^"
=> "I-need_to#change$all%special^characters^"
irb> str.gsub(/[^0-9A-Za-z]/) { $`.length }
=> "I1need6to9change16all20special28characters39"
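If the $` global feels too cryptic, the same result can be had from the MatchData object. The sketch below is one possible way to write it; index_specials and SPECIALS are hypothetical names, and SPECIALS covers only the seven characters listed in the question.

```ruby
# The seven characters from the question: \ / * [ ] : ?
SPECIALS = %r{[\\/*\[\]:?]}

# Hypothetical helper: replace each match of `pattern` with its character index.
def index_specials(str, pattern = /[^0-9A-Za-z]/)
  # Regexp.last_match.begin(0) is the offset where the current match starts,
  # which equals the length of the text before it.
  str.gsub(pattern) { Regexp.last_match.begin(0).to_s }
end

puts index_specials('I-need_to#change$all%special^characters^')
puts index_specials('a/b:c?', SPECIALS)
```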

How to use match in Ruby?

I'm trying to get the uppercase words from a text. How can I use .match() for this?
Example
text = "Pediatric stroke (PS) is a relatively rare disease, having an estimated incidence of 2.5–13/100,000/year [1–4], but remains one of the most common causes of death in childhood, with a mortality rate of 0.6/100,000 dead/year [5, 6]"
and I need something like:
r = /[A-Z]/
puts r.match(text)
I have never used match, and I need a method that gets all uppercase words (acronyms).

If you only want acronyms, you can use something like:
text = "Pediatric stroke (PS) is a relatively rare disease, having an estimated incidence of 2.5–13/100,000/year [1–4], but remains one of the most common causes of death in childhood, with a mortality rate of 0.6/100,000 dead/year [5, 6]"
text.scan(/\b[A-Z]+\b/)
# => ["PS"]
It's important to match entire words, which is where \b helps, as it marks word boundaries.
The problem is when your text contains single, stand-alone capital letters:
text = "Pediatric stroke (PS) I U.S.A"
text.scan(/\b[A-Z]+\b/)
# => ["PS", "I", "U", "S", "A"]
At that point we need a bit more intelligence and foreknowledge of the text content being searched. The question is, are single-letter acronyms valid? If not, then a minor modification will help:
text.scan(/\b[A-Z]{2,}\b/)
# => ["PS"]
{2,} is explained in the Regexp documentation, so read that for more information.
I only want acronyms of the form "(ACRONYM)", in this case PS
It's not easy to tell what you want by your description. An acronym is defined as:
An acronym is an abbreviation used as a word which is formed from the initial components in a phrase or a word. Usually these components are individual letters (as in NATO or laser) or parts of words or names (as in Benelux).
according to Wikipedia. By that definition, lowercase, all caps and mixed case can be valid.
If you mean you only want all-caps within parentheses, you can easily modify the regex to honor that, but you'll fail on other acronyms you could encounter, either by missing ones you want or by capturing ones you want to ignore.
text = "(PS) (CT/CAT scan)"
text.scan(/\([A-Z]+\)/) # => ["(PS)"]
text.scan(/\([A-Z]+\)/).map{ |s| s[1..-2] } # => ["PS"]
text.scan(/\(([A-Z]+)\)/) # => [["PS"]]
text.scan(/\(([A-Z]+)\)/).flatten # => ["PS"]
are various ways to grab the text, but this only opens a new can of worms when you look at "List of medical abbreviations" and "Medical Acronyms / Abbreviations".
Typically I'd have a table of the ones I'll accept, use a simple pattern to capture anything that looks like something I'd want, check to see if it's in the table then keep it or reject it. How to do that is for you to figure out as it's a completely different question and doesn't belong in this one.
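For what it's worth, the table-lookup idea could be sketched roughly like this. Everything here is an assumption for illustration: KNOWN_ACRONYMS is a made-up table, known_acronyms is a hypothetical helper, and the capture pattern is just the {2,} regex from above.

```ruby
require "set"

# Hypothetical table of accepted acronyms; a real one would be much larger.
KNOWN_ACRONYMS = Set["PS", "CT", "CAT"].freeze

# Capture anything that looks like an acronym, then keep only known entries.
def known_acronyms(text, table = KNOWN_ACRONYMS)
  candidates = text.scan(/\b[A-Z]{2,}\b/)
  candidates.select { |candidate| table.include?(candidate) }.uniq
end

p known_acronyms("Pediatric stroke (PS) I U.S.A, see CT/CAT scan and NASA")
# => ["PS", "CT", "CAT"]
```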

Wrong function for the job. Use String#scan.

To get all words that start with uppercase, use String#scan with \b\p{Lu}\w*\b:
text = "Pediatric stroke (PS) is a relatively rare disease, having an estimated incidence of 2.5–13/100,000/year [1–4], but remains one of the most common causes of death in childhood, with a mortality rate of 0.6/100,000 dead/year [5, 6]"
puts text.scan(/\b\p{Lu}\w*\b/).flatten
String#match will only get you the first match, while scan returns all matches.
The regex \b\p{Lu}\w*\b matches:
\b - word boundary
\p{Lu} - an uppercase Unicode letter
\w* - 0 or more word characters (letters, digits or underscores)
\b - a trailing word boundary
To only match linguistic words (made of letters) you can use
puts text.scan(/\b\p{Lu}\p{M}*+(?>\p{L}\p{M}*+)*\b/).flatten
Here, \p{Lu}\p{M}*+ matches any Unicode uppercase letter (even a decomposed one, as \p{M} matches the combining diacritics that follow the base letter) and (?>\p{L}\p{M}*+)* matches 0 or more letters.
To only get words in ALLCAPS, use
puts text.scan(/\b(?>\p{Lu}\p{M}*+)+\b/).flatten
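To see the difference between match and scan, and between the capitalized and ALLCAPS patterns, here is a small sketch on a shortened, made-up sample text (ASCII only, to keep the expected results easy to check):

```ruby
text = "Pediatric stroke (PS) is rare; CT scans help"

# String#match stops at the first hit and returns a MatchData object.
first_match = text.match(/\b\p{Lu}\w*\b/)[0]

# String#scan collects every hit into an array of strings.
capitalized = text.scan(/\b\p{Lu}\w*\b/)

# Only words made entirely of uppercase letters.
all_caps = text.scan(/\b(?>\p{Lu}\p{M}*+)+\b/)

p first_match, capitalized, all_caps
```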

Yes, you can use String#match for this. It may not be the best way, but you didn't ask if it was. You'd have to do something like this:
text.split.map { |s| s.match(/[A-Z]\w*/) }.compact.map { |md| md[0] }
#=> ["Pediatric", "PS"]
If you knew in advance that text contained two words beginning with a capital letter, you could write (note the lazy .*?; a greedy .* would leave only "S" for the second group):
text.match(/([A-Z]\w*).*?([A-Z]\w*)/)
[$1,$2]
#=> ["Pediatric", "PS"]
Note that using a regex is not your only option:
text.delete('.,!?()[]{}').split.select { |str| ('A'..'Z').cover?(str[0]) }
#=> ["Pediatric", "PS"]

Ruby Regex Rubular vs reality

I have a string and I want to remove all non-word characters and whitespace from it. So I thought Regular expressions would be what I need for that.
My regex looks like this (I defined it as a method in the String class):
/[\w&&\S]+/.match(self.downcase)
When I run this expression in Rubular with the test string "hello ..a.sdf asdf..," it highlights everything I need ("helloasdfasdf"), but when I do the same in irb I only get "hello".
Does anyone have any idea why that is?

Because you use match, which returns only the first match. If you use scan instead, everything should work properly:
string = "hello ..a.sdf asdf..,"
string.downcase.scan(/[\w&&\S]+/)
# => ["hello", "a", "sdf", "asdf"]

\w means [a-zA-Z0-9_]
\S means any non-whitespace character (letters, digits, underscore, punctuation, symbols, etc.)
so combining \w and \S with && is redundant.
It's like asking what the intersection of India and Asia is. Obviously it's going to be India. So I suggest you use \w+,
and you can use scan to get all matches, as mentioned in the other answer:
string = "hello ..a.sdf asdf..,"
string.scan(/\w+/)
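Since the stated goal was a single string with the non-word characters and whitespace removed, a possible last step is to join the scanned pieces, or to delete the unwanted characters directly with gsub. A minimal sketch with illustrative variable names:

```ruby
string = "hello ..a.sdf asdf..,"

words    = string.scan(/\w+/)       # ["hello", "a", "sdf", "asdf"]
joined   = words.join               # glue the pieces back together
stripped = string.gsub(/\W+/, "")   # or remove everything that is not \w

# The intersection adds nothing: both patterns find the same pieces.
same = string.scan(/[\w&&\S]+/) == words

puts joined, stripped, same
```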

How do I write a regular expression for gsub that keeps whitespace, letters and numbers but removes punctuation marks?

I am trying to write a regular expression that achieves the following:
General Motors --> General Motors (stays the same!)
Yahoo! --> Yahoo (remove exclamation point)
Le7el --> Le7el
Mat. Science --> Mat Science
I tried a simple /\W+$/, but unfortunately that only catches punctuation at the end of the line.

If you need to be Unicode-aware then use the "Punct" property:
s.gsub(/\p{Punct}/, '')
That will work just as well with simple ASCII punctuation.

Try s/[^\w\s]//g; it should replace all non-word and non-space characters with an empty string.
If needed, specify exactly what you consider valid characters, like s/[^A-Za-z0-9 ]//g for instance.
OK, so that's Perl, but it's the regex that counts.
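In Ruby, the same substitution could be written with gsub. The sketch below is an illustration rather than part of the original answer; the "Café" input is a made-up example to show a Unicode-aware variant, which is worth considering because \w only covers ASCII letters in Ruby.

```ruby
names = ["General Motors", "Yahoo!", "Le7el", "Mat. Science"]

# Direct port of the Perl substitution.
ascii_clean = names.map { |s| s.gsub(/[^\w\s]/, "") }

# Unicode-aware variant: keep any letter, any number and whitespace.
unicode_clean = "Caf\u00e9 No. 7!".gsub(/[^\p{L}\p{N}\s]/, "")

p ascii_clean
puts unicode_clean
```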

['General Motors','Yahoo!','Le7el','Mat. Science'].map{|e| e.tr('.!','')}
# => ["General Motors", "Yahoo", "Le7el", "Mat Science"]
['General Motors','Yahoo!','Le7el','Mat. Science'].map{|e| e.gsub(/[[:punct:]]/,'')}
# => ["General Motors", "Yahoo", "Le7el", "Mat Science"]

String gsub - Replace characters between two elements, but leave surrounding elements

Suppose I have the following string:
mystring = "start/abc123/end"
How can I splice out abc123 and put something else in its place, while leaving the "start/" and "/end" elements intact?
I had the following to match the pattern, but it replaces the entire string. I was hoping to have it replace just abc123 with 123abc.
mystring.gsub(/start\/(.*)\/end/,"123abc") #=> "123abc"
Edit: The characters between the start & end elements can be any combination of alphanumeric characters, I changed my example to reflect this.

You can do it using the character class [^\/] (anything that is not a slash) and lookarounds:
mystring.gsub(/(?<=start\/)[^\/]+(?=\/end)/,"7")

For your example, you could perhaps use:
mystring.gsub(/\/(.*?)\//,"/7/")
This matches the two slashes around the string you're replacing and puts them back in the substitution.

Alternatively, you could capture the pieces of the string you want to keep and interpolate them around your replacement; this turns out to be much more readable than lookaheads/lookbehinds:
irb(main):010:0> mystring.gsub(/(start)\/.*\/(end)/, "\\1/7/\\2")
=> "start/7/end"
\\1 and \\2 here refer to the numbered captures inside of your regular expression.

The problem is that you're replacing the entire matched string, "start/8/end", with "7". You need to include the matched characters you want to persist:
mystring.gsub(/start\/(.*)\/end/, "start/7/end")
Alternatively, if the middle part is purely numeric (as in the original example), just match the digits:
mystring.gsub(/\d+/, "7")

You can do this by grouping the start and end elements in the regular expression and then referring to these groups in the substitution string (named groups are referenced as \\k<name>):
mystring.gsub(/(?<start>start\/).*(?<end>\/end)/, "\\k<start>7\\k<end>")
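For comparison, here is a sketch of three of the approaches from this thread applied to the question's own example and replacement (abc123 becomes 123abc). The group names head and tail are arbitrary, and %r{...} is used only to avoid escaping the slashes.

```ruby
mystring = "start/abc123/end"

# 1. Lookarounds: only the middle segment is part of the match.
with_lookarounds = mystring.gsub(%r{(?<=start/)[^/]+(?=/end)}, "123abc")

# 2. Numbered groups, block form: $1 and $2 hold the parts to keep.
with_block = mystring.gsub(%r{(start/)[^/]+(/end)}) { "#{$1}123abc#{$2}" }

# 3. Named groups: \k<name> in the replacement refers to a named capture.
with_named = mystring.gsub(%r{(?<head>start/)[^/]+(?<tail>/end)}, '\k<head>123abc\k<tail>')

puts with_lookarounds, with_block, with_named
```

The block form is used for the numbered groups because a replacement string such as '\1123abc\2' is hard to read when the new text itself starts with digits.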
