Ruby regex how to replace each item after a lookback? - ruby

I'm struggling a bit with replacing some spaces with HTML &nbsp; characters.
I'm trying to replace each space with its own &nbsp; (not replace all of them with a single &nbsp;).
Here's what I'm trying at the moment:
"My String: ".gsub(/(?<=:).*\s/, '&nbsp;')
=> "My String:&nbsp;"
but this is about as close as I can get (I can kinda see why it's not working, but I'm unable to take it to that next step - if there is one?)...
Any Regex gods out there that can help?

If you are happy with your regexp you could just go:
p "My String: ".gsub(/(?<=:).*\s/){|x| '&nbsp;'*x.size }
#=> "My String:&nbsp;"
If you instead want to make a new regexp:
# Match each whitespace character that is followed only by 0+ whitespace characters and then the end of the string.
string.gsub(/\s(?=\s*\Z)/, '&nbsp;')
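A minimal sketch comparing the two approaches, assuming the real input has several trailing spaces after the colon. The helper names pad_with_block and pad_with_lookahead are made up for illustration. One difference worth knowing: in the block version, .* also swallows any non-space text between the colon and the last whitespace character, so it is only safe when nothing but spaces follows the colon.

```ruby
# Hypothetical helper: reuse the original regexp and size the replacement
# from the length of the matched text.
def pad_with_block(str)
  str.gsub(/(?<=:).*\s/) { |match| '&nbsp;' * match.size }
end

# Hypothetical helper: replace each trailing whitespace character on its own.
def pad_with_lookahead(str)
  str.gsub(/\s(?=\s*\Z)/, '&nbsp;')
end

input = 'My String:   ' # three trailing spaces

p pad_with_block(input)
# => "My String:&nbsp;&nbsp;&nbsp;"
p pad_with_lookahead(input)
# => "My String:&nbsp;&nbsp;&nbsp;"

# Only the trailing spaces are touched by the lookahead version.
p pad_with_lookahead('Key: some value  ')
# => "Key: some value&nbsp;&nbsp;"
```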

Related

Remove non-alphabetic characters and more than one space from a string

I am not sure what I am doing wrong here:
"#New York".gsub(/[^a-zA-Z\s]/,"").strip
This regex should remove all non-alphabetic characters and collapse any run of more than one space into a single space.
It should give me the following result:
"New York"
What is wrong with the regex?
You can replace strip with squeeze:
"#New York".gsub(/[^a-zA-Z\s]/,"").squeeze(" ")
# => New York
Another way is to use a regex like
" #New \t York ".gsub(/\s{2,}|[^\sa-zA-Z]/, ' ').strip
Or
" #New \t York ".gsub(/(\s){2,}|[^\sa-zA-Z]/, '\1').strip
Here, /\s{2,}|[^\sa-zA-Z]/ matches 2 or more consecutive whitespace characters (\s{2,}) or (|) any char other than an ASCII letter or whitespace ([^\sa-zA-Z]). In the case of (\s){2,}, the last whitespace character captured is inserted into the resulting string via the \1 placeholder.
See a Rubular demo.
Regex: (?:\s+(?=\s)|[^A-Za-z\s]+)
Ruby code:
"#New York".gsub(/(?:\s+(?=\s)|[^A-Za-z\s]+)/, '')
Output:
New York
Code demo
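For comparison, here is a small sketch that runs the variants from the answers side by side. The sample strings and variable names are assumptions chosen for illustration: one input has a run of plain spaces, the other mixes spaces and a tab.

```ruby
simple = '#New   York'       # assumed input with three plain spaces
mixed  = " #New \t York "    # the mixed-whitespace input from the answer above

# squeeze(' ') collapses runs of the literal space character only.
via_squeeze = simple.gsub(/[^a-zA-Z\s]/, '').squeeze(' ')

# Replace whitespace runs and stray characters with a space, then trim.
via_alternation = mixed.gsub(/\s{2,}|[^\sa-zA-Z]/, ' ').strip

# Same idea, but re-insert the last captured whitespace character via \1
# (\1 is empty when the second alternative matched).
via_capture = mixed.gsub(/(\s){2,}|[^\sa-zA-Z]/, '\1').strip

# Drop every whitespace character that is followed by another one.
via_lookahead = simple.gsub(/(?:\s+(?=\s)|[^A-Za-z\s]+)/, '')

p via_squeeze      # => "New York"
p via_alternation  # => "New York"
p via_capture      # => "New York"
p via_lookahead    # => "New York"
```

Because squeeze(" ") only looks at the space character, a tab sitting between two single spaces would survive that variant; the regex variants treat all whitespace alike.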

Extract values after pattern in Ruby string

I have a string like this:
"<root><some ProdCode=\"40\" ProducerName=\"demo1\" ProdCode=\"40\" Need_Confirmation=\"1\"/><some ProdCode=\"40\" ProducerName=\"demo1\" ProdCode=\"40\" Need_Confirmation=\"1\"/></root>"
I'm trying to pull out the content between the quotes in each =\"content\" pair and put it in an array, like ["40","demo1","40","1",40......]
You should use scan to select elements by regexp pattern, then remove the surrounding quote characters.
string.scan(/"[^"]+"/).map { |element| element.delete('\\"') }
Explanation of pattern:
/ – regexp starts
" – first char should be "
[^"]+ – next should be any char except ". + sign says that number of such chars should be at least 1.
" – next should be again "
/ – regexp ends
So string.scan(/"[^"]+"/) would return:
["\"40\"", "\"demo1\"", "\"40\"", "\"1\"", "\"40\"", "\"demo1\"", "\"40\"", "\"1\""]
Then we can just delete the \" characters using the delete method.
A convenient tool for building regexps is http://rubular.com/
When your string is this simple you can use scan + regular expression like this:
result = html.scan(/ProdCode="\d+?"/)
If it is more complex you can use an HTML parser like Nokogiri or Oga.
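As a possible shortcut, scan returns only the captured text when the pattern contains a group, so the quotes never need to be deleted afterwards. A minimal sketch, assuming a shortened version of the string from the question (the variable names are illustrative):

```ruby
xml = '<root><some ProdCode="40" ProducerName="demo1" ProdCode="40" Need_Confirmation="1"/></root>'

# With one capture group, scan yields one single-element array per match;
# flatten turns that into a flat list of attribute values.
all_values = xml.scan(/="([^"]*)"/).flatten
p all_values
# => ["40", "demo1", "40", "1"]

# Restricting the match to a single attribute name works the same way.
prod_codes = xml.scan(/ProdCode="(\d+)"/).flatten
p prod_codes
# => ["40", "40"]
```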

Erase word from ruby that contains a certain char

As the title suggests, I'd like to take some chars and check whether the string has any of them. If I suppose, for example, "!" to be forbidden, then string.replace("",word_with_!). How can I check for forbidden chars if forbidden_chars is an array?
forbidden_chars = ["!",",",...]
check ARRAY (it is the string split into an array) for forbidden chars
erase all words with forbidden chars
Could anyone help me please? I consider searching for the words with the chars and retrieving their index mandatory in the answer. Thank you very much :)
string = 'I like my coffee hot, with no sugar!'
forbidden_chars = ['!', ',']
forbidden_chars_pattern = forbidden_chars.map(&Regexp.method(:escape)).join('|')
string.gsub /\S*(#{forbidden_chars_pattern})\S*/, ''
# => "I like my coffee  with no "
# (the removed words leave their surrounding spaces behind)
The idea is to match as many non-whitespace characters as possible (\S*), followed by any of the forbidden characters (!|,), followed by as many non-whitespace characters as possible again.
The reason we need the Regexp.escape is for the cases when a forbidden character has special regex meaning (like .).
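To make the escaping point concrete, here is a small sketch with "." in the forbidden list. The sample sentence and variable names are assumptions for illustration. Without escaping, the dot matches any character and the whole string is wiped out; Regexp.union is shown as an alternative that escapes string arguments for you.

```ruby
string = 'I like my coffee hot. No sugar!'
forbidden_chars = ['.', '!']

# Unescaped: the pattern becomes /\S*(.|!)\S*/ and "." matches anything.
unescaped = forbidden_chars.join('|')
without_escape = string.gsub(/\S*(#{unescaped})\S*/, '')

# Escaped: the pattern becomes /\S*(\.|!)\S*/ and only literal dots match.
escaped = forbidden_chars.map { |c| Regexp.escape(c) }.join('|')
with_escape = string.gsub(/\S*(#{escaped})\S*/, '')

# Regexp.union escapes plain strings itself.
with_union = string.gsub(/\S*#{Regexp.union(forbidden_chars)}\S*/, '')

p without_escape  # => ""
p with_escape     # => "I like my coffee  No "
p with_union      # => "I like my coffee  No "

# Optional tidy-up of the leftover spaces.
p with_escape.squeeze(' ').strip  # => "I like my coffee No"
```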
string = 'I like my coffee strong, with no cream or sugar!'
verboten = '!,'
string.split.select { |s| s.count(verboten).zero? }.join ' '
#=> "I like my coffee with no cream or"
Note this does not preserve the spacing between "I" and "like" but if there are no extra spaces in string it returns a string that has no extra spaces.
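Since the question also asks for the index of each offending word, one way to get it is sketched below. It builds on the split-based approach; the names flagged and kept are made up for illustration.

```ruby
string = 'I like my coffee strong, with no cream or sugar!'
verboten = '!,'

words = string.split

# Indexes of the words that contain at least one forbidden character.
flagged = words.each_index.select { |i| words[i].count(verboten) > 0 }
p flagged
# => [4, 9]

# Rebuild the sentence from the words whose index was not flagged.
kept = words.each_with_index
            .reject { |_word, i| flagged.include?(i) }
            .map(&:first)
            .join(' ')
p kept
# => "I like my coffee with no cream or"
```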

removing all spaces within a specific string (email address) using ruby

The user is able to input text, but the way I ingest the data it often contains unnecessary carriage returns and spaces.
To remove those to make the input look more like a real sentence, I use the following:
string.delete!("\n")
string = string.squeeze(" ").gsub(/([.?!]) */,'\1 ')
But in the case of the following, I get an unintended space in the email:
string = "Hey what is \n\n\n up joeblow#dude.com \n okay"
I get the following:
"Hey what is up joeblow#dude. com okay"
How can I enable an exception for the email part of the string so I get the following:
"Hey what is up joeblow#dude.com okay"
Edited
your method does the following:
string.squeeze(" ") # replaces each sequence of " " with one space
gsub(/([.?!]) */, '\1 ') # matches each char in the brackets [.?!]
# followed by zero or more spaces, and whether it finds
# spaces or none at all, it puts exactly one space after the char;
# this is why the email address
# is split
I guess what you really want by this is, if there is no space after punctuation marks, add one space. You can do this instead.
string.gsub(/([.?!])\W/, '\1 ') # if there is a non word char after
# those punctuation chars, just add a space
Then you just need to replace every sequence of space chars with one space. so the last solution will be:
string.gsub(/([.?!])(?=\W)/, '\1 ').gsub(/\s+/, ' ')
# ([.?!]) => this will match ., ?, or ! and capture it.
# (?=\W) => a lookahead: it checks that the next char is a non-word char
# but does not consume it.
# so /([.?!])(?=\W)/ will find punctuation from the brackets that
# is followed by a non-word char (a space or newline, or even
# punctuation, for example).
# '\1 ' => \1 is the captured group (i.e. the string that matched the
# group ([.?!]), which is a single char in this case), so it will add
# a space after the matched group.
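A quick sketch of that final solution wrapped in a method, so it can be tried on the string from the question. The method name tidy is hypothetical. Note the trade-off: punctuation that is directly followed by a word character gets no space added, which is exactly what keeps the address intact.

```ruby
# Hypothetical helper: add a space after ., ? or ! when a non-word char
# follows, then collapse every run of whitespace into one space.
def tidy(text)
  text.gsub(/([.?!])(?=\W)/, '\1 ').gsub(/\s+/, ' ')
end

p tidy("Hey what is \n\n\n up joeblow#dude.com \n okay")
# => "Hey what is up joeblow#dude.com okay"

p tidy("Done.\nNext")
# => "Done. Next"

# No space is inserted when a word character follows the punctuation.
p tidy('Wait.What')
# => "Wait.What"
```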
If you are okay with getting rid of the squeeze statement, then using Nafaa's answer is the simplest way to do it, but I've listed an alternate method in case it's helpful:
string = string.split(" ").join(" ")
However, if you want to keep that squeeze statement you can amend Nafaa's method and use it after the squeeze statement:
string.gsub(/\s+/, ' ').gsub('. com', '.com')
or just directly change the string:
string.gsub('. com', '.com')

Ruby regex: "capture string unless it is followed by..."

My regex captures quoted phrases:
"([^"]*)"
I want to improve it by ignoring quoted phrases that are followed by ', -' (a comma, a space and a dash in this particular order).
How do I do this?
The test: http://rubular.com/r/xls6vN1w92
This should do it, using a Negative Lookahead:
"(?!, -)([^"]*)"(?!, -)
A little icky, but it works. You want to make sure either quote isn't followed by your string, or else the match will start at the closing quotes.
http://rubular.com/r/yFMyUKJOHL
Regex
"(.*?)"(?!, -)
Working Example
http://rubular.com/r/9kOmZLxLfy
This is unparsable in your context; it's open-ended. The only way to parse it is to consume the nots as well as the wants, but it's still an invalid premise.
/"([^"]*?)"(?!, -)|"[^"]*?"(?=, -)/
Then check for capture group 1 on each match, something like this (Perl):
$rx = qr/"([^"]*?)"(?!, -)|"[^"]*?"(?=, -)/;
while (' "ingnore me", - "but not me" ' =~ /$rx/g) {
print "'$1'\n" if defined $1
}
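The same check can be sketched in Ruby with scan, which also makes the closing-quote pitfall visible. The sample text and variable names are assumptions for illustration. Simply appending the lookahead to the original regexp lets a match start at a closing quote, while guarding both quotes, or consuming the unwanted phrases with an alternation, avoids that on this input.

```ruby
text = '"ignore me", - "but not me"'

# Lookahead only after the closing quote: the match restarts at the
# closing quote of the first phrase and captures the text in between.
naive = text.scan(/"([^"]*)"(?!, -)/).flatten

# Lookahead after both quotes, as in the first answer.
guarded = text.scan(/"(?!, -)([^"]*)"(?!, -)/).flatten

# Alternation: unwanted phrases are consumed by the second branch, which
# leaves group 1 as nil, so compact drops them.
alternation = text.scan(/"([^"]*?)"(?!, -)|"[^"]*?"(?=, -)/).flatten.compact

p naive        # => [", - "]
p guarded      # => ["but not me"]
p alternation  # => ["but not me"]
```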
Add (?!...) at the end of the regex:
"([^"\n]*)"(?!, -)
