Removing all spaces within a specific string (email address) using Ruby

The user is able to input text, but the way I ingest the data it often contains unnecessary carriage returns and spaces.
To remove those to make the input look more like a real sentence, I use the following:
string.delete!("\n")
string = string.squeeze(" ").gsub(/([.?!]) */,'\1 ')
But in the case of the following, I get an unintended space in the email:
string = "Hey what is \n\n\n up joeblow@dude.com \n okay"
I get the following:
"Hey what is up joeblow@dude. com okay"
How can I enable an exception for the email part of the string so I get the following:
"Hey what is up joeblow@dude.com okay"

Edited
Your method does the following:
string.squeeze(" ") # replaces each sequence of spaces with a single space
gsub(/([.?!]) */, '\1 ') # finds every ., ? or ! followed by zero or more spaces
# and replaces those spaces (even when there are none)
# with exactly one space, which is why the email
# address gets split
I guess what you really want by this is, if there is no space after punctuation marks, add one space. You can do this instead.
string.gsub(/([.?!])\W/, '\1 ') # if a non-word char follows one of
# those punctuation chars, replace it with a space
Then you just need to replace every sequence of whitespace chars with one space, so the final solution is:
string.gsub(/([.?!])(?=\W)/, '\1 ').gsub(/\s+/, ' ')
# ([.?!]) => matches ., ? or ! and captures it.
# (?=\W) => a lookahead: requires the next char to be a non-word char
# but does not consume it.
# So /([.?!])(?=\W)/ finds the punctuation chars listed in the brackets
# when they are followed by a non-word char (a space, a newline, or
# even more punctuation, for example).
# '\1 ' => \1 is the captured group (i.e. the string matched by
# ([.?!]), a single char in this case), so the replacement adds
# a space after the matched punctuation.
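As a rough sketch, the two gsub calls above can be wrapped in a small helper. The method name normalize_spacing and the sample address jane@example.com are made up for illustration; only the two patterns come from the answer.

```ruby
# Hypothetical helper: the name is an assumption, the patterns are the ones discussed above.
def normalize_spacing(text)
  text
    .gsub(/([.?!])(?=\W)/, '\1 ') # add a space after ., ? or ! when a non-word char follows
    .gsub(/\s+/, ' ')             # collapse every run of whitespace into one space
end

sample = "Hey what is \n\n\n up?\njane@example.com \n okay"
puts normalize_spacing(sample)
# prints: Hey what is up? jane@example.com okay
```

The dot inside the address is followed by a word character, so the lookahead fails there and the address stays intact.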

If you are okay with getting rid of the squeeze statement, then using Nafaa's answer is the simplest way to do it, but I've listed an alternate method in case it's helpful:
string = string.split(" ").join(" ")
However, if you want to keep that squeeze statement you can amend Nafaa's method and use it after the squeeze statement:
string.gsub(/\s+/, ' ').gsub('. com', '.com')
or just directly change the string:
string.gsub('. com', '.com')
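To make the difference between the two approaches concrete, here is a small comparison on a made-up sample string. squeeze(" ") only collapses runs of the space character, while split(" ") treats any run of whitespace (newlines included) as a separator.

```ruby
text = "Hey  what \n\n is   up"

squeezed = text.squeeze(" ")          # only collapses runs of the space character
rejoined = text.split(" ").join(" ")  # splits on any whitespace run, then rejoins

p squeezed # => "Hey what \n\n is up"
p rejoined # => "Hey what is up"
```

Note that split/join alone does nothing about spacing after punctuation, so it replaces the whitespace clean-up step only.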

Related

Remove extra white space but leave one space remaining in string

I want to be able to take in a string with the word 'WUB' placed randomly throughout it, and replace those instances with white space.
Ex. "WUBWUBWUBWEWUBWUBAREWUBWUBWUBTHEWUBCHAMPIONSWUBWUBMYWUBFRIENDS"
turns into...
WE ARE THE CHAMPIONS MY FRIENDS
However, the problem is that I get extra white space, one for each 'WUB'. How can I remove the extra white space and keep only a single space between words?
def song_decoder(song)
  song.gsub!(/WUB/, " ")
  song.strip!
  print song
  return song
end
song_decoder("WUBWUBWUBWEWUBWUBAREWUBWUBWUBTHEWUBCHAMPIONSWUBWUBMYWUBFRIENDS")
# above is test case
/WUB/ matches every "WUB" in the string individually, so wherever several appear consecutively you get several spaces in a row. strip only removes leading and trailing whitespace, so the runs in the middle remain, which isn't what you expect.
You could match each run of consecutive "WUB"s as a group and replace the whole run with a single ' '. As this specific result leaves you with just an initial whitespace (first character), lstrip would deal with that:
str = 'WUBWUBWUBWEWUBWUBAREWUBWUBWUBTHEWUBCHAMPIONSWUBWUBMYWUBFRIENDS'
p str.gsub(/(WUB)+/, ' ').lstrip
# "WE ARE THE CHAMPIONS MY FRIENDS"
I just found the squeeze method!
def song_decoder(song)
  song.gsub!(/WUB/, " ")
  song.strip!
  song.squeeze!(" ")
  print song
  return song
end
This does exactly what I need. My apologies.
Without regular expression:
"WUBWUBWUBWEWUBWUBAREWUBWUBWUBTHEWUBCHAMPIONSWUBWUBMYWUBFRIENDS".
split('WUB').reject(&:empty?).join(' ')
#⇒ "WE ARE THE CHAMPIONS MY FRIENDS"
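For comparison, a non-mutating sketch that combines the ideas from the answers above: replace each run of "WUB" with one space, then strip both ends so that input ending in "WUB" is handled too. The second test string is invented for illustration.

```ruby
# Returns a new string instead of modifying the argument in place.
def song_decoder(song)
  song.gsub(/(?:WUB)+/, ' ').strip
end

puts song_decoder("WUBWUBWUBWEWUBWUBAREWUBWUBWUBTHEWUBCHAMPIONSWUBWUBMYWUBFRIENDS")
# prints: WE ARE THE CHAMPIONS MY FRIENDS
puts song_decoder("WUBHELLOWUBWUB")
# prints: HELLO
```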

RegExp match fail in Ruby

I've got a problem with a chatbot in Ruby. There's a command for banning users, and it's supposed to work by writing this in the chat:
!ban [Username (the username may sometimes contain blank spaces)] [Length of the ban in seconds] [Reason]
like
!ban Chara Cipher 3600 making flood
and the code is like
match /^ban (.*)(^0-9) (.+)/, :method => :ban
# @param [User] user
# @param [String] target
# @param [Integer] length
# @param [String] reason
def ban(user, target, length, reason)
  if user.is? :mod
    @client.ban(target, length, reason)
    # Message (Spanish): "<target> has been banned thanks to the magic of friendship."
    @client.send_msg "#{target} ha sido baneado gracias a la magia de la amistad."
  end
end
The problem is that the arguments don't match correctly with every string, maybe because the Regular Expression match part, (.*)(^0-9) (.+).
Does somebody know how to fix it?
Update
https://gist.github.com/carlosqh2/b926e59772e3c28d104d756589acc75e#file-admin-rb-L213
line 214, 255-263, from Admin.rb and line 188 from client.rb are the most relevant lines, also, in lines 202-213 from Admin.rb the "!" is required for the commands to work in the chat
Three issues I see. First, you're matching 'ban' not '!ban'. Second, the first match will just match the entire rest of the string including the time of ban and reason. Third, the pattern for second match is wrong. I suggest explicitly matching the spaces to delimit arguments like ^!ban\s(.+)\s(\d+)\s(.+).
I don't think (^0-9) does what you think it does. In a regex it means "capture the literal characters '0-9' at the start of a line".
Meditate on this:
" 0-9"[/(^0-9)/] # => nil
"0-9"[/(^0-9)/] # => "0-9"
" \n0-9"[/(^0-9)/] # => "0-9"
The last one matched because ^ anchors at the start of any line, not only at the start of the string, so the 0-9 that follows the newline is found.
Instead you probably want [^0-9] which means "a character that is not 0-9" and will match correctly in the middle of strings:
" 0-9"[/[^0-9]/] # => " "
"0-9"[/[^0-9]/] # => "-"
" \n0-9"[/[^0-9]/] # => " "
Read the Regexp documentation and you can piece this all together.
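Putting the suggestions above together, here is a minimal sketch of parsing the command outside the bot framework. BAN_PATTERN and parse_ban are hypothetical names, not part of the bot's code. The pattern assumes the first standalone number in the message is the ban length, so a username that itself contains a standalone number would be split in the wrong place.

```ruby
# Hypothetical pattern and helper, shown only to illustrate the matching.
# (.+?) is lazy, so the username ends at the first whitespace-delimited number.
BAN_PATTERN = /\A!ban\s+(.+?)\s+(\d+)\s+(.+)\z/m

def parse_ban(message)
  m = BAN_PATTERN.match(message)
  return nil unless m
  { target: m[1], length: m[2].to_i, reason: m[3] }
end

result = parse_ban("!ban Chara Cipher 3600 making flood")
puts result[:target] # prints: Chara Cipher
puts result[:length] # prints: 3600
puts result[:reason] # prints: making flood
```

With a greedy (.+) for the username, a reason such as "spam 3 times" would pull the number from the reason into the length; the lazy quantifier avoids that particular case.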

Regex matching chars around text

I have a string with chars inside and I would like to match only the chars around a string.
"This is a [1]test[/1] string. And [2]test[/2]"
Rubular http://rubular.com/r/f2Xwe3zPzo
Currently, the code in the link matches the text inside the special chars, how can I change it?
Update
To clarify my question. It should only match if the opening and closing has the same number.
"[2]first[/2] [1]second[/2]"
In the code above, only first should match and not second. The text inside the special chars (first), should be ignored.
Try this:
(\[[0-9]\]).+?(\[\/[0-9]\])
Permalink to the example on Rubular.
Update
Since you want to remove the 'special' characters, try this instead:
foo = "This is a [1]test[/1] string. And [2]test[/2]"
foo.gsub(/\[\/?\d\]/, "")
# => "This is a test string. And test"
Update, Part II
You only want to remove the 'special' characters when the surrounding tags match, so what about this:
foo = "This is a [1]test[/1] string. And [2]test[/2], but not [3]test[/2]"
foo.gsub(/(?:\[(?<number>\d)\])(?<content>.+?)(?:\[\/\k<number>\])/, '\k<content>')
# => "This is a test string. And test, but not [3]test[/2]"
\[([0-9])\].+?\[\/\1\]
([0-9]) is a capture since it is surrounded with parentheses. The \1 tells it to use the result of that capture. If you had more than one capture, you could reference them as well, \2, \3, etc.
Rubular
You can also use a named capture, rather than \1 to make it a little less cryptic. As in: \[(?<number>[0-9])\].+?\[\/\k<number>\]
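A short sketch of the backreference in action, using the mismatched-tag example from the question. A second capture group around the inner text is added here (an assumption beyond the pattern above) so that gsub can keep the text and drop the tags.

```ruby
text = "[2]first[/2] [1]second[/2]"

# \[(\d)\] captures the tag number, (.+?) captures the enclosed text,
# and \[\/\1\] requires a closing tag that carries the same number.
stripped = text.gsub(/\[(\d)\](.+?)\[\/\1\]/, '\2')

puts stripped
# prints: first [1]second[/2]
```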
Here's a way to do it that uses the form of String#gsub that takes a block. The idea is to pull strings such as "[1]test[/1]" into the block, and there remove the unwanted bits.
str = "This is a [1]test[/1] string. And [2]test[/2], plus [3]test[/99]"
r = /
\[ # match a left bracket
(\d+) # capture one or more digits in capture group 1
\] # match a right bracket
.+? # match one or more characters lazily
\[\/ # match a left bracket and forward slash
\1 # match the contents of capture group 1
\] # match a right bracket
/x
str.gsub(r) { |s| s[/(?<=\]).*?(?=\[)/] }
#=> "This is a test string. And test, plus [3]test[/99]"
Aside: When I first heard of named capture groups, they seemed like a great idea, but now I wonder if they really make regexes easier to read than \1, \2....

Regex to find a newline character ("\n") and replace with empty string from address

We have a string which contains address in it like below:
"first-name, last-name, email, address\n Ashok, G, \"Hyderabad\nTelangana\n India\"\n John, M, \"Mayur Vihar\nNew Delhi\n110096, India\"\n"
and the requirement is to replace all the newline characters ("\n") characters with "" from the address string only (inside \" \")
The Expected output should be like:
"first-name, last-name, email, address\n Ashok, G, \"Hyderabad Telangana India\"\n John, M, \"Mayur Vihar, New Delhi 110096, India\"\n "
\\n(?=(?:(?!\\").)*\\"(?:(?:(?!\\").)*\\"(?:(?!\\").)*\\")*(?:(?!\\").)*$)
Try this. Replace by empty string. See demo.
https://www.regex101.com/r/rG7gX4/7
I suggest you do it as follows:
str.gsub(/"[^"]*"/) { |s| s.gsub(/\n/, ' ') }
#=> "first-name, last-name, email, address\n Ashok, G, \"Hyderabad Telangana  India\"\n John, M, \"Mayur Vihar New Delhi 110096, India\"\n"
This matches each string bracketed by \", quotes included, which in turn is passed to the block for replacement of all \n's. [^"]* matches any run of characters other than a double quote, newlines included, so the match stops at the first closing quote. (A plain . would not match \n without the m flag.) Because the quotes themselves are consumed by the match, the text between one closing quote and the next opening quote is never treated as an address.
This doesn't give quite the spacing contained in your desired output. That spacing seems somewhat inconsistent, however. For example, where did the single space at the end of the string come from? You said you wanted to replace \n between pairs of \", but you didn't say what you want to replace it with. (I assumed one space.) If you want different spacing, you could adjust the regex used by gsub inside the block. For example, you might have /\s*\n\s*/.
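A runnable sketch of the spacing adjustment mentioned above, applied to the sample string from the question. It assumes quoted fields never contain an escaped double quote, and it matches each quoted field whole (quotes included) so that text between two fields is left alone.

```ruby
str = "first-name, last-name, email, address\n Ashok, G, \"Hyderabad\nTelangana\n India\"\n" \
      " John, M, \"Mayur Vihar\nNew Delhi\n110096, India\"\n"

# /"[^"]*"/ matches one quoted field; [^"] also matches newlines.
# Inside the field, each newline plus any surrounding whitespace becomes one space.
cleaned = str.gsub(/"[^"]*"/) { |field| field.gsub(/\s*\n\s*/, ' ') }

p cleaned
# => "first-name, last-name, email, address\n Ashok, G, \"Hyderabad Telangana India\"\n John, M, \"Mayur Vihar New Delhi 110096, India\"\n"
```

The newlines that separate records are outside the quotes and are preserved.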

Ruby Regex gsub - everything after string

I have a string something like:
test:awesome my search term with spaces
And I'd like to extract the string immediately after test: into one variable and everything else into another, so I'd end up with awesome in one variable and my search term with spaces in another.
Logically, what I'd do is move everything matching test:* into another variable, and then remove everything before the first :, leaving me with what I wanted.
At the moment I'm using /test:(.*)([\s]+)/ to match the first part, but I can't seem to get the second part correctly.
The first capture in your regular expression is greedy, and it matches spaces because you used . (which matches any character). Instead try:
matches = string.match(/test:(\S*) (.*)/)
# index 0 is the whole pattern that was matched
first = matches[1] # this is the first () group
second = matches[2] # and the second () group
Use the following:
/^test:(.*?) (.*)$/
That is, match "test:", then a series of characters (non-greedily), up to a single space, and another series of characters to the end of the line.
I am guessing you want to remove all the leading spaces before the second match too, hence I have \s+ in the expression. Otherwise, remove the \s+ from the expression, and you'll have what you want:
m = /^test:(\w+)\s+(.*)/.match("test:awesome my search term with spaces")
a = m[1]
b = m[2]
http://codepad.org/JzuNQxBN
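The answers above can be folded into one small helper that also handles input that does not match. The name split_tagged_query and the choice to return nil on no match are assumptions made for this sketch.

```ruby
# Hypothetical helper: returns [tag, rest] or nil when the input has no "test:" prefix.
def split_tagged_query(input)
  m = input.match(/\Atest:(\S+)\s+(.*)\z/)
  return nil unless m
  [m[1], m[2]]
end

tag, rest = split_tagged_query("test:awesome my search term with spaces")
puts tag  # prints: awesome
puts rest # prints: my search term with spaces
```

Checking for nil before indexing avoids a NoMethodError when the string does not start with test:.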
