I want to remove all special characters (including spaces) from a string's beginning and end and replace consecutive spaces with one. For example,
" !:;:§" this string is normal. "§$"§"$"§$ $"$§" "
should become:
"this string is normal"
I want to allow ! and ? at the end of the string.
" !:;:§" this string is normal? "§$"§"$"§$ $"$§" "
" !:;:§" this string is very normal! "§$"§"$"§$ $"$§" "
" !:;:§" this string is very normal!? "§$"§"$"§$ $"$§" "
should become:
"this string is normal?"
"this string is normal!"
"this string is normal!?"
This is all for getting nice titles in an app.
Can someone help me please? Or does anyone know a good regex command for nice titles?
Do it step by step:
str.
gsub(/\A\W+/, ''). # remove garbage from the very beginning
gsub(/\W*\z/) { |m| m[/\A\p{Punct}*/] }. # leave trailing punctuation
gsub(/\s{2,}/, ' ') # squeeze
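For reuse, the three steps above can be wrapped in a method. This is only a minimal sketch: the name clean_title is made up for illustration, and the sample input is a simplified ASCII version of the strings in the question.

```ruby
# Hypothetical helper; the three gsub steps are the ones from the chain above.
def clean_title(str)
  str
    .gsub(/\A\W+/, '')                        # drop leading non-word characters
    .gsub(/\W*\z/) { |m| m[/\A\p{Punct}*/] }  # keep only punctuation directly after the last word
    .gsub(/\s{2,}/, ' ')                      # collapse runs of whitespace into one space
end

puts clean_title(' !:; this  string is normal? "$" $ ')
# prints: this string is normal?
```

Note that any punctuation directly following the last word is kept, not just ! and ?.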
R = /
(?: # begin a non-capture group
\p{Alnum}+ # match one or more alphanumeric characters
[ ]+ # match one or more spaces
)* # end non-capture group and execute zero or more times
\p{Alnum}+ # match one or more alphanumeric characters
[!?]* # match zero or more characters '!' and '?'
/x # free-spacing regex definition mode
def extract(str)
str[R].squeeze(' ')
end
arr = [
' !:;:§" this string is normal? "§$"§"$"§$ $"$§" ',
' !:;:§" this string is very normal! "§$"§"$"§$ $"$§" ',
' !:;:§" this string is very normal!? "§$"§"$"§$ $"$§" ',
' !:;:§" cette chaîne est normale? "§$"§"$"§$ $"$§" '
]
arr.each { |s| puts extract(s) }
prints
this string is normal?
this string is very normal!
this string is very normal!?
cette chaîne est normale?
See the doc for \p{Alnum} in Regexp (search for "\p{} construct").
I wrote the regular expression in free-spacing mode in order to document each step. It would conventionally be written as follows.
/(?:\p{Alnum}+ +)*\p{Alnum}+[!?]*/
Notice that in free-spacing mode I put a space in a character class. Had I not done so the space would have been removed before the regular expression was evaluated.
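A quick way to see the difference (a small sketch; the constant names are made up for illustration):

```ruby
BARE_SPACE  = /a b/x    # the space is ignored, so this behaves like /ab/
CLASS_SPACE = /a[ ]b/x  # the space inside the character class is kept

p BARE_SPACE.match?('ab')    #=> true
p BARE_SPACE.match?('a b')   #=> false
p CLASS_SPACE.match?('a b')  #=> true
```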
If non-alphanumeric characters, other than spaces, are permitted in the interior of the string, change the regular expression to the following.
def extract(str)
str.gsub(R,'')
end
R = /
\A # match the beginning of the string
[^\p{Alnum}]+ # match one or more non-alphanumeric characters
| # or
[^\p{Alnum}!?] # match a character other than an alphanumeric, '!' or '?'
[^\p{Alnum}]+ # match one or more non-alphanumeric characters
\z # match the end of the string
| # or
[ ] # match a space...
(?=[ ]) # ...followed by a space
/x # free-spacing regex definition mode
extract ' !:;:§" this string $$ is abnormal? "§$" $"$§" '
prints
"this string $$ is abnormal?"
This regex will remove:
Question and exclamation marks that are not preceded by a "normal" character or by another question or exclamation mark.
Whitespace characters that are not preceded by a "normal" character
All non-"normal" characters
The word "very"
(I assume "normal" characters in this case are 0..9, a..z and A..Z).
str = '" !:;:§" this string is very normal!? "§$"§"$"§$ $"$§" "'
str.gsub(/
(?:\bvery\s+) |
(?:(?<![A-Za-z\d!?])[!?]) |
(?:(?<![A-Za-z\d])\s) |
[^A-Za-z\s\d!?]
/x, '')
=> "this string is normal!?"
Related
I want to replace a space between one or two numbers and a colon followed by a space, a number, or the end of the line. If I have a string like,
line = " 0 : 28 : 37.02"
the result should be:
" 0: 28: 37.02"
I tried as below:
line.gsub!(/(\A|[ \u00A0|\r|\n|\v|\f])(\d?\d)[ \u00A0|\r|\n|\v|\f]:(\d|[ \u00A0|\r|\n|\v|\f]|\z)/, '\2:\3')
# => " 0: 28 : 37.02"
It seems to match the first ":", but the second ":" is not matched. I can't figure out why.
The problem
I'll define your regex with comments (in free-spacing mode) to show what it is doing.
r =
/
( # begin capture group 1
\A # match beginning of string (or does it?)
| # or
[ \u00A0|\r|\n|\v|\f] # match one of the characters in the string " \u00A0|\r\n\v\f"
) # end capture group 1
(\d?\d) # match one or two digits in capture group 2
[ \u00A0|\r|\n|\v|\f] # match one of the characters in the string " \u00A0|\r\n\v\f"
: # match ":"
( # begin capture group 3
\d # match a digit
| # or
[ \u00A0|\r|\n|\v|\f] # match one of the characters in the string " \u00A0|\r\n\v\f"
| # or
\z # match the end of the string
) # end capture group 3
/x # free-spacing regex definition mode
Note that '|' is not a special character ("or") within a character class. It's treated as an ordinary character. (Even if '|' were treated as "or" within a character class, that would serve no purpose because character classes are used to force any one character within it to be matched.)
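The literal treatment of '|' inside a character class is easy to confirm (a minimal sketch; the constant names are only for illustration):

```ruby
PIPE_IN_CLASS = /[a|b]/  # matches "a", "|" or "b"
PIPE_OUTSIDE  = /a|b/    # alternation: matches "a" or "b"

p 'a|b'.scan(PIPE_IN_CLASS)  #=> ["a", "|", "b"]
p 'a|b'.scan(PIPE_OUTSIDE)   #=> ["a", "b"]
```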
Suppose
line = " 0 : 28 : 37.02"
Then
line.gsub(r, '\2:\3')
#=> " 0: 28 : 37.02"
$1 #=> " "
$2 #=> "0"
$3 #=> " "
In capture group 1 the beginning of the string (\A) is not matched because the digit "0" is not at the start of the string. The alternation ('|') therefore causes the regex engine to attempt to match one character of the string " \u00A0|\r\n\v\f", which matches one of the three spaces at the beginning of the string line.
Next, capture group 2 captures "0". For it to do that, capture group 1 must have captured the space at index 2 of line. Then one more space and a colon are matched, and lastly, capture group 3 takes the space after the colon.
The substring ' 0 : ' is therefore replaced with '\2:\3' #=> '0: ', so gsub returns " 0: 28 : 37.02". Notice that one space before '0' was removed (but should have been retained). Because the space after the first colon has already been consumed by capture group 3, it is not available to capture group 1 when the engine reaches "28", so " 28 :" is never matched.
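What the first match consumes can be inspected directly. The sketch below uses a simplified regex in which each character class is reduced to a single space (an assumption made for brevity) and a sample line with one leading space; the constant names are made up.

```ruby
# Simplified stand-in for the regex in the question (character classes reduced to a space).
SIMPLE = /(\A|[ ])(\d?\d)[ ]:(\d|[ ]|\z)/
SAMPLE = ' 0 : 28 : 37.02'

m = SAMPLE.match(SIMPLE)
p m[0]                          #=> " 0 : "
p m.post_match                  #=> "28 : 37.02"
p SAMPLE.gsub(SIMPLE, '\2:\3')  #=> "0: 28 : 37.02"
```

The remainder starts with "28", with no space left in front of it, so group 1 has nothing to match there.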
A solution
Here's how you can remove the last of one or more Unicode whitespace characters that are preceded by one or two digits (and not more) and are followed by a colon at the end of the string or a colon followed by a whitespace or digit. (Whew!)
def trim(str)
str.gsub(/\d+[[:space:]]+:(?![^[:space:]\d])/) do |s|
s[/\d+/].size > 2 ? s : s[0,s.size-2] << ':'
end
end
The regular expression reads, "match one or more digits followed by one or more whitespace characters, followed by a colon (all these characters are matched), not followed (negative lookahead) by a character other than a unicode whitespace or digit". If there is a match, we check to see how many digits there are at the beginning. If there are more than two the match is returned (no change), else the whitespace character before the colon is removed from the match and the modified match is returned.
trim " 0 : 28 : 37.02"
#=> " 0: 28: 37.02"
trim " 0\v: 28 :37.02"
#=> " 0: 28:37.02"
trim " 0\u00A0: 28\n:37.02"
#=> " 0: 28:37.02"
trim " 123 : 28 : 37.02"
#=> " 123 : 28: 37.02"
trim " A12 : 28 :37.02"
#=> " A12: 28:37.02"
trim " 0 : 28 :"
#=> " 0: 28:"
trim " 0 : 28 :A"
#=> " 0: 28 :A"
If, as in the example, the only characters in the string are digits, whitespaces and colons, the negative lookahead is not needed.
You can use Ruby's \p{} construct, \p{Space}, in place of the POSIX expression [[:space:]]. Both match a class of Unicode whitespace characters, including those shown in the examples.
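The difference from plain \s shows up with a no-break space (a small sketch; NBSP is just an illustrative constant name):

```ruby
NBSP = "\u00A0"  # no-break space

p NBSP.match?(/\s/)           #=> false
p NBSP.match?(/[[:space:]]/)  #=> true
p NBSP.match?(/\p{Space}/)    #=> true
```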
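Excluding a third digit can be done with a negative lookbehind, but since the one or two digits that follow are of variable length, you cannot use a positive lookbehind for that part, so they are captured and re-inserted instead. Note that $ must sit outside the character class to mean "end of line"; inside one it is a literal dollar sign.
line.gsub(/(?<!\d)(\d{1,2}) (?=:(?:[ \d]|$))/, '\1')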
# => " 0: 28: 37.02"
" 0 : 28 : 37.02".gsub!(/(\d)(\s)(:)/,'\1\3')
=> " 0: 28: 37.02"
I basically need to get the bit after the last pipe
"3083505|07733366638|3"
What would the regular expression for this be?
You can do this without regex. Here:
"3083505|07733366638|3".split("|").last
# => "3"
With regex (assuming it's always going to be integer values):
"3083505|07733366638|3".scan(/\|(\d+)$/)[0][0] # or use \w+ if you want to extract any word after `|`
# => "3"
Try this regex:
.*\|(.*)
Capture group 1 holds whatever comes after the last |.
You could do that most easily by using String#rindex:
line = "3083505|07733366638|37"
line[line.rindex('|')+1..-1]
#=> "37"
If you insist on using a regex:
r = /
.* # match any number of any character (greedily!)
\| # match pipe
(.+) # match one or more characters in capture group 1
/x # extended mode
line[r,1]
#=> "37"
Alternatively:
r = /
.* # match any number of any character (greedily!)
\| # match pipe
\K # forget everything matched so far
.+ # match one or more characters
/x # extended mode
line[r]
#=> "37"
or, as suggested by @engineersmnky in a comment on @shivam's answer:
r = /
(?<=\|) # match a pipe in a positive lookbehind
\d+ # match one or more digits
\z # match end of string
/x # extended mode
line[r]
#=> "37"
I would use split and last, but you could do
last_field = line.sub(/.+\|/, "")
That removes all characters up to and including the last pipe.
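One more option that is not among the answers above is String#rpartition, which splits at the last occurrence of the separator. A minimal sketch, with a made-up method name:

```ruby
# Hypothetical helper: returns the text after the last "|".
def after_last_pipe(line)
  line.rpartition('|').last
end

p after_last_pipe('3083505|07733366638|3')  #=> "3"
p after_last_pipe('no pipes here')          #=> "no pipes here"
```

Unlike the rindex version, this returns the whole string instead of raising when there is no pipe at all, which may or may not be what you want.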
So I'm having an issue replacing \" in a string.
My Objective:
Given a string, if there's an escaped quote in the string, replace it with just a quote
So for example:
"hello\"74" would be "hello"74"
simp"\"sons would be simp"sons
jump98" would be jump98"
I'm currently trying this, but obviously it doesn't work and messes everything up. Any assistance would be awesome.
str.replace "\\"", "\""
I think you are mistaken about how \ works. You can never define a string as
a = "hello"74"
Also, the escape character is only used while defining the string in source code; it is not part of the value. E.g.:
a = "hello\"74"
# => "hello\"74"
puts a
# hello"74
However, in case my assumption above is incorrect, the following example should help you:
a = 'hello\"74'
# => "hello\\\"74"
puts a
# hello\"74
a.gsub!("\\","")
# => "hello\"74"
puts a
# hello"74
EDIT
The above gsub removes every backslash; however, the OP only needs to replace \" with ". The following should do the trick:
a.gsub!("\\\"","\"")
# => "hello\"74"
puts a
# hello"74
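To check whether a string really contains a backslash, compare lengths. This is just a small sketch with made-up constant names:

```ruby
ESCAPED_IN_SOURCE = "hello\"74"  # double quotes: \" is just a quote character
LITERAL_BACKSLASH = 'hello\"74'  # single quotes: the backslash is kept

p ESCAPED_IN_SOURCE.length  #=> 8
p LITERAL_BACKSLASH.length  #=> 9
p LITERAL_BACKSLASH.gsub('\\"', '"') == ESCAPED_IN_SOURCE  #=> true
```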
You can use gsub:
word = 'simp"\"sons'
print word.gsub(/\\"/, '"')
#=> simp""sons
I'm currently trying str.replace "\\"", "\"" but obviously that doesn't work and messes everything up, any assistance would be awesome
str.replace "\\"", "\"" doesn't work for two reasons:
It's the wrong method. String#replace replaces the entire string, you are looking for String#gsub.
"\\"" is incorrect: " starts the string, \\ is a backslash (correctly escaped) and " ends the string. The last " starts a new string.
You have to either escape the double quote:
puts "\\\"" #=> \"
Or use single quotes:
puts '\\"' #=> \"
Example:
content = <<-'EOF' # quoted delimiter, so the backslashes stay in the text
"hello\"74"
simp"\"sons
jump98"
EOF
puts content.gsub('\\"', '"')
Output:
"hello"74"
simp""sons
jump98"
I have a string that looks like this.
mystring="The Body of a\r\n\t\t\t\tSpider"
I want to replace all the \r, \n, \t etc with a whitespace.
The code I wrote for this is :
mystring.gsub(/\\./, " ")
But this isn't doing anything to the string.
Help.
\r, \n and \t are escape sequences representing carriage return, line feed and tab. Although they are written as two characters, they are interpreted as a single character:
"\r\n\t".codepoints #=> [13, 10, 9]
Because it is such a common requirement, there's a shortcut \s to match all whitespace characters:
mystring.gsub(/\s/, ' ')
#=> "The Body of a      Spider"
Or \s+ to match multiple whitespace characters:
mystring.gsub(/\s+/, ' ')
#=> "The Body of a Spider"
/\s/ is equivalent to /[ \t\r\n\f\v]/
String#tr is designed for character-by-character substitution. It appears to be a bit quicker than String#gsub:
mystring.tr "\r", ' '
It also has an in-place version (this will replace all carriage returns, line feeds, tabs, form feeds and spaces with a space):
mystring.tr! "\s\r\n\t\f", ' '
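Since tr maps each character to exactly one replacement, runs of whitespace stay as runs of spaces. If a single space is wanted, one option is to follow it with String#squeeze. A minimal sketch (TEXT is just an illustrative constant holding the string from the question):

```ruby
TEXT = "The Body of a\r\n\t\t\t\tSpider"

spaced   = TEXT.tr("\r\n\t", ' ')  # each control character becomes one space
squeezed = spaced.squeeze(' ')     # runs of spaces collapse to a single space

p spaced    #=> "The Body of a      Spider"
p squeezed  #=> "The Body of a Spider"
```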
Stefen's answer is really cool; as always, he comes up with very short and clean solutions. Here is what I tried instead, which replaces every non-alphanumeric character with a space (posted just as an optional solution):
> a = "The Body of a\r\n\t\t\t\tSpider"
=> "The Body of a\r\n\t\t\t\tSpider"
> a.gsub(/[^0-9A-Za-z]/, ' ')
=> "The Body of a      Spider"
You can use strip, then add a space to your string:
mystring.strip + " "
If you literally have \r\n\t in your string:
mystring="The Body of a\r\n\t\t\t\tSpider"
mystring.split(/[\r\t\n]/)
I have this string:
string = "SEGUNDA A SEXTA\n05:24 \n05:48\n06:12\n06:36\n07:00\n07:24\n07:48\n\n08:12 \n08:36\n09:00\n09:24\n09:48\n10:12\n10:36\n11:00 \n11:24\n11:48\n12:12\n12:36\n13:00\n13:24\n13:48 \n14:12\n14:36\n15:00\n15:24\n15:48\n16:12\n16:36 \n17:00\n17:24\n17:48\n18:12\n18:36\n19:00\n19:48 \n20:36\n21:24\n22:26\n23:15\n00:00\n"
I'd like to replace every \n\n occurrence with a single \n and, if possible, also remove all " " (spaces) between the numbers and the newline character \n.
I'm trying to do:
string.gsub(/\n\n/, '\n')
but it is replacing \n\n by \\n
Can anyone help me?
The real reason is that a single-quoted string doesn't interpret escape sequences (like \n).
string.gsub(/\n/, '\n')
This replaces the single character \n with the two characters '\' and 'n'.
You can see the difference by printing the string:
[302] pry(main)> puts '\n'
\n
=> nil
[303] pry(main)> puts "\n"
=> nil
[304] pry(main)> string = '\n'
=> "\\n"
[305] pry(main)> string = "\n"
=> "\n"
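The same difference can be seen by comparing lengths and by running both replacements on a short string (a small sketch; the constant names are made up):

```ruby
SINGLE = '\n'  # two characters: a backslash and the letter n
DOUBLE = "\n"  # one character: a line feed

p SINGLE.length                #=> 2
p DOUBLE.length                #=> 1
p "a\n\nb".gsub(/\n\n/, "\n")  #=> "a\nb"
p "a\n\nb".gsub(/\n\n/, '\n')  #=> "a\\nb"
```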
I think you're looking for:
string.gsub( / *\n+/, "\n" )
This searches for zero or more spaces followed by one or more newlines, and replaces the match with a single newline.
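On a shortened sample of the schedule string (TIMES is a made-up excerpt, not the full input) it behaves like this:

```ruby
TIMES = "05:24 \n05:48\n\n08:12 \n"

p TIMES.gsub(/ *\n+/, "\n")  #=> "05:24\n05:48\n08:12\n"
```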