Shorter way to remove non-characters than gsub(/\d|\W/, "") - ruby

src = "Here's the #: 49848! - but will dashes, commas & stars (*) show?"
puts src.gsub(/\d|\W/, "")
That is, can I get rid of the alternation ("|")?
Here's how I got here; can it be made any shorter?
src = "Here's the #: 49848! - but will dashes, commas & stars (*) show?"
puts "A) - " + src
puts "B) - " + src.gsub(/\d\s?/, "")
puts "C) - " + src.gsub(/\W\s?/, "")
puts "D) - " + src.gsub(/\d|\W\s?/, "")
puts "E) - " + src.gsub(/\d|\W/, "")
puts "F) - " + src
A) - Here's the #: 49848! - but will dashes, commas & stars (*) show?
B) - Here's the #: ! - but will dashes, commas & stars (*) show?
C) - Heresthe49848butwilldashescommasstarsshow
D) - Heresthebutwilldashescommasstarsshow
E) - Heresthebutwilldashescommasstarsshow
F) - Here's the #: 49848! - but will dashes, commas & stars (*) show?
N.B. D) and E) are the output I want: letters only.

my_string = "Here's the #: 49848! - but will dashes, commas & stars (*) show?"
p my_string.delete('^a-zA-Z')
#=>"Heresthebutwilldashescommasstarsshow"

I have this one
src.gsub(/[^a-z]/i, "")
It is not shorter either, but in my opinion it is easier to read.
The i modifier makes the regex case-insensitive, so that a-z also matches A-Z. A small difference is that this regex also removes _, which yours leaves in place.

If you also want to keep Unicode letters, use this one:
/\P{L}/
This matches any non-letter character.
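The variants suggested above differ only at the edges, namely underscores and non-ASCII letters. A minimal sketch of how each one behaves, using a made-up sample string that is not from the question:

```ruby
# Hypothetical sample input; "\u00E9" is an accented e.
sample = "H\u00E9llo_World 42!"

# \W does not match "_", so the underscore survives.
original  = sample.gsub(/\d|\W/, "")
# Keeps ASCII letters only.
by_delete = sample.delete('^a-zA-Z')
# Same result as delete for this input.
by_class  = sample.gsub(/[^a-z]/i, "")
# Keeps every Unicode letter, including the accented one.
by_prop   = sample.gsub(/\P{L}/, "")

puts original, by_delete, by_class, by_prop
```

Which one is "right" depends on whether underscores and accented letters should count as characters to keep.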

Related

How to describe a quoted string in EBNF

How do I describe a quoted string (like in C, Java, etc) in EBNF notation?
I was thinking of this (see below), but the AnyCharacter part will also match the double quotes (").
QuotedString = '"' AnyCharacter* '"' ;
In other words, how do I match all characters except the double quote character ("), but still allow escapes (\")?
You could do something like
string = " printable-chars | nested-quotes "
where
printable chars = letter | digit | ~ # # % _ $ & ' - + /
where
letter = A..Z | a..z | extended ascii
and
digit = 0..9
I think you've got the general idea
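The rule being asked for, "any character except an unescaped double quote", can be tried out as a Ruby regular expression before it is written down as a grammar. This is only a regex analogue, not EBNF, and it assumes that a backslash escapes whatever character follows it; the constant name is made up for illustration.

```ruby
# A quoted string: an opening quote, then any run of either
#   - a character that is neither a quote nor a backslash, or
#   - a backslash followed by any one character (an escape such as \"),
# and finally the closing quote.
QUOTED_STRING = /"(?:[^"\\]|\\.)*"/

# In a single-quoted Ruby literal, \" stays as backslash + quote.
text = 'say "a \"b\" c" now'
puts text[QUOTED_STRING]
```

In grammar terms, the two branches inside the group correspond to two alternatives for a string character: a plain character, or an escape sequence.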

How do I write a regex that eliminates the space between a number and a colon?

I want to replace a space between one or two numbers and a colon followed by a space, a number, or the end of the line. If I have a string like,
line = " 0 : 28 : 37.02"
the result should be:
" 0: 28: 37.02"
I tried as below:
line.gsub!(/(\A|[ \u00A0|\r|\n|\v|\f])(\d?\d)[ \u00A0|\r|\n|\v|\f]:(\d|[ \u00A0|\r|\n|\v|\f]|\z)/, '\2:\3')
# => " 0: 28 : 37.02"
It seems to match the first ":", but the second ":" is not matched. I can't figure out why.
The problem
I'll define your regex with comments (in free-spacing mode) to show what it is doing.
r =
/
( # begin capture group 1
\A # match beginning of string (or does it?)
| # or
[ \u00A0|\r|\n|\v|\f] # match one of the characters in the string " \u00A0|\r\n\v\f"
) # end capture group 1
(\d?\d) # match one or two digits in capture group 2
[ \u00A0|\r|\n|\v|\f] # match one of the characters in the string " \u00A0|\r\n\v\f"
: # match ":"
( # begin capture group 3
\d # match a digit
| # or
[ \u00A0|\r|\n|\v|\f] # match one of the characters in the string " \u00A0|\r\n\v\f"
| # or
\z # match the end of the string
) # end capture group 3
/x # free-spacing regex definition mode
Note that '|' is not a special character ("or") within a character class. It's treated as an ordinary character. (Even if '|' were treated as "or" within a character class, that would serve no purpose because character classes are used to force any one character within it to be matched.)
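A small sketch of that point, with made-up sample strings: inside a character class the pipe is matched literally, so a whitespace class needs no separators at all.

```ruby
# "|" inside [...] is just another member of the class.
with_pipe = /[a|b]/
p "a|b".scan(with_pipe)

# The whitespace class from the question, written without the stray pipes.
whitespace = /[ \u00A0\r\n\v\f]/
p "x y\u00A0z|w".gsub(whitespace, "_")
```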
Suppose
line = "   0 : 28 : 37.02"
Then
line.gsub(r, '\2:\3')
#=> "  0: 28 : 37.02"
$1 #=> " "
$2 #=> "0"
$3 #=> " "
In capture group 1, \A is a zero-width anchor rather than a character. It contributes nothing here because the string begins with whitespace, not a digit, so the attempt that starts at the beginning of the string fails at (\d?\d). The alternation ('|') then has the regex engine try to match one character of the string " \u00A0|\r\n\v\f", which means group 1 can only match one of the three spaces at the beginning of line.
Next capture group 2 captures "0". For it to do that, capture group 1 must have captured the space at index 2 of line. Then one more space and a colon are matched, and lastly, capture group 3 takes the space after the colon.
The substring ' 0 : ' is therefore replaced with '\2:\3' #=> '0: ', so gsub returns "  0: 28 : 37.02". Notice that one space before '0' was removed (but should have been retained). Because group 3 consumed the space after the first colon, no whitespace is left for group 1 to match in front of "28", which is why the second colon is never matched.
A solution
Here's how you can remove the last of one or more Unicode whitespace characters that are preceded by one or two digits (and not more) and are followed by a colon at the end of the string or a colon followed by a whitespace or digit. (Whew!)
def trim(str)
  str.gsub(/\d+[[:space:]]+:(?![^[:space:]\d])/) do |s|
    s[/\d+/].size > 2 ? s : s[0,s.size-2] << ':'
  end
end
The regular expression reads, "match one or more digits followed by one or more whitespace characters, followed by a colon (all these characters are matched), not followed (negative lookahead) by a character other than a unicode whitespace or digit". If there is a match, we check to see how many digits there are at the beginning. If there are more than two the match is returned (no change), else the whitespace character before the colon is removed from the match and the modified match is returned.
trim " 0 : 28 : 37.02"
#=> " 0: 28: 37.02"
trim " 0\v: 28 :37.02"
#=> " 0: 28:37.02"
trim " 0\u00A0: 28\n:37.02"
#=> " 0: 28:37.02"
trim " 123 : 28 : 37.02"
#=> " 123 : 28: 37.02"
trim " A12 : 28 :37.02"
#=> " A12: 28:37.02"
trim " 0 : 28 :"
#=> " 0: 28:"
trim " 0 : 28 :A"
#=> " 0: 28 :A"
If, as in the example, the only characters in the string are digits, whitespace and colons, the lookahead is not needed.
You can use Ruby's \p{} construct, \p{Space}, in place of the POSIX expression [[:space:]]. Both match a class of Unicode whitespace characters, including those shown in the examples.
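A quick check of that equivalence, as a sketch that assumes Ruby 2.4 or later for String#match?: both classes accept the whitespace characters used in the examples, whereas \s (included here only for contrast) is limited to ASCII whitespace in Ruby and misses the no-break space.

```ruby
# The whitespace characters that appear in the trim examples.
samples = [" ", "\v", "\n", "\u00A0"]

posix    = samples.map { |c| c.match?(/[[:space:]]/) }
property = samples.map { |c| c.match?(/\p{Space}/) }
ascii    = samples.map { |c| c.match?(/\s/) }

p posix, property, ascii
```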
A third digit can be excluded with a negative lookbehind, but because the one or two digits to keep are of variable length, a positive lookbehind cannot be used for that part, so they are captured instead.
line.gsub(/(?<!\d)(\d{1,2}) (?=:(?:[ \d]|$))/, '\1')
# => " 0: 28: 37.02"
" 0 : 28 : 37.02".gsub!(/(\d)(\s)(:)/,'\1\3')
=> " 0: 28: 37.02"

How do I find any space before "."

I have names "example .png" and "example 2.png". I am trying to convert any space to "_" and any space before "." should be removed.
So far I am doing it like this:
file.gsub(" .",".").gsub(" ", "_").gsub(".tif", "")
Use an rstripped File.basename(filename,File.extname(filename)) and replace spaces with underscores inside it then add an extname:
File.basename(filename,File.extname(filename)).rstrip.gsub(" ", "_") + File.extname(filename)
Details:
File.basename(filename,File.extname(filename)) - get file name without extension
.rstrip - remove whitespace before the extension
.gsub(" ", "_") - replaces spaces with underscores (use the /\s+/ regex instead to replace any run of whitespace)
File.extname(filename) - a file extension.
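The steps above can be wrapped in a small method. This is a sketch: the method name is made up, and the body is the one-liner from the answer split into named steps.

```ruby
# Hypothetical helper; not part of the original answer.
def normalize_filename(filename)
  # Extension including the dot, e.g. ".png".
  ext = File.extname(filename)
  # Name without the extension (File.basename also drops any directory part),
  # with the whitespace in front of the extension removed.
  stem = File.basename(filename, ext).rstrip
  # Remaining spaces become underscores.
  stem.gsub(" ", "_") + ext
end

puts normalize_filename("example .png")
puts normalize_filename("example 2.png")
```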
If you prefer a regex way:
s = 'some example 2 .png'
puts s.gsub(/\s+(\.[^.]+\z)|\s/) {
  Regexp.last_match(1) ?
    Regexp.last_match(1) :
    "_"
}
(can be shortened to s.gsub(/\s+(\.[^.]+\z)|\s/) { $1 || "_" } (see Jordan's remark)).
Here, the pattern matches:
\s+(\.[^.]+\z) - 1 or more whitespaces (\s+) before the extension (\.[^.]+ - a dot followed with 1+ chars other than a dot before the end of string \z), while capturing the extension into Group 1
| - or
\s - any other whitespace symbol (add + after it if you need to replace whole whitespace chunks with underscores).
In the gsub block, a check is performed to test Group 1, and if it matched, only the extension is inserted into the result. Else, a whitespace is replaced with an underscore.

Ruby - How to remove space after some characters?

I need to remove white spaces after some characters, not all of them. I want to remove white spaces after these chars: I, R, P, O. How can I do it?
"I ".gsub(/(?<=[IRPO]) /, "") # => "I"
"A ".gsub(/(?<=[IRPO]) /, "") # => "A "
" P $ R 3I&".gsub(/([IRPO])\s+/,'\1')
#=> " P$ R3I&"
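The two answers agree when exactly one space follows the letter, but they differ on runs of whitespace. A brief sketch with a made-up input:

```ruby
# Two spaces after P, a tab after R (hypothetical input).
input = "P  x R\ty"

# Lookbehind version: removes one space that directly follows the letter.
lookbehind = input.gsub(/(?<=[IRPO]) /, "")
# Capture version: removes every whitespace character after the letter.
captured   = input.gsub(/([IRPO])\s+/, '\1')

p lookbehind, captured
```

Writing the lookbehind pattern with \s+ in place of the single space would make both behave the same way.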

Ruby - how to remove some chars from string?

I have following strings:
" asfagds gfdhd"sss dg "
"sdg "dsg "
desired output:
asfagds gfdhd"sss dg
sdg "dsg
(Empty spaces removed from the front and end of the strings, as well as leading and trailing double quotes.)
I have a big file with these lines and I need to format them to suit our needs. How can I remove the " from the start and end of each line, along with the white space at the start and end of each line?
Use string.strip or string.strip!.
" asfagds gfdhd\"sss dg ".strip
#=> "asfagds gfdhd\"sss dg"
Be aware that strip removes all leading and trailing whitespace (e.g. tabs, newlines), not just spaces.
If you want to remove just spaces use:
string.gsub(/^ *| *$/, '')
If you want to remove " as well:
string.gsub(/^" *| *"$/, '')
If the data in the file is clean and uniform, then this should do
'" asfagds gfdhd"sss dg "'[1..-2].strip
If the data is not clean, you may need to strip first as well (i.e. if there are trailing spaces after the closing quotation mark).
'" asfagds gfdhd"sss dg "'.strip[1..-2].strip
Really depends on how clean the data in the file is.
Use strip:
" Hello World ".strip #=> "Hello World"
Or to only strip from the left/right use lstrip and rstrip respectively.
One liner:
irb> '" asfagds gfdhd"sss dg "'[1..-2].strip
=> "asfagds gfdhd\"sss dg"
take the [1,n-1] substring, remove whitespace
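Putting the suggestions together, here is a minimal sketch for cleaning one line of the file. The method name is made up, and delete_prefix/delete_suffix assume Ruby 2.5 or later; unlike the [1..-2] slice, they leave a line untouched when a quote is missing.

```ruby
# Hypothetical helper; not from any of the answers above.
def clean_line(line)
  # 1. strip whitespace around the quoted text
  # 2. drop one leading and one trailing double quote, if present
  # 3. strip the whitespace that was inside the quotes
  line.strip.delete_prefix('"').delete_suffix('"').strip
end

puts clean_line('" asfagds gfdhd"sss dg "')
puts clean_line('"sdg "dsg "')
```

For a whole file, the same method could be applied to each line in turn, for example from a File.foreach block.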

Resources