Beginner here, obviously. I need to concatenate a sum and a string, and in the result I have to replace every 4th character with an underscore. The end result should look something like this: 160_bws_np8_1a
I think .gsub is the way, but I can't find a way to write the pattern in .gsub so that it targets every 4th character.
total = (1..num).sum
final_output = "#{total.to_s}" + "06bwsmnp851a"
return final_output.gsub(//, "_")
This would work:
s = '12345678901234'
s.gsub(/(...)./, '\1_')
#=> "123_567_901_34"
The regex matches 3 characters (...) that are captured (parentheses) followed by another character (.). Each match is replaced by the first capture (\1) and a literal underscore (_).
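Applied to the string from the question, the same substitution can be wrapped in a small helper. This is only a sketch: the method name build_code and the choice of num = 5 are made up for illustration, while the suffix "06bwsmnp851a" comes from the question.

```ruby
# Hypothetical helper: sum 1..num, append the fixed suffix from the question,
# then replace every 4th character with an underscore.
def build_code(num)
  total = (1..num).sum
  combined = "#{total}06bwsmnp851a"
  # (...) captures three characters; the following . is the one replaced.
  combined.gsub(/(...)./, '\1_')
end

puts build_code(5)   # (1..5).sum is 15, so this prints 150_bws_np8_1a
```

Interpolation already calls to_s, so "#{total}" is enough and the explicit to_s from the question is not needed.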
s = "12345678901234"
Here are two ways to do that. Both return
"123_567_901_34"
Match every four-character substring and replace the match with the first three characters of the match followed by an underscore
s.gsub(/.{4}/) { |s| s[0,3] << '_' }
Chain the enumerator s.gsub(/./) to Enumerator#with_index and replace every fourth character with an underscore
s.gsub(/./).with_index { |c,i| i%4 == 3 ? '_' : c }
See the form of String#gsub that takes a single argument and no block.
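The enumerator approach generalises easily to "every nth character". A minimal sketch, assuming a hypothetical helper named replace_every_nth (not part of Ruby itself):

```ruby
# Hypothetical generalisation: replace every nth character of str.
def replace_every_nth(str, n, replacement = '_')
  # gsub without a block returns an Enumerator; with_index(1) numbers the
  # matched characters from 1, and the block's value becomes the replacement.
  str.gsub(/./m).with_index(1) { |char, i| (i % n).zero? ? replacement : char }
end

puts replace_every_nth('12345678901234', 4)  # 123_567_901_34
puts replace_every_nth('12345678901234', 3)  # 12_45_78_01_34
```

Counting from 1 with with_index(1) lets the condition read as "index divisible by n" instead of i % 4 == 3.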
I have a string, and I would like to replace all special characters with underscores.
In other words, I just want the 26 English letters (lower and upper case), 0-9 and the "_" character.
Also note that there are non-English characters, and they need to be replaced with "_" as well.
What is the most elegant way to do this in Ruby?
It sounds like you want to replace all non-word characters with underscores. Therefore,
result = subject.gsub(/[^\w]/, '_')
But are you okay that this would also replace newlines and other whitespace characters?
If not, change it to
result = subject.gsub(/[^\w\s]/, '_')
Explain Regex
[^\w\s]   # any character except: word characters (a-z, A-Z, 0-9, _)
          # and whitespace (\n, \r, \t, \f, and " ")
Note
As @CarySwoveland mentions, the [^\w] can also be written with the shorthand \W.
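On the non-English characters from the question: in Ruby, \w matches only the ASCII set [a-zA-Z0-9_] by default, so accented letters already count as non-word characters and get replaced. A small sketch with an invented sample string:

```ruby
# Sample input invented for illustration; \u00E9 is "é" and \u00F6 is "ö".
subject = "h\u00E9llo w\u00F6rld!"

puts subject.gsub(/\W/, '_')        # h_llo_w_rld_
puts subject.gsub(/[^\w\s]/, '_')   # h_llo w_rld_
```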
What is the opposite of Regexp.escape ?
> Regexp.escape('A & B')
=> "A\\ &\\ B"
> # do something, to get the next result: (something like Regexp.unescape(A\\ &\\ B))
=> "A & B"
How can I get the original value?
replaces = Hash.new { |hash,key| key } # simple trick to return key if there is no value in hash
replaces['t'] = "\t"
replaces['n'] = "\n"
replaces['r'] = "\r"
replaces['f'] = "\f"
replaces['v'] = "\v"
rx = Regexp.escape('A & B')
str = rx.gsub(/\\(.)/){ replaces[$1] }
Also make sure to #puts output in irb, because #inspect escapes characters by default.
Basically, escaping (quoting) looks for metacharacters and prepends a \ to each of them (in source code that backslash must itself be escaped inside a string literal). For the control characters \t, \n, \r, \f and \v, quoting instead outputs a \ followed by the corresponding ASCII letter (t, n, r, f or v).
UPDATE:
My solution had problems with special characters (\n, \t and so on); I updated it after investigating the source code of the rb_reg_quote method.
UPDATE 2:
replaces is a hash that converts escaped characters to unescaped ones (that's why it is used in the block attached to gsub). It is indexed by the character without the escape character (the second character in the sequence) and looks up the unescaped value. The only defined values are control characters, but there is also a default_proc attached (the block passed to Hash.new), which returns the key if no value is found in the hash. So it works like this:
for "n" it returns "\n", the same for all other escaped control characters, because it is value associated with key
for "(" it returns "(", because there is no value associated with "(" key, hash calls #default_proc, which returns key itself
The only characters escaped by Regexp.escape are meta characters and control characters, so we don't have to worry about alphanumerics.
Take a look at http://ruby-doc.org/core-2.0.0/Hash.html#method-i-default_proc for documentation on #default_proc
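The pieces above can be put together and checked with a round trip through Regexp.escape. This is a sketch only: regexp_unescape and CONTROL_CHARS are hypothetical names (Ruby has no built-in Regexp.unescape), and Hash#fetch with a default stands in for the default_proc trick.

```ruby
# Hypothetical helper wrapping the approach above.
CONTROL_CHARS = { 't' => "\t", 'n' => "\n", 'r' => "\r", 'f' => "\f", 'v' => "\v" }.freeze

def regexp_unescape(escaped)
  # For each backslash + character pair, emit the control character if the
  # second character names one, otherwise the second character itself.
  escaped.gsub(/\\(.)/) { CONTROL_CHARS.fetch($1, $1) }
end

original = "A & B\t(x)*\n[y] \\ z"
escaped  = Regexp.escape(original)
puts regexp_unescape(escaped) == original   # true
```

The sample string deliberately contains a tab, a newline and a literal backslash, since those are the cases a plain "delete every backslash" approach gets wrong.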
You can perhaps use something like this?
def unescape(s)
eval %Q{"#{s}"}
end
puts unescape('A\\ &\\ B')
Credits to this question.
If you are okay with a regex solution, you can use this:
res = s.gsub(/\\(?!\\)|(\\)\\/, "\\1")
Try this
>> r = Regexp.escape("A & B (and * c [ e] + )")
# => "A\\ &\\ B\\ \\(and\\ \\*\\ c\\ \\[\\ e\\]\\ \\+\\ \\)"
>> r.gsub("\\(","(").gsub("\\)",")").gsub("\\[","[").gsub("\\]","]").gsub("\\{","{").gsub("\\}","}").gsub("\\.",".").gsub("\\?","?").gsub("\\+","+").gsub("\\*","*").gsub("\\ "," ")
# => "A & B (and * c [ e] + )"
Basically, these (, ), [, ], {, }, ., ?, +, * are the meta characters in regex. And also \ which is used as an escape character.
The chain of gsub() calls replaces each escaped pattern with the corresponding actual character.
I am sure there is a way to DRY this up.
Update: DRY version as suggested by user2503775
>> r.gsub("\\","")
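Note that gsub("\\", "") deletes every backslash, including one that was part of the original text. A possible middle ground, sketched here under the assumption that only the metacharacters listed above and the space need unescaping, is a single gsub with a character class and a backreference:

```ruby
r = Regexp.escape("A & B (and * c [ e] + )")

# Strip the backslash only when it precedes one of the listed characters;
# \1 re-inserts the character that followed it.
unescaped = r.gsub(/\\([\[\]{}().?+* ])/, '\1')
puts unescaped   # A & B (and * c [ e] + )

# For comparison: deleting all backslashes loses a literal one.
lossy = Regexp.escape('a\\b').gsub("\\", "")
puts lossy       # ab
```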
Update:
following are the special characters in regex
[,],{,},(,),|,-,*,.,\\,?,+,^,$,<space>,#,\t,\f,\v,\n,\r
using a regex replace using \\(?=([\\\*\+\?\|\{\[\(\)\^\$\.\#\ ]))\
should give you the unescaped string; you would only have to replace the \r\n sequences with their CR LF counterparts.
"There\ is\ a\ \?\ after\ the\ \(white\)\ car\.\ \r\n\ it\ should\ be\ http://car\.com\?\r\n"
is unescaped to :
"There is a ? after the (white) car. \r\n it should be http://car.com?\r\n"
and removing the \r\n gives you :
There is a ? after the (white) car.
it should be http://car.com?
I would like to remove all non-alphanumeric characters from a string, except space, - and some German characters.
Example
regexp = "mönchengladbach."
regexp.gsub(/[^0-9a-z \-]/i, '')
=> mnchengladbach
I need this:
=> mönchengladbach
It should also not replace other German characters such as:
ä ö ü ß
Thanks!
Edit:
It was just me not testing properly. The IRB did not accept special characters. This works for me:
regexp.gsub(/[^0-9a-z \-äüöß]/i, '')
To remove all that is not a letter or a space you can use this:
str.gsub(/[^\p{L}\s]+/, '')
Here I use a negated character class: [^\p{L}\s] means anything that is not a letter (in any language) or a whitespace character (space, tab, newline).
\p{L} is a Unicode character class for letters.
You can easily add other characters you want to preserve like -:
str.gsub(/[^\p{L}\s-]+/, '')
example script:
# encoding: UTF-8
str = "mönchengladbach."
str = str.gsub(/[^\p{L}\s]+/, '#')
puts str
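Since the question asks for alphanumerics, spaces and hyphens, the class can be extended with \p{N} for digits. A small sketch; the second sample string is invented for illustration:

```ruby
# \u00F6 is "ö" and \u00DF is "ß"; the sample address is invented.
city    = "m\u00F6nchengladbach."
address = "M\u00F6nchengladbach 41061, Stra\u00DFe-1!"

# Keep letters (\p{L}), digits (\p{N}), spaces and hyphens; drop the rest.
keep = /[^\p{L}\p{N} \-]+/

puts city.gsub(keep, '')      # mönchengladbach
puts address.gsub(keep, '')   # Mönchengladbach 41061 Straße-1
```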
I think you want:
/[^[:alnum:] -]/
Note that the /i flag is not necessary, and there is no need to escape - when it is at the end of a [] character class.
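In Ruby, POSIX bracket expressions such as [[:alnum:]] also cover non-ASCII letters and digits, which is why the umlauts survive without being listed. A quick check with an invented sample string:

```ruby
# \u00D6 is "Ö" and \u00E4 is "ä"; the sample text is invented.
text = "M\u00D6NCHEN-gl\u00E4dbach 42!?"

# Upper and lower case both survive without the /i flag.
puts text.gsub(/[^[:alnum:] -]/, '')   # MÖNCHEN-glädbach 42
```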
I want to remove any leading and trailing non-alphabetic character in my string.
For example, given ":----- pt-br:-", I want "pt-br".
Thanks
result = subject.gsub(/\A[\d_\W]+|[\d_\W]+\Z/, '')
will remove non-letters from the start and end of the string.
\A and \Z anchor the regex at the start/end of the string (^/$ would also match after/before a newline which is probably not what you want - but that might not matter in this case);
[\d_\W]+ matches one or more digits, the underscore or anything else that is not an alphanumeric character, leaving only letters.
| is the alternation operator.
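Wrapped in a method, the substitution can be tried on the string from the question. The name strip_non_letters and the second input are made up for illustration:

```ruby
# Hypothetical helper: strip leading and trailing non-letters.
def strip_non_letters(subject)
  subject.gsub(/\A[\d_\W]+|[\d_\W]+\Z/, '')
end

puts strip_non_letters(':----- pt-br:-')   # pt-br
puts strip_non_letters('123_abc_456')      # abc
```

The hyphen inside "pt-br" is untouched because neither alternative can match there: the first is tied to the start of the string, the second to its end.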
In Ruby 1.9.1:
":----- pt-br:-".partition( /[a-zA-Z](...)[a-zA-Z]/ )[1]
partition searches the pattern in the string and returns the part before it, the match, and the part after it.
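To see all three parts that partition returns, here is a sketch with a more general pattern. The .* between the two letter classes is an assumption standing in for "whatever lies between the first and the last letter"; it is not the pattern from the answer above.

```ruby
# The pattern here is an assumption: .* spans everything between the
# first and the last letter.
before, match, after = ':----- pt-br:-'.partition(/[a-zA-Z].*[a-zA-Z]/)

p before   # ":----- "
p match    # "pt-br"
p after    # ":-"
```

This pattern needs at least two letters to match. When nothing matches, partition returns the whole string followed by two empty strings.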
result = subject.gsub(/^[^a-zA-Z]+/, '').gsub(/[^a-zA-Z]+$/, '')
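The difference between the ^/$ anchors used here and \A/\z only shows up with multi-line input. A small comparison on an invented two-line string:

```ruby
subject = "--a--\n--b--"

# ^ and $ anchor at every line, so each line is trimmed.
per_line = subject.gsub(/^[^a-zA-Z]+/, '').gsub(/[^a-zA-Z]+$/, '')
p per_line      # "a\nb"

# \A and \z anchor only at the very start and end of the string.
whole = subject.gsub(/\A[^a-zA-Z]+|[^a-zA-Z]+\z/, '')
p whole         # "a--\n--b"
```

For single-line strings such as the one in the question, both variants give the same result.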