Gsub to keep all but 1 character - gsub

I have been trying to substitute "." in my data with ":", and keeping the trailing 00's, but upon switching out the decimals, my trailing zeroes disappear. Is it possible to switch only the ".", while keeping everything else exactly the same? I have been using:
gsub("\\.", ":",df[,1])

You need to use format to convert the numeric column to character first:
gsub("\\.",":",format(df[ ,1]))

Related

Replace non-word characters, unless given sequence matches

I have a string like this:
"Jim-Bob's email ###hl###address###endhl### is: jb#example.com"
I want to replace all non-word characters (symbols and whitespace), except the ### delimiters.
I'm currently using:
str.gsub(/[^\w#]+/, 'X')
which yields:
"JimXBobXsXemailX###hl###address###endhl###XisXjb#exampleXcom"
In practice, this is good enough, but it offends me for two reasons:
The # in the email address is not replaced.
The use of [^\w] instead of \W feels sloppy.
How do I replace all non-word characters, unless those characters make up the ###hl### or ###endhl### delimiter strings?
str.gsub(/(###.*?###|\w+)|./) { $1 || "X" }
# => "JimXBobXsXemailX###hl###address###endhl###XisXXjbXexampleXcom"
This approach uses the fact that alternations work like a case structure: the first alternative that matches consumes the corresponding string, and no further matching is done on it. Thus, ###.*?### will consume a marker (like ###hl###); nothing else will be matched inside it. We also match any sequence of word characters. If either of those is captured, we can just return it as-is ($1). If not, then we have matched some other character (i.e. not inside a marker, and not a word character) and we replace it with "X".
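As a minimal sketch, the same regex can be wrapped in a small method. The name mask_non_word and the replacement parameter are made up for illustration; only the pattern comes from the answer. One caveat: without the /m flag, the dot does not match a newline, so newlines in the input would pass through untouched.

```ruby
# Hypothetical helper; the pattern itself is the one from the answer above.
def mask_non_word(str, replacement = "X")
  # First alternative: capture a whole ###...### marker or a run of word characters.
  # Second alternative (the bare dot): any other single character, nothing captured.
  str.gsub(/(###.*?###|\w+)|./) { $1 || replacement }
end

input = "Jim-Bob's email ###hl###address###endhl### is: jb#example.com"
puts mask_non_word(input)
```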
Regarding your second point, I think you are asking too much; there is no simple way to avoid that.
Regarding the first point, a simple way is to temporarily replace "###" with a character that you will never use (let's say you are using a system without "\r", so that character is not used; we can use it as a temporary replacement).
"Jim-Bob's email ###hl###address###endhl### is: jb#example.com"
.gsub("###", "\r").gsub(/[^\w\r]/, "X").gsub("\r", "###")
# => "JimXBobXsXemailX###hl###address###endhl###XisXXjbXexampleXcom"
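The placeholder trick only holds if "\r" really never occurs in the input. A hedged sketch that makes that assumption explicit (the method name and the ArgumentError check are my additions, not part of the answer):

```ruby
PLACEHOLDER = "\r" # assumption: this character never appears in real input

# Hypothetical helper wrapping the three-step substitution shown above.
def mask_with_placeholder(str)
  raise ArgumentError, "input already contains the placeholder" if str.include?(PLACEHOLDER)

  str.gsub("###", PLACEHOLDER)      # hide the delimiters
     .gsub(/[^\w\r]/, "X")          # replace everything except word chars and the placeholder
     .gsub(PLACEHOLDER, "###")      # restore the delimiters
end

puts mask_with_placeholder("Jim-Bob's email ###hl###address###endhl### is: jb#example.com")
```

Note that this keeps any "###" run, not only well-formed marker pairs.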

How would I get this portion of the string?

Here's my string:
http://media.example.com.s3.amazonaws.com/videos/1/123ab564we65a16a5w_web.m4v
I want this: 123ab564we65a16a5w
The only variables that will change here are the /1/ and the unique key that I'm trying to pull. Everything else will be exactly the same.
For the /1/ portion, that 1 could be anywhere from 1-3 digits, but will always be numeric.
I'm running Ruby 1.9.2.
Assuming nothing else changes, here's the regex for it:
http://media\.example\.com\.s3\.amazonaws\.com/videos/\d{1,3}/(.*)_web\.m4v
If there are other changes, you need to let us know all the variables.
This is shorter -
s.split(/[\/_.]/)[-3]
Since you've indicated that the value you want will always have a "/" immediately before it (and none after it) and an "_" immediately after it, you could use this generic regex:
^.*/(.*)_.*$
Here's why this would work:
^ matches the beginning of the line
.*/ matches any number of characters up to the slash - this is greedy, so it will go until the last slash in the input value
(.*) matches any number of characters and captures the result
_.* matches an underscore and then any number of characters
$ matches the end of the line
By matching anything up to the last "/" and then anything after the "_", you easily isolate the desired value.
NOTE: I don't know if the Ruby regex syntax is any different than this, so your mileage may vary.
--
EDIT: It looks like in Ruby, you might not need/want the ^ or $ at the beginning and end.
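For what it's worth, here is a rough Ruby sketch of the capture-group idea, assuming the URL layout stays exactly as described in the question. The constant name KEY_PATTERN is made up. %r{...} is used so the slashes need no escaping, and \z anchors at the end of the string.

```ruby
url = "http://media.example.com.s3.amazonaws.com/videos/1/123ab564we65a16a5w_web.m4v"

# Hypothetical pattern: 1-3 digits after /videos/, then the key, then the fixed suffix.
KEY_PATTERN = %r{/videos/\d{1,3}/(.+)_web\.m4v\z}

# String#[] with a regex and a group index returns that capture (or nil if no match).
key = url[KEY_PATTERN, 1]
puts key

# Alternative without matching the path: take the file name, drop ".m4v", then "_web".
alt = File.basename(url, ".m4v").sub(/_web\z/, "")
puts alt
```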

In Ruby, what's the easiest way to "chomp" at the start of a string instead of the end?

In Ruby, sometimes I need to remove the new line character at the beginning of a string. Currently what I did is like the following. I want to know the best way to do this. Thanks.
s = "\naaaa\nbbbb"
s.sub!(/^\n?/, "")
lstrip seems to be what you want (assuming trailing white space should be kept):
>> s = "\naaaa\nbbbb" #=> "\naaaa\nbbbb"
>> s.lstrip #=> "aaaa\nbbbb"
From the docs:
Returns a copy of str with leading whitespace removed. See also
String#rstrip and String#strip.
http://ruby-doc.org/core-1.9.3/String.html#method-i-lstrip
strip will remove all leading and trailing whitespace:
s = "\naaaa\nbbbb"
s.strip!
Little hack to chomp leading whitespace:
str = "\nmy string"
chomped_str = str.reverse.chomp.reverse
To be perfectly accurate, chomp can delete not only whitespace from the end of a string, but also arbitrary characters.
If the latter functionality is sought, one can use (Ruby 2.5 or later):
"\naaaa\nbbbb".delete_prefix("\n")
As opposed to strip, this works for arbitrary characters, exactly like chomp.
So, just for a bit of clarification, there are three ways that you can go about this: sub, reverse.chomp.reverse and lstrip.
I'd recommend against sub because it's a bit less readable, but also because of how it works: it builds a new string from your old one. Plus you need a regular expression for something that's fairly simple.
So then you're down to reverse.chomp.reverse and lstrip. Most likely, you want lstrip because it's a bit faster, but keep in mind that the strip operations are not the same as the chomp operations. strip will remove all leading newlines and whitespace:
"\n aaa\nbbb".reverse.chomp.reverse # => " aaa\nbbb"
"\n aaa\nbbb".lstrip # => "aaa\nbbb"
If you want to make sure you only remove one character and that it's definitely a newline, use the reverse.chomp.reverse solution. If you consider all leading newlines and whitespace garbage, go with lstrip.
The one case I can think of for using regular expressions would be if you have an unknown number of \rs and \ns at the beginning and want to trim them all but avoid touching any other whitespace. You could use a loop and more String methods for trimming, but it would just be uglier. The performance implications don't really matter that much.
s.sub(/^[\n\r]*/, '')
This removes leading newlines (carriage returns and line feeds, as in chomp), not any whitespace.
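One detail worth knowing: in Ruby, ^ matches at the start of every line, while \A matches only at the start of the whole string. With sub the first match is at position 0 either way, but \A states the intent more directly. A small sketch with \A swapped in (the method name is made up):

```ruby
# Hypothetical helper; \A replaces ^ so the anchor is the start of the string, not of a line.
def strip_leading_newlines(str)
  str.sub(/\A[\r\n]+/, "")
end

s = "\r\n\n aaa\nbbb"
p strip_leading_newlines(s) # the space before "aaa" survives
p s.lstrip                  # lstrip removes that space as well
```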
Not sure if it's the best way but you could try:
s.reverse.chomp.reverse
if you want to leave the trailing newline (if it exists).
This should work for you: s.strip.
A way to do this for whitespace or non-whitespace characters is like this:
s = "\naaaa\nbbbb"
s.slice!("\n") # returns "\n" but s also has the first newline removed.
puts s # shows s has the first newline removed

How can I strip tab characters from a string in Ruby?

I have a program that loads some tab-separated lines into a MySQL table. One of the values has tabs in it, which is causing some problems. The data is created column by column, so I need to find a way to strip the tab character out of an individual field with gsub. I do not, however, want to get rid of anything else, like spaces.
It's really easy: \t is the tab character.
result = string.gsub(/\t/, '')
or, in place:
string.gsub!(/\t/, '')
\t is the escape sequence for a tab within double-quoted strings. So you can just search for "\t" and replace it with a space or something.
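For a single literal character, a regex is optional. A small sketch (the sample field is made up) using String#delete to drop tabs and String#tr to swap them for spaces, both of which leave ordinary spaces alone:

```ruby
field = "first\tsecond  third\t" # made-up sample value containing tabs and spaces

no_tabs = field.delete("\t")   # removes every tab, keeps spaces
spaced  = field.tr("\t", " ")  # replaces every tab with a single space

p no_tabs
p spaced
```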

gsub ASCII code characters from a string in ruby

I am using nokogiri to screen scrape some HTML. In some occurrences, I am getting some weird characters back, I have tracked down the ASCII code for these characters with the following code:
@parser.leads[0].phone_numbers[0].each_byte do |c|
  puts "char=#{c}"
end
The characters in question have an ASCII code of 194 and 160.
I want to somehow strip these characters out while parsing.
I have tried the following code but it does not work.
@parser.leads[0].phone_numbers[0].gsub(/160.chr/,'').gsub(/194.chr/,'')
Can anyone tell me how to achieve this?
I found this question while trying to strip out invisible characters when "trimming" a string.
s.strip did not work for me, and I found that the invisible character had the ord number 194.
None of the methods above worked for me, but then I found the question "Convert non-breaking spaces to spaces in Ruby", which says:
Use /\u00a0/ to match non-breaking spaces: s.gsub(/\u00a0/, ' ') converts all non-breaking spaces to regular spaces
Use /[[:space:]]/ to match all whitespace, including Unicode whitespace like non-breaking spaces. This is unlike /\s/, which matches only ASCII whitespace.
So glad I found that! Now I'm using:
s.gsub(/[[:space:]]/,'')
This doesn't answer the question of how to gsub specific character codes, but if you're just trying to remove whitespace it seems to work pretty well.
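To tie this back to the byte values in the question: 194 and 160 (0xC2 0xA0) are the two bytes of the UTF-8 encoding of a single character, the non-breaking space U+00A0. A short sketch, using a made-up phone string and assuming UTF-8 input:

```ruby
nbsp = "\u00A0"
p nbsp.bytes # the two bytes reported by each_byte in the question

phone = "555\u00A0123 4567" # made-up sample: one non-breaking space, one regular space

with_spaces = phone.gsub(/\u00A0/, " ")      # turn the NBSP into a regular space
no_space    = phone.gsub(/[[:space:]]/, "")  # drop all whitespace, Unicode included
ascii_only  = phone.gsub(/\s/, "")           # \s leaves the NBSP in place

p with_spaces
p no_space
p ascii_only
```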
Your problem is that you want to do a method call but instead you're creating a Regexp. You're searching and replacing strings consisting of the string "160" followed by any character and then the string "chr", and then doing the same except with "160" replaced with "194".
Instead, do gsub(160.chr, '').
Update (2018): This code does not work in current Ruby versions. Please refer to other answers.
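A likely reason, as far as I can tell, is that 160.chr without an argument produces a one-byte binary string rather than a UTF-8 character, and 194 is not a separate character at all but the first byte of the same non-breaking space. A hedged sketch that builds the character from its code point instead, by passing an encoding to Integer#chr (the sample string is made up):

```ruby
nbsp = 160.chr(Encoding::UTF_8) # the single character U+00A0, encoded as UTF-8

phone = "555\u00A01234"         # made-up sample value
cleaned = phone.gsub(nbsp, "")  # a String pattern is matched literally

p nbsp.bytes
p cleaned
```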
You can also try
s.gsub(/\xA0|\xC2/, '')
or
s.delete 160.chr+194.chr
My first thought: should you be using gsub! instead of gsub?
gsub returns a new string, while gsub! performs the substitution in place.
I was getting an "invalid multibyte escape" error while trying the above solution, but in a different situation. Google was returning \xA0 when the number was greater than 999 and I wanted to remove it. So what I did was use return_value.gsub(/[\xA0]/n,"") instead, and it worked perfectly fine for me.
