Method chaining chomp/chompbang - ruby

Why does chomp allow chaining but chomp! doesn't? For example:
"HELLO ".chomp.downcase
#> hello
"HELLO ".chomp!.downcase
#> nil
Another interesting example:
"100 ".chomp.to_i
#> 100
"100 ".chomp!.to_i
#> 0
Any ideas why this behavior occurs on the string, and why nil.to_i returns 0?

From the fine manual:
chomp(separator=$/) → new_str
Returns a new String with the given record separator removed from the end of str (if present). If $/ has not been changed from the default Ruby record separator, then chomp also removes carriage return characters (that is it will remove \n, \r, and \r\n).
and for chomp!:
chomp!(separator=$/) → str or nil
Modifies str in place as described for String#chomp, returning str, or nil if no modifications were made.
So neither chomp nor chomp! does what you think it does. Observe:
>> s = '100 '
=> "100 "
>> s.chomp
=> "100 "
>> s
=> "100 "
>> s.chomp!
=> nil
>> s
=> "100 "
So neither one cares about trailing spaces unless you tell it to; by default they only strip off trailing EOLs.
'100 '.chomp! returns nil because that's what the documentation says it does. No substitution was made so it returns nil.
Why does nil.to_i give you zero? Well, from the fine manual:
to_i → 0
Always returns zero.
That doesn't leave much room for ambiguity or interpretation.
I think you're actually after the strip family of methods rather than chomp:
String#lstrip
String#lstrip!
String#rstrip
String#rstrip!
String#strip
String#strip!
Those remove whitespace from the string.
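To make the difference concrete, here is a small sketch contrasting chomp with strip on the string from the question. The variable names are only illustrative.

```ruby
# chomp only removes a trailing record separator (newline / carriage return),
# while strip removes leading and trailing whitespace.
padded = "100 ".dup           # dup so the in-place call works on a mutable copy

chomped  = padded.chomp       # => "100 "  (there is no newline to remove)
banged   = padded.chomp!      # => nil     (no modification was made)
stripped = padded.strip       # => "100"

from_nil = nil.to_i           # => 0, NilClass#to_i always returns zero
parsed   = padded.strip.to_i  # => 100
newline  = "100\n".chomp      # => "100"   (this is the case chomp is meant for)
```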

Your question will vanish if you remember to provide an argument to chomp. Without one, chomp will only remove newlines and carriage returns, which are absent from your string. The bang augmented chomp returns nil because it hasn't done any modification (as per the documentation).
In brief, you really wanted to write:
"HELLO ".chomp(" ").downcase
=> hello
And:
"HELLO ".chomp!(" ").downcase
=> hello
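One caveat with the bang version, sketched below with made-up variable names: chomp!(" ") only returns the string when a trailing space was actually removed. For input without one it returns nil, and calling downcase on nil raises NoMethodError. The non-bang form is the safer choice for chaining.

```ruby
with_space    = "HELLO ".dup
without_space = "HELLO".dup

ok = with_space.chomp!(" ").downcase   # => "hello"

# No trailing space: chomp! returns nil and the chain breaks.
error = begin
  without_space.chomp!(" ").downcase
rescue NoMethodError => e
  e.class
end

# The non-bang version always returns a string, so chaining is safe.
safe = "HELLO".chomp(" ").downcase     # => "hello"
```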

Related

Ruby `downcase!` returns `nil`

With this code:
input = gets.chomp.downcase!
puts input
If the input contains at least one uppercase letter, it is printed in lowercase. But if the input has no uppercase letters, nil is printed, as if nothing had been typed.
I want my input to be fully downcased; if it is a string with no uppercase letter, it should return the same string.
I thought about something like this:
input = gets.chomp
if input.include(uppercase) then input.downcase! end
But this doesn't work. I hope someone has an idea on how I should do this.
According to the docs for String:
(emphasis mine)
downcase
Returns a copy of str with all uppercase letters replaced with their lowercase counterparts. The operation is locale
insensitive—only characters “A” to “Z” are affected. Note: case
replacement is effective only in ASCII region.
downcase!
Downcases the contents of str, returning nil if no changes were made. Note: case replacement is effective only in ASCII
region.
Basically it says that downcase! (with the exclamation mark) will return nil if there are no uppercase letters.
To fix your program:
input = gets.chomp.downcase
puts input
Hope that helped!
This will work:
input = gets.chomp.downcase
puts input
String#downcase
Returns a modified string and leaves the original unmodified.
str = "Hello world!"
str.downcase # => "hello world!"
str # => "Hello world!"
String#downcase!
Modifies the original string, returns nil if no changes were made or returns the new string if a change was made.
str = "Hello world!"
str.downcase! # => "hello world!"
str # => "hello world!"
str.downcase! # => nil
! (bang) methods
It's common for Ruby methods with ! / non-! variants to behave in a similar manner. See this post for an in-depth explanation why.
The reason that downcase! returns nil is so you know whether or not the object was changed. If you're assigning the modified string to another variable, like you are here, you should use downcase instead (without the bang !).
If you're not familiar, the standard library bang methods typically act on the receiver directly. That means this:
foo = "Hello"
foo.downcase!
foo #=> "hello"
Versus this:
foo = "Hello"
bar = foo.downcase
foo #=> "Hello"
bar #=> "hello"
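If you do want the in-place version, one way to use it safely is to call it for its side effect and ignore its return value. The helper below is a hypothetical sketch (normalize is not a built-in), and it takes the raw line as an argument instead of calling gets so it can be run anywhere.

```ruby
# Hypothetical helper: takes a raw line (as gets would return it) and lowercases it.
def normalize(raw_line)
  line = raw_line.chomp  # chomp returns a new string, so raw_line is untouched
  line.downcase!         # mutate in place; the return value (possibly nil) is ignored
  line                   # always return the string itself
end

mixed = normalize("Hello World\n")  # => "hello world"
lower = normalize("hello world\n")  # => "hello world"
```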

Multiple Ruby chomp! statements

I'm writing a simple method to detect and strip tags from text strings. Given this input string:
{{foobar}}
The function has to return
foobar
I thought I could just chain multiple chomp! methods, like so:
"{{foobar}}".chomp!("{{").chomp!("}}")
but this won't work, because the first chomp! returns a NilClass. I can do it with regular chomp statements, but I'm really looking for a one-line solution.
The String class documentation says that chomp! returns a Str if modifications have been made - therefore, the second chomp! should work. It doesn't, however. I'm at a loss at what's happening here.
For the purposes of this question, you can assume that the input string is always a tag which begins and ends with double curly braces.
You can definitely chain multiple chomp statements (the non-bang version), still having a one-line solution as you wanted:
"{{foobar}}".chomp("{{").chomp("}}")
However, it will not work as expected because both chomp! and chomp remove the separator only from the end of the string, not from the beginning.
You can use sub
"{{foobar}}".sub(/{{(.+)}}/, '\1')
# => "foobar"
"alfa {{foobar}} beta".sub(/{{(.+)}}/, '\1')
# => "alfa foobar beta"
# more restrictive
"{{foobar}}".sub(/^{{(.+)}}$/, '\1')
# => "foobar"
Testing this out, it's clear that chomp! will return nil if the separator it's provided as an argument is not present at the end of the string.
So "{{text}}".chomp!("}}") returns a string, but "{{text}}".chomp!("{{") returns nil.
See here for an answer of how to chomp at the beginning of a string. But recognize that chomp only looks at the end of the string. So you can call str.reverse.chomp!("{{").reverse to remove the opening brackets.
You could also use a regex:
string = "{{text}}"
puts string[/^\{\{(.+)\}\}$/, 1]
# => "text"
Try tr:
'{{foobar}}'.tr('{{', '').tr('}}', '')
You can also use gsub or sub, but if you don't need pattern matching, tr should be faster.
If there are always curly braces, then you can just slice the string:
'{{foobar}}'[2...-2]
If you plan to make a method which returns the string without curly braces then DO NOT use bang versions. Modifying the input parameter of a method will be surprising!
def strip(string)
string.tr!('{{', '').tr!('}}', '')
end
a = '{{foobar}}'
b = strip(a)
puts b #=> foobar
puts a #=> foobar
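For comparison, here is a non-mutating sketch of the same helper. It assumes Ruby 2.5 or later, where String#delete_prefix and String#delete_suffix are available; the name strip_tag is just an illustration.

```ruby
# Returns a new string without the surrounding braces; the argument is not modified.
def strip_tag(text)
  text.delete_prefix("{{").delete_suffix("}}")
end

original = "{{foobar}}"
result   = strip_tag(original)  # => "foobar"
# original is still "{{foobar}}"
```

Unlike chomp!, both methods return a string even when there is nothing to remove, so the chain never hits nil.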

Finding the first duplicate character in the string Ruby

I am trying to call the first duplicate character in my string in Ruby.
I have defined an input string using gets.
How do I call the first duplicate character in the string?
This is my code so far.
string = "#{gets}"
print string
How do I call a character from this string?
Edit 1:
This is the code I have now; it prints "no duplicates" 26 times. I think my if statement is wrongly written.
string = "abcade"
puts string
for i in ('a'..'z')
if string =~ /(.)\1/
puts string.chars.group_by{|c| c}.find{|el| el[1].size >1}[0]
else
puts "no duplicates"
end
end
My second puts statement works but with the for and if loops, it returns no duplicates 26 times whatever the string is.
The following returns the index of the first pair of adjacent duplicate characters:
the_string =~ /(.)\1/
Example:
'1234556' =~ /(.)\1/
=> 4
To get the duplicate character itself, use $1:
$1
=> "5"
Example usage in an if statement:
if my_string =~ /(.)\1/
# found duplicate; potentially do something with $1
else
# there is no match
end
s.chars.map { |c| [c, s.count(c)] }.drop_while{|i| i[1] <= 1}.first[0]
With the refined form from Cary Swoveland :
s.each_char.find { |c| s.count(c) > 1 }
The method below might be useful for finding a repeated word in a string (it returns the word that occurs most often):
def firstRepeatedWord(string)
h_data = Hash.new(0)
string.split(" ").each{|x| h_data[x] +=1}
h_data.key(h_data.values.max)
end
I believe the question can be interpreted in either of two ways (neither involving the first pair of adjacent characters that are the same) and offer solutions to each.
Find the first character in the string that is preceded by the same character
I don't believe we can use a regex for this (but would love to be proved wrong). I would use the method suggested in a comment by @DaveNewton:
require 'set'
def first_repeat_char(str)
str.each_char.with_object(Set.new) { |c,s| return c unless s.add?(c) }
nil
end
first_repeat_char("abcdebf") #=> b
first_repeat_char("abcdcbe") #=> c
first_repeat_char("abcdefg") #=> nil
Find the first character in the string that appears more than once
r = /
(.) # match any character in capture group #1
.* # match any character zero or more times
? # do the preceding lazily
\K # forget everything matched so far
\1 # match the contents of capture group 1
/x
"abcdebf"[r] #=> b
"abccdeb"[r] #=> b
"abcdefg"[r] #=> nil
This regex is fine, but produces the warning, "regular expression has redundant nested repeat operator '*'". You can disregard the warning or suppress it by doing something clunky, like:
r = /([^#{0.chr}]).*?\K\1/
where ([^#{0.chr}]) means "match any character other than 0.chr in capture group 1".
Note that a positive lookbehind cannot be used here, as they cannot contain variable-length matches (i.e., .*).
You could probably make your string an array and use detect. This should return the first char where the count is > 1.
string.split("").detect {|x| string.count(x) > 1}
I'll use positive lookahead with String#[] method :
"abcccddde"[/(.)(?=\1)/] #=> c
As a variant:
str = "abcdeff"
p str.chars.group_by{|c| c}.find{|el| el[1].size > 1}[0]
prints "f"
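Since the answers above read "first duplicate" in different ways, here is a side-by-side sketch of three readings on one sample string. The method names and the sample are made up for illustration, and the sample contains only letters because String#count treats some characters (such as ^ and -) specially.

```ruby
require "set"

# Reading 1: first character immediately followed by itself.
def first_adjacent_duplicate(str)
  str[/(.)\1/, 1]
end

# Reading 2: first character that has already been seen earlier in the string.
def first_char_seen_before(str)
  seen = Set.new
  str.each_char { |c| return c unless seen.add?(c) }
  nil
end

# Reading 3: first character that occurs more than once anywhere in the string.
def first_char_occurring_twice(str)
  str.each_char.find { |c| str.count(c) > 1 }
end

sample = "abcbdda"
adjacent = first_adjacent_duplicate(sample)    # => "d"
seen     = first_char_seen_before(sample)      # => "b"
twice    = first_char_occurring_twice(sample)  # => "a"
```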

Eval a string without string interpolation

AKA How do I find an unescaped character sequence with regex?
Given an environment set up with:
@secret = "OH NO!"
$secret = "OH NO!"
@@secret = "OH NO!"
and given string read in from a file that looks like this:
some_str = '"\"#{:NOT&&:very}\" bad. \u262E\n#@secret \\#$secret \\\\#@@secret"'
I want to evaluate this as a Ruby string, but without interpolation. Thus, the result should be:
puts safe_eval(some_str)
#=> "#{:NOT&&:very}" bad. ☮
#=> #@secret #$secret \#@@secret
By contrast, the eval-only solution produces
puts eval(some_str)
#=> "very" bad. ☮
#=> OH NO! #$secret \OH NO!
At first I tried:
def safe_eval(str)
eval str.gsub(/#(?=[{@$])/,'\\#')
end
but this fails in the malicious middle case above, producing:
#=> "#{:NOT&&:very}" bad. ☮
#=> #@secret \OH NO! \#@@secret
You can do this via regex by ensuring that there are an even number of backslashes before the character you want to escape:
def safe_eval(str)
eval str.gsub( /([^\\](?:\\\\)*)#(?=[{@$])/, '\1\#' )
end
…which says:
Find a character that is not a backslash [^\\]
followed by two backslashes (?:\\\\)
repeated zero or more times *
followed by a literal # character
and ensure that after that you can see either a {, @, or $ character.
and replace that with
the non-backslash-maybe-followed-by-even-number-of-backslashes
and then a backslash and then a #
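The steps above can be checked on small inputs without calling eval at all. This is only a sketch: the constant and method names are made up, and the lookahead set is assumed to be {, @ and $, the characters that can follow # to start an interpolation.

```ruby
# A non-backslash character, then an even number of backslashes,
# then a "#" that would start an interpolation.
UNESCAPED_HASH = /([^\\](?:\\\\)*)#(?=[{@$])/

def escape_interpolation(str)
  # \1 puts back the captured prefix; \# is emitted as a backslash followed by "#".
  str.gsub(UNESCAPED_HASH, '\1\#')
end

plain      = escape_interpolation('a #{x}')      # bare "#": gains a backslash
escaped    = escape_interpolation('a \#{x}')     # already escaped: left unchanged
double_esc = escape_interpolation('a \\\\#{x}')  # two backslashes, then "#": gains one
```

Note that the pattern needs one character before the #, so a # at the very start of the string is not matched. That is fine for the input in the question, which always starts with a quote.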
How about not using eval at all? As per this comment in chat, all that's necessary are escaping quotes, newlines, and unicode characters. Here's my solution:
ESCAPE_TABLE = {
/\\n/ => "\n",
/\\"/ => "\"",
}
def expand_escapes(str)
str = str.dup
ESCAPE_TABLE.each {|k, v| str.gsub!(k, v)}
#Deal with Unicode
str.gsub!(/\\u([0-9A-Z]{4})/) {|m| [m[2..5].hex].pack("U") }
str
end
When called on your string the result is (in your variable environment):
"\"\"\#{:NOT&&:very}\" bad. ☮\n\#@secret \\\#$secret \\\\\#@@secret\""
Although I would have preferred not to have to treat unicode specially, it is the only way to do it without eval.

How do I remove carriage returns with Ruby?

I thought this code would work, but the regular expression doesn't ever match the \r\n. I have viewed the data I am reading in a hex editor and verified there really is a hex D and hex A pattern in the file.
I have also tried the regular expressions /\xD\xA/m and /\x0D\x0A/m but they also didn't match.
This is my code right now:
lines2 = lines.gsub( /\r\n/m, "\n" )
if ( lines == lines2 )
print "still the same\n"
else
print "made the change\n"
end
In addition to alternatives, it would be nice to know what I'm doing wrong (to facilitate some learning on my part). :)
Use String#strip
Returns a copy of str with leading and trailing whitespace removed.
e.g
" hello ".strip #=> "hello"
"\tgoodbye\r\n".strip #=> "goodbye"
Using gsub
string = string.gsub(/\r/," ")
string = string.gsub(/\n/," ")
Generally when I deal with stripping \r or \n, I'll look for both by doing something like
lines.gsub(/\r\n?/, "\n");
I've found that depending on how the data was saved (the OS used, editor used, Jupiter's relation to Io at the time) there may or may not be the newline after the carriage return. It does seem weird that you see both characters in hex mode. Hope this helps.
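As a small sketch of that approach (normalize_newlines is a hypothetical helper name), the pattern handles Windows-style \r\n, bare \r, and leaves existing \n alone:

```ruby
# Convert "\r\n" and lone "\r" to "\n"; existing "\n" characters are untouched.
def normalize_newlines(text)
  text.gsub(/\r\n?/, "\n")
end

mixed = normalize_newlines("a\r\nb\rc\n")  # => "a\nb\nc\n"
```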
If you are using Rails, there is a squish method
"\tgoodbye\r\n".squish => "goodbye"
"\tgood \t\r\nbye\r\n".squish => "good bye"
What do you get when you do puts lines? That will give you a clue.
By default File.open opens the file in text mode, so your \r\n characters will be automatically converted to \n. Maybe that's the reason lines are always equal to lines2. To prevent Ruby from parsing the line ends use the rb mode:
C:\> copy con lala.txt
a
file
with
many
lines
^Z
C:\> irb
irb(main):001:0> text = File.open('lala.txt').read
=> "a\nfile\nwith\nmany\nlines\n"
irb(main):002:0> bin = File.open('lala.txt', 'rb').read
=> "a\r\nfile\r\nwith\r\nmany\r\nlines\r\n"
irb(main):003:0>
But from your question and code I see you simply need to open the file with the default modifier. You don't need any conversion and may use the shorter File.read.
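To see the raw bytes regardless of platform, you can read the file in binary mode. The sketch below uses a temporary file with made-up contents. File.binread reads without any newline translation, so the \r characters are still there to be replaced.

```ruby
require "tempfile"

raw = nil
Tempfile.create("crlf-demo") do |file|
  file.binmode                   # write the bytes exactly as given
  file.write("a\r\nfile\r\n")
  file.flush
  raw = File.binread(file.path)  # binary read: no newline translation
end

unix = raw.gsub("\r\n", "\n")    # => "a\nfile\n"
```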
modified_string = string.gsub(/\s+/, ' ').strip
lines2 = lines.split.join("\n")
"still the same\n".chomp
or
"still the same\n".chomp!
http://www.ruby-doc.org/core-1.9.3/String.html#method-i-chomp
How about the following?
irb(main):003:0> my_string = "Some text with a carriage return \r"
=> "Some text with a carriage return \r"
irb(main):004:0> my_string.gsub(/\r/,"")
=> "Some text with a carriage return "
irb(main):005:0>
Or...
irb(main):007:0> my_string = "Some text with a carriage return \r\n"
=> "Some text with a carriage return \r\n"
irb(main):008:0> my_string.gsub(/\r\n/,"\n")
=> "Some text with a carriage return \n"
irb(main):009:0>
I think your regex is almost complete - here's what I would do:
lines2 = lines.gsub(/[\r\n]+/m, "\n")
In the above, I've put \r and \n into a character class (that way it doesn't matter in which order they appear) and added the "+" quantifier (so that "\r\n\r\n\r\n" would also match once, and the whole thing would be replaced with "\n").
Just another variant:
lines.delete(" \n")
Why not read the file in text mode, rather than binary mode?
lines.map(&:strip).join(" ")
You can use this :
my_string.strip.gsub(/\s+/, ' ')
def dos2unix(input)
input.each_byte.map { |c| c.chr unless c == 13 }.join
end
remove_all_the_carriage_returns = dos2unix(some_blob)
