Why does Ruby's 'gets' include the closing newline? - ruby

I never need the ending newline I get from gets. Half of the time I forget to chomp it and it is a pain in the....
Why is it there?

Like puts (which sounds similar), it is designed to work with lines, using the \n character.
gets takes an optional argument that is used for "splitting" the input (or "just reading until it arrives"). It defaults to the special global variable $/, which contains "\n" by default.
gets is a pretty generic method for reading streams, and it includes this separator in what it returns. If it did not, part of the stream's content would be lost: for example, you could no longer tell whether the last line of a file ended with a newline or not.
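Here is a small sketch of the point about lost content. It uses StringIO as a stand-in for a file or $stdin (an assumption so the example runs on its own). The last line has no trailing newline, and you can tell that from the returned strings only because gets keeps the separator.

```ruby
require "stringio"

# StringIO stands in for a file or $stdin so the example is self-contained.
stream = StringIO.new("first\nsecond\nlast without newline")

lines = []
while (line = stream.gets)   # gets returns nil at end of input
  lines << line
end

p lines
# The last element has no "\n", so you can tell the input did not end with one.
p lines.last.end_with?("\n")
```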

var = gets.chomp
This reads the line and strips the trailing newline in a single statement.
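For comparison, here is a sketch of two ways to drop the newline. StringIO stands in for $stdin so it runs without a terminal (an assumption for illustration). Since Ruby 2.4, IO#gets (and StringIO#gets) also accept a chomp: keyword that strips the separator for you.

```ruby
require "stringio"

input = StringIO.new("Alice\nBob\n")

# Strip the newline in a separate step...
first = input.gets.chomp

# ...or ask gets to do it (chomp: keyword, Ruby 2.4+).
second = input.gets(chomp: true)

p first   # "Alice"
p second  # "Bob"
```

One difference worth knowing: at end of input gets returns nil, so gets.chomp raises NoMethodError there, while gets(chomp: true) simply returns nil.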

If you look at the documentation of IO#gets, you'll notice that the method takes an optional parameter sep, which defaults to $/ (the input record separator). You can split input on something other than newlines, e.g. paragraphs ("a zero-length separator reads the input a paragraph at a time (two successive newlines in the input separate paragraphs)"):
>> gets('')
dsfasdf
fasfds

dsafadsf
#=> "dsfasdf\nfasfds\n\n"
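The same paragraph-mode read can be reproduced without a terminal. This sketch feeds the input from the irb session above through StringIO (an assumption for illustration), which should follow the same separator rules as IO#gets for this input.

```ruby
require "stringio"

# Two paragraphs separated by a single blank line, as typed in the session above.
text = StringIO.new("dsfasdf\nfasfds\n\ndsafadsf\n")

# An empty-string separator switches gets into paragraph mode.
first_paragraph  = text.gets("")
second_paragraph = text.gets("")

p first_paragraph   # "dsfasdf\nfasfds\n\n"
p second_paragraph  # "dsafadsf\n"
```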

From a performance perspective, the better question would be "why should I get rid of it?". It's not a big cost, but under the hood you have to pay to chomp the string being returned. While you may never have had a case where you need the newline, you've surely had plenty of cases where you don't care about it, e.g. s = gets; puts stuff() if s =~ /y/i. In those cases, you'll see a (tiny, tiny) performance improvement by not chomping.

How I auto-detect line endings:
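# file open in binary mode
line_ending = "\r\n"          # default: CRLF (Windows-style)
check = file.read(1000)       # sample the first 1000 bytes
case check
when /\r\n/
  # CRLF found: keep the default
when /\n/
  line_ending = "\n"          # LF (Unix-style)
when /\r/
  line_ending = "\r"          # CR (classic Mac OS)
end
file.rewind                   # go back to the start before reading lines
until file.eof?
  line = file.gets(line_ending).chomp(line_ending)
  # ...
end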
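The snippet above assumes an already-open file and elides the per-line work. Here is a self-contained sketch of the same idea. It uses StringIO in place of a binary-mode File, and two helper names, detect_line_ending and read_lines, which are hypothetical and not part of any library. Like the original, it samples the first 1000 bytes and falls back to "\r\n" when no line ending is found.

```ruby
require "stringio"

# Hypothetical helper: guess the line ending from the first sample_size bytes,
# then rewind so the caller can read from the beginning.
def detect_line_ending(io, sample_size = 1000)
  sample = io.read(sample_size) || ""   # read returns nil on an empty stream
  io.rewind
  case sample
  when /\r\n/ then "\r\n"
  when /\n/   then "\n"
  when /\r/   then "\r"
  else "\r\n"                            # nothing found: same default as above
  end
end

# Hypothetical helper: read every line using the detected ending, without it.
def read_lines(io)
  ending = detect_line_ending(io)
  lines = []
  until io.eof?
    lines << io.gets(ending).chomp(ending)
  end
  lines
end

p read_lines(StringIO.new("a\rb\rc"))     # ["a", "b", "c"]
p read_lines(StringIO.new("x\r\ny\r\n"))  # ["x", "y"]
```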

Related

Match & include? method

My code is about a robot that has 3 possible answers (depending on what you put in the message).
Among these possible answers, one depends on whether the input is a question, and to check that, I think it has to find the "?" symbol in the string.
Should I use the "match" method or include?
This code is going to be inside a loop that can answer in 3 possible ways.
Example:
puts "whats your meal today?"
answer = gets.chomp
answer.include?("?")
or
answer.match(/\?/)  # "?" must be escaped; answer.match('?') raises RegexpError
Take a look at String#end_with? I think that is what you should use.
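Here is a quick sketch of the difference between the two checks. The second sample sentence is made up to show where include? gives a false positive.

```ruby
answer = "what's your meal today?"
not_a_question = "is it? I doubt it"

# include? is true if "?" appears anywhere in the string...
p answer.include?("?")           # true
p not_a_question.include?("?")   # true, even though it does not end as a question

# ...while end_with? only looks at the end of the string.
p answer.end_with?("?")          # true
p not_a_question.end_with?("?")  # false
```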
Use String#match? instead
String#chomp only removes a trailing record separator from a String (by default "\n", "\r", or "\r\n"), and String#end_with? only checks the literal last characters, so neither will handle edge cases such as whitespace after the question mark. Instead, use a regular expression with String#match?. For example:
print "Enter a meal: "
answer = gets.chomp
answer.match?(/\?\s*\z/m)
The Regexp literal /\?\s*\z/m will return true if the (possibly multi-line) String in your answer ends with:
a literal question mark (which is why it's escaped)...
followed by zero or more whitespace characters...
anchored to the very end of the string (\z); trailing line endings such as \n or \r\n count as whitespace, so \s* matches them, although #chomp will generally have removed them already.
This will be more robust than your current solution, and will handle a wider variety of inputs while being more accurate at finding strings that end with a question mark without regard to trailing whitespace or line endings.
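Here is a small comparison of the two approaches on a few made-up inputs. The /m flag is left off because in Ruby it only changes whether . matches newlines, and this pattern contains no ., so the results are the same either way. String#match? needs Ruby 2.4 or later.

```ruby
# Matches "?" followed only by optional whitespace at the very end of the string.
QUESTION_AT_END = /\?\s*\z/

samples = ["Is it lunch?", "Is it lunch?  ", "Is it lunch?\n", "It is lunch."]

results = samples.map do |text|
  [text, text.end_with?("?"), text.match?(QUESTION_AT_END)]
end

results.each do |text, ends_with, matches|
  puts "#{text.inspect}: end_with? => #{ends_with}, match? => #{matches}"
end
```

Only the regex treats "Is it lunch?  " and "Is it lunch?\n" as questions, which is the robustness the answer is pointing at.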

What does $/ mean in Ruby?

I was reading about Ruby serialization (http://www.skorks.com/2010/04/serializing-and-deserializing-objects-with-ruby/) and came across the following code. What does $/ mean? I assume $ refers to an object?
require "yaml"

array = []
$/ = "\n\n"
File.open("/home/alan/tmp/blah.yaml", "r") do |file|
  file.each do |object|
    array << YAML.load(object)
  end
end
$/ is a pre-defined variable. It's used as the input record separator, and has a default value of "\n".
Methods like gets use $/ to determine how to separate the input. For example:
$/="\n\n"
str = gets
puts str
So you have to press Enter twice to end the input for str.
Reference: Pre-defined variables
This code is trying to read each object into an array element, so you need to tell it where one ends and the next begins. The line $/="\n\n" sets the separator Ruby uses to break your file apart.
$/ is known as the "input record separator" and is the value used to split up your file when you are reading it in. By default this value is a newline, so when you read in a file, each line becomes a separate record (here, a separate array element). By setting this value, you are telling Ruby that a single newline no longer marks the end of a record; the given string does instead.
For example, if I have a comma-separated file, I can write $/="," and then, if I run something like your code on a file like this:
foo, bar, magic, space
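I would get an array of records directly, without having to split again (assuming the file has no trailing newline; note that each record keeps its separator unless you chomp it):
["foo,", " bar,", " magic,", " space"]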
So your line will look for two newline characters, and split on each group of two instead of on every newline. You will only get two newline characters following each other when one line is empty. So this line tells Ruby, when reading files, break on empty lines instead of every line.
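Both behaviors described in this answer can be checked without touching $/ at all, by passing the separator straight to each_line. This sketch uses StringIO as a stand-in for the file (an assumption so it runs without a file on disk), and the YAML text is made up. Passing the separator explicitly instead of assigning $/ is a design choice: it avoids changing global state for the rest of the program.

```ruby
require "stringio"
require "yaml"

# Comma as the record separator (same effect as $/ = ",", but local to this call).
chunks = StringIO.new("foo, bar, magic, space").each_line(",").to_a
p chunks                            # ["foo,", " bar,", " magic,", " space"]
p chunks.map { |c| c.chomp(",") }   # ["foo", " bar", " magic", " space"]

# An empty line ("\n\n") as the separator: one YAML document per record.
yaml_text = "name: foo\nsize: 1\n\nname: bar\nsize: 2\n"
records = StringIO.new(yaml_text).each_line("\n\n").map { |chunk| YAML.load(chunk) }
p records                           # two Hashes, one per record
```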
I found something possibly interesting on this page:
http://www.zenspider.com/Languages/Ruby/QuickRef.html#18
$/ # The input record separator (eg #gets). Defaults to newline.
The $ means it is a global variable.
This one is special, however, because Ruby itself uses it as the input record separator.
For a full list of the special global variables, see:
http://www.rubyist.net/~slagell/ruby/globalvars.html

Difference between 2 ways of working with pipes using ARGF?

Using ARGF I can create Ruby programs that respect pipelines. Suppose I want to constantly read new entries:
$ tail -f log/test.log | my_prog
I can do this using:
ARGF.each_line do |line|
  # ...
end
Also, I found another way:
while input = ARGF.gets
  input.each_line do |line|
    # ...
  end
end
Do both variants do the same thing, or is there a difference between them? If so, what is it?
Thanks in advance.
As Stefan mentioned, you made a small mistake in the second case: gets already returns a single line, so calling each_line on it is redundant. The proper way of using the ARGF.gets approach in your case looks like:
while input = ARGF.gets
  # input here represents a line
end
If you rewrite the second example as above, you will not see a difference in behavior.
The actual difference you may notice between ARGF#gets and ARGF#each_line is semantic: each_line accepts a block (or returns an Enumerator without one), while gets returns the next line if one is available, or nil at the end of input.
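The two styles can be compared side by side. This sketch uses StringIO in place of ARGF (an assumption so it runs without a pipe), since both respond to gets and each_line. lines_via_gets and lines_via_each_line are hypothetical helper names.

```ruby
require "stringio"

# Loop style: gets returns the next line, or nil at end of input.
def lines_via_gets(io)
  lines = []
  while (line = io.gets)
    lines << line
  end
  lines
end

# Iterator style: without a block, each_line returns an Enumerator.
def lines_via_each_line(io)
  io.each_line.to_a
end

text = "one\ntwo\nthree\n"
p lines_via_gets(StringIO.new(text))       # ["one\n", "two\n", "three\n"]
p lines_via_each_line(StringIO.new(text))  # ["one\n", "two\n", "three\n"]
p StringIO.new(text).each_line.class       # Enumerator
```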
Another option is to use Kernel#gets. Beware: its behavior may differ from ARGF#gets in some cases, especially if you change the separator:
A separator of nil reads the entire contents, and a zero-length separator reads the input one paragraph at a time, where paragraphs are divided by two consecutive newlines.
But for reading (and then printing) constantly from stdin you may use it as follows:
print while gets
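The nil-separator behavior quoted above can be checked the same way. This is a sketch with StringIO standing in for standard input (an assumption for illustration), so it exercises StringIO#gets rather than Kernel#gets.

```ruby
require "stringio"

input = StringIO.new("line one\nline two\n")

# A nil separator reads everything that is left in one call.
everything = input.gets(nil)
p everything   # "line one\nline two\n"

# After that, the stream is exhausted and gets returns nil.
p input.gets   # nil
```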

Difference between ways to use gets method

I saw two ways to use gets, a simple form:
print 'Insert your name: '
name = gets()
puts "Your name is #{name}"
and a form that drew my attention:
print 'Insert your name: '
STDOUT.flush
name = gets.chomp
puts "Your name is #{name}"
The second sample looks like Perl in that it calls the flush method on the default output stream. Perl makes manipulating the default output stream explicit; the flush method is a mystery to me, and it may behave differently from what I'm inferring. The second form also uses chomp to remove the newline character.
What happens behind the scenes in the second form? In what situations is it useful or necessary to use the second form?
"Flushing" the output ensures that the printed message shows up before the program waits for your input; this may just be someone being unnecessarily cautious, or it may be that on certain operating systems you need it. Alternatively, you can use STDOUT.sync = true to force a flush after every output. (You may wonder, "Why wouldn't I always use this?" Well, if your code outputs a lot of content, repeatedly flushing it may slow it down.)
chomp removes the newline from the end of the input. If you want the newline (the result of the user pressing "Enter" after typing their name) then don't chomp it.
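One way to keep the flush without repeating it at every prompt is to wrap the prompt in a small method. This is a sketch only: prompt is a hypothetical helper, and its input:/output: keywords exist only so StringIO objects can stand in for the terminal in the example.

```ruby
require "stringio"

# Hypothetical helper: print a prompt, flush so it is visible before we block
# on input, then read one line without its newline.
def prompt(message, input: $stdin, output: $stdout)
  output.print(message)
  output.flush
  line = input.gets
  line&.chomp   # gets returns nil at end of input
end

# In a real program: name = prompt("Insert your name: ")
# Here, StringIO objects stand in for the terminal:
out = StringIO.new
name = prompt("Insert your name: ", input: StringIO.new("Alice\n"), output: out)
p name        # "Alice"
p out.string  # "Insert your name: "
```

Setting $stdout.sync = true once would make the explicit flush unnecessary for $stdout, at the cost of flushing after every write.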
Looking at some GitHub code, I can see that STDOUT.flush is used mostly for server-side/multi-threaded jobs, and not in everyday use.
Generally speaking, when you want to accept input from the user, you'd want to use gets.chomp. Just remember, no matter what the user enters, Ruby will ALWAYS interpret it as a string.
To convert it to an integer, you need to call to_i, or to_f for a float. You don't need chomp in these cases, since to_i and to_f ignore the trailing "\n". There are a lot of subtle things going on implicitly, as you'll see, and figuring them out is simply a matter of practice.
I've rarely seen anyone use STDOUT.flush except in multi-threaded code. Also, it makes things confusing, defeating the whole purpose of writing elegant code.

Is it ever necessary to use 'chomp' before using `to_i` or `to_f`?

I see people use the following code:
gets.chomp.to_i
or
gets.chomp.to_f
I don't understand why, when the result of those lines is always the same as without chomp after gets.
Is gets.chomp.to_i really necessary, or is gets.to_i just enough?
From the documentation for String#to_i:
Returns the result of interpreting leading characters in str as an integer base base (between 2 and 36). Extraneous characters past the end of a valid number are ignored. If there is not a valid number at the start of str, 0 is returned.
String#to_f behaves the same way, except, of course, that it takes no base.
Extraneous characters past the end of a valid number are ignored; this includes the newline. So there is no need to use chomp.
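Here are a few quick checks of that documented behavior. The sample strings are made up, and each one mimics what gets returns when the user presses Enter.

```ruby
# to_i and to_f stop at the first character that cannot be part of the number,
# so the trailing "\n" from gets is simply ignored.
p "42\n".to_i          # 42
p "42\n".chomp.to_i    # 42
p "3.5\n".to_f         # 3.5
p "ff\n".to_i(16)      # 255

# Without a valid leading number, the result is 0 (or 0.0), newline or not.
p "\n".to_i            # 0
p "abc\n".to_f         # 0.0
```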
There is no need to use the chomp method because:
String#chomp returns a new String with the given record separator removed from the end of str (if present). If $/ has not been changed from the default Ruby record separator, then chomp also removes carriage return characters (that is, it will remove "\n", "\r", and "\r\n").
String#to_f returns the result of interpreting leading characters in str as a floating point number. Extraneous characters past the end of a valid number are ignored. If there is not a valid number at the start of str, 0.0 is returned. This method never raises an exception.
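Here are a few examples of the two methods just quoted, assuming $/ still has its default value.

```ruby
# With the default $/, chomp removes one trailing "\n", "\r\n", or "\r".
p "hello\n".chomp     # "hello"
p "hello\r\n".chomp   # "hello"
p "hello\r".chomp     # "hello"
p "hello\n\n".chomp   # "hello\n"  (only one separator is removed)
p "hello".chomp       # "hello"    (nothing to remove)

# to_f reads a leading float and ignores the rest; it never raises.
p "1.5e3 meters".to_f # 1500.0
p "meters".to_f       # 0.0
```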
In my opinion it works the same either way, so there is no need for chomp after gets if you are going to immediately call to_i or to_f.
In practice, I have never seen an error raised or different behavior because of leaving chomp out of the line.
I find it distracting when I see it used in answers, and there is absolutely no need for it. It doesn't add to a "style", and it is, as the Tin Man states, wasted CPU cycles.

Resources