Here is a simple ruby script, that takes input from a user and provides an output(yes, it will be refactored). I wanted this script to provide output into a text file, rather than console window. That was accomplished, by simply adding $stdout = File.new('out.txt', 'w'), but I thought that this line will just describe a variable which I will use later to tell script to use it to write output into created file.
I cant find much documentation about this method and wondering how does this program knows how to write generated output into that file?
$stdout is a global variable. By default it stores an object of type IO associated with the standard output of the program (which is, by default, the console).
puts is a method of the Kernel module that actually calls $stdout.send() and pass it the list of arguments it receives. As the documentation explains, puts(obj, ...) is equivalent to $stdout.puts(obj, ...).
Your code replaces $stdout with an object of type File that extends class IO. When it is created, your object opens the file out.txt for writing and together with its inheritance from IO it is fully compatible with the default behaviour of $stdout.
Since by default, all the output goes to $stdout, your new definition of $stdout ensures the output is written to the file out.txt without other changes in the code.
$stdout is a global variable (as indicated by $) and according to the documentation, puts is:
Equivalent to
$stdout.puts(obj, ...)
If you assign another object to $stdout, then Kernel#puts will simply send puts to that object. Likewise, print will send write:
class Foo < BasicObject
def puts(*args)
::STDOUT.puts "Foo#puts called with #{args.inspect}"
end
def write(*args)
::STDOUT.puts "Foo#write called with #{args.inspect}"
end
end
$stdout = Foo.new
puts 'hello', 'world'
# Foo#puts called with ["hello", "world"]
print "\n"
# Foo#write called with ["\n"]
Note that if you assign to $stdout, Ruby checks whether the object responds to write. If not, a TypeError will be raised.
It's probably worth highlighting the fact, mentioned by the other posts, that this is a global variable, and modifying it is not thread safe. For instance, if you point $stdout to a new IO stream, anything across the app (on that process thread) that would be logged to stdout will now be logged to your new IO stream. This can lead to multiple streams of unexpected and possibly sensitive input landing in your new IO stream.
Related
I wonder if there is any real difference between STDIN and $stdin. I do in irb:
STDIN == $stdin
and get back true. Are they just two names for the same thing? Or is there some difference?
From Ruby globals:
STDIN
The standard input. The default value for $stdin.
They are the same object by default.
[1] pry(main)> $stdin.object_id
=> 13338048
[2] pry(main)> STDIN.object_id
=> 13338048
[3] pry(main)> $stdin.object_id == STDIN.object_id
=> true
As #shivam commented, $stdin is a global variable and it may be assigned to something different, while STDIN is a constant.
STDIN is a constant and therefore you'll get a ruby warning if you try to replace it. Otherwise the two are just normal ruby variables in that they can point to the same object (and are by default) and if they do, doing something with one will affect the other variable, but if you assign something else to one of the variables, they will be different.
Standard ruby methods like get will read from $stdin (not STDIN) by default. That means you can override $stdin ($stdout, $stderr) for standard methods and use the constant versions to see what $stdin, $stdout or $stderr originaly were.
Note that overriding $stdin, $stdout, or $stderr won't affect the standard streams of newly spawned programs (the actual filedescriptors 0, 1, and 2 respectively). To do that you'd need to call IO#reopen on the stream you'd want to change, e.g. (assuming the constant version hasn't been forcibly replaced),
STDOUT.reopen("newfile") #Write all output to "newfile" including the output of newly spawned processes (`%x{}`,`system`, `spawn`, `IO.popen`, etc.)
Now with reopen, you can replace the streams only to actual OS-level files/file descriptors (e.g., no StringIO), but if you're on UNIX, there's not much you can't do with OS-level files (you can change them to pipes which you can read elsewhere in your program, for example).
What is the difference between doing:
file = open('myurl')
# Do stuff with file
And doing:
open('myurl') do |file|
# Do things with file
end
Do I need to close and remove the file when I am not using the block approach? If so, how do I close and remove it? I don't see any close/remove method in the docs
The documentation for OpenURI is a little opaque to beginners, but the docs for #open can be found here.
Those docs say:
#open returns an IO-like object if block is not given. Otherwise it yields the IO object and return the value of the block.
The key words here are "IO-like object." We can infer from that that the object (in your examples, file), will respond to the #close method.
While the documentation doesn't say so, by looking at the source we can see that #open will return either a StringIO or a Tempfile object, depending on the size of the data returned. OpenURI's internal Buffer class first initializes a StringIO object, but if the size of the output exceeds 10,240 bytes it creates a Tempfile and writes the data to it (to avoid storing large amounts of data in memory). Both StringIO and Tempfile have behavior consistent with IO, so it's good practice (when not passing a block to #open), to call #close on the object in an ensure:
begin
file = open(url)
# ...do some work...
ensure
file.close
end
Code in the ensure section always runs, even if code between begin and ensure raises an exception, so this will, well, ensure that file.close gets called even if an error occurs.
I've often read that Ruby is a pure object oriented language since commands are typically given as messages passed to the object.
For example:
In Ruby one writes: "A".ord to get the ascii code for A and 0x41.chr to emit the character given its ascii code.
This is in contrast to Python's: ord("A") and chr(0x41)
So far so good --- Ruby's syntax is message passing.
But the apparent inconsistency appears when considering the string output command:
Now one has: puts str or puts(str) instead of str.puts
Given the pure object orientation expectation for Ruby's syntax, I would have expected the output command to be a message passed to the string object, i.e. calling a method from the string class, hence str.puts
Any explanations? Am I missing something?
Thanks
I would have expected the output command to be a message passed to the string object, i.e. calling a method from the string class, hence str.puts
This is incorrect expectation, let's start with that. Why would you tell a string to puts itself? What would it print itself to? It knows nothing (and should know nothing) of files, I/O streams, sockets and other places you can print things to.
When you say puts str, it's actually seen as self.puts str (implicit receiver). That is, the message is sent to the current object.
Now, all objects include Kernel module. Therefore, all objects have Kernel#puts in their lists of methods. Any object can puts (including current object, self).
As the doc says,
puts str
is translated to
$stdout.puts str
That is, by default, the implementation is delegated to standard output (print to console). If you want to print to a file or a socket, you have to invoke puts on an instance of file or socket classes. This is totally OO.
Ruby isn't entirely OO (for example, methods are not objects), but in this case, it is. puts is Kernel#puts, which is shorthand for $stdout.puts. That is, you're calling the puts method of the $stdout stream and passing a string as the parameter to be output to the stream. So, when you call
puts "foo"
You're really calling:
$stdout.puts("foo")
Which is entirely consistent with OO.
puts is a method on an output streams e.g.
$stdout.puts("this", "is", "a", "test")
Printing something to somewhere at least involves two things: what is written and where it is written to. Depending on what you focus on, there can be different implementations, even in OOP. Besides that, Ruby has a way to make a method look more like a function (i.e., not being particularly tied to a receiver as in OOP) for methods that are used all over the place. So there are at least three logical options that could be thought of for such methods like printing.
An OOP method defined on the object to be printed
An OOP method defined on the object where it should be printed
A function-style method
For the second option, IO#write is one example; The receiver is the destination of writing.
The puts without an explicit receiver is actually Kernel#puts, and takes neither of the two as the arguments; it is an example of the third option; you are correct to point out that this is not so OOP, but Matz especially provided the Kernel module to be able to do things like this: a function-style method.
The first option is what you are expecting; it is nothing wrong. It happens that there is no well known method of this type, but it was proposed in the Ruby core by one of the developers, but unfortunately, it did not make it. Actually, I felt the same thing as you, and have something similar in my personal library called Object#intercept. A simplified version is this:
class Object
def intercept
tap{|x| p x}
end
end
:foo.intercept # => :foo
You can replace p with puts if you want.
What does 'a' and |f| mean below ?
open('myfile.out', 'a') { |f|
f.puts "Hello, world."
}
From the ruby IO doc:
"a" | Write-only, starts at end of file if file exists,
| otherwise creates a new file for writing.
The |f| is a variable that holds the IO object in the block (everything in the {}). So when you f.puts "Hello World" you're calling puts on the IO object which then writes to the file.
The 'a' is just a file open mode, like you'd see in C / C++. It means append, and is relatively uncommon - you're more likely to be familiar with 'r' (read), 'w' (write), etc.
The {|f| ... } bit is the exciting part. It's called a a block - they're everywhere, and they're probably my favourite part of Ruby - I've gone back to C++ recently, and I find myself cursing the language for not supporting them all the time.
Think of code like foo(bar) {|baz| ... } as creating a nameless function, and passing that function as another (hidden) argument to foo (kinda like this is a hidden argument to member functions in C++) - it's just not as hidden, 'cause you specify it right there.
Now, when you pass the block to foo, it will eventually call your block (using the yield statement), and it will supply the argument baz. If my foo behaved like your File.open function, its definition would look something like this:
def foo(filename, &block)
file = File.open(filename)
yield(file)
file.close
end
You can see how it opens the file, passes it to your block with yield, and then closes the file once your block returns. Very convenient - blocks are your friends!
Another good place to start wrapping your head around them is the each function - one of the simplest and most common block functions in Ruby:
[holt#Michaela ~]$ irb
irb(main):001:0> ['Welcome', 'to', 'Ruby!'].each {|word| puts word}
Welcome
to
Ruby!
=> ["Welcome", "to", "Ruby!"]
irb(main):002:0>
This time, your block gets called three times, and each time a different array element gets yielded to your block as word - it's a super-simple way to call a function for every element of an array.
Hope this helps, and welcome to Ruby!
'a' -> Mode in which to open the file ('append' mode)
f is a parameter to the block. A block is a piece of code that can be executed (it is a Proc object underneath).
Here, f will be the file descriptor, I think.
1) You call the open method, passing in the two arguments:
myfile.out <-- This is your file that you want to access
a <-- you are stating that you want to write to a file, starting at the end of the file(aka append)
2) The method open that exists in Kernel, yields an IO stream object aka |f|, in which you can access throughout your block.
3) You are appending "hello world" to myfile.out
4) Once the block ends, the IO stream closes.
The 'a', which stands for append, opens the file in write-only mode and starts writing at the end of the file. If no file exists, a new file is created. Please see the Ruby Docs for more information.
The |f| is a block parameter, which is being passed within the {}. For more information on blocks, please see The Pragmatic Programmer's Guide.
I would highly suggest reading through the help file for the File class for starters.
You can see there the documentation for the open method.
The method signature is File.open(filename, mode)
So, in your example, a, is the mode which in this case is append. Here's a list of valid values for the mode argument:
'r' - Open a file for reading. The file must exist.
'w' - Create an empty file for writing. If a file with the same name already exists its content is erased and the file is treated as a new empty file.
'a' - Append to a file. Writing operations append data at the end of the file. The file is created if it does not exist.
'r+' - Open a file for update both reading and writing. The file must exist.
'w+' - Create an empty file for both reading and writing. If a file with the same name already exists its content is erased and the file is treated as a new empty file.
'a+' - Open a file for reading and appending. All writing operations are performed at the end of the file, protecting the previous content to be overwritten. You can reposition (fseek, rewind) the internal pointer to anywhere in the file for reading, but writing operations will move it back to the end of file. The file is created if it does not exist.
If File.open is used in a block, such as in your example, then f becomes the block variable that points to the newly-opened file, which allows you to both read and write to the file just using f as the reference, while within the block. Using this form of File.open is nice because it handles closing the file automatically when the block ends.
open('myfile.out', 'a') -> Here 'a' means Write only access. Pointer is positioned at end of file.
|f| is the file descriptor, it does puts of "Hello, World."
Instead of |f|, you can write anything, say |abc| or |line|, it doesn't matter.
Ruby has two ways of referring to the standard input: The STDIN constant , and the $stdin global variable.
Aside from the fact that I can assign a different IO object to $stdin because it's not a constant (e.g. before forking to redirect IO in my children), what's the difference between STDIN and $stdin? When should I use each in my code?
If I reassign $stdin, does it affect STDIN?
And does this also apply to STDOUT/$stdout and STDER/$stderr?
If $stdin is reassigned, STDIN is not affected. Likewise $stdin is not affected when STDIN is reassigned (which is perfectly possible (though pointless), but will produce a warning). However if neither variable has been reassigned, they both point to the same IO object, so calling reopen¹ on one will affect the other.
All the built-in ruby methods use $< (a.k.a. ARGF) to read input. If ARGV is empty, ARGF reads from $stdin, so if you reassign $stdin, that will affect all built-in methods. If you reassign STDIN it will have no effect unless some 3rd party method uses STDIN.
In your own code you should use $stdin to be consistent with the built-in methods².
¹ reopen is a method which can redirect an IO object to another stream or file. However you can't use it to redirect an IO to a StringIO, so it does not eliminate all uses cases of reassigning $stdin.
² You may of course also use $</ARGF to be even more consistent with the built-in methods, but most of the time you don't want the ARGF behavior if you're explicitly using the stdin stream.
STDERR and $stderr are pointing to the same thing initially; you can reassign the global variable but you shouldn't mess with the constant. $stdin and STDIN, $stdout and STDOUT pairs are likewise.
I had to change STDERR a couple of times as an alternative to monkey-patching some gems outputting error messages with STDERR.puts. If you reassign with STDERR = $stdout you get a warning while STDERR.reopen('nul', 'w') goes without saying.