Difference with open-uri with block and without it - ruby

What is the difference between doing:
file = open('myurl')
# Do stuff with file
And doing:
open('myurl') do |file|
# Do things with file
end
Do I need to close and remove the file when I am not using the block approach? If so, how do I close and remove it? I don't see any close/remove method in the docs.

The documentation for OpenURI is a little opaque to beginners, but the docs for #open can be found here.
Those docs say:
#open returns an IO-like object if block is not given. Otherwise it yields the IO object and return the value of the block.
The key words here are "IO-like object." We can infer from that that the object (in your examples, file), will respond to the #close method.
While the documentation doesn't say so, by looking at the source we can see that #open will return either a StringIO or a Tempfile object, depending on the size of the data returned. OpenURI's internal Buffer class first initializes a StringIO object, but if the size of the output exceeds 10,240 bytes it creates a Tempfile and writes the data to it (to avoid storing large amounts of data in memory). Both StringIO and Tempfile have behavior consistent with IO, so it's good practice (when not passing a block to #open), to call #close on the object in an ensure:
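# On Ruby 3.0+, Kernel#open no longer accepts URLs; call URI.open(url) instead.
file = open(url)
begin
  # ...do some work...
ensure
  # file is guaranteed to be set here because open succeeded before begin
  file.close
end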
Code in the ensure section always runs, even if code between begin and ensure raises an exception, so this will, well, ensure that file.close gets called even if an error occurs.
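To address the "close and remove" part of the question, here is a minimal sketch. It does not touch the network: a StringIO and a Tempfile stand in for the two kinds of object #open can hand back, and release is a hypothetical helper name, not part of OpenURI.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper: close an IO-like object and, if it is backed by a
# temporary file on disk, delete that file as well.
def release(io)
  io.close unless io.closed?
  io.unlink if io.respond_to?(:unlink)   # true for Tempfile, false for StringIO
end

small = StringIO.new('short body')       # stands in for a small response
large = Tempfile.new('open-uri-demo')    # stands in for a large response
large.write('x' * 20_000)
large_path = large.path                  # remember the path before deleting

release(small)
release(large)
```

For a Tempfile, close! does the close and the delete in one call. A StringIO lives only in memory, so there is nothing to remove.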

Related

In ruby, how to rename a text file, keep same handle and delete the old file

I am using a constant (FLOG) as a handle to write to my log. At a given point, I have to use a temporary log, and later append that content to the regular log, all that with the same handle, which is used through a bunch of methods.
My test program is below. After closing the handle 'FLOG' associated with the temp log, when I re-assign FLOG to the new log, this somehow re-opens the temp log, and I can't delete it.
Is there a way to make sure that the old temp file stays closed (so I can delete it)?
# Pre-existing log:
final_log = "final_#{Time.now.strftime("%Y%m%d")}.txt"
#Writing something in it
File.open(final_log, "w+") { |file| file.write("This is the final log: #{final_log}\n") }
# temp log:
temp_log = "temp_#{Time.now.strftime("%Y%m%d")}.txt"
FLOG = File.new(temp_log, "w+")
# write some stuff in temp_log
FLOG.puts "Writing in temp_log named #{temp_log}"
# closing handle for temp_log
FLOG.close
# avoid constant reuse warning:
Object.send(:remove_const,'FLOG') if Object.const_defined?('FLOG')
# need to append temp_log content to final_log with handle FLOG
FLOG = File.open(final_log, "a+")
# appending old temp log to new log
File.open(temp_log, "r").readlines.each do |line|
puts "appending... #{line}"
FLOG.puts "appending... #{line}"
end
# closing handle
FLOG.close
# this tells me that 'temp_log' is somehow re-opened:
ObjectSpace.each_object(File) { |f| puts("3: #{temp_log} is open") if f.path == temp_log && !f.closed? }
File.delete(temp_log) # Cant do that:
# test_file2.rb:35:in `delete': Permission denied - temp_20150324.txt (Errno::EACCES)
If you're going to use a temp file, use Tempfile:
require 'tempfile'
# Pre-existing log:
final_log = "final_#{Time.now.strftime("%Y%m%d")}.txt"
#Writing something in it
File.open(final_log, "w+") { |file| file.write("This is the final log: #{final_log}\n") }
# give the tempfile a meaningful prefix
temp_log = Tempfile.new('foobar')
begin
$flog = temp_log
# write some stuff in temp_log
$flog.puts "Writing in temp_log named #{temp_log.path}"
# need to append temp_log content to final_log with handle $flog
$flog = File.open(final_log, "a+")
# reopen temp_log for reading, append to new log
temp_log.open.readlines.each do |line|
puts "appending... #{line}"
$flog.puts "appending... #{line}"
end
# closing handle
$flog.close
ensure
# delete temp_log
temp_log.unlink
end
And while globals are generally bad, hacking a constant so that you can use it like a global is worse.
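If a global is not actually needed, a variant of the same idea is Tempfile.create with a block (available since Ruby 2.1), which closes and deletes the temporary file when the block ends. This is only a sketch: the file names and the use of Dir.tmpdir are assumptions made for the demonstration.

```ruby
require 'tempfile'
require 'tmpdir'

# Hypothetical file name, used only for this demonstration.
final_log = File.join(Dir.tmpdir, "final_demo_#{Process.pid}.txt")
File.write(final_log, "This is the final log\n")

temp_path = nil
Tempfile.create('templog') do |temp|
  temp_path = temp.path
  temp.puts 'Writing in the temporary log'
  temp.rewind                               # go back to the start to read it

  # Append every line of the temporary log to the final log.
  File.open(final_log, 'a') do |log|
    temp.each_line { |line| log.puts "appending... #{line}" }
  end
end                                          # temp is closed and deleted here

result = File.read(final_log)
File.delete(final_log)
```

Both streams are closed by their blocks, so nothing is left open when the temporary file is removed.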
temp_log is still open because you didn't close it. If you did something like:
temp_log_lines = File.open(temp_log, 'r') { |f| f.readlines }
then the I/O stream for temp_log would be closed at the end of the block. However, doing File.open(temp_log, "r").readlines takes the IO object returned by File.open and calls readlines on it, which you then call each on with an accompanying block. Since the block is part of your call to each and not File.open, the stream is not closed at the end of it, and stays open for the rest of the program.
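The difference can be checked directly with closed?. The sketch below uses a throwaway file in Dir.tmpdir; the file name and variable names are made up for the demonstration.

```ruby
require 'tmpdir'

path = File.join(Dir.tmpdir, "open_demo_#{Process.pid}.txt")
File.write(path, "one\ntwo\n")

# No block: File.open returns the stream, and it stays open until closed.
leaked = File.open(path, 'r')
lines_a = leaked.readlines
leaked_open = !leaked.closed?
leaked.close                       # without this line the stream would leak

# Block form: the stream is closed as soon as the block finishes.
captured = nil
lines_b = File.open(path, 'r') { |f| captured = f; f.readlines }
block_closed = captured.closed?

# File.readlines opens, reads, and closes in one call.
lines_c = File.readlines(path)

File.delete(path)
```

All three approaches return the same lines; only the first leaves the closing to you.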
As to why you can't delete temp_log at the end of the program, it's hard to say without knowing what's going on in the underlying file system. Neither Ruby nor the underlying (POSIX) OS will complain if you delete a file that you've opened a stream for and not closed; the file will be unlinked but the stream will persist and still have the contents of the file and so on. The error you're getting is saying that the owner of the Ruby process for this program doesn't have the rights to delete the file the program created. That's strange, but hard to diagnose just from this code. Consider the directory the program is working in, what the permissions on it are, etc.
Speaking more generally, there are some things you could make use of in Ruby here that would make your life easier.
If you want a temporary file, there is a Tempfile class you could make use of that does a lot of legwork for you.
The idiomatic way of doing I/O with files in Ruby is to pass a block into File.open. The block is handed an I/O stream for the file that is automatically closed at the end of the block, so you don't need to do it manually. Here's an example:
File.open(temp_log, 'w+') do |f|
  f.puts "Writing in temp_log named #{temp_log}"
end
FLOG is not a true constant in your code. A constant is only a constant if its value never changes throughout the life of the program it's declared in. Ruby is a very permissive language, so it allows you to reassign constants, but warns you if you do. Stricter languages would throw an error if you did that. FLOG is just a normal variable and should be written flog. A good use for a constant is a value external to the action of your program that your program needs to be able to reference. For instance, instead of writing 3.141592653589793 every time you need to refer to an approximation of pi, you could declare PI = 3.141592653589793 and use PI afterwards. In Ruby, this has been done for you in the Math module: Math::PI returns this. User settings are another place constants often show up. They're determined before the program gets going, help to determine what it does, and should be unmodified during its execution, so storing them in constants sometimes makes sense.
You describe the program you supplied as a test program. Ruby has really great testing libraries you could make use of that will be nicer than writing scripts like this. Minitest is part of the Ruby standard library and is my favorite testing framework in Ruby, but a lot of people like RSpec too. (I'd like to link to the documentation for those frameworks, but I don't have enough reputation—sorry. You'll have to Google.)
It'll be hard to make use of those frameworks if you write your code imperatively like this, though. Ruby is a very object-oriented language and you'll get a lot out of structuring your code in an object-oriented style when working in it. If you're not familiar with OO design, some books that have been really good for me are Practical Object-Oriented Design in Ruby by Sandi Metz, Refactoring: Improving the Design of Existing Code by Martin Fowler et al., and Growing Object-Oriented Software, Guided by Tests, by Steve Freeman and Nat Pryce. (Same thing here with the lack of links.)
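To make the points about constants and object-oriented structure concrete, here is one possible sketch of a log handle that never needs reassigning. SwitchableLog and switch_to are hypothetical names invented for this example, and StringIO objects stand in for the real log files.

```ruby
require 'stringio'

# Hypothetical wrapper: the constant always refers to this one object, and
# only the stream inside it changes.
class SwitchableLog
  def initialize(io)
    @io = io
  end

  def puts(*lines)
    @io.puts(*lines)
  end

  # Swap the destination and hand back the previous stream so the caller
  # can read, close, or delete it.
  def switch_to(new_io)
    previous = @io
    @io = new_io
    previous
  end
end

LOG = SwitchableLog.new(StringIO.new)   # StringIO stands in for the temp log
LOG.puts 'written to the temporary log'

final = StringIO.new                    # stands in for the final log
temp = LOG.switch_to(final)
temp.string.each_line { |line| LOG.puts "appending... #{line}" }
```

Because LOG is assigned exactly once, there is no warning to suppress and no need for remove_const. The methods that log keep calling LOG.puts and never learn that the destination changed.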

Should I destroy File object after File.read and File.open in Ruby?

Suppose two kinds of ruby File operations.
Firstly,
file = File.open("xxx")
file.close
Secondly,
file = File.read("xxx")
file.close
Everyone knows that we should close a file after we finish using it. But in the second block of code, the Ruby interpreter throws the error message shown below:
in `<main>': undefined method `close' for #<String:0x000000022a3a08> (NoMethodError)
Do I not need to call file.close in the second case? Why not?
It's because the File.read method returns a string with the content of the file, not a File object. And yes, you don't need to call close explicitly if you use File.read, because Ruby does it for you automatically.
Marek Lipka answered correctly; I just wanted to point you to the documentation again.
Do I not need to call file.close in the second case?
You don't need to do so.
Read the documentation for IO.read carefully:
Opens the file, optionally seeks to the given offset, then returns length bytes (defaulting to the rest of the file). read ensures the file is closed before returning.
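A short sketch of the difference between the two calls, using a throwaway file whose name is made up for the demonstration:

```ruby
require 'tmpdir'

path = File.join(Dir.tmpdir, "read_demo_#{Process.pid}.txt")
File.write(path, "hello\n")

content = File.read(path)            # a String; the file is already closed
is_string = content.is_a?(String)
has_close = content.respond_to?(:close)

handle = File.open(path)             # a File object; closing it is our job
handle_content = handle.read
handle.close

File.delete(path)
```

The String returned by File.read has no close method at all, which is exactly what the NoMethodError in the question is reporting.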

Where does Ruby keep track of its open file descriptors?

What This Question Is Not About
This question is not about how to auto-close a file with File#close or the File#open block syntax. It's a question about where Ruby stores its list of open file descriptors at runtime.
The Actual Question
If you have a program with open descriptors, but you don't have access to the related File or IO object, how can you find a reference to the currently-open file descriptors? Take this example:
filename='/tmp/foo'
%x( touch "#{filename}" )
File.open(filename)
filehandle = File.open(filename)
The first File instance is opened, but the reference to the object is not stored in a variable. The second instance is stored in filehandle, where I can easily access it with #inspect or #close.
However, the discarded File object isn't gone; it's just not accessible in any obvious way. Until the object is finalized, Ruby must be keeping track of it somewhere...but where?
TL; DR
All live File and IO objects can be found by iterating over ObjectSpace.
Answer
The documentation for the ObjectSpace module says:
The ObjectSpace module contains a number of routines that interact with the garbage collection facility and allow you to traverse all living objects with an iterator.
How I Tested This
I tested this at the console on Ruby 1.9.3p194.
The test fixture is really simple. The idea is to have two File objects with different object identities, but only one is directly accessible through a variable. The other is "out there somewhere."
# Don't save a reference to the first object.
filename='/tmp/foo'
File.open(filename)
filehandle = File.open(filename)
I then explored different ways I could interact with the File objects even if I didn't use an explicit object reference. This was surprisingly easy once I knew about ObjectSpace.
# List all open File objects.
ObjectSpace.each_object(File) do |f|
puts "%s: %d" % [f.path, f.fileno] unless f.closed?
end
# List the "dangling" File object which we didn't store in a variable.
ObjectSpace.each_object(File) do |f|
unless f.closed?
printf "%s: %d\n", f.path, f.fileno unless f === filehandle
end
end
# Close any dangling File objects. Ignore already-closed files, and leave
# the "accessible" object stored in *filehandle* alone.
ObjectSpace.each_object(File) {|f| f.close unless f === filehandle rescue nil}
Conclusion
There may be other ways to do this, but this is the answer I came up with to scratch my own itch. If you know a better way, please post another answer. The world will be a better place for it.

Do I need to close the file when using YAML::load?

like the following line of code
sites = YAML::load(File.open(SITESPATH))
is it necessary to change to
File.open(SITESPATH) do |file|
sites = YAML::load(file)
end
just in order to make it sure that file got closed?
Yes, you should close the file, so your second example is the correct one.
Just as a side-note, remember that the sites variable will not be visible outside the block, unless you already created it before the block.
Because File.open, when called with a block, returns the value of the block, you may use:
sites = File.open(SITESPATH) {|file| YAML::load(file) }
You could use YAML.load_file(filename) instead.
This isn't really about YAML::load so much as it is about File/IO streams generally.
Called without a block, File.open is exactly the same as File.new. This doesn't close a file on its own, so you would need to close it yourself.
From the documentation:
With no associated block, [File.]open is a synonym for IO.new. If the
optional code block is given, it will be passed io as an argument,
and the IO object will automatically be closed when the block
terminates. In this instance, IO.open returns the value of the block.
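Putting the suggestions from this thread side by side, as a sketch: the file name and its contents below are invented for the demonstration, and both calls are expected to return the same data.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical YAML file, created only for this demonstration.
path = File.join(Dir.tmpdir, "sites_demo_#{Process.pid}.yml")
File.write(path, "- example.com\n- example.org\n")

# Block form: the file is closed when the block ends, and File.open
# returns the block's value, so the result can be assigned directly.
sites_from_block = File.open(path) { |file| YAML.load(file) }

# YAML.load_file takes a path and handles opening and closing itself.
sites_from_path = YAML.load_file(path)

File.delete(path)
```

Either way no File object is left open, and the result stays visible outside the block.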

What does 'a' mean in Ruby `open()`, and what does |f| mean?

What do 'a' and |f| mean below?
open('myfile.out', 'a') { |f|
f.puts "Hello, world."
}
From the ruby IO doc:
"a" | Write-only, starts at end of file if file exists,
| otherwise creates a new file for writing.
The |f| is a variable that holds the IO object in the block (everything in the {}). So when you f.puts "Hello World" you're calling puts on the IO object which then writes to the file.
The 'a' is just a file open mode, like you'd see in C / C++. It means append, and is relatively uncommon - you're more likely to be familiar with 'r' (read), 'w' (write), etc.
The {|f| ... } bit is the exciting part. It's called a block - they're everywhere, and they're probably my favourite part of Ruby - I've gone back to C++ recently, and I find myself cursing the language for not supporting them all the time.
Think of code like foo(bar) {|baz| ... } as creating a nameless function, and passing that function as another (hidden) argument to foo (kinda like this is a hidden argument to member functions in C++) - it's just not as hidden, 'cause you specify it right there.
Now, when you pass the block to foo, it will eventually call your block (using the yield statement), and it will supply the argument baz. If my foo behaved like your File.open function, its definition would look something like this:
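def foo(filename, &block)
  file = File.open(filename)
  yield(file)            # hand the open file to the caller's block
ensure
  file.close if file     # close it even if the block raised an error
end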
You can see how it opens the file, passes it to your block with yield, and then closes the file once your block returns. Very convenient - blocks are your friends!
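To see that pattern in action, here is a small runnable sketch. with_file is a hypothetical stand-in for File.open with a block, written with ensure so the file is closed even when the block raises; the file name is made up for the demonstration.

```ruby
require 'tmpdir'

# Hypothetical stand-in for File.open with a block.
def with_file(filename)
  file = File.open(filename)
  yield file                # the method returns whatever the block returns
ensure
  file.close if file        # runs whether the block returned or raised
end

path = File.join(Dir.tmpdir, "block_demo_#{Process.pid}.txt")
File.write(path, "Hello, world.\n")

first_line = with_file(path) { |f| f.gets }

seen = nil
begin
  with_file(path) { |f| seen = f; raise 'boom' }
rescue RuntimeError
  # the error still propagates, but the file has been closed by now
end
closed_after_error = seen.closed?

File.delete(path)
```

The caller never calls close; the method that yields takes care of it.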
Another good place to start wrapping your head around them is the each function - one of the simplest and most common block functions in Ruby:
[holt#Michaela ~]$ irb
irb(main):001:0> ['Welcome', 'to', 'Ruby!'].each {|word| puts word}
Welcome
to
Ruby!
=> ["Welcome", "to", "Ruby!"]
irb(main):002:0>
This time, your block gets called three times, and each time a different array element gets yielded to your block as word - it's a super-simple way to call a function for every element of an array.
Hope this helps, and welcome to Ruby!
'a' -> Mode in which to open the file ('append' mode)
f is a parameter to the block. A block is a piece of code that can be executed (it is a Proc object underneath).
Here, f will be the File object for the opened file (an IO object, not a raw file descriptor number).
1) You call the open method, passing in the two arguments:
myfile.out <-- This is your file that you want to access
a <-- you are stating that you want to write to a file, starting at the end of the file(aka append)
2) Kernel#open yields an IO stream object (the block parameter f), which you can access throughout your block.
3) You are appending "hello world" to myfile.out
4) Once the block ends, the IO stream closes.
The 'a', which stands for append, opens the file in write-only mode and starts writing at the end of the file. If no file exists, a new file is created. Please see the Ruby Docs for more information.
The |f| is a block parameter, which is being passed within the {}. For more information on blocks, please see The Pragmatic Programmer's Guide.
I would highly suggest reading through the help file for the File class for starters.
You can see there the documentation for the open method.
The method signature is File.open(filename, mode)
So, in your example, 'a' is the mode, which in this case is append. Here's a list of valid values for the mode argument:
'r' - Open a file for reading. The file must exist.
'w' - Create an empty file for writing. If a file with the same name already exists its content is erased and the file is treated as a new empty file.
'a' - Append to a file. Writing operations append data at the end of the file. The file is created if it does not exist.
'r+' - Open a file for update (both reading and writing). The file must exist.
'w+' - Create an empty file for both reading and writing. If a file with the same name already exists its content is erased and the file is treated as a new empty file.
'a+' - Open a file for reading and appending. All writing operations are performed at the end of the file, protecting the previous content from being overwritten. You can reposition (fseek, rewind) the internal pointer to anywhere in the file for reading, but writing operations will move it back to the end of file. The file is created if it does not exist.
If File.open is used in a block, such as in your example, then f becomes the block variable that points to the newly-opened file, which allows you to both read and write to the file just using f as the reference, while within the block. Using this form of File.open is nice because it handles closing the file automatically when the block ends.
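A quick way to get a feel for the modes is to try them on a scratch file. The sketch below compares 'w' and 'a'; the file name is made up for the demonstration.

```ruby
require 'tmpdir'

path = File.join(Dir.tmpdir, "mode_demo_#{Process.pid}.txt")

File.open(path, 'w') { |f| f.puts 'first' }    # creates the file
File.open(path, 'a') { |f| f.puts 'second' }   # adds to the end
after_append = File.read(path)

File.open(path, 'w') { |f| f.puts 'third' }    # erases the old content
after_rewrite = File.read(path)

# 'a' is write-only, so reading through that handle raises IOError.
read_error = begin
  File.open(path, 'a') { |f| f.read }
  nil
rescue IOError => e
  e.class
end

File.delete(path)
```

Use 'a+' or 'r+' instead when the same handle has to read as well as write.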
open('myfile.out', 'a') -> Here 'a' means Write only access. Pointer is positioned at end of file.
|f| is the block parameter that holds the opened File object; the block calls puts on it to write "Hello, world." to the file.
Instead of |f|, you can write anything, say |abc| or |line|, it doesn't matter.

Resources