Where does Ruby keep track of its open file descriptors?

What This Question Is Not About
This question is not about how to auto-close a file with File#close or the File.open block syntax. It's a question about where Ruby stores its list of open file descriptors at runtime.
The Actual Question
If you have a program with open descriptors, but you don't have access to the related File or IO object, how can you find a reference to the currently-open file descriptors? Take this example:
filename='/tmp/foo'
%x( touch "#{filename}" )
File.open(filename)
filehandle = File.open(filename)
The first File instance is opened, but the reference to the object is not stored in a variable. The second instance is stored in filehandle, where I can easily access it with #inspect or #close.
However, the discarded File object isn't gone; it's just not accessible in any obvious way. Until the object is finalized, Ruby must be keeping track of it somewhere...but where?

TL;DR
All live File and IO objects can be reached through ObjectSpace.
Answer
The documentation for the ObjectSpace module says:
The ObjectSpace module contains a number of routines that interact with the garbage collection facility and allow you to traverse all living objects with an iterator.
How I Tested This
I tested this at the console on Ruby 1.9.3p194.
The test fixture is really simple. The idea is to have two File objects with different object identities, but only one is directly accessible through a variable. The other is "out there somewhere."
# Don't save a reference to the first object.
filename='/tmp/foo'
File.open(filename)
filehandle = File.open(filename)
I then explored different ways I could interact with the File objects even if I didn't use an explicit object reference. This was surprisingly easy once I knew about ObjectSpace.
# List all open File objects.
ObjectSpace.each_object(File) do |f|
  puts "%s: %d" % [f.path, f.fileno] unless f.closed?
end
# List the "dangling" File object which we didn't store in a variable.
ObjectSpace.each_object(File) do |f|
  unless f.closed?
    printf "%s: %d\n", f.path, f.fileno unless f === filehandle
  end
end
# Close any dangling File objects. Ignore already-closed files, and leave
# the "accessible" object stored in *filehandle* alone.
ObjectSpace.each_object(File) {|f| f.close unless f === filehandle rescue nil}
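The snippets above could be folded into a small reusable helper. The following is only a sketch: the method name open_file_table is made up for illustration, and it assumes that any File object which raises IOError when inspected (for example, one that is not fully initialized) can simply be skipped.

```ruby
require 'tempfile'

# Hypothetical helper (not part of Ruby): returns [path, fileno] pairs
# for every File object that ObjectSpace can still reach and that is open.
def open_file_table
  ObjectSpace.each_object(File).map do |f|
    begin
      f.closed? ? nil : [f.path, f.fileno]
    rescue IOError
      nil # assumption: skip File objects that cannot be inspected
    end
  end.compact
end

scratch = Tempfile.new('fd-demo') # scratch file so the demo cleans up after itself
handle  = File.open(scratch.path)

# Print every open File the interpreter still knows about.
open_file_table.each { |path, fd| puts "#{path}: #{fd}" }

handle.close
scratch.close! # close and delete the scratch file
```

Because the helper returns plain data rather than printing, the result can be filtered or compared before deciding which handles to close.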
Conclusion
There may be other ways to do this, but this is the answer I came up with to scratch my own itch. If you know a better way, please post another answer. The world will be a better place for it.

Related

In ruby, how to rename a text file, keep same handle and delete the old file

I am using a constant (FLOG) as a handle to write to my log. At a given point, I have to use a temporary log, and later append that content to the regular log, all that with the same handle, which is used through a bunch of methods.
My test program is below. After closing the handle 'FLOG' associated with the temp log, when I re-assign FLOG to the new log, this somehow re-opens the temp log, and I can't delete it.
Is there a way to make sure that the old temp file stays closed, so that I can delete it?
# Pre-existing log:
final_log = "final_#{Time.now.strftime("%Y%m%d")}.txt"
#Writing something in it
File.open(final_log, "w+") { |file| file.write("This is the final log: #{final_log}\n") }
# temp log:
temp_log = "temp_#{Time.now.strftime("%Y%m%d")}.txt"
FLOG = File.new(temp_log, "w+")
# write some stuff in temp_log
FLOG.puts "Writing in temp_log named #{temp_log}"
# closing handle for temp_log
FLOG.close
# avoid constant reuse warning:
Object.send(:remove_const,'FLOG') if Object.const_defined?('FLOG')
# need to append temp_log content to final_log with handle FLOG
FLOG = File.open(final_log, "a+")
# appending old temp log to new log
File.open(temp_log, "r").readlines.each do |line|
  puts "appending... #{line}"
  FLOG.puts "appending... #{line}"
end
# closing handle
FLOG.close
# this tells me that 'temp_log' is somehow re-opened:
ObjectSpace.each_object(File) { |f| puts("3: #{temp_log} is open") if f.path == temp_log && !f.closed? }
File.delete(temp_log) # Cant do that:
# test_file2.rb:35:in `delete': Permission denied - temp_20150324.txt (Errno::EACCES)
If you're going to use a temp file, use Tempfile:
require 'tempfile'
# Pre-existing log:
final_log = "final_#{Time.now.strftime("%Y%m%d")}.txt"
#Writing something in it
File.open(final_log, "w+") { |file| file.write("This is the final log: #{final_log}\n") }
# give the tempfile a meaningful prefix
temp_log = Tempfile.new('foobar')
begin
  $flog = temp_log
  # write some stuff in temp_log
  $flog.puts "Writing in temp_log named #{temp_log.path}"
  # need to append temp_log content to final_log with handle $flog
  $flog = File.open(final_log, "a+")
  # reopen temp_log for reading, append to new log
  temp_log.open.readlines.each do |line|
    puts "appending... #{line}"
    $flog.puts "appending... #{line}"
  end
  # closing handle
  $flog.close
ensure
  # close and delete temp_log (an open file cannot be unlinked on Windows)
  temp_log.close!
end
And while globals are generally bad, hacking a constant so that you can use it like a global is worse.
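One way to avoid both the global and the reassigned constant is to pass the log handle around as an ordinary argument. This is a rough sketch, assuming Ruby 2.1 or later for Tempfile.create; the method names write_entries and merge_temp_log are invented for this example, and a temporary directory stands in for the real log location.

```ruby
require 'tempfile'
require 'tmpdir'

# Hypothetical helper: writes each entry to whatever IO-like log it is given.
def write_entries(log, entries)
  entries.each { |entry| log.puts(entry) }
end

# Hypothetical helper: collects entries in a temp file, then appends them
# to the final log. Tempfile.create with a block closes and removes the
# temp file when the block exits.
def merge_temp_log(final_path, entries)
  Tempfile.create('temp_log') do |temp|
    write_entries(temp, entries)
    temp.rewind # go back to the start so the entries can be read
    File.open(final_path, 'a') do |final|
      temp.each_line { |line| final.puts "appending... #{line}" }
    end
  end
end

Dir.mktmpdir do |dir|
  final_log = File.join(dir, 'final.txt')
  File.write(final_log, "This is the final log\n")
  merge_temp_log(final_log, ['first entry', 'second entry'])
  puts File.read(final_log)
end
```

With the block forms of Tempfile.create and File.open, every handle is closed by the time merge_temp_log returns, so nothing is left to delete by hand.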
temp_log is still open because you didn't close it. If you did something like:
temp_log_lines = File.open(temp_log, 'r') { |f| f.readlines }
then the I/O stream for temp_log would be closed at the end of the block. However, doing File.open(temp_log, "r").readlines takes the IO object returned by File.open and calls readlines on it, which you then call each on with an accompanying block. Since the block is part of your call to each and not File.open, the stream is not closed at the end of it, and stays open for the rest of the program.
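The difference between the two call styles can be checked directly with closed?. A minimal sketch, using a scratch Tempfile in place of temp_log; the variable names are illustrative only.

```ruby
require 'tempfile'

scratch = Tempfile.new('readlines-demo')
scratch.puts 'one', 'two'
scratch.close # flush to disk; the file stays on disk until close! is called

# Without a block, the File returned by File.open stays open after readlines.
chained = File.open(scratch.path, 'r')
chained_lines = chained.readlines
puts chained.closed? # false
chained.close

# With a block, File.open closes the stream when the block returns.
seen = nil
block_lines = File.open(scratch.path, 'r') { |f| seen = f; f.readlines }
puts seen.closed? # true

# File.readlines opens, reads, and closes in a single call.
short_lines = File.readlines(scratch.path)

scratch.close! # close and delete the scratch file
```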
As to why you can't delete temp_log at the end of the program, it's hard to say without knowing what's going on in the underlying file system. Neither Ruby nor the underlying (POSIX) OS will complain if you delete a file that you've opened a stream for and not closed; the file will be unlinked but the stream will persist and still have the contents of the file and so on. The error you're getting is saying that the owner of the Ruby process for this program doesn't have the rights to delete the file the program created. That's strange, but hard to diagnose just from this code. Consider the directory the program is working in, what the permissions on it are, etc.
Speaking more generally, there are some things you could make use of in Ruby here that would make your life easier.
If you want a temporary file, there is a Tempfile class you could make use of that does a lot of legwork for you.
The idiomatic way of doing I/O with files in Ruby is to pass a block into File.open. The block is handed an I/O stream for the file that is automatically closed at the end of the block, so you don't need to do it manually. Here's an example:
File.open(temp_log, 'w+') do |f|
  f.puts "Writing in temp_log named #{temp_log}"
end
FLOG is not a true constant in your code. A constant is only a constant if its value never changes throughout the life of the program it's declared in. Ruby is a very permissive language, so it allows you to reassign constants, but warns you if you do. Stricter languages would throw an error if you did that. FLOG is just a normal variable and should be written flog. A good use for a constant is a value external to the action of your program that your program needs to be able to reference. For instance, instead of writing 3.141592653589793 every time you need to refer to an approximation of pi, you could declare PI = 3.141592653589793 and use PI afterwards. In Ruby, this has been done for you in the Math module: Math::PI returns this value. User settings are another place constants often show up: they're determined before the program gets going, help to determine what it does, and should be unmodified during its execution, so storing them in constants sometimes makes sense.
You describe the program you supplied as a test program. Ruby has really great testing libraries you could make use of that will be nicer than writing scripts like this. Minitest is part of the Ruby standard library and is my favorite testing framework in Ruby, but a lot of people like RSpec too. (I'd like to link to the documentation for those frameworks, but I don't have enough reputation—sorry. You'll have to Google.)
It'll be hard to make use of those frameworks if you write your code imperatively like this, though. Ruby is a very object-oriented language and you'll get a lot out of structuring your code in an object-oriented style when working in it. If you're not familiar with OO design, some books that have been really good for me are Practical Object-Oriented Design in Ruby by Sandi Metz, Refactoring: Improving the Design of Existing Code by Martin Fowler et al., and Growing Object-Oriented Software, Guided by Tests, by Steve Freeman and Nat Pryce. (Same thing here with the lack of links.)

Difference with open-uri with block and without it

What is the difference between doing:
file = open('myurl')
# Do stuff with file
And doing:
open('myurl') do |file|
  # Do things with file
end
Do I need to close and remove the file when I am not using the block approach? If so, how do I close and remove it? I don't see any close/remove method in the docs
The documentation for OpenURI is a little opaque to beginners, but the docs for #open can be found here.
Those docs say:
#open returns an IO-like object if block is not given. Otherwise it yields the IO object and return the value of the block.
The key words here are "IO-like object." We can infer from that that the object (in your examples, file), will respond to the #close method.
While the documentation doesn't say so, by looking at the source we can see that #open will return either a StringIO or a Tempfile object, depending on the size of the data returned. OpenURI's internal Buffer class first initializes a StringIO object, but if the size of the output exceeds 10,240 bytes it creates a Tempfile and writes the data to it (to avoid storing large amounts of data in memory). Both StringIO and Tempfile have behavior consistent with IO, so it's good practice (when not passing a block to #open), to call #close on the object in an ensure:
file = nil
begin
  file = open(url)
  # ...do some work...
ensure
  # file is still nil if open itself raised, so guard the call
  file.close if file
end
Code in the ensure section always runs, even if code between begin and ensure raises an exception, so this will, well, ensure that file.close gets called even if an error occurs.
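The ensure behaviour can be tried without any network access. In this sketch a StringIO stands in for whatever open(url) would have returned, and the method name consume is invented for the example.

```ruby
require 'stringio'

# Hypothetical helper: hands the IO-like object to the block and closes it
# afterwards, whether or not the block raised.
def consume(io)
  yield io
ensure
  io.close unless io.closed?
end

io = StringIO.new('payload')
begin
  consume(io) { |f| raise "failed after reading #{f.read.size} bytes" }
rescue RuntimeError => e
  puts e.message
end
puts io.closed?
```

The exception still propagates to the caller; ensure only guarantees that the cleanup runs first.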

Learn Ruby the Hard Way ex17 extra credit 3 - consolidating to one line

For exercise 17, through searching other responses I was able to condense the following into one line (as asked in the extra credit #3)
from_file, to_file = ARGV
script = $0
input = File.open(from_file)
indata = input.read()
output = File.open(to_file, 'w')
output.write(indata)
output.close()
input.close()
I was able to condense it into:
from_file, to_file = ARGV
script = $0
File.open(to_file, 'w') {|f| f.write IO.read(from_file)}
Is there a better/different way to condense this into 1 line?
Can someone help explain the line I created? I created this from various questions/answers unrelated to this question. I have tried looking up exactly what I did but I am still a little lost and want a full understanding of it.
Similar to using IO::read to simplify "just read the whole file into a string", you can use IO::write to "just write the string to the file":
from_file, to_file = ARGV
IO.write(to_file, IO.read(from_file))
Since you don't use script, it can be removed. If you really want to get things down to one line, you can do:
IO.write(ARGV[1], IO.read(ARGV[0]))
I personally find this just as comprehensible, and the lack of error checking is equivalent.
You're using File::open with a block to open to_file in write-only mode ('w'). Inside the block you have access to the open file as f, and the file will be closed for you when the block terminates. IO::read reads the entire contents of from_file, which you then pass to IO#write on f (File is a subclass of IO), writing those contents to f (which is the open, write-only File for to_file).
There are always different ways of doing things:
Using File.open with a block is a good approach here. I like that to_file and from_file are declared in variables. So I think this is a good and readable solution that is not overly verbose.
The basic approach here is swapping out open/close operations with the more-clean File.open method with a block. File.open with a block will open a file, run the block, and then close the file, which is exactly what is needed here. Because the method automatically opens and closes the file, we are able to remove the boilerplate code that appears in the initial example. IO.read is another shortcut method that allows us to open/read/close the file without all of the open/close boilerplate. This is an exercise to learn more about Ruby's standard File/IO library, and in this case swapping out the more verbose methods is sufficient to reduce things to a single line.
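A further option, not among the answers above, is IO.copy_stream, which accepts file names (or IO objects) for both source and destination and does not require reading the whole file into a string first. A small sketch; the wrapper name copy_file is invented, and a temporary directory stands in for the paths that would come from ARGV.

```ruby
require 'tmpdir'

# Hypothetical wrapper around IO.copy_stream; returns the number of bytes copied.
def copy_file(from_file, to_file)
  IO.copy_stream(from_file, to_file)
end

Dir.mktmpdir do |dir|
  from_file = File.join(dir, 'from.txt')
  to_file   = File.join(dir, 'to.txt')
  File.write(from_file, "some text\n")

  copied = copy_file(from_file, to_file)
  puts "copied #{copied} bytes"
end
```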
I'm just a complete beginner, but this works for me:
open(ARGV[1], 'w').write(open(ARGV[0]).read)
It doesn't look elegant for me, but it works.
Edit: This is my attempt to put the entire script into one line if it's not clear.

Is it possible to refer to a parameter passed to a method within the passed block in ruby?

I hope I am not repeating anyone here, but I have been searching google and here and not coming up with anything. This question is really more a matter of "sexifying" my code.
What I am specifically trying to do is this:
Dir.new('some_directory').each do |file|
  # is there a way to refer to the string 'some_directory' via a method or variable?
end
Thanks!
Not in general; it's totally up to the method itself what arguments the block gets called with, and by the time each has been called (which calls your block), the fact that the string 'some_directory' was passed to Dir.new has been long forgotten, i.e. they're quite separate things.
You can do something like this, though:
Dir.new(my_dir = 'some_directory').each do |file|
  puts "#{my_dir} contains #{file}"
end
The reason it won't work is that new and each are two different methods so they don't have access to each others' parameters. To 'sexify' your code, you could consider creating a new method to contain the two method calls and pass the repeated parameter to that:
def do_something(dir)
  Dir.new(dir).each do |file|
    # use dir in some way
  end
end
The fact that creating a new method has such a low overhead means it's entirely reasonable to create one for as small a chunk of code as this - and is one of the many reasons that make Ruby such a pleasure of a language to work with.
Just break it out into a variable. Ruby blocks are closures so they will have access -
dir = 'some_directory'
Dir.new(dir).each do |file|
  # use dir here as expected.
end
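One more possibility, not mentioned in the answers above: if the Dir object itself is kept in a variable, Dir#path returns the string it was created with. A small sketch; the method name list_with_parent is invented, and a temporary directory stands in for 'some_directory'.

```ruby
require 'tmpdir'

# Hypothetical helper: describes each entry of a directory together with
# the path the Dir object was created from.
def list_with_parent(dir_name)
  dir = Dir.new(dir_name)
  dir.reject { |file| file == '.' || file == '..' }
     .map { |file| "#{dir.path} contains #{file}" }
     .sort
ensure
  dir.close if dir
end

Dir.mktmpdir do |tmp|
  File.write(File.join(tmp, 'a.txt'), '')
  puts list_with_parent(tmp)
end
```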

What does 'a' mean in Ruby `open()`, and what does |f| mean?

What does 'a' and |f| mean below ?
open('myfile.out', 'a') { |f|
  f.puts "Hello, world."
}
From the Ruby IO doc:
"a"  |  Write-only, starts at end of file if file exists,
     |  otherwise creates a new file for writing.
The |f| is a variable that holds the IO object in the block (everything in the {}). So when you call f.puts "Hello, world." you're calling puts on the IO object, which then writes to the file.
The 'a' is just a file open mode, like you'd see in C / C++. It means append, and is relatively uncommon - you're more likely to be familiar with 'r' (read), 'w' (write), etc.
The {|f| ... } bit is the exciting part. It's called a block - they're everywhere, and they're probably my favourite part of Ruby - I've gone back to C++ recently, and I find myself cursing the language for not supporting them all the time.
Think of code like foo(bar) {|baz| ... } as creating a nameless function, and passing that function as another (hidden) argument to foo (kinda like this is a hidden argument to member functions in C++) - it's just not as hidden, 'cause you specify it right there.
Now, when you pass the block to foo, it will eventually call your block (using the yield statement), and it will supply the argument baz. If my foo behaved like your File.open function, its definition would look something like this:
def foo(filename, &block)
  file = File.open(filename)
  yield(file)
ensure
  # close the file even if the block raises
  file.close if file
end
You can see how it opens the file, passes it to your block with yield, and then closes the file once your block returns. Very convenient - blocks are your friends!
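The real File.open also works without a block, in which case it simply returns the open file and leaves closing to the caller. Here is a sketch of how a method might support both call styles, with StringIO standing in for a real file; open_resource is a made-up name.

```ruby
require 'stringio'

# Hypothetical opener: returns the IO when called without a block,
# otherwise yields it and closes it once the block is done.
def open_resource(text)
  io = StringIO.new(text)
  return io unless block_given?

  begin
    yield io
  ensure
    io.close
  end
end

# Block form: the method returns the block's value and closes the IO.
shouted = open_resource('hello') { |io| io.read.upcase }
puts shouted

# Non-block form: the caller owns the IO and must close it.
io = open_resource('hello')
puts io.read
io.close
```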
Another good place to start wrapping your head around them is the each function - one of the simplest and most common block functions in Ruby:
[holt#Michaela ~]$ irb
irb(main):001:0> ['Welcome', 'to', 'Ruby!'].each {|word| puts word}
Welcome
to
Ruby!
=> ["Welcome", "to", "Ruby!"]
irb(main):002:0>
This time, your block gets called three times, and each time a different array element gets yielded to your block as word - it's a super-simple way to call a function for every element of an array.
Hope this helps, and welcome to Ruby!
'a' -> Mode in which to open the file ('append' mode)
f is a parameter to the block. A block is a piece of code that can be executed (it is a Proc object underneath).
Here, f will be the File object for the opened file (not a raw integer file descriptor).
1) You call the open method, passing in the two arguments:
myfile.out <-- This is your file that you want to access
a <-- you are stating that you want to write to a file, starting at the end of the file(aka append)
2) The open method, which lives in Kernel, yields an IO stream object (the |f|) that you can access throughout your block.
3) You are appending "hello world" to myfile.out
4) Once the block ends, the IO stream closes.
The 'a', which stands for append, opens the file in write-only mode and starts writing at the end of the file. If no file exists, a new file is created. Please see the Ruby Docs for more information.
The |f| is a block parameter, which is being passed within the {}. For more information on blocks, please see The Pragmatic Programmer's Guide.
I would highly suggest reading through the help file for the File class for starters.
You can see there the documentation for the open method.
The method signature is File.open(filename, mode)
So, in your example, a, is the mode which in this case is append. Here's a list of valid values for the mode argument:
'r' - Open a file for reading. The file must exist.
'w' - Create an empty file for writing. If a file with the same name already exists its content is erased and the file is treated as a new empty file.
'a' - Append to a file. Writing operations append data at the end of the file. The file is created if it does not exist.
'r+' - Open a file for update both reading and writing. The file must exist.
'w+' - Create an empty file for both reading and writing. If a file with the same name already exists its content is erased and the file is treated as a new empty file.
'a+' - Open a file for reading and appending. All writing operations are performed at the end of the file, protecting the previous content from being overwritten. You can reposition (fseek, rewind) the internal pointer to anywhere in the file for reading, but writing operations will move it back to the end of the file. The file is created if it does not exist.
If File.open is given a block, as in your example, then f is the block variable that refers to the newly opened file, which lets you work with the file (reading or writing, as the mode permits) using just f while inside the block. This form of File.open is nice because it closes the file automatically when the block ends.
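A few of the modes listed above can be compared side by side. This is only a sketch: the method name mode_demo and the file names are invented, and everything happens inside a temporary directory.

```ruby
require 'tmpdir'

# Hypothetical demo: exercises 'w', 'a', and 'r' inside the given directory
# and returns what was observed.
def mode_demo(dir)
  path = File.join(dir, 'modes.txt')
  results = {}

  File.open(path, 'w') { |f| f.puts 'first' }  # creates the file
  File.open(path, 'a') { |f| f.puts 'second' } # appends at the end
  results[:after_append] = File.read(path)

  File.open(path, 'w') { |f| f.puts 'third' }  # truncates existing content
  results[:after_rewrite] = File.read(path)

  begin
    File.open(path, 'a') { |f| f.read }        # 'a' is write-only
  rescue IOError => e
    results[:read_in_append_mode] = e.class
  end

  begin
    File.open(File.join(dir, 'missing.txt'), 'r') # 'r' needs an existing file
  rescue Errno::ENOENT => e
    results[:read_missing] = e.class
  end

  results
end

Dir.mktmpdir { |dir| p mode_demo(dir) }
```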
open('myfile.out', 'a') -> Here 'a' means write-only access, with the pointer positioned at the end of the file.
|f| is the block parameter that receives the open File object; f.puts writes "Hello, world." to it.
Instead of |f|, you can write anything, say |abc| or |line|, it doesn't matter.
