It's common knowledge in most programming languages that the flow for working with files is open-use-close. Yet I have often seen unmatched File.open calls in Ruby code, and moreover I found this gem of knowledge in the Ruby docs:
I/O streams are automatically closed when they are claimed by the garbage collector.
The friendly IRC take on the issue (from #darkredandyellow):
[17:12] yes, and also, the number of file descriptors is usually limited by the OS
[17:29] I assume you can easily run out of available file descriptors before the garbage collector cleans up. In that case, you might want to close them yourself. "Claimed by the garbage collector" means that the GC acts at some point in the future, and it's expensive. There are a lot of reasons for explicitly closing files.
Do we need to explicitly close?
If yes, then why does the GC autoclose?
If not, then why the option?
I have often seen unmatched File.open calls in Ruby code
Can you give an example? I only ever see that in code written by newbies who lack the "common knowledge in most programming languages that the flow for working with files is open-use-close".
Experienced Rubyists either explicitly close their files, or, more idiomatically, use the block form of File.open, which automatically closes the file for you. Its implementation basically looks something like this:
def File.open(*args, &block)
  # Dispatch on whether a block was given.
  return open_with_block(*args, &block) if block_given?
  open_without_block(*args)
end

def File.open_without_block(*args)
  # do whatever ...
end

def File.open_with_block(*args)
  yield f = open_without_block(*args)
ensure
  f.close if f  # guard in case open_without_block raised before f was assigned
end
Scripts are a special case. Scripts generally run so short, and use so few file descriptors that it simply doesn't make sense to close them, since the operating system will close them anyway when the script exits.
Do we need to explicitly close?
Yes.
If yes then why does the GC autoclose?
Because after it has collected the object, there is no way for you to close the file anymore, and thus you would leak file descriptors.
Note that it's not the garbage collector that closes the files. The garbage collector simply executes any finalizers for an object before it collects it. It just so happens that the File class defines a finalizer which closes the file.
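You can see the same mechanism with ObjectSpace.define_finalizer. This is just an illustration of how finalizers work, not File's actual implementation:

f = File.open('example.txt', 'w')  # made-up filename

# Attach a finalizer; the GC runs it before reclaiming the object.
# (Don't capture f in the proc, or the object can never be collected.)
ObjectSpace.define_finalizer(f, proc { |id| warn "finalizer ran for object #{id}" })

f = nil
GC.start  # the finalizer runs at some point after the object becomes unreachable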
If not then why the option?
Because wasted memory is cheap, but wasted file descriptors aren't. Therefore, it doesn't make sense to tie the lifetime of a file descriptor to the lifetime of some chunk of memory.
You simply cannot predict when the garbage collector will run. You cannot even predict if it will run at all: if you never run out of memory, the garbage collector will never run, therefore the finalizer will never run, therefore the file will never be closed.
You should always close file descriptors after use; closing also flushes any buffered writes. People often use File.open or an equivalent method with a block to manage the file descriptor's lifetime. For example:
File.open('foo', 'w') do |f|
  f.write "bar"
end
In that example the file is closed automatically.
According to the File.open documentation (http://ruby-doc.org/core-2.1.4/File.html#method-c-open):

With no associated block, File.open is a synonym for ::new. If the optional code block is given, it will be passed the opened file as an argument and the File object will automatically be closed when the block terminates. The value of the block will be returned from File.open.
Therefore, the file will automatically be closed when the block terminates :D
1. Yes.
2. In case you don't close it yourself, or if some other failure prevents you from getting to the close.
3. See 2.
We can use File.read to read a whole file in Ruby. For example:

file_variable = File.read("filename.txt")

Here file_variable holds the full contents of the file, and File.read opens and closes the file for you.
Related
Is there a way to create a single IO object whose read stream is the current process's STDOUT and whose write stream is the current process's STDIN?
This is similar to IO.popen, which runs a command as a subprocess and returns an IO object connected to the subprocess's standard streams. However, I don't want to run a subprocess; I want to use the current Ruby process.
Is there a way to create a single IO object
No. STDIN and STDOUT are two different file descriptors. An IO represents a single FD.
You can however, make something that acts like an IO object.
This probably contains a bunch of bugs as duplicating FDs is often bad.
require "forwardable"
class IOTee < IO
extend Forwardable
def_delegators :#in,
:close_read,
:read,
:read_nonblock,
:readchar,
:readlines,
:readpartial,
:sysread
def_delegators :#out,
:close_write,
:syswrite,
:write,
:write_nonblock
def initialize(input,output)
#in = input
#out = output
end
end
io = IOTee.new(STDIN, STDOUT) # You would swap these
io.puts("hi")
hi
=> nil
Depending on what you're doing, IO.pipe and IO#reopen could also be helpful.
http://ruby-doc.org/core-2.1.0/IO.html#method-i-reopen
http://ruby-doc.org/core-2.1.0/IO.html#method-c-pipe
I suspect that the above isn't really the problem you want to solve, but the problem you hit with your solution to the problem.
I suspect really making a pipe and reopening STDOUT and STDIN to either end is what you're really after. Combining them in a single IO object doesn't make much sense.
Also, if you were talking to yourself via STDIN and STDOUT, it would be very easy to reach a deadlock while you wait for yourself to read or write data.
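If that's the case, a minimal sketch of the pipe-and-reopen approach might look like this (mind the deadlock warning above; this small write fits comfortably in the pipe's buffer):

reader, writer = IO.pipe

STDOUT.reopen(writer)  # writes to STDOUT now land in the pipe
STDIN.reopen(reader)   # reads from STDIN now come from the pipe

STDOUT.puts 'hi'
STDOUT.flush
STDERR.puts STDIN.gets # echo via STDERR so it doesn't loop back into the pipe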
I'd like to read and write a file atomically in Ruby between multiple independent Ruby processes (not threads).
I found atomic_write from ActiveSupport. This writes to a temp file, then moves it over the original and sets all permissions. However, this does not prevent the file from being read while it is being written.
I have not found any atomic_read. (Are file reads already atomic?)
Do I need to implement my own separate 'lock' file that I check for before reads and writes? Or is there a better mechanism already present in the file system for flagging a file as 'busy' that I could check before any read/write?
The motivation is dumb, but included here because you're going to ask about it.
I have a web application using Sinatra and served by Thin which (for its own reasons) uses a JSON file as a 'database'. Each request to the server reads the latest version of the file, makes any necessary changes, and writes out changes to the file.
This would be fine if I only had a single instance of the server running. However, I was thinking about having multiple copies of Thin running behind an Apache reverse proxy. These are discrete Ruby processes, and thus running truly in parallel.
Upon further reflection I realize that I really want to make the act of read-process-write atomic. At which point I realize that this basically forces me to process only one request at a time, and thus there's no reason to have multiple instances running. But the curiosity about atomic reads, and preventing reads during write, remains. Hence the question.
You want to use File#flock in exclusive mode. Here's a little demo. Run this in two different terminal windows.
filename = 'test.txt'

File.open(filename, File::RDWR) do |file|
  file.flock(File::LOCK_EX)  # blocks until any other process releases its lock

  puts "content: #{file.read}"
  puts 'doing some heavy-lifting now'
  sleep(10)
end                          # the lock is released when the file is closed
Take a look at transaction and open_and_lock_file methods in "pstore.rb" (Ruby stdlib).
YAML::Store works fine for me. So when I need to read/write atomically I (ab)use it to store data as a Hash.
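A minimal sketch of that (the filename and key are made up):

require 'yaml/store'

store = YAML::Store.new('data.yml')

# transaction locks the backing file, so independent processes
# serialize their read-modify-write cycles.
store.transaction do
  store[:counter] = (store[:counter] || 0) + 1
end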
I wrote a script that operates on my Mac just fine. It has this line of code in it:
filename = "2011"
File.open(filename, File::WRONLY|File::CREAT|File::EXCL) do |logfile|
logfile.puts "MemberID,FirstName,LastName,BadEmail,gender,dateofbirth,ActiveStatus,Phone"
On Windows the script runs fine and creates the logfile 2011, but it doesn't actually write anything to that logfile; the file is created, the script runs, but the logging doesn't happen.
Does anyone know why? I can't think of what would have changed in the actual functionality of the script that would cause the logging to cease.
First, for clarity I wouldn't use the flags to specify how to open/create the file. I'd use:
File.open(filename, 'a')
That's the standard mode for log files; you want to create the file if it doesn't exist and append to it if it does.
Logging typically means writing to the same file many times over the running time of an application. People like to open the log and leave it open, but there's potential for problems if the code crashes before the file is closed or flushed by Ruby or the OS. Also, the built-in buffering by Ruby and the OS means output accumulates and then flushes in bursts, so when you're tailing the file it jumps in big chunks, which isn't much good if you're watching for something.
You can tell Ruby to force flushing immediately when you write to the file by setting sync = true:
logfile = File.open(filename, 'a')
logfile.sync = true
logfile.puts 'foo'
logfile.close
You could use fsync, which also forces the OS to flush its buffer.
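A small sketch of the difference (filename as above):

logfile = File.open(filename, 'a')
logfile.puts 'foo'
logfile.flush  # push Ruby's internal buffer to the OS
logfile.fsync  # ask the OS to push its own buffer to disk
logfile.close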
The downside to forcing sync in either way is you negate the advantage of buffering your I/O. For normal file writing, like to a text file, don't use sync because you'll slow your application down. Instead let normal I/O happen as Ruby and the OS want. But for logging it's acceptable because logging should periodically send a line, not a big blob of text.
You could immediately flush the output, but that gets redundant and violates the DRY principle:
logfile = File.open(filename, 'a')
logfile.puts 'foo'
logfile.flush
logfile.puts 'bar'
logfile.flush
logfile.close
close flushes before actually closing the file I/O.
You can wrap your logging output in a method:
def log(text)
  File.open(log_file, 'a') do |logout|
    logout.puts(text)
  end
end
That'll open, then close, the log file each time, automatically flushing the buffer on close and removing the need for sync.
Or you could take advantage of Ruby's Logger class and let it do all the work for you.
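A minimal sketch of that (the filename and messages are placeholders):

require 'logger'

logger = Logger.new('script.log')  # Logger opens the file and handles the writes for you
logger.info 'script started'
logger.warn 'something worth noting'
logger.close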
I am performing very rapid file access in Ruby (2.0.0 p39474) and keep getting the exception Too many open files.
Having looked at this thread, here, and various other sources, I'm well aware of the OS limits (set to 1024 on my system).
The part of my code that performs this file access is mutexed, and takes the form:
File.open( filename, 'w'){|f| Marshal.dump(value, f) }
where filename is subject to rapid change, depending on the thread calling the section. It's my understanding that this form relinquishes its file handle after the block.
I can verify the number of File objects that are open using ObjectSpace.each_object(File). This reports that there are up to 100 resident in memory, but only one is ever open, as expected.
What's more, the exception itself is thrown at a time when only 10-40 File objects are reported by ObjectSpace. Manually garbage collecting fails to improve any of these counts, as does slowing down my script by inserting sleep calls.
My question is, therefore:
Am I fundamentally misunderstanding the nature of the OS limit? Does it cover the whole lifetime of a process?
If so, how do web servers avoid crashing out after accessing more than ulimit -n files?
Is Ruby retaining its file handles outside of its object system, or is the kernel simply very slow at counting 'concurrent' access?
Edit 20130417:
strace indicates that Ruby doesn't write all of its data to the file before returning and releasing the mutex. As such, the file handles stack up until the OS limit is hit.
In an attempt to fix this, I have used syswrite/sysread, synchronous mode, and called flush before close. None of these methods worked.
My question is thus revised to:
Why is ruby failing to close its file handles, and how can I force it to do so?
Use dtrace or strace or whatever equivalent is on your system, and find out exactly what files are being opened.
Note that these could be sockets.
I agree that the code you have pasted does not seem to be capable of causing this problem, at least, not without a rather strange concurrency bug as well.
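As a complement to strace, on Linux you can ask the kernel directly which descriptors the process has open; a quick sketch (/proc is Linux-specific):

fds = Dir.glob("/proc/#{Process.pid}/fd/*")
puts "open descriptors: #{fds.size}"
fds.each do |fd|
  target = File.readlink(fd) rescue '?'
  puts "#{fd} -> #{target}"  # sockets and pipes show up here too, not just files
end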
I am trying to write to a single file from multiple threads. The problem I'm running into is that I don't see anything being written to the file until the program exits.
You need to call file.flush to write it out. You can also set file.sync = true to have it flush automatically.
What is the value of the sync method on your IO object? It is possible that either Ruby or the underlying OS is buffering the file output.
Check out the references on buffering and syncing within the IO documentation.
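For completeness, a minimal sketch of the sync approach with multiple threads (the filename is a placeholder; the mutex keeps whole lines from interleaving):

file = File.open('shared.log', 'a')
file.sync = true  # every write is flushed straight through to the OS

mutex = Mutex.new
threads = 4.times.map do |i|
  Thread.new do
    mutex.synchronize { file.puts "thread #{i} was here" }
  end
end
threads.each(&:join)
file.close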