How to handle exceptions in Find.find - ruby

I'm processing files and directories looking for the most recent modified file in each directory. The code I have works but, being new to Ruby, I'm having trouble handling errors correctly.
I use Find.find to get a recursive directory listing, calling my own function newestFile for each directory:
Find.find(ARGV[0]) { |f|
  if File.directory?(f)
    newestFile(f)
  end
}
In the directory tree there are folders I do not have permission to access, so I want to ignore them and go on to the next, but I cannot see how to incorporate the exception handling into the Find.find "loop".
I tried to put begin..rescue..end around the block but that does not allow me to continue processing the loop.
I also found this SO question, How to continue processing a block in Ruby after an exception?, but that handles an error raised inside the loop body. I'm trying to recover from errors occurring in Find.find itself, which happen outside my exception block.
Here's the stack trace of the error:
PS D:\dev\ruby\> ruby .\findrecent.rb "M:/main/*"
C:/Ruby200/lib/ruby/2.0.0/find.rb:51:in `open': Invalid argument - M:/main/<A FOLDER I CAN'T ACCESS> (Errno::EINVAL)
from C:/Ruby200/lib/ruby/2.0.0/find.rb:51:in `entries'
from C:/Ruby200/lib/ruby/2.0.0/find.rb:51:in `block in find'
from C:/Ruby200/lib/ruby/2.0.0/find.rb:42:in `catch'
from C:/Ruby200/lib/ruby/2.0.0/find.rb:42:in `find'
from ./findrecent.rb:17:in `<main>'
How do I add exception handling to this code?
I had a look in the code where the exception is being generated and the method contains the following block:
if s.directory? then
  begin
    fs = Dir.entries(file)
  rescue Errno::ENOENT, Errno::EACCES, Errno::ENOTDIR, Errno::ELOOP, Errno::ENAMETOOLONG
    next
  end
  ... more code
As a horrible hack, I added Errno::EINVAL to the list of rescued errors. My code now runs and goes through all the folders, but I can't leave that change in the Ruby library code.
Internally, find uses Dir.entries, so maybe I need to rewrite my code to process the folders myself and not rely on find.
I would still like to know if there is a way of handling errors in this sort of construct, because from reading other code, this kind of small, concise code is used a lot in Ruby.

Do you get this error in your newestFile function or when you call File#directory??
If this happens in newestFile you can do something like this:
Find.find(ARGV[0]) do |f|
  if File.directory?(f)
    newestFile(f) rescue nil
  end
end
This just ignores any errors and punts until the next folder. You could also do some nicer output if desired:
Find.find(ARGV[0]) do |f|
  if File.directory?(f)
    begin
      newestFile(f)
    rescue
      puts "error accessing: #{f}, you might not have permission"
    end
  end
end
If the error happens in the File#directory? you need to wrap that section as well:
Find.find(ARGV[0]) do |f|
  begin
    if File.directory?(f)
      newestFile(f)
    end
  rescue
    puts "error accessing: #{f}, you might not have permission"
  end
end
As you mentioned, if the error is occurring in Find#find itself, then you can't catch it from the block; it would have to be handled inside that method.
Can you confirm that the exception is happening in that method and not the subsequent ones by pasting a stack trace of the exception?
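Even if an error raised while Find.find lists a directory can't be rescued from the block, it may be possible to sidestep it. In find.rb, Find.find yields each directory to your block before it lists that directory, and calling Find.prune from the block makes it skip the directory's contents. The sketch below tries the listing itself inside the block, rescues any SystemCallError (Errno::EACCES, Errno::EINVAL, ...), and prunes on failure. It assumes that a directory your own Dir.entries call can list will also be listable by Find.find; each_listable_dir is a hypothetical helper name, not part of the find library.

```ruby
require 'find'

# Hypothetical helper: yields [dir, entries] for every directory under root
# that Dir.entries can list. Directories that raise while being listed are
# reported and pruned, so Find.find never tries to descend into them.
def each_listable_dir(root)
  return enum_for(:each_listable_dir, root) unless block_given?

  Find.find(root) do |path|
    next unless File.directory?(path)

    begin
      entries = Dir.entries(path)
    rescue SystemCallError => e
      warn "skipping #{path}: #{e.message}"
      Find.prune # throws :prune, so Find.find skips its own listing of path
    end

    yield path, entries
  end
end
```

With the question's code this might become each_listable_dir(ARGV[0]) { |dir, _entries| newestFile(dir) }. Each directory ends up being listed twice (once here, once inside Find.find), which is the cost of not patching find.rb.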
Edit
I was going to suggest traversing the directories yourself with something like Dir.entries, so that you would be able to catch the errors there. One thing I'm curious about is what happens if you leave off the * in the call from the command line. I am on macOS, so I can't fully reproduce what you are seeing, but if I let it traverse a directory that I don't have access to on my Mac, it prints debug info about the folders it can't access but continues. If I give it the *, on the other hand, it seems to do nothing except print the error for the first folder it can't access.
One difference in my experience on macOS is that it isn't actually raising the exception; it just prints that debug info to the console. But it was interesting that including the * made mine stop completely when I didn't have access to a folder.
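The edit above suggests walking the tree yourself with Dir.entries. A minimal recursive sketch of that idea follows, assuming you only need directories and are happy to skip symlinked ones; walk_dirs is a hypothetical name. Because each listing happens in your own code, the rescue sits exactly where the error is raised.

```ruby
# Hypothetical helper: recursively yields every directory under root that
# can be listed. A directory whose listing raises is reported and skipped,
# and the walk continues with its siblings.
def walk_dirs(root, &block)
  begin
    names = Dir.entries(root) - %w[. ..]
  rescue SystemCallError => e
    warn "cannot list #{root}: #{e.message}"
    return
  end

  yield root
  names.each do |name|
    path = File.join(root, name)
    # Do not follow symlinked directories, to avoid cycles.
    walk_dirs(path, &block) if File.directory?(path) && !File.symlink?(path)
  end
end
```

Usage would be walk_dirs(ARGV[0]) { |dir| newestFile(dir) }. Skipping symlinks avoids loops but is a design choice; drop the File.symlink? check if linked directories should be included.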

You can be reactive or proactive, either works, but by testing in advance, your code will run a little faster since you won't be triggering the exception mechanism.
Instead of waiting for a problem to happen then trying to handle the exception, you can find out whether you actually should try to change to a directory or access a file using the File class's owned? and grpowned? methods. From the File documentation:
grpowned?(file_name) → true or false
Returns true if the named file exists and the effective group id of the calling process is the owner of the file. Returns false on Windows.
owned?(file_name) → true or false
Returns true if the named file exists and the effective user id of the calling process is the owner of the file.
That means your code can look like:
Find.find(ARGV[0]) do |f|
  if File.directory?(f) && %w[grpowned? owned?].any? { |m| File.send(m, f) }
    newestFile(f)
  end
end
Ruby will check to see if the directory entry is a directory and whether it is owned or grpowned by the current process. Because && is short-circuiting, if it's not a directory the second set of tests won't be triggered.
On some systems the group permissions will give you a better chance of having access rights if there are lots of shared resources, so that gets tested first; if it returns true, any? returns true and the code proceeds. If it returns false, owned? tests the file, and the code either skips the directory or steps into newestFile. Reverse those two tests for speed depending on the setup of your system: run the code once using time ruby /path/to/your/code, swap the two tests, run it again, and compare the resulting times to see which is faster on your system.
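The checks above decide whether to call newestFile, but on their own they do not stop Find.find from trying to list a directory it cannot read. A hedged proactive variant swaps in File.readable? and File.executable? (read and search permission) for the ownership tests and pairs them with Find.prune, so Find.find skips the directory before listing it. On Windows these permission checks may not reflect folder ACLs, so whether this avoids the asker's Errno::EINVAL is an assumption to verify; each_accessible_dir is a hypothetical name.

```ruby
require 'find'

# Hypothetical helper (proactive sketch): before Find.find lists a directory,
# check that this process has read (r) and search (x) permission on it.
# If not, Find.prune makes Find.find skip it instead of trying to list it.
def each_accessible_dir(root)
  return enum_for(:each_accessible_dir, root) unless block_given?

  Find.find(root) do |path|
    next unless File.directory?(path)
    Find.prune unless File.readable?(path) && File.executable?(path)
    yield path
  end
end
```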
There are different schools of thought about whether using exception handling to control program flow is good and different languages prefer different things in their programming styles. To me, it seems like code will always run faster and more safely if I know in advance whether I can do something, rather than try and have it blow up. If it blows up in an expected way, that's one thing, but if it blows up in ways I didn't expect, then I might not have exception handling in place to react correctly, or it might trigger other exceptions that mask the true cause. I'd rather see if I can work my way out of a situation by checking the state, and then if all my attempts failed, have an exception handler that lets me gracefully exit. YMMV.
Finally, in Ruby we don't name methods in camelCase; we use snake_case (newest_file rather than newestFile). snake_case_is_easier toReadThanCamelCase.

Related

Is Ruby file deletion asynchronous?

I have (had) some Ruby code which for historical reasons is (was) essentially
Dir.mktmpdir do |dir|
  path_list = something_which_creates_files_in(dir)
  path_list.each(&:delete)
end
Occasionally I get (got) exceptions from this code:
Errno::ENOENT: No such file or directory # dir_s_rmdir - /tmp/d2..4w/file.csv
:
/path/to/source.rb:124:in `unlink'
/path/to/source.rb:124:in `delete'
/path/to/source.rb:124:in `each'
:
/usr/lib/ruby/2.5.0/tmpdir.rb:93:in `mktmpdir'
:
So it appears to me that my "cleanup" of the list of paths at the end of the block is not entirely synchronous: (some of) the files still exist after it completes, then mktmpdir removes the temporary directory, so the asynchronous (?) unlink fails because its target has gone away. Is this a reasonable interpretation?
This is more an academic question than anything else; the behaviour puzzles me. Just removing the cleanup (the path_list.each(&:delete)) and leaving the deletion to Dir.mktmpdir seems to stop these exceptions.
If it makes a difference, this is Ruby 2.5 (MRI) running on Linux.
Your assumption seems to be correct.
If you check the source of File.unlink, you can see the following:
static VALUE
rb_file_s_unlink(int argc, VALUE *argv, VALUE klass)
{
    return apply2files(unlink_internal, argc, argv, 0);
}
Here unlink_internal is trivial (just a thin wrapper around a system call?), but what is interesting is the implementation of apply2files. There you can see the following call:
...
rb_thread_call_without_gvl(no_gvl_apply2files, aa, RUBY_UBF_IO, 0);
...
where aa is some fancy struct that contains among other things a pointer to the thing that we want to apply - unlink in our case.
The name of this function is quite self-descriptive, but the source contains some documentation too, so we can just refer to it:
/*
* rb_thread_call_without_gvl - permit concurrent/parallel execution.
* rb_thread_call_without_gvl2 - permit concurrent/parallel execution
* without interrupt process.
...
So from what I see (disclaimer: without a really careful analysis of the source code :)), the deletions within the block in question 1) do happen concurrently and 2) without GIL "protection", so "surprises" are more than possible if one tries to delete temporary files twice (first explicitly, and second implicitly when the mktmpdir block exits).
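Whatever the root cause turns out to be, if you still want explicit cleanup, one defensive option is to make it tolerate files that are already gone. FileUtils.rm_f runs in force mode and ignores errors such as a missing file. In this sketch, with_scratch_files and the generated file names are hypothetical stand-ins for something_which_creates_files_in.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: creates `count` files in a fresh temporary directory,
# yields their paths, then cleans them up explicitly before mktmpdir removes
# the directory. FileUtils.rm_f does not raise Errno::ENOENT if a file is
# already gone.
def with_scratch_files(count)
  Dir.mktmpdir do |dir|
    paths = (1..count).map do |i|
      path = File.join(dir, "file#{i}.csv")
      File.write(path, "row #{i}\n")
      path
    end
    yield paths
    FileUtils.rm_f(paths)
  end
end
```

As the question notes, simply leaving the deletion to Dir.mktmpdir is the other option; it removes the whole directory tree when the block exits.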

What are all the ways to check if a file exists in ruby without shelling out?

What are all the ways to check if a file exists using Ruby's core classes/modules without shelling out?
Would also appreciate reasons why choosing one method over another makes sense. For example: Using Dir['**/*'].grep(/foo/) is the shortest way I've found to match paths using a regex.
However, I think Pathname.new('.').find.any? { |pn| pn.fnmatch? "*foo*" } is a good option because Pathname is a cross-platform solution that usually seems to "just work".
Are there any solutions/classes/modules I've missed? Also, would appreciate answers that involve speed/efficiency analysis.
require 'minitest/autorun'
require 'fileutils'
require 'pathname'

class TestTouch < Minitest::Test
  include FileUtils
  attr_reader :foo

  def setup
    @foo = Pathname.new('foo')
    foo.delete if foo.exist?
  end

  def teardown
    foo.delete if foo.exist?
  end

  def test_touch
    touch foo
    cwd = Pathname.new('.')
    assert cwd.find.to_a.map(&:to_s).grep(/foo/).any?
    assert cwd.find.any? { |pn| pn.fnmatch? "*foo*" }
    assert cwd.join('foo').exist?
    assert Dir['**/*'].grep(/foo/).any?     # a bare array is always truthy
    assert Dir.glob('**/*').grep(/foo/).any?
    assert !Dir.glob('foo').empty?
    assert File.exist?('foo')
  end
end
Try this
File.exist?(fname)
File.file?(fname)
Sometimes, though, if you need to check existence and open the file in one atomic operation, it can be best to just open the file and handle a missing file by rescuing the exception.
Why would that be a good idea? This mostly applies to infrastructure code on the backend, when you deal with databases and caching layers. Sometimes it can be critical that your code is not affected if the file is deleted or replaced between taking the branch and consuming the content: on POSIX systems, when a file is deleted, an already-open handle remains open and can still be used!
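begin
  content = File.open(fname) { |f| f.read } # use the open file here
rescue Errno::ENOENT => e
  warn "#{fname} not found: #{e.message}"   # handle the missing file here
end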
ENOENT is the C library (errno) code for "No such file or directory"; a complete list of error codes is in your platform's errno documentation (e.g. the errno(3) man page). Most of Ruby's file handling is basically just a thin wrapper around the underlying C library, as you might have already noticed from browsing the File class.
Off the top of my head...
Pathname.new(NAME).exist?
FileTest.exist?(NAME)
Pathname.new(NAME).file?
FileTest.file?(NAME)
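How about
File.exists?(NAME)
? Note that this also returns true if NAME is, for instance, a directory. Also note that File.exists? was a deprecated alias of File.exist? and was removed in Ruby 3.2, so on current Ruby use File.exist?(NAME).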
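To make the directory caveat concrete, this sketch runs the common checks on a directory, a regular file, and a missing path; existence_report is a hypothetical helper name. File.exist?, FileTest.exist?, and Pathname#exist? agree with each other, while File.file? and File.directory? tell the two kinds of entry apart.

```ruby
require 'pathname'

# Hypothetical helper: runs the common existence checks on one path.
def existence_report(path)
  {
    exist:     File.exist?(path),         # true for files *and* directories
    file:      File.file?(path),          # true only for regular files
    directory: File.directory?(path),     # true only for directories
    filetest:  FileTest.exist?(path),     # same check via FileTest
    pathname:  Pathname.new(path).exist?  # same check via Pathname
  }
end
```

So if "exists" should mean "is a regular file", File.file? (or Pathname#file?) is the check to use.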

In ruby, how to rename a text file, keep same handle and delete the old file

I am using a constant (FLOG) as a handle to write to my log. At a given point, I have to use a temporary log, and later append that content to the regular log, all that with the same handle, which is used through a bunch of methods.
My test program is below. After closing the FLOG handle associated with the temp log, when I reassign FLOG to the new log, this somehow reopens the temp log, and I can't delete it.
Is there a way to make sure that the old temp file stays closed (so I can delete it)?
# Pre-existing log:
final_log = "final_#{Time.now.strftime("%Y%m%d")}.txt"
#Writing something in it
File.open(final_log, "w+") { |file| file.write("This is the final log: #{final_log}\n") }
# temp log:
temp_log = "temp_#{Time.now.strftime("%Y%m%d")}.txt"
FLOG = File.new(temp_log, "w+")
# write some stuff in temp_log
FLOG.puts "Writing in temp_log named #{temp_log}"
# closing handle for temp_log
FLOG.close
# avoid constant reuse warning:
Object.send(:remove_const,'FLOG') if Object.const_defined?('FLOG')
# need to append temp_log content to final_log with handle FLOG
FLOG = File.open(final_log, "a+")
# appending old temp log to new log
File.open(temp_log, "r").readlines.each do |line|
  puts "appending... #{line}"
  FLOG.puts "appending... #{line}"
end
# closing handle
FLOG.close
# this tells me that 'temp_log' is somehow re-opened:
ObjectSpace.each_object(File) { |f| puts("3: #{temp_log} is open") if f.path == temp_log && !f.closed? }
File.delete(temp_log) # Can't do that:
# test_file2.rb:35:in `delete': Permission denied - temp_20150324.txt (Errno::EACCES)
If you're going to use a temp file, use Tempfile:
require 'tempfile'

# Pre-existing log:
final_log = "final_#{Time.now.strftime("%Y%m%d")}.txt"
# Writing something in it
File.open(final_log, "w+") { |file| file.write("This is the final log: #{final_log}\n") }

# give the tempfile a meaningful prefix
temp_log = Tempfile.new('foobar')
begin
  $flog = temp_log
  # write some stuff in temp_log
  $flog.puts "Writing in temp_log named #{temp_log.path}"

  # need to append temp_log content to final_log with handle $flog
  $flog = File.open(final_log, "a+")
  # reopen temp_log for reading (Tempfile#open closes and reopens it), append to new log
  temp_log.open.readlines.each do |line|
    puts "appending... #{line}"
    $flog.puts "appending... #{line}"
  end
  # closing handle
  $flog.close
ensure
  # close, then delete temp_log (closing first lets the unlink succeed on Windows too)
  temp_log.close
  temp_log.unlink
end
And while globals are generally bad, hacking a constant so that you can use it like a global is worse.
temp_log is still open because you didn't close it. If you did something like:
temp_log_lines = File.open(temp_log, 'r') { |f| f.readlines }
then the I/O stream for temp_log would be closed at the end of the block. However, File.open(temp_log, "r").readlines takes the IO object returned by File.open and calls readlines on it, and you then call each on the result with an accompanying block. Since that block belongs to your call to each and not to File.open, the stream is not closed at the end of it, and it stays open for the rest of the program.
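The difference can be checked directly. This sketch (stream_closed_states is a hypothetical helper) reads a file both ways and reports whether each IO object is closed afterwards; the chained call leaves its stream open, while the block form closes it when the block ends, so the result is [false, true].

```ruby
# Hypothetical helper: compares the question's chained call with the block
# form of File.open. Returns [closed? after chained call, closed? after block].
def stream_closed_states(path)
  chained = File.open(path, 'r')
  chained.readlines                 # same shape as File.open(temp_log, "r").readlines
  chained_closed = chained.closed?  # false: nothing has closed this stream
  chained.close                     # tidy up for this demo

  captured = nil
  File.open(path, 'r') do |f|
    captured = f
    f.readlines
  end
  [chained_closed, captured.closed?] # the block form closed f when the block ended
end
```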
As to why you can't delete temp_log at the end of the program, it's hard to say for certain without knowing what's going on in the underlying file system. On a POSIX OS, neither Ruby nor the OS will complain if you delete a file that you've opened a stream for and not closed; the file will be unlinked, but the stream will persist and still have the contents of the file and so on. The error you're getting says that the Ruby process running this program doesn't have the rights to delete the file the program created. That's strange, but hard to diagnose just from this code; consider the directory the program is working in, what the permissions on it are, etc. (On Windows, however, deleting a file that still has an open handle typically fails with Errno::EACCES, which could match the unclosed stream described above.)
Speaking more generally, there are some things you could make use of in Ruby here that would make your life easier.
If you want a temporary file, there is a Tempfile class you could make use of that does a lot of legwork for you.
The idiomatic way of doing I/O with files in Ruby is to pass a block into File.open. The block is handed an I/O stream for the file that is automatically closed at the end of the block, so you don't need to do it manually. Here's an example:
File.open(temp_log, 'w+') do |f|
  f.puts "Writing in temp_log named #{temp_log}"
end
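Applied to the question's whole workflow, a minimal sketch might look like the following. append_and_remove is a hypothetical helper name, and it passes paths around instead of a shared FLOG handle. Every file is opened with a block (File.foreach also closes the file when it finishes), so no handle is left open when File.delete runs, which matters on Windows, where open files usually cannot be deleted.

```ruby
# Hypothetical helper: appends every line of temp_path to final_path,
# prefixed like the question's "appending..." lines, then deletes temp_path.
def append_and_remove(temp_path, final_path)
  File.open(final_path, 'a') do |final|
    # File.foreach opens temp_path, yields each line, and closes it again.
    File.foreach(temp_path) { |line| final.puts "appending... #{line}" }
  end # final is closed here
  File.delete(temp_path) # no open handles remain on temp_path
end
```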
FLOG is not a true constant in your code. A constant is only a constant if its value never changes throughout the life of the program it's declared in. Ruby is a very permissive language, so it allows you to reassign them, but warns you if you do. Stricter languages would throw an error if you did that. FLOG is just a normal variable and should be written flog. A good use for a constant is a value external to the action of your program that your program needs to be able to reference; for instance, instead of writing 3.141592653589793 every time you need to refer to an approximation of pi, you could declare PI = 3.141592653589793 and use PI afterwards. In Ruby, this has been done for you in the Math module: Math::PI returns this. User settings are another place constants often show up: they're determined before the program gets going, help to determine what it does, and should be unmodified during its execution, so storing them in constants sometimes makes sense.
You describe the program you supplied as a test program. Ruby has really great testing libraries you could make use of that will be nicer than writing scripts like this. Minitest is part of the Ruby standard library and is my favorite testing framework in Ruby, but a lot of people like RSpec too. (I'd like to link to the documentation for those frameworks, but I don't have enough reputation—sorry. You'll have to Google.)
It'll be hard to make use of those frameworks if you write your code imperatively like this, though. Ruby is a very object-oriented language and you'll get a lot out of structuring your code in an object-oriented style when working in it. If you're not familiar with OO design, some books that have been really good for me are Practical Object-Oriented Design in Ruby by Sandi Metz, Refactoring: Improving the Design of Existing Code by Martin Fowler et al., and Growing Object-Oriented Software, Guided by Tests, by Steve Freeman and Nat Pryce. (Same thing here with the lack of links.)

ruby - how can i still do something when there's error (example: NameError)

I managed to do "something" (like deleting files, etc.) when exit or exit! is called, by overriding the exit and exit! methods inside the Object class (at_exit is too unreliable).
But then the "something" never executes if there's an error such as NameError.
Is there a way to make "something" still execute if there's an error (any error, if necessary)?
Something like at_exit, but working with all errors.
Thanks in advance for any help, and forgive me if this has already been asked.
I searched a lot before asking here and found nothing.
Edit: I don't know where the original author of the code placed the method, since the author loads it from DLL files in the exe used as a launcher (I can only edit code that runs after the original code). So I think I need another approach for this... but I managed a workaround for my problem by putting begin/rescue in the other places that send data to the original object: I filter the data being sent and raise my own error before it reaches the main program. So I guess this works too.
Borrowing from an answer on a different thread, and definitely along the lines of what Marek commented, this is how you should handle errors in Ruby:
begin
  # something which might raise an exception
rescue SomeExceptionClass => some_variable
  # code that deals with some exception
rescue SomeOtherException => some_other_variable
  # code that deals with some other exception
else
  # code that runs only if *no* exception was raised
ensure
  # ensure that this code always runs, no matter what
end
Original credit: Begin, Rescue and Ensure in Ruby?
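The ensure part of that template is what answers the question: it runs even when the begin section raises a NameError, and the exception then keeps propagating. In this sketch, run_with_cleanup and the deliberately undefined name are hypothetical and exist only to trigger the error.

```ruby
# Hypothetical demo: the ensure clause runs even though a NameError
# escapes from the begin section.
def run_with_cleanup(log)
  begin
    log << :start
    undefined_name_for_demo # not defined anywhere, so this raises NameError
    log << :not_reached
  ensure
    log << :cleanup # always runs; the NameError then continues upward
  end
end
```

Calling run_with_cleanup(log) inside a begin/rescue NameError leaves log as [:start, :cleanup].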

ignoring errors and proceeding in ruby

Whenever an exception is raised, the script terminates.
Do I have to wrap each action in its own begin/rescue? It gets very complicated fast:
begin
  #someaction
  begin
    #someaction2
  rescue
  end
rescue
end
You could use some sort of AOP mechanism to surround every method call with exception handling code (like Aquarium: http://aquarium.rubyforge.org/), or put rescue nil after every line of code, but I'm guessing that if you need to do that, then either the exceptions raised are not really signalling exceptional situations in your app (which is bad), or you want to continue even in a situation where there's really no point in doing so (which is even worse). Anyway, I'd advise you to reconsider what you really need to do, because it seems to me that you are approaching the problem in the wrong way.
It's difficult to give a specific answer because I don't know what your program does.
But in general terms, I find that the best way to deal with this is to put the code that could fail into one or more separate methods (and perhaps the method name should reflect this).
This has a number of advantages. First of all, it means that the rest of your code doesn't have to be hedged around with exception handling; secondly, if the "dangerous" actions are carefully split up into logical groups, you may be able to do exception handling on the method, not the actual actions. Stupid example:
my_list = get_list("one") # perfectly safe method
my_list.each do |x|
  begin
    x.dangerous_file_method() # dangerous method
  rescue
    x.status = 1
  end
end
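A runnable version of that idea, assuming the dangerous operation is a file system call: the risky part lives in its own small method, and the loop rescues only SystemCallError per item, recording failures instead of stopping. read_size and sizes_with_failures are hypothetical names standing in for dangerous_file_method and the status flag.

```ruby
# Hypothetical "dangerous" method: raises Errno::ENOENT etc. if path is missing.
def read_size(path)
  File.size(path)
end

# Hypothetical driver: processes every path and records failures instead of
# aborting. Only SystemCallError (the Errno::* family) is rescued, so genuine
# bugs such as NoMethodError still surface.
def sizes_with_failures(paths)
  results = {}
  failures = {}
  paths.each do |path|
    begin
      results[path] = read_size(path)
    rescue SystemCallError => e
      failures[path] = e.class
    end
  end
  [results, failures]
end
```

Rescuing a specific exception class, rather than using a bare rescue, keeps the "ignore and proceed" behaviour limited to the failures you actually expect.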
