Is Ruby file deletion asynchronous? - ruby

I have (had) some Ruby code which for historical reasons is (was) essentially
Dir.mktmpdir do |dir|
path_list = something_which_creates_files_in(dir)
path_list.each(&:delete)
end
Occasionally I get (got) exceptions from this code:
Errno::ENOENT: No such file or directory # dir_s_rmdir - /tmp/d2..4w/file.csv
:
/path/to/source.rb:124:in `unlink'
/path/to/source.rb:124:in `delete'
/path/to/source.rb:124:in `each'
:
/usr/lib/ruby/2.5.0/tmpdir.rb:93:in `mktmpdir'
:
so it appears to me that my "cleanup" of the list of paths at the end of the block is not entirely synchronous: (some of) the files still exist after it completes, then mktmpdir removes the temporary directory, so the asynchronous (?) unlink fails because its target has gone away. Is this a reasonable interpretation?
This is more an academic question than anything else; the behaviour puzzles me. Just removing the cleanup (the path_list.each(&:delete)) and leaving the deletion to Dir.mktmpdir seems to stop these exceptions.
If it makes a difference, this is Ruby 2.5 (MRI) running on Linux.

Your assumption seems to be correct.
If you would check the source of File.unlink you could see the following:
static VALUE
rb_file_s_unlink(int argc, VALUE *argv, VALUE klass)
{
return apply2files(unlink_internal, argc, argv, 0);
}
Here unlink_internal is a trivial thing (just a thin wrapper around a system call?), but what is interesting is the implementation of apply2files. You could see there the following call:
...
rb_thread_call_without_gvl(no_gvl_apply2files, aa, RUBY_UBF_IO, 0);
...
where aa is some fancy struct that contains among other things a pointer to the thing that we want to apply - unlink in our case.
The name of this function is quite self-descriptive, but the source contains some documentation too, so we can just refer to it:
/*
* rb_thread_call_without_gvl - permit concurrent/parallel execution.
* rb_thread_call_without_gvl2 - permit concurrent/parallel execution
* without interrupt process.
...
So from what I see (disclaimer: without the really careful analysis of the source code :)) the deletions within the block in question 1) do happen concurrently and 2) without GIL "protection" - so "surprises" are more than possible if one tries to delete temporary files twice (1st time explicitly and 2nd time implicitly when the mktmpdir block exits).
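Whatever the exact mechanism turns out to be, the cleanup can be made tolerant of files that are already gone. A minimal sketch, assuming hypothetical file names (a.csv, b.csv) in place of whatever something_which_creates_files_in produces: FileUtils.rm_f ignores missing files, so deleting twice is harmless, and the block form of Dir.mktmpdir still removes the directory at the end.

```ruby
require 'tmpdir'
require 'fileutils'

tmp_path = nil

Dir.mktmpdir do |dir|
  tmp_path = dir

  # Hypothetical stand-in for something_which_creates_files_in(dir)
  paths = %w[a.csv b.csv].map { |name| File.join(dir, name) }
  paths.each { |path| File.write(path, "data\n") }

  FileUtils.rm_f(paths) # explicit cleanup
  FileUtils.rm_f(paths) # a second attempt does not raise Errno::ENOENT
end

# The block form of mktmpdir has removed the directory by now
dir_removed = !Dir.exist?(tmp_path)
```

If the explicit cleanup is not needed for any other reason, dropping it and relying on Dir.mktmpdir alone (as the question already found) is the simpler option.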

Related

In ruby, how to rename a text file, keep same handle and delete the old file

I am using a constant (FLOG) as a handle to write to my log. At a given point, I have to use a temporary log, and later append that content to the regular log, all that with the same handle, which is used through a bunch of methods.
My test program is below. After closing the handle 'FLOG' associated with the temp log, when I re-assign FLOG to the new log, this somehow re-opens the temp log, and I can't delete it.
Is there a way to make sure that the old temp file stays closed (so I can delete it)?
# Pre-existing log:
final_log = "final_#{Time.now.strftime("%Y%m%d")}.txt"
#Writing something in it
File.open(final_log, "w+") { |file| file.write("This is the final log: #{final_log}\n") }
# temp log:
temp_log = "temp_#{Time.now.strftime("%Y%m%d")}.txt"
FLOG = File.new(temp_log, "w+")
# write some stuff in temp_log
FLOG.puts "Writing in temp_log named #{temp_log}"
# closing handle for temp_log
FLOG.close
# avoid constant reuse warning:
Object.send(:remove_const,'FLOG') if Object.const_defined?('FLOG')
# need to append temp_log content to final_log with handle FLOG
FLOG = File.open(final_log, "a+")
# appending old temp log to new log
File.open(temp_log, "r").readlines.each do |line|
puts "appending... #{line}"
FLOG.puts "appending... #{line}"
end
# closing handle
FLOG.close
# this tells me that 'temp_log' is somehow re-opened:
ObjectSpace.each_object(File) { |f| puts("3: #{temp_log} is open") if f.path == temp_log && !f.closed? }
File.delete(temp_log) # Can't do that:
# test_file2.rb:35:in `delete': Permission denied - temp_20150324.txt (Errno::EACCES)
If you're going to use a temp file, use tempfile
require 'tempfile'
# Pre-existing log:
final_log = "final_#{Time.now.strftime("%Y%m%d")}.txt"
#Writing something in it
File.open(final_log, "w+") { |file| file.write("This is the final log: #{final_log}\n") }
# give the tempfile a meaningful prefix
temp_log = Tempfile.new('foobar')
begin
$flog = temp_log
# write some stuff in temp_log
$flog.puts "Writing in temp_log named #{temp_log.path}"
# need to append temp_log content to final_log with handle $flog
$flog = File.open(final_log, "a+")
# reopen temp_log for reading, append to new log
temp_log.open.readlines.each do |line|
puts "appending... #{line}"
$flog.puts "appending... #{line}"
end
# closing handle
$flog.close
ensure
# delete temp_log
temp_log.unlink
end
And while globals are generally bad, hacking a constant so that you can use it like a global is worse.
temp_log is still open because you didn't close it. If you did something like:
temp_log_lines = File.open(temp_log, 'r') { |f| f.readlines }
then the I/O stream for temp_log would be closed at the end of the block. However, doing File.open(temp_log, "r").readlines takes the IO object returned by File.open and calls readlines on it, and you then call each on the result with an accompanying block. Since the block is part of your call to each and not to File.open, the stream is not closed at the end of it, and stays open for the rest of the program.
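The difference between the two forms can be checked directly. This is a small sketch that uses a throwaway file in a temporary directory (the file name temp_log.txt and the variable names are only for illustration):

```ruby
require 'tmpdir'

leaked_closed = nil
block_closed = nil
lines = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'temp_log.txt')
  File.write(path, "one\ntwo\n")

  # Without a block, File.open returns an open File that we must close ourselves
  leaked = File.open(path, 'r')
  leaked.readlines
  leaked_closed = leaked.closed?
  leaked.close

  # With a block, the File is closed as soon as the block returns
  handle = nil
  lines = File.open(path, 'r') do |f|
    handle = f
    f.readlines
  end
  block_closed = handle.closed?
end
```

Here leaked_closed ends up false and block_closed ends up true, which is the behaviour described above.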
As to why you can't delete temp_log at the end of the program, it's hard to say without knowing what's going on in the underlying file system. Neither Ruby nor the underlying (POSIX) OS will complain if you delete a file that you've opened a stream for and not closed; the file will be unlinked but the stream will persist and still have the contents of the file and so on. The error you're getting is saying that the owner of the Ruby process for this program doesn't have the rights to delete the file the program created. That's strange, but hard to diagnose just from this code. Consider the directory the program is working in, what the permissions on it are, etc.
Speaking more generally, there are some things you could make use of in Ruby here that would make your life easier.
If you want a temporary file, there is a Tempfile class you could make use of that does a lot of legwork for you.
The idiomatic way of doing I/O with files in Ruby is to pass a block into File.open. The block is handed an I/O stream for the file that is automatically closed at the end of the block, so you don't need to do it manually. Here's an example:
File.open(temp_log, 'w+') do |f|
f.puts "Writing in temp_log named #{temp_log}"
end
FLOG is not a true constant in your code. A constant is only a constant if its value never changes throughout the life of the program it's declared in. Ruby is a very permissive language, so it allows you to reassign them, but warns you if you do. Stricter languages would throw an error if you did that. FLOG is just a normal variable and should be written flog. A good use for a constant is a value external to the action of your program that your program needs be able to reference—for instance, instead of writing 3.141592653589793 every time you need to refer to an approximation of pi, you could declare PI = 3.141592653589793 and use PI afterwards. In Ruby, this has been done for you in the Math module—Math::PI returns this. User settings are another place constants often show up—they're determined before the program gets going, help to determine what it does, and should be unmodified during its execution, so storing them in constants sometimes makes sense.
You describe the program you supplied as a test program. Ruby has really great testing libraries you could make use of that will be nicer than writing scripts like this. Minitest is part of the Ruby standard library and is my favorite testing framework in Ruby, but a lot of people like RSpec too. (I'd like to link to the documentation for those frameworks, but I don't have enough reputation—sorry. You'll have to Google.)
It'll be hard to make use of those frameworks if you write your code imperatively like this, though. Ruby is a very object-oriented language and you'll get a lot out of structuring your code in an object-oriented style when working in it. If you're not familiar with OO design, some books that have been really good for me are Practical Object-Oriented Design in Ruby by Sandi Metz, Refactoring: Improving the Design of Existing Code by Martin Fowler et al., and Growing Object-Oriented Software, Guided by Tests, by Steve Freeman and Nat Pryce. (Same thing here with the lack of links.)

How to handle exceptions in Find.find

I'm processing files and directories looking for the most recent modified file in each directory. The code I have works but, being new to Ruby, I'm having trouble handling errors correctly.
I use Find.find to get a recursive directory listing, calling my own function newestFile for each directory:
Find.find(ARGV[0]) { |f|
if File.directory?(f)
newestFile(f)
end
}
In the directory tree there are folders I do not have permission to access, so I want to ignore them and go on to the next, but I cannot see how to incorporate the exception handling in to the Find.find "loop".
I tried to put begin..rescue..end around the block but that does not allow me to continue processing the loop.
I also found this SO question: How to continue processing a block in Ruby after an exception? but that handles the error in the loop. I'm trying to recover from errors occurring in Find.find itself, which would be outside the exception block.
Here's the stack trace of the error:
PS D:\dev\ruby\> ruby .\findrecent.rb "M:/main/*"
C:/Ruby200/lib/ruby/2.0.0/find.rb:51:in `open': Invalid argument - M:/main/<A FOLDER I CAN'T ACCESS> (Errno::EINVAL)
from C:/Ruby200/lib/ruby/2.0.0/find.rb:51:in `entries'
from C:/Ruby200/lib/ruby/2.0.0/find.rb:51:in `block in find'
from C:/Ruby200/lib/ruby/2.0.0/find.rb:42:in `catch'
from C:/Ruby200/lib/ruby/2.0.0/find.rb:42:in `find'
from ./findrecent.rb:17:in `<main>'
How do I add exception handling to this code?
I had a look in the code where the exception is being generated and the method contains the following block:
if s.directory? then
begin
fs = Dir.entries(file)
rescue Errno::ENOENT, Errno::EACCES, Errno::ENOTDIR, Errno::ELOOP, Errno::ENAMETOOLONG
next
end
... more code
Performing a horrible hack I added Errno::EINVAL to the list of rescue errors. My code now executes and goes through all the folders but I can't leave that change in the Ruby library code.
Internally find is using Dir.entries, so maybe I need to rewrite my code to process the folders myself, and not rely on find.
I would still like to know if there is a way of handling errors in this sort of code construct as from reading other code this type of small/concise code is used a lot in Ruby.
Do you get this error on your newestFile function or when you try to run File#directory??
If this happens in newestFile you can do something like this:
Find.find(ARGV[0]) do |f|
if File.directory?(f)
newestFile(f) rescue nil
end
end
This just ignores any errors and punts until the next folder. You could also do some nicer output if desired:
Find.find(ARGV[0]) do |f|
if File.directory?(f)
begin
newestFile(f)
rescue
puts "error accessing: #{f}, you might not have permissions"
end
end
end
If the error happens in the File#directory? you need to wrap that section as well:
Find.find(ARGV[0]) do |f|
begin
if File.directory?(f)
newestFile(f)
end
rescue
puts "error accessing: #{f}, you might not have permissions"
end
end
Like you mentioned if the error is occurring in the Find#find itself then you can't catch that from the block. It would have to happen inside of that method.
Can you confirm that the exception is happening in that method and not the subsequent ones by pasting a stack trace of the exception?
Edit
I was going to suggest traversing the directories yourself with something like Dir#entries so you would have the capacity to catch the errors then. One thing I am interested in is what happens if you leave off the * in the call from the command line. I am on MacOS so I can't duplicate 100% what you are seeing, but if I allow it to traverse a directory that I don't have access to on my Mac, it prints debug info about what folders I can't access but continues on. If I give it the *, on the other hand, it seems to do nothing except print the error of the first folder it can't access.
One difference in my experience on the MacOS is that it isn't actually throwing the exception, it is just printing that debug info to the console. But it was interesting that the inclusion of the * made mine stop completely if I didn't have access to a folder.
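For the "traverse it yourself" route, a rough sketch might look like the following. The helper name each_directory is made up for this example, and it assumes Ruby 2.5 or later for Dir.children (on older versions, Dir.entries minus "." and ".." would do the same job). Rescuing SystemCallError covers the whole Errno family, including the Errno::EINVAL from the question.

```ruby
# Hypothetical helper: yields root and every directory below it,
# skipping any directory whose contents cannot be listed.
def each_directory(root, &block)
  yield root

  begin
    names = Dir.children(root)
  rescue SystemCallError => e
    # Errno::EACCES, Errno::EINVAL, Errno::ENOENT etc. all end up here
    warn "skipping #{root}: #{e.message}"
    return
  end

  names.each do |name|
    path = File.join(root, name)
    # Do not follow symlinks, to avoid walking in circles
    each_directory(path, &block) if File.directory?(path) && !File.symlink?(path)
  end
end

# Usage, with newestFile being the function from the question:
#   each_directory(ARGV[0]) { |dir| newestFile(dir) }
```

Because the rescue sits inside the traversal, one unreadable folder only costs a warning and the walk carries on with its siblings.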
You can be reactive or proactive, either works, but by testing in advance, your code will run a little faster since you won't be triggering the exception mechanism.
Instead of waiting for a problem to happen then trying to handle the exception, you can find out whether you actually should try to change to a directory or access a file using the File class's owned? and grpowned? methods. From the File documentation:
grpowned?(file_name) → true or false
Returns true if the named file exists and the effective group id of the calling process is the owner of the file. Returns false on Windows.
owned?(file_name) → true or false
Returns true if the named file exists and the effective user id of the calling process is the owner of the file.
That means your code can look like:
Find.find(ARGV[0]) do |f|
if File.directory?(f) && %w[grpowned? owned?].any?{ |m| File.send(m.to_s, f) }
newestFile(f)
end
end
Ruby will check to see if the directory entry is a directory and whether it is owned or grpowned by the current process. Because && is short-circuiting, if it's not a directory the second set of tests won't be triggered.
On some systems the group permissions will give you a better chance of having access rights if there are lots of shared resources, so that gets tested first, and if it returns true, any? will return true and the code will progress. If false is returned because the group permissions don't allow access, then owned? will test the file and the code will skip or step into newestFile. Reverse those two tests for speed depending on the set-up of your system. Or, run the code once using time ruby /path/to/your/code, then swap the two and run it again. Compare the resulting times to know which is faster on your system.
There are different schools of thought about whether using exception handling to control program flow is good and different languages prefer different things in their programming styles. To me, it seems like code will always run faster and more safely if I know in advance whether I can do something, rather than try and have it blow up. If it blows up in an expected way, that's one thing, but if it blows up in ways I didn't expect, then I might not have exception handling in place to react correctly, or it might trigger other exceptions that mask the true cause. I'd rather see if I can work my way out of a situation by checking the state, and then if all my attempts failed, have an exception handler that lets me gracefully exit. YMMV.
Finally, in Ruby, we don't name methods using CamelCase, we use snake_case. snake_case_is_easier toReadThanCamelCase.

How to implement the behaviour of -time-passes in my own Jitter?

I am working on a Jitter which is based on LLVM. I have a real issue with performance. I have read a lot about this and I know it is a known problem in LLVM. However, I am wondering if there are other bottlenecks. Hence, I want to use in my Jitter the same mechanism offered by -time-passes, but saving the result to a specific file. In this way, I can do some simple math like:
real_execution_time = total_time - time_passes
I added the option to the command line, but it does not work:
// Disable branch fold for accurate line numbers.
llvm_argv[arrayIndex++] = "-disable-branch-fold";
llvm_argv[arrayIndex++] = "-stats";
llvm_argv[arrayIndex++] = "-time-passes";
llvm_argv[arrayIndex++] = "-info-output-file";
llvm_argv[arrayIndex++] = "pepe.txt";
cl::ParseCommandLineOptions(arrayIndex, const_cast<char**>(llvm_argv));
Any solution?
OK, I found the solution. I am publishing it because it may be useful for someone else.
Before any exit(code) in your program you must include a call to
llvm::llvm_shutdown();
This call flushes the information to the file.
My problem was:
1 - Other threads emitted exit without the mentioned call.
2 - There is a fancy struct llvm::llvm_shutdown_obj with a destructor which calls the mentioned method. I had declared a variable in the main function as follows:
llvm::llvm_shutdown_obj X();
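Everybody knows that the compiler should call the destructor, but in this case it was not happening. The likely reason is that, with the empty parentheses, C++ parses this line as the declaration of a function named X returning llvm::llvm_shutdown_obj rather than as a variable definition, so no object is ever constructed (writing llvm::llvm_shutdown_obj X; without the parentheses defines the variable).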
No variable => No destructor => No flush to the file

Difference between `File::exist?` and `File::exists?`

On ruby-doc, the documentation entries for File::exist? and File::exists? are duplicated with different semantics: one entry says returns true if file_name is a directory; the other says returns true if file_name is a file.
I don't think either entry is correct. Both methods seem to be implemented in file.c using rb_file_exist_p, which seems to try to call fstat() if the value passed is an IO, or stat() if it's a string. Both fstat() and stat() return 0 on success and -1 on error, and this is passed back to rb_file_exist_p, and turned into a boolean result. It seems to me that
there are two methods for making code read more easily; there are no semantic differences
neither really relates to a file existing, but to whether a file-like item exists, e.g. a file, a dir, a socket, a fifo etc.
perhaps the document could say that the methods tell the caller whether or not a thing that has file-like semantics is there, but more specific tests will tell what it actually is: e.g. directory?, file?, socket? etc.
Is my understanding of the (lack of) difference in the methods correct, and is it worth suggesting a change to the document ?
Note that the answer to this question depends on the Ruby version. See the other answers for newer versions of Ruby. AFAIK exists? was deprecated in 2.2.
If we look at the C source, we see this:
rb_cFile = rb_define_class("File", rb_cIO);
/* ... */
define_filetest_function("exist?", rb_file_exist_p, 1);
define_filetest_function("exists?", rb_file_exist_p, 1);
So File.exist? and File.exists? are exactly the same thing and the corresponding documentation is:
Return <code>true</code> if the named file exists.
The rb_file_exist_p C function is just a very thin wrapper around rb_stat, that's a wrapper for the STAT macro, and STAT is just a portability wrapper for the stat system call. So, the documentation above is correct: File#exist? returns true if the file exists.
If we check file.c for the documentation snippet that talks about directories, we find this:
/*
* Document-method: exist?
*
* call-seq:
* Dir.exist?(file_name) -> true or false
* Dir.exists?(file_name) -> true or false
*
* Returns <code>true</code> if the named file is a directory,
* <code>false</code> otherwise.
*
*/
So it looks like the documentation generator is getting confused because Dir.exist? and File.exist? are documented in file.c even though Dir is defined in dir.c.
The underlying problem seems to be that the source code arrangement doesn't match what the documentation generator expects and the result is confused and incorrect documentation. I'm not sure how this should be fixed though.
Since Ruby 2.2.0, File.exists? is deprecated; use File.exist? instead.
http://ruby-doc.org/core-2.2.0/File.html#exist-3F-method
File.exist? and File.exists? are NOT exactly the same thing anymore. See https://github.com/ruby/ruby/blob/ruby_2_3/file.c#L5920
define_filetest_function("exist?", rb_file_exist_p, 1);
define_filetest_function("exists?", rb_file_exists_p, 1);
rb_file_exists_p contains this line:
rb_warning("%sexists? is a deprecated name, use %sexist? instead", s, s);
So you should stick with File.exist?.
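As the question suspected, File.exist? answers "is there anything at this path", not "is this a regular file". A quick sketch to see that, using a temporary directory and a made-up file name (plain.txt):

```ruby
require 'tmpdir'

results = {}

Dir.mktmpdir do |dir|
  file = File.join(dir, 'plain.txt')
  File.write(file, 'x')

  results[:dir_exist]        = File.exist?(dir)   # a directory counts as existing
  results[:file_exist]       = File.exist?(file)
  results[:missing_exist]    = File.exist?(File.join(dir, 'missing'))
  results[:dir_is_file]      = File.file?(dir)      # the more specific tests
  results[:dir_is_directory] = File.directory?(dir) # tell you what it actually is
end
```

So File.exist? is the broad test, and File.file?, File.directory?, File.socket? and friends narrow it down.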
git pull made it go away - this was fixed here - not sure why generated doco on ruby-doc and apidock still wrong

How do I change this case statement to an if statement?

I would like to check for the value of a node attribute. This case statement is what I have so far, and it works:
case node[:languages][:ruby][:host_cpu]
when "x86_64"
...
when "i686"
...
end
What I would like to do is use an if statement instead. This is what I tried:
if node[:languages][:ruby][:host_cpu]?("X86_64")
...
end
This is based on the following, Which worked.
if platform?("ubuntu")
...
end
However, my try didn't work. it gave a syntax error on the if line saying that there was an unexpected \n and $end was expected.
I found that there are two kinds of ways of performing an if. The first being the one I demonstrated above, which (apparently) only works with resources, and if_only, which works with nodes. like so
if_only {node[:languages]}
which seems to work only for checking the presence of nodes, and within a do context.
How do I check the value of a node using an if statement? One method does check values, but only of resources, the other checks nodes, but only for their presence, and not their values.
You are mixing up way too many different variants of conditionals, most of which are part of Chef, not Ruby. Let me try to describe the different options one by one.
Generally, a case is roughly comparable to a series of if and elsif statements. Your case above
case node[:languages][:ruby][:host_cpu]
when "x86_64"
...
when "i686"
...
end
is thus roughly equivalent to
if node[:languages][:ruby][:host_cpu] == "x86_64"
...
elsif node[:languages][:ruby][:host_cpu] == "i686"
...
end
As a side remark, case actually uses the === operator which is often not commutative but more powerful. For simple comparisons it works the same as == though. Both these variants are part of the Ruby language, in which you write your cookbooks.
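To make the === remark concrete, here is a plain Ruby sketch (no Chef involved). The method name cpu_family and the returned symbols are invented for the example; the point is that a when clause can take a Regexp, because case calls pattern === value.

```ruby
# Hypothetical helper: classifies a host_cpu string
def cpu_family(host_cpu)
  case host_cpu
  when 'x86_64'            then :x86_64  # String#=== is plain equality
  when /\Ai[3-6]86\z/      then :x86_32  # Regexp#=== tests for a match
  else :unknown
  end
end

# === is not commutative: the receiver decides what "matches" means
pattern_matches = (/\Ai[3-6]86\z/ === 'i686')
reversed        = ('i686' === /\Ai[3-6]86\z/)
```

With this, cpu_family("i686") and cpu_family("i386") both land in the second branch, something a chain of == comparisons would need one elsif each for.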
The other options you mentioned are actually part of the API which Chef defined on top of Ruby. This is often called the Chef DSL (which stands for Domain Specific Language, i.e. an extension or adaption of a language, in this case Ruby for a specific usage domain, in this case configuration management.)
The platform? method is a method defined by Chef that checks whether the current platform is one of the passed values. You can read more about that (and similar methods, e.g. the now recommended platform_family? method) in the Chef docs for recipes in general and for some often-used Ruby idioms.
As a side-remark: you might be surprised by the fact that Ruby allows the ? and ! characters to appear at the end of method names, which makes Ruby rather unique among similar languages in this regard. These characters are simply part of the method name and have no special meaning to the language. They are only used by convention to programmers to better identify the purpose of a method. If a method has a ? at the end, it is to be used to check some condition and is expected to return either a truthy or falsy value. Methods with a ! at the end often perform some potentially dangerous operation, e.g. change object in place, delete stuff, ... Again, this is only a convention and is not interpreted by the language.
The last option you mentioned, the only_if and by extension not_if are used to define conditionals on Chef resources to make sure they are only executed when a certain condition is true (or when using not_if, if it is false). As these attributes are used on Chef resources only, they are naturally also defined by Chef.
To understand why they are useful it is necessary to understand how a Chef run works. The details can be found in the description of the Anatomy of a Chef Run. What is important there is that you basically have two execution phases: Resource Compilation and Convergence. In the first step, the actual code to define the resources is executed. Here, the code in your case statement would also be run. After all the recipes have been loaded and all the resources have been defined, Chef enters the second phase, the Convergence phase. There, the actual implementation of the resources which performs the changes (create files and directories, install packages, ...) is run. Only in this phase are the only_if and not_if conditions checked.
In fact, you can observe the difference between
file "/tmp/helloworld" do
action :create
content "hello world"
end
if File.exist?("/tmp/helloworld")
file "/tmp/foobar" do
action :create
content "foobar"
end
end
and
file "/tmp/helloworld" do
action :create
content "hello world"
end
file "/tmp/foobar" do
action :create
content "foobar"
only_if { File.exist?("/tmp/helloworld") }
end
In the first variant, the condition whether /tmp/helloworld exists is checked during resource compilation. At this time, the code to actually create the /tmp/helloworld file has not been run, as that happens only in the Convergence step. Thus, during your first run, the /tmp/foobar file would not be created.
In the second variant however, the check is done with only_if, which is evaluated during convergence. Here you will notice that both files get created in the first run.
If you want to read a bit more on how the definition of the conditionals works in terms of Ruby (and you definitely should), you can read about Ruby Blocks which are more or less pieces of code that can be passed around for later execution.
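The "store now, run later" idea behind only_if can be sketched in a few lines of plain Ruby. This is not how Chef is implemented; SketchResource and its converge method are invented purely to show a block being kept as an object and called at a later point.

```ruby
# Toy stand-in for a resource; NOT Chef's real implementation
class SketchResource
  def initialize(name)
    @name = name
    @guard = nil
  end

  # Stores the block without running it ("compile" time)
  def only_if(&block)
    @guard = block
  end

  # Runs the stored block only now ("converge" time)
  def converge
    return :skipped if @guard && !@guard.call
    :ran
  end
end

helloworld_exists = false # state while the recipe is being read

resource = SketchResource.new('/tmp/foobar')
resource.only_if { helloworld_exists } # captured, not evaluated yet

helloworld_exists = true # an earlier resource has since done its work

outcome = resource.converge
```

The guard sees the value the variable has when converge runs, not the value it had when only_if was called, which is the same reason the second recipe variant creates both files on the first run.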
