Opening a file without creating a new one - ruby

I have a ruby function that accesses files in my unix filesystem.
I have 2050 files each representing an hashed value in a dedicated directory.
The function reads a file containing email addresses and performs a hashing function, finds out the file id and prints.
Usually I do those things in Java but I wanna start doing it in Ruby. My problem is, that within my function, I try to open the correct file for reading, but I see that open works the same as new when no code block is provided. From the IO class, method ::openWith no associated block, IO.open is a synonym for ::new.
What I simply need to do is, open the file, set the reader pointer to the first available line, write and flush.
For simplicity I will put every code statement in one line. The file should be opened with its current status (see the HERE comment).
def dispatch
while (id=IDS_FILE.gets)
bucket="#{BUCKETS}" << (PERFORM HERE THE HASH CALCULATION) ".txt"
#HERE
bucket_file=File.open("#{bucket}","w")
bucket_file.write(id)
bucket_file.close
end
log "Writing #{id.chomp!} to #{bucket_file.to_path}"
end
end

To get "the file to be opened with its current status" you'll need the "append mode", as documented here:
"a" Write-only, starts at end of file if file exists,
otherwise creates a new file for writing.
So your code should read like so:
File.open(bucket, "a") do |f|
f.write id
end

Related

Opening filehandles for use with TabularMSA in skbio

Hey there skbio team.
So I need to allow either DNA or RNA MSAs. When I do the following, if I leave out the alignment_fh.close() skbio reads the 'non header' line in the except block making me think I need to close the file first so it will start at the beginning, but if I add alignment_fh.close() I cannot get it to read the file. I've tried opening it via a variety of methods, but I believe TabularMSA.read() should allow files OR file handles. Thoughts? Thank you!
try:
aln = skbio.TabularMSA.read(alignment_fh, constructor=skbio.RNA)
except:
alignment_fh.close()
aln = skbio.TabularMSA.read(alignment_fh, constructor=skbio.DNA)
I've tried opening it via a variety of methods, but I believe TabularMSA.read() should allow files OR file handles.
You're correct: scikit-bio generally supports reading and writing files using open file handles or file paths.
The issue you're running into is that your first TabularMSA.read() call reads the entire contents of the open file handle, so that when the second TabularMSA.read() call is hit within the except block, the file pointer is already at the end of the open file handle -- this is why you're getting an error message hinting that the file is empty.
This behavior is intentional; when scikit-bio is given an open file handle, it will read from or write to the file but won't attempt to manage the handle's file pointer (that type of management is up to the caller of the code).
Now, when asking scikit-bio to read a file path (i.e. a string containing the path to a file on disk or accessible at some URI), scikit-bio will handle opening and closing the file handle for you, so that's often the easier way to go.
You can use file paths or file handles to accomplish your goal. In the following examples, suppose aln_filepath is a str pointing to your alignment file on disk (e.g. "/path/to/my/alignment.fasta").
With file paths: You can simply pass the file path to both TabularMSA.read() calls; no open() or close() calls are necessary on your part.
try:
aln = skbio.TabularMSA.read(aln_filepath, constructor=skbio.RNA)
except ValueError:
aln = skbio.TabularMSA.read(aln_filepath, constructor=skbio.DNA)
With file handles: You'll need to open a file handle and reset the file pointer within your except block before reading a second time.
with open(aln_filepath, 'r') as aln_filehandle:
try:
aln = skbio.TabularMSA.read(aln_filehandle, constructor=skbio.RNA)
except ValueError:
aln_filehandle.seek(0) # reset file pointer to beginning of file
aln = skbio.TabularMSA.read(aln_filehandle, constructor=skbio.DNA)
Note: In both examples, I've used except ValueError instead of a "catch-all" except statement. I recommend catching specific error types (e.g. ValueError) instead of any exception because the code could be failing in different ways than what you're expecting. For example, with a "catch-all" except statement, users won't be able to interrupt your program with Ctrl-C because KeyboardInterrupt will be caught and ignored.

How to get the name of an instance of Tempfile in Ruby?

When creating a Tempfile in ruby, it takes the basename you pass it, and then it appends a random string to the end.
From the docs: http://ruby-doc.org/stdlib-1.9.3/libdoc/tempfile/rdoc/Tempfile.html
file = Tempfile.new('hello')
file.path # => something like: "/tmp/hello2843-8392-92849382--0"
You can see it starts with hello and then adds 2843-8392-92849382--0. Though this ending will change every time you create an instance.
This makes it difficult (at least for me) to lookup in the directory its saved in.
Question:
Is there any method (like file.fullName) that could be run on the instance to just get the hello2843-8392-92849382--0, in order to look it up in the directory where its saved?
Thoughts:
You could take the path and parse it but that seems excessive.
Basically you're asking for:
File.basename(file.path)
There's rarely a reason to need that exposed as a method, but if you want you could subclass Tempfile to add it in:
class SuperTempfile < Tempfile
def basename
File.basename(path)
end
end

Deleting contents of file after a specific line in ruby

Probably a simple question, but I need to delete the contents of a file after a specific line number? So I wan't to keep the first e.g 5 lines and delete the rest of the contents of a file. I have been searching for a while and can't find a way to do this, I am an iOS developer so Ruby is not a language I am very familiar with.
That is called truncate. The truncate method needs the byte position after which everything gets cut off - and the File.pos method delivers just that:
File.open("test.csv", "r+") do |f|
f.each_line.take(5)
f.truncate( f.pos )
end
The "r+" mode from File.open is read and write, without truncating existing files to zero size, like "w+" would.
The block form of File.open ensures that the file is closed when the block ends.
I'm not aware of any methods to delete from a file so my first thought was to read the file and then write back to it. Something like this:
path = '/path/to/thefile'
start_line = 0
end_line = 4
File.write(path, File.readlines(path)[start_line..end_line].join)
File#readlines reads the file and returns an array of strings, where each element is one line of the file. You can then use the subscript operator with a range for the lines you want
This isn't going to be very memory efficient for large files, so you may want to optimise if that's something you'll be doing.

Ruby - Files - gets method

I am following Wicked cool ruby scripts book.
here,
there are two files, file_output = file_list.txt and oldfile_output = file_list.old. These two files contain list of all files the program went through and going to go through.
Now, the file is renamed as old file if a 'file_list.txt' file exists .
then, I am not able to understand the code.
Apparently every line of the file is read and the line is stored in oldfile hash.
Can some one explain from 4 the line?
And also, why is gets used here? why cant a .each method be used to read through every line?
if File.exists?(file_output)
File.rename(file_output, oldfile_output)
File.open(oldfile_output, 'rb') do |infile|
while (temp = infile.gets)
line = /(.+)\s{5,5}(\w{32,32})/.match(temp)
puts "#{line[1]} ---> #{line[2]}"
oldfile_hash[line[1]] = line[2]
end
end
end
Judging from the redundant use of quantifiers ({5,5} and {32,32}) in the regex (which would be better written as {5}, {32}), it looks like the person who wrote that code is not a professional Ruby programmer. So you can assume that the choice taken in the code is not necessarily the best.
As you pointed out, the code could have used each instead of while with gets. The latter approach is sort of an old-school Ruby way of doing it. There is nothing wrong in using it. Until the end of file is reached, gets will return a string, and when it does reach the end of file, gets will return nil, so the while loop works as the same when you use each; in each iteration, it reads the next line.
It looks like each line is supposed to represent a key-value pair. The regex assumes that the key is not an empty string, and that the key and the value are separated by exactly five spaces, and the the value consists of exactly thirty-two letters. Each key-value pair is printed (perhaps for monitoring the progress), and is stored in oldfile_hash, which is most likely a hash.
So the point of using .gets is to tell when the file is finished being read. Essentially, it's tied to the
while (condition)
....
end
block. So gets serves as a little method that will keep giving ruby the next line of the file until there is no more lines to give.

Rails File I/O: What works in Ruby doesn't work in Rails?

So, I wrote a simple Ruby class, and put it in my rails /lib directory. This class has the following method:
def Image.make_specific_image(paths, newfilename)
puts "making specific image"
#new_image = File.open(newfilename, "w")
puts #new_image.inspect
##blank.each(">") do |line|
puts line + "~~~~~"
#new_image.puts line
if line =~ /<g/
paths.each do |p|
puts "adding a path"
puts p
#new_image.puts p
end
end
end
end
Which creates a new file, and copies a hardcoded string (##blank) to this file, adding custom content at a certain location (after a g tag is found).
If I run this code from ruby, everything is just peachy.
HOWEVER, if I run this code from rails, the file gets CREATED, but is then empty. I've inspected each line of the code: nothing I'm trying to write to the file is nil, but the file is empty nonetheless.
I'm really stumped here. Is it a permissions thing? If so, why on EARTH would Rails have the permissions necessary to MAKE a file, but then not WRITE to the file it made?
Does File I/O somehow work differently in rails?
Specifically, I have a model method that calls:
Image.make_specific_image(paths, creature.id.to_s + ".svg")
which succesfully makes a file of the type "47.svg" that is empty.
Have you tried calling close on the file after you're done writing it? (You could also use the block-based File.open syntax, which will automatically close once the block is complete). I'm guessing the problem is that the writes aren't getting flushed to disk.
So.
Apparently File I/0 DOES work in Rails...just very, very slowly. In Ruby, as soon as I go to look at the file, it's there, it works, everything is spiffy.
Before, after seeing blank files from Rails, I would get frustrated, then delete the file, and change some code and try again (so as not to be full of spam, since each file is genearted on creature creation, so I would soon end up with a lot of files like "47.svg" and "48.svg", etc.
....So. I took my lunch break, came back to see if I could tell if the permissions of the rails generated file were different from the ruby generated file...and noticed that the RAILS file is no longer blank.
Seems to take about five minutes for rails to finally write to the file, even AFTER it claims it's done processing that whole call. Ruby takes a few seconds. Not really sure WHY they are so different, but at least now I know it's not a permissions thing.
Edit: Actually, on some files take so long, others are instant...

Resources