Ruby Tempfile vs File - ruby

I want to know the difference between Tempfile and File.
I found that:
require 'open-uri'
open('c:/boot.ini') do |file|
  puts file.class # File
end
open('http://coderlee.cnblogs.com') do |file|
  puts file.class # Tempfile
end
When I save the stream to a remote storage server, the Tempfile causes an error. The reason seems to be that the encoding is not ASCII-8BIT. Why?

In the first case, you are loading a file from your file system. This creates a File object using the file's name (it has one).
In the second case, you are opening a stream to a remote resource. There is no associated file on your file system, yet you need one to perform any operation on it. Ruby therefore creates a Tempfile for you with a unique filename that you don't even need to know (the resource has no local name of its own). It then behaves like a File object.
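As a rough illustration of how a Tempfile stands in for a File, here is a minimal sketch using only the standard library. It does not touch the network; the helper name `tempfile_round_trip` and the `'demo'` basename are made up for this example.

```ruby
require 'tempfile'

# Hypothetical helper: writes text to a Tempfile, reads it back, and
# reports what kind of object we were working with.
def tempfile_round_trip(text)
  tmp = Tempfile.new('demo') # Ruby picks a unique name based on 'demo'
  begin
    tmp.write(text)
    tmp.rewind # go back to the start before reading
    {
      klass: tmp.class,                      # Tempfile, not File
      on_disk: File.exist?(tmp.path),        # backed by a real file
      content: tmp.read                      # File-style reading works
    }
  ensure
    tmp.close! # close and delete the temporary file
  end
end

result = tempfile_round_trip('hello')
puts result[:klass]
puts result[:content]
```

The object's class is Tempfile, but the usual File methods (`write`, `rewind`, `read`, `path`) work on it, which is why code written for a File generally accepts one.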

The encoding of the document you retrieved is controlled by the server. If you want to retrieve the document in a different encoding, you need to change the encoding on the server.

Related

find whether a zipped file is text or binary without unzipping it

I'm creating a ruby script which goes through several zip files and validates the content of any xml files within. To optimise my script, I'm using the ruby-zip gem to open the zip files without extracting them.
My initial thought was to use filemagic to determine the MIME-type of the files, but the filemagic gem takes a file path and all I have are these Entry and InputStream classes which are unique to ruby-zip.
Is there a good way to determine the filetype without extracting? Ultimately I need to identify xml files, but I can get away with identifying plain-text files and using a regex to look for the
the filemagic gem takes a file path
The filemagic gem's file method takes a file path, but file isn't the only method it has. A glance at the docs reveals it has an io method, too.
all I have are these Entry and InputStream classes which are unique to ruby-zip
I wouldn't say InputStream is "unique to ruby-zip." From the docs (emphasis mine):
A InputStream inherits IOExtras::AbstractInputStream in order to provide an IO-like interface for reading from a single zip entry
So FileMagic has an io method and Zip::InputStream is IO-like. That leads us to a pretty straightforward solution:
require 'filemagic'
require 'zip'
Zip::InputStream.open('/path/to/file.zip') do |io|
  entry = io.get_next_entry
  FileMagic.open(:mime) do |fm|
    p fm.io(entry.get_input_stream)
  end
end

Does the ruby File object constructor load the entire file into memory

I'm trying to understand what happens when I create a ruby File object.
The case I have in mind is whether I can create a File object for a file while on one branch of the source tree, then switch branches, and still be able to access the file from the previous branch via the initial File object I created.
So something like this:
repo = Rugged::Repository.new('path/to/repo/')
repo.checkout("test_branch")
file = File.new('path/to/repo/file.xml')
repo.checkout("master")
file.read # hopefully reading a file from the `test_branch`
I'm hoping that file.read would now be reading the file from the test_branch even though I checked-out the repo back to master branch.
Will this work?
File.new is lazy and does not load the whole file into memory. You must read it to get its contents. Do a simple test: create a file, open it with File.new, and modify its content before reading.
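The simple test suggested above can be sketched like this. It is only an illustration under assumptions: the helper name `read_after_overwrite` and the file name `sample.txt` are invented, and it overwrites the file in place, which may not be how a branch checkout replaces files on disk.

```ruby
require 'tmpdir'

# Hypothetical helper: opens a file with File.new, overwrites the file
# on disk, and only then reads through the handle opened earlier.
def read_after_overwrite
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'sample.txt')
    File.write(path, 'before')

    file = File.new(path)     # opens the file; nothing is read yet
    File.write(path, 'after') # overwrite the same file in place

    begin
      file.read # the content is fetched from disk only now
    ensure
      file.close
    end
  end
end

puts read_after_overwrite
```

If File.new had loaded the file eagerly, the read would still return the old text. With an in-place overwrite like this one, it returns the new text instead.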

Ruby Tempfile - Modify file name?

I'm using Tempfile to store a generated PDF before uploading to a new destination.
pdf_file = WickedPdf.new.pdf_from_string(msgbody)
tempfile = Tempfile.new(['Bob', '.pdf'], Rails.root.join('public','pdf-test'))
tempfile.binmode
tempfile.write pdf_file
tempfile.close
While this works fine, the resulting file names, e.g. bob20140331-19260-1g6rzr1.pdf, are not user friendly.
I understand that Tempfile creates a unique name and why, but I ultimately need to change the name to make it more intuitive/easier to digest for my users.
Is there a recommended way to do so? Even if it's to simply remove the middle (19260)? Thanks for your time and assistance.
A Tempfile is used to create a temporary file with a unique file name, which will be cleaned up by the garbage collector or when the ruby interpreter exits.
Tempfiles behave like File objects, but I am not sure if you can rename files and if you can, if the automatic cleanup described above will still work. Additionally you might break the constraint of unique file names if you change the temporary file name manually.
I suggest creating an ordinary file and specifying the entire name yourself (the succ method can be helpful to prevent name clashes).
Another solution might be setting the file name during or after the upload process you mentioned.
Not sure if there is a way to do it with Tempfile itself, but can you not rename the file after creation via the FileUtils module? That way the file that was created still ends up with a valid and user-friendly name.
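One way to combine these suggestions, sketched with the standard library only: write to the Tempfile as before, then copy the result to an ordinary file whose name you choose. The helper `save_with_friendly_name` and its parameters are hypothetical, and the caller is assumed to pick a name that does not clash with an existing file.

```ruby
require 'tempfile'
require 'fileutils'

# Hypothetical helper: stores data under a caller-chosen file name,
# using a Tempfile only as scratch space while writing.
def save_with_friendly_name(data, dest_dir, friendly_name)
  tempfile = Tempfile.new(['Bob', '.pdf'], dest_dir)
  begin
    tempfile.binmode
    tempfile.write(data)
    tempfile.close # flush to disk, keep the file for now

    target = File.join(dest_dir, friendly_name)
    FileUtils.cp(tempfile.path, target) # ordinary file, no auto cleanup
    target
  ensure
    tempfile.close! # remove the uniquely named temporary file
  end
end
```

Copying rather than renaming leaves Tempfile's own cleanup untouched: the temporary file is deleted as usual, and the copy is a plain file that stays until you remove it.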

Save WWW::Mechanize::File to disk using FileUtils

Using Mechanize with Ruby I get a certain file using agent.get('http://example.com/foo.torrent'), with FileUtils or otherwise, how do I save this file to my hard drive (for instance, in a directory wherefrom the script is running)?
P.S. class => WWW::Mechanize::File
Well, WWW::Mechanize::File has a save_as instance method, so I suppose something like this might work:
agent.get('http://example.com/foo.torrent').save_as 'a_file_name'
Please note that the Mechanize::File class is not the most appropriate for large files. In those cases, one should use the Mechanize::Download class instead, as it downloads the content in small chunks to disk. The file will be downloaded to where the script is running (although you can specify a different path as well). You need to set the default parser first, create a new one or modify an existing parser. Then, save it to the desired path:
agent.pluggable_parser.default = Mechanize::Download
agent.get("http://example.com/foo.torrent").save("path/to/a_file_name")
Check here and here for more details. Also, there's a similar question here in Stackoverflow.

Reading a zip file from a GET request without saving first in Ruby?

I am trying to read a zip file from an HTTP GET request. One way to do it is by
saving the response body to a physical file first and then reading the
zip file to read the files inside the zip.
Is there a way to read the files inside directly without having to save
the zip file into a physical file first?
My current code:
Net::HTTP.start("clinicaltrials.gov") do |http|
  resp = http.get("/ct2/results/download?id=15002A")
  open("C:\\search_result.zip", "wb") do |file|
    file.write(resp.body)
  end
end
Zip::ZipFile.open("C:\\search_result.zip") do |zipfile|
  xml = zipfile.file.read("search_result.xml")
end
Looks like you're using rubyzip, which can't unzip from an in-memory buffer.
You might want to look at using Chilkat's Ruby Zip Library instead as it supports in-memory processing of zip data. It claims to be able to "Create or open in-memory Zips", though I have no experience with the library myself. Chilkat's library isn't free, however, so that may be a consideration. Not sure if there is a free library that has this feature.
One way might be to implement an in-memory file, so that RubyZip can still work with your data without changing anything else.
You should take a look at this Ruby Hack

Resources