After loading a url by open-uri, how to handle the generated Tempfile object? - ruby

I want to figure out how to download images from the internet and store them locally.
Here's what I did:
require 'open-uri' # => true
file = open "https://s3-ap-southeast-1.amazonaws.com/xxx/Snip20180323_40.png"
# => #<Tempfile:/var/folders/k0/.../T/open-uri20180524-60756-1r44uix>
Then I was confused about this Tempfile object. I found I can get the original url by:
file.base_uri
# => #<URI::HTTPS https://s3-ap-southeast-1.amazonaws.com/xxx/Snip20180323_40.png>
But I could not find a method that directly returns the original file name, Snip20180323_40.png.
Is there a method that can directly get the original file name from a Tempfile object?
What purpose are Tempfile objects mainly used for? Are they different from normal file objects such as: file_object = File.open('how_old.rb') # => #<File:how_old.rb>?
Can I convert a Tempfile object to a File object?
How can I write this Tempfile as the same name file in a local directory, for example /users/user_name/images/Snip20180323_40.png?

The original filename is only really available in the URL. Just take uri.path.split("/").last.
Tempfiles are effectively Files, with the distinction that when a Tempfile is garbage collected, the underlying file is deleted.
You can copy the underlying file with FileUtils.copy, or you can open the Tempfile, read it, and write it into a new File handle of your choosing.
Something like this should work:
require 'open-uri'

def download_url_to(url, base_path)
  uri = URI(url)
  filename = uri.path.split("/").last
  new_file = File.join(base_path, filename)
  response = uri.open
  # write, not puts: puts would append a trailing newline to the binary data
  File.open(new_file, "wb") { |fp| fp.write response.read }
  new_file
end
It's worth noting that if the file is less than 10kb, you'll get a StringIO object rather than a Tempfile object. The above solution handles both cases. This also just accepts whatever the last part of the URL path is; it's going to be up to you to sanitize it, as well as the contents of the file itself. In most cases you don't want to permit clients to download arbitrary files to your system. For example, you may want to be extra sure that the filename doesn't include paths like "..\\..\\.." which may be used to write files to non-intended locations.
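As a rough sketch of the sanitizing and StringIO/Tempfile points above: the helper names safe_basename and save_stream are made up for illustration, and the checks shown are a minimal starting point rather than a complete defence. Percent-encoded separators are not decoded here, and the example.com URL in the usage comment is a placeholder.

```ruby
require "uri"
require "fileutils"

# Hypothetical helper: take the last segment of the URL path and reject
# values that are empty or could point outside the target directory.
def safe_basename(url)
  name = URI(url).path.split("/").last.to_s
  if name.empty? || name == "." || name == ".." || name.include?("\\")
    raise ArgumentError, "no usable file name in #{url.inspect}"
  end
  name
end

# Hypothetical helper: copy any readable IO (Tempfile or StringIO alike)
# to dir/name in binary mode, without building one big Ruby string.
def save_stream(io, dir, name)
  FileUtils.mkdir_p(dir)
  target = File.join(dir, name)
  File.open(target, "wb") { |out| IO.copy_stream(io, out) }
  target
end

# Usage, assuming open-uri is loaded and the URL is reachable:
#   url = "https://example.com/xxx/Snip20180323_40.png"
#   save_stream(URI(url).open, "images", safe_basename(url))
```

IO.copy_stream only needs the source to respond to read, which is why the same call covers both return types of open-uri.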

Related

How to replace the first few bytes of a file in Ruby without opening the whole file?

I have a 30MB XML file that contains some gibberish in the beginning, and so typically I have to remove that in order for Nokogiri to be able to parse the XML document properly.
Here's what I currently have:
contents = File.open(file_path).read
if contents[0..123].include? 'authenticate_response'
fixed_contents = File.open(file_path).read[123..-1]
File.open(file_path, 'w') { |f| f.write(fixed_contents) }
end
However, this actually causes the ruby script to open up the large XML file twice. Once to read the first 123 characters, and another time to read everything but the first 123 characters.
To solve the first issue, I was able to accomplish this:
contents = File.open(file_path).read(123)
However, now I need to remove these characters from the file without reading the entire file. How can I "trim" the beginning of this file without having to open the entire thing in memory?
You can open the file once, then read and check the "garbage" and finally pass the opened file directly to Nokogiri for parsing. That way, you only need to read the file once and don't need to write it at all.
File.open(file_path) do |xml_file|
  if xml_file.read(123).include? 'authenticate_response'
    # header found, nothing to do
  else
    # no header found. We rewind and let nokogiri parse the whole file
    xml_file.rewind
  end
  xml = Nokogiri::XML.parse(xml_file)
  # Now do whatever you want with the parsed XML document
end
Please refer to the documentation of IO#read, IO#rewind and Nokogiri::XML::Document.parse for details about those methods.
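To see the read-then-rewind pattern on its own, here is a small sketch that runs without Nokogiri. The helper name skip_header is made up, and a StringIO stands in for the opened file; the marker and the 123-byte length are the ones from the question.

```ruby
require "stringio"

# Hypothetical helper: read the first header_length bytes. If they contain
# the marker, leave the IO positioned just after them; otherwise rewind so
# the caller sees the stream from the start.
def skip_header(io, marker, header_length)
  head = io.read(header_length)
  io.rewind unless head && head.include?(marker)
  io
end

junk = "junk authenticate_response".ljust(123, "#") # 123 bytes of header
with_header    = StringIO.new(junk + "<root/>")
without_header = StringIO.new("<root/>")

puts skip_header(with_header, "authenticate_response", 123).read
puts skip_header(without_header, "authenticate_response", 123).read
```

Both calls leave the IO positioned at the start of the XML, so whatever consumes it next (a parser, for instance) never sees the junk bytes.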

in Ruby open IO object and pass each line to another object

I need to download a large zipped file, unzip it and modify each string before I save them to an array.
I prefer to read the downloaded zipped file one line (entry) at a time and manipulate each line (entry) as it loads, rather than load the whole file into memory.
I experimented with many IO methods of opening a file this way, but I struggle to pass a line (entry) to the Zip::InputStream object. This is what I have:
require 'tempfile'
require 'zip'
require 'open-uri'
f = open(FILE_URL) #FILE_URL contains download path to .zip file
Zip::InputStream.open(f) do |io| #io is a String
while (io.get_next_entry)
io.each do |line|
# manipulate the line and push it to an array
end
end
end
If I use open(FILE_URL).each do |zip_entry|, I cannot figure out how to pass zip_entry to Zip::InputStream. Simply calling Zip::InputStream.open(zip_entry) does not work...
Is this scenario possible, or do I have to download the content of the zipped file into a Tempfile completely? Any pointers on how to solve this would be helpful.

How to pass file url to helper method in middleman

I'm writing a helper method to convert images to base64 strings when needed. Below is the code
# config.rb
helpers do
def base64_url(img_link, file_type: "jpg")
require "base64"
if file_type =="jpg"
"data:image/jpg;base64,#{Base64.encode64(open(img_link).to_a.join)}"
elsif file_type =="png"
"data:image/jpg;base64,#{Base64.encode64(open(img_link).to_a.join)}"
else
link
end
end
end
In page.html.erb
<%= image_tag base64_url('/images/balcozy-logo.jpg') %>
Now the problem is when ruby reads '/images/balcozy-logo.jpg' it reads the file from system root not from the root of the project.
Error message as follows
Errno::ENOENT at /
No such file or directory @ rb_sysopen - /images/balcozy-logo.jpg
How do I get around this and pass the proper image URL from project_root/source/images?
In Middleman app.root returns the root directory of the application. There's also app.root_path, which does the same but returns a Pathname object, which is slightly more convenient:
full_path = app.root_path.join("source", img_link.gsub(/^\//, ''))
The gsub is necessary if img_link starts with a /, since it would be interpreted as the root of your filesystem.
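The effect of the leading slash can be checked with plain Pathname, outside Middleman. In this sketch /project is a made-up stand-in for whatever app.root_path returns, and the paths assume a Unix-style filesystem.

```ruby
require "pathname"

root = Pathname.new("/project") # stand-in for app.root_path (assumption)
img_link = "/images/balcozy-logo.jpg"

# An absolute argument makes Pathname#join discard everything before it.
with_slash = root.join("source", img_link)

# Stripping the leading slash keeps the result inside the project.
without_slash = root.join("source", img_link.sub(%r{\A/}, ""))

puts with_slash    # /images/balcozy-logo.jpg
puts without_slash # /project/source/images/balcozy-logo.jpg
```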
I've taken the liberty of making a few more revisions to your method:
require "base64"

helpers do
  def base64_url(path, file_type: "jpg")
    return path unless ["jpg", "png"].include?(file_type)
    full_path = app.root_path.join("source", path.gsub(/^\//, ''))
    # Binary mode keeps the image bytes intact on every platform
    data_encoded = File.open(full_path, 'rb') do |file|
      Base64.strict_encode64(file.read)
    end
    "data:image/#{file_type};base64,#{data_encoded}"
  end
end
I've done a few things here:
Moved require "base64" to the top of the file; it doesn't belong inside a method.
Checked file_type at the very beginning of the method and returned early if it's not among the listed types.
Instead of open(filename).to_a.join (or the more succinct open(filename).read), used File.open in binary mode. OpenURI (which supplies the open method you were using) is overkill for reading from the local filesystem.
Used Base64.strict_encode64 instead of encode64. encode64 inserts a line feed after every 60 encoded characters, which is awkward inside an HTML attribute; strict_encode64 emits a single line. Avoid urlsafe_encode64 here, since it substitutes - and _ for + and /, and data URIs expect the standard Base64 alphabet.
Removed the unnecessary if; since we know file_type will be either jpg or png we can use it directly in the data URI.
There may be a more elegant way to get full_path or determine the file's MIME type using Middleman's built-in asset system, but a very brief search of the docs didn't turn anything up.
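For reference, the three encoders in Ruby's Base64 module differ in ways that matter for a data URI. The sketch below uses three arbitrary bytes (not a real image) chosen so that the differences show up. As far as I know, data URIs are defined in terms of the standard Base64 alphabet, so the URL-safe variant is the odd one out here.

```ruby
require "base64"

bytes = "\xFB\xFF\xFE".b # arbitrary sample bytes, not image data

standard = Base64.strict_encode64(bytes)  # standard alphabet, single line
url_safe = Base64.urlsafe_encode64(bytes) # '-' and '_' replace '+' and '/'
wrapped  = Base64.encode64(bytes)         # standard alphabet plus line feeds

puts standard        # +//+
puts url_safe        # -__-
puts wrapped.inspect # "+//+\n"

# The form that can be dropped into an attribute as-is:
data_uri = "data:image/png;base64,#{standard}"
puts data_uri
```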

Changing information in a CSV file

I'm trying to write a ruby script that will read through a CSV file and prepend information to certain cells (for instance adding a path to a file). I am able to open and mutate the text just fine, but am having issues writing back to the CSV without overwriting everything. This is a sample of what I have so far:
CSV.foreach(path) { |row|
text = row[0].to_s
new_text = "test:#{text}"
}
I would like to add something within that block that would then write new_text back to the same reference cell (row) in the file. The only way I have found to write to a file is
CSV.open(path, "wb") { |row|
row << new_text
}
But I think that is bad practice since you are reopening the file within the file block already. Is there a better way I could do this?
EX: I have a CSV file that looks something like:
file,destination
test.txt,A101
and need it to be:
file,destination
path/test.txt,id:A101
Hope that makes sense. Thanks in advance!
Depending on the size of the file, you might consider loading the contents of the file into a local variable and then manipulating that, overwriting the original file.
require 'csv'

lines = CSV.read(path)
File.open(path, "wb") do |file|
  lines.each do |line|
    text = line[0].to_s
    line[0] = "test:#{text}" # Replace this with your editing logic
    file.write CSV.generate_line(line)
  end
end
Alternately, if the file is big, you could write each modified line to a new file along the way and then replace the old file with the new one at the end.
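A minimal sketch of that second approach follows. The helper name rewrite_csv and the ".tmp" suffix are my own choices, and it assumes the first row is a header that should pass through unchanged.

```ruby
require "csv"

# Hypothetical helper: stream rows from `path` into a sibling temp file,
# passing every row after the header through the block, then replace the
# original file with the temp file.
def rewrite_csv(path)
  tmp_path = "#{path}.tmp"
  CSV.open(tmp_path, "w") do |out|
    CSV.foreach(path).with_index do |row, index|
      out << (index.zero? ? row : yield(row))
    end
  end
  File.rename(tmp_path, path)
end

# Usage with the sample data from the question (the file name is made up):
#   rewrite_csv("files.csv") { |file, dest| ["path/#{file}", "id:#{dest}"] }
```

Only one row is held in memory at a time, and the original file stays intact until the rename at the end.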
Given that you don't appear to be doing anything that draws on CSV capabilities, I'd recommend using Ruby's "in-place" option variable $-i.
Some of the stats software I use wants just the data, and can't deal with a header line. Here's a script I wrote a while back to (appear to) strip the first line out of one or more data files specified on the command-line.
#! /usr/bin/env ruby -w
#
# User supplies the name of one or more files to be "stripped"
# on the command-line.
#
# This script ignores the first line of each file.
# Subsequent lines of the file are copied to the new version.
#
# The operation saves each original input file with a suffix of
# ".orig" and then operates in-place on the specified files.
$-i = ".orig" # specify backup suffix
oldfilename = ""
ARGF.each do |line|
if ARGF.filename == oldfilename # If it's an old file
puts line # copy lines through.
else # If it's a new file remember it
oldfilename = ARGF.filename # but don't copy the first line.
end
end
Obviously you'd want to change the puts line pass-through to whatever edit operations you want to perform.
I like this solution because even if you screw it up, you've preserved your original file as its original name with .orig (or whatever suffix you choose) appended.

How to get a filename from an IO object in ruby

In ruby...
I have an IO object created by an external process, which I need to get the file name from.
However I only seem to be able to get the File descriptor (3), which is not very useful to me.
Is there a way to get the filename from this object or even to get a File Object?
I am getting the IO object from notifier. So this may be a way of getting the file path as well?
There is a similar question on how to get the filename in C; I will present the answer to that question here in a Ruby way.
Getting the filename in Linux
Suppose io is your IO Object. The following code gives you the filename.
File.readlink("/proc/self/fd/#{io.fileno}")
This does not work if, for example, the file was removed after the io object was created for it. With this solution you have the filename, but not a File object.
Getting a File object which does not know the filename
The method IO.for_fd can create an IO (or one of its subclasses) for any given integer file descriptor. You get a File object for your fd by doing:
File.for_fd(io.fileno)
Unfortunately this File object does not know the filename.
File.for_fd(io.fileno).path # => nil
I scanned through the ruby-1.9.2 sources. There seems to be no way in pure ruby to manipulate the path after the file object was created.
Getting a File object which does know the filename
An extension to Ruby can be created in C which first calls File.for_fd and afterwards manipulates the File's internal data structures. This source code works for ruby-1.9.2; for other versions of Ruby it may have to be adjusted.
#include "ruby.h"
#include "ruby/io.h"
VALUE file_fd_filename(VALUE self, VALUE fd, VALUE filename) {
VALUE file= rb_funcall3(self, rb_intern("for_fd"), 1, &fd);
rb_io_t *fptr= RFILE(rb_io_taint_check(file))->fptr;
fptr->pathv= rb_str_dup(filename);
return file;
}
void Init_filename() {
rb_define_singleton_method(rb_cFile, "for_fd_with_filename", file_fd_filename, 2);
}
Now you can do after compiling:
require "./filename"
f= File.for_fd_with_filename(io.fileno, File.readlink("/proc/self/fd/#{io.fileno}"))
f.path # => the filename
The readlink could also be put into the File.for_fd_with_filename definition. This example is just to show how it works.
If you are sure that the IO object represents a File, you could try something like this:
path = io.path if io.respond_to?(:path)
See documentation for File#path
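The two pure-Ruby approaches above can be combined into one fallback chain. This is only a sketch: the helper name filename_for is made up, and the /proc lookup is Linux-specific, so on other platforms the helper simply returns nil when the object has no path of its own.

```ruby
# Hypothetical helper: prefer the path the object already knows, then fall
# back to the Linux /proc symlink for its file descriptor, else nil.
def filename_for(io)
  path = io.path if io.respond_to?(:path)
  return path if path

  fd = io.fileno if io.respond_to?(:fileno)
  return nil unless fd

  link = "/proc/self/fd/#{fd}"
  File.symlink?(link) ? File.readlink(link) : nil
end

File.open(__FILE__) do |f|
  puts filename_for(f) # the path this script was opened with
end
```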
