How to read file from s3? - ruby

I'm trying to read a CSV file directly from s3.
I can get the S3 URL, but I can't open it because the file is not on the local system. I don't want to download the file and then read it.
Is there any other way to achieve this?

There are a few ways, depending on the gems you are using. For example, one approach from the official documentation:
s3 = Aws::S3::Client.new
resp = s3.get_object(bucket:'bucket-name', key:'object-key')
resp.body
#=> #<StringIO ...>
resp.body.read
#=> '...'
Or if you are using CarrierWave/Fog:
obj = YourModel.first
content = obj.attachment.read
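Whichever client you use, the object body arrives in memory as a String or a StringIO, so the CSV can be parsed without writing anything to disk. A minimal sketch, assuming the file has a header row; parse_csv_body is a hypothetical helper name, not part of any gem, and the sample data stands in for resp.body:

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: accepts a String or any IO-like object that responds
# to #read (such as the StringIO returned by resp.body) and returns an array
# of row hashes keyed by the header names.
def parse_csv_body(body)
  text = body.respond_to?(:read) ? body.read : body.to_s
  CSV.parse(text, headers: true).map(&:to_h)
end

# Stand-in for resp.body; with the AWS SDK you would pass resp.body instead.
sample_body = StringIO.new("name,qty\nwidget,3\ngadget,5\n")
rows = parse_csv_body(sample_body)
rows.each { |row| puts "#{row['name']}: #{row['qty']}" }
```

All values come back as strings, so convert numeric columns yourself if you need them as numbers.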

You can open the file from URL directly:
require 'open-uri'
# On Ruby 3.0 and later, Kernel#open no longer handles URLs; use URI.open
csv = URI.open('http://server.com/path-to-your-file.csv').read

I think s3 doesn't provide you any way of reading the file without downloading it.
What you can do is save it in a tempfile:
require 'tempfile'
@temp_file = Tempfile.open("your_csv.csv")
@temp_file.close
`s3cmd get s3://#{@your_path} #{@temp_file.path}`
For further information: http://www.ruby-doc.org/stdlib-1.9.3/libdoc/tempfile/rdoc/Tempfile.html
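If a temporary file is acceptable, Ruby can also clean it up for you. A minimal sketch using Tempfile.create with a block, which removes the file when the block exits; with_temp_csv is a hypothetical helper name and the sample content is made up, since in practice the content would come from S3:

```ruby
require 'tempfile'

# Hypothetical helper: writes content to a temporary .csv file, yields its
# path to the caller's block, and returns whatever the block returns.
# Tempfile.create closes and deletes the file once its block finishes.
def with_temp_csv(content)
  Tempfile.create(['your_csv', '.csv']) do |file|
    file.write(content)
    file.flush # make sure the data is on disk before the path is used
    yield file.path
  end
end

line_count = with_temp_csv("a,b\n1,2\n") { |path| File.readlines(path).length }
puts line_count
```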

Related

How do I get the file metadata from an AWS S3 file with Ruby?

I'm trying to retrieve the metadata from a file uploaded to S3. Specifically, I need the content type.
I know the file has metadata because I can see it in the S3 console, but I'm unable to get it programmatically. I must have some syntax error.
See the code below: file.key returns the file name correctly, but file.metadata doesn't seem to return a hash with data.
s3 = Aws::S3::Resource.new(region: ENV['REGION'])
file = s3.bucket(sourceS3Bucket).object(sourceS3Key)
puts file.key # this works!
puts file.metadata # this returns an empty hash {}
puts file.metadata['content-type'] # empty
As Aleksei Matiushkin suggested, file.data[:content_type] will give the content type.
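A likely reason file.metadata comes back empty is that it only holds user-defined metadata, which S3 sends as x-amz-meta-* response headers, while Content-Type is a separate system header. The sketch below illustrates that split on a plain hash of headers. It does not call the AWS SDK, and split_s3_headers and the sample header values are hypothetical:

```ruby
USER_META_PREFIX = 'x-amz-meta-'

# Hypothetical helper: separates user-defined metadata (x-amz-meta-* headers)
# from the remaining system headers such as Content-Type.
# Returns [user_metadata, system_headers]; user keys have the prefix removed.
def split_s3_headers(headers)
  user, system = headers.partition do |name, _value|
    name.downcase.start_with?(USER_META_PREFIX)
  end
  user_meta = user.to_h do |name, value|
    [name.downcase.delete_prefix(USER_META_PREFIX), value]
  end
  [user_meta, system.to_h]
end

# Made-up headers for an object that has one user-defined metadata entry.
headers = {
  'Content-Type' => 'image/png',
  'Content-Length' => '1024',
  'x-amz-meta-author' => 'nav'
}
user_meta, system_headers = split_s3_headers(headers)
puts user_meta.inspect
puts system_headers['Content-Type']
```

An object uploaded without any x-amz-meta-* entries would give an empty user metadata hash here, which matches what the question observes.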

Open a local file with open-uri

I am doing data scraping with Ruby and Nokogiri. Is it possible to download and parse a local file in my computer?
I have:
require 'open-uri'
url = "file:///home/nav/Desktop/Scraping/scrap1.html"
It gives this error:
No such file or directory @ rb_sysopen - file:\home/nav/Desktop/Scraping/scrap1.html
If you want to parse a local file with Nokogiri, you can do it like this:
file = File.read('/home/nav/Desktop/Scraping/scrap1.html')
doc = Nokogiri::HTML(file)
When you open a local file in a browser, the URL in the address bar is displayed as:
file:///Users/7stud/Desktop/accounts.txt
But that doesn't mean you use that format in a Ruby script. Your Ruby script doesn't send the file name to a browser and then ask the browser to retrieve the file. Your Ruby script searches your file system directly.
The same is true for URLs: your Ruby script doesn't ask your browser to go retrieve a page from the internet, Ruby retrieves the page itself by sending a request using your system's network interface. After all, a browser and a Ruby program are both just computer programs. What your browser can do over a network, a Ruby program can do, too.
This works for me:
require 'open-uri'
text = open('./data.txt').read
puts text
You have to get your path right, though. The only reason I can think of to use open() is if you had an array of filenames and URLs mixed together. If that isn't your situation, see new2code's answer.
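For the mixed case mentioned above, a small dispatcher can pick the right reader for each entry. A minimal sketch, assuming Unix-style paths without percent-escapes; read_source is a hypothetical helper name, and file:// URLs are handled by extracting the path rather than passing the URL to File:

```ruby
require 'open-uri'
require 'uri'
require 'tmpdir'

# Hypothetical helper: returns the contents of an http(s) URL, a file:// URL,
# or a plain file system path.
def read_source(source)
  case source
  when %r{\Ahttps?://}i
    URI.open(source, &:read)          # fetched over the network by open-uri
  when %r{\Afile://}i
    File.read(URI.parse(source).path) # strip the scheme, read from disk
  else
    File.read(source)                 # already a local path
  end
end

# Demonstration with a throwaway local file; no network access is needed.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'data.txt')
  File.write(path, "hello\n")
  puts read_source(path)
  puts read_source("file://#{path}")
end
```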
This is how I do it, according to the documentation:
f = File.open("/home/nav/Desktop/Scraping/scrap1.html")
doc = Nokogiri::HTML(f)
f.close
I would make use of Mechanize and save the file locally, then parse it with Nokogiri like so:
# Save the file
agent = Mechanize.new
agent.pluggable_parser.default = Mechanize::Download
current_url = 'http://www.example.com'
file = agent.get(current_url)
# page.html is a placeholder file name
path = "#{Rails.root}/tmp/page.html"
file.save!(path)
# Read the file
page = Nokogiri::HTML(File.read(path))
Hope that helps!

How to parse an XML file remotely from FTP with the Nokogiri gem, without downloading

require 'net/ftp'
require 'nokogiri'
server = "xxxxxx"
user = "xxxxx"
password = "xxxxx"
ftp = Net::FTP.new(server, user, password)
files = ftp.nlst('File*.xml')
files.each do |file|
ftp.getbinaryfile(file)
doc = Nokogiri::XML(open(file))
# some operations with doc
end
With the code above I'm able to parse/read the XML file, because it downloads the file first.
But how can I parse a remote XML file without downloading it?
The code above is part of a rake task that loads the Rails environment when run.
UPDATE:
I'm not going to create any file. I will import info into the mongodb using mongoid.
If you simply want to avoid using a temporary local file, it is possible to fetch the file contents directly as a String and process them in memory, by supplying nil as the local file name:
files.each do |file|
xml_string = ftp.getbinaryfile( file, nil )
doc = Nokogiri::XML( xml_string )
# some operations with doc
end
This still does an FTP fetch of the contents, and XML parsing happens at the client.
It is not really possible to avoid fetching the data in some form or other, and if FTP is the only protocol you have available, then that means copying data over the network using an FTP get. However, it is possible, but far more complicated, to add capabilities to your FTP (or other net-based) server, and return the data in some other form. That could include Nokogiri parsing done remotely on the server, but you'd still need to serialise the end result, fetch it and deserialise it.
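The in-memory fetch can also be written with Net::FTP#retrbinary, which yields the file in chunks, so nothing touches the local disk. A minimal sketch; fetch_to_string is a hypothetical helper, and the FakeFTP class exists only so that the example runs without a server:

```ruby
# Hypothetical helper: collects a remote file into a String using
# retrbinary(cmd, blocksize) { |chunk| ... }, the chunked download call
# that Net::FTP provides. Works with any object that implements it.
def fetch_to_string(ftp, remote_name, blocksize = 4096)
  buffer = +''
  ftp.retrbinary("RETR #{remote_name}", blocksize) { |chunk| buffer << chunk }
  buffer
end

# Stand-in for a connected Net::FTP object (an assumption for illustration).
class FakeFTP
  def initialize(files)
    @files = files
  end

  # Yields the stored content in pieces of at most blocksize characters.
  def retrbinary(cmd, blocksize)
    data = @files.fetch(cmd.sub(/\ARETR /, ''))
    data.scan(/.{1,#{blocksize}}/m) { |chunk| yield chunk }
  end
end

ftp = FakeFTP.new('File1.xml' => '<root><item>1</item></root>')
xml_string = fetch_to_string(ftp, 'File1.xml', 8)
puts xml_string
```

With a real connection you would pass the Net::FTP instance in place of FakeFTP and hand the resulting string to Nokogiri::XML.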

How to FTP in Ruby without first saving the text file

Since Heroku does not allow saving dynamic files to disk, I've run into a dilemma that I am hoping you can help me overcome. I have a text file that I can create in RAM. The problem is that I cannot find a gem or function that would allow me to stream the file to another FTP server. The Net/FTP gem I am using requires that I save the file to disk first. Any suggestions?
ftp = Net::FTP.new(domain)
ftp.passive = true
ftp.login(username, password)
ftp.chdir(path_on_server)
ftp.puttextfile(path_to_web_file)
ftp.close
The ftp.puttextfile function is what is requiring a physical file to exist.
StringIO.new provides an object that acts like an opened file. It's easy to create a method like puttextfile that uses a StringIO object instead of a file.
require 'net/ftp'
require 'stringio'
class Net::FTP
def puttextcontent(content, remotefile, &block)
f = StringIO.new(content)
begin
storlines("STOR " + remotefile, f, &block)
ensure
f.close
end
end
end
file_content = <<filecontent
<html>
<head><title>Hello!</title></head>
<body>Hello.</body>
</html>
filecontent
ftp = Net::FTP.new(domain)
ftp.passive = true
ftp.login(username, password)
ftp.chdir(path_on_server)
ftp.puttextcontent(file_content, path_to_web_file)
ftp.close
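The same idea can be sketched without reopening Net::FTP, since storlines is a public method that reads its input line by line with gets. In the sketch below, upload_text is a hypothetical helper and RecordingFTP is a stand-in that only records what it receives, so the example runs without a server:

```ruby
require 'stringio'

# Hypothetical helper: sends in-memory text through storlines by wrapping
# it in a StringIO, which responds to gets just like an opened file.
def upload_text(ftp, content, remote_name)
  ftp.storlines("STOR #{remote_name}", StringIO.new(content))
end

# Stand-in for a connected Net::FTP object (an assumption for illustration).
# It consumes the input with gets, the way storlines reads a file.
class RecordingFTP
  attr_reader :commands, :lines

  def initialize
    @commands = []
    @lines = []
  end

  def storlines(cmd, file)
    @commands << cmd
    while (line = file.gets)
      @lines << line
    end
  end
end

recorder = RecordingFTP.new
upload_text(recorder, "<html>\n<body>Hello.</body>\n</html>\n", 'index.html')
puts recorder.commands.first
puts recorder.lines.length
```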
David at Heroku gave a prompt response to a support ticket I entered there.
You can use APP_ROOT/tmp for temporary file output. The existence of files created in this dir is not guaranteed outside the life of a single request, but it should work for your purposes.
Hope this helps,
David

How to read image metadata from a URL?

I want to read metadata of already uploaded JPEGs on S3. Is there a way to do that in Ruby without downloading the file locally?
The problem I am facing is that Image(Mini)Magick doesn't take a URL as a source (or at least I didn't find the right command).
Update:
This is working:
>> image = MiniMagick::Image.from_file -path_to_file-
>> image["EXIF:datetime"]
=> "2010:07:19 23:07:54"
But I didn't find a good substitute for "from_file" that works with URLs, so something like:
>> image = MiniMagick::Image.from_url http://image_adress.com/image.jpg
doesn't work.
What about using open-uri?
require 'open-uri'
image = nil
URI.open('http://image_adress.com/image.jpg') do |file|
# Image.read replaces the older from_blob in current MiniMagick releases
image = MiniMagick::Image.read(file.read) rescue nil
end
image["EXIF:datetime"] if image
