How do I get the file metadata from an AWS S3 file with Ruby? - ruby

I'm trying to simply retrieve the metadata from a file uploaded to S3. Specifically, I need the content type.
I know the file has metadata, because I can see it in the S3 console. But I'm unable to get it programmatically. I must have some syntax error.
See the code below: file.key returns the file name correctly, but file.metadata doesn't seem to return a hash with data.
s3 = Aws::S3::Resource.new(region: ENV['REGION'])
file = s3.bucket(sourceS3Bucket).object(sourceS3Key)
puts file.key # this works!
puts file.metadata # this returns an empty hash {}
puts file.metadata['content-type'] # empty

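As Aleksei Matiushkin suggested, file.data[:content_type] will give the file type. The metadata hash only holds user-defined metadata (the x-amz-meta-* headers), which is why it comes back empty here; system-defined values such as the content type are exposed separately, for example as file.content_type.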
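A minimal sketch of that distinction, assuming the object responds to key, content_type and metadata the way Aws::S3::Object does. The helper name describe_s3_object and the Struct stand-in are made up for illustration so the snippet runs without AWS credentials.

```ruby
# Collect the fields the question is after from an S3 object.
# `object` is anything that responds to #key, #content_type and #metadata.
def describe_s3_object(object)
  {
    key: object.key,
    content_type: object.content_type, # system-defined (Content-Type header)
    user_metadata: object.metadata     # only user-defined x-amz-meta-* entries
  }
end

# Hypothetical stand-in so the sketch runs offline.
fake_object = Struct.new(:key, :content_type, :metadata)
p describe_s3_object(fake_object.new('report.csv', 'text/csv', {}))

# With the real SDK (needs the aws-sdk-s3 gem and credentials):
#   s3   = Aws::S3::Resource.new(region: ENV['REGION'])
#   file = s3.bucket(sourceS3Bucket).object(sourceS3Key)
#   describe_s3_object(file)
```

An empty user_metadata hash next to a populated content_type is the expected result for an object uploaded without custom metadata.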

Related

After loading a url by open-uri, how to handle the generated Tempfile object?

I wanna figure out how to download images from internet then store them locally.
Here's what I did:
require 'open-uri' # => true
file = open "https://s3-ap-southeast-1.amazonaws.com/xxx/Snip20180323_40.png"
# => #<Tempfile:/var/folders/k0/.../T/open-uri20180524-60756-1r44uix>
Then I was confused about this Tempfile object. I found I can get the original url by:
file.base_uri
# => #<URI::HTTPS https://s3-ap-southeast-1.amazonaws.com/xxx/Snip20180323_40.png>
But I failed in finding a method that can directly get the original file name Snip20180323_40.png.
Is there a method that can directly get the original file name from a Tempfile object?
What purpose are Tempfile objects mainly used for? Are they different from normal file objects such as: file_object = File.open('how_old.rb') # => #<File:how_old.rb>?
Can I convert a Tempfile object to a File object?
How can I write this Tempfile as the same name file in a local directory, for example /users/user_name/images/Snip20180323_40.png?
The original filename is only really available in the URL. Just take uri.path.split("/").last, where uri is the URI you opened (file.base_uri in your example).
Tempfiles are effectively Files; the difference is that when a Tempfile is garbage collected, the underlying file is deleted.
You can copy the underlying file with FileUtils.copy, or you can open the Tempfile, read it, and write it into a new File handle of your choosing.
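The FileUtils route could look roughly like this. It is only a sketch: persist_tempfile is a made-up helper name, and the Tempfile is created locally with placeholder bytes instead of coming from open-uri.

```ruby
require 'tempfile'
require 'fileutils'
require 'tmpdir'

# Copy the file behind a Tempfile to a permanent path and return that path.
def persist_tempfile(tempfile, destination)
  tempfile.flush # make sure buffered writes have reached the disk
  FileUtils.cp(tempfile.path, destination)
  destination
end

Dir.mktmpdir do |dir|
  tmp = Tempfile.new('download')
  tmp.binmode
  tmp.write('placeholder image bytes')

  saved = persist_tempfile(tmp, File.join(dir, 'Snip20180323_40.png'))
  tmp.close! # removes the temporary file right away

  puts File.exist?(saved) # the copy outlives the Tempfile
end
```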
Something like this should work:
require 'open-uri'

def download_url_to(url, base_path)
  uri = URI(url)
  filename = uri.path.split("/").last
  new_file = File.join(base_path, filename)
  response = uri.open
  # write (not puts) so no trailing newline is appended to binary data
  File.open(new_file, "wb") { |fp| fp.write response.read }
  new_file
end
It's worth noting that if the file is smaller than 10 KB, you'll get a StringIO object rather than a Tempfile object. The above solution handles both cases. It also just accepts whatever the last segment of the URL path is, so it's up to you to sanitize it, as well as the contents of the file itself; in most cases you don't want to let clients download arbitrary files to your system. For example, you may want to be extra sure that the filename doesn't include path components like ..\..\.. which may be used to write files to unintended locations.
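One possible way to do that sanitizing, as a sketch rather than a complete defence: take only the basename of the URL path and refuse names that are empty or purely navigational. The helper name safe_filename_from_url is hypothetical, and percent-encoded sequences are deliberately left undecoded so they stay literal characters in the name.

```ruby
require 'uri'

# Return the last path segment of the URL as a plain filename.
# Raises ArgumentError when nothing usable is left.
def safe_filename_from_url(url)
  name = File.basename(URI(url).path)
  if name.empty? || ['.', '..', '/'].include?(name)
    raise ArgumentError, "no usable filename in #{url}"
  end
  name
end

puts safe_filename_from_url('https://s3-ap-southeast-1.amazonaws.com/xxx/Snip20180323_40.png')
```

The result can then be passed to File.join(base_path, ...) in place of the raw last path segment.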

Append new lines to a csv from json.parse

more sysadmin (chef) than ruby guy, so this may be a five minute fix.
I am working on a task where I write a Ruby script that pulls JSON data from multiple files, parses it, and writes the desired fields to a single .csv file. Basically pulling metadata about AWS accounts and putting it in an accountant-friendly format.
Got a lot of help from another Stack Overflow question on how to solve the problem for a single file, json.parse help.
My issue is that I am trying to pull the same data from multiple JSON files in an array. I can get it to loop through each file with the code below.
require 'csv'
require "json"
delim_file = CSV.open("delimited_test.csv", "w")
aws_account_list = %w(example example2)
aws_account_list.each do |account|
  json_file = File.read(account.to_s + "_aws.json")
  parsed_json = JSON.parse(json_file)
  delim_file = CSV.open("delimited_test.csv", "w")
  # This next line could be a problem if you ran this code multiple times
  delim_file << ["EbsOptimized", "PrivateDnsName", "KeyName", "AvailabilityZone", "OwnerId"]
  parsed_json['Reservations'].each do |inner_json|
    inner_json['Instances'].each do |instance_json|
      delim_file << [[instance_json['EbsOptimized'].to_s, instance_json['PrivateDnsName'], instance_json['KeyName'], instance_json['Placement']['AvailabilityZone'], inner_json['OwnerId']],[]]
    end
    delim_file.close
  end
end
However, whenever I do it, it overwrites every time to the same single row in the .csv file. I have tried adding a \n string to the end of the array, converting the array to a string with hashes and doing a \n, but all that does is add a line to the same row that it overwrites.
How would I go about writing it so that it reads each JSON file, then appends each file's metadata to a new row? This looks like a simple case of writing the right loop, but I can't figure it out.
You declared your file like this:
delim_file = CSV.open("delimited_test.csv", "w")
To fix your issue, all you have to do is change "w" to "a":
delim_file = CSV.open("delimited_test.csv", "a")
See the docs for IO.new for a description of the available file modes. In short, w creates an empty file at that filename, truncating any existing one, and writes to that. a creates the file only if it doesn't exist, and appends otherwise. Because you currently open it with w inside the loop, the file is overwritten on every iteration (and on every run of the script). With a, each write is appended to what's already there.
You need to open the file in append mode:
delim_file = CSV.open("delimited_test.csv", "a")
'a' Write-only; starts at the end of the file if it exists, otherwise creates a new file for writing.
'a+' Read-write; starts at the end of the file if it exists, otherwise creates a new file for reading and writing.
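An alternative to append mode, sketched here under the assumption that all accounts should end up in one freshly written report, is to open the CSV once, write the header once, and loop over the JSON files inside that single block. The helper names (header_row, instance_rows, write_report) and the sample data are invented for illustration; the field names come from the question.

```ruby
require 'csv'
require 'json'
require 'tmpdir'

def header_row
  %w[EbsOptimized PrivateDnsName KeyName AvailabilityZone OwnerId]
end

# Return one CSV row (an Array) per instance in a parsed JSON document.
def instance_rows(parsed_json)
  parsed_json.fetch('Reservations', []).flat_map do |reservation|
    reservation.fetch('Instances', []).map do |instance|
      [
        instance['EbsOptimized'].to_s,
        instance['PrivateDnsName'],
        instance['KeyName'],
        instance.dig('Placement', 'AvailabilityZone'),
        reservation['OwnerId']
      ]
    end
  end
end

# Write the header once, then one row per instance across all JSON files.
def write_report(csv_path, json_paths)
  CSV.open(csv_path, 'w') do |csv|
    csv << header_row
    json_paths.each do |path|
      instance_rows(JSON.parse(File.read(path))).each { |row| csv << row }
    end
  end
end

# Demo with two made-up account files in a temporary directory.
Dir.mktmpdir do |dir|
  paths = %w[example example2].map do |account|
    doc = { 'Reservations' => [{
      'OwnerId' => account,
      'Instances' => [{ 'EbsOptimized' => false,
                        'PrivateDnsName' => "ip-#{account}.internal",
                        'KeyName' => 'demo-key',
                        'Placement' => { 'AvailabilityZone' => 'us-east-1a' } }]
    }] }
    File.join(dir, "#{account}_aws.json").tap { |p| File.write(p, JSON.generate(doc)) }
  end
  out = File.join(dir, 'delimited_test.csv')
  write_report(out, paths)
  puts File.read(out)
end
```

Opening the file once in w mode also avoids a side effect of a: rerunning the script does not keep stacking duplicate headers and rows onto an old report.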

How to read file from s3?

I'm trying to read a CSV file directly from s3.
I'm getting the s3 URL but I am not able to open it as it's not in the local system. I don't want to download the file and read it.
Is there any other way to achieve this?
There are a few ways, depending on the gems that you are using. For example, one of the approaches from the official documentation:
s3 = Aws::S3::Client.new
resp = s3.get_object(bucket:'bucket-name', key:'object-key')
resp.body
#=> #<StringIO ...>
resp.body.read
#=> '...'
Or if you are using CarrierWave/Fog:
obj = YourModel.first
content = obj.attachment.read
You can open the file from its URL directly (on Ruby 3.0 and later, use URI.open, since Kernel#open no longer opens URLs):
require 'open-uri'
csv = URI.open('http://server.com/path-to-your-file.csv').read
I think S3 doesn't provide any way of reading the file without downloading it.
What you can do is save it in a tempfile:
@temp_file = Tempfile.open("your_csv.csv")
@temp_file.close
`s3cmd get s3://#{@your_path} #{@temp_file.path}`
For further information: http://www.ruby-doc.org/stdlib-1.9.3/libdoc/tempfile/rdoc/Tempfile.html
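Since the question is about a CSV file, the in-memory body from the get_object approach can be handed straight to the CSV library without a file on disk. A rough sketch, where csv_rows_from is a made-up helper and a StringIO with sample data stands in for resp.body so it runs without AWS access:

```ruby
require 'csv'
require 'stringio'

# Parse CSV text from any IO-like object that responds to #read
# and return the rows as an array of hashes keyed by header.
def csv_rows_from(io)
  CSV.parse(io.read, headers: true).map(&:to_h)
end

# Hypothetical stand-in for resp.body.
body = StringIO.new("id,name\n1,alpha\n2,beta\n")
csv_rows_from(body).each { |row| puts row.inspect }

# With the real client (needs the aws-sdk-s3 gem and credentials):
#   resp = Aws::S3::Client.new.get_object(bucket: 'bucket-name', key: 'object-key')
#   csv_rows_from(resp.body)
```

The object is still transferred over the network, but nothing has to be written to the local filesystem.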

Ruby Tempfile download missing headers

I'm working on a file upload feature in a Sinatra app. It's small and simple and was done just using Ruby's File class and saving a temporary file to a directory by hand. I'm trying to implement the same functionality using Tempfile.
I've got the upload working, but now when I click a link to download the file, the filename is just a number. It downloads and reads the file correctly but it doesn't retain the filename or type of file. Before I made my changes the file would open up in the browser by redirecting to the newly uploaded file's endpoint on the server. I'd like to get that functionality back.
My code is as follows:
post "/positions/:id/attachment" do
  html_settings
  new_data = post_data
  if params[:file_attachment][:file].present?
    file = params[:file_attachment][:file]
    # file looks like this when uploaded:
    #{:filename=>"Screen Shot 2013-11-26 at 4.36.13 PM.png", :type=>"image/png", :name=>"file_attachment[file]", :tempfile=>#<File:/var/folders/85/0kp_g81s1ws16zths3s8d9p80000gn/T/RackMultipart20131127-2757-1kdficq>, :head=>"Content-Disposition: form-data; name=\"file_attachment[file]\"; filename=\"Screen Shot 2013-11-26 at 4.36.13 PM.png\"\r\nContent-Type: image/png\r\n"}
    # Tempfile object
    temp_file = Tempfile.new(file[:filename], 'uploads') # Create tempfile, save to uploads folder
    begin
      write_tempfile(file, temp_file)
      new_data['file_attachment']['file'] = temp_file
      new_data['multipart'] = true
      # At this point, the new_data hash looks the same except for a small difference in the path name
      # Before tempfile - {"file_attachment"=> {"display_name"=>"test","file"=>#<File:uploads/Screen Shot 2013-11-26 at 11.35.36 AM.png>}, "id"=>"1"}
      # With tempfile - {"file_attachment"=> {"display_name"=>"test", "file"=>#<File:/path/to/uploads/Screen Shot 2013-11-26 at 4.36.13 PM.png20131127-2757-eb6w6r>}, "id"=>"1", "multipart"=>true}
      response = api_post(attachment_upload_endpoint(params[:id]), new_data)
    ensure
      delete_tempfile(temp_file)
      response
    end
  end
end
Helper methods:
def write_tempfile(file, temp)
  file[:tempfile].rewind # Rewind before reading
  temp.write(file[:tempfile].read) # Write to the temp file
  temp.rewind # Rewind in order to be read
end

def delete_tempfile(temp_file)
  # close! calls #close AND #unlink. #unlink deletes the file
  temp_file.close!
end
After the file is uploaded there is a link to https://myserver.com/positions/1/file_attachments/46
Does anyone understand why now, when I click on that link, it downloads the file with the filename 46 and not in the browser anymore?
I also get this notification in the console:
Resource interpreted as Document but transferred with MIME type binary/octet-stream
Thanks.
I was able to get it working with some extra parsing for the extension:
ext = file[:filename].split('.').last
temp_file = Tempfile.new([file[:filename], ".#{ext}"])
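A slightly tidier variant of the same idea, offered as a sketch: split the upload name with File.extname and File.basename, then pass the two parts as the [prefix, suffix] array so Tempfile puts its random part between them. The helper name tempfile_for_upload is made up, and the directory defaults to the system temp dir rather than the uploads folder from the question.

```ruby
require 'tempfile'
require 'tmpdir'

# Create a Tempfile whose generated name keeps the upload's base name
# and ends with its original extension.
def tempfile_for_upload(original_filename, dir = Dir.tmpdir)
  ext  = File.extname(original_filename)       # e.g. ".png"
  base = File.basename(original_filename, ext) # name without the extension
  Tempfile.new([base, ext], dir)
end

temp = tempfile_for_upload('Screen Shot 2013-11-26 at 4.36.13 PM.png')
puts File.extname(temp.path) # the extension survives the random part
temp.close!
```

With the extension at the end of the path, whatever infers the MIME type from the file name has something to work with again.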

Writing file to bucket fails on Elastic Beanstalk application

I am writing an application in Ruby on Elastic Beanstalk in which I download a file from a remote server and write it to an object in a bucket.
require 'open-uri'
...
s3 = AWS::S3.new
bucket = s3.buckets['mybucket']
f = open(params[:url]) #using open-uri
obj = bucket.objects[params[:key]]
obj.write[f] #<< fails here
The last line, however, fails with the following exception in the log:
:data must be provided as a String, Pathname, File, or an object that responds to #read and #eof?
I know, however, from executing the same #open on my machine, that f is a StringIO object, which does have #read and #eof?.
I was getting the same error during a zip file upload to S3, and finally this worked for me:
zip_data = File.read(zip_file_path)
That is, zip_data holds the contents of the zip file at that path in your tmp directory.
Hope this works for you too.
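One more thing worth checking in the question's code, offered as a likely cause and not a confirmed diagnosis: obj.write[f] uses square brackets, which Ruby parses as calling write with no arguments and then indexing the result with f. That would explain a complaint about missing data even though f itself is a perfectly good StringIO. The stand-in object below is hypothetical and only imitates a write method that insists on an argument.

```ruby
require 'stringio'

# Hypothetical stand-in for an S3 object whose #write requires data.
def fake_s3_object
  obj = Object.new
  def obj.write(data = nil)
    raise ArgumentError, ':data must be provided' if data.nil?
    data.read
  end
  obj
end

f = StringIO.new('payload')
obj = fake_s3_object

begin
  obj.write[f] # parsed as obj.write()[f]: write receives no arguments
rescue ArgumentError => e
  puts "write[f] failed: #{e.message}"
end

puts obj.write(f) # parentheses pass f as the argument
```

If that is what happened, changing the last line of the question's code to obj.write(f) would be the fix.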
