How do I create a presigned link to a particular version of an object using the Ruby AWS SDK v2?

I am using the Ruby AWS SDK (v2) to upload log files to a versioned S3 bucket. The log files are not public, but I would like to generate a presigned link to the log to make it available for a limited time via a chat integration. I want to link to a particular version, which this answer says is possible via the S3 console.
Documentation on Aws::S3::Presigner shows how to do this for an unversioned object (or the head version of a versioned object) but not for a particular version. The possible parameters to #presigned_url are not well documented, and reading the source it looks like the parameters are just passed to Seahorse::Client::Base#build_request which is not S3-specific.

I think I've finally worked this out, though I'm still not sure I could trace the entire code path. In short: You can pass :version_id in the options parameter to presigned_url.
#
# Uploads a log to S3 at the given key, returning a URL
# to the file that's good for one hour.
#
# @param [String] bucket
# @param [String] key
# @param [String] body contents of the log to be uploaded
# @param [Hash] options
# @return [String] presigned URL of the uploaded log, valid for one hour
# @raise [Exception] if the S3 upload fails
#
def upload_log(bucket, key, body, options = {})
  # Upload the log
  result = Aws::S3::Client.new.put_object(
    options.merge(
      bucket: bucket,
      key: key,
      body: body
    )
  )

  # Get a presigned URL that expires in one hour
  options = { bucket: bucket, key: key, expires_in: 3600 }
  options[:version_id] = result[:version_id] unless result[:version_id].nil?
  Aws::S3::Presigner.new.presigned_url(:get_object, options)
end
And here is everything I could trace about why this works:
presigned_url passes its params argument through to #client.build_request (presigner.rb#L48).
build_request eventually pushes those parameters onto request.context.params of the request it returns (documented in client/base_spec.rb#L91).
From here my understanding is fuzzy; I expect that something like Aws::Rest::Request::Builder passes all the params along to create the Endpoint and the particular rules for this operation (which I'm unable to find) allow version_id to be added to the querystring.
In any case, it's working. Thanks for the pointer Michael!
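For reference, a minimal sketch of just the presigning step, assuming credentials and region are configured in the environment; bucket, key, and version ID below are placeholders. Per the trace above, the version_id ends up as a versionId query parameter alongside the usual signature parameters:
require 'aws-sdk' # aws-sdk v2

url = Aws::S3::Presigner.new.presigned_url(
  :get_object,
  bucket:     'my-log-bucket',          # placeholder bucket name
  key:        'logs/app.log',           # placeholder key
  version_id: 'PLACEHOLDER_VERSION_ID', # placeholder version ID
  expires_in: 3600                      # one hour
)
# => "https://my-log-bucket.s3.amazonaws.com/logs/app.log?versionId=...&X-Amz-Algorithm=..."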

Related

ruby-fog: Delete an item from the object storage in less than 3 requests

I started using fog storage for a project. I do the simplest actions: upload an object, get the object, delete the object. My code looks something like this:
storage = get_storage(...) # S3 / OpenStack / ...
dir = storage.directories.get(bucket) # 1st request
if !dir.nil?
  dir.files.create(key: key, body: body) # 2nd request
  # or:
  dir.files.get(key) # 2nd request
  # or:
  file = dir.files.get(key) # 2nd request
  if !file.nil?
    file.destroy # 3rd request
  end
end
In all cases there's a first step to get the directory, which makes a request to the storage engine (it returns nil if the directory doesn't exist).
Then there's another step to do whatever I'd like to do (in the case of delete there's even a third step in the middle).
However, if I look at, say, the Amazon S3 API, it's clear that deleting an object doesn't need 3 requests to Amazon.
Is there a way to use fog but make it do less requests to the storage provider?
I think this was already answered on the mailing list, but if you use #new on directories/files it will give you just a local reference (vs #get which does a lookup). That should get you what you want, though it may raise errors if the file or directory does not exist.
Something like this:
storage = get_storage(...) # S3 / OpenStack / ...
dir = storage.directories.new(key: bucket) # local reference, no request
dir.files.create(key: key, body: body) # 1st request
# or:
dir.files.get(key) # 1st request
# or:
file = dir.files.new(key: key) # local reference, never nil
file.destroy # 1st request
Working in this way should allow any of the 3 modalities to work in a single request, but may result in errors if the bucket does not exist (trying to add a file to a non-existent bucket is an error). So it is more efficient, but needs different error handling. Conversely, you can make the extra requests if you need to be sure.
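A sketch of what that different error handling might look like with the AWS provider, reusing the bucket/key/body variables from the snippets above, and assuming a missing bucket surfaces as Excon::Errors::NotFound (the exact error class can vary by provider and fog version):
require 'fog'

storage = Fog::Storage.new(
  provider:              'AWS',
  aws_access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
  aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
)

dir = storage.directories.new(key: bucket) # local reference, no request

begin
  dir.files.create(key: key, body: body)   # single request
rescue Excon::Errors::NotFound
  # Bucket doesn't exist: create it (extra request) and retry once
  storage.directories.create(key: bucket)
  dir.files.create(key: key, body: body)
end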

aws-sdk ruby returns nil when deleting object

I'm trying to delete an object on S3 using the Ruby aws-sdk (version 2). It works fine, but it returns this:
#<struct Aws::S3::Types::DeleteObjectOutput delete_marker=nil, version_id=nil, request_charged=nil>
which doesn't make sense, because the documentation says the response should be of the form:
resp.delete_marker #=> true/false
resp.version_id #=> String
resp.request_charged #=> String, one of "requester"
Why am I getting nil? I want to know whether the object was deleted or not. I get that response both when I succeed in deleting the object and when I don't.
This is the code I'm using to delete the object:
creds = Aws::Credentials.new(user_access_key,
                             user_secret_key,
                             session_token)
s3 = Aws::S3::Client.new(region: 'eu-west-1',
                         credentials: creds)
key = "myKey.csv"
r = s3.delete_object(bucket: "myBucket",
                     key: key)
Your delete_object call was successful. The Amazon S3 API only returns those values under certain circumstances. In this case, your object was not in a versioned bucket (so there is no version id or delete marker boolean), and the bucket is not configured for Requester Pays.
As a general rule, if the SDK does not raise an error from the response, it is successful. In this case, the API reference documentation may be confusing as it does not clearly indicate that these values may be nil.
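If you want a positive confirmation rather than relying on "no error raised", here is a sketch of two options, assuming the same s3 client and key as above (the second costs an extra request):
# 1) Inspect the HTTP status on the response context; a successful DELETE
#    comes back as 204 whether or not the key ever existed.
r = s3.delete_object(bucket: "myBucket", key: key)
puts r.context.http_response.status_code # => 204

# 2) Follow up with head_object and treat NotFound as "the object is gone".
begin
  s3.head_object(bucket: "myBucket", key: key)
  puts "object still exists"
rescue Aws::S3::Errors::NotFound
  puts "object deleted (or never existed)"
end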

Avoid repeated calls to an API in Jekyll Ruby plugin

I have written a Jekyll plugin to display the number of pageviews on a page by calling the Google Analytics API using the garb gem. The only trouble with my approach is that it makes a call to the API for each page, slowing down build time and also potentially hitting the user call limits on the API.
It would be possible to return all the data in a single call and store it locally, and then look up the pageview count from each page, but my Jekyll/Ruby-fu isn't up to scratch. I do not know how to write the plugin to run once to get all the data and store it locally where my current function could then access it, rather than calling the API page by page.
Basically my code is written as a liquid block that can be put into my page layout:
require 'yaml'
require 'garb'
require 'chronic'

class GoogleAnalytics < Liquid::Block
  def initialize(tag_name, markup, tokens)
    super # options that appear in the block (between tag and endtag)
    @options = markup # optional options passed in by the opening tag
  end

  def render(context)
    path = super
    # Read in credentials and authenticate
    cred = YAML.load_file("/home/cboettig/.garb_auth.yaml")
    Garb::Session.api_key = cred[:api_key]
    token = Garb::Session.login(cred[:username], cred[:password])
    profile = Garb::Management::Profile.all.detect { |p| p.web_property_id == cred[:ua] }
    # Place the query; customize to modify results
    data = Exits.results(profile,
                         :filters => { :page_path.eql => path },
                         :start_date => Chronic.parse("2011-01-01"))
    data.first.pageviews
  end
end
Full version of my plugin is here
How can I move all the calls to the API to some other function, make sure Jekyll runs that once at the start, and then adjust the tag above to read that local data?
EDIT: Looks like this can be done with a Generator that writes the data to a file. See the example on this branch. Now I just need to figure out how to subset the results: https://github.com/Sija/garb/issues/22
To store the data, I had to:
1. Write a Generator class (see Jekyll wiki plugins) to call the API.
2. Convert the data to a hash (for easy lookup by path, see 5):
   result = Hash[data.collect{|row| [row.page_path, [row.exits, row.pageviews]]}]
3. Write the data hash to a JSON file.
4. Read in the data from the file in my existing Liquid block class. Note that the block tag works from the _includes dir, while the generator works from the root directory.
5. Match the page path, easy once the data is converted to a hash:
   result[path][1]
Code for the full plugin, showing how to create the generator, write the files, etc., is here; a condensed sketch follows below.
And thanks to Sija on GitHub for help on this.
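A condensed, untested sketch of that approach: the file name pageview_data.json and the 'analytics' tag name are made up here, and the garb query mirrors the question's code.
require 'json'
require 'yaml'
require 'garb'
require 'chronic'

# Runs once per build: fetch everything from the API and write it to disk.
class AnalyticsGenerator < Jekyll::Generator
  def generate(site)
    cred = YAML.load_file("/home/cboettig/.garb_auth.yaml")
    Garb::Session.api_key = cred[:api_key]
    Garb::Session.login(cred[:username], cred[:password])
    profile = Garb::Management::Profile.all.detect { |p| p.web_property_id == cred[:ua] }

    # One API call for the whole site, instead of one call per page
    data = Exits.results(profile, :start_date => Chronic.parse("2011-01-01"))
    result = Hash[data.collect { |row| [row.page_path, [row.exits, row.pageviews]] }]

    # Path is relative to the root dir, where the generator runs
    File.open("_includes/pageview_data.json", "w") { |f| f.write(result.to_json) }
  end
end

# Runs per page: just a local file read, no API call.
class GoogleAnalytics < Liquid::Block
  def render(context)
    path = super
    # Adjust the relative path if your block resolves paths from _includes
    data = JSON.parse(File.read("_includes/pageview_data.json"))
    data[path] ? data[path][1] : 0 # pageviews, per step 5 above
  end
end

Liquid::Template.register_tag('analytics', GoogleAnalytics)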

Updating content-type after file upload on Amazon S3 with Amazon-SDK Ruby gem

I'm running a script that updates a metadata field on some of my S3 objects after they have already been uploaded to the S3 bucket. On initialization, I am setting the content-type by checking the file name.
def save_to_amazon(file, s3_object, file_name, meta_path)
  puts "uploaded #{file} to Amazon S3"
  content_type = set_content_type(file_name)
  s3_object.write(file.get_input_stream.read, :metadata => { :folders => meta_path }, :content_type => content_type)
end
At this point, the S3 content-type works fine for these objects. The problem arises when I update the metadata later on. I run something like this:
s3_object.metadata['folders'] = "some string"
At this point, I get an empty string back when I call s3_object.content_type after updating the metadata.
There is no s3_object.content_type= setter available.
As far as I can tell from reading the RDoc, there isn't a way to assign the content type after uploading the S3 file. I have tried using the metadata method like
s3_object.metadata['content_type'] = "some string"
s3_object.metadata['content-type'] = "some string"
Both of these appear to assign a new custom metadata attribute instead of updating the object's mime type.
Is there a way to set this, or do I need to completely re-upload the file again?
To elaborate on tkotisis's response, here is what I did to update the content type using copy_to. You can use s3_object.head[:metadata] to pull out the existing metadata and copy it over, as referenced here.
amazon_bucket.objects.each do |ob|
  metadata = ob.head[:metadata]
  content_type = "foo/bar"
  ob.copy_to(ob.key, :metadata => metadata, :content_type => content_type)
end
EDIT
amazon_bucket.objects.each do |ob|
  metadata = ob.metadata
  content_type = "foo/bar"
  ob.copy_to(ob.key, :metadata => { :foo => metadata[:foo] }, :content_type => content_type)
end
Your example code only modifies your in-memory object.
To modify the metadata of the actual S3 object, issue a copy request whose destination key is the key of your current object.
EDIT
According to the documentation
Using the copy operation, you can rename objects by copying them and deleting the original ones.
When copying an object, you might decide to update some of the metadata values. For example, if your source object is configured to use standard storage, you might choose to use reduced redundancy storage for the object copy. You might also decide to alter some of the user-defined metadata values present on the source object. Note that if you choose to update any of the object's user-configurable metadata (system or user-defined) during the copy, then you must explicitly specify all of the user-configurable metadata present on the source object in your request, even if you are only changing one of the metadata values.
I haven't tried it, but using the Ruby SDK this is probably achieved through the
- (S3Object) copy_to(target, options = {})
method.
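Untested, but tying that signature back to the quoted docs: since a metadata-updating copy must re-specify all user metadata, you would read it off the object first and pass it back along with the new content type (the type here is a placeholder):
existing_metadata = s3_object.head[:metadata]    # user metadata to re-send
s3_object.copy_to(s3_object.key,                 # copy the object onto itself
                  :metadata     => existing_metadata,
                  :content_type => 'text/plain') # placeholder new type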
I'm using the gem "aws-sdk", "~> 2" (2.2.3).
Assume that you have an existing file whose content type was not set (the content type will default to "binary/octet-stream").
How do you check a file's content type?
You can use RestClient as follows (object below means an Aws::S3::Object):
require 'aws-sdk'
require 'rest-client'

bucket = Aws::S3::Bucket.new(bucket_name)
object = bucket.object(key)

RestClient.head(object.presigned_url(:head)) do |resp|
  puts resp.headers
  puts resp.headers[:content_type]
end
How do you change a file's content type?
In my case, I wanted to change the content type to 'image/jpeg' on an object that was currently 'binary/octet-stream', so you can do:
object.copy_from(
  object,
  content_type: 'image/jpeg',
  metadata_directive: 'REPLACE'
)
Make sure you set the ACL to :public_read, otherwise your files will be unavailable after copying.
This did the trick for me:
bucket.objects.with_prefix('my_assets').each do |obj|
  metadata = obj.head[:metadata]
  content_type = "application/pdf"
  obj.copy_to(obj.key, :metadata => metadata, :content_type => content_type)
  obj.acl = :public_read
end
Although not Ruby, I found this project, which automatically guesses the MIME type based on the extension and resets it via the same copy method the other answers refer to. It's not terribly quick since it has to copy the blob. If you need to make it happen faster, you could probably divide up the work and copy in parallel via something like IronWorker. I did a similar thing for resetting permissions.
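A rough Ruby equivalent of that idea (the linked project itself isn't Ruby), assuming the mime-types gem for the extension lookup and the same copy-onto-itself trick as the answers above:
require 'mime/types' # mime-types gem, assumed here

bucket.objects.each do |obj|
  guessed = MIME::Types.type_for(obj.key).first
  next if guessed.nil?                            # skip keys with unknown extensions
  obj.copy_to(obj.key,
              :metadata     => obj.head[:metadata], # re-send existing metadata
              :content_type => guessed.content_type)
end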

amazon s3 and carrierwave random image name in bucket does not match in database

I'm using CarrierWave, Rails, and Amazon S3. Every time I save an image, the image shows up in S3 and I can see it in the management console with a name like this:
https://s3.amazonaws.com/bucket-name/
uploads/images/10/888fdcfdd6f0eeea_1351389576.png
But in the model, the name is this:
https://bucket-name.s3.amazonaws.com/
uploads/images/10/b3ca26c2baa3b857_1351389576.png
First off, why is the random name different? I am generating it in the uploader like so:
def filename
  if original_filename
    "#{SecureRandom::hex(8)}_#{Time.now.to_i}#{File.extname(original_filename).downcase}"
  end
end
I know it is not generating a random string on every call, because the wrong URL in the model is consistent and saved. Somewhere in the process a new one must be generated and saved in the model after the image name has been saved and sent to Amazon S3. Strange.
Also, can I have the URL use the s3.amazonaws.com/bucket-name form instead of bucket-name.s3.amazonaws.com without using a regex? Is there an option in CarrierWave or something for that?
CarrierWave by default doesn't store the URL. Instead, it generates it every time you need it.
So, every time filename is called it will return a different value, because of SecureRandom::hex(8) and Time.now.to_i.
Use the created_at column instead, or add a new column for storing the random id or the full filename.
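A sketch of the "add a new column" suggestion, assuming the mounted model has a hypothetical image_token string column to hold the random part; because the token is generated once and stored on the model, filename returns the same value on every call:
# In the uploader
def filename
  if original_filename
    # Generated once per model; persisted if the model is saved afterwards
    model.image_token ||= SecureRandom.hex(8)
    "#{model.image_token}#{File.extname(original_filename).downcase}"
  end
end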
I solved it by saving the filename if it was still the original filename. In the uploader, put:
def filename
  if original_filename && original_filename == @filename
    @filename = "#{any_string}#{File.extname(original_filename).downcase}"
  else
    @filename
  end
end
The issue of the subdomain versus the path is not actually an issue. It works with the subdomain. I.e., https://s3.amazonaws.com/bucket-name/ and https://bucket-name.s3.amazonaws.com/ both work fine.
