How do I update a batch of S3 objects' metadata using Ruby?

I need to change some metadata (Content-Type) on hundreds or thousands of objects on S3. What's a good way to do this with Ruby? As far as I can tell, there is no way to save only metadata with fog.io; the entire object must be re-saved. It seems that using the official SDK would require me to roll a wrapper environment just for this one task.

You're right, the official SDK lets you modify the object metadata without uploading it again. What it does is copy the object, but that happens on the server, so you don't need to download the file and re-upload it.
A wrapper would be easy to implement, something like:
bucket.objects.each do |object|
object.metadata['content-type'] = 'application/json'
end

In the v2 API, you can use Object#copy_from() or Object#copy_to() with the :metadata and :metadata_directive => 'REPLACE' options to update an object's metadata without downloading it from S3.
The code in Joost's gist throws this error:
Aws::S3::Errors::InvalidRequest: This copy request is illegal because
it is trying to copy an object to itself without changing the object's
metadata, storage class, website redirect location or encryption
attributes.
This is because, by default, S3 copies the source object's metadata and ignores any :metadata supplied with the copy operation. We must set the :metadata_directive => 'REPLACE' option if we want to update the metadata in place.
See http://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Object.html#copy_from-instance_method
Here's a full, working code snippet that I recently used to perform metadata update operations:
require 'aws-sdk'
# S3 setup boilerplate
client = Aws::S3::Client.new(
:region => 'us-east-1',
:access_key_id => ENV['AWS_ACCESS_KEY'],
:secret_access_key => ENV['AWS_SECRET_KEY'],
)
s3 = Aws::S3::Resource.new(:client => client)
# Get an object reference
object = s3.bucket('my-bucket-name').object('my-object/key')
# Create our new metadata hash. This can be any hash; in this example we update
# existing metadata with a new key-value pair.
new_metadata = object.metadata.merge('MY_NEW_KEY' => 'MY_NEW_VALUE')
# Use the copy operation to replace our metadata
object.copy_to(object,
:metadata => new_metadata,
# IMPORTANT: normally S3 copies the metadata along with the object.
# we must supply this directive to replace the existing metadata with
# the values we supply
:metadata_directive => "REPLACE",
)
For easy re-use:
def update_metadata(s3_object, new_metadata = {})
# Copy the object onto itself, replacing its metadata with new_metadata
s3_object.copy_to(s3_object,
:metadata => new_metadata,
:metadata_directive => "REPLACE"
)
end
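For the batch case in the question (changing Content-Type on many objects), the same copy can be applied in a loop. This is only a minimal sketch: `replace_content_type` is a hypothetical name, and it assumes each item responds to `key`, `metadata` and `copy_to` the way `Aws::S3::Object` does in the v2 API. It passes the type as the `:content_type` option rather than inside `:metadata`, so that it ends up as the real Content-Type header, and it re-sends the existing user metadata because REPLACE discards anything that is not supplied.

```ruby
# Hypothetical helper: rewrite the Content-Type of each object in place using a
# server-side copy. `objects` is assumed to be an enumerable of Aws::S3::Object
# instances (anything responding to #key, #metadata and #copy_to works).
def replace_content_type(objects, content_type)
  objects.map do |object|
    object.copy_to(object,
      :content_type => content_type,
      # REPLACE drops user metadata that is not re-sent, so pass it along.
      :metadata => object.metadata,
      :metadata_directive => 'REPLACE'
    )
    object.key # return the keys that were processed
  end
end
```

Assuming `s3` is the `Aws::S3::Resource` from the snippet above, a call might look like `replace_content_type(s3.bucket('my-bucket-name').objects(:prefix => 'images/').map(&:object), 'application/json')`. Note that a copy does not carry the ACL over, which is presumably why some of the other answers also pass `:acl`.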

For future readers, here's a complete sample of changing object properties using the Ruby aws-sdk v1 (also see this Gist for an aws-sdk v2 sample):
# Using v1 of Ruby aws-sdk as currently v2 seems not able to do this (broken?).
require 'aws-sdk-v1'
key = YOUR_AWS_KEY
secret = YOUR_AWS_SECRET
region = YOUR_AWS_REGION
AWS.config(access_key_id: key, secret_access_key: secret, region: region)
s3 = AWS::S3.new
bucket = s3.buckets[bucket_name]
bucket.objects.with_prefix('images/').each do |obj|
puts obj.key
# Add metadata: {} to next line for more metadata.
obj.copy_from(obj.key, content_type: obj.content_type, cache_control: 'max-age=1576800000', acl: :public_read)
end

After some searching, this seems to work for me:
obj.copy_to(obj, :metadata_directive=>"REPLACE", :acl=>"public-read",:content_type=>"text/plain")

Changing the content type through the SDK's :metadata option stores it as user metadata with the x-amz-meta- prefix. My solution was to use Ruby + the AWS CLI, which writes directly to Content-Type instead of x-amz-meta-content-type.
ids_to_copy = all_object_ids
ids_to_copy.each do |id|
object_key = "#{id}.pdf"
command = "aws s3 cp s3://{bucket-name}/#{object_key} s3://{bucket-name}/#{object_key} --no-guess-mime-type --content-type='application/pdf' --metadata-directive='REPLACE'"
system(command)
end
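A possible refinement of the loop above: if the command is built as a list of arguments instead of one interpolated string, `system` runs it without a shell, so keys containing spaces or quotes need no escaping. This is just a sketch; `content_type_cp_args` is a hypothetical helper, and the bucket name is a parameter that stands in for the `{bucket-name}` placeholder.

```ruby
# Hypothetical helper: build the argument list for an in-place `aws s3 cp`
# that rewrites Content-Type. Bucket and key are parameters, not real names.
def content_type_cp_args(bucket, key, content_type)
  uri = "s3://#{bucket}/#{key}"
  [
    'aws', 's3', 'cp', uri, uri,
    '--no-guess-mime-type',
    '--content-type', content_type,
    '--metadata-directive', 'REPLACE'
  ]
end

# Passing separate arguments to Kernel#system bypasses the shell:
#   system(*content_type_cp_args('my-bucket', "#{id}.pdf", 'application/pdf'))
```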

This API appears to be available now:
Fog::Storage.new({
:provider => 'AWS',
:aws_access_key_id => 'foo',
:aws_secret_access_key => 'bar',
:endpoint => 'https://s3.amazonaws.com/',
:path_style => true
}).put_object_tagging(
'bucket_name',
's3_key',
{foo: 'bar'}
)

Related

Fog with Carrierwave upload to S3 default upload path invalid

I'm trying to upload to S3 with Carrierwave and Fog-Aws, and I'm having an issue. For some reason, fog is trying to upload to my bucket at
https://{bucket-name}.s3.amazonaws.com
But, when I access a file directly from aws, the url format is like this:
https://s3-{region}.amazonaws.com/{bucket-name}
Whenever I try to use the path that Fog is using, it gives me the following error:
The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.
So my question is, is there a way to
A) Change the endpoint format on S3 to match what Fog is expecting it to be, or
B) Change a setting for Fog to use this different format instead?
For reference:
I'm using Carrierwave version 1.0, fog-aws version 0.11.0
Here's my carrierwave.rb file:
if Rails.env.test? or Rails.env.development?
CarrierWave.configure do |config|
config.storage = :file
config.root = "#{Rails.root}/tmp"
config.cache_dir = "#{Rails.root}/tmp/images"
end
else
CarrierWave.configure do |config|
config.fog_provider = 'fog/aws'
config.fog_credentials = {
:provider => 'AWS',
:aws_access_key_id => ENV['AWS_ACCESS_KEY_ID'],
:aws_secret_access_key => ENV['AWS_SECRET_ACCESS_KEY'],
:region => ENV['AWS_S3_REGION'],
:endpoint => "https://s3-#{ENV['AWS_S3_REGION']}.amazonaws.com/#{ENV['AWS_S3_BUCKET_NAME']}"
}
config.storage = :fog
config.fog_directory = ENV['AWS_S3_BUCKET_NAME']
config.fog_public = false
end
end
I believe :region is the only setting you should need to change in this case. As long as it is set accurately (and isn't the default us-east-1 region) it should change the host as you desire.
That said, I would NOT expect to also need to change endpoint like this. It would be set if you needed to use CNAME stuff, which it doesn't sound like you need. Omitting this, while setting region, should hopefully get you what you are after.
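To make that concrete, here is a rough sketch of the credentials hash with :region set and :endpoint left out. `fog_credentials_from` is a hypothetical helper name; the environment variable names are the ones from the question.

```ruby
# Hypothetical helper: build fog credentials from an ENV-like object, setting
# :region and deliberately leaving :endpoint out.
def fog_credentials_from(env)
  {
    :provider => 'AWS',
    :aws_access_key_id => env['AWS_ACCESS_KEY_ID'],
    :aws_secret_access_key => env['AWS_SECRET_ACCESS_KEY'],
    :region => env['AWS_S3_REGION']
  }
end

# In carrierwave.rb this would replace the literal hash:
#   config.fog_credentials = fog_credentials_from(ENV)
```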

AWS Ruby SDK CORE: Upload files to S3

I want to upload a file (any file, could be a .txt, .mp4, .mp3, .zip, .tar ...etc) to AWS S3 using AWS-SDK-CORE ruby SDK
Here is my code:
require 'aws-sdk-core'
Aws.config = {
:access_key_id => MY_ACCESS_KEY,
:secret_access_key => MY_SECRET_KEY,
:region => 'us-west-2'
}
s3 = Aws::S3.new
resp = s3.put_object(
:bucket => "mybucket",
:key => "myfolder/upload_me.sql",
:body => "./upload_me.sql"
)
The code above runs, but the key myfolder/upload_me.sql that it creates contains only one line, the string ./upload_me.sql, which is wrong: the file upload_me.sql has several lines.
The expected behaviour is to upload the file upload_me.sql to S3 as mybucket/myfolder/upload_me.sql.
Now, if I omit the :body part as below:
s3 = Aws::S3.new
resp = s3.put_object(
:bucket => "mybucket",
:key => "myfolder/upload_me.sql",
)
Then it just creates an empty key called mybucket/myfolder/upload_me.sql, which is not even downloadable (well, even if it gets downloaded, it is useless).
Could you point me where I am going wrong?
Here is ruby-SDK-core documentation for put_object Method: http://docs.aws.amazon.com/sdkforruby/api/Aws/S3/V20060301.html#put_object-instance_method
UPDATE:
If I try to upload the same file using AWS-CLI, it gets uploaded fine. Here is the command:
aws s3api put-object --bucket mybucket --key myfolder/upload_me.sql --body ./upload_me.sql
So, after spending a frustrating Sunday afternoon on this issue, I finally cracked it. What I really needed was :body => IO.read("./upload_me.sql")
So my code looks like below:
s3 = Aws::S3.new
resp = s3.put_object(
:bucket => "mybucket",
:key => "myfolder/upload_me.sql",
:body => IO.read("./upload_me.sql")
)
The body option is the content that will be written to S3. So if you want to send a file to S3, you need to load it yourself, using File.read("upload_me.sql") or something similar.
s3 = Aws::S3.new
resp = s3.put_object(
:bucket => "mybucket",
:key => "myfolder/upload_me.sql",
:body => File.read("./upload_me.sql")
)
According to the documentation, another way to do this (with the v1 aws-sdk) is to call write on the object.
s3 = AWS::S3.new
file_name = "upload_me.sql"
key = File.basename(file_name)
s3.buckets["mybucket"].objects[key].write(:file => file_name)
Another way would be
AWS.config(
:access_key_id => 'MY_ACCESS_KEY',
:secret_access_key => 'MY_SECRET_KEY',
)
#Set the filename
file_name = 'filename.txt'
#Set the bucket name
s3_bucket_name = 'my bucket name'
#If file has to go in some specific folder
bucket_directory = 'key or folder'
begin
s3 = AWS::S3.new
# Use the bare file name as the key when no directory is given;
# otherwise put the file under that directory
if bucket_directory == ''
bucket_obj = s3.buckets[s3_bucket_name].objects[file_name]
else
bucket_obj = s3.buckets[s3_bucket_name].objects["#{bucket_directory}/#{file_name}"]
end
# Upload the file
bucket_obj.write(:file => file_name)
puts "File was successfully uploaded : #{bucket_obj.key}"
rescue StandardError => e
puts "There was an error in uploading file: #{e}"
end
Probably the file wasn't found because the path is relative.
This is strange behavior, where the interface tries to make too many decisions.
I can assure you this works (v3):
client = Aws::S3::Client.new(...)
client.put_object(
body: './existing_file.txt',
bucket: 'kick-it',
key: 'test1.txt'
) # kick-it:/test1.txt contains the same as the contents of existing_file.txt
client.put_object(
body: './non_existing_file.txt',
bucket: 'kick-it',
key: 'test2.txt'
) # kick-it:/test2.txt contains just the string './non_existing_file.txt'
Using body for both cases is a bad decision, if you ask me.
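One way to sidestep the question of how a string body gets interpreted is to hand put_object an open File instead, since :body also accepts an IO object. The following is a sketch under assumptions: `upload_file` is a hypothetical helper, and `client` is assumed to behave like `Aws::S3::Client`.

```ruby
# Hypothetical helper: upload a local file by passing put_object an open File.
# `client` is assumed to behave like Aws::S3::Client.
def upload_file(client, bucket, key, path)
  # File.open raises Errno::ENOENT for a missing path, so a mistyped path
  # cannot silently end up as the object's contents.
  File.open(path, 'rb') do |file|
    client.put_object(:bucket => bucket, :key => key, :body => file)
  end
end
```

Usage would be along the lines of `upload_file(client, 'mybucket', 'myfolder/upload_me.sql', './upload_me.sql')`. Unlike File.read, this also avoids loading the whole file into a string first.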

Accessing a non-standard S3 bucket

Using the aws-s3 gem, I can successfully perform transactions with a standard S3 bucket, but one created in Ireland (s3-eu-west-1) gives the error: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint. After 2 hours of searching, this still means nothing to me. Is there a way to get round this problem?
This simple tutorial works fine for standard s3 bucket but not for Ireland.
This person's experiences seem to suggest it's not possible.
Ok I've just found the answer here.
require 'aws/s3'
AWS::S3::Base.establish_connection!(
:access_key_id => ACCESS_KEY_ID,
:secret_access_key => SECRET_ACCESS_KEY
)
AWS::S3::DEFAULT_HOST.replace('s3-eu-west-1.amazonaws.com') # <= the crucial hacky line
AWS::S3::S3Object.store(
file_name,
temp_file,
bucket,
:content_type => mime_type
)
Edit
A much better option is to use the aws-sdk gem, whose API seems a lot nicer, e.g.:
require 'aws-sdk'
s3 = AWS::S3.new(
:access_key_id => ACCESS_KEY_ID,
:secret_access_key => SECRET_ACCESS_KEY,
:s3_endpoint => 's3-eu-west-1.amazonaws.com'
)
bucket = s3.buckets[bucket_name]
bucket.objects.create(
file_name,
temp_file,
:content_type => mime_type
)

Verifying permissions on S3 object in Ruby

I'm using aws-sdk for Ruby to manage objects on S3. I'm able to grant public read permissions by setting
object.acl = :public_read
Is there a way to determine if there is already public read permission granted to the object before doing that?
The Ruby aws-sdk has poor documentation, and I wasn't able to find this in it either. Below is a function that I created to check whether a file has public read permission. Modify it as per your needs:
def check_if_public_read(object)
object.acl.grants.each do |grant|
begin
if(grant.grantee.uri == "http://acs.amazonaws.com/groups/global/AllUsers")
return true if ([:read, :full_control].include?(grant.permission.name))
end
rescue
end
end
return false
end
where object is any S3 Object:
AWS.config(
:access_key_id => "access key",
:secret_access_key => "secret key"
)
s3 = AWS::S3.new
file = s3.buckets["my_bucket"].objects["path/to/file.png"]
check_if_public_read(file) # => true
Please note that I have figured this out looking at the objects and aws-sdk source code, and the uri parameter may change over time. This works now, and for aws-sdk gem version 1.3.5.
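For the newer SDK (v2 and later), a similar check might look like the sketch below. It rests on assumptions: `public_read?` is a hypothetical helper, and `grants` is assumed to look like what `Aws::S3::ObjectAcl#grants` returns there, where each grant has a `permission` string and a `grantee` with a `uri`. The group URI is the same one used in the function above.

```ruby
ALL_USERS_URI = 'http://acs.amazonaws.com/groups/global/AllUsers'

# Hypothetical helper: true if any grant gives the AllUsers group READ or
# FULL_CONTROL. Each grant is assumed to respond to #permission (a String)
# and #grantee (which responds to #uri; nil for non-group grantees).
def public_read?(grants)
  grants.any? do |grant|
    grant.grantee.uri == ALL_USERS_URI &&
      %w[READ FULL_CONTROL].include?(grant.permission)
  end
end
```

With an `Aws::S3::Object` in hand, the call would presumably be `public_read?(object.acl.grants)`.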

How to do the equivalent of 's3cmd ls s3://some_bucket/foo/bar' in Ruby?

How do I do the equivalent of 's3cmd ls s3://some_bucket/foo/bar' in Ruby?
I found the Amazon S3 gem for Ruby and also the Right AWS S3 library, but somehow it's not immediately obvious how to do a simple 'ls' like command on an S3 'folder' like location.
Using the aws gem this should do the trick:
s3 = Aws::S3.new(YOUR_ID, YOUR_SECRET_KEY)
bucket = s3.bucket('some_bucket')
bucket.keys('prefix' => 'foo/bar')
I found a similar question here: Listing directories at a given level in Amazon S3
Based on that, I created a method that behaves as much like 's3cmd ls <path>' as possible:
require 'right_aws'
module RightAws
class S3
class Bucket
def list(prefix, delimiter = '/')
list = []
@s3.interface.incrementally_list_bucket(@name, {'prefix' => prefix, 'delimiter' => delimiter}) do |item|
if item[:contents].empty?
list << item[:common_prefixes]
else
list << item[:contents].map{|n| n[:key]}
end
end
list.flatten
end
end
end
end
s3 = RightAws::S3.new(ID, SECRET_KEY)
bucket = s3.bucket('some_bucket')
puts bucket.list('foo/bar/').inspect
In case someone is looking for the answer to this question for aws-sdk version 2, you can do it very easily like this:
creds = Aws::SharedCredentials.new(profile_name: 'my_credentials')
s3_client = Aws::S3::Client.new(region: 'us-east-1',
credentials: creds)
response = s3_client.list_objects(bucket: "mybucket",
delimiter: "/")
Now, if you do
response.common_prefixes
it will give you the "folders" at that level, and if you do
response.contents
it will give you the files at that level.
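The call above has no :prefix, so it lists the top level of the bucket. To get closer to 's3cmd ls s3://some_bucket/foo/bar/', a prefix can be added and the results collected across pages. This is a sketch rather than a tested tool: `s3_ls` is a hypothetical helper, and `client` is assumed to behave like `Aws::S3::Client`, whose list_objects response can be iterated page by page with `each`.

```ruby
# Hypothetical helper mimicking `s3cmd ls`: returns [folders, files] found
# directly under `prefix`. `client` is assumed to behave like Aws::S3::Client.
def s3_ls(client, bucket, prefix)
  folders = []
  files = []
  response = client.list_objects(:bucket => bucket, :prefix => prefix, :delimiter => '/')
  # Each page carries its own common_prefixes and contents.
  response.each do |page|
    folders.concat(page.common_prefixes.map(&:prefix))
    files.concat(page.contents.map(&:key))
  end
  [folders, files]
end
```

For example, `folders, files = s3_ls(s3_client, 'mybucket', 'foo/bar/')`. The trailing '/' on the prefix matters, otherwise keys such as foo/barbaz would match too.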
The official Ruby AWS SDK now supports this: http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/Tree.html
You can also add the following convenience method:
class AWS::S3::Bucket
def ls(path)
as_tree(:prefix => path).children.select(&:branch?).map(&:prefix)
end
end
Then use it like this:
mybucket.ls 'foo/bar' # => ["/foo/bar/dir1/", "/foo/bar/dir2/"]
A quick and simple method to list files in a bucket folder using the Ruby aws-sdk:
require 'aws-sdk'
s3 = AWS::S3.new
your_bucket = s3.buckets['bucket_o_files']
your_bucket.objects.with_prefix('lots/of/files/in/2014/09/03/').each do |file|
puts file.key
end
Notice the '/' at the end of the prefix; it is important.
I like the idea of opening the Bucket class and adding an 'ls' method.
I would have done it like this...
class AWS::S3::Bucket
def ls(path)
objects.with_prefix(path).as_tree.children.select(&:leaf?).collect(&:member).collect(&:key)
end
end
s3 = AWS::S3.new
your_bucket = s3.buckets['bucket_o_files']
your_bucket.ls('lots/of/files/in/2014/09/03/')
