How to check if a resource exists in an AWS S3 bucket - ruby

I have an AWS S3 bucket in which I have multiple folders.
s3 = AWS::S3.new
bucket = s3.buckets['test']
bucket.exists? #=> true
Say I have a resource named demo/index.html; how will I check whether this resource is present in this bucket?
Maybe my question is too simple, but I am not able to find a proper answer for it. Any help is appreciated.

#exists? ⇒ Boolean
Returns true if the object exists in S3.
# new object, does not exist yet
obj = bucket.objects["my-text-object"]

# no instruction file present
begin
  bucket.objects['my-text-object.instruction'].exists? #=> false
rescue AWS::S3::Errors::Forbidden
  # exists? issues a HEAD request and can raise this error
  # when the credentials lack permission to read the object
end

# store the encryption materials in the instruction file
# instead of obj#metadata
obj.write("MY TEXT",
  :encryption_key => MY_KEY,
  :encryption_materials_location => :instruction_file)

begin
  bucket.objects['my-text-object.instruction'].exists? #=> true
rescue AWS::S3::Errors::Forbidden
  # same caveat as above
end
http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/S3Object.html#exists%3F-instance_method
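Applied directly to the key from the question, the check can be a one-liner (a sketch using the same v1 aws-sdk session as above; note that exists? performs a HEAD request, so it can still raise AWS::S3::Errors::Forbidden if you lack permission):
s3 = AWS::S3.new
bucket = s3.buckets['test']

# HEAD the exact key from the question; true only if the object exists
bucket.objects['demo/index.html'].exists? #=> true or false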

Related

List properties of a resource

I'm implementing a custom resource which is basically a facade to an existing resource (in the example below it's the vault_certificate resource).
Using the existing resource this code is valid:
certificate = vault_certificate 'a common name' do
  combine_certificate_and_chain true
  output_certificates false # Just to decrease the chef-client run output
  vault_path "pki/issue/#{node['deployment']}"
end

template "a path" do
  source 'nginx/dummy.conf.erb'
  variables(
    certificate: certificate.certificate_filename,
    key: certificate.key_filename
  )
end
Notice that I can invoke certificate.certificate_filename or certificate.key_filename; more generally, I can read any property defined by the vault_certificate resource.
Now, with the new resource (sort of a facade to vault_certificate):
provides :vault_certificate_handle_exceptions
unified_mode true

property :common_name, String, name_property: true
property :max_retries, Integer, default: 5

action :create do
  require 'retries'
  # new_resource.max_retries is used inside retry_options. I omitted that
  # part as it's not relevant to the question.
  with_retries(retry_options) do
    begin
      vault_certificate new_resource.common_name do
        combine_certificate_and_chain true
        output_certificates false # Just to decrease the chef-client run output
        vault_path "pki/issue/#{node['deployment']}"
        ignore_failure :quiet
      end
    rescue Vault::HTTPClientError => e
      data = JSON.parse(e.errors)['data']
      if data['error'] == 'Certificate not found locally'
        # This error is one we can recover from (actually we are expecting it).
        # Raising VaultCertificateError here triggers the with_retries.
        raise VaultCertificateError.new("Waiting for the certificate to appear in the store (because I'm not the leader)", data)
      else
        # Any other error means something really went wrong.
        raise e
      end
    end
  end
end
If I now use this resource and try to invoke .certificate_filename or .key_filename:
certificate = vault_certificate_handle_exceptions 'a common name' do
  action :create
end

template "a path" do
  source 'nginx/dummy.conf.erb'
  variables(
    certificate: certificate.certificate_filename,
    key: certificate.key_filename
  )
end
I get an error saying the method certificate_filename (or key_filename) is not defined for vault_certificate_handle_exceptions. To solve it I resorted to this hack:
provides :vault_certificate_handle_exceptions
unified_mode true

property :common_name, String, name_property: true
property :max_retries, Integer, default: 5

action :create do
  require 'retries'
  # new_resource.max_retries is used inside retry_options. I omitted that
  # part as it's not relevant to the question.
  with_retries(retry_options) do
    begin
      cert = vault_certificate new_resource.common_name do
        combine_certificate_and_chain true
        output_certificates false # Just to decrease the chef-client run output
        vault_path "pki/issue/#{node['deployment']}"
        ignore_failure :quiet
      end
      # These lines let us read the vault_certificate properties as if they
      # were properties of this resource (vault_certificate_handle_exceptions).
      Chef::ResourceResolver.resolve(cert.resource_name).properties.keys.each do |name|
        new_resource.send(:define_singleton_method, name.to_sym) do
          cert.send(name.to_sym)
        end
      end
    rescue Vault::HTTPClientError => e
      data = JSON.parse(e.errors)['data']
      if data['error'] == 'Certificate not found locally'
        # This error is one we can recover from (actually we are expecting it).
        # Raising VaultCertificateError here triggers the with_retries.
        raise VaultCertificateError.new("Waiting for the certificate to appear in the store (because I'm not the leader)", data)
      else
        # Any other error means something really went wrong.
        raise e
      end
    end
  end
end
Is there a cleaner way to achieve this? If not, is there a more direct way to list all the properties of a resource? I thought cert.properties would work, but no luck there.
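One possibly more direct route, assuming a modern Chef where resource classes expose a class-level properties hash (this is an untested sketch, not something from the question):
# Ask the resource's class rather than the instance: Chef::Resource
# subclasses expose a class-level `properties` hash (name => Property),
# so its keys are the property names.
cert.class.properties.keys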

Verifying permissions on S3 object in Ruby

I'm using aws-sdk for Ruby to manage objects on S3. I'm able to grant public read permissions by setting
object.acl = :public_read
Is there a way to determine if there is already public read permission granted to the object before doing that?
Ruby aws-sdk has poor documentation, and I wasn't able to locate this either. Below is a function that I created to check whether a file has public read permission. Modify it as per your needs:
def check_if_public_read(object)
  object.acl.grants.each do |grant|
    begin
      if grant.grantee.uri == "http://acs.amazonaws.com/groups/global/AllUsers"
        return true if [:read, :full_control].include?(grant.permission.name)
      end
    rescue
      # some grantees (e.g. canonical users) have no uri; skip them
    end
  end
  return false
end
where object is any S3 Object:
AWS.config(
  :access_key_id => "access key",
  :secret_access_key => "secret key"
)
s3 = AWS::S3.new
file = s3.buckets["my_bucket"].objects["path/to/file.png"]
check_if_public_read(file) #=> true
Please note that I figured this out by looking at the objects and the aws-sdk source code, so the uri value may change over time. This works now, with aws-sdk gem version 1.3.5.
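For readers on aws-sdk v2, a rough equivalent might look like the sketch below (my own assumption, not part of the original answer; in v2 the grants are plain structs and permissions come back as strings such as "READ"):
require 'aws-sdk'

# v2 sketch: same AllUsers-group check as above, but grant.permission
# is a string, and grantee.uri is simply nil for non-group grantees,
# so no rescue is needed.
def public_read_v2?(object)
  object.acl.grants.any? do |grant|
    grant.grantee.uri == 'http://acs.amazonaws.com/groups/global/AllUsers' &&
      %w[READ FULL_CONTROL].include?(grant.permission)
  end
end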

Ruby aws-sdk: "Unable to find marker in S3 list objects response"

I am trying to iterate over the children of an AWS::S3::Tree object, like so:
conn = AWS::S3.new(:access_key_id => 'ACCESSKEYID', :secret_access_key => 'ACCESSKEY')
bucket = conn.buckets['bucketname']
tree = bucket.objects.with_prefix('assets/images').as_tree
directories = tree.children.select(&:branch?).collect(&:prefix)
This works fine on most of the paths I use as a prefix, but one folder (which has tens of thousands of subfolders) returns this error:
/lib/aws/s3/object_collection.rb:311:in `next_markers': Unable to find marker in S3 list objects response (RuntimeError)
I have overridden the gem method to debug like so:
module AWS
  class S3
    class ObjectCollection
      protected

      def next_markers page
        raise page.inspect
        marker = (last = page.contents.last and last.key)
        if marker.nil?
          raise 'Unable to find marker in S3 list objects response xxxxx'
        else
          { :marker => marker }
        end
      end
    end
  end
end
And this outputs the returned data:
{:delimiter=>"/", :contents=>[], :common_prefixes=>[{:prefix=>"assets/images/100/"}, {:prefix=>"assets/images/1000/"}, {:prefix=>"assets/images/1001/"}, etc etc
Why is the call returning an empty array for :contents?
This happens when the XML from S3 is not valid and the contents cannot be parsed. If you enable wire tracing (add :http_wire_trace => true to AWS::S3.new), you should be able to examine the offending HTTP response body.
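For example, reusing the connection from the question with the trace option added:
# With wire tracing enabled, the raw ListObjects request and response
# (including the suspect XML body) are written to the logger.
conn = AWS::S3.new(
  :access_key_id => 'ACCESSKEYID',
  :secret_access_key => 'ACCESSKEY',
  :http_wire_trace => true
)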

How do I update a batch of S3 objects' metadata using ruby?

I need to change some metadata (Content-Type) on hundreds or thousands of objects on S3. What's a good way to do this with Ruby? As far as I can tell there is no way to save only metadata with fog.io; the entire object must be re-saved. It seems like using the official SDK library would require me to roll a wrapper environment just for this one task.
You're right; the official SDK lets you modify the object metadata without uploading it again. What it does is copy the object, but that happens on the server, so you don't need to download the file and re-upload it.
A wrapper would be easy to implement, something like:
bucket.objects.each do |object|
  object.metadata['content-type'] = 'application/json'
end
In the v2 API, you can use Object#copy_from or Object#copy_to with the :metadata and :metadata_directive => 'REPLACE' options to update an object's metadata without downloading it from S3.
The code in Joost's gist throws this error:
Aws::S3::Errors::InvalidRequest: This copy request is illegal because
it is trying to copy an object to itself without changing the object's
metadata, storage class, website redirect location or encryption
attributes.
This is because, by default, AWS ignores the :metadata supplied with a copy operation, since a copy normally carries the source object's metadata over. We must set the :metadata_directive => 'REPLACE' option if we want to update the metadata in place.
See http://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Object.html#copy_from-instance_method
Here's a full, working code snippet that I recently used to perform metadata update operations:
require 'aws-sdk'

# S3 setup boilerplate
client = Aws::S3::Client.new(
  :region => 'us-east-1',
  :access_key_id => ENV['AWS_ACCESS_KEY'],
  :secret_access_key => ENV['AWS_SECRET_KEY'],
)
s3 = Aws::S3::Resource.new(:client => client)

# Get an object reference
object = s3.bucket('my-bucket-name').object('my-object/key')

# Create our new metadata hash. This can be any hash; in this example we
# update existing metadata with a new key-value pair.
new_metadata = object.metadata.merge('MY_NEW_KEY' => 'MY_NEW_VALUE')

# Use the copy operation to replace our metadata
object.copy_to(object,
  :metadata => new_metadata,
  # IMPORTANT: normally S3 copies the metadata along with the object.
  # We must supply this directive to replace the existing metadata with
  # the values we supply.
  :metadata_directive => "REPLACE",
)
For easy re-use:
def update_metadata(s3_object, new_metadata = {})
  s3_object.copy_to(s3_object,
    :metadata => new_metadata,
    :metadata_directive => "REPLACE"
  )
end
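Called like this, for instance (object and new_metadata are the names from the snippet above):
# Replaces the object's metadata in place via a server-side copy.
update_metadata(object, new_metadata)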
For future readers, here's a complete sample of changing stuff using the Ruby aws-sdk v1 (also see this Gist for a aws-sdk v2 sample):
# Using v1 of the Ruby aws-sdk, as v2 currently seems unable to do this (broken?).
require 'aws-sdk-v1'

key = YOUR_AWS_KEY
secret = YOUR_AWS_SECRET
region = YOUR_AWS_REGION

AWS.config(access_key_id: key, secret_access_key: secret, region: region)
s3 = AWS::S3.new
bucket = s3.buckets[bucket_name]

bucket.objects.with_prefix('images/').each do |obj|
  puts obj.key
  # Add metadata: {} to the next line for more metadata.
  obj.copy_from(obj.key, content_type: obj.content_type, cache_control: 'max-age=1576800000', acl: :public_read)
end
After some searching, this seems to work for me:
obj.copy_to(obj, :metadata_directive=>"REPLACE", :acl=>"public-read",:content_type=>"text/plain")
Using the SDK to change the content type will result in an x-amz-meta- prefix. My solution was to use Ruby + the AWS CLI. This writes directly to content-type instead of x-amz-meta-content-type.
ids_to_copy = all_object_ids
ids_to_copy.each do |id|
  object_key = "#{id}.pdf"
  command = "aws s3 cp s3://{bucket-name}/#{object_key} s3://{bucket-name}/#{object_key} --no-guess-mime-type --content-type='application/pdf' --metadata-directive='REPLACE'"
  system(command)
end
This API appears to be available now:
Fog::Storage.new({
  :provider => 'AWS',
  :aws_access_key_id => 'foo',
  :aws_secret_access_key => 'bar',
  :endpoint => 'https://s3.amazonaws.com/',
  :path_style => true
}).put_object_tagging(
  'bucket_name',
  's3_key',
  {foo: 'bar'}
)

How to do the equivalent of 's3cmd ls s3://some_bucket/foo/bar' in Ruby?

How do I do the equivalent of 's3cmd ls s3://some_bucket/foo/bar' in Ruby?
I found the Amazon S3 gem for Ruby and also the Right AWS S3 library, but somehow it's not immediately obvious how to do a simple 'ls'-like command on an S3 'folder'-like location.
Using the aws gem this should do the trick:
s3 = Aws::S3.new(YOUR_ID, YOUR_SECRET_KEY)
bucket = s3.bucket('some_bucket')
bucket.keys('prefix' => 'foo/bar')
I found a similar question here: Listing directories at a given level in Amazon S3
Based on that I created a method that behaves as much as possible as 's3cmd ls <path>':
require 'right_aws'

module RightAws
  class S3
    class Bucket
      def list(prefix, delimiter = '/')
        list = []
        @s3.interface.incrementally_list_bucket(@name, {'prefix' => prefix, 'delimiter' => delimiter}) do |item|
          if item[:contents].empty?
            list << item[:common_prefixes]
          else
            list << item[:contents].map{ |n| n[:key] }
          end
        end
        list.flatten
      end
    end
  end
end
s3 = RightAws::S3.new(ID, SECRET_KEY)
bucket = s3.bucket('some_bucket')
puts bucket.list('foo/bar/').inspect
In case someone is looking for the answer to this question for aws-sdk version 2, you can very easily do it this way:
creds = Aws::SharedCredentials.new(profile_name: 'my_credentials')
s3_client = Aws::S3::Client.new(region: 'us-east-1',
                                credentials: creds)
response = s3_client.list_objects(bucket: "mybucket",
                                  delimiter: "/")
Now, if you do
response.common_prefixes
it will give you the "folders" of that particular subdirectory, and if you do
response.contents
it will list the files of that particular directory.
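To get output close to 's3cmd ls', for example (a sketch using the response from the snippet above; the DIR/FILE labels are just illustrative):
# "Folders" first, then files; common prefixes expose #prefix,
# object summaries expose #key.
response.common_prefixes.each { |p| puts "DIR  #{p.prefix}" }
response.contents.each { |o| puts "FILE #{o.key}" }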
The official Ruby AWS SDK now supports this: http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/Tree.html
You can also add the following convenience method:
class AWS::S3::Bucket
  def ls(path)
    as_tree(:prefix => path).children.select(&:branch?).map(&:prefix)
  end
end
Then use it like this:
mybucket.ls 'foo/bar' # => ["/foo/bar/dir1/", "/foo/bar/dir2/"]
A quick and simple method to list files in a bucket folder using the Ruby aws-sdk:
require 'aws-sdk'

s3 = AWS::S3.new
your_bucket = s3.buckets['bucket_o_files']
your_bucket.objects.with_prefix('lots/of/files/in/2014/09/03/').each do |file|
  puts file.key
end
Notice the '/' at the end of the prefix; it is important.
I like the idea of opening the Bucket class and adding an 'ls' method. I would have done it like this:
class AWS::S3::Bucket
  def ls(path)
    objects.with_prefix("#{path}").as_tree.children.select(&:leaf?).collect(&:member).collect(&:key)
  end
end

s3 = AWS::S3.new
your_bucket = s3.buckets['bucket_o_files']
your_bucket.ls('lots/of/files/in/2014/09/03/')
