accessing non standard s3 bucket - ruby

Using the aws-s3 gem, I can successfully perform transactions with a standard S3 bucket, but one created in Ireland (s3-eu-west-1) gives the error "The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint." After two hours of searching this still means nothing to me. Is there a way to get around this problem?
This simple tutorial works fine for a standard S3 bucket but not for the Ireland one.
This person's experiences seem to suggest it's not possible.

OK, I've just found the answer here.
require 'aws/s3'

AWS::S3::Base.establish_connection!(
  :access_key_id     => ACCESS_KEY_ID,
  :secret_access_key => SECRET_ACCESS_KEY
)

AWS::S3::DEFAULT_HOST.replace('s3-eu-west-1.amazonaws.com') # <= the crucial hacky line

AWS::S3::S3Object.store(
  file_name,
  temp_file,
  bucket,
  :content_type => mime_type
)
Edit
Much better option is to use the aws-sdk gem whose API seems a lot nicer, e.g.:
require 'aws-sdk'

s3 = AWS::S3.new(
  :access_key_id     => ACCESS_KEY_ID,
  :secret_access_key => SECRET_ACCESS_KEY,
  :s3_endpoint       => 's3-eu-west-1.amazonaws.com'
)

bucket = s3.buckets[bucket_name]
bucket.objects.create(
  file_name,
  temp_file,
  :content_type => mime_type
)
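If you are on a newer SDK, the aws-sdk v2+ resource interface takes a region instead of an endpoint override. A rough sketch (it assumes temp_file is a File/Tempfile, so .path gives its location on disk):

require 'aws-sdk'  # aws-sdk v2, or aws-sdk-s3 in v3

s3 = Aws::S3::Resource.new(
  :region      => 'eu-west-1',
  :credentials => Aws::Credentials.new(ACCESS_KEY_ID, SECRET_ACCESS_KEY)
)

# upload_file reads from the given path and sets the content type on the object
s3.bucket(bucket_name)
  .object(file_name)
  .upload_file(temp_file.path, :content_type => mime_type)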

Related

Fog with Carrierwave upload to S3 default upload path invalid

I'm trying to upload to S3 with Carrierwave and Fog-Aws, and I'm having an issue. For some reason, fog is trying to upload to my bucket at
https://{bucket-name}.s3.amazonaws.com
But, when I access a file directly from aws, the url format is like this:
https://s3-{region}.amazonaws.com/{bucket-name}
Whenever I try to use the path that Fog is using, it gives me the following error:
The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.
So my question is, is there a way to
A) Change the endpoint format on S3 to match what Fog is expecting it to be, or
B) Change a setting for Fog to use this different format instead?
For reference:
I'm using Carrierwave version 1.0 and fog-aws version 0.11.0.
Here's my carrierwave.rb file:
if Rails.env.test? or Rails.env.development?
  CarrierWave.configure do |config|
    config.storage   = :file
    config.root      = "#{Rails.root}/tmp"
    config.cache_dir = "#{Rails.root}/tmp/images"
  end
else
  CarrierWave.configure do |config|
    config.fog_provider = 'fog/aws'
    config.fog_credentials = {
      :provider              => 'AWS',
      :aws_access_key_id     => ENV['AWS_ACCESS_KEY_ID'],
      :aws_secret_access_key => ENV['AWS_SECRET_ACCESS_KEY'],
      :region                => ENV['AWS_S3_REGION'],
      :endpoint              => "https://s3-#{ENV['AWS_S3_REGION']}.amazonaws.com/#{ENV['AWS_S3_BUCKET_NAME']}"
    }
    config.storage       = :fog
    config.fog_directory = ENV['AWS_S3_BUCKET_NAME']
    config.fog_public    = false
  end
end
I believe :region is the only setting you should need to change in this case. As long as it is set accurately (and isn't the default us-east-1 region) it should change the host as you desire.
That said, I would NOT expect to also need to change :endpoint like this. It would only be set if you needed custom CNAME handling, which it doesn't sound like you do. Omitting it, while setting :region, should hopefully get you what you are after.
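In other words, the fog_credentials block should only need the region. A sketch of the same config with the :endpoint line removed:

config.fog_credentials = {
  :provider              => 'AWS',
  :aws_access_key_id     => ENV['AWS_ACCESS_KEY_ID'],
  :aws_secret_access_key => ENV['AWS_SECRET_ACCESS_KEY'],
  :region                => ENV['AWS_S3_REGION']  # e.g. 'eu-west-1'; no :endpoint needed
}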

How to use an authenticated user with S3 and AWS-SDK in Ruby

I am following this example:
http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadObjSingleOpRuby.html
require 'aws-sdk'
s3 = Aws::S3::Resource.new(
  :access_key_id     => "something",
  :secret_access_key => "verysecret",
  :region            => 'us-east-1'
)
bucket = s3.bucket('mybucket').object('test')
bucket.upload_file('/files/useless.txt')
I am getting this terrible "Access Denied" error, most likely because I am not authenticating as the user I need to be, and that part is missing from this code. Where do I fit it in?
Thank you!
Your access_key_id and secret_access_key are user-specific. If you go to the AWS console and open the IAM tools, you can set up new keys for the user you want to authenticate as.
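One way to supply that IAM user's keys is shown below (a sketch; reading them from environment variables is an assumption):

require 'aws-sdk'

# Credentials belong to the IAM user created in the console
s3 = Aws::S3::Resource.new(
  :region      => 'us-east-1',
  :credentials => Aws::Credentials.new(ENV['AWS_ACCESS_KEY_ID'], ENV['AWS_SECRET_ACCESS_KEY'])
)

obj = s3.bucket('mybucket').object('test')
obj.upload_file('/files/useless.txt')  # the IAM user needs s3:PutObject permission on the bucket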

AWS Ruby SDK CORE: Upload files to S3

I want to upload a file (any file: .txt, .mp4, .mp3, .zip, .tar, etc.) to AWS S3 using the AWS-SDK-CORE Ruby SDK.
Here is my code:
require 'aws-sdk-core'

Aws.config = {
  :access_key_id     => MY_ACCESS_KEY,
  :secret_access_key => MY_SECRET_KEY,
  :region            => 'us-west-2'
}

s3 = Aws::S3.new
resp = s3.put_object(
  :bucket => "mybucket",
  :key    => "myfolder/upload_me.sql",
  :body   => "./upload_me.sql"
)
Now, the above code runs and creates a key myfolder/upload_me.sql, but the object contains only a single line, ./upload_me.sql, which is wrong. The file upload_me.sql has several lines.
The expected behaviour is for the file upload_me.sql to be uploaded to S3 as mybucket/myfolder/upload_me.sql. Instead, it just writes one line to mybucket/myfolder/upload_me.sql, and that line is ./upload_me.sql.
Now, if I omit the :body part as below:
s3 = Aws::S3.new
resp = s3.put_object(
  :bucket => "mybucket",
  :key    => "myfolder/upload_me.sql"
)
then it just creates an empty key called mybucket/myfolder/upload_me.sql, which is not even downloadable (well, even if it gets downloaded, it is useless).
Could you point me where I am going wrong?
Here is ruby-SDK-core documentation for put_object Method: http://docs.aws.amazon.com/sdkforruby/api/Aws/S3/V20060301.html#put_object-instance_method
UPDATE:
If I try to upload the same file using AWS-CLI, it gets uploaded fine. Here is the command:
aws s3api put-object --bucket mybucket --key myfolder/upload_me.sql --body ./upload_me.sql
So, after spending a frustrating Sunday afternoon on this issue, I finally cracked it. What I really needed was :body => IO.read("./upload_me.sql").
So my code looks like below:
s3 = Aws::S3.new
resp = s3.put_object(
  :bucket => "mybucket",
  :key    => "myfolder/upload_me.sql",
  :body   => IO.read("./upload_me.sql")
)
The :body value is the content that will be written to S3. So if you want to send a file to S3, you need to load it yourself, using File.read("upload_me.sql") or something similar.
s3 = Aws::S3.new
resp = s3.put_object(
  :bucket => "mybucket",
  :key    => "myfolder/upload_me.sql",
  :body   => File.read("./upload_me.sql")
)
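If the file is large, :body also accepts an IO object, so you can hand it an open File instead of reading everything into memory. A sketch under that assumption:

File.open('./upload_me.sql', 'rb') do |file|
  s3.put_object(
    :bucket => "mybucket",
    :key    => "myfolder/upload_me.sql",
    :body   => file  # streamed from disk rather than built as a String
  )
end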
According to the documentation, another way to do this is to use write on the object (note this is the older aws-sdk v1 AWS::S3 interface):
s3 = AWS::S3.new
key = File.basename(file_name)
s3.buckets["mybucket"].objects[key].write(:file => "upload_me.sql")
Another way would be:
AWS.config(
  :access_key_id     => 'MY_ACCESS_KEY',
  :secret_access_key => 'MY_SECRET_KEY'
)

# Set the filename
file_name = 'filename.txt'
# Set the bucket name
s3_bucket_name = 'my bucket name'
# If the file has to go in some specific folder
bucket_directory = 'key or folder'

begin
  s3 = AWS::S3.new
  # If no directory was provided, use just the file name as the key;
  # otherwise prefix the key with the directory
  if bucket_directory == ''
    bucket_obj = s3.buckets[s3_bucket_name].objects[file_name]
  else
    bucket_obj = s3.buckets[s3_bucket_name].objects["#{bucket_directory}/#{file_name}"]
  end
  # Upload the file
  bucket_obj.write(:file => file_name)
  puts "File was successfully uploaded : #{bucket_obj}"
rescue Exception => e
  puts "There was an error in uploading file: #{e}"
end
Working Example
Reference
Probably the file wasn't found as the path is relative.
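If that is the case, a quick existence check makes the failure obvious (a sketch; the path is the one from the question):

path = './upload_me.sql'
# Fail fast before calling put_object if the relative path doesn't resolve
raise "No such file: #{File.expand_path(path)}" unless File.exist?(path)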
This is strange behavior, where the interface tries to make too many decisions.
I can assure you this works (v3):
client = Aws::S3::Client.new(...)

client.put_object(
  body:   './existing_file.txt',
  bucket: 'kick-it',
  key:    'test1.txt'
) # kick-it:/test1.txt contains the same as the contents of existing_file.txt

client.put_object(
  body:   './non_existing_file.txt',
  bucket: 'kick-it',
  key:    'test2.txt'
) # kick-it:/test2.txt contains just the string './non_existing_file.txt'
Using body for both cases is a bad decision, if you ask me.
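If you want the argument treated unambiguously as a file path, the resource interface's upload_file does that explicitly. A sketch, assuming aws-sdk v2/v3 and reusing the same client (the key name here is hypothetical):

s3 = Aws::S3::Resource.new(:client => client)
# upload_file always reads the file at the given path
s3.bucket('kick-it').object('test3.txt').upload_file('./existing_file.txt')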

How do I update a batch of S3 objects' metadata using ruby?

I need to change some metadata (Content-Type) on hundreds or thousands of objects on S3. What's a good way to do this with ruby? As far as I can tell there is no way to save only metadata with fog.io, the entire object must be re-saved. Seems like using the official sdk library would require me rolling a wrapper environment just for this one task.
You're right; the official SDK lets you modify the object metadata without uploading it again. What it does is copy the object, but that happens on the server, so you don't need to download the file and re-upload it.
A wrapper would be easy to implement, something like
bucket.objects.each do |object|
  object.metadata['content-type'] = 'application/json'
end
In the v2 API, you can use Object#copy_from() or Object#copy_to() with the :metadata and :metadata_directive => 'REPLACE' options to update an object's metadata without downloading it from S3.
The code in Joost's gist throws this error:
Aws::S3::Errors::InvalidRequest: This copy request is illegal because
it is trying to copy an object to itself without changing the object's
metadata, storage class, website redirect location or encryption
attributes.
This is because, by default, AWS ignores the :metadata supplied with a copy operation and simply copies the source object's existing metadata. We must set the :metadata_directive => 'REPLACE' option if we want to update the metadata in place.
See http://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Object.html#copy_from-instance_method
Here's a full, working code snippet that I recently used to perform metadata update operations:
require 'aws-sdk'

# S3 setup boilerplate
client = Aws::S3::Client.new(
  :region            => 'us-east-1',
  :access_key_id     => ENV['AWS_ACCESS_KEY'],
  :secret_access_key => ENV['AWS_SECRET_KEY']
)
s3 = Aws::S3::Resource.new(:client => client)

# Get an object reference
object = s3.bucket('my-bucket-name').object('my-object/key')

# Create our new metadata hash. This can be any hash; in this example we update
# existing metadata with a new key-value pair.
new_metadata = object.metadata.merge('MY_NEW_KEY' => 'MY_NEW_VALUE')

# Use the copy operation to replace our metadata
object.copy_to(object,
  :metadata => new_metadata,
  # IMPORTANT: normally S3 copies the metadata along with the object.
  # We must supply this directive to replace the existing metadata with
  # the values we supply.
  :metadata_directive => "REPLACE"
)
For easy re-use:
def update_metadata(s3_object, new_metadata = {})
  s3_object.copy_to(s3_object,
    :metadata           => new_metadata,
    :metadata_directive => "REPLACE"
  )
end
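Hypothetical usage, reusing the object from the snippet above:

update_metadata(object, object.metadata.merge('MY_NEW_KEY' => 'MY_NEW_VALUE'))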
For future readers, here's a complete sample of changing stuff using the Ruby aws-sdk v1 (also see this Gist for an aws-sdk v2 sample):
# Using v1 of Ruby aws-sdk as currently v2 seems not able to do this (broken?).
require 'aws-sdk-v1'
key = YOUR_AWS_KEY
secret = YOUR_AWS_SECRET
region = YOUR_AWS_REGION
AWS.config(access_key_id: key, secret_access_key: secret, region: region)
s3 = AWS::S3.new
bucket = s3.buckets[bucket_name]
bucket.objects.with_prefix('images/').each do |obj|
  puts obj.key
  # Add metadata: {} to the next line for more metadata.
  obj.copy_from(obj.key, content_type: obj.content_type, cache_control: 'max-age=1576800000', acl: :public_read)
end
After some searching, this seems to work for me:
obj.copy_to(obj, :metadata_directive=>"REPLACE", :acl=>"public-read",:content_type=>"text/plain")
Using the SDK to change the content type will result in an x-amz-meta- prefix. My solution was to use Ruby + the AWS CLI. This writes directly to Content-Type instead of x-amz-meta-content-type.
ids_to_copy = all_object_ids
ids_to_copy.each do |id|
  object_key = "#{id}.pdf"
  command = "aws s3 cp s3://{bucket-name}/#{object_key} s3://{bucket-name}/#{object_key} --no-guess-mime-type --content-type='application/pdf' --metadata-directive='REPLACE'"
  system(command)
end
This API appears to be available now:
Fog::Storage.new({
  :provider              => 'AWS',
  :aws_access_key_id     => 'foo',
  :aws_secret_access_key => 'bar',
  :endpoint              => 'https://s3.amazonaws.com/',
  :path_style            => true
}).put_object_tagging(
  'bucket_name',
  's3_key',
  {foo: 'bar'}
)

Paperclip, Delayed Job, S3, Heroku - design for delayed processing of sensitive uploaded files: db or s3?

I need feedback on the design for uploading and delayed processing of a file using heroku, paperclip, delayed job and, if necessary, s3. Parts of it have been discussed in other places but I couldn't find a complete discussion anywhere.
Task description:
Upload file (using paperclip to s3/db on heroku). File needs to be private as it contains sensitive data.
Queue file for processing (delayed job)
Job gets run in queue
File is retrieved (from s3/db), and processing is completed
File is deleted (from s3/db)
Since I am using delayed job, I have to decide between storing the file in the database or on s3. I am assuming that storing the file on the web server is out of the question as I am using heroku and delayed job. Uploading files to s3 takes a long time. But, storing files in db is more expensive. Ideally, we would want the processing to finish as quickly as possible.
What is the more common design pattern? Store files on s3? Store files in db? Any particular recommended gems used to retrieve and process files stored in s3 (aws-s3? s3?)?
Heroku has a timeout of 30 seconds on any server request (learnt the hard way), so definitely storing files on s3 is a must.
Try carrierwave (carrierwave railscasts) instead of paperclip, as I prefer the added helpers that come on board; plus there are a number of great plugins, like carrierwave_direct for uploading large files to s3, which integrate nicely with carrierwave.
Delayed_job (railscasts - delayed_job) will work nicely for deleting files from s3 and any other background processing that may be required.
My gem file includes the following:
gem 'delayed_job'
gem "aws-s3", :require => 'aws/s3'
gem 'fog'
gem 'carrierwave'
gem 'carrierwave_direct'
The fog gem is a nice way to keep all your account info in a single place, and it sets everything up quite nicely. For the AWS gem how-to, this is a good resource.
Here is a sample controller when submitting a form to upload (there are definitely better ways of doing this, but for illustrative purposes)
def create
  @asset = Asset.new(:description => params[:description], :user_id => session[:id], :question_id => @question.id)
  if @asset.save && @asset.update_attributes(:file_name => sanitize_filename(params[:uploadfile].original_filename, @asset.id))
    AWS::S3::S3Object.store(sanitize_filename(params[:uploadfile].original_filename, @asset.id), params[:uploadfile].read, 'bucket_name', :access => :private, :content_type => params[:uploadfile].content_type)
    if object.content_length.to_i < @question.emailatt.to_i.megabytes && object.content_length.to_i < 5.megabytes
      url = AWS::S3::S3Object.url_for(sanitize_filename(params[:uploadfile].original_filename, @asset.id), 'bucket_name')
      if @asset.update_attributes(:download_link => 1)
        if Usermailer.delay({:run_at => 5.minutes.from_now}).attachment_user_mailer_download_notification(@asset, @question)
          process_attachment_user_mailer_download(params[:uploadfile], @asset.id, 24.hours.from_now, @question.id)
          flash[:notice] = "Thank you for the upload, we will notify this posts author"
        end
      end
    end
  else
    @asset.destroy
    flash[:notice] = "There was an error in processing your upload, please try again"
    redirect_to(:controller => "questions", :action => "show", :id => @question.id)
  end
end

private

def sanitize_filename(file_name, id)
  just_filename = File.basename(file_name)
  just_filename.sub(/[^\w\.\-]/, '_')
  new_id = id.to_s
  new_filename = "#{new_id}" + just_filename
end

def delete_process(uploadfile, asset_id, time, question_id)
  asset = Asset.find(:first, :conditions => ["id = ?", asset_id])
  if delete_file(uploadfile, asset_id, time) && asset.destroy
    redirect_to(:controller => "questions", :action => "show", :id => question_id)
  end
end

def process_attachment_user_mailer_download(uploadfile, asset_id, time, question_id)
  asset = Asset.find(:first, :conditions => ["id = ?", asset_id])
  if delete_file(uploadfile, asset_id, time) && @asset.delay({:run_at => time}).update_attributes(:download_link => 0)
    redirect_to(:controller => "questions", :action => "show", :id => question_id)
  end
end

# S3 METHODS FOR CREATE ACTION
# Deletes the uploaded file from S3
def delete_file(uploadfile, asset_id, time)
  AWS::S3::S3Object.delay({:run_at => time}).delete(sanitize_filename(uploadfile.original_filename, asset_id), 'bucket_name')
end
Lots of unnecessary code, I know (wrote this when I was starting with Rails). Hopefully it will give some idea of the processes involved in writing this type of app. Hope it helps.
For my part, I'm using:
Delayed Job
Paperclip
Delayed Paperclip, which uploads the original file to S3 and creates a delayed job with the custom post-processing. It can add a column to your model stating that the file is being processed.
It's only a few lines to set up, and you can do a lot with Paperclip interpolations and generators.
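A minimal sketch of that setup (the model and attachment names are assumptions):

class Upload < ActiveRecord::Base
  has_attached_file :document        # Paperclip attachment, stored on S3
  process_in_background :document    # delayed_paperclip: post-processing runs in a Delayed Job
end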
