Ruby script to request videos and upload them to S3

Problem: transfer some videos from OpenStack (Swift) to S3.
Gems: fog, aws-sdk
I have an array of paths, something like:
videos_paths = ["videos/attachments/5142/9f988f89ds9f8/lecture.mp4", "videos/attachments/3134/lecture2.mp4" ..... ]
I create URLs for the videos based on those paths.
My question is: how can I "download" each video directly to the S3 bucket, and is there a way to recreate the directory structure based on the video path?
E.g.
Video: https://myproject.com:443/v1/AUTH_a0fffc9ea361409795fb2e9736012940/production_videos/videos%2Fattachments%2F18116%2Fd6a5bd77a3b203cddsfb0c9d%2Foriginal%2Flecture.mp4?temp_url_sig=dce06f61775f24e88c80bed803b808668b073ed0&temp_url_expires=141243074
Workflow: Request video -> send it to S3 and store it in a similar dir structure
I'm open to any suggestions and ideas, including other gems or entirely different approaches.
Thanks,
I already checked:
1: Uploading Videos to S3 with Carrierwave and Fog
2: Upload videos to Amazon S3 using ruby with sinatra

Finally had time to finish this task before the deadline :) If someone has a similar issue, I hope they can use something from this answer as inspiration.
#!/usr/bin/env ruby
require 'fog'
require 'aws-sdk'
require 'open-uri'

videos_paths = ["videos/attachments/5142/e01a339b41ce487643e85/original/lecture.mp4", "videos/attachments/5143/a4fa624f9324bd9988fcc/original/lecture-only.mp4", "videos/attachments/5144/95141978d5ecc14a1995fc/original/lecture.mp4", .... ] # 282 videos

fog_credentials = {
  "hp_access_key"                => "",
  "hp_secret_key"                => "",
  "hp_tenant_id"                 => "",
  "hp_auth_uri"                  => "",
  "hp_use_upass_auth_style"      => true,
  "hp_avl_zone"                  => "",
  "os_account_meta_temp_url_key" => "",
  "persistent"                   => false
}

@storage = Fog::Storage::HP.new(fog_credentials) # Connect to fog storage
@my_time = 60 * 60 * 24 * 7 * 4                  # 4 week links?

# Generate a signed temporary URL for a Swift object
def make_temp_url(path, time = @my_time)
  @storage.generate_object_temp_url("videos", path, time, "GET")
end

# Append a status line to a log file
def status(path, options = {})
  File.open('./stats.txt', 'a') { |file| file.puts "#{options[:msg]}: #{path}" }
end

s3 = AWS::S3.new(
  :access_key_id     => '',
  :secret_access_key => ''
)
bucket = s3.buckets['']

videos_paths.each do |video_path|
  cur_url = make_temp_url(video_path)
  obj = bucket.objects[video_path] # the S3 key mirrors the Swift path, preserving the dir structure
  if obj.exists?
    status(video_path, msg: "Exists")
  else
    begin
      open(cur_url, 'rb') do |video|
        obj.write(video.read)
        status(video_path, msg: "Success")
      end
    rescue
      status(video_path, msg: "Error")
    end
  end
end
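As an aside, the same transfer can be sketched with the newer aws-sdk-s3 (v3) gem; this is a minimal, untested sketch that reuses make_temp_url and videos_paths from above, with a placeholder bucket name:

require 'aws-sdk-s3'
require 'open-uri'

s3 = Aws::S3::Resource.new(region: 'us-east-1') # credentials come from ENV or shared config
bucket = s3.bucket('my-target-bucket')          # placeholder name

videos_paths.each do |video_path|
  obj = bucket.object(video_path) # the key mirrors the Swift path, preserving the "dir" structure
  next if obj.exists?
  URI.open(make_temp_url(video_path)) do |video|
    obj.put(body: video) # put accepts any IO, so the download streams through
  end
end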

Related

How to retrieve CSV headers only from S3

Below is the code I'm using to parse the CSV from within the app, but I want to parse a file located in an Amazon S3 bucket. It needs to work when pushed to Heroku as well.
namespace :csvimport do
  desc "Import CSV Data to Inventory."
  task :wiwt => :environment do
    require 'csv'
    csv_file_path = Rails.root.join('public', 'wiwt.csv.txt')
    CSV.foreach(csv_file_path) do |row|
      p = Wiwt.create!({
        :user_id      => row[0],
        :date_worn    => row[1],
        :inventory_id => row[2],
      })
    end
  end
end
There are cases with S3 where the permissions on an S3 object disallow public access. Ruby's built-in functions assume a path is publicly accessible and don't account for S3's permission model.
s3 = Aws::S3::Resource.new
bucket = s3.bucket("bucket_name_here")
str = bucket.object("file_path_here").get.body.string
content = CSV.parse(str, col_sep: "\t", headers: true).map(&:to_h)
Per-line explanation using AWS SDK:
Line 1. Initialize
Line 2. Choose a bucket.
Line 3. Choose an object and get it as a String.
Line 4. Effectively CSV.parse('the string'), but I also added some options and mapped the rows to hashes in case that helps.
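Since the question is about retrieving only the headers, it's worth noting you don't have to download the whole object: get_object accepts a range parameter (an HTTP Range header), so a sketch like this fetches just the first few kilobytes. The 4 KB window, bucket, and key are assumptions, and a header row longer than the window would need a larger range:

require 'aws-sdk-s3'
require 'csv'

client = Aws::S3::Client.new(region: 'us-east-1')

# Ranged GET: only the first 4 KB cross the wire
partial = client.get_object(
  bucket: 'bucket_name_here',
  key:    'file_path_here',
  range:  'bytes=0-4095'
).body.string

header_line = partial.lines.first # assumes the header row fits in the window
headers = CSV.parse_line(header_line, col_sep: "\t")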
You can do it like this
CSV.new(open(path_to_s3)).each do |row|
...
end
This worked for me
open(s3_file_path) do |file|
  CSV.foreach(file, {headers: true, header_converters: :symbol}) do |row|
    Model.create(row.to_hash)
  end
end
You can get the CSV file from S3 like this:
require 'csv'
require 'net/http'

CSV.parse(Net::HTTP.get(URI(s3_file_url)), headers: true).each do |row|
  # code for processing row here
end

AWS Ruby SDK CORE: Upload files to S3

I want to upload a file (any file: it could be .txt, .mp4, .mp3, .zip, .tar, etc.) to AWS S3 using the aws-sdk-core Ruby SDK.
Here is my code:
require 'aws-sdk-core'

Aws.config = {
  :access_key_id     => MY_ACCESS_KEY,
  :secret_access_key => MY_SECRET_KEY,
  :region            => 'us-west-2'
}

s3 = Aws::S3.new
resp = s3.put_object(
  :bucket => "mybucket",
  :key    => "myfolder/upload_me.sql",
  :body   => "./upload_me.sql"
)
Now, the above code runs and creates a key myfolder/upload_me.sql whose content is only one line, the literal string ./upload_me.sql, which is wrong; the file upload_me.sql has several lines.
The expected behaviour is to upload the file upload_me.sql to S3 as mybucket/myfolder/upload_me.sql. Instead it just writes that one line, ./upload_me.sql, to mybucket/myfolder/upload_me.sql.
Now, if I omit the :body part, as below:
s3 = Aws::S3.new
resp = s3.put_object(
  :bucket => "mybucket",
  :key    => "myfolder/upload_me.sql"
)
then it just creates an empty key called mybucket/myfolder/upload_me.sql, which is not even downloadable (well, even if it gets downloaded, it is useless).
Could you point out where I am going wrong?
Here is ruby-SDK-core documentation for put_object Method: http://docs.aws.amazon.com/sdkforruby/api/Aws/S3/V20060301.html#put_object-instance_method
UPDATE:
If I try to upload the same file using AWS-CLI, it gets uploaded fine. Here is the command:
aws s3api put-object --bucket mybucket --key myfolder/upload_me.sql --body ./upload_me.sql
So, after spending a frustrating Sunday afternoon on this issue, I finally cracked it. What I really needed is :body => IO.read("./upload_me.sql").
So my code looks like below:
s3 = Aws::S3.new
resp = s3.put_object(
  :bucket => "mybucket",
  :key    => "myfolder/upload_me.sql",
  :body   => IO.read("./upload_me.sql")
)
The body parameter is the content that will be written to S3. So if you want to send a file to S3, you need to load it yourself, using File.read("upload_me.sql") or something similar.
s3 = Aws::S3.new
resp = s3.put_object(
  :bucket => "mybucket",
  :key    => "myfolder/upload_me.sql",
  :body   => File.read("./upload_me.sql")
)
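One caveat with File.read is that it pulls the whole file into memory. As far as I know, put_object also accepts an open IO object as the body, so for large files a sketch like this should let the SDK stream the upload instead (untested):

s3 = Aws::S3.new
File.open("./upload_me.sql", "rb") do |file|
  s3.put_object(
    :bucket => "mybucket",
    :key    => "myfolder/upload_me.sql",
    :body   => file # any IO-like object works here, not just a String
  )
end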
According to the documentation, another way to do this (with the older aws-sdk v1 API) is to use write on the object:
s3 = AWS::S3.new
key = File.basename(file_name)
s3.buckets["mybucket"].objects[key].write(:file => "upload_me.sql")
Another way would be:
AWS.config(
  :access_key_id     => 'MY_ACCESS_KEY',
  :secret_access_key => 'MY_SECRET_KEY',
)

# Set the filename
file_name = 'filename.txt'
# Set the bucket name
s3_bucket_name = 'my bucket name'
# If the file has to go in some specific folder
bucket_directory = 'key or folder'

begin
  s3 = AWS::S3.new
  # Check whether a directory name was provided, and make an object in your bucket for the upload
  if bucket_directory == ''
    bucket_obj = s3.buckets[s3_bucket_name].objects[file_name] # no folder, the key is just the file name
  else
    bucket_obj = s3.buckets[s3_bucket_name].objects["#{bucket_directory}/#{file_name}"]
  end
  # Upload the file
  bucket_obj.write(:file => file_name)
  puts "File was successfully uploaded : #{bucket_obj}"
rescue Exception => e
  puts "There was an error in uploading file: #{e}"
end
Probably the file wasn't found, as the path is relative.
This is strange behavior, where the interface tries to make too many decisions.
I can assure you this works (v3):
client = Aws::S3::Client.new(...)

client.put_object(
  body: './existing_file.txt',
  bucket: 'kick-it',
  key: 'test1.txt'
) # kick-it:/test1.txt contains the same as the contents of existing_file.txt

client.put_object(
  body: './non_existing_file.txt',
  bucket: 'kick-it',
  key: 'test2.txt'
) # kick-it:/test2.txt contains just the string './non_existing_file.txt'
Using body for both cases is a bad decision, if you ask me.
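If you want to sidestep that ambiguity entirely, the resource-style API in v3 has an explicit file-upload method; a minimal sketch, with the bucket, key, and path as placeholders:

require 'aws-sdk-s3'

s3 = Aws::S3::Resource.new(region: 'us-east-1')
obj = s3.bucket('kick-it').object('test1.txt')

# upload_file takes a path (or IO) explicitly, switches to multipart for
# large files, and never mistakes a path string for literal content
obj.upload_file('./existing_file.txt')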

send_file for a tempfile in Sinatra

I'm trying to use Sinatra's built-in send_file command but it doesn't seem to be working for tempfiles.
I basically do the following to zip an album of mp3s:
get '/example' do
  songs = ...
  file_name = "zip_test.zip"
  t = Tempfile.new(['temp_zip', '.zip'])
  # t = File.new("testfile.zip", "w")
  Zip::ZipOutputStream.open(t.path) do |z|
    songs.each do |song|
      name = song.name
      name += ".mp3" unless name.end_with?(".mp3")
      z.put_next_entry(name)
      z.print(open(song.url) { |f| f.read })
      p song.name + ' added to file'
    end
  end
  p t.path
  p t.size
  send_file t.path, :type => 'application/zip',
                    :disposition => 'attachment',
                    :filename => file_name,
                    :stream => false
  t.close
  t.unlink
end
When I use t = File.new(...) things work as expected, but I don't want to use File as it will have concurrency problems.
When I use t = Tempfile.new(...), I get:
!! Unexpected error while processing request: The file identified by body.to_path does not exist
Edit: It looks like part of the problem is that I'm sending multiple files. If I just send one song, the Tempfile system works as well.
My guess is that you have a typo in one of your song names, or maybe a slash in one of the last parts of song.url? I adapted your code, and if all the songs exist, sending the zip as a tempfile works perfectly fine.
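A general note on the cleanup lines (my addition, not from the original answer): as far as I can tell, Sinatra's send_file halts the request, so the t.close and t.unlink calls after it never run. A pattern that avoids deleting the file before Rack has served it is to close the Tempfile without unlinking and let its finalizer remove it, sketched here with a hypothetical write_zip_to helper standing in for the Zip code above:

get '/example' do
  t = Tempfile.new(['temp_zip', '.zip'])
  write_zip_to(t.path) # hypothetical helper: writes the album zip to the given path
  t.close              # keeps the file on disk; the Tempfile finalizer deletes it at GC
  send_file t.path, :type => 'application/zip',
                    :disposition => 'attachment',
                    :filename => 'zip_test.zip'
end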

getting BadDigest error while trying to upload compressed file to s3 on ruby 1.9.3

As stated, I am trying to upload a file to S3:
require 'digest/md5'
require 'base64'
require 'aws-sdk'

def digest f
  f.rewind
  Digest::MD5.new.tap do |dig|
    f.each_chunk { |ch| dig << ch }
  end.base64digest
ensure
  f.rewind
end

file = File.new(compress file) # file zipped with zip/zip
total = file.size
digest = digest(file)

s3 = AWS::S3.new(:access_key_id => @access_key_id, :secret_access_key => @secret_access_key)
bucket = s3.buckets['mybucket']
bucket.objects["myfile"].write :content_md5 => digest, :content_length => total do |buf, len|
  buf.write(file.read len)
end
But I constantly get an AWS::S3::Errors::BadDigest exception.
If I try to upload the file without passing :content_md5, everything goes well; the archive downloads and opens correctly.
Also, as I just found out, this fails on Ruby 1.9.3 but works well on 1.9.2.
Fixed by changing the digest function to:
def digest f
  Digest::MD5.file(f.path).base64digest
end
I think the issue was that the file passed to it was still open.
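A side note that may save someone the same debugging (my addition): S3's Content-MD5 header expects the Base64 encoding of the raw 16-byte digest, not the hex string, and the two are easy to mix up:

require 'digest/md5'

md5 = Digest::MD5.file('archive.zip') # hypothetical file name

md5.hexdigest    # => 32-char hex string, like what S3 shows as the ETag for simple uploads
md5.base64digest # => 24-char Base64 string, which is what :content_md5 expects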

How to do the equivalent of 's3cmd ls s3://some_bucket/foo/bar' in Ruby?

How do I do the equivalent of 's3cmd ls s3://some_bucket/foo/bar' in Ruby?
I found the Amazon S3 gem for Ruby and also the Right AWS S3 library, but somehow it's not immediately obvious how to do a simple 'ls'-like command on an S3 'folder'-like location.
Using the aws gem, this should do the trick:
s3 = Aws::S3.new(YOUR_ID, YOUR_SECRET_KEY)
bucket = s3.bucket('some_bucket')
bucket.keys('prefix' => 'foo/bar')
I found a similar question here: Listing directories at a given level in Amazon S3
Based on that, I created a method that behaves as much as possible like 's3cmd ls <path>':
require 'right_aws'

module RightAws
  class S3
    class Bucket
      def list(prefix, delimiter = '/')
        list = []
        @s3.interface.incrementally_list_bucket(@name, {'prefix' => prefix, 'delimiter' => delimiter}) do |item|
          if item[:contents].empty?
            list << item[:common_prefixes]
          else
            list << item[:contents].map { |n| n[:key] }
          end
        end
        list.flatten
      end
    end
  end
end
s3 = RightAws::S3.new(ID, SECRET_KEY)
bucket = s3.bucket('some_bucket')
puts bucket.list('foo/bar/').inspect
In case someone is looking for the answer to this question for aws-sdk version 2, you can very easily do it this way:
creds = Aws::SharedCredentials.new(profile_name: 'my_credentials')
s3_client = Aws::S3::Client.new(region: 'us-east-1',
                                credentials: creds)
response = s3_client.list_objects(bucket: "mybucket",
                                  delimiter: "/")
Now, if you do
response.common_prefixes
it will give you the "folders" of that particular subdirectory, and if you do
response.contents
it will have the files of that particular directory.
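Putting those two together, a small sketch that mimics 's3cmd ls s3://some_bucket/foo/bar' (note the prefix: parameter, which the listing above omits, and its trailing slash):

resp = s3_client.list_objects(bucket: 'mybucket',
                              prefix: 'foo/bar/',
                              delimiter: '/')

resp.common_prefixes.each { |p| puts "DIR  #{p.prefix}" } # the "folders"
resp.contents.each do |obj|
  puts "#{obj.last_modified}  #{obj.size}  #{obj.key}"    # the files at this level
end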
The official Ruby AWS SDK now supports this: http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/Tree.html
You can also add the following convenience method:
class AWS::S3::Bucket
  def ls(path)
    as_tree(:prefix => path).children.select(&:branch?).map(&:prefix)
  end
end
Then use it like this:
mybucket.ls 'foo/bar' # => ["/foo/bar/dir1/", "/foo/bar/dir2/"]
A quick and simple method to list files in a bucket folder using the Ruby aws-sdk:
require 'aws-sdk'
s3 = AWS::S3.new
your_bucket = s3.buckets['bucket_o_files']
your_bucket.objects.with_prefix('lots/of/files/in/2014/09/03/').each do |file|
  puts file.key
end
Notice the '/' at the end of the prefix; it is important.
I like the idea of opening the Bucket class and adding an 'ls' method.
I would have done it like this...
class AWS::S3::Bucket
  def ls(path)
    objects.with_prefix("#{path}").as_tree.children.select(&:leaf?).collect(&:member).collect(&:key)
  end
end

s3 = AWS::S3.new
your_bucket = s3.buckets['bucket_o_files']
your_bucket.ls('lots/of/files/in/2014/09/03/')
