uploading folder to s3 with ruby sdk

I have a script that is supposed to upload a local folder to S3 using aws-sdk and Ruby.
As far as I understand, the files need to be uploaded one by one, so here is the code I use:
require 'aws-sdk'
require 'open3'

s3_bucket = ARGV[0]
debug = ARGV[1] || nil
@s3 = Aws::S3::Client.new(region: 'eu-west-1')

files = Dir[File.join('srv', '**', '*')].reject { |p| File.directory?(p) }
files.each do |f|
  o, e, s = Open3.capture3("gio info -a standard::content-type #{f}")
  abort(e) unless s.to_s.match(/exit 0/)
  content_type = o.split('standard::content-type: ')[1].strip
  s3_key = f.split('srv/lila/')[1]
  puts "Uploading #{f} with content-type #{content_type}" if debug
  resp = @s3.put_object({body: f, content_type: content_type, bucket: s3_bucket, key: s3_key})
end
My local file names look like this: srv/lila/1.1.1/somename/index.html
Somehow only the file name is uploaded, not the content. So when I go to the URL, I see the name of the file as the content: srv/lila/1.1.1/somename/index.html. My Ruby knowledge is limited and I am not sure what is wrong in this script. Can you help please?

Your issue is this line:
resp = @s3.put_object({body: f, content_type: content_type, bucket: s3_bucket, key: s3_key})
In this case f is not a File but rather a String that represents the path to a file.
body: accepts a String, StringIO, or File object as an argument. Because you are passing a String, it is treated as the contents of the uploaded file.
Instead I would recommend the following alteration:
File.open(f, 'rb') do |file|
  @s3.put_object({body: file, content_type: content_type, bucket: s3_bucket, key: s3_key})
end
Now file is an actual File object.
I also removed resp as that local variable did not serve a purpose.
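For completeness, here is a minimal sketch of the corrected loop, assuming the same srv/lila layout and region as in the question. On aws-sdk v2/v3 you can also let the SDK open and stream the file for you via Aws::S3::Object#upload_file; the content-type detection from the question is omitted here, but upload_file accepts the same content_type: option if you need it:
require 'aws-sdk-s3'

s3 = Aws::S3::Resource.new(region: 'eu-west-1')
bucket = s3.bucket(ARGV[0])

Dir[File.join('srv', '**', '*')].reject { |p| File.directory?(p) }.each do |f|
  # upload_file opens the file in binary mode and streams its contents to S3
  bucket.object(f.split('srv/lila/')[1]).upload_file(f)
end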

Related

RubyZip docx issues with write_buffer instead of open

I'm adapting the RubyZip recursive zipping example (found here) to work with write_buffer instead of open and am coming across a host of issues. I'm doing this because the zip archive I'm producing has Word documents in it and I'm getting errors on opening those Word documents. Therefore, I'm trying the workaround that RubyZip suggests, which is using write_buffer instead of open (example found here).
The problem is, I'm getting errors because I'm using an absolute path, but I'm not sure how to get around that. The error says "name must not start with /".
Second, I'm not sure what to do to mitigate the issue with Word documents. When I used my original code, which worked and created an actual zip file, any Word document in that zip file had the following error upon opening: "Word found unreadable content in …. Do you want to recover the contents of this document? If you trust the source of this document, click Yes." The unreadable content error is the reason why I went down the road of attempting to use write_buffer.
Any help would be appreciated.
Here is the code that I'm currently using:
require 'zip'
require 'zip/zipfilesystem'

module AdvisoryBoard
  class ZipService
    def initialize(input_dir, output_file)
      @input_dir = input_dir
      @output_file = output_file
    end

    # Zip the input directory.
    def write
      entries = Dir.entries(@input_dir) - %w[. ..]
      path = ""
      buffer = Zip::ZipOutputStream.write_buffer do |zipfile|
        entries.each do |e|
          zipfile_path = path == '' ? e : File.join(path, e)
          disk_file_path = File.join(@input_dir, zipfile_path)
          @file = nil
          @data = nil
          if !File.directory?(disk_file_path)
            @file = File.open(disk_file_path, "r+b")
            @data = @file.read
            unless [@output_file, @input_dir].include?(e)
              zipfile.put_next_entry(e)
              zipfile.write @data
            end
            @file.close
          end
        end
        zipfile.put_next_entry(@output_file)
        zipfile.put_next_entry(@input_dir)
      end
      File.open(@output_file, "wb") { |f| f.write(buffer.string) }
    end
  end
end
I was able to get Word documents to open without any warnings or corruption! Here's what I ended up doing:
require 'nokogiri'
require 'zip'
require 'zip/zipfilesystem'

class ZipService
  # Initialize with the directory to zip and the location of the output archive.
  def initialize(input_dir, output_file)
    @input_dir = input_dir
    @output_file = output_file
  end

  # Zip the input directory.
  def write
    entries = Dir.entries(@input_dir) - %w[. ..]
    ::Zip::File.open(@output_file, ::Zip::File::CREATE) do |zipfile|
      write_entries entries, '', zipfile
    end
  end

  private

  # A helper method to make the recursion work.
  def write_entries(entries, path, zipfile)
    entries.each do |e|
      zipfile_path = path == '' ? e : File.join(path, e)
      disk_file_path = File.join(@input_dir, zipfile_path)
      if File.directory? disk_file_path
        recursively_deflate_directory(disk_file_path, zipfile, zipfile_path)
      else
        put_into_archive(disk_file_path, zipfile, zipfile_path, e)
      end
    end
  end

  def recursively_deflate_directory(disk_file_path, zipfile, zipfile_path)
    zipfile.mkdir zipfile_path
    subdir = Dir.entries(disk_file_path) - %w[. ..]
    write_entries subdir, zipfile_path, zipfile
  end

  def put_into_archive(disk_file_path, zipfile, zipfile_path, entry)
    if File.extname(zipfile_path) == ".docx"
      # Rewrite word/document.xml through Nokogiri before adding the file,
      # which normalizes the XML that Word was flagging as unreadable.
      Zip::File.open(disk_file_path) do |zip|
        doc = zip.read("word/document.xml")
        xml = Nokogiri::XML.parse(doc)
        zip.get_output_stream("word/document.xml") { |f| f.write(xml.to_s) }
      end
      zipfile.add(zipfile_path, disk_file_path)
    else
      zipfile.add(zipfile_path, disk_file_path)
    end
  end
end
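With that class in place, usage would look something like this (paths are hypothetical):
ZipService.new('/path/to/docs_folder', '/path/to/archive.zip').write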

rubyzip: open zip, modify it temporarily, send to client

I want to temporarily modify a zip file and send the changed file to the client.
Right now I create a file stream and send it:
require 'zip'

zip_stream = Zip::OutputStream.write_buffer do |zip|
  zip.put_next_entry 'new_folder/file'
  zip.print "some text"
end
zip_stream.rewind
send_data zip_stream.read, type: 'application/zip', disposition: 'attachment', filename: 'thing.zip'
I don't get how I can open an existing zip in the filesystem, put an additional file in it, and send it without saving it to disk.
Can you give me a hint?
In the end I did it like this:
require 'zip'

zip_stream = Zip::OutputStream.write_buffer do |new_zip|
  existing_zip = Zip::File.open('existing.zip')
  existing_zip.entries.each do |e|
    new_zip.put_next_entry(e.name)
    new_zip.write e.get_input_stream.read
  end
  new_zip.put_next_entry 'new_file'
  new_zip.print "text"
end
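As in the first snippet, the buffer still has to be rewound before its contents are sent to the client:
zip_stream.rewind
send_data zip_stream.read, type: 'application/zip', disposition: 'attachment', filename: 'thing.zip'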
Check this: https://github.com/rubyzip/rubyzip
require 'rubygems'
require 'zip'

folder = "Users/me/Desktop/stuff_to_zip"
input_filenames = ['image.jpg', 'description.txt', 'stats.csv']
zipfile_name = "/Users/me/Desktop/archive.zip"

Zip::File.open(zipfile_name, Zip::File::CREATE) do |zipfile|
  input_filenames.each do |filename|
    # Two arguments:
    # - The name of the file as it will appear in the archive
    # - The original file, including the path to find it
    zipfile.add(filename, File.join(folder, filename))
  end
  zipfile.get_output_stream("myFile") { |f| f.write "myFile contains just this" }
end

Ruby Build Hash from file

I'm consuming a web service and using Savon to do around 1,000 (paid) requests, parsing the responses into a CSV file.
I save the response (XML/hash) in a file if the parsing fails.
How can I initialize a hash that was saved to a file? (Or should I save the XML and then let Savon turn it into a hash again?)
Extra info:
client = Savon.client do
  wsdl "url"
end

response = client.call(:read_request) do
  message "dat:number" => number
end
I use response.hash to build/parse my CSV data, e.g.:
name = response.hash[:description][:name]
If the building fails, I'm thinking about saving response.hash to a file. But the problem is I don't know how to reuse the saved response (XML/hash) so that an updated version of the building/parsing can be run using the saved response.
You want to serialize the Hash to a file, then deserialize it back again.
You can do it as text with YAML or JSON, or in binary via Marshal.
Marshal
def serialize_marshal filepath, object
  File.open(filepath, "wb") { |f| Marshal.dump object, f }
end

def deserialize_marshal filepath
  File.open(filepath, "rb") { |f| Marshal.load(f) }
end
Marshaled data has a major and minor version number written with it, so it's not guaranteed to always load in another Ruby if the Marshal data version changes.
YAML
require 'yaml'

def serialize_yaml filepath, object
  File.open(filepath, "w") { |f| YAML.dump object, f }
end

def deserialize_yaml filepath
  File.open(filepath, "r") { |f| YAML.load(f) }
end
JSON
require 'json'

def serialize_json filepath, object
  File.open(filepath, "w") { |f| JSON.dump object, f }
end

def deserialize_json filepath
  File.open(filepath, "r") { |f| JSON.load(f) }
end
Anecdotally, YAML is slow, Marshal and JSON are quick.
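As a quick sanity check, a round trip through the Marshal pair should hand back an equal hash (the file name here is hypothetical):
response_hash = { description: { name: "example" } }
serialize_marshal "response.dump", response_hash
deserialize_marshal("response.dump") == response_hash # => true
Note that the JSON round trip comes back with string keys rather than symbols, so lookups like hash[:description] would need adjusting.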
If your code expects to use/manipulate a Ruby hash as demonstrated above, and you want to save the Savon response, use the json gem and do something like:
require 'json'

File.open("responseX.json", "w") do |f|
  f << response.hash.to_json
end
Then if you need to read that file to recreate your response hash:
File.open('responseX.json').each do |line|
  responseHash = JSON.parse(line)
  # do something with responseHash
end
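Since the file holds a single JSON document rather than one document per line, reading it in one go is a safer variant of the same idea:
require 'json'

responseHash = JSON.parse(File.read('responseX.json'))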

How to copy an entire "folder" to another path using S3 with sdk?

When I do it for a single file it works:
aws_s3 = AWS::S3.new(S3_CONFIG)
bucket = aws_s3.buckets[S3_CONFIG["bucket"]]
object = bucket.objects["user/1/photos/image_1.jpg"]
new_object = bucket.objects["users/1/photos/image_1.jpg"]
object.copy_to new_object, {:acl => :public_read}
But when I try to move the entire "/photos" folder, it throws No Such Key. Probably the S3 keys are only the full path of each file. How can I do that?
aws_s3 = AWS::S3.new(S3_CONFIG)
bucket = aws_s3.buckets[S3_CONFIG["bucket"]]
object = bucket.objects["user/1/photos"]
new_object = bucket.objects["users/1/photos"]
object.copy_to new_object, {:acl => :public_read}
Thanks!
Did it:
bucket.objects.with_prefix("user/1/photos").each do |object|
  ...
end
I needed additional code to get this working. Basically, chop the source prefix off each key, then append the remainder to the destination prefix:
def copy_files_s3(bucket_name, source, destination)
  source_bucket = @s3.buckets[bucket_name]
  source_bucket.objects.with_prefix(source).each do |source_object|
    new_file_name = source_object.key.dup
    new_file_name.slice! source
    new_object = source_bucket.objects["#{destination}#{new_file_name}"]
    source_object.copy_to new_object, {acl: :public_read}
  end
end
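Called with hypothetical names, that would look like:
copy_files_s3('my-bucket', 'user/1/photos/', 'users/1/photos/')
Note the trailing slashes: the method slices the source prefix off each key verbatim, so the separators on both prefixes have to line up.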
A "folder" is not an object in S3, that is why you can not get it by key, but the folder path is actually a prefix for all the keys of the objects contained by the folder.
Another important thing, you have to url encode the keys otherwise you may end up with an unknown key error.
require 'aws-sdk'
require 'aws-sdk-s3'
require 'securerandom'
require 'uri'
require 'erb'
include ERB::Util

def copy_folder(source, destination)
  bucket_name = 'your_bucket'
  credentials = Aws::Credentials.new('key', 'secret')
  s3_client = Aws::S3::Client.new(region: 'the_region', credentials: credentials)
  enumerate_keys_with_prefix(source).each do |source_object|
    source_key = url_encode(source_object.key)
    destination_key = source_object.key.dup.sub(source, "")
    s3_client.copy_object({bucket: bucket_name, copy_source: bucket_name + '/' + source_key, key: destination + '/' + destination_key, acl: "public-read"})
  end
end

def enumerate_keys_with_prefix(prefix)
  bucket_name = 'your_bucket'
  credentials = Aws::Credentials.new('key', 'secret')
  s3 = Aws::S3::Resource.new(region: 'the_region', credentials: credentials)
  return s3.bucket(bucket_name).objects(prefix: prefix)
end
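Usage would then be something like (prefixes are hypothetical):
copy_folder('user/1/photos/', 'users/1/photos')
Here the source prefix ends with a slash because it is stripped from each key, while the destination gets a slash appended when the new key is built.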

Using aws-sdk to download files from s3. Encoding not right

I am trying to use aws-sdk to download S3 files to local disk, and I question why my PDF file (which just has a text saying SAMPLE PDF) turns out with apparently empty content.
I guess it has something to do with the encoding... but how can I fix it?
Here is my code:
require 'aws-sdk'

bucket_name = "****"
access_key_id = "***"
secret_access_key = "**"

s3 = AWS::S3.new(
  access_key_id: access_key_id,
  secret_access_key: secret_access_key)
b = s3.buckets[bucket_name]

filen = File.basename("Sample.pdf")
path = "original/90/#{filen}"
o = b.objects[path]

require 'tempfile'
ext = File.extname(filen)
file = File.open("test.pdf", "w", encoding: "ascii-8bit")

# streaming download from S3 to a file on disk
begin
  file.write(o.read) do |chunk|
    file.write(chunk)
  end
end
file.close
If I take out the encoding: "ascii-8bit", I just get an error message: Encoding::UndefinedConversionError: "\xC3" from ASCII-8BIT to UTF-8.
After some research and a tip from a cousin of mine, I finally got this to work.
Instead of using the AWS solution to load the file from Amazon and write it to disk (which was generating a strange PDF file: apparently equal to the original, but with blank content, and Adobe Reader "fixing" it when opening),
I am now using open-uri with SSL verification disabled.
Here is the final code which made my day:
require 'open-uri'

open('test.pdf', 'wb') do |file|
  file << open('https://s3.amazon.com/mybucket/Sample.pdf', :ssl_verify_mode => OpenSSL::SSL::VERIFY_NONE).read
end
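For reference, with aws-sdk v2/v3 the client can also write straight to disk in binary mode via get_object's response_target, which sidesteps the encoding issue entirely (region, bucket, and key below are placeholders based on the question):
require 'aws-sdk-s3'

s3 = Aws::S3::Client.new(
  region: 'us-east-1',
  access_key_id: access_key_id,
  secret_access_key: secret_access_key)

# response_target streams the response body to the given path in binary mode
s3.get_object(response_target: 'test.pdf',
              bucket: bucket_name,
              key: 'original/90/Sample.pdf')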
