How to get the real file from S3 using CarrierWave - ruby

I have an application that reads the content of a file and indexes it. I was storing them in the disk itself, but now I'm using Amazon S3, so the following method doesn't work anymore.
It was something like this:
def perform(docId)
  @document = Document.find(docId)
  if @document.file?
    # You shouldn't create a new version
    @document.versionless do |doc|
      @document.file_content = Cloudoc::Extractor.new.extract(@document.file.file)
      @document.save
    end
  end
end
@document.file returns the FileUploader, and doc.file.file returns the CarrierWave::Storage::Fog::File class.
How can I get the real file?

Calling @document.file.read returns the contents of the file from S3 in CarrierWave.
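If the extractor needs a file on disk rather than a string, one option is to copy the bytes returned by read into a temporary file. This is only a sketch: with_local_copy is a hypothetical helper name, and it assumes the whole file fits in memory.

```ruby
require "tempfile"

# Hypothetical helper (not part of CarrierWave): copies the bytes returned by
# `uploader.read` into a temporary local file and yields that file.
# The temporary file is deleted when the block returns.
def with_local_copy(uploader, basename: "document")
  Tempfile.create(basename) do |tmp|
    tmp.binmode               # write the bytes unchanged
    tmp.write(uploader.read)  # for an uploader, `read` returns the file contents
    tmp.flush
    tmp.rewind
    yield tmp
  end
end
```

In the job above this could be used as with_local_copy(@document.file) { |f| Cloudoc::Extractor.new.extract(f) }, assuming the extractor accepts a File object.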

Related

Is there a way to write a file to S3 in Ruby or Rails?

I am a user of Sidekiq with a use case that requires very heavy logging and file writing. I have some old workers that write directly to the file system. This isn't good because it keeps me from being able to spin up several small utility instances as needed. It has been recommended to me to instead write the files to S3.
Some of these files are pretty large, up to millions of lines in the case of some reports. Is there a way to buffer output to a file on S3?
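Based on your description, you will need the following gem to upload files to S3:
gem 'aws-sdk'
The class below shows how to initialize the store, upload a file, and get the public URL of the stored object. Note that the AWS::S3 namespace it uses belongs to version 1 of the SDK; later versions use the Aws::S3 namespace with a different API.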
class S3Store
  TEST = "app-uploads".freeze

  def initialize file
    @file = file
    @s3 = AWS::S3.new
    @bucket = @s3.buckets[TEST]
  end

  def store
    @obj = @bucket.objects[filename].write(@file.tempfile, acl: :public_read)
    self
  end

  def url
    @obj.public_url.to_s
  end

  private

  def filename
    @filename ||= @file.original_filename.gsub(/[^a-zA-Z0-9_\.]/, '_')
  end
end
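The class is called as shown below. Because it calls tempfile and original_filename on its argument, uploaded_file must be an object that responds to both (for example, a file uploaded through a Rails form), not the string returned by File.read:
image = S3Store.new(uploaded_file).store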
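For the buffering part of the question, one possible approach is to write the report to a local temporary file and upload it in one go once it is complete. The sketch below is an assumption-laden illustration: BufferedReport is a hypothetical class, and the upload step is injected as a callable so the buffering logic does not depend on a particular SDK version.

```ruby
require "tempfile"

# Hypothetical helper: buffers output in a local temporary file, then hands
# the finished file's path to `uploader`, which can be any callable that
# takes (key, path).
class BufferedReport
  def initialize(key, uploader)
    @key = key
    @uploader = uploader
  end

  # Yields an IO to write to. The upload happens once, after the block ends,
  # and the temporary file is deleted afterwards.
  def write
    Tempfile.create("report") do |tmp|
      yield tmp
      tmp.flush
      @uploader.call(@key, tmp.path)
    end
  end
end

# With the aws-sdk-s3 gem (SDK v3) the uploader could look like this,
# assuming credentials and region are configured elsewhere:
#   bucket   = Aws::S3::Resource.new.bucket("app-uploads")
#   uploader = ->(key, path) { bucket.object(key).upload_file(path) }
```

A worker would then call BufferedReport.new("reports/daily.txt", uploader).write { |io| rows.each { |r| io.puts r } }, where the key and rows are placeholders.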

Carrierwave filename method creating issue when uploading file to s3

I have an ImageUploader and I want to upload an image to S3.
Also, I would like to change file name using filename method.
Here is the code:
class ImageUploader < CarrierWave::Uploader::Base
  storage :fog

  def store_dir
    "images"
  end

  def filename
    "#{model.id}_#{SecureRandom.urlsafe_base64(5)}.#{file.extension}" if original_filename
  end
end
The first time I save an image, it gets a correct file name, e.g. 1_23434.png, but when I load the model object from the console, it returns a different image name.
Is there anyone here who can help me? It works fine when I don't use fog.
The problem is in the filename method. On every call, it returns a different value, because SecureRandom.urlsafe_base64(5) generates a new random string each time and the result isn't cached. CarrierWave also uses filename under the hood to build path-related strings. This is why you get a different image name when you run object.image.filename from the console.
The method you are looking for is image_identifier (the image prefix is the name under which your uploader is mounted).
You can try something like:
object.public_send("#{object.image.mounted_as}_identifier") || generate_unique_name
where generate_unique_name is your current filename implementation. Another approach is to store the hash in the model itself for future use.
Also, the official wiki page about creating random and unique filenames might be useful for you.
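One way to keep the random part stable is to cache it on the model instance the first time it is generated. The following is a minimal sketch, not CarrierWave's own API: CachedRandomToken and secure_token are hypothetical names, and the module assumes it is mixed into something that exposes model and mounted_as the way an uploader does.

```ruby
require "securerandom"

# Hypothetical mixin for an uploader. It relies on `model` and `mounted_as`,
# which CarrierWave uploaders expose.
module CachedRandomToken
  # Returns the same random token for every call made against one model
  # instance, by storing it in an instance variable on that model.
  def secure_token
    ivar = :"@#{mounted_as}_secure_token"
    model.instance_variable_get(ivar) ||
      model.instance_variable_set(ivar, SecureRandom.urlsafe_base64(5))
  end
end
```

Inside the uploader, filename could then return "#{model.id}_#{secure_token}.#{file.extension}" if original_filename. The cache only lives as long as the model object, so for a record loaded later the stored identifier remains the value to read.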

Reading in gzipped data from S3 in Ruby

My company has data messages (JSON) stored in gzipped files on Amazon S3. I want to use Ruby to iterate through the files and do some analytics. I started to use the 'aws/s3' gem, and can get each file as an object:
#<AWS::S3::S3Object:0x4xxx4760 '/my.company.archive/data/msg/20131030093336.json.gz'>
But once I have this object, I do not know how to unzip it or even access the data inside of it.
You can see the documentation for S3Object here: http://amazon.rubyforge.org/doc/classes/AWS/S3/S3Object.html.
You can fetch the content by calling your_object.value; see if you can get that far. Then it should be a question of unpacking the gzip blob. Zlib should be able to handle that.
I'm not sure if .value returns you a big string of binary data or an IO object. If it's a string, you can wrap it in a StringIO object to pass it to Zlib::GzipReader.new, e.g.
json_data = Zlib::GzipReader.new(StringIO.new(your_object.value)).read
S3Object has a stream method, which I would hope behaves like an IO object (I can't test that here, sorry). If so, you could do this:
json_data = Zlib::GzipReader.new(your_object.stream).read
Once you have the unzipped json content, you can just call JSON.parse on it, e.g.
JSON.parse Zlib::GzipReader.new(StringIO.new(your_object.value)).read
The following steps worked for me:
1. Download the csv.gz object from S3 into a local file.
2. Open the local csv.gz file with Zlib::GzipReader and read the CSV from it.
file_path = "/tmp/gz/x.csv.gz"
File.open(file_path, "wb") do |f|
  s3_client.get_object(bucket: bucket, key: key) do |gzfiledata|
    f.write gzfiledata
  end
end
data = []
Zlib::GzipReader.open(file_path) do |gz_reader|
  csv_reader = ::FastestCSV.new(gz_reader)
  csv_reader.each do |csv|
    data << csv
  end
end
The S3Object documentation is updated and the stream method is no longer available: https://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/S3Object.html
So, the best way to read data from an S3 object would be this:
json_data = Zlib::GzipReader.new(StringIO.new(your_object.read)).read
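The decompress-and-parse step can be checked without touching S3. The sketch below wraps it in a hypothetical helper, parse_gzipped_json, which takes the raw gzipped bytes (whatever your_object.read returns) and assumes each file holds a single JSON document.

```ruby
require "json"
require "stringio"
require "zlib"

# Hypothetical helper: takes gzipped bytes (for example the string returned
# by reading an S3 object) and returns the parsed JSON.
def parse_gzipped_json(gzipped_bytes)
  # `wrap` closes the reader when the block ends and returns the block's value.
  Zlib::GzipReader.wrap(StringIO.new(gzipped_bytes)) do |gz|
    JSON.parse(gz.read)
  end
end

# Round trip with locally compressed data instead of a real S3 object.
sample = Zlib.gzip('{"msg":"hi","n":1}')
parsed = parse_gzipped_json(sample)
```

Here parsed is the hash {"msg" => "hi", "n" => 1}; with a real object the call would be parse_gzipped_json(your_object.read).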

amazon s3 and carrierwave random image name in bucket does not match in database

I'm using CarrierWave, Rails and Amazon S3. Every time I save an image, the image shows up in S3 and I can see it in the management console with a name like this:
https://s3.amazonaws.com/bucket-name/
uploads/images/10/888fdcfdd6f0eeea_1351389576.png
But in the model, the name is this:
https://bucket-name.s3.amazonaws.com/
uploads/images/10/b3ca26c2baa3b857_1351389576.png
First off, why is the random name different? I am generating it in the uploader like so:
def filename
  if original_filename
    "#{SecureRandom::hex(8)}_#{Time.now.to_i}#{File.extname(original_filename).downcase}"
  end
end
I know it is not generating a random string every call because the wrong url in the model is consistent and saved. Somewhere in the process a new one must be getting generated to save in the model after the image name has been saved and sent to amazon s3. Strange.
Also, can I have the url match the one in terms of s3/bucket instead of bucket.s3 without using a regex? Is there an option in carrierwave or something for that?
CarrierWave by default doesn't store the URL. Instead, it generates it every time you need it.
So every time filename is called it can return a different value, because SecureRandom.hex(8) produces a new string on each call and Time.now.to_i changes as well.
Use the created_at column instead, or add a new column for storing the random id or the full filename.
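The idea of building the name only from stored values can be sketched as a pure function. stable_filename is a hypothetical helper, and token stands for a random id saved in its own column; it is also worth checking in your setup that created_at is already set when CarrierWave first computes the filename for a new record.

```ruby
# Hypothetical helper: builds the filename only from values that are stored
# on the record, so repeated calls give the same result.
def stable_filename(token, created_at, original_filename)
  "#{token}_#{created_at.to_i}#{File.extname(original_filename).downcase}"
end

name = stable_filename("888fdcfdd6f0eeea", Time.at(1351389576), "Photo.PNG")
# name is "888fdcfdd6f0eeea_1351389576.png"
```

Because nothing in the function depends on the current time or a fresh random value, the name CarrierWave derives later matches the one used at upload time.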
I solved it by saving the filename if it was still the original filename. In the uploader, put:
def filename
  if original_filename && original_filename == @filename
    @filename = "#{any_string}#{File.extname(original_filename).downcase}"
  else
    @filename
  end
end
The subdomain versus path difference is not actually an issue. It works with the subdomain, i.e. https://s3.amazonaws.com/bucket-name/ and https://bucket-name.s3.amazonaws.com/ both work fine.

Convert file upload contents to a binary file without saving (Rails)

I have a rails 3 app where I am using the 'face' gem to reference the Face.com API. The api method takes a parameter of the form:
:file => File.new(path_to_file, 'rb')
which works.
I am trying to change the flow of the app so that the file can be uploaded via a form, do some work with RMagick and then make the API call, all without saving the file to disk.
I can generate the RMagick 'Image' with
image = Magick::Image.from_blob(upload_image_field.read)
I can then manipulate the file with RMagick and even save the results into the database with:
self.data = image.to_blob #normally 'upload_image_field.read' if not using RMagick
My problem is that I can't change the image file (or the blob) into something that the API will recognize (without saving it to disk and then referencing the file on disk).
For example using this in the API method fails:
:file => image.to_blob
How do I convert the blob into the same format as
File.new(path_to_file, 'rb')
Thanks
OK, I could be wrong on this one... but I wanted to dig this up. Unfortunately, you just have to live with saving it as a file. The reason is because the API makes an HTTP POST. Unfortunately, this needs to be a file.
References from: [https://github.com/rociiu/face/tree/master/lib/face]:
recognition.rb:
def faces_detect(opts={})
  opts.assert_valid_keys(:urls, :file, :detector, :attributes, :callback, :callback_url)
  make_request(:faces_detect, opts)
end
utils.rb:
def make_request(api_method, opts={})
  ....
  response = JSON.parse( RestClient.post(API_METHODS[ api_method ], opts.merge(api_crendential)).body )
  ....
end
So, why is it a problem to save to a file then?
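If a file on disk is acceptable as long as it is short-lived, a temporary file can bridge the gap between the blob and the File the API expects. This is a sketch under assumptions: with_blob_as_file is a hypothetical helper, and the .jpg default extension is only a placeholder for whatever format the image really has.

```ruby
require "tempfile"

# Hypothetical helper: writes a binary blob to a temporary file, reopens it
# read-only in binary mode (like File.new(path, 'rb')) and yields that File.
# The temporary file is deleted when the block returns.
def with_blob_as_file(blob, extension: ".jpg")
  Tempfile.create(["upload", extension]) do |tmp|
    tmp.binmode
    tmp.write(blob)
    tmp.close
    File.open(tmp.path, "rb") { |file| yield file }
  end
end
```

The API call would then go inside the block, for example with_blob_as_file(image.to_blob) { |f| client.faces_detect(:file => f) }, where client stands for however the face gem client is set up in the app.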

Resources