Upload CSV payload to Google Storage using Ruby

I am trying to upload a string payload to Google Cloud Storage directly using Ruby, but it seems there's no direct way to do this without creating a temporary file on disk.
I am using the CSV library to generate a string payload.
The current approach is to store the string payload in a temporary file and then use code like the following to upload the file to Google Cloud Storage:
require "google/cloud/storage"
storage = Google::Cloud::Storage.new
bucket = storage.bucket bucket_name
file = bucket.create_file local_file_path, file_name
Is there a way to avoid creating a temporary file to upload?

I found documentation which states that we can pass any File-like object, such as StringIO, to upload a string payload directly.
Here's my code:
require "google/cloud/storage"
storage = Google::Cloud::Storage.new
bucket = storage.bucket "my-todo-app"
bucket.create_file StringIO.new("Hello world!"), "hello-world.txt"
See the "Creating a File" section of the documentation for an example:
https://googleapis.dev/ruby/google-cloud-storage/latest/Google/Cloud/Storage/Bucket.html#create_file-instance_method
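Putting this together with the original CSV use case, a minimal sketch might look like this (the bucket name, object name, and CSV data are placeholders):

require "csv"
require "stringio"
require "google/cloud/storage"

# Build the CSV payload entirely in memory
csv_string = CSV.generate do |csv|
  csv << ["id", "name"]
  csv << [1, "Alice"]
end

storage = Google::Cloud::Storage.new
bucket = storage.bucket "my-todo-app"

# Wrap the string in StringIO so create_file can treat it as a File-like object
bucket.create_file StringIO.new(csv_string), "report.csv", content_type: "text/csv"

This avoids the temporary file entirely, since the payload never leaves memory.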

Related

AWS S3 bucket - Moving all XML files from one S3 bucket to another S3 bucket using a Python Lambda

In my case, I want to read all the XML files from my S3 bucket, parse them, and then move the parsed files to another S3 bucket.
The parsing logic is working fine, but I am not able to move the files.
This is the example I am trying to use:
import boto3

s3 = boto3.resource('s3')
src_bucket = s3.Bucket('bucket1')
dest_bucket = s3.Bucket('bucket2')

# Copy every object from the source bucket into the destination bucket under sample/
for obj in src_bucket.objects.all():
    filename = obj.key.split('/')[-1]
    dest_bucket.put_object(Key='sample/' + filename, Body=obj.get()["Body"].read())
The above code is not working for me at all (I have given the S3 folder full access, and for testing I have also granted public full access).
Thanks
Check out this answer. You could use Python's endswith() function, pass ".xml" to it to get a list of those files, copy them to the destination bucket, and then delete them from the source bucket.

Manually populate an ImageField

I have a models.ImageField which I sometimes populate with the corresponding forms.ImageField. Sometimes, instead of using a form, I want to update the image field with an AJAX POST. I am passing both the image filename and the image content (base64 encoded), so in my API view I have everything I need. But I do not really know how to do this manually, since I have always relied on form processing, which automatically populates the models.ImageField.
How can I manually populate the models.ImageField having the filename and the file contents?
EDIT
I have reached the following status:
instance.image.save(file_name, File(StringIO(data)))
instance.save()
And this is updating the file reference, using the right value configured in upload_to in the ImageField.
But it is not saving the image. I would have imagined that the first .save call would:
Generate a file name in the configured storage
Save the file contents to the selected file, including handling of any kind of storage configured for this ImageField (local FS, Amazon S3, or whatever)
Update the reference to the file in the ImageField
And the second .save would actually save the updated instance to the database.
What am I doing wrong? How can I make sure that the new image content is actually written to disk, in the automatically generated file name?
EDIT2
I have a very unsatisfactory workaround, which is working but is very limited. This illustrates the problems that using the ImageField directly would solve:
# TODO: workaround because I do not yet know how to correctly populate the ImageField
# This is very limited because:
#   - it only uses the local filesystem (no AWS S3, ...)
#   - it does not provide the advanced path splitting provided by upload_to
local_file = os.path.join(settings.MEDIA_ROOT, file_name)
with open(local_file, 'wb') as f:
    f.write(data)
instance.image = file_name
instance.save()
EDIT3
So, after some more playing around, I have discovered that my first implementation is doing the right thing, but silently failing if the passed data has the wrong format (I was mistakenly passing the base64-encoded string instead of the decoded data). I'll post this as a solution.
Just save the file and the instance:
from StringIO import StringIO  # use io.BytesIO on Python 3
from django.core.files import File

instance.image.save(file_name, File(StringIO(data)))
instance.save()
No idea where the docs for this use case are.
You can use InMemoryUploadedFile directly to save data:
import base64
import sys
import cStringIO  # use io.BytesIO on Python 3
from django.core.files.uploadedfile import InMemoryUploadedFile

file = cStringIO.StringIO(base64.b64decode(request.POST['file']))
image = InMemoryUploadedFile(file,
                             field_name='file',
                             name=request.POST['name'],
                             content_type="image/jpeg",
                             size=sys.getsizeof(file),
                             charset=None)
instance.image = image
instance.save()

Reading in gzipped data from S3 in Ruby

My company has data messages (JSON) stored in gzipped files on Amazon S3. I want to use Ruby to iterate through the files and do some analytics. I started using the 'aws/s3' gem, and can get each file as an object:
#<AWS::S3::S3Object:0x4xxx4760 '/my.company.archive/data/msg/20131030093336.json.gz'>
But once I have this object, I do not know how to unzip it or even access the data inside of it.
You can see the documentation for S3Object here: http://amazon.rubyforge.org/doc/classes/AWS/S3/S3Object.html.
You can fetch the content by calling your_object.value; see if you can get that far. Then it should be a question of unpacking the gzip blob. Zlib should be able to handle that.
I'm not sure if .value returns you a big string of binary data or an IO object. If it's a string, you can wrap it in a StringIO object to pass it to Zlib::GzipReader.new, e.g.
json_data = Zlib::GzipReader.new(StringIO.new(your_object.value)).read
S3Object has a stream method, which I would hope behaves like an IO object (I can't test that here, sorry). If so, you could do this:
json_data = Zlib::GzipReader.new(your_object.stream).read
Once you have the unzipped json content, you can just call JSON.parse on it, e.g.
JSON.parse Zlib::GzipReader.new(StringIO.new(your_object.value)).read
For me, the following steps worked:
Read the csv.gz object from S3 and write it to a local file
Open the local csv.gz file with Zlib::GzipReader and read the CSV from it
file_path = "/tmp/gz/x.csv.gz"

# Step 1: download the csv.gz object from S3 and write it to a local file
File.open(file_path, "wb") do |f|
  s3_client.get_object(bucket: bucket, key: key) do |gzfiledata|
    f.write gzfiledata
  end
end

# Step 2: decompress the local file and read the CSV rows from it
data = []
Zlib::GzipReader.open(file_path) do |gz_reader|
  csv_reader = ::FastestCSV.new(gz_reader)
  csv_reader.each do |csv|
    data << csv
  end
end
The S3Object documentation is updated and the stream method is no longer available: https://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/S3Object.html
So, the best way to read data from an S3 object would be this:
json_data = Zlib::GzipReader.new(StringIO.new(your_object.read)).read
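If writing to a local file is not needed, a minimal in-memory sketch with the newer aws-sdk-s3 client might look like this (the region is an assumption; the bucket and key are placeholders taken from the question):

require "aws-sdk-s3"
require "zlib"
require "stringio"
require "json"

s3_client = Aws::S3::Client.new(region: "us-east-1")

# Fetch the gzipped object body as a string, without touching the filesystem
body = s3_client.get_object(bucket: "my.company.archive", key: "data/msg/20131030093336.json.gz").body.read

# Decompress in memory and parse the JSON payload
json_data = JSON.parse(Zlib::GzipReader.new(StringIO.new(body)).read)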

SharpGS how to download file?

I am using SharpGS for Google Cloud Storage. I can upload a file using the GetBucket("some-bucket").AddObject() method, but I could not download the file using the following code:
GetBucket("some-bucket").GetObjectHead("some-file").Content
It gives me a null value for the returned bytes.
Any ideas? Thanks.
The GetObjectHead looks up the object using a HEAD request, so it doesn't retrieve the content.
If you take a look at the demo code, you can retrieve object contents by listing the bucket:
var bucket = GetBucket("some-bucket");
foreach (var o in bucket.Objects) {
    Console.WriteLine(Encoding.UTF8.GetString(o.Retrieve().Content));
}
There doesn't seem to be a way to get an IObject without listing the bucket. I would suggest adding a method to the IObjectContent class returned from GetObjectHead to fetch the IObject. The project is on GitHub.

How to get the real file from S3 using CarrierWave

I have an application that reads the content of a file and indexes it. I was storing the files on disk, but now I'm using Amazon S3, so the following method doesn't work anymore.
It was something like this:
def perform(docId)
  @document = Document.find(docId)
  if @document.file?
    # You shouldn't create a new version
    @document.versionless do |doc|
      @document.file_content = Cloudoc::Extractor.new.extract(@document.file.file)
      @document.save
    end
  end
end
@document.file returns the FileUploader, and doc.file.file returns the CarrierWave::Storage::Fog::File class.
How can I get the real file?
Calling @document.file.read will get you the contents of the file from S3 in CarrierWave.
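A minimal sketch of how this might slot into the worker above (assuming Cloudoc::Extractor can accept a StringIO wrapping the raw contents, which is not confirmed by its API):

require "stringio"

def perform(docId)
  @document = Document.find(docId)
  return unless @document.file?

  # file.read downloads the object from S3 and returns its contents as a string
  contents = @document.file.read

  @document.versionless do |doc|
    @document.file_content = Cloudoc::Extractor.new.extract(StringIO.new(contents))
    @document.save
  end
end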
