I am trying to sync files using the Go library for rclone.
https://github.com/rclone/rclone/blob/master/fs/sync/sync.go
err = sync.Sync(ctx, dstFs, srcFs, false)
This method seems to only return an error, while I'm trying to get additional metadata such as the bytes transferred by the sync. I've looked at the other methods and I've not seen any that return anything but an error. How do I get the bytes transferred by the rclone Go lib via a sync?
I'm not sure what to try.
We wanted to download files from a remote URL into memory and then upload them to some public cloud. I am planning to use IO.copy_stream in Ruby. However, I am not sure whether this can be achieved with it, because I also need to keep memory and CPU usage low enough that it does not hamper performance.
Any suggestion or example of how to achieve this via copy_stream in Ruby, or is there another library that can achieve this with good performance?
https://ruby-doc.org/core-2.5.5/IO.html
You can set up src/dst to be simple IO abstractions that respond to read/write:
src = IO.popen(["ssh", srchost, "cat /path/to/source_file | gzip"], "r")
dst = IO.popen(["ssh", dsthost, "gunzip > /path/to/dest_file"], "w")
IO.copy_stream(src, dst)
src.close
dst.close
Set up src to be the downloadable file.
Set up dst to be the cloud resource, with write permission.
Make sure the two are compliant with sendfile().
sendfile() is a kernel-based copy-stream procedure. In terms of RAM use and performance, there is nothing faster: your application will not be involved with the transfer at all.
For sendfile(), the output socket must have zero-copy support and the input file must have mmap() support. In general, this means you have already downloaded the file to a local file, you do not change the downloaded file during the copy, and you have an open socket to the output.
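For the original use case (download from a remote URL, then push to a public cloud) the same pattern works with two piped commands. Here is a minimal, untested sketch, assuming curl and the AWS CLI are available; the URL, bucket and object names are placeholders:
src = IO.popen(["curl", "-sS", "https://example.com/path/to/source_file"], "r")
dst = IO.popen(["aws", "s3", "cp", "-", "s3://my-bucket/dest_file"], "w")
# copy_stream moves the data in fixed-size chunks, so memory use stays flat
# no matter how large the file is.
IO.copy_stream(src, dst)
src.close
dst.close
Any cloud CLI that accepts stdin works the same way, e.g. gsutil cp - gs://my-bucket/dest_file for Google Cloud Storage.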
My goal is to download a large zip file (15 GB) and extract it to Google Cloud using Laravel Storage (https://laravel.com/docs/8.x/filesystem) and https://github.com/spatie/laravel-google-cloud-storage.
My "wish" is to sort of stream the file to Cloud Storage, so I do not need to store the file locally on my server (because it is running in multiple instances, and I want to have the disk size as small as possible).
Currently, there does not seem to be a way to do this without having to save the zip file on the server. Which is not ideal in my situation.
Another idea is to use a Google Cloud Function (eg with Python) to download, extract and store the file. However, it seems like Google Cloud Functions are limited to a max timeout of 9 mins (540 seconds). I don't think that will be enough time to download and extract 15GB...
Any ideas on how to approach this?
You should be able to use streams for uploading big files. Here’s the example code to achieve it:
$disk = Storage::disk('gcs');
// Passing a stream resource instead of the file contents lets the adapter
// stream the upload rather than load the whole file into memory.
$disk->put($destFile, fopen($sourceZipFile, 'r'));
There are already several articles about starting downloads from flutter web.
I link this answer as example:
https://stackoverflow.com/a/64075629/15537341
The procedure is always similar: request something from a server, maybe convert the body bytes to base64, and then use an AnchorElement to start the download.
It works perfectly for small files. Let's say, 30MB, no problem.
The whole file has to be loaded into the browser first, and only then does the user's download start.
What to do if the file is 10GB?
Is there a way to read a stream from the server and write a stream to the user's download? Or is another way preferable, such as copying the file to a special folder that is served directly by the webserver?
So I am trying to upload a file with Celery, which uses Redis, on my Heroku website. I am trying to upload a .exe file of about 20MB. Heroku says that on their hobby-dev plan the maximum memory is 25MB. But when I upload the file through Celery (turning it from bytes to base64, decoding it and sending it to the function) I get the error kombu.exceptions.OperationalError: OOM command not allowed when used memory > 'maxmemory'. Keep in mind that when I try to upload e.g. a 5MB file it works fine, but 20MB doesn't. I am using Python with the Flask framework.
There are two ways to store files with a DB (Redis is just an in-memory DB). You can either store the blob itself in the DB (for small files, say a few KBs), or you can store the file elsewhere, for example on disk, and keep only a pointer to the file in the DB.
So for your case, store the file on disk and place only the file pointer in the DB.
The catch here is that Heroku has an ephemeral filesystem that gets erased every 24 hours, or whenever you deploy a new version of the app.
So you'll have to do something like this:
Write a small function to store the file on the local disk (this is temporary storage) and return the path to the file
Add a task to Celery with the file path, i.e. the parameter to the Celery task will be the file path, not a serialized blob of 20MB of data.
The Celery worker process picks up the task you just enqueued when it gets free and executes it.
If you need to access the file later, then since the local Heroku disk is only temporary, you'll have to place the file in some permanent storage like AWS S3.
(The reason we go through all these hoops instead of placing the file directly in S3 is that access to the local disk is fast, while S3 might be in some other server farm at another location and it takes time to save the file there. Your web process might appear slow or stuck if you try to write the file to S3 in the main process.)
Environment:
Windows 10 x64
Ruby 2.1.0 32 bit
Chef 12.12.15
Azure Gem 0.7.9
Azure-Storage Gem 0.12.1.preview
I am trying to download a ~880MB blob from a container. When I do, it throws the following error after the Ruby process hits ~500MB in size:
C:/opscode/chefdk/embedded/lib/ruby/2.1.0/net/protocol.rb:102:in `read': failed to allocate memory (NoMemoryError)
I have tried this both inside and outside of Ruby, and with both the Azure gem and the Azure-Storage gem. The result is the same with all four combinations (Azure in Chef, Azure in Ruby, Azure-Storage in Chef, Azure-Storage in Ruby).
Most of the troubleshooting I have found for these kinds of problems suggests streaming or chunking the download, but there does not appear to be a corresponding method or get_blob option to do so.
Code:
require 'azure/storage'
# vars
account_name = "myacct"
container_name = "myfiles"
access_key = "mykey"
installs_dir = "myinstalls"
# directory for files
create_dir = 'c:/' + installs_dir
Dir.mkdir(create_dir) unless File.exists?(create_dir)
# create azure client
Azure::Storage.setup(:storage_account_name => account_name, :storage_access_key => access_key)
azBlobs = Azure::Storage::Blob::BlobService.new
# get list of blobs in container
dlBlobs = azBlobs.list_blobs(container_name)
# download each blob to directory
dlBlobs.each do |dlBlob|
puts "Downloading " + container_name + "/" + dlBlob.name
portalBlob, blobContent = azBlobs.get_blob(container_name, dlBlob.name)
File.open("c:/" + installs_dir + "/" + portalBlob.name, "wb") {|f|
f.write(blobContent)
}
end
I also tried using IO.binwrite() instead of File.open() and got the same result.
Suggestions?
As @coderanger said, your issue was caused by using get_blob, which loads the data into memory all at once. There are two ways to resolve it.
According to the official REST reference:
The maximum size for a block blob created via Put Blob is 256 MB for version 2016-05-31 and later, and 64 MB for older versions. If your blob is larger than 256 MB for version 2016-05-31 and later, or 64 MB for older versions, you must upload it as a set of blocks. For more information, see the Put Block and Put Block List operations. It's not necessary to also call Put Blob if you upload the blob as a set of blocks.
So, for a blob which consists of blocks, the first way is to get the block list via list_blob_blocks and write these blocks one by one to a local file.
The second way is to generate a blob URL with a SAS token via signed_uri (as in this test code), then download the blob via streaming and write it to a local file.
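A related option is to read the blob in fixed-size ranges so the whole ~880MB never sits in memory at once. This is a rough, untested sketch that reuses the variable names from the question and assumes your gem version's get_blob supports the :start_range/:end_range options:
chunk_size = 4 * 1024 * 1024  # 4MB per request; the size is arbitrary
dlBlobs.each do |dlBlob|
  total = dlBlob.properties[:content_length]
  File.open("c:/" + installs_dir + "/" + dlBlob.name, "wb") do |f|
    offset = 0
    while offset < total
      # Ask the service for just this byte range instead of the whole blob
      _, chunk = azBlobs.get_blob(container_name, dlBlob.name,
                                  :start_range => offset,
                                  :end_range => [offset + chunk_size, total].min - 1)
      f.write(chunk)
      offset += chunk_size
    end
  end
end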
The problem is that get_blob has to load the data into memory at once rather than streaming it to disk. In Chef we have the remote_file resource to help with this kind of streaming download, but you would need to get the plain URL for the blob rather than downloading it using their gem.
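A rough sketch of that idea inside a recipe (untested; the exact signed_uri arguments differ between azure-storage gem versions, so treat the SAS generation below as an assumption):
signer = Azure::Storage::Core::Auth::SharedAccessSignature.new(account_name, access_key)
blob_url = "https://#{account_name}.blob.core.windows.net/#{container_name}/#{dlBlob.name}"
# Read-only SAS valid for one hour; the argument list may vary by gem version.
sas_url = signer.signed_uri(URI(blob_url), false,
                            :permissions => 'r',
                            :expiry => (Time.now + 3600).utc.strftime("%Y-%m-%dT%H:%M:%SZ"))

remote_file "c:/#{installs_dir}/#{dlBlob.name}" do
  source sas_url.to_s
end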
I was just looking into using the azure/storage/blob library for a dev-ops project I was working on, and it seems to me that the implementation is quite basic and does not utilise the full underlying API available. For example, uploads are slow when streamed from a file, most likely because it's not uploading chunks in parallel. I don't think this library is production-ready, and the exposed Ruby API is lacking. It's open source, so if anybody has some time, they can help to contribute.