How to move an object from one bucket to another? - minio

I have a large number of objects stored in MinIO, and I need to move them from one bucket to another. Because of the number of objects (and the size of the objects themselves), I do not want to load them into memory.
The only way I have found so far is to copy the objects to the destination bucket and then remove them from the source bucket.
Is there a way to move them with one command (like mv)?

@yaskovdev The S3 API does not offer mv-like functionality, so the steps you have described are the only way to do it.
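For reference, a minimal sketch of that copy-then-remove sequence using the MinIO Python SDK might look like the following (the endpoint, credentials, and bucket/object names are placeholders). copy_object performs a server-side copy, so the object data is never streamed through your client's memory:

from minio import Minio
from minio.commonconfig import CopySource

# Placeholder endpoint and credentials -- adjust for your deployment.
client = Minio(
    "minio.example.com:9000",
    access_key="ACCESS_KEY",
    secret_key="SECRET_KEY",
    secure=True,
)

# Server-side copy: the data is copied inside MinIO, not downloaded here.
# Note: a single copy_object call supports source objects up to 5 GiB;
# larger objects would need compose_object instead.
client.copy_object(
    "my-bucketname",
    "my-objectname",
    CopySource("my-source-bucketname", "my-objectname"),
)

# Remove the original to complete the "move".
client.remove_object("my-source-bucketname", "my-objectname")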

Since April 2020 the MinIO client utility (mc) does support a move command:
$ mc mv -h
NAME:
  mc mv - move objects

USAGE:
  mc mv [FLAGS] SOURCE [SOURCE...] TARGET

FLAGS:
  --recursive, -r                     move recursively
  --older-than value                  move objects older than L days, M hours and N minutes
  --newer-than value                  move objects newer than L days, M hours and N minutes
  --storage-class value, --sc value   set storage class for new object(s) on target
  --encrypt value                     encrypt/decrypt objects (using server-side encryption with server managed keys)
  --attr value                        add custom metadata for the object
  --continue, -c                      create or resume move session
  --preserve, -a                      preserve filesystem attributes (mode, ownership, timestamps)
  --disable-multipart                 disable multipart upload feature
  --encrypt-key value                 encrypt/decrypt objects (using server-side encryption with customer provided keys)
  --config-dir value, -C value        path to configuration folder (default: "/Users/prerok/.mc")
  --quiet, -q                         disable progress bar display
  --no-color                          disable color theme
  --json                              enable JSON formatted output
  --debug                             enable debug output
  --insecure                          disable SSL certificate verification
  --help, -h                          show help
The S3 API does not support a move operation, so the mc utility actually performs a copy first and then removes the source object. Source:
https://github.com/minio/mc/blob/133dd1f7da237a91dc291cbf8f3a5ad66fffc425/cmd/mv-main.go#L363

Example:
// Create object "my-objectname" in bucket "my-bucketname" by copying from object
// "my-objectname" in bucket "my-source-bucketname".
minioClient.copyObject(
    CopyObjectArgs.builder()
        .bucket("my-bucketname")
        .object("my-objectname")
        .source(
            CopySource.builder()
                .bucket("my-source-bucketname")
                .object("my-objectname")
                .build())
        .build());
More info here: the MinIO documentation.

Related

How can I access MinIO files on the file system?

On the underlying server filesystem, MinIO seems to store the content of an uploaded file (e.g. X) in a file called xl.meta in a directory bearing the original file name (e.g. X/xl.meta).
However, the file xl.meta is encoded. How can I access the original file content on the server file system itself (i.e. see the text inside a plain-text file, or play a sound file with the respective application)?
It would not be possible, because the object you are seeing on the backend filesystem is not the actual object; it is only the erasure-coded part(s) of it, split across all the drives in a given erasure set. You could do it if you were just using fs mode (single node, single drive), but in an erasure-coded environment you need quorum to be able to download the object, and only via an S3-supported method, not directly from the backend. Technically not quorum, rather n/2 if you just want to read the object, but as a rule you should avoid doing anything on the backend filesystem.
If you just want to see the contents of xl.meta, and not recover the file itself, you can use something like mc support inspect myminio/test/syslog/xl.meta --export=json (or you can build a binary from https://github.com/minio/minio/tree/master/docs/debugging/xl-meta, but using mc is probably easier).

How to delete all objects with a specific name from various subfolders in S3 which are older than n days, using the AWS Ruby SDK

I have a requirement to delete files with the prefix application.log which are older than 5 days in an S3 folder.
The files are present inside log-bucket/main-shell/apps/app-main-shell-55f79d74fc-4sx6c/helpkit.
Is there a way to list and delete the files recursively using the AWS Ruby SDK?
Rather than writing your own code, you can set up an S3 lifecycle rule with a prefix using the Ruby SDK.
In the lifecycle rule, specify that objects under the particular path are deleted after 5 days.
Below are reference links for configuring the S3 lifecycle and the Ruby SDK.
https://docs.aws.amazon.com/sdkforruby/api/Aws/S3/BucketLifecycle.html
https://docs.aws.amazon.com/AmazonS3/latest/dev/lifecycle-configuration-examples.html
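Purely for illustration (the Ruby SDK exposes the equivalent call as put_bucket_lifecycle_configuration), a minimal sketch of such a rule in Python/boto3 might look like this; the bucket name and prefix are placeholders, and lifecycle filters match a key prefix from the start of the key:

import boto3

s3 = boto3.client("s3")

# Expire (delete) objects under the given prefix 5 days after creation.
# Bucket name and prefix are placeholders for this example.
s3.put_bucket_lifecycle_configuration(
    Bucket="log-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-app-logs",
                "Filter": {"Prefix": "main-shell/apps/"},
                "Status": "Enabled",
                "Expiration": {"Days": 5},
            }
        ]
    },
)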
If you want to perform a 1-time clean-up of objects under a single key prefix, you can use the batch operations on the objects collection.
s3 = Aws::S3::Resource.new()
s3.bucket('bucket-name').objects(prefix: 'some/key/prefix/').batch_delete!
This will list objects with the given key prefix and then issue a batch delete for each page of results. The more objects with the given prefix, the more API calls. It should be 2 requests (1x list, 1x batch delete) per 1k objects to delete.
Please note, this is a destructive operation. Make sure your key prefix is correct before you issue the batch delete.
If you want to do this on a frequent basis, then I would use a bucket lifecycle configuration.

zfs list -t snapshot to identify if a pool was changed

I was using zfs list -t snapshot to identify if the pool had changed. If the last snapshot was showing that it uses some space, then I was sure the pool had changed and I (actually a script) took another snapshot.
What I noticed is that if I move a file from one folder to another folder in the pool, the command zfs list -t snapshot still returns 0 as the size of the last snapshot. That's not good for me, as I need to identify whether my pool was changed. What am I doing wrong? Is there another, more reliable way of identifying whether the pool was actually changed?
Snapshots show how your file system looked at a specific point in the past (including its size). If you remove or modify a file afterwards, the blocks that are different (meaning the blocks that are now deleted or modified) will remain on the filesystem (think of them as locked; or similar to how hard links work on Unix, as long as a reference to a file exists, it will not be deleted).
On the contrary, if you just add a new file, the old blocks stay the same, so the snapshot will not differ in size. Moving your folder inside the same file system does not add new data, so it will not show in size.
To view the differences, you can compare the current state with the last snapshot by using zfs diff pool/dataset@snapshot pool/dataset. For details on using the output in scripts, see my other answer here.
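If your snapshotting script can shell out to zfs, a minimal sketch of that check might look like this (the dataset and snapshot names are placeholders, and it assumes the script has permission to run zfs diff and zfs snapshot):

import subprocess

DATASET = "tank/data"              # placeholder dataset name
LAST_SNAPSHOT = "tank/data@last"   # placeholder: the most recent snapshot

# 'zfs diff snapshot filesystem' lists files created, modified, removed or
# renamed (including moves) since the snapshot; empty output means no change.
diff = subprocess.run(
    ["zfs", "diff", LAST_SNAPSHOT, DATASET],
    capture_output=True, text=True, check=True,
)

if diff.stdout.strip():
    # Something changed since the last snapshot -- take a new one.
    subprocess.run(["zfs", "snapshot", f"{DATASET}@auto-next"], check=True)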

Get the size of a folder in Amazon S3 using Laravel

I want to get the size of a folder in Laravel without looping through all the files. The folder is in Amazon S3. My current code is:
$files = Storage::allFiles($dir);
foreach ($files as $file) {
    $size += Storage::size($file);
}
I want to avoid the looping. Is there any way to accomplish this?
Using listContents you can get an array of files including filesizes and then you can map that array into a total size.
$disk = Storage::disk('s3');
$size = array_sum(array_map(function ($file) {
    return (int) $file['size'];
}, array_filter($disk->listContents('your_folder', true /*<- recursive*/), function ($file) {
    return $file['type'] == 'file';
})));
The other option you have, if you can deal with day-old stats, is the newly released 'S3 Storage Inventory' feature.
S3 can put out a daily (or weekly) file that has an inventory of all of your objects in the folder, including size:
http://docs.aws.amazon.com/AmazonS3/latest/dev/storage-inventory.html
Amazon S3 inventory is one of the tools Amazon S3 provides to help manage your storage. You can simplify and speed up business workflows and big data jobs using the Amazon S3 inventory, which provides a scheduled alternative to the Amazon S3 synchronous List API operation. Amazon S3 inventory provides a comma-separated values (CSV) flat-file output of your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix (that is, objects that have names that begin with a common string).

You can configure what object metadata to include in the inventory, whether to list all object versions or only current versions, where to store the inventory list flat-file output, and whether to generate the inventory on a daily or weekly basis. You can have multiple inventory lists configured for a bucket. For information about pricing, see Amazon S3 Pricing.
There is no way to compute the size of a folder without recursively looping through it.
A quick command line solution is using du.
du -hs /path/to/directory will output the disk usage.
-h is to get the numbers "human readable", e.g. get 140M instead of 143260 (size in KBytes)
-s is for summary (otherwise you'll get not only the size of the folder but also the size of everything in the folder separately)
Referenced: https://askubuntu.com/questions/1224/how-do-i-determine-the-total-size-of-a-directory-folder-from-the-command-line
Amazon CloudWatch provides automatic metrics for the number of objects stored in a bucket and the storage space occupied. I'm not sure how often these metrics are updated, but that would be the simplest to use. However, this measures the whole bucket rather than just a particular folder.
See: Amazon Simple Storage Service Metrics and Dimensions
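Again for illustration (this is Python/boto3 rather than Laravel/PHP, but the namespace, metric name, and dimensions are the same from any SDK), fetching that bucket-level metric might look like this. Note that BucketSizeBytes is reported roughly once per day and covers the whole bucket, not a single folder:

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# BucketSizeBytes is published about once a day per storage class, so query
# a couple of days back and take the latest datapoint.
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/S3",
    MetricName="BucketSizeBytes",
    Dimensions=[
        {"Name": "BucketName", "Value": "your-bucket"},   # placeholder
        {"Name": "StorageType", "Value": "StandardStorage"},
    ],
    StartTime=datetime.utcnow() - timedelta(days=2),
    EndTime=datetime.utcnow(),
    Period=86400,
    Statistics=["Average"],
)

datapoints = sorted(resp["Datapoints"], key=lambda d: d["Timestamp"])
if datapoints:
    print(f"Bucket size: {datapoints[-1]['Average']:.0f} bytes")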

determine if file is complete

I am trying to write a Ruby video transformer script (using ffmpeg) that depends on .mov files being FTPed to a server.
The problem I've run into is that when a large file is uploaded by a user, the watch script (using rb-inotify) attempts to execute (and run the transcoder) before the mov is completely uploaded.
I'm a complete noob. But I'm trying to discover if there is a way for me to be able to ensure my watch script doesn't run until the file(s) is/are completely uploaded.
My watch script is here:
watch_me = INotify::Notifier.new
watch_me.watch("/directory_to_my/videos", :close_write) do |directories|
  load '/directory_to_my/videos/.transcoder.rb'
end
watch_me.run
Thank you for any help you can provide.
Just relying on inotify(7) to tell you when a file has been updated isn't a great fit for telling when an upload is 'complete' -- an FTP session might time out and be re-started, for example, allowing a user to upload a file in chunks over several days as connectivity is cheap or reliable or available. inotify(7) only ever sees file open, close, rename, and access, but never the higher-level event "I'm done modifying this file", as the user would understand it.
There are two mechanisms I can think of: one is to have uploads go initially into one directory and ask the user to move the file into another directory when the upload is complete. The other creates some file meta-data on the client and uses that to "know" when the upload is complete.
Move completed files manually
If your users upload into the directory ftp/incoming/temporary/, they can upload the file in as many connections as required. Once the file is "complete", they can rename the file (rename ftp/incoming/temporary/hello.mov ftp/incoming/complete/hello.mov) and your rb-inotify interface looks for file renames in the ftp/incoming/complete/ directory, and starts the ffmpeg(1) command.
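For illustration only, here is a rough sketch of watching the complete/ directory for files renamed into it, using the Python watchdog library (your rb-inotify script can watch for the underlying inotify moved_to event instead). The path is a placeholder, and a rename arriving from an unwatched directory may surface as a create event, so both handlers are wired up:

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

WATCH_DIR = "/ftp/incoming/complete"   # placeholder path

class CompletedUploadHandler(FileSystemEventHandler):
    def on_moved(self, event):
        # A rename within or into the watched tree.
        if not event.is_directory:
            self.handle(event.dest_path)

    def on_created(self, event):
        # A rename from an unwatched directory can show up as a create.
        if not event.is_directory:
            self.handle(event.src_path)

    def handle(self, path):
        print(f"upload complete: {path}")
        # Hand the path to the transcoder here.

observer = Observer()
observer.schedule(CompletedUploadHandler(), WATCH_DIR, recursive=False)
observer.start()
observer.join()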
Generate metadata
For a transfer to be "complete", you're really looking for two things:
The file is the same size on both systems.
The file is identical on both systems.
Since "identical" is otherwise difficult to check, most people content themselves with checking if the contents of the file, when run through a cryptographic hash function such as MD5 or SHA-1 (or better, SHA-224, SHA-256, SHA-384, or SHA-512) functions. MD5 is quite fine if you're guarding against incomplete transmission but if you intend on using the output of the function for other means, using a stronger function would be wise.
MD5 is really tempting though, since tools to create and validate MD5 hashes are very widespread: md5sum(1) on most Linux systems, md5(1) on most BSD systems (including OS X).
$ md5sum /etc/passwd
c271aa0e11f560af419557ef49a27ac8 /etc/passwd
$ md5sum /etc/passwd > /tmp/sums
$ md5sum -c /tmp/sums
/etc/passwd: OK
The md5sum -c command asks the md5sum(1) program to check the file of hashes and filenames for correctness. It looks a little silly when used on just a single file, but when you've got dozens or hundreds of files, it's nice to let the software do the checking for you. For example: http://releases.mozilla.org/pub/mozilla.org/firefox/releases/3.0.19-real-real/MD5SUMS -- Mozilla has published such files with 860 entries -- checking them by hand would get tiring.
Because checking hashes can take a long time (five minutes on my system to check a high-definition hour-long video that wasn't recently used), it'd be a good idea to only check the hashes when the filesizes match. Modify your upload tool to send along some metadata about how long the file is and what its cryptographic hash is. When your rb-inotify script sees file close requests, check the file size, and if the sizes match, check the cryptographic hash. If the hashes match, then start your ffmpeg(1) command.
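A minimal sketch of that size-then-hash check (in Python; the expected size and hash are placeholders that your upload tool would supply as metadata, e.g. via a sidecar file):

import hashlib
import os

def upload_is_complete(path, expected_size, expected_sha256):
    # Cheap check first (size), expensive check second (hash).
    if os.path.getsize(path) != expected_size:
        return False

    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large video files are never held in memory at once.
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            sha256.update(chunk)
    return sha256.hexdigest() == expected_sha256

# Example usage: expected_size and expected_sha256 come from the metadata
# sent alongside the upload; if this returns True, kick off ffmpeg(1).
# upload_is_complete("/videos/hello.mov", 123456789, "ab3f...")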
It seems easier to upload the file to a temporary directory on the server and move it to the location your script is watching once the transfer is completed.
