Is there an open source version of s3distcp?

I would love to use s3distcp for copying data from S3 buckets to S3 buckets, but I need to use an external proprietary encryption mechanism to ensure the data is encrypted at rest (keeping the keys to myself so Amazon cannot decrypt it).
I would love to do a git clone and create my own s3distcp (with hooks for external encryption/decryption libraries).
I googled and found a potential candidate here: https://github.com/libin/s3distcp
But it doesn't appear to be an official Amazon account, and the repository doesn't look documented or maintained.

I built a tool that runs in Node.js to copy data between S3 buckets.
https://github.com/Homefinder/bucketCloner
It uses the AWS JavaScript SDK and isn't very complicated. You could easily modify it for your purposes, assuming you still have this need, of course.
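In case it's useful, here is a minimal sketch of that get/encrypt/put loop, written against the current AWS SDK for JavaScript (v3) rather than bucketCloner's actual code. The bucket names and the encrypt() function are placeholders; the external encryption mechanism from the question would plug in there.

```typescript
import {
  S3Client,
  ListObjectsV2Command,
  GetObjectCommand,
  PutObjectCommand,
} from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" });

// Placeholder hook for the proprietary encryption mechanism.
async function encrypt(plaintext: Uint8Array): Promise<Uint8Array> {
  return plaintext; // swap in your own cipher here
}

async function copyBucket(srcBucket: string, dstBucket: string): Promise<void> {
  let token: string | undefined;
  do {
    // List the source bucket one page (up to 1000 keys) at a time.
    const page = await s3.send(
      new ListObjectsV2Command({ Bucket: srcBucket, ContinuationToken: token })
    );
    for (const obj of page.Contents ?? []) {
      // Download, encrypt client-side, then upload to the destination.
      const res = await s3.send(
        new GetObjectCommand({ Bucket: srcBucket, Key: obj.Key! })
      );
      const body = await res.Body!.transformToByteArray();
      await s3.send(
        new PutObjectCommand({
          Bucket: dstBucket,
          Key: obj.Key!,
          Body: await encrypt(body),
        })
      );
    }
    token = page.NextContinuationToken;
  } while (token);
}

copyBucket("source-bucket", "destination-bucket").catch(console.error);
```

Note that this buffers each object in memory, so it's only a starting point; for large objects you would want to stream and use multipart uploads.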

Related

SurrealDB - store binary files

Is it possible to store binary files (e.g. images) using SurrealDB?
I can't find anything about this in the docs.
If not, where can I store images, given that all the other data is stored in SurrealDB?
SurrealDB wasn't created as a file store. For this purpose you can use, for example, object storage; nearly every cloud service provides it.
If you want an open-source solution that you can host yourself, check out MinIO object storage - github repo.
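For illustration, here is a minimal sketch of that pattern using the AWS SDK for JavaScript. Because MinIO is S3-compatible, an ordinary S3 client works against it; the endpoint, credentials, bucket and key below are assumptions for a default local MinIO instance.

```typescript
import { readFile } from "node:fs/promises";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

// MinIO speaks the S3 protocol, so a regular S3 client can talk to it.
const minio = new S3Client({
  endpoint: "http://localhost:9000", // default MinIO port
  region: "us-east-1",               // required by the client; MinIO ignores it
  forcePathStyle: true,              // MinIO expects path-style URLs
  credentials: {
    accessKeyId: "minioadmin",       // default dev credentials
    secretAccessKey: "minioadmin",
  },
});

async function storeImage(path: string, key: string): Promise<void> {
  const image = await readFile(path);
  await minio.send(
    new PutObjectCommand({
      Bucket: "images",
      Key: key,
      Body: image,
      ContentType: "image/png",
    })
  );
}

storeImage("avatar.png", "users/alice/avatar.png").catch(console.error);
```

The usual split is to keep only the object key (plus any metadata) in SurrealDB and let the object store serve the bytes.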

How to delete x old files in amazon s3 using amazon CLI

I have found this regarding deletion of old files in bash:
Delete all but the most recent X files in bash
I want the same functionality; however, I cannot apply the same principles in my script, as it interacts with an Amazon S3 directory.
Does anyone know how to achieve this using the Amazon CLI?
Well, you can just create a lifecycle rule on S3 to delete older files; that process then runs automatically for you. Note that lifecycle rules expire objects by age rather than keeping a fixed count, so they fit when "old" means older than some number of days.
Otherwise, I guess you need to LIST all objects' metadata and write a script that checks whether each object is old enough. But if you have a lot of objects this can be quite costly, while the lifecycle rule is free.
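If the requirement really is "keep the most recent X" rather than "older than N days", a lifecycle rule can't express that, so a small script is the way to go. Here is a rough sketch using the AWS SDK for JavaScript (the bucket, prefix and KEEP count are placeholders); the same list-then-delete logic could also be driven from the CLI with `aws s3api list-objects-v2` and `aws s3api delete-objects`.

```typescript
import {
  S3Client,
  ListObjectsV2Command,
  DeleteObjectsCommand,
} from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" });
const BUCKET = "my-bucket";   // placeholder
const PREFIX = "backups/";    // placeholder "directory"
const KEEP = 10;              // how many of the newest objects to keep

async function deleteAllButNewest(): Promise<void> {
  // Collect every object under the prefix (the listing is paginated).
  const objects: { Key: string; LastModified: Date }[] = [];
  let token: string | undefined;
  do {
    const page = await s3.send(
      new ListObjectsV2Command({
        Bucket: BUCKET,
        Prefix: PREFIX,
        ContinuationToken: token,
      })
    );
    for (const o of page.Contents ?? []) {
      objects.push({ Key: o.Key!, LastModified: o.LastModified! });
    }
    token = page.NextContinuationToken;
  } while (token);

  // Sort newest first and drop everything past the first KEEP entries.
  objects.sort((a, b) => b.LastModified.getTime() - a.LastModified.getTime());
  const stale = objects.slice(KEEP);

  // DeleteObjects accepts at most 1000 keys per request.
  for (let i = 0; i < stale.length; i += 1000) {
    await s3.send(
      new DeleteObjectsCommand({
        Bucket: BUCKET,
        Delete: {
          Objects: stale.slice(i, i + 1000).map(({ Key }) => ({ Key })),
        },
      })
    );
  }
}

deleteAllButNewest().catch(console.error);
```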

Streaming download on S3 from an external private bucket to company bucket with rails

Our company is using Ruby 2.1.3 with AWS SDK V1 for uploading files to S3. I need to stream files directly from a private external bucket to one of our own buckets (without actually downloading them locally). I can't find any good documentation on the subject.
The copy_from method provided by the SDK, I think, does not permit streaming from a private external bucket to one of our buckets.
We have tried using open-uri to stream the download and stream the upload to S3, but the file was always fully downloaded first and then uploaded (is it supposed to work like that?).
Any help is welcomed!
Thank you.
The V1 SDK doesn't allow you to transfer between buckets directly. You can do what open-uri does: download the file and then upload it to the new bucket.
If you want a solution that still works from Ruby, I suggest using the AWS CLI. You can add a line like this to your code:
`aws s3 cp s3://frombucket/ s3://tobucket/ --recursive`
The backticks allow you to execute system commands in your Ruby script. Alternatively, you could upgrade to the V2 SDK and use the object.copy_to method to copy between buckets. Hope this helps!

Access to filesystem on AppHarbor

I want to try AppHarbor, but I have an application which stores uploaded files in a certain place on the filesystem. Is this compatible with AppHarbor? Can I store files in the file system and access them later?
(What kind of path can I expect, something like c:\blabla, or what?)
Thank you.
You can store files on the local filesystem, but the application directory is wiped on each new deployment, so relying on it for file storage is not recommended.
Instead we recommend that you use a cloud storage service such as Amazon S3, Google Cloud Storage or similar. There are .NET libraries for both services.
We recently wrote a blog post about uploading files directly to S3 and GCS from the browser that you might want to read.
If you are using a background worker, you need to 'Enable File System Write Access' in the settings of your application.
Then, you are permitted access to write to: Path.GetTempPath()
Sourced from this support question: http://support.appharbor.com/discussions/problems/5868-create-directory-in-background-worker

Storing images in file system, amazon-s3 datastores

There have been numerous discussions about storing images (or binary data) in the database versus the file system (refer: Storing Images in DB - Yea or Nay?)
We have decided, for the short term, to store images on the file system and the image-specific metadata in the database itself, and to migrate to an Amazon S3-based data store in the future. Note: the data store will be used to store user pictures, photos from group meetings ...
Are there any off-the-shelf, Java-based open source frameworks which provide an abstraction to handle storage and retrieval via HTTP for the above data stores? We wouldn't want to write any code for admin tasks like backups, purging, and maintenance.
JetS3t - http://jets3t.s3.amazonaws.com/index.html
We've used this and it works like a charm for S3.
I'm not sure whether you are looking for a framework that works across both file-system storage and S3, but as unique as S3 is, I'm not sure such a thing exists. Obviously, with S3, backups and maintenance are handled for you.
