I am using the Go SDK to communicate with AWS S3. I want to download only those files from a folder that end with .txt or .lib.
The AWS SDK does not support filtering by suffix on the server side.
You can list the objects in a bucket, filter the returned keys based on your needs, and fetch each matching object with GetObject.
See GetObject and ListObjects in the SDK documentation.
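As a rough sketch of that approach with the AWS SDK for Go (v1), assuming placeholder names for the bucket, the folder prefix, and a local downloads directory:

package main

import (
    "os"
    "path/filepath"
    "strings"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
    "github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
    sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
    svc := s3.New(sess)
    downloader := s3manager.NewDownloader(sess)

    bucket := "mybucket"  // placeholder bucket name
    prefix := "myfolder/" // the "folder" to look in
    _ = os.MkdirAll("downloads", 0755)

    // List the objects under the prefix, filter the keys on the client side,
    // and download only those ending in .txt or .lib.
    err := svc.ListObjectsV2Pages(&s3.ListObjectsV2Input{
        Bucket: aws.String(bucket),
        Prefix: aws.String(prefix),
    }, func(page *s3.ListObjectsV2Output, lastPage bool) bool {
        for _, obj := range page.Contents {
            key := aws.StringValue(obj.Key)
            if !strings.HasSuffix(key, ".txt") && !strings.HasSuffix(key, ".lib") {
                continue
            }
            f, err := os.Create(filepath.Join("downloads", filepath.Base(key)))
            if err != nil {
                panic(err)
            }
            // GetObject via the download manager, written straight to the local file.
            if _, err := downloader.Download(f, &s3.GetObjectInput{
                Bucket: aws.String(bucket),
                Key:    obj.Key,
            }); err != nil {
                panic(err)
            }
            f.Close()
        }
        return true // keep paging through the listing
    })
    if err != nil {
        panic(err)
    }
}

The suffix check has to happen client-side because the listing API only supports prefix filtering.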
Another option is to mount the S3 bucket on your machine or server using e.g. s3fs-fuse and filter the files you need there in order to get the list to download.
I'd like to create a way (using shell scripts and the AWS CLI) so that the following can be automated:
Copy specific files from an S3 bucket.
Paste them into a different bucket in S3.
Would the below 'sync' command work?
aws s3 sync s3://directory1/bucket1 s3://directory2/bucket2 --exclude "US*.gz" --exclude "CA*.gz" --include "AU*.gz"
The goal here is to ONLY transfer files whose filenames begin with "AU" and exclude everything else, in as automated a fashion as possible. Also, is it possible to exclude very old files?
The second part of the question is: what do I need to add to my shell script to automate this process as much as possible, given that "AU" files get dropped into this folder every day?
Copy objects
The AWS CLI can certainly copy objects between buckets. In fact, it does not even require files to be downloaded — S3 will copy directly between buckets, even if they are in different regions.
The aws s3 sync command is certainly an easy way to do it, since it will replicate any files from the source to the destination without having to specifically state which files to copy.
To copy only AU* files, use: --exclude "*" --include "AU*" (the filters are applied in the order given, so exclude everything first and then add back the AU* pattern).
See: Use of Exclude and Include Filters
You asked about excluding old files: sync only copies files that are new or have changed, so any files that were previously copied will not be copied again. By default, files deleted from the source will not be deleted in the destination unless specifically requested (with --delete).
Automate
How to automate this? The most cloud-worthy way to do this would be to create an AWS Lambda function. The Lambda function can be automatically triggered by an Amazon CloudWatch Events rule on a regular schedule.
However, the AWS CLI is not installed by default in Lambda, so it might be a little more challenging. See: Running aws-cli Commands Inside An AWS Lambda Function - Alestic.com
It would be better to have the Lambda function do the copy itself, rather than calling the AWS CLI.
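For illustration, a rough sketch of that in-Lambda copy logic using the AWS SDK for Go, with source-bucket and dest-bucket as placeholder names; it lists only keys starting with AU and issues server-side CopyObject calls, so no file data passes through the function itself:

package main

import (
    "fmt"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)

func main() {
    // Region and credentials come from the environment / shared config.
    sess := session.Must(session.NewSession())
    svc := s3.New(sess)

    src, dst := "source-bucket", "dest-bucket" // placeholders

    // List only keys that start with "AU" and copy each one
    // bucket-to-bucket on the S3 side (no download/upload).
    err := svc.ListObjectsV2Pages(&s3.ListObjectsV2Input{
        Bucket: aws.String(src),
        Prefix: aws.String("AU"),
    }, func(page *s3.ListObjectsV2Output, lastPage bool) bool {
        for _, obj := range page.Contents {
            if _, err := svc.CopyObject(&s3.CopyObjectInput{
                Bucket:     aws.String(dst),
                CopySource: aws.String(src + "/" + aws.StringValue(obj.Key)),
                Key:        obj.Key,
            }); err != nil {
                fmt.Println("copy failed:", aws.StringValue(obj.Key), err)
            }
        }
        return true
    })
    if err != nil {
        panic(err)
    }
}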
Alternative idea
Amazon S3 can be configured to trigger an AWS Lambda function whenever a new object is added to an S3 bucket. This way, as soon as the object is added in S3, it will be copied to the other Amazon S3 bucket. Logic in the Lambda function can determine whether or not to copy the file, such as checking that it starts with AU.
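A hedged sketch of such a handler using the Go Lambda runtime (dest-bucket is a placeholder; the prefix check mirrors the AU rule above):

package main

import (
    "context"
    "strings"

    "github.com/aws/aws-lambda-go/events"
    "github.com/aws/aws-lambda-go/lambda"
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)

var svc = s3.New(session.Must(session.NewSession()))

func handler(ctx context.Context, e events.S3Event) error {
    for _, rec := range e.Records {
        bucket := rec.S3.Bucket.Name
        // Note: keys in S3 event notifications arrive URL-encoded;
        // decode them if your names contain spaces or special characters.
        key := rec.S3.Object.Key

        // Only copy objects whose key starts with "AU".
        if !strings.HasPrefix(key, "AU") {
            continue
        }
        _, err := svc.CopyObject(&s3.CopyObjectInput{
            Bucket:     aws.String("dest-bucket"), // placeholder destination
            CopySource: aws.String(bucket + "/" + key),
            Key:        aws.String(key),
        })
        if err != nil {
            return err
        }
    }
    return nil
}

func main() {
    lambda.Start(handler)
}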
I'm somewhat new to using Amazon S3 for image storage and just wondering about the easiest way to import about 5,000 (currently publicly available) image URLs into S3, so that the end result is hosting of the images on our S3 account (and resulting new URLs for the same images).
I'm wondering whether I need to first download and save each image on my computer and then import the resulting images to S3 (which would of course result in new URLs for those images), or is there an easier way to accomplish this?
Thanks for any suggestions.
James
The AWS CLI for S3 doesn’t support copying a file from a remote URL to S3, but you can copy from a local filestream to S3.
Happily, cURL outputs the contents of URLs as streams by default, so the following command will stream a remote URL to a location in S3 via the local machine:
# The "-" means "stream from stdin".
curl https://example.com/some/file.ext | aws s3 cp - s3://bucket/path/file.ext
Caveat: while this doesn’t require a local temporary file, it does ship all of the file bytes through the machine where the command is running as a relay to S3.
Add whatever other options for the S3 cp command that you need, such as --acl.
Threaded multipart transfers with chunk management will probably be more performant, but this is a quick solution in a pinch if you have access to the CLI.
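If you end up scripting this for all 5,000 URLs rather than running curl by hand, the same streaming idea works in code; here is a rough sketch with the AWS SDK for Go (the bucket name and URL list are placeholders) that pipes each HTTP response body straight into an S3 upload without a temporary file:

package main

import (
    "fmt"
    "net/http"
    "path"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
    sess := session.Must(session.NewSession())
    uploader := s3manager.NewUploader(sess)

    urls := []string{"https://example.com/some/image1.jpg"} // placeholder list
    bucket := "mybucket"                                    // placeholder

    for _, u := range urls {
        resp, err := http.Get(u)
        if err != nil {
            fmt.Println("download failed:", u, err)
            continue
        }
        // resp.Body is an io.Reader, so the upload manager streams it
        // to S3 (in multipart chunks) without saving a local file.
        _, err = uploader.Upload(&s3manager.UploadInput{
            Bucket: aws.String(bucket),
            Key:    aws.String("images/" + path.Base(u)),
            Body:   resp.Body,
        })
        resp.Body.Close()
        if err != nil {
            fmt.Println("upload failed:", u, err)
        }
    }
}

As with the curl version, all of the bytes still relay through the machine running the code.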
S3 is a “dumb” object store, meaning that it cannot do any data processing itself, including but not limited to downloading files. So yes, you’ll have to download your objects first and then upload them to S3.
I have been trying to find a solution for this but I need to ask you all. Do you know if there is a Windows desktop application out there which would put (real-time sync) objects from a local folder into a predefined AWS S3 bucket? This could work just one way: upload from local to S3.
Setting it up
Install the AWS CLI (https://aws.amazon.com/cli/) for Windows.
Through the AWS website/console, create an IAM user with a strict policy that allows access only to the required S3 bucket.
Run aws configure in PowerShell or cmd and set up the region, access key and secret key for the IAM user that you created.
Test whether your setup is correct by running aws s3 ls on the command line and verify that you see a list of your account's S3 buckets.
If not, you probably configured the IAM permissions incorrectly; you might need ListBuckets on all of S3 too.
Sync examples
aws s3 sync path/to/yourfolder s3://mybucket/
aws s3 sync path/to/yourfolder s3://mybucket/images/
aws s3 sync path/to/yourfolder s3://mybucket/images/ --delete
The --delete flag removes files from S3 that are no longer present in your local path.
Not sure what this has to do with Electron, but you could set up a trigger in your application to invoke these commands; for example, in atom.io or VS Code you could bind this to saving a document on Ctrl+S (see the sketch at the end of this answer).
If you are programming an application using Electron, then you should consider using the AWS JavaScript SDK instead of the AWS CLI, but that is a whole different story.
And lastly, back up your files somewhere else before trying to use possibly destructive commands such as sync until you get a feel for how they work.
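As for triggering the sync from your own application, one simple (if unglamorous) option is to shell out to the same CLI command from code; a small sketch in Go, with the local folder and bucket path as placeholders:

package main

import (
    "fmt"
    "os/exec"
)

// runSync shells out to the AWS CLI, exactly like typing the
// command in PowerShell, and prints its combined output.
func runSync(localDir, bucketURL string) error {
    cmd := exec.Command("aws", "s3", "sync", localDir, bucketURL)
    out, err := cmd.CombinedOutput()
    fmt.Println(string(out))
    return err
}

func main() {
    // Placeholders: point these at your folder and bucket.
    if err := runSync(`C:\path\to\yourfolder`, "s3://mybucket/images/"); err != nil {
        fmt.Println("sync failed:", err)
    }
}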
Our company is using Ruby 2.1.3 with the AWS SDK V1 for uploading files to S3. I need to stream files directly from a private external bucket to one of our own buckets (without actually downloading them locally). I can't find any good documentation on the subject.
The copy_from method provided by the SDK, I think, does not permit streaming from a private external bucket to one of our buckets.
We have tried using open-uri to stream the download and stream the upload to S3, but the file was always downloaded fully first and then uploaded (is it supposed to be like that?).
Any help is welcomed!
Thank you.
The V1 SDK doesn't allow you to transfer between buckets directly. You can do what open-uri does and download the file and then upload to the new bucket.
If you want a solution that can still work in Ruby I suggest using the AWS CLI. You can add a line like this to your code:
`aws s3 cp s3://frombucket/ s3://tobucket/ --recursive`
The backticks allow you to execute system commands in your Ruby script. Alternatively, you could upgrade to the V2 SDK and use the object.copy_to method to copy between buckets. Hope this helps!
The aws.s3.bucket name should be changed to something unique and related to your application. For instance, the demo application uses the value com.heroku.devcenter-java-play-s3 which would have to be changed to something else if you want to run the demo yourself.
I am trying to use S3 with Heroku. I'm also using the Play 2 Framework with Scala. I used the plugin described here: https://devcenter.heroku.com/articles/using-amazon-s3-for-file-uploads-with-java-and-play-2#s3-plugin-for-play-2
One thing in my config file is that I need to set up these three parameters:
aws.access.key=${?AWS_ACCESS_KEY}
aws.secret.key=${?AWS_SECRET_KEY}
aws.s3.bucket=com.something.unique
I found the access and secret key in the AWS console, but what is this s3.bucket? I did assign a name to my S3 bucket, but the format here looks like a website or a Java package hierarchy. What should I put there?
An S3 bucket is a storage container within the AWS S3 service. You need to create the bucket with their web console or API before you can store data in S3. All data lives within a bucket.
Once you have created your bucket, you need to configure your S3 client to use that bucket name where you want to store the data.
S3 bucket names share a single global namespace across all of S3. They often use a dotted naming scheme like a Java package or domain name, but that's just an arbitrary convention some folks use.
You can use the same bucket in multiple environments if you are comfortable with the risk of leaking data between staging and production, but I recommend using a separate S3 bucket for each environment.