How to delete X old files in Amazon S3 using the AWS CLI - bash

I have found this regarding deletion of old files in bash:
Delete all but the most recent X files in bash
I want the same functionality; however, I cannot apply the same principles in my script because it is interacting with an Amazon S3 directory.
Does anyone know how to use the AWS CLI to achieve this?

Well, you can just create a lifecycle rule on S3 to delete older files. Then this process is done automatically for you.
Otherwise I guess you need to LIST all objects' metadata and write a script that checks whether each object is old enough. But if you have a lot of objects this can be quite costly, while the lifecycle rule is free.
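If you do go the scripting route, here is a minimal sketch of that idea (the bucket name, prefix, and KEEP value are placeholders, and head -n -N assumes GNU coreutils): it keeps only the X most recent objects under a prefix and deletes the rest.
#!/bin/bash
BUCKET="my-bucket"   # placeholder bucket name
PREFIX="logs/"       # placeholder prefix
KEEP=5               # number of most recent objects to keep

# List keys sorted by LastModified (oldest first), drop the newest $KEEP,
# and delete whatever remains.
aws s3api list-objects-v2 --bucket "$BUCKET" --prefix "$PREFIX" \
  --query 'sort_by(Contents, &LastModified)[].Key' --output text |
  tr '\t' '\n' | head -n -"$KEEP" |
while read -r key; do
  aws s3 rm "s3://$BUCKET/$key"
done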

Related

I need to write a shell script for transferring files to a GCS bucket using gsutil, but I don't fully know how to write it. Please explain from the start.

#!/bin/bash
gsutil -m cp -r dir gs://my-bucket
I wrote this code for transferring files from my machine to a GCS bucket, but I think I also need to configure a project and a service account. Please tell me the code for this.
The steps are basically:
Create project
Attach billing to the project
Create the bucket
Assign privileges to the bucket (if you open it up to all users for testing, make sure you get rid of that or lock it down later)
Upload/download an object via the GUI
Upload objects via gsutil
These two quickstarts should be followed in order and should cover everything you need to achieve what you're talking about.
Quickstart Storage
Quickstart Gsutil
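For the project and service account configuration the question asks about, a minimal sketch could look like the following (it assumes the Cloud SDK is installed; the key file path, project ID, region, and bucket name are placeholders):
#!/bin/bash
# Authenticate as a service account and point gcloud at your project.
gcloud auth activate-service-account --key-file=/path/to/service-account-key.json
gcloud config set project my-project-id

# Create the bucket, then upload the directory.
gsutil mb -l us-central1 gs://my-bucket
gsutil -m cp -r dir gs://my-bucket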

AWS Lambdas: SAM deployment ... identifying and removing old S3 package versions?

I'm relatively new to AWS Lambdas and SAM, and now that I've got things working I've got a seemingly simple question I can't find an answer to.
I've spent the last week getting a lambda app up and running using SAM (build, package, deploy numerous times until it works).
Problem
So now the S3 bucket I'm uploading to has numerous (100 or so) previously uploaded (by sam package) versions of my zipped-up code.
Question
How can you identify which zipped-up packages are the current ones (i.e. used by a current function and/or layer), and remove all the old obsolete ones?
Is there a way in SAM (cmd line options or in the template files) to have it automatically delete old versions of your package when you 'sam package' upload a new version?
Is there somewhere in the AWS console to find the key of the zip file in your bucket that a current function or layer is using? (I tried everywhere to find that, but couldn't manage to ... it's easy to get the ARNs, but not the actual URI in your bucket that they map to.)
Slight Complication
In the bucket I'm using to store the lambda packages, I've also got a custom layer.
So if it was just the app packages, I could easily (right now) just go in and delete everything in the bucket, then do a re-build/package/deploy to clean it up. ...but that would also delete my layer (and - same problem - I'm not sure which zip file in the bucket the layer is using).
But that approach wouldn't work long term anyway, as I'm planning to put together approx 10-15 different packages/functions, so deleting everything in the bucket when just one of them is updated is not going to work.
Thanks for any thoughts, ideas and help!
1. In your packaged.yaml file (generated after invoking sam package) you can see, under each Lambda function, a CodeUri with a unique path s3://your-bucket/id. That id is the one used by the current function and/or layer and resides in your bucket.
For a layer it's ContentUri.
2. Automatically deleting old versions of your package when you 'sam package' upload a new version - I'm not aware of anything like that.
3. Through the AWS console you can see your layer version, but I don't think there is an indication of your function/layer CodeUri/ContentUri.
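As a quick way to apply point 1, you could pull every referenced key straight out of the packaged template (a sketch, assuming the file was written to packaged.yaml):
# Show every CodeUri/ContentUri line in the packaged template.
grep -E 'CodeUri|ContentUri' packaged.yaml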
You can try to compare the currently deployed stack with what you've stored in S3. Let's assume you have a stack called test-stack, then you can retrieve the processed stack from CloudFormation using the AWS CLI like this:
AWS_PAGER="" aws cloudformation get-template --stack-name test-stack \
--output json --template-stage Processed
To only get the processed template body, you may want to pipe the output again through
jq -r ".TemplateBody"
Now you have the processed CFN template that tells you which S3 buckets and keys it is using. Here is an example for a lambda function:
MyLambda:
  Type: 'AWS::Lambda::Function'
  Properties:
    Code:
      S3Bucket: my-bucket
      S3Key: 0c53a7ccb1c1762eaeebd96555d13a20
You can then try to delete S3 objects that are not referenced by the current stack.
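One possible way to automate that comparison is sketched below. This is only a heuristic: the stack name and deployment bucket are placeholders, and it assumes the sam package keys are 32-character hex strings like the S3Key in the example above. Review the output before deleting anything.
#!/bin/bash
STACK=test-stack          # placeholder stack name
BUCKET=my-deploy-bucket   # placeholder deployment bucket

# Collect every 32-character hex key referenced by the processed template.
AWS_PAGER="" aws cloudformation get-template --stack-name "$STACK" \
  --output json --template-stage Processed |
  jq -r '.TemplateBody' | grep -oE '[0-9a-f]{32}' | sort -u > referenced.txt

# List everything in the deployment bucket and print keys that the stack no
# longer references; these are candidates for deletion.
aws s3api list-objects-v2 --bucket "$BUCKET" \
  --query 'Contents[].Key' --output text | tr '\t' '\n' | sort -u |
  comm -23 - referenced.txt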
There used to be a GitHub ticket requesting some sort of automatic cleanup mechanism, but it has been closed as out of scope: https://github.com/aws/serverless-application-model/issues/557#issuecomment-417867028
It may be worth noting that you could also try to set up an S3 lifecycle rule to automatically clean up old S3 objects, as suggested here: https://github.com/aws/aws-sam-cli/issues/648 However, I don't think that this will always be a suitable solution.
Last but not least, there has been an attempt to include some automatic cleaning approach in the sam documentation, but it was dismissed as:
[...] there are certain use cases that require these packaged S3 objects to persist, and deleting them would cause significant problems. One such example is the "CloudFormation stack deployment rollback" scenario: 1) Deploy version N of a stack, 2) Delete the packaged S3 object that version N uses, 3) Deploy version N+1 with a "bad" template file that triggers a CloudFormation rollback.
https://github.com/awsdocs/aws-sam-developer-guide/pull/3#issuecomment-462993286
So while it is possible to identify obsolete S3 packaged versions, it might not always be a good idea to delete them after all...
Actually, CloudFormation (which SAM is based on) uses S3 as temporary storage only. When you create or update the Lambda function, a copy of the code is made, so you could delete all objects from the bucket and the Lambda function would still work correctly.
Caveat: there are cases where the S3 object may be required, for example to roll back a CloudFormation stack, as in the "CloudFormation stack deployment rollback" scenario (reference):
Deploy version N of a stack
Delete the packaged S3 object that version N uses
Deploy version N+1 with a "bad" template file that triggers a CloudFormation rollback

How to utilize shell script and AWS CLI to automatically copy a file daily from one S3 bucket to another?

I'd like to create a way (using shell scripts and the AWS CLI) so that the following can be automated:
Copy specific files from an S3 bucket
Paste them into a different S3 bucket.
Would the below 'sync' command work?
aws s3 sync s3://directory1/bucket1 s3://directory2/bucket2 --exclude "US*.gz" --exclude "CA*.gz" --include "AU*.gz"
The goal here is to ONLY transfer files whose filenames begin with "AU" and exclude everything else, in as automated a fashion as possible. Also, is it possible to exclude very old files?
The second part of the question is: what do I need to add to my shell script in order to automate this process as much as possible, given that "AU" files get dropped into this folder every day?
Copy objects
The AWS CLI can certainly copy objects between buckets. In fact, it does not even require files to be downloaded — S3 will copy directly between buckets, even if they are in different regions.
The aws s3 sync command is certainly an easy way to do it, since it will replicate any files from the source to the destination without having to specifically state which files to copy.
To only copy AU* files, use: --exclude "*" --include "AU*"
See: Use of Exclude and Include Filters
You asked about excluding old files: the sync command only copies files that are not already in the destination, so any files that were previously copied will not be copied again. By default, any files deleted from the source will not be deleted in the destination unless specifically requested.
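Put together, a minimal sketch of the daily job could look like this (the bucket paths and the cron schedule are placeholders, not part of the answer):
#!/bin/bash
# Sync only objects whose names start with AU from source to destination.
aws s3 sync s3://source-bucket/path/ s3://destination-bucket/path/ \
  --exclude "*" --include "AU*"

# Example crontab entry to run it daily at 01:00:
# 0 1 * * * /home/ubuntu/sync-au-files.sh >> /var/log/sync-au-files.log 2>&1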
Automate
How to automate this? The most cloud-worthy way to do this would be to create an AWS Lambda function. The Lambda function can be automatically triggered by an Amazon CloudWatch Events rule on a regular schedule.
However, the AWS CLI is not installed by default in Lambda, so it might be a little more challenging. See: Running aws-cli Commands Inside An AWS Lambda Function - Alestic.com
It would be better to have the Lambda function do the copy itself, rather than calling the AWS CLI.
Alternative idea
Amazon S3 can be configured to trigger an AWS Lambda function whenever a new object is added to an S3 bucket. This way, as soon as the object is added in S3, it will be copied to the other Amazon S3 bucket. Logic in the Lambda function can determine whether or not to copy the file, such as checking that it starts with AU.
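For reference, the trigger itself can filter on the key prefix so that only "AU" objects invoke the function. A hedged sketch of wiring this up via the CLI (the bucket name and Lambda ARN are placeholders, and the function must already grant S3 permission to invoke it):
aws s3api put-bucket-notification-configuration --bucket source-bucket \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [{
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:copy-au-files",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": "AU"}]}}
    }]
  }'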

Backup strategy Ubuntu Laravel

I am searching for a backup strategy for my web application files.
I am hosting my (Laravel) application on an Ubuntu (18.04) server in the cloud and currently have around 80GB of storage that needs to be backed up (this grows fast). The biggest files are around 30MB; the rest are small jpg/txt/pdf files.
I want to make a full backup of the storage directory at least twice a day and store it as a zip file on a local server. I have two reasons for this: independence from cloud providers, and archiving.
My first backup strategy was to zip all the contents of the storage folder and rsync the zip; this works well up to a couple of gigabytes, but then the server gets completely stuck on CPU usage.
My second approach is plain rsync, but with this I can't track when a file is deleted or added.
I am looking for a good backup strategy that preferably generates zips before or after the backup and stores them so we can browse and examine them back in time.
Strangely enough I could not find anything that suits me; I hope someone can help me out.
I agree with @RobertFridzema that the whole server becomes unresponsive when using the ZIP functionality from the spatie package.
I had the same situation with a customer project. My suggestion is to keep the source code files within version control. Just back up the dynamic/changing files with rsync (incremental works best and fastest) and create a separate database backup strategy, for example with MySQL/MariaDB: mysqldump, then encrypt the resulting file and move it to an external storage as well.
If ZIP creation is still a problem, I would maybe use storage which is already set up with RAID functionality, or if that is not possible, I would definitely not use the ZIP functionality on the live server. rsync incrementally to another server and do the backup strategy there.
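A minimal sketch of that approach (the hostnames, paths, database name, and GPG passphrase file are all placeholders):
#!/bin/bash
set -euo pipefail

# Incremental file sync of the storage dir to the backup server; only changed
# files are transferred.
rsync -az --delete /var/www/app/storage/ backup-host:/backups/app-storage/

# Dump the database, compress and encrypt it, then ship it off and clean up.
DUMP="/tmp/db-$(date +%F).sql.gz.gpg"
mysqldump --single-transaction mydatabase | gzip |
  gpg --batch --pinentry-mode loopback --symmetric \
    --passphrase-file /root/.backup-pass > "$DUMP"
rsync -az "$DUMP" backup-host:/backups/db/
rm "$DUMP"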
Spatie has a package for Laravel backups that can be scheduled in the Laravel job scheduler. It will create zips of the entire project, including the storage dirs:
https://github.com/spatie/laravel-backup

AWS CLI skip files that are in use?

How does the AWS CLI tool handle files that are in use?
My log files are created hourly (log-2018.01.26.13, 14, 15 etc). I was thinking of making a very simple aws cli mv script to move files to an S3 bucket and having it run every 10 minutes through cron to make sure I get logs as soon as possible.
However, there will be files that haven't finished writing yet. Is the AWS CLI smart enough to leave those files alone, or do I need extra logic that first checks whether files are in use?
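For what it's worth, here is one hedged sketch of the kind of extra logic the question describes, based only on the log-YYYY.MM.DD.HH naming above (the log directory and bucket are placeholders): it skips the current hour's file, which may still be written to, and moves everything older.
#!/bin/bash
CURRENT="log-$(date +%Y.%m.%d.%H)"   # the file for the current hour
for f in /var/log/myapp/log-*; do
  # Skip the file that is still being written to.
  [ "$(basename "$f")" = "$CURRENT" ] && continue
  aws s3 mv "$f" s3://my-log-bucket/
done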
