Shell script to fetch S3 bucket size with AWS CLI - bash

I have this script that fetches all the buckets in AWS along with their sizes. The script fetches the buckets fine, but the loop that fetches the sizes throws an error. Can someone point out where I am going wrong? When I run the AWS CLI command for an individual bucket, it fetches the size without any issues.
The desired output is shown below; I have fetched it for one bucket, but I want it for all the buckets.
Desired output:
aws --profile aws-stage s3 ls s3://<bucket> --recursive --human-readable --summarize | awk END'{print}'
Total Size: 75.1 KiB
Error:
Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]*:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
Script:
#!/bin/bash
aws_profile=('aws-stage' 'aws-prod');
#loop AWS profiles
for i in "${aws_profile[#]}"; do
echo "${i}"
buckets=$(aws --profile "${i}" s3 ls s3:// --recursive | awk '{print $3}')
#loop S3 buckets
for j in "${buckets[#]}"; do
echo "${j}"
aws --profile "${i}" s3 ls s3://"${j}" --recursive --human-readable --summarize | awk END'{print}'
done
done

Try this:
#!/bin/bash
aws_profiles=('aws-stage' 'aws-prod');
for profile in "${aws_profiles[@]}"; do
  echo "$profile"
  read -rd "\n" -a buckets <<< "$(aws --profile "$profile" s3 ls | cut -d " " -f3)"
  for bucket in "${buckets[@]}"; do
    echo "$bucket"
    aws --profile "$profile" s3 ls s3://"$bucket" --human-readable --summarize | awk 'END{print}'
  done
done
The problem was that your buckets variable was a single string rather than an array.
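If you want to sidestep the string-vs-array issue entirely, here is a minimal sketch of an alternative (it assumes the same two profiles and uses s3api list-buckets plus mapfile, which always produces a real array):
#!/bin/bash
# Sketch only: same idea, but the bucket list is built with mapfile.
# Profile names are taken from the question.
aws_profiles=('aws-stage' 'aws-prod')
for profile in "${aws_profiles[@]}"; do
  echo "$profile"
  # list-buckets prints the names tab-separated on one line; tr makes it one per line
  mapfile -t buckets < <(aws --profile "$profile" s3api list-buckets \
    --query 'Buckets[].Name' --output text | tr '\t' '\n')
  for bucket in "${buckets[@]}"; do
    echo "$bucket"
    aws --profile "$profile" s3 ls "s3://$bucket" --recursive --human-readable --summarize | awk 'END{print}'
  done
done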

Related

Using makefile to download a file from AWS to local

I want to set up a target which downloads the latest s3 file containing _id_config within a path. I know I can get the name of the file I am interested in with
FILE=$(shell aws s3 ls s3:blah//xyz/mno/here --recursive | sort | tail -n 2 | awk '{print $4}' | grep id_config)
Now, I want to download the file locally with something like
download_stuff:
aws s3 cp s3://prod_an.live.data/$FILE .
But when I run this, my $FILE contains some extra fields:
aws s3 cp s3://blah/2022-02-17 16:02:21 2098880 blah//xyz/mno/here54fa8c68e41_id_config.json .
Unknown options: 2098880,blah/xyz/mno/here54fa8c68e41_id_config.json,.
Please can someone help me understand why 2098880 and the spaces are there in the output and how to resolve this. Thank you in advance.
The extra fields appear because make expands $4 before the shell runs the pipeline, so awk receives '{print }' and prints the whole listing line (date, time, size and key). Escape the dollar sign as $$4 so that awk actually gets $4 and prints only the key:
FILE=$(shell aws s3 ls s3:blah//xyz/mno/here --recursive | sort | tail -n 2 | awk '{print $$4}' | grep id_config)
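A minimal Makefile sketch putting this together (the bucket name and prefix are stitched together from different snippets in the question, so treat them as placeholders; note also that inside the recipe the variable should be written $(FILE), and the recipe line must start with a tab):
# Sketch only: grep first, then keep the newest matching key.
# $$4 stops make from swallowing the awk field reference.
FILE=$(shell aws s3 ls s3://prod_an.live.data/xyz/mno/here --recursive | grep id_config | sort | tail -n 1 | awk '{print $$4}')

download_stuff:
	aws s3 cp "s3://prod_an.live.data/$(FILE)" .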

How can I process every file in my S3 bucket with bash?

I have a bunch of large files (100MB - 1GB) in a Bucket. I would like to "map" all those files using a bash script. I am unable to download all files at once because my computer does not have enough storage.
Does anyone have an idea how I can do this? Anything smarter than the following solution?
for file in $(aws s3 ls s3://my-bucket | rev | cut -d' ' -f1 | rev) ; do
aws s3 cp s3://my-bucket/$file $file;
./script;
aws s3 cp $file s3://my-bucket/$file;
rm $file;
done
Explanation:
$(aws s3 ls s3://my-bucket | rev | cut -d' ' -f1 | rev) gets the names of the files in the bucket that we iterate over
aws s3 cp s3://my-bucket/$file $file downloads a single file
./script runs my custom script
aws s3 cp $file s3://my-bucket/$file overwrites the old file with the new one
Interesting approach to reverse and reverse again - I would never have thought of that, but now I might one day.
Taking the context as bash as per the title. Depending on what your script does, you could use other services like Glue for a "smarter"/faster but more complex solution.
I suppose you don't have any "folders" (prefixes) in your bucket. A more configurable and probably more reliable way to get the list to process could be to use the s3api, e.g.:
objects=$(aws s3api list-objects \
--bucket my-bucket \
--output json \
--query "Contents[].Key")
for file in $(echo $objects | jq -r '.[]'); do
aws s3 cp s3://my-bucket/$file $file;
./script;
aws s3 cp $file s3://my-bucket/$file;
rm $file;
done
This assumes you'd have jq installed.
The above doesn't need handling of "folders". You can also do some nicer things like choose a "subfolder" to process, e.g.:
--prefix path/to/process/ \
or filter on a file extension, e.g.:
--query "Contents[?contains(Key, '.mp4')].Key")
Might be nice to use prefixes to be able to handle restarts of the script or list limits being hit (e.g. default --page-size of 1000 for aws s3 ls).
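Putting those options together, a minimal sketch (the bucket name, prefix and the .mp4 filter are placeholders, and it assumes your script takes the local file name as an argument):
#!/bin/bash
# Sketch only: process the .mp4 objects under a given prefix, one at a time.
objects=$(aws s3api list-objects-v2 \
  --bucket my-bucket \
  --prefix path/to/process/ \
  --output json \
  --query "Contents[?contains(Key, '.mp4')].Key")

# ".[]?" makes jq tolerate an empty (null) result
echo "$objects" | jq -r '.[]?' | while read -r key; do
  name=$(basename "$key")
  aws s3 cp "s3://my-bucket/$key" "$name"   # download one object
  ./script "$name"                          # run the custom script (assumed to take a file name)
  aws s3 cp "$name" "s3://my-bucket/$key"   # upload the result back to the same key
  rm "$name"                                # free local disk space
done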

aws s3 ls - how to recursively list objects with bash script avoid pagination error

I have on-premise, S3-compatible storage. I need to list all files in a specific bucket. When I do it at the top of the bucket I get the error:
Error during pagination: The same next token was received twice:{'ContinuationToken':"file path"}
I think it happens when too many objects need to be listed. Something is wrong on the storage side, but there is no cure for that right now.
As a workaround I ran s3 ls in a bash while loop. I managed to prepare a simple loop for a different bucket where I have far fewer objects. That loop operated deep inside the tree, where I knew how many directories I had.
./aws --profile us-bucket --endpoint-url https://endpoint:18082 --no-verify-ssl s3 ls us-bucket/dir1/dir2/dir3/dir4/dir5/dir6/ | tr -s ' ' | tr '/' ' ' | awk '{print $2}' | while read line0; do
  ./aws --profile us-bucket --endpoint-url https://endpoint:18082 --no-verify-ssl s3 ls us-bucket/dir1/dir2/dir3/dir4/dir5/dir6/${line0}/ | tr -s ' ' | tr '/' ' ' | awk '{print $2}' | while read line1; do
    ./aws --profile us-bucket --endpoint-url https://endpoint:18082 --no-verify-ssl s3 ls us-bucket/dir1/dir2/dir3/dir4/dir5/dir6/${line0}/${line1}/ | tr -s ' ' | tr '/' ' ' | awk '{print $2}' | while read line2; do
      ./aws --profile us-bucket --endpoint-url https://endpoint:18082 --no-verify-ssl s3 ls --recursive us-bucket/dir1/dir2/dir3/dir4/dir5/dir6/${line0}/${line1}/${line2}/
    done
  done
done > /tmp/us-bucket/us-bucket_dir2_dir3_dir4_dir5_dir6.txt
I would like to write a loop which starts from the top (or root, as you prefer) and lists all files, no matter how many directories are on the path, going from the last directory in the path upwards, to avoid getting:
Error during pagination: The same next token was received twice:{'ContinuationToken':"file path"}
Any help/clues appreciated. Thanks.
Br,
Jay
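One way to sketch such a walk (AWSLS below is just a hypothetical wrapper around the same --profile/--endpoint-url/--no-verify-ssl flags used in the workaround, and file names containing spaces would need more careful parsing):
#!/bin/bash
# Sketch only: walk the bucket prefix by prefix so no single listing has to
# paginate over the whole bucket.
AWSLS() {
  ./aws --profile us-bucket --endpoint-url https://endpoint:18082 --no-verify-ssl s3 ls "$@"
}

walk() {
  local prefix="$1"
  local listing
  listing=$(AWSLS "$prefix")
  # print objects that live directly under this prefix (lines not starting with PRE)
  printf '%s\n' "$listing" | awk -v p="$prefix" '$1 != "PRE" && NF {print p $NF}'
  # recurse into every sub-prefix ("PRE something/")
  printf '%s\n' "$listing" | awk '$1 == "PRE" {print $2}' | while read -r d; do
    walk "${prefix}${d}"
  done
}

walk "us-bucket/" > /tmp/us-bucket/us-bucket_full_listing.txt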

Delete files after awk command

I'm trying to do an ls in a bucket, print the folder names, remove the /, sort them, and take the last 3, which are the most recent ones. Then I want to remove all folders except those 3 most recent ones.
for i in $(aws s3 ls s3://portal-storage-site | awk -F '-' '{print $2}' | sed 's/\///g'| sort -n| tail -3| xargs| sed 's/ /|/g');
do aws s3 ls s3://portal-storage-site| grep -Ev "PRE\s.*\-($i)\/" | awk '{print $2}'|xargs echo "aws s3 ls s3://portal-storage-site/"; done
I expect the output to be the execution of
aws s3 ls s3://portal-storage-site/2e5d0599-120/
aws s3 ls s3://portal-storage-site/6f08a223-118/
aws s3 ls s3://portal-storage-site/ba67667e-121/
aws s3 ls s3://portal-storage-site/ba67667e-122/
but the actual output is
aws s3 ls s3://portal-storage-site/2e5d0599-119/ 2e5d0599-120/ 6f08a223-118/ ba67667e-121/ ba67667e-122/
Instead of using xargs, you can compose the second aws s3 ls command in awk and pipe it to bash:
aws s3 ls s3://portal-storage-site| grep -Ev "PRE\s.*\-($i)\/" | awk '{print "aws s3 ls s3://portal-storage-site/" $2}'| bash
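For what it's worth, the reason the original attempt printed everything on one line is that xargs, by default, passes all of its input as arguments to a single echo. If you would rather keep xargs, a sketch using -I{} (one command per folder) would be:
aws s3 ls s3://portal-storage-site | grep -Ev "PRE\s.*\-($i)\/" | awk '{print $2}' | xargs -I{} echo "aws s3 ls s3://portal-storage-site/{}"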

aws cli - download rds postgres logs in a bash script

I wrote a simple bash script to download my RDS postgres files.
The kicker is that it all works fine in the terminal, but when I try the same thing in the script I get an error:
An error occurred (DBLogFileNotFoundFault) when calling the DownloadDBLogFilePortion operation: DBLog File: "error/postgresql.log.2017-11-05-23", is not found on the DB instance
The command in question is this:
aws rds download-db-log-file-portion --db-instance-identifier foobar --starting-token 0 --output text --log-file error/postgresql.log.2017-11-05-23 >> test.log
It all works fine in the terminal, but when I put the exact same line in the bash script I get an error saying the log file is not found - which is nonsense, it is there.
This is the bash script:
download_generate_report() {
for filename in $( aws rds describe-db-log-files --db-instance-identifier $1 | awk {'print $2'} | grep $2 )
do
echo $filename
echo $1
aws rds download-db-log-file-portion --db-instance-identifier $1 --starting-token 0 --output text --log-file $filename >> /home/ubuntu/pgbadger_script/postgres_logs/postgres_$1.log.$2
done
}
Tnx,
Tom
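For context, the function takes the DB instance identifier as $1 and a date fragment to grep for as $2, so a call would look something like this (values are illustrative, matching the command and log file above):
download_generate_report foobar 2017-11-05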
I re-wrote your script a little and it seems to work for me. It barked about grep. This uses jq.
for filename in $( aws rds describe-db-log-files --db-instance-identifier $1 | jq -r '.DescribeDBLogFiles[] | .LogFileName' )
do
aws rds download-db-log-file-portion --db-instance-identifier $1 --output text --no-paginate --log-file $filename >> /tmp/postgres_$1.log.$2
done
Thank you Ian. I had an issue with AWS CLI 2.4 because the log files were downloaded truncated.
To solve this I replaced --no-paginate with --starting-token 0; more info in the RDS reference.
Finally, in bash:
#!/bin/bash
set -x
for filename in $( aws rds describe-db-log-files --db-instance-identifier $1 | jq -r '.DescribeDBLogFiles[] | .LogFileName' )
do
aws rds download-db-log-file-portion --db-instance-identifier $1 --output text --starting-token 0 --log-file $filename >> $filename
done
