AWS S3 download all files with same name with shell - bash

There are files in an AWS S3 bucket that I would like to download; they all have the same name but sit in different subfolders. No credentials are required to connect to this bucket or to download from it. I would like to download all the files called "B01.tif" under s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/ and save each one under the name of the subfolder it is in (for example: S2A_7VEG_20170205_0_L2AB01.tif).
Path example:
s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/2017/2/S2A_7VEG_20170205_0_L2A/B01.tif
I was thinking of using a bash script that reads the output of ls, downloads each file with cp, and saves it on my PC under a name generated from the path.
Command to use ls:
aws s3 ls s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/2017/2/ --no-sign-request
Command to download a single file:
aws s3 cp s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/2017/2/S2A_7VEG_20170205_0_L2A/B01.tif --no-sign-request B01.tif
Attempt to download multiple files:
VAR1=B01.tif
for a in s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/:
  for b in s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/2017/:
    for c in s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/2017/2/:
      NAME=$(aws s3 ls s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/$a$b$c | head -1)
      aws s3 cp s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/$NAME/B01.tif --no-sign-request $NAME$VAR1
    done
  done
done
I don't know if there is a simple way to go automatically through every subfolder and save the files directly. I know my ls command is broken, because if there are multiple subfolders it will only take the first one as a variable.

It's easier to do this in a programming language than as a shell script.
Here's a Python script that will do it for you:
import boto3

BUCKET = 'sentinel-cogs'
PREFIX = 'sentinel-s2-l2a-cogs/7/V/EG/'
FILE = 'B01.tif'

s3_resource = boto3.resource('s3')

for object in s3_resource.Bucket(BUCKET).objects.filter(Prefix=PREFIX):
    if object.key.endswith(FILE):
        # Turn the rest of the key into a flat local filename, e.g.
        # 2017/2/S2A_7VEG_20170205_0_L2A/B01.tif -> 2017_2_S2A_7VEG_20170205_0_L2A_B01.tif
        target = object.key[len(PREFIX):].replace('/', '_')
        object.Object().download_file(target)
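That said, the shell approach from the question can work too if you let aws s3 ls list the prefix recursively and derive each local name from the key. A rough sketch, assuming the same bucket, prefix, and --no-sign-request access as above (keys are assumed to contain no spaces):

#!/bin/bash
# Sketch: list every key under the prefix, keep only the B01.tif objects,
# and download each one under a name built from its parent folder.
BUCKET=sentinel-cogs
PREFIX=sentinel-s2-l2a-cogs/7/V/EG/
FILE=B01.tif

aws s3 ls "s3://$BUCKET/$PREFIX" --recursive --no-sign-request \
  | awk '{print $4}' \
  | grep "/$FILE\$" \
  | while read -r key; do
      # e.g. .../2017/2/S2A_7VEG_20170205_0_L2A/B01.tif -> S2A_7VEG_20170205_0_L2AB01.tif
      folder=$(basename "$(dirname "$key")")
      aws s3 cp "s3://$BUCKET/$key" "${folder}${FILE}" --no-sign-request
    done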

Related

how to rename a file and move from one s3 bucket to another

I am using this command but am unable to rename and move a file from one S3 folder to another.
I want to rename the file as well as move it.
Here anagaza.csv000 is the file I want to rename and move to the movedData folder under the name angaza.csv.
Once that works, I also want a more generic approach: instead of a literal file name like anagaza.csv000 I would like to use a wildcard like anaga*.* or "wildcard.csv*", but the destination should still get the name I choose, which is angaza.csv.
aws s3 --recursive mv
s3://mygluecrawlerbucket/angaza_accounts/to_be_processed/anagaza.csv000
s3://mygluecrawlerbucket/angaza_accounts/movedData/angaza.csv --profile default
You can generate the new object names with a shell script (mv or rename) and then rename the objects in S3 with aws s3 mv, one by one.
For generating the new names you need to write the regex or similar logic yourself; for renaming with regex, this SO answer will help.
Here is the sample.sh I used.
list_objects=$(aws s3 ls s3://oldbucket | awk '{print $4}')
for old_object_name in $list_objects; do
    new_object_name=$(...) # mv or rename
    aws s3 mv s3://oldbucket/$old_object_name s3://newbucket/$new_object_name
done
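Applying that pattern to the paths from the question, a rough sketch (note that if more than one source object matches the pattern, each move overwrites the same destination key):

# Sketch: move every object whose name starts with "anagaza.csv" out of
# to_be_processed/ and give it the fixed destination name angaza.csv.
src="s3://mygluecrawlerbucket/angaza_accounts/to_be_processed/"
dst="s3://mygluecrawlerbucket/angaza_accounts/movedData/angaza.csv"

for key in $(aws s3 ls "$src" --profile default | awk '{print $4}' | grep '^anagaza\.csv'); do
    aws s3 mv "${src}${key}" "$dst" --profile default
done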

aws s3 sync --recursive not working in windows

I am using the AWS CLI in the Windows cmd prompt and run the aws s3 sync command, but it does not work with --recursive; it shows unknown options: --recursive.
aws s3 sync --recursive localpath s3://bucket-name
python --version: Python 3.6.5
aws --version: aws-cli/1.15.38 Python/2.7.9 Windows/2012Server botocore/1.10.38
Please help
The aws s3 sync command is already recursive, so there is no need for a recursive option, and there isn't one:
Syncs directories and S3 prefixes. Recursively copies new and updated
files from the source directory to the destination. Only creates
folders in the destination if they contain one or more files.
In addition, the sync command only copies things that don't already exist on the destination. If you point it at a folder, it will recursively sync everything inside that doesn't already exist on your target destination. This is different from the aws s3 cp command: cp copies whatever you tell it to, regardless of whether it already exists on the target, and it takes a --recursive option for recursively copying folders.
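In other words, using the paths from the question:

# sync is recursive on its own - no flag needed
aws s3 sync localpath s3://bucket-name

# cp needs --recursive to copy a whole folder
aws s3 cp localpath s3://bucket-name --recursive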

aws s3 cp to a local file was not replacing file

I have a shell script that runs aws s3 cp s3://s3file /home/usr/localfile. The file already exists in that directory, so the cp command is essentially there to fetch the latest version from S3.
However, I noticed today that the file was not the latest version; it didn't match the file on S3. Looking at the shell script's stdout from the last two runs, it looks like the command ran - the output is: download: s3://s3file to usr/localfile. But when I compared the copies, they didn't match, and the modification timestamp on the local file (viewed via WinSCP, a file transfer client) didn't change either.
I manually ran the command in a shell just now and it copied the file from S3 to the local machine, successfully getting the latest copy.
Do I need to add a specific option for this, or is it typical behavior for aws s3 cp not to overwrite an existing file?
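One way to check whether the local copy really matches the object in S3 is to compare checksums and timestamps by hand. A debugging sketch only; the bucket and key names are placeholders, and the ETag equals the file's MD5 only for single-part, unencrypted uploads:

# Compare the S3 object's metadata with the local file.
aws s3api head-object --bucket my-bucket --key path/to/s3file   # shows LastModified and ETag
md5sum /home/usr/localfile                                      # compare against the ETag (single-part uploads only)
ls -l /home/usr/localfile                                       # local modification time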

Bash - how can I reference a file as a variable?

I am using a script to generate a file with the current date appended, then exporting that file to Amazon S3. The script is:
#!/bin/bash
#python script that exports file in the form of "myfile{current date}.csv" to home directory
#example: myfile20150316.csv
python some_script.py
#recreate file name exported by some_script.py and save as a variable
now=$(date "+%Y%m%d")
file_name="myfile"
file_ext=".csv"
_file="$file_name$now$file_ext"
#export to file created by python to S3
s3cmd put $_file S3://myS3/bucket/$_file
The file is created by the Python script and lands in my home directory, but it is not exported to S3. My assumption is that I am incorrectly referencing the file in my s3cmd command.
This is not really an S3 question. In a generic sense, how should I reference variables that contain file names so that I can use them in subsequent commands?
Please help. Thanks.
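A quick way to sanity-check the variable and the file before the upload, reusing the names from the script above and quoting the variable so whitespace can't split it:

# Debugging sketch: confirm the variable holds the expected name
# and that the file actually exists before handing it to s3cmd.
echo "Uploading: $_file"
ls -l "$_file" || { echo "File not found: $_file" >&2; exit 1; }

# Quote the variable in the s3cmd call as well
s3cmd put "$_file" "s3://myS3/bucket/$_file"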

Aws Datapipeline: List content of Output Bucket in ShellCommandActivity

How can I list the files which are contained in my output Bucket in a Shell Script?
ls ${OUTPUT1_STAGING_DIR}
does not work, as I get the message that there's no file or directory by this name.
I am sure there is an easy way to do this but I can't seem to find a solution.
From my experience using Data Pipeline, that is not supported. You can only read from the input bucket directory. The output bucket directory is just a place you can write files to; they will later be copied over into S3.
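Based on that, a sketch of what does work inside a ShellCommandActivity with staging enabled (INPUT1_STAGING_DIR assumes an input data node is attached, and result.txt is just an example name):

# Reading the input staging directory is supported:
ls "${INPUT1_STAGING_DIR}"

# The output staging directory is a drop zone: write files into it and
# Data Pipeline copies them to the output S3 location after the activity finishes.
echo "done" > "${OUTPUT1_STAGING_DIR}/result.txt"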
