I have a shell script that is running aws s3 cp s3://s3file /home/usr/localfile. The file already exists in that directory, so the cp command is essentially getting the latest copy from S3 to get the latest version.
However, I noticed today that the file was not the latest version; it didn't match the file on S3. Looking at the shell script's stdout from the last two runs, it looks like the command ran. The output is: download: s3://s3file to usr/localfile. But when I compared the copies, they didn't match. The modified timestamp on the local file, viewed via WinSCP (a file transfer client), didn't change either.
I manually ran the command in a shell just now and it copied the file from S3 to the local machine and successfully got the latest copy.
Do I need to add a specific option for this, or is it typical behavior for aws s3 cp to not overwrite an existing file?
I have a bash script file where each line is a curl command to download a file. This bash file is in a Google bucket.
I would like to execute the file either directly from the storage and copy its downloaded contents there or execute it locally and directly copy its content to the bucket.
Basically, I do not want to have these files on my local machine. I have tried things along these lines, but it either failed or simply downloaded everything locally.
gsutil cp gs://bucket/my_file.sh - | bash gs://bucket/folder_to_copy_to/
Thank you!
To do so, the bucket needs to be mounted on the pod (the pod would see it as a directory).
If the bucket supports NFS, you would be able to mount it as shown here.
Also, there is another way as shown in this question.
Otherwise, you would need to copy the script to the pod, run it, then upload the generated files to the bucket, and lastly clean everything up.
The better option is to use a filestore which can be easily mounted using CSI drivers as mentioned here.
There are files in an AWS S3 bucket that I would like to download. They all have the same name but are in different subfolders. No credentials are required to connect to this bucket and download from it. I would like to download all the files called "B01.tif" in s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/, and save them with the name of the subfolder they are in (for example: S2A_7VEG_20170205_0_L2AB01.tif).
Path example:
s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/2017/2/S2A_7VEG_20170205_0_L2A/B01.tif
I was thinking of using a bash script that prints the output of ls to download the file with cp, and save it on my pc with a name generated from the path.
Command to use ls:
aws s3 ls s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/2017/2/ --no-sign-request
Command to download a single file:
aws s3 cp s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/2017/2/S2A_7VEG_20170205_0_L2A/B01.tif --no-sign-request B01.tif
Attempt to download multiple files:
VAR1=B01.tif
for a in s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/:
for b in s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/2017/:
for c in s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/2017/2/:
NAME=$(aws s3 ls s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/$a$b$c | head -1)
aws s3 cp s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/$NAME/B01.tif --no-sign-request $NAME$VAR1
done
done
done
I don't know if there is a simple way to go automatically through every subfolder and save the files directly. I know my ls command is broken, because if there are multiple subfolders it will only take the first one as a variable.
It's easier to do this in a programming language rather than as a Shell script.
Here's a Python script that will do it for you:
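import boto3
from botocore import UNSIGNED
from botocore.config import Config
BUCKET = 'sentinel-cogs'
PREFIX = 'sentinel-s2-l2a-cogs/7/V/EG/'
FILE = 'B01.tif'
# The bucket is public, so send unsigned requests (the equivalent of --no-sign-request)
s3_resource = boto3.resource('s3', config=Config(signature_version=UNSIGNED))
for obj in s3_resource.Bucket(BUCKET).objects.filter(Prefix=PREFIX):
    if obj.key.endswith(FILE):
        # e.g. 2017/2/S2A_7VEG_20170205_0_L2A/B01.tif -> 2017_2_S2A_7VEG_20170205_0_L2A_B01.tif
        target = obj.key[len(PREFIX):].replace('/', '_')
        obj.Object().download_file(target)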
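The script above names each download after the whole path below the prefix. If you want the exact naming from the question (only the immediate subfolder, followed by the file name), a small helper along these lines might do it. This is only a sketch: target_name and its sep parameter are names I made up for illustration, and the key is the example path from the question.

```python
import posixpath


def target_name(key, sep=""):
    """Build '<immediate subfolder><sep><file name>' from an S3 key."""
    # S3 keys always use '/', so posixpath splits them the same way on any OS.
    folder, filename = posixpath.split(key)
    return posixpath.basename(folder) + sep + filename


key = "sentinel-s2-l2a-cogs/7/V/EG/2017/2/S2A_7VEG_20170205_0_L2A/B01.tif"
print(target_name(key))  # S2A_7VEG_20170205_0_L2AB01.tif
```

You would then pass target_name(key) to download_file in place of the target computed in the loop.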
I have one s3 bucket which contains zip files.
I have a script that downloads the zip files to my local server, unzips them, and uploads them to another S3 bucket.
How can I set a watermark so that I know which file I downloaded last, so I won't need to keep all the files locally or download all the files each time the script runs?
I'm using the aws s3 sync command, which as far as I understand should copy only new files. Am I right?
aws s3 sync $gcs3$gcRegion/$gcTech/$gcPrinterFamily/$gcPrinterType/$gcPrinterName/ $dir
The AWS Command-Line Interface (CLI) aws s3 sync command will copy any files that are not present in the destination, or that differ in size or have a newer modified time than the copy in the destination.
So, you either need to keep all previously-downloaded files, or you need another way to keep track of the files that were downloaded.
Instead, I would recommend writing your own program that:
1. Downloads all files from the S3 bucket with a LastModified timestamp after a stored timestamp
2. Stores the current time
3. Unzips the files and copies them to the other S3 bucket
4. Deletes the zip files and unzipped files
So, the program will need to remember the last time it downloaded files, but it will not need to remember which files it downloaded. Be careful: S3 stores times in UTC, so you'll need to convert from your local timezone. Or, simply remember the highest LastModified value of the files you downloaded.
To obtain a list of files since a certain LastModified date, you could use the AWS CLI:
aws s3api list-objects --bucket jstack-a --query "Contents[?LastModified>='2019-04-11'].[Key]" --output text
However, I would recommend writing a Python program for the above activities, since it would be easier than writing command-line scripts.
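As a rough sketch of the watermark logic only (no download, unzip or upload), something like the following could work. It assumes each listing entry is a dict with Key and LastModified, which is the shape of the Contents entries that boto3's list_objects_v2 returns; select_new_objects and the sample listing are hypothetical names and data, not part of any library.

```python
from datetime import datetime, timezone


def select_new_objects(objects, watermark):
    """Return the keys modified after `watermark` and the new watermark."""
    newer = [o for o in objects if o["LastModified"] > watermark]
    # Process oldest first so an interrupted run can resume from the last file done.
    keys = [o["Key"] for o in sorted(newer, key=lambda o: o["LastModified"])]
    # Highest LastModified seen; unchanged if nothing new was found.
    new_watermark = max((o["LastModified"] for o in newer), default=watermark)
    return keys, new_watermark


# Hypothetical listing; the timestamps are timezone-aware and in UTC, as S3 reports them.
listing = [
    {"Key": "a.zip", "LastModified": datetime(2019, 4, 10, tzinfo=timezone.utc)},
    {"Key": "b.zip", "LastModified": datetime(2019, 4, 12, tzinfo=timezone.utc)},
]
keys, watermark = select_new_objects(listing, datetime(2019, 4, 11, tzinfo=timezone.utc))
print(keys)  # ['b.zip']
```

The returned watermark would be saved somewhere durable (a small local file, for example) and read back at the start of the next run. Because the comparison is strictly "after", an object that shares the exact LastModified value of the saved watermark would be skipped, so treat that as a known limitation of this sketch.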
My task is to upload CSV files from the local database to the Google Cloud storage.
To do this, I first copy them to my desktop and then upload them to the Google Cloud storage.
I want this to be done automatically, without my participation. Therefore, I created a CMD file that will be run by Task Scheduler. The structure of the CMD file is as follows:
gsutil cp C:\Users\Myname\Desktop\test\*.csv gs://my-bucket
gsutil rm C:\Users\Myname\Desktop\test\*.csv
But after loading the data into Google Cloud Storage, it does not delete the CSV files.
However, if you run the delete in a separate command, it successfully deletes files.
Just:
gsutil rm C:\Users\Myname\Desktop\test\*.csv
But I want the upload and removal code to be in one file.
I also tried this way (but it did not help me either):
gsutil cp C:\Users\Myname\Desktop\test\*.csv gs://my-bucket
del C:\Users\Myname\Desktop\test\*.csv
What are the solutions to this problem?
The gsutil mv command is designed for this use case.
Note, however, the docs section about atomicity. Especially with moving from your local filesystem to the cloud, there is no way to upload and delete atomically, so the command will first upload, verify the file is stored in the cloud, and then delete the local file.
The problem is caused by gsutil being a script. On Windows, this script (gsutil) exits and stops further processing of commands in your batch file.
The solution is to add the word call in front of gsutil:
call gsutil cp C:\Users\Myname\Desktop\test\*.csv gs://my-bucket
Next, do not use gsutil to delete a local file. Use del instead.
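If you would rather drive this from Python than from a batch file, the same "upload first, delete only on success" order could be sketched roughly as below. It assumes gsutil is on PATH; upload_then_delete and its run parameter are names made up for this example, and run exists only so the command runner can be swapped out.

```python
import glob
import os
import shutil
import subprocess


def upload_then_delete(pattern, destination, run=subprocess.run):
    """Upload files matching `pattern`, then delete them only if the upload succeeded."""
    files = sorted(glob.glob(pattern))
    if not files:
        return []
    # shutil.which honours PATHEXT on Windows, so it can resolve gsutil.cmd.
    gsutil = shutil.which("gsutil") or "gsutil"
    # check=True raises CalledProcessError on a non-zero exit status,
    # so the deletion below is skipped when the upload fails.
    run([gsutil, "cp", *files, destination], check=True)
    for path in files:
        os.remove(path)
    return files
```

With the paths from the question, the call would be upload_then_delete(r"C:\Users\Myname\Desktop\test\*.csv", "gs://my-bucket").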
I am using the AWS CLI in Windows cmd. When I run the aws s3 sync command with --recursive, it does not work and shows: unknown options: --recursive
aws s3 sync --recursive localpath s3://bucket-name
python --version
python 3.6.5
aws --version
aws-cli/1.15.38 Python/2.7.9 Windows/2012Server botocore/1.10.38
Please help
The aws s3 sync command is already recursive, so there is no need for a recursive option, and there isn't one:
Syncs directories and S3 prefixes. Recursively copies new and updated
files from the source directory to the destination. Only creates
folders in the destination if they contain one or more files.
In addition, the sync command only copies files that are missing from the destination or that differ from the copy already there. If you point it at a folder, it recursively syncs everything inside that meets that condition. This is different from the aws s3 cp command, which copies whatever you tell it to, regardless of whether it already exists on the target. The cp command takes a --recursive option for recursively copying folders.
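To make the difference concrete, here is a simplified model of the two decisions. It is a sketch, not the CLI's actual code: it assumes the default comparison documented for uploads (copy when the destination is missing, the sizes differ, or the source is newer) and ignores options such as --size-only and --exact-timestamps. FileInfo, sync_would_copy and cp_would_copy are made-up names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class FileInfo:
    size: int
    modified: datetime


def sync_would_copy(source: FileInfo, dest: Optional[FileInfo]) -> bool:
    """Simplified sync rule: copy if missing, different size, or source is newer."""
    if dest is None:
        return True
    return source.size != dest.size or source.modified > dest.modified


def cp_would_copy(source: FileInfo, dest: Optional[FileInfo]) -> bool:
    """cp copies unconditionally, overwriting whatever is at the destination."""
    return True


local = FileInfo(size=100, modified=datetime(2018, 6, 1))
remote = FileInfo(size=100, modified=datetime(2018, 6, 2))
print(sync_would_copy(local, remote))  # False
print(cp_would_copy(local, remote))    # True
```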