s3cmd put --preserve flag does not preserve file creation/modified date when copied to S3 bucket

I am copying files from an AWS EC2 instance to an AWS S3 bucket with the --preserve flag, to preserve each file's creation and modification times. Once a file is copied to the bucket, however, "s3cmd ls s3://bucket-name/" lists the upload time as the file time; the original creation date-time is not preserved. I am using the following command (s3cmd put --preserve xyz.log s3://bucket-name/) to copy the file. Although the s3cmd help lists --preserve or -p as the option for preserving the date, it does not seem to be working.
Has anybody run into this kind of issue, and can you point out what I am doing wrong?
I also tried s3cmd sync, but the sync command behaves the same way, and I would prefer to use put.
s3cmd put --preserve xyz.log s3://bucket-name/
Thanks,

Please try the current upstream github.com/s3tools/s3cmd master branch. This is resolved there. Going round trip (s3cmd sync --preserve file s3://bucket/; rm file; s3cmd sync --preserve s3://bucket/file .;) now restores the atime and mtime values as stored during sync upload.
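Note that "s3cmd ls" prints S3's own LastModified value, which is always the upload time and cannot be set by the client, so any preserved times have to live in object metadata. The sketch below shows how such metadata could be decoded. It assumes the attributes are stored in an x-amz-meta-s3cmd-attrs header as slash-separated key:value pairs; that header name and format are assumptions here, so check what "s3cmd info s3://bucket-name/xyz.log" reports for your version. The function names and the sample value are made up for illustration.

```python
from datetime import datetime, timezone


def parse_s3cmd_attrs(attrs: str) -> dict:
    """Split a 'key:value/key:value' string into a dict of strings."""
    result = {}
    for pair in attrs.split("/"):
        if ":" not in pair:
            continue  # ignore anything that is not a key:value pair
        key, value = pair.split(":", 1)
        result[key] = value
    return result


def preserved_mtime(attrs: str):
    """Return the stored mtime as an aware UTC datetime, or None if absent."""
    raw = parse_s3cmd_attrs(attrs).get("mtime")
    if raw is None:
        return None
    return datetime.fromtimestamp(int(raw), tz=timezone.utc)


# Hypothetical metadata value (assumed format), not taken from a real object.
example = "atime:1398866221/mtime:1398866221/mode:33188/uid:1000"
print(preserved_mtime(example).isoformat())
```

If the header is present but the local file still gets the download time, the restore step is the part to look at, which is what the upstream fix addresses.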

Related

How to execute a bash file containing curl instructions within a Google storage bucket and directly copy the contents to the bucket?

I have a bash script file where each line is a curl command to download a file. This bash file is in a Google bucket.
I would like to execute the file either directly from the storage and copy its downloaded contents there or execute it locally and directly copy its content to the bucket.
Basically, I do not want to have these files on my local machine. I have tried things along these lines, but each attempt either failed or simply downloaded everything locally.
gsutil cp gs://bucket/my_file.sh - | bash gs://bucket/folder_to_copy_to/
Thank you!
To do so, the bucket needs to be mounted on the pod (the pod would see it as a directory).
If the bucket supports NFS, you would be able to mount it as shown here.
Also, there is another way as shown in this question.
Otherwise, you would need to copy the script to the pod, run it, then upload the generated files to the bucket, and lastly clean everything up.
The better option is to use a filestore which can be easily mounted using CSI drivers as mentioned here.
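If mounting is not an option, another way to keep the files off the local disk is to stream each download straight into the bucket: curl writes to stdout by default, and "gsutil cp - gs://..." uploads whatever arrives on stdin. A minimal sketch, assuming curl and gsutil are on PATH and that you have already pulled the URL out of each line of the script; the function names and the way the object name is derived are hypothetical.

```python
import posixpath
import subprocess
from urllib.parse import urlparse


def object_url(download_url: str, bucket_prefix: str) -> str:
    """Name the destination object after the last path segment of the URL."""
    name = posixpath.basename(urlparse(download_url).path) or "download"
    return bucket_prefix.rstrip("/") + "/" + name


def stream_to_bucket(download_url: str, bucket_prefix: str) -> int:
    """Pipe curl's stdout into 'gsutil cp -' so nothing is written locally."""
    curl = subprocess.Popen(
        ["curl", "-sSfL", download_url], stdout=subprocess.PIPE
    )
    upload = subprocess.run(
        ["gsutil", "cp", "-", object_url(download_url, bucket_prefix)],
        stdin=curl.stdout,
    )
    curl.stdout.close()
    curl_rc = curl.wait()
    # Non-zero if either side of the pipe failed.
    return curl_rc or upload.returncode
```

A non-zero return value means the object in the bucket may be empty or incomplete, so it is worth checking before moving on to the next URL.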

How to set a watermark when downloading from an S3 bucket

I have one S3 bucket which contains zip files.
I have a script which downloads the zip files to my local server, unzips them, and uploads them to another S3 bucket.
How can I set a watermark so I will know which file I downloaded last, so that I won't need to keep all the files locally or download all of them each time the script runs?
I'm using the aws s3 sync command, which as far as I understand should copy only new files. Am I right?
aws s3 sync $gcs3$gcRegion/$gcTech/$gcPrinterFamily/$gcPrinterType/$gcPrinterName/ $dir
The AWS Command-Line Interface (CLI) aws s3 sync command copies any files that are missing from the destination, or whose size or last-modified time indicates they have changed.
So, you either need to keep all previously-downloaded files, or you need another way to keep track of the files that were downloaded.
Instead, I would recommend writing your own program that:
1. Downloads all files from the S3 bucket with a LastModified timestamp after a stored timestamp
2. Stores the current time
3. Unzips the files and copies them to the other S3 bucket
4. Deletes the zip files and unzipped files
So, the program will need to remember the last time it downloaded files, but it will not need to remember which files it downloaded. Be careful — S3 stores time in UTC, so you'll need to convert your timezones. Or, simply remember the highest LastModified value of the files you downloaded.
To obtain a list of files since a certain LastModified date, you could use the AWS CLI:
aws s3api list-objects --bucket jstack-a --query "Contents[?LastModified>='2019-04-11'].[Key]" --output text
However, I would recommend writing a Python program for the above activities, since it would be easier than writing command-line scripts.
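As a rough illustration of the watermark idea, here is a minimal Python sketch of the "remember the highest LastModified" variant. It assumes boto3 for the listing (imported lazily, so the selection logic can be tried without it) and a small text file for the stored timestamp; the function names and the file-based storage are hypothetical choices, not a prescribed design.

```python
from datetime import datetime, timezone


def select_new_objects(objects, watermark):
    """Return (keys modified after the watermark, updated watermark)."""
    newer = [o for o in objects if o["LastModified"] > watermark]
    keys = [o["Key"] for o in sorted(newer, key=lambda o: o["LastModified"])]
    new_watermark = max((o["LastModified"] for o in newer), default=watermark)
    return keys, new_watermark


def load_watermark(path):
    """Read the stored timestamp; fall back to the earliest time on first run."""
    try:
        with open(path) as f:
            return datetime.fromisoformat(f.read().strip())
    except FileNotFoundError:
        return datetime.min.replace(tzinfo=timezone.utc)


def save_watermark(path, value):
    """Persist the watermark as an ISO 8601 string."""
    with open(path, "w") as f:
        f.write(value.isoformat())


def list_bucket(bucket, prefix=""):
    """Yield object dicts (Key, LastModified, ...); needs boto3 and credentials."""
    import boto3  # assumption: boto3 is installed and configured

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        yield from page.get("Contents", [])
```

A run would then load the watermark, call select_new_objects(list_bucket(...), watermark), process the returned keys, and save the new watermark only after everything succeeded. Because the comparison is strict, an object written with exactly the same timestamp as the stored watermark would be skipped; if that matters, compare with >= and also remember the keys already processed at that timestamp.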

After copying files using gsutil, they are not deleted instantly from the local storage

My task is to upload CSV files from a local database to Google Cloud Storage.
To do this, I first copy them to my desktop and then upload them to Google Cloud Storage.
I want this to be done automatically, without my participation. Therefore, I created a CMD file that will be run by Task Scheduler. The CMD file looks like this:
gsutil cp C:\Users\Myname\Desktop\test\*.csv gs://my-bucket
gsutil rm C:\Users\Myname\Desktop\test\*.csv
But after uploading the data to Google Cloud Storage, it does not delete the CSV files.
However, if I run the delete as a separate command, it successfully deletes the files:
gsutil rm C:\Users\Myname\Desktop\test\*.csv
But I want the upload and removal commands to be in one file.
I also tried this way (but it did not help either):
gsutil cp C:\Users\Myname\Desktop\test\*.csv gs://my-bucket
del C:\Users\Myname\Desktop\test\*.csv
What are the solutions to this problem?
The gsutil mv command is designed for this use case.
Note, however, the docs section about atomicity. Especially with moving from your local filesystem to the cloud, there is no way to upload and delete atomically, so the command will first upload, verify the file is stored in the cloud, and then delete the local file.
The problem is caused by gsutil being a script. On Windows, gsutil is itself a batch script, and invoking one batch script from another without call hands control over to it for good, so the remaining commands in your batch file never run.
The solution is to add the word call in front of gsutil:
call gsutil cp C:\Users\Myname\Desktop\test\*.csv gs://my-bucket
Next, do not use gsutil to delete a local file. Use del instead.
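If you would rather avoid batch-file quirks altogether, the same upload-then-delete sequence can be sketched in Python and scheduled instead of the CMD file. This is only a sketch: it assumes gsutil is on PATH, and the function name and the injectable run parameter are hypothetical. The local files are removed only when gsutil reports success.

```python
import glob
import os
import shutil
import subprocess


def upload_then_delete(pattern, bucket_url, run=subprocess.run):
    """Upload files matching pattern, then delete them only if gsutil succeeded."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        return []
    # shutil.which resolves gsutil.cmd on Windows; fall back to the bare name.
    gsutil = shutil.which("gsutil") or "gsutil"
    result = run([gsutil, "cp", *paths, bucket_url])
    if result.returncode != 0:
        raise RuntimeError(f"gsutil cp failed with exit code {result.returncode}")
    for path in paths:
        os.remove(path)
    return paths


if __name__ == "__main__":
    upload_then_delete(r"C:\Users\Myname\Desktop\test\*.csv", "gs://my-bucket")
```

Passing the expanded file list to gsutil, rather than the wildcard, means exactly the files that were uploaded are the ones deleted afterwards.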

aws s3 cp to a local file was not replacing file

I have a shell script that is running aws s3 cp s3://s3file /home/usr/localfile. The file already exists in that directory, so the cp command is essentially getting the latest copy from S3 to get the latest version.
However, I noticed today that the file was not the latest version; it didn't match the file on S3. Looking at the shell script's stdout from the last two runs, it looks like the command ran; the output is: download: s3://s3file to usr/localfile. But when I compared the copies, they didn't match. The modification timestamp on the local file, viewed via WinSCP (a file transfer client), didn't change either.
I manually ran the command in a shell just now and it copied the file from S3 to the local machine and successfully got the latest copy.
Do I need to add a specific option for this, or is it typical for aws s3 cp not to overwrite an existing file?

rsync folders where the target folder has the same files, only already compressed

I am at an impasse with my knowledge of bash scripting and rsync (over SSH).
In my use case there is a local folder with log files in it. Those log files are rotated every 24 hours and receive a date stamp in their filename (e.g. logfile.DATE), while the current one is just called logfile.
I'd like to copy those files to another (remote) server and then compress the copied log files on that remote server.
I'd like to use rsync so that no files are skipped if the script fails once or twice (so I would rather not mess with dates and date abbreviations if not necessary).
However, if I understand correctly, all files would be rsynced again, because the already rsynced files no longer "match" for the rsync algorithm once they are compressed.
How can I avoid copying the same file again when that very file is already at the remote location (only compressed)?
Does someone have an idea, or a direction I should focus my research on?
Thank you very much
best regards
When you do the rotation, you rename logfile to logfile.DATE. As part of that operation, use ssh mv to do the same on the archive server at the same time (you can even tell the server to compress it then).
Then you only ever need to rsync the current logfile.
For example, your rotate operation goes from this:
mv logfile logfile.$(date +%F)
To this:
mv logfile logfile.$(date +%F)
ssh archiver "mv logfile logfile.$(date +%F) && gzip logfile.$(date +%F)"
And your rsync job goes from this:
rsync logdir/ archiver:
To this:
rsync logdir/logfile archiver:
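One thing to watch with the rotate step is that the rename and the compression must both run on the archive server, and the date stamp should be computed once so the local and remote names cannot disagree around midnight. The sketch below builds the two commands in Python under those assumptions; the host name archiver and the file name logfile come from the example above, while the function name is hypothetical.

```python
import shlex
from datetime import date


def rotate_commands(logfile="logfile", host="archiver", today=None):
    """Build the local rename and the matching remote rename-and-compress."""
    stamp = (today or date.today()).isoformat()  # same format as date +%F
    dated = f"{logfile}.{stamp}"
    # The whole remote command is a single ssh argument, so '&&' is
    # interpreted by the remote shell and gzip runs on the archive server.
    remote = (
        f"mv {shlex.quote(logfile)} {shlex.quote(dated)}"
        f" && gzip {shlex.quote(dated)}"
    )
    return [["mv", logfile, dated], ["ssh", host, remote]]


if __name__ == "__main__":
    for argv in rotate_commands():
        print(" ".join(shlex.quote(part) for part in argv))
```

Each returned list can be passed to subprocess.run(argv, check=True) in order; the main block here only prints the commands so they can be inspected first.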
