Parallel uploads to the same s3 bucket directory with s3cmd - bash

I'm trying to run the following code, but the uploaded index.js turns out to be corrupted.
Any idea why?
gzip dist/production/index.js
mv dist/production/index.js.gz dist/production/index.js
s3cmd --access_key="$S3_ACCESS_KEY" --secret_key="$S3_SECRET_KEY" \
--acl-public --no-mime-magic --progress --recursive \
--exclude "dist/production/index.js" \
put dist/production/ \
"s3://${BUCKET}/something/${BUILD_IDENTIFIER}/production/" &
s3cmd --access_key="$S3_ACCESS_KEY" --secret_key="$S3_SECRET_KEY" \
--acl-public --no-mime-magic --progress --recursive \
--add-header="Content-Encoding:gzip" \
put dist/production/index.js \
"s3://${BUCKET}/something/${BUILD_IDENTIFIER}/production/" &
wait
Notice the & at the end of the two commands, which runs the two uploads to the same location in parallel.
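In case it helps to see the pattern in isolation, here is a minimal sketch of backgrounding two jobs and waiting for both; the two function names are placeholders, not part of the real script:
upload_rest()  { sleep 2; echo "bulk upload finished"; }     # placeholder for the recursive put
upload_index() { sleep 1; echo "index.js upload finished"; } # placeholder for the single-file put
upload_rest &
upload_index &
wait   # returns only after both background jobs have exited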
Edit:
It works fine without parallelizing the uploads and running them in the background. I wanted to make the process faster, so I upload the heavy gzipped index.js while the other files are being uploaded.
Edit2:
What I get in the uploaded index.js is gibberish content like this:
��;mS�H���W �7�i"���k��8̪
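One way to check whether the gzip header actually reached the uploaded object is to inspect its response headers; a sketch, assuming the object is public and served from a virtual-hosted-style URL like the one below:
# Expect "Content-Encoding: gzip" on the gzipped object; if the header is missing,
# the raw gzip bytes are rendered as gibberish.
curl -sI "https://${BUCKET}.s3.amazonaws.com/something/${BUILD_IDENTIFIER}/production/index.js" | grep -i 'content-encoding'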
Edit3:
Looks like the problem was with how I used --exclude. It excludes relative to the uploaded folder, not to the working directory.
--exclude "dist/production/index.js" ==> --exclude "index.js"
fixed it.
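For completeness, a sketch of the corrected script with the fixed --exclude pattern (same variables as above, untested, so treat it as a starting point):
gzip dist/production/index.js
mv dist/production/index.js.gz dist/production/index.js
# Upload everything except index.js; the pattern is relative to the uploaded folder.
s3cmd --access_key="$S3_ACCESS_KEY" --secret_key="$S3_SECRET_KEY" \
--acl-public --no-mime-magic --progress --recursive \
--exclude "index.js" \
put dist/production/ "s3://${BUCKET}/something/${BUILD_IDENTIFIER}/production/" &
# Upload the gzipped index.js with the matching Content-Encoding header.
s3cmd --access_key="$S3_ACCESS_KEY" --secret_key="$S3_SECRET_KEY" \
--acl-public --no-mime-magic --progress \
--add-header="Content-Encoding:gzip" \
put dist/production/index.js "s3://${BUCKET}/something/${BUILD_IDENTIFIER}/production/" &
wait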

Isn't your problem with this line?
mv dist/production/index.js.gz dist/production/index.js
You are renaming the gzipped file to index.js, so what gets uploaded is gzip data, not the plain index.js text file.
Hope it helps.
EDIT1:
If you are doing it on purpose, why not keep the .gz extension? Extensions matter a lot when the browser handles the file, so leave it as dist/production/index.js.gz instead of renaming it to index.js.
If you use plain s3 to download and verify the hash, they should be the same file. I did verify it.
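A sketch of one way to do that check with the AWS CLI (using the bucket/key layout from the question; assumes credentials are configured):
# Download the object byte-for-byte and compare checksums with the local gzipped file.
aws s3 cp "s3://${BUCKET}/something/${BUILD_IDENTIFIER}/production/index.js" /tmp/index.js.downloaded
md5sum dist/production/index.js /tmp/index.js.downloaded   # the two sums should match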

Related

How to keep AWS cli sync from copying files that are listed in the --exclude switch

I am uploading my photos from a Windows machine to an S3 bucket and each directory has a "Thumbs.db" file which I don't need to copy to the S3 bucket.
I have tried several variations of the --exclude switch to no avail. The documentation, both in the help and online, mentions using --exclude, but nothing seems to be excluded and the Thumbs.db files keep getting uploaded. Any clues as to what I'm doing wrong?
aws s3 sync "c:/users/myaccount/pictures" "s3://bucketname/pictures" --exclude "Thumbs.db*"
But in each directory the "Thumbs.db" file keeps getting uploaded to the S3 bucket.
--exclude "Thumbs.db*" will exclude only root level Thumbs.db.* files. If they are in sub-folders, the sub-folders' names will be considered as part of the file key/name (e.g. ./a/b/Thumbs.db.info which is not coveted by "Thumbs.db*".
Thus you should try:
--exclude "*Thumbs.db*"

Copy files from source to destination but deleting any files in destination but NOT in source

So I am cloning one folder to another using Bash. Currently my script is recursive and noclobber. Works great.
!cp -r -n /content/gdrive/Shared\ drives/Source/. /content/gdrive/Shared\ drives/Destination
This copies just fine. I am just looking for a way to delete any files that are NOT on the Source drive but ARE on the Destination drive. Maybe I need an entirely different script method?
Edit. I ended up using
!rsync -v -r --ignore-existing /gdrive/Shared\ drives/Source/. /gdrive/Shared\ drives/Destination --delete
Seems to be working for now. I was using -u but it seemed to be re-copying files just because the date changed, not the file itself. Thanks 1218985 for the help!
You can do it like this with rsync:
rsync -a --delete "/content/gdrive/Shared drives/Source/" "/content/gdrive/Shared drives/Destination/"
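If you want to preview what would be copied and deleted before committing, rsync's --dry-run flag is handy; a sketch:
# -n/--dry-run shows the planned transfers and deletions without changing anything.
rsync -a -v -n --delete "/content/gdrive/Shared drives/Source/" "/content/gdrive/Shared drives/Destination/"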

How to extract and stream .tar.xz directly to s3 bucket without saving locally

I have a very large (~300GB) .tar.gz file. Upon extracting it (with tar -xzvf file.tar.gz), it yields many .json.xz files. I wish to extract and upload the raw json files to s3 without saving locally (as I don't have space to do this). I understand I could spin up an ec2 instance with enough space to extract and upload the files, but I am wondering how (or if) it may be done directly.
I have tried various versions of tar -xzvf file.tar.gz | aws s3 cp - s3://the-bucket, but this is still extracting locally; also, it seems to be resulting in json.xz files, and not raw json. I've tried to adapt this response from this question which zips and uploads a file, but haven't had any success yet.
I'm working on Ubuntu 16.04 and am quite new to Linux, so any help is much appreciated!
I think this is how I would do it. There may be more elegant/efficient solutions:
tar --list -zf file.tar.gz | while read -r item
do
    # Extract this member to stdout and stream it straight to S3.
    tar -xzOf file.tar.gz "$item" | aws s3 cp - "s3://the-bucket/$item"
done
So you're iterating over the files in the archive, extracting them one-by-one to stdout and uploading them directly to S3 without first going to disk.
This assumes there is nothing funny going on with the names of the items in your tar file (no spaces, etc.).
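If the goal is raw .json objects rather than .json.xz, a variation of the same loop could decompress each member in the pipeline; a sketch, assuming xz-utils is available and every member ends in .json.xz:
tar --list -zf file.tar.gz | while read -r item
do
    # Extract the member, decompress it on the fly, and upload under a key without the .xz suffix.
    tar -xzOf file.tar.gz "$item" | xz -dc | aws s3 cp - "s3://the-bucket/${item%.xz}"
done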

After copying files using gsutil, they are not deleted instantly from the local storage

My task is to upload CSV files from the local database to the Google Cloud storage.
To do this, I first copy them to my desktop and then upload them to the Google Cloud storage.
I want this to be done automatically, without my participation. Therefore, I created a CMD file that will be run by Task Scheduler. The structure of the CMD file is as follows:
gsutil cp C:\Users\Myname\Desktop\test\*.csv gs://my-bucket
gsutil rm C:\Users\Myname\Desktop\test\*.csv
But after loading the data into Google Cloud Storage, it does not delete the CSV files.
However, if you run the delete in a separate command, it successfully deletes files.
Just:
gsutil rm C:\Users\Myname\Desktop\test\*.csv
But I want the upload and removal commands to be in one file.
I also tried this way (but it did not help me either):
gsutil cp C:\Users\Myname\Desktop\test\*.csv gs://my-bucket
del C:\Users\Myname\Desktop\test\*.csv
What are the solutions to this problem?
The gsutil mv command is designed for this use case.
Note, however, the docs section about atomicity. Especially with moving from your local filesystem to the cloud, there is no way to upload and delete atomically, so the command will first upload, verify the file is stored in the cloud, and then delete the local file.
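A sketch of the scheduled CMD file reduced to a single command with that approach (same local path and bucket as in the question):
rem gsutil mv uploads each CSV, verifies the upload, and then deletes the local copy.
gsutil mv C:\Users\Myname\Desktop\test\*.csv gs://my-bucket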
The problem is caused by gsutil being a script. On Windows, the gsutil script exits and stops further processing of commands in your batch file.
The solution is to add the word call in front of gsutil:
call gsutil cp C:\Users\Myname\Desktop\test\*.csv gs://my-bucket
Next, do not use gsutil to delete a local file. Use del instead.
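Putting both suggestions together, a sketch of the corrected CMD file (paths as in the question):
rem "call" returns control to this batch file after the gsutil script finishes.
call gsutil cp C:\Users\Myname\Desktop\test\*.csv gs://my-bucket
rem Delete the local copies with the native del command rather than gsutil rm.
del C:\Users\Myname\Desktop\test\*.csv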

Rsync including all files

After some reading and trying to get rsync to copy over only certain types of files using the --include option, I can't seem to get it to work.
I run the following command:
rsync -zarv -vvv -e ssh --prune-empty-dirs --delete --include="*/" --include="*.csv" \
--include="*.hdf5" --include="*.pickle" --include="*.tar.gz" --include="*.bin" \
--include="*.zip" --include="*.npz" --exclude="*" . user@host.com:/rsync
But at the target it backs up every file I have in the directory and subdirectories. --delete-before and --delete-after do not delete files like .txt or .py. I have also tried putting --exclude="*" before the extension includes, but I am running 2.6.9, so as far as I understand it should come after.
Deleting files on the host machine just syncs them again, for whatever reason I don't know.
Your command looks fine, although try using --delete-excluded instead of --delete.
--delete-excluded - also delete excluded files from destination dirs
It should eliminate any files that are --excluded and not --included on the destination.
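A sketch of the original command with that change, plus --dry-run to preview the deletions first (drop --dry-run for the real run):
rsync -zarv -vvv -e ssh --prune-empty-dirs --delete-excluded --dry-run \
--include="*/" --include="*.csv" --include="*.hdf5" --include="*.pickle" \
--include="*.tar.gz" --include="*.bin" --include="*.zip" --include="*.npz" \
--exclude="*" . user@host.com:/rsync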
Sorry to have bothered you. This was a bash issue, not a command issue.
I was running the command with:
exec $COMMAND
instead of:
eval $COMMAND
This produced who knows what kind of error, but executing the command manually (after printing it) and running it correctly in bash made it work. Deleting items still seems flaky, but that is something I can experiment with.
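In case the exec/eval distinction trips up anyone else, a minimal sketch of the difference (the COMMAND string here is a trivial stand-in; run it as a script, not in an interactive shell):
#!/bin/bash
COMMAND='echo hello && echo world'
eval "$COMMAND"   # the shell re-parses the string, so && works: prints "hello" then "world"
exec $COMMAND     # replaces the script's process with echo; "&&" is just a literal argument,
                  # so it prints "hello && echo world" and nothing after this line runs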
