How to ensure AWS S3 cli file sync fully works without missing files - bash

I'm working on a project that takes MongoDB database backups from S3 and puts them onto a staging box for use that day. During a manual run today I noticed this output. Normally it shows a clean copy of each file, but today I got a connection reset error and one of the files, *.15, was not copied over after the operation had completed.
Here is the AWS CLI command that I'm using:
aws s3 cp ${S3_PATH} ${BACKUP_PRODUCTION_PATH}/ --recursive
And here is an excerpt of the output I got back:
download: s3://myorg-mongo-backups-raw/production/daily/2018-09-10/080001/data/s-ds063192-a1/myorg-production/myorg-production.10
to ../../data/db/myorg-production/myorg-production.10
download: s3://myorg-mongo-backups-raw/production/daily/2018-09-10/080001/data/s-ds063192-a1/myorg-production/myorg-production.11
to ../../data/db/myorg-production/myorg-production.11
download: s3://myorg-mongo-backups-raw/production/daily/2018-09-10/080001/data/s-ds063192-a1/myorg-production/myorg-production.12
to ../../data/db/myorg-production/myorg-production.12
download: s3://myorg-mongo-backups-raw/production/daily/2018-09-10/080001/data/s-ds063192-a1/myorg-production/myorg-production.13
to ../../data/db/myorg-production/myorg-production.13
download: s3://myorg-mongo-backups-raw/production/daily/2018-09-10/080001/data/s-ds063192-a1/myorg-production/myorg-production.14
to ../../data/db/myorg-production/myorg-production.14
download failed: s3://myorg-mongo-backups-raw/production/daily/2018-09-10/080001/data/s-ds063192-a1/myorg-production/myorg-production.15 to ../../data/db/myorg-production/myorg-production.15 ("Connection broken: error(104, 'Connection reset by peer')", error(104, 'Connection reset by peer'))
download: s3://myorg-mongo-backups-raw/production/daily/2018-09-10/080001/data/s-ds063192-a1/myorg-production/myorg-production.16
to ../../data/db/myorg-production/myorg-production.16
How can I ensure that the data from the given S3 path was fully copied over to the target path without any connection issues, missing files, etc? Is the sync command for the AWS tool a better option? Or should I try something else?
Thanks!
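One possible approach, sketched here rather than taken from an answer: aws s3 sync only transfers objects that are missing or that differ in size or modification time, and it exits non-zero when any transfer fails, so a simple retry loop re-copies just the files that did not make it. A minimal sketch, reusing the S3_PATH and BACKUP_PRODUCTION_PATH variables from the command above:

# Retry the copy a few times; each pass only transfers missing/changed files.
for attempt in 1 2 3; do
    if aws s3 sync "${S3_PATH}" "${BACKUP_PRODUCTION_PATH}/"; then
        echo "sync completed on attempt ${attempt}"
        break
    fi
    echo "sync attempt ${attempt} failed, retrying..." >&2
done

Note that sync compares size and timestamp rather than checksums, so for stricter verification you could additionally compare object counts or checksums after the copy.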

Related

AWS Beanstalk Laravel post deploy hooks no such file or directory

I'm trying to deploy a Laravel app to AWS Elastic Beanstalk; the OS is the Amazon Linux 2 AMI.
I've set up the following files:
.ebextensions/01-deploy-script-permission.config
It contains the following:
container_commands:
  01-storage-link:
    command: 'sudo chmod +x .platform/hooks/postdeploy/post-deploy.sh'
And
.platform/hooks/postdeploy/01-post-deploy.sh
It contains the following:
php artisan optimize:clear
Upon deploying, it fails with the following entry in the eb-engine.log file:
[ERROR] An error occurred during execution of command [app-deploy] -
[RunAppDeployPostDeployHooks]. Stop running the command. Error:
Command .platform/hooks/postdeploy/post-deploy.sh failed with error
fork/exec .platform/hooks/postdeploy/post-deploy.sh: no such file or
directory
This answer is for users who are deploying their files to Elastic Beanstalk from Windows.
I found this out after spending six precious hours; it is probably not documented anywhere in the official documentation.
As per this link: https://forums.aws.amazon.com/thread.jspa?threadID=321653
PS: most important, the file must be saved with LF line separators.
CRLF causes the "no such file or directory" error.
So I used Visual Studio Code to convert CRLF to LF for the files in .platform/hooks/postdeploy.
At the bottom right of the VS Code window there is a little button that says "LF" or "CRLF": click it and change the line ending to LF.
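If you would rather check and fix the line endings from a shell instead of VS Code, here is a minimal sketch; it assumes a Unix-like environment (or Git Bash) with GNU sed, and uses the hook path from the question:

# Report the line-ending style; CRLF shows up as "with CRLF line terminators".
file .platform/hooks/postdeploy/01-post-deploy.sh
# Strip carriage returns in place, then make sure the hook is executable.
sed -i 's/\r$//' .platform/hooks/postdeploy/01-post-deploy.sh
chmod +x .platform/hooks/postdeploy/01-post-deploy.sh

If the project is in Git, a .gitattributes rule such as *.sh text eol=lf keeps Git on Windows from converting the endings back to CRLF on checkout.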
I don't know for sure, but I think you are running the command before the files are even created, hence the error.
A while ago I faced a similar problem: I had written migration commands in .ebextensions, and they failed because my env file wasn't created yet, so no DB connection could be made. Hope this gives you a direction.
By the way, I resolved that problem by creating the env first and then pushing those commands through the pipeline.

azure cli on osx is failing to authenticate

I have to download some files from Azure to my local machine, using a Mac.
I have been given this Windows command line:
AzCopy /Source:https://XXX.blob.core.windows.net/YYY /SourceKey:TQSxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxpbA== /Dest:C:\myfolder /Pattern:c /S
I have downloaded and installed AzCopy, but it has a radically different syntax, and although I've been trying for quite some time, I haven't been able to make it work.
What's the correct syntax, given this one?
Looking at some documentation, I've tried:
azcopy cp "https://XXX.blob.core.windows.net/YYY/TQSxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxpbA==" "azcopy_dest" --recursive
but it doesn't work:
failed to perform copy command due to error:
cannot start job due to error: cannot list blobs for download. Failed
with error ->
github.com/Azure/azure-storage-azcopy/vendor/github.com/Azure/azure-storage-blob-go/azblob.NewResponseError,
/go/src/github.com/Azure/azure-storage-azcopy/vendor/github.com/Azure/azure-storage-blob-go/azblob/zz_generated_response_error.go:28
===== RESPONSE ERROR (ServiceCode=ResourceNotFound) ===== Description=The specified resource does not exist.
From your description it seems you are using AzCopy 10, which means you do not need to specify the key on the command line. You either need to generate a SAS token or log in before using azcopy.
With a SAS token, append it to the source URL:
azcopy cp "https://XXX.blob.core.windows.net/YYY?[SAS]" "/path/to/dir" --recursive=true
Or log in first:
azcopy login --tenant-id "your tenant id"
azcopy cp "https://XXX.blob.core.windows.net/YYY" "/path/to/dir" --recursive=true
I used Linux and don't have a Mac, but it should be the same across all platforms.
Hope this helps.
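For completeness, a sketch of generating the SAS token with the Azure CLI and passing it straight to azcopy; XXX, YYY, the expiry date, and the account-key placeholder are assumptions carried over from the question, and this presumes the az CLI is installed:

# Generate a read/list SAS for the container (account, container, and key are placeholders).
SAS=$(az storage container generate-sas \
    --account-name XXX --name YYY \
    --account-key "<storage account key from the original AzCopy command>" \
    --permissions rl --expiry 2024-12-31T23:59Z \
    --output tsv)
# Download the whole container with the SAS appended to the URL.
azcopy cp "https://XXX.blob.core.windows.net/YYY?${SAS}" "/path/to/dir" --recursive=true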

aws s3 cli not working in window task scheduler

When I run the following AWS CLI command in a console, it works correctly.
I have an AWS access key and secret configured.
aws s3 sync "C:\uploadfolder" s3://uploadfolder
However, when I run it from Windows Task Scheduler, on Windows 10 as well as Windows Server 2012, I get the following error:
cannot find the file specified 0x80070002
It does not seem to be a corrupted profile, because the command fails on both Windows versions while other commands run as expected.
Is there any step I have missed, or any special handling needed when running the AWS CLI from Windows Task Scheduler?
Your CLI command is attempting to sync a FILE called "uploadfolder". You need to change into the directory first, then run the command. Your commands should instead be:
cd C:\uploadfolder
aws s3 sync . s3://uploadfolder/
This will recursively copy all files in your local directory that are not in your s3 bucket. If you would also like the sync command to delete files that are no longer in the local directory, you also need to add the --delete flag.
aws s3 sync . s3://uploadfolder/ --delete
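When the same commands are run from Task Scheduler, it can help to put them in a small batch file and point the scheduled task at that; a sketch along those lines, where the aws.exe path is an assumption (the default AWS CLI v2 install location) and the folder/bucket names come from the question:

@echo off
rem Change into the folder to sync, as described above.
cd /d C:\uploadfolder
rem Use the full path to aws.exe so the task does not depend on the scheduler's PATH (adjust to your install).
"C:\Program Files\Amazon\AWSCLIV2\aws.exe" s3 sync . s3://uploadfolder/ --delete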

YAML exception: unacceptable character '' (0x0)

This error appears after uploading a new version to Elastic Beanstalk as a zip that includes a .ebextensions/singlehttps.config file, which sets up HTTPS for a single-instance server.
If you're doing the Amazon AWS workshop LAB:
https://github.com/awslabs/eb-node-express-signup
i.e. uploading and deploying your Elastic Beanstalk app,
and getting this error:
*ERROR Failed to deploy application.
*ERROR The configuration file __MACOSX/.ebextensions/._setup.config in application version 1.1.0 contains invalid YAML or JSON. YAML exception: Invalid Yaml: unacceptable character '' (0x0) special characters are not allowed in "", position 0, JSON exception: Invalid JSON: Unexpected character () at position 0.. Update the configuration file.
*INFO Environment update is starting.
SOLUTION
This is because macOS includes some extra hidden folders which you need to exclude from your ZIP file. To do this, run this command on your zip in a terminal:
$ zip -d nameofyourzipfile.zip __MACOSX/\*
Now re-upload, and you should get a success message:
INFO Environment update completed successfully.
INFO New application version was deployed to running EC2 instances.
Hope this solved your issue!
The reason for this problem was in fact the zip created on the Mac OS X platform.
If you upload the new version with the eb deploy command, rather than by zipping the application, the problem doesn't appear.
Hope this helps someone, as it had been troubling me for so long!
When you zip folders on macOS, it adds its own hidden files alongside yours.
If you want to make a zip without those invisible Mac resource files such as __MACOSX or ._Filename entries and .DS_Store files, use the -X option of the zip command:
$ zip -r -X archive_name.zip folder_to_compress
If this is a pre-existing zip file, you can use the command others here have mentioned
$ zip -d nameofyourzipfile.zip __MACOSX/\*
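To double-check an archive before uploading it, you can list its contents and look for the macOS metadata entries; a small sketch using the standard unzip tool, with archive_name.zip standing in for your bundle:

# No output means the archive contains no macOS metadata entries.
unzip -l archive_name.zip | grep -E '__MACOSX|\.DS_Store'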
Workaround on Mac
macOS automatically expands the zip file when you download it, and when you compress it again, Elastic Beanstalk gives the error mentioned above. Even after running the command from the earlier answers to remove the __MACOSX entries, I still got an error about one of the files not being found.
The workaround is to rename the zip file to some other extension before downloading it, and rename it back to .zip once it's on the Mac.
When you upload this file to Elastic Beanstalk, it works fine.

Download large files in Heroku

I am facing some issues downloading large files on Heroku. I have to download and parse files greater than 1 GB. What I am trying right now is to use curl to download them into the /tmp folder (of a Rails application).
The curl command is "curl --retry 999 -o #{destination} #{uri} 2> /dev/null", where destination is Rails.root.join("tmp", "file.example").
The problem is that after a few minutes of downloading, the curl process finishes long before the download is complete. Before it finishes, the logs show lots of "Memory exceeded" messages. This led me to think that when saving to the /tmp folder, the downloaded content is held in memory, and when memory hits its limit the process is killed.
I would like to know if any of you have experienced a similar issue on Heroku, and whether saving to the /tmp folder really works like this. If so, do you have any suggestions for getting this working on Heroku?
thanks,
Elvio
You are probably better off saving the file to an external cloud provider like S3 using the fog gem. In any case, Heroku is a read-only filesystem, so they won't allow you to curl, much less write to it.
