How can I confirm that azcopy synced a directory successfully? - bash

I have a group of files and directories stored in a container in a storage account in Azure. I am using the sync operation of azcopy to bring only the files and directories that I am missing into a local directory. When executing the command I use the --delete-destination true flag and the --recursive flag. The command is executed as part of a bash script.
How can I check if the sync process was successful? I've noticed that azcopy doesn't necessarily return a zero exit code even though the sync was successful. Other users have mentioned that checking the exit code with the copy operation of azcopy has worked for them. However, it seems that the story is different with the sync operation.
Currently, what I do is delete from my local directory a file that I know will always exist in every Azure container I have to sync with azcopy sync. After deleting the file, I run azcopy sync and, when it finishes, I check whether the deleted file was restored... This is clearly not the ideal solution.
I am considering checking the logs from each one of the jobs that azcopy creates, or exploring the --mirror-mode flag or even figuring out if the details provided by --dry-run can help me review if everything went according to plan.
However, all these options seem like too much effort for something that should be far simpler. So, most likely, there is something here that I am missing...
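For illustration, here is a minimal bash sketch of the checks discussed above: capture the exit code (which, as noted, is not always reliable for sync) and also look for the "Final Job Status: Completed" line that azcopy prints in its job summary. The account, container, SAS token, and local path are placeholders, not values from this post.

# Placeholders: <account>, <container>, <SAS> and /local/dir are illustrative only
azcopy sync "https://<account>.blob.core.windows.net/<container>?<SAS>" "/local/dir" \
    --recursive --delete-destination=true | tee sync_output.txt
rc=${PIPESTATUS[0]}                      # exit code of azcopy itself, not of tee
if [ "$rc" -ne 0 ]; then
    echo "azcopy sync exited with code $rc" >&2
fi
# Double-check the job summary captured from the console output
if ! grep -q "Final Job Status: Completed" sync_output.txt; then
    echo "azcopy job did not report 'Final Job Status: Completed'" >&2
fi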

I tried the same steps in my environment and got the results below.
First, I copied from the storage account to the local environment with an azcopy copy command.
Command:
azcopy copy "https://venkat123.blob.core.windows.net/container1/<SAS-token>" "C:\Users\xxxx" --recursive
The above command copied two files to the local environment.
You mentioned that you delete from your local directory a file that will always exist in all the Azure containers you sync, run azcopy sync, and then check whether the deleted file was restored.
I reproduced that condition in my environment: I deleted the file in the local environment and then synced from local to the Azure storage account with the command below.
azcopy sync "C:\Users\xxxxx" "https://venkat123.blob.core.windows.net/container1<SAS-Token>" --recursive --delete-destination=true --mirror-mode
Console:
The above command synced the local environment with the Azure blob container, the deleted file was removed in the Azure portal as well, and the console output from the command is shown below.
azcopy sync "C:\Users\v-vsettu\xxxx" "https://venkat123.blob.core.windows.net/container1<SAS Token>" --recursive --delete-destination=true --mirror-mode
INFO: Any empty folders will not be processed, because source and/or destination doesn't have full folder support
Job d8d2e3c3-d583-0a4c-6841-da4c919004d0 has started
Log file is located at: C:\Users\v-vsettu\.azcopy\d8d2e3c3-d583-0a4c-6841-da4c919004d0.log
INFO: azcopy.exe: A newer version 10.17.0 is available to download
100.0 %, 1 Done, 0 Failed, 0 Pending, 1 Total, 2-sec Throughput (Mb/s): 0.0103
Job d8d2e3c3-d583-0a4c-6841-da4c919004d0 Summary
Files Scanned at Source: 1
Files Scanned at Destination: 1
Elapsed Time (Minutes): 0.067
Number of Copy Transfers for Files: 1
Number of Copy Transfers for Folder Properties: 0
Total Number Of Copy Transfers: 1
Number of Copy Transfers Completed: 1
Number of Copy Transfers Failed: 0
Number of Deletions at Destination: 0
Total Number of Bytes Transferred: 2575
Total Number of Bytes Enumerated: 2575
Final Job Status: Completed
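If you would rather check the job after the fact (the logs-per-job approach mentioned in the question), the job ID printed at the start of the run can be fed back into azcopy jobs show, which reports the job's final status and transfer counts. A rough sketch, assuming the console output of the sync run was saved to a file (sync_output.txt here):

# Extract the job ID from the "Job <id> has started" line of the saved output
job_id=$(grep -oE 'Job [0-9a-f-]+ has started' sync_output.txt | awk '{print $2}')
azcopy jobs show "$job_id"   # prints the job details, including its final status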
Reference:
azcopy sync | Microsoft Learn

Related

Duplicity Restore Throwing "IsADirectoryError: Is a directory" Error

My Linux machine recently failed and I am trying to restore my files onto a Windows 11 machine. The files were created using Duplicity (the external HD containing the files has hundreds of .difftar.gz and .sigtar.gz files as well as a .manifest). Having installed Cygwin and the duplicity package, I traverse to my external HD in Cygwin...
$ pwd
/cygdrive/e
... and attempt to restore the latest snapshot of my lost directories/files to a temp folder on my Windows 11 machine by running:
duplicity restore file:/// /cygdrive/c/Users/john/OneDrive/Documents/temp
At this juncture, the restoration fails with an "IsADirectoryError" error:
Warning, found the following remote orphaned signature file:
duplicity-new-signatures.20211221T070230Z.to.20211224T103806Z.sigtar.gz
Warning, found signatures but no corresponding backup files
Warning, found incomplete backup sets, probably left from aborted session
Synchronizing remote metadata to local cache...
Copying duplicity-full-signatures.20211118T103831Z.sigtar to local cache.
Attempt of get Nr. 1 failed. IsADirectoryError: Is a directory
Attempt of get Nr. 2 failed. IsADirectoryError: Is a directory
Attempt of get Nr. 3 failed. IsADirectoryError: Is a directory
Attempt of get Nr. 4 failed. IsADirectoryError: Is a directory
Giving up after 5 attempts. IsADirectoryError: Is a directory
Is there an error in my duplicity command? Do I have corrupted backups? Any assistance in trouble-shooting this would be greatly appreciated!
Let's assume that Duplicity works here at all (it's not officially supported on Windows in any way; I've never tried it).
Say your backup data lives in the root of your external hard drive, mounted as E:.
You want to restore the complete latest backup into the folder C:\Users\john\OneDrive\Documents\temp\ .
Two points:
Point it to your backup location properly; that would be /cygdrive/e/ or, as a URL, file:///cygdrive/e/.
Point to your target folder ending with a slash / to signal that the backup is to be restored in there.
Taking these points into account, a command like
duplicity file:///cygdrive/e/ /cygdrive/c/Users/john/OneDrive/Documents/temp/
should work as expected.
NOTE: you don't need the restore action command, because the order of arguments (URL before local file system location) already tells Duplicity that you want to restore.
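For completeness, the explicit form with the restore action should behave the same as the command above (same source URL and target folder):

duplicity restore file:///cygdrive/e/ /cygdrive/c/Users/john/OneDrive/Documents/temp/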

Unable to save output from Rscripts in system directory using Devops Pipeline

I am running R scripts on a self-hosted DevOps agent. My Windows agent is able to access the system directory where it's hosted. Below is the directory structure for my code:
Agent loc. : F:/agent
Source Code : F:/agent/deployment/projects/project1/sourcecode
DWH _dump : F:/agent/deployment/DWH_dump/2021/
Output loca. : F:/agent/deployment/projects/project1/output_data/2021
The agent uses CMD in the DevOps pipeline to trigger R on the system and use the libraries from the system directory.
Problem statement: I am unable to save the output from my R script into the Output loca. directory. It fails with an error pointing to that directory, probable reason 'Permission denied'.
Output file format: file_name.rds, but the same issue happens even for a CSV file.
Command leading to failure: saveRDS(output_object, file = paste0(output_loc, "/", "file_name.rds")), where output_loc holds the Output loca. path above and output_object is the object being saved.
Workaround: I found a workaround: save the files to the Source Code directory first and then copy them to the Output loca. directory. This works perfectly fine but costs me 2 extra hours of run time, because I have to save all intermediary files and delete them at the end; keeping the intermediary files in memory eats up my RAM.
I have not opened that directory anywhere on the machine; the only application open is the browser where the pipeline is running. I spent hours trying to figure out the reason, with no success. I even checked the system PATH to see whether that directory is mentioned there, and it is not.
When I run the same script directly on the machine using RStudio, I have no issues saving the file to any directory.
I have spent 2 full days on this already. Any pointers to the root cause could save me a few hours of runtime.
The solution was to set the Azure Pipelines agent service in Windows to run with admin credentials. The agent was not configured as an admin during creation, so after reconfiguring it to run under my user ID, which has admin access on the VM, the pipelines were able to save files without any trouble.
Feels great; it saved a few hours of run time!
I was able to achieve this by following this post.

Scheduled process to copy files out of S3 into a temp-folder in Ubuntu 18.04

Looking for recommendations for the following scenario:
On an Ubuntu 18.04 server, check an AWS S3 bucket for new files every minute and fetch only the newest files into a temp folder; at the end of the day, remove them.
It should be automated in bash.
I proposed using S3 event notifications, queues, and Lambda, but it was decided that it is best to keep it simple.
I am looking for recommendations on the steps described below.
For step 1 I was doing aws s3 ls | awk '<function to filter files updated within the last minute>',
then I realized that it was better to do it with grep.
0 - Cron job should run every minute from 7:00 to 23:00 (see the crontab sketch below).
1 - List the files uploaded to the S3 bucket during the past 1 minute.
2 - List the files in a temp-encrypted folder on Ubuntu 18.04.
3 - Check whether the files listed in step 1 are already downloaded into the temp-encrypted folder from step 2.
4 - If the files are not already downloaded, download the newest files from the S3 bucket into temp-encrypted.
5 - At the end of the day (23:00), take a record of the last files fetched from S3.
6 - Run a cleanup script at the end of the day to remove everything in temp-encrypted.
I attach a diagram with the intended process and infrastructure design.
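As a possible crontab layout for step 0 (the script paths are hypothetical placeholders, not from the post): one entry runs the fetch script every minute from 07:00 through 22:59, and a second runs the end-of-day tasks from steps 5 and 6 at 23:00.

# m h dom mon dow  command
* 7-22 * * * /home/ubuntu/fetch-from-s3.sh
0 23   * * * /home/ubuntu/end-of-day-cleanup.sh   # record last fetched files, then empty temp-encrypted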
The solution was like this:
1. Change FTPS to SFTP running on Ubuntu 18.04.
2. Change the main ports: randomport1 for SSH and randomport2 for SFTP.
3. Configure SFTP in the sshd_config file.
4. Once everything is working, create the local directory structure.
5. Drive it with a bash script (see the sketch below):
5.1 List what is in S3 and save it in a variable.
5.2 For each of the files listed in S3, check whether there is a new file not present as a mirrored file in the local s3-mirror directory.
5.3 If there is a new file, fetch it, touch a file with empty contents and the same name in the s3-mirror directory, move the encrypted file to SFTP, and remove the fetched S3 file from the mirrored local directory.
5.4 Record successful actions in a log.
So far it works well.
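A minimal sketch of the script behind steps 5.1-5.4, under my own assumptions: the bucket name, the directories, and the encryption step are placeholders rather than the poster's actual setup.

#!/bin/bash
BUCKET="s3://my-bucket"                 # hypothetical bucket name
MIRROR_DIR="/home/ubuntu/s3-mirror"     # empty marker files, one per file already fetched
SFTP_DIR="/home/ubuntu/temp-encrypted"  # folder served over SFTP

# 5.1 List what is in S3 and save it in a variable (4th column of `aws s3 ls` is the key name)
s3_files=$(aws s3 ls "$BUCKET/" | awk '{print $4}')

for f in $s3_files; do
    # 5.2 A file is "new" if there is no marker for it in s3-mirror yet
    if [ ! -e "$MIRROR_DIR/$f" ]; then
        # 5.3 Fetch it, leave an empty marker with the same name, and hand the file over to SFTP
        #     (any encryption step would go here; omitted in this sketch)
        aws s3 cp "$BUCKET/$f" "$SFTP_DIR/$f"
        touch "$MIRROR_DIR/$f"
        # 5.4 Record the successful action in a log
        echo "$(date -Is) fetched $f" >> "$MIRROR_DIR/fetch.log"
    fi
done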

How to ensure AWS S3 cli file sync fully works without missing files

I'm working on a project that takes database backups from MongoDB stored on S3 and puts them onto a staging box for use that day. During a manual run today I noticed this output: normally it shows a good copy of each file, but this time I got a connection reset error and one of the files, *.15, was not copied over after the operation had completed.
Here is the AWS CLI command that I'm using:
aws s3 cp ${S3_PATH} ${BACKUP_PRODUCTION_PATH}/ --recursive
And here is an excerpt of the output I got back:
download: s3://myorg-mongo-backups-raw/production/daily/2018-09-10/080001/data/s-ds063192-a1/myorg-production/myorg-production.10 to ../../data/db/myorg-production/myorg-production.10
download: s3://myorg-mongo-backups-raw/production/daily/2018-09-10/080001/data/s-ds063192-a1/myorg-production/myorg-production.11 to ../../data/db/myorg-production/myorg-production.11
download: s3://myorg-mongo-backups-raw/production/daily/2018-09-10/080001/data/s-ds063192-a1/myorg-production/myorg-production.12 to ../../data/db/myorg-production/myorg-production.12
download: s3://myorg-mongo-backups-raw/production/daily/2018-09-10/080001/data/s-ds063192-a1/myorg-production/myorg-production.13 to ../../data/db/myorg-production/myorg-production.13
download: s3://myorg-mongo-backups-raw/production/daily/2018-09-10/080001/data/s-ds063192-a1/myorg-production/myorg-production.14 to ../../data/db/myorg-production/myorg-production.14
download failed: s3://myorg-mongo-backups-raw/production/daily/2018-09-10/080001/data/s-ds063192-a1/myorg-production/myorg-production.15 to ../../data/db/myorg-production/myorg-production.15 ("Connection broken: error(104, 'Connection reset by peer')", error(104, 'Connection reset by peer'))
download: s3://myorg-mongo-backups-raw/production/daily/2018-09-10/080001/data/s-ds063192-a1/myorg-production/myorg-production.16 to ../../data/db/myorg-production/myorg-production.16
How can I ensure that the data from the given S3 path was fully copied over to the target path without any connection issues, missing files, etc? Is the sync command for the AWS tool a better option? Or should I try something else?
Thanks!
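As a rough sketch of the sync option raised at the end of the question (an illustration only, reusing the same ${S3_PATH} and ${BACKUP_PRODUCTION_PATH} placeholders): aws s3 sync only transfers objects that are missing or differ at the destination and should exit non-zero when a transfer fails, so re-running it until it exits cleanly is one simple way to converge on a complete copy.

# Sketch only: retry `aws s3 sync` until it exits 0; already-copied files are skipped on each retry
attempts=0
until aws s3 sync "${S3_PATH}" "${BACKUP_PRODUCTION_PATH}/"; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge 5 ]; then
        echo "sync still failing after $attempts attempts" >&2
        exit 1
    fi
    sleep 10   # brief pause before retrying the remaining files
done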

Set execute permission on files deployed from Windows to Lambda using Serverless

I'm using Serverless to deploy a Lambda function. I need to include an executable bin file, but when it is uploaded I don't have execute permission on it, and I can't change the permissions after it is deployed. The only thing I can do is copy the file to /tmp and change the permissions there. That works, but it adds a lot of overhead because I have to move the files on every invoke, since /tmp is ephemeral.
I know there is a known issue where Windows and Linux file permissions differ, so if you zip a file on Windows and unzip it on a Linux machine you will have problems with permissions, especially execute permission, and that is what happens when Serverless deploys the files.
Does anyone have a better workaround for this (rather than "deploy from a Windows machine")?
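For reference, a minimal sketch of the copy-to-/tmp workaround described above; the binary name and its path inside the deployment package are hypothetical placeholders.

# Hypothetical: the package ships ./bin/mytool without the execute bit set
cp ./bin/mytool /tmp/mytool
chmod +x /tmp/mytool      # /tmp is writable at runtime, so the execute bit can be set here
/tmp/mytool --help        # the binary can now be invoked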
