I'm using s3cmd on a network with a bad Internet connection. Sometimes it stops in the middle of a sync, leaving half-finished or corrupted files. Normally I'd want it to simply continue where it left off, but in this case, for various reasons, I don't want incomplete downloads.
Is there any way of telling s3cmd to never save local files that haven't been completely downloaded, and instead start over the next time it syncs?
Alternatively, you can use https://github.com/minio/mc to delete the incomplete uploads:
$ mc ls https://s3.amazonaws.com/<yourbucket> incomplete
You can selectively delete one incomplete upload, or recursively delete all of them:
$ mc rm https://s3.amazonaws.com/<yourbucket>/... incomplete force
To continue from where it left off, mc already implements session-based uploads that survive network disconnections and deliberate termination: mc saves a local session for each copy/mirror operation.
You can easily resume them with:
$ mc session resume <resume_id>
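If you don't remember the id of an interrupted session, you can list the saved sessions first; a small sketch, assuming your mc version still supports sessions (the id it prints is what you pass to resume):
$ mc session list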
Related
I already have an SMB network share set up. Every user has their own home folder that is shared. Now I want to switch to Nextcloud, as SMB is quite slow over VPN. There is probably a way to fix that, but as far as I know Nextcloud is faster, and since I'm not a network expert it's just too big of a time sink. I want to keep my old SMB structure and have the files shared both via SMB and via Nextcloud. But Nextcloud is not aware of files that are added over SMB. How can I tell Nextcloud to "scan" for new files? I'm guessing there is some command I can run to check whether new files have been added.
To enable this, you need to perform a full file scan with Nextcloud. As the folders get bigger, the file scans take more and more time, which is why an update trigger for newly added files is not worth it. The only other option is to run a cronjob once or twice a day at times when the cloud is least likely to be used.
To configure the cronjob for www-data, the following changes have to be made:
Type "sudo crontab -u www-data -e" in your terminal.
Append the following line to trigger a file scan every day at 2 am:
0 2 * * * php "path_to_nextcloud"/occ files:scan --all
Now all files in the data directory of Nextcloud will be scanned every day at 2am.
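Before relying on the cronjob, you can trigger the same scan manually once to confirm the occ path and permissions; a minimal sketch, assuming Nextcloud lives under /var/www/nextcloud (adjust the path to your install):
$ sudo -u www-data php /var/www/nextcloud/occ files:scan --all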
If you are not familiar with setting up a cronjob, use this site https://crontab.guru/
I am using an EC2 instance in AWS to run a bash script that downloads files from a server using a CLI while simultaneously moving them into S3 using the AWS CLI (aws s3 mv). However, I usually run out of storage before I can do this because the download speeds are faster than the transfer speeds to S3. The files, which are downloaded daily, are usually hundreds of GB and I do not want to upgrade storage capacity if at all possible.
The CLI I am using for the downloads runs continuously until success/failure, but outputs status to the console (when I run it from the command line instead of a .sh file) as it goes. I am looking for a way to run this script based on the specifications given. My most recent attempt was to use something along the lines of:
until (CLI_is_downloading) | grep -m 1 "download complete"; do aws s3 mv --recursive path/local_memory s3://path/s3; done
But that ran out of memory and the download failed well before the move was finished.
One possible solution I thought of is to run the download CLI until available storage drops to a certain point, then switch to the transfer, and alternate back and forth. Also, I am not too experienced with AWS, so I am not sure this would work, but could I limit the download speed to match the transfer speed (like network throttling)? Any advice on the practicality of these ideas, or other suggestions on how to implement this, would be greatly appreciated.
EDIT: I checked my console output again and it seems that the aws s3 mv --recursive only moved the files that were currently there when the function was first called and then stopped. I believe if I called it repeatedly until I got my "files downloaded" message from my other CLI command, it might work. I am not sure exactly how to do this yet so suggestions would still be appreciated but otherwise, this seems like a job for tomorrow.
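A rough sketch of that idea: instead of grepping the console output for the completion message, you can keep re-running the move while the download process is still alive. The download command, log file, local path and bucket below are placeholders, not the actual CLI in question:

download_cli > download.log 2>&1 &   # placeholder: start the download in the background
DOWNLOAD_PID=$!

# While the download is still running, keep draining the local directory into S3.
while kill -0 "$DOWNLOAD_PID" 2>/dev/null; do
    aws s3 mv --recursive /path/to/local_dir s3://my-bucket/path/
    sleep 30   # short pause between passes
done

# Final pass to pick up anything written just before the download finished.
aws s3 mv --recursive /path/to/local_dir s3://my-bucket/path/

One caveat: a pass may pick up a file that is still being written, so this works best if the download CLI writes to a temporary name or directory and only moves finished files into /path/to/local_dir.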
I have about 1 million files in my S3 bucket, and unfortunately these files were uploaded with the wrong extension. I need to add a '.gz' extension to every file in that bucket.
I can manage to do that by using the AWS CLI:
aws s3 mv s3://bucket_name/name_1 s3://bucket_name/name_1.gz
This works fine, but the script runs very slowly since it moves the files one by one; by my calculation it will take up to a week, which is not acceptable.
Is there a better and faster way to achieve this?
You can try S3 Browser, which supports multi-threaded calls.
http://s3browser.com/
I suspect other tools can do multi-threading as well, but the AWS CLI doesn't.
There's no rename operation for S3 objects or buckets, so you need to move (copy and delete) each file. If the files are big, that can indeed be a bit slow.
However, nothing forces you to wait for one request to complete before starting the next one: you can issue several "rename" requests in parallel and simply work through your list.
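For example, a rough sketch of running the renames in parallel from the shell with nothing but the standard AWS CLI; the bucket name is the one from the question, 20 parallel workers is an arbitrary choice, and keys containing newlines would need extra care:

# List every key in the bucket (the CLI paginates automatically), one per line,
# then run 20 renames at a time.
aws s3api list-objects-v2 --bucket bucket_name --query 'Contents[].Key' --output text \
  | tr '\t' '\n' \
  | xargs -P 20 -I {} aws s3 mv "s3://bucket_name/{}" "s3://bucket_name/{}.gz"

Splitting the key list into chunks and running several such pipelines (or several machines) scales the same idea further.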
I have a decently large DB that I'm trying to pull down locally from Heroku via db:pull.
I can never stick around my machine long enough to keep it from going to sleep, which effectively kills the connection and terminates the process. GOTO 1.
I know I could change my system settings to stop my computer from sleeping, which would keep the connection alive, but is there a way to continue a previous pull?
Or maybe the solution is just not to use db:pull for a large db.
heroku db:pull supports resuming. When you start a pull it will create a .dat file in your project (and get rid of it when it's completed). You can do:
heroku db:pull --resume FILE # resume transfer described by a .dat file
to start the pull from the previous location.
Heroku pgbackups may be a better option to grab the large DB file - http://devcenter.heroku.com/articles/pgbackups.
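A rough sketch of that workflow, assuming the pgbackups add-on is installed for your app (capture a backup on Heroku's side, then download it; curl's -C - lets you resume the download itself if the connection drops):
$ heroku pgbackups:capture
$ curl -C - -o latest.dump "$(heroku pgbackups:url)"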
Although I'd be more inclined to prevent your computer from sleeping - just disable sleep while the download is running, via System Settings or the Control Panel depending on your OS.
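If you'd rather not touch system-wide settings, you can also inhibit sleep only for the duration of the command; a sketch assuming caffeinate (macOS) or systemd-inhibit (Linux with systemd) is available:
$ caffeinate -i heroku db:pull
$ systemd-inhibit heroku db:pull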
I have not worked much with files, and I am wondering about possible issues with accessing remote files on another computer. What if the remote application crashes and doesn't close the file?
My aim is to use this Win32 function:
HFILE WINAPI OpenFile(LPCSTR lpFileName, LPOFSTRUCT lpReOpenBuff, UINT uStyle);
Using the OF_SHARE_EXCLUSIVE flag assures me that any concurrent access will be denied
(several machines write to this file from time to time).
But what if the file is left open? (After an application crash, for example?)
How do I put the file back to normal?
What if the remote application crashes and doesn't close the file?
Then the O/S should close the file when it cleans up after the "crashed" application.
This won't help with a "hung" application (an application which stays open but does nothing forever).
I don't know what the issues are with network access: for example if the network connection disappears when a client has the file open (or if the client machine switches off or reboots). I'd guess there are timeouts which might eventually close the file on the server machine, but I don't know.
It might be better to use a database engine instead of a file: because database engines are explicitly built to handle concurrent access, locking, timeouts, etc.
I came across the same problem using VMware, which sometimes does not release file handles on the host when files are closed on the guest.
You can close such handles using the handle utility from www.sysinternals.com.
First, determine the file handle ID by passing part of the filename; handle will show all open files whose name contains the given string:
D:\sysinternals\>handle myfile
deadhist.exe pid: 744 3C8: D:\myfile.txt
Then close the handle using the parameters -c and -p:
D:\sysinternals\>handle -c 3c8 -p 744
3C8: File (---) D:\myfile.txt
Close handle 3C8 in LOCKFILE.exe (PID 744)? (y/n) y
Handle closed.
handle does not care which application is holding the file handle. You are now able to reopen, remove, rename, etc. the file.