rsync suddenly hanging indefinitely during transfers - macos

For the past few years, I have been using an rsync one-liner to back up important folders on my Mac Mini desktop (OSX 10.9, 2.5 GHz i5, 4 GB RAM) to a FreeNAS box (0.7.2 Sabanda revision 5266, Pentium D 2.66 GHz, 822MiB RAM [reported by the system, I think there's 1 GB in there]). I am running an rsync daemon on the FreeNAS box. Recently, these transfers have been hanging indefinitely. I have done the usual Google-fu and am unable to identify the source of the problem or a solution.
The one-liner is:
rsync -rvOlt --exclude '.DS_Store' \
--exclude '.com.apple.timemachine.supported' \
--delete /Volumes/Storage/Music/Albums/ 192.168.1.100::albums
I have tried enabling -vvv and --progress, but there is no pattern that I can discern between what hangs and what doesn't. Heck, if I retry, the same file might hang at a different point during the transfer or not at all. A dry run (-n) does not always succeed either. The only "success" I've had is implementing a timeout (--timeout=10) and rerunning the command over and over. Eventually, I creep along, but with no guarantee of success and at a pace that is unacceptable. I've reached a point where I have one file that I can't get past.
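The "timeout and rerun" workaround can at least be automated. Below is a minimal bash sketch, not a fix for the hang itself. It relies on rsync's documented exit status 30 ("Timeout in data send/receive"), which is what --timeout produces, and re-runs the command only in that case. The function name rsync_retry and the RETRY_DELAY variable are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-run a command while it exits with status 30,
# rsync's "Timeout in data send/receive" code (triggered by --timeout).
# Usage: rsync_retry MAX_ATTEMPTS rsync --timeout=10 [OPTIONS] SRC DEST
rsync_retry() {
  local max=$1 attempt=1 status
  shift
  while :; do
    status=0
    "$@" || status=$?
    # Stop on success, on any non-timeout error, or when attempts run out
    if [ "$status" -ne 30 ] || [ "$attempt" -ge "$max" ]; then
      return "$status"
    fi
    echo "timed out (attempt $attempt/$max), retrying" >&2
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-5}"   # hypothetical knob: pause between attempts
  done
}
```

Usage with the one-liner from the question: `rsync_retry 20 rsync -rvOlt --timeout=10 --exclude '.DS_Store' --delete /Volumes/Storage/Music/Albums/ 192.168.1.100::albums`. Files completed in an earlier attempt are skipped by rsync's default size-and-mtime check; a file interrupted mid-transfer starts over unless --partial is also given.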
The Mac Mini is connected to my router via 5 GHz Wi-Fi. The FreeNAS box is wired into that same router on a 100 Mbit/s port. When transfers are actually going, rsync --progress reports 2.5-4 MB/s. According to --progress, a hang is literally just that: no data transfer is occurring as far as I can tell.
I need help with both the diagnostics and the solution.

I was having the same problem. Removing -v didn't work for me. My use case is slightly different in that I'm copying from an ext4 source to exFAT. The issue for me was that rsync was attempting to preserve device files and permissions, which exFAT doesn't support. I was using the -hrltDvaP switches. The -D and -a switches seemed to be my problem. The -a switch translates to -rlptgoD (no -H, -A, -X). The -p, -g and -o switches seemed to be my root cause, as rsync was barfing on one or all of those during runtime. Removing -a and specifying -Prltvc explicitly is working for me.
nicelevel=10   # example value (assumed; set whatever niceness your script uses)
bkupcmd="nice -n$nicelevel /usr/bin/rsync -Prltvc --exclude-from=/var/tmp/ignorelist "
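If one script writes to both kinds of destination, the flag choice can be made per target. A hedged sketch: the helper name rsync_flags_for_fs is hypothetical, the flags mirror the answer's -Prltvc minus -v and -c, and it assumes the filesystem type string comes from GNU `stat -f -c %T` (which reports FAT as `msdos`) or from findmnt (which reports `vfat`/`exfat`). `fuseblk` is included because FUSE-mounted exFAT shows up under that name, though NTFS-3G does as well.

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose rsync flags from the destination filesystem type.
# FAT-family filesystems cannot store Unix owners, groups, permissions or
# device files, so -a (= -rlptgoD) is swapped for -rlt there.
rsync_flags_for_fs() {
  case "$1" in
    msdos|vfat|exfat|fuseblk) printf '%s\n' "-rltP" ;;
    *)                        printf '%s\n' "-aP" ;;
  esac
}
```

Usage, with a hypothetical mount point: `dest=/media/usb/backup; rsync "$(rsync_flags_for_fs "$(stat -f -c %T "$dest")")" /source/ "$dest"/`.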

I've been running into the same thing again and again and it seems to help if you drop the -v option (which is annoying if you need that output).

Try using --whole-file (-W).
This option disables rsync's delta-transfer algorithm.
That is what worked for us (WSL to OS X).
Our full sync flags were -avWPle
(-e was there because we were using ssh; it takes an argument, the remote shell command, so it has to be the last flag in the bundle.)

This happened to me when the remote device ran out of space. The error wouldn't show when --verbose option was used; turning that off yielded some STDERR output that explained that the remote device was out of space. When I freed some space, I was able to run rsync again with --verbose and everything went fine.
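Since a full destination can make rsync look hung, a free-space check before the run is cheap. A minimal sketch; the function name is hypothetical, and it assumes the POSIX `df -P -k` output format (one header line, then "Available" in the fourth column, in kilobytes).

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if DIR's filesystem has at least NEED_KB free.
# `df -P -k` prints one header line; field 4 of the next line is "Available" (kB).
has_free_kb() {
  local dir=$1 need_kb=$2 avail
  avail=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
  [ -n "$avail" ] && [ "$avail" -ge "$need_kb" ]
}
```

Usage for a local destination: `need_kb=$(du -sk /path/to/source | cut -f1); has_free_kb /path/to/dest "$need_kb" || echo "not enough space" >&2`. For a remote target, the df part has to run on the remote machine.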

I am using openSUSE 13.2 Linux, rsync version 3.1.1-2.4.1.x86_64, and I experienced similar problems doing an rsync between my laptop and an external hard disk, with the destination device definitely having enough free space.
I thought omitting -v improved things, but after 10 minutes it was hanging again. strace showed:
select(5, [], [4], [], {60, 0}) = 0 (Timeout)
That is, rsync was waiting up to 60 seconds for file descriptor 4 to become writable, and it never did. With iotop I could confirm that the rsync processes were no longer doing any significant disk I/O.
Neither removing the -v option nor limiting the bandwidth using --bwlimit fixed the problem.

Just had a similar problem while rsyncing from a hard disk to a FAT32 USB drive. In my case rsync froze in less than a second and did not respond at all after that; I had to abort it with Ctrl+C.
Found out that the problem was a combination of hard links on the hard disk and the FAT32 filesystem on the USB drive, which does not support hard links.
Formatting the USB drive with ext4 solved the problem for me.
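Before copying to a FAT32 (or exFAT) drive, it may help to check whether the source actually contains hard links. A minimal sketch using find, where `-links +1` matches files with more than one link; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count regular files under DIR that have more than one
# hard link. A non-zero count means the tree contains hard links that a
# FAT-family destination cannot reproduce.
count_hardlinked() {
  find "$1" -type f -links +1 | wc -l | tr -d ' '
}
```

Each name of a hard-linked file is counted, so two names for one file give 2.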

In my situation rsync was not actually failing.
I have regular server backups that transfer large files (500 GB+) over ssh, with --append-verify or --checksum specified.
What I found on analysis is that once the client side completes its file checks, the server-side checks start. So while the server is doing its checks, the client side will appear hung and frozen; run htop on the server to see rsync working away.
This is likely a non-issue if rsync runs in daemon mode on the server and the rsync protocol is used instead of ssh for transfers.
On a related note, this very long wait would trigger an SSH timeout and a "rsync: connection unexpectedly closed (254 bytes received so far) [sender]" error message. The solution is to add ClientAliveInterval 120 and ClientAliveCountMax 720 to /etc/ssh/sshd_config.
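For scale: with those values sshd sends a probe after 120 s of silence and only drops a client that leaves 720 probes unanswered, i.e. after 120 × 720 = 86,400 s (24 h); the probes themselves are what keep an idle session alive. If you cannot edit sshd_config, the ssh client has the counterpart options ServerAliveInterval and ServerAliveCountMax, which can be passed through rsync's -e. A sketch; the helper name and the values in the usage line are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build an rsync -e value that makes the ssh *client*
# send keepalive probes (ServerAliveInterval / ServerAliveCountMax are
# standard ssh_config options, the client-side counterpart of ClientAlive*).
keepalive_rsh() {
  local interval=$1 count_max=$2
  printf 'ssh -o ServerAliveInterval=%s -o ServerAliveCountMax=%s' \
    "$interval" "$count_max"
}
```

Usage: `rsync -a --checksum -e "$(keepalive_rsh 60 10)" /src/ user@server:/dst/`. rsync splits the -e string into words, so the -o options reach ssh unchanged.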

I've seen this quite often on 3.0.9 on a directory with hardlinks, but it also happened on 3.1.3.
There is a nice analysis in Debian bug 820916: when its internal sockets are congested with errors, rsync could go into a deadlock.
This might have been fixed in a 3.2 release just a few days ago (Jun 2020):
Avoid a hang when an overabundance of messages clogs up all the I/O buffers.
The only good workaround I can think of, if the problem is not persistent, is to put timeout in front of it: timeout <duration> rsync <args> <source> <destination>, then retry. If it is persistent for you, you're the lucky one who gets to debug it :D
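The timeout-and-retry approach can be wrapped in a loop. A sketch assuming GNU coreutils timeout, which exits with status 124 when the limit is hit (on macOS it is typically installed through Homebrew coreutils as gtimeout); the function name is hypothetical. Because timeout executes a program, the command must be an external one such as rsync, not a shell function.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run COMMAND under `timeout`, retrying only when the
# time limit was hit (status 124). Any other status is returned immediately.
# Usage: retry_timeout SECONDS ATTEMPTS COMMAND [ARGS...]
retry_timeout() {
  local secs=$1 attempts=$2 i status
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    status=0
    timeout "$secs" "$@" || status=$?
    [ "$status" -ne 124 ] && return "$status"
    echo "timed out after ${secs}s (attempt $i/$attempts)" >&2
  done
  return 124
}
```

Usage: `retry_timeout 3600 5 rsync -a /source/ /destination/`.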

It also happens when the user on the target machine does not have write permission on the target folder.
You can try giving others write permission on the target folder:
sudo chmod -R o+w /path/to/target-folder
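Before opening the folder up with o+w, it may be worth confirming that write permission really is the problem. A minimal sketch; the helper name is hypothetical, it must be run as the user that actually writes on the target (for an rsync daemon, the uid configured for the module in rsyncd.conf), and -w only checks the top-level directory, not every file below it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether the current user can write to DIR.
check_writable() {
  if [ -d "$1" ] && [ -w "$1" ]; then
    echo "writable: $1"
  else
    echo "NOT writable by $(id -un): $1" >&2
    return 1
  fi
}
```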

In my case, it was the IPC (Intrusion Protection Component) in our firewall. It saw all the TCP SYN packets as a flood attack and killed the connection. I left an rsync-over-NFS session open, turned off the IPC for the servers' firewall rule, and it started working again right away.
rsync -ravh /source /destination
When it happened I was not able to kill the rsync session. It locked up the NFS mount and I had to reboot the client machine to get it working again. Strangely, it would copy some files over and then suddenly stop, and it always seemed to stop on the same file. So I looked for file issues, permission issues and TCP offloading issues, and tried removing -v from the rsync call. If you are having this issue: in my case it even happened with a simple
cp -rp /source /destination
So I knew then to start looking at other factors. So if you have any sort of intrusion protection on a firewall or router between the servers you can try turning that off temporarily to see if it solves your issue as well.

Most likely not "your" problem, but I stumbled upon this question when I was researching a similar behavior:
I'm observing "hanging" when the target site has too much I/O load, e.g. on one of my small-business servers when someone is resyncing his IMAP account and downloading large batches of data while a backup job runs that writes his data.
In this situation I notice a steep drop in rsync performance. It shows up as a high load value in top on the target machine, even though CPU and memory usage are fine.
Waiting for the other process to finish has helped every time, as has interrupting the rsync and trying again later.

I was having the same problem and it was because I was running out of memory during the rsync. Created a swap file and problem solved.
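To check whether memory is the bottleneck, one can watch available memory and swap while rsync runs. A Linux-only sketch reading /proc/meminfo (the MemAvailable field needs kernel 3.14 or newer); the function name is hypothetical. A SwapTotal of 0 kB means no swap is configured at all.

```bash
#!/usr/bin/env bash
# Hypothetical helper (Linux only): print available RAM and total swap in kB.
mem_report() {
  awk '/^(MemAvailable|SwapTotal):/ { printf "%s %s kB\n", $1, $2 }' /proc/meminfo
}
```

Usage, in a second terminal during the transfer: `while sleep 10; do mem_report; done`.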

I had an rsync hanging issue on Ubuntu 16. None of the options above helped. The problem was the source drive (an external SSD), which had suddenly become faulty. I tried several disk checks, but all of them hung as well. I ended up rebooting the system, and the disk suddenly became accessible again.

Holger Ohmacht aka h8ohmh / 8ohmh:
As far as I could investigate, the problem lies in the filesystem buffers and how they interact with the disk hardware.
Temporary solution for local drives (e.g. USB3 <-> HDD): a script that polls the free disk space. If free disk space stops changing, rsync has stalled and has to be restarted.
src="<here your source dir>"
dst="<here your dest dir>"
cmd=(rsync -aW --progress --stats --preallocate --super "$src" "$dst")
"${cmd[@]}" &
pid=$!
df > ./odf.txt                 # baseline snapshot of free space
while true; do
  sleep 60
  if ! kill -0 "$pid" 2>/dev/null; then
    wait "$pid"                # rsync ended by itself: collect its exit status
    exit $?                    # and stop watching instead of restarting forever
  fi
  df > ./ndf.txt
  if cmp -s ./odf.txt ./ndf.txt; then
    echo "free space unchanged for 60 s: restarting rsync" >&2
    killall rsync              # note: stops every rsync process on the machine
    "${cmd[@]}" &
    pid=$!
  fi
  cp ./ndf.txt ./odf.txt
done
Replace the <here your source dir> and <here your dest dir> placeholders with your paths!
In my case it always stalls when I use rsync's --preallocate option (which I normally use for better disk performance and to keep blocks contiguous), so until the disk and filesystem drivers are reworked, this is the only solution.

Related

rsync hangs after transfer over ssh

I wrote a bash script that backs up files from a webserver (HostGator) to a local file server running FreeBSD.
I use rsync over ssh (from the file server) to connect to the remote server (I already have pre-shared RSA keys set up). When I run the following line to start the sync, the files all seem to come in just fine, but the command never returns and the script just hangs forever:
/usr/local/bin/rsync -az --chown=root:admin --chmod=ugo=rwX --exclude ".inode_lock" --rsh='ssh -p2222' admin@domain.com:/home/admin/ '/mnt/blah/blah/LocalBackup/' >> "./Logs/Backup Log.txt"
After waiting a few minutes, when I hit Ctrl+C to stop the command, it spits out the following error messages:
^CKilled by signal 2.
rsync error: unexplained error (code 255) at rsync.c(636) [generator=3.1.2]
rsync error: received SIGUSR1 (code 19) at main.c(1429) [receiver=3.1.2]
This still happens even if both sides are already synced and it is just checking for changes.
I'm not sure what to do to troubleshoot the problem. I did try removing the -v switch, since some users reported that it caused hangs, but I saw no difference.
EDIT
One more additional note. I ran the script again today to continue to troubleshoot. If I leave the script running without disturbing it after it hangs, eventually I receive the following message:
rsync: connection unexpectedly closed (2984632408 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(226) [receiver=3.1.2]
rsync: connection unexpectedly closed (8689703 bytes received so far) [generator]
rsync error: unexplained error (code 255) at io.c(226) [generator=3.1.2]
and then it returns to the command prompt. I think this might be due to a timeout on the remote server's end, but I'm not sure. And I'm still not sure why the hang is happening.
UPDATE
I did an additional test and limited the rsync transfer to a specific test folder with some sample files and subfolders, rather than grabbing the entire home directory. When I did this, it was able to complete the transfer successfully and exit appropriately. So it appears that there must be some file or folder somewhere in the home directory on the server that is causing the problem. Are there any specific cases where rsync wouldn't be able to transfer a file? I have seen it throw "Permission Denied" errors while trying to sync files that are write-locked, but even those files didn't stop it from continuing. Any thoughts?
As an additional note, the remote server I'm connecting to is on a shared hosting account so I don't have root access. I don't know if this could be causing some problems?
UPDATE 2
So I studied the rsync command and added a couple more commandline parameters --progress and --stats (along with --verbose) so I could better understand where it is dying. What I noticed now is that when running the command, where it was hanging was on a verrry large file that was being downloaded from the server. But now with the --progress being reported (I am having it output directly to the terminal for the moment rather than a file), it seems to be moving along just fine, with no hangups so far.
I am now beginning to suspect that maybe the ssh connection is timing out or something due to inactivity? Especially since in the original situation, nothing gets output from the function for a long time while the large file transfer is happening. Is this a possible scenario? If so, what could I do to hold the connection open? (I'm not sure it's a good idea to print the --progress updates directly to the log file).
OK, I figured it out. Apparently HostGator's shared servers have an SSH timeout limit of 30-45 minutes set by default. Since the rsync run took longer than that limit, the server was closing the connection on me. I called and spoke to their tech support, and they increased the limit for my server.

How to resume a bash session after an ssh disconnection?

I connected via ssh to a remote server and started a very long wget download. Then the ssh session broke, and after reconnecting a new copy of the shell was created. Now I see the wget process in ps, but is it possible to return control to the old shell? I know that the better solution is to use screen for long commands, but is there another way?
No, there is no way to reattach a process to a different terminal if it's not set up to do that (by way of screen / tmux / what have you) in the first place.
As a crude approximation, connecting a debugger to the running process may allow you to interact with it in some limited ways, but in this particular scenario, I don't think it will be beneficial.
If you want to know the progress of your currently running wget, check out the size of the downloaded file, it should be growing. If it doesn't, run killall wget and start over.
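That size check can be scripted. A minimal sketch; the function name is hypothetical, and the file name and 30-second window in the usage line are arbitrary examples.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if FILE gets bigger within SECONDS seconds.
is_growing() {
  local file=$1 secs=$2 before after
  before=$(( $(wc -c < "$file") ))   # $(( )) strips wc's padding spaces
  sleep "$secs"
  after=$(( $(wc -c < "$file") ))
  [ "$after" -gt "$before" ]
}
```

Usage: `is_growing ./download.iso 30 || echo "no progress in 30 s"`.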
Next time, consider running wget --background to prevent the problem from happening. See the wget info page.
This command lets wget work in the background and write its progress to the log file my.log, with the number of retries set to 45 (-t option):
wget -t 45 -o my.log http://upload.wikimedia.org/wikipedia/commons/5/51/Google.png &

device /dev/ttyusb0 lock failed: operation not permitted

I was playing around with a router earlier this evening using minicom and I must not have closed it cleanly. Here is the error message that I get when I try to open minicom:
device /dev/ttyusb0 lock failed: operation not permitted
I have two questions, 1) how would I go about getting out of this state, and 2) how do I exit minicom cleanly so that I can avoid this happening again.
I found I was able to fix the situation on my CentOS box by running minicom -S <device> -o and then doing the normal exit key sequence (Ctrl-A, X).
In your situation it would have been
sudo minicom -S ttyusb0 -o
This cleared the lock files minicom had placed in /var/lock/
Good luck
Ash
I ran into a similar issue with using gtkterm from a remote terminal. I had shutdown the terminal without explicitly terminating gtkterm. The result was that subsequent gtkterm sessions gave me the error:
Device /dev/ttyUSB0 is locked.
Checking the process list via ps did not show any gtkterm processes still running.
I corrected this by simply deleting /run/lock/LCK..ttyUSB0. After doing that, gtkterm was able to open ttyUSB0 successfully.
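Before deleting a lock file by hand, one can check that it really is stale. A Linux-only sketch, assuming the UUCP-style text format used for lock files under /var/lock (/run/lock), where the file holds the owner's PID as ASCII digits; the function name is hypothetical. It looks for /proc/PID rather than using kill -0, because kill -0 also fails for a live process owned by another user, which would make a valid lock look stale. Files that do not contain a plain PID are reported as not stale.

```bash
#!/usr/bin/env bash
# Hypothetical helper (Linux): succeed if the PID stored in LOCKFILE is no
# longer running, i.e. the lock was left behind by a dead process.
is_stale_lock() {
  local lock=$1 pid=
  read -r pid 2>/dev/null < "$lock" || true   # read also strips the padding
  case $pid in ''|*[!0-9]*) return 1 ;; esac   # not a plain PID: don't guess
  [ ! -d "/proc/$pid" ]
}
```

Usage: `is_stale_lock /run/lock/LCK..ttyUSB0 && rm /run/lock/LCK..ttyUSB0`.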
[root@edge-tc lock]# minicom
Device /dev/ttyUSB0 lock failed: Operation not permitted.
Solution:
Find the process that holds the lock and kill it:
[root@edge-tc lock]# fuser /dev/ttyUSB0
/dev/ttyUSB0: 18328
[root@edge-tc lock]# kill -9 18328
[root@edge-tc lock]# minicom
Welcome to minicom 2.1
The canonical way is to use lockdev. This manages the lock files on a per-device basis in /run/lock/lockdev/ (at least under CentOS 7.x).
lockdev <device> can be used without being root, and returns non-zero if the device has already been locked, in which case it can be unlocked with lockdev -u <device>.
This is apparently obsolete these days, but minicom (at least as of version 2.6.2) still uses it.

In bash, how do I reestablish sftp connection and run it in a script that makes nautilus do it?

I have a bash script that tests whether the sftp connection exists, very simple one:
$ if [ ! -d ~/".gvfs/sftp for username on 192.168.1.101" ]; then echo "sftp missing"; exit 1; fi
Now here's the question:
How do I make the script re-establish the previously connected sftp session, which still has a cached password, without the connection depending on whether the bash script is still running?
Since I have a bookmarked sftp location in Nautilus, I just point and click and, presto, it's reconnected. I need the same for my script, which will TERMINATE a couple of lines later; in other words, the script only reconnects Nautilus and dies, and the connection stays open...
I am still a noob at sftp, beyond connecting...
Extra info: I use Ubuntu for both client and server, and I don't mind entering the ssh password again if it's a new connection. Any help is appreciated :D
It's critical that sftp won't disconnect or die when I close the script or it ends; nohup can't be used for the script, since it will be run >10 times per day.
Thanks!
Okay, I did some research. You are using GVFS (the GNOME Virtual File System) and are looking, with a non-GNOME application (bash), at the FUSE mount point of one of the URIs.
I think you can use the gvfs-mount command to reconnect, if you know the SFTP URL, but I didn't really find much documentation about this.
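A hedged sketch of that idea: mount only when the FUSE directory is missing, using `gio mount` on newer GLib/GVFS releases and falling back to the older `gvfs-mount`; both take a URI. The function name and the username/host in the usage line are placeholders. The mount is held by the GVFS daemon rather than by the script, so it should stay up after the script exits; gio may prompt for a password if none is cached. On newer systems the FUSE view typically lives under /run/user/<uid>/gvfs instead of ~/.gvfs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: (re)mount a GVFS location unless its FUSE dir exists.
# Usage: ensure_mounted FUSE_DIR URI
ensure_mounted() {
  local dir=$1 uri=$2
  [ -d "$dir" ] && return 0            # already mounted: nothing to do
  if command -v gio >/dev/null 2>&1; then
    gio mount "$uri"                   # newer GLib/GVFS
  else
    gvfs-mount "$uri"                  # older GVFS command
  fi
}
```

Usage: `ensure_mounted ~/".gvfs/sftp for username on 192.168.1.101" "sftp://username@192.168.1.101/"`.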

what's "sure kill" when talking about NFS mount option?

In the following link
http://www.faqs.org/docs/Linux-HOWTO/NFS-HOWTO.html
It said a process is not killable except by a "sure kill", but what's sure kill?
hard (NFS client mount option)
The program accessing a file on a NFS mounted file system will hang when the server crashes. The process cannot be interrupted or killed (except by a "sure kill") unless you also specify intr. When the NFS server is back online the program will continue undisturbed from where it was. We recommend using hard,intr on all NFS mounted file systems.
I think it means kill -9 (SIGKILL). Note, though, that you are reading a fairly old HOWTO from 2002 (as are most other TLDP HOWTOs, I think). Regarding those NFS mount options, hard is already the default, and intr is obsolete:
The intr / nointr mount option is deprecated after kernel 2.6.25. Only SIGKILL can interrupt a pending NFS operation on these kernels, and if specified, this mount option is ignored to provide backwards compatibility with older kernels.
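To make the answer concrete: "sure kill" is SIGKILL, signal number 9, so kill -9 PID and kill -KILL PID send the same signal. A small bash demonstration on a throwaway sleep process; a process terminated by a signal reports exit status 128 plus the signal number, here 137.

```bash
#!/usr/bin/env bash
# "Sure kill" = SIGKILL (signal 9); kill -9 and kill -KILL are equivalent.
kill -l 9                 # prints the signal name: KILL

sleep 300 &               # throwaway process standing in for a stuck program
pid=$!
kill -KILL "$pid"         # same as: kill -9 "$pid"
status=0
wait "$pid" || status=$?
echo "exit status: $status"   # 128 + 9 = 137 for a process killed by SIGKILL
```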
