rsync hangs after transfer over ssh - bash

I wrote a bash script that backs up files from a webserver (HostGator) to a local file server running FreeBSD.
I use rsync over ssh (from the file server) to connect to the remote server (I already have pre-shared RSA keys set up). When I run the following line to start the sync, the files all seem to come in just fine, but the command never returns and the script hangs forever:
/usr/local/bin/rsync -az --chown=root:admin --chmod=ugo=rwX --exclude ".inode_lock" --rsh='ssh -p2222' admin@domain.com:/home/admin/ '/mnt/blah/blah/LocalBackup/' >> "./Logs/Backup Log.txt"
After waiting a few minutes, when I hit Ctrl+C to stop the command, it prints the following error messages:
^CKilled by signal 2.
rsync error: unexplained error (code 255) at rsync.c(636) [generator=3.1.2]
rsync error: received SIGUSR1 (code 19) at main.c(1429) [receiver=3.1.2]
This still happens even if both sides are already synced and rsync is just checking for changes.
I'm not sure what to do to troubleshoot the problem. I did try removing the -v switch for rsync, as some users reported that it caused hangs, but I saw no difference.
EDIT
One more additional note. I ran the script again today to continue to troubleshoot. If I leave the script running without disturbing it after it hangs, eventually I receive the following message:
rsync: connection unexpectedly closed (2984632408 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(226) [receiver=3.1.2]
rsync: connection unexpectedly closed (8689703 bytes received so far) [generator]
rsync error: unexplained error (code 255) at io.c(226) [generator=3.1.2]
and then it returns to the command prompt. I think this might be due to a timeout on the remote server's end, but I'm not sure. I'm also still not sure why the hang is happening.
UPDATE
I did an additional test and limited the rsync transfer to a specific test folder with some sample files and subfolders, rather than grabbing the entire home directory. When I did this, it completed the transfer successfully and exited as expected. So it appears that some file or folder somewhere in the home directory of the server is causing the problem. Are there any specific cases where rsync wouldn't be able to transfer a file? I have seen it throw a "Permission Denied" error while trying to sync files that are write-locked, but even these files didn't stop it from continuing. Any thoughts?
As an additional note, the remote server I'm connecting to is on a shared hosting account so I don't have root access. I don't know if this could be causing some problems?
UPDATE 2
So I studied the rsync command and added a couple more command-line parameters, --progress and --stats (along with --verbose), so I could better understand where it is dying. What I noticed is that the command was hanging on a very large file that was being downloaded from the server. But now, with --progress being reported (I am having it output directly to the terminal for the moment rather than to a file), it seems to be moving along just fine, with no hangups so far.
I am now beginning to suspect that maybe the ssh connection is timing out or something due to inactivity? Especially since in the original situation, nothing gets output from the function for a long time while the large file transfer is happening. Is this a possible scenario? If so, what could I do to hold the connection open? (I'm not sure it's a good idea to print the --progress updates directly to the log file).

OK, I figured it out. Apparently HostGator's shared servers have an SSH timeout limit of 30-45 minutes set by default. Since running rsync took longer than that limit, the server was closing the connection on me. I called their tech support, and they increased the limit for my server.
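For readers who cannot get the server-side limit changed, one thing that may be worth trying is asking the ssh client to send keepalive probes. This is only a sketch: the interval and count below are assumed values, the variable name is made up for illustration, and keepalives would only help if the connection is being dropped for idleness. A hard session-length limit on the host would still cut the transfer off.

```shell
# Assumed values: send a keepalive probe every 60 s and give up after
# 3 unanswered probes. Port 2222 is taken from the command in the question.
RSYNC_RSH_CMD='ssh -p2222 -o ServerAliveInterval=60 -o ServerAliveCountMax=3'
echo "remote shell for rsync: $RSYNC_RSH_CMD"

# Usage (not run here), with placeholders for the real paths:
# rsync -az --rsh="$RSYNC_RSH_CMD" <remote source> <local destination>
```

ServerAliveInterval and ServerAliveCountMax are standard OpenSSH client options, so the same settings could also go into ~/.ssh/config for that host instead of the command line.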

Related

Stop SFTP/FTP connection while the active connection is not transferring data in Unix Shell Scripting

I'm having an issue, and I can't find anything related to it.
While I'm transferring a file over an FTP connection to a server, the transfer sometimes gets stuck for some reason, and I would like to avoid leaving the connection open. Is there a way to detect that the FTP connection is no longer transferring data, and then close it?
I don't have any code yet because I'm not sure whether this is possible.
Any idea what I can do at this point?
If it is closing the connection while you are transferring files, then the cause is your FTP/SFTP client, the server, or the network. First, switch to a different FTP/SFTP client. Some have more tools for analysis than others. I have had to do this before. If that doesn't work, check the internet connection or contact your system/network administrator.
there is a way to see if the FTP connection is not transferring, close
the connection?
If you are downloading a file, you can indirectly see the FTP transferring by watching the file's size:
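#!/bin/sh
# growing: loops while the file named by $1 keeps growing; exits 1 once it stalls.
name=$1
size=0
while sleep 10                      # check every ten seconds
      set -- $(ls -s -- "$name")    # $1 becomes the file's size in blocks
      [ "$1" -gt "$size" ]          # keep looping only if the file grew
do    size=$1
done
exit 1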
The above script (let's call it growing) runs while the file (passed as a parameter) grows.
In your script you could write something like
growing file || pkill ftp &
before you start the FTP. If the file stops growing for ten seconds, ftp would be killed and the connection thereby closed. If ftp terminates normally, you could kill $! or just let growing end.
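A variant of the same idea, sketched as a single function that watches one specific process instead of using pkill on every ftp. The function name, its arguments, and the use of wc -c for the byte size are illustrative assumptions, not part of the answer above.

```shell
# Hypothetical helper: kill_when_stalled FILE PID [INTERVAL]
# Kills PID once FILE has not grown for INTERVAL seconds (default 10).
# Returns 1 if it had to kill the process, 0 if the process ended by itself.
kill_when_stalled() {
    file=$1 pid=$2 interval=${3:-10}
    last=-1
    while kill -0 "$pid" 2>/dev/null; do        # is the transfer still running?
        # Size in bytes; a file that does not exist yet counts as 0.
        size=$(wc -c 2>/dev/null < "$file" || echo 0)
        if [ "$size" -le "$last" ]; then        # no growth since the last check
            kill "$pid" 2>/dev/null
            return 1
        fi
        last=$size
        sleep "$interval"
    done
    return 0
}
```

A script could start the transfer in the background, remember its PID from $!, and then call kill_when_stalled with the target file and that PID. Ten seconds is a short window, so a larger interval is probably safer on slow links.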

Cannot successfully disconnect from remote machine using 'nohup' or 'screen'

I am trying to do some work on a remote machine and disconnect without terminating the work. I have tried both nohup and screen, unfortunately it is not working out. After I type exit to logout my work also terminates immediately.
I am trying to run 108 simulations on a remote machine. For that purpose I have written a script named batch.sh which runs one simulation after the other until all 108 are done. The program that actually runs a simulation launches 5 programs in 5 different terminals (using xterm -e). I run batch.sh using:
nohup bash batch.sh &
As long as I am connected everything works just fine. If I disconnect and then reconnect to check whether everything is working as it should...no joy :(
Are there any caveats I am overlooking? Possibly because my program launches other programs in external terminals?
UPDATE
If I use the suggestions of adding -oForwardX11=no to ssh and unset DISPLAY before launching my script I get these errors:
nohup: ignoring input and appending output to nohup.out
In nohup.out I have these messages:
xterm Xt error: Can't open display:
xterm: DISPLAY is not set
Apparently your script/program is trying to launch xterm on its own. These days many systems enable X11 forwarding for their SSH client by default - as a result the DISPLAY variable is set in your shell session but becomes invalid once you disconnect. Therefore, as long as you are connected to the remote system, the xterm processes can access the X server on your local machine through the SSH connection, but die once that connection is severed.
I have occasionally encountered the same issue with Java programs that use e.g. the Java AWT subsystem to generate image files, even when there is no actual graphical window. You should first see if your program will somehow adapt if there is no X server available. One option is to disable X11 forwarding with the -oForwardX11=no option to ssh:
$ ssh -oForwardX11=no user@server.host.name
You could also try unsetting the DISPLAY environment variable before starting your script and see what happens.
However, if your program is launching xterm windows indiscriminately, then you'd have to make it use e.g. an output file on the server instead, by modifying it if necessary. As an added advantage, you would get rid of the network load and timing overhead involved with forwarded X connections.
If you cannot change the way your program works and you do not actually care about the output in those xterm windows, then you could try launching a virtual framebuffer X server on the remote system and have your script use that for xterm.
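One possible way to try the virtual framebuffer approach is the xvfb-run wrapper that is packaged alongside Xvfb on many distributions. The sketch below assumes xvfb-run is installed on the remote machine; the function name and the log file argument are made up for illustration.

```shell
# Hypothetical helper: start_headless LOGFILE COMMAND [ARGS...]
# Runs COMMAND under a throwaway virtual X display, detached via nohup.
start_headless() {
    log=$1
    shift
    if ! command -v xvfb-run >/dev/null 2>&1; then
        echo "xvfb-run not found; install Xvfb first" >&2
        return 127
    fi
    # -a lets xvfb-run pick a free display number automatically.
    nohup xvfb-run -a "$@" > "$log" 2>&1 &
    echo "started in background, PID $!"
}

# Usage (not run here):
# start_headless batch.log bash batch.sh
```

With this setup the xterm windows are drawn on the virtual display on the server, so nothing depends on the SSH session's forwarded DISPLAY.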

WinSCP script will not exit after error, even with "option batch abort"

For some reason when my network drops my script will not abort.
Here's the command:
"%~dp0\winscp.exe" /console /script="script.txt"
exit
and the script.txt:
option batch abort
option confirm off
open ftp://user:pass@ftp.site.com/
cd /directory/
synchronize local
exit
When I pull my network cable (to test for network drop) I get:
The requested name is valid, but no data of the requested type was found.
Connection failed.
(A)bort, (R)econnect (5 s):
..
(A)bort, (R)econnect (0 s): Reconnect
It will continue to try and reconnect indefinitely.
Why doesn't the script auto abort? I am using option batch abort. Am I missing something?
This error is recoverable, so WinSCP keeps trying to resume the transfer even with the option batch abort as documented:
...
When batch mode is set to on any choice prompt is automatically replied negatively. Unless the prompt has a different default answer (such as a default “Reconnect” answer for a reconnect prompt), in what case the default answer is used (after a short time interval). See also a reconnecttime option below.
A value abort is like the on. ...
As mentioned above, you can configure how long WinSCP tries to reconnect using the option reconnecttime. By default, WinSCP tries to reconnect for 2 minutes in the on/abort mode.
Note that WinSCP used to reconnect indefinitely in older versions, by default. You must be using some old version.

rsync suddenly hanging indefinitely during transfers

For the past few years, I have been using an rsync one-liner to back up important folders on my Mac Mini desktop (OSX 10.9, 2.5 GHz i5, 4 GB RAM) to a FreeNAS box (0.7.2 Sabanda revision 5266, Pentium D 2.66 GHz, 822MiB RAM [reported by the system, I think there's 1 GB in there]). I am running an rsync daemon on the FreeNAS box. Recently, these transfers have been hanging indefinitely. I have done the usual Google-fu and am unable to identify the source of the problem or a solution.
The one-liner is:
rsync -rvOlt --exclude '.DS_Store' \
--exclude '.com.apple.timemachine.supported' \
--delete /Volumes/Storage/Music/Albums/ 192.168.1.100::albums
I have tried enabling -vvv and --progress, but there is no pattern that I can discern between what hangs and what doesn't. Heck, if I retry, the same file might hang at a different point during the transfer or not at all. A dry run (-n) does not always succeed either. The only "success" I've had is implementing a timeout (--timeout=10) and rerunning the command over and over. Eventually, I creep along, but with no guarantee of success and at a pace that is unacceptable. I've reached a point where I have one file that I can't get past.
The Mac Mini is connected to my router via 5 GHz. The FreeNAS box is wired into that same router on a 100 mbit port. When transfers are actually going, rsync --progress reports 2.5-4 MB/s. According to --progress, a hang is literally just that—no data transfer is occurring as far as I can tell.
I need help with both the diagnostics and the solution.
I was having the same problem. Removing -v didn't work for me. My use-case is slightly different in that I'm going from source (EXT4) to ExFAT. The issue for me was that rsync was attempting to preserve device files and permissions, which ExFAT doesn't support. I was using the -hrltDvaP switches. The -D and -a switches seemed to be my problem. The -a switch translates to -rlptgoD (no -H,-A,-X). The -p, -g, and -o switches seemed to be my root cause as rsync was barfing on one or all of those during runtime. Removing -a and specifying -Prltvc switches explicitly is working for me.
bkupcmd="nice -n$nicelevel /usr/bin/rsync -Prltvc --exclude-from=/var/tmp/ignorelist "
I've been running into the same thing again and again and it seems to help if you drop the -v option (which is annoying if you need that output).
Try using --whole-file/-W.
This option disables the rsync delta-transfer algorithm.
That is what worked for us (WSL to OSX)
our full sync flags were -avWPle
(e was because we were using ssh, and that has to be the last flag)
This happened to me when the remote device ran out of space. The error wouldn't show when --verbose option was used; turning that off yielded some STDERR output that explained that the remote device was out of space. When I freed some space, I was able to run rsync again with --verbose and everything went fine.
I am using openSUSE 13.2 Linux, rsync version 3.1.1-2.4.1.x86_64, and I experienced similar problems, doing an rsync between my laptop and an external hard disk, with the destination device definitively having enough free space.
I thought I got an improvement omitting option -v, but after 10 minutes it was hanging again: strace said:
select(5, [], [4], [], {60, 0}) = 0 (Timeout)
And with "iotop" I could confirm that the rsync processes were no longer doing any significant disk I/O.
Neither removing the -v option nor limiting the bandwidth using --bwlimit fixed the problem.
Just had a similar problem while doing rsync from harddisk to a FAT32 USB drive. rsync froze already in less than a second in my case and did not react at all after that ... left it with CTRL+C.
Found out that the problem was a combination of usage of hardlinks on the harddisk and having FAT32 filesystem on the USB drive, which does not support hardlinks.
Formatting the USB drive with ext4 solved the problem for me.
In my situation rsync was not actually failing.
I have regular server backups that transfer large files (500 GB+) over ssh with the --append-verify or --checksum parameter specified.
What I found on analysis is that once the client side completes its file checks, the server-side checks start. This means that while the server is doing its checks, the client side will appear hung and frozen; run htop on the server to see rsync working away.
This is likely a non-issue if rsync is run in daemon mode on the server and the rsync protocol is used for transfers instead of ssh.
On a related note, this very long wait can trigger an SSH timeout and an "rsync: connection unexpectedly closed (254 bytes received so far) [sender]" error message. The solution is to add ClientAliveInterval 120 and ClientAliveCountMax 720 to /etc/ssh/sshd_config.
I've seen this quite often on 3.0.9 on a directory with hardlinks, but it also happened on 3.1.3.
There is a nice analysis in Debian bug 820916: when its internal sockets are congested with errors, rsync could go into a deadlock.
This might have been fixed in a 3.2 release just a few days ago (Jun 2020):
Avoid a hang when an overabundance of messages clogs up all the I/O buffers.
The only good workaround I can think of, if the problem is not persistent, is to put timeout in front of the command: timeout <duration> rsync <args> <source> <destination>, then retry. If it is persistent for you, you're the lucky one who can debug it :D
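The timeout-and-retry idea could be wrapped in a small helper like the following. It is a sketch assuming GNU coreutils timeout is available; the function name, the attempt count, and the duration in the usage line are illustrative choices, not recommendations.

```shell
# Hypothetical helper: retry_with_timeout LIMIT ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND under a hard time limit and retries it up to ATTEMPTS times.
retry_with_timeout() {
    limit=$1 attempts=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        # GNU timeout terminates the command after LIMIT and exits with 124.
        if timeout "$limit" "$@"; then
            return 0
        fi
        echo "attempt $n failed or timed out" >&2
        n=$((n + 1))
    done
    return 1
}

# Usage (not run here):
# retry_with_timeout 2h 5 rsync <args> <source> <destination>
```

Because the limit is a hard wall-clock limit, it has to be longer than the longest transfer you expect to succeed, otherwise healthy runs get killed too.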
It also happens when the user on the target machine does not have write permission on the target folder.
You can try giving write permission to others on the target folder:
sudo chmod -R o+w /path/to/target-folder
In my case, it was the IPC (Intrusion Protection Component) in our firewall. It sees all the TCP SYN packets as a flood attack and kills the connection. I left an rsync over NFS session open and turned off the IPC for the server's firewall rule, and it started working again right away.
rsync -ravh /source /destination
When it happened, I was not able to kill the rsync session. It locked up the NFS mount, and I would have to reboot the client machine to get it to work again. The strange thing is that it would copy some files over and then suddenly stop. It always seemed to stop on the same file. So I was looking for file issues, permission issues, and TCP offloading issues, and I tried removing the -v in the rsync call. In my case, at least, it even happened with a simple cp:
cp -rp /source /destination
So I knew then to start looking at other factors. So if you have any sort of intrusion protection on a firewall or router between the servers you can try turning that off temporarily to see if it solves your issue as well.
Most likely not "your" problem, but I stumbled upon this question when I was researching a similar behavior:
I'm observing "hanging" when the target site has too much I/O load, e.g. on one of my small business servers, when someone is resyncing their IMAP account and downloading large batches of data while a backup job runs that writes their data.
In this situation I notice a steep drop in performance for rsync. Noticeable in a high load value in top on the target machine, even though CPU and Mem are fine.
Waiting for the process to finish has helped every time or interrupting and attempting the rsync at a later time again.
I was having the same problem and it was because I was running out of memory during the rsync. Created a swap file and problem solved.
Had rsync hanging issue on Ubuntu 16. None of the options above helped. The problem was in the source drive (external SSD) which suddenly became faulty. I tried several disk checks, but all of them stuck. Ended up rebooting the system and disk suddenly became accessible again.
Holger Ohmacht aka h8ohmh / 8ohmh:
As far as I could investigate, the problem lies in the filesystem buffering and its interaction with the hard disk hardware.
Temporary workaround for local drives (e.g. USB3<->HD): a script that polls the free disk space. If the free disk space is not changing, rsync has stalled and has to be restarted.
cmd="rsync -aW --progress --stats --preallocate --super \
<here your source dir> \
<here your dest dir>"
eval "$cmd" &
rm -f ./ndf.txt ./odf.txt           # start without stale snapshots
# Note: this loop never exits on its own; stop it once the copy is done.
while true; do
    df > ./ndf.txt                  # snapshot of the current disk usage
    # Identical snapshots one minute apart mean nothing was written.
    if cmp -s ./odf.txt ./ndf.txt; then
        echo "###########################################"
        ls -al ./ndf.txt ./odf.txt
        killall rsync               # kills every rsync process, not only ours
        eval "$cmd" &
    else
        cp ./ndf.txt ./odf.txt
    fi
    sleep 60
done
Change <here your source dir> etc. to your paths!
In my case it always stalls when rsync's --preallocate option is used (an option normally chosen for better disk performance and for keeping blocks contiguous), so until the disk and filesystem drivers are reworked, this workaround is the only solution I have.

In bash, how do I reestablish sftp connection and run it in a script that makes nautilus do it?

I have a bash script that tests whether the sftp connection exists, very simple one:
$ if [ ! -d ~/.gvfs/"sftp for username on 192.168.1.101" ]; then echo "sftp missing"; exit; fi
Now here's the question:
How do I make the script re-establish the previously connected SFTP mount (which still has a cached password), without the connection depending on whether the bash script is still running?
Since I have a bookmarked SFTP location in Nautilus, I just point and click, and presto, it's reconnected. I need the same for my script, which will TERMINATE a couple of lines later; in other words, the script only reconnects Nautilus and dies, and the connection stays open...
I am still noobish at SFTP, besides connecting...
Extra info: I use Ubuntu for both client and server, and I don't mind entering the ssh password again if it's a new connection. Any help is appreciated :D
It's critical that the SFTP connection won't disconnect or die when I close the script or when it ends. nohup can't be used for the script, since it will be run more than 10 times per day.
Thanks!
Okay, some research done. You are using GVFS (the GNOME Virtual File System), and you are looking with a non-GNOME application (bash) at the FUSE mount point of one of the URIs.
I think you can use the gvfs-mount command to reconnect, if you know the SFTP URL, but I didn't really find much documentation about this.
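A minimal sketch of that idea, under several assumptions: the function name is made up, the mount directory and URI in the usage line are copied from the question, and the mount is done with gio mount where it is available (newer GNOME systems) and with gvfs-mount otherwise. Whether the cached password is reused without a prompt is not something this sketch can guarantee.

```shell
# Hypothetical helper: ensure_sftp_mounted MOUNT_DIR URI
# Does nothing if MOUNT_DIR already exists; otherwise asks GVFS to mount URI.
ensure_sftp_mounted() {
    mount_dir=$1 uri=$2
    if [ -d "$mount_dir" ]; then
        return 0                    # FUSE directory is there: already mounted
    fi
    if command -v gio >/dev/null 2>&1; then
        gio mount "$uri"            # newer GNOME tooling
    else
        gvfs-mount "$uri"           # older systems
    fi
}

# Usage (not run here):
# ensure_sftp_mounted "$HOME/.gvfs/sftp for username on 192.168.1.101" \
#     "sftp://username@192.168.1.101/"
```

The mount is owned by the GVFS daemon rather than by the script, so it should stay up after the script exits. On newer systems the FUSE directory usually lives under /run/user/<uid>/gvfs instead of ~/.gvfs, so the path may need adjusting.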

Resources