Rsync stop when failover - shell

I have two cPanel servers (A->B) with failover configured in DNS Made Easy. I currently run rsync to sync the /home/account folder from A to B every 4 hours.
So when A fails, B takes over with data that can be up to 4 hours behind server A.
My problem is that when A recovers from a failure, the rsync overwrites the data in B with the data from A, since the rsync direction is A->B.
I would like to know the best way to prevent the rsync from running after the first failover, so that I can handle the rsync manually. I am thinking of a shell script that tries to access a text file on server A and, if that fails, stops the cron job from running.
Is this a good way to handle this, or is there an easier way?

Well, I have done something similar on a group of servers I have at the office. An overview of what I have found to work well is simply to run a cron script that keeps the status of each of the other servers in a temporary status file and the status is updated with calls to ping.
Specifically, the routine works by maintaining a list of hosts to be included in the check. Each host (except for the one matching the machine running the cron job) has a status file in the /tmp directory called hoststatus.$HOSTNAME. Each status file contains either up or down. (If the status file does not exist, it is created during the check and the host is assumed to be up.) The status files give any script a local way to check the status of each remote host before it runs.
The cron job that checks the status reads the status file for each remote host and passes the status to a case statement. When the status is up, the remote host is pinged with ping -c1 hostname. If the ping succeeds, the script exits (the remote host is up). If the ping fails, the script waits 20 seconds (to make sure the remote host isn't just rebooting, etc.) and checks again. If the second ping succeeds, the status remains up and the script exits. If the second ping fails, the script waits another 20 seconds and tests a third time. If the third test fails, down is written to the status file and the remote host is considered down.
Continuing in the case statement, if the initial status was down, a single check is made with ping. If it succeeds, the status is changed to up; if it fails, it remains down.
A log file is also kept that reflects each change of status to provide a running history of server availability.
Something similar would work for your case. If server A goes down, server B could write a simple marker file in a similar fashion, something like rsynchold.hostA, that is checked before rsync is run in either direction (A->B or B->A). This would allow you to intervene manually for the first rsync after a failure, at which time you could reset the rsynchold.hostA file.
This isn't elegant, but it has proven fairly foolproof over the past several years.
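A minimal sketch of the routine described above, not the exact script I run. The file names hoststatus.<host> and rsynchold.<host> follow the description; everything else (STATUS_DIR, RETRIES, RETRY_WAIT, the function names and the hoststatus.log file) is a hypothetical name chosen for illustration. It assumes the cron job runs on the machine that starts the rsync.

```shell
#!/bin/bash
# Sketch: track remote host status with ping and place a manual hold on
# rsync after the first detected failure. All names are illustrative.

STATUS_DIR="${STATUS_DIR:-/tmp}"   # where status, hold and log files live
RETRIES="${RETRIES:-3}"            # failed pings before a host is marked down
RETRY_WAIT="${RETRY_WAIT:-20}"     # seconds to wait between pings

# Return 0 if the host answers a single ping.
probe_host() {
    ping -c1 "$1" >/dev/null 2>&1
}

# Append a timestamped status change to the log file.
log_change() {
    echo "$(date '+%F %T') $1 $2" >> "$STATUS_DIR/hoststatus.log"
}

# Update hoststatus.<host>; create rsynchold.<host> when the host goes down.
check_host() {
    local host="$1"
    local status_file="$STATUS_DIR/hoststatus.$host"
    local hold_file="$STATUS_DIR/rsynchold.$host"
    local status="up"              # a missing status file is treated as up
    local attempt=1

    if [ -f "$status_file" ]; then status="$(cat "$status_file")"; fi

    case "$status" in
        up)
            while [ "$attempt" -le "$RETRIES" ]; do
                if probe_host "$host"; then
                    echo up > "$status_file"
                    return 0
                fi
                if [ "$attempt" -lt "$RETRIES" ]; then sleep "$RETRY_WAIT"; fi
                attempt=$((attempt + 1))
            done
            echo down > "$status_file"
            touch "$hold_file"     # stays until it is removed by hand
            log_change "$host" "up -> down"
            ;;
        down)
            if probe_host "$host"; then
                echo up > "$status_file"
                log_change "$host" "down -> up"
            fi
            ;;
    esac
}

# Return 0 only when no hold file exists for the host.
rsync_allowed() {
    [ ! -e "$STATUS_DIR/rsynchold.$1" ]
}
```

From cron, the job would call check_host hostA first and run the rsync only if rsync_allowed hostA succeeds. The hold file is deliberately not removed when the host comes back up, so the first rsync after a failure stays a manual decision.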

Related

What is the correct use case for SchedulerLock lockAtMostFor?

I am using SchedulerLock in Spring Boot, with 2 servers.
What I'm curious about is why is "lockAtMostFor" an option that exists?
Take an example: on one of my 2 servers, the schedule runs first and then locks.
But something went wrong while running, and my server went down.
At this moment, my scheduled task ends incompletely.
Any guide I read is full of vague answers about "lock time in case a node dies".
When a node dies, it can no longer execute schedules.
But why keep holding a LOCK for a dead node?
Even if I urgently try to manually execute the schedule on the 2nd server, it is impossible to manually execute it because of the above lock.
What does this option exist for?

slurm - action_unknown in pam_slurm_adopt

What does "source job" refer to in the description of action_unknown?
action_unknown
The action to perform when the user has multiple jobs on the node
and the RPC does not locate the **source job**. If the RPC mechanism works
properly in your environment, this option will likely be relevant only
when connecting from a login node. Configurable values are:
newest (default)
Pick the newest job on the node. The "newest" job is chosen based
on the mtime of the job's step_extern cgroup; asking Slurm would
require an RPC to the controller. Thus, the memory cgroup must be in
use so that the code can check mtimes of cgroup directories. The user
can ssh in but may be adopted into a job that exits earlier than the
job they intended to check on. The ssh connection will at least be
subject to appropriate limits and the user can be informed of better
ways to accomplish their objectives if this becomes a problem.
allow
Let the connection through without adoption.
deny
Deny the connection.
https://slurm.schedmd.com/pam_slurm_adopt.html
pam_slurm_adopt will try to capture an incoming SSH session into the cgroup corresponding to the job currently running on the host. This option decides what to do when the user who initiates the ssh command has several jobs running on the node.
The 'source job' is the jobid of the process that initiates the ssh call. Typically, if you use an interactive ssh session from the frontend, there is no 'source job', but if the ssh command is run from within a submission script, then the 'source job' is the one corresponding to that submission script.

Implement Multiple client reads a file and multiple servers writes to a file via Client Server

Below is a question I was asked in an interview:
A datacenter has 10000 servers. We have a single syslog driver which collates the logs from all the servers in the datacenter and writes them to a single file called syslog.log.
Let's say the datacenter has 1000 admins. At any point in time, any admin can log in to the syslog server and invoke a command, say
getlog --serverid --severity
The command should continuously tail the logs matching the conditions provided by the user until the user interrupts it.
Any number of users can concurrently log in to this server and run this command. Each request should be honoured, but with one condition: at any given point in time there can be only one file descriptor open for the syslog.log file.
Implement getlog such that it satisfies the above condition.
I described my approach as a critical-section problem: we can use a mutex/semaphore to lock the file until a user finishes. But the interviewer was expecting something like a client-server model.
How can this functionality be served using a client-server architecture?
What is the best approach to solve this?

Keep laravel queue:work running while in jailshell

I'm having issues keeping the queue:work command running on my server. I tried nohup, but as soon as I close the terminal (which times out every 5 minutes or so no matter what I've tried) the process goes away.
I thought about running a script in cron to kick off the nohup command, however that runs in jailshell too so I have no way of seeing if the process is still running from a previous cron or not and I don't want a potential 20k copies of this running because it's trying to kick off every minute.
I also don't have permission to install software, so I can't install Supervisord.
So, what other solutions can I use to ensure this stays running?
EDIT I contacted the support for my host, and pretty much it looks like there are no real alternatives for me. I think I'm going to have to set this project up on Linode, or rework things to not have queuing tasks.
It seems that the problem lies in the shell configuration, because the ps command is rewritten to show only child processes.
The solution is to ask your hosting provider (or change it yourself if allowed) to set this variable:
SHELL="/bin/bash"
This simple fix allowed me to have the function working properly.
Now my Kernel.php looks as follows:
$command = "ps faux | grep queue:work";
exec($command, $task_list);
// Each process is listed twice in the ps output, and the grep command itself also appears as two lines
$running_process = (count($task_list) / 2) - 1;
if ($running_process < 1) {
    $schedule->command('queue:work --queue=high,low --tries=3')
        ->everyMinute();
} else if ($running_process > 5) {
    // If too many are active, restart everything to avoid overload
    $schedule->call(function () {
        Artisan::call('queue:restart');
    })->everyMinute();
}
This code makes sure that at least one worker is always running, and at the same time forces a restart if you have more than 5 workers active.

deny parallel ssh connection to server for specific host / IP

I have a bot machine (controlled via mobile devices) which connects to the server and fetches information from it using SSH, shell scripts, OS commands, SQL queries, etc., and then feeds that information over the (private) internet.
I want to disallow multiple simultaneous connections to the server from the bot machine ONLY. Other machines that connect to the server must not be affected.
Suppose
Client A accesses the bot machine from his mobile (via a web page), and the bot machine then connects to the server (1st session). If this connection takes 5 minutes, then during this period the bot machine will be creating, querying, deleting, appending, updating, etc.
In the middle of that 5-minute window (say 2 minutes after the 1st session started), client B accesses the bot machine from his mobile (via a web page), and the bot machine connects to the server again (2nd session). This will conflict with the 1st session and create havoc.
Limitation
First of all, I do not want to edit any setting on the SERVER whatsoever.
I do not want to edit the web page, the mobile client, etc.
I already know about the lock-file method for parallel shell scripts, and it is implemented at the script level, but what about the OS commands and similar things that are not in a bash script?
My Thought
Whenever we create a connection to the server, it creates a process (named SSH or whatever) that is visible in ps -fu OSUSER. So by applying a unique id/tag/name to our connection, we could identify whether a session is already active. This would be checked as soon as the bot connects to the server. But I do not know how to do that. Please also suggest any further information on it.
Also, is there a way to identify whether the existing process has hung, or when the process started and how long it has been running?
Maybe try using limits.conf to enforce a hard limit of 1 login for the user/group.
You might need a periodic cron job to check for and remove any stale logins.
Locks/mutexes are hard to get right and add complexity. Limits.conf is a standard feature of most unix/linux systems and should be more reliable, emphasis on should...
A similar question was raised here:
https://unix.stackexchange.com/questions/127077/number-of-ssh-connections-on-a-single-linux-machine
Details here:
http://linux.die.net/man/5/limits.conf
I assume you have a single login for the ssh account and that it runs a script on login.
Add something like this to the script that runs at login:
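#!/bin/bash
LOCK_FILE="/tmp/sshlock"
# Remove the lock if the session is interrupted
trap 'rm -f "$LOCK_FILE"; exit' SIGHUP SIGINT SIGTERM
# Refuse to start if a lock file exists and is less than 30 minutes old
if [ -e "$LOCK_FILE" ] && [ $(( $(date +%s) - $(stat -L --format %Y "$LOCK_FILE") )) -lt $((30*60)) ]; then
    exit 0
fi
touch "$LOCK_FILE"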
When the processes started by the ssh login finish, delete the $LOCK_FILE.
The trap statement is an important part of this way of locking; please do use it.
The "30*60" is a 30-minute timeout, thanks to the answer to the question "How can I tell if a file is older than 30 minutes from /bin/sh?"
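One caveat with the script above is that checking for the lock and then touching it are two separate steps, so two logins arriving at nearly the same moment could both pass the check. A rough sketch of one way to narrow that window is to create the lock with noclobber, so that creation itself fails when the file already exists. The function name acquire_lock and its arguments are hypothetical, and it assumes GNU stat as in the script above.

```shell
#!/bin/bash
# Sketch: take a lock file only if none exists, treating locks older than
# max_age seconds as stale. acquire_lock is an illustrative name.

acquire_lock() {
    local lock_file="$1"
    local max_age="$2"
    local age

    # Remove a stale lock left behind by a session that never cleaned up.
    if [ -e "$lock_file" ]; then
        age=$(( $(date +%s) - $(stat -L --format %Y "$lock_file") ))
        if [ "$age" -ge "$max_age" ]; then
            rm -f "$lock_file"
        fi
    fi

    # With noclobber set, the redirection fails if the file already exists,
    # so only one caller can create it. The subshell keeps the option local.
    ( set -o noclobber; : > "$lock_file" ) 2>/dev/null
}
```

In the login script this could be used as acquire_lock /tmp/sshlock $((30*60)) || exit 0, keeping the same trap to remove the file when the session ends. Removing a stale lock is still a separate step, so this reduces the race rather than eliminating it.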

Resources