Problem: We regularly hit a situation where the Ubuntu OS filesystem ends up mounted read-only. The reason is clear: /etc/fstab specifies errors=remount-ro.
Question: Is there any mechanism to reboot the appliance when it ends up in this read-only state?
Tried: I wrote the script below, which is monitored by watchdog. It works, but it reboots continuously as long as the script exits 1, i.e. while any mount point is still read-only. What I want is an uptime check: if the uptime is less than a day, the script should not trigger a reboot even if a mount point is read-only.
root@ubuntu1404:/home/ubuntu# cat /rofscheck.sh
#!/bin/bash
now=$(date)
echo "------------"
echo "start : ${now}"
up_time=$(awk '{print int($1)}' /proc/uptime)
# if uptime is less than 1 day, skip the test
if [ "$up_time" -lt 86400 ]; then
    echo "uptime is less than a day (${now}), exit due to uptime"
    exit 0
fi
# exit 0 (healthy) if no read-only mount is found
grep -q ' ro' /proc/mounts || exit 0
# alert watchdog that a filesystem is read-only
exit 1
In /etc/watchdog.conf the following is configured:
test-binary = /rofscheck.sh
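For reference, a fuller /etc/watchdog.conf might look something like this (a sketch: the device path and timings are assumptions, check your system's defaults):

watchdog-device = /dev/watchdog
interval        = 10            # seconds between test runs
test-binary     = /rofscheck.sh
test-timeout    = 60            # consider the test binary hung after 60 seconds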
To reproduce the problem and remount all mounted filesystems read-only, I ran:
$ echo u > /proc/sysrq-trigger
which performs an emergency read-only remount.
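You can then confirm the read-only state with, for example:

grep ' ro,' /proc/mounts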
This script worked for me, even though it is quite similar to yours.
I am running it on Ubuntu 14.04 64-bit with the latest updates.
#!/bin/bash
now=$(date)
up_time=$(awk '{print int($1)}' /proc/uptime)
min_time=7200
# if uptime is less than 2 hours, skip the test
if [ "${up_time}" -lt "${min_time}" ]; then
    echo "uptime is ${up_time} secs, less than ${min_time} secs (now is ${now}): exit 0 due to uptime"
    exit 0
fi
# count read-only mounts; any hit means a filesystem went read-only
ro_count=$(grep -c ' ro,' /proc/mounts)
if [ "${ro_count}" -gt 0 ]; then
    exit 1
else
    exit 0
fi
Please note that I have set up the minimum time as a variable; adjust it at your convenience.
A small but important notice:
I am using this solution on a couple of very cheap cloud servers that I own and use for testing purposes only.
I have also set up a filesystem check at every boot with the following command:
tune2fs -c 1 /dev/sda1
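tune2fs -c 1 sets the filesystem's maximum mount count to 1, so fsck checks it on every boot; you can verify the setting with:

tune2fs -l /dev/sda1 | grep -i 'mount count'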
I would never use watchdog this way in a production environment.
Hope this helps.
Best regards.
I am trying to make the script below execute a Restore binary between 17:00 and 07:00 for each folder whose name starts with EAR_* in /backup_local/ARCHIVES/, but it is not working as expected: the for loop does not break when the time condition becomes invalid.
Should I add the while loop inside the for loop?
#! /usr/bin/bash
#set -x
while :; do
    currenttime=$(date +%H:%M)
    if [[ "$currenttime" > "17:00" ]] || [[ "$currenttime" < "07:00" ]]; then
        for path in /backup_local/ARCHIVES/EAR_*; do
            [ -d "${path}" ] || continue # if not a directory, skip
            dirname="$(basename "${path}")"
            nohup /Restore -a /backup_local/ARCHIVES -c -I 0 -force -v > /backup_local/$dirname.txt &
            wait $!
            if [ $? -eq 0 ]; then
                rm -rf $path
                rm /backup_local/$dirname.txt
                echo $dirname >> /backup_local/completed.txt
            fi
        done &
    else
        echo "Restore can be ran only outside working hours!"
        break
    fi
done &
Your script looks like this in pseudo-code:
START
IF inside working hours
EXIT
ELSE
RUN /Restore FOR EACH backup dir
GOTO START
The script only checks the time once per pass, before starting a restore run (which calls /Restore for each directory in a for loop).
It keeps starting new passes of the for loop until working hours begin; then it exits.
E.g. if you have 3 folders to restore, each taking 2 hours, and you start the script at midnight: the script checks whether it's outside working hours (it is) and starts the restore of the first folder (at 0:00); after two hours of work it starts the restore of the 2nd folder (at 2:00); after another two hours, the restore of the 3rd folder (at 4:00). Once the 3rd folder has been restored, it checks the working hours again. Since it is now only 6:00, that is, still outside working hours, it starts over with the restore of the first folder (at 6:00), then the 2nd folder (at 8:00), then the 3rd folder (at 10:00).
It's noon when it does the next check against the working hours; since 12:00 falls within 07:00..17:00, the script will now stop, with an error message.
You probably only want the restore to run once for each folder, and stop proceeding to the next folder if the working hours start.
#!/bin/bash
for path in /backup_local/ARCHIVES/EAR_*/; do
    currenttime=$(date +%H:%M)
    # note the zero-padded "07:00": date +%H:%M zero-pads, and [[ > ]] compares strings
    if [[ "$currenttime" > "07:00" ]] && [[ "$currenttime" < "17:00" ]]; then
        echo "Not restoring inside working hours!" 1>&2
        break
    fi
    dirname="$(basename "${path}")"
    /Restore -a /backup_local/ARCHIVES -c -I 0 -force -v > "/backup_local/${dirname}.txt"
    # handle exit code
done
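For the # handle exit code placeholder, one option (a sketch, reusing the cleanup logic from your original script) is to replace the /Restore line with:

    if /Restore -a /backup_local/ARCHIVES -c -I 0 -force -v > "/backup_local/${dirname}.txt"; then
        rm -rf "${path}"
        rm "/backup_local/${dirname}.txt"
        echo "${dirname}" >> /backup_local/completed.txt
    fi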
Update
I've just noticed your liberal spread of & for backgrounding jobs.
This is presumably to allow running the script from a remote shell. Don't.
What this will really do is:
it will run all the iterations over the restore directories in parallel, which might create a bottleneck on your storage (if the directories to restore to/from share the same hardware)
it will background the entire restore loop and immediately return to the out-of-hours check; if the check succeeds, it will spawn another restore loop (and background it), then return to the out-of-hours check and spawn yet another backgrounded restore loop
Before dawn you will probably have a few thousand background jobs trying to restore directories. More likely, you will have exceeded your resources and the process gets killed.
My example script above has omitted all the backgrounding (and the nohup).
If you want to run the script from a remote shell (and exit the shell after launching it), just run it with
nohup restore-script.sh &
Alternatively you could use
echo "restore-script.sh" | at now
or use a cron job (if applicable).
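For the cron route, a hypothetical crontab entry (path and start time are assumptions, adjust to your window) might be:

# start the nightly restore pass just after working hours end, logging output
5 17 * * * /path/to/restore-script.sh >> /var/log/restore.log 2>&1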
The shebang contains an unwanted space. On my Ubuntu, bash is found at /bin/bash. You can check where yours is located with:
type bash
The while loop breaks in my test; replace the #!/bin/bash path with the result of the previous command:
#!/bin/bash --
#set -x
while : ; do
    currenttime=$(date +%H:%M)
    if [[ "$currenttime" > "17:00" ]] || [[ "$currenttime" < "07:00" ]]; then
        for path in /backup_local/ARCHIVES/EAR_*; do
            [ -d "${path}" ] || continue # if not a directory, skip
            dirname="$(basename "${path}")"
            nohup /Restore -a /backup_local/ARCHIVES -c -I 0 -force -v > /backup_local/$dirname.txt &
            wait $!
            if [ $? -eq 0 ]; then
                rm -rf $path
                rm /backup_local/$dirname.txt
                echo $dirname >> /backup_local/completed.txt
            fi
        done &
    else
        echo "Restore can be ran only outside working hours!"
        break
    fi
done &
I have so little experience with bash scripting that it is laughable.
I have spent 3 days transferring files from a failing HDD (1 of 3 in an LVM) on my NAS to a new HDD. Most of the files (percentage-wise) transfer fine, but many (thousands) are affected, and instead of failing with an I/O error they drop the speed down to agonizing rates.
I was using a simple cp command, but then I switched to rsync and used the --progress option, at least to be able to identify when this happens.
Currently I'm manually watching the screen (which sucks when we're talking DAYS), hitting ^C when there's a hangup, then copying the file name, pasting it into an exclude file, and restarting rsync.
I NEED to automate this!
I know nothing about bash scripting, but I figure I can probably "watch" the standard output, parse the rate info and use some logic like this:
if rate is less than 5Mbps for 3 consecutive seconds, bail and restart
This is the rsync command I'm using:
rsync -aP --ignore-existing --exclude-from=EXCLUDE /mnt/olddisk/ /mnt/newdisk/
And here is sample output from --progress:
path/to/file.ext
3,434,343,343 54% 144.61MB/s 0:00:05 (xfr#1, ir-chk=1024/1405)
So: parse the 3rd column of the 2nd line and make sure it isn't too slow; if it is, kill the command, append the file name to EXCLUDE, and give it another go.
Is that something someone can help me with?
This is a horrible approach, and I do not expect it to usefully solve your problem. However, the following is a literal answer to your question.
#!/usr/bin/env bash
[[ $1 ]] || {
    echo "Usage: rsync -P --exclude-from=exclude-file ... | $0 exclude-file" >&2
    exit 1
}

# Anything in kB/s, or below 5 MB/s, counts as "too slow".
is_too_slow() {
    local rate=$1
    case $rate in
        *kB/s) return 0 ;;
        [0-4][.]*MB/s) return 0 ;;
        *) return 1 ;;
    esac
}

exclude_file=$1
last_slow_time=0
filename=
too_slow_count=0

# rsync -P redraws its progress line using carriage returns, so read
# \r-delimited chunks and split each chunk on newlines.
while IFS=$'\n' read -r -d $'\r' -a pieces; do
    for piece in "${pieces[@]}"; do
        case $piece in
            "sending incremental file list") continue ;;
            [[:space:]]*)
                # progress line: size, percentage, rate, time remaining
                read -r size pct rate time <<<"$piece"
                if is_too_slow "$rate"; then
                    if (( last_slow_time == SECONDS )); then
                        continue # ignore multiple slow results in less than a second
                    fi
                    last_slow_time=$SECONDS
                    if (( ++too_slow_count > 3 )); then
                        echo "$filename" >>"$exclude_file"
                        exit 1
                    fi
                else
                    too_slow_count=0
                fi
                ;;
            *) filename=$piece; too_slow_count=0 ;;
        esac
    done
done
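To wire it up, here is a sketch (assuming the script above is saved as watch-rate.sh; the name is mine) that restarts rsync until a pass finishes without a slow file. When the watcher exits 1, rsync dies on a subsequent progress write because the pipe has closed:

while :; do
    rsync -aP --ignore-existing --exclude-from=EXCLUDE /mnt/olddisk/ /mnt/newdisk/ \
        | ./watch-rate.sh EXCLUDE && break
done

The pipeline's exit status is the watcher's, so && break ends the loop once the watcher reads rsync's output to the end without bailing out.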
I am writing a shell script (meant to work on Ubuntu only) that assumes a disk has previously been opened (using the command below) so it can operate on it (resize2fs, lvcreate, ...). However, this might not always be the case; when the disk is closed, the user of the script has to run this line before running the script, entering his/her passphrase:
sudo cryptsetup luksOpen /dev/sdaX sdaX_crypt
Ideally, the script should start with this command, simplifying the user's sequence. However, if the disk is indeed already open, the script will fail, because an encrypted disk cannot be opened twice.
How can I check whether the disk has already been opened? Is checking that /dev/mapper/sdaX_crypt exists a valid / sufficient solution? If not, or if that's not possible, is there a way to run the command only when necessary?
I'd also suggest lsblk, but since I came here looking for relevant info and found it, I thought I'd post the following command here as well:
# cryptsetup status <device> | grep -qi active
Cheers
You can use the lsblk command.
If the disk is already unlocked, it will display two lines: the device and the mapped device, where the mapped device should be of type crypt.
# lsblk -l -n /dev/sdaX
sdaX 253:11 0 2G 0 part
sdaX_crypt (dm-6) 253:11 0 2G 0 crypt
If the disk is not yet unlocked, it will only show the device.
# lsblk -l -n /dev/sdaX
sdaX 253:11 0 2G 0 part
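To script this check, one option (a sketch; the device name is an assumption) is to ask lsblk for just the TYPE column and look for a crypt entry:

if lsblk -ln -o TYPE /dev/sdaX | grep -q '^crypt$'; then
    echo "already unlocked"
else
    sudo cryptsetup luksOpen /dev/sdaX sdaX_crypt
fi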
Since I could not find a better solution, I went ahead and chose the "check if the device exists" one.
The encrypted disk contains a specific Volume Group (called my-vg in this example), so my working solution is:
if [ ! -b /dev/my-vg ]; then
    sudo cryptsetup luksOpen /dev/sdaX sdaX_crypt
fi
I check that /dev/my-vg exists instead of /dev/mapper/sdaX_crypt because every other command in my script takes the former as an argument, so I kept it for consistency, but I reckon that the solution below looks more encapsulated:
if [ ! -b /dev/mapper/sdaX_crypt ]; then
    sudo cryptsetup luksOpen /dev/sdaX sdaX_crypt
fi
Although the solution I described above works for me, is there a good reason to switch to the latter one, or does it not matter?
cryptsetup status volumeName
echo $? # Exit status should be 0 (success).
If you want to avoid displaying cryptsetup output, you can redirect it to /dev/null.
cryptsetup status volumeName > /dev/null
echo $? # Exit status should be 0 (success).
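A script can branch on that exit status directly, e.g. (volume and device names assumed, as in the question):

if ! cryptsetup status sdaX_crypt > /dev/null; then
    sudo cryptsetup luksOpen /dev/sdaX sdaX_crypt
fi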
This is a snippet from a script I wrote last night to take daily snapshots.
DEVICE=/dev/sdx
HEADER=/root/luks/arch/sdx-header.img
KEY_FILE=/root/luks/arch/sdx-key.bin
VOLUME=luksHome
MOUNTPOINT=/home
SUBVOLUME=#home

# Ensure encrypted device is active.
cryptsetup status "${VOLUME}" > /dev/null
IS_ACTIVE=$?
while [[ $IS_ACTIVE -ne 0 ]]; do
    printf "Volume '%s' does not seem to be active. Activate? [y/N]\n" "${VOLUME}"
    read -N 1 -r -s
    if [[ $REPLY =~ ^[Yy]$ ]]; then
        cryptsetup open --header="${HEADER}" --key-file="${KEY_FILE}" "${DEVICE}" "${VOLUME}"
        IS_ACTIVE=$?
    else
        exit 0
    fi
done
This is a follow-up to an earlier question about a terminal script for Minecraft that starts a server with 1 gigabyte of RAM and promptly begins a 30-minute loop making frequent backups of the server map.
This is the code I'm currently working with:
cd /Users/userme/Desktop/Minecraft
java -Xmx1024M -Xms1024M -jar minecraft_server.jar & bash -c 'while [ 0 ]; do cp -r /Users/userme/Desktop/Minecraft/world /Users/userme/Desktop/A ;sleep 1800;done'
Now obviously, this loop saves the backups in directory "A" under the name "world". Is there a modification I can make so that it counts the number of loops the script has made and appends that count to the backup name, for example world5 or world12? A modification that gets rid of old backups would be nice as well.
I broke it down into separate lines for better readability.
If you want to put it back into one line, you can add the ; back in where appropriate (see the one-line version after the script).
counter=1
while true
do
    # remove the previous backup, if any; cp -r creates a directory, so rm needs -r
    if [ -e /Users/userme/Desktop/A/world"$counter" ]; then
        rm -rf /Users/userme/Desktop/A/world"$counter"
    fi
    counter=$((counter+1))
    cp -r /Users/userme/Desktop/Minecraft/world /Users/userme/Desktop/A/world"$counter"
    sleep 1800
done
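Folded back into one line (a sketch following the same logic as above; the -e test is dropped because rm -rf is silent when the target does not exist), the original command could become:

cd /Users/userme/Desktop/Minecraft
java -Xmx1024M -Xms1024M -jar minecraft_server.jar & bash -c 'counter=1; while true; do rm -rf /Users/userme/Desktop/A/world"$counter"; counter=$((counter+1)); cp -r /Users/userme/Desktop/Minecraft/world /Users/userme/Desktop/A/world"$counter"; sleep 1800; done'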
I need help combining two scripts into one. There are two different ways to detect issues with a bad NFS mount: in one case, running df hangs; in the other, df works, but there are other issues with the mount that a find <mountpoint> -type d will catch.
I'm trying to combine the scripts to catch both issues: run the find ... -type d and return an error if there is a problem. If the second kind of NFS issue occurs and the find hangs, kill the find command after 2 seconds, run the second part of the script, and return an error if the NFS issue is present. If neither type of NFS issue is occurring, return an OK.
MOUNTS="egrep -v '(^#)' /etc/fstab | grep nfs | awk '{print $2}'"
MOUNT_EXCLUDE=()
if [[ -z "${NFSdir}" ]] ; then
echo "Please define a mount point to be checked"
exit 3
fi
if [[ ! -d "${NFSdir}" ]] ; then
echo "NFS CRITICAL: mount point ${NFSdir} status: stale"
exit 2
fi
cat > "/tmp/.nfs" << EOF
#!/bin/sh
cd \$1 || { exit 2; }
exit 0;
EOF
chmod +x /tmp/.nfs
for i in ${NFSdir}; do
CHECK="ps -ef | grep "/tmp/.nfs $i" | grep -v grep | wc -l"
if [ $CHECK -gt 0 ]; then
echo "NFS CRITICAL : Stale NFS mount point $i"
exit $STATE_CRITICAL;
else
echo "NFS OK : NFS mount point $i status: healthy"
exit $STATE_OK;
fi
done
The MOUNTS and MOUNT_EXCLUDE lines are immaterial to this script as shown.
You've not clearly identified where ${NFSdir} is being set.
The first part of the script assumes ${NFSdir} contains a single directory value; the second part (the loop) assumes it may contain several values. Maybe this doesn't matter since the loop unconditionally exits the script on the first iteration, but it isn't the clear, clean way to write it.
You create the script /tmp/.nfs but:
You don't execute it.
You don't delete it.
You don't allow for multiple concurrent executions of this script by making a per-process file name (such as /tmp/.nfs.$$).
It is not clear why you hide the script in the /tmp directory with the . prefix to the name. It probably isn't a good idea.
Use:
tmpcmd=${TMPDIR:-/tmp}/nfs.$$                 # per-process name avoids collisions
trap "rm -f $tmpcmd; exit 1" 0 1 2 3 13 15    # clean up on EXIT, HUP, INT, QUIT, PIPE, TERM
...rest of script - modified to use the generated script...
rm -f $tmpcmd
trap 0                                        # clear the EXIT trap after normal cleanup
This gives you the maximum chance of cleaning up the temporary script.
There is no df left in the script, whereas the question implies there should be one. You should also look into the timeout command (though commands hung because NFS is not responding are generally very difficult to kill).
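As a sketch of the timeout idea (using the ${NFSdir} variable from your script):

# probe the mount with df under a 2-second timeout; a hang or failure is critical
if ! timeout 2 df "${NFSdir}" > /dev/null 2>&1; then
    echo "NFS CRITICAL: df on ${NFSdir} hung or failed"
    exit 2
fi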