Pacemaker custom resource agent does not execute script on failover - high-availability

I am implementing a test cluster with 2 nodes.
I want to execute my script when a failover occurs.
I created my resource from the Dummy RA and named it "script".
When I test it with the command "pcs resource debug-start script" it works.
But when a failover occurs, Pacemaker moves the resource to the other node and the script doesn't run as it does in the test.
I modified the Dummy RA, adding the script path as follows:
failOverScript_start() {
    failOverScript_monitor
    if [ $? = $OCF_SUCCESS ]; then
        # call the custom script before reporting a successful start
        /usr/local/bin/ScriptTest.sh
        return $OCF_SUCCESS
    fi
    # ... rest of the original Dummy start logic (creating the state file) ...
}
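For reference, a rough sketch of how such a resource can be created and a real failover forced for testing (the agent name ocf:heartbeat:failOverScript below is just an assumption based on the function name above):
# assumption: the modified Dummy RA is installed as
# /usr/lib/ocf/resource.d/heartbeat/failOverScript
pcs resource create script ocf:heartbeat:failOverScript op monitor interval=30s
# force a real failover (instead of debug-start) by moving the resource
# away from the node it currently runs on
pcs resource move script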
The script is in the right path and has all permissions. Could someone help me, please?
Thanks

Related

Databricks init scripts halting

I am trying to install Confluent Kafka on my Databricks drivers using init scripts there.
I am using the command below to write a script to DBFS:
%python
dbutils.fs.put("dbfs:/databricks/tmp/sample_n8.sh",
"""
#!/bin/bash
wget -P /dbfs/databricks/tmp/tmp1 http://packages.confluent.io/archive/1.0/confluent-1.0.1-2.10.4.zip
cd /dbfs/databricks/tmp/tmp1
unzip confluent-1.0.1-2.10.4.zip
cd confluent-1.0.1
./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties &
exit 0
""")
Then I edit my init scripts and add an entry there pointing to the above location:
[![init scripts entry adding][1]][1]
However, when I try to run my cluster it never starts; it always halts. If I go to the event log, it shows that it is stuck at 'Starting init scripts execution.'
I know there should be a tweak in my script to run it in the background, but I am already using & at the end of the start command for ZooKeeper.
Can someone give me a hint on how to resolve the above?
[1]: https://i.stack.imgur.com/CncIL.png
EDIT: I guess this question amounts to asking how I can run my script in a %sh Databricks cell in such a way that the cell finishes running the above bash script; at the moment it always tells me that the command is still running.
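As a sketch of that tweak (assuming the hang is simply the init script waiting on ZooKeeper's still-open stdout/stderr), the script body could redirect the server's output and detach it so the init script itself can exit; the paths below are the same placeholders as above:
#!/bin/bash
wget -P /dbfs/databricks/tmp/tmp1 http://packages.confluent.io/archive/1.0/confluent-1.0.1-2.10.4.zip
cd /dbfs/databricks/tmp/tmp1
unzip -o confluent-1.0.1-2.10.4.zip
cd confluent-1.0.1
# nohup + output redirection: without this, the backgrounded server keeps the
# script's stdout/stderr open and the init phase never finishes
nohup ./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties > /tmp/zookeeper.log 2>&1 &
exit 0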

LSF - automatic job rerun using sasbatch script

I am trying to create an auto-rerun mechanism by adding some code to the sasbatch script, to run after the sas command finishes. The general idea is to:
locate the log of the sas process and the id of the flow containing the current job,
check if the log contains particular ORA-xxxxx errors for which we know the solution is simply to rerun the process,
if so, trigger the jrerun class from the LSF Platform Command Line Interface,
exit sasbatch, passing $rc to LSF.
The idea was implemented as:
#define used paths
log_dir=/path/to/sas_logs_directory
out_log=/path/to/auto-rerun_log.txt
out_log2=/path/to/lsf_rerun_log.txt
if [ -n "${LSB_JOBNAME}" ]; then
    if [ ! -f "$out_log" ]; then
        touch "$out_log"
    fi
    #get flow runtime attributes
    IFS=: read -r flow_id username flow_name job_name <<< "${LSB_JOBNAME}"
    #find the log of the current process
    log_path=$(ls -t "$log_dir"/*.log | xargs grep -li "job:\s*$job_name" | grep -i "/${flow_name}_" | head -1)
    #set the path to the txt file containing the lines which represent the ORA errors we look for
    conf_path=/path/to/error_list
    #analyse the process' log line by line
    while read -r line; do
        #if an error is found in the log then try to rerun the flow
        if grep -q "$line" "$log_path"; then
            (nohup /path/to/rerun_script.sh "$flow_id" > "$out_log2" 2>&1) &
            disown
            break
        fi
    done < "$conf_path"
fi
fi
Here rerun_script.sh is the script which calls jrerun after a sleep command, in order to let the parent script exit with $rc in the meantime. It looks like:
sleep 10
/some/lsf/path/jrerun
The problem is that the job keeps running the whole time. In the LSF history I can see that jrerun was called before the job exited.
Furthermore, in $out_log2 I can see the message: <flow_id> has no starting or exit points.
Does anyone have an idea how I can pass the return code to LSF before calling jrerun? Or maybe there is a simpler way to perform an auto-rerun of SAS jobs in Platform LSF?
I am using SAS 9.4 and Platform Process Manager 9.1
Or maybe there is a simpler way to perform an auto-rerun of SAS jobs in Platform LSF?
I'm not knowledgeable about the SAS part, but on the LSF side there are at least a couple of ways to requeue the job.
If you have control of the job script, you can use a special process exit value to automatically requeue the job:
https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_admin/job_requeue_about.html
If you have control outside of the job script, you can use brequeue -r to requeue a running job.
https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_command_ref/brequeue.1.html
Good Luck
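For illustration, a rough sketch of both options (the exit value 99 and the wrapper script name are arbitrary placeholders; check the linked pages for the exact syntax on your LSF version):
# option 1: ask LSF to requeue the job automatically on a chosen exit value
bsub -Q "99" /path/to/sasbatch_wrapper.sh   # the wrapper exits 99 when an ORA-xxxxx error is found
# option 2: requeue an already running job from outside the job script
brequeue -r <job_id>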
I managed to get this working by using two additional configuration files. When my grep returns 1, I add the found flow_id to the flow_list.txt configuration file and modify a specially made trigger_file.txt.
I scheduled an additional flow, execute_rerun, in LSF which is triggered after the file trigger_file.txt is modified. The execute_rerun flow reads the flow_list.txt configuration file line by line and calls the jrerun method on each flow.
This way I managed to achieve an automatic rerun of the flows which fail due to particular errors.
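For illustration, with that approach the error branch in the original loop reduces to roughly this (both file paths are placeholders):
# record which flow needs a rerun ...
echo "$flow_id" >> /path/to/flow_list.txt
# ... and modify the trigger file that the execute_rerun flow watches
date > /path/to/trigger_file.txt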

Chef run sh script

I have a problem trying to run a shell script via Chef (with Docker provisioning).
This is how I try to execute my script:
bash 'shell_try' do
user "root"
run = "#{some_path_to_script}/my_script.sh some_params"
code " #{run} > stdout.txt 2> stderr.txt"
end
(note that this script should run another scripts, processes and write logs)
There are no errors in the output, but when I log into the machine and run ps aux, the process isn't running.
I guess something is wrong with permissions (or environment variables), because when I try the same command manually it works.
A bash resource just runs the provided script text directly. If you want to run a long-running process, you would generally set up an Upstart or systemd service and use the service resource to start it.
Finally found a solution (thanks to @coderanger):
Install supervisor:
Download supervisor cookbook
Add:
include_recipe 'supervisor::default'
Add my service to supervisor:
supervisor_service "name" do
action :enable
#action :start
command '/path/script.sh start'
end
Run supervisor service
All done!
Please see the Chef documentation for your resource: https://docs.chef.io/resource_bash.html. The bash resource does not support a run attribute. The text of the code attribute is run as a bash script. The default action is to run the script unless told otherwise by the resource.
bash 'shell_try' do
user "root"
code " #{run} > stdout.txt 2> stderr.txt"
action :run
end
The code attribute is written to a temporary file where it is then run using the attributes specified in the resource.
The line run = "#{some_path_to_script}/my_script.sh some_params" at this point does nothing.

Starting service as non root user

Can someone help us understand how to properly start our program's service as the service's user (marty, for example)?
We're using init.d to start our process (a Java application), but when the systems boot (Ubuntu and Debian), because the service script is run as root, the application starts as root too and the PID file is created by root, which messes things up.
We tried using sudo, but this is not a great solution as we don't want the sudo process running alongside our application as a child process, plus we need this to work on other systems that may not have sudo. Please help.
In the init script, you can check the $UID of the calling user.
If it is root, you can run the service with runuser. If it is marty, run it directly; if it is another user, exit with an error, for example.
Here's some example bash (untested):
start() {
    if [ $UID -eq 0 ]; then
        runuser -s /bin/bash marty -c "$DAEMON start $DAEMONOPTS"
    elif [ "$USER" = "marty" ]; then
        $DAEMON start $DAEMONOPTS
    else
        echo "Please run me as root or marty."
        exit 2
    fi
}
Same for stop and any other functions as required.
Feel free to modify the runuser command as necessary, maybe you won't need the shell for example.
Use start-stop-daemon which accepts a user name and an executable as parameters.
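For example, a minimal sketch of a start block using start-stop-daemon (paths and names are placeholders):
start() {
    # --chuid drops to the marty user, --background detaches the process,
    # --make-pidfile writes the PID file for the started process
    start-stop-daemon --start --chuid marty \
        --pidfile /var/run/myapp.pid --make-pidfile --background \
        --exec /usr/bin/java -- $DAEMONOPTS
}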

SLES HA. How to run bashscript from cluster?

I want to run a simple unmount command before starting my "file-system" service on my node.
Is there any way to call bash script as cluster service?
You can run any script as a cluster service. The script needs to be LSB compliant, i.e. it needs to handle start, stop, restart, status, etc. I usually copy something simple from /etc/init.d and modify it for myself.
Put the script in /etc/ha.d/resource.d
Test it from command line
# sh /etc/ha.d/resource.d/start (see if it unmounts)
Now, if you haven't already, create a resource group. Add all your resources into the same group, then add your new script to the resource group. You can configure constraints so that all resources depend on your first application resource running first.
That's about it. You don't actually need anything configured for stop and status except "exit", since you just want your script to run once (to unmount).
Here is a script that might work in /etc/ha.d/resource.d
#!/bin/sh
#
# description: testapp auto start-stop script.
#
. /etc/rc.status
case "$1" in
start)
umount [filesystem]
;;
stop)
;;
reload*|restart*|force-reload*)
;;
status)
;;
*)
echo "options: start|stop|reload|restart|force-reload|status"
exit 1
;;
esac
exit
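As a rough illustration of the grouping step (resource names are invented, the class for /etc/ha.d/resource.d scripts is assumed to be the legacy heartbeat class, and the exact crm shell syntax varies between SLES HA versions):
# add the script as a primitive and list it first in the group
crm configure primitive pre-unmount heartbeat:myscript
crm configure group my-group pre-unmount my-filesystem my-app
# members of a group start in the listed order and stop in reverse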
Hope that helps
I have an outline of my steps for building clusters plus an ebook here:
http://geekswing.com/geek/building-a-two-node-sles11-sp2-linux-cluster-on-vmware/
