“for” loop to get status of the service in bash script - bash

I have a bash script that starts a service and then checks its status. After the service starts, the application takes about 15 seconds to change the status from "No" to "Yes", so my script reports "No" because it checks immediately after starting the service. I want a "for" loop that checks the status every second, breaks as soon as the status is "Yes", and gives up after 15 seconds.
app_start=`/usr/bin/AppStart.py`
app_status=`/usr/bin/AppStatus.py | grep -oPm1 "(?<=<Status>)[^<]+"`
if [[ $app_status = "Yes" ]] ; then
echo "Yes"
else
echo "No"
fi
The code above checks the status immediately after the service is started.

Let me give you some pseudo-code for solving your issue:
boolean bFinish = false
integer iSeconds = 0
while (NOT(bFinish) AND (iSeconds < 15)) {
bFinish = check_if_app_status_is_Yes(...)
iSeconds = iSeconds + 1
sleep 1
}
For your information: sleep <x> is the correct UNIX/Linux command for pausing the execution for <x> seconds.
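The pseudo-code above could be sketched in bash roughly as follows. This is only a sketch under assumptions: wait_for_yes, fake_status and POLL_INTERVAL are illustrative names that do not appear in the question, and fake_status is a stand-in for the real AppStatus.py pipeline so the example can run anywhere.

```bash
#!/usr/bin/env bash
# wait_for_yes ATTEMPTS CMD...: run CMD up to ATTEMPTS times, pausing between
# attempts, and return 0 as soon as CMD prints "Yes"; return 1 otherwise.
wait_for_yes() {
    local attempts=$1; shift
    local i status
    for ((i = 0; i < attempts; i++)); do
        status=$("$@")
        if [[ $status == "Yes" ]]; then
            return 0
        fi
        sleep "${POLL_INTERVAL:-1}"   # defaults to one second between checks
    done
    return 1
}

# Stand-in for the real status check (assumption, for demonstration only):
# prints "No" on the first two calls and "Yes" afterwards. The call count is
# kept in a temp file because command substitution runs in a subshell.
calls_file=$(mktemp)
echo 0 > "$calls_file"
fake_status() {
    local n
    n=$(<"$calls_file")
    echo $((n + 1)) > "$calls_file"
    if ((n >= 2)); then echo "Yes"; else echo "No"; fi
}

POLL_INTERVAL=0.1   # shortened for the demo; leave unset for one-second polling
if wait_for_yes 15 fake_status; then
    result="Yes"
else
    result="No"
fi
echo "$result"
rm -f "$calls_file"
```

In the real script, the stand-in would be replaced by a small function that wraps the /usr/bin/AppStatus.py | grep pipeline from the question, since a pipeline cannot be passed directly as arguments.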

Related

Stop a bash script when awk command returns with failure

I have the following command
ads2 cls create
This command might return two outputs, a reasonable one that looks like:
kernel with pid 7148 (port 9011) killed
kernel with pid 9360 (port 9011) killed
probing service daemon # http://fdt-c-vm-0093.fdtech.intern:9010
starting kernel FDT-C-VM-0093 # http://fdt-c-yy-0093.ssbt.intern:9011 name=FDT-C-VM-0093 max_consec_timeouts=10 clustermode=Standard hostname=FDT-C-VM-0093 framerate=20000 schedmode=Standard rtaddr=fdt-c-vm-0093.fdtech.ssbt tickrole=Local tickmaster=local max_total_timeouts=1000
kernel FDT-C-VM-0093 running
probing service daemon # http://172.16.xx.xx:9010
starting kernel FDT-C-AGX-0004 # http://172.16.xx.xx:9011 name=FDT-C-AGX-0004 max_consec_timeouts=10 clustermode=Standard hostname=FDT-C-AGX-0004 framerate=20000 schedmode=Standard rtaddr=172.16.xx.xx tickrole=Local tickmaster=local max_total_timeouts=1000
kernel Fxx-x-xxx-xxx4 running
>>> start cluster establish ...
>>> cluster established ...
nodes {
node {
name = "FDT-C-VM-xxxx";
address = "http://fxx-x-xx-0093.xxx.intern:xxxx/";
state = "3";
}
node {
name = "xxx-x-xxx-xxx";
address = "http://1xx.16.xx.xx:9011/";
state = "3";
}
}
and an unreasonable one that would be:
kernel with pid 8588 (port 9011) killed
failed to probe service daemon # http://xxx-c-agx-0002.xxxx.intern:90xx
In both cases, I pass this output to awk to check the state of the nodes when a reasonable output is returned; otherwise it should exit the whole script (line 28).
ads2 cls create | awk -F [\"] ' BEGIN{code=1} # Set the field delimiter to a double quote
/^>>> cluster established .../ {
strt=1 # If the line starts with ">>> cluster established ...", set a variable strt to 1
}
strt!=1 {
next # If strt is not equal to 1, skip to the next line
}
$1 ~ "name" {
cnt++; # If the first field contains name, increment a cnt variable
nam[cnt]=$2 # Use the cnt variable as the index of an array called nam with the second field the value
}
$1 ~ "state" {
stat[cnt]=$2; # When the first field contains "state", set up another array called stat
print "Node "nam[cnt]" has state "$2 # Print the node name as well as the state
}
END {
if (stat[1]=="3" && stat[2]=="3") {
print "\033[32m" "Success" "\033[37m" # At the end of processing, the array is used to determine whether there is a success or failure.
}
28 else {
29 print "\033[31m" "Failed. Check Nodes in devices.dev file" "\033[37m"
30 exit code
}
}'
some other commands...
Note that this code block is a part of a bash script.
All I'm trying to do is stop the whole script (the commands that follow) from continuing when execution reaches line 29, where the exit code statement (with code set to 1) should do the job. However, it's not working. It does print the statement Failed. Check Nodes in devices.dev file, but it then continues executing the next commands, although I expected the script to stop because the exit command on line 30 should also have been executed.
I suspect your subject, "Stop a bash script from inside an awk command", is what's getting you downvotes. Controlling what the shell that called awk does from inside the awk script is something you can't and shouldn't try to do. It would be a bad case of Inversion Of Control, like calling a function in C to do something and having that function exit the whole program instead of just returning a failure status so the calling code can decide what to do upon that failure (e.g. perform recovery actions and then call that function again).
You seem to be confusing exiting your awk script with exiting your shell script. If you want to exit your shell script when the awk script exits with a failure status then you need to write the shell code to tell the shell to do so, e.g.:
whatever | awk 'script' || exit 1
or to get fancy about it:
whatever | awk 'script' || { ret="$?"; printf 'awk exited with status %d\n' "$ret" >&2; exit "$ret"; }
For example:
$ cat tst.sh
#!/usr/bin/env bash
date | awk '{exit 1}' || { ret="$?"; printf 'awk exited with status %d\n' "$ret" >&2; exit 1; }
echo "we should not get here"
$ ./tst.sh
awk exited with status 1
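Applied to the node check from the question, the same idea might look like the sketch below. It is a simplified illustration, not the asker's script: check_nodes is a hypothetical helper name, the awk program is reduced to the state test, and the sample inputs are trimmed-down versions of the two outputs quoted in the question.

```bash
#!/usr/bin/env bash
# check_nodes reads cluster output on stdin. awk exits with status 1 when no
# node state was seen or when any state differs from "3"; that status becomes
# the function's (and the pipeline's) exit status.
check_nodes() {
    awk -F'"' '
        $1 ~ /state/ { seen++; if ($2 != "3") bad = 1 }
        END          { if (seen == 0 || bad) exit 1 }
    '
}

good_output='nodes {
node {
name = "A";
state = "3";
}
node {
name = "B";
state = "3";
}
}'
bad_output='failed to probe service daemon'

printf '%s\n' "$good_output" | check_nodes; good_rc=$?
printf '%s\n' "$bad_output"  | check_nodes; bad_rc=$?
echo "good=$good_rc bad=$bad_rc"
```

In the actual script the call would then be something like ads2 cls create | check_nodes || exit 1, so that the shell, not awk, decides to stop.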

How to make my currently running Autosys Job to FA [Failure] , when command script fails

I have a script which is in Autosys Job : JOB_ABC_S1
command : /ABC/script.sh
script.sh code:
grep -w "ABC" /d/file1.txt
status=$?
if [ $status -eq 0 ]
then
echo "Passed"
else
echo "Failed"
exit 1
fi
My issue is that whether the script fails or passes, the AutoSys job is marked as SU (SUCCESS).
I don't want it marked as success if the script fails: it should be marked FA if the script fails and SU (SUCCESS) if the script passes.
What should I change in the script to make that happen?
Job :
insert_job : JOB_ABC_S1
machine : XXXXXXXXXXX
owner : XXXXXXXX
box_name : BOX_ABC_S1
application : XXXX
permission : XXXXXXXXXXX
max_run_alarm : 60
alarm_if_fails : y
send_notification : n
std_out_file : XXXXX
std_err_file : XXXXX
command : sh /ABC/script.sh
At first glance everything seems fine.
However, I would suggest a script modification that you can try out.
By default, Autosys fails a job if its exit code is non-zero, unless specified otherwise.
The job JIL seems to be fine.
Please update your script as below and check two things:
Executed job EXIT-CODE: it should be either 1 or 2, because we are deliberately failing the job in both cases.
Std log files
Script:
#!/bin/sh
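# Count the lines that contain the whole word ABC
srch_count=$(grep -cw ABC /d/file1.txt)
if [ "$srch_count" -gt 0 ]; then
echo "Passed"
#exit 0
# Deliberately non-zero for this test, to confirm Autosys sees the exit code
exit 2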
else
echo "Failed"
exit 1
fi
This way we can confirm if the exit code is correctly being captured by Autosys.
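Before involving Autosys at all, it may help to confirm locally that the check really produces the exit codes you expect. The sketch below is an assumption-laden stand-in: check is a hypothetical function that mirrors the logic of the question's script, and it runs against a temporary file rather than the real /d/file1.txt.

```bash
#!/usr/bin/env bash
# Work on a throwaway file so the real /d/file1.txt is not needed.
workdir=$(mktemp -d)
printf 'XYZ\n' > "$workdir/file1.txt"

# Same logic as the question's script: succeed only when the word is present.
check() {
    if grep -qw "ABC" "$1"; then
        echo "Passed"
        return 0
    else
        echo "Failed"
        return 1
    fi
}

check "$workdir/file1.txt"; missing_rc=$?     # word absent
printf 'ABC\n' >> "$workdir/file1.txt"
check "$workdir/file1.txt"; present_rc=$?     # word present
echo "missing=$missing_rc present=$present_rc"
rm -rf "$workdir"
```

For the real script, the equivalent manual check is to run sh /ABC/script.sh followed by echo $? on the job's machine.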

Tcsh Script Last Exit Code ($?) value is resetting

I am running the following script using tcsh. In my while loop, I'm running a C++ program that I created, which returns a different exit code depending on certain things. As long as it returns an exit code of 0, I want the script to increment counter and run the program again.
#!/bin/tcsh
echo "Starting the script."
set counter = 0
while ($? == 0)
@ counter ++
./auto $counter
end
I have verified that my program is definitely returning with exit code = 1 after a certain point. However, the condition in the while loop keeps evaluating to true for some reason and running.
I found that if I stick the following line at the end of my loop and then replace the condition check in the while loop with this new variable, it works fine.
while ($return_code == 0)
@ counter ++
./auto $counter
set return_code = $?
end
Why is it that I can't just use $? directly? Is another operation underneath the hood performed in between running my custom program and checking the loop condition that's causing $? to change value?
That is peculiar.
I've altered your example to something that I think illustrates the issue more clearly. (Note that $? is an alias for $status.)
#!/bin/tcsh -f
foreach i (1 2 3)
false
# echo false status=$status
end
echo Done status=$status
The output is
Done status=0
If I uncomment the echo command in the loop, the output is:
false status=1
false status=1
false status=1
Done status=0
(Of course the echo in the loop would break the logic anyway, because the echo command completes successfully and sets $status to zero.)
I think what's happening is that the end that terminates the loop is executed as a statement, and it sets $status ($?) to 0.
I see the same behavior with both tcsh and bsd-csh.
Saving the value of $status in another variable immediately after the command is a good workaround -- and arguably just a better way of doing it, since $status is extremely fragile, and will almost literally be clobbered if you look at it.
Note that I've added a -f option to the #! line. This prevents tcsh from sourcing your init file(s) (.cshrc or .tcshrc) and is considered good practice. (That's not the case for sh/bash/ksh/zsh, which assign a completely different meaning to -f.)
A digression: I used tcsh regularly for many years, both as my interactive login shell and for scripting. I would not have anticipated that end would set $status. This is not the first time I've had to find out how tcsh or csh behaves by trial and error and been surprised by the result. It is one of the reasons I switched to bash for interactive and scripting use. I won't tell you to do the same, but you might want to read Tom Christiansen's classic "csh.whynot".
Slightly shorter/simpler explanation:
Recall that in tcsh/csh every command (including shell builtins) returns a status. Therefore $? (an alias for $status) is updated by 'if' statements, loops, assignments, and so on.
From a practical point of view, it is better to limit direct use of $? to an if statement immediately after the command:
do-something
if ( $status == 0 ) then
...
endif
In all other cases, capture the status in a variable and use only that variable:
do-something
set something_status = $status
if ( $something_status == 0 ) then
...
endif
To expand on $status: even a condition test in an if statement modifies the status. Therefore the following repeated test on $status will never hit the '$status == 5' branch, even when do-something returns a status of 5:
do-something
if ( $status == 2 ) then
echo FOO
else if ( $status == 5 ) then
echo BAR
endif
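The same trap exists in bash, which makes it easy to try out. In the sketch below (bash rather than tcsh, with a hypothetical returns_five function standing in for do-something), the first test overwrites $? before the second test reads it, while the saved copy stays intact.

```bash
#!/usr/bin/env bash
returns_five() { return 5; }   # hypothetical stand-in for do-something

# Fragile: the failed [[ $? -eq 2 ]] test sets $? to 1, so the elif
# compares 1 with 5 instead of the command's real status.
returns_five
if [[ $? -eq 2 ]]; then
    fragile="FOO"
elif [[ $? -eq 5 ]]; then
    fragile="BAR"
else
    fragile="neither"
fi

# Robust: save the status once, then test the saved copy.
returns_five
rc=$?
if [[ $rc -eq 2 ]]; then
    robust="FOO"
elif [[ $rc -eq 5 ]]; then
    robust="BAR"
else
    robust="neither"
fi

echo "fragile=$fragile robust=$robust"
```

Running it prints fragile=neither robust=BAR.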

wait not working in shell script?

I am running a for loop in which a command is run in the background using &. At the end I want the return values of all the commands.
Here is the code I tried:
for((i=0 ;i<3;i++)) {
// curl command which returns a value &
}
wait
// next piece of code
I want to get all three returned values and then proceed. But the wait command does not wait for the background processes to complete and runs the next part of the code. I need the returned values to proceed.
Shell builtins have documentation accessible with help BUILTIN_NAME.
help wait yields:
wait: wait [-n] [id ...]
Wait for job completion and return exit status.
Waits for each process identified by an ID, which may be a process ID or a
job specification, and reports its termination status. If ID is not
given, waits for all currently active child processes, and the return
status is zero. If ID is a a job specification, waits for all processes
in that job's pipeline.
If the -n option is supplied, waits for the next job to terminate and
returns its exit status.
Exit Status:
Returns the status of the last ID; fails if ID is invalid or an invalid
option is given.
which implies that to get the return statuses, you need to save the pid and then wait on each pid, using wait $THE_PID.
Example:
sl() { sleep $1; echo $1; return $(($1+42)); }
pids=(); for((i=0;i<3;i++)); do sl $i & pids+=($!); done;
for pid in "${pids[@]}"; do wait "$pid"; echo ret=$?; done
Example output:
0
ret=42
1
ret=43
2
ret=44
Edit:
With curl, don't forget to pass -f (--fail) to make sure the process will fail if the HTTP request did:
CURL Example:
#!/bin/bash
URIs=(
https://pastebin.com/raw/w36QWU3D
https://pastebin.com/raw/NONEXISTENT
https://pastebin.com/raw/M9znaBB2
)
pids=(); for((i=0;i<3;i++)); do
curl -fL "${URIs[$i]}" &>/dev/null &
pids+=($!)
done
for pid in "${pids[@]}"; do
wait $pid
echo ret=$?
done
CURL Example output:
ret=0
ret=22
ret=0
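If the "returned values" you need are what each command prints, rather than its exit status, one possible approach is to send each background job's output to its own temporary file and read the files back after waiting. This is a sketch under assumptions: job is a hypothetical stand-in for the curl command, and the file naming is arbitrary.

```bash
#!/usr/bin/env bash
# job N: hypothetical stand-in for the curl call; prints a value and
# exits with status N.
job() { sleep 0.1; echo "value-$1"; return "$1"; }

tmpdir=$(mktemp -d)
pids=()
for ((i = 0; i < 3; i++)); do
    job "$i" > "$tmpdir/out.$i" &   # each job writes to its own file
    pids+=("$!")
done

results=()
statuses=()
for ((i = 0; i < 3; i++)); do
    wait "${pids[i]}"               # blocks until that job has finished
    statuses+=("$?")
    results+=("$(<"$tmpdir/out.$i")")
done
rm -rf "$tmpdir"

echo "results: ${results[*]}"
echo "statuses: ${statuses[*]}"
```

Because each job is waited on by PID, both the output and the exit status of every job are available before the next piece of code runs.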
GNU Parallel is a great way to do high-latency things like curl in parallel.
parallel curl --head {} ::: www.google.com www.hp.com www.ibm.com
Or, filtering results:
parallel curl --head -s {} ::: www.google.com www.hp.com www.ibm.com | grep '^HTTP'
HTTP/1.1 302 Found
HTTP/1.1 301 Moved Permanently
HTTP/1.1 301 Moved Permanently
Here is another example:
parallel -k 'echo -n Starting {} ...; sleep 5; echo done.' ::: 1 2 3 4
Starting 1 ...done.
Starting 2 ...done.
Starting 3 ...done.
Starting 4 ...done.

Bash's wait command does not return 0 if a child exits non-zero

It seems that bash's wait doesn't honor set -e as I would expect. Or it somehow loses track of the child process exiting with an error. Consider the example.
set -e # exit immediately on error
function child()
{
if [ $1 -eq 3 ]; then
echo "child $1 performing error"
# exit 1 ## I also tried this
false
else
echo "child $1 performing successful"
true
fi
echo "child $1 exiting normally"
}
# parent
child 1 & # succeeds
child 2 & # succeeds
child 3 & # fails
wait # why doesn't wait indicate an error?
echo "Launch nukes!" # don't want this to execute if a child failed
I want the set -e semantics, but wait doesn't seem to honor them.
A parent launches three children. One of them chokes and exits with an error (honoring the set -e). The problem is that the parent process silently blunders on as if nothing bad had happened. In other words, I want to propagate the error.
Is there a way to enable this behavior? I.e. have wait return non-zero if any child exits non-zero.
If you are using bash 4.3 or later, you can use wait -n in a loop to wait for each child in turn. You won't know which child failed, but whenever one does fail, the exit status of wait will be non-zero.
child 1 & # succeeds
child 2 & # succeeds
child 3 & # fails
for i in 1 2 3; do
wait -n
done
According to the Man page, that's what should happen (emphasis mine):
If n is not given, all currently active child processes are waited
for, and the return status is zero.
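You want the second form, where you specify the IDs explicitly, as in wait {pid1} {pid2} {...}. Note that this form returns the exit status of the last ID given, so to detect a failure in any child, wait on each PID separately and check each status.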
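A minimal sketch of waiting on each PID separately is shown below. The child function is a simplified, hypothetical version of the one in the question (only child 3 fails), and failed is an illustrative variable name.

```bash
#!/usr/bin/env bash
# Simplified stand-in for the question's child function: child 3 fails.
child() { [[ $1 -ne 3 ]]; }

pids=()
for n in 1 2 3; do
    child "$n" &
    pids+=("$!")
done

failed=0
for pid in "${pids[@]}"; do
    # The || keeps a failing wait from ending the script early under set -e,
    # so every child is still reaped before the decision is made.
    wait "$pid" || failed=1
done

if ((failed)); then
    echo "a child failed"
else
    echo "all children succeeded"
fi
```

In the question's script, the final branch would exit with a non-zero status instead of echoing, so that the later commands never run.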
