How can cloud-init be stopped on the first error? - provisioning

When I start a Linux server with cloud-init, I have a few scripts in /etc/cloud/cloud.cfg.d/, and they run in reverse alphabetical order:
# ll /etc/cloud/cloud.cfg.d/
total 28
-rw-r--r-- 1 root root 173 Dec 10 12:38 00-cloudinit-lifecycle-hook.cfg
-rw-r--r-- 1 root root 2120 Jun 1 2021 05_logging.cfg
-rw-r--r-- 1 root root 590 Oct 26 17:55 10_aws_yumvars.cfg
-rw-r--r-- 1 root root 29 Dec 1 18:22 20_amazonlinux_repo_https.cfg
-rw-r--r-- 1 root root 586 Dec 10 12:38 50-cloudinit-tomcat.cfg
-rw-r--r-- 1 root root 585 Dec 10 12:40 60-cloudinit-newrelic.cfg
The last to execute is 00-cloudinit-lifecycle-hook.cfg, in which I complete the lifecycle action for the Auto Scaling Group with a CONTINUE. The ASG fails if it doesn't receive this signal within a given timeout.
The issue is that even if there's an error in 50-cloudinit-tomcat.cfg, cloud-init still runs 00-cloudinit-lifecycle-hook.cfg instead of stopping.
How can I ensure cloud-init stops and never reaches the last script? I would like the ASG to never receive the CONTINUE signal if there's any error.
Here are the files:
EC2 instance user-data:
#cloud-config
bootcmd:
  - [cloud-init-per, once, "app-volume", mkfs, -t, "ext4", "/dev/nvme1n1"]
mounts:
  - ["/dev/nvme1n1", "/app-volume", "ext4", "defaults,nofail", "0", "0"]
merge_how:
  - name: list
    settings: [append]
  - name: dict
    settings: [no_replace, recurse_list]
50-cloudinit-tomcat.cfg
#cloud-config
merge_how:
  - name: list
    settings: [append]
  - name: dict
    settings: [no_replace, recurse_list]
runcmd:
  - "#!/bin/bash -e"
  - set +x
  - echo ' '
  - echo '# ===================================='
  - echo '# Tomcat Cloud Init '
  - echo '# /etc/cloud/cloud.cfg.d/'
  - echo '# ===================================='
  - echo ' '
  - echo '#===================================='
  - echo '# Run Ansible'
  - echo '#===================================='
  - echo ' '
  - set -x
  - ansible-playbook /opt/init-config/tomcat/tomcat-config.yaml
When I run ansible-playbook /opt/init-config/tomcat/tomcat-config.yaml directly on the instance, it fails, and I know it returns exit code 2:
ansible-playbook /opt/init-config/tomcat/tomcat-config.yaml #shows errors
echo $? # shows "2"
00-cloudinit-lifecycle-hook.cfg
#cloud-config
merge_how:
  - name: list
    settings: [append]
  - name: dict
    settings: [no_replace, recurse_list]
runcmd:
  - "/opt/lifecycles/lifecycle-hook-continue.sh"
An alternative I can think of is to send an ABANDON signal instead of CONTINUE as soon as there's an error in one of the cloud-init configs, but I can't find anything in the documentation on how to detect that an error occurred.
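The ABANDON idea can be sketched as a single wrapper that runs the provisioning steps in order, stops at the first non-zero exit code, and only then decides which result to report. This is a minimal sketch under assumptions, not a confirmed cloud-init mechanism: run_steps, provision and send_signal are hypothetical names, and send_signal only prints the result here. It may also be worth checking how runcmd is executed: as far as I know, cloud-init writes all merged runcmd entries into one script run by /bin/sh, in which case the "#!/bin/bash -e" entry would be just a comment and -e would never take effect.

```shell
#!/bin/sh
# Hypothetical wrapper: run steps in order, stop at the first failure,
# then report CONTINUE or ABANDON exactly once.

# Placeholder for the real signal. On an instance this might wrap
#   aws autoscaling complete-lifecycle-action --lifecycle-action-result "$1"
# plus the hook name, group name and instance id (not shown here).
send_signal() {
    echo "lifecycle result: $1"
}

# Run each argument as a command line; return 1 at the first non-zero exit.
run_steps() {
    for step in "$@"; do
        if ! sh -c "$step"; then
            echo "step failed: $step" >&2
            return 1
        fi
    done
}

# Report CONTINUE only if every step succeeded, otherwise ABANDON.
provision() {
    if run_steps "$@"; then
        send_signal CONTINUE
    else
        send_signal ABANDON
        return 1
    fi
}
```

With this shape, the lifecycle hook config would call something like provision "ansible-playbook /opt/init-config/tomcat/tomcat-config.yaml" instead of sending CONTINUE unconditionally.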

Related

unable to execute a bash script in k8s cronjob pod's container

My requirement is to mount a bash script from a ConfigMap into a directory inside the container and run it to clone a repo, but I get the message below:
/bin/bash: line 5: ./repo/clone.sh: No such file or directory
I cannot run the file, although I can cat it without problems. I have tried everything I can think of, with no luck so far.
cron job
spec:
  concurrencyPolicy: Allow
  jobTemplate:
    metadata:
    spec:
      template:
        metadata:
        spec:
          containers:
          - args:
            - -c
            - |
              set -x
              pwd && ls
              ls -ltr /
              cat /repo/clone.sh
              ./repo/clone.sh
              pwd
            command:
            - /bin/bash
            envFrom:
            - configMapRef:
                name: sonarscanner-configmap
            image: artifactory.build.team.com/product-containers/user/sonarqube-scanner:4.7.0.2747
            imagePullPolicy: IfNotPresent
            name: sonarqube-sonarscanner
            securityContext:
              runAsUser: 0
            volumeMounts:
            - mountPath: /repo
              name: repo-checkout
          dnsPolicy: ClusterFirst
          initContainers:
          - args:
            - -c
            - cd /
            command:
            - /bin/sh
            image: busybox
            imagePullPolicy: IfNotPresent
            name: clone-repo
            securityContext:
              privileged: true
            volumeMounts:
            - mountPath: /repo
              name: repo-checkout
              readOnly: true
          restartPolicy: OnFailure
          securityContext:
            fsGroup: 0
          volumes:
          - configMap:
              defaultMode: 420
              name: product-configmap
            name: repo-checkout
  schedule: '*/1 * * * *'
ConfigMap
kind: ConfigMap
metadata:
apiVersion: v1
data:
  clone.sh: |-
    #!bin/bash
    set -xe
    apk add git curl
    #Containers that fail to resolve repo url can use below step.
    repo_url=$(nslookup ${CODE_REPO_URL} | grep Non -A 2 | grep Name | cut -d: -f2)
    repo_ip=$(nslookup ${CODE_REPO_URL} | grep Non -A 2 | grep Address | cut -d: -f2)
    if grep ${repo_url} /etc/hosts; then
        echo "git dns entry exists locally"
    else
        echo "Adding dns entry for git inside container"
        echo ${repo_ip} ${repo_url} >> /etc/hosts
    fi
    cd / && cat /etc/hosts && pwd
    git clone "https://$RU:$RT@${CODE_REPO_URL}/r/a/${CODE_REPO_NAME}" && \
    (cd "${CODE_REPO_NAME}" && mkdir -p .git/hooks && \
    curl -Lo `git rev-parse --git-dir`/hooks/commit-msg \
    https://$RU:$RT@${CODE_REPO_URL}/r/tools/hooks/commit-msg; \
    chmod +x `git rev-parse --git-dir`/hooks/commit-msg)
    cd ${CODE_REPO_NAME}
    pwd
output pod describe
Warning FailedCreatePodSandBox 1s kubelet, node1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "sonarqube-cronjob-1670256720-fwv27": Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:303: getting the final child's pid from pipe caused \"EOF\"": unknown
pod logs
+ pwd
+ ls
/usr/src
+ ls -ltr /repo/clone.sh
lrwxrwxrwx 1 root root 15 Dec 5 16:26 /repo/clone.sh -> ..data/clone.sh
+ ls -ltr
total 60
.
drwxr-xr-x 2 root root 4096 Aug 9 08:58 sbin
drwx------ 2 root root 4096 Aug 9 08:58 root
drwxr-xr-x 2 root root 4096 Aug 9 08:58 mnt
drwxr-xr-x 5 root root 4096 Aug 9 08:58 media
drwxrwsrwx 3 root root 4096 Dec 5 16:12 repo <<<<< MY MOUNTED DIR
.
+ cat /repo/clone.sh
#!bin/bash
set -xe
apk add git curl
#Containers that fail to resolve repo url can use below step.
repo_url=$(nslookup ${CODE_REPO_URL} | grep Non -A 2 | grep Name | cut -d: -f2)
repo_ip=$(nslookup ${CODE_REPO_URL} | grep Non -A 2 | grep Address | cut -d: -f2)
if grep ${repo_url} /etc/hosts; then
echo "git dns entry exists locally"
else
echo "Adding dns entry for git inside container"
echo ${repo_ip} ${repo_url} >> /etc/hosts
fi
cd / && cat /etc/hosts && pwd
git clone "https://$RU:$RT@${CODE_REPO_URL}/r/a/${CODE_REPO_NAME}" && \
(cd "${CODE_REPO_NAME}" && mkdir -p .git/hooks && \
curl -Lo `git rev-parse --git-dir`/hooks/commit-msg \
https://$RU:$RT@${CODE_REPO_URL}/r/tools/hooks/commit-msg; \
chmod +x `git rev-parse --git-dir`/hooks/commit-msg)
cd code_dir
+ ./repo/clone.sh
/bin/bash: line 5: ./repo/clone.sh: No such file or directory
+ pwd
pwd/usr/src
The log shows that the working directory is /usr/src, not /, so the relative path ./repo/clone.sh resolves to /usr/src/repo/clone.sh, which does not exist.
If you want to source your script in the current bash process (shorthand .), you have to add a space between the dot and the path:
. /repo/clone.sh
If you want to execute it in a child process, remove the dot:
/repo/clone.sh
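The difference between the relative and the absolute path can be reproduced outside the cluster. The sketch below is only an illustration: it builds a throwaway directory tree (demo_root is a made-up name) that mimics /repo and /usr/src, and the script file is deliberately left without the execute bit, because defaultMode: 420 is octal 0644. For that reason it is run through the interpreter rather than executed directly.

```shell
#!/bin/sh
# Throwaway tree standing in for / inside the container.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/repo" "$demo_root/usr/src"

# Stand-in for the mounted script; no execute bit, like mode 0644.
printf '#!/bin/sh\necho cloned\n' > "$demo_root/repo/clone.sh"

# Relative path from usr/src looks for usr/src/repo/clone.sh, which is missing.
rel_status=0
(cd "$demo_root/usr/src" && sh ./repo/clone.sh) 2>/dev/null || rel_status=$?

# Absolute path works from the same directory, even without the execute bit,
# because the interpreter is named explicitly.
abs_output=$(cd "$demo_root/usr/src" && sh "$demo_root/repo/clone.sh")

echo "relative path exit status: $rel_status"
echo "absolute path output: $abs_output"
```

In the pod, the equivalent of the second form would be bash /repo/clone.sh, which avoids depending on both the working directory and the file mode.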

How can I find the newest directory?

When my Tcl script starts, a directory is created via a bash command. At the end of the script I want to read the name of the most recent directory, but the script finds only the second newest one, not the newest.
bind pub "-|-" !aa pub:aaa
proc pub:aaa {nick host handle channel arg} {
    set home "/home/user"
    set bb [exec bash -c "start.sh"]
    after 3000
    set latest [exec bash -c "ls -td $home/jpg/*/ | head -n1"]
    putnow "PRIVMSG $channel :$latest"
}
before starting it has the following folders in the directory:
drwxr-xr-x 2 user user 4096 Jun 24 18:30 aaa
drwxr-xr-x 2 user user 4096 Jun 24 18:14 bbb
after starting it has the following folders in the directory
drwxr-xr-x 2 user user 4096 Jun 24 18:30 aaa
drwxr-xr-x 2 user user 4096 Jun 24 18:14 bbb
drwxr-xr-x 2 user user 4096 Jun 24 18:35 ccc
The output is:
<#testbot> aaa
It should be:
<#testbot> ccc
The script only finds directories that were created while it was not running.
How can I display the newest, just-created directory?
Regards
Instead of trying to exec out to a shell to find the most recently modified directory, I'd do it in pure tcl:
proc latest_directory {path {time mtime}} {
    set dirs {}
    # Collect (timestamp, directory) pairs for every subdirectory of $path
    foreach dir [glob -nocomplain -type d $path/*] {
        file stat $dir s
        lappend dirs $s($time) $dir
    }
    if {[llength $dirs] == 0} {
        error "No directories found in $path"
    } else {
        # Sort the pairs by timestamp, newest first, and return its directory
        return [lindex [lsort -integer -decreasing -stride 2 $dirs] 1]
    }
}
# Then in pub:aaa
set latest [latest_directory $home/jpg]
As for why you're not getting ccc... hard to say for sure without seeing your start.sh script, but if it ends up running stuff in the background that continues after it exits, maybe it takes more than 3 seconds to create that directory?

Salt's cmd.run ignoring shell script return code

I am using a simple salt state to send (file.managed) and execute (cmd.run) a shell script on a minion/target. No matter what exit or return value the shell script sends, the salt master is interpreting the result as successful.
I tried using cmd.script, but keep getting a permission denied error on the temp version of the file under /tmp. Filesystem is not mounted with noexec so we can't figure out why it won't work.
For cmd.run, stdout in the job output shows the failed return code and message but Salt still says Success. Running the script locally on the minion reports the return/exit code as expected.
I tried adding stateful: True into the cmd.run block and formatted the key value pairs at the end of the shell script as demonstrated in the docs.
When I run the state against two minions, one where the script fails and one where it succeeds, both report Result as True but correctly populate Comment with my key-value pair.
I've tried YES/NO, TRUE/FALSE and 0/1; nothing works.
The end of my shell script, formatted as shown in the docs.
echo Return_Code=${STATUS}
# exit ${STATUS}
if [[ ${STATUS} -ne 0 ]]
then
    echo ""
    echo "changed=False comment='Failed'"
else
    echo ""
    echo "changed=True comment='Success'"
fi
The SLS block:
stop_oracle:
  cmd.run:
    - name: {{scriptDir}}/{{scriptName}}{{scriptArg}}
    - stateful: True
    - failhard: True
SLS output from Successful minion:
----------
ID: stop_oracle
Function: cmd.run
Name: /u01/orastage/oraclepsu/scripts/oracle_ss_wrapper.ksh stop
Result: True
Comment: Success
Started: 14:37:44.519131
Duration: 18930.344 ms
Changes:
----------
changed:
True
pid:
26195
retcode:
0
stderr:
stty: standard input: Inappropriate ioctl for device
stdout:
Script running under ROOT
Mon Jul 1 14:38:03 EDT 2019 : Successful
Return_Code=0
SLS output from Failed minion:
----------
ID: stop_oracle
Function: cmd.run
Name: /u01/orastage/oraclepsu/scripts/oracle_ss_wrapper.ksh stop
Result: True
Comment: Failed
Started: 14:07:14.153940
Duration: 38116.134 ms
Changes:
Output from shell script run locally on fail target:
[oracle@a9tvdb102]:/home/oracle:>>
/u01/orastage/oraclepsu/scripts/oracle_ss_wrapper.ksh stop
Mon Jul 1 15:29:18 EDT 2019 : There are errors in the process
Return_Code=1
changed=False comment='Failed'
Output from shell script run locally on success target:
[ /home/oracle ]
oracle@r9tvdo1004.giolab.local >
/u01/orastage/oraclepsu/scripts/oracle_ss_wrapper.ksh stop
Mon Jul 1 16:03:18 EDT 2019 : Successful
Return_Code=0
changed=True comment='Success'
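One detail in the script ending shown above: with exit ${STATUS} commented out, the script's own exit status is that of the final echo, which is 0 even when STATUS is 1. Assuming Salt derives Result from the command's exit code (an assumption here, though it would match the retcode field in the output), the stateful line alone cannot mark the state as failed. The sketch below keeps the key-value line as the last line of output and still hands STATUS back to the caller; finish is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the stateful line last, then return the status
# so the calling script can exit with it.
finish() {
    status=$1
    echo "Return_Code=${status}"
    echo ""
    if [ "$status" -ne 0 ]; then
        echo "changed=False comment='Failed'"
    else
        echo "changed=True comment='Success'"
    fi
    return "$status"
}
```

At the end of the wrapper script this would be used as finish "${STATUS}" followed by exit $?, so that nothing is printed after the key-value line and the exit code reflects the real outcome.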

Why doesn't my cron task work?

I want to write a cron task that records the ntpdate synchronization info in the system log, but no such info appears in /var/log/messages after the cron task has run. What did I do wrong?
This is what my crontab looks like:
*/1 * * * * ntpdate 192.168.100.97 | logger -t "NTP"
*/1 * * * * echo "log test" | logger -t "TEST"
*/1 * * * * whoami | logger -t "WHO"
When I run tailf /var/log/messages and wait a while, I only get the following lines; the NTP lines are missing.
Oct 29 15:22:01 localhost TEST: log test
Oct 29 15:22:01 localhost WHO: root
Oct 29 15:23:01 localhost TEST: log test
Oct 29 15:23:01 localhost WHO: root
Oct 29 15:24:01 localhost TEST: log test
Oct 29 15:24:01 localhost WHO: root
Oct 29 15:25:01 localhost TEST: log test
Oct 29 15:25:01 localhost WHO: root
Oct 29 15:26:01 localhost TEST: log test
Oct 29 15:26:01 localhost WHO: root
But when I run ntpdate 192.168.100.97 | logger -t "NTP" on the command line, the message Oct 29 15:28:39 localhost NTP: 29 Oct 15:28:39 ntpdate[11101]: adjust time server 192.168.100.97 offset 0.000043 sec does appear in the system log. What am I missing here?
Thanks in advance for your kind help.
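One possible cause, offered as an assumption rather than a diagnosis: cron usually runs jobs with a much shorter PATH than an interactive shell, so if ntpdate lives in a directory such as /usr/sbin it may not be found at all. The shell's "not found" message goes to stderr, which the pipe does not carry, so logger would receive nothing. The sketch below reproduces that effect with a made-up command name (no_such_command_xyz) and a hypothetical helper run_like_cron; the PATH value is an assumed typical default, not taken from the system in question.

```shell
#!/bin/sh
# Hypothetical helper: run a command line with an empty environment and a
# short PATH, roughly as cron might (assumed default).
run_like_cron() {
    env -i PATH=/usr/bin:/bin /bin/sh -c "$1"
}

# stderr is not part of the pipe, so the "not found" error never reaches
# the right-hand command and its output stays empty.
without_redirect=$(run_like_cron 'no_such_command_xyz | cat' 2>/dev/null)

# With 2>&1 the error text travels through the pipe like normal output.
with_redirect=$(run_like_cron 'no_such_command_xyz 2>&1 | cat')

echo "without 2>&1: '${without_redirect}'"
echo "with 2>&1:    '${with_redirect}'"
```

If this turns out to be the cause, using the full path printed by command -v ntpdate in the crontab line and adding 2>&1 before the pipe would make any failure show up in the log.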

No response from [ -f /path/to/file ]

I am trying to get my Capistrano deploy script working, but it is not doing the symlinking as it is configured to do as shown below.
set :linked_files, %w{config/database.yml}
set :linked_dirs, %w{log tmp vendor/bundle public/system}
When it runs the related command, I get the following:
WARN [SKIPPING] No Matching Host for /usr/bin/env [ -f /path/to/shared/config/database.yml ]
If I run this command on the server, either through ssh or through logging onto the server and running the command, I get no response from the command.
user: ~
$ [ -f /path/to/shared/config/database.yml ]
user: ~
$
The file does exist in the specified location and has permissions.
user: ~
$ ll /path/to/shared/config/
total 4.0K
drwxrwxr-x 2 user group 33 Nov 30 10:58 .
drwxrwxr-x 7 user group 89 Nov 30 10:58 ..
-rwxrwxr-x 1 user group 805 Nov 30 10:58 database.yml
user: ~
Shouldn't this return a true or a false, instead of nothing? Is there a configuration I may have changed that suppresses the output? I get no response at all whether the file exists or not.
In response to the actual question you ask: test (which [ is another name for) does not write anything to stdout. It returns an exit code.
user: ~
$ [ -f /path/to/shared/config/database.yml ] # if the file exists
user: ~
$ echo $?
0
user: ~
$ [ -f /path/to/shared/config/database.yml ] # if the file does not exist
user: ~
$ echo $?
1
test -f /path/to/file (or [ -f /path/to/file ]) yields an exit code of 0 if the file exists or 1 if it does not. If you want to check that a file is there and echo the path to it, try:
[ -f /path/to/file ] && echo "/path/to/file"
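Because the result is an exit code, it can also drive an if/else directly. A minimal sketch, with check_file as a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: turn the exit code of [ -f ... ] into visible output.
check_file() {
    if [ -f "$1" ]; then
        echo "exists: $1"
    else
        echo "missing: $1"
    fi
}
```

For example, check_file /path/to/shared/config/database.yml prints one of the two messages instead of returning silently.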
