The deployment status documentation indicates that you can compare a deployment's observedGeneration to its generation, and when observedGeneration >= generation the deployment succeeded. That's fine, but I'm interested in knowing when the new container is actually running in all of my pods, so that if I hit a service I know for sure I'm hitting a server that represents the latest deployed container.
Another tip from a K8S Slack member:
kubectl get deployments | grep <deployment-name> | sed 's/ /,/g' | cut -d ' ' -f 4
I deployed a bad image, resulting in ErrImagePull, yet the deployment still reported the correct number of 8 up-to-date replicas (available replicas was 7).
Update #2: Kubernetes 1.5 will ship with a much better version of kubectl rollout status, and it will improve even further in 1.6, possibly replacing my custom solution/script laid out below.
Update #1: I have turned my answer into a script hosted on Github which has received a small number of improving PRs by now.
Original answer:
First of all, I believe the kubectl command you got is not correct: It replaces all white spaces by commas but then tries to get the 4th field after separating by white spaces.
In order to validate that a deployment (or upgrade thereof) made it to all pods, I think you should check whether the number of available replicas matches the number of desired replicas. That is, whether the AVAILABLE and DESIRED columns in the kubectl output are equal. While you could get the number of available replicas (the 5th column) through
kubectl get deployment nginx | tail -n +2 | awk '{print $5}'
and the number of desired replicas (2nd column) through
kubectl get deployment nginx | tail -n +2 | awk '{print $2}'
a cleaner way is to use kubectl's jsonpath output, especially if you want to take the generation requirement that the official documentation mentions into account as well.
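To illustrate the column extraction, here is the same tail/awk pipeline run against a captured sample of the output rather than a live cluster (the deployment name and numbers are fabricated; the column layout matches the DESIRED-in-column-2, AVAILABLE-in-column-5 format described above):

```shell
# Fabricated sample of `kubectl get deployment nginx` output:
# header row plus one data row
sample='NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx     3         3         3            3           5d'

# Strip the header with `tail -n +2`, then pick a column with awk
desired=$(printf '%s\n' "$sample" | tail -n +2 | awk '{print $2}')
available=$(printf '%s\n' "$sample" | tail -n +2 | awk '{print $5}')
echo "desired=$desired available=$available"   # desired=3 available=3
```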
Here's a quick bash script I wrote that expects to be given the deployment name on the command line, waits for the observed generation to become the specified one, and then waits for the available replicas to reach the number of the specified ones:
#!/bin/bash

set -o errexit
set -o pipefail
set -o nounset

deployment=

get_generation() {
    get_deployment_jsonpath '{.metadata.generation}'
}

get_observed_generation() {
    get_deployment_jsonpath '{.status.observedGeneration}'
}

get_replicas() {
    get_deployment_jsonpath '{.spec.replicas}'
}

get_available_replicas() {
    get_deployment_jsonpath '{.status.availableReplicas}'
}

get_deployment_jsonpath() {
    local -r _jsonpath="$1"
    kubectl get deployment "${deployment}" -o "jsonpath=${_jsonpath}"
}

if [[ $# != 1 ]]; then
    echo "usage: $(basename "$0") <deployment>" >&2
    exit 1
fi

readonly deployment="$1"

readonly generation=$(get_generation)
echo "waiting for specified generation ${generation} to be observed"
while [[ $(get_observed_generation) -lt ${generation} ]]; do
    sleep .5
done
echo "specified generation observed."

readonly replicas="$(get_replicas)"
echo "specified replicas: ${replicas}"

available=-1
while [[ ${available} -ne ${replicas} ]]; do
    sleep .5
    available=$(get_available_replicas)
    echo "available replicas: ${available}"
done

echo "deployment complete."
Just use a rollout status:
kubectl rollout status deployment/<deployment-name>
This runs in the foreground: it waits and displays the rollout status, and exits when the rollout completes, whether with success or failure.
If you're writing a shell script, then check the return code right after the command, something like this.
kubectl rollout status deployment/<deployment-name>
if [[ "$?" -ne 0 ]]; then
    echo "deployment failed!"
    exit 1
fi
To even further automate your script:
deployment_name=$(kubectl get deployment -n <your namespace> | awk '!/NAME/{print $1}')
kubectl rollout status deployment/"${deployment_name}" -n <your namespace>
if [[ "$?" -ne 0 ]]; then
    echo "deployment failed!"
    #exit 1
else
    echo "deployment succeeded"
fi
If you're running in the default namespace, you can leave out the -n <your namespace>.
The command awk '!/NAME/{print $1}' extracts the first field (the deployment name) while ignoring the first row, which is the header (NAME READY UP-TO-DATE AVAILABLE AGE).
If you have more than one deployment, you can add a pattern to awk to select the right one, e.g. awk '!/NAME/ && /<pattern to match>/{print $1}'.
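Here is the header-skipping behavior run against a fabricated sample of the output, including the extra-pattern variant (the deployment names are made up):

```shell
# Fabricated `kubectl get deployments` output, header plus two rows
sample='NAME        READY   UP-TO-DATE   AVAILABLE   AGE
frontend    2/2     2            2           10d
backend     3/3     3            3           10d'

# Skip the header row, print the first field of every remaining line
printf '%s\n' "$sample" | awk '!/NAME/{print $1}'
# Add a pattern to narrow it down to a single deployment
printf '%s\n' "$sample" | awk '!/NAME/ && /backend/{print $1}'
```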
Related
I have a .NetCore based Lambda Project, that I want to build with AWS CodeBuild.
I have a CodeCommit Repo for the source, and I am using a Lambda to trigger build whenever there is a commit to my master branch. I do not want to use CodePipeline.
In the code build I will be doing the following -
Build
Package
Upload package to S3
Run some AWS CLI commands to update the lambda function
Now I have a couple of shell scripts that I want to execute as part of this, these scripts are working fine for me locally and that is the reason I want to use them with CodeBuild.
I am using an Ubuntu-based .NET Core image from AWS for my build, and in my CodeBuild project I have updated the build spec to do chmod +x *.sh in pre_build and made other changes to my buildspec.yml as per this thread: https://forums.aws.amazon.com/thread.jspa?messageID=760031 . I also looked at the following blog post trying to do something similar - http://openbedrock.blogspot.in/2017/03/aws-codebuild-howto.html
This is one such script that I want to execute:
#!/bin/bash
set -e

ZIPFILENAME=""

usage()
{
    echo "build and package lambda project"
    echo "./build-package.sh "
    echo "Get this help message : -h --help "
    echo "Required Parameters: "
    echo "--zip-filename=<ZIP_FILE_NAME>"
    echo ""
}

if [ $# -eq 0 ]; then
    usage
    exit 1
fi

while [ "$1" != "" ]; do
    PARAM=`echo $1 | awk -F= '{print $1}'`
    VALUE=`echo $1 | awk -F= '{print $2}'`
    case $PARAM in
        -h | --help)
            usage
            exit
            ;;
        --zip-filename)
            ZIPFILENAME=$VALUE
            ;;
        *)
            echo "ERROR: unknown parameter \"$PARAM\""
            usage
            exit 1
            ;;
    esac
    shift
done
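As an aside, the PARAM/VALUE split done with awk in the loop above can also be written with plain shell parameter expansion, which avoids two subshells per argument (a sketch; the sample argument is made up):

```shell
arg='--zip-filename=my-lambda.zip'   # hypothetical argument

PARAM=${arg%%=*}   # strip from the first '=' to the end
VALUE=${arg#*=}    # strip up to and including the first '='
echo "$PARAM"      # --zip-filename
echo "$VALUE"      # my-lambda.zip
```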
Now, I am getting an error trying to execute the shell command in CodeBuild:
Running command:
sh lambda-deployer.sh --existing-lambda=n --update-lambda=y
lamdbda-deployer.sh: 5: lambda-deployer.sh: Syntax error: "(" unexpected
When building docker images in a continuous integration environment, you quickly run out of disk space and need to remove old images. However, you can't remove all the old images, including intermediate images, because that breaks caching.
How do you avoid running out of disk space on your build agents, without breaking caching?
My solution is to remove the previous version of the image after building the new one. This ensures that cached images are available to speed up the build, but avoids old images piling up and eating your disk space. This method relies on each version of the image having a unique tag.
This is my script (gist here):
#!/usr/bin/env bash

usage(){
    # ============================================================
    echo This script removes all images of the same repository and
    echo older than the provided image from the docker instance.
    echo
    echo This cleans up older images, but retains layers from the
    echo provided image, which makes them available for caching.
    echo
    echo Usage:
    echo
    echo '$ ./delete-images-before.sh <image-name>:<tag>'
    exit 1
    # ============================================================
}

[[ $# -ne 1 ]] && usage

IMAGE=$(echo $1 | awk -F: '{ print $1 }')
TAG=$(echo $1 | awk -F: '{ print $2 }')

FOUND=$(docker images --format '{{.Repository}}:{{.Tag}}' | grep ${IMAGE}:${TAG})
if ! [[ ${FOUND} ]]
then
    echo The image ${IMAGE}:${TAG} does not exist
    exit 2
fi

docker images --filter before=${IMAGE}:${TAG} \
    | grep ${IMAGE} \
    | awk '{ print $3 }' \
    | xargs --no-run-if-empty \
        docker --log-level=warn rmi --force || true
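The IMAGE/TAG split at the top of the script can likewise be done with parameter expansion instead of awk subshells (a sketch; the image name is made up):

```shell
ref='myapp:1.4.2'       # hypothetical <image-name>:<tag> argument

IMAGE=${ref%%:*}        # everything before the first ':'
TAG=${ref##*:}          # everything after the last ':'
echo "${IMAGE} ${TAG}"  # myapp 1.4.2
```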
A tool we use to handle this is docker custodian (dcgc).
The suggested approach is to keep a list of images that you want to retain and never clean up, and pass it to --exclude-image. (If you're using Puppet or some other resource-management system, it may be more useful to write a file to disk containing the image patterns and use --exclude-image-file instead.)
I have been having problems with Magento and cron jobs not running. It seems that certain parameters with cron.sh are not allowed by my hosting company (ps being one of them) therefore the shell script failed before the cron job was run. As my cron in cpanel declares the full path I am wondering if I can remove certain lines from cron.sh eg.
#!/bin/sh
# location of the php binary
if [ ! "$1" = "" ] ; then
CRONSCRIPT=$1
else
CRONSCRIPT=cron.php
fi
MODE=""
if [ ! "$2" = "" ] ; then
MODE=" $2"
fi
PHP_BIN=`which php`
# absolute path to magento installation
INSTALLDIR=`echo $0 | sed 's/cron\.sh//g'`
# prepend the intallation path if not given an absolute path
# if [ "$INSTALLDIR" != "" -a "`expr index $CRONSCRIPT /`" != "1" ];then
# if ! ps auxwww | grep "$INSTALLDIR$CRONSCRIPT$MODE" | grep -v grep 1>/dev/null 2>/dev/null ; then
# $PHP_BIN $INSTALLDIR$CRONSCRIPT$MODE &
# fi
#else
# if ! ps auxwww | grep "$CRONSCRIPT$MODE" | grep -v grep | grep -v cron.sh 1>/dev/null 2>/dev/null ; then
$PHP_BIN $CRONSCRIPT$MODE &
# fi
#fi
Does anyone know if this will work and are there any drawbacks/consequences?
Without having particular knowledge of this functionality - it looks like it could be potentially trying to avoid running the cron script again while it's already running. Perhaps the same could be done with a lock file - but this is one area of Magento I wouldn't muck around with without a lot of research.
This is orthogonal to a larger issue, however. Magento is more picky with hosting than the average PHP codebase, and this is probably just the beginning of issues you will have with your host. I strongly recommend considering a host that is very familiar with Magentos needs. If commenting out chunks of Magento core code becomes the norm - you will run into many more issues down the line.
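For reference, the lock-file idea mentioned above could be sketched with flock from util-linux; the helper name and lock path are made up, and this is not a Magento-blessed approach:

```shell
# Run a command only if no other instance currently holds the lock
run_once() {
    lock="$1"; shift
    (
        flock -n 9 || exit 1   # another instance is running; bail out
        "$@"
    ) 9>"$lock"
}

# hypothetical usage in place of the ps check in cron.sh:
#   run_once /tmp/magento-cron.lock php cron.php
run_once "/tmp/demo.lock.$$" echo "got the lock"
rm -f "/tmp/demo.lock.$$"
```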
I need help combining two scripts into one. There are two different ways to detect issues with a bad NFS mount. One: if there is an issue, a df will hang. The other: df works, but there are other issues with the mount that a find <mount name> -type d will catch.
I'm trying to combine the scripts to catch both issues: run the find -type d and, if there is an issue, return an error. If the find hangs because of the second kind of NFS issue, kill it after 2 seconds, then run the second part of the script and return an error if the NFS issue is occurring. If neither type of NFS issue is occurring, return an OK.
MOUNTS="egrep -v '(^#)' /etc/fstab | grep nfs | awk '{print $2}'"
MOUNT_EXCLUDE=()
if [[ -z "${NFSdir}" ]] ; then
    echo "Please define a mount point to be checked"
    exit 3
fi

if [[ ! -d "${NFSdir}" ]] ; then
    echo "NFS CRITICAL: mount point ${NFSdir} status: stale"
    exit 2
fi

cat > "/tmp/.nfs" << EOF
#!/bin/sh
cd \$1 || { exit 2; }
exit 0;
EOF

chmod +x /tmp/.nfs

for i in ${NFSdir}; do
    CHECK="ps -ef | grep "/tmp/.nfs $i" | grep -v grep | wc -l"
    if [ $CHECK -gt 0 ]; then
        echo "NFS CRITICAL : Stale NFS mount point $i"
        exit $STATE_CRITICAL;
    else
        echo "NFS OK : NFS mount point $i status: healthy"
        exit $STATE_OK;
    fi
done
The MOUNTS and MOUNT_EXCLUDE lines are immaterial to this script as shown.
You've not clearly identified where ${NFSdir} is being set.
The first part of the script assumes ${NFSdir} contains a single directory value; the second part (the loop) assumes it may contain several values. Maybe this doesn't matter since the loop unconditionally exits the script on the first iteration, but it isn't the clear, clean way to write it.
You create the script /tmp/.nfs but:
You don't execute it.
You don't delete it.
You don't allow for multiple concurrent executions of this script by making a per-process file name (such as /tmp/.nfs.$$).
It is not clear why you hide the script in the /tmp directory with the . prefix to the name. It probably isn't a good idea.
Use:
tmpcmd=${TMPDIR:-/tmp}/nfs.$$
trap "rm -f $tmpcmd; exit 1" 0 1 2 3 13 15
...rest of script - modified to use the generated script...
rm -f $tmpcmd
trap 0
This gives you the maximum chance of cleaning up the temporary script.
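On systems with mktemp, an equivalent sketch of the same cleanup pattern (the nfs.XXXXXX name is just an example):

```shell
# Create a unique temp script and remove it on any exit
tmpcmd=$(mktemp "${TMPDIR:-/tmp}/nfs.XXXXXX")
trap 'rm -f "$tmpcmd"' EXIT HUP INT QUIT TERM

# Generate the checker script: exit 2 if the cd fails
printf '#!/bin/sh\ncd "$1" || exit 2\nexit 0\n' > "$tmpcmd"
chmod +x "$tmpcmd"
# ...rest of script uses "$tmpcmd" instead of /tmp/.nfs...
```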
There is no df left in the script, whereas the question implies there should be one. You should also look into the timeout command (though commands hung because NFS is not responding are generally very difficult to kill).
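A sketch of the timeout approach (the helper function is my own invention; timeout exits with status 124 when it had to kill the command):

```shell
check_df() {
    # Give df two seconds; a hung NFS mount will trip the timeout
    if timeout 2 df "$1" >/dev/null 2>&1; then
        echo "NFS OK: $1"
    else
        echo "NFS CRITICAL: df on $1 hung or failed"
        return 2
    fi
}

check_df /    # a healthy local filesystem, for demonstration
```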
I'm working with App Engine and I'm thinking about using the LESS CSS extension in my next project. There's no good LESS CSS library written in Python so I went on with the original Ruby one which works great and out of the box. I'd like App Engine to execute lessc ./templates/css/style.less before running the development server and before uploading the files to the cloud. What is the best way to automate this? I'm thinking:
#run.sh:
lessc ./templates/css/style.less
.gae/dev_appserver.py --use_sqlite .
And
#deploy.sh
lessc ./templates/css/style.less
.gae/appcfg.py update .
Am I on the correct path or is there a more elegant way of doing things, perhaps at the appcfg.py level?
Thanks.
One option is to use the JavaScript version of Less and do the less-to-css conversion in the browser: simply upload your less-formatted file (see http://lesscss.org/ for details).
Alternately, I do the conversion (first with less, now I use sass) in a deploy script which does a number of things
checks that my source code control has no outstanding files checked out (uncommited changes)
joins and minifies my .js code (and runs jslint over it) into a single file
generates other content (including stamping the source code control version as a version number into certain key files, and as a parameter on some files to avoid caching issues), so my main page pulls in scripts with URLs such as "allmysource.js?v=585"; the file might be static, but the added params force cache invalidation
calls appcfg to perform the upload and checks the return code
makes some calls to the real site with wget to check the previously generated files are actually returned, by checking they're stamped with the expected version
applies another source code control tag to say that the intended version was successfully deployed
My script also accepts a "-preview" flag in which case it doesn't actually do the upload, but reports the version control comments for what's changed since the previous deployment.
me#here $ ./deploy -preview
Deployment preview...
Would deploy v596 to the production site (currently v593, previously v587)
594 Fix blah blah blah for X Y Z
595 New feature nah nah nah
596 Update help pages
This is pretty handy as a reminder of what I need to put in things like a changelog
I plan to also expand it so that I can, as part of my source code control, add any code that needs running once only when deployed (eg database schema changes) and know that it'll be automatically run when I next deploy a new version.
Here's the essence of the script, as people asked... it doesn't show my "check code, generate, join, and minify" step, as that's another script. I realise the original question was asking about that step, of course :) but you can see where you'd add the call to generate CSS etc.
#!/bin/bash

function abort () {
    echo
    echo "ERROR: $1"
    echo "$2"
    exit 99
}

function warn () {
    echo
    echo "WARNING: $1"
    echo "$2"
}
# Overrides the Gentoo eselect mechanism to force the python version the GAE scripts expect
export EPYTHON=python2.5
# names of tags used to label bzr versions
CURR_DTAG=deployed
PREV_DTAG=prevDeployed
# command line options
PREVIEW=0
IGNORE_BZR=0
# These next few vars are set to values to identify my site, insert your own values here...
APPID=your_gae_appid_here
ADMIN_EMAIL=your_admin_email_address_here
SRCDIR=directory_to_deploy
CHECK_URL=url_of_page_to_retrive_that_does_upload_initialisation
for ARG; do
    if [[ "$ARG" == "-preview" ]]; then
        echo "Deployment preview..."
        PREVIEW=1
    fi
    if [[ "$ARG" == "-force" ]]; then
        echo "Ignoring the fact some files may not be committed to bzr..."
        IGNORE_BZR=1
    fi
done
echo
# check bzr for uncommited changed
BSTATUS=`bzr status`
if [[ "$BSTATUS" != "" ]]; then
    if [[ "$IGNORE_BZR" == "0" ]]; then
        abort "There are uncommited changes - commit/revert/ignore all files before deploying" "$BSTATUS"
    else
        warn "There are uncommited changes" "$BSTATUS"
    fi
fi
# get version of numbers of last deployed etc
currver=`bzr log -l1 --line | sed -e 's/: .*//'`
lastver=`bzr log -rtag:${CURR_DTAG} --line | sed -e 's/: .*//'`
prevver=`bzr log -rtag:${PREV_DTAG} --line | sed -e 's/: .*//'`
lastlog=`bzr log -l 1 --line gae/changelog | sed -e 's/: .*//'`
RELEASE_NOTES=`bzr log --short --forward -r $lastver..$currver \
| perl -ne '$ver = $1 if /^ {0,4}(\d+) /; print " $ver $_" if ($ver and /^ {5,}\w/)' \
| grep -v "^ *$lastver "`
LOG_NOTES=`bzr log --short --forward -r $lastlog..$currver \
| perl -ne '$ver = $1 if /^ {0,4}(\d+) /; print " $ver $_" if ($ver and /^ {5,}\w/)' \
| grep -v "^ *$lastlog "`
# Crude but old habit - BUGBUGBUG is a marker in the code for things to be fixed before deployment
echo "Checking code for outstanding issues before deployment"
BUGSTATUS=`grep BUGBUGBUG js/*js`
if [[ "$BUGSTATUS" != "" ]]; then
    if [[ "$IGNORE_BZR" == "0" ]]; then
        abort "There are outstanding BUGBUGBUGs - fix them before deploying" "$BUGSTATUS"
    else
        warn "There are outstanding BUGBUGBUGs" "$BUGSTATUS"
    fi
fi
echo
echo "Deploy v$currver to the production site (currently v$lastver, previously v$prevver)"
echo "$RELEASE_NOTES"
echo
if [[ "$currver" -gt "$lastlog" && "$lastver" -ne "$lastlog" ]]; then
    echo "Changes since the changelog was last updated"
    echo "$LOG_NOTES"
    echo
fi

if [[ "$IGNORE_BZR" == "0" && $lastver -ge $currver ]]; then
    abort "There don't appear to be any changes to deploy..."
fi

if [[ "$PREVIEW" == "1" ]]; then
    exit 0
fi
$EPYTHON -c "import ssl" \
|| abort "$EPYTHON can't find ssl module for $EPYTHON - download it from pypi and install with the inbuilt setup.py"
# REMOVED - call to my script that calls jslint, generates files and compresses JS etc
# || abort "Generation of code failed"
/opt/google_appengine/appcfg.py --email=$ADMIN_EMAIL -v -A $APPID update $SRCDIR \
|| abort "Appcfg failed - upload presumably incomplete"
# move the tags to show we deployed properly
bzr tag -r $lastver --force ${PREV_DTAG}
bzr tag -r $currver --force ${CURR_DTAG}
echo
echo "Production site updated from v$lastver to v$currver (in turn from v$prevver)"
echo
echo "Now visiting $CHECK_URL to upload the source to the database"
# new version doesn't seem to always be there (may be caching by the webserver etc) to be uploaded into the database.. try again just in case
for cb in $RANDOM $RANDOM $RANDOM $RANDOM ; do
    prodver=`wget $CHECK_URL?_cb=$cb -q -O - | perl -ne 'print $1 if /^\s*Rev #(\d+)\s*$/'`
    if [[ "$currver" == "$prodver" ]]; then
        echo "OK: New version $prodver successfully deployed"
        exit 0
    fi
    echo "Retrying the upload of source to the database"
    sleep 5
done
abort "The new source doesn't seem to be loading into the database" "Try 'wget $CHECK_URL?_cb=$RANDOM -q -O -'"
It's not particularly big or clever, but it automates the upload job.