Kubernetes readiness probe fails - bash

I wrote a readiness probe for my pod using a bash script. The readiness probe fails with Reason: Unhealthy, but when I manually exec into the pod and run the command /bin/bash -c 'health=$(curl -s -o /dev/null --write-out "%{http_code}" http://localhost:8080/api/v2/ping); if [[ $health -ne 401 ]]; then exit 1; fi', the script exits with code 0.
What could be the reason? I am attaching the code and the error below.
Edit: I found out that the health variable is set to 000, which means the curl request timed out (no HTTP response was received).
readinessProbe:
  exec:
    command:
      - /bin/bash
      - '-c'
      - |-
        health=$(curl -s -o /dev/null --write-out "%{http_code}" http://localhost:8080/api/v2/ping);
        if [[ $health -ne 401 ]]; then exit 1; fi
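Side note on the 000 from the edit above: curl prints 000 when it never receives an HTTP status line at all, so checking curl's exit code from inside the container helps tell a refused connection from a timeout. A minimal diagnostic sketch, assuming bash and curl are available in the image:
# run inside the container, e.g. kubectl exec -it <pod> -- /bin/bash
code=$(curl -s -o /dev/null --max-time 3 --write-out "%{http_code}" http://localhost:8080/api/v2/ping)
rc=$?   # curl exit status: 7 = connection refused, 28 = timed out
echo "http_code=$code curl_exit=$rc"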
"kubectl describe pod {pod_name}" result:
Name:         rustici-engine-54cbc97c88-5tg8s
Namespace:    default
Priority:     0
Node:         minikube/192.168.49.2
Start Time:   Tue, 12 Jul 2022 18:39:08 +0200
Labels:       app.kubernetes.io/name=rustici-engine
              pod-template-hash=54cbc97c88
Annotations:  <none>
Status:       Running
IP:           172.17.0.5
IPs:
  IP:  172.17.0.5
Controlled By:  ReplicaSet/rustici-engine-54cbc97c88
Containers:
  rustici-engine:
    Container ID:   docker://f7efffe6fc167e52f913ec117a4d78e62b326d8f5b24bfabc1916b5f20ed887c
    Image:          batupaksoy/rustici-engine:singletenant
    Image ID:       docker-pullable://batupaksoy/rustici-engine@sha256:d3cf985c400c0351f5b5b10c4d294d48fedfd2bb2ddc7c06a20c1a85d5d1ae11
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Tue, 12 Jul 2022 18:39:12 +0200
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  350Mi
    Requests:
      memory:   350Mi
    Liveness:   exec [/bin/bash -c health=$(curl -s -o /dev/null --write-out "%{http_code}" http://localhost:8080/api/v2/ping);
                if [[ $health -ne 401 ]]; then exit 1; else exit 0; echo $health; fi] delay=10s timeout=5s period=10s #success=1 #failure=20
    Readiness:  exec [/bin/bash -c health=$(curl -s -o /dev/null --write-out "%{http_code}" http://localhost:8080/api/v2/ping);
                if [[ $health -ne 401 ]]; then exit 1; else exit 0; echo $health; fi] delay=10s timeout=5s period=10s #success=1 #failure=10
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-whb8d (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-whb8d:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age  From               Message
  ----     ------     ---- ----               -------
  Normal   Scheduled  24s  default-scheduler  Successfully assigned default/rustici-engine-54cbc97c88-5tg8s to minikube
  Normal   Pulling    23s  kubelet            Pulling image "batupaksoy/rustici-engine:singletenant"
  Normal   Pulled     21s  kubelet            Successfully pulled image "batupaksoy/rustici-engine:singletenant" in 1.775919851s
  Normal   Created    21s  kubelet            Created container rustici-engine
  Normal   Started    20s  kubelet            Started container rustici-engine
  Warning  Unhealthy  4s   kubelet            Readiness probe failed:
  Warning  Unhealthy  4s   kubelet            Liveness probe failed:

The probe could be failing because the application is under load or slow to start. To troubleshoot, make sure the probe does not start firing until the app inside the pod is actually up and serving. You may also need to increase the timeout of the readiness probe, as well as the timeout of the liveness probe, for example:
readinessProbe:
  initialDelaySeconds: 10
  periodSeconds: 2
  timeoutSeconds: 10
You can find more details about configuring readiness and liveness probes in the Kubernetes documentation.
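If the app is simply slow to start, raising initialDelaySeconds (or failureThreshold) is usually enough; it can also help to bound curl itself so a hung connection fails fast instead of being killed when timeoutSeconds expires. A sketch of the probe script body (the part after /bin/bash -c), assuming the same endpoint and the 401-means-healthy convention from the question:
# keep --max-time below the probe's timeoutSeconds so curl gives up first
health=$(curl -s -o /dev/null --max-time 4 --write-out "%{http_code}" http://localhost:8080/api/v2/ping)
# curl prints 000 and exits non-zero when no HTTP response arrives at all
if [[ "$health" -ne 401 ]]; then exit 1; fi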

Related

how do I pipe in file content as args to kubectl?

I wish to run k6 in a container with a simple JavaScript load test from the local file system.
It seems the commands below have some syntax error.
$ cat simple.js
import http from 'k6/http';
import { sleep } from 'k6';
export const options = {
  vus: 10,
  duration: '30s',
};
export default function () {
  http.get('http://100.96.1.79:8080');
  sleep(1);
}
$ kubectl run k6 --image=grafana/k6 -- run - <simple.js
# OR
$ kubectl run k6 --image=grafana/k6 run - <simple.js
In the k6 pod log, I got:
time="2023-02-16T12:12:05Z" level=error msg="could not initialize '-': could not load JS test 'file:///-': no exported functions in s
I guess this means the simple.js is not really passed to k6 this way?
thank you!
I think you can't pipe (host) files into Kubernetes containers this way.
One way that it should work is to:
Create a ConfigMap to represent your file
Apply a Pod config that mounts the ConfigMap file
NAMESPACE="..." # Or default
kubectl create configmap simple \
--from-file=${PWD}/simple.js \
--namespace=${NAMESPACE}
kubectl get configmap/simple \
--output=yaml \
--namespace=${NAMESPACE}
Yields:
apiVersion: v1
kind: ConfigMap
metadata:
  name: simple
data:
  simple.js: |
    import http from 'k6/http';
    import { sleep } from 'k6';
    export default function () {
      http.get('http://test.k6.io');
      sleep(1);
    }
NOTE You could just create e.g. configmap.yaml with the above YAML content and apply it.
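For example (assuming the file is saved as configmap.yaml in the current directory):
kubectl apply \
--filename=${PWD}/configmap.yaml \
--namespace=${NAMESPACE}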
Then with pod.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: simple
spec:
  containers:
    - name: simple
      image: docker.io/grafana/k6
      args:
        - run
        - /m/simple.js
      volumeMounts:
        - name: simple
          mountPath: /m
  volumes:
    - name: simple
      configMap:
        name: simple
Apply it:
kubectl apply \
--filename=${PWD}/pod.yaml \
--namespace=${NAMESPACE}
Then, finally:
kubectl logs pod/simple \
--namespace=${NAMESPACE}
Yields:
(k6 ASCII-art banner)
execution: local
script: /m/simple.js
output: -
scenarios: (100.00%) 1 scenario, 1 max VUs, 10m30s max duration (incl. graceful stop):
* default: 1 iterations for each of 1 VUs (maxDuration: 10m0s, gracefulStop: 30s)
running (00m01.0s), 1/1 VUs, 0 complete and 0 interrupted iterations
default [ 0% ] 1 VUs 00m01.0s/10m0s 0/1 iters, 1 per VU
running (00m01.4s), 0/1 VUs, 1 complete and 0 interrupted iterations
default ✓ [ 100% ] 1 VUs 00m01.4s/10m0s 1/1 iters, 1 per VU
data_received..................: 17 kB 12 kB/s
data_sent......................: 542 B 378 B/s
http_req_blocked...............: avg=128.38ms min=81.34ms med=128.38ms max=175.42ms p(90)=166.01ms p(95)=170.72ms
http_req_connecting............: avg=83.12ms min=79.98ms med=83.12ms max=86.27ms p(90)=85.64ms p(95)=85.95ms
http_req_duration..............: avg=88.61ms min=81.28ms med=88.61ms max=95.94ms p(90)=94.47ms p(95)=95.2ms
{ expected_response:true }...: avg=88.61ms min=81.28ms med=88.61ms max=95.94ms p(90)=94.47ms p(95)=95.2ms
http_req_failed................: 0.00% ✓ 0 ✗ 2
http_req_receiving.............: avg=102.59µs min=67.99µs med=102.59µs max=137.19µs p(90)=130.27µs p(95)=133.73µs
http_req_sending...............: avg=67.76µs min=40.46µs med=67.76µs max=95.05µs p(90)=89.6µs p(95)=92.32µs
http_req_tls_handshaking.......: avg=44.54ms min=0s med=44.54ms max=89.08ms p(90)=80.17ms p(95)=84.62ms
http_req_waiting...............: avg=88.44ms min=81.05ms med=88.44ms max=95.83ms p(90)=94.35ms p(95)=95.09ms
http_reqs......................: 2 1.394078/s
iteration_duration.............: avg=1.43s min=1.43s med=1.43s max=1.43s p(90)=1.43s p(95)=1.43s
iterations.....................: 1 0.697039/s
vus............................: 1 min=1 max=1
vus_max........................: 1 min=1 max=1
Tidy:
kubectl delete \
--filename=${PWD}/pod.yaml \
--namespace=${NAMESPACE}
kubectl delete configmap/simple \
--namespace=${NAMESPACE}
kubectl delete namespace/${NAMESPACE}

unable to execute a bash script in k8s cronjob pod's container

Team,
/bin/bash: line 5: ./repo/clone.sh: No such file or directory
I cannot run the file above, but I can cat it fine. I have tried my best and am still looking, but no luck so far.
My requirement is to mount a bash script from a ConfigMap into a directory inside the container and run it to clone a repo, but I am getting the message above.
cron job
spec:
  concurrencyPolicy: Allow
  jobTemplate:
    metadata:
    spec:
      template:
        metadata:
        spec:
          containers:
            - args:
                - -c
                - |
                  set -x
                  pwd && ls
                  ls -ltr /
                  cat /repo/clone.sh
                  ./repo/clone.sh
                  pwd
              command:
                - /bin/bash
              envFrom:
                - configMapRef:
                    name: sonarscanner-configmap
              image: artifactory.build.team.com/product-containers/user/sonarqube-scanner:4.7.0.2747
              imagePullPolicy: IfNotPresent
              name: sonarqube-sonarscanner
              securityContext:
                runAsUser: 0
              volumeMounts:
                - mountPath: /repo
                  name: repo-checkout
          dnsPolicy: ClusterFirst
          initContainers:
            - args:
                - -c
                - cd /
              command:
                - /bin/sh
              image: busybox
              imagePullPolicy: IfNotPresent
              name: clone-repo
              securityContext:
                privileged: true
              volumeMounts:
                - mountPath: /repo
                  name: repo-checkout
                  readOnly: true
          restartPolicy: OnFailure
          securityContext:
            fsGroup: 0
          volumes:
            - configMap:
                defaultMode: 420
                name: product-configmap
              name: repo-checkout
  schedule: '*/1 * * * *'
ConfigMap
kind: ConfigMap
metadata:
apiVersion: v1
data:
  clone.sh: |-
    #!bin/bash
    set -xe
    apk add git curl
    #Containers that fail to resolve repo url can use below step.
    repo_url=$(nslookup ${CODE_REPO_URL} | grep Non -A 2 | grep Name | cut -d: -f2)
    repo_ip=$(nslookup ${CODE_REPO_URL} | grep Non -A 2 | grep Address | cut -d: -f2)
    if grep ${repo_url} /etc/hosts; then
    echo "git dns entry exists locally"
    else
    echo "Adding dns entry for git inside container"
    echo ${repo_ip} ${repo_url} >> /etc/hosts
    fi
    cd / && cat /etc/hosts && pwd
    git clone "https://$RU:$RT@${CODE_REPO_URL}/r/a/${CODE_REPO_NAME}" && \
    (cd "${CODE_REPO_NAME}" && mkdir -p .git/hooks && \
    curl -Lo `git rev-parse --git-dir`/hooks/commit-msg \
    https://$RU:$RT@${CODE_REPO_URL}/r/tools/hooks/commit-msg; \
    chmod +x `git rev-parse --git-dir`/hooks/commit-msg)
    cd ${CODE_REPO_NAME}
    pwd
Output of pod describe:
Warning FailedCreatePodSandBox 1s kubelet, node1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "sonarqube-cronjob-1670256720-fwv27": Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:303: getting the final child's pid from pipe caused \"EOF\"": unknown
Pod logs:
+ pwd
+ ls
/usr/src
+ ls -ltr /repo/clone.sh
lrwxrwxrwx 1 root root 15 Dec 5 16:26 /repo/clone.sh -> ..data/clone.sh
+ ls -ltr
total 60
.
drwxr-xr-x 2 root root 4096 Aug 9 08:58 sbin
drwx------ 2 root root 4096 Aug 9 08:58 root
drwxr-xr-x 2 root root 4096 Aug 9 08:58 mnt
drwxr-xr-x 5 root root 4096 Aug 9 08:58 media
drwxrwsrwx 3 root root 4096 Dec 5 16:12 repo <<<<< MY MOUNTED DIR
.
+ cat /repo/clone.sh
#!bin/bash
set -xe
apk add git curl
#Containers that fail to resolve repo url can use below step.
repo_url=$(nslookup ${CODE_REPO_URL} | grep Non -A 2 | grep Name | cut -d: -f2)
repo_ip=$(nslookup ${CODE_REPO_URL} | grep Non -A 2 | grep Address | cut -d: -f2)
if grep ${repo_url} /etc/hosts; then
echo "git dns entry exists locally"
else
echo "Adding dns entry for git inside container"
echo ${repo_ip} ${repo_url} >> /etc/hosts
fi
cd / && cat /etc/hosts && pwd
git clone "https://$RU:$RT#${CODE_REPO_URL}/r/a/${CODE_REPO_NAME}" && \
(cd "${CODE_REPO_NAME}" && mkdir -p .git/hooks && \
curl -Lo `git rev-parse --git-dir`/hooks/commit-msg \
https://$RU:$RT#${CODE_REPO_URL}/r/tools/hooks/commit-msg; \
chmod +x `git rev-parse --git-dir`/hooks/commit-msg)
cd code_dir
+ ./repo/clone.sh
/bin/bash: line 5: ./repo/clone.sh: No such file or directory
+ pwd
pwd/usr/src
Assuming the working directory is different than /:
If you want to source your script in the current bash process (shorthand .), you have to add a space between the dot and the path:
. /repo/clone.sh
If you want to execute it in a child process, remove the dot:
/repo/clone.sh
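Two other things worth checking in the manifest above (assumptions on my part, not confirmed by the asker): the ConfigMap volume uses defaultMode: 420, i.e. octal 0644, so the mounted script is not executable and even /repo/clone.sh with the right path would fail with "Permission denied"; and the script's shebang reads #!bin/bash rather than #!/bin/bash. Running it through the interpreter sidesteps both:
# invoke the mounted, non-executable script via bash explicitly
bash /repo/clone.sh
# alternatively, set defaultMode: 0755 on the configMap volume to make it executable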

Unable to install ELK on AKS 1 pod has unbound immediate PersistentVolumeClaims

I am confused and need direction. I am trying to install ELK on AKS, following the documentation at the link below.
https://ahmedhosameldein.wordpress.com/2021/03/25/install-elk-stack-on-azure-kubernetes-cluster-aks-using-helm/
k get pv,pvc,sc
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-03cbd512-ba0c-403e-83b2-3b690a67912c 5Gi RWX Retain Bound default/elasticsearch-master-elasticsearch-master-0 elk-azurefile-sc 70m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/elasticsearch-master-elasticsearch-master-0 Bound pvc-03cbd512-ba0c-403e-83b2-3b690a67912c 5Gi RWX elk-azurefile-sc 70m
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
storageclass.storage.k8s.io/azurefile file.csi.azure.com Delete Immediate true 79m
storageclass.storage.k8s.io/azurefile-csi file.csi.azure.com Delete Immediate true 79m
storageclass.storage.k8s.io/azurefile-csi-premium file.csi.azure.com Delete Immediate true 79m
storageclass.storage.k8s.io/azurefile-premium file.csi.azure.com Delete Immediate true 79m
storageclass.storage.k8s.io/default (default) disk.csi.azure.com Delete WaitForFirstConsumer true 79m
storageclass.storage.k8s.io/elk-azurefile-sc kubernetes.io/azure-file Retain Immediate true 74m
storageclass.storage.k8s.io/managed disk.csi.azure.com Delete WaitForFirstConsumer true 79m
storageclass.storage.k8s.io/managed-csi disk.csi.azure.com Delete WaitForFirstConsumer true 79m
storageclass.storage.k8s.io/managed-csi-premium disk.csi.azure.com Delete WaitForFirstConsumer true 79m
storageclass.storage.k8s.io/managed-premium disk.csi.azure.com Delete WaitForFirstConsumer true 79m
azureuser@satishvm:~$ k get po
NAME READY STATUS RESTARTS AGE
elasticsearch-master-0 0/1 CrashLoopBackOff 16 (3m53s ago) 70m
elasticsearch-rpbon-test 0/1 Error 0 69m
When I describe pods I get the below error:
k describe po elasticsearch-master-0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 23m (x31 over 77m) kubelet Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )
Warning BackOff 3m32s (x298 over 77m) kubelet Back-off restarting failed container
Output of k get all:
k get all
NAME READY STATUS RESTARTS AGE
pod/elasticsearch-master-0 0/1 CrashLoopBackOff 18 (2m56s ago) 81m
pod/elasticsearch-rpbon-test 0/1 Error 0 80m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/elasticsearch-master ClusterIP 10.0.67.70 <none> 9200/TCP,9300/TCP 81m
service/elasticsearch-master-headless ClusterIP None <none> 9200/TCP,9300/TCP 81m
service/kubernetes ClusterIP 10.0.0.1 <none> 443/TCP 90m
NAME READY AGE
statefulset.apps/elasticsearch-master 0/1 81m
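Since the PVC shows as Bound, the CrashLoopBackOff is more likely to be explained by the container's own logs than by storage. A few hedged diagnostic commands (plain kubectl, nothing specific to this chart):
kubectl logs elasticsearch-master-0 --previous   # logs from the last crashed attempt
kubectl describe pod elasticsearch-master-0      # exit code and full event history
kubectl get events --sort-by=.metadata.creationTimestamp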

Systemd service is inactive (dead), but only after many weeks

I have a custom systemd service that scans the filesystem with inotify and creates files upon certain events.
The service works fine for many days, sometimes even for several weeks. Then suddenly it is stopped. It is configured to use Restart=always, so I would expect the service to self-recover upon failure, but this isn't happening.
I would like to know how to determine why the service is not recovering itself and how to fix the issue.
Here is the service config:
[Unit]
Description=Sets a PID limit (pids.max) for each container in the docker host
After=docker.service
Wants=docker.service
[Service]
Type=simple
Restart=always
StartLimitInterval=0
RestartSec=5
ExecStart=/opt/scripts/container-pid-limit.sh
StandardError=journal
And the contents of the file /opt/scripts/container-pid-limit.sh
#!/bin/bash -x
MAX_PIDS=5000
CGROUPS_DIR=/sys/fs/cgroup/pids/docker/
CONTAINERS_DIR=/srv/docker_root/containers/
set_limit() {
  limit=$(grep -ir label $CONTAINERS_DIR/$1/config.v2.json | jq -r '.Config.Labels["com.xyz.pid_limit"]')
  if [[ ! $limit -gt 0 ]] ; then
    limit=$MAX_PIDS
  fi
  echo "CONTAINER: $c LIMIT $limit FILE $f"
  echo $limit > $f;
}
# set pids.max for already created containers
for f in $(find $CGROUPS_DIR -mindepth 2 -name pids.max); do
  c=$(dirname $f | xargs basename)
  set_limit $c
done
# monitor cgroup dir for newly created dirs
inotifywait --event create,isdir --monitor --quiet --format "%w%f" $CGROUPS_DIR | while read -r line; do
  c=$(basename $line)
  set_limit $c
done
Sample output of systemctl status before failure:
● container-pid-limit.service - Sets a PID limit (pids.max) for each container in the docker host
Loaded: loaded (/etc/systemd/system/container-pid-limit.service; static; vendor preset: enabled)
Active: active (running) since Wed 2019-06-05 08:44:38 UTC; 14min ago
Main PID: 277527 (container-pid-l)
Tasks: 3
Memory: 2.3M
CPU: 79ms
CGroup: /system.slice/container-pid-limit.service
├─277527 /bin/bash /opt/scripts/container-pid-limit.sh
├─277892 inotifywait --event create,isdir --monitor --quiet --format %w%f /sys/fs/cgroup/pids/docker/
└─277893 /bin/bash /opt/scripts/container-pid-limit.sh
Sample output of systemctl status after failure:
● container-pid-limit.service - Sets a PID limit (pids.max) for each container in the docker host
Loaded: loaded (/etc/systemd/system/container-pid-limit.service; static; vendor preset: enabled)
Active: inactive (dead)
EDIT: I am trying to use systemctl status and systemctl show to identify when the service was started and eventually stopped, but it seems to me that when the service fails all the history is lost:
Reference:
https://unix.stackexchange.com/questions/368767/how-do-i-see-when-a-systemd-service-was-started-stopped-restarted
Sample output of systemctl show:
Type=simple
Restart=always
NotifyAccess=none
RestartUSec=5s
TimeoutStartUSec=1min
TimeoutStopUSec=45s
RuntimeMaxUSec=infinity
WatchdogUSec=0
WatchdogTimestampMonotonic=0
FailureAction=none
PermissionsStartOnly=no
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
MainPID=0
ControlPID=0
FileDescriptorStoreMax=0
NFileDescriptorStore=0
StatusErrno=0
Result=success
ExecMainStartTimestampMonotonic=0
ExecMainExitTimestampMonotonic=0
ExecMainPID=0
ExecMainCode=0
ExecMainStatus=0
ExecStart={ path=/opt/scripts/container-pid-limit.sh ; argv[]=/opt/scripts//container-pid-limit.sh ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
Slice=system.slice
MemoryCurrent=18446744073709551615
CPUUsageNSec=18446744073709551615
TasksCurrent=18446744073709551615
Delegate=no
CPUAccounting=no
CPUShares=18446744073709551615
StartupCPUShares=18446744073709551615
CPUQuotaPerSecUSec=infinity
BlockIOAccounting=no
BlockIOWeight=18446744073709551615
StartupBlockIOWeight=18446744073709551615
MemoryAccounting=no
MemoryLimit=18446744073709551615
DevicePolicy=auto
TasksAccounting=no
TasksMax=18446744073709551615
UMask=0022
LimitCPU=18446744073709551615
LimitCPUSoft=18446744073709551615
LimitFSIZE=18446744073709551615
LimitFSIZESoft=18446744073709551615
LimitDATA=18446744073709551615
LimitDATASoft=18446744073709551615
LimitSTACK=18446744073709551615
LimitSTACKSoft=8388608
LimitCORE=18446744073709551615
LimitCORESoft=0
LimitRSS=18446744073709551615
LimitRSSSoft=18446744073709551615
LimitNOFILE=4096
LimitNOFILESoft=1024
LimitAS=18446744073709551615
LimitASSoft=18446744073709551615
LimitNPROC=7869937
LimitNPROCSoft=7869937
LimitMEMLOCK=65536
LimitMEMLOCKSoft=65536
LimitLOCKS=18446744073709551615
LimitLOCKSSoft=18446744073709551615
LimitSIGPENDING=7869937
LimitSIGPENDINGSoft=7869937
LimitMSGQUEUE=819200
LimitMSGQUEUESoft=819200
LimitNICE=0
LimitNICESoft=0
LimitRTPRIO=0
LimitRTPRIOSoft=0
LimitRTTIME=18446744073709551615
LimitRTTIMESoft=18446744073709551615
OOMScoreAdjust=0
Nice=0
IOScheduling=0
CPUSchedulingPolicy=0
CPUSchedulingPriority=0
TimerSlackNSec=50000
CPUSchedulingResetOnFork=no
NonBlocking=no
StandardInput=null
StandardOutput=journal
StandardError=journal
TTYReset=no
TTYVHangup=no
TTYVTDisallocate=no
SyslogPriority=30
SyslogLevelPrefix=yes
SyslogLevel=6
SyslogFacility=3
SecureBits=0
CapabilityBoundingSet=18446744073709551615
AmbientCapabilities=0
MountFlags=0
PrivateTmp=no
PrivateNetwork=no
PrivateDevices=no
ProtectHome=no
ProtectSystem=no
SameProcessGroup=no
UtmpMode=init
IgnoreSIGPIPE=yes
NoNewPrivileges=no
SystemCallErrorNumber=0
RuntimeDirectoryMode=0755
KillMode=control-group
KillSignal=15
SendSIGKILL=yes
SendSIGHUP=no
Id=container-pid-limit.service
Names=container-pid-limit.service
Requires=sysinit.target system.slice
Wants=docker.service
Conflicts=shutdown.target
Before=shutdown.target
After=basic.target systemd-journald.socket system.slice docker.service sysinit.target
Description=Sets a PID limit (pids.max) for each container in the docker host
LoadState=loaded
ActiveState=inactive
SubState=dead
FragmentPath=/etc/systemd/system/container-pid-limit.service
UnitFileState=static
UnitFilePreset=enabled
StateChangeTimestampMonotonic=0
InactiveExitTimestampMonotonic=0
ActiveEnterTimestampMonotonic=0
ActiveExitTimestampMonotonic=0
InactiveEnterTimestampMonotonic=0
CanStart=yes
CanStop=yes
CanReload=no
CanIsolate=no
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnFailureJobMode=replace
IgnoreOnIsolate=no
NeedDaemonReload=no
JobTimeoutUSec=infinity
JobTimeoutAction=none
ConditionResult=no
AssertResult=no
ConditionTimestampMonotonic=0
AssertTimestampMonotonic=0
Transient=no
StartLimitInterval=0
StartLimitBurst=5
StartLimitAction=none
systemd's Restart=always is not meant to be an endless restart loop; it is about failure handling.
Read the manual on StartLimitBurst.
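To see what actually happened, the journal and the unit's recorded result usually still tell the story even when systemctl status looks empty. A sketch using standard systemd tooling (property availability can vary by systemd version):
journalctl -u container-pid-limit.service -n 200 --no-pager   # last messages before it went inactive
systemctl show container-pid-limit.service -p Result,ExecMainStatus,ExecMainExitTimestamp
systemctl reset-failed container-pid-limit.service            # clear a failed/start-limit state before restarting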

Autoscaling: Newly created instance always OutOfService

I have set up autoscaling using these steps:
$ elb-create-lb autoscalelb --headers --listener
"lb-port=80,instance-port=80,protocol=http" --listener
"lb-port=443,instance-port=443,protocol=tcp" --availability-zones
us-east-1d
$ elb-describe-lbs autoscalelb
$ elb-register-instances-with-lb autoscalelb --instances i-ee364697
$ elb-configure-healthcheck autoscalelb --headers --target "TCP:80"
--interval 5 --timeout 3 --unhealthy-threshold 2 --healthy-threshold 4
$ as-create-launch-config autoscalelc --image-id ami-baba68d3
--instance-type t1.micro
$ as-create-auto-scaling-group autoscleasg --availability-zones
us-east-1d --launch-configuration autoscalelc --min-size 1 --max-size
5 --desired-capacity 1 --load-balancers autoscalelb
$ as-describe-auto-scaling-groups autoscleasg
$ as-put-scaling-policy MyScaleUpPolicy --auto-scaling-group
autoscleasg --adjustment=1 --type ChangeInCapacity --cooldown 300
$ mon-put-metric-alarm MyHighCPUAlarm --comparison-operator
GreaterThanThreshold --evaluation-periods 1 --metric-name
CPUUtilization --namespace "AWS/EC2" --period 600 --statistic Average
--threshold 80 --alarm-actions arn:aws:autoscaling:us-east-1:616259365041:scalingPolicy:46c2d3b3-7f29-42b6-ab64-548f45de334f:autoScalingGroupName/autoscleasg:policyName/MyScaleUpPolicy
--dimensions "AutoScalingGroupName=autoscleasg"
$ as-put-scaling-policy MyScaleDownPolicy --auto-scaling-group
autoscleasg --adjustment=-1 --type ChangeInCapacity --cooldown 300
$ mon-put-metric-alarm MyLowCPUAlarm --comparison-operator
LessThanThreshold --evaluation-periods 1 --metric-name CPUUtilization
--namespace "AWS/EC2" --period 600 --statistic Average --threshold 50 --alarm-actions arn:aws:autoscaling:us-east-1:616259365041:scalingPolicy:30ccd42c-06fe-401a-8b8f-a4e49bbb9c7d:autoScalingGroupName/autoscleasg:policyName/MyScaleDownPolicy
--dimensions "AutoScalingGroupName=autoscleasg"
After this I'm running this command:
$ as-describe-auto-scaling-groups autoscleasg --headers
Response:
AUTO-SCALING-GROUP GROUP-NAME LAUNCH-CONFIG AVAILABILITY-ZONES
LOAD-BALANCERS MIN-SIZE MAX-SIZE DESIRED-CAPACITY
AUTO-SCALING-GROUP autoscleasg autoscalelc us-east-1d
autoscalelb 1 5 1 INSTANCE INSTANCE-ID
AVAILABILITY-ZONE STATE STATUS LAUNCH-CONFIG INSTANCE
i-acf48bd5 us-east-1d InService Healthy autoscalelc
And then:
$ elb-describe-instance-health autoscalelb --headers
It shows:
INSTANCE_ID INSTANCE_ID STATE DESCRIPTION
REASON-CODE INSTANCE_ID i-ee364697 InService N/A
N/A INSTANCE_ID i-acf48bd5 OutOfService Instance has failed at
least the UnhealthyThreshold number of health checks consecutively.
Instance
My first problem is that it automatically creates one extra instance when there is no load on the main instance.
Secondly, the newly created instance is always OutOfService.
If I change the min size to 0 using the following command:
$ as-update-auto-scaling-group autoscleasg --launch-configuration
autoscalelc --availability-zones us-east-1d --min-size 0 --max-size 5
and try to put load on the instance using Xen:
hg clone http://xenbits.xensource.com/xen-unstable.hg
autoscaling does not create any instance. Even when I run the above command in up to 5 sessions and CPU utilization reaches 100%, still no instance is created.
Please help me...
I am not sure what you want to achieve, but if you want autoscaling to add or remove instances as traffic increases or decreases, you need to alarm on the load balancer metrics (e.g. Latency):
Change yours to:
--namespace='AWS/ELB'
--metric-name Latency
--period 60 (this is super quick)
--threshold 2.0 (this is very low)
To test whether it works, I use Apache Bench and run the command below from multiple micro instances:
$ ab -n 10000 -c 10 http://<your ELB>.us-east-1.elb.amazonaws.com/index.php
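Assembled with the same legacy CLI tools used in the question, the high-latency alarm might look roughly like this (a sketch; the LoadBalancerName dimension and the exact threshold are assumptions you should adjust):
mon-put-metric-alarm MyHighLatencyAlarm --comparison-operator GreaterThanThreshold \
--evaluation-periods 1 --metric-name Latency --namespace "AWS/ELB" \
--period 60 --statistic Average --threshold 2.0 \
--alarm-actions <ARN of MyScaleUpPolicy> \
--dimensions "LoadBalancerName=autoscalelb"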
