I am using a GitLab runner to create an AKS cluster on the fly and also delete the previous one.
Unfortunately these jobs take a while, and I have now more than once experienced that the job stops suddenly (in the range of 5+ minutes after an az aks delete or az aks create call).
This is happening in GitLab, and after several retries it usually works once.
Some googling suggested that before and after scripts might have an impact, but even after removing them there was no difference.
Are there any runner rules or anything in particular that might need to be changed?
It would be more understandable if it stopped with a timeout error, but the job is handled as succeeded, even though it did not finish running through all the lines. Below is the stage segment causing the issue:
create-kubernetes-az:
  stage: create-kubernetes-az
  image: microsoft/azure-cli:latest
  # when: manual
  script:
    # REQUIRE CREATED SERVICE PRINCIPAL
    - az login --service-principal -u ${AZ_PRINC_USER} -p ${AZ_PRINC_PASSWORD} --tenant ${AZ_PRINC_TENANT}
    # Create Resource Group
    - az group create --name ${AZ_RESOURCE_GROUP} --location ${AZ_RESOURCE_LOCATION}
    # ERROR HAPPENS HERE # Delete Kubernetes Cluster // SOMETIMES STOPS AFTER THIS
    - az aks delete --resource-group ${AZ_RESOURCE_GROUP} --name ${AZ_AKS_TEST_CLUSTER} --yes
    # // OR HERE # Create Kubernetes Cluster // SOMETIMES STOPS AFTER THIS
    - az aks create --name ${AZ_AKS_TEST_CLUSTER} --resource-group ${AZ_RESOURCE_GROUP} --node-count ${AZ_AKS_TEST_NODECOUNT} --service-principal ${AZ_PRINC_USER} --client-secret ${AZ_PRINC_PASSWORD} --generate-ssh-keys
    # Get kubectl
    - az aks install-cli
    # Get Login Credentials
    - az aks get-credentials --name ${AZ_AKS_TEST_CLUSTER} --resource-group ${AZ_RESOURCE_GROUP}
    # Install Helm and Tiller
    - curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get > get_helm.sh
    - chmod 700 get_helm.sh
    - ./get_helm.sh
    - helm init
    - kubectl create serviceaccount --namespace kube-system tiller
    - kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
    - kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
    # Create a namespace for your ingress resources
    - kubectl create namespace ingress-basic
    # Wait 1 minute
    - sleep 60
    # Use Helm to deploy an NGINX ingress controller
    - helm install stable/nginx-ingress --namespace ingress-basic --set controller.replicaCount=2 --set controller.nodeSelector."beta\.kubernetes\.io/os"=linux --set defaultBackend.nodeSelector."beta\.kubernetes\.io/os"=linux
    # Test by getting the public IP
    - kubectl get service
    - kubectl get service -l app=nginx-ingress --namespace ingress-basic
    #- while [ "$(kubectl get service -l app=nginx-ingress --namespace ingress-basic | grep pending)" == "pending" ]; do echo "Updating"; sleep 1 ; done && echo "Finished"
    - while [ "$(kubectl get service -l app=nginx-ingress --namespace ingress-basic -o jsonpath='{.items[*].status.loadBalancer.ingress[*].ip}')" == "" ]; do echo "Updating"; sleep 10 ; done && echo "Finished"
    # Add Ingress Ext IP / Alternative
    - KUBip=$(kubectl get service -l app=nginx-ingress --namespace ingress-basic -o jsonpath='{.items[*].status.loadBalancer.ingress[*].ip}')
    - echo $KUBip
    # Add DNS Name - TODO - GITLAB ENV VARIABLES DO NOT WORK
    - DNSNAME="bl-test"
    # Get the resource-id of the public ip
    - PUBLICIPID=$(az network public-ip list --query "[?ipAddress!=null]|[?contains(ipAddress, '$KUBip')].[id]" --output tsv)
    - echo $PUBLICIPID
    - az network public-ip update --ids $PUBLICIPID --dns-name $DNSNAME
    # Install CertManager
    # Install the CustomResourceDefinition resources separately
    - kubectl apply -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.8/deploy/manifests/00-crds.yaml
    # Create the namespace for cert-manager
    - kubectl create namespace cert-manager
    # Label the cert-manager namespace to disable resource validation
    - kubectl label namespace cert-manager certmanager.k8s.io/disable-validation=true
    # Add the Jetstack Helm repository
    - helm repo add jetstack https://charts.jetstack.io
    # Update your local Helm chart repository cache
    - helm repo update
    # Install the cert-manager Helm chart
    - helm install --name cert-manager --namespace cert-manager --version v0.8.0 jetstack/cert-manager
    # Render and apply cluster-issuer.yaml
    - sed 's/_AZ_AKS_ISSUER_NAME_/'"${AZ_AKS_ISSUER_NAME}"'/g; s/_BL_DEV_E_MAIL_/'"${BL_DEV_E_MAIL}"'/g' infrastructure/kubernetes/cluster-issuer.yaml > cluster-issuer.yaml;
    - kubectl apply -f cluster-issuer.yaml
    # Render and apply ingress.yaml
    - sed 's/_BL_AZ_HOST_/'"beautylivery-test.${AZ_RESOURCE_LOCATION}.${AZ_AKS_HOST}"'/g; s/_AZ_AKS_ISSUER_NAME_/'"${AZ_AKS_ISSUER_NAME}"'/g' infrastructure/kubernetes/ingress.yaml > ingress.yaml;
    - kubectl apply -f ingress.yaml
And here is the result:
Running with gitlab-runner 12.3.0 (a8a019e0)
on runner-gitlab-runner-676b494b6b-b5q6h gzi97H3Q
Using Kubernetes namespace: gitlab-managed-apps
Using Kubernetes executor with image microsoft/azure-cli:latest ...
Waiting for pod gitlab-managed-apps/runner-gzi97h3q-project-14628452-concurrent-0l8wsx to be running, status is Pending
Waiting for pod gitlab-managed-apps/runner-gzi97h3q-project-14628452-concurrent-0l8wsx to be running, status is Pending
Running on runner-gzi97h3q-project-14628452-concurrent-0l8wsx via runner-gitlab-runner-676b494b6b-b5q6h...
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/****/*******/.git/
Created fresh repository.
From https://gitlab.com/****/********
* [new branch] Setup-Kubernetes -> origin/Setup-Kubernetes
Checking out d2ca489b as Setup-Kubernetes...
Skipping Git submodules setup
$ function create_secret() { # collapsed multi-line command
$ echo "current time $(TZ=Europe/Berlin date +"%F %T")"
current time 2019-10-06 09:00:50
$ az login --service-principal -u ${AZ_PRINC_USER} -p ${AZ_PRINC_PASSWORD} --tenant ${AZ_PRINC_TENANT}
[
  {
    "cloudName": "AzureCloud",
    "id": "******",
    "isDefault": true,
    "name": "Nutzungsbasierte Bezahlung",
    "state": "Enabled",
    "tenantId": "*******",
    "user": {
      "name": "http://*****",
      "type": "servicePrincipal"
    }
  }
]
$ az group create --name ${AZ_RESOURCE_GROUP} --location ${AZ_RESOURCE_LOCATION}
{
  "id": "/subscriptions/*********/resourceGroups/*****",
  "location": "francecentral",
  "managedBy": null,
  "name": "******",
  "properties": {
    "provisioningState": "Succeeded"
  },
  "tags": null,
  "type": "Microsoft.Resources/resourceGroups"
}
$ az aks delete --resource-group ${AZ_RESOURCE_GROUP} --name ${AZ_AKS_TEST_CLUSTER} --yes
Running after script...
$ echo "current time $(TZ=Europe/Berlin date +"%F %T")"
current time 2019-10-06 09:05:55
Job succeeded
Is there a way to have the job run through completely?
And successfully, in the best case?
UPDATE: The idea is to automate the process of setting up a complete Kubernetes cluster with SSL and DNS management, having everything set up fast and ready for different use cases and environments in the future. I also want to learn how to do things better :)
NEW UPDATE:
Added a solution.
I added a small workaround, as I expect it will need to be run every once in a while...
It seems the az aks wait command did the trick for me for now, and the previous command requires --no-wait in order to continue.
# Delete Kubernetes Cluster
- az aks delete --resource-group ${AZ_RESOURCE_GROUP} --name ${AZ_AKS_TEST_CLUSTER} --no-wait --yes
- az aks wait --deleted -g ${AZ_RESOURCE_GROUP} -n ${AZ_AKS_TEST_CLUSTER} --updated --interval 60 --timeout 1800
# Create Kubernetes Cluster
- az aks create --name ${AZ_AKS_TEST_CLUSTER} --resource-group ${AZ_RESOURCE_GROUP} --node-count ${AZ_AKS_TEST_NODECOUNT} --service-principal ${AZ_PRINC_USER} --client-secret ${AZ_PRINC_PASSWORD} --generate-ssh-keys --no-wait
- az aks wait --created -g ${AZ_RESOURCE_GROUP} -n ${AZ_AKS_TEST_CLUSTER} --updated --interval 60 --timeout 1800
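As an additional sanity check (a minimal sketch added here, not part of the original workaround), the cluster's provisioning state can be queried once the wait returns, so the job only continues when Azure reports Succeeded:
# Hypothetical extra step: confirm the cluster reports Succeeded before continuing
- az aks show --resource-group ${AZ_RESOURCE_GROUP} --name ${AZ_AKS_TEST_CLUSTER} --query provisioningState -o tsv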
Related
In our Jenkins pipeline, I'm using a bash script to call the helm install command. We have a values.yaml containing most of the values to be passed to helm. However, a few values are based upon environment variables and have to be passed using the --set argument. Here is the snippet:
helm install $RELEASE_NAME shared/phoenixmsp-app -f value.yaml \
--set global.env.production=$production \
--set global.cluster.hosts=${CONFIG[${CLUSTER_NAME}]} \
--set nameOverride=$RELEASE_NAME \
--set fullnameOverride=$RELEASE_NAME \
--set image.repository=myhelm.hub.mycloud.io/myrepo/mainservice \
--set-string image.tag=$DOCKER_TAG \
--wait --timeout 180s --namespace $APP_NAMESPACE
We want to move these --set parameters to values.yaml. The goal is to get rid of --set and simply pass the values.yaml.
Question: Is it possible to expand Environment Variables in values.yaml while calling with helm install or helm upgrade?
The only way I think you can do that, if you really want to use a single YAML file, is to have a template values.yaml and either sed the values into it or use a templating language like Jinja or Mustache, then feed the resulting output into helm.
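For example, a minimal sketch of that templating idea using envsubst (the values.yaml.tpl file name and its placeholders are assumptions for illustration, not from the original question):
# values.yaml.tpl contains placeholders such as ${RELEASE_NAME} and ${DOCKER_TAG}
envsubst < values.yaml.tpl > values.generated.yaml
helm install "$RELEASE_NAME" shared/phoenixmsp-app -f values.generated.yaml \
  --wait --timeout 180s --namespace "$APP_NAMESPACE"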
--set is a good solution here, but if you really don't want that, dynamically write a second values file for the run-time values.
echo "
global:
env:
production: $production
cluster:
hosts: ${CONFIG[${CLUSTER_NAME}]}
nameOverride: $RELEASE_NAME
fullnameOverride: $RELEASE_NAME
image:
repository: myhelm.hub.mycloud.io/myrepo/mainservice
tag: $DOCKER_TAG
" > runtime.yaml
helm install $RELEASE_NAME shared/phoenixmsp-app -f value.yaml -f runtime.yaml \
--wait --timeout 180s --namespace $APP_NAMESPACE
This really does nothing but slightly reduce the precedence, though.
If all these values are known ahead of time, maybe build your runtime.yaml in advance and throw it into a git repo people can peer-review before deployment time, and just use the variables to select the right file from the repo.
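A hedged sketch of that last suggestion (the values/ directory layout and file naming are assumptions, not from the original answer):
# Pick a pre-reviewed, per-cluster values file from the repo
helm install "$RELEASE_NAME" shared/phoenixmsp-app \
  -f value.yaml -f "values/${CLUSTER_NAME}.yaml" \
  --wait --timeout 180s --namespace "$APP_NAMESPACE"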
Via some Terraform scripts within a CI/CD process, I am trying to create a Managed Private Endpoint for an Azure SQL Server linked service. This is successful using the following code:
resource "azurerm_synapse_managed_private_endpoint" "mi_metadata_transform_sql_server_private_endpoint" {
name = "mi_synapse_metadata_transform_private_endpoint"
subresource_name = "sqlServer"
synapse_workspace_id = module.mi_synapse_workspace.synapse_workspace_id
target_resource_id = azurerm_mssql_server.mi-metadata-transform-sql-server.id}
But that leaves the endpoint in a "Pending Approval" state. So, starting from some of our existing code that approves storage endpoints via Bash, I copied that code and adjusted it accordingly for SQL Server. And this is where my problem begins...
function enable_sql_private_endpoint {
  endpoints=$(az sql server show --name $1 -g ${{ parameters.resourceGroupName }} --subscription $(serviceConnection) --query "privateEndpointConnections[?properties.privateLinkServiceConnectionState.status=='Pending'].id" -o tsv)
  for endpoint in $endpoints
  do
    az sql server private-endpoint-connection approve --account-name $1 --name $endpoint --resource-group ${{ parameters.resourceGroupName }} --subscription $(serviceConnection)
  done
}
sqlServers="$(az sql server list -g ${{ parameters.resourceGroupName }} --query '[].name' --subscription $(serviceConnection) -o tsv)"
for sqlServerName in $sqlServers
do
echo "Processing $sqlServerName ========================================="
enable_sql_private_endpoint $sqlServerName
done
The code above is executed in a further step in a YAML file; in its simplest terms:
1. YAML orchestrator file executed via CI/CD
2. Terraform script called to create the resource (code snippet 1)
3. Another YAML file executed to approve endpoints using inline Bash (code snippet 2)
The problem is that az sql server private-endpoint-connection approve does not exist. When I review this link, I cannot see anything remotely like an approve option for SQL Server endpoints, like what Storage or MySQL have. Any help on how this can be achieved would be appreciated.
Currently, you can't approve a Managed Private Endpoint using Terraform.
Note: Azure PowerShell and Azure CLI are the preferred methods for managing Private Endpoint connections on Microsoft Partner Services or customer owned services.
For more details, refer to Manage Private Endpoint connections on a customer/partner owned Private Link service.
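As a hedged sketch (the resource group and server names are placeholders), the pending connections on a SQL server can first be listed with the generic az network private-endpoint-connection group:
# List pending private endpoint connections on a SQL server (placeholder names)
az network private-endpoint-connection list \
  -g myResourceGroup \
  --resource-name mySqlServer \
  --type Microsoft.Sql/servers \
  --query "[?properties.privateLinkServiceConnectionState.status=='Pending'].id" -o tsv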
In the end, this is what I used in my YAML / Bash to get things working:
sqlServers="$(az sql server list -g ${{ parameters.resourceGroupName }} --query '[].name' --subscription $(serviceConnection) -o tsv)"
for sqlServerName in $sqlServers
do
echo "Processing $sqlServerName ========================================="
enable_sql_private_endpoint $sqlServerName
done
and
function enable_sql_private_endpoint {
  endpoints=$(az sql server show --name $1 -g ${{ parameters.resourceGroupName }} --subscription $(serviceConnection) --query "privateEndpointConnections[?properties.privateLinkServiceConnectionState.status=='Pending'].id" -o tsv)
  for endpoint in $endpoints
  do
    az network private-endpoint-connection approve -g ${{ parameters.resourceGroupName }} --subscription $(serviceConnection) --id $endpoint --type Microsoft.Sql/servers --description "Approved" --resource-name $1
  done
}
The following line is the key syntax to use if anyone ever encounters a similar scenario in their CI/CD with Synapse and Managed Private Endpoints:
az network private-endpoint-connection approve -g ${{ parameters.resourceGroupName }} --subscription $(serviceConnection) --id $endpoint --type Microsoft.Sql/servers --description "Approved" --resource-name $1
enviornment: 'dev'
acr-login: $(enviornment)-acr-login
acr-secret: $(enviornment)-acr-secret
dev-acr-login and dev-acr-secret are secrets stored in Key Vault for the ACR login and the ACR secret.
In the pipeline, I am getting the secrets with this task:
- task: AzureKeyVault@1
  inputs:
    azureSubscription: $(connection)
    KeyVaultName: $(keyVaultName)
    SecretsFilter: '*'
This task will create task variables with the names 'dev-acr-login' and 'dev-acr-secret'.
Now, if I want to log in to Docker using the indirect variable names, I am not able to do that.
The following code works, and I am able to log in to ACR.
- bash: |
    echo $(dev-acr-secret) | docker login \
      $(acrName) \
      -u $(dev-acr-login) \
      --password-stdin
  displayName: 'docker login'
The following does not work. Is there a way I can use the variable names $(acr-login) and $(acr-secret) rather than the actual key names from Key Vault?
- bash: |
    echo $(echo $(acr-secret)) | docker login \
      $(acrRegistryServerFullName) \
      -u $(echo $(acr-login)) \
      --password-stdin
  displayName: 'docker login'
You could pass them as environment variables:
- bash: |
    echo $(echo $ACR_SECRET) | ...
  displayName: docker login
  env:
    ACR_SECRET: $(acr-secret)
But what is the purpose, as opposed to just echoing the password values as you said works in the other example? As long as the task is creating secure variables, they will be protected in logs. You'd need to do that anyway, since they would otherwise show up in diagnostic logs if someone enabled diagnostics, which anyone can do.
An example to do that:
- bash: |
    echo "##vso[task.setvariable variable=acr-login;issecret=true;]$ACR_SECRET"
  env:
    ACR_SECRET: $($(acr-secret)) # Should expand recursively
See Define variables : Set secret variables for more information and examples.
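One caveat worth adding: secret variables (whether fetched by the AzureKeyVault task or set with issecret=true) are not automatically exposed as environment variables, so each later script step has to map them in explicitly. A hedged sketch, with illustrative env names:
- bash: |
    echo "$REGISTRY_PASSWORD" | docker login $(acrName) -u "$REGISTRY_USER" --password-stdin
  displayName: docker login via mapped secrets
  env:
    REGISTRY_USER: $(dev-acr-login)      # secret fetched by the AzureKeyVault task
    REGISTRY_PASSWORD: $(dev-acr-secret)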
We run an ECS cluster behind an ELB (ALB, to be specific).
I have a process that allows me to find out which ECS cluster is associated with the ALB by querying the ALB and tracing the results back through the target group and then instances:
Here is the bash script:
ELB_NAME=$(aws route53 list-resource-record-sets --hosted-zone-id <Zone-ID> | jq -r --arg URL "$URL" '.ResourceRecordSets[]|select(.Name==$URL)|.AliasTarget.DNSName')
ELB_NAME=$(echo $ELB_NAME | cut -f 2- -d "." | rev | cut -f 2- -d "." | rev)
ELB_ARN=$(aws elbv2 describe-load-balancers | jq -r --arg ELB_NAME "$ELB_NAME" '.LoadBalancers[]|select((.DNSName|ascii_downcase)==$ELB_NAME)|.LoadBalancerArn')
TG_ARNS=$(aws elbv2 describe-target-groups | jq -r --arg ELB_ARN "$ELB_ARN" '.TargetGroups[]|select(.LoadBalancerArns[]==$ELB_ARN)|.TargetGroupArn')
TG_ARN=$(echo $TG_ARNS | cut -f 1 -d " ")
INSTANCE_ID=$(aws elbv2 describe-target-health --target-group-arn $TG_ARN | jq -r '.TargetHealthDescriptions[].Target.Id' | head -n 1)
CLUSTER=$(aws ec2 describe-instances --instance-ids $INSTANCE_ID | jq -r '.Reservations[].Instances[].Tags[]|select(.Key=="aws:cloudformation:stack-name")|.Value' | cut -f 2 -d "-")
The problem I have is that when there are no running instances associated with the ECS cluster, I can no longer query them for the tag that returns the CloudFormation stack name; the request for the targets from the target group comes back empty.
How can I use the AWS API so that I can determine which ECS cluster the ALB would target if it had running instances?
It's not really clear what you're asking for, or indeed the purpose you are trying to achieve, but the following should set you on the right track.
An ECS "cluster" is really just an Amazon service, when you create a cluster nothing is really provisioned. You can think of an empty cluster as a record or a placeholder in the ECS service.
In order to do anything with a cluster, it needs instances. When you boot an EC2 machine from a supported AMI, appropriate IAM role and the cluster name written to a config file, the instance will join the cluster. (If you create a cluster via the AWS console, a CloudFormation template is created that handles the provisioning and orchestration of these steps.) The ECS cluster management can then schedule tasks and services onto that instance as you have defined in the ECS service.
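For illustration only (a minimal sketch, not part of the original answer), that config file is /etc/ecs/ecs.config on the container instance, typically written from user data:
# User-data sketch: register the instance with a specific ECS cluster
echo "ECS_CLUSTER=your_cluster" >> /etc/ecs/ecs.config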
Without any instances, there can be no listening containers, therefore there can be no target groups in your ALB that route to anything. So it is not possible to get from the ELB to the cluster... as you have asked when there are no running instances.
You might find the following commands are a better way of determining whether or not you have a running cluster.
First, use the list-clusters command to show which clusters are available:
aws ecs list-clusters
{
    "clusterArns": [
        "arn:aws:ecs:eu-west-1:XXXXXXXXX:cluster/your_cluster"
    ]
}
Then use the output from that to show if there are any EC2 instances registered to the cluster:
aws ecs describe-clusters --clusters your_cluster
{
    "clusters": [
        {
            "status": "ACTIVE",
            "statistics": [],
            "clusterName": "your_cluster",
            "registeredContainerInstancesCount": 1,
            "pendingTasksCount": 0,
            "runningTasksCount": 0,
            "activeServicesCount": 0,
            "clusterArn": "arn:aws:ecs:eu-west-1:XXXXXXXXX:cluster/your_cluster"
        }
    ],
    "failures": []
}
Note the registeredContainerInstancesCount property shows the number of running instances. I assume you have your ECS services set to register tasks (containers) with the ALB, so when the count is greater than 0, this will be possible.
So, querying that property should tell you if your cluster is "on" or not:
if [[ $(aws ecs describe-clusters --clusters your_cluster | jq -r '.clusters[].registeredContainerInstancesCount') -gt 0 ]] ; then
  echo "cluster is on"
else
  echo "cluster is off"
fi
I am using Ubuntu 14.04 and I'm configuring etcd for use with Calico, but the service does not work.
This is my etcd.conf file:
# vim:set ft=upstart ts=2 et:
description "etcd"
author "etcd maintainers"
start on stopped rc RUNLEVEL=[2345]
stop on runlevel [!2345]
respawn
setuid etcd
env ETCD_DATA_DIR=/var/lib/etcd
export ETCD_DATA_DIR
exec /usr/bin/etcd --name="uno" \
--advertise-client-urls="http://172.16.8.241:2379,http://172.16.8.241:4001" \
--listen-client-urls="http://0.0.0.0:2379,http://0.0.0.0:4001" \
--listen-peer-urls "http://0.0.0.0:2380" \
--initial-advertise-peer-urls "http://172.16.8.241:2380" \
--initial-cluster-token $(uuidgen) \
--initial-cluster "node1=http://172.16.8.241:2380" \
--initial-cluster-state "new"
When I try to start:
ikerlan@uno:~$ service etcd start
start: Rejected send message, 1 matched rules; type="method_call", sender=":1.128" (uid=1000 pid=7374 comm="start etcd ") interface="com.ubuntu.Upstart0_6.Job" member="Start" error name="(unset)" requested_reply="0" destination="com.ubuntu.Upstart" (uid=0 pid=1 comm="/sbin/init")
What could be the problem?
Try to run with sudo:
sudo service etcd start
Then if you get an error like:
start: Job failed to start
Rerun it after adding the etcd user:
sudo adduser etcd
Update:
If the etcd instance can't start, check the following two things:
1: Your etcd start command must be right. In your case, the etcd command can't run, and you will get an error message like:
etcd: couldn't find local name "uno" in the initial cluster configuration
So change the content in /etc/init/etcd.conf to:
--initial-cluster "uno=http://172.16.8.241:2380" \
where your original config is:
--initial-cluster "node1=http://172.16.8.241:2380" \
2: The etcd user should have permission to write to /var/lib/etcd.
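For example (a hedged sketch, assuming the default data directory from the config above):
# Make sure the etcd user owns its data directory
sudo mkdir -p /var/lib/etcd
sudo chown -R etcd:etcd /var/lib/etcd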
Etcd flags "name" and "initial-cluster" must be match together.
....
--name="keukenhof" \
--initial-cluster="keukenhof=http://localhost:2380"
....