The problem: some steps create entities in Kubernetes (GKE) that must eventually be removed, regardless of whether any other step in the build succeeds.
Here is an example cloudbuild.yaml:
steps:
# TEST NAMESPACE
#
# some previous steps...
# package jar
# build container & etc.

# Kubernetes RUN DB
- name: 'gcr.io/cloud-builders/gke-deploy'
  id: deploy-db
  waitFor: ['-']
  args: ['apply',
         '--filename', './k8b/db/',
         '--location', 'somewhere',
         '--cluster', 'my-trololo-cluster']
# Run something else in Kubernetes
- name: 'gcr.io/cloud-builders/gke-deploy'
  id: deploy-other-things
  waitFor:
  - 'deploy-db'
  args: ['apply',
         '--filename', './k8b/other-things/',
         '--location', 'somewhere',
         '--cluster', 'my-trololo-cluster']
# Test DB in pod
- name: 'gcr.io/cloud-builders/gke-deploy'
  id: test-db
  waitFor:
  - 'deploy-other-things'
  entrypoint: 'bash'
  args: ['./scripts/test_db.sh']
# Run REST-API
- name: 'gcr.io/cloud-builders/gke-deploy'
  id: deploy-rest
  waitFor:
  - 'deploy-other-things'
  args: ['run',
         '--filename', './k8b/rest/',
         '--location', 'somewhere',
         '--cluster', 'my-trololo-cluster']
# Test REST-API
- name: 'gcr.io/cloud-builders/gke-deploy'
  id: test-REST-API
  waitFor:
  - 'deploy-rest'
  - 'test-db'
  entrypoint: 'bash'
  args: ['./scripts/test_rest_api.sh']
# Cleanup steps
- name: 'gcr.io/cloud-builders/gke-deploy'
  id: cleanup
  waitFor:
  - 'test-REST-API'
  entrypoint: 'kubectl'
  args: ['delete', '--filename', './k8b', '--recursive']
# Delete PERSISTENT VOLUME
- name: 'gcr.io/cloud-builders/gke-deploy'
  id: delete-persistent-volume
  waitFor:
  - 'test-REST-API'
  entrypoint: 'bash'
  args:
  - '-c'
  - |
    pvc_name=$(kubectl get pvc --selector=$sel -o jsonpath={.items..metadata.name})
    kubectl delete pvc ${pvc_name}
So, if any step before the cleanup steps fails, the entities in GKE are never deleted, and subsequent Cloud Build runs will not start on a clean cluster.
I can't find any solution to this case in the docs.
The only way I see to solve it at the moment is to use a bash script in every step, and if there is a crash:
catch it in the bash script;
run the commands that clean up the cluster from inside the bash script;
then exit the script with a non-zero code.
And so on in every step (a minimal sketch of this workaround is shown below).
But this is not a very good solution in my opinion. Maybe there is a better one?
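For example, the test-db step could be wrapped roughly like this (only a sketch; the trap-based cleanup and the exact kubectl delete command are my assumptions, not part of the original build):

- name: 'gcr.io/cloud-builders/gke-deploy'
  id: test-db
  waitFor:
  - 'deploy-other-things'
  entrypoint: 'bash'
  args:
  - '-c'
  - |
    # If anything in this step fails, clean up the cluster first,
    # then exit non-zero so the build still fails.
    trap 'kubectl delete --filename ./k8b --recursive; exit 1' ERR
    ./scripts/test_db.sh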
Has anyone come across this error? Elasticsearch fails with bootstrap errors. How do we fix them?
{"@timestamp":"2022-11-14T08:57:21.047Z", "log.level":"ERROR", "message":"uncaught exception in thread [process reaper (pid 86)]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"process reaper (pid 86)","log.logger":"org.elasticsearch.bootstrap.ElasticsearchUncaughtExceptionHandler","elasticsearch.node.name":"es-cluster-1","error.type":"java.security.AccessControlException","error.message":"access denied (\"java.lang.RuntimePermission\" \"modifyThread\")","error.stack_trace":"java.security.AccessControlException: access denied (\"java.lang.RuntimePermission\" \"modifyThread\")\n\tat java.base/java.security.AccessControlContext.checkPermission(AccessControlContext.java:485)\n\tat java.base/java.security.AccessController.checkPermission(AccessController.java:1068)\n\tat java.base/java.lang.SecurityManager.checkPermission(SecurityManager.java:411)\n\tat org.elasticsearch.securesm@8.5.0/org.elasticsearch.secure_sm.SecureSM.checkThreadAccess(SecureSM.java:166)\n\tat org.elasticsearch.securesm@8.5.0/org.elasticsearch.secure_sm.SecureSM.checkAccess(SecureSM.java:120)\n\tat java.base/java.lang.Thread.checkAccess(Thread.java:2360)\n\tat java.base/java.lang.Thread.setDaemon(Thread.java:2308)\n\tat java.base/java.lang.ProcessHandleImpl.lambda$static$0(ProcessHandleImpl.java:103)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:637)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:928)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1021)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1158)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1589)\n\tat java.base/jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:186)\n"}
The deployment files have the relevant settings set:
env:
- name: cluster.name
  value: k8s-logs
- name: node.name
  valueFrom:
    fieldRef:
      fieldPath: metadata.name
- name: discovery.seed_hosts
  value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
- name: cluster.initial_master_nodes
  value: "es-cluster-0,es-cluster-1,es-cluster-2"
- name: ES_JAVA_OPTS
  value: "-Xms1024m -Xmx2048m"
nodeSelector:
  app: aks1
initContainers:
- name: fix-permissions
  image: busybox
  command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
  securityContext:
    privileged: true
  volumeMounts:
  - name: data
    mountPath: /usr/share/elasticsearch/data
- name: increase-vm-max-map
  image: busybox
  command: ["sysctl", "-w", "vm.max_map_count=262144"]
  securityContext:
    privileged: true
- name: increase-fd-ulimit
  image: busybox
  command: ["sh", "-c", "ulimit -n 65536"]
  securityContext:
    privileged: true
I'm working on a Concourse pipeline and I need to duplicate a lot of code in my YAML, so I'm trying to refactor it to be easily maintainable and avoid ending up with thousands of duplicated lines/blocks.
Following what seems to be the recommended approach, I have arrived at the YAML below, but it doesn't fulfill all my needs.
add-rotm-points: &add-rotm-points
  task: add-rotm-points
  config:
    platform: linux
    image_resource:
      type: docker-image
      source:
        repository: ((registre))/polygone/concourse/cf-cli-python3
        tag: 0.0.1
        insecure_registries: [ ((registre)) ]
    run:
      path: source-pipeline/commun/rotm/trigger-rotm.sh
      args: [ "source-pipeline", "source-code-x" ]
    inputs:
      - name: source-pipeline
      - name: source-code-x

jobs:
- name: test-a
  plan:
  - in_parallel:
    - get: source-pipeline
    - get: source-code-a
      trigger: true
  - <<: *add-rotm-points
- name: test-b
  plan:
  - in_parallel:
    - get: source-pipeline
    - get: source-code-b
      trigger: true
  - <<: *add-rotm-points
My problem is that both of my jobs use the generic task defined at the top, but inside that generic task I need to change source-code-x to the -a or -b version used by the job.
I cannot find a way to achieve this without duplicating my anchor in every job, which seems counterproductive. But I may not have fully understood YAML anchors/merges.
All you need to do is map inputs on individual tasks, like this:
add-rotm-points: &add-rotm-points
  task: add-rotm-points
  config:
    platform: linux
    image_resource:
      type: docker-image
      source:
        repository: ((registre))/polygone/concourse/cf-cli-python3
        tag: 0.0.1
        insecure_registries: [ ((registre)) ]
    run:
      path: source-pipeline/commun/rotm/trigger-rotm.sh
      args: [ "source-pipeline", "source-code-x" ]
    inputs:
      - name: source-pipeline
      - name: source-code-x

jobs:
- name: test-a
  plan:
  - in_parallel:
    - get: source-pipeline
    - get: source-code-a
      trigger: true
  - <<: *add-rotm-points
    input_mapping:
      source-code-x: source-code-a
- name: test-b
  plan:
  - in_parallel:
    - get: source-pipeline
    - get: source-code-b
      trigger: true
  - <<: *add-rotm-points
    input_mapping:
      source-code-x: source-code-b
See Example Three in this blog: https://blog.concourse-ci.org/introduction-to-task-inputs-and-outputs/
I am backing up my PostgreSQL database using this CronJob:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: postgres-backup
            image: postgres:10.4
            command: ["/bin/sh"]
            args: ["-c", 'echo "$PGPASS" > /root/.pgpass && chmod 600 /root/.pgpass && pg_dump -Fc -h <host> -U <user> <db> > /var/backups/backup.dump']
            env:
            - name: PGPASS
              valueFrom:
                secretKeyRef:
                  name: pgpass
                  key: pgpass
            volumeMounts:
            - mountPath: /var/backups
              name: postgres-backup-storage
          restartPolicy: Never
          volumes:
          - name: postgres-backup-storage
            hostPath:
              path: /var/volumes/postgres-backups
              type: DirectoryOrCreate
The CronJob runs successfully: the backup is made and saved inside the Job's container, but that container is stopped after the script finishes.
Of course, I want to access the backup files in the container, but I can't because it is stopped/terminated.
Is there a way to execute shell commands in a container after it has terminated, so I can access the backup files saved in it?
I know I could do that on the node, but I don't have permission to access it.
@confused genius gave me the great idea of adding another identical container to access the dump files, so this is the solution that works:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: postgres-backup
            image: postgres:10.4
            command: ["/bin/sh"]
            args: ["-c", 'echo "$PGPASS" > /root/.pgpass && chmod 600 /root/.pgpass && pg_dump -Fc -h <host> -U <user> <db> > /var/backups/backup.dump']
            env:
            - name: PGPASS
              valueFrom:
                secretKeyRef:
                  name: dev-pgpass
                  key: pgpass
            volumeMounts:
            - mountPath: /var/backups
              name: postgres-backup-storage
          - name: postgres-restore
            image: postgres:10.4
            volumeMounts:
            - mountPath: /var/backups
              name: postgres-backup-storage
          restartPolicy: Never
          volumes:
          - name: postgres-backup-storage
            hostPath:
              # Ensure the file directory is created.
              path: /var/volumes/postgres-backups
              type: DirectoryOrCreate
After that, one just needs to sh into the "postgres-restore" container and access the dump files.
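For example, something along these lines once the Job's pod has run (the pod name below is just a placeholder for whatever name the Job generates):

# Find the pod created by the CronJob's Job, then open a shell in the second container.
kubectl get pods
kubectl exec -it <postgres-backup-pod> -c postgres-restore -- sh
ls /var/backups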
thanks
I'm using Cloud Build to clone a repository. I can confirm the repository clones successfully into the Cloud Build /workspace volume.
steps:
- id: 'Clone repository'
  name: 'gcr.io/cloud-builders/git'
  args: ['clone', $_REPO_URL]
  volumes:
  - name: 'ssh'
    path: /root/.ssh
I then run the next step to confirm:
- id: 'List'
  name: 'alpine'
  args: ['ls']
and it shows me the repository is in the current directory. But when I try to cd into the directory, the cd command doesn't work and throws an error:
ERROR: build step 3 "alpine" failed: starting step container failed: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "cd <repo-name>": executable file not found in $PATH: unknown
My ultimate goal is to cd into the repository and run some git commands. I use alpine later on because the git builder image doesn't allow me to use cd either.
substitutions:
  _REPO_NAME: 'test-repo'
  _REPO_URL: 'git@bitbucket.org:example/test-repo.git'
  _BRANCH_NAME: 'feature/something'

steps:
- id: 'Clone repository'
  name: 'gcr.io/cloud-builders/git'
  args: ['clone', $_REPO_URL]
  volumes:
  - name: 'ssh'
    path: /root/.ssh
- id: 'Check Diff'
  name: 'alpine'
  args: ['cd $_REPO_NAME', '&&', 'git checkout $_BRANCH_NAME', '&&', 'git diff main --name-only']
You can use bash to run any commands you would like.
Here is one example I use for one of my projects:
- name: 'gcr.io/cloud-builders/git'
  id: Clone env repository
  entrypoint: /bin/sh
  args:
  - '-c'
  - |
    git clone git@github.com:xyz/abc.git && \
    cd gitops-env-repo/ && \
    git checkout dev
Use the dir field in your *.yaml file.
steps:
- name: string
  args: [string, string, ...]
  env: [string, string, ...]
  dir: string
  id: string
  waitFor: [string, string, ...]
  entrypoint: string
  secretEnv: string
  volumes: object(Volume)
  timeout: string (Duration format)
- name: string
  ...
- name: string
  ...
timeout: string (Duration format)
queueTtl: string (Duration format)
logsBucket: string
options:
  env: [string, string, ...]
  secretEnv: string
  volumes: object(Volume)
  sourceProvenanceHash: enum(HashType)
  machineType: enum(MachineType)
  diskSizeGb: string (int64 format)
  substitutionOption: enum(SubstitutionOption)
  dynamicSubstitutions: boolean
  logStreamingOption: enum(LogStreamingOption)
  logging: enum(LoggingMode)
  pool: object(PoolOption)
substitutions: map (key: string, value: string)
tags: [string, string, ...]
serviceAccount: string
secrets: object(Secret)
availableSecrets: object(Secrets)
artifacts: object (Artifacts)
images:
- [string, string, ...]
https://cloud.google.com/build/docs/build-config-file-schema
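Applied to the question's pipeline, the "Check Diff" step could then look roughly like this (a sketch only; the bash entrypoint and the exact git commands are assumptions, and dir is resolved relative to /workspace):

- id: 'Check Diff'
  name: 'gcr.io/cloud-builders/git'
  # Run this step inside the directory that the clone step created.
  dir: '${_REPO_NAME}'
  entrypoint: 'bash'
  args: ['-c', 'git checkout ${_BRANCH_NAME} && git diff main --name-only']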
I'm trying to pull source code from GitHub, then build and push a Docker image to Docker Hub, using a Tekton pipeline and Knative on a Kubernetes cluster.
I'm following this link for the installation and setup of Tekton:
https://www.ibm.com/cloud/blog/build-a-knative-service-with-tekton-and-apache-openwhisk-nodejs-runtime
task-build.yaml
apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: task-build
spec:
  inputs:
    resources:
    - name: docker-source
      type: git
    params:
    - name: TARGET_IMAGE_NAME
      description: name of the image to be tagged and pushed
    - name: TARGET_IMAGE_TAG
      description: tag the image before pushing
      default: "latest"
    - name: DOCKERFILE
      description: name of the dockerfile
    - name: OW_RUNTIME_DEBUG
      description: flag to indicate debug mode should be on/off
      default: "false"
    - name: OW_RUNTIME_PLATFORM
      description: flag to indicate the platform, one of ["openwhisk", "knative", ... ]
      default: "knative"
    - name: OW_ACTION_NAME
      description: name of the action
      default: "foo"
    - name: OW_ACTION_CODE
      description: JavaScript source code to be evaluated
      default: ""
    - name: OW_ACTION_MAIN
      description: name of the function in the "__OW_ACTION_CODE" to call as the action handler
      default: "main"
    - name: OW_ACTION_BINARY
      description: flag to indicate zip function, for zip actions, "__OW_ACTION_CODE" must be base64 encoded string
      default: "false"
    - name: OW_HTTP_METHODS
      description: list of HTTP methods, any combination of [GET, POST, PUT, and DELETE], default is [POST]
      default: "[POST]"
    - name: OW_ACTION_RAW
      description: flag to indicate raw HTTP handling, interpret and process an incoming HTTP body directly
      default: "false"
  outputs:
    resources:
    - name: builtImage
      type: image
  steps:
  - name: add-ow-env-to-dockerfile
    image: "gcr.io/kaniko-project/executor:debug"
    command:
    - /busybox/sh
    args:
    - -c
    - |
      cat <<EOF >> ${inputs.params.DOCKERFILE}
      ENV __OW_RUNTIME_DEBUG "${inputs.params.OW_RUNTIME_DEBUG}"
      ENV __OW_RUNTIME_PLATFORM "${inputs.params.OW_RUNTIME_PLATFORM}"
      ENV __OW_ACTION_NAME "${inputs.params.OW_ACTION_NAME}"
      ENV __OW_ACTION_CODE "${inputs.params.OW_ACTION_CODE}"
      ENV __OW_ACTION_MAIN "${inputs.params.OW_ACTION_MAIN}"
      ENV __OW_ACTION_BINARY "${inputs.params.OW_ACTION_BINARY}"
      ENV __OW_HTTP_METHODS "${inputs.params.OW_HTTP_METHODS}"
      ENV __OW_ACTION_RAW "${inputs.params.OW_ACTION_RAW}"
      EOF
  - name: adapt-dockerfile-to-tekton
    image: "gcr.io/kaniko-project/executor:debug"
    command:
    - sed
    args:
    - -i
    - -e
    - 's/COPY ./COPY .\/docker-source/g'
    - ${inputs.params.DOCKERFILE}
  - name: build-openwhisk-nodejs-runtime
    image: "gcr.io/kaniko-project/executor:latest"
    args: ["--destination=${inputs.params.TARGET_IMAGE_NAME}:${inputs.params.TARGET_IMAGE_TAG}", "--dockerfile=${inputs.params.DOCKERFILE}"]
When trying to build and push the image, I'm getting this error:
conditions:
- lastTransitionTime: "2020-09-24T07:33:11Z"
  message: '"step-add-ow-env-to-dockerfile" exited with code 2 (image: "docker-pullable://gcr.io/kaniko-project/executor@sha256:0f27b0674797b56db08010dff799c8926c4e9816454ca56cc7844df228c53485"); for logs run: kubectl -n default logs task-run-helloworld-pod-5bbkx -c step-add-ow-env-to-dockerfile'
  reason: Failed
  status: "False"
  type: Succeeded
When I check the logs for the error message, I get:
Error: /busybox/sh: syntax error: bad substitution
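For context (an assumption on my part, not confirmed in the thread): if the installed Tekton release no longer substitutes the legacy ${inputs.params.X} placeholders, the literal text reaches /busybox/sh, and the dots in ${inputs.params.DOCKERFILE} are not valid in a shell parameter expansion, which would produce exactly this "bad substitution" message. In that case the step script would presumably need Tekton's $(...) variable form instead, e.g.:

# hypothetical rewrite of the first script line, using the $(...) form
# that newer Tekton releases substitute, instead of the legacy ${...} form
cat <<EOF >> $(inputs.params.DOCKERFILE)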