Detect network error and retry running task in Azure - bash

Running bash script in Azure pipeline. I am trying to rerun the npm publish step x-times if there is any network error is detected.
Is there any way to detect a network error specifically and rerun the whole task again?
I've found this document (https://learn.microsoft.com/en-us/azure/devops/release-notes/2021/sprint-195-update#automatic-retries-for-a-task) but I believe this reruns the process regardless of the error type.

Related

How to fail Azure devops pipeline task specifically for failures in bash script

I am using Azure Devops pipeline and in that there is one task that will create KVM guest VM and once VM is created through packer inside the host it will run a bash script to check the status of services running inside the guest VM.
If any services are not running or thrown error then this bash script will exit with code 3 as i have added the value in bash script as below
set -e
So i want the task to fail if the above bash script fails, but issue is in the same task as KVM guest VM is getting created so while booting up and shutdown it throws expected errors but i dont want this task to fail due these error but to fail it only bash scripts fails.
i have selected the option in task "Fail on Standard Error"
But not sure how we can fail the task specifically for bash script error, can anyone have some suggestions on this?
You can try and use exit 1 command to have the bash task failed. And it is often a command you'll issue soon after an error is logged.
Additionally, you also may use logging commands to customized a error message. Kindly refer to the sample below.
#!/bin/bash
echo "##vso[task.logissue type=error]Something went very wrong."
exit 1

Error syncing pod on starting Beam - Dataflow pipeline from docker

We are constantly getting an error while starting our Beam Golang SDK pipeline (driver program) from a docker image which works when started from local / VM instance. We are using Dataflow runner for our pipeline and Kubernetes to deploy.
LOCAL SETUP:
We have GOOGLE_APPLICATION_CREDENTIALS variable set with service account for our GCP cluster. When running the job from local, job gets submitted to dataflow and completes successfully.
DOCKER SETUP:
Build image used is FROM golang:1.14-alpine. When we pack the same program with Dockerfile and try to run, it fails with error
User program exited: fork/exec /bin/worker: no such file or directory
On checking Stackdriver logs for more details, we see this:
Error syncing pod 00014c7112b5049966a4242e323b7850 ("dataflow-go-job-1-1611314272307727-
01220317-27at-harness-jv3l_default(00014c7112b5049966a4242e323b7850)"),
skipping: failed to "StartContainer" for "sdk" with CrashLoopBackOff:
"back-off 2m40s restarting failed container=sdk pod=dataflow-go-job-1-
1611314272307727-01220317-27at-harness-jv3l_default(00014c7112b5049966a4242e323b7850)"
Found reference to this error in Dataflow common errors doc, but it is too generic to figure out whats failing. After multiple retries, we were able to eliminate any permission / access related issues from pods. Not sure what else could be the problem here.
After multiple attempts, we decided to start the job manually from a new Debian 10 based VM instance and it worked. This brought to our notice that we are using alpine based golang image in Docker which may not have all the required dependencies installed to start the job.
On golang docker hub, we found a golang:1.14-buster where buster is codename for Debian 10. Using that for docker build helped us solve the issue. Self answering here to help anyone else facing the same issues.

How to show Docker errors in Bamboo?

I'm currently using Docker for Windows with Bamboo. I'm running into the problem, that when a Docker task fails, the error log only shows that exact message, that the task failed. But it doesn't show the error message from within Docker, like compilation errors, access denied or out of memory. Is there any way to get to these error messages?
You can setup a task in the "Final tasks" area of your job. This job will execute regardless of whether the tasks in the job failed or not.
Depending on how you are running Docker in Bamboo you can make this a Docker task or a script task and use the Docker logs command to output the logs from the container. This should tell you why the container failed to run (e.g., build, run-time error, initialization failure).
Docker Logs CLI

Azure custom script extension stuck. not able to rerun or delete extension

I was trying to run a custom script on my scaleset vm due to the wrong location of the sh file the exeuction failed. but after that when I try to remove (az vmss extension delete) or rerun(az vmss extension set ) the custom script with correct url I keep getting the same error. It is stuck. How do I fix it.
Deployment failed. Correlation ID:
249a034f-76e2-4b0d-beb2-e9c6577623d1. VM has reported a failure when
processing extension 'customScript'. Error message: "Enable failed:
processing file downloads failed: failed to download file[0]: failed
to download file: http request failed: Get
https://wrongurl.blob.core.windows.net/script/deploytemp.sh: dial tcp:
lookup wrongurl.blob.core.windows.net on 164.33.122.16:53: no such
host".
Delete the instance and rebuild it!
It might not be the answer to delete the instance and rebuilding it, but applications on VMSS by nature should be resilient enough to let you do so.
Also, I'm curious if auto-healing/remediation helps you on this, I know that it does not reinstall the extension tough.

Composer-Rest-Server not connecting

I am testing a a business network I created, I ran the Composer-rest-server and all worked fine, then shut the server as suggested in the developers guide , then I proceeded use the yo hyperledger composer to create the skeleton of the angular app, however, now the angular app is showing in the local browser, however, the composer-rest- server is not.
Expected Behavior:
I should start the composer-rest- server in localhost:3000 and the angular app as well
Actual Behavior:
I get this message;
scovering types from business network definition ...
Connection fails: Error: Error trying to ping. Error: Error trying to query chaincode. Error: Connect Failed
It will be retried for the next request.
Exception: Error: Error trying to ping. Error: Error trying to query chaincode. Error: Connect Failed
Error: Error trying to ping. Error: Error trying to query chaincode. Error: Connect Failed
at _checkRuntimeVersions.then.catch (/home/node/.nvm/versions/node/v6.11.2/lib/node_modules/composer-rest-server/node_modules/composer-connector-hlfv1/lib/hlfconnection.js:696:34)
Your Environment
composer-cli#0.11.3
generator-hyperledger-composer#0.11.3
composer-rest-server#0.11.3
Docker version 17.06.0-ce, build 02c1d87
docker-compose version 1.13.0, build 1719ceb
The Problem
If you kill your fabric instance using ./stopFabric that you started using the ./startFabric command then all the containers that were apart of the business network were killed as well and therefore you need to reinstall the .bna and start the network again. (the development flow provided is purposely volatile for rapid development)
The Solution
1.) Type docker ps to see all of your running containers. You should see none if you are getting that error because your peer is not responding to pings
2.) Open a separate terminal and navigate to where you have fabric-dev-servers in the terminal and run ./fabricStart. This will start all the containers like your network Certificate Authority, the peer, the orderer, etc.
3.) Return to your project in another terminal. Do Step 1 & 2 found at the developer tutorial (you likely won't need to do step 3 since you likely already imported the network administrator identity going through the tutorial)
4.) Run composer network ping --card admin#tutorial-network. The ping should go through.
5.) Run docker ps. You should see 4 containers running
6.) Run composer-rest-server and follow the steps from the tutorial.
7.) Run cd tutorial-network-app to switch to where your angular application is (or wherever you generated it with the yo command)
8.) Navigate to http://localhost:3000 and everything should work.
Any other questions or problems just reply here and I can help.
The expected behaviour is that the REST server is already running (the the generator uses Loopback to spin up a REST server already (that's why you shut down the previous REST server)). Its described here https://hyperledger.github.io/composer/unstable/tutorials/developer-guide.html under 'Generate your Skeleton Web Application'.
After you created the application - following completion of the yo hyperledger-composer questions (and after providing the answers) you run your application using npm start from within the generated application directory. Your app is accessible at http://localhost:4200.

Resources