I am running Selenium tests using Python on Jenkins. The build keeps failing with the message "Build step 'Execute shell' marked build as failure" and no other errors. A post-build script then fails as well. How can I get more information on why the build step fails, and what could be causing it?
pre-build commands
aws ec2 run-instances --image-id ami-0c69e983f7523357a --count 1 --instance-type t3.micro --key-name Jenkins --subnet-id subnet-44c9b033 --output=json | jq -r ".Instances[].InstanceId" > untagged.json
sleep 20
# Tag every instance in that file
< untagged.json xargs -I{} aws ec2 create-tags --resources {} --tag "Key=Purpose,Value=Selenium_BO_ReportViewer"
sleep 40
build step commands
cd automation-UI
export PROJECT_PATH=`pwd`
echo "-------------------------------------------"
echo "-------------------------------------------"
python3 -m pytest --cache-clear -W ignore -v -rA -s -m "reportviewer" -n 0 --tb=short --show-capture=no --runenv ci --reruns 1 --reruns-delay 2
post-build commands
# Since we are done, terminate the selenium slave we started in the first step
< untagged.json xargs -I{} aws ec2 terminate-instances --instance-ids {}
console log
09/09/2021 12:34:56AM - N/A - FieldFxTicketPage - test_rounding_numbers_in_ticket_report - INFO: grid table data reading
0%| | 0/3 [00:00<?, ?it/s]
33%|███▎ | 1/3 [1:10:02<2:20:04, 4202.08s/it]Build step 'Execute shell' marked build as failure
[PostBuildScript] - [INFO] Executing post build scripts.
[report-viewer] $ /bin/bash -xe /tmp/jenkins1788184406631893042.sh
+ xargs '-I{}' aws ec2 terminate-instances --instance-ids '{}'
xargs: aws: terminated by signal 15
[PostBuildScript] - [ERROR] An error occured during post-build processing.
org.jenkinsci.plugins.postbuildscript.PostBuildScriptException: java.lang.InterruptedException
at org.jenkinsci.plugins.postbuildscript.processor.Processor.processBuildSteps(Processor.java:190)
at org.jenkinsci.plugins.postbuildscript.processor.Processor.processScripts(Processor.java:91)
at org.jenkinsci.plugins.postbuildscript.processor.Processor.process(Processor.java:79)
at org.jenkinsci.plugins.postbuildscript.processor.Processor.process(Processor.java:73)
at org.jenkinsci.plugins.postbuildscript.PostBuildScript.perform(PostBuildScript.java:77)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:21)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:808)
at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:757)
at hudson.model.Build$BuildExecution.post2(Build.java:179)
at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:701)
at hudson.model.Run.execute(Run.java:1914)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:100)
at hudson.model.Executor.run(Executor.java:433)
Caused by: java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at java.lang.UNIXProcess.waitFor(UNIXProcess.java:395)
at hudson.Proc$LocalProc.join(Proc.java:331)
at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:195)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:145)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
at org.jenkinsci.plugins.postbuildscript.processor.Processor.processBuildSteps(Processor.java:180)
... 13 more
Build step 'Execute scripts' marked build as failure
[WS-CLEANUP] Deleting project workspace...
[WS-CLEANUP] Skipped based on build state FAILURE
Finished: FAILURE
I have a bash script that is executing a dataflow job like so
python3 main.py \
--runner DataflowRunner \
--region us-west1 \
--job_name name \
--project project \
--autoscaling_algorithm THROUGHPUT_BASED \
--max_num_workers 10 \
--environment prod \
--staging_location gs staging loc \
--temp_location temp loc \
--setup_file ./setup.py \
--subnetwork subnetwork \
--experiments use_network_tags=internal-ssh-server
So I use gitlab ci to run this
Deploy Prod:
  stage: Deploy Prod
  environment: production
  script:
    - *setup-project
    - pip3 install --upgrade pip;
    - pip install -r requirements.txt;
    - chmod +x deploy.sh
    - ./deploy.sh
  only:
    - master
So now my code runs and logs in the gitlab pipeline AND in the logs viewer in dataflow. What I want to be able to do is that once gitlab sees JOB_STATE_RUNNING, it marks the pipeline as passed and stops outputting logs to gitlab. Maybe there's a way to do this in the bash script? Or can it be done in gitlab ci?
GitLab doesn't have this capability as a feature, so your only option is to script the solution.
Something like this should work to monitor the output and exit once that text is encountered.
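# Capture stderr as well: Python's logging module writes there by default
python3 myscript.py > logfile.log 2>&1 &
# Read from the start of the file (-n +1) so a line written before tail starts is not missed
( tail -f -n +1 logfile.log & ) | grep -q "JOB_STATE_RUNNING"
echo "Job is running. We're done here"
exit 0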
reference: https://superuser.com/a/900134/654263
You'd need to worry about some other things in the script, but that's the basic idea. GitLab unfortunately has no success condition based on trace output.
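The "other things" mostly come down to not waiting forever. Here is a rough sketch, assuming bash; `wait_for_marker` is a hypothetical helper name, not something GitLab or Dataflow provides. It polls the log file for the marker, gives up after a deadline, and stops early if the submitting process has already exited.

```shell
#!/usr/bin/env bash
# wait_for_marker FILE PATTERN TIMEOUT_SECS [PID]   (hypothetical helper)
# Returns 0 when PATTERN shows up in FILE, 1 on timeout,
# 2 if the process PID exited before the pattern appeared.
wait_for_marker() {
  local file=$1 pattern=$2 timeout_secs=$3 pid=${4:-}
  local deadline=$(( $(date +%s) + timeout_secs ))
  while true; do
    if grep -q -- "$pattern" "$file" 2>/dev/null; then
      return 0
    fi
    if [ -n "$pid" ] && ! kill -0 "$pid" 2>/dev/null; then
      # Process is gone: look one last time, then report the early exit.
      if grep -q -- "$pattern" "$file" 2>/dev/null; then return 0; fi
      return 2
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep 1
  done
}

# Possible use in deploy.sh (file name and timeout are assumptions):
#   python3 main.py > logfile.log 2>&1 &
#   wait_for_marker logfile.log "JOB_STATE_RUNNING" 600 "$!" || exit 1
```

Returning distinct codes for "timed out" and "process died" lets the CI job print a more useful message before failing.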
A little late, but I hope this helps someone. I solved it by creating two scripts.
# Submit the Beam Dataflow job with nohup and capture its output so the submission result can be parsed
nohup stdbuf -oL bash ~/submit_beam_job.sh &> ~/output.log &
# Wait for log file to appear before checking on it
sleep 5
# Count to make sure the while loop is not stuck
cnt=0
while ! grep -q "JOB_STATE_RUNNING" ~/output.log; do
  echo "Job has not started yet, waiting for it to start"
  cnt=$((cnt + 1))
  if [[ $cnt -gt 5 ]]; then
    echo "Job submission taking too long please check!"
    exit 1
  fi
  # Check the log file for error keywords
  if grep -q "Errno" ~/output.log || grep -q "Error" ~/output.log; then
    echo "Error submitting Dataflow job, please check!!"
    exit 1
  fi
  sleep 30
done
The submit_beam_job.sh script looks like this:
# This submits individual beam jobs for each topics
# Beam pipeline for carlistingprimaryimage table
python3 ~/dataflow.py \
--input_subscription=projects/data-warehouse/subscriptions/cars \
--window_size=2 \
--num_shards=2 \
--runner=DataflowRunner \
--temp_location=gs://bucket/beam-python/temp/ \
--staging_location=gs://bucket/beam-python/binaries/ \
--max_num_workers=5 \
--project=data-warehouse \
--region=us-central1 \
--gcs_project=data-warehouse \
I want my integration tests to run in parallel on CircleCI.
I read this document https://circleci.com/blog/how-to-boost-build-time-with-test-parallelism/ and set up my job like this:
working_directory: *workspace_root
executor: ubuntu-machine
parallelism: 16
steps:
  - prepare_workspace
  - run:
      name: 'Run Platform Component tests'
      command: |
        ./gradlew platform:componentTest -PtestFilter="`circleci tests glob "platform/src/componentTest/java/**/*.java"|circleci tests split`"
Looking at the UI, I see that each of the 16 spawned containers executes all the tests.
Am I missing something?
I ended up slightly modifying this and incorporating what I learned from here and here to build this:
- run:
    name: Run tests in parallel
    # Use "./gradlew test" instead if tests are not run in parallel
    command: |
      cd module-with-tests-to-run/src/test/kotlin
      # Get list of classnames of tests that should run on this node
      CLASSNAMES=$(circleci tests glob "**/**Test.kt" \
        | cut -c 1- | sed 's#/#.#g' \
        | sed 's/.\{3\}$//' \
        | circleci tests split --split-by=timings --timings-type=classname)
      cd ../../../..
      # Format the arguments to "./gradlew test"
      GRADLE_ARGS=$(echo $CLASSNAMES | awk '{for (i=1; i<=NF; i++) print "--tests",$i}')
      echo "Prepared arguments for Gradle: $GRADLE_ARGS"
      ./gradlew clean module-with-tests-to-run:test $GRADLE_ARGS
note: I tried to get the formatting right but I might have goofed.
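The path-to-classname step can be tried locally without the circleci CLI. This is a small sketch, assuming the test files end in `.kt` and the paths are relative to the source root. `paths_to_gradle_args` is a made-up helper name and the sample paths are invented. It strips the `.kt` suffix explicitly instead of cutting the last three characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read relative test file paths on stdin (one per line)
# and print them as "--tests <classname>" arguments on a single line.
paths_to_gradle_args() {
  sed -e 's#/#.#g' -e 's/\.kt$//' \
    | awk '{ printf "%s--tests %s", sep, $0; sep = " " }'
}

# Invented sample input standing in for the output of "circleci tests split"
printf '%s\n' com/example/FooTest.kt com/example/bar/BarTest.kt \
  | paths_to_gradle_args
echo
# prints: --tests com.example.FooTest --tests com.example.bar.BarTest
```

Checking the transformation this way makes it easier to see whether each node really receives a different subset before wiring it into the CircleCI config.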
Cloud Build is not showing build failure status
I created my own remote-builder, which copies all files from /workspace to my instance with scp and runs the build there using gcloud compute ssh -- COMMAND
ssh-keygen -t rsa -N "" -f ${KEYNAME} -C ${USERNAME} || true
chmod 400 ${KEYNAME}*
cat > ssh-keys <<EOF
${USERNAME}:$(cat ${KEYNAME}.pub)
EOF
${GCLOUD} compute scp --compress --recurse \
${GCLOUD} compute ssh --ssh-key-file=${KEYNAME} \
Below is an example of the configuration that runs the build (cloudbuild.yaml):
steps:
- name: gcr.io/$PROJECT_ID/remote-builder
  env:
    - COMMAND="docker build -t [image_name]:[tagname] -f Dockerfile ."
The docker build fails on a step inside the Dockerfile and the errors show up in the log, but the build status is still SUCCESS.
Can anyone help me resolve this?
Thanks in advance.
Try adding
|| exit 1
at the end of your docker command. Alternatively, you might just need to change the entrypoint to 'bash' and run the script manually.
To confirm -- the first part was the run-on.sh script, and the second part was your cloudbuild.yaml right? I assume you trigger the build manually via UI and/or REST API?
I put all the docker commands in a bash script and added the error-handling code below to it.
handle_error() {
  echo "FAILED: line $1, exit code $2"
  exit 1
}
# Run handle_error whenever a command returns a non-zero status
trap 'handle_error $LINENO $?' ERR
It works!
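To see what the trap does, here is a small self-contained sketch, assuming bash. The script body is written to a temporary file and run as a separate process, and `false` stands in for a failing docker build; both are illustration choices, not part of the original setup.

```shell
#!/usr/bin/env bash
# Write a tiny script that uses the same ERR trap, then run it in its own process.
script=$(mktemp)
cat > "$script" <<'EOF'
handle_error() {
  echo "FAILED: line $1, exit code $2"
  exit 1
}
trap 'handle_error $LINENO $?' ERR
echo "step 1"
false                          # stand-in for a failing docker build
echo "step 2 (never reached)"
EOF

status=0
output=$(bash "$script") || status=$?
echo "$output"
echo "exit status: $status"
rm -f "$script"
```

The script stops at the failing command and exits with status 1, which is what the calling build step needs to see in order to mark the build as failed.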
What follows is my buildspec.yml
- 'IMAGE_TAG=$(cat package.json | grep version | head -1 | awk -F: ''{ print $2 }'' | sed ''s/[",]//g'')'
- echo $IMAGE_TAG
- docker build -t $IMAGE_REPO_NAME:$IMAGE_TAG .
Here's the output from the relevant build:
[Container] 2018/12/12 22:06:42 Running command IMAGE_TAG=$(cat package.json | grep version | head -1 | awk -F: '{ print $2 }' | sed 's/[",]//g')
[Container] 2018/12/12 22:06:42 Running command echo $IMAGE_TAG <<< GOOD
1.0.0 <<<< PERFECT
[Container] 2018/12/12 22:06:42 Running command docker build -t $IMAGE_REPO_NAME:$IMAGE_TAG .
invalid argument "gotbot-air:" for t: invalid reference format
See 'docker build --help'. <<<<<< OH NO
[Container] 2018/12/12 22:06:42 Command did not exit successfully docker build -t $IMAGE_REPO_NAME:$IMAGE_TAG . exit status 125
As you can clearly see on the marked lines, somehow the variable $IMAGE_TAG is set correctly to 1.0.0 when echo'd, yet on the very next line of execution in my build script, it seems to have disappeared.
Please note I am using version 0.2 of the specification.
EDIT: It may be important that my other environment variables are either declared at the top in env or are native CodeBuild variables. Could it be that I'm getting a different execution environment when running the docker command?
Check the version of your BuildSpec file. Change it to version: 0.2.
In version 0.1, AWS CodeBuild runs each build command in a separate instance of the default shell in the build environment. In version 0.2, AWS CodeBuild runs all build commands in the same instance of the default shell in the build environment.
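The difference can be reproduced locally. This sketch only imitates the two behaviours with `sh -c`, which is an assumption made for illustration and not how CodeBuild itself is implemented: a variable set in one shell instance is not visible in the next one, while commands sharing a shell instance see it.

```shell
#!/usr/bin/env bash
unset IMAGE_TAG   # make sure nothing is inherited from the environment

# Like version 0.1: each command gets its own shell, so the assignment is lost.
sh -c 'IMAGE_TAG=1.0.0'
separate=$(sh -c 'echo "tag=[$IMAGE_TAG]"')

# Like version 0.2: both commands run in the same shell instance.
same=$(sh -c 'IMAGE_TAG=1.0.0; echo "tag=[$IMAGE_TAG]"')

echo "separate shells: $separate"   # separate shells: tag=[]
echo "same shell:      $same"       # same shell:      tag=[1.0.0]
```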
I have a function within a bash script that is executing the following:
get_repo_master_hash() {
  timeout 60 bash -c "git ls-remote $REPOURL | grep refs/heads/master | cut -f 1"
}
But when this is executed within my script I receive the following:
timeout: can't execute '60': No such file or directory
Why is it executing the duration instead of my command?
This script is being executed within a docker container that is using alpine/git:1.0.4 as the image.
On a Docker alpine:3.6 container I get
$ timeout --help
Usage: timeout [-t SECS] [-s SIG] PROG ARGS
So you should do timeout -t 60 instead.
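If the same script has to run on images with different timeout implementations, one option is to probe the syntax once. This is only a sketch. It assumes the only two variants in play are the BusyBox form shown above (`timeout -t SECS PROG`) and the GNU coreutils form (`timeout DURATION PROG`). `run_with_timeout` is a hypothetical helper name.

```shell
#!/bin/sh
# Probe once: "timeout 1 true" only succeeds where the coreutils-style
# "timeout DURATION PROG" syntax is understood.
if timeout 1 true >/dev/null 2>&1; then
  run_with_timeout() { secs=$1; shift; timeout "$secs" "$@"; }
else
  run_with_timeout() { secs=$1; shift; timeout -t "$secs" "$@"; }
fi

get_repo_master_hash() {
  # The URL is passed as a positional argument instead of being
  # interpolated into the command string.
  run_with_timeout 60 sh -c \
    'git ls-remote "$1" | grep refs/heads/master | cut -f 1' sh "$REPOURL"
}
```

Passing the repository URL as `$1` avoids quoting surprises if the URL ever contains spaces or shell metacharacters.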