Gradle: Clean up resources after build failure

I execute a test suite through Gradle for the build, and it spins up a lot of processes on different ports. Also, failFast is set to true for my test task. So the following happens when I execute my suite:
1. The suite starts up and spins up processes/servers listening on different ports
2. The tests in the suite are executed
3. When one or more tests fail, the suite execution is halted and the build is marked as failed
Now, when the failing tests are fixed and the build is run again, step 1 (described above) fails with the message that the port is already in use. Also, I am using the forkEvery parameter, meaning the previous run may have left more than one JVM running.
Is there any way to clean everything up (in terms of processes, not physical files) when a build fails in Gradle?

You can add a custom TestListener that stops the processes/servers from step (1).
You can reference Spring Boot's FailureRecordingTestListener: https://github.com/spring-projects/spring-boot/blob/master/buildSrc/src/main/java/org/springframework/boot/build/testing/TestFailuresPlugin.java#L57..L95
The basic idea is that in the afterSuite method you stop whatever processes were started in step (1). Note that within the TestListener you don't have access to the test instances where those processes were started, so you'll need to figure out how to stop them without a reference to the original class that created them.
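A minimal sketch of such a listener in build.gradle, assuming the servers can be torn down externally (the stopAllTestServers.sh script is hypothetical):

test {
    failFast = true
    addTestListener(new TestListener() {
        void beforeSuite(TestDescriptor suite) {}
        void beforeTest(TestDescriptor testDescriptor) {}
        void afterTest(TestDescriptor testDescriptor, TestResult result) {}
        void afterSuite(TestDescriptor suite, TestResult result) {
            // The root suite has no parent; act once, and only on failure.
            if (suite.parent == null && result.resultType == TestResult.ResultType.FAILURE) {
                project.exec { commandLine './scripts/stopAllTestServers.sh' }
            }
        }
    })
}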

Related

Hibernate: DB not reliably rolled back at end of unit tests in spite of @Transactional

We have a large application using Spring for application setup, initialisation and "wiring", and Hibernate as the persistence framework. For that application we have a couple of unit tests which are causing us headaches because they run "red" again and again when executed on our Jenkins build server.
These unit tests execute and verify some rather complex and lengthy core operations of our application, so we considered it too complex and too much effort to mock the DB. Instead, these tests run against a real DB. Before the tests are executed, we create the required objects (the "pre-conditions"). Then we run a test and verify the creation of certain objects, their status and values, etc. All plain vanilla...
Since we run multiple tests in sequence which all need the same starting point, these tests derive from a common parent class that has an @Transactional annotation. The purpose of that is that the DB is always rolled back after each unit test so that the subsequent test can start from the same baseline.
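A minimal sketch of that setup (class and context names are hypothetical; with the Spring TestContext framework, a class-level @Transactional defaults to rolling back after each test method):

import org.junit.runner.RunWith;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringRunner;
import org.springframework.transaction.annotation.Transactional;

// Common parent for all DB-dependent tests: each @Test method in a
// subclass runs inside a transaction that is rolled back afterwards.
@RunWith(SpringRunner.class)
@ContextConfiguration("classpath:test-context.xml")
@Transactional
public abstract class AbstractTransactionalDbTest {
}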
That approach works perfectly and reliably when executing the unit tests "locally" (i.e. running "mvn verify" on a developer's workstation). However, when we execute the very same tests on our Jenkins, then - not always but very often - these tests fail because too many objects are found after a test, or due to constraint violations because certain objects already exist that shouldn't yet be there.
As we found out by adding lots of log statements (because it's otherwise impossible to observe code running on Jenkins), the reason for these failures is that the DB is occasionally not properly rolled back after a prior test. Thus there are leftovers from the previous test(s) in the DB, and these then cause issues during subsequent tests.
What's puzzling us most is:
Why are these tests failing ONLY when we execute them on Jenkins, but never when we run the very same tests locally? We are using an absolutely identical Maven command line and code here, and also the same Java version, Maven version, etc.
We are by now sure that this has nothing to do with the unit tests being executed in parallel, as we initially suspected. We disabled all options that the Maven Surefire plugin offers to run unit tests in parallel. Our log statements also clearly show that the tests are perfectly serialized, but objects keep "piling up": after each test method, objects that were supposed to have been removed/rolled back at the end of the test are still there, and their number increases with each test.
We also observed a certain "randomness" to this effect. Often the Jenkins builds run fine for several commits and then suddenly (even without any code change, just by retriggering a new build of the same branch) start to run red. The DB, however, is re-initialized before each build & test run, so that cannot be the source of this effect.
Does anyone have any idea what could cause this? Why do the DB rollbacks that are supposed to be triggered by the @org.springframework.transaction.annotation.Transactional annotation work reliably on our laptops but not on our build server? Does anyone have similar experiences and findings?

How to retry only failed tests in the CI job run on Gitlab?

Our automation tests run in a GitLab CI environment. We have a regression suite of around 80 tests.
If a test fails due to some intermittent issue, the CI job fails, and since the next stage depends on the Regression one, the pipeline gets blocked.
We retry the job to rerun the regression suite, expecting it to pass this time, but then some other test fails.
So, my question is:
Is there any capability by which, on retrying the failed CI job, only the failed tests run (not the whole suite)?
You can use the retry keyword when you specify the parameters for a job, to define how many times the job can be automatically retried: https://docs.gitlab.com/ee/ci/yaml/#configuration-parameters
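For example (a minimal sketch; the job name and script are placeholders). Note that retry reruns the entire job, not just the failed tests:

regression:
  stage: test
  script:
    - ./run_regression_suite.sh
  retry: 2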
[Retry Only Failed Scenarios]
Yes, but it depends. Let me explain. I'll outline the pseudo-steps that can be performed to retry only failed scenarios. The steps are specific to pytest, but can be adapted to other test runners.
1. Execute the test scenarios with --last-failed. On the first run, all 80 scenarios will be executed.
2. The test runner creates a metadata file containing the list of failed tests. For example, pytest creates a .pytest_cache folder containing a lastfailed file with the list of failed scenarios.
3. Add the .pytest_cache folder to the GitLab cache with key=<gitlab-pipeline-id>.
4. The user sees that there are 5 failures and reruns the failed job.
5. The retried job will find the .pytest_cache folder in the GitLab cache and copy it into the test-running directory. (This step shouldn't fail if the cache doesn't exist, to handle the first execution.)
6. Execute the same test command with the same --last-failed parameter, so that only the tests which failed earlier run.
7. In the rerun, only the 5 failed test cases will be executed.
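A sketch of how this could look in .gitlab-ci.yml (the job name and test path are placeholders; the cache key scopes the pytest metadata to one pipeline so that retries of the same pipeline share it):

regression:
  stage: test
  cache:
    key: "$CI_PIPELINE_ID"
    paths:
      - .pytest_cache/
  script:
    # --last-failed runs only the tests recorded as failed in .pytest_cache;
    # on the first run (no cache yet) all tests are executed.
    - pytest --last-failed tests/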
Assumptions:
The test runner you are using creates a metadata file of failed tests, like pytest does.
POC Required:
I have not done a POC for this, but in theory it looks possible. The only doubt I have is how GitLab parses the results: ideally, the final result should show all 80 scenarios as passed. If it doesn't work out this way, then we need two jobs (execute tests -> [manual] execute failed tests) to get two parsed results. I am sure it will definitely work with two stages.
You can use a retry analyzer (for example, TestNG's IRetryAnalyzer), which automatically reruns failed tests within the same run. This will definitely help you.

Integration test execution should wait until server is ready

I have written Selenium tests which should be executed during the build process of a web application. I am using the maven-failsafe-plugin to execute the integration tests and the tomcat7-maven-plugin to start up a Tomcat server in the pre-integration-test phase; after the execution of the tests it is stopped in the post-integration-test phase. This works fine.
The problem is that the Tomcat server caches some data on startup to improve search speed. Some of my tests rely on that data, so the integration tests should wait until the server has finished caching it.
How can I make that happen?
I added a progress bar to show the loading progress. Once loading is complete, the progress bar is no longer rendered and the data table is rendered instead. This way I can add this line of code to the tests which depend on the data table being loaded:
longWait.until(ExpectedConditions.presenceOfElementLocated(By.id("dataTablePanel")));
Additionally, I am using org.junit.runners.Suite as a runner so that I can specify the order in which my test classes are executed. Thereby I can run the tests which do not rely on the data first, and then the ones which need it. To ensure that the data is present without checking it in every test case, I have created a test class which only checks for the presence of the data and is executed before all test cases which depend on it.
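A minimal sketch of such an ordered suite (the test class names are hypothetical):

import org.junit.runner.RunWith;
import org.junit.runners.Suite;

@RunWith(Suite.class)
@Suite.SuiteClasses({
    IndependentTests.class,   // tests that do not need the cached data
    DataPresenceTest.class,   // waits for "dataTablePanel", so the cache is known to be ready
    DataDependentTests.class  // tests that rely on the cached data
})
public class IntegrationTestSuite {
}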

Pausing TeamCity builds that are running

I would like a TeamCity build configuration with the following build steps:
1. Build an artifact to perform tests on & install it on a remote server
2. Kick off a long-running test job on the remote server
3. Pause the build awaiting an external event (i.e. the remote job finishing)
4. Retrieve the results and record the report
I have had a look through the documentation and I can see how to pause the entire build configuration (which stops any additional builds from running), but not how to pause just a single running build (step 3).
The Step 2 script that is running the external job has the various parameters passed to it, so that it can issue a REST call back to the TeamCity server to resume the build job.
Basically, I don't want to tie up a build agent waiting for the entire hour the test takes to run.
I have googled and everything I can find points me at pausing the build configuration.
I am currently looking at splitting the build configuration into two. The first will kick off the test job and finish. Then, when the external test job finishes, it will call TeamCity to start a second job that retrieves and stores the reports. But that feels disconnected to me, in that I will not be able to show a single job with build/test/report.
At the moment (TeamCity v2018.1) there is no direct way to pause a build, release the build agent, and later resume the execution.
What you described is the recommended workaround.
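For the hand-off between the two configurations, the remote job can queue the second build via TeamCity's REST API once the tests finish; a sketch (server URL, credentials, and build configuration ID are placeholders):

curl -u user:password \
     -H "Content-Type: application/xml" \
     -d '<build><buildType id="MyProject_RetrieveReports"/></build>' \
     https://teamcity.example.com/app/rest/buildQueue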
Also, please watch/vote for related issue: https://youtrack.jetbrains.com/issue/TW-30777

What is systemd equivalent of upstart "on stopping" and "on started"

I am translating my upstart scripts to systemd scripts, and I am wondering what the best practice is for translating the following:
start on started postgresql
stop on stopping postgresql
Is the Requires= directive the right one for me, or is there a better one?
start on and start on started, according to the upstart documentation:
6.33 start on
This stanza defines the set of Events that will cause the Job to be automatically started.
...
6.33.2 Start depends on another service
start on started other-service
So in your case, start on started postgresql means your job needs to start after postgresql has successfully started, because it depends on it.
In systemd that would be:
[Unit]
Requires=postgresql.service
After=postgresql.service
Because according to the systemd.unit man page:
After=,Before= ... If a unit foo.service contains a setting Before=bar.service and both units are being started, bar.service's start-up is delayed until foo.service is started up. [...] After= is the inverse of Before=, i.e. while After= ensures that the configured unit is started after the listed unit finished starting up, Before= ensures the opposite, that the configured unit is fully started up before the listed unit is started.
...
Requires= Configures requirement dependencies on other units. If this unit gets activated, the units listed here will be activated as well. If one of the other units fails to activate, and an ordering dependency After= on the failing unit is set, this unit will not be started. [...] If a unit foo.service requires a unit bar.service as configured with Requires= and no ordering is configured with After= or Before=, then both units will be started simultaneously and without any delay between them if foo.service is activated.
As for stop on and stop on stopped according to upstart:
6.34 stop on
This stanza defines the set of Events that will cause the Job to be automatically stopped if it is already running.
...
6.34.3 Stop after dependent service
stop on stopped other-service
The After=postgresql.service mentioned above has you covered because, again, according to the systemd.unit man page:
After=,Before= [...] Note that when two units with an ordering dependency between them are shut down, the inverse of the start-up order is applied. i.e. if a unit is configured with After= on another unit, the former is stopped before the latter if both are shut down.
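Putting both parts together, a minimal sketch of the translated unit (the service name and ExecStart path are placeholders):

[Unit]
Description=My service, started and stopped together with PostgreSQL
Requires=postgresql.service
After=postgresql.service

[Service]
ExecStart=/usr/local/bin/myservice

[Install]
WantedBy=multi-user.target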
