How can I speed up the unit tests for CakePHP - performance

I am developing with CakePHP 2.4.3 and use unit tests a lot, at the moment mostly on models.
Is there a way to shorten the time these tests need to run? What makes them so slow? The DB insertions of the fixtures?
I notice that I don't have the patience to wait for the tests to finish; while waiting I start doing other things, and when I come back I've lost track of what problem I was testing.
Thanks for any hints!
CalamityJane

I strongly disagree here with Mark's comment:
Unittests are not supposed to be "speedy"
Technically that's true, but it can become annoying. If you use CI on a large project, testing can become horribly slow. You don't want to wait 30 minutes until all tests are done. We had this case in a project with ~550 tables.
The bottleneck is in fact the fixture loading: for each test, all fixtures have to be created again, over and over. It is slow.
We use an internal plugin to copy a test database template to the test database instead of using fixtures. This dropped the time to run the tests on this project from 30+ minutes down to a few minutes.
An open source plugin that should be capable of doing this as well is https://github.com/lorenzo/cakephp-fixturize. It lets you load fixtures from SQL files or from a template database; see the corresponding section of its readme.md.
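To illustrate the template-database idea outside of any particular plugin, a rough sketch (assuming MySQL and placeholder database names) is to restore a prepared dump into the test database once before the run, instead of inserting fixtures per test:

    # hypothetical one-off copy of a prepared template into CakePHP's test database
    mysqldump -u root test_template | mysql -u root cakephp_test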
If you just have to test a single method, there is no need to run all tests; you can filter them:
cake test <file> --filter testMyMethod

Related

How to efficiently clean up the environment after cucumber E2E tests have run

The problem I am encountering relates to E2E tests which will run all the time for new app builds (maybe even every few hours on CircleCI). I have (and will have many more in the future) features that require a lot of setup (the same setup is necessary for each scenario to run). For example, before a scenario runs (there are many in each feature) it needs some users, content, configuration, etc. After the scenario runs, the best practice is probably to delete/remove all those users, content, etc. (or at least after all the scenarios in the feature have run). I am struggling to understand what the best practice is.
If I add a background, it will run before each scenario, but then I need to remove all the data created by the background (I could add a cleanup function in the last scenario step, but that seems bad; correct me if I am wrong). I could add hooks that clean up after each scenario and keep adding more hooks for new features (maybe use tags on the scenarios to distinguish which ones they should run for).
There are options, but it all feels so inefficient... These tests will be running against a live environment (not integration or unit tests, which are fast, but E2E). Very often the setup/background takes much more time than a single scenario, and then it runs over and over for each tiny scenario. For example, I had to call a bunch of endpoints in the background to create users and some content, and in many cases (where we don't have an endpoint for it yet) I have to write an automated journey through the UI to add something or change specific settings, and then at the end delete everything the same way and change the settings back through the UI to the state before the feature ran. It feels so slow and inefficient...
The only other thing that comes to my mind (but it will probably not work for all cases) is to create a huge hooks script where I add all the necessary "stuff" before the whole suite runs, and after the whole thing has run I clean the whole stack/instance DB (or reset it to some preset DB snapshot) to restore the state from before the suite ran.
Please help me understand what the best practices are in such cases.
Regards
Generally with Cuking the idea is that the database is reset after every scenario. This is done by things like:
running the scenario in a transaction (which is then rolled back)
emptying the database after every scenario
Which you do depends on which flavour of cuke you are using.
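As a rough sketch of the transaction approach, assuming Cucumber-JVM and a hypothetical TestDatabase helper that hands out the same JDBC connection the code under test uses (the rollback trick only works if they really do share one connection):

    // Before/After hooks that wrap every scenario in a transaction and throw its data away
    import io.cucumber.java.After;
    import io.cucumber.java.Before;
    import java.sql.Connection;

    public class DatabaseHooks {
        private Connection connection;

        @Before
        public void startTransaction() throws Exception {
            connection = TestDatabase.open();   // hypothetical helper returning the shared JDBC connection
            connection.setAutoCommit(false);    // everything the scenario writes stays uncommitted
        }

        @After
        public void rollBack() throws Exception {
            connection.rollback();              // discard the scenario's data
            connection.close();
        }
    }

The alternative, emptying the database, is usually a truncate of every table in an After hook; it is slower but works regardless of how many connections the application opens.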
The inefficiencies you talk about can be mitigated in a number of ways without compromising the idea that the database should be reset after every scenario. Basically you can think of Cukes as setting up state (Givens), doing something (Whens) and validating (Thens). Only Whens have to use the UI.
So with Givens you can set up state by either
writing directly to the database using factories or fixtures
calling services (the same ones your UI controllers use) to create things
The second one is much preferred.
With most of the work being done outside the UI, you can get fast cukes even when the setup is complex.
This really is the way to go when cuking. Set up everything in Givens (using Background when appropriate) without using the UI, then log in, then do your When using the UI, and validate the result in the UI with your Thens.
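A minimal sketch of such a Given, again assuming Cucumber-JVM and an invented UserService (the idea being that it is the same service your controllers call):

    import io.cucumber.java.en.Given;

    public class SetupSteps {
        private final UserService users = new UserService();   // hypothetical service, shared with the UI layer

        @Given("a registered user named {string}")
        public void aRegisteredUserNamed(String name) {
            users.register(name, "password123");   // creates the state directly, no browser involved
        }
    }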
Using this approach my current project has approx 450 scenarios that run in about 5 mins on my Mac mini, and that includes
several scenarios that step through UI wizards using a browser (super slow)
many scenarios with complex setup of multiple entities
This idea that you have to work around standard practices to make your suite efficient and fast is common, and almost always wrong.
You can actually go way faster than I am going (though it takes quite a bit of work)

Clean up database state in a beforeEach?

In "Using after or afterEach hooks", it is recommended to clean up server/DB state in beforeEach or before. I understand the rationale, but I believe the text lacks a real use case. Here is a use case that I don't know how to solve while following that best practice.
Imagine I'm testing my own clone of GitHub. To have a clean environment for my tests, I want Cypress to use a clean temporary user and a clean temporary repository. To avoid conflicts between multiple Cypress instances targeting the same server (e.g., multiple front-end developers testing their changes in parallel), there should be one user and one repository dedicated to each Cypress instance. This can be implemented by generating users and repositories with well-known random ids (e.g., temp-user-13432481 and temp-repo-134234). Cleaning up the mess in the database is then just a matter of removing everything matching temp-*.
The problem is when to clean up. If the cleanup is done in a beforeEach() as recommended, running a test in one Cypress instance will delete the data of other Cypress instances running in parallel.
Is there an obvious solution that I'm missing? How do people usually cleanup temporary testing data in a database?
The obvious answer would be to not run tests in a distributed manner against a single remote server (and instead run the DB server locally on each client), but since this is not an answer to your question, here are a few ideas:
Set up a cron job that will clean up old test repos/users at the end of each day.
If you only clean up users/repos that are older than e.g. several hours, it will avoid cleaning up resources that may still be used by running tests.
You must ensure that the ids are random and large enough (i.e. have enough entropy) that you won't run into collisions even if you don't clean them up for a while.
Make each client (i.e. the PC running the tests) use a fingerprint that you'll use to namespace the repo/user in the DB, and clean them up before each test run.
This way, each client will only clean up their own resources.
I'm leaning towards solution (1).
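If you go with (1), the cron job itself can stay tiny; the database, table and column names below are made up for illustration:

    # every night at 03:00, remove temporary test data that is more than a day old
    0 3 * * * mysql ci_database -e "DELETE FROM repos WHERE name LIKE 'temp-repo-%' AND created_at < NOW() - INTERVAL 1 DAY"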

Running surefire concurrently with shared resources between tests

I have a project with many integration tests, and I'm trying to reduce the test execution time.
The tests are all JUnit tests that use a DB connection.
Currently all tests run one by one using the maven-surefire-plugin, with a fork for each test in order to handle cache issues (the caches are not the issue here).
All tests use an app that persists to the same DB schema. This poses a challenge when trying to parallelise the process.
I found a nice blog that explain a bit about concurrency in surefire http://incodewetrustinc.blogspot.com/2010/01/run-your-junit-tests-concurrently-with.html
but I still have a problem implementing this solution since I have a shared resource.
My idea was to create multiple schemas and share them between threads/processes. How can I assign each test a separate connection and avoid collisions?
I would love to hear some ideas.
Thanks,
Ika.
Use ${surefire.forkNumber} as part of your DB connection ID. Then each thread running tests will use a separate connection.
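As a sketch of how that can be wired up - the db.schema property name, the schema naming and the JDBC details are placeholders, not anything surefire mandates:

    <!-- pom.xml: give each forked JVM its own schema name via a system property -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <configuration>
        <forkCount>4</forkCount>
        <reuseForks>true</reuseForks>
        <systemPropertyVariables>
          <db.schema>test_schema_${surefire.forkNumber}</db.schema>
        </systemPropertyVariables>
      </configuration>
    </plugin>

    // In the tests' connection setup, pick the schema assigned to this forked JVM
    String schema = System.getProperty("db.schema");                   // e.g. "test_schema_3"
    Connection connection = DriverManager.getConnection(
            "jdbc:mysql://localhost/" + schema, "user", "password");   // placeholder URL and credentials

Each fork then reads and writes only its own schema, so the tests no longer collide.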

TDD Scenario: Looking for advice

I'm currently in an environment where we are parsing data off the client's website. I want to use my tests so that when the client changes their site, I know we are no longer receiving the information.
My first approach was to do pure integration tests where my tests hit the client's site and assert that the data was found. However, halfway through and 500 tests in, the test run has become unbearable and in some cases started timing out. So I cleared out as many tests as I could without losing the core protection they provide, and I'm down to 350 or so. I'm left with a fear of adding more tests in case they break the whole run. I also find myself no longer running the 5+ minute suite (some clients will take longer, as this depends on the speed of communication with their site) when I make changes. I consider this a complete failure.
I've been putting a lot of thought into this and asking around the office. My plan for my next attempt is to pull down the client's pages and write tests against these pages as embedded resources in my projects. This would give me higher test coverage and allow me to go back to testing in isolation. However, I would need to be notified when they make changes and then re-pull the pages to test against. I don't think the clients will adhere to this.
A suggestion was made to me to augment this with a suite of 'random' integration tests that serve the same function as my failed tests (hit the client's site) but in far smaller numbers than before. I really don't like the idea of random testing, where the same code can sometimes give red lights and sometimes green. But so far this sounds like the best idea I've heard for still staying aware of when the client's site has changed and my code no longer finds the data.
Has anyone found themselves testing an environment like this? Any suggestions from the testing community for me?
When you say the big test run has become unbearable, it suggests that you are running this test suite manually. You shouldn't have to. It should just be running constantly in the background, at whatever speed it takes to complete the suite - and then start over again (perhaps after a delay if there are associated costs). Only when something goes wrong should you get an alert.
If there is something about your tests that causes them to get slower as their number grows - find it and fix it. Tests should be independent of one another, so simply having more of them shouldn't cause individual tests to time out.
My recommendation would be to isolate, as much as possible, the part of the code that deals with the uncertainty. This part should be an API that works as a service used by all the other code. This way you would be protecting most of your code against changes.
The stable parts of the code should be unit-tested. With that part independent from the connection to the client's site, running the tests should be much quicker, and it would also make those tests more reliable.
The part that has to deal with changes on the client's websites is thereby reduced. You are not solving the problem, but at least you're minimising it and centralising it in a single module of your code.
Suggesting that the clients expose the data as a web service would be best for you. But I guess that doesn't depend on you :P.
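To make the first suggestion a bit more concrete, a minimal sketch with invented names: hide the scraping behind one small interface, unit-test everything else against a stub, and keep only a handful of slow integration tests for the real implementation:

    // The only part of the codebase that knows how to read the client's site.
    public interface ClientSiteGateway {
        CustomerData fetchCustomerData(String accountId);
    }

    // Used in unit tests so the rest of the application never touches the network.
    class StubClientSiteGateway implements ClientSiteGateway {
        @Override
        public CustomerData fetchCustomerData(String accountId) {
            return CustomerData.sample();   // canned data; CustomerData is an invented type
        }
    }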
You should look at dividing your tests up, maybe into separate assemblies that can be run independently. I typically have a unit test assembly and a slower-running integration test assembly.
My unit tests assembly is very fast (because the code is tested in isolation using mocks) and gets run very frequently as I develop. The integration tests are slower and I only run them when I finish a feature / check in or if I have a bad feeling about breaking something.
Maybe you could do something similar or even take the idea further and have 3 test suites with the third containing even slower client UI polling tests.
If you don't have a continuous integration server / process you should look at setting one up. This would continuously build your software and execute the tests. It could be set up to monitor check-ins and work in the background, sending out a notification if anything fails. With this in place you wouldn't care how long your client UI polling tests take because you wouldn't ever have to run them yourself.
Definitely split the tests out - separate unit tests from integration tests as a minimum.
As Martyn said, get a Continuous Integration system in place. I use Teamcity, which is excellent, easy to use, free for the first 20 builds, and you can happily run it on your own machine if you don't have a server at your disposal - http://www.jetbrains.com/teamcity/
Set up one build to run on every check in, and make that build run your unit tests, or fast-running tests if you will.
Set up a second build to run at midnight every night (or some other convenient time), and include in this the longer running client-calling integration tests. With this in place, it won't matter how long the tests take, and you'll get a big red flag first thing in the morning if your client has broken your stuff. You can also run these manually on demand, if you suspect there might be a problem.

How to make a TeamCity build fail (timeout) if it takes too long?

How do we put a timeout on a TeamCity build?
We have a TeamCity build which runs some integration tests. These tests read/write data to a database and sometimes this is very slow (why it is slow is another open question).
We currently have timeouts in our integration tests to check that e.g. the data has been written within 30 seconds, but these tests are randomly failing during periods of heavy use.
If we removed the timeouts from the tests, we would want to fail the build only if the entire run took more than some much larger timeout.
But I can't see how to do that.
On the first page of the build configuration settings you will find the execution timeout field highlighted in my screenshot - use that.
In TeamCity v9 and v10 you should find it under "Failure Conditions" ("Fail build if it runs longer than ... minutes").
