Test strategy for non-functional test cases in continuous integration [closed]

In large-system development, the non-functional requirements are frequently the most important, and implementing them takes the majority of the development time. The non-functional tests are expensive and often take a long time to run. They frequently cannot be run in the normal continuous-integration cycle because they take too long to execute; a stability test might run for two weeks.
Can anyone suggest a good test strategy for the manual execution of non-functional tests within a continuous integration process in which an automated build is produced every 2 hours?

Some lengthy tests can (and if so, should) be split into several shorter tests which can be executed in parallel.
In some cases it may be preferable to spend some money to increase the number of testbeds, and thus the overall test bandwidth/capacity. That allows multiple test runs to overlap each other, reducing or even eliminating the impact of a long test duration, and you can still use them in (some) CI systems: no one says that pipelines which start every 2 hours also need to complete within 2 hours. They can continue and overlap (staggered) as long as the resource capacity allows it; at the very least, a decent CI system should support such overlapping.
Alternatively, the CI system can be configured to run the longer tasks selectively, depending on capacity: do the typical checks for every pipeline (2 hours apart), but run a test with a capacity of one execution per day only once every 12 pipelines, or whenever resources for the long test are available. Ideally, pick a pipeline which has already passed the shorter verifications, giving higher chances of passing the longer test and more meaningful results. This can even be done "manually", by firing the long tests with artifacts from a subset of the CI executions.
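As a rough illustration (in Python, since the question names no particular CI product), such a gate might look like the sketch below. BUILD_NUMBER, the 12-pipeline cadence and the testbed check are assumptions, not features of any specific CI system.

import os

LONG_TEST_EVERY_N_PIPELINES = 12  # roughly once per day when builds run every 2 hours

def should_run_long_suite(build_number: int, testbed_free: bool) -> bool:
    # Run the long suite only on every Nth pipeline, and only if a testbed is free.
    return testbed_free and build_number % LONG_TEST_EVERY_N_PIPELINES == 0

if __name__ == "__main__":
    build_number = int(os.environ.get("BUILD_NUMBER", "0"))  # hypothetical CI-provided counter
    if should_run_long_suite(build_number, testbed_free=True):
        print("Triggering the long-running stability suite for this build")
    else:
        print("Skipping the long-running suite; running only the standard checks")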
In some cases the long duration is a side effect of limitations in the testing infrastructure or in the test code itself, for example an inability to execute tasks in parallel even when that would not fundamentally affect the test. In such cases, switching to a more appropriate infrastructure or, respectively, rewriting the tests to allow or improve parallelism can shorten the test duration, sometimes significantly.
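For instance, if the individual suites are truly independent, a driver as simple as the following Python sketch can run them concurrently; the suite commands are placeholders and assume no shared testbeds or mutable state.

import subprocess
from concurrent.futures import ThreadPoolExecutor

SUITES = [                       # placeholder commands for independent suites
    ["pytest", "tests/load"],
    ["pytest", "tests/endurance_part1"],
    ["pytest", "tests/endurance_part2"],
]

def run_suite(cmd):
    # Run one suite as a child process and report its exit code.
    return subprocess.run(cmd).returncode

with ThreadPoolExecutor(max_workers=len(SUITES)) as pool:
    results = list(pool.map(run_suite, SUITES))

ok = all(rc == 0 for rc in results)
print("All suites passed" if ok else "At least one suite failed")
raise SystemExit(0 if ok else 1)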

First of all, congratulations on recognising the importance of non-functional requirements; this is still uncommon knowledge!
You mentioned running tests for 2 weeks - that seems far too long to me. Continuous integration is about an immediate feedback loop. If any test takes that long, you may be notified of a serious problem only 2 weeks after it was introduced. I'd think twice about whether it really has to be that way.
Manual execution of non-functional tests in continuous integration should be avoided as much as possible. Tests should run automatically straight after deployment. If for some reason certain tests can't run in this fashion (e.g. because they take too long to execute), they should be triggered periodically - automatically, of course.
There are a couple of options to speed up NFT execution time:
Scale down the tests - e.g. instead of 1000 threads with a ramp-up of x, run 100 threads with a ramp-up of x/10. If you scale all the necessary parameters proportionally, you may get accurate feedback much earlier (see the small arithmetic sketch after these options).
Parallelise NFT execution across a number of test environments once the functional tests have passed. If you use a platform like Amazon, this should be perfectly possible. And if you pay for the time a machine is up, this doesn't have to raise the cost significantly - the overall test execution time may be similar.
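To make the scaling option above concrete, here is a tiny arithmetic sketch (Python, with invented numbers): dividing both the thread count and the ramp-up time by the same factor keeps the ramp rate, in threads started per second, unchanged.

def scale_load(threads: int, ramp_up_s: float, factor: int = 10):
    # Divide the load by `factor` but keep the ramp rate (threads/second) constant.
    return threads // factor, ramp_up_s / factor

print(scale_load(1000, 300))  # -> (100, 30.0): same ~3.3 threads/s ramp rate as 1000 over 300s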

Related

What is proper practice for performance rules testing?

I know that what we're doing is incorrect/strange practice.
We have an object that is constructed in many places in the app, and lags in its construction can severely impact our performance.
We want a gate to stop check-ins which affect this construction's performance too adversely...
So what we did was create a unit test which is basically the following:
Dim myStopwatch = Stopwatch.StartNew()   ' System.Diagnostics.Stopwatch; StartNew is a shared factory method
Dim newMyObject = New MyObject()
myStopwatch.Stop()
Assert.IsTrue(myStopwatch.ElapsedMilliseconds < 100)
Or: Fail if construction takes longer than 100ms
This "works" in the sense that check-ins will not commit if they impact this performance too negatively... However it's inherently a bad unit test because it can fail intermittently... If, for example, our build-server happens to be slow for whatever reason.
In response to some of the answers; we explicitly want our gates to reject check-ins that impact this performance, we don't want to check logs or watch for trends in data.
What is the correct way to meter performance in our check-in gate?
To avoid the machine dependence, you could first time the construction of a "reference object" which has a known acceptable construction time. Then compare the time to construct your object to the reference object's time.
This may help prevent false failures on an overloaded server, since the reference code will also be slower. I'd also run the test several times and only require X% of them to pass (since there are many external events which can slow code down, but none that will speed it up).
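A minimal sketch of that idea, in Python purely for illustration (the reference and candidate factories, the sample count and the thresholds are all placeholders): time a reference construction with a known cost, compare the ratio rather than an absolute millisecond budget, and require only a fraction of the samples to pass.

import time

SAMPLES = 20
REQUIRED_PASSES = 16        # only X% (here 80%) of the samples must pass
MAX_RELATIVE_COST = 5.0     # the object may cost at most 5x the reference construction

def construction_time(factory) -> float:
    # Time a single construction using a monotonic, high-resolution clock.
    start = time.perf_counter()
    factory()
    return time.perf_counter() - start

def test_construction_not_too_slow(reference_factory, factory):
    passes = 0
    for _ in range(SAMPLES):
        reference = construction_time(reference_factory)
        candidate = construction_time(factory)
        if candidate <= MAX_RELATIVE_COST * max(reference, 1e-9):
            passes += 1
    assert passes >= REQUIRED_PASSES, f"only {passes}/{SAMPLES} samples were within budget"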
First I would ask: can't you allow some of that logic to run lazily rather than executing all of it in the constructor/initialization? Or can you partition the object? A useful metric for this is LCOM4.
Secondly, can you cache those instances? In a previous project we had a similar situation, and we decided to cache the object for a few minutes. This brought some other smaller issues, but the performance of the app skyrocketed.
And last, I do think it's a good approach, but I would take an average rather than just one sample (the OS might decide to run something else at just that moment, and the construction might take more than 100 ms).
Also, one issue with this approach is that if you upgrade your hardware and forget to update the threshold, you might keep adding logic without realizing it, because the faster machine hides the cost.
I think a better approach, though a bit more tricky to implement, is to store how long it takes to run N iterations and fail the build if that value increases by more than X%. The benefit is that, since you store how long it takes, you can generate a graph from it and see the trend.
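A rough sketch of that trend-based gate, in Python for illustration (the baseline file name, iteration count and 20% threshold are assumptions): time N constructions, compare against a stored baseline, fail if the run regressed, and keep the measurement so a trend graph can be produced later.

import json, os, time

BASELINE_FILE = "construction_baseline.json"   # hypothetical file kept by the build
ITERATIONS = 100
ALLOWED_REGRESSION = 0.20                      # fail if more than 20% slower than the baseline

def measure(factory) -> float:
    # Time ITERATIONS constructions in one go to smooth out scheduling noise.
    start = time.perf_counter()
    for _ in range(ITERATIONS):
        factory()
    return time.perf_counter() - start

def check_against_baseline(factory):
    elapsed = measure(factory)
    if os.path.exists(BASELINE_FILE):
        with open(BASELINE_FILE) as f:
            baseline = json.load(f)["seconds"]
        assert elapsed <= baseline * (1 + ALLOWED_REGRESSION), (
            f"{ITERATIONS} constructions took {elapsed:.3f}s, baseline is {baseline:.3f}s")
    with open(BASELINE_FILE, "w") as f:
        json.dump({"seconds": elapsed}, f)     # record the run so a trend can be graphed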
I don't think you should do this in a way that blocks check-ins, because it is too much work to do during the check-in process. Check-ins need to be fast, because your developers can do nothing else while they run.
This unit test would have to compile and run while the developer sits and waits for it. As you pointed out, one iteration of the test is not good enough to produce consistent results. How many times would it need to be run to be reliable? 10? A run of 10 iterations would increase the check-in time by up to 1 second and still isn't reliable enough, in my opinion. If you increased that to 100 iterations you'd get a better result, but that adds 10 seconds to the check-in time.
Also, what happens if two developers check in code at the same time? Does the second one have to wait for the first test to complete before theirs starts, or would the tests be run simultaneously? The first scenario is bad because the second developer has to wait twice as long; the second is bad because you'd be likely to fail both tests.
I think a better option would be to have the unit test run after the check-in has completed and, if it fails, have it notify somebody. You could have the test run after each check-in, but that still has the potential for two people to check in at the same time. I think it would be better to run the test every N minutes; that way you'd still be able to track a regression down fairly quickly.
You could make it block check-ins, but then you'd have to make sure it only runs when that object (or a dependency of it) changes, so that it doesn't slow down every commit. You'd also have to make sure the test isn't run more than once at a time.
As for the specific test, I don't think you can get away with anything other than running it through a number of iterations to get a more accurate result. I wouldn't rely on anything less than a 5- to 10-second test (so 50 to 100 iterations).

Estimation of development work - ratio between allotted time for development and bug fixes [closed]

I now have a good process for estimating development work for projects: I think about how much time it would take me to do it in the worst-case scenario, and then I double that number. For each of my team members I have a different ratio (higher or even lower), but the idea is the same.
My problem is the fixes phase. It is very hard to tell up front how much time to reserve for issue resolution, as it depends on many parameters (complexity of the project, staff skill level, management and design quality, QA quality, etc.).
I have yet to decide on a percentage of the project's pure development estimate that I should always add for the fixes (just the fixes phase until "go live" / "production" / "next release", etc.).
Is there a methodology that defines an actual golden-ratio number? Does anyone have one?
20%? 50%?
Test-driven development reduces this pain. At the cost of the time it takes to write a test, you instantly detect regressions (if you actually run your tests).
As you say, there are many variables. For me, one commonality is to look at the lines added vs. the lines deleted: when a commit adds and removes about the same number of lines, it is usually a bug fix.
Use your SCM to track how many commits / weeks / lines this accounts for.
NOTE: your deleters might be doing more good than your adders in some cases (as long as they don't introduce bugs).
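As one way of mining the SCM for this, the Python sketch below uses `git log --numstat` to total added and deleted lines per commit and flags commits where the two are roughly balanced; the 0.8-1.25 band is an arbitrary choice, not a standard.

import subprocess
from collections import defaultdict

# Ask git for per-file added/deleted counts, with a recognisable header per commit.
log = subprocess.run(
    ["git", "log", "--numstat", "--pretty=format:commit %H"],
    capture_output=True, text=True, check=True).stdout

added, deleted = defaultdict(int), defaultdict(int)
current = None
for line in log.splitlines():
    if line.startswith("commit "):
        current = line.split()[1]
    elif line.strip():
        a, d, _path = line.split("\t", 2)
        if a.isdigit() and d.isdigit():          # binary files report "-"
            added[current] += int(a)
            deleted[current] += int(d)

for sha in added:
    ratio = deleted[sha] / max(added[sha], 1)
    if 0.8 <= ratio <= 1.25:                     # adds roughly equal deletes -> likely a fix
        print(f"{sha[:10]}  +{added[sha]}  -{deleted[sha]}  likely bug fix")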
On a traditional waterfall style project, we found a good rule of thumb was 20/20/20/40 - 20 HLD, 20 DD, 20 CCUT, 40 integration and test. I've always found that to be useful in that it works both for initial estimates and for a checkpoint when you are part way into the cycle.
For ongoing post-delivery maintenance, I don't have as good a ratio. Most projects I know don't even try; they just budget some number of support hours and figure some will be bug fixes and some will be user hand-holding.
Addition - realized I ought to clarify my acronyms:
HLD = High Level Design
DD = Detailed Design
CCUT = Code, Compile, Unit Test
I'm pulling from traditional waterfall concepts here as that's where I've had access to the most metrics. So these assume that you'll (more or less) have to do HLD before DD, DD before CCUT and so forth. Practical experience shows they can blend more than a little, but if you have HLD and CCUT happening at the same time, you have some real risks afoot.
As you say, bug fixing depends a lot on code complexity. Automated tools like ProjectCodeMeter calculate this by analyzing your source code; it usually gives me between 30% and 60% for debugging + testing, depending on the code.

Purpose of automation testing - Feasibility [closed]

What is the purpose of automation testing?
As I see it, the main purposes are:
It is fast
It removes repetitive manual work
Here is my main query: if, after automation, it only reduces the repetitive manual work but takes almost the same time as before, is automation feasible in this case? After all, making the testing automated takes the tester some time to set up.
So if one resource spends 15 working days creating the automation testing framework, and later finds that the automated tests only reduce his repetitive work but not the time required, what does the organisation gain from this automation framework, given that the resource remains dedicated to the part he has automated?
The profit is long-term:
short term, it takes time to create the tests
short / middle term, you gain some time running them; but that is balanced by the time it took to write them
long / very-long term, you can run the tests over and over again; each day, you gain some more time ;-)
You also have the advantage of having reproducible tests - it's easier to get the same results each time, and to compare two builds to see if/what went wrong...
Also, once your tests are complete, lots of things are tested each time they are run - on the other hand, would a human being do the same tests over and over again each day? Would you?
Considering that too many developers don't even fully test their application once... I bet no one will test their application every day / each time a modification is made.
As for feasibility: well, last year I spent something like 20 days writing automated tests; those are still run twice a day, every day, and still sometimes identify regressions in parts of the application that developers rarely touch and no one would test manually, or in parts that are so hard to get to (many screens with long forms and a complex process) that no one ever tests them manually either...
It took time, yes; but it definitely was a great investment!
Building of escalators and elevators takes a great deal of time and money. They also require maintenance.
But people using them get the convenience of quickly reaching the floor they need - and they can still walk, too.
As you can see from this analogy, Test Automation is clearly not the same as Automated Testing.
But once it's implemented, testers can use it to get test results automatically. That saves time and helps to extend the coverage.
You also don't really need elevators in a small house with 2-3 storeys. For a 5-7 storey building they become valuable. For a building of 10 storeys and up they are necessary, and the more floors you have, the more elevators and escalators will be required.
Replace storeys with functionality modules to get back to Test Automation needs.
Thanks.
The main benefit from automating your testing is that it will expose when you made changes to the code that caused a regression, where something that used to work fine is now broken. The payback computation on the automation work really depends on how much your code changes. If you're unlikely to ever touch the code once it's tested as working, then automation is of limited value relative to what it costs to develop. But if developers are going to be hacking at the program regularly, you can bet that automating the tests that pass will eventually pay for itself. You'll find regressions as soon as they're introduced, rather than later when the cause will be much harder to determine, and it doesn't take many such expensive events to pay for the cost of automating the tests in the first place. And if you consider the quality of the released code to be important, automated tests to find regressions before something goes out are even more valuable.
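A back-of-the-envelope version of that payback argument, with entirely invented numbers (Python used only as a calculator):

hours_to_automate = 15 * 8              # 15 working days spent building the suite
late_regression_extra_cost_hours = 40   # assumed extra debugging/hotfix effort when a
                                        # regression ships instead of being caught at once
regressions_per_year = 6                # assumed rate of regressions the suite would catch

yearly_saving = regressions_per_year * late_regression_extra_cost_hours
print(f"Pays for itself after ~{hours_to_automate / late_regression_extra_cost_hours:.0f} caught regressions")
print(f"That is ~{12 * hours_to_automate / yearly_saving:.0f} months at the assumed rate")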
It's quick
It saves you from repeating regression testing by hand
You need to work only on the updated module
Very little manual intervention is needed
The time freed up can be used to further enhance the automation

Evidence Based Scheduling - handling hidden tasks, concurrent tasks? [closed]

I've sort of been trying EBS on my personal project tasks, but two things have come up a couple of times that I'm not sure how to handle.
1 - I find some hidden task(s). I thought task X was going to take 6 hours, but it turns out it requires a new Ant build task, which requires the library ZipBlahBoo, which I then have to get into Ivy, which requires some investigation into the XML parsing library versions each one uses. I want to note these new tasks in my list, but that disrupts the estimation record: the original task might really have taken 6 hours, but there were another 8 hours of hidden tasks.
2 - I'll often have tasks that are mutually dependent. I need to update the Foolet service, but that also means updating the API, which means updating the Mock Foolet service used in unit tests. I've got each of those called out as 2-hour tasks, but I don't do them serially; I do them concurrently, because the system won't work until it's all done. Let's say the set of tasks takes 15 hours and I know I took 13 hours overall, but I don't really know how much of those 13 hours any specific task took. From an EBS point of view, how do I track the time it took to complete each task?
Any suggestions?
Evidence based scheduling should work best if you just charge all the hidden sub-task hours to the task that spawned them. This way, it will begin to transparently factor these occasional overruns into your overall expected performance and therefore produce better projections.
You're splitting too finely. Updating a test harness to account for an interface change shouldn't be a separate task from the interface change itself, unless the test harness is a separable product.
That's a case of not being good at foreseeing all the hidden tasks, so you should add all those hours: effectively, you spent 14 hours on that task, including the stuff you weren't foreseeing at the time. Of course, you still estimate "6 hours", and then apply the multiplier computed from past evidence (see the sketch after this answer).
Well, that's tough. I suggest you either estimate and live with it, or stop splitting such tasks.
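To make the "multiplier computed from past evidence" concrete, here is a minimal Python sketch of the idea, a simplified take on EBS's velocity-based projection with invented numbers: velocity is estimate divided by actual for each finished task, and a new estimate is projected through the historical velocities.

import random

history = [                    # (estimated hours, actual hours incl. hidden sub-tasks)
    (6, 14), (4, 5), (8, 9), (3, 6), (5, 5),
]
velocities = [est / act for est, act in history]

def project(new_estimate_hours, samples=1000):
    # Sample likely completion times for a new estimate from past velocities.
    outcomes = sorted(new_estimate_hours / random.choice(velocities) for _ in range(samples))
    return outcomes[len(outcomes) // 2], outcomes[int(len(outcomes) * 0.9)]

median, p90 = project(6)
print(f"A 6h estimate projects to ~{median:.1f}h (median) and ~{p90:.1f}h (90th percentile)")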

How Much Time Should be Allotted for Testing & Bug Fixing [closed]

Every time I have to estimate time for a project (or review someone else's estimate), time is allotted for testing/bug fixing that will be done between the alpha and production releases. I know very well that estimating so far into the future regarding a problem-set of unknown size is not a good recipe for a successful estimate. However for a variety of reasons, a defined number of hours invariably gets assigned at the outset to this segment of work. And the farther off this initial estimate is from the real, final value, the more grief those involved with the debugging will have to take later on when they go "over" the estimate.
So my question is: what is the best strategy you have seen with regards to making estimates like this? A flat percentage of the overall dev estimate? Set number of hours (with the expectation that it will go up)? Something else?
Something else to consider: how would you answer this differently if the client is responsible for testing (as opposed to internal QA) and you have to assign an amount of time for responding to the bugs that they may or may not find (so you need to figure out time estimates for bug fixing but not for testing)
It really depends on a lot of factors. To mention but a few: the development methodology you are using, the amount of testing resource you have, the number of developers available at this stage in the project (many project managers will move people onto something new at the end).
As Rob Rolnick says, 1:1 is a good rule of thumb; however, in cases where the specification is bad, the client may push for "bugs" which are actually badly specified features. I was recently involved in a project which went through many releases, yet more time was spent on bug fixing than on actual development because of the terrible specification.
Ensure a good specification/design and your testing/bug-fixing time will be reduced, because it will be easier for testers to see what and how to test, and clients will have less leeway to push for extra features.
Maybe I just write buggy code, but I like having a 1:1 ratio between devs and tests. I don't wait until alpha to test, but rather do it throughout the whole project. The logic? Depending on your release schedule, there can be a good deal of time between when development starts and your alpha, beta, and ship dates. Furthermore, the earlier you catch bugs, the easier (and cheaper) they are to fix.
A good tester, who finds bugs soon after each check-in, is invaluable (or, better yet, before a check-in, from a PR or DPK). Simply put, I am still extremely familiar with my code at that point, so most bug fixes become super simple. With this approach, I tend to leave roughly 15% of my dev time for bug fixing, at least when I do estimates. So in a 16-week run I'd leave around 2-3 weeks.
Only a good amount of accumulated statistics from previous projects can help you give precise estimates. If you have a well-defined set of requirements, you can make a rough calculation of how many use cases you have. As I said, you need some statistics for your team: you need to know the average bugs-per-LOC number to estimate the total bug count. If you don't have such numbers for your team, you can use industry averages. Once you have estimated the LOC (number of use cases * NLOC) and the average bugs-per-line rate, you can give a more or less accurate estimate of the time required to release the project.
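A worked example of that arithmetic, with invented placeholder numbers (use your own team's figures):

use_cases = 40
nloc_per_use_case = 500      # assumed average new lines of code per use case
defects_per_kloc = 15        # assumed team or industry defect density
hours_per_fix = 3            # assumed average time to fix and re-test one defect

estimated_loc = use_cases * nloc_per_use_case
expected_defects = estimated_loc / 1000 * defects_per_kloc
fix_phase_hours = expected_defects * hours_per_fix
print(f"{estimated_loc} LOC -> ~{expected_defects:.0f} defects -> ~{fix_phase_hours:.0f} hours of fixing")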
From my practical experience, the time spent on bug fixing is equal to or greater than (in 99% of cases :) ) the time spent on the original implementation.
From the testing Bible:
Testing Computer Software
p. 31: "Testing [...] accounts for 45% of initial development of a product." A good rule of thumb is thus to allocate about half of your total effort to testing during initial development.
Use a language with Design by Contract or "code contracts" (preconditions, check assertions, postconditions, class invariants, etc.) to get "testing" as close to your classes and class features (methods and properties) as possible. Then use TDD to exercise your code together with its contracts.
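The answer has Eiffel in mind; purely as an illustration of the contract idea in a more widespread language, here is a minimal Python sketch in which plain assertions stand in for preconditions, postconditions and an invariant, and a TDD-style test exercises them (the Account class is invented for the example):

import pytest  # used only for the TDD-style test at the bottom

class Account:
    def __init__(self, balance=0):
        self.balance = balance
        self._invariant()

    def _invariant(self):
        assert self.balance >= 0, "invariant: balance never negative"

    def withdraw(self, amount):
        assert amount > 0, "precondition: amount must be positive"
        assert amount <= self.balance, "precondition: sufficient funds"
        old = self.balance
        self.balance -= amount
        assert self.balance == old - amount, "postcondition: balance reduced by amount"
        self._invariant()
        return self.balance

def test_withdraw_rejects_overdraft():
    # The contract, not separate validation code, rejects the bad call.
    with pytest.raises(AssertionError):
        Account(10).withdraw(20)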
Use as much self-built code-generation as you possibly can. Generated code is proven, predictable, easier to debug, and easier/faster to fix than all-hand-coded code. Why write what you can generate? However, do not use OPG (other-peoples-generators)! Code YOU generate is code you control and know.
You can expect the ratio to invert over the course of your project - that is, you will write lots of hand-written code and contracts at the start (1:1). As you see patterns, teach a code generator YOU WRITE to generate the code for you, and reuse it. The more you generate, the less you design, write, debug, and test. By the end of the project you will find that the equation has inverted: you're writing less of your core code, and your focus shifts to your "leaf code" (last mile) or specialized (vs. generalized and generated) code.
Finally, get a code analyzer. A good automated code-analysis rule system and engine will save you oodles of time finding "stupid bugs", because there are well-known gotchas in how people write code in particular languages. In Eiffel, we now have Eiffel Inspector, where we not only use the 90+ rules that come with it, but are also learning to write our own rules for the "gotchas" we discover. Such analyzers not only save you in terms of bugs, they also improve your design - even GREEN programmers "get it" rather quickly, stop making rookie mistakes earlier, and learn faster!
The rule of thumb for rewriting existing systems is this: "If it took 10 years to write, it will take 10 years to re-write." In our case, using Eiffel, Design-by-Contract, Code Analysis, and Code Generation, we have re-written a 14 year system in 4 years and will fully deliver in 4 1/2. The new system is about 4x to 5x more complex than the old system, so this is saying a lot!
