Comprehensive study of software complexity metrics

Does anyone know of any work (academic or otherwise) that thoroughly compares the ability of software complexity metrics to predict the existence of bugs?

Is something like crap4j what you're looking for? I don't know that it predicts the existence of bugs, but it does help to highlight the most likely places for bugs to occur.

Try looking at "Empirical studies of quality models in object-oriented systems" by Briand and Wüst, "Predicting Maintainability with Object-Oriented Metrics - An Empirical Comparison" by Dagpinar and Jahnke, and "Object-oriented metrics that predict maintainability" by Li and Henry.

Related

Evaluation of Code Metrics [closed]

There has been a considerable amount of discussion about code metrics (e.g.: What is the fascination with code metrics?). I (as a software developer) am really interested in those metrics because I think that they can help one to write better code. At least they are helpful when it comes to finding areas of code that need some refactoring.
However, what I would like to know is the following: are there evaluations of those source code metrics that prove that they really do correlate with the bug rate or the maintainability of a method? For example: do methods with a very high cyclomatic complexity really introduce more bugs than methods with a low complexity? Or do methods with a high difficulty level (Halstead) really need much more effort to maintain than methods with a low one?
Maybe someone knows about some reliable research in this area.
Thanks a lot!

Good question, no straight answer.
There are research papers available that show relations between, for example, cyclomatic complexity and bugs. The problem is that most research papers are not freely available.
I have found the following: http://www.pitt.edu/~ckemerer/CK%20research%20papers/CyclomaticComplexityDensity_GillKemerer91.pdf. Note that it shows a relation between cyclomatic complexity and productivity rather than bugs. It has a few references to other papers, however, and it is worth trying to google them.

Here are some:
Object-oriented metrics that predict maintainability
A Quantitative Evaluation of Maintainability Enhancement by Refactoring
Predicting Maintainability with Object-Oriented Metrics - An Empirical Comparison
Investigating the Effect of Coupling Metrics on Fault Proneness in Object-Oriented Systems
The Confounding Effect of Class Size on the Validity of Object-Oriented Metrics

Have a look at this article from Microsoft research. In general I'm dubious of development wisdom coming out of Microsoft, but they do have the resources to be able to do long-term studies of large products. The referenced article talks about the correlation they've found between various metrics and project defect rate.

Finally I did find some papers about the correlation between software metrics and the error-rate but none of them was really what I was looking for. Most of the papers are outdated (late 80s or early 90s).
I think that it would be quite a good idea to start an analysis of current software. In my opinion it should be possible to investigate some popular open source systems. The source code is available and (what I think is much more important) many projects use issue trackers and some kind of version control system. Probably it would be possible to find a strong link between the log of the versioning systems and the issue trackers. This would lead to a very interesting possibility of analyzing the relation between some software metrics and the bug rate.
Maybe there still is a project out there that does exactly what I've described above. Does anybody know about something like that?
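
A minimal sketch of the linking step described above, assuming commit messages reference tracker issues with a pattern like "#123" and that the set of issue numbers labelled as bugs is already known. The regex, the function name and the commit records are hypothetical; real projects use varied conventions.

```python
import re
from collections import Counter

# Hypothetical convention: commit messages mention issues as "#123".
ISSUE_REF = re.compile(r"#(\d+)")


def bug_fixes_per_file(commits, bug_ids):
    """Count, per file, how many commits reference a known bug issue.

    commits: iterable of (message, [changed file paths]) pairs
    bug_ids: set of issue numbers that the tracker labels as bugs
    """
    counts = Counter()
    for message, files in commits:
        referenced = {int(n) for n in ISSUE_REF.findall(message)}
        if referenced & bug_ids:
            # Count each file once per bug-fixing commit.
            counts.update(set(files))
    return counts


# Made-up history, for illustration only.
commits = [
    ("Fix crash on empty input, closes #12", ["parser.py"]),
    ("Add export feature (#15)", ["export.py", "parser.py"]),
    ("Fixes #18: wrong rounding", ["parser.py", "math_utils.py"]),
]
print(bug_fixes_per_file(commits, bug_ids={12, 18}))
```

The resulting per-file counts could then be set against per-file metric values (for example with a rank correlation) to check whether a metric tracks the bug rate.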

We conducted an empirical study about the bug prediction capabilities of the well-known Chidamber and Kemerer object-oriented metrics. It turned out these metrics combined can predict bugs with an accuracy of above 80% when we applied proper machine learning models. If you are interested, you can read the full study in the following paper:
"Empirical Validation of Object-Oriented Metrics on Open Source Software for Fault Prediction", IEEE Transactions on Software Engineering, Vol. 31, No. 10, October 2005, pages 897-910.

I too was once fascinated with the promises of code metrics for measuring likely quality, and discovering how long it would take to write a particular piece of code given its design complexity. Sadly, the vast majority of claims for metrics were hype and never bore any fruit.
The largest problem is that the outputs we want to know (quality, time, $, etc.) depend on too many factors that cannot all be controlled for. Here is just a partial list:
Tool(s)
Operating system
Type of code (embedded, back-end, GUI, web)
Developer experience level
Developer skill level
Developer background
Management environment
Quality focus
Coding standards
Software processes
Testing environment/practices
Requirements stability
Problem domain (accounting/telecom/military/etc.)
Company size/age
System architecture
Language(s)
See here for a blog that discusses many of these issues, giving sound reasons for why the things we have tried so far have not worked in practice. (Blog is not mine.)
https://shape-of-code.com
This link is good, as it deconstructs one of the most visible metrics, the Maintainability Index, found in Visual Studio:
https://avandeursen.com/2014/08/29/think-twice-before-using-the-maintainability-index/
See this paper for a good overview of quite a large number of metrics, showing that they do not correlate well with program understandability (which itself should correlate with maintainability): "Automatically Assessing Code Understandability: How Far Are We?", by Scalabrino et al.

Software time planning metrics

In software development we all need to plan time correctly. I want to know what metrics you use to plan time across all phases of software work, such as analysis, development, maintenance, etc.
Surely there are some great articles you could suggest, or methodologies that you follow. Could you please share them?

The bad news is that no... let's call it an "off-the-shelf" metric... will immediately give you good results. There are too many variables between development teams and too much variation in how different people may apply the metric.
The only way you can get good results in estimation is by taking steps to chain your estimates to reality. Estimates do not just fall in line with reality, they must be wrestled there. That means you must make estimates, compare the estimates to what really happened, and adjust your process for generating them appropriately.
For something more concrete, Joel on Software has a great article on Evidence-Based Scheduling which is worth a read. I don't necessarily think this is the "best" process but it's a starting point which is better than 95% of software teams ever achieve.
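
A rough sketch of the "compare estimates to what really happened" loop, in the spirit of Evidence-Based Scheduling: past estimate/actual pairs give a set of velocities, and a Monte Carlo run applies randomly chosen velocities to the new estimates. The function name, the parameters and all numbers are made up for illustration, not taken from the article.

```python
import random


def simulate_totals(history, new_estimates, rounds=1000, seed=0):
    """Monte Carlo projection of total hours for the remaining tasks.

    history: (estimated_hours, actual_hours) pairs for finished tasks
    new_estimates: estimated hours for tasks not yet started
    """
    rng = random.Random(seed)
    # Velocity = estimate / actual; below 1.0 means the task overran.
    velocities = [est / act for est, act in history]
    totals = []
    for _ in range(rounds):
        total = sum(est / rng.choice(velocities) for est in new_estimates)
        totals.append(total)
    return sorted(totals)


# Made-up numbers, for illustration only.
history = [(4, 6), (8, 8), (2, 5), (16, 20)]
totals = simulate_totals(history, new_estimates=[8, 4, 12])
median = totals[len(totals) // 2]
p90 = totals[int(len(totals) * 0.9)]
print(f"median: {median:.1f}h, 90th percentile: {p90:.1f}h")
```

Reporting a range (median and 90th percentile) instead of a single number keeps the estimate tied to how wrong past estimates actually were.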

If you are interested in Agile methodology I found Agile Estimating and Planning a good read.

Software quality metrics [closed]

I was wondering if anyone has experience in metrics used to measure software quality. I know there are code complexity metrics but I'm wondering if there is a specific way to measure how well it actually performs during its lifetime. I don't mean runtime performance, but rather a measure of the quality. Any suggested tools that would help gather these are welcome too.
Are there measurements to answer these questions:
How easy is it to change/enhance the software, robustness
If it is a common/general enough piece of software, how reusable is it
How many defects were associated with the code
Has this needed to be redesigned/recoded
How long has this code been around
Do developers like how the code is designed and implemented
Seems like most of this would need to be closely tied with a CM and bug reporting tool.

If measuring code quality in the terms you describe were such a straightforward job, and the metrics accurate, there would probably be no need for project managers anymore. Moreover, the distinction between good and poor managers would be very small. Because it isn't, that just shows that getting an accurate idea of the quality of your software is no easy job.
Your questions span multiple areas that are quantified differently or are very subjective to quantify, so you should group them into categories that correspond to common targets. Then you can assign an "importance" factor to each category and derive some metrics from that.
For instance you could use static code analysis tools for measuring the syntactic quality of your code and derive some metrics from that.
You could also derive metrics from bugs/lines of code using a bug tracking tool integrated with a version control system.
For measuring robustness, reuse and efficiency of the coding process you could evaluate the use of design patterns per feature developed (of course where it makes sense). There's no tool that will help you achieve this, but if you monitor your software growing bigger and put numbers on these it can give you a pretty good idea of how your project is evolving and if it's going in the right direction. Introducing code-review procedures could help you keep track of these more easily and possibly address them early in the development process. A number to put on these could be the percentage of features implemented using the appropriate design patterns.
While metrics can be quite abstract and subjective, if you dedicate time to it and always try to improve them, it can give you useful information.
A few things to note about metrics in the software process though:
Unless you do them well, metrics could prove to be more harm than good.
Metrics are difficult to do well.
You should be cautious in using metrics to rate individual performance or offering bonus schemes. Once you do this everyone will try to cheat the system and your metrics will prove worthless.

If you are using Ruby, there are some tools to help you out with metrics ranging from LOC per method and methods per class to Saikuro's cyclomatic complexity.
My boss actually gave a presentation on the software metrics we use at a Ruby conference last year; these are the slides.
An interesting tool that brings you a lot of metrics at once is metric_fu. It checks a lot of interesting aspects of your code: stuff that is highly similar, changes a lot, or has a lot of branches. All signs your code could be better :)
I imagine there are a lot more tools like this for other languages too.

There is a good thread from the old Joel on Software Discussion groups about this.

I know that some SVN stat programs provide an overview of changed lines per commit. If you have a bug tracking system, and the people fixing bugs, adding features, etc. state their commit number when the bug is fixed, you can then calculate how many lines were affected by each bug or new feature request. This could give you a measurement of changeability.
The next thing is simply to count the number of bugs found and put them in ratio to the number of lines of code. There are some reference values for how many bugs high-quality software should have per line of code.
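
Both ratios are simple arithmetic once the data has been exported. A small sketch, where the function names, the issue IDs and the figures are hypothetical:

```python
def defect_density(bug_count, lines_of_code):
    """Bugs per thousand lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return bug_count / (lines_of_code / 1000)


def mean_lines_changed(lines_changed_by_issue):
    """Average number of lines touched per bug fix or feature request.

    lines_changed_by_issue: dict mapping issue ID -> total lines changed
    across all commits that cite that issue.
    """
    if not lines_changed_by_issue:
        return 0.0
    return sum(lines_changed_by_issue.values()) / len(lines_changed_by_issue)


# Made-up figures, for illustration only.
print(defect_density(bug_count=30, lines_of_code=12_000))             # 2.5
print(mean_lines_changed({"BUG-1": 40, "BUG-2": 10, "FEAT-3": 100}))  # 50.0
```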

You could do it in an economic way or in a programmer's way.
In the economic way you measure the costs of improving code, fixing bugs, adding new features and so on. If you choose the second way, you may want to measure how many staff work with your program and how easy it is to, say, find and fix an average bug, in human hours. Certainly neither is flawless, because costs depend on the market situation and human hours depend on the actual people and their skills, so it's better to combine both methods.
This way you get some instruments to measure the quality of your code. Of course you should take into account the size of your project and other factors, but I hope the main idea is clear.

A more customer focused metric would be the average time it takes for the software vendor to fix bugs and implement new features.
It is very easy to calculate from the created and closed dates in your bug tracking software.
If your average bug fixing/feature implementation time is extremely high, this could also be an indicator for bad software quality.
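
As a sketch of that calculation, assuming the tracker export yields a created and a closed timestamp per issue (the function name and the sample data are made up):

```python
from datetime import datetime
from statistics import mean


def mean_days_to_close(issues):
    """Average days between creation and closing, ignoring open issues.

    issues: iterable of (created, closed) datetimes; closed may be None.
    """
    durations = [
        (closed - created).total_seconds() / 86400
        for created, closed in issues
        if closed is not None
    ]
    return mean(durations) if durations else None


# Made-up tracker export, for illustration only.
issues = [
    (datetime(2024, 1, 1), datetime(2024, 1, 4)),    # 3 days
    (datetime(2024, 1, 10), datetime(2024, 1, 15)),  # 5 days
    (datetime(2024, 2, 1), None),                    # still open
]
print(mean_days_to_close(issues))  # 4.0
```

Skipping open issues makes the average look better than it is when old bugs linger, so it may be worth tracking the number and age of open issues alongside it.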

You may want to check the following page describing various aspects of software quality, including sample plots. Some of the quality characteristics you want to measure can be derived using a tool such as Sonar. It is very important to figure out how you want to model some of the following aspects:
Maintainability: You mentioned how easy it is to change/test the code or reuse the code. These relate to the testability and reusability aspects of maintainability, which is considered a key software quality characteristic. Thus, you could measure maintainability as a function of testability (unit test coverage) and reusability (cohesiveness index of the code).
Defects: The raw defect count alone may not be a good thing to measure. However, if you can model defect density, it could give you a good picture.

What metrics would be usable to determine expertise level in a particular programming language

I am interested in the raw (or composite) metrics used to get a handle on how well a person can program in a particular language.
Scenario: George knows a few programming languages and wants to learn "foobar", but he would like to know when he has a reasonable amount of experience in "foobar".
I am really interested in something broader than just the LOC (lines of code) metric.
My hope for this question is to understand how engineers quantify the programming language experiences of others and if this can be mechanically measured.
Thanks in Advance!

In reply to the previous two posters, I'd guess that there is a way to get a handle on how well a person can program in a particular language: you can test how well someone knows English, or Maths, or Music, or Medicine, or Fine Art, so what's so special about a programming language?
In reply to the OP, I guess the tests must assess:
How well you can program
How well you can use the programming language
Therefore the metrics might be:
What's the goodness of the person's programming (and there are various dimensions of goodness such as bug-free, maintainable, quick/cheap to write, runs quickly, meets user requirements, etc.)?
Does the person use appropriate/idiomatic features of the programming language in question in order to do that good programming?
It would be difficult to make the test 'mechanical', though: most exams that I know of are graded by a human examiner. In the case of programming, part of the test could be graded mechanically (i.e. "does it run?") but part of it ("is it understandable and idiomatic?") is intended to benefit, and is better judged by, other human programmers.

The best indicator of your expertise in a particular language, in my opinion, is how productive you are in it.
Productivity is not just how fast you can work but, importantly, how few bugs you create and how little refactoring/rework is required later on.
For example, if you took two languages you have a similar level of experience with, and were (in parallel universes) to build the same system with both, I would say the language you build the system with faster and with fewer defects/design flaws is the language you have more expertise in.
Sorry it's not a "hard" metric for you, it's a more practical approach.

I don't believe that this can be "mechanically measured". I've thought about this a lot though.

Hang on...
Even the "LOC" of a program is a heavily disputed topic!
(Are we talking about the output of cat *.{h,c} | wc -l or some other mechanism, for instance? What about blank lines? Comments? Are comments important? Is good code documented?)
Until you've realised how pointless a LOC comparison is, you've no hope of realising how pointless other metrics are.
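
To make the ambiguity concrete, here is a small sketch that reports four different "LOC" figures for the same text. The function name and its simplifications (only full-line comments with a fixed prefix are recognised) are assumptions for illustration.

```python
def count_lines(source, comment_prefix="//"):
    """Return (total, blank, comment_only, code) line counts.

    Only full-line comments starting with comment_prefix are recognised;
    block comments and trailing comments are counted as code.
    """
    total = blank = comment_only = 0
    for line in source.splitlines():
        total += 1
        stripped = line.strip()
        if not stripped:
            blank += 1
        elif stripped.startswith(comment_prefix):
            comment_only += 1
    return total, blank, comment_only, total - blank - comment_only


sample = """\
// add two numbers
int add(int a, int b) {

    return a + b;  // trailing comment
}
"""
print(count_lines(sample))  # (5, 1, 1, 3)
```

A plain wc -l would report 5 here, while a "code only" count reports 3, and neither figure says anything about whether the comment is useful.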

It's a rather qualitative thing that is rarely measured with any great accuracy. It's like asking "how smart was Einstein?". Certification is one (and a reasonably thorough) quantitative indicator, but even it falls drastically short of identifying "good programmers" as many recruiters discover.
What are you ultimately trying to achieve? General programming aptitude can be more important than language expertise in some situations.
If you are language-focussed, taking on a challenge like Project Euler using that language may be a way to track progress.

How proficient they are in debugging complex problems in that language.
Ask them about projects they have worked on in the past, difficult problems they encountered and how they solved them. Ask them about debugging techniques they have used - you'll be surprised at what you'll hear, and you might even learn something new ;-)
A lot of places have a person or two who is a superstar in their field - the person everyone else goes to when they can't figure out what is wrong with their program. I'm guessing that's the person you are looking for :-)

Facility with a programming language is not enough. What is required is facility with a programming language in the context of a particular suite of libraries on a particular platform:
C++ on winapi on Windows 32bit
C++ on KDE on Linux
C++ on Symbian on a Nokia S60 phone
C# on MS .NET on Windows
C# on Mono on Linux
Within such a context, the measures of competence using the target language on the target platform are as follows:
The ability to express common patterns succinctly and robustly.
The ability to debug common but subtle bugs like race conditions.
It would be possible to develop a suite of benchmark exercises for a programmer. One might also, once significant samples were available, determine the bell curve for ability. Preparing these things would take literally years and they would rapidly be obsoleted. This (and general tightness) is why organisations don't bother.
It would also be necessary to grade people in both "tool maker" and "tool user" modes. Tool makers are very different people with a much higher level of competence but they are often unsuited to monkey work, for which you really want a tool user.

John
There are a couple of ways to approach your question:
1) If you are interviewing candidates for a particular position requiring a particular language, then the only measure to compare candidates is 'how long has this person been writing in this language.' It's not perfect - it's not even very good - but it's reality. Unless you want to give the candidate a problem, a computer, and a compiler to test them on the spot there's no other measure. And then most programmer-types don't do well in "someone's watching you" scenarios.
2) I interpret your question to be more of 'when can I call MYSELF proficient in a language?' For this I would refer to levels of learning a non-native language: first level is you need to look up words/phrases in a dictionary (book) in order to say or understand anything; second level would be that you can understand hearing the language (or reading code) with only the occasional lookup in your trusted and now well-worn dictionary; third level you can now speak (or write code) with only the occasional lookup; fourth level is where you dream in the language; and the final level is where you fool native speakers into thinking that you're a native speaker also (in programming, other experts would think that you may have helped develop the language syntax).
Note that this doesn't help determine how good of a programmer you are - just like knowing English without having to look up words in the dictionary doesn't show "how gooder you is at writin' stuff" - that's subjective and has nothing to do with a particular language as people that are good at programming are good in any language you give them.

The phrase "a reasonable amount of experience" is dependent upon the language being considered and what that language can be used for.
A metric is the result of a measurement. Stevens (see wikipedia: Level Of Measurement) proposed that measurements use four different scale types: nominal (assigning a label), ordinal (assigning a ranking), interval (equal spacing between values, but an arbitrary zero) and ratio (having a non-arbitrary zero starting point). LOC is a ratio measurement. Although far from perfect, I think LOC is a relevant, objective number indicating how much experience you have in a language and can be compared to quantifiable values in the software industry. But this raises the question: where do these industry values come from?
Personally, I would say that "George" will know that he has a reasonable amount of experience when he has designed, implemented and tested a project, maybe of his own choosing on his personal time on his home computer if need be. For example: database, business application, web page, GUI test tool, etc.
From the hiring manager's point of view, I would start off by asking the programmer how good s/he is in the language, but this is not a metric. I have always thought that the best way to measure a person's ability to write programs is to give the programmer several small programming problems that are thought out in advance and solved in a given amount of time, say, 5 minutes each. I have never objected to this being done to me in job interviews. Several metrics are available: Was the programmer able to solve the problem (yes or no - nominal)? How much time did it take (number of minutes - ratio)? How effective was their approach to solving the problem (good, fair, poor - ordinal)? You learn not only the person's ability to write code, but can observe several subjective things as well, such as their behaviour as they go about solving the problem, the questions s/he asks while solving the problem, the ability to work under pressure, etc. From a "quality" perspective though, remember that people do not like being measured.

Still, I believe there are some good metrics like the McCabe Cyclomatic Metric for cyclomatic complexity or the amount of useful commentary per block of code or even the average amount of code written between two consecutive tests.
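
For readers who have not seen the McCabe metric computed, here is a simplified sketch for Python source using the standard ast module. It starts at 1 and adds one per decision point; the exact set of node types counted is an assumption of this sketch (comprehensions, for instance, are ignored), and real tools differ in such details.

```python
import ast

# Node types treated as decision points in this simplified count.
_DECISIONS = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source):
    """Approximate McCabe complexity of a piece of Python source.

    Starts at 1 and adds one per decision point. Each extra operand of
    an and/or chain also counts as one additional branch.
    """
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISIONS):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            complexity += len(node.values) - 1
    return complexity


sample = """
def classify(n):
    if n < 0 and n % 2:
        return "negative odd"
    for i in range(n):
        if i == 3:
            return "has three"
    return "other"
"""
print(cyclomatic_complexity(sample))  # 5
```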

I know of no such thing. I don't believe there's consensus on how to quantify experience or what "reasonable" means. Maybe I'll learn something too, but if I do it'll be a great surprise.
This may be pertinent.

I find that testing the ability to debug is a more accurate gauge of programming skill than any test aimed at straightforward programming problems that I have encountered. Given the source for a reasonably sized class or function with a stated (or unstated, in some cases) misbehavior, can the testee locate the problem?

Well, they try that in job interviews. There's no metric, but you can assess a person's abilities through questioning and quizzing.

WTF/s * LOC, smaller is best.

There are none; expertise can only be judged subjectively relative to others, or tested on specifics (which has its own level of inaccuracy).
See "What is the fascination with code metrics?" for more information.

Have you ever used a genetic algorithm in real-world applications?

I was wondering how common it is to find genetic algorithm approaches in commercial code.
It always seemed to me that some kinds of schedulers could benefit from a GA engine, as a supplement to the main algorithm.

Genetic Algorithms have been widely used commercially. Optimizing train routing was an early application. More recently fighter planes have used GAs to optimize wing designs. I have used GAs extensively at work to generate solutions to problems that have an extremely large search space.
Many problems are unlikely to benefit from GAs. I disagree with Thomas that they are too hard to understand. A GA is actually very simple. We found that there is a huge amount of knowledge to be gained from tuning the GA to a particular problem, which can be difficult, and, as always, managing large amounts of parallel computation continues to be a problem for many programmers.
A problem that would benefit from a GA is going to have the following characteristics:
A good way to encode potential solutions
A way to compute a numerical score to evaluate the quality of the solution
A large multi-dimensional search space where the answer is non-obvious
A good solution is good enough and a perfect solution is not required
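
To illustrate how little code the core of a GA needs, here is a minimal sketch on a toy problem (maximise the number of 1s in a bit string). The operators chosen (tournament selection, one-point crossover, bit-flip mutation, two-candidate elitism) and all parameter values are arbitrary choices for this sketch, not a recommendation for real problems.

```python
import random


def fitness(bits):
    """Numerical score: number of 1s (the toy 'OneMax' problem)."""
    return sum(bits)


def evolve(length=30, pop_size=40, generations=60, mutation_rate=0.02, seed=1):
    rng = random.Random(seed)
    # Encoding: each candidate solution is a list of 0/1 genes.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        # Elitism: carry the two best candidates over unchanged.
        next_gen = [population[0][:], population[1][:]]
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random candidates.
            mum = max(rng.sample(population, 3), key=fitness)
            dad = max(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, length)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


best = evolve()
print(fitness(best), "of 30")
```

The bit list is the encoding, fitness() is the numerical score, and the run simply stops after a fixed number of generations and returns the best candidate seen, which matches the "good enough" criterion. The effort on a real problem goes into the encoding and the scoring function.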
There are many problems that could probably benefit from GAs and in the future they will probably be more widely deployed. I believe that GAs are used in cutting edge engineering more than people think; however, most companies (like mine) guard those secrets extremely closely. It is only long after the fact that it is revealed that GAs were used.
Most people that deal with "normal" applications probably don't have much use for them though.

If you want to find an example, look at Postgres's Query Planner. It uses many techniques, and one just so happens to be genetic.
http://developer.postgresql.org/pgdocs/postgres/geqo-pg-intro.html

I used GA in my Master's thesis, but after that I haven't found anything in my daily work a GA could solve that I couldn't solve faster with some other Algorithm.

I don't think it is particularly common to find genetic algorithms in everyday-commercial code. They are more commonly found in academic/research code where the need to find the "best algorithm" is less important than the need to just find a good solution to a problem.
Nonetheless, I have consulted on a couple of commercial projects that do use GAs (chiefly as a result of my involvement with GAUL). I think the most interesting example was at a Biotech company. They used the GA to optimise scoring functions that were used for virtual screening, as part of their drug discovery application.
Earlier this year, with my current company, I added a new feature to one of our products that uses another GA. I think we might be marketing this from next month. Basically, the GA is used to explore molecules that have the potential for binding to a protein, and could therefore be further investigated as drugs targeting that protein. A competing product that also uses a GA is EA inventor.

As part of my thesis I wrote a generic Java framework for the multi-objective optimisation algorithm mPOEMS (Multiobjective prototype optimization with evolved improvement steps), which is a GA using evolutionary concepts. It is generic in the sense that all problem-independent parts have been separated from the problem-dependent parts, and an interface is provided to use the framework by adding only the problem-dependent parts. Thus anyone who wants to use the algorithm does not have to start from zero, which makes the work a lot easier.
You can find the code here.
The solutions which you can find with this algorithm have been compared in a scientific work with the state-of-the-art algorithms SPEA-2 and NSGA, and it has been shown that the algorithm performs comparably or even better, depending on the metrics you use to measure performance, and especially depending on the optimization problem you are looking at.
You can find it here.
Also as part of my thesis and proof of work I applied this framework to the project selection problem found in portfolio management. It is about selecting the projects which add the most value to the company, support most the strategy of the company or support any other arbitrary goal. E.g. selection of a certain number of projects from a specific category, or maximization of project synergies, ...
My thesis which applies this framework to the project selection problem:
http://www.ub.tuwien.ac.at/dipl/2008/AC05038968.pdf
After that I worked in the portfolio management department of a Fortune 500 company, where they used commercial software which also applied a GA to the project selection problem / portfolio optimization.
Further resources:
The documentation of the framework:
http://thomaskremmel.com/mpoems/mpoems_in_java_documentation.pdf
mPOEMS presentation paper:
http://portal.acm.org/citation.cfm?id=1792634.1792653
Actually with a bit of enthusiasm everybody could easily adapt the code of the generic framework to an arbitrary multi-objective optimisation problem.

I haven't but I've heard of this company (can't remember their name) which uses mutating, genetic algos to calculate placements and lengths of antennas (or something) from a friend of mine. And they're supposed to (according to my friend) have huge success with this. I guess GA is just too complex for "average Joe developer" to become mainstream. Kind of like Map Reduce - spectacularly cool, but WAY too advanced to hit the "mainstream"...

LibreOffice Calc uses it in its Solver module.
