Number of lines of code in a lifetime [closed] - metrics

One of the companies requires its prospective employees to give the number of lines of code they have written in their lifetime in a certain programming language, such as Java or C#. Since most of us have a number of years of experience on different projects in multiple languages, and we hardly keep records of this, what would be the best approach to calculating this metric? I am sure the smart members of stackoverflow.com will have some ideas.
This is a very respected company in its domain and I am sure they have some very good reason to ask this question. But what also makes it difficult to answer is the type of code to consider. Should I only include the difficult algorithms I implemented, or any code I wrote, e.g. a POJO that had 300 properties and whose getters/setters were generated by the IDE?

The best response to such a question is one of the following:
Why do you want to know?
What meaning would you attribute to such a number?
Is it OK if I just up and leave right about now?
I would seriously question the motives behind anyone asking such a question either of current or prospective employees. It is most likely the same type of company that would start doing code reviews focusing on the number of lines of code you type.
Now, if they argue that the number of lines of code is a measure of the experience of a programmer, then I would definitely leave the interview at that point.
Simple solutions can be found for complex problems, and they are typically better than just throwing enough lines of code at the problem and hoping it sorts itself out. Since the number of bugs produced scales at least linearly with the number of statements, I would say that the inverse is probably a better measure, combined with the number of problems they've tackled.
As a test-response, I would ask this:
If I am able to solve problems A, B and C in 1000 lines of code, and another programmer solves the same problems in 500 lines of code, which of us is the better programmer? (And the answer would be: not enough information to judge.)
Now, if you still want to estimate the number of lines, I would simply start thinking about the projects the person has written and compare their size with a known quantity. For instance, I have a class library that currently runs to about 130K lines of code, and I've written similar things in Delphi and other languages, plus some sizable application projects, so I would estimate that I have a good 10 million lines of code of my own at least. Is the number meaningful? Not in the slightest.

Sounds like this is D E Shaw's questionnaire?

This seems like one of those questions like 'How many ping-pong balls could you fit in a Boeing 747?' In that case, the questioner wants to see you demonstrate your problem-solving skills more than they want to know how many lines of code you've actually written. I would be careful not to respond with any criticism of the question, and instead honestly try to solve the problem ; )

Take a look at ohloh. The site shows metrics from open source projects.
The site estimates that 107,187 lines of code corresponds to an effort of 27 Person Years (4000 lines of code per year).
An example of the silliness of such a metric: that number is from a project I've been toying with outside work for 2 years.

There are basically three ways of dealing with ridiculous requests for meaningless metrics.
Refuse to answer, challenging the questioner for their reasons and explaining why those reasons are silly.
Spend time gathering all the information you can, and calculate the answer to the best of your ability.
Make up a plausible answer, and move on with as little emotional investment in the stupidity as possible.
The first answers I see seem to be taking the first approach. Think about whether you still want the job despite the stupidity of their demands. If the answer is still yes, avoid number 1.
The second method would involve looking at your old code repositories from old projects.
In this case, I would go with the third way.
Multiply the number of years you have worked on a language by 200 work days per year, by 20 lines of code a day, and use that.
If you are claiming more than one language per year, apportion it out between them.
If you have been working more on analysis, design or management, drop the figure by three quarters.
If you have been working in a high-ceremony environment (defence, medicine), drop the figure by an order of magnitude.
If you have been working in an environment with particularly low ceremony, increase it by an order of magnitude.
Then put the stupidity behind you and get on with your life as quickly as possible.
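If you do go the third way, a minimal sketch of that back-of-the-envelope calculation might look like the following; the day count, the lines-per-day figure and the adjustment factors are just the guesses listed above, not measurements.

    # Rough lifetime-LOC guess, following the heuristic above:
    # years * 200 work days * 20 lines/day, scaled by how ceremonial the
    # environment was and how much of the time was non-coding work.
    # Every number here is a made-up assumption, as in the answer itself.
    def guess_lifetime_loc(years_per_language, mostly_non_coding=False, ceremony="normal"):
        estimates = {}
        for language, years in years_per_language.items():
            loc = years * 200 * 20           # base guess: ~4,000 lines per year
            if mostly_non_coding:
                loc *= 0.25                  # analysis/design/management: drop by three quarters
            if ceremony == "high":           # defence, medicine, ...
                loc /= 10
            elif ceremony == "low":
                loc *= 10
            estimates[language] = int(loc)
        return estimates

    # Example: 5 years of Java and 2 of C# in an ordinary shop.
    print(guess_lifetime_loc({"Java": 5, "C#": 2}))  # {'Java': 20000, 'C#': 8000}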

Depending on what they do with the answer, I don't think this is a bad question. For example, if a candidate puts JavaScript on their resume, I want to know how much JavaScript they've actually written. I may ask, for example, for the number of lines in the largest JavaScript project they've written. But I'm only looking for a sense of scale, not an actual number. Is it 10, 100, 1000, or 10,000 lines?
When I ask, I'll make very clear that I'm just looking for a crude number to gauge the size of the project. I hope the employer in the questioner's case is after the same.

It is an interesting metric to ask for, considering you could write many, many lines of bad code instead of just a few smart ones.
I can only assume they consider more lines to be better than fewer. Would it be better not to plan at all and just start writing code? That would be a great way to write more lines of code, since at least when I do that I usually end up writing everything at least twice.

Smart Stack Overflowers would generally avoid organizations that ask this kind of question. Unless the correct answer is "huh, wtf??"

If you were to be truly honest then you'd say that you don't know because you have never viewed it as a valid metric. If the interviewer is a reasonable/rational person, then this is the answer they are looking for.
The only other option to saying you don't know is to guess, and that really isn't demonstrating problem solving skills.

Why bother calculating this metric without a good reason? And some random company asking for the metric really isn't a good reason.
If the company's question is actually serious, and you think the interview might lead to something interesting, then I would just pick a random number in order to see where that leads :-)

Ha, this reminds me of when I took over a C based testing framework, which started out as 20K+ lines that I ended up collapsing into 1K LOC by factoring the repetition down into subroutines, instead of the 20K lines of diarrhea code the original author had written. Unfortunately, I got spanked harder for any errors in the code because my KLOCs written had actually gone negative... I would think long and hard about shrinking the code base in a metrics-driven organization.

Even if I agree with the majority in saying that this is not a really good metric, if it's a serious company, as you say, they may have their reasons to ask this. This is what I would probably do:
Take one of your existing projects, get the number of lines, and divide it by the time it took you to code it. This gives you a lines-per-hour metric. Then estimate how much time you have worked with that specific language and multiply it by your metric. I honestly don't think it's a great way... but honestly, this isn't a great question either. I would also tell the company the strategy I used to come up with this number; maybe, MAYBE, that is what they want: to know your opinion about this question and how you would answer it? :p
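A quick sketch of that calculation, with purely invented numbers for the known project and for your total hours in the language:

    # Lines-per-hour from one project you know well, extrapolated to your
    # rough total hours in that language. All three inputs are invented.
    known_project_loc = 12_000       # size of a project you actually know
    hours_to_write_it = 600          # roughly how long it took you
    total_hours_in_language = 8_000  # your rough total time in the language

    lines_per_hour = known_project_loc / hours_to_write_it
    print(round(lines_per_hour * total_hours_in_language))  # 160000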
Or, they just want to know if you have some experience... so, guess an impressive number and write it down :D

"This is a very respected company in its domain and I am sure they have some very good reason to ask this question"
And I am very sure they don't, because "being respected" does not mean "they do everything right". This is certainly not right, or if it is, then it's at least dumb in my opinion.
What counts as "lines of code"? I estimate that I have written around 250,000 lines of C# code, possibly a lot more. The problem? 95% was throwaway code, and not all of it was for learning. I still find myself writing a small 3-line program for the tenth time simply because it's easier to write those three lines again (and change a parameter) than to go search for the existing ones.
Also, the lines of code mean nothing. Say I have two guys: one has written 20% more lines than the other, but those 20% more were unnecessarily complicated lines, "loop unrolling" and otherwise useless stuff that could have been refactored out.
So sorry, respected company or not: asking for lines of code is a sure sign that they have no clue about measuring the efficiency of their programmers, which means they have to rely on stone-age techniques like measuring LoC that are about as accurate as stone-age calendars. Which also means it's possibly a good place to work if you like to slack off and inflate your numbers every once in a while.
Okay, that was more a rant than an answer, but I really see absolutely no good reason for this number whatsoever.

And nobody has yet cited the Bill Atkinson "-2000 lines" story...
In my Friday afternoon (well, about one Friday per month) self-development exercises at work over the past year, tests, prototypes and infrastructure included, I've probably written about 5 kloc. However one project took an existing 25kloc C/C++ application and reimplemented it as 1100 lines of Erlang, and another took 15kloc of an existing C library and turned it into 1kloc of C++, so the net is severely negative. And the only reason I have those numbers was that I was looking to see how negative.

I know this is an old post, but this might be useful to someone anyway...
I recently moved on from a company I worked at for roughly 9.5 years as a Java developer. All our code was in CVS, then SVN, with Atlassian Fisheye providing a view into it.
When I left, Fisheye was reporting my personal, total LOC as roughly 250,000. Here's the Fisheye description of its LOC metric, including the discussion of how each SVN user's personal LOC is calculated. Note the issues with branching and merging in SVN, and that LOC should usually only be based on TRUNK.
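If you don't have something like Fisheye sitting in front of the repository, a crude physical-line count over a checkout is easy to script. This is only a sketch that counts non-blank lines per file extension; it does nothing as sophisticated as Fisheye's per-author, trunk-only metric.

    import os
    from collections import Counter

    def count_physical_loc(root, extensions=(".java", ".py", ".cs")):
        """Count non-blank physical lines per file extension under a checkout."""
        totals = Counter()
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                ext = os.path.splitext(name)[1]
                if ext in extensions:
                    path = os.path.join(dirpath, name)
                    with open(path, errors="ignore") as f:
                        totals[ext] += sum(1 for line in f if line.strip())
        return dict(totals)

    print(count_physical_loc("."))  # e.g. {'.java': 248113, '.py': 1540}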

Related

How does a programmer think? [closed]

This may be a hopelessly vague question. But I am interested to hear whatever logical thought processes people go through when learning a new concept or trying to get their brain around code they might not have ever seen before.
Basically, what general steps does one take to break down problems, and what does it take to "get it"? If you were to diagram a flowchart of how your mental process works when you look at code or try to solve a problem, what might it look like?
What common references, tips, and mental assumptions do you find useful in problem solving?
How is this different between different domains? For example in what ways is a web programmer's thought process similar or different from a traditional desktop app developer's process?
I'm a big believer that no matter what type of application you're looking at for the first time, be it a web app, a desktop app, a device driver, or whatever else, there are three steps a developer usually follows in order to understand how it works:
Get the big picture:
What kind of app is this (web, desktop, ...)?
How is it layered (standalone, client-server, n-tier, ...)?
What is the app's purpose? What is it supposed to do?
Who is the app made for?
See how it works:
What language(s) is (are) used?
How is the code structured?
How is the data structured?
Understand (or at least try to) the way the app has been thought through:
Has it been thought through at all?
Is the app clearly optimized? (For performance? For readability?)
Is the app finished? Or is there room for evolution?
Are there signs of multiple releases?
etc...
The 1st and 2nd steps are purely technical, while the 3rd MUST be as non-technical as possible... it's more about psychology and understanding how the app has been built. It obviously requires experience, but as long as you think hard enough and don't waste your brain's time on technical details, you'll eventually get it.
This whole process shouldn't require the use of a keyboard. You're only supposed to read, think, and take notes on a paper (I'm not kidding: pen and paper!).
Ho ho, good luck with this one. It's a great question and I'm sure you'll get a ton of answers. Although I have to say I cannot give a satisfactory answer to this - the last thing I would describe my thought processes as is a flow chart - I don't think there is any golden formula for this.
The only tip in problem solving I can recommend is discussing it with somebody else. In those times when you hit a brick wall, going through it with a colleague is invaluable. Quite often, as well, they will actually not even add much to the discussion - in the process of getting all your thoughts out in the open, the solution can become clear.
People are notoriously bad at examining their own thought processes, but I'll give it a whirl. I test very high for visuo-spatial ability in IQ tests, medium-to-high for verbal skills, and moderate for mathematical skills (which explains my A-level Maths grade, I suppose), and when I start to design software, I think in terms of shapes and the connections between them. When it comes to describing these thoughts to others (or clarifying them for myself), I use simple block diagrams or the object diagrams taken from Jacobson's Objectory method, NOT the over-complex stuff that UML suggests. I sometimes write textual descriptions of complex things, mostly as reminders to myself, but never use numbers or maths.
Of course this is just me - I've worked with maths whizzes who were just as good or even better programmers than myself.
I don't think... I process.
This is actually less flip than it sounds. I always break down tasks into their components and then break these down further, and that doesn't just go for writing software! Much like @Mark Pim, I go through things sequentially.
My wife gets really annoyed when I make dinner because I take so long to get started.
Divide & Conquer
I start by trying to grasp the entire problem as it is, and then start to find patterns I can recognize, and do the same for them in a kind of recursive process, until I have a broken-down solution I can implement and follow more easily.
This is one of the rare times I would answer with "it just works." I learn things by steamrolling through them. I don't have gimmicks, or devices to help me. Took me some time to learn PHP, but after that Javascript was much easier. Once you tackle one thing, the next items become cumulatively-easier.
Personally, I conduct an internal dialogue with myself 'OK so we need to loop over this list of integers.' 'But we can break when we find the value we want.' 'OK, will the list definitely be initialised when we start?'
I'd be interested to see if any psychological research had been done on problem solving techniques.
Similar to Jonathan Sampson - it kind of just works.
When I'm attacking a real problem, I try to think of the most logical way of getting through it.
Then, when everything goes wrong (as it usually does), I have to make hundreds of sidesteps to get things done. Just keep focusing on that end goal, that logical way and you'll get there.
Eventually though, it decides to work for me and I end up with a finished product that is usually nothing like I planned it out to be. As long as the customers are happy, I am!
Personally, I see code in my head pictorially rather than textually (like Neil Butterworth) - it's a bit hard to describe since (quoting STIV) "there's no common frame of reference."
My main skill is identifying similarities between models or systems I already know about and the task at hand. The connections between some of these can seem quite abstract; the key is to spot the connections. This leads to the abstraction of common patterns and approaches which are widely applicable. Related to this, the most important thing I learnt about algorithms was that the problem is never 'come up with a smart algorithm to solve X'. It's 'model problem X such that it can be solved by existing smart algorithm Y'.

Estimating time in a project that includes unfamiliar concepts? [closed]

When giving a time estimate for a project that includes work in areas you don't have experience with, how do you estimate?
In most cases it is hard enough to get the estimate right when the project's areas are familiar.
What methods have you used in these cases? How well did they work?
Not Serious...
Reach your hand up in the air.
Grab an estimate.
Write down what you find.
Ok, ok...
It's hard. Actually, it is not possible to give an accurate estimate of something that is not known. But that doesn't stop people from needing to know.
Write down a list of all functional components or aspects that you are aware of.
Be as vague as necessary to cover everything.
Now, you have a list of components. For each component, do your best to get the business and/or functional requirements of that part. Why is it there? What is it supposed to do? Why? How? Drill down to more detail, add as many items to your list as possible (nested, components, sub-components).
The more items you have the easier it is to estimate each one.
Then double it!
I think it is very reasonable to walk the stakeholders through some level of design in order to provide an estimate. Otherwise, it's like walking up to a construction company and saying "I want a house, how much will it cost?"
And keep in mind -- it is an estimate :)
Oh, one last thing... Be clear about what it is that you are estimating. If it is "a, b, c" and they later ask for "d", it will be easy to point out that your estimate did not cover "d"... your new estimate is ...
Know what you Know AND Know what you don't Know and you'll be successful.
Use a work breakdown structure (WBS) to get at the sub-parts of the project/application/whatever. You can provide an estimate (to a degree) of the effort to complete the knowns; as with any project there are also unknowns, and identifying them is the first big step. The next step is to add steps to the WBS that help you get to know those unknowns better. For example, if the task is to BBQ a cheeseburger with strips of bacon and you've never BBQ'd bacon before, then you break the job down into getting the ingredients, getting the BBQ, starting the BBQ, etc., and for the one thing you don't know, getting/cooking the bacon, you add some sub-components like:
call butcher
research bacon cooking
hire bacon BBQ'er expert
etc. Each of those you can assign some estimate to, and put a ? next to the actual cooking. Call it a proof of concept (can you BBQ bacon?), research and development, whatever; there are some things you don't know when you start a project, but you can devise a plan to get to know those areas, and once you do, revisit the plan and add the findings. By doing this you're communicating to the team, sponsors, etc. what you know, the areas you don't know but have a plan to get to know, and a rough estimate with the unknowns identified.
Min/Max/Likely is the method I use: the minimum for the case where everything goes well, the maximum for everything going pear-shaped, and the likely somewhere in between.
Of course, management only ever looks at the minimum figure but, by the time they start complaining about cost over-runs and missed deliverables, you've already got them locked in, and you have (hopefully) been keeping them fully apprised of any problems you've encountered and the effect those problems are having (pushing the actuals away from Min and closer to Max).
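The answer above doesn't prescribe how to combine the three figures; one common convention (an assumption on my part, not something the answer specifies) is the PERT-style weighted average:

    # PERT-style three-point estimate: the likely case gets four times the weight
    # of the optimistic and pessimistic extremes. Numbers are illustrative only.
    def three_point_estimate(minimum, likely, maximum):
        return (minimum + 4 * likely + maximum) / 6

    print(three_point_estimate(minimum=10, likely=18, maximum=40))  # ~20.3 days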
Have you tried planning poker?
These SO questions might be relevant:
How to estimate the length of a programming task
Dealing with awful estimates
The most significant project management mistakes
Do you practice the planning game?
You cannot estimate these, you can only guess. The difference, to me, is that an estimate is based on knowledge & experience, and you have neither for the unknown.
You have to constantly reevaluate where you started, where you stand, and your guess on what's left. You could probably break down your problem into steps but your biggest issue will be in the "footnotes" of what you're learning.
The example that comes to mind was my first C# interop with Word. It was entirely unknown territory and I didn't have the foggiest clue what it would take to generate a 1000-page formatted document based on database information. In itself, it is pretty straightforward: open Word, format the page, and insert the data. But all kinds of issues cropped up that you couldn't have fathomed, and that easily doubles your guess.
Ask people who are familiar with the concepts how long it would have taken them back when they first learnt the technique (including how long it took to learn). Average the results and adjust based on your own capability to learn.
Then guess anyway and triple the quote. Doubling is for chumps.
Wherever possible, go one meta-level up: give an estimate of when you can give reasonably accurate estimate.
"I don't know enough about Foo to give an accurate estimate, but I will know enough about Foo by Thursday week."
I acknowledge it often isn't possible, but I try to make it an option when I am project managing.
You can even go up another meta-level, in large projects.
"I don't know how long it will take for us to get a expert in to give us an estimate. I will talk with my colleagues, and give you an estimate on Friday of how long it will take to get an expert estimate."
This can be combined with large error margins, diminishing rapidly. "It will take 1 year, +/- 9 months. I'll give you another estimate with an error range +/- 3 months by June 30."

How do you explain to a sales person that programming is really difficult and takes time [closed]

I often work with sales and marketing types that cannot figure out how to use Excel, let alone understand the scope of their requests from a technical perspective. Of course, it would not be fair to expect them to, but that still leaves me with a problem.
What is the best way to show marketing and sales types that they have asked for something that requires a lot of complex programming and some patience?
Could you please share examples of problems and solutions?
Could you please recommend books on this subject?
Thanks!
Break the problem up into as many sub-divided tasks as possible. Provide a per-item estimate in hours beside each one.
When they think of a project as a whole, it seems simple. However, when they see each individual thing that must be done and the number of hours each item will require, it is putting it into terms business people can understand. Suddenly the software solution they want isn't a "black box" to them anymore and they now have some insight into the process.
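As a hypothetical illustration (both the tasks and the hour figures are made up), the itemized breakdown you hand over might look like this:

    # A made-up breakdown: each line item carries its own hour estimate,
    # so the total is something a non-technical stakeholder can follow.
    tasks = {
        "Import customer list from spreadsheet": 6,
        "Validation and duplicate detection": 10,
        "Reporting screen": 16,
        "Email notifications": 8,
        "Testing and bug fixing": 12,
    }
    for task, hours in tasks.items():
        print(f"{task:<40} {hours:>3} h")
    print(f"{'Total':<40} {sum(tasks.values()):>3} h")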
If you are looking for books I would suggest Software Estimation: Demystifying the Black Art.
The computer will do what you told it to do, not what you want it to do.
Any form of abstraction needs to be translated into exact details.
source http://c2.com/cgi/wiki?TeachMeToSmoke
Teacher: "It's hard to express ourselves clearly. You're a smoker, right?
Are you pretty good at it? [Student nods.]
Let's pretend I'm a man from Mars and you are going to teach me to smoke.
Do you have a fresh pack? Let's start with that.
[Takes pack.] OK, now tell me what to do."
Student: "Tear open the pack."
T: [Tears pack to shreds. Cigarettes fly everywhere.]
S: "No, no, tear off the top of the pack!"
T: "OK, sorry, do you have another pack? No? OK, let's just start with this cigarette. [Picks one up.]
S: "Put it in your mouth."
T: [Puts whole cigarette in mouth.]
S: "No, no, just put the end in your mouth!"
T: "Sorry." [Tears filter off, puts whole filter in mouth.]
S: "No, no, don't tear the cigarette, just hold it between your lips!"
T: "Oh, sorry, give me another one." [Places new cig sideways between lips.]
... and so on. You can play the game for a long time. It's hard to give clear instructions, even when you know the domain. Programming will endure for a long long time. -- RonJeffries
I had a friend who could do the Rubik's Cube in seconds.
That made me think of this way of explaining to my manager why our latest project FAILED!
Olivier takes an average of 10 seconds to completely solve a 3x3 Rubik's Cube after looking at it for approximately 5 seconds.
If you ask him to estimate how long it will take to solve it, you give him the cube, start the clock, and after 5 seconds he will say:
"OK, as soon as I start I will be done in 10 seconds."
You smile and say: "Start!"
After 3 seconds you ask him to stop, give him another Rubik's Cube and say: solve this one instead...
4 seconds after he starts the second Rubik's Cube, how long do you think he will take to solve the first one again?
If you answered 7 seconds approximately, congratulations: You're upper management material!
(and Olivier would be rightly entitled to force you to eat the cubes)
I agree with Simucal in the sense that managers tend to do better when you break a problem into hours, rather than into programming tasks. For example, saying to your boss, "That should take about two hours to complete, but I have a few other things that I have to complete first, so I should have it to you by tomorrow." is a lot more useful than saying, "Well, first I have to design an interface to communicate between objects, and then create the classes to implement the interface, and so on." Managers understand what they can see, so anytime you can explain your task in terms of end-user effects, you will likely have more success.
With that said, don't let your manager intimidate you into making promises that you can't keep. You may know that all they want to hear is "I'll have it by the end of the day", but if you know it can't be done, don't say that it can, hoping that if you get it to them sometime in the next couple of days it will be "close enough". If you start factoring in time for designing and testing and give them appropriate estimates, eventually they will start to understand how long it takes to accomplish certain types of tasks, and stop expecting everything to be done by yesterday.
I've also noticed that tangible results along the way tend to put their nerves at rest (temporarily, at least). My boss starts demanding finished results when he starts to panic as to whether or not a task will be completed on time. However, when he is able to "see" the step-by-step progression, then he is more likely to understand that we are, in fact, making progress, even though it isn't in the finished product yet.
Also, as you start this process, try to look at things from their point of view, and understand that until you get to a point where you can spend the amount of time you think is necessary, you may have to find a happy medium. There came a point in my experience where I needed to develop a Cache object, and while I would have loved to take several weeks to design and implement a configurable and extensible Cache that could be widely distributed across multiple applications, I had to limit myself to the task at hand. Just make sure that if you decide to scale back or follow through with a short-sighted design, be sure that it is well-documented so you can go back and fix it when you have time (or so another developer can pick up on the train of thought that you were unable to finish). Also, don't sacrifice good coding standards and style, as this will also make your code easier to maintain and update properly in the future.
Good luck!
This one may be a good book for non-programmers to understand some of these issues and pitfalls of runaway requirements:
Dreaming in Code: Two Dozen Programmers, Three Years, 4,732 Bugs, and One Quest for Transcendent Software
In all seriousness, I think the best thing is to actually tell them that some things are complex and do require complex problem solving, analysis and design. There is a gap between what they do and what the programmer does, and it's unfortunate that they will never understand the full implications. You sometimes just have to be firm and explain that it can take a lot of time.
Perhaps a breakdown of the task into subtasks and giving them estimates may help.
Make sure you understand their issues too. People will often bring solutions to the table ("we need this feature") rather than start with root business needs. The more you understand the problem the more likely you are to be able to suggest a compromise.
On occasion I've been told a certain large feature is absolutely essential, but I've been able to deploy much simpler solutions that substantially address the problem. Sometimes these interim solutions have grown into vital features; just as often I've been able to remove them two releases later without anybody noticing.
In my experience, whenever I started to explain to sales people in the past why a task takes a certain amount of time, they quickly admit that they do not really want to know the technical details, and I am fine with that. I usually do not want them to explain to me why they still have not nailed down that big sale after n days either. To do work effectively, everybody has his own area of responsibility.
Just make sure that your relationship with the sales people that you provide estimates for is good and they trust in your ability to do proper and reasonable estimations and get the work done. IMHO there should be no need to explain and reason an estimation in every detail, but if there is, I would say the real problem lies elsewhere.
And I wholeheartedly agree with "It depends" above.

When does a code base become large and unwieldy? [closed]

When do you start to consider a code base to be getting too large and unwieldy?
- when a significant amount of your coding time is devoted to "where do I put this code?"
- when reasoning about side-effects starts to become really hard.
- when there's a significant amount of code that's just "in there", and nobody knows what it does or if it's still running, but it's too scary to remove
- when lots of team members spend significant chunks of their time chasing down intermittent bugs caused by some empty string somewhere in the data where it wasn't expected, or some edge case that you think would usually be caught in a well-written application
- when, in considering how to implement a new feature, "complete rewrite" starts to seem like a good answer
- when you dread looking at the mess of code you need to maintain and wish you could find work building something clean and logical instead of dumpster diving through the detritus of someone else's poorly organized thinking
When it's over 100 lines. Joke. This is probably the hardest question to answer, because it's very individual.
But if you structure the application well and use different layers for e.g. interfaces, data, services and the front-end, you will automatically get a nice "base" structure. Then you can divide each layer into different classes, and inside the classes you point out the appropriate methods for the class.
However, there isn't an "x lines per method is bad" rule; think of it more like this: if there is a possibility of replication, split it out from the current piece and make it re-usable.
Re-using code is the basics of all good structure.
And splitting up into different layers will help the base to become more and more flexible and modular.
There exist some calculable metrics if that's what you're searching for. Static code analysis tools can help with that:
Here's one list: http://checkstyle.sourceforge.net/config_metrics.html
Other factors can be the time it takes to change/add something.
Other non-calculable factors can be:
the risk associated with changes
the level of intermingling of features
whether the documentation can keep up with the features / code
whether the documentation represents the application
the level of training needed
the quantity of repetition instead of reuse
Ah, the god-program anti-pattern.
When you can't remember at least the outline of sections of it.
When you have to think about how changes will affect itself or its dependencies.
When you can't remember all the things it depends on or that depend on it.
When it takes more than a few minutes(?) to download the source or compile.
When you have to worry about how to deploy new versions.
When you encounter classes which are functionally identical to other classes elsewhere in the app.
So many possible signs.
I think there are many signs that a code base is getting too large.
It becomes hard to maintain a consistent naming convention. If classes/methods/attributes can't be named consistently or can't be found consistently, then it's time to reorganize.
When your programmers are surfing the web or going to lunch while waiting for a compile. Keeping compile/link time to a minimum is important for management. The last thing you want is a programmer getting distracted by twiddling their thumbs for too long.
When small changes start to affect many, MANY other places in the code. There is a benefit to consolidation of code, but there is also a cost. If a small change to fix one bug causes a dozen more, and this commonly happens, then your code base needs to be spread out (versioned libraries) or possibly unconsolidated (yes, duplicate code).
If the learning curve of new programmers to the project is obviously longer than acceptable (usually 90 days), then your code base/training isn't set up right.
..There are many many more, I'm sure. If you think about it from these three perspectives:
Is it hard to support?
Is it hard to change?
Is it hard to learn?
...Then you will have an idea of whether your code fits the "large and unwieldy" category.
For me, code becomes unwieldy when there's been a lot of changes made to the codebase that weren't planned for when the program was initially written or last refactored significantly. At this point, stuff starts to get fitted into the existing codebase in odd places for expediency and you start to get a lot of design artifacts that only make sense if you know the history of the implementation.
Short answer: it depends on the project.
Long answer:
A codebase doesn't have to be large to be unwieldy - spaghetti code can be written from line 1. So, there's not really a magic tripping point from good to bad - it's more of a spectrum of great <---> awful, and it takes daily effort to keep your codebase from heading in the wrong direction. What you generally need is a lead developer that has the ability to review others' code objectively, and keep an eye on the architecture and design of the code as a whole - no one line developer can do that.
When I can't remember what a class does or what other classes it uses off the top of my head. It's really more a function of my cognitive capacity coupled with the code complexity.
I was trying to think of a way of deciding based on how your colleagues perceive it.
During my first week at a gig a few years ago, I said during a stand-up that I had been tracking a white rabbit around the ContainerManagerBean, the ContainerManagementBean and the ContextManagerBean (it makes me shudder just recalling these words!). At least two of the developers looked at their shoes, and I could see them holding in a snigger.
Right then and there, I knew that this was not a problem with my lack of familiarity with the codebase - all the developers perceived a problem with it.
If, over years of development, different people code change requests and bug fixes, you will sooner or later get parts of code with duplicated functionality, very similar classes, some spaghetti, etc.
This is mostly due to the fact that a fix is needed fast and the "new guy" doesn't know the code base. So he happily codes away something which is already there.
But if you have automatic checks in place checking the style, unit test code coverage and similar you can avoid some of it.
A lot of the things that people have identified as indicating problems don't really have to do with the raw size of the codebase, but rather its comprehensibility. How does size relate to comprehensibility? If at all...
I've seen very short programs that are just a mess -- easier to throw away and redo from scratch. I've also seen very large programs whose structure is transparent enough that it is comprehensible even at progressively more detailed views of it. And everything in between...
I think looking at this question from the standpoint of an entire codebase is a good approach, but it probably pays to work up from the bottom and look first at the comprehensibility of individual classes, then multi-class components, then subsystems, and finally the entire system. I would expect the answers at each level of detail to build on each other.
For my money, the simplest benchmark is this: Can you explain the essence of what X does in one sentence? Where X is some granularity of component, and you can assume an understanding of the levels immediately above and below the component.
When you come to need a utility method or class, and have no idea whether someone else has already implemented it or have any idea where to look for one.
Related: when several slightly different implementations of the same functionality exist, because each author was unaware of other authors' work.

When, if ever, is "number of lines of code" a useful metric? [closed]

Some people claim that code's worst enemy is its size, and I tend to agree. Yet every day you keep hearing things like
I write blah lines of code in a day.
I own x lines of code.
Windows is x million lines of code.
Question: When is "#lines of code" useful?
ps: Note that when such statements are made, the tone is "more is better".
I'd say it's when you're removing code to make the project run better.
Saying you removed "X number of lines" is impressive, and far more helpful than saying you added lines of code.
I'm surprised nobody has mentioned Dijkstra's famous quote yet, so here goes:
My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.
The quote is from an article called "On the cruelty of really teaching computing science".
It's a terrible metric, but as other people have noted, it gives you a (very) rough idea of the overall complexity of a system. If you're comparing two projects, A and B, and A is 10,000 lines of code, and B is 20,000, that doesn't tell you much - project B could be excessively verbose, or A could be super-compressed.
On the other hand, if one project is 10,000 lines of code, and the other is 1,000,000 lines, the second project is significantly more complex, in general.
The problems with this metric come in when it's used to evaluate productivity or level of contribution to some project. If programmer "X" writes 2x the number of lines as programmer "Y", he might or might not be contributing more - maybe "Y" is working on a harder problem...
When bragging to friends.
At least, not for progress:
“Measuring programming progress by lines of code is like measuring aircraft building progress by weight.” --Bill Gates
There is one particular case when I find it invaluable. When you are in an interview and they tell you that part of your job will be to maintain an existing C++/Perl/Java/etc. legacy project. Asking the interviewer how many KLOC (approx.) are involved in the legacy project will give you a better idea as to whether you want their job or not.
It's useful when loading up your line printer, so that you know how many pages the code listing you're about to print will consume. ;)
Reminds me of this:
The present letter is a very long one, simply because I had no leisure to make it shorter.
--Blaise Pascal.
Like most metrics, they mean very little without a context. So the short answer is: never (except for the line printer; that's funny! Who prints out programs these days?).
An example:
Imagine that you're unit-testing and refactoring legacy code. It starts out with 50,000 lines of code (50 KLOC) and 1,000 demonstrable bugs (failed unit tests). The ratio is 1K/50KLOC = 1 bug per 50 lines of code. Clearly this is terrible code!
Now, several iterations later, you have reduced the known bugs by half (and the unknown bugs by more than that most likely) and the code base by a factor of five through exemplary refactoring. The ratio is now 500/10000 = 1 bug per 20 lines of code. Which is apparently even worse!
Depending on what impression you want to make, this can be presented as one or more of the following:
50% less bugs
five times less code
80% less code
60% worsening of the bugs-to-code ratio
All of these are true (assuming I didn't screw up the math), and they all suck at summarizing the vast improvement that such a refactoring effort must have achieved.
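A quick check of the arithmetic behind those four framings, using the same made-up numbers as above:

    # Before and after the hypothetical refactoring described above.
    bugs_before, loc_before = 1000, 50_000
    bugs_after, loc_after = 500, 10_000

    print(1 - bugs_after / bugs_before)   # 0.5 -> 50% fewer bugs
    print(loc_before / loc_after)         # 5.0 -> five times less code
    print(1 - loc_after / loc_before)     # 0.8 -> 80% less code
    # lines-per-bug fell from 50 to 20, i.e. the bugs-to-code ratio "worsened" by 60%
    print(1 - (loc_after / bugs_after) / (loc_before / bugs_before))  # 0.6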
To paraphrase a quote I read about 25 years ago,
"The problem with using lines of code as a metric is it measures the complexity of the solution, not the complexity of the problem".
I believe the quote is from David Parnas in an article in the Journal of the ACM.
There are a lot of different Software Metrics. Lines of code is the most used and is the easiest to understand.
I am surprised how often the lines-of-code metric correlates with the other metrics. Instead of buying a tool that can calculate cyclomatic complexity to discover code smells, I just look for the methods with many lines, and they tend to have high complexity as well.
A good example of use of lines of code is in the metric: Bugs per lines of code. It can give you a gut feel of how many bugs you should expect to find in your project. In my organization we are usually around 20 bugs per 1000 lines of code. This means that if we are ready to ship a product that has 100,000 lines of code, and our bug database shows that we have found 50 bugs, then we should probably do some more testing. If we have 20 bugs per 1000 lines of code, then we are probably approaching the quality that we usually are at.
A bad example of use is to measure developer productivity. If you measure developer productivity by lines of code, then people tend to use more lines to deliver less.
Answer: when you can talk about negative lines of code. As in: "I removed 40 extraneous lines of code today, and the program is still functioning as well as before."
I'd agree that taking the total number of lines of code in a project is one way to measure complexity.
It's certainly not the only measure of complexity. For example debugging a 100 line obfuscated Perl script is much different from debugging a 5,000 line Java project with comment templates.
But without looking at the source, you'd usually think more lines of code is more complex, just as you might think a 10MB source tarball is more complex than a 15kb source tarball.
It is useful in many ways.
I don't remember the exact numbers, but Microsoft had a webcast that said that for every X lines of code there are, on average, Y bugs. You can take that statement and use it as a baseline for several things:
How well a code reviewer is doing their job.
Judging the skill level of two employees by comparing their bug ratios over several projects.
Another thing we look at is, why is it so many lines? Often times when a new programmer is put in a jam they will just copy and paste chunks of code instead of creating functions and encapsulating.
I think that "I wrote x lines of code in a day" is a terrible measure. It takes no account of the difficulty of the problem, the language you're writing in, and so on.
It seems to me that there's a finite limit of how many lines of code I can refer to off the top of my head from any given project. The limit is probably very similar for the average programmer. Therefore, if you know your project has 2 million lines of code, and your programmers can be expected to be able to understand whether or not a bug is related to the 5K lines of code they know well, then you know you need to hire 400 programmers for your code base to be well covered from someone's memory.
This will also make you think twice about growing your code base too fast and might get you thinking about refactoring it to make it more understandable.
Note I made up these numbers.
The Software Engineering Institute's Process Maturity Profile of the Software Community: 1998 Year End Update (which I could not find a link to, unfortunately) discusses a survey of around 800 software development teams (or perhaps it was shops). The average defect density was 12 defects per 1000 LOC.
If you had an application with 0 defects (it doesn't exist in reality, but let's suppose) and wrote 1000 LOC, on average, you can assume that you just introduced 12 defects into the system. If QA finds 1 or 2 defects and that's it, then they need to do more testing as there are probably 10+ more defects.
It's a metric of productivity, as well as complexity. Like all metrics, it needs to be evaluated with care. A single metric usually is not sufficient for a complete answer.
I.e., a 500-line program is not nearly as complex as a 5000-line one. Now you have to ask other questions to get a better view of the program... but now you have a metric.
It's a great metric for scaring/impressing people. That's about it, and definitely the context I'm seeing in all three of those examples.
Lines of code are useful to know when you're wondering if a code file is getting too large. Hmmm...This file is now 5000 lines of code. Maybe I should refactor this.
When you have to budget for the number of punch cards you need to order.
I wrote 2 blog posts detailing the pros and cons of counting Lines of Code (LoC):
How do you count your number of Lines Of Code (LOC)?: The idea is to explain that you need to count the logical number of lines of code instead of a physical count. To do so you can use tools like NDepend, for example.
Why is it useful to count the number of Lines Of Code (LOC)?: The idea is that LoC should never be used to measure productivity, but rather for test coverage estimation and software deadline estimation.
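To illustrate the physical vs. logical distinction those posts refer to: the snippet below is a single logical statement spread over several physical lines, so the two counting methods disagree about it (a toy example, not how NDepend actually counts):

    # One logical line of code...
    total = sum(
        value * weight
        for value, weight in [(3, 0.5),
                              (7, 0.25),
                              (9, 0.25)]
    )
    # ...but six physical lines, so physical SLOC and logical SLOC differ.
    print(total)  # 5.5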
As most people have already stated, it can be an ambiguous metric, especially if you are comparing people coding in different languages.
5,000 lines of Lisp != 5,000 lines of C
Always. Bunch o'rookies on this question. Masters write code prolifically and densely. Good grads write lots of lines but too much fluff. Crappers copy lines of code. So, first do a Tiles analysis or gate, of course.
LoC must be used if your org doesn't do any complexity points, feature points/function points, commits, or other analysis.
Any developer who tells you not to measure him or her by LoC is shite. Any master cranks code out like you would not believe. I've worked with a handful who are 20x to 200x as productive as the average programmer. And their code is very, very, very compact and efficient. Yes, like Dijkstra, they have enormous mental models.
Finally, in any undertaking, most people are not good at it and most doing it are not great. Programming is no different.
Yes, do a hit analysis on any large project and find out 20% plus is dead code. Again, master programmers regularly annihilate dead code and crapcode.
When you are refactoring a code base and can show that you removed lines of code, and all the regression tests still passed.
Lines of code isn't so useful really, and if it is used as a metric by management it leads to programmers doing a lot of refactoring to boost their scores. In addition poor algorithms aren't replaced by neat short algorithms because that leads to negative LOC count which counts against you. To be honest, just don't work for a company that uses LOC/d as a productivity metric, because the management clearly doesn't have any clue about software development and thus you'll always be on the back foot from day one.
In competitions.
When the coder doesn't know you are counting lines of code, and so has no reason to deliberately add redundant code to game the system. And when everyone in the team has a similar coding style (so there is a known average "value" per line.) And only if you don't have a better measure available.
They can be helpful to indicate the magnitude of an application - says nothing about quality! My point here is just that if you indicate you worked on an application with 1,000 lines and they have an application that is 500k lines (roughly), a potential employer can understand if you have large-system experience vs. small utility programming.
I fully agree with warren that the number of lines of code you remove from a system is more useful than the lines you add.
Check out wikipedia's definition: http://en.wikipedia.org/wiki/Source_lines_of_code
SLOC = 'source lines of code'
There is actually quite a bit of time put into these metrics where I work. There are also different ways to count SLOC.
From the wikipedia article:
There are two major types of SLOC measures: physical SLOC and logical SLOC.
Another good resource: http://www.dwheeler.com/sloc/
It is a very useful idea when it is associated with the number of defects. "Defects" gives you a measure of code quality: the fewer defects, the better the software. It is nearly impossible to remove all defects, and on many occasions a single defect can be harmful, even fatal.
However, defect-free software does not seem to exist.
