How can I convert a bunch of logic and arithmetic to a human readable explanation of specific input variables? - documentation-generation

I'm working on an RPG where a character has numerical stats and skill levels, and those numbers get used in various ways while performing actions in the game.
Are there tools, standards, or processes I can use to take the existing blocks of code, which involve a lot of conditional logic and arithmetic, and produce simplified human-readable explanations of the effects of those numbers?
For example, consider this fictional function:
bool player::can_lift_item(itype item) {
    if (!this->has_arms) {
        return false;
    }
    return (this->strength > item.weight / 50);
}
This is a very simple example, and the code is pretty easy to follow. Many real examples are much more complex. What I'd like to get out of this is something like "increasing strength by 1 increases the weight of items the player can lift by 50". I'm not naive enough to think that the wording would be that easy to read, but anything even remotely close to that would be better than nothing at all. I'm especially interested in being able to get any sort of aggregated information on everywhere player.strength is used and what effects it has in all of those places.
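For concreteness, the crude kind of thing I can already do by hand is probe a single rule numerically and read off its sensitivity. This is only a hypothetical sketch (the player/itype definitions below are simplified stand-ins for my real classes), not the tooling I'm hoping exists:
#include <cstdio>

struct itype { double weight; };

struct player {
    bool has_arms;
    double strength;
    bool can_lift_item(const itype& item) const {
        if (!has_arms) return false;
        return strength > item.weight / 50;
    }
};

int main() {
    // For each strength value, brute-force the heaviest weight that can be lifted.
    for (double str = 1; str <= 5; ++str) {
        player p{true, str};
        double max_weight = 0;
        for (double w = 0; w <= 1000; w += 1)
            if (p.can_lift_item({w})) max_weight = w;
        std::printf("strength %.0f -> heaviest liftable weight %.0f\n", str, max_weight);
    }
}
The printed table makes the slope visible (each +1 strength adds roughly 50 to the liftable weight), but doing this by hand for every place player.strength appears is exactly the aggregation step I don't know how to automate.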

Related

Finding an experiment to evaluate how good an algorithm for keywords extraction is

I have a few algorithms that extract and rank keywords [both terms and bigrams] from a paragraph [most are based on the tf-idf model].
I am looking for an experiment to evaluate these algorithms. This experiment should give a grade to each algorithm, indicating "how good was it" [on the evaluation set, of course].
I am looking for an automatic / semi-automatic method to evaluate each algorithm's results, and an automatic / semi-automatic method to create the evaluation set.
Note: These experiments will be run off-line, so efficiency is not an issue.
The classic way to do this would be to define a set of key words you want the algorithms to find per paragraph, then check how well the algorithms do with respect to this set, e.g. (generated_correct - generated_not_correct)/total_generated (see update, this is nonsense). This is automatic once you have defined this ground truth. I guess constructing that is what you want to automate as well when you talk about constructing the evaluation set? That's a bit more tricky.
Generally, if there was a way to generate key words automatically that's a good way to use as a ground truth - you should use that as your algorithm ;). Sounds cheeky, but it's a common problem. When you evaluate one algorithm using the output of another algorithm, something's probably going wrong (unless you specifically want to benchmark against that algorithm).
So you might start harvesting key words from common sources. For example:
Download scientific papers that have a keyword section. Check if those keywords actually appear in the text, if they do, take the section of text including the keywords, use the keyword section as ground truth.
Get blog posts, check if the terms in the heading appear in the text, then use the words in the title (always minus stop words of course) as ground truth
...
You get the idea. Unless you want to employ people to manually generate keywords, I guess you'll have to make do with something like the above.
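As a rough sketch of that harvesting step (the helper below is hypothetical, not from any library): keep a candidate keyword as ground truth only if it literally occurs in the document text.
#include <algorithm>
#include <cctype>
#include <cstdio>
#include <string>
#include <vector>

static std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return s;
}

// candidates: e.g. a paper's keyword section, or a blog title minus stop words.
std::vector<std::string> ground_truth(const std::vector<std::string>& candidates,
                                      const std::string& text) {
    std::vector<std::string> kept;
    const std::string body = lower(text);
    for (const auto& kw : candidates)
        if (body.find(lower(kw)) != std::string::npos)
            kept.push_back(kw);
    return kept;
}

int main() {
    auto kept = ground_truth({"tf-idf", "quantum"},
                             "We rank terms and bigrams with a tf-idf model.");
    for (const auto& kw : kept) std::printf("%s\n", kw.c_str());  // prints: tf-idf
}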
Update
The evaluation function mentioned above is stupid. It does not incorporate how many of the available key words have been found. Instead, the way to judge a ranked list of relevant and irrelevant results is to use precision and recall. Precision rewards the absence of irrelevant results, recall rewards the presence of relevant results. That gives you two measures. To combine them into a single one, either use the F-measure, which merges precision and recall with an optional weighting, or use Precision#X, where X is the number of results you want to consider. Precision#X and Recall#X are closely related (they coincide exactly when there are X relevant keywords). However, you need a sensible X here, i.e. if you have fewer than X keywords in some cases, those results will be punished for never providing an Xth keyword. In the literature on tag recommendation, for example, which is very similar to your case, F-measure and P#5 are often used.
http://en.wikipedia.org/wiki/F1_score
http://en.wikipedia.org/wiki/Precision_and_recall
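For what it's worth, here is a minimal sketch of those measures, assuming the ground truth and the algorithm's ranked output are both plain lists of keyword strings; the names and example data are made up for illustration.
#include <cstdio>
#include <set>
#include <string>
#include <vector>

struct Scores { double precision, recall, f1; };

Scores evaluate(const std::vector<std::string>& ranked,   // algorithm output, best first
                const std::set<std::string>& relevant) {  // ground truth
    double hits = 0;
    for (const auto& kw : ranked)
        if (relevant.count(kw)) ++hits;
    double precision = ranked.empty() ? 0 : hits / ranked.size();
    double recall = relevant.empty() ? 0 : hits / relevant.size();
    double f1 = (precision + recall > 0)
                    ? 2 * precision * recall / (precision + recall)
                    : 0;
    return {precision, recall, f1};
}

// Precision#X: truncate the ranked list to its first X entries, then evaluate.
Scores evaluate_at(std::vector<std::string> ranked,
                   const std::set<std::string>& relevant, std::size_t x) {
    if (ranked.size() > x) ranked.resize(x);
    return evaluate(ranked, relevant);
}

int main() {
    Scores s = evaluate({"tf-idf", "bigram", "banana"}, {"tf-idf", "bigram", "ranking"});
    std::printf("P=%.2f R=%.2f F1=%.2f\n", s.precision, s.recall, s.f1);  // P=0.67 R=0.67 F1=0.67
}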

Method for runtime comparison of two programs' objects

I am working through a particular type of code testing that is rather nettlesome and could be automated, yet I'm not sure of the best practices. Before describing the problem, I want to make clear that I'm looking for the appropriate terminology and concepts, so that I can read more about how to implement it. Suggestions on best practices are welcome, certainly, but my goal is specific: what is this kind of approach called?
In the simplest case, I have two programs that take in a bunch of data, produce a variety of intermediate objects, and then return a final result. When tested end-to-end, the final results differ, hence the need to find out where the differences occur. Unfortunately, even intermediate results may differ, but not always in a significant way (i.e. some discrepancies are tolerable). The final wrinkle is that intermediate objects may not necessarily have the same names between the two programs, and the two sets of intermediate objects may not fully overlap (e.g. one program may have more intermediate objects than the other). Thus, I can't assume there is a one-to-one relationship between the objects created in the two programs.
The approach that I'm thinking of taking to automate this comparison of objects is as follows (it's roughly inspired by frequency counts in text corpora):
For each program, A and B: create a list of the objects created throughout execution, which may be indexed in a very simple manner, such as a001, a002, a003, a004, ... and similarly for B (b001, ...).
Let Na = # of unique object names encountered in A, similarly for Nb and # of objects in B.
Create two tables, TableA and TableB, with Na and Nb columns, respectively. Entries will record a value for each object at each trigger (i.e. for each row, defined next).
For each assignment in A, the simplest approach is to capture the hash value of all of the Na items; of course, one can use LOCF (last observation carried forward) for those items that don't change, and any as-yet unobserved objects are simply given a NULL entry. Repeat this for B.
Match entries in TableA and TableB via their hash values. Ideally, objects will arrive into the "vocabulary" in approximately the same order, so that order and hash value will allow one to identify the sequences of values.
Find discrepancies in the objects between A and B based on where their sequences of hash values diverge.
Now, this is a simple approach and could work wonderfully if the data were simple, atomic, and not susceptible to numerical precision issues. However, I believe that numerical precision may cause hash values to diverge, though the impact is insignificant if the discrepancies are approximately at the machine tolerance level.
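To make the hashing and LOCF bookkeeping concrete, along with the precision concern, here is a rough sketch of what I have in mind (the object names, tolerance, and hashing choice are all placeholders): each assignment appends a row of hashes carrying forward the last observation, and floating-point values are quantized to a tolerance before hashing so machine-level noise doesn't break the matching.
#include <cmath>
#include <cstdio>
#include <functional>
#include <map>
#include <string>
#include <vector>

using Row = std::map<std::string, std::size_t>;  // object name -> hash (LOCF)

std::size_t tolerant_hash(double value, double tol = 1e-9) {
    // Quantize before hashing so tiny discrepancies collapse into one bucket.
    return std::hash<long long>{}(static_cast<long long>(std::llround(value / tol)));
}

struct Trace {
    std::vector<Row> rows;
    Row current;                        // last observation carried forward
    void record(const std::string& name, double value) {
        current[name] = tolerant_hash(value);
        rows.push_back(current);        // one row per assignment ("trigger")
    }
};

int main() {
    Trace a, b;
    a.record("a001", 0.1 + 0.2);        // program A
    b.record("b001", 0.3);              // program B: same value up to rounding
    std::printf("match: %s\n",
                a.rows.back().begin()->second == b.rows.back().begin()->second
                    ? "yes" : "no");    // prints "yes" despite 0.1+0.2 != 0.3 exactly
}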
First: What is a name for such types of testing methods and concepts? An answer need not necessarily be the method above, but reflects the class of methods for comparing objects from two (or more) different programs.
Second: What standard methods exist for what I describe in steps 3 and 4? For instance, the "value" need not only be a hash: one might also store the sizes of the objects - after all, two objects cannot be the same if they are massively different in size.
In practice, I tend to compare a small number of items, but I suspect that when automated this need not involve a lot of input from the user.
Edit 1: This paper is related in terms of comparing execution traces; it mentions "code comparison", which is related to my interest, though I'm more concerned with the data (i.e. objects) than with the actual code that produces the objects. I've just skimmed it, but will review it more carefully for methodology. More importantly, this suggests that comparing code traces may be extended to comparing data traces. This paper analyzes some comparisons of code traces, albeit in a wholly unrelated area of security testing.
Perhaps data-tracing and stack-trace methods are related. Checkpointing is slightly related, but its typical use (i.e. saving all of the state) is overkill.
Edit 2: Other related concepts include differential program analysis and monitoring of remote systems (e.g. space probes) where one attempts to reproduce the calculations using a local implementation, usually a clone (think of a HAL-9000 compared to its earth-bound clones). I've looked down the routes of unit testing, reverse engineering, various kinds of forensics, and whatnot. In the development phase, one could ensure agreement with unit tests, but this doesn't seem to be useful for instrumented analyses. For reverse engineering, the goal can be code & data agreement, but methods for assessing fidelity of re-engineered code don't seem particularly easy to find. Forensics on a per-program basis are very easily found, but comparisons between programs don't seem to be that common.
(Making this answer community wiki, because dataflow programming and reactive programming are not my areas of expertise.)
The area of data flow programming appears to be related, and thus debugging of data flow programs may be helpful. This paper from 1981 gives several useful high level ideas. Although it's hard to translate these to immediately applicable code, it does suggest a method I'd overlooked: when approaching a program as a dataflow, one can either statically or dynamically identify where changes in input values cause changes in other values in the intermediate processing or in the output (not just changes in execution, if one were to examine control flow).
Although dataflow programming is often related to parallel or distributed computing, it seems to dovetail with Reactive Programming, which is how the monitoring of objects (e.g. the hashing) can be implemented.
This answer is far from adequate, hence the CW tag, as it doesn't really name the debugging method that I described. Perhaps this is a form of debugging for the reactive programming paradigm.
[Also note: although this answer is CW, if anyone has a far better answer in relation to dataflow or reactive programming, please feel free to post a separate answer and I will remove this one.]
Note 1: Henrik Nilsson and Peter Fritzson have a number of papers on debugging for lazy functional languages, which are somewhat related: the debugging goal is to assess values, not the execution of code. This paper seems to have several good ideas, and their work partially inspired this paper on a debugger for a reactive programming language called Lustre. These references don't answer the original question, but may be of interest to anyone facing this same challenge, albeit in a different programming context.

How to automatically tune parameters of an algorithm?

Here's the setup:
I have an algorithm that can succeed or fail.
I want it to succeed with highest probability possible.
Probability of success depends on some parameters (and some external circumstances):
struct Parameters {
    float param1;
    float param2;
    float param3;
    float param4;
    // ...
};

bool RunAlgorithm (const Parameters& parameters) {
    // ...
    // P(return true) is a function of parameters.
}
How to (automatically) find best parameters with a smallest number of calls to RunAlgorithm ?
I would be especially happy with a ready-made library.
If you need more info on my particular case:
Probability of success is a smooth function of the parameters and has a single global optimum.
There are around 10 parameters, most of them independently tunable (but some are interdependent)
I will run the tuning overnight; I can handle around 1000 calls to RunAlgorithm.
Clarification:
The best parameters have to be found automatically overnight, and used during the day.
The external circumstances change each day, so computing them once and for all is impossible.
More clarification:
RunAlgorithm is actually a game-playing algorithm. It plays a whole game (Go or Chess) against a fixed opponent. I can play 1000 games overnight. Every night there is a different opponent.
I want to see whether different opponents need different parameters.
RunAlgorithm is smooth in the sense that changing a parameter a little changes the algorithm's behaviour only a little.
Probability of success could be estimated from a large number of samples with the same parameters.
But it is too costly to run so many games without changing parameters.
I could try to optimize each parameter independently (which would result in 100 runs per parameter), but I guess there are some dependencies.
The whole problem is about using the scarce data wisely.
Games played are very highly randomized, no problem with that.
Maybe you are looking for genetic algorithms.
Why not allow the program fight with itself? Take some vector v (parameters) and let it fight with v + (0.1,0,0,0,..,0), say 15 times. Then, take the winner and modify another parameter and so on. With enough luck, you'll get a strong player, able to defeat most others.
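A rough sketch of that loop, assuming a hypothetical PlayMatch(challenger, incumbent) that plays one game and reports whether the challenger won (the stub below is a placeholder, not an existing API):
#include <array>
#include <cstddef>

using Parameters = std::array<float, 10>;

// Placeholder: play one game and report whether `challenger` beat `incumbent`.
// In practice this would launch the real engine with each parameter vector.
bool PlayMatch(const Parameters& challenger, const Parameters& incumbent) {
    (void)challenger; (void)incumbent;
    return false;  // stub
}

Parameters tune_by_self_play(Parameters best, int games_per_duel = 15) {
    for (std::size_t i = 0; i < best.size(); ++i) {  // one parameter at a time
        Parameters challenger = best;
        challenger[i] += 0.1f;                       // the "+0.1 in one coordinate" tweak
        int wins = 0;
        for (int g = 0; g < games_per_duel; ++g)
            if (PlayMatch(challenger, best)) ++wins;
        if (2 * wins > games_per_duel)               // challenger won the majority
            best = challenger;                       // keep the winner, move on
    }
    return best;
}

int main() {
    Parameters tuned = tune_by_self_play(Parameters{});  // start from all zeros
    (void)tuned;
}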
Previous answer (much of it is irrelevant after the question was edited):
With these assumptions and that level of generality, you will achieve nothing (except maybe an impossibility result).
Basic question: can you change the algorithm so that it will return the probability of success, not the result of a single experiment? Then, use an appropriate optimization technique (nobody will tell you which under such general assumptions). In Haskell, you can even change the code so that it will find the probability in simple cases (a probability monad) instead of giving a single result. As others mentioned, you can use a genetic algorithm with the probability as the fitness function. If you have a formula, use a computer algebra system to find the maximum value.
Probability of success is a smooth function of the parameters and has a single global optimum.
Smooth or continuous? If smooth, you can use differential calculus (Lagrange multipliers?). You can even, with little changes in code (assuming your programming language is general enough), compute derivatives automatically using automatic differentiation.
I will run the tuning overnight; I can handle around 1000 calls to RunAlgorithm.
That complex? This will allow you to check only two possible values per parameter (2^10 = 1024), out of many floats. You won't even determine the order of magnitude, or even the order of the order of magnitude.
There are around 10 parameters, most of them independently tunable (but some are interdependent)
If you know what is independent, fix some parameters and change those that are independent of them, like in divide-and-conquer. Obviously it's much easier to tune two groups of 5 parameters each than a single group of 10.
I'm downvoting the question unless you give more details. This has too much noise for an academic question and not enough data for a real-world question.
The main problem you have is that, with ten parameters, 1000 runs is next to nothing, given that, for each run, all you have is a true/false result rather than a P(success) associated with the parameters.
Here's an idea that, on the one hand, may make best use of your 1000 runs and, on the other hand, also illustrates the intractability of your problem. Let's assume the ten parameters really are independent. Pick two values for each parameter (e.g. a "high" value and a "low" value). There are 1024 ways to select unique combinations of those values; run your method for each combination and store the result. When you're done, you'll have 512 test runs for each value of each parameter; with the independence assumption, that might give you a decent estimate on the conditional probability of success for each value. An analysis of that data should give you a little information about how to set your parameters, and may suggest refinements of your "high" and "low" values for future nights. The back of my mind is dredging up ANOVA as a possibly useful statistical tool here.
Very vague advice... but, as has been noted, it's a rather vague problem.
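If you do want to try the 1024-combination screen described above, a sketch could look like the following; the Parameters struct is flattened to an array for brevity, the high/low values are placeholders you would pick yourself, and RunAlgorithm is the question's function (link against the real implementation).
#include <cstdio>

const int kNumParams = 10;

// The question's Parameters struct, flattened to an array for this sketch.
struct Parameters { float p[kNumParams]; };

// The question's algorithm; link against the real implementation.
bool RunAlgorithm(const Parameters& parameters);

void factorial_screen(const float lo[kNumParams], const float hi[kNumParams]) {
    int success_hi[kNumParams] = {0}, success_lo[kNumParams] = {0};
    for (unsigned mask = 0; mask < (1u << kNumParams); ++mask) {  // all 1024 combinations
        Parameters params;
        for (int i = 0; i < kNumParams; ++i)
            params.p[i] = (mask & (1u << i)) ? hi[i] : lo[i];
        const bool ok = RunAlgorithm(params);
        for (int i = 0; i < kNumParams; ++i) {
            if (mask & (1u << i)) success_hi[i] += ok;
            else                  success_lo[i] += ok;
        }
    }
    for (int i = 0; i < kNumParams; ++i)  // each value was used in exactly 512 runs
        std::printf("param %d: P(success|hi)~%.2f  P(success|lo)~%.2f\n",
                    i, success_hi[i] / 512.0, success_lo[i] / 512.0);
}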
Specifically for tuning parameters for game-playing agents, you may be interested in CLOP
http://remi.coulom.free.fr/CLOP/
Not sure if I understood correctly...
If you can choose the parameters for your algorithm, does it mean that you can choose them once and for all?
Then, you could simply:
have the developer run all/many cases only once, find the best case, and replace the parameters with the best values
at runtime for your real user, the algorithm is already parameterized with the best parameters
Or, if the best values change for each run ...
Are you looking for Genetic Algorithms type of approach?
The answer to this question depends on:
Parameter range. Can your parameters have a small or large range of values?
Game grading. Does it have to be a boolean, or can it be a smooth function?
One approach that seems natural to this problem is Hill Climbing.
A possible way to implement would be to start with several points, and calculate their "grade". Then figure out a favorable direction for the next point, and try to "ascend".
The main problems that I see in this question, as you presented it, are the huge range of parameter values and the fact that the result of a run is boolean (and not a numeric grade). This will require many runs to figure out whether a chosen set of parameters is indeed good, and on the other hand, there is a huge set of parameter values yet to check. Just checking all directions will result in a (too?) large number of runs.
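As a sketch of what hill climbing could look like under those caveats: the "grade" of a parameter vector is estimated by averaging several boolean runs, and we repeatedly move to the best neighbouring vector. The step size and the per-point sample count are arbitrary placeholders (and would quickly eat a ~1000-run budget); RunAlgorithm is the question's function.
#include <cstddef>
#include <vector>

// The question's algorithm; link against the real implementation.
bool RunAlgorithm(const std::vector<float>& params);

// Estimate the "grade" (P(success)) of a parameter vector by repeated runs.
double grade(const std::vector<float>& params, int samples = 20) {
    int wins = 0;
    for (int i = 0; i < samples; ++i)
        if (RunAlgorithm(params)) ++wins;
    return static_cast<double>(wins) / samples;
}

std::vector<float> hill_climb(std::vector<float> current, float step, int rounds) {
    double best = grade(current);
    for (int r = 0; r < rounds; ++r) {
        bool improved = false;
        for (std::size_t i = 0; i < current.size(); ++i) {
            for (float delta : {+step, -step}) {       // try both directions per parameter
                std::vector<float> candidate = current;
                candidate[i] += delta;
                double g = grade(candidate);
                if (g > best) { best = g; current = candidate; improved = true; }
            }
        }
        if (!improved) step /= 2;                      // ascend more finely once stuck
    }
    return current;
}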

How do you represent music in a data structure?

How would you model a simple musical score for a single instrument written in regular standard notation? Certainly there are plenty of libraries out there that do exactly this. I'm mostly curious about different ways to represent music in a data structure. What works well and what doesn't?
Ignoring some of the trickier aspects like dynamics, the obvious way would be a literal translation of everything into Objects - a Score is made of Measures, which are made of Notes. Synthesis, I suppose, would mean figuring out the start/end time of each note and blending sine waves.
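For example, the literal translation I have in mind is roughly this sketch (the particular fields - MIDI pitch numbers, beat positions - are just guesses on my part, not a standard):
#include <vector>

struct Note {
    int pitch;           // e.g. a MIDI note number, 60 = middle C
    double startBeat;    // position within the measure, in beats
    double lengthBeats;
};

struct Measure {
    std::vector<Note> notes;
};

struct Score {
    int beatsPerMeasure = 4;   // time-signature numerator
    double tempoBpm = 120;
    std::vector<Measure> measures;

    // Absolute start time (seconds) of a note in a given measure, for synthesis.
    double startSeconds(int measureIndex, const Note& n) const {
        double beat = measureIndex * beatsPerMeasure + n.startBeat;
        return beat * 60.0 / tempoBpm;
    }
};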
Is the obvious way a good way? What are other ways to do this?
Many people doing new common Western music notation projects use MusicXML as a starting point. It provides a complete representation of music notation that you can subset to meet your needs. There is now an XSD schema definition that projects like ProxyMusic use to create MusicXML object models. ProxyMusic creates these in Java, but you should be able to do something similar with other XML data binding tools in other languages.
As one MusicXML customer put it:
"A very important benefit of all of your hard work on MusicXML as far as I am concerned is that I use it as a clear, structured and very ‘real-world practical’ specification of what music ‘is’ in order to design and implement my application’s internal data structures."
There's much more information available - XSDs and DTDs, sample files, a tutorial, a list of supported applications, a list of publications, and more - at
http://www.makemusic.com/musicxml
MIDI is not a very good model for a simple musical score in standard notation. MIDI lacks many of the basic concepts of music notation. It was designed to be a performance format, not a notation format.
It is true that music notation is not hierarchical. Since XML is hierarchical, MusicXML uses paired start-stop elements for representing non-hierarchical information. A native data structure can represent things more directly, which is one reason that MusicXML is just a starting point for the data structure.
For a more direct way of representing music notation that captures its simultaneous horizontal and vertical structure, look at the Humdrum format, which uses more of a spreadsheet/lattice model. Humdrum is especially used in musicology and music analysis applications where its data structure works particularly well.
MIDI files would be the usual way to do this. MIDI is a standard format for storing data about musical notes, including start and end times, note volume, which instrument it's played on, and various special characteristics; you can find plenty of prewritten libraries (including some open source) for reading and writing the files and representing the data in them in terms of arrays or objects, though they don't usually do it by having an object for each note, which would add up to a lot of memory overhead.
The instruments defined in MIDI are just numbers from 1 to 128 which have symbolic names, like violin or trumpet, but MIDI itself doesn't say anything about what the instruments should actually sound like. That is the job of a synthesizer, which takes the high-level MIDI data and converts it into sound. In principle, yes, you can create any sound by superposing sine waves, but that doesn't work that well in practice because it becomes computationally intensive once you get to playing a few tracks in parallel; also, a simple Fourier spectrum (the relative intensities of the sine waves) is just not adequate when you're trying to reproduce the real sound of an instrument and the expressiveness of a human playing it. (I've written a simple synthesizer to do just that, so I know how hard it can be to produce a decent sound.) There's a lot of research being done in the science of synthesis, and more generally DSP (digital signal processing), so you should certainly be able to find plenty of books and web pages to read about it if you'd like.
Also, this may only be tangentially related to the question, but you might be interested in an audio programming language called ChucK. It was designed by people at the crossroads of programming and music, and you can probably get a good idea of the current state of sound synthesis by playing around with it.
Music in a data structure, standard notation, ...
Sounds like you would be interested in LilyPond.
Most things about musical notation are almost purely mechanical (there are rules and guidelines even for the complex, non-trivial parts of notation), and LilyPond does a beautiful job of taking care of all those mechanical aspects. What's left is input files that are simple to write in any text editor. In addition to PDFs, LilyPond can also produce Midi files.
If you felt so inclined, you could generate the text files algorithmically with a program and call LilyPond to convert it to notation and a MIDI file for you.
I doubt you could find a more complete and concise way to express music than an input file for LilyPond.
Please understand that music and musical notation is not hierarchical and can't be modelled (well) by strict adherence to hierarchical thinking. Read this for more information on that subject.
Have fun!
Hmmm, fun problem.
Actually, I'd be tempted to turn it into Command pattern along with Composite. This is kind of turning the normal OO approach on its head, as you are in a sense making the modeled objects verbs instead of nouns. It would go like this:
a Note is a class with one method, play(), and a ctor taking a length and a tone.
you need an Instrument which defines the behavior of the synth: timbre, attack, and so on.
You would then have a Score, which has a TimeSignature, and is a Composite pattern containing Measures; the Measures contain the Notes.
Actually playing it means interpreting some other things, like Repeats and Codas, which are other Containers. To play it, you interpret the hierarchical structure of the Composite, inserting a note into a queue; as the notes move through the queue based on the tempi, each Note has its play() method called.
Hmmm, might invert that; each Note is given as input to the Instrument, which interprets it by synthesizing the wave form as required. That comes back around to something like your original scheme.
Another approach to the decomposition is to apply Parnas' Law: you decompose in order to keep secret places where requirements could change. But I think that ends up with a similar decomposition; You can change the time signature and the tuning, you can change the instrument --- a Note doesn't care if you play it on a violin, a piano, or a marimba.
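A compressed sketch of that shape (names and fields invented for illustration): the Command is play(), the Composite forwards it to its children, and the Instrument decides what a Note actually sounds like.
#include <memory>
#include <vector>

struct Instrument {                         // timbre, attack, etc. live here
    virtual void sound(int pitch, double lengthBeats) = 0;
    virtual ~Instrument() = default;
};

struct ScoreElement {                       // the Command: play()
    virtual void play(Instrument& instrument) = 0;
    virtual ~ScoreElement() = default;
};

struct Note : ScoreElement {
    int pitch;
    double lengthBeats;
    Note(int p, double l) : pitch(p), lengthBeats(l) {}
    void play(Instrument& instrument) override { instrument.sound(pitch, lengthBeats); }
};

struct Composite : ScoreElement {           // Score, Measure, Repeat, Coda...
    std::vector<std::unique_ptr<ScoreElement>> children;
    void play(Instrument& instrument) override {
        for (auto& child : children) child->play(instrument);  // a Repeat would loop here
    }
};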
Interesting problem.
My music composition software (see my profile for the link) uses Notes as the primary unit (with properties like starting position, length, volume, balance, release duration etc.). Notes are grouped into Patterns (which have their own starting positions and repetition properties) which are grouped into Tracks (which have their own instrument or instruments).
Blending sine waves is one method of synthesizing sounds, but it's pretty rare (it's expensive and doesn't sound very good). Wavetable synthesis (which my software uses) is computationally inexpensive and relatively easy to code, and is essentially unlimited in the variety of sounds it can produce.
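As a toy illustration of the wavetable idea (not my software's actual code): one pre-computed cycle of a waveform is read back through a phase accumulator at the desired pitch. The table size, sample rate, and sine-only table below are placeholders.
#include <cmath>
#include <vector>

struct WavetableVoice {
    static const int kTableSize = 2048;
    std::vector<float> table;
    double phase = 0.0;
    double sampleRate;

    explicit WavetableVoice(double rate) : table(kTableSize), sampleRate(rate) {
        const double kPi = 3.14159265358979323846;
        for (int i = 0; i < kTableSize; ++i)   // one pre-computed cycle; a sampled instrument cycle works too
            table[i] = static_cast<float>(std::sin(2.0 * kPi * i / kTableSize));
    }

    float next(double frequencyHz) {           // produce one output sample
        float sample = table[static_cast<int>(phase) % kTableSize];
        phase += frequencyHz * kTableSize / sampleRate;   // advance by the desired pitch
        if (phase >= kTableSize) phase -= kTableSize;
        return sample;
    }
};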
The usefulness of a model can only be evaluated within a given context. What is it you are trying to do with this model?
Many respondents have said that music is non-hierarchical. I sort of agree with this, but instead suggest that music can be viewed hierarchically from many different points of view, each giving rise to a different hierarchy. We may want to view it as a list of voices, each of which has notes with on/off/velocity/etc attributes. Or we may want to view it as vertical sonorities for the purpose of harmonic analysis. Or we may want to view it in a way suitable for contrapuntal analysis. Or many other possibilities. Worse still, we may want to see it from these different points of view for a single purpose.
Having made several attempts to model music for the purposes of generating species counterpoint, analysing harmony and tonal centers, and many other things, I have been continuously frustrated by music's reluctance to yield to my modelling skills. I'm beginning to think that the best model may be relational, simply because to a large extent, models based on the relational model of data strive not to take a point of view about the context of use. However, that may simply be pushing the problem somewhere else.

Should LOC counting include tests and comments?

While LOC (# lines of code) is a problematic measurement of a code's complexity, it is the most popular one, and when used very carefully, can provide a rough estimate of at least relative complexities of code bases (i.e. if one program is 10KLOC and another is 100KLOC, written in the same language, by teams of roughly the same competence, the second program is almost certainly much more complex).
When counting lines of code, do you prefer to count comments in? What about tests?
I've seen various approaches to this. Tools like cloc and sloccount allow you to either include or exclude comments. Other people consider comments part of the code and its complexity.
The same dilemma exists for unit tests, that can sometimes reach the size of the tested code itself, and even exceed it.
I've seen approaches all over the spectrum, from counting only "operational" non-comment, non-blank lines, to "XXX lines of tested, commented code", which is more like running "wc -l" on all code files in the project.
What is your personal preference, and why?
A wise man once told me 'you get what you measure' when it comes to managing programmers.
If you rate them in their LOC output amazingly you tend to get a lot of lines of code.
If you rate them on the number of bugs they close out, amazingly you get a lot of bugs fixed.
If you rate them on features added, you get a lot of features.
If you rate them on cyclomatic complexity you get ridiculously simple functions.
Since one of the major problems with code bases these days is how quickly they grow and how hard they are to change once they've grown, I tend to shy away from using LOC as a metric at all, because it drives the wrong fundamental behavior.
That said, if you have to use it, count sans comments and tests and require a consistent coding style.
But if you really want a measure of 'code size' just tar.gz the code base. It tends to serve as a better rough estimate of 'content' than counting lines, which is susceptible to different programming styles.
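If you do go the "sans comments" counting route, even a naive counter makes the style problem obvious; this sketch only understands blank lines and whole-line // comments, which is exactly why a consistent coding style matters.
#include <fstream>
#include <iostream>
#include <string>

int count_loc(std::istream& in) {
    int loc = 0;
    std::string line;
    while (std::getline(in, line)) {
        const auto first = line.find_first_not_of(" \t");
        if (first == std::string::npos) continue;          // blank line
        if (line.compare(first, 2, "//") == 0) continue;   // whole-line comment
        ++loc;                                              // everything else counts
    }
    return loc;
}

int main(int argc, char** argv) {
    if (argc < 2) return 1;
    std::ifstream file(argv[1]);
    std::cout << count_loc(file) << "\n";
}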
Tests and comments have to be maintained too. If you're going to use LOC as a metric (and I'm just going to assume that I can't talk you out of it), you should give all three (lines of real code, comments, tests).
The most important (and hopefully obvious) thing is that you be consistent. Don't report one project with just the lines of real code and another with all three combined. Find or create a tool that will automate this process for you and generate a report.
Lines of Code: 75,000
Lines of Comments: 10,000
Lines of Tests: 15,000
---------
Total: 100,000
This way you can be sure it will
Get done.
Get done the same way every time.
I personally don't feel that the LOC metric on its own is as useful as some of the other code metrics.
NDepend will give you the LOC metric but will also give you many others, such as cyclomatic complexity. Rather than list them all, here's the link to the list.
There is also a free CodeMetric add-in for Reflector
I'm not going to directly answer your question for a simple reason: I hate the lines of code metric. No matter what you're trying to measure, it's very hard to do worse than LOC; pretty much any other metric you care to think of is going to be better.
In particular, you seem to want to measure the complexity of your code. Overall cyclomatic complexity (also called McCabe's complexity) is a much better metric for this.
Routines with a high cyclomatic complexity are the routines you want to focus your attention on. It's these routines that are difficult to test, rotten to the core with bugs and hard to maintain.
There are many tools that measure this sort of complexity. A quick Google search on your favourite language will find dozens of tools that do so.
Lines of Code means exactly that: no comments or empty lines are counted. And in order for it to be comparable to other source code (no matter whether the metric in itself is helpful or not), you need at least similar coding styles:
for (int i = 0; i < list.count; i++)
{
    // do some stuff
}

for (int i = 0; i < list.count; i++) {
    // do some stuff
}
The second version does exactly the same, but has one LOC less. When you have a lot of nested loops, this can sum up quite a bit. Which is why metrics like function points were invented.
Depends on what you are using the LOC for.
As a complexity measure - not so much. Maybe the 100KLOC is mostly code generated from a simple table, and the 10KLOC has 5KLOC of regexps.
However, I see every line of code associated with a running cost. You pay for every line as long as the program lives: it needs to be read when maintained, it might contain an error that needs to be fixed, it increases compile time, get-from-source-control and backup times, before you change or remove it you may need to find out if anyone relies on it etc. The average cost may be nanopennies per line and day, but it's stuff that adds up.
KLOC can be a first shot indicator of how much infrastructure a project needs. In that case, I would include comments and tests - even though the running cost of a comment line is much lower than one of the regexp's in the second project.
[edit] [someone with a similar opinion about code size]
We only use a lines of code metric for one thing - a function should contain few enough lines of code to be read without scrolling the screen. Functions bigger than that are usually hard to read, even if they have a very low cyclomatic complexity. For this use we do count whitespace and comments.
It can also be nice to see how many lines of code you've removed during a refactor - here you only want to count actual lines of code, whitespace that doesn't aid readability and comments that aren't useful (which can't be automated).
Finally a disclaimer - use metrics intelligently. A good use of metrics is to help answer the question 'which part of the code would benefit most from refactoring' or 'how urgent is a code review for the latest checkin?' - a 1000 line function with a cyclomatic complexity of 50 is a flashing neon sign saying 'refactor me now'. A bad use of metrics is 'how productive is programmer X' or 'How complicated is my software'.
Excerpt from the article "How do you count your number of Lines Of Code (LOC)?", about the tool NDepend, which counts the logical number of lines of code for .NET programs.
How do you count your number of Lines Of Code (LOC) ?
Do you count method signature declarations? Do you count lines with only a bracket? Do you count several lines when a single method call is written on several lines because of a high number of parameters? Do you count ‘namespaces’ and ‘using namespace’ declarations? Do you count interface and abstract method declarations? Do you count field assignments when they are declared? Do you count blank lines?
Depending on the coding style of each developer and on the language chosen (C#, VB.NET…), there can be a significant difference when measuring the LOC.
Apparently, measuring the LOC by parsing source files is a complex subject. Fortunately, there exists a simple way to measure exactly what is called the logical LOC. The logical LOC has 2 significant advantages over the physical LOC (the LOC that is inferred from parsing source files):
Coding style doesn’t interfere with logical LOC. For example the LOC won’t change because a method call is spawn on several lines because of a high number of arguments.
Logical LOC is independent from the language. Values obtained from assemblies written with different languages are comparable and can be summed.
In the .NET world, the logical LOC can be computed from the PDB files, the files that are used by the debugger to link the IL code with the source code. The tool NDepend computes the logical LOC for a method this way: it is equal to the number of sequence points found for the method in the PDB file. A sequence point is used to mark a spot in the IL code that corresponds to a specific location in the original source. More info about sequence points here. Notice that sequence points which correspond to the C# braces ‘{‘ and ‘}’ are not taken into account.
Obviously, the LOC for a type is the sum of its methods’ LOC, the LOC for a namespace is the sum of its types’ LOC, the LOC for an assembly is the sum of its namespaces’ LOC and the LOC for an application is the sum of its assemblies LOC. Here are some observations:
Interfaces, abstract methods and enumerations have a LOC equal to 0. Only concrete code that is effectively executed is considered when computing LOC.
Namespaces, types, fields and methods declarations are not considered as line of code because they don’t have corresponding sequence points.
When the C# or VB.NET compiler faces an inline instance field initialization, it generates a sequence point for each of the instance constructors (the same remark applies for inline static field initialization and the static constructor).
LOC computed from an anonymous method doesn’t interfere with the LOC of its outer declaring methods.
The overall ratio between NbILInstructions and LOC (in C# and VB.NET) is usually around 7.
