What does Total lines constitute in ReportGenerator? - mstest

I'm using the open-source tool ReportGenerator to show the details of the XML output generated by OpenCover and MSTest. For one particular project I see the following metrics:
Covered Lines: 3611
Uncovered lines: 3587
Coverable lines: 7198
Total lines: 35609
OK, Covered + Uncovered = Coverable lines, and this makes sense. However, Total lines is significantly above this value, and I want to know what constitutes all the additional lines of code that make up Total lines.
The documentation for ReportGenerator is sparse at best, but I would guess it encompasses comments, whitespace, using statements (for importing namespaces), declarations of methods and classes, and other lines that are not testable. However, I'm not sure, and since I plan on using this tool for a lot of projects, I need to be able to explain what's behind this number.
Does anyone know, or can anyone explain, what the Total lines value comprises beyond the total testable lines?

According to CodeFile.cs it is
string[] lines = System.IO.File.ReadAllLines(this.Path);
this.TotalLines = lines.Length;
and then these are aggregated at the class/assembly level
However, if these counts are based on the files recorded in the PDB, then they will not cover all your source files: a file is only recorded in the PDB if it contains sequence points (i.e. places where you can put a breakpoint). Note also that, since TotalLines is just the raw line count of each such file, comments, blank lines, using directives and declarations are all included, which is why Total lines is so much larger than Coverable lines.
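To see how the extra lines add up, here is a rough sketch, written in Go purely for illustration (the file name and the comment/using heuristics are assumptions, not ReportGenerator's actual logic), counting raw physical lines beside lines that could never be coverable:

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	// Mirror of what CodeFile.cs does: TotalLines is simply the number
	// of physical lines in the source file, nothing else.
	f, err := os.Open("Program.cs") // hypothetical source file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	total, nonCode := 0, 0
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		total++
		t := strings.TrimSpace(scanner.Text())
		// Blank lines, comments and using directives all count toward
		// the total even though they can never be coverable.
		if t == "" || strings.HasPrefix(t, "//") || strings.HasPrefix(t, "using ") {
			nonCode++
		}
	}
	fmt.Println("Total lines:         ", total)
	fmt.Println("Blank/comment/using: ", nonCode)
}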

Related

How many lines of code are in my Stata do-file, excluding comments?

Is there a fast way to see how many lines of code are in my Stata do-file (.do), not counting the comments? I could make a new version and delete all the comments by hand, but that's too tedious for what I need.
My intent is to compare the lengths of an old version vs. a new version of the do file. I want to see whether I have made the code more efficient. However, I have some large commented sections of non-vital code in the files that I don't need to count.
A closely related question: is there a way to quickly see a total of all lines of code in a project (rather than just the do-file) - either including or excluding comments? Thank you.

How to split a large csv file into multiple files in GO lang?

I am a novice Go programmer trying to learn the language's features. I want to split a large CSV file into multiple files in Go, each file containing the header. How do I do this? I have searched everywhere but couldn't find the right solution. Any help in this regard will be greatly appreciated.
Also, please suggest a good book for reference.
Thank you.
Depending on your shell fu, this problem might be better suited to common shell utilities, but you specifically mentioned Go.
Let's think through the problem.
How big is this CSV file? Are we talking 100 lines or is it 5G?
If it's smallish I typically use this:
http://golang.org/pkg/io/ioutil/#ReadFile
However, this package also exists:
http://golang.org/pkg/encoding/csv/
Regardless - let's return to the abstraction of the problem. You have a header (which is the first line) and then the rest of the document.
So what we probably want to do (if ignoring csv for the moment) is to read in our file.
Then we want to split the file body by all the newlines in it.
You can use this to do so:
http://golang.org/pkg/strings/#Split
You didn't mention it, but do you know how many files you want to split into, or would you rather split by line count or byte count? What's the actual limitation here?
Generally it's not going to be file count, but if we pretend it is, we simply divide our line count by our expected file count to get lines per file.
Now we can take slices of the appropriate size and write the file back out via:
http://golang.org/pkg/io/ioutil/#WriteFile
A trick I sometimes use to help me think through these things is to write down our mission statement.
"I want to split a large csv file into multiple files in go"
Then I start breaking that up into pieces, taking the divide-and-conquer approach: don't try to solve the entire problem in one go, just break it up until each piece is small enough to think about.
Also, make gratuitous use of pseudo-code until you can comfortably write the real code itself. Sometimes it helps to just write a short comment inline describing how you think the code should flow, then get it down to the smallest portion that you can code, and work from there.
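Putting the steps above together, here is a minimal sketch under those assumptions: a smallish file read in one go with the ioutil functions linked above, split on raw newlines rather than parsed as real CSV, and a made-up target of 3 output files.

package main

import (
	"fmt"
	"io/ioutil"
	"strings"
)

func main() {
	// Read the whole file at once via the ReadFile link above.
	// (Fine for a smallish file; a 5G file would need streaming.)
	data, err := ioutil.ReadFile("input.csv") // hypothetical input name
	if err != nil {
		panic(err)
	}

	// Split the body on newlines. Note this ignores quoted CSV fields
	// that contain newlines; encoding/csv handles those properly.
	lines := strings.Split(strings.TrimRight(string(data), "\n"), "\n")
	header, body := lines[0], lines[1:]

	const numFiles = 3 // pretend the limitation really is file count
	perFile := (len(body) + numFiles - 1) / numFiles // lines per file, rounded up

	for i := 0; i < numFiles; i++ {
		start, end := i*perFile, (i+1)*perFile
		if start >= len(body) {
			break
		}
		if end > len(body) {
			end = len(body)
		}
		// Every chunk starts with its own copy of the header.
		out := header + "\n" + strings.Join(body[start:end], "\n") + "\n"
		name := fmt.Sprintf("part_%d.csv", i+1)
		if err := ioutil.WriteFile(name, []byte(out), 0644); err != nil {
			panic(err)
		}
	}
}

For a genuinely large file, or CSV data with quoted embedded newlines, you would stream records through encoding/csv instead.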
By the way, many of the golang.org packages have example links where you can literally run the example code in your browser and cut/paste it into your own local environment.
Also, I know I'll catch some hate with this, but as for books: in my opinion you are going to learn a lot faster just by trying to get things working rather than by reading. Action trumps passivity always. Don't be afraid to fail.
Here is a package that might help. You can set the chunk size you need in bytes, and the file will be split into an appropriate number of chunks.

Searching a list of keywords from text files in folders

I have compiled a list of db object names, one name per line, in a text file. For each name, I want to know where it is being used. The search target is a group of folders containing sub-folders of source code.
Before I give up looking for a tool to do this and start creating my own, perhaps you can point me to an existing one.
Ideally, it should be a Windows desktop application. I have not used grep before.
Use grep (there are tons of ports of this command to Windows; search the web).
Alternatively, use AgentRansack.
See our Source Code Search Engine. It indexes a large code base according to the atoms (tokens) of the language(s) of interest, and then uses that index to quickly execute structured queries stated in terms of language elements. It is a kind of super-grep, but it isn't fooled by comments or string literals, and it automatically ignores whitespace. This means you get far fewer false positive hits than you get with grep.
If you had an identifier "foo", the following query would find all mentions:
I=foo
For C and Java, you can constrain the types of identifier accesses to Use, Read, Write or Defines.
D=bar*
would find only declarations of identifiers which started with the letters "bar".
You can write more complex queries using sequences of language tokens:
'int' I=*baz* '['
for C, would find declarations of any variable name that contained the letters "baz" and apparently declared an array.
You can see the hits in a GUI, and one-click navigate to a source code view of any hit.
It is a Windows application. It handles a wide variety of languages: C#, C++, Java, ... and many more.
I created an SSIS package to load my 500+ source code files, distributed across several levels of folders belonging to several projects, into a table, with one row per line from the files (10K+ lines in total).
I then ran a select statement against it, cross-applying the table that keeps the list of 5K+ db object keywords, with the help of RegEx functions for MS SQL Server, http://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/. The query took almost 1.5 hours to complete.
I know it's long-winded, but this is exactly what I need. Thank you for your efforts in guiding me. I would be happy to explain the details further, should anyone get interested in using my method.
insert dbo.DbObjectUsage
select
    do.Id as DbObjectId,
    fl.Id as FileLineId
from
    dbo.FileLine as fl -- 10K+
cross apply
    dbo.DbObject as do -- 5K+
where
    dbo.RegExIsMatch('\b' + do.name + '\b', fl.Line, 0) != 0
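The same word-boundary search can also be run directly over the folder tree, skipping the database round-trip. A rough sketch in Go, reusing the '\b' + name + '\b' pattern from the query above (keywords.txt and ./src are assumed names):

package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"regexp"
	"strings"
)

func main() {
	// Load the keyword list, one db object name per line.
	raw, err := os.ReadFile("keywords.txt")
	if err != nil {
		panic(err)
	}
	var patterns []*regexp.Regexp
	for _, kw := range strings.Fields(string(raw)) {
		// Same word-boundary idea as the RegExIsMatch pattern above.
		patterns = append(patterns, regexp.MustCompile(`\b`+regexp.QuoteMeta(kw)+`\b`))
	}

	// Walk the folder tree and report file:line hits for each keyword.
	err = filepath.Walk("./src", func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() {
			return err
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		scanner := bufio.NewScanner(f)
		for n := 1; scanner.Scan(); n++ {
			for _, p := range patterns {
				if p.MatchString(scanner.Text()) {
					fmt.Printf("%s:%d: %s\n", path, n, p.String())
				}
			}
		}
		return nil
	})
	if err != nil {
		panic(err)
	}
}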

DUnit Compare Two Text Files and show Diff

Is there a way in DUnit to compare two text files and show the diff if they are not identical?
The easy start is to read them into a TStringList; however, the code for comparing two text files is much more complicated, and the GUI in DUnitGui is not sufficient for this.
Any idea? suggestion?
There is a nice little unit called TDiff that comes with some examples; it is available from http://angusj.com/delphi/ and will allow you to compare 2 files and see the differences. It also allows for merging.
It is a very simple utility, and you can download the entire source for it.
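For a feel of what the comparison involves, here is a naive line-by-line sketch, in Go rather than Delphi, purely as a language-neutral illustration (the file names are hypothetical). A real diff engine such as TDiff computes a longest common subsequence so that an inserted or deleted line doesn't make everything after it mismatch:

package main

import (
	"bufio"
	"fmt"
	"os"
)

// readLines loads a text file into a slice of lines.
func readLines(path string) ([]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var lines []string
	s := bufio.NewScanner(f)
	for s.Scan() {
		lines = append(lines, s.Text())
	}
	return lines, s.Err()
}

func main() {
	a, err := readLines("expected.txt") // hypothetical file names
	if err != nil {
		panic(err)
	}
	b, err := readLines("actual.txt")
	if err != nil {
		panic(err)
	}

	// Naive positional comparison: report every line index where the
	// two files disagree.
	n := len(a)
	if len(b) > n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		var la, lb string
		if i < len(a) {
			la = a[i]
		}
		if i < len(b) {
			lb = b[i]
		}
		if la != lb {
			fmt.Printf("line %d:\n- %s\n+ %s\n", i+1, la, lb)
		}
	}
}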

Eliminating code duplication in a single file

Sadly, a project that I have been working on lately has a large amount of copy-and-paste code, even within single files. Are there any tools or techniques that can detect duplication or near-duplication within a single file? I have Beyond Compare 3 and it works well for comparing separate files, but I am at a loss when it comes to comparing a single file against itself.
Thanks in advance.
Edit:
Thanks for all the great tools! I'll definitely check them out.
This project is an ASP.NET/C# project, but I work with a variety of languages including Java; I'm interested in which tools are best (for any language) for removing duplication.
Check out Atomiq. It finds duplicate code that is a prime candidate for extracting to one location.
http://www.getatomiq.com/
If you're using Eclipse, you can use the copy paste detector (CPD) https://olex.openlogic.com/packages/cpd.
You don't say what language you are using, which is going to affect what tools you can use.
For Python there is CloneDigger. It also supports Java, but I have not tried that. It can find code duplication both within a single file and between files, and gives you the result as a diff-like report in HTML.
See SD CloneDR, a tool for detecting copy-paste-edit code within and across multiple files. It detects exact copies, copies that have been reformatted, and near-miss copies with different identifiers, literals, and even different sequences of statements.
CloneDR handles many languages, including Java (1.4, 1.5, 1.6) and C# up to C# 4.0. You can see sample clone detection reports at the website, including one for C#.
ReSharper does this automagically: it suggests when it thinks code should be extracted into a method, and will do the extraction for you.
Check out PMD; once you have configured it (which is fairly simple) you can run its copy-paste detector to find duplicate code.
Someone with basic Office skills can do the following sequence in a minute (the same idea is sketched in code after this list):
1. Use an ordinary formatter to unify the code style, preferably without line wrapping.
2. Feed the code text into Microsoft Excel as a single column.
3. Search-and-replace all double spaces with single ones, and make any other normalizing replacements.
4. Sort the column.
At this point the duplicated lines will already stand out. To go further:
5. Add a comparator formula to a 2nd column and a counter to a 3rd.
6. Copy and paste as values again, sort, and see the most repetitive lines.
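A minimal Go sketch of that normalize, sort, and count-adjacent-duplicates idea (the input file name is an assumption):

package main

import (
	"fmt"
	"os"
	"regexp"
	"sort"
	"strings"
)

func main() {
	data, err := os.ReadFile("Page.aspx.cs") // hypothetical input file
	if err != nil {
		panic(err)
	}

	// Normalize: trim each line and collapse whitespace runs
	// (the "replace all double spaces" step above).
	ws := regexp.MustCompile(`\s+`)
	var lines []string
	for _, l := range strings.Split(string(data), "\n") {
		l = ws.ReplaceAllString(strings.TrimSpace(l), " ")
		if l != "" {
			lines = append(lines, l)
		}
	}

	// Sort so identical lines become adjacent (the "sort column" step),
	// then count runs, like the comparator/counter columns in Excel.
	sort.Strings(lines)
	for i := 0; i < len(lines); {
		j := i
		for j < len(lines) && lines[j] == lines[i] {
			j++
		}
		if j-i > 1 {
			fmt.Printf("%4d x %s\n", j-i, lines[i])
		}
		i = j
	}
}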
There is an analysis tool called Simian, which I haven't yet tried. Supposedly it can be run on any kind of text and point out duplicated items. It is used via a command-line interface.
Another option similar to those above, but with a different tool chain: https://www.npmjs.com/package/jscpd