Cross version line matching - debugging

I'm considering how to do automatic bug tracking and as part of that I'm wondering what is available to match source code line numbers (or more accurate numbers mapped from instruction pointers via something like addr2line) in one version of a program to the same line in another. (Assume everything is in some kind of source control and is available to my code)
The simplest approach would be to use a diff tool/lib on the files and do some math on the line number spans, however this has some limitations:
It doesn't handle cross file motion.
It might not play well with lines that get changed
It doesn't look at the information available in the intermediate versions.
It provides no way to manually patch up lines when the diff tool gets things wrong.
It's kinda clunky
Before I start diving into developing something better:
What already exists to do this?
What features do similar system have that I've not thought of?

Why do you need to do this? If you use decent source version control, you should have access to old versions of the code, you can simply provide a link to that so people can see the bug in its original place. In fact the main problem I see with this system is that the bug may have already been fixed, but your automatic line tracking code will point to a line and say there's a bug there. Seems this system would be a pain to build, and not provide a whole lot of help in practice.

My suggestion is: instead of trying to track line numbers, which as you observed can quickly get out of sync as software changes, you should decorate each assertion (or other line of interest) with a unique identifier.
Assuming you're using C, in the case of assertions, this could be as simple as changing something like assert(x == 42); to assert(("check_x", x == 42)); -- this is functionally identical, due to the semantics of the comma operator in C and the fact that a string literal will always evaluate to true.
Of course this means that you need to identify a priori those items that you wish to track. But given that there's no generally reliable way to match up source line numbers across versions (by which I mean that for any mechanism you could propose, I believe I could propose a situation in which that mechanism does the wrong thing) I would argue that this is the best you can do.
Another idea: If you're using C++, you can make use of RAII to track dynamic scopes very elegantly. Basically, you have a Track class whose constructor takes a string describing the scope and adds this to a global stack of currently active scopes. The Track destructor pops the top element off the stack. The final ingredient is a static function Track::getState(), which simply returns a list of all currently active scopes -- this can be called from an exception handler or other error-handling mechanism.

Related

How can one get a list of Mathematica's built-in global rewrite rules?

I understand that over a thousand built-in rewrite rules in Mathematica populate the global rules table by default. Is there any way to get Mathematica to give a full or even partial list of those rules?
The best way is to get a job at Wolfram Research.
Failing that, I think that for things not completely compiled into the kernel you can recover most of the rules/definitions. Look at
Attributes[fn]
where fn is the command that you're interested in. If it returns
{Protected, ReadProtected}
then there's something you can get a look at (although often it's just a MakeBoxes (formatting) definition or a AutoLoad/Stub type definition). To see what's there run
Unprotect[fn];
ClearAttributes[fn, ReadProtected];
??fn
Quite often you'll have to run an example of the command to load it if it was a stub. You'll also have to dig down from the user-facing commands to the back-end implementations.
Eventually you'll most likely reach a core command that is compiled into the kernel that you can not see the details of.
I previously mentioned this in tips for creating Graph diagrams and it got a mention in What is in your Mathematica tool bag?.
An good example, with a nice bite-sized and digestible bit of code is Experimental`AngularSlider[] mentioned in Circular/Angular slider. I'll leave it up to you to look at the code produced.
Another example is something like BoxWhiskerChart, where you need to call it once in order to load all of the code. Then you see that BoxWhiskerChart proceeds to call Charting`iBoxWhiskerChart which you'll have to unprotect to look at, etc...

Stop Visual Basic 6 from changing my casing

Very simple question that is apparently impossible to find a decent answer to: How can I make Visual Basic 6 stop changing my ^##*ing variable casing!?!
I know that the general opinion of a great many VB users is that this "feature" is actually quite helpful, but I doubt that they use it much with any source control system. This is absolutely INFURIATING when you are trying to collaborate on a project of any significant size with several other developers. If ignored, you produce thousands of false-positive "changes" to your files (even ones with no actual code changes!) that pollute the revision history and make it near impossible in some cases to locate the actual change that took place.
If you don't ignore it (like my office, where we have been forced to implement a "no unneeded case change" policy), you spend 5x the time you would normally on each commit because you have to carefully revert out VB's "corrections" on every file, sometimes reverting hundreds of lines to put in a one line change.
Surely there must be a setting, plugin, hack, etc. out there that can remove this unwanted "feature"? I am willing to take any method I can get as long as it doesn't require me to pick through piles of phantom diffs. And to squash a couple of complaints up front: No, I can't turn off case detection in my diff tool, that's not the point. No, we can't just make the case changes globally. We're working with hundreds of thousands of LOC being worked on by multiple developers spanning many years of development. Synchronizing that is not feasible from a business standpoint. And, finally: No, we cannot upgrade to VB.net or port to another language (as much as I would love to).
(And yes, I am just a tiny bit peeved at the moment. Can you tell? My apologies, but this is costing me time and my company money, and I don't find that acceptable.)
Depending on your situation adding
#If False Then
Dim CorrectCase
#End If
might help.
Here is a real world scenario and how we solved it for our 350k LOC VB6 project.
We are using Janus Grid and at some point all the code lines which referenced DefaultValue property of JSColumn turned to defaultValue. This was an opportunity to debug the whole IDE nuisance.
What I found was that a reference to MSXML has just been added and now the IDE picks up ISchemaAttributes' defaultValue property before the Janus Grid typelib.
After some experiments I found out that the IDE collects "registered" identifiers in the following order:
Referenced Libraries/Projects from Project->References in the order they are listed
Controls from Project->Components (in unknown order)
Source Code
So the simple fix we did was to create a dummy class/interface with methods that hold our proper casing. Since we already had a project-wide typelib we referenced from every project before anything other typelib, this was painless to do.
Here is part of the IDL for our IUcsVbIntellisenseFix interface:
[
odl,
uuid(<<guid_here>>),
version(1.0),
dual,
nonextensible,
oleautomation
]
interface IUcsVbIntellisenseFix : IDispatch {
[id(1)] HRESULT DefaultValue();
[id(2)] HRESULT Selector();
[id(3)] HRESULT Standalone();
...
}
We added a lot of methods to IUcsVbIntellisenseFix, some of them named after enum items we used to misspell and whatever we wanted to fix. The same can be done with a simple VB class in a common library (ActiveX DLL) that's referenced from every project.
This way our source code at some point converged to proper casing because upon check-out the IDE actually fixed the casing as per IUcsVbIntellisenseFix casing. Now we can't misspell enums, methods or properties even if we try to.
SIMPLE WAY: Dim each variable in the case that you want. Otherwise, VBA will change it in a way that is not understandable.
Dim x, X1, X2, y, Yy as variant
in a subroutine will change ALL cases to those in the Dim statement
I can sympathise. Luckily we're allowed to turn off case sensitivity in our version control diff tool!
It seems the VB6 IDE automatic case-correction occasionally changes case in variable declarations and references, perhaps depending on the order in which modules are listed in the VBP file? But the IDE doesn't tell you that the file needs to be saved. So the problem only shows up when you saved the file because of another edit. We briefly tried to prevent this by checking out all the files in a project and setting the case carefully, but it didn't go away.
I suppose you could list the variable names that are affected - the usual suspects are one letter names like "I", "X" and "Y", perhaps because they are used in standard event handlers like MouseDown. Then write an add-in that'll search for all declarations " As" and force the case to upper. Run the add-in on your modules before you check them in. You might be able to trigger the add-in to run automatically when you save in VB6.
EDIT: Something I've just thought of: adapt Fred's answer. From now on, every time you check in a file, add a block at the top to establish canonical case for the usual suspects. If nothing else, it's easier than reverting hundreds of lines by hand. Eventually you will have this block in every file & maybe then the problem will stop happening.
#If False Then
Dim I, X, Y ' etc '
#End If
I standardised the case across the codebase, normally by using the examples above (Dim CorrectCase), and removing it again.
I then triggered VB to save EVERY file, by doing a case sensitive search/replace of "End" with "End" (no functional change, but enough to get VB to resave).
Once that was done, I could then do a single commit to standardise the case, making it MUCH easier to keep on top of it at a later date.
In this example VB6 was changing the case of the following line following a typo I made when referencing a library: -
Dim MyRecordset As ADODB.REcordset
Ugly, and now every other instance of an ADODB.REcordset thus acquired the new misspelling. I fixed this as follows: -
Type in a new declaration as follows
Dim VB6CasingSucks AS ADODB, Recordset
Note the comma and space after ADODB. Hit [ENTER] for VB6 to check the line.
At this point all instances of REcordset change back to Recordset.
Delete your new declaration.
I don't know if this fix will help with enums/other variable names.
Specifically for controlling the case of enum values, there is a VB6 IDE add-in which may be helpful. Enums seem to have a slightly unique version of this problem.
As described in the link below:
The VB6 IDE has an annoying quirk when it comes to the case of Enum
members. Unlike with other identifiers, the IDE doesn't enforce the
case of an Enum member as it was declared in the Enum block. That
occasionally causes an Enum member that was manually written to lose
its original case, unless a coder typed it carefully enough.
...
However, if a project contains a lot of Enums and/or a particular Enum
has a lot of members, redeclaring the members in each of them can get
quite tedious fast. ...
Ref: http://www.vbforums.com/showthread.php?778109-VB6-modLockEnumCase-bas-Enforce-Case-of-Enums
...load and unload the add-in as needed via the Add-In Manager
dialog box. Usage is as simple as selecting the entire Enum block,
right-clicking and then choosing the "Lock Enum Case" context menu
item.
I have a similar problem:
in a bas module there I wrote :
Private sub bla_bla()
Dim K as integer
End Sub
so in a class module the Dim k as integer will automatically be replaced by IDE become 'Dim K as integer' <-- it's not logical but then:
I correct the bas module become:
Private sub bla_bla()
Dim k as integer
End Sub
then magically the problem in the class module was solved (still be k and not automatically replaced by IDE become K). Sorry I'm poor in English
I don't think there's any to do it. The IDE will change the case of the variable name to whatever it is when it's declared. But, honestly, back in the day I worked on several large VB6 projects and never found this to be a problem. Why are people on your development team constantly changing variable declarations? It seems like you have not established a clear variable naming policy that you enforce. I know your upset, so no offense, but it might be your policies that are lacking in this regard.
Unfortunately, according to this SO thread, alternate VB6 IDEs are hard to come by. So, your best bet is to solve this problem via policy. Or move to VB.NET. :)
Wow. I've spent a lot of time programming in VB6 and I have no idea what you're on about. The only thing I can think you're referring to is that intellisense will change the capitalization of variable names to match their declarations. If you're complaining about that, I would have to wonder why the hell they've been entered any other way to begin with. And if that is your problem, no, there's no way to disable it that I'm aware of. I'd suggest you, in one go, check out every file, make sure the caps on the declarations and uses of variables all match and check back in.

Should a method parameter name specify its unit in its name?

Of the following two options for method parameter names that have a unit as well as a value, which do you prefer and why? (I've used Java syntax, but my question would apply to most languages.)
public void move(int length)
or
public void move(int lengthInMetres)
Option (1) would seem to be sufficient, but I find that when I'm coding/typing, my IDE can indicate to me I need a length value, but I typically have to break stride and look up the method's doco to determine the units, so that I pass in the correct value (and not kilometres instead of metres for example). This can be an annoying interruption to a thought process. Option (2) alleviates this problem, but can be verbose, particularly if your unit is metresPerSecondSquared or some such. Which do you think is the best?
I would recommend making your parameter (and method) names as clear as possible, even if they become wordy. You'll be glad when you look at or use the code in 6 months time, or when someone else has to look at your code.
If you think the names are becoming too long, consider rewording them. In your example you could use parameter name int Metres that would probably be clear enough. Consider changing the method name, eg public void moveMetres(int length).
In Visual Studio, the XML comments generated when you enter 3 comment symbols above a method definition will appear in Intellisense hints when you use the method in other locations. Other IDEs may have similar functionality.
Abbreviations should be used sparingly. If absolutely necessary only use commonly known and/or relevant industry-standard abbreviations and be consistent, ie use the same abbreviation everywhere.
Take a step back. Write the code then move on to something else. Come back the next day and check to see if the names are still clear.
Peer reviews can help too. Ask someone who knows the programming language (or just thinks logically), but not the specific functionality, if your naming scheme is clear enough or to help brainstorm alternatives. They might be the poor sap who has to maintain your code in the future!
I would prefer the second approach (i.e. lengthInMeters) as it describes the input needed for the method accurately. The fact that you find it confusing to figure out the units when you are just writing the code would imply it would be much more confusing when you (or some one) looks at the same piece of code later. As regard to issue of the variable name being longer you can find ways to abbreviate it (say "mtrsPerSecondSquared").
Also in defence second approach, the book Code Complete mentions a research that indicates, effort required to debug a program was minimized when variables had names averaged to 10 to 16 characters.

Standards Document

I am writing a coding standards document for a team of about 15 developers with a project load of between 10 and 15 projects a year. Amongst other sections (which I may post here as I get to them) I am writing a section on code formatting. So to start with, I think it is wise that, for whatever reason, we establish some basic, consistent code formatting/naming standards.
I've looked at roughly 10 projects written over the last 3 years from this team and I'm, obviously, finding a pretty wide range of styles. Contractors come in and out and at times, and sometimes even double the team size.
I am looking for a few suggestions for code formatting and naming standards that have really paid off ... but that can also really be justified. I think consistency and shared-patterns go a long way to making the code more maintainable ... but, are there other things I ought to consider when defining said standards?
How do you lineup parenthesis? Do you follow the same parenthesis guidelines when dealing with classes, methods, try catch blocks, switch statements, if else blocks, etc.
Do you line up fields on a column? Do you notate/prefix private variables with an underscore? Do you follow any naming conventions to make it easier to find particulars in a file? How do you order the members of your class?
What about suggestions for namespaces, packaging or source code folder/organization standards? I tend to start with something like:
<com|org|...>.<company>.<app>.<layer>.<function>.ClassName
I'm curious to see if there are other, more accepted, practices than what I am accustomed to -- before I venture off dictating these standards. Links to standards already published online would be great too -- even though I've done a bit of that already.
First find a automated code-formatter that works with your language. Reason: Whatever the document says, people will inevitably break the rules. It's much easier to run code through a formatter than to nit-pick in a code review.
If you're using a language with an existing standard (e.g. Java, C#), it's easiest to use it, or at least start with it as a first draft. Sun put a lot of thought into their formatting rules; you might as well take advantage of it.
In any case, remember that much research has shown that varying things like brace position and whitespace use has no measurable effect on productivity or understandability or prevalence of bugs. Just having any standard is the key.
Coming from the automotive industry, here's a few style standards used for concrete reasons:
Always used braces in control structures, and place them on separate lines. This eliminates problems with people adding code and including it or not including it mistakenly inside a control structure.
if(...)
{
}
All switches/selects have a default case. The default case logs an error if it's not a valid path.
For the same reason as above, any if...elseif... control structures MUST end with a default else that also logs an error if it's not a valid path. A single if statement does not require this.
In the occasional case where a loop or control structure is intentionally empty, a semicolon is always placed within to indicate that this is intentional.
while(stillwaiting())
{
;
}
Naming standards have very different styles for typedefs, defined constants, module global variables, etc. Variable names include type. You can look at the name and have a good idea of what module it pertains to, its scope, and type. This makes it easy to detect errors related to types, etc.
There are others, but these are the top off my head.
-Adam
I'm going to second Jason's suggestion.
I just completed a standards document for a team of 10-12 that work mostly in perl. The document says to use "perltidy-like indentation for complex data structures." We also provided everyone with example perltidy settings that would clean up their code to meet this standard. It was very clear and very much industry-standard for the language so we had great buyoff on it by the team.
When setting out to write this document, I asked around for some examples of great code in our repository and googled a bit to find other standards documents that smarter architects than I to construct a template. It was tough being concise and pragmatic without crossing into micro-manager territory but very much worth it; having any standard is indeed key.
Hope it works out!
It obviously varies depending on languages and technologies. By the look of your example name space I am going to guess java, in which case http://java.sun.com/docs/codeconv/ is a really good place to start. You might also want to look at something like maven's standard directory structure which will make all your projects look similar.

How do you feel about code folding? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
For those of you in the Visual Studio environment, how do you feel about wrapping any of your code in #regions? (or if any other IDE has something similar...)
9 out of 10 times, code folding means that you have failed to use the SoC principle for what its worth.
I more or less feel the same thing about partial classes. If you have a piece of code you think is too big you need to chop it up in manageable (and reusable) parts, not hide or split it up.It will bite you the next time someone needs to change it, and cannot see the logic hidden in a 250 line monster of a method.
Whenever you can, pull some code out of the main class, and into a helper or factory class.
foreach (var item in Items)
{
//.. 100 lines of validation and data logic..
}
is not as readable as
foreach (var item in Items)
{
if (ValidatorClass.Validate(item))
RepositoryClass.Update(item);
}
My $0.02 anyways.
This was talked about on Coding Horror.
My personal belief is that is that they are useful, but like anything in excess can be too much.
I use it to order my code blocks into:
Enumerations
Declarations
Constructors
Methods
Event Handlers
Properties
Sometimes you might find yourself working on a team where #regions are encouraged or required. If you're like me and you can't stand messing around with folded code you can turn off outlining for C#:
Options -> Text Editor -> C# -> Advanced Tab
Uncheck "Enter outlining mode when files open"
I use #Region to hide ugly and useless automatically generated code, which really belongs in the automatically generated part of the partial class. But, when working with old projects or upgraded projects, you don't always have that luxury.
As for other types of folding, I fold Functions all the time. If you name the function well, you will never have to look inside unless you're testing something or (re-)writing it.
While I understand the problem that Jeff, et. al. have with regions, what I don't understand is why hitting CTRL+M,CTRL+L to expand all regions in a file is so difficult to deal with.
I use Textmate (Mac only) which has Code folding and I find it really useful for folding functions, I know what my "getGet" function does, I don't need it taking up 10 lines of oh so valuable screen space.
I never use it to hide a for loop, if statement or similar unless showing the code to someone else where I will hide code they have seen to avoid showing the same code twice.
I prefer partial classes as opposed to regions.
Extensive use of regions by others also give me the impression that someone, somewhere, is violating the Single Responsibility Principle and is trying to do too many things with one object.
#Tom
Partial classes are provided so that you can separate tool auto-generated code from any customisations you may need to make after the code gen has done its bit. This means your code stays intact after you re-run the codegen and doesn't get overwritten. This is a good thing.
I'm not a fan of partial classes - I try to develop my classes such that each class has a very clear, single issue for which it's responsible. To that end, I don't believe that something with a clear responsibility should be split across multiple files. That's why I don't like partial classes.
With that said, I'm on the fence about regions. For the most part, I don't use them; however, I work with code every day that includes regions - some people go really heavy on them (folding up private methods into a region and then each method folded into its own region), and some people go light on them (folding up enums, folding up attributes, etc). My general rule of thumb, as of now, is that I only put code in regions if (a) the data is likely to remain static or will not be touched very often (like enums), or (b) if there are methods that are implemented out of necessity because of subclassing or abstract method implementation, but, again, won't be touched very often.
Regions must never be used inside methods. They may be used to group methods but this must be handled with extreme caution so that the reader of the code does not go insane. There is no point in folding methods by their modifiers. But sometimes folding may increase readability. For e.g. grouping some methods that you use for working around some issues when using an external library and you won't want to visit too often may be helpful. But the coder must always seek for solutions like wrapping the library with appropriate classes in this particular example. When all else fails, use folding for improving readibility.
This is just one of those silly discussions that lead to nowhere. If you like regions, use them. If you don't, configure your editor to turn them off. There, everybody is happy.
I generally find that when dealing with code like Events in C# where there's about 10 lines of code that are actually just part of an event declaration (the EventArgs class the delegate declaration and the event declaration) Putting a region around them and then folding them out of the way makes it a little more readable.
Region folding would be fine if I didn't have to manually maintain region groupings based on features of my code that are intrinsic to the language. For example, the compiler already knows it's a constructor. The IDE's code model already knows it's a constructor. But if I want to see a view of the code where the constructors are grouped together, for some reason I have to restate the fact that these things are constructors, by physically placing them together and then putting a group around them. The same goes for any other way of slicing up a class/struct/interface. What if I change my mind and want to see the public/protected/private stuff separated out into groups first, and then grouped by member kind?
Using regions to mark out public properties (for example) is as bad as entering a redundant comment that adds nothing to what is already discernible from the code itself.
Anyway, to avoid having to use regions for that purpose, I wrote a free, open source Visual Studio 2008 IDE add-in called Ora. It provides a grouped view automatically, making it far less necessary to maintain physical grouping or to use regions. You may find it useful.
I think that it's a useful tool, when used properly. In many cases, I feel that methods and enumerations and other things that are often folded should be little black boxes. Unless you must look at them for some reason, their contents don't matter and should be as hidden as possible. However, I never fold private methods, comments, or inner classes. Methods and enums are really the only things I fold.
My approach is similar to a few others here, using regions to organize code blocks into constructors, properties, events, etc.
There's an excellent set of VS.NET macros by Roland Weigelt available from his blog entry, Better Keyboard Support for #region ... #endregion. I've been using these for years, mapping ctrl+. to collapse the current region and ctrl++ to expand it. Find that it works a lot better that the default VS.NET functionality which folds/unfolds everything.
I personally use #Regions all the time. I find that it helps me to keep things like properties, declarations, etc separated from each other.
This is probably a good answer, too!
Coding Horror
Edit: Dang, Pat beat me to this!
The Coding Horror article actual got me thinking about this as well.
Generally, I large classes I will put a region around the member variables, constants, and properties to reduce the amount of text I have to scroll through and leave everything else outside of a region. On forms I will generally group things into "member variables, constants, and properties", form functions, and event handlers. Once again, this is more so I don't have to scroll through a lot of text when I just want to review some event handlers.
I prefer #regions myself, but an old coworker couldn't stand to have things hidden. I understood his point once I worked on a page with 7 #regions, at least 3 of which had been auto-generated and had the same name, but in general I think they're a useful way of splitting things up and keeping everything less cluttered.
I really don't have a problem with using #region to organize code. Personally, I'll usually setup different regions for things like properties, event handlers, and public/private methods.
Eclipse does some of this in Java (or PHP with plugins) on its own. Allows you to fold functions and such. I tend to like it. If I know what a function does and I am not working on it, I dont need to look at it.
Emacs has a folding minor mode, but I only fire it up occasionally. Mostly when I'm working on some monstrosity inherited from another physicist who evidently had less instruction or took less care about his/her coding practices.
Using regions (or otherwise folding code) should have nothing to do with code smells (or hiding them) or any other idea of hiding code you don't want people to "easily" see.
Regions and code folding is really all about providing a way to easily group sections of code that can be collapsed/folded/hidden to minimize the amount of extraneous "noise" around what you are currently working on. If you set things up correctly (meaning actually name your regions something useful, like the name of the method contained) then you can collapse everything except for the function you are currently editing and still maintain some level of context without having to actually see the other code lines.
There probably should be some best practice type guidelines around these ideas, but I use regions extensively to provide a standard structure to my code files (I group events, class-wide fields, private properties/methods, public properties/methods). Each method or property also has a region, where the region name is the method/property name. If I have a bunch of overloaded methods, the region name is the full signature and then that entire group is wrapped in a region that is just the function name.
I personally hate regions. The only code that should be in regions in my opinion is generated code.
When I open file I always start with Ctrl+M+O. This folds to method level. When you have regions you see nothing but region names.
Before checking in I group methods/fields logically so that it looks ok after Ctrl+M+O.
If you need regions you have to much lines in your class. I also find that this is very common.
region ThisLooksLikeWellOrganizedCodeBecauseIUseRegions
// total garbage, no structure here
endregion
Enumerations
Properties
.ctors
Methods
Event Handlers
That's all I use regions for. I had no idea you could use them inside of methods.
Sounds like a terrible idea :)

Resources