Python: Detect code which gets never executed in production - refactoring

I need to do refactoring in a big legacy Python code base.
Often I think "these lines don't get executed in production any more".
But I am unsure.
There are some tests which touch these lines. But I can't tell for sure if really no usage happens in production.
What can I do in this situation?
This question is about coverage on a production system. This question is not about coverage during testing/CI.
I don't want to comment out that lines, since I don't want to produce an error in the production system.

Common practice is to use logging inside that lines of code. e.g. you have a block of code you think is not in use. You add try catch block in the beginning of that block of code. Inside trycatch you add line to a specific json named same as your suspicious block of code.
try:
with open("block1.dat", "rb") as file:
activity = pickle.load(file)
curtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
currentact = "dt = {}; code done that: var1 = {},
var2 = {}".format(curdate, var1, var2)
activity.append(currentact)
file = open("block1.dat", "ab")
pickle.dump(activity, file)
file.close()
except Exception: pass
You can use telegram api to log code to. After a while You'll get info how often your code works and what does it do.
Then you monitor for a while and if nothing happens in a month, You can comment the block.

Is the production system deterministic?
Is it interactive?
Does control flow depend on input data?
Do you have access to all possible inputs?
Do the tests exist for a reason or just because?
I'd be careful removing code based on what is needed based on logging unless I knew there are no exceptional situations that occur rarely.
I would follow the common code paths to try to understand the codebase piece by piece in order to figure out what can be simplified. It's hard to give more specific advice without knowing more about the system you're dealing with.

We use a simple pattern to handle this: looks_like_dead_code(my_string)
This is a method which logs the string "my_string".
Example usage:
if ext == '.jpe':
looks_like_dead_code('2018-11-30 tguettler: looks fixed in mime_type_to_extension')
Using the date and the developer login is not enforced, it is just best practice.
If the line gets executed the one who is responsible for checking the logs will talk to the developer.
Since our production environments get updated roughly once in two weeks, you can be sure that this line was not executed during the last months.
I like this solution since in most cases it is like this:
you want to fix a bug or implement a new feature
you look at the code and see some lines which look like dead code. I mean code which is useless, since it won't get executed any more.
You don't have hours of time to investigate. You can dive into your vague guess that this is dead code. You want to do your actual work (fix a bug or implement a new feature. See Step1)
The method looks_like_dead_code() gives you a way to actually do something and leave a note for other developers. It only costs some seconds improve the current situation.
If you have a Tickler file System you can remind yourself to check this code in six months. At least in my context I can be very sure that this is dead code if this line was not executed for several months.

Related

Can I debug dynamically added Ruby method?

I want to store brief snippets of code in the database (following a standard signature) and "inject" them at runtime. One way would be using eval(my_code). Is there some way to debug the injected code using breakpoints, etc? (I'm using Rubymine)
I'm aware I can just log to console, etc, but I'd prefer IDE-style debugging if possible.
Hm. Let's analyze your question. Firstly, it does not seem to have anything to do with databases: You simply store a code block in the source form somewhere. It can be a file, or a database. Secondly, you don't want IDE-style "debugging", but TDD-style. (But don't concentrate on that question now.)
What you need, is assertions about your code. That is, you need to describe what output should your code produce given some input examples. And then, you need to run that code and see whether its function matches the expectations. Furthermore, if you are not sure about the source of your snippets, run them in a sandbox (with $SAFE = 4). If your code fails the expectations, raise nice errors (TypeError, or even better, your custom made exception), and then you can eg. rescue those exceptions and eg. use some default code snippets...
... but maybe I'm not actually answering the same question that you are asking. If that's the case, then let me share one link to this sourcify gem, which let's you know the source, so that you can insert a breakpoint by saying eg. require 'rdebug' in the middle of code, or can even convert code to sexps. That's all I know.

Cross version line matching

I'm considering how to do automatic bug tracking and as part of that I'm wondering what is available to match source code line numbers (or more accurate numbers mapped from instruction pointers via something like addr2line) in one version of a program to the same line in another. (Assume everything is in some kind of source control and is available to my code)
The simplest approach would be to use a diff tool/lib on the files and do some math on the line number spans, however this has some limitations:
It doesn't handle cross file motion.
It might not play well with lines that get changed
It doesn't look at the information available in the intermediate versions.
It provides no way to manually patch up lines when the diff tool gets things wrong.
It's kinda clunky
Before I start diving into developing something better:
What already exists to do this?
What features do similar system have that I've not thought of?
Why do you need to do this? If you use decent source version control, you should have access to old versions of the code, you can simply provide a link to that so people can see the bug in its original place. In fact the main problem I see with this system is that the bug may have already been fixed, but your automatic line tracking code will point to a line and say there's a bug there. Seems this system would be a pain to build, and not provide a whole lot of help in practice.
My suggestion is: instead of trying to track line numbers, which as you observed can quickly get out of sync as software changes, you should decorate each assertion (or other line of interest) with a unique identifier.
Assuming you're using C, in the case of assertions, this could be as simple as changing something like assert(x == 42); to assert(("check_x", x == 42)); -- this is functionally identical, due to the semantics of the comma operator in C and the fact that a string literal will always evaluate to true.
Of course this means that you need to identify a priori those items that you wish to track. But given that there's no generally reliable way to match up source line numbers across versions (by which I mean that for any mechanism you could propose, I believe I could propose a situation in which that mechanism does the wrong thing) I would argue that this is the best you can do.
Another idea: If you're using C++, you can make use of RAII to track dynamic scopes very elegantly. Basically, you have a Track class whose constructor takes a string describing the scope and adds this to a global stack of currently active scopes. The Track destructor pops the top element off the stack. The final ingredient is a static function Track::getState(), which simply returns a list of all currently active scopes -- this can be called from an exception handler or other error-handling mechanism.

Is there an easy way to convert HTTP_ACCEPT_LANGUAGE to Oracle NLS_LANG settings?

When adding internationalisation capabilities to an Oracle web application (build on mod_plsql), I'd like to interpret the HTTP_ACCEPT_LANGUAGE parameter and use it to set various NLS_* settings in the Oracle session.
For example:
HTTP_ACCEPT_LANGUAGE=de
alter session set nls_territory=germany;
alter session set nls_lang=...
However, you could get something more complicated I suppose...
HTTP_ACCEPT_LANGUAGE=en-us,en;q=0.5
How have folks tackled this sort of thing before?
EDIT - following on from Curt's detailed answer below
Thanks for the clear and detailed reply Curt. I didn't really make myself clear though, as I was really asking if there were any existing Oracle widgets that handled this.
I'm already down the road of manually parsing the HTTP_ACCEPT_LANGUAGE variable and - as Curt indicated in his answer - there are a few subtle areas of complexity. It feels like something that must have been done many times before. As I wrote more and more code I had that sinking "I'm reinventing the wheel" feeling. :)
There must be an existing Oracle approach for this - probably something in iAS??
EDIT - stumbled across the answer
While looking for something else, I stumbled across the UTL_I18N package, which does exactly wham I'm after:
Is there an easy way to convert HTTP_ACCEPT_LANGUAGE to Oracle NLS_LANG settings?
Sure, and it's not too tough, if you break up the problem properly and don't get to ambitious at first.
You need, essentially, two functions: one to parse the HTTP_ACCEPT_LANGUAGE and produce a language code, and one to take that and generate the appropriate set commands.
The former can get pretty sophisticated; if you're given only 'en', you probably want to generate 'en-us', you need to deal with chosing one of multiple choices when nothing matches perfectly, you need to deal with malformed header values, and so on. Don't try to tackle this all at once: just do something very simple at first, and extend it later.
The same more or less goes for the other half of it, generating the set commands, but this is pretty simple in and of itself anyway; it's really just a lookup function, though it may get a bit more sophisticated depending on what is provided to it.
What will really make or break your programming experience on something like this is your unit tests. This is an ideal problem for unit testing and test-driven development. Unit tests will make sure that when you change things, old functionality keeps working, and make it easier to add new functionality and fix bugs, because you just add another test and you have that to guide you from that point on. (You'll also find it easier to do a complete rewrite of one of the functions if you find out you've gone terribly wrong at some point, because you can easily confirm that the new version isn't breaking anything.)
How you do unit testing in your environment is probably a bit beyond the scope of this question, but let me add a few hints. First, if there's a unit test framework ("pl-sql-unit?") available for your environment, that's great. If not, don't panic. You don't need anything sophisticated: just a set of inputs and expected outputs, and a way to run them through the function and either say "all OK!" or show any incorrect results. You can probably write a single, simple PL/SQL function that reads the inputs and expected outputs from a table and does this for you.
Finally stumbled across the answer. The Oracle package UTL_I18N contains functions to map from the browser language codes to Oracle NLS settings:
utl_i18n.map_language_from_iso;
utl_i18n.map_territory_from_iso;
The mapping doesn't seem to cope very well with multi-language settings, e.g. en-us,en;q=0.5, but as long as you just use the first 5 characters the functions seem to work ok.
HTTP_ACCEPT_LANGUAGE: ar-lb,en-gb;q=0.5
v_language:
v_territory:
HTTP_ACCEPT_LANGUAGE: ar-lb
v_language: ARABIC
v_territory: LEBANON

What is the most obfuscated code you've had to fix?

Most programmers will have had the experience of debugging/fixing someone else's code. Sometimes that "someone else's code" is so obfuscated it's bad enough trying to understand what it's doing.
What's the worst (most obfuscated) code you've had to debug/fix?
If you didn't throw it away and recode it from scratch, well why didn't you?
PHP OSCommerce is enough to say, it is obfuscated code...
a Java class
only static methods that manipulates DOM
8000 LOCs
long chain of methods that return null on "error": a.b().c().d().e()
very long methods (400/500 LOC each)
nested if, while, like:
if (...) {
for (...) {
if (...) {
if (...) {
while (...) {
if (...) {
cut-and-paste oriented programming
no exceptions, all exceptions are catched and "handled" using printStackTrace()
no unit tests
no documentation
I was tempted to throw away and recode... but, after 3 days of hard debugging,
I've added the magic if :-)
Spaghetti code PHP CMS system.
by default, programmers think someone else's code is obfuscated.
The worse I probably had to do was interpreting what variables i1, i2 j, k, t were in a simple method and they were not counters in 'for' loops.
In all other circumstances I guess the problem area was difficult which made the code look difficult.
I found this line in our codebase today and thought it was a nice example of sneaky obfuscation:
if (MULTICLICK_ENABLED.equals(propService.getProperty(PropertyNames.MULTICLICK_ENABLED))) {} else {
return false;
}
Just making sure I read the whole line. NO SKIMREADING.
When working on a GWT project, I would reach parts of GWT-compiled obfuscated JS code which wasn't mine.
Now good luck debugging real obfuscated code.
I can't remember the full code, but a single part of it remains burned into my memory as something I spend hours trying to understand:
do{
$tmp = shift unless shift;
$tmp;
}while($tmp);
I couldn't understand it at first, it looks so useless, then I printed out #_ for a list of arguments, a series of alternating boolean and function names, the code was used in conjunction with a library detection module that changed behaviour if a function was broken, but the code was so badly documented and made of things like that which made no sense without a complete understanding of the full code I gave up and rewrote the whole thing.
UPDATE from DVK:
And, lest someone claims this was because Perl is unreadable as opposed to coder being a golf master instead of good software developer, here's the same code in a slightly less obfuscated form (the really correct code wouldn't even HAVE alternating sub names and booleans in the first place :)
# This subroutine take a list of alternating true/false flags
# and subroutine names; and executes the named subroutines for which flag is true.
# I am also weird, otherwise I'd have simply have passed list of subroutines to execute :)
my #flags_and_sub_names_list = #_;
while ( #flags_and_sub_names_list ) {
my $flag = shift #flags_and_sub_names_list;
my $subName = shift #flags_and_sub_names_list;
next unless $flag && $subName;
&{ $subName }; # Call the named subroutine
}
I've had a case of a 300lines function performing input sanitization which missed a certain corner case. It was parsing certain situations manually using IndexOf and Substring plus a lot of inlined variables and constants (looks like the original coder didn't know anything about good practices), and no comment was provided. Throwing it away wasn't feasible due to time constraints and the fact that I didn't have the specification required so rewriting it would've meant understanding the original, but after understanding it fixing it was just quicker. I also added lots of comments, so whoever shall come after me won't feel the same pain taking a look at it...
The Perl statement:
select((select(s),$|=1)[0])
which, at the suggestion of the original author (Randal Schwartz himself, who said he disliked it but nothing else was available at the time), was replaced with something a little more understandable:
IO::Handle->autoflush
Beyond that one-liner, some of the Java JDBC libraries from IBM are obfuscated and all variables and functions are either combinations of the letter 'l' and '1' or single/double characters - very hard to track anything down until you get them all renamed. Needed to do this to track down why they worked fine in IBM's JRE but not Sun's.
If you're talking about HLL codes, once I was updating project written by a chinese and all comments were chinese (stored in ansii) and it was a horror to understand some code fragments, if you're talking about low level code there were MANY of them (obfuscated, mutated, vm-ed...).
I once had to reverse engineer a Java 1.1 framework that:
Extended event-driven SAX parser classes for every class, even those that didn't parse XML (the overridden methods were simply invoked ad hoc by other code)
Custom runtime exceptions were thrown in lieu of method invocations wherever possible. As a result, most of the business logic landed in a nested series of catch blocks.
If I had to guess, it was probably someone's "smart" idea that method invocations were expensive in Java 1.1, so throwing exceptions for non-exceptional flow control was somehow considered an optimization.
Went through about three bottles of eye drops.
I once found a time bomb that had been intentionally obfuscated.
When I had finally decoded what it was doing I mentioned it to the manager who said they knew about the time bomb but had left it in place because it was so ineffective and was interwoven with other code.
The time bomb was (presumably) supposed to go off after a certain date.
Instead, it had a bug in it so it only activated if someone was working after lunchtime on Dec 31st.
It had taken three years for that circumstance to occur since the guy who wrote the time bomb left the company.

Commenting code that is removed [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
Is it a good practice to comment code that is removed? For example:
// Code to do {task} was removed by Ajahn on 10/10/08 because {reason}.
Someone in my developer group during a peer review made a note that we should comment the lines of code to be removed. I thought this was a terrible suggestion, since it clutters the code with useless comments. Which one of us is right?
Generally, code that is removed should not be commented, precisely because it clutters the codebase (and, why would one comment on something that doesn't exist?).
Your defect tracking system or source control management tools are where such comments belong.
There are some (rare) situations when commenting code out (instead of deleting) is a good idea. Here's one.
I had a line of code that seemed good and necessary. Later I realized that it is unnecessary and harmful. Instead of deleting the line, I commented it out, adding another comment: "The line below is wrong for such and such reason". Why?
Because I am sure next reader of the code will first think that not having this line is an error and will try to add it back. (Even if the reader is me two years from now.) I don't expect him to consult source control first. I need to add comment to warn him of this tricky situation; and having wrong line and the reason why it is wrong happened to be the best way to do so.
I agree that it is not a good idea to leave code removed in comments.
Code history should be viewed through a version control system, which is where old code can be found, as well as the reason it was removed.
You should delete the code always.
As for being able to see old/removed code, that's what revision control is.
Depends on the reason for removal.
I think of comments as hints for people maintaining the code in the future, if the information that the code was there but was removed can be helpful to someone maintaining the code (maybe as a "don't do that" sign) then it should be there.
Otherwise adding detailed comments with names and dates on every code change just make the whole thing unreadable.
I think it's pretty useless and make the code less readable. Just think what it will be like after some monthes....
// removed because of this and that
/*
removed this stuff because my left leg...
*/
doSomething();
// this piece of has been removed, we don't need it...
You'll spend half an hour to find out what's going on
The question is, why do you remove code?
Is it useless? Was it a mistake to put it there in the first place?
No comments needed from my point of view.
It's useful when debugging, but there's no reason to check in code that way. The whole point of source control is being able to recover old versions without cluttering up the code with commented-out code.
I would suggest that, yes it's good practice to comment on code that has been removed but not in the code itself.
To further clarify this position, you should be using a source code control system (SCCS) that allows some form of check-in comment. That is where you should place the comments about why code was removed. The SCCS will provide the full contextual history of what has happened to the code, including what has been removed. By adding check-in comments you further clarify that history.
Adding comments in the code directly simply leads to clutter.
The recent consensus (from other discussions on here) is that the code should just be removed.
I personally will comment out code and tag it with a date or a reason. If it's old/stale and I'm passing through the file, then I strip it out. Version control makes going back easy, but not as easy as uncommenting...
It sounds like you are trying to get around versioning your code. In theory, it sounds like a great idea, but in practice it can get very confusing very quickly.
I highly recommend commenting code out for debugging or running other tests, but after the final decision has been made remove it from the file completely!
Get a good versioning system in place and I think you'll find that the practice of commenting out changes is messy.
Nobody here has written much about why you shouldn't leave commented-out code, other than that it looks messy. I think the biggest reason is that the code is likely to stop working. Nobody's compiling it. Nobody's running it through unit tests. When people refactor the rest of the code, they're not refactoring it. So pretty soon, it's going to become useless. Or worse than useless -- someone might uncomment it, blindly trusting that it works.
There are times when I'll comment out code, if we're still doing heavy design/development on a project. At this stage, I'm usually trying out several different designs, looking for the right approach. And sometimes the right approach is one I had already attempted earlier. So it's nice if that code isn't lost in the depths of source control. But once the design has been settled, I'll get rid of the old code.
In general I tend to comment very sparsely. I believe good code should be easy to read without much commenting.
I also version my code. I suppose I could do diffs over the last twenty checkins to see if a particular line has changed for a particular reason. But that would be a huge waste of my time for most changes.
So I try comment my code smartly. If some code is being deleted for a fairly obvious reason, I won't bother to comment the deletion. But if a piece of code is being deleted for a subtle reason (for example it performed a function that is now being handled by a different thread) I will comment-out or delete the code and add a banner comment why:
// this is now handled by the heartbeat thread
// m_data.resort(m_ascending);
Or:
// don't re-sort here, as it is now handled by the heartbeat thread
Just last month, I encountered a piece of code that I had changed a year ago to fix a particular issue, but didn't add a comment explaining why. Here is the original code:
cutoff = m_previous_cutofftime;
And here is the code as it was initially fixed to use a correct cutoff time when resuming an interrupted state:
cutoff = (!ok_during) ? m_previous_cutofftime : 0;
Of course another unrelated issue came up, which happened to touch the same line of code, in this case reverting it back to its original state. So the new issue was now fixed, but the old issue suddenly became rebroken. D'oh!
So now the checked-in code looks like this:
// this works for overlong events but not resuming
// cutoff = m_previous_cutofftime;
// this works for resuming but not overlong events
// cutoff = (!ok_during) ? m_previous_cutofftime : 0;
// this works for both
cutoff = (!resuming || !ok_during) ? m_previous_cutofftime : 0;
Of course, YMMV.
As the lone dissenting voice, I will say that there is a place for commenting out code in special circumstances. Sometimes, you'll have data that continues to exist that was run through that old code and the clearest thing to do is to leave that old code in with source. In such a case I'd probably leave little note indicating why the old code was simply commented out. Any programmers coming along after would be able to understand the still extant data, without having to psychically detect the need to check old versions.
Usually though, I find commented out code completely odious and I often delete it when I come across it.
If you are removing code. You should not comment it that you removed it. This is the entire purpose of source control (You are using source control? Right?), and as you state the comment just clutters up the code.
I agree that it's a terrible suggestion. That's why you have Source Control that has revisions. If you need to go back and see what was changed between two revisions, diff the two revisions.
I hate seeing code that's cluttered with commented out code. Delete the code and write a commit message that says why it was removed. You do use source control, don't you?
Don't litter active code with dead code.
I'll add my voice to the consensus: put the comments on why code was deleted in the source control repository, not in the code.
This is one of those "broken" windows thinkgs like compiler hints/warnings left unaddressed. it will hurt you one day and it promotes sloppiness in the team.
The check in comment in version control can track what/why this code was removed - if the developer didnt leave a note, track them down and throttle them.
A little anecdote, for fun: I was in a company, some years ago, knowing nothing of source code version control (they got such tool later...).
So they had a rule, in our C sources: "when you make a change, disable the old code with preprocessor macros":
#ifdef OLD /* PL - 11/10/1989 */
void Buggy()
{
// ...
}
#else
void Good()
{
// ...
}
#end
No need to say, our sources quickly became unreadable! It was a nightmare to maintain...
That's why I added to SciTE the capacity to jump between nested #ifdef / #else / #end and such... It can be still useful in more regular cases.
Later, I wrote a Visual Studio macro to happily get rid of old code, once we got our VCS!
Now, like buti-oxa, sometime I felt the need to indicate why I removed some code. For the same reason, or because I remove old code which I feel is no longer needed, but I am not too sure (legacy, legacy...). Obviously not in all cases!
I don't leave such comment, actually, but I can understand the need.
At worse, I would comment out in one version, and remove everything in the next version...
At my current work, for important local changes, we leave the old code but can reactivate it by properties, in case of emergency. After testing it some time in production, we eventually remove the old code.
Of course, VCS comments are the best option, but when the change is a few lines in a big file with other changes, referencing the little removal can be hard...
If you are in the middle of major changes, and need to make a fix to existing functionality, commenting out the future code is a reasonable thing to do, provided you remark that this is future functionality, at least until we have futures friendly source control systems.
I comment unnused code because you never know when will you have to fallback on the ancient code, and maybe the old code will help other people to understand it, if it was simpler back then.
I agree with you Andrew; IMO this is why you use version control. With good checkin/commit comments and a diff tool you can always find out why lines were removed.
If you are using any form of Source Control then this approach is somewhat redundant (as long as descriptive log messages are used)
I also think it's a terrible suggestion :)
You should use source control and if you remove some code you can add a comment when you commit. So you still have the code history if you want...
There's a general "clean code" practice that says that one should never keep removed code around as commented out since it clutters and since your CVS/SVN would archive it anyway.
While I do agree with the principle I do not think that it is an acceptable approach for all development situations. In my experience very few people keep track of all the changes in the code and every check-in. as a result, if there is no commented out code, they may never be aware that it has ever existed.
Commenting code out like that could be a way of offering a general warning that it is about to be removed, but of course, there are no guarantees that interested parties would ever see that warning (though if they frequently work with that file, they will see it).
I personally believe that the correct approach is to factor that code out to another private method, and then contact relevant stakeholders and notify them of the pending removal before actually getting rid of the function.
Where I am at we comment out old code for one release cycle and then remove the comments after that. (It gives us quick fix ability if some of the new code is problematic and needs to be replaced with the old code.)
In almost all cases old code should of course be removed and tracked in your RCS.
Like all things though, I think that making the statement 'All deleted code will ALWAYS be removed' is an incorrect approach.
The old code might want to be left in for a miriad of reasons. The prime reason to leave the code in is when you want any developer who is working in that section of code in the future to see the old code.
Relying on source tracking obviously does not give this.
So, I believe the correct answer is:
-Delete old code unless leaving it in provides crucial information that the next developer in the code would require. Ie, remove it 99% of the time but don't make a draconian rule that would remove your ability to provide much needed documentation to the next developer when circumstances warrant it.

Resources