Pointless getter checks when updating an object - performance

Hopefully this will not come across as a silly or pedantic question, but I'm curious.
Occasionally I'll be in a situation where an existing object's properties may need to be updated with new variables, and I'll do it like this (in no particular language):
public void Update(date, somevar){
if(date > this.Date){
this.Var = somevar;
}
}
The idea being that if the date passed to the function is more recent than the date in the current object, the variable is updated. Think of it as like a basic way of caching something.
Now, the interesting part is that I know somevar will never be "old" when compared to this.Var, but it may be the same. So as far as I can see, checking the date is pointless, and therefore a pointless operation for the program to perform.
So what this is really about is whether it's better - in whatever way - to perform a write to this.Var every time Update is called, or getting this.Date, comparing it, then possibly performing the write. And just to throw in something interesting here, what if Update were to be called multiple times?
If the example I've given makes no sense or has holes in it, I apologise; I can't think of another way of giving an example, but hopefully you can see the point I'm trying to make here.

Unless for some reason assignment is an expensive operation (e.g. it always triggers a database write), this isn't going to make your programme faster.
The point of putting checks in your setters is usually to enforce data integrity, i.e. to preserve programme invariants, and thus the correctness of your other code, which is rather more important.

Related

Null checks for a complex dereference chain in Java 8

So I have a class generated by some contract (so no modifications allowed) with multiple data layers, I get it through soap request and then in my backend I have something like this:
value = bigRequest.getData().getSamples().get(0).getValuableData().getValue()
and in every dereference in that chain I can have null result. Class itself has no logic, just pure data with accessors, but nonetheless. I'm kinda sick of thought to make an ugly boilerplate of not-null checks for every single dereference, so I'm thinking of the best practice here:
Actually make the ugly boilerplate (either with ifs or with asserts). I assume that its what I've got to do, but I have faint hopes.
Do some Optional magic. But with no access to modify the source it'll probably be even uglier.
Catch the NPE. It's ugly in its heart, but in this particular case I feel it's the best option, just because it's part of the logic, either I have that value or not. But catching NPE makes me shiver.
Something I can't see by myself now.
I'm actually feel myself a little bit uncomfortable with this question cause I feel that NPE theme is explored to the bones, but I had no success in search.
I agree with both of you that Andrew Vershinin’s suggestion is the best we can do here and thus deserves to be posted as an answer.
nullableValue = Optional.ofNullable(bigRequest)
.map(RequestCls::getData)
.map(DataCls::getSamples)
.filter(samples -> ! samples.isEmpty())
.map(samples -> samples.get(0))
.map(SampleCls::getValuableData)
.map(ValDataCls::getValue)
.orElse(null);
You will need to substitute the right class or interface names in the method references (or you may rewrite as lambdas if you prefer). Edit: If bigRequest itself cannot be null, the first method call should be just Optional.of(bigRequest).
It’s not the primarily intended use of Optional, but I find it OK. And better than items 1. and 3. (and 4.) from your question.

Is it better to give a function an additional parameter, or to make the parameter static?

Many times I've come across a situation where I am calling a method many times with one parameter always exactly the same, and I need to add some additional parameter defined in my main method. I'm never sure what to do about the additional parameter. It seems like it might be a better idea to make the parameter static to make the code cleaner, but then, static variables are not very good. Still, adding parameters can in some cases lead to very long lists of parameters that are in a way unnecessary, and I imagine that there will be a speed disadvantage as well when your function is short, called often, and has a lot of parameters.
Here's the most recent code (kotlin) that has had me thinking about this problem, but I've ran into the problem a lot in different languages.
tailrec fun getAncestor(ind: Int, parent: IntArray): Int {
if (parent[ind] == ind) return ind else return getAncestor(parent[ind], parent);
}
Is it better to make parent a static variable and not have it as a parameter to the method? (Assume that every time that this method is called, the second parameter will be the same. I'm usually only writing short 100 line code files for competitive programming so there is no chance the method will be reused with a different second parameter)
In general if you're working on a project, you'll want to stay away from static variables because when the project starts scaling up and getting bigger, it'll be very hard to keep track of the static variables and this will make it very hard to debug.
In competitive programming, I suggest go for the quicker approach.
In my opinion, it depends on your preferred coding paradigm. For instance, if you'd like your code to conform to the functional paradigm, your code is fine how it is now (regarding the parameter), however, if you don't care about the functional programming, having that param extracted out, as a constant is alright. I see no reason why it should be bad. If you worry about the performance, you could declare it as lazy, but otherwise it'd fine, I guess.

Is checking Perl function arguments worth it?

There's a lot of buzz about MooseX::Method::Signatures and even before that, modules such as Params::Validate that are designed to type check every argument to methods or functions. I'm considering using the former for all my future Perl code, both personal and at my place of work. But I'm not sure if it's worth the effort.
I'm thinking of all the Perl code I've seen (and written) before that performs no such checking. I very rarely see a module do this:
my ($a, $b) = #_;
defined $a or croak '$a must be defined!';
!ref $a or croak '$a must be a scalar!";
...
#_ == 2 or croak "Too many arguments!";
Perhaps because it's simply too much work without some kind of helper module, but perhaps because in practice we don't send excess arguments to functions, and we don't send arrayrefs to methods that expect scalars - or if we do, we have use warnings; and we quickly hear about it - a duck typing approach.
So is Perl type checking worth the performance hit, or are its strengths predominantly shown in compiled, strongly typed languages such as C or Java?
I'm interested in answers from anyone who has experience writing Perl that uses these modules and has seen benefits (or not) from their use; if your company/project has any policies relating to type checking; and any problems with type checking and performance.
UPDATE: I read an interesting article on the subject recently, called Strong Testing vs. Strong Typing. Ignoring the slight Python bias, it essentially states that type checking can be suffocating in some instances, and even if your program passes the type checks, it's no guarantee of correctness - proper tests are the only way to be sure.
If it's important for you to check that an argument is exactly what you need, it's worth it. Performance only matters when you already have correct functioning. It doesn't matter how fast you can get a wrong answer or a core dump. :)
Now, that sounds like a stupid thing to say, but consider some cases where it isn't. Do I really care what's in #_ here?
sub looks_like_a_number { $_[0] !~ /\D/ }
sub is_a_dog { eval { $_[0]->DOES( 'Dog' ) } }
In those two examples, if the argument isn't what you expect, you are still going to get the right answer because the invalid arguments won't pass the tests. Some people see that as ugly, and I can see their point, but I also think the alternative is ugly. Who wins?
However, there are going to be times that you need guard conditions because your situation isn't so simple. The next thing you have to pass your data to might expect them to be within certain ranges or of certain types and don't fail elegantly.
When I think about guard conditions, I think through what could happen if the inputs are bad and how much I care about the failure. I have to judge that by the demands of each situation. I know that sucks as an answer, but I tend to like it better than a bondage-and-discipline approach where you have to go through all the mess even when it doesn't matter.
I dread Params::Validate because its code is often longer than my subroutine. The Moose stuff is very attractive, but you have to realize that it's a way for you to declare what you want and you still get what you could build by hand (you just don't have to see it or do it). The biggest thing I hate about Perl is the lack of optional method signatures, and that's one of the most attractive features in Perl 6 as well as Moose.
I basically concur with brian. How much you need to worry about your method's inputs depends heavily on how much you are concerned that a) someone will input bad data, and b) bad data will corrupt the purpose of the method. I would also add that there is a difference between external and internal methods. You need to be more diligent about public methods because you're making a promise to consumers of your class; conversely you can be less diligent about internal methods as you have greater (theoretical) control over the code that accesses it, and have only yourself to blame if things go wrong.
MooseX::Method::Signatures is an elegant solution to adding a simple declarative way to explain the parameters of a method. Method::Signatures::Simple and Params::Validate are nice but lack one of the features I find most appealing about Moose: the Type system. I have used MooseX::Declare and by extension MooseX::Method::Signatures for several projects and I find that the bar to writing the extra checks is so minimal it's almost seductive.
Yes its worth it - defensive programming is one of those things that are always worth it.
The counterargument I've seen presented to this is that checking parameters on every single function call is redundant and a waste of CPU time. This argument's supporters favor a model in which all incoming data is rigorously checked when it first enters the system, but internal methods have no parameter checks because they should only be called by code which will pass them data which has already passed the checks at the system's border, so it is assumed to still be valid.
In theory, I really like the sound of that, but I can also see how easily it can fall like a house of cards if someone uses the system (or the system needs to grow to allow use) in a way that was unforeseen when the initial validation border is established. All it takes is one external call to an internal function and all bets are off.
In practice, I'm using Moose at the moment and Moose doesn't really give you the option to bypass validation at the attribute level, plus MooseX::Declare handles and validates method parameters with less fuss than unrolling #_ by hand, so it's pretty much a moot point.
I want to mention two points here.
The first are the tests, the second the performance question.
1) Tests
You mentioned that tests can do a lot and that tests are the only way
to be sure that your code is correct. In general i would say this is
absolutly correct. But tests itself only solves one problem.
If you write a module you have two problems or lets say two different
people that uses your module.
You as a developer and a user that uses your module. Tests helps with the
first that your module is correct and do the right thing, but it didn't
help the user that just uses your module.
For the later, i have one example. i had written a module using Moose
and some other stuff, my code ended always in a Segmentation fault.
Then i began to debug my code and search for the problem. I spend around
4 hours of time to find the error. In the end the problem was that i have
used Moose with the Array Trait. I used the "map" function and i didn't
provide a subroutine function, just a string or something else.
Sure this was an absolutly stupid error of mine, but i spend a long time to
debug it. In the end just a checking of the input that the argument is
a subref would cost the developer 10 seconds of time, and would cost me
and propably other a lot of more time.
I also know of other examples. I had written a REST Client to an interface
completly OOP with Moose. In the end you always got back Objects, you
can change the attributes but sure it didn't call the REST API for
every change you did. Instead you change your values and in the end you
call a update() method that transfers the data, and change the values.
Now i had a user that then wrote:
$obj->update({ foo => 'bar' })
Sure i got an error back, that update() does not work. But sure it didn't
work, because the update() method didn't accept a hashref. It only does
a synchronisation of the actual state of the object with the online
service. The correct code would be.
$obj->foo('bar');
$obj->update();
The first thing works because i never did a checking of the arguments. And i don't throw an error if someone gives more arguments then i expect. The method just starts normal like.
sub update {
my ( $self ) = #_;
...
}
Sure all my tests absolutely works 100% fine. But handling these errors that
are not errors cost me time too. And it costs the user propably a lot
of more time.
So in the end. Yes, tests are the only correct way to ensure that your code
works correct. But that doesn't mean that type checking is meaningless.
Type checking is there to help all your non-developers (on your module)
to use your module correctly. And saves you and others time finding
dump errors.
2) Performance
The short: You don't care for performance until you care.
That means until your module works to slow, Performance is always fast
enough and you don't need to care for this. If your module really works
to slow you need further investigations. But for these investigions
you should use a profiler like Devel::NYTProf to look what is slow.
And i would say. In 99% slowliness is not because you do type
checking, it is more your algorithm. You do a lot of computation, calling
functions to often etc. Often it helps if you do completly other solutions
use another better algorithm, do caching or something else, and the
performance hit is not your type checking. But even if the checking is the
performance hit. Then just remove it where it matters.
There is no reason to leave the type checking where performance don't
matters. Do you think type checking does matter in a case like above?
Where i have written a REST Client? 99% of performance issues here are
the amount of request that goes to the webservice or the time for such an
request. Don't using type checking or MooseX::Declare etc. would propably
speed up absolutly nothing.
And even if you see performance disadvantages. Sometimes it is acceptable.
Because the speed doesn't matter or sometimes something gives you a greater
value. DBIx::Class is slower then pure SQL with DBI, but DBIx::Class
gives you a lot for these.
Params::Validate works great,but of course checking args slows things down. Tests are mandatory(at least in the code I write).
Yes it's absolutely worth it, because it will help during development, maintenance, debugging, etc.
If a developer accidentally sends the wrong parameters to a method, a useful error message will be generated, instead of the error being propagated down to somewhere else.
I'm using Moose extensively for a fairly large OO project I'm working on. Moose's strict type checking has saved my bacon on a few occassions. Most importantly it has helped avoid situations where "undef" values are incorrectly being passed to the method. In just those instances alone it saved me hours of debugging time..
The performance hit is definitely there, but it can be managed. 2 hours of using NYTProf helped me find a few Moose Attributes that I was grinding too hard and I just refactored my code and got 4x performance improvement.
Use type checking. Defensive coding is worth it.
Patrick.
Sometimes. I generally do it whenever I'm passing options via hash or hashref. In these cases it's very easy to misremember or misspell an option name, and checking with Params::Check can save a lot of troubleshooting time.
For example:
sub revise {
my ($file, $options) = #_;
my $tmpl = {
test_mode => { allow => [0,1], 'default' => 0 },
verbosity => { allow => qw/^\d+$/, 'default' => 1 },
force_update => { allow => [0,1], 'default' => 0 },
required_fields => { 'default' => [] },
create_backup => { allow => [0,1], 'default' => 1 },
};
my $args = check($tmpl, $options, 1)
or croak "Could not parse arguments: " . Params::Check::last_error();
...
}
Prior to adding these checks, I'd forget whether the names used underscores or hyphens, pass require_backup instead of create_backup, etc. And this is for code I wrote myself--if other people are going to use it, you should definitely do some sort of idiot-proofing. Params::Check makes it fairly easy to do type checking, allowed value checking, default values, required options, storing option values to other variables and more.

Should we create objects if we need them only once in our code?

This is a coding style questions:-
Here is the case
Dim obj1 as new ClassA
' Some lines of code which does not uses obj1
Something.Pass(obj1) ' Only line we are using obj1
Or should we directly initiaize the object when passing it as an argument?
Something.new(new ClassA())
If you're only using the object in that method call, it's probably better to just pass in "new ClassA()" directly into the call. This way, you won't have an extra variable lying around, that someone might mistakenly try to use in the future.
However, for readability, and debugging it's often useful to create the temporary object and pass it in. This way, you can inspect the variable in the debugger before it gets passed into the method.
Your question asks "should we create objects"; both your examples create an object.
There is not logically any difference at all between the two examples. Giving a name to an object allows it to be referred to in more than one place. If you aren't doing that, it's much clearer to not give it a name, so someone maintaining the code can instantly see that the object is only passed to one other method.
Generally speaking, I would say no, there's nothing wrong with what you're doing, but it does sound like there may be some blurring of responsibilities between your calling function, the called function, and the temporary object. Perhaps a bit of refactoring is in order.
I personally prefer things to be consistent, and to make my life easier (what I consider making my life easier may not be what you consider making your life easier... so do with this advice what you will).
If you have something like this:
o = new Foo();
i = 7
bar(o, i, new Car());
then you have an inconsistency (two parameters are variables, the other is created on the fly). To be consistent you would either:
always pass things as variables
always pass things created on the fly
only one of those will work (the first one!).
There are also practical aspects to it as well: making a variable makes debugging easier.
Here are some examples:
while(there are still lines in the file)
{
foo(nextLine());
}
If you want to display the next line for debugging you now need to change it to:
while(there are still lines in the file)
{
line = nextLine();
display(line);
foo(line);
}
It would be easier (and safer) to have made the variable up front. It is safer because you are less likely to accidentally call nextLine() twice (by forgetting to take it out of the foo call).
You can also view the value of "line" in a debugger without having to go into the "foo" method.
Another one that can happen is this:
foo(b.c.d()); // in Java you get a NullPointerException on this line...
was "b" or "c" the thing that was null? No idea.
Bar b;
Car c;
int d;
b = ...;
c = b.c; // NullPointException here - you know b was null
d = c.d(); // NullPointException here - you know c was null
foo(d); // can view d in the debugger without having to go into foo.
Some debuggers will let you highlight "d()" and see what it outputs, but that is dangerous if "d()" has side effects as the debugger will wind up calling "d()" each time you get the value via the debugger).
The way I code for this does make it more verbose (like this answer :-) but it also makes my life easier if things are not working as expected - I spend far less time wondering what went wrong and I am also able to fix bugs much faster than before I adopted this way of doing things.
To me the most important thing when programming is to be consistent. If you are consistent then the code is much easier to get through because you are not constantly having to figure out what is going on, and your eyes get drawn to any "oddities" in the code.

Should I make sure arguments aren't null before using them in a function?

The title may not really explain what I'm really trying to get at, couldn't really think of a way to describe what I mean.
I was wondering if it is good practice to check the arguments that a function accepts for nulls or empty before using them. I have this function which just wraps some hash creation like so.
Public Shared Function GenerateHash(ByVal FilePath As IO.FileInfo) As String
If (FilePath Is Nothing) Then
Throw New ArgumentNullException("FilePath")
End If
Dim _sha As New Security.Cryptography.MD5CryptoServiceProvider
Dim _Hash = Convert.ToBase64String(_sha.ComputeHash(New IO.FileStream(FilePath.FullName, IO.FileMode.Open, IO.FileAccess.Read)))
Return _Hash
End Function
As you can see I just takes a IO.Fileinfo as an argument, at the start of the function I am checking to make sure that it is not nothing.
I'm wondering is this good practice or should I just let it get to the actual hasher and then throw the exception because it is null.?
Thanks.
In general, I'd suggest it's good practice to validate all of the arguments to public functions/methods before using them, and fail early rather than after executing half of the function. In this case, you're right to throw the exception.
Depending on what your method is doing, failing early could be important. If your method was altering instance data on your class, you don't want it to alter half of the data, then encounter the null and throw an exception, as your object's data might them be in an intermediate and possibly invalid state.
If you're using an OO language then I'd suggest it's essential to validate the arguments to public methods, but less important with private and protected methods. My rationale here is that you don't know what the inputs to a public method will be - any other code could create an instance of your class and call it's public methods, and pass in unexpected/invalid data. Private methods, however, are called from inside the class, and the class should already have validated any data passing around internally.
One of my favourite techniques in C++ was to DEBUG_ASSERT on NULL pointers. This was drilled into me by senior programmers (along with const correctness) and is one of the things I was most strict on during code reviews. We never dereferenced a pointer without first asserting it wasn't null.
A debug assert is only active for debug targets (it gets stripped in release) so you don't have the extra overhead in production to test for thousands of if's. Generally it would either throw an exception or trigger a hardware breakpoint. We even had systems that would throw up a debug console with the file/line info and an option to ignore the assert (once or indefinitely for the session). That was such a great debug and QA tool (we'd get screenshots with the assert on the testers screen and information on whether the program continued if ignored).
I suggest asserting all invariants in your code including unexpected nulls. If performance of the if's becomes a concern find a way to conditionally compile and keep them active in debug targets. Like source control, this is a technique that has saved my ass more often than it has caused me grief (the most important litmus test of any development technique).
Yes, it's good practice to validate all arguments at the beginning of a method and throw appropriate exceptions like ArgumentException, ArgumentNullException, or ArgumentOutOfRangeException.
If the method is private such that only you the programmer could pass invalid arguments, then you may choose to assert each argument is valid (Debug.Assert) instead of throw.
If NULL is an inacceptable input, throw an exception. By yourself, like you did in your sample, so that the message is helpful.
Another method of handling NULL inputs is just to respont with a NULL in turn. Depends on the type of function -- in the example above I would keep the exception.
If its for an externally facing API then I would say you want to check every parameter as the input cannot be trusted.
However, if it is only going to be used internally then the input should be able to be trusted and you can save yourself a bunch of code that's not adding value to the software.
You should check all arguments against the set of assumptions that you make in that function about their values.
As in your example, if a null argument to your function doesn't make any sense and you're assuming that anyone using your function will know this then being passed a null argument shows some sort of error and some sort of action taken (eg. throwing an exception). And if you use asserts (as James Fassett got in and said before me ;-) ) they cost you nothing in a release version. (they cost you almost nothing in a debug version either)
The same thing applies to any other assumption.
And it's going to be easier to trace the error if you generate it than if you leave it to some standard library routine to throw the exception. You will be able to provide much more useful contextual information.
It's outside the bounds of this question, but you do need to expose the assumptions that your function makes - for example, through the comment header to your function.
According to The Pragmatic Programmer by Andrew Hunt and David Thomas, it is the responsibility of the caller to make sure it gives valid input. So, you must now choose whether you consider a null input to be valid. Unless it makes specific sense to consider null to be a valid input (e.g. it is probably a good idea to consider null to be a legal input if you're testing for equality), I would consider it invalid. That way your program, when it hits incorrect input, will fail sooner. If your program is going to encounter an error condition, you want it to happen as soon as possible. In the event your function does inadvertently get passed a null, you should consider it to be a bug, and react accordingly (i.e. instead of throwing an exception, you should consider making use of an assertion that kills the program, until you are releasing the program).
Classic design by contract: If input is right, output will be right. If input is wrong, there is a bug. (if input is right but output is wrong, there is a bug. That's a gimme.)
I'll add a couple of elaborations (in bold) to the excellent design by contract advice offerred by Brian earlier...
The priniples of "design by contract" require that you define what is acceptable for the caller to pass in (the valid domain of input values) and then, for any valid input, what the method/provider will do.
For an internal method, you can define NULLs as outside the domain of valid input parameters. In this case, you would immediately assert that the input parameter value is NOT NULL. The key insight in this contract specification is that any call passing in a NULL value IS A CALLER'S BUG and the error thrown by the assert statement is the proper behavior.
Now, while very well defined and parsimonius, if you're exposing the method to external/public callers, you should ask yourself, is that the contract I/we really want?
Probably not. In a public interface, you'd probably accept the NULL (as technically in the domain of inputs that the method accepts), but then decline to process gracefully w/ a return message. (More work to meet the naturally more complex customer-facing requirement.)
In either case, what you're after is a protocol that handles all of the cases from both the perspective of the caller and the provider, not lots of scattershot tests that can make it difficult to assess the completeness or lack of completeness of the contractual condition coverage.
Most of the time, letting it just throw the exception is pretty reasonable as long as you are sure the exception won't be ignored.
If you can add something to it, however, it doesn't hurt to wrap the exception with one that is more accurate and rethrow it. Decoding "NullPointerException" is going to take a bit longer than "IllegalArgumentException("FilePath MUST be supplied")" (Or whatever).
Lately I've been working on a platform where you have to run an obfuscator before you test. Every stack trace looks like monkeys typing random crap, so I got in the habit of checking my arguments all the time.
I'd love to see a "nullable" or "nonull" modifier on variables and arguments so the compiler can check for you.
If you're writing a public API, do your caller the favor of helping them find their bugs quickly, and check for valid inputs.
If you're writing an API where the caller might untrusted (or the caller of the caller), checked for valid inputs, because it's good security.
If your APIs are only reachable by trusted callers, like "internal" in C#, then don't feel like you have to write all that extra code. It won't be useful to anyone.

Resources