Checking for validity in Java - validation

I have heard that it is good practice to check public method arguments for validity and throw exceptions in the case that they are not valid. I have also heard that you should check the arguments of private methods using assertions.
A couple questions I have are:
Should you ever pass objects with multiple fields into private methods?
If you do, should you check the validity of all fields in the public method before doing so or check at time of use?
Should asserts in private methods be used just to check arguments, or also to guard against an object being null when a method is called on it, as shown below?
A.doSomething()

Should you ever pass objects with multiple fields into private methods?
There's nothing inherently wrong, bad, or special in doing so. You'll find yourself very restricted if you try to avoid passing complex objects to private methods.
If you do, should you check the validity of all fields in the public method before doing so or check at time of use?
This is entirely up to you.
If checking ahead of time will avoid doing a lot of unnecessary work, then it's probably a good idea to check earlier rather than later. (No point doing a lot of CPU work or some long network calls if you know that you've been given invalid input in the first place!)
However, you don't always know that what you've been given is invalid so there are cases where it might be impossible to check right away.
Should asserts in private methods be used just to check arguments, or also to guard against an object being null when a method is called on it, as shown below?
Asserts are typically used to make sure that your logic is right, not that inputs are right. Suppose you're certain that the logic in some method is such that it will never return a negative number. You could put in an assert that alerts you if it does, which would indicate a mistake in either the implementation or your design. You should use exceptions to catch invalid inputs.
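A minimal Java sketch of that split. The Account class, its method names, and the cents-based balance are all hypothetical and only serve as illustration. Note that Java assertions are disabled at runtime unless the JVM is started with -ea, which is one more reason not to rely on them for input checking.

```java
// Hypothetical example class, not taken from the question.
final class Account {
    private long balanceCents;

    Account(long openingBalanceCents) {
        if (openingBalanceCents < 0) {
            throw new IllegalArgumentException("opening balance must not be negative");
        }
        this.balanceCents = openingBalanceCents;
    }

    // Public method: the caller is outside our control, so invalid input
    // is reported with exceptions, which are always active.
    public void withdraw(long amountCents) {
        if (amountCents <= 0) {
            throw new IllegalArgumentException("amount must be positive, got " + amountCents);
        }
        if (amountCents > balanceCents) {
            throw new IllegalArgumentException("insufficient funds");
        }
        balanceCents = subtract(balanceCents, amountCents);
    }

    // Private method: the assert documents what our own logic guarantees.
    // If it ever fires, the bug is in this class, not in the caller's input.
    private static long subtract(long balance, long amount) {
        long result = balance - amount;
        assert result >= 0 : "balance went negative: " + result;
        return result;
    }

    public long balanceCents() {
        return balanceCents;
    }
}
```

Here a bad argument produces an IllegalArgumentException regardless of JVM flags, while the assert only checks the class's own arithmetic during development and testing.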

Detect if golang method is internal?

I'm writing a function that iterates over the methods on a given struct and binds the methods to handlers. I would like to skip over internal methods if possible. I'm not sure if this is possible to do so explicitly - I reviewed the documentation for the reflect package and I didn't see a means to detect if a given Value is an internal method. I know I can get the method's name, and then check if it starts with a lowercase character but I'm not sure if there's a kosher way to accomplish this. It's also possible that the internal / public boundary really only exists at compile time anyways, so there really isn't even a way of knowing this beyond the method's name. In either case I'd like to know for sure. Thanks!
The reflect package will not give you unexported methods via Type.Method. Pretty much, if you can see it via reflect, it is already exported.
See https://play.golang.org/p/61qQYO38P0

Returning both computation result and status. Best practices

I was thinking about patterns which allow me to return both computation result and status:
There are a few approaches which I could think of:
function returns computation result, status is being returned via out parameter (not all languages support out parameters and this seems wrong, since in general you don't expect parameters to be modified).
function returns object/pair consisting of both values (downside is that you have to create an artificial class just to return the function result, or use a pair which has no semantic meaning - you know which argument is which by its order).
if your status is just success/failure you can return the computation value, and in case of error throw an exception (looks like the best approach, but works only with the success/failure scenario and shouldn't be abused for controlling normal program flow).
function returns value, function arguments are delegates to onSuccess/onFailure procedures.
there is a (stateful) method class which has a status field, and a method returning computation results (I prefer having stateless/immutable objects).
Please give me some hints on the pros, cons, and preconditions of the aforementioned approaches, or show me other patterns which I could use (preferably with hints on when to use them).
EDIT:
Real-world example:
I am developing a Java EE internet application and I have a class that resolves request parameters, converting them from strings to business-logic objects. The resolver checks in the DB whether the object is being created or edited, and then returns to the controller either a new object or an object fetched from the DB. The controller takes action based on the object status (new/editing) read from the resolver. I know it's bad, and I would like to improve the code design here.
function returns computation result, status is being returned via out
parameter (not all languages support out parameters and this seems
wrong, since in general you don't expect parameters to be modified).
If the language supports multiple output values, then the language clearly was made to support them. It would be a shame not to use them (unless there are strong opinions in that particular community against them; this could be the case for languages that try to do everything).
function returns object/pair consisting of both values (downside is
that you have to create an artificial class just to return the function
result, or use a pair which has no semantic meaning - you know which
argument is which by its order).
I don't know about that downside. It seems to me that a record or class called "MyMethodResult" should have enough semantics by itself. You can always use such a class in an exception as well, but only if you are in an exceptional condition, of course. Creating some kind of array/union/pair would be less acceptable in my opinion: you would inevitably lose information somewhere.
if your status is just success/failure you can return the computation
value, and in case of error throw an exception (looks like the best
approach, but works only with the success/failure scenario and
shouldn't be abused for controlling normal program flow).
No! This is the worst approach. Exceptions should be used for exactly that: exceptional circumstances. If not, they will halt debuggers, wrong-foot colleagues, harm performance, fill your logging system and bugger up your unit tests. If you create a method to test something, then the test should return a status, not an exception: to the implementation, returning a negative is not exceptional.
Of course, if you run out of bytes from a file during parsing, sure, throw the exception, but don't throw it if the input is incorrect and your method is called checkFile.
function returns value, function arguments are delegates to
onSuccess/onFailure procedures.
I would only use those if you have multiple results to share. It's way more complex than the class/record approach, and more difficult to maintain. I've used this approach to return multiple results when I don't know whether the results are ignored or not, or whether the user wants to continue. In Java you would use a listener. This kind of operation is probably more accepted in functional languages.
there is a (stateful) method class which has a status field, and a
method returning computation results (I prefer having
stateless/immutable objects).
Yes, I prefer those too. There are producers of results and the results themselves. There is little need to combine the two and create a stateful object.
In the end, you want to go to producer.produceFrom(x): Result in my opinion. This is either option 1 or 2a, if I'm counting correctly. And yes, for 2a, this means writing some extra code.
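As a rough Java sketch of the producer.produceFrom(x): Result shape, applied to the resolver from the question. Everything here is an assumption for illustration: the names ResolveResult, Resolver and storedIds are hypothetical, and a plain set of IDs stands in for the database lookup.

```java
import java.util.Set;

// Hypothetical immutable result type: carries the status and the value together,
// so the caller does not need an out parameter or a stateful resolver.
final class ResolveResult {
    enum Status { NEW, EXISTING }

    private final Status status;
    private final String objectId;

    ResolveResult(Status status, String objectId) {
        this.status = status;
        this.objectId = objectId;
    }

    Status status() {
        return status;
    }

    String objectId() {
        return objectId;
    }
}

// Hypothetical stateless producer; the set of stored IDs stands in for the DB.
final class Resolver {
    private final Set<String> storedIds;

    Resolver(Set<String> storedIds) {
        this.storedIds = storedIds;
    }

    ResolveResult resolve(String requestedId) {
        ResolveResult.Status status = storedIds.contains(requestedId)
                ? ResolveResult.Status.EXISTING
                : ResolveResult.Status.NEW;
        return new ResolveResult(status, requestedId);
    }
}
```

A controller could then branch on resolve(id).status() without the resolver having to remember anything between calls.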
My inclination would be to either use out parameters or else use an "open-field" struct, which simply contains public fields and specifies that its purpose is simply to carry the values of those fields. While some people suggest that everything should be "encapsulated", I would suggest that if a computation naturally yields two double values called the Moe and Larry coefficients, specifying that the function should return "a plain-old-data struct with fields of type double called MoeCoefficient and LarryCoefficient" would serve to completely define the behavior of the struct. Although the struct would have to be declared as a data type outside the method that performs the computation, having its contents exposed as public fields would make clear that none of the semantics associated with those values are contained in the struct--they're all contained in the method that returns it.
Some people would argue that the struct should be immutable, or that it should include validation logic in its constructor, etc. I would suggest the opposite. If the purpose of the structure is to allow a method to return a group of values, it should be the responsibility of that method to ensure that it puts the proper values into the structure. Further, while there's nothing wrong with a structure exposing a constructor as a "convenience member", simply having the code that will return the struct fill in the fields individually may be faster and clearer than calling a constructor, especially if the value to be stored in one field depends upon the value stored to the other.
If a struct simply exposes its fields publicly, then the semantics are very clear: MoeCoefficient contains the last value that was written to MoeCoefficient, and LarryCoefficient contains the last value written to LarryCoefficient. The meaning of those values would be entirely up to whatever code writes them. Hiding the fields behind properties obscures that relationship, and may impede performance as well.

Which error code to return when a passed object doesn't implement a necessary interface?

In COM when I have a well-known interface that I can't change:
interface IWellKnownInterface {
HRESULT DoStuff( IUnknown* );
};
and my implementation of IWellKnownInterface::DoStuff() can only work when the passed object implements some specific interface, how do I handle this situation?
HRESULT CWellKnownInterfaceImpl::DoStuff( IUnknown* param )
{
    // this will QI for the specific interface
    ATL::CComQIPtr<ISpecificInterface> object( param );
    if( object == 0 ) {
        // clearly the specific interface is not supported
        return E_INVALIDARG;
    }
    // proceed with implementation
}
In case the specific interface is not supported which error code should I return? Is returning E_INVALIDARG appropriate?
E_INVALIDARG is a fine choice, but the most important thing is to ensure that the precise conditions for each return code that you use are well documented.
Additionally, you could consider implementing ISupportErrorInfo and then returning rich error information via CreateErrorInfo and SetErrorInfo. This is especially useful in cases where you think callers may benefit from having a custom error message generated at the point of failure, with all of the relevant context contained therein. In your case, this might be to identify specifically which argument is invalid and which interface was unimplemented for it to be so. Even though such a message is unlikely to be of value to an end user, it could be invaluable to a developer if it shows up in a log file or the event viewer.
Yes. At least, that is what I've learned to expect a Microsoft API call to return in such a case.
I beg to differ with @nobugz. E_NOINTERFACE has the very specific meaning that your object does not support a certain interface that your client has just requested.
In general, you should only return E_NOINTERFACE from your own IUnknown::QueryInterface(). Returning E_NOINTERFACE from DoStuff() would surprise end users and tools alike. If I saw E_NOINTERFACE coming from anywhere other than QueryInterface(), my immediate thought would be "abstraction leak!".
At first sight, E_INVALIDARG looks like the best option available -- after all, you were passed an argument that doesn't work for you. But in my view there is a subtlety here, so I'm not sure if I can make a universal rule out of that.
In an ideal world, the error codes returned by a COM method are more about the Interface than about the object. Concretely, IWellKnownInterface::DoStuff() doesn't define (necessarily) that the object being passed in must implement ISpecificInterface, or it would have a different argument type. So, technically you're cheating potential users of your object by not quite fully implementing the semantics of your interface. Of course, in the real world, it's very hard to create a meaningful open system if you make that into an absolute rule and interfaces are that rigid.
So: if you can provide reduced functionality that still meets the interface semantics even if your argument doesn't implement ISpecificInterface, you should consider doing that.
Assuming that you must return an error: if this is a well-known, well-documented (well-designed?) interface, I would probably glance at the interface documentation first, to see if it gives you any guidance. Lacking that, I would probably go ahead and use E_INVALIDARG.
@Phil Booth has some nice recommendations about how to provide more information to your user about the nature of the error.
Obviously, if you had the chance you would want to change the interface to take an ISpecificInterface* param instead.
Here's another possibility you might or might not like better.
In Visual Basic 6, there are many objects with methods that take a VARIANT argument and look for several alternative interfaces they can use (e.g. the DataSource member of data-bound controls). That looks a lot like your situation. Those objects return DISP_E_TYPEMISMATCH, which is a well understood return value by most people.
On the other hand, I totally understand if you don't want to return a DISP_xxxx error code from non-IDispatch based interfaces. There is a strong argument for saying that DISP_xxxx error codes are intended for VARIANT parameters only, and a method that takes an IUnknown* should not be returning them.
I'm on the fence on this one.

Where to perform Parameter Validation within nested methods

Where is the proper place to perform validation given the following scenario/code below:
In MethodA only: since this is the public method which is meant to be used by external assemblies?
In MethodA and B since both these can be accessed outside the class?
Or Methods A, B and C, since method C may be used by another internal method (but it might not be efficient, since the programmer can see the code for MethodC already and therefore should be able to know the valid parameters to pass)?
Thanks for any input.
public class A
{
    public void MethodA(param)
    {
        MethodB(param);
    }
    internal void MethodB(param)
    {
        MethodC(param);
    }
    private void MethodC(param)
    {
    }
}
Parameter validation should always be performed regardless of the caller's location (inside or outside of the assembly). Defensive programming, one can say.
MethodC; that way the parameter always gets checked, even if someone comes along later and adds a call to MethodC from within class A, or they make MethodC public. Any exception should be bubbled up to where it can be best dealt with.
There isn't a 'proper' place, except to adhere to DRY principles and avoid copying the validation code to several places. I'd normally suggest that you delay validation to the latest possible stage: if the parameter is never used, you don't need to spend time validating it. This also gives the validation some locality to the place it is used, and you never need to think 'oh, has this parameter been validated yet?' as the validation is right there.
Given that a more likely scenario would involve each method having different parameters and also probably some
if (P1 == 1) { MethodA(P2) } else { MethodB(P2) }
type logic, in the longer term it makes more sense to validate each parameter at the point of entry, especially as you may want different error handling depending on where the method was called.
If the validation logic for a given parameter starts to get complex (i.e. more than five lines of code), then consider a private method to validate that parameter.
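The "validate at the point of entry, with a private helper" idea might look roughly like this. It is sketched in Java rather than the question's C#, and the Scaler class, its methods and the percent rule are all hypothetical. Java's package-private visibility stands in for C#'s internal.

```java
// Hypothetical example: two entry points share one private validation helper,
// and the private worker trusts that its callers have already validated.
final class Scaler {

    // Public entry point (the analogue of MethodA).
    public double scale(double value, int percent) {
        requireValidPercent(percent);
        return apply(value, percent);
    }

    // Package-private entry point (the analogue of an internal MethodB).
    double scaleRounded(double value, int percent) {
        requireValidPercent(percent);
        return Math.rint(apply(value, percent));
    }

    // The validation lives in one place, so the rule is not copied around.
    private static void requireValidPercent(int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in [0, 100], got " + percent);
        }
    }

    // Private worker (the analogue of MethodC): reachable only from this class.
    private static double apply(double value, int percent) {
        return value * percent / 100.0;
    }
}
```

Whether the private worker should re-check its argument is exactly the point the answers above disagree on; this sketch follows the entry-point view only.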

Using return statements to great effect!

When I am making methods with return values, I usually try to set things up so that there is never a case when the method is called in such a way that it would have to return some default value. When I started I would often write methods that did something, and would either return what they did or, if they failed to do anything, would return null. But I hate having ugly if(!null) statements all over my code.
I'm re-reading a guide to Ruby by the Pragmatic Programmers that I first read many moons ago, and I notice that they often return self (Ruby's this) when they wouldn't normally return anything. This is, they say, in order to be able to chain method calls, as in this example using setters that return the object whose attributes they set.
tree.setColor(green).setDecor(gaudy).setPractical(false)
Initially I find this sort of thing attractive. There have been a couple of times when I have rejoiced at being able to chain method calls, like Player.getHand().getSize() but this is somewhat different in that the object of the method call changes from step to step.
What does Stack Overflow think about return values? Are there any patterns or idioms that come to mind warmly when you think of return values? Any great ways to avoid frustration and increase beauty?
In my humble opinion, there are three kinds of return-cases that you should take into consideration:
Object property manipulation
The first is the manipulation of object properties. The pattern you describe here is very often used when manipulating objects. A very typical scenario is using it together with a factory. Consider this hypothetical creation call:
// When the object has manipulative methods:
Pizza p = PizzaFactory().create().addAnchovies().addTomatoes();
// When the factory has manipulative methods working on the
// object, IMHO more elegant from a semantic point of view:
Pizza p = PizzaFactory().create().addAnchovies().addTomatoes().getPizza();
It allows for a quick grasp of what exactly is being created or how an object is manipulated, because the methods form one human-readable expression. It's definitely nice, but don't overuse it. A rule of thumb is that this might be used with methods whose return value you could also declare as void.
Evaluating object properties
The second might be when a method evaluates something on an object. Consider, for example, the method car.getCurrentSpeed(), that could be interpreted as a message to an object asking for the current speed and returning that. It would simply return the value, not too complicated. :)
Make object do this or that
The third might be when a method makes an object perform an operation, returning some sort of value indicating how well the caller's intention was fulfilled - but laying out such a method could be difficult:
int new_gear = 20;
if (car.gears.changeGear(new_gear)) // does that mean success or fail?
This is where you can see a difficulty in designing the method. Should it return 0 upon success or failure? How about -1 if the gear could not be set, because the car only has 5 gears? Does that mean the current gear is at -1 now, too? The method could return the gear it changed to, meaning you would have to compare the argument supplied to the method to the return code. That would work. On the other hand, you could simply return true for success and false for failure, or the other way round. Which one to use could be decided by estimating whether you'd expect those method calls to fail or succeed more often.
In my humble opinion, there is a way to better express the semantics of such return values, by giving them a semantic description. Future developers interacting with your objects will love you for not having to look up the comments or documentation for your methods:
class GearSystem {
    // (...)
public:
    enum GearChangeResult
    { GearChangeSuccess, NonExistingGear, MechanicalGearProblem };
    GearChangeResult changeGear (int gear);
};
That way, it becomes perfectly obvious for any programmer looking at your code, what the return value means (consider: if (gears.changeGear(20) == GearSystem::GearChangeSuccess) - much clearer what that means than the example above)
Antipattern: Failures as return codes.
I actually omitted a fourth possibility for a return value, because in my opinion it isn't one: when there's an error in your program, like a logic error or a failure that needs to be dealt with, you could theoretically return a value indicating so. But today that's not done so often anymore (or should not be), because that is what exceptions are for.
I don't agree that methods should never return null. The most obvious examples are from systems programming. For instance, if someone asks to open a file, you simply have to give them null if the open fails. There is no sane alternative. There are other cases where null is appropriate, such as a getNextNode(node) method, when called on the last node of a linked list. So I guess what these cases have in common is that null represents "no object" (either no file handle or no list node), which makes sense.
In other cases, the method should never fail, and there is an appropriate exception facility. Then, I think method chaining like your example can be used to great effect. I think it's a bit funny that you seem to believe this is an innovation of the "Pragmatic Programmers". In fact, it dates to Lisp if not before.
Returning this is also used in the "builder pattern", another case where method chaining can enhance readability as well as writing convenience.
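For illustration, here is how the chained setters from the question could look in Java. The Tree class and its fields are hypothetical, reconstructed only from the setter names in the question.

```java
// Hypothetical class: each setter returns this, so calls can be chained.
final class Tree {
    private String color = "";
    private String decor = "";
    private boolean practical = true;

    Tree setColor(String color) {
        this.color = color;
        return this;
    }

    Tree setDecor(String decor) {
        this.decor = decor;
        return this;
    }

    Tree setPractical(boolean practical) {
        this.practical = practical;
        return this;
    }

    String describe() {
        return color + ", " + decor + ", practical=" + practical;
    }
}
```

With this, new Tree().setColor("green").setDecor("gaudy").setPractical(false) configures one object in a single expression. Each setter could just as well have been declared void, which matches the rule of thumb mentioned earlier in the thread.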
A null is often returned as an out-of-band value to indicate that no result could be produced. I believe that this is perfectly reasonable when getting no result is a normal event; examples would include a null return from readLine() at end-of-file, or a null returned when providing a non-existent key to the get(...) method of a Map. Reading to the end of the file is normal behavior (as opposed to an IOException, which indicates that something went abnormally wrong while trying to read). Similarly, looking up a key and being told that it has no value is a normal case.
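Both cases can be seen in a small Java sketch. BufferedReader.readLine() and Map.get() are the standard library calls referred to above; the wrapper class NoResultExamples and its two helper methods are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helpers showing null used as a normal "no result" value.
final class NoResultExamples {

    // readLine() returns null at end of input, which simply ends the loop.
    // An IOException, by contrast, means something actually went wrong.
    static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Map.get() returns null for a key with no mapping; the caller decides
    // what "no value" should mean, here a fallback string.
    static String colorOf(Map<String, String> colors, String thing) {
        String color = colors.get(thing);
        return color != null ? color : "unknown";
    }
}
```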
A good alternative to null for some cases is a "null object", which is a full-fledged instance of the result class, but which has appropriate state and behavior for a "nobody's home" case. For instance, the result of looking up a non-existent user ID might well be a NullUser object which has a zero-length name and no permissions to do anything in the system.
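A minimal Java sketch of the NullUser idea described above. The User interface, RegisteredUser, UserDirectory and the mayPerform method are hypothetical names chosen for the example. For simplicity it assumes a registered user may do anything.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface shared by real users and the "nobody's home" user.
interface User {
    String name();
    boolean mayPerform(String action);
}

final class RegisteredUser implements User {
    private final String name;

    RegisteredUser(String name) {
        this.name = name;
    }

    public String name() {
        return name;
    }

    // Simplifying assumption: a registered user may do anything.
    public boolean mayPerform(String action) {
        return true;
    }
}

// The null object: a full instance with a zero-length name and no permissions.
final class NullUser implements User {
    static final NullUser INSTANCE = new NullUser();

    private NullUser() {
    }

    public String name() {
        return "";
    }

    public boolean mayPerform(String action) {
        return false;
    }
}

final class UserDirectory {
    private final Map<Integer, User> usersById = new HashMap<>();

    void register(int id, String name) {
        usersById.put(id, new RegisteredUser(name));
    }

    // Never returns null: unknown IDs map to the shared NullUser instance.
    User find(int id) {
        return usersById.getOrDefault(id, NullUser.INSTANCE);
    }
}
```

Callers can then write directory.find(id).mayPerform("login") without a null check, at the cost of not being able to tell "no such user" apart unless they compare against NullUser.INSTANCE.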
It's confusing to me. OO programming languages need Smalltalk's semicolon:
tree color: green;
decor: gaudy;
practical: false.
obj method1; method2. means "call method1 on obj then method2 on obj". This kind of object setup is very common.
