How to test a Reducer containing a Mutation - hadoop

I'm attempting to use MRUnit, but none of the examples I've seen quite match what I'm trying to do.
My reducer outputs a key and mutation, but I can't seem to compare the mutation with what's expected. It shows the objects as being the same, but with an address of 0 and the following error:
junit.framework.AssertionFailedError: expected: <org.apache.accumulo.core.data.Mutation#0> but was <org.apache.accumulo.core.data.Mutation#0>
I'm using the reduceDriver.run() method, and attempting assertEquals on my expected mutation object with the actual. Is there anything I'm missing?
Thanks for any input.

Mutation does not have an appropriate equals() implementation. you're best bet is to probably compare the results of getUpdates() and getRow(). They return a List and byte[] respectively and those are easily comparable.

Mutation has an appropriate equals() method, at least it does in the 1.4.x line. However, that method calls a private serialize() method that modifies the data about to be checked.
I've gotten around this in the past by wrapping the Mutation in a new Mutation, which calls serialize under the hood:
assertEquals(expectedMutation, new Mutation(actualMutation));

You can extend Mutation and pass the new class to mrunit. Just override equals in the new class.

Related

Mocked repository does not trigger as expected

I have a Controller Unit test using Mockito and MockMvc.
After a POST request, the POSTed object is resolved correctly, but my repository mock is not triggered.
Here is the mock code:
Date mydate = new Date();
Notification not = new Notification();
not.setId(-1L);
not.setUserid("BaBlubb");
not.setTimestamp(mydate);
not.setContent("MyContent");
Notification not2 = new Notification();
not2.setId(1L);
not2.setUserid("BaBlubb");
not2.setTimestamp(mydate);
not2.setContent("MyContent");
when(notificationRepository.save(not)).thenReturn(not2);
So this really just should simulate the saving of an object (ID is set and a route is generated out of it).
Unfortunately, the repository always returns null, so my code later fails when trying to return a new created Route out of null.
The mocks are correctly injected and do work for e.g. String comparison or if I only check for the function to have been called, I just can't get it to trigger on Objects.
The same issue occurs on
verify(notificationRepository, times(1)).save(not);
it does not trigger.
The questions would be:
1.) Why does the mock not trigger? I suppose it does not check for value equality in the object, but for object identifiers, which are not the same since the object is serialized and de-serialized in between.
2.) How can I get a generic mock? e.g. whenever repository.save() is called, no matter the parameter, it always should perform a specific way, e.g. instead of
when(notificationRepository.save(not)).thenReturn(not2);
I'd like to have
when(notificationRepository.save()).thenReturn(not2);
P.S. If for whatever reason you need the rest of the code, here is the submitted part, object is the json representation of the notification (with jackson)
mockMvc.perform(post("/api/notification").content(object)
.contentType(MediaType.APPLICATION_JSON)
.accept(MediaType.APPLICATION_JSON));
and here is the Controller header, the Object is de-serialized perfectly, the values are 1:1 the same
#RequestMapping(method=RequestMethod.POST)
public ResponseEntity<?> postNotification(#RequestBody Notification n) {
logger.debug("Saving userid "+n.getId());
Thanks for helping.
1.) Why does the mock not trigger? I suppose it does not check for value equality in the object, but for object identifiers, which are not the same...
By default, Mockito delegates to your object's equals method. If you haven't overridden that, then it checks references by default. The following two lines are equivalent:
when(notificationRepository.save(not)).thenReturn(not2);
when(notificationRepository.save(not)).thenReturn(eq(not2)); // uses eq explicitly
If all Notification objects with the same fields are the same, then overriding equals and hashCode will get you where you need to go. Beware, though, that this may have unintended side-effects with regard to Set and Map behavior, especially if your Notification objects don't have an ID until they are saved.
2.) How can I get a generic mock? e.g. whenever repository.save() is called, no matter the parameter, it always should perform a specific way
With Matchers, this is very simple:
when(notificationRepository.save(not)).thenReturn(any(Notification.class));
Though Matchers are very powerful, beware: they have some tricky rules associated with their use.
For (1), as mentioned by Jeff, You might need to use eq() instead of direct reference to not1
For (2) You can use Mockito.any()
For e.g. when(notificationRepository.save(any(Notification.class))).thenReturn(not2);
This would create stubbing on mocked object notificationRepository which would always return not2 for any argument which is of type Notification. if save() method accepts Object then you can writewhen(notificationRepository.save(any(Object.class))).thenReturn(not2); which would return not2 for any argument of type Object

Coding Style: How to Make Obvious Determination of Parameter's Type We Have To Pass To a Function?

What is the best way to document the type of parameters that a function expects to receive?
Sometimes a function uses only one or two fields of an object. Sometimes this fields have common names (get(), set(), reset(), etc.). In this situation we must leave a comments:
...
#staticmethod
def get( postId, obj ):
"""obj is instance of class Type1, not Type2"""
inner = obj.get()
Is there a more explicit way to make it obvious? Maybe an object name should contain expecting typename?
Given python's 'duck-typing' (late bound) behaviour, it would be a mistake to require a particular type.
If you know which types your function must not take, you can raise an exception after detecting those; otherwise, simply raise an exception if the object passed does not support the appropriate protocol.
As to documentation, just put the required protocol in the docstring.
One strength of python is "duck typing", that is not to rely on the actual type of a variable, but on its behaviour. So I'd suggest, that you document the field, that the object should contain.
"""obj should have a field 'foo' like in class 'bar' or 'baz' """
First of all, name your methods properly, and use properties if they make sense.
You should try to get the hang of duck-typing. It's pretty useful. And if not, try and see if abstract base classes helps you do what you want.

is it possible to tell rspec to warn me if a stub returns a different type of object than the method it's replacing?

I have a method called save_title:
def save_title (data)
...
[ if the record exists, update, return 0]
[ if the record is new, create, return 1]
end
All fine, until I stubbed it:
saved_rows = []
proc.stub(:save_title) do |arg|
saved_rows << arg
end
The bug here is that I was using the integer returned from the real method to determine how many records were created vs. updated. The stub doesn't return an integer. Oooops. So the code worked fine in reality, but appeared broken in the test. A while later (more than I care to admit, cursing included) I realize the stub and the real method don't behave the same. Such are the pitfalls of dynamic languages I suppose.
Questions:
Can I tell rspec to warn me if the stub doesn't return the same sort of thing as the real method?
Is there an analyzer gem that I can use to warn about this sort of thing?
Is there some sort of best practice that I don't know about with returning values from methods?
1) There is no way that rspec can know what type of object the method is supposed to return, that's for you to tell it, however...
2) There is something you can look into. Instead of using a stub, try using a mock instead as your test double. It is basically the same thing as a stub, however, you can do many more validations on it (check out the documentation here). Things like how many times the specific method was called, the arguments it should be called with and what the return value should be as well. Your test will fail if any of those validations don't pass.
3) The best practice would be the method name itself. For example, methods ending in ? like object.exists? should always return a boolean value. In your case, I would suggest a refactoring of your method, maybe divide it in two, one for updating and one for creating and have another method to tell you if an object exists or not. It is not good practice to have a method behave in two different ways depending on the input (see separation of concerns)
Good luck! hope this helps.

Returning both computation result and status. Best practices

I was thinking about patterns which allow me to return both computation result and status:
There are few approaches which I could think about:
function returns computation result, status is being returned via out parameter (not all languages support out parameters and this seems wrong, since in general you don't expect parameters to be modified).
function returns object/pair consisting both values (downside is that you have to create artificial class just to return function result or use pair which have no semantic meaning - you know which argument is which by it's order).
if your status is just success/failure you can return computation value, and in case of error throw an exception (look like the best approach, but works only with success/failure scenario and shouldn't be abused for controlling normal program flow).
function returns value, function arguments are delegates to onSuccess/onFailure procedures.
there is a (state-full) method class which have status field, and method returning computation results (I prefer having state-less/immutable objects).
Please, give me some hints on pros, cons and situations' preconditions of using aforementioned approaches or show me other patterns which I could use (preferably with hints on preconditions when to use them).
EDIT:
Real-world example:
I am developing java ee internet application and I have a class resolving request parameters converting them from string to some business logic objects. Resolver is checking in db if object is being created or edited and then return to controller either new object or object fetched from db. Controller is taking action based on object status (new/editing) read from resolver. I know it's bad and I would like to improve code design here.
function returns computation result, status is being returned via out
parameter (not all languages support out parameters and this seems
wrong, since in general you don't expect parameters to be modified).
If the language supports multiple output values, then the language clearly was made to support them. It would be a shame not to use them (unless there are strong opinions in that particular community against them - this could be the case for languages that try and do everything)
function returns object/pair consisting both values (downside is that
you have to create artificial class just to return function result or
use pair which have no semantic meaning - you know which argument is
which by it's order).
I don't know about that downside. It seems to me that a record or class called "MyMethodResult" should have enough semantics by itself. You can always use such a class in an exception as well, if you are in an exceptional condition only of course. Creating some kind of array/union/pair would be less acceptable in my opinion: you would inevitably loose information somewhere.
if your status is just success/failure you can return computation
value, and in case of error throw an exception (look like the best
approach, but works only with success/failure scenario and shouldn't
be abused for controlling normal program flow).
No! This is the worst approach. Exceptions should be used for exactly that, exceptional circumstances. If not, they will halt debuggers, put colleagues on the wrong foot, harm performance, fill your logging system and bugger up your unit tests. If you create a method to test something, then the test should return a status, not an exception: to the implementation, returning a negative is not exceptional.
Of course, if you run out of bytes from a file during parsing, sure, throw the exception, but don't throw it if the input is incorrect and your method is called checkFile.
function returns value, function arguments are delegates to
onSuccess/onFailure procedures.
I would only use those if you have multiple results to share. It's way more complex than the class/record approach, and more difficult to maintain. I've used this approach to return multiple results while I don't know if the results are ignored or not, or if the user wants to continue. In Java you would use a listener. This kind of operation is probably more accepted for functinal languages.
there is a (state-full) method class which have status field, and
method returning computation results (I prefer having
state-less/immutable objects).
Yes, I prefer those to. There are producers of results and the results themselves. There is little need to combine the two and create a stateful object.
In the end, you want to go to producer.produceFrom(x): Result in my opinion. This is either option 1 or 2a, if I'm counting correctly. And yes, for 2a, this means writing some extra code.
My inclination would be to either use out parameters or else use an "open-field" struct, which simply contains public fields and specifies that its purpose is simply to carry the values of those fields. While some people suggest that everything should be "encapsulated", I would suggest that if a computation naturally yields two double values called the Moe and Larry coefficients, specifying that the function should return "a plain-old-data struct with fields of type double called MoeCoefficient and LarryCoefficient" would serve to completely define the behavior of the struct. Although the struct would have to be declared as a data type outside the method that performs the computation, having its contents exposed as public fields would make clear that none of the semantics associated with those values are contained in the struct--they're all contained in the method that returns it.
Some people would argue that the struct should be immutable, or that it should include validation logic in its constructor, etc. I would suggest the opposite. If the purpose of the structure is to allow a method to return a group of values, it should be the responsibility of that method to ensure that it puts the proper values into the structure. Further, while there's nothing wrong with a structure exposing a constructor as a "convenience member", simply having the code that will return the struct fill in the fields individually may be faster and clearer than calling a constructor, especially if the value to be stored in one field depends upon the value stored to the other.
If a struct simply exposes its fields publicly, then the semantics are very clear: MoeCoefficient contains the last value that was written to MoeCoefficient, and LarryCoefficient contains the last value written to LarryCoefficient. The meaning of those values would be entirely up to whatever code writes them. Hiding the fields behind properties obscures that relationship, and may impede performance as well.

How Does Queryable.OfType Work?

Important The question is not "What does Queryable.OfType do, it's "how does the code I see there accomplish that?"
Reflecting on Queryable.OfType, I see (after some cleanup):
public static IQueryable<TResult> OfType<TResult>(this IQueryable source)
{
return (IQueryable<TResult>)source.Provider.CreateQuery(
Expression.Call(
null,
((MethodInfo)MethodBase.GetCurrentMethod()).MakeGenericMethod(
new Type[] { typeof(TResult) }) ,
new Expression[] { source.Expression }));
}
So let me see if I've got this straight:
Use reflection to grab a reference to the current method (OfType).
Make a new method, which is exactly the same, by using MakeGenericMethod to change the type parameter of the current method to, er, exactly the same thing.
The argument to that new method will be not source but source.Expression. Which isn't an IQueryable, but we'll be passing the whole thing to Expression.Call, so that's OK.
Call Expression.Call, passing null as method (weird?) instance and the cloned method as its arguments.
Pass that result to CreateQuery and cast the result, which seems like the sanest part of the whole thing.
Now the effect of this method is to return an expression which tells the provider to omit returning any values where the type is not equal to TResult or one of its subtypes. But I can't see how the steps above actually accomplish this. It seems to be creating an expression representing a method which returns IQueryable<TResult>, and making the body of that method simply the entire source expression, without ever looking at the type. Is it simply expected that an IQueryable provider will just silently not return any records not of the selected type?
So are the steps above incorrect in some way, or am I just not seeing how they result in the behavior observed at runtime?
It's not passing in null as the method - it's passing it in as the "target expression", i.e. what it's calling the method on. This is null because OfType is a static method, so it doesn't need a target.
The point of calling MakeGenericMethod is that GetCurrentMethod() returns the open version, i.e. OfType<> instead of OfType<YourType>.
Queryable.OfType itself isn't meant to contain any of the logic for omitting returning any values. That's up to the LINQ provider. The point of Queryable.OfType is to build up the expression tree to include the call to OfType, so that when the LINQ provider eventually has to convert it into its native format (e.g. SQL) it knows that OfType was called.
This is how Queryable works in general - basically it lets the provider see the whole query expression as an expression tree. That's all it's meant to do - when the provider is asked to translate this into real code, that's where the magic happens.
Queryable couldn't possibly do the work itself - it has no idea what sort of data store the provider represents. How could it come up with the semantics of OfType without knowing whether the data store was SQL, LDAP or something else? I agree it takes a while to get your head round though :)

Resources