I have a Polling system. I want to be able to calculate results based on filtered subsets of votes. To do this, I'm allowing the caller to pass a subquery to my model, which is then used to select only a subset of the votes when calculating results.
The catch is that I want to cache my results, and queries are flexible to the point of being horrible keys in a cache hash. This is where I come to you: how should I pass filters into my result methods in order to strike the best balance between good code practice, reliable caching (i.e. the coder can easily understand when things are being cached), and filter flexibility?
Option 1: Suck it up and live with query hashes
Pass the $voteFilter into each method, so that the methods look something like this:
class Poll {
    getResults($voteFilter) {...}  // Returns the poll results for passed filter
    getWinner($voteFilter) {...}   // Returns the winning result for passed filter
    isTie($voteFilter) {...}       // Returns tie status for passed filter
}
The methods would check their caches, and if that filter query had been used before, they would reuse those results. This is risky because the same result set can be generated by slightly different queries (e.g. the order of the operands in a commutative logical clause is swapped), so a coder could accidentally miss the cache when he/she intends to hit it.
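One way to reduce that risk (a sketch in TypeScript, purely for illustration, assuming the filter can be expressed as a flat key/value map rather than an arbitrary subquery) is to canonicalize the filter before using it as a cache key:

// Sketch only: assumes the filter is a flat key/value map, not a full subquery.
type FilterParams = Record<string, string | number | boolean>;

// Sort the keys so that logically identical filters written in a different
// order still produce the same cache key.
function filterCacheKey(filter: FilterParams): string {
  const sortedKeys = Object.keys(filter).sort();
  return JSON.stringify(sortedKeys.map((key) => [key, filter[key]]));
}

// Results are cached per canonical key, not per raw query string.
const resultCache = new Map<string, unknown>();

function getResultsCached(filter: FilterParams, compute: () => unknown): unknown {
  const key = filterCacheKey(filter);
  if (!resultCache.has(key)) {
    resultCache.set(key, compute());
  }
  return resultCache.get(key);
}

Two filters that only differ in the order their conditions were written then map to the same cache entry; anything richer than a flat map (nested clauses, ORs) would need a more careful normalization.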
This also feels like I'm passing a lot back and forth when I don't need to -- presumably the coder will be working with a single filter set across all of the result methods (when I want the results, the winner and the tie status, it will probably be for the same vote filter at any given moment).
Option 2: Set the filter using a separate class method
Pass the currently active $voteFilter via a setVoteFilter() method, starting a result session. This would reset the cache each time it is called and would dictate the filter used by the result methods until the next time setVoteFilter() is called. It looks like this:
class Poll {
    setVoteFilter($voteFilter) {...}  // Clears cache and sets vote filter
    getResults() {...}                // Returns the poll results for current filter
    getWinner() {...}                 // Returns the winning result for current filter
    isTie() {...}                     // Returns tie status for current filter
}
This option feels more elegant and I like it, but I don't know if it is bad form and the kind of thing I'll look at in two months and say, "This is horrible. Why would I have made methods without explicit parameters?"
Option 3: Define a more rigid filtering technique
If I limit the ways I can filter, I can create filter parameters which leave no room for confusion, solving the ambiguity issues in Option 1. This limits flexibility and could result in a less understandable API. I like this choice least but wanted to throw it out here for consideration in case someone has strong thoughts.
Anyone have insight? Other options?
Imagine that Poll is immutable with a signature like:
class Poll {
    withFilter(filter)
    getFilter(...)
    getResults(...)
    getWinner(...)
}
Now, withFilter returns a Poll object that has the given filter applied (perhaps cumulatively; however, as you note, care must be taken -- e.g. an AST or context must be handled, or the filter must fall into a certain restricted class). The object returned can be a new Poll object or a cached Poll object -- if Poll is immutable, it doesn't matter. If the cache is maintained entirely through reachability, then this may also handle "cleanup" -- but that really depends upon the language.
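A rough sketch of that idea in TypeScript, with invented names (Vote, VoteFilter, tallyVotes and so on are placeholders, not part of the original design):

interface Vote { choice: string; }
interface VoteFilter { matches(vote: Vote): boolean; }
type PollResults = Map<string, number>;

// Hypothetical tally helper, only here so the sketch is self-contained.
function tallyVotes(votes: readonly Vote[]): PollResults {
  const tally: PollResults = new Map();
  for (const vote of votes) {
    tally.set(vote.choice, (tally.get(vote.choice) ?? 0) + 1);
  }
  return tally;
}

class ImmutablePoll {
  private resultsCache?: PollResults;

  constructor(
    private readonly votes: readonly Vote[],
    private readonly filter?: VoteFilter,
  ) {}

  // Returns a Poll with the filter applied. Because Polls are immutable,
  // it does not matter whether this is a fresh instance or one pulled
  // from a cache of previously built Polls.
  withFilter(filter: VoteFilter): ImmutablePoll {
    return new ImmutablePoll(this.votes, filter);
  }

  // Each Poll memoizes its own results, so getWinner (or isTie) reuses them.
  getResults(): PollResults {
    if (!this.resultsCache) {
      const f = this.filter;
      const subset = f ? this.votes.filter((v) => f.matches(v)) : this.votes;
      this.resultsCache = tallyVotes(subset);
    }
    return this.resultsCache;
  }

  getWinner(): string | undefined {
    let winner: string | undefined;
    let best = -1;
    for (const [choice, count] of this.getResults()) {
      if (count > best) {
        best = count;
        winner = choice;
      }
    }
    return winner;
  }
}

Because each Poll instance owns its own memoized results, all of the result methods share the same cache for that filter, and callers never have to think about when the cache is reset.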
Happy coding.
I'm new to Redux-Saga, so please assume very shaky foundational knowledge.
In Redux, I am able to define an action and a subsequent reducer to handle that action. In my reducer, I can do just about whatever I want, such as 'delete all' of a specific state tree node, e.g.
switch (action.type) {
  // ...
  case 'DESTROY_ALL_ORDERS':
    return {
      ...state,
      orders: []
    };
}
However, it seems to me (after reading the docs) that the reducers are defined by Saga, and you have access to them in the form of certain given CRUD verb prefixes with invocation suffixes, e.g.
fetchStart, destroyStart
My instinct is to use destroyStart, but the method accepts a model instance, not a collection, i.e. it can only destroy a given resource instance (in my case, one Order).
TL;DR
Is there a destroyStart equivalent for a group of records at once?
If not, is there a way I can add custom behavior to the Saga-created reducers?
What have I missed? Feel free to be as mean as you want; I have no idea what I'm doing, but when you are done roasting me, do me a favor and point me in the right direction.
EDIT:
To clarify, I'm not trying to delete records from my database. I only want to clear the Redux store of all 'Order' Records.
Two key bits of knowledge were gained here.
My team is using a library called redux-api-resources, which to some extent I was conflating with Saga. This library was created by a former employee and adds about as much complexity as it removes; I would not recommend it. DestroyStart is provided by this library and is not specifically related to Saga. However, the answer for anyone using this library (redux-api-resources) is no: there is no bulk destroy action.
Reducers are created by Saga, as pointed out in the comments above by @Chad S. The mistake in my thinking was that I believed I should somehow crack open this reducer and fill it with complex logic. The 'Saga' way to do this is to put the logic in your generator function, which is where you (can) define your control flow. I make no claim that this is best practice, only that this is how I managed to get my code working.
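For what it's worth, a minimal redux-saga sketch of that idea (the action names are made up for illustration): the control flow lives in the generator, which ultimately just dispatches the plain action the reducer above handles.

import { put, takeEvery } from 'redux-saga/effects';

// Hypothetical action types for this sketch.
const DESTROY_ALL_ORDERS_REQUESTED = 'DESTROY_ALL_ORDERS_REQUESTED';
const DESTROY_ALL_ORDERS = 'DESTROY_ALL_ORDERS';

// The control flow lives here: the generator could also call an API,
// branch on the response, etc., before clearing the store.
function* destroyAllOrders() {
  yield put({ type: DESTROY_ALL_ORDERS });
}

export function* watchDestroyAllOrders() {
  yield takeEvery(DESTROY_ALL_ORDERS_REQUESTED, destroyAllOrders);
}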
I know very little about Saga and Redux in general, so please take these answers with a grain of salt.
Let's say we have a class "Car" that has different pieces of data (maker, model, color, fabrication date, registration date, etc.). The class has no methods to fetch this data itself, but it knows to ask for it from another object (passed in via the constructor; let's call it DS for short) - and the same goes for when it needs to push updates.
A method getColor() would be implemented like this:
if (!this->loaded('color')) {
    this->askDS('color'); // this will do the necessary work to generate a request to DS
}
return this->information('color');
Nothing too fancy so far. Now comes the part where I want to find out whether this has a name, or whether there are libraries/frameworks that already do it.
DS has a list of methods registered dynamically based on the class that needs data. For Car we have:
1. input: car serial number; output: raw values extracted by reading the serial number's digits
2. input: car raw color value; output: color code
3. input: car color code, manufacturer, year, model; output: human-readable color (for example, navy blue)
Now, neither DS nor any single method has a pre-ordered list of commands for getting from the serial number to the colour blue, but DS can construct a chain of methods such that, starting from one set of data, it can run them in order and obtain the desired data.
For our example above, DS runs methods 1, 2 and 3 in that order and injects the data resulting from all the methods into the class object that needed it.
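To make this concrete, here is a rough sketch (in TypeScript, with invented names) of how the chain-finding could work -- simple forward chaining over the registered methods until the wanted property is produced:

// Each registered method declares which properties it needs and which it produces.
interface ResolverStep {
  inputs: string[];
  outputs: string[];
  run(data: Record<string, unknown>): Record<string, unknown>;
}

// Simple forward chaining: keep applying any step whose inputs are already
// available until the wanted property has been produced or no step can run.
// A real implementation would prune steps that turn out to be unnecessary
// and cache the discovered chains per (known properties, wanted property).
function findChain(
  steps: ResolverStep[],
  known: ReadonlySet<string>,
  wanted: string,
): ResolverStep[] | null {
  const have = new Set(known);
  const chain: ResolverStep[] = [];
  let progressed = true;
  while (!have.has(wanted) && progressed) {
    progressed = false;
    for (const step of steps) {
      if (chain.includes(step)) continue;
      if (step.inputs.every((input) => have.has(input))) {
        chain.push(step);
        step.outputs.forEach((output) => have.add(output));
        progressed = true;
      }
    }
  }
  return have.has(wanted) ? chain : null;
}

With steps registered for serial number -> raw values -> color code -> human-readable color, asking for the human-readable color while only the serial number is known would return those three steps in execution order.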
Now, if the car needs registration info, we have a method (4) that gets it from the police database via an API request.
So, given:
- a type of model (class/object)
- a list of methods that take a fixed list of inputs (object properties) and give a fixed list of outputs (object properties)
- a class DS that can glue the methods together and run the ones needed for a model to get from property A (serial) to property B (human-readable colour), without the model or DS having a preconfigured way to get this data, finding it as needed instead
Does this have a name, or is it already implemented somewhere?
I've implemented a very basic prototype, it works very nicely, and I think this implementation method has useful features:
if you have a set of methods that do SQL queries and your app then switches to using an API, you only need to change the methods and don't have to touch any other part of the application
when looking for a chain of methods that resolves the 'need' the object has, you can find a method chain, run it, and if it fails, keep looking for another list of methods based on the currently available data - so if you have multiple sources for a piece of data, it can try multiple versions
starting from the above point, I could begin with an app that only has SQL queries for data retrieval - when I find that part of the app overloads the SQL server, I could add a method to retrieve data from a cache with a lower cost than the database (or multiple layered caches, each with different costs)
I could probably add business logic into the mix the same way as the cache, and present different data based on the user's location/options
this requires less coding overall and decouples the data source from the object, making each piece easier to mock/test
what is needed to make this fast is a caching solution for the discovered method chains, since matching hundreds of thousands of methods per model type would be time-consuming, but I don't think this is very hard to do - just store all found chains in memory as you find them, plus some metadata so a search can be resumed from any point; when you update the methods, just clear the cache and take a performance hit on the first requests
Thank you for your time
What you describe sounds like a somewhat roundabout way of doing Dependency Injection. Quote:
"Passing the service to the client, rather than allowing a client to
build or find the service, is the fundamental requirement of the
pattern."
Depending on what language you're using, there should be several Dependency Injection frameworks/libraries available.
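For illustration only, the Car/DS arrangement from the question is essentially constructor injection (TypeScript sketch; the interface and names are hypothetical):

// The Car neither builds nor looks up its data source; it is handed one.
interface DataSource {
  fetch(property: string): unknown;
}

class Car {
  constructor(private readonly ds: DataSource) {}

  getColor(): unknown {
    return this.ds.fetch('color');
  }
}

// A test (or a future API-backed version) just passes in a different DataSource.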
I was thinking about patterns which allow me to return both a computation result and a status.
There are a few approaches I can think of:
function returns the computation result, and the status is returned via an out parameter (not all languages support out parameters, and this seems wrong, since in general you don't expect parameters to be modified).
function returns an object/pair containing both values (the downside is that you have to create an artificial class just to return the function result, or use a pair which has no semantic meaning - you only know which value is which by its order).
if your status is just success/failure, you can return the computation value and throw an exception in case of error (looks like the best approach, but only works in the success/failure scenario and shouldn't be abused for controlling normal program flow).
function returns the value; the function arguments are delegates to onSuccess/onFailure procedures.
there is a (stateful) method class which has a status field and a method returning the computation results (I prefer stateless/immutable objects).
Please give me some hints on the pros, cons and preconditions of using the aforementioned approaches, or show me other patterns I could use (preferably with hints on when to use them).
EDIT:
Real-world example:
I am developing a Java EE web application, and I have a class that resolves request parameters, converting them from strings into business logic objects. The resolver checks in the DB whether the object is being created or edited, and then returns to the controller either a new object or the object fetched from the DB. The controller takes action based on the object's status (new/editing) read from the resolver. I know it's bad, and I would like to improve the code design here.
function returns the computation result, and the status is returned via an out parameter (not all languages support out parameters, and this seems wrong, since in general you don't expect parameters to be modified).
If the language supports multiple output values, then the language clearly was made to support them. It would be a shame not to use them (unless there are strong opinions in that particular community against them - this could be the case for languages that try and do everything)
function returns an object/pair containing both values (the downside is that you have to create an artificial class just to return the function result, or use a pair which has no semantic meaning - you only know which value is which by its order).
I don't know about that downside. It seems to me that a record or class called "MyMethodResult" should have enough semantics by itself. You can always use such a class in an exception as well, if you are in an exceptional condition, of course. Creating some kind of array/union/pair would be less acceptable in my opinion: you would inevitably lose information somewhere.
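For example, such a result type could be as small as this (a TypeScript sketch with invented names):

// A small, named result type instead of an anonymous pair: the field names
// carry the semantics, and both success and failure are explicit.
class ParseResult<T> {
  private constructor(
    public readonly succeeded: boolean,
    public readonly value?: T,
    public readonly error?: string,
  ) {}

  static success<T>(value: T): ParseResult<T> {
    return new ParseResult<T>(true, value);
  }

  static failure<T>(error: string): ParseResult<T> {
    return new ParseResult<T>(false, undefined, error);
  }
}

The caller checks succeeded (and reads value or error) instead of relying on argument order or on catching an exception.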
if your status is just success/failure, you can return the computation value and throw an exception in case of error (looks like the best approach, but only works in the success/failure scenario and shouldn't be abused for controlling normal program flow).
No! This is the worst approach. Exceptions should be used for exactly that, exceptional circumstances. If not, they will halt debuggers, put colleagues on the wrong foot, harm performance, fill your logging system and bugger up your unit tests. If you create a method to test something, then the test should return a status, not an exception: to the implementation, returning a negative is not exceptional.
Of course, if you run out of bytes from a file during parsing, sure, throw the exception, but don't throw it if the input is incorrect and your method is called checkFile.
function returns the value; the function arguments are delegates to onSuccess/onFailure procedures.
I would only use those if you have multiple results to share. It's way more complex than the class/record approach, and more difficult to maintain. I've used this approach to return multiple results when I don't know whether the results will be ignored, or whether the user wants to continue. In Java you would use a listener. This kind of operation is probably more accepted in functional languages.
there is a (stateful) method class which has a status field and a method returning the computation results (I prefer stateless/immutable objects).
Yes, I prefer those too. There are producers of results and the results themselves. There is little need to combine the two and create a stateful object.
In the end, you want to go to producer.produceFrom(x): Result in my opinion. This is either option 1 or 2a, if I'm counting correctly. And yes, for 2a, this means writing some extra code.
My inclination would be to either use out parameters or else use an "open-field" struct, which simply contains public fields and specifies that its purpose is simply to carry the values of those fields. While some people suggest that everything should be "encapsulated", I would suggest that if a computation naturally yields two double values called the Moe and Larry coefficients, specifying that the function should return "a plain-old-data struct with fields of type double called MoeCoefficient and LarryCoefficient" would serve to completely define the behavior of the struct. Although the struct would have to be declared as a data type outside the method that performs the computation, having its contents exposed as public fields would make clear that none of the semantics associated with those values are contained in the struct--they're all contained in the method that returns it.
Some people would argue that the struct should be immutable, or that it should include validation logic in its constructor, etc. I would suggest the opposite. If the purpose of the structure is to allow a method to return a group of values, it should be the responsibility of that method to ensure that it puts the proper values into the structure. Further, while there's nothing wrong with a structure exposing a constructor as a "convenience member", simply having the code that will return the struct fill in the fields individually may be faster and clearer than calling a constructor, especially if the value to be stored in one field depends upon the value stored to the other.
If a struct simply exposes its fields publicly, then the semantics are very clear: MoeCoefficient contains the last value that was written to MoeCoefficient, and LarryCoefficient contains the last value written to LarryCoefficient. The meaning of those values would be entirely up to whatever code writes them. Hiding the fields behind properties obscures that relationship, and may impede performance as well.
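A literal rendering of that "open-field" idea, sketched in TypeScript (the coefficient names come from the example above; the computation itself is invented):

// Plain data carrier: public fields, no behaviour, no validation.
// The method that computes the values is responsible for filling them in correctly.
interface MoeLarryCoefficients {
  moeCoefficient: number;
  larryCoefficient: number;
}

function computeCoefficients(samples: number[]): MoeLarryCoefficients {
  // Hypothetical computation, only here to show the producing method
  // owning the semantics of both fields.
  const moeCoefficient = samples.reduce((sum, s) => sum + s, 0) / samples.length;
  const larryCoefficient = Math.max(...samples) - Math.min(...samples);
  return { moeCoefficient, larryCoefficient };
}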
Do you have a default type that you prefer to use in your dealings with the results of LINQ queries?
By default LINQ will return an IEnumerable<> or maybe an IOrderedEnumerable<>. We have found that a List<> is generally more useful to us, so have adopted a habit of ToList()ing our queries most of the time, and certainly using List<> in our function arguments and return values.
The only exception to this has been in LINQ to SQL where calling .ToList() would enumerate the IEnumerable prematurely.
We are also using WCF extensively, the default collection type of which is System.Array. We always change this to System.Collections.Generic.List in the Service Reference Settings dialog in VS2008 for consistency with the rest of our codebase.
What do you do?
ToList always evaluates the sequence immediately - not just in LINQ to SQL. If you want that, that's fine - but it's not always appropriate.
Personally I would try to avoid declaring that you return List<T> directly - usually IList<T> is more appropriate, and allows you to change to a different implementation later on. Of course, there are some operations which are only specified on List<T> itself... this sort of decision is always tricky.
EDIT: (I would have put this in a comment, but it would be too bulky.) Deferred execution allows you to deal with data sources which are too big to fit in memory. For instance, if you're processing log files - transforming them from one format to another, uploading them into a database, working out some stats, or something like that - you may very well be able to handle arbitrary amounts of data by streaming it, but you really don't want to suck everything into memory. This may not be a concern for your particular application, but it's something to bear in mind.
We have the same scenario - WCF communications to a server, the server uses LINQtoSQL.
We use .ToArray() when requesting objects from the server, because it's "illegal" for the client to change the list. (Meaning, there is no purpose to support ".Add", ".Remove", etc).
While still on the server, however, I would recommend that you leave it as its default (which is not IEnumerable, but rather IQueryable). This way, if you want to filter even more based on some criteria, the filtering STILL happens on the SQL side until the query is evaluated.
This is a very important point as it means incredible performance gains or losses depending on what you do.
EXAMPLE:
// This is just an example... imagine this is on the server only. It's the
// basic method that gets the list of clients.
private IEnumerable<Client> GetClients()
{
    var result = MyDataContext.Clients;
    return result.AsEnumerable();
}

// This method here is actually called by the user...
public Client[] GetClientsForLoggedInUser()
{
    var clients = GetClients().Where(client => client.Owner == currentUser);
    return clients.ToArray();
}
Do you see what's happening there? The "GetClients" method is going to force a download of ALL 'clients' from the database... THEN the Where clause will be applied in the GetClientsForLoggedInUser method to filter it down.
Now, notice the slight change:
private IQueryable<Client> GetClients()
{
    var result = MyDataContext.Clients;
    return result.AsQueryable();
}
Now, the actual evaluation won't happen until ".ToArray" is called... and SQL will do the filtering. MUCH better!
In the Linq-to-Objects case, returning List<T> from a function isn't as nice as returning IList<T>, as THE VENERABLE SKEET points out. But often you can still do better than that. If the thing you are returning ought to be immutable, IList is a bad choice because it invites the caller to add or remove things.
For example, sometimes you have a method or property that returns the result of a Linq query or uses yield return to lazily generate a list, and then you realise that it would be better to do that the first time you're called, cache the result in a List<T> and return the cached version thereafter. That's when returning IList may be a bad idea, because the caller may modify the list for their own purposes, which will then corrupt your cache, making their changes visible to all other callers.
Better to return IEnumerable<T>, so all they have is forward iteration. And if the caller wants rapid random access, i.e. they wish they could use [] to access by index, they can use ElementAt, which Linq defines so that it quietly sniffs for IList and uses that if available, and if not it does the dumb linear lookup.
One thing I've used ToList for is when I've got a complex system of Linq expressions mixed with custom operators that use yield return to filter or transform lists. Stepping through in the debugger can get mighty confusing as it jumps around doing lazy evaluation, so I sometimes temporarily add a ToList() to a few places so that I can more easily follow the execution path. (Although if the things you are executing have side-effects, this can change the meaning of the program.)
It depends if you need to modify the collection. I like to use an Array when I know that no one is going to add/delete items. I use a list when I need to sort/add/delete items. But, usually I just leave it as IEnumerable as long as I can.
If you don't need the added features of List<>, why not just stick with IQueryable<> ?!?!?! Lowest common denominator is the best solution (especially when you see Timothy's answer).
When I am making methods with return values, I usually try and set things up so that there is never a case when the method is called in such a way that it would have to return some default value. When I started, I would often write methods that did something and would either return what they did or, if they failed to do anything, return null. But I hate having ugly if(!null) checks all over my code.
I'm re-reading a guide to Ruby that I read many moons ago, by the Pragmatic Programmers, and I notice that they often return self (Ruby's this) when they wouldn't normally return anything. This is, they say, in order to be able to chain method calls, as in this example using setters that return the object whose attributes they set.
tree.setColor(green).setDecor(gaudy).setPractical(false)
Initially I find this sort of thing attractive. There have been a couple of times when I have rejoiced at being able to chain method calls, like Player.getHand().getSize() but this is somewhat different in that the object of the method call changes from step to step.
What does Stack Overflow think about return values? Are there any patterns or idioms that come to mind warmly when you think of return values? Any great ways to avoid frustration and increase beauty?
In my humble opinion, there are three kinds of return-cases that you should take into consideration:
Object property manipulation
The first is the manipulation of object properties. The pattern you describe here is very often used when manipulating objects. A very typical scenario is using it together with a factory. Consider this hypothetical creation call:
// When the object has manipulative methods:
Pizza p = PizzaFactory().create().addAnchovies().addTomatoes();
// When the factory has manipulative methods working on the
// object, IMHO more elegant from a semantic point of view:
Pizza p = PizzaFactory().create().addAnchovies().addTomatoes().getPizza();
It allows for a quick grasp of what exactly is being created or how an object is manipulated, because the methods form one human-readable expression. It's definitely nice, but don't overuse it. A rule of thumb is that this might be used with methods whose return value you could also have declared as void.
Evaluating object properties
The second might be when a method evaluates something on an object. Consider, for example, the method car.getCurrentSpeed(), which could be interpreted as a message to an object asking for the current speed, and returning it. It would simply return the value; not too complicated. :)
Make object do this or that
The third might be when a method makes an object perform an operation, returning some sort of value indicating how well the caller's intention was fulfilled - but laying out such a method can be difficult:
int new_gear = 20;
if (car.gears.changeGear(new_gear)) // does that mean success or fail?
This is where you can see a difficulty in designing the method. Should it return 0 upon success, or upon failure? How about -1 if the gear could not be set, because the car only has 5 gears? Does that mean the current gear is at -1 now, too? The method could return the gear it changed to, meaning you would have to compare the argument supplied to the method with the return value. That would work. On the other hand, you could simply return true for success and false for failure (or the other way around). Which one to use could be decided by estimating whether you'd expect those method calls to fail or to succeed more often.
In my humble opinion, there is a way to better express the semantics of such return values, by giving them a semantic description. Future developers interacting with your objects will love you for not having to look up the comments or documentation for your methods:
class GearSystem {
    // (...)
public:
    enum GearChangeResult
        { GearChangeSuccess, NonExistingGear, MechanicalGearProblem };

    GearChangeResult changeGear(int gear);
};
That way, it becomes perfectly obvious for any programmer looking at your code, what the return value means (consider: if (gears.changeGear(20) == GearSystem::GearChangeSuccess) - much clearer what that means than the example above)
Antipattern: Failures as return codes.
The fourth possibility for a return value I actually omitted, because in my opinion it isn't one: when there's an error in your program, like a logic error or a failure that needs to be dealt with, you could theoretically return a value indicating so. But today that's not done so often anymore (or should not be), because for that there are exceptions.
I don't agree that methods should never return null. The most obvious examples are from systems programming. For instance, if someone asks to open a file, you simply have to give them null if the open fails. There is no sane alternative. There are other cases where null is appropriate, such as a getNextNode(node) method, when called on the last node of a linked list. So I guess what these cases have in common is that null represents "no object" (either no file handle or no list node), which makes sense.
In other cases, the method should never fail, and there is an appropriate exception facility. Then, I think method chaining like your example can be used to great effect. I think it's a bit funny that you seem to believe this is an innovation of the "Pragmatic Programmers". In fact, it dates to Lisp if not before.
Returning this is also used in the "builder pattern", another case where method chaining can enhance readability as well as writing convenience.
A null is often returned as an out-of-band value to indicate that no result could be produced. I believe that this is perfectly reasonable when getting no result is a normal event; examples would include a null return from readLine() at end-of-file, or a null returned when providing a non-existent key to the get(...) method of a Map. Reading to the end of the file is normal behavior (as opposed to an IOException, which indicates that something went abnormally wrong while trying to read). Similarly, looking up a key and being told that it has no value is a normal case.
A good alternative to null for some cases is a "null object", which is a full-fledged instance of the result class, but which has appropriate state and behavior for a "nobody's home" case. For instance, the result of looking up a non-existent user ID might well be a NullUser object which has a zero-length name and no permissions to do anything in the system.
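A minimal sketch of such a null object (TypeScript, with invented field names):

interface User {
  readonly name: string;
  can(permission: string): boolean;
}

// "Nobody's home" implementation: a real User object that simply has
// an empty name and no permissions.
const NULL_USER: User = {
  name: '',
  can: () => false,
};

function findUser(id: string, users: Map<string, User>): User {
  // Callers never need a null check; an unknown id just yields a user
  // who is not allowed to do anything.
  return users.get(id) ?? NULL_USER;
}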
This kind of chaining is confusing to me. OO programming languages need Smalltalk's semicolon (the message cascade):
tree color: green;
decor: gaudy;
practical: false.
obj method1; method2. means "call method1 on obj then method2 on obj". This kind of object setup is very common.
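In a language without cascades, the closest you get is the fluent style from the question, where every setter returns the receiver (a TypeScript sketch for illustration):

class Tree {
  private color = '';
  private decor = '';
  private practical = true;

  // Each setter returns the receiver so calls can be chained,
  // emulating what a Smalltalk cascade gives you for free.
  setColor(color: string): this { this.color = color; return this; }
  setDecor(decor: string): this { this.decor = decor; return this; }
  setPractical(practical: boolean): this { this.practical = practical; return this; }
}

const tree = new Tree().setColor('green').setDecor('gaudy').setPractical(false);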