Name that technique (it may be called 'piggybacking') - algorithm

What is the name of the following method/technique? (I'll describe it as best I can; some background on "memoization" is probably needed to understand why this technique can be very useful.)
You start some potentially lengthy asynchronous computation and you realize that an identical computation has already been started but is not done yet, so you "piggyback" on the first computation. Then, when the first computation ends, it issues not one but two callbacks.
The goal is to not needlessly start a second computation because you know that there's already an identical computation running.
Note that, although it is not entirely dissimilar, I'm not looking for the particular case of caching that "memoization" is: memoization is when you start a computation and find a cached (memoized) result of that same computation, already done, that you can reuse.
Here I'm looking for the name of a technique that is a bit similar to memoization (in that it can be useful for some of the same reasons that memoization is useful), except that it reuses the result of the first computation even if the first computation is not done yet at the time you issue the second computation.
I've always called that technique "piggybacking" but I don't know if this is correct.
I've actually used this more than once as some kind of "memoization on steroids" and it came in very handy.
I just don't know what the name of this (advanced?) technique is.
EDIT
Damn, I wanted to comment on epatel's answer but it disappeared. epatel's answer gave me an idea, this technique could be called "lazy memoization" :)

This is just memoization of futures.
Normal "eager" memoization works like this:
f_memo(x):
    critical_section:
        if (exists answers(f,x))
            return answers(f,x)
        else
            a = f(x)
            answers(f,x) = a
            return a
Now if f(x) returns futures instead of actual results, the above code works as is. You get the piggyback effect, i.e. like this:
First thread calls f(3)
There is no stored answer for f(3), so in the critical section there's a call to f(3). f(3) is implemented as returning a future, so the 'answer' is ready immediately; 'a' in the code above is set to the future F and the future F is stored in the answers table
The future F is returned as the "result" of the call f(3), which is potentially still ongoing
Another thread calls f(3)
The future F is found from the table, and returned immediately
Now both threads have a handle to the result of the computation; when they try to read it, they block until the computation is ready. In the post this communication mechanism was described as being implemented by a callback, presumably in a context where futures are less common.
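The steps above can be sketched in Python with the standard concurrent.futures module. This is only a minimal illustration under assumptions: slow_square, slow_square_memo, answers and calls_started are hypothetical names, and the lock stands in for the critical section of the pseudocode.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)
answers = {}                      # maps argument -> Future (the "answers" table)
answers_lock = threading.Lock()   # plays the role of the critical section
calls_started = 0                 # counts how many real computations were started


def slow_square(x):
    """Hypothetical stand-in for a lengthy computation."""
    global calls_started
    calls_started += 1            # only ever reached once per argument in this sketch
    time.sleep(0.1)
    return x * x


def slow_square_memo(x):
    """Return a Future for slow_square(x), starting the work at most once per x."""
    with answers_lock:
        if x not in answers:
            # Storing the Future (not the result) is what gives the piggyback effect.
            answers[x] = executor.submit(slow_square, x)
        return answers[x]


first = slow_square_memo(3)    # starts the computation
second = slow_square_memo(3)   # finds the Future already in the table
results = (first.result(), second.result())   # both wait on the same single run
executor.shutdown()
```

If callbacks are preferred over blocking reads, each caller could attach its own with the Future's add_done_callback method; the single computation would then notify every caller when it finishes.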

Sounds like a future: http://en.wikipedia.org/wiki/Future_%28programming%29

In some contexts, I've heard this called "Request Merging".

Sounds a little like Lazy Evaluation, but not exactly...

Related

Debugging recursion is so hard

Is there any way to debug infinite recursion errors quickly? I get an infinite recursion error, and I know it happens because a base case is missing, so the function keeps calling itself until the call stack is exceeded. My problem is finding exactly where the base case is missing; stepping in and out with the debugger in the browser's developer tools would take hours. Is there a way to do this fast and jump straight to the place where the base case is missing?
(Pause on exceptions doesn't work for the recursion.)
No, there is no way to do what you want to do. If you had an infinite while loop, you wouldn't expect the computer to magically tell you when the loop should have ended because the computer has no idea what you want. Similarly, if you have infinite recursion, there's no way for the computer to tell you where the recursion should have ended because the computer has no idea what you want.
There are some general tips for both programming and recursion which will simplify the task.
First, always try to simplify your code as much as possible. If your code is too complicated, try separating the code into helper functions and documenting exactly what the code should do using documentation comments or doc strings, and test each function one at a time. If you're writing a basic factorial function, it's almost inconceivable that you would miss the base case if you have any practice in recursion, because the code is so short. But if you're writing something very complicated (i.e. 10+ lines), it's easy for mistakes to slip through the cracks.
Second, you should make sure that you have a proper notion of the "size of the input" and ensure that whenever you make a recursive call, you make the call on a smaller input. Remember that "size" must be measured by an unsigned integer (aka natural number) - you can't let size go negative, and you can't let size be a fraction. As long as you do these checks, your recursion will always terminate.
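As a small sketch of the "size of the input" idea, here is a recursive sum in Python where the size is the length of the list. The function name total and the assertion are illustrative assumptions, not something from the question; the assertion simply makes the "smaller input" check explicit.

```python
def total(items):
    """Sum a list recursively; the 'size' of the input is len(items)."""
    if len(items) == 0:                 # base case: size 0, nothing left to add
        return 0
    rest = items[1:]
    # Every recursive call must be made on a strictly smaller input.
    assert len(rest) < len(items)
    return items[0] + total(rest)
```

Because the size is a natural number that shrinks on every call, it must eventually hit 0, which is exactly the case the base case handles.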

Does declaring a variable and using it perform better than getting the value every time?

This is kind of a naïve question, but I'm still asking. Consider the following code:
func1(obj.state)
func2(obj.state)
func3(obj.state)
func4(obj.state)
Does replacing the code above with the code below improve performance, or does it not matter at all (can modern compilers optimize these things themselves)?
value = obj.state
func1(value)
func2(value)
func3(value)
func4(value)
If state were instead a big function that takes some time to compute, then the second version would surely perform better. I'm asking about the case where it's just a state.
I thought of this because in the first case the program first has to go to the object's reference and then to the reference pointed to by state, whereas in the second case it can go directly to the reference pointed to by value. It is a tradeoff between space and time.
Also, does this differ from language to language?
Question to you: does better performance matter to you if the result is not correct?
The first code fragment uses the latest state for each function call; the second uses the same state for all calls. If you know that the state doesn’t change, and the compiler doesn’t know that, the second fragment is better. Otherwise use the first.
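To make the correctness point concrete, here is a small Python sketch in which the called function has a side effect on the object. The class Holder and the function record_and_bump are hypothetical, chosen only to show that the two fragments can observe different values.

```python
class Holder:
    def __init__(self):
        self.state = 0


obj = Holder()


def record_and_bump(value, log):
    """Record the value it was given, then change obj.state as a side effect."""
    log.append(value)
    obj.state += 1


# Fragment 1: obj.state is re-read before every call.
seen_live = []
record_and_bump(obj.state, seen_live)
record_and_bump(obj.state, seen_live)

# Fragment 2: the state is read once and reused.
obj.state = 0
seen_cached = []
value = obj.state
record_and_bump(value, seen_cached)
record_and_bump(value, seen_cached)
```

Here seen_live ends up as [0, 1] while seen_cached ends up as [0, 0], so caching the value is only a safe substitution when nothing changes the state between the calls.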

Python Recursion Understanding Issue

I'm a freshman in CS, and I'm having a little trouble understanding recursion in Python.
I have 3 questions that I want to ask, or to confirm if I am understanding the content correctly.
1: I'm not sure about the purpose of the base case in recursion. Does the base case serve to terminate the recursion somewhere in my program?
2: Suppose my recursive call sits above other code in my program. Will the program run the recursion first and then run the code after the recursive call?
3: How do I trace recursion properly and reasonably accurately? I personally find it really hard to trace recursion, and I can barely find any instructions online.
Thanks for answering my questions.
1: Yes. The idea is that the recursive call will continue to call itself (with different inputs) until it calls the base case (or one of many base cases). The base case is usually a case of your problem where the solution is trivial and doesn't rely on any other parts of the problem. It then walks through the calls backwards, building on the answer it got for the simplest version of the question.
2: Yes. Interacting with recursive functions from the outside is exactly the same as interacting with any other function.
3: Depends on what you're writing, and how comfortable you are with debugging tools. If you're trying to track what values are getting passed back and forth between recursive calls, printing all the parameters at the start of the function and the return value at the end can take you pretty far.
The easiest way to wrap your head around this stuff is to write some of your own. Start with some basic examples (the classic is the factorial function) and ask for help if you get stuck.
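As one possible starting point, here is the factorial function with the tracing described above: it prints its parameter on entry and its return value on exit. The depth parameter is an added assumption, used only to indent the output so the nesting of calls is visible.

```python
def factorial(n, depth=0):
    """Compute n! recursively, printing a trace of every call and return."""
    indent = "  " * depth
    print(f"{indent}factorial({n}) called")
    if n == 0:
        result = 1                                  # base case: trivial answer
    else:
        result = n * factorial(n - 1, depth + 1)    # recursive case on a smaller input
    print(f"{indent}factorial({n}) returns {result}")
    return result
```

Calling factorial(3) prints the calls going down to the base case and then the returns coming back up, which is the "walks through the calls backwards" behaviour.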
Edit: If you're more math-oriented, you might look up mathematical induction (you will learn it anyway as part of cs education). It's the exact same concept, just taught a little differently.

Why should a function have only one exit-point? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
I've always heard about a single exit-point function as a bad way to code because you lose readability and efficiency. I've never heard anybody argue the other side.
I thought this had something to do with CS but this question was shot down at cstheory stackexchange.
There are different schools of thought, and it largely comes down to personal preference.
One is that it is less confusing if there is only a single exit point - you have a single path through the method and you know where to look for the exit. On the minus side if you use indentation to represent nesting, your code ends up massively indented to the right, and it becomes very difficult to follow all the nested scopes.
Another is that you can check preconditions and exit early at the start of a method, so that you know in the body of the method that certain conditions are true, without the entire body of the method being indented 5 miles off to the right. This usually minimises the number of scopes you have to worry about, which makes code much easier to follow.
A third is that you can exit anywhere you please. This used to be more confusing in the old days, but now that we have syntax-colouring editors and compilers that detect unreachable code, it's a lot easier to deal with.
I'm squarely in the middle camp. Enforcing a single exit point is a pointless or even counterproductive restriction IMHO, while exiting at random all over a method can sometimes lead to messy difficult to follow logic, where it becomes difficult to see if a given bit of code will or won't be executed. But "gating" your method makes it possible to significantly simplify the body of the method.
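A rough Python sketch of the contrast between the first two schools: the same checks written with a single exit point and with "gating" returns at the top. The order dictionary and both function names are made up for illustration.

```python
def describe_single_exit(order):
    # Single exit point: every check adds another level of nesting.
    if order is not None:
        if order["paid"]:
            if order["items"]:
                result = f"shipping {len(order['items'])} item(s)"
            else:
                result = "error: empty order"
        else:
            result = "error: unpaid"
    else:
        result = "error: no order"
    return result


def describe_gated(order):
    # Preconditions are checked up front, so the body stays flat.
    if order is None:
        return "error: no order"
    if not order["paid"]:
        return "error: unpaid"
    if not order["items"]:
        return "error: empty order"
    return f"shipping {len(order['items'])} item(s)"
```

Both functions return the same result for the same input; the difference is only in how far the main logic drifts to the right.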
My general recommendation is that return statements should, when practical, either be located before the first code that has any side-effects, or after the last code that has any side-effects. I would consider something like:
if (!argument) // Check if non-null
    return ERR_NULL_ARGUMENT;
... process non-null argument
if (ok)
    return 0;
else
    return ERR_NOT_OK;
clearer than:
int result;
if (argument) // Non-null
{
    .. process non-null argument
    .. set result appropriately
}
else
    result = ERR_NULL_ARGUMENT;
return result;
If a certain condition should prevent a function from doing anything, I prefer to early-return out of the function at a spot above the point where the function would do anything. Once the function has undertaken actions with side-effects, though, I prefer to return from the bottom, to make clear that all side-effects must be dealt with.
With most anything, it comes down to the needs of the deliverable. In "the old days", spaghetti code with multiple return points invited memory leaks, since coders that preferred that method typically did not clean up well. There were also issues with some compilers "losing" the reference to the return variable as the stack was popped during the return, in the case of returning from a nested scope. The more general problem was one of re-entrant code, which attempts to have the calling state of a function be exactly the same as its return state. Mutators of oop violated this and the concept was shelved.
There are deliverables, most notably kernels, which need the speed that multiple exit points provide. These environments normally have their own memory and process management, so the risk of a leak is minimized.
Personally, I like to have a single point of exit, since I often use it to insert a breakpoint on the return statement and inspect how the code determined that solution. I could just go to the entrance and step through, which I do with extensively nested and recursive solutions. For a code reviewer, multiple returns in a function require a much deeper analysis, so if you're using them to speed up the implementation, you're robbing Peter to pay Paul. More time will be required in code reviews, invalidating the presumption of efficient implementation.
-- 2 cents
Please see this doc for more details: NISTIR 5459
Single entry and exit point was the original concept of structured programming, as opposed to step-by-step spaghetti coding. There is a belief that multiple exit-point functions require more code, since you have to properly clean up the memory allocated for variables. Consider a scenario where a function allocates variables (resources): getting out of the function early without proper clean-up would result in resource leaks. In addition, constructing clean-up before every exit would create a lot of redundant code.
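The leak scenario can be sketched in Python as follows. Resource and process are hypothetical names; the try/finally block is one mechanism some languages offer so that the clean-up is written once and still runs on every exit path, and it is shown here as an assumption about the environment, not as part of the original argument.

```python
class Resource:
    """Hypothetical resource that must be closed when the function is done."""

    def __init__(self):
        self.is_open = True

    def close(self):
        self.is_open = False


def process(resource, data):
    try:
        if data is None:
            return "no data"          # early exit
        if len(data) == 0:
            return "empty"            # another early exit
        return f"processed {len(data)} item(s)"
    finally:
        # Written once, but executed no matter which return was taken.
        resource.close()
```

Without such a construct, each of the three returns would need its own call to close(), which is the redundancy the answer describes.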
I used to be an advocate of single-exit style. My reasoning came mostly from pain...
Single-exit is easier to debug.
Given the techniques and tools we have today, this is a far less reasonable position to take, as unit tests and logging can make single-exit unnecessary. That said, when you needed to watch code execute in a debugger, it was much harder to understand and work with code containing multiple exit points.
This became especially true when you needed to interject assignments in order to examine state (replaced with watch expressions in modern debuggers). It was also too easy to alter the control flow in ways that hid the problem or broke the execution altogether.
Single-exit methods were easier to step through in the debugger, and easier to tease apart without breaking the logic.
In my view, the advice to exit a function (or other control structure) at only one point often is oversold. Two reasons typically are given to exit at only one point:
Single-exit code is supposedly easier to read and debug. (I admit that I don't think much of this reason, but it is given. What is substantially easier to read and debug is single-entry code.)
Single-exit code links and returns more cleanly.
The second reason is subtle and has some merit, especially if the function returns a large data structure. However, I wouldn't worry about it too much, except ...
If a student, you want to earn top marks in your class. Do what the instructor prefers. He probably has a good reason from his perspective; so, at the very least, you'll learn his perspective. This has value in itself.
Good luck.
The answer is very context dependent. If you are making a GUI and have a function at the start of your main which initialises APIs and opens windows, it will be full of calls which may throw errors, each of which would cause the instance of the program to close. If you used nested IF statements and indentation, your code could quickly become very skewed to the right. Returning on an error at each stage might be better and actually more readable, while being just as easy to debug with a few flags in the code.
If, however, you are testing different conditions and returning different values depending on the results in your method it may be much better practice to have a single exit point. I used to work on image processing scripts in MATLAB which could get very large. Multiple exit points could make the code extremely hard to follow. Switch statements were much more appropriate.
The best thing to do would be to learn as you go. If you are writing code for something try finding other people's code and seeing how they implement it. Decide which bits you like and which bits you don't.
If you feel like you need multiple exit points in a function, the function is too large and is doing too much.
I would recommend reading the chapter about functions in Robert C. Martin's book, Clean Code.
Essentially, you should try to write functions with 4 lines of code or less.
Some notes from Mike Long’s Blog:
The first rule of functions: they should be small
The second rule of functions: they should be smaller than that
Blocks within if statements, while statements, for loops, etc should be one line long
…and that line of code will usually be a function call
There should be no more than one or maybe two levels of indentation
Functions should do one thing
Function statements should all be at the same level of abstraction
A function should have no more than 3 arguments
Output arguments are a code smell
Passing a boolean flag into a function is truly awful. You are by definition doing two things in the function.
Side effects are lies.
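The note about boolean flags can be illustrated with a short Python sketch. All three function names are invented for the example; the point is only that the flag version hides two behaviours behind one name, while the split version gives each behaviour its own small function.

```python
def format_name(name, shout):
    # Flag version: one function, two different jobs selected by a boolean.
    if shout:
        return name.upper() + "!"
    return name.title()


def format_name_politely(name):
    # Split version, job one.
    return name.title()


def format_name_loudly(name):
    # Split version, job two.
    return name.upper() + "!"
```

A call such as format_name("ada", True) says little at the call site, whereas format_name_loudly("ada") states what it does.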

Code structure: should I use lots of functions to increase readability?

My question has Bash and PowerShell scripts in mind, but I suppose it applies to other languages as well.
It is my understanding that the purpose of a function is to perform the same (or a very similar) task multiple times. This decreases the amount of code in the script and it also makes it easier to maintain.
With that in mind, if you discover that your script only calls a function one time then there's no reason for that function to exist as a function. Instead, you should take the function's code and place it in the location where that function is being called.
Having said all that, here's my question:
If I have a complicated script, should I move each section of code into its own function even though each function will only be called once? This would greatly increase the script's readability because its logic (the functions) would all be at the top of the script and the flow of execution would be at the bottom of the script. Since 50 lines of code would be represented by just 1 line, it would be much easier to understand what the script is doing.
Do other people do this? Are there disadvantages to this approach?
Having functions also increases readability. So a bash script might look better and be easier to follow if it reads:
getParams
startTask
doSomethingElse
finishTask
# in Bash, the functions must already be defined above this point
Even if the function implementations are simple, it reads better.
Code readability is indeed a major concern, usually (nowadays) more important than sheer amount of code or performance. Not to mention that inlining function calls may not necessarily have noticeable performance benefits (very language specific).
So lots of developers (I venture to say the better of the breed :-) create small functions/methods like the ones you describe, to partition their code into logically cohesive parts.
A function does a well-defined task. If you have a mega function that does 5 different things, it strongly suggests it should be calling 5 smaller functions.
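The same idea carries over to other languages. Below is a small Python sketch, with made-up function names, where each step is called exactly once and a short main function reads as a summary of the whole script.

```python
def get_params(argv):
    """Step 1: turn raw arguments into a parameter dictionary."""
    return {"name": argv[0] if argv else "world"}


def build_message(params):
    """Step 2: do the actual work."""
    return f"hello, {params['name']}"


def finish_task(message):
    """Step 3: post-process the result."""
    return message.upper()


def main(argv):
    # Each helper is used only once, yet the flow is readable at a glance.
    params = get_params(argv)
    message = build_message(params)
    return finish_task(message)
```

Each helper can also be tested on its own, which is harder when all the steps live in one long block.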
"It is my understanding that the purpose of a function is to perform the same (or a very similar) task multiple times."
Well, it is my understanding that a function is a discrete entity that performs a specific, well defined task.
With that in mind, if you discover that your script calls a given function AT LEAST ONCE, then it's doing its job.
Focus on being able to read and easily understand your code.
Clear, readable code pays off far more than avoiding function-call overhead. Worrying about that overhead is just premature optimisation.
Plus, the goal of a function is to accomplish a particular task. A task can be a sub-task, there's nothing wrong with that!
Read this book
http://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882
Here are some quotes from the book.
"Small!
The first rule of functions is that they should be small. The second rule of functions is that they should be smaller than that."
"FUNCTIONS SHOULD DO ONE THING. THEY SHOULD DO IT WELL. THEY SHOULD DO IT ONLY."
As far as I know, a function represents a sequence of steps that becomes part of a larger program.
Coming to your question, I strongly agree that functions improve readability and re-usability. But at the same time, breaking everything into pieces might not be good practice.
Finally, I want to leave you with one statement: "Anything In Excess Is Not Beneficial!"

Resources