How is Ruby's throw-catch implemented?

In Ruby you can throw :label so long as you've wrapped everything in a catch(:label) do block.
I want to add this to a custom lispy language but I'm not sure how it's implemented under the hood. Any pointers?

This is an example of a non-local exit. If you are using your host language's (in this case C) call stack for function calls in your target language (so e.g. a function call in your lisp equates to a function call in C), then the easiest way is to use your host language's form of non-local exit. In C that means setjmp/longjmp.
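For instance, here is a minimal sketch, assuming a C (or C++) host, of a catch/throw-style non-local exit built on setjmp/longjmp. This is not Ruby's actual implementation; a real interpreter would keep a stack of jump buffers, one per active catch tag.
#include <csetjmp>
#include <cstdio>

static std::jmp_buf catch_point;   // the single "catch frame" in this sketch
static int thrown_value;           // value carried by the in-flight "throw"

static void lisp_throw(int value) {
    thrown_value = value;
    std::longjmp(catch_point, 1);  // unwind straight back to the matching setjmp
}

static void deeply_nested() {
    std::printf("throwing...\n");
    lisp_throw(42);
}

static int run_with_catch() {
    if (setjmp(catch_point) != 0) {  // non-zero means we got here via longjmp, i.e. a throw
        return thrown_value;         // the "catch" branch
    }
    deeply_nested();
    return 0;                        // normal exit, nothing was thrown
}

int main() {
    std::printf("caught %d\n", run_with_catch());  // prints "caught 42"
}
Note that longjmp bypasses C++ destructors, so an interpreter written in idiomatic C++ would typically use exceptions for the same purpose; the sketch mirrors the plain C mechanism.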
If, however, you are maintaining your target language's call stack separately, then you have many options for how to do this. One really simple way would be to have each lexical-scope exit yield two values: the actual value returned, and an exception state, if any. Then you can check for the exception at runtime and propagate it up. This has the downside of incurring extra cost on every function call even when no condition is signaled, but may be sufficient for a toy language.
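A hedged sketch of that second approach, with made-up names and a C++ host managing the propagation: every evaluation step returns a value plus an optional "throw in flight" tag, and only a matching catch clears it.
#include <cstdio>
#include <optional>
#include <string>

struct Value { long n = 0; };

// Each evaluation step yields a value plus an optional "a throw is in flight" tag.
struct EvalResult {
    Value value;
    std::optional<std::string> thrown_tag;
};

// Stand-in for evaluating a (throw tag 42) form in the target language.
EvalResult eval_throw() {
    return {Value{42}, std::string("tag")};
}

// A body form checks the exception state after every sub-expression and propagates it.
EvalResult eval_body() {
    EvalResult r = eval_throw();
    if (r.thrown_tag) return r;       // a throw is pending: hand it to our caller untouched
    // ...evaluate the remaining expressions only on the normal path...
    return r;
}

// A (catch tag ...) form stops the propagation when the tag matches.
EvalResult eval_catch(const std::string& tag) {
    EvalResult r = eval_body();
    if (r.thrown_tag && *r.thrown_tag == tag)
        r.thrown_tag.reset();         // tag matched: the thrown value becomes catch's result
    return r;
}

int main() {
    EvalResult r = eval_catch("tag");
    std::printf("caught: %ld\n", r.value.n);   // prints "caught: 42"
}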
The book "Lisp In Small Pieces" covers about a half-dozen ways of handling this, if you're interested.

Related

In C++, how can one predict if move or copy semantics would be invoked?

Given the latitude that a C++ compiler has in instantiating temporary objects, and in invoking mechanisms like return value optimization etc., it is not always clear by looking at some code whether move or copy semantics will be invoked (or how many times).
It almost feels as if these primitives exist for incidental optimizations. That is, you may or may not get them. It seems like it's difficult to design any kind of resource management strategy that leverages moves, when it is hard to control the invocation of moves themselves.
Is there a way to predict clearly (and simply) where and how many copies and moves might occur in some code? Ideally, one would not need to be an expert in compiler internals to be able to do this.
It seems like it's difficult to design any kind of resource management strategy that leverages moves, when it is hard to control the invocation of moves themselves.
I would disagree here. Leveraging move semantics when designing a resource-handling class should be done independently of how or when copy- or move-construction occurs in the client code. Once the move ctor/assignment is there, client code can be designed to leverage the existence of these special member functions.
Is there a way to predict clearly (and simply) where and how many copies and moves might occur in some code?
It's a bit hard to tell what "simply" means here, but this is how I understand it:
Given that a class has no move ctor/assignment operator, you will always get a copy. This is trivial, but important to keep in mind when working with e.g. classes in legacy code that have user-defined destructors and/or copy ctor/assignment, because the compiler doesn't generate move ctors/assignment in this case.
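For instance, a small sketch (the class names are invented) of how a user-declared destructor silently turns would-be moves into copies:
#include <string>
#include <utility>

struct Legacy {
    std::string data;
    ~Legacy() {}              // user-declared destructor: no implicit move ctor/assignment is generated
};

struct Modern {
    std::string data;         // no user-declared special members: move operations are generated
};

int main() {
    Legacy l1{std::string(1000, 'x')};
    Legacy l2 = std::move(l1);   // copy constructor runs; l1.data keeps its contents
    Modern m1{std::string(1000, 'x')};
    Modern m2 = std::move(m1);   // move constructor runs; m1.data is left valid but unspecified
    (void)l2; (void)m2;
}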
Return value optimization. The question is tagged C++11, so you don't have the guaranteed copy elision for initialization with prvalues brought by C++17. However, it is fair to assume that identical mechanisms are already implemented by your compiler. Hence,
struct A {};
A func() { return A{}; }
can be assumed to construct the instance of A to which the function return value is bound on the calling side in place. This causes neither move nor copy construction. The same behavior can optimistically be assumed if the returned object has a name, as long as func() has no branching that renders NRVO impossible.
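A small illustration of that named-return case (assuming a single return path, so NRVO is likely even though C++11 does not guarantee it):
struct A { int x = 0; };

A make_named() {
    A named;
    named.x = 1;
    return named;   // NRVO expected: 'named' is typically constructed directly in the caller's object
}

int main() {
    A a = make_named();   // usually neither a copy nor a move happens here
    (void)a;
}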
As an exception to this guideline, function return values that are also function parameters do not qualify for return value optimization. Hence, move/forward them to prevent a copy in case A is move-constructible:
A func(A& a) { return std::move(a); }
The object created by the return value of func(A&) will hence be move-constructed.
Function parameters do not reveal per se how they behave, it depends on the type and its special member functions. Given
void f1(A a1) { A a2{std::move(a1)}; }
void f2(A& a1) { /* Same as above. */ }
void f3(A&& a1) { /* Again, same. */ }
the instances a2 are move-constructed if A has a move ctor; otherwise, they are copy-constructed.
There is a lot to discover beyond the exemplary cases above; I am not capable of going into more detail, nor would it fit into the desired simplicity of an answer. Also, the scenario is different when you don't know the types you are dealing with, e.g. in function or class templates. In this case, a good read on how to deal with the related uncertainty of whether copies or moves are made is Item 29 in Effective Modern C++ ("Assume that move operations are not present, not cheap, and not used").
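When prediction gets murky, you can at least observe. An instrumented probe type (illustrative, not from the question) whose special member functions report when they run makes the actual copies and moves in a piece of code visible:
#include <cstdio>
#include <utility>

// Probe prints from each special member function, so running code through it
// shows exactly which copies and moves (and elisions) the compiler performs.
struct Probe {
    Probe()                            { std::puts("default-construct"); }
    Probe(const Probe&)                { std::puts("copy-construct"); }
    Probe(Probe&&) noexcept            { std::puts("move-construct"); }
    Probe& operator=(const Probe&)     { std::puts("copy-assign"); return *this; }
    Probe& operator=(Probe&&) noexcept { std::puts("move-assign"); return *this; }
    ~Probe()                           { std::puts("destroy"); }
};

Probe make() { return Probe{}; }   // prvalue return: elision expected

int main() {
    Probe p1 = make();             // typically prints only "default-construct"
    Probe p2 = p1;                 // "copy-construct"
    Probe p3 = std::move(p1);      // "move-construct"
    (void)p2; (void)p3;
}
Running such a probe through the code in question quickly settles what your compiler actually does, elision included.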

Make-array in SBCL

How does make-array work in SBCL? Are there equivalents of the new and delete operators in C++, or is it something else, perhaps at the assembler level?
I peeked into the source, but didn't understand anything.
When using SBCL compiled from source and an environment like Emacs/Slime, it is possible to navigate the code quite easily using M-. (meta-point). Basically, the make-array symbol is bound to multiple things: deftransform definitions and a defun. The deftransforms are used mostly for optimization, so it is better to just follow the function first.
The make-array function delegates to an internal make-array% one, which is quite complex: it checks the parameters and dispatches to different specialized implementations of arrays based on those parameters: a bit-vector is implemented differently than a string, for example.
If you follow the case for simple-array, you find a function which calls allocate-vector-with-widetag, which in turn calls allocate-vector.
Now, allocate-vector is bound to several things: multiple defoptimizer forms, a function, and a define-vop form.
The function is only:
(defun allocate-vector (type length words)
  (allocate-vector type length words))
Even if it looks like a recursive call, it isn't.
The define-vop form is a way to define how to compile a call to allocate-vector. In the function, and anywhere where there is a call to allocate-vector, the compiler knows how to write the assembly that implements the built-in operation. But the function itself is defined so that there is an entry point with the same name, and a function object that wraps over that code.
define-vop relies on a Domain Specific Language in SBCL that abstracts over assembly. If you follow the definition, you can find different vops (virtual operations) for allocate-vector, like allocate-vector-on-heap and allocate-vector-on-stack.
Allocation on the heap translates into a call to calc-size-in-bytes, a call to allocation and put-header, which most likely allocate the memory and tag it (I followed the definition to src/compiler/x86-64/alloc.lisp).
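As a very rough, generic illustration of that "allocate, then tag" idea (this is not SBCL's code, and the tag values below are made up): many Lisp runtimes keep type information in the low bits of an aligned pointer.
#include <cstdint>
#include <cstdlib>

constexpr std::uintptr_t kLowtagMask   = 0x7;  // low 3 bits are free on 8-byte-aligned pointers
constexpr std::uintptr_t kVectorLowtag = 0x5;  // made-up tag meaning "this object is a vector"

std::uintptr_t allocate_tagged_vector(std::size_t bytes) {
    void* raw = std::malloc(bytes);                                 // malloc returns suitably aligned memory
    return reinterpret_cast<std::uintptr_t>(raw) | kVectorLowtag;   // stash the type tag in the low bits
}

void* untag(std::uintptr_t obj) {
    return reinterpret_cast<void*>(obj & ~kLowtagMask);             // strip the tag to reach the raw storage
}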
How memory is allocated (and garbage collected) is another problem.
allocation emits assembly code using %alloc-tramp, which in turn executes the following:
(invoke-asm-routine 'call (if to-r11 'alloc-tramp-r11 'alloc-tramp) node)
There are apparently predefined assembly routines called alloc-tramp-r11 and alloc-tramp. A comment says:
;;; Most allocation is done by inline code with sometimes help
;;; from the C alloc() function by way of the alloc-tramp
;;; assembly routine.
There is a base of C code for the runtime; see for example src/runtime/alloc.c.
The -tramp suffix stands for trampoline.
Also have a look at src/runtime/x86-assem.S.

How to keep only one return statement in a function?

Despite the discussion here, Should a function have only one return statement?, are there some simple tips or methods for keeping only one return statement? Or how do you refactor multiple return statements into a single one?
First of all, it is not clear why you want to do this, as the thread already mentions that this feature exists in modern languages for good reasons and to give the user more functionality.
Wikipedia says a lot about it:
Some people make sure each function has a single entry, single exit (SESE). These people argue that the use of a return statement violates structured programming: it is an unstructured exit from the function, resulting in multiple exit points, rather than the single exit point required by structured programming. It has thus been argued[5] that one should eschew the use of the explicit return statement except at the textual end of a subroutine, considering that, when it is used to "return early", it may suffer from the same sort of problems that arise for the GOTO statement. Conversely, it can be argued that using the return statement is worthwhile when the alternative is more convoluted code, such as deeper nesting, harming readability.
Other people say that one or more "guard clauses" -- conditional "early exit" return statements near the beginning of a function -- often make a function easier to read than the alternative.[6][7][8][9]
The most common problem in early exit is that cleanup or final statements are not executed – for example, allocated memory is not unallocated, or open files are not closed, causing leaks. These must be done at each return site, which is brittle and can easily result in bugs. For instance, in later development, a return statement could be overlooked by a developer, and an action which should be performed at the end of a subroutine (e.g., a trace statement) might not be performed in all cases. Languages without a return statement, such as standard Pascal, don't have this problem. Some languages, such as C++ and Python, employ concepts which allow actions to be performed automatically upon return (or exception throw), which mitigates some of these issues – these are often known as "try/finally" or similar. Ironically, functionality like these "finally" clauses can be implemented by a goto to the single return point of the subroutine. An alternative solution is to use the normal stack unwinding (variable deallocation) at function exit to unallocate resources, such as via destructors on local variables, or similar mechanisms such as Python's "with" statement.
But even then, if you want to achieve single-entry, single-exit functionality, the best way to do it is to get the invalid cases out of the way first, either simply exiting or raising exceptions as appropriate, put a blank line in there, then add the "real" body of the method - see the sketch below.
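A sketch of the two styles side by side (the function and its cases are invented for illustration):
#include <string>

std::string classify_with_guards(int age) {
    if (age < 0)  return "invalid";   // invalid cases get out of the way immediately
    if (age < 18) return "minor";
    return "adult";
}

std::string classify_single_exit(int age) {
    std::string result;               // single exit: accumulate the answer, return once
    if (age < 0) {
        result = "invalid";
    } else if (age < 18) {
        result = "minor";
    } else {
        result = "adult";
    }
    return result;                    // the only return statement in the function
}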
Also check Where did the notion of “one return only” come from?

Return from a subroutine

I want to write a subroutine for working out what to do and then returning.
Before you jump on the "A subroutine that returns is a function LOL!" bandwagon, I want the return to be executed as if it were in the function body calling the subroutine, as though I've got a preprocessor to do the substitution, because otherwise this codebase is going to get unwieldy really fast, and returning the return value of a function seems kludgy.
Will VB (sorry I can't be more specific about what version - I'm writing formulas for an embedded system, and our API docs are "it runs VB") let me do that, or fall in a heap?
I want the return to be executed as if it were in the function body calling the subroutine, as though I've got a preprocessor to do the substitution, because otherwise this codebase is going to get unwieldy really fast, and returning the return value of a function seems kludgy.
It's not. Tail calls are a common practice and work just fine.
They are much preferable to having a function that cannot ever be called unless you want to return its value.
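A tiny sketch of the pattern (in C++ here, with invented names, but it reads the same in VB): there is nothing kludgy about propagating a helper's return value.
int decide_what_to_do(int state) {
    return (state > 0) ? 1 : -1;      // the "subroutine" that works out the answer
}

int formula(int state) {
    return decide_what_to_do(state);  // the caller simply propagates the helper's result
}

int main() {
    return formula(3) == 1 ? 0 : 1;   // trivially exercise it
}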
It sounds like you are asking whether C/C++-style macros can be implemented in VB; the answer is no. You could possibly fake it, though, by generating VBScript and substituting the right things in.
Lambdas and delegates in VB.Net are not really the same thing as what you are asking for - if my interpretation is correct.

Should I make sure arguments aren't null before using them in a function?

The title may not really explain what I'm trying to get at; I couldn't really think of a way to describe what I mean.
I was wondering if it is good practice to check the arguments that a function accepts for nulls or empty values before using them. I have this function which just wraps some hash creation, like so.
Public Shared Function GenerateHash(ByVal FilePath As IO.FileInfo) As String
    If (FilePath Is Nothing) Then
        Throw New ArgumentNullException("FilePath")
    End If
    Dim _sha As New Security.Cryptography.MD5CryptoServiceProvider
    Dim _Hash = Convert.ToBase64String(_sha.ComputeHash(New IO.FileStream(FilePath.FullName, IO.FileMode.Open, IO.FileAccess.Read)))
    Return _Hash
End Function
As you can see, it just takes an IO.FileInfo as an argument, and at the start of the function I am checking to make sure that it is not Nothing.
I'm wondering, is this good practice, or should I just let it get to the actual hasher and have that throw the exception because the argument is null?
Thanks.
In general, I'd suggest it's good practice to validate all of the arguments to public functions/methods before using them, and fail early rather than after executing half of the function. In this case, you're right to throw the exception.
Depending on what your method is doing, failing early could be important. If your method was altering instance data on your class, you don't want it to alter half of the data, then encounter the null and throw an exception, as your object's data might then be in an intermediate and possibly invalid state.
If you're using an OO language then I'd suggest it's essential to validate the arguments to public methods, but less important with private and protected methods. My rationale here is that you don't know what the inputs to a public method will be - any other code could create an instance of your class and call its public methods, and pass in unexpected/invalid data. Private methods, however, are called from inside the class, and the class should already have validated any data passing around internally.
One of my favourite techniques in C++ was to DEBUG_ASSERT on NULL pointers. This was drilled into me by senior programmers (along with const correctness) and is one of the things I was most strict on during code reviews. We never dereferenced a pointer without first asserting it wasn't null.
A debug assert is only active for debug targets (it gets stripped in release) so you don't have the extra overhead in production to test for thousands of if's. Generally it would either throw an exception or trigger a hardware breakpoint. We even had systems that would throw up a debug console with the file/line info and an option to ignore the assert (once or indefinitely for the session). That was such a great debug and QA tool (we'd get screenshots with the assert on the testers screen and information on whether the program continued if ignored).
I suggest asserting all invariants in your code including unexpected nulls. If performance of the if's becomes a concern find a way to conditionally compile and keep them active in debug targets. Like source control, this is a technique that has saved my ass more often than it has caused me grief (the most important litmus test of any development technique).
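A minimal sketch of that debug-only assert idea (the real codebase's DEBUG_ASSERT, its debug console and "ignore" option are not shown; this just maps onto the standard assert, which is compiled out when NDEBUG is defined):
#include <cassert>
#include <cstdio>

#ifdef NDEBUG
#define DEBUG_ASSERT(cond) ((void)0)    // stripped from release builds: zero runtime cost
#else
#define DEBUG_ASSERT(cond) assert(cond) // debug builds stop right at the offending call
#endif

int string_length(const char* s) {
    DEBUG_ASSERT(s != nullptr);         // never dereference a pointer without asserting it first
    int n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

int main() {
    std::printf("%d\n", string_length("hello"));  // prints 5
}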
Yes, it's good practice to validate all arguments at the beginning of a method and throw appropriate exceptions like ArgumentException, ArgumentNullException, or ArgumentOutOfRangeException.
If the method is private such that only you the programmer could pass invalid arguments, then you may choose to assert each argument is valid (Debug.Assert) instead of throwing.
If NULL is an unacceptable input, throw an exception yourself, like you did in your sample, so that the message is helpful.
Another method of handling NULL inputs is just to respond with a NULL in turn. It depends on the type of function -- in the example above I would keep the exception.
If it's for an externally facing API then I would say you want to check every parameter, as the input cannot be trusted.
However, if it is only going to be used internally then the input should be able to be trusted and you can save yourself a bunch of code that's not adding value to the software.
You should check all arguments against the set of assumptions that you make in that function about their values.
As in your example, if a null argument to your function doesn't make any sense and you're assuming that anyone using your function will know this, then being passed a null argument indicates some sort of error and some sort of action should be taken (e.g. throwing an exception). And if you use asserts (as James Fassett got in and said before me ;-) ) they cost you nothing in a release version. (They cost you almost nothing in a debug version either.)
The same thing applies to any other assumption.
And it's going to be easier to trace the error if you generate it than if you leave it to some standard library routine to throw the exception. You will be able to provide much more useful contextual information.
It's outside the bounds of this question, but you do need to expose the assumptions that your function makes - for example, through the comment header to your function.
According to The Pragmatic Programmer by Andrew Hunt and David Thomas, it is the responsibility of the caller to make sure it gives valid input. So, you must now choose whether you consider a null input to be valid. Unless it makes specific sense to consider null to be a valid input (e.g. it is probably a good idea to consider null to be a legal input if you're testing for equality), I would consider it invalid. That way your program, when it hits incorrect input, will fail sooner. If your program is going to encounter an error condition, you want it to happen as soon as possible. In the event your function does inadvertently get passed a null, you should consider it to be a bug, and react accordingly (i.e. instead of throwing an exception, you should consider making use of an assertion that kills the program, until you are releasing the program).
Classic design by contract: If input is right, output will be right. If input is wrong, there is a bug. (if input is right but output is wrong, there is a bug. That's a gimme.)
I'll add a couple of elaborations (in bold) to the excellent design-by-contract advice offered by Brian earlier...
The principles of "design by contract" require that you define what is acceptable for the caller to pass in (the valid domain of input values) and then, for any valid input, what the method/provider will do.
For an internal method, you can define NULLs as outside the domain of valid input parameters. In this case, you would immediately assert that the input parameter value is NOT NULL. The key insight in this contract specification is that any call passing in a NULL value IS A CALLER'S BUG and the error thrown by the assert statement is the proper behavior.
Now, while very well defined and parsimonious, if you're exposing the method to external/public callers, you should ask yourself, is that the contract I/we really want?
Probably not. In a public interface, you'd probably accept the NULL (as technically in the domain of inputs that the method accepts), but then decline to process it gracefully with a return message. (More work to meet the naturally more complex customer-facing requirement.)
In either case, what you're after is a protocol that handles all of the cases from both the perspective of the caller and the provider, not lots of scattershot tests that can make it difficult to assess the completeness or lack of completeness of the contractual condition coverage.
Most of the time, letting it just throw the exception is pretty reasonable as long as you are sure the exception won't be ignored.
If you can add something to it, however, it doesn't hurt to wrap the exception with one that is more accurate and rethrow it. Decoding "NullPointerException" is going to take a bit longer than "IllegalArgumentException("FilePath MUST be supplied")" (Or whatever).
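A hedged C++ sketch of that "wrap and rethrow with a more descriptive exception" idea (the surrounding discussion is about Java/.NET exception types; the names here are illustrative):
#include <cstdio>
#include <stdexcept>
#include <string>

double parse_ratio(const std::string& text) {
    try {
        return std::stod(text);   // may throw std::invalid_argument with a terse message
    } catch (const std::exception& e) {
        // Rethrow with the argument in the message so the failure is obvious from the log alone.
        throw std::invalid_argument("parse_ratio: '" + text + "' is not a number (" + e.what() + ")");
    }
}

int main() {
    try {
        parse_ratio("not-a-number");
    } catch (const std::exception& e) {
        std::printf("%s\n", e.what());   // the wrapped message points straight at the bad argument
    }
}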
Lately I've been working on a platform where you have to run an obfuscator before you test. Every stack trace looks like monkeys typing random crap, so I got in the habit of checking my arguments all the time.
I'd love to see a "nullable" or "nonull" modifier on variables and arguments so the compiler can check for you.
If you're writing a public API, do your caller the favor of helping them find their bugs quickly, and check for valid inputs.
If you're writing an API where the caller might be untrusted (or the caller of the caller), check for valid inputs, because it's good security.
If your APIs are only reachable by trusted callers, like "internal" in C#, then don't feel like you have to write all that extra code. It won't be useful to anyone.

Resources