According to Effective Go, the function math.Sin cannot be used to define a constant because the call to that function must happen at run time.
What is the reasoning behind this limitation? Floating-point consistency? Quirk of the Sin implementation? Something else?
There is support for this sort of thing in other languages. In C, for example, GCC has supported compile-time calculation of the sine function since version 4.3. (See the section "General Optimizer Improvements".)
However, as noted in this blog post by Bruce Dawson, this can cause unexpected issues. (See the section "Compile-time versus run-time sin".)
Is this a relevant concern in Go? Or is this usage restricted for a different reason?
Go doesn't support initializing a constant with the result of a function. Functions are called at runtime, not at compile time. But constants are defined at compile time.
It would be possible to make exceptions for certain functions (like math.Sin for example), but that would make the spec more complicated. The Go developers generally prefer to keep the spec simple and consistent.
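For illustration, here is the restriction and the two usual workarounds as a minimal sketch (the literal below is just sin(1) precomputed by hand):

package main

import (
    "fmt"
    "math"
)

// const sin1 = math.Sin(1) // does not compile: math.Sin(1) is not a constant expression

// Workaround 1: a package-level variable, computed once at program start-up.
var sin1 = math.Sin(1)

// Workaround 2: precompute the value yourself and paste it in as a literal.
const sin1Literal = 0.8414709848078965

func main() {
    fmt.Println(sin1, sin1Literal)
}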
Go simply lacks the concept. There is no way to mark a function as pure (one whose return value depends only on its arguments and that neither mutates shared state nor performs I/O), there is no way for the compiler to infer pureness, and there is no attempt to evaluate any expression containing a function call at compile time. Doing so for anything except a pure function of constant arguments would be a source of weird behavior and bugs, and adding the machinery needed to make it work right would introduce quite a bit of complexity.
Yes, this is a substantial loss, which forces a tradeoff between code with bad runtime behavior and code which is flat-out ugly. Go partisans will choose the ugly code and tell you that you are a bad human being for not finding it beautiful.
The best thing you have available to you is code generation. The integration of go generate into the toolchain and the provision of a complete Go parser in the standard library make it relatively easy to munge code at build time, and one of the things you can do with this ability is create more advanced constant folding if you so choose. You still get all of the debuggability peril of code generation, but it's something.
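For example, a minimal sketch of that approach (the package name mypkg, the file names, and the constant name are all invented here): the package carries a directive like //go:generate go run gen_sin.go, and the generator below computes the value at generation time and writes it out as an ordinary Go constant the compiler can fold like any other.

//go:build ignore

// gen_sin.go: run by "go generate" to write sin_const.go, which contains the
// precomputed value as a plain constant literal.
package main

import (
    "fmt"
    "log"
    "math"
    "os"
)

func main() {
    src := fmt.Sprintf(
        "// Code generated by gen_sin.go; DO NOT EDIT.\n\npackage mypkg\n\nconst Sin1 = %v\n",
        math.Sin(1))
    if err := os.WriteFile("sin_const.go", []byte(src), 0o644); err != nil {
        log.Fatal(err)
    }
}

Running go generate then produces a file the rest of the package can use as if the constant had been written by hand.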
Maybe a newb's question, but if you never ask you'll never know.
Can using Stripe's Sorbet (https://sorbet.org/) on a RoR app potentially improve the app's performance?
(Performance meaning response times, not robustness / runtime error rate.)
I did some reading on dynamically typed languages (particularly JavaScript in this case) and found out that if we keep sending some function (foo for example) the same type of objects, the engine does some optimising work on that function, so that when it is invoked again with the same types, the interpreting work is quicker.
I thought maybe the Ruby interpreter does similar work, which could potentially mean that type checking may increase interpreting speed.
I thought maybe the Ruby interpreter does similar work, which could potentially mean that type checking may increase interpreting speed.
It doesn't yet, but one could potentially build this one day.
The goal of Sorbet was to build a type system for people, as opposed to a type system for computers (the compiler). It can introduce some performance overhead, but since Stripe runs it in production, we keep that overhead in check. Internally, we page ourselves if the overhead exceeds 7% of CPU time.
I did some reading on dynamically typed languages (particularly JavaScript in this case) and found out that if we keep sending some function (foo for example) the same type of objects, the engine does some optimising work on that function, so that when it is invoked again with the same types, the interpreting work is quicker.
Yes, this can be done. What you're describing is a common optimization in Just-In-Time (JIT) compilers. The technique you seem to be referring to uses run-time profiling, and it is in fact a common alternative way to achieve this result in the absence of a type system. It's also worth noting that well-built JITs can do it more frequently than a type system can, because a type system encodes what could happen, while profiling & JITs can optimize for what actually happens in practice.
That said, building a JIT is frequently much more work than building an online compiler; thus, depending on the amount of investment one wants to put into speeding up Ruby, either building a JIT or using types can prove better under different real-world constraints.
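To make the profiling idea concrete, here is a toy sketch in Go (not how a real JIT is implemented, and all names are invented): a call site counts the concrete type it keeps seeing and, once it looks monomorphic, switches to a direct path that skips dynamic dispatch. A real JIT does the analogous thing by emitting specialized machine code and deoptimizing if the assumption later breaks.

package main

import "fmt"

type number interface{ add(other number) number }

type intNum int

func (n intNum) add(other number) number { return n + other.(intNum) }

type sumSite struct {
    intHits int // crude per-call-site type profile
}

func (s *sumSite) add(a, b number) number {
    ai, aok := a.(intNum)
    bi, bok := b.(intNum)
    if aok && bok {
        s.intHits++
        if s.intHits > 2 {
            // Specialized fast path: concrete types, no interface dispatch.
            return ai + bi
        }
    }
    // Generic slow path: dynamic dispatch through the interface.
    return a.add(b)
}

func main() {
    site := &sumSite{}
    var total number = intNum(0)
    for i := 1; i <= 5; i++ {
        total = site.add(total, intNum(i))
    }
    fmt.Println(total) // 15
}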
I thought maybe the Ruby interpreter does similar work, which could potentially mean that type checking may increase interpreting speed.
Summarizing the previous paragraphs: Sorbet's type system does not currently speed up Ruby, but it doesn't slow it down much either.
Type systems could indeed be used to speed up languages, but they aren't your only tool; profiling & JIT compilation are the major competitor.
The optimizations you are talking about apply more to the JIT that is being worked on for the Ruby runtime.
In general, Sorbet aims at type safety by introducing type interfaces or method signatures. They enable static type checks that are applied before deploying the application in order to get rid of "type errors".
Sorbet comes with a runtime component that can enforce type checks at runtime in your running application, but those are going to decrease the application's performance, as they wrap method calls in order to check for correct types: https://sorbet.org/docs/runtime#runtime-checked-sig-s
Go has a nice generic implementation of introsort based on sort.Interface. But one of the advantages of C++'s std::sort is that you can specify a comparison functor which gets inlined, so unnecessary function calls are eliminated.
Can we somehow force the current native Go compiler to inline those sort.Swap and sort.Less calls? I do not consider gccgo, because it gives us terrible results compared to the native compiler.
For Go 1.x the answer is a sound no.
The Go compiler may inline some classes of functions on some platforms in some specific ways. You might even be able to figure out some trickery through which you can make the current compiler inline the sort-"functor" on your architecture but there is absolutely no guarantee whatsoever that this will hold in future releases and it may work against you (even break your code, depending on the level of hackery involved) when the compiler changes.
A major release of Go (2.x?) might provide some guarantees about optimizations; but for Go 1.x there's nothing like that. And even if the Go authors wanted to do this it's difficult to provide such guarantees for all platforms (remember Go runs on many architectures).
Even more generally it is a bad idea to rely on compiler-specific behavior, in any language.
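If the interface-call overhead genuinely matters, the portable alternative in Go 1.x is not compiler trickery but writing a sort specialized to the concrete element type, so that every comparison and swap is a direct operation the compiler is free to inline. A rough sketch of the idea (a real replacement for sort.Sort would use a full introsort rather than this toy insertion sort):

package main

import "fmt"

// insertionSortInts sorts a []int in place using direct "<" comparisons and
// direct element swaps instead of calls through sort.Interface.
func insertionSortInts(a []int) {
    for i := 1; i < len(a); i++ {
        for j := i; j > 0 && a[j] < a[j-1]; j-- {
            a[j], a[j-1] = a[j-1], a[j]
        }
    }
}

func main() {
    xs := []int{3, 1, 4, 1, 5, 9, 2, 6}
    insertionSortInts(xs)
    fmt.Println(xs) // [1 1 2 3 4 5 6 9]
}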
I have found a piece of code that has UB and was told to leave it in the code, with a comment stating that it is UB. We are using MSVC 2012 only.
The code itself has a raw array of Foo objects, then casts that array to char* with reinterpret_cast<char*> and then calls delete casted_array (like this, not delete[]) on it.
Like this:
Foo* foos = new Foo[500];
char* CastedFoos = reinterpret_cast<char*>(foos);
delete CastedFoos;
Per the Standard 5.3.5/3 this is clearly Undefined Behavior.
Apparently this code does what it does to avoid having to call destructors as an optimisation.
I wondered: are there actually places where leaving UB in the code could be considered valid?
Also, as far as I'm concerned, leaving the above in the code is not smart; am I right?
It depends entirely on your perspective.
Take an extreme example: in C++03, threads were undefined behavior. As soon as you had more than one thread, your program's behavior was no longer defined by the C++ standard.
And yet, most people would say threads are useful.
Of course, multithreading may have been UB according to the C++ standard, but individual compilers didn't treat it as undefined. They provided an additional guarantee that multithreading is going to work as you'd expect.
When talking about C++ in the abstract, UB has no uses whatsoever. How could it? You don't know what could or would happen.
But in specific applications, specific code compiled by specific compilers to run on specific operating systems, you may sometimes know that a piece of UB is (1) safe, and (2) ends up having some kind of beneficial effect.
The C++ standard defines "undefined behaviour" as follows:
behavior for which this standard imposes no requirements
So if you want your code to be portable to different compilers and platforms, then your code should not depend on undefined behavior, because what the programs (that are produced by different compilers compiling your code) do in these cases may vary.
If you don't care about portability, then you should check if your compiler documents how it behaves under the circumstances of interest. If it doesn't document what it does (and it doesn't have to), beware that the compiler could change what it does without warning between different versions. Also note that its behaviour may be non-deterministic. So for example it could crash 1% of the time, which you may not notice in ad-hoc testing, but will come back and bite you later when it goes into production. So even if you are using one compiler, it may still be a bad idea to depend on undefined behavior.
With regard to your specific example, you can rewrite it to achieve the same effect (not calling the destructors, but reclaiming the memory) in a way that does not result in undefined behaviour. Allocate a block of std::aligned_storage to hold the Foo array, use placement new to construct the Foo objects in that storage, and then, when you want to get rid of the array, deallocate the storage without running the destructors.
Of course this is still a terrible design, may cause memory leaks or other problems depending on what Foo::~Foo() was supposed to do, but at least it isn't UB.
I was at the StackOverflow Dev Days convention yesterday, and one of the speakers was talking about Python. He showed a Memoize function, and I asked if there was any way to keep it from being used on a non-pure function. He said no, that's basically impossible, and if someone could figure out a way to do it it would make a great PhD thesis.
That sort of confused me, because it doesn't seem all that difficult for a compiler/interpreter to solve recursively. In pseudocode:
function isPure(functionMetadata): boolean;
begin
    result := true;
    for each variable in functionMetadata.variablesModified do
        result := result and variable.isLocalToThisFunction;
    for each dependency in functionMetadata.functionsCalled do
        result := result and isPure(dependency);
end;
That's the basic idea. Obviously you'd need some sort of check to prevent infinite recursion on mutually-dependent functions, but that's not too difficult to set up.
Higher-order functions that take function pointers might be problematic, since they can't be verified statically, but my original question presupposes that the compiler has some sort of language constraint to designate that only a pure function pointer can be passed to a certain parameter. If one existed, that could be used to satisfy the condition.
Obviously this would be easier in a compiled language than an interpreted one, since all this number-crunching would be done before the program is executed and so not slow anything down, but I don't really see any fundamental problems that would make it impossible to evaluate.
Does anyone with a bit more knowledge in this area know what I'm missing?
You also need to annotate every system call, every FFI, ...
And furthermore the tiniest 'leak' tends to leak into the whole code base.
It is not a theoretically intractable problem, but in practice it is very, very difficult to do in a way that doesn't leave the whole system feeling brittle.
As an aside, I don't think this makes a good PhD thesis; Haskell effectively already has (a version of) this, with the IO monad.
And I am sure lots of people continue to look at this 'in practice'. (wild speculation) In 20 years we may have this.
It is particularly hard in Python. Since anObject.aFunc can be changed arbitrarily at runtime, you cannot determine at compile time which function anObject.aFunc() will call, or even whether it will be a function at all.
In addition to the other excellent answers here: Your pseudocode looks only at whether a function modifies variables. But that's not really what "pure" means. "Pure" typically means something closer to "referentially transparent." In other words, the output is completely dependent on the input. So something as simple as reading the current time and making that a factor in the result (or reading from input, or reading the state of the machine, or...) makes the function non-pure without modifying any variables.
Also, you could write a "pure" function that did modify variables.
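A small Go illustration of that distinction (both functions are made up for the example):

package main

import (
    "fmt"
    "time"
)

// Not pure, even though it modifies no outside variables: the result depends
// on the clock, so the same argument can yield different results.
func greeting(name string) string {
    if time.Now().Hour() < 12 {
        return "Good morning, " + name
    }
    return "Good afternoon, " + name
}

// Pure despite mutating a variable: sum is local to the function, and the
// result depends only on the argument.
func total(xs []int) int {
    sum := 0
    for _, x := range xs {
        sum += x
    }
    return sum
}

func main() {
    fmt.Println(greeting("Ada"))
    fmt.Println(total([]int{1, 2, 3})) // 6
}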
Here's the first thing that popped into my mind when I read your question.
Class Hierarchies
Determining if a variable is modified includes the act of digging through every single method which is called on the variable to determine if it's mutating. This is ... somewhat straightforward for a sealed type with non-virtual methods.
But consider virtual methods. You must find every single derived type and verify that every single override of that method does not mutate state. Determining this is simply not possible in any language / framework which allows for dynamic code generation or is simply dynamic (and if it's possible, it's extremely difficult). The reason is that the set of derived types is not fixed, because a new one can be generated at runtime.
Take C# as an example. There is nothing stopping me from generating a derived class at runtime which overrides that virtual method and modifies state. A static verifier would not be able to detect this type of modification, and hence could not determine whether the method was pure or not.
I think the main problem would be doing it efficiently.
The D language has pure functions, but you have to mark them yourself so that the compiler knows to check them. I think that if you mark them manually, the check would be easier to do.
Deciding whether a given function is pure is, in the general case, as hard as deciding whether a given program will halt, and it is well known that the Halting Problem cannot be solved at all, let alone efficiently.
Note that the complexity depends on the language, too. For the more dynamic languages, it's possible to redefine anything at any time. For example, in Tcl
proc myproc {a b} {
if { $a > $b } {
return $a
} else {
return $b
}
}
Every single piece of that could be modified at any time. For example:
the "if" command could be rewritten to use and update global variables
the "return" command, along the same lines, could do the same thing
there could be an execution trace on the "if" command such that, when "if" is used, the "return" command is redefined based on the inputs to the "if" command
Admittedly, Tcl is an extreme case; one of the most dynamic languages there is. That being said, it highlights the problem that it can be difficult to determine the purity of a function even once you've entered it.
It seems to me that the most valuable thing about a static/strongly-typed programming language is that it helps refactoring: if/when you change any API, then the compiler will tell you what that change has broken.
I can imagine writing code in a runtime/weakly-typed language ... but I can't imagine refactoring without the compiler's help, and I can't imagine writing tens of thousands of lines of code without refactoring.
Is this true?
I think you're conflating when types are checked with how they're checked. Runtime typing isn't necessarily weak.
The main advantage of static types is exactly what you say: they're exhaustive. You can be confident all call sites conform to the type just by letting the compiler do its thing.
The main limitation of static types is that they're limited in the constraints they can express. This varies by language, with most languages having relatively simple type systems (C, Java), and others having extremely powerful type systems (Haskell, Cayenne).
Because of this limitation, types on their own are not sufficient. For example, in Java types are more or less restricted to checking that type names match. This means the meaning of any constraint you want checked has to be encoded into a naming scheme of some sort, hence the plethora of indirections and boilerplate common to Java code. C++ is a little better in that templates allow a bit more expressiveness, but they don't come close to what you can do with dependent types. I'm not sure what the downsides to the more powerful type systems are, though clearly there must be some, or more people would be using them in industry.
Even if you're using static typing, chances are it's not expressive enough to check everything you care about, so you'll need to write tests too. Whether static typing saves you more effort than it requires in boilerplate is a debate that's raged for ages and that I don't think has a simple answer for all situations.
As to your second question:
How can we re-factor safely in a runtime typed language?
The answer is tests. Your tests have to cover all the cases that matter. Tools can help you gauge how exhaustive your tests are. Coverage-checking tools let you know whether lines of code are covered by the tests or not. Test mutation tools (Jester, Heckle) can let you know if your tests are logically incomplete. Acceptance tests let you know that what you've written matches the requirements, and lastly regression and performance tests ensure that each new version of the product maintains the quality of the last.
One of the great things about having proper testing in place, versus relying on elaborate type indirections, is that debugging becomes much simpler. When running the tests you get specific failed assertions within tests that clearly express what they're doing, rather than obtuse compiler error messages (think C++ template errors).
No matter what tools you use: writing code you're confident in will require effort. It most likely will require writing a lot of tests. If the penalty for bugs is very high, such as aerospace or medical control software, you may need to use formal mathematical methods to prove the behavior of your software, which makes such development extremely expensive.
I totally agree with your sentiment. The very flexibility that dynamically typed languages are supposed to be good at is actually what makes the code very hard to maintain. Really, is there such a thing as a program that continues to work if the data types are changed in a non-trivial way without actually changing the code?
In the meantime, you could check the type of the variable being passed, and somehow fail if it's not the expected type. You'd still have to run your code to root out those cases, but at least something would tell you.
I think Google's internal tools actually do a compilation and probably type checking on their JavaScript. I wish I had those tools.
To start, I'm a native Perl programmer, so on the one hand I've never programmed with the safety net of static types. OTOH I've never programmed with them, so I can't speak to their benefits.
I don't find the lack of static types to be a problem wrt refactoring. What I find a problem is the lack of a refactoring browser. Dynamic languages have the problem that you don't really know what the code is really going to do until you actually run it. Perl has this more than most. Perl has the additional problem of having a very complicated, almost unparsable, syntax. Result: no refactoring tools (though they're working very rapidly on that). The end result is I have to refactor by hand. And that is what introduces bugs.
I have tests to catch them... usually. I do find myself often in front of a steaming pile of untested and nigh untestable code with the chicken/egg problem of having to refactor the code in order to test it, but having to test it in order to refactor it. Ick. At this point I have to write some very dumb, high level "does the program output the same thing it did before" sort of tests just to make sure I didn't break something.
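Those "same output as before" tests are often called characterization or golden-master tests. Sketched in Go (the program name and golden file path are placeholders), the idea is just:

package goldentest

import (
    "os"
    "os/exec"
    "testing"
)

// A characterization ("golden master") test: run the program, capture its
// output, and compare it byte-for-byte against a previously recorded,
// known-good copy. "./myprog" and "testdata/golden.txt" are placeholder names.
func TestOutputUnchanged(t *testing.T) {
    got, err := exec.Command("./myprog", "--report").Output()
    if err != nil {
        t.Fatalf("running program: %v", err)
    }
    want, err := os.ReadFile("testdata/golden.txt")
    if err != nil {
        t.Fatalf("reading golden file: %v", err)
    }
    if string(got) != string(want) {
        t.Errorf("output differs from the golden file: the refactoring changed observable behavior")
    }
}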
Static types, as envisioned in Java or C++ or C#, really only solve a small class of programming problems. They guarantee your interfaces are passed bits of data with the right label. But just because you get a Collection doesn't mean that Collection contains the data you think it does. Just because you get an integer doesn't mean you got the right integer. Your method takes a User object, but is that User logged in?
Classic example: public static double sqrt(double a) is the signature for the Java square root function. Square root doesn't work on negative numbers. Where does it say that in the signature? It doesn't. Even worse, where does it say what that function even does? The signature only says what types it takes and what it returns. It says nothing about what happens in between and that's where the interesting code lives. Some people have tried to capture the full API by using design by contract, which can broadly be described as embedding run-time tests of your function's inputs, outputs and side effects (or lack thereof)... but that's another show.
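The same point, sketched in Go (the wrapper and its tolerances are invented for illustration): nothing in a float64-to-float64 signature says the argument must be non-negative, so a contract-style wrapper has to check it at run time.

package main

import (
    "fmt"
    "math"
)

// sqrtChecked illustrates a design-by-contract style wrapper: the signature
// alone can't say "a must be non-negative", so the precondition is enforced
// at run time. (math.Sqrt itself just returns NaN for negative input.)
func sqrtChecked(a float64) float64 {
    if a < 0 {
        panic(fmt.Sprintf("precondition violated: sqrt of negative number %v", a))
    }
    r := math.Sqrt(a)
    if math.Abs(r*r-a) > 1e-9*math.Max(1, a) {
        panic("postcondition violated: result squared does not match input")
    }
    return r
}

func main() {
    fmt.Println(sqrtChecked(2)) // 1.4142135623730951
    // sqrtChecked(-1)          // would panic with the precondition message
}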
An API is far more than just function signatures (if it weren't, you wouldn't need all that descriptive prose in the Javadocs), and refactoring is far more than just changing the API.
The biggest refactoring advantage a statically typed, statically compiled, non-dynamic language gives you is the ability to write refactoring tools to do quite complex refactorings for you because it knows where all the calls to your methods are. I'm pretty envious of IntelliJ IDEA.
I would say refactoring goes beyond what the compiler can check, even in statically-typed languages. Refactoring is just changing a program's internal structure without affecting the external behavior. Even in dynamic languages, there are still things that you can expect to happen and test for; you just lose a little bit of assistance from the compiler.
One of the benefits of using var in C# 3.0 is that you can often change the type without breaking any code. The type still needs to look the same: properties with the same names must exist, and methods with the same or similar signatures must still exist. But you can really change to a very different type, even without using something like ReSharper.
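For comparison, Go's short variable declarations give much the same effect (the types and functions below are invented): the caller never names the type, so as long as the new type still provides the members the call site uses, changing Load's return type breaks nothing.

package main

import "fmt"

// Two concrete types that both expose a Name field and a Describe method.
type User struct{ Name string }

func (u User) Describe() string { return "user " + u.Name }

type Account struct {
    Name string
    ID   int
}

func (a Account) Describe() string { return fmt.Sprintf("account %s (#%d)", a.Name, a.ID) }

// Load currently returns a User. If it is changed to return an Account
// instead, main below still compiles unchanged, because the call site only
// relies on members that both types provide.
func Load() User { return User{Name: "ada"} }

func main() {
    x := Load() // x's type is inferred; the caller never spells it out
    fmt.Println(x.Name, x.Describe())
}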