Simple function is not inlined - go

I am adding metrics calls to my Go program using Prometheus. I decided to move all the Prometheus calls into simple wrapper functions in a separate source file for maintainability (in case I want to move to a different metrics package). More importantly, it also makes the code faster to write, because the IDE prompts with the label names as parameters of the function call. For example, something like this:
// One label name is declared per value later passed to WithLabelValues.
var requestCounter = promauto.NewCounterVec(prometheus.CounterOpts{}, []string{"label1", "label2"})
func incrementRequestCounter(label1, label2 string) {
	requestCounter.WithLabelValues(label1, label2).Inc()
}
Some of these functions are called often in a low-level loop, so I don't want these calls to slow down the code too much. My assumption was that such a simple line of code would be easy to inline. However, checking with the build option --gcflags -m, I found that the above single-line function is not inlined (go1.12.5 windows/amd64). Does anyone know why, and how to get around this? Note that this function is inlined:
func incrementRequestCounter(label1, label2 string) {
	requestCounter.WithLabelValues(label1, label2)
}
With further experimentation it seems that a function will not be inlined if it has more than one call to non-inlineable functions. (You can have lots of calls to inlineable functions and a function will still be inlineable.)

Just posting an answer (since nobody else has) with these points:
Benchmark before trying to optimise.
A seemingly simple function may be difficult to inline
Inlining is evolving and the above may be inlined in the future
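To illustrate the first point, here is a minimal sketch of how the wrapper's cost could be measured before optimising anything. It uses only the standard library: counterVec and withLabelValues are hypothetical stand-ins for the Prometheus counter (not its real API), and the label values are made up. testing.Benchmark lets the measurement run from an ordinary program; in a real project a BenchmarkXxx function in a _test.go file, run with go test -bench, would be the more usual choice.

```go
package main

import (
	"fmt"
	"testing"
)

// counterVec is a hypothetical stand-in for a labelled counter.
// It is NOT the Prometheus API; it only gives the wrapper some work to do.
type counterVec struct {
	counts map[[2]string]*uint64
}

// withLabelValues returns the counter cell for the given pair of labels,
// creating it on first use.
func (c *counterVec) withLabelValues(l1, l2 string) *uint64 {
	key := [2]string{l1, l2}
	p, ok := c.counts[key]
	if !ok {
		p = new(uint64)
		c.counts[key] = p
	}
	return p
}

var requestCounter = &counterVec{counts: map[[2]string]*uint64{}}

// incrementRequestCounter is the thin wrapper whose cost we want to measure.
func incrementRequestCounter(label1, label2 string) {
	p := requestCounter.withLabelValues(label1, label2)
	*p++
}

// benchmarkWrapper times repeated calls to the wrapper and returns the result.
func benchmarkWrapper() testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			incrementRequestCounter("GET", "/index")
		}
	})
}

func main() {
	r := benchmarkWrapper()
	fmt.Printf("wrapper: %d iterations, %d ns/op\n", r.N, r.NsPerOp())
}
```

If the measured cost per call is negligible next to the rest of the loop body, whether the wrapper is inlined probably does not matter.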

Related

Is there a way to make the compiler copy-paste entire methods for performance?

How do you do "inline functions" in C#? I don't think I understand the concept. Are they like anonymous methods? Like lambda functions?
Note: The answers almost entirely deal with the ability to inline functions, i.e. "a manual or compiler optimization that replaces a function call site with the body of the callee." If you are interested in anonymous (a.k.a. lambda) functions, see @jalf's answer or What is this 'Lambda' everyone keeps speaking of?.
Finally in .NET 4.5, the CLR allows one to hint/suggest [1] method inlining using the MethodImplOptions.AggressiveInlining value. It is also available in Mono's trunk (committed today).
// The full attribute usage is in mscorlib.dll,
// so should not need to include extra references
using System.Runtime.CompilerServices;
...
[MethodImpl(MethodImplOptions.AggressiveInlining)]
void MyMethod(...)
[1] Previously "force" was used here. I'll try to clarify the term. As stated in the comments and the documentation, the method should be inlined if possible. Especially considering Mono (which is open), there are some Mono-specific technical limitations concerning inlining, as well as more general ones (such as virtual functions). Overall, yes, this is a hint to the compiler, but I guess that is what was asked for.
Inline methods are simply a compiler optimization where the code of a function is rolled into the caller.
There's no mechanism by which to do this in C#, and they're to be used sparingly in languages where they are supported -- if you don't know why they should be used somewhere, they shouldn't be.
Edit: To clarify, there are two major reasons they need to be used sparingly:
It's easy to make massive binaries by using inline in cases where it's not necessary
The compiler tends to know better than you do when something should, from a performance standpoint, be inlined
It's best to leave things alone and let the compiler do its work, then profile and figure out if inline is the best solution for you. Of course, some things just make sense to be inlined (mathematical operators particularly), but letting the compiler handle it is typically the best practice.
Update: Per konrad.kruczynski's answer, the following is true for versions of .NET up to and including 4.0.
You can use the MethodImplAttribute class to prevent a method from being inlined...
[MethodImpl(MethodImplOptions.NoInlining)]
void SomeMethod()
{
// ...
}
...but there is no way to do the opposite and force it to be inlined.
You're mixing up two separate concepts. Function inlining is a compiler optimization which has no impact on the semantics. A function behaves the same whether it's inlined or not.
On the other hand, lambda functions are purely a semantic concept. There is no requirement on how they should be implemented or executed, as long as they follow the behavior set out in the language spec. They can be inlined if the JIT compiler feels like it, or not if it doesn't.
There is no inline keyword in C#, because it's an optimization that can usually be left to the compiler, especially in JIT'ed languages. The JIT compiler has access to runtime statistics which enables it to decide what to inline much more efficiently than you can when writing the code. A function will be inlined if the compiler decides to, and there's nothing you can do about it either way. :)
Cody has it right, but I want to provide an example of what an inline function is.
Let's say you have this code:
private void OutputItem(string x)
{
    Console.WriteLine(x);
    //maybe encapsulate additional logic to decide
    // whether to also write the message to Trace or a log file
}
public IList<string> BuildListAndOutput(IEnumerable<string> x)
{   // let's pretend IEnumerable<T>.ToList() doesn't exist for the moment
    IList<string> result = new List<string>();
    foreach(string y in x)
    {
        result.Add(y);
        OutputItem(y);
    }
    return result;
}
The Just-In-Time optimizer could choose to alter the code to avoid repeatedly placing a call to OutputItem() on the stack, so that it would be as if you had written the code like this instead:
public IList<string> BuildListAndOutput(IEnumerable<string> x)
{
    IList<string> result = new List<string>();
    foreach(string y in x)
    {
        result.Add(y);
        // full OutputItem() implementation is placed here
        Console.WriteLine(y);
    }
    return result;
}
In this case, we would say the OutputItem() function was inlined. Note that it might do this even if the OutputItem() is called from other places as well.
Edited to show a scenario more-likely to be inlined.
Do you mean inline functions in the C++ sense? In which the contents of a normal function are automatically copied inline into the callsite? The end effect being that no function call actually happens when calling a function.
Example:
inline int Add(int left, int right) { return left + right; }
If so then no, there is no C# equivalent to this.
Or do you mean functions that are declared within another function? If so then yes, C# supports this via anonymous methods or lambda expressions.
Example:
static void Example() {
Func<int,int,int> add = (x,y) => x + y;
var result = add(4,6); // 10
}
Yes, exactly; the only distinction is the fact that it returns a value.
Simplification (not using expressions):
List<T>.ForEach takes an action; it doesn't expect a return result.
So an Action<T> delegate would suffice.. say:
List<T>.ForEach(param => Console.WriteLine(param));
is the same as saying:
List<T>.ForEach(delegate(T param) { Console.WriteLine(param); });
The difference is that the param type and delegate declaration are inferred by usage, and the braces aren't required on a simple inline method.
Whereas
List<T>.Where takes a function, expecting a result.
So a Func<T, bool> would be expected:
List<T>.Where(param => param.Value == SomeExpectedComparison);
which is the same as:
List<T>.Where(delegate(T param) { return param.Value == SomeExpectedComparison; });
You can also declare these methods inline and assign them to variables, i.e.:
Action myAction = () => Console.WriteLine("I'm doing something Nifty!");
myAction();
or
Func<object, string> myFunction = theObject => theObject.ToString();
string myString = myFunction(someObject);
I hope this helps.
The statement "it's best to leave these things alone and let the compiler do the work..." (Cody Brocious) is complete rubbish. I have been programming high-performance game code for 20 years, and I have yet to come across a compiler that is "smart enough" to know which code (functions) should be inlined or not. It would be useful to have an "inline" statement in C#; the truth is that the compiler just doesn't have all the information it needs to determine which functions should always be inlined without the "inline" hint. Sure, if the function is small (an accessor) then it might be inlined automatically, but what if it is a few lines of code? Nonsense: the compiler has no way of knowing, and you can't just leave optimizing code (beyond algorithms) up to the compiler.
There are occasions where I do wish to force code to be in-lined.
For example, suppose I have a complex routine where a large number of decisions are made within a highly iterative block, and those decisions result in similar but slightly differing actions being carried out. Consider, for example, a complex (non-DB-driven) sort comparer where the sorting algorithm sorts the elements according to a number of different unrelated criteria, such as one might do when sorting words according to grammatical as well as semantic criteria for a fast language recognition system. I would tend to write helper functions to handle those actions in order to maintain the readability and modularity of the source code.
I know that those helper functions should be in-lined because that is the way that the code would be written if it never had to be understood by a human. I would certainly want to ensure in this case that there was no function-call overhead.
I know this question is about C#. However, you can write inline functions in .NET with F#. see: Use of `inline` in F#
No, there is no such construct in C#, but the .NET JIT compiler could decide to inline function calls at JIT time. But I actually don't know whether it really performs such optimizations.
(I think it should :-))
In case your assemblies will be ngen-ed, you might want to take a look at TargetedPatchingOptOut. This will help ngen decide whether to inline methods. MSDN reference
It is still only a declarative hint to optimize though, not an imperative command.
Lambda expressions are inline functions! I think that C# doesn't have an extra attribute like inline or anything like that!

Documentation for architecture-specific Golang function

I have a function that I would like to provide an assembly implementation for
on amd64 architecture. For the sake of discussion let's just suppose it's an
Add function, but it's actually more complicated than this. I have the
assembly version working but my question concerns getting the godoc to display
correctly. I have a feeling this is currently impossible, but I wanted to seek
advice.
Some more details:
The assembly implementation of this function contains only a few
instructions. In particular, the mere cost of calling the function is a
significant part of the entire cost.
It makes use of special instructions (BMI2) therefore can only be used
following a CPUID capability check.
The implementation is structured like this gist. At a high level:
In the generic (non-amd64) case the function is defined by delegating to
addGeneric.
In the amd64 case the function is actually a variable, initially set to
addGeneric but replaced by addAsm in the init function if a cpuid
check passes.
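The amd64 structure described above can be sketched in a single pure-Go file. This is only an illustration under assumptions: addAsm here is an ordinary Go function standing in for the assembly routine (in the real package it would be declared without a body and implemented in a .s file), and cpuHasBMI2 is a hypothetical placeholder for the CPUID check.

```go
package main

import "fmt"

// addGeneric is the portable fallback implementation.
func addGeneric(x, y uint64) uint64 { return x + y }

// addAsm stands in for the BMI2 assembly routine. It is plain Go here so
// that the sketch compiles without a .s file.
func addAsm(x, y uint64) uint64 { return x + y }

// cpuHasBMI2 is a hypothetical placeholder for the real CPUID capability
// check; it always reports false in this sketch.
func cpuHasBMI2() bool { return false }

// Add is a package-level variable of function type, not a function
// declaration, which is why godoc lists it under variables.
var Add = addGeneric

// init swaps in the fast implementation when the CPU supports it.
func init() {
	if cpuHasBMI2() {
		Add = addAsm
	}
}

func main() {
	fmt.Println(Add(2, 3)) // prints 5
}
```

Because Add is called through a variable, the compiler cannot resolve the target at compile time, so the call is an indirect one either way.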
This approach works. However the godoc output is crappy because in the
amd64 case the function is actually a variable. Note godoc appears to be
picking up the same build tags as the machine it's running on. I'm not sure
what godoc.org would do.
Alternatives considered:
The Add function delegates to addImpl. Then we pull some similar trick
to replace addImpl in the amd64 case. The problem with this is (in my
experiments) Go doesn't seem to be able to inline the call, and the assembly
is now wrapped in two function calls. Since the assembly is so small already
this has a noticeable impact on performance.
In the amd64 case we define a plain function Add that has the useAsm
check inside it, and calls one of addGeneric and addAsm depending on the
result. This would have an even worse impact on performance.
So I guess the questions are:
Is there a better way to structure the code to achieve the performance I
want, and have it appear properly in documentation?
If there is no alternative, is there some other way to "trick" godoc?
See math.Sqrt for an example of how to do this.
Write a stub function with the documentation
Write a generic implementation as an unexported function.
For each architecture, write a function in assembler that jumps to the unexported generic implementation or implements the function directly.
To handle the cpuid check, set a package variable in init() and conditionally jump based on that variable in the assembly implementation.

Find common function in decompiled scripts

I have a bunch of decompiled scripts that look like C code.
These scripts were decompiled (I cannot really tell exactly how), so they were not written as-is by humans.
However they show some very interesting patterns that lead me to think that the decompiler duplicated some functions in most of the scripts because it followed the execution path.
For example most of the scripts have a function like:
void *fct_9887(int param_123, char param_864) {
fct_4698(param_75, param_83);
return CORE_FUNCTION(param_864, param_123);
}
But for each script, the function names and parameter names are different. So the exact same function will appear in another script as
void *fct_57(int param_93, char param_75) {
fct_1000(param_75, param_83);
return CORE_FUNCTION(param_75, param_83);
}
As you can guess here, fct_4698 and fct_1000 are also identical, making these two functions duplicated code.
Some of the functions actually keep the same name across all the scripts (here, CORE_FUNCTION), because they are external functions and thus shared between scripts.
I am trying to find a way to "re-factor" this and extract the common functions used in these scripts. I know it will sometimes not be possible, but for some functions it should be.
How would you proceed to do such a thing?
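One possible starting point, sketched here under assumptions rather than offered as a complete answer, is to rename every generated identifier by order of first appearance and then compare the results. The sample strings below are hypothetical cleaned-up variants of the two functions above (each body only uses its own parameters), and the fct_N / param_N naming pattern is assumed from the examples. Shared external names such as CORE_FUNCTION are left untouched.

```go
package main

import (
	"fmt"
	"regexp"
)

// generatedName matches decompiler-generated identifiers such as fct_57 or
// param_93. The naming scheme is an assumption based on the sample scripts.
var generatedName = regexp.MustCompile(`\b(fct|param)_\d+\b`)

// canonicalize renames each generated identifier to prefix_N, where N is
// the order in which that identifier first appears in src.
func canonicalize(src string) string {
	seen := map[string]string{} // original name -> canonical name
	next := map[string]int{}    // prefix -> next free index
	return generatedName.ReplaceAllStringFunc(src, func(name string) string {
		if canon, ok := seen[name]; ok {
			return canon
		}
		prefix := generatedName.FindStringSubmatch(name)[1]
		canon := fmt.Sprintf("%s_%d", prefix, next[prefix])
		next[prefix]++
		seen[name] = canon
		return canon
	})
}

// Hypothetical, cleaned-up versions of the two sample functions.
const scriptA = `void *fct_9887(int param_123, char param_864) {
fct_4698(param_123, param_864);
return CORE_FUNCTION(param_864, param_123);
}`

const scriptB = `void *fct_57(int param_93, char param_75) {
fct_1000(param_93, param_75);
return CORE_FUNCTION(param_75, param_93);
}`

func main() {
	fmt.Println(canonicalize(scriptA) == canonicalize(scriptB)) // prints true
}
```

Note that this only compares one function body at a time: it treats the callees fct_4698 and fct_1000 as equal placeholders without checking that their own bodies match, so a fuller approach would have to canonicalize callees as well.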

Matlab: Are local functions (subfunctions) compiled together with main function or separately?

I have heard that MATLAB has an automatic in-need compilation of functions which could create a lot of function-call overhead if you call a function many times like in the following code:
function output = BigFunction( args )
    for i = 1:10000000
        SmallFunction( args );
    end
end
Is it faster to call the function SmallFunction() if I put it in the same file as BigFunction() as a local function? Or is there any good solution other than pasting the code from SmallFunction() into the BigFunction() to optimize the performance?
Edit: It may be false assumption that the function-call overhead is because of the in-need compilation. The question is how to cut down on the overhead without making the code look awful.
Matlab hashes the functions it reads into memory. The functions are only compiled once if they exist as independent functions in their own files. If you put BigFunction in BigFunction.m and SmallFunction in SmallFunction.m then you should receive the optimization benefit of having the m-script compiled once.
The answer to my first question is that a local function performs the same as a function in another file.
An idea for the second question is to make SmallFunction() an inline function if possible, which has less function-call overhead. I found more about function-call performance in the MathWorks forum, and I paste the question and answer below:
Question:
I have 7 different types of function call:
An inline function. The body of the function is directly written down (inline).
A function is defined in a separate MATLAB file. The arguments are passed by the calling function (file-pass).
A function is defined in a separate MATLAB file. The arguments are provided by referencing global variables; only indices are provided by the calling function (file-global).
A nested function. The arguments are passed by the enclosing function (nest-pass).
A nested function. The arguments are those shared with the enclosing function; only indices are provided by the enclosing function (nest-share).
A sub function. The arguments are passed by the calling function (sub-pass).
A sub function. The arguments are provided by referencing global variables; only indices are provided by the calling function (sub-global).
I would like to know which function call provides better performance than the others in general.
The answer from MathWorks Support Team pasted here:
The ordering of performance of each function call from the fastest to the slowest tends to be as follows:
inline > file-pass = nest-pass = sub-pass > nest-share > sub-global > file-global
(A>B means A is faster than B and A=B means A is as fast as B)
First, inline is the fastest as it does not incur overhead associated with function call.
Second, when the arguments are passed to the callee function, the calling function sets up the arguments in such a way that the callee function knows where to retrieve them. This setup associated with function call in general incurs performance overhead, and therefore file-pass, nest-pass, and sub-pass are slower than inline.
Third, if the workspace is shared with nested functions and the arguments to a nested function are those shared within the workspace, rather than pass-by-value, then performance of that function call is inhibited. If MATLAB sees a shared variable within the shared workspace, it searches the workspace for the variable. On the other hand, if the arguments are passed by the calling function, then MATLAB does not have to search for them. The time taken for this search explains that type nest-share is slower than file-pass, nest-pass, and sub-pass.
Finally, when a function call involves global variables, performance is even more inhibited. This is because to look for global variables, MATLAB has to expand its search space to the outside of the current workspace. Furthermore, the reason a function call involving global variables appears a lot slower than the others is that MATLAB Accelerator does not optimize such a function call. When MATLAB Accelerator is turned off with the following command,
feature accel off
the difference in performance between inline and file-global becomes less significant.
Please note that the behaviors depend largely on various factors such as operating systems, CPU architectures, MATLAB Interpreter, and what the MATLAB code is doing.

Return from a subroutine

I want to write a subroutine for working out what to do and then returning.
Before you jump on the "A subroutine that returns is a function LOL!" bandwagon, I want the return to be executed as it were in the function body calling the subroutine, as though I've got a preprocessor to do the substitution, because otherwise this codebase is going to get unwieldy really fast, and returning the return value of a function seems kludgy.
Will vb (sorry I can't be more specific about what version- I'm writing formulas for an embedded system, our API docs are "it runs vb") let me do that or fall in a heap?
I want the return to be executed as it were in the function body calling the subroutine, as though I've got a preprocessor to do the substitution, because otherwise this codebase is going to get unwieldy really fast, and returning the return value of a function seems kludgy.
It's not. Tail-calls are a common practice that work just fine.
They are much preferable to having a function that cannot ever be called unless you want to return its value.
It sounds like you are asking whether C/C++ style macros can be implemented in VB; the answer is no. You could possibly fake it, though, by generating VBScript and substituting the right things in.
Lambdas and delegates in VB.Net are not really the same thing as what you are asking for - if my interpretation is correct.

Resources