Find common function in decompiled scripts - refactoring

I have a bunch of decompiled scripts that look like C code.
These scripts were decompiled (I cannot really tell exactly how), so they were not written as-is by humans.
However, they show some very interesting patterns that lead me to think the decompiler duplicated some functions in most of the scripts because it followed the execution path.
For example, most of the scripts have a function like:
void *fct_9887(int param_123, char param_864) {
    fct_4698(param_864, param_123);
    return CORE_FUNCTION(param_864, param_123);
}
But in each script, the function names and parameter names are different. So the exact same function will appear in another script as:
void *fct_57(int param_93, char param_75) {
    fct_1000(param_75, param_93);
    return CORE_FUNCTION(param_75, param_93);
}
As you can guess here, fct_4698 and fct_1000 are also identical, making these two functions duplicated code.
Some of the functions actually keep the same name across all the scripts (here, CORE_FUNCTION) because they are external functions and are thus shared between scripts.
I am trying to find a way to "refactor" this and extract the common functions used in these scripts. I know it will sometimes not be possible, but for some functions it should be.
How would you proceed to do such a thing?
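One possible starting point, as a rough sketch rather than a tested tool: rename every decompiler-generated identifier to a canonical form, hash the result, and group functions that share a hash. The sketch below assumes that generated names always look like fct_<n> and param_<n>, that each script has already been split into one source string per function, and that anything else (such as CORE_FUNCTION) is an external name. The helper names (fingerprint, fingerprint_script, group_duplicates), the file names a.c and b.c, the external LOG call, and the bodies of fct_4698 and fct_1000 are all made up for illustration.

```python
import hashlib
import re
from collections import defaultdict

# Assumption: generated identifiers look like fct_<n> / param_<n>;
# every other name (e.g. CORE_FUNCTION) is external and left untouched.
GENERATED = re.compile(r"\b(fct|param)_\d+\b")


def fingerprint(name, source, known):
    """Hash `source` after renaming generated identifiers canonically.

    The function's own name becomes SELF, callees listed in `known` are
    replaced by their own hash, and everything else (parameters, callees
    not fingerprinted yet) is numbered by order of first appearance.
    """
    local = {}

    def rename(match):
        ident = match.group(0)
        if ident == name:
            return "SELF"
        if ident in known:
            return "F_" + known[ident]
        return local.setdefault(ident, f"{match.group(1)}{len(local)}")

    text = re.sub(r"\s+", " ", GENERATED.sub(rename, source)).strip()
    return hashlib.sha1(text.encode()).hexdigest()[:12]


def fingerprint_script(functions):
    """Fingerprint all functions of one script, callees before callers."""
    known, pending = {}, dict(functions)
    while pending:
        ready = [
            n for n, src in pending.items()
            if all(c == n or c in known or c not in functions
                   for c in re.findall(r"\bfct_\d+\b", src))
        ]
        # If only mutually recursive functions remain, hash them in one batch.
        batch = ready or list(pending)
        new = {n: fingerprint(n, pending[n], known) for n in batch}
        known.update(new)
        for n in batch:
            del pending[n]
    return known


def group_duplicates(scripts):
    """Return lists of (script, function) pairs sharing a fingerprint."""
    groups = defaultdict(list)
    for script, functions in scripts.items():
        for name, digest in fingerprint_script(functions).items():
            groups[digest].append((script, name))
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical input modelled on the two functions in the question.
scripts = {
    "a.c": {
        "fct_4698": "void fct_4698(char param_1, int param_2) { LOG(param_1, param_2); }",
        "fct_9887": "void *fct_9887(int param_123, char param_864) {\n"
                    "    fct_4698(param_864, param_123);\n"
                    "    return CORE_FUNCTION(param_864, param_123);\n}",
    },
    "b.c": {
        "fct_1000": "void fct_1000(char param_7, int param_9) { LOG(param_7, param_9); }",
        "fct_57": "void *fct_57(int param_93, char param_75) {\n"
                  "    fct_1000(param_75, param_93);\n"
                  "    return CORE_FUNCTION(param_75, param_93);\n}",
    },
}

for members in group_duplicates(scripts):
    print(members)
```

Hashing callees before callers is what lets fct_9887 and fct_57 match only when fct_4698 and fct_1000 match as well. Plain regex renaming will not see through reordered statements or different temporaries, so for real decompiler output a proper C parser would probably be needed.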

Related

Simple function is not inlined

I am adding metrics calls to my Go program using Prometheus. I decided to separate all the Prometheus calls into simple function calls in a separate source file for maintainability (in case I want to move to a different metrics package). More importantly, it also makes the code faster to write, as the IDE will prompt with the label names as parameters to the function call. E.g. something like this:
// Two label names, to match the two values passed to WithLabelValues below.
var requestCounter = promauto.NewCounterVec(prometheus.CounterOpts{}, []string{"label1", "label2"})
func incrementRequestCounter(label1, label2 string) {
    requestCounter.WithLabelValues(label1, label2).Inc()
}
Some of these functions are called often in a low-level loop, so I don't want these calls to slow down the code too much. My assumption was that such a simple line of code would be easy to inline. However, checking with the build option --gcflags -m, I found that the above single-line function is not inlined (go1.12.5 windows/amd64). Does anyone know why, and how to get around this? Note that this function is inlined:
func incrementRequestCounter(label1, label2 string) {
    requestCounter.WithLabelValues(label1, label2)
}
With further experimentation it seems that a function will not be inlined if it has more than one call to non-inlineable functions. (You can have lots of calls to inlineable functions and a function will still be inlineable.)
Just posting an answer (since nobody else has) with these points:
Benchmark before trying to optimise.
A seemingly simple function may be difficult to inline
Inlining is evolving and the above may be inlined in the future

Is there a way to make the compiler copy-paste entire methods for performance? [duplicate]

How do you do "inline functions" in C#? I don't think I understand the concept. Are they like anonymous methods? Like lambda functions?
Note: The answers almost entirely deal with the ability to inline functions, i.e. "a manual or compiler optimization that replaces a function call site with the body of the callee." If you are interested in anonymous (a.k.a. lambda) functions, see @jalf's answer or What is this 'Lambda' everyone keeps speaking of?.
Finally, in .NET 4.5, the CLR allows one to hint/suggest [1] method inlining using the MethodImplOptions.AggressiveInlining value. It is also available in Mono's trunk (committed today).
// The full attribute usage is in mscorlib.dll,
// so should not need to include extra references
using System.Runtime.CompilerServices;
...
[MethodImpl(MethodImplOptions.AggressiveInlining)]
void MyMethod(...)
1. Previously "force" was used here. I'll try to clarify the term. As stated in the comments and the documentation: "The method should be inlined if possible." Especially considering Mono (which is open), there are some Mono-specific technical limitations concerning inlining, as well as more general ones (like virtual functions). Overall, yes, this is a hint to the compiler, but I guess that is what was asked for.
Inline methods are simply a compiler optimization where the code of a function is rolled into the caller.
There's no mechanism by which to do this in C#, and they're to be used sparingly in languages where they are supported -- if you don't know why they should be used somewhere, they shouldn't be.
Edit: To clarify, there are two major reasons they need to be used sparingly:
It's easy to make massive binaries by using inline in cases where it's not necessary
The compiler tends to know better than you do when something should, from a performance standpoint, be inlined
It's best to leave things alone and let the compiler do its work, then profile and figure out if inline is the best solution for you. Of course, some things just make sense to be inlined (mathematical operators particularly), but letting the compiler handle it is typically the best practice.
Update: Per konrad.kruczynski's answer, the following is true for versions of .NET up to and including 4.0.
You can use the MethodImplAttribute class to prevent a method from being inlined...
[MethodImpl(MethodImplOptions.NoInlining)]
void SomeMethod()
{
    // ...
}
...but there is no way to do the opposite and force it to be inlined.
You're mixing up two separate concepts. Function inlining is a compiler optimization which has no impact on the semantics. A function behaves the same whether it's inlined or not.
On the other hand, lambda functions are purely a semantic concept. There is no requirement on how they should be implemented or executed, as long as they follow the behavior set out in the language spec. They can be inlined if the JIT compiler feels like it, or not if it doesn't.
There is no inline keyword in C#, because it's an optimization that can usually be left to the compiler, especially in JIT'ed languages. The JIT compiler has access to runtime statistics which enables it to decide what to inline much more efficiently than you can when writing the code. A function will be inlined if the compiler decides to, and there's nothing you can do about it either way. :)
Cody has it right, but I want to provide an example of what an inline function is.
Let's say you have this code:
private void OutputItem(string x)
{
    Console.WriteLine(x);
    // maybe encapsulate additional logic to decide
    // whether to also write the message to Trace or a log file
}
public IList<string> BuildListAndOutput(IEnumerable<string> x)
{   // let's pretend IEnumerable<T>.ToList() doesn't exist for the moment
    IList<string> result = new List<string>();
    foreach (string y in x)
    {
        result.Add(y);
        OutputItem(y);
    }
    return result;
}
The Just-In-Time optimizer could choose to alter the code to avoid repeatedly placing a call to OutputItem() on the stack, so that it would be as if you had written the code like this instead:
public IList<string> BuildListAndOutput(IEnumerable<string> x)
{
    IList<string> result = new List<string>();
    foreach (string y in x)
    {
        result.Add(y);
        // full OutputItem() implementation is placed here
        Console.WriteLine(y);
    }
    return result;
}
In this case, we would say the OutputItem() function was inlined. Note that it might do this even if the OutputItem() is called from other places as well.
Edited to show a scenario more-likely to be inlined.
Do you mean inline functions in the C++ sense? In which the contents of a normal function are automatically copied inline into the callsite? The end effect being that no function call actually happens when calling a function.
Example:
inline int Add(int left, int right) { return left + right; }
If so then no, there is no C# equivalent to this.
Or do you mean functions that are declared within another function? If so, then yes: C# supports this via anonymous methods or lambda expressions.
Example:
static void Example() {
    Func<int,int,int> add = (x,y) => x + y;
    var result = add(4,6); // 10
}
Yes, exactly. The only distinction is the fact that it returns a value.
Simplification (not using expressions):
List<T>.ForEach takes an action; it doesn't expect a return value.
So an Action<T> delegate would suffice, say:
List<T>.ForEach(param => Console.WriteLine(param));
is the same as saying:
List<T>.ForEach(delegate(T param) { Console.WriteLine(param); });
The difference is that the param type and delegate declaration are inferred from usage, and the braces aren't required on a simple inline method.
Whereas List<T>.Where takes a function, expecting a result.
So a Func<T, bool> would be expected:
List<T>.Where(param => param.Value == SomeExpectedComparison);
which is the same as:
List<T>.Where(delegate(T param) { return param.Value == SomeExpectedComparison; });
You can also declare these methods inline and assign them to variables, i.e.:
Action myAction = () => Console.WriteLine("I'm doing something Nifty!");
myAction();
or
Func<object, string> myFunction = theObject => theObject.ToString();
string myString = myFunction(someObject);
I hope this helps.
The statement "it's best to leave these things alone and let the compiler do the work..." (Cody Brocious) is complete rubbish. I have been programming high-performance game code for 20 years, and I have yet to come across a compiler that is "smart enough" to know which code (functions) should be inlined and which should not. It would be useful to have an "inline" statement in C#; the truth is that the compiler just doesn't have all the information it needs to determine which functions should always be inlined without the "inline" hint. Sure, if the function is small (an accessor), it might be inlined automatically, but what if it is a few lines of code? Nonsense: the compiler has no way of knowing, and you can't just leave that up to the compiler for optimized code (beyond algorithms).
There are occasions where I do wish to force code to be in-lined.
For example, suppose I have a complex routine where a large number of decisions are made within a highly iterative block, and those decisions result in similar but slightly differing actions being carried out. Consider a complex (non-DB-driven) sort comparer where the sorting algorithm sorts the elements according to a number of different, unrelated criteria, such as one might do when sorting words according to grammatical as well as semantic criteria for a fast language recognition system. I would tend to write helper functions to handle those actions in order to maintain the readability and modularity of the source code.
I know that those helper functions should be inlined, because that is the way the code would be written if it never had to be understood by a human. In this case I would certainly want to ensure that there is no function-call overhead.
I know this question is about C#. However, you can write inline functions in .NET with F#. see: Use of `inline` in F#
No, there is no such construct in C#, but the .NET JIT compiler may decide to inline function calls at JIT time. I don't actually know whether it really performs such optimizations, though.
(I think it should :-))
In case your assemblies will be ngen-ed, you might want to take a look at TargetedPatchingOptOut. This will help ngen decide whether to inline methods. MSDN reference
It is still only a declarative hint to optimize though, not an imperative command.
Lambda expressions are inline functions! I think that C# doesn't have an extra attribute like inline or something like that!

c++ Hidden Unique Pointer

I have some code which depends on some include files which are partly defined at the start of source files (which is usual) and others which are used within functions.
A typical example of that is the OpenFOAM solver sources.
Because the scheme of this code is highly procedural, but I want to put all this into a class which provides init(), run() and maybe release(), I plan to put some of the variables into the classes as private making them members.
I don't want to modify the included files because they belong to a library.
The reason for using a class is that other routines and classes run together with this code.
Here is the thing: init() must prepare some variables, and there are situations where these variables (whose types are other classes) need explicit constructors with special arguments. init() is called once; run() is called several times. The procedural code consists only of a loop, and the contents of that loop are put into the run() method.
So the best solution was to put these variables into std::unique_ptr and init can construct whatever it needs to. Obviously with that trick the variable signature changed, so I created a second declaration of a reference like this:
std::unique_ptr<volScalarField> mp_p;
volScalarField &p = *mp_p;
Now this is a bit tedious so I created a macro
FOAMPTR(volScalarField, p)
which does all the work for me:
#define FOAMPTR(TYPE,NAME) std::unique_ptr<TYPE> mp_##NAME; TYPE &NAME=*mp_##NAME
It works pretty well, but I'm not a fan of macros in general, especially if you need to debug code.
Now my question is: Is there a better way to tackle this and use something else like a template definition which might do all the magic?
Edit: By 'works pretty well' I mean that the code compiles. The reference, though, is still invalid.
Edit: Okay, I solved the invalid pointer problem using two macros:
#define FOAMPTR(TYPE,NAME) std::unique_ptr<TYPE> mp_##NAME
#define FETCHFOAMREF(NAME) auto &NAME=*mp_##NAME
Now I put FOAMPTR(TYPE,NAME) to the member and I get my unique ptrs. In the run() method the second macro FETCHFOAMREF(NAME) is used. Of course init() must be sure to correctly initialize the object or else the program is going to crash.
I still leave the question open because I'm not satisfied with that solution.

Bash Functions Order and Timing

This should be easy to answer, but I couldn't find exactly what I was asking on google/stackoverflow.
I have a bash script with 18 functions (785 lines). Ridiculous, I know; I need to learn another language for the lengthy stuff. I have to run these functions in a particular order because the functions later in the sequence use info from the database and/or text files that were modified by the preceding functions. I am pretty much done with the core functionality of all the functions individually, and I would like a function to run them all (One ring to rule them all!).
So my questions are, if I have a function like so:
function precious()
{
    rings_of   # Functions in sequence
    elves      # This function modifies the DB
    men        # This function uses the DB to modify text
    dwarves    # This function uses that modified text
}
Would variables be carried from one function to the next if declared like so? (inside of a function):
function men()
{
    ...
    frodo_sw_name=`some DB query returning the name of Frodo's sword`
    ...
}
Also, if the functions are called in a specific order, as seen above, will Bash wait for one function to finish before starting the next? - I am pretty sure the answer is yes, but I have a lot of typing to do either way, and since I couldn't find this answer quickly on the internet, I figured it might benefit others to have this answer posted as well.
Thanks!
Variables persist unless you run the function in a subshell. This would happen if you run it as part of a pipeline or group it with (...). Use { ... } for grouping instead if you don't want to create a subshell.
The exception is if you explicitly declare the variables in the function with declare, typeset, or local, which makes them local to that function rather than global to the script. But you can also use the -g option to declare and typeset to declare global variables (this would obviously be inappropriate for the local declaration).
See this tutorial on variable scope in bash.
Commands are all run sequentially, unless you deliberately background them with & at the end. There's no difference between functions and other commands in this regard.

How can I make an external toolbox available to a MATLAB Parallel Computing Toolbox job?

As a continuation of this question and the subsequent answer, does anyone know how to have a job created using the Parallel Computing Toolbox (using createJob and createTask) access external toolboxes? Is there a configuration parameter I can specify when creating the function to specify toolboxes that should be loaded?
According to this section of the documentation, one way you can do this is to specify either the 'PathDependencies' property or the 'FileDependencies' property of the job object so that it points to the functions you need the job's workers to be able to use.
You should be able to point the way to the KbCheck function in PsychToolbox, along with any other functions or directories needed for KbCheck to work properly. It would look something like this:
obj = createJob('PathDependencies',{'path_to_KbCheck',...
                'path_to_other_PTB_functions'});
A few comments, based on my work troubleshooting this:
It appears that there are inconsistencies in how well nested functions and anonymous functions work with the Parallel Computing Toolbox. I was unable to get them to work, while others have been able to. (Also see here.) As such, I would recommend storing each function in its own file and including those files using the PathDependencies or FileDependencies properties, as described by gnovice above.
It is very hard to troubleshoot the Parallel Computing Toolbox, as everything happens outside your view. Use breakpoints liberally in your code; the inspect command is your friend. Also note that if there is an error, task objects will contain an error parameter, which in turn will contain an ErrorMessage string and possibly the Error.causes MException object. Both of these were immensely useful in debugging.
When including Psychtoolbox, you need to do it as follows. First, create a jobStartup.m file with the following lines:
PTB_path = '/Users/eliezerk/Documents/MATLAB/Psychtoolbox3/';
addpath( PTB_path );
cd( PTB_path );
SetupPsychtoolbox;
However, since the Parallel Computing Toolbox can't handle any graphics functionality, running SetupPsychtoolbox as-is will actually cause your thread to crash. To avoid this, you need to edit the PsychtoolboxPostInstallRoutine function, which is called at the very end of SetupPsychtoolbox. Specifically, you want to comment out the line AssertOpenGL (line 496 at the time of this answer; this may change in future releases).

Resources