How to auto-optimize a Python application?

I have a complex, batch-based Python server application, and I want it to run as fast as possible. The application contains probably something like 100 integer constants that somehow affect its performance: things like the initial size of a dictionary, or the memory limits given to external programs.
What I would like to do is let an optimizer program modify these 100 integer values, run thousands of tests overnight, and figure out which set of parameters makes the Python program finish in the shortest time.
Does such a thing exist? I imagine I could build it somehow using exec and string replacement to modify the integers.

If the effect of each variable is independent of the other variables, you can feasibly do this with a script that optimizes each variable in turn: if each variable can assume k values and there are n variables, this is O(nk). If the variables may affect each other's impact on performance in completely arbitrary ways, you would have to enumerate and test all O(k^n) assignments. If you're somewhere in between, describing an algorithm gets a little harder.
Regarding the mechanics: once you've figured out which configurations are meaningful (as above), a simple script using e.g. exec or time should work (see the sketch below). Even if such a tool existed, you'd still need an answer to the question above to avoid the brute-force O(k^n) search... or to recognize that brute force is the best you can do.
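A minimal sketch of what that script could look like, assuming the constants live in a template file and are tuned one at a time; the file names, parameter names, and value ranges here are all hypothetical:

import subprocess
import time

# Hypothetical: config_template.py contains placeholders such as
# {DICT_SIZE} and {MEM_LIMIT} to be substituted before each run.
PARAMS = {"DICT_SIZE": [256, 1024, 4096], "MEM_LIMIT": [64, 128, 256]}

def run_once(assignment):
    """Write a concrete config, run the batch job, return wall-clock seconds."""
    with open("config_template.py") as f:
        config = f.read()
    for name, value in assignment.items():
        config = config.replace("{%s}" % name, str(value))
    with open("config.py", "w") as f:
        f.write(config)
    start = time.time()
    subprocess.run(["python", "batch_job.py"], check=True)  # hypothetical entry point
    return time.time() - start

# Coordinate search: optimize one variable at a time, O(n*k) runs total.
best = {name: values[0] for name, values in PARAMS.items()}
for name, values in PARAMS.items():
    timings = {v: run_once(dict(best, **{name: v})) for v in values}
    best[name] = min(timings, key=timings.get)
print("best assignment:", best)

Note that this coordinate search only finds the true optimum if the variables are independent, per the analysis above.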

Related

OpenMDAO: Use Parallel Groups in order to run the same computations in parallel, to make the optimization faster

I am writing this post in the hope of understanding how to set up parallel computations using OpenMDAO. I have been told that OpenMDAO has a feature called Parallel Groups (https://openmdao.org/newdocs/versions/latest/features/core_features/working_with_groups/parallel_group.html), and I am wondering whether it could help a gradient-free optimizer run its evaluations of the objective function in parallel.
Do you know if I can create 2 or 3 instances of the function I am trying to optimize, so that OpenMDAO can run those instances with different chosen inputs and find the optimal result in less time than if it had to work with only one function instance?
I saw that this thread is close to what I am trying to do: parallelize openmdao optimization with different initial guesses. I think it could have answered my question, but the link proposed in the answer is no longer available.
Many thanks in advance for your help.
To start, you'll need to install MPI, mpi4py, PETSc, and petsc4py. These can be installed without too much effort on Linux; they are a little harder on OSX, and very hard on Windows.
You can use parallel groups to run multiple components in parallel. Whether you can make use of that for a gradient-free method, though, is a trickier question. Unfortunately, as of V3.17, none of the current gradient-free drivers are set up to work that way.
You could very probably make it work, but it will require some development on your part. You'll need to find a way to map the "generation data" (using that GA term as a generic reference to the set of parallel cases you can run at once with a gradient-free method) into the model. That will almost certainly involve setting up a for loop outside the normal OpenMDAO run method.
You would set up the model with n instances in parallel, where n is the size of a generation. Then write your own code around a call to run_model that maps the gradient-free data down into that model to run the cases all at once.
I am essentially proposing that you forgo the driver API and write your own execution code around OpenMDAO, roughly as sketched below. This modeling approach was prototyped in the 2020 Reverse Hackathon, where we discussed how the driver API is not strictly necessary.
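A rough sketch of that outer loop, with a toy ExecComp standing in for the real analysis (the component names and the objective are made up, and an actual parallel run would also need the MPI stack above):

import openmdao.api as om

N = 4  # generation size: number of model instances run per batch

prob = om.Problem()
par = prob.model.add_subsystem("par", om.ParallelGroup())
for i in range(N):
    # toy stand-in for the real analysis: f(x) = (x - 3)^2
    par.add_subsystem(f"case{i}", om.ExecComp("y = (x - 3.0)**2"))
prob.setup()

def evaluate_generation(xs):
    """Map one generation of candidate inputs onto the N instances,
    run them all in one pass, and collect the objective values."""
    for i, x in enumerate(xs):
        prob.set_val(f"par.case{i}.x", x)
    prob.run_model()
    return [float(prob.get_val(f"par.case{i}.y")) for i in range(N)]

# The gradient-free algorithm (GA, Nelder-Mead, ...) supplies candidates
# and consumes fitnesses around this call, replacing the driver API:
print(evaluate_generation([0.0, 1.5, 3.0, 4.5]))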

"GLOBAL could be very inefficient"

I am using (in Matlab) a global statement inside an if command, so that I import the global variable into the local namespace only if it is really needed.
The code analyzer warns me that "global could be very inefficient unless it is a top-level statement in its function". Thinking about possible internal implementations, I find this restriction very strange and unusual. I am considering two possibilities:
What the warning really means is "global is very inefficient in its own right, so don't use it in a loop". In particular, using it inside an if, like I'm doing, is perfectly safe, and the warning is issued wrongly (and is poorly worded).
The warning is correct; Matlab uses some really unusual variable loading mechanism in the background, so it is really much slower to import global variables inside an if statement. In this case, I'd like to have a hint or a pointer to how this stuff really works, because I am interested and it seems to be important if I want to write efficient code in future.
Which one of these two explanations is correct? (or maybe neither is?)
Thanks in advance.
EDIT: to make it clearer: I know that global is slow (and apparently I can't avoid using it, as it is a design decision of an old library I am using); what I am asking is why the Matlab code analyzer complains about
if (foo == bar)
    global baz
    baz = 1;
else
    do_other_stuff;
end
but not about
GLOBAL baz
if(foo==bar)
baz=1;
else
do_other_stuff;
end
I find it difficult to imagine a reason why the first should be slower than the second.
To supplement eykanal's post, this technical note gives an explanation of why global is slow.
... when a function call involves global variables, performance is even more inhibited. This is because to look for global variables, MATLAB has to expand its search space to the outside of the current workspace. Furthermore, the reason a function call involving global variables appears a lot slower than the others is that MATLAB Accelerator does not optimize such a function call.
I do not know the answer, but I strongly suspect this has to do with how memory is allocated and shared at runtime.
Be that as it may, I recommend reading the following two entries on the MathWorks blogs by Loren and Doug:
Writing deployable code (the very first thing he writes in that post)
Top 10 MATLAB code practices that make me cry (#2 on that list)
Long story short: global variables are almost never the way to go; there are many other ways to accomplish variable sharing, some of which she discusses, that are more efficient and less error-prone.
The answer from Walter Roberson here
http://mathworks.com/matlabcentral/answers/19316-global-could-be-very-inefficient#answer_25760
[...] This is not necessarily more work if not done in a top-level command, but people would tend to put the construct in a loop, or in multiple non-exclusive places in conditional structures. It is a lot easier for a person writing mlint warnings to not have to add clarifications like, "Unless you can prove those "global" will only be executed once, in which case it isn't less efficient but it is still bad form"
supports my option (1).
Fact (from Matlab 2014 up until Matlab 2016a, and without the Parallel Toolbox): often, the fastest code you can achieve in Matlab uses nested functions, sharing your variables between functions without passing them.
The next step down from that is using global variables and splitting your project into multiple files. This may pull performance down slightly, because (supposedly, although I have never seen it verified in any tests) Matlab incurs overhead when retrieving from the global workspace, and because (again supposedly, and again without evidence that I have seen) there is some kind of problem with the JIT acceleration.
In my own testing, passing very large data matrices (hi-res images) between calls to functions, nested functions and global variables are almost identical in performance.
The reason you can get superior performance with global variables or nested functions is that you avoid extra data copying. If you pass a variable to a function, Matlab does so by reference, but if you modify the variable in the function, Matlab makes a copy on the fly (copy-on-write). I know of no way to avoid that in Matlab except nested functions and global variables. Whatever small drain you incur from hindered JIT or global fetch times is more than recovered by avoiding this extra data copying (when using larger data).
This may have changed with newer versions of Matlab, but from what I hear from friends, I doubt it. I can't submit any tests; I don't have a Matlab license anymore.
As proof, look no further than this toolbox for video processing I made back in the days when I was working with Matlab. It is horribly ugly under the hood, because I had no way of getting performance without globals.
This fact about Matlab (that global variables are the most optimized way you can code when you need to modify large data across different functions) is an indication that the language and/or interpreter needs an update.
Instead, Matlab could use a better, more dynamic notion of workspace, but nothing I have seen indicates this will ever happen, especially when the community of users seemingly ignores the facts and pushes forward opinions without any basis, such as the claim that globals in Matlab are slow.
They are not.
That said, you shouldn't use globals, ever. If you are forced to do real-time video processing in pure Matlab and you find you have no option other than globals to reach the necessary performance, you should take the hint and change language. It's time to move to a higher-performance language... and maybe also write an occasional rant on Stack Overflow, in the hope that Matlab can be improved by swaying the opinions of its users.

Code generation by genetic algorithms

Evolutionary programming seems to be a great way to solve many optimization problems. The idea is very simple and the implementation poses no problems.
I was wondering if there is any way to evolutionarily create a program in a Ruby/Python script (or any other language)?
The idea is simple:
Create a population of programs
Perform genetic operations (roulette-wheel selection or any other selection), create new programs that inherit from the best programs, etc.
Repeat step 2 until a program that satisfies our condition is found
But there are still a few problems:
How should chromosomes be represented? For example, should one cell of a chromosome be one line of code?
How should chromosomes be generated? If they are lines of code, how do we generate them to ensure that they are syntactically correct, etc.?
Example of a program that could be generated:
Create a script that takes N numbers as input and returns their mean as output.
If there were any attempts to create such algorithms I'll be glad to see any links/sources.
If you are sure you want to do this, you want genetic programming rather than a genetic algorithm. GP allows you to evolve tree-structured programs. You would give it a bunch of primitive operations (while($register), read($register), increment($register), decrement($register), divide($result $numerator $denominator), print, progn2 (this is GP-speak for "execute two commands sequentially")).
You could produce something like this:
progn2(
    progn2(
        read($1)
        while($1
            progn2(
                while($1
                    progn2( # add the input to the total
                        increment($2)
                        decrement($1)
                    )
                )
                progn2( # increment number of values entered, read again
                    increment($3)
                    read($1)
                )
            )
        )
    )
    progn2( # calculate result
        divide($1 $2 $3)
        print($1)
    )
)
You would use, as your fitness function, how close the evolved program comes to the real solution. And therein lies the catch: you have to calculate that solution traditionally anyway*. Then you need something that translates the evolved tree into code in your language of choice. Note that, since you've got a potential infinite loop in there, you'll have to cut off execution after a while (there's no way around the halting problem), and it probably won't work. Shucks. Note also that my example code will attempt to divide by zero.
*There are ways around this, but generally not terribly far around it.
It can be done, but works very badly for most kinds of applications.
Genetic algorithms only work when the fitness function is continuous, i.e. when you can determine which candidates in your current population are closer to the solution than others, because only then do you get improvements from one generation to the next. I learned this the hard way when I had a genetic algorithm with one strongly-weighted non-continuous component in my fitness function. It dominated all the others, and because it was non-continuous there was no gradual advance toward greater fitness: candidates that were almost correct in that aspect were not considered fitter than ones that were completely incorrect.
Unfortunately, program correctness is utterly non-continuous. Is a program that stops with error X on line A better than one that stops with error Y on line B? Your program could be one character away from being correct, and still abort with an error, while one that returns a constant hardcoded result can at least pass one test.
And that's not even touching on the matter of the code itself being non-continuous under modifications...
Well, this is very possible, and @Jivlain correctly points out in his (nice) answer that genetic programming is what you are looking for (and not plain genetic algorithms).
Genetic programming is a field that has not reached a broad audience yet, partially because of some of the complications @MichaelBorgwardt indicates in his answer. But those are mere complications; it is far from true that this is impossible to do. Research on the topic has been going on for more than 20 years.
John Koza is one of the leading researchers on this (have a look at his 1992 work), and he demonstrated as early as 1996 how genetic programming can in some cases outperform naive GAs on some classic computational problems (such as evolving programs for Cellular Automata synchronization).
Here's a good genetic programming tutorial by Koza and Poli from 2003.
For a more recent reference you might want to have a look at A Field Guide to Genetic Programming (2008).
Since this question was asked, the field of genetic programming has advanced a bit, and there have been some additional attempts to evolve code in configurations other than the tree structures of traditional genetic programming. Here are just a few of them:
PushGP - designed with the goal of evolving modular functions like human coders use, programs in this system store all variables and code on different stacks (one for each variable type). Programs are written by pushing and popping commands and data off of the stacks.
FINCH - a system that evolves Java byte-code. This has been used to great effect to evolve game-playing agents.
Various algorithms have started evolving C++ code, often with a step in which compiler errors are corrected. This has had mixed, but not altogether unpromising results. Here's an example.
Avida - a system in which agents evolve programs (mostly boolean logic tasks) using a very simple assembly code. Based on the older (and less versatile) Tierra.
The language isn't the issue. Regardless of the language, you have to define some higher-level notion of mutation, otherwise learning will take forever.
For example, since any Ruby program can be written as a text string, you could just randomly generate text strings and optimize that. Slightly better would be to generate only syntactically legal Ruby programs. However, that too would take forever.
If you were trying to build a sorting program, and you had high-level operations like "swap", "move", etc., then you would have a much higher chance of success; see the sketch below.
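To make that concrete, here is a minimal sketch in Python (the population size, genome length, and fitness measure are all made-up choices) where the chromosome is a fixed-length list of high-level swap operations applied to one input array:

import random

TARGET = [5, 3, 8, 1, 9, 2]
GENOME_LEN, POP, GENS = 8, 60, 200

def random_gene():
    # one high-level operation: swap the items at two positions
    return (random.randrange(len(TARGET)), random.randrange(len(TARGET)))

def fitness(genome):
    arr = TARGET[:]
    for i, j in genome:
        arr[i], arr[j] = arr[j], arr[i]
    # count adjacent pairs already in order: rises smoothly toward sorted
    return sum(a <= b for a, b in zip(arr, arr[1:]))

def mutate(genome):
    g = list(genome)
    g[random.randrange(len(g))] = random_gene()
    return g

pop = [[random_gene() for _ in range(GENOME_LEN)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    survivors = pop[:POP // 2]
    pop = survivors + [mutate(random.choice(survivors))
                       for _ in range(POP - len(survivors))]
best = max(pop, key=fitness)
print(fitness(best), "ordered adjacent pairs out of", len(TARGET) - 1)

Because the fitness here rises gradually as the array gets closer to sorted, the search has exactly the continuity that the earlier answer calls for; mutating raw source text would not.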
In theory, a bunch of monkeys banging on a typewriter for an infinite amount of time will output all the works of Shakespeare. In practice, it isn't a practical way to write literature. Just because genetic algorithms can solve optimization problems doesn't mean that it's easy or even necessarily a good way to do it.
The biggest selling point of genetic algorithms, as you say, is that they are dirt simple. They don't have the best performance or the strongest mathematical grounding, but even if you have no idea how to solve your problem, as long as you can define it as an optimization problem you will be able to turn it into a GA.
Programs aren't really suited for GAs precisely because code isn't good chromosome material. I have seen someone do something similar with (simpler) machine code instead of Python (although it was more of an ecosystem simulation than a GA per se), and you might have better luck if you encode your programs using automata / LISP or something like that.
On the other hand, given how alluring GA's are and how basically everyone who looks at them asks this same question, I'm pretty sure there are already people who tried this somewhere - I just have no idea if any of them succeeded.
Good luck with that.
Sure, you could write a "mutation" program that reads a program and randomly adds, deletes, or changes some number of characters. Then you could compile the result and see if the output is better than the original program. (However we define and measure "better".) Of course 99.9% of the time the result would be compile errors: syntax errors, undefined variables, etc. And surely most of the rest would be wildly incorrect.
Try some very simple problem. Say, start with a program that reads in two numbers, adds them together, and outputs the sum. Let's say that the goal is a program that reads in three numbers and calculates the sum. Just how long and complex such a program would be of course depends on the language. Let's say we have some very high level language that lets us read or write a number with just one line of code. Then the starting program is just 4 lines:
read x
read y
total=x+y
write total
The simplest program to meet the desired goal would be something like
read x
read y
read z
total=x+y+z
write total
So through a random mutation, we have to add "read z" and "+z", a total of 9 characters including the space and the new-line. Let's make it easy on our mutation program and say it always inserts exactly 9 random characters, that they're guaranteed to be in the right places, and that it chooses from a character set of just 26 letters plus 10 digits plus 14 special characters = 50 characters. What are the odds that it will pick the correct 9 characters? 1 in 50^9 = 1 in 2.0e15. (Okay, the program would work if instead of "read z" and "+z" it inserted "read w" and "+w", but then I'm making it easy by assuming it magically inserts exactly the right number of characters and always inserts them in the right places. So I think this estimate is still generous.)
1 in 2.0e15 is a pretty small probability. Even if the program runs a thousand times a second, and you can test the output that quickly, the chance is still just 1 in 2.0e12 per second, or 1 in 5.4e8 per hour, 1 in 2.3e7 per day. Keep it running for a year and the chance of success is still only 1 in 62,000.
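The arithmetic is easy to check with a few lines of Python:

odds = 50 ** 9               # possible 9-character insertions: ~2.0e15
rate = 1000                  # mutated programs tested per second
print("%.1e" % odds)                                    # 2.0e15
print("per hour: %.1e" % (odds / (rate * 3600)))        # ~5.4e8
print("per day:  %.1e" % (odds / (rate * 86400)))       # ~2.3e7
print("per year: %.0f" % (odds / (rate * 86400 * 365))) # ~62,000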
Even a moderately competent programmer should be able to make such a change in, what, ten minutes?
Note that changes must come in at least "packets" that are correct. That is, if a mutation generates "reax z", that's only one character away from "read z", but it would still produce compile errors, and so would fail.
Likewise adding "read z" but changing the calculation to "total=x+y+w" is not going to work. Depending on the language, you'll either get errors for the undefined variable or at best it will have some default value, like zero, and give incorrect results.
You could, I suppose, theorize incremental solutions. Maybe one mutation adds the new read statement, then a future mutation updates the calculation. But without the calculation, the additional read is worthless. How will the program be evaluated to determine that the additional read is "a step in the right direction"? The only way I see to do that is to have an intelligent being read the code after each mutation and see if the change is making progress toward the desired goal. And if you have an intelligent designer who can do that, that must mean that he knows what the desired goal is and how to achieve it. At which point, it would be far more efficient to just make the desired change rather than waiting for it to happen randomly.
And this is an exceedingly trivial program in a very easy language. Most programs are, what, hundreds or thousands of lines, all of which must work together. The odds against any random process writing a working program are astronomical.
There might be ways to do something that resembles this in some very specialized applications, where you are not really making random mutations, but rather making incremental modifications to the parameters of a solution. Like, we have a formula with some constants whose values we don't know. We know what the correct results are for some small set of inputs. So we make random changes to the constants, and if the result is closer to the right answer, change from there; if not, go back to the previous value. But even at that, I think it would rarely be productive to make random changes. It would likely be more helpful to change the constants according to a strict schedule, like starting with changes of 1000s, then 100s, then 10s, etc. (sketched below).
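That last idea is less a GA than a stochastic hill climb, and it is easy to sketch (the formula, sample data, and step schedule are invented for illustration):

import random

# Suppose the formula is y = a*x + b and we know a few correct (x, y) pairs.
SAMPLES = [(0, 3.0), (1, 5.0), (2, 7.0)]  # consistent with a=2, b=3

def error(a, b):
    return sum((a * x + b - y) ** 2 for x, y in SAMPLES)

a, b = random.uniform(-10, 10), random.uniform(-10, 10)
for step in (1000, 100, 10, 1, 0.1, 0.01):  # shrink the step size over time
    for _ in range(500):
        da, db = random.uniform(-step, step), random.uniform(-step, step)
        if error(a + da, b + db) < error(a, b):
            a, b = a + da, b + db           # keep the change only if it helps
print(round(a, 3), round(b, 3))             # should approach 2 and 3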
I just want to give you a suggestion. I don't know how successful you'd be, but perhaps you could try to evolve a Core War bot with genetic programming. Your fitness function is easy: just let the bots compete in a game. You could start with well-known bots and perhaps a few random ones, then wait and see what happens.

Why is determining if a function is pure difficult?

I was at the StackOverflow Dev Days convention yesterday, and one of the speakers was talking about Python. He showed a Memoize function, and I asked if there was any way to keep it from being used on a non-pure function. He said no, that's basically impossible, and if someone could figure out a way to do it it would make a great PhD thesis.
That sort of confused me, because it doesn't seem all that difficult for a compiler/interpreter to solve recursively. In pseudocode:
function isPure(functionMetadata): boolean;
begin
    result = true;
    for each variable in functionMetadata.variablesModified
        result = result and variable.isLocalToThisFunction;
    for each dependency in functionMetadata.functionsCalled
        result = result and isPure(dependency);
end;
That's the basic idea. Obviously you'd need some sort of check to prevent infinite recursion on mutually-dependent functions, but that's not too difficult to set up.
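In Python, the pseudocode plus that recursion guard might look like this (purely illustrative: it assumes metadata objects exposing variables_modified and functions_called, which real interpreters do not provide this cleanly):

def is_pure(fn_meta, in_progress=None):
    """Naive purity check over hypothetical function-metadata objects."""
    if in_progress is None:
        in_progress = set()
    if fn_meta in in_progress:
        return True  # cycle guard: assume pure while still being checked
    in_progress.add(fn_meta)
    if any(not v.is_local for v in fn_meta.variables_modified):
        return False
    return all(is_pure(dep, in_progress) for dep in fn_meta.functions_called)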
Higher-order functions that take function pointers might be problematic, since they can't be verified statically, but my original question presupposes that the compiler has some sort of language constraint to designate that only a pure function pointer can be passed to a certain parameter. If one existed, that could be used to satisfy the condition.
Obviously this would be easier in a compiled language than an interpreted one, since all this number-crunching would be done before the program is executed and so not slow anything down, but I don't really see any fundamental problems that would make it impossible to evaluate.
Does anyone with a bit more knowledge in this area know what I'm missing?
You also need to annotate every system call, every FFI, ...
And furthermore, the tiniest 'leak' tends to spread into the whole code base.
It is not a theoretically intractable problem, but in practice it is very, very difficult to do in a fashion that doesn't make the whole system feel brittle.
As an aside, I don't think this makes a good PhD thesis; Haskell effectively already has (a version of) this, with the IO monad.
And I am sure lots of people continue to look at this 'in practice'. (wild speculation) In 20 years we may have this.
It is particularly hard in Python. Since anObject.aFunc can be changed arbitrarily at runtime, you cannot determine at compile time which function anObject.aFunc() will call, or even whether it will be a function at all.
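A few lines of ordinary Python show the problem (the class here is made up):

import time

class Thing:
    def a_func(self):              # looks statically pure: no globals, no I/O
        return 42

obj = Thing()
obj.a_func = lambda: time.time()   # rebound at runtime; now impure
print(obj.a_func())                # a compile-time verdict on Thing.a_func is moot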
In addition to the other excellent answers here: your pseudocode looks only at whether a function modifies variables. But that's not really what "pure" means. "Pure" typically means something closer to "referentially transparent": the output depends only on the input. So something as simple as reading the current time and making it a factor in the result (or reading from input, or reading the state of the machine, or...) makes the function non-pure without modifying any variables.
Also, you could write a "pure" function that did modify variables.
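Both points are easy to demonstrate (standard library only): the first function below modifies nothing yet is impure, while the second mutates only a local list and remains pure:

import time

def stamp(x):
    # impure: the result depends on the clock, not just on x
    return x * time.time()

def sorted_copy(xs):
    out = list(xs)    # modifies a variable, but only its own local copy
    out.sort()
    return out        # same input always yields the same output: pure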
Here's the first thing that popped into my mind when I read your question.
Class Hierarchies
Determining whether a variable is modified includes digging through every single method called on that variable to see whether it mutates. This is... somewhat straightforward for a sealed type with a non-virtual method.
But consider virtual methods. You must find every single derived type and verify that every single override of that method does not mutate state. Determining this is simply not possible in any language/framework that allows dynamic code generation or is simply dynamic (and if it is possible, it's extremely difficult), because the set of derived types is not fixed: a new one can be generated at runtime.
Take C# as an example. There is nothing stopping me from generating a derived class at runtime which overrides that virtual method and modifies state. A static verifier would not be able to detect this type of modification and hence could not validate whether the method was pure.
I think the main problem would be doing it efficiently.
The D language has pure functions, but you have to specify them yourself, so the compiler knows to check them. I think that if you mark them manually, the problem becomes easier.
Deciding whether a given function is pure is, in general, reducible to deciding whether a given program will halt, and it is well known that the Halting Problem cannot be solved at all, let alone efficiently.
Note that the complexity depends on the language, too. In the more dynamic languages, it's possible to redefine anything at any time. For example, in Tcl:
proc myproc {a b} {
    if { $a > $b } {
        return $a
    } else {
        return $b
    }
}
Every single piece of that could be modified at any time. For example:
the "if" command could be rewritten to use and update global variables
the "return" command, along the same lines, could do the same thing
the could be an execution trace on the if command that, when "if" is used, the return command is redefined based on the inputs to the if command
Admittedly, Tcl is an extreme case; it is one of the most dynamic languages there is. That being said, it highlights that it can be difficult to determine the purity of a function even once you've entered it.

Random Number Generator: Class level or Method level?

When using a random number generator, which is the better way to use it for greater randomness of the returned values:
Have a method that instantiates a new instance of the RNG each time and then returns a value?
Have an instance of the RNG at class level, instantiated once in the constructor, with all subsequent calls for a new random value using that existing instance?
The issue is that there may be many calls for a random number, often in different scopes not connected with each other.
This is not a performance issue, so the fact that each call might instantiate a new instance makes no difference. This is all about the randomness of the returned value.
Option 1 does not work, actually.
Option 2 is the only choice of the two. RNGs absolutely require that you generate the values in sequence from a single seed.
Your "create a new generator with a new seed" approach breaks the mathematical foundation. What you get then depends entirely on your seeds, which, sadly, won't be very random. A quick demonstration is below.
I suggest option 3: have a single RNG used throughout the program. It requires locking or a thread-local if the RNG isn't thread-safe (e.g. in .NET), but it makes life a lot simpler and you don't need to worry about repetition.
See the relevant MiscUtil page for details of the .NET StaticRandom class I wrote for this very purpose. (It's incredibly simple - nothing clever at all.)
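In Python the same idea might look like this (a sketch; Python's random module already exposes a shared module-level instance, so this only adds the thread-local wrapper mentioned above):

import random
import threading

_local = threading.local()

def get_rng():
    # One lazily created generator per thread: no locking, no shared state,
    # and no correlated time-based seeds.
    if not hasattr(_local, "rng"):
        _local.rng = random.Random()
    return _local.rng

value = get_rng().random()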
Edit: I believe I mean option 3, as mentioned in another answer, i.e. a global random manager, although within a specific class you can do exactly the same thing.
Another advantage of using option 2 is that if you ever need "replay" functionality in your software, you can simply store the seed used to initialise the RNG. Next time, force the RNG to use the stored seed, and you will get the exact same behaviour, assuming there are no other issues such as concurrency/threading that might change the order of execution.
You might want to do something like this if your software is running an experiment which requires a lot of randomness, but where you might wish to repeat a particular run to demonstrate to other people. It's also used a lot in computer games where the AI will take decisions based on weightings of possible choices, but with ultimately a random number "picking" what action they take.
It also makes debugging possible for transient bugs which only appear occasionally. If you don't store the seed of each run there is no way to recreate the exact conditions which caused the bug.
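In Python terms, the replay trick is only a few lines (sketch):

import random
import time

seed = int(time.time())          # record this alongside the run's logs
rng = random.Random(seed)
run_a = [rng.randint(1, 6) for _ in range(5)]

rng = random.Random(seed)        # later: replay with the stored seed
run_b = [rng.randint(1, 6) for _ in range(5)]
assert run_a == run_b            # identical "random" behaviour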
