I'm looking at evolving ants capable of food foraging behaviour using genetic programming, as described by Koza here. Each time step, I loop through each ant, executing its computer program (the same program is used by all ants in the colony). Currently, I have defined simple instructions like MOVE-ONE-STEP, TURN-LEFT, TURN-RIGHT, etc. But I also have a function PROGN that executes arguments in sequence. The problem I am having is that because PROGN can execute instructions in sequence, it means an ant can do multiple actions in a single time step. Unlike nature, I cannot run the ants in parallel, meaning one ant might go and perform several actions, manipulating the environment whilst all of the other ants are waiting to have their turn.
I'm just wondering, is this how it is normally done, or is there a better way? Koza does not seem to mention anything about it. Thing is, I want to expand the scenario to have other agents (e.g. enemies), which might rely on things occurring only once in a single time step.
I am not familiar with Koza's work, but I think a reasonable approach is to give each ant its own instruction queue that persists across time steps. By doing this, you can get the ants to execute PROGN functions one instruction per time step. For instance, the high-level logic for the time step of an ant can be:
Do-Time-Step(ant):
    if ant.Q is empty:                 // put the next instruction(s) into the queue
        instructions <- ant.Get-Next-Instructions()
        for instruction in instructions:
            ant.Q.enqueue(instruction)
        end for
    end if
    instruction <- ant.Q.dequeue()     // get the next instruction in the queue
    ant.execute(instruction)           // have that ant do its job
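As a concrete illustration, here is a minimal C++ sketch of that queue idea; the Ant and Instruction types, the method names, and the stub bodies are mine (purely illustrative), and the stubs would be replaced by the real GP interpreter and world update:

#include <queue>
#include <vector>

enum class Instruction { MoveOneStep, TurnLeft, TurnRight };

struct Ant {
    std::queue<Instruction> pending;   // persists across time steps

    // Stub: a real implementation would run the evolved program just far
    // enough to produce its next batch of actions (e.g. one PROGN's worth).
    std::vector<Instruction> getNextInstructions() {
        return { Instruction::MoveOneStep };
    }

    // Stub: apply a single action to this ant and the shared environment.
    void execute(Instruction /*inst*/) {}
};

// One time step for one ant: refill the queue only when it is empty,
// then dequeue and execute exactly one instruction.
void doTimeStep(Ant& ant) {
    if (ant.pending.empty()) {
        for (Instruction inst : ant.getNextInstructions())
            ant.pending.push(inst);
    }
    if (ant.pending.empty()) return;   // evolved program produced no action this step
    Instruction inst = ant.pending.front();
    ant.pending.pop();
    ant.execute(inst);
}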
Another, similar approach to queuing instructions would be to preprocess the set of instructions and expand instances of PROGN into their component instructions. This would have to be done recursively if you allow PROGNs to invoke other PROGNs. The downside is that the candidate programs get a bit bloated, but only at runtime. On the other hand, it is quick, simple, and easy to debug.
Example:
Say PROGN1 = {inst-p1 inst-p2}
Then the candidate program would start off as {inst1 PROGN1 inst2} and would be expanded to {inst1 inst-p1 inst-p2 inst2} when it was ready to be evaluated in simulation.
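A recursive expansion routine could look like this C++ sketch (the node representation is assumed for illustration, not taken from any particular GP system):

#include <vector>

// A program node is either a terminal action or a PROGN with children.
struct Node {
    bool isProgn = false;
    int action = 0;                 // e.g. MOVE-ONE-STEP, TURN-LEFT, ...
    std::vector<Node> children;     // the arguments of a PROGN
};

// Flatten a (sub)program into a linear list of terminal actions,
// recursing into nested PROGNs.
void flatten(const Node& node, std::vector<int>& out) {
    if (node.isProgn) {
        for (const Node& child : node.children)
            flatten(child, out);
    } else {
        out.push_back(node.action);
    }
}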
It all depends on your particular GP implementation.
In my GP kernel, programs are evaluated either repeatedly or in parallel, but always as a whole, i.e. the 'atomic' operation in this scenario is a single program evaluation.
So either each individual in the population is evaluated n times in a row before moving on to the next one, or all individuals are evaluated once and that whole pass is repeated n times.
I've had pretty nice results with virtual agents using this level of concurrency.
It is definitely possible to break it down even further; however, at that point you'll reduce the scalability of your algorithm:
While it is easy to distribute the evaluation of whole programs amongst several CPUs or cores, doing the same at per-node granularity is next to worthless because of the amount of synchronization required between all programs.
Given the rapidly increasing number of CPUs/cores in modern systems (even smartphones) and the 'CPU-hunger' of GP you might want to rethink your approach - do you really want to include move/turn instructions in your programs?
Why not redesign it to use primitives that store away direction and speed parameters in some registers/variables during program evaluation?
The simulation step then takes these parameters to actually move/turn your agents based on the instructions stored away by the programs.
evaluate programs (in parallel)
execute simulation
repeat the two steps above n times
evaluate fitness, selection, ...
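A rough C++ sketch of that register idea (all names are illustrative; the point is only that program evaluation writes intentions into per-agent variables and the simulation step applies them afterwards):

#include <cmath>
#include <vector>

struct Agent {
    double x = 0.0, y = 0.0;     // position
    double heading = 0.0;        // radians
    // "registers" written by the evolved program, read by the simulation:
    double desiredTurn = 0.0;    // radians to turn this step
    double desiredSpeed = 0.0;   // distance to move this step
};

// Placeholder for running one evolved program on one agent; a real GP kernel
// would interpret the program tree here and only set the registers.
void evaluateProgram(Agent& a) {
    a.desiredTurn = 0.1;
    a.desiredSpeed = 1.0;
}

// The simulation step consumes the registers for every agent at once,
// so no agent acts "in the middle" of another agent's turn.
void simulationStep(std::vector<Agent>& agents) {
    for (Agent& a : agents) {
        a.heading += a.desiredTurn;
        a.x += a.desiredSpeed * std::cos(a.heading);
        a.y += a.desiredSpeed * std::sin(a.heading);
    }
}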
Cheers,
Jay
Let's say we have two integers a and b. Which way is faster for swapping their values?
c=a;
a=b;
b=c;//(edited typo)
or
a=a+b;
b=a-b;
a=a-b;
or bitwise xor
a=a^b;
b=a^b;
a=a^b;
I'll test the performance differences myself when I'm able to, but I'd like to know now. Is it the bitwise one?
Firstly, you cannot quantify the speed of an algorithm independently of the programming language, the compiler and the platform on which it is run. An algorithm is a mathematical abstraction.
Having said that:
for a typical programming language,
and a typical compiler, and
a typical execution platform,
the first version will typically be faster because it will typically compile to fewer native instructions that take fewer clock cycles to execute. The first version only requires load and save operations. The other two versions have (at least) the same number of loads and saves, plus some additional arithmetic or bit-manipulation instructions.
However, even that is not cut-and-dried.
The 2nd and 3rd examples are performing the swap without using a temporary variable. This is something you might do if using an extra temporary variable was expensive. This might happen on a machine which didn't provide enough general purpose registers, and the relative cost of loading / saving to memory was large. In some circumstances, the native code equivalents could be optimal.
However ... and this is the real point ... the best strategy is to leave this kind of decision to the compiler. Unless you are prepared to put a huge amount of effort into micro-optimizing, the compiler is likely to do a better job than you can. Indeed, writing code in "cunning ways" is liable to make it harder for the compiler to optimize. (In the 3rd case, for example, the compiler would need to figure out that the sequence is actually swapping two variables before it could substitute the optimal instruction sequence. Chances are that the optimizer won't be able to do that.)
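If you want to see what "leave it to the compiler" means in practice, one quick experiment (mine, not the answerer's) is to put the variants into small functions and compare the assembly your compiler emits, e.g. with g++ -O2 -S swap.cpp:

#include <utility>

void swap_temp(int& a, int& b)  { int c = a; a = b; b = c; }

// Note: the arithmetic version can overflow (undefined behaviour for signed
// int), and the XOR version silently loses both values if a and b refer to
// the same object.
void swap_arith(int& a, int& b) { a = a + b; b = a - b; a = a - b; }
void swap_xor(int& a, int& b)   { a = a ^ b; b = a ^ b; a = a ^ b; }

void swap_std(int& a, int& b)   { std::swap(a, b); }   // what idiomatic code would use

At -O2 a modern compiler will typically reduce the temporary-variable and std::swap versions to a couple of register moves, or eliminate the swap entirely when the surrounding code allows it, which is exactly the answerer's point.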
My goal is to implement a control algorithm written in Grafcet on a PLC. I am struggling with the mismatch between Grafcet as a multi-process, synchronous language and the single-core, sequential PLC. Below is an example. What is the outcome of the Grafcet in the first cycle after the upper transition has fired: (a=1, x=1) or (a=1, x=0)?
I know that in SFC, it depends on the implementation of the engineering tool (e.g. Codesys, Multiprog) how actions are evaluated, typically from left to right. So for an SFC, (a=1,x=1) would be the answer. But since everything happens at the same time in Grafcet, I do not know how to handle this case.
Bonus points if someone can point out how I can learn more about the challenges of implementing languages like Grafcet on sequential machines.
Not all Grafcet variants include conditional actions, but when they do, the behaviour is as follows: as long as the step is active, turn on x while a is on.
If that is what you meant (even though you may never see a conditional action written exactly the way you did), x will be turned on within an infinitesimally short time after the two simultaneous steps are activated, at least according to my understanding of the Grafcet evolution rules. So the fact that the initial value of x is unpredictable, assuming the two concurrent steps are activated at the very same time, should actually be no problem.
Moreover, as soon as the Grafcet is "implemented" in the real world (i.e. on your single-core PLC), whether it is compiled directly by the engineering tool or converted into a ladder diagram, an order of evaluation is necessarily chosen, as you said, and everything becomes deterministic. So your question is not a real problem when it comes to "implementing languages like Grafcet on sequential machines". You may find the real challenges by studying the canonical procedure for converting SFCs to ladder logic (detailed documentation is easily found on the web).
PLCs are single-core, as you said, so two steps are never executed at the same moment in time.
You have a simultaneous branch there, so both steps WILL execute, but clearly one after the other. By default it is always the one on the left. Note that some PLCs allow you to change the order (I have never tried it for simultaneous branches, but for divergent branches it is certainly possible, e.g. in RSLogix5000).
A simultaneous branch is like an AND: you are telling the processor to execute the first step AND the second step. If you are familiar with ladder logic, I am sure this will be clear to you.
In the end, it should be a=1;x=1.
Also note that for steps that are not simultaneous, there is a one-scan delay before the next transition is evaluated, which is a great thing. This is the most commonly omitted detail when implementing an SFC in ladder (and it can lead to problems that are impossible to troubleshoot if you are not aware of it). I've seen this "bug" in about half of the hundreds of ladder implementations I've come across so far. Example: if you have 10 consecutive transitions that are all true, you go from step 1 to step 10 in a single scan. Good luck troubleshooting why the motor didn't start. :)
Tip: you can always use dummy steps in simultaneous branches to add a one-scan delay. So if you want the other outcome (a=1, x=0), put a dummy step before the left step.
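Not a PLC language, but here is a toy C++ sketch of a scan loop that illustrates the one-step-per-scan behaviour described above (the structure is purely illustrative, not what any engineering tool generates):

#include <cstdio>
#include <vector>

struct Step {
    bool active = false;
    bool (*transition)() = nullptr;   // condition leading to the next step
};

bool alwaysTrue() { return true; }    // imagine 10 consecutive transitions that are all true

int main() {
    std::vector<Step> steps(10, Step{false, alwaysTrue});
    steps[0].active = true;

    for (int scan = 0; scan < 3; ++scan) {               // a few PLC scan cycles
        for (std::size_t i = 0; i + 1 < steps.size(); ++i) {
            if (steps[i].active && steps[i].transition()) {
                steps[i].active = false;
                steps[i + 1].active = true;
                break;   // at most one advance per scan; remove this break and
                         // the program races from step 1 to step 10 in one scan
            }
        }
        for (std::size_t i = 0; i < steps.size(); ++i)
            if (steps[i].active)
                std::printf("after scan %d: step %zu is active\n", scan + 1, i + 1);
    }
}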
My question is this:
Given a single line of isolated code (i.e., it does not parse anything, read from or write to a file on the HDD, call a subroutine, etc.), is it possible to predict or measure, with any degree of accuracy and/or consistency, how long the code will take to execute on a given system?
For example, say I have the following code (not in any specific language, just a generalization):
1 | If x = 1 then
2 | x = x + 1
3 | End If
how long would it take to execute line 2?
I am not looking for numbers here; I am just wondering if such a thing is possible or practical.
Thanks!
Update:
Now I am looking for some numbers... If I were to execute a simple for loop that just sleeps one second per iteration, how far off would it be after 60 (actual) minutes? In other words, can the time taken to evaluate a line of isolated code be considered negligible (assuming no errors or interrupts)?
If you look at documentation such as this you will find how many clock cycles the fundamental operations take on a CPU, and how long those clock cycles are. Translating those numbers into the time taken to perform x = x+1 in your program is an imprecise science, but you'll get some kind of clue by examining the assembly code listing that your compiler produces.
The science becomes less and less precise as you move from simple arithmetic statements towards large programs and you start hitting all sorts of issues arising from the complexity of modern CPUs and modern operating systems and memory hierarchies and all the rest.
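For example (my example, not from the linked documentation), you can get such a listing for roughly the statement in question like this:

// increment.cpp -- roughly "x = x + 1" from the question.
// Compile with, e.g., g++ -O2 -S increment.cpp and read increment.s to see
// which native instructions the compiler actually emits (often a single
// add, or nothing at all if the result is never used).
int increment(int x) {
    return x + 1;
}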
The simplest way would be to run it a billion times, see how much time it took, and then divide by one billion. This should average away the problems with clock resolution.
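A sketch of that repeat-and-divide approach in C++ (the iteration count and the measured statement are placeholders; note the volatile, without which an optimizing compiler may delete the loop entirely):

#include <chrono>
#include <cstdio>

int main() {
    const long long n = 1'000'000'000LL;
    volatile int x = 0;                          // keep the work observable

    auto start = std::chrono::steady_clock::now();
    for (long long i = 0; i < n; ++i)
        x = x + 1;                               // the "line" being measured
    auto stop = std::chrono::steady_clock::now();

    std::chrono::duration<double, std::nano> elapsed = stop - start;
    std::printf("%.3f ns per iteration\n", elapsed.count() / n);
}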
Sounds like you might be interested in learning about Big O notation. Using Big O, you can compare the efficiency of algorithms independently of the speed of the machine on which they run.
Most computers will usually have some processes, services or other "jobs" running in the background and consuming resources at different times for various reasons, which will tend to make it hard to predict exact execution times. However, if you could break the above code down into assembly language and count the instructions, registers and clock cycles used at the machine level, I would assume it would be possible to get an accurate estimate.
The answer is a resounding no. For example consider what would happen if an interrupt was raised during the addition: the next instruction would be executed after the interrupt has been serviced, therefore giving the impression that the addition took an awful lot of time.
If we're talking about an ISR-disabled condition, then the answer is "it depends". For example, a branch misprediction on the if would probably slow down the execution a lot. The same could be said of CPUs with simultaneous multithreading (some execution units are shared between hardware threads, so activity on one logical core can affect the performance of operations on another).
If we're talking about an ISR-disabled condition on a CPU with a single core and a single-stage pipeline, then I guess you should pretty much be able to predict how long it takes. The problem is, you'd probably be working on a microcontroller (and an old one, at that). :)
update: on second thought, even in the last case you probably wouldn't be able to predict it exactly: some operations are data-dependent (the canonical example being floating-point operations on denormal numbers).
One way to figure it out is to write that line of code 1000 times, time how long it takes to run, and then divide the time by 1000. To get a more accurate figure, you can run that test 10 times and take the average of the results.
As is so common with Fortran, I'm writing a massively parallel scientific code. At the beginning of my code I read my configuration file, which tells me which type of solver I want to use. That means that in a subroutine (during the main run) I have
if(solver.eq.1)then
  call solver1()
elseif(solver.eq.2)then
  call solver2()
else
  call solver3()
endif
Edit to avoid some confusion: This if is inside my time integration loop and I have one that is inside 3 nested loops.
Now my question is: wouldn't it be more efficient to use function pointers instead, since the solver variable will not change during execution except in the initialisation procedure?
Obviously function pointers are an F2003 feature. That shouldn't be a problem as long as I use gfortran 4.6. But I'm mainly using a BlueGene/P; there is an F2003 compiler for it, so I suppose it's going to work there as well, although I couldn't find any conclusive evidence on the web.
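For what it's worth, here is the pattern in C++ rather than Fortran, just to make the question concrete (in Fortran a procedure pointer would play the role of the function pointer, and the solver bodies here are placeholders):

// Pick the solver once at initialisation and store a function pointer, so the
// time-integration loop contains an indirect call instead of an if-chain.
void solver1() { /* ... */ }
void solver2() { /* ... */ }
void solver3() { /* ... */ }

using SolverFn = void (*)();

SolverFn pickSolver(int solver) {        // done once, after reading the config file
    if (solver == 1) return &solver1;
    if (solver == 2) return &solver2;
    return &solver3;
}

void timeIntegration(int nSteps, SolverFn solve) {
    for (int step = 0; step < nSteps; ++step)
        solve();                         // indirect call replaces the branch
}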
Knowing nothing about Fortran, this is my answer: the main problem with a branch is that the CPU potentially cannot speculatively execute code across it. To mitigate this problem, branch prediction was introduced (and it is very sophisticated in modern CPUs).
Indirect calls through a function pointer can be a problem for the prediction unit of the CPU. If it can't predict where the call will actually go, this will stall the pipeline.
I am quite sure that the CPU will correctly predict your branch, because a branch that always goes the same way is a trivial case for the predictor.
Maybe the CPU can speculate across the indirect call, maybe it can't. This is why you need to test which is better.
If it cannot, you will certainly notice in your benchmark.
In addition, maybe you can hoist the if test out of your inner loop so it won't be called often. This will make the actual performance of the branch irrelevant.
If you only plan to set the function pointers once, at initialisation, and you are running codes on a BlueGene, isn't your concern for efficiency misdirected? Generally, any initialisation which works is OK; if it takes 1 sec instead of 1 msec it's probably going to have zero impact on total execution time.
Code initialisation routines for clarity, ease of modification, that sort of thing.
EDIT
My guess is that using function pointers rather than your current code will have no impact on execution speed. But it's just a guess (an educated one, perhaps) and I'll be very interested in any data you gather on this question.
If your solver routines take a non-trivial time to run, then the trivial runtime of the IF statements is likely to be immaterial. If the solver routines have a runtime comparable to that of the IF statement, then the total runtime is very short, so why do you care? This seems an optimization unlikely to pay off.
The first rule of runtime optimization is to profile your code and see which portions are consuming the runtime. Otherwise you are likely to optimize portions that are unimportant, which will accomplish nothing.
For what it's worth, someone else recently had a very similar concern: Fortran Subroutine Pointers for Mismatching Array Dimensions
After a brief search I couldn't find the answer to the question, so I ran a little benchmark myself (see this link for the Makefile & dependencies). The benchmark consists of:
Draw a random number to select method a, b, or c, all of which perform a simple addition to their single integer argument
Call the chosen method 100 million times, using either a procedure pointer or if-statements
Repeat the above 5 times
The result with gfortran 4.8.5 on an Intel Xeon E5-2630 v3 @ 2.40 GHz is:
Time per call (proc. pointer): 1.89 ns
Time per call (if statement): 1.89 ns
In other words, there is not much of a performance difference!
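The original benchmark code is behind the link above; purely as an illustration of the measurement idea, a C++ analogue could look like this (selecting the target at run time helps keep the compiler from turning the indirect call into a direct one):

#include <chrono>
#include <cstdio>

int add1(int x) { return x + 1; }
int add2(int x) { return x + 2; }
int add3(int x) { return x + 3; }

int main(int argc, char**) {
    using Fn = int (*)(int);
    Fn fns[3] = {add1, add2, add3};
    const long long n = 100'000'000LL;
    int which = argc % 3;                    // chosen at run time, unknown to the compiler
    Fn f = fns[which];

    volatile int acc = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (long long i = 0; i < n; ++i)
        acc = f(acc);                        // indirect call
    auto t1 = std::chrono::steady_clock::now();
    for (long long i = 0; i < n; ++i) {      // equivalent if-chain
        if (which == 0)      acc = add1(acc);
        else if (which == 1) acc = add2(acc);
        else                 acc = add3(acc);
    }
    auto t2 = std::chrono::steady_clock::now();

    std::chrono::duration<double, std::nano> d1 = t1 - t0, d2 = t2 - t1;
    std::printf("pointer: %.2f ns/call, if-chain: %.2f ns/call\n",
                d1.count() / n, d2.count() / n);
}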
I will start with the question and then proceed to explain the need:
Given a single C++ source code file which compiles well in modern g++ and uses nothing more than the standard library, can I set up a single-task operating system and run it there?
EDIT: Following the comment by Chris Lively, I should rather have asked: what's the easiest way you can suggest to tweak Linux into effectively giving me single-tasking behavior?
Nevertheless, it seems like I did get a good answer although I did not phrase my question well enough. See the second paragraph in sarnold's answer regarding the scheduler.
Motivation:
In some programming competitions the communication between a contestant's program and the grading program involves a huge number of very short interactions.
Thus, using getrusage to measure the time spent by a contestant's program is inaccurate because getrusage works by sampling a process at constant intervals (usually around once per 10ms) which are too large compared to the duration of each interaction.
Another approach to timing would be to measure the time before and after the program is run using something like clock_gettime and then subtract the two values. We should also subtract the amount of time spent on I/O. This can be done by intercepting printf and scanf using something like LD_PRELOAD and accumulating the time spent in each of these functions by checking the time just before and just after each call to printf/scanf (it's OK to require contestants to use these functions only for I/O).
The method proposed in the last paragraph is of course only valid assuming that the contestant's program is the only program running, which is why I want a single-tasking OS.
To run both the contestant's program and the grading program at the same time, I would need a mechanism which, when one of these programs tries to read input and blocks, runs the other program until it writes enough output. I still consider this to be single-tasking because the programs are not going to run at the same time; the "context switch" would occur only when it is needed.
Note: I am aware of the fact that there are additional issues to timing such as CPU power management, but I would like to start by addressing the issue of context switches and multitasking.
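To make the printf interception idea concrete, here is a minimal sketch of such an LD_PRELOAD shim (Linux/glibc assumed; scanf would be wrapped the same way, and note that an optimizing compiler may turn some printf calls into puts, which this does not catch):

// io_shim.cpp -- build e.g. with:
//   g++ -shared -fPIC -o io_shim.so io_shim.cpp
//   LD_PRELOAD=./io_shim.so ./contestant_program
// The accumulated I/O time is reported at exit and could be subtracted from
// the wall-clock measurement.
#include <cstdarg>
#include <cstdio>
#include <ctime>

static long long g_io_ns = 0;   // total time spent inside printf

extern "C" int printf(const char* fmt, ...) {
    timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    va_list args;
    va_start(args, fmt);
    int ret = vprintf(fmt, args);       // forward to the real formatted output
    va_end(args);

    clock_gettime(CLOCK_MONOTONIC, &t1);
    g_io_ns += (t1.tv_sec - t0.tv_sec) * 1000000000LL + (t1.tv_nsec - t0.tv_nsec);
    return ret;
}

// Report the accumulated I/O time when the process exits.
__attribute__((destructor)) static void report_io_time() {
    std::fprintf(stderr, "time spent in printf: %lld ns\n", g_io_ns);
}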
First things first: what I think would suit your needs best would actually be a language interpreter, one that you can instrument to keep track of the "execution time" of the program in some purpose-made units like "mems" to represent memory accesses or "cycles" to represent the speeds of different instructions. Knuth's MMIX from The Art of Computer Programming may provide exactly that functionality, though Python, Ruby, Java and Erlang are all reasonable enough interpreters that could provide some instruction count / cost / memory-access accounting should you do enough re-writing. (But you'd lose C++, obviously.)
Another approach that might work well for you -- if used with extreme care -- is to run your programming problems in the SCHED_FIFO or SCHED_RR real-time processing class. Programs run in one of these real-time priority classes will not yield for other processes on the system, allowing them to dominate all other tasks. (Be sure to run your sshd(8) and sh(1) in a higher real-time class to allow you to kill runaway tasks.)
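As a sketch of that second approach, a small launcher could place the contestant's program into SCHED_FIFO before exec'ing it (Linux-specific; the priority value is an arbitrary illustration, and root or CAP_SYS_NICE is required):

#include <sched.h>
#include <unistd.h>
#include <cstdio>

int main(int argc, char** argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
        return 1;
    }
    sched_param param{};
    param.sched_priority = 50;                       // 1..99 for SCHED_FIFO
    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0) {
        std::perror("sched_setscheduler");
        return 1;
    }
    execvp(argv[1], argv + 1);                       // scheduling class is inherited across exec
    std::perror("execvp");
    return 1;
}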