At what compilation stage does list scheduling take place? - compilation

I am learning compilation techniques, and I have read Chapter 10 of the Dragon Book on instruction scheduling. What confuses me is that the generated code is written to a file line by line, so when is the list scheduling algorithm carried out? To put it more precisely: how do we make the hardware know that we have scheduled the instructions, and ask it to follow our schedule? I find this hard to understand.
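A toy example may make the mechanics concrete (this is a sketch; the instructions, latencies, and dependences are all invented): list scheduling runs inside the compiler back end, and its only output is a reordering of the instruction sequence, which is then written to the file as usual. The hardware is never told about the schedule; it simply executes the instructions in the order they were emitted.

```c
#include <stdio.h>

/* Toy greedy list scheduler over a 4-instruction dependence graph. */
#define N 4

static const char *name[N]  = {"load r1", "load r2", "add r3,r1,r2", "store r3"};
static const int latency[N] = {3, 3, 1, 1};
/* dep[i][j] != 0 means instruction j depends on instruction i. */
static const int dep[N][N] = {
    {0, 0, 1, 0},   /* add needs load r1 */
    {0, 0, 1, 0},   /* add needs load r2 */
    {0, 0, 0, 1},   /* store needs add   */
    {0, 0, 0, 0},
};

/* Fills order[] with the chosen emission order; returns total cycles. */
int list_schedule(int order[N]) {
    int done[N] = {0}, finish[N] = {0}, cycle = 0;
    for (int k = 0; k < N; k++) {
        /* Pick the ready instruction whose operands finish earliest. */
        int best = -1, best_ready = 0;
        for (int i = 0; i < N; i++) {
            if (done[i]) continue;
            int ready = 0, ok = 1;
            for (int j = 0; j < N; j++)
                if (dep[j][i]) {
                    if (!done[j]) { ok = 0; break; }
                    if (finish[j] > ready) ready = finish[j];
                }
            if (ok && (best < 0 || ready < best_ready)) {
                best = i;
                best_ready = ready;
            }
        }
        if (best_ready > cycle) cycle = best_ready; /* stall until operands ready */
        finish[best] = cycle + latency[best];
        cycle++;                                    /* one issue per cycle */
        done[best] = 1;
        order[k] = best;
    }
    int total = 0;
    for (int i = 0; i < N; i++)
        if (finish[i] > total) total = finish[i];
    return total;
}
```

Here the scheduler emits `load r1`, `load r2`, `add`, `store`, overlapping the two load latencies; the result is just a permutation of the instructions, and even an in-order CPU benefits purely from that emission order.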

Related

Computer programs from the point of view of CPU

This might sound a bit naive, but I'm unable to find an appropriate answer to the question in my mind.
Let's say there is an algorithm X, implemented in 10 different programming languages. After the bootstrap stage, each program executes the algorithm over and over again. My question: would there be any difference at the hardware level when they all execute on the same CPU?
What I understand is that the set of hardware resources (registers, etc.) on a CPU is limited. Hence, executing the core algorithm should follow a similar (if not identical) pattern through the fetch–decode–execute cycle.

Suggest an OpenMP program with noticeable speedup, and its most important concepts, for a talk

I am going to give a lecture on OpenMP and I want to write a program using OpenMP live. What program do you suggest that covers the most important concepts of OpenMP and has a noticeable speedup? I want a great example program; please help, those of you who are experts in OpenMP.
I am looking for a technical and interesting example with nice output.
I want to write two programs live: the first for a clear illustration of the most important OpenMP concepts, with an impressive speedup, and the second as a hands-on exercise that everyone writes along with me.
My audience may be quite inexperienced.
Personally I wouldn't say that the most impressive aspect of OpenMP is the scalability of the codes you can write with it. I'd say that a more impressive aspect is the ease with which one can take an existing serial program and, with only a few OpenMP directives, turn it into a parallel program with satisfactory scalability.
So I'd suggest that you take any program (or part of any program) of interest to your audience, better yet a program your audience is familiar with, and parallelise it right there and then in your lecture, live, as you put it. I'd be impressed if a lecturer could show me, say, a 4-times speedup on 8 cores with 5 minutes of coding and a re-compilation. And that leads on to all sorts of interesting topics about why you don't (always, easily) get an 8-times speedup on 8 cores.
Of course, like all stage illusionists, you'll have to choose your example carefully and rehearse to ensure that you do get an impressive-enough speedup to support your argument.
Personally I'd be embarrassed to use an embarrassingly parallel program for such a demo; the more perceptive members of the audience might be provoked into a response such as meh.
(1) Matrix multiply
Perhaps it's the simplest interesting example (matrix addition would be simpler still).
(2) Mandelbrot
http://en.wikipedia.org/wiki/Mandelbrot_set
Mandelbrot is also embarrassingly parallel, and OpenMP can achieve decent speedups. You can even use graphics to visualize it. Mandelbrot is also an interesting example because it has workload imbalance. You may see different speedups based on scheduling policies (e.g., schedule(dynamic,1) vs. schedule(static)), and different threading libraries (e.g., Cilk Plus or TBB).
(3) A couple of mathematical kernels
For example, the FFT (in its non-recursive form) is also embarrassingly parallel.
Take a look at "OmpSCR" benchmarks: http://sourceforge.net/projects/ompscr/ This suite has simple OpenMP examples.
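As a concrete starting point for such a demo, here is a minimal sketch in C of the classic pi-by-quadrature kernel (the function name and step count are illustrative): a single `parallel for` pragma with a `reduction` clause parallelises the serial loop, which is exactly the "few directives, noticeable speedup" story described above.

```c
/* Pi by midpoint-rule quadrature. With OpenMP enabled, the loop
 * iterations are split across threads and the partial sums are
 * combined by the reduction clause; without OpenMP, the pragma is
 * ignored and the same code runs serially. */
double compute_pi(long steps) {
    double step = 1.0 / (double)steps, sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < steps; i++) {
        double x = (i + 0.5) * step;       /* midpoint of slice i */
        sum += 4.0 / (1.0 + x * x);        /* integrand of arctan */
    }
    return sum * step;
}
```

Compiled with `-fopenmp` (gcc) the loop runs across all cores; compiled without the flag it runs serially, which makes a nice before/after comparison on stage. For the Mandelbrot variant, adding `schedule(dynamic)` to the pragma is the natural follow-up, since its iterations have very uneven cost.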

Implementing a lottery scheduling algorithm in the Linux kernel

I have to implement a lottery scheduling algorithm for the kernel. This question on SO is quite helpful, but it targets an older version of the kernel. Moreover, I would like to know where the other scheduling algorithms are implemented, i.e., where I can find the source code for, say, round-robin scheduling, in the kernel tree.
Any help appreciated.
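In current kernels the existing scheduling classes live under `kernel/sched/` (CFS in `fair.c`, round-robin and FIFO in `rt.c`; in older 2.6-era kernels it was the single file `kernel/sched.c`), so that is where to read for comparison. Before touching the kernel, the core selection step is easy to prototype in user space. This is a sketch with an invented task table, using a plain modulus where the kernel would use its own random source:

```c
/* Hypothetical task table: each task holds some number of tickets. */
struct task {
    const char *comm;  /* task name, for illustration only */
    int tickets;       /* more tickets = higher chance of winning */
};

/* Lottery pick: draw a ticket uniformly at random, then walk the
 * task list until the running ticket count passes the draw. */
int pick_winner(const struct task *tasks, int n, unsigned seed) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += tasks[i].tickets;
    int draw = (int)(seed % (unsigned)total);  /* stand-in for an RNG */
    int count = 0;
    for (int i = 0; i < n; i++) {
        count += tasks[i].tickets;
        if (draw < count)
            return i;
    }
    return n - 1; /* unreachable when tickets are all positive */
}
```

Turning this into a real scheduling class means wiring a `pick_next_task` implementation into a `struct sched_class`, which is exactly the pattern `fair.c` and `rt.c` follow in the kernel tree.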

Program to measure small changes in reaction-time

I need some advice on writing a program that will be used as part of a psychology experiment. The program will track small changes in reaction time. The experimental subject will be asked to solve a series of very simple math problems (such as "2x4=" or "3+5="). The answer is always a single digit. The program will determine the time between the presentation of the problem and the keystroke that answers it. (Typical reaction times are on the order of 200-300 milliseconds.)
I'm not a professional programmer, but about twenty years ago, I took some courses in PL/I, Pascal, BASIC, and APL. Before I invest the time in writing the program, I'd like to know whether I can get away with using a programming package that runs under Windows 7 (this would be the easiest approach for me), or whether I should be looking at a real-time operating system. I've encountered conflicting opinions on this matter, and I was hoping to get some expert consensus.
I'm not relishing the thought of installing some sort of open-source Linux distribution that has real-time capabilities -- but if that's what it takes to get reliable data, then so be it.
Affect seems like it could save you the programming: http://ppw.kuleuven.be/leerpsy/affect4/index2.php. Concerning accuracy on a Windows machine, read this.
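Whichever package you choose, the measurement core is small. Here is a sketch using the POSIX monotonic clock (on Windows the analogous primitive is `QueryPerformanceCounter`); the clock itself resolves well below a millisecond, so in practice the accuracy limit is keyboard and display latency, not the timer:

```c
#include <time.h>

/* Difference between two monotonic-clock readings, in milliseconds.
 * Typical use: read the clock when the problem is displayed, again
 * when the answer key is pressed, and record elapsed_ms(t0, t1). */
double elapsed_ms(struct timespec start, struct timespec end) {
    return (end.tv_sec - start.tv_sec) * 1000.0
         + (end.tv_nsec - start.tv_nsec) / 1e6;
}
```

The monotonic clock is preferable to wall-clock time here because it cannot jump backwards if the system clock is adjusted mid-experiment.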

Single-Tasking for programming competitions

I will start with the question and then proceed to explain the need:
Given a single C++ source code file which compiles well in modern g++ and uses nothing more than the standard library, can I set up a single-task operating system and run it there?
EDIT: Following the comment by Chris Lively, I should have asked instead: what's the easiest way you can suggest to tweak Linux into effectively giving me single-tasking behaviour?
Nevertheless, it seems like I did get a good answer although I did not phrase my question well enough. See the second paragraph in sarnold's answer regarding the scheduler.
Motivation:
In some programming competitions the communication between a contestant's program and the grading program involves a huge number of very short interactions.
Thus, using getrusage to measure the time spent by a contestant's program is inaccurate because getrusage works by sampling a process at constant intervals (usually around once per 10ms) which are too large compared to the duration of each interaction.
Another approach to timing would be to measure the time before and after the program runs, using something like clock_gettime, and subtract the two values. We should also subtract the time spent on I/O, which can be done by intercepting printf and scanf with something like LD_PRELOAD and accumulating the time spent in each of these functions, checking the clock just before and just after each call to printf/scanf (it's OK to require contestants to use only these functions for I/O).
The method proposed in the last paragraph is of course only valid if the contestant's program is the only program running, which is why I want a single-tasking OS.
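The interception step can be sketched as follows (a sketch only: the accessor `io_time_spent` and the accounting variable are invented, and a real shim would wrap `scanf` the same way via `vscanf`). The trick is that `printf` can be redefined and forwarded to libc's `vprintf`, timing each call:

```c
#define _GNU_SOURCE
#include <stdarg.h>
#include <stdio.h>
#include <time.h>

static double io_seconds; /* total time spent inside printf */

double io_time_spent(void) { return io_seconds; }

/* Our printf: time the call and forward the real work to vprintf. */
int printf(const char *fmt, ...) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    va_list ap;
    va_start(ap, fmt);
    int ret = vprintf(fmt, ap); /* libc does the actual formatting/output */
    va_end(ap);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    io_seconds += (t1.tv_sec - t0.tv_sec)
                + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    return ret;
}
```

Built as a shared object (`gcc -shared -fPIC shim.c -o shim.so`) and injected with `LD_PRELOAD=./shim.so`, the shim's accumulated total can then be subtracted from the wall-clock measurement of the contestant's run.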
To run both the contestant's program and the grading program at the same time, I would need a mechanism which, when one of these programs tries to read input and blocks, runs the other program until it writes enough output. I still consider this single-tasking because the programs are never running at the same time; the "context switch" happens only when it is needed.
Note: I am aware of the fact that there are additional issues to timing such as CPU power management, but I would like to start by addressing the issue of context switches and multitasking.
First things first, what I think would suit your needs best would actually be a language interpreter -- one that you can instrument to keep track of "execution time" of the program in some purpose-made units like "mems" to represent memory accesses or "cycles" to represent the speeds of different instructions. Knuth's MMIX in his The Art of Computer Programming may provide exactly that functionality, though Python, Ruby, Java, Erlang, are all reasonable enough interpreters that can provide some instruction count / cost / memory access costs should you do enough re-writing. (But losing C++, obviously.)
Another approach that might work well for you -- if used with extreme care -- is to run your programming problems in the SCHED_FIFO or SCHED_RR real-time processing class. Programs run in one of these real-time priority classes will not yield for other processes on the system, allowing them to dominate all other tasks. (Be sure to run your sshd(8) and sh(1) in a higher real-time class to allow you to kill runaway tasks.)
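Switching into the real-time class is only a couple of lines. This is a sketch (priority 50 is an arbitrary mid-range choice); it needs root or CAP_SYS_NICE, and fails cleanly with EPERM otherwise:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Attempt to move the calling process into SCHED_FIFO at a mid-range
 * priority. Without sufficient privilege, sched_setscheduler fails
 * with EPERM, which we report rather than treat as fatal. */
int go_realtime(void) {
    struct sched_param sp = { .sched_priority = 50 };
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        fprintf(stderr, "SCHED_FIFO unavailable: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

After a successful call, the warning above applies literally: a compute-bound SCHED_FIFO task will starve every normal process on its CPU until it blocks or exits, so keep a higher-priority shell available to kill runaways.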
