Infinite loops are taught as evil. Is there ever a good use?
When you write one by accident, CPU usage spikes, and I imagine memory usage does too, especially if variables are being assigned inside the loop.
If there is a good use, how are those issues prevented?
Basically every operating system or server spins in an infinite loop.
To avoid memory issues, you normally wouldn't allocate memory inside the loop unless it can be freed later inside the same loop. For example, you would allocate memory for a request and free it once the request has been served.
To avoid CPU peaks, you wait for interrupts (in the case of an OS) or call a blocking function such as poll(), which waits for a new event, once per iteration.
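A minimal sketch of such a server loop, assuming listen_fd is an already prepared socket and handle_request() is a hypothetical handler: the loop blocks in poll() while idle, and any per-request allocation is freed within the same iteration.

#include <poll.h>
#include <stdlib.h>

void handle_request(int fd, char *buf);         /* hypothetical request handler */

void server_loop(int listen_fd)
{
    struct pollfd pfd = { .fd = listen_fd, .events = POLLIN };

    for (;;) {                                  /* the "infinite" loop */
        poll(&pfd, 1, -1);                      /* blocks; no CPU is used while idle */

        if (pfd.revents & POLLIN) {
            char *buf = malloc(4096);           /* allocated inside the loop ... */
            if (buf) {
                handle_request(listen_fd, buf);
                free(buf);                      /* ... and freed in the same iteration */
            }
        }
    }
}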
First of all, the word "infinite" in this phrase should be taken a bit more loosely. I am presuming you are talking about a while (true) loop with a break instruction, which will eventually end, as opposed to a loop which will run until the end of time and all humanity.
In the former sense, yes, there are use cases where it's appropriate:
Games use infinite game loops.
Embedded programs use infinite main loops.
Windows applications use infinite message loops.
One example where they might be used inappropriately is when they are used to create time delays by spinning the CPU, which is what novice programmers tend to do to avoid dealing with timer interrupts (or timer events, or other non-procedural constructs). However, when spinning the CPU is done to acquire a shared resource, then the "infinite loop" is also a perfectly valid implementation choice. Even the .NET CLR Monitor, for example, tries spinning for several hundred cycles before issuing a true wait on a kernel event handle and creating a more expensive thread switch.
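As an illustration only (not the CLR's actual implementation), a bounded spin that falls back to yielding might look like this in C11; the SPIN_LIMIT value and the use of sched_yield() are assumptions:

#include <stdatomic.h>
#include <sched.h>                         /* sched_yield(); Windows code would use SwitchToThread() */

#define SPIN_LIMIT 400                     /* "several hundred" iterations, an arbitrary choice */

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;

void acquire(void)
{
    int spins = 0;
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire)) {
        if (++spins < SPIN_LIMIT)
            continue;                      /* optimistic: the holder should release soon */
        sched_yield();                     /* give up the time slice instead of hard-spinning */
        spins = 0;
    }
}

void release(void)
{
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}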
In addition to programs that run on event loops (like the system processes that @Christoph mentions), some languages have a concept known as a generator, which allows and even encourages you to write an infinite loop. The trick is that the object only runs for a finite time when it "yields" (returns) some expression. After that, its state is "frozen" until it is needed again. For example, in Python you can have an object that alternates between LEFT and RIGHT:
def side():
    while True:
        yield "LEFT"
        yield "RIGHT"

a = side()
print(next(a))
print(next(a))
print(next(a))
which would print LEFT, RIGHT, LEFT. The side function looks like an infinite loop because of the while True: statement, but it only ever runs for a finite amount of time per call.
All the applications on your handset run in infinite event loops.
I tried searching the net but couldn't find a definitive answer to this. What is the name of the mechanism that avoids the situation where another worker function waits forever for an infinite loop to finish?
If we're talking about processes, Linux does not care whether there is an infinite loop or not. The keyword you are looking for is 'preemption', and it can be looked up in any respectable OS book.
If there is an infinite loop that never gives up the CPU in the kernel itself, that's a programming error; such loops are spotted by the 'watchdog'.
Linux allows each thread of execution to run only for a specified quantum, before it is subject to being swapped out for another. It does not matter whether the process is in an infinite loop, the kernel has the power here.
As an aside (and I assume this is still the case with current kernels), threads which relinquish the CPU before their quantum is fully used can be given temporary priority boosts by the kernel as a reward for behaving nicely.
I understand that, in an endless loop or elsewhere, you can call sleep(0) to let the OS perform a context switch and execute another thread (if there is one and it is ready to execute).
Now, I saw a bunch of code where people use sleep(1) instead of sleep(0).
Is this optimal?
Where can I find documentation about this?
If you're implementing something like 'check for the existence of a file, repeat until it exists, then continue', it's better to do a sleep(some_small_positive_number), so you don't use up 100% of CPU time.
Polling loops like this are almost always a sign of improper planning when used in a program, but they are often used in command-line scripts.
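A hedged sketch of that pattern in C, assuming a POSIX system; the one-second interval is an arbitrary choice:

#include <unistd.h>                 /* access(), sleep() */

/* Block until `path` exists, sleeping between checks so the loop
   doesn't pin a core at 100%. */
void wait_for_file(const char *path)
{
    while (access(path, F_OK) != 0)
        sleep(1);                   /* small positive sleep instead of a busy spin */
}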
99.9% of the time, such short loops are a symptom of poor design, an inadequate understanding of inter-thread communication, or just laziness, because polling seems easier.
Most while(true) loops in multithreaded code need no Sleep() calls at all because they block on some other call: I/O, or inter-thread synchronization objects.
In those cases where a loop does not block on anything, you still need no sleep() calls if the work being done is making real forward progress. Putting in a sleep() call just slows down real work. If the work has an undesirable impact on the system as a whole, lower the priority of the work threads instead of shoving in sleep() calls.
The evil is looping purely for the purpose of polling flags. This is done so often that sleep() itself is often regarded as intrinsically evil. It is not - it's the misuse of it that should stop.
There is not much, on a modern OS, that requires polling. File systems, for example, give notifications upon file creation, eliminating the need to check continually and removing the latency and CPU waste of sleep() loops.
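For instance, on Linux the inotify interface lets the loop block until the kernel reports a creation event; a rough sketch (error handling omitted, Windows would use FindFirstChangeNotification instead):

#include <unistd.h>
#include <sys/inotify.h>

/* Block until something is created in `dir`; no CPU is used while waiting. */
void wait_for_creation(const char *dir)
{
    char buf[4096];
    int fd = inotify_init();

    inotify_add_watch(fd, dir, IN_CREATE);
    read(fd, buf, sizeof buf);      /* blocks until an event arrives */

    close(fd);
}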
Forgive me if this is not actually a race condition; I'm not that familiar with the nomenclature.
The problem I'm having is that this code runs slower with OpenMP enabled. I think the loop should be plenty big enough (k=100,000), so I don't think overhead is the issue.
As I understand it, a race condition is occurring here because all the loops are trying to access the same v(i,j) values all the time, slowing down the code.
Would the best fix here be to create as many copies of the v() array as threads and have each thread access a different one?
I'm using intel compiler on 16 cores, and it runs just slightly slower than on a single core.
Thanks all!
!$OMP PARALLEL DO
Do 500, k=1,n
Do 10, i=-(b-1),b-1
Do 20, j=-(b-1),b-1
if (abs(i).le.l.and.abs(j).eq.d) then
cycle
endif
v(i,j)=.25*(v(i+1,j)+v(i-1,j)+v(i,j+1)+v(i,j-1))
if (k.eq.n-1) then
vtest(i,j,1)=v(i,j)
endif
if (k.eq.n) then
vtest(i,j,2)=v(i,j)
endif
20 continue
10 continue
500 continue
!$OMP END PARALLEL DO
You certainly have programmed a race condition, though I'm not sure that it is the cause of your program's failure to execute more quickly. This line
v(i,j)=.25*(v(i+1,j)+v(i-1,j)+v(i,j+1)+v(i,j-1))
which will be executed by all threads for the same (set of) values for i and j is where the racing happens. Given that your program does nothing to coordinate reads and writes to the elements of v your program is, in practice, not deterministic as there is no way to know the order in which updates to v are made.
You should have observed this non-determinism on inspecting the results of the program, and have noticed that changing the number of threads has an impact on the results too. Then again, with a long-running stencil operation over an array the results may have converged to the same (or similar enough) values.
OpenMP gives you the tools to coordinate access to variables but it doesn't automatically implement them; there is definitely nothing going on under the hood to prevent quasi-simultaneous reads from and writes to v. So the explanation for the lack of performance improvement lies elsewhere. It may be down to the impact of multiple threads on cache at some level in your system's memory hierarchy. A nice, cache-friendly, run over every element of an array in memory order for a serial program becomes a blizzard of (as far as the cache is concerned) random accesses to memory requiring access to RAM at every go.
It's possible that the explanation lies elsewhere. If the time to execute the OpenMP version is slightly longer than the time to execute a serial version I suspect that the program is not, in fact, being executed in parallel. Failure to compile properly is a common (here on SO) cause of that.
How to fix this?
Well, the usual pattern for OpenMP across an array is to parallelise on one of the array indices. The statements
!$omp parallel do
do i=-(b-1),b-1
....
end do
ensure that each thread gets a different set of values for i, which means that they write to different elements of v, removing (almost) the data race. As you've written the program, each thread gets a different set of values of k, but that's not used (much) in the inner loops.
In passing, testing
if (k==n-1) then
and
if (k==n) then
in every iteration looks like tying an anchor to your program; why not just
do k=1,n-2
and deal with the updates to vtest after the loop ends?
You could separate the !$omp parallel do like this
!$omp parallel
do k=1,n-2
!$omp do
do i=-(b-1),b-1
(and make the corresponding changes at the end of the parallel loop and region). Now all threads execute the entire contents of the parallel region, but each gets its own set of i values to use. I recommend that you add clauses to your directives to specify the accessibility (e.g. private or shared) of each variable; but this answer is getting a bit too long and I won't go into more detail on these, or on using a schedule clause.
Finally, of course, even with the changes I've suggested your program will be non-deterministic because this statement
v(i,j)=.25*(v(i+1,j)+v(i-1,j)+v(i,j+1)+v(i,j-1))
will read neighbouring elements from v which are updated (at a time you have no control over) by another thread. To sort that out ... got to go back to work.
I am trying to implement some custom lock-free structures. The structure operates similarly to a stack, so it has a take() and a free() method and operates on a pointer and an underlying array. Typically it uses optimistic concurrency: free() writes a dummy value to pointer+1, increments the pointer, and writes the real value to the new address; take() reads the value at the pointer in a spin/sleep style until it doesn't read the dummy value, and then decrements the pointer. In both operations, changes to the pointer are done with compare-and-swap, and if that fails, the whole operation starts again. The purpose of the dummy value is to ensure consistency, since the write operation can be preempted after the pointer is incremented.
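For reference, here is a literal C11 sketch of the scheme as I've described it (not a verified lock-free design; capacity, empty-stack and ABA handling are all omitted, and the names are made up):

#include <stdatomic.h>
#include <stdint.h>

#define CAPACITY 1024
#define DUMMY    UINTPTR_MAX                    /* sentinel written before the real value */

static _Atomic uintptr_t slots[CAPACITY];
static _Atomic long top = -1;                   /* index of the current top slot */

/* free(): write the dummy one past the top, CAS the pointer up,
   then publish the real value at the new address. */
void slot_free(uintptr_t value)
{
    for (;;) {
        long cur = atomic_load(&top);
        atomic_store(&slots[cur + 1], DUMMY);
        if (atomic_compare_exchange_weak(&top, &cur, cur + 1)) {
            atomic_store(&slots[cur + 1], value);
            return;
        }
        /* CAS failed: another thread moved the pointer, start the whole operation again */
    }
}

/* take(): spin until the slot no longer holds the dummy, then CAS the pointer down. */
uintptr_t slot_take(void)
{
    for (;;) {
        long cur = atomic_load(&top);
        uintptr_t v;
        while ((v = atomic_load(&slots[cur])) == DUMMY)
            ;                                   /* the pure spin I'd like to rely on */
        if (atomic_compare_exchange_weak(&top, &cur, cur - 1))
            return v;
    }
}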
This situation leads me to wonder whether it is possible to prevent preemption in that critical place by somehow determining how much time is left before the thread will be preempted by the scheduler in favour of another thread. I'm not worried about hardware interrupts. I'm trying to eliminate the possible sleep from my reading function so that I can rely on a pure spin.
Is this at all possible?
Are there other means of handling this situation?
EDIT: To clarify how this may be helpful: if the critical operation is interrupted, it will effectively be like taking out an exclusive lock, and all other threads will have to sleep before they can continue with their operations.
EDIT: I am not hellbent on having it solved like this; I am merely trying to see if it's possible. The probability of that operation being interrupted in that location for a very long time is extremely low, and if it does happen it will be OK for all the other operations to sleep so that it can complete.
Some regard this as premature optimization, but this is just my pet project. Regardless, that does not exclude research and science from attempting to improve techniques. Even though computer science has reasonably matured, and every new technology we use today is just an implementation of what was already known 40 years ago, we should not stop being creative in addressing even the smallest of concerns, like trying to make a reasonable set of operations atomic without too many performance implications.
Such information surely exists somewhere, but it is of no use to you.
Under "normal conditions", you can expect upwards of a dozen DPCs and upwards of 1,000 interrupts per second. These do not respect your time slices, they occur when they occur. Which means, on the average, you can expect 15-16 interrupts within a time slice.
Also, scheduling does not strictly go quantum by quantum. The scheduler under present Windows versions will normally let a thread run for 2 quantums, but may change its opinion in the middle if some external condition changes (for example, if an event object is signalled).
So even if you knew that you still have so-and-so many nanoseconds left, whatever you think you know might not be true at all.
Cannot be done without time travel. You're stuffed.
There are programs that are able to limit the CPU usage of processes in Windows, for example BES and ThreadMaster. I need to write my own program that does the same thing as these programs, but with different configuration capabilities. Does anybody know how the CPU throttling of a process is done (in code)? I'm not talking about setting the priority of a process, but rather about limiting its CPU usage to, for example, 15%, even if there are no other processes competing for CPU time.
Update: I need to be able to throttle processes that are already running and that I have no source code access to.
You probably want to run the process(es) in a job object, and set the maximum CPU usage for the job object with SetInformationJobObject, with JOBOBJECT_CPU_RATE_CONTROL_INFORMATION.
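A rough sketch of what that could look like (CPU rate control requires Windows 8 or later; hProcess is assumed to be a handle opened with sufficient rights, and error handling is minimal):

#include <windows.h>

/* Cap an already-running process at roughly 15% CPU via a job object. */
BOOL ThrottleTo15Percent(HANDLE hProcess)
{
    HANDLE hJob = CreateJobObject(NULL, NULL);
    if (!hJob)
        return FALSE;

    if (!AssignProcessToJobObject(hJob, hProcess))
        return FALSE;

    JOBOBJECT_CPU_RATE_CONTROL_INFORMATION info = {0};
    info.ControlFlags = JOB_OBJECT_CPU_RATE_CONTROL_ENABLE |
                        JOB_OBJECT_CPU_RATE_CONTROL_HARD_CAP;
    info.CpuRate = 1500;                        /* units of 1/100th of a percent: 1500 == 15% */

    return SetInformationJobObject(hJob, JobObjectCpuRateControlInformation,
                                   &info, sizeof(info));
}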
Very simplified, it could work somehow like this:
Create a periodic waitable timer with some reasonably small wait time (maybe 100 ms). Get a "last" value for each relevant process by calling GetProcessTimes once.
Loop forever, blocking on the timer.
Each time you wake up:
if GetProcessAffinityMask returns 0, call SetProcessAffinityMask(old_value). This means we suspended that process in our last iteration; we're now giving it a chance to run again.
else call GetProcessTimes to get the "current" value
call GetSystemTimeAsFileTime
calculate delta by subtracting last from current
cpu_usage = (deltaKernelTime + deltaUserTime) / (deltaTime)
if that's more than you want, save old_value from GetProcessAffinityMask and then call SetProcessAffinityMask(0), which will take the process offline.
This is basically a very primitive version of the scheduler that runs in the kernel, implemented in userland. It puts a process "to sleep" for a small amount of time if it has used more CPU time than you deem right. A more sophisticated measurement, maybe averaging over one second or five seconds, would be possible (and probably desirable).
You might be tempted to suspend all threads in the process instead. However, it is important not to fiddle with priorities and not to use SuspendThread unless you know exactly what a program is doing, as this can easily lead to deadlocks and other nasty side effects. Think for example of suspending a thread holding a critical section while another thread is still running and trying to acquire the same object. Or imagine your process gets swapped out in the middle of suspending a dozen threads, leaving half of them running and the other half dead.
Setting the affinity mask to zero on the other hand simply means that from now on no single thread in the process gets any more time slices on any processor. Resetting the affinity gives -- atomically, at the same time -- all threads the possibility to run again.
Unfortunately, SetProcessAffinityMask does not return the old mask the way SetThreadAffinityMask does, at least according to the documentation; therefore an extra Get... call is necessary.
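Pulling those steps together, a hedged, untested C sketch might look like the following; the 100 ms period and 15% target are assumptions, Sleep() stands in for the waitable timer, and a simple flag replaces the "affinity mask is 0" check described above:

#include <windows.h>

#define TARGET_USAGE 0.15                       /* throttle to roughly 15% */

static ULONGLONG TicksOf(FILETIME ft)           /* FILETIME -> 100-ns ticks */
{
    ULARGE_INTEGER u;
    u.LowPart  = ft.dwLowDateTime;
    u.HighPart = ft.dwHighDateTime;
    return u.QuadPart;
}

void ThrottleLoop(HANDLE hProcess)
{
    FILETIME ftCreate, ftExit, ftKernel, ftUser, ftNow;
    DWORD_PTR savedMask = 0, sysMask = 0;
    BOOL suspended = FALSE;

    GetProcessTimes(hProcess, &ftCreate, &ftExit, &ftKernel, &ftUser);
    GetSystemTimeAsFileTime(&ftNow);
    ULONGLONG lastCpu  = TicksOf(ftKernel) + TicksOf(ftUser);
    ULONGLONG lastWall = TicksOf(ftNow);

    for (;;) {
        Sleep(100);                             /* stand-in for the periodic waitable timer */

        if (suspended) {                        /* we parked the process last time round */
            SetProcessAffinityMask(hProcess, savedMask);
            suspended = FALSE;
            continue;
        }

        GetProcessTimes(hProcess, &ftCreate, &ftExit, &ftKernel, &ftUser);
        GetSystemTimeAsFileTime(&ftNow);
        ULONGLONG cpu  = TicksOf(ftKernel) + TicksOf(ftUser);
        ULONGLONG wall = TicksOf(ftNow);

        double usage = (double)(cpu - lastCpu) / (double)(wall - lastWall);
        lastCpu  = cpu;
        lastWall = wall;

        if (usage > TARGET_USAGE) {
            GetProcessAffinityMask(hProcess, &savedMask, &sysMask);
            SetProcessAffinityMask(hProcess, 0);   /* no CPUs: no more time slices */
            suspended = TRUE;
        }
    }
}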
CPU usage is fairly simple to estimate using QueryProcessCycleTime. The machine's processor speed can be obtained from HKLM\HARDWARE\DESCRIPTION\System\CentralProcessor\N\~MHz (where N is the processor number; there is one entry for each processor present). With these values, you can estimate your process's CPU usage and yield the CPU as necessary, using Sleep() to keep your usage within bounds.
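A minimal sketch of that idea, assuming the cycle frequency has already been read from the registry and is passed in as cpu_hz; do_some_work_chunk() is a hypothetical unit of work:

#include <windows.h>

void do_some_work_chunk(void);                  /* hypothetical unit of real work */

void do_throttled_work(ULONG64 cpu_hz, double max_usage)
{
    ULONG64 lastCycles = 0;
    QueryProcessCycleTime(GetCurrentProcess(), &lastCycles);
    DWORD lastTick = GetTickCount();

    for (;;) {
        do_some_work_chunk();

        ULONG64 cycles;
        QueryProcessCycleTime(GetCurrentProcess(), &cycles);
        DWORD now = GetTickCount();

        double elapsed = (now - lastTick) / 1000.0;            /* wall-clock seconds */
        double used    = (cycles - lastCycles) / (double)cpu_hz;

        if (elapsed > 0 && used / elapsed > max_usage)
            Sleep(100);                         /* yield until the average drops back in bounds */

        lastCycles = cycles;
        lastTick   = now;
    }
}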