Is there an equivalent API to getloadavg() that can be used within the kernel, i.e. from my own driver?
I have a driver that is thrashing and I would like to throttle it, and I am looking for a kernel API to find out about the CPU usage.
Thank you.
You're probably looking for the get_avenrun() function in kernel/sched.c. An example of how to use it is in fs/proc/loadavg.c:
static int loadavg_proc_show(struct seq_file *m, void *v)
{
    unsigned long avnrun[3];

    get_avenrun(avnrun, FIXED_1/200, 0);

    seq_printf(m, "%lu.%02lu %lu.%02lu %lu.%02lu %ld/%d %d\n",
        LOAD_INT(avnrun[0]), LOAD_FRAC(avnrun[0]),
        LOAD_INT(avnrun[1]), LOAD_FRAC(avnrun[1]),
        LOAD_INT(avnrun[2]), LOAD_FRAC(avnrun[2]),
        nr_running(), nr_threads,
        task_active_pid_ns(current)->last_pid);
    return 0;
}
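If you just want a throttling heuristic inside the driver, a minimal, untested sketch could look like the following (the threshold and the helper name are illustrative, and note that get_avenrun() may not be exported to loadable modules in your kernel version):

#include <linux/sched.h>   /* get_avenrun(), FIXED_1, LOAD_INT() */
#include <linux/delay.h>   /* msleep() */

static void my_driver_maybe_throttle(void)
{
    unsigned long avnrun[3];

    get_avenrun(avnrun, FIXED_1/200, 0);
    /* avnrun[0] is the 1-minute load average in fixed point */
    if (LOAD_INT(avnrun[0]) >= 4)   /* arbitrary threshold */
        msleep(100);                /* back off briefly */
}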
Though I'm a little skeptical of how you can use the load average to modify a driver's behaviour. The load average is best treated as a heuristic for system administrators to gauge how their system changes over time, not necessarily how "healthy" it is at any given moment. What specifically in the driver is causing trouble? There's probably a better mechanism to make it play nicely with the rest of the system.
Related
I am writing a driver in which I want the exact range of RAM. I came to know about the memory manager routines inside the Windows kernel, and I am planning to call the MmGetPhysicalMemoryRanges routine in my driver to get the memory ranges.
I don't know how to call this routine from my driver.
Can anyone tell me how to use this routine? What is its signature?
NTKERNELAPI
PPHYSICAL_MEMORY_RANGE
MmGetPhysicalMemoryRanges (
    VOID
    );
Where PHYSICAL_MEMORY_RANGE is:
typedef struct _PHYSICAL_MEMORY_RANGE {
    PHYSICAL_ADDRESS BaseAddress;
    LARGE_INTEGER NumberOfBytes;
} PHYSICAL_MEMORY_RANGE, *PPHYSICAL_MEMORY_RANGE;
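Note that you do not write this routine yourself; it is implemented by the kernel, so you simply declare it (or pull the declaration from the WDK headers if yours have it) and call it. A hedged usage sketch (the function name DumpPhysicalMemoryRanges is illustrative): the returned array is terminated by an entry whose BaseAddress and NumberOfBytes are both zero, and the caller must free it with ExFreePool.

#include <ntddk.h>

VOID DumpPhysicalMemoryRanges(VOID)
{
    PPHYSICAL_MEMORY_RANGE ranges = MmGetPhysicalMemoryRanges();
    ULONG i;

    if (ranges == NULL)
        return;

    for (i = 0;
         ranges[i].BaseAddress.QuadPart != 0 ||
         ranges[i].NumberOfBytes.QuadPart != 0;
         i++)
    {
        DbgPrint("Range %lu: base=0x%I64x bytes=0x%I64x\n",
                 i,
                 ranges[i].BaseAddress.QuadPart,
                 ranges[i].NumberOfBytes.QuadPart);
    }

    ExFreePool(ranges);   /* the caller owns the returned buffer */
}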
Does anyone know any feasible ways to simulate a slow DVD drive, e.g. over a mounted DMG/ISO image?
The goal is to program against a slow drive, hence the simulation requirement. Any ideas would be greatly appreciated!
Update: again, the goal is to simulate a slow I/O process. Tools like Network Conditioner or Charles unfortunately do not provide a solution, and trickle is out of date and no longer actively developed :(
With hdiutil, you could mount a disk image over a specially crafted HTTP server, but you do not control the OS cache and I/O slowness would not be fine-grained. I would suggest two non-network solutions.
Inserting slowness in I/O system calls
You could slow down the I/O system calls themselves, for example through DYLD_INSERT_LIBRARIES. This approach is quite easy, and it is what I would try first.
You simply create a library with read(2) and pread(2) implementations like:
/* slow.c */
#include <time.h>         /* nanosleep */
#include <unistd.h>       /* ssize_t, off_t, syscall */
#include <sys/syscall.h>  /* SYS_read, SYS_pread */

#define SLEEP_TIMESPEC {0, 25000000} /* 25 ms */

ssize_t
read(int fildes, void *buf, size_t nbyte) {
    struct timespec s = SLEEP_TIMESPEC;
    (void) nanosleep(&s, NULL);
    return (ssize_t) syscall(SYS_read, fildes, buf, nbyte);
}

ssize_t
pread(int d, void *buf, size_t nbyte, off_t offset) {
    struct timespec s = SLEEP_TIMESPEC;
    (void) nanosleep(&s, NULL);
    return (ssize_t) syscall(SYS_pread, d, buf, nbyte, offset);
}
You might also need to implement readv(2). You simply need to compile this C code as a shared library and set DYLD_INSERT_LIBRARIES to load this library before running your program. You will probably also need to define DYLD_FORCE_FLAT_NAMESPACE. See dyld(1).
clang -arch i386 -arch x86_64 -shared -Wall slow.c -o slow.dylib
(The library is compiled as a universal binary, since the AIR app I had on disk was actually i386, not x86_64.)
To test the library, simply do:
env DYLD_INSERT_LIBRARIES=slow.dylib DYLD_FORCE_FLAT_NAMESPACE=1 cat slow.c
You might want to try values higher than 25 ms for cat, e.g. 1 second, which can be written inline as {1, 0}. Likewise, you should start your application from the command line:
env DYLD_INSERT_LIBRARIES=slow.dylib DYLD_FORCE_FLAT_NAMESPACE=1 path/to/YourApp.app/Contents/MacOS/YourApp
This will slow down every read call (even those made through higher-level APIs). Some read operations will not be affected (e.g. mmap(2)), however, and you might want to slow down I/O on some files but not on others. This latter case can be handled by trapping open(2) as well, but requires more work.
25 ms per read access is enough to make any AIR app noticeably slower. Of course, you should adjust this value to your needs.
Working with a slower file system
Alternatively, you could implement a Fuse plug-in. This is especially easy if you start from LoopbackFS (C or ObjC).
Indeed, you can very easily call nanosleep(2) in the readFileAtPath:userData:buffer:size:offset:error: method or the loopback_read function, as in the sketch below.
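For instance, a minimal sketch of a slowed-down loopback_read (the signature follows the libfuse high-level C API; the 25 ms delay is an arbitrary choice):

#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <errno.h>
#include <time.h>
#include <unistd.h>

static int loopback_read(const char *path, char *buf, size_t size,
                         off_t offset, struct fuse_file_info *fi)
{
    struct timespec delay = {0, 25000000};   /* 25 ms per read */
    ssize_t res;

    (void) path;
    nanosleep(&delay, NULL);                 /* inject the slowness */

    res = pread(fi->fh, buf, size, offset);  /* delegate to the real file */
    return (res == -1) ? -errno : (int) res;
}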
Well, the read and write speeds of the first DVD drives and players were 1350 kB/s (about 1.32 MB/s); this speed is usually called "1×" and is the slowest speed that you can find in any DVD drive.
USB 1.1 flash drives have a nominal speed of 12 Mbps, or 1.5 MB/s, and their actual speed is really close to that of a 1× DVD drive.
So why don't you copy your image to an old USB 1.1 flash drive and mount it from there? If you don't have any, you can find used ones on eBay.
I should also mention that broadband connections such as ADSL2 or HSPA (3.5G) have an actual download speed of about 12 Mbps, which is in the range you want.
What is the best way to check which index a loop is currently executing, without slowing down the process too much?
For example I want to find all long fancy numbers and have a loop like
for (long i = 1; i > 0; i++) {
    //block
}
and I want to see which i is executing, in real time.
Several ways I know to do this in the block are printing i every time, checking if (i % 10000 == 0), or adding a listener.
Which of these ways is the fastest? Or what do you do in similar cases? Is there any way to access the value of i manually?
Most of my recent experience is with Java, so I'd write something like this
import java.util.concurrent.atomic.AtomicLong;

public class Example {
    public static void main(String[] args) {
        AtomicLong atomicLong = new AtomicLong(1); // initialize to 1
        LoopMonitor lm = new LoopMonitor(atomicLong);
        Thread t = new Thread(lm);
        t.setDaemon(true); // don't keep the JVM alive once the loop exits
        t.start(); // start LoopMonitor
        while (atomicLong.get() > 0) {
            long l = atomicLong.getAndIncrement(); // equivalent to long l = atomicLong++ if atomicLong were a primitive
            //block
        }
    }

    private static class LoopMonitor implements Runnable {
        private final AtomicLong atomicLong;

        public LoopMonitor(AtomicLong atomicLong) {
            this.atomicLong = atomicLong;
        }

        public void run() {
            while (true) {
                try {
                    System.out.println(atomicLong.longValue()); // print the current index
                    Thread.sleep(1000); // sleep for one second
                } catch (InterruptedException ex) {}
            }
        }
    }
}
AtomicLong guarantees atomic reads and writes even on 32-bit platforms, which is why I used it here instead of a primitive long (you don't want to inadvertently print a half-written long); look into your compiler/platform details to see if you need something like this, but if you're on a 64-bit platform then you can probably use a primitive long regardless of which language you're using. The modified loop doesn't take much of an efficiency hit: you've replaced a primitive long with a reference to a long, so all you've added is a pointer dereference.
It won't be easy, but probably the only way to probe the value without affecting the process is to have another thread access the loop variable through shared memory. Threading libraries vary from one system to another, so I can't help much there (on Linux I'd probably use pthreads). The "monitor" thread might probe the value once a minute, sleep()ing in between, allowing the first thread to run uninterrupted.
To get reporting at essentially zero cost (on multi-CPU machines): make your index a "global" property (class-wide, for instance), and have a separate thread read and report the index value.
This report could be timer-based (5 times per second or so).
Note: you may also need a boolean stating "are we in the loop?".
Volatile and Caches
If you're going to be doing this in, say, C/C++ and use a separate monitor thread as previously suggested, then you'll have to make the global/static loop variable volatile. You don't want the compiler deciding to use a register for the loop variable. Some toolchains behave that way anyway, but there's no harm in being explicit about it.
And then there's the small issue of caches. A separate monitor thread nowadays will end up on a separate core, and that'll mean that the two separate cache subsystems will have to agree on what the value is. That will unavoidably have a small impact on the runtime of the loop.
Real real time constraint?
So that raises the question of just how real-time your loop actually is. I doubt that your timing constraint is so tight that you're depending on it running within a specific number of CPU clock cycles, for two reasons: a) no modern OS will come close to guaranteeing that, you'd have to be running on bare metal; b) most CPUs these days vary their own clock rate behind your back, so you can't count on a specific number of clock cycles corresponding to a specific real-time interval.
Feature rich solution
So assuming that your real-time requirement is not that constrained, you may wish to run a more capable monitor thread. Have a shared structure protected by a semaphore which your loop occasionally updates, and which your monitor thread periodically inspects to report progress. For best performance the monitor thread would take the semaphore, copy the structure, release the semaphore, and only then inspect/print the copy, minimising the time the semaphore is held (see the sketch at the end of this answer).
The only advantage of this approach over that suggested in previous answers is that you could report more than just the loop variable's value. There may be more information from your loop block that you'd like to report too.
Mutex semaphores in, say, C on Linux are pretty fast these days. Unless your loop block is very lightweight, the runtime overhead of a single mutex is unlikely to be significant, especially if you're only updating the shared structure every 1000 loop iterations. A decent OS will put your threads on separate cores, but for the sake of good form you'd make the monitor thread's priority higher than that of the thread running the loop. This ensures that the monitoring actually happens if the two threads end up on the same core.
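A minimal pthreads sketch of that pattern (the names, the 1000-iteration update cadence, and the one-second reporting interval are all illustrative assumptions):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

struct progress {
    long index;
    int  in_loop;   /* the "are we in the loop?" flag suggested earlier */
};

static struct progress g_prog;
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

static void *monitor(void *arg)
{
    (void) arg;
    for (;;) {
        pthread_mutex_lock(&g_lock);
        struct progress snap = g_prog;   /* copy quickly... */
        pthread_mutex_unlock(&g_lock);   /* ...then release before printing */
        if (snap.in_loop)
            printf("i = %ld\n", snap.index);
        sleep(1);
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, monitor, NULL);

    pthread_mutex_lock(&g_lock);
    g_prog.in_loop = 1;
    pthread_mutex_unlock(&g_lock);

    for (long i = 1; i > 0; i++) {
        /* ...block... */
        if (i % 1000 == 0) {             /* amortise the locking cost */
            pthread_mutex_lock(&g_lock);
            g_prog.index = i;
            pthread_mutex_unlock(&g_lock);
        }
    }
    return 0;
}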
A while back I was thinking about how to implement various synchronization primitives in terms of one another. For example, in pthreads you get mutexes and condition variables, and from these can build semaphores.
In the Windows API (or at least, older versions of the Windows API) there are mutexes and semaphores, but no condition variables. I think that it should be possible to build condition variables out of mutexes and semaphores, but for the life of me I just can't think of a way to do so.
Does anyone know of a good construction for doing this?
Here's a paper from Microsoft Research [pdf] which deals with exactly that.
One way of implementing X given semaphores is to add a server process to the system, use semaphores to communicate with it, and have the process do all the hard work of implementing X. As an academic exercise, this might be cheating, but it does get the job done, and it can be more robust to misbehaviour by the client processes, or to their sudden death.
I may be missing something here, but there seems to be a simpler way to implement a condition variable from a semaphore and a lock than the way described in the paper.
class Condition {
    sem_t m_sem;
    int m_waiters;
    int m_signals;
    pthread_mutex_t *m_mutex;
public:
    Condition(pthread_mutex_t *_mutex) {
        sem_init(&this->m_sem, 0, 0);
        this->m_waiters = 0;
        this->m_signals = 0;
        this->m_mutex = _mutex;
    }
    ~Condition() {}
    void wait();
    void signal();
    void broadcast();
};
void Condition::wait() {
    this->m_waiters++;
    pthread_mutex_unlock(this->m_mutex);
    sem_wait(&this->m_sem);
    pthread_mutex_lock(this->m_mutex);
    this->m_waiters--;
    this->m_signals--;
}

void Condition::signal() {
    pthread_mutex_lock(this->m_mutex);
    if (this->m_waiters && (this->m_waiters > this->m_signals)) {
        sem_post(&this->m_sem);
        this->m_signals++;
    }
    pthread_mutex_unlock(this->m_mutex);
}
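broadcast() is declared above but left undefined; one way to complete it in the same counting scheme (a sketch, not part of the original code) is to post once for every waiter that has not yet been signalled:

void Condition::broadcast() {
    pthread_mutex_lock(this->m_mutex);
    while (this->m_waiters > this->m_signals) {
        sem_post(&this->m_sem);
        this->m_signals++;
    }
    pthread_mutex_unlock(this->m_mutex);
}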
I'm writing the memory manager for an application, as part of a team of twenty-odd coders. We're running out of memory quota and we need to be able to see what's going on, since we only appear to be using about 700 MB. I need to be able to report where it's all going: fragmentation, etc. Any ideas?
You can use existing memory debugging tools for this. I found Memory Validator quite useful; it is able to track both API-level (heap, new, ...) and OS-level (virtual memory) allocations, and to show virtual memory maps.
The other option, which I also found very useful, is to dump a map of the whole virtual address space based on the VirtualQuery function. My code for this looks like this:
#include <windows.h>
#include <stdio.h>

void PrintVMMap()
{
    size_t start = 0;
    // TODO: make portable - not compatible with /3GB, 64b OS or 64b app
    size_t end = 1U<<31; // map 32b user space only - kernel space not accessible
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    size_t pageSize = si.dwPageSize;
    size_t longestFreeApp = 0;
    int index = 0;
    for (size_t addr = start; addr < end; )
    {
        MEMORY_BASIC_INFORMATION buffer;
        SIZE_T retSize = VirtualQuery((void *)addr, &buffer, sizeof(buffer));
        if (retSize == sizeof(buffer) && buffer.RegionSize > 0)
        {
            // dump information about this region
            // printf(.... some buffer information here ....);
            // track the longest free region - a useful fragmentation indicator
            if (buffer.State & MEM_FREE)
            {
                if (buffer.RegionSize > longestFreeApp) longestFreeApp = buffer.RegionSize;
            }
            addr += buffer.RegionSize;
            index += buffer.RegionSize / pageSize;
        }
        else
        {
            // always proceed
            addr += pageSize;
            index++;
        }
    }
    printf("Longest free VM region: %lu\n", (unsigned long)longestFreeApp);
}
You can also find information about the heaps in a process with Heap32ListFirst/Heap32ListNext, and about loaded modules with Module32First/Module32Next, from the Tool Help API (a sketch follows below).
'Tool Help' originated on Windows 9x. The original process information API on Windows NT was PSAPI, which offers functions which partially (but not completely) overlap with Tool Help.
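A minimal sketch of the heap enumeration (error handling mostly omitted):

#include <windows.h>
#include <tlhelp32.h>
#include <stdio.h>

void ListHeaps(void)
{
    HEAPLIST32 hl;
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPHEAPLIST,
                                           GetCurrentProcessId());
    if (snap == INVALID_HANDLE_VALUE)
        return;

    hl.dwSize = sizeof(hl);
    if (Heap32ListFirst(snap, &hl)) {
        do {
            printf("Heap ID: %lu\n", (unsigned long)hl.th32HeapID);
        } while (Heap32ListNext(snap, &hl));
    }
    CloseHandle(snap);
}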
Our (huge) application (a Win32 game) started throwing "Not enough quota" exceptions recently, and I was charged with finding out where all the memory was going. It is not a trivial job - this question and this one were my first attempts at finding out. Heap behaviour is unexpected, and accurately tracking how much quota you've used and how much is available has so far proved impossible. In fact, it's not particularly useful information anyway - "quota" and "somewhere to put things" are subtly and annoyingly different concepts. The accepted answer is as good as it gets, although enumerating heaps and modules is also handy. I used DebugDiag from MS to view the true horror of the situation, and understand how hard it is to actually thoroughly track everything.