I'm using Ubuntu 11.04 and v2lin to port my program from VxWorks to Linux. I have a problem with clock_getres().
with this code:
struct timespec res;
clock_getres(CLOCK_REALTIME, &res);
I get res.tv_nsec = 1, which does not look correct to me.
As this post shows: http://forum.kernelnewbies.org/read.php?6,377,423 , there is a difference between kernel 2.4 and 2.6.
So what should the correct value for the clock resolution be on kernel 2.6?
Thanks
According to "include/linux/hrtimer.h" file from kernel sources, clock_getres() will always return 1ns (one nanosecond) for high-resolution timers (if there are such timers in the system). This value is hardcoded and it means: "Timer's value will be rounded to it"
http://www.cs.fsu.edu/~baker/devices/lxr/http/source/linux/include/linux/hrtimer.h
269 /*
270 * The resolution of the clocks. The resolution value is returned in
271 * the clock_getres() system call to give application programmers an
272 * idea of the (in)accuracy of timers. Timer values are rounded up to
273 * this resolution values.
274 */
275 # define HIGH_RES_NSEC 1
276 # define KTIME_HIGH_RES (ktime_t) { .tv64 = HIGH_RES_NSEC }
277 # define MONOTONIC_RES_NSEC HIGH_RES_NSEC
278 # define KTIME_MONOTONIC_RES KTIME_HIGH_RES
For low-resolution timers (and for the MONOTONIC and REALTIME clocks if there is no hrtimer hardware), Linux will return 1/HZ (typical HZ values range from 100 to 1000, so the value will be from 1 to 10 ms):
http://www.cs.fsu.edu/~baker/devices/lxr/http/source/linux/include/linux/ktime.h#L321
321 #define LOW_RES_NSEC TICK_NSEC
322 #define KTIME_LOW_RES (ktime_t){ .tv64 = LOW_RES_NSEC }
Values from low-resolution timers are rounded to this coarse precision (effectively they behave like jiffies, the Linux kernel "ticks").
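To see both cases on a running system, a small user-space check like the one below can be used (a minimal sketch; CLOCK_REALTIME_COARSE is the jiffy-backed clock, available since 2.6.32, and older glibc needs -lrt at link time):
#include <stdio.h>
#include <time.h>

static void show_res(const char *name, clockid_t id)
{
    struct timespec res;
    if (clock_getres(id, &res) == 0)
        printf("%-22s %ld s %ld ns\n", name, (long)res.tv_sec, res.tv_nsec);
    else
        perror(name);
}

int main(void)
{
    show_res("CLOCK_REALTIME", CLOCK_REALTIME);               /* 1 ns with hrtimers */
    show_res("CLOCK_MONOTONIC", CLOCK_MONOTONIC);             /* 1 ns with hrtimers */
    show_res("CLOCK_REALTIME_COARSE", CLOCK_REALTIME_COARSE); /* 1/HZ, jiffy-backed */
    return 0;
}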
PS: As far as I can tell, this post http://forum.kernelnewbies.org/read.php?6,377,423 compares a 2.4 kernel without hrtimers with a 2.6 kernel that has hrtimers available, so all the values are correct.
Try to get it from procfs.
cat /proc/timer_list
Why do you think it is incorrect?
For example, on modern x86 CPUs the kernel uses the TSC to provide high-resolution clocks; any CPU running faster than 1 GHz has a TSC that ticks more than once per nanosecond, so nanosecond resolution is quite common.
I develop software that usually includes both OpenGL and the Nvidia CUDA SDK. Recently, I also started looking for ways to optimize the run-time memory footprint. I noticed the following (Debug and Release builds differ only by 4-7 MB):
Application startup - less than 1 MB total
OpenGL 4.5 context creation (+ GLEW loader init) - 45 MB total
CUDA 8.0 context (Driver API) creation - 114 MB total
If I create the OpenGL context in "headless" mode, the GL context uses 3 MB less, which probably goes to the default framebuffer allocation. That makes sense, as the window size is 640x360 (640 × 360 × 4 bytes is roughly 0.9 MB per color buffer, so a double buffer plus depth/stencil lands in that range).
So after the OpenGL and CUDA contexts are up, the process already consumes 114 MB.
Now, I don't have deep knowledge of the OS-specific work that happens under the hood during GL and CUDA context creation, but 45 MB for GL and 68 MB for CUDA seems like a whole lot to me. I know that several megabytes usually go to system framebuffers and function pointers (and probably the bulk of the allocations happens on the driver side), but reaching over 100 MB with just "empty" contexts looks like too much.
I would like to know:
Why GL/CUDA context creation consumes such a considerable amount of memory?
Are there ways to optimize that?
The system setup under test:
Windows 10 64-bit. NVIDIA GTX 960 GPU (driver version 388.31). 8 GB RAM. Visual Studio 2015, 64-bit C++ console project.
I measure memory consumption using the Visual Studio built-in Diagnostic Tools -> Process Memory section.
UPDATE
I tried Process Explorer, as suggested by datenwolf. Here is a screenshot of what I got (my process is at the bottom, marked in yellow):
I would appreciate some explanation of that info. I had always looked at "Private Bytes" in the "VS Diagnostic Tools" window, but here I also see "Working Set", "WS Private", etc. Which one correctly shows how much memory my process currently uses? 281,320K looks like way too much, because as I said above, the process does nothing at startup except create the CUDA and OpenGL contexts.
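(As an aside, both counters can also be read programmatically. Below is a minimal Win32 sketch, assuming the psapi API and linking against psapi.lib, that prints the private commit and the working set of the current process; it is only meant to show which number is which.)
#include <windows.h>
#include <psapi.h>
#include <stdio.h>

int main(void)
{
    /* PrivateUsage corresponds to "Private Bytes", WorkingSetSize to "Working Set". */
    PROCESS_MEMORY_COUNTERS_EX pmc;
    if (GetProcessMemoryInfo(GetCurrentProcess(),
                             (PROCESS_MEMORY_COUNTERS *)&pmc, (DWORD)sizeof pmc)) {
        printf("Private bytes (commit): %zu KB\n", pmc.PrivateUsage / 1024);
        printf("Working set:            %zu KB\n", pmc.WorkingSetSize / 1024);
    }
    return 0;
}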
Partial answer: This is an OS-specific issue; on Linux, CUDA takes 9.3 MB.
I'm using CUDA (not OpenGL) on GNU/Linux:
CUDA version: 10.2.89
OS distribution: Devuan GNU/Linux Beowulf (~= Debian Buster without systemd)
Kernel: Linux 5.2.0
Processor: Intel x86_64
To check how much memory gets used by CUDA when creating a context, I ran the following C program (which also checks what happens after context destruction):
#include <stdio.h>
#include <cuda.h>
#include <malloc.h>
#include <stdlib.h>
static void print_allocation_stats(const char* s)
{
printf("%s:\n", s);
printf("--------------------------------------------------\n");
malloc_stats();
printf("--------------------------------------------------\n\n");
}
int main()
{
print_allocation_stats("Initially");
int status = cuInit(0);
if (status != 0 ) { return EXIT_FAILURE; }
print_allocation_stats("After CUDA driver initialization");
int device_id = 0;
unsigned flags = 0;
CUcontext context_id;
status = cuCtxCreate(&context_id, flags, device_id);
if (status != CUDA_SUCCESS ) { return EXIT_FAILURE; }
print_allocation_stats("After context creation");
status = cuCtxDestroy(context_id);
if (status != CUDA_SUCCESS ) { return EXIT_FAILURE; }
print_allocation_stats("After context destruction");
return EXIT_SUCCESS;
}
(Note that malloc_stats() is a glibc-specific function, not part of the standard C library.)
Summarizing the results and snipping irrelevant parts:
Point in program                    Total bytes    In-use     Max MMAP regions    Max MMAP bytes
Initially                                135168      1632                    0                 0
After CUDA driver initialization         552960    439120                    2            307200
After context creation                  9314304   6858208                    8           6643712
After context destruction               7016448    580688                    8           6643712
So CUDA starts out at 0.5 MB, and after creating a context it takes up 9.3 MB (going back down to 7.0 MB on destroying the context). 9 MB is still a lot of memory for not having done anything; but maybe some of it is all-zeros, uninitialized, or copy-on-write, in which case it doesn't really take up that much memory.
It's possible that memory use improved dramatically over the two years between the driver releases for CUDA 8 and CUDA 10, but I doubt it. So it looks like your problem is Windows-specific.
Also, I should mention I did not create an OpenGL context - which is another part of OP's question - so I haven't estimated how much memory that takes. OP raises the question of whether the sum is greater than its parts, i.e. whether a CUDA context would take more memory if an OpenGL context existed as well; I believe this should not be the case, but readers are welcome to try and report...
I need to emulate an MDC/MDIO bus by bit-banging the MDC line. I need a clock frequency of 1.5 MHz; 1 MHz will also do.
I am trying to use udelay and ndelay from linux/delay.h. I am working with kernel 2.6.32 and the MPC8569E processor from Freescale. ndelay is not giving me a delay in nanoseconds but in microseconds; I saw it with a logic analyzer on the wires. So ndelay(1) and udelay(1) effectively behave the same, giving a 1 microsecond delay.
Now, in a bit-bang model the code is going to be something like:
par_io_data_set(2 /*C port*/, 30 /*MDIO pin*/, val /*value*/); /* write data to the MDIO line */
/* clock pulse for setting the data */
ndelay(MDIO_DELAY);
par_io_data_set(2 /*C port*/, 31 /*MDC pin*/, 1 /*value*/);
ndelay(MDIO_DELAY);
par_io_data_set(2 /*C port*/, 31 /*MDC pin*/, 0 /*value*/);
I have defined MDIO_DELAY as 1. With that I get a clock of around 0.4 MHz. I want to bit-bang at 1.5 MHz, but I can't do so until I can get delays in nanoseconds.
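For scale, a 1.5 MHz MDC means a period of about 667 ns, i.e. roughly 333 ns per half-period. Below is a hedged sketch of a single clock pulse at that rate; par_io_data_set() and the port/pin numbers are taken from the snippet above (the accessor comes from the board's QE headers), HALF_PERIOD_NS is my own illustrative constant, and this only reaches 1.5 MHz if ndelay() actually honours nanosecond arguments on the hardware:
#include <linux/delay.h>

#define MDC_PORT        2    /* port C, as in the snippet above */
#define MDC_PIN         31   /* MDC pin, as in the snippet above */
#define HALF_PERIOD_NS  333  /* ~1.5 MHz: 667 ns period / 2 */

static void mdc_pulse(void)
{
    par_io_data_set(MDC_PORT, MDC_PIN, 1);  /* drive MDC high */
    ndelay(HALF_PERIOD_NS);
    par_io_data_set(MDC_PORT, MDC_PIN, 0);  /* drive MDC low */
    ndelay(HALF_PERIOD_NS);
}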
So I was looking at chapter 7 of LDD, on jiffies. My HZ is 250, so the timer interrupt fires every 1/250 s, i.e. every 4 ms, right? So jiffies is incremented every 4 ms, and I can't expect jiffies to give me a counter with nanosecond granularity, right?
How do I get this job done?
I wanted to know how much time "1ms sleep" takes.
I ran this code in a kernel module:
unsigned long a, b;

rdtscl(a);
msleep(1);
rdtscl(b);
printk(KERN_INFO "Difference = %lu\n", (b - a)); /* number of clock cycles consumed */
Output I got:
Difference = 13479219
Output for cat /proc/cpuinfo
cpu MHz : 1197.000
With that, I calculated the delay and got 11.26 milliseconds.
Why am I not getting something around 1 ms?
UPDATE:
The processor frequency in cat /proc/cpuinfo should be taken from the following line:
model name : Intel(R) Core(TM) i3 CPU 540 @ 3.07GHz
=> the processor frequency is 3.07 GHz. I don't know what the line "cpu MHz : 1197.000" means, though.
Thanks
The sleep resolution depends on the HZ value configured on the system where you ran the test code. The HZ value can be 100 or 1000; if it is 100, the scheduler wakes up only once every 10 ms. On desktop systems with recent distributions it is mostly set to 1000 (you can check the config file in /boot on Fedora). The scheduler schedules only on that granularity, so if it wakes up once every 10 ms there is no way to get resolutions smaller than 10 ms, unless you use high-resolution (HR) timers in the kernel.
kernel-3.4.5 (u3-1 *)$ cat /boot/config-3.6.10-4.fc18.x86_64 | grep HZ
CONFIG_NO_HZ=y
# CONFIG_RCU_FAST_NO_HZ is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
If you really want a delay without sleeping, you can use mdelay, which just busy-loops for the specified amount of time and returns.
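As a hedged illustration of the difference, the sketch below times the jiffy-bound msleep() against the hrtimer-backed usleep_range() (available since 2.6.36) using ktime_get(); on a low-HZ system the first typically lands in the multi-millisecond range while the second stays close to 1 ms:
#include <linux/kernel.h>
#include <linux/delay.h>
#include <linux/hrtimer.h>
#include <linux/ktime.h>

static void compare_sleeps(void)
{
    ktime_t t0, t1;

    t0 = ktime_get();
    msleep(1);                    /* rounded up to at least one jiffy */
    t1 = ktime_get();
    pr_info("msleep(1) took %lld ns\n",
            (long long)ktime_to_ns(ktime_sub(t1, t0)));

    t0 = ktime_get();
    usleep_range(1000, 1200);     /* hrtimer-based, ~1 ms */
    t1 = ktime_get();
    pr_info("usleep_range(1000, 1200) took %lld ns\n",
            (long long)ktime_to_ns(ktime_sub(t1, t0)));
}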
What is the maximum size that we can allocate using kzalloc() in a single call?
This is a very frequently asked question. Also, please let me know how I can verify that value.
The upper limit (the number of bytes that can be allocated in a single kmalloc / kzalloc request) is a function of:
the processor – really, the page size – and
the number of buddy system freelists (MAX_ORDER).
On both x86 and ARM, with a standard page size of 4 KB and MAX_ORDER of 11, the kmalloc upper limit for a single call is 4 MB (2^(MAX_ORDER-1) pages × 4 KB = 1024 × 4 KB)!
Details, including explanations and code to test this, here:
http://kaiwantech.wordpress.com/2011/08/17/kmalloc-and-vmalloc-linux-kernel-memory-allocation-api-limits/
No different from kmalloc(). That's the question you should ask (or search for), because kzalloc() is just a thin wrapper that sets __GFP_ZERO.
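A minimal sketch of that equivalence (the two allocations below request the same thing; kzalloc() simply ORs __GFP_ZERO into the flags):
#include <linux/slab.h>

static void *alloc_example(void)
{
    void *a = kzalloc(128, GFP_KERNEL);                /* zeroed allocation */
    void *b = kmalloc(128, GFP_KERNEL | __GFP_ZERO);   /* equivalent call   */

    kfree(b);
    return a;    /* caller frees with kfree() */
}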
Up to about PAGE_SIZE (at least 4k) is no problem :p. Beyond that... you're right to say lots of people have asked, and it's definitely something you have to think about. Apparently it depends on the kernel version - there used to be a hard 128k limit, but it has been increased (or maybe dropped altogether) now. That's just the hard limit though; what you can actually get depends on the given system (and very definitely on the kernel version).
Maybe read What is the difference between vmalloc and kmalloc?
You can always "verify" the allocation by checking the return value from kzalloc(), but by then you've probably already logged an allocation failure backtrace. Other than that, no - I don't think there's a good way to check in advance.
However, it depends on your kernel version and config. These limits are normally defined in linux/slab.h, usually as shown below (this example is from Linux 2.6.32):
#define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
(MAX_ORDER + PAGE_SHIFT - 1) : 25)
#define KMALLOC_MAX_SIZE (1UL << KMALLOC_SHIFT_HIGH)
#define KMALLOC_MAX_ORDER (KMALLOC_SHIFT_HIGH - PAGE_SHIFT)
And you can test them with the code below:
#include <linux/module.h>
#include <linux/slab.h>
int init_module(void)
{
    printk(KERN_INFO "KMALLOC_SHIFT_LOW:%d, KMALLOC_SHIFT_HIGH:%d, "
           "KMALLOC_MIN_SIZE:%d, KMALLOC_MAX_SIZE:%lu\n",
           KMALLOC_SHIFT_LOW, KMALLOC_SHIFT_HIGH,
           KMALLOC_MIN_SIZE, KMALLOC_MAX_SIZE);
    return 0;
}

void cleanup_module(void)
{
}
Finally, the results under Linux 2.6.32 (32-bit) are: 3, 22, 8, 4194304, which means the minimum size is 8 bytes and the maximum size is 4 MB.
PS: You can also check the actual size of the memory allocated by kmalloc by using ksize(), e.g.:
void *p = kmalloc(15, GFP_KERNEL);
printk(KERN_INFO "%u\n", ksize(p)); /* this will print "16" under my kernel */
I'm getting ready to release a tool that is only effective with regular hard drives, not SSDs (solid-state drives). In fact, it shouldn't be used with SSDs, because it will result in a lot of reads/writes with no real benefit.
Does anyone know of a way to detect whether a given drive is solid-state?
Finally a reliable solution! Two of them, actually!
Check /sys/block/sdX/queue/rotational, where sdX is the drive name. If it's 0, you're dealing with an SSD, and 1 means plain old HDD.
I can't put my finger on the Linux version where it was introduced, but it's present in Ubuntu's Linux 3.2 and in vanilla Linux 3.6 and not present in vanilla 2.6.38. Oracle also backported it to their Unbreakable Enterprise kernel 5.5, which is based on 2.6.32.
There's also an ioctl to check if the drive is rotational since Linux 3.3, introduced by this commit. Using sysfs is usually more convenient, though.
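A minimal sketch of the sysfs check in C (the device name "sda" is just an example; a return of -1 means the attribute could not be read, e.g. on kernels that predate it):
#include <stdio.h>

/* Returns 1 for a rotational disk (HDD), 0 for non-rotational (SSD),
 * -1 if the attribute could not be read. */
static int is_rotational(const char *dev)
{
    char path[256];
    int rot = -1;
    FILE *f;

    snprintf(path, sizeof path, "/sys/block/%s/queue/rotational", dev);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%d", &rot) != 1)
        rot = -1;
    fclose(f);
    return rot;
}

int main(void)
{
    int rot = is_rotational("sda");
    printf("sda: %s\n", rot == 1 ? "rotational (HDD)" :
                        rot == 0 ? "non-rotational (SSD)" : "unknown");
    return 0;
}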
You can actually fairly easily determine the rotational latency -- I did this once as part of a university project. It is described in this report. You'll want to skip to page 7 where you see some nice graphs of the latency. It goes from about 9.3 ms to 1.1 ms -- a drop of 8.2 ms. That corresponds directly to 60 s / 8.2 ms = 7317 RPM.
It was done with simple C code -- here's the part that measures the access-time difference between positions a and b in a scratch file. We did this with larger and larger b values until we had wandered all the way around a cylinder:
/* Measure the difference in access time between a and b. The result
 * is measured in nanoseconds. */
int measure_latency(off_t a, off_t b) {
    cycles_t ta, tb;

    overflow_disk_buffer();

    lseek(work_file, a, SEEK_SET);
    read(work_file, buf, KiB/2);
    ta = get_cycles();

    lseek(work_file, b, SEEK_SET);
    read(work_file, buf, KiB/2);
    tb = get_cycles();

    int diff = (tb - ta) / cycles_per_ns;
    fprintf(stderr, "%i KiB to %i KiB: %i nsec\n",
            (int)(a / KiB), (int)(b / KiB), diff);
    return diff;
}
The command lsblk -d -o name,rota lists your drives and shows 1 in the ROTA column for a rotational disk and 0 for an SSD.
Example output:
NAME ROTA
sda 1
sdb 0
Detecting SSDs is not as impossible as dseifert makes out. There has already been some progress in Linux's libata (http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg03625.html), though it doesn't seem user-ready yet.
And I definitely understand why this needs to be done. It's basically the difference between a linked list and an array. Defragmentation and such is usually counter-productive on an SSD.
You could get lucky by running
smartctl -i /dev/sda
from smartmontools. Almost all SSDs have "SSD" in the Model field. No guarantee, though.
My two cents on this old but very important question... If a disk is accessed via SCSI, then you will (potentially) be able to use the SCSI INQUIRY command to request its rotation rate. The VPD (Vital Product Data) page for that is called Block Device Characteristics and has number 0xB1. Bytes 4 and 5 of this page contain a number with the following meaning:
0000h "Medium rotation rate is not reported"
0001h "Non-rotating medium (e.g., solid state)"
0002h - 0400h "Reserved"
0401h - FFFEh "Nominal medium rotation rate in rotations per minute (i.e.,
rpm) (e.g., 7 200 rpm = 1C20h, 10 000 rpm = 2710h, and 15 000 rpm = 3A98h)"
FFFFh "Reserved"
So, an SSD must have 0001h in this field. The T10.org document about this page can be found here.
However, the implementation status of this standard is not clear to me.
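Below is a hedged sketch of issuing that INQUIRY from user space on Linux via the SG_IO ioctl; error handling is minimal, the 64-byte allocation length is arbitrary, and whether a given device actually implements the 0xB1 page varies:
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

int main(int argc, char **argv)
{
    const char *dev = argc > 1 ? argv[1] : "/dev/sda";
    /* INQUIRY, EVPD=1, page 0xB1 (Block Device Characteristics), 64 bytes */
    unsigned char cdb[6] = { 0x12, 0x01, 0xB1, 0x00, 64, 0x00 };
    unsigned char buf[64] = { 0 };
    unsigned char sense[32] = { 0 };
    struct sg_io_hdr io;

    int fd = open(dev, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    memset(&io, 0, sizeof io);
    io.interface_id    = 'S';
    io.cmdp            = cdb;
    io.cmd_len         = sizeof cdb;
    io.dxferp          = buf;
    io.dxfer_len       = sizeof buf;
    io.dxfer_direction = SG_DXFER_FROM_DEV;
    io.sbp             = sense;
    io.mx_sb_len       = sizeof sense;
    io.timeout         = 5000;                  /* milliseconds */

    if (ioctl(fd, SG_IO, &io) < 0) { perror("SG_IO"); close(fd); return 1; }

    unsigned rate = (buf[4] << 8) | buf[5];     /* bytes 4-5, big-endian */
    if (rate == 0x0001)
        printf("%s: non-rotating medium (SSD)\n", dev);
    else if (rate >= 0x0401 && rate <= 0xFFFE)
        printf("%s: rotational, %u rpm\n", dev, rate);
    else
        printf("%s: rotation rate not reported\n", dev);

    close(fd);
    return 0;
}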
I wrote the following JavaScript code. I needed to determine whether the machine was using an SSD drive and whether it was the boot drive. The solution uses the MSFT_PhysicalDisk WMI interface.
function main()
{
    var retval = false;

    // MediaType    - 0 Unknown, 3 HDD, 4 SSD
    // SpindleSpeed - -1 has rotational speed, 0 has no rotational speed (SSD)
    // DeviceID     - 0 boot device
    var objWMIService = GetObject("winmgmts:\\\\.\\root\\Microsoft\\Windows\\Storage");
    var colItems = objWMIService.ExecQuery("select * from MSFT_PhysicalDisk");
    var enumItems = new Enumerator(colItems);

    for (; !enumItems.atEnd(); enumItems.moveNext())
    {
        var objItem = enumItems.item();
        if (objItem.MediaType == 4 && objItem.SpindleSpeed == 0)
        {
            if (objItem.DeviceID == 0)
            {
                retval = true;
            }
        }
    }

    if (retval)
    {
        WScript.Echo("You have an SSD drive and it is your boot drive.");
    }
    else
    {
        WScript.Echo("You do not have an SSD drive.");
    }

    return retval;
}

main();
SSD devices emulate a hard disk device interface, so they can just be used like hard disks. This also means that there is no general way to detect what they are.
You could probably use some characteristics of the drive (latency, speed, size), though this won't be accurate for all drives. Another possibility may be to look at the S.M.A.R.T. data and see whether you can determine the type of disk from it (by model name, certain values); however, unless you keep a database of all drives out there, this is not going to be 100% accurate either.
Write a text file, read it back, and repeat 10000 times; then compute 10000/elapsed (I/O operations per second). For an SSD this number will be much higher than for an HDD. Python 3:
import time

def ssd_test():
    doc = 'ssd_test.txt'
    start = time.time()
    for i in range(10000):
        with open(doc, 'w+') as f:
            f.write('ssd test')
        with open(doc, 'r') as f:
            ret = f.read()
    stop = time.time()
    elapsed = stop - start
    ios = int(10000 / elapsed)
    hd = 'HDD'
    if ios > 6000:  # ssd > 8000; hdd < 4000
        hd = 'SSD'
    print('detecting hard drive type by read/write speed')
    print('ios', ios, 'hard drive type', hd)
    return hd