Kernel stalls when accessing serial device on FPGA - linux-kernel

I have two UART devices on an FPGA exposed to Linux on an Altera Cyclone V SoC. I have modified the DTS to incorporate these devices, and Linux picks them up on boot:
[ 0.879942] (NULL device *): ttyAL0 at MMIO 0xff200400 (irq = 41, base_baud = 3125000) is a Altera UART
[ 0.890050] (NULL device *): ttyAL1 at MMIO 0xff200420 (irq = 44, base_baud = 3125000) is a Altera UART
Resulting in a ttyAL0 and ttyAL1 in /dev/. The devices also appear in the relevant device subdirectory in /sys/devices/soc/ with the driver symlink present, for example:
lrwxrwxrwx 1 root root 0 Jun 20 10:36 driver -> ../../../bus/platform/drivers/altera_uart
-rw-r--r-- 1 root root 4096 Jun 20 10:36 driver_override
-r--r--r-- 1 root root 4096 Jun 20 10:36 modalias
drwxr-xr-x 2 root root 0 Jun 20 10:36 power
lrwxrwxrwx 1 root root 0 Jun 20 10:36 subsystem -> ../../../bus/platform
-rw-r--r-- 1 root root 4096 Jun 20 10:36 uevent
However if I try to open the port either programmatically, or with cat or setserial, there is a 20s stall before the RCU scheduler throws an exception:
[ 202.242133] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=2102 jiffies, g=124, c=123, q=254)
[ 202.252516] INFO: Stall ended before state dump start
[ 223.252109] INFO: rcu_sched self-detected stall on CPU { 0} (t=2100 jiffies g=125 c=124 q=229)
[ 223.260843] Task dump for CPU 0:
[ 223.264066] klogd R running 0 954 1 0x00000002
[ 223.270566] [<c0017984>] (unwind_backtrace) from [<c00137e0>] (show_stack+0x20/0x24)
[ 223.278319] [<c00137e0>] (show_stack) from [<c004b6cc>] (sched_show_task+0xb0/0x104)
[ 223.286045] [<c004b6cc>] (sched_show_task) from [<c004e34c>] (dump_cpu_task+0x48/0x4c)
[ 223.293941] [<c004e34c>] (dump_cpu_task) from [<c006ae60>] (rcu_dump_cpu_stacks+0xa0/0xcc)
[ 223.302188] [<c006ae60>] (rcu_dump_cpu_stacks) from [<c006e520>] (rcu_check_callbacks+0x488/0x790)
[ 223.311137] [<c006e520>] (rcu_check_callbacks) from [<c0072db0>] (update_process_times+0x50/0x70)
[ 223.319982] [<c0072db0>] (update_process_times) from [<c0083258>] (tick_sched_timer+0x78/0x27c)
[ 223.328656] [<c0083258>] (tick_sched_timer) from [<c00735f4>] (__run_hrtimer+0x90/0x1bc)
[ 223.336719] [<c00735f4>] (__run_hrtimer) from [<c0073ef4>] (hrtimer_interrupt+0x140/0x31c)
[ 223.344955] [<c0073ef4>] (hrtimer_interrupt) from [<c0016b58>] (twd_handler+0x40/0x50)
[ 223.352867] [<c0016b58>] (twd_handler) from [<c00669bc>] (handle_percpu_devid_irq+0x90/0x124)
[ 223.361364] [<c00669bc>] (handle_percpu_devid_irq) from [<c0062684>] (generic_handle_irq+0x3c/0x4c)
[ 223.370377] [<c0062684>] (generic_handle_irq) from [<c0062948>] (__handle_domain_irq+0x6c/0xb4)
[ 223.379042] [<c0062948>] (__handle_domain_irq) from [<c00086b0>] (gic_handle_irq+0x34/0x6c)
[ 223.387362] [<c00086b0>] (gic_handle_irq) from [<c0014380>] (__irq_svc+0x40/0x54)
[ 223.394811] Exception stack(0xded29cf8 to 0xded29d40)
[ 223.399842] 9ce0: 00000001 c06cb200
[ 223.407986] 9d00: 00000000 00000000 c0687b34 00000000 00000082 00000001 df418800 c06c416c
[ 223.416128] 9d20: ded28000 ded29d9c 00000000 ded29d40 c06cb200 c0029330 200f0113 ffffffff
[ 223.424285] [<c0014380>] (__irq_svc) from [<c0029330>] (__do_softirq+0xc4/0x2f0)
[ 223.431656] [<c0029330>] (__do_softirq) from [<c00297f8>] (irq_exit+0x88/0xc0)
[ 223.438851] [<c00297f8>] (irq_exit) from [<c006294c>] (__handle_domain_irq+0x70/0xb4)
[ 223.446649] [<c006294c>] (__handle_domain_irq) from [<c00086b0>] (gic_handle_irq+0x34/0x6c)
[ 223.454965] [<c00086b0>] (gic_handle_irq) from [<c0014380>] (__irq_svc+0x40/0x54)
[ 223.462412] Exception stack(0xded29e08 to 0xded29e50)
[ 223.467443] 9e00: dfbd3540 df782ac0 00000000 0000996f df59d6c0 dfbd3540
[ 223.475584] 9e20: c0695e20 00000000 df59c1c0 df59c540 ded28030 ded29e6c ded29e70 ded29e50
[ 223.483725] 9e40: c047bad0 c004756c 600f0013 ffffffff
[ 223.488762] [<c0014380>] (__irq_svc) from [<c004756c>] (finish_task_switch+0x78/0x11c)
[ 223.496661] [<c004756c>] (finish_task_switch) from [<c047bad0>] (__schedule+0x230/0x5f4)
[ 223.504726] [<c047bad0>] (__schedule) from [<c047bed4>] (schedule+0x40/0x8c)
[ 223.511746] [<c047bed4>] (schedule) from [<c0061a58>] (do_syslog+0x51c/0x5a8)
[ 223.518855] [<c0061a58>] (do_syslog) from [<c0061b00>] (SyS_syslog+0x1c/0x20)
[ 223.525968] [<c0061b00>] (SyS_syslog) from [<c000f820>] (ret_fast_syscall+0x0/0x30)
I don't know why this is happening but I have noticed two interesting (i.e. wrong) things about how Linux sees my devices. The first is that their IRQs, even though correctly reported during boot and any bind/unbind operations, are not listed in /proc/interrupts (they would appear as ff200400.serial2 and ff200420.serial3):
CPU0 CPU1
29: 47565 47091 GIC 29 twd
74: 0 0 GIC 74 0009
75: 0 0 GIC 75 000A
76: 0 0 GIC 76 000A
77: 0 0 GIC 77 0004
78: 0 0 GIC 78 0003
79: 0 0 GIC 79 0006
80: 0 0 GIC 80 0011
81: 0 0 GIC 81 0011
82: 0 0 GIC 82 0010
171: 10554 0 GIC 171 dw-mci
186: 0 0 GIC 186 dw_spi65535
190: 0 0 GIC 190 ffc04000.i2c
191: 0 0 GIC 191 ffc05000.i2c
192: 0 0 GIC 192 ffc06000.i2c
193: 0 0 GIC 193 ffc07000.i2c
194: 465 0 GIC 194 serial
199: 0 0 GIC 199 timer0
207: 0 0 GIC 207 fpga-mgr
IPI0: 0 0 CPU wakeup interrupts
IPI1: 0 0 Timer broadcast interrupts
IPI2: 591 3015 Rescheduling interrupts
IPI3: 0 0 Function call interrupts
IPI4: 1 5 Single function call interrupts
IPI5: 0 0 CPU stop interrupts
IPI6: 0 0 IRQ work interrupts
IPI7: 0 0 completion interrupts
Err: 0
The other observation is that in /sys/class/tty, the ttyAL* entries are links to virtual devices instead of the physical ones:
...
lrwxrwxrwx 1 root root 0 Jun 20 10:49 tty8 -> ../../devices/virtual/tty/tty8
lrwxrwxrwx 1 root root 0 Jun 20 10:49 tty9 -> ../../devices/virtual/tty/tty9
lrwxrwxrwx 1 root root 0 Jun 20 10:49 ttyAL0 -> ../../devices/virtual/tty/ttyAL0
lrwxrwxrwx 1 root root 0 Jun 20 10:49 ttyAL1 -> ../../devices/virtual/tty/ttyAL1
lrwxrwxrwx 1 root root 0 Jun 20 10:49 ttyS0 -> ../../devices/soc/ffc02000.serial0/tty/ttyS0
lrwxrwxrwx 1 root root 0 Jun 20 10:49 ttyS1 -> ../../devices/soc/ffc03000.serial1/tty/ttyS1
lrwxrwxrwx 1 root root 0 Jun 20 10:49 ttyp0 -> ../../devices/virtual/tty/ttyp0
lrwxrwxrwx 1 root root 0 Jun 20 10:49 ttyp1 -> ../../devices/virtual/tty/ttyp1
...
You can see the other two physical devices, ttyS0 and ttyS1 ('real' UARTs on the ARM part of the SoC); I expected my devices to appear in the same format. If you refer to the /sys/devices/soc/ device subdirectory listing above, you'll notice that it has no corresponding tty subdirectory, which is presumably part of the reason why a virtual TTY is associated with the device.
So my question is: Why is my physical serial device appearing as virtual, and is that the reason I'm suffering kernel stalls?
In case I am missing vital information in the DTS, here are my UART additions:
uart2: serial2@ff200400 {
compatible = "altr,uart-1.0";
reg = <0xff200400 0x20>;
interrupts = <0 9 4>;
clock-frequency = <50000000>;
current-speed = <115200>;
};
uart3: serial3@ff200420 {
compatible = "altr,uart-1.0";
reg = <0xff200420 0x20>;
interrupts = <0 12 4>;
clock-frequency = <50000000>;
current-speed = <115200>;
};
They are child nodes of a soc node where the interrupt controller is specified.

I finally discovered the issue, and it's unsurprising judging from the RCU scheduler stack trace: My IRQs are wrong.
I don't quite understand the exact mechanics of it as I'm not a firmware engineer, but the UART modules were on an IRQ offset of 40, so their IRQs were not 9 and 12 as I thought, but 49 and 52. Updating the DTS to match caused everything to work as expected.
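The arithmetic behind that fix can be sketched as below. The offset of 40 is taken from the answer above. The further offset of 32 for GIC shared peripheral interrupts is inferred from the boot log, where interrupts = <0 9 4> was reported as irq = 41. The function names are hypothetical and only illustrate the mapping.

```python
# Sketch: map an FPGA-side IRQ index to the value used in the DTS
# "interrupts" property, and to the GIC interrupt ID Linux reports.
FPGA_IRQ_OFFSET = 40  # offset quoted in the answer above
GIC_SPI_BASE = 32     # GIC shared peripheral interrupts start at ID 32


def dts_spi_number(fpga_irq: int) -> int:
    """SPI number to put in the middle cell of the interrupts property."""
    return fpga_irq + FPGA_IRQ_OFFSET


def gic_interrupt_id(spi_number: int) -> int:
    """GIC interrupt ID for a given SPI number."""
    return spi_number + GIC_SPI_BASE


def dts_interrupts_property(fpga_irq: int, trigger: int = 4) -> str:
    """First cell 0 selects an SPI; trigger 4 is kept from the original DTS."""
    return f"interrupts = <0 {dts_spi_number(fpga_irq)} {trigger}>;"


for irq in (9, 12):
    spi = dts_spi_number(irq)
    print(irq, dts_interrupts_property(irq), "GIC ID", gic_interrupt_id(spi))
```

With the original values 9 and 12 the GIC IDs come out as 41 and 44, which matches the boot log. With the corrected values they fall in the GIC 72+ range that /proc/interrupts lists above.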


rsync: failed to set times on "/cygdrive/e/.": Invalid argument (22)

I get the below error message when I try to rsync from a local hard disk to a USB disk mounted at E: on Windows 10.
rsync: failed to set times on "/cygdrive/e/.": Invalid argument (22)
My rsync command is as below (path shortened for brevity):
rsync -rtv --delete --progress --modify-window=5 /cygdrive/d/path/to/folder/ /cygdrive/e/
I actually need to set modification times (on directories as well) and rsync actually sets modification times perfectly. It only fails to set times on root of the USB disk.
I experienced exactly the same problem.
1. I created a dir containing one text file, and when trying to rsync it to a removable (USB) drive, I got the error. However, the file was copied to the destination. The problem is not reproducible if the destination is a folder (other than the root) on the removable drive.
2. I then repeated the process using a fixed drive as the destination, and the problem was not reproducible.
The 1st difference that popped up between the 2 drives, was the file system (for more details, check [MS.Docs]: File Systems Technologies):
FAT32 - on the removable drive
NTFS - on the fixed one
So this was the cause of my failure. Formatting the USB drive as NTFS fixed the problem:
The USB drive formatted as FAT32 (default):
cfati@cfati-e5550-0 /cygdrive/e/Work/Dev/StackOverflow/q045006385
$ ll /cygdrive/
total 20
dr-xr-xr-x 1 cfati None 0 Jul 14 17:58 .
drwxrwx---+ 1 cfati None 0 Jun 9 15:04 ..
d---r-x---+ 1 NT SERVICE+TrustedInstaller NT SERVICE+TrustedInstaller 0 Jul 13 22:21 c
drwxrwx---+ 1 SYSTEM SYSTEM 0 Jul 14 13:19 e
drwxr-xr-x 1 cfati None 0 Dec 31 1979 n
drwxr-xr-x 1 cfati None 0 Dec 31 1979 w
cfati@cfati-e5550-0 /cygdrive/e/Work/Dev/StackOverflow/q045006385
$ rsync -rtv --progress --modify-window=5 ./dir/ /cygdrive/w
sending incremental file list
rsync: failed to set times on "/cygdrive/w/.": Invalid argument (22)
./
a.txt
3 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=0/2)
sent 111 bytes received 111 bytes 444.00 bytes/sec
total size is 3 speedup is 0.01
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1196) [sender=3.1.2]
cfati@cfati-e5550-0 /cygdrive/e/Work/Dev/StackOverflow/q045006385
$ ll /cygdrive/
total 20
dr-xr-xr-x 1 cfati None 0 Jul 14 17:58 .
drwxrwx---+ 1 cfati None 0 Jun 9 15:04 ..
d---r-x---+ 1 NT SERVICE+TrustedInstaller NT SERVICE+TrustedInstaller 0 Jul 13 22:21 c
drwxrwx---+ 1 SYSTEM SYSTEM 0 Jul 14 13:19 e
drwxr-xr-x 1 cfati None 0 Dec 31 1979 n
drwxr-xr-x 1 cfati None 0 Dec 31 1979 w
After formatting the USB drive as NTFS:
cfati@cfati-e5550-0 /cygdrive/e/Work/Dev/StackOverflow/q045006385
$ ll /cygdrive/
total 24
dr-xr-xr-x 1 cfati None 0 Jul 14 17:59 .
drwxrwx---+ 1 cfati None 0 Jun 9 15:04 ..
d---r-x---+ 1 NT SERVICE+TrustedInstaller NT SERVICE+TrustedInstaller 0 Jul 13 22:21 c
drwxrwx---+ 1 SYSTEM SYSTEM 0 Jul 14 13:19 e
drwxr-xr-x 1 cfati None 0 Dec 31 1979 n
drwxrwxrwx+ 1 Administrators Administrators 0 Jul 14 17:59 w
cfati@cfati-e5550-0 /cygdrive/e/Work/Dev/StackOverflow/q045006385
$ rsync -rtv --progress --modify-window=5 ./dir/ /cygdrive/w
sending incremental file list
./
a.txt
3 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=0/2)
sent 111 bytes received 38 bytes 298.00 bytes/sec
total size is 3 speedup is 0.02
cfati@cfati-e5550-0 /cygdrive/e/Work/Dev/StackOverflow/q045006385
$ ll /cygdrive/
total 24
dr-xr-xr-x 1 cfati None 0 Jul 14 17:59 .
drwxrwx---+ 1 cfati None 0 Jun 9 15:04 ..
d---r-x---+ 1 NT SERVICE+TrustedInstaller NT SERVICE+TrustedInstaller 0 Jul 13 22:21 c
drwxrwx---+ 1 SYSTEM SYSTEM 0 Jul 14 13:19 e
drwxr-xr-x 1 cfati None 0 Dec 31 1979 n
drwxrwxrwx+ 1 Administrators Administrators 0 Jul 14 13:19 w
As a side note, when I was at step 2, I carelessly kept the --delete arg, so until I hit Ctrl + C, it deleted some data. Luckily, it didn't get to delete crucial files / folders.

Listening to a different process's socket?

I have one process (PID1) that does:
exec 3<>/dev/tcp/127.0.0.1/12713
And when I do:
$ ls -lh /proc/self/fd/
lrwx------ 1 0 0 64 Mar 24 12:19 0 -> /dev/pts/9
lrwx------ 1 0 0 64 Mar 24 12:19 1 -> /dev/pts/9
lrwx------ 1 0 0 64 Mar 24 12:19 2 -> /dev/pts/9
lrwx------ 1 0 0 64 Mar 24 12:20 255 -> /dev/pts/9
lrwx------ 1 0 0 64 Mar 24 12:19 3 -> socket:[83968639]
Now let's say I have a second process, PID2. Is it possible for it to read from the socket opened by PID1?
I have tried:
exec 1>/proc/PID1/fd/3
but I get the error message: No such device or address
In my scenario, PID1 writes to the socket and PID2 reads from it (basically to experiment with file descriptors).

Strange behaviour ssh -> bash --> (tty no echo) --> c program

I'll try to be as clear as possible (sorry for any inconvenience).
At work we have an old C program that works with industrial hand-held terminals from Honeywell.
Each terminal has its own ssh client, which connects to a Linux server running Red Hat 6.6.
Once it is connected to the Linux box (as a particular user), the bash shell launches a C program with the following settings:
export TERM=vt200
stty raw icrnl -echo
$APLI_EXEC/program param1 param2
So the flow is: ssh client --> ssh server --> bash --> C program
The application seems to work fine, but sometimes (1, 3 or 5 times per week) a random terminal stops receiving data from the server, although the application still receives input from it. It is as if you had pressed Ctrl+S in a shell.
Debugging the application and the ssh process with strace, I noticed something strange.
The strace of the app is fine:
write(1, "1", 7) = 1
but the strace of the ssh process is not (I think; and yes, I saw the no-echo flag in the ioctl, but...):
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
read(3, "\227\316\242\350\261\330)\300e\210\352\367\2VX\24\305\2474\272\371\34\273n{\323p.\211\17H\327"..., 16384) = 48
select(14, [3 9], [11], NULL, {900, 0}) = 1 (out [11], left {899, 999996})
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(11, "1", 1) = 1
ioctl(11, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 -opost -isig -icanon -echo ...}) = 0
select(14, [3 9], [], NULL, {900, 0} <<<<
File descriptors used by the ssh process:
lr-x------ 1 root root 64 Feb 15 17:12 9 -> pipe:[383586491]
lr-x------ 1 root root 64 Feb 15 17:12 8 -> /var/lib/sss/mc/group
lrwx------ 1 root root 64 Feb 15 17:12 7 -> socket:[383586484]
lrwx------ 1 root root 64 Feb 15 17:12 6 -> socket:[383586478]
lrwx------ 1 root root 64 Feb 15 17:12 5 -> socket:[383586458]
lrwx------ 1 root root 64 Feb 15 17:12 4 -> socket:[383586457]
lrwx------ 1 root root 64 Feb 15 17:12 3 -> socket:[383585929]
lrwx------ 1 root root 64 Feb 15 17:12 2 -> /dev/null
lrwx------ 1 root root 64 Feb 15 17:12 14 -> /dev/ptmx
lrwx------ 1 root root 64 Feb 15 17:12 13 -> /dev/ptmx
lrwx------ 1 root root 64 Feb 15 17:12 11 -> /dev/ptmx
l-wx------ 1 root root 64 Feb 15 17:12 10 -> pipe:[383586491]
lrwx------ 1 root root 64 Feb 15 17:12 1 -> /dev/null
lrwx------ 1 root root 64 Feb 15 17:12 0 -> /dev/null
In that select call, I would expect to see fd #11 or fd #13, but neither is there.
Compare this with another call:
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
read(3, "\365\354\354C\10|\336-\4\342\327B0P\275&\213)\367\32\24\333)#\364\355V\3\237\337\33\204"..., 16384) = 52
select(14, [3 9 13], [11], NULL, {900, 0}) = 1 (out [11], left {899, 999997})
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(11, "a", 1) = 1
ioctl(11, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig -icanon -echo ...}) = 0
select(14, [3 9 13], [], NULL, {900, 0} <<<
What is going on with fd #13 in the other call?
Is it possible that the C program is doing something that locks one of ssh's file descriptors? I don't think so, because the ssh process is owned by root and the C program runs as a normal user, but who knows.
Is it possible that the hand terminal sends a control-key combination that 'hangs' standard output?
I have run out of ideas. Can anybody point me in the right direction?
Thanks in advance
Nacho.

Memory leak debugging with windbg without user stack trace

I have a full memory dump, but in this instance I don't have a user stack trace database to go with it. I do have up-to-date symbols and the original binaries that go with the dump. Normally I've been able to use !heap -p -a address to view the call stack at the moment of allocation, but this won't work without the user stack trace database.
My question is whether there's another (albeit less direct) way to get at the source of this memory leak.
LFH Key : 0x0000005c2dc22701
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-------------------------------------------------------------------------------------
00000000002e0000 00000002 3125248 3122792 3125248 282 378 197 0 7 LFH
0000000000010000 00008000 64 4 64 1 1 1 0 0
0000000000530000 00001002 1088 416 1088 51 10 2 0 0 LFH
0000000000490000 00001002 512 284 512 5 5 1 0 0 LFH
0000000000af0000 00001002 1088 248 1088 2 2 2 0 0 LFH
0000000000c00000 00001002 64 8 64 3 1 1 0 0
0000000000de0000 00001002 512 8 512 3 1 1 0 0
0000000000ac0000 00001002 31616 30356 31616 1810 42 6 0 0 LFH
00000000012c0000 00001002 512 8 512 2 1 1 0 0
0000000002140000 00001003 512 88 512 49 7 1 0 N/A
0000000001ab0000 00001003 512 8 512 5 1 1 0 N/A
00000000022f0000 00001003 512 8 512 5 1 1 0 N/A
0000000002490000 00001003 512 8 512 5 1 1 0 N/A
0000000000d40000 00001003 512 8 512 5 1 1 0 N/A
0000000002690000 00001003 512 8 512 5 1 1 0 N/A
0000000002860000 00001003 512 8 512 5 1 1 0 N/A
0000000002e90000 00001002 512 8 512 2 2 1 0 0
0000000002e10000 00001002 1536 556 1536 40 6 2 0 0 LFH
0000000001b90000 00011002 512 8 512 3 2 1 0 0
00000000033e0000 00001002 512 8 512 3 2 1 0 0
-------------------------------------------------------------------------------------
As you can see from this heap summary (!heap -s), heap 00000000002e0000 has grown pretty large. On closer inspection I can see that about 70% of the data is allocated in blocks of size 0x4058, 0x23d1 and 0x10d1 (which is definitely some kind of pattern), so I'm pretty sure I want to investigate that further.
heap @ 00000000002e0000
group-by: TOTSIZE max-display: 20
size #blocks total ( %) (percent of total busy bytes)
4058 1ea - 7b2870 (39.56)
23d1 1dc - 42989c (21.39)
10d1 1ed - 20627d (10.40)
c51 1f4 - 180e34 (7.73)
307 25b - 7217d (2.29)
378 1f9 - 6d7b8 (2.20)
188 40e - 63570 (1.99)
c0 59f - 43740 (1.35)
30 12c7 - 38550 (1.13)
28 147e - 333b0 (1.03)
140 22a - 2b480 (0.87)
138 231 - 2abb8 (0.86)
2340 11 - 25740 (0.75)
100 244 - 24400 (0.73)
120 1ea - 22740 (0.69)
78 456 - 20850 (0.65)
1010 12 - 12120 (0.36)
10188 1 - 10188 (0.32)
10008 1 - 10008 (0.32)
4000 4 - 10000 (0.32)
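As a sanity check on the 70% figure, the three largest rows of the table above can be re-derived with a short script. This is only a sketch that re-uses the numbers printed by the debugger: each row's total should equal size times block count (all in hex), and the percentages are summed as given.

```python
# The three largest rows, copied from the table above:
# size, #blocks, "-", total bytes (all hex), percent of busy bytes.
rows = """\
4058 1ea - 7b2870 (39.56)
23d1 1dc - 42989c (21.39)
10d1 1ed - 20627d (10.40)
"""


def parse_row(line):
    """Split one row into (size, blocks, total, percent)."""
    size, blocks, _, total, pct = line.split()
    return int(size, 16), int(blocks, 16), int(total, 16), float(pct.strip("()"))


parsed = [parse_row(line) for line in rows.splitlines()]

# Each total should be exactly size * blocks.
consistent = all(size * blocks == total for size, blocks, total, _ in parsed)

# Combined share of busy bytes held by these three block sizes.
top3_share = sum(pct for *_, pct in parsed)

print(consistent, round(top3_share, 2))
```

The rows are internally consistent and together account for roughly 71% of the busy bytes in that heap.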
My problem is that I don't know where to go from here. Previously I've followed the instructions found here with great success, but this time around I don't have a user stack trace database and I can't easily reproduce this pattern. I know the memory dump contains a lot of useful information; I'm just not sure how to get something meaningful out of it. Windbg experts? Memory dump analysts? Please advise.
The first few bytes of some of the blocks:
0:000> dc 0000000005254b80
00000000`05254b80 52474d45 00000000 050f1c40 00000000 EMGR....@.......
00000000`05254b90 00000000 00000000 00000001 00000001 ................
00000000`05254ba0 00000400 000003ff 0001d4c0 00000001 ................
00000000`05254bb0 524d4954 00000000 051fcd10 00000000 TIMR............
00000000`05254bc0 f7b315d0 000007fe 05254b80 00000000 .........K%.....
00000000`05254bd0 00000000 00000000 05254bd8 00000000 .........K%.....
00000000`05254be0 05254bd8 00000000 05254be8 00000000 .K%......K%.....
00000000`05254bf0 05254be8 00000000 05254bf8 00000000 .K%......K%.....
0:000> dc 00000000051ce640
00000000`051ce640 52474d45 00000000 04f1ab00 00000000 EMGR............
00000000`051ce650 00000000 00000000 00000001 00000001 ................
00000000`051ce660 00000400 000003ff 0001d4c0 00000001 ................
00000000`051ce670 524d4954 00000000 05037070 00000000 TIMR....pp......
00000000`051ce680 f7b315d0 000007fe 051ce640 00000000 ........@.......
00000000`051ce690 00000000 00000000 051ce698 00000000 ................
00000000`051ce6a0 051ce698 00000000 051ce6a8 00000000 ................
00000000`051ce6b0 051ce6a8 00000000 051ce6b8 00000000 ................
0:000> dc 0000000004fdb1f0
00000000`04fdb1f0 52474d45 00000000 04f1b570 00000000 EMGR....p.......
00000000`04fdb200 00000000 00000000 00000001 00000001 ................
00000000`04fdb210 00000400 000003ff 0001d4c0 00000001 ................
00000000`04fdb220 524d4954 00000000 04ed6ba0 00000000 TIMR.....k......
00000000`04fdb230 f7b315d0 000007fe 04fdb1f0 00000000 ................
00000000`04fdb240 00000000 00000000 04fdb248 00000000 ........H.......
00000000`04fdb250 04fdb248 00000000 04fdb258 00000000 H.......X.......
00000000`04fdb260 04fdb258 00000000 04fdb268 00000000 X.......h.......
0:000> dc 0000000001e649b0
00000000`01e649b0 52474d45 00000000 00351270 00000000 EMGR....p.5.....
00000000`01e649c0 00000000 00000000 00000001 00000001 ................
00000000`01e649d0 00000400 000003ff 0001d4c0 00000001 ................
00000000`01e649e0 524d4954 00000000 01e64130 00000000 TIMR....0A......
00000000`01e649f0 f7b315d0 000007fe 01e649b0 00000000 .........I......
00000000`01e64a00 00000000 00000000 01e64a08 00000000 .........J......
00000000`01e64a10 01e64a08 00000000 01e64a18 00000000 .J.......J......
00000000`01e64a20 01e64a18 00000000 01e64a28 00000000 .J......(J......
Use !heap -flt s on the offending size(s) (with logging to a file).
Then manually dump the contents of some of the blocks and try to guess what kind of data they contain.
If you are lucky, they are C++ objects with a vtable address in the first DWORD, which makes them “easy” to recognize.
If not, use the dc and dds commands and try to figure out what the contents are.
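When reading dc output, keep in mind that each DWORD is printed as a little-endian value, so ASCII tags appear byte-reversed in the hex columns. The sketch below decodes two DWORDs taken from the dumps in the question; the helper name is hypothetical.

```python
import struct


def dword_to_tag(value: int) -> str:
    """Show a 32-bit value as the four ASCII characters it occupies in memory."""
    raw = struct.pack("<I", value)  # little-endian byte order, as stored
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# DWORDs copied from the dc output in the question.
for dword in (0x52474D45, 0x524D4954):
    print(f"{dword:08x} -> {dword_to_tag(dword)}")
```

Both values decode to the EMGR and TIMR tags visible in the ASCII column of the dumps, which suggests the leaked blocks share a common structure that may be worth searching for in the binaries.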
Another approach is to find types whose size matches that of the blocks you suspect are leaking.
============================Find symbols of specific size===================================
0:011> dt -v -s a4 <MyDll>!*
Enumerating symbols matching <MyDll>!*, Size = 0xa4
Address Size Symbol
0a4 <MyDll>!NMDATETIMEFORMATW
0a4 <MyDll>!CWinApp
0a4 <MyDll>!CWinApp
==> Check all modules
!for_each_module ".echo @#ModuleName;dt -v -s a4 ${@#ModuleName}!*"
You can also try to find heap blocks which have a pointer to a leak suspect
0:008> !heap -srch 09C07058
_HEAP @ 02C90000
in HEAP_ENTRY: Size : Prev Flags - UserPtr UserSize - state
0B7DA920: 002c : 002c [01] - 0B7DA928 (00000158) - (busy)
diasymreader!Mod1::`vftable'

Calculating CPU usage from /proc/stat

When reading /proc/stat, I get these return values:
cpu 20582190 643 1606363 658948861 509691 24 112555 0 0 0
cpu0 3408982 106 264219 81480207 19354 0 35 0 0 0
cpu1 3395441 116 265930 81509149 11129 0 30 0 0 0
cpu2 3411003 197 214515 81133228 418090 0 1911 0 0 0
cpu3 3478358 168 257604 81417703 30421 0 29 0 0 0
cpu4 1840706 20 155376 83328751 1564 0 7 0 0 0
cpu5 1416488 15 171101 83410586 1645 13 108729 0 0 0
cpu6 1773002 7 133686 83346305 25666 10 1803 0 0 0
cpu7 1858207 10 143928 83322929 1819 0 8 0 0 0
Some sources state to read only the first four values to calculate CPU usage, while some sources say to read all the values.
Do I read only the first four values to calculate CPU utilization; the values user, nice, system, and idle? Or do I need all the values? Or not all, but more than four? Would I need iowait, irq, or softirq?
cpu 20582190 643 1606363
Versus the entire line.
cpu 20582190 643 1606363 658948861 509691 24 112555 0 0 0
Edit: Some sources also state that iowait is added into idle.
When calculating a specific process' CPU usage, does the method differ?
The man page states that it varies with architecture, and also gives a couple of examples describing how they are different:
In Linux 2.6 this line includes three additional columns: ...
Since Linux 2.6.11, there is an eighth column, ...
Since Linux 2.6.24, there is a ninth column, ...
When "some people said to only use..." they were probably not taking these into account.
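One way to stay independent of the column count is to pad missing columns with zeros and work on the difference between two samples. The sketch below is an illustration under stated assumptions, not the canonical formula: it counts idle plus iowait as idle time, sums user through steal for the total, and leaves out the guest columns on the understanding that the kernel already includes guest time in user and nice. The two sample lines are made up for the example.

```python
def parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) for one 'cpu' line of /proc/stat."""
    fields = [int(x) for x in line.split()[1:]]
    # Columns: user nice system idle iowait irq softirq steal guest guest_nice.
    # Older kernels print fewer columns, so pad with zeros.
    fields += [0] * (10 - len(fields))
    user, nice, system, idle, iowait, irq, softirq, steal = fields[:8]
    idle_all = idle + iowait
    total = user + nice + system + irq + softirq + steal + idle_all
    return idle_all, total


def cpu_usage(prev_line, curr_line):
    """Fraction of non-idle time between two samples of the same cpu line."""
    prev_idle, prev_total = parse_cpu_line(prev_line)
    curr_idle, curr_total = parse_cpu_line(curr_line)
    total_delta = curr_total - prev_total
    if total_delta <= 0:
        return 0.0
    return 1.0 - (curr_idle - prev_idle) / total_delta


# Hypothetical samples taken some time apart.
before = "cpu 100 0 50 800 50 0 0 0 0 0"
after = "cpu 200 0 100 1600 100 0 0 0 0 0"
print(f"{cpu_usage(before, after):.1%}")
```

On a live system the two lines would come from reading the first line of /proc/stat twice, for example one second apart. A single sample only gives the average since boot.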
Regarding whether the calculation differs across CPUs: You will find lines related to "cpu", "cpu0", "cpu1", ... in /proc/stat. The "cpu" fields are all aggregates (not averages) of corresponding fields for the individual CPUs. You can check that for yourself with a simple awk one-liner.
cpu 84282 747 20805 1615949 44349 0 308 0 0 0
cpu0 26754 343 9611 375347 27092 0 301 0 0 0
cpu1 12707 56 2581 422198 5036 0 1 0 0 0
cpu2 33356 173 6160 394561 7508 0 4 0 0 0
cpu3 11464 174 2452 423841 4712 0 1 0 0 0
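The awk one-liner itself is not shown, so here is a Python sketch of the same check, run against the sample above. It sums each column over the cpuN lines and prints how far the aggregate "cpu" line is from that sum.

```python
sample = """\
cpu 84282 747 20805 1615949 44349 0 308 0 0 0
cpu0 26754 343 9611 375347 27092 0 301 0 0 0
cpu1 12707 56 2581 422198 5036 0 1 0 0 0
cpu2 33356 173 6160 394561 7508 0 4 0 0 0
cpu3 11464 174 2452 423841 4712 0 1 0 0 0
"""


def column_differences(text):
    """Aggregate 'cpu' value minus the per-CPU sum, for every column."""
    aggregate, per_cpu = None, []
    for line in text.splitlines():
        name, *values = line.split()
        values = [int(v) for v in values]
        if name == "cpu":
            aggregate = values
        elif name.startswith("cpu"):
            per_cpu.append(values)
    sums = [sum(column) for column in zip(*per_cpu)]
    return [a - s for a, s in zip(aggregate, sums)]


print(column_differences(sample))
```

For this sample every column agrees to within a tick or two. The small residue is possibly a rounding effect in how the counters are converted for display, so do not expect exact equality.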
