SIG 11 for G-WAN w/o much debug information - g-wan

The gwan (4.1.18) instance runs for about one day and then crashes with the trace below (not sure how useful it is). How can I get meaningful tracing information?
----------------
----------------
Sun, 27 Jan 2013 20:07:34 GMT: warning: no loadable sections found in added symbol-file /home/gwan/gwan
0x00007f0018206148 in ?? ()
Id Target Id Frame
* 1 process 2860 "gwan" 0x00007f0018206148 in ?? ()
Thread 1 (process 2860):
#0 0x00007f0018206148 in ?? ()
#1 0x0000000000000004 in ?? ()
#2 0x00007f0018941245 in ?? ()
#3 0x00007f0018206070 in ?? ()
#4 0x00007f000b70ece8 in ?? ()
#5 0x000000000000002e in ?? ()
#6 0x0000000000000000 in ?? ()
Signal : 11:Address not mapped to object
Signal src : 1:.
errno : 0
Thread : 1
Code Pointer: 0000004343c0 (module:gwan, function:??, line:0)
Access Address: 0001004647f0
Registers : EAX=7f000ca04d40 CS=00000033 EIP=0000004343c0 EFLGS=000000010282
EBX=7f000ca08eb0 SS=00000000 ESP=7f000ca04af0 EBP=00000001aeb0
ECX=0000ffffffb0 DS=00000000 ESI=7f000ca04cf0 FS=00000033
EDX=0000000000b0 ES=00000000 EDI=7f000ca04d9c CS=00000033
Module :Function :Line # PgrmCntr(EIP) RetAddress FramePtr(EBP)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

How to get meaningful tracing information?
Leaving the relevant per-thread crash information would surely have helped to give you some insights.
But as you have removed it by hand, I am assuming that you know what you are doing.
Note that running gwan under /home/gwan/gwan -d is not a good idea.
Rather use something like /opt/gwan/gwan -d because the contents of /home are not reachable from other accounts.
And if you are running for days without the parent angel process (-d mode), that is an even worse idea.
That's not G-WAN-specific:
"you should NEVER run Nginx in production with 'master_process off'"

Related

Rust debugging doesn't stop at the breakpoints when debugging stm32f407 via openocd and gdb

I have a problem debugging Rust code on an stm32f407vet6 board: GDB ignores breakpoints.
After I set breakpoints and run the "continue" command in gdb, the program keeps running past all of them.
The only way to stop the program is to interrupt it with Ctrl+C.
The board then halts on the line currently being executed.
I have tried setting breakpoints on every line that accepts one, without success.
$ openocd
Open On-Chip Debugger 0.10.0 (2020-07-01) [https://github.com/sysprogs/openocd]
Licensed under GNU GPL v2
libusb1 09e75e98b4d9ea7909e8837b7a3f00dda4589dc3
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : auto-selecting first available session transport "hla_swd". To override use 'transport select <transport>'.
Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
Info : clock speed 2000 kHz
Error: libusb_open() failed with LIBUSB_ERROR_NOT_SUPPORTED
Info : STLINK V2J35S7 (API v2) VID:PID 0483:3748
Info : Target voltage: 6.436364
Info : stm32f4x.cpu: hardware has 6 breakpoints, 4 watchpoints
Info : starting gdb server for stm32f4x.cpu on 3333
Info : Listening on port 3333 for gdb connections
$ arm-none-eabi-gdb -q target\thumbv7em-none-eabihf\debug\test_blink
Reading symbols from target\thumbv7em-none-eabihf\debug\test_blink...
(gdb) target remote :3333
Remote debugging using :3333
0x00004070 in core::ptr::read_volatile (src=0xe000e010) at C:\Users\User\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib/rustlib/src/rust\src/libcore/ptr/mod.rs:1005
1005 pub unsafe fn read_volatile<T>(src: *const T) -> T {
(gdb) load
Loading section .vector_table, size 0x1a8 lma 0x0
Loading section .text, size 0x47bc lma 0x1a8
Loading section .rodata, size 0xbf0 lma 0x4970
Start address 0x47a2, load size 21844
Transfer rate: 100 KB/sec, 5461 bytes/write.
(gdb) b main
Breakpoint 1 at 0x1f2: file src\main.rs, line 15.
(gdb) continue
Continuing.
Program received signal SIGINT, Interrupt.
0x00001530 in cortex_m::peripheral::syst::<impl cortex_m::peripheral::SYST>::has_wrapped (self=0x1000fc6c)
at C:\Users\User\.cargo\registry\src\github.com-1ecc6299db9ec823\cortex-m-0.6.3\src\peripheral/syst.rs:135
135 pub fn has_wrapped(&mut self) -> bool {
(gdb) bt
#0 0x00001530 in cortex_m::peripheral::syst::<impl cortex_m::peripheral::SYST>::has_wrapped (self=0x1000fc6c)
at C:\Users\User\.cargo\registry\src\github.com-1ecc6299db9ec823\cortex-m-0.6.3\src\peripheral/syst.rs:135
#1 0x00003450 in <stm32f4xx_hal::delay::Delay as embedded_hal::blocking::delay::DelayUs<u32>>::delay_us (self=0x1000fc6c, us=500000)
at C:\Users\User\.cargo\registry\src\github.com-1ecc6299db9ec823\stm32f4xx-hal-0.8.3\src/delay.rs:69
#2 0x0000339e in <stm32f4xx_hal::delay::Delay as embedded_hal::blocking::delay::DelayMs<u32>>::delay_ms (self=0x1000fc6c, ms=500)
at C:\Users\User\.cargo\registry\src\github.com-1ecc6299db9ec823\stm32f4xx-hal-0.8.3\src/delay.rs:32
#3 0x00000318 in test_blink::__cortex_m_rt_main () at src\main.rs:40
#4 0x000001f6 in main () at src\main.rs:15
memory.x file:
MEMORY
{
/* NOTE 1 K = 1 KiBi = 1024 bytes */
/* TODO Adjust these memory regions to match your device memory layout */
/* These values correspond to the LM3S6965, one of the few devices QEMU can emulate */
CCMRAM : ORIGIN = 0x10000000, LENGTH = 64K
RAM : ORIGIN = 0x20000000, LENGTH = 128K
FLASH : ORIGIN = 0x00000000, LENGTH = 512K
}
/* This is where the call stack will be allocated. */
/* The stack is of the full descending type. */
/* You may want to use this variable to locate the call stack and static
variables in different memory regions. Below is shown the default value */
_stack_start = ORIGIN(CCMRAM) + LENGTH(CCMRAM);
/* You can use this symbol to customize the location of the .text section */
/* If omitted the .text section will be placed right after the .vector_table
section */
/* This is required only on microcontrollers that store some configuration right
after the vector table */
/* _stext = ORIGIN(FLASH) + 0x400; */
/* Example of putting non-initialized variables into custom RAM locations. */
/* This assumes you have defined a region RAM2 above, and in the Rust
sources added the attribute `#[link_section = ".ram2bss"]` to the data
you want to place there. */
/* Note that the section will not be zero-initialized by the runtime! */
/* SECTIONS {
.ram2bss (NOLOAD) : ALIGN(4) {
*(.ram2bss);
. = ALIGN(4);
} > RAM2
} INSERT AFTER .bss;
*/
openocd.cfg file:
# Sample OpenOCD configuration for the STM32F3DISCOVERY development board
# Depending on the hardware revision you got you'll have to pick ONE of these
# interfaces. At any time only one interface should be commented out.
# Revision C (newer revision)
source [find interface/stlink.cfg]
# Revision A and B (older revisions)
# source [find interface/stlink-v2.cfg]
source [find target/stm32f4x.cfg]
# use hardware reset, connect under reset
# reset_config none separate
main.rs file:
#![no_main]
#![no_std]
#![allow(unsafe_code)]
// Halt on panic
#[allow(unused_extern_crates)] // NOTE(allow) bug rust-lang/rust#53964
extern crate panic_halt; // panic handler
use cortex_m;
use cortex_m_rt::entry;
use stm32f4xx_hal as hal;
use crate::hal::{prelude::*, stm32};
#[entry]
fn main() -> ! {
if let (Some(dp), Some(cp)) = (
stm32::Peripherals::take(),
cortex_m::peripheral::Peripherals::take(),
) {
let rcc = dp.RCC.constrain();
let clocks = rcc
.cfgr
.sysclk(168.mhz())
.freeze();
let mut delay = hal::delay::Delay::new(cp.SYST, clocks);
let gpioa = dp.GPIOA.split();
let mut l1 = gpioa.pa6.into_push_pull_output();
let mut l2 = gpioa.pa7.into_push_pull_output();
loop {
l1.set_low().unwrap();
l2.set_high().unwrap();
delay.delay_ms(500u32);
l1.set_high().unwrap();
l2.set_low().unwrap();
delay.delay_ms(500u32);
}
}
loop {}
}
Cargo.toml file:
[package]
name = "test_blink"
version = "0.1.0"
authors = ["Alex"]
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
embedded-hal = "0.2"
nb = "0.1.2"
cortex-m = "0.6"
cortex-m-rt = "0.6"
# Panic behaviour, see https://crates.io/keywords/panic-impl for alternatives
panic-halt = "0.2"
cortex-m-log="0.6.2"
[dependencies.stm32f4xx-hal]
version = "0.8.3"
features = ["rt", "stm32f407"]
I am new to rust embedded and maybe I have done something wrong, but I have already tried all the options I can find on the Internet.
At first I thought it was a problem with the cortex-debug plugin for vscode and even created the issue, but the guys couldn't help me because the problem is obviously not on their side.
Debugging "C" code in cubeIDE works, so I dare to assume that the problem is somewhere in rust--gdb--openocd. Perhaps I am missing something, but unfortunately I cannot find it myself yet.
I would appreciate any resources or ideas to solve this problem.
I'm hoping you checked out these resources:
Discovery - debug
From your screen-grab of arm-none-eabi-gdb, it does indeed look like it did not hit the breakpoint.
You should have seen this message afterwards:
Note: automatically using hardware breakpoints for read-only addresses.
Breakpoint 1, main () at ...
Did you compile your source with symbols, and unoptimised?
Your config all looks right to me otherwise.

Connecting to an existing db farm using MonetDBLite C API

I have a DB farm created with a database (e.g. temp). When I try to connect to it using monetdb_startup, I get the following error.
src/gdk/gdk_utils.c:1465: GDKfree: Assertion `(asize & 2) == 0' failed.
Aborted (core dumped)
I'm using the sample application tests/readme/readme.c provided.
monetdb_startup("/dbfarm/temp", 0, 0) is what I'm trying to do.
Monet version being used:
MonetDB 5 server v11.29.3 "Mar2018" (64-bit, 128-bit integers)
Stack trace:
#0 0x0000003f39232495 in raise () from /lib64/libc.so.6
#1 0x0000003f39233c75 in abort () from /lib64/libc.so.6
#2 0x0000003f3922b60e in __assert_fail_base () from /lib64/libc.so.6
#3 0x0000003f3922b6d0 in __assert_fail () from /lib64/libc.so.6
#4 0x00007ffff799bc3c in GDKfree (s=0x19602e0) at src/gdk/gdk_utils.c:1465
#5 0x00007ffff79a8521 in freeException (msg=0x19602e0 '▒' <repeats 88 times>, "▒L\001") at src/mal/mal/mal_exception.c:135
#6 0x00007ffff7b38c09 in SQLupgrades (c=0x7ffff42b2400, m=0x1815460) at src/mal/sqlbackend/sql_upgrades.c:1442
#7 0x00007ffff7b1edb2 in SQLinitClient (c=0x7ffff42b2400) at src/mal/sqlbackend/sql_scenario.c:612
#8 0x00007ffff7404f32 in monetdb_connect () at src/embedded/embedded.c:72
#9 0x00007ffff74055da in monetdb_startup (dbdir=0x7fffffffd7c0 "/dbfarm/temp/", silent=0 '\000', sequential=0 '\000')
at src/embedded/embedded.c:162
Thanks
In general, this use case is not supported. Upgrading MonetDBLite databases between versions should work fine, but moving a database from MonetDBLite to MonetDB and back is likely to produce errors or a crash.

linux systemtap register error

I use SystemTap to probe slab memory allocation activity.
#! /usr/bin/env stap
global slabs
probe vm.kmem_cache_alloc {
    slabs[execname(), bytes_req] <<< 1
}
probe timer.ms(10000)
{
    dummy = "";
    foreach ([name, bytes] in slabs) {
        if (dummy != name)
            printf("\nProcess:%s\n", name);
        printf("Slab_size:%d\tCount:%d\n", bytes, @count(slabs[name, bytes]));
        dummy = name;
    }
    delete slabs
    printf("\n-------------------------------------------------------\n\n")
}
But stap produces the following warnings:
[root@svr_test5 ~]# stap -v -u vm.tracepoints.stp
Pass 1: parsed user script and 85 library script(s) using 146832virt/23712res/3012shr/21396data kb, in 140usr/10sys/152real ms.
Pass 2: analyzed script: 3 probe(s), 111 function(s), 3 embed(s), 13 global(s) using 228472virt/45000res/4760shr/41696data kb, in 300usr/150sys/488real ms.
Pass 3: translated to C into "/tmp/stap7FrdOq/stap_1d0a8db65ecd4c9f56be318001d197c0_39617_src.c" using 226240virt/47000res/6800shr/41696data kb, in 10usr/0sys/36real ms.
Pass 4: compiled C into "stap_1d0a8db65ecd4c9f56be318001d197c0_39617.ko" in 1360usr/160sys/1546real ms.
Pass 5: starting run.
WARNING: probe kernel.function("kmem_cache_alloc@mm/slab.c:3269").call (address 0xffffffff8000ac24) registration error (rc -84)
WARNING: probe kernel.function("kmem_cache_alloc@mm/slab.c:3269").return (address 0xffffffff8000ac24) registration error (rc -84)
I guess this means the probes were not registered, so the script has no effect.
My os :
CentOS release 5.8 (Final)
kernel :
Linux svr_test5 2.6.18-308.el5 #1 SMP Tue Feb 21 20:06:06 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
So what does the WARNING mean, and how can I fix it?
WARNING: probe [...] registration error (rc -84)
This is an indication of a kernel kprobe error EILSEQ, which is issued when the kernel is unable to decode/confirm the binary instruction sequence at the requested address.
For systemtap 1.8 (last version officially updated for RHEL5) against a RHEL5.11 kernel (2.6.18-400), it happens to work; perhaps kprobes improvements did the job.
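As a quick check of the errno mapping mentioned above, the following Python sketch turns a negative kernel-style return code into its symbolic name. The helper name decode_rc is hypothetical, and errno numbering is platform-specific: 84 is EILSEQ on Linux, which is what the explanation above assumes.

```python
import errno
import os


def decode_rc(rc: int) -> str:
    """Translate a negative kernel-style return code into 'NAME (message)'."""
    code = abs(rc)
    # errno.errorcode maps numeric values to names for the current platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


if __name__ == "__main__":
    # On Linux this is expected to name EILSEQ; other platforms number errnos differently.
    print(decode_rc(-84))
```

Running this on the machine that produced the warning avoids guessing which errno table applies.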

nt!KeWaitForSingleObject without Args

I'm currently trying to debug a system deadlock and I'm having a hard time understanding this.
Child-SP RetAddr : Args to Child : Call Site
fffff880`035cb760 fffff800`02ecef72 : 00000000`00000002 fffffa80`066e8b50 00000000`00000000 fffffa80`066a16e0 : nt!KiSwapContext+0x7a
fffff880`035cb8a0 fffff800`02ee039f : fffffa80`0b9256b0 00000000`000007ff 00000000`00000000 00000000`00000000 : nt!KiCommitThreadWait+0x1d2
fffff880`035cb930 fffff880`0312a5e4 : 00000000`00000000 fffff800`00000000 fffffa80`079a3c00 00000000`00000000 : nt!KeWaitForSingleObject+0x19
Why would the first argument for KeWaitForSingleObject be null?
Unless I'm misunderstanding, isn't the first argument the object being waited on?
Is the deadlock simply that this thread is waiting on nothing or is this ordinary behavior?
Additionally I see another process (services.exe) showing a similar stack trace:
1: kd> .thread fffffa800d406b50
Implicit thread is now fffffa80`0d406b50
1: kd> kv
*** Stack trace for last set context - .thread/.cxr resets it
Child-SP RetAddr : Args to Child : Call Site
fffff880`09ed4800 fffff800`02ecef72 : fffffa80`0d406b50 fffffa80`0d406b50 00000000`00000000 fffff8a0`00000000 : nt!KiSwapContext+0x7a
fffff880`09ed4940 fffff800`02ee039f : 00000000`000000b4 fffffa80`0b1df7f0 00000000`0000005e fffff800`031ae5e7 : nt!KiCommitThreadWait+0x1d2
fffff880`09ed49d0 fffff800`031d1e3e : fffffa80`0d406b00 00000000`00000006 00000000`00000001 00000000`093bf000 : nt!KeWaitForSingleObject+0x19f
fffff880`09ed4a70 fffff800`02ed87d3 : fffffa80`0d406b50 00000000`77502410 fffff880`09ed4ab8 fffffa80`0b171a50 : nt!NtWaitForSingleObject+0xde
Is this thread waiting on itself essentially?
You're debugging a 64-bit process.
Remember the x64 calling convention, which is explained here. The first 4 arguments are passed in registers. After that, arguments are pushed onto the stack.
Unfortunately, kv blindly displays the stack arguments. In fact, it would be quite difficult (and sometimes impossible) for it to determine what the first 4 arguments actually were at the time of the call since they may not have been stored anywhere that can ever be recovered.
So, you are looking at the 5th argument to nt!KeWaitForSingleObject, where a nullptr is a pretty typical argument for a Timeout.
Luckily for us debugging types, all is not lost! There is a windbg extension which does its best to reconstruct the arguments when the function was called. The extension is called CMKD. You can place the extension DLL in your winext folder and call it like so:
0:000> !cmkd.stack -p
Call Stack : 7 frames
## Stack-Pointer Return-Address Call-Site
00 000000a408c7fb28 00007ffda95b1148 ntdll!NtWaitForSingleObject+a
Parameter[0] = 0000000000000034
Parameter[1] = 0000000000000000
Parameter[2] = 0000000000000000
Parameter[3] = (unknown)
01 000000a408c7fb30 00007ff7e44c13f1 KERNELBASE!WaitForSingleObjectEx+98
Parameter[0] = 0000000000000034
Parameter[1] = 00000000ffffffff
Parameter[2] = 0000000000000000
Parameter[3] = 00007ff7e44cba28
02 000000a408c7fbd0 00007ff7e44c3fed ConsoleApplication2!main+41
Parameter[0] = (unknown)
Parameter[1] = (unknown)
Parameter[2] = (unknown)
Parameter[3] = (unknown)
Notice that it does not always succeed at finding the argument, as some of them are (unknown). But, it does a pretty good job and can be an invaluable tool when debugging 64-bit code.
This looks like a 64-bit OS, and therefore the calling convention is not to pass all the parameters on the stack. Rather, the first four parameters are passed in RCX, RDX, R8, and R9, with the remaining parameters on the stack. So if you catch the call to KeWaitForSingleObject, it's easy to see what's in RCX and go from there. Once you are a few stack frames beyond that, it's much harder to tell, since something else will have been loaded into that register. The original value is probably stored somewhere, but it will be difficult to find.
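To make the register-versus-stack split concrete, here is a small Python sketch of where the Microsoft x64 convention places integer or pointer arguments at function entry. The name arg_location is made up for illustration. The offsets assume the standard layout at entry (return address at [RSP], followed by the 32 bytes of home space the caller reserves for the four register arguments), and floating-point arguments, which travel in XMM0 to XMM3, are ignored.

```python
INT_ARG_REGISTERS = ("RCX", "RDX", "R8", "R9")


def arg_location(index: int) -> str:
    """Return where the index-th (0-based) integer/pointer argument lives at function entry."""
    if index < 0:
        raise ValueError("argument index must be non-negative")
    if index < len(INT_ARG_REGISTERS):
        return INT_ARG_REGISTERS[index]
    # [RSP] holds the return address; every slot is 8 bytes, and the home-space
    # slots for the four register arguments sit below the fifth argument.
    offset = 8 + 8 * index
    return f"[RSP+0x{offset:X}]"


if __name__ == "__main__":
    for i in range(6):
        print(i, arg_location(i))
```

For KeWaitForSingleObject, the object pointer is argument 0 and so arrives in RCX; only the fifth argument (Timeout) is passed on the stack, which is why a stack dump alone does not show the object being waited on.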

Break at program start on OS X?

How can I break at the very start of a program on OS X (10.6) without debug symbols?
I'm debugging an issue where my machine hangs and can't do certain things (at least anything involving networking). The programs I can use to try to identify the hang also hang on launch, so I'd like to start a program but not actually run it until the hang occurs, in the hopes that either the program runs or the place it hangs helps me to diagnose the issue.
I tried just setting breakpoints at the addresses that show up in the backtrace, but execution didn't stop.
Breakpoint 2, 0x000000010005cc78 in write$NOCANCEL ()
(gdb) bt
#0 0x000000010005cc78 in write$NOCANCEL ()
#1 0x000000010005cc74 in __swrite ()
#2 0x000000010005cbfd in _swrite ()
#3 0x000000010005cb42 in __sflush ()
#4 0x0000000100061361 in __swbuf ()
#5 0x0000000100093474 in putchar ()
#6 0x0000000100003ce7 in ?? ()
#7 0x000000010000090c in ?? ()
(gdb) b *0x000000010000090c
Breakpoint 3 at 0x10000090c
(gdb) b *0x0000000100003ce7
Breakpoint 4 at 0x100003ce7
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: [...]
Breakpoint 2, 0x000000010005cc78 in write$NOCANCEL ()
(gdb) bt
#0 0x000000010005cc78 in write$NOCANCEL ()
#1 0x000000010005cc74 in __swrite ()
#2 0x000000010005cbfd in _swrite ()
#3 0x000000010005cb42 in __sflush ()
#4 0x0000000100061361 in __swbuf ()
#5 0x0000000100093474 in putchar ()
#6 0x0000000100003ce7 in ?? ()
#7 0x000000010000090c in ?? ()
b __dyld__dyld_start works. (Thanks to @kongtomorrow.)
