Linux version:
3.14.25-00387-g38b1460 #2 SMP PREEMPT Mon Oct 1 14:26:11 CEST 2018 x86_64 x86_64 x86_64 GNU/Linux
Sometimes kern.log is not found, and I don't know why.
I know there are log-rotation configs, but if the file is not there when we want to collect kern.log, is that an issue?
https://www.syslog-ng.com/syslog-ng-faq/
“If there are multiple processes reading /proc/kmsg, a race condition occurs and the process which loses the race essentially deadlocks until the next kernel message is generated, when a new race condition occurs.”
The above is one possible answer, and we can see on the node that two processes may have a chance to access that kmsg file.
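For illustration, /proc/kmsg is a destructive, single-consumer interface: each read() removes records from the kernel ring buffer, so a second concurrent reader loses whatever the first one drains and then blocks waiting for the next message, exactly as the FAQ describes. A minimal sketch of such a reader (must run as root):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    ssize_t n;

    /* Reading /proc/kmsg consumes log records; only one daemon
     * (e.g. syslog-ng or klogd, never both) should ever do this. */
    int fd = open("/proc/kmsg", O_RDONLY);
    if (fd < 0) {
        perror("open /proc/kmsg");
        return 1;
    }
    while ((n = read(fd, buf, sizeof(buf))) > 0) /* blocks until messages arrive */
        fwrite(buf, 1, (size_t)n, stdout);
    close(fd);
    return 0;
}

If two such processes run at once, each message is delivered to whichever reader wins the race, which is why the losing logger can appear to hang and why kern.log can end up with gaps.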
I am trying to run a simple BPF program that I wrote, but I am not able to run it as a non-root user. Below is the program I am trying to load; it basically gets the pointer to my map whose fd is map_fd (I am not showing the code where I create the map). It works as root but for some reason fails for a non-root user.
Output of uname -a
Linux 5.8.0-38-generic #43~20.04.1-Ubuntu SMP Tue Jan 12 16:39:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
BPF program
BPF_MOV64_IMM(BPF_REG_0, 0),                   /* r0 = 0 */
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = 0 (the key) */
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),          /* r2 = fp */
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),         /* r2 = fp - 4 (pointer to key) */
BPF_LD_MAP_FD(BPF_REG_1, map_fd),              /* r1 = map */
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),         /* if r0 != 0, skip next insn */
BPF_EXIT_INSN(),                               /* lookup failed: return 0 */
BPF_EXIT_INSN(),                               /* return r0, a map-value pointer */
TL;DR. Qeole is correct: you first need to make sure you are using one of the BPF program types allowed for unprivileged users. You also need to check your sysctl settings. Finally, your current program has a pointer leak that should be fixed before it is loaded by an unprivileged user.
Using the correct program type
The kernel allows unprivileged users to load only two types of BPF programs: BPF_PROG_TYPE_SOCKET_FILTER and BPF_PROG_TYPE_CGROUP_SKB. You can see the check for this condition in kernel/bpf/syscall.c.
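For reference, the check in bpf_prog_load() looks approximately like this on a 5.8 kernel (paraphrased from kernel/bpf/syscall.c; the exact form varies between kernel versions):

	if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
	    type != BPF_PROG_TYPE_CGROUP_SKB &&
	    !bpf_capable())
		return -EPERM;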
Setting the proper sysctl
The kernel.unprivileged_bpf_disabled sysctl controls whether unprivileged users can load eBPF programs. It is unfortunately set to 0 (loading allowed) by default on major distributions. Unprivileged loading requires that it stay at 0:
sysctl -w kernel.unprivileged_bpf_disabled=0
Note: If you are not using unprivileged program types, I would strongly recommend setting this sysctl to 1.
Fixing the pointer leaks
Regardless of the above settings, BPF programs loaded by unprivileged users are never allowed to leak pointers to userspace. For example, if the program returns a pointer to a map value, that is considered a leak. That's your case: the second BPF_EXIT_INSN() returns R0 while it still holds the pointer produced by BPF_FUNC_map_lookup_elem.
After the call to BPF_FUNC_map_lookup_elem, if R0 is non-zero, you should overwrite its value (setting it to 1, for example) before returning.
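A minimal sketch of a fixed tail for the program above (same map_fd as before): on a successful lookup, R0 is overwritten with a constant before exiting, so the map-value pointer never reaches userspace.

BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), /* lookup failed: skip the move, return 0 */
BPF_MOV64_IMM(BPF_REG_0, 1),           /* overwrite the pointer with the scalar 1 */
BPF_EXIT_INSN(),                       /* returns 0 or 1, never a pointer */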
I do a variety of different kinds of data analysis and numerical simulation on my custom-built Ubuntu machine using custom-written programs that sometimes must run for days or even weeks. Some of those programs are in Fortran, some in Python, some in C; there is literally zero commonality between them except that they run a long time and do a lot of disk I/O. Most are single-threaded.
The typical execution command line looks like
./myprog &> myprog.log &
If an ordinary runtime error occurs, any buffered program output and the error message both faithfully appear in myprog.log and the logfile is cleanly closed. But what's been happening instead in many cases is that the program simply quits in mid-stream -- usually after half a day to a day or so, without any further output to the log file. It's like the program had been randomly hit with a 'kill -9'.
I don't know why this is happening, and it seems to be specific to this particular machine (I have been doing similar work for 30 years and never experienced this before). The operating system itself seems rock-stable; it has been rebooted only rarely over the past couple years for specific reasons like updates. It's only my longer-running user processes that seem to die abruptly like this with no accompanying diagnostic.
Not being a system-level expert, I'm at a loss for how to diagnose what's going on. Right now, my only option is to regularly check whether my program is still running and restart it if necessary.
System details:
Ubuntu 18.04.4 LTS
Linux kernel: 4.15.0-39-generic
CPU: AMD Ryzen Threadripper 1950x
UPDATE: Since dmesg was mentioned, here are some representative messages, which I have no idea how to interpret. The UFW BLOCK messages are by far the most numerous, but there are also a fair number of the ata6 messages, which seem to have something to do with the SATA hard drive. Could this be relevant?
[5301325.692596] audit: type=1400 audit(1594876149.572:218): apparmor="DENIED" operation="open" profile="/usr/sbin/cups-browsed" name="/usr/share/locale/" pid=19663 comm="cups-browsed" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[5352288.689739] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[5352288.689753] ata6.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 14 pio 16392 in
         Get event status notification 4a 01 00 00 10 00 00 00 08 00
         res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[5352288.689756] ata6.00: status: { DRDY }
[5352288.689760] ata6: hard resetting link
[5352289.161877] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[5352289.166076] ata6.00: configured for PIO0
[5352289.166635] ata6: EH complete
[5353558.066052] [UFW BLOCK] IN=enp5s0 OUT= MAC=10:7b:44:93:2f:58:b4:0c:25:e0:40:12:08:00 SRC=172.105.89.161 DST=144.92.130.162 LEN=40 TOS=0x00 PREC=0x00 TTL=243 ID=50780 PROTO=TCP SPT=58944 DPT=68 WINDOW=1024 RES=0x00 SYN URGP=0
I have a customer who is getting their system log flooded with thousands of copies of this message:
Jul 25 11:21:33 athayer-mbp13 kernel[0]: PSYNCH: pid[52893]: address already known to kernel for another [busy] synchronizer type
The culprit is my app, but I can’t reproduce the problem and don’t have much of a clue to its cause. My app does disk searching, and this error happens about 15 hours into the life of the process. There is no excessive memory usage or file descriptor leakage. The app continues to operate normally, it’s just that these messages cause the system log to blow up to gigabyte proportions and fill up the boot disk.
I found the Darwin kernel code where the message is printed, but it’s only a clue; it doesn’t show the smoking gun:
http://opensource.apple.com//source/xnu/xnu-1699.32.7/bsd/kern/pthread_support.c
FAILEDUSERTEST("address already known to kernel for another (busy) synchronizer type\n");
It’s in this function:
/* find kernel waitqueue, if not present create one. Grants a reference */
int
ksyn_wqfind(user_addr_t mutex, uint32_t mgen, uint32_t ugen, uint32_t rw_wc, uint64_t tid, int flags, int wqtype, ksyn_wait_queue_t * kwqp)
Can anyone provide any insight into what’s going on?
Here’s the profile for the machine:
Model Name: MacBook Pro
Model Identifier: MacBookPro12,1
Processor Name: Intel Core i5
Processor Speed: 2.7 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per Core): 256 KB
L3 Cache: 3 MB
Memory: 8 GB
Boot ROM Version: MBP121.0167.B16
SMC Version (system): 2.28f7
Hardware UUID: 9205D058-90BF-541E-8E61-E75259ABC11F
System Software Overview:
System Version: OS X 10.11.4 (15E65)
Kernel Version: Darwin 15.4.0
Boot Volume: Macintosh HD
Boot Mode: Normal
Computer Name: athayer-mbp13
User Name: System Administrator (root)
Secure Virtual Memory: Enabled
system_integrity: integrity_enabled
Time since boot: 9 days 18:55
Possible Explanation
It's possible that you're being affected by an old kernel bug. If a pthread condition variable (one of the pthread_mutex family of synchronizer objects) is allocated but never waited on, there is a situation in which its record is never removed from a pthreads-internal registry on OS X.
If that happens, and if another mutex is later allocated that happens to end up at the same address in memory, and if that mutex is waited on, this error can occur, since the new mutex's ID will not match the one already present at its address. This is distinct from a reallocation issue, where garbled/meaningless info is found instead of a valid ID.
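A contrived sketch of that lifecycle (hypothetical: whether the second allocation actually reuses the first address is up to the allocator, so this is an illustration rather than a reliable reproducer):

#include <pthread.h>
#include <stdlib.h>

int main(void)
{
    /* A condvar that is signalled but never waited on; on affected
     * kernels its address reportedly stays registered. */
    pthread_cond_t *cv = malloc(sizeof *cv);
    pthread_cond_init(cv, NULL);
    pthread_cond_signal(cv);
    pthread_cond_destroy(cv);
    free(cv);

    /* A different synchronizer type may later land at the same
     * address; when it traps into the kernel (e.g. under contention),
     * the stale record's type no longer matches and the PSYNCH
     * message can be logged. */
    pthread_mutex_t *mu = malloc(sizeof *mu);
    pthread_mutex_init(mu, NULL);
    pthread_mutex_lock(mu);
    pthread_mutex_unlock(mu);
    pthread_mutex_destroy(mu);
    free(mu);
    return 0;
}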
Workaround
The workaround is to ensure that you are calling a wait function on all mutexes/condvars you create. Even a nanosecond wait will trigger "correct" destruction when it completes on a no-longer-used mutex. An example of the fix by the Chromium devs is linked below.
For example, you could wait one nanosecond/tick on a lock thus:
struct timespec time = { .tv_sec = 0, .tv_nsec = 1 };

/* some_lock_handle must already be locked when this is called. */
pthread_cond_timedwait_relative_np(
    &some_condition_handle,
    &some_lock_handle,
    &time
);
Confounding Factors
The kernel bug may not be the real issue. There are a lot of confounding factors here:
The kernel source hasn't been published for 10.10 or 10.11, so the code being called that generates that error may not be the code that you found online.
As a result of that, the kernel bug I mentioned may not still exist, or may not be reachable in the same way.
The error line you published has parens (()) around the word "busy", but the source you found has square brackets ([]). The places in code that print out the two different messages are distinct from each other, so the problem lines might not be the ones you pointed out in your question.
Relevant Links
Article by the first (only?) person who has diagnosed this issue: http://rayne3d.com/blog/02-27-2014-rayne-weekly-devblog-4
The problem is exhibited in the pthread source (or was, in libpthread 105.1.4), visible at this link (search the page for 13782056): https://opensource.apple.com/source/libpthread/libpthread-105.1.4/src/pthread_cond.c
An example fix like the workaround listed above was made by the Chromium team when they were affected by a similar (the same?) issue: https://codereview.chromium.org/1323293005
The original Apple Developer Forum link appears to be defunct, though I might just be unable to access it: https://devforums.apple.com/thread/220316?tstart=0
I'm working on the Linux kernel 3.10 patched with LITMUS^RT, a real-time extension with a focus on multiprocessor real-time scheduling and synchronization.
My aim is to write a scheduler that allows a task to migrate from one CPU to another when it is preempted, and only when particular conditions are met. My current implementation is affected by a deadlock between CPUs, as shown by the following error:
Setting up rt task parameters for process 1622.
[ INFO: inconsistent lock state ]
3.10.5-litmus2013.1 #105 Not tainted
---------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
rtspin/1620 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&rq->lock){?.-.-.}, at: [<ffffffff8155f0d5>] __schedule+0x175/0xa70
{IN-HARDIRQ-W} state was registered at:
[<ffffffff8107832a>] __lock_acquire+0x86a/0x1e90
[<ffffffff81079f65>] lock_acquire+0x95/0x140
[<ffffffff81560fc6>] _raw_spin_lock+0x36/0x50
[<ffffffff8105e231>] scheduler_tick+0x61/0x210
[<ffffffff8103f112>] update_process_times+0x62/0x80
[<ffffffff81071677>] tick_periodic+0x27/0x70
[<ffffffff8107174b>] tick_handle_periodic+0x1b/0x70
[<ffffffff810042d0>] timer_interrupt+0x10/0x20
[<ffffffff810849fd>] handle_irq_event_percpu+0x6d/0x260
[<ffffffff81084c33>] handle_irq_event+0x43/0x70
[<ffffffff8108778c>] handle_level_irq+0x6c/0xc0
[<ffffffff81003a89>] handle_irq+0x19/0x30
[<ffffffff81003925>] do_IRQ+0x55/0xd0
[<ffffffff81561cef>] ret_from_intr+0x0/0x13
[<ffffffff8108615a>] __setup_irq+0x20a/0x4e0
[<ffffffff81086473>] setup_irq+0x43/0x90
[<ffffffff8184fb5f>] setup_default_timer_irq+0x12/0x14
[<ffffffff8184fb78>] hpet_time_init+0x17/0x19
[<ffffffff8184fb46>] x86_late_time_init+0xa/0x11
[<ffffffff8184ecd1>] start_kernel+0x270/0x2e0
[<ffffffff8184e5a3>] x86_64_start_reservations+0x2a/0x2c
[<ffffffff8184e66c>] x86_64_start_kernel+0xc7/0xca
irq event stamp: 8886
hardirqs last enabled at (8885): [<ffffffff8108dd6b>] rcu_note_context_switch+0x8b/0x2d0
hardirqs last disabled at (8886): [<ffffffff81561052>] _raw_spin_lock_irq+0x12/0x50
softirqs last enabled at (8880): [<ffffffff81037125>] __do_softirq+0x195/0x2b0
softirqs last disabled at (8857): [<ffffffff8103738d>] irq_exit+0x7d/0x90
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&rq->lock);
<Interrupt>
lock(&rq->lock);
*** DEADLOCK ***
1 lock held by rtspin/1620:
#0: (&rq->lock){?.-.-.}, at: [<ffffffff8155f0d5>] __schedule+0x175/0xa70
stack backtrace:
CPU: 1 PID: 1620 Comm: rtspin Not tainted 3.10.5-litmus2013.1 #105
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
ffffffff81bc4cc0 ffff88001cdf3aa8 ffffffff8155ae1e ffff88001cdf3af8
ffffffff81557f39 0000000000000000 ffff880000000001 ffff880000000001
0000000000000000 ffff88001c5ec280 ffffffff810750d0 0000000000000002
Call Trace:
[<ffffffff8155ae1e>] dump_stack+0x19/0x1b
[<ffffffff81557f39>] print_usage_bug+0x1f7/0x208
[<ffffffff810750d0>] ? print_shortest_lock_dependencies+0x1c0/0x1c0
[<ffffffff81075ead>] mark_lock+0x2ad/0x320
[<ffffffff81075fd0>] mark_held_locks+0xb0/0x120
[<ffffffff8129bf71>] ? pfp_schedule+0x691/0xba0
[<ffffffff810760f2>] trace_hardirqs_on_caller+0xb2/0x210
[<ffffffff8107625d>] trace_hardirqs_on+0xd/0x10
[<ffffffff8129bf71>] pfp_schedule+0x691/0xba0
[<ffffffff81069e70>] pick_next_task_litmus+0x40/0x500
[<ffffffff8155f17a>] __schedule+0x21a/0xa70
[<ffffffff8155f9f4>] schedule+0x24/0x70
[<ffffffff8155d1bc>] schedule_timeout+0x14c/0x200
[<ffffffff8105e3ed>] ? get_parent_ip+0xd/0x50
[<ffffffff8105e589>] ? sub_preempt_count+0x69/0xf0
[<ffffffff8155ffab>] wait_for_completion_interruptible+0xcb/0x140
[<ffffffff81060e60>] ? try_to_wake_up+0x470/0x470
[<ffffffff8129266f>] do_wait_for_ts_release+0xef/0x190
[<ffffffff81292782>] sys_wait_for_ts_release+0x22/0x30
[<ffffffff81562552>] system_call_fastpath+0x16/0x1b
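If I read the report correctly, this is lockdep's preliminary analysis, and it is not really about two CPUs: it says that &rq->lock is also taken from hardirq context (scheduler_tick() in the trace above), so the lock must never be held with interrupts enabled, yet the trace suggests hardirqs were re-enabled inside pfp_schedule() while __schedule() still held the lock. The interrupt-safe pattern for such a lock would be the usual irqsave one (a generic sketch, with rq standing for the runqueue named in the report, not LITMUS^RT-specific code):

unsigned long flags;

/* A lock that is also taken from interrupt context must only be
 * held with local interrupts disabled; otherwise the interrupt can
 * fire on this CPU and spin forever on the lock we already hold. */
raw_spin_lock_irqsave(&rq->lock, flags);
/* ... runqueue manipulation ... */
raw_spin_unlock_irqrestore(&rq->lock, flags);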
At this point I see two possible approaches to solving this problem:
Release the lock held by the current CPU before migrating to the target CPU, using the kernel's functions. LITMUS^RT provides a callback in which I can decide which task will execute:
static struct task_struct* pfp_schedule(struct task_struct *prev)
{
    [...]
    if (is_preempted(prev)) {
        // release lock on current cpu
        migrate_to_another_cpu();
    }
    [...]
}
What I think I must do is release the current lock before the call to the migrate_to_another_cpu function, but I still haven't found any way of doing that.
The second option is to ignore the warning: the scheduler that I want to implement allows only one task migration at a time, so it should be theoretically impossible to get a deadlock. For some reason, though, the execution of my task set fails with the error listed above, which I think comes from a preliminary analysis performed by the kernel's lock validator (lockdep). This is a potential deadlock, so it's kind of a warning, and I would like to let my task set continue its normal execution, ignoring it.
Long story short, does anyone know if one of these solutions is possible and, if yes, how to implement it?
Thanks in advance!
P.S.: tips or better ideas are very welcome! ;-)
In older versions of linux kernel (e.g. 2.6.11) struct sk_buff contains a pointer to struct sk_buff_head (named list). The 'Understanding Linux Network Internals' book says this pointer is maintained as sk_buffs need to quickly look up the head of the skb list. However I could not find such a member in recent versions of kernel (3.2.1). Can anyone explain how the skb list management has changed in newer kernels?
This changed a long time ago, in 2.6.14 apparently. The kernel commit in question was:
commit 8728b834b226ffcf2c94a58530090e292af2a7bf
Author: David S. Miller <davem@davemloft.net>
Date: Tue Aug 9 19:25:21 2005
[NET]: Kill skb->list
Remove the "list" member of struct sk_buff, as it is entirely
redundant. All SKB list removal callers know which list the
SKB is on, so storing this in sk_buff does nothing other than
taking up some space.
Two tricky bits were SCTP, which I took care of, and two ATM
drivers which Francois Romieu <romieu@fr.zoreil.com> fixed
up.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
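To illustrate the model after this commit: each sk_buff still has its next/prev links, but the owning list head is no longer stored in the skb itself; instead the caller keeps the sk_buff_head and names the list explicitly in every operation. A rough sketch using the standard linux/skbuff.h helpers (kernel-side code):

#include <linux/skbuff.h>

static struct sk_buff_head my_queue;

static void my_queue_init(void)
{
    /* The caller owns and initializes the list head. */
    skb_queue_head_init(&my_queue);
}

static void my_enqueue(struct sk_buff *skb)
{
    /* The list is passed explicitly; the skb no longer records
     * which list it is on. */
    skb_queue_tail(&my_queue, skb);
}

static struct sk_buff *my_dequeue(void)
{
    /* Removal also names the list, e.g. skb_dequeue() here or
     * skb_unlink(skb, &my_queue) for an arbitrary element. */
    return skb_dequeue(&my_queue);
}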