cgroup skb egress ebpf hook - skb->cb[0-4] updated values are reset in tc egress ebpf hook - linux-kernel

I am just updating the skb->cb with constant values (0xfeed) and trying to get the same packet data at egress tc layer ebpf hook. It is all zero. Am i missing something here?
Is there anyway to send meta data between hooks with skb itself (LRU_HASH map may help but having in-band metadata will be faster, I think).
SEC("cgroup/skb")
int cgrp_dump_pkt(struct __sk_buff *skb) {
void *data_end = (void *) (long) skb->data_end;
void *data = (void *) (long) skb->data;
if(data < data_end)
{
skb->cb[0] = 0xfeed;
skb->cb[1] = 0xfeed;
skb->cb[2] = 0xfeed;
skb->cb[3] = 0xfeed;
skb->cb[4] = 0xfeed;
bpf_printk("hash:%x mark:%x pri:%x", skb->hash, skb->mark, skb->priority);
bpf_printk("cb0:%x cb1:%x cb2:%x", skb->cb[0], skb->cb[1], skb->cb[2]);
bpf_printk("cb3:%x cb4:%x", skb->cb[3], skb->cb[4]);
}
return 1;
}
The above hook is attached to a docker instance cgroup and I am trying to see the same packets by attaching the hook in tc egress (will just print the packet info includes cb[0-4]). cb[0-4] values are shown as 0. Is it an expected behavior? How to pass the metadata between ebpf hook from cgroup skb to tc egress or lower layers?

I've only ever used the cb fields to pass data across BPF tail calls. To pass data from one hook point to another in the stack, you could use skb->mark (32-bit value).
Do make sure that no other software is using the same bits you want to use though. See https://github.com/fwmark/registry for example.

Related

AVFormatContext: interrupt callback proper usage?

AVFormatContext's interrupt_callback field is a
Custom interrupt callbacks for the I/O layer.
It's type is AVIOInterruptCB, and it explains in comment section:
Callback for checking whether to abort blocking functions.
AVERROR_EXIT is returned in this case by the interrupted function. During blocking operations, callback is called with opaque as parameter. If the callback returns 1, the blocking operation will be aborted.
No members can be added to this struct without a major bump, if new elements have been added after this struct in AVFormatContext or AVIOContext.
I have 2 questions:
what does the last section means? Especially "without a major bump"?
If I use this along with an RTSP source, when I close the input by avformat_close_input, the "TEARDOWN" message is being sent out, however it won't reach the RTSP server.
For 2: here is a quick pseudo-code for demo:
int pkts = 0;
bool early_exit = false;
int InterruptCallback(void* ctx) {
return early_exit ? 1 : 0;
}
void main() {
ctx = avformat_alloc_context
ctx->interrupt_callback.callback = InterruptCallback;
avformat_open_input
avformat_find_stream_info
pkts=0;
while(!early_exit) {
av_read_frame
if (pkts++ > 100) early_exit=true;
}
avformat_close_input
}
In case I don't use the interrupt callback at all, TEARDOWN is being sent out, and it also reaches the RTSP server so it can actually tear down the connection. Otherwise, it won't tear down it, and I have to wait until TCP socket times out.
What is the proper way of using this interrupt callback?
It means that they are not going to change anything for this structure (AVIOInterruptCB). However, if thats the case it would be in a major bump (major change from 4.4 eg to 5.0)
You need to pass a meaningful parameter to void* ctx. Anything that you like so you can check it within the static function. For example a bool that you will set as cancel so you will interrupt the av_read_frame (which will return an AVERROR_EXIT). Usually you pass a class of your decoder context or something similar which also holds all the info that you required to check whether to return 1 to interrupt or 0 to continue the requests properly. A real example would be that you open a wrong rtsp and then you want to open another one (the right one) so you need to cancel your previous requests.

Simulating (lazy) NAND memory on Windows

I'm running a firmware simulation in a DLL which has simulated NAND (256MB or 1GB). I want to avoid allocating memory for this on the heap and instead allocate using virtual memory.
The memory initially needs to be cleared to 0xFF (like NAND is). However I don't want to pay for that initialization (nor commit un-accessed pages). So ideally it should only allocate upon access. And I do not need to retain the data following exit of the simulation.
Initial ideas are
VirtualAlloc. Not sure but thinking perhaps could use guard page and then trap the exception on first access. Not sure its ideal that a DLL handles such SEH exceptions? Or is there a better way?
Create a big file that's initialized to 0xFF. Then map view of file with copy-on-write.
Anyone know if it is possible to create a file with a callback for providing the initial data?
Think probably 1) the way to go but wondering if that's really the best option.
Edit:
3) I've come up with another method that can avoid exception handler and also avoids creating a huge file:
Create a file that is same size as dwAllocationGranularity (64KiB typically). Fill with 0xFF. Then create multiple copy-on-write views of that in contiguous memory using MapViewOfFileEx + FILE_MAP_COPY (after an initial VirtualAlloc/VirtualFree to get a suitable base address that we can hope to allocate juxtapositioned views). Need to test this a bit more fully - slight concern about potential thread races.. I'm ony actually using a single thread but the CRT does start a few too.
This means that any code that only reads the virtual NAND also does not result in all pages getting committed.
yes, basically 1 is best solution. only i be do next changes - use VEH instead SEH - SEH handler will be called only if you access memory inside it, when in case VEH - access can be ai any context and thread. and instead use guard page, i be initial only reserve region of memory without real allocation. so any access to memory region lead to exception, you handle it in VEH - commit memory and fill with 0xFF pattern. demo code
PVOID g_NandBegin;
SIZE_T g_NandSize = 0x1000000;
LONG NTAPI Vex(::PEXCEPTION_POINTERS ExceptionInfo)
{
::PEXCEPTION_RECORD ExceptionRecord = ExceptionInfo->ExceptionRecord;
if (ExceptionRecord->ExceptionCode == STATUS_ACCESS_VIOLATION &&
ExceptionRecord->NumberParameters > 1)
{
PVOID pv = (PVOID)ExceptionRecord->ExceptionInformation[1];
if ((ULONG_PTR)pv - (ULONG_PTR)g_NandBegin < g_NandSize)
{
SIZE_T RegionSize = 1;
if (0 <= NtAllocateVirtualMemory(NtCurrentProcess(), &pv, 0, &RegionSize, MEM_COMMIT, PAGE_READWRITE))
{
RtlFillMemoryUlong(pv, RegionSize, MAXULONG);
return EXCEPTION_CONTINUE_EXECUTION;
}
}
}
return EXCEPTION_CONTINUE_SEARCH;
}
void dc()
{
if (PVOID pv = AddVectoredExceptionHandler(TRUE, Vex))
{
if (g_NandBegin = VirtualAlloc(0, g_NandSize, MEM_RESERVE, PAGE_READWRITE))
{
ULONG seed = ~GetTickCount();
int n = 0x100;
do
{
if (*(UCHAR*)((PBYTE)g_NandBegin + (((ULONG64)RtlRandomEx(&seed) * g_NandSize) >> 32)) != 0xFF)
{
__debugbreak();
}
} while (--n);
VirtualFree(g_NandBegin, 0, MEM_RELEASE);
}
RemoveVectoredExceptionHandler(pv);
}
}

Interrupt performance on linux kernel with RT patches - should be better?

I have bumped into a bit inconsistent IRQ/ISR performance on Freescales imx.233 running linux kernel (3.8.13) with CONFIG_PREEMPT_RT patches.
I am little bit surprised why this processor (ARM9, 454mhz) is unable to keep up even with 74kHz IRQ requests.. ?
In my kernel config I have set following flags:
CONFIG_TINY_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT=y
CONFIG_PREEMPT_RT_BASE=y
CONFIG_HAVE_PREEMPT_LAZY=y
CONFIG_PREEMPT_LAZY=y
CONFIG_PREEMPT_RT_FULL=y
CONFIG_PREEMPT_COUNT=y
CONFIG_DEBUG_PREEMPT=y
On the system there is basically nothing running (created by buildroot), and I set PWM to generate a pulse of 74kHz, that serves as interrupt.
Then in the ISR, I just trigger another GPIO output pin, and check the output.
What I find is that sometimes I miss an interrupt -
You can see the missed interrupt here:
And also the the triggering of output pin seems to be a bit inconsistent, the output pin is triggered usually within "5% window", that might still be acceptable. But I worry, that when I start implementing data transfer logic, instead of just triggering the pin, I might run into further problems...
My simple driver code looks like this:
#needed includes
uint16_t INPUT_IRQ = 39;
uint16_t OUTPUT_GPIO = 38;
struct test_device *device;
//Prototypes
void irqtest_exit(void);
int irqtest_init(void);
void free_device(void);
//Default functions
module_init(irqtest_init);
module_exit(irqtest_exit);
//triggering flag
uint16_t pulse = 0x1;
irqreturn_t irq_handle_function(int irq, void *device_id)
{
pulse = !pulse;
gpio_set_value(OUTPUT_GPIO, pulse);
return IRQ_HANDLED;
}
struct test_device {
int huuhaa;
};
void free_device() {
if (device)
kfree(device);
}
int irqtest_init(void) {
int result = 0;
device = kmalloc(sizeof *device, GFP_KERNEL);
device->huuhaa = 10;
printk("IRB/irqtest_init: Inserting IRQ module\n");
printk("IRB/irqtest_init: Requesting GPIO (%d)\n", INPUT_IRQ);
result = gpio_request_one(INPUT_IRQ, GPIOF_IN, "PWM input");
if (result != 0) {
free_device();
printk("IRB/irqtest_init: Failed to set GPIO (%d) as input.. exiting\n", INPUT_IRQ);
return -EINVAL;
}
result = gpio_request_one(OUTPUT_GPIO, GPIOF_OUT_INIT_LOW , "IR OUTPUT");
if (result != 0) {
free_device();
printk("IRB/irqtest_init: Failed to set GPIO (%d) as output.. exiting\n", OUTPUT_GPIO);
return -EINVAL;
}
//Set our desired interrupt line as input
result = gpio_direction_input(INPUT_IRQ);
if (result != 0) {
printk("IRB/irqtest_init: Failed to set IRQ as input.. exiting\n");
free_device();
return -EINVAL;
}
//Set flags for our interrupt, guessing here..
irq_flags |= IRQF_NO_THREAD;
irq_flags |= IRQF_NOBALANCING;
irq_flags |= IRQF_TRIGGER_RISING;
irq_flags |= IRQF_NO_SOFTIRQ_CALL;
//register interrupt
result = request_irq(gpio_to_irq(INPUT_IRQ), irq_handle_function, irq_flags, "irq testing", device);
if (result != 0) {
printk("IRB/irqtest_init: Failed to reserve GPIO 38\n");
return -EINVAL;
}
printk("IRB/irqtest_init: insert success\n");
return 0;
}
void irqtest_exit(void) {
if (device)
kfree(device);
gpio_free(INPUT_IRQ);
gpio_free(OUTPUT_GPIO);
printk("IRB/irqtest_exit: Removing irqtest module\n");
}
int irqtest_open(struct inode *inode, struct file *filp) {return 0;}
int irqtest_release(struct inode *inode, struct file *filp) {return 0;}
In the system, I have following interrupts registered, after the driver is loaded:
# cat /proc/interrupts
CPU0
16: 36379 - MXS Timer Tick
17: 0 - mxs-spi
18: 2103 - mxs-dma
60: 0 gpio-mxs irq testing
118: 0 - mxs-spi
119: 0 - mxs-dma
120: 0 - RTC alarm
124: 0 - 8006c000.serial
127: 68050 - uart-pl011
128: 151 - ci13xxx_imx
Err: 0
I wonder if the flags I declare to my IRQ are good ? I noticed that with this configuration, I can no longer reach console, so kernel seems totally consumed with servicing this 74kHz trigger now.. this can't be right ?
I suppose it's not a big deal for me since this is only during data transfer, but still I feel I'm doing something wrong..
Also, I wonder if it would be more efficient to map the registers with ioremap, and trigger the output with direct memory writes ?
Is there some way I could increase the priority of the interrupt even higher ? Or could I somehow lock the kernel for the duration of the data transfer (~400ms), and generate somehow else my timing for the output ?
Edit: Forgot to add /proc/interrupts output to the question...
What you experience here is interrupt jitter. This is to be expected on Linux, because the kernel regularly disables the interrupts for various tasks (entering a spinlock, handling an interrupt, etc.).
This will happen, regardless wether you have PREEMPT_RT or not, so expecting to generate 74kHz signal with regular interrupts is pretty much unrealistic.
Now, ARM has higher priority interrupts called FIQs, that will never be masked or disabled.
Linux doesn't use FIQ, and is not built to deal with the fact that an FIQ could be used, so you won't be able to use the generic kernel framework.
From Linux driver development point of view however, it's not really different as long as you keep this in mind: you have to write a handler, and associate it to an IRQ. You'll also have to poke into the interrupt controller to make it generate a FIQ for the interrupt you want to use (the details on how to change it are platform-dependant. Some platforms have functions to do that (like imx25 and mxc_set_irq_fiq), some others don't. imx23/28 don't, so you'll have to do it by hand).
The only thing that the functions to setup a fiq handler only work with a assembly-written handler, so you'll have to rewrite your handler in assembly (with your current code, it should be trivial though).
You can grab additional details to the blog post Alexandre posted (http://free-electrons.com/blog/fiq-handlers-in-the-arm-linux-kernel/), where you'll find working code, samples, and explanations on how it all works together.
You can have a look at what my colleague Maxime Ripard did using an FIQ on a similar SoC (i.mx28) :
http://free-electrons.com/blog/fiq-handlers-in-the-arm-linux-kernel/
Try this flags:
int irq_flags;
...
irq_flags = IRQF_TRIGGER_RISING | IRQF_EARLY_RESUME
I had a kernel 3.8.11 and can't find IRQF_NO_SOFTIRQ_CALL define. It's only for 3.8.13?
Also I didn't see irq_flags define. Where is it?

What's the correct method for CoreAudio realtime thread to communicate with UI thread?

I need to pass data between CoreAudio's realtime thread and the UI thread (one way, RT->UI). I know I can't use any Cocoa/Objective C methods like performSelectorOnMainThread or NSNotification and I can't use anything that will allocate memory as this will potentially block the RT thread.
What is the correct method for communicating between threads? Can I use GCD message queues or is there a more basic system to use?
Edit:
Thinking about this a bit more, I suppose I could use a lock free ring buffer, which the RT thread puts a message into, and the UI thread checks for messages to pull out. Is this the best way and if so is there a system already to do this in CoreAudio or available elsewhere or do I need to code it up myself?
It turns out this was a lot simpler than I expected and the solution I came up with was just to use the Portaudio ring buffer. I needed to add pa_ringbuffer.[ch] and pa_memorybarrier.h to my project and then define a MessageData structure to store in the ring buffer.
typedef struct MessageData {
MessageType type;
union {
struct {
NSUInteger position;
} position;
} data;
} MessageData;
Then I allocated some space to store 32 messages and created the ring buffer.
_playbackData->RTToMainBuffer = malloc(sizeof(MessageData) * 32);
PaUtil_InitializeRingBuffer(&_playbackData->RTToMainRB, sizeof(MessageData),
32, _playbackData->RTToMainBuffer);
Finally I started an NSTimer for every 20ms to pull data from the ring buffer
while (PaUtil_GetRingBufferReadAvailable(&_playbackData->RTToMainRB)) {
MessageData *dataPtr1, *dataPtr2;
ring_buffer_size_t sizePtr1, sizePtr2;
// Should we read more than one at a time?
if (PaUtil_GetRingBufferReadRegions(&_playbackData->RTToMainRB, 1,
(void *)&dataPtr1, &sizePtr1,
(void *)&dataPtr2, &sizePtr2) != 1) {
continue;
}
// Parse message
switch (dataPtr1->type) {
case MessageTypeEOS:
break;
case MessageTypePosition:
break;
default:
break;
}
PaUtil_AdvanceRingBufferReadIndex(&_playbackData->RTToMainRB, 1);
}
Then in the realtime thread, pushing a message to the ringbuffer was simply
MessageData *dataPtr1, *dataPtr2;
ring_buffer_size_t sizePtr1, sizePtr2;
if (PaUtil_GetRingBufferWriteRegions(&data->RTToMainRB, 1,
(void *)&dataPtr1, &sizePtr1,
(void *)&dataPtr2, &sizePtr2)) {
dataPtr1->type = MessageTypePosition;
dataPtr1->data.position.position = currentPosition;
PaUtil_AdvanceRingBufferWriteIndex(&data->RTToMainRB, 1);
}
A ringbuffer is a good solution. Two if you need to communicate both ways ie. inbox/outbox message passing.
This is a good implementation for iOS/Mac if you don't want to use Portaudio.
https://github.com/michaeltyson/TPCircularBuffer

corrupted pointer in 'net_device'

the device driver I'm working on is implementing a virtual device. The logic
is as follows:
static struct net_device_ops virt_net_ops = {
.ndo_init = virt_net_init,
.ndo_open = virt_net_open,
.ndo_stop = virt_net_stop,
.ndo_do_ioctl = virt_net_ioctl,
.ndo_get_stats = virt_net_get_stats,
.ndo_start_xmit = virt_net_start_xmit,
};
...
struct net_device *dev;
struct my_dev *virt;
dev = alloc_netdev(..);
/* check for NULL */
virt = netdev_priv(dev);
dev->netdev_ops = &virt_net_ops;
SET_ETHTOOL_OPS(dev, &virt_ethtool_ops);
dev_net_set(dev, net);
virt->magic = MY_VIRT_DEV_MAGIC;
ret = register_netdev(dev);
if (ret) {
printk("register_netdev failed\n");
free_netdev(dev);
return ret;
}
...
What happens is that somewhere somehow the pointer net_device_ops in
'net_dev' gets corrupted, i.e.
1) create the device the first time (allocated net_dev, init the fields
including net_device_ops,which is
initialized with a static structure containing function pointers), register
the device with the kernel invoking register_netdev() - OK
2) attempt to create the device with the same name again, repeat the above
steps, call register_netdev() which will return negative and we
free_netdev(dev) and return error to the caller.
And between these two events the pointer to net_device_ops has changed,
although nowhere in the code it is done explicitly except the initialization
phase.
The kernel version is 2.6.31.8, platform MIPS. Communication channel between the user space and the kernel is implemented via netlink sockets.
Could anybody suggest what possibly can go wrong?
Appreciate any advices, thanks.
Mark
"The bug is somewhere else. "
The second device should not interact with the existing one. If you register_netdev with an existing name, nevertheless the ndo_init virtual function is called first before the condition is detected and -EEXIST is returned. Maybe your init function does something nasty involving some global variables. (For example, does the code assume there is one device, and stash a global pointer to it during initialization?)

Resources