See what causes deadlock on pthread_mutex_lock - Xcode

I have a Core Data iOS app that uses private queue concurrency for background work. I'm getting a deadlock that makes the UI freeze from time to time (fairly regularly, to be honest), but all the info I get from the debugger (LLDB) is that it is stuck on pthread_mutex_lock. The stack trace goes no deeper than that, which makes debugging nearly impossible:
thread #1: tid = 0x2503, 0x3b5060fc libsystem_kernel.dylib`__psynch_mutexwait + 24, stop reason = signal SIGSTOP
frame #0: 0x3b5060fc libsystem_kernel.dylib`__psynch_mutexwait + 24
frame #1: 0x3b44f128 libsystem_c.dylib`pthread_mutex_lock + 392
The Xcode process pane similarly shows only those two entries on the stack.
I'm quite new to this multithreading stuff so am at a total loss where to begin with fixing the issue. Any suggestions for how to go about debugging this?

Your stack is obviously longer than two frames; you can't start a thread with pthread_mutex_lock. So the truncation of the backtrace is pretty clearly just a bug in the LLDB unwinder. If you have an ADC account, please file a bug about this at bugreporter.apple.com. Also, if you're not using the most recent version of LLDB you can get your hands on, you might want to try that; maybe it fixed whatever bug you are seeing. You can install multiple Xcodes side by side, so you don't have to remove the one you are currently using to try a newer one.
You might also try another tool that will give you a backtrace (e.g. the Instruments time profiler) when your app gets into this state, since it uses a different unwinder. That will at least let you see what the full backtrace is.
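For orientation, here is a minimal C sketch (hypothetical, not the asker's code) of the classic lock-order inversion that produces exactly this picture: two threads, each parked inside pthread_mutex_lock waiting for the mutex the other one holds. Core Data's performBlockAndWait: can set up the same kind of cycle between the main thread and a private queue.
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t b = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    pthread_mutex_lock(&b);
    sleep(1);                 /* widen the race window */
    pthread_mutex_lock(&a);   /* blocks forever: main holds a, wants b */
    pthread_mutex_unlock(&a);
    pthread_mutex_unlock(&b);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    pthread_mutex_lock(&a);
    sleep(1);
    pthread_mutex_lock(&b);   /* blocks forever: worker holds b, wants a */
    pthread_mutex_unlock(&b);
    pthread_mutex_unlock(&a);
    pthread_join(t, NULL);
    return 0;
}
With a working unwinder, "thread backtrace all" in LLDB shows both sides of the cycle, which is usually enough to spot the inverted lock order.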

Related

Call to ExAllocatePoolWithTag never returns

I am having some issues with my virtualHBA driver on Windows Server 2016. I ran the HLK crashdump support test, and it failed 3 times out of 10. In those 3 failing tests, the crash dump hangs at 0% while taking a Complete dump, Kernel dump, or minidump.
By kernel debugging my code, I found that the call to ExAllocatePoolWithTag() for buffer allocation never actually returns.
Below is the statement which never returns.
pDeviceExtension->pcmdbuf = (struct mycmdrsp *)
    ExAllocatePoolWithTag(NonPagedPoolCacheAligned,
                          pcmdqSignalSize,
                          (ULONG)'TA1');
I searched the web for this, but everything I found focuses on this function returning NULL; in my case, it never returns at all.
Any help on how to move forward would be highly appreciated.
Thanks in advance.
You can't allocate memory in crash dump mode. You're running at HIGH_LEVEL with interrupts disabled and so you're calling this API at the wrong IRQL.
The typical solution for a hardware adapter is to set the RequestedDumpBufferSize in the PORT_CONFIGURATION_INFORMATION structure during the normal HwFindAdapter call. Then when you're called again in crash dump mode you use the CrashDumpRegion field to get your dump buffer allocation. You then need to write your own "crash dump mode only" allocator to allocate buffers out of this memory region.
It's a huge pain, especially given that it's difficult or impossible to know how much memory you're ultimately going to need. I usually calculate some minimal configuration overhead (e.g. 1 channel, 8 I/O requests at a time, etc.) and then add in a registry-configurable slush. The only benefit is that the environment is stripped down, so you don't need to be in your all-singing, all-dancing configuration.
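To make the "crash dump mode only" allocator concrete, here is a minimal bump-allocator sketch. It assumes you stashed the dump region's base address and length in your device extension when HwFindAdapter ran in crash dump mode; the struct and function names are illustrative, not WDK APIs.
typedef struct _DUMP_ALLOCATOR {
    PUCHAR Base;      /* start of the pre-reserved dump region */
    SIZE_T Size;      /* total bytes in the region */
    SIZE_T Offset;    /* next free byte */
} DUMP_ALLOCATOR;

static PVOID DumpAlloc(DUMP_ALLOCATOR *a, SIZE_T bytes)
{
    /* Keep allocations cache-aligned, as NonPagedPoolCacheAligned would. */
    SIZE_T aligned = (a->Offset + 63) & ~(SIZE_T)63;

    if (aligned > a->Size || bytes > a->Size - aligned)
        return NULL;   /* region exhausted; size your slush accordingly */

    a->Offset = aligned + bytes;
    return a->Base + aligned;
}
There is deliberately no free: the crash dump path runs once and never unloads, so a monotonically advancing offset is enough.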

How to implement a kernel thread that never sleeps?

Problem
I need a kernel thread that is able to work for prolonged periods of time without yielding, basically fully dedicating a CPU core to it on demand:
int my_kthread(void *arg)
{
    while (!kthread_should_stop()) {
        do_some_work();
        if (sleeping_enabled)
            msleep(1000);
        else {
            // What to do here to avoid lockup warnings
            // and ensure system stability?
        }
    }
    return 0;
}
Background
The thread is created like this when the module that I am working on is loaded:
my_task = kthread_run(&my_kthread, (void *)some_data, "My KThread");
set_cpus_allowed(my_task, *cpumask_of(10)); // Pin thread to core #10
and stopped like this when the module is unloaded:
kthread_stop(my_task);
Everything works just fine when sleeping_enabled is true.
Otherwise, soon after the thread is started, the kernel complains of the apparent lockup.
At first, I merely aimed to avoid the various warnings such as
BUG: soft lockup - CPU#10 stuck for 22s!
and
INFO: rcu_sched detected stalls on CPUs/tasks: { 10} (detected by 15, t=30567 jiffies)
since they tend to flood my console with dumps for all >20 cores, and the "lockup" is desired behavior.
I tried poking the watchdog like this:
if (sleeping_enabled)
    msleep(1000);
else
    touch_softlockup_watchdog();
in combination with suppressing RCU stall warnings (echo 1 > /sys/module/rcupdate/parameters/rcu_cpu_stall_suppress)
and pretty much got what I want (a never-sleeping thread that successfully does what I want and no spam in the console).
However, not only does this "solution" feel like cheating, it seems I am completely breaking something by hogging that one core: when unloading the module via rmmod, the whole system freezes. The console starts periodically dumping soft lockups on all cores, with this call trace:
[<ffffffff810c96b0>] ? queue_stop_cpus_work+0xd0/0xd0
[<ffffffff810c9903>] cpu_stopper_thread+0xe3/0x1b0
[<ffffffff8108639a>] ? finish_task_switch+0x4a/0xf0
[<ffffffff8169e654>] ? __schedule+0x3c4/0x700
[<ffffffff81080e98>] ? __wake_up_common+0x58/0x90
[<ffffffff810c9820>] ? __stop_cpus+0x80/0x80
[<ffffffff81077e93>] kthread+0x93/0xa0
[<ffffffff816a9724>] kernel_thread_helper+0x4/0x10
[<ffffffff81077e00>] ? flush_kthread_worker+0xb0/0xb0
[<ffffffff816a9720>] ? gs_change+0x13/0x13
Meanwhile, my kernel thread continues running (as evidenced by some console messages that it prints out every now and then), so it never saw kthread_should_stop() return true.
Unloading did work correctly and stopped the thread before I switched to not sleeping at all. Now, I am unable to make iterative modifications without having to reboot.
Note that I have simplified the description here a lot. I am trying to add such a thread (to poll some hardware registers and log their changes) to a GPU driver, so there may be module-dependent reasons for the freeze on unload. However, this does not change my general question about how to best implement a thread that never sleeps.
I think this is similar to your other question, "whole one core dedicated to single process"; please check the replies there.
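For reference, here is a minimal sketch of the setup described in the question, with one change worth noting: kthread_create() followed by kthread_bind() pins the thread before it ever runs, whereas kthread_run() lets it start on an arbitrary CPU first. do_some_work(), sleeping_enabled, and some_data come from the question; the init function name is illustrative.
#include <linux/kthread.h>
#include <linux/err.h>
#include <linux/delay.h>   /* msleep() */
#include <linux/nmi.h>     /* touch_softlockup_watchdog() */

static struct task_struct *my_task;

static int my_kthread(void *arg)
{
    while (!kthread_should_stop()) {
        do_some_work();
        if (sleeping_enabled)
            msleep(1000);
        else
            touch_softlockup_watchdog(); /* keep the soft-lockup detector quiet */
    }
    return 0;
}

static int __init my_module_init(void)
{
    my_task = kthread_create(my_kthread, some_data, "my_kthread");
    if (IS_ERR(my_task))
        return PTR_ERR(my_task);
    kthread_bind(my_task, 10);   /* pin to core #10 before the first run */
    wake_up_process(my_task);
    return 0;
}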

get_user_pages -EFAULT error caused by VM_GROWSDOWN flag not set

I'm continuing my work on the FPGA driver.
Now I'm adding OpenCL support, so I have the following test.
It simply issues NUM_OF_EXEC write and read requests of the same buffers and then waits for completion.
Each write/read request is serialized in the driver and executed sequentially as a DMA transaction. The DMA-related code can be viewed here.
So the driver takes a transaction, executes it (rsp_setup_dma and fpga_push_data_to_device), waits for an interrupt from the FPGA (fpga_int_handler), releases resources (fpga_finish_dma_write), and begins a new one. When NUM_OF_EXEC equals 1, everything seems to work, but if I increase it, the problem appears. At some point get_user_pages (in rsp_setup_dma) returns -EFAULT. Debugging the kernel, I found out that the allocated vma doesn't have the VM_GROWSDOWN flag set (in find_extend_vma in mmap.c). But at this point I'm stuck, because I'm not sure I understand why this flag is needed, nor do I have any idea why it is not set. Why can get_user_pages fail with these symptoms? How can I debug this?
On some architectures the stack grows up and on others the stack grows down. See hppa and hppa64 for the weirdos that created the need for such a flag.
So whenever you have to deal with setting up the stack for a kernel thread or process you'll have to provide the direction in which the stack grows as well.
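For context, the classic user-page pinning sequence from that kernel era looks roughly like the sketch below. get_user_pages() has changed signature repeatedly across kernel versions, so treat the exact argument list as an assumption; user_buf, nr_pages, and pages are hypothetical names standing in for the driver's own.
down_read(&current->mm->mmap_sem);
ret = get_user_pages(current, current->mm,
                     (unsigned long)user_buf & PAGE_MASK,
                     nr_pages,
                     1,            /* write access */
                     0,            /* no force */
                     pages, NULL); /* no vmas needed */
up_read(&current->mm->mmap_sem);
if (ret < nr_pages) {
    /* -EFAULT typically means part of the range has no valid VMA backing
       it -- e.g. an address just beyond the current stack extent, which
       find_extend_vma() will only grow for a VM_GROWSDOWN vma. */
}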

OpenAL - alSourceQueueBuffers increases AL_BUFFERS_PROCESSED count?

I'm having some difficulty with handling streaming sources in OpenAL on Mac OS X (using the system framework). I'm still not sure what triggers it, but sometimes, after stopping a streaming source and playing it again, queueing a buffer increases the AL_BUFFERS_PROCESSED value. I use a while loop like the following to process the source's buffers:
alGetSourcei(source, AL_BUFFERS_PROCESSED, &processed);
while (processed--)
{
    ALuint buffer;
    // Get a free buffer.
    alSourceUnqueueBuffers(source, 1, &buffer);
    // streamAtomic decodes compressed audio data and calls alBufferData.
    streamAtomic(buffer, decoder);
    alSourceQueueBuffers(source, 1, &buffer);
}
The full source code to the Source class can be found here.
Normally this update loop works fine, but whenever this bug gets triggered, calling alSourceQueueBuffers seemingly increases AL_BUFFERS_PROCESSED, meaning that every update cycle this loop takes longer and longer, until it reaches the total number of buffers queued (32, in this case), where it stays until I pause or stop the source, at which point AL_BUFFERS_PROCESSED resets and promptly begins increasing again. I checked, and the count does decrease by 1 after calling alSourceUnqueueBuffers; it's only after I call alSourceQueueBuffers that the count increases again.
I've been poring over my code, the OpenAL spec, Stack Overflow, the OpenAL mailing list, and Google, and I can't find any documentation of this occurring, nor any indication as to whether I'm doing something wrong or if it's a bug in the OpenAL implementation. For what it's worth, this bug does not occur, using the exact same code, under OpenAL Soft on Windows and Linux. I couldn't get OpenAL Soft working properly on my Mac to test, though.
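(One defensive stopgap, offered as an assumption rather than a confirmed fix: clamp the processed count against AL_BUFFERS_QUEUED before unqueueing, so a runaway counter can't make the loop churn through buffers that were never actually processed.)
ALint processed = 0, queued = 0;
alGetSourcei(source, AL_BUFFERS_PROCESSED, &processed);
alGetSourcei(source, AL_BUFFERS_QUEUED, &queued);
if (processed > queued)
    processed = queued;  /* never trust a count above what we queued */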
Any ideas?

OpenAL on Mac OS X: Setting AL_SAMPLE_OFFSET does nothing

At work, we're unable to use alSourcePause() to pause sounds, and in any case we might want to start a sound with an offset.
We're performing a "resume" by doing alSourcei(this->sourceId, AL_SAMPLE_OFFSET, this->sampleOffset); with a sample offset that we retrieved with alGetSourcei(). We tried using AL_SEC_OFFSET, AL_BYTE_OFFSET and AL_SAMPLE_OFFSET -- to no avail. We have read that the sound source needs to be in the "initial" state; recreating the source and attaching the buffer, then attempting to skip also did not help.
Changing the buffer to skip AL_BYTE_OFFSET is not a solution, since it complicates looping.
Streaming sounds are skipping on slower machines; we're having trouble implementing multithreaded playing.
Since we're on a tight schedule, what is the best way to skip a portion of a simple sound source on OpenAL on OS X?
Source code is available at our Sourceforge repository.
I recently encountered the same problem in our game engine on OS X (10.6.8). We performed the following steps when resuming playback of a static buffer with a given sample offset, in this order:
alSourceQueueBuffers(mSourceId, 1, &mBufferId);
alSourcei(mSourceId, AL_SAMPLE_OFFSET, mSampleOffset);
alSourcePlay(mSourceId);
The source was stopped before that, and all buffers were unqueued. According to the AL 1.1 specs, it should be possible to either
specify the buffer offset when the source is in the stopped state; here, the offset is supposed to be applied upon the next alSourcePlay() call, or
specify the offset on an already playing source, which should result in an immediate skip to the desired position.
(See section 4.3.2 of the official specs at http://connect.creativelabs.com/openal/Documentation/OpenAL%201.1%20Specification.htm )
Reversing the latter two calls in the above sequence (i.e. setting the buffer offset after issuing the alSourcePlay() call) did the trick in our case. Technically, this should be a perfectly valid way to go; however, if the audio thread gets interrupted between those two calls for too long, it could result in audible glitches.
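Concretely, the reordered sequence described above (same field names as before) is:
alSourceQueueBuffers(mSourceId, 1, &mBufferId);
alSourcePlay(mSourceId);
alSourcei(mSourceId, AL_SAMPLE_OFFSET, mSampleOffset); // setting the offset after play is what worked on OS X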