What is a 'k' packet? - macos

I have this crash log from a Core Audio application I'm developing. I'm currently debugging it, so my question isn't about the crash itself, but about the meaning of the 'k' packet.
What does it mean?
I've read this, and this (about the inferior process), but I'm not sure I understand.
CRASH LOG EXTRACT
26/05/14 15:12:37,469 coreaudiod[170]: Disabled automatic stack shots because audio IO is active
26/05/14 15:13:40,352 com.apple.debugserver-300.2[1587]: Got a 'k' packet, killing the inferior process.
26/05/14 15:13:40,353 com.apple.debugserver-300.2[1587]: Sending ptrace PT_KILL to terminate inferior process.
26/05/14 15:13:40,353 com.apple.debugserver-300.2[1587]: 2 +70.908045 sec [0633/0303]: error: ::ptrace (request = PT_THUPDATE, pid = 0x0634, tid = 0x2003, signal = 0) err = Resource busy (0x00000010)
26/05/14 15:13:40,354 com.apple.debugserver-300.2[1587]: 3 +0.000258 sec [0633/0303]: error: ::task_info ( target_task = 0x1803, flavor = TASK_BASIC_INFO, task_info_out => 0x7fff5a8ecfa0, task_info_outCnt => 10 ) err = (os/kern) invalid argument (0x00000004)
26/05/14 15:13:40,362 coreaudiod[170]: Enabled automatic stack shots because audio IO is inactive
26/05/14 15:13:40,369 _coreaudiod[1607]: audit warning: allsoft
26/05/14 15:13:40,369 _coreaudiod[1606]: audit warning: soft/var/audit
26/05/14 15:13:40,370 _coreaudiod[1608]: audit warning: closefile /var/audit/20140526131229.20140526131340

A 'k' packet is the kill command of the GDB remote serial protocol. The debugger (GDB or LLDB) sends it to the remote stub, here debugserver, which carries it out on the "inferior process", that is, it kills the application being debugged.
The debugger and the stub can run on different machines or on the same one, and the commands exchanged between them are small formatted packets sent over a connection such as a TCP socket, hence the name.
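To make the "formatted packet" idea concrete, here is a minimal Python sketch of the framing used by the GDB remote serial protocol: a `$`, the payload, a `#`, and a two-digit hex checksum (the payload bytes summed modulo 256). The function names are made up for illustration, and the sketch ignores escaping of special characters and acknowledgements.

```python
def frame_packet(payload: str) -> str:
    """Wrap a payload in remote-protocol framing: $<payload>#<checksum>."""
    # The checksum is the sum of the payload bytes modulo 256,
    # written as two lowercase hex digits.
    checksum = sum(payload.encode("ascii")) % 256
    return f"${payload}#{checksum:02x}"


def parse_packet(packet: str) -> str:
    """Return the payload of a framed packet, raising ValueError on a bad frame."""
    if not packet.startswith("$") or len(packet) < 4 or packet[-3] != "#":
        raise ValueError("not a framed packet")
    payload, received = packet[1:-3], packet[-2:]
    if frame_packet(payload)[-2:] != received.lower():
        raise ValueError("checksum mismatch")
    return payload


# The kill request is a packet whose whole payload is the letter k.
kill_packet = frame_packet("k")
print(kill_packet)  # $k#6b
```

So the log line simply reports that debugserver received a packet whose payload was `k` and acted on it.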

Related

Find process where a particular system call returns a particular error

On OS X El Capitan, my log file system.log fills with hundreds of the following lines at times
03/07/2016 11:52:17.000 kernel[0]: hfs_clonefile: cluster_read failed - 34
but there is no indication of the process where this happens. Apart from that, Disk Utility could not find any fault with the file system. But I would still like to know what is going on and it seems to me that dtrace should be perfectly suited to find out that faulty process but I am stuck. I know of the function return probe but it seems to require the PID, e.g.
dtrace -n 'pidXXXX::hfs_clonefile:return { printf("ret: %d", arg1); }'
Is there a way to tell dtrace to probe all processes? And then how would I print the process name?
You can try something like this (I don't have access to an OS X machine to test it)
#!/usr/sbin/dtrace -s
#pragma D option quiet
fbt::hfs_clonefile:return
/ args[ 1 ] != 0 /
{
    printf( "\n========\nprocess: %s, pid: %d, ret value: %d\n", execname, pid, args[ 1 ] );
    /* get kernel and user-space stacks */
    stack( 20 );
    ustack( 20 );
}
For fbt return probes, args[ 1 ] is the value returned by the function.
The DTrace script prints the process name, pid, and return value from hfs_clonefile() whenever the return value is not zero. It also prints the kernel and user-space stack traces. That should be more than enough data for you to find the source of the errors.
Assuming it works on OS X, anyway.
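As a side note, the trailing 34 in the log line looks like an errno value. That is an assumption, since the log does not say so. If it is one, a small Python sketch like the following decodes it; `describe_errno` is a hypothetical helper name, and the message text returned by `os.strerror` varies between operating systems.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and the system message for an errno value."""
    name = errno.errorcode.get(code, "unknown")
    return f"{code} = {name}: {os.strerror(code)}"


# 34 is the number reported by the kernel log line in the question.
print(describe_errno(34))
```

The same helper applied to 16 gives EBUSY, which matches the "Resource busy" text seen in many system logs.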
You can use the syscall provider rather than the pid provider to do this sort of thing. Something like:
sudo dtrace -n 'syscall::hfs_clonefile*:return /errno != 0/ { printf("ret: %d\n", errno); }'
The above command is a minor variant of what's used within the built-in DTrace-based errinfo utility. You can view /usr/bin/errinfo in any editor to see how it works.
However, there's no hfs_clonefile syscall, at least as far as DTrace is concerned, on my El Capitan (10.11.5) system:
$ sudo dtrace -l -n 'syscall::hfs*:'
ID PROVIDER MODULE FUNCTION NAME
dtrace: failed to match syscall::hfs*:: No probe matches description
Also, unfortunately, the System Integrity Protection feature introduced with El Capitan (OS X 10.11) prevents the syscall provider from tracing system processes. So you would have to disable SIP, which makes your system less secure.

thin using high cpu and not replying to request

After a timeout occurs on thin, the process keeps using a lot of CPU. The only fix is to restart it (I let it run for more than a day).
This is the output of strace:
ruby@localhost:~$ strace -p 17830
Process 17830 attached
brk(0x7cf38000) = 0x7cf38000
brk(0x7d3ac000) = 0x7d3ac000
brk(0x7d655000) = 0x7d655000
brk(0x7de8c000) = 0x7de8c000
brk(0x7e616000) = 0x7e616000
brk(0x7a0c9000) = 0x7a0c9000
... and one similar line every 3 to 4 seconds
Why does this happen? I've seen this on mongrel too. Why does it keep going if the HTTP request has already ended?

Issue with NVIDIA GPU for matrix operations [duplicate]

I have this little nonsense script here which I am executing in MATLAB R2013b:
clear all;
n = 2000;
times = 50;
i = 0;
tCPU = tic;
disp 'CPU::'
A = rand(n, n);
B = rand(n, n);
disp '::Go'
for i = 0:times
CPU = A * B;
end
tCPU = toc(tCPU);
tGPU = tic;
disp 'GPU::'
A = gpuArray(A);
B = gpuArray(B);
disp '::Go'
for i = 0:times
GPU = A * B ;
end
tGPU = toc(tGPU);
fprintf('On CPU: %.2f sec\nOn GPU: %.2f sec\n', tCPU, tGPU);
Unfortunately, after execution I receive a message from Windows saying: "Display driver stopped working and has recovered."
I assume this means that Windows did not get a response from my graphics card's driver, or something similar. The script returned without errors:
>> test
CPU::
::Go
GPU::
::Go
On CPU: 11.01 sec
On GPU: 2.97 sec
But whether or not the GPU runs out of memory, MATLAB cannot use the GPU device again until I restart it. If I don't restart MATLAB, I just receive a message from CUDA:
>> test
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT
> In test at 1
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT
> In test at 1
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT
> In test at 1
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT
> In test at 1
CPU::
::Go
GPU::
Error using gpuArray
An unexpected error occurred during CUDA execution.
The CUDA error was:
the launch timed out and was terminated
Error in test (line 21)
A = gpuArray(A);
Does anybody know how to avoid this issue or what I am doing wrong here?
If needed, my GPU Device:
>> gpuDevice
ans =
CUDADevice with properties:
Name: 'GeForce GTX 660M'
Index: 1
ComputeCapability: '3.0'
SupportsDouble: 1
DriverVersion: 6
ToolkitVersion: 5
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 2.1475e+09
FreeMemory: 1.9037e+09
MultiprocessorCount: 2
ClockRateKHz: 950000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
The key piece of information is this part of the gpuDevice output:
KernelExecutionTimeout: 1
This means that the host display driver is active on the GPU you are running the compute jobs on. The NVIDIA display driver contains a watchdog timer which kills any task which takes more than a predefined amount of time without yielding control back to the driver for screen refresh. This is intended to prevent the situation where a long running or stuck compute job renders the machine unresponsive by freezing the display. The runtime of your MATLAB script is clearly exceeding the display driver watchdog timer limit. Once that happens, the compute context held on the device is destroyed and MATLAB can no longer operate with the device. You might be able to reinitialise the context by calling reset, which I guess will run cudaDeviceReset() under the covers.
There is a lot of information about this watchdog timer on the interweb - for example this Stack Overflow question. The solution for how to modify this timeout is dependent on your OS and hardware. The simplest way to avoid this is to not run CUDA code on a display GPU, or to increase the granularity of your compute jobs so that no single operation has a runtime which exceeds the timeout limit. Or just write faster code...
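The granularity idea can be sketched without a GPU. The following plain Python code is only an illustration, not GPU code and not what MATLAB does internally: it computes the same matrix product as a series of small jobs, each covering a few rows. The function names and the `rows_per_job` parameter are made up for this sketch.

```python
def matmul_rows(a_rows, b):
    """Multiply a block of rows of A by the full matrix B (one small job)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a_rows]


def matmul_in_chunks(a, b, rows_per_job):
    """Compute A * B as a sequence of jobs of at most rows_per_job rows each."""
    result = []
    for start in range(0, len(a), rows_per_job):
        # Each iteration stands in for one short device launch. On a real GPU
        # you would synchronize here so that the display driver gets control
        # back before the next job starts.
        result.extend(matmul_rows(a[start:start + rows_per_job], b))
    return result


a = [[1, 2], [3, 4], [5, 6]]
b = [[7, 8], [9, 10]]
print(matmul_in_chunks(a, b, rows_per_job=2))
```

The result is the same whatever chunk size is chosen; only the duration of each individual job changes, and that duration is what the watchdog timer measures.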

High frequency calls to 'VM Periodic Task Thread'

Running a small Jetty application on a Raspberry Pi, I noticed that after the first access the application keeps burning around 3% CPU. A quick inspection showed that the same is true, though at a lower percentage, on my laptop. Checking with strace, I find a never-ending sequence of
...
12:58:01.999717 clock_gettime(CLOCK_MONOTONIC, {2923, 200177551}) = 0
12:58:01.999864 futex(0x693a0f44, FUTEX_WAIT_BITSET_PRIVATE, 1, {2923, 250177551}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
12:58:02.050090 futex(0x693a0f28, FUTEX_WAKE_PRIVATE, 1) = 0
12:58:02.050236 gettimeofday({1436093882, 50296}, NULL) = 0
12:58:02.050403 gettimeofday({1436093882, 50444}, NULL) = 0
12:58:02.050767 clock_gettime(CLOCK_MONOTONIC, {2923, 251228114}) = 0
...
(This is Java 7 on ubuntu 14.04 with Jetty 9.3.* using an h2 db, just in case this rings any bells for someone.)
I learned that it suffices to capture strace -f -tt -p <pid> -o out.txt, grep for clock_gettime, extract the pid, then sort and uniq -c to find the thread calling clock_gettime most often. Plotting the delta times clearly shows a line at 50 milliseconds. Furthermore, the PID can be found, as the nid in hex, in a thread dump taken with jvisualvm, and it turns out to be the 'VM Periodic Task Thread'. But why so often? This does not seem to be standard JVM behaviour.
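The grep, sort and uniq -c steps, plus the delta times, can also be done in one pass. This is a rough Python sketch that assumes each line of out.txt starts with the thread id followed by an HH:MM:SS.microseconds timestamp, which is the layout strace produces with -f, -tt and -o. The function name is made up, and the sketch ignores "resumed" continuation lines and traces that cross midnight.

```python
import re
from collections import Counter, defaultdict

# Matches lines such as "4242 12:58:01.999717 clock_gettime(CLOCK_MONOTONIC, ...".
LINE = re.compile(r"^(\d+)\s+(\d+):(\d+):(\d+)\.(\d+)\s+clock_gettime\(")


def clock_gettime_stats(lines):
    """Return (calls per thread id, gaps in seconds between calls per thread id)."""
    counts = Counter()
    times = defaultdict(list)
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        tid, h, mi, s, frac = m.groups()
        seconds = int(h) * 3600 + int(mi) * 60 + int(s) + int(frac) / 10 ** len(frac)
        counts[tid] += 1
        times[tid].append(seconds)
    deltas = {tid: [b - a for a, b in zip(ts, ts[1:])] for tid, ts in times.items()}
    return counts, deltas


# Timestamps taken from the trace above; the thread id 4242 is made up.
sample = [
    "4242 12:58:01.999717 clock_gettime(CLOCK_MONOTONIC, {2923, 200177551}) = 0",
    "4242 12:58:02.050236 gettimeofday({1436093882, 50296}, NULL) = 0",
    "4242 12:58:02.050767 clock_gettime(CLOCK_MONOTONIC, {2923, 251228114}) = 0",
]
counts, deltas = clock_gettime_stats(sample)
print(counts.most_common(1), deltas)
```

For a real capture, pass `open("out.txt")` instead of `sample`; the thread id with the highest count is the one to look up as the nid in the thread dump.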

Julia doesn't like it when I add and remove processes without doing any parallel processing

UPDATE: Confirmed as a bug. For more detail, see the link and details provided by @ViralBShah below.
Julia throws a strange error when I add and remove processes (addprocs and rmprocs), but only if I don't do any parallel processing in between. Consider the following example code:
#Set parameters
numCore = 4;
#Add workers
print("Adding workers... ");
addprocs(numCore - 1);
println(string(string(numCore-1), " workers added."));
#Detect number of cores
println(string("Number of processes detected = ", string(nprocs())));
# Do some stuff (COMMENTED OUT)
# XLst = {rand(10, 1) for i in 1:8};
# XMean = pmap(mean, XLst);
#Remove the additional workers
print("Removing workers... ");
rmprocs(workers());
println("Done.");
println("Subroutine complete.");
Note that I've commented out the only code that actually does any parallel processing (the call to pmap). If I run this code on my machine (Julia 0.2.1, Ubuntu 14.04), I get the following output in the console:
Adding workers... 3 workers added.
Number of processes detected = 4
Removing workers... Done.
Subroutine complete.
fatal error on
In [86]: fatal error on 88: ERROR: 87: ERROR: connect: connection refused (ECONNREFUSED)
in yield at multi.jl:1540
connect: connection refused (ECONNREFUSED) in wait at task.jl:117
in wait_connected at stream.jl:263
in connect at stream.jl:878
in Worker at multi.jl:108
in anonymous at task.jl:876
in yield at multi.jl:1540
in wait at task.jl:117
in wait_connected at stream.jl:263
in connect at stream.jl:878
in Worker at multi.jl:108
in anonymous at task.jl:876
The first four lines are printed by my program, and seem to indicate that it runs to completion. But then I get a fatal error. Any ideas?
The most interesting thing about this error is that if I uncomment the code with the call to pmap (i.e., if I actually do some parallel processing), the fatal error goes away.
This issue is being tracked at https://github.com/JuliaLang/julia/issues/7646 and I reproduce the answer by Amit Murthy:
pid 1 does an addprocs(3)
addprocs returns after it has established connections with all 3 new workers.
However, at this time the connections between workers may not have been set up, i.e. from pids 3 -> 2, 4 -> 2 and 4 -> 3.
Now pid 1 calls rmprocs(workers()) , i.e., pids 2, 3 and 4.
As pid 2 exits, the connection attempt in 4 to 2, results in an error.
Since we have redirected the output of pid 4, to the stdout of pid 1, we see the same error printed.
The system is still in a consistent state, though the printing of said error messages may suggest something amiss.
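The step where pid 4 tries to reach pid 2 after it has exited can be imitated with plain sockets. This Python sketch is only an analogy for the sequence above, not Julia's worker code: a listener on the loopback interface plays the exiting worker, and a client that connects too late plays the other worker. The function name is made up.

```python
import socket


def connect_after_peer_exit() -> str:
    """Connect to a port whose listener has already gone away; return the outcome."""
    # Stand-in for worker 2: listen on a free loopback port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    address = listener.getsockname()

    # Stand-in for worker 4. Binding it now guarantees that it uses a
    # different local port than the listener.
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.bind(("127.0.0.1", 0))
    client.settimeout(5)

    # Worker 2 "exits" before worker 4 gets around to connecting.
    listener.close()
    try:
        client.connect(address)
        return "connected"
    except ConnectionRefusedError:
        return "connection refused"
    finally:
        client.close()


print(connect_after_peer_exit())
```

As in the Julia case, the refused connection is reported by the side that connects late, while the side that asked for the shutdown is unaffected.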

Resources