Find process where a particular system call returns a particular error - macos

On OS X El Capitan, my log file system.log feels with hundreds of the following lines at times
03/07/2016 11:52:17.000 kernel[0]: hfs_clonefile: cluster_read failed - 34
but there is no indication of the process where this happens. Apart from that, Disk Utility could not find any fault with the file system. But I would still like to know what is going on and it seems to me that dtrace should be perfectly suited to find out that faulty process but I am stuck. I know of the function return probe but it seems to require the PID, e.g.
dtrace -n 'pidXXXX::hfs_clonefile:return { printf("ret: %d", arg1); }'
Is there a way to tell dtrace to probe all processes? And then how would I print the process name?

You can try something like this (I don't have access to an OS X machine to test it)
#!/usr/sbin/dtrace -s
# pragma D option quiet
fbt::hfs_clonefile:return
/ args[ 1 ] != 0 /
{
printf( "\n========\nprocess: %s, pid: %d, ret value: %d\n", execname, pid, args[ 1 ] );
/* get kernel and user-space stacks */
stack( 20 );
ustack( 20 );
}
For the fbt probes, args[ 1 ] is the value returned by the function.
The dTrace script will print out the process name, pid, and return value from hfs_clonefile() whenever the return value is not zero. It also adds the kernel and user space stack traces. That should be more than enough data for you to find the source of the errors.
Assuming it works on OS X, anyway.

You can use the syscall provider rather than the pid provider to do this sort of thing. Something like:
sudo dtrace -n 'syscall::hfs_clonefile*:return /errno != 0/ { printf("ret: %d\n", errno); }'
The above command is a minor variant of what's used within the built-in DTrace-based errinfo utility. You can view /usr/bin/errinfo in any editor to see how it works.
However, there's no hfs_clonefile syscall, as least as far as DTrace is concerned, on my El Capitan (10.11.5) system:
$ sudo dtrace -l -n 'syscall::hfs*:'
ID PROVIDER MODULE FUNCTION NAME
dtrace: failed to match syscall::hfs*:: No probe matches description
Also, unfortunately the syscall provider is prevented from tracing system processes by the System Integrity Protection feature introduced with El Capitan (macOS 10.11). So, you will have to disable SIP which makes your system less secure.

Related

warning using Parallel::ForkManager but only in Windows

I sometimes get this warning when using Parallel::ForkManager but only in Windows, not on a Unix based system. What does it mean and should I worry about it?
child process '-17108' disappeared. A call to waitpid outside of
Parallel::ForkManager might have reaped it.
Here is the sample code from the docs that my code is based on:
use LWP::Simple;
use Parallel::ForkManager;
my #links=(
["http://www.foo.bar/rulez.data","rulez_data.txt"],
["http://new.host/more_data.doc","more_data.doc"],
);
# Max 30 processes for parallel download
my $pm = Parallel::ForkManager->new(30);
LINKS:
foreach my $linkarray (#links) {
$pm->start and next LINKS; # do the fork
my ($link, $fn) = #$linkarray;
warn "Cannot get $fn from $link"
if getstore($link, $fn) != RC_OK;
$pm->finish; # do the exit in the child process
}
$pm->wait_all_children;
I had the similar issue and placing a sleep 1 before "$pm->start and next LINKS;"
fixed the issue. I guess its due to continues forking, where Perl lost track of the fork processes. I may be wrong!

Where does tmux keep logs on OS X

tmux has been crashing a lot lately, and I'm not sure why. I want to look into it further, but I don't know where I can find any kind of logs, or error messages. So far, my googling for "tmux log location" and the like has come up empty.
I'm running OS X, and installed tmux via Homebrew.
The manual page needs some work (you may not see the feature at first). But starting from the source code (referring to version 2.1 in tty.c) you may see
if (debug_level > 3) {
xsnprintf(out, sizeof out, "tmux-out-%ld.log", (long) getpid());
fd = open(out, O_WRONLY|O_CREAT|O_TRUNC, 0644);
if (fd != -1 && fcntl(fd, F_SETFD, FD_CLOEXEC) == -1)
fatal("fcntl failed");
tty->log_fd = fd;
}
The -v flag sets the debug_level value; repeating it increases the value. Back to the manual page:
-v
Request verbose logging. This option may be specified multiple times for increasing verbosity. Log messages will be saved into tmux-client-PID.log and tmux-server-PID.log files in the current directory, where PID is the PID of the server or client process.

looking for more info on EIO error returned by readdir_r () on Mac OS X

On Mac OS X 10.6.8, some C code that calls readdir_r() sometimes gets an I/O error 5 (EIO) returned. I've seen this just a few times, always on external USB drives. Each time I've seen it, if I immediately cd to the parent direction and do an ls, I appear to see all the files. And, if I re-run the C program that saw the error, it will run fine.
On some platforms, when one gets an error like EIO, there's something else you can call that gives you much more detailed error information (e.g., hperrmsg on MPE/iX, which accesses the per-process error stack). Is there something similar on OS X?
I guess I'm assuming that the error is false
i = readdir_r (dirp, &dire, &dire_ptr);
save_errno = errno;
if (i)
{
if (i == EACCES)
...
else
{
my_perror (save_errno, "readdir_r failed: ");
espout4 ("readdir_r returned %s; errno = %s, dire_ptr = %s; parent = %s\n",
num64 (i), num64 (save_errno), fmt_p (dire_ptr), parent_dir);
espout1 (" (ERROR: readdir_r failed here, errno %s)\n", num64 (save_errno));
}
}
Update, a week later...
I've determined that the combo of the Macally case (model G-S350SU)
and the Western Digital drive was causing sporadic problems. I noticed that the WD SATA connector didn't mate tightly with the Macally socket. Using a Seagate drive with the case was fine, and using the WD drive in a different case was fine.
Of course, I'd still like to be able to get better / more detailed error information in my programs :)

Random corruption in file created/updated from shell script on a singular client to NFS mount

We have bash script (job wrapper) that writes to a file, launches a job, then at job completion it appends to the file information about the job. The wrapper is run on one of several thousand batch nodes, but has only cropped up with several batch machines (I believe RHEL6) accessing one NFS server, and at least one known instance of a different batch job on a different batch node using a different NFS server. In all cases, only one client host is writing to the files in question. Some jobs take hours to run, others take minutes.
In the same time period that this has occurred, there seems to be 10-50 issues out of 100,000+ jobs.
Here is what I believe to effectively be the (distilled) version of the job wrapper:
#!/bin/bash
## cwd is /nfs/path/to/jobwd
## This file is /nfs/path/to/jobwd/job_wrapper
gotEXIT()
{
## end of script, however gotEXIT is called because we trap EXIT
END="EndTime: `date`\nStatus: Ended”
echo -e "${END}" >> job_info
cat job_info | sendmail jobtracker#example.com
}
trap gotEXIT EXIT
function jobSetVar { echo "job.$1: $2" >> job_info; }
export -f jobSetVar
MSG=“${email_metadata}\n${job_metadata}”
echo -e "${MSG}\nStatus: Started" | sendmail jobtracker#example.com
echo -e "${MSG}" > job_info
## At the job’s end, the output from `time` command is the first non-corrupt data in job_info
/usr/bin/time -f "Elapsed: %e\nUser: %U\nSystem: %S" -a -o job_info job_command
## 10-360 minutes later…
RC=$?
echo -e "ExitCode: ${RC}" >> job_info
So I think there are two possibilities:
echo -e "${MSG}" > job_info
This command throws out corrupt data.
/usr/bin/time -f "Elapsed: %e\nUser: %U\nSystem: %S" -a -o job_info job_command
This corrupts the existing data, then outputs it’s data correctly.
However, some job, but not all, call jobSetVar, which doesn't end up being corrupt.
So, I dig into time.c (from GNU time 1.7) to see when the file is open. To summarize, time.c is effectively this:
FILE *outfp;
void main (int argc, char** argv) {
const char **command_line;
RESUSE res;
/* internally, getargs opens “job_info”, so outfp = fopen ("job_info", "a”) */
command_line = getargs (argc, argv);
/* run_command doesn't care about outfp */
run_command (command_line, &res);
/* internally, summarize calls fprintf and putc on outfp FILE pointer */
summarize (outfp, output_format, command_line, &res); /
fflush (outfp);
}
So, time has FILE *outfp (job_info handle) open the entire time of the job. It then writes the summary at the end of the job, and then doesn’t actually appear to close the file (not sure if this is necessary with fflush?) I've no clue if bash also has the file handle open concurrently as well.
EDIT:
A corrupted file will typically end consist of the corrupted part, followed with the non-corrupted part, which may look like this:
The corrupted section, which would occur before the non-corrupted section, is typically largely a bunch of 0x0000, with maybe some cyclic garbage mixed in:
Here's an example hexdump:
40000000 00000000 00000000 00000000
00000000 00000000 C8B450AC 772B0000
01000000 00000000 C8B450AC 772B0000
[ 361 x 0x00]
Then, at the 409th byte, it continues with the non-corrupted section:
Elapsed: 879.07
User: 0.71
System: 31.49
ExitCode: 0
EndTime: Fri Dec 6 15:29:27 PST 2013
Status: Ended
Another file looks like this:
01000000 04000000 805443FC 9D2B0000 E04144FC 9D2B0000 E04144FC 9D2B0000
[96 x 0x00]
[Repeat above 3 times ]
01000000 04000000 805443FC 9D2B0000 E04144FC 9D2B0000 E04144FC 9D2B0000
Followed by the non corrupted section:
Elapsed: 12621.27
User: 12472.32
System: 40.37
ExitCode: 0
EndTime: Thu Nov 14 08:01:14 PST 2013
Status: Ended
There are other files that have much more random corruption sections, but more than a few were cyclical similar to above.
EDIT 2: The first email, sent from the echo -e statement goes through fine. The last email is never sent due to no email metadata from corruption. So MSG isn't corrupted at that point. It's assumed that job_info probably isn't corrupt at that point either, but we haven't been able to verify that yet. This is a production system which hasn't had major code modifications and I have verified through audit that no jobs have been ran concurrently which would touch this file. The problem seems to be somewhat recent (last 2 months), but it's possible it's happened before and slipped through. This error does prevent reporting which means jobs are considered failed, so they are typically resubmitted, but one user in specific has ~9 hour jobs in which this error is particularly frustrating. I wish I could come up with more info or a way of reproducing this at will, but I was hoping somebody has maybe seen a similar problem, especially recently. I don't manage the NFS servers, but I'll try to talk to the admins to see what updates the NFS servers at the time of these issues (RHEL6 I believe) were running.
Well, the emails corresponding to the corrupt job_info files should tell you what was in MSG (which will probably be business as usual). You may want to check how NFS is being run: there's a remote possibility that you are running NFS over UDP without checksums. That could explain some corrupt data. I also hear that UDP/TCP checksums are not strong enough and the data can still end up corrupt -- maybe you are hitting such a problem (I have seen corrupt packets slipping through a network stack at least once before, and I'm quite sure some checksumming was going on). Presumably the MSG goes out as a single packet and there might be something about it that makes checksum conflicts with the garbage you see more likely. Of course it could also be an NFS bug (client or server), a server-side filesystem bug, busted piece of RAM... possibilities are almost endless here (although I see how the fact that it's always MSG that gets corrupted makes some of those quite unlikely). The problem might be related to seeking (which happens during the append). You could also have a bug somewhere else in the system, causing multiple clients to open the same job_info file, making it a jumble.
You can also try using different file for 'time' output and then merge them together with job_info at the end of script. That may help to isolate problem further.
Shell opens 'job_info' file for writing, outputs MSG and then shall close its file descriptor before launching main job. 'time' program opens same file for append as stream and I suspect the seek over NFS is not done correctly which may cause that garbage. Can't explain why, but normally this shall not happen (and is not happening). Such rare occasions may point to some race condition somewhere, can be caused by out of sequence packet delivery (due to network latency spike) or retransmits which causes duplicate data, or a bug somewhere. At first look I would suspect some bug, but that bug may be triggered by some network behavior, e.g. unusually large delay or spike of packet loss.
File access between different processes are serialized by kernel, but for additional safeguard may be worth adding some artificial delays - sleep timers between outputs for example.
Network is not transparent, especially a large one. There can be WAN optimization devices which are known to cause application issues sometimes. CIFS and NFS are good candidates for optimization over WAN with local caching of filesystem operations. Might be worth looking for recent changes with network admins..
Another thing to try, although can be difficult due to rare occurrences is capture of interesting NFS sessions via tcpdump or wireshark. In really tough cases we do simultaneous capturing on both client and server side and then compare the protocol logic to prove that network is or is not working correctly. That's a whole topic in itself, requires thorough preparation and luck but usually a last resort of desperate troubleshooting :)
It turns out this was actually another issue altogether, apparently to do with an out-of-date page being written to disk.
A bug fix was supplied to the linux-nfs implementation:
http://www.spinics.net/lists/linux-nfs/msg41357.html

Can I ask dtrace what probes are enabled?

If it matters, I am using Mac OS X, but I believe this would apply across OSs. If the answer is different per OS, I would be interested in learning about that as well.
Let's say that I open a terminal window, enable a few probes, and start collecting data with DTrace.
From a different terminal window, can I ask DTrace what probes have been enabled? If so, how?
I got the following information from Adam Leventhal on a DTrace mailing list. First, he provided this script, which works on Solaris
#!/usr/sbin/dtrace -s
#pragma D option quiet
int i;
tick-100
/i >= `dtrace_nprobes/
{
exit(0);
}
tick-100
{ printf("%4d %10s %20s %20s %10s %s\n", i,
stringof(`dtrace_probes[i]->dtpr_provider->dtpv_name),
stringof(`dtrace_probes[i]->dtpr_mod),
stringof(`dtrace_probes[i]->dtpr_func),
stringof(`dtrace_probes[i]->dtpr_name),
`dtrace_probes[i]->dtpr_ecb != NULL ? "enabled" : "disabled");
i++
}
Unfortunately, the same kernel variables are not available on Mac OS X due to a bug.

Resources