How to kill slave kernel securely? - wolfram-mathematica

LinkClose[link] "does not necessarily terminate the program at the other end
of the connection" as it is said in the Documentation. Is there a way to kill the
process of the slave kernel securely?
EDIT:
What I really need is a function in Mathematica that returns only after the slave kernel process has been killed and its memory has been released. Neither LinkInterrupt[link, 1] nor LinkClose[link] waits for the slave kernel to exit. At the moment the only such function seems to be the killProc[procID] function I showed in one of the answers on this page. Is there a built-in analog?

At the moment I know only one method to kill the MathKernel process securely. It uses NETLink, seems to work only under Windows, and requires Microsoft .NET 2 or later to be installed.
killProc[processID_] := If[$OperatingSystem === "Windows",
  Needs["NETLink`"];
  Symbol["LoadNETType"]["System.Diagnostics.Process"];
  With[{procID = processID},
    killProc[procID_] := (
      proc = Process`GetProcessById[procID];
      proc@Kill[]
    );
  ];
  killProc[processID]
];
(*Killing the current MathKernel process*)
killProc[$ProcessID]
Any suggestions or improvements will be appreciated.
Edit:
The more correct method:
Needs["NETLink`"];
LoadNETType["System.Diagnostics.Process"];
$kern = LinkLaunch[First[$CommandLine] <> " -mathlink -noinit"];
LinkRead[$kern];
LinkWrite[$kern, Unevaluated[$ProcessID]];
$kernProcessID = First@LinkRead[$kern];
$kernProcess = Process`GetProcessById[$kernProcessID];
AbortProtect[If[! ($kernProcess@Refresh[]; $kernProcess@HasExited),
  $kernProcess@Kill[]; $kernProcess@WaitForExit[];
  $kernProcess@Close[]];
 LinkClose[$kern]]
Edit 2:
Even more correct method:
Needs["NETLink`"];
LoadNETType["System.Diagnostics.Process"];
$kern = LinkLaunch[First[$CommandLine] <> " -mathlink -noinit"];
LinkRead[$kern];
LinkWrite[$kern, Unevaluated[$ProcessID]];
$kernProcessID = First@LinkRead[$kern];
$kernProcess = Process`GetProcessById[$kernProcessID];
krnKill := AbortProtect[
  If[TrueQ[MemberQ[Links[], $kern]], LinkClose[$kern]];
  If[TrueQ[MemberQ[LoadedNETObjects[], $kernProcess]],
    If[! TrueQ[$kernProcess@WaitForExit[100]],
      Quiet@$kernProcess@Kill[]; $kernProcess@WaitForExit[]];
    $kernProcess@Close[]; ReleaseNETObject[$kernProcess];
  ]
];

Todd Gayley answered my question in the newsgroup. The solution is to send the slave kernel an MLTerminateMessage. From top-level code:
LinkInterrupt[link, 1] (* An undocumented form that lets you pick
the message type *)
In C:
MLPutMessage(link, MLTerminateMessage);
In Java using J/Link:
link.terminateKernel();
In .NET using .NET/Link:
link.TerminateKernel();
EDIT:
I have found that in standard cases, when using LinkInterrupt[link, 1], my operating system (Windows 2000 at the moment) releases physical memory only 0.05-0.1 second after LinkInterrupt[link, 1] is executed, while with LinkClose[link] it releases physical memory within 0.01-0.03 second (both values include the time spent executing the command itself). The time intervals were measured with SessionTime[] under equal conditions and are consistently reproducible.
What I actually need is a function in Mathematica that returns only after the slave kernel process has been killed and its memory has been released. Neither LinkInterrupt[link, 1] nor LinkClose[link] waits for the slave kernel to exit. At the moment the only such function seems to be the killProc[procID] function I showed in another answer on this page.
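The requirement described here (return only once the child process is really gone) is the generic kill-then-wait pattern, the same one the .NET Process object offers through Kill and WaitForExit. As a language-neutral illustration only, and not a Mathematica built-in, here is a minimal Python sketch. The helper name kill_and_wait and the sleeping child process used in the demo are assumptions made for the example.

```python
import subprocess
import sys


def kill_and_wait(proc, timeout=10.0):
    """Kill a child process and return only once the OS reports it has exited."""
    if proc.poll() is None:      # None means the child is still running
        proc.kill()              # TerminateProcess on Windows, SIGKILL on POSIX
    # wait() blocks until the child has exited (TimeoutExpired is raised otherwise)
    return proc.wait(timeout=timeout)


if __name__ == "__main__":
    # Stand-in for a slave kernel: a child that would otherwise sleep for a minute.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    code = kill_and_wait(child)
    print("child exited with code", code)
```

The point of the sketch is the ordering: the kill request alone does not guarantee the process has ended, so the caller blocks on the wait before continuing.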

Related

linux device driver stuck in spin lock due to ongoing read on first access

I am working on a Linux driver for usb device which fortunately is identical to that in the usb_skeleton example driver which is part of the standard kernel source.
With the 4.4 kernel, it was a breeze, I simply changed the VID and PID and a few strings and the driver compiled and worked perfectly both on x64 and ARM kernels.
But it turns out I have to make this work with a 3.2 kernel. I have no choice in this. I made the same modifications to the skeleton driver in the 3.2 source. Again, I did not have to change actual code, just the VID, PID and some strings. Although it compiles and loads fine (and shows up in /dev), it permanently hangs in the first attempt to do a read from /dev/myusbdev0.
The following code is from the read function, which is supposed to read from the bulk endpoint. When I attempt to read the device, I see the first message that it is going to block due to ongoing io. Then nothing. The user program trying to read this is hung, and cannot be killed with kill -9. The linux machine cannot even reboot - I have to power cycle. There are no error messages, exceptions or anything like that. It seems fairly certain it is hanging in the part that is commented 'IO May Take Forever'.
My question is: why would there be ongoing IO when no program has done any IO with the driver yet? Can I fix this in driver code, or does the user program have to do something before it can start reading from /dev/myusbdev0 ?
In this case the target machine is an embedded ARM device similar to a BeagleBone Black. Incidentally, the 4.4 kernel version of this driver works perfectly on the BeagleBone with the same user-mode test program.
	/* if IO is under way, we must not touch things */
retry:
	spin_lock_irq(&dev->err_lock);
	ongoing_io = dev->ongoing_read;
	spin_unlock_irq(&dev->err_lock);
	if (ongoing_io) {
		dev_info(&interface->dev,
			 "USB PureView Pulser Receiver device blocking due to ongoing io -%d",
			 interface->minor);
		/* nonblocking IO shall not wait */
		if (file->f_flags & O_NONBLOCK) {
			rv = -EAGAIN;
			goto exit;
		}
		/*
		 * IO may take forever
		 * hence wait in an interruptible state
		 */
		rv = wait_for_completion_interruptible(&dev->bulk_in_completion);
		dev_info(&interface->dev,
			 "USB PureView Pulser Receiver device completion wait done io -%d",
			 interface->minor);
		if (rv < 0)
			goto exit;
		/*
		 * by waiting we also semiprocessed the urb
		 * we must finish now
		 */
		dev->bulk_in_copied = 0;
		dev->processed_urb = 1;
	}
Writing this up as an answer since there was no response to my comments. Kernel commit c79041a4[1], which was added in 3.10, fixes "blocked forever in skel_read". Looking at the code above, I see that the first message can trigger without the second being shown if the device file has the O_NONBLOCK flag set. As described in the commit message, if the completion occurs between read() calls the next read() call will end up at the uninterruptible wait, waiting for a completion which has already occurred.
[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c79041a4
Obviously I am not sure that this is what you are seeing, but I think there is a good chance. If that is correct then you can apply the change (manually) to your driver and that should fix the problem.

Free to wrong pool 2608aa8 not 6d3fe8 at test.pl

Following is my code:
sub test_ms {
    my $coderef1 = shift;
    my $coderef2 = shift;
    if (fork() == 0) {
        &$coderef1;
        exit;
    }
    &$coderef2;
}
When I run this, I get the following error:
Free to wrong pool 2608aa8 not 6d3fe8 at test.pl
Why am I getting this error?
perlfork says
On Windows fork() system call is not available
That's why it's not working as expected for you. Try using Win32::Process::Create.
See:
What is the difference between Windows fork and Unix fork?
A Great little summary of issues with fork on Windows
Mr. Peabody Explains fork()
That message means a scalar (or array or ...) allocated by one thread (or interpreter?) was freed by another. fork creates threads, not processes on Windows. It's probably because the object in $Excel is not safe to pass between threads. Create $Excel in the thread in which you are going to use it.

How to stop a process and begin other process without returning to the initial Process in Arduino

I want to build a hexapod that uses an Arduino and is remotely controlled via Bluetooth. At present I am writing the code for its walking (the Arduino part), but I do not know how to proceed. The problem is as follows:
When a new command is received from the remote device, I want the legs to stop what they are doing and carry out the received command. If this is implemented with interrupts, then after the command has been completed the previous process starts again, which I do not want. What can be done?
Thanks in advance for your answers.
The arduino doesn't really have separate processes - or even an OS.
You should think in terms of "states". Have a global (sorry) int representing the current state (use an enum) then when you do a new command set the state to the new command and return, then have a main loop which checks the state and performs whatever function is needed.
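The state-based approach can be sketched as follows. This is a minimal illustration written in Python for brevity, not Arduino code. The state names, the command letters, and the Hexapod class are all hypothetical, and a real sketch would drive servos where this one only records steps.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    WALK_FORWARD = 1
    TURN_LEFT = 2


# Hypothetical mapping from a received command character to a state.
COMMANDS = {"s": State.IDLE, "f": State.WALK_FORWARD, "l": State.TURN_LEFT}


class Hexapod:
    def __init__(self):
        self.state = State.IDLE
        self.step = 0      # position inside the current motion
        self.log = []      # stands in for the servo movements

    def on_command(self, ch):
        """Called when a command arrives: only record the new state and return."""
        new_state = COMMANDS.get(ch)
        if new_state is not None and new_state is not self.state:
            self.state = new_state
            self.step = 0  # abandon the old motion; nothing is resumed later

    def loop_once(self):
        """One pass of the main loop: perform one short, non-blocking step."""
        if self.state is not State.IDLE:
            self.log.append((self.state.name, self.step))
            self.step += 1


if __name__ == "__main__":
    robot = Hexapod()
    robot.on_command("f")
    robot.loop_once()
    robot.loop_once()
    robot.on_command("l")  # arrives in the middle of walking
    robot.loop_once()
    print(robot.log)
```

Because the command handler only changes the state variable, there is no interrupted routine to return to: the next pass of the main loop simply starts the new motion from its first step.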

Self-restarting MathKernel - is it possible in Mathematica?

This question comes from the recent question "Correct way to cap Mathematica memory use?"
I wonder, is it possible to programmatically restart MathKernel keeping the current FrontEnd process connected to new MathKernel process and evaluating some code in new MathKernel session? I mean a "transparent" restart which allows a user to continue working with the FrontEnd while having new fresh MathKernel process with some code from the previous kernel evaluated/evaluating in it?
The motivation for the question is to have a way to automate restarting MathKernel when it takes too much memory, without breaking the computation. In other words, the computation should continue automatically in a new MathKernel process without interaction with the user (while keeping the user's ability to interact with Mathematica as before). The details of what code should be evaluated in the new kernel are of course specific to each computational task. I am looking for a general solution for automatically continuing the computation.
From a comment by Arnoud Buzing yesterday, on Stack Exchange Mathematica chat, quoting entirely:
In a notebook, if you have multiple cells you can put Quit in a cell by itself and set this option:
SetOptions[$FrontEnd, "ClearEvaluationQueueOnKernelQuit" -> False]
Then if you have a cell above it and below it and select all three and evaluate, the kernel will Quit but the frontend evaluation queue will continue (and restart the kernel for the last cell).
-- Arnoud Buzing
The following approach runs one kernel to open a front-end with its own kernel, which is then closed and reopened, renewing the second kernel.
This file is the MathKernel input, C:\Temp\test4.m
Needs["JLink`"];
$FrontEndLaunchCommand="Mathematica.exe";
UseFrontEnd[
nb = NotebookOpen["C:\\Temp\\run.nb"];
SelectionMove[nb, Next, Cell];
SelectionEvaluate[nb];
];
Pause[8];
CloseFrontEnd[];
Pause[1];
UseFrontEnd[
nb = NotebookOpen["C:\\Temp\\run.nb"];
Do[SelectionMove[nb, Next, Cell],{12}];
SelectionEvaluate[nb];
];
Pause[8];
CloseFrontEnd[];
Print["Completed"]
The demo notebook, C:\Temp\run.nb contains two cells:
x1 = 0;
Module[{},
While[x1 < 1000000,
If[Mod[x1, 100000] == 0, Print["x1=" <> ToString[x1]]]; x1++];
NotebookSave[EvaluationNotebook[]];
NotebookClose[EvaluationNotebook[]]]
Print[x1]
x1 = 0;
Module[{},
While[x1 < 1000000,
If[Mod[x1, 100000] == 0, Print["x1=" <> ToString[x1]]]; x1++];
NotebookSave[EvaluationNotebook[]];
NotebookClose[EvaluationNotebook[]]]
The initial kernel opens a front-end and runs the first cell, then it quits the front-end, reopens it and runs the second cell.
The whole thing can be run either by pasting (in one go) the MathKernel input into a kernel session, or it can be run from a batch file, e.g. C:\Temp\RunTest2.bat
@echo off
setlocal
PATH = C:\Program Files\Wolfram Research\Mathematica\8.0\;%PATH%
echo Launching MathKernel %TIME%
start MathKernel -noprompt -initfile "C:\Temp\test4.m"
ping localhost -n 30 > nul
echo Terminating MathKernel %TIME%
taskkill /F /FI "IMAGENAME eq MathKernel.exe" > nul
endlocal
It's a little elaborate to set up, and in its current form it depends on knowing how long to wait before closing and restarting the second kernel.
Perhaps the parallel computation machinery could be used for this? Here is a crude set-up that illustrates the idea:
Needs["SubKernels`LocalKernels`"]
doSomeWork[input_] := {$KernelID, Length[input], RandomReal[]}
getTheJobDone[] :=
Module[{subkernel, initsub, resultSoFar = {}}
, initsub[] :=
( subkernel = LaunchKernels[LocalMachine[1]]
; DistributeDefinitions["Global`"]
)
; initsub[]
; While[Length[resultSoFar] < 1000
, DistributeDefinitions[resultSoFar]
; Quiet[ParallelEvaluate[doSomeWork[resultSoFar], subkernel]] /.
{ $Failed :> (Print@"Ouch!"; initsub[])
, r_ :> AppendTo[resultSoFar, r]
}
]
; CloseKernels[subkernel]
; resultSoFar
]
This is an over-elaborate setup to generate a list of 1,000 triples of numbers. getTheJobDone runs a loop that continues until the result list contains the desired number of elements. Each iteration of the loop is evaluated in a subkernel. If the subkernel evaluation fails, the subkernel is relaunched. Otherwise, its return value is added to the result list.
To try this out, evaluate:
getTheJobDone[]
To demonstrate the recovery mechanism, open the Parallel Kernel Status window and kill the subkernel from time to time. getTheJobDone will feel the pain and print Ouch! whenever the subkernel dies. However, the overall job continues and the final result is returned.
The error-handling here is very crude and would likely need to be bolstered in a real application. Also, I have not investigated whether really serious error conditions in the subkernels (like running out of memory) would have an adverse effect on the main kernel. If so, then perhaps subkernels could kill themselves if MemoryInUse[] exceeded a predetermined threshold.
Update - Isolating the Main Kernel From Subkernel Crashes
While playing around with this framework, I discovered that any use of shared variables between the main kernel and subkernel rendered Mathematica unstable should the subkernel crash. This includes the use of DistributeDefinitions[resultSoFar] as shown above, and also explicit shared variables using SetSharedVariable.
To work around this problem, I transmitted the resultSoFar through a file. This eliminated the synchronization between the two kernels with the net result that the main kernel remained blissfully unaware of a subkernel crash. It also had the nice side-effect of retaining the intermediate results in the event of a main kernel crash as well. Of course, it also makes the subkernel calls quite a bit slower. But that might not be a problem if each call to the subkernel performs a significant amount of work.
Here are the revised definitions:
Needs["SubKernels`LocalKernels`"]
doSomeWork[] := {$KernelID, Length[Get[$resultFile]], RandomReal[]}
$resultFile = "/some/place/results.dat";
getTheJobDone[] :=
Module[{subkernel, initsub, resultSoFar = {}}
, initsub[] :=
( subkernel = LaunchKernels[LocalMachine[1]]
; DistributeDefinitions["Global`"]
)
; initsub[]
; While[Length[resultSoFar] < 1000
, Put[resultSoFar, $resultFile]
; Quiet[ParallelEvaluate[doSomeWork[], subkernel]] /.
{ $Failed :> (Print@"Ouch!"; CloseKernels[subkernel]; initsub[])
, r_ :> AppendTo[resultSoFar, r]
}
]
; CloseKernels[subkernel]
; resultSoFar
]
I have a similar requirement when I run a CUDAFunction in a long loop and CUDALink runs out of memory (a similar case is described here: https://mathematica.stackexchange.com/questions/31412/cudalink-ran-out-of-available-memory). The memory leak has not improved even in the latest Mathematica version, 10.4. I worked out a workaround and hope you find it useful. The idea is to use a bash script to call a Mathematica program (run in batch mode) multiple times, passing parameters from the bash script. Here are the detailed instructions and a demo (for Windows):
To use a bash script on Windows you need to install Cygwin (https://cygwin.com/install.html).
Convert your Mathematica notebook to a package (.m) so that it can be used in script mode. If you save your notebook using "Save As..." all the commands will be converted to comments (this was noted by Wolfram Research), so it is better to create a package (File -> New -> Package) and then copy and paste your commands into it.
Write the bash script using the vi editor (instead of Notepad or gedit for Windows) to avoid the "\r" problem (http://www.linuxquestions.org/questions/programming-9/shell-scripts-in-windows-cygwin-607659/).
Here is a demo of the test.m file
str=$CommandLine;
len=Length[str];
Do[
If[str[[i]]=="-start",
start=ToExpression[str[[i+1]]];
Pause[start];
Print["Done in ",start," second"];
];
,{i,2,len-1}];
This Mathematica code reads the parameter from the command line and uses it for the calculation.
Here is the bash script (script.sh) to run test.m many times with different parameters.
#c:\cygwin64\bin\bash
for ((i=2;i<10;i+=2))
do
math -script test.m -start $i
done
In the cygwin terminal type "chmod a+x script.sh" to enable the script then you can run it by typing "./script.sh".
You can programmatically terminate the kernel using Exit[]. The front end (notebook) will automatically start a new kernel when you next try to evaluate an expression.
Preserving "some code from the previous kernel" is going to be more difficult. You have to decide what you want to preserve. If you think you want to preserve everything, then there's no point in restarting the kernel. If you know what definitions you want to save, you can use DumpSave to write them to a file before terminating the kernel, and then use << to load that file into the new kernel.
On the other hand, if you know what definitions are taking up too much memory, you can use Unset, Clear, ClearAll, or Remove to remove those definitions. You can also set $HistoryLength to something smaller than Infinity (the default) if that's where your memory is going.
Sounds like a job for CleanSlate.
<< Utilities`CleanSlate`;
CleanSlate[]
From: http://library.wolfram.com/infocenter/TechNotes/4718/
"CleanSlate, tries to do everything possible to return the kernel to the state it was in when the CleanSlate.m package was initially loaded."

Why is a thread's status running but it doesn't use any CPU?

Today I found a very strange problem.
I was running Red Hat Enterprise Linux 6 on an Intel E31275 CPU (4 cores, 8 threads). I found that one kernel thread (which I will call my_thread) did not work correctly.
With "ps" command, I found the status of my_thread was always running:
ps ax
5545 ? R 3:14 [my_thread]
15774 ttyS0 Ss 0:00 -bash
...
But its running time was always 3:14. Since it was running, why didn't the total time increase?
From the proc file /proc/5545/sched, I found the all statistics including wakeups count (se.nr_wakeups) for this thread was always the same, too.
From /proc/5545/stack, I found this thread called this function and never returned:
interruptible_sleep_on_timeout(&q, 3*HZ);
In theory this function would return every 3 seconds if no other threads woke up the thread. Each time after the function returned, se.nr_wakeups in /proc/5545/sched would be increased by 1. But this never happened after I found the thread had some problems.
Does any one have some ideas? Is it possible that interruptible_sleep_on_timeout() never returns?
Update:
I find the problem won't occur if I set CPU affinity for this thread. If I pin it to a dedicated core, then everything is OK. Are there any problems with SMP scheduling?
Update again:
After I disabled Hyper-Threading in the BIOS, I have not seen the problem again so far.
First off, R indicates the thread is not in running state but runnable. That is, it does not mean it runs, it means it is in a state the scheduler is allowed to pick it for running. There is a big difference between the two.
In a similar sense, interruptible_sleep_on_timeout(&q, 3*HZ); will not run the thread after 3 seconds (3*HZ jiffies), but rather make it available for running after that time. Indeed, you see it in "ps" as available for running, so possibly the timeout has indeed occurred.
Since you did not say anything about the kernel thread in question I don't even know if it is in your own code or standard kernel code so I cannot really answer in detail.
One possible reason for the situation you described is that some other thread (user or kernel) has a higher priority than your thread, so the scheduler never picks your thread to run. If so, it is most probably a thread running at real-time priority (SCHED_FIFO or SCHED_RR).
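One way to check the runnable-versus-running distinction from user space is to sample /proc/<pid>/stat twice and compare the state letter with the accumulated CPU time. The following Python sketch only parses such a line. The helper names are hypothetical and the sample line is made up for illustration (19400 ticks corresponds to 3:14 only if the clock tick rate is 100 per second, which is an assumption).

```python
def _fields_after_comm(stat_line):
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')' characters, so split on the LAST ')'.
    return stat_line.rsplit(")", 1)[1].split()


def parse_stat_state(stat_line):
    """Return the one-letter state (field 3 of /proc/<pid>/stat), e.g. 'R' or 'S'."""
    return _fields_after_comm(stat_line)[0]


def cpu_ticks(stat_line):
    """Return (utime, stime) in clock ticks (fields 14 and 15 of the stat line)."""
    fields = _fields_after_comm(stat_line)
    # fields[0] is field 3, so field n sits at index n - 3
    return int(fields[11]), int(fields[12])


if __name__ == "__main__":
    # Made-up sample line; on a live system read open("/proc/5545/stat").read()
    sample = "5545 (my_thread) R 2 0 0 0 -1 2149613632 0 0 0 0 0 19400 0 0 20 0 1 0"
    print(parse_stat_state(sample), cpu_ticks(sample))
```

If two samples taken a few seconds apart both show state R while the tick counts stay the same, the thread is on a run queue but is not being given the CPU, which matches the behaviour described in the question.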
