Go program launching several processes - go

I'm playing with Go to understand its features and syntax. I've done a simple producer-consumer program with concurrent go functions, and a priority buffer in the middle. A single producer produces tasks with a certain priority and send them to a buffer using a channel. A set of consumers will ask for a task when idle, received it and consume it. The intermediate buffer will store a set of tasks in a priority queue buffer, so highest priority tasks are served first. The program also prints the Garbage collector activity (how many times it was invoked and how many time it took to collect the garbage).
I'm running it on a Raspberry Pi using Go 1.1.
The software seems to work fine but I found that at SO level, htop shows that there are 4 processes running, with the same memory use, and the sum of the CPU use is over 100% (the Raspberry Pi only has one core so I suppose it has something to do with threads/processes). Also the system load is about 7% of the CPU, I suppose because of a constant context switching at OS level. The GOMAXPROCS environment variable is set to 1 or unset.
Do you know why Go is using more than one OS process?
Code can be found here: http://pastebin.com/HJUq6sab
Thank you!
EDIT:
It seems that htop shows the lightweight processes of the system. Go programs run several of these lightweight processes (they are different from the goroutines threads) so using htop shows several processes, while psor top will show just one, as it should be.

Please try to kill all of the suspect processes and try again running it only once. Also, do not use go run, at least for now - it blurs the number of running processes at minimum.
I suspect the other instances are simply leftovers from your previous development attempts (probably invoked through go run and not properly [indirectly] killed on SIGINT [hypothesis only]), especially because there's a 1 hour "timeout" at the end of 'main' (instead of a proper synchronization or select{}). A Go binary can spawn new threads, but it should never create new processes unless explicitly asked for that. Which is not the case of your code - it doesn't even import "os/exec" or "syscall" in the first place.
If my guess about the combination of go run and using a long timeout is really the culprit, then there's possibly some difference in the RP kernel wrt what the dev guys are using for testing.

Related

MQ process amqrmppa keep increasing

Good day.
I'm having an issue which MQ process amqrmppa are keep increasing which currently there are 635 processes exist. Previous 2 days it only have 2++ processes but keep increasing slowly until current value.
MQ version = 8.0
Operating System = AIX 7
This processes expected to be increase until our maxuproc limit which is 1024.
When I run command,
echo "dis chs(*)" | runmqsc MYMQ | grep channel | wc -l
The output appeared is 268.
Does this means the actual process that are amqrmppa use is 268?
The amqrmppa process is MQ's channel pooling process and by default will scale up using both threads and new process instances. When it increases like this there are two possibilities.
Memory leak. The question doesn't mention the complete version number. When the fix pack level is included it would be a dotted-quad format such as 8.0.0.1 where 8.0 is the major version and 0.1 is the support stream and Fix Pack. When the Fix Pack is known it's possible to look through the APARS to see if it is a known problem.
Application error. When applications terminate client connections improperly, it leaves an orphan channel instance behind. These eventually time out but the pooling process is slow to release threads since it is quicker to reuse one than to rebuild the entire process from scratch. So the number of processes generally reflects a recent high water mark of started channels.
Generally poor programming or external factors (such as firewall timeout) are more likely root causes than MQ code bugs, however the number of APARS in every Fix Pack is testament that these do happen. You might want to tune the MAXCHAN settings so that no one application can spin up more than a reasonable number of concurrent channel instances. You might also want to install BlockIP2 or LogIP (from Mr. MQ) and log connection attempts.
Some more notes:
As a rule, the number of processes is less than the number of concurrent channels due to dispatching channels to threads. This is tunable though and the local QMgr may have the channels set to PROCESS instead of THREAD. Also, if channels are started by inetd the result is one channel per process. (Do not do that in a modern QMgr!)
Why are you running with maxuproc=1024? According to the Infocenter for MQ v8.0, on AIX this value should be 64000. If the recommended kernel tuning has not been performed an apparent process leak is one of the most benign errors you might see. Run the mqconfig program and follow the advice it gives. (I believe mqconfig runs on AIX and also you need to give it the full path name to the executable.)

How to find and kill all the decendent process in Go?

I am looking for a cross platform (various flavours of Unix, including Linux) to find and kill all processes spawned by my program. For Linux, I can walk to /proc to obtain this information, and I am sure I can find somethinf similar for OS X and *BSD. But I'd prefer if there were a standard library for this.
Background: I am writing a custom job schedular which needs to terminate the jobs that don't complete within a given period of time. Simply killing (SIGTERM, followed by SIGKILL- if the formar is not ignored) the child process works fine when the job doesn't spawn any other process or handles SIGTERM properly and takes care of the cleanup. But I don't control the jobs - and I know at least some that are poorly written. In the latter case, the system is left with a bunch of orphaned process which keep holdin on to certain resouces and cause all sorts of problems.
Any pointer to libraries or some cross platform way of doing this would be welcome.

I/O performance of multiple JVM (Windows 7 affected, Linux works)

I have a program that creates a file of about 50MB size. During the process the program frequently rewrites sections of the file and forces the changes to disk (in the order of 100 times). It uses a FileChannel and direct ByteBuffers via fc.read(...), fc.write(...) and fc.force(...).
New text:
I have a better view on the problem now.
The problem appears to be that I use three different JVMs to modify a file (one creates it, two others (launched from the first) write to it). Every JVM closes the file properly before the next JVM is started.
The problem is that the cost of fc.write() to that file occasionally goes through the roof for the third JVM (in the order of 100 times the normal cost). That is, all write operations are equally slow, it is not just one that hang very long.
Interestingly, one way to help this is to insert delays (2 seconds) between the launching of JVMs. Without delay, writing is always slow, with delay, the writing is slow aboutr every second time or so.
I also found this Stackoverflow: How to unmap a file from memory mapped using FileChannel in java? which describes a problem for mapped files, which I'm not using.
What I suspect might be going on:
Java does not completely release the file handle when I call close(). When the next JVM is started, Java (or Windows) recognizes concurrent access to that file and installes some expensive concurrency handler for that file, which makes writing expensive.
Would that make sense?
The problem occurs on Windows 7 (Java 6 and 7, tested on two machines), but not under Linux (SuSE 11.3 64).
Old text:
The problem:
Starting the program from as a JUnit test harness from eclipse or from console works fine, it takes around 3 seconds.
Starting the program through an ant task (or through JUnit by kicking of a separate JVM using a ProcessBuilder) slows the program down to 70-80 seconds for the same task (factor 20-30).
Using -Xprof reveals that the usage of 'force0' and 'pwrite' goes through the roof from 34.1% (76+20 tics) to 97.3% (3587+2913+751 tics):
Fast run:
27.0% 0 + 76 sun.nio.ch.FileChannelImpl.force0
7.1% 0 + 20 sun.nio.ch.FileDispatcher.pwrite0
[..]
Slow run:
Interpreted + native Method
48.1% 0 + 3587 sun.nio.ch.FileDispatcher.pwrite0
39.1% 0 + 2913 sun.nio.ch.FileChannelImpl.force0
[..]
Stub + native Method
10.1% 0 + 751 sun.nio.ch.FileDispatcher.pwrite0
[..]
GC and compilation are negligible.
More facts:
No other methods show a significant change in the -Xprof output.
It's either fast or very slow, never something in-between.
Memory is not a problem, all test machines have at least 8GB, the process uses <200MB
rebooting the machine does not help
switching of virus-scanners and similar stuff has no affect
When the process is slow, there is virtually no CPU usage
It is never slow when running it from a normal JVM
It is pretty consistently slow when running it in a JVM that was started from the first JVM (via ProcessBuilder or as ant-task)
All JVMs are exactly the same. I output System.getProperty("java.home") and the JVM options via RuntimeMXBean RuntimemxBean = ManagementFactory.getRuntimeMXBean(); List arguments = RuntimemxBean.getInputArguments();
I tested it on two machines with Windows7 64bit, Java 7u2, Java 6u26 and JRockit, the hardware of the machines differs, though, but the results are very similar.
I tested it also from outside Eclipse (command-line ant) but no difference there.
The whole program is written by myself, all it does is reading and writing to/from this file, no other libraries are used, especially no native libraries. -
And some scary facts that I just refuse to believe to make any sense:
Removing all class files and rebuilding the project sometimes (rarely) helps. The program (nested version) runs fast one or two times before becoming extremely slow again.
Installing a new JVM always helps (every single time!) such that the (nested) program runs fast at least once! Installing a JDK counts as two because both the JDK-jre and the JRE-jre work fine at least once. Overinstalling a JVM does not help. Neither does rebooting. I haven't tried deleting/rebooting/reinstalling yet ...
These are the only two ways I ever managed to get fast program runtimes for the nested program.
Questions:
What may cause this performance drop for nested JVMs?
What exactly do these methods do (pwrite0/force0)? -
Are you using local disks for all testing (as opposed to any network share) ?
Can you setup Windows with a ram drive to store the data ? When a JVM terminates, by default its file handles will have been closed but what you might be seeing is the flushing of the data to the disk. When you overwrite lots of data the previous version of data is discarded and may not cause disk IO. The act of closing the file might make windows kernel implicitly flush data to disk. So using a ram drive would allow you to confirm that their since disk IO time is removed from your stats.
Find a tool for windows that allows you to force the kernel to flush all buffers to disk, use this in between JVM runs, see how long that takes at the time.
But I would guess you are hitten some iteraction with the demands of the process and the demands of the kernel in attempting to manage disk block buffer cache. In linux there is a tool like "/sbin/blockdev --flushbufs" that can do this.
FWIW
"pwrite" is a Linux/Unix API for allowing concurrent writing to a file descriptor (which would be the best kernel syscall API to use for the JVM, I think Win32 API already has provision for the same kinds of usage to share a file handle between threads in a process, but since Sun have Unix heritige things get named after the Unix way). Google "pwrite(2)" for more info on this API.
"force" I would guess that is a file system sync, meaning the process is requesting the kernel to flush unwritten data (that is currently in disk block buffer cache) into the file on the disk (such as would be needed before you turned your computer off). This action will happen automatically over time, but transactional systems require to know when the data previously written (with pwrite) has actually hit the physical disk and is stored. Because some other disk IO is dependant on knowing that, such as with transactional checkpointing.
One thing that could help is making sure you explicitly set the FileChannel to null. Then call System.runFinalization() and maybe System.gc() at the end of the program. You may need more than 1 call.
System.runFinalizersOnExit(true) may also help, but it's deprecated so you will have to deal with the compiler warnings.

Scaling a ruby script by launching multiple processes instead of using threads

I want to increase the throughput of a script which does net I/O (a scraper). Instead of making it multithreaded in ruby (I use the default 1.9.1 interpreter), I want to launch multiple processes. So, is there a system for doing this to where I can track when one finishes to re-launch it again so that I have X number running at any time. ALso some will run with different command args. I was thinking of writing a bash script but it sounds like a potentially bad idea if there already exists a method for doing something like this on linux.
I would recommend not forking but instead that you use EventMachine (and the excellent em-http-request if you're doing HTTP). Managing multiple processes can be a bit of a handful, even more so than handling multiple threads, but going down the evented path is, in comparison, much simpler. Since you want to do mostly network IO, which consist mostly of waiting, I think that an evented approach would scale as well, or better than forking or threading. And most importantly: it will require much less code, and it will be more readable.
Even if you decide on running separate processes for each task, EventMachine can help you write the code that manages the subprocesses using, for example, EventMachine.popen.
And finally, if you want to do it without EventMachine, read the docs for IO.popen, Open3.popen and Open4.popen. All do more or less the same thing but give you access to the stdin, stdout, stderr (Open3, Open4), and pid (Open4) of the subprocess.
You can try fork http://ruby-doc.org/core/classes/Process.html#M003148
You can get the PID in return and see if this process run again or not.
If you want manage IO concurrency. I suggest you to use EventMachine.
You can either
implement (or find an equivalent gem) a ThreadPool (ProcessPool, in your case), or
prepare an array of all, let's say 1000 tasks to be processed, split it into, say 10 chunks of 100 tasks (10 being the number of parallel processes you want to launch), and launch 10 processes, of which each process right away receives 100 tasks to process. That way you don't need to launch 1000 processes and control that not more than 10 of them work at the same time.

Send CTRL+C to subprocess tree on Windows

I would like to run arbitrary console-based sub-processes and manage them from a single master process. The console based sub-processes communicate via stdin, stdout and stderr, and if you run them in a genuine console they terminate cleanly when you press CTRL+C. Some of them may in fact be a tree of processes, such as a batch script that runs an executable which may in turn run another executable to do some work. I would like to redirect their standard I/O (for example, so that I can show their output in a GUI window) and in certain circumstances to send them a CTRL+C event so that they will give up and terminate cleanly.
The following two diagrams show first the normal structure - one master process has four worker sub-processes, and some of those workers have their own subprocesses; and then what should happen when one of the workers needs to be stopped - it and all of its children should get the CTRL+C event, but no other processes should receive the CTRL+C event.
(source: livejournal.com)
Additionally, I would much prefer that there are no extra windows visible to the user.
Here's what I've tried (note that I'm working in Python, but solutions for C would still be helpful):
Spawning an extra intermediate process with CREATE_NEW_CONSOLE, and then having it spawn the worker process. Then have it call GenerateConsoleCtrlEvent(CTRL_C_EVENT, 0) when we want to kill the worker. Unfortunately, CREATE_NEW_CONSOLE seems to prevent me from redirecting the standard I/O channels, so I'm left with no easy way to get the output back to the main program.
Spawning an extra intermediate process with CREATE_NEW_PROCESS_GROUP, and then having it spawn the worker process. Then have it call GenerateConsoleCtrlEvent(CTRL_C_EVENT, 0) when we want to kill the worker. Somehow, this manages to send the CTRL+C only to the master process, which is completely useless. On closer inspection, GenerateConsoleCtrlEvent says that CTRL+C cannot be sent to process groups.
Spawning the subprocess with CREATE_NEW_PROCESS_GROUP. Then call GenerateConsoleCtrlEvent(CTRL_BREAK_EVENT, pid) to kill the worker. This is not ideal, because CTRL+BREAK is less friendly than CTRL+C and will probably result in a messier termination. (E.g. if it's a Python process, no KeyboardInterrupt can be caught and no finally blocks run.)
Is there any good way to do what I want? I can see that I could theoretically build on the first attempt and find some other way to communicate between the processes, but I am worried it will turn out to be extremely awkward. Are there good examples of other programs that achieve the same effect? It seems so simple that it can't be all that uncommon a requirement.
I don't know about managing/redirecting stdin et. al., but for managing the subprocess tree
have you considered using the Windows Job Objects api?
There are several other questions about managing process trees (How do I automatically destroy child processes in Windows? Performing equivalent of “Kill Process Tree” in c++ on windows) and it looks like the cleanest method if you can use it.
Chapter 5 of Windows Via C/C++ by Jeffery Richter has a good discussion on using CreateJobObject and the related APIs.

Resources