I'm trying to do some parallel programming on an HPC cluster running Windows Server 2008. I'm trying to use MPI, and in particular MPI.NET, an implementation of the MPI standard for C#. Right now I'm following the tutorial to understand the library.
I'm trying to run the pingpong.exe program that ships with the SDK. It works fine on the HPC as long as all the processes are on the same node (in which case it's no more useful than a threading setup), but as soon as it spans more than one node, it doesn't work.
What am I missing?
Thanks.
OK, I've been able to work out why it wasn't working: the firewall was blocking the messages between the nodes (source). It's possible to change the firewall settings so that nothing is blocked inside the cluster.
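For anyone who hits the same thing, one way to do this (the rule name and executable path below are placeholders, not the real ones from my setup) is to add a firewall exception for the MPI executable on each node, for example:

netsh advfirewall firewall add rule name="MPI pingpong" dir=in action=allow program="C:\path\to\pingpong.exe" enable=yes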
Hope it helps the next person.
Thanks Mark and IronMan.
I am trying to do some debugging with VS Code on a compute node of an SGE cluster via qrsh commands. However, every time I enter debug mode, it stays on the login node instead of the compute node.
Here's my dilemma:
First, I log in to the login node via Remote-SSH in VS Code.
Then I request a compute node with a GPU via the qrsh command.
Each time, the hostname of the compute node can change.
If I hit the debug button to debug one of the Python programs, it runs back on the login node.
I tried to google a solution; some people mentioned trying ProxyJump in ~/.ssh/config, but that doesn't seem practical since the assigned compute node's hostname is always different from the previous one.
At the moment, I have to either print everything out or use pdb.set_trace(). Neither is convenient because some of the Python programs are very large. Using the IDE's debugging features is really helpful and efficient for understanding other people's code.
Is there any way to fix this?
I tried to google for solutions, but most of what I found is for Slurm clusters rather than SGE clusters.
You can use wildcards in ~/.ssh/config host definitions.
For example, if the compute nodes in your HPC are named compute-1, compute-2, compute-3, etc., then you can set up a single definition that covers all of these hosts:
Host compute*
You don't give an example of how ProxyJump might be used here so I can't test this, but hopefully this gets you one step closer to a solution.
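In case it helps, here is a sketch of what that could look like with ProxyJump; the login host name (login.cluster.example) and user name (myusername) are placeholders for your own:

Host compute*
    ProxyJump login.cluster.example
    User myusername

With a definition like this, whatever compute-* hostname qrsh hands you should be reachable in one hop through the login node, without adding a new entry for each node.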
Years ago, there were functions in Win32 whereby the app could check to see if a user was running the app via Terminal Services/Remote Desktop. I think it was something like:
GetSystemMetrics(1000H)
Is there a system call one can make to check to see if a Win32 or Win64 app is being run remotely via a program like GotoMyPC or LogMeIn?
No, there is not. Those are third party apps doing their own video/input capturing and network streaming. They are plain ordinary apps as far as Windows is concerned. Terminal Services is built into Windows, which is why there are APIs to query TS status.
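For reference, the call the question half-remembers maps to SM_REMOTESESSION (0x1000); a minimal C++ sketch of the Terminal Services check looks like this:

#include <windows.h>

// Non-zero when the calling process is running in a Terminal Services / Remote Desktop session.
bool IsRemoteDesktopSession()
{
    return GetSystemMetrics(SM_REMOTESESSION) != 0;
}

But, as said above, this only detects Terminal Services/Remote Desktop, not third-party tools like GotoMyPC or LogMeIn.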
The only other approach I can (currently) think of, besides the aforementioned API call, is to check whether any identifiable processes are running (e.g. GotoMyPC or LogMeIn will have some process running). Without having done much research, I suspect those processes may be running even when nobody is actually connected through them. If, however, they launch a separate process to do the streaming, you could check for that.
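If you go down that route, here is a rough sketch of the process check using the Toolhelp API; the executable name you would look for is an assumption, so check what the tools actually run on your machines:

#include <windows.h>
#include <tlhelp32.h>
#include <wchar.h>

// Returns true if a process with the given executable name is currently running.
bool IsProcessRunning(const wchar_t* exeName)
{
    HANDLE snapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    if (snapshot == INVALID_HANDLE_VALUE)
        return false;

    PROCESSENTRY32W entry = { sizeof(entry) };
    bool found = false;
    if (Process32FirstW(snapshot, &entry))
    {
        do
        {
            if (_wcsicmp(entry.szExeFile, exeName) == 0)
            {
                found = true;
                break;
            }
        } while (Process32NextW(snapshot, &entry));
    }
    CloseHandle(snapshot);
    return found;
}

// Usage, with a hypothetical executable name: IsProcessRunning(L"LogMeIn.exe")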
Just to make sure that this isn't an XY problem, what is it that you're trying to do - and perhaps there is another way?
I have a Windows application that opens a COM port. It attempts to open the COM port, then fails and prompts me with an error.
The issue is that this is very old legacy software for which we no longer have the source code. I'm wondering if anyone knows of a way to trace or follow the program's COM port calls to find out which COM port it's attempting to open.
Apparently you can use Process Explorer (as described in this post) to search for processes that are using serial ports. It sounds like you should be able to use the same search approach described in that post to find what you need.
I actually gave up on this solution and rewrote the entire program in a week; it had to be done anyway because of binary compatibility issues with the PCI cards.
I am trying to port an MS VC++ program to run on a Rocks cluster. I am not very good with Linux, but I am eager to learn, and I imagine porting it wouldn't be an impossible task for me. However, I do not understand how to take advantage of the cluster nodes, because it seems that the code only runs on the front-end server (obviously).
I have read a little about MPI, and it seems like I should use MPI to communicate between nodes. The program is currently written so that a main thread synchronizes all worker threads. The main thread also receives commands to manipulate the simulation or query its state. If the simulation is set up properly, communication between executing threads can be minimized significantly. What I don't understand is how to start the process on the compute nodes and how to handle node failures. Are there other things I should consider when porting my program to run on a cluster?
The first step is porting the threaded MS VC++ program to run on a single Linux machine.
Once you have gotten past that point, modify your program to use MPI in addition to threads (or instead of threads). You can do this on a single computer as well.
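For reference, here is a minimal sketch (not your program, obviously) of the "one coordinating rank, several workers" structure in C++ using the standard MPI C API:

#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // this process's id within the job
    MPI_Comm_size(MPI_COMM_WORLD, &size);  // number of processes the launcher started

    if (rank == 0)
    {
        // Coordinator role: hand a unit of work to every other rank.
        for (int worker = 1; worker < size; ++worker)
        {
            int work = worker * 10;  // placeholder payload
            MPI_Send(&work, 1, MPI_INT, worker, 0, MPI_COMM_WORLD);
        }
    }
    else
    {
        // Worker role: receive the payload from rank 0 and act on it.
        int work = 0;
        MPI_Recv(&work, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::printf("rank %d received %d\n", rank, work);
    }

    MPI_Finalize();
    return 0;
}

Compile it with the MPI compiler wrapper (typically mpicxx), and it behaves the same whether the ranks are all on one machine or spread across nodes.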
To run the program on multiple nodes of your cluster, you will need to submit it to whatever scheduling system your cluster uses. The command for this depends on the scheduling software used on your Rocks cluster; ask your administrator. It may look something like mpirun -np 32 yourprogram.
Handling node failures is a broad question. Your first pass should probably just report the failure and then fail the program. If the program doesn't take too long to compute on the cluster, then restarting the program, adjusting for the failed node, may be good enough. Beyond that, your application can write intermediate information to disk so it can resume where it left off. This is called checkpointing your application. Thus, when a node fails, the job fails, but restarting the job doesn't start from the beginning. Much more advanced would be trying to actually detect node failures and reschedule the work unit that was on the failed node; this assumes that the work unit doesn't have non-idempotent side effects. That sort of thing gets really complicated. Checkpointing is likely good enough.
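To make the checkpointing idea concrete, here is a very rough sketch; the state struct, file name, and checkpoint interval are placeholders for whatever your simulation actually needs:

#include <cstdio>

// Placeholder for whatever state your simulation needs in order to resume.
struct SimState
{
    int step;
    double values[1024];
};

// Write the current state to disk; call this every N steps.
bool SaveCheckpoint(const char* path, const SimState& state)
{
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    bool ok = std::fwrite(&state, sizeof(state), 1, f) == 1;
    std::fclose(f);
    return ok;
}

// On startup, try to resume; returns false if no checkpoint exists yet.
bool LoadCheckpoint(const char* path, SimState& state)
{
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    bool ok = std::fread(&state, sizeof(state), 1, f) == 1;
    std::fclose(f);
    return ok;
}

In a real cluster job you would write the checkpoint to a shared filesystem so that a restarted job, possibly on a different node, can pick it up.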
My understanding is that node.js is designed to scale by adding processes rather than by spawning threads within a process. In fact, from watching an awesome introductory video by Ryan Dahl, I get the idea that spawning threads is forbidden in node.js. I like the simplicity of this approach, but I am concerned that there might be a downside when running on Windows, since process creation is more expensive on Windows than on Linux.
Given modern hardware and the fact that node.js processes can be expected to be relatively long running, does process overhead still create a significant advantage for Linux when considering hosting node.js? To put it in concrete terms, if we assume an organization that is using the Windows stack only, but is planning a big move onto node.js, is there a point in considering a new OS because of this issue?
No. Node.js runs in a single process and doesn't spawn processes during execution.
The reason you might have gotten the impression that Node uses processes to scale is that you can add a process per CPU core so Node can take advantage of your multicore machine (you'll need a load-balancer-like solution for this, though). Still, you don't spawn processes on the fly. So yes, you can run Node perfectly fine on Windows (or Azure) without much of a performance hit (if any).
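For completeness, here is a sketch of the usual process-per-core setup using Node's built-in cluster module; this is just one common way to do it, and a reverse proxy in front of several independent node processes is another:

// The master forks one worker per CPU core; each worker runs the same server.
var cluster = require('cluster');
var http = require('http');
var os = require('os');

if (cluster.isMaster) {
  for (var i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  // Workers share the same listening port; incoming connections are spread across them.
  http.createServer(function (req, res) {
    res.end('handled by worker ' + process.pid + '\n');
  }).listen(8000);
}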