Creating a multi-headnode HPC cluster - cluster-computing

I have an HPC cluster where several web apps are installed in Docker containers, and the queue is managed using Torque. Every app submits jobs to the HPC cluster by connecting to it through SSH and then running qsub: ssh user@cluster qsub bla blabla. There are shared folders for exchanging data.
I am not satisfied with this setup and I'd like to know whether it is possible to have a master node running in each Docker container and use qsub directly inside it, without going through an SSH connection. I'd prefer to use Torque but I am open to other solutions.

Torque permits multiple submission hosts.
The names or addresses of the hosts should be added to the submit_hosts variable in the Torque server configuration; see the relevant page in the Torque manual.
qmgr -c 'set server submit_hosts = headnode'
qmgr -c 'set server submit_hosts += app1'
qmgr -c 'set server submit_hosts += app2'
This assumes app1 and app2 are the hostnames of the Docker containers; you will need to configure name resolution for them.
For more details and other options, see the Torque manual.
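For illustration, here is a minimal sketch of what submission could then look like from inside one of the containers, assuming the container is listed in submit_hosts, has the Torque client tools (qsub) installed, and shares a folder with the compute nodes; the paths and job script are hypothetical.
#!/bin/sh
# Hypothetical sketch: submit a job directly from the app1 container.
# Write a trivial job script to a shared folder visible to the compute nodes.
cat > /shared/jobs/hello.pbs <<'EOF'
#PBS -N hello
#PBS -l nodes=1:ppn=1
echo "Hello from $(hostname)"
EOF
# No ssh hop needed any more: qsub talks to the Torque server directly.
qsub /shared/jobs/hello.pbs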

Related

How to configure docker swarm using jenkins?

I have got an assignment. The assignment is "Write a shell script to install and configure docker swarm (one master/leader and one node) and automate the process using Jenkins." I am new to this technology and finding it difficult to proceed. Can anyone explain the step-by-step process of how to proceed?
@Rajnish Kumar Singh, have you tried to check resources online? I understand you are very new to this technology, but googling some keywords like
what is docker swarm
what is jenkins, etc. would definitely help.
Having said that, basically you need to do the below set of steps to complete your assignment.
Pre-requisites
2 or more Ubuntu 20.04 servers
(You can use any Linux distro like Ubuntu, Red Hat, etc., but make sure your install and execute commands change accordingly.
Here we need two nodes mainly to configure the master and worker node cluster.)
Eg :
manager --- 132.92.41.4
worker --- 132.92.41.5
You can create these nodes in any of the public cloud providers, like AWS EC2 instances or GCP VMs.
Next, you need to do the below set of steps:
Configure Hosts
Install Docker-ce
Docker Swarm Initialization
You can refer to this article for more info: https://www.howtoforge.com/tutorial/ubuntu-docker-swarm-cluster/
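Concretely, the swarm part of those steps boils down to a couple of commands. The sketch below assumes the example IPs above and that Docker is already installed on both nodes; the join token is a placeholder for the one printed by the init command.
# On the manager (132.92.41.4): initialize the swarm.
sudo docker swarm init --advertise-addr 132.92.41.4
# On the worker (132.92.41.5): join using the token that the init command printed
# (the token below is a placeholder).
sudo docker swarm join --token SWMTKN-1-placeholder 132.92.41.4:2377
# Back on the manager: verify both nodes are part of the cluster.
sudo docker node ls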
This completes first part of your assignment.
Next, you can create one small shell script and include all those install and configuration commands in it. Basically, a shell script is a collection of Linux commands: instead of running each command separately, you run the script alone and all the setup is done for you.
You can create the script using the touch command
touch docker-swarm-install.sh
Give the script execute permission to make it executable
chmod +x docker-swarm-install.sh
Next, include in the script all the install + configure commands which you used earlier to do the Docker Swarm setup (you can refer to the link shared above); a rough sketch follows.
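A minimal sketch of what the manager-side script could contain, assuming Ubuntu 20.04 and the example manager IP 132.92.41.4 from above; the package name and advertise address are assumptions, and the worker's join step is still run separately on the worker node.
#!/bin/bash
# docker-swarm-install.sh -- rough sketch for the manager node (Ubuntu 20.04 assumed)
set -e
# Install Docker from Ubuntu's repositories and make sure the daemon is running.
sudo apt-get update
sudo apt-get install -y docker.io
sudo systemctl enable --now docker
# Initialize the swarm, advertising the manager's IP.
sudo docker swarm init --advertise-addr 132.92.41.4
# Print the join command that must be run on the worker node.
sudo docker swarm join-token worker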
Now, when your script is ready, you can configure it in a Jenkins job; whenever the Jenkins job is run, the script will get executed and the Docker Swarm cluster will be created.
You need a Jenkins server. Jenkins is open-source software; you can install it on any public cloud instance (AWS EC2).
Reference : https://devopsarticle.com/how-to-install-jenkins-on-aws-ec2-ubuntu-20-04/
Next, once the installation is completed, you need to configure a job in Jenkins.
Reference : https://www.toolsqa.com/jenkins/jenkins-build-jobs/
Add your 'docker-swarm-install.sh' as a build step in the created job.
Reference : https://faun.pub/jenkins-jobs-hands-on-for-the-different-use-cases-devops-b153efb483c7
If all the setup is successful, then when you run your Jenkins job your Docker Swarm cluster should get created.

Launching a Singularity Container Remotely using Visual Studio Code

I am aware that you can launch Docker containers remotely in VS Code. Is it possible to do the same with Singularity containers?
Update: the solution to this was published in the same issue (https://github.com/microsoft/vscode-remote-release/issues/3066#issuecomment-1019500216) as before by user oschulz:
As promised, here are some instructions on how to use Singularity with VS-Code Remote SSH via SSH RemoteCommand. The procedure described below makes VS-Code run its remote server component inside a Singularity container instance (other runtimes like Shifter work too).
Acknowledgement: Credit for a lot of this goes to @gipert, who refined my original approach (using a custom SSH script) when support for RemoteCommand became available in VS-Code recently.
Step 1
Use VS-Code >= v1.64 (includes support for the SSH RemoteCommand setting). Install the Pre-Release version of the Remote SSH extension
Important: In the VS-Code settings, set "remote.SSH.enableRemoteCommand": true.
Step 2
In your "$HOME/.ssh/config", add something like
Host myimage1~*
    RemoteCommand singularity shell /path/to/image1.sif
    RequestTTY yes

Host myimage2~*
    RemoteCommand singularity shell /path/to/image2.sif
    RequestTTY yes

Host somehost myimage1~somehost myimage2~somehost
    HostName some.host.somewhere
    User your_username

Host otherhost myimage1~otherhost myimage2~otherhost
    HostName some.otherhost.somewhere
    User your_username
Test whether this works using ssh myimage1~somehost. This should drop you into an SSH session inside an instance of the "/path/to/image1.sif" container image on some.host.somewhere.
Connecting to the remote host with VS-Code: F1 > "Connect to Host" > "myimage1~somehost" should now get you a remote VS-Code session running in the container image as well. The same goes for "myimage2~somehost", "myimage1~otherhost" and "myimage2~otherhost".
Step 3
However, since VS-Code reuses the remote server instance, that's not sufficient to run multiple container images on the same host at the same time. To get separate (per-container) VS-Code server instances on the same host, add something like this to your VS-Code preferences:
"remote.SSH.serverInstallPath": {
"myimage1~somehost": "~/.vscode-container/myimage1",
"myimage1~otherhost": "~/.vscode-container/myimage1",
"myimage2~somehost": "~/.vscode-container/myimage2",
"myimage2~otherhost": "~/.vscode-container/myimage2"
}
Request to the VS-Code dev team
Could "remote.SSH.serverInstallPath" be controlled via an environment variable? This would allow us to eliminate all these cumbersome "remote.SSH.serverInstallPath" preferences. The environment variable could be set by a container startup script on the remote side (like the one below) automatically, depending on the selected container image.
Other Container runtimes
To use a different container runtime than Singularity (e.g. Shifter, Charliecloud, etc.), simply replace singularity shell /path/to/image1.sif by the appropriate command for your runtime.
On some systems (e.g. with Shifter at NERSC) you may also need to override $XDG_RUNTIME_DIR, since its default location may not be writable from within a container instance. In such cases, it's best to use a custom container run-script like
#!/bin/sh
export XDG_RUNTIME_DIR="${TMPDIR:-/tmp}/`whoami`/run"
exec shifter --image="$1"
So in your SSH config, use
RemoteCommand /my/homedir/.local/bin/run_container image_name
I maintain a little container start-script called cenv that handles $XDG_RUNTIME_DIR (and quite a bit more, including some default bind-mounts) automatically for both Singularity and Shifter (contributions welcome).
Tips and tricks
If things don't work, try "Kill server on remote" from VS-Code and reconnect.
You can also try starting over from scratch with brute force: close the VS-Code remote connection. Then, from an external terminal, kill the remote VS-Code server instance:
$ ssh somehost
$ kill -9 -1
(This will kill all processes you own on the remote host.)
Remove the ~/.vscode-server directory.
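As a concrete (and rather drastic) sketch of that reset, assuming the host alias somehost from the SSH config above:
# Kill every process you own on the remote host (this also terminates the ssh session).
ssh somehost 'kill -9 -1'
# Then remove the stale VS-Code server installation.
ssh somehost 'rm -rf ~/.vscode-server'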
Old:
I believe this is still not supported; refer to this issue: https://github.com/microsoft/vscode-remote-release/issues/3066. There are also some ideas for potential workarounds in the same link.

WebLogic in Docker: how to start managed servers automatically

I am learning Docker and creating an image for an Oracle WebLogic 12.2.1.4 server.
My image is ready and working fine. It contains
an admin server
two managed servers
When I run my image with docker run -d -p 7001:7001 --name WL oracle/weblogic-12.2.1.4.0:1.0 the admin server starts automatically because I added the following line at the end of my Dockerfile:
CMD /u01/oracle/user_projects/domains/$DOMAIN_NAME/startWebLogic.sh
But I need to start the managed servers manually. I have to log into the container and start them by hand:
docker exec -it WL /bin/bash
./startManagedWebLogic.sh MANAGED_SERVER_1 http://localhost:7001 &
./startManagedWebLogic.sh MANAGED_SERVER_2 http://localhost:7001 &
This is not what I want. I want the managed servers to start automatically after the admin server is up and running.
I was thinking about creating a new bash script, copying it into the image, and using it to boot up the admin and managed servers, like this:
start-wls-domain.sh
#!/bin/bash
/u01/oracle/user_projects/domains/$DOMAIN_NAME/startWebLogic.sh &
# there are more sophisticated ways to check the status of the admin server, but this is okay for a test
sleep 60
./startManagedWebLogic.sh MANAGED_SERVER_1 http://localhost:7001 &
./startManagedWebLogic.sh MANAGED_SERVER_2 http://localhost:7001 &
This script can be called from Dockerfile with CMD command.
But with this solution I lose the ability to see the output in the default Docker log: docker logs WL -f will display nothing.
Another issue with this bash script solution is that when the script finishes, the container will stop running. Do I need an infinite loop at the end of this script?
If possible I would like to have a solution without start-wls-domain.sh.
What is the best and easiest way to start Weblogic managed servers automatically within a Docker container?
I followed the suggestions and ran the different servers in different containers. That way I was able to start the servers properly.
I published the solution on Github, here.
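A rough sketch of what that split could look like with plain docker run commands, assuming the same image can also start a managed server by overriding the command; the network name, container names and script location are illustrative and not taken from the published repository.
# Hypothetical sketch: one container per server, all on a shared Docker network.
docker network create wl-net
# Admin server (the image's default CMD starts it).
docker run -d --name wl-admin --network wl-net -p 7001:7001 \
    oracle/weblogic-12.2.1.4.0:1.0
# Managed servers, each overriding the command to start a single managed server
# and pointing at the admin server by its container name.
docker run -d --name wl-ms1 --network wl-net \
    oracle/weblogic-12.2.1.4.0:1.0 \
    ./startManagedWebLogic.sh MANAGED_SERVER_1 http://wl-admin:7001
docker run -d --name wl-ms2 --network wl-net \
    oracle/weblogic-12.2.1.4.0:1.0 \
    ./startManagedWebLogic.sh MANAGED_SERVER_2 http://wl-admin:7001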

Hooking into a running Heroku Phoenix application

The previous night I was tinkering with Elixir, running code on both of my machines at home, but when I woke up I asked myself: can I actually do the same using the heroku run command?
I think theoretically it should be entirely possible if set up properly. Obviously heroku run iex --sname name executes and gives me access to a shell (without a functioning backspace, which is irritating), but I haven't accessed my app yet.
Each time I executed the command it gave me a different machine. I guess that's how Heroku achieves sandboxing. I was also trying to find a way to determine the address of my app's machine, but haven't had any luck yet.
Can I actually connect to the dyno running the code to evaluate expressions on it, like you would do with iex -S mix phoenix.server locally?
Unfortunately it's not possible.
To interconnect Erlang VM nodes you'd need the EPMD port (4369) to be open.
Heroku doesn't allow opening custom ports, so it's not possible.
In case you'd want to establish a connection between your Phoenix server and an Elixir node, you'd have to do the following:
Two nodes on the same machine:
Start Phoenix using iex --name phoenix@127.0.0.1 -S mix phoenix.server
Start iex --name other_node@127.0.0.1
Establish a connection using Node.ping from other_node:
iex(other_node@127.0.0.1)1> Node.ping(:'phoenix@127.0.0.1')
(should return :pong, not :pang)
Two nodes on different machines:
Start Phoenix using some external address
iex --name phoenix@195.20.2.2 --cookie someword -S mix phoenix.server
Start the second node
iex --name other_node@195.20.2.10 --cookie someword
Establish a connection using Node.ping from other_node:
iex(other_node@195.20.2.10)1> Node.ping(:'phoenix@195.20.2.2')
(should return :pong, not :pang)
Both nodes should contact each other on the addresses they usually see each other at on the network (the full external IP when on different networks, 192.168.x.x when in the same local network, 127.0.0.1 when on the same machine).
If they're on different machines they must also have the same cookie value set, because by default each node takes the automatically generated cookie from your home directory. You can check it by running:
cat ~/.erlang.cookie
Lastly, you've got to make sure that the EPMD port 4369 is open, because the Erlang VM uses it for inter-node data exchange.
As a side note, if you leave it open, make sure to keep your cookie as private as possible, because anyone who knows it has absolute power over your machine.
When you execute heroku run it will start a new one-off dyno, which is a temporary instance that is deprovisioned when you finish the heroku run session. This dyno is not a web dyno and cannot receive inbound HTTP requests through Heroku's routing layer.
From the docs:
One-off dynos can never receive HTTP traffic, since the routers only route traffic to dynos named web.N.
https://devcenter.heroku.com/articles/one-off-dynos#formation-dynos-vs-one-off-dynos
If you want your phoenix application to receive HTTP requests you will have to set it up to run on a web dyno.
It has been a while since you asked the question, but someone might find this answer valuable.
As of 2021 Heroku allows forwarding multiple ports, which makes it possible to remsh into a running Erlang VM node. It depends on how you deploy your application, but in general you will need to:
Give your node a name and a cookie (e.g. --name "myapp@127.0.0.1" --cookie "secret")
Tell it exactly which port the node should bind to, so you know which port to forward (e.g. --erl "-kernel inet_dist_listen_min 9000 -kernel inet_dist_listen_max 9000")
Forward the EPMD and node ports by running heroku ps:forward 9001:4369,9000
Remsh into your node: ERL_EPMD_PORT=9001 iex --cookie "secret" --name console@127.0.0.1 --remsh "myapp@127.0.0.1"
Eventually you should start your server with something like this (if you are still using the Mix tool): MIX_ENV=prod elixir --name "myapp@127.0.0.1" --cookie "secret" --erl "-kernel inet_dist_listen_min 9000 -kernel inet_dist_listen_max 9000" -S mix phx.server --no-halt
If you are using Releases, most of the setup has already been done for you by the Elixir team.
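Put together, the local side of that workflow might look like the following sketch, assuming the app is named myapp on Heroku and was started with the name, cookie and port settings above (the app name and cookie value are placeholders):
# Terminal 1: forward EPMD (remote 4369 -> local 9001) and the distribution port 9000.
heroku ps:forward 9001:4369,9000 --app myapp
# Terminal 2: remsh into the running node through the forwarded ports.
ERL_EPMD_PORT=9001 iex --cookie "secret" \
    --name console@127.0.0.1 --remsh "myapp@127.0.0.1"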
To verify that the EPMD port has been forwarded correctly, try running epmd -port 9001 -names. The output should be:
epmd: up and running on port 4369 with data:
name myapp@127.0.0.1 at port 9000
You may follow my notes on how I do it for Dockerized releases (there is a bit more hassle): https://paveltyk.medium.com/elixir-remote-shell-to-a-dockerized-release-on-heroku-cc6b1196c6ad

parallel ipython/ipcluster through head node

I want to use the parallel capabilities of IPython on a remote computer cluster. Only the head node is accessible from the outside. I have set up SSH keys so that I can connect to the head node with e.g. ssh head, and from there I can also ssh into any node without entering a password, e.g. ssh node3. So I can basically run any command on the nodes by doing:
ssh head ssh node3 command
Now what I really want to do is to be able to run jobs on the cluster from my own computer from ipython. The way to set up the hosts to use in ipcluster is:
send_furl = True
engines = { 'host1.example.com' : 2,
            'host2.example.com' : 5,
            'host3.example.com' : 1,
            'host4.example.com' : 8 }
But since I only have a host name for the head node, I don't think I can do this. One option is to set up SSH tunneling on the head node, but I cannot do this in my case, since it requires enough ports to be open to accommodate all the nodes (and this is not the case). Are there any alternatives?
I use ipcluster on the NERSC clusters by using the PBS queue:
http://ipython.org/ipython-doc/stable/parallel/parallel_process.html#using-ipcluster-in-pbs-mode
In summary, you submit jobs which run mpiexec ipengine (after having launched ipcontroller on the login node). Do you have PBS on your cluster?
This was working fine with IPython 0.10; it is now broken in the 0.11 alpha.
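For reference, a PBS batch script along those lines might look like the sketch below; the resource request and walltime are placeholders, and the exact ipengine invocation depends on your cluster and IPython version (older releases used FURL files, newer ones JSON connection files).
#!/bin/bash
#PBS -N ipengines
#PBS -l nodes=4:ppn=2
#PBS -l walltime=01:00:00
# Assumes ipcontroller is already running on the head/login node and that its
# connection information is visible on a shared filesystem.
cd $PBS_O_WORKDIR
mpiexec ipengine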
I would set up a VPN server on the master and connect to it with a VPN client on my local machine. Once established, the virtual private network will allow all of the slaves to appear as if they're on the same LAN as my local machine (on a "virtual" network interface, in a "virtual" subnet), and it should be possible to ssh to them.
You could possibly establish that VPN over SSH ("ssh tunneling", as you mention); other options are OpenVPN and IPsec.
I don't understand what you mean by "this requires enough ports to be open to accommodate all the nodes". You will need: (i) one inbound port on the master, to provide the VPN/tunnel; (ii) inbound SSH on each slave, accessible from the master; (iii) another inbound port on each slave, over which the master drives the IPython engines. Wouldn't (ii) and (iii) be required in any setup? So all we've added is (i).
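As one concrete illustration of the VPN-over-SSH idea (the specific tool is my suggestion, not something mentioned above), something like sshuttle can route traffic for the cluster's internal subnet through the head node; the subnet is a placeholder for whatever range the compute nodes actually use.
# Route traffic for the cluster's internal subnet (10.0.0.0/24 is a placeholder)
# through the head node over SSH.
sshuttle -r head 10.0.0.0/24
# In another terminal the compute nodes are now reachable by their internal IPs,
# so the IPython engines can be driven from the local machine.
ssh 10.0.0.3 hostname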
