How to run an Open MPI program on a Databricks cluster with multiple nodes?

I am trying to run Open MPI from a Python notebook on a Databricks cluster (Ubuntu).
The cluster has 3 nodes:
Driver node: 8 cores
Two worker nodes: 8 cores each
I found that Open MPI is already installed on Databricks.
My command:
sudo mpirun --allow-run-as-root -np 25 --hostfile MY_HOSTFILE ./MY_C_APP
I got:
ssh: connect to host DRIVER_NODE_IP port 22: No route to host
ssh: connect to host ONE_WORKER_NODE_IP port 22: No route to host
In MY_HOSTFILE, all nodes' IP addresses are listed.
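For reference, an Open MPI hostfile lists one host per line, optionally with a slots count capping how many ranks land on that host; the addresses below are placeholders for the actual node IPs:

# MY_HOSTFILE (illustrative)
10.0.0.1 slots=8   # driver
10.0.0.2 slots=8   # worker 1
10.0.0.3 slots=8   # worker 2

Note that -np 25 asks for one more rank than the 24 slots listed here, which recent Open MPI versions will reject unless oversubscription is explicitly allowed.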
I ran this from the Python notebook on Databricks:
%sh
sudo ssh -T DRIVER_NODE_IP
I got:
ssh: connect to host DRIVER_NODE_IP port 22: No route to host
The notebook runs on the driver node through a web service.
I have tried to set up SSH keys so that each node can be accessed, but the shell command ssh-keygen cannot be run from a Databricks Python notebook.
Could anybody let me know how I can work around this infrastructure problem?
Thanks.

Related

Connecting PostgreSQL installed in docker inside Hyper-V Ubuntu from Windows 10 PgAdmin

I need help connecting to PostgreSQL, which is installed in Docker inside Hyper-V Ubuntu 18.04, from pgAdmin on Windows 10. So far I have tried the following:
Step 1: Install Postgres in Docker (Ubuntu running on Hyper-V)
sudo docker run -p 5432:5432 --name pg_test -e POSTGRES_PASSWORD=admin -d postgres
Step 2: Create a database
docker exec -it pg_test bash
psql -U postgres
create database mytestdb;
Step 3: Get the ip address
sudo docker inspect pg_test | grep IPAddress
# returned 172.17.0.2
Step 4: Add this line to pg_hba.conf:
host all all 0.0.0.0/0 md5
Step 5: When I try to connect from pgAdmin 4 on Windows, I get a connection error.
Note: I have also tried using the Ubuntu VM's IP address, but no luck.
Yours is a case where you are trying to connect to Postgres from another subnet, i.e. from the Windows subnet to the Hyper-V subnet, if you are not using a bridged network.
Case 1:
If this is NAT/host-only and not bridged, first make sure you are able to ping the Ubuntu server from the Windows machine.
Next, make sure the port is open on Ubuntu's end. You can check that by running telnet against the port from a Windows command prompt:
telnet 192.168.0.10 5432
Case 2: If you are bridged, you can ping the server, and telnet confirms the port is open, then make sure that in postgresql.conf the listen_addresses setting is '*', which means all interfaces.
Also, at the OS level on Ubuntu, run systemctl stop firewalld to stop the firewall and then try to connect. If this works, open the port permanently in the firewall with:
firewall-cmd --permanent --add-port=5432/tcp
followed by firewall-cmd --reload to apply it.
I can see from your docker run command that 5432 is already published, so this is more a matter of port mapping and firewall configuration.
You may also want to check that pg_hba.conf is not restricted to local connections. That should not be the case for the Docker image, but you never know.
See: https://www.postgresql.org/docs/9.1/auth-pg-hba-conf.html
Also, note that if your command contains POSTGRES_PASSWOR=admin, it is missing the D; it should be POSTGRES_PASSWORD=admin.
You don't need the container IP. Since you have mapped the container port to the host machine (Ubuntu), an outside client just needs the Ubuntu machine's IP; on Ubuntu itself you can use localhost.
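As a sketch, assuming the Ubuntu VM's address is 192.168.0.10 (a placeholder), the connection from Windows would be:

psql -h 192.168.0.10 -p 5432 -U postgres -d mytestdb

In pgAdmin the same values go into the Host, Port, and Username fields, with the password set via POSTGRES_PASSWORD.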

How to verify the port Mesos is listening on

After I start mesos-master on Ubuntu 14.04, I'm unable to get to http://<master_ip>:5050,
so I want to verify whether Mesos is listening on the default port 5050.
I'm following the instructions here.
vagrant@master2:~$ sudo start mesos-master
mesos-master start/running, process 5272
vagrant@master1:~$ mesos help
Usage: mesos <command> [OPTIONS]
Available commands:
help
start-agents.sh
daemon.sh
stop-masters.sh
start-masters.sh
start-slaves.sh
start-cluster.sh
master
stop-slaves.sh
agent
stop-cluster.sh
stop-agents.sh
log
execute
scp
tail
resolve
ps
init-wrapper
local
cat
I tried this to verify, but got no output:
vagrant@master1:~$ sudo netstat -tnlp | grep 5050
I know Mesos is running but I get connection refused.
vagrant@master1:~$ curl http://192.168.2.1:5050
curl: (7) Failed to connect to 192.168.2.1 port 5050: Connection refused
I see you are using Vagrant, so open a browser on the host machine and go to <master2_ip>:5050.
Replace <master2_ip> with the IP address of master2; use ip addr or ifconfig to find it.
If Mesos is up and running, you will get the Mesos dashboard; otherwise, a port-unreachable error.
Post your Vagrantfile here.
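A quick local check from the master itself, as a sketch (it assumes the default port and the master's standard /health HTTP endpoint):

sudo ss -tnlp | grep 5050                         # is anything listening on 5050?
curl -sf http://127.0.0.1:5050/health && echo OK  # Mesos master health check

If the local check succeeds but remote access fails, the problem is networking (Vagrant port forwarding, a firewall, or the --ip address the master bound to) rather than Mesos itself.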

Is it possible to use SSH in a pod?

I made a Hadoop image based on CentOS using a Dockerfile. There are 4 nodes. I want to configure the cluster using ssh-copy-id, but an error occurred.
ERROR: ssh: connect to host [ip] port 22: Connection refused
How can I solve this problem?
SSH follows a client-server architecture, so openssh-server has to be installed and running in the container. Then ssh-copy-id and the other commands should work, provided the IP address is routable.
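As a minimal sketch for a CentOS 7 based image (run these inside each container, or bake them into the Dockerfile):

yum install -y openssh-server openssh-clients   # server plus client tools
ssh-keygen -A                                   # generate the host keys sshd requires
/usr/sbin/sshd                                   # start the SSH daemon on port 22

Once sshd is listening on port 22 in every node's container, ssh-copy-id should stop reporting Connection refused.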

Having trouble accessing my virtual machine from local with SSH

I have Ubuntu 16.04 on both my local and virtual machine, and I want to access the virtual machine from my local machine. I have already changed the network adapter to a bridged connection (both IPs are in 192.168.10.x). But when I run ssh <virtual_machine_ip> from my local terminal, I get the error ssh: connect to host 192.168.10.7 port 22: Connection refused.
ps: I want to configure single node hadoop cluster
The issue has been resolved: I changed my network adapter back to NAT and used port forwarding on port 2222. Now when I run ssh -p 2222 username@127.0.0.1, I am able to connect to my guest OS.
Side note: please check that OpenSSH is installed on your guest machine.
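For completeness, a sketch of installing and verifying the SSH server on an Ubuntu guest (on Ubuntu the service is named ssh, not sshd):

sudo apt-get install -y openssh-server   # install the SSH daemon
sudo systemctl status ssh                # confirm the service is running
sudo ss -tnlp | grep ':22'               # confirm a listener on port 22

Connection refused on port 22 almost always means nothing is listening there, which is exactly what a missing openssh-server produces.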

ssh: connect to host slave port 22: Connection timed out

Edited:
I have a working single-node setup on each of two different machines. I made one the master (192.168.1.1) and the other machine the slave (192.168.1.2), and I can ping successfully between the two. I made the following changes to turn this into a 2-node cluster.
Updated /etc/hosts on both machines, and hosts.allow:
All : Ashish-PC 192.168.1.1 : allow
All : slave 192.168.1.2 : allow
The masters file contains:
Ashish-PC
The slaves file contains:
Ashish-PC
slave
I am getting an error while copying the local host's public key to the remote host (slave) on port 22:
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: ERROR: ssh: connect to host slave port 22: Connection timed out
The same error appears when I start the DFS services on the master:
bin/start-dfs.sh
starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-Ashish-namenode-Ashish-PC.out
slave: ssh: connect to host slave port 22: Connection timed out
Ashish-PC: starting secondarynamenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-Ashish-secondarynamenode-Ashish-PC.out
slave: ssh: connect to host slave port 22: Connection timed out
I have used Cygwin, and SSH is working fine on both PCs. I came across suggestions to change port 22 (because of ISP issues), but I don't want to do that just for this.
Thanks in advance for your help.
Allow the master to communicate through Windows Firewall by adding sshd as an allowed app for both the home and public profiles.
Also make sure the sshd service is started on each node, as sketched below.
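For Cygwin-based nodes, a sketch of checking and starting the service (this assumes sshd was configured with ssh-host-config; the Windows service name may be sshd or cygsshd depending on the Cygwin version):

net start sshd        # start the sshd Windows service from an admin prompt
cygrunsrv -Q sshd     # query the service status from a Cygwin shell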
This worked for me:
1. Edit the SSH daemon config:
sudo vi /etc/ssh/sshd_config
2. Uncomment these lines (remove the leading #):
Port 22
Protocol 2
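One step the answer leaves implicit: sshd only rereads its configuration on restart, so apply the change afterwards, for example:

sudo service ssh restart           # on Debian/Ubuntu, where the service is 'ssh'
# or: sudo systemctl restart sshd  # on distros where the unit is 'sshd'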
