Adding a Spark node to clusters gets stuck on Determining SSH fingerprints - amazon-ec2

I am trying to add a new Spark node to an existing DSE 4.8.4 cluster running on EC2. I use following settings:
AMI = ami-50520e27
AMI username = ubuntu
Use OpsCenter specific security group
Key file using by the existing cluster
The machine spins up, but in OpsCenter > Activities > Add Nodes there is a message saying "Determining SSH fingerprints" which stays there forever.

Related

How to setup Kubernetes cluster on several windows hosts?

I have several Windows servers available and would like to setup a Kubernetes cluster on them.
Is there some tool or a step by step instruction how to do so?
What I tried so far is to install DockerDesktop and enable its Kubernetes feature.
That gives me a single node Cluster. However, adding additional nodes to that Docker-Kubernetes Cluster (from different Windows hosts) does not seem to be possible:
Docker desktop kubernetes add node
Should I first create a Docker Swarm and could then run Kubernetes on that Swarm? Or are there other strategies?
I guess that I need to open some ports in the Windows Firewall Settings of the hosts? And map those ports to some Docker containers in which Kubernetes is will be installed? What ports?
Is there some program that I could install on each Windows host and that would help me with setting up a network with multiple hosts and connecting the Kubernetes nodes running inside Docker containers? Like a "kubeadm for Windows"?
Would be great if you could give me some hint on the right direction.
Edit:
Related info about installing kubeadm inside Docker container:
https://github.com/kubernetes/kubernetes/issues/35712
https://github.com/kubernetes/kubeadm/issues/17
Related question about Minikube:
Adding nodes to a Windows Minikube Kubernetes Installation - How?
Info on kind (kubernetes in docker) multi-node cluster:
https://dotnetninja.net/2021/03/running-a-multi-node-kubernetes-cluster-on-windows-with-kind/
(Creates multi-node kubernetes cluster on single windows host)
Also see:
https://github.com/kubernetes-sigs/kind/issues/2652
https://hub.docker.com/r/kindest/node
You can always refer to the official kubernetes documentation which is the right source for the information.
This is the correct way to manage this question.
Based on Adding Windows nodes, you need to have two prerequisites:
Obtain a Windows Server 2019 license (or higher) in order to configure the Windows node that hosts Windows containers. If you are
using VXLAN/Overlay networking you must have also have KB4489899
installed.
A Linux-based Kubernetes kubeadm cluster in which you have access to the control plane (see Creating a single control-plane cluster with kubeadm).
Second point is especially important since all control plane components are supposed to be run on linux systems (I guess you can run a Linux VM on one of the servers to host a control plane components on it, but networking will be much more complicated).
And once you have a proper running control plane, there's a kubeadm for windows to proper join Windows nodes to the kubernetes cluster. As well as a documentation on how to upgrade windows nodes.
For firewall and which ports should be open check ports and protocols.
For worker node (which will be windows nodes):
Protocol Direction Port Range Purpose Used By
TCP Inbound 10250 Kubelet API Self, Control plane
TCP Inbound 30000-32767 NodePort Services All
Another option can be running windows nodes in cloud managed kuberneres, for example GKE with windows node pool (yes, I understand that it's not your use-case, but for further reference).

Is it necessary to be able to do SSH from one node to another in order to do cloudera cluster installation?

I am creating a cloudera cluster and when I do SSH from one node to another I get message saying "Public Key".Login to the machines happen using PEM file and paraphrase Is it necessary to be able to do SSH from one node to another in order to do cloudera cluster installation?
Yes, it is necessary. Cloudera manager need a way to access other machines over SSH. This can be authenticated using public keys (recommended) or passwords.
Cloudera Manager requires an SSH key to communicate with the Cloudera SCM Agents and do the installation.
Cloudera Manager installs the cluster for you. but you will need to initially add the authorized_keys SSH file for the remote manager to access the agent machines.
How does Cloudera Manager Work? (doesn't mention SSH, though)

DC/OS - Dashboard showing 0 connected nodes

After restarting my 3 masters in my DC/OS cluster, the DC/OS dashboard is showing 0 connected nodes. However from the DC/OS cli I see all 6 of my agent nodes:
$ dcos node
HOSTNAME IP ID
172.16.1.20 172.16.1.20 a7af5134-baa2-45f3-892e-5e578cc00b4d-S7
172.16.1.21 172.16.1.21 a7af5134-baa2-45f3-892e-5e578cc00b4d-S12
172.16.1.22 172.16.1.22 a7af5134-baa2-45f3-892e-5e578cc00b4d-S8
172.16.1.23 172.16.1.23 a7af5134-baa2-45f3-892e-5e578cc00b4d-S6
172.16.1.24 172.16.1.24 a7af5134-baa2-45f3-892e-5e578cc00b4d-S11
172.16.1.25 172.16.1.25 a7af5134-baa2-45f3-892e-5e578cc00b4d-S10`
I am still able to schedule tasks in Marathon both from the dcos cli and from the Marathon gui, they then are properly scheduled and executed on the agents. Also, from the mesos interface on :5050 I can see all of the agents in the slaves page.
I have restarted agent nodes and master nodes. I have also rerun the DC/OS GUI installer and run preflight check, which of course fails with an "already installed" error.
Is there a way to re-register the node with DC/OS GUI short of uninstalling/reinstalling a node?
For anyone who is running into this, my problem was related to our corporate proxy. In order to get the Universe working in my cluster I had to add proxy settings to /opt/mesosphere/environment. I then restarted the dcos-cosmos.service and life was good. However, upon server restart, dcos-history-service.service was now running with the new environment and was unable to resolve my local names with our proxy server. To solve, I added a NO_PROXY to the /opt/mesosphere/environment and DCOS dashboard is again happy.

How does one install etcd in a cluster?

Newbie w/ etcd/zookeeper type services ...
I'm not quite sure how to handle cluster installation for etcd. Should the service be installed on each client or a group of independent servers? I ask because if I'm on a client, how would I query the cluster? Every tutorial I've read shows a curl command running against localhost.
For etcd cluster installation, you can install the service on independent servers and form a cluster. The cluster information can be queried by logging onto one of the machines and running curl or remotely by specifying the IP address of one of the cluster member node.
For more information on how to set it up, follow this article

Use spark-submit to submit a application to EC2 cluster

I am new to Spark and I am trying to run it on EC2. I follow the tutorial on spark webpage by using spark-ec2 to launch a Spark cluster. Then, I try to use spark-submit to submit the application to the cluster. The command looks like this:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://ec2-54-88-9-74.compute-1.amazonaws.com:7077 --executor-memory 2G --total-executor-cores 1 ./examples/target/scala-2.10/spark-examples_2.10-1.0.0.jar 100
However, I got the following error:
ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
Please let me know how to fix it. Thanks.
You're seeing this issue because the master node of your spark-standalone cluster cant open a TCP connection back to the drive (on your machine). The default mode of spark-submit is client which runs the driver on the machine that submitted it.
A new cluster mode was added to spark-deploy that submits the job to the master where it is then run on a client, removing the need for a direct connection. Unfortunately this mode is not supported in standalone mode.
You can vote for the JIRA issue here: https://issues.apache.org/jira/browse/SPARK-2260
Tunneling your connection via SSH is possible but latency would be a big issue since the driver would be running locally on your machine.
I'm curious if you still having this issue ... But in case anyone is asking here is a brief answer. As clarified by jhappoldt, the master node of your spark-standalone cluster cant open a TCP connection back to the drive (on your local machine). Two workarounds are possible, tested and succeeded.
(1) From EC2 Management Console, create a new security group and add rules to enable TCP back and forth from your PC (public IP). (what I did was adding TCP rules inbound and outbound) ... Then add this security group to your master instance. (right click --> Networking --> Change security groups). Note: add it and don't remove the already established security groups.
This solution work well, but in your specific scenario, deploying your application from local machine to EC2 cluster, you will face further problems (resource related) so the next option is the best one
(2) Having your .jar file (or .egg) copy it to the master node using scp. You can check this link http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html for information about how to do that; and deploy your application from the master node. Note: spark is already pre-insalled so you will do nothing but write the same exact command you write on your local machine from ~/spark/bin. This shall work perfect.
Are you executing the command on your local machine, or on the created EC2 node? If you're doing it locally, make sure port 7077 is open in the security settings, as its closed to the outside by default.

Resources