Setting up Ganglia on Spark running on EC2

I am trying to set up Ganglia on EC2 servers, but can't seem to receive info from any host other than my master.
The master metrics are showing fine, but I have 4 more nodes that aren't listed in the web interface.
My question is: could it be that, because I'm using unicast, I can only see the master node, and that it's aggregating the data from all of the other nodes?
I have run both gmetad and gmond in the foreground and saw that the node and the master are communicating with each other, but I still can't see the node in the web UI.
Any help would be appreciated.

Related

Hadoop cluster with docker swarm

I'm trying to set up a Hadoop cluster inside a Docker swarm with multiple hosts, with a datanode on each Docker node with a mounted volume. I ran some tests and it works fine, but the problem comes when a datanode dies and then returns.
I restarted 2 hosts at the same time, and when the containers ran again they got new IPs. The problem is that the namenode gives an error because it thinks it is a different datanode.
ERROR org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node 10.0.0.13:50010 is attempting to report storage ID 3a7b556f-7364-460e-beac-173132d77503. Node 10.0.0.9:50010 is expected to serve this storage.
Is it possible to prevent Docker from assigning a new IP, and instead keep the last IP after a restart?
Or is there an option in the Hadoop config to fix this?
Static DHCP addresses for containers accessing an overlay network are officially not supported for the time being, as discussed here: https://github.com/moby/moby/issues/31860.
I hope Docker will provide a solution for this soon.

FQDN on Azure Service Fabric on Premise

I don't see a way to configure the cluster FQDN for an On Premise installation.
I created a 6-node cluster (each node running on a physical server) and I'm only able to contact each node on its own IP instead of contacting the cluster on a "general FQDN". With this model, I have to be aware of which node is up and which node is down.
Does somebody know how to achieve this, based on the sample configuration files provided with the Service Fabric standalone installation package?
You need to add a network load balancer to your infrastructure for that. This will be used to route traffic to healthy nodes.

DC/OS - Dashboard showing 0 connected nodes

After restarting my 3 masters in my DC/OS cluster, the DC/OS dashboard is showing 0 connected nodes. However from the DC/OS cli I see all 6 of my agent nodes:
$ dcos node
HOSTNAME     IP           ID
172.16.1.20  172.16.1.20  a7af5134-baa2-45f3-892e-5e578cc00b4d-S7
172.16.1.21  172.16.1.21  a7af5134-baa2-45f3-892e-5e578cc00b4d-S12
172.16.1.22  172.16.1.22  a7af5134-baa2-45f3-892e-5e578cc00b4d-S8
172.16.1.23  172.16.1.23  a7af5134-baa2-45f3-892e-5e578cc00b4d-S6
172.16.1.24  172.16.1.24  a7af5134-baa2-45f3-892e-5e578cc00b4d-S11
172.16.1.25  172.16.1.25  a7af5134-baa2-45f3-892e-5e578cc00b4d-S10
I am still able to schedule tasks in Marathon both from the dcos cli and from the Marathon GUI, and they are then properly scheduled and executed on the agents. Also, from the Mesos interface on :5050 I can see all of the agents on the slaves page.
I have restarted agent nodes and master nodes. I have also rerun the DC/OS GUI installer and run preflight check, which of course fails with an "already installed" error.
Is there a way to re-register the node with DC/OS GUI short of uninstalling/reinstalling a node?
For anyone who is running into this, my problem was related to our corporate proxy. In order to get the Universe working in my cluster I had to add proxy settings to /opt/mesosphere/environment. I then restarted dcos-cosmos.service and life was good. However, after a server restart, dcos-history-service.service was also running with the new environment and was unable to resolve my local names through our proxy server. To solve this, I added a NO_PROXY entry to /opt/mesosphere/environment, and the DC/OS dashboard is happy again.
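As a rough illustration of the NO_PROXY fix described above, the sketch below builds such a line from a list of hosts. The answer does not say which hosts were listed, so the helper names (build_no_proxy, environment_line) and the host list are hypothetical, and the KEY=VALUE layout of the environment file is an assumption.

```python
def build_no_proxy(hosts):
    """Join host names/IPs into a comma-separated NO_PROXY value.

    Blank entries and duplicates are dropped; the original order is kept.
    """
    seen = []
    for host in hosts:
        host = host.strip()
        if host and host not in seen:
            seen.append(host)
    return ",".join(seen)


def environment_line(hosts):
    """Return a KEY=VALUE line (assumed format of the environment file)."""
    return "NO_PROXY=" + build_no_proxy(hosts)


# Hypothetical host list: loopback names plus two agent IPs from the
# `dcos node` listing in the question. The real list depends on the cluster.
example_hosts = ["localhost", "127.0.0.1", "172.16.1.20", "172.16.1.21"]
print(environment_line(example_hosts))
```

Whether a given service honors IP addresses, host names, or domain suffixes in NO_PROXY varies by client library, so the resulting line should be checked against the service that was failing.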

Hadoop Cluster distributed in different sub-networks (Docker + Flannel)

I want to run Hadoop 2.3.0 in a multi-node bare-metal cluster using Docker. I have a master container and a slave container (in this first setup). When the Master and Slave containers are on the same host (and therefore in the same Flannel subnet), Hadoop works perfectly. However, if the Master and Slave are on different bare-metal nodes (hence, in different Flannel subnets), it simply does not work (I get a connection refused error). Both containers can ping and ssh one another, so there is no basic connectivity problem. For some reason, it seems that Hadoop needs all the nodes in the cluster to be in the same subnet. Is there a way to circumvent this?
Thanks
I think having the nodes in separate Flannel subnets introduces some NAT-related rules that cause such issues.
See the link below, which seems to have addressed a similar issue:
Re: Networking Problem in creating HDFS cluster.
Hadoop uses a number of other ports for communication between the nodes; the above assumes these ports are unblocked.
ssh and ping are not enough. If you have iptables or any other firewall, you need to either disable it or open up the ports. You can set up the cluster as long as the hosts can communicate with each other and the ports are open. Run telnet <namenode> <port> to ensure hosts are communicating on the desired ports.
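The telnet check can also be scripted when there are several ports to test. This is a minimal sketch using only the Python standard library; the function names are made up for illustration, and which ports matter depends on your Hadoop configuration.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to True (reachable) or False (refused or unreachable)."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

For example, check_ports("<namenode>", [8020, 50070, 50010]) run from inside the slave container would show which of those ports are reachable. The port numbers here are only commonly used Hadoop 2 defaults and are an assumption; use the ones from your own core-site.xml and hdfs-site.xml.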

How does one install etcd in a cluster?

Newbie w/ etcd/zookeeper type services ...
I'm not quite sure how to handle cluster installation for etcd. Should the service be installed on each client or a group of independent servers? I ask because if I'm on a client, how would I query the cluster? Every tutorial I've read shows a curl command running against localhost.
For etcd cluster installation, you can install the service on independent servers and form a cluster. The cluster can be queried either by logging on to one of the machines and running curl locally, or remotely by specifying the IP address of one of the cluster member nodes.
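Querying a member remotely looks the same as the localhost curl examples, just with the member's address in the URL. The sketch below assumes the member listens for client traffic on port 2379 on an address reachable from the client (not only on localhost) and uses the /version endpoint; the function names and the example IP are hypothetical.

```python
import json
import urllib.request


def member_url(host, port=2379, path="/version"):
    """Build the HTTP URL for a client request to one etcd member."""
    return f"http://{host}:{port}{path}"


def query_member(host, port=2379, path="/version", timeout=3.0):
    """Fetch a JSON document from an etcd member and decode it."""
    url = member_url(host, port, path)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling query_member("10.0.0.5") from a client machine would then be the equivalent of running curl against http://10.0.0.5:2379/version. If the request is refused, check the client listen address in the member's configuration.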
For more information on how to set it up, follow this article
