How does one install etcd in a cluster? - etcd

Newbie w/ etcd/zookeeper type services ...
I'm not quite sure how to handle cluster installation for etcd. Should the service be installed on each client or a group of independent servers? I ask because if I'm on a client, how would I query the cluster? Every tutorial I've read shows a curl command running against localhost.

For an etcd cluster installation, install the service on a group of independent servers and form a cluster. The cluster can then be queried either by logging onto one of the machines and running curl against localhost, or remotely by pointing curl at the IP address of one of the cluster member nodes.
For more information on how to set it up, follow this article
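As an illustration only, here is a minimal sketch of a static three-node cluster and a remote query. The node names and IP addresses (10.0.1.10-12) are placeholders; the flags are standard etcd bootstrap options, and 2379/2380 are the default client/peer ports. Run something like this on each server, adjusting the name and IPs:

etcd --name node1 \
  --listen-peer-urls http://10.0.1.10:2380 \
  --initial-advertise-peer-urls http://10.0.1.10:2380 \
  --listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.10:2379 \
  --initial-cluster node1=http://10.0.1.10:2380,node2=http://10.0.1.11:2380,node3=http://10.0.1.12:2380

Then, from any machine that can reach the cluster, replace localhost in the tutorial commands with a member's client URL (shown here with the older v2 HTTP API that most tutorials use; newer etcd versions use etcdctl and the v3 API):

curl http://10.0.1.10:2379/v2/keys/message -XPUT -d value="hello"
curl http://10.0.1.11:2379/v2/keys/message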

Related

FQDN on Azure Service Fabric on Premise

I don't see a way to configure the cluster FQDN for an on-premises installation.
I created a 6-node cluster (each node running on a physical server) and I'm only able to contact each node on its own IP instead of contacting the cluster on a "general FQDN". With this model, I have to be aware of which node is up and which node is down.
Does somebody know how to achieve it, based on the sample configurations files provided with Service Fabric standalone installation package?
You need to add a network load balancer to your infrastructure for that. This will be used to route traffic to healthy nodes.
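Any TCP load balancer will do; purely as an illustration, here is a rough HAProxy sketch (the node IPs are placeholders, 19000 is Service Fabric's default client connection port, and you would list all six nodes). Point a DNS record for your "general FQDN" at the load balancer:

# /etc/haproxy/haproxy.cfg excerpt -- node IPs are hypothetical
frontend sf_client
    mode tcp
    bind *:19000
    default_backend sf_nodes

backend sf_nodes
    mode tcp
    balance roundrobin
    server node1 10.0.0.1:19000 check
    server node2 10.0.0.2:19000 check
    server node3 10.0.0.3:19000 check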

Mesosphere not allowing External Traffic

I spun up a Mesosphere cluster on DigitalOcean (development) and it's not allowing external (non-VPN) connections to containers or apps. How can this be solved?
To ensure that the outside world doesn't normally have access to your cluster, iptables rules have been installed. By default, these allow full access inside the cluster and nothing externally.
If you're interested in running real applications, I'd recommend the following:
Put HAProxy on a single node.
Set up the haproxy-marathon-bridge script.
On the same box where you installed HAProxy, set up iptables to allow access to the port that HAProxy is listening on.
By doing this, you'll have a single place to refer to when giving access to applications running on your Mesos cluster. No matter where the app or container is scheduled (with Marathon), you'll always be able to reach it via HAProxy.
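A rough sketch of those steps follows; the Marathon endpoint, port range, and package manager are assumptions, and the exact bridge-script invocation may differ between Marathon versions:

sudo apt-get install -y haproxy
# haproxy-marathon-bridge ships with Marathon (bin/haproxy-marathon-bridge);
# point it at your Marathon endpoint so it keeps the HAProxy config in sync:
sudo ./haproxy-marathon-bridge install_haproxy_system marathon-host:8080
# open the service ports HAProxy listens on (10000-10100 is a placeholder range):
sudo iptables -I INPUT -p tcp --dport 10000:10100 -j ACCEPT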

How to restart single node hadoop cluster on ec2

I have installed a single-node Hadoop cluster using Hortonworks/Ambari on an Amazon EC2 host.
Since I don't want this cluster running 24/7, I stop the instance when done. When I reboot the instance later, I get a new IP address and Ambari is no longer able to start the Hadoop-related services.
Is there a way other than completely redeploying to reconfigure the cluster so the services will start?
It looks like the IP address lives in various XML files under /etc, in the Ambari Postgres database, and possibly other places I haven't found yet.
I tried updating the XML files and the Postgres database with the new IP address and the internal and external DNS names wherever I could find them, but to no avail. I have not been able to restart the services.
The reason I am doing this is to possibly save the deployment time and data configuration on hdfs and other project specific setup each time I restart the host.
Any suggestions?
Thanks!
An Elastic IP can be used. Also, since you mentioned it being a single-node cluster, you can use localhost or the private IP.
If you use an Elastic IP, your UIs will always be on the same public IP. However, if you use the private IP or localhost and do not associate your instance with an Elastic IP, you will have to look up the public IP every time you start the instance and then connect to the web UI using that IP.
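If you do go the Elastic IP route, a sketch with the AWS CLI looks like this (the instance ID and allocation ID below are placeholders for your own values):

aws ec2 allocate-address --domain vpc
aws ec2 associate-address --instance-id i-0123456789abcdef0 --allocation-id eipalloc-0abc1234def567890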
Thanks for the help; both Harman and TJ are correct. I haven't used an Elastic IP because I might have more than one of these running at a time, and for now at least, I don't mind looking up the public IP address.
Harman's suggestion of using "localhost" as the FQDN when setting up Ambari in the first place is a really good idea in retrospect. Unless I go through the whole setup again, that's water under the bridge for me, but I recommend it to others who might read this post.
In my case, I figured this out on my own before coming back to the page. The specific step I took was insanely simple after all, thanks to Occam's Razor.
I added the following line in /etc/hosts:
<new internal IP> <old internal dns name>
and then ran
ambari-server restart
from the command line. After that I was able to restart all services after logging into Ambari.
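Put together, the workaround looks roughly like this (the internal IP and DNS name are placeholders for your own values):

echo "172.31.10.25  ip-172-31-5-11.ec2.internal" | sudo tee -a /etc/hosts
sudo ambari-server restart
# then log in to the Ambari web UI and start the Hadoop services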

Use spark-submit to submit a application to EC2 cluster

I am new to Spark and I am trying to run it on EC2. I followed the tutorial on the Spark webpage, using spark-ec2 to launch a Spark cluster. Then I tried to use spark-submit to submit the application to the cluster. The command looks like this:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://ec2-54-88-9-74.compute-1.amazonaws.com:7077 --executor-memory 2G --total-executor-cores 1 ./examples/target/scala-2.10/spark-examples_2.10-1.0.0.jar 100
However, I got the following error:
ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
Please let me know how to fix it. Thanks.
You're seeing this issue because the master node of your Spark standalone cluster can't open a TCP connection back to the driver (on your machine). The default mode of spark-submit is client, which runs the driver on the machine that submitted the job.
A new cluster deploy mode was added to spark-deploy that submits the job to the master, where the driver is then run, removing the need for a direct connection. Unfortunately, this mode is not supported in standalone mode.
You can vote for the JIRA issue here: https://issues.apache.org/jira/browse/SPARK-2260
Tunneling your connection via SSH is possible but latency would be a big issue since the driver would be running locally on your machine.
I'm curious whether you're still having this issue... but in case anyone is asking, here is a brief answer. As clarified by jhappoldt, the master node of your Spark standalone cluster can't open a TCP connection back to the driver (on your local machine). Two workarounds are possible; both were tested and succeeded.
(1) From the EC2 Management Console, create a new security group and add rules to enable TCP back and forth from your PC (public IP). (What I did was add TCP rules inbound and outbound.) Then add this security group to your master instance (right click --> Networking --> Change security groups). Note: add it, and don't remove the already established security groups.
This solution works well, but in your specific scenario, deploying your application from a local machine to an EC2 cluster, you will face further (resource-related) problems, so the next option is the better one:
(2) Copy your .jar file (or .egg) to the master node using scp. You can check this link http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html for information on how to do that, and deploy your application from the master node. Note: Spark is already pre-installed, so you will do nothing but write the exact same command you would write on your local machine, from ~/spark/bin. This should work perfectly.
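A sketch of option (2), reusing the master hostname and example jar from the question; the key file is a placeholder, and the root login is an assumption based on the default user spark-ec2 sets up:

scp -i ~/mykey.pem examples/target/scala-2.10/spark-examples_2.10-1.0.0.jar root@ec2-54-88-9-74.compute-1.amazonaws.com:~/
ssh -i ~/mykey.pem root@ec2-54-88-9-74.compute-1.amazonaws.com
~/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://ec2-54-88-9-74.compute-1.amazonaws.com:7077 --executor-memory 2G --total-executor-cores 1 ~/spark-examples_2.10-1.0.0.jar 100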
Are you executing the command on your local machine, or on the created EC2 node? If you're doing it locally, make sure port 7077 is open in the security settings, as it's closed to the outside by default.
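For example, with the AWS CLI (the security group ID and the source IP are placeholders for your master's group and your own public IP):

aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 7077 --cidr 203.0.113.5/32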

Apache Traffic Server cluster on Amazon Ec2 instances

I am trying to set up an Apache Traffic Server cluster on two Amazon EC2 instances.
I followed the steps from http://docs.trafficserver.apache.org/en/latest/admin/cluster-howto.en.html to set up the cluster. But when I run the following command:
traffic_line -r proxy.process.cluster.nodes
it still gives me the value 1, whereas I am expecting the value 2. I want to know whether it is possible to set up an Apache Traffic Server cluster on EC2 instances. If it is, could anyone let me know what other steps need to be considered apart from those mentioned in the above link?
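For reference, the settings that the linked guide has you apply on each node look roughly like this; the interface name is a placeholder, this is only a sketch of those steps, and the values must match across nodes:

traffic_line -s proxy.local.cluster.type -v 1
traffic_line -s proxy.config.cluster.ethernet_interface -v eth0
traffic_line -x   # re-read the configuration (a full Traffic Server restart may also be needed)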
