How to reliably discover Marathon URL - mesos

Since the webui_url entry in the Mesos state JSON is optional, one can try one's luck with hostname (which is also optional).
However, if both of these entries are missing from the Mesos state, is there any other way to reliably discover where the Marathon API server is listening?
Furthermore, if a Marathon instance is migrated to another location, the Mesos webui_url seems to retain the old, stale value. This looks like a bug; is there any workaround?

As far as I know, you should be able to use the Mesos UI ("Frameworks" -> "Active Frameworks") to find the Marathon framework.
If hostname resolution for the host Marathon is running on works correctly, you should be able to click the link in the "Host" column and be redirected to the Marathon UI.
See https://mesosphere.github.io/marathon/docs/command-line-flags.html:
--hostname (Optional. Default: hostname of machine): The advertised hostname that is used for the communication with the mesos master. The value is also stored in the persistent store so another standby host can redirect to the elected leader. Note: Default is determined by InetAddress.getLocalHost.
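
A minimal sketch of querying the master's state endpoint directly, assuming the Mesos master is reachable at mesos-master:5050, jq is installed, and Marathon is registered under its default framework name "marathon" (all of these are assumptions about your setup):

# Ask the Mesos master for its state and pull out Marathon's advertised
# webui_url and hostname (either field may be missing, as noted in the question).
curl -s http://mesos-master:5050/master/state.json \
  | jq '.frameworks[] | select(.name == "marathon") | {webui_url, hostname}'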

Related

Can not open localhost:8088. Trying to install Hadoop3 on Windows10

localhost:9870 is working fine; the problem is localhost:8088. Did they move it, as they did with 9870?
No. As stated in Apache Hadoop 3.0.0:
Default ports of multiple services have been changed.
Previously, the default ports of multiple Hadoop services were in the Linux ephemeral port range (32768-61000). This meant that at startup, services would sometimes fail to bind to the port due to a conflict with another application.
These conflicting ports have been moved out of the ephemeral range, affecting the NameNode, Secondary NameNode, DataNode, and KMS. Our documentation has been updated appropriately, but see the release notes for HDFS-9427 and HADOOP-12811 for a list of port changes.
Since the YARN ports were never in the ephemeral port range, they didn't need to be changed.
This is confirmed by looking at the yarn-default.xml for Hadoop 3.0.0.
| Property | Default | Description |
| yarn.resourcemanager.webapp.address | ${yarn.resourcemanager.hostname}:8088 | The http address of the RM web application. If only a host is provided as the value, the webapp will be served on a random port. |
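
As a quick sanity check, a sketch assuming a Unix-style shell (e.g. Git Bash or WSL on Windows) and that $HADOOP_HOME points at your Hadoop install (both assumptions):

# Confirm the ResourceManager web UI answers on the default port 8088:
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088/cluster

# Check whether yarn-site.xml overrides the default webapp address:
grep -A 1 "yarn.resourcemanager.webapp.address" "$HADOOP_HOME/etc/hadoop/yarn-site.xml"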

Why can't standalone slaves connect to master on separate Mac OS boxes?

I have two Macs (both OS X El Capitan) at home, both connected to the same wifi. I want to set up a Spark cluster (with two workers) on these two computers.
Mac1 (192.168.1.2) is my master, with Spark 1.5.2; it is up and working well, and I can see the Spark UI at http://localhost:8080/ (it also shows spark://Mac1:7077).
I have also run one slave on this machine (Mac1), and I can see it under Workers in the Spark UI.
Then I copied Spark to the second machine (Mac2), and I am trying to run another slave on Mac2 (192.168.2.9) with this command:
./sbin/start-slave.sh spark://Mac1:7077
But it does not work. The log shows:
Failed to connect to master Mac1:7077
Actor not found for: ActorSelection[Anchor(akka.tcp://sparkMaster@Mac1:7077/),Path(/User/Master)]
Networking-wise, from Mac1 I can SSH to Mac2 and vice versa, but I cannot telnet to Mac1:7077.
I would appreciate any help solving this problem.
tl;dr Use the -h option with ./sbin/start-master.sh, i.e. ./sbin/start-master.sh -h Mac1
Alternatively, you could run ./sbin/start-slave.sh spark://192.168.1.2:7077 instead.
The reason is that binding to ports in Spark is very sensitive to which names and IPs are used. In your case, 192.168.1.2 != Mac1: they are different "names" as far as Spark is concerned. That is why ssh works (it uses the OS name resolver), while the connection fails at the Spark level, where the two names are simply compared and do not match.
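Putting that together, a minimal sketch (assuming Mac1 resolves to 192.168.1.2 on both machines):

# On Mac1: bind the master explicitly to the name the workers will use.
./sbin/start-master.sh -h Mac1

# Quick connectivity check from Mac2 before starting the worker:
nc -vz Mac1 7077

# On Mac2: point the worker at exactly the same name (or use the IP form).
./sbin/start-slave.sh spark://Mac1:7077
# or: ./sbin/start-slave.sh spark://192.168.1.2:7077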
Likely a networking/firewall issue on the Mac.
Also, the error message you copy/pasted references port 7070; is this the issue?
Using IP addresses in conf/slaves works somehow, but then I have to use IPs everywhere to address the cluster instead of hostnames.

Marathon loses control over Mesos when Marathon and Mesos leaders mismatch

When the mesos or marathon service restarts for some reason and the Mesos and Marathon leaders are not on the same machine, deployments get stuck in Marathon and nothing happens in Mesos. This leads to terrible results: Marathon cannot restart failed services and does nothing with deployments until the leaders match again.
Our cluster has 3 masters (installed through the Mesosphere website) and this situation happens quite often. Is there any way to fix that?
Marathon v0.9.0
Mesos v0.22.1
It sounds like either Mesos or Marathon is using a private IP (localhost/127.0.0.1), so they aren't able to talk to each other.
You should be able to solve the issue by setting a publicly reachable IP via the respective --ip command-line flag or the LIBPROCESS_IP environment variable.
One particularly useful setting is LIBPROCESS_IP, which tells the master and slave binaries which IP address to bind to; in some installations, the default interface that the hostname resolves to is not the machine’s external IP address, so you can set the right IP through this variable.
Source: http://mesos.apache.org/documentation/latest/deploy-scripts/
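
A minimal sketch of what that could look like on each master, assuming the Mesosphere packages (which read Mesos flags from files under /etc/mesos-master) and using 192.168.1.10 as a placeholder for the machine's externally reachable IP:

# Mesos master: bind to the machine's reachable IP instead of 127.0.0.1.
echo "192.168.1.10" | sudo tee /etc/mesos-master/ip

# Marathon and other libprocess-based services: same idea via the environment
# (where Marathon's environment is set depends on your packaging).
export LIBPROCESS_IP=192.168.1.10

# Then restart mesos-master and marathon with whatever init system you use.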

Mesosphere not allowing External Traffic

I spun up a Mesosphere cluster on DigitalOcean (development), and it is not allowing external (non-VPN) connections to containers or apps. How can this be solved?
To ensure that the outside world doesn't have access to your cluster, iptables rules have been installed. By default, these allow full access inside the cluster and nothing externally.
If you're interested in running real applications, I'd recommend the following:
1. Put HAProxy on a single node.
2. Set up the haproxy-marathon-bridge script.
3. On the same box where you installed HAProxy, set up iptables to allow access to the port HAProxy is listening on (see the sketch below).
By doing this, you'll have a single place to refer to when giving access to applications running on your Mesos cluster. No matter where the app or container is scheduled (with Marathon), you'll always be able to reach it via HAProxy.
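
A sketch of the iptables step, assuming HAProxy listens on port 80 and the external interface is eth0 (both are assumptions; adjust to your setup):

# Allow external traffic to the HAProxy port on this node only.
sudo iptables -I INPUT -i eth0 -p tcp --dport 80 -j ACCEPT
# Remember to persist the rule with your distribution's mechanism
# (e.g. iptables-save) so it survives a reboot.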

How to restart single node hadoop cluster on ec2

I have installed a single-node Hadoop cluster using Hortonworks/Ambari on an Amazon EC2 host.
Since I don't want this cluster running 24/7, I stop the instance when I'm done. When I restart the instance later, it gets a new IP address, and Ambari is no longer able to start the Hadoop-related services.
Is there a way, other than completely redeploying, to reconfigure the cluster so the services will start?
It looks like the IP address lives in various XML files under /etc, in the Ambari postgres database, and possibly other places I haven't found yet.
I tried updating the XML files and the postgres database with the new IP address and the internal and external DNS names where I could find them, but to no avail; I have not been able to restart the services.
The reason I am doing this is to save the deployment time, the data and configuration on HDFS, and other project-specific setup each time I restart the host.
Any suggestions?
Thanks!
An Elastic IP can be used. Also, since you mentioned it is a single-node cluster, you can use localhost or the private IP.
If you use an Elastic IP, your UIs will always be on the same public IP. However, if you use the private IP or localhost and do not associate your instance with an Elastic IP, you will have to look up the public IP every time you start the instance and then connect to the web UI using that IP.
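
If you do want the address to stay stable, a sketch of associating an Elastic IP with the AWS CLI (the instance ID and allocation ID below are placeholders):

# Allocate an Elastic IP in the VPC and note the AllocationId it returns.
aws ec2 allocate-address --domain vpc

# Associate it with your instance; both IDs are placeholders.
aws ec2 associate-address --instance-id i-0123456789abcdef0 --allocation-id eipalloc-0123456789abcdef0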
Thanks for the help; both Harman and TJ are correct. I haven't used an Elastic IP because I might have more than one of these running at a time, and for now at least I don't mind looking up the public IP address.
Harman's suggestion of using "localhost" as the FQDN when setting up Ambari in the first place is a really good idea in retrospect. Unless I go through the whole setup again, that's water under the bridge for me, but I recommend it to others who might read this post.
In my case, I figured this out on my own before coming back to the page. The specific step I took was insanely simple after all, thanks to Occam's Razor.
I added the following line in /etc/hosts:
<new internal IP> <old internal dns name>
and then ran
ambari-server restart
from the command line. After that I was able to restart all services after logging into Ambari.
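
For completeness, a sketch of the whole fix; the IP and hostname below are placeholders for your instance's current internal IP and the DNS name Ambari was originally configured with:

# Map the old internal DNS name to the instance's new internal IP
# (both values are placeholders; use your own).
echo "172.31.5.10  ip-172-31-1-1.ec2.internal" | sudo tee -a /etc/hosts

# Restart the Ambari server so it picks up the mapping, then start the
# services from the Ambari web UI.
sudo ambari-server restart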
