Unable to get Mesos to run from tutorial: Setting up a Single Node Mesosphere Cluster - amazon-ec2

I have been following this tutorial to try and setup a single node mesosphere cluster from their
official tutorial:
http://mesosphere.com/docs/getting-started/developer/single-node-install/
I followed all the commands without any issues, and I also added the ports 5050 and 8080 to my security group. When I try to access the console for mesos/marathon, I get a "Internet Explorer cannot display the webpage" message.
They also recommend checking it the following way:
MASTER=$(mesos-resolve `cat /etc/mesos/zk`)
mesos-execute --master=$MASTER --name="cluster-test" --command="sleep 5"
But that comes up with an error:
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0106 17:03:08.126703 20993 process.cpp:1561] Failed to initialize, gethostbyname2: Unknown host
*** Check failure stack trace: ***
I am not really sure how to troubleshoot this either, and there are not many tutorials I could find on how to install mesos on ubuntu.
I checked the contents of the zk file, seems to be the default value.
$ cat /etc/mesos/zk
zk://localhost:2181/mesos
I would really appreciate any clues on how to go about this one.
Edit: The process is definitely running too - just an fyi:
root 31545 8.5 5.9 187464 35604 ? Ssl 17:28 0:00 /usr/local/sbin/mesos-slave --master=zk://localhost:2181/mesos --log_dir=/var/log/mesos
root 31563 28.5 2.1 116304 12856 ? Rs 17:28 0:00 /usr/local/sbin/mesos-master --zk=zk://localhost:2181/mesos --port=5050 --log_dir=/var/log/mesos --quorum=1 --wo

Mesos uses gethostbyname2 to resolve hostnames to IPs. The first thing I would recommend, is to try "ping localhost" and "ping hostname", and verify that there are no strange settings in /etc/hosts. If you're doing a multi-node cluster, I'd recommend that hostname map to the public IP address (not 127.0.x.1).
If that doesn't help, you can try setting the --ip and --hostname flags when starting mesos-master and mesos-slave, to bypass the gethostbyname2 resolution. These can also be set by writing to the file-based parameters, e.g. /etc/mesos/mesos-master/ip
For additional troubleshooting, try running wget http://localhost:5050 (or curl -L) from the mesos master, to verify that it is locally visible. Also try wget http://<public_ip>:5050 to verify that the web server is up and serving to the public IP. Depending on how your (EC2?) node is setup, you may need to expose/forward the port, or connect to a VPN.

Thanks Adam. I ran the wget and curl commands, and nothing was actually listening on port 8080 or 5050. I did open those ports in the ec2. A simple reboot did the trick however, once I ssh'ed into the ec2 instance after the reboot, both mesos and marathon were running and both ports are now showing after I ran
netstat -ntln.

Related

Cannot access Flink dashboard localhost:8081 on windows

I follow the first steps to install Flink.
I can start the cluster without any problem
$ start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host DESKTOP-....
Starting taskexecutor daemon on host DESKTOP-....
But I don't get any status from
$ ps aux | grep flink
I can also not access the dashboard via localhost:8081.
There is an older post having these issues, but the solution didn't work for me, since the described conf files do no longer exist, apparently.
My JAVA_HOME is set as C:\Progra~1\Java\jdk1.8.0_311 to avoid issues with the space in Program Files.
Can you check the logs in the /logs folder? I'm suspecting that C:\Program Files\ could still cause issues because of the space there.
go to download Flink folder and try bash command
$./bin/start-cluster.sh --daemon bootstrap-server localhost:8081
and run code one more
$ ./bin/flink run examples/streaming/WordCount.jar
if you finished run above code which not issue, go to localhost:8081
This still seems to be problematic. I tried to run from Windows Subsystem for Linux (WSL).
I have the following versions: java 11.0.16 and flink 1.15.2.
sudo apt-get update
sudo apt install openjdk-11-jre-headless
export FLINK_HOME=/mnt/c/Projects/Apache/flink-1.15.2
I set the following in flink-conf.yaml
rest.port: 8081
rest.address: localhost
rest.bind-adress: 0.0.0.0
Whereby I changed the bind address for localhost to 0.0.0.0 this seems to have fixed the problem.
$FLINK_HOME/bin/start-cluster.sh
Now I can access the Flink Web Dashboard.

Installing Kubernetes on mac with vagrant and virtualbox

This is my first attempt to install and use Kubernetes. I am trying to install an environment on Mac for developing my own apps and deploying them for test locally with Kubernetes. I am familiar with using Vagrant, VirtualBox and Docker for the same purpose. When I saw this page https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/getting-started-guides/vagrant.md I assumed it would be trivial. I executed these lines:
export KUBERNETES_PROVIDER=vagrant
curl -sS https://get.k8s.io | bash
This created a master VM and a Minion, but Kubernetes seems to have failed to start on the master. On the master /var/log/salt/master is full of python Traceback errors, like this:
2015-07-17 22:14:42,629 [cherrypy.error ][INFO ][3252] [17/Jul/2015:22:14:42] ENGINE Started monitor thread '_TimeoutMonitor'.
2015-07-17 22:14:42,736 [cherrypy.error ][ERROR ][3252] [17/Jul/2015:22:14:42] ENGINE Error in HTTP server: shutting down
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/cherrypy/process/servers.py", line 187, in _start_http_thread
self.httpserver.start()
File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/wsgiserver2.py", line 1824, in start
raise socket.error(msg)
error: No socket could be created
Vagrant is version 1.7.3. VirtualBox is version 4.3.30
Have I made an obvious stupid mistake?
I don't yet know the fix but I know what is going wrong since it happens to me as well:
OS X 10.10.3
Vagrant 1.7.4
VirtualBox 4.3.30
Kubernetes 1.0.1
When I run the default configuration of this (which creates one "master" and one "minion" VM) I see that the static IP address is not being assigned to the "eth1" interface, and I also see that the Salt API server is sitting in what appears to be an infinite retry loop because it is trying to listen on that IP address.
Also, the following message happened during boot:
[vagrant#kubernetes-master ~]$ dmesg | grep eth1
[ 9.321496] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
So basically, the static IP address didn't get assigned because eth1 wasn't ready when the system first booted, and Salt is waiting for it to get assigned.
I could fix this after boot by sshing to the box using "vagrant ssh" and running the command:
sudo /etc/init.d/network restart
on each host.
This "fixes" eth1 by assigning the static IP address, and after that Salt begins to do its thing, installs Docker, boots various containers, and so on.
What I don't know is how to make this work every time without manual intervention. It appears to be some sort of a race condition between Vagrant and VirtualBox.
If you just want to kick the tires with Kubernetes, I'd recommend installing boot2docker and then following the Running kubernetes locally via Docker getting started guide. Once you are comfortable interacting with the Kubernetes API and want a more complex local setup, you can then work on installing Vagrant.
If the Vagrant instructions aren't working, you should also feel free to file a bug in the github repository.
The tutorial pointed by Robert is realy easy to run. Just change the version to 0.21.2 (maybe 0.21.3 works too).
Else, if you prefer a vagrant solution, try with pires cluster on vagrant. It runs with almost nothing to change.
Running Kubernetes inside VirtualBox requires 4 networks and some adjustments to the configuration:
The VirtualBox HOST ONLY network will be the network used to access the Kubernetes master and nodes from the Mac or PC.
The NAT Network to download packages from the Internet.
The internal connections between Kubernetes PODs uses a tunnel network TUN
The Kubernetes Cluster IP Network is a private IP range used inside the cluster to give each Kubernetes service a dedicated IP
Vagrantfile needs to pass the node public IPs to the Ansible roles that configure Kubernetes to set KUBELET_EXTRA_ARGS environment variable with the public IP of each node (required for reading logs using kubectl).
NodePort needs to be used to publish applications running inside the Kubernetes cluster as Load Balancers are not available in VirtualBox.
See the full example and download the code at Building a Kubernetes Cluster with Vagrant and Ansible (without Minikube), it has been tested in Ubuntu but should work on a MAC as well.

CDH 5.1 host IP address change

I have a CDH 5.1 cluster with 3 nodes. We installed it using cloudera manager automated installation.
It was running perfect until we moved the box to a different network and IP addresses changed. I tried following steps
1. Stopped service, cloudera-scm-server.
2. Stopped service, cloudera-scm-agent
3. Edit the /etc/cloudera-scm-agent/config.ini
4. change the server host to the new ip.
5. restart service, cloudera-scm-agent, cloudera-scm-server.
not working .
Then i followed
http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v4-latest/Cloudera-Manager-Administration-Guide/cmag_change_hostnames.html
Not helped even after changing the ips in the PostgreSQL directly.
I found following blog :
http://www.geovanie.me/changing-ip-of-node-in-cdh-cluster/
Getting following error in the scm-agent log file
ProtocolError: <ProtocolError for 127.0.0.1/RPC2: 401 Unauthorized>
Not working ....
Can anyone please help how to change all IP addresses in a cdh 5.1 cluster safely .....
Thanks,
Amit
This is causing because of precious cloudera-scm-agent service wasn't stopped correctly, please try,
$> ps -ef | grep supervisord
$> kill -9 <processID>
then restart the agent again.
$>service cloudera-scm-agent start

Error in Cloudera Cluster installation process?

I have installed Cloudera manager successfully. It shows Currently managed hosts as 127.0.0.1 and it is active.
When I search and install cluster using the cloudera manager after the loads it shows the following error.
Installation failed. Failed to receive heartbeat from agent.
Ensure that the host's hostname is configured properly.
Ensure that port 7182 is accessible on the Cloudera Manager server (check firewall rules).
Ensure that ports 9000 and 9001 are free on the host being added.
Check agent logs in /var/log/cloudera-scm-agent/ on the host being added (some of the logs can be found in the installation details).
The following image clearly shows the problem while installing my cluster on cloudera manager.
I had a similar problem and it turned out the issue was conveniently skipping (unfortunately) the ...password-less SSH key ... step
After several hours breaking my head over it, I realised this.
At the terminal do,
ls -al ~/.ssh
You must see files like,
abc
abc.pub
These are you public/private key pairs. [Not necessarily the same names as mine above].The file name you used in Setting up SSH public/private keys step for your machine.
You need to copy the data in abc.pub to a file authorized_keys in this same folder. If its not there, create authorized_keys.
Incase you don't have you public/private key pair see here
For ubuntu, the problem is usually because of the association of "ubuntu 127.0.1.1." in your /etc/hosts file. For me, after changing it to "ubuntu 127.0.0.1", which is the standard local loopback, I can add the cluster successfully. Hope this helps!
I was struggling with this problem for two days. Fixing /etc/hosts as suggested by "khoadoan" worked for me.
/etc/hosts was looking like this when I had the problem
127.0.0.1 localhost
127.0.1.1 ubuntu
I changed it like this:
127.0.0.1 localhost
127.0.0.1 ubuntu
Restarted the machine.
sudo init 6
Launched the Cloudera Manager Admin page. This time the host status was already showing up "Managed = Yes". And I got an additional tab "Currently Managed Hosts(1)", where the local host was listed.

Installing Membase from source

I am trying to build and install membase from source tarball. The steps I followed are:
Un-archive the tar membase-server_src-1.7.1.1.tar.gz
Issue make (from within the untarred folder)
Once done, I enter into directory install/bin and invoke the script membase-server.
This starts up the server with a message:
The maximum number of open files for the membase user is set too low.
It must be at least 10240. Normally this can be increased by adding
the following lines to /etc/security/limits.conf:
Tried updating limits.conf as suggested, but no luck it continues to pop up the same message and continues booting
Given that the server is started I tried accessing memcached over port 11211, but I get a connection refused message. Then figured out (netstat) that memcached is listening to 11210 and tried telneting to port 11210, unfortunately the connection is closed as soon as I issue the following commands
stats
set myvar 0 0 5
Note: I am not getting any output from the commands above {Yes: stats did not show anything but still I issued set.}
Could somebody help me build and install membase from source? Also why is memcached listening to 11210 instead of 11211?
It would be great if somebody could also give me a step-by-step guide which I can follow to build from source from Git repository (I have not used autoconf earlier).
P.S: I have tried installing from binaries (debian package) on the same machines and I am able to successfully install and telnet. Hence not sure why is build from source not working.
You can increase the number of file descriptors on your machine by using the ulimit command. Try doing (you might need to use sudo as well):
ulimit -n 10240
I personally have this set in my .bash_rc so that whenever I start my terminal it is always set for me.
Also, memcached listens on port 11210 by default for Membase. This is done because Moxi, the memcached proxy server, listens on port 11211. I'm also pretty sure that the memcached version used for Membase only listens for the binary protocol so you won't be able to successfully telnet to 11210 and have commands work correctly. Telneting to 11211 (moxi) should work though.

Resources