I recently added a new Mesos agent to my cluster. It appears in the web UI, but on the Agents tab the Agent attribute shows a master machine, the Master attribute shows nothing, and the Resources section is empty.
Any help would be appreciated, thanks.
OK, I think this is not an error. When no framework is running on your agent, this display is normal.
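To double-check, you can query the agent's HTTP API directly; a sketch (the hostname is a placeholder, 5051 is the default agent port):

```shell
# Query the agent's /state endpoint. An agent with no running frameworks
# reports an empty "frameworks" list but still advertises its total
# resources under "resources".
curl -s http://agent-host:5051/state | python3 -m json.tool | head -n 40
```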
Thanks anyway for your interest.
I followed TeamCity's instructions for running a TeamCity build server on AWS with a CloudFormation template. I launched it, but it gets stuck at AgentService (Resource creation initiated). I waited for half an hour with no progress.
Resources tab shows the following:
What am I doing wrong here?
(For me) this typically happens when the service cannot be started for some reason, for instance when the cluster does not have enough suitable instances to place your service.
To diagnose it, check your service in the ECS cluster: look at its events, and among the service's tasks, check the stopped tasks (and the reasons they were stopped).
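With the AWS CLI, those checks look roughly like this (the cluster and service names are placeholders for yours):

```shell
# List recently stopped tasks in the cluster (names are placeholders).
aws ecs list-tasks --cluster my-teamcity-cluster --desired-status STOPPED

# Inspect a stopped task to see why it stopped.
aws ecs describe-tasks --cluster my-teamcity-cluster \
    --tasks <task-arn-from-previous-command> \
    --query 'tasks[].stoppedReason'

# Service events often explain placement failures.
aws ecs describe-services --cluster my-teamcity-cluster \
    --services my-teamcity-service --query 'services[].events[:5]'
```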
I got a tip from a colleague that a CloudFormation-template-based service may take up to 3(!) hours to come up. I tried again today, and after 3 hours it was up and running.
The reason is the ECS setup, which involves DNS setup for an internet-facing service.
I am working on Spark processes using Python (PySpark). I created an Amazon EMR cluster to run my Spark scripts, but right after the cluster is created, a lot of processes are launched by themselves (?!), as I can see in the cluster UI:
So when I try to launch my own scripts, they enter an endless queue: sometimes ACCEPTED, but never reaching the RUNNING state.
I couldn't find any info about this issue, even in the Amazon forums, so I'd be glad of any advice.
Thanks in advance.
You need to check the security group of the master node: look at the inbound traffic rules. You may have a rule open to Anywhere (0.0.0.0/0). Remove it and check whether things work; such a rule is a vulnerability, and it would also explain the unknown applications filling your queue.
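Assuming the open rule is on the master's security group, you can swap it for one restricted to your own address with the AWS CLI; a sketch (the group ID, port, and CIDR are placeholders):

```shell
# Remove the world-open inbound rule (group ID and port are placeholders):
aws ec2 revoke-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 8088 --cidr 0.0.0.0/0

# Re-add it, restricted to your own IP only:
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 8088 --cidr 203.0.113.7/32
```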
I set up a new Hadoop cluster with Hortonworks Data Platform 2.5. On the "old" cluster (HDP 2.4 installed), I was able to see information about running Spark jobs in the History Server UI by clicking the link Show incomplete applications:
In the new installation this link opens the page, but it always says No incomplete applications found! (even when an application is still running).
I just noticed that the YARN ResourceManager UI shows two different kinds of links in the "Tracking UI" column, depending on the status of the Spark application:
application running: Application Master
this link opens http://master_url:8088/proxy/application_1480327991583_0010/
application finished: History
this link opens http://master_url:18080/history/application_1480327991583_0009/jobs/
Via the YARN RM link I can see the running Spark app's info, but why can't I access it via the Spark History Server UI? Did something change from HDP 2.4 to 2.5?
I solved it; it was a network problem: some of the cluster hosts (Spark slaves) couldn't reach each other due to an incorrect switch configuration. I found this out by trying to ping each host from every other host.
Now that all hosts can ping each other, the problem is gone and I can see active and finished jobs in my Spark History Server UI again!
I hadn't noticed the problem earlier because the Ambari agents worked on every host, and the Ambari server was also reachable from every cluster host. However, once ALL hosts could reach each other, the problem was solved!
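A quick way to run such a check is a ping sweep from every host to every other host; a sketch assuming passwordless SSH and placeholder hostnames:

```shell
# Hostnames are placeholders for your cluster nodes.
HOSTS="master slave1 slave2 slave3"
for src in $HOSTS; do
  for dst in $HOSTS; do
    # From $src, send a single ping to $dst with a 2-second timeout.
    ssh "$src" "ping -c 1 -W 2 $dst > /dev/null" \
      && echo "OK   $src -> $dst" \
      || echo "FAIL $src -> $dst"
  done
done
```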
I have a 3-master Mesos setup for a cluster, with 3 different IPs. A silly question here: how can an application resolve which URI to access? Say I am just browsing the admin console: do I have to try all 3 ip:5050 addresses to get a hit?
Since there is only ever one active (leading) Mesos master in an HA setup, you only need to access that one IP. There seems to be a second question intermingled with the master-related one, which I think revolves around the general case of mapping Mesos tasks to IP:PORT combinations: for this, Mesos-DNS is a useful solution.
You should be able to try just one master endpoint: if you hit a non-leading master, it will automatically redirect you to the "real" (leading) master after 5 seconds.
If you use Mesos-DNS and add it to your local system's DNS preferences, you can, for example, simply enter http://leader.mesos:5050/ in your browser to reach the currently leading master's web UI.
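You can also see this from the command line: any master answers /master/redirect with the leader's address. A sketch (the IP is a placeholder for one of your masters):

```shell
# Ask any master who the leader is; the 307 redirect's Location header
# points at the currently leading master.
curl -sI http://10.0.0.1:5050/master/redirect | grep -i '^Location'
```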
I have a cluster of 3 Mesos slaves running two applications: “redis” and “memcached”, where redis depends on memcached. The requirement is that both applications/services start on the same node rather than on different slave nodes.
So I created an application group and added the dependency properly in the JSON file. After launching the JSON file via the “v2/groups” REST API, I observe that sometimes both applications start on the same node, but sometimes they start on different slaves, which breaks our requirement.
So the intent/requirement is: if either application fails to start on a slave, both applications should fail over to another slave node together. Also, can I configure the JSON file to tell Marathon to start the application group on slave-1 (a specific slave) first if it is available, and otherwise on another slave in the cluster? And if the group ends up on another slave for some reason, can Marathon relaunch it on slave-1 once it is available to serve requests?
Thanks in advance for any help.
Edit/Update (2):
Mesos, Marathon, and DC/OS support for PODs is available now:
DC/OS: https://dcos.io/docs/1.9/usage/pods/using-pods/
Mesos: https://github.com/apache/mesos/blob/master/docs/nested-container-and-task-group.md
Marathon: https://github.com/mesosphere/marathon/blob/master/docs/docs/pods.md
I assume you are talking about Marathon apps.
Marathon application groups don't have any semantics concerning co-location on the same node, and the same is the case for dependencies.
You seem to be looking for a Kubernetes-like Pod abstraction in Marathon, which was on the roadmap but not yet available at the time of writing (see the update above :-)).
Hope this helps!
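For reference, a minimal Marathon pod definition (Marathon 1.4+, posted to /v2/pods) looks roughly like this; the pod id, image names, and resource values are assumptions for illustration:

```json
{
  "id": "/cache-pod",
  "containers": [
    {
      "name": "memcached",
      "resources": { "cpus": 0.5, "mem": 128 },
      "image": { "kind": "DOCKER", "id": "memcached:alpine" }
    },
    {
      "name": "redis",
      "resources": { "cpus": 0.5, "mem": 128 },
      "image": { "kind": "DOCKER", "id": "redis:alpine" }
    }
  ],
  "networks": [ { "mode": "host" } ]
}
```

All containers of a pod are launched together on the same agent, which is exactly the co-location guarantee application groups lack.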
I think this should be possible (as a workaround) if you specify the right app constraints within the group's JSON.
Have a look at the example request at
https://mesosphere.github.io/marathon/docs/generated/api.html#v2_groups_post
and the constraints syntax at
https://mesosphere.github.io/marathon/docs/constraints.html
e.g.
"constraints": [["hostname", "CLUSTER", "slave-1"]]
should do it. The downside is that there will be no automatic failover to another slave that way. Still, I'd be curious why both apps specifically need to run on the same slave node...
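Putting that together, a sketch of a group definition pinning both apps to slave-1 (the app ids, commands, and resource values are assumptions; adjust them to your setup):

```shell
# Write a group definition with a CLUSTER constraint on both apps.
cat > /tmp/redis-memcached-group.json <<'EOF'
{
  "id": "/cache",
  "apps": [
    {
      "id": "memcached",
      "cmd": "memcached -u nobody",
      "cpus": 0.5, "mem": 128, "instances": 1,
      "constraints": [["hostname", "CLUSTER", "slave-1"]]
    },
    {
      "id": "redis",
      "cmd": "redis-server",
      "cpus": 0.5, "mem": 128, "instances": 1,
      "dependencies": ["/cache/memcached"],
      "constraints": [["hostname", "CLUSTER", "slave-1"]]
    }
  ]
}
EOF

# Submit it (replace marathon-host with your Marathon endpoint):
# curl -X POST -H "Content-Type: application/json" \
#      http://marathon-host:8080/v2/groups -d @/tmp/redis-memcached-group.json
```

The dependency makes Marathon start memcached before redis, and the identical constraint on both apps forces them onto the same host.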