Could not delete DC/OS service that failed to deploy - mesos

I deployed a service in DC/OS (the service is cassandra). The deployment failed and kept retrying. Under DC/OS > Services > Tasks I could see that a new task was created every few minutes, but they all had the status "Failed". Under the Debug tab I could see the TASK_FAILED state with an error message showing that I had misconfigured the service (I picked a user that does not exist).
So I wanted to destroy the service and start over again.
Under Services, I opened the menu on the service and selected "Delete". The command was accepted and the Status changed to "Deleting", but then it stayed there forever.
When I checked the Tasks tab, I could see that DC/OS was still attempting to start the server every few minutes.
Now how do I delete the service? Thanks!

As per the latest DC/OS Cassandra service docs, you should uninstall it using the dcos CLI:
dcos package uninstall --app-id=<service-name> cassandra
If you are using DC/OS 1.9 or older, follow the steps below to uninstall the service:
$ MY_SERVICE_NAME=<service-name>
$ dcos package uninstall --app-id=$MY_SERVICE_NAME cassandra
$ dcos node ssh --master-proxy --leader "docker run mesosphere/janitor / \
-r $MY_SERVICE_NAME-role \
-p $MY_SERVICE_NAME-principal \
-z dcos-service-$MY_SERVICE_NAME"
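In the steps for DC/OS 1.9 and older, the role, principal and ZooKeeper node passed to the janitor are all derived from the service name. A minimal sketch of that naming follows; the function name janitor_args is hypothetical, and it only prints the arguments so they can be checked before running the real command.

```shell
# Print the janitor arguments derived from a service name.
# janitor_args is a hypothetical helper; it runs nothing on the cluster.
janitor_args() {
  name=$1
  printf '%s\n' "-r ${name}-role -p ${name}-principal -z dcos-service-${name}"
}

# Example: arguments for a service named "cassandra"
janitor_args cassandra
```

Checking the printed names against what the failed service actually registered is a cheap way to avoid cleaning up the wrong role or ZooKeeper node.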


helm chart installation of cert-manager timed out for AKS

I'm trying to create two Kubernetes (AKS) clusters for a test environment using Azure DevOps. These clusters use Let's Encrypt certificates for their endpoints, so I'm automating the creation of these certificates using Helm charts.
For some reason, the cert-manager installation helm task times out if I create two clusters around the same time.
I have tested the same release process with a single cluster, and there is no problem when I run my deployment.
The helm cert-manager installation command that runs is:
c:\agent\_work\_tool\helm\2.11.0\x64\windows-amd64\helm.exe install --set ingressShim.defaultIssuerName=letsencrypt-prod,ingressShim.defaultIssuerKind=ClusterIssuer,rbac.create=false,serviceAccount.create=false --name appl-cert-manager --wait stable/cert-manager
As I said, this command succeeds for the first cluster. I receive the message:
16:20:26.4583241Z cert-manager has been deployed successfully!
However, the second command takes about 5 minutes. Then I receive this message:
2018-11-08T16:28:14.4988796Z ##[error]Error: release appl-cert-manager failed: timed out waiting for the condition
Is this happening because the name has to be globally unique?
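In case someone has the same problem, there is a simple solution that works consistently for me.
Add a timeout argument to helm, for example:
--timeout 600
Helm 2 reads this value in seconds, so this sets a 10-minute timeout. The default is 300 seconds, which matches the roughly 5 minutes the failing install ran before giving up.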
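As a sketch of where the flag goes, the helper below prints the install command from the question with --timeout added next to --wait. The function name helm_cert_manager_cmd is hypothetical, and the command is only printed, not executed, so it can be pasted into the pipeline task after review.

```shell
# Timeout in seconds (Helm 2 interprets --timeout as seconds).
TIMEOUT_SECONDS=600

# Hypothetical helper: prints the Helm 2 install command from the question
# with an explicit timeout. It does not call helm.
helm_cert_manager_cmd() {
  printf '%s\n' "helm install --set ingressShim.defaultIssuerName=letsencrypt-prod,ingressShim.defaultIssuerKind=ClusterIssuer,rbac.create=false,serviceAccount.create=false --name appl-cert-manager --wait --timeout $1 stable/cert-manager"
}

helm_cert_manager_cmd "$TIMEOUT_SECONDS"
```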

ICP fails to start after machine reboot

I have ICP V2.1 installed on a RHEL VMware image. After rebooting the image, ICP fails to start in what appears to be the first known issue in the documentation (Kubernetes controller manager fails to start after a master or cluster restart). However, the prescribed resolution does not get my system going.
Here is the running pod list:
calico-node-amd64-dtl47 2/2 Running 14 20h
filebeat-ds-amd64-mvcsj 1/1 Running 8 20h
k8s-etcd- 1/1 Running 7 20h
k8s-mariadb- 1/1 Running 7 20h
k8s-master- 2/3 CrashLoopBackOff 15 17m
k8s-proxy- 1/1 Running 7 20h
metering-reader-amd64-gkwt4 1/1 Running 7 20h
monitoring-prometheus-nodeexporter-amd64-sghrv 1/1 Running 7 20h
Removing the k8s-master- pod and allowing it to restart only puts it back into the CrashLoopBackOff state. Here is the last line of the controller manager log:
F1029 23:55:07.345341 1 controllermanager.go:176] error building controller context: failed to get supported resources from server: unable to retrieve the complete list of server APIs: an error on the server ("Error: 'dial tcp getsockopt: connection refused'\nTrying to reach: ''") has prevented the request from succeeding
Removing the pod or removing the failed controller master docker container directly has no effect. It seems like another service hasn't started yet, or failed to start. I've waited several hours to see if the issue resolves itself, but to no avail.
Before the fix, the Kubernetes controller manager failed to start if a registered extension-apiserver was not ready. In ICP, the service catalog is implemented as an extension-apiserver.
Usually, after the ICP master is restarted, kubelet starts the k8s management services first as static pods. After that, it gets the pod, node and service information from the Kubernetes API server and then starts all the pods, including the catalog API service. In that case, the whole cluster recovers.
In your case, however, there is a race condition: when kubelet gets the pod information from the Kubernetes API server and starts all the pods, it has not yet received the node information. As a result, kubelet fails to start the catalog API service because its nodeSelector is not met, and the whole cluster fails to recover.
In the next release of ICP, Kubernetes will be upgraded to 1.8.2, which includes the fix. The issue will then be resolved completely.
Until then, you could try the following workaround.
Use the -s flag form of the kubectl command if your token has expired after restart and you no longer have access to the GUI to re-establish it.
Delete apiservices of
kubectl delete apiservices
kubectl -s delete apiservices
Delete the dead controller manager
docker rm <k8s controller manager>
Wait until the service catalog has started
Recover the service catalog apiservices by re-registering the apiservice of
kubectl apply -f cluster/cfc-components/service-catalog/apiregistration.yaml
kubectl -s apply -f cluster/cfc-components/service-catalog/apiregistration.yaml
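The "wait until the service catalog has started" step in the workaround above can be scripted as a simple polling loop. This is a minimal sketch, assuming you supply your own readiness check; the helper name wait_until and the example check in the comment are hypothetical, not part of ICP.

```shell
# wait_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_TRIES attempts have been made.
# Returns 0 on success, 1 if every attempt failed.
wait_until() {
  max_tries=$1
  delay=$2
  shift 2
  try=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$try" -ge "$max_tries" ]; then
      return 1
    fi
    try=$((try + 1))
    sleep "$delay"
  done
}

# Hypothetical usage (the grep pattern is an assumption about the pod name):
#   wait_until 60 10 sh -c 'kubectl get pods --all-namespaces | grep catalog | grep -q Running'
# Harmless demonstration with a command that always succeeds:
wait_until 3 0 true && echo "ready"
```

Only re-register the apiservice once the check passes; applying it earlier would put the controller manager back into the same failure.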

DCOS Slaves : add placement constraints

I have installed a DCOS cluster following the guide at the link below.
The DCOS cluster is now up and running. I want to add "Placement Constraints" for the applications hosted on top of the DCOS cluster.
I added the parameter (MESOS_ATTRIBUTES=SPACE:RACK1) to the
/opt/mesosphere/etc/mesos-slave-common file. After I added it, I could not bring the dcos-mesos-slave service up again.
Could you please advise me how to approach this with the above DCOS installation method?
etc # cat mesos-slave-common
Mesos attributes should be added to /var/lib/dcos/mesos-slave-common, not /opt/mesosphere/etc/mesos-slave-common. Note that you may need to create this file the first time.
Stop the slave: systemctl stop dcos-mesos-slave
Add your attributes to /var/lib/dcos/mesos-slave-common
Clean out old live executors: rm -f /var/lib/mesos/slave/meta/slaves/latest
Start the slave: systemctl restart dcos-mesos-slave
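The "add your attributes" step above can be sketched as a small helper that creates the file if it is missing and keeps exactly one MESOS_ATTRIBUTES line in it. The function name set_mesos_attributes is hypothetical; the demonstration writes to a temporary file rather than the real /var/lib/dcos/mesos-slave-common.

```shell
# set_mesos_attributes FILE ATTRS
# Hypothetical helper: ensures FILE exists and contains a single
# MESOS_ATTRIBUTES=ATTRS line, leaving any other lines untouched.
set_mesos_attributes() {
  file=$1
  attrs=$2
  # Create the file the first time it is needed.
  [ -f "$file" ] || : > "$file"
  tmp=$(mktemp) || return 1
  # Drop any previous MESOS_ATTRIBUTES line (grep exits 1 if nothing is left).
  grep -v '^MESOS_ATTRIBUTES=' "$file" > "$tmp" || true
  printf 'MESOS_ATTRIBUTES=%s\n' "$attrs" >> "$tmp"
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Demonstration on a temporary file; on an agent the target would be
# /var/lib/dcos/mesos-slave-common, edited while the slave is stopped.
demo=$(mktemp)
set_mesos_attributes "$demo" "SPACE:RACK1"
cat "$demo"
rm -f "$demo"
```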

GitLab installation: at the final step (sudo service gitlab start) the unicorn web server is not running. What should I do?

The GitLab installation went fine up to the final step.
But at that final step, when I run the command
$ sudo service gitlab start
it shows an error: the unicorn web server is not running, although the GitLab Sidekiq job dispatcher with pid 4182 is running properly.

Hadoop installation: what is "This is comment for WebHCat Service (sic)"

Using Ambari, "This is comment for WebHcat Service" is the final selection in the "Services Selection" step.
If I don't select this service, then the Customize Services step hangs indefinitely. It doesn't matter which other services are selected.
If I select it, then the Customize Services step functions normally, but the installation will stop on step four with the error message:
An internal system exception occurred:
Configuration with tag version1439256707212 exists for webhcat-site
This is on a clean install, for a single node SLES 11 SP3 server.
What is the service "This is comment for WebHcat Service", and why is it a comment instead of a service name?
If this is a fresh install, it's strange that you're getting "configuration already exists" errors. I would try to clean your ambari-server instance by running:
sudo ambari-server reset
This will reset the postgres database that ambari-server uses, giving you a clean slate to retry another cluster install.
