I have installed a DC/OS cluster following the advanced custom installation guide (https://dcos.io/docs/1.10/installing/custom/advanced/).
The cluster is now up and running. I want to add placement constraints for the applications hosted on top of the DC/OS cluster.
I added the parameter MESOS_ATTRIBUTES=SPACE:RACK1 to the
/opt/mesosphere/etc/mesos-slave-common file. After adding it, I could not bring the dcos-mesos-slave service up again.
Could you please advise how to approach this with the DC/OS installation method above?
etc # cat mesos-slave-common
MESOS_MASTER=zk://zk-1.zk:2181,zk-2.zk:2181,zk-3.zk:2181,zk-4.zk:2181,zk-5.zk:2181/mesos
MESOS_CONTAINERIZERS=docker,mesos
MESOS_EXTERNAL_LOG_FILE=/var/log/mesos/mesos-agent.log
MESOS_MODULES_DIR=/opt/mesosphere/etc/mesos-slave-modules
MESOS_CONTAINER_LOGGER=com_mesosphere_mesos_JournaldLogger
MESOS_ISOLATION=cgroups/cpu,cgroups/mem,disk/du,network/cni,filesystem/linux,docker/runtime,docker/volume,volume/sandbox_path,volume/secret,posix/rlimits,namespaces/pid,linux/capabilities,com_mesosphere_MetricsIsolatorModule,cgroups/devices,gpu/nvidia
MESOS_DOCKER_VOLUME_CHECKPOINT_DIR=/var/lib/mesos/isolators/docker/volume
MESOS_IMAGE_PROVIDERS=docker
MESOS_NETWORK_CNI_CONFIG_DIR=/opt/mesosphere/etc/dcos/network/cni
MESOS_NETWORK_CNI_PLUGINS_DIR=/opt/mesosphere/active/cni/:/opt/mesosphere/active/dcos-cni/:/opt/mesosphere/active/mesos/libexec/mesos
MESOS_WORK_DIR=/var/lib/mesos/slave
MESOS_SLAVE_SUBSYSTEMS=cpu,memory
MESOS_LAUNCHER_DIR=/opt/mesosphere/active/mesos/libexec/mesos
MESOS_EXECUTOR_ENVIRONMENT_VARIABLES=file:///opt/mesosphere/etc/mesos-executor-environment.json
MESOS_EXECUTOR_REGISTRATION_TIMEOUT=10mins
MESOS_CGROUPS_ENABLE_CFS=true
MESOS_CGROUPS_LIMIT_SWAP=false
MESOS_DISALLOW_SHARING_AGENT_PID_NAMESPACE=true
MESOS_DOCKER_REMOVE_DELAY=1hrs
MESOS_DOCKER_STOP_TIMEOUT=20secs
MESOS_DOCKER_STORE_DIR=/var/lib/mesos/slave/store/docker
MESOS_GC_DELAY=2days
MESOS_HOSTNAME_LOOKUP=false
GLOG_drop_log_memory=false
MESOS_ATTRIBUTES=SPACE:RACK1
Mesos attributes should be added to /var/lib/dcos/mesos-slave-common, not /opt/mesosphere/etc/mesos-slave-common. Note that you may need to create this file the first time.
Steps
Stop the slave: systemctl stop dcos-mesos-slave
Add your attributes to /var/lib/dcos/mesos-slave-common
Clean out old live executors: rm -f /var/lib/mesos/slave/meta/slaves/latest
Start the slave: systemctl restart dcos-mesos-slave
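The steps above can be sketched as a small shell helper. This is only a sketch under assumptions: the function name set_mesos_attributes is hypothetical, and the paths in the trailing comments are the ones named in this answer. It creates the file if it is missing and replaces an existing MESOS_ATTRIBUTES line instead of adding a duplicate.

```shell
#!/bin/sh
# Hypothetical helper (not part of DC/OS): set MESOS_ATTRIBUTES in a config file.
# $1 = config file, $2 = attribute string such as SPACE:RACK1
set_mesos_attributes() {
    file=$1
    attrs=$2
    # Create the file the first time it is needed.
    [ -f "$file" ] || : > "$file"
    # Copy every line except an old MESOS_ATTRIBUTES entry, then append the new one.
    tmp="${file}.tmp.$$"
    grep -v '^MESOS_ATTRIBUTES=' "$file" > "$tmp" || true
    printf 'MESOS_ATTRIBUTES=%s\n' "$attrs" >> "$tmp"
    mv "$tmp" "$file"
}

# Intended use on an agent, as root, between the stop and start steps:
#   systemctl stop dcos-mesos-slave
#   set_mesos_attributes /var/lib/dcos/mesos-slave-common 'SPACE:RACK1'
#   rm -f /var/lib/mesos/slave/meta/slaves/latest
#   systemctl start dcos-mesos-slave
```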
I followed the first steps to install Flink.
I can start the cluster without any problem
$ start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host DESKTOP-....
Starting taskexecutor daemon on host DESKTOP-....
But I don't see any Flink process in the output of
$ ps aux | grep flink
I also cannot access the dashboard via localhost:8081.
There is an older post about these issues, but its solution didn't work for me, since the conf files it describes apparently no longer exist.
My JAVA_HOME is set as C:\Progra~1\Java\jdk1.8.0_311 to avoid issues with the space in Program Files.
Can you check the logs in the /logs folder? I suspect that C:\Program Files\ could still cause issues because of the space in the path.
Go to the folder where you downloaded Flink and try this command in bash:
$ ./bin/start-cluster.sh --daemon bootstrap-server localhost:8081
Then run one more command:
$ ./bin/flink run examples/streaming/WordCount.jar
If both commands finish without errors, go to localhost:8081.
This still seems to be problematic. I tried to run from Windows Subsystem for Linux (WSL).
I have the following versions: java 11.0.16 and flink 1.15.2.
sudo apt-get update
sudo apt install openjdk-11-jre-headless
export FLINK_HOME=/mnt/c/Projects/Apache/flink-1.15.2
I set the following in flink-conf.yaml
rest.port: 8081
rest.address: localhost
rest.bind-address: 0.0.0.0
I changed the bind address from localhost to 0.0.0.0, which seems to have fixed the problem.
$FLINK_HOME/bin/start-cluster.sh
Now I can access the Flink Web Dashboard.
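To check from the shell that the dashboard is reachable after start-cluster.sh, something like the following could be used. It is a minimal sketch: the function name flink_dashboard_up is hypothetical, it assumes curl is installed, and it polls the /overview REST endpoint on the port configured as rest.port.

```shell
#!/bin/sh
# Hypothetical helper: succeed once the Flink REST endpoint answers.
# $1 = base URL (default http://localhost:8081), $2 = max attempts (default 10)
flink_dashboard_up() {
    url=${1:-http://localhost:8081}
    tries=${2:-10}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -f makes curl fail on HTTP errors, -s keeps it quiet.
        if curl -sf -o /dev/null "$url/overview"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Example: flink_dashboard_up http://localhost:8081 30 && echo "dashboard is up"
```

If this keeps failing while the start script reports success, the files in the log folder are the next place to look.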
I deployed a service in DC/OS (the service is Cassandra). The deployment failed and kept retrying. Under DC/OS > Services > Tasks I could see a new task being created every few minutes, but they all had the status "Failed". Under the Debug tab I could see the TASK_FAILED state with an error message about how I had misconfigured the service (I picked a user that does not exist).
So I wanted to destroy the service and start over again.
Under Services, I clicked on the menu for the service and selected "Delete". The command was accepted and the status changed to "Deleting", but then it stayed there forever.
When I checked the Tasks tab, I could see that DC/OS was still attempting to start the service every few minutes.
Now how do I delete the service? Thanks!
As per the latest DC/OS Cassandra service docs, you should uninstall it using the dcos CLI:
dcos package uninstall --app-id=<service-name> cassandra
If you are using DC/OS 1.9 or older, follow the steps below to uninstall the service:
$ MY_SERVICE_NAME=<service-name>
$ dcos package uninstall --app-id=$MY_SERVICE_NAME cassandra
$ dcos node ssh --master-proxy --leader "docker run mesosphere/janitor /janitor.py \
-r $MY_SERVICE_NAME-role \
-p $MY_SERVICE_NAME-principal \
-z dcos-service-$MY_SERVICE_NAME"
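The janitor arguments in the command above all derive from the service name. As a sketch of that naming pattern only (janitor_args is a hypothetical helper, and it assumes a plain service name without a folder prefix):

```shell
#!/bin/sh
# Hypothetical helper: print the janitor arguments for a service name,
# following the <name>-role / <name>-principal / dcos-service-<name> pattern.
janitor_args() {
    name=$1
    printf '%s\n' "-r ${name}-role -p ${name}-principal -z dcos-service-${name}"
}

# Example: janitor_args cassandra
```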
My Elasticsearch server is already running as a service. I can start and run it like so:
sudo service elasticsearch start
sudo service elasticsearch stop
However, I would like to have it always running. Currently I need to start it manually on every system boot. I have already tried to register it as a daemon with the following commands:
sudo update-rc.d elasticsearch defaults
sudo update-rc.d elasticsearch defaults 95 10
I still need to start the Elasticsearch server manually. What do I need to do to run Elasticsearch as a daemon or start it at all on system startup? Since it is my local development environment, I would not need Elasticsearch as a daemon. I just need to start it on the startup of my system.
Not sure if you've found the answer or not (I'm assuming so), but for anyone who has not, you can use:
sudo systemctl enable elasticsearch.service
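Which of the two commands applies depends on the init system that is actually running. A minimal sketch, assuming the usual convention that the directory /run/systemd/system exists only when systemd is the running init; the function name enable_command_for is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the boot-time enable command for a service.
# $1 = service name
# $2 = directory whose presence indicates systemd (default /run/systemd/system)
enable_command_for() {
    service=$1
    marker=${2:-/run/systemd/system}
    if [ -d "$marker" ]; then
        echo "systemctl enable ${service}.service"
    else
        echo "update-rc.d ${service} defaults"
    fi
}

# Example: sudo $(enable_command_for elasticsearch)
```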
I have installed Mesosphere DC/OS on AWS using the provided template. Now I would like to restart all the nodes with the --insecure-registry parameter added on all the slave nodes (and the master as well) so that they can communicate with my Docker registry. From what I have read, the best way to do this is through the cloud-config script in the AWS template.
So in the AWS EC2 Launch Configurations I copied the configuration of the master node, then adjusted the User Data then updated the auto scaling groups and restarted the master. (awesome answer how to do this at the end of How do I use insecure docker registries with Amazon EC2 Container Service (ECS)?)
The lines were added to the end of the units section in cloud-config as suggested by CoreOS docs:
https://coreos.com/os/docs/latest/cloud-config.html
units:
  ....Many lines here
  - name: docker.service
    drop-ins: |-
      - name: 50-insecure-registry.conf
        content: |
          [Service]
          Environment=DOCKER_OPTS='--insecure-registry="10.0.1.0/24"'
But then, the master wouldn't restart. So I had to revert my change.
So many questions:
a. Why is there no docker.service block in this template cloud-config? How and when does Docker start?
b. Do I need to edit the flannel_docker_opts.env file? Again, there is no mention of such a file in this cloud-config, but it is mentioned on this page:
https://coreos.com/flannel/docs/latest/flannel-config.html
Of particular interest at the end of that page:
ExecStartPost in flanneld.service converts information in /run/flannel/subnet.env into Docker daemon command line args (such as --bip and --mtu), storing them in /run/flannel_docker_opts.env
...
docker.service sources in /run/flannel_docker_opts.env which contains env variables with command line options and starts the Docker with them.
And in fact I can see the mentioned files like early-docker.service, but again no mention of flannel in the cloud-config.
But indeed I found the service files mentioned in the page above:
/usr/lib64/udev/rules.d/80-docker.rules
/usr/lib64/systemd/system/early-docker.service
/usr/lib64/systemd/system/early-docker.socket
/usr/lib64/systemd/system/docker.service
/usr/lib64/systemd/system/docker.socket
/usr/lib64/systemd/system/sockets.target.wants/docker.socket
/usr/lib64/systemd/system/early-docker.target
And indeed the /run/flannel_docker_opts.env file is mentioned in the docker.service file, but does not exist in the /run folder:
vi /usr/lib64/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.com
After=docker.socket early-docker.target network.target
Requires=docker.socket early-docker.target
[Service]
Environment=TMPDIR=/var/tmp
EnvironmentFile=-/run/flannel_docker_opts.env <<<<<<<<<< HERE!!!!!
MountFlags=slave
LimitNOFILE=1048576
LimitNPROC=1048576
ExecStart=/usr/lib/coreos/dockerd --daemon --host=fd:// $DOCKER_OPTS $DOCKER_OPT_BIP $DOCKER_OPT_MTU $DOCKER_OPT_IPMASQ
[Install]
c. Where does this docker.service file come from? There is no mention of it in the cloud-config. Is it part of the CoreOS architecture?
d. If this docker.service is an integral part of CoreOS, why are all the files in this /usr/lib64 path? The CoreOS docs mention other locations for all the files.
Any suggestion would be appreciated. I'm going blind now. I will try to create this non-existent flannel_docker_opts.env file, but I'm not sure if what I'm doing is the correct way.
Thanks!
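For reference, the docker.service shown above passes $DOCKER_OPTS to the daemon, so one way to test the setting by hand on a single node is a systemd drop-in. This is a sketch under assumptions, not a confirmed fix for the master restart problem: the function name is hypothetical, and /etc/systemd/system/docker.service.d is assumed as the drop-in directory.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that sets DOCKER_OPTS.
# $1 = drop-in directory (assumed: /etc/systemd/system/docker.service.d)
# $2 = registry address or CIDR, e.g. 10.0.1.0/24
write_insecure_registry_dropin() {
    dir=$1
    registry=$2
    mkdir -p "$dir"
    # Quoting the whole assignment keeps it a single Environment= entry.
    printf '[Service]\nEnvironment="DOCKER_OPTS=--insecure-registry=%s"\n' \
        "$registry" > "$dir/50-insecure-registry.conf"
}

# Intended use, as root:
#   write_insecure_registry_dropin /etc/systemd/system/docker.service.d 10.0.1.0/24
#   systemctl daemon-reload
#   systemctl restart docker
```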
I tried running Elasticsearch on Ubuntu 12.04. The installation and the first run seemed OK, but when I check sudo /etc/init.d/elasticsearch status, I see the message "elasticsearch is not running". Opening localhost:9200 in the browser also fails.
Please help.
It will not start automatically after you install it for good reason. You don't want it to accidentally join a cluster configured to use multicast discovery. See my post here for information on the basics for configuring elasticsearch.
In addition to that post, also make sure you set the following two options in /etc/elasticsearch/elasticsearch.yml:
cluster.name: some-other-name
discovery.zen.ping.multicast.enabled: false
After you have done that, start it by running:
sudo service elasticsearch start
You should almost always disable multicast because on a local testing environment you only have one node so you don't need it, and in a production environment it's just bad practice since nodes accidentally joining the cluster can break things (trust me, I've had this happen and it's a headache).
Did you try sudo /etc/init.d/elasticsearch start?