I'm trying to run a 3-machine etcd cluster (10.0.0.1, 10.0.0.2, 10.0.0.3) with SSL/TLS for both client and transport security, but I keep running into trouble: the cluster doesn't seem able to elect a leader, and Raft falls into a cycle of elections. Am I doing something wrong? All machines run etcd 2.0.5.
server1
etcd -name eu1 -data-dir eu1 \
-ca-file=/root/etcd-ca/ca.crt -cert-file=/root/etcd-ca/server1.crt -key-file=/root/etcd-ca/server1.key \
-peer-ca-file=/root/etcd-ca/ca.crt -peer-cert-file=/root/etcd-ca/server1.crt -peer-key-file=/root/etcd-ca/server1.key \
-initial-advertise-peer-urls=https://10.0.0.1:2380 -listen-peer-urls=https://10.0.0.1:2380 \
-discovery https://discovery.etcd.io/7855c14b6cd05060974839f3833ea932
server2
etcd -name eu2 -data-dir eu2 \
-ca-file=/root/etcd-ca/ca.crt -cert-file=/root/etcd-ca/server2.crt -key-file=/root/etcd-ca/server2.key \
-peer-ca-file=/root/etcd-ca/ca.crt -peer-cert-file=/root/etcd-ca/server2.crt -peer-key-file=/root/etcd-ca/server2.key \
-initial-advertise-peer-urls=https://10.0.0.2:2380 -listen-peer-urls=https://10.0.0.2:2380 \
-discovery https://discovery.etcd.io/7855c14b6cd05060974839f3833ea932
server3
etcd -name player -data-dir player \
-ca-file=/root/etcd-ca/ca.crt -cert-file=/root/etcd-ca/server3.crt -key-file=/root/etcd-ca/server3.key \
-peer-ca-file=/root/etcd-ca/ca.crt -peer-cert-file=/root/etcd-ca/server3.crt -peer-key-file=/root/etcd-ca/server3.key \
-initial-advertise-peer-urls=https://10.0.0.3:2380 -listen-peer-urls=https://10.0.0.3:2380 \
-discovery https://discovery.etcd.io/7855c14b6cd05060974839f3833ea932
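One way to sanity-check the peer TLS setup itself, reusing the file paths and IPs above (server1 probing server2's peer port), is a plain openssl handshake; this is only a sketch, not part of the setup:
# From server1: does server2's peer port present a certificate the shared CA
# validates, and is the client certificate accepted? Adjust IPs/paths per host.
openssl s_client -connect 10.0.0.2:2380 \
  -CAfile /root/etcd-ca/ca.crt \
  -cert /root/etcd-ca/server1.crt \
  -key /root/etcd-ca/server1.key </dev/null 2>/dev/null | grep 'Verify return code'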
Log file with outputs: http://pastebin.com/JBitRT1e
Thanks for any kind of help!
J.
I am attempting to have the New Relic Infrastructure Agent monitor my Heroku applications.
The documentation says to run the following:
docker run \
-d \
--name newrelic-infra \
--network=host \
--cap-add=SYS_PTRACE \
--privileged \
--pid=host \
-v "/:/host:ro" \
-v "/var/run/docker.sock:/var/run/docker.sock" \
-e NRIA_LICENSE_KEY=[Key] \
newrelic/infrastructure:latest
But where do I actually run or put this command so that it monitors my Heroku apps?
What's my scenario?
I have, for example, two external cards that can be plugged in and unplugged without powering down the PC.
These cards are the resources I want to manage with Mesos.
Currently I use attributes to manage them: the attributes nodeKey:card1_key and nodeKey:card2_key are registered with the master to distinguish the two cards. When card1 is in use, I simply mark all of mesos-agent1's CPU and memory as used, so the master no longer offers mesos-agent1 to frameworks.
With this setup, if I need to unplug card1, I can simply shut down mesos-agent1 without affecting mesos-agent2, which serves card2.
That's my scenario, and everything works fine, except that with a lot of cards I have to set up a separate mesos-agent for every card, which consumes a fair amount of memory.
Current solution command:
Card1:
docker run -d --net=host --name=mesos-agent1 --privileged \
-e MESOS_IP=$PC_IP \
-e MESOS_HOSTNAME=$PC_IP \
-e MESOS_PORT=$node_port \
-e MESOS_MASTER=zk://$SERVER_IP:2181/mesos \
-e MESOS_ATTRIBUTES="nodeKey:card1_key" \
-e MESOS_SWITCH_USER=0 \
-e MESOS_CONTAINERIZERS=docker,mesos \
-e MESOS_LOG_DIR=/var/log/mesos \
-e MESOS_WORK_DIR=/var/tmp/mesos \
-v "$(echo ~)/.dp/mesos-slave/log/mesos-$nodeKey:/var/log/mesos" \
-v "$(echo ~)/.dp/mesos-slave/tmp/mesos-$nodeKey:/var/tmp/mesos" \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /cgroup:/cgroup \
-v /sys:/sys \
-v $(which docker):/usr/bin/docker \
mesosphere/mesos-slave:1.3.0
Card2:
docker run -d --net=host --name=mesos-agent2 --privileged \
-e MESOS_IP=$PC_IP \
-e MESOS_HOSTNAME=$PC_IP \
-e MESOS_PORT=$node_port \
-e MESOS_MASTER=zk://$SERVER_IP:2181/mesos \
-e MESOS_ATTRIBUTES="nodeKey:card2_key" \
-e MESOS_SWITCH_USER=0 \
-e MESOS_CONTAINERIZERS=docker,mesos \
-e MESOS_LOG_DIR=/var/log/mesos \
-e MESOS_WORK_DIR=/var/tmp/mesos \
-v "$(echo ~)/.dp/mesos-slave/log/mesos-$nodeKey:/var/log/mesos" \
-v "$(echo ~)/.dp/mesos-slave/tmp/mesos-$nodeKey:/var/tmp/mesos" \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /cgroup:/cgroup \
-v /sys:/sys \
-v $(which docker):/usr/bin/docker \
mesosphere/mesos-slave:1.3.0
My question:
So, if possible, I'd like to register just one Mesos agent with the Mesos master while still supporting my scenario:
a) When card1 is used, the scheduler framework can tag it as used, so the next resource offer contains only card2, not card1. This looks similar to --resources='cpus:24;gpus:2;mem:24576;disk:409600;ports:[21000-24000,30000-34000]': if one task uses 4 CPUs, the master only offers 20 CPUs next time. That can't be done with --attributes, though, and Mesos only seems to expose a customization interface for --attributes, not for --resources.
b) If I need to unplug card1 or add a new card3, can I change some mesos-agent parameters without restarting the agent, so that a card currently in use (e.g. card2) is not impacted?
Is any solution possible, or do I have to live with my current setup?
The simple answer is NO.
You cannot start just one Mesos agent for multiple independent devices; Mesos is a virtualization solution that maps many resources onto one agent.
But I think your requirements would be well served by an external tool: Marathon, one of the scheduler frameworks built on Mesos.
Marathon maintains the status of every container it schedules. In your case, if you unplug card1 without any other operations, Marathon will notice (after a short internal delay, of course) that the containers on card1 (mesos-agent1) are already dead. Marathon then re-schedules those containers, requesting resources from the Mesos master; the master offers resources for the re-scheduled containers, done!
See? No extra operations: you can unplug any card you wish without impacting running containers or other Mesos agents. But you must still register a new card with the Mesos master by starting a new Mesos agent for it.
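As an illustration of that flow (the app id, image, and Marathon endpoint here are assumptions, not from your setup), a Marathon constraint can pin an app to the agent that carries a given card attribute:
# Hypothetical Marathon app definition: the constraint pins tasks to the agent
# whose nodeKey attribute equals card1_key (attribute name/value from your agents).
cat > card1-app.json <<'EOF'
{
  "id": "/card1-worker",
  "cpus": 1,
  "mem": 512,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": { "image": "busybox" }
  },
  "constraints": [["nodeKey", "CLUSTER", "card1_key"]]
}
EOF
# Assuming Marathon listens on its default port on the ZooKeeper host ($SERVER_IP):
curl -X POST "http://$SERVER_IP:8080/v2/apps" \
  -H 'Content-Type: application/json' \
  -d @card1-app.json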
Hope this helps.
I am trying to spin up an EMR cluster with fair scheduling so that I can run multiple steps in parallel. I see that this is possible via Data Pipeline (https://aws.amazon.com/about-aws/whats-new/2015/06/run-parallel-hadoop-jobs-on-your-amazon-emr-cluster-using-aws-data-pipeline/), but I already have cluster creation and management automated via an Airflow job calling the AWS CLI, so it would be great to just update my configuration.
aws emr create-cluster \
--applications Name=Spark Name=Ganglia \
--ec2-attributes "${EC2_PROPERTIES}" \
--service-role EMR_DefaultRole \
--release-label emr-5.8.0 \
--log-uri ${S3_LOGS} \
--enable-debugging \
--name ${CLUSTER_NAME} \
--region us-east-1 \
--instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=4,InstanceType=m3.xlarge
I think it may be achieved using the --configurations flag (https://docs.aws.amazon.com/cli/latest/reference/emr/create-cluster.html), but I'm not sure of the correct property names.
Yes, you are correct. You can use EMR configurations to achieve your goal. Create a JSON file along the lines of the following:
yarn-config.json:
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.resourcemanager.scheduler.class": "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler"
    }
  }
]
as per the Hadoop Fair Scheduler docs.
Then modify your AWS CLI command as follows:
aws emr create-cluster \
--applications Name=Spark Name=Ganglia \
--ec2-attributes "${EC2_PROPERTIES}" \
--service-role EMR_DefaultRole \
--release-label emr-5.8.0 \
--log-uri ${S3_LOGS} \
--enable-debugging \
--name ${CLUSTER_NAME} \
--region us-east-1 \
--instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=4,InstanceType=m3.xlarge \
--configurations file://yarn-config.json
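Once the cluster is up, one way to confirm the override took effect (only a sketch; the path is EMR's standard Hadoop configuration location) is to check the generated yarn-site.xml on the master node:
# On the EMR master node: the FairScheduler class should appear under this property.
grep -A 1 'yarn.resourcemanager.scheduler.class' /etc/hadoop/conf/yarn-site.xml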
I'm running into odd behavior with the latest version of Vagrant in a Windows 7 / msys / VirtualBox environment: after executing vagrant up, I get an rsync error during the provisioning stage: file has vanished: "/c/Users/spencerd/workspace/watcher/.LISTEN".
Since Google, IRC, and the issue trackers have little to no documentation on this issue, I wonder if anyone else has run into it and what the fix would be.
For the record, I have successfully built a box using the same Vagrantfile and provisioning script. For those who want to look, the project code is up at https://gist.github.com/denzuko/a6b7cce2eae636b0512d, with the debug log at gist.github.com/
After digging further into the directory structure and running into issues when pushing code up with git, I found a non-existent (phantom) file that needed to be removed after a reboot.
Thus, rebooting and then running rm -rf -- "./.LISTEN\ \ \ \ \ 0\ \ \ \ \ \ 100\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ " did the trick.
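If the exact escaped name is hard to reproduce, a glob through find should catch the same stray artifact (path taken from the error message above; only a sketch):
# Remove any stray ".LISTEN*" artifacts from the synced folder before re-running
# vagrant up / vagrant provision.
find /c/Users/spencerd/workspace/watcher -maxdepth 1 -name '.LISTEN*' -exec rm -f -- {} +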
I'm unable to bootstrap my server because "knife ec2 server create" keeps expanding my run-list to "roles".
knife ec2 server create \
-V \
--run-list 'role[pgs]' \
--environment $1 \
--image $AMI \
--region $REGION \
--flavor $PGS_INSTANCE_TYPE \
--identity-file $SSH_KEY \
--security-group-ids $PGS_SECURITY_GROUP \
--subnet $PRIVATE_SUBNET \
--ssh-user ubuntu \
--server-connect-attribute private_ip_address \
--availability-zone $AZ \
--node-name pgs \
--tags VPC=$VPC
This consistently fails because 'role[pgs]' is expanded to 'roles'. Why is this? Is there some escaping or alternative method I can use?
I'm currently working around this by bootstrapping with an empty run-list and then overriding the run-list by running chef-client once the node is registered.
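A minimal sketch of that workaround, assuming the node registers under the --node-name used above (the ssh address and user are placeholders):
# Attach the role once the node exists on the Chef server...
knife node run_list add pgs 'role[pgs]'
# ...then trigger a converge on the instance.
ssh ubuntu@<node-ip> 'sudo chef-client'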
This is a feature of bash: [] is a wildcard (glob) pattern, so the shell can expand role[pgs] when a matching file name exists. You can escape the brackets with \ (or quote the argument) to prevent the expansion.
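For example (the touch just creates a file name that the pattern matches):
touch roles          # any file in the current directory that matches role[pgs]
echo role[pgs]       # unquoted: the shell expands this to "roles"
echo 'role[pgs]'     # single quotes suppress glob expansion
echo role\[pgs\]     # escaping the brackets works too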