When adding a node manually or via automatic horizontal scaling, the master node is cloned. This means the crontab is cloned as well, right?
How can I avoid a cron job starting on two or more nodes simultaneously (which is usually not intended)?
There are a few possible ways:
Your script can detect that it is not running on the master node and either remove itself from the cron or simply do nothing. Each node has information about the master node of its layer (node group):
env | grep MASTER
MASTER_IP=172.25.2.1
MASTER_HOST=node153580
MASTER_ID=153580
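For example, the cron script itself can check the environment before doing any work. A minimal sketch in Python (the way the node's own address is resolved is an assumption and may need adjusting; also note that cron runs with a reduced environment, so MASTER_IP may need to be exported in the crontab):

# Sketch: exit early when this node is not the master.
import os
import socket
import sys

master_ip = os.environ.get("MASTER_IP")
# One possible way to resolve this node's own address; adjust to your setup.
own_ip = socket.gethostbyname(socket.gethostname())

if master_ip is None or master_ip != own_ip:
    sys.exit(0)  # not the master node: do nothing

# ... actual cron job logic goes here ...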
Disable cron via Cloud Scripting on the onAfterScaleOut event. Here is an example of how to use this event.
Deploy software templates as custom Docker images (even if you use a certified Jelastic template). Such images are not cloned during horizontal scaling; they are created from scratch.
Related
I need to change the labels of worker nodes to a predefined combination of alphanumeric characters (for example: machine01) as and when they join the cluster (again and again as nodes leave or new nodes join the cluster). Is it possible to do this with Ansible, or do we need to set up a cron job? If it is possible with Ansible, I would like a hint on how to run a playbook once and keep it active in the background so it keeps checking for new node labels. Which is computationally cheaper?
How can we run a playbook once and keep it active (i.e. running) in the background so it keeps checking for new node labels?
Since Ansible is a push-based configuration management tool, it is not designed for such a use case. Ansible just connects to the remote device via SSH and performs the configuration tasks.
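If the playbook really needs to be re-applied continuously, it has to be triggered from something external, such as cron or a small wrapper loop. A minimal sketch in Python, assuming a hypothetical label_nodes.yml playbook and inventory.ini inventory, with an arbitrary interval (the playbook tasks should be idempotent so already-labelled nodes are left untouched):

# Sketch: periodically re-run an Ansible playbook from an external loop,
# since Ansible has no built-in long-running "watch" mode.
import subprocess
import time

INTERVAL_SECONDS = 300  # re-check for new nodes every 5 minutes (assumption)

while True:
    subprocess.run(
        ["ansible-playbook", "-i", "inventory.ini", "label_nodes.yml"],
        check=False,
    )
    time.sleep(INTERVAL_SECONDS)

A plain cron entry invoking ansible-playbook achieves the same thing and is usually cheaper than keeping a process running in the background.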
Further Documentation
Ansible concepts
Ansible playbooks
Is there a proper way to execute a shell script on every node in a running EMR hadoop cluster?
Everything I look for brings up bootstrap actions, but those only apply when the cluster is starting, not to a running cluster.
My application uses Python, so my current plan is to use boto to list the IPs of each node in the cluster, then loop over the nodes and execute the shell script via SSH.
Is there a better way?
If your cluster is already started, you should use steps.
Steps are executed after the cluster has started, so technically they appear to be what you are looking for.
Be careful: steps are executed only on the master node, so you have to connect to the rest of your nodes in some other way to modify them (see the sketch after the references below).
Steps are scripts as well, but they run only on machines in the Master-Instance group of the cluster. This mechanism allows applications like Zookeeper to configure the master instances and allows applications like Hbase and Apache Drill to configure themselves.
Reference
See this also.
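For the nodes other than the master, the boto idea from the question works: list the cluster's instances and run the script over SSH yourself. A minimal sketch with boto3 (the cluster ID, key file and script path are placeholders; it assumes the script is already present on the nodes and that the machine running this can SSH to them):

# Sketch: run a shell script on every node of a running EMR cluster.
import subprocess
import boto3

CLUSTER_ID = "j-XXXXXXXXXXXXX"       # placeholder
KEY_FILE = "/path/to/key.pem"        # placeholder
SCRIPT = "/home/hadoop/myscript.sh"  # placeholder

emr = boto3.client("emr")
response = emr.list_instances(ClusterId=CLUSTER_ID,
                              InstanceGroupTypes=["MASTER", "CORE", "TASK"])

for instance in response["Instances"]:
    ip = instance.get("PrivateIpAddress")
    if not ip:
        continue
    # "hadoop" is the default SSH user on EMR nodes; add error handling,
    # pagination and retries as needed for larger clusters.
    subprocess.run(["ssh", "-i", KEY_FILE, "hadoop@" + ip, "bash", SCRIPT],
                   check=False)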
I am exploring Rundeck for my continuous delivery platform. The challenge I foresee here is automating Rundeck itself: adding nodes to Rundeck whenever a new node/VM gets created.
I thought of creating the VM with the public keys of my Rundeck server and adding the VM details to the resources file [~/rundeck/projects/../resources.xml]. But it's an inefficient approach, as I have to manage the resources.xml file by removing the entries each time a VM is deleted. I primarily depend on Chef for infrastructure provisioning; getting the node inventory from Chef seems like a viable solution, but it adds more overhead and delays to the workflow.
It would be great if I could get some simple/clean suggestions for solving this problem.
As suggested, you could download and use the chef-rundeck gem from the link below.
https://github.com/oswaldlabs/chef-rundeck
But if you need audit information on the nodes, such as who added or deleted a node, or when the node info changed, I would suggest maintaining a node info file in SVN or Git and using the URL source option.
This is supported in the Poise Rundeck cookbook via the rundeck_node_source_file resource.
I am using Jenkins v1.564 with the Amazon EC2 Plugin and have set up two AMIs. The first AMI has the label small and the second AMI has the label large. Both AMIs have the Usage setting set to Utilize this node as much as possible.
Now I have created two jobs. The first job has Restrict where this project can be run set to small; the second job, similarly, is set to large.
So I trigger a build of the first job. No slaves were previously running, so the plugin fires up a small slave. I then trigger a build of the second job, and it waits endlessly for a slave with the message All nodes of label 'large' are offline.
I would have expected the plugin to fire up a large node since no nodes of that label are running. Clearly I'm misunderstanding something. I have gone over the plugin documentation but clearly I'm not getting it.
Any feedback or pointers to documentation that explains this would be much appreciated.
Are the two machine configurations using the same image? If so, you're probably running into this: https://issues.jenkins-ci.org/browse/JENKINS-19845
The EC2 plugin counts the number of instances based on
I found there's an Instance Cap setting in Manage Jenkins -> Configure System, under Advanced for the EC2 module, which limits how many instances the plug-in can launch at any one time. It was set to 2. Still odd, as I only had one instance running and it wasn't starting another one (so maybe the limit is "less than" rather than "up to"). Anyway, increasing the cap to a higher number made the instance fire up.
Can a Hadoop YARN instance manage nodes in different places on Earth and across different networks? Can it manage nodes that use different platforms?
Every note about YARN I have found says that YARN manages clusters, but if the app I deploy is written in Java, it should probably work on the nodes regardless of their hardware.
Similarly, YARN seems general enough to support more than just a LAN.
YARN is not platform-aware. It is also not aware of how application processes on different hosts communicate with each other to perform the work.
At the same time, the YARN application master must be runnable as a command line, and therefore any node in the cluster with enough resources should be able to run it.
If not every platform is capable of running a specific application master, then YARN would need to be aware of that. Today it cannot be, but I can imagine the platform being a special kind of resource, in which case YARN would select an appropriate node.
Regarding the LAN question: if you have an application master that knows how to manage a job across several LANs, it should be fine with YARN.