Automating Rundeck in a CD platform - adding nodes to Rundeck

I am exploring Rundeck for my Continuous Delivery platform. The challenge I foresee is automating Rundeck itself: adding nodes to Rundeck whenever a new node/VM gets created.
I thought of creating the VM with the public keys of my Rundeck server and adding the VM details to the resources file [~/rundeck/projects/../resources.xml]. But it's an inefficient approach, as I have to manage the resources.xml file by removing the entries each time a VM is deleted. I am primarily depending on Chef for infrastructure provisioning, and getting the node inventory from Chef seems like a viable solution, but it adds more overhead and delay to the workflow.
It would be great if I could get some simple/clean suggestions for solving the problem.

As suggested, you could download and use the chef-rundeck gem from the link below:
https://github.com/oswaldlabs/chef-rundeck
But if you need audit information on the nodes, such as who added or deleted a node or when the node info changed, I would suggest maintaining a node info file in SVN or Git and using the URL source option.
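With either approach, the project can be pointed at a URL resource model source instead of a hand-edited resources.xml, so deleted VMs simply drop out of the generated node list. A minimal sketch, assuming a default project layout and chef-rundeck's usual port (verify both against your install and the gem's README):
cat >> /var/rundeck/projects/myproject/etc/project.properties <<'EOF'
# node list served by chef-rundeck (or any HTTP URL, e.g. a raw resources.xml from Git/SVN)
resources.source.1.type=url
resources.source.1.config.url=http://chef-rundeck-host:9980/
EOF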

This is supported in the Poise Rundeck cookbook via the rundeck_node_source_file resource.

Related

hazelcast-jet deployment and data ingestion

I have a distributed system running on AWS EC2 instances. My cluster has around 2000 nodes. I want to introduce a stream processing model which can process metadata being periodically published by each node (CPU usage, memory usage, I/O, etc.). My system only cares about the latest data. It is also OK with missing a couple of data points when the processing model is down. Thus, I picked hazelcast-jet, which is an in-memory processing model with great performance. Here I have a couple of questions regarding the model:
What is the best way to deploy hazelcast-jet to multiple EC2 instances?
How to ingest data from thousands of sources? The sources push data instead of being pulled.
How to configure the client so that it knows where to submit the tasks?
It would be super useful if there is a comprehensive example where I can learn from.
What is the best way to deploy hazelcast-jet to multiple EC2 instances?
Download and unzip the Hazelcast Jet distribution on each machine:
$ wget https://download.hazelcast.com/jet/hazelcast-jet-3.1.zip
$ unzip hazelcast-jet-3.1.zip
$ cd hazelcast-jet-3.1
Go to the lib directory of the unzipped distribution and download the hazelcast-aws module:
$ cd lib
$ wget https://repo1.maven.org/maven2/com/hazelcast/hazelcast-aws/2.4/hazelcast-aws-2.4.jar
Edit bin/common.sh to add the module to the classpath. Towards the end of the file is a line
CLASSPATH="$JET_HOME/lib/hazelcast-jet-3.1.jar:$CLASSPATH"
You can duplicate this line and replace -jet-3.1 with -aws-2.4.
Edit config/hazelcast.xml to enable the AWS cluster discovery. The details are here. In this step you'll have to deal with IAM roles, EC2 security groups, regions, etc. There's also a best practices guide for AWS deployment.
Start the cluster with jet-start.sh.
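Put together, the member setup can be scripted per instance roughly as follows (versions and URLs match the steps above; appending the classpath line to bin/common.sh instead of editing it in place, and the nohup invocation, are assumptions):
# run on each EC2 instance
wget https://download.hazelcast.com/jet/hazelcast-jet-3.1.zip
unzip hazelcast-jet-3.1.zip && cd hazelcast-jet-3.1
wget -P lib https://repo1.maven.org/maven2/com/hazelcast/hazelcast-aws/2.4/hazelcast-aws-2.4.jar
echo 'CLASSPATH="$JET_HOME/lib/hazelcast-aws-2.4.jar:$CLASSPATH"' >> bin/common.sh
# enable <aws> discovery in config/hazelcast.xml before starting
nohup bin/jet-start.sh > jet.log 2>&1 &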
How to configure the client so that it knows where to submit the tasks?
A straightforward approach is to specify the public IPs of the machines where Jet is running, for example:
ClientConfig clientConfig = new ClientConfig();
clientConfig.getGroupConfig().setName("jet");
clientConfig.getNetworkConfig().addAddress("54.224.63.209", "34.239.139.244");
JetInstance client = Jet.newJetClient(clientConfig);
However, depending on your AWS setup, these may not be stable, so you can configure the client to discover them automatically as well. This is explained here.
How to ingest data from thousands of sources? The sources push data instead of being pulled.
I think your best option for this is to put the data into a Hazelcast IMap with its event journal enabled, and use a mapJournal source (Sources.mapJournal) to stream the update events from it.

Ansible vs Puppet - Agent "check-in"

When it comes to Ansible vs Puppet, what's the difference when the nodes are receiving their configuration?
I know that the Puppet agent checks in every 30 minutes to get its configuration.
How is this for Ansible?
The Puppet agent runs every 30 minutes by default, making sure the checked-in node (server) is in the desired (described) state. Ansible doesn't have that mechanism, so if you want a scheduler you need to look at Ansible Tower, which has recently become open source.
Puppet vs Ansible
Ansible works on a push model, whereas Puppet works on a pull model, i.e. the nodes pull their configuration from the Puppet master.
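To make the contrast concrete, a rough sketch (the inventory and playbook names are placeholders):
# Push (Ansible): run from the control machine whenever you want the configuration applied;
# Ansible connects out to the managed nodes over SSH.
ansible-playbook -i inventory.ini site.yml
# Pull (Puppet): the agent on each node contacts the Puppet master on its own schedule
# (every 30 minutes by default); an immediate run can be forced on the node with:
puppet agent --test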

How can I limit cron to one node

When adding a node manually or via automatic horizontal scaling, the master node will be cloned. This means that the crontab will also be cloned, right?
How can I avoid a cron job starting on two or more nodes simultaneously (which is usually not intended)?
There are a few possible ways:
Your script can detect that it's not running on the master node, so it can remove itself from the cron or simply do nothing (see the sketch after this list). Each node has information about the master node of its layer/nodeGroup:
env | grep MASTER
MASTER_IP=172.25.2.1
MASTER_HOST=node153580
MASTER_ID=153580
Disable cron via Cloud Scripting at onAfterScaleOut. Here is an example of how to use this event.
Deploy software templates as custom Docker images (even if you use a certified Jelastic template). Such images are not cloned during horizontal scaling; they are created from scratch.
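For the first option, a guard at the top of the cron script could look roughly like this (deriving this node's id from its hostname is an assumption based on the MASTER_HOST value above; make sure the MASTER_* variables are visible in the cron environment):
#!/bin/bash
# run the real work only on the layer's master node
NODE_ID=$(hostname | grep -o '[0-9]*$')   # e.g. node153580 -> 153580 (assumed naming scheme)
if [ "$NODE_ID" != "$MASTER_ID" ]; then
    exit 0                                # not the master node: do nothing
fi
# ... the actual cron job work goes here ...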

How does Ansible make sure that all the remote nodes are in the desired state as per the template/playbook

Suppose that, after a playbook has run, someone changes the configuration on one or many of the nodes managed by Ansible. How does Ansible come to know that those managed nodes are out of sync, and how does it bring them back to the desired state?
I presume other automation platforms like Chef and Puppet handle this by having the remote agent run periodically to stay in sync with the master server's template.
Also, what are the best practices for doing so?
Ansible doesn't manage anything by itself. It is a tool to automate tasks.
And it is agentless, so there is no way to get state updates from the remote hosts of their own accord.
You may want to read about Ansible Tower. Excerpt from features list:
Set up occasional tasks like nightly backups, periodic configuration remediation for compliance, or a full continuous delivery pipeline with just a few clicks.
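If Tower is not an option, the usual workaround is to simply re-run the playbook on a schedule from the control node; an idempotent playbook will converge drifted hosts back to the described state. A minimal crontab sketch (the paths, the user field, and the 30-minute interval chosen to mirror Puppet's default are all assumptions):
# /etc/cron.d/ansible-remediation on the control node (cron.d entries include a user field)
*/30 * * * * ansible ansible-playbook -i /etc/ansible/hosts /etc/ansible/site.yml >> /var/log/ansible-remediation.log 2>&1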

Configuring an AWS cluster using an automation script

We are looking for the possibility of an automation script to which we can specify how many master and data nodes we need, and it would configure a cluster, probably taking the credentials from a properties file.
Currently our approach is to log in to the console and configure the Hadoop cluster manually. It would be great if there were an automated way around it.
I've seen this done very nicely using Foreman, Chef, and Ambari Blueprints. Foreman was used to provision the VMs, and Chef scripts were used to install Ambari, configure the Ambari Blueprint, and create the cluster from the Blueprint.
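To give a rough idea of the Blueprint part: once Ambari is installed, the cluster itself can be created with two REST calls. A sketch using curl (host names, credentials, and the JSON payloads are placeholders):
# register the blueprint (the service layout for master and data nodes)
curl -u admin:admin -H "X-Requested-By: ambari" -X POST \
     -d @hadoop-blueprint.json http://ambari-host:8080/api/v1/blueprints/hadoop-blueprint
# instantiate a cluster from it, mapping concrete hosts to the blueprint's host groups
curl -u admin:admin -H "X-Requested-By: ambari" -X POST \
     -d @cluster-template.json http://ambari-host:8080/api/v1/clusters/mycluster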
