ARM template deployment effect on already running MemSQL Instance - ansible

I have an ARM template that deploys all of our resources for the production setup. When I initially deployed the resources through the ARM template, everything was fine and the setup is up and running. We then started a MemSQL cluster in that setup, and we have an application running that uses it. Now we need to make some changes to the setup, specifically to the Ubuntu VMs where MemSQL is running, such as adding disks, assigning private IPs, etc.
The question I have is: do changes made through an ARM deployment affect the in-memory data of the application running on those VMs, specifically the MemSQL data?

The answer is: it depends. It wouldn't delete any data (unless you are using the Complete deployment mode), but depending on what you are doing it might detach data disks or reboot the VMs (if you change the SKU, for example). If you change private IP addresses, the cluster might fall apart, as the nodes won't be able to talk to each other, and so on.
I'd recommend having a dev cluster where you test your ARM templates first, and only then applying the changes to the prod cluster.
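For example, with the current Azure CLI you could validate the template and then deploy it in the default Incremental mode, which leaves existing resources not described in the template untouched (the resource group and file names below are just placeholders):
az deployment group validate --resource-group my-rg --template-file prod.json --parameters prod.parameters.json
az deployment group create --resource-group my-rg --template-file prod.json --parameters prod.parameters.json --mode Incremental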

Related

How to set up a local development environment for Nomad+Consul Service Mesh

As per the HashiCorp documentation on Nomad+Consul, the Consul service mesh cannot be run on macOS/Windows, since those platforms do not support bridge networking.
https://www.nomadproject.io/docs/integrations/consul-connect
What is the recommended way to set up a local development environment for Nomad+Consul?
I'd suggest having a look at setting up your local environment using Vagrant (which is also a HashiCorp product) and VirtualBox. There are plenty of examples online, for example:
Here is one of the most recent setups with Nomad and Consul, although it is not parametrised much.
Here is one with the core HashiCorp stack, i.e. Nomad, Vault and Consul. This repo is quite old, but that merely means it uses old versions of the binaries, which should be easy to update.
Here is one with only Vault and Consul, but you can add Nomad in a similar way. In fact, this Vagrant setup and the way its files are structured seem pretty close to the one above.
I ran the first two last week with a simple
vagrant up
and it worked almost like a charm. I think I needed to upgrade my VirtualBox and maybe run vagrant up a couple of times because of some weird runtime errors which I didn't want to debug.
Once Vagrant finishes the build you can
vagrant ssh
to get inside the created VM, although the configs are set up to mount volumes/sync files, and all UI components are also exposed on the default ports.
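If you'd rather wire the VM up yourself instead of reusing one of those repos, the provisioning script only needs to install the two binaries and start them in dev mode. This is a rough sketch; the versions and paths below are assumptions, so check releases.hashicorp.com for current ones:
# Inside the Linux VM (versions are placeholders)
curl -sLo /tmp/consul.zip https://releases.hashicorp.com/consul/1.15.4/consul_1.15.4_linux_amd64.zip
curl -sLo /tmp/nomad.zip https://releases.hashicorp.com/nomad/1.6.2/nomad_1.6.2_linux_amd64.zip
sudo unzip -o /tmp/consul.zip -d /usr/local/bin
sudo unzip -o /tmp/nomad.zip -d /usr/local/bin
# Consul dev agent first, then a Nomad dev agent with Connect enabled;
# -dev-connect requires root and a Linux host with the CNI plugins installed.
consul agent -dev -client 0.0.0.0 &
sudo nomad agent -dev-connect &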

Any open-source software for me to manage a big-data cluster including hadoop/hive/spark?

I am looking for an open-source system for me to manage my big-data cluster which is composed of 50+ machines including components like hadoop, hdfs, hive, spark, oozie, hbase, zookeeper, kylin.
I want to manage them in a web system. By "manage" I mean:
I can restart a component with one click, one machine at a time; for example, when I click the "restart" button, the zookeeper component gets restarted machine by machine.
I can deploy a component with one click; for example, when I deploy a new zookeeper, I can place a compiled zookeeper on one machine, click "deploy", and it gets deployed to all machines automatically.
I can upgrade a component with one click; for example, when I want to update a zookeeper cluster, I can put the updated zookeeper on one machine, click "update", and it overrides the old version of zookeeper on all the other machines.
All in all, what I want is a management system for my big-data cluster that can restart, deploy and upgrade components, view logs, modify configuration and so on, or at least some of these.
I have considered Ambari, but it can only be used to deploy the whole system from scratch, and my big-data cluster has already been running for a year.
Any suggestions?
Ambari is what you want. It's the only open-source solution for managing Hadoop stacks that meets your listed requirements. You are correct that it doesn't work with already provisioned clusters; to achieve such tight integration with all those services it must know how they were provisioned, where everything is, and what configuration exists for each, and the only way Ambari can know that is if it was used to provision those services in the first place.
Investing the time to recreate your cluster with Ambari may feel painful, but in the long run it will pay off because of how easily you can upgrade and manage services going forward.
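To give a feel for the entry point, bootstrapping the management node is roughly this (a sketch for a CentOS/RHEL box, assuming the Ambari repository has already been added); the cluster machines are then registered as agents through the web wizard:
sudo yum install -y ambari-server
sudo ambari-server setup   # interactive: chooses the JDK, database and service account
sudo ambari-server start   # the web UI then listens on port 8080 by default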

Strategy to persist the node's data for dynamic Elasticsearch clusters

I'm sorry that this is probably a rather broad question, but I haven't found a solution to this problem yet.
I'm trying to run an Elasticsearch cluster on Mesos through Marathon with Docker containers. To do so, I built a Docker image that can start on Marathon and scale dynamically via either the frontend or the API.
This works great for test setups, but the question remains how to persist the data so that if the cluster is scaled down (I know this also depends on the index configuration itself) or stopped, I can restart it later (or scale it up) with the same data.
The thing is that Marathon decides where (on which Mesos slave) the nodes run, so from my point of view it's not predictable whether all the data will be available to the "new" nodes upon restart if I try to persist the data on the Docker hosts via Docker volumes.
The only things that come to my mind are:
Using a distributed file system like HDFS or NFS, with volumes mounted either on the Docker host or in the Docker images themselves. Still, that leaves the question of how to load all the data during the new cluster's startup if the "old" cluster had, for example, 8 nodes and the new one only has 4.
Using the Snapshot API of Elasticsearch to save to a common drive somewhere in the network. I assume that this will have performance penalties...
Is there any other way to approach this? Are there any recommendations? Unfortunately, I didn't find a good resource on this kind of topic. Thanks a lot in advance.
Elasticsearch and NFS are not the best of pals ;-). You don't want to run your cluster on NFS; it's much too slow, and Elasticsearch works better the faster the storage is. If you bring the network into this equation you'll get into trouble. I have no idea about Docker or Mesos, but I definitely recommend against NFS. Use snapshot/restore instead.
The first snapshot will take some time, but the rest of the snapshots should take less space and less time. Also, note that "incremental" means incremental at file level, not document level.
The snapshot itself needs all the nodes that hold the primaries of the indices you want snapshotted, and those nodes all need access to the common location (the repository) so that they can write to it. This shared access to the same location is usually not that obvious, which is why I'm mentioning it.
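For reference, registering a shared-filesystem repository and taking a snapshot looks roughly like this (the repository path and names are placeholders, and the path must be listed under path.repo on every node):
curl -XPUT 'http://localhost:9200/_snapshot/my_backup' -H 'Content-Type: application/json' -d '{"type": "fs", "settings": {"location": "/mount/es_backups"}}'
curl -XPUT 'http://localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true'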
The best way to run Elasticsearch on Mesos is to use a specialized Mesos framework. The first effort in this area is https://github.com/mesosphere/elasticsearch-mesos. There is a more recent project which is, AFAIK, currently under development: https://github.com/mesos/elasticsearch. I don't know its status, but you may want to give it a try.

How to deploy a Cassandra cluster on two ec2 machines?

It's a known fact that it is not possible to create a cluster on a single machine by changing ports. The workaround is to add virtual Ethernet devices to the machine and use these to configure the cluster.
I want to deploy a cluster of, let's say, 6 nodes on two EC2 instances. That means 3 nodes on each machine. Is it possible? If so, what should the seed node addresses be?
Is it a good idea for production?
You can use the DataStax AMI on AWS. DataStax Enterprise is a suitable solution for production.
I am not sure about your cluster layout, because each node needs its own config files and that is the default; I have no idea how to change it.
There are simple instructions here. When you configure the instance settings, you have to provide the advanced cluster settings, like --clustername yourCluster --totalnodes 6 --version community, etc. You can also install Cassandra manually by installing the latest versions of Java and Cassandra.
You can build the cluster by modifying /etc/cassandra/cassandra.yaml (Ubuntu 12.04) fields such as cluster_name, seeds, listen_address, rpc_address and the tokens. cluster_name has to be the same for the whole cluster. A seed node is the contact point whose IP you add to every node's configuration. I am still confused about tokens.
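As a rough illustration of those fields, this is the kind of edit each node needs (the cluster name and IPs are made up; use each node's own private IP for listen_address/rpc_address and the same seed list everywhere):
sudo sed -i "s/^cluster_name:.*/cluster_name: 'MyCluster'/" /etc/cassandra/cassandra.yaml
sudo sed -i 's/- seeds:.*/- seeds: "10.0.0.10,10.0.0.20"/' /etc/cassandra/cassandra.yaml
sudo sed -i 's/^listen_address:.*/listen_address: 10.0.0.11/' /etc/cassandra/cassandra.yaml
sudo sed -i 's/^rpc_address:.*/rpc_address: 10.0.0.11/' /etc/cassandra/cassandra.yaml
sudo service cassandra restart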

How to make a CoreOS cluster on my local infrastructure?

I have some professional servers, and I want to create a cluster of 7-15 machines with CoreOS. I'm a little familiar with Proxmox, but I'm not clear on how to create a virtual machine (VM) with CoreOS on Proxmox. I'm also not sure whether a cluster of CoreOS VMs on Proxmox is the right approach.
So, I need to know:
How to create a VM with CoreOS on Proxmox.
Whether Proxmox is viable for creating a CoreOS cluster.
I have no experience with Proxmox, but if you can make an image that runs then you can use it to stamp out the cluster. What you'd need to do is boot the ISO, run the installer and then make an image of that. Be sure to delete /etc/machine-id before you create the image.
CoreOS uses cloud-config to connect the machines together and configure a few parameters related to networking -- basically anything to get the machines talking to the cluster. A cloud-config file should be provided as a config-drive image, which is basically like mounting a CD-ROM to the VM. You'll have to check the docs on Proxmox to see if it supports that. More info here: http://coreos.com/docs/cluster-management/setup/cloudinit-config-drive/
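Building that config-drive image is essentially packing your cloud-config file into a small ISO, roughly like this (following the linked doc; cloud-config.yaml is a placeholder for your own file):
mkdir -p /tmp/new-drive/openstack/latest
cp cloud-config.yaml /tmp/new-drive/openstack/latest/user_data
mkisofs -R -V config-2 -o configdrive.iso /tmp/new-drive
You then attach configdrive.iso to the VM as a CD-ROM drive, and CoreOS picks it up at boot.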
The other option is to skip the VMs altogether and, instead of using Proxmox, boot CoreOS directly on your hardware. You can do this by booting the ISO and installing, or by doing something like iPXE: http://coreos.com/docs/running-coreos/bare-metal/booting-with-ipxe/