Ive installed a single node greenplum db with 2 segment hosts , inside them residing 2 primary and mirror segments , and i want to configure a standby master , can anyone help me with it?
It is pretty simple.
gpinitstandby -s smdw -a
Note: If you are using one of the cloud Marketplaces that deploys Greenplum for you, the standby master runs on the first segment host. The overhead of running the standby master is pretty small so it doesn't impact performance. The cloud Marketplaces also have self-healing so if that nodes fails, it is replaced and all services are automatically restored.
As Jon said, this is fairly straightforward. Here is a link to the documentation: https://gpdb.docs.pivotal.io/5170/utility_guide/admin_utilities/gpinitstandby.html
If you have follow up questions, post them here.
Related
Hello enthusiastic people.
I am a student trying to learn Elastic stack.
I have 1 node installed on my local machine. I have also successfully installed beats on my other local machine to get data and deliver it to my logstash.
My question is, what if I add another node, do I still need to install kibana and elasticsearch? Then connect it from my first node?
I just read a lot that a single node is prone to data loss.
Sorry for my noob question.
Your answer is very appreciated.
Thanks in advance.
Having a cluster with at least 3 nodes would be good to ensure data security and integrity.
A cluster can have one or more nodes.
An example scenario:
It will be easier for you to install with docker during the learning and development process. I recommend you follow the link below. This link explains how to set up an elasticsearch cluster with 3 nodes on docker.
Start a multi-node cluster with Docker Compose
I've set up a ejabberd (v 15.04) cluster on AWS using 2 Ubuntu images. Whilst I am able to successfully cluster the two (using the command join_cluster from the 2nd node to the 1st node), I am not sure if the behavior is as expected... any thoughts would be much appreciated...
To detail the above, 2 different clients connected to the 2 nodes separately can communicate with each other. However, when I stop the server on the secondary node, I would still expect the two clients to be able to talk to each other. But instead, this 2nd client simply gets disconnected as the server is not running.
Is there something possibly that am overlooking here?
Many thanks!
join two node with join_as_master() method.
cluster code is available on github site.
for doing the Ejabberd Clustering I followed the steps from the link below :
Link : http://chadillac.tumblr.com/post/35967173942/easy-ejabberd-clustering-guide-mnesia-mysql
I hav done the Clustering with no mysql tables only mnesia database
Imp Note :
1) The ejabberd.yml file should be same as in the master host.
2) Copy the .erlang.cookies file frm the master to the slaves
3) The slave host name will be mentioned in the ejabberdctl.cfg which will different from that mentioned in yml file of the slave.
4) For my sql, as in we are creating a totally different machine ..no need to add into the cluster.
I'm sorry that this is probably a kind of broad question, but I didn't find a solution form this problem yet.
I try to run an Elasticsearch cluster on Mesos through Marathon with Docker containers. Therefore, I built a Docker image that can start on Marathon and dynamically scale via either the frontend or the API.
This works great for test setups, but the question remains how to persist the data so that if either the cluster is scaled down (I know this is also about the index configuration itself) or stopped, and I want to restart later (or scale up) with the same data.
The thing is that Marathon decides where (on which Mesos Slave) the nodes are run, so from my point of view it's not predictable if the all data is available to the "new" nodes upon restart when I try to persist the data to the Docker hosts via Docker volumes.
The only things that comes to my mind are:
Using a distributed file system like HDFS or NFS, with mounted volumes either on the Docker host or the Docker images themselves. Still, that would leave the question how to load all data during the new cluster startup if the "old" cluster had for example 8 nodes, and the new one only has 4.
Using the Snapshot API of Elasticsearch to save to a common drive somewhere in the network. I assume that this will have performance penalties...
Are there any other way to approach this? Are there any recommendations? Unfortunately, I didn't find a good resource about this kind of topic. Thanks a lot in advance.
Elasticsearch and NFS are not the best of pals ;-). You don't want to run your cluster on NFS, it's much too slow and Elasticsearch works better when the speed of the storage is better. If you introduce the network in this equation you'll get into trouble. I have no idea about Docker or Mesos. But for sure I recommend against NFS. Use snapshot/restore.
The first snapshot will take some time, but the rest of the snapshots should take less space and less time. Also, note that "incremental" means incremental at file level, not document level.
The snapshot itself needs all the nodes that have the primaries of the indices you want snapshoted. And those nodes all need access to the common location (the repository) so that they can write to. This common access to the same location usually is not that obvious, that's why I'm mentioning it.
The best way to run Elasticsearch on Mesos is to use a specialized Mesos framework. The first effort is this area is https://github.com/mesosphere/elasticsearch-mesos. There is a more recent project, which is, AFAIK, currently under development: https://github.com/mesos/elasticsearch. I don't know what is the status, but you may want to give it a try.
I have a Vertica instance running on our prod. Currently, we are taking regular backups of the database. I want to build a Master/Slave configuration for Vertica so that I always have the latest backup in case something goes bad. I tried to google but did not find much on this topic. Your help will be much appreciated.
There is no concept of a Master/Slave in Vertica. It seems that you are after a DR solution which would give you a standby instance if your primary goes down.
The standard practice with Vertica is to use a dual load solution which streams data into your primary and DR instances. The option you're currently using would require an identical standby system and take time to restore from your backup. Your other option is to do storage replication which is more expensive.
Take a look at the best practices for disaster recovery in the documentation.
I am new to cloudera, I installed cloudera in my system successfully I have two doubts,
Consider a machine with some nodes already using hadoop with some data, Can we install Cloudera to use the existing Hadoop without made any changes or modifaction on data stored existing hadooop.
I installed Cloudera in my machine, I have another three machines to add those as clusters, I want to know, Am i want install cloudera in those three machines before add those machines as clusters ?, or Can we add a node as clusters without installing cloudera on that purticular nodes?.
Thanks in advance can anyone, please give some information about the above questions.
Answer to questions -
1. If you want to migrate to CDH from existing Apache Distribution, you can follow this link
Excerpt:
Overview
The migration process does require a moderate understanding of Linux
system administration. You should make a plan before you start. You
will be restarting some critical services such as the name node and
job tracker, so some downtime is necessary. Given the value of the
data on your cluster, you’ll also want to be careful to take recent
back ups of any mission-critical data sets as well as the name node
meta-data.
Backing up your data is most important if you’re upgrading from a
version of Hadoop based on an Apache Software Foundation release
earlier than 0.20.
2.CDH binary needs be installed and configured in all the nodes to have a CDH based cluster up and running.
From the Cloudera Manual
You can migrate the data from a CDH3 (or any Apache Hadoop) cluster to a CDH4 cluster by
using a tool that copies out data in parallel, such as the DistCp tool
offered in CDH4.
Other sources
Regarding your second question,
Again from the manual page
Important:
Before proceeding, you need to decide:
As a general rule:
The NameNode and JobTracker run on the the same "master" host unless
the cluster is large (more than a few tens of nodes), and the master
host (or hosts) should not
run the Secondary NameNode (if used), DataNode or TaskTracker
services. In a large cluster, it is especially important that the
Secondary NameNode (if used) runs on a separate machine from the
NameNode. Each node in the cluster except the master host(s) should
run the DataNode and TaskTracker services.
Additionally, if you use Cloudera Manager it will automatically do all the setup necessary i.e install the necessary selected components on the nodes in the cluster.
Off-topic: I had a bad habit of not referrring the manual properly. Have a clear look at it, it answers all our questions
Answer to your second question,
you can add directly, with installation few pre requisites like openssh-clients and firewalls and java.
these machines( existing node, new three nodes) should accept same username and password (or) you should set passwordless ssh to these hosts..
you should connect to the internet while adding the nodes.
I hope it will help you:)