I think Couchbase is great software that supports high availability.
I am considering configuring two Couchbase clusters on the same hosts.
The application would use ClusterA, and ClusterB would be connected to ClusterA via XDCR (DCP).
Server#1 Server#2 Server#3 Server#4
========== ========== ========== ==========
ClusterA-1 ClusterA-2 ClusterA-3 ClusterA-4
---------- ---------- ---------- ----------
ClusterB-1 ClusterB-2 ClusterB-3 ClusterB-4
========== ========== ========== ==========
I hope that if all nodes of ClusterA crash at the same time, ClusterB can take over from ClusterA and continue processing the application's requests.
Is it possible?
Thanks for reading, and I'm sorry for my poor English.
This would probably not be a good idea, at least not without some sort of VM hypervisor or container (e.g. Docker). Couchbase is really meant to run on a shared-nothing architecture if at all possible. If you put two clusters on the same nodes without a hypervisor or container, they will definitely fight for server resources (RAM, CPU, disk, network) with no intelligent way to manage them, because Couchbase is not designed to run the way you are proposing. That being said, for HA purposes you might be better off purchasing an Enterprise license, using the Rack/Zone Awareness (RZA) feature, and running only one cluster. If you are going to use XDCR, you really want separate servers or instances to replicate to; if they are on the same VM cluster, it might not get you what you are trying to achieve.
The RZA feature allows you to designate nodes in a cluster as belonging to a server group. So if you have racks of servers, VM hosts or zones in a cloud deployment, you can designate those in Couchbase and it will make sure to distribute the replica data from the nodes in one server group to nodes in another group. Then you could lose an entire rack, zone or VM host and still have all your data and be back up and running quickly. For more information, see the Couchbase docs on RZA.
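For reference, a minimal sketch of setting up server groups with couchbase-cli, assuming the Enterprise Edition and hypothetical hostnames, credentials and group names (exact options can vary by Couchbase version):

# Create two server groups and spread the four nodes across them
couchbase-cli group-manage -c cb-node1:8091 -u Administrator -p password --create --group-name "Rack-A"
couchbase-cli group-manage -c cb-node1:8091 -u Administrator -p password --create --group-name "Rack-B"
# Move two of the nodes from the default group into the second group
couchbase-cli group-manage -c cb-node1:8091 -u Administrator -p password --move-servers cb-node3:8091,cb-node4:8091 --from-group "Group 1" --to-group "Rack-B"

After a rebalance, Couchbase will place the replica copies of data from nodes in one group onto nodes in the other group.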
I've installed a single-node Greenplum DB with 2 segment hosts, each containing 2 primary and mirror segments, and I want to configure a standby master. Can anyone help me with it?
It is pretty simple.
gpinitstandby -s smdw -a
Note: If you are using one of the cloud Marketplaces that deploys Greenplum for you, the standby master runs on the first segment host. The overhead of running the standby master is pretty small, so it doesn't impact performance. The cloud Marketplaces also have self-healing, so if that node fails, it is replaced and all services are automatically restored.
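As a small follow-up sketch, once gpinitstandby completes you can verify the standby master (assuming the standby host is named smdw, as in the command above):

# Show standby master details and sync status
gpstate -f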
As Jon said, this is fairly straightforward. Here is a link to the documentation: https://gpdb.docs.pivotal.io/5170/utility_guide/admin_utilities/gpinitstandby.html
If you have follow up questions, post them here.
Using DC/OS, we would like to schedule tasks close to the data they require, which in our case is stored in Hadoop/HDFS (on an HDP cluster). The issue is that the Hadoop cluster is not run from within DC/OS, so we are looking for a way to offer only a subset of each node's resources.
For example: say we would like to reserve 8 GB of memory for the data node services, and then provide the remainder to DC/OS for scheduling tasks.
From what I have read so far, a task can specify the resources it requires, but I have not found any means to specify what you want to offer from the node's perspective.
I'm aware that a CDH cluster can be run on DC/OS; that would be one way to go, but for now that is not provided for HDP.
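To make the node-side part concrete, the closest thing I can think of is the plain Mesos agent's --resources flag, which advertises only a subset of a node's resources to the cluster; I don't know how, or whether, DC/OS exposes this through its own node configuration, which is essentially my question. A sketch with hypothetical values:

# Offer 16 cores and 24 GB RAM to the cluster, keeping the rest (e.g. 8 GB for the HDFS DataNode) invisible to schedulers
mesos-agent --master=zk://zk-host:2181/mesos --work_dir=/var/lib/mesos --resources='cpus:16;mem:24576;disk:100000'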
Thanks for any ideas/tips,
Paul
I have three servers and I want to deploy a Spark Standalone cluster or a Spark on YARN cluster on them.
Now I have some questions about how to allocate physical resources for a big data cluster. For example, I want to know whether I can deploy the Spark Master process and a Spark Worker process on the same node, and why or why not.
Server Details:
CPU Cores: 24
Memory: 128GB
I need your help. Thanks.
Of course you can; just put the host running the Master in the slaves file. On my test server I have such a configuration: the master machine is also a worker node, and there is one worker-only node. Everything is OK.
However, be aware that if the worker fails and causes a major problem (e.g. a system restart), then you will have a problem, because the master will also be affected.
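A minimal sketch of that setup for a standalone deployment (hostnames are hypothetical):

# $SPARK_HOME/conf/slaves lists every host that should run a worker;
# include the master host itself so it runs a worker too
echo "master-host" >> $SPARK_HOME/conf/slaves
echo "worker-host" >> $SPARK_HOME/conf/slaves
# Starts the master plus one worker per entry in conf/slaves
$SPARK_HOME/sbin/start-all.sh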
Edit:
Some more info after the question edit :) If you are using YARN (as suggested), you can use Dynamic Resource Allocation. Here are some slides about it, and here is an article from MapR. How to configure memory properly for a given case is a very long topic; I think these resources will give you much of the knowledge you need.
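A hedged sketch of turning it on for a YARN submission (the executor bounds and application file are hypothetical; the external shuffle service also has to be enabled on the NodeManagers):

spark-submit --master yarn \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  my_app.py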
BTW, if you have already installed a Hadoop cluster, maybe try YARN mode ;) But that's outside the scope of the question.
I'm trying to define the architecture of a Hadoop cluster (about 10-15 nodes), mainly used for batch processing (indicator pre-aggregation).
I identified 5 roles for my nodes: Master, Master HA, Slave, Service, Gateway, and tried to determine the software components to install on each.
Here is the result. I'm not sure about it. Do the clients (Hive/Sqoop/Spark) have to be installed on the slave nodes?
Does this architecture look relevant to you? Is something missing? Thanks for the help!
Do the clients (Hive/Sqoop/Spark) have to be installed on the slave nodes?
Not really. Clients are nodes/servers which have the libraries to run your Hive/Sqoop/Spark jobs. As long as you submit your jobs from such a machine, there should be no problem.
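For example, from a gateway/edge node that has the client libraries and the cluster configuration files, submissions like these (hostnames and paths are hypothetical) are all the "client" role really amounts to:

# Hive via HiveServer2
beeline -u "jdbc:hive2://hiveserver-host:10000/default" -e "SHOW TABLES;"
# Spark job on YARN
spark-submit --master yarn --deploy-mode cluster my_job.py
# Sqoop import from a relational database into HDFS
sqoop import --connect jdbc:mysql://db-host/sales --table orders --target-dir /data/orders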
In my opinion, the architecture looks okay.
I am supposed to work on cluster mirroring, where I have to set up an HDFS cluster similar to an existing one (same master and slaves), copy the data to the new cluster, and then run the same jobs as-is.
I have read about Falcon as a feed-processing and workflow-coordinating tool that is also used for mirroring HDFS clusters. Can someone enlighten me on Falcon's role in the Hadoop ecosystem and how it helps with mirroring in particular? I am looking to understand what Falcon offers when it is part of my Hadoop ecosystem (HDP).
Apache Falcon simplifies the configuration of data motion with: replication; lifecycle management; lineage and traceability. This provides data governance consistency across Hadoop components.
Falcon replication is asynchronous with delta changes. Recovery is done by running a process and swapping the source and target.
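As a rough sketch of how that is usually set up (the entity file names are hypothetical): both clusters are registered with Falcon, then a feed is submitted and scheduled whose definition lists the primary cluster as source and the backup cluster as target.

# Register the two clusters, then schedule the replication feed
falcon entity -type cluster -submit -file primaryCluster.xml
falcon entity -type cluster -submit -file backupCluster.xml
falcon entity -type feed -submitAndSchedule -file replicationFeed.xml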
Data loss: delta data may be lost if the primary cluster is completely shut down.
Backup can be scheduled when needed depending on the bandwidth and network availability.