rethinkdb cluster, what if some of servers are down?

rethinkdb cluster, what if some of servers are down? - rethinkdb

I have 5 servers. On my first "primary" I have in config:
join=ip2:port
join=ip3:port
join=ip4:port
join=ip5:port
I am connection to rethinkdb via proxy:
proxy --join ip1:port --join ip2:port
When I stop rethinkdb on ip1 everything stops. I do not know how to solve this. Rethinkdb docs are not complete. Do I have to define this joins in every config?
UPDATE
In fact when I stop any server in cluster my app crash! I am getting in webui something like "Table db.table is available for outdated reads, but not up-to-date reads or writes."
Except table shards I do not see point.

Yes, you usually want every node to know the address of every other node so that they can connect to each other if any subset of the nodes is down.

Related

Distributing data read from GetMongo in a nifi cluster

I have a clustered nifi setup and we are running GetMongo processor with the Primary mode on, so that duplicate data is not fetched. This seems to be working fine. However once I have this data I want the following processes in the chain to run on a cluster, as in parallel processing to be done on this data which has been fetched. Somehow this is not happening. So my question is below assuming GetMongo has fetched 30000 records and they are in the queue:
1) How do I check whether a processor is running its process on a single node or on all nodes. The config has been set to all nodes, but when the processor is running I see it displays 1 in the top right corner.
2) If one processor has been set to run only on primary node, do all other processors in the flow also run on Primary mode?
Example:
In the screenshot above, my getmongo is running in primary node, how do I make sure that the execute script processor runs in parallel on all 3 nifi nodes. As of now if I check the view status history in the executescript process I see data flowing only through the primary node.

Yes, that's correct. When you mark the source processor to run only the Primary Node, all the subsequent steps will only happen on that node alone since the data is residing only that node (primary node), even when you have the NiFi in a clustered mode. To make it work the way you want, you can follow either of the following two approaches:
Approach #1 : Comibination of RPG and Site-To-Site
Here your flow will look like this:
Create an Input Port on the Root Group (the very top level of the NiFi canvas)
Make GetMongo run only on Primary Node.
Connect the success relationship of the processor to a Remote Processor Group (RPG). This RPG can be configured with the cluster details itself and configure it to connect to the port you added in step #1.
From the input port, connect it to your processing logic.
Useful Links:
https://pierrevillard.com/2017/02/23/listfetch-pattern-and-remote-process-group-in-apache-nifi/
This is cumbersome and would make your flow very complex but this how it has to be done, till NiFi 1.8. With NiFi 1.8, you can use the following approach.
Approach #2 : Load-Balanced Connections (Apache NiFi 1.8+)
Apache NiFi had a new release - 1.8, a week ago. With this release, there is a new feature (a long time coming and very much desired one) was introduced. It is called Load-Balanced Connections.
In this approach, you can simply ignore the RPG/Site-To-Site combination and rather do the following:
Connect the output of your source processor, in this case GetMongo with the subsequent processors.
Right click the success relationship of the source processor.
Click configure
Go to Settings tab
Set the Load Balance Strategy to the desired one, preferably Roudd robin in your case.
Useful Links:
https://blogs.apache.org/nifi/entry/load-balancing-across-the-cluster
https://pierrevillard.com/2018/10/29/nifi-1-8-revolutionizing-the-list-fetch-pattern-and-more/

Multidatacenter Replication with Rethinkdb

I have two servers in two different geographic locations (alfa1 and alfa2).
r.tableCreate('dados', {shards:1, replicas:{alfa1:1, alfa2:1}, primaryReplicaTag:'alfa1'})
I need to be able to write for both servers, but when I try to shutdown alfa1, and write to alfa2, rethinkdb only allow reads: Table test.dados is available for outdated reads, but not up-to-date reads or writes.
I need a way to write for all replicas, not only for Primary.
Is this possible ? Does rethinkdb allow multidatacenter replication ?
I think that multidatacenter replication need to permit write for both datacenters.
I tried to remove "primaryReplicaTag" but system don't accept !
Any help is welcome !!!

RethinkDB does support multi-datacenter replication/sharding.
I think the problem here is that you've setup a cluster of two, which means that when one fails you only have 50% of the nodes in the cluster which means you have less than 51%.
From the failover docs - https://rethinkdb.com/docs/failover/
To perform automatic failover for a table, the following requirements
must be met:
The cluster must have three or more servers
The table must be configured to have three or more replicas
A majority (greater thanhalf) of replicas for the table must be available
Try adding just one additional server and your problems should be resolved.

Rethink DB Cross Cluster Replication

I have 3 different pool of clients in 3 different geographical locations.
I need configure Rethinkdb with 3 different clusters and replicate data between the (insert, update and deletes). I do not want to use shard, only replication.
I didn't found in documentation if this is possible.
I didn't found in documentation how to configure multi-cluster replication.
Any help is appreciated.

I think that multi cluster is just same a single clusters with nodes in different data center
First, you need to setup a cluster, follow this document: http://www.rethinkdb.com/docs/start-a-server/#a-rethinkdb-cluster-using-multiple-machines
Basically using below command to join a node into cluster:
rethinkdb --join IP_OF_FIRST_MACHINE:29015 --bind all
Once you have your cluster setup, the rest is easy. Go to your admin ui, select the table, in "Sharding and replication", click Reconfigure and enter how many replication you want, just keep shard at 1.
You can also read more about Sharding and Replication at http://rethinkdb.com/docs/sharding-and-replication/#sharding-and-replication-via-the-web-console

Prevent unknown nodes joining cluster

I am new to Cassandra. When i tried to set up a Cassandra cluster, i noticed that any node can join the cluster if it has the IP address of seed nodes, then your data can be seen by any one. How to prevent this? (i thought about requiring password before joining but don't know how to do).

use kerberos!! check Datastax documentation - http://www.datastax.com/docs/datastax_enterprise3.0/security/security_setup_kerberos

Poor write Performance by HBase client

I'm using HBase client in my application server (-cum web-server) with HBase
cluster setup of 6 nodes using CDH3u4 (HBase-0.90). HBase/Hadoop services
running on cluster are:
NODENAME-- ROLE
Node1 -- NameNode
Node2 -- RegionServer, SecondaryNameNode, DataNode, Master
Node3 -- RegionServer, DataNode, Zookeeper
Node4 -- RegionServer, DataNode, Zookeeper
Node5 -- RegionServer, DataNode, Zookeeper
Node6 -- Cloudera Manager, RegionServer, DataNode
I'm using following optimizations for my HBase client:
auto-flush = false
ClearbufferOnFail=true
HTable bufferSize = 12MB
Put setWriteToWAL = false (I'm fine with loss of 1 data).
In order to be closely consistent between read and write, I'm calling
flush-commits on all the buffered tables at every 2 sec.
In my application, I place the HBase write call in a Queue (async manner) and
draining the queue using 20 Consumer threads. On hitting web-server locally
using curl, I'm able to see TPS of 2500 for HBase after curl completes, but
with Load-test where request is coming at high rate of 1200 hits per second
on 3 application servers,the Consumer(drain) threads which are responsible to
write to HBase are not writing data at a rate comparable to input rate. I'm
seeing not more than 600 TPS when request rate is 1200 hits per second.
Can anyone suggest what we can do to improve performance? I've tried with
reduced threads to 7 on each of 3 app server but still no effect. An expert
opinion would be helpful. As this is a production server, so not thinking
to swap the roles, unless someone point severe performance benefit.
[EDIT]:
Just to highlight/clarify our HBase writing pattern, our 1st Transaction checks the row in Table-A (using HTable.exists). It fails to find the row first time and so write to three tables. Subsequent 4 Transaction make exist check on Table-A and as it finds the row, it writes only to 1 Table.

So that's a pretty ancient version of HBase. As of Aug 18, 2013, I would recommend upgrading to something based off of 0.94.x.
Other than that it's really hard to tell you for sure. There are lots of tuning knobs. You should :
Make sure that HDFS has enough xceivers.
Make sure that HBase has enough heap space.
Make sure there is no swapping
Make sure there are enough handlers.
Make sure that you have compression turned on. [1]
Check disk io
Make sure that your row keys, column family names, column qualifiers, and values are as small as possible
Make sure that your writes are well distributed across your key space'
Make sure your regions are (pre-)split
If you're on a recent version then you might want to look at encoding [2]
After all of those things are taken care of then you can start looking at logs and jstacks.
https://hbase.apache.org/book/compression.html
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.html

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio