I am new to Cassandra. When I tried to set up a Cassandra cluster, I noticed that any node can join the cluster if it has the IP address of the seed nodes, so your data can be seen by anyone. How can I prevent this? (I thought about requiring a password before joining, but I don't know how to do it.)
Use Kerberos. Check the DataStax documentation: http://www.datastax.com/docs/datastax_enterprise3.0/security/security_setup_kerberos
Related
I have a 3-node NiFi cluster using embedded ZooKeeper. Now I need to add a new node to the cluster. What is the procedure for this?
Will I have to bring down all the nodes, since I need to add an entry to /nifi/conf/zookeeper.conf (server=) and also to the list of ZooKeeper servers in /nifi/conf/nifi.properties (nifi.zookeeper.connect.string)? As far as I understand, both will have to be edited on the existing 3 nodes as well as on the new box. But I don't think this is the right way, since the cluster would have to be brought down each time. Can someone please help out?
Usually you would have three or five ZooKeeper nodes, so if you already have three then you don't really need to add a fourth. You could add another NiFi node that uses the existing embedded ZooKeepers on the other nodes.
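As a rough illustration of that suggestion, here is a minimal Python sketch that prints the nifi.properties lines a fourth node might use so that it joins the cluster without starting its own embedded ZooKeeper. The host names nifi1, nifi2 and nifi3 and the helper name new_node_properties are hypothetical, and 2181 is assumed as the ZooKeeper client port; adjust them to your setup.

```python
def new_node_properties(zk_hosts, zk_port=2181):
    """Render nifi.properties lines for a node that reuses existing ZooKeepers.

    zk_hosts: hosts already running the embedded ZooKeeper (hypothetical names).
    zk_port:  ZooKeeper client port, assumed to be the default 2181.
    """
    connect_string = ",".join(f"{host}:{zk_port}" for host in zk_hosts)
    return [
        "nifi.cluster.is.node=true",
        # The new node does not start its own embedded ZooKeeper.
        "nifi.state.management.embedded.zookeeper.start=false",
        # It points at the three ZooKeepers that already exist.
        f"nifi.zookeeper.connect.string={connect_string}",
    ]


existing_zk_hosts = ["nifi1", "nifi2", "nifi3"]  # hypothetical host names
for line in new_node_properties(existing_zk_hosts):
    print(line)
```

Because the connect string still lists only the three original hosts, the existing nodes' ZooKeeper settings would not need to change under these assumptions.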
I'm using NiFi in clustered mode with two nodes, and I have noticed that only one node does all the work.
Any idea why that is? And how can I make nifi2 do some of the processing of the dataflow?
It depends how data is coming in to your cluster. It is up to you as the data flow designer to create an approach that allows the data to be partitioned across your cluster for processing.
See this post for an overview of strategies to do this:
https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html
I have 5 servers. On my first "primary" I have in config:
join=ip2:port
join=ip3:port
join=ip4:port
join=ip5:port
I am connecting to RethinkDB via a proxy:
proxy --join ip1:port --join ip2:port
When I stop RethinkDB on ip1, everything stops. I do not know how to solve this. The RethinkDB docs are not complete. Do I have to define these joins in every config?
UPDATE
In fact, when I stop any server in the cluster, my app crashes! In the web UI I am getting something like "Table db.table is available for outdated reads, but not up-to-date reads or writes."
Apart from the table shards, I do not see the point.
Yes, you usually want every node to know the address of every other node so that they can connect to each other if any subset of the nodes is down.
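A minimal Python sketch of what that could look like: for each server, generate the join= lines that list every other server. The addresses are placeholders, 29015 is assumed as the cluster port, and the helper name join_lines is made up for illustration.

```python
def join_lines(servers, me):
    """Return the join= config lines for one server: every peer except itself."""
    return [f"join={addr}" for addr in servers if addr != me]


# Placeholder addresses; 29015 is assumed to be the intracluster port.
servers = [f"ip{i}:29015" for i in range(1, 6)]

for me in servers:
    print(f"# config for {me}")
    for line in join_lines(servers, me):
        print(line)
```

With a full mesh like this, any server that restarts can rejoin through whichever peers are still up, rather than depending on ip1 alone.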
I have two servers in two different geographic locations (alfa1 and alfa2).
r.tableCreate('dados', {shards:1, replicas:{alfa1:1, alfa2:1}, primaryReplicaTag:'alfa1'})
I need to be able to write to both servers, but when I shut down alfa1 and try to write to alfa2, RethinkDB only allows reads: Table test.dados is available for outdated reads, but not up-to-date reads or writes.
I need a way to write to all replicas, not only to the primary.
Is this possible? Does RethinkDB allow multi-datacenter replication?
I think that multi-datacenter replication needs to permit writes in both datacenters.
I tried to remove "primaryReplicaTag" but the system doesn't accept it!
Any help is welcome!
RethinkDB does support multi-datacenter replication/sharding.
I think the problem here is that you've set up a cluster of two, which means that when one server fails you only have 50% of the replicas available, and that is not a majority (more than half).
From the failover docs - https://rethinkdb.com/docs/failover/
To perform automatic failover for a table, the following requirements must be met:
The cluster must have three or more servers
The table must be configured to have three or more replicas
A majority (greater than half) of replicas for the table must be available
Try adding just one additional server and your problems should be resolved.
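The majority requirement quoted above comes down to simple arithmetic. Here is a minimal Python sketch (the function name has_majority is made up for illustration) showing why two replicas cannot survive the loss of one, while three can:

```python
def has_majority(available, total):
    """True when strictly more than half of the replicas are available."""
    return 2 * available > total


# Two replicas, one down: exactly half remain, which is not a majority.
print(has_majority(1, 2))
# Three replicas, one down: two of three remain, which is a majority.
print(has_majority(2, 3))
```

This only illustrates the counting rule; the other two requirements (three or more servers, three or more replicas) still have to hold as well.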
I have 3 different pools of clients in 3 different geographical locations.
I need to configure RethinkDB with 3 different clusters and replicate data between them (inserts, updates and deletes). I do not want to use sharding, only replication.
I didn't find in the documentation whether this is possible or how to configure multi-cluster replication.
Any help is appreciated.
I think that a multi-cluster setup is just the same as a single cluster with nodes in different data centers.
First, you need to set up a cluster; follow this document: http://www.rethinkdb.com/docs/start-a-server/#a-rethinkdb-cluster-using-multiple-machines
Basically, use the command below to join a node to the cluster:
rethinkdb --join IP_OF_FIRST_MACHINE:29015 --bind all
Once you have your cluster set up, the rest is easy. Go to your admin UI, select the table, and under "Sharding and replication" click Reconfigure and enter how many replicas you want; just keep shards at 1.
You can also read more about Sharding and Replication at http://rethinkdb.com/docs/sharding-and-replication/#sharding-and-replication-via-the-web-console
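The same settings can also be applied from code rather than the web console. Here is a minimal Python sketch that builds the arguments for a replication-only layout (one shard, N replicas). The helper name reconfigure_args and the table name my_table are hypothetical, and the driver call appears only as a comment because it needs a running server and an open connection.

```python
def reconfigure_args(replica_count):
    """Build keyword arguments for a replication-only layout (a single shard)."""
    if replica_count < 1:
        raise ValueError("a table needs at least one replica")
    return {"shards": 1, "replicas": replica_count}


args = reconfigure_args(3)
print(args)

# With the RethinkDB Python driver and an open connection `conn` (both assumed
# here, not shown), the arguments would be passed along the lines of:
#   r.table("my_table").reconfigure(**args).run(conn)
```

Keeping shards at 1 means every replica holds the full table, which matches the "only replication, no sharding" requirement in the question.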