So, we have a Redis cluster mode enabled up and running in an EC2 instance(can't use the AWS managed one) and to connect to it from our internal network we are announcing our IP and Port using cluster-announce-ip,cluster-announce-port and cluster-bus-port where the IP announced is accessible from our network.
It seems to be working fine but not stable i.e. it keeps on switching between the IP provided and the loopback address, see below:
internal:36379> cluster nodes
eec09ffe56b05ad12b615b1d72fb6759f9c442dd internal:36379#40002 slave,fail b48e9381bfc8870317890483f3a610195a88c726 1580294719147 1580294718345 8 connected
ca0cb878becba2270cf00ec75be806304d561b0b internal:36379#40003 slave 3a5c9bc26bb3fbc7e850199320595946f3a6569a 1580294720250 1580294719347 9 connected
3a5c9bc26bb3fbc7e850199320595946f3a6569a internal:30006#40006 myself,master - 0 1580294720000 9 connected 10923-16383
33de5143f47674dd0fc636404fe4d7752d2cf9e2 internal:36379#40004 master - 1580294720651 1580294720151 7 connected 0-5460
b48e9381bfc8870317890483f3a610195a88c726 internal:36379#40005 master,fail - 1580294721253 1580294721153 8 connected 5461-10922
74cd3e1ededd204408e2dabce022bd08ab6b03b3 internal:36379#40001 slave,fail 33de5143f47674dd0fc636404fe4d7752d2cf9e2 1580215532177 1580215531374 7 connected
internal:36379>
internal:36379> cluster nodes
eec09ffe56b05ad12b615b1d72fb6759f9c442dd 127.0.0.1:30002#40002 slave b48e9381bfc8870317890483f3a610195a88c726 0 1580294727000 8 connected
ca0cb878becba2270cf00ec75be806304d561b0b 127.0.0.1:30003#40003 slave 3a5c9bc26bb3fbc7e850199320595946f3a6569a 0 1580294727381 9 connected
3a5c9bc26bb3fbc7e850199320595946f3a6569a internal:30006#40006 myself,master - 0 1580294726000 9 connected 10923-16383
33de5143f47674dd0fc636404fe4d7752d2cf9e2 internal:36379#40004 master,fail? - 1580294725274 1580294725000 7 connected 0-5460
b48e9381bfc8870317890483f3a610195a88c726 127.0.0.1:30005#40005 master,fail - 0 1580294727080 8 connected 5461-10922
74cd3e1ededd204408e2dabce022bd08ab6b03b3 internal:36379#40001 slave,fail 33de5143f47674dd0fc636404fe4d7752d2cf9e2 1580215532177 1580215531374 7 connected
internal:36379>
Where internal is one of our six internal IPs.Originally cluster is running on ports 30001-30006.We are able to set/get keys momentarily before it switches back to announcing the local address instead of our IP.
Any idea why this is not stable?
Related
Here is one deployment scenario wherein NAT exists between DC boundaries, following are the requirements:
Cassandra Version: 2.1.13
There are 2 DCs, two Cassandra nodes (dc1:node1 & dc2:node3) across DCs should communicate across NAT boundaries using public IP.
One of the DC which is behind NAT has 2 Cassandra nodes (dc1:node1 and dc1:node2) and both them should communicate within NAT using private IP.
All these 3 nodes (dc1:node1, dc1:node2 & dc2:node3) should form a ring and communicate with each other.
Looked into seeds, listen_address, broadcast_address & broadcast_rpc_address.
https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html
If there are public IPs used in broadcast_address and seeds then across DC & NAT communication works, however the nodes which are within NAT not able to detect each other.
If there are private IPs used in broadcast_address and seeds then within DC & NAT communication works, however the nodes across DC & NAT not able to detect each other.
Looked into Ec2MultiRegionSnitch but that will not work for premise deployments: https://docs.datastax.com/en/cassandra/2.1/cassandra/architecture/architectureSnitchEC2MultiRegion_c.html#architectureSnitchEC2MultiRegion_c__other-settings
What configuration settings will be required to achieve above 3 requirements?
Use gossiping property file snitch, set public ip as broadcast_address and private ip as listen_address. They will use the listen_address in same DC and broadcast address if in other DC.
Settings made with seeds=public address, listen_address=private and broadcast_address=public.
With these settings made on all 3 nodes:
dc1:node1 & dc2:node3 works but
dc1:node1 and dc1:node2 doesn't
Is it because of seeds have public address within DC behind NAT might nor work?
One of the observation to check listen on dc1:node1, private IP is listed:
node1# netstat -anp | grep -E "(7001)"
tcp 0 0 dc1:node1_privateIP:7001 0.0.0.0:* LISTEN 9999/java
Can dc1:node2 establish connection with dc1:node1_publicIP?
Is this https://issues.apache.org/jira/browse/CASSANDRA-9748 related here or will be only applicable in case of multiple NICS and not NAT environment?
To all those SymmetricDS nerds over there, this one's for you all.
Right, so we have a main db, DB-01. We have 3 instances of our application running namely R1,R2,R3. Each instance has its own in-memory db namely D1,D2,D3 which it(application) is accessing respectively. We are using SymmetricDS to do one-way sync from DB-01 to D1,D2,D3. So, there is a server node, corporate C0, pointing to DB-01 and 3 client nodes, stores S1,S2,S3 pointing to D1,D2,D3 respectively.
All is working fine.
But now, we would like to introduce High Availability and there by FAILOVER into this topology i.e., at any time there will be 2 server nodes running, say Master and Slave, that would be accessing the same DB-01. If Master server goes down, clients should automatically connect to the Slave node and continue operation.
What all might be the configuration changes required to accomplish this? Are there any examples or documentations that i can reproduce to understand this concept?
We do this via clustering with 2 SymmetricDS services running on 2 app servers pointing to the High Availability (HA) connections. Then all you need is HA connections to failover like normal and Symmetric DS clustering does the rest.
Link for the user manual on clustering.
https://www.symmetricds.org/doc/3.13/html/user-guide.html#_clustering
EDIT let me get some configs for you on here service 1:
engine.name=<SDS_SERVICE_1>
db.driver=net.sourceforge.jtds.jdbc.Driver
db.url=jdbc:jtds:sqlserver://<HA_connection1>:1433/<DB>;useCursors=true;bufferMaxMemory=10240;lobBuffer=5242880
db.user=***********
db.password=***********
registration.url=http://<IP>:7004/sync/<SDS_MAIN>
sync.url=http://<IP>:7004/sync/<SDS_SERVICE_1>
group.id=<GID>
external.id=100
auto.registration=true
initial.load.create.first=true
sync.table.prefix=sym
start.initial.load.extract.job=false
cluster.lock.enabled=true
cluster.server.id=11
cluster.lock.timeout.ms=600000
cluster.lock.refresh.ms=60000
compression.level=-1
compression.strategy=0
Service 2:
engine.name=<SDS_SERVICE_2>
db.driver=net.sourceforge.jtds.jdbc.Driver
db.url=jdbc:jtds:sqlserver://<HA_connection2>:1433/<DB>;useCursors=true;bufferMaxMemory=10240;lobBuffer=5242880
db.user=***********
db.password=***********
registration.url=http://<IP>:7004/sync/<SDS_MAIN>
sync.url=http://<IP>:7004/sync/<SDS_SERVICE_2>
group.id=<GID>
external.id=100
auto.registration=true
initial.load.create.first=true
sync.table.prefix=sym
start.initial.load.extract.job=false
cluster.lock.enabled=true
cluster.server.id=12
cluster.lock.timeout.ms=600000
cluster.lock.refresh.ms=60000
compression.level=-1
compression.strategy=0
I'm new to Cassandra and EC2 configuration.
I have configured 3 nodes in AWS EC2 instances with Cassandra 3.0 and all the three nodes are connected to each other .
Following things have been configured in .yaml fie.
Broadcast_add: Private ip ec2 add of instance
seeds : public ip add of all the three nodes.
rpc_add : blank
When I try to connect to this cluster from Datastax dev centre it shows only connected to one node. When individually connecting to all the 3 ip's it gets connected to all the nodes. But when connecting to cluster with 3 ip's in connection file, it connects to only one node.
Could any one help with this issue ?
Thanks
Uttkarsh
open cassandra.yaml file and change the
1) listen_address :- private IP
2) broadcast_address :- blank
3) listen_on_broadcast_address:- true
4) rpc_address :- 0.0.0.0
5) broadcast_rpc_address :- public IP
6) seeds ip :- public IP for node.
it's working finally
Thanks Utpal
I've got the following configuration:
Redis_version:3.2.0
3 master nodes and 3 slave nodes
Each master node is replicated to a slave Everything is correct. When one master node fails by a "kill" command, the corresponding slave node becomes the master as expected. After few seconds, cluster_state returns to the OK state.
BUT, if two master nodes fail simultaneously, none of the associated slave nodes become the master. The cluster_state stays in "fail" state.
cluster nodes command output.
b60c284a515b31aa6b11022fc07cf1a399171e04 127.0.0.1:7000 master,fail? - 1464690455030 1464690454930 1 disconnected 0-5460
637d1f074419963653b206c5ed7cbed4c3d0ace0 127.0.0.1:7001 master,fail? - 1464690455030 1464690454930 2 disconnected 5461-10922
d2aae2a3d87c6407e002076740c8febf80f37865 127.0.0.1:7003 myself,slave b60c284a515b31aa6b11022fc07cf1a399171e04 0 0 4 connected
72d4c9ce140fb57436c1b21702bf3c646ef29db3 127.0.0.1:7002 master - 0 1464690718480 3 connected 10923-16383
af34a7b2241943baf23e634e81b552d8bf23cdd0 127.0.0.1:7005 slave 72d4c9ce140fb57436c1b21702bf3c646ef29db3 0 1464690718480 6 connected
d0fec0609c9e786ac9ca4629f36cabd7c5c3130c 127.0.0.1:7004 slave 637d1f074419963653b206c5ed7cbed4c3d0ace0 0 1464690718480 5 connected
The slave auto-failover won't happen when at least half of the masters get disconnected, because the failover election is required more than half of the masters come into consensus.
To start a manual failover, connect to the slave node with redis-cli and send a cluster failover TAKEOVER command (the takeover is required).
In your case
redis-cli -h 127.0.0.1 -p 7003 cluster failover takeover
After the :7003 becomes a master, the other slave will start an automatic failover as well since there are more than half (2/3) of the masters are alive.
We are trying to run a cassandra cluster on AWS/EC2 within a standard VPC footprint (cassandra nodes on private subnets). Because this is AWS there is always a chance that an EC2 instance will terminate or reboot with no warning. I have been simulating this case on a test cluster and I am seeing things with the cluster that I thought a cluster was suppose to prevent. Specifically if a node reboots some data will go temporarily missing until the node completes its reboot. If a node terminates it appears that some data is lost forever.
For my test I just did a bunch of writes (using QUORUM consistency) to some keyspaces then interrogate the contents of those keyspaces as I bring down nodes (either through reboot or terminate). I'm just using cqlsh SELECT to do the keyspace/column family interrogation of the cluster using ONE consistency level.
Note, even though I am performing no writes to the cluster while I am doing the SELECTs rows temporarily disappear when rebooting and can permanently go missing during termination.
I thought Netflix Priam might be able to help, but sadly it doesn't work in a VPC the last time I checked.
Also, because we are using ephemeral storage instances there is no equivalent of 'shutdown' so I cannot run any scripts during reboot/terminate of an instance to perform a nodetool decommission or nodetool removenode before an instance goes away. Terminate is the equivalent of kicking the plug out of the wall.
Since I am using a replication factor of 3 and quorum/write that should mean that all data is written to at least 2 nodes. So, unless I am totally misunderstanding things (which is possible), losing one node should not mean that I lose any data for any period of time when I am using consistency level ONE for the read.
Questions
Why wouldn't a 6 node cluster with a replication factor of 3 work?
Do I need to run something like a 12 node cluster with a replication factor of 7? Don't bother telling me that will fix the problem, because it doesn't.
Do I need to use consistency level of ALL on the writes then use ONE or QUORUM on the reads?
Is there something not quite right with virtual nodes? unlikely
Are there nodetool commands besides removenode that I need to run when a node terminates to recover missing data? As mentioned earlier, when a reboot occurs, eventually the missing data reappears.
Is there some cassandra savant who can look at my cassandra.yaml file below and send me on the path to salvation?
More Info added 7/19
I don't think this is a QUORUM vs ONE vs ALL is the issue. The test I set up performs no writes to the keyspaces after the initial population of the column families. So the data has had plenty of time (hours) to make it to all the nodes as required by the replication factor. Plus the test dataset is REALLY small (2 column families with about 300-1000 values each). So in other words, the data is completely static.
The behavior I am seeing seems to be tied to the fact that the ec2 instance is no longer on the network. The reason I say this is because if I log on to a node and just do a cassandra stop I see no loss of data. But if I do the reboot or terminate I start getting the following in a stack trace.
CassandraHostRetryService - Downed Host Retry service started with queue size -1 and retry delay 10s
CassandraHostRetryService - Downed Host retry shutdown complete
CassandraHostRetryService - Downed Host retry shutdown hook called
Caused by: TimedOutException()
Caused by: TimedOutException()
So it seems to be more of a networking communication issue in that the cluster is expecting, for example 10.0.12.74, to be on the network after it has joined the cluster. If that ip is suddenly unreachable either due to reboot or termination the timeouts start happening.
When I do a nodetool status under all three scenarios (cassandra stop, reboot or terminate) the status of the node shows up as DN. Which is what you would expect. Eventually nodetool status will return to UN with cassandra start or reboot, but obviously termination always stays DN.
Details of my Configuration
Here are some details of my configuration (cassandra.yaml is at the bottom of this posting):
Nodes are running in private subnets of a VPC.
Cassandra 1.2.5 with num_tokens: 256 (virtual nodes). initial_token: (blank). I am really hoping this works because all of our nodes run in autoscaling groups so the thought that redistribution could be handle dynamically is appealing.
EC2 m1.large one seed and one non-seed node in each availability zone. (so 6 total nodes in the cluster).
Ephemeral storage, not EBS.
Ec2Snitch with NetworkTopologyStrategy and all keyspaces have replication factor of 3.
Non-seed nodes are auto_bootstraped, seed nodes are not.
sample cassandra.yaml file
cluster_name: 'TestCluster'
num_tokens: 256
initial_token:
hinted_handoff_enabled: true
max_hint_window_in_ms: 10800000
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
authenticator: org.apache.cassandra.auth.AllowAllAuthenticator
authorizer: org.apache.cassandra.auth.AllowAllAuthorizer
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
disk_failure_policy: stop
key_cache_size_in_mb:
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
row_cache_provider: SerializingCacheProvider
saved_caches_directory: /opt/company/dbserver/caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
parameters:
- seeds: "SEED_IP_LIST"
flush_largest_memtables_at: 0.75
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6
concurrent_reads: 32
concurrent_writes: 8
memtable_flush_queue_size: 4
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
listen_address: LISTEN_ADDRESS
start_native_transport: false
native_transport_port: 9042
start_rpc: true
rpc_address: 0.0.0.0
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
incremental_backups: true
snapshot_before_compaction: false
auto_bootstrap: AUTO_BOOTSTRAP
column_index_size_in_kb: 64
in_memory_compaction_limit_in_mb: 64
multithreaded_compaction: false
compaction_throughput_mb_per_sec: 16
compaction_preheat_key_cache: true
read_request_timeout_in_ms: 10000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 10000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
cross_node_timeout: false
endpoint_snitch: Ec2Snitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
index_interval: 128
server_encryption_options:
internode_encryption: none
keystore: conf/.keystore
keystore_password: cassandra
truststore: conf/.truststore
truststore_password: cassandra
client_encryption_options:
enabled: false
keystore: conf/.keystore
keystore_password: cassandra
internode_compression: all
I think http://www.datastax.com/documentation/cassandra/1.2/cassandra/dml/dml_config_consistency_c.html will clear up a lot of this. In particular, QUORUM/ONE is not guaranteed to return the most recent data. QUORUM/QUORUM is. So is ALL/ONE, but that will be intolerant to failure on write.
Edit to go with the new information:
CassandraHostRetryService is part of Hector. I assumed you were testing with cqlsh like a sane person would. Lessons:
Use cqlsh for testing
Use the DataStax Java Driver for building your application, which is faster, easier to use, and has more insight into the cluster state than Hector thanks to the native protocol it's built on.