nodetool status in cassandra giving previous IPs of nodes - amazon-ec2

I am trying to expand a 2-node Cassandra cluster to a 4-node cluster with 4 Amazon EC2 instances. I have created the four nodes and made the following changes in the cassandra.yaml file:
listen_address = 10.30.143.145
seeds = 10.30.143.145,10.159.58.234,10.170.31.252,10.158.52.84
endpoint_snitch: Ec2snitch
num_tokens: 256
I have replicated these changes across all 4 nodes. I had previously expanded from a single-node cluster to a two-node cluster by following this same procedure. However, after configuring the 4-node cluster, when I run ./nodetool status on the first node, I get the following output:
ubuntu@ip-10-170-31-252:~/VIQ-Cloud/software/apache-cassandra-1.2.5/bin$ ./nodetool status
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.30.143.145 927.14 KB 256 16.1% 34d0424a-fe07-4047-a2a5-f45b9a0049d6 1a
UN 10.159.58.234 135.2 KB 256 15.7% 00308009-8755-4bce-906f-4eda53a31fc6 1a
UN 10.170.31.252 20.94 GB 256 20.9% a815f0de-64db-418c-97a3-9aa7be280280 1a
UN 10.158.52.84 311.33 KB 256 15.0% fc634f65-3cf3-4e24-a9a3-456adbd174e0 1a
Datacenter: UNKNOWN-DC
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
DN 10.170.22.214 ? 256 16.6% 253ee376-c49d-47a1-a321-4f155870c122 UNKNOWN-RACK
DN 10.31.131.35 ? 256 15.7% e94c0cb1-9635-42c9-8982-450271f7da1c UNKNOWN-RACK
The addresses of the two nodes shown as DN are the previous private IPs of the other EC2 nodes (each reboot of an EC2 instance changes the private IPs).
But the other 3 nodes are giving me the proper result for nodetool status. I am wondering where Cassandra is picking up the previous IPs from, because I haven't mentioned them in the cassandra.yaml file.
I followed the instructions here for adding new nodes.
Please advise on why this is happening.

If you haven't figured this out, I think the answer may be that you need to make sure you are using the Ec2Snitch or Ec2MultiRegionSnitch in your cassandra.yaml on the new nodes. The fact that they got put in "UNKNOWN-DC" means the existing nodes don't know which DC to put them in.
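For reference, a minimal sketch of the relevant setting on the new nodes; note that the snitch class name is case-sensitive (Ec2Snitch, not Ec2snitch):
endpoint_snitch: Ec2Snitch
Once every live node reports the correct datacenter, the stale DN entries for the old private IPs can usually be cleared with nodetool removenode <host ID> (nodetool removetoken on pre-1.2 releases), as shown in the dead-node answer near the end of this page.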

Related

Datastax - Cassandra Amazon EC2 Multiregion Setup - Cluster with 3 node

I have launched 3 Amazon EC2 instances and set up DataStax Cassandra as follows:
1. Region - US EAST:
cassandra.yaml configuration:
a. listen_address set to the private IP of this instance
b. broadcast_address set to the public IP of this instance
c. seeds set to 50.XX.XX.X1, 50.XX.XX.X2 (public IPs of node1 and node2)
cassandra-rackdc.properties configuration:
dc=DC1
rack=RAC1
dc_suffix=US_EAST_1
2. Region - US WEST:
I did the same procedure as above.
3. Region - EU IRELAND:
The result of the above configuration is:
All the nodes work fine individually. But when I do
$ nodetool status
on any of the three nodes, it lists only the local node.
I am trying to achieve the following:
1. Launch 3 Cassandra nodes in three different regions, say US-EAST, US-WEST, EU-IRELAND.
With the following configuration or methodology:
a. Ec2MultiRegionSnitch
b. Replication strategy: SimpleStrategy
c. Replication factor: 3
d. Read & write consistency level: QUORUM
I wish to attain only one thing: if any two of the regions are down, or any two of the nodes are down, I can survive with the remaining one node.
My questions here are:
Where did I make a mistake, and how do I attain my requirements?
Any help or inputs are much appreciated.
Thanks.
This is what worked for me with Cassandra 3.0:
endpoint_snitch: Ec2MultiRegionSnitch
listen_address: <leave_blank>
broadcast_address: <public_ip_of_server>
rpc_address: 0.0.0.0
broadcast_rpc_address: <public_ip_of_server>
- seeds: "one_ip_from_other_DC"
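For context, the seeds entry lives under the seed_provider section of cassandra.yaml rather than at the top level; a minimal sketch of that block, with the address as a placeholder for the public IP of a node in the other DC:
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "one_ip_from_other_DC"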
Finally, I found the resolution to my issue. I am using SimpleStrategy as the replication strategy, so I do not need to configure cassandra-rackdc.properties.
Once I removed the cassandra-rackdc.properties file from all nodes, everything worked as expected.
Thanks

Unable to add another node to existing node to form a cluster. Couldn't change num_tokens to vnodes

I have installed Cassandra on two individual nodes, both on Amazon. When I try to configure the nodes to form a cluster, I receive the following error:
ERROR [main] 2016-05-12 11:01:26,402 CassandraDaemon.java:381 - Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: Cannot change the number of tokens from 1 to 256.
I am using these settings in the cassandra.yaml file:
listen_address and rpc_address: private IP address
seeds: public IP [Elastic IP address]
num_tokens: 256
This message usually appears when num_tokens is changed after the node has been bootstrapped.
The solution is:
Stop Cassandra on all nodes
Delete the data directory (inc. datafiles, commitlog and saved_caches)
Double check that num_tokens is set to 256, initial_token is commented out and auto_bootstrap is set to true in cassandra.yaml
Start Cassandra on all nodes
This will wipe your existing cluster and cause the nodes to bootstrap from scratch again.
Cassandra doesn't support changing between vnodes and static tokens after a datacenter is bootstrapped. If you need to change from vnodes to static tokens or vice versa in an already running cluster, you'll need to create a second datacenter using the new configuration, stream your data across, and then decommission the original nodes.
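A minimal sketch of the wipe-and-rebootstrap steps above, assuming a package install with configuration under /etc/cassandra and the default data, commitlog and saved_caches locations under /var/lib/cassandra (adjust the paths to whatever data_file_directories, commitlog_directory and saved_caches_directory point to in your cassandra.yaml):
# Run on each node, after Cassandra has been stopped on ALL nodes
sudo service cassandra stop
sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
# Sanity-check the token/bootstrap settings before restarting
grep -E 'num_tokens|initial_token|auto_bootstrap' /etc/cassandra/cassandra.yaml
sudo service cassandra start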

Hector is unable to read Cassandra data when nodes reboot or terminate

We are trying to run a cassandra cluster on AWS/EC2 within a standard VPC footprint (cassandra nodes on private subnets). Because this is AWS there is always a chance that an EC2 instance will terminate or reboot with no warning. I have been simulating this case on a test cluster and I am seeing things with the cluster that I thought a cluster was supposed to prevent. Specifically, if a node reboots some data will go temporarily missing until the node completes its reboot. If a node terminates it appears that some data is lost forever.
For my test I just did a bunch of writes (using QUORUM consistency) to some keyspaces, then interrogated the contents of those keyspaces as I brought down nodes (either through reboot or terminate). I'm just using cqlsh SELECT to do the keyspace/column family interrogation of the cluster, at consistency level ONE.
Note that even though I am performing no writes to the cluster while I am doing the SELECTs, rows temporarily disappear when rebooting and can permanently go missing during termination.
I thought Netflix Priam might be able to help, but sadly it doesn't work in a VPC the last time I checked.
Also, because we are using ephemeral storage instances there is no equivalent of 'shutdown' so I cannot run any scripts during reboot/terminate of an instance to perform a nodetool decommission or nodetool removenode before an instance goes away. Terminate is the equivalent of kicking the plug out of the wall.
Since I am using a replication factor of 3 and QUORUM writes, that should mean all data is written to at least 2 nodes. So, unless I am totally misunderstanding things (which is possible), losing one node should not mean that I lose any data for any period of time when I am using consistency level ONE for the read.
Questions
Why wouldn't a 6 node cluster with a replication factor of 3 work?
Do I need to run something like a 12 node cluster with a replication factor of 7? Don't bother telling me that will fix the problem, because it doesn't.
Do I need to use consistency level of ALL on the writes then use ONE or QUORUM on the reads?
Is there something not quite right with virtual nodes? (unlikely)
Are there nodetool commands besides removenode that I need to run when a node terminates to recover missing data? As mentioned earlier, when a reboot occurs, eventually the missing data reappears.
Is there some cassandra savant who can look at my cassandra.yaml file below and send me on the path to salvation?
More Info added 7/19
I don't think QUORUM vs ONE vs ALL is the issue. The test I set up performs no writes to the keyspaces after the initial population of the column families. So the data has had plenty of time (hours) to make it to all the nodes as required by the replication factor. Plus the test dataset is REALLY small (2 column families with about 300-1000 values each). So in other words, the data is completely static.
The behavior I am seeing seems to be tied to the fact that the EC2 instance is no longer on the network. The reason I say this is that if I log on to a node and just do a cassandra stop, I see no loss of data. But if I do a reboot or terminate, I start getting the following in a stack trace:
CassandraHostRetryService - Downed Host Retry service started with queue size -1 and retry delay 10s
CassandraHostRetryService - Downed Host retry shutdown complete
CassandraHostRetryService - Downed Host retry shutdown hook called
Caused by: TimedOutException()
Caused by: TimedOutException()
So it seems to be more of a network communication issue, in that the cluster expects, for example, 10.0.12.74 to be on the network after it has joined the cluster. If that IP is suddenly unreachable, either due to reboot or termination, the timeouts start happening.
When I do a nodetool status under all three scenarios (cassandra stop, reboot or terminate), the status of the node shows up as DN, which is what you would expect. Eventually nodetool status returns to UN after a cassandra start or reboot, but a terminated node obviously stays DN.
Details of my Configuration
Here are some details of my configuration (cassandra.yaml is at the bottom of this posting):
Nodes are running in private subnets of a VPC.
Cassandra 1.2.5 with num_tokens: 256 (virtual nodes). initial_token: (blank). I am really hoping this works, because all of our nodes run in autoscaling groups, so the thought that redistribution could be handled dynamically is appealing.
EC2 m1.large; one seed and one non-seed node in each availability zone (so 6 total nodes in the cluster).
Ephemeral storage, not EBS.
Ec2Snitch with NetworkTopologyStrategy and all keyspaces have replication factor of 3.
Non-seed nodes are auto_bootstrapped, seed nodes are not.
sample cassandra.yaml file
cluster_name: 'TestCluster'
num_tokens: 256
initial_token:
hinted_handoff_enabled: true
max_hint_window_in_ms: 10800000
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
authenticator: org.apache.cassandra.auth.AllowAllAuthenticator
authorizer: org.apache.cassandra.auth.AllowAllAuthorizer
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
disk_failure_policy: stop
key_cache_size_in_mb:
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
row_cache_provider: SerializingCacheProvider
saved_caches_directory: /opt/company/dbserver/caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "SEED_IP_LIST"
flush_largest_memtables_at: 0.75
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6
concurrent_reads: 32
concurrent_writes: 8
memtable_flush_queue_size: 4
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
listen_address: LISTEN_ADDRESS
start_native_transport: false
native_transport_port: 9042
start_rpc: true
rpc_address: 0.0.0.0
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
incremental_backups: true
snapshot_before_compaction: false
auto_bootstrap: AUTO_BOOTSTRAP
column_index_size_in_kb: 64
in_memory_compaction_limit_in_mb: 64
multithreaded_compaction: false
compaction_throughput_mb_per_sec: 16
compaction_preheat_key_cache: true
read_request_timeout_in_ms: 10000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 10000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
cross_node_timeout: false
endpoint_snitch: Ec2Snitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
index_interval: 128
server_encryption_options:
    internode_encryption: none
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra
client_encryption_options:
    enabled: false
    keystore: conf/.keystore
    keystore_password: cassandra
internode_compression: all
I think http://www.datastax.com/documentation/cassandra/1.2/cassandra/dml/dml_config_consistency_c.html will clear up a lot of this. In particular, QUORUM/ONE is not guaranteed to return the most recent data. QUORUM/QUORUM is. So is ALL/ONE, but that will be intolerant to failure on write.
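To make the arithmetic concrete: with RF = 3, QUORUM is floor(3/2) + 1 = 2 replicas, and reads are only guaranteed to see the latest write when (write replicas) + (read replicas) > RF. QUORUM writes plus ONE reads gives 2 + 1 = 3, which is not greater than 3, so a ONE read can land entirely on the one replica that missed the write or hasn't had its hint replayed yet. As a quick way to repeat the interrogation at the stronger level (assuming your cqlsh version supports the CONSISTENCY command; keyspace and table names here are placeholders):
CONSISTENCY QUORUM
SELECT * FROM mykeyspace.mytable;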
Edit to go with the new information:
CassandraHostRetryService is part of Hector. I assumed you were testing with cqlsh like a sane person would. Lessons:
Use cqlsh for testing
Use the DataStax Java Driver for building your application, which is faster, easier to use, and has more insight into the cluster state than Hector thanks to the native protocol it's built on.

How to configure Cassandra to work across multiple EC2 regions with Ec2MultiRegionSnitch

I am new to Cassandra and have been tasked with getting it up and running in the EC2 environment across multiple regions such that if an entire EC2 region goes belly up our app will continue on its merry way. I've read as much documentation as I could find regarding Ec2MultiRegionSnitch and have come to a dead stop. I am running cassandra 1.0.10.
My problems are as follows:
1) When I start bin/cassandra I get the error: Could not start register mbean in JMX. Though I can run bin/nodetool -h <node> ring on any of the nodes and I get the display you would expect from a healthy system. I have added the mx4j library to my cassandra deployment. I could try removing that, I suppose.
2) When I then start bin/cassandra-cli -h <node> I am able to create the keyspace as follows:
CREATE KEYSPACE mykeyspace
WITH placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
and strategy_options = {us-east-1:2,us-west-1:2};
3) After I run 'use mykeyspace' I can create a column family as follows:
CREATE COLUMN FAMILY people
WITH comparator=UTF8Type AND key_validation_class=UTF8Type AND
default_validation_class=UTF8Type AND column_metadata=[{column_name:FIRST_NAME,validation_class:UTF8Type},
{column_name:LAST_NAME,validation_class:UTF8Type},
{column_name:EMAIL,validation_class:UTF8Type},
{column_name:LOGIN,validation_class:UTF8Type, index_type: KEYS}];
4) After I do this I can run bin/cassandra-cli -h <node> on any of the 4 nodes, run use mykeyspace; describe; and each node correctly describes mykeyspace, including the column family and seed list.
5) But when I try to perform a simple:
set people['1']['FIRST_NAME'] = 'John';
I get a stack trace as follows:
null
UnavailableException()
at org.apache.cassandra.thrift.Cassandra$insert_result.read(Cassandra.java:15206)
at org.apache.cassandra.thrift.Cassandra$Client.recv_insert(Cassandra.java:858)
at org.apache.cassandra.thrift.Cassandra$Client.insert(Cassandra.java:830)
at org.apache.cassandra.cli.CliClient.executeSet(CliClient.java:901)
My configuration:
I have performed ec2-authorize for ports 22, 7000, 7199 and 9160
I have 4 nodes in my cluster, one node in each of the following region:availability-zone pairs:
us-east-1:us-east-1a (initial_token: 0)
us-east-1:us-east-1c (initial_token: 85070591730234615865843651857942052864)
us-west-1:us-west-1a (initial_token: 1)
us-west-1:us-west-1c (initial_token: 85070591730234615865843651857942052865)
Each EC2 instance has been associated with a public IP address.
In each node I have configured cassandra.yaml as follows:
seeds: <set to the public ip address for the us-east-1a and us-west-1a nodes>
storage_port: 7000
listen_address: <private ip address of this node>
broadcast_address: <public ip address of this node>
rpc_address: 0.0.0.0
rpc_port: 9160
endpoint_snitch: Ec2MultiRegionSnitch
Additionally in each node's cassandra-env.sh I've included:
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=<Node's local IP Address>"
My Plea
Hopefully I have provided someone with enough information to help me get this thing working as one would like.
Additional Information
Stack trace from first mx4j issue:
WARN 22:07:17,651 Could not start register mbean in JMX java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.cassandra.utils.Mx4jTool.maybeLoad(Mx4jTool.java:66)
at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:243)
at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)
at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
Caused by: java.net.BindException: Cannot assign requested address
at java.net.PlainSocketImpl.socketBind(Native Method)
at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:353)
My cassandra-topology.properties
aaa.aaa.aaa.aaa=us-east-1:us-east-1a
bbb.bbb.bbb.bbb=us-east-1:us-east-1c
ccc.ccc.ccc.ccc=us-west-1:us-west-1a
ddd.ddd.ddd.ddd=us-west-1:us-west-1c
default=us-east-1:us-east-1a
My nodetool ring output
Address DC Rack Status State Load Owns Token
85070591730234615865843651857942052865
aaa.aaa.aaa.aaa us-east 1a Up Normal 11.09 KB 50.00% 0
bbb.bbb.bbb.bbb us-west 1a Up Normal 6.68 KB 0.00% 1
ccc.ccc.ccc.ccc us-east 1c Up Normal 11.09 KB 50.00% 85070591730234615865843651857942052864
ddd.ddd.ddd.ddd us-west 1c Up Normal 15.5 KB 0.00% 85070591730234615865843651857942052865
I'm pretty certain I've added the regions/availability zones correctly. At least I think I matched what appears in the documentation. (Look at Ec2MultiRegionSnitch in this link.)
http://www.datastax.com/docs/1.0/cluster_architecture/replication
I don't think I can just list the regions as us-west and us-east because there are two regions out west (us-west-1 is the California region and us-west-2 is the Oregon region). So I don't think just putting us-west would successfully differentiate regions.
My guess in my comment was right: your replication settings and datacenter names don't match. A couple of things:
1) cassandra-topology.properties is only used by the PropertyFileSnitch. That file is irrelevant while using the ec2 snitch.
2) The reason the snitch is currently reporting 'us-west' instead of 'us-west-1' is due to a bug. https://issues.apache.org/jira/browse/CASSANDRA-4026. If you added nodes in 'us-west-2' they will correctly get reported as that.
So the solution here is to update your replication settings:
CREATE KEYSPACE mykeyspace
WITH placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
and strategy_options = {us-east:2,us-west:2};
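If the keyspace already exists, as it does in the question, the same change can be applied in place from cassandra-cli with an update rather than a create; a sketch using the keyspace name from the question:
UPDATE KEYSPACE mykeyspace
WITH placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
AND strategy_options = {us-east:2,us-west:2};
After changing the replication settings you would typically run nodetool repair on each node so existing data gets redistributed to the new replica placement.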
Also, I unfortunately do not know what is wrong with mx4j. It isn't needed by cassandra though so unless you actually need it for something you can just remove it.

How to remove dead node out of the Cassandra cluster?

I have a Cassandra cluster of 12 nodes on EC2.
Because of some failure we lost one of the nodes completely. I mean that machine does not exist anymore.
So I have created a new EC2 instance with a different IP and the same token as the dead node, and I also had a backup of the data on that node, so it works fine.
But the problem is that the dead node's IP still appears as an unreachable node in describe cluster.
As that node (EC2 instance) does not exist anymore, I cannot use nodetool decommission or nodetool disablegossip.
How can I get rid of this unreachable node?
I had the same problem and I resolved it with removenode, which does not require you to find and change the node token.
First, get the node UUID:
nodetool status
DN 192.168.56.201 ? 256 13.1% 4fa4d101-d8d2-4de6-9ad7-a487e165c4ac r1
DN 192.168.56.202 ? 256 12.6% e11d219a-0b65-461e-babc-6485343568f8 r1
UN 192.168.2.91 156.04 KB 256 12.4% e1a33ed4-d613-47a6-8b3b-325650a2bbd4 RAC1
UN 192.168.2.92 156.22 KB 256 13.6% 3a4a086c-36a6-4d69-8b61-864ff37d03c9 RAC1
UN 192.168.2.93 149.6 KB 256 11.3% 20decc72-8d0a-4c3b-8804-cc8bc98fa9e8 RAC1
As you can see the .201 and .202 are dead and on a different network. These have been changed to .91 and .92 without proper decommissioning and recommissioning. I was working on installing the network and made a few mistakes...
Second, remove the .201 with the following command:
nodetool removenode 4fa4d101-d8d2-4de6-9ad7-a487e165c4ac
(in older versions it was nodetool remove ...)
But just like nodetool removetoken ..., it blocks... (see the comment by samarth on psanford's answer). However, it has a side effect: it puts that UUID in a list of nodes to be removed. So next we can force the removal with:
nodetool removenode force
(in older versions it was nodetool remove ...)
Now the node accepts the command and tells me that it is removing the invalid entry:
RemovalStatus: Removing token (-9136982325337481102). Waiting for replication confirmation from [/192.168.2.91,/192.168.2.92].
We also see that it communicates with the two other nodes that are up and thus it takes a little time, but it is still quite fast.
Next, a nodetool status no longer shows the .201 node. I repeat with .202 and now the status is clean.
After that you may also want to run a cleanup, as mentioned in psanford's answer:
nodetool cleanup
The cleanup should be run on all nodes, one by one, to make sure the change is fully taken into account.
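A small sketch of that last step, looping over the live nodes from one host (the addresses are the ones from the status output above; this assumes JMX on port 7199 is reachable from where you run it, otherwise just run nodetool cleanup locally on each node in turn):
for h in 192.168.2.91 192.168.2.92 192.168.2.93; do
    nodetool -h "$h" cleanup
done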
Normally when replacing a node you want to set the new node's token to (failed node's token) - 1 and let it bootstrap. As of 1.0 there is now a flag you can specify on startup to replace a dead node: "cassandra.replace_token=".
Since you have already added the new node with the same token there's an extra step:
Move the new node's token to (failure node's token) - 1 using nodetool move
Run nodetool removetoken <failed node's token> from one of the up nodes
Run nodetool cleanup on each node
These are basically the pre-1.0 instructions for replacing a dead node, with the additional token move.
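A sketch of those three steps as concrete commands, with the host addresses and tokens as placeholders to substitute:
nodetool -h <new_node> move <failed_node_token_minus_1>
nodetool -h <any_live_node> removetoken <failed_node_token>
nodetool -h <each_node> cleanup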
