Too many SSTables are created after running nodetool repair and the SSTables are not compacting - cassandra-2.0

I have 4 nodes in a data center with a replication factor of 3. While running nodetool repair on a node, too many SSTables are generated, but the file sizes are very small (around 4 KB). This issue was reported earlier by someone else in the link given below, but for us compaction does not trigger even after 2 weeks, and the node is left with a huge number of SSTables (about 10k). We are using size-tiered compaction and the Cassandra version is 2.0.3.
https://issues.apache.org/jira/browse/CASSANDRA-6698
Please let me know if you need any further details.
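For reference, the compaction backlog can be inspected, and a major compaction forced, with nodetool; the keyspace/column family names below are placeholders, not values from the original report:

nodetool compactionstats                       # any pending or active compaction tasks?
nodetool cfstats                               # per-column-family SSTable counts and sizes
nodetool compact my_keyspace my_columnfamily   # last resort: force a major compaction under size-tiered compaction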

Related

HBase: Too many bad rows in VerifyReplication

We have set up 2 HBase clusters with replication between them, where one acts as the active cluster and the other as the DR cluster. Some months ago, we had an error in the DR cluster, which caused replication lag on the active cluster.
After some days, we brought the DR cluster back up and the replication lag cleared. Even though the lag cleared, we still have a considerable amount (3.5 TB) of oldWALs on the active cluster.
I have tried running VerifyReplication between the clusters for just one day's worth of data, and it reports a large number of bad rows (about 25%) between the clusters. I suspect the replication was not replayed properly between the clusters.
Is there any way to fix the differences between the 2 clusters? Is there any way to replicate data to the peer by replaying the oldWALs? What are the issues with replaying oldWALs?
Please help.
HBase version - 1.4.11
Hadoop version - 2.7.3
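For reference, a typical VerifyReplication invocation for a one-day window looks roughly like this; the peer id (1), table name, and timestamps are placeholders, not the actual values used here:

hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication \
  --starttime=1598918400000 --endtime=1599004800000 1 my_table

And if a specific bad time range is identified, one commonly used way to re-sync it (rather than replaying oldWALs) is CopyTable pointed at the DR cluster; again, the peer ZooKeeper quorum and table name are placeholders:

hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  --starttime=1598918400000 --endtime=1599004800000 \
  --peer.adr=dr-zk1,dr-zk2,dr-zk3:2181:/hbase my_table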

Can database size be different on different nodes in a MariaDB Galera cluster?

I have a three-node MariaDB Galera Cluster.
Initially, the database size on all three nodes was the same.
However, it has recently been noticed that the database size on one node is larger than on the other two nodes.
Could you please let me know whether this is expected behaviour?
Thanks in advance.
As Galera doesn't do physical replication of tablespace disk blocks but logical replication of transactions, the data sizes on different nodes (assuming you are referring to tablespace file sizes on disk) may differ for several reasons:
differences in table/index fragmentation due to a different order of operations
different undo log sizes due to local rollbacks that were never replicated to other nodes
... or due to different multi-versioning requirements, as old row versions need to be preserved longer for long-running transactions that still need an older isolated view of the data
...
So this is expected behavior for sure.
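As a quick sanity check (not from the original answer), you can compare per-schema logical sizes from information_schema on each node; some difference is still expected for the reasons above, and the schema names are simply whatever your own databases are called:

mysql -e "SELECT table_schema, ROUND(SUM(data_length + index_length)/1024/1024) AS size_mb FROM information_schema.tables GROUP BY table_schema;"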

Cassandra compaction taking too much time to complete

Initially we had 12 nodes in the Cassandra cluster, and with a 500 GB data load on each node, a major compaction used to complete in 20 hours.
Now we have expanded the cluster to 24 nodes, and with the same data size (500 GB on each node) a major compaction takes 5 days. (The hardware configuration of each node is exactly the same, and we are using cassandra-0.8.2.)
So what could be the possible reason for this slowdown?
Is the increased cluster size causing this issue?
Compaction is a completely local operation, so cluster size would not affect it. Request volume would, and so would data volume.
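Not part of the original answer, but a few local checks on one of the slow nodes can help narrow this down; the host is a placeholder and exact nodetool subcommand availability varies across 0.8.x builds:

nodetool -h <node_ip> ring              # confirm per-node load really is still ~500 GB
nodetool -h <node_ip> compactionstats   # pending/active compaction tasks, if available in your build
iostat -x 5                             # check whether the disks are saturated while compaction runs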

Unable to add nodes to existing Cassandra Cluster

We have a Cassandra cluster of 6 nodes on EC2, and we have to double its capacity to 12 nodes.
So to add 6 more nodes, I followed these steps:
1) Calculated the tokens for 12 nodes and configured the new nodes accordingly.
2) Started the new nodes with the proper configuration so that they would bisect the existing token ranges.
In the beginning, all the new nodes showed streaming in progress.
In the ring status, all the nodes were in the "Joining" state.
After 12 hours, 2 nodes completed streaming and came into the normal state.
But the remaining 4 nodes, after streaming some amount of data, are not showing any progress; it looks like they are stuck.
We have installed Cassandra-0.8.2, have around 500 GB of data on each existing node, and are storing the data on EBS volumes.
How can I resolve this issue and get a balanced cluster of 12 nodes?
Can I restart the nodes?
If I clean the data directory of the stuck Cassandra nodes and restart with a fresh installation, will it cause any data loss?
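For reference, joining/streaming progress on the new nodes is usually checked with nodetool; the host below is a placeholder and subcommand availability depends on the exact 0.8.x build:

nodetool -h <new_node_ip> ring       # new nodes should show as Joining, then Normal
nodetool -h <new_node_ip> netstats   # active streams and how far along they are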
There will not be any data loss if your replication factor is 2 or greater.
Version 0.8.2 of Cassandra has several known issues - please upgrade to 0.8.8 on all original nodes as well as the new nodes that came up, and then start the procedure over for the nodes that did not complete.
Also, be aware that storing data on EBS volumes is a bad idea:
http://www.mail-archive.com/user@cassandra.apache.org/msg11022.html
While this won't answer your question directly, hopefully it points you in the right direction:
There is a fairly active #cassandra IRC channel on freenode.org.
So here is the answer to why some of our nodes were stuck.
1) We had upgraded from cassandra-0.7.2 to cassandra-0.8.2.
2) We were loading the SSTables with the sstable-loader utility.
3) But some data for some of the column families was inserted directly from a Hadoop job, and the data of those column families showed a different version because we had not upgraded the Cassandra API used in the Hadoop job.
4) Because of this version mismatch, Cassandra throws a 'version mismatch exception' and terminates the streaming.
5) So the solution is to use "nodetool scrub keyspace columnfamily" (an example invocation is shown below). I used this and my issue was resolved.
So the main thing here is: if you are upgrading the Cassandra cluster capacity, you must run nodetool scrub.
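For reference, the scrub invocation referred to above looks like this; the host, keyspace, and column family names are placeholders. Scrub rewrites the SSTables on that node, which here also brought them to the current on-disk format and resolved the version mismatch:

nodetool -h <node_ip> scrub my_keyspace my_columnfamily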

Even data distribution on Hadoop/Hive

I am trying a small Hadoop setup (for experimentation) with just 2 machines. I am loading about 13 GB of data, a table of around 39 million rows, with a replication factor of 1, using Hive.
My problem is that Hadoop always stores all this data on a single datanode. Only if I change the dfs.replication factor to 2 using setrep does Hadoop copy data to the other node. I also tried the balancer ($HADOOP_HOME/bin/start-balancer.sh -threshold 0). The balancer recognizes that it needs to move around 5 GB to balance, but then says "No block can be moved. Exiting..." and exits:
2010-07-05 08:27:54,974 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: Using a threshold of 0.0
2010-07-05 08:27:56,995 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/10.252.130.177:1036
2010-07-05 08:27:56,995 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/10.220.222.64:1036
2010-07-05 08:27:56,996 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: 1 over utilized nodes: 10.220.222.64:1036
2010-07-05 08:27:56,996 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: 1 under utilized nodes: 10.252.130.177:1036
2010-07-05 08:27:56,997 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: Need to move 5.42 GB bytes to make the cluster balanced.
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
No block can be moved. Exiting...
Balancing took 2.222 seconds
Can anybody suggest how to achieve even distribution of data on hadoop, without replication?
Are you using both of your machines as datanodes? Highly unlikely, but can you confirm this for me?
Typically, in a 2-machine cluster, I'd expect one machine to be the namenode and the other to be the datanode. That would explain why, when you set the replication factor to 1, the data gets written to the only datanode available. If you change it to 2, it may look for another datanode in the cluster to copy the data to, but it won't find one and hence may exit.
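To see how many datanodes the cluster actually has live, and to bump the replication of an existing path once a second datanode is available, something like the following works; the warehouse path is a placeholder for wherever your Hive table actually lives:

hadoop dfsadmin -report                                # lists live datanodes and their usage
hadoop fs -setrep -w 2 /user/hive/warehouse/my_table   # re-replicate existing blocks to factor 2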
