ALTER TABLE ADD COLUMN not always propagating in Cassandra - cassandra-2.0

I am using Apache Cassandra (v2.0.9) on a 4-node cluster with replication factor = 3, and the DataStax Java Driver for Cassandra (v2.0.2). I use CQL queries from inside my Java code to add columns to existing tables.
I noticed this issue when my CREATE INDEX and SELECT queries on the newly added columns failed because the column was not found. No error was logged in the Cassandra logs.
Note that this issue did not appear when I ran Cassandra on a single node, but it occurs persistently on the 4-node cluster. Currently I work around it by retrying up to 5 times, and the columns are usually there by the third or fourth retry. I also observed that the more columns a table already has, the less often these failures occur.
I found a bug already reported at:
https://issues.apache.org/jira/browse/CASSANDRA-7186

It worked fine after I disabled all firewalls, so this was likely happening because a firewall was blocking the ports Cassandra uses for communication between nodes (by default, port 7000, or 7001 with SSL).
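A more robust workaround than blind retries is to wait for schema agreement after the ALTER before creating the index. Below is a minimal sketch using the DataStax Java Driver 2.0.x that compares the schema_version in system.local against every row in system.peers; the contact point, keyspace, table, and column names are placeholders:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import java.util.HashSet;
import java.util.Set;

public class SchemaAgreementCheck {
    public static void main(String[] args) throws InterruptedException {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        session.execute("ALTER TABLE my_ks.my_table ADD new_col text");
        // Poll until every reachable node reports the same schema version.
        for (int attempt = 0; attempt < 10; attempt++) {
            Set<String> versions = new HashSet<String>();
            Row local = session.execute("SELECT schema_version FROM system.local").one();
            versions.add(local.getUUID("schema_version").toString());
            for (Row peer : session.execute("SELECT schema_version FROM system.peers")) {
                // Unreachable peers may report a null schema_version; skip them.
                if (peer.getUUID("schema_version") != null) {
                    versions.add(peer.getUUID("schema_version").toString());
                }
            }
            if (versions.size() == 1) {
                break; // all nodes agree; safe to CREATE INDEX / SELECT on the new column
            }
            Thread.sleep(1000);
        }
        cluster.close();
    }
}

Newer versions of the driver expose this check directly as cluster.getMetadata().checkSchemaAgreement().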

Does the Cassandra session get re-created when WAS disconnects from the Cassandra cluster? (e.g. a network issue)

I have tested with CircleCI and Docker (a cassandra image), and when I run the tests, logs like the one below appear:
"Tried to execute unknown prepared query. You may have used a PreparedStatement that was created with another Cluster instance."
But only a single Cassandra cluster exists, so I can't understand what causes this error.
Could it happen because of a Cassandra connection issue?
The tests sometimes fail because WAS can't connect to the Cassandra cluster (I think CircleCI causes this issue), so my guess is:
WAS can't connect to the Cassandra cluster during testing
The session is re-created
The PreparedStatement error is logged
Is that possible? If not, how can this error happen when just one Cassandra cluster is operating?
The "Cluster instance" being referred to in this message is the Cluster object in your app code:
Tried to execute unknown prepared query. You may have used a PreparedStatement \
that was created with another Cluster instance.
That error implies that you have multiple Cluster objects in your app. You should have only one instance, shared throughout your app code; don't create multiple Cluster objects. Cheers!
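For reference, a minimal sketch of that pattern with the DataStax Java Driver: one Cluster and one Session shared by the whole application, with all PreparedStatements prepared against that shared session (the contact point and keyspace name are placeholders):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public final class CassandraConnector {
    // One Cluster and one Session for the entire application.
    private static final Cluster CLUSTER =
            Cluster.builder().addContactPoint("cassandra").build();
    private static final Session SESSION = CLUSTER.connect("my_keyspace");

    private CassandraConnector() {}

    public static Session session() {
        return SESSION;
    }

    // Statements prepared here belong to the shared Cluster, so executing
    // them later never hits the "another Cluster instance" error.
    public static PreparedStatement prepare(String cql) {
        return SESSION.prepare(cql);
    }
}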

How to perform CDC for a MySQL Percona cluster and create topics for all DBs

I was looking for a solution to capture all database changes (create, delete, update) from a 3-node Percona MySQL cluster and send the updates to a visualization tool such as Kibana to search events. Many suggested: MySQL binlog --> Debezium/Maxwell --> Apache Kafka --> Elasticsearch (Kibana).
I have the following questions on this setup:
Have I picked the right solution?
With hundreds of databases, each with 100 tables, how can I capture changes for all of them? With Debezium, I only see how to create a topic for a specific table in a specific database (see the connector config sketch below).
With Maxwell I could get all the changes as output; however, I was unable to resume from where we left off after stopping and restarting Maxwell or the DB.
Any help appreciated.
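If you do go with Debezium, note that by default the MySQL connector captures every database and table it can see unless you configure an include/exclude filter, and it writes each table's changes to its own topic named serverName.databaseName.tableName, so you do not have to create topics per table by hand. A minimal sketch of a connector registration, posted to the Kafka Connect REST API (hostnames, credentials, and names below are placeholders, not values from this setup):

{
  "name": "percona-cdc",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "percona-node1",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz-secret",
    "database.server.id": "184054",
    "database.server.name": "percona",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.percona"
  }
}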

HBase components don't appear in Pentaho Kettle

I am trying to work with Pentaho in order to build some big data solutions, but the Hadoop HBase components aren't appearing in the dashboard. I don't understand why HBase doesn't appear, since HBase is up and running on my machine... I've been searching for a solution, but without success...
Please check the property 'hbase.client.scanner.timeout.period' and set it to 10 minutes in hbase-default.xml to get rid of HBase scanner-timeout exceptions.
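A sketch of that setting (the value is in milliseconds, so 10 minutes is 600000; site-specific overrides normally go in hbase-site.xml rather than hbase-default.xml):

<property>
  <name>hbase.client.scanner.timeout.period</name>
  <!-- 10 minutes, in milliseconds -->
  <value>600000</value>
</property>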
Check that you have added the ZooKeeper host in the HBase output step's host field in the Pentaho Data Integration tool.
Have you read this wiki on loading HBase data into Pentaho?

system_auth replication in Cassandra

I'm trying to configure authentication on Cassandra. It seems that, because of the replication strategy used for system_auth, user credentials are not replicated to all the nodes in the cluster, so I end up getting "incorrect credentials" on one node and a successful connection on another.
This is a related question. The answer there says you have to make sure the credentials are always present on all nodes.
How do I do that? The option offered there is to alter the keyspace so that its replication factor equals the number of nodes in the cluster, then run repair on each node. That's a whole lot of work if you want your Cassandra cluster to be dynamically scalable: if I add one node today and another node some other day, altering the keyspace replication and then manually restarting nodes each time will end in some kind of chaos.
An hour of googling eventually turned up a passing mention of EverywhereStrategy, but I don't see it mentioned anywhere in the docs as available. How do people configure APIs to work with Cassandra authentication, then, if you can't be sure that your user is actually present on the node you specify as a contact point?
Obviously I'm talking about true scale, where you can change the size of the cluster without restarting each node.
When you enable authentication in Cassandra, then yes, you have to increase the system_auth keyspace replication_factor to N (the total number of nodes) and run a complete repair, but you don't need to restart the nodes after you add a new node.
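For example, on a 4-node cluster in a single datacenter the ALTER might look like the sketch below; the datacenter name must match what nodetool status reports for your cluster ('DC1' here is a placeholder):

ALTER KEYSPACE system_auth
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 4};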
If the repair is taking too long, you can optimize it by repairing only the system_auth keyspace:
nodetool repair system_auth
(or)
nodetool repair -pr system_auth
As per Cassandra best practices, a complete repair should be done regularly. For more details on repair, see the links below:
http://www.datastax.com/dev/blog/repair-in-cassandra
https://www.pythian.com/blog/effective-anti-entropy-repair-cassandra/
http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/tools/toolsRepair.html
Answering your questions:
Question: How do people configure APIs to work with Cassandra authentication, then, if you can't be sure that your user is actually present on the node you specify as a contact point?
Answer: I'm using Cassandra 2.2 and the Astyanax Thrift API from my Spring project, and with that I am able to handle Cassandra authentication effectively. Please specify which version of Cassandra you are using and which driver you use to connect: the CQL driver or the Astyanax Thrift API?
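For example, with the DataStax Java (CQL) driver, credentials are supplied once when building the shared Cluster object; a minimal sketch (the contact point, username, and password are placeholders):

import com.datastax.driver.core.Cluster;

Cluster cluster = Cluster.builder()
        .addContactPoint("10.0.0.1")
        .withCredentials("app_user", "app_password") // sent to whichever node is contacted
        .build();

Astyanax has an equivalent mechanism for passing authentication credentials on its connection pool configuration.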
Question: Obviously I'm talking about true scale, where you can change the size of the cluster without restarting each node.
Answer: Yes, you can scale your Cassandra cluster without restarting nodes; please check the DataStax documentation for Cassandra 2.2:
http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/operations/opsAddNodeToCluster.html
Check the DataStax docs for the version you are using.

Elasticsearch snapshot fails when the nodes in a cluster run on different machines

We are using Elasticsearch 1.1.1.
We have a cluster of 3 nodes, and the three nodes are on 3 different machines.
Accessing the cluster and performing index operations work fine.
But when we use the snapshot feature to take a backup, the backup fails.
However, if we have all three nodes on the same machine, the snapshot command works fine.
Has anybody faced this issue?
I have not included the configuration details here, as the cluster and indexing operations work fine without any issues.
Thanks in advance.
For those who are looking for a solution: it is a requirement that a shared filesystem such as NFS be used, mounted at the same path on every node, so that the snapshot repository is accessible from all of them.
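A sketch of registering and then using a shared-filesystem ('fs') repository (the repository name and path are placeholders; the location must be the NFS mount point visible on every node):

curl -XPUT 'http://localhost:9200/_snapshot/my_backup' -d '{
  "type": "fs",
  "settings": {
    "location": "/mnt/shared/es-backups"
  }
}'
curl -XPUT 'http://localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true'

Later Elasticsearch versions additionally require the location to be whitelisted via path.repo in elasticsearch.yml.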
