cassandra operations queue is full - cassandra-2.0

I'm running datastax enterprise 4.5.1, with opscenter 5.1.1. These were installed from the standalone linux installers on Ubuntu 14.04 LTS.
$ cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.8.39 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
In the datastax-agent log, I have been seeing tons of these WARN messages:
WARN [Thread-11] 2015-04-23 13:13:49,005 7647864 operations dropped so far.
WARN [Thread-11] 2015-04-23 13:13:49,005 Cassandra operation queue is full, discarding cassandra operation
Similarly, these warnings:
WARN [rollup-snapshot] 2015-04-30 16:20:40,432 Cassandra operation queue is full, discarding cassandra operation
WARN [rollup-snapshot] 2015-04-30 16:20:40,432 9 operations dropped so far.
Can someone give me an idea of what causes these? The node seems to be operating OK, with no obvious errors in system.log to correlate. In the datastax-agent-env.sh file, I've set JVM_OPTS="$JVM_OPTS -Xmx256M", but that doesn't eliminate the problem.

Try these changes in the config:
The following settings were changed in the agent address.yaml file. The agent process will need to be restarted for these settings to take effect.
thrift_max_conns: 10
async_pool_size: 10
async_queue_size: 20000
https://support.datastax.com/hc/en-us/articles/204225789-Ops-Center-is-not-showing-any-metrics-in-the-UI-dashboard
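A minimal sketch of applying those settings (the config path below is an assumption based on the standalone-installer layout; adjust it if your agent keeps address.yaml elsewhere):

# Append the tuning settings to the agent's address.yaml
# (assumed path for the standalone installer).
sudo tee -a /var/lib/datastax-agent/conf/address.yaml <<'EOF'
thrift_max_conns: 10
async_pool_size: 10
async_queue_size: 20000
EOF

# Restart the agent so the new settings take effect.
sudo service datastax-agent restart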

Related

HBase fully distributed mode [Zookeeper error while executing HBase shell]

Following two tutorials (tutorial 1 and tutorial 2), I was able to set up an HBase cluster in fully distributed mode. Initially the cluster seemed to work okay.
The jps output on the HMaster/NameNode:
The jps output on the DataNodes/RegionServers:
Nevertheless, whenever I try to execute hbase shell, the HBase processes seem to be interrupted by a ZooKeeper error. The error is pasted below:
2021-03-13 11:52:26,047 ERROR [main] zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
2021-03-13 11:52:26,048 WARN [main] zookeeper.ZKUtil: hconnection-0x4375b0130x0, quorum=137.43.49.59:2181,137.43.49.58:2181,137.43.49.50:2181,137.43.49.49:2181, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:221)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:417)
        at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:6
I have made several attempts to solve this issue (including trying different compatible HBase/Hadoop versions), but still no progress.
I would like your input on this.
Shared below is the other information required:
In the /etc/hosts file:
(I already tried commenting out the HBase-related hosts in /etc/hosts; it still didn't work.)
In hbase-site.xml:
After 5 days of struggle, I learned what went wrong. Posting my solution here in the hope that it can help some other developers too. I would also like to thank @VV_FS for the comments.
In my scenario, I used virtual machines borrowed from an external party, so there were certain firewalls and other security measures in place. If you follow a similar experimental setup, these steps might help you.
To set up the HBase cluster, follow these tutorials.
Set up Hadoop in distributed mode.
Notes when setting up Hadoop in fully distributed mode:
Make sure to open all the ports mentioned in the post. For example, use sudo ufw allow 9000 to open port 9000. Repeat the command for every port involved in running Hadoop.
Set up ZooKeeper in distributed mode.
Notes when setting up ZooKeeper in fully distributed mode:
Make sure to open all the ports mentioned in the post. For example, use sudo ufw allow 3888 to open port 3888. Repeat the command for every port involved in running ZooKeeper.
DO NOT START THE ZOOKEEPER NODES AFTER INSTALLATION. ZOOKEEPER WILL BE MANAGED BY HBASE INTERNALLY, SO DON'T START ZOOKEEPER AT THIS STAGE.
Set up HBase in distributed mode.
When setting values in hbase-site.xml, use port 60000 for the hbase.master tag, not 60010 (thanks @VV_FS for pointing this out in the earlier discussion).
Make sure to open all the ports mentioned in the post. For example, use sudo ufw allow 60000 to open port 60000. Repeat the command for every port involved in running HBase (a consolidated sketch follows below).
[Important thoughts]: If you encounter errors, always refer to the HBase logs. In my case, hbase-master-xxxxx.log and zookeeper-master-xxx.log helped me track down the exact errors.
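For convenience, a consolidated sketch of the firewall steps above (9000, 3888, and 60000/60010 come from the posts; the remaining ZooKeeper ports are the usual defaults, so treat the exact list as an assumption and check your tutorials):

# Hadoop: HDFS NameNode RPC port.
sudo ufw allow 9000

# ZooKeeper: election (3888), peer (2888), and client (2181) ports.
sudo ufw allow 3888
sudo ufw allow 2888
sudo ufw allow 2181

# HBase: master RPC (60000) and master web UI (60010).
sudo ufw allow 60000
sudo ufw allow 60010

# Let HBase manage ZooKeeper itself, as noted above.
echo 'export HBASE_MANAGES_ZK=true' >> "$HBASE_HOME/conf/hbase-env.sh"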

WARN : Your Hadoop installation does not include the StreamCapabilities class from HDFS-11644

I run a Kerberised HBase cluster on top of HDFS, and I get this warning in the master log when I start HBase:
WARN [Thread-15] util.CommonFSUtils: Your Hadoop installation does not include the StreamCapabilities class from HDFS-11644, so we will skip checking if any FSDataOutputStreams actually support hflush/hsync. If you are running on top of HDFS this probably just means you have an older version and this can be ignored. If you are running on top of an alternate FileSystem implementation you should manually verify that hflush and hsync are implemented; otherwise you risk data loss and hard to diagnose errors when our assumptions are violated.
I'm running HBase 2.0.5 and Hadoop 3.1.2.
Does anyone have any idea what that means?
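The warning means HBase could not find the org.apache.hadoop.fs.StreamCapabilities interface (introduced by HDFS-11644) on its classpath, so a useful first step is to check which Hadoop jars HBase actually sees. A diagnostic sketch, assuming the hbase and javap commands are on the PATH:

# Resolve the StreamCapabilities interface against HBase's effective
# classpath; an error here means HBase is loading Hadoop jars that
# predate HDFS-11644.
javap -classpath "$(hbase classpath)" org.apache.hadoop.fs.StreamCapabilities

If javap cannot find the class, HBase is most likely running against older bundled Hadoop jars rather than your Hadoop 3.1.2 installation.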

Ambari won't restart: DB check failed

When I restarted my cluster, Ambari didn't start because a database consistency check failed:
sudo service ambari-server restart --skip-database-check
Using python /usr/bin/python
Restarting ambari-server
Waiting for server stop...
Ambari Server stopped
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Ambari Server is starting with the database consistency check skipped. Do not make any changes to your cluster topology or perform a cluster upgrade until you correct the database consistency issues. See "/var/log/ambari-server/ambari-server-check-database.log" for more details on the consistency issues.
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start.....................
DB configs consistency check failed. Run "ambari-server start --skip-database-check" to skip. You may try --auto-fix-database flag to attempt to fix issues automatically. If you use this "--skip-database-check" option, do not make any changes to your cluster topology or perform a cluster upgrade until you correct the database consistency issues. See /var/log/ambari-server/ambari-server-check-database.log for more details on the consistency issues.
ERROR: Exiting with exit code -1.
REASON: Ambari Server java process has stopped. Please check the logs for more information.
I looked in the logs in "/var/log/ambari-server/ambari-server-check-database.log", and I saw:
2017-08-23 08:16:13,445 INFO - Checking Topology tables
2017-08-23 08:16:13,447 ERROR - Your topology request hierarchy is not complete for each row in topology_request should exist at least one raw in topology_logical_request, topology_host_request, topology_host_task, topology_logical_task.
I tried both the --auto-fix-database and --skip-database-check options; neither worked.
It seemed that PostgreSQL hadn't started correctly. Even though the Ambari logs never mentioned that PostgreSQL was down or unreachable, it was odd that Ambari couldn't access the topology configuration stored in it.
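A quick way to verify that before restarting (a sketch; pg_isready ships with the PostgreSQL client tools, and 5432 is PostgreSQL's default port):

# Check the service status, then whether anything answers on the
# default PostgreSQL port.
sudo service postgresql status
pg_isready -h localhost -p 5432

In my case, simply restarting PostgreSQL was enough: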
sudo service postgresql restart
Stopping postgresql service: [ OK ]
Starting postgresql service: [ OK ]
It did the trick:
sudo service ambari-server restart
Using python /usr/bin/python
Restarting ambari-server
Ambari Server is not running
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Ambari database consistency check started...
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start.........
Server started listening on 8080

Percona Xtradb Cluster nodes won't start

I set up percona-xtradb-cluster-56 with three nodes in the cluster. To bootstrap the first node, I use the following command and it starts just fine:
#/etc/init.d/mysql bootstrap-pxc
The other two nodes, however, fail to start when I start them normally using the command:
#/etc/init.d/mysql start
The error I am getting is "The server quit without updating the PID file". The error log contains this message:
Error in my_thread_global_end(): 1 threads didn't exit 150605 22:10:29
mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended.
The cluster nodes are all running Ubuntu 14.04. When I use percona-xtradb-cluster-5.5, the cluster and all the nodes run just fine as expected. But I need version 5.6 because I am also using GTID, which is only available in 5.6 and not supported in earlier versions.
I was following these two percona documentation to setup the cluster:
https://www.percona.com/doc/percona-xtradb-cluster/5.6/installation.html#installation
https://www.percona.com/doc/percona-xtradb-cluster/5.6/howtos/ubuntu_howto.html
Any insight or suggestions on how to resolve this issue would be highly appreciated.
The problem is related to memory, as "The Georgia" writes. There should be at least 500 MB of free memory for a default setup and bootstrapping. See http://sysadm.pp.ua/linux/px-cluster.html
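A minimal pre-flight sketch along those lines (the 128M buffer-pool figure is an illustrative assumption for small test VMs, not a Percona recommendation):

# Check free memory on each node before starting it; the default
# setup wants roughly 500 MB free.
free -m

# On small test VMs, shrinking InnoDB's buffer pool in
# /etc/mysql/my.cnf (under [mysqld]) reduces the footprint:
#   innodb_buffer_pool_size = 128M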

HBase error: Server IPC version 8 cannot communicate with client version 4

I'm using hbase-0.94.9 and tried to follow the instructions from the HBase online book, but I got this error:
org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException
Then I found on the web that I had to set up Hadoop first. I used start-dfs.sh from Hadoop 2.0.5-alpha, but now I get this error when I try to run start-hbase.sh:
2013-07-09 17:27:40,706 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
org.apache.hadoop.ipc.RemoteException: Server IPC version 8 cannot communicate with client version 4
You're trying to use an HBase release that was built against Hadoop 1.0.x with Hadoop 2.0.x. Either use an HBase release built against Hadoop 2.0.x, or rebuild your HBase with hadoop.profile set to 2.0:
-Dhadoop.profile=2.0
If you need help with building HBase, you can visit this link.
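For reference, a rebuild sketch (assuming a Maven build from an hbase-0.94.x source checkout):

# Build HBase against the Hadoop 2 profile instead of the default
# Hadoop 1 profile, skipping tests for speed.
mvn clean install -DskipTests -Dhadoop.profile=2.0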
HTH
