HBase NoServerForRegionException? - hadoop

I get this exception when I haven't communicated with HBase for a while:
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region because: Connection refused
Is this related to session expiry? If so, how can I extend the session lifetime?

Run bin/hbase hbck and find out which machine is hosting the -ROOT- region.
hbck should report that -ROOT- is okay. Make sure that all your
RegionServers are up and running.
To start a RegionServer, use bin/hbase-daemon.sh start regionserver.

I don't think this has anything to do with session lifetime.
Check your cluster to make sure that it is up and working correctly and that all RegionServers are alive. Then check the logs to make sure they are not reporting an error state.
HBase is complex software. Without more detailed information it is very difficult to diagnose what is going on, and collecting that information often reveals the problem by itself.

This error shows that the client is not able to talk to a RegionServer.
Find the RegionServer hosting the region the client is trying to reach, and check that it is up.
To identify which RegionServer a region is assigned to, see http://hbase.apache.org/0.94/book/regions.arch.html#regions.arch.assignment
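The page linked above describes how regions are assigned. The lookup itself can be pictured as a sorted search: each region covers a row-key range beginning at its start key, and the catalog lists regions in start-key order. Below is a minimal Python sketch of that idea using the standard bisect module. The table contents, region names and server addresses are made up, and this is not the HBase client API.

```python
import bisect

# Hypothetical region table for one table: (start_key, region_name, server),
# sorted by start_key. The first region starts at the empty key.
REGIONS = [
    (b"", "t1,,1", "rs1.example.com:60020"),
    (b"g", "t1,g,2", "rs2.example.com:60020"),
    (b"p", "t1,p,3", "rs3.example.com:60020"),
]


def locate_region(regions, row):
    """Return (region_name, server) for the region whose key range contains row.

    A region covers [its start_key, next region's start_key), so the owning
    region is the last one whose start_key <= row.
    """
    start_keys = [start for start, _, _ in regions]
    idx = bisect.bisect_right(start_keys, row) - 1  # never -1: b"" <= any row
    _, name, server = regions[idx]
    return name, server


if __name__ == "__main__":
    print(locate_region(REGIONS, b"hello"))  # ('t1,g,2', 'rs2.example.com:60020')
```

If the server returned by this kind of lookup is down, the client has nowhere to send the request. That is the situation the answer above asks you to check.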

Several factors play a role here.
Note the steps that happen when a client connects to HBase:
The client connects to ZooKeeper to get the address of the RegionServer hosting the -ROOT- table.
The client caches this address so that it doesn't have to contact ZooKeeper again.
Your problem is that your client is trying to reach ZooKeeper to get this address, and one of the following may be going wrong:
Your client is not able to connect to ZooKeeper.
The -ROOT- location stored in the znode in ZooKeeper is wrong.
Possible fixes:
Check that ZooKeeper is working correctly.
Delete the HBase znode in ZooKeeper and restart the cluster. Don't worry, this won't delete your data.
Once this is done, the client can get the -ROOT- location and then query the .META. table without any issue.
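The two steps above (read the -ROOT- location from ZooKeeper, then cache it) can be modelled in a few lines. This is a toy sketch, not the actual HBase client code. read_znode is a hypothetical callable standing in for the ZooKeeper read. The model shows the failure modes from the answer: if ZooKeeper is unreachable, the very first lookup fails, and if the cached address is stale, the client keeps using it until the entry is dropped and re-read.

```python
class RootLocator:
    """Toy model of a client that caches the -ROOT- location read from ZooKeeper.

    read_znode is a hypothetical stand-in for a ZooKeeper read: it returns a
    "host:port" string or raises ConnectionError when ZooKeeper is unreachable.
    """

    def __init__(self, read_znode):
        self._read_znode = read_znode
        self._cached = None
        self.zk_reads = 0  # how many times we had to ask ZooKeeper

    def root_location(self):
        # Only contact ZooKeeper on a cache miss; otherwise reuse the cached address.
        if self._cached is None:
            self.zk_reads += 1
            self._cached = self._read_znode()  # propagates ConnectionError
        return self._cached

    def invalidate(self):
        # Forget a stale address (e.g. after "Connection refused") so the
        # next lookup goes back to ZooKeeper.
        self._cached = None


if __name__ == "__main__":
    znode = {"root": "rs1.example.com:60020"}  # hypothetical znode contents
    locator = RootLocator(lambda: znode["root"])
    locator.root_location()
    locator.root_location()
    print(locator.zk_reads)  # 1: the second lookup hit the cache
    znode["root"] = "rs2.example.com:60020"  # -ROOT- moved to another server
    print(locator.root_location())  # still rs1.example.com:60020: the cache is stale
    locator.invalidate()
    print(locator.root_location())  # rs2.example.com:60020
```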

Related

Hortonworks HDP, heartbeat lost in one of the 3 nodes

I installed HDP with Ambari on three nodes running in VMs. After I restarted one of the three nodes (datanode2), Ambari lost the heartbeat from that node. I restarted ambari-agent on all three nodes, but it still doesn't work. Please help me find a solution.
The information provided isn't sufficient, but I'll describe the approach I normally take to debug this.
First, check that all the Ambari agents are running, using the command ambari-agent status.
Check the logs of both ambari-agent and ambari-server. They are normally in /var/log/ambari-agent and /var/log/ambari-server. The logs should tell you the exact reason the heartbeat was lost.
The most common causes of agent failure are connection issues between the machines, a version mismatch, or a corrupt database entry.
The log files should help you.
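If there are many log files, a small script can pull out the lines worth reading first. This is a hypothetical helper, not an Ambari tool. It assumes the logs are plain-text files ending in .log in the directories mentioned above, and the keyword list is a guess that you should adjust to what your logs actually contain.

```python
from pathlib import Path

# Keywords to look for (an assumption; adjust to your actual log contents).
KEYWORDS = ("ERROR", "WARN", "heartbeat")


def interesting_lines(lines, keywords=KEYWORDS):
    """Yield (line_number, line) for lines mentioning any keyword, case-insensitively."""
    lowered = [k.lower() for k in keywords]
    for number, line in enumerate(lines, start=1):
        text = line.lower()
        if any(k in text for k in lowered):
            yield number, line.rstrip("\n")


def scan_log_dir(directory):
    """Print matching lines from every *.log file in directory."""
    for path in sorted(Path(directory).glob("*.log")):
        with path.open(errors="replace") as handle:
            for number, line in interesting_lines(handle):
                print(f"{path}:{number}: {line}")


if __name__ == "__main__":
    for log_dir in ("/var/log/ambari-agent", "/var/log/ambari-server"):
        if Path(log_dir).is_dir():
            scan_log_dir(log_dir)
```

Keeping the filtering in interesting_lines, separate from the file handling, makes it easy to try different keywords on a pasted excerpt before pointing the script at the real directories.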

Cloudant: Error running weatherreport to check cluster health

We have a three-node cluster set up and are facing an issue running the weatherreport command.
From the error, it is clear that the machine running the weatherreport utility is not able to connect to the other two machines. I have checked all the machines and they are reachable by FQDN, but the message suggests that it uses the short name when connecting to the peer machines. How can I check where it gets the peer machine names from? Then I could try changing them to the fully qualified names, which might work for me. If there is any other solution, please let us know.
The error is:
['cloudant_diag17506#machine2031.domain.com'] [crit] Could not run check weatherreport_check_safe_to_rebuild on cluster node 'cloudant#machine2031'
['cloudant_diag17506#machine2031.domain.com'] [crit] Could not run check weatherreport_check_safe_to_rebuild on cluster node 'cloudant#machine2032'
['cloudant_diag17506#machine2031.domain.com'] [crit] Could not run check weatherreport_check_safe_to_rebuild on cluster node 'cloudant#machine2033'
['cloudant#machine2032.domain.com'] [crit] Rebuilding this node will leave the following shard with NO live copies: default/t_alpha e0000000-ffffffff, default/t_alpha a0000000-bfffffff, default/t_alpha 60000000-7fffffff, default/t_alpha 20000000-3fffffff, default/metrics_app e0000000-ffffffff, default/metrics_app a0000000-bfffffff, default/metrics_app 60000000-7fffffff, default/metrics_app 20000000-3fffffff
I found the solution to this problem.
The problem was that the short name was used when the database was first created, so the database was probably referring to the peer hosts by their short names when connecting.
Now that the Cloudant Local installation is in an inconsistent state, the way to make it consistent is to remove all the files under /srv/cloudant/ on all database nodes. This removes all default Cloudant databases. Then run the configure.sh script again on each node as before, now that "hostname -f" correctly outputs the fully qualified host name, and create your databases again.
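To confirm the short-name symptom before wiping anything, you can pull the node names out of the weatherreport output and flag the ones without a domain part. This sketch assumes node names appear as quoted name#host strings, as in the error output in the question. socket.getfqdn() is used only as a rough stand-in for "hostname -f", and the two can disagree depending on resolver configuration.

```python
import re
import socket

# Node names appear in the output as quoted strings like 'cloudant#machine2031'.
NODE_RE = re.compile(r"'(\w+)#([^']+)'")


def short_node_names(report_lines):
    """Return node names (in order of appearance) whose host part has no dot."""
    found = []
    for line in report_lines:
        for name, host in NODE_RE.findall(line):
            node = f"{name}#{host}"
            if "." not in host and node not in found:
                found.append(node)
    return found


if __name__ == "__main__":
    sample = [
        "['cloudant_diag17506#machine2031.domain.com'] [crit] Could not run check "
        "weatherreport_check_safe_to_rebuild on cluster node 'cloudant#machine2031'",
    ]
    print(short_node_names(sample))  # ['cloudant#machine2031']
    fqdn = socket.getfqdn()
    print(fqdn, "looks fully qualified" if "." in fqdn else "is a short name")
```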

Datastax Opscenter issue: dashboard timeout

I installed the DataStax Community edition on an EC2 server and it worked fine. After that I tried to add one more server. I see two nodes in the Nodes menu, but the main dashboard shows the following error:
Error: Call to /Test_Cluster__No_AMI_Parameters/rc/dashboard_presets/ timed out.
One potential root cause I can see is the cluster name. I specified a different name in cassandra.yaml, but it looks like OpsCenter is still using the original name. Any help would be greatly appreciated.
It was because the cluster name change wasn't made properly. I found it easier to change the cluster name before starting the Cassandra cluster. In addition, only one instance of opscenterd should run for a single cluster. The datastax-agent needs to run on all nodes in the cluster, but all agents must point to the same opscenterd (this change is made in /var/lib/datastax-agent/conf/address.yaml).
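One quick way to verify the last point is to collect the OpsCenter address each agent is configured with and check that they all agree. This sketch assumes you have already read each node's address.yaml into a dict (parsing is not shown), and that the OpsCenter address is stored under the stomp_interface key. Check the DataStax agent documentation for your version. The node names and IPs in the usage note are hypothetical.

```python
def opscenterd_targets(agent_configs):
    """Group nodes by the OpsCenter address their agent is configured with.

    agent_configs maps node name -> settings dict read from that node's
    address.yaml. A missing stomp_interface key is grouped under None.
    """
    targets = {}
    for node, settings in agent_configs.items():
        target = settings.get("stomp_interface")
        targets.setdefault(target, []).append(node)
    return targets


def all_agents_agree(agent_configs):
    """True if every agent has stomp_interface set and all values are identical."""
    targets = opscenterd_targets(agent_configs)
    return len(targets) == 1 and None not in targets
```

For example, with {"node1": {"stomp_interface": "10.0.0.5"}, "node2": {"stomp_interface": "10.0.0.6"}}, all_agents_agree returns False, and opscenterd_targets shows which node points elsewhere.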

Novell eDirectory: Error while adding replica on new server

I want to add a replica of our whole eDirectory tree to a new server (OES11.2 SLES11.3).
I tried to do this via iManager (Partitions and Replicas / Replica View / Add Replica).
Everything looks normal. I can see our other servers with their replicas and, of course, the server holding the master replica.
For additional information: I have done this many times without problems until now.
When I try to add a replica to the new server, I get the following error: (Error -636) The server is unreachable.
I checked the /etc/hosts file and the network settings on both servers.
Ndsrepair looks normal too. All servers are in sync and there are no connection errors. The replica depth of the new server is -1, which I expect, because there is no replica on it yet.
But if I can connect from one server to another and there are no error messages, why does adding a replica not work?
I also made a LAN trace, but didn't get any information that helps here. In the trace, the communication looks normal!
Am I forgetting something here?
Every server in our environment runs OES11.2 except the master server which runs OES11.1
Thanks for your help!
Daniel
Nothing is wrong.
Error -636 means that the replica is not yet available on the new server. Once synchronization completes, the replica will be ready and available. Depending on the size of the tree and the communication channel, this can take up to several hours.
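Because the fix is simply to wait for synchronization, a generic poll-with-timeout loop is one way to script that wait. Nothing here is eDirectory-specific. check is a hypothetical function you would supply, for example one that wraps whatever tool you use to inspect the replica state. It should return True once the replica is usable.

```python
import time


def wait_until(check, timeout_s, interval_s, sleep=time.sleep, clock=time.monotonic):
    """Call check() every interval_s seconds until it returns True or timeout_s elapses.

    Returns True if check() succeeded, False if the deadline passed first.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Passing sleep and clock in as parameters lets you test the loop with a fake clock instead of actually waiting hours. In real use, something like wait_until(replica_ready, timeout_s=6 * 3600, interval_s=300) would poll every five minutes for up to six hours, where replica_ready is your own hypothetical check.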

Errors in setting up HBase on Distributed Hadoop, ZooKeeperServer not running

I'm trying to set up HBase on Hadoop and have been following various great tutorials online, by Michael G. Noll and here. Basically everything is fine: HDFS and MapReduce work well, and the web interface shows that I have 2 nodes (my NameNode is both a NameNode and a DataNode, but that's just for testing purposes).
When I got to installing HBase, that's where I ran into problems, and I get lots of different errors. The latest one is in the log file on my slave node:
INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /10.2.xx.xx:43089 (no session established for client)
INFO org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
But when I run
$ zkServer.sh status
it shows the mode that both machines are running in!
Does anyone have an idea what this problem is? Or does anyone know of another guide or tutorial I can follow to set this up? I've tried following the HBase documentation on setting up HBase on a distributed HDFS, but that doesn't work either.
Thanks for any help offered!
Are both ZooKeeper servers configured in a quorum? If so, have they managed to connect to one another and vote on who's the leader? (This should all be in the logs for both servers.)
ZooKeeper may be running, but if the servers can't communicate with one another (because of firewall rules or misconfiguration, for example), ZooKeeper will not accept incoming client connections.
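The quorum rule explains why this matters so much with two servers. A ZooKeeper ensemble only serves clients while a strict majority of its servers can talk to each other. The arithmetic below, a minimal Python sketch, shows that a two-server ensemble needs both servers and tolerates no failures, which is why an odd size such as 3 or 5 is normally recommended.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of an ensemble with ensemble_size servers."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down or cut off while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, quorum_size(n), tolerated_failures(n))
    # 1 1 0
    # 2 2 0
    # 3 2 1
    # 5 3 2
```

With the two machines in the question, a firewall rule between them is enough to leave neither side with a majority. That matches the "ZooKeeperServer not running" message even though zkServer.sh reports a mode on each host.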
