ClickHouse table stuck in read-only mode

I am new to ClickHouse.
On one of my systems I am seeing this issue repeatedly:
"{} \u003cError\u003e void DB::AsynchronousMetrics::update(): Cannot
get replica delay for table: people: Code: 242, e.displayText() =
DB::Exception: Table is in readonly mode, Stack trace:"
I can also see that my ZooKeeper was not in a good state. From the ClickHouse docs, it seems this is related to one of the following:
Metadata in ZooKeeper got deleted somehow.
ZooKeeper was not up when ClickHouse was trying to come up.
Either way, I want to recover from the error, and the docs suggest the steps below:
To start recovery, create the node /path_to_table/replica_name/flags/force_restore_data in ZooKeeper
with any content, or run the command to restore all replicated tables:
sudo -u clickhouse touch /var/lib/clickhouse/flags/force_restore_data
Then restart the server. On start, the server deletes these flags and starts recovery.
But I am not able to understand where I should run this command. I looked inside the ClickHouse container under /var/lib/clickhouse and there is no flags directory there. Should I create it first?
Also, is there a way to recover from this error without restarting the server? I would rather avoid a container restart.
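For context, this is roughly how I understand the two documented options, using the replica path and ZooKeeper address that show up in the logs below (the exact znode path, and whether SYSTEM RESTART REPLICA is available on this ClickHouse version, are assumptions on my part, so treat this as a sketch rather than a verified fix):
# Option 1: create the force_restore_data flag for the affected replica directly in ZooKeeper
# (the flags parent znode should already exist for the replica; create it first if it does not)
zkCli.sh -server 172.16.0.28:2181 create /clickhouse/tables/shard_0/people/replicas/replica_0732646014/flags/force_restore_data ""
# Option 2: try to re-initialize the replica's ZooKeeper session without a full server restart
# (SYSTEM RESTART REPLICA may not exist in 19.13 -- this is an assumption)
clickhouse-client -q "SYSTEM RESTART REPLICA DB_0.people"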
Attaching a few relevant logs from before the read-only exception:
2020.06.19 16:49:02.789216 [ 13 ] {} <Error> DB_0.people (ReplicatedMergeTreeRestartingThread): Couldn't start replication: Replica /clickhouse/tables/shard_0/people/replicas/replica_0732646014 appears to be already active. If you're sure it's not, try again in a minute or remove znode /clickhouse/tables/shard_0/people/replicas/replica_0732646014/is_active manually, DB::Exception: Replica /clickhouse/tables/shard_0/people/replicas/replica_0732646014 appears to be already active. If you're sure it's not, try again in a minute or remove znode /clickhouse/tables/shard_0/people/replicas/replica_0732646014/is_active manually, stack trace:
2020.06.19 16:49:13.576855 [ 17 ] {} <Error> DB_0.school (ReplicatedMergeTreeRestartingThread): Couldn't start replication: Replica /clickhouse/tables/shard_0/school/replicas/replica_0732646014 appears to be already active. If you're sure it's not, try again in a minute or remove znode /clickhouse/tables/shard_0/school/replicas/replica_0732646014/is_active manually, DB::Exception: Replica /clickhouse/tables/shard_0/school/replicas/replica_0732646014 appears to be already active. If you're sure it's not, try again in a minute or remove znode /clickhouse/tables/shard_0/school/replicas/replica_0732646014/is_active manually, stack trace:
2020.06.19 16:49:23.497824 [ 19 ] {} <Error> DB_0.people (ReplicatedMergeTreeRestartingThread): Couldn't start replication: Replica /clickhouse/tables/shard_0/people/replicas/replica_0732646014 appears to be already active. If you're sure it's not, try again in a minute or remove znode /clickhouse/tables/shard_0/people/replicas/replica_0732646014/is_active manually, DB::Exception: Replica /clickhouse/tables/shard_0/people/replicas/replica_0732646014 appears to be already active. If you're sure it's not, try again in a minute or remove znode /clickhouse/tables/shard_0/people/replicas/replica_0732646014/is_active manually, stack trace:
2020.06.19 16:49:23.665089 [ 20 ] {} <Error> DB_0.school (ReplicatedMergeTreeRestartingThread): Couldn't start replication: Replica /clickhouse/tables/shard_0/school/replicas/replica_0732646014 appears to be already active. If you're sure it's not, try again in a minute or remove znode /clickhouse/tables/shard_0/school/replicas/replica_0732646014/is_active manually, DB::Exception: Replica /clickhouse/tables/shard_0/school/replicas/replica_0732646014 appears to be already active. If you're sure it's not, try again in a minute or remove znode /clickhouse/tables/shard_0/school/replicas/replica_0732646014/is_active manually, stack trace:
2020.06.19 16:49:59.703591 [ 41 ] {} <Error> void Coordination::ZooKeeper::receiveThread(): Code: 999, e.displayText() = Coordination::Exception: Operation timeout (no response) for path: /clickhouse/tables/shard_0/school/blocks (Operation timeout), Stack trace:
2020.06.19 16:49:59.847751 [ 18 ] {} <Error> DB_0.people: void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/shard_0/people/mutations, Stack trace:
2020.06.19 16:50:00.205911 [ 19 ] {} <Warning> DB_0.school (ReplicatedMergeTreeRestartingThread): ZooKeeper session has expired. Switching to a new session.
2020.06.19 16:50:00.315063 [ 19 ] {} <Error> zkutil::EphemeralNodeHolder::~EphemeralNodeHolder(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), Stack trace:
2020.06.19 16:50:00.338176 [ 15 ] {} <Error> DB_0.people: void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), Stack trace:
2020.06.19 16:50:00.387589 [ 16 ] {} <Error> DB_0.school: void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/shard_0/school/log, Stack trace:
2020.06.19 16:50:00.512689 [ 17 ] {} <Error> zkutil::EphemeralNodeHolder::~EphemeralNodeHolder(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), Stack trace:
2020.06.19 16:50:20.753596 [ 47 ] {} <Error> void DB::DDLWorker::runMainThread(): Code: 999, e.displayText() = Coordination::Exception: All connection tries failed while connecting to ZooKeeper. Addresses: 172.16.0.28:2181
Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = Timeout: connect timed out: 172.16.0.28:2181 (version 19.13.1.11 (official build)), 172.16.0.28:2181
Code: 209, e.displayText() = DB::NetException: Timeout exceeded while reading from socket (172.16.0.28:2181): while receiving handshake from ZooKeeper (version 19.13.1.11 (official build)), 172.16.0.28:2181
Code: 209, e.displayText() = DB::NetException: Timeout exceeded while reading from socket (172.16.0.28:2181): while receiving handshake from ZooKeeper (version 19.13.1.11 (official build)), 172.16.0.28:2181
(Connection loss), Stack trace:
2020.06.19 16:50:31.499775 [ 51 ] {} <Error> void DB::AsynchronousMetrics::update(): Cannot get replica delay for table: DB_0.people: Code: 242, e.displayText() = DB::Exception: Table is in readonly mode, Stack trace:
Edit: I did manage to find the folder where flags is present (it's in my volume, /repo/data), but when I try to run the command
sudo -u clickhouse touch /repo/data/flags/force_restore_data
I got this:
Use one of the following commands:
clickhouse local [args]
clickhouse client [args]
clickhouse benchmark [args]
clickhouse server [args]
clickhouse extract-from-config [args]
clickhouse compressor [args]
clickhouse format [args]
clickhouse copier [args]
clickhouse obfuscator [args]
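From what I can tell, that output is the usage text of the clickhouse multi-tool binary, so whatever actually ran was the clickhouse executable rather than the coreutils touch (possibly touch is missing or shadowed inside the image, I am not sure). A minimal sketch of creating the flag with plain shell instead, assuming the data directory inside the container is /repo/data and the server runs as the clickhouse user (the container name is a placeholder):
docker exec -it <clickhouse-container> bash
: > /repo/data/flags/force_restore_data                            # create the empty flag file
chown clickhouse:clickhouse /repo/data/flags/force_restore_data    # make sure the server user owns it
Per the docs quoted above, a server restart is still needed afterwards for the flag to be picked up.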

Related

Cannot allocate block number in ZooKeeper: Coordination::Exception: Connection loss

When I write data into my ClickHouse using JDBC for 10,000,000 records, it throws an exception.
And I find my ZooKeeper is still alive. Could anyone tell me why the connection is lost?
Thank you.
i: 6999998
i: 6999999
i: 7000000
Wed Jul 07 17:25:40 CST 2021
ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 999, host: 10.10.1.1, port: 8123; Code: 999, e.displayText() = DB::Exception: Cannot allocate block number in ZooKeeper: Coordination::Exception: Connection loss (version 21.6.3.14 (official build))
at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:59)
at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:29)
at ru.yandex.clickhouse.ClickHouseStatementImpl.checkForErrorAndThrow(ClickHouseStatementImpl.java:1094)
at ru.yandex.clickhouse.ClickHouseStatementImpl.sendStream(ClickHouseStatementImpl.java:1061)
at ru.yandex.clickhouse.ClickHouseStatementImpl.sendStream(ClickHouseStatementImpl.java:1026)
at ru.yandex.clickhouse.ClickHouseStatementImpl.sendStream(ClickHouseStatementImpl.java:1019)
at ru.yandex.clickhouse.ClickHousePreparedStatementImpl.executeBatch(ClickHousePreparedStatementImpl.java:381)
at ru.yandex.clickhouse.ClickHousePreparedStatementImpl.executeBatch(ClickHousePreparedStatementImpl.java:364)
at com.chris.slient.clickhouse.App.test11(App.java:304)
at com.chris.slient.clickhouse.App.main(App.java:441)
Caused by: java.lang.Throwable: Code: 999, e.displayText() = DB::Exception: Cannot allocate block number in ZooKeeper: Coordination::Exception: Connection loss (version 21.6.3.14 (official build))
at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:54)
... 9 more
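Is there something on the ZooKeeper side I should be checking for this? For example, something like the following (the ZooKeeper host is a placeholder here, and on ZooKeeper 3.5+ the four-letter-word commands may need to be whitelisted via 4lw.commands.whitelist):
echo ruok | nc <zookeeper-host> 2181    # should answer "imok" if the server is responsive
echo stat | nc <zookeeper-host> 2181    # shows client connection counts, e.g. to compare against maxClientCnxns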

Kafka stream app failing to fetch offsets for partition

I created a Kafka cluster with 3 brokers and the following details:
Created 3 topics, each one with replication factor=3 and partitions=2.
Created 2 producers each one writing to one of the topics.
Created a Streams application to process messages from 2 topics and write to the 3rd topic.
It was all running fine till now but I suddenly started getting the following warning when starting the Streams application:
[WARN ] 2018-06-08 21:16:49.188 [Stream3-4f7403ad-aba6-4d34-885d-60114fc9fcff-StreamThread-1] org.apache.kafka.clients.consumer.internals.Fetcher [Consumer clientId=Stream3-4f7403ad-aba6-4d34-885d-60114fc9fcff-StreamThread-1-restore-consumer, groupId=] Attempt to fetch offsets for partition Stream3-KSTREAM-OUTEROTHER-0000000005-store-changelog-0 failed due to: Disk error when trying to access log file on the disk.
Due to this warning, the Streams application is not processing anything from the 2 topics.
I tried the following things:
Stopped all brokers, deleted kafka-logs directory for each broker and restarted the brokers. It didn't solve the issue.
Stopped zookeeper and all brokers, deleted zookeeper logs as well as kafka-logs for each broker, restarted zookeeper and brokers and created the topics again. This too didn't solve the issue.
I am not able to find anything related to this error in the official docs or on the web. Does anyone have an idea of why I am suddenly getting this error?
EDIT:
Out of the 3 brokers, 2 brokers (broker-0 and broker-2) continuously emit these logs:
Broker-0 logs:
[2018-06-09 02:03:08,750] INFO [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Retrying leaderEpoch request for partition initial11_topic-1 as the leader reported an error: NOT_LEADER_FOR_PARTITION (kafka.server.ReplicaFetcherThread)
[2018-06-09 02:03:08,750] INFO [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Retrying leaderEpoch request for partition initial12_topic-0 as the leader reported an error: NOT_LEADER_FOR_PARTITION (kafka.server.ReplicaFetcherThread)
Broker-2 logs:
[2018-06-09 02:04:46,889] INFO [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Retrying leaderEpoch request for partition initial11_topic-1 as the leader reported an error: NOT_LEADER_FOR_PARTITION (kafka.server.ReplicaFetcherThread)
[2018-06-09 02:04:46,889] INFO [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Retrying leaderEpoch request for partition initial12_topic-0 as the leader reported an error: NOT_LEADER_FOR_PARTITION (kafka.server.ReplicaFetcherThread)
Broker-1 shows following logs:
[2018-06-09 01:21:26,689] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-06-09 01:31:26,689] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-06-09 01:39:44,667] ERROR [KafkaApi-1] Number of alive brokers '0' does not meet the required replication factor '1' for the offsets topic (configured via 'offsets.topic.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)
[2018-06-09 01:41:26,689] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
I again stopped zookeeper and brokers, deleted their logs and restarted. As soon as I create the topics again, I start getting the above logs.
Topic details:
[zk: localhost:2181(CONNECTED) 3] get /brokers/topics/initial11_topic
{"version":1,"partitions":{"1":[1,0,2],"0":[0,2,1]}}
cZxid = 0x53
ctime = Sat Jun 09 01:25:42 EDT 2018
mZxid = 0x53
mtime = Sat Jun 09 01:25:42 EDT 2018
pZxid = 0x54
cversion = 1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 52
numChildren = 1
[zk: localhost:2181(CONNECTED) 4] get /brokers/topics/initial12_topic
{"version":1,"partitions":{"1":[2,1,0],"0":[1,0,2]}}
cZxid = 0x61
ctime = Sat Jun 09 01:25:47 EDT 2018
mZxid = 0x61
mtime = Sat Jun 09 01:25:47 EDT 2018
pZxid = 0x62
cversion = 1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 52
numChildren = 1
[zk: localhost:2181(CONNECTED) 5] get /brokers/topics/final11_topic
{"version":1,"partitions":{"1":[0,1,2],"0":[2,0,1]}}
cZxid = 0x48
ctime = Sat Jun 09 01:25:32 EDT 2018
mZxid = 0x48
mtime = Sat Jun 09 01:25:32 EDT 2018
pZxid = 0x4a
cversion = 1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 52
numChildren = 1
Any clue?
I found the issue. It was due to the following incorrect config in server.properties of broker-1:
advertised.listeners=PLAINTEXT://10.23.152.109:9094
The port in broker-1's advertised.listeners had mistakenly been changed to the same port used in broker-2's advertised.listeners.
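In other words, each broker has to advertise an address that actually points at itself, and brokers that share a host must use distinct ports. As an illustration only (these ports are made up, not my real values, and I am assuming the brokers run on the same host; if they run on different hosts, each entry should use that broker's own reachable address), the per-broker server.properties would look roughly like:
broker-0 server.properties: advertised.listeners=PLAINTEXT://10.23.152.109:9092
broker-1 server.properties: advertised.listeners=PLAINTEXT://10.23.152.109:9093
broker-2 server.properties: advertised.listeners=PLAINTEXT://10.23.152.109:9094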

elasticsearch nest SniffingConnectionPool not working

I'm using Nest.ElasticClient to connect to an Elasticsearch cluster. The cluster is located in an Azure VM with just one node.
The cluster is accessible outside the VM at the URL http://xxxx.cloudapp.net:9200, and it is also accessible by ElasticClient when not using SniffingConnectionPool, but it is not accessible by ElasticClient when using SniffingConnectionPool.
Here is the network config:
network.host: [_local_, _site_]
Below is the source code I'm using to get the client and check whether an index exists.
var pool = new SniffingConnectionPool(urls.Select(url => new Uri(url)));
ConnectionSettings config = new ConnectionSettings(pool);
client = new Nest.ElasticClient(config);
IExistsResponse indexExistsResponse = client.IndexExists(indexName);
Here is the debug info message when I try to use the client to check whether an index exists (the hostname and IP address are modified):
Invalid NEST response built from a unsuccessful low level call on HEAD: /globalleads
# Audit trail of this API call:
- SniffOnStartup: Took: 00:00:00.9846171
- SniffSuccess: Node: http://xxxx.cloudapp.net:9200/ Took: 00:00:00.9595496
- PingFailure: Node: http://10.85.xxx.xx:9200/ Exception: PipelineException Took: 00:00:21.4154660
- SniffOnFail: Took: 00:00:21.1967809
- SniffFailure: Node: http://10.85.xxx.xx:9200/ Exception: PipelineException Took: 00:00:21.1787333
# OriginalException: Elasticsearch.Net.ElasticsearchClientException: One or more errors occurred. ---> System.AggregateException: One or more errors occurred. ---> Elasticsearch.Net.PipelineException: Failed sniffing cluster state. ---> System.AggregateException: One or more errors occurred. ---> Elasticsearch.Net.PipelineException: An error occurred trying to establish a connection with the specified node.
at Elasticsearch.Net.RequestPipeline.Sniff() in D:\dev\git\elasticsearch-net-2.x\src\Elasticsearch.Net\Transport\Pipeline\RequestPipeline.cs:line 326
--- End of inner exception stack trace ---
--- End of inner exception stack trace ---
at Elasticsearch.Net.RequestPipeline.Sniff() in D:\dev\git\elasticsearch-net-2.x\src\Elasticsearch.Net\Transport\Pipeline\RequestPipeline.cs:line 341
at Elasticsearch.Net.RequestPipeline.SniffOnConnectionFailure() in D:\dev\git\elasticsearch-net-2.x\src\Elasticsearch.Net\Transport\Pipeline\RequestPipeline.cs:line 301
at Elasticsearch.Net.Transport`1.Ping(IRequestPipeline pipeline, Node node) in D:\dev\git\elasticsearch-net-2.x\src\Elasticsearch.Net\Transport\Transport.cs:line 179
at Elasticsearch.Net.Transport`1.Request[TReturn](HttpMethod method, String path, PostData`1 data, IRequestParameters requestParameters) in D:\dev\git\elasticsearch-net-2.x\src\Elasticsearch.Net\Transport\Transport.cs:line 68
--- End of inner exception stack trace ---
--- End of inner exception stack trace ---
# Audit exception in step 2 PingFailure:
Elasticsearch.Net.PipelineException: An error occurred trying to establish a connection with the specified node.
at Elasticsearch.Net.RequestPipeline.Ping(Node node) in D:\dev\git\elasticsearch-net-2.x\src\Elasticsearch.Net\Transport\Pipeline\RequestPipeline.cs:line 248
# Audit exception in step 4 SniffFailure:
Elasticsearch.Net.PipelineException: An error occurred trying to establish a connection with the specified node.
at Elasticsearch.Net.RequestPipeline.Sniff() in D:\dev\git\elasticsearch-net-2.x\src\Elasticsearch.Net\Transport\Pipeline\RequestPipeline.cs:line 326
# Request:
<Request stream not captured or already read to completion by serializer. Set DisableDirectStreaming() on ConnectionSettings to force it to be set on the response.>
# Response:
<Response stream not captured or already read to completion by serializer. Set DisableDirectStreaming() on ConnectionSettings to force it to be set on the response.>

Elasticsearch Debugging

Our Elasticsearch is a mess. The cluster health is always red, and I've decided to look into it and salvage it if possible, but I have no idea where to begin. Here is some info regarding our cluster:
{
"cluster_name" : "elasticsearch",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 6,
"number_of_data_nodes" : 6,
"active_primary_shards" : 91,
"active_shards" : 91,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 201,
"number_of_pending_tasks" : 0
}
The 6 nodes:
host ip heap.percent ram.percent load node.role master name
es04e.p.comp.net 10.0.22.63 30 22 0.00 d m es04e-es
es06e.p.comp.net 10.0.21.98 20 15 0.37 d m es06e-es
es08e.p.comp.net 10.0.23.198 9 44 0.07 d * es08e-es
es09e.p.comp.net 10.0.32.233 62 45 0.00 d m es09e-es
es05e.p.comp.net 10.0.65.140 18 14 0.00 d m es05e-es
es07e.p.comp.net 10.0.11.69 52 45 0.13 d m es07e-es
Straight away you can see I have a very large number of unassigned shards (201). I came across this answer, tried it, and got 'acknowledged: true', but there was no change in either of the above sets of info.
Next I logged into one of the nodes, es04, and went through the log files. The first log file has a few lines that caught my attention:
[2015-05-21 19:44:51,561][WARN ][transport.netty ] [es04e-es] exception caught on transport layer [[id: 0xbceea4eb]], closing connection
and
[2015-05-26 15:14:43,157][INFO ][cluster.service ] [es04e-es] removed {[es03e-es][R8sz5RWNSoiJ2zm7oZV_xg][es03e.p.sojern.net][inet[/10.0.2.16:9300]],}, reason: zen-disco-receive(from master [[es01e-es][JzkWq9qwQSGdrWpkOYvbqQ][es01e.p.sojern.net][inet[/10.0.2.237:9300]]])
[2015-05-26 15:22:28,721][INFO ][cluster.service ] [es04e-es] removed {[es02e-es][XZ5TErowQfqP40PbR-qTDg][es02e.p.sojern.net][inet[/10.0.2.229:9300]],}, reason: zen-disco-receive(from master [[es01e-es][JzkWq9qwQSGdrWpkOYvbqQ][es01e.p.sojern.net][inet[/10.0.2.237:9300]]])
[2015-05-26 15:32:00,448][INFO ][discovery.ec2 ] [es04e-es] master_left [[es01e-es][JzkWq9qwQSGdrWpkOYvbqQ][es01e.p.sojern.net][inet[/10.0.2.237:9300]]], reason [shut_down]
[2015-05-26 15:32:00,449][WARN ][discovery.ec2 ] [es04e-es] master left (reason = shut_down), current nodes: {[es07e-es][etJN3eOySAydsIi15sqkSQ][es07e.p.sojern.net][inet[/10.0.2.69:9300]],[es04e-es][3KFMUFvzR_CzWRddIMdpBg][es04e.p.sojern.net][inet[/10.0.1.63:9300]],[es05e-es][ZoLnYvAdTcGIhbcFRI3H_A][es05e.p.sojern.net][inet[/10.0.1.140:9300]],[es08e-es][FPa4q07qRg-YA7hAztUj2w][es08e.p.sojern.net][inet[/10.0.2.198:9300]],[es09e-es][4q6eACbOQv-TgEG0-Bye6w][es09e.p.sojern.net][inet[/10.0.2.233:9300]],[es06e-es][zJ17K040Rmiyjf2F8kjIiQ][es06e.p.sojern.net][inet[/10.0.1.98:9300]],}
[2015-05-26 15:32:00,450][INFO ][cluster.service ] [es04e-es] removed {[es01e-es][JzkWq9qwQSGdrWpkOYvbqQ][es01e.p.sojern.net][inet[/10.0.2.237:9300]],}, reason: zen-disco-master_failed ([es01e-es][JzkWq9qwQSGdrWpkOYvbqQ][es01e.p.sojern.net][inet[/10.0.2.237:9300]])
[2015-05-26 15:32:36,741][INFO ][cluster.service ] [es04e-es] new_master [es04e-es][3KFMUFvzR_CzWRddIMdpBg][es04e.p.sojern.net][inet[/10.0.1.63:9300]], reason: zen-disco-join (elected_as_master)
In this section I realized there were a few nodes (es01, es02, es03) which had been removed.
After this, all log files (around 30 of them) have only one line:
[2015-05-26 15:43:49,971][DEBUG][action.bulk ] [es04e-es] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]
I have checked all the nodes and they have the same version of ES and Logstash. I realize this is a big, complicated issue, but if anyone can figure out the problem and nudge me in the right direction it would be a HUGE help.
I believe this might be because at some point you had a split-brain issue and there were 2 versions of the same shard in 2 clusters. One or both might have received different sets of data, and 2 versions of the shard might have come into existence. At some point you might have restarted the whole system and some shards might have gone into the red state.
First, see if there is data loss; if there is, the aforementioned case could be the reason. Next, make sure you set minimum master nodes to N/2+1 (N is the number of master-eligible nodes), so that this issue won't surface again.
You can use the shard reroute API on the red shards and see if they move out of the red state. You might lose the shard data here, but that is the only way I have seen to bring the cluster state back to green.
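The reroute call looks roughly like the following; the index name and shard number are placeholders, and on 1.x/2.x the command is allocate with allow_primary, while newer versions use allocate_empty_primary instead (either way you are accepting data loss for that shard):
curl -XPOST 'http://localhost:9200/_cluster/reroute' -d '{
  "commands": [
    { "allocate": { "index": "<index-name>", "shard": 0, "node": "es04e-es", "allow_primary": true } }
  ]
}'
And for the minimum master nodes setting, with the 6 master-eligible nodes shown above that works out to 4, set in elasticsearch.yml on each node:
discovery.zen.minimum_master_nodes: 4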
Please try installing the elasticsearch-head plugin to check shard status; you will be able to see which shards are corrupted.
Try the flush or optimize option.
Also, restarting Elasticsearch sometimes works.

Why Impala not working on hbase table?

I created an external table B over HBase table A using Hive. I can successfully access the data of B. Then I followed the official guide and typed in the Impala shell:
invalidate metadata B;
And then I query this external table B in Impala Shell:
select * from B limit 4;
but it outputs:
ERROR: RuntimeException: couldn't retrieve HBase table (mv_p2pusers) info:
Enable/Disable failed
Here are some related logs:
11:13:58.937 AM INFO jni-util.cc:177
java.lang.RuntimeException: couldn't retrieve HBase table (mv_p2pusers) info:
Enable/Disable failed
at com.cloudera.impala.planner.HBaseScanNode.computeScanRangeLocations(HBaseScanNode.java:300)
at com.cloudera.impala.planner.HBaseScanNode.init(HBaseScanNode.java:125)
at com.cloudera.impala.planner.SingleNodePlanner.createScanNode(SingleNodePlanner.java:891)
at com.cloudera.impala.planner.SingleNodePlanner.createTableRefNode(SingleNodePlanner.java:1082)
at com.cloudera.impala.planner.SingleNodePlanner.createSelectPlan(SingleNodePlanner.java:526)
at com.cloudera.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:151)
at com.cloudera.impala.planner.SingleNodePlanner.createSingleNodePlan(SingleNodePlanner.java:117)
at com.cloudera.impala.planner.Planner.createPlan(Planner.java:47)
at com.cloudera.impala.service.Frontend.createExecRequest(Frontend.java:842)
at com.cloudera.impala.service.JniFrontend.createExecRequest(JniFrontend.java:146)
11:13:58.939 AM INFO status.cc:114
RuntimeException: couldn't retrieve HBase table (mv_p2pusers) info:
Enable/Disable failed
# 0x78b793 (unknown)
# 0xa68275 (unknown)
# 0x9802c6 (unknown)
# 0x99db78 (unknown)
# 0x99e6e4 (unknown)
# 0x9d50cb (unknown)
# 0xb33687 (unknown)
# 0xb29054 (unknown)
# 0x9ac52b (unknown)
# 0x1571c39 (unknown)
# 0x155d9cf (unknown)
# 0x155f914 (unknown)
# 0x92d363 (unknown)
# 0x92daca (unknown)
# 0xaa4faa (unknown)
# 0xaa7130 (unknown)
# 0xca79b3 (unknown)
# 0x386be079d1 (unknown)
# 0x386bae8b6d (unknown)
11:13:58.940 AM INFO impala-server.cc:824
UnregisterQuery(): query_id=d4269ff898eb4e7:1866144af0d14a7
11:13:58.940 AM INFO impala-server.cc:893
Cancel(): query_id=d4269ff898eb4e7:1866144af0d14a7
11:13:59.935 AM INFO ClientCnxn.java:975
Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
11:13:59.935 AM WARN ClientCnxn.java:1102
Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
Java exception follows:
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
11:14:01.036 AM INFO ClientCnxn.java:975
Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
11:14:01.037 AM WARN ClientCnxn.java:1102
Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
Java exception follows:
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
11:14:02.138 AM INFO ClientCnxn.java:975
Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
11:14:02.138 AM WARN ClientCnxn.java:1102
Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
Java exception follows:
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
11:14:02.199 AM INFO impala-hs2-server.cc:795
GetSchemas(): request=TGetSchemasReq {
01: sessionHandle (struct) = TSessionHandle {
01: sessionId (struct) = THandleIdentifier {
01: guid (string) = "\xf8\xb9n\xe4\xb4\xf6N\xef\xad)9W.\x92#Y",
02: secret (string) = "\xc0?\xc7\xd9\x930C\x9b\xb5\xf6K\x8em\xcb\xf8\xe4",
},
},
}
11:14:02.203 AM INFO MetadataOp.java:414
Returning 19 schemas
It seems the HBase table B is neither enabled nor disabled, which is very strange. I googled around; is this related to HBase security issues or an Impala version problem?
Has anybody encountered the same problem? How can I solve this? Thanks in advance.
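Is there a way to confirm the table state directly? For example, I assume something like this from the HBase shell would show whether the table is stuck between states (I have not verified that this is actually the root cause):
echo "is_enabled 'mv_p2pusers'" | hbase shell
echo "is_disabled 'mv_p2pusers'" | hbase shell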
Enable the HBase service from the Impala configuration.
You can do it from Cloudera Manager: Impala -> Configuration,
search for "HBase" and enable the service.
