I installed HDP with Ambari 2.6.1. Most of the setup completed automatically, but Hive is unable to start properly.
Following a post I found, I deleted the pid file and killed the process, hoping that a restart would fix it, but now Ambari shows "heartbeat lost" for the machine.
What should I do?
Here are the errors I got from Ambari.
1. Hive Metastore:
Metastore on machine2.ambari.local failed (Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py", line 203, in execute timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__ self.env.run()
File "/...
2. HiveServer2:
Connection failed on host machine2.ambari.local:10000 (Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py", line 212, in execute ldap_password=ldap_password)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/hive_check.py", line 81, in check_thrift_port_sasl timeou...
... awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL'' returned 1.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://machine2.ambari.local:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0) )
There are several separate issues in this question.
HiveServer2 alert: The following error indicates that port 10000 on machine2.ambari.local is not reachable. Either no process (HiveServer2) is listening on port 10000, or there is a proxy issue.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://machine2.ambari.local:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused)
Heartbeat Lost: The question mentions "heartbeat lost on the machine". This could be because the ambari-agent process is no longer running on the host for which the lost heartbeat is reported.
After you delete the pid file and kill the process, it will not restart automatically by default, so it has to be started again manually.
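To confirm whether anything is listening on the HiveServer2 port, a quick TCP check from another host can help. The following is a minimal sketch, not part of Ambari: the function name port_is_open is made up for illustration, and the host and port in the usage note come from the JDBC URI in the error above.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        # create_connection performs the name lookup and the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and name-resolution failures.
        return False
```

Calling port_is_open("machine2.ambari.local", 10000) from the Ambari server host should return False while HiveServer2 is down and True once it is listening again. A False result alone does not distinguish a stopped service from a firewall or proxy problem.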
I'm trying to start up OBIEE 12c.
The database is up and running,
but start.cmd fails while starting the Admin Server with this error message:
"""""
Starting AdminServer ...
Unable to connect to AdminServer on host: ***********
Failed to start one or more Servers
/Servers/AdminServer/ListenPort=9500
Accessing admin server using URL t3://*************
Start Admin Server connect Exception caught Error occurred while performing connect : Error getting the initial context. There is no server running at t3://************* : Failed to initialize JNDI context, tried 2 time or times totally, the interval of each time is 0ms.
t3://*************: Destination **********, 9500 unreachable.; nested exception is:
java.net.ConnectException: Connection refused: connect; No available router to destination.; nested exception is:
java.rmi.ConnectException: No available router to destination.
Use dumpStack() to view the full stacktrace :
Reading domain...
Error: runCmd() failed. Do dumpStack() to see details.
Failed to get Status of Servers and System Components
"""""
Start.cmd error message
Also, checking the status produces this error:
"""""
Start Admin Server connect Exception caught Error occurred while performing connect : Error getting the initial context. There is no server running at t3://************* : Failed to initialize JNDI context, tried 2 time or times totally, the interval of each time is 0ms.
t3://*************: Destination 10.10.3.88, 9500 unreachable.; nested exception is:
java.net.ConnectException: Connection refused: connect; No available router to destination.; nested exception is:
java.rmi.ConnectException: No available router to destination.
Use dumpStack() to view the full stacktrace :
Reading domain...
Error: runCmd() failed. Do dumpStack() to see details.
Problem invoking WLST - Traceback (innermost last):
File "F:\Middleware\bi\modules\oracle.bi.sysman\scripts\status_servers.py", line 29, in ?
File "F:\Middleware\bi\modules\oracle.bi.sysman\scripts\process_control.py", line 581, in statusComponents
File "F:\Middleware\bi\modules\oracle.bi.sysman\scripts\process_control.py", line 455, in outputComponentsStatus
File "F:\Middleware\bi\modules\oracle.bi.sysman\scripts\process_control.py", line 243, in connectAdminServer
File "F:\Middleware\bi\modules\oracle.bi.sysman\scripts\process_control.py", line 179, in requestCredentialsAndConnectToAdminServer
File "F:\Middleware\bi\modules\oracle.bi.sysman\scripts\process_control.py", line 513, in getAdminServerUrl
File "C:\Users\Administrator\AppData\Local\Temp\1\WLSTOfflineIni5666259588766029647.py", line 131, in readDomain
File "C:\Users\Administrator\AppData\Local\Temp\1\WLSTOfflineIni5666259588766029647.py", line 19, in command
60713: Attempt to execute command "readDomain" in invalid state: Configuration
"""""
STATUS ERROR MESSAGE
I'm trying to configure a local Logstash appender for Spring Boot in YAML format.
Here it is:
appenders:
  logstash:
    enabled: true
    input-type: tcp
    destination: 127.0.0.1:4560
After starting the app, I see this in the console:
WARN in net.logstash.logback.appender.LogstashTcpSocketAppender[logstash] - Log destination 127.0.0.1:5010: connection failed. java.net.ConnectException: Connection refused (Connection refused)
at java.net.ConnectException: Connection refused (Connection refused)
I checked all listeners and the port is not in use. It seems like it should be easy to fix, but I don't know how.
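One detail worth checking: the config names port 4560, while the warning mentions port 5010. That may mean the application is reading a different configuration than the one shown, although the post does not confirm this. The sketch below only illustrates the comparison; the constant and function names are made up for this example.

```python
import re

# Destination from the YAML config shown in the question.
CONFIGURED_DESTINATION = "127.0.0.1:4560"

# Warning line copied from the console output in the question.
WARNING_LINE = (
    "WARN in net.logstash.logback.appender.LogstashTcpSocketAppender[logstash] - "
    "Log destination 127.0.0.1:5010: connection failed. "
    "java.net.ConnectException: Connection refused (Connection refused)"
)


def destination_in_warning(line):
    """Return the 'host:port' the appender tried to reach, or None if absent."""
    match = re.search(r"Log destination (\S+:\d+):", line)
    return match.group(1) if match else None


attempted = destination_in_warning(WARNING_LINE)
if attempted != CONFIGURED_DESTINATION:
    print(f"configured {CONFIGURED_DESTINATION}, but the appender tried {attempted}")
```

If the two values differ, as they do here, it is worth checking which configuration file the application actually loads before looking at the network.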
We have an issue with the ambari-metrics-collector service (HDP cluster version 2.6.4 with 8 nodes).
The Ambari Metrics Collector service either cannot start, or starts and then fails after a few seconds.
Details about the Metrics Collector version:
rpm -qa | grep metrics
ambari-metrics-grafana-2.6.1.0-143.x86_64
ambari-metrics-monitor-2.6.1.0-143.x86_64
ambari-metrics-collector-2.5.0.3-7.x86_64
ambari-metrics-hadoop-sink-2.6.1.0-143.x86_64
All machines run RHEL 7.2.
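In the package list above, the collector reports 2.5.0.3-7 while the other three packages report 2.6.1.0-143. Whether this mismatch is related to the failure is not established here, but it is easy to overlook. The following is a minimal sketch that groups the listed packages by version; the function names are illustrative only.

```python
from collections import defaultdict

# Output of `rpm -qa | grep metrics` as listed above.
RPM_OUTPUT = """\
ambari-metrics-grafana-2.6.1.0-143.x86_64
ambari-metrics-monitor-2.6.1.0-143.x86_64
ambari-metrics-collector-2.5.0.3-7.x86_64
ambari-metrics-hadoop-sink-2.6.1.0-143.x86_64
"""


def split_package(line):
    """Split 'name-version-release.arch' into (name, 'version-release')."""
    base, _, _arch = line.rpartition(".")         # drop the architecture suffix
    name, version, release = base.rsplit("-", 2)  # version and release contain no '-'
    return name, f"{version}-{release}"


def group_by_version(rpm_output):
    """Map each 'version-release' string to the package names that carry it."""
    groups = defaultdict(list)
    for line in rpm_output.split():
        name, version = split_package(line)
        groups[version].append(name)
    return dict(groups)


for version, names in sorted(group_by_version(RPM_OUTPUT).items()):
    print(version, ", ".join(names))
```

More than one group in the output means the Ambari Metrics packages on that host are not all at the same version.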
We performed the following steps to try to resolve the problem:
1. Restart the metrics-collector service
su - ams -c '/usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf/ stop'
su - ams -c '/usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf/ start'
or
ambari-metrics-collector stop
ambari-metrics-collector start
2. Restart ambari-metrics-monitor on all nodes
ambari-metrics-monitor stop
ambari-metrics-monitor start
3. Clean the folder /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/
mv /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/zookeeper_0 /tmp/bck/zookeeper/
Then restart the metrics-collector service
4. Tune the metrics-collector parameters according to https://docs.cloudera.com/HDPDocuments/Ambari-2.2.1.0/bk_ambari_reference_guide/content/_ams_general_guidelines.html
We updated the following parameters in Ambari:
metrics_collector_heap_size=1024
hbase_regionserver_heapsize=1024
hbase_master_heapsize=512
hbase_master_xmn_size=128
Status for now: steps 1-4 did not help.
From the logs we can see the following:
Log file: ambari-metrics-collector.log
2020-06-25 09:06:14,474 WARN org.apache.zookeeper.ClientCnxn: Session 0x172eab71f310002 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2020-06-25 09:06:14,575 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=master02.sys671.com:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-unsecure/meta-region-server
Log file: hbase-ams-master-master02.sys671.com.log
2020-06-25 09:38:18,799 WARN [RS:0;master02:51842-SendThread(master02.sys671.com:61181)] zookeeper.ClientCnxn: Session 0x172ead5d73a0004 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2020-06-25 09:38:20,437 INFO [main-SendThread(master02.sys671.com:61181)] zookeeper.ClientCnxn: Opening socket connection to server master02.sys671.com/23.2.35.171:61181. Will not attempt to authenticate using SASL (unknown error)
2020-06-25 09:38:20,438 WARN [main-SendThread(master02.sys671.com:61181)] zookeeper.ClientCnxn: Session 0x172ead5d73a0002 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
We also do not see the port listening (timeline.metrics.service.webapp.address):
netstat -tulpn | grep 6188
Any advice on how to continue from this point?
We would appreciate any help with this problem.
Using the kafka-lenses-dev image, I ran into problems connecting Kafka to the local Elasticsearch 7.1.1 instance. As you can see, the connection is refused, although the instance is up and running and it's possible to curl docs in. Is the version of Elastic problematic for the connector or is the config wrong?
Config
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
type.name=kafka-connect
key.converter.schemas.enable=false
topics=cc_data
name=elastic-sink
value.converter.schemas.enable=false
connection.url=http://localhost:9200
key.ignore=true
Error
org.apache.kafka.connect.errors.ConnectException: Couldn't start ElasticsearchSinkTask due to connection error:
at io.confluent.connect.elasticsearch.jest.JestElasticsearchClient.<init>(JestElasticsearchClient.java:147)
at io.confluent.connect.elasticsearch.jest.JestElasticsearchClient.<init>(JestElasticsearchClient.java:114)
at io.confluent.connect.elasticsearch.ElasticsearchSinkTask.start(ElasticsearchSinkTask.java:120)
...
Caused by: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://localhost:9200
at io.searchbox.client.http.JestHttpClient.execute(JestHttpClient.java:60)
at io.confluent.connect.elasticsearch.jest.JestElasticsearchClient.getServerVersion(JestElasticsearchClient.java:168)
at io.confluent.connect.elasticsearch.jest.JestElasticsearchClient.<init>(JestElasticsearchClient.java:145)
Caused by: org.apache.http.conn.HttpHostConnectException: Connect to localhost:9200 [localhost/127.0.0.1] failed: Connection refused (Connection refused)
...
Caused by: java.net.ConnectException: Connection refused (Connection refused)
...
You've specified Elasticsearch as being at localhost:9200. Unless Elasticsearch is running in the same container as Kafka Connect, this hostname is wrong. You need to specify the hostname of Elasticsearch in a form that is reachable from where Kafka Connect is running.
For an example of how to configure this, see this Docker Compose file here.
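To illustrate the point about localhost, here is a minimal sketch that flags a connection.url pointing at the loopback interface, which from inside a container refers to that container itself. The function name points_at_loopback and the hostname "elasticsearch" are assumptions for illustration, not names taken from the connector or the image.

```python
import ipaddress
from urllib.parse import urlsplit


def points_at_loopback(url):
    """Return True if the URL's host is 'localhost' or a loopback IP literal."""
    host = urlsplit(url).hostname or ""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so it is an ordinary hostname.
        return False


# connection.url from the connector config in the question.
print(points_at_loopback("http://localhost:9200"))
# "elasticsearch" is a hypothetical service name, used only for illustration.
print(points_at_loopback("http://elasticsearch:9200"))
```

A loopback URL is only correct when Elasticsearch and Kafka Connect share the same network namespace. Otherwise the URL needs a hostname or address that resolves from inside the Kafka Connect container.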
My build machine just started reporting the following:
:::: ERRORS
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/grails/plugins/tomcat/7.0.50/tomcat-7.0.50-sources.jar
SERVER ERROR: Backend is unhealthy url=http://repo1.maven.org/maven2/org/grails/plugins/tomcat/7.0.50/tomcat-7.0.50-src.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/grails/plugins/tomcat/7.0.50/tomcat-7.0.50-javadoc.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/apache/tomcat/embed/tomcat-embed-core/7.0.50/tomcat-embed-core-7.0.50.pom
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/apache/tomcat/embed/tomcat-embed-core/7.0.50/tomcat-embed-core-7.0.50.jar
SERVER ERROR: Backend is unhealthy url=http://repo1.maven.org/maven2/org/apache/tomcat/tomcat-catalina-ant/7.0.50/tomcat-catalina-ant-7.0.50.pom
SERVER ERROR: Backend is unhealthy url=http://repo1.maven.org/maven2/org/apache/tomcat/tomcat-catalina-ant/7.0.50/tomcat-catalina-ant-7.0.50.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/apache/tomcat/embed/tomcat-embed-jasper/7.0.50/tomcat-embed-jasper-7.0.50.pom
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/apache/tomcat/embed/tomcat-embed-jasper/7.0.50/tomcat-embed-jasper-7.0.50.jar
SERVER ERROR: Backend is unhealthy url=http://repo1.maven.org/maven2/org/apache/tomcat/embed/tomcat-embed-logging-log4j/7.0.50/tomcat-embed-logging-log4j-7.0.50.pom
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/apache/tomcat/embed/tomcat-embed-logging-log4j/7.0.50/tomcat-embed-logging-log4j-7.0.50.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/eclipse/jdt/core/compiler/ecj/3.7.2/ecj-3.7.2.pom
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/eclipse/jdt/core/compiler/ecj/3.7.2/ecj-3.7.2.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/grails/plugins/release/2.2.1/release-2.2.1-sources.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/grails/plugins/release/2.2.1/release-2.2.1-src.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/grails/plugins/release/2.2.1/release-2.2.1-javadoc.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.3/maven-ant-tasks-2.1.3.pom
SERVER ERROR: Backend is unhealthy url=http://repo1.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.3/maven-ant-tasks-2.1.3.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/grails/plugins/rest-client-builder/1.0.3/rest-client-builder-1.0.3-sources.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/grails/plugins/rest-client-builder/1.0.3/rest-client-builder-1.0.3-src.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/grails/plugins/rest-client-builder/1.0.3/rest-client-builder-1.0.3-javadoc.jar
I took those URLs and started looking into what happened. The server reports a 404 on those links. Removing parts of the path, I found this:
http://repo1.maven.org/maven2/org/grails/plugins/
This doesn't exist.
What gives? Is this a Maven error or something to do with Grails? Are we experiencing an outage that will be fixed, or is this something I have to report as broken?