We're busy with a PoC in which we produce messages to a Kafka topic (currently about 2 million messages; it should end up around 130 million) that we want to query via Elasticsearch. So a small PoC has been built that feeds data into ES via the Confluent Elasticsearch Sink Connector (latest, connector version 6.0.0). However, we ran into a lot of timeout issues, and eventually the tasks fail with a message saying the task needs to be restarted:
ERROR WorkerSinkTask{id=transactions-elasticsearch-connector-3} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted. Error: java.net.SocketTimeoutException: Read timed out (org.apache.kafka.connect.runtime.WorkerSinkTask)
My configuration for the sink connector is the following:
{
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"connection.url": "http://elasticsearch:9200",
"key.converter" : "org.apache.kafka.connect.storage.StringConverter",
"value.converter" : "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url" : "http://schema-registry:8081",
"topics": "transactions,trades",
"type.name": "transactions",
"tasks.max" : "4",
"batch.size" : "50",
"max.buffered.events" : "500",
"max.buffered.records" : "500",
"flush.timeout.ms" : "100000",
"linger.ms" : "50",
"max.retries" : "10",
"connection.timeout.ms" : "2000",
"name": "transactions-elasticsearch-connector",
"key.ignore": "true",
"schema.ignore": "false",
"transforms" : "ExtractTimestamp",
"transforms.ExtractTimestamp.type" : "org.apache.kafka.connect.transforms.InsertField\$Value",
"transforms.ExtractTimestamp.timestamp.field" : "MSG_TS"
}
Unfortunately, even when no messages are being produced and the Elasticsearch sink connector is started manually, the tasks die and need to be restarted again. I've fiddled around with various batch sizes, retry settings, etc., but to no avail. Note that we only have one Kafka broker, one Elasticsearch connector, and one Elasticsearch instance, all running in Docker containers.
We also see a lot of these timeout messages:
[2020-12-08 13:23:34,107] WARN Failed to execute batch 100534 of 50 records with attempt 1/11, will attempt retry after 43 ms. Failure reason: Read timed out (io.confluent.connect.elasticsearch.bulk.BulkProcessor)
[2020-12-08 13:23:34,116] WARN Failed to execute batch 100536 of 50 records with attempt 1/11, will attempt retry after 18 ms. Failure reason: Read timed out (io.confluent.connect.elasticsearch.bulk.BulkProcessor)
[2020-12-08 13:23:34,132] WARN Failed to execute batch 100537 of 50 records with attempt 1/11, will attempt retry after 24 ms. Failure reason: Read timed out (io.confluent.connect.elasticsearch.bulk.BulkProcessor)
[2020-12-08 13:23:36,746] WARN Failed to execute batch 100539 of 50 records with attempt 1/11, will attempt retry after 0 ms. Failure reason: Read timed out (io.confluent.connect.elasticsearch.bulk.BulkProcessor)
[2020-12-08 13:23:37,139] WARN Failed to execute batch 100536 of 50 records with attempt 2/11, will attempt retry after 184 ms. Failure reason: Read timed out (io.confluent.connect.elasticsearch.bulk.BulkProcessor)
[2020-12-08 13:23:37,155] WARN Failed to execute batch 100534 of 50 records with attempt 2/11, will attempt retry after 70 ms. Failure reason: Read timed out (io.confluent.connect.elasticsearch.bulk.BulkProcessor)
[2020-12-08 13:23:37,160] WARN Failed to execute batch 100537 of 50 records with attempt 2/11, will attempt retry after 157 ms. Failure reason: Read timed out (io.confluent.connect.elasticsearch.bulk.BulkProcessor)
[2020-12-08 13:23:39,681] WARN Failed to execute batch 100540 of 50 records with attempt 1/11, will attempt retry after 12 ms. Failure reason: Read timed out (io.confluent.connect.elasticsearch.bulk.BulkProcessor)
[2020-12-08 13:23:39,750] WARN Failed to execute batch 100539 of 50 records with attempt 2/11, will attempt retry after 90 ms. Failure reason: Read timed out (io.confluent.connect.elasticsearch.bulk.BulkProcessor)
[2020-12-08 13:23:40,231] WARN Failed to execute batch 100534 of 50 records with attempt 3/11, will attempt retry after 204 ms. Failure reason: Read timed out (io.confluent.connect.elasticsearch.bulk.BulkProcessor)
[2020-12-08 13:23:40,322] WARN Failed to execute batch 100537 of 50 records with attempt 3/11, will attempt retry after 58 ms. Failure reason: Read timed out (io.confluent.connect.elasticsearch.bulk.BulkProcessor)
Any idea what we can improve to make the whole chain reliable? For our purposes it does not need to be blazingly fast, as long as all messages reliably end up in Elasticsearch without the connector tasks having to be restarted every time.
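For reference, the next thing we plan to try is raising the HTTP timeouts toward Elasticsearch, since every failure is a read timeout. This is only a sketch, not a verified fix: it assumes our connector version honors read.timeout.ms alongside connection.timeout.ms, and that Connect's REST interface listens on the default port 8083.

# Resubmit the connector config with larger timeouts via the Connect REST API.
# PUT replaces the whole config, so the file must contain the full JSON above
# plus e.g. "connection.timeout.ms": "5000" and "read.timeout.ms": "30000".
# connector-config-with-timeouts.json is a hypothetical local file name.
curl -X PUT http://localhost:8083/connectors/transactions-elasticsearch-connector/config \
  -H "Content-Type: application/json" \
  -d @connector-config-with-timeouts.json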
I was always able to access the UI until today! The logs are shown below; why would errors from other servers make the NiFi UI inaccessible?
Error 1:
2020-04-25 14:59:53,546 ERROR [Timer-Driven Process Thread-41] o.apache.nifi.processors.standard.PutSQL PutSQL[id=c1451287-2ee5-397e-9842-5e477212e954] Failed to update database due to a failed batch update, java.sql.BatchUpdateException: ORA-02291: integrity constraint (ISA.PHY_ASSET_CLASS_X_PHY_ASSET_MAPPING_FK1) violated - parent key not found
. There were a total of 1 FlowFiles that failed, 0 that succeeded, and 0 that were not execute and will be routed to retry; : java.sql.BatchUpdateException: ORA-02291: integrity constraint (ISA.PHY_ASSET_CLASS_X_PHY_ASSET_MAPPING_FK1) violated - parent key not found
java.sql.BatchUpdateException: ORA-02291: integrity constraint (ISA.PHY_ASSET_CLASS_X_PHY_ASSET_MAPPING_FK1) violated - parent key not found
Error 2:
IOError: java.nio.channels.ClosedByInterruptException
org.python.core.PyException: null
Error 3:
2020-04-25 14:59:58,537 ERROR [Timer-Driven Process Thread-94] o.apache.nifi.processors.standard.PutSQL PutSQL[id=d9733768-de43-17b7-0000-00007453720b] Failed to process session due to org.apache.nifi.processor.exception.ProcessException: java.sql.SQLException: Cannot get a connection, pool error Timeout waiting for idle object: org.apache.nifi.processor.exception.ProcessException: java.sql.SQLException: Cannot get a connection, pool error Timeout waiting for idle object
org.apache.nifi.processor.exception.ProcessException: java.sql.SQLException: Cannot get a connection, pool error Timeout waiting for idle object
Error 4:
ERROR [MasterListener-mymaster-[10.148.xxx.xx:26379]] redis.clients.jedis.JedisSentinelPool Lost connection to Sentinel at 10.xxx.xxx.xx:xxxx. Sleeping 5000ms and retrying.
redis.clients.jedis.exceptions.JedisConnectionException: java.net.ConnectException: Connection refused (Connection refused)
There may be many other errors...
Any help is appreciated!
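In case it helps with triage: the NiFi UI and the flow's processors run in the same JVM, so a pile-up of failing processors can exhaust timer-driven threads or heap and starve the web tier. A rough first check, assuming a default standalone NiFi install and run from the NiFi home directory:

# Is the NiFi JVM still alive?
./bin/nifi.sh status

# Application log: look for web-tier (Jetty) errors around the time the UI stopped responding
tail -f logs/nifi-app.log

# Bootstrap log: records JVM starts, stops, and unexpected deaths
tail -f logs/nifi-bootstrap.log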
I'm trying to execute an Oozie job with the help of this guide:
https://www.safaribooksonline.com/library/view/apache-oozie/9781449369910/ch05.html
While executing
oozie job -run -config target/example/job.properties
I get the following error:
Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 1 sec. Retry count = 1
Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 2 sec. Retry count = 2
Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 4 sec. Retry count = 3
Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 8 sec. Retry count = 4
Error: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused (Connection refused)
Any idea why the connection is being refused?
The Oozie client (command line) is not able to connect to the Oozie server. Find the Oozie server URL and do one of the following (a quick server-status check is sketched after these options):
Set (export) the Oozie server URL as an environment variable: export OOZIE_URL=http://hostname:11000/oozie
Pass the -oozie parameter to the oozie command: oozie job -oozie http://hostname:11000/oozie -run -config target/example/job.properties
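If both options still fail with "Connection refused", confirm that the Oozie server is actually up and listening before retrying the job. A quick check (hostname is a placeholder; 11000 is only the default Oozie port):

# CLI status check; reports the system mode (NORMAL when healthy)
oozie admin -oozie http://hostname:11000/oozie -status

# Or query the admin REST endpoint directly
curl http://hostname:11000/oozie/v1/admin/status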
My build machine just started reporting the following:
:::: ERRORS
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/grails/plugins/tomcat/7.0.50/tomcat-7.0.50-sources.jar
SERVER ERROR: Backend is unhealthy url=http://repo1.maven.org/maven2/org/grails/plugins/tomcat/7.0.50/tomcat-7.0.50-src.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/grails/plugins/tomcat/7.0.50/tomcat-7.0.50-javadoc.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/apache/tomcat/embed/tomcat-embed-core/7.0.50/tomcat-embed-core-7.0.50.pom
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/apache/tomcat/embed/tomcat-embed-core/7.0.50/tomcat-embed-core-7.0.50.jar
SERVER ERROR: Backend is unhealthy url=http://repo1.maven.org/maven2/org/apache/tomcat/tomcat-catalina-ant/7.0.50/tomcat-catalina-ant-7.0.50.pom
SERVER ERROR: Backend is unhealthy url=http://repo1.maven.org/maven2/org/apache/tomcat/tomcat-catalina-ant/7.0.50/tomcat-catalina-ant-7.0.50.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/apache/tomcat/embed/tomcat-embed-jasper/7.0.50/tomcat-embed-jasper-7.0.50.pom
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/apache/tomcat/embed/tomcat-embed-jasper/7.0.50/tomcat-embed-jasper-7.0.50.jar
SERVER ERROR: Backend is unhealthy url=http://repo1.maven.org/maven2/org/apache/tomcat/embed/tomcat-embed-logging-log4j/7.0.50/tomcat-embed-logging-log4j-7.0.50.pom
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/apache/tomcat/embed/tomcat-embed-logging-log4j/7.0.50/tomcat-embed-logging-log4j-7.0.50.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/eclipse/jdt/core/compiler/ecj/3.7.2/ecj-3.7.2.pom
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/eclipse/jdt/core/compiler/ecj/3.7.2/ecj-3.7.2.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/grails/plugins/release/2.2.1/release-2.2.1-sources.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/grails/plugins/release/2.2.1/release-2.2.1-src.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/grails/plugins/release/2.2.1/release-2.2.1-javadoc.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.3/maven-ant-tasks-2.1.3.pom
SERVER ERROR: Backend is unhealthy url=http://repo1.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.3/maven-ant-tasks-2.1.3.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/grails/plugins/rest-client-builder/1.0.3/rest-client-builder-1.0.3-sources.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/grails/plugins/rest-client-builder/1.0.3/rest-client-builder-1.0.3-src.jar
SERVER ERROR: Connection timed out url=http://repo1.maven.org/maven2/org/grails/plugins/rest-client-builder/1.0.3/rest-client-builder-1.0.3-javadoc.jar
I took those URLs and started looking into what had happened. The server reports a 404 for those links. By removing parts of the path, I found that even this base path doesn't exist:
http://repo1.maven.org/maven2/org/grails/plugins/
What gives? Is this a Maven problem or something to do with Grails? Are we experiencing an outage that will be fixed, or is this something I need to report to someone as broken?
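Independent of Grails, the raw HTTP responses for one of the failing artifacts can be inspected directly, which helps separate a repository outage from a build misconfiguration (plain curl; the URLs are copied from the errors above):

# Show only the response headers for a failing artifact URL
curl -sI http://repo1.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.3/maven-ant-tasks-2.1.3.pom

# Compare against HTTPS, in case only the plain-HTTP endpoint is unhealthy
curl -sI https://repo1.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.3/maven-ant-tasks-2.1.3.pom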
I'm getting the following warning while running my MapReduce jobs under CDH4.
java.io.IOException: Lease timeout of 0 seconds expired.
at org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:1700)
at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:652)
at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:604)
at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:411)
at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:436)
at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:70)
at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:297)
at java.lang.Thread.run(Thread.java:662)
Any idea what this means?
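The trace appears to come from the client-side LeaseRenewer thread aborting the files it was writing after failing to renew the write lease with the NameNode. The only diagnostic step I have found so far is to list which files HDFS still considers open for write (a sketch; the path below is a placeholder):

# List files still open for write under a given path
hadoop fsck /user/someuser -openforwrite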