Why so many connections are used by Spring reactive with Mongo - spring

I got a 'MongoWaitQueueFullException', which made me realize how many connections my application is using. I use the default configuration of Spring Boot (2.2.7.RELEASE) with reactive MongoDB (4.2.8). Transactions are used.
Even an integration test that creates a bit more than 200 elements and then groups them (200 groups) uses 10 connections. When this algorithm is executed over a real data set, the exception is thrown because the default limit of the wait queue (500) is reached. This does not make the application scalable.
My question is: is there a way to design a reactive application that helps to reduce the number of connections?
This is the code under test. Basically, it scans all translations of the bundle files and then groups them per translation key. One element is persisted per translation key.
return Flux
        .fromIterable(bundleFile.getFiles())
        .map(ScannedBundleFileEntry::getLocale)
        .flatMap(locale ->
                handler
                        .scanTranslations(bundleFileEntity.toLocation(), locale, context)
                        .index()
                        .map(indexedTranslation ->
                                createTranslation(
                                        workspaceEntity,
                                        bundleFileEntity,
                                        locale.getId(),
                                        indexedTranslation.getT1(), // index
                                        indexedTranslation.getT2().getKey(), // bundle key
                                        indexedTranslation.getT2().getValue() // translation
                                )
                        )
                        .flatMap(bundleKeyTemporaryRepository::save)
        )
        .thenMany(groupIntoBundleKeys(bundleFileEntity))
        .then(bundleKeyTemporaryRepository.deleteByBundleFile(bundleFileEntity.getId()))
        .then(Mono.just(bundleFileEntity));
The grouping function:
private Flux<BundleKeyEntity> groupIntoBundleKeys(BundleFileEntity bundleFile) {
    return this
            .findBundleKeys(bundleFile)
            .groupBy(BundleKeyGroupKey::new)
            .flatMap(bundleKeyGroup ->
                    bundleKeyGroup
                            .collectList()
                            .map(bundleKeys -> {
                                final BundleKeyGroupKey key = bundleKeyGroup.key();
                                final BundleKeyEntity entity = new BundleKeyEntity(key.getWorkspace(), key.getBundleFile(), key.getKey());
                                bundleKeys.forEach(entity::mergeInto);
                                return entity;
                            })
            )
            .flatMap(bundleKeyEntityRepository::save);
}
The test output:
560 [main] INFO o.s.b.t.c.SpringBootTestContextBootstrapper - Neither @ContextConfiguration nor @ContextHierarchy found for test class [be.sgerard.i18n.controller.TranslationControllerTest], using SpringBootContextLoader
569 [main] INFO o.s.t.c.s.AbstractContextLoader - Could not detect default resource locations for test class [be.sgerard.i18n.controller.TranslationControllerTest]: no resource found for suffixes {-context.xml, Context.groovy}.
870 [main] INFO o.s.b.t.c.SpringBootTestContextBootstrapper - Loaded default TestExecutionListener class names from location [META-INF/spring.factories]: [org.springframework.boot.test.mock.mockito.MockitoTestExecutionListener, org.springframework.boot.test.mock.mockito.ResetMocksTestExecutionListener, org.springframework.boot.test.autoconfigure.restdocs.RestDocsTestExecutionListener, org.springframework.boot.test.autoconfigure.web.client.MockRestServiceServerResetTestExecutionListener, org.springframework.boot.test.autoconfigure.web.servlet.MockMvcPrintOnlyOnFailureTestExecutionListener, org.springframework.boot.test.autoconfigure.web.servlet.WebDriverTestExecutionListener, org.springframework.test.context.web.ServletTestExecutionListener, org.springframework.test.context.support.DirtiesContextBeforeModesTestExecutionListener, org.springframework.test.context.support.DependencyInjectionTestExecutionListener, org.springframework.test.context.support.DirtiesContextTestExecutionListener, org.springframework.test.context.transaction.TransactionalTestExecutionListener, org.springframework.test.context.jdbc.SqlScriptsTestExecutionListener, org.springframework.test.context.event.EventPublishingTestExecutionListener, org.springframework.security.test.context.support.WithSecurityContextTestExecutionListener, org.springframework.security.test.context.support.ReactorContextTestExecutionListener]
897 [main] INFO o.s.b.t.c.SpringBootTestContextBootstrapper - Using TestExecutionListeners: [org.springframework.test.context.support.DirtiesContextBeforeModesTestExecutionListener#4372b9b6, org.springframework.boot.test.mock.mockito.MockitoTestExecutionListener#232a7d73, org.springframework.boot.test.autoconfigure.SpringBootDependencyInjectionTestExecutionListener#4b41e4dd, org.springframework.test.context.support.DirtiesContextTestExecutionListener#22ffa91a, org.springframework.test.context.transaction.TransactionalTestExecutionListener#74960bfa, org.springframework.test.context.jdbc.SqlScriptsTestExecutionListener#42721fe, org.springframework.test.context.event.EventPublishingTestExecutionListener#40844aab, org.springframework.security.test.context.support.WithSecurityContextTestExecutionListener#1f6c9cd8, org.springframework.security.test.context.support.ReactorContextTestExecutionListener#5b619d14, org.springframework.boot.test.mock.mockito.ResetMocksTestExecutionListener#66746f57, org.springframework.boot.test.autoconfigure.restdocs.RestDocsTestExecutionListener#447a020, org.springframework.boot.test.autoconfigure.web.client.MockRestServiceServerResetTestExecutionListener#7f36662c, org.springframework.boot.test.autoconfigure.web.servlet.MockMvcPrintOnlyOnFailureTestExecutionListener#28e8dde3, org.springframework.boot.test.autoconfigure.web.servlet.WebDriverTestExecutionListener#6d23017e]
1551 [background-preinit] INFO o.h.v.i.x.c.ValidationBootstrapParameters - HV000006: Using org.hibernate.validator.HibernateValidator as validation provider.
1677 [main] INFO b.s.i.c.TranslationControllerTest - Starting TranslationControllerTest on sgerard with PID 538 (started by sgerard in /home/sgerard/sandboxes/github-oauth/server)
1678 [main] INFO b.s.i.c.TranslationControllerTest - The following profiles are active: test
3250 [main] INFO o.s.d.r.c.RepositoryConfigurationDelegate - Bootstrapping Spring Data Reactive MongoDB repositories in DEFAULT mode.
3747 [main] INFO o.s.d.r.c.RepositoryConfigurationDelegate - Finished Spring Data repository scanning in 493ms. Found 9 Reactive MongoDB repository interfaces.
5143 [main] INFO o.s.c.s.PostProcessorRegistrationDelegate$BeanPostProcessorChecker - Bean 'org.springframework.security.config.annotation.method.configuration.ReactiveMethodSecurityConfiguration' of type [org.springframework.security.config.annotation.method.configuration.ReactiveMethodSecurityConfiguration] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
5719 [main] INFO org.mongodb.driver.cluster - Cluster created with settings {hosts=[localhost:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
5996 [cluster-ClusterId{value='5f42490f1c60f43aff9d7d46', description='null'}-localhost:27017] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:1, serverValue:4337}] to localhost:27017
6010 [cluster-ClusterId{value='5f42490f1c60f43aff9d7d46', description='null'}-localhost:27017] INFO org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=localhost:27017, type=REPLICA_SET_PRIMARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 2, 8]}, minWireVersion=0, maxWireVersion=8, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=12207332, setName='rs0', canonicalAddress=4802c4aff450:27017, hosts=[4802c4aff450:27017], passives=[], arbiters=[], primary='4802c4aff450:27017', tagSet=TagSet{[]}, electionId=7fffffff0000000000000013, setVersion=1, lastWriteDate=Sun Aug 23 12:46:30 CEST 2020, lastUpdateTimeNanos=384505436362981}
6019 [main] INFO org.mongodb.driver.cluster - Cluster created with settings {hosts=[localhost:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
6040 [cluster-ClusterId{value='5f42490f1c60f43aff9d7d47', description='null'}-localhost:27017] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:2, serverValue:4338}] to localhost:27017
6042 [cluster-ClusterId{value='5f42490f1c60f43aff9d7d47', description='null'}-localhost:27017] INFO org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=localhost:27017, type=REPLICA_SET_PRIMARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 2, 8]}, minWireVersion=0, maxWireVersion=8, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=1727974, setName='rs0', canonicalAddress=4802c4aff450:27017, hosts=[4802c4aff450:27017], passives=[], arbiters=[], primary='4802c4aff450:27017', tagSet=TagSet{[]}, electionId=7fffffff0000000000000013, setVersion=1, lastWriteDate=Sun Aug 23 12:46:30 CEST 2020, lastUpdateTimeNanos=384505468960066}
7102 [nioEventLoopGroup-2-2] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:3, serverValue:4339}] to localhost:27017
11078 [main] INFO o.s.b.a.e.web.EndpointLinksResolver - Exposing 1 endpoint(s) beneath base path ''
11158 [main] INFO o.h.v.i.x.c.ValidationBootstrapParameters - HV000006: Using org.hibernate.validator.HibernateValidator as validation provider.
11720 [main] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:4, serverValue:4340}] to localhost:27017
12084 [main] INFO o.s.s.c.ThreadPoolTaskScheduler - Initializing ExecutorService 'taskScheduler'
12161 [main] INFO b.s.i.c.TranslationControllerTest - Started TranslationControllerTest in 11.157 seconds (JVM running for 13.532)
20381 [nioEventLoopGroup-2-3] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:5, serverValue:4341}] to localhost:27017
20408 [nioEventLoopGroup-2-2] INFO b.s.i.s.w.WorkspaceManagerImpl - Synchronize, there is no workspace for the branch [master], let's create it.
20416 [nioEventLoopGroup-2-3] INFO b.s.i.s.w.WorkspaceManagerImpl - The workspace [master] alias [e3cea374-0d37-4c57-bdbf-8bd14d279c12] has been created.
20421 [nioEventLoopGroup-2-3] INFO b.s.i.s.w.WorkspaceManagerImpl - Initializing workspace [master] alias [e3cea374-0d37-4c57-bdbf-8bd14d279c12].
20525 [nioEventLoopGroup-2-2] INFO b.s.i.s.i18n.TranslationManagerImpl - A bundle file has been found located in [server/src/main/resources/i18n] named [exception] with 2 file(s).
20812 [nioEventLoopGroup-2-4] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:6, serverValue:4342}] to localhost:27017
21167 [nioEventLoopGroup-2-8] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:10, serverValue:4345}] to localhost:27017
21167 [nioEventLoopGroup-2-6] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:8, serverValue:4344}] to localhost:27017
21393 [nioEventLoopGroup-2-5] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:7, serverValue:4343}] to localhost:27017
21398 [nioEventLoopGroup-2-7] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:9, serverValue:4346}] to localhost:27017
21442 [nioEventLoopGroup-2-2] INFO b.s.i.s.i18n.TranslationManagerImpl - A bundle file has been found located in [server/src/main/resources/i18n] named [validation] with 2 file(s).
21503 [nioEventLoopGroup-2-2] INFO b.s.i.s.i18n.TranslationManagerImpl - A bundle file has been found located in [server/src/test/resources/be/sgerard/i18n/service/i18n/file] named [file] with 2 file(s).
21621 [nioEventLoopGroup-2-2] INFO b.s.i.s.i18n.TranslationManagerImpl - A bundle file has been found located in [front/src/main/web/src/assets/i18n] named [i18n] with 2 file(s).
22745 [SpringContextShutdownHook] INFO o.s.s.c.ThreadPoolTaskScheduler - Shutting down ExecutorService 'taskScheduler'
22763 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:4, serverValue:4340}] to localhost:27017 because the pool has been closed.
22766 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:9, serverValue:4346}] to localhost:27017 because the pool has been closed.
22767 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:6, serverValue:4342}] to localhost:27017 because the pool has been closed.
22768 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:8, serverValue:4344}] to localhost:27017 because the pool has been closed.
22768 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:5, serverValue:4341}] to localhost:27017 because the pool has been closed.
22769 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:10, serverValue:4345}] to localhost:27017 because the pool has been closed.
22770 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:7, serverValue:4343}] to localhost:27017 because the pool has been closed.
22776 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:3, serverValue:4339}] to localhost:27017 because the pool has been closed.
Process finished with exit code 0

Spring's reactive stack is asynchronous. Imagine you have 3 items in your dataset. It opens a connection to save the first item, but it does not wait for that save to finish and then reuse the connection for the second save. Instead, it opens a second connection as soon as possible. With a large dataset you therefore end up exhausting all the connections in the pool.
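One way to picture the effect, and the usual remedy of capping how many saves run at once, is the plain-Java simulation below. It is only a sketch under stated assumptions: it does not use Reactor or the MongoDB driver, the class and method names (`BoundedSaves`, `peakInFlight`) are made up for illustration, and each "save" is just a short sleep. In Reactor itself the comparable knobs are the `flatMap(mapper, concurrency)` overload, which limits the number of inner publishers subscribed at the same time, and `concatMap`, which processes them one by one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation: each task stands in for one asynchronous save
// that would hold a database connection while it is in flight.
class BoundedSaves {

    /**
     * Starts `items` simulated saves, never more than `maxInFlight` at once,
     * and returns the highest number of saves seen in flight together.
     */
    static int peakInFlight(int items, int maxInFlight) {
        ExecutorService pool = Executors.newCachedThreadPool();
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<CompletableFuture<Void>> saves = new ArrayList<>();
        try {
            for (int i = 0; i < items; i++) {
                // Wait for a free slot before starting the next save.
                permits.acquireUninterruptibly();
                saves.add(CompletableFuture.runAsync(() -> {
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(5); // pretend to talk to the database
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release();
                    }
                }, pool));
            }
            CompletableFuture.allOf(saves.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // With the cap equal to the item count, nothing holds the saves back.
        System.out.println("peak without a real cap: " + peakInFlight(50, 50));
        // With a cap of 4, at most 4 simulated connections are busy at once.
        System.out.println("peak with a cap of 4:    " + peakInFlight(50, 4));
    }
}
```

The second call never reports more than 4, because a new save only starts once a permit is free. Applied to the code in the question, the same idea would mean passing a concurrency value to the `flatMap` calls that invoke `save`, so the number of pending operations stays below the pool and wait-queue limits.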

Related

Unable to execute import-hive.sh

I am getting the error below while running import-hive.sh.
Could you please help me with this?
hadoop@0.0.0.0:~/apache-atlas-2.1.0/hook/apache-atlas-hive-hook-2.1.0/hook-bin$ ./import-hive.sh
Using Hive configuration directory [/home/hadoop/hive/conf]
Log file for import is /home/hadoop/apache-atlas-2.1.0/hook/apache-atlas-hive-hook-2.1.0/logs/import-hive.log
2021-07-13T15:43:21,449 INFO [main] org.apache.atlas.ApplicationProperties - Looking for atlas-application.properties in classpath
2021-07-13T15:43:21,452 INFO [main] org.apache.atlas.ApplicationProperties - Loading atlas-application.properties from file:/home/hadoop/hive/conf/atlas-application.properties
2021-07-13T15:43:21,505 INFO [main] org.apache.atlas.ApplicationProperties - Using graphdb backend 'janus'
2021-07-13T15:43:21,505 INFO [main] org.apache.atlas.ApplicationProperties - Using storage backend 'hbase2'
2021-07-13T15:43:21,506 INFO [main] org.apache.atlas.ApplicationProperties - Using index backend 'solr'
2021-07-13T15:43:21,506 INFO [main] org.apache.atlas.ApplicationProperties - Atlas is running in MODE: PROD.
2021-07-13T15:43:21,506 INFO [main] org.apache.atlas.ApplicationProperties - Setting solr-wait-searcher property 'true'
2021-07-13T15:43:21,506 INFO [main] org.apache.atlas.ApplicationProperties - Setting index.search.map-name property 'false'
2021-07-13T15:43:21,506 INFO [main] org.apache.atlas.ApplicationProperties - Setting atlas.graph.index.search.max-result-set-size = 150
2021-07-13T15:43:21,506 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.db-cache = true
2021-07-13T15:43:21,506 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.db-cache-clean-wait = 20
2021-07-13T15:43:21,506 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.db-cache-size = 0.5
2021-07-13T15:43:21,506 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.tx-cache-size = 15000
2021-07-13T15:43:21,506 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.tx-dirty-size = 120
Enter username for atlas :- admin
Enter password for atlas :-
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/security/authentication/client/ConnectionConfigurator
at org.apache.atlas.AtlasBaseClient.getClient(AtlasBaseClient.java:287)
at org.apache.atlas.AtlasBaseClient.initializeState(AtlasBaseClient.java:454)
at org.apache.atlas.AtlasBaseClient.initializeState(AtlasBaseClient.java:449)
at org.apache.atlas.AtlasBaseClient.<init>(AtlasBaseClient.java:132)
at org.apache.atlas.AtlasClientV2.<init>(AtlasClientV2.java:94)
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.main(HiveMetaStoreBridge.java:134)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.security.authentication.client.ConnectionConfigurator
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 6 more
Failed to import Hive Meta Data!!!
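The trace above says the JVM could not load `org.apache.hadoop.security.authentication.client.ConnectionConfigurator` while the Atlas client was being created, which means no JAR on the classpath used by the script provides that class. A small check like the following can confirm whether a given classpath contains it. This is a hedged sketch: `ClasspathCheck` and `isOnClasspath` are hypothetical names, and the assumption that the class normally ships in the hadoop-auth JAR should be verified against your own Hadoop installation.

```java
// Hypothetical helper: reports whether a class can be found on the classpath
// this program was started with.
class ClasspathCheck {

    /** Returns true if the named class can be located, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace.
        String name = args.length > 0
                ? args[0]
                : "org.apache.hadoop.security.authentication.client.ConnectionConfigurator";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

Running it with the same classpath that the import script uses shows whether the missing class is reachable there. If it prints `false`, the JAR that contains the class still has to be added to that classpath.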

How to solve MQJE001: Completion Code '2', Reason '2085'

I am writing to an MQ queue from Java and I intermittently get the error response below. I am using IBM MQ version 9.
What could be the cause, given that the error is intermittent and the queue and queue manager being written to exist and were running at the time?
[INFO ] 2020-06-13 22:48:03.752+0300 [main] [e5643f16-94ea-436f-ad71-54bee1c91381] MQWriteFile - Finished establishing a connection to DB
[INFO ] 2020-06-13 22:48:03.752+0300 [main] [e5643f16-94ea-436f-ad71-54bee1c91381] MQWriteFile - init
[INFO ] 2020-06-13 22:48:03.758+0300 [main] [e5643f16-94ea-436f-ad71-54bee1c91381] MQWriteFile - 5. Before calling write.selectQMgr()
[INFO ] 2020-06-13 22:48:03.864+0300 [main] [e5643f16-94ea-436f-ad71-54bee1c91381] MQWriteFile - 6. After selecting Queue Manager name
[DEBUG] 2020-06-13 22:48:03.876+0300 [main] [e5643f16-94ea-436f-ad71-54bee1c91381] MQWriteFile - ReasonCode:2085
[DEBUG] 2020-06-13 22:48:03.877+0300 [main] [e5643f16-94ea-436f-ad71-54bee1c91381] MQWriteFile - Completion Code:2
[ERROR] 2020-06-13 22:48:03.877+0300 [main] [e5643f16-94ea-436f-ad71-54bee1c91381] MQWriteFile - Message:MQJE001: Completion Code '2', Reason '2085'.
com.ibm.mq.MQException: MQJE001: Completion Code '2', Reason '2085'
at com.ibm.mq.MQDestination.open(MQDestination.java:322) ~[com.ibm.mq.jar:9.0.0.5 - p900-005-180821]
at com.ibm.mq.MQQueue.<init>(MQQueue.java:236) ~[com.ibm.mq.jar:9.0.0.5 - p900-005-180821]
at com.ibm.mq.MQQueueManager.accessQueue(MQQueueManager.java:3288) ~[com.ibm.mq.jar:9.0.0.5 - p900-005-180821]
at custom.MQWriteFile.write(MQWriteFile.java:364) ~[PGPEncryptedSOAPWMQWriter.jar:?]
at custom.MQWriteFile.<init>(MQWriteFile.java:221) [PGPEncryptedSOAPWMQWriter.jar:?]
at custom.PGPEncryptedSOAPWMQWriter.main(PGPEncryptedSOAPWMQWriter.java:69) [PGPEncryptedSOAPWMQWriter.jar:?]
[INFO ] 2020-06-13 22:48:03.879+0300 [main] [e5643f16-94ea-436f-ad71-54bee1c91381] MQWriteFile - LogStatusInDB
[DEBUG] 2020-06-13 22:48:03.911+0300 [main] [e5643f16-94ea-436f-ad71-54bee1c91381] MQWriteFile - Reason Code Desc:MQRC_UNKNOWN_OBJECT_NAME
[DEBUG] 2020-06-13 22:48:03.911+0300 [main] [e5643f16-94ea-436f-ad71-54bee1c91381] MQWriteFile - Completion Code Desc:MQCC_FAILED
[DEBUG] 2020-06-13 22:48:03.911+0300 [main] [e5643f16-94ea-436f-ad71-54bee1c91381] MQWriteFile - Returning with:3
Reason code 2085 is MQRC_UNKNOWN_OBJECT_NAME, as your log shows: the queue manager does not recognize the object name it was asked to open. Since the queue exists, the most likely cause is a logic-flow problem, with variables or objects falling out of scope and then coming back into scope with reset or default values.
The traces you are running will tell you which values your code is actually using. You will most likely need to add logging to your application to determine why the values are being lost.
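As a rough illustration of the kind of logging meant here, the sketch below records the queue manager and queue name immediately before they are used, and fails early if the name has been lost. Everything in it is an assumption made for illustration: `QueueNameGuard`, `checkedQueueName`, and the sample names `QM1` and `ORDERS.IN` are hypothetical, and it uses only `java.util.logging` rather than the IBM MQ classes.

```java
import java.util.logging.Logger;

// Hypothetical guard: logs the values about to be used and rejects a lost name.
class QueueNameGuard {

    private static final Logger LOG = Logger.getLogger(QueueNameGuard.class.getName());

    /** Logs both names, then returns the trimmed queue name or fails fast. */
    static String checkedQueueName(String queueManager, String queueName) {
        LOG.info(() -> "About to open queue [" + queueName
                + "] on queue manager [" + queueManager + "]");
        if (queueName == null || queueName.trim().isEmpty()) {
            throw new IllegalStateException(
                    "Queue name is missing or blank for queue manager [" + queueManager + "]");
        }
        return queueName.trim();
    }

    public static void main(String[] args) {
        // Sample values only; replace with whatever the application resolved.
        System.out.println(checkedQueueName("QM1", " ORDERS.IN "));
    }
}
```

Calling such a helper right before the `accessQueue` call seen in the stack trace would leave a log line with the exact name used on each attempt, so a failing run can be compared with a successful one.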

Zuul Gateway not forwarding call to Eureka registered Instance

I have spent days on this simple issue and am finally posting it here. I am trying to set up a microservices flow locally for learning purposes. The setup is basic: Eureka, a Zuul gateway, and a simple microservice. When I reach the underlying service through the "url" route, it works. But when I try a serviceId lookup, it does not work. Please help me fix it.
GitHub link: GitHub source code link
I have also raised an issue: GitHub issue link
Eureka Screenshot
Zuul Gateway logs
2019-10-06 11:11:24.611 INFO 26980 --- [nio-2020-exec-4] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring DispatcherServlet 'dispatcherServlet'
2019-10-06 11:11:24.611 INFO 26980 --- [nio-2020-exec-4] o.s.web.servlet.DispatcherServlet : Initializing Servlet 'dispatcherServlet'
2019-10-06 11:11:24.633 INFO 26980 --- [nio-2020-exec-4] o.s.web.servlet.DispatcherServlet : Completed initialization in 22 ms
2019-10-06 11:11:25.103 INFO 26980 --- [nio-2020-exec-4] c.netflix.config.ChainedDynamicProperty : Flipping property: CHECKOUT-SERVICE.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
2019-10-06 11:11:25.157 INFO 26980 --- [nio-2020-exec-4] c.n.u.concurrent.ShutdownEnabledTimer : Shutdown hook installed for: NFLoadBalancer-PingTimer-CHECKOUT-SERVICE
2019-10-06 11:11:25.157 INFO 26980 --- [nio-2020-exec-4] c.netflix.loadbalancer.BaseLoadBalancer : Client: CHECKOUT-SERVICE instantiated a LoadBalancer: DynamicServerListLoadBalancer:{NFLoadBalancer:name=CHECKOUT-SERVICE,current list of Servers=[],Load balancer stats=Zone stats: {},Server stats: []}ServerList:null
2019-10-06 11:11:25.167 INFO 26980 --- [nio-2020-exec-4] c.n.l.DynamicServerListLoadBalancer : Using serverListUpdater PollingServerListUpdater
2019-10-06 11:11:25.215 INFO 26980 --- [nio-2020-exec-4] c.netflix.config.ChainedDynamicProperty : Flipping property: CHECKOUT-SERVICE.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
2019-10-06 11:11:25.218 INFO 26980 --- [nio-2020-exec-4] c.n.l.DynamicServerListLoadBalancer : DynamicServerListLoadBalancer for client CHECKOUT-SERVICE initialized: DynamicServerListLoadBalancer:{NFLoadBalancer:name=CHECKOUT-SERVICE,current list of Servers=[192.168.0.6:8098],Load balancer stats=Zone stats: {defaultzone=[Zone:defaultzone; Instance count:1; Active connections count: 0; Circuit breaker tripped count: 0; Active connections per server: 0.0;]
},Server stats: [[Server:192.168.0.6:8098; Zone:defaultZone; Total Requests:0; Successive connection failure:0; Total blackout seconds:0; Last connection made:Wed Dec 31 19:00:00 EST 1969; First connection made: Wed Dec 31 19:00:00 EST 1969; Active Connections:0; total failure count in last (1000) msecs:0; average resp time:0.0; 90 percentile resp time:0.0; 95 percentile resp time:0.0; min resp time:0.0; max resp time:0.0; stddev resp time:0.0]
]}ServerList:org.springframework.cloud.netflix.ribbon.eureka.DomainExtractingServerList#6f7f7ca0
2019-10-06 11:11:26.177 INFO 26980 --- [erListUpdater-0] c.netflix.config.ChainedDynamicProperty : Flipping property: CHECKOUT-SERVICE.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
Never mind, guys. It was a mistake on my side in resolving the API path.

Problem in Flink UI on Mesos cluster with two slave nodes

I have four physical nodes with Docker installed on each of them. I set up Mesos, Flink, ZooKeeper, Hadoop and Marathon in Docker on each node. Previously I had three nodes (one slave and two masters); I ran Flink on Marathon and its UI worked without any problems. After that, I changed the cluster to two masters and two slaves. I added the following JSON file in Marathon and it ran, but the Flink UI was not shown on both slave nodes. The error is shown after the JSON.
{
  "id": "flink",
  "cmd": "/home/flink-1.7.2/bin/mesos-appmaster.sh -Djobmanager.heap.mb=1024 -Djobmanager.rpc.port=6123 -Drest.port=8081 -Dmesos.resourcemanager.tasks.mem=1024 -Dtaskmanager.heap.mb=1024 -Dtaskmanager.numberOfTaskSlots=2 -Dparallelism.default=2 -Dmesos.resourcemanager.tasks.cpus=1",
  "cpus": 1.0,
  "mem": 1024,
  "instances": 2
}
Error:
Service temporarily unavailable due to an ongoing leader election. Please refresh
I cleared the ZooKeeper contents with these commands:
/home/zookeeper-3.4.14/bin/zkCleanup.sh /var/lib/zookeeper/data/ -n 10
rm -rf /var/lib/zookeeper/data/version-2
rm /var/lib/zookeeper/data/zookeeper_server.pid
I also ran this command and deleted the Flink contents in ZooKeeper:
/home/zookeeper-3.4.14/bin/zkCli.sh
delete /flink/default/leader/....
But one of the Flink UIs still has the problem.
I have configured Flink high availability like this:
high-availability: zookeeper
high-availability.storageDir: hdfs:///flink/ha/
high-availability.zookeeper.quorum: 0.0.0.0:2181,10.32.0.3:2181,10.32.0.4:2181,10.32.0.5:2181
fs.hdfs.hadoopconf: /opt/hadoop/etc/hadoop
fs.hdfs.hdfssite: /opt/hadoop/etc/hadoop/hdfs-site.xml
recovery.zookeeper.path.mesos-workers: /mesos-workers
env.java.home: /opt/java
mesos.master: 10.32.0.2:5050,10.32.0.3:5050
Because I used a Mesos cluster, I did not change anything else in flink-conf.yaml.
This is the part of the slave log that contains the error:
- Remote connection to [null] failed with java.net.ConnectException: Connection refused: localhost/127.0.0.1:37797
2019-07-03 07:22:42,922 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@localhost:37797] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@localhost:37797]] Caused by: [Connection refused: localhost/127.0.0.1:37797]
2019-07-03 07:22:43,003 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: localhost/127.0.0.1:37797
2019-07-03 07:22:43,004 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@localhost:37797] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@localhost:37797]] Caused by: [Connection refused: localhost/127.0.0.1:37797]
2019-07-03 07:22:43,072 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: localhost/127.0.0.1:37797
2019-07-03 07:22:43,073 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@localhost:37797] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@localhost:37797]] Caused by: [Connection refused: localhost/127.0.0.1:37797]
2019-07-03 07:23:45,891 WARN org.apache.flink.runtime.webmonitor.retriever.impl.RpcGatewayRetriever - Error while retrieving the leader gateway. Retrying to connect to akka.tcp://flink@localhost:37797/user/dispatcher.
This is the ZooKeeper log for the node that has the error in the Flink UI:
2019-07-03 09:43:33,425 [myid:] - INFO [main:QuorumPeerConfig#136] - Reading configuration from: /home/zookeeper-3.4.14/bin/../conf/zoo.cfg
2019-07-03 09:43:33,434 [myid:] - INFO [main:QuorumPeer$QuorumServer#185] - Resolved hostname: 0.0.0.0 to address: /0.0.0.0
2019-07-03 09:43:33,435 [myid:] - INFO [main:QuorumPeer$QuorumServer#185] - Resolved hostname: 10.32.0.3 to address: /10.32.0.3
2019-07-03 09:43:33,435 [myid:] - INFO [main:QuorumPeer$QuorumServer#185] - Resolved hostname: 10.32.0.2 to address: /10.32.0.2
2019-07-03 09:43:33,435 [myid:] - INFO [main:QuorumPeer$QuorumServer#185] - Resolved hostname: 10.32.0.5 to address: /10.32.0.5
2019-07-03 09:43:33,435 [myid:] - WARN [main:QuorumPeerConfig#354] - Non-optimial configuration, consider an odd number of servers.
2019-07-03 09:43:33,436 [myid:] - INFO [main:QuorumPeerConfig#398] - Defaulting to majority quorums
2019-07-03 09:43:33,438 [myid:3] - INFO [main:DatadirCleanupManager#78] - autopurge.snapRetainCount set to 3
2019-07-03 09:43:33,438 [myid:3] - INFO [main:DatadirCleanupManager#79] - autopurge.purgeInterval set to 0
2019-07-03 09:43:33,438 [myid:3] - INFO [main:DatadirCleanupManager#101] - Purge task is not scheduled.
2019-07-03 09:43:33,445 [myid:3] - INFO [main:QuorumPeerMain#130] - Starting quorum peer
2019-07-03 09:43:33,450 [myid:3] - INFO [main:ServerCnxnFactory#117] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
2019-07-03 09:43:33,452 [myid:3] - INFO [main:NIOServerCnxnFactory#89] - binding to port 0.0.0.0/0.0.0.0:2181
2019-07-03 09:43:33,458 [myid:3] - INFO [main:QuorumPeer#1159] - tickTime set to 2000
2019-07-03 09:43:33,458 [myid:3] - INFO [main:QuorumPeer#1205] - initLimit set to 10
2019-07-03 09:43:33,458 [myid:3] - INFO [main:QuorumPeer#1179] - minSessionTimeout set to -1
2019-07-03 09:43:33,459 [myid:3] - INFO [main:QuorumPeer#1190] - maxSessionTimeout set to -1
2019-07-03 09:43:33,464 [myid:3] - INFO [main:QuorumPeer#1470] - QuorumPeer communication is not secured!
2019-07-03 09:43:33,464 [myid:3] - INFO [main:QuorumPeer#1499] - quorum.cnxn.threads.size set to 20
2019-07-03 09:43:33,465 [myid:3] - INFO [main:QuorumPeer#669] - currentEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2019-07-03 09:43:33,519 [myid:3] - INFO [main:QuorumPeer#684] - acceptedEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2019-07-03 09:43:33,566 [myid:3] - INFO [ListenerThread:QuorumCnxManager$Listener#736] - My election bind port: /0.0.0.0:3888
2019-07-03 09:43:33,574 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumPeer#910] - LOOKING
2019-07-03 09:43:33,575 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:FastLeaderElection#813] - New election. My id = 3, proposed zxid=0x0
2019-07-03 09:43:33,581 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,581 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), LEADING (n.state), 1 (n.sid), 0x3 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,581 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,582 [myid:3] - INFO [WorkerSender[myid=3]:QuorumCnxManager#347] - Have smaller server identifier, so dropping the connection: (4, 3)
2019-07-03 09:43:33,583 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), LOOKING (n.state), 3 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,583 [myid:3] - INFO [WorkerSender[myid=3]:QuorumCnxManager#347] - Have smaller server identifier, so dropping the connection: (4, 3)
2019-07-03 09:43:33,583 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), LEADING (n.state), 1 (n.sid), 0x3 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,584 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), LOOKING (n.state), 2 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,585 [myid:3] - INFO [/0.0.0.0:3888:QuorumCnxManager$Listener#743] - Received connection request /10.32.0.5:42182
2019-07-03 09:43:33,585 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), FOLLOWING (n.state), 2 (n.sid), 0x3 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,585 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), FOLLOWING (n.state), 2 (n.sid), 0x3 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,587 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), LOOKING (n.state), 4 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,587 [myid:3] - WARN [RecvWorker:4:QuorumCnxManager$RecvWorker#1025] - Connection broken for id 4, my id = 3, error =
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1010)
2019-07-03 09:43:33,589 [myid:3] - WARN [RecvWorker:4:QuorumCnxManager$RecvWorker#1028] - Interrupting SendWorker
2019-07-03 09:43:33,588 [myid:3] - INFO [/0.0.0.0:3888:QuorumCnxManager$Listener#743] - Received connection request /10.32.0.5:42184
2019-07-03 09:43:33,589 [myid:3] - WARN [SendWorker:4:QuorumCnxManager$SendWorker#941] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1094)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:74)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:929)
2019-07-03 09:43:33,589 [myid:3] - WARN [SendWorker:4:QuorumCnxManager$SendWorker#951] - Send worker leaving thread
2019-07-03 09:43:33,590 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), FOLLOWING (n.state), 4 (n.sid), 0x3 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,590 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumPeer#980] - FOLLOWING
2019-07-03 09:43:33,591 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), FOLLOWING (n.state), 4 (n.sid), 0x3 (n.peerEpoch) FOLLOWING (my state)
2019-07-03 09:43:33,593 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Learner#86] - TCP NoDelay set to: true
2019-07-03 09:43:33,597 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:zookeeper.version=3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
2019-07-03 09:43:33,597 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:host.name=629a802d822d
2019-07-03 09:43:33,597 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:java.version=1.8.0_191
2019-07-03 09:43:33,597 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:java.vendor=Oracle Corporation
2019-07-03 09:43:33,597 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:java.class.path=/home/zookeeper-3.4.14/bin/../zookeeper-server/target/classes:/home/zookeeper-3.4.14/bin/../build/classes:/home/zookeeper-3.4.14/bin/../zookeeper-server/target/lib/*.jar:/home/zookeeper-3.4.14/bin/../build/lib/*.jar:/home/zookeeper-3.4.14/bin/../lib/slf4j-log4j12-1.7.25.jar:/home/zookeeper-3.4.14/bin/../lib/slf4j-api-1.7.25.jar:/home/zookeeper-3.4.14/bin/../lib/netty-3.10.6.Final.jar:/home/zookeeper-3.4.14/bin/../lib/log4j-1.2.17.jar:/home/zookeeper-3.4.14/bin/../lib/jline-0.9.94.jar:/home/zookeeper-3.4.14/bin/../lib/audience-annotations-0.5.0.jar:/home/zookeeper-3.4.14/bin/../zookeeper-3.4.14.jar:/home/zookeeper-3.4.14/bin/../zookeeper-server/src/main/resources/lib/*.jar:/home/zookeeper-3.4.14/bin/../conf:
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:java.io.tmpdir=/tmp
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:java.compiler=<NA>
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:os.name=Linux
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:os.arch=amd64
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:os.version=4.18.0-21-generic
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:user.name=root
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:user.home=/root
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:user.dir=/
2019-07-03 09:43:33,599 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:ZooKeeperServer#174] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /var/lib/zookeeper/data/version-2 snapdir /var/lib/zookeeper/data/version-2
2019-07-03 09:43:33,600 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Follower#65] - FOLLOWING - LEADER ELECTION TOOK - 25
2019-07-03 09:43:33,601 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumPeer$QuorumServer#185] - Resolved hostname: 10.32.0.2 to address: /10.32.0.2
2019-07-03 09:43:33,637 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Learner#336] - Getting a snapshot from leader 0x300000000
2019-07-03 09:43:33,644 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:FileTxnSnapLog#301] - Snapshotting: 0x300000000 to /var/lib/zookeeper/data/version-2/snapshot.300000000
2019-07-03 09:44:24,320 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#222] - Accepted socket connection from /150.20.11.157:55744
2019-07-03 09:44:24,324 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#949] - Client attempting to establish new session at /150.20.11.157:55744
2019-07-03 09:44:24,327 [myid:3] - WARN [QuorumPeer[myid=3]/0.0.0.0:2181:Follower#119] - Got zxid 0x300000001 expected 0x1
2019-07-03 09:44:24,327 [myid:3] - INFO [SyncThread:3:FileTxnLog#216] - Creating new log file: log.300000001
2019-07-03 09:44:24,384 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694] - Established session 0x300393be5860000 with negotiated timeout 10000 for client /150.20.11.157:55744
2019-07-03 09:44:24,892 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#222] - Accepted socket connection from /150.20.11.157:55746
2019-07-03 09:44:24,892 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#949] - Client attempting to establish new session at /150.20.11.157:55746
2019-07-03 09:44:24,908 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694] - Established session 0x300393be5860001 with negotiated timeout 10000 for client /150.20.11.157:55746
2019-07-03 09:44:26,410 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#222] - Accepted socket connection from /150.20.11.157:55748
2019-07-03 09:44:26,411 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#903] - Connection request from old client /150.20.11.157:55748; will be dropped if server is in r-o mode
2019-07-03 09:44:26,411 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#949] - Client attempting to establish new session at /150.20.11.157:55748
2019-07-03 09:44:26,422 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694] - Established session 0x300393be5860002 with negotiated timeout 10000 for client /150.20.11.157:55748
2019-07-03 09:45:41,553 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1056] - Closed socket connection for client /150.20.11.157:55746 which had sessionid 0x300393be5860001
2019-07-03 09:45:41,567 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1056] - Closed socket connection for client /150.20.11.157:55744 which had sessionid 0x300393be5860000
2019-07-03 09:45:41,597 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#376] - Unable to read additional data from client sessionid 0x300393be5860002, likely client has closed socket
2019-07-03 09:45:41,597 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1056] - Closed socket connection for client /150.20.11.157:55748 which had sessionid 0x300393be5860002
2019-07-03 09:46:20,896 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#222] - Accepted socket connection from /10.32.0.5:45998
2019-07-03 09:46:20,901 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#949] - Client attempting to establish new session at /10.32.0.5:45998
2019-07-03 09:46:20,916 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694] - Established session 0x300393be5860003 with negotiated timeout 40000 for client /10.32.0.5:45998
2019-07-03 09:46:43,827 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#222] - Accepted socket connection from /150.20.11.157:55864
2019-07-03 09:46:43,830 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#949] - Client attempting to establish new session at /150.20.11.157:55864
2019-07-03 09:46:43,856 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694] - Established session 0x300393be5860004 with negotiated timeout 10000 for client /150.20.11.157:55864
2019-07-03 09:46:44,336 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#222] - Accepted socket connection from /150.20.11.157:55866
2019-07-03 09:46:44,336 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#949] - Client attempting to establish new session at /150.20.11.157:55866
2019-07-03 09:46:44,348 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694] - Established session 0x300393be5860005 with negotiated timeout 10000 for client /150.20.11.157:55866
Could you please guide me on how to use both Mesos slaves to run the Flink platform?
Any help would be really appreciated.

Spring Kafka, Testing with Embedded Kafka

We are observing strange behavior with our service test and embedded Kafka.
The test is a Spock test. We use the JUnit rule KafkaEmbedded and propagate brokersAsString as follows:
@ClassRule
@Shared
KafkaEmbedded embeddedKafka = new KafkaEmbedded(1)
@Autowired
KafkaListenerEndpointRegistry endpointRegistry
def setupSpec() {
    System.setProperty("kafka.bootstrapServers", embeddedKafka.getBrokersAsString())
}
From inspecting the code of KafkaEmbedded, constructing an instance with KafkaEmbedded(int count) leads to one Kafka server with two partitions per topic.
To tackle issues with partition assignment and server-client synchronization in the test, we follow the strategy seen in the ContainerTestUtils class from spring-kafka:
public static void waitForAssignment(KafkaMessageListenerContainer<String, String> container, int partitions)
        throws Exception {
    log.info("Waiting for " + container.getContainerProperties().getTopics() + " to connect to " + partitions +
            " partitions.")
    int n = 0
    int count = 0
    // Poll up to 600 times, 100 ms apart (about 60 seconds in total).
    while (n++ < 600 && count < partitions) {
        count = 0
        // getAssignedPartitions() may be null before the first assignment, so check before iterating.
        def assigned = container.getAssignedPartitions()
        if (assigned != null) {
            assigned.each { TopicPartition it ->
                log.info(it.topic() + ":" + it.partition() + "; ")
            }
            count = assigned.size()
        }
        if (count < partitions) {
            Thread.sleep(100)
        }
    }
}
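For readers who want to try the waiting logic without a broker, here is a minimal plain-Java sketch of the same poll-until-assigned idea. It is not the spring-kafka implementation: the class name AssignmentWaiter, the method waitForCount, and the IntSupplier standing in for container.getAssignedPartitions().size() are all assumptions made for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

// Hypothetical helper, not part of spring-kafka.
final class AssignmentWaiter {

    private AssignmentWaiter() {
    }

    /**
     * Polls assignedCount until it reports at least `expected` partitions
     * or the timeout elapses, and returns the last observed count.
     */
    static int waitForCount(IntSupplier assignedCount, int expected, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        int count = assignedCount.getAsInt();
        while (count < expected && deadline - System.nanoTime() > 0) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting early.
                Thread.currentThread().interrupt();
                return count;
            }
            count = assignedCount.getAsInt();
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in for the container: reports one more partition on each poll, capped at 2.
        int[] calls = {0};
        int seen = waitForCount(() -> Math.min(++calls[0], 2), 2, 1_000, 10);
        System.out.println("assigned partitions seen: " + seen);
    }
}
```

Returning the last observed count instead of throwing lets the caller decide whether a short assignment (for example 1 of 2 partitions) should fail the test or only be logged.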
When we observe the logs we notice the following pattern:
2016-07-29 11:24:02.600 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 1 : {deliveryZipCode_v1=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.600 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 1 : {staggering=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.600 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 1 : {moa=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.696 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 3 : {staggering=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.699 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 3 : {moa=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.699 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 3 : {deliveryZipCode_v1=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.807 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 5 : {deliveryZipCode_v1=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.811 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 5 : {staggering=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.812 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 5 : {moa=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:03.544 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[]
2016-07-29 11:24:03.544 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[]
2016-07-29 11:24:03.544 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[]
2016-07-29 11:24:03.602 INFO 1160 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : SyncGroup for group timeslot-service-group-06x failed due to coordinator rebalance, rejoining the group
2016-07-29 11:24:03.637 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[]
2016-07-29 11:24:03.637 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[]
2016-07-29 11:24:04.065 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[staggering-0]
2016-07-29 11:24:04.066 INFO 1160 --- [ main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat started on port(s): 50810 (http)
2016-07-29 11:24:04.073 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : Started AllocationsDeliveryZonesServiceSpec in 20.616 seconds (JVM running for 25.456)
2016-07-29 11:24:04.237 INFO 1160 --- [ main] org.eclipse.jetty.server.Server : jetty-9.2.17.v20160517
2016-07-29 11:24:04.265 INFO 1160 --- [ main] o.e.jetty.server.handler.ContextHandler : Started o.e.j.s.ServletContextHandler#6a8598e7{/__admin,null,AVAILABLE}
2016-07-29 11:24:04.270 INFO 1160 --- [ main] o.e.jetty.server.handler.ContextHandler : Started o.e.j.s.ServletContextHandler#104ea372{/,null,AVAILABLE}
2016-07-29 11:24:04.279 INFO 1160 --- [ main] o.eclipse.jetty.server.ServerConnector : Started ServerConnector#3c9b416a{HTTP/1.1}{0.0.0.0:50811}
2016-07-29 11:24:04.430 INFO 1160 --- [ main] o.eclipse.jetty.server.ServerConnector : Started ServerConnector#7c214597{SSL-http/1.1}{0.0.0.0:50812}
2016-07-29 11:24:04.430 INFO 1160 --- [ main] org.eclipse.jetty.server.Server : Started #25813ms
2016-07-29 11:24:04.632 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : waiting...
2016-07-29 11:24:04.662 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : Waiting for [moa] to connect to 2 partitions.^
2016-07-29 11:24:13.644 INFO 1160 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : Attempt to heart beat failed since the group is rebalancing, try to re-join group.
2016-07-29 11:24:13.644 INFO 1160 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : Attempt to heart beat failed since the group is rebalancing, try to re-join group.
2016-07-29 11:24:13.644 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[]
2016-07-29 11:24:13.644 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[]
2016-07-29 11:24:13.655 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[staggering-0]
2016-07-29 11:24:13.655 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[moa-0]
2016-07-29 11:24:13.655 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[deliveryZipCode_v1-0]
2016-07-29 11:24:13.740 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : moa:0;
[...]
2016-07-29 11:24:16.644 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : moa:0;
2016-07-29 11:24:16.666 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[staggering-0]
2016-07-29 11:24:16.750 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : moa:0;
[...]
2016-07-29 11:24:23.559 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : moa:0;
2016-07-29 11:24:23.660 INFO 1160 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : Attempt to heart beat failed since the group is rebalancing, try to re-join group.
2016-07-29 11:24:23.660 INFO 1160 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : Attempt to heart beat failed since the group is rebalancing, try to re-join group.
2016-07-29 11:24:23.662 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : moa:0;
2016-07-29 11:24:23.686 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[moa-0]
2016-07-29 11:24:23.686 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[deliveryZipCode_v1-0]
2016-07-29 11:24:23.695 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[moa-0]
2016-07-29 11:24:23.695 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[staggering-0]
2016-07-29 11:24:23.695 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[deliveryZipCode_v1-0]
Please note that [...] indicates omitted lines.
We set metadata.max.age.ms to 3000 ms, so the consumer refreshes its metadata frequently.
What puzzles us is that if we wait for two partitions to be assigned, the wait times out. Only if we wait for one partition does everything run successfully after a while.
Did we misread the code in thinking that there are two partitions per topic in the embedded Kafka? Is it normal that only one is assigned to our listeners?
For testing, it is important to set spring.kafka.consumer.auto-offset-reset=earliest to avoid a race condition (the sequence or timing of consumer versus producer); see https://docs.spring.io/spring-kafka/reference/html/#junit
Starting with version 2.5, the consumerProps method sets the ConsumerConfig.AUTO_OFFSET_RESET_CONFIG to earliest. This is because, in most cases, you want the consumer to consume any messages sent in a test case. The ConsumerConfig default is latest which means that messages already sent by a test, before the consumer starts, will not receive those records. To revert to the previous behavior, set the property to latest after calling the method.
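As a rough illustration of that setting, the sketch below builds a consumer property map by hand with auto.offset.reset set to earliest. The property keys are the standard Kafka consumer keys; the class TestConsumerProps and its consumerProps method are hypothetical names for this example and are not the spring-kafka-test helper referred to in the quote.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only.
final class TestConsumerProps {

    private TestConsumerProps() {
    }

    /** Builds a minimal consumer configuration suited to tests. */
    static Map<String, Object> consumerProps(String brokers, String groupId) {
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", brokers);
        props.put("group.id", groupId);
        // Read from the start of each partition when the group has no committed offset,
        // so records produced before the consumer started are still delivered.
        props.put("auto.offset.reset", "earliest");
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" and "test-group" are placeholder values.
        System.out.println(consumerProps("localhost:9092", "test-group"));
    }
}
```

In a real test the broker list would come from the embedded broker (for example the brokersAsString value used in the question) rather than a hard-coded address.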
I can't explain the flakiness you're seeing; yes, each topic gets 2 partitions by default. I just ran one of the framework container tests and see this...
09:24:06.139 INFO [testSlow3-kafka-consumer-1][org.springframework.kafka.listener.KafkaMessageListenerContainer] partitions revoked:[]
09:24:06.611 INFO [testSlow3-kafka-consumer-1][org.springframework.kafka.listener.KafkaMessageListenerContainer] partitions assigned:[testTopic3-1, testTopic3-0]

Resources