Kafka not publishing Oracle data

I have a Confluent setup on RHEL and am trying to read data from an Oracle 12c table/view (I tried both), but no messages ever arrive at the consumer.
My suspicion is that it has something to do with the data in the tables being loaded by a bulk loader rather than by individual inserts. I do have a unique, incrementing id column in the data, which I have specified in the config. The config loads, and it shows my topic name as active/running.
Any ideas?
{
"name":"oracle_source_05",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://<host>:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://<host>:8081",
"connection.url": "<jdbc url>",
"connection.user" : "<user>",
"connection.password" : "<pw>",
"table.whitelist": "<view name>",
"table.type" : "VIEW",
"mode": "incrementing",
"incrementing.column.name" : "<id column>",
"validate.non.null":"false",
"topic.prefix":"ORACLE-"
}
}
The log has these messages:
[2018-04-17 10:59:19,965] DEBUG [Controller id=0] Topics not in preferred replica Map() (kafka.controller.KafkaController)
[2018-04-17 10:59:19,965] TRACE [Controller id=0] Leader imbalance ratio for broker 0 is 0.0 (kafka.controller.KafkaController)
server.log:
[2018-04-18 09:24:26,495] INFO Accepted socket connection from /127.0.0.1:39228 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2018-04-18 09:24:26,498] INFO Client attempting to establish new session at /127.0.0.1:39228 (org.apache.zookeeper.server.ZooKeeperServer)
[2018-04-18 09:24:26,499] INFO Established session 0x162d403daed0004 with negotiated timeout 30000 for client /127.0.0.1:39228 (org.apache.zookeeper.server.ZooKeeperServer)
[2018-04-18 09:24:26,516] INFO Processed session termination for sessionid: 0x162d403daed0004 (org.apache.zookeeper.server.PrepRequestProcessor)
[2018-04-18 09:24:26,517] INFO Closed socket connection for client /127.0.0.1:39228 which had sessionid 0x162d403daed0004 (org.apache.zookeeper.server.NIOServerCnxn)
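For context on the incrementing mode used in the config above: as I understand the JDBC source connector, it remembers the largest value of incrementing.column.name it has published (stored as the connector's source offset) and, on each poll, only asks for rows whose id is strictly greater than that value, roughly WHERE id > ? ORDER BY id ASC. Whether rows arrive through a bulk loader or single inserts should not matter to that query, but committed rows whose ids are not above the stored offset are never picked up. The Java sketch below is a simplified in-memory model of that filter, not the connector's code; the class name, method and sample ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class IncrementingModeSketch {
    // Simplified model of "incrementing" mode: only rows whose id is strictly greater
    // than the stored offset are returned, in ascending id order
    // (roughly: SELECT ... WHERE id > ? ORDER BY id ASC).
    static List<Long> poll(List<Long> tableIds, long storedOffset) {
        List<Long> newRows = new ArrayList<>();
        for (long id : tableIds) {
            if (id > storedOffset) {
                newRows.add(id);
            }
        }
        newRows.sort(null); // null comparator = natural (ascending) order
        return newRows;
    }

    public static void main(String[] args) {
        // Hypothetical scenario: ids up to 5 were already published, so the offset is 5.
        long storedOffset = 5;
        // Rows loaded later with ids at or below 5 are never returned...
        System.out.println(poll(List.of(1L, 2L, 3L), storedOffset)); // []
        // ...while rows with ids above 5 are picked up on the next poll.
        System.out.println(poll(List.of(7L, 4L, 6L), storedOffset)); // [6, 7]
    }
}
```

Because source offsets are stored under the connector's name, reusing a name resumes from the old offset, while a connector with a new name starts again from the beginning of the table.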

Related

Camel aws-s3 Source Connector Error - How should the config be changed?

I am defining a Camel S3 Source connector on our Confluent (5.5.1) installation. After creating the connector and confirming its status is "RUNNING", I upload a file to my S3 bucket. If I then run ls on the bucket, it is empty, which indicates the file was processed and deleted. However, I do not see any messages in the topic. I am basically following this example with a simple 4-line file, but on a Confluent cluster instead of standalone Kafka.
This is my configuration:
{
"name": "CamelAWSS3SourceConnector",
"connector.class": "org.apache.camel.kafkaconnector.awss3.CamelAwss3SourceConnector",
"bootstrap.servers": "broker1-dev:9092,broker2-dev:9092,broker3-dev:9092",
"sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"user\" password=\"password\";",
"security.protocol": "SASL_SSL",
"ssl.truststore.location": "/config/client.truststore.jks",
"ssl.truststore.password": "password",
"ssl.keystore.location": "/config/client.keystore.jks",
"ssl.keystore.password": "password",
"ssl.key.password": "password",
"errors.log.enable": "true",
"errors.log.include.messages": "true",
"errors.tolerance": "all",
"offset.flush.timeout.ms": "60000",
"offset.flush.interval.ms": "10000",
"max.request.size": "10485760",
"flush.size": "1",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.camel.kafkaconnector.awss3.converters.S3ObjectConverter",
"camel.source.maxPollDuration": "10000",
"topics": "TEST-CAMEL-S3-SOURCE-POC",
"camel.source.path.bucketNameOrArn": "arn:aws:s3:::my-bucket",
"camel.component.aws-s3.region": "US_EAST_1",
"tasks.max": "1",
"camel.source.endpoint.useIAMCredentials": "true",
"camel.source.endpoint.autocloseBody": "true"
}
I see these errors in the logs:
[2020-12-23 09:05:01,876] ERROR WorkerSourceTask{id=CamelAWSS3SourceConnector-0} Failed to flush, timed out while waiting for producer to flush outstanding 1 messages (org.apache.kafka.connect.runtime.WorkerSourceTask:448)
[2020-12-23 09:05:01,876] ERROR WorkerSourceTask{id=CamelAWSS3SourceConnector-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter:116)
[2020-12-23 09:20:58,685] DEBUG [Worker clientId=connect-1, groupId=connect-cluster] Received successful Heartbeat response (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:1045)
[2020-12-23 09:20:58,688] DEBUG WorkerSourceTask{id=CamelAWSS3SourceConnector-0} Committing offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter:111)
[2020-12-23 09:20:58,688] INFO WorkerSourceTask{id=CamelAWSS3SourceConnector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:426)
[2020-12-23 09:20:58,688] INFO WorkerSourceTask{id=CamelAWSS3SourceConnector-0} flushing 1 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:443)
If I request the connector status with curl, the status contains this error:
trace: org.apache.kafka.connect.errors.ConnectException: OffsetStorageWriter is already flushing
at org.apache.kafka.connect.storage.OffsetStorageWriter.beginFlush(OffsetStorageWriter.java:111)
at org.apache.kafka.connect.runtime.WorkerSourceTask.commitOffsets(WorkerSourceTask.java:438)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:257)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I found the solution below in a couple of places, but it didn't help either. It suggested adding these keys to the config:
"offset.flush.timeout.ms": "60000",
"offset.flush.interval.ms": "10000",
"max.request.size": "10485760",
Thank you
UPDATE
I cut the config down to a minimum, but still get the same error:
{
"connector.class": "org.apache.camel.kafkaconnector.awss3.CamelAwss3SourceConnector",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.camel.kafkaconnector.awss3.converters.S3ObjectConverter",
"camel.source.maxPollDuration": "10000",
"topics": "TEST-S3-SOURCE-MINIMAL-POC",
"camel.source.path.bucketNameOrArn": "pruvpcaws003-np-use1-push-json-poc",
"camel.component.aws-s3.region": "US_EAST_1",
"tasks.max": "1",
"camel.source.endpoint.useIAMCredentials": "true",
"camel.source.endpoint.autocloseBody": "true"
}
I still get the same "OffsetStorageWriter is already flushing" exception, with a stack trace identical to the one above.
I'm not sure where else to look to find the root cause.
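One thing worth checking, offered as a hedged suggestion rather than a confirmed fix: Kafka Connect does not hand top-level connector properties such as security.protocol, sasl.jaas.config or ssl.* to the producer a source task uses. That producer is built from the worker's own configuration (its producer.* settings), and since Kafka 2.3 (KIP-458; Confluent 5.5.1 ships Kafka 2.5) a connector can override it only through keys prefixed with producer.override., and only if the worker's connector.client.config.override.policy allows it. offset.flush.timeout.ms and offset.flush.interval.ms are likewise worker settings, which may be why adding them to the connector config changed nothing. A producer that cannot authenticate to the SASL_SSL brokers would fit the "timed out while waiting for producer to flush" message. The hypothetical Java helper below only illustrates the key renaming; the key set is taken from the configuration above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class ProducerOverrides {
    static final String PREFIX = "producer.override.";

    // Producer-level keys copied from the connector config in the question.
    // bootstrap.servers is deliberately not renamed here; the task's producer
    // normally connects to the worker's own Kafka cluster.
    static final Set<String> PRODUCER_KEYS = Set.of(
            "security.protocol", "sasl.jaas.config",
            "ssl.truststore.location", "ssl.truststore.password",
            "ssl.keystore.location", "ssl.keystore.password", "ssl.key.password",
            "max.request.size");

    // Returns a copy of the connector config in which producer-level keys carry
    // the producer.override. prefix; all other keys are copied unchanged.
    static Map<String, String> withProducerOverrides(Map<String, String> connectorConfig) {
        Map<String, String> result = new LinkedHashMap<>();
        connectorConfig.forEach((key, value) ->
                result.put(PRODUCER_KEYS.contains(key) ? PREFIX + key : key, value));
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("connector.class",
                "org.apache.camel.kafkaconnector.awss3.CamelAwss3SourceConnector");
        config.put("security.protocol", "SASL_SSL");
        config.put("tasks.max", "1");
        withProducerOverrides(config).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

On the worker side this assumes connector.client.config.override.policy=All; in the Kafka 2.x line the default, None, rejects such overrides.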

Storm topology shuts down in LocalCluster after running for a few seconds

I have a very basic topology: a KafkaSpout followed by 3 bolts. The first bolt, CassandraWriterBolt, writes data to Cassandra. The other 2 bolts read old data from Cassandra, combine it with the new data to create another set of data, and insert that data into Cassandra as well.
I am running the topology in a LocalCluster during development. It runs for a few seconds, then starts shutting down the worker, executors, etc. Finally it fails with a Cassandra driver exception:
java.lang.IllegalStateException: Could not send request, session is closed
at com.datastax.driver.core.SessionManager.execute(SessionManager.java:696) ~[cassandra-driver-core-3.6.0.jar:na]
Other log output:
[er Executor - 1] o.a.s.s.org.apache.zookeeper.ZooKeeper : Session: 0x100166ad36d0024 closed
[.0/0.0.0.0:2000] o.a.s.s.o.a.z.server.NIOServerCnxn : Unable to read additional data from client sessionid 0x100166ad36d0024, likely client has closed socket
[.0/0.0.0.0:2000] o.a.s.s.o.a.z.server.NIOServerCnxn : Closed socket connection for client /0:0:0:0:0:0:0:1:63890 which had sessionid 0x100166ad36d0024
[- 1-EventThread] o.a.s.s.org.apache.zookeeper.ClientCnxn : EventThread shut down for session: 0x100166ad36d0024
[tor-Framework-0] o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl : backgroundOperationsLoop exiting
[:0 cport:2000):] o.a.s.s.o.a.z.s.PrepRequestProcessor : Processed session termination for sessionid: 0x100166ad36d0021
[er Executor - 4] o.a.s.s.org.apache.zookeeper.ZooKeeper : Session: 0x100166ad36d0021 closed
[.0/0.0.0.0:2000] o.a.s.s.o.a.z.server.NIOServerCnxn : Unable to read additional data from client sessionid 0x100166ad36d0021, likely client has closed socket
[.0/0.0.0.0:2000] o.a.s.s.o.a.z.server.NIOServerCnxn : Closed socket connection for client /0:0:0:0:0:0:0:1:63885 which had sessionid 0x100166ad36d0021
[- 4-EventThread] o.a.s.s.org.apache.zookeeper.ClientCnxn : EventThread shut down for session: 0x100166ad36d0021
[ SLOT_1027] org.apache.storm.ProcessSimulator : Begin killing process 1347f01d-7982-4141-9b9d-cac65a6e703d
[ SLOT_1027] org.apache.storm.daemon.worker.Worker : Shutting down worker forex-topology-1-1577152204 517f3306-5ad3-433b-82e1-b2d031779f0b 1027
[ SLOT_1027] org.apache.storm.daemon.worker.Worker : Terminating messaging context
[ SLOT_1027] org.apache.storm.daemon.worker.Worker : Shutting down executors
[ SLOT_1027] o.a.storm.executor.ExecutorShutdown : Shutting down executor __system:[-1, -1]
[xecutor[-1, -1]] org.apache.storm.utils.Utils : Async loop interrupted!
[ SLOT_1027] o.a.storm.executor.ExecutorShutdown : Shut down executor __system:[-1, -1]
[ SLOT_1027] o.a.storm.executor.ExecutorShutdown : Shutting down executor pairStrengthAccumulator:[8, 8]
[-executor[8, 8]] org.apache.storm.utils.Utils : Async loop interrupted!
[ SLOT_1027] o.a.s.cassandra.executor.AsyncExecutor : shutting down async handler executor
[ SLOT_1027] o.a.s.c.client.impl.DefaultClient : Try to close connection to cluster: cluster2
The following log lines appear 40 times:
[ main] o.a.storm.zookeeper.ClientZookeeper : Starting ZK Curator
[ main] o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl : Starting
[ main] o.a.s.s.org.apache.zookeeper.ZooKeeper : Initiating client connection, connectString=localhost:2000/storm sessionTimeout=20000 watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@4bcaa195
[ main] o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl : Default schema
[localhost:2000)] o.a.s.s.org.apache.zookeeper.ClientCnxn : Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2000. Will not attempt to authenticate using SASL (unknown error)
[ main] o.a.storm.zookeeper.ClientZookeeper : Starting ZK Curator
[ main] o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl : Starting
[.0/0.0.0.0:2000] o.a.s.s.o.a.z.s.NIOServerCnxnFactory : Accepted socket connection from /0:0:0:0:0:0:0:1:63756
[ main] o.a.s.s.org.apache.zookeeper.ZooKeeper : Initiating client connection, connectString=localhost:2000/storm sessionTimeout=20000 watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@6bc24e72
[localhost:2000)] o.a.s.s.org.apache.zookeeper.ClientCnxn : Socket connection established to localhost/0:0:0:0:0:0:0:1:2000, initiating session
[.0/0.0.0.0:2000] o.a.s.s.o.a.z.server.ZooKeeperServer : Client attempting to establish new session at /0:0:0:0:0:0:0:1:63756
[ main] o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl : Default schema
[localhost:2000)] o.a.s.s.org.apache.zookeeper.ClientCnxn : Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2000, sessionid = 0x100166ad36d0001, negotiated timeout = 20000
[ SyncThread:0] o.a.s.s.o.a.z.server.ZooKeeperServer : Established session 0x100166ad36d0001 with negotiated timeout 20000 for client /0:0:0:0:0:0:0:1:63756
[localhost:2000)] o.a.s.s.org.apache.zookeeper.ClientCnxn : Opening socket connection to server localhost/127.0.0.1:2000. Will not attempt to authenticate using SASL (unknown error)
[ain-EventThread] o.a.s.s.o.a.c.f.s.ConnectionStateManager : State change: CONNECTED
[localhost:2000)] o.a.s.s.org.apache.zookeeper.ClientCnxn : Socket connection established to localhost/127.0.0.1:2000, initiating session
[.0/0.0.0.0:2000] o.a.s.s.o.a.z.s.NIOServerCnxnFactory : Accepted socket connection from /127.0.0.1:63759
[.0/0.0.0.0:2000] o.a.s.s.o.a.z.server.ZooKeeperServer : Client attempting to establish new session at /127.0.0.1:63759
[ SyncThread:0] o.a.s.s.o.a.z.server.ZooKeeperServer : Established session 0x100166ad36d0002 with negotiated timeout 20000 for client /127.0.0.1:63759
[localhost:2000)] o.a.s.s.org.apache.zookeeper.ClientCnxn : Session establishment complete on server localhost/127.0.0.1:2000, sessionid = 0x100166ad36d0002, negotiated timeout = 20000
[ain-EventThread] o.a.s.s.o.a.c.f.s.ConnectionStateManager : State change: CONNECTED
[ main] o.a.storm.validation.ConfigValidation : task.heartbeat.frequency.secs is a deprecated config please see class org.apache.storm.Config.TASK_HEARTBEAT_FREQUENCY_SECS for more information.
Your main method does this:
public static void main(String[] args) {
ApplicationContext springContext = SpringApplication.run(CurrencyStrengthCalculatorApplication.class, args);
StormTopology topology = SpringBasedTopologyBuilder.getInstance().buildStormTopologyUsingApplicationContext(springContext);
LOG.info("Topology created successfully. Now starting it .............");
new LocalCluster().submitTopology("forext-topology", ImmutableMap.of(), topology);
}
submitTopology isn't a blocking call; it just submits the topology and returns. If you want to keep the program running for a while, you need to add a sleep after the submit. Once the main method returns, the LocalCluster will begin shutting down.
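To illustrate the point without pulling in Storm, here is a plain-JDK sketch of the same pattern: a hypothetical submit() that starts work on a background daemon thread and returns immediately, as submitTopology does, so the caller has to stay alive (here with Thread.sleep) and then shut the work down explicitly. None of the names below are Storm API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class KeepAliveAfterSubmit {
    // Hypothetical stand-in for a non-blocking submit: schedules work on a daemon
    // thread (which will not keep the JVM alive on its own) and returns at once.
    static ScheduledExecutorService submit(AtomicInteger processed) {
        ScheduledExecutorService worker = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "fake-topology-worker");
            t.setDaemon(true);
            return t;
        });
        worker.scheduleAtFixedRate(processed::incrementAndGet, 0, 10, TimeUnit.MILLISECONDS);
        return worker;
    }

    // Submits, keeps the caller alive for runMillis, then stops the work explicitly.
    static int runFor(long runMillis) {
        AtomicInteger processed = new AtomicInteger();
        ScheduledExecutorService worker = submit(processed); // returns immediately
        try {
            Thread.sleep(runMillis); // the "sleep after the submit"
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        } finally {
            worker.shutdownNow(); // orderly, explicit shutdown
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println("ticks processed: " + runFor(500));
    }
}
```

With Storm itself, the equivalent is to call Thread.sleep(...) (or Storm's Utils.sleep(...)) right after submitTopology, and to shut the LocalCluster down once you are done.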

How to configure an HdfsSinkConnector with Kafka Connect?

I'm trying to set up an HdfsSinkConnector. This is my worker.properties config:
bootstrap.servers=kafkacluster01.corp:9092
group.id=nycd-og-kafkacluster
config.storage.topic=hive_conn_conf
offset.storage.topic=hive_conn_offs
status.storage.topic=hive_conn_stat
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://my-schemaregistry.co:8081
schema.registry.url=http://my-schemaregistry.co:8081
hive.integration=true
hive.metastore.uris=dev-hive-metastore
schema.compatibility=BACKWARD
value.converter.schemas.enable=true
logs.dir = /logs
topics.dir = /topics
plugin.path=/usr/share/java
And this is the POST request I'm sending to set up the connector:
curl -X POST localhost:9092/connectors -H "Content-Type: application/json" -d '{
"name":"hdfs-hive_sink_con_dom16",
"config":{
"connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
"topics": "dom_topic",
"hdfs.url": "hdfs://hadoop-sql-dev:10000",
"flush.size": "3",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url":"http://my-schemaregistry.co:8081"
}
}'
The topic dom_topic already exists (with Avro values), but I get the following error from my worker:
INFO Couldn't start HdfsSinkConnector: (io.confluent.connect.hdfs.HdfsSinkTask:72)
org.apache.kafka.connect.errors.ConnectException: java.io.IOException:
Failed on local exception: com.google.protobuf.InvalidProtocolBufferException:
Protocol message end-group tag did not match expected tag.;
Host Details : local host is: "319dc5d70884/172.17.0.2"; destination host is: "hadoop-sql-dev":10000;
at io.confluent.connect.hdfs.DataWriter.<init>(DataWriter.java:202)
at io.confluent.connect.hdfs.HdfsSinkTask.start(HdfsSinkTask.java:64)
at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:207)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:139)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
I took the hdfs.url from Hive's JDBC URL: jdbc:hive2://hadoop-sql-dev:10000
If I change the port to, say, 9092, I get:
INFO Retrying connect to server: hadoop-sql-dev/xxx.xx.x.xx:9092. Already tried 0 time(s); maxRetries=45 (org.apache.hadoop.ipc.Client:837)
I'm running all of this in Docker, and my Dockerfile is very simple:
#FROM coinsmith/cp-kafka-connect-hdfs
FROM confluentinc/cp-kafka-connect:5.3.1
COPY confluentinc-kafka-connect-hdfs-5.3.1 /usr/share/java/kafka-connect-hdfs
COPY worker.properties worker.properties
# start
ENTRYPOINT ["connect-distributed", "worker.properties"]
Any help would be appreciated.
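A possible explanation, offered as a hedged suggestion: hdfs.url is meant to point at the HDFS NameNode (the fs.defaultFS value from the cluster's core-site.xml), whereas port 10000 in the jdbc:hive2 URL is HiveServer2's port. Speaking Hadoop RPC to a HiveServer2 endpoint would plausibly produce a protobuf parsing error like the one above. The Java sketch below reads fs.defaultFS out of a core-site.xml document using only the JDK's DOM parser; the sample XML, the host name namenode-host and port 8020 are made-up placeholders, so use the value from your own cluster's file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DefaultFsLookup {
    // Returns the trimmed text of the first child element with the given tag, or null.
    private static String childText(Element parent, String tag) {
        Node node = parent.getElementsByTagName(tag).item(0);
        return node == null ? null : node.getTextContent().trim();
    }

    // Looks up a <property> by <name> in a Hadoop *-site.xml document; null if absent.
    static String property(String siteXml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(siteXml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                if (name.equals(childText(prop, "name"))) {
                    return childText(prop, "value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse site XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up core-site.xml content; on a real cluster, read the file from the Hadoop conf dir.
        String coreSite = "<configuration>"
                + "<property><name>fs.defaultFS</name><value>hdfs://namenode-host:8020</value></property>"
                + "</configuration>";
        System.out.println("hdfs.url=" + property(coreSite, "fs.defaultFS"));
    }
}
```

Along the same lines, hive.metastore.uris usually expects a thrift:// URI for the Hive metastore (port 9083 by default) rather than a bare host name.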

Kafka JDBC Sink Connector - Oracle

I am creating a JDBC sink connector (Confluent) to Oracle for the first time; the connector config is below. It does not create or load the table, but according to the log the offset keeps increasing, and no error is thrown. Please suggest what could be the issue.
I'm producing 5 sample records from Java. I checked the status with curl, and the connector is running.
{
"name": "ora_sink_task",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"connection.url": "jdbc:oracle:thin:@host:port/servicename",
"connection.user": "user",
"connection.password": "password",
"topics": "connecttest",
"tasks.max": "1",
"table.name.format": "member_cbdt_sink1",
"value.converter":"io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url":"http://localhost:8081",
"auto.create": "true"
}
}
Connector log:
[2019-09-09 00:34:23,832] INFO Checking Oracle dialect for existence of table "member_cbdt_sink1" (io.confluent.connect.jdbc.dialect.OracleDatabaseDialect:492)
[2019-09-09 00:34:23,834] INFO Using Oracle dialect table "member_cbdt_sink1" absent (io.confluent.connect.jdbc.dialect.OracleDatabaseDialect:500)
[2019-09-09 00:34:23,846] INFO Checking Oracle dialect for existence of table "member_cbdt_sink1" (io.confluent.connect.jdbc.dialect.OracleDatabaseDialect:492)
[2019-09-09 00:34:23,849] INFO Using Oracle dialect table "member_cbdt_sink1" present (io.confluent.connect.jdbc.dialect.OracleDatabaseDialect:500)
[2019-09-09 00:34:24,037] INFO Setting metadata for table "member_cbdt_sink1" to Table{name='"member_cbdt_sink1"', columns=[Column{'first_name', isPrimaryKey=false, allowsNull=false, sqlType=CLOB}, Column{'height', isPrimaryKey=false, allowsNull=false, sqlType=BINARY_FLOAT}, Column{'last_name', isPrimaryKey=false, allowsNull=false, sqlType=CLOB}, Column{'age', isPrimaryKey=false, allowsNull=false, sqlType=NUMBER}, Column{'automated_email', isPrimaryKey=false, allowsNull=true, sqlType=NUMBER}, Column{'weight', isPrimaryKey=false, allowsNull=false, sqlType=BINARY_FLOAT}]} (io.confluent.connect.jdbc.util.TableDefinitions:65)
[2019-09-09 00:35:13,775] INFO WorkerSinkTask{id=ora_sink_task-0} Committing offsets asynchronously using sequence number 1: {connecttest-0=OffsetAndMetadata{offset=55, metadata=''}} (org.apache.kafka.connect.runtime.WorkerSinkTask:345)
[2019-09-09 01:03:13,775] INFO WorkerSinkTask{id=ora_sink_task-0} Committing offsets asynchronously using sequence number 29: {connecttest-0=OffsetAndMetadata{offset=60, metadata=''}} (org.apache.kafka.connect.runtime.WorkerSinkTask:345)
It worked after changing the value of table.name.format from lowercase to uppercase; Oracle seems to expect uppercase table names.
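A likely reason, consistent with the log above: the connector quoted the table name when it created the table (name='"member_cbdt_sink1"'), and Oracle treats double-quoted identifiers as case-sensitive while folding unquoted ones to uppercase. The table therefore seems to have been created and loaded under the lowercase name, but a plain SELECT * FROM member_cbdt_sink1 looks for MEMBER_CBDT_SINK1. The Java sketch below is a simplified model of that name resolution, not Oracle or connector code; the class and method names are hypothetical.

```java
import java.util.Locale;

public class OracleIdentifierCase {
    // Simplified model of Oracle name resolution: a double-quoted identifier keeps its
    // exact case, an unquoted identifier is folded to uppercase.
    static String resolve(String identifier) {
        if (identifier.length() >= 2 && identifier.startsWith("\"") && identifier.endsWith("\"")) {
            return identifier.substring(1, identifier.length() - 1);
        }
        return identifier.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String created = resolve("\"member_cbdt_sink1\""); // as the connector logged it
        String queried = resolve("member_cbdt_sink1");     // as written in a plain SELECT
        System.out.println(created + " vs " + queried + " -> same table? " + created.equals(queried));
    }
}
```

With an uppercase table.name.format, the quoted name the connector uses and the unquoted name in an ordinary query resolve to the same table.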

Mesos Slave on Windows 2016 Not Connecting to Master

My current set up is as follows:
Mesos Master — 10.20.200.300:14081 - RHEL 7
Zookeeper — 10.20.200.300:14080 - RHEL 7
Mesos Agent — 10.21.210.310:5051 - Windows 2016
The master is up and can connect to ZooKeeper. However, when I start the agent, it connects to ZooKeeper but never registers with the master.
The master was started as a systemd process with the following parameters under /etc/mesos-master:
hostname - mymaster.mesos.com
quorum - 1
work_dir - /var/lib/mesos
advertise_ip - 10.20.200.300
advertise_port - 14081
Below are the logs from master, slave & zookeeper.
Master logs (running on 10.20.200.300:14081):
E1208 12:22:21.269227 4302 process.cpp:2455] Failed to shutdown socket with fd 26, address 10.20.200.300:14081: Transport endpoint is not connected
ZooKeeper logs (running on 10.20.200.300:14080):
2017-12-08 12:22:21,185 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:14080:ZooKeeperServer#942] - Client attempting to establish new session at /10.21.210.310:63039
2017-12-08 12:22:21,196 [myid:] - INFO [SyncThread:0:ZooKeeperServer#687] - Established session 0x160372c2b770010 with negotiated timeout 10000 for client /10.21.210.310:63039
Slave logs (running on 10.21.210.310:5051):
I1208 12:22:21.179652 4224 slave.cpp:1007] New master detected at master@10.20.200.300:14081
I1208 12:22:21.195278 4224 slave.cpp:1031] No credentials provided. Attempting to register without authentication
I1208 12:22:21.195278 4224 slave.cpp:1042] Detecting new master
I1208 12:22:21.210924 6156 slave.cpp:5135] Got exited event for master@10.20.200.300:14081
W1208 12:22:21.210924 6156 slave.cpp:5140] Master disconnected! Waiting for a new master to be elected
I1208 12:22:21.226510 2700 slave.cpp:5135] Got exited event for master@10.20.200.300:14081
W1208 12:22:21.226510 2700 slave.cpp:5140] Master disconnected! Waiting for a new master to be elected
Does anyone know what causes these errors?
I have tested connectivity in both directions (slave -> master and master -> slave), and both tests succeeded:
Test-NetConnection -ComputerName 10.20.200.300 -Port 14081
ComputerName : 10.20.200.300
RemoteAddress : 10.20.200.300
RemotePort : 14081
InterfaceAlias : Ethernet
SourceAddress : 10.21.210.310
TcpTestSucceeded : True
[root@mesos-master]# telnet 10.21.210.310 5051
Trying 10.21.210.310...
Connected to 10.21.210.310.
Escape character is '^]'.
I started the agent with the following parameters:
C:\Mesos\mesos\build\src>C:\Mesos\mesos\build\src\mesos-agent.exe ^
--master=zk://10.20.200.300:14080/mesos ^
--work_dir=C:\Mesos\Logs ^
--launcher_dir=C:\Mesos\mesos\build\src ^
--ip=10.21.210.310 ^
--advertise_ip=10.21.210.310 ^
--advertise_port=5051
Master /state output:
{
"version": "1.3.1",
"git_sha": "1beaede8c13f0832d4921121da34f924deec8950",
"git_tag": "1.3.1",
"build_date": "2017-09-05 18:02:12",
"build_time": 1504634532,
"build_user": "centos",
"start_time": 1513010072.51033,
"elected_time": 1513010072.67995,
"id": "90f5702f-f867-41ac-8087-5d20c87ea96f",
"pid": "master@10.20.200.300:14081",
"hostname": "MYhost.COM",
"activated_slaves": 0,
"deactivated_slaves": 0,
"unreachable_slaves": 0,
"leader": "master@10.20.200.300:14081",
"leader_info": {
"id": "90f5702f-f867-41ac-8087-5d20c87ea96f",
"pid": "master@10.20.200.300:14081",
"port": 14081,
"hostname": "MYhost.COM"
},
"log_dir": "/var/log/mesos",
"flags": {
"advertise_ip": "10.20.200.300",
"advertise_port": "14081",
"agent_ping_timeout": "15secs",
"agent_reregister_timeout": "10mins",
"allocation_interval": "1secs",
"allocator": "HierarchicalDRF",
"authenticate_agents": "false",
"authenticate_frameworks": "false",
"authenticate_http_frameworks": "false",
"authenticate_http_readonly": "false",
"authenticate_http_readwrite": "false",
"authenticators": "crammd5",
"authorizers": "local",
"framework_sorter": "drf",
"help": "false",
"hostname": "MYhost.COM",
"hostname_lookup": "true",
"http_authenticators": "basic",
"initialize_driver_logging": "true",
"log_auto_initialize": "true",
"log_dir": "/var/log/mesos",
"logbufsecs": "0",
"logging_level": "INFO",
"max_agent_ping_timeouts": "5",
"max_completed_frameworks": "50",
"max_completed_tasks_per_framework": "1000",
"max_unreachable_tasks_per_framework": "1000",
"port": "14081",
"quiet": "false",
"quorum": "1",
"recovery_agent_removal_limit": "100%",
"registry": "replicated_log",
"registry_fetch_timeout": "1mins",
"registry_gc_interval": "15mins",
"registry_max_agent_age": "2weeks",
"registry_max_agent_count": "102400",
"registry_store_timeout": "20secs",
"registry_strict": "false",
"root_submissions": "true",
"user_sorter": "drf",
"version": "false",
"webui_dir": "/usr/share/mesos/webui",
"work_dir": "/var/lib/mesos",
"zk": "zk://localhost:14080/mesos",
"zk_session_timeout": "10secs"
},
"slaves": [],
"recovered_slaves": [],
"frameworks": [],
"completed_frameworks": [],
"orphan_tasks": [],
"unregistered_frameworks": []
}
Do we need to test any other connectivity, or is this error caused by something else?
I would try this:
Set the hostname on the slave (you can use hostname=10.21.210.310).
Check the firewall on the Windows machine and allow incoming connections to port 5051.
