I have installed and configured Mesos and Marathon. Whenever I try to schedule an application, it remains in the 'Waiting' state, which seems to indicate that Marathon is waiting for offers from Mesos.
When I check the logs in Mesos, I see the following:
I0425 20:22:10.313910 4279 master.cpp:2231] Received SUBSCRIBE call for framework 'chronos-2.4.0' at scheduler-07d9654e-5c40-4172-a25d-97c565b5765d#127.0.1.1:50892
I0425 20:22:10.313987 4279 master.cpp:2302] Subscribing framework chronos-2.4.0 with checkpointing enabled and capabilities [ ]
I0425 20:22:10.313994 4279 master.cpp:2312] Framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0001 (chronos-2.4.0) at scheduler-07d9654e-5c40-4172-a25d-97c565b5765d#127.0.1.1:50892 already subscribed, resending acknowledgement
W0425 20:22:10.314007 4279 master.hpp:1764] Master attempted to send message to disconnected framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0001 (chronos-2.4.0) at scheduler-07d9654e-5c40-4172-a25d-97c565b5765d#127.0.1.1:50892
E0425 20:22:10.314193 4287 process.cpp:1958] Failed to shutdown socket with fd 39: Transport endpoint is not connected
I0425 20:22:11.226884 4284 master.cpp:2231] Received SUBSCRIBE call for framework 'marathon' at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57#127.0.1.1:35928
I0425 20:22:11.226959 4284 master.cpp:2302] Subscribing framework marathon with checkpointing enabled and capabilities [ ]
I0425 20:22:11.226969 4284 master.cpp:2312] Framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57#127.0.1.1:35928 already subscribed, resending acknowledgement
W0425 20:22:11.226982 4284 master.hpp:1764] Master attempted to send message to disconnected framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57#127.0.1.1:35928
E0425 20:22:11.227226 4287 process.cpp:1958] Failed to shutdown socket with fd 39: Transport endpoint is not connected
I0425 20:22:12.113598 4281 http.cpp:312] HTTP GET for /master/state from 192.0.2.1:49698 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'
I0425 20:22:12.314221 4286 master.cpp:2231] Received SUBSCRIBE call for framework 'chronos-2.4.0' at scheduler-07d9654e-5c40-4172-a25d-97c565b5765d#127.0.1.1:50892
I0425 20:22:12.314304 4286 master.cpp:2302] Subscribing framework chronos-2.4.0 with checkpointing enabled and capabilities [ ]
I0425 20:22:12.314312 4286 master.cpp:2312] Framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0001 (chronos-2.4.0) at scheduler-07d9654e-5c40-4172-a25d-97c565b5765d#127.0.1.1:50892 already subscribed, resending acknowledgement
W0425 20:22:12.314337 4286 master.hpp:1764] Master attempted to send message to disconnected framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0001 (chronos-2.4.0) at scheduler-07d9654e-5c40-4172-a25d-97c565b5765d#127.0.1.1:50892
E0425 20:22:12.314524 4287 process.cpp:1958] Failed to shutdown socket with fd 39: Transport endpoint is not connected
I0425 20:22:13.081887 4284 master.cpp:2231] Received SUBSCRIBE call for framework 'marathon' at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57#127.0.1.1:35928
I0425 20:22:13.081964 4284 master.cpp:2302] Subscribing framework marathon with checkpointing enabled and capabilities [ ]
I0425 20:22:13.081987 4284 master.cpp:2312] Framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57#127.0.1.1:35928 already subscribed, resending acknowledgement
W0425 20:22:13.082005 4284 master.hpp:1764] Master attempted to send message to disconnected framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57#127.0.1.1:35928
E0425 20:22:13.082314 4287 process.cpp:1958] Failed to shutdown socket with fd 39: Transport endpoint is not connected
I0425 20:22:13.221590 4282 master.cpp:2231] Received SUBSCRIBE call for framework 'marathon' at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57#127.0.1.1:35928
I0425 20:22:13.221664 4282 master.cpp:2302] Subscribing framework marathon with checkpointing enabled and capabilities [ ]
I0425 20:22:13.221674 4282 master.cpp:2312] Framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57#127.0.1.1:35928 already subscribed, resending acknowledgement
W0425 20:22:13.221688 4282 master.hpp:1764] Master attempted to send message to disconnected framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57#127.0.1.1:35928
E0425 20:22:13.222162 4287 process.cpp:1958] Failed to shutdown socket with fd 39: Transport endpoint is not connected
I0425 20:22:14.412215 4286 master.cpp:2231] Received SUBSCRIBE call for framework 'marathon' at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57#127.0.1.1:35928
I0425 20:22:14.412281 4286 master.cpp:2302] Subscribing framework marathon with checkpointing enabled and capabilities [ ]
I0425 20:22:14.412289 4286 master.cpp:2312] Framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57#127.0.1.1:35928 already subscribed, resending acknowledgement
W0425 20:22:14.412302 4286 master.hpp:1764] Master attempted to send message to disconnected framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57#127.0.1.1:35928
E0425 20:22:14.412495 4287 process.cpp:1958] Failed to shutdown socket with fd 39: Transport endpoint is not connected
Any idea why it mentions a 'disconnected' framework? In Mesos, I can see the 3 slaves, and the Marathon and Chronos frameworks are listed under 'active frameworks'.
The /etc/hosts file contains the following entries:
192.0.2.11 master1 # VAGRANT: cd38e81ab8742b23dfbcb913468368ea (master1) / 1b611425-dbad-4bd0-8727-4169c09ec045
192.0.2.51 slave1 # VAGRANT: 94630539b67d178dddffda29a0313a75 (slave1) / 1a1694de-2bd2-4d96-bdf2-dd6767d1f310
192.0.2.52 slave2 # VAGRANT: 306e67b33b327b3d1c9990bf1316a321 (slave2) / bdbd677e-5298-4d49-90a8-e521139dd127
192.0.2.12 master2 # VAGRANT: fb338e9e9c001a5bfab605387ba88d02 (master2) / bdccfd80-b1e6-48a0-8986-b24c7cbd7a25
192.0.2.53 slave3 # VAGRANT: 3913b3358eadc90c622859ddb90bfede (slave3) / 786cbe69-2af5-43b7-8e70-d6cc07d4ddf4
192.0.2.13 master3 # VAGRANT: 92cdd6e36a6c0391e2a66f73661e56fe (master3) / 03bb2c16-f474-4412-b8f4-fce82e12955c
Note: in case more info is needed on how the cluster was installed, please refer to this
You can also set LIBPROCESS_IP as an environment variable. I think this is better than changing /etc/hosts.
Found the solution here: https://groups.google.com/forum/#!topic/marathon-framework/1qboeZTOLU4
I guess you need to make sure that the hostnames are resolvable to actual IP addresses.
At least, that is what fixed my problem when Marathon and the other frameworks tried to bind to 127.0.1.1 on Ubuntu. On each host, add the IP-to-hostname mappings to /etc/hosts, for example:
192.0.2.11 master1
Either place these entries before the line that maps 127.0.1.1 to the hostname, or remove the 127.0.1.1 entry entirely. The Vagrant plugin vagrant-hostsupdater might help.
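To check whether a host is affected, a quick test is to resolve its own hostname and see whether the result is a loopback address such as 127.0.1.1. The sketch below uses only the Python standard library; the function names are made up for illustration, and it assumes that the address the framework advertises follows the hostname lookup, as the logs above suggest.

```python
import ipaddress
import socket


def is_loopback(address):
    """Return True if the IP address string is a loopback address (127.0.0.0/8 or ::1)."""
    return ipaddress.ip_address(address).is_loopback


def check_hostname(hostname=None):
    """Resolve a hostname (default: this machine's) and flag loopback results."""
    hostname = hostname or socket.gethostname()
    address = socket.gethostbyname(hostname)
    return hostname, address, is_loopback(address)


if __name__ == "__main__":
    try:
        name, addr, loopback = check_hostname()
        note = " (loopback: other hosts cannot reach this address)" if loopback else ""
        print(f"{name} resolves to {addr}{note}")
    except OSError as exc:
        # Raised when the hostname cannot be resolved at all.
        print(f"could not resolve hostname: {exc}")
```

If the hostname resolves to a loopback address on a master or slave, fixing /etc/hosts or setting LIBPROCESS_IP as described above is the next thing to try.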
Mosquitto config:
per_listener_settings true
listener 1883
protocol mqtt
listener 9001
protocol websockets
require_certificate false
log_type all
allow_anonymous true
In Node-Red, I have an MQTT Publish node set to publish to localhost:1883.
When I run mosquitto without the listener 9001 and protocol websockets lines, node-red successfully connects and publishes to a topic. But I need websockets for a react application. When I run it with websockets on port 9001, I get the following error in mosquitto:
New connection from 127.0.0.1:61482 on port 1883.
Sending CONNACK to nodered_7b952a504a975460 (0, 5)
Client nodered_7b952a504a975460 disconnected, not authorised.
I've even tried using the websocket as the url for the node-red publish node like this:
ws://localhost:9001 and ws://localhost:9001/mqtt. Neither works.
What do I have to do to be able to publish from node-red to my react app via a local mosquitto broker?
I had to remove per_listener_settings true from the config file. I didn't realize I had it in there at first. Unfortunately, I do not know why this is the case.
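One plausible explanation, based on my reading of the mosquitto.conf documentation and not confirmed in this thread: with per_listener_settings true, security options such as allow_anonymous apply only to the listener they follow. In the config above, allow_anonymous true comes after listener 9001, so listener 1883 would fall back to its default. The sketch below only illustrates that grouping; it is a toy parser with a hypothetical function name, not how Mosquitto itself reads the file.

```python
# Subset of the options that are documented as per-listener when
# per_listener_settings is true (illustrative, not exhaustive).
PER_LISTENER_OPTIONS = {"allow_anonymous", "password_file", "acl_file"}


def options_by_listener(conf_text):
    """Group per-listener security options under the listener line they follow."""
    result = {}
    current = None  # port of the listener currently being configured
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if key == "listener":
            current = value.split()[0]
            result.setdefault(current, {})
        elif key in PER_LISTENER_OPTIONS and current is not None:
            result[current][key] = value
    return result


CONF = """\
per_listener_settings true
listener 1883
protocol mqtt
listener 9001
protocol websockets
require_certificate false
log_type all
allow_anonymous true
"""

print(options_by_listener(CONF))
```

Under that assumption, repeating allow_anonymous true directly below listener 1883 might also work while keeping per_listener_settings enabled, but I have not tested it against this setup.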
I use Jasmin SMS Gateway and have to connect to the server smpp-1-ire.smscarrier.com on port 8011. I followed your instructions but cannot connect, although all my other configurations and tests work:
Establishing TCP connection to smpp-1-ire.smscarrier.com:8011
2018-04-10 15:24:06 INFO 11188 Connecting to IPv4Address(TCP, 'smpp-1-ire.smscarrier.com', 8011) ...
2018-04-10 15:24:07 WARNING 11188 SMPP connection established from 52.31.169.62 to port 51982
2018-04-10 15:24:07 INFO 11188 Connection made to smpp-1-ire.smscarrier.com:8011
2018-04-10 15:24:07 WARNING 11188 Requesting bind as transceiver
2018-04-10 15:24:07 ERROR 11188 Bind failed [[Failure instance: Traceback (failure with no frames): <class 'jasmin.vendor.smpp.pdu.error.SMPPTransactionError'>: ESME_RBINDFA$
]]. Disconnecting...
I tried changing the listening port from 2275 to 8011 in jasmin.cfg, but it made no difference.
But this works:
System ID: test
Password: test
host: smsc-sim.smscarrier.com
Port: 2775
Log:
2018-04-10 17:35:21 INFO 14022 Establishing TCP connection to smsc-sim.smscarrier.com:2775
2018-04-10 17:35:21 INFO 14022 Connecting to IPv4Address(TCP, 'smsc-sim.smscarrier.com', 2775) ...
2018-04-10 17:35:21 WARNING 14022 SMPP connection established from 35.177.141.136 to port 48570
2018-04-10 17:35:21 INFO 14022 Connection made to smsc-sim.smscarrier.com:2775
2018-04-10 17:35:21 WARNING 14022 Requesting bind as transceiver
2018-04-10 17:35:21 WARNING 14022 Bind succeeded...now in state BOUND_TRX
I am still convinced that Jasmin can be configured to work with my service provider, but I do not know all of the Jasmin settings. It worked with NowSMS, which I used for testing, and I took a screenshot of that configuration. Can someone help me find the two parameters that seem to be missing: the system type (systype), for which smpp is accepted, and possibly the SMSC character set, whose value should be IA5 (GSM)?
Perhaps Jasmin SMS Gateway is not compatible with this provider, but that would be surprising given how widely Jasmin is used.
Anyway, thank you for your help.
I suggest improving the logging at the point where you get the error response to your SMPP bind request: print the request parameters and the response error code. Also have a look at the SMPP specification to see what the error code means. Potential reasons:
At least one of the following parameter values is wrong: system_id, system_type or password.
The origin host IP of your bind request is not whitelisted on the SMPP server side.
The SMPP server might be refusing a second client bind that uses the same system_id.
Use of the Address Range parameter might be enforced by the SMPP server.
There may be a character length limit on system_id or password on your API side.
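Two of the checks above can be done before talking to the provider at all. The sketch below assumes the field limits and command_status values from the SMPP v3.4 specification; the function names are hypothetical and nothing here is part of Jasmin. The truncated error in the question looks like ESME_RBINDFAIL, but that is a guess from the log.

```python
# Maximum lengths in characters for bind PDU fields under SMPP v3.4.
# The spec gives octet limits that include the NULL terminator (16, 9, 13).
MAX_CHARS = {"system_id": 15, "password": 8, "system_type": 12}

# A few SMPP v3.4 command_status values that matter for bind failures.
BIND_ERRORS = {
    0x00000005: "ESME_RALYBND (already in bound state)",
    0x0000000D: "ESME_RBINDFAIL (bind failed)",
    0x0000000E: "ESME_RINVPASWD (invalid password)",
    0x0000000F: "ESME_RINVSYSID (invalid system ID)",
    0x00000053: "ESME_RINVSYSTYP (invalid system_type field)",
}


def check_bind_params(system_id, password, system_type=""):
    """Return a list of fields that exceed the SMPP v3.4 length limits."""
    values = {
        "system_id": system_id,
        "password": password,
        "system_type": system_type,
    }
    problems = []
    for name, value in values.items():
        if len(value) > MAX_CHARS[name]:
            problems.append(
                f"{name} has {len(value)} characters, limit is {MAX_CHARS[name]}"
            )
    return problems


def describe_status(code):
    """Translate a command_status value into a readable name."""
    return BIND_ERRORS.get(code, f"unknown status 0x{code:08X}")


print(check_bind_params("test", "test"))
print(describe_status(0x0000000D))
```

A password longer than 8 characters is worth checking first, since some clients truncate it silently and the server then rejects the bind.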
I'm running into a bug where RabbitMQ sometimes complains about "PRECONDITION_FAILED - fast reply consumer does not exist", although, as you can see below, the message I send does not use fast reply: its reply-to is null. About 50% of the time the message reaches the exchange/queue as expected; the other 50% of the time I get this error, which destroys the message. I am running this code on Spring Boot 1.3.6 with Spring AMQP 1.6.0. The RabbitMQ server is 3.5.5 with Erlang 18.1. I am unable to update the versions because this is production code.
My code is very simple. I declare a response exchange/queue for further communication.
amqpAdmin.declareExchange(exchange);
amqpAdmin.declareQueue(queue);
amqpAdmin.declareBinding(binding);
I send my amqp message to the exchange/routing key of a topic exchange, but it never makes it there due to the following error:
Publish Message Success: [MyObject], MessageProperties [headers={__TypeId__=com.do.comp.amqp}, timestamp=null, messageId=null, userId=null, appId=null, clusterId=null, type=null, correlationId=null, replyTo=null, contentType=application/json, contentEncoding=UTF-8, contentLength=97, deliveryMode=PERSISTENT, expiration=null, priority=0, redelivered=null, receivedExchange=null, receivedRoutingKey=null, deliveryTag=0, messageCount=null]]
AMQP Connection 10.12.36.75:5672 [ERROR] org.springframework.amqp.rabbit.connection.CachingConnectionFactory.log(CachingConnectionFactory.java:1198) - Channel shutdown: channel error; protocol method: #method<channel.close>(reply-code=406, reply-text=PRECONDITION_FAILED - fast reply consumer does not exist, class-id=60, method-id=40)
http-nio-8122-exec-7 [DEBUG] org.springframework.amqp.rabbit.connection.CachingConnectionFactory.getCachedChannelProxy(CachingConnectionFactory.java:476) - Creating cached Rabbit Channel from AMQChannel(amqp://admin#10.12.36.75:5672/,3)
http-nio-8122-exec-7 [DEBUG] org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:1392) - Executing callback on RabbitMQ Channel: Cached Rabbit Channel: AMQChannel(amqp://admin#10.12.36.75:5672/,3), conn: Proxy#782534f9 Shared Rabbit Connection: SimpleConnection#74658797 [delegate=amqp://admin#10.12.36.75:5672/, localPort= 60282]
Then I listen on the queue I created for a response that will never come:
rabbitTemplate.receive(queue);
The error above has to do with direct reply-to queues, which I am not using; my reply-to message header is null. Another odd thing is that we are running this exact jar on three different servers for testing and development, and only one of them seems to have the issue, even though they all run the same versions of everything (RabbitMQ 3.5.5, Erlang 18.1).
Why is RabbitMQ throwing a fast reply error when reply-to is null?
I'm able to perform the SSL and WebSocket handshakes. The HTTP connection is upgraded to a WebSocket connection, which is fine, and the Erlang WebSocket client connects to the IBM Bluemix server.
But after some time I receive an ssl_closed message, which closes the connection. I was sending ping requests to the server and getting responses in binary format ({binary,<<10,0>>}), which might be pongs; I haven't decoded the binary response frame.
SockReply : {ok,{sslsocket,{gen_tcp,#Port<0.2284>,tls_connection,undefined}, <0.52.0>}}
Socket : {sslsocket,{gen_tcp,#Port<0.2284>,tls_connection,undefined}, <0.52.0>} [debug] [d:6xxxxx:myFybr123:streetlight_123#172.16.1.237:57054]
SENT: CONNECT(Q0, R0, D0, ClientId=d:6xxxxx:myXXXX123:streetlight_123, ProtoName=MQTT, ProtoVsn=3, CleanSess=true, KeepAlive=300, Username=use-token-auth, Password=**)
[info] [Client <0.36.0>] connected with wss://6xxxxx.messaging.internetofthings.ibmcloud.com:443
[warning] [Client <0.36.0>] Connection lost for: ssl_closed when state is waiting_for_connack
Message : {ssl_closed, {sslsocket, {gen_tcp,#Port<0.3922>,tls_connection,undefined}, <0.74.0>}}
Why am I receiving ssl_closed after getting connected?
I deployed a long-running Storm topology. After several hours, the whole topology went down. I checked the worker logs and found the entries below. They show that the ZooKeeper client session timed out, which triggered a reconnection attempt. I suspect this is related to the topology failure, so I am now trying to find out what can cause the client to time out.
2016-02-29T10:34:12.386+0800 o.a.s.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 23789ms for sessionid 0x252f862028c0083, closing socket connection and attempting reconnect
2016-02-29T10:34:12.986+0800 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED
2016-02-29T10:34:13.059+0800 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.
2016-02-29T10:34:13.197+0800 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server zk-3.cloud.mos/172.16.13.147:2181. Will not attempt to authenticate using SASL (unknown error)
2016-02-29T10:34:13.241+0800 o.a.s.z.ClientCnxn [WARN] Session 0x252f862028c0083 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.8.0_31]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716) ~[na:1.8.0_31]
at org.apache.storm.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[storm-core-0.9.6.jar:0.9.6]
at org.apache.storm.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) ~[storm-core-0.9.6.jar:0.9.6]
Your client can no longer talk to the ZooKeeper server. The first thing that happened was there was no answer to the heartbeats within the negotiated session timeout:
2016-02-29T10:34:12.386+0800 o.a.s.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 23789ms for sessionid 0x252f862028c0083, closing socket connection and attempting reconnect
Then when it tried to reconnect, it got a connection refused:
2016-02-29T10:34:13.241+0800 o.a.s.z.ClientCnxn [WARN] Session 0x252f862028c0083 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
This means either your ZooKeeper server:
Is not reachable (network connection down)
Is dead (so nothing is listening on the socket)
Is GCing itself to death and cannot communicate (although that might have issued a connection timeout error, I'm not sure)
To tell more you will need to check the ZooKeeper server logs on your (Hadoop?) cluster.
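The first two cases above can be told apart from the client machine with a plain TCP connection attempt. This is a minimal sketch using only the standard library; the function name is hypothetical and port 2181 is taken from the log in the question.

```python
import socket


def zk_port_status(host, port=2181, timeout=3.0):
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered but nothing is listening on the port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout: host down, firewall or network problem.
        return "timeout"
    except OSError:
        # Anything else, for example a hostname that does not resolve.
        return "error"
```

For example, zk_port_status("zk-3.cloud.mos") returning "refused" would point to a dead ZooKeeper process, while "timeout" would point to the network. A result of "open" only shows that the port accepts connections, so a server stalled in garbage collection could still look healthy here.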
It worked for me after increasing the connection timeout in server.properties:
zookeeper.connection.timeout.ms=60000
One way this can happen is if you start ZooKeeper, then break out of it in the terminal, and then try to start Kafka.
To use Kafka, you really should use 3 terminal windows (or 3 PuTTY sessions if you are SSHing into your instance from Windows):
First Session for Zookeeper server.
Second Session for Kafka server.
Third Session for running Kafka commands to do things like create topics.
I started Kafka in cluster mode with 3 ZooKeeper servers and 3 Kafka servers. All ZooKeeper servers started successfully, but each Kafka server shut down during startup with "fatal error during Kafka server startup. prepare to shutdown (kafka.server.kafkaserver)". While investigating, I found that the Kafka server disconnected every time after 18 seconds (the zookeeper.connection.timeout.ms default value of 18000), so I increased that value and the issue was resolved.
Always use 2181 as the port number for the ZooKeeper connection unless you have configured ZooKeeper to use a different port.