Consul deployment issues in a three node cluster setup

We are setting up a three-node Consul server cluster. Only one of the server hosts is accessible from the Consul UI; the other hosts are not.
The UI gives this error: "No cluster leader".
Here are the error logs from a non-working Consul server node:
2022-07-06T14:54:29.925Z [WARN] agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-17.99.211.49:8300 lapp116.dc1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <IP-1>:0-><Ip-2>:8300: operation was canceled". Reconnecting...
2022-07-06T14:54:41.537Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2022-07-06T14:55:05.054Z [ERROR] agent: Coordinate update error: error="No cluster leader"
2022-07-06T14:55:06.052Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2022-07-06T14:55:28.927Z [ERROR] agent: Coordinate update error: error="No cluster leader"
2022-07-06T14:55:31.513Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2022-07-06T14:56:02.307Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2022-07-06T14:56:03.070Z [ERROR] agent: Coordinate update error: error="No cluster leader"
2022-07-06T14:56:26.165Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2022-07-06T14:56:35.031Z [ERROR] agent: Coordinate update error: error="No cluster leader"
2022-07-06T14:56:55.459Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2022-07-06T14:57:07.616Z [ERROR] agent: Coordinate update error: error="No cluster leader"
2022-07-06T14:57:27.686Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2022-07-06T14:57:35.128Z [ERROR] agent: Coordinate update error: error="No cluster leader"
2022-07-06T14:57:58.915Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2022-07-06T14:58:03.708Z [ERROR] agent: Coordinate update error: error="No cluster leader"
2022-07-06T14:58:22.158Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2022-07-06T14:58:29.168Z [ERROR] agent: Coordinate update error: error="No cluster leader"
2022-07-06T14:58:50.469Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
Need help!
Thanks much.

Related

Expected hostname at index 7 for neo4j bolt (3.5.21)

We run neo4j (3.5.21) on an EC2 instance. Today, after I restarted the server, I noticed this error:
"Expected hostname at index 7: bolt://:7687". Starting Neo4j failed: Component 'org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter#75401424' was successfully initialized, but failed to start
Service start logs:
Active database: graph.db
Directories in use:
home: /var/lib/neo4j
config: /etc/neo4j
logs: /var/log/neo4j
plugins: /var/lib/neo4j/plugins
import: /var/lib/neo4j/import
data: /var/lib/neo4j/data
certificates: /var/lib/neo4j/certificates
run: /var/run/neo4j
Starting Neo4j.
WARNING: Max 1024 open files allowed, minimum of 40000 recommended. See the Neo4j manual.
Started neo4j (pid 22577). It is available at http://0.0.0.0:7474/
There may be a short delay until the server is ready.
See /var/log/neo4j/neo4j.log for current status.
This is what I see in neo4j.log:
2022-12-03 20:29:49.886+0000 INFO Bolt enabled on 0.0.0.0:7687.
2022-12-03 20:29:51.968+0000 INFO Started.
2022-12-03 20:29:52.121+0000 INFO Stopping...
2022-12-03 20:29:52.231+0000 INFO Stopped.
2022-12-03 20:29:52.233+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter#75401424' was successfully initialized, but failed to start. Please see the attached cause exception "Expected hostname at index 7: bolt://:7687". Starting Neo4j failed: Component 'org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter#75401424' was successfully initialized, but failed to start. Please see the attached cause exception "Expected hostname at index 7: bolt://:7687".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter#75401424' was successfully initialized, but failed to start. Please see the attached cause exception "Expected hostname at index 7: bolt://:7687".
at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:45)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:187)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:124)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:91)
at org.neo4j.server.CommunityEntryPoint.main(CommunityEntryPoint.java:32)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter#75401424' was successfully initialized, but failed to start. Please see the attached cause exception "Expected hostname at index 7: bolt://:7687".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:473)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:180)
... 3 more
Caused by: org.neo4j.graphdb.config.InvalidSettingException: Unable to construct bolt discoverable URI using '' as hostname: Expected hostname at index 7: bolt://:7687
at org.neo4j.server.rest.discovery.DiscoverableURIs$Builder.add(DiscoverableURIs.java:133)
at org.neo4j.server.rest.discovery.DiscoverableURIs$Builder.lambda$addBoltConnectorFromConfig$1(DiscoverableURIs.java:155)
at java.util.Optional.ifPresent(Optional.java:159)
at org.neo4j.server.rest.discovery.DiscoverableURIs$Builder.addBoltConnectorFromConfig(DiscoverableURIs.java:145)
at org.neo4j.server.rest.discovery.CommunityDiscoverableURIs.communityDiscoverableURIs(CommunityDiscoverableURIs.java:38)
at org.neo4j.server.CommunityNeoServer.lambda$createDBMSModule$0(CommunityNeoServer.java:99)
at org.neo4j.server.modules.DBMSModule.start(DBMSModule.java:59)
at org.neo4j.server.AbstractNeoServer.startModules(AbstractNeoServer.java:249)
at org.neo4j.server.AbstractNeoServer.access$700(AbstractNeoServer.java:102)
at org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter.start(AbstractNeoServer.java:541)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 5 more
Caused by: java.net.URISyntaxException: Expected hostname at index 7: bolt://:7687
at java.net.URI$Parser.fail(URI.java:2847)
at java.net.URI$Parser.failExpecting(URI.java:2853)
at java.net.URI$Parser.parseHostname(URI.java:3389)
at java.net.URI$Parser.parseServer(URI.java:3235)
at java.net.URI$Parser.parseAuthority(URI.java:3154)
at java.net.URI$Parser.parseHierarchical(URI.java:3096)
at java.net.URI$Parser.parse(URI.java:3052)
at java.net.URI.<init>(URI.java:673)
at org.neo4j.server.rest.discovery.DiscoverableURIs$Builder.add(DiscoverableURIs.java:128)
... 15 more
2022-12-03 20:29:52.243+0000 INFO Neo4j Server shutdown initiated by request
EC2: t3.large
OS: Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-1092-aws x86_64)
I have already tried restarting the server and restarting the service multiple times, without any success. We have not changed anything in the networking (VPC, subnet, security groups, network interface, etc.).
I am curious whether there is a config I am missing. Any help will be much appreciated.

Delete records from heroku production in phoenix app

Similar to my earlier question on deleting live records in a Rails app ("How to delete a record from production in Rails"), how do I do this in Phoenix?
I tried heroku run iex -S mix phx.server but I got the error below:
Simons-MBP:iotc Simon$ heroku run iex -S mix phx.server
Running iex -S mix phx.server on ⬢ icingonthecake... up, run.3732 (Free)
Erlang/OTP 19 [erts-8.3] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:10] [hipe] [kernel-poll:false]
14:29:43.723 [info] Running Iotc.Web.Endpoint with Cowboy using http://:::54253
14:29:43.734 [error] Postgrex.Protocol (#PID<0.276.0>) failed to connect: ** (Postgrex.Error) FATAL 53300 (too_many_connections): too many connections for role "onktihhpdhhwlp"
14:29:43.734 [error] Postgrex.Protocol (#PID<0.282.0>) failed to connect: ** (Postgrex.Error) FATAL 53300 (too_many_connections): too many connections for role "onktihhpdhhwlp"
14:29:43.734 [error] Postgrex.Protocol (#PID<0.287.0>) failed to connect: ** (Postgrex.Error) FATAL 53300 (too_many_connections): too many connections for role "onktihhpdhhwlp"
14:29:43.734 [error] Postgrex.Protocol (#PID<0.281.0>) failed to connect: ** (Postgrex.Error) FATAL 53300 (too_many_connections): too many connections for role "onktihhpdhhwlp"
14:29:43.734 [error] Postgrex.Protocol (#PID<0.284.0>) failed to connect: ** (Postgrex.Error) FATAL 53300 (too_many_connections): too many connections for role "onktihhpdhhwlp"
14:29:43.734 [error] Postgrex.Protocol (#PID<0.279.0>) failed to connect: ** (Postgrex.Error) FATAL 53300 (too_many_connections): too many connections for role "onktihhpdhhwlp"
14:29:43.734 [error] Postgrex.Protocol (#PID<0.286.0>) failed to connect: ** (Postgrex.Error) FATAL 53300 (too_many_connections): too many connections for role "onktihhpdhhwlp"

This is most likely a pool_size problem in the apps/<yourapp>/config/*.exs files -- if you're using Phoenix 1.3 that is (not sure where the configs are in 1.2 or below). I've had it set to 60 on my local machine and I wasn't even able to execute any mix task if I already had a console running. I dropped my pool_size to 10 and was fine.
Reference: https://github.com/elixir-ecto/postgrex/issues/210#issuecomment-239941678
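To make the pool_size reasoning concrete, here is a rough sketch of the arithmetic in Python (not Elixir, just for illustration). It assumes every running instance of the app (the web process, plus each console started with heroku run) opens its own pool of pool_size connections. The limit of 20 and the function names are hypothetical; check the actual connection limit of your database plan.

```python
def connections_opened(pool_size: int, instances: int) -> int:
    """Connections held open when each instance starts its own pool."""
    return pool_size * instances


def fits(pool_size: int, instances: int, connection_limit: int) -> bool:
    """True if all pools together stay within the database's connection limit."""
    return connections_opened(pool_size, instances) <= connection_limit


if __name__ == "__main__":
    # Assumed values for illustration only.
    limit = 20     # hypothetical per-role connection limit
    instances = 2  # e.g. the running web process plus one console session
    for pool_size in (60, 10):
        total = connections_opened(pool_size, instances)
        print(f"pool_size={pool_size}: {total} connections, fits={fits(pool_size, instances, limit)}")
```

Under these assumptions a pool_size of 60 cannot fit, while 10 just fits with one console open next to the web process.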

Postgrex.Protocol (#PID<0.226.0>) failed to connect:

I've just started learning Phoenix and I ran into the following error after typing mix phoenix.server:
[error] Postgrex.Protocol (#PID<0.226.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (localhost:5432): connection refused - :econnrefused
However, the website is running perfectly when I run localhost:400

Storm - Supervisors launched but not connecting to Nimbus

I have a Storm cluster with 1 Nimbus, 4 Supervisors and 2 Zookeeper nodes. My storm.yaml is as follows:
storm.zookeeper.servers:
- "storage14"
- "storage15"
nimbus.seeds: ["storage01"]
#storm.local.hostname: "storage05"
supervisor.supervisors:
- "storage02"
- "storage03"
- "storage04"
- "storage05"
storm.local.dir: "/tmp/storm"
worker.childopts: "-Xmx%HEAP-MEM%m -XX:+PrintGCDetails -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=artifacts/heapdump"
This storm.yaml file is used by both Nimbus and Supervisors. When Nimbus is started I have the storm.local.hostname commented out as is shown above.
However, when starting Supervisors on respective nodes, I uncomment the storm.local.hostname and set it to the hostname of the node on which the supervisor is being launched. For instance if I was launching the supervisor on storage05, the storm.yaml file would have the following additional config param:
storm.local.hostname: "storage05"
The problem is that even though Nimbus is launched successfully and I can see it on the Storm UI, some supervisors do not seem to be able to connect to Nimbus. For instance, of the 4 nodes I start supervisors on, Storm UI often shows only 2 of them connected. However, if I ssh in to these nodes and run jps, I can see that the supervisor process is running on ALL of these nodes.
The Supervisors that do end up connecting are not always on the same nodes, so it is definitely not a problem with specific nodes.
Another thing to notice is that if I try to execute a topology on whichever nodes got connected, it does not get registered by the cluster and I cannot see that topology on the UI either.
What do you think might be causing this erratic behavior?
UPDATE:
Tail end of nimbus.log has the following lines
2017-01-25 00:04:25.216 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.317 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage15/192.168.140.195:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.317 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.686 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage15/192.168.140.195:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.686 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.787 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage14/192.168.140.194:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.787 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

Your UPDATE (the Nimbus log) indicates that Nimbus cannot connect to the Zookeeper cluster. Please check that the Zookeeper nodes (storage14/storage15) are accessible from storage01: not only that the hosts are reachable, but also that the Zookeeper port accepts connections, e.g. "telnet storage14 2181" (and the same for storage15).
Once the ZK connectivity issue is gone, please try starting the supervisors again.
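If telnet is not installed on storage01, the same port check can be scripted. This is a minimal sketch in Python using only the standard library; can_connect is a helper name made up for this example, and the hostnames and port 2181 are taken from the storm.yaml and nimbus.log in the question.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Run this on the Nimbus node (storage01 in the question).
    for zk_host in ("storage14", "storage15"):
        status = "reachable" if can_connect(zk_host, 2181) else "NOT reachable"
        print(f"{zk_host}:2181 is {status}")
```

A "NOT reachable" result only tells you the TCP connection failed; it does not distinguish between Zookeeper not running, a firewall, or a wrong hostname.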

playframework 2.0 - exceeded max_user_connections on database evolutions (with local postgres server)

I am having the exact same issue as (playframework 2.0 - exceeded max_user_connections on database evolutions?), only this time it is with a local postgres install. I have a sample application I put up on GitHub: http://git.io/CdEntA.
I attempt to run it locally, using
sbt stage
target/start -DapplyEvolutions.default=true -Ddb.default.driver=org.postgresql.Driver -Ddb.default.url="jdbc:postgresql://localhost:5432/test?user=myuser"
When I launch http://localhost:9000, all I see on the console is...
[error] c.j.b.h.AbstractConnectionHook - Failed to acquire connection Sleeping for 1000ms and trying again. Attempts left: 10. Exception: null
[error] c.j.b.h.AbstractConnectionHook - Failed to acquire connection Sleeping for 1000ms and trying again. Attempts left: 10. Exception: null
[info] play - Application started (Prod)
[info] play - Listening for HTTP on port 9000...
[error] c.j.b.h.AbstractConnectionHook - Failed to acquire connection Sleeping for 1000ms and trying again. Attempts left: 9. Exception: null
[error] c.j.b.h.AbstractConnectionHook - Failed to acquire connection Sleeping for 1000ms and trying again. Attempts left: 8. Exception: null
[error] c.j.b.h.AbstractConnectionHook - Failed to acquire connection Sleeping for 1000ms and trying again. Attempts left: 7. Exception: null
[error] c.j.b.h.AbstractConnectionHook - Failed to acquire connection Sleeping for 1000ms and trying again. Attempts left: 6. Exception: null
[error] c.j.b.h.AbstractConnectionHook - Failed to acquire connection Sleeping for 1000ms and trying again. Attempts left: 5. Exception: null
[error] c.j.b.h.AbstractConnectionHook - Failed to acquire connection Sleeping for 1000ms and trying again. Attempts left: 4. Exception: null
[error] c.j.b.h.AbstractConnectionHook - Failed to acquire connection Sleeping for 1000ms and trying again. Attempts left: 3. Exception: null
[error] c.j.b.h.AbstractConnectionHook - Failed to acquire connection Sleeping for 1000ms and trying again. Attempts left: 2. Exception: null
[error] c.j.b.h.AbstractConnectionHook - Failed to acquire connection Sleeping for 1000ms and trying again. Attempts left: 1. Exception: null
[error] c.j.b.PoolWatchThread - Error in trying to obtain a connection. Retrying in 1000ms
org.postgresql.util.PSQLException: FATAL: sorry, too many clients already
at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:293) ~[postgresql-9.1-901-1.jdbc4.jar:na]
When I take a peek into the database, I see all connections are in fact used up by the process.
Any help will be greatly appreciated.
Thanks.

I believe your problem is that you are not overriding the db.default.user config parameter so it's using the sa value. Just comment out the following line in your conf/application.conf file:
db.default.user=sa
Once I did that and re-ran sbt stage then it worked fine for me.

Try reducing the number of connections used by your Play app.
Here is a configuration that uses only 5 connections:
db.default.partitionCount=1
db.default.maxConnectionsPerPartition=5
db.default.minConnectionsPerPartition=5
Basically, the maximum number of connections will be partitionCount x maxConnectionsPerPartition.
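As a quick sanity check of that multiplication, here is a small Python sketch that reads the three settings above and computes the lower and upper bound on open connections. The parsing helper and function names are made up for this example; only the three property keys come from the configuration shown.

```python
CONFIG = """\
db.default.partitionCount=1
db.default.maxConnectionsPerPartition=5
db.default.minConnectionsPerPartition=5
"""


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def pool_bounds(settings: dict, db: str = "default") -> tuple:
    """Return (min, max) connections: partitionCount x per-partition limits."""
    prefix = f"db.{db}."
    partitions = int(settings[prefix + "partitionCount"])
    low = partitions * int(settings[prefix + "minConnectionsPerPartition"])
    high = partitions * int(settings[prefix + "maxConnectionsPerPartition"])
    return low, high


if __name__ == "__main__":
    print(pool_bounds(parse_properties(CONFIG)))  # (5, 5)
```

Compare the upper bound against the max_connections setting of your Postgres server, keeping in mind that other clients share the same limit.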
