Where can I find Druid broker/coordinator configuration details? - hortonworks-data-platform

I'm working on the Hortonworks platform with Druid and Superset.
An architect has configured the connection to the Druid cluster on Superset.
That works, but I would like to understand where this information comes from.
To declare a Druid cluster in Superset, I have to set several parameters. The configuration that works looks like this:
- Coordinator Host : <URL of the Druid cluster>
- Coordinator Port : 8081
- Coordinator Endpoint : druid/coordinator/v1
- Broker Host : <URL of the Druid cluster>
- Broker Port : 8888
- Broker Endpoint : druid/v2
Does anyone know where I can find all this information?
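One way to sanity-check those values, keeping the host placeholder from the list above, is to query the coordinator and broker REST endpoints directly (these are standard Druid APIs; if the cluster is managed through Ambari, the ports should also be visible in the Druid service's configuration there):
# coordinator: should return the current leader
curl http://<URL of the Druid cluster>:8081/druid/coordinator/v1/leader
# broker: should return the list of queryable datasources
curl http://<URL of the Druid cluster>:8888/druid/v2/datasources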

Related

org.apache.storm.utils.NimbusLeaderNotFoundException: Could not find leader nimbus from seed hosts

I am getting the below message.
Could not find leader nimbus from seed hosts [master]. Did you specify
a valid list of nimbus hosts for config nimbus.seeds?
I have already tried deleting the storm node under ZooKeeper and using either the hostname or the IP in nimbus.seeds. My storm.yaml is:
storm.zookeeper.servers :
- "master"
- "salve1"
- "salve2"
storm.zookeeper.port : 2181
nimbus.seeds : ["master"]
nimbus.thrift.port : 6690
storm.local.dir : "/root/storm"
supervisor.slots.ports :
- 6700
- 6701
- 6702
- 6703
Why can this happen?
Make sure ZooKeeper is running and accessible from the machine you're running the command on. You can check this with curl. On the machine you're starting your storm command from, try running curl master:2181. You should get an empty reply.
e.g.
$ curl localhost:2181
curl: (52) Empty reply from server
Do the same for the two other hosts you run Zookeeper on.
Then make sure curl master:6690 also returns an empty reply, since that is the Thrift port you've configured.
If you are getting connection refused on either command, you need to fix your network setup, so the machines can talk to each other.
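A quick sketch for checking all three ZooKeeper hosts and the Nimbus Thrift port in one pass (hostnames and ports taken from the storm.yaml above; note that recent ZooKeeper versions only answer ruok if it is whitelisted via 4lw.commands.whitelist):
for host in master salve1 salve2; do
  # a healthy ZooKeeper answers the four-letter command "ruok" with "imok"
  echo ruok | nc -w 2 "$host" 2181
done
# Nimbus Thrift port on the seed host; "connection refused" here points at a
# network or nimbus problem rather than ZooKeeper
nc -vz -w 2 master 6690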

Kafka: client has run out of available brokers to talk to

I'm trying to wrap up changes to our Kafka setup, but I'm in over my head and having a hard time debugging the issue.
I have multiple servers funneling their Ruby on Rails logs to one Kafka broker using Filebeat; from there the logs go to our Logstash server and are then stashed in Elasticsearch. I didn't set up the original system, but I tried taking us down from three Kafka servers to one, as they weren't needed. I updated the IP address configs in these files in our setup to remove the two old Kafka servers and restarted the appropriate services.
# main (filebeat)
sudo vi /etc/filebeat/filebeat.yml
sudo service filebeat restart
# kafka
sudo vi /etc/hosts
sudo vi /etc/kafka/config/server.properties
sudo vi /etc/zookeeper/conf/zoo.cfg
sudo vi /etc/filebeat/filebeat.yml
sudo service kafka-server restart
sudo service zookeeper-server restart
sudo service filebeat restart
# elasticsearch
sudo service elasticsearch restart
# logstash
sudo vi /etc/logstash/conf.d/00-input-kafka.conf
sudo service logstash restart
sudo service kibana restart
When I tail the Filebeat logs I see this:
2018-04-23T15:20:05Z WARN kafka message: client/metadata got error from broker while fetching metadata:%!(EXTRA *net.OpError=dial tcp 172.16.137.132:9092: getsockopt: connection refused)
2018-04-23T15:20:05Z WARN kafka message: client/metadata no available broker to send metadata request to
2018-04-23T15:20:05Z WARN client/brokers resurrecting 1 dead seed brokers
2018-04-23T15:20:05Z WARN kafka message: Closing Client
2018-04-23T15:20:05Z ERR Kafka connect fails with: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
"...to 1 Kafka broker... I tried taking us down from 3 Kafka servers to 1 as they weren't needed. I updated the IP address configs in these files in our setup to remove the 2 old Kafka servers and restarted the appropriate services."
I think you are misunderstanding something: Kafka is only a highly available system if you have more than one broker, so the other two brokers were needed, even if the Logstash config only listed a single broker.
Your errors state that the single broker refused the connection, and therefore no logs will be sent to it.
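A minimal sketch for narrowing that down, using the broker address from the log and the config path from the question:
# on the Kafka host: is the broker actually listening on 9092?
ss -tlpn | grep 9092
# from a Filebeat host: is the address configured in filebeat.yml reachable?
nc -vz 172.16.137.132 9092
# the listener the broker advertises must be an address the Filebeat hosts can reach
grep -E '^(listeners|advertised\.listeners)' /etc/kafka/config/server.properties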
At a minimum, I would recommend four brokers and a replication factor of 3 on all your critical topics for a useful Kafka cluster. That way you can tolerate broker outages as well as distribute the load across your Kafka brokers.
It would also be beneficial to make the topic count a factor of your total number of logging servers, and to key each Kafka message on the application type, for example, as sketched below. That way you are guaranteed log order for those applications.
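For the keying idea, a hedged sketch of what the Kafka output section of filebeat.yml could look like; the topic name and the fields.app field are made-up placeholders, not values from the setup above:
output.kafka:
  hosts: ["172.16.137.132:9092"]
  topic: "rails-logs"          # placeholder topic name
  # keying by application keeps per-application ordering within a partition
  key: '%{[fields.app]}'       # assumes a custom "app" field set under fields: in the Filebeat inputs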

Unable to gossip with any seeds but continuing since node is in its own seed list

To remove a node from a 2-node cluster in AWS, I ran
nodetool removenode <Host ID>
After this, I expected to get my cluster back once I had set up cassandra.yaml and cassandra-rackdc.properties correctly.
I did that, but I am still not able to get the cluster back:
nodetool status displays only one node.
The significant part of Cassandra's system.log is:
INFO [main] 2017-08-14 13:03:46,409 StorageService.java:553 - Cassandra version: 3.9
INFO [main] 2017-08-14 13:03:46,409 StorageService.java:554 - Thrift API version: 20.1.0
INFO [main] 2017-08-14 13:03:46,409 StorageService.java:555 - CQL supported versions: 3.4.2 (default: 3.4.2)
INFO [main] 2017-08-14 13:03:46,445 IndexSummaryManager.java:85 - Initializing index summary manager with a memory pool size of 198 MB and a resize interval of 60 minutes
INFO [main] 2017-08-14 13:03:46,459 MessagingService.java:570 - Starting Messaging Service on /172.15.81.249:7000 (eth0)
INFO [ScheduledTasks:1] 2017-08-14 13:03:48,424 TokenMetadata.java:448 - Updating topology for all endpoints that have changed
WARN [main] 2017-08-14 13:04:17,497 Gossiper.java:1388 - Unable to gossip with any seeds but continuing since node is in its own seed list
INFO [main] 2017-08-14 13:04:17,499 StorageService.java:687 - Loading persisted ring state
INFO [main] 2017-08-14 13:04:17,500 StorageService.java:796 - Starting up server gossip
Contents of files:
cassandra.yaml: https://pastebin.com/A3BVUUUr
cassandra-rackdc.properties: https://pastebin.com/xmmvwksZ
system.log: https://pastebin.com/2KA60Sve
netstat -atun: https://pastebin.com/Dsd17i0G
Both nodes have the same error log.
All required ports are open.
Any suggestions?
It's usually a best practice to have one seed node per DC if you have just two nodes available in your datacenter. You shouldn't make every node a seed node in this case.
I noticed that node1 has - seeds: "node1,node2" and node2 has - seeds: "node2,node1" in your configuration. By default, a node will start without contacting any other seeds if it finds its own IP address as the first element of the - seeds: ... section in the cassandra.yaml configuration file. That's also what you can find in your logs:
... Unable to gossip with any seeds but continuing since node is in its own seed list ...
I suspect that in your case node1 and node2 are starting without contacting each other, since they each identify themselves as a seed node.
Try using just node1 as the seed node in both instances' configurations and reboot your cluster.
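A minimal sketch of what that section could look like in cassandra.yaml on both nodes, with node1 standing in for its actual IP address:
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "node1"    # the same single seed entry on node1 and node2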
In case node1 is down and node2 is up, you have to change the - seeds: ... section in node1's configuration to point just to node2's IP address, and boot only node1.
If your nodes can't find each other because of a firewall misconfiguration, it's usually a good approach to verify whether a specific port is reachable from the other machine. For example, you can use nc to check whether a certain port is open:
nc -vz node1 7000
References and Links
See the list of ports Cassandra uses at the following link:
http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/secureFireWall.html
See also the detailed documentation on running multiple nodes, with plenty of sample commands:
http://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeMultipleDS.html
This is for future reference. My problem was solved just by opening port 7000 for the same security group in AWS. The port was open, but for a different security group.
When I ran:
[ec2-user@ip-Node1 ~]$ telnet Node2 7000
Trying Node2...
telnet: connect to address Node2: Connection timed out
That is how I came to know the problem could be with the security group, and that is how it was solved.
As for seeds, I am using the IPs of both nodes, like this:
- seeds: "node1,node2"
It is the same on both nodes.
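For reference, opening port 7000 to members of the same security group can also be done with the AWS CLI; sg-0123456789abcdef0 below is a placeholder for the real group ID:
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 7000 \
  --source-group sg-0123456789abcdef0    # allow gossip traffic between nodes in the same group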

Elasticsearch configuration on google cloud

I have installed Elasticsearch on Google Cloud. I get this error when I try to connect to it:
Elasticsearch ERROR: 2017-04-17T04:27:45Z
Error: Request error, retrying
HEAD http://localhost:9200/ => connect ECONNREFUSED 127.0.0.1:9200
In the /etc/elasticsearch/elasticsearch.yml file, I have unsuccessfully tried:
network.host: 127.0.0.1
and
#network.host: 192.168.0.1 (default)
I would appreciate it if someone could help me find out what I'm missing.
Which interface do you want your Elasticsearch to listen on?
As a quick check, you can start ES and inspect the listening sockets locally with:
netstat -tlpn
ss -tlpn
By default, ES listens on localhost; you can find how to manage this here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html
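If the goal is to reach Elasticsearch from outside the VM, a minimal sketch of the relevant lines in /etc/elasticsearch/elasticsearch.yml (binding a non-loopback address also switches ES into production mode with stricter bootstrap checks):
network.host: 0.0.0.0    # or _site_, or the instance's internal IP
http.port: 9200          # default HTTP port, shown here only for clarity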
Kind regards.

RabbitMQ starts on a different port every time?

I am running RabbitMQ servers on EC2. I am trying to create a cluster and have ports 4369, 25672, and 5672 open, as specified in the RabbitMQ clustering docs: https://www.rabbitmq.com/clustering.html
Whenever I start my rabbitmq server:
rabbitmq-server -detached
The server starts on a different port. The output of epmd -names gives:
epmd: up and running on port 4369 with data:
name rabbit at port 50696
Here '50696' changes each time I stop the server and start it again. This is making it impossible for me to cluster my instances without allowing all inbound ports in my AWS firewall rules.
Any ideas on what is going on?
Take a look at RABBITMQ_DIST_PORT here: https://www.rabbitmq.com/ec2.html
You can pin the Erlang distribution listener to a fixed port range in rabbitmq.config:
[
  {kernel, [
    {inet_dist_listen_min, 55555},
    {inet_dist_listen_max, 55560}
  ]}
].
After restarting, epmd -names shows:
epmd: up and running on port 4369 with data:
name rabbit at port 55555
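Alternatively, RABBITMQ_DIST_PORT (described on the EC2 page linked above) pins the distribution listener to a single port through the environment file; the path and value below are typical defaults and should be checked against your installation:
# /etc/rabbitmq/rabbitmq-env.conf
RABBITMQ_DIST_PORT=25672    # Erlang distribution port; open this one (plus 4369 and 5672) in the security group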
