Edge Node hortonworks usage - hadoop
I have a 6-node production cluster (2 masters + 4 slaves) with HA configured.
The current topology is:
Master 1:
Active HBase Master
Hive Metastore
HiveServer2
HST Server
Knox Gateway
Active NameNode
Oozie Server
Active ResourceManager
WebHCat Server
ZooKeeper Server
HST Agent
JournalNode
Metrics Monitor
Master 2:
App Timeline Server
Standby HBase Master
History Server
Infra Solr Instance
Metrics Collector
Grafana
Standby NameNode
Standby ResourceManager
Spark2 History Server
Zeppelin Notebook
ZooKeeper Server
HST Agent
JournalNode
Metrics Monitor
Clients
Slave 1/2/3:
DataNode
RegionServer
HST Agent
NodeManager
Metrics Monitor
One of the slave nodes also hosts a JournalNode and a ZooKeeper Server.
We are now planning to add some edge nodes.
Our plan is:
SQL Edge Node:
HCatalog
HiveServer2
WebHCat
Admin Edge Node:
Ambari Server
Ranger
Lineage Edge Node:
Job History Server
Spark2 History Server
App Timeline Server
Slider Registry Server
End User Access Edge Node:
Hue
Knox Edge Node:
Knox Gateway
Scheduling Edge Node:
Oozie Server
Falcon
What do you think?
What is the best practice?
Which components should be moved from the master/slave nodes to the edge nodes?
Thanks
Edge nodes are meant to host clients only: no master or slave services. They need very minimal resources, other than perhaps disk space for staging files copied over with SCP before running hdfs dfs -put.
The Knox Gateway describes itself as a secure edge-node proxy into the cluster, so whether it belongs in your plan depends on whether you are actually using it.
If you aren't using HBase and Zeppelin, you could probably remove them from the cluster. If you have the resources available, HBase should sit on its own dedicated servers.
The same goes for ZooKeeper: ideally, those servers should be separated as well for optimal throughput.
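As a rough illustration of the "clients only" rule of thumb, here is a minimal Python sketch that checks part of the proposed plan and flags anything that is not a plain client. The role labels in COMPONENT_ROLES are my own assumptions for this example (not an official Hortonworks classification), and review_edge_plan is a hypothetical helper name. Knox Gateway is labelled as a gateway and allowed through, since it is designed to sit at the edge.

```python
# Role labels below are assumptions made for this sketch only.
CLIENT = "client"
MASTER = "master"
GATEWAY = "gateway"

COMPONENT_ROLES = {
    "HCatalog": CLIENT,
    "HiveServer2": MASTER,
    "WebHCat": MASTER,
    "Knox Gateway": GATEWAY,
    "Oozie Server": MASTER,
    "Falcon": MASTER,
}

# A subset of the plan from the question.
PROPOSED_PLAN = {
    "SQL Edge Node": ["HCatalog", "HiveServer2", "WebHCat"],
    "Knox Edge Node": ["Knox Gateway"],
    "Scheduling Edge Node": ["Oozie Server", "Falcon"],
}


def review_edge_plan(plan, roles=COMPONENT_ROLES):
    """Return {node: [components that are neither clients nor gateways]}.

    Components missing from `roles` are treated as unknown and flagged too.
    """
    allowed = {CLIENT, GATEWAY}
    findings = {}
    for node, components in plan.items():
        flagged = [c for c in components if roles.get(c, "unknown") not in allowed]
        if flagged:
            findings[node] = flagged
    return findings


if __name__ == "__main__":
    for node, flagged in review_edge_plan(PROPOSED_PLAN).items():
        print(f"{node}: consider keeping on master nodes -> {', '.join(flagged)}")
```

Under these assumed labels, the SQL and Scheduling edge nodes are flagged because they would host server processes, while the Knox node is not.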
Related
Is there a way to reconnect a remote cluster in an Elasticsearch cluster when the remote restarts?
Cluster A has a remote cluster connection to Cluster B, but when B restarts, the remote cluster status always shows as disconnected. I am using proxy mode.
Apache Accumulo role assignment
I'm adding Accumulo to my Cloudera cluster. How should I assign roles? I have 4 servers currently running. 1 server: HDFS NameNode, HDFS Secondary NameNode, HDFS Balancer, Activity Monitor, Cloudera Management Services, Spark Gateway, Spark History Server, YARN Job History Server, YARN ResourceManager, ZooKeeper Server. 3 servers: HDFS DataNode, Kafka Broker, Spark Gateway, YARN NodeManager, ZooKeeper Server. The Cloudera wizard asks for assignment of the following Accumulo roles: Master, Tablet Server, Garbage Collector, Monitor, Tracer, Gateway. Is it OK if the Tablet Server role is assigned to all HDFS DataNodes and all other roles to the first server? Does it make sense to assign the Accumulo Gateway to the same nodes as the Tablet Server?
Yes, running the Accumulo Master, Garbage Collector, Monitor, and Tracer on the first server and running TabletServers on the others makes sense. I'm not sure what the "Accumulo Gateway" is; Apache Accumulo has no component or service called "Gateway".
Heartbeat lost AMBARI HDP
I lost all heartbeats from Ambari on one node of a 4-node cluster. http://i.stack.imgur.com/51Gie.png I have already tried rebooting the cluster, restarting ambari-agent and ambari-server, and manually restarting some of the services, such as YARN. Nothing worked, and I am stuck now. The Ambari version is 2.1.1.
Phoenix cannot connect to ZooKeeper on the Cloudera CDH5 distribution
I installed zookeeper, hbase-master, and hbase-regionserver on three different systems and configured them according to the CDH5 guidelines. I am able to start all the services. I added Phoenix-4.2.2 on the ZooKeeper node. When I try to connect to the database with ./sqlline localhost, I get the error below: ERROR: Can't get master address from ZooKeeper; znode data == null. Please help.
I assume you have copied phoenix-*-server.jar to the HBase lib directory on all region servers. Check which ZooKeeper quorum server name is given in hbase-site.xml. If a machine name is given, you should map that machine name to its IP address in /etc/hosts. Hope this helps.
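The hostname check suggested above could be sketched in Python roughly as follows. This is only an illustration: it assumes the quorum is stored under the hbase.zookeeper.quorum property in hbase-site.xml, and the helper names quorum_hosts and unresolved_hosts as well as the example hostnames are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET


def quorum_hosts(hbase_site_xml):
    """Extract the ZooKeeper quorum hostnames from hbase-site.xml content."""
    root = ET.fromstring(hbase_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == "hbase.zookeeper.quorum":
            value = prop.findtext("value", "")
            # Entries are comma-separated; drop an optional ":port" suffix.
            return [h.strip().split(":")[0] for h in value.split(",") if h.strip()]
    return []


def unresolved_hosts(hosts, resolve=socket.gethostbyname):
    """Return the hosts that the resolver cannot map to an IP address."""
    missing = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            missing.append(host)
    return missing


if __name__ == "__main__":
    # Hypothetical sample; in practice, read the real hbase-site.xml file.
    sample = """<configuration>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>zk1.example.internal,zk2.example.internal</value>
      </property>
    </configuration>"""
    hosts = quorum_hosts(sample)
    print("Quorum hosts:", hosts)
    print("Not resolvable from this machine:", unresolved_hosts(hosts))
```

Any host reported as unresolvable on the machine running Phoenix is a candidate for an /etc/hosts entry or a DNS fix.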
Hortonworks HDP ambari AWS EC2 heartbeat lost
An HDP cluster was deployed successfully on AWS EC2. After the HDP cluster nodes were restarted, the Ambari server lost their heartbeats because all public and private IPs and DNS names had changed. Where in the Ambari server can we configure the new IPs or DNS names?
First, Ambari requires a fully qualified hostname for all your nodes. It is best practice to assign proper hostnames to all your nodes. A simple workaround for getting the heartbeat back on your Ambari server is to run the following on all your client nodes: sudo ambari-agent restart your_ambari.server.hostname.com It worked for me on Ambari 2.0 and Ubuntu 12. Good luck!