I'm adding Accumulo to my Cloudera cluster.
How should I assign the roles?
I currently have 4 servers running:
1 server: HDFS NameNode, HDFS Secondary NameNode, HDFS Balancer, Activity Monitor, Cloudera Management Services, Spark Gateway, Spark History Server, YARN JobHistory Server, YARN ResourceManager, ZooKeeper Server
3 servers: HDFS DataNode, Kafka Broker, Spark Gateway, YARN NodeManager, ZooKeeper Server
The Cloudera wizard asks me to assign the following Accumulo roles: Master, Tablet Server, Garbage Collector, Monitor, Tracer, Gateway.
Is it OK to assign the Tablet Server role to all HDFS DataNodes and all other roles to the first server?
Does it make sense to assign the Accumulo Gateway to the same nodes as the Tablet Servers?
Yes, running the Accumulo Master, Garbage Collector, Monitor, and Tracer on the first server and running TabletServers on the others makes sense.
I'm not sure what the "Accumulo Gateway" is; Apache Accumulo has no such component/service called "Gateway".
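To make the suggested layout concrete, here is a minimal Python sketch that encodes it as a host-to-roles map and checks two things: every Tablet Server host also runs an HDFS DataNode, and the Accumulo Master, Garbage Collector, Monitor, and Tracer all sit on the first server. The host names (master1, worker1 to worker3), the role strings, and the check_layout helper are hypothetical; this is not a Cloudera Manager API, just a way to sanity-check a plan before going through the wizard.

```python
# Hypothetical host names for the 4-server cluster described in the question.
MASTER = "master1"
WORKERS = ["worker1", "worker2", "worker3"]

# Accumulo roles the answer suggests keeping on the first server.
MANAGEMENT_ROLES = {"Accumulo Master", "Accumulo Garbage Collector",
                    "Accumulo Monitor", "Accumulo Tracer"}

LAYOUT = {MASTER: {"HDFS NameNode", "YARN ResourceManager", "ZooKeeper Server"} | MANAGEMENT_ROLES}
for worker in WORKERS:
    # Tablet Servers go wherever a DataNode already runs.
    LAYOUT[worker] = {"HDFS DataNode", "YARN NodeManager", "ZooKeeper Server",
                      "Accumulo Tablet Server"}


def check_layout(layout, master):
    """Return a list of problems; an empty list means the layout matches the plan."""
    problems = []
    for host, roles in sorted(layout.items()):
        # Co-locating with a DataNode lets a Tablet Server read its files locally.
        if "Accumulo Tablet Server" in roles and "HDFS DataNode" not in roles:
            problems.append(f"{host}: Tablet Server without a local DataNode")
    for role in sorted(MANAGEMENT_ROLES):
        hosts = sorted(h for h, roles in layout.items() if role in roles)
        if hosts != [master]:
            problems.append(f"{role}: expected only on {master}, found on {hosts}")
    return problems
```

For the layout above, print(check_layout(LAYOUT, MASTER)) prints [].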
I recently installed Cloudera's CDH 6.1 distribution to create a two-node cluster. In the Cloudera Manager UI, all services are running fine.
However, only the NameNode (and DataNode) web UI does not open.
The firewall is already disabled.
Any pointers on how to debug this problem?
Is anything listening on the ports on the host itself?
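One way to answer that from the NameNode host itself is a plain TCP connect test. This minimal Python sketch assumes the Hadoop 3 default web UI ports used by CDH 6 (9870 for the NameNode, 9864 for the DataNode); the actual values are in dfs.namenode.http-address and dfs.datanode.http.address, so adjust the constants if Cloudera Manager shows different ones. The is_listening helper is hypothetical.

```python
import socket

# Assumed Hadoop 3 / CDH 6 default web UI ports; confirm against
# dfs.namenode.http-address and dfs.datanode.http.address in Cloudera Manager.
NAMENODE_HTTP_PORT = 9870
DATANODE_HTTP_PORT = 9864


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection raises OSError (refused, timed out, unresolvable host).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, if is_listening("127.0.0.1", NAMENODE_HTTP_PORT) returns True on the NameNode host but the same check from your workstation (using the NameNode's hostname) fails, that points toward a network or interface-binding issue rather than a web server that never started.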
Did you check the NameNode logs? They should be in /var/log/hadoop-hdfs/hadoop-cmf-hdfs-NAMENODE-your.server.hostname.com.log.out
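To pull the relevant lines out of that log quickly, a small Python sketch like the following can help. It keeps only the most recent lines containing ERROR, FATAL, or Exception; that marker list and the recent_problems helper are assumptions, and the hostname part of LOG_PATH must be replaced with your NameNode's FQDN. A web UI that fails to start because its port is already in use typically shows up as a java.net.BindException.

```python
from collections import deque
from pathlib import Path

# Path pattern from the answer; replace the hostname part with your NameNode's FQDN.
LOG_PATH = "/var/log/hadoop-hdfs/hadoop-cmf-hdfs-NAMENODE-your.server.hostname.com.log.out"

# Substrings that usually mark why a daemon (or its embedded web server) failed.
MARKERS = ("ERROR", "FATAL", "Exception")


def recent_problems(path, max_lines=20):
    """Return up to max_lines of the most recent log lines containing a marker."""
    hits = deque(maxlen=max_lines)  # older matches fall off the front
    with Path(path).open(errors="replace") as log:
        for line in log:
            if any(marker in line for marker in MARKERS):
                hits.append(line.rstrip("\n"))
    return list(hits)
```

Usage: for line in recent_problems(LOG_PATH): print(line)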
I have a 6-node production cluster (2 masters + 4 slaves) with HA configured.
The current topology is:
Master 1:
Active HBase Master
Hive Metastore
HiveServer2
HST Server
Knox Gateway
Active NameNode
Oozie Server
Active ResourceManager
WebHCat Server
ZooKeeper Server
HST Agent
JournalNode
Metrics Monitor
Master 2:
App Timeline Server
Standby HBase Master
History Server
Infra Solr Instance
Metrics Collector
Grafana
Standby NameNode
Standby ResourceManager
Spark2 History Server
Zeppelin Notebook
ZooKeeper Server
HST Agent
JournalNode
Metrics Monitor
Clients
Slave 1/2/3:
DataNode
RegionServer
HST Agent
NodeManager
Metrics Monitor
One of the slave nodes also runs a JournalNode and a ZooKeeper Server.
Now we are planning to add some edge nodes.
Our plan is:
SQL Edge Node:
HCatalog
HiveServer2
WebHCat
Admin Edge Node:
Ambari Server
Ranger
Lineage Edge Node:
Job History Server
Spark2 History Server
App Timeline Server
Slider Registry Server
End User Access Edge Node:
Hue
Knox Edge Node:
Knox Gateway
Scheduling Edge Node:
Oozie Server
Falcon
What do you think?
What's the best practice?
Which components should we move from the master/slave nodes to edge nodes?
Thanks
Edge nodes are meant to run clients only, with no master or slave roles. They need very few resources, apart from perhaps some disk space for staging files copied over with SCP before loading them with hdfs dfs -put.
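The staging workflow described above (copy to the edge node, then load into HDFS from there) can be sketched in Python as two commands. The edge host name, staging directory, and HDFS target directory are hypothetical, and the sketch assumes passwordless SSH to the edge node and a configured HDFS client there.

```python
import shlex
import subprocess
from pathlib import PurePosixPath


def staging_commands(local_file, edge_host, staging_dir, hdfs_dir):
    """Build the scp command (local -> edge node) and the ssh command that runs
    hdfs dfs -put on the edge node (edge node -> HDFS)."""
    remote_path = str(PurePosixPath(staging_dir) / PurePosixPath(local_file).name)
    scp_cmd = ["scp", local_file, f"{edge_host}:{remote_path}"]
    # The remote command is a single shell string, so quote the paths.
    put_cmd = ["ssh", edge_host,
               f"hdfs dfs -put {shlex.quote(remote_path)} {shlex.quote(hdfs_dir)}"]
    return scp_cmd, put_cmd


def stage_and_put(local_file, edge_host, staging_dir, hdfs_dir):
    """Run both steps in order; raises CalledProcessError if either fails."""
    for cmd in staging_commands(local_file, edge_host, staging_dir, hdfs_dir):
        subprocess.run(cmd, check=True)
```

Building the argument lists separately from running them makes it easy to print or review the exact commands before touching the cluster.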
The Knox Gateway itself is more or less designed as a secure edge node, a proxy into the cluster, so whether you need a Knox edge node depends on whether you actually use Knox.
If you aren't using HBase and Zeppelin, you could probably remove them from the cluster. If you have the resources available, HBase should sit on its own dedicated servers.
The same goes for ZooKeeper: its servers should ideally be separated for optimal throughput.
Two drives crashed on my Ambari Server node, so I have to re-migrate my Ambari cluster. No real data was lost (thanks to a separate backup strategy), but the node's configuration files, including the Ambari Server configuration, are gone.
Because two drives in the RAID 5 array crashed (RAID 5 tolerates only a single drive failure), I cannot access any files on that node anymore.
I am now reinstalling the Ambari Server on the same node and would like my agents to reconnect seamlessly to the "new" Ambari Server.
Is there a way to migrate the existing cluster settings to the new Ambari Server? I am thinking of cluster settings that were distributed to the agents, or something similar.
If there is no way to migrate the cluster, how should I go about installing the Ambari Server? Do a fresh install and set everything up again? Will the Ambari agents be able to connect to the "new" cluster without problems? Note that the Ambari Server will run on the same hostname/IP.
I deployed an HDP cluster successfully on AWS EC2. After I restarted the cluster nodes, the Ambari Server lost their heartbeats because all public and private IPs and DNS names changed.
Where in the Ambari Server can we configure the new IPs or DNS names?
First, Ambari requires a fully qualified domain name (FQDN) for every node, so it is best practice to assign proper hostnames to all your nodes.
A simple workaround to get the heartbeat back on your Ambari Server is to run the following on all your client nodes:
sudo ambari-agent restart your_ambari.server.hostname.com
It worked for me on Ambari 2.0 and Ubuntu 12. Good luck!
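If the agents still point at an old address, another option, assuming the standard agent layout where the server address is the hostname key in the [server] section of /etc/ambari-agent/conf/ambari-agent.ini, is to rewrite that key on every agent node and then run ambari-agent restart. The Python sketch below does that edit; set_ambari_server is a hypothetical helper, and because configparser rewrites the whole file (dropping comments), it keeps a .bak copy first.

```python
import configparser
import shutil

# Standard location of the agent config on Ambari agent hosts.
AGENT_INI = "/etc/ambari-agent/conf/ambari-agent.ini"


def set_ambari_server(ini_path, server_fqdn):
    """Point the agent at server_fqdn by rewriting [server] hostname."""
    shutil.copy2(ini_path, ini_path + ".bak")  # keep the original, comments included
    config = configparser.ConfigParser(interpolation=None)  # values may contain '%'
    config.optionxform = str  # preserve key case exactly as written
    config.read(ini_path)
    if not config.has_section("server"):
        raise ValueError(f"{ini_path} has no [server] section")
    config.set("server", "hostname", server_fqdn)
    with open(ini_path, "w") as fh:
        config.write(fh)
```

For example, run set_ambari_server(AGENT_INI, "your_ambari.server.hostname.com") as root on each agent node, followed by ambari-agent restart.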
I'm trying to use the Dedoop application, which runs on Hadoop and HDFS on Amazon EC2. The Hadoop cluster is set up, and the NameNode, JobTracker, and all other daemons are running without errors.
However, the Dedoop.war application cannot connect to the Hadoop NameNode after I deploy it on Tomcat.
I have also checked that the ports are open in EC2.
Any help is appreciated.
If you're using Amazon AWS, I highly recommend Amazon Elastic MapReduce (EMR). Amazon takes care of setting up and provisioning the Hadoop cluster for you, including IP addresses, the NameNode, etc.
If you're setting up your own cluster on EC2, you have to be careful with public and private IP addresses. Most likely you are pointing to the external IP addresses; can you replace them with the internal IP addresses and see if that works?
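A quick way to check which kind of address you are pointing at is Python's standard ipaddress module: EC2 internal addresses fall in private ranges (for example the default VPC's 172.31.0.0/16), while public ones do not. The helper names are hypothetical; the idea is to run classify on the NameNode hostname that Dedoop and the cluster are configured with, from inside EC2, and confirm it resolves to a private address.

```python
import ipaddress
import socket


def is_private_address(addr):
    """True for private (e.g. RFC 1918) addresses such as EC2-internal IPs."""
    return ipaddress.ip_address(addr).is_private


def resolved_addresses(hostname):
    """IPv4 addresses that hostname resolves to from this machine."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def classify(hostname):
    """Map each resolved address of hostname to 'private' or 'public'."""
    return {a: "private" if is_private_address(a) else "public"
            for a in resolved_addresses(hostname)}
```

If the configured NameNode name resolves to a public address from inside EC2, that is the mismatch the answer suggests fixing.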
Can you post a few lines of the stack trace from Tomcat's log files?
Dedoop must establish a SOCKS proxy (similar to ssh -D port username@host) to pass connections to the Hadoop nodes on EC2. This is mainly because Hadoop resolves public IPs to EC2-internal IPs, which breaks MapReduce job submission and HDFS management.
To this end, Tomcat must be configured to establish SSH connections. The setup procedure is described here.
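For reference, stock Hadoop clients can route their RPC traffic through such a SOCKS proxy with two core-site.xml properties: hadoop.socks.server (host:port of the proxy) and hadoop.rpc.socket.factory.class.default set to org.apache.hadoop.net.SocksSocketFactory. Whether Dedoop's own setup uses exactly these is an assumption; the Python sketch below only generates that configuration fragment, and localhost:6666 is a hypothetical port for a tunnel opened with ssh -D 6666 username@host.

```python
import xml.etree.ElementTree as ET


def socks_client_properties(proxy_host, proxy_port):
    """Hadoop client properties that send RPC connections through a SOCKS proxy."""
    return {
        "hadoop.socks.server": f"{proxy_host}:{proxy_port}",
        "hadoop.rpc.socket.factory.class.default": "org.apache.hadoop.net.SocksSocketFactory",
    }


def to_configuration_xml(props):
    """Render properties in Hadoop's <configuration><property>... layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical local end of an "ssh -D 6666 username@host" tunnel.
print(to_configuration_xml(socks_client_properties("localhost", 6666)))
```

These properties belong in the client-side configuration only (here, whatever Hadoop configuration the Tomcat-hosted application loads), not on the cluster nodes themselves.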