I wanted to understand how hive knows which of the hadoop namenode is in active state and what happens when the active namenode fails
Hive is configured via metatool to point to the configured dfs.nameservices for HA HDFS. See https://cwiki.apache.org/confluence/display/Hive/Hive+MetaTool. dfs.nameservices is a logical address while the actual namenodes are configured with dfs.ha.namenodes.[id].
As for which Namenode is active, state is stored in Zookeeper. When the active namenode fails, failover is triggered after a configured time (5 second default, ha.zookeeper.session-timeout.ms). A fencing script is required and triggers the standby namenode to become active.
In hdfs HA environment name node url should be a logical name (eg hdfs://logicalnamenode). We need to configure hive to work with HA. For that you need to change the hive name node configuration with metatool command.
List the current NN configuration
~# metatool -listFSRoot
hdfs://namenode:8020/user/hive/warehouse
The following command will update the old NN configuration with Logical name
metatool -updateLocation hdfs://logicalnamenode hdfs://namenode:8020 -tablePropKey avro.schema.url
Related
I have a multi-node standalone hadoop cluster for HDFS. I am able to load data to HDFS, however everytime I reboot my computer and start the cluster by start-dfs.sh, I don't see the dashboard until I perform hdfs namenode -format which erases all my data.
How do I start hadoop cluster without having to go through hdfs namenode -format?
You need to shutdown hdfs and the namenode cleanly (stop-dfs) before you shutdown your computer. Otherwise, you can corrupt the namenode, causing you to need to format to get back to a clean state
I am trying to understand how to submit a MR job to Hadoop cluster, YARN based.
Case1:
For case in which there is only one ResourceManager (that is NO HA), we can submit the job like this (which i actually used and I believe is correct).
hadoop jar word-count.jar com.example.driver.MainDriver -fs hdfs://master.hadoop.cluster:54310 -jt master.hadoop.cluster:8032 /first/dir/IP_from_hdfs.txt /result/dir
As can be seen, RM is running on port 8032 and NN on 54310 and I am specifying the hostname becasue there is only ONE master.
case2:
Now, for the case when there is HA for both NN and RM, how do I submit the job? I am not able to understand this, because now we have two RM and NN (active / standby), and I understand that there is zookeeper to keep track of failures. So, from client perspective trying to submit a job, do I need to know the exact NN and RM for submitting the job or is there some logical naming which we have to use for submitting the job?
Can anyone please help me understand this?
With or without HA, the command to submit the job remains the same.
hadoop jar <jar> <mainClass> <inputpath> <outputpath> [args]
Using -fs and -jt is optional and are not used unless you want to specify a Namenode and JobTracker that is different from the one in the configurations.
If the fs.defaultFS property in core-site.xml and the properties defining the nameservice (dfs.nameservices) and its namenodes are configured properly in hdfs-site.xml of the client, the Active Master will be chosen whenever a client operation is performed.
By default, this Java class is used by the DFS Client to determine which NameNode is currently Active.
<property>
<name>dfs.client.failover.proxy.provider.<nameserviceID></name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
We are using HDP 2.7.1.2.3 with Ambari 2.1.2
After finish setup, every node status is correct.
But oneday ambari suddenly show namdenode is stopped.(we don't change any config of ambari or namenode)
However, we still can use HBASE and run MapReduce.
we think name node status should be normal.
We try to restart namenode and check ambari-server log
It shows:
ServiceComponentHostImpl:949 - Host role transitioned to a new state, serviceComponentName=NAMENODE, oldState=STARTING, currentState=STARTED
HeartBeatHandler:657 - State of service component NAMENODE of service HDFS of cluster wae has changed from STARTED to INSTALLED
we don't understand why its status change from "STARTED" to "INSTALLED".
In namenode side, we check ambari-agent.log
It shows one warning:
[Alert][namenode_directory_status] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
We think it is irrelevant.
What's the reason that ambari think namenode is stopped?
Is there any way that we can fix this issue?
Run the command ambari-server restart from linux terminal in Ambari server node
Run the command ambari-agent restart from linux terminal in all the nodes in the cluster.
You can run the command hdfs dfsadmin -report from the terminal as hdfs user to confirm all the nodes are up and running.
How to bring down your Namenode in Hadoop 1.2.1 on CentOs and swap your namenode with a Datanode instance, also I have to make sure no data is lost during the process.
I am using Hadoop 1.2.1 with master, slave 1 and slave 2 nodes.
I am looking for the Unix commands or the changes I need to make in the configuration files.
Please ask for any particular details if needed!
You can take a back up of namenode metadata and kill namenode. Install namenode packages on other node of interest and put the backup copy of metadata in namenode data dir. Now start namenode this should pick up your old metadata. Remember to change namenode details in all config files.
I am attempting to run a single-node instance of Hadoop on Amazon Web Services using Apache Whirr. I set whirr.instance-templates equal to 1 jt+nn+dn+tt. The instance starts up fine. I am able to create directories, but when I try to put files, I get a File could only be replicated to 0 nodes, instead of 1 error. When I do a hadoop fsck / I get a Exception in thread "main" java.net.ConnectException: Connection refused error. Does anyone know what is wrong with my configuration?
I made the experience that whirr does not always start all services reliable. It sounds like the namenode started (the namenode is responsible for storing directory information) but the datanode did not start (the datanode stores the data).
Try running
hadoop dfsadmin -report
to see if a datanode is available.
If not: often it helps to restart the cluster.