Installing Hive 2.1.0 Interactive Query (LLAP) on Kerberized HDP 2.6.2 environment - hadoop

I had a lot of issues surrounding the installation/activation of Hive 2.1.0 on our HDP 2.6.2 cluster, but I finally got it working, so I wanted to share the steps involved with the community. I got these steps from different sources, which I will mention below each step. My specifications:
Clustered HDP 2.6.2 (hortonworks) environment
Kerberos
Hive 1.2.1000 -> Hive 2.1.0
Step 1: Enable Hive Interactive Query
Follow the steps on the Hortonworks website. This includes enabling YARN pre-emption and some other YARN settings. After adjusting YARN you can enable Hive Interactive Query via Ambari. You also have to specify a default queue that is at least 20% of your total cluster capacity.
Source
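For reference, the YARN pre-emption part of that step usually boils down to properties along these lines in Ambari (a sketch only; the values shown are illustrative, so confirm them against the Hortonworks guide linked above):
yarn.resourcemanager.scheduler.monitor.enable=true
yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round=0.1
yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor=1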
Step 2: Kerberos related settings
Make sure you add the following settings to the custom hiveserver2-interactive site in Ambari, where ${REALMNAME} is the name of your Kerberos realm.
hive.llap.zk.sm.keytab.file=/etc/security/keytabs/hive.llap.zk.sm.keytab
hive.llap.zk.sm.principal=hive/_HOST@${REALMNAME}
hive.llap.daemon.keytab.file=/etc/security/keytabs/hive.service.keytab
hive.llap.daemon.service.principal=hive/_HOST@${REALMNAME}
Now you have to put those two keytabs (basically the same keytab contents under two names) on every YARN node. This can be done manually or through Ambari (Kerberos service); a manual sketch is shown below. Make sure those keytabs are owned by hive:hadoop (chown hive:hadoop) and group-readable (chmod 440).
Note: you also need a hive user on all of those nodes.
Source
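If you go the manual route, the per-node work looks roughly like this (a sketch; ${YARN_NODE} is a placeholder for each YARN host, and it assumes, as the settings above suggest, that the LLAP ZooKeeper keytab is simply a copy of the Hive service keytab):
scp /etc/security/keytabs/hive.service.keytab ${YARN_NODE}:/etc/security/keytabs/
ssh ${YARN_NODE} "cp /etc/security/keytabs/hive.service.keytab /etc/security/keytabs/hive.llap.zk.sm.keytab"
ssh ${YARN_NODE} "chown hive:hadoop /etc/security/keytabs/hive.service.keytab /etc/security/keytabs/hive.llap.zk.sm.keytab"
ssh ${YARN_NODE} "chmod 440 /etc/security/keytabs/hive.service.keytab /etc/security/keytabs/hive.llap.zk.sm.keytab"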
Step 3: Zookeeper configuration
It could be that Hive is not recognized by ZooKeeper; this will give ACL errors when trying to start HiveServer2 Interactive. To cope with this issue I added the right hive ACL nodes from a ZooKeeper client host.
su -
# First, authenticate with the hive keytab
kinit hive/$(hostname) -kt /etc/security/keytabs/hive.service.keytab
# Second, connect to a zookeeper client on your cluster
/usr/hdp/current/zookeeper-server/bin/zkCli.sh -server ${ZOOKEEPER_CLIENT}
# Third, check the current status of the user-hive acl
getAcl /llap-sasl/user-hive
# Fourth, if this is not there, create the following nodes
create /llap-sasl/user-hive "" sasl:hive:cdrwa,world:anyone:r
create /llap-sasl/user-hive/llap0 "" sasl:hive:cdrwa,world:anyone:r
create /llap-sasl/user-hive/llap0/workers "" sasl:hive:cdrwa,world:anyone:r
# Fifth, change the llap-sasl node to add the user hive
setAcl /llap-sasl sasl:hive:cdrwa,world:anyone:r
Source 1, Source 2
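After setting the ACLs, it is worth verifying them from the same zkCli session before restarting HiveServer2 Interactive (a quick check, nothing more):
getAcl /llap-sasl
getAcl /llap-sasl/user-hive
# Both should now list sasl:hive with cdrwa alongside world:anyone:r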
Basically, this should work for Kerberized environments. If you get errors related to ACLs, go back to your ZooKeeper settings and check that everything is correct. If you get errors related to a missing hive user, check whether the hive user was added correctly to the nodes. If you get an error related to Kerberos (principal or keytabs), check that the keytabs are on the designated (YARN) nodes with the correct permissions.

Related

Error while working on hive which is installed on edgenode

I am new to Hadoop/Hive and struggling to fix this. For a distributed Hadoop environment, where should Hive and Pig be installed: on the edge node, or where Hadoop is installed?
Hadoop is installed on a different server (say hadoopVM), with two separate data nodes (DN1, DN2) and edge nodes from which I can submit jobs to Hadoop to load files into HDFS.
Up to here I have no issues. I am trying to install Hive on the edge node and getting the error below.
Attached is the error I am getting on the edge node server.
It seems that the metastore service is not started. Start the service by issuing the following command in one session and keep that session open, then start another session in parallel and try to use Hive.
Active session mode:
sudo hive --service metastore
Background service mode:
If you append "&", the service will be started and will keep running as a background process.
sudo hive --service metastore &
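A slightly more robust variant (a sketch; the log path is just an assumption) detaches the metastore from the terminal so it survives the shell being closed:
sudo nohup hive --service metastore > /tmp/hive-metastore.log 2>&1 &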
Alternative:
If you are still facing the problem, it is likely caused by the new version of MySQL; you can refer to my answer at the link below.
SemanticException in Hive Shell Mode

HDP 2.5: Zeppelin won't run Notebook in Kerberos-enabled cluster

I set up a Hadoop cluster with Hortonworks Data Platform 2.5 and Ambari 2.4. I also added the Zeppelin service to the cluster installation via Ambari UI.
Since I enabled Kerberos, I can't run the Zeppelin Notebooks anymore. When I click "Run paragraph" or "Run all paragraphs" nothing seems to happen. I also don't get any new entries in my logs in /var/log/zeppelin/. Before enabling Kerberos I was able to run the paragraphs.
I tried some example notebooks, as well as some of my own: same problem, nothing happens. I tried with both admin and non-admin users.
Here are my "Spark" and "sh" interpreter settings (other paragraphs e.g. %sql also don't work):
The tutorial below captures the configuration of Ambari and Hadoop Kerberos:
Configuring Ambari and Hadoop for Kerberos

Hadoop client node installation

I have a 12-node cluster. Its hardware information is:
NameNode : CPU Core i3 2.7 Ghz | 8GB RAM | 500 GB HDD
DataNode : CPU Core i3 2.7 Ghz | 2GB RAM | 500 GB HDD
I have installed Hadoop 2.7.2 using the normal installation process on Ubuntu, and it works fine. But I want to add a client machine, and I have no clue how to add one.
Questions:
What is the installation process for a client machine?
How do I run Pig/Hive scripts on that client machine?
The client should have the same copy of the Hadoop distribution and configuration that is present on the NameNode. Only then will the client know on which node the JobTracker/ResourceManager is running, and the IP of the NameNode for accessing HDFS data.
You also need to update /etc/hosts on the client machine with the IP addresses and hostnames of the NameNode and DataNodes, for example as shown below.
Note that you shouldn't start any Hadoop services on the client machine.
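A minimal /etc/hosts sketch for the client (the addresses and hostnames here are purely illustrative):
192.168.1.10  namenode.example.com   namenode
192.168.1.11  datanode1.example.com  datanode1
192.168.1.12  datanode2.example.com  datanode2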
Steps to follow on client machine:
Create a user account on the cluster, say user1.
Create an account on the client machine with the same name: user1.
Configure the client machine to access the cluster machines (SSH without a passphrase, i.e. passwordless login).
Copy/get the same Hadoop distribution as the cluster onto the client machine and extract it to /home/user1/hadoop-2.x.x.
Copy (or edit) the Hadoop configuration files (*-site.xml) from the NameNode of the cluster - from these the client will know where the NameNode/ResourceManager is running.
Set the environment variables JAVA_HOME and HADOOP_HOME (/home/user1/hadoop-2.x.x).
Add the Hadoop bin directory to your PATH: export PATH=$HADOOP_HOME/bin:$PATH
Test it out: hadoop fs -ls / should list the root directory of the cluster HDFS (see the sketch after this list).
You may face some issues, such as privileges; you may need to set JAVA_HOME in places like conf/hadoop-env.sh on the client machine. Update/comment with any error you get.
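The last three steps, condensed into a shell sketch (the JDK path is an assumption; the Hadoop path reuses the example layout above):
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # assumed JDK location, adjust to yours
export HADOOP_HOME=/home/user1/hadoop-2.x.x
export PATH=$HADOOP_HOME/bin:$PATH
hadoop fs -ls /    # should list the root of the cluster HDFS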
Answers to more questions from comments:
How do I load data from the client node to HDFS? Just run hadoop fs commands from the client machine, for example hadoop fs -put /home/user1/data/* /user/user1/data. You can also write shell scripts that run these commands if you need to run them many times (see the sketch below).
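A tiny wrapper script of that kind might look like this (a sketch; the paths simply reuse the example above):
#!/bin/bash
# upload_data.sh - copy local files into the user's HDFS directory
SRC=/home/user1/data
DEST=/user/user1/data
hadoop fs -mkdir -p "$DEST"
hadoop fs -put "$SRC"/* "$DEST"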
Why am I installing Hadoop on the client if we only use SSH to connect remotely to the master node? Because the client needs to communicate with the cluster and needs to know where the cluster nodes are. The client will be running Hadoop jobs: hadoop fs commands, Hive queries, hadoop jar commands, Spark jobs, developing MapReduce jobs, etc., and for that it needs the Hadoop binaries on the client node. Basically, you are not only using SSH to connect; you are performing operations on the Hadoop cluster from the client node, so you need the Hadoop binaries there. SSH is used by the Hadoop binaries on the client node when you run operations like hadoop fs -ls / from the client node against the cluster (remember adding $HADOOP_HOME/bin to PATH as part of the installation process above).
When you say "we only use ssh", that sounds to me like connecting to the cluster nodes over SSH when you want to change or access the Hadoop configuration files on the cluster. You do that as part of administrative work, but when you need to run Hadoop commands/jobs against the cluster from the client node, you don't need to SSH manually; the Hadoop installation on the client node takes care of it. Without a Hadoop installation, how could you run Hadoop commands/jobs/queries from the client node against the cluster?
Should the user name 'user1' be the same? What if it is different? It will work either way. You can install Hadoop on the client node under a group user, say qa or dev, with all users on the client node able to sudo under that group. Then, when user1 on the client node needs to run a Hadoop job on the cluster, user1 should be able to sudo -i -u qa and run the Hadoop command from there.

Hadoop on Google Compute Engine

I am trying to set up a Hadoop cluster in Google Compute Engine through the "Launch click-to-deploy software" feature. I have created 1 master and 1 slave node and tried to start the cluster using the start-all.sh script from the master node, and I got the error "permission denied (publickey)".
I have generated public and private keys on both the slave and master nodes.
Currently I am logged into the master with my own username. Is it mandatory to log into the master as the "hadoop" user? If so, what is the password for that user ID?
Please let me know how to overcome this problem.
The deployment creates a user hadoop which owns Hadoop-specific SSH keys which were generated dynamically at deployment time; this means since start-all.sh uses SSH under the hood, you must do the following:
sudo su hadoop
/home/hadoop/hadoop-install/bin/start-all.sh
Otherwise, your "normal" username doesn't have SSH keys properly set up so you won't be able to launch the Hadoop daemons, as you saw.
Another thing to note is that the deployment should have already started all the Hadoop daemons automatically, so you shouldn't need to manually run start-all.sh unless you're rebooting the daemons after some manual configuration updates. If the daemons weren't running after the deployment ran, you may have encountered some unexpected error during initialization.
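If you want to check whether the daemons are already up before rerunning the script, a quick sketch (jps ships with the JDK) is:
sudo su hadoop
jps    # should list NameNode, DataNode, JobTracker, etc. if the daemons are running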

Setting up Kerberos on HDP 2.1

I have a 2-node Ambari Hadoop cluster running on CentOS 6. Recently I set up Kerberos for the services in the cluster as per the instructions detailed here:
http://docs.hortonworks.com/HDPDocuments/Ambari-1.6.0.0/bk_ambari_security/content/ambari-kerb.html
In addition to the above documentation, I found that you have to add additional configuration for the NameNode web UI and so on (the QuickLinks in the Ambari server console for each of the Hadoop services) to work. Hence I followed the configuration options listed in the question portion of this article to set up HTTP authentication: Hadoop Web Authentication using Kerberos
Also, to create the HTTP secret file, I used the following command to generate the file on node 1, and then copied it to the same folder location on node 2 of the cluster as well:
sudo dd if=/dev/urandom of=/etc/security/keytabs/http_secret bs=1024 count=1
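For context, that secret file gets wired into the HTTP authentication settings in core-site.xml, which typically look roughly like this (a sketch with placeholder values; the authoritative list is in the linked article):
hadoop.http.filter.initializers=org.apache.hadoop.security.AuthenticationFilterInitializer
hadoop.http.authentication.type=kerberos
hadoop.http.authentication.signature.secret.file=/etc/security/keytabs/http_secret
hadoop.http.authentication.kerberos.principal=HTTP/_HOST@REALM-NAME
hadoop.http.authentication.kerberos.keytab=/etc/security/keytabs/spnego.service.keytab
hadoop.http.authentication.cookie.domain=your.domain.com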
I updated the ZooKeeper JAAS client file under /etc/zookeeper/conf/zookeeper_client_jaas.conf to add the following:
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=false
  useTicketCache=true
  keyTab="/etc/security/keytabs/zk.service.keytab"
  principal="zookeeper/host@realm-name";
};
This step followed from the article: http://blog.spryinc.com/2014/03/configuring-kerberos-security-in.html
When I restarted my Hadoop services, I got a 401 Authentication Required error when trying to access the NameNode UI, NameNode logs, NameNode JMX, and so on. None of the links in the QuickLinks drop-down is able to connect and pull up the data.
Any thoughts to resolve this error?
