Setting up Kerberos on HDP 2.1 - hadoop

I have a 2-node Ambari-managed Hadoop cluster running on CentOS 6. I recently set up Kerberos for the services in the cluster, following the instructions detailed here:
http://docs.hortonworks.com/HDPDocuments/Ambari-1.6.0.0/bk_ambari_security/content/ambari-kerb.html
Beyond that documentation, I found that additional configuration is needed for the NameNode web UI and the other web UIs (the QuickLinks in the Ambari server console for each Hadoop service) to work. I therefore applied the configuration options listed in the question portion of this article to set up HTTP authentication: Hadoop Web Authentication using Kerberos
To create the HTTP secret file, I generated it on node 1 with the command below and then copied it to the same folder on node 2 of the cluster:
sudo dd if=/dev/urandom of=/etc/security/keytabs/http_secret bs=1024 count=1
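Since both nodes need byte-identical copies of the secret, comparing checksums after the copy is a cheap sanity check. The following is only a sketch under stated assumptions: the helper names `write_secret` and `sha256_of` are made up for illustration, and the demo writes into a temporary directory instead of /etc/security/keytabs.

```python
import hashlib
import os
import tempfile


def write_secret(path, size=1024):
    """Write `size` random bytes to `path` (same idea as the dd command)."""
    with open(path, "wb") as fh:
        fh.write(os.urandom(size))


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, for comparing copies across nodes."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


# Demo in a temporary directory (hypothetical stand-in for the keytabs folder).
demo_dir = tempfile.mkdtemp()
secret_path = os.path.join(demo_dir, "http_secret")
write_secret(secret_path)
print(os.path.getsize(secret_path), sha256_of(secret_path))
```

Running `sha256_of` on each node and comparing the two digests shows whether the copy on node 2 matches the file generated on node 1.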
I also updated the ZooKeeper client JAAS file, /etc/zookeeper/conf/zookeeper_client_jaas.conf, to contain the following:
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=false
useTicketCache=true
keyTab='/etc/security/keytabs/zk.service.keytab'
principal='zookeeper/host@realm-name';
};
This step followed from the article: http://blog.spryinc.com/2014/03/configuring-kerberos-security-in.html
After restarting the Hadoop services, I get a 401 Authentication Required error whenever I try to access the NameNode UI, NameNode logs, NameNode JMX and so on. None of the links in the QuickLinks drop-down can connect and pull up data.
Any thoughts on how to resolve this error?

Related

Installing Hive 2.1.0 Interactive Query (LLAP) on Kerberized HDP 2.6.2 environment

I had a lot of issues surrounding the installation/activation of Hive 2.1.0 on our HDP 2.6.2 cluster. But finally I got it working, so I wanted to share the steps involved with the community. I got these steps from different sources, which I will also mention below each step. My specifications:
Clustered HDP 2.6.2 (hortonworks) environment
Kerberos
Hive 1.2.1000 -> Hive 2.1.0
Step 1: Enable Hive Interactive Query
Follow the steps on the Hortonworks website. This includes enabling YARN pre-emption and some other YARN settings. After adjusting YARN you can enable Hive Interactive Query via Ambari. You also have to specify a default queue that has at least 20% of your total cluster capacity.
Source
Step 2: Kerberos related settings
Make sure you add the following settings to the custom hiveserver2-interactive site in Ambari, where ${REALMNAME} is the name of your Kerberos realm.
hive.llap.zk.sm.keytab.file=/etc/security/keytabs/hive.llap.zk.sm.keytab
hive.llap.zk.sm.principal=hive/_HOST@${REALMNAME}
hive.llap.daemon.keytab.file=/etc/security/keytabs/hive.service.keytab
hive.llap.daemon.service.principal=hive/_HOST@${REALMNAME}
Now you have to put those two keytabs (they are essentially the same keytab) on every YARN node. This can be done manually or through Ambari (Kerberos service). Make sure the keytabs are owned by hive:hadoop (chown hive:hadoop) and have mode 440 (chmod 440, group read).
Note: you also need a hive user on all of those nodes.
Source
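The permission requirement from this step can be checked with a few lines of Python. This is a sketch, not part of the original procedure: `keytab_mode_ok` is a hypothetical helper, it only checks the permission bits (not the hive:hadoop ownership), and the demo runs against a throwaway temp file rather than a real keytab.

```python
import os
import stat
import tempfile


def keytab_mode_ok(path, expected_mode=0o440):
    """Return True if the permission bits of `path` equal `expected_mode`."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected_mode


# Demo on a throwaway file; on a real node you would pass a path such as
# /etc/security/keytabs/hive.service.keytab instead.
fd, demo_keytab = tempfile.mkstemp(suffix=".keytab")
os.close(fd)

os.chmod(demo_keytab, 0o440)
print(keytab_mode_ok(demo_keytab))  # mode matches 440

os.chmod(demo_keytab, 0o644)
print(keytab_mode_ok(demo_keytab))  # mode is too permissive
```

Running this on every YARN node is one way to catch a keytab that was copied manually with the wrong mode.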
Step 3: Zookeeper configuration
It could be that Hive is not recognized by ZooKeeper, which causes ACL errors when starting HiveServer2 Interactive. To work around this, I added the required hive ACL nodes from a ZooKeeper client host.
su -
# First, authenticate with the hive keytab (the principal uses this host's FQDN)
kinit -kt /etc/security/keytabs/hive.service.keytab hive/$(hostname -f)
# Second, connect to a zookeeper client on your cluster
/usr/hdp/current/zookeeper-server/bin/zkCli.sh -server ${ZOOKEEPER_CLIENT}
# Third, check the current status of the user-hive acl
getAcl /llap-sasl/user-hive
# Fourth, if it does not exist, create the following nodes
create /llap-sasl/user-hive "" sasl:hive:cdrwa,world:anyone:r
create /llap-sasl/user-hive/llap0 "" sasl:hive:cdrwa,world:anyone:r
create /llap-sasl/user-hive/llap0/workers "" sasl:hive:cdrwa,world:anyone:r
# Fifth, change the llap-sasl node to add the user hive
setAcl /llap-sasl sasl:hive:cdrwa,world:anyone:r
Source 1, Source 2
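The ACL strings used in the zkCli commands have the form scheme:id:permissions, where the permission letters stand for create, delete, read, write and admin. As an illustration only, the hypothetical helper below splits such a string into its entries, which makes it easier to see who is granted what.

```python
def parse_acl(acl):
    """Split 'scheme:id:perms,scheme:id:perms' into (scheme, id, perms) tuples."""
    entries = []
    for part in acl.split(","):
        # The id itself may contain ':' for some schemes, so split from both ends.
        scheme, rest = part.split(":", 1)
        ident, perms = rest.rsplit(":", 1)
        entries.append((scheme, ident, perms))
    return entries


for scheme, ident, perms in parse_acl("sasl:hive:cdrwa,world:anyone:r"):
    print(scheme, ident, perms)
```

For the ACL above, the SASL-authenticated hive user gets all five permissions, while everyone else can only read.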
Basically, this should work for Kerberized environments. If you get errors related to ACLs, go back to your ZooKeeper settings and check that everything is fine. If you get errors related to a missing hive user, check that the hive user has been added correctly to the nodes. If you get an error related to Kerberos (principal or keytabs), check that the keytabs are on the designated (YARN) nodes with the correct permissions.

Getting "User [dr.who] is not authorized to view the logs for application <AppID>" while running a YARN application

I'm running a custom YARN application using Apache Twill on an HDP 2.5 cluster, but I cannot see my own container logs (syslog, stderr and stdout) when I open the container's web page.
The logged-in user also changes from my Kerberos principal to "dr.who" when I navigate to this page.
I can, however, see the logs of MapReduce jobs. The Hadoop version is 2.7.3 and YARN ACLs are enabled on the cluster.
I had this issue with the Hadoop UI. I found in this doc that hadoop.http.staticuser.user is set to dr.who by default, and that you need to set it explicitly in the relevant configuration file (core-site.xml in my case).
A late answer, but I hope it is useful.
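To see which static web user a given core-site.xml actually configures, the file can be parsed directly. The sketch below is an illustration under assumptions: `static_user` is a hypothetical helper, and the value `yarn` in the sample XML is only a placeholder, not a recommendation.

```python
import xml.etree.ElementTree as ET

DEFAULT_STATIC_USER = "dr.who"  # Hadoop's default for hadoop.http.staticuser.user


def static_user(core_site_xml):
    """Return the configured static web user, or the default if it is not set."""
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name == "hadoop.http.staticuser.user":
            return (prop.findtext("value") or DEFAULT_STATIC_USER).strip()
    return DEFAULT_STATIC_USER


sample = """<configuration>
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>yarn</value>
  </property>
</configuration>"""

print(static_user(sample))               # property set explicitly
print(static_user("<configuration/>"))   # property absent, default applies
```

On a real cluster you would pass the contents of the core-site.xml that the web UI's daemon reads.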

How to connect Apache Spark with Yarn from the SparkContext?

I have developed a Spark application in Java using Eclipse.
So far, I have been running it in local mode by setting the master's address to 'local[*]'.
Now I want to deploy this application on a Yarn cluster.
The only official documentation I found is http://spark.apache.org/docs/latest/running-on-yarn.html
Unlike the documentation for deploying on a Mesos cluster or in standalone mode (http://spark.apache.org/docs/latest/running-on-mesos.html), it does not give any URL to use within the SparkContext as the master's address.
Apparently, I have to use the command line to deploy Spark on YARN.
Do you know if there is a way to configure the master's address in the SparkContext, as in standalone and Mesos mode?
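There actually is a master value: yarn (yarn-client or yarn-cluster on Spark 1.x). The cluster location itself is not part of that value but is read from the Hadoop configuration:
Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client side) configuration files for the Hadoop cluster. These configs are used to write to HDFS and connect to the YARN ResourceManager.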
You should have at least hdfs-site.xml, yarn-site.xml, and core-site.xml files that specify all the settings and URLs for the Hadoop cluster you connect to.
Some properties from yarn-site.xml include yarn.nodemanager.hostname and yarn.nodemanager.address.
Since the address has a default of ${yarn.nodemanager.hostname}:0, you may only need to set the hostname.
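Before launching the application, it may be worth checking that the configuration directory really contains the three files mentioned above. The following is a small sketch with hypothetical helper names (`resolve_conf_dir`, `missing_conf_files`); the demo uses a temporary directory in place of a real Hadoop configuration directory.

```python
import os
import tempfile

REQUIRED = ("core-site.xml", "hdfs-site.xml", "yarn-site.xml")


def resolve_conf_dir(env):
    """Return HADOOP_CONF_DIR if set, otherwise YARN_CONF_DIR, otherwise None."""
    return env.get("HADOOP_CONF_DIR") or env.get("YARN_CONF_DIR")


def missing_conf_files(conf_dir):
    """List the required client-side config files that are absent from conf_dir."""
    return [name for name in REQUIRED
            if not os.path.isfile(os.path.join(conf_dir, name))]


# Demo: a temp directory that only contains core-site.xml.
demo_conf = tempfile.mkdtemp()
open(os.path.join(demo_conf, "core-site.xml"), "w").close()

conf_dir = resolve_conf_dir({"HADOOP_CONF_DIR": demo_conf})
print(missing_conf_files(conf_dir))
```

In a real run you would pass `os.environ` to `resolve_conf_dir` and expect an empty list before starting the application.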

YARN client authentication fails with SIMPLE authentication is not enabled. Available:[TOKEN]

I've set up a simple local PHD 3.0 Hadoop cluster and followed the steps described in the Spring Yarn Basic Getting Started guide.
Running the app against my Hadoop cluster gives
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN]
and the following stack trace in the YARN ResourceManager:
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN]
at org.apache.hadoop.ipc.Server$Connection.initializeAuthContext(Server.java:1554)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1510)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:762)
at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:636)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:607)
This is probably a very basic question.
I'd like simply to run a YARN app test without setting up any authentication.
As I understand, YARN does not allow SIMPLE client authentication:
https://issues.apache.org/jira/browse/YARN-2156
According to this question
How can I pass a Kerberos ticket to Spring Yarn application
I might end up setting up a Kerberos authentication.
Is there a way to run Spring YARN example without elaborate authentication setup?
My mistake was simple.
I had to add
spring:
  hadoop:
    resourceManagerAddress: myyarnhost:8050
    resourceManagerSchedulerAddress: myyarnhost:8030
to application.yml as well, but I had mixed up the port numbers (8030 for the ResourceManager and 8050 for the ResourceManager scheduler).
That typo caused the error above.
Adding these two configuration properties to the getting started guide might save future readers some time.
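A swapped pair of ports like this is easy to detect by comparing the application settings with the cluster's yarn-site.xml values. The sketch below is illustrative only: `check_rm_addresses` is a hypothetical helper, both configurations are passed in as plain dictionaries rather than parsed from files, and the host name is the placeholder used above.

```python
def check_rm_addresses(app_cfg, yarn_site):
    """Return a list of mismatches between app settings and yarn-site.xml values."""
    pairs = {
        "resourceManagerAddress": "yarn.resourcemanager.address",
        "resourceManagerSchedulerAddress": "yarn.resourcemanager.scheduler.address",
    }
    problems = []
    for app_key, yarn_key in pairs.items():
        if app_cfg.get(app_key) != yarn_site.get(yarn_key):
            problems.append(
                f"{app_key}={app_cfg.get(app_key)!r} but {yarn_key}={yarn_site.get(yarn_key)!r}"
            )
    return problems


yarn_site = {
    "yarn.resourcemanager.address": "myyarnhost:8050",
    "yarn.resourcemanager.scheduler.address": "myyarnhost:8030",
}
swapped = {
    "resourceManagerAddress": "myyarnhost:8030",
    "resourceManagerSchedulerAddress": "myyarnhost:8050",
}
for problem in check_rm_addresses(swapped, yarn_site):
    print(problem)
```

With the ports swapped, both settings are reported; with the correct values the function returns an empty list.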
Also, to run the example against a freshly installed PHD3.0 I had to modify the HDFS client user name by exporting the default HADOOP_USER_NAME:
export HADOOP_USER_NAME=hdfs
I just tried that with a 5-node phd30 cluster and everything was OK:
In build.gradle I used the phd30 packages instead of the vanilla ones (which depend on hadoop 2.6.0). The versions should not matter in this case, afaik.
compile("org.springframework.data:spring-yarn-boot:2.2.0.RELEASE-phd30")
testCompile("org.springframework.data:spring-yarn-boot-test:2.2.0.RELEASE-phd30")
In src/main/resources/application.yml I changed hdfs, rm and scheduler addresses to match cluster settings:
spring:
  hadoop:
    fsUri: hdfs://ambari-2.localdomain:8020
    resourceManagerAddress: ambari-3.localdomain:8050
    resourceManagerSchedulerAddress: ambari-3.localdomain:8030
Then I just ran it externally from my own computer:
$ java -jar target/gs-yarn-basic-single-0.1.0.jar
One appmaster and one container are executed, and the app should succeed.
If it still doesn't work, then something else is wrong. I didn't deploy HAWQ, if that makes a difference.

How to run hadoop balancer from client node?

How can I run the Hadoop balancer? I tried running the hadoop balancer command on the namenode, but it had no effect at all (my new datanode is still empty). I also read that the balancer should be run not on the namenode but on a client node. So what is a client node, how do I configure it, and how can it access the Hadoop file system?
Thanks all, I need your suggestions.
A client node is also known as an edge node. Usually not all developers in an organization have access to every node of the cluster, so a client node is provided for them to access the cluster. You need to install the hadoop-client packages on the client node. If you are using a Cloudera RPM-based installation, you can use the command below.
sudo yum install hadoop-client
After installing the client node, update its configuration files (core-site.xml, hdfs-site.xml and any others required). Hadoop CLI commands executed there will then run against the cluster.
The balancer can be run from any node in the cluster, whether that is a client machine or any other node.
sudo -u hdfs hdfs balancer
Regarding the newly added datanode, check in the NameNode web UI whether the node has been added. If it is listed there, check the logs.
