Hiveserver2 Kerberos - hadoop

If I want to configure my HiveServer2 (we use a self-built Hadoop environment) to use Kerberos authentication, does it require a secure Hadoop environment too?
What I mean:
after installing Kerberos I want to secure my HiveServer2, but do I first have to secure my HDFS, the Hadoop core config, MapReduce, etc., or not?
I hope this makes sense, thank you for the help!

In order to kerberize HiveServer2, it is a must to kerberize all the other Hadoop services (for example HDFS, YARN, etc.) as well.
I don't know what you mean by a self-built Hadoop environment. You are better off using Cloudera Manager (CDH) or Ambari (Hortonworks) to enable Kerberos for your Hadoop services.
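Once the underlying cluster is Kerberized, HiveServer2 itself is typically switched to Kerberos in hive-site.xml and then reached with a ticket. A minimal sketch of that last step, assuming a hypothetical host hs2.example.com, realm EXAMPLE.COM and the usual hive service principal (adjust to your own values):

# hive-site.xml (relevant properties, values are placeholders):
#   hive.server2.authentication = KERBEROS
#   hive.server2.authentication.kerberos.principal = hive/_HOST@EXAMPLE.COM
#   hive.server2.authentication.kerberos.keytab = /etc/security/keytabs/hive.service.keytab

# On a client, obtain a ticket and connect with beeline, passing the
# HiveServer2 principal in the JDBC URL:
kinit alice@EXAMPLE.COM
beeline -u "jdbc:hive2://hs2.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM"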

Related

can I connect to hdfs without hadoop installation?

I've read the docs and tutorials, and I can see that every node that is a namenode or datanode needs a Hadoop installation.
But what about the client that actually requests a file read/write operation on HDFS?
Does the client require a Hadoop installation too? Or can it do HDFS I/O just by communicating with the namenode URL somehow?
For example, in Python I've seen sample code that imports pyarrow and reads data from HDFS given the namenode URL as a parameter. In such cases, is a Hadoop installation required?
You need to install the Hadoop client libraries to be able to make RPC requests to Hadoop services such as HDFS or YARN.
PyArrow, Spark, Flink, etc. are clients, and they do not require a local Hadoop installation to run or write code.
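For simple reads the client does not even need Hadoop libraries: the WebHDFS REST API can be used with nothing but curl. A small sketch, assuming a hypothetical namenode at namenode.example.com with WebHDFS enabled on the default Hadoop 3 HTTP port (9870; Hadoop 2 uses 50070):

# List a directory through the namenode's WebHDFS endpoint
curl "http://namenode.example.com:9870/webhdfs/v1/user/alice?op=LISTSTATUS"

# Read a file; -L follows the namenode's redirect to the datanode serving the data
curl -L "http://namenode.example.com:9870/webhdfs/v1/user/alice/data.csv?op=OPEN" -o data.csv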

How to query kerberos enabled hbase using apache drill?

We have a Kerberized Hadoop cluster where HBase is running. Apache Drill is running in distributed mode, in another cluster. Now we need to query the Kerberos-enabled HBase from Apache Drill, using the web UI. The Hadoop cluster is actually running in AWS, and HBase uses S3 as storage. Please help me with the steps to achieve successful queries to HBase.
Apache Drill version: 1.16.0, Hadoop version: 2
Usually, to query HBase, we run kinit with the keytab manually and then get into the HBase shell on the Hadoop cluster. We wanted to make use of Drill to query in a SQL fashion, easily and with better readability.
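For reference, the manual flow described above looks roughly like this (keytab path, principal and table name are placeholders):

# Obtain a Kerberos ticket from the keytab
kinit -kt /etc/security/keytabs/hbase.headless.keytab hbase@EXAMPLE.COM

# Run a quick scan non-interactively through the HBase shell
echo "scan 'my_table', {LIMIT => 10}" | hbase shell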

Why would I Kerberise my hadoop (HDP) cluster if it already uses AD/LDAP?

I have a HDP cluster.
This cluster is configured to use Active Directory as the authentication and authorization authority. To be more specific, we use Ranger to limit access to HDFS directories, Hive tables and YARN queues after said user has provided a correct username/password combination.
I have been tasked to Kerberise the Cluster, which is very easy thanks to the "press buttons and skip" like option in Ambari.
We Kerberised a test cluster. While interacting with Hive does not require any modification to our existing scripts on the cluster's machines, it is very, very difficult to find a way for end users to interact with Hive from OUTSIDE the cluster (PowerBI, DbVisualizer, PHP applications).
Kerberising seems to bring an unnecessary amount of work.
What concrete benefits would I get from Kerberising the cluster (except making the guys above in the hierarchy happy because, hey, we Kerberised, yoohoo)?
Edit:
One benefit:
Kerberising the cluster adds security because it runs on Linux machines, an OS the company's Active Directory cannot handle on its own.
Ranger with AD/LDAP authentication and authorization is ok for external users, but AFAIK, it will not secure machine-to-machine or command-line interactions.
I'm not sure if it still applies, but on a Cloudera cluster without Kerberos, you could fake a login by setting the environment variable HADOOP_USER_NAME on the command line:
sh-4.1$ whoami
ali
sh-4.1$ hadoop fs -ls /tmp/hive/zeppelin
ls: Permission denied: user=ali, access=READ_EXECUTE, inode="/tmp/hive/zeppelin":zeppelin:hdfs:drwx------
sh-4.1$ export HADOOP_USER_NAME=hdfs
sh-4.1$ hadoop fs -ls /tmp/hive/zeppelin
Found 4 items
drwx------ - zeppelin hdfs 0 2015-09-26 17:51 /tmp/hive/zeppelin/037f5062-56ba-4efc-b438-6f349cab51e4
For machine-to-machine communications, tools like Storm, Kafka, Solr or Spark are not secured by Ranger, but they are secured by Kerberos, so only dedicated processes can use those services.
Source: https://community.cloudera.com/t5/Support-Questions/Kerberos-AD-LDAP-and-Ranger/td-p/96755
Update: Apparently, Kafka and Solr Integration has been implemented in Ranger since then.
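Once the cluster is Kerberized, that spoof no longer works: the effective user comes from the Kerberos ticket, so HADOOP_USER_NAME is ignored and commands without a valid ticket are rejected at the RPC layer. Roughly (principal and exact error text are placeholders):

export HADOOP_USER_NAME=hdfs
hadoop fs -ls /tmp/hive/zeppelin
# -> fails with a SASL/GSS "No valid credentials provided" style error

kinit alice@EXAMPLE.COM
hadoop fs -ls /tmp/hive/zeppelin
# -> runs as "alice", taken from the ticket, regardless of HADOOP_USER_NAME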

Adding Hbase service in kerberos enabled CDH cluster

I have a CDH cluster already running with kerberos authentication.
I have a requirement to add HBase service to the running cluster.
I am looking for documentation on enabling the HBase service, since the cluster is Kerberos enabled. Both command-line and GUI options are welcome.
Also, it would be good if there is a testing method, like small table creation steps.
Thanks in advance!
If you add it through the Cloudera Manager "Add Service" wizard, CDH takes care of it automatically (it creates/distributes the Kerberos keytabs and adds the service).
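As a quick smoke test once the service is up, something along these lines can be run from a cluster node (keytab path, principal and table name are placeholders):

# Authenticate as a principal that is allowed to create tables
kinit -kt /path/to/user.keytab alice@EXAMPLE.COM

# Create a small table, write one cell, read it back, then clean up
hbase shell <<'EOF'
create 'smoke_test', 'cf'
put 'smoke_test', 'row1', 'cf:greeting', 'hello'
scan 'smoke_test'
disable 'smoke_test'
drop 'smoke_test'
EOF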

How to set kerberos in hadoop?

Does anyone know how to set up Kerberos in a Hadoop cluster? Are there any tutorials available? I am using the Apache Hadoop distribution.
Use CDH4. It's free and comes with a great security guide (the CDH4 Security Guide).
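For a plain Apache Hadoop distribution, the manual route boils down to standing up a KDC, creating per-host service principals and keytabs, and switching Hadoop's configuration to Kerberos. A rough sketch with an MIT KDC (hostnames, realm and paths are placeholders; each daemon's principal and keytab must also be wired into hdfs-site.xml and yarn-site.xml):

# On the KDC: create service principals and export their keytabs
kadmin.local -q "addprinc -randkey nn/namenode.example.com@EXAMPLE.COM"
kadmin.local -q "addprinc -randkey dn/worker1.example.com@EXAMPLE.COM"
kadmin.local -q "ktadd -k /etc/security/keytabs/nn.service.keytab nn/namenode.example.com@EXAMPLE.COM"
kadmin.local -q "ktadd -k /etc/security/keytabs/dn.service.keytab dn/worker1.example.com@EXAMPLE.COM"

# core-site.xml (relevant properties):
#   hadoop.security.authentication = kerberos
#   hadoop.security.authorization  = true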
