cloudera navigator multi-tenancy capability - hadoop

In short, can Cloudera Navigator be configured for a multi-tenancy context ?
In detail, we have a datalake (Hadoop cluster) with many business entities, and we want that each business ventity view, manage and access only it own data using the cloudera navigator.
I didn't find any information on the net, and the ui seems to not provide such option.
thanks in advance

You may use Cloudera Manager to create Kerberos principals and keytabs which you may configure to access required directories.
Read: Configuring Authentication in Cloudera Manager

At the current version, Cloudera navigator is not multi-tenancy enabled.
So, at short time one of the solution is custom dev using the Cloudera navigator API coupled with other technologies.

Related

Cloudera node /etc/krb5.conf replaced at every reboot

I have a question, why are my cloudera nodes replacing the file /etc/krb5.conf ata every reboot ?? Im trying to make modifications, and when someone issues a reboot the file is again replaced by the old config file
Both CDH and HDP distros have an option to let their Hadoop cluster manager (Cloudera Manager vs. Ambari) also manage the Kerberos client config on all nodes.
Or rather, they have an option not to let it manage it for you...
From CDH 6.3 documentation
Choose whether Cloudera Manager should deploy and manage the krb5.conf on your cluster or not ...this page will let you configure the properties that will be emitted in it. In particular, the safety valves on this page can be used to configure cross-realm authentication.
From HDP 3.1 documentation
(Optional) To manage your Kerberos client krb5.conf manually (and not have Ambari manage the krb5.conf), expand the Advanced krb5-conf section and uncheck the "Manage" option.(Optional) To not have Ambari install the Kerberos client libraries on all hosts, expand the Advanced kerberos-env section and uncheck the “Install OS-specific Kerberos client package(s)” option

Adding Hbase service in kerberos enabled CDH cluster

I have a CDH cluster already running with kerberos authentication.
I have a requirement to add HBase service to the running cluster.
Looking for a documentation to enable hbase service since its kerberos enabled. Both command line and GUI options welcome.
Also, its good if there is a testing method like small table creation steps like that.
Thanks in advance!
If you add it through Coudera Manager-Add Service wizards, CDH takes care automatically (create/distribute Kerberos keytabs and add services)

Accessing Hadoop data using REST service

I am trying to update HDP architecture so data residing in Hive tables can be accessed by REST APIs. What are the best approaches how to expose data from HDP to other services?
This is my initial idea:
I am storing data in Hive tables and I want to expose some of the information through REST API therefore I thought that using HCatalog/WebHCat would be the best solution. However, I found out that it allows only to query metadata.
What are the options that I have here?
Thank you
You can very well use WebHDFS which is basically a REST Service over Hadoop.
Please see documentation below:
https://hadoop.apache.org/docs/r1.0.4/webhdfs.html
The REST API gateway for the Apache Hadoop Ecosystem is called KNOX
I would check it before explore any other options. In other words, Do you have any reason to avoid using KNOX?
What version of HDP are you running?
The Knox component has been available for quite a while and manageable via Ambari.
Can you get an instance of HiveServer2 running in HTTP mode?
This would give you SQL access through J/ODBC drivers without requiring Hadoop config and binaries (other than those required for the drivers) on the client machines.

hbase as database in web application

A big question about using hadoop or related technologies in a real web application.
I just want to find out how a web app can use hbase as its database. I mean is it the thing the big data apps do or they use normal databases and just use these sort of technologies for analysis?
Is it ok to have a online store with Hbase database or something like this?
Yes it is perfectly fine to have hbase as your backend.
What I am doing to get this done,( I have a online community and forum running on my website )
1.Writing C# code to access the Hbase using thrift, very easy and simple to get this done. (Thrift is a cross language binding platform, to HBase Java is only the first class citizen!)
2.Managing the HBase cluster(have it on Amazon) using the Amazon EMI
3.Using ganglia to monitor Hbase
Some Extra tips:
So you can organize the web application like this
You can set up your webservers on Amazon Web Services or IBMWebSphere
You can set up your own HBase cluster using cloudera or use AmazonEC2 again here.
Communication between web server and Hbase master node happens via thrift client.
You can generate thrift code in your own desired programming language
Here are some links that helped me
A)Thrift Client,
B)Filtering options
Along with this I refer to HBase administrative cookbook by Yifeng Jiang and HBase reference guide by Lars George in case I dont get answers on web.
Filtering options provided by HBase are fast and accurate. Let's say if you use HBase for storing your product details, you can have sub-stores and have a column in your Product table, which tells to which store a product may belong and use Filters to get products for a specific store.
I think you should read the article below:
"Apache HBase Do’s and Don’ts"
http://blog.cloudera.com/blog/2011/04/hbase-dos-and-donts/

Can Hortonworks Ambari manage multiple clusters

I have been looking all over the web to see if Ambari can manage multiple clusters like Cloudera does. Is this possible in Ambari? If so, how? I have looked all over the Ambari web UI and only see options to add a new host or service, but nothing about adding a cluster.
It's in roadmap. For now it's possible to do so in API level, from version 2.0 it would be possible to manage multiple clusters from web UI.

Resources