How do I add a new user to Hue/Hive with permissions? - hadoop

I have a Hadoop cluster that I manage via the Hue interface to run Hive queries.
I would like to add another user to Hue and give them access to SOME of the tables to run queries on. Is this possible?

Hue is just a view on top of Hive, so using Hive Authorization should do it (beware: Hive Authorization is currently being reworked in order to be really secure).
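A minimal sketch of what per-table grants look like, run through beeline against HiveServer2 (the host, admin user, table, and user names here are assumptions, not values from the question):

    # grant read access on one table to the new user
    beeline -u jdbc:hive2://localhost:10000 -n admin \
      -e "GRANT SELECT ON TABLE sales_db.orders TO USER bob;"
    # verify what the user has been granted
    beeline -u jdbc:hive2://localhost:10000 -n admin \
      -e "SHOW GRANT USER bob ON TABLE sales_db.orders;"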

You might want to add that user to the hadoop group that you must have created to run Hadoop. You might also want to create a separate working directory on HDFS for each user.
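A sketch of that setup, assuming the new user is called bob and the group is hadoop:

    # OS side: add the new user to the hadoop group
    sudo usermod -a -G hadoop bob
    # HDFS side: give the user their own working directory
    hadoop fs -mkdir -p /user/bob
    hadoop fs -chown bob:hadoop /user/bob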

Related

Ambari Hive View: How to make Hive queries run as current user

I have finished configuring the Hive and HDFS views inside Ambari 2.4.2. I want to execute queries as my own user and not as the hive user.
I set hive.server2.enable.doAs to "true", but I think that is not enough, because my queries always execute as the hive user when I check in Ranger.
I read in the forum that I should configure impersonation so that the authParam looks like the following:
hive.server2.proxy.user=${username}
I can't find where to change this parameter. Note that I don't have a Kerberized Hadoop cluster; I haven't set up Kerberos yet.
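For reference, the doAs setting as it ends up in hive-site.xml (this is the property I changed):

    <property>
      <name>hive.server2.enable.doAs</name>
      <value>true</value>
    </property>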
Thank you

Authorizing Hadoop users without Sentry

I have a Kerberized CDH cluster, where some daily oozie workflows are running. All of them use shell, impala-shell, hive and sqoop to ingest data into Hive tables (let's call these tables SensitiveTables).
Now, I want to create 2 new BI users to use the cluster and experiment with some other ingested data.
The requirement is that these new BI users:
should not have access to the SensitiveTables
should be able to spark-submit jobs to the cluster
(optionally) use Hue
Apart from setting up Apache Sentry (which is the recommended way to go), is there any chance of meeting those requirements using file permissions or ACLs and Service Level Authorization?
So far, I managed (via hadoop fs -chmod o-rwx /user/hive/warehouse/sensitive) to restrict access to SensitiveTables via Hive (which uses user impersonation), but failed to do so via Impala (which submits all jobs to the cluster as user impala). Is there anything else I should try?
Thank you,
Gee
After a lot of research, and based on the assumptions I described, the answer is NO. Furthermore, the metastore cannot be protected this way.
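For reference, the tightest file-level restriction I ended up with looks roughly like this (the BI user name is an assumption); it holds for Hive, which impersonates the end user, but not for Impala, which reads HDFS as the impala service user:

    # remove world access from the sensitive warehouse directory
    hadoop fs -chmod -R o-rwx /user/hive/warehouse/sensitive
    # additionally deny one specific user via an HDFS ACL
    # (requires dfs.namenode.acls.enabled=true)
    hdfs dfs -setfacl -m user:bi_user1:--- /user/hive/warehouse/sensitive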

How to run hive on google cloud dataproc from within the machine?

I've just created a google cloud dataproc cluster. A few basic things are not working for me:
I'm trying to run the hive console from the master node, but it fails to load for any user other than root (it looks like there's a lock; the console just gets stuck).
But even when using root, I see some odd behaviour:
"show tables;" shows a table named "input"
querying that table raises an exception saying the table is not found.
It is not clear which user creates the tables through the web UI. I create a job and execute it, but then I don't see the results through the console.
Couldn't find any good documentation on that - does anybody have an idea on this?
Running the hive command at present is somewhat broken due to the default metastore configuration.
I recommend you use the beeline client instead, which talks to the same Hive Server 2 as Dataproc Hive Jobs. You can use it via ssh by running beeline -u jdbc:hive2://localhost:10000 on the master.
YARN applications are submitted by the Hive Server 2 as the user "nobody", you can specify a different user by passing the -n flag to beeline, but it shouldn't matter with default permissions.
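A minimal sketch of that workflow (the cluster name my-cluster is an assumption):

    # SSH to the Dataproc master node (its name ends in -m)
    gcloud compute ssh my-cluster-m
    # on the master, connect to Hive Server 2 as a specific user
    beeline -u jdbc:hive2://localhost:10000 -n myuser -e "SHOW TABLES;"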
This thread is a bit old, but when someone searches for Google Cloud Platform and Hive this result comes up, so I'm adding some info which may be useful.
Currently, in order to submit a job to Google Dataproc, I think there are - as with most other products - three options:
from UI
from the console, using a command line like the synopsis below (see the worked example after this list):
gcloud dataproc jobs submit hive --cluster=CLUSTER (--execute=QUERY, -e QUERY | --file=FILE, -f FILE) [--async] [--bucket=BUCKET] [--continue-on-failure] [--jars=[JAR,…]] [--labels=[KEY=VALUE,…]] [--params=[PARAM=VALUE,…]] [--properties=[PROPERTY=VALUE,…]] [GLOBAL-FLAG …]
via a REST API call, as described at https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.jobs/submit
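For example, a concrete form of the command-line option above (the cluster name and query are assumptions; newer gcloud releases may also require a --region flag):

    gcloud dataproc jobs submit hive --cluster=my-cluster \
      --execute="SHOW DATABASES;"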
Hope this will be useful to someone.

Unable to retain HIVE tables

I have set up a single-node Hadoop cluster on Ubuntu. I have installed Hadoop version 2.6 on my machine.
Problem:
Every time I create Hive tables and load data into them, I can see the data by querying them, but once I shut down Hadoop, the tables get wiped out. Is there any way I can retain them, or is there some setting I am missing?
I tried some solutions provided online, but nothing worked; kindly help me out with this.
Thanks
B
The Hive table data is on HDFS; Hive just adds metadata and provides users with SQL-like commands so that they don't have to write basic MapReduce jobs. So if you shut down the Hadoop cluster, Hive can't find the data in the tables.
But if you are saying that the data is lost after you restart the Hadoop cluster, that's another problem.
It seems you are using the default Derby database as the metastore. Configure the Hive metastore properly; I am pointing you to a link, please follow it:
Hive is not showing tables
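One common cause with the default embedded Derby metastore: the metastore_db directory is created in whatever directory you launch hive from, so starting the shell somewhere else makes the tables seem to vanish. A minimal sketch of pinning it to an absolute path in hive-site.xml (the path is an assumption; a shared database such as MySQL is the usual production choice):

    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:derby:;databaseName=/home/hduser/hive/metastore_db;create=true</value>
    </property>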

Who accessed a Hive table or HDFS directory

Is there a way to figure out which user ran a 'select' query against a Hive table, and what time it was run?
More generically, which user accessed a HDFS directory?
HDFS has an audit log which will tell you which operations were run by which users. This is an old doc that shows how to enable audit logging, but it should still be relevant. For audit logging at the Hive level, though, you'll have to look at some cutting-edge tech.
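A minimal sketch of enabling it in the NameNode's log4j.properties, following the stock Hadoop template (appender names may vary by distribution):

    # the template ships with the audit logger disabled:
    #   hdfs.audit.logger=INFO,NullAppender
    # point it at the rolling-file audit appender instead:
    hdfs.audit.logger=INFO,RFAAUDIT
    log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}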
Hortonworks acquired XASecure to implement security features on top of their platform. Cloudera acquired Gazzang to do the same thing. They have some level of audit logging (and authorization) for other services like Hive and HBase. They're also adding a lot more security-related features, but I'm not sure of the roadmap.
