I finished configuring the Hive and HDFS views inside Ambari 2.4.2. I want to execute queries as the logged-in user, not as the hive user.
I set hive.server2.enable.doAs to "true", but that does not seem to be enough: in Ranger, my queries always show as executed by the hive user.
I read in the forums that I should configure impersonation so that the auth parameters look like the following:
hive.server2.proxy.user=${username}
I can't find where to change this parameter. Note that my Hadoop cluster is not Kerberized; I have not set up Kerberos yet.
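For context, on a non-Kerberized cluster, impersonation is normally driven by hive.server2.enable.doAs in hive-site.xml together with the Hadoop proxy-user settings in core-site.xml. A minimal sketch of the relevant properties (the wildcard values are illustrative and should be narrowed in production):

    # hive-site.xml: run queries as the connecting end user
    hive.server2.enable.doAs=true
    # core-site.xml: allow the hive service user to impersonate other users
    hadoop.proxyuser.hive.hosts=*
    hadoop.proxyuser.hive.groups=*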
Thank you
I have a Kerberized CDH cluster where some daily Oozie workflows are running. All of them use shell, impala-shell, hive, and sqoop to ingest data into Hive tables (let's call these tables SensitiveTables).
Now, I want to create 2 new BI users to use the cluster and experiment with some other ingested data.
The requirement is that these new BI users:
should not have access to the SensitiveTables
should be able to spark-submit jobs to the cluster
(optionally) use Hue
Apart from setting up Apache Sentry (which is the recommended way to go), is there any way to meet these requirements using file permissions or ACLs and Service Level Authorization?
So far, I managed (via hadoop fs -chmod o-rwx /user/hive/warehouse/sensitive) to restrict access to the SensitiveTables via Hive (which uses user impersonation), but failed to do so via Impala (which submits all jobs to the cluster as the user impala). Is there anything else I should try?
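For reference, the same file-permission approach can be expressed per user with HDFS ACLs (assuming dfs.namenode.acls.enabled=true; the path and BI user name below are examples). Note this still won't help with Impala, for the reason above:

    # deny one named user all access to the sensitive warehouse directory
    hadoop fs -setfacl -m user:bi_user1:--- /user/hive/warehouse/sensitive
    # verify the resulting ACL
    hadoop fs -getfacl /user/hive/warehouse/sensitive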
Thank you,
Gee
After a lot of research, and based on the assumptions I described, the answer is NO. Furthermore, the metastore cannot be protected this way.
I've just created a google cloud dataproc cluster. A few basic things are not working for me:
I'm trying to run the Hive console from the master node, but it fails to load with any user other than root (it looks like there's a lock; the console just hangs).
But even when using root, I see some odd behaviour:
"show tables;" shows a table named "input"
querying the table raises an exception saying the table is not found.
It is not clear which user creates the tables through the web UI. I create a job and execute it, but then I don't see the results through the console.
Couldn't find any good documentation on that - does anybody have an idea on this?
Running the hive command at present is somewhat broken due to the default metastore configuration.
I recommend you use the beeline client instead, which talks to the same Hive Server 2 as Dataproc Hive Jobs. You can use it via ssh by running beeline -u jdbc:hive2://localhost:10000 on the master.
YARN applications are submitted by Hive Server 2 as the user "nobody". You can specify a different user by passing the -n flag to beeline, but it shouldn't matter with the default permissions.
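For example, to connect as a specific user and run a query in one shot (the user name here is a placeholder):

    beeline -u jdbc:hive2://localhost:10000 -n myuser -e "SHOW TABLES;"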
This thread is a bit old, but this result still comes up when searching Google for Google Cloud Platform and Hive, so I'm adding some info that may be useful.
Currently, in order to submit a job to Google Dataproc there are, as with most other products, three options:
from the UI
from the console, using a command line like the following (see the concrete example after this list):
gcloud dataproc jobs submit hive --cluster=CLUSTER (--execute=QUERY, -e QUERY | --file=FILE, -f FILE) [--async] [--bucket=BUCKET] [--continue-on-failure] [--jars=[JAR,…]] [--labels=[KEY=VALUE,…]] [--params=[PARAM=VALUE,…]] [--properties=[PROPERTY=VALUE,…]] [GLOBAL-FLAG …]
REST API call like: https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.jobs/submit
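A concrete example of the command-line form above, running a single query (the cluster name is a placeholder):

    gcloud dataproc jobs submit hive --cluster=my-cluster -e "SHOW TABLES;"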
Hope this will be useful to someone.
I have set up a single-node Hadoop cluster on Ubuntu, with Hadoop 2.6 installed on my machine.
Problem:
Every time I create Hive tables and load data into them, I can see the data by querying them, but once I shut down Hadoop the tables get wiped out. Is there any way I can retain them, or is there a setting I am missing?
I tried some solutions posted online, but nothing worked; kindly help me out with this.
Thanks
B
The Hive table data lives in Hadoop HDFS; Hive just adds metadata on top and gives users SQL-like commands so they don't have to write basic MapReduce jobs. So if you shut down the Hadoop cluster, Hive can't find the data in the tables.
But if you are saying the data is lost even after you restart the Hadoop cluster, that's another problem.
It seems you are using the default Derby database as the metastore. Configure a standalone Hive metastore instead; I am pointing you to a link below, please follow it:
Hive is not showing tables
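For reference, a minimal sketch of the hive-site.xml properties for a MySQL-backed metastore (the host, database name, and credentials are placeholders):

    javax.jdo.option.ConnectionURL=jdbc:mysql://metastore-host:3306/metastore?createDatabaseIfNotExist=true
    javax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver
    javax.jdo.option.ConnectionUserName=hiveuser
    javax.jdo.option.ConnectionPassword=hivepassword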
I've got a Cloudera Hadoop installation (CDH4) which runs the Yarn framework, and I've got Hue installed as well.
I've noticed that when I submit a Hive query via the Hue (Beeswax) interface, the resulting mapreduce job shows up in the resourcemanager web UI, as well as the Hue 'Job Browser' interface. However, if I run the hive cli application on any of the nodes and run the same query from there, it doesn't appear to hit any of the nodemanagers, although it does return the correct results.
The only difference I can think of is that the Hue job runs as the user I'm logged into Hue as, whereas the hive cli job runs as the user that started the hive cli, which is a different user.
I would expect queries submitted via the hive CLI to show up in the resource manager. Is there any reason why they are not?
Are you logged in as the same user in Hue? Hue's Job Browser filter shows you only your own jobs by default. You can clear the username filter and check whether there are other jobs.
I have a Hadoop cluster that I manage via the Hue interface to run Hive queries.
I would like to add another user to Hue and give them access to SOME of the tables to run queries on. Is this possible?
Hue is just a view on top of Hive, so using Hive Authorization should do it (beware: Hive Authorization is currently being reworked in order to be truly secure).
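For example, with Hive Authorization enabled, table-level access can be granted per user (the table and user names are placeholders):

    GRANT SELECT ON TABLE allowed_table TO USER new_user;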
You might also want to add that user to the hadoop group you created to run Hadoop, and to create a separate HDFS home directory for each user.
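A sketch of those two steps (the group, user, and paths are examples):

    # add the new user to the hadoop group
    sudo usermod -a -G hadoop new_user
    # give the user their own HDFS home directory
    hadoop fs -mkdir /user/new_user
    hadoop fs -chown new_user:hadoop /user/new_user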