How to use the ResourceManager web interface as an user - hadoop

Every time i try to use the Hadoop Resource Manager web interface (http://resource-manger.host:8088/cluster/) i show up logged in as dr.who.
My question, how can I login as another user? In this case i want to login as myself and have a higher lever of privileges than dr.who.

The user infomation is got from HttpServletRequest#getRemoteUser().
1. If you deployed an insecure cluster, the simplest way to pass the username to server is by url parameter. For example, http://localhost:8088/cluster?user.name=babu
2. If you deployed a secure cluster, you probably use Kerberos authentication. You can use kinit to get a kerberos tgt, then configure the browser to negotiate. (network.negotiate-auth.trusted-uris for firefox, and --auth-server-whitelist for chromium. I'm sure there's lots of answers about this)
For more information, you can check hadoop official documentation.(https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/HttpAuthentication.html)

You should set the access control list by changing the default configuration of:
yarn.resourcemanager.zk-acl
from
world:anyone:rwcda
to something else,which is Cluster-specific
The ACLs the ResourceManager uses for the znode structure to store the internal state.

Related

What determines what user / groups Ranger can see when setting policies?

Have users on local machines that have HDFS /user dirs that do not show up as possible users when setting Ranger policies
I can see that Ranger already have a place where you can see and add users in the settings menu of the ranger UI, but not sure where this is getting populated from.
So my question then is what determines if Ranger can see cluster users for setting policies (and is there an easy way to manage this via ambari)?
The problem was that I had thought, looking at a answer on the Hortonworks community forums, that for a user to be recognized as "existing" on the HDP cluster, all that was required was for the user to 1) exist on a cluster node and 2) have a folder in hdfs:///user/<the username>. This apparantly is not correct (at least in the case of being recognized by Ranger as a valid user that can have policies set on them).
In order for a user to be recognized by Ranger (here, I do not have a cluster integrated with Kerberos or Active Directory), that user needs to exist on the usersync server machine which supports...
the ability [for Ranger] to get users and groups from the corporate AD to use in policy definitions.

How to switch or change user in Hue

Is there an option to switch user in HUE?
In my organization, infrastructure team setup usecase id, which has all HDFS file system access and only usecase id can submit yarn jobs. Individual user can sudo
to usecase id sudo su - xyz. There are no password for usecase id.
I am able to login to HUE but can't submit any jobs as I don't have access to any queue so I want to switch to usecase id, after login to HUE. How to switch user ( sudo su - xyz) in hue?
Hue, by default, can only run under the first account that's logged in with.
You need to ask the infrastructure team to configure Hue with a PAM or LDAP login authentication, in which case the password will be required for any Hue login user
Once that's setup, you are also able to switch accounts.
There are other configurations, but for enterprise users, I think those are the best options other than some single sign on OAuth/OpenID tool.
There's also SPNEGO, and that'll require a completely kerberized cluster.
Realistically, your company sounds like their cluster is not using Kerberos, so it isn't even secure.
For example, don't even need to sudo... Just export a variable
export HADOOP_USER_NAME=usecase
Of course, this isn't possible in Hue, but if you already have SSH access, you really can do anything in the cluster you want to
Configure yarn core-site.xml to impersonate hue users to access yarn cluster for submitting jobs.
key: hadoop.proxyuser.default_user.hosts
value: *
key: hadoop.proxyuser.default_user.groups
value: *
Replace default_user with hue or any system user manages your application.
Hue supports different backend module for authentication and authorization.
django.contrib.auth.backends.ModelBackend
desktop.auth.backend.AllowAllBackend
desktop.auth.backend.AllowFirstUserDjangoBackend
desktop.auth.backend.LdapBackend
desktop.auth.backend.PamBackend
desktop.auth.backend.SpnegoDjangoBackend
desktop.auth.backend.RemoteUserDjangoBackend
libsaml.backend.SAML2Backend
libopenid.backend.OpenIDBackend
liboauth.backend.OAuthBackend (New oauth, support Twitter, Facebook, Google+ and Linkedin)
Each module has its own advantage and disadvantages. For a very large scale deployment where multiple user need to access HUE application, LDAP, SAML2, Oauth authentication is preferred over PAM and Django based login.

How to understand the process of Kerberos (over Hadoop)?

I have deployed Kerberos in hadoop cluster. According to the theory, the KDC will verify you are the one as you clared, according to the private key.
However, using that system confused me. For example, if you need access to the HDFS, what you need to do is just to input "kinit hdfs#MY.REALM" and the password from a client. Then you will get ticket and manipulate the HDFS as the superuser "hdfs".
Does this the real process of kerberos? If the user are only verified by password, why don't we directly build a list inside the server and require the user to input its username/password? Where is the private key mentioned in the theory? Can anyone explain this to me please?

How to run "hadoop jar" as another user?

hadoop jar uses the name of the currently logged-in user. Is there a way to change this without adding a new system user?
There is, through a feature called Secure Impersonation, which lets one user submit on behalf of another (that user must exist though). If you're running as the hadoop superuser, it's as simple as setting the env variable $HADOOP_PROXY_USER.
If you want to impersonate a user which doesn't exist, you'll have to do the above and then implement your own AuthenticationHandler.
If you don't have to impersonate too many users, I find it easiest to just create those users on the namenode and use secure impersonation in my scripts.

How to get Kerberos instead of delegation token in Hadoop mapReduce?

I'm a Java user, when submitting a job to Hadoop mapReduce, it uses Kerberos to authenticate for Hadoop, and upon success there's the delegation token created and passed with the job submission to Hadoop instead of the kerberos ticket (for security reason as stated by Hadoop). Now the job is running as me, but the job itself needs to use Kerberos to send request to other services outside Hadoop. Now I don't have kerberos TGT on Hadoop and I can't get the service ticket.
Is there anyway I can pass the Kerberos ticket with the job? (I know it might be dangerous since we don't want to pass the secret around), JobConf could pass the string to string pairs to Hadoop, but I have to convert the TGT to a json string and revert it during job running?
Or is it possible to use the delegation token reform TGT?
I tried to google it but not much information, anyone could help? Thank you.
**Editted:**
Looks like there's no easy way of doing this without passing the TGT to Hadoop, so I am going to try the following method by passing the TGT as string via job config map to Hadoop (String only), and convert the string back to TGT object when the job runs in Hadoop. The concern is I am going to pass the credentials over the network, which is not a best practice and one of the very reasons Hadoop didn't pass Kerberos around for security. If I could re-use the reformed TGT passed to Hadoop to get the service tickets, I will try to encrypt the TGT string as much as possible to avoid security issues.
So before starting a job in the local machine, the code would be like:
import sun.security.krb5.Credentials;
Credentials tgt = Credentials.acquireTGTFromCache(null, null); // Make sure kinit is done before this
String tgtStr = tgt.convertToJsonString(); //Need to implement this
Job job = new Job("Test");
JobConf jobConf = job.getJobConf();
jobConf.set("tgtStr", tgtStr);
job.addTask(Test.class, "run", null);
job.submit();
job.waitForCompletion(true);
Then the function in the job for Hadoop to run would be like:
Configuration conf = TaskContext.get().getConfiguration();
String tgtStr = conf.get("tgtStr");
Credentials tgt = reformTGTFromString(tgtStr);//Need to implement this
Credentials serviceTicket = Credentials.acquireServiceCreds(servicePrincipal, tgt); //This is to get any service ticket
So I need to implement two function to stream TGT object (Credentials.class) to string and then reform it back to object.
Anyone knows a better solution for this? Thanks.
Please see the design at http://carfield.com.hk:8080/document/distributed/hadoop-security-design.pdf , if you have not done so already.
Or is it possible to use the delegation token reform TGT?
No, the delegation tokens are issued by Hadoop name node and while it is based on the Kerberos authentication, it is independent and you can not derive the Kerberos TGT from it.
In the original design, we considered using solely Kerberos(without any additional tokens), which would have made your plan easy but decided against it for these reasons:
Performance:
Thousands of M/R tasks may need to get the Kerberos tickets at the
same time
Kerberos credentials need to be renewed before the expiry
For scheduled jobs, this will be an issue
Delegation tokens don’t depend on Kerberos and can be coupled with non-Kerberos authentication mechanisms- (such as SSL) used at the edge.
In your case, you can use a private distributed cache and send the forwardable TGT. I think this will be OK but need to think about it some more. Obviously you need to make sure your implementation is secure, that your tickets have minimally necessary lifetime, IP channel bindings are used if possible and restrict the use of tickets only to authorized processes.
By disassembling the Credentials fields and convert them to Strings using Base64 encoder, form a JSON string and pass it to Hadoop using config map or distributed cache suggested by RVM, and then reform the Credentials object in the job running on Hadoop, I can get back the Kerberos TGT and successfully get any service tickets using it. So this method works, and the only thing here needs to be very cautious is the encryption of the keys that are passed over network.
First of all, your account has to have delegation enabled. The service ticket has to request a forwardable ticket. If that is all true, Hadoop has to retrieve the delegated credential from the GSSContext and construct a new one on behalf of you. With that new TGT it will be able to perform further steps. Use Wireshark to check the ticket for hadoop.

Resources