How to understand the process of Kerberos (over Hadoop)? - hadoop

I have deployed Kerberos in a Hadoop cluster. According to the theory, the KDC verifies that you are who you claim to be, based on your secret key.
However, using the system confused me. For example, if you need access to HDFS, all you have to do from a client is type "kinit hdfs@MY.REALM" and enter the password. Then you get a ticket and can manipulate HDFS as the superuser "hdfs".
Is this the real process of Kerberos? If users are only verified by a password, why don't we simply keep a username/password list on the server and require users to log in with it? Where is the private key mentioned in the theory? Can anyone explain this to me, please?

Related

What is needed to generate kerberos keytab file on windows?

I was looking for an answer to the above question on different web sites, but in every case it only showed how to generate the keytab file. I need a keytab to get an HBase connection that uses Kerberos authentication.
In order to generate a keytab on Windows, you need to be running some version of Kerberos that talks back to a directory server. On Windows, by far the most prevalent example of this is Active Directory, which has Kerberos support built in. You'll need to create the keytab on a Windows server joined to the Active Directory domain, using the ktpass command.
Keytab generation syntax example:
ktpass -out centos1-dev-local.keytab -mapUser krbCentos@DEV.LOCAL +rndPass -mapOp set +DumpSalt -crypto AES256-SHA1 -ptype KRB5_NT_PRINCIPAL -princ HTTP/centos1.dev.local@DEV.LOCAL
The above command example successfully creates a keytab for use in an AD domain named DEV.LOCAL. Note the use of the randomized-password option (+rndPass). In my opinion, there is no need to specify a password in the keytab creation syntax. Instead, it's better to let the password be randomized; that provides much better security, since it prevents anyone from manually logging on as the AD account surreptitiously and bypassing the keytab.
For additional reference, I highly suggest you read my article on Kerberos keytab creation on the Windows platform on Microsoft Technet which greatly expands on what I said here: Kerberos Keytabs – Explained. I frequently go back and edit it based on questions I see here in this forum.
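If it helps, here is a minimal Java sketch of using such a keytab for a Kerberos-authenticated HBase connection on the client side. The keytab path is a placeholder, the principal is just the one from the example above, and in practice the security properties are usually picked up from the cluster's core-site.xml/hbase-site.xml rather than set in code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabHBaseLogin {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Tell the Hadoop/HBase security layer that Kerberos is in use
        // (normally these come from the cluster config files on the classpath).
        conf.set("hadoop.security.authentication", "kerberos");
        conf.set("hbase.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Log in from the keytab instead of running an interactive kinit.
        UserGroupInformation.loginUserFromKeytab(
                "HTTP/centos1.dev.local@DEV.LOCAL",
                "/path/to/centos1-dev-local.keytab");

        // The connection below authenticates with the credentials obtained above.
        try (Connection connection = ConnectionFactory.createConnection(conf)) {
            System.out.println("Connected as "
                    + UserGroupInformation.getLoginUser().getUserName());
        }
    }
}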

Unable to login in Kerberos Enabled Hadoop Cluster

We configured our Hadoop cluster with Kerberos and everything started fine. We are trying to generate a ticket for our hdfs principal using
kinit hdfs@HADOOP.COM
but it asks for a password that we never configured, although we are able to log in using the keytab file with
kinit hdfs@HADOOP.COM -t <keytab file location>
but now we want the ticket that was generated using the keytab file to expire.
I am very new to Kerberos; any pointers in the right direction will be a great help.
To list the Kerberos ticket details, execute the following command in a terminal:
klist
Also make sure JAVA_HOME is set in your bashrc file.
Not sure what you really mean by "now we want the ticket that was generated using the keytab file to expire".
AFAIK you cannot force the expiration of a ticket, but...
- you can delete it completely with kdestroy
- you can re-create it (delete + create) with kinit, either in interactive mode (prompts for the password, which is then used to authenticate to the KDC) or in background mode (uses the provided keytab, which contains pre-derived keys)
- you can renew it (shift the expiration date, as long as you don't bump into the max renewable lifetime)
So my best bet is that you just need to run kdestroy.
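For reference, a typical client-side sequence with these commands (the keytab path is just an illustrative placeholder) would be:
kdestroy
kinit -kt /etc/security/keytabs/hdfs.keytab hdfs@HADOOP.COM
klist
kdestroy drops the current ticket cache, kinit -kt re-creates the ticket from the keytab without prompting for a password, and klist shows the new ticket together with its expiry and renew-until times.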

How to use the ResourceManager web interface as an user

Every time I try to use the Hadoop ResourceManager web interface (http://resource-manger.host:8088/cluster/) I show up logged in as dr.who.
My question: how can I log in as another user? In this case I want to log in as myself and have a higher level of privileges than dr.who.
The user information is obtained from HttpServletRequest#getRemoteUser().
1. If you deployed an insecure cluster, the simplest way to pass the username to the server is via a URL parameter. For example, http://localhost:8088/cluster?user.name=babu
2. If you deployed a secure cluster, you probably use Kerberos authentication. You can use kinit to get a Kerberos TGT, then configure the browser to negotiate (network.negotiate-auth.trusted-uris for Firefox, --auth-server-whitelist for Chromium; there are plenty of answers about this already).
For more information, check the official Hadoop documentation: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/HttpAuthentication.html
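For a quick check from the command line, you can exercise both modes with curl (assuming your curl was built with SPNEGO/GSSAPI support):
curl "http://resource-manger.host:8088/cluster?user.name=babu"
curl --negotiate -u : "http://resource-manger.host:8088/cluster"
The first form only works on an insecure cluster; the second relies on the TGT obtained via kinit.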
You should set the access control list by changing the default configuration of:
yarn.resourcemanager.zk-acl
from
world:anyone:rwcda
to something else, which is cluster-specific.
This property controls the ACLs the ResourceManager uses for the znode structure in which it stores its internal state.
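For example, if the ResourceManager authenticates to ZooKeeper via SASL/Kerberos, the property could be tightened to something like the following; the exact ACL string here is only an assumption and depends on how your ZooKeeper authentication is set up:
yarn.resourcemanager.zk-acl
sasl:rm:rwcda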

Is a Spring DriverManagerDataSource password stored in plaintext?

You can set a org.springframework.jdbc.datasource.DriverManagerDataSource user name and password with:
dataSource.setUsername("johnsmith");
dataSource.setPassword("myplaintextpassword");
My question is - if I were to create an object this way, and then examine the memory of the machine this is running on, could I see the plaintext password?
If so, how can one securely create a database connection using a passed in password?
Sure. Create a complete heapdump of the process or JVM and you will be able to see it.
I don't know what operating system your application runs on, or whether it is standalone or runs in a container like Tomcat, but this is exactly why processes need to be separated.
You have to make sure that the file or JNDI configuration your password is stored in is only accessible to those processes/users that absolutely need access to it. An additional layer of encryption will help too. There will always be someone (like root on Linux) who can read every process's memory, but your job is to keep the group of people able to do this as small as possible.
Perhaps Server Fault is a better place to ask or search for details about this. If you describe your environment (OS, container for your application, ...), you can get help for that specific setup.
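To at least keep the password out of the source code, a minimal sketch is to pass it in from the environment at startup; the variable name and connection details below are made up for the example, and note that the value still ends up as a String on the heap:
import org.springframework.jdbc.datasource.DriverManagerDataSource;

public class DataSourceFactory {
    public static DriverManagerDataSource create() {
        // The password is injected by the environment (e.g. the container or a
        // secrets manager), so it is not hard-coded; it is still plaintext in memory.
        String password = System.getenv("DB_PASSWORD");

        DriverManagerDataSource dataSource = new DriverManagerDataSource();
        dataSource.setDriverClassName("org.postgresql.Driver");     // example driver
        dataSource.setUrl("jdbc:postgresql://localhost:5432/mydb"); // example URL
        dataSource.setUsername("johnsmith");
        dataSource.setPassword(password);
        return dataSource;
    }
}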

How to get Kerberos instead of delegation token in Hadoop mapReduce?

I'm a Java user. When I submit a job to Hadoop MapReduce, Kerberos is used to authenticate to Hadoop, and upon success a delegation token is created and passed along with the job submission instead of the Kerberos ticket (for security reasons, as stated by Hadoop). Now the job runs as me, but the job itself needs to use Kerberos to send requests to other services outside Hadoop. Since I don't have a Kerberos TGT on Hadoop, I can't get the service tickets.
Is there any way I can pass the Kerberos ticket with the job? (I know it might be dangerous, since we don't want to pass the secret around.) JobConf can pass string-to-string pairs to Hadoop, but would I have to convert the TGT to a JSON string and convert it back while the job is running?
Or is it possible to use the delegation token to re-form the TGT?
I tried to Google it but didn't find much information. Can anyone help? Thank you.
**Edited:**
Looks like there's no easy way of doing this without passing the TGT to Hadoop, so I am going to try the following method: pass the TGT as a string via the job config map (strings only) and convert the string back into a TGT object when the job runs in Hadoop. The concern is that I am passing credentials over the network, which is not best practice and is one of the very reasons Hadoop doesn't pass Kerberos tickets around. If I can re-use the re-formed TGT passed to Hadoop to get service tickets, I will try to encrypt the TGT string as much as possible to mitigate the security issues.
So before starting the job on the local machine, the code would look like this:
import sun.security.krb5.Credentials;
Credentials tgt = Credentials.acquireTGTFromCache(null, null); // Make sure kinit is done before this
String tgtStr = tgt.convertToJsonString(); //Need to implement this
Job job = new Job("Test");
JobConf jobConf = job.getJobConf();
jobConf.set("tgtStr", tgtStr);
job.addTask(Test.class, "run", null);
job.submit();
job.waitForCompletion(true);
Then the function that runs inside the Hadoop job would look like this:
Configuration conf = TaskContext.get().getConfiguration();
String tgtStr = conf.get("tgtStr");
Credentials tgt = reformTGTFromString(tgtStr);//Need to implement this
Credentials serviceTicket = Credentials.acquireServiceCreds(servicePrincipal, tgt); //This is to get any service ticket
So I need to implement two functions to serialize the TGT object (Credentials.class) to a string and then re-form it back into an object.
Does anyone know a better solution for this? Thanks.
Please see the design at http://carfield.com.hk:8080/document/distributed/hadoop-security-design.pdf , if you have not done so already.
Or is it possible to use the delegation token to re-form the TGT?
No. Delegation tokens are issued by the Hadoop NameNode, and while they are based on Kerberos authentication, they are independent of it and you cannot derive the Kerberos TGT from them.
In the original design, we considered using solely Kerberos (without any additional tokens), which would have made your plan easy, but we decided against it for these reasons:
- Performance: thousands of M/R tasks may need to get Kerberos tickets at the same time.
- Kerberos credentials need to be renewed before they expire; for scheduled jobs this would be an issue.
- Delegation tokens don't depend on Kerberos and can be coupled with non-Kerberos authentication mechanisms (such as SSL) used at the edge.
In your case, you can use a private distributed cache and send the forwardable TGT. I think this will be OK, but I need to think about it some more. Obviously you need to make sure your implementation is secure: your tickets should have the minimally necessary lifetime, IP channel bindings should be used if possible, and the use of the tickets should be restricted to authorized processes.
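If you do ship the serialized TGT with the job, one option that avoids putting it in the plain job configuration is the job's credential store, which the framework distributes to the tasks. A rough sketch, where the alias name and the serialization step are assumptions of this example:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class SubmitWithTgt {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Test");

        // Placeholder for the serialized TGT produced by your own
        // convertToJsonString()-style code.
        byte[] tgtBytes = loadSerializedTgt();

        // Secrets added to the job credentials are distributed to the tasks
        // by the framework instead of appearing in the job configuration.
        job.getCredentials().addSecretKey(new Text("user.tgt"), tgtBytes);

        // ... set mapper/reducer classes, input/output paths, etc. ...
        job.waitForCompletion(true);
    }

    private static byte[] loadSerializedTgt() {
        return new byte[0]; // placeholder
    }
}
Inside a task, the bytes can be read back with context.getCredentials().getSecretKey(new Text("user.tgt")) and then re-formed into the ticket.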
By disassembling the Credentials fields, converting them to strings with a Base64 encoder, forming a JSON string, passing it to Hadoop using the config map or the distributed cache suggested by RVM, and then re-forming the Credentials object in the job running on Hadoop, I can get back the Kerberos TGT and successfully obtain any service tickets with it. So this method works; the only thing to be very cautious about is the encryption of the keys that are passed over the network.
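For what it's worth, the Base64 step itself is trivial with java.util.Base64; the important caveat is that Base64 is an encoding, not encryption, so the resulting string still needs to be encrypted or carried over a protected channel. A minimal sketch, independent of the sun.security.krb5.Credentials internals:
import java.util.Base64;

public class FieldCodec {
    // Encode one binary field of the ticket (e.g. the session key or the
    // ASN.1-encoded ticket) so it can be embedded in the JSON string.
    public static String encodeField(byte[] fieldBytes) {
        return Base64.getEncoder().encodeToString(fieldBytes);
    }

    // Decode it back inside the job before re-forming the Credentials object.
    public static byte[] decodeField(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }
}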
First of all, your account has to have delegation enabled, and the ticket has to be requested as forwardable. If that is all true, Hadoop can retrieve the delegated credential from the GSSContext and construct a new one on your behalf. With that new TGT it will be able to perform further steps. Use Wireshark to check the ticket for Hadoop.
