YARN client authentication fails with "SIMPLE authentication is not enabled. Available:[TOKEN]" (Spring)

I've set up a simple local PHD 3.0 Hadoop cluster and followed the steps described in the Spring Yarn Basic Getting Started guide.
Running the app against my Hadoop cluster gives
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN]
and the following stack trace in the YARN ResourceManager:
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN]
at org.apache.hadoop.ipc.Server$Connection.initializeAuthContext(Server.java:1554)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1510)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:762)
at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:636)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:607)
This is probably a very basic question.
I'd simply like to run a YARN test app without setting up any authentication.
As I understand it, YARN does not allow SIMPLE client authentication:
https://issues.apache.org/jira/browse/YARN-2156
According to this question,
How can I pass a Kerberos ticket to Spring Yarn application
it seems I might end up having to set up Kerberos authentication.
Is there a way to run Spring YARN example without elaborate authentication setup?

My mistake was simple.
I had to add
spring:
  hadoop:
    resourceManagerAddress: myyarnhost:8050
    resourceManagerSchedulerAddress: myyarnhost:8030
to application.yml as well, but I mixed up the port numbers (I had 8030 for resourceManagerAddress and 8050 for resourceManagerSchedulerAddress).
That swap alone caused the error above.
Maybe adding these two configuration properties to the getting started guide would save the next readers some time.
Also, to run the example against a freshly installed PHD 3.0 cluster, I had to change the HDFS client user name by exporting HADOOP_USER_NAME:
export HADOOP_USER_NAME=hdfs
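For a one-off run, the variable can also be set inline when launching the jar (the jar name below is the one from the guide; adjust it to your own build):
HADOOP_USER_NAME=hdfs java -jar target/gs-yarn-basic-single-0.1.0.jar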

I just tried this with a 5-node PHD 3.0 cluster and everything was OK.
In build.gradle I used the phd30 packages instead of the vanilla ones (which depend on Hadoop 2.6.0). The versions should not matter in this case, as far as I know.
compile("org.springframework.data:spring-yarn-boot:2.2.0.RELEASE-phd30")
testCompile("org.springframework.data:spring-yarn-boot-test:2.2.0.RELEASE-phd30")
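If Gradle cannot resolve these phd30 builds from Maven Central, adding the Spring release repository should help; a sketch of the repositories block (the URL is my assumption based on the Spring Hadoop documentation of that era, so double-check it):
repositories {
    mavenCentral()
    maven { url "https://repo.spring.io/libs-release" }
}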
In src/main/resources/application.yml I changed the HDFS, ResourceManager and scheduler addresses to match the cluster settings:
spring:
  hadoop:
    fsUri: hdfs://ambari-2.localdomain:8020
    resourceManagerAddress: ambari-3.localdomain:8050
    resourceManagerSchedulerAddress: ambari-3.localdomain:8030
Then I just ran it externally from my own computer:
$ java -jar target/gs-yarn-basic-single-0.1.0.jar
One appmaster and one container are executed, and the app should complete successfully.
If it still doesn't work, then something else is wrong. I didn't deploy HAWQ, if that makes a difference.

Related

Getting "User [dr.who] is not authorized to view the logs for application <AppID>" while running a YARN application

I'm running a custom YARN application using Apache Twill on an HDP 2.5 cluster, but I'm not able to see my own container logs (syslog, stderr and stdout) when I go to my container's web page.
Also, the logged-in user changes from my Kerberos principal to "dr.who" when I navigate to this page.
However, I can see the logs of MapReduce jobs. The Hadoop version is 2.7.3 and the cluster has YARN ACLs enabled.
I had this issue with the Hadoop UI. I found in this doc that hadoop.http.staticuser.user is set to dr.who by default, and that you need to override it in the relevant configuration file (in my case, core-site.xml).
A late answer, but I hope it's useful.
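For example, the override in core-site.xml would look roughly like this (the user value is just an illustration; pick one that your YARN ACLs actually authorize):
<property>
  <name>hadoop.http.staticuser.user</name>
  <value>yarn</value>
</property>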

Hadoop standalone - hdfs commands are slow

I'm doing development/research in an Ubuntu 14.04 VM with Hadoop 2.6.2, and I'm constantly held back because any command I issue to HDFS takes about 15 seconds to run. I've tried digging around, but I'm unable to locate the source of the problem, or even to tell whether this is expected behavior.
I followed the directions on Apache's website and got it up and running in /opt/hadoop-2.6.2/.
The following is a simple test command that I'm using to evaluate whether I have solved the problem.
/opt/hadoop-2.6.2/bin/hdfs dfs -ls /
I have inspected the logs and found no errors or strange warnings. A recommendation I found online was to set the logger to output to the console.
HADOOP_ROOT_LOGGER=DEBUG,console /opt/hadoop-2.6.2/bin/hdfs dfs -ls /
Doing this yields something of interest: you can watch it hang between the following two lines.
16/01/15 11:59:02 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
16/01/15 11:59:17 DEBUG util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
Thoughts: When I first saw this I assumed it was hanging on authentication, but not only do I not have Kerberos installed, the default core-site.xml configuration has the authentication mode set to "simple". This makes me wonder why it would be looking for anything Kerberos-related to begin with. I tried explicitly disabling it in the XML, and the lag didn't go away. I get the feeling that the delay is caused by a timeout while waiting for something. Does anyone else have any ideas?
I just went ahead and installed Kerberos anyway, just to see if it would help. The large delays have disappeared now that /etc/krb5.conf is present. I wonder if I could have just created the file with nothing in it. Hrmmm...
sudo apt-get install krb5-kdc krb5-admin-server
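If anyone wants to test the empty-file theory without pulling in the full KDC packages, a minimal placeholder /etc/krb5.conf might be enough; this is an untested sketch with a dummy realm:
[libdefaults]
    default_realm = EXAMPLE.COM
    dns_lookup_realm = false
    dns_lookup_kdc = false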

Apache Phoenix Installation not done properly

We are trying to install Phoenix 4.4.0 on HBase 1.0.0-cdh5.4.4 (a CDH 5.5.5 four-node cluster) following this installation document: Phoenix installation
Based on that, we copied our phoenix-server-4.4.0-HBase-1.0.jar into the HBase lib folder on the master and on each region server, i.e. into /opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hbase/lib on the master and the three region servers.
After that we restarted the HBase service via Cloudera Manager.
Everything seems to be OK, but when we try to access the Phoenix shell via the ./sqlline.py localhost command, we get a ZooKeeper error like this:
15/09/09 14:20:51 WARN client.ZooKeeperRegistry: Can't retrieve clusterId from Zookeeper
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
So we are not sure that the installation was done properly. Is any further configuration necessary?
We are not even sure whether we are using the sqlline command properly.
Any help will be appreciated.
After reinstalling the four-node cluster on AWS, Phoenix is now working properly.
It's a pity that we don't know exactly what was going on, but we think that after several changes to our config we broke something that stopped Phoenix from working.
One thing to take into consideration is that the sqlline command has to be run against a host that is in the ZooKeeper quorum, and this is something we were doing wrong: we were trying to run it from the namenode, which wasn't in the ZooKeeper quorum. Once we ran sqlline.py from a datanode, everything worked fine.
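For reference, pointing sqlline.py at a host from the ZooKeeper quorum looks something like this (the hostname and the /hbase znode below are placeholders for our cluster layout):
./sqlline.py zk-node1.example.com:2181:/hbase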
Btw, the installation guide that we finally followed is Phoenix Installation

Setting up Kerberos on HDP 2.1

I have a 2-node Ambari Hadoop cluster running on CentOS 6. I recently set up Kerberos for the services in the cluster as per the instructions detailed here:
http://docs.hortonworks.com/HDPDocuments/Ambari-1.6.0.0/bk_ambari_security/content/ambari-kerb.html
In addition to the above documentation, I found that you have to add extra configuration for the NameNode web UI and so on (the Quick Links in the Ambari console for each of the Hadoop services) to work. Hence I followed the configuration options listed in the question portion of this article to set up HTTP authentication: Hadoop Web Authentication using Kerberos
Also, to create the HTTP secret file, I used the following command to generate the file on node 1 and then copied it to the same folder location on node 2 of the cluster:
sudo dd if=/dev/urandom of=/etc/security/keytabs/http_secret bs=1024 count=1
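For context, the HTTP authentication options from that article end up as core-site.xml properties roughly like the following; the realm, principal and keytab path are placeholders for my environment, and the property set is not exhaustive:
<property>
  <name>hadoop.http.filter.initializers</name>
  <value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>
</property>
<property>
  <name>hadoop.http.authentication.type</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.http.authentication.signature.secret.file</name>
  <value>/etc/security/keytabs/http_secret</value>
</property>
<property>
  <name>hadoop.http.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>hadoop.http.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
</property>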
I then updated the ZooKeeper JAAS client file, /etc/zookeeper/conf/zookeeper_client_jaas.conf, to add the following:
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=false
  useTicketCache=true
  keyTab="/etc/security/keytabs/zk.service.keytab"
  principal="zookeeper/host@realm-name";
};
This step followed from the article: http://blog.spryinc.com/2014/03/configuring-kerberos-security-in.html
When I restarted my Hadoop services, I got a 401 Authentication Required error when trying to access the NameNode UI, NameNode logs, NameNode JMX and so on. None of the links in the Quick Links drop-down is able to connect and pull up the data.
Any thoughts on how to resolve this error?

Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses - submitting a job to a remote cluster

I recently upgraded my cluster from Apache Hadoop 1.0 to CDH 4.4.0. I have a WebLogic server on another machine from which I submit jobs to this remote cluster via the MapReduce client. I still want to use MR1 and not YARN. I have compiled my client code against the client jars in the CDH installation (/usr/lib/hadoop/client/*).
I get the error below when creating a JobClient instance. There are many posts related to the same issue, but all the solutions refer to submitting the job to a local cluster rather than a remote one, and specifically not from a WLS container as in my case.
JobClient jc = new JobClient(conf);
Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
But running from the command prompt on the cluster works perfectly fine.
Appreciate your timely help!
I had a similar error; adding the following jars to the classpath worked for me:
hadoop-mapreduce-client-jobclient-2.2.0.2.0.6.0-76.jar:hadoop-mapreduce-client-shuffle-2.3.0.jar:hadoop-mapreduce-client-common-2.3.0.jar
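If you are launching the client with plain java, that amounts to putting those jars (plus the Hadoop client configuration) on the classpath explicitly; a rough example, where the jar paths and the main class are placeholders:
java -cp myclient.jar:/path/to/hadoop-mapreduce-client-jobclient-2.2.0.2.0.6.0-76.jar:/path/to/hadoop-mapreduce-client-shuffle-2.3.0.jar:/path/to/hadoop-mapreduce-client-common-2.3.0.jar:$(hadoop classpath) com.example.MyJobSubmitter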
It's likely that your app is looking at your old Hadoop 1.x configuration files. Maybe your app hard-codes some config? This error tends to indicate you are using the new client libraries but that they are not seeing new-style configuration.
The configuration must exist, since the command-line tools see it fine. Also check your HADOOP_HOME and HADOOP_CONF_DIR environment variables, although those are what the command-line tools tend to pick up, and they work.
Note that you need to install the 'mapreduce' service and not 'yarn' in CDH 4.4 to make it compatible with MR1 clients. See also the '...-mr1-...' artifacts in Maven.
In my case, this error was due to the version of the jars; make sure that you are using the same versions as on the server.
export HADOOP_MAPRED_HOME=/cloudera/parcels/CDH-4.1.3-1.cdh4.1.3.p0.23/lib/hadoop-0.20-mapreduce
In my case I was running Sqoop 1.4.5 and pointing it at the latest hadoop 2.0.0-cdh4.4.0, which also contains the YARN components; that's why it was complaining.
When I pointed Sqoop at hadoop-0.20/2.0.0-cdh4.4.0 (MR1, I think), it worked.
As with Akshay (see the comment by Setob_b), all I needed to fix was to get hadoop-mapreduce-client-shuffle-*.jar onto my classpath.
For Maven, that is:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-shuffle</artifactId>
  <version>${hadoop.version}</version>
</dependency>
In my case, strangely, this error occurred because in my core-site.xml file I had specified the IP address rather than the hostname.
The moment I used the hostname in place of the IP address in core-site.xml and mapred-site.xml and re-installed the MapReduce lib files, the error was resolved.
In my case, I resolved this by using hadoop jar instead of java -jar.
This is useful because hadoop provides the configuration context from hdfs-site.xml, core-site.xml, and so on.
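A minimal illustration of the difference (the jar name, driver class and arguments are placeholders):
# picks up only what is on the plain JVM classpath
java -jar my-mr-job.jar /input /output
# wraps the same jar with the Hadoop environment, so the *-site.xml files are on the classpath
hadoop jar my-mr-job.jar com.example.MyDriver /input /output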
