Facing issue in setting up oozie with secure MapR cluster - hadoop

We are facing an issue with setting up the Oozie service on a secure MapR cluster.
We used the MapR Installer to set up the MapR cluster. Below are the configuration and the steps that we followed.
MapR version - 6.1
Os - Ubuntu 16.04
Authentication - Kerberos
Nodes - Single node
We have enabled MapR security by using the Enable Secure Cluster option in the installer.
Reference doc - https://docs.datafabric.hpe.com/61/AdvancedInstallation/using_enable_secure_cluster_option.html
We have installed Kerberos on the machine.
Reference doc - https://linuxconfig.org/how-to-install-kerberos-kdc-server-and-client-on-ubuntu-18-04
Below are the commands we executed to set up Kerberos authentication for the MapR cluster.
Reference docs -
https://docs.datafabric.hpe.com/61/SecurityGuide/Configuring-Kerberos-User-Authentication.html
https://docs.datafabric.hpe.com/61/SecurityGuide/ConfiguringSPNEGOonMapR.html
sudo kadmin.local
addprinc -randkey mapr/my.cluster.com
ktadd -k /opt/mapr/conf/mapr.keytab mapr/my.cluster.com
addprinc -randkey HTTP/<instance-name>@<realm-name>
ktadd -k /opt/mapr/conf/http.keytab HTTP/<instance-name>@<realm-name>
addprinc -randkey mapr/<instance-name>@<realm-name>
ktadd -k /opt/mapr/conf/mapr2.keytab mapr/<instance-name>@<realm-name>
sudo chown mapr:mapr /opt/mapr/conf/mapr.keytab /opt/mapr/conf/http.keytab /opt/mapr/conf/mapr2.keytab
sudo chmod 777 /opt/mapr/conf/mapr.keytab /opt/mapr/conf/http.keytab /opt/mapr/conf/mapr2.keytab
ktutil
rkt /opt/mapr/conf/mapr.keytab
rkt /opt/mapr/conf/http.keytab
rkt /opt/mapr/conf/mapr2.keytab
wkt /opt/mapr/conf/mapr.keytab
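Before running configure.sh it can help to confirm that all three principals actually landed in the merged keytab; a quick sanity check with the MIT Kerberos tools (klist/kinit assumed to be on the PATH) could look like this:
klist -kte /opt/mapr/conf/mapr.keytab
kinit -kt /opt/mapr/conf/mapr.keytab mapr/my.cluster.com@<realm-name>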
sudo /opt/mapr/server/configure.sh -N my.cluster.com -C <CLDB Node>:7222 -Z <ZookeeperNode>:5181 -K -P "mapr/my.cluster.com@<realm-name>"
Note:
The command mentioned in the doc (configure.sh -K -P "<cldbPrincipal>") throws an error, but the above command works.
kinit
maprlogin kerberos
hadoop fs -ls
3.1) We are able to access the MapR file system.
3.2) We are using the below command to run a simple MapReduce job, and it works fine.
hadoop jar /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0-mapr-1808.jar pi 16 1000
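For completeness, the Kerberos ticket from kinit and the MapR ticket that maprlogin kerberos generates can be inspected (default ticket locations assumed):
klist
maprlogin print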
Oozie configuration with Kerberos authentication
Reference doc - https://docs.datafabric.hpe.com/61/Oozie/ConfiguringOozieonaSecureCluster.html
We have added the below properties to oozie-site.xml:
<property>
  <name>oozie.authentication.type</name>
  <value>kerberos</value>
  <description>
    Defines authentication used for Oozie HTTP endpoint.
    Supported values are: simple | kerberos | #AUTHENTICATION_HANDLER_CLASSNAME#
  </description>
</property>
<property>
  <name>oozie.service.HadoopAccessorService.keytab.file</name>
  <value>/opt/mapr/conf/mapr.keytab</value>
  <description>
    Location of the Oozie user keytab file.
  </description>
</property>
<property>
  <name>local.realm</name>
  <value>{local.realm}</value>
  <description>
    Kerberos Realm used by Oozie and Hadoop. Using 'local.realm' aligns with Hadoop configuration.
  </description>
</property>
<property>
  <name>oozie.service.HadoopAccessorService.kerberos.principal</name>
  <value>mapr/<hostname>@${local.realm}</value>
  <description>
    Kerberos principal for Oozie service.
  </description>
</property>
<property>
  <name>oozie.authentication.kerberos.principal</name>
  <value>HTTP/<hostname>@${local.realm}</value>
  <description>
    Indicates the Kerberos principal to be used for the HTTP endpoint. The principal MUST start with 'HTTP/' per the Kerberos HTTP SPNEGO specification.
  </description>
</property>
We are checking the Oozie status by using the bin/oozie admin -status -auth KERBEROS command, and we are getting the below error.
java.io.IOException: Error while connecting Oozie server. No of retries = 1. Exception = Could not authenticate, Authentication failed, status: 302
Kindly help us resolve this issue.

Oozie is a frigging nightmare in general. Adding Kerberos won't make it easier. Just saying.
The issue that you are describing appears to be that some component isn't getting the memo about the Kerberos identity that you are using or doesn't have access/permissions to validate an access. This is a common problem and typically requires step-by-step interaction to work through what is known and what is not yet known (but often is assumed). I am definitely not an expert on these kinds of issues, however.
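One generic first step is to take the Oozie client out of the picture and hit the SPNEGO-protected endpoint directly with a Kerberos-aware HTTP client; a rough sketch (the Oozie host and the default port 11000 are assumptions here):
kinit
curl --negotiate -u : -i http://<oozie-host>:11000/oozie/v1/admin/status
A 200 response with {"systemMode":"NORMAL"} would point at the client side, while another 302 or a 401 would point back at the HTTP/ principal, the SPNEGO keytab, or the oozie.authentication.* settings.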
You have a really excellent problem report here which is exactly the sort of thing that the support team can use.
Do you have an active support contract or partner in place?

Related

Oozie on YARN - oozie is not allowed to impersonate hadoop

I'm trying to use Oozie from Java to start a job on a Hadoop cluster. I have very limited experience with Oozie on Hadoop 1, and now I'm struggling to do the same thing on YARN.
I'm given a machine that doesn't belong to the cluster, so when I try to start my job I get the following exception:
E0501 : E0501: Could not perform authorization operation, User: oozie is not allowed to impersonate hadoop
Why is that, and what should I do?
I read a bit about core-site properties that need to be set:
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>users</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>master</value>
</property>
Does it seem that this is the problem? Should I contact people responsible for cluster to fix that?
Could there be problems because I'm using the same code for YARN as I did for Hadoop 1? Should something be changed? For example, I'm setting nameNode and jobTracker in workflow.xml. Should jobTracker still exist, since there is now a ResourceManager? I have set the address of the ResourceManager but left the property name as jobTracker; could that be the error?
Maybe I should also mention that Ambari is used...
Hi, please update core-site.xml:
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
And the jobTracker address is the ResourceManager address; that will not be the problem. Once you update the core-site.xml file, it will work.
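In other words, on YARN the jobTracker property simply points at the ResourceManager address; a minimal job.properties sketch (the same values apply if you set them directly in workflow.xml), with assumed hostnames and the usual default ports:
nameNode=hdfs://master:8020
jobTracker=master:8032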
Reason:
The cause of this type of error is that you run the Oozie server as the hadoop user but define oozie as the proxy user in the core-site.xml file.
Solution:
Change the ownership of the Oozie installation directory to the oozie user and run the Oozie server as the oozie user, and the problem will be solved.
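If restarting the whole cluster to pick up the core-site.xml change is inconvenient, on stock Hadoop 2 the proxyuser settings can usually be reloaded in place (adjust for your distribution):
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration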

Running JAR in Hadoop on Google Cloud using Yarn-client

I want to run a JAR in Hadoop on Google Cloud using yarn-client.
I use this command on the master node of Hadoop:
spark-submit --class find --master yarn-client find.jar
but it returns this error:
15/06/17 10:11:06 INFO client.RMProxy: Connecting to ResourceManager at hadoop-m-on8g/10.240.180.15:8032
15/06/17 10:11:07 INFO ipc.Client: Retrying connect to server: hadoop-m-on8g/10.240.180.15:8032. Already tried 0
time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
What is the problem? In case it is useful, this is my yarn-site.xml:
<?xml version="1.0" ?>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/yarn-logs/</value>
    <description>
      The remote path, on the default FS, to store logs.
    </description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-m-on8g</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>5999</value>
  </property>
</configuration>
In your case, it looks like the YARN ResourceManager may be unhealthy for unknown reasons; you can try to fix yarn with the following:
sudo sudo -u hadoop /home/hadoop/hadoop-install/sbin/stop-yarn.sh
sudo sudo -u hadoop /home/hadoop/hadoop-install/sbin/start-yarn.sh
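If the restart goes through, a quick way to confirm the ResourceManager is healthy again (standard YARN CLI assumed) is:
yarn node -list
yarn application -list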
However, it looks like you're using the Click-to-Deploy solution; Click-to-Deploy's Spark + Hadoop 2 deployment actually doesn't support Spark on YARN at the moment, due to some bugs and lack of memory configs. You'd normally run into something like this if you just try to run it with --master yarn-client out-of-the-box:
15/06/17 17:21:08 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: -1
appStartTime: 1434561664937
yarnAppState: ACCEPTED
15/06/17 17:21:09 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: -1
appStartTime: 1434561664937
yarnAppState: ACCEPTED
15/06/17 17:21:10 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: 0
appStartTime: 1434561664937
yarnAppState: RUNNING
15/06/17 17:21:15 ERROR cluster.YarnClientSchedulerBackend: Yarn application already ended: FAILED
15/06/17 17:21:15 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
15/06/17 17:21:15 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
The well-supported way to deploy a cluster on Google Compute Engine with Hadoop 2 and Spark configured to run on YARN is to use bdutil. You'd run something like:
./bdutil -P <instance prefix> -p <project id> -b <bucket> -z <zone> -d \
-e extensions/spark/spark_on_yarn_env.sh generate_config my_custom_env.sh
./bdutil -e my_custom_env.sh deploy
# Shorthand for logging in to the master
./bdutil -e my_custom_env.sh shell
# Handy way to run a socks proxy to make it easy to access the web UIs
./bdutil -e my_custom_env.sh socksproxy
# When done, delete your cluster
./bdutil -e my_custom_env.sh delete
With spark_on_yarn_env.sh Spark should default to yarn-client, though you can always re-specify --master yarn-client if you want. You can see a more detailed explanation of the flags available in bdutil with ./bdutil --help. Here are the help entries just for the flags I included above:
-b, --bucket
Google Cloud Storage bucket used in deployment and by the cluster.
-d, --use_attached_pds
If true, uses additional non-boot volumes, optionally creating them on
deploy if they don't exist already and deleting them on cluster delete.
-e, --env_var_files
Comma-separated list of bash files that are sourced to configure the cluster
and installed software. Files are sourced in order with later files being
sourced last. bdutil_env.sh is always sourced first. Flag arguments are
set after all sourced files, but before the evaluate_late_variable_bindings
method of bdutil_env.sh. see bdutil_env.sh for more information.
-P, --prefix
Common prefix for cluster nodes.
-p, --project
The Google Cloud Platform project to use to create the cluster.
-z, --zone
Specify the Google Compute Engine zone to use.
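Once the bdutil cluster is deployed and you have logged in with ./bdutil -e my_custom_env.sh shell, the original submission from the question should then work unchanged:
spark-submit --class find --master yarn-client find.jar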

Configuring HCatalog, WebHCat with Hive

I'm installing Hadoop and Hive to be integrated with WebHCat, which will be used to run Hive queries through Hadoop MapReduce jobs.
I installed Hadoop 2.4.1 and Hive 0.13.0 (latest stable versions).
The request I'm sending using the web interface is:
POST: http://localhost:50111/templeton/v1/hive?user.name='hadoop'&statusdir='out'&execute='show tables'
And I got the following response:
{
"id": "job_local229830426_0001"
}
But in the logs (webhcat-console-error.log) I find that the exit value of this job is 1, which means some error occurred. Tracking this error, I found: Missing argument for option: hiveconf
This is the webhcat-site.xml, which contains the configuration of WebHCat (previously known as Templeton):
<configuration>
  <property>
    <name>templeton.port</name>
    <value>50111</value>
    <description>The HTTP port for the main server.</description>
  </property>
  <property>
    <name>templeton.hive.path</name>
    <value>/usr/local/hive/bin/hive</value>
    <description>The path to the Hive executable.</description>
  </property>
  <property>
    <name>templeton.hive.properties</name>
    <value>hive.metastore.local=false,hive.metastore.uris=thrift://localhost:9933,hive.metastore.sasl.enabled=false</value>
    <description>Properties to set when running hive.</description>
  </property>
</configuration>
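As a side note, the WebHCat server itself can be sanity-checked independently of Hive against its status endpoint (port as configured above); it should return {"status":"ok","version":"v1"} when the server is up:
curl -s 'http://localhost:50111/templeton/v1/status'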
But the executed command is weird, as it has some additional hiveconf parameters with no values:
tool.TrivialExecService: Starting cmd: [/usr/local/hive/bin/hive, --service, cli, --hiveconf, --hiveconf, --hiveconf, hive.metastore.local=false, --hiveconf, hive.metastore.uris=thrift://localhost:9933, --hiveconf, hive.metastore.sasl.enabled=false, -e, show tables]
Any idea?

After changing CDH5 Kerberos Authentication i am not able to access hdfs

I am trying to implement Kerberos authentication. I am using Hadoop 2.3 on CDH 5.0.1. I have made the following changes:
Added the following properties to core-site.xml:
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
After restarting the daemons, when I issue the hadoop fs -ls / command, I get the following error:
ls: Failed on local exception: java.io.IOException: Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.; Host Details : local host is: "cldx-xxxx-xxxx/xxx.xx.xx.xx"; destination host is: "cldx-xxxx-xxxx":8020;
Please help me out.
Thanks in advance,
Ankita Singla
There is a lot more to configuring a secure HDFS cluster than just specifying hadoop.security.authentication as Kerberos. See Configuring Hadoop Security in CDH 5 for the required config settings. You'll need to create appropriate keytab files. Only after you have configured everything, and confirmed that none of the Hadoop services report any errors in their respective logs (NameNode, DataNode on all hosts, ResourceManager, NodeManager on all nodes, etc.), can you attempt to connect.
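To give a concrete flavor of what "a lot more" means, the HDFS daemons themselves need keytab and principal settings before they will speak Kerberos; an illustrative (and deliberately incomplete) hdfs-site.xml fragment with placeholder paths and realm:
<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/etc/hadoop/conf/hdfs.keytab</value>
</property>
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/_HOST@YOUR-REALM</value>
</property>
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/etc/hadoop/conf/hdfs.keytab</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>hdfs/_HOST@YOUR-REALM</value>
</property>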

DistCp from Local Hadoop to Amazon S3

I'm trying to use distcp to copy a folder from my local Hadoop cluster (CDH4) to my Amazon S3 bucket.
I use the following command:
hadoop distcp -log /tmp/distcplog-s3/ hdfs://nameserv1/tmp/data/sampledata s3n://hdfsbackup/
hdfsbackup is the name of my Amazon S3 Bucket.
DistCp fails with an UnknownHostException:
13/05/31 11:22:33 INFO tools.DistCp: srcPaths=[hdfs://nameserv1/tmp/data/sampledata]
13/05/31 11:22:33 INFO tools.DistCp: destPath=s3n://hdfsbackup/
No encryption was performed by peer.
No encryption was performed by peer.
13/05/31 11:22:35 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 54 for hadoopuser on ha-hdfs:nameserv1
13/05/31 11:22:35 INFO security.TokenCache: Got dt for hdfs://nameserv1; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameserv1, Ident: (HDFS_DELEGATION_TOKEN token 54 for hadoopuser)
No encryption was performed by peer.
java.lang.IllegalArgumentException: java.net.UnknownHostException: hdfsbackup
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:414)
at org.apache.hadoop.security.SecurityUtil.buildDTServiceName(SecurityUtil.java:295)
at org.apache.hadoop.fs.FileSystem.getCanonicalServiceName(FileSystem.java:282)
at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:503)
at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:487)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:130)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:111)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:85)
at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1046)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:666)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
Caused by: java.net.UnknownHostException: hdfsbackup
... 14 more
I have the AWS ID/Secret configured in the core-site.xml of all nodes.
<!-- Amazon S3 -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>MY-ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>MY-SECRET</value>
</property>
<!-- Amazon S3N -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>MY-ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>MY-SECRET</value>
</property>
I'm able to copy files from HDFS using the cp command without any problem. The below command successfully copied the HDFS folder to S3:
hadoop fs -cp hdfs://nameserv1/tmp/data/sampledata s3n://hdfsbackup/
I know there is Amazon S3 optimized distcp (s3distcp) available, but I don't want to use it as it doesn't support update/overwrite options.
It looks like you are using Kerberos security, and unfortunately Map/Reduce jobs cannot access Amazon S3 currently if Kerberos is enabled. You can see more details in MAPREDUCE-4548.
They actually have a patch that should fix it, but it is not currently part of any Hadoop distribution, so if you have an opportunity to modify and build Hadoop from source, here is what you should do:
Index: core/org/apache/hadoop/security/SecurityUtil.java
===================================================================
--- core/org/apache/hadoop/security/SecurityUtil.java (revision 1305278)
+++ core/org/apache/hadoop/security/SecurityUtil.java (working copy)
@@ -313,6 +313,9 @@
if (authority == null || authority.isEmpty()) {
return null;
}
+ if (uri.getScheme().equals("s3n") || uri.getScheme().equals("s3")) {
+ return null;
+ }
InetSocketAddress addr = NetUtils.createSocketAddr(authority, defPort);
return buildTokenService(addr).toString();
}
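If you do go the rebuild route, applying the diff is straightforward; assuming it is saved as securityutil-s3.patch (the file name is just an example) at the source-tree root, something like the following should work, followed by rebuilding the Hadoop jars for your distribution:
patch -p0 < securityutil-s3.patch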
The ticket was last updated a couple days ago, so hopefully this will be officially patched soon.
An easier solution would be to just disable Kerberos, but that might not be possible in your environment.
I've seen that you might be able to do this if your bucket is named like a domain name, but I haven't tried it, and even if it works, it sounds like a hack.
