Hadoop single-node starting issue - hadoop

I'm trying to bring up a standalone Hadoop server (on AWS) by running start-dfs.sh, but I get the following error:
Starting namenodes on [ip-xxx-xx-xxx-xx]
ip-xxx-xx-xxx-xx: Permission denied (publickey).
Starting datanodes
localhost: Permission denied (publickey).
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/hadoop/hdfs/tools/GetConf : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:808)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:442)
at java.net.URLClassLoader.access$100(URLClassLoader.java:64)
at java.net.URLClassLoader$1.run(URLClassLoader.java:354)
at java.net.URLClassLoader$1.run(URLClassLoader.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:347)
at java.lang.ClassLoader.loadClass(ClassLoader.java:430)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:323)
at java.lang.ClassLoader.loadClass(ClassLoader.java:363)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
The installed Java version is javac 1.7.0_181.
Hadoop is 3.0.3.
Below are the path settings in my profile file:
export JAVA_HOME=/usr
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
#export PATH=$PATH:$HADOOP_CONF_DIR
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
What is the issue? Is there anything I'm missing?
Thanks

1. Generate a key pair as the user that starts Hadoop:
ssh-keygen
2. It will ask where to save the key; I entered /home/hadoop/.ssh/id_rsa
3. It will ask for a passphrase; keep it empty for simplicity.
4. Append the newly generated public key to the authorized_keys file in that user's .ssh directory:
cat /home/hadoop/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
5. ssh localhost should now log in without asking for a password.
6. Run start-dfs.sh again; the "Permission denied (publickey)" errors should be gone.
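Note that the SSH steps only address the "Permission denied (publickey)" lines. The UnsupportedClassVersionError in the question looks like a separate symptom: class-file major version 52 corresponds to Java 8, while the installed JDK is 1.7, so a Java 8 runtime is most likely needed as well. Here is a minimal sketch of that mapping. The function name java_release_for_major is made up for illustration, and the rule (major version minus 44) holds for Java 5 and later.

```shell
# Hypothetical helper (not part of Hadoop or the JDK): convert the class-file
# major version reported by UnsupportedClassVersionError into a Java release.
# Rule: release = major - 44, valid for Java 5 (major 49) and later.
java_release_for_major() {
  major=${1%%.*}            # accept "52.0" as well as "52"
  echo $((major - 44))
}

java_release_for_major 52.0   # prints 8: the class needs Java 8 or newer
java_release_for_major 51     # prints 7: the newest format a 1.7 JVM can load
```

In the question, JAVA_HOME=/usr appears to resolve to the 1.7 installation, so pointing JAVA_HOME (in the profile and in hadoop-env.sh) at a Java 8 install would be the thing to try.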

User authentication issue in Kerberos with keytab

I am trying to integrate Kerberos-secured Hadoop with Pinot, using the configuration below.
Executables:
export HADOOP_HOME=/usr/hdp/2.6.3.0-235/hadoop
export HADOOP_VERSION=2.7.3.2.6.3.0-235
export HADOOP_GUAVA_VERSION=11.0.2
export HADOOP_GSON_VERSION=2.2.4
export GC_LOG_LOCATION=/home/hdfs/Pinot/pinotGcLog
export PINOT_VERSION=0.7.1
export PINOT_DISTRIBUTION_DIR=/home/hdfs/Pinot_IMP_FOLDER/apache-pinot-incubating-0.7.1-bin
export HADOOP_CLIENT_OPTS="-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml"
export SERVER_CONF_DIR=/home/hdfs/Pinot_IMP_FOLDER/apache-pinot-incubating-0.7.1-bin/bin
export ZOOKEEPER_ADDRESS=<ZOOKEEPER_ADDRESS>
export CLASSPATH_PREFIX="${HADOOP_HOME}/hadoop-hdfs/hadoop-hdfs-${HADOOP_VERSION}.jar:${HADOOP_HOME}/hadoop-annotations-${HADOOP_VERSION}.jar:${HADOOP_HOME}/hadoop-auth-${HADOOP_VERSION}.jar:${HADOOP_HOME}/hadoop-common-${HADOOP_VERSION}.jar:${HADOOP_HOME}/lib/guava-${HADOOP_GUAVA_VERSION}.jar:${HADOOP_HOME}/lib/gson-${HADOOP_GSON_VERSION}.jar"
export JAVA_OPTS="-Xms4G -Xmx16G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:${GC_LOG_LOCATION}/gc-pinot-server.log"
controller.conf
controller.data.dir=<fs.defaultFS>/user/hdfs/controller_segment
controller.local.temp.dir=/home/hdfs/Pinot/pinot_tmp/
controller.zk.str=<ZOOKEEPER_ADDRESS>
controller.enable.split.commit=true
controller.access.protocols.http.port=9000
controller.helix.cluster.name=PinotCluster
pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.controller.storage.factory.hdfs.hadoop.conf.path=/usr/hdp/2.6.3.0-235/hadoop/conf
pinot.controller.segment.fetcher.protocols=file,http,hdfs
pinot.controller.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.principle='hdfs#HDFSSITHDP.COM'
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.keytab='/home/hdfs/hdfs.keytab'
pinot.controller.storage.factory.hdfs.hadoop.kerberos.principle='hdfs#HDFSSITHDP.COM'
pinot.controller.storage.factory.hdfs.hadoop.kerberos.keytab='/home/hdfs/hdfs.keytab'
controller.vip.port=9000
controller.port=9000
pinot.set.instance.id.to.hostname=true
pinot.server.grpc.enable=true
Kerberos information:
kinit -V -k -t /home/hdfs/hdfs.keytab hdfs#HDFSSITHDP.COM
Using default cache: /tmp/krb5cc_57372
Using principal: hdfs#HDFSSITHDP.COM
Using keytab: /home/hdfs/hdfs.keytab
Authenticated to Kerberos v5
ERROR MESSAGE:
END: Invoking TASK controller pipeline for event ResourceConfigChange::15fc3764_TASK for cluster PinotCluster, took 278 ms
START AsyncProcess: TASK::TaskGarbageCollectionStage
END AsyncProcess: TASK::TaskGarbageCollectionStage, took 0 ms
Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Trying to authenticate user 'hdfs#HDFSSITHDP.COM' with keytab '/home/hdfs/hdfs.keytab'..
Could not instantiate file system for class org.apache.pinot.plugin.filesystem.HadoopPinotFS with scheme hdfs
java.lang.RuntimeException: Failed to authenticate user principal ['hdfs#HDFSSITHDP.COM'] with keytab ['/home/hdfs/hdfs.keytab']
at org.apache.pinot.plugin.filesystem.HadoopPinotFS.authenticate(HadoopPinotFS.java:258) ~[pinot-hdfs-0.7.1-shaded.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
Caused by: java.io.IOException: Login failure for 'hdfs#HDFSSITHDP.COM' from keytab '/home/hdfs/hdfs.keytab': javax.security.auth.login.LoginException: Unable to obtain password from user
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:962) ~[pinot-orc-0.7.1-shaded.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
at org.apache.pinot.plugin.filesystem.HadoopPinotFS.authenticate(HadoopPinotFS.java:254) ~[pinot-hdfs-0.7.1-shaded.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
... 15 more
Caused by: javax.security.auth.login.LoginException: Unable to obtain password from user
at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:901) ~[?:1.8.0_241]
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:764) ~[?:1.8.0_241]
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) ~[?:1.8.0_241]
at org.apache.pinot.plugin.filesystem.HadoopPinotFS.authenticate(HadoopPinotFS.java:254) ~[pinot-hdfs-0.7.1-shaded.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
... 15 more
Failed to start a Pinot [CONTROLLER] at 21.954 since launch
java.lang.RuntimeException: java.lang.RuntimeException: Failed to authenticate user principal ['hdfs#HDFSSITHDP.COM'] with keytab ['/home/hdfs/hdfs.keytab']
at org.apache.pinot.spi.filesystem.PinotFSFactory.register(PinotFSFactory.java:58) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
P.S. I am running this as the hdfs user, and the keytab file is also owned by hdfs. I have also given 777 permissions to the hdfs.keytab file.
Could someone suggest what the issue is here? I have read multiple blogs, and everywhere the cause is given as a wrong principal/keytab combination or the user not having access to the file, with the advice to give the file 777 permissions or try a different user. I tried all of these options, but nothing has worked so far.
It works now. I just removed the ' characters from the keytab and principal values; the keytab could not be read with the quotes in place.
Old Configuration:
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.principle='hdfs#HDFSSITHDP.COM'
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.keytab='/home/hdfs/hdfs.keytab'
pinot.controller.storage.factory.hdfs.hadoop.kerberos.principle='hdfs#HDFSSITHDP.COM'
pinot.controller.storage.factory.hdfs.hadoop.kerberos.keytab='/home/hdfs/hdfs.keytab'
New Configuration:
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.principle=hdfs#HDFSSITHDP.COM
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.keytab=/home/hdfs/hdfs.keytab
pinot.controller.storage.factory.hdfs.hadoop.kerberos.principle=hdfs#HDFSSITHDP.COM
pinot.controller.storage.factory.hdfs.hadoop.kerberos.keytab=/home/hdfs/hdfs.keytab
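The reason this matters is that properties-style files generally treat quote characters as part of the value, so the lookup was for a file literally named '/home/hdfs/hdfs.keytab' with the quotes included. As a rough sketch, the quoted values in a config file could be cleaned with a small sed filter like the one below. The function name strip_value_quotes is hypothetical, and it only handles one pair of matching quotes around the whole value.

```shell
# Hypothetical filter: remove one pair of matching single or double quotes
# wrapped around the value of each key=value line read from stdin.
# Lines without such quotes pass through unchanged.
strip_value_quotes() {
  sed -e "s/^\([^=]*=\)'\(.*\)'\$/\1\2/" \
      -e 's/^\([^=]*=\)"\(.*\)"$/\1\2/'
}

# Example with one of the lines from the old configuration:
printf '%s\n' "pinot.controller.storage.factory.hdfs.hadoop.kerberos.keytab='/home/hdfs/hdfs.keytab'" \
  | strip_value_quotes
```

Running the filter over controller.conf and comparing the result with the original is a quick way to spot any other quoted values before restarting the controller.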

MapR Windows client not working

I am trying to install the MapR Windows client. I have followed all the steps outlined in the MapR Windows client installation guide. I copied the ssl_truststore file from our cluster into the C:\opt\mapr\conf folder and ran the configure.bat file. It ran without any errors, and I verified that C:\opt\mapr\conf\mapr-clusters.conf was updated with the cluster name and CLDB nodes.
However, when I change to the folder c:\opt\mapr\hadoop\hadoop-2.7.0\bin and run the following command
hadoop fs -ls /
I get the following error
18/01/19 14:05:07 ERROR cldbutils.CLDBRpcCommonUtils: Exception during init
java.lang.UnsatisfiedLinkError: com.mapr.security.JNISecurity.SetClusterOption(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)I
at com.mapr.security.JNISecurity.SetClusterOption(Native Method)
at com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils.init(CLDBRpcCommonUtils.java:163)
at com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils.<init>(CLDBRpcCommonUtils.java:73)
at com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils.<clinit>(CLDBRpcCommonUtils.java:63)
at org.apache.hadoop.conf.CoreDefaultProperties.<clinit>(CoreDefaultProperties.java:69)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2147)
at org.apache.hadoop.conf.Configuration.getProperties(Configuration.java:2362)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2579)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2531)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2444)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1156)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1128)
at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1464)
at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:321)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:487)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
Exception in thread "main" java.lang.UnsatisfiedLinkError: com.mapr.security.JNISecurity.SetParsingDone()V
at com.mapr.security.JNISecurity.SetParsingDone(Native Method)
at com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils.init(CLDBRpcCommonUtils.java:231)
at com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils.<init>(CLDBRpcCommonUtils.java:73)
at com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils.<clinit>(CLDBRpcCommonUtils.java:63)
at org.apache.hadoop.conf.CoreDefaultProperties.<clinit>(CoreDefaultProperties.java:69)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2147)
at org.apache.hadoop.conf.Configuration.getProperties(Configuration.java:2362)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2579)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2531)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2444)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1156)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1128)
at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1464)
at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:321)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:487)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
We use Java 8 and Windows 7.
I have been stuck on this issue for a while. I tried all the options I could find but was not successful. Any help is greatly appreciated.
These are the steps:
1. Install the MapR applications.
2. Configure the MapR client:
/opt/mapr/server/configure.sh -N poc2.cibdatahub.com -c -C (your cluster name)
3. Upload the missing jar from a MapR cluster node to the edge node under:
/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/
4. Create a local mapr account if you don't have one. Example:
useradd mapr
passwd mapr
5. Set the environment for user mapr:
su mapr
vi ~/.bashrc
# append the lines below
export HADOOP_HOME=/opt/mapr/hadoop/hadoop-2.7.0
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export LD_LIBRARY_PATH=$HADOOP_COMMON_LIB_NATIVE_DIR
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Hive home directory configuration
export HIVE_HOME=/opt/mapr/hive/hive-2.1
export PATH="$PATH:$HIVE_HOME/bin"
export HADOOP_USER_NAME="USERname"
# load the environment
source ~/.bashrc
6. Edit /opt/mapr/spark/spark-2.1.0/conf/spark-defaults.conf by adding:
spark.yarn.archive maprfs:///apps/spark/jars/spark-jars.zip
7. Verify the cluster setting: check that /opt/mapr/conf/mapr-clusters.conf has the option secure=false
8. Test:
hadoop fs -Dfs.mapr.trace -ls
Please try these options.

Exception :Server IPC version 9 cannot communicate with client version 4: using Hcatalog with Hive-0.14.0

I have the following tools:
Hadoop-2.6.0,
Hive-0.14.0,
hbase-0.94.8,
sqoop-1.4.5,
pig-0.14.0 installed in a pseudo-distributed environment on Ubuntu 14.0.4.
My goal is to use HCatalog as an interface to work with Hive, Pig, and MapReduce applications.
Steps I did:
1. I have MySQL configured as a remote metastore, with the mysql-connector-java-5.1.37 jar copied into HIVE_HOME/lib. I created hive-site.xml in HIVE_HOME/conf for the remote metastore, which runs on the same machine.
2. I have a hive-env.sh file with HADOOP_HOME pointing to the Hadoop-2.6.0 home.
3. I have the remote metastore running on port 9083.
4. In my bashrc file I have the following environment variables set:
#Hadoop variables start
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export HADOOP_HOME=/home/user/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
#Hadoop variables end
export HADOOP_USER_CLASSPATH_FIRST=true
export PIG_USER_CLASSPATH_FIRST=true
#PIG ENV VARIABLE
export PIG_HOME=/home/user/pig-0.14.0
export PATH=$PATH:$PIG_HOME/bin
#Hive Env Variable
export HIVE_HOME=/home/user/hive-0.14.0/apache-hive-0.14.0-bin
export PATH=$PATH:$HIVE_HOME/bin
#HCatalog env
export HCAT_HOME=$HIVE_HOME/hcatalog
export HCAT_HOME
export PATH=$PATH:$HCAT_HOME/bin
HCATJAR=$HCAT_HOME/share/hacatalog/hive-hcatalog-core-0.14.0.jar
export HCATJAR
HCATPIGJAR=$HCAT_HOME/share/hcatalog/hive-hcatalog-pig-adapter-0.14.0.jar
export HCATPIGJAR
export HADOOP_CLASSPATH=$HCATJAR:$HCATPIGJAR:$HIVE_HOME/lib/hive-exec-0.14.0.jar\
:$HIVE_HOME/lib/hive-metastore-0.14.0.jar:$HIVE_HOME/lib/jdo-api-*.jar:$HIVE_HOME/lib/libfb303-*.jar\
:$HIVE_HOME/lib/libthrift-*.jar:$HIVE_HOME/conf:$HADOOP_HOME/etc/hadoop
#Pig hcatalog integration
export PIG_OPTS=-Dhive.metastore.uris=thrift://localhost:9083
export PIG_CLASSPATH=$HCAT_HOME/share/hcatalog/*:$HIVE_HOME/lib/*:$HCATPIGJAR:$HIVE_HOME/conf:$HADOOP_HOME/etc/hadoop
I am trying to invoke the "hcat" command from the HIVE_HOME/hcatalog/bin path. Below is the error that is generated:
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:444)
at org.apache.hive.hcatalog.cli.HCatCli.main(HCatCli.java:149)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at com.sun.proxy.$Proxy5.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:427)
... 6 more
Observation: After a lot of Googling, my understanding is that "Server IPC version 9 cannot communicate with client version 4" indicates a Hadoop version mismatch.
So I added HADOOP_HOME in hive-env.sh to refer to hadoop-2.6.0.
The error still persists. I am not sure what I am missing. Any help on this would be really appreciated.
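As far as I know, IPC version 4 is what Hadoop 1.x clients speak and version 9 is what Hadoop 2.x servers speak, so a likely explanation is that a Hadoop 1.x client jar is being picked up on the hcat classpath ahead of the 2.6.0 jars. One way to look for that is to scan the effective classpath for hadoop-core jars, which is the artifact name Hadoop 1.x used. The sketch below assumes that naming, and the function name find_hadoop1_jars is made up. It works on an already expanded, colon-separated classpath string and does not expand wildcard entries such as lib/*.

```shell
# Hypothetical helper: print the entries of a colon-separated classpath whose
# file name looks like a Hadoop 1.x core jar (hadoop-core-<version>.jar).
# Assumption: wildcard entries (e.g. /path/lib/*) have already been expanded.
find_hadoop1_jars() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -E '(^|/)hadoop-core-[0-9][^/]*\.jar$'
}

# Example with a made-up classpath:
find_hadoop1_jars "/opt/a/hadoop-common-2.6.0.jar:/opt/b/lib/hadoop-core-1.0.4.jar:/opt/c/conf"
```

If something is reported, the directory it lives in (for example the lib folder of another tool on PIG_CLASSPATH or HADOOP_CLASSPATH) is the first place to check, especially with HADOOP_USER_CLASSPATH_FIRST=true set.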

Sqoop Export Oozie Workflow Fails with File Not Found, Works When Run from the Console

I have a hadoop cluster with 6 nodes. I'm pulling data out of MSSQL and back into MSSQL via Sqoop. Sqoop import commands work fine, and I can run a sqoop export command from the console (on one of the hadoop nodes). Here's the shell script I run:
SQLHOST=sqlservermaster.local
SQLDBNAME=db1
HIVEDBNAME=db1
BATCHID=
USERNAME="sqlusername"
PASSWORD="password"
sqoop export --connect 'jdbc:sqlserver://'$SQLHOST';username='$USERNAME';password='$PASSWORD';database='$SQLDBNAME'' --table ExportFromHive --columns col1,col2,col3 --export-dir /apps/hive/warehouse/$HIVEDBNAME.db/hivetablename
When I run this command from an oozie workflow, and it's passed the same parameters, I receive the error (when digging into the actual job run logs from the yarn scheduler screen):
2015-10-01 20:55:31,084 WARN [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Job init failed
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://hadoopnode1:8020/user/root/.staging/job_1443713197941_0134/job.splitmetainfo
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1568)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1432)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1390)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1312)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1080)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1519)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1515)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1448)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://hadoopnode1:8020/user/root/.staging/job_1443713197941_0134/job.splitmetainfo
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:51)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1563)
... 17 more
Has anyone ever seen this and been able to troubleshoot it? It only happens from the oozie workflow. There are similar topics but no one seems to have solved this specific problem.
Thanks!
I was able to solve this problem by setting the user.name property in the job.properties file for the Oozie workflow to the user yarn.
user.name=yarn
I think the problem was that the workflow user did not have permission to create the staging files under /user/root. Once I changed the running user to yarn, the staging files were created under /user/yarn, which did have the proper permissions.

MapReduce ERROR UserGroupInformation - PriviledgedActionException

I am trying to run the following MapReduce code on my local machine:
https://github.com/Jeffyrao/warcbase/blob/extract-links/src/main/java/org/warcbase/data/ExtractLinks.java
However, I get this exception:
[main] ERROR UserGroupInformation - PriviledgedActionException as:jeffy (auth:SIMPLE) cause:java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Resource file:/Users/jeffy/Documents/Eclipse/warcbase/map_backup.txt is not publicly accessable and as such cannot be part of the public cache.
Exception in thread "main" java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Resource file:/Users/jeffy/Documents/Eclipse/warcbase/map_backup.txt is not publicly accessable and as such cannot be part of the public cache.
at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:144)
at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:155)
at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:625)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:391)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:394)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1287)
at org.warcbase.data.ExtractLinks.run(ExtractLinks.java:254)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.warcbase.data.ExtractLinks.main(ExtractLinks.java:270)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Resource file:/Users/jeffy/Documents/Eclipse/warcbase/map_backup.txt is not publicly accessable and as such cannot be part of the public cache.
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:140)
... 14 more
I think this problem occurs because I am trying to add a file to the DistributedCache (see lines 81-86 and line 235 of my code). Any suggestion is welcome. Thanks!
I ran into a similar problem while running a Hadoop 2 job with a DistributedCache file in a local environment. The cause turned out to be that Hadoop 2 not only verifies that the path itself has public read and execute permission, but also verifies that all of its ancestor directories have execute permission. In this case, if "/" or "/Users" does not have 755 permissions, the file will still fail to be added to the public cache.
See the method static boolean ancestorsHaveExecutePermissions(FileSystem fs, Path path, LoadingCache<Path,Future<FileStatus>> statCache) in the Hadoop class FSDownload.java.
One solution could be granting permission on all directories (sounds unsafe).
A better solution is to make sure all resource files to be cached are in /tmp or another folder that by default has 755 permissions or wider.
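To find which directory is the culprit, the ancestor check described above can be approximated from the shell. This is only a sketch of the idea, not Hadoop's code. The function name check_ancestors is hypothetical, and it looks solely at the execute bit for "others" on each ancestor directory, not at the read permission of the file itself.

```shell
# Hypothetical helper: report every ancestor directory of a file that users
# other than the owner and group cannot traverse (no execute bit for "others").
# Returns 0 if all ancestors are traversable, 1 if any is not, 2 on a bad path.
check_ancestors() {
  dir=$(cd "$(dirname "$1")" 2>/dev/null && pwd) || return 2
  status=0
  while :; do
    perms=$(ls -ld "$dir" | cut -c1-10)      # mode string, e.g. drwxr-x---
    case "$perms" in
      ?????????[xt]) ;;                      # 10th character: execute bit for others
      *) echo "not traversable by others: $dir ($perms)"; status=1 ;;
    esac
    [ "$dir" = "/" ] && break
    dir=$(dirname "$dir")
  done
  return $status
}
```

For the question above, the call would be check_ancestors /Users/jeffy/Documents/Eclipse/warcbase/map_backup.txt, and any directory it prints is a candidate for chmod o+x (or a reason to move the file somewhere like /tmp).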
I ran into a similar problem.
I ran mahout seq2sparse with tfidf in local mode, and it raised this error:
Exception in thread "main" java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Resource file:/root/title.tfidf/dictionary.file-0 is not publicly accessable and as such cannot be part of the public cache.
I found that the permission of /root is 750 by default:
drwxr-x---. 12 root root 4096 16:04 root
So I changed the permission of /root:
chmod 755 /root
Then it worked. Thanks, Yitong.
I only had to change the permissions of my home directory, as follows,
chmod go+rx /home/hadoop
to fix the problem, as / and /home already have rx permissions for group and other users on my system. Here 'hadoop' is my Linux login/user name.
