Oozie sharelib creation in hdfs.(Root is not able to impersonate root) - hadoop

I am following http://hadooptutorial.info/apache-oozie-installation-on-ubuntu-14-04/ for installing oozie 4.1.0 with hadoop 2.7.2
Build is successfull and i can able to create oozie war by issuing this command
hduser#master:~/oozie/oozie-bin$ sudo bin/oozie-setup.sh prepare-war
New Oozie WAR file with added 'ExtJS library, JARs' at /home/hduser/oozie/oozie-bin/oozie-server/webapps/oozie.war
INFO: Oozie is ready to be started
But when i issue this the command for crating sharelib got error
hduser#master:~/oozie/oozie-bin$ sudo bin/oozie-setup.sh sharelib create -fs hdfs://master:9000
output:
setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hduser/oozie/oozie-bin/libtools/slf4j-simple-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hduser/oozie/oozie-bin/libtools/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hduser/oozie/oozie-bin/libext/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.SimpleLoggerFactory]
the destination path for sharelib is: /user/root/share/lib/lib_20160614094056
Error: User: root is not allowed to impersonate root
Stack trace for the error was (for debug purposes):
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: root is not allowed to impersonate root
at org.apache.hadoop.ipc.Client.call(Client.java:1406)
at org.apache.hadoop.ipc.Client.call(Client.java:1359)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy7.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy7.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1746)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1112)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1108)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1108)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1399)
at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:496)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:348)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1904)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1872)
at org.apache.oozie.tools.OozieSharelibCLI.run(OozieSharelibCLI.java:165)
at org.apache.oozie.tools.OozieSharelibCLI.main(OozieSharelibCLI.java:56)
Also i restarted my hadoop cluster but no success.
here is my core-site.xml
<property>
<name>hadoop.proxyuser.hduser.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hduser.groups</name>
<value>*</value>
</property>
Can Anyone help?

Do not use sudo for creating sharelib and it will work.

Related

Hive does not launch in Ubuntu 20.04 and throws error

I am setting up hive after I setup hadoop on my personal machine. However after I install all the steps to configure hive I am still unable to use hive. I see the following error message:
hadoop#ub:/usr/local/hive/lib$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.3.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = d7574f3f-4ec1-4fac-bbbd-631c5f698e3c
Exception in thread "main" java.lang.ClassCastException: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and java.net.URLClassLoader are in module java.base of loader 'bootstrap')
at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:413)
at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:389)
at org.apache.hadoop.hive.cli.CliSessionState.<init>(CliSessionState.java:60)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
When I run jps I get the following:
hadoop#acharya-ub:/usr/local/hive/lib$ jps
4866 Jps
28644 SecondaryNameNode
28117 NameNode
28965 ResourceManager
29353 NodeManager
28349 DataNode
After searching over the internet, I found that running hiverser2 helped them but it hasn't helped me and I still get the same error message even after I start hiveserver2 service.
I am using java 11, hadoop-3.3.0 and apache-hive-3.1.2
I created symbolic links for the hadoop and hive locations:
lrwxrwxrwx 1 root root 23 Sep 1 22:39 hadoop -> /usr/local/hadoop-3.3.0
drwxr-xr-x 11 hadoop hadoop 4096 Sep 1 22:46 hadoop-3.3.0
drwxrwxr-x 10 hadoop hadoop 4096 Sep 1 23:02 apache-hive-3.1.2-bin
lrwxrwxrwx 1 root root 33 Sep 1 23:03 hive -> /usr/local/apache-hive-3.1.2-bin/
I then setup the following in my .bashrc file:
HADOOP_HOME=/usr/local/hadoop
HIVE_HOME=usr/local/hive
Can someone help please?

Unable to read HiveServer2 configs from ZooKeeper

I use HDP3.1. And I Ambari to deploy hadoop cluster and hive. After deployed, I can run hive in shell successfully. And then I deploy Apache Kylin2.6, it can sync hive table. But when I build the cube, I got the following error:
java.io.IOException: OS command error exit with return code: 1, error message: SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://datacenter1:2181,datacenter2:2181,datacenter3:2181/default;password=hdfs;serviceDiscoveryMode=zooKeeper;user=hdfs;zooKeeperNamespace=hiveserver2
19/02/15 10:04:53 [main]: INFO jdbc.HiveConnection: Connected to datacenter3:10000
19/02/15 10:04:53 [main]: WARN jdbc.HiveConnection: Failed to connect to datacenter3:10000
19/02/15 10:04:53 [main]: ERROR jdbc.Utils: Unable to read HiveServer2 configs from ZooKeeper
Error: Could not open client transport for any of the Server URI's in ZooKeeper: Failed to open new session: java.lang.IllegalArgumentException: Cannot modify dfs.replication at runtime. It is not in list of params that are allowed to be modified at runtime (state=08S01,code=0)
Cannot run commands specified using -e. No current connection
The command is:
hive -e "USE default;
I run hive command in shell. It's success. The connection string is same as the string when run build cube in kylin. I'm confused why it is success in shell but failed in building cube.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://datacenter1:2181,datacenter2:2181,datacenter3:2181/default;password=hdfs;serviceDiscoveryMode=zooKeeper;user=hdfs;zooKeeperNamespace=hiveserver2
19/02/15 12:10:19 [main]: INFO jdbc.HiveConnection: Connected to datacenter3:10000
Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
0: jdbc:hive2://datacenter1:2181,datacenter2:>
You can try to add these two properties to hive-site.xml.
<property>
<name>hive.security.authorization.sqlstd.confwhitelist</name>
<value>mapred.*|hive.*|mapreduce.*|spark.*</value>
</property>
<property>
<name>hive.security.authorization.sqlstd.confwhitelist.append</name>
<value>mapred.*|hive.*|mapreduce.*|spark.*</value>
</property>
Finally, I found the root cause. There is 'Cannot modify dfs.replication at runtime.' error message in the error log. Kylin set this property in $KYLIN_HOME/conf/kylin_hive_conf.xml. And when it is running hive command, it will auto append the properties in that file. The final command likes: hive --hiveconf dfs.replication=2 ..........
It looks like that dfs.replication property can't be appened to hive command. I removed this property in kylin_hive_conf.xml. And it works now.

Failed to initialize schema for HiveServer2 in Apache Hive 3.0.0 on Cygwin (Windows 10)

I already had a Hadoop 3.0.0 cluster consisting of 2 machine: 1 namenode + RM and 1 datanode. I tried to install Apache Hive 3.0.0 by following this document.
When I run schematool -dbType derby -initSchema --verbose on Cygwin, an exception was thrown:
$ schematool -dbType derby -initSchema --verbose
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/BigSol/apache-hive-3.0.0-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/BigSol/hadoop-3.0.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User: APP
Starting metastore schema initialization to 3.0.0
org.apache.hadoop.hive.metastore.HiveMetaException: Unknown version specified for initialization: 3.0.0
org.apache.hadoop.hive.metastore.HiveMetaException: Unknown version specified for initialization: 3.0.0
at org.apache.hadoop.hive.metastore.MetaStoreSchemaInfo.generateInitFileName(MetaStoreSchemaInfo.java:137)
at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:580)
at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:562)
at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1445)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
*** schemaTool failed ***
When viewing the line of code that thrown the exception, I found that Hive tried to find a SQL schema located at %HIVE_HOME%\scripts\metastore\upgrade\derby\hive-schema-3.0.0.derby.sql.
I doubt that Cygwin messed up the path so that Hive didn't find that schema.
My questions:
How can I correct the path (or fix the problem)?
Are there batch files equivalent to *.sh files in %HIVE_HOME%\bin directory as Hive 2.1.1 have?
I found the solution. After running schematool on a Linux machine and copied metastore_db directory to Windows machine, I managed to start HiveServer2 but the beeline CLI said that the jar in C:\cygdrive\c\BigSol\apache-hive-3.0.0-bin\lib\hive-beeline-3.1.0.jar was not found.
It turned out that java in Cygwin parse the wrong path. I made a symbolic link from C:\cygdrive\c to C:\ and it worked.

Unable to start Hive CLI Hadoop(MapR)

I am trying to access hive CLI. However, it is failing to start with the following AccessControl issue.
Strangly enough, I am able to query hive data from Hue without the AccessControl issue. However, hive CLI is not working.
I am on a MapR cluster.
Any help is much appreciated.
[<user_name>#<edge_node> ~]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/hive/hive-2.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in file:/opt/mapr/hive/hive-2.1/conf/hive-log4j2.properties Async: true
2017-09-23 23:52:08,988 WARN [main] DataNucleus.General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/mapr/spark/spark-2.1.0/jars/datanucleus-api-jdo-4.2.4.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/mapr/hive/hive-2.1/lib/datanucleus-api-jdo-4.2.1.jar."
2017-09-23 23:52:08,993 WARN [main] DataNucleus.General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/mapr/spark/spark-2.1.0/jars/datanucleus-core-4.1.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/mapr/hive/hive-2.1/lib/datanucleus-core-4.1.6.jar."
2017-09-23 23:52:09,004 WARN [main] DataNucleus.General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/mapr/spark/spark-2.1.0/jars/datanucleus-rdbms-4.1.19.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/mapr/hive/hive-2.1/lib/datanucleus-rdbms-4.1.7.jar."
2017-09-23 23:52:09,038 INFO [main] DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
2017-09-23 23:52:09,039 INFO [main] DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
2017-09-23 23:52:14,2251 ERROR JniCommon fs/client/fileclient/cc/jni_MapRClient.cc:2172 Thread: 20235 mkdirs failed for /user/<user_name>, error 13
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: User <user_name>(user id 50005586) has been denied access to create <user_name>
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:617)
at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:531)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:646)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.security.AccessControlException: User <user_name>(user id 50005586) has been denied access to create <user_name>
at com.mapr.fs.MapRFileSystem.makeDir(MapRFileSystem.java:1256)
at com.mapr.fs.MapRFileSystem.mkdirs(MapRFileSystem.java:1276)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1913)
at org.apache.hadoop.hive.ql.exec.tez.DagUtils.getDefaultDestDir(DagUtils.java:823)
at org.apache.hadoop.hive.ql.exec.tez.DagUtils.getHiveJarDirectory(DagUtils.java:917)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.createJarLocalResource(TezSessionState.java:616)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternal(TezSessionState.java:256)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.beginOpen(TezSessionState.java:220)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:614)
... 10 more
The error is saying you're defined access to create a directory in the file system. This is likely /user/<user name>, which will need to be added by the HDFS / MapR FS super user.
I am able to query hive data from Hue without the AccessControl
Hue communicates via Thrift and HiveServer2.
Hive CLI bypasses HiveServer2 and is deprecated.
You should use Beeline instead.
beeline -n $(whoami) -u jdbc:hive2://hiveserver:10000/default
And if you're in a kerberized cluster, then you'll need some extra options there.

Create input file in HDFS

I'm trying to create an input file in the hdfs using this command :
hduser#salma-SATELLITE-C855-1EQ:/usr/local/hadoop$ ./bin/hadoop fs -mkdir /in
but it gives me an error that the connection failed :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/10/09 02:12:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
mkdir: Call From salma-SATELLITE-C855-1EQ/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
I already run hadoop service with start-all.sh :
hduser#salma-SATELLITE-C855-1EQ:/usr/local/hadoop$ jps
20437 Jps
20030 SecondaryNameNode
19839 DataNode
So can anyone help to solve this problem
Your are missing the namenode ( primary ) this is where you code is attempitng to connect. Check the log to understand why it does not started.

Resources