After running the hive command, it failed to create the database.
I was following the official "Getting Started" guide on the Apache website.
NestedThrowables:
java.sql.SQLException: Unable to open a test connection to the given
database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true,
username = APP. Terminating connection pool (set lazyInit to true if you
expect to start your database after your app). Original Exception: ------
java.sql.SQLException: Failed to create database 'metastore_db', see the next exception for details.
Caused by: java.sql.SQLException: Directory /opt/hive/bin/metastore_db cannot be created.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
... 80 more
Caused by: ERROR XBM0H: Directory /opt/hive/bin/metastore_db cannot be created.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.impl.services.monitor.StorageFactoryService$10.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.impl.services.monitor.StorageFactoryService.createServiceRoot(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.createPersistentService(Unknown Source)
at org.apache.derby.iapi.services.monitor.Monitor.createPersistentService(Unknown Source)
... 80 more
Check whether the user in question has permission to create /opt/hive/bin/metastore_db.
If not, then add permission:
sudo chmod -R 777 /opt/hive/bin/metastore_db
Hope this helps.
Use chown -R hduser:hduser xxxxxxxxx/hive/ (or whatever username/usergroup you are using) to grant ownership, and hence full permissions, including write/create.
(I don't know whether there is a default hive:hive user/group; I created one, but granting ownership to that newly created hive user didn't work.)
Basically, the installation instructions assume that you (and hence Hadoop and Hive, which inherit your rights) have sufficient permissions.
(Note: this is roughly the equivalent of running a command prompt in Admin mode on Windows, even though you are already an admin. Linux tends to be flexible about many things, which can cause some chaos.)
I just realized something in Hive: in hive-site.xml and in the installation steps you create /user/hive/warehouse:
"user" can be so easily confused with "usr", without any warning at all. So remember:
under hadoop fs, the directory to create is /user/hive/warehouse (NOT /usr/hive/warehouse).
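For reference, the directory-creation steps from the Getting Started guide look roughly like this (adjust $HADOOP_HOME to your installation; the -p flag is only needed on newer Hadoop):
$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse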
Update: This has been resolved. It was a typo in the URL.
--
I'm trying to read data from Netezza using pyspark on Windows 10 1909.
I can read from it using DbVisualizer no problem. Then I tried running pyspark --driver-class-path <path to nzjdbc.jar> --jars <path to nzjdbc.jar> --master local[*] (same machine, VPN connection, JDBC driver jar, and all).
I used this code from the pyspark shell:
dataframe = spark.read.format("jdbc").options(
    url="jdbc:netezza://<server>:5480/<database>",
    dbtable="ADMIN.<table>",
    user="***",
    password="***",
    driver="org.netezza.Driver",
).load()
but this fails for me with the following stack trace after about 10-20 seconds (I also tried adding queryTimeout="300", but that didn't make a difference):
"...\AppData\Local\Continuum\miniconda3\envs\spark\lib\site-packages\pyspark\python\lib\py4j-0.10.9-src.zip\py4j\protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o41.load.
: org.netezza.error.NzSQLException: Connection timed out: connect
at org.netezza.sql.NzConnection.initSocket(NzConnection.java:2859)
at org.netezza.sql.NzConnection.open(NzConnection.java:293)
at org.netezza.datasource.NzDatasource.getConnection(NzDatasource.java:675)
at org.netezza.datasource.NzDatasource.getConnection(NzDatasource.java:662)
at org.netezza.Driver.connect(Driver.java:155)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$createConnectionFactory$1(JdbcUtils.scala:64)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:56)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:339)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:279)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:268)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:268)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:203)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Unknown Source)
A coworker is able to run the same code from his Mac with no issues (also on VPN).
Is there something in Windows or in Netezza itself that could affect which clients are able to connect to Netezza? Or could I be missing something in the pyspark command?
Can you try increasing the LoginTimeout value? FYI, queryTimeout refers to the timeout for a single query.
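For example, a minimal sketch of the same read with a login timeout passed through the JDBC options; note that the property name "loginTimeout" and its unit (seconds) are assumptions based on common JDBC drivers, so check the nzjdbc documentation for the exact spelling:
# Sketch only: "loginTimeout" is an assumed driver property name.
dataframe = (
    spark.read.format("jdbc")
    .option("url", "jdbc:netezza://<server>:5480/<database>")
    .option("dbtable", "ADMIN.<table>")
    .option("user", "***")
    .option("password", "***")
    .option("driver", "org.netezza.Driver")
    .option("loginTimeout", "300")
    .load()
)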
I've found a typo in my URL. That was a really sneaky one.
I went through quite a few Q&As on SO on the same topic, but none of the solutions helped me solve this, hence this question. Mine is a 64-bit Windows 10 machine.
This is my error stacktrace:
py4j.protocol.Py4JJavaError: An error occurred while calling o25.sql.
: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-;
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:214)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
at org.apache.spark.sql.internal.SharedState.globalTempViewManager$lzycompute(SharedState.scala:141)
at org.apache.spark.sql.internal.SharedState.globalTempViewManager(SharedState.scala:136)
at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anonfun$2.apply(HiveSessionStateBuilder.scala:55)
at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anonfun$2.apply(HiveSessionStateBuilder.scala:55)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager$lzycompute(SessionCatalog.scala:91)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager(SessionCatalog.scala:91)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:701)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:730)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:685)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:715)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:708)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$1.apply(AnalysisHelper.scala:87)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$1.apply(AnalysisHelper.scala:87)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:87)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$1.apply(AnalysisHelper.scala:87)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$1.apply(AnalysisHelper.scala:87)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:87)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$1.apply(AnalysisHelper.scala:87)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$1.apply(AnalysisHelper.scala:87)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:87)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:708)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:654)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:127)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:121)
at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:106)
at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:105)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:78)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:183)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:117)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:271)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:384)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:286)
at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:215)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
... 84 more
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 99 more
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 39, in <module>
File "C:\Users\aakash.basu\Documents\spark-2.4.3-bin-hadoop2.7\python\pyspark\sql\session.py", line 767, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "C:\Users\aakash.basu\Documents\spark-2.4.3-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1257, in __call__
File "C:\Users\aakash.basu\Documents\spark-2.4.3-bin-hadoop2.7\python\pyspark\sql\utils.py", line 69, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: 'java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-;'
Whenever I try to join two DataFrames, it tries to save the intermediate output to a temp location and throws the above error.
I tried:
Giving full access permission to the tmp folder: cacls C:\tmp /p everyone:f
Giving full access from the Properties of tmp, under Security; it's all ticked.
Giving chmod 777 to C:\tmp\hive (as is the solution in the first answer of this)
And so on...
But nothing is working. On the contrary, my PyCharm works like a charm; it runs Spark without any issue. I still want to fix my pyspark shell, because I want to run ad hoc Spark in REPL mode, and running full batch jobs is not always convenient while debugging.
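For anyone hitting the same wall: the permission Hive reports here is the POSIX-style permission returned through Hadoop's Windows shim, so the chmod usually has to go through winutils rather than Windows ACLs. A sketch, assuming a compatible winutils.exe is on the PATH (or under %HADOOP_HOME%\bin):
winutils.exe chmod -R 777 \tmp\hive
winutils.exe ls \tmp\hive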
I'm currently running a Derby DB instance created with version 10.13.1.1.
I connect via network mode (startNetworkServer) running on a Red Hat server.
I now want to upgrade to version 10.14.2.0.
However, when trying to connect to the upgraded database, I receive an access denied "java.io.FilePermission" error.
Details:
I went and downloaded both versions 10.13.1.1 and 10.14.2.0 onto my Windows desktop.
A backup of the database is created using the following command: SYSCS_UTIL.SYSCS_BACKUP_DATABASE
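(For completeness, the full call takes the backup directory as an argument, something like CALL SYSCS_UTIL.SYSCS_BACKUP_DATABASE('C:/Temp/backup'); the path here is only illustrative.)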
I copied this backup to both the 10.13 and 10.14 folders.
Starting with my current version (13), I start the network server and then use ij to connect to the database. This works fine, and I can see the tables. This validates that my backup is fine.
connect 'jdbc:derby://localhost:1527/c:\Temp\13\database;create=false';
I then start my version 14 network server and go to 14's ij. When I try to connect to the backup:
connect 'jdbc:derby://localhost:1527/c:\Temp\14\database;create=false';
I get the filePermission error:
ERROR XJ001: DERBY SQL error: ERRORCODE: 0, SQLSTATE: XJ001, SQLERRMC: java.security.AccessControlException
access denied ("java.io.FilePermission" "C:\Temp\updating_derby\threatadvisor" "read")
XJ001.U
Fair enough; I assume this is because I'm trying to connect to an older-version database without having passed the upgrade=true parameter. But when I remove the create parameter and add the upgrade parameter, it still fails with the same issue.
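(That is, something along the lines of:
connect 'jdbc:derby://localhost:1527/c:\Temp\14\database;upgrade=true';)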
Ok, so perhaps I can't upgrade a DB via the network server, and I have to directly connect to the DB. From within my app, I use the following connection string:
jdbc:derby:C:/Temp/14/database;upgrade=true;
The app has the version 14 jar on the classpath, so it should use it and upgrade the database. Which it does: the app starts normally and I see all the data. How do I know it upgraded? Because I tried to connect to this 14 database using the 13 network server and ij, and it fails (as expected, due to the version).
So I'm done, right? No. I once more try to connect to this now-upgraded database via the network server using ij, and I once again get the java.io.FilePermission issue.
I went in and ensured the actual OS permissions on the folders and files inside the "database" folder are not read-only. None are. Yet it still errors.
I've even tried running the 14 network server on the Red Hat box (on a different port) and connecting to this DB via ij, and even there I get the file permission issue.
I'm really at a loss as to what to do next. Please help!
FYI, the full issue from the derby.log file:
Tue Jun 11 12:04:15 AEST 2019 : Apache Derby Network Server - 10.14.2.0 - (1828579) started and ready to accept connections on port 1527
Tue Jun 11 12:04:28 AEST 2019 Thread[DRDAConnThread_2,5,main] Cleanup action starting
java.security.AccessControlException: access denied ("java.io.FilePermission" "C:\Temp\14\database" "read")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
at java.security.AccessController.checkPermission(AccessController.java:884)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
at java.lang.SecurityManager.checkRead(SecurityManager.java:888)
at java.io.File.exists(File.java:814)
at java.io.WinNTFileSystem.canonicalize(WinNTFileSystem.java:434)
at java.io.File.getCanonicalPath(File.java:618)
at org.apache.derby.impl.io.DirStorageFactory.doInit(Unknown Source)
at org.apache.derby.impl.io.BaseStorageFactory.init(Unknown Source)
at org.apache.derby.impl.io.DirStorageFactory.init(Unknown Source)
at org.apache.derby.impl.services.monitor.StorageFactoryService.privGetStorageFactoryInstance(Unknown Source)
at org.apache.derby.impl.services.monitor.StorageFactoryService.access$400(Unknown Source)
at org.apache.derby.impl.services.monitor.StorageFactoryService$12.run(Unknown Source)
at org.apache.derby.impl.services.monitor.StorageFactoryService$12.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.impl.services.monitor.StorageFactoryService.getCanonicalServiceName(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source)
at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection$4.run(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection$4.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.impl.jdbc.EmbedConnection.startPersistentService(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
at org.apache.derby.jdbc.InternalDriver$1.run(Unknown Source)
at org.apache.derby.jdbc.InternalDriver$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.jdbc.InternalDriver.getNewEmbedConnection(Unknown Source)
at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
at org.apache.derby.jdbc.EmbeddedDriver.connect(Unknown Source)
at org.apache.derby.impl.drda.Database.makeConnection(Unknown Source)
at org.apache.derby.impl.drda.DRDAConnThread.getConnFromDatabaseName(Unknown Source)
at org.apache.derby.impl.drda.DRDAConnThread.verifyUserIdPassword(Unknown Source)
at org.apache.derby.impl.drda.DRDAConnThread.parseSECCHK(Unknown Source)
at org.apache.derby.impl.drda.DRDAConnThread.parseDRDAConnection(Unknown Source)
at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown Source)
at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)
Cleanup action completed
EDIT 1
Now trying to set up the security policy file as per this guide. However, after creating a new policy file based on the template in the demo directory, we can't even get Derby to pick up our file.
When we try to run:
java -classpath "C:\Temp\14\lib\derby.jar;C:\Temp\14\lib\derbynet.jar;C:\Temp\14\lib\derbyclient.jar;C:\Temp\14\lib\derbytools.jar;C:\Temp\14\lib\derbyoptionaltools.jar" -Djava.security.manager -Djava.security.policy=C:\Temp\14\server.policy org.apache.derby.drda.NetworkServerControl start
We get the following error:
java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
at java.security.AccessController.checkPermission(AccessController.java:884)
at org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown Source)
at org.apache.derby.iapi.services.monitor.Monitor.getMonitorLite(Unknown Source)
at org.apache.derby.iapi.services.property.PropertyUtil$2.run(Unknown Source)
at org.apache.derby.iapi.services.property.PropertyUtil$2.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.iapi.services.property.PropertyUtil.getMonitorLite(Unknown Source)
at org.apache.derby.iapi.services.property.PropertyUtil.getSystemProperty(Unknown Source)
at org.apache.derby.iapi.services.property.PropertyUtil.getSystemProperty(Unknown Source)
at org.apache.derby.impl.drda.NetworkServerControlImpl.init(Unknown Source)
at org.apache.derby.impl.drda.NetworkServerControlImpl.<init>(Unknown Source)
at org.apache.derby.drda.NetworkServerControl.main(Unknown Source)
I know this line is in the policy file (and uncommented):
permission org.apache.derby.security.SystemPermission "engine", "usederbyinternals";
However, I don't think it is even picking up our policy file, because if we change the reference to point at a non-existent policy file, we still get the same error.
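A side note for anyone debugging the same thing: adding the standard JVM flag -Djava.security.debug=policy to the java command above makes the JVM print which policy files it actually parses at startup, which quickly shows whether the file is being read at all. For example (classpath abbreviated, same jars as above):
java -Djava.security.debug=policy -Djava.security.manager -Djava.security.policy=C:\Temp\14\server.policy -classpath <same classpath as above> org.apache.derby.drda.NetworkServerControl start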
Thanks to @BryanPendleton for pointing me in the right direction. For the initial issue, it was indeed because we needed the server.policy file. His link was helpful:
db.apache.org/derby/docs/10.14/security/csecjavasecurity.html
The second issue which we were having was resolved by using the server.policy file template located here:
https://builds.apache.org/job/Derby-docs/lastSuccessfulBuild/artifact/trunk/out/security/rsecbasicserver.html
Instead of the one provided in the download (the one in the Derby download didn't have as many jars mentioned in it). More to the point, the way we referenced the jars had to be tweaked. You will see that all the examples are in Unix format, whereas we were developing on a test Windows PC. Therefore, instead of something like (Unix):
grant codeBase "file:///home/someone/derby/lib/derby.jar"
We needed to do:
grant codeBase "file:///C:/Temp/14/lib/derby.jar"
Note the additional '/' after 'file' - we had assumed it was merely "file://C:...."
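Put together, one grant entry in that style ends up looking roughly like the sketch below. This is abbreviated; the real template in rsecbasicserver.html grants quite a few more permissions, and the paths are just our local ones:
grant codeBase "file:///C:/Temp/14/lib/derby.jar" {
  permission org.apache.derby.security.SystemPermission "engine", "usederbyinternals";
  permission java.util.PropertyPermission "derby.*", "read";
  permission java.io.FilePermission "${derby.system.home}${/}-", "read,write,delete";
};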
There is another solution to the problem which is to use this code:
https://github.com/apache/hive/blob/master/core/src/test/java/org/apache/hive/hcatalog/DerbyPolicy.java
and use this:
Policy.setPolicy(new DerbyPolicy());
To get a policy set programmatically.
I'm using a shared instance of Fiware Cosmos (meaning I don't have root privileges). Until today I had successfully accessed and managed tables in Hive, both remotely using JDBC and via the Hive CLI.
But now I'm getting this error when starting Hive CLI:
log4j:ERROR Could not instantiate class [org.apache.hadoop.hive.shims.HiveEventCounter].
java.lang.RuntimeException: Could not load shims in class org.apache.hadoop.log.metrics.EventCounter
at org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:123)
at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:115)
at org.apache.hadoop.hive.shims.ShimLoader.getEventCounter(ShimLoader.java:98)
at org.apache.hadoop.hive.shims.HiveEventCounter.<init>(HiveEventCounter.java:34)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at java.lang.Class.newInstance0(Class.java:357)
at java.lang.Class.newInstance(Class.java:310)
at org.apache.log4j.helpers.OptionConverter.instantiateByClassName(OptionConverter.java:330)
at org.apache.log4j.helpers.OptionConverter.instantiateByKey(OptionConverter.java:121)
at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:664)
at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:647)
at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:544)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:440)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:476)
at org.apache.log4j.PropertyConfigurator.configure(PropertyConfigurator.java:354)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jDefault(LogUtils.java:127)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:77)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4j(LogUtils.java:58)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:641)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.log.metrics.EventCounter
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:171)
at org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:120)
... 27 more
log4j:ERROR Could not instantiate appender named "EventCounter".
Logging initialized using configuration in jar:file:/usr/local/apache-hive-0.13.0-bin/lib/hive-common-0.13.0.jar!/hive-log4j.properties
I can, however, run SELECT and CREATE statements in the Hive CLI.
If I then try to access Hive remotely, I get this:
Connecting to jdbc:hive://x.x.x.x:10000/default?user=user&password=XXXXXXXXXX
Could not establish connection: java.net.ConnectException: Connection refused
I didn't change any code or commands before the errors appeared, and after googling around I haven't found any working solution.
If anyone can guide me to where the problem is, or how to find it, or even better how to solve it, I'd be grateful.
Thanks in advance!
HiveServer2 (the Hive JDBC service) is a very unstable piece of software. In our prod cluster we have a CRON job to restart each instance every day, and even then it sometimes blows OutOfMemory errors and then just hangs, saying Connection refused like you show. Open a ticket with your Hadoop admin so that he/she restarts the damn service.
On the other hand, the org.apache.hadoop.log.metrics.EventCounter message smells like someone tried to change a shared config somewhere (or tried to upgrade some JARs) and now Hive believes that it runs on a very, very old version of Hadoop
=> e.g. the comments in HIVE-4133 or that MapR support post
The cause of these issues was Hive upgrades in Cosmos. A more thorough explanation and solution can be found here:
My Hive client stopped working with Cosmos instance
I am trying to load a large amount of data into a dynamically partitioned table in Hive.
I keep getting this error. If I load the data without partitioning, it works fine. If I work with a smaller data set (with partitioning), it works fine as well. But for a large data set I start getting this error.
The Error:
2014-11-10 09:28:01,112 ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close file
/tmp/hive-username/hive_2014-11-10_09-25-26_785_2042278847834453465/_task_tmp.-ext-10002/
pseudo_element_id=NN%09/_tmp.000002_2
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on /tmp/hive-username/hive_2014-11-10_09-25-26_785_2042278847834453465/_task_tmp.-ext-10002
/pseudo_element_id=NN%09/_tmp.000002_2: File does not exist.
Holder DFSClient_NONMAPREDUCE_-737676454_1 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2445)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2437)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2503)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2480)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:535)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:337)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44958)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1701)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1697)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1695)
at org.apache.hadoop.ipc.Client.call(Client.java:1225)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy10.complete(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:330)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.
java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy11.complete(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1795)
at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1782)
at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:709)
at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:726)
at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:561)
at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2398)
at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2414)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
This usually happens when multiple mappers try to access the same file.
Check your code for any bug that causes the same file to be accessed simultaneously. This can also happen when a mapper accesses a file that has already been deleted.
Set the option hive.exec.parallel=false and rerun the query.
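That is, from the Hive session, before rerunning the query:
SET hive.exec.parallel=false;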
Just increase the default number of allowed dynamic partitions (overall and per node) and your job should be fine:
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;