Has anyone managed to run Nutch 2.3.1 on a Hadoop 2 cluster? I have been trying to run Nutch 2.3.1 on my Hadoop/YARN 2.7.1 cluster for about two days now.
First of all, my Nutch is installed only locally, not on all nodes, and I have set HBase as the storage backend.
Initially, running it on the cluster failed because some libraries could not be found on the worker side. I resolved that by modifying the runtime/local/bin/nutch script to include all the libraries when submitting the jar for execution:
LIBJARS="$NUTCH_HOME"/lib/apache-nutch-2.3.1.jar
for f in "$NUTCH_HOME"/lib/*.jar; do
LIBJARS="${LIBJARS},$f";
done
# run it
exec "${EXEC_CALL[#]}" $CLASS -libjars $LIBJARS "$#"
However, after fixing that, I am hitting the following error, which I don't know how to resolve:
InjectorJob: starting at 2016-05-11 10:37:46
InjectorJob: Injecting urlDir: /user/ubuntu/urls
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
Error: org.apache.gora.util.GoraException: java.lang.RuntimeException: java.net.MalformedURLException
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:118)
at org.apache.gora.mapreduce.GoraOutputFormat.getRecordWriter(GoraOutputFormat.java:88)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: java.net.MalformedURLException
at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:132)
at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
... 10 more
Caused by: java.net.MalformedURLException
at java.net.URL.<init>(URL.java:630)
at java.net.URL.<init>(URL.java:493)
at java.net.URL.<init>(URL.java:442)
at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:518)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:865)
at org.apache.gora.hbase.store.HBaseStore.readMapping(HBaseStore.java:719)
at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:116)
... 12 more
Caused by: java.lang.NullPointerException
at java.net.URL.<init>(URL.java:535)
... 25 more
[The same GoraException/MalformedURLException stack trace is printed twice more for the remaining map task attempts.]
InjectorJob: java.lang.RuntimeException: job failed: name=apache-nutch-2.3.1.jar, jobid=job_1462952885071_0009
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:119)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:231)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
Got it. First of all, I was trying to execute the runtime/local/bin scripts, which will not work on a cluster. The correct scripts to run in this case are the ones in runtime/deploy/bin. According to the Nutch wiki:
The Nutch job jar you find in $NUTCH_HOME/runtime/deploy is self containing and ships with all the configuration files necessary for Nutch to be able to run on any vanilla Hadoop cluster. All you need is a healthy cluster and a Hadoop environment (cluster or local) that points to the jobtracker.
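In practice, the inject step is then run from the deploy directory, something like this (the seed path below is the one from my log; adjust it to yours):
cd $NUTCH_HOME/runtime/deploy
bin/nutch inject /user/ubuntu/urls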
Also, and very importantly, in order to build Nutch correctly for distributed mode, the nutch-site.xml configuration should not contain the plugin.folders setting. Mine now contains:
<property>
<name>http.agent.name</name>
<value>Sofia's Nutch Spider</value>
</property>
<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.hbase.store.HBaseStore</value>
</property>
<!-- If this is not set to -1, then big pages might not be scanned till the end -->
<property>
<name>http.content.limit</name>
<value>-1</value>
</property>
<property>
<name>file.crawl.parent</name>
<value>false</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop-master:8020</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-master</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop-master:8032</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop-master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop-master:8030</value>
</property>
Related
I am getting the exception below while submitting a job to YARN:
java.net.ConnectException: Call From sparkstandalone.abc.com/127.0.0.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
I am not able to identify where exactly the problem lies.
The ResourceManager (YARN) is running. Below is my yarn-site.xml:
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>127.0.0.1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>127.0.0.1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>127.0.0.1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>127.0.0.1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>127.0.0.1:8088</value>
</property>
I'm learning a bit of Data Science and I'm trying to discover and understand the various tools related to it.
So far I have a working installation of Hadoop 2.8.0 on Mac OS and now I'd like to make Spark 2.1.1 work too. I know that Spark doesn't necessarily need the Hadoop environment to work, but I also know that making it run over YARN can be useful in order to share data with other applications.
After reading different guides and suggestions online, this is what I have done:
In Hadoop configuration files, I added in yarn-site.xml:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8032</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>localhost:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>localhost:8033</value>
</property>
In Spark configuration files, I added in spark-env.sh:
export SPARK_MASTER_IP=localhost
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=800m
export SPARK_WORKER_INSTANCES=1
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_EXECUTOR_INSTANCES=1
export SPARK_LOCAL_IP=127.0.0.1
Now, after starting Hadoop with $HADOOP_HOME/sbin/start-dfs.sh and $HADOOP_HOME/sbin/start-yarn.sh, if I try to launch:
sudo spark-shell --master yarn
(which should be the way to make Spark run over YARN, if I understand correctly), after a very long time, I get the following error:
17/06/09 14:55:44 ERROR SparkContext: Error initializing SparkContext.
java.net.ConnectException: Call From Alessandro.local/192.168.2.1 to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy12.getNewApplication(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:221)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy13.getNewApplication(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNewApplication(YarnClientImpl.java:219)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createApplication(YarnClientImpl.java:227)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:159)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2320)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
at org.apache.spark.repl.Main$.createSparkSession(Main.scala:96)
at $line3.$read$$iw$$iw.<init>(<console>:15)
at $line3.$read$$iw.<init>(<console>:42)
at $line3.$read.<init>(<console>:44)
at $line3.$read$.<init>(<console>:48)
at $line3.$read$.<clinit>(<console>)
at $line3.$eval$.$print$lzycompute(<console>:7)
at $line3.$eval$.$print(<console>:6)
at $line3.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:38)
at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:37)
at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:105)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
at org.apache.spark.repl.Main$.doMain(Main.scala:69)
at org.apache.spark.repl.Main$.main(Main.scala:52)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
... 69 more
17/06/09 14:55:44 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
What am I doing wrong? Maybe it's something obvious, but I'm new to this and I need some help.
The 0.0.0.0 address in the exception points to the fact that spark-shell is not configured to pick up the address of the YARN ResourceManager. (Refer this)
Spark picks up the ResourceManager address from the Hadoop configuration found under HADOOP_CONF_DIR or YARN_CONF_DIR. In your case, I would suspect that HADOOP_CONF_DIR is not set properly in the environment spark-shell actually runs in. Just a hunch. Hope this helps!
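For example (a sketch; the path is an assumption, so point it at whichever directory actually holds your yarn-site.xml):
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_CONF_DIR
spark-shell --master yarn
Note also that sudo does not pass your exports through by default, so sudo spark-shell would not see HADOOP_CONF_DIR even when it is set in your shell.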
I have successfully installed Java, Hadoop, and MySQL. I am getting this error while running Hive. I was previously getting a file permission error, which I fixed by giving the metastore_db folder under the Hive folder 777 permissions. How can I resolve this issue?
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:591)
at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:531)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:226)
at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:366)
at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:310)
at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:290)
at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:266)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:558)
... 9 more
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1654)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:80)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:130)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:101)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3367)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3406)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3386)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3640)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:236)
at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:221)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1652)
... 23 more
Caused by: MetaException(message:Version information not found in metastore. )
at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:7753)
at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:7731)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
at com.sun.proxy.$Proxy20.verifySchema(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:565)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:626)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:416)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:78)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6490)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:238)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
... 28 more
I even tried moving the db.lck and dbex.lck files; it did not work. I restarted Hadoop and the machine several times after making the changes; nothing worked. Help!
Thanks in advance.
hive-site.xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
    <description>metadata is stored in a MySQL server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>USER NAME</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
    <description>PASSWORD ***</description>
  </property>
</configuration>
With createDatabaseIfNotExist=true, only the database gets created, not the schema. Try creating the schema required for the Hive metastore in MySQL:
cd $HIVE_HOME/scripts/metastore/upgrade/mysql/
< login to mysql >
mysql> use metastore;
mysql> source hive-schema-<your_version>.mysql.sql;
Then restart the Hive metastore process.
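Alternatively, if your Hive version ships the schematool utility, it can initialize the same schema for you:
$HIVE_HOME/bin/schematool -dbType mysql -initSchema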
In my case, I got this error when I did not start Hive from inside $HIVE_HOME.
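That is, something like:
cd $HIVE_HOME
bin/hive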
I am trying to configure HBase in pseudo-distributed mode, integrated with a Hadoop instance that is already running in pseudo-distributed mode. The HBase master fails to start.
hbase-site.xml looks like below:
<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8030/hbase</value>
</property>
<!-- <property>
<name>hbase.rootdir</name>
<value>file:/home/hadoop/HBase/HFiles</value>
</property> -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
</configuration>
The HBase master fails to start, and the error below is written to hbase-root-master-bdhost.log:
2016-01-08 17:48:38,333 FATAL [bdhost:16000.activeMasterManager] master.HMaster: Failed to become active master
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled. Available:[TOKEN]
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy16.setSafeMode(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy16.setSafeMode(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setSafeMode(ClientNamenodeProtocolTranslatorPB.java:602)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279)
at com.sun.proxy.$Proxy17.setSafeMode(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.setSafeMode(DFSClient.java:2264)
at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:986)
at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:970)
at org.apache.hadoop.hbase.util.FSUtils.isInSafeMode(FSUtils.java:524)
at org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:970)
at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:417)
at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:146)
at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:126)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:649)
at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:182)
at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1646)
at java.lang.Thread.run(Thread.java:745)
2016-01-08 17:48:38,334 FATAL [bdhost:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled. Available:[TOKEN]
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy16.setSafeMode(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
....
I'm using Hadoop 2.6.3 and HBase 1.1.2 on Fedora Linux release 21.
I tried disabling SELinux and IPv6, but that did not help.
Any pointer is much appreciated.
Thanks.
In the following property, try giving the name as rootDir (with an uppercase D) and let me know if it works. Of course, make sure HDFS is actually listening on the port mentioned in the property.
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8030/hbase</value>
</property>
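(For reference: 8030 is typically the YARN scheduler address, while a NameNode commonly listens on 8020 or 9000, so check fs.defaultFS in your core-site.xml. Assuming the default NameNode port, the value would look more like:
<value>hdfs://localhost:8020/hbase</value>
but 8020 here is an assumption; use whatever your core-site.xml says.)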
Ravi
I currently have a problem where, intermittently, my Oozie workflow is unable to connect to my Hive metastore. Could it be running out of connections to the hive-metastore?
Caused by: MetaException(message:Could not connect to meta store using
any of the URIs provided. Most recent failure:
org.apache.thrift.transport.TTransportException:
java.net.ConnectException: Connection refused
at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:277)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:163)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1082)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:51)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:61)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2140)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2151)
at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:1013)
at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:1000)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:8732)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8097)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:443)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:347)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:908)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:445)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:455)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:711)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:261)
at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:238)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:37)
at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:491)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
... 44 more
)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:323)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:163)
... 42 more
FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate
org.apache.hadoop.hive.metastore.HiveMetaStoreClient
I can run Hive from the command line, and I can use Hue to communicate with the hive-metastore and execute queries. This only seems to happen halfway through my Oozie workflows. Each and every Hive action I submit has the hive-site.xml config set as oozie.hive.defaults.
hive-site.xml
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
  <description>Thrift URI for the remote metastore. Used by the metastore client to connect to the remote metastore.</description>
</property>
<property>
  <name>datanucleus.fixedDatastore</name>
  <value>false</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/metastore</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>password to use against metastore database</description>
</property>
hive-metastore, hive-server2, mysql-server, and oozie all run on the same host for the moment, so localhost works. Any ideas? I have the Oozie share lib enabled by default, and the sharelibs have been created.
CDH 4.2.1 packages
-oozie: oozie-3.3.0
-hive-metastore: hive-metastore-0.10.0
-hive-server2: hive-server2-0.10.0
-mysql-server: mysql-server-5.1.69-1
Any help would be greatly appreciated
oozie.hive.defaults is deprecated in the Hive action. Can you try pointing the Job XML (<job-xml>) element at the hive-site.xml uploaded to HDFS?
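A sketch of a Hive action with a <job-xml> element (the HDFS path here is hypothetical):
<action name="hive-node">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- hive-site.xml previously uploaded to HDFS; adjust the path -->
        <job-xml>/user/oozie/conf/hive-site.xml</job-xml>
        <script>myscript.q</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
</action>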