Problems trying to load hwi service in hive-1.1.0? - hadoop

I have been teaching myself to use Hadoop (2.6.0) and associated applications, in this case hive-1.1.0. I am trying to run the hwi server using the information in Hadoop for Dummies (page 237), but following the instructions there, I keep running into an error message saying the WAR file is not found in hive-1.1.0/lib.
I had to configure the $HIVE_HOME/config/hive-site.xml file to point at where this WAR file is in hive-1.1.0/lib, but when I run the command to start the hwi server, it does start and then breaks: parts of the path (which should come from my definition in hive-site.xml) are duplicated, so the command cannot find the WAR file. I am attaching my hive-site.xml file and the results of running the command hive --service hwi.
Relevant part of $HIVE_HOME/config/hive-site.xml file
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Hive Execution Parameters -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/home/hadoop/Hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hive.hwi.war.file></name>
    <value>$HIVE_HOME/lib/hive-hwi.0.12.0.war</value>
    <description>This is the WAR file with the jsp content for Hive Web Interface</description>
  </property>
</configuration>
This version of Hive did not ship with a WAR file, so I copied hive-hwi.0.12.0.war over from hive-0.12.0 as suggested.
Results from the following:
[hadoop@fedora21_2 ~]$ hive --service hwi
15/04/05 15:53:02 INFO hwi.HWIServer: HWI is starting up
15/04/05 15:53:04 WARN conf.HiveConf: HiveConf of name hive.hwi.war.file> does not exist
15/04/05 15:53:04 FATAL hwi.HWIServer: HWI WAR file not found at /home/hadoop/hive-1.1.0/home/hadoop/hive-1.1.0/lib/hive-hwi-0.12.0.war
[hadoop@fedora21_2 ~]$
It looks as if, when I ran the command to load the HWI service, the command somehow botched the path to the WAR file as set in hive-site.xml. Not sure what I am missing here.

change this property from:
<property>
  <name>hive.hwi.war.file</name>
  <value>{$HIVE_HOME}/lib/hive-hwi-[version].war</value>
</property>
to:
<property>
  <name>hive.hwi.war.file</name>
  <value>/lib/hive-hwi-[version].war</value>
</property>
You were having the problem because the final execution path became {$HIVE_HOME}/{$HIVE_HOME}/lib/hive-hwi-[version].war.
This happened because HWI already resolves the configured value relative to the {$HIVE_HOME} directory when it reads the configuration file.
So, if you remove {$HIVE_HOME} from your configuration, you get {$HIVE_HOME}/lib/hive-hwi-[version].war, which is the correct path.
In your case, [version] = 0.12.0.
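Concretely, for the asker's setup ([version] = 0.12.0, assuming the copied WAR keeps the standard hive-hwi-0.12.0.war name shown in the FATAL log above), the corrected property would look something like the sketch below. Note that the stray > inside the original <name> element (hive.hwi.war.file>) also has to go; that is what the "HiveConf of name hive.hwi.war.file> does not exist" warning is complaining about.
<property>
  <name>hive.hwi.war.file</name>
  <value>/lib/hive-hwi-0.12.0.war</value>
  <description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>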

Make a folder inside Hive at home/hadoop/hive-1.1.0/lib, paste all of the lib files into it, and then run the command
hive --service hwi
That will work until the bug is fixed.
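Read as shell commands, this suggestion amounts to something like the sketch below; the hive-1.1.0 path is the one from the question, while the hive-0.12.0 location is an assumption, since the question does not give it.
# Assumed locations, based on the paths mentioned in the question above
OLD_HIVE_LIB=/home/hadoop/hive-0.12.0/lib
NEW_HIVE_LIB=/home/hadoop/hive-1.1.0/lib

# Make sure the target lib folder exists, copy the HWI WAR (or all of the lib
# files, as the answer suggests) across, then start the HWI service
mkdir -p "$NEW_HIVE_LIB"
cp "$OLD_HIVE_LIB"/hive-hwi-*.war "$NEW_HIVE_LIB"/
hive --service hwi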

Related

Query tables present in external hive from Apache Spark [duplicate]

This question already has answers here: How to connect Spark SQL to remote Hive metastore (via thrift protocol) with no hive-site.xml?
I am relatively new to the Hadoop ecosystem. My goal is to read Hive tables using Apache Spark and process them. Hive is running on an EC2 instance, whereas Spark is running on my local machine.
To build a prototype, I've installed Apache Hadoop by following the steps present here. I've added the required environment variables as well.
I've started dfs using $HADOOP_HOME/sbin/start-dfs.sh
I've installed Apache Hive by following the steps present here. I've started hiveserver2 and the hive metastore. I've configured Apache Derby DB (server mode) in Hive. I've created a sample table 'web_log' and added a few rows to it using beeline.
I've added the following to Hadoop's core-site.xml:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
And added the following to hdfs-site.xml:
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
I've added core-site.xml, hdfs-site.xml and hive-site.xml to $SPARK_HOME/conf in my local Spark instance.
core-site.xml and hdfs-site.xml are empty, i.e.:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
</configuration>
hive-site.xml has the content below:
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://ec2-instance-external-dbs-name:9083</value>
    <description>URI for client to contact metastore server</description>
  </property>
</configuration>
I've started spark-shell and executed the following command
scala> sqlContext
res0: org.apache.spark.sql.SQLContext = org.apache.spark.sql.hive.HiveContext@57d0c779
It seems Spark has created a HiveContext.
I've executed SQL using the command below:
scala> val df = sqlContext.sql("select * from web_log")
df: org.apache.spark.sql.DataFrame = [viewtime: int, userid: bigint, url: string, referrer: string, ip: string]
The columns and their types match the sample table 'web_log' that I created.
Now when I execute scala> df.show, it takes some time and then throws the error below:
16/11/21 18:46:17 WARN BlockReaderFactory: I/O error constructing remote block reader.
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/ec2-instance-private-ip:50010]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)
It seems the DFSClient is using the EC2 instance's internal IP. And AFAIK, I didn't start any application on port 50010.
Do I need to install and start any other application?
How can I make sure that the DFSClient uses the EC2 instance's external IP or external DNS name?
Is it possible to access Hive from an external Spark instance?
Add the snippet below to the program you are running:
hiveContext.getConf.getAll.mkString("\n")
This will print which Hive metastore it is connecting to, so you can review all the properties and spot any that are not correct.
If they are not what you are looking for, and you can't adjust them through the configuration files due to some limitation, then, as described in the link, you can set them like this to point to the correct URIs, etc.:
hiveContext.setConf("hive.metastore.uris", "thrift://METASTOREl:9083");
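Put together as a spark-shell session, the idea looks roughly like the sketch below. It uses the Spark 1.x SQLContext methods getAllConfs / getConf / setConf rather than the getConf.getAll spelling above (which is the SparkConf API), and the metastore URI is the placeholder host from the question's hive-site.xml.
// In spark-shell the pre-built sqlContext is already a HiveContext (see res0 above),
// so it can inspect and override Hive configuration directly.

// Dump every property that has been set on this context, one per line,
// to check which metastore it is actually pointed at
println(sqlContext.getAllConfs.mkString("\n"))

// Read a single property, with an empty default if it has not been set explicitly
println(sqlContext.getConf("hive.metastore.uris", ""))

// Override the metastore URI programmatically if the configured one is wrong
// (replace the placeholder host with the EC2 instance's external DNS name)
sqlContext.setConf("hive.metastore.uris", "thrift://ec2-instance-external-dbs-name:9083")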

Running Hive Query in Spark through Oozie 4.1.0.3

Getting a "table not found" exception while running a Hive query in Spark, using Oozie version 4.1.0.3, as a java action.
Copied hive-site.xml and hive-default.xml from the hdfs path.
workflow.xml used:
<start to="scala_java"/>
<action name="scala_java">
  <java>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <job-xml>${nameNode}/user/${wf:user()}/${appRoot}/env/devbox/hive-site.xml</job-xml>
    <configuration>
      <property>
        <name>oozie.hive.defaults</name>
        <value>${nameNode}/user/${wf:user()}/${appRoot}/env/devbox/hive-default.xml</value>
      </property>
      <property>
        <name>pool.name</name>
        <value>${etlPoolName}</value>
      </property>
      <property>
        <name>mapreduce.job.queuename</name>
        <value>${QUEUE_NAME}</value>
      </property>
    </configuration>
    <main-class>org.apache.spark.deploy.SparkSubmit</main-class>
    <arg>--master</arg>
    <arg>yarn-cluster</arg>
    <arg>--class</arg>
    <arg>HiveFromSparkExample</arg>
    <arg>--deploy-mode</arg>
    <arg>cluster</arg>
    <arg>--queue</arg>
    <arg>testq</arg>
    <arg>--num-executors</arg>
    <arg>64</arg>
    <arg>--executor-cores</arg>
    <arg>5</arg>
    <arg>--jars</arg>
    <arg>datanucleus-api-jdo-3.2.6.jar,datanucleus-core-3.2.10.jar,datanucleus-rdbms-3.2.9.jar</arg>
    <arg>TEST-0.0.2-SNAPSHOT.jar</arg>
    <file>TEST-0.0.2-SNAPSHOT.jar</file>
  </java>
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Table not found test_hive_spark_t1)
Exception in thread "Driver" org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found test_hive_spark_t1
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:980)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:79)
at org.apache.spark.sql.hive.HiveContext$$anon$1.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:255)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:137)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:137)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:137)
at org.apache.spark.sql.hive.HiveContext$$anon$1.lookupRelation(HiveContext.scala:255)
A. The X-default config files are just for user information; they are created at install time from the hard-coded defaults in the JARs.
It's the X-site config files that contain the useful information, e.g. how to connect to the Metastore (the default for that is "just start an embedded Derby DB with no data inside"... which might explain the "table not found" message!).
B. Hadoop components search for X-site config files in the CLASSPATH, and if they don't find them there, they silently fall back to the defaults.
So you must tell Oozie to download them to the local CWD via <file> instructions.
(Except for an explicit Hive Action, which uses another, explicit convention for its specific hive-site, but that's not the case here.)
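In workflow terms that means adding a <file> entry for the site file next to the existing one for the jar, e.g. (a sketch against the workflow.xml above, reusing the HDFS path from its job-xml element):
<!-- Inside the <java> action: ship hive-site.xml to the action's working
     directory so it ends up on the classpath of the launched JVM -->
<file>${nameNode}/user/${wf:user()}/${appRoot}/env/devbox/hive-site.xml</file>
<file>TEST-0.0.2-SNAPSHOT.jar</file>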
hive-default.xml is not needed.
Create a custom hive-site.xml which has the hive.metastore.uris property alone.
Pass the custom hive-site.xml via --files hive-site.xml in the Spark arguments.
Remove the job-xml element and the oozie.hive.defaults property.
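A sketch of that custom hive-site.xml (the thrift URI is a placeholder for the actual metastore host and port), plus the extra pair of arguments added to the workflow's <java> action before the application jar:
<?xml version="1.0"?>
<configuration>
  <!-- Only the metastore URI; everything else falls back to defaults -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://your-metastore-host:9083</value>
  </property>
</configuration>
<!-- Added before <arg>TEST-0.0.2-SNAPSHOT.jar</arg>; the value passed to
     --files is wherever this custom hive-site.xml actually lives -->
<arg>--files</arg>
<arg>hive-site.xml</arg>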

HMaster not starting up

I have configured Hadoop 2.6.0 successfully. Next, I am trying to install HBase 0.98.9, but I am having trouble starting it up.
I get the error message below:
Error: Could not find or load main class org.apache.hadoop.hbase.util.HBaseConfTool
Error: Could not find or load main class org.apache.hadoop.hbase.zookeeper.ZKServerTool
starting master, logging to /usr/local/hbase/logs/hbase-yarn-master-hadoopmaster.out
Error: Could not find or load main class org.apache.hadoop.hbase.master.HMaster
localhost: starting regionserver, logging to /usr/local/hbase/bin/../logs/hbase-yarn-regionserver-hadoopmaster.out
localhost: Error: Could not find or load main class org.apache.hadoop.hbase.regionserver.HRegionServer
And, this is my hbase-site.xml file
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoopmaster:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/yarn/hbase/zookeeper</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
Please let me know what is wrong with my configuration.
Regards.
Add this line in hadoop-env.sh:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/path/to/hbase/jars
NOTE: Change /path/to/hbase/jars to hbase jars location. If possible add all available hbase jar files to hadoop classpath (to avoid future class problems).
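For the install location visible in the question's log output (/usr/local/hbase), a minimal sketch of that hadoop-env.sh addition might be:
# Sketch for hadoop-env.sh; /usr/local/hbase is the location shown in the
# question's log output -- adjust if your HBase lives elsewhere
export HBASE_HOME=/usr/local/hbase
# The lib/* wildcard is expanded by the JVM, adding every HBase jar to the classpath
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_HOME/lib/*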

Error on starting Hbase 1.0.0

I have just installed HBase through brew install hbase and edited hbase-site.xml:
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///usr/local/Cellar/hbase/databases/hbase-${user.name}/hbase</value>
    <description>The directory shared by region servers and into
      which HBase persists. The URL should be 'fully-qualified'
      to include the filesystem scheme. For example, to specify the
      HDFS directory '/hbase' where the HDFS instance's namenode is
      running at namenode.example.org on port 9000, set this value to:
      hdfs://namenode.example.org:9000/hbase. By default HBase writes
      into /tmp. Change this configuration else all data will be lost
      on machine restart.
    </description>
  </property>
</configuration>
Exported JAVA_HOME and HBASE_HOME.
When I try to start it, I get the following exception:
Abhisheks-MacBook-Pro:bin abhishek$ start-hbase.sh
Error: Could not find or load main class org.apache.hadoop.hbase.util.HBaseConfTool
Error: Could not find or load main class org.apache.hadoop.hbase.zookeeper.ZKServerTool
starting master, logging to /usr/local/Cellar/hbase/1.0.0/logs/hbase-abhishek-master-Abhisheks-MacBook-Pro.local.out
Error: Could not find or load main class org.apache.hadoop.hbase.master.HMaster
cat: /usr/local/Cellar/hbase/1.0.0/conf/regionservers: No such file or directory
cat: /usr/local/Cellar/hbase/1.0.0/conf/regionservers: No such file or directory
I have Hadoop 2.6.0 and HBase 1.0.0. Though I see that many people have already faced this problem, I cannot find the solution. What else needs to be done to start HBase without any issue?
Solution:
HBASE_HOME=/usr/local/Cellar/hbase/1.0.0/libexec
It should be configured such that the conf folder lies in the HBASE_HOME directory.
Checking master status:
localhost:60010
edit hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///usr/local/Cellar/hbase/databases/hbase-${user.name}/hbase</value>
    <description>The directory shared by region servers and into
      which HBase persists. The URL should be 'fully-qualified'
      to include the filesystem scheme. For example, to specify the
      HDFS directory '/hbase' where the HDFS instance's namenode is
      running at namenode.example.org on port 9000, set this value to:
      hdfs://namenode.example.org:9000/hbase. By default HBase writes
      into /tmp. Change this configuration else all data will be lost
      on machine restart.
    </description>
  </property>
  <property>
    <name>hbase.master.port</name>
    <value>60000</value>
    <description>The port the HBase Master should bind to.</description>
  </property>
  <property>
    <name>hbase.master.info.port</name>
    <value>60010</value>
    <description>The port for the HBase Master web UI.
      Set to -1 if you do not want a UI instance run.</description>
  </property>
</configuration>
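Putting the solution together as shell commands, a minimal sketch (the libexec path is the one given above, while the choice of ~/.bash_profile and adding bin to PATH are assumptions):
# Point HBASE_HOME at the libexec directory, which is where brew keeps the
# real HBase layout (conf/, lib/, bin/); e.g. in ~/.bash_profile
export HBASE_HOME=/usr/local/Cellar/hbase/1.0.0/libexec
export PATH=$PATH:$HBASE_HOME/bin

# Restart HBase, then check the master status UI at http://localhost:60010
start-hbase.sh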

Configuring HCatalog, WebHCat with Hive

I'm installing Hadoop and Hive, to be integrated with WebHCat, which will be used to run Hive queries through Hadoop Map-Reduce jobs.
I installed Hadoop 2.4.1 and Hive 0.13.0 (latest stable versions).
The request I'm sending using the web interface is:
POST: http://localhost:50111/templeton/v1/hive?user.name='hadoop'&statusdir='out'&execute='show tables'
And I got the following response:
{
  "id": "job_local229830426_0001"
}
But in the log webhcat-console-error.log I find that the exit value of this job is 1, which means some error occurred. Tracking this error down, I found: Missing argument for option: hiveconf
This is the webhcat-site.xml, which contains the configuration of WebHCat (previously known as Templeton):
<configuration>
  <property>
    <name>templeton.port</name>
    <value>50111</value>
    <description>The HTTP port for the main server.</description>
  </property>
  <property>
    <name>templeton.hive.path</name>
    <value>/usr/local/hive/bin/hive</value>
    <description>The path to the Hive executable.</description>
  </property>
  <property>
    <name>templeton.hive.properties</name>
    <value>hive.metastore.local=false,hive.metastore.uris=thrift://localhost:9933,hive.metastore.sasl.enabled=false</value>
    <description>Properties to set when running hive.</description>
  </property>
</configuration>
But the command executed is weird, as it has some additional hiveconf parameters with no values:
tool.TrivialExecService: Starting cmd: [/usr/local/hive/bin/hive, --service, cli, --hiveconf, --hiveconf, --hiveconf, hive.metastore.local=false, --hiveconf, hive.metastore.uris=thrift://localhost:9933, --hiveconf, hive.metastore.sasl.enabled=false, -e, show tables]
Any Idea?
