How to map hive warehouse path in hive-site.xml? - hadoop

I'm new to Ubuntu, and I'm trying to install Hive 3.0.0 on my system.
I'm following a tutorial on the internet and came across these commands:
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -mkdir /tmp
These commands are used to store metadata, but I can't find those directories in my filesystem.
So my question is: what do these commands mean?
And how do I map that metadata in hive-site.xml?
The following is the hive-site.xml from my tutorial:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software-->
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in
the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password#123</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>true</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateTables</name>
<value>True</value>
</property>
</configuration>
I'm using MySQL to store the metadata. Please explain any necessary changes.
Thanks in advance!

"This command are used to store metadata"
Not quite. That's the Hive warehouse directory on HDFS. This is where the raw table data lives, not the metadata; the metadata is stored entirely in MySQL.
HDFS is configured by the core-site.xml of the Hadoop client libraries that Hive requires, not by the hive-site.xml file.
"But I can't find those in filesystem."
Probably because those are HDFS paths, not locations on your local filesystem.
"what is the meaning of this commands?"
They create the default warehouse and temporary scratch directories that Hive processes need for storing their data.
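As for mapping the warehouse path in hive-site.xml: the location is controlled by the hive.metastore.warehouse.dir property, and the scratch location by hive.exec.scratchdir. A minimal sketch, assuming you keep the default layout (both values below are Hive's defaults, so the properties can also be left out entirely; the description texts are mine, not copied from the template):

```xml
<!-- hive-site.xml fragment; values are Hive's defaults, adjust if your layout differs -->
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>HDFS directory where managed table data is stored</description>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
    <description>HDFS scratch directory for Hive jobs</description>
  </property>
</configuration>
```

These paths are resolved against the default filesystem (fs.defaultFS in core-site.xml), so they show up with hdfs dfs -ls /user/hive rather than with a local ls.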

Related

Unable to access databases created with hive and run queries on hue

I would like to use Hue as a visualization interface for Hive. HiveServer2 starts fine, and I can work from the command line without problems.
My Hadoop installation is also functional (a single node running on localhost). I managed to configure HDFS access for Hue, and I can easily browse HDFS files in the Hue interface. My big problem for weeks, however, has been running a Hive query from Hue: even though I configured it according to what I found on the internet, I cannot get it to work and I'm stuck.
Your help would be really appreciated.
This is my hive-site.xml:
<?xml version="1.0"?>
<configuration>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/hive_temp</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.execution.engine</name>
<value>mr</value>
<description> Expects one of [mr, tez, spark]. Chooses execution engine. Options are: mr (Map reduce, default)</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true&amp;useSSL=false</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hivepassword</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive_tmp</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission</description>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/user/hive/warehouse</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission</description>
</property>
</configuration>
And this is the Hive configuration in Hue's pseudo-distributed.ini:
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=localhost
# Port where HiveServer2 Thrift server runs on.
hive_server_port=10002
# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/usr/local/hive/conf
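For reference, the port that hive_server_port has to match is the HiveServer2 Thrift port, which hive-site.xml sets through hive.server2.thrift.port. The sketch below shows that property next to hive.server2.webui.port, each with its default value. It rests on an assumption: the hive-site.xml above sets neither property, so the defaults would apply, and 10002 would then be the web UI port rather than the Thrift port. The description texts are illustrative.

```xml
<!-- hive-site.xml fragment: HiveServer2 ports, shown with their default values -->
<configuration>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    <description>Thrift (binary mode) port; the one Hue's hive_server_port should point to</description>
  </property>
  <property>
    <name>hive.server2.webui.port</name>
    <value>10002</value>
    <description>HiveServer2 web UI port; not a Thrift endpoint</description>
  </property>
</configuration>
```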

HiveException java.lang.RuntimeException:

I'm getting this exception when I try to show databases.
vallabh@vallabh:~$ hive
hive> show databases;
FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Following is my .bashrc file
# Set HIVE_HOME
export HIVE_HOME=/opt/apache-hive-3.0.0-bin
export HIVE_CONF_DIR=/opt/apache-hive-3.0.0-bin/conf/
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:/opt/hadoop-3.0.1/lib/*:.
export CLASSPATH=$CLASSPATH:/opt/apache-hive-3.0.0-bin/lib/*:
Following is my hive-env.sh file
# Set HADOOP_HOME to point to a specific hadoop install directory
export HADOOP_HOME=/opt/hadoop-3.0.1
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/opt/apache-hive-3.0.0-bin/conf
Following is my hive-config.sh file
export HADOOP_HOME=/opt/hadoop-3.0.1
Following is my hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software-->
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in
the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password#123</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>true</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateTables</name>
<value>True</value>
</property>
</configuration>
What could be the cause of this error?

How to get the URL for Hive Web Interface

Sorry, this may be a basic question. I tried to google it but couldn't find an exact solution.
I am trying to find the URL of my Hive web interface.
Through it I could check the tables present in Hive. With the web interface URL I could also access the Beeline command-line interface.
I access my company's Hadoop server through PuTTY.
I access the HDFS web interface using
http://ibmlnx01:50070/
However, when I try the URLs below, no web user interface shows up:
http://ibmlnx01:9999/
http://ibmlnx01:10000/
http://0.0.0.0:9999/
http://0.0.0.0:10000
Below is my hive-default.xml.template.
I couldn't copy the whole file, but I copied the main part; I hope it's sufficient.
<property>
<name>hive.hwi.war.file</name>
<value>lib/hive-hwi-0.12.0.war</value>
<description>This sets the path to the HWI war file, relative to ${HIVE_HOME}. </description>
</property>
<property>
<name>hive.hwi.listen.host</name>
<value>0.0.0.0</value>
<description>This is the host address the Hive Web Interface will listen on</description>
</property>
<property>
<name>hive.hwi.listen.port</name>
<value>9999</value>
<description>This is the port the Hive Web Interface will listen on</description>
</property>
Below is my hive-site.xml:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://volgalnx03.ad.infosys.com/metastore_db?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
<description>user name for connecting to mysql server </description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>1234</value>
<description>password for connecting to mysql server </description>
</property>
</configuration>
I connect the PuTTY terminal through the IP address 10.66.82.52, if that is of any help.

Hue (Hadoop User Experience) is the web interface for analyzing data with Hadoop (and not just with Hive).
Hue's standard port is 8888.
However, that value may be different in your installation.
Look for the http_port entry in /etc/hue/conf/hue.ini if 8888 doesn't work for you.

Just appending /hwi helped me ;)
http://host:9999/hwi
Make sure you have run the line below first.
$HIVE_HOME/bin/hive --service hwi
At first I tried to reach the UI at http://host:9999/ and it returned a 404.

Double-check the installation steps at www.tutorialspoint.com/hive/hive_quick_guide.htm.
Then make sure you have a JDK on your server. Finally, run the command:
hive --service hwi
In addition, here is a link describing Gareth's experience in the same situation:
https://dzone.com/articles/hadoop-hive-web-interface

Issues in saving bulk data in HBase in Pseudo-distributed mode

I am setting up CDH4 in pseudo-distributed mode.
I have set up Hadoop and, as suggested in the CDH4 installation guide, have also completed the HDFS demo successfully.
I have also set up Hive and HBase.
To populate HBase, I have written a Java client that writes bulk data into it (around 1M rows in each of 4 tables).
Now I am facing two issues:
1. While the Java client is loading the dummy data into HBase, the regionserver shuts down after around 450,000 rows have been inserted in total.
2. Using Hive, I am not able to access tables created in HBase or, worse, even create tables from the Hive shell. The HBase shell, however, does show me the data and table structure (whatever was generated before the regionserver shut down).
I have seen other posts about the same problem. It seems that the 2nd issue is related to my /etc/hosts or hive-site.xml, so I am pasting the contents of both.
/etc/hosts
198.251.79.225 u17162752.onlinehome-server.com u17162752
198.251.79.225 default-domain.com
198.251.79.225 hbase.zookeeper.quorum localhost
198.251.79.225 cloudera-vm # Added by NetworkManager
127.0.0.1 localhost.localdomain localhost
127.0.1.1 cloudera-vm-local localhost
hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/metastore</value>
<description>the URL of the MySQL database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>mypassword</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://127.0.0.1:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
<property>
<name>hive.support.concurrency</name>
<description>Enable Hive's Table Lock Manager Service</description>
<value>true</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<description>Zookeeper quorum used by Hive's Table Lock Manager</description>
<value>zk1.myco.com,zk2.myco.com,zk3.myco.com</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<description>Zookeeper quorum used by Hive's Table Lock Manager</description>
<value>zk1.myco.com,zk2.myco.com,zk3.myco.com</value>
</property>
<property>
<name>hive.server2.authentication</name>
<value>NOSASL</value>
</property>
</configuration>
These issues are keeping me from completing the task I am supposed to do.
Thanks in advance
Abhiskek
PS: This is my first post to this forum, so apologies for anything inappropriate you might have found! Thanks for bearing with me.
Hi Tariq, thanks for your response. I have somehow managed to get past this. Now I am facing another issue.
I already have 4 tables in HBase, for which I want to create external tables in the Hive shell. But running the create external table commands in the Hive shell gives the following error:
'ERROR: org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in -ROOT- for region .META.,,1.1028785192 containing row'
This error also appears when I do something in the HBase shell.
The other error that comes with the former one in the HBase shell is related to ZooKeeper. Stack trace:
'WARN zookeeper.ZKUtil: catalogtracker-on- org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@6a9a56bf- 0x1413718482c0010 Unable to get data of znode /hbase/unassigned/1028785192
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/unassigned/1028785192'
Please help. Thanks!

Impala cannot find com.mysql.jdbc.Driver

I'm trying to set up Cloudera Impala with CDH4 in pseudo distributed mode on Red Hat 5. I have Hive using JDBC to connect to a MySQL metastore, but I'm having trouble setting up Impala with JDBC. I've been following the instructions found here: http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_impala_jdbc.html
I've extracted the JARs to a directory and included that directory in $CLASSPATH. I've also included /usr/lib/hive/lib in $CLASSPATH, which has mysql-connector-java-5.1.25-bin.jar.
In both my Hive and Impala conf directories, I have hive-site.xml including the following properties:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
But when I run sudo service impala-server restart, the server log has this error:
ERROR common.MetaStoreClientPool: Error initializing Hive Meta Store client
javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
which it says is caused by this:
Caused by: org.datanucleus.store.rdbms.datasource.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
at org.datanucleus.store.rdbms.datasource.dbcp.DBCPDataSourceFactory.makePooledDataSource(DBCPDataSourceFactory.java:80)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initDataSourceTx(ConnectionFactoryImpl.java:144)
... 57 more
Is there any step I'm missing to configure Impala with JDBC?
I fixed this by copying mysql-connector-java-5.1.25-bin.jar to /var/lib/impala. For some reason, the startup script sets the classpath to look there for the connector JAR.
