default.fs.name and hive.metastore.warehouse.dir do not conflict - hadoop

Hi, when I run the command below in the Hive shell:
LOAD DATA INPATH '/data' INTO TABLE Tablename;
it throws the following error:
Move from: hdfs://hadoopcluster/data to: file:/user/hive/warehouse/Tablename is not valid. Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict.
My fs.defaultFS property (which the error message calls "default.fs.name") is:
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoopcluster</value>
</property>
and my hive.metastore.warehouse.dir property is:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
Can anyone help me with this?

This happens because Hive resolves your warehouse location /user/hive/warehouse as "local" storage, which conflicts with fs.defaultFS. The error shows this: the target is file:/user/hive/warehouse/Tablename.
Do you intend to use "local" storage, or HDFS?
To keep the Hive warehouse in HDFS, specify the full HDFS URI for that storage in hive.metastore.warehouse.dir:
hdfs://hadoopcluster/user/hive/warehouse
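As a sketch, the corresponding hive-site.xml entry would look like this. It reuses the asker's nameservice hadoopcluster from fs.defaultFS, so only the value differs from the original configuration:

```xml
<property>
<name>hive.metastore.warehouse.dir</name>
<!-- Fully qualified: hdfs:// scheme + nameservice from fs.defaultFS + warehouse path -->
<value>hdfs://hadoopcluster/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
```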

Related

unable to connect to sparkSQL

I am using a remote MySQL metastore for Hive. When I run the Hive client, it works perfectly. But when I try to use Spark SQL, via either spark-shell or spark-submit, I cannot connect to Hive and get the following error:
Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.derby.jdbc.EmbeddedDriver
I don't understand why Spark tries to connect to a Derby database when I am using a MySQL database for the metastore.
I am using Apache Spark 1.3 and Cloudera CDH 5.4.8.
It seems Spark is using the default Hive settings. Follow these steps:
Copy hive-site.xml (or create a symlink to it) in your SPARK_HOME/conf folder.
Add the Hive lib path to the classpath in SPARK_HOME/conf/spark-env.sh.
Restart the Spark cluster for everything to take effect.
Does your hive-site.xml contain the location of the MySQL metastore? If not, add the following properties and restart spark-shell:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://MYSQL_HOST:3306/hive_{version}</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>XXXXXXXX</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>XXXXXXXX</value>
<description>Password to use against metastore database</description>
</property>

Running Hive Query in Spark through Oozie 4.1.0.3

I'm getting a "Table not found" exception when running a Hive query in Spark through Oozie 4.1.0.3, as a java action.
I copied hive-site.xml and hive-default.xml from an HDFS path.
The workflow.xml used:
<start to="scala_java"/>
<action name="scala_java">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>${nameNode}/user/${wf:user()}/${appRoot}/env/devbox/hive-site.xml</job-xml>
<configuration>
<property>
<name>oozie.hive.defaults</name>
<value>${nameNode}/user/${wf:user()}/${appRoot}/env/devbox/hive-default.xml</value>
</property>
<property>
<name>pool.name</name>
<value>${etlPoolName}</value>
</property>
<property>
<name>mapreduce.job.queuename</name>
<value>${QUEUE_NAME}</value>
</property>
</configuration>
<main-class>org.apache.spark.deploy.SparkSubmit</main-class>
<arg>--master</arg>
<arg>yarn-cluster</arg>
<arg>--class</arg>
<arg>HiveFromSparkExample</arg>
<arg>--deploy-mode</arg>
<arg>cluster</arg>
<arg>--queue</arg>
<arg>testq</arg>
<arg>--num-executors</arg>
<arg>64</arg>
<arg>--executor-cores</arg>
<arg>5</arg>
<arg>--jars</arg>
<arg>datanucleus-api-jdo-3.2.6.jar,datanucleus-core-3.2.10.jar,datanucleus-rdbms-3.2.9.jar</arg>
<arg>TEST-0.0.2-SNAPSHOT.jar</arg>
<file>TEST-0.0.2-SNAPSHOT.jar</file>
</java>
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Table not found test_hive_spark_t1)
Exception in thread "Driver" org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found test_hive_spark_t1
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:980)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:79)
at org.apache.spark.sql.hive.HiveContext$$anon$1.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:255)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:137)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:137)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:137)
at org.apache.spark.sql.hive.HiveContext$$anon$1.lookupRelation(HiveContext.scala:255)
A. The X-default config files are just for user information; they are created at install time, from the hard-coded defaults in the JARs.
It's the X-site config files that contain the useful information, e.g. how to connect to the Metastore. The default for that is "just start an embedded Derby DB with no data inside", which might explain the "table not found" message!
B. Hadoop components search for X-site config files in the CLASSPATH; if they don't find them there, they silently fall back to the defaults.
So you must tell Oozie to download them to the local working directory (CWD) via <file> elements.
(The exception is an explicit Hive action, which uses a different, explicit convention for its hive-site.xml, but that's not the case here.)
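Point B could look roughly like this for the java action in the question. This is a sketch that assumes hive-site.xml stays at the HDFS path the question's job-xml element already references. The <file> element ships the file into the action's working directory. The #hive-site.xml suffix names the symlink created there. In the Oozie java action schema, <file> elements come after the <arg> elements.

```xml
<java>
<!-- job-tracker, name-node, configuration, main-class and arg elements unchanged from the workflow above -->
<file>${nameNode}/user/${wf:user()}/${appRoot}/env/devbox/hive-site.xml#hive-site.xml</file>
<file>TEST-0.0.2-SNAPSHOT.jar</file>
</java>
```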
hive-default.xml is not needed.
Create a custom hive-site.xml that contains only the hive.metastore.uris property.
Pass the custom hive-site.xml to Spark with the --files hive-site.xml argument.
Remove the job-xml element and the oozie.hive.defaults property.
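The steps above might look roughly like the following sketch. METASTORE_HOST is a hypothetical placeholder. Port 9083 is assumed only because it is the usual default for the Hive metastore Thrift service, so substitute your own metastore URI. The relative <file> path assumes hive-site.xml is placed in the workflow application directory, the same way the question ships TEST-0.0.2-SNAPSHOT.jar.

```xml
<!-- Custom hive-site.xml: only tells Spark where the remote metastore is (host/port are placeholders) -->
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://METASTORE_HOST:9083</value>
</property>
</configuration>

<!-- In the java action: spark-submit options must precede the application JAR argument -->
<arg>--files</arg>
<arg>hive-site.xml</arg>
<arg>TEST-0.0.2-SNAPSHOT.jar</arg>
<file>hive-site.xml</file>
<file>TEST-0.0.2-SNAPSHOT.jar</file>
```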

Error on starting Hbase 1.0.0

I have just installed HBase with brew install hbase and edited hbase-site.xml:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///usr/local/Cellar/hbase/databases/hbase-${user.name}/hbase</value>
<description>The directory shared by region servers and into
which HBase persists. The URL should be 'fully-qualified'
to include the filesystem scheme. For example, to specify the
HDFS directory '/hbase' where the HDFS instance's namenode is
running at namenode.example.org on port 9000, set this value to:
hdfs://namenode.example.org:9000/hbase. By default HBase writes
into /tmp. Change this configuration else all data will be lost
on machine restart.
</description>
</property>
</configuration>
I exported JAVA_HOME and HBASE_HOME.
When I try to start HBase, I get the following errors:
Abhisheks-MacBook-Pro:bin abhishek$ start-hbase.sh
Error: Could not find or load main class org.apache.hadoop.hbase.util.HBaseConfTool
Error: Could not find or load main class org.apache.hadoop.hbase.zookeeper.ZKServerTool
starting master, logging to /usr/local/Cellar/hbase/1.0.0/logs/hbase-abhishek-master-Abhisheks-MacBook-Pro.local.out
Error: Could not find or load main class org.apache.hadoop.hbase.master.HMaster
cat: /usr/local/Cellar/hbase/1.0.0/conf/regionservers: No such file or directory
cat: /usr/local/Cellar/hbase/1.0.0/conf/regionservers: No such file or directory
I have Hadoop 2.6.0 and HBase 1.0.0. Many people seem to have faced this problem already, but I cannot find a solution. What else needs to be done to start HBase without any issues?
Solution:
HBASE_HOME=/usr/local/Cellar/hbase/1.0.0/libexec
HBASE_HOME should be set so that the conf folder lies directly inside the HBASE_HOME directory.
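Check the master status at:
localhost:60010
(HBase 1.0 uses port 16010 for the master web UI by default; the configuration below sets it to 60010.)
Edit hbase-site.xml as follows: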
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///usr/local/Cellar/hbase/databases/hbase-${user.name}/hbase</value>
<description>The directory shared by region servers and into
which HBase persists. The URL should be 'fully-qualified'
to include the filesystem scheme. For example, to specify the
HDFS directory '/hbase' where the HDFS instance's namenode is
running at namenode.example.org on port 9000, set this value to:
hdfs://namenode.example.org:9000/hbase. By default HBase writes
into /tmp. Change this configuration else all data will be lost
on machine restart.
</description>
</property>
<property>
<name>hbase.master.port</name>
<value>60000</value>
<description>The port the HBase Master should bind to.</description>
</property>
<property>
<name>hbase.master.info.port</name>
<value>60010</value>
<description>The port for the HBase Master web UI.
Set to -1 if you do not want a UI instance run.</description>
</property>
</configuration>

java.net.URISyntaxException when starting HIVE

I am new to Hive.
I have already set up Hadoop and it works well, and now I want to set up Hive.
When I start Hive, it shows the following error:
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
Are there any solutions?
Put the following at the beginning of hive-site.xml:
<property>
<name>system:java.io.tmpdir</name>
<value>/tmp/hive/java</value>
</property>
<property>
<name>system:user.name</name>
<value>${user.name}</value>
</property>
Change these properties in hive-site.xml:
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive-${user.name}</value>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/${user.name}</value>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/tmp/${user.name}_resources</value>
</property>
<property>
<name>hive.scratch.dir.permission</name>
<value>733</value>
</property>
Then restart the Hive metastore and HiveServer2.
I figured it out myself.
In hive-site.xml, replace ${system:java.io.tmpdir}/${system:user.name} with /tmp/mydir, as described in https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration.
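For example, hive-default.xml.template sets several values to ${system:java.io.tmpdir}/${system:user.name}, and hive.exec.local.scratchdir is one of them. After the replacement, that entry would read roughly as follows. /tmp/mydir is just the example directory from the answer. Any other value that still contains ${system:java.io.tmpdir} would presumably need the same treatment.

```xml
<property>
<name>hive.exec.local.scratchdir</name>
<!-- previously ${system:java.io.tmpdir}/${system:user.name}, which Hive failed to expand -->
<value>/tmp/mydir</value>
<description>Local scratch space for Hive jobs</description>
</property>
```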
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
system:java.io.tmpdir - path
system:user.name - username
These are system-level properties that need to be set by the user. The hive-site template doesn't provide them, so they require manual configuration.
Set them in hive-site.xml using <property> elements with name/value pairs; the location of the temp directory is up to the user:
<property>
<name>system:java.io.tmpdir</name>
<value>/user/local/hive/tmp/java</value>
</property>
<property>
<name>system:user.name</name>
<value>${user.name}</value>
</property>
Add this property to hive-site.xml:
<configuration>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>Will remove your error occurring because of metastore_db in shark</description>
</property>
</configuration>
Add the Hadoop and Hive paths to hive-env.sh according to your system:
# Set HADOOP_HOME to point to a specific hadoop install directory
export HADOOP_HOME=/home/user17/BigData/hadoop
#hive
export HIVE_HOME=/home/user17/BigData/hive
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=$HIVE_HOME/conf
Also set the Java, Hadoop and Hive paths in .bashrc:
export JAVA_HOME=/home/user17/jdk
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_INSTALL=/home/user17/BigData/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HIVE_INSTALL=/home/user17/BigData/hive
export PATH=$PATH:$HIVE_INSTALL/bin
Note: all of these file paths are set according to my system; use the paths that match your own system.
Let me know if it doesn't work.
I also encountered the same error while starting HMaster for HBase.
I fixed it by setting the hbase.rootdir property in hbase-site.xml to the HDFS directory where HBase data should be stored.
Earlier I had given only the NameNode URI, with no directory path.
Path causing the exception: hdfs://localhost:8020
Correct path: hdfs://localhost:8020/hbase
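The corrected hbase-site.xml entry from this answer would look like the sketch below. The hdfs://localhost:8020 NameNode address is the answerer's, so use the host and port of your own NameNode.

```xml
<property>
<name>hbase.rootdir</name>
<!-- Full URI including a directory path; hdfs://localhost:8020 alone caused the exception -->
<value>hdfs://localhost:8020/hbase</value>
</property>
```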
Also set an absolute local temporary path (under /tmp) in hive-site.xml, because it is not picked up automatically. I added it manually for the hive.exec.local.scratchdir and hive.downloaded.resources.dir properties:
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/${user.name}</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/tmp/${hive.session.id}_resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
Now it's working.

Configuring HCatalog, WebHCat with Hive

I'm installing Hadoop and Hive, integrated with WebHCat, which will be used to run Hive queries as Hadoop MapReduce jobs.
I installed Hadoop 2.4.1 and Hive 0.13.0 (latest stable versions).
The request I'm sending using the web interface is:
POST: http://localhost:50111/templeton/v1/hive?user.name='hadoop'&statusdir='out'&execute='show tables'
And I got the following response:
{
"id": "job_local229830426_0001"
}
But in the log file webhcat-console-error.log, I find that the exit value of this job is 1, which means an error occurred. Tracking it down, I found this error: Missing argument for option: hiveconf
This is the webhcat-site.xml, which contains the WebHCat configuration (WebHCat was previously known as Templeton):
<configuration>
<property>
<name>templeton.port</name>
<value>50111</value>
<description>The HTTP port for the main server.</description>
</property>
<property>
<name>templeton.hive.path</name>
<value>/usr/local/hive/bin/hive</value>
<description>The path to the Hive executable.</description>
</property>
<property>
<name>templeton.hive.properties</name>
<value>hive.metastore.local=false,hive.metastore.uris=thrift://localhost:9933,hive.metastore.sasl.enabled=false</value>
<description>Properties to set when running hive.</description>
</property>
</configuration>
But the executed command looks odd: it contains additional --hiveconf parameters with no values:
tool.TrivialExecService: Starting cmd: [/usr/local/hive/bin/hive, --service, cli, --hiveconf, --hiveconf, --hiveconf, hive.metastore.local=false, --hiveconf, hive.metastore.uris=thrift://localhost:9933, --hiveconf, hive.metastore.sasl.enabled=false, -e, show tables]
Any ideas?
