java.net.URISyntaxException when starting HIVE - hadoop

I am new to Hive.
I have already set up Hadoop and it works well, and now I want to set up Hive.
When I start Hive, it shows the following error:
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
Are there any solutions?

Put the following at the beginning of hive-site.xml
<property>
<name>system:java.io.tmpdir</name>
<value>/tmp/hive/java</value>
</property>
<property>
<name>system:user.name</name>
<value>${user.name}</value>
</property>
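With those properties in place, a quick way to retry (a small sketch using the /tmp/hive/java value above; Hive normally creates the directory itself, pre-creating it is just a sanity check):
mkdir -p /tmp/hive/java
hive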

Change these properties in hive-site.xml:
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive-${user.name}</value>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/${user.name}</value>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/tmp/${user.name}_resources</value>
</property>
<property>
<name>hive.scratch.dir.permission</name>
<value>733</value>
</property>
Then restart the Hive metastore and HiveServer2.
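A sketch of the follow-up steps, assuming the values above and plain hive --service invocations ($USER stands in for ${user.name}; managed distributions usually have their own service scripts):
# create the HDFS scratch dir referenced by hive.exec.scratchdir
hdfs dfs -mkdir -p /tmp/hive-$USER
hdfs dfs -chmod 733 /tmp/hive-$USER
# restart the metastore and HiveServer2
hive --service metastore &
hive --service hiveserver2 &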

I figured it out myself.
In hive-site.xml, replace ${system:java.io.tmpdir}/${system:user.name} with /tmp/mydir, as described at https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration.
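One way to make that replacement in bulk is with sed (GNU sed shown; on macOS use sed -i ''):
cp hive-site.xml hive-site.xml.bak
sed -i 's|\${system:java.io.tmpdir}/\${system:user.name}|/tmp/mydir|g' hive-site.xml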

Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
system:java.io.tmpdir - path
system:user.name - username
The above are system-level properties that have to be set by the user; the hive-site template does not provide them, so they require manual configuration.
Set them in hive-site.xml using property tags with name/value pairs. The temp location itself is up to the user, for example:
<property>
<name>system:java.io.tmpdir</name>
<value>/user/local/hive/tmp/java</value>
</property>
<property>
<name>system:user.name</name>
<value>${user.name}</value>
</property>
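After restarting, you can confirm from the Hive CLI that the variables now resolve; SET prints a property's current value:
hive
hive> SET system:java.io.tmpdir;
hive> SET system:user.name;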

Add this property in hive-site.xml:
<configuration>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>Will remove your error occurring because of metastore_db in shark</description>
</property>
</configuration>
Add the Hadoop and Hive paths in hive-env.sh according to your system:
# Set HADOOP_HOME to point to a specific hadoop install directory
export HADOOP_HOME=/home/user17/BigData/hadoop
#hive
export HIVE_HOME=/home/user17/BigData/hive
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=$HIVE_HOME/conf
Also set the Java, Hadoop, and Hive paths in .bashrc:
export JAVA_HOME=/home/user17/jdk
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_INSTALL=/home/user17/BigData/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HIVE_INSTALL=/home/user17/BigData/hive
export PATH=$PATH:$HIVE_INSTALL/bin
Note: all of these paths are set according to my system; set them according to your own system.
Let me know if it does not work.
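A quick way to check that the environment is wired up (paths follow the example above):
source ~/.bashrc
echo $HADOOP_INSTALL $HIVE_INSTALL
which hadoop hive
hadoop version
hive --version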

I too encountered the same error, while starting HMaster for HBase.
It was corrected by specifying the path to the directory on HDFS where you want to store HBase data, in the hbase.rootdir property of hbase-site.xml.
Earlier I had specified only the filesystem URI without a directory path.
path causing exception : hdfs://localhost:8020
correct path : hdfs://localhost:8020/hbase
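Before restarting HMaster, a quick sanity check (assuming the localhost:8020 NameNode from the example; HBase creates the /hbase directory itself):
# confirm HDFS is reachable at the address used in hbase.rootdir
hdfs dfs -ls hdfs://localhost:8020/
# then restart HBase
start-hbase.sh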

Also update the local /tmp absolute temporary paths in hive-site.xml, as they are not picked up automatically; I added them manually for the properties hive.exec.local.scratchdir and hive.downloaded.resources.dir:
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/${user.name}</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/tmp/${hive.session.id}_resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
Now it's working.

Related

Hadoop name node format warning

When I execute the command
bin/hadoop namenode -format
on Linux, I get the warning below:
WARN common.Util: Path /data/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
The NameNode directory setting in hdfs-site.xml is
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/dfs/name</value>
<final>true</final>
</property>
When I changed it to
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/dfs/name</value>
<final>true</final>
</property>
the warning disappeared. So what is the meaning of "file://", and why should we add it there?
It's a known issue, https://issues.apache.org/jira/browse/HADOOP-15772, fixed in this commit: https://github.com/apache/hadoop/commit/2eb597b1511f8f46866abe4eeec820f4191cc295
(The file:// prefix is simply the URI scheme for the local filesystem, so file:///data/dfs/name is the same local path written as a fully qualified URI.)
You don't need to worry if you hit this issue; it is perfectly fine to ignore the warning.
The description goes like this:
The following warnings are logged on service startup and are noise. It is perfectly valid to list local paths without using the URI syntax.
2018-09-16 23:16:11,393 WARN common.Util (Util.java:stringAsURI(99)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
Also, the log level for this message was changed from WARN to INFO:
Assuming 'file' scheme for path /hadoop/hdfs/namenode in configuration.
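If you want to see which value the configuration actually resolves to, a quick check (not required, just a sanity check) is:
# prints the configured value, e.g. file:///data/dfs/name after adding the scheme
hdfs getconf -confKey dfs.namenode.name.dir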

Running Hive Query in Spark through Oozie 4.1.0.3

I am getting a "table not found" exception while running a Hive query in Spark through Oozie version 4.1.0.3, as a java action.
I copied hive-site.xml and hive-default.xml from the HDFS path.
workflow.xml used:
<start to="scala_java"/>
<action name="scala_java">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>${nameNode}/user/${wf:user()}/${appRoot}/env/devbox/hive-site.xml</job-xml>
<configuration>
<property>
<name>oozie.hive.defaults</name>
<value>${nameNode}/user/${wf:user()}/${appRoot}/env/devbox/hive-default.xml</value>
</property>
<property>
<name>pool.name</name>
<value>${etlPoolName}</value>
</property>
<property>
<name>mapreduce.job.queuename</name>
<value>${QUEUE_NAME}</value>
</property>
</configuration>
<main-class>org.apache.spark.deploy.SparkSubmit</main-class>
<arg>--master</arg>
<arg>yarn-cluster</arg>
<arg>--class</arg>
<arg>HiveFromSparkExample</arg>
<arg>--deploy-mode</arg>
<arg>cluster</arg>
<arg>--queue</arg>
<arg>testq</arg>
<arg>--num-executors</arg>
<arg>64</arg>
<arg>--executor-cores</arg>
<arg>5</arg>
<arg>--jars</arg>
<arg>datanucleus-api-jdo-3.2.6.jar,datanucleus-core-3.2.10.jar,datanucleus-rdbms-3.2.9.jar</arg>
<arg>TEST-0.0.2-SNAPSHOT.jar</arg>
<file>TEST-0.0.2-SNAPSHOT.jar</file>
</java>
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Table not found test_hive_spark_t1)
Exception in thread "Driver" org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found test_hive_spark_t1
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:980)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:79)
at org.apache.spark.sql.hive.HiveContext$$anon$1.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:255)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:137)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:137)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:137)
at org.apache.spark.sql.hive.HiveContext$$anon$1.lookupRelation(HiveContext.scala:255)
A. The X-default config files are just for user information; they are created at install time, from the hard-coded defaults in the JARs.
It's the X-site config files that contain useful information, e.g. how to connect to the Metastore (the default is "just start an embedded Derby DB with no data inside"... which might explain the "table not found" message!).
B. Hadoop components search for X-site config files in the CLASSPATH; and if they don't find them there, they silently fallback to default.
So you must tell Oozie to download them to local CWD via <file> instructions.
(Except for an explicit Hive Action that uses another, explicit, convention for its specific hive-site but that's not the case here)
hive-default.xml is not needed.
Create a custom hive-site.xml that has the hive.metastore.uris property alone.
Pass the custom hive-site.xml via --files hive-site.xml in the Spark arguments.
Remove the job-xml element and the oozie.hive.defaults property.
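As a sketch, the two pieces might look like this; the thrift host and port are placeholders for your metastore, and the class and jar names are taken from the workflow above:
# minimal custom hive-site.xml carrying only the metastore URI
cat > hive-site.xml <<'EOF'
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
EOF
# equivalent spark-submit arguments, shipping the file via --files
spark-submit --master yarn-cluster --deploy-mode cluster \
  --class HiveFromSparkExample \
  --files hive-site.xml \
  --jars datanucleus-api-jdo-3.2.6.jar,datanucleus-core-3.2.10.jar,datanucleus-rdbms-3.2.9.jar \
  TEST-0.0.2-SNAPSHOT.jar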

default.fs.name and hive.metastore.warehouse.dir do not conflict

When I try to run the command below
LOAD DATA INPATH '/data' INTO TABLE Tablename;
in the Hive shell, it throws the following error:
Move from: hdfs://hadoopcluster/data to: file:/user/hive/warehouse/Tablename is not valid. Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict.
My default.fs.name property is
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoopcluster</value>
</property>
and my hive.metastore.warehouse.dir is
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
Can anyone help me with this?
This is because you are using the "local" storage location /user/hive/warehouse for your Hive warehouse, which conflicts with the defaultFS (as far as Hive is concerned).
Do you mean to be using "local" storage, or HDFS?
To use HDFS for the Hive warehouse setting, you need to specify the full HDFS URI for that storage:
hdfs://hadoopcluster/user/hive/warehouse
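A sketch of checking that setup from the command line, using the values from the question:
# fs.defaultFS should resolve to hdfs://hadoopcluster
hdfs getconf -confKey fs.defaultFS
# create the warehouse directory; without a scheme it is resolved against defaultFS,
# i.e. hdfs://hadoopcluster/user/hive/warehouse
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -chmod g+w /user/hive/warehouse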

Error on starting Hbase 1.0.0

I have just installed HBase with brew install hbase and edited hbase-site.xml:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///usr/local/Cellar/hbase/databases/hbase-${user.name}/hbase</value>
<description>The directory shared by region servers and into
which HBase persists. The URL should be 'fully-qualified'
to include the filesystem scheme. For example, to specify the
HDFS directory '/hbase' where the HDFS instance's namenode is
running at namenode.example.org on port 9000, set this value to:
hdfs://namenode.example.org:9000/hbase. By default HBase writes
into /tmp. Change this configuration else all data will be lost
on machine restart.
</description>
</property>
</configuration>
I exported JAVA_HOME and HBASE_HOME.
When I try to start HBase, I get the following errors:
Abhisheks-MacBook-Pro:bin abhishek$ start-hbase.sh
Error: Could not find or load main class org.apache.hadoop.hbase.util.HBaseConfTool
Error: Could not find or load main class org.apache.hadoop.hbase.zookeeper.ZKServerTool
starting master, logging to /usr/local/Cellar/hbase/1.0.0/logs/hbase-abhishek-master-Abhisheks-MacBook-Pro.local.out
Error: Could not find or load main class org.apache.hadoop.hbase.master.HMaster
cat: /usr/local/Cellar/hbase/1.0.0/conf/regionservers: No such file or directory
cat: /usr/local/Cellar/hbase/1.0.0/conf/regionservers: No such file or directory
I have Hadoop 2.6.0 and HBase 1.0.0. Although I see that many people have already faced this problem, I cannot find the solution. What else needs to be done to start HBase without any issue?
Solution: set
HBASE_HOME=/usr/local/Cellar/hbase/1.0.0/libexec
It should be configured so that the conf folder lies inside the HBASE_HOME directory (see the sketch after the configuration below).
Check the master status at:
localhost:60010
Then edit hbase-site.xml:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///usr/local/Cellar/hbase/databases/hbase-${user.name}/hbase</value>
<description>The directory shared by region servers and into
which HBase persists. The URL should be 'fully-qualified'
to include the filesystem scheme. For example, to specify the
HDFS directory '/hbase' where the HDFS instance's namenode is
running at namenode.example.org on port 9000, set this value to:
hdfs://namenode.example.org:9000/hbase. By default HBase writes
into /tmp. Change this configuration else all data will be lost
on machine restart.
</description>
</property>
<property >
<name>hbase.master.port</name>
<value>60000</value>
<description>The port the HBase Master should bind to.</description>
</property>
<property>
<name>hbase.master.info.port</name>
<value>60010</value>
<description>The port for the HBase Master web UI.
Set to -1 if you do not want a UI instance run.</description>
</property>
</configuration>
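With HBASE_HOME pointing at libexec, the start-up steps might look like this (paths follow the brew layout above):
export HBASE_HOME=/usr/local/Cellar/hbase/1.0.0/libexec
ls $HBASE_HOME/conf            # regionservers and hbase-site.xml should be here
$HBASE_HOME/bin/start-hbase.sh
# then open http://localhost:60010 for the master web UI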

Tachyon configuration for s3 under filesystem

I am trying to set up Tachyon on S3 filesystem. For HDFS, tachyon has a parameter called TACHYON_UNDERFS_HDFS_IMPL which is set to "org.apache.hadoop.hdfs.DistributedFileSystem". Does anyone know if such a parameter exists for S3? If so, what is its value?
Thanks in advance for any help!
The Hadoop FS type you mention (org.apache.hadoop.hdfs.DistributedFileSystem) is just the HDFS implementation of the FileSystem interface; there is no equivalent parameter you need to set for S3. Instead, Tachyon picks the s3n FileSystem implementation based on the scheme in the URI of the remote under-filesystem, which is configured with TACHYON_UNDERFS_ADDRESS.
For Amazon, you will need to specify something like this:
export TACHYON_UNDERFS_ADDRESS=s3n://your_bucket
Note "s3n", not "s3" here.
Additional setup you will need to work with S3 (see also Error in setting up Tachyon on S3 under filesystem and http://tachyon-project.org/Setup-UFS.html):
In ${TACHYON}/bin/tachyon-env.sh, add the key id and the secret key to TACHYON_JAVA_OPTS:
-Dfs.s3n.awsAccessKeyId=123
-Dfs.s3n.awsSecretAccessKey=456
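For example, these options might be appended to the existing TACHYON_JAVA_OPTS definition like this (the keys are the placeholder values above):
export TACHYON_JAVA_OPTS="$TACHYON_JAVA_OPTS -Dfs.s3n.awsAccessKeyId=123 -Dfs.s3n.awsSecretAccessKey=456"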
Publish the extra dependencies required by the s3n Hadoop FileSystem implementation (their versions depend on the version of Hadoop installed): commons-httpclient-* and jets3t-*.
For that, set TACHYON_CLASSPATH as mentioned in one of the links above. This can be done by adding an export of TACHYON_CLASSPATH in ${TACHYON}/libexec/tachyon-config.sh before CLASSPATH is exported:
export TACHYON_CLASSPATH=~/.m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar:~/.m2/repository/net/java/dev/jets3t/jets3t/0.9.0/jets3t-0.9.0.jar
export CLASSPATH="$TACHYON_CONF_DIR/:$TACHYON_JAR:$TACHYON_CLASSPATH"
Start Tachyon cluster:
./bin/tachyon format
./bin/tachyon-start.sh local
Check its availability via web interface:
http://localhost:19999/
or in the logs under:
${TACHYON}/logs
Your core-site.xml should contain the following sections to make sure you are integrated with Tachyon (see the Spark reference http://tachyon-project.org/Running-Spark-on-Tachyon.html for configuring this directly from Scala):
fs.defaultFS - specify the Tachyon master host-port (below are defaults)
fs.default.name - default name of fs, the same as before
fs.tachyon.impl - Tachyon's hadoop.FileSystem implementation hint
fs.s3n.awsAccessKeyId - Amazon key id
fs.s3n.awsSecretAccessKey - Amazon secret key
<configuration>
<property>
<name>fs.defaultFS</name>
<value>tachyon://localhost:19998</value>
</property>
<property>
<name>fs.default.name</name>
<value>tachyon://localhost:19998</value>
<description>The name of the default file system. A URI
whose scheme and authority determine the
FileSystem implementation.
</description>
</property>
<property>
<name>fs.tachyon.impl</name>
<value>tachyon.hadoop.TFS</value>
</property>
...
<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>123</value>
</property>
<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>345</value>
</property>
...
</configuration>
Refer to any path using the tachyon scheme and the master host and port:
tachyon://master_host:master_port/path
Example with default Tachyon master host-port:
tachyon://localhost:19998/remote_dir/remote_file.csv
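With fs.tachyon.impl registered as above, such a path can also be reached through the standard Hadoop CLI, assuming the Tachyon client jar is on the Hadoop classpath:
hadoop fs -ls tachyon://localhost:19998/remote_dir/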
