Running Hive Query in Spark through Oozie 4.1.0.3

Running Hive Query in Spark through Oozie 4.1.0.3 - hadoop

Getting table not found exception while running Hive Query in Spark using Oozie version 4.1.0.3, as java action.
Copied hive-site.xml and hive-default.xml from hdfs path
workflow.xml used:
<start to="scala_java"/>
<action name="scala_java">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>${nameNode}/user/${wf:user()}/${appRoot}/env/devbox/hive- site.xml</job-xml>
<configuration>
<property>
<name>oozie.hive.defaults</name>
<value>${nameNode}/user/${wf:user()}/${appRoot}/env/devbox/hive-default.xml</value>
</property>
<property>
<name>pool.name</name>
<value>${etlPoolName}</value>
</property>
<property>
<name>mapreduce.job.queuename</name>
<value>${QUEUE_NAME}</value>
</property>
</configuration>
<main-class>org.apache.spark.deploy.SparkSubmit</main-class>
<arg>--master</arg>
<arg>yarn-cluster</arg>
<arg>--class</arg>
<arg>HiveFromSparkExample</arg>
<arg>--deploy-mode</arg>
<arg>cluster</arg>
<arg>--queue</arg>
<arg>testq</arg>
<arg>--num-executors</arg>
<arg>64</arg>
<arg>--executor-cores</arg>
<arg>5</arg>
<arg>--jars</arg>
<arg>datanucleus-api-jdo-3.2.6.jar,datanucleus-core-3.2.10.jar,datanucleus- rdbms-3.2.9.jar</arg>
<arg>TEST-0.0.2-SNAPSHOT.jar</arg>
<file>TEST-0.0.2-SNAPSHOT.jar</file>
</java>
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Table not found test_hive_spark_t1)
Exception in thread "Driver" org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found test_hive_spark_t1
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:980)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:79)
at org.apache.spark.sql.hive.HiveContext$$anon$1.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:255)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:137)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:137)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:137)
at org.apache.spark.sql.hive.HiveContext$$anon$1.lookupRelation(HiveContext.scala:255)

A. The X-default config files are just for user information; they are created at install time, from the hard-coded defaults in the JARs.
It's the X-site config files that contain useful information, e.g. how to connect to the Metastore (default for that is "just start an embedded Derby DB with no data inside"... might explain the "table not found message!
B. Hadoop components search for X-site config files in the CLASSPATH; and if they don't find them there, they silently fallback to default.
So you must tell Oozie to download them to local CWD via <file> instructions.
(Except for an explicit Hive Action that uses another, explicit, convention for its specific hive-site but that's not the case here)

hive-default.xml is not needed.
Create a custom hive-site.xml and which has hive.metastore.uris property alone.
Pass the custom hive-site.xml in --files hive-site.xml as spark Arguments.
Remove the job-xml property and oozie-hive-defaults.

Related

Oozie on YARN - oozie is not allowed to impersonate hadoop

I'm trying to use Oozie from Java to start a job on a Hadoop cluster. I have very limited experience with Oozie on Hadoop 1 and now I'm struggling trying out the same thing on YARN.
I'm given a machine that doesn't belong to the cluster, so when I try to start my job I get the following exception:
E0501 : E0501: Could not perform authorization operation, User: oozie is not allowed to impersonate hadoop
Why is that and what to do?
I read a bit about core-site properties that need to be set
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>users</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>master</value>
</property>
Does it seem that this is the problem? Should I contact people responsible for cluster to fix that?
Could there be problems because I'm using same code for YARN as I did for Hadoop 1? Should something be changed? For example, I'm setting nameNode and jobTracker in workflow.xml, should jobTracker exist, since there is now ResourceManager? I have set the address of ResourceManager, but left the property name as jobTracker, could that be the error?
Maybe I should also mention that Ambari is used...

Hi please update the core-site.xml
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
and jobTracker address is the Resourcemananger address that will not be the case . once update the core-site.xml file it will works.

Reason:
Cause of this type of error is- You run oozie server as a hadoop user but you define oozie as a proxy user in core-site.xml file.
Solution:
change the ownership of oozie installation directory to oozie user and run oozie server as a oozie user and problem will be solved.

Error on starting Hbase 1.0.0

I have just installed Hbase through brew install hbase. Edited hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///usr/local/Cellar/hbase/databases/hbase-${user.name}/hbase</value>
<description>The directory shared by region servers and into
which HBase persists. The URL should be 'fully-qualified'
to include the filesystem scheme. For example, to specify the
HDFS directory '/hbase' where the HDFS instance's namenode is
running at namenode.example.org on port 9000, set this value to:
hdfs://namenode.example.org:9000/hbase. By default HBase writes
into /tmp. Change this configuration else all data will be lost
on machine restart.
</description>
</property>
</configuration>
Exported JAVA_HOME and HBASE_HOME.
When i'm trying to start i m getting following exception:
Abhisheks-MacBook-Pro:bin abhishek$ start-hbase.sh
Error: Could not find or load main class org.apache.hadoop.hbase.util.HBaseConfTool
Error: Could not find or load main class org.apache.hadoop.hbase.zookeeper.ZKServerTool
starting master, logging to /usr/local/Cellar/hbase/1.0.0/logs/hbase-abhishek-master-Abhisheks-MacBook-Pro.local.out
Error: Could not find or load main class org.apache.hadoop.hbase.master.HMaster
cat: /usr/local/Cellar/hbase/1.0.0/conf/regionservers: No such file or directory
cat: /usr/local/Cellar/hbase/1.0.0/conf/regionservers: No such file or directory
I have Hadoop2.6.0 and Hbase1.0.0. Though i'm seeing many people have already faced this problem but i cannot find the solution. What else needs to be done to start Hbase without any issue?

Solution:
HBASE_HOME=/usr/local/Cellar/hbase/1.0.0/libexec
it should be configured such that conf folder lies in HBASE_HOME directory.
Checking master-status:
localhost:60010
edit hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///usr/local/Cellar/hbase/databases/hbase-${user.name}/hbase</value>
<description>The directory shared by region servers and into
which HBase persists. The URL should be 'fully-qualified'
to include the filesystem scheme. For example, to specify the
HDFS directory '/hbase' where the HDFS instance's namenode is
running at namenode.example.org on port 9000, set this value to:
hdfs://namenode.example.org:9000/hbase. By default HBase writes
into /tmp. Change this configuration else all data will be lost
on machine restart.
</description>
</property>
<property >
<name>hbase.master.port</name>
<value>60000</value>
<description>The port the HBase Master should bind to.</description>
</property>
<property>
<name>hbase.master.info.port</name>
<value>60010</value>
<description>The port for the HBase Master web UI.
Set to -1 if you do not want a UI instance run.</description>
</property>
</configuration>

Problems trying to load hwi service in hive-1.1.0?

I have been teaching myself to use Hadoop (2.6.0) and associated applications in the case hive-1.1.0. I am run the hwi server using the information on Hadoop for Dummies page 237, but following the instructions there, I keep running into an error message which says the WAR file is not found in hive-1.1.0/lib.
I had to configure $HIVE_HOME/config/hive-site.xml file to point at where this WAR file is in hive-1.1.0/lib but when i run the command to start the hwi server, it does start but then breaks because in running this command, some of the lines in the path (which should come from my definition in hive-site.xml) are duplicated so the command cannot find the WAR file. I am attaching a screenshot of my hive-site.xml file and the results from what happens when I run the command hive --service hwi.
Relevant part of $HIVE_HOME/config/hive-site.xml file
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Hive Execution Parameters -->
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/home/hadoop/Hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.hwi.war.file></name>
<value>$HIVE_HOME/lib/hive-hwi.0.12.0.war</value>
<description> This is the WAR file with the jsp content for Hive Web Interface</description>
</property>
</configuration>
On this version of Hive, there was no WAR file, and I copied the hive-hwi.0.12.0.war from hive-0.12.0 as suggested
Results from the following:
[hadoop#fedora21_2 ~]$ hive --service hwi
15/04/05 15:53:02 INFO hwi.HWIServer: HWI is starting up
15/04/05 15:53:04 WARN conf.HiveConf: HiveConf of name hive.hwi.war.file> does not exist
15/04/05 15:53:04 FATAL hwi.HWIServer: HWI WAR file not found at /home/hadoop/hive-1.1.0/home/hadoop/hive-1.1.0/lib/hive-hwi-0.12.0.war
[hadoop#fedora21_2 ~]$
It looks as if when I ran the command to load the HWI service, somehow the command botched up the path to the WAR file as posted in hive-site.xml. Not sure what I am missing here.

change this property from:
<property>
<name>hive.hwi.war.file</name>
<value>{$HIVE_HOME}/lib/hive-hwi-[version].war</value>
</property>
to:
<property>
<name>hive.hwi.war.file</name>
<value>/lib/hive-hwi-[version].war</value>
</property>
You were having the problem because the final execution path became {$HIVE_HOME}/{$HIVE_HOME}/lib/hive-hwi-[version].war
This happened because you are already at the {$HIVE_HOME} directory while reading the configuration file.
So, if you remove {$HIVE_HOME} from your configuration, you get {$HIVE_HOME}/lib/hive-hwi-[version].war which is the correct path.
in your case, [version] = 0.12.0

make a folder inside hive
home/hadoop/hive-1.1.0/lib
and paste all the files of lib in this, then run the command
hive --service hwi
that will work will the time bug is fixed.

Configuring HCatalog, WebHCat with Hive

I'm installing Hadoop, Hive to be integrated with WebHCat which will be used to run hive queries through it using Map-Reduce jobs of Hadoop.
I installed Hadoop 2.4.1 and Hive 0.13.0 (latest stable versions).
The request I'm sending using the web interface is:
POST: http://localhost:50111/templeton/v1/hive?user.name='hadoop'&statusdir='out'&execute='show tables'
And I got response as the following:
{
"id": "job_local229830426_0001"
}
But in the logs webhcat-console-error.log I find that exit value of this job is 1, which means some error occurred. Tracking this error I found it Missing argument for option: hiveconf
This is the webhcat-site.xml which contains the configurations of webhcat (known previously as templeton):
<configuration>
<property>
<name>templeton.port</name>
<value>50111</value>
<description>The HTTP port for the main server.</description>
</property>
<property>
<name>templeton.hive.path</name>
<value>/usr/local/hive/bin/hive</value>
<description>The path to the Hive executable.</description>
</property>
<property>
<name>templeton.hive.properties</name>
<value>hive.metastore.local=false,hive.metastore.uris=thrift://localhost:9933,hive.metastore.sasl.enabled=false</value>
<description>Properties to set when running hive.</description>
</property>
</configuration>
But the cmd query executed is weird as it have some additional hiveconf parameters with no values:
tool.TrivialExecService: Starting cmd: [/usr/local/hive/bin/hive, --service, cli, --hiveconf, --hiveconf, --hiveconf, hive.metastore.local=false, --hiveconf, hive.metastore.uris=thrift://localhost:9933, --hiveconf, hive.metastore.sasl.enabled=false, -e, show tables]
Any Idea?

oozie Sqoop action fails to import data to hive

I am facing issue while executing oozie sqoop action.
In logs I can see that sqoop is able to import data to temp directory then sqoop creates hive scripts to import data.
It fails while importing temp data to hive.
In logs I am not getting any exception.
Below is a sqoop action I am using.
<workflow-app name="testSqoopLoadWorkflow" xmlns="uri:oozie:workflow:0.4">
<credentials>
<credential name='hive_credentials' type='hcat'>
<property>
<name>hcat.metastore.uri</name>
<value>${HIVE_THRIFT_URL}</value>
</property>
<property>
<name>hcat.metastore.principal</name>
<value>${KERBEROS_PRINCIPAL}</value>
</property>
</credential>
</credentials>
<start to="loadSqoopDataAction"/>
<action name="loadSqoopDataAction" cred="hive_credentials">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>/tmp/hive-oozie-site.xml</job-xml>
<configuration>
<property>
<name>oozie.hive.defaults</name>
<value>/tmp/hive-oozie-site.xml</value>
</property>
</configuration>
<command>job --meta-connect ${SQOOP_METASTORE_URL} --exec TEST_SQOOP_LOAD_JOB</command>
</sqoop>
<ok to="end"/>
<error to="kill"/>
</action>
Below is a sqoop Job I am using to import data.
sqoop job --meta-connect ${SQOOP_METASTORE_URL} --create TEST_SQOOP_LOAD_JOB -- import --connect '${JDBC_URL}' --table testTable -m 1 --append --check-column pkId --incremental append --hive-import --hive-table testHiveTable;
In mapred logs I am getting following exception.
72285 [main] INFO org.apache.sqoop.hive.HiveImport - Loading uploaded data into Hive
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher ends
Please suggest.

This seems like a typical Sqoop import to Hive job. So it seems like Sqoop has successfully imported data in HDFS and is failing to load that data into Hive.
Here's some background on what's happening... Oozie launches a separate job (which will execute on any node in your hadoop cluster) to run the Sqoop command. The Sqoop command starts a separate job to load data into HDFS. Then, at the end of the Sqoop job, sqoop runs a hive script to load that data into Hive.
Since this is theoretically running from any node in your Hadoop cluster, hive CLI will need to be available on each node and talk to the same metastore. The Hive Metastore will need to run in remote mode.
The most normal problem is because Sqoop cannot talk to the correct metastore. The main reasons for this are normally:
Hive metastore service is not running. It should be running in remote mode and a separate service should be started. Here's a quick way to check if its running:
service hive-metastore status
hive-site.xml does not contain hive.metastore.uris. Here's an example hive-site.xml with hive.metastore.uris set:
<configuration>
...
<property>
<name>hive.metastore.uris</name>
<value>thrift://sqoop2.example.com:9083</value>
</property>
...
</configuration>
hive-site.xml is not included in your Sqoop action (or its properties). Try adding your hive-site.xml to a <file> element in your Sqoop action. Here's an example workflow.xml with <file> in it:
<workflow-app name="sqoop-to-hive" xmlns="uri:oozie:workflow:0.4">
...
<action name="sqoop2hive">
...
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
...
<file>/tmp/hive-site.xml#hive-site.xml</file>
</sqoop>
...
</action>
...
</workflow-app>

This seems to be a bug in Sqoop. Am not sure about the JIRA#. Hortonworks mentioned that the issue is still not resolved even in HDP 2.2 version.

#abeaamase - I want try to use your solution.
Just want to check if below solution works good for sqoop + Hive import in one single oozie job?
...
...
...
/tmp/hive-site.xml#hive-site.xml
...
...

If you are using cdh then problem may be due to hive metastore jar dependency conflicts.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Running Hive Query in Spark through Oozie 4.1.0.3 - hadoop

hive-default.xml is not needed. Create a custom hive-site.xml and which has hive.metastore.uris property alone. Pass the custom hive-site.xml in --files hive-site.xml as spark Arguments. Remove the job-xml property and oozie-hive-defaults.

Related

Oozie on YARN - oozie is not allowed to impersonate hadoop

Error on starting Hbase 1.0.0

Problems trying to load hwi service in hive-1.1.0?

Configuring HCatalog, WebHCat with Hive

oozie Sqoop action fails to import data to hive

Categories

Resources