I have a scenario where I need to read a properties file (.xml) and update the value in a profile file (e.g. .txt or .xml).
There will also be multiple profile files, created with incrementing numbers in their names; I need to read the properties-file value based on the name of the profile file and copy the value over.
Properties file (.xml):
<Properties>
<Node>
<AppnodeName>Node1</AppnodeName>
<Property>
<Name>//I-080_Control_CostCenter_test//FT.DEPLOYMENT.NAME</Name>
<Value>Value1</Value>
</Property>
<Property>
<Name>//I-080_Control_CostCenter_test//FT.DEPLOYMENT.VERSION</Name>
<Value>Value11</Value>
</Property>
</Node>
<Node>
<AppnodeName>Node2</AppnodeName>
<Property>
<Name>//I-080_Control_CostCenter_test//FT.DEPLOYMENT.NAME</Name>
<Value>Value2</Value>
</Property>
<Property>
<Name>//I-080_Control_CostCenter_test//FT.DEPLOYMENT.VERSION</Name>
<Value>Value12</Value>
</Property>
</Node>
</Properties>
====================================================
Profile file:
The profile file will be similar; I need to search for the AppnodeName and Name, and update the corresponding Value.
Please help me.
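A minimal sketch of one way to do this in Python, assuming the profile file uses the same <Property>/<Name>/<Value> layout as the properties file and that the profile file name embeds the node name (e.g. Node1_profile.xml); the file names and the naming rule here are assumptions to adapt:

import re
import xml.etree.ElementTree as ET

def load_properties(properties_path, node_name):
    """Return {property name: value} for the <Node> whose AppnodeName matches."""
    root = ET.parse(properties_path).getroot()
    for node in root.findall("Node"):
        if node.findtext("AppnodeName") == node_name:
            return {p.findtext("Name"): p.findtext("Value")
                    for p in node.findall("Property")}
    return {}

def update_profile(profile_path, properties_path):
    # Assumed naming rule: the profile file name embeds the node name, e.g. "Node1".
    match = re.search(r"(Node\d+)", profile_path)
    if match is None:
        raise ValueError("cannot infer node name from " + profile_path)
    values = load_properties(properties_path, match.group(1))
    tree = ET.parse(profile_path)
    for prop in tree.getroot().iter("Property"):
        name = prop.findtext("Name")
        if name in values:
            prop.find("Value").text = values[name]  # copy the value across
    tree.write(profile_path)

update_profile("Node1_profile.xml", "properties.xml")  # hypothetical file names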
I upgraded the Hive version in my Cloudera VM to 2.3.2. It installed successfully, and I copied the hive-site.xml file from the older /hive/conf folder to the newer conf folder; there is no problem with the metastore. However, when I execute a query like 'drop table table_name', it throws the exception below:
FAILED: SemanticException Unable to fetch table table_name. Invalid method name: 'get_table_req'
Below is my hive-site.xml file:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Hive Configuration can either be stored in this file or in the hadoop configuration files -->
<!-- that are implied by Hadoop setup variables. -->
<!-- Aside from Hadoop setup variables - this file is provided as a convenience so that Hive -->
<!-- users do not have to edit hadoop configuration files (that may be managed as a centralized -->
<!-- resource). -->
<!-- Hive Execution Parameters -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://127.0.0.1/metastore?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>cloudera</value>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>/usr/lib/hive/lib/hive-hwi-0.8.1-cdh4.0.0.jar</value>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://127.0.0.1:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
</configuration>
Below are my bashrc variables:
#Setting hive variables
export HIVE_HOME="/usr/lib/apache-hive-2.3.2-bin"
export PATH="$HIVE_HOME/bin:$PATH"
NOTE: I am able to create tables, but when I execute any select query it fails and throws the above exception. Where am I going wrong? Do I need to copy any other file as well? Thanks in advance.
Check your metastore schema version and upgrade the metastore to match the new Hive version.
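For context: the get_table_req Thrift call only exists in newer metastore versions, so this error usually means the 2.3.2 client is talking to an older metastore service still listening on port 9083 (per hive.metastore.uris above). A sketch of the usual fix, assuming the MySQL-backed metastore from the hive-site.xml shown, and that you back up the metastore database first:

# Check which schema version the metastore database currently holds
$HIVE_HOME/bin/schematool -dbType mysql -info

# Upgrade the schema in place to the 2.3.x layout
$HIVE_HOME/bin/schematool -dbType mysql -upgradeSchema

# Restart the metastore service from the new installation so that the
# Thrift endpoint on port 9083 speaks the 2.3 API
$HIVE_HOME/bin/hive --service metastore &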
I installed Hadoop on Windows and also set up Hive. When I start Hive using hive.cmd, I get the following error:
16/12/28 18:14:05 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
It has not created the metastore_db folder in the hive\bin path.
I also tried using schematool to initialize the schema, but it gives me "'schematool' is not recognized as an internal or external command, operable program or batch file."
My environment variables are as follows:
HIVE_BIN_PATH : C:\hadoop-2.7.1.tar\apache-hive-2.1.1-bin\bin
HIVE_HOME : C:\hadoop-2.7.1.tar\apache-hive-2.1.1-bin
HIVE_LIB : C:\hadoop-2.7.1.tar\apache-hive-2.1.1-bin\lib
PATH : C:\hadoop-2.7.1.tar\hadoop-2.7.1\bin;C:\apache\db-derby-10.12.1.1-bin\bin;C:\hadoop-2.7.1.tar\apache-hive-2.1.1-bin\bin;
Here is my hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.ClientDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>hive.server2.enable.impersonation</name>
<value>true</value>
<description>Enable user impersonation for HiveServer2</description>
</property>
<property>
<name>hive.server2.authentication</name>
<value>NONE</value>
</property>
<property>
<name>datanucleus.autoCreateTables</name>
<value>True</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>true</value>
</property>
</configuration>
I have added derby.jar, derby-client.jar and derbytools.jar to the hive\lib folder, and also slf4j-api-1.5.8.jar. But it still does not work. Any pointers on this one?
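Two things may help here. First, your ConnectionURL uses the Derby ClientDriver, so a Derby network server must already be listening on port 1527, and metastore_db is created in the directory where that server is started, not under hive\bin. Second, apache-hive-2.1.1-bin ships no schematool.cmd for Windows, which is why the command is not recognized; one workaround (a sketch, assuming %HADOOP_HOME% points at your Hadoop install, with the classpath shortened for illustration) is to call the class that schematool wraps:

rem Start the Derby network server first (it ships with db-derby-10.12.1.1-bin)
startNetworkServer

rem Run the schema tool class directly, since there is no schematool.cmd
java -cp "%HIVE_HOME%\lib\*;%HADOOP_HOME%\etc\hadoop;%HADOOP_HOME%\share\hadoop\common\*" org.apache.hive.beeline.HiveSchemaTool -dbType derby -initSchema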
How can I check whether a partition location exists or not in an Oozie workflow using a decision node?
Example: /user/cloudera/year=2016/month=201609/day=20150912
In my HDFS location I get one data set every day like the above, i.e. year=2016/month=201609/day=20150912.
With the help of the coordinator job I get the date value:
<property>
<name>today</name>
<value>${coord:formatTime(coord:dateOffset(coord:dateTzOffset(coord:nominalTime(), "America/Los_Angeles"), -1, 'DAY'), 'yyyyMMdd')}</value>
</property>
In my workflow, how can I use a decision node to check whether the path year=2016/month=201609/day=20150912 exists or not?
You can use the HCatalog EL functions available to Oozie workflow EL:
The format for specifying an HCatalog table partition URI is
hcat://[metastore server]:[port]/[database name]/[table name]/[partkey1]=[value];[partkey2]=[value].
For example:
hcat://foo:8020/mydb/mytable/region=us;dt=20121212
It seems like this is the location that you would want to check:
/user/cloudera/year=${YEAR}/month=${YEAR}${MONTH}/day=${YEAR}${MONTH}${DAY}
Of course you would correct these with the right offset where required.
Thank you for your prompt response @YoungHobbit and @Dennis Jaheruddin.
I wanted to use the decision node to check whether the path exists or not, not the URI.
I found that the following coordinator job and workflow.xml achieve the solution.
coordinate_job.xml
<coordinator-app name="testemailjob" frequency="15" start="${jobStart}" end="${jobEnd}" timezone="America/Los_Angeles" xmlns="uri:oozie:coordinator:0.2" >
<controls>
<execution>FIFO</execution>
</controls>
<action>
<workflow>
<app-path>${test}</app-path>
<configuration>
<property>
<name>year</name>
<value>${coord:formatTime(coord:dateOffset(coord:dateTzOffset(coord:nominalTime(), "America/Los_Angeles"), -1, 'DAY'), 'yyyy')}</value>
</property>
<property>
<name>month</name>
<value>${coord:formatTime(coord:dateOffset(coord:dateTzOffset(coord:nominalTime(), "America/Los_Angeles"), -1, 'DAY'), 'yyyyMM')}</value>
</property>
<property>
<name>yesterday</name>
<value>${coord:formatTime(coord:dateOffset(coord:dateTzOffset(coord:nominalTime(), "America/Los_Angeles"), -1, 'DAY'), 'yyyyMMdd')}</value>
</property>
<property>
<name>today</name>
<value>${coord:formatTime(coord:dateTzOffset(coord:nominalTime(), "America/Los_Angeles"), 'yyyyMMdd')}</value>
</property>
<property>
<name>oozie.use.system.libpath</name>
<value>True</value>
</property>
</configuration>
</workflow>
</action>
</coordinator-app>
My workflow.xml:
<workflow-app name= ......>
...........................
...............................
<decision name="CheckFile">
<switch>
<case to="nextOozieTask">
${fs:exists(concat(concat(concat(concat(concat(concat(nameNode, path),year),"/month="),month),"/day="),today))}
</case>
<case to="nextOozieTask1">
${fs:exists(concat(concat(concat(concat(concat(concat(nameNode, path),year),'/month='),month),'/day='),yesterday))}
</case>
<default to="MailActionFileMissing" />
</switch>
</decision>
....................
......................
</workflow-app>
I would like to pass Hive set commands into all the HQL scripts called from Oozie. I have many HQL files, and I used to write all the set commands in each HQL file; now I would like to keep them at the workflow level. Can anyone tell me if I am doing something wrong?
I have put part of my workflow below. When executing the jobs, the Hive parameters are not propagated, and hence the jobs are failing.
<workflow-app name="WF_AMLMKTM_L1_LOAD" xmlns="uri:oozie:workflow:0.5">
<global>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>hive.exec.parallel</name>
<value>true</value>
</property>
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
<property>
<name>hive.exec.dynamic.partition</name>
<value>true</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
</configuration>
</global>
<action name="map_prc_stg_l1_load_com" cred="hive2">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<jdbc-url>${hive2_jdbc_url}</jdbc-url>
<script>${basepath}/applications/stg_l1_load_com.hql</script>
<param>basepath=${basepath}</param>
<param>runsk=${wf:actionData('runsk_gen')['runsk']}</param>
I think you can add them as arguments, as below:
... <argument>--hiveconf</argument>
<argument>hive.exec.dynamic.partition.mode=nonstrict</argument>
<argument>--hiveconf</argument>
<argument>hive.exec.dynamic.partition=true</argument>
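In context, a sketch of how the action above might look with those arguments (per the hive2 action schema, argument elements follow the param elements; the ok/error transition names here are placeholders, and the remaining settings follow the same pattern):

<action name="map_prc_stg_l1_load_com" cred="hive2">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<jdbc-url>${hive2_jdbc_url}</jdbc-url>
<script>${basepath}/applications/stg_l1_load_com.hql</script>
<param>basepath=${basepath}</param>
<param>runsk=${wf:actionData('runsk_gen')['runsk']}</param>
<argument>--hiveconf</argument>
<argument>hive.execution.engine=spark</argument>
<argument>--hiveconf</argument>
<argument>hive.exec.dynamic.partition=true</argument>
<argument>--hiveconf</argument>
<argument>hive.exec.dynamic.partition.mode=nonstrict</argument>
</hive2>
<ok to="next_action"/>
<error to="kill_email"/>
</action>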
Put all your Hive-related configuration in a hive-site.xml file and pass it to the Hive action using
<job-xml>[HIVE SETTINGS FILE]</job-xml>
https://oozie.apache.org/docs/4.2.0/DG_Hive2ActionExtension.html
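For example, the action above could carry a settings file like this (the hive-settings.xml name and its location are illustrative; per the hive2 action schema, job-xml comes before jdbc-url):

<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-xml>${basepath}/conf/hive-settings.xml</job-xml>
<jdbc-url>${hive2_jdbc_url}</jdbc-url>
<script>${basepath}/applications/stg_l1_load_com.hql</script>
<param>basepath=${basepath}</param>
</hive2>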