SequenceFile Input format not recognized in Oozie workflow.xml? - hadoop

I have an MR program that runs perfectly on a set of SequenceFiles, and the output is as expected.
When I try to run the same job through an Oozie workflow, the InputFormat class property does not seem to be recognized, and I suspect the input is being treated as the default TextInputFormat.
Here is how the mapper is declared. The SequenceFile key is LongWritable and the value is Text.
public static class FeederCounterMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // map function for stripping the feeder for a zone from the input
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        final int count = 1;
        // convert input record to string
        String inRec = value.toString();
        System.out.println("Feeder:" + inRec);
        // strip out the feeder from the record
        String feeder = inRec.substring(3, 7);
        // write the key+value as map output
        context.write(new Text(feeder), new IntWritable(count));
    }
}
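The reducer (FeederCounterReducer, referenced later in the workflow) is not shown in the question; for context, a typical counting reducer matching these key/value types would look roughly like the sketch below (an assumption, not necessarily the actual implementation):
public static class FeederCounterReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // sum the per-feeder counts emitted by the mapper
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}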
The workflow layout for my application is as follows:
/{$namenode}/workflow.xml
/{$namenode}/lib/FeederCounterDriver.jar
Below is my workflow.xml. The ${nameNode}, ${jobTracker}, ${outputDir} and ${inputDir} values are defined in the job.properties file.
<map-reduce>
  <job-tracker>${jobTracker}</job-tracker>
  <name-node>${nameNode}</name-node>
  <prepare>
    <delete path="${nameNode}/${outputDir}"/>
  </prepare>
  <configuration>
    <property>
      <name>mapred.reducer.new-api</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.mapper.new-api</name>
      <value>true</value>
    </property>
    <property>
      <name>mapreduce.job.queue.name</name>
      <value>${queueName}</value>
    </property>
    <property>
      <name>mapred.input.dir</name>
      <value>/flume/events/sincal*</value>
    </property>
    <property>
      <name>mapred.output.dir</name>
      <value>${outputDir}</value>
    </property>
    <property>
      <name>mapred.input.format.class</name>
      <value>org.apache.hadoop.mapred.SequenceFileInputFormat</value>
    </property>
    <property>
      <name>mapred.output.format.class</name>
      <value>org.apache.hadoop.mapred.TextOutputFormat</value>
    </property>
    <property>
      <name>mapred.input.key.class</name>
      <value>org.apache.hadoop.io.LongWritable</value>
    </property>
    <property>
      <name>mapred.input.value.class</name>
      <value>org.apache.hadoop.io.Text</value>
    </property>
    <property>
      <name>mapred.output.key.class</name>
      <value>org.apache.hadoop.io.Text</value>
    </property>
    <property>
      <name>mapred.output.value.class</name>
      <value>org.apache.hadoop.io.IntWritable</value>
    </property>
    <property>
      <name>mapreduce.map.class</name>
      <value>org.poc.hadoop121.gissincal.FeederCounterDriver$FeederCounterMapper</value>
    </property>
    <property>
      <name>mapreduce.reduce.class</name>
      <value>org.poc.hadoop121.gissincal.FeederCounterDriver$FeederCounterReducer</value>
    </property>
    <property>
      <name>mapreduce.map.tasks</name>
      <value>1</value>
    </property>
  </configuration>
</map-reduce>
A snippet of the stdout (first 2 lines) when I run the MR job directly is:
Feeder:00107371PA1700TEET67576 LKHS 5666LH 2.....
Feeder:00107231PA1300TXDS 8731TX 1FSHS 8731FH 1.....
A snippet of the output (first 3 lines) when I run it through the Oozie workflow is:
Feeder:SEQ!org.apache.hadoop.io.LongWritableorg.apache.hadoop.io.Text�������b'b��X�...
Feeder:��00105271PA1000FSHS 2255FH 1TXDS 2255TX 1.....
Feeder:��00103171PA1800LKHS 3192LH 2LKHS 2335LH 1.....
Given the above output from the Oozie workflow, I strongly doubt that the SequenceFileInputFormat specified in workflow.xml is being used at all (the SEQ header and the Writable class names in the output are the raw SequenceFile file header, which suggests the input is being read as plain text); either that, or it is being overridden somewhere.
Any input on this would help. Thanks.

Find the job.xml created for this MapReduce job in the JobTracker and see which input format class is actually set there. This will confirm whether or not the problem is with the input format.

I had a very similar problem, and I got Oozie to use the proper input format by setting the property like this:
<property>
  <name>mapreduce.inputformat.class</name>
  <value>org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat</value>
</property>
So there is one dot to remove from the property name (check the exact name for your Hadoop version), and the class has to change too, since the new API needs the org.apache.hadoop.mapreduce.lib.input version rather than org.apache.hadoop.mapred.
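For reference, this is roughly what an equivalent new-API driver sets up when the job runs standalone (a sketch only; the original driver isn't shown in the question, so its details are assumptions). The Oozie properties above have to mirror what the driver would otherwise configure through job.setInputFormatClass() and related calls:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class FeederCounterDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "feeder counter");
        job.setJarByClass(FeederCounterDriver.class);
        job.setMapperClass(FeederCounterMapper.class);
        job.setReducerClass(FeederCounterReducer.class);

        // New-API input/output formats; in the Oozie action the input format maps to
        // the mapreduce.inputformat.class property shown above (exact names vary by Hadoop version).
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}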

Related

org.springframework.beans.factory.parsing.BeanDefinitionParsingException: Multiple 'property' definitions

I'm new to Spring and trying to get familiar with the concepts. My purpose is to create multiple instances of the below-mentioned class.
Item.java
public class Item {
    private int itemID;
    private String itemName;

    public int getItemID() {
        return itemID;
    }

    public void setItemID(int itemID) {
        this.itemID = itemID;
    }

    public String getItemName() {
        return itemName;
    }

    public void setItemName(String itemName) {
        this.itemName = itemName;
    }

    @Override
    public String toString() {
        return itemName;
    }
}
In config.xml I am trying to set the property values as shown below.
<bean name="item" class="com.manasa.spring.springcore.task1.Item">
<property name="itemID">
<value>1</value>
</property>
<property name="itemName">
<value>Sandisk Pendrive</value>
</property>
<property name="itemID">
<value>2</value>
</property>
<property name="itemName">
<value>Dell Keyboard</value>
</property>
</bean>
<bean name="cart" class="com.manasa.spring.springcore.task1.ShoppingCart"
p:id="1">
<property name="items">
<map>
<entry key-ref="item">
<value>2</value>
</entry>
<entry key-ref="item">
<value>1</value>
</entry>
</map>
</property>
</bean>
By doing so, I am facing this issue:
> Exception in thread "main" org.springframework.beans.factory.parsing.BeanDefinitionParsingException: Configuration problem: Unexpected failure during bean definition parsing
Offending resource: class path resource [com/manasa/spring/springcore/task1/mapconfig.xml]
Bean 'item'; nested exception is org.springframework.beans.factory.parsing.BeanDefinitionParsingException: Configuration problem: Multiple 'property' definitions for property 'itemID'
Offending resource: class path resource [com/manasa/spring/springcore/task1/mapconfig.xml]
Bean 'item'
-> Property 'itemID'
at org.springframework.beans.factory.parsing.FailFastProblemReporter.error(FailFastProblemReporter.java:70)
at org.springframework.beans.factory.parsing.ReaderContext.error(ReaderContext.java:85)
at org.springframework.beans.factory.xml.BeanDefinitionParserDelegate.error(BeanDefinitionParserDelegate.java:308)
at org.springframework.beans.factory.xml.BeanDefinitionParserDelegate.parseBeanDefinitionElement(BeanDefinitionParserDelegate.java:562)
Could anyone please suggest how to achieve this?
I guess the problem is here:
<property name="itemID">
  <value>1</value>
</property>
<property name="itemName">
  <value>Sandisk Pendrive</value>
</property>
<property name="itemID">
  <value>2</value>
</property>
<property name="itemName">
  <value>Dell Keyboard</value>
</property>
You are not allowed to set a value for the same property more than once within a single bean definition. When Spring parses this config it calls the corresponding setter (setXXX) for each <property> element, and duplicate <property> entries for the same property are rejected during parsing.
So you need to remove the duplicates. Result:
<bean name="item" class="com.manasa.spring.springcore.task1.Item">
<property name="itemID">
<value>1</value>
</property>
<property name="itemName">
<value>Sandisk Pendrive</value>
</property>
</bean>
And if you need several instances (more Item objects), you need to define more beans (add more <bean> ... </bean> sections), e.g.:
<bean id="someOtherInstance" name="someOtherInstance" class="com.manasa.spring.springcore.task1.Item">
<property name="itemID">
<value>123</value>
</property>
<property name="itemName">
<value>Some Other Value</value>
</property>
</bean>
Remember that you need to give them different ids (names) so Spring can distinguish between them.
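If you ever move this to Java-based configuration instead of XML, the same idea (one bean definition per Item instance) looks roughly like the sketch below; the configuration class and @Bean method names are arbitrary, not from the original post:
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ItemConfig {

    // one @Bean method per Item instance, mirroring one <bean> element per instance
    @Bean
    public Item pendrive() {
        Item item = new Item();
        item.setItemID(1);
        item.setItemName("Sandisk Pendrive");
        return item;
    }

    @Bean
    public Item keyboard() {
        Item item = new Item();
        item.setItemID(2);
        item.setItemName("Dell Keyboard");
        return item;
    }
}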

Spark not reading hive-site.xml?

I am trying to access the Hive metastore using Spark SQL. I have set up a SparkSession, but when I run my program and look at the log I see this exception:
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
... 61 more
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
... 62 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
... 68 more
Caused by: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: Failed to create database 'metastore_db', see the next exception for details.
I am running a servlet which executes the following code:
public class HiveReadone extends HttpServlet {
    private static final long serialVersionUID = 1L;

    /**
     * @see HttpServlet#HttpServlet()
     */
    public HiveReadone() {
        super();
        // TODO Auto-generated constructor stub
    }

    /**
     * @see HttpServlet#doGet(HttpServletRequest request, HttpServletResponse response)
     */
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        // TODO Auto-generated method stub
        response.getWriter().append("Served at: ").append(request.getContextPath());
        SparkSession spark = SparkSession
                .builder()
                .appName("Java Spark SQL basic example")
                .enableHiveSupport()
                .config("spark.sql.warehouse.dir", "hdfs://saurab:9000/user/hive/warehouse")
                .config("mapred.input.dir.recursive", true)
                .config("hive.mapred.supports.subdirectories", true)
                .config("hive.vectorized.execution.enabled", true)
                .master("local")
                .getOrCreate();
        response.getWriter().println(spark);
Nothing gets printed in the browser except the output from response.getWriter().append("Served at: ").append(request.getContextPath());, which is Served at: /hiveServ.
Please take a look at my conf/hive-site.xml
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://saurab:3306/metastore_db?createDatabaseIfNotExist=true</value>
  <description>metadata is stored in a MySQL server</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>MySQL JDBC driver class</description>
</property>
<property>
  <name>hive.aux.jars.path</name>
  <value>/home/saurab/hadoopec/hive/lib/hive-serde-2.1.1.jar</value>
</property>
<property>
  <name>spark.sql.warehouse.dir</name>
  <value>hdfs://saurab:9000/user/hive/warehouse</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <!-- Make sure that <value> points to the Hive Metastore URI in your cluster -->
  <value>thrift://saurab:9083</value>
  <description>URI for client to contact metastore server</description>
</property>
<property>
  <name>hive.server2.thrift.port</name>
  <value>10001</value>
  <description>Port number of HiveServer2 Thrift interface.
    Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT
  </description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
  <description>user name for connecting to mysql server</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
  <description>password for connecting to mysql server</description>
</property>
As far as I have read, if hive.metastore.uris is configured, Spark should connect to the Hive metastore, but in my case it does not and I get the above error.
To configure Spark to work with Hive, try copying your hive-site.xml into Spark's conf directory.
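Once hive-site.xml is visible to Spark (for example, copied into Spark's conf directory or placed on the application classpath), a quick sanity check is to query the metastore from the session. A minimal sketch, assuming the same SparkSession setup as in the question:
// If the external metastore from hive-site.xml is picked up, this lists your Hive
// databases and tables; if Spark silently fell back to an embedded Derby metastore,
// you will typically only see "default" and no tables.
SparkSession spark = SparkSession.builder()
        .appName("metastore check")
        .enableHiveSupport()
        .master("local")
        .getOrCreate();

spark.sql("SHOW DATABASES").show();
spark.sql("SHOW TABLES").show();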

Java based approach for injecting list of spring beans

I am trying to get rid of my XML bean definition file. I would like to know how I can convert the following XML configuration to Java code.
<bean id="CustomerBean" class="com.java2s.common.Customer">
<property name="lists">
<bean class="org.springframework.beans.factory.config.ListFactoryBean">
<property name="targetListClass">
<value>java.util.ArrayList</value>
</property>
<property name="sourceList">
<list>
<value>1</value>
<value>2</value>
<value>3</value>
</list>
</property>
</bean>
</property>
</bean>
I am especially interested in knowing how to convert list, Set, Map and Properties XML configurations to Java code.
Also, if I have defined beans in a list with an ordering such as
<bean p:order="1000"
how can I keep the same ordering in Java code?
A <list> corresponds to java.util.List, <map> corresponds to java.util.Map, <props> corresponds to java.util.Properties and so on.
To set the order, use the org.springframework.core.annotation.Order annotation on your bean or let it implement org.springframework.core.Ordered.
The equivalent of your XML configuration is something like:
@Bean
public Customer CustomerBean() {
    Customer customer = new Customer();
    List<String> lists = new ArrayList<>();
    lists.add("1");
    lists.add("2");
    lists.add("3");
    customer.setLists(lists);
    return customer;
}
Note that the name of the method will be the name of the bean.
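For the Set, Map and Properties cases the question mentions, the pattern is the same: build the collection in plain Java inside the @Bean method. A sketch, assuming hypothetical setters on Customer (setTags, setQuantities and setSettings are not from the original post):
@Bean
public Customer customerWithCollections() {
    Customer customer = new Customer();

    // <set> equivalent
    Set<String> tags = new HashSet<>(Arrays.asList("retail", "online"));
    customer.setTags(tags);              // hypothetical setter

    // <map>/<entry> equivalent
    Map<String, Integer> quantities = new HashMap<>();
    quantities.put("item-1", 2);
    quantities.put("item-2", 1);
    customer.setQuantities(quantities);  // hypothetical setter

    // <props> equivalent
    Properties settings = new Properties();
    settings.setProperty("currency", "USD");
    customer.setSettings(settings);      // hypothetical setter

    return customer;
}
For the ordering, annotating the bean class (or the @Bean method) with @Order(1000) is the usual Java-side counterpart, as mentioned above.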

Hadoop map throw "Filesystem closed" exception

I'm running a MapReduce task against the Wikipedia dump with full history, using XmlInputFormat to parse the XML.
Task "xxx_m_000053_0" always stops at 70% before it is killed due to a timeout.
In the console:
xxx_m_000053_0 failed to report status for 300 seconds. Killing!
I increased the timeout to 2 hours; it didn't help.
In the xxx_m_000053_0 log file:
Processing split: hdfs://localhost:8020/user/martin/history/history.xml:3556769792+67108864
I expected something to be wrong in history.xml in the offset range [3556769792, 3623878656]. I extracted that part of the file and ran it through Hadoop. It worked... (???)
Also in the xxx_m_000053_0 log file:
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:323)
at org.apache.hadoop.hdfs.DFSClient.access$1200(DFSClient.java:78)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.close(DFSClient.java:2326)
at java.io.FilterInputStream.close(FilterInputStream.java:155)
**at com.doduck.wikilink.history.XmlInputFormat$XmlRecordReader.close(XmlInputFormat.java:109)**
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.close(MapTask.java:496)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1776)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:778)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
2013-09-17 13:13:32,248 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2013-09-17 13:13:32,248 INFO org.apache.hadoop.mapred.MapTask: Ignoring exception during close for org.apache.hadoop.mapred.MapTask$NewOutputCollector@54e9a7c2
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1645)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1328)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1793)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
So I'm now thinking it might be a configuration problem. Why is my filesystem being closed? Is something wrong with XmlInputFormat?
My empty mapper:
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
    // nothing to do...
}
My Main:
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
    Configuration conf = new Configuration();
    conf.set("xmlinput.start", "<page>");
    conf.set("xmlinput.end", "</page>");

    Job job = new Job(conf, "wikipedia link history");
    job.setJarByClass(Main.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setInputFormatClass(XmlInputFormat.class);

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    boolean result = job.waitForCompletion(true);
    System.exit(result ? 0 : 1);
}
hdfs-site.xml:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx9216m</value>
</property>
<property>
  <name>mapred.task.timeout</name>
  <value>300000</value>
</property>
My core-site.xml:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Volumes/WD/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
</property>

Why System.setProperty cannot change the configuration attribute in Hadoop?

My environment is Ubuntu 12.04 + Eclipse 3.3.0 + Hadoop 0.20.2.
I was testing System.setProperty, expecting it to change a configuration value defined in an XML file, but I don't see that effect when I run the test. Here is my code snippet:
// Configuration class test
public static void test() {
    Configuration conf = new Configuration();
    conf.addResource("raw/conf-1.xml");
    System.out.println(conf.get("z"));
    System.setProperty("z", "SystemProp_mz");
    System.out.println(conf.get("z"));
}
conf-1.xml is as follows:
<configuration>
  <property>
    <name>z</name>
    <value>mz</value>
  </property>
</configuration>
the output is:
mz
mz
Could anyone give me some help? Thanks a lot!
The Configuration object isn't linked to the system properties - if you want to change the value of z in the configuration, then use conf.set("z", "SystemProp_mz") instead of System.setProperty(...).
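Applied to the test() snippet from the question, that means (a minimal sketch):
// Configuration class test, using conf.set(...) instead of System.setProperty(...)
public static void test() {
    Configuration conf = new Configuration();
    conf.addResource("raw/conf-1.xml");
    System.out.println(conf.get("z"));   // prints "mz" (the value from conf-1.xml)
    conf.set("z", "SystemProp_mz");      // mutates the Configuration object itself
    System.out.println(conf.get("z"));   // now prints "SystemProp_mz"
}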
Update
The Configuration object can use variable expansion as outlined in http://hadoop.apache.org/docs/current/api/org/apache/hadoop/conf/Configuration.html, but this requires you to have defined an entry as follows:
<configuration>
  <property>
    <name>z</name>
    <value>${z}</value>
  </property>
</configuration>
If you don't have the above entry, then just calling conf.get("z") will not resolve to system properties. The following unit test block demonstrates this:
@Test
public void testConfSystemProps() {
    System.setProperty("sysProp", "value");
    Configuration conf = new Configuration();
    conf.set("prop", "${sysProp}");
    Assert.assertNull(conf.get("sysProp"));
    Assert.assertEquals(conf.get("prop"), "value");
}
