Wire up a Hadoop JobFactoryBean, multiple Reducers on a single Hadoop node - spring

What I want to achieve:
I have set up a Spring Batch Job containing Hadoop Tasks to process some larger files.
To get multiple Reducers running for the job, I need to set the number of Reducers with setNumReduceTasks. I'm trying to set this via the JobFactoryBean.
My bean configuration in classpath:/META-INF/spring/batch-common.xml :
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:p="http://www.springframework.org/schema/p"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd">
<bean id="jobFactoryBean" class="org.springframework.data.hadoop.mapreduce.JobFactoryBean" p:numberReducers="5"/>
<bean id="jobRepository" class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean" />
<bean id="transactionManager" class="org.springframework.batch.support.transaction.ResourcelessTransactionManager"/>
<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher" p:jobRepository-ref="jobRepository" />
</beans>
The XML is included via:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:context="http://www.springframework.org/schema/context"
xsi:schemaLocation="
http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd">
<context:property-placeholder location="classpath:batch.properties,classpath:hadoop.properties"
ignore-resource-not-found="true" ignore-unresolvable="true" />
<import resource="classpath:/META-INF/spring/batch-common.xml" />
<import resource="classpath:/META-INF/spring/hadoop-context.xml" />
<import resource="classpath:/META-INF/spring/sort-context.xml" />
</beans>
I'm getting the beans for the jUnit Test via
JobLauncher launcher = ctx.getBean(JobLauncher.class);
Map<String, Job> jobs = ctx.getBeansOfType(Job.class);
JobFactoryBean jfb = ctx.getBean(JobFactoryBean.class);
The JUnit test stops with an error:
No bean named '&jobFactoryBean' is defined
So: the JobFactoryBean is not loaded, but the others are loaded correctly and without an error.
Without the line
JobFactoryBean jfb = ctx.getBean(JobFactoryBean.class);
the project's tests run, but there is just one Reducer per job.
The method
ctx.getBean("jobFactoryBean");
returns a Hadoop Job. I would expect to get the factoryBean there...
To test it, I extended the constructor of the Reducer to log each time a Reducer is created. So far I get just one entry in the log.
I have 2 VMs with 2 assigned cores and 2 GB RAM each, and I'm trying to sort a 75 MB file consisting of multiple books from Project Gutenberg.
EDIT:
Another thing I have tried is to set the number of reducers in the Hadoop job via the number-reducers attribute, without any effect.
<job id="search-jobSherlockOk" input-path="${sherlock.input.path}"
output-path="${sherlockOK.output.path}"
mapper="com.romediusweiss.hadoopSort.mapReduce.SortMapperWords"
reducer="com.romediusweiss.hadoopSort.mapReduce.SortBlockReducer"
partitioner="com.romediusweiss.hadoopSort.mapReduce.SortPartitioner"
number-reducers="2"
validate-paths="false" />
The settings in mapreduce-site.xml are the same on both nodes:
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>10</value>
</property>
...and Why:
I want to copy the example of the following blog post:
http://www.philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/
I need different Reducers on the same machine or a fully distributed environment to test the behaviour of the Partitioner. The first approach would be easier.
P.s.: could a user with a higher reputation create a tag "spring-data-hadoop" Thank you!

Answered the question on the Spring forums where it was also posted (recommend using it for Spring Data Hadoop questions).
The full answer is here http://forum.springsource.org/showthread.php?130500-Additional-Reducers , but in short, the number of reducers is driven by the number of input splits. See http://wiki.apache.org/hadoop/HowManyMapsAndReduces
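For context, a minimal sketch of where a reducer count can be requested on the Hadoop configuration side. It assumes a Hadoop 1.x cluster, where the property is called `mapred.reduce.tasks` (newer releases use `mapreduce.job.reduces`); which one applies to the cluster in the question is an assumption. Note that `mapred.tasktracker.reduce.tasks.maximum` from the question only caps how many reduce tasks a single TaskTracker runs at the same time; it does not set the number of reducers for a job.

```xml
<!-- mapred-site.xml fragment (sketch; property name assumes Hadoop 1.x) -->
<property>
  <!-- default number of reduce tasks per job; a value set on the job itself overrides it -->
  <name>mapred.reduce.tasks</name>
  <value>2</value>
</property>
```

If the job ends up in the local job runner (`mapred.job.tracker=local`), older Hadoop releases are likely to run at most one reducer regardless of this value, so it is worth checking that the job is really submitted to the cluster.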

Spring profile XML bean configuration required to be at beginning of file

Using the Spring configuration below, I load com.Test2 when the DEV profile is active and com.Test1 in all other cases:
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.1.xsd">
<bean id="bean1"
class="com.Test1">
</bean>
<beans profile="DEV">
<bean id="bean1"
class="com.Test2">
</bean>
</beans>
</beans>
Moving the Spring profile configuration to the beginning of the file:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.1.xsd">
<beans profile="DEV">
<bean id="bean1"
class="com.Test1">
</bean>
</beans>
<bean id="bean1"
class="com.Test2">
</bean>
</beans>
the IntelliJ IDE reports the error:
Invalid content was found starting with element '{"http://www.springframework.org/schema/beans":bean}'. One of '{"http://www.springframework.org/schema/beans":beans}' is expected.
Why is this error reported? Why is it required that the Spring profile be set at the end of the file?
The error is reported because, according to the XML schema, the elements in the second case are in the wrong order. The schema defines the content of <beans> as a sequence in which nested <beans> elements come last, so any <bean> declaration must be provided before any nested <beans>.
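For reference, the relevant part of the schema looks roughly like the excerpt below. It is abridged and reproduced from memory of spring-beans-3.1.xsd, with attributes and annotations left out, so treat it as a sketch and check it against the actual XSD.

```xml
<!-- Abridged sketch of the <beans> element definition in spring-beans-3.1.xsd -->
<xsd:element name="beans">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="description" minOccurs="0"/>
      <!-- import, alias, bean and other-namespace elements, in any order -->
      <xsd:choice minOccurs="0" maxOccurs="unbounded">
        <xsd:element ref="import"/>
        <xsd:element ref="alias"/>
        <xsd:element ref="bean"/>
        <xsd:any namespace="##other" processContents="strict" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:choice>
      <!-- nested <beans> (for example profile blocks) may only follow the choice above -->
      <xsd:element ref="beans" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
```

Because the nested `beans` element is the last item of the `xsd:sequence`, a validator rejects any `<bean>` that appears after a `<beans profile="...">` block, which matches the message IntelliJ shows.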
This restriction is also indicated in the Spring documentation:
It is also possible to avoid that split and nest <beans/> elements within the same file, as the following example shows:
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:jdbc="http://www.springframework.org/schema/jdbc"
xmlns:jee="http://www.springframework.org/schema/jee"
xsi:schemaLocation="...">
<!-- other bean definitions -->
<beans profile="development">
<jdbc:embedded-database id="dataSource">
<jdbc:script location="classpath:com/bank/config/sql/schema.sql"/>
<jdbc:script location="classpath:com/bank/config/sql/test-data.sql"/>
</jdbc:embedded-database>
</beans>
<beans profile="production">
<jee:jndi-lookup id="dataSource" jndi-name="java:comp/env/jdbc/datasource"/>
</beans>
</beans>
The spring-bean.xsd has been constrained to allow such elements only as the last ones in the file. This should help provide flexibility without incurring clutter in the XML files.

Drools7: How to assign drl files to the KieBase in kmodule.xml?

I'm trying to migrate from Drools 5 to Drools 7. Version 6 changed the Spring integration: according to the documentation, drools:resources and drools:resource were removed, but I couldn't find out how to achieve the same behavior with the new toolset. What I want is to have different KieBases with different rules that are defined in DRL files.
The documentation says that the resources can be defined using packages. Unfortunately in my case a package may contain several drl files and I want to filter some of them.
What I had in drools 5.x:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:drools-spring="http://drools.org/schema/drools-spring"
xsi:schemaLocation="http://drools.org/schema/drools-spring http://drools.org/schema/drools-spring.xsd
http://www.springframework.org/schema/task http://www.springframework.org/schema/task/spring-task.xsd
http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd">
<drools-spring:kbase id="rules1And3">
<drools-spring:resources>
<drools-spring:resource source="classpath:rules/Rules1.drl"/>
<drools-spring:resource source="classpath:rules/Rules3.drl"/>
</drools-spring:resources>
</drools-spring:kbase>
<drools-spring:kbase id="rules2And3">
<drools-spring:resources>
<drools-spring:resource source="classpath:rules/Rules2.drl"/>
<drools-spring:resource source="classpath:rules/Rules3.drl"/>
</drools-spring:resources>
</drools-spring:kbase>
<bean id="ruleSessionAutoRefundAndPox" factory-bean="rules1And3"
factory-method="newStatelessKnowledgeSession"/>
<bean id="ruleSessionNonCashRefund" factory-bean="rules2And3"
factory-method="newStatelessKnowledgeSession"/>
</beans>
So here there were 3 files under rules. The first kbase had only Rules1 and Rules3, and the second had only Rules2 and Rules3.
How it "should" look in 7.x:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:kie="http://drools.org/schema/kie-spring"
xsi:schemaLocation="http://www.springframework.org/schema/task http://www.springframework.org/schema/task/spring-task.xsd
http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
http://drools.org/schema/kie-spring http://drools.org/schema/kie-spring.xsd">
<kie:kmodule id="rules">
<kie:kbase name="rules1And3">
<!--loaded drl files-->
</kie:kbase>
<kie:kbase name="rules2And3">
<!--loaded drl files-->
</kie:kbase>
</kie:kmodule>
<!-- maybe these are unnecessary and instead ksessions should be defined within kbase elements -->
<bean id="sessionRules1And3" factory-bean="rules1And3"
factory-method="newStatelessKnowledgeSession"/>
<bean id="sessionRules2And3" factory-bean="rules2And3"
factory-method="newStatelessKnowledgeSession"/>
</beans>
Based on what I saw I'm not even sure if the very same behavior is achievable in the new version or maybe the whole approach is wrong, but what I want is to be able to define which drl files are loaded for a kiebase.
Thanks for any help!
I could not find anything that would make it possible to refer to a file explicitly.
What I did was place each DRL file in its own folder under main/resources, i.e.:
src/main/resources/rules1/Rules1.drl
src/main/resources/rules3/Rules3.drl
and then the kbase definition looks like this:
<kie:kmodule id="rules">
<kie:kbase name="rules1And3" packages="rules1,rules3"/>
...
</kie:kmodule>
NOTE1: the package declared in the DRL file must be correct, i.e. if the file is under com/some/package, then the DRL file should declare package com.some.package.
This is how it should normally be, but it was not validated in version 5.
NOTE2: if you have a separate kmodule for tests, the DRL files must also be in test/resources in order to be loaded.
This is how I managed to resolve it. If a better answer comes along I will accept it, but this works.
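Putting the pieces together, a fuller version of the kie-spring configuration might look like the sketch below. The `rules2` folder (for Rules2.drl) and the session names are assumptions that mirror the layout above. The `kie:ksession` elements and the `KModuleBeanFactoryPostProcessor` bean follow the kie-spring documentation, but they have not been verified against the asker's exact Drools 7 version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:kie="http://drools.org/schema/kie-spring"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
                           http://drools.org/schema/kie-spring http://drools.org/schema/kie-spring.xsd">

  <kie:kmodule id="rules">
    <!-- each kbase only sees the DRL files in the listed packages (folders) -->
    <kie:kbase name="rules1And3" packages="rules1,rules3">
      <kie:ksession name="sessionRules1And3" type="stateless"/>
    </kie:kbase>
    <!-- the rules2 folder is assumed to hold Rules2.drl -->
    <kie:kbase name="rules2And3" packages="rules2,rules3">
      <kie:ksession name="sessionRules2And3" type="stateless"/>
    </kie:kbase>
  </kie:kmodule>

  <!-- registers the kbases and ksessions declared above as Spring beans -->
  <bean id="kiePostProcessor" class="org.kie.spring.KModuleBeanFactoryPostProcessor"/>
</beans>
```

Declaring the sessions inside the kbase elements would replace the factory-bean/factory-method beans from the 5.x configuration; the sessions can then be injected by name.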

How to restart Map Reduce Hadoop job using Spring Batch Yarn Application?

I have a MapReduce application and I want to use the Spring Batch YARN logic, like the one described at the link below.
https://spring.io/guides/gs/yarn-batch-restart/
But I want it to be as specific as a normal Hadoop MapReduce job.
I'm just looking for the Spring Batch YARN class and configuration, given that my Hadoop MapReduce logic is already in place and working.
Thanks in advance!
Here is the ApplicationContext configuration that you were looking for:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:util="http://www.springframework.org/schema/util"
xmlns:context="http://www.springframework.org/schema/context"
xmlns:hdp="http://www.springframework.org/schema/hadoop" xmlns:batch="http://www.springframework.org/schema/batch"
xsi:schemaLocation="
http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch.xsd
http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-4.2.xsd">
<context:property-placeholder location="classpath:application.properties" />
<hdp:configuration namenode-principal="hdfs://xx.yy.com" rm-manager-uri="xx.yy.com"
security-method="kerb" user-keytab="location" rm-manager-principal="username"
user-principal="username">
fs.default.name=${fs.default.name}
mapred.job.tracker=${mapred.job.tracker}
</hdp:configuration>
<hdp:job id="wordCountJobId" input-path="${input.path}"
output-path="${output.path}" jar-by-class="com.xx.poc.Application"
mapper="com.xx.poc.Map" reducer="com.xx.poc.Reduce" />
<hdp:job-runner id="wordCountJobRunner" job-ref="wordCountJobId"
run-at-startup="true" />
</beans>
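The configuration above runs the MapReduce job directly at startup. To put it under Spring Batch control, one option is to wrap the job in a tasklet and reference it from a batch step, roughly as sketched below. The ids `wordCountTasklet`, `wordCountBatchJob` and `wordCountStep` are made up for illustration, and the fragment assumes that `jobRepository` and `transactionManager` beans are defined elsewhere in the context.

```xml
<!-- Sketch only: belongs inside the <beans> element above, which already declares the hdp and batch namespaces -->

<!-- exposes the existing Hadoop job as a Spring Batch tasklet -->
<hdp:job-tasklet id="wordCountTasklet" job-ref="wordCountJobId" wait-for-completion="true"/>

<!-- a single-step batch job that runs the MapReduce job -->
<batch:job id="wordCountBatchJob">
  <batch:step id="wordCountStep">
    <batch:tasklet ref="wordCountTasklet"/>
  </batch:step>
</batch:job>
```

With this in place the `hdp:job-runner` with `run-at-startup="true"` would probably be dropped, so the job is not started twice. Restarting a failed execution then relies on Spring Batch itself: relaunching the batch job with the same job parameters against a persistent job repository.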

Loading util:properties with Spring Profile causes multiple occurrences of ID

I am using Spring (3.1) profiles to load property files via util:properties:
<beans profile="local">
<util:properties id="myProps"
location="classpath:local.properties" />
</beans>
<beans profile="dev">
<util:properties id="myProps"
location="classpath:dev.properties" />
</beans>
And I invoke the profile via a runtime parameter (running on TC Server): -Dspring.profiles.active=local
But I get the error There are multiple occurrences of ID value 'myProps'
This was running previously with other bean definitions but once the util:properties was added I get the error.
Make sure your xsd declarations are using >= 3.1 versions for both beans and util namespaces:
xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans-3.1.xsd
http://www.springframework.org/schema/util
http://www.springframework.org/schema/util/spring-util-3.1.xsd ">
The most likely cause of the error is forgetting to set the util declaration to 3.1, given that, as you say, this works for other beans but not for those declared using util.
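A complete file combining the 3.1 schema declarations with the profile blocks from the question could look like this sketch. Only the `<beans>` header is new; the property file names are taken from the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans-3.1.xsd
                           http://www.springframework.org/schema/util
                           http://www.springframework.org/schema/util/spring-util-3.1.xsd">

  <!-- beans shared by all profiles go first; nested profile blocks must come last -->

  <beans profile="local">
    <util:properties id="myProps" location="classpath:local.properties"/>
  </beans>
  <beans profile="dev">
    <util:properties id="myProps" location="classpath:dev.properties"/>
  </beans>
</beans>
```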

spring social xml config

I have already read the Spring Social documentation, but its configuration section is Java-based, while my project's configuration is XML-based. Please tell me how to configure Spring Social in a Spring XML config file. Thank you, and sorry for my poor English.
Posting your code and issues will help us provide the best solution. Refer to the link below; it may be what you are looking for:
http://harmonicdevelopment.tumblr.com/post/13613051804/adding-spring-social-to-a-spring-mvc-and-spring
Take a look at the example xml config
https://github.com/SpringSource/spring-social-samples/tree/master/spring-social-showcase-xml/src/main/webapp/WEB-INF/spring
You have to create a social config XML file and import it into your root-context.xml file. You may also want to consider configuring your app with Spring Security. It depends on your project architecture.
Sample spring social xml config file :
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:context="http://www.springframework.org/schema/context"
xmlns:social="http://www.springframework.org/schema/social"
xmlns:facebook="http://www.springframework.org/schema/social/facebook" xmlns:bean="http://java.sun.com/jsf/core"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.2.xsd
http://www.springframework.org/schema/social http://www.springframework.org/schema/social/spring-social.xsd
http://www.springframework.org/schema/social/facebook http://www.springframework.org/schema/social/spring-social-facebook.xsd">
<!-- Ensures that configuration properties are read from a property file -->
<context:property-placeholder location="file:${sampleapp.appdir}/conf/appparam.txt"/>
<!--
Configures Facebook support.
-->
<facebook:config app-id="${facebook.clientId}" app-secret="${facebook.clientSecret}" />
<!--
Configures the connection repository. This application uses JDBC
connection repository which saves connection details to database.
This repository uses the data source bean for obtaining database
connection.
-->
<social:jdbc-connection-repository data-source-ref="sampleappDS" connection-signup-ref="accountConnectionSignup"/>
<!--
This bean is a custom account connection signup bean for your registration logic.
-->
<bean id="accountConnectionSignup" class="com.sampleapp.social.AccountConnectionSignup"></bean>
<!--
This bean manages the connection flow between the account provider and
the example application.
-->
<bean id="connectController" class="org.springframework.social.connect.web.ConnectController" autowire="constructor">
<constructor-arg index="0" ref="connectionFactoryLocator"/>
<constructor-arg index="1" ref="connectionRepository"/>
</bean>
</beans>
Sample root-context.xml :
<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:context="http://www.springframework.org/schema/context"
xmlns:aop="http://www.springframework.org/schema/aop" xmlns:cache="http://www.springframework.org/schema/cache"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-4.0.xsd
http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-4.0.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-4.0.xsd
http://www.springframework.org/schema/cache http://www.springframework.org/schema/cache/spring-cache.xsd">
<!-- Scan for Spring beans declared via annotations. -->
<context:component-scan base-package="com.sampleapp"/>
<context:annotation-config/>
<context:property-placeholder location="file:${sampleapp.appdir}/conf/appparam.txt"/>
<cache:annotation-driven/>
<!-- Root Context: defines shared resources visible to all other web components -->
<import resource="security-config.xml"/>
<import resource="classpath*:spring/bean-context.xml"/>
<import resource="classpath*:spring/persistence-config.xml"/>
<import resource="social-config.xml"/>
<aop:aspectj-autoproxy proxy-target-class="true"/>
</beans>
