I am reading the article http://spring.io/guides/gs/batch-processing/, which explains reading a CSV file and writing its contents back to a database. I want to know how I can read multiple CSV files, say A.csv, B.csv, etc., and write the contents back to respective tables table_A, table_B, etc. Please note that the content of each CSV file should go into a different table.
The basic use case here would be to create as many steps as you have CSV files, since each file has to end up in its own table (a single MultiResourceItemReader would funnel every file through the same reader/writer pair, so it cannot route each file to a different table).
Each of your steps would read a CSV file (with a FlatFileItemReader) and write to your database (using a JdbcBatchItemWriter or another writer of the same kind). Although you will have multiple steps, if your CSV files share the same format (columns, separator), you can factor out the common configuration into an abstract parent definition. See the documentation: http://docs.spring.io/spring-batch/trunk/reference/html/configureStep.html
If not, then you can at least share the common attributes such as the LineMapper, ItemPreparedStatementSetter and DataSource.
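For example, if the files share the same columns and separator, the common reader configuration can live in an abstract parent bean, and each step's reader only overrides the input file. A minimal sketch (the bean names, the commonLineMapper reference and the file names are illustrative):
<!-- Abstract parent: never instantiated, only holds the shared configuration -->
<bean id="baseCsvReader" abstract="true"
    class="org.springframework.batch.item.file.FlatFileItemReader">
    <property name="lineMapper" ref="commonLineMapper" />
</bean>
<!-- One concrete reader per file/table; each inherits the parent's configuration -->
<bean id="readerA" parent="baseCsvReader">
    <property name="resource" value="A.csv" />
</bean>
<bean id="readerB" parent="baseCsvReader">
    <property name="resource" value="B.csv" />
</bean>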
UPDATE
Here are examples for your readers and writers:
<bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader">
<property name="resource" value="yourFile.csv" />
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer">
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="names" value="column1,column2,column3..." />
</bean>
</property>
<property name="fieldSetMapper">
<bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
<property name="prototypeBeanName" value="yourBeanClass" />
</bean>
</property>
</bean>
</property>
</bean>
<bean id="writer" class="org.springframework.batch.item.database.JdbcBatchItemWriter">
<property name="dataSource" ref="dataSource" />
<property name="sql">
<value>
<![CDATA[
insert into YOUR_TABLE(column1,column2,column3...)
values (:beanField1, :beanField2, :beanField3...)
]]>
</value>
</property>
<property name="itemSqlParameterSourceProvider">
<bean class="org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider" />
</property>
</bean>
UPDATE 2
Here's an example of chaining the steps in the job (with Java-based configuration):
@Bean
public Job job() {
    return jobBuilderFactory().get("job")
            .incrementer(new RunIdIncrementer())
            .start(step1())
            .next(step2())
            .build();
}
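And a minimal sketch of one of those step beans, assuming readerA()/writerA() return the reader and writer configured above and an arbitrary chunk size:
@Bean
public Step step1() {
    // Chunk-oriented step: read A.csv, write to table_A,
    // committing every 10 items (illustrative value)
    return stepBuilderFactory().get("step1")
            .<YourBeanClass, YourBeanClass>chunk(10)
            .reader(readerA())
            .writer(writerA())
            .build();
}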
Related
I need to get the schema name from a properties file for a Spring Batch application, where the schema name differs between dev and prod for an MSSQL database. The job configuration in XML is as below:
<bean id="dataItemWriter" class="org.springframework.batch.item.database.JdbcBatchItemWriter">
<property name="assertUpdates" value="true" />
<property name="itemPreparedStatementSetter">
<bean class="org.test.batch.model.ItemStatementMapper" />
</property>
<property name="sql" >
<value>
<![CDATA[
INSERT INTO dbo.EMPLOYEE
(PROJECT_NAME
,APP_NAME
,EMPLOYEE_NAME)
values (?,?,?)
]]>
</value>
</property>
<property name="dataSource" ref="dataDataSource" />
</bean>
The schema name "dbo" should be retrieved form proprieties file so that DEV and PROD this can be changed in configuration
I don't see the need to put the value in a CDATA block, there are no special xml characters in your query. Here is an example: https://github.com/spring-projects/spring-batch/blob/8762e3411557aaf887867f8d8594b01127538cb1/spring-batch-core/src/test/resources/org/springframework/batch/core/resource/ListPreparedStatementSetterTests-context.xml#L25. So in your case, it should be something like:
<bean id="dataItemWriter" class="org.springframework.batch.item.database.JdbcBatchItemWriter">
<property name="assertUpdates" value="true" />
<property name="itemPreparedStatementSetter">
<bean class="org.test.batch.model.ItemStatementMapper" />
</property>
<property name="sql" >
<value>
INSERT INTO dbo.EMPLOYEE (PROJECT_NAME ,APP_NAME ,EMPLOYEE_NAME) values (?,?,?)
</value>
</property>
<property name="dataSource" ref="dataDataSource" />
</bean>
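As for pulling the schema name from a properties file: Spring resolves ${...} placeholders inside bean definitions, so the schema can become a property. A minimal sketch, assuming a property db.schema=dbo in a file named database.properties registered via <context:property-placeholder/> (or a PropertyPlaceholderConfigurer):
<!-- database.properties (assumed): db.schema=dbo for DEV, another value for PROD -->
<context:property-placeholder location="classpath:database.properties" />
<bean id="dataItemWriter" class="org.springframework.batch.item.database.JdbcBatchItemWriter">
    <property name="assertUpdates" value="true" />
    <property name="itemPreparedStatementSetter">
        <bean class="org.test.batch.model.ItemStatementMapper" />
    </property>
    <!-- ${db.schema} is replaced at startup with the value from the properties file -->
    <property name="sql"
        value="INSERT INTO ${db.schema}.EMPLOYEE (PROJECT_NAME, APP_NAME, EMPLOYEE_NAME) values (?,?,?)" />
    <property name="dataSource" ref="dataDataSource" />
</bean>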
My requirement is that I need to have two datasources connected to a Spring Batch application:
1) One for storing Spring Batch jobs and executions
2) One for business data storing, processing and retrieving
I know that there are a lot of solutions for achieving this, but I achieved it by setting the second datasource as primary. The problem is that the second datasource does not come under transaction scope; instead it commits for each SQL statement executed, especially through JdbcTemplate.
Since I am not able to edit my question, I am writing another post in detail.
In env-context.xml I have the following configuration:
<!-- Enable annotations -->
<context:annotation-config/>
<bean primary="true" id="dataSource" class="org.springframework.jndi.JndiObjectFactoryBean">
    <property name="jndiName" value="java:/DB2XADS"/>
</bean>
<!-- TransactionManager bean; since we use JDBC, we create a DataSourceTransactionManager -->
<bean id="transactionManager" primary="true"
    class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
    <property name="dataSource" ref="dataSource" />
</bean>
<!-- jdbcTemplate uses dataSource -->
<bean id="jdbcTemplate" class="org.springframework.jdbc.core.JdbcTemplate">
    <property name="dataSource" ref="dataSource" />
</bean>
<bean id="batchTransactionManager"
    class="org.springframework.batch.support.transaction.ResourcelessTransactionManager"/>
<bean id="transactionTemplate"
    class="org.springframework.transaction.support.TransactionTemplate">
    <property name="transactionManager" ref="transactionManager" />
</bean>
In override-context.xml I have the following code:
<tx:annotation-driven transaction-manager="transactionManager" />
<bean id="batchDataSource" class="org.springframework.jndi.JndiObjectFactoryBean">
    <property name="jndiName" value="java:/MySqlDS"/>
</bean>
<bean class="com.honda.pddabulk.utility.MyBatchConfigurer">
    <property name="dataSource" ref="batchDataSource" />
</bean>
<!-- Use this to set additional properties on beans at run time -->
<bean id="placeholderProperties"
    class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
    <property name="locations">
        <list>
            <value>classpath:/org/springframework/batch/admin/bootstrap/batch.properties</value>
            <value>classpath:/batch/batch-mysql.properties</value>
            <value>classpath:log4j.properties</value>
        </list>
    </property>
    <property name="systemPropertiesModeName" value="SYSTEM_PROPERTIES_MODE_OVERRIDE"/>
    <property name="ignoreResourceNotFound" value="true"/>
    <property name="ignoreUnresolvablePlaceholders" value="true"/>
    <property name="order" value="1"/>
</bean>
<!-- Override job repository -->
<bean id="jobRepository"
    class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean">
    <property name="databaseType" value="mysql"/>
    <property name="dataSource" ref="batchDataSource"/>
    <property name="tablePrefix" value="${batch.table.prefix}"/>
    <property name="maxVarCharLength" value="2000"/>
    <property name="isolationLevelForCreate" value="ISOLATION_SERIALIZABLE"/>
    <property name="transactionManager" ref="batchTransactionManager"/>
</bean>
<!-- Override job service -->
<bean id="jobService" class="org.springframework.batch.admin.service.SimpleJobServiceFactoryBean">
    <property name="tablePrefix" value="${batch.table.prefix}"/>
    <property name="jobRepository" ref="jobRepository"/>
    <property name="jobLauncher" ref="jobLauncher"/>
    <property name="jobLocator" ref="jobRegistry"/>
    <property name="dataSource" ref="batchDataSource"/>
</bean>
<!-- Override job launcher -->
<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
    <property name="jobRepository" ref="jobRepository" />
    <property name="taskExecutor" ref="jobLauncherTaskExecutor" />
</bean>
<task:executor id="jobLauncherTaskExecutor" pool-size="21" rejection-policy="ABORT" />
<!-- Override job explorer -->
<bean id="jobExplorer"
    class="org.springframework.batch.core.explore.support.JobExplorerFactoryBean">
    <property name="tablePrefix" value="${batch.table.prefix}"/>
    <property name="dataSource" ref="batchDataSource"/>
</bean>
In job-config.xml I have the following code:
<context:component-scan base-package="com.honda.*">
    <context:exclude-filter type="regex"
        expression="com.honda.pddabulk.utility.MyBatch*" />
</context:component-scan>
I have the custom BatchConfigurer set. Now the problem is that when I try to execute update and insert queries with JdbcTemplate, they do not run under a transaction, which means @Transactional is not working. Rather, a commit happens for each method call. For example:
@Transactional
public void checkInsertion() throws Exception {
    try {
        jdbcTemplate.update("INSERT INTO TABLE_NAME(COLUMN1, COLUMN2) VALUES('A','AF')");
        throw new PddaException("custom error");
    } catch (Exception ex) {
        int count = jdbcTemplate.update("ROLLBACK");
        log.info("DATA HAS BEEN ROLLED BACK SUCCESSFULLY... " + count);
        throw ex;
    }
}
In the above code I insert data and then immediately throw an exception, which means the insert should happen but the commit should not. So we should not see any data, but unfortunately the commit is happening. Please can someone help?
The table has more than 200 million records, but I need to restrict the select to the top 5 million records. I tried a JdbcCursorItemReader, which takes around 2-3 hours to select the records and write them to the CSV file using a single step with chunk processing, so I chose the parallel processing that Spring Batch offers: having a taskExecutor and a JdbcPagingItemReader produce 5 individual files of a million records each. The problem is that I am not able to specify the limit and offset clauses in the query parameters. Please help me with this; an approach better than this one is also appreciated.
<bean id="itemReader" class="org.springframework.batch.item.database.JdbcPagingItemReader" scope="step">
<property name="dataSource" ref="dataSource" />
<property name="rowMapper">
<bean class="MyRowMapper" />
</property>
<property name="queryProvider">
<bean class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean">
<property name="dataSource" ref="dataSource" />
<property name="sortKeys">
<map>
<entry key="esmeaddr" value="ASCENDING"/>
</map>
</property>
<property name="selectClause" value="elect cust_send,dest,msg,stime,dtime,dn_status,mid,rp,operator,circle,cust_mid,first_attempt,second_attempt,third_attempt,fourth_attempt,fifth_attempt,term_operator,term_circle,bindata,reason,tag1,tag2,tag3,tag4,tag5"
/>
<property name="fromClause" value="FROM bill_log " />
<property name="whereClause" value="where esmeaddr = '70897600000000' and country='India' and apptype='SMS' Limit 0,1000000" />
</bean>
</property>
<property name="pageSize" value="1000000" />
<property name="parameterValues">
<map>
<entry key="param1" value="#{jobExecutionContext[param1]}" />
<entry key="param2" value="#{jobExecutionContext[param2]}" />
</map>
</property>
</bean>
You can't use a SQL LIMIT clause within that reader, since paging is what the reader itself does. Instead, Spring Batch has the functionality built into the JdbcPagingItemReader. To set the max number of items to process, you can configure the reader with JdbcPagingItemReader#setMaxItemCount(5000000), and if there is an offset, you would set JdbcPagingItemReader#setCurrentItemCount(offset). That being said, the offset will be overridden on a restart with any value it finds in the ExecutionContext. You can read more about this in the javadoc here: https://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/database/JdbcPagingItemReader.html
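In XML, those translate to the maxItemCount and currentItemCount properties on the reader from the question; a sketch (the offset value is illustrative, and the omitted properties stay as in the original configuration):
<bean id="itemReader" class="org.springframework.batch.item.database.JdbcPagingItemReader" scope="step">
    <!-- stop after item number 5,000,000 -->
    <property name="maxItemCount" value="5000000" />
    <!-- start at an offset; overridden from the ExecutionContext on restart -->
    <property name="currentItemCount" value="1000000" />
    <!-- dataSource, queryProvider, rowMapper and pageSize as in the original configuration -->
</bean>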
I have a CSV file that doesn't have a fixed number of columns, like this:
col1,col2,col3,col4,col5
val1,val2,val3,val4,val5
column1,column2,column3
value1,value2,value3
Is there any way to read this kind of CSV file with Spring Batch?
I tried to do this:
<bean id="ItemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
<!-- Read a csv file -->
<property name="resource" value="classpath:file.csv" />
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<!-- split it -->
<property name="lineTokenizer">
<bean
class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="names"
value="col1,col2,col3,col4,col5,column1,column2,column3" />
</bean>
</property>
<property name="fieldSetMapper">
<bean
class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
<property name="prototypeBeanName" value="myBean" />
</bean>
</property>
</bean>
</property>
</bean>
But the result was an error.
You can use the PatternMatchingCompositeLineMapper to delegate to the appropriate LineMapper implementation per line based on a pattern. From there, each of your delegates would use a DelimitedLineTokenizer and a FieldSetMapper to map the line accordingly.
You can read more about this in the documentation here: http://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/file/mapping/PatternMatchingCompositeLineMapper.html
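A sketch of such a composite mapper for the two layouts above; the delegate bean names are placeholders, and the patterns simply require at least four commas for the five-column lines (longer, more specific patterns are tried first):
<bean id="lineMapper" class="org.springframework.batch.item.file.mapping.PatternMatchingCompositeLineMapper">
    <property name="tokenizers">
        <map>
            <!-- a line with at least 4 commas is treated as a 5-column record -->
            <entry key="*,*,*,*,*" value-ref="fiveColumnTokenizer" />
            <!-- everything else falls through to the 3-column tokenizer -->
            <entry key="*" value-ref="threeColumnTokenizer" />
        </map>
    </property>
    <property name="fieldSetMappers">
        <map>
            <entry key="*,*,*,*,*" value-ref="fiveColumnFieldSetMapper" />
            <entry key="*" value-ref="threeColumnFieldSetMapper" />
        </map>
    </property>
</bean>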
AbstractLineTokenizer#setStrict(boolean) in your DelimitedLineTokenizer should do the job.
From the javadoc:
Public setter for the strict flag. If true (the default) then number of tokens in line must match the number of tokens defined (by Range, columns, etc.) in LineTokenizer. If false then lines with less tokens will be tolerated and padded with empty columns, and lines with more tokens will simply be truncated.
You should change this part of your configuration to:
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="names" value="col1,col2,col3,col4,col5,column1,column2,column3" />
<property name="strict" value="false" />
</bean>
I am very new to Spring Batch. I have a requirement in which I have to read a file containing a header record (field names) and data records:
1) I have to validate the first record (check the field names against a set of predefined names); note that this record needs to be skipped, i.e. it should not be part of the items in the processor.
2) Read and store the rest of the field values in a POJO.
3) If the field 'date' is empty, I need to set the default value 'xxxx-yy-zz'.
I am unable to achieve the 1st and 3rd requirements with Spring Batch. Here is the sample reader XML. Please help:
<bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader">
<property name="resource" value="classpath:input/import" />
<property name="encoding" value="UTF-8" />
<property name="linesToSkip" value="1" />
<property name="lineMapper" ref="line.mapper"/>
</bean>
<bean id="line.mapper" class="org.springframework.batch.item.file.mapping .DefaultLineMapper">
<property name="lineTokenizer" ref="line.tokenizer"/>
<property name="fieldSetMapper" ref="fieldSet.enity.mapper"/>
</bean>
<bean id="line.tokenizer" class="org.springframework.batch.item.file.transfo rm.DelimitedLineTokenizer">
<property name="delimiter">
<util:constant static-field="org.springframework.batch.item.file.transfo rm.DelimitedLineTokenizer.DELIMITER_TAB"/>
</property>
<property name="names" value="id,date,age " />
<property name="strict" value="false"/>
</bean>
<bean id="fieldSet.enity.mapper" class="org.springframework.batch.item.file.mapping .BeanWrapperFieldSetMapper">
<property name="targetType" value="a.b.myPOJO"/>
<property name="customEditors">
<map>
<entry key="java.util.Date">
<bean class="org.springframework.beans.propertyeditors.C ustomDateEditor">
<constructor-arg>
<bean class="java.text.SimpleDateFormat">
<constructor-arg value="yyyy-mm-dd" />
</bean>
</constructor-arg>
<constructor-arg value="true" />
</bean>
</entry>
</map>
</property>
Create your own custom FieldSetMapper, like below:
public class CustomFieldSetMapper implements FieldSetMapper<a.b.myPOJO> {

    @Override
    public a.b.myPOJO mapFieldSet(FieldSet fs) {
        a.b.myPOJO myPOJO = new a.b.myPOJO();
        // Apply the default when the date column is empty
        if (fs.readString("date").isEmpty()) {
            myPOJO.setDate("xxxx-yy-zz");
        }
        // map the remaining fields (id, age, the non-empty date) here
        return myPOJO;
    }
}
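Then wire it in place of the BeanWrapperFieldSetMapper (a sketch; the package a.b is assumed):
<bean id="fieldSet.entity.mapper" class="a.b.CustomFieldSetMapper" />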
I think you should set the date in an ItemProcessor instead; see the sketch below.
Also, if <property name="linesToSkip" value="1" /> does not fulfill your requirements, extend FlatFileItemReader and validate the first line manually in it.
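A minimal sketch of that processor, assuming the date field is a String as in the custom mapper above:
import org.springframework.batch.item.ItemProcessor;

public class DefaultDateItemProcessor implements ItemProcessor<a.b.myPOJO, a.b.myPOJO> {

    @Override
    public a.b.myPOJO process(a.b.myPOJO item) throws Exception {
        // Substitute the default value when no date was present in the file
        if (item.getDate() == null || item.getDate().isEmpty()) {
            item.setDate("xxxx-yy-zz");
        }
        return item;
    }
}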