How to read a CSV file with a varying number of columns with Spring Batch

I have a CSV file that doesn't have a fixed number of columns, like this:
col1,col2,col3,col4,col5
val1,val2,val3,val4,val5
column1,column2,column3
value1,value2,value3
Is there any way to read this kind of CSV file with Spring Batch?
I tried to do this:
<bean id="ItemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
<!-- Read a csv file -->
<property name="resource" value="classpath:file.csv" />
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<!-- split it -->
<property name="lineTokenizer">
<bean
class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="names"
value="col1,col2,col3,col4,col5,column1,column2,column3" />
</bean>
</property>
<property name="fieldSetMapper">
<bean
class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
<property name="prototypeBeanName" value="myBean" />
</bean>
</property>
</bean>
</property>
</bean>
But this configuration resulted in an error.

You can use the PatternMatchingCompositeLineMapper to delegate to the appropriate LineMapper implementation per line, based on a pattern. From there, each of your delegates would use a DelimitedLineTokenizer and a FieldSetMapper to map the line accordingly.
You can read more about this in the documentation here: http://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/file/mapping/PatternMatchingCompositeLineMapper.html
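Since the two layouts in the question differ only in field count, a pattern on the number of delimiters can tell them apart. Below is a minimal Java-config sketch of this approach; the class name and the pass-through mappers are illustrative stand-ins, not part of the original answer (the longest matching pattern wins, so the five-column pattern is tried first):

import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.mapping.PassThroughFieldSetMapper;
import org.springframework.batch.item.file.mapping.PatternMatchingCompositeLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.batch.item.file.transform.LineTokenizer;

public class CompositeLineMapperSketch {

    public PatternMatchingCompositeLineMapper<FieldSet> lineMapper() {
        // One tokenizer per record layout, named after the sample file.
        DelimitedLineTokenizer fiveColumns = new DelimitedLineTokenizer();
        fiveColumns.setNames(new String[] { "col1", "col2", "col3", "col4", "col5" });

        DelimitedLineTokenizer threeColumns = new DelimitedLineTokenizer();
        threeColumns.setNames(new String[] { "column1", "column2", "column3" });

        // "*" matches any characters, so these patterns key on comma count:
        // at least four commas -> five columns, otherwise at least two -> three.
        Map<String, LineTokenizer> tokenizers = new HashMap<String, LineTokenizer>();
        tokenizers.put("*,*,*,*,*", fiveColumns);
        tokenizers.put("*,*,*", threeColumns);

        // Pass-through mappers for brevity; in real code each pattern would
        // map its FieldSet to the matching domain object.
        Map<String, FieldSetMapper<FieldSet>> mappers = new HashMap<String, FieldSetMapper<FieldSet>>();
        mappers.put("*,*,*,*,*", new PassThroughFieldSetMapper());
        mappers.put("*,*,*", new PassThroughFieldSetMapper());

        PatternMatchingCompositeLineMapper<FieldSet> mapper = new PatternMatchingCompositeLineMapper<FieldSet>();
        mapper.setTokenizers(tokenizers);
        mapper.setFieldSetMappers(mappers);
        return mapper;
    }
}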

AbstractLineTokenizer#setStrict(boolean) in your DelimitedLineTokenizer should do the job.
From the javadoc:
Public setter for the strict flag. If true (the default) then number of tokens in line must match the number of tokens defined (by Range, columns, etc.) in LineTokenizer. If false then lines with less tokens will be tolerated and padded with empty columns, and lines with more tokens will simply be truncated.
You should change this part of your configuration to:
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="names" value="col1,col2,col3,col4,col5,column1,column2,column3" />
<property name="strict" value="false" />
</bean>
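To see the effect in isolation, here is a small standalone sketch (reusing the column names from the question); with strict left at its default of true, the same call would throw an IncorrectTokenCountException:

import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.file.transform.FieldSet;

public class StrictFlagSketch {

    public static void main(String[] args) {
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        tokenizer.setNames(new String[] { "col1", "col2", "col3", "col4", "col5",
                "column1", "column2", "column3" });
        tokenizer.setStrict(false); // tolerate token-count mismatches

        // Five tokens against eight names: the trailing three columns are padded.
        FieldSet fieldSet = tokenizer.tokenize("val1,val2,val3,val4,val5");
        System.out.println(fieldSet.getFieldCount());        // 8
        System.out.println(fieldSet.readString("column1"));  // "" (empty)
    }
}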

Related

Spring DelimitedLineTokenizer, how to disable de-escaping double quotes

org.springframework.batch.item.file.transform.DelimitedLineTokenizer, by default, "de-escapes" fields: it collapses two consecutive double quotes in a value into one. How can I stop this behavior? I need to, because a subsequent DelimitedLineTokenizer in the program misinterprets the resulting single double quote. I could not find an answer in other posts.
I do have values that span multiple lines, so I cannot return false from isQuoteCharacter, as suggested in some other posts. Here is the code:
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer">
<bean
class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="delimiter" ref="delimiter-#{jobExecutionContext['DELIMITER']}" />
<property name="names"
value="#{jobExecutionContext['COLUMNS_NAME_LOOKUP']}" />
</bean>
</property>
<property name="fieldSetMapper">
<bean class="com.tsys.enterprise.converters.flexible.delimited.file.FlexibleDelimiterBasedFileParserFieldSetMapper"
scope="step">
<aop:scoped-proxy/>
<property name="targetType" value="com.tsys.enterprise.converters.flexible.delimited.file.vo.DataHolder"/>
<property name="udfLabel1Label" value="#{jobExecutionContext['UDF1_LABEL']}"/>
</bean>
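For reference, the de-escaping the question describes is easy to reproduce in isolation: with the default quote character, a doubled quote inside a quoted field is collapsed to a single one during tokenizing. A minimal standalone sketch (values illustrative):

import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.file.transform.FieldSet;

public class QuoteDeEscapeSketch {

    public static void main(String[] args) {
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();

        // The middle field arrives as "b""c" and is de-escaped to b"c.
        FieldSet fieldSet = tokenizer.tokenize("a,\"b\"\"c\",d");
        System.out.println(fieldSet.readString(1)); // prints: b"c
    }
}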

How to use the LIMIT and OFFSET clause in JdbcPagingItemReader in Spring Batch?

The table has more than 200 million records, but I need to restrict the select to the top 5 million records. I tried JdbcCursorItemReader, which takes around 2-3 hours to select the records and write them to a CSV file in a single step with chunk processing, so I chose the parallel processing that Spring Batch offers:
i.e. using a taskExecutor and a JdbcPagingItemReader to produce 5 individual files of a million records each. The problem is that I am not able to specify the LIMIT and OFFSET clause in the query parameters. Please help me with this; a better approach would also be appreciated.
<bean id="itemReader" class="org.springframework.batch.item.database.JdbcPagingItemReader" scope="step">
<property name="dataSource" ref="dataSource" />
<property name="rowMapper">
<bean class="MyRowMapper" />
</property>
<property name="queryProvider">
<bean class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean">
<property name="dataSource" ref="dataSource" />
<property name="sortKeys">
<map>
<entry key="esmeaddr" value="ASCENDING"/>
</map>
</property>
<property name="selectClause" value="elect cust_send,dest,msg,stime,dtime,dn_status,mid,rp,operator,circle,cust_mid,first_attempt,second_attempt,third_attempt,fourth_attempt,fifth_attempt,term_operator,term_circle,bindata,reason,tag1,tag2,tag3,tag4,tag5"
/>
<property name="fromClause" value="FROM bill_log " />
<property name="whereClause" value="where esmeaddr = '70897600000000' and country='India' and apptype='SMS' Limit 0,1000000" />
</bean>
</property>
<property name="pageSize" value="1000000" />
<property name="parameterValues">
<map>
<entry key="param1" value="#{jobExecutionContext[param1]}" />
<entry key="param2" value="#{jobExecutionContext[param2]}" />
</map>
</property>
</bean>
You can't use a SQL LIMIT clause within that reader, because paging is what the reader itself does. Instead, Spring Batch builds this functionality into the JdbcPagingItemReader. To cap the number of items read, configure the reader with JdbcPagingItemReader#setMaxItemCount(5000000); if there is an offset, set JdbcPagingItemReader#setCurrentItemCount(offset). That being said, the offset will be overridden on a restart by any value found in the ExecutionContext. You can read more in the javadoc here: https://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/database/JdbcPagingItemReader.html
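A hedged Java-config sketch of that advice, with the query fragments shortened from the question (MyRow stands in for whatever type the question's MyRowMapper returns). In XML, the same two properties are available as maxItemCount and currentItemCount:

import javax.sql.DataSource;

import org.springframework.batch.item.database.JdbcPagingItemReader;
import org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean;

public class PagingReaderSketch {

    public JdbcPagingItemReader<MyRow> reader(DataSource dataSource) throws Exception {
        SqlPagingQueryProviderFactoryBean provider = new SqlPagingQueryProviderFactoryBean();
        provider.setDataSource(dataSource);
        provider.setSelectClause("select cust_send, dest, msg"); // shortened column list
        provider.setFromClause("from bill_log");
        // Note: no LIMIT here; the reader handles paging itself.
        provider.setWhereClause("where esmeaddr = '70897600000000' and country = 'India' and apptype = 'SMS'");
        provider.setSortKey("esmeaddr");

        JdbcPagingItemReader<MyRow> reader = new JdbcPagingItemReader<MyRow>();
        reader.setDataSource(dataSource);
        reader.setQueryProvider(provider.getObject());
        reader.setRowMapper(new MyRowMapper()); // the row mapper from the question
        reader.setPageSize(10000);
        reader.setMaxItemCount(5000000);  // cap at 5 million rows
        reader.setCurrentItemCount(0);    // starting offset, if any
        return reader;
    }
}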

org.springframework.beans.factory.NoSuchBeanDefinitionException: No bean named 'exampleFileMapper' is defined

I'm trying to implement an example with Spring Batch 2.1.9.
The scenario is to have the flexibility to switch writers based on some conditions.
I have one CSV file as input and 4 CSV files as output.
While running my processor, this error occurred:
Caused by: org.springframework.beans.factory.NoSuchBeanDefinitionException: No bean named 'exampleFileMapper' is defined
The problem is in this bean:
<bean id="exampleFileSourceReader" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">
<property name="resource" value="file:#{jobParameters['file']}" />
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<!-- split it -->
<property name="lineTokenizer">
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<!-- this is missing -->
<property name="delimiter" value=";"/>
<property name="names" value="institution,type,nom,rubrique,montantPaye,MontantRetenu" />
</bean>
</property>
<property name="fieldSetMapper">
<!-- map to an object -->
<bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
<property name="prototypeBeanName" value="exampleFileMapper" />
</bean>
</property>
</bean>
</property>
Can anyone explain to me what this line does?
<property name="prototypeBeanName" value="exampleFileMapper" />

Spring Batch job to update different tables

I am reading the article http://spring.io/guides/gs/batch-processing/ which explains reading a CSV and writing it back to a DB. I want to know how I can read multiple CSV files, say A.csv, B.csv, etc., and write the content back to the respective tables table_A, table_B, etc. Please note that the content of each CSV file should go into a different table.
The basic approach here is to create as many steps as you have CSV files (a single MultiResourceItemReader reads several resources into one stream, so it cannot route each file to its own table).
Each of your steps would read a CSV (with a FlatFileItemReader) and write to your database (using a JdbcBatchItemWriter or another writer of the same kind). Although you will have multiple steps, if your CSV files share the same format (columns, separators), you can factor out the configuration with an abstract parent step definition. See the documentation: http://docs.spring.io/spring-batch/trunk/reference/html/configureStep.html
If not, then you can at least share the common attributes such as the LineMapper, ItemPreparedStatementSetter and DataSource.
UPDATE
Here are examples for your reader and writer:
<bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader">
<property name="resource" value="yourFile.csv" />
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer">
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="names" value="column1,column2,column3..." />
</bean>
</property>
<property name="fieldSetMapper">
<bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
<property name="prototypeBeanName" value="yourBeanClass" />
</bean>
</property>
</bean>
</property>
</bean>
<bean id="writer" class="org.springframework.batch.item.database.JdbcBatchItemWriter">
<property name="dataSource" ref="dataSource" />
<property name="sql">
<value>
<![CDATA[
insert into YOUR_TABLE(column1,column2,column3...)
values (:beanField1, :beanField2, :beanField3...)
]]>
</value>
</property>
<property name="itemSqlParameterSourceProvider">
<bean class="org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider" />
</property>
</bean>
UPDATE 2
Here's an example of chaining the steps in the job (with Java-based configuration):
@Bean
public Job job() {
    return jobBuilderFactory().get("job").incrementer(new RunIdIncrementer())
            .start(step1()).next(step2()).build();
}
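For completeness, a hypothetical matching step definition (RecordA stands in for the POJO produced by the reader; step2() would repeat the pattern for B.csv and table_B):

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;

public class StepSketch {

    private StepBuilderFactory stepBuilderFactory;
    private ItemReader<RecordA> readerA;  // e.g. the FlatFileItemReader above
    private ItemWriter<RecordA> writerA;  // e.g. the JdbcBatchItemWriter above

    // One step per CSV/table pair: read A.csv in chunks and write to table_A.
    public Step step1() {
        return stepBuilderFactory.get("step1")
                .<RecordA, RecordA>chunk(100) // arbitrary chunk size
                .reader(readerA)
                .writer(writerA)
                .build();
    }
}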

Spring Batch: reading a file: if a field is empty, setting a default value

I am very new to Spring Batch. I have a requirement in which I have to read a file with a header record (field names) followed by data records:
1. Validate the first record (check the field names against a set of predefined names); note that this record needs to be skipped, i.e. it should not be part of the items passed to the processor.
2. Read and store the rest of the field values in a POJO.
3. If the field 'date' is empty, set the default value 'xxxx-yy-zz'.
I am unable to achieve the 1st and 3rd requirements with Spring Batch.
Here is the sample reader XML. Please help.
<bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader">
<property name="resource" value="classpath:input/import" />
<property name="encoding" value="UTF-8" />
<property name="linesToSkip" value="1" />
<property name="lineMapper" ref="line.mapper"/>
</bean>
<bean id="line.mapper" class="org.springframework.batch.item.file.mapping .DefaultLineMapper">
<property name="lineTokenizer" ref="line.tokenizer"/>
<property name="fieldSetMapper" ref="fieldSet.enity.mapper"/>
</bean>
<bean id="line.tokenizer" class="org.springframework.batch.item.file.transfo rm.DelimitedLineTokenizer">
<property name="delimiter">
<util:constant static-field="org.springframework.batch.item.file.transfo rm.DelimitedLineTokenizer.DELIMITER_TAB"/>
</property>
<property name="names" value="id,date,age " />
<property name="strict" value="false"/>
</bean>
<bean id="fieldSet.enity.mapper" class="org.springframework.batch.item.file.mapping .BeanWrapperFieldSetMapper">
<property name="targetType" value="a.b.myPOJO"/>
<property name="customEditors">
<map>
<entry key="java.util.Date">
<bean class="org.springframework.beans.propertyeditors.C ustomDateEditor">
<constructor-arg>
<bean class="java.text.SimpleDateFormat">
<constructor-arg value="yyyy-mm-dd" />
</bean>
</constructor-arg>
<constructor-arg value="true" />
</bean>
</entry>
</map>
</property>
Create your own custom FieldSetMapper, like below:
import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;

import a.b.myPOJO;

public class CustomFieldSetMapper implements FieldSetMapper<myPOJO> {

    @Override
    public myPOJO mapFieldSet(FieldSet fs) {
        myPOJO pojo = new myPOJO();
        if (fs.readString("date").isEmpty()) {
            pojo.setDate("xxxx-yy-zz"); // assumes myPOJO stores the date as a String
        }
        // map the remaining fields (id, age, a non-empty date, ...) here
        return pojo;
    }
}
Alternatively, you could set the default date in an ItemProcessor.
Also, if <property name="linesToSkip" value="1" /> does not meet your requirements, extend FlatFileItemReader and validate the first line manually there.
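If extending the reader feels heavy, FlatFileItemReader also exposes a skippedLinesCallback that receives each skipped line, which is a natural place to validate the header; a sketch (the expected names are illustrative, and the lambda requires Java 8):

import org.springframework.batch.item.file.FlatFileItemReader;

public class HeaderValidationSketch {

    // Skips the header line and checks its field names before any data is read.
    public void configure(FlatFileItemReader<?> reader) {
        final String expectedHeader = "id\tdate\tage"; // illustrative, tab-delimited
        reader.setLinesToSkip(1);
        reader.setSkippedLinesCallback(line -> {
            if (!expectedHeader.equals(line.trim())) {
                throw new IllegalStateException("Unexpected header: " + line);
            }
        });
    }
}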
