Spring Batch: Reading fixed-width file without line breaks - spring

I have a fixed-length flat file that contains records like the sample below. It has no delimiter as such; it contains special hex characters, and the data is spread across multiple lines. Each line is a constant 2000 bytes/characters, and I need to keep picking bytes 1-2000, then 2001-4000, and so on. I have fixed index positions for the fields.
Note: I don't want to read all the characters of each 2000-character line, only the characters covered by the specified Ranges.
Customer.java
@AllArgsConstructor
@NoArgsConstructor
@Builder
@Data
public class Customer {
private String firstValue;
private String secondValue;
private String thirdValue;
private String fourthValue;
}
Java Bean
@Bean
public FlatFileItemReader<Customer> customerItemReader(){
return new FlatFileItemReaderBuilder<Customer>()
.name("customerItemReader")
.linesToSkip(1)
.resource(new ClassPathResource("/data/test.conv"))
.fixedLength()
.columns(new Range[] { new Range(3, 6), new Range(7, 13), new Range(14, 15), new Range(14, 15) })
.names(new String[] { "firstValue", "secondValue", "thirdValue", "fourthValue" })
.targetType(Customer.class)
.build();
}
Error
org.springframework.batch.item.file.FlatFileParseException: Parsing error at line: 2 in resource=[class path resource [data/test.conv]], input=[560000000000411999999992052300000000D 0000 0000000000010000000100000040000000000000 00000000 NYNNVX N N 0 N004 000100000001000100000001000100000001000100000001000100000001000100000001000100000001 YNYNYYNNNNNYNNNN0004000000070000000300010000000000000000000000020000000000000000NN1N N00NNNND 001NNN 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000O840000000000000000000AN0201000000NNNC840 N N00N A NN00400000000NNNNNUSAN NNNN00000000000000NN141900INNNNNN N N000000 NN 200//0055//20000YNN MO ΒΆ200528000000 !!B3K555800000001A****00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0005230000000000000000 ]
at org.springframework.batch.item.file.FlatFileItemReader.doRead(FlatFileItemReader.java:189) ~[spring-batch-infrastructure-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader.read(AbstractItemCountingItemStreamItemReader.java:93) ~[spring-batch-infrastructure-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.core.step.item.SimpleChunkProvider.doRead(SimpleChunkProvider.java:99) ~[spring-batch-core-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.core.step.item.SimpleChunkProvider.read(SimpleChunkProvider.java:180) ~[spring-batch-core-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.core.step.item.SimpleChunkProvider$1.doInIteration(SimpleChunkProvider.java:126) ~[spring-batch-core-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:375) ~[spring-batch-infrastructure-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:215) ~[spring-batch-infrastructure-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:145) ~[spring-batch-infrastructure-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.core.step.item.SimpleChunkProvider.provide(SimpleChunkProvider.java:118) ~[spring-batch-core-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.core.step.item.ChunkOrientedTasklet.execute(ChunkOrientedTasklet.java:71) ~[spring-batch-core-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:407) ~[spring-batch-core-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:331) ~[spring-batch-core-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140) ~[spring-tx-5.2.6.RELEASE.jar:5.2.6.RELEASE]
at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:273) ~[spring-batch-core-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:82) ~[spring-batch-core-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:375) ~[spring-batch-infrastructure-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:215) ~[spring-batch-infrastructure-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:145) ~[spring-batch-infrastructure-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:258) ~[spring-batch-core-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:208) ~[spring-batch-core-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:148) [spring-batch-core-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.core.job.AbstractJob.handleStep(AbstractJob.java:410) [spring-batch-core-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.core.job.SimpleJob.doExecute(SimpleJob.java:136) [spring-batch-core-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:319) [spring-batch-core-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:147) [spring-batch-core-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50) [spring-core-5.2.6.RELEASE.jar:5.2.6.RELEASE]
at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:140) [spring-batch-core-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_171]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_171]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_171]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_171]
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344) [spring-aop-5.2.6.RELEASE.jar:5.2.6.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198) [spring-aop-5.2.6.RELEASE.jar:5.2.6.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) [spring-aop-5.2.6.RELEASE.jar:5.2.6.RELEASE]
at org.springframework.batch.core.configuration.annotation.SimpleBatchConfiguration$PassthruAdvice.invoke(SimpleBatchConfiguration.java:127) [spring-batch-core-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) [spring-aop-5.2.6.RELEASE.jar:5.2.6.RELEASE]
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212) [spring-aop-5.2.6.RELEASE.jar:5.2.6.RELEASE]
at com.sun.proxy.$Proxy57.run(Unknown Source) [na:na]
at com.example.DatabaseOutputApplication.run(DatabaseOutputApplication.java:39) [classes/:na]
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:784) [spring-boot-2.2.7.RELEASE.jar:2.2.7.RELEASE]
at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:768) [spring-boot-2.2.7.RELEASE.jar:2.2.7.RELEASE]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:322) [spring-boot-2.2.7.RELEASE.jar:2.2.7.RELEASE]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1226) [spring-boot-2.2.7.RELEASE.jar:2.2.7.RELEASE]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1215) [spring-boot-2.2.7.RELEASE.jar:2.2.7.RELEASE]
at com.example.DatabaseOutputApplication.main(DatabaseOutputApplication.java:29) [classes/:na]
Caused by: org.springframework.batch.item.file.transform.IncorrectLineLengthException: Line is longer than max range 15
at org.springframework.batch.item.file.transform.FixedLengthTokenizer.doTokenize(FixedLengthTokenizer.java:113) ~[spring-batch-infrastructure-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.item.file.transform.AbstractLineTokenizer.tokenize(AbstractLineTokenizer.java:130) ~[spring-batch-infrastructure-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.item.file.mapping.DefaultLineMapper.mapLine(DefaultLineMapper.java:43) ~[spring-batch-infrastructure-4.2.2.RELEASE.jar:4.2.2.RELEASE]
at org.springframework.batch.item.file.FlatFileItemReader.doRead(FlatFileItemReader.java:185) ~[spring-batch-infrastructure-4.2.2.RELEASE.jar:4.2.2.RELEASE]
I also tried this
@Bean
public FlatFileItemReader<Customer> customerItemReader(){
FixedLengthTokenizer tokenizer = new FixedLengthTokenizer();
tokenizer.setNames("firstValue", "secondValue", "thirdValue", "fourthValue", "fifthValue", "sixthValue", "seventhValue", "eighthValue", "ninethValue", "dummyRange");
tokenizer.setColumns(
new Range(3, 6), new Range(7, 13), new Range(14,15), new Range(16,24), new Range(25, 28), new Range(29,32), new Range(33, 36), new Range(1322, 1324),
new Range(1406, 1408), new Range(1409));
DefaultLineMapper<Customer> customerLineMapper = new DefaultLineMapper<>();
customerLineMapper.setLineTokenizer(tokenizer);
customerLineMapper.setFieldSetMapper(new CustomerFieldSetMapper());
customerLineMapper.afterPropertiesSet();
FlatFileItemReader<Customer> reader = new FlatFileItemReader<>();
reader.setLinesToSkip(1);
reader.setResource(new ClassPathResource("/data/test.conv"));
reader.setLineMapper(customerLineMapper);
reader.setStrict(false);
return reader;
}
These solutions don't work when you have no delimiter and your data is spread across multiple lines. Here, column index 1406 falls on a different physical line, and the line breaks in the file were generated by the mainframe. Please guide me here.

The main problem here is that FlatFileItemReader assumes you have line breaks, which you don't. The clearest solution to me is to copy/paste the class and swap out the readLine() method for one that reads in the appropriate number of characters. Unfortunately, because much of the class is private, you can't easily extend it and override that method.
package org.springframework.batch.item.file;
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.springframework.batch.item.ReaderNotOpenException;
import org.springframework.batch.item.file.separator.RecordSeparatorPolicy;
import org.springframework.batch.item.file.separator.SimpleRecordSeparatorPolicy;
import org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.core.io.Resource;
import org.springframework.util.Assert;
import org.springframework.util.ClassUtils;
import org.springframework.util.StringUtils;
/**
* Modified version of {@link FlatFileItemReader} which reads in mainframe-style files with fixed width and no line breaks.
* Still a restartable {@link ItemReader} that reads lines from input {@link #setResource(Resource)}. Line is defined by the
* {@link #setRecordSeparatorPolicy(RecordSeparatorPolicy)} and mapped to item using {@link #setLineMapper(LineMapper)}.
* If an exception is thrown during line mapping it is rethrown as {@link FlatFileParseException} adding information
* about the problematic line and its line number.
*
* @author Robert Kasanicky
* @author Dean Clark
*/
public class FixedLengthFlatFileItemReader<T> extends AbstractItemCountingItemStreamItemReader<T> implements ResourceAwareItemReaderItemStream<T>, InitializingBean {
private static final Log logger = LogFactory.getLog(FlatFileItemReader.class);
// default encoding for input files
public static final String DEFAULT_CHARSET = Charset.defaultCharset().name();
private RecordSeparatorPolicy recordSeparatorPolicy = new SimpleRecordSeparatorPolicy();
private Resource resource;
private BufferedReader reader;
private int lineCount = 0;
private String[] comments = new String[] { "#" };
private boolean noInput = false;
private String encoding = DEFAULT_CHARSET;
private LineMapper<T> lineMapper;
private int linesToSkip = 0;
private LineCallbackHandler skippedLinesCallback;
private boolean strict = true;
private BufferedReaderFactory bufferedReaderFactory = new DefaultBufferedReaderFactory();
// CHANGE: Added a variable to store Line Length
private Integer lineLength;
public FixedLengthFlatFileItemReader() {
setName(ClassUtils.getShortName(FlatFileItemReader.class));
}
/**
* In strict mode the reader will throw an exception on
* {@link #open(org.springframework.batch.item.ExecutionContext)} if the input resource does not exist.
* @param strict <code>true</code> by default
*/
public void setStrict(final boolean strict) {
this.strict = strict;
}
/**
* @param skippedLinesCallback will be called for each one of the initial skipped lines before any items are read.
*/
public void setSkippedLinesCallback(final LineCallbackHandler skippedLinesCallback) {
this.skippedLinesCallback = skippedLinesCallback;
}
/**
* Public setter for the number of lines to skip at the start of a file. Can be used if the file contains a header
* without useful (column name) information, and without a comment delimiter at the beginning of the lines.
*
* @param linesToSkip the number of lines to skip
*/
public void setLinesToSkip(final int linesToSkip) {
this.linesToSkip = linesToSkip;
}
/**
* Setter for line mapper. This property is required to be set.
* @param lineMapper maps line to item
*/
public void setLineMapper(final LineMapper<T> lineMapper) {
this.lineMapper = lineMapper;
}
/**
* Setter for the encoding for this input source. Default value is {@link #DEFAULT_CHARSET}.
*
* @param encoding a properties object which possibly contains the encoding for this input file;
*/
public void setEncoding(final String encoding) {
this.encoding = encoding;
}
/**
* Factory for the {@link BufferedReader} that will be used to extract lines from the file. The default is fine for
* plain text files, but this is a useful strategy for binary files where the standard BufferedReader from java.io
* is limiting.
*
* @param bufferedReaderFactory the bufferedReaderFactory to set
*/
public void setBufferedReaderFactory(final BufferedReaderFactory bufferedReaderFactory) {
this.bufferedReaderFactory = bufferedReaderFactory;
}
/**
* Setter for comment prefixes. Can be used to ignore header lines as well by using e.g. the first couple of column
* names as a prefix.
*
* @param comments an array of comment line prefixes.
*/
public void setComments(final String[] comments) {
this.comments = new String[comments.length];
System.arraycopy(comments, 0, this.comments, 0, comments.length);
}
/**
* Public setter for the input resource.
*/
@Override
public void setResource(final Resource resource) {
this.resource = resource;
}
/**
* Public setter for the recordSeparatorPolicy. Used to determine where the line endings are and do things like
* continue over a line ending if inside a quoted string.
*
* @param recordSeparatorPolicy the recordSeparatorPolicy to set
*/
public void setRecordSeparatorPolicy(final RecordSeparatorPolicy recordSeparatorPolicy) {
this.recordSeparatorPolicy = recordSeparatorPolicy;
}
/**
* @return string corresponding to logical record according to
* {@link #setRecordSeparatorPolicy(RecordSeparatorPolicy)} (might span multiple lines in file).
*/
@Override
protected T doRead() throws Exception {
if (noInput) {
return null;
}
final String line = readLine();
if (line == null) {
return null;
} else {
try {
return lineMapper.mapLine(line, lineCount);
} catch (final Exception ex) {
throw new FlatFileParseException("Parsing error at line: " + lineCount + " in resource=["
+ resource.getDescription() + "], input=[" + line + "]", ex, line, lineCount);
}
}
}
/**
* @return next line (skip comments).
*/
// CHANGE: Modified readLine() to pull in a set number of characters
private String readLine() {
if (reader == null) {
throw new ReaderNotOpenException("Reader must be open before it can be read.");
}
String line = null;
try {
// CHANGE: READ IN LINE BASED ON LINE LENGTH
final char[] chars = new char[lineLength + 1];
final int charsRead = reader.read(chars, 0, lineLength);
if (charsRead <= 10) {
noInput = true;
return null;
}
line = new String(chars, 0, charsRead); // keep only the characters actually read
// END CHANGE: READ IN LINE BASED ON LINE LENGTH
lineCount++;
while (isComment(line)) {
line = reader.readLine();
if (line == null) {
return null;
}
lineCount++;
}
line = applyRecordSeparatorPolicy(line);
} catch (final IOException e) {
// Prevent IOException from recurring indefinitely
// if client keeps catching and re-calling
noInput = true;
throw new NonTransientFlatFileException("Unable to read from resource: [" + resource + "]", e, line,
lineCount);
}
return line;
}
private boolean isComment(final String line) {
for (final String prefix : comments) {
if (line.startsWith(prefix)) {
return true;
}
}
return false;
}
@Override
protected void doClose() throws Exception {
lineCount = 0;
if (reader != null) {
reader.close();
}
}
@Override
protected void doOpen() throws Exception {
Assert.notNull(resource, "Input resource must be set");
Assert.notNull(recordSeparatorPolicy, "RecordSeparatorPolicy must be set");
noInput = true;
if (!resource.exists()) {
if (strict) {
throw new IllegalStateException("Input resource must exist (reader is in 'strict' mode): " + resource);
}
logger.warn("Input resource does not exist " + resource.getDescription());
return;
}
if (!resource.isReadable()) {
if (strict) {
throw new IllegalStateException("Input resource must be readable (reader is in 'strict' mode): "
+ resource);
}
logger.warn("Input resource is not readable " + resource.getDescription());
return;
}
reader = bufferedReaderFactory.create(resource, encoding);
for (int i = 0; i < linesToSkip; i++) {
final String line = readLine();
if (skippedLinesCallback != null) {
skippedLinesCallback.handleLine(line);
}
}
noInput = false;
}
@Override
public void afterPropertiesSet() throws Exception {
Assert.notNull(lineMapper, "LineMapper is required");
// CHANGE: Added an assertion to verify Line Length was provided
Assert.notNull(lineLength, "Line length is required");
}
@Override
protected void jumpToItem(final int itemIndex) throws Exception {
for (int i = 0; i < itemIndex; i++) {
readLine();
}
}
private String applyRecordSeparatorPolicy(String line) throws IOException {
String record = line;
while ((line != null) && !recordSeparatorPolicy.isEndOfRecord(record)) {
line = this.reader.readLine();
if (line == null) {
if (StringUtils.hasText(record)) {
// A record was partially complete since it hasn't ended but
// the line is null
throw new FlatFileParseException("Unexpected end of file before record complete", record, lineCount);
} else {
// Record has no text but it might still be post processed
// to something (skipping preProcess since that was already
// done)
break;
}
} else {
lineCount++;
}
record = recordSeparatorPolicy.preProcess(record) + line;
}
return recordSeparatorPolicy.postProcess(record);
}
// CHANGE: Added a setter for Line Length
public void setLineLength(final Integer lineLength) {
this.lineLength = lineLength;
}
}
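For completeness, here is a minimal sketch of how this copied reader could be wired up for the record layout in the question, assuming each logical record is 2000 characters (the bean name, the exact Ranges, and the field names are illustrative, not part of the original answer):
@Bean
public FixedLengthFlatFileItemReader<Customer> customerItemReader() {
    // Tokenizer that only picks out the Ranges of interest from each 2000-character record
    FixedLengthTokenizer tokenizer = new FixedLengthTokenizer();
    tokenizer.setNames("firstValue", "secondValue", "thirdValue", "fourthValue");
    tokenizer.setColumns(new Range(3, 6), new Range(7, 13), new Range(14, 15), new Range(1406, 1408));
    tokenizer.setStrict(false); // tolerate records longer than the last configured column

    DefaultLineMapper<Customer> lineMapper = new DefaultLineMapper<>();
    lineMapper.setLineTokenizer(tokenizer);
    lineMapper.setFieldSetMapper(new BeanWrapperFieldSetMapper<Customer>() {
        {
            setTargetType(Customer.class);
        }
    });

    FixedLengthFlatFileItemReader<Customer> reader = new FixedLengthFlatFileItemReader<>();
    reader.setResource(new ClassPathResource("/data/test.conv"));
    reader.setLineMapper(lineMapper);
    reader.setLineLength(2000); // read 2000 characters per logical record, regardless of line breaks
    return reader;
}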

You can utilize the strict flag to get rid of this issue - use the tokenizer separately and your issue will be solved.
I have implemented the FixedLengthTokenizer as below:
#Value("classpath:mainframe.txt")
private Resource resource;
#Bean
public FlatFileItemReader fixLengthItemReader(){
FlatFileItemReader reader = new FlatFileItemReader();
reader.setResource(resource);
reader.setLineMapper(new DefaultLineMapper() {
{
setLineTokenizer(fixedLengthTokenizer());
setFieldSetMapper(new BeanWrapperFieldSetMapper<Customer>() {
{
setTargetType(Customer.class);
}
});
}
});
return reader;
}
@Bean
public FixedLengthTokenizer fixedLengthTokenizer() {
FixedLengthTokenizer tokenizer = new FixedLengthTokenizer();
tokenizer.setColumns(new Range[] { new Range(3, 6), new Range(7, 13), new Range(14, 15), new Range(14, 15) });
tokenizer.setNames(new String[] { "firstValue", "secondValue", "thirdValue", "fourthValue" });
tokenizer.setStrict(false);
return tokenizer;
}
Note - do remember to set this flag: tokenizer.setStrict(false);

Related

How to convert file to a string using SpringIntegration

I am trying to read a text file and convert it to a string using Spring Integration.
I need help transforming the file to a string.
Git Link: https://github.com/ravikalla/spring-integration
Source Code -
@Bean
@InboundChannelAdapter(value = "payorFileSource", poller = @Poller(fixedDelay = "10000"))
public MessageSource<File> fileReadingMessageSource() {
FileReadingMessageSource sourceReader = new FileReadingMessageSource();
sourceReader.setDirectory(new File(INPUT_DIR));
sourceReader.setFilter(new SimplePatternFileListFilter(FILE_PATTERN));
return sourceReader;
}
@Bean
@Transformer(inputChannel="payorFileSource", outputChannel="payorFileContent")
public FileToStringTransformer transformFileToString() {
FileToStringTransformer objFileToStringTransformer = new FileToStringTransformer();
return objFileToStringTransformer;
}
Error -
SEVERE: org.springframework.integration.handler.ReplyRequiredException: No reply produced by handler 'fileCopyConfig.transformPayorStringToObject.transformer.handler', and its 'requiresReply' property is set to true., failedMessage=GenericMessage [payload=1|test1, headers={sequenceNumber=1, file_name=payor.txt, sequenceSize=4, correlationId=ff1fef7d-7011-ee99-8d71-96146ac9ea07, file_originalFile=source/payor.txt, id=fd4f950b-afcf-70e6-a053-7d59ff593add, file_relativePath=payor.txt, timestamp=1554875904858}]
at org.springframework.integration.handler.AbstractReplyProducingMessageHandler.handleMessageInternal(AbstractReplyProducingMessageHandler.java:119)
You can convert the file into an InputStream and then use IOUtils.toString(inputStream) to convert it into a String.
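As a rough sketch of that suggestion (assuming Apache Commons IO is on the classpath; the method name is illustrative, while the channel names are taken from the question):
@Transformer(inputChannel = "payorFileSource", outputChannel = "payorFileContent")
public String transformFileToString(File payorFile) throws IOException {
    // Read the whole file into a String via an InputStream using Commons IO
    try (InputStream inputStream = new FileInputStream(payorFile)) {
        return IOUtils.toString(inputStream, StandardCharsets.UTF_8);
    }
}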
That error is coming from somewhere else; the FileToStringTransformer (FTST) can't return null.
I haven't looked at all your code, but this looks suspicious:
@Bean
@Transformer(inputChannel="payorRawStringChannel", outputChannel="payorRawObjectChannel")
public GenericTransformer<String, Payor> transformPayorStringToObject() {
return new GenericTransformer<String, Payor>() {
@Override
public Payor transform(String strPayor) {
String[] arrPayorData = strPayor.split(",");
Payor objPayor = null;
if (null != arrPayorData && arrPayorData.length > 1)
objPayor = new Payor(Integer.parseInt(arrPayorData[0]), arrPayorData[1]);
return objPayor;
}
};
}
It can return null; transformers are not allowed to do that.
Turn on DEBUG logging and follow the message flow to see which component is at fault.
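For illustration only, here is one way that suspicious transformer could be rewritten so that it never returns null. Note that the failed message payload in the question, 1|test1, looks pipe-delimited, so splitting on "," may also be part of the problem; this sketch splits on "|" instead, which is an assumption:
@Bean
@Transformer(inputChannel = "payorRawStringChannel", outputChannel = "payorRawObjectChannel")
public GenericTransformer<String, Payor> transformPayorStringToObject() {
    return new GenericTransformer<String, Payor>() {
        @Override
        public Payor transform(String strPayor) {
            String[] arrPayorData = strPayor.split("\\|"); // escape '|' because split() takes a regex
            if (arrPayorData.length < 2) {
                // A transformer must not return null, so fail loudly on malformed input
                throw new IllegalArgumentException("Cannot parse Payor from: " + strPayor);
            }
            return new Payor(Integer.parseInt(arrPayorData[0]), arrPayorData[1]);
        }
    };
}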
package org.springframework.integration.samples.tcpclientserver;
import java.io.UnsupportedEncodingException;
import org.springframework.core.convert.converter.Converter;
/**
* Simple byte array to String converter; allowing the character set
* to be specified.
*
* @author Gary Russell
* @since 2.1
*
*/
public class ByteArrayToStringConverter implements Converter<byte[], String> {
private String charSet = "UTF-8";
public String convert(byte[] bytes) {
try {
return new String(bytes, this.charSet);
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
return new String(bytes);
}
}
/**
* @return the charSet
*/
public String getCharSet() {
return charSet;
}
/**
* @param charSet the charSet to set
*/
public void setCharSet(String charSet) {
this.charSet = charSet;
}
}

Edit next row on tab

When I put all the code in an SSCCE, it works as expected, i.e. the first and third cells are editable, and pressing Tab on the last column takes you to the next row.
import java.text.NumberFormat;
import java.text.ParseException;
import java.text.ParsePosition;
import java.util.ArrayList;
import java.util.List;
import javafx.application.Application;
import static javafx.application.Application.launch;
import javafx.application.Platform;
import javafx.beans.property.ListProperty;
import javafx.beans.property.SimpleListProperty;
import javafx.beans.value.ChangeListener;
import javafx.beans.value.ObservableValue;
import javafx.collections.FXCollections;
import javafx.collections.ObservableList;
import javafx.event.ActionEvent;
import javafx.event.EventHandler;
import javafx.geometry.Insets;
import javafx.scene.Group;
import javafx.scene.Scene;
import javafx.scene.control.Button;
import javafx.scene.control.ContentDisplay;
import javafx.scene.control.TableCell;
import javafx.scene.control.TableColumn;
import javafx.scene.control.TableColumn.CellEditEvent;
import javafx.scene.control.TablePosition;
import javafx.scene.control.TableView;
import javafx.scene.control.TextField;
import javafx.scene.control.cell.PropertyValueFactory;
import javafx.scene.input.KeyCode;
import javafx.scene.input.KeyEvent;
import javafx.scene.layout.VBox;
import javafx.stage.Stage;
import javafx.util.Callback;
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
/**
*
* @author Yunus
*/
public class CollectionForm extends Application{
private TableView table = new TableView();
private ObservableList<Collection> collectionList = FXCollections.<Collection>observableArrayList();
ListProperty<Collection> collectionListProperty = new SimpleListProperty<>();
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
launch(args);
}
@Override
public void start(Stage stage) {
// single cell selection mode
table.getSelectionModel().setCellSelectionEnabled(true);
//Create a custom cell factory so that cells can support editing.
Callback<TableColumn, TableCell> editableFactory = new Callback<TableColumn, TableCell>() {
@Override
public TableCell call(TableColumn p) {
return new EditableTableCell();
}
};
//A custom cell factory that creates cells that only accept numerical input.
Callback<TableColumn, TableCell> numericFactory = new Callback<TableColumn, TableCell>() {
@Override
public TableCell call(TableColumn p) {
return new NumericEditableTableCell();
}
};
Button b = createSaveCollectionBtn();
//Create columns
TableColumn colMNO = createMNOColumn(editableFactory);
TableColumn colName = createNameColumn(editableFactory);
TableColumn colQty = createQuantityColumn(numericFactory);
table.getColumns().addAll(colMNO, colName, colQty);
//Make the table editable
table.setEditable(true);
collectionListProperty.set(collectionList);
table.itemsProperty().bindBidirectional(collectionListProperty);
collectionList.add(new Collection());
collectionList.add(new Collection());
Scene scene = new Scene(new Group());
stage.setTitle("Table View Sample");
final VBox vbox = new VBox();
vbox.setSpacing(5);
vbox.getChildren().addAll(b, table);
vbox.setPadding(new Insets(10, 0, 0, 10));
((Group) scene.getRoot()).getChildren().addAll(vbox);
stage.setScene(scene);
stage.show();
}
private void handleCollection(ActionEvent event){
for (Collection collection : collectionList) {
System.out.println("MNO: "+collection.getMno()+" Quantity: "+collection.getQuantity());
}
}
private Button createSaveCollectionBtn(){
Button btn = new Button("Save Collection");
btn.setId("btnSaveCollection");
btn.setOnAction(this::handleCollection);
return btn;
}
private TableColumn createQuantityColumn(Callback<TableColumn, TableCell> editableFactory) {
TableColumn colQty = new TableColumn("Quantity");
colQty.setMinWidth(25);
colQty.setId("colQty");
colQty.setCellValueFactory(new PropertyValueFactory("quantity"));
colQty.setCellFactory(editableFactory);
colQty.setOnEditCommit(new EventHandler<CellEditEvent<Collection, Long>>() {
@Override
public void handle(CellEditEvent<Collection, Long> t) {
((Collection) t.getTableView().getItems().get(t.getTablePosition().getRow())).setQuantity(t.getNewValue());
}
});
return colQty;
}
private TableColumn createMNOColumn(Callback<TableColumn, TableCell> editableFactory) {
TableColumn colMno = new TableColumn("M/NO");
colMno.setMinWidth(25);
colMno.setId("colMNO");
colMno.setCellValueFactory(new PropertyValueFactory("mno"));
colMno.setCellFactory(editableFactory);
colMno.setOnEditCommit(new EventHandler<CellEditEvent<Collection, String>>() {
@Override
public void handle(CellEditEvent<Collection, String> t) {
((Collection) t.getTableView().getItems().get(t.getTablePosition().getRow())).setMno(t.getNewValue());
}
});
return colMno;
}
private TableColumn createNameColumn(Callback<TableColumn, TableCell> editableFactory) {
TableColumn colName = new TableColumn("Name");
colName.setEditable(false);
colName.setMinWidth(100);
colName.setId("colName");
colName.setCellValueFactory(new PropertyValueFactory<Collection, String>("name"));
colName.setCellFactory(editableFactory);
//Modifying the firstName property
colName.setOnEditCommit(new EventHandler<CellEditEvent<Collection, String>>() {
@Override
public void handle(CellEditEvent<Collection, String> t) {
((Collection) t.getTableView().getItems().get(t.getTablePosition().getRow())).setName(t.getNewValue());
}
});
return colName;
}
/**
*
* @author Graham Smith
*/
public class EditableTableCell<S extends Object, T extends String> extends AbstractEditableTableCell<S, T> {
public EditableTableCell() {
}
@Override
protected String getString() {
return getItem() == null ? "" : getItem().toString();
}
@Override
protected void commitHelper( boolean losingFocus ) {
commitEdit(((T) textField.getText()));
}
}
/**
*
* @author Graham Smith
*/
public class NumericEditableTableCell<S extends Object, T extends Number> extends AbstractEditableTableCell<S, T> {
private final NumberFormat format;
private boolean emptyZero;
private boolean completeParse;
/**
* Creates a new {@code NumericEditableTableCell} which treats empty strings as zero,
* will parse integers only and will fail if it can't parse the whole string.
*/
public NumericEditableTableCell() {
this( NumberFormat.getInstance(), true, true, true );
}
/**
* The integerOnly and completeParse settings have a complex relationship and care needs
* to be taken to get the correct result.
* <ul>
* <li>If you want to accept only integers and you want to parse the whole string then
* set both integerOnly and completeParse to true. Strings such as 1.5 will be rejected
* as invalid. A string such as 1000 will be accepted as the number 1000.</li>
* <li>If you only want integers but don't care about parsing the whole string set
* integerOnly to true and completeParse to false. This will parse a string such as
* 1.5 and provide the number 1. The downside of this combination is that it will accept
* the string 1x and return the number 1 also.</li>
* <li>If you want to accept decimals and want to parse the whole string set integerOnly
* to false and completeParse to true. This will accept a string like 1.5 and return
* the number 1.5. A string such as 1.5x will be rejected.</li>
* <li>If you want to accept decimals and don't care about parsing the whole string set
* both integerOnly and completeParse to false. This will accept a string like 1.5x and
* return the number 1.5. A string like x1.5 will be rejected because it doesn't start
* with a number. The downside of this combination is that a string like 1.5x3 will
* provide the number 1.5.</li>
* </ul>
*
* @param format the {@code NumberFormat} to use to format this cell.
* @param emptyZero if true an empty cell will be treated as zero.
* @param integerOnly if true only the integer part of the string is parsed.
* @param completeParse if true an exception will be thrown if the whole string given can't be parsed.
*/
public NumericEditableTableCell( NumberFormat format, boolean emptyZero, boolean integerOnly, boolean completeParse ) {
this.format = format;
this.emptyZero = emptyZero;
this.completeParse = completeParse;
format.setParseIntegerOnly(integerOnly);
}
@Override
protected String getString() {
return getItem() == null ? "" : format.format(getItem());
}
/**
* Parses the value of the text field and, if it matches the set format,
* commits the edit; otherwise it returns the cell to its previous value.
*/
@Override
protected void commitHelper( boolean losingFocus ) {
if( textField == null ) {
return;
}
try {
String input = textField.getText();
if (input == null || input.length() == 0) {
if(emptyZero) {
setText( format.format(0) );
commitEdit( (T)new Integer( 0 ));
}
return;
}
int startIndex = 0;
ParsePosition position = new ParsePosition(startIndex);
Number parsedNumber = format.parse(input, position);
if (completeParse && position.getIndex() != input.length()) {
throw new ParseException("Failed to parse complete string: " + input, position.getIndex());
}
if (position.getIndex() == startIndex ) {
throw new ParseException("Failed to parse a number from the string: " + input, position.getIndex());
}
commitEdit( (T)parsedNumber );
} catch (ParseException ex) {
//Most of the time we don't mind if there is a parse exception as it
//indicates duff user data but in the case where we are losing focus
//it means the user has clicked away with bad data in the cell. In that
//situation we want to just cancel the editing and show them the old
//value.
if( losingFocus ) {
cancelEdit();
}
}
}
}
/**
* Provides the basis for an editable table cell using a text field. Sub-classes can provide formatters for display and a
* commitHelper to control when editing is committed.
*
* @author Graham Smith
*/
public abstract class AbstractEditableTableCell<S, T> extends TableCell<S, T> {
protected TextField textField;
public AbstractEditableTableCell() {
}
/**
* Any action attempting to commit an edit should call this method rather than commit the edit directly itself. This
* method will perform any validation and conversion required on the value. For text values that normally means this
* method just commits the edit but for numeric values, for example, it may first parse the given input. <p> The only
* situation that needs to be treated specially is when the field is losing focus. If your user hits enter to commit the
* cell with bad data we can happily cancel the commit and force them to enter a real value. If they click away from the
* cell though we want to give them their old value back.
*
* @param losingFocus true if the reason for the call was because the field is losing focus.
*/
protected abstract void commitHelper(boolean losingFocus);
/**
* Provides the string representation of the value of this cell when the cell is not being edited.
*/
protected abstract String getString();
@Override
public void startEdit() {
super.startEdit();
if (textField == null) {
createTextField();
}
setGraphic(textField);
setContentDisplay(ContentDisplay.GRAPHIC_ONLY);
Platform.runLater(new Runnable() {
@Override
public void run() {
textField.selectAll();
textField.requestFocus();
}
});
}
@Override
public void cancelEdit() {
super.cancelEdit();
setText(getString());
setContentDisplay(ContentDisplay.TEXT_ONLY);
//Once the edit has been cancelled we no longer need the text field
//so we mark it for cleanup here. Note though that you have to handle
//this situation in the focus listener which gets fired at the end
//of the editing.
textField = null;
}
@Override
public void updateItem(T item, boolean empty) {
super.updateItem(item, empty);
if (empty) {
setText(null);
setGraphic(null);
} else {
if (isEditing()) {
if (textField != null) {
textField.setText(getString());
}
setGraphic(textField);
setContentDisplay(ContentDisplay.GRAPHIC_ONLY);
} else {
setText(getString());
setContentDisplay(ContentDisplay.TEXT_ONLY);
}
}
}
private void createTextField() {
textField = new TextField(getString());
textField.setMinWidth(this.getWidth() - this.getGraphicTextGap() * 2);
textField.setOnKeyPressed(new EventHandler<KeyEvent>() {
@Override
public void handle(KeyEvent t) {
if (t.getCode() == KeyCode.ENTER) {
commitHelper(false);
} else if (t.getCode() == KeyCode.ESCAPE) {
cancelEdit();
} else if (t.getCode() == KeyCode.TAB) {
commitHelper(false);
TableColumn nextColumn = getNextColumn(!t.isShiftDown());
TablePosition focusedCellPosition = getTableView().getFocusModel().getFocusedCell();
if (nextColumn != null) {
//if( focusedCellPosition.getColumn() ){}focusedCellPosition.getTableColumn()
System.out.println("Column: "+focusedCellPosition.getColumn());
System.out.println("nextColumn.getId();: "+nextColumn.getId());
if( nextColumn.getId().equals("colMNO") ){
collectionList.add(new Collection());
getTableView().edit((getTableRow().getIndex())+1,getTableView().getColumns().get(0) );
getTableView().layout();
} else {
getTableView().edit(getTableRow().getIndex(), nextColumn);
}
}else{
getTableView().edit((getTableRow().getIndex())+1,getTableView().getColumns().get(0) );
}
}
}
});
textField.focusedProperty().addListener(new ChangeListener<Boolean>() {
@Override
public void changed(ObservableValue<? extends Boolean> observable, Boolean oldValue, Boolean newValue) {
//This focus listener fires at the end of cell editing when focus is lost
//and when enter is pressed (because that causes the text field to lose focus).
//The problem is that if enter is pressed then cancelEdit is called before this
//listener runs and therefore the text field has been cleaned up. If the
//text field is null we don't commit the edit. This has the useful side effect
//of stopping the double commit.
if (!newValue && textField != null) {
commitHelper(true);
}
}
});
}
/**
*
* @param forward true gets the column to the right, false the column to the left of the current column
* @return
*/
private TableColumn<S, ?> getNextColumn(boolean forward) {
List<TableColumn<S, ?>> columns = new ArrayList<>();
for (TableColumn<S, ?> column : getTableView().getColumns()) {
columns.addAll(getLeaves(column));
}
//There is no other column that supports editing.
if (columns.size() < 2) {
return null;
}
int currentIndex = columns.indexOf(getTableColumn());
int nextIndex = currentIndex;
if (forward) {
nextIndex++;
if (nextIndex > columns.size() - 1) {
nextIndex = 0;
}
} else {
nextIndex--;
if (nextIndex < 0) {
nextIndex = columns.size() - 1;
}
}
return columns.get(nextIndex);
}
private List<TableColumn<S, ?>> getLeaves(TableColumn<S, ?> root) {
List<TableColumn<S, ?>> columns = new ArrayList<>();
if (root.getColumns().isEmpty()) {
//We only want the leaves that are editable.
if (root.isEditable()) {
columns.add(root);
}
return columns;
} else {
for (TableColumn<S, ?> column : root.getColumns()) {
columns.addAll(getLeaves(column));
}
return columns;
}
}
}
public class Collection {
private int id;
private String mno;
private String name;
private float quantity;
public int getId() {
return id;
}
public void setId(int id) {
this.id = id;
}
public String getMno() {
return mno;
}
public void setMno(String mno) {
this.mno = mno;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public float getQuantity() {
return quantity;
}
public void setQuantity(float quantity) {
this.quantity = quantity;
}
}
}
The problem is that when I take the same code into a controller and add this table programmatically, it does not work as before: it skips the next line and goes to the third.
Before asking the TableView to edit a cell, it's important to make sure that it has focus, that the cell in question is in view, and that the view layout is up to date. This is probably necessary because of the way TableView uses virtual cells.
Add these three lines before any call to TableView#edit:
getTableView().requestFocus();
getTableView().scrollTo(rowToEdit);
getTableView().layout();
// getTableView().edit goes here.
This solved this problem for me.
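Applied to the SSCCE above, those calls could be wrapped in a small helper inside AbstractEditableTableCell and used wherever the cell currently calls getTableView().edit(...) (a sketch; editCell is a hypothetical helper name, not part of the original code):
// Hypothetical helper: focus the table, scroll the row into view and lay out before editing
private void editCell(int rowToEdit, TableColumn<S, ?> column) {
    getTableView().requestFocus();
    getTableView().scrollTo(rowToEdit);
    getTableView().layout();
    getTableView().edit(rowToEdit, column);
}
// e.g. in the TAB branch of the key handler:
// editCell(getTableRow().getIndex() + 1, getTableView().getColumns().get(0));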

ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hbase.io.ImmutableBytesWritable

I have exported a table from HBase to a file in a format almost like org.apache.hadoop.mapreduce.lib.output.TextOutputFormat. To import the exported text-format file, I have tweaked the code of the open-source Import class to support importing text-based files instead of SequenceFiles:
job.setInputFormatClass(TextInputFormat.class);
While running the Import class I am getting the following exception.
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hbase.io.ImmutableBytesWritable
at Import$Importer.map(Import.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Here is my Export class, which was tweaked to write the content to the file from the ExporterTable mapper.
public class Export
{
private static final Log LOG = LogFactory.getLog(Export.class);
final static String NAME = "export";
final static String RAW_SCAN = "hbase.mapreduce.include.deleted.rows";
private static OutputStream out;
private static final String utf8 = "UTF-8";
private static final byte[] newline;
private static final byte[] keyValueSeparator;
static {
try {
newline = "\n".getBytes(utf8);
keyValueSeparator = "\t".getBytes(utf8);
}
catch (UnsupportedEncodingException uee) {
throw new IllegalArgumentException("can't find " + utf8 + " encoding");
}
}
/**
* Mapper.
*/
static class ExporterTable extends TableMapper<ImmutableBytesWritable, Result>
{
/**
* @param row The current table row key.
* @param value The columns.
* @param context The current context.
* @throws IOException When something is broken with the data.
* @see org.apache.hadoop.mapreduce.Mapper#map(KEYIN, VALUEIN,
* org.apache.hadoop.mapreduce.Mapper.Context)
*/
@Override
public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException {
try {
context.write(row, value);
write(row, value);
System.out.println(row);
System.out.println(value);
}
catch (InterruptedException e) {
e.printStackTrace();
}
}
}
/**
* Sets up the actual job.
*
* @param conf The current configuration.
* @param args The command line parameters.
* @return The newly created job.
* @throws IOException When setting up the job fails.
*/
public static Job createSubmittableJob(Configuration conf, String[] args) throws IOException {
String tableName = args[0];
// this.out = new DataOutputStream(fos);
Path outputDir = new Path(args[1]);
Job job = new Job(conf, NAME + "_" + tableName);
job.setJobName(NAME + "_" + tableName);
job.setJarByClass(ExporterTable.class);
// Set optional scan parameters
Scan s = getConfiguredScanForJob(conf, args);
TableMapReduceUtil.initTableMapperJob(tableName, s, ExporterTable.class, ImmutableBytesWritable.class, IntWritable.class, job);
// No reducers. Just write straight to output files.
job.setNumReduceTasks(0);
job.setOutputValueClass(Text.class);
// FileOutputFormat.setOutputPath(job, outputDir);
job.setOutputFormatClass(NullOutputFormat.class);
TableMapReduceUtil.addHBaseDependencyJars(conf);
TableMapReduceUtil.addDependencyJars(conf, JsonProcessingException.class);
TableMapReduceUtil.addDependencyJars(job);
return job;
}
private static Scan getConfiguredScanForJob(Configuration conf, String[] args) throws IOException {
Scan s = new Scan();
// Optional arguments.
// Set Scan Versions
int versions = args.length > 2 ? Integer.parseInt(args[2]) : 1;
s.setMaxVersions(versions);
// Set Scan Range
long startTime = args.length > 3 ? Long.parseLong(args[3]) : 0L;
long endTime = args.length > 4 ? Long.parseLong(args[4]) : Long.MAX_VALUE;
s.setTimeRange(startTime, endTime);
// Set cache blocks
s.setCacheBlocks(false);
// Set Scan Column Family
boolean raw = Boolean.parseBoolean(conf.get(RAW_SCAN));
if (raw) {
s.setRaw(raw);
}
if (conf.get(TableInputFormat.SCAN_COLUMN_FAMILY) != null) {
s.addFamily(Bytes.toBytes(conf.get(TableInputFormat.SCAN_COLUMN_FAMILY)));
}
// Set RowFilter or Prefix Filter if applicable.
Filter exportFilter = getExportFilter(args);
if (exportFilter != null) {
LOG.info("Setting Scan Filter for Export.");
s.setFilter(exportFilter);
}
LOG.info("versions=" + versions + ", starttime=" + startTime + ", endtime=" + endTime + ", keepDeletedCells=" + raw);
return s;
}
private static Filter getExportFilter(String[] args) {
Filter exportFilter = null;
String filterCriteria = (args.length > 5) ? args[5] : null;
if (filterCriteria == null)
return null;
if (filterCriteria.startsWith("^")) {
String regexPattern = filterCriteria.substring(1, filterCriteria.length());
exportFilter = new RowFilter(CompareOp.EQUAL, new RegexStringComparator(regexPattern));
}
else {
exportFilter = new PrefixFilter(Bytes.toBytes(filterCriteria));
}
return exportFilter;
}
/*
* @param errorMsg Error message. Can be null.
*/
private static void usage(final String errorMsg) {
if (errorMsg != null && errorMsg.length() > 0) {
System.err.println("ERROR: " + errorMsg);
}
System.err.println("Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> " + "[<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]\n");
System.err.println(" Note: -D properties will be applied to the conf used. ");
System.err.println(" For example: ");
System.err.println(" -D mapred.output.compress=true");
System.err.println(" -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec");
System.err.println(" -D mapred.output.compression.type=BLOCK");
System.err.println(" Additionally, the following SCAN properties can be specified");
System.err.println(" to control/limit what is exported..");
System.err.println(" -D " + TableInputFormat.SCAN_COLUMN_FAMILY + "=<familyName>");
System.err.println(" -D " + RAW_SCAN + "=true");
System.err.println("For performance consider the following properties:\n" + " -Dhbase.client.scanner.caching=100\n" + " -Dmapred.map.tasks.speculative.execution=false\n" + " -Dmapred.reduce.tasks.speculative.execution=false");
}
/**
* Main entry point.
*
* @param args The command line parameters.
* @throws Exception When running the job fails.
*/
public static void main(String[] args) throws Exception {
Configuration conf = HBaseConfiguration.create();
conf.set("mapreduce.framework.name", "local");
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length < 2) {
usage("Wrong number of arguments: " + otherArgs.length);
System.exit(-1);
}
boolean jobStatus = false;
Job job = createSubmittableJob(conf, otherArgs);
try {
File f = new File("Test");
out = new FileOutputStream(f);
jobStatus = job.waitForCompletion(true);
}
catch (Exception e) {
e.printStackTrace();
}
finally {
IOUtils.closeStream(out);
}
// convertTextToSequence(conf);
System.exit(jobStatus ? 0 : 1);
}
public static void write(ImmutableBytesWritable key, Result value) throws IOException {
boolean nullKey = key == null;
boolean nullValue = value == null;
if (nullKey && nullValue) {
return;
}
if (!nullKey) {
writeObject(key);
}
if (!(nullKey || nullValue)) {
out.write(keyValueSeparator);
}
if (!nullValue) {
writeObject(value);
}
out.write(newline);
}
/**
* Write the object to the byte stream, handling Text as a special
* case.
* @param o the object to print
* @throws IOException if the write throws, we pass it on
*/
private static void writeObject(Object o) throws IOException {
if (o instanceof Text) {
Text to = (Text) o;
out.write(to.getBytes(), 0, to.getLength());
}
else {
out.write(o.toString().getBytes(utf8));
}
}
}
Any help is appreciated.
You have declared the map method as follows, writing the output key as an ImmutableBytesWritable:
public void map(ImmutableBytesWritable row, Result value, Context context)
throws IOException {
try {
context.write(row, value);
You have to override the job parameters as follows to set the map output key class and the map output value class:
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Result.class);
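In the createSubmittableJob method above, these calls could go right after initTableMapperJob; the value class passed to initTableMapperJob should then presumably also be Result rather than IntWritable, since that is what the ExporterTable mapper actually emits. A sketch:
Job job = new Job(conf, NAME + "_" + tableName);
job.setJarByClass(ExporterTable.class);
Scan s = getConfiguredScanForJob(conf, args);
TableMapReduceUtil.initTableMapperJob(tableName, s, ExporterTable.class, ImmutableBytesWritable.class, Result.class, job);
// Declare the map output types to match what ExporterTable.map() writes
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Result.class);
job.setNumReduceTasks(0);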
Have a look at the working example: 7. Export an HBase table to File

Quartz doesn't recognize schema job_scheduling_data_2_0.xsd present in quartz jar file

I am getting the below exception on server startup.
I am using Quartz 2.2.21 with Spring 3.2.
I have enabled the Quartz plugin (org.quartz.plugins.xml.XMLSchedulingDataProcessorPlugin).
Please find below the start tag of our XML file:
During server startup we are getting the below log information and stack trace:
Error Message:
Unable to load local schema packaged in quartz distribution jar. Utilizing schema online at http://www.quartz-scheduler.org/xml/job_scheduling_data_2_0.xsd
Exception:
Caused by: org.xml.sax.SAXParseException; systemId: file:///quartz_job_data.xml; lineNumber: 5; columnNumber: 104;
schema_reference.4: Failed to read schema document 'http://www.quartz-scheduler.org/xml/job_scheduling_data_2_0.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
I had the same problem. I'm using JBoss 7.1.1, and the problem appears when you don't have a connection to the internet. It is as easy to reproduce as putting a fake, unreachable address in the hosts file.
I tried to force the local copy, but it does not work.
What I finally did was partially override the functionality until this is fixed. See: https://jira.spring.io/browse/SPR-13706
public class CustomXMLSchedulingDataProcessor extends org.quartz.xml.XMLSchedulingDataProcessor {
public static final String QUARTZ_XSD_PATH_IN_JAR_CLASSPATH = "classpath:org/quartz/xml/job_scheduling_data_2_0.xsd";
public CustomXMLSchedulingDataProcessor(ClassLoadHelper clh) throws ParserConfigurationException {
super(clh);
}
@Override
protected Object resolveSchemaSource() {
InputSource inputSource;
InputStream is = null;
try {
is = classLoadHelper.getResourceAsStream(QUARTZ_XSD_PATH_IN_JAR_CLASSPATH);
} finally {
if (is != null) {
inputSource = new InputSource(is);
inputSource.setSystemId(QUARTZ_SCHEMA_WEB_URL);
}
else {
return QUARTZ_SCHEMA_WEB_URL;
}
}
return inputSource;
}
}
And I wrote a new plugin, XMLSchedulingDataProcessorPlugin, overriding just the instantiation of the above class.
public class XMLSchedulingDataProcessorPlugin
extends SchedulerPluginWithUserTransactionSupport
implements FileScanListener {
/*
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*
* Data members.
*
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/
private static final int MAX_JOB_TRIGGER_NAME_LEN = 80;
private static final String JOB_INITIALIZATION_PLUGIN_NAME = "JobSchedulingDataLoaderPlugin";
private static final String FILE_NAME_DELIMITERS = ",";
private boolean failOnFileNotFound = true;
private String fileNames = CustomXMLSchedulingDataProcessor.QUARTZ_XML_DEFAULT_FILE_NAME;
// Populated by initialization
private Map<String, JobFile> jobFiles = new LinkedHashMap<String, JobFile>();
private long scanInterval = 0;
boolean started = false;
protected ClassLoadHelper classLoadHelper = null;
private Set<String> jobTriggerNameSet = new HashSet<String>();
/*
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*
* Constructors.
*
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/
public XMLSchedulingDataProcessorPlugin() {
}
/*
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*
* Interface.
*
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/
/**
* Comma separated list of file names (with paths) to the XML files that should be read.
*/
public String getFileNames() {
return fileNames;
}
/**
* The file name (and path) to the XML file that should be read.
*/
public void setFileNames(String fileNames) {
this.fileNames = fileNames;
}
/**
* The interval (in seconds) at which to scan for changes to the file.
* If the file has been changed, it is re-loaded and parsed. The default
* value for the interval is 0, which disables scanning.
*
* @return Returns the scanInterval.
*/
public long getScanInterval() {
return scanInterval / 1000;
}
/**
* The interval (in seconds) at which to scan for changes to the file.
* If the file has been changed, it is re-loaded and parsed. The default
* value for the interval is 0, which disables scanning.
*
* @param scanInterval The scanInterval to set.
*/
public void setScanInterval(long scanInterval) {
this.scanInterval = scanInterval * 1000;
}
/**
* Whether or not initialization of the plugin should fail (throw an
* exception) if the file cannot be found. Default is <code>true</code>.
*/
public boolean isFailOnFileNotFound() {
return failOnFileNotFound;
}
/**
* Whether or not initialization of the plugin should fail (throw an
* exception) if the file cannot be found. Default is <code>true</code>.
*/
public void setFailOnFileNotFound(boolean failOnFileNotFound) {
this.failOnFileNotFound = failOnFileNotFound;
}
/*
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*
* SchedulerPlugin Interface.
*
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/
/**
* <p>
* Called during creation of the <code>Scheduler</code> in order to give
* the <code>SchedulerPlugin</code> a chance to initialize.
* </p>
*
* @throws org.quartz.SchedulerConfigException
* if there is an error initializing.
*/
public void initialize(String name, final Scheduler scheduler, ClassLoadHelper schedulerFactoryClassLoadHelper)
throws SchedulerException {
super.initialize(name, scheduler);
this.classLoadHelper = schedulerFactoryClassLoadHelper;
getLog().info("Registering Quartz Job Initialization Plug-in.");
// Create JobFile objects
StringTokenizer stok = new StringTokenizer(fileNames, FILE_NAME_DELIMITERS);
while (stok.hasMoreTokens()) {
final String fileName = stok.nextToken();
final JobFile jobFile = new JobFile(fileName);
jobFiles.put(fileName, jobFile);
}
}
@Override
public void start(UserTransaction userTransaction) {
try {
if (jobFiles.isEmpty() == false) {
if (scanInterval > 0) {
getScheduler().getContext().put(JOB_INITIALIZATION_PLUGIN_NAME + '_' + getName(), this);
}
Iterator<JobFile> iterator = jobFiles.values().iterator();
while (iterator.hasNext()) {
JobFile jobFile = iterator.next();
if (scanInterval > 0) {
String jobTriggerName = buildJobTriggerName(jobFile.getFileBasename());
TriggerKey tKey = new TriggerKey(jobTriggerName, JOB_INITIALIZATION_PLUGIN_NAME);
// remove pre-existing job/trigger, if any
getScheduler().unscheduleJob(tKey);
JobDetail job = newJob().withIdentity(jobTriggerName, JOB_INITIALIZATION_PLUGIN_NAME).ofType(FileScanJob.class)
.usingJobData(FileScanJob.FILE_NAME, jobFile.getFileName())
.usingJobData(FileScanJob.FILE_SCAN_LISTENER_NAME, JOB_INITIALIZATION_PLUGIN_NAME + '_' + getName())
.build();
SimpleTrigger trig = newTrigger().withIdentity(tKey).withSchedule(
simpleSchedule().repeatForever().withIntervalInMilliseconds(scanInterval))
.forJob(job)
.build();
getScheduler().scheduleJob(job, trig);
getLog().debug("Scheduled file scan job for data file: {}, at interval: {}", jobFile.getFileName(), scanInterval);
}
processFile(jobFile);
}
}
} catch(SchedulerException se) {
getLog().error("Error starting background-task for watching jobs file.", se);
} finally {
started = true;
}
}
/**
* Helper method for generating unique job/trigger name for the
* file scanning jobs (one per FileJob). The unique names are saved
* in jobTriggerNameSet.
*/
private String buildJobTriggerName(
String fileBasename) {
// Name w/o collisions will be prefix + _ + filename (with '.' of filename replaced with '_')
// For example: JobInitializationPlugin_jobInitializer_myjobs_xml
String jobTriggerName = JOB_INITIALIZATION_PLUGIN_NAME + '_' + getName() + '_' + fileBasename.replace('.', '_');
// If name is too long (DB column is 80 chars), then truncate to max length
if (jobTriggerName.length() > MAX_JOB_TRIGGER_NAME_LEN) {
jobTriggerName = jobTriggerName.substring(0, MAX_JOB_TRIGGER_NAME_LEN);
}
// Make sure this name is unique in case the same file name under different
// directories is being checked, or had a naming collision due to length truncation.
// If there is a conflict, keep incrementing a _# suffix on the name (being sure
// not to get too long), until we find a unique name.
int currentIndex = 1;
while (jobTriggerNameSet.add(jobTriggerName) == false) {
// If not our first time through, then strip off old numeric suffix
if (currentIndex > 1) {
jobTriggerName = jobTriggerName.substring(0, jobTriggerName.lastIndexOf('_'));
}
String numericSuffix = "_" + currentIndex++;
// If the numeric suffix would make the name too long, then make room for it.
if (jobTriggerName.length() > (MAX_JOB_TRIGGER_NAME_LEN - numericSuffix.length())) {
jobTriggerName = jobTriggerName.substring(0, (MAX_JOB_TRIGGER_NAME_LEN - numericSuffix.length()));
}
jobTriggerName += numericSuffix;
}
return jobTriggerName;
}
/**
* Overridden to ignore <em>wrapInUserTransaction</em> because shutdown()
* does not interact with the <code>Scheduler</code>.
*/
@Override
public void shutdown() {
// Since we have nothing to do, override base shutdown so we don't
// get extraneous UserTransactions.
}
private void processFile(JobFile jobFile) {
if (jobFile == null || !jobFile.getFileFound()) {
return;
}
try {
CustomXMLSchedulingDataProcessor processor =
new CustomXMLSchedulingDataProcessor(this.classLoadHelper);
processor.addJobGroupToNeverDelete(JOB_INITIALIZATION_PLUGIN_NAME);
processor.addTriggerGroupToNeverDelete(JOB_INITIALIZATION_PLUGIN_NAME);
processor.processFileAndScheduleJobs(
jobFile.getFileName(),
jobFile.getFileName(), // systemId
getScheduler());
} catch (Exception e) {
getLog().error("Error scheduling jobs: " + e.getMessage(), e);
}
}
public void processFile(String filePath) {
processFile((JobFile)jobFiles.get(filePath));
}
/**
* @see org.quartz.jobs.FileScanListener#fileUpdated(java.lang.String)
*/
public void fileUpdated(String fileName) {
if (started) {
processFile(fileName);
}
}
class JobFile {
private String fileName;
// These are set by initialize()
private String filePath;
private String fileBasename;
private boolean fileFound;
protected JobFile(String fileName) throws SchedulerException {
this.fileName = fileName;
initialize();
}
protected String getFileName() {
return fileName;
}
protected boolean getFileFound() {
return fileFound;
}
protected String getFilePath() {
return filePath;
}
protected String getFileBasename() {
return fileBasename;
}
private void initialize() throws SchedulerException {
InputStream f = null;
try {
String furl = null;
File file = new File(getFileName()); // files in filesystem
if (!file.exists()) {
URL url = classLoadHelper.getResource(getFileName());
if(url != null) {
try {
furl = URLDecoder.decode(url.getPath(), "UTF-8");
} catch (UnsupportedEncodingException e) {
furl = url.getPath();
}
file = new File(furl);
try {
f = url.openStream();
} catch (IOException ignor) {
// Swallow the exception
}
}
} else {
try {
f = new java.io.FileInputStream(file);
}catch (FileNotFoundException e) {
// ignore
}
}
if (f == null) {
if (isFailOnFileNotFound()) {
throw new SchedulerException(
"File named '" + getFileName() + "' does not exist.");
} else {
getLog().warn("File named '" + getFileName() + "' does not exist.");
}
} else {
fileFound = true;
}
filePath = (furl != null) ? furl : file.getAbsolutePath();
fileBasename = file.getName();
} finally {
try {
if (f != null) {
f.close();
}
} catch (IOException ioe) {
getLog().warn("Error closing jobs file " + getFileName(), ioe);
}
}
}
}
}
That way you only have to register this plugin in your configuration and everything will work by default.
org.quartz.plugin.jobInitializer.class = com.level2.quartz.processor.plugin.XMLSchedulingDataProcessorPlugin
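For completeness, a minimal quartz.properties sketch might look like the one below. The fileNames, failOnFileNotFound and scanInterval property names are assumptions based on the plugin's fields and setters shown above (Quartz maps plugin properties onto setter methods), so verify them against your plugin class; in particular, check setScanInterval for the expected time unit, since start() passes the value straight to withIntervalInMilliseconds().
# assumed property names, mirroring the plugin's setters
org.quartz.plugin.jobInitializer.class = com.level2.quartz.processor.plugin.XMLSchedulingDataProcessorPlugin
org.quartz.plugin.jobInitializer.fileNames = quartz-jobs.xml
org.quartz.plugin.jobInitializer.failOnFileNotFound = true
org.quartz.plugin.jobInitializer.scanInterval = 10000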

isSplitable in combineFileInputFormat does not work

I have thousands of small files, and I want to process them with CombineFileInputFormat.
With CombineFileInputFormat, one mapper handles multiple small files, and each individual file should not be split.
A snippet of one of the small input files looks like this:
vers,3
period,2015-01-26-18-12-00,438469546,449329626,complete
config,libdvm.so,chromeview
pkgproc,com.futuredial.digitchat,10021,,0ns:10860078
pkgpss,com.futuredial.digitchat,10021,,0ns:9:6627:6627:6637:5912:5912:5912
pkgsvc-run,com.futuredial.digitchat,10021,.LiveScreenService,1,0n:10860078
pkgsvc-start,com.futuredial.digitchat,10021,.LiveScreenService,1,0n:10860078
pkgproc,com.google.android.youtube,10103,,0ns:10860078
pkgpss,com.google.android.youtube,10103,,0ns:9:12986:13000:13021:11552:11564:11580
pkgsvc-run,com.google.android.youtube,10103,com.google.android.apps.youtube.app.offline.transfer.OfflineTransferService,1,0n:10860078
pkgsvc-start,com.google.android.youtube,10103,com.google.android.apps.youtube.app.offline.transfer.OfflineTransferService,1,0n:10860078
I want to pass the whole file content to the mapper. However, Hadoop splits the file in half.
For example, the above file may be split into:
vers,3
period,2015-01-26-18-12-00,438469546,449329626,complete
config,libdvm.so,chromeview
pkgproc,com.futuredial.digitchat,#the line has been cut
But I want the content of the whole file to be processed.
Here is my code, which references "Reading file as single record in hadoop".
The driver code
public class CombineSmallfiles {
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
System.err.println("Usage: conbinesmallfiles <in> <out>");
System.exit(2);
}
conf.setInt("mapred.min.split.size", 1);
conf.setLong("mapred.max.split.size", 26214400); // 25m
//conf.setLong("mapred.max.split.size", 134217728); // 128m
//conf.setInt("mapred.reduce.tasks", 5);
Job job = new Job(conf, "combine smallfiles");
job.setJarByClass(CombineSmallfiles.class);
job.setMapperClass(CombineSmallfileMapper.class);
//job.setReducerClass(IdentityReducer.class);
job.setNumReduceTasks(0);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
MultipleOutputs.addNamedOutput(job,"pkgproc",TextOutputFormat.class,Text.class,Text.class);
MultipleOutputs.addNamedOutput(job,"pkgpss",TextOutputFormat.class,Text.class,Text.class);
MultipleOutputs.addNamedOutput(job,"pkgsvc",TextOutputFormat.class,Text.class,Text.class);
job.setInputFormatClass(CombineSmallfileInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
int exitFlag = job.waitForCompletion(true) ? 0 : 1;
System.exit(exitFlag);
}
}
My Mapper code
public class CombineSmallfileMapper extends Mapper<NullWritable, Text, Text, Text> {
private Text file = new Text();
private MultipleOutputs mos;
private String period;
private Long elapsed;
@Override
public void setup(Context context) throws IOException, InterruptedException {
mos = new MultipleOutputs(context);
}
@Override
protected void map(NullWritable key, Text value, Context context) throws IOException, InterruptedException {
String file_name = context.getConfiguration().get("map.input.file.name");
String [] filename_tokens = file_name.split("_");
String uuid = filename_tokens[0];
String [] datetime_tokens;
try{
datetime_tokens = filename_tokens[1].split("-");
}catch(ArrayIndexOutOfBoundsException err){
throw new ArrayIndexOutOfBoundsException(file_name);
}
String year,month,day,hour,minute,sec,msec;
year = datetime_tokens[0];
month = datetime_tokens[1];
day = datetime_tokens[2];
hour = datetime_tokens[3];
minute = datetime_tokens[4];
sec = datetime_tokens[5];
msec = datetime_tokens[6];
String datetime = year+"-"+month+"-"+day+" "+hour+":"+minute+":"+sec+"."+msec;
String content = value.toString();
String []lines = content.split("\n");
for(int u = 0;u<lines.length;u++){
String line = lines[u];
String []tokens = line.split(",");
if(tokens[0].equals("period")){
period = tokens[1];
try{
long startTime = Long.valueOf(tokens[2]);
long endTime = Long.valueOf(tokens[3]);
elapsed = endTime-startTime;
}catch(NumberFormatException err){
throw new NumberFormatException(line);
}
}else if(tokens[0].equals("pkgproc")){
String proc_info = "";
try{
proc_info += period+","+String.valueOf(elapsed)+","+tokens[2]+","+tokens[3];
}catch(ArrayIndexOutOfBoundsException err){
throw new ArrayIndexOutOfBoundsException("pkgproc: "+content+ "line:"+line);
}
for(int i = 4;i<tokens.length;i++){
String []state_info = tokens[i].split(":");
String state = "";
state += ","+state_info[0].charAt(0)+","+state_info[0].charAt(1)+","+state_info[0].charAt(2)+","+state_info[1];
mos.write("pkgproc",new Text(tokens[1]), new Text(proc_info+state+','+uuid+','+datetime));
}
}else if(tokens[0].equals("pkgpss")){
String proc_info = "";
proc_info += period+","+String.valueOf(elapsed)+","+tokens[2]+","+tokens[3];
for(int i = 4;i<tokens.length;i++){
String []state_info = tokens[i].split(":");
String state = "";
state += ","+state_info[0].charAt(0)+","+state_info[0].charAt(1)+","+state_info[0].charAt(2)+","+state_info[1]+","+state_info[2]+","+state_info[3]+","+state_info[4]+","+state_info[5]+","+state_info[6]+","+state_info[7];
mos.write("pkgpss",new Text(tokens[1]), new Text(proc_info+state+','+uuid+','+datetime));
}
}else if(tokens[0].startsWith("pkgsvc")){
String []stateName = tokens[0].split("-");
String proc_info = "";
//tokens[2] = uid, tokens[3] = serviceName
proc_info += stateName[1]+','+period+","+String.valueOf(elapsed)+","+tokens[2]+","+tokens[3];
String opcount = tokens[4];
for(int i = 5;i<tokens.length;i++){
String []state_info = tokens[i].split(":");
String state = "";
state += ","+state_info[0].charAt(0)+","+state_info[0].charAt(1)+","+state_info[1];
mos.write("pkgsvc",new Text(tokens[1]), new Text(proc_info+state+','+opcount+','+uuid+','+datetime));
}
}
}
}
}
My CombineFileInputFormat, which overrides isSplitable and returns false
public class CombineSmallfileInputFormat extends CombineFileInputFormat<NullWritable, Text> {
@Override
public RecordReader<NullWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException {
return new CombineFileRecordReader<NullWritable,Text>((CombineFileSplit) split,context,WholeFileRecordReader.class);
}
@Override
protected boolean isSplitable(JobContext context,Path file ){
return false;
}
}
The WholeFileRecordReader
public class WholeFileRecordReader extends RecordReader<NullWritable, Text> {
//private static final Logger LOG = Logger.getLogger(WholeFileRecordReader.class);
/** The path to the file to read. */
private final Path mFileToRead;
/** The length of this file. */
private final long mFileLength;
/** The Configuration. */
private final Configuration mConf;
/** Whether this FileSplit has been processed. */
private boolean mProcessed;
/** Single Text to store the file name of the current file. */
// private final Text mFileName;
/** Single Text to store the value of this file (the value) when it is read. */
private final Text mFileText;
/**
* Implementation detail: This constructor is built to be called via
* reflection from within CombineFileRecordReader.
*
* @param fileSplit The CombineFileSplit that this will read from.
* @param context The context for this task.
* @param pathToProcess The path index from the CombineFileSplit to process in this record.
*/
public WholeFileRecordReader(CombineFileSplit fileSplit, TaskAttemptContext context,
Integer pathToProcess) {
mProcessed = false;
mFileToRead = fileSplit.getPath(pathToProcess);
mFileLength = fileSplit.getLength(pathToProcess);
mConf = context.getConfiguration();
context.getConfiguration().set("map.input.file.name", mFileToRead.getName());
assert 0 == fileSplit.getOffset(pathToProcess);
//if (LOG.isDebugEnabled()) {
//LOG.debug("FileToRead is: " + mFileToRead.toString());
//LOG.debug("Processing path " + pathToProcess + " out of " + fileSplit.getNumPaths());
//try {
//FileSystem fs = FileSystem.get(mConf);
//assert fs.getFileStatus(mFileToRead).getLen() == mFileLength;
//} catch (IOException ioe) {
//// oh well, I was just testing.
//}
//}
//mFileName = new Text();
mFileText = new Text();
}
/** {@inheritDoc} */
@Override
public void close() throws IOException {
mFileText.clear();
}
/**
* Returns the absolute path to the current file.
*
* @return The absolute path to the current file.
* @throws IOException never.
* @throws InterruptedException never.
*/
@Override
public NullWritable getCurrentKey() throws IOException, InterruptedException {
return NullWritable.get();
}
/**
* <p>Returns the current value. If the file has been read with a call to nextKeyValue(),
* this returns the contents of the file as Text. Otherwise, it returns an
* empty Text.</p>
*
* <p>Throws an IllegalStateException if initialize() is not called first.</p>
*
* @return A Text containing the contents of the file to read.
* @throws IOException never.
* @throws InterruptedException never.
*/
@Override
public Text getCurrentValue() throws IOException, InterruptedException {
return mFileText;
}
/**
* Returns whether the file has been processed or not. Since only one record
* will be generated for a file, progress will be 0.0 if it has not been processed,
* and 1.0 if it has.
*
* @return 0.0 if the file has not been processed. 1.0 if it has.
* @throws IOException never.
* @throws InterruptedException never.
*/
@Override
public float getProgress() throws IOException, InterruptedException {
return (mProcessed) ? (float) 1.0 : (float) 0.0;
}
/**
* All of the internal state is already set on instantiation. This is a no-op.
*
* @param split The InputSplit to read. Unused.
* @param context The context for this task. Unused.
* @throws IOException never.
* @throws InterruptedException never.
*/
@Override
public void initialize(InputSplit split, TaskAttemptContext context)
throws IOException, InterruptedException {
// no-op.
}
/**
* <p>If the file has not already been read, this reads it into memory, so that a call
* to getCurrentValue() will return the entire contents of this file as Text,
* and getCurrentKey() will return the qualified path to this file as Text. Then, returns
* true. If it has already been read, then returns false without updating any internal state.</p>
*
* @return Whether the file was read or not.
* @throws IOException if there is an error reading the file.
* @throws InterruptedException if there is an error.
*/
@Override
public boolean nextKeyValue() throws IOException, InterruptedException {
if (!mProcessed) {
if (mFileLength > (long) Integer.MAX_VALUE) {
throw new IOException("File is longer than Integer.MAX_VALUE.");
}
byte[] contents = new byte[(int) mFileLength];
FileSystem fs = mFileToRead.getFileSystem(mConf);
FSDataInputStream in = null;
try {
// Set the contents of this file.
in = fs.open(mFileToRead);
IOUtils.readFully(in, contents, 0, contents.length);
mFileText.set(contents, 0, contents.length);
} finally {
IOUtils.closeQuietly(in);
}
mProcessed = true;
return true;
}
return false;
}
}
I want every mapper to parse multiple small files, and no individual small file should be split.
However, the above code cuts (splits) my input files and raises a parsing error (since my parser splits each line into tokens).
My understanding is that CombineFileInputFormat gathers multiple files into one split, and each split feeds into one mapper, so one mapper can handle multiple files.
In my code, the maximum input split size is set to 25 MB, so I think the problem is that CombineFileInputFormat splits the last part of a small file in order to satisfy the split size limit.
However, I have overridden isSplitable to return false, yet it still splits the small files.
What is the correct way to do this?
Is it possible to specify the number of files per mapper, rather than an input split size?
Use the setMaxSplitSize() method in your constructor; it should work.
It effectively tells Hadoop the maximum split size:
public class CFInputFormat extends CombineFileInputFormat<FileLineWritable, Text> {
public CFInputFormat(){
super();
setMaxSplitSize(67108864); // 64 MB, default block size on hadoop
}
public RecordReader<FileLineWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException{
return new CombineFileRecordReader<FileLineWritable, Text>((CombineFileSplit)split, context, CFRecordReader.class);
}
@Override
protected boolean isSplitable(JobContext context, Path file){
return false;
}
}
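Adapted to the classes in the question, a minimal sketch (reusing the WholeFileRecordReader and NullWritable key shown above, and the 25 MB limit already used in the driver) might look like this:
public class CombineSmallfileInputFormat extends CombineFileInputFormat<NullWritable, Text> {
    public CombineSmallfileInputFormat() {
        super();
        // Enforce the maximum combined split size in the input format itself,
        // instead of relying only on the split-size keys set in the driver.
        setMaxSplitSize(26214400); // 25 MB, matching the driver configuration
    }
    @Override
    public RecordReader<NullWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException {
        // One WholeFileRecordReader per file in the combined split, so each
        // small file reaches the mapper as a single record.
        return new CombineFileRecordReader<NullWritable, Text>((CombineFileSplit) split, context, WholeFileRecordReader.class);
    }
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // Never split an individual file across splits.
        return false;
    }
}
The idea is that setMaxSplitSize() is honored directly by CombineFileInputFormat when it packs files into combined splits, whereas the mapred.min/max.split.size keys set in the driver are not necessarily the properties this input format consults, depending on the Hadoop version.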

Resources