Trying to optimise code and increase performance by reading the same text file once and using it in different methods in Java - java-7

I am trying to reduce code and increase performance by reading the same text file once instead of in each of several different methods in Java.
Here is sample code where the text file is read in every method, based on its requirement:
class Example {
    public static void main(String[] args) {
        method1();
        method2();
        method3();
        // ...
    }

    static void method1() {
        BufferedReader reader = new BufferedReader(new FileReader("file.txt"));
        // ...
    }

    static void method2() {
        BufferedReader reader = new BufferedReader(new FileReader("file.txt"));
        // ...
    }

    static void method3() {
        BufferedReader reader = new BufferedReader(new FileReader("file.txt"));
        // ...
    }
}
What I want to know is: is there a way to read the text file once in one method and use its contents in the different methods in Java?

If the content of the file is immutable, you can:
store its content, line by line, in a dedicated method;
call this method from the constructor;
store the returned data in a List field of the class;
and refer to this field from the other methods, method1(), method2() and method3().
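A minimal sketch of this approach (the class name, field name and file name are placeholders). The file is read exactly once, from the constructor, and every other method works on the cached lines:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class FileProcessor {
    private final List<String> lines = new ArrayList<String>();

    FileProcessor(String fileName) throws IOException {
        readFile(fileName); // read the file exactly once
    }

    private void readFile(String fileName) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(fileName));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            reader.close(); // Java 7 style: close in finally
        }
    }

    void method1() {
        // work with the cached lines instead of re-reading the file
        for (String line : lines) {
            // ...
        }
    }

    void method2() {
        // same here: iterate over 'lines'
    }
}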

Related

How does the POI Event API read data from Excel and why does it use less RAM?

I am currently writing my bachelor thesis and I am using the POI Event API from Apache. In short, my work is about a more efficient way to read data from Excel.
Developers keep asking me how exactly the Event API is meant to work. Unfortunately, I can't find anything about the basic principle on the Apache page.
The following code shows how I use the POI Event API (it is taken from the Apache example for XSSF and SAX):
import java.io.InputStream;
import java.util.Iterator;

import org.apache.poi.ooxml.util.SAXHelper;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.ParserConfigurationException;

public class ExampleEventUserModel {

    public void processOneSheet(String filename) throws Exception {
        OPCPackage pkg = OPCPackage.open(filename);
        XSSFReader r = new XSSFReader(pkg);
        SharedStringsTable sst = r.getSharedStringsTable();
        XMLReader parser = fetchSheetParser(sst);

        // To look up the Sheet Name / Sheet Order / rID,
        // you need to process the core Workbook stream.
        // Normally it's of the form rId# or rSheet#
        InputStream sheet2 = r.getSheet("rId2");
        InputSource sheetSource = new InputSource(sheet2);
        parser.parse(sheetSource);
        sheet2.close();
    }

    public void processAllSheets(String filename) throws Exception {
        OPCPackage pkg = OPCPackage.open(filename);
        XSSFReader r = new XSSFReader(pkg);
        SharedStringsTable sst = r.getSharedStringsTable();
        XMLReader parser = fetchSheetParser(sst);

        Iterator<InputStream> sheets = r.getSheetsData();
        while (sheets.hasNext()) {
            System.out.println("Processing new sheet:\n");
            InputStream sheet = sheets.next();
            InputSource sheetSource = new InputSource(sheet);
            parser.parse(sheetSource);
            sheet.close();
            System.out.println("");
        }
    }

    public XMLReader fetchSheetParser(SharedStringsTable sst)
            throws SAXException, ParserConfigurationException {
        XMLReader parser = SAXHelper.newXMLReader();
        ContentHandler handler = new SheetHandler(sst);
        parser.setContentHandler(handler);
        return parser;
    }

    /**
     * See org.xml.sax.helpers.DefaultHandler javadocs
     */
    private static class SheetHandler extends DefaultHandler {
        private SharedStringsTable sst;
        private String lastContents;
        private boolean nextIsString;

        private SheetHandler(SharedStringsTable sst) {
            this.sst = sst;
        }

        public void startElement(String uri, String localName, String name,
                Attributes attributes) throws SAXException {
            // c => cell
            if (name.equals("c")) {
                // Print the cell reference
                System.out.print(attributes.getValue("r") + " - ");
                // Figure out if the value is an index in the SST
                String cellType = attributes.getValue("t");
                if (cellType != null && cellType.equals("s")) {
                    nextIsString = true;
                } else {
                    nextIsString = false;
                }
            }
            // Clear contents cache
            lastContents = "";
        }

        public void endElement(String uri, String localName, String name)
                throws SAXException {
            // Process the last contents as required.
            // Do it now, as characters() may be called more than once
            if (nextIsString) {
                int idx = Integer.parseInt(lastContents);
                lastContents = sst.getItemAt(idx).getString();
                nextIsString = false;
            }
            // v => contents of a cell
            // Output after we've seen the string contents
            if (name.equals("v")) {
                System.out.println(lastContents);
            }
        }

        public void characters(char[] ch, int start, int length) {
            lastContents += new String(ch, start, length);
        }
    }

    public static void main(String[] args) throws Exception {
        ExampleEventUserModel example = new ExampleEventUserModel();
        example.processOneSheet(args[0]);
        example.processAllSheets(args[0]);
    }
}
Can someone please explain to me how the Event API works? Is it the same thing as an event-based architecture, or is it something else?
A *.xlsx file, which is Excel stored in Office Open XML and is what apache poi handles as XSSF, is a ZIP archive containing the data in XML files within a directory structure. So we can unzip the *.xlsx file and get the data directly from the XML files.
There is /xl/sharedStrings.xml having all the string cell values in it. And there is /xl/workbook.xml describing the workbook structure. And there are /xl/worksheets/sheet1.xml, /xl/worksheets/sheet2.xml, ... which are storing the sheets' data. And there is /xl/styles.xml having the style settings for all cells in the sheets.
By default, when creating an XSSFWorkbook, all those parts of the *.xlsx file become object representations such as XSSFWorkbook, XSSFSheet, XSSFRow, XSSFCell, ... and further objects of org.apache.poi.xssf.*.* in memory.
To get an impression of how memory-consuming XSSFSheet, XSSFRow and XSSFCell are, a look into the sources is instructive. Each of those objects contains multiple Lists and Maps as internal members, and of course multiple methods too. Now imagine a sheet having hundreds of thousands of rows, each containing up to hundreds of cells. Each of those rows and cells will be represented by an XSSFRow or an XSSFCell in memory. This is not an accusation against apache poi, because those objects are necessary if one needs to work with them. But if the need is really only getting the content out of the Excel sheet, then those objects are not all necessary. That's why there is the XSSF and SAX (Event API) approach.
So if the need is only reading data from sheets, one can simply parse the XML of all the /xl/worksheets/sheet[n].xml files without creating memory-consuming objects for each sheet, each row and each cell in those sheets.
Parsing XML in event-based mode means that the code goes top-down through the XML and has callback methods defined which get called when it detects the start of an element, the end of an element, or character content within an element. The appropriate callback methods then handle what to do at the start or end of an element, or with its character content. So reading the XML file only means running top-down through the file once, handling the events, and getting all the needed content out of it. Memory consumption is thereby reduced to storing the text data taken from the XML.
The XSSF and SAX (Event API) approach uses the class SheetHandler, which extends DefaultHandler, for this.
But if we are already at the level where we get at the underlying XML data and process it, then we could go one step further. Plain Java is able to handle ZIP archives and parse XML, so we would not even need additional libraries at all. See how read excel file having more than 100000 row in java? where I have shown this. My code uses the package javax.xml.stream, which also provides event-based parsing via XMLEventReader, but with linear code instead of callbacks. Maybe that code is simpler to understand because it is all in one place.
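As a minimal sketch of that idea (the file name test.xlsx and the fixed entry path xl/worksheets/sheet1.xml are assumptions for illustration), plain java.util.zip plus javax.xml.stream is enough to walk the cell elements of a sheet. Note that string cell values would additionally need to be looked up in xl/sharedStrings.xml:

import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class PlainXlsxSheetReader {
    public static void main(String[] args) throws Exception {
        ZipFile zip = new ZipFile("test.xlsx");
        ZipEntry sheetEntry = zip.getEntry("xl/worksheets/sheet1.xml");
        InputStream in = zip.getInputStream(sheetEntry);
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            // react to the start of each c (cell) element
            if (event.isStartElement()) {
                StartElement start = event.asStartElement();
                if ("c".equals(start.getName().getLocalPart())) {
                    // print the cell reference attribute, e.g. r="A1"
                    System.out.println(start.getAttributeByName(new QName("r")));
                }
            }
        }
        reader.close();
        zip.close();
    }
}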
For detecting whether a number format is a date format, and thus whether the formatted cell contains a date / time value, one single apache poi class, org.apache.poi.ss.usermodel.DateUtil, is used. This is done to simplify the code. Of course, even this class we could have coded ourselves.
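For example, a tiny sketch of using that class (the format index 14 and format string "m/d/yy" are just the values of one built-in Excel date format):

import org.apache.poi.ss.usermodel.DateUtil;

public class DateFormatCheck {
    public static void main(String[] args) {
        int formatIndex = 14;          // built-in Excel date format
        String formatString = "m/d/yy";
        if (DateUtil.isADateFormat(formatIndex, formatString)) {
            // the numeric cell value should be interpreted as a date/time
            System.out.println("This number format is a date format.");
        }
    }
}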

Find Error When Using Codec

I have a simple object named Tag that has an id, a name, and three numeric properties. I also have a codec for the object. The following code executes without error.
MongoDatabase tagsDatabase =
        usersProcess.getMongoClient().getDatabase(tagsDB)
                    .withCodecRegistry(usersProcess.getCodecRegistry());
MongoCollection<Tag> tagsCollection =
        tagsDatabase.getCollection(tagsCollectionName, Tag.class);
ArrayList<Tag> tagsList = new ArrayList<Tag>();
FindIterable<Tag> tagsByAlpha =
        tagsCollection.find().sort(Sorts.ascending("name"));
Following this, the code
tagsByAlpha.forEach(new Consumer<Tag>() {
    @Override
    public void accept(Tag t) {
        tagsList.add(t);
    }
});
throws the exception "org.bson.BsonInvalidOperationException: readEndArray can only be called when ContextType is ARRAY, not when ContextType is DOCUMENT" at the first line (forEach). An alternative construct
MongoCursor<Tag> tagsCursor = tagsByAlpha.iterator();
throws the same exception. It seems to imply that find() has returned Documents rather than Tag objects. At the same time, the code that does work suggests that what I'm trying is possible. What am I doing wrong?
It turned out that I should have used org.bson.codecs.DoubleCodec for the numeric properties.
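A minimal sketch of how the registry could be assembled, assuming the 3.7+ driver and a hand-written TagCodec for the Tag class (the name TagCodec is hypothetical); the point is that DoubleCodec has to be available when the numeric fields are decoded:

import org.bson.codecs.DoubleCodec;
import org.bson.codecs.configuration.CodecRegistries;
import org.bson.codecs.configuration.CodecRegistry;
import com.mongodb.MongoClientSettings;

// TagCodec is a hypothetical Codec<Tag> implementation
CodecRegistry registry = CodecRegistries.fromRegistries(
        MongoClientSettings.getDefaultCodecRegistry(),
        CodecRegistries.fromCodecs(new DoubleCodec(), new TagCodec()));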

New Output file for each Item passed into FlatFileItemWriter

I have the following domain object. This is the object being passed from my processor to my writer.
public class DivisionIdPromoCompStartDtEndDtGrouping {
    private int divisionId;
    private Date rpmPromoCompDetailStartDate;
    private Date rpmPromoCompDetailEndDate;
    private List<MasterList> detailRecords = new ArrayList<MasterList>();
    // getters and setters omitted
}
I would like a new file per DivisionIdPromoCompStartDtEndDtGrouping. Each file would have a line for each of the detailRecords in the list. The output files would all have the same format, just logically separated based on data (divisionId, rpmPromoCompDetailStartDate and rpmPromoCompDetailEndDate).
How can I create a FlatFileItemWriter that outputs a new file for each DivisionIdPromoCompStartDtEndDtGrouping with the content of its detailRecords?
I think the answer might be a CompositeItemWriter. Is that right? Could someone help me with an example of this?
Thanks in advance.
You're close. Instead of just a CompositeItemWriter, use a ClassifierCompositeItemWriter. This, coupled with a Classifier implementation that chooses a writer per grouping, will allow you to have one file per group. You can read more about this ItemWriter in the javadoc here: http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/item/support/ClassifierCompositeItemWriter.html
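A minimal sketch of that wiring, assuming standard getters on the grouping class and a hypothetical factory writerForGroup(...) that builds and opens a FlatFileItemWriter with a unique file name per group. One caveat, which the next answer touches on: the writers returned by the classifier are not opened and closed by the step automatically, so they have to be registered as streams or managed manually:

import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.ClassifierCompositeItemWriter;
import org.springframework.classify.Classifier;

ClassifierCompositeItemWriter<DivisionIdPromoCompStartDtEndDtGrouping> writer =
        new ClassifierCompositeItemWriter<DivisionIdPromoCompStartDtEndDtGrouping>();
writer.setClassifier(
        new Classifier<DivisionIdPromoCompStartDtEndDtGrouping,
                ItemWriter<? super DivisionIdPromoCompStartDtEndDtGrouping>>() {
            @Override
            public ItemWriter<? super DivisionIdPromoCompStartDtEndDtGrouping> classify(
                    DivisionIdPromoCompStartDtEndDtGrouping item) {
                // writerForGroup is a hypothetical factory keyed on the
                // grouping fields; it must return an opened writer
                return writerForGroup(item.getDivisionId(),
                        item.getRpmPromoCompDetailStartDate(),
                        item.getRpmPromoCompDetailEndDate());
            }
        });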
No, the answer is not a composite writer. A composite writer simply forwards all items it receives to all defined child writers.
The problem with FlatFileItemWriter is that you have to open and close it, which is normally handled by the framework itself.
A simple approach would be to implement your own writer and use a FlatFileItemWriter inside its write method.
public class MyWriter implements ItemWriter<..> {

    public void write(List<..> items) {
        for (.. item : items) {
            FlatFileItemWriter fileWriter = new FlatFileItemWriter();
            fileWriter.setResource(...); // unique file name per item
            fileWriter.setLineAggregator(...);
            fileWriter....; // do other settings if necessary
            fileWriter.afterPropertiesSet();
            fileWriter.open(new ExecutionContext());
            fileWriter.write(Collections.singletonList(item));
            fileWriter.close();
        }
    }
}
The LineAggregator has to create an appropriate String, including all the line breaks, so that every detail record is written on its own line in the file; see the sketch below.
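A minimal sketch of such a LineAggregator, assuming a getter for detailRecords and a hypothetical format(...) helper that turns one detail record into one output line:

import org.springframework.batch.item.file.transform.LineAggregator;

public class DetailRecordsLineAggregator
        implements LineAggregator<DivisionIdPromoCompStartDtEndDtGrouping> {

    @Override
    public String aggregate(DivisionIdPromoCompStartDtEndDtGrouping item) {
        StringBuilder sb = new StringBuilder();
        for (MasterList detail : item.getDetailRecords()) {
            if (sb.length() > 0) {
                sb.append(System.lineSeparator());
            }
            sb.append(format(detail)); // one line per detail record
        }
        return sb.toString();
    }

    private String format(MasterList detail) {
        // hypothetical: the real field layout depends on the required format
        return String.valueOf(detail);
    }
}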
Of course, you don't have to use a FlatFileItemWriter at all: you could simply open a file yourself, use the LineAggregator to create each line, and write the line to the file.

How to close a Java Formatter: in finally or not?

I know that normally streams and formatters (particularly java.util.Formatter) in Java should be closed in a finally block to avoid resource leaks. But here I am a little bit confused, because I see a lot of examples where people just close them without any finally block, especially the formatters. This question may seem pointless to some people, but I want to be sure about what I am asking.
Here are some examples from java2s.com and tutorialspoint.com where the formatters are just closed without any finally block.
Please note that my question is only about Java 6 and lower versions, because I know about try-with-resources.
Example:
public static void main(String[] args) {
    StringBuffer buffer = new StringBuffer();
    Formatter formatter = new Formatter(buffer, Locale.US);

    // format a new string
    String name = "from java2s.com";
    formatter.format("Hello %s !", name);

    // print the formatted string
    System.out.println(formatter);

    // close the formatter
    formatter.close();

    // attempting to access the formatter now results in an exception
    System.out.println(formatter);
}
In this specific example, it is not necessary to call close(). You only need to close the formatter if the underlying appendable is Closeable. In this case you are using a StringBuffer, which is not Closeable, so the call to close() does nothing. If you were to use a Writer or a PrintStream, those are Closeable, and the call to close() would be necessary to avoid leaving the stream open.
If you are ever unsure whether it is Closeable, it is best to just call close() anyway. There is no harm in doing so.
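For contrast, here is a minimal sketch with a Closeable destination (the FileWriter and the file name are just for illustration). Closing the Formatter also closes the underlying writer, so on Java 6 the close() belongs in a finally block:

Formatter formatter = null;
try {
    formatter = new Formatter(new FileWriter("out.txt"));
    formatter.format("Hello %s !", "world");
} catch (IOException e) {
    e.printStackTrace();
} finally {
    if (formatter != null) {
        formatter.close(); // also closes the FileWriter
    }
}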
How about this, without further comments:
public static void main(String[] args) {
    StringBuffer buffer = new StringBuffer();
    Formatter formatter = null;
    try {
        formatter = new Formatter(buffer, Locale.US);
        String name = "from java2s.com";
        formatter.format("Hello %s !", name);
        System.out.println(formatter);
    } finally {
        if (formatter != null) {
            formatter.close();
        }
    }
}

Modify file using Files.lines

I'd like to read in a file and replace some text with new text. It would be simple using asm and int 21h, but I want to use the new Java 8 streams.
Files.write(outf.toPath(),
        (Iterable<String>) Files.lines(inf)::iterator,
        CREATE, WRITE, TRUNCATE_EXISTING);
Somewhere in there I'd like a lines.replace("/*replace me*/", "new Code()\n"). The newlines are there because I want to test inserting a block of code somewhere.
Here's a toy example that doesn't work the way I want it to, but compiles. I just need a way to intercept the lines from the iterator and replace certain phrases with code blocks.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import static java.nio.file.StandardOpenOption.*;
import java.util.Arrays;
import java.util.stream.Stream;

public class FileStreamTest {

    public static void main(String[] args) {
        String[] ss = new String[]{"hi", "pls", "help", "me"};
        Stream<String> stream = Arrays.stream(ss);

        try {
            Files.write(Paths.get("tmp.txt"),
                    (Iterable<String>) stream::iterator,
                    CREATE, WRITE, TRUNCATE_EXISTING);
        } catch (IOException ex) {}

        //// I'd like to hook this next part into the Files.write part. ////

        // reset the stream
        stream = Arrays.stream(ss);
        Iterable<String> it = stream::iterator;

        // I'd like to replace some text before writing to the file
        for (String s : it) {
            System.out.println(s.replace("me", "my\nreal\nname"));
        }
    }
}
Edit: I've gotten this far and it works. I was trying it with filter, and maybe that isn't really necessary.
Files.write(Paths.get("tmp.txt"),
        (Iterable<String>) (stream.map(s -> s.replace("me", "my\nreal\nname")))::iterator,
        CREATE, WRITE, TRUNCATE_EXISTING);
The Files.write(..., Iterable, ...) method seems tempting here, but converting the Stream to an Iterable makes this cumbersome. It also "pulls" from the Iterable, which is a bit odd. It would make more sense if the file-writing method could be used as the stream's terminal operation, within something like forEach.
Unfortunately, most things that write throw IOException, which isn't permitted by the Consumer functional interface that forEach expects. But PrintWriter is different. At least, its writing methods don't throw checked exceptions, although opening one can still throw IOException. Here's how it could be used.
Stream<String> stream = ... ;

try (PrintWriter pw = new PrintWriter("output.txt", "UTF-8")) {
    stream.map(s -> s.replaceAll("foo", "bar"))
          .forEachOrdered(pw::println);
}
Note the use of forEachOrdered, which prints the output lines in the same order in which they were read, which is presumably what you want!
If you're reading lines from an input file, modifying them, and then writing them to an output file, it would be reasonable to put both files within the same try-with-resources statement:
try (Stream<String> input = Files.lines(Paths.get("input.txt"));
     PrintWriter output = new PrintWriter("output.txt", "UTF-8"))
{
    input.map(s -> s.replaceAll("foo", "bar"))
         .forEachOrdered(output::println);
}
