Modify file using Files.lines - java-8

I'd like to read in a file and replace some text with new text. It would be simple using asm and int 21h, but I want to use the new Java 8 streams.
Files.write(outf.toPath(),
(Iterable<String>)Files.lines(inf)::iterator,
CREATE, WRITE, TRUNCATE_EXISTING);
Somewhere in there I'd like to apply lines.replace("/*replace me*/","new Code()\n");. The newline is there because I want to test inserting a block of code somewhere.
Here's a toy example that compiles but doesn't work the way I want it to. I just need a way to intercept the lines from the iterator and replace certain phrases with code blocks.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import static java.nio.file.StandardOpenOption.*;
import java.util.Arrays;
import java.util.stream.Stream;
public class FileStreamTest {
public static void main(String[] args) {
String[] ss = new String[]{"hi","pls","help","me"};
Stream<String> stream = Arrays.stream(ss);
try {
Files.write(Paths.get("tmp.txt"),
(Iterable<String>)stream::iterator,
CREATE, WRITE, TRUNCATE_EXISTING);
} catch (IOException ex) { ex.printStackTrace(); }
//// I'd like to hook this next part into Files.write part./////
//reset stream
stream = Arrays.stream(ss);
Iterable<String> it = stream::iterator;
//I'd like to replace some text before writing to the file
for (String s : it){
System.out.println(s.replace("me", "my\nreal\nname"));
}
}
}
Edit: I've gotten this far and it works. I was trying to use filter, but maybe it isn't really necessary.
Files.write(Paths.get("tmp.txt"),
(Iterable<String>) stream.map(s -> s.replace("me", "my\nreal\nname"))::iterator,
CREATE, WRITE, TRUNCATE_EXISTING);

The Files.write(..., Iterable, ...) method seems tempting here, but converting the Stream to an Iterable makes this cumbersome. It also "pulls" from the Iterable, which is a bit odd. It would make more sense if the file-writing method could be used as the stream's terminal operation, within something like forEach.
Unfortunately, most things that write throw IOException, which isn't permitted by the Consumer functional interface that forEach expects. But PrintWriter is different. At least, its writing methods don't throw checked exceptions, although opening one can still throw IOException. Here's how it could be used.
Stream<String> stream = ... ;
try (PrintWriter pw = new PrintWriter("output.txt", "UTF-8")) {
stream.map(s -> s.replaceAll("foo", "bar"))
.forEachOrdered(pw::println);
}
Note the use of forEachOrdered, which writes the output lines in the same order in which they were read; that is presumably what you want!
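If you'd rather use a writer whose methods do throw IOException (for example a BufferedWriter), one common workaround for the Consumer restriction described above is to wrap the checked exception in an UncheckedIOException inside the lambda and unwrap it again outside the stream. A minimal sketch; the names UncheckedWriteSketch and writeLines are hypothetical:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.stream.Stream;

class UncheckedWriteSketch {

    // Writes each line followed by the platform line separator.
    // Consumer.accept cannot throw checked exceptions, so IOExceptions are
    // wrapped inside the lambda and unwrapped again after the stream finishes.
    static void writeLines(Stream<String> lines, Writer out) throws IOException {
        BufferedWriter bw = new BufferedWriter(out);
        try {
            lines.forEachOrdered(line -> {
                try {
                    bw.write(line);
                    bw.newLine();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (UncheckedIOException e) {
            throw e.getCause(); // back to the original checked IOException
        }
        bw.flush(); // the caller owns 'out', so flush but don't close
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        writeLines(Stream.of("foo one", "two foo").map(s -> s.replaceAll("foo", "bar")), sw);
        System.out.print(sw);
    }
}
```

The trade-off compared with PrintWriter is more boilerplate in exchange for errors that surface as real exceptions instead of a flag you have to check.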
If you're reading lines from an input file, modifying them, and then writing them to an output file, it would be reasonable to put both files within the same try-with-resources statement:
try (Stream<String> input = Files.lines(Paths.get("input.txt"));
PrintWriter output = new PrintWriter("output.txt", "UTF-8"))
{
input.map(s -> s.replaceAll("foo", "bar"))
.forEachOrdered(output::println);
}
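Applied to the original /*replace me*/ case, a self-contained sketch of that read-modify-write loop might look like this. The names ReplaceInFile and replaceInLines are hypothetical; it uses String.replace (literal text, not a regex like replaceAll), and because PrintWriter silently swallows write errors, it calls checkError() before the writer is closed.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class ReplaceInFile {

    // Reads input line by line, replaces every literal occurrence of target,
    // and writes the lines to output in their original order.
    static void replaceInLines(Path input, Path output, String target, String replacement)
            throws IOException {
        try (Stream<String> lines = Files.lines(input, StandardCharsets.UTF_8);
             PrintWriter out = new PrintWriter(
                     Files.newBufferedWriter(output, StandardCharsets.UTF_8))) {
            lines.map(s -> s.replace(target, replacement))
                 .forEachOrdered(out::println);
            // PrintWriter never throws IOException; checkError() flushes and reports failures.
            if (out.checkError()) {
                throw new IOException("error while writing " + output);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // hypothetical file names
        replaceInLines(Paths.get("input.txt"), Paths.get("output.txt"),
                "/*replace me*/", "new Code()\n");
    }
}
```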

Related

How does the POI Event API read data from Excel and why does it use less RAM?

I am currently writing my bachelor thesis and I am using the POI Event API from Apache. In short, my work is about a more efficient way to read data from Excel.
Developers ask me again and again what exactly is meant by the Event API. Unfortunately, I can't find anything on the Apache page about the basic principle.
Here is how I use the POI Event API (this is from the Apache example for XSSF and SAX):
import java.io.InputStream;
import java.util.Iterator;
import org.apache.poi.ooxml.util.SAXHelper;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.ParserConfigurationException;
public class ExampleEventUserModel {
public void processOneSheet(String filename) throws Exception {
OPCPackage pkg = OPCPackage.open(filename);
XSSFReader r = new XSSFReader( pkg );
SharedStringsTable sst = r.getSharedStringsTable();
XMLReader parser = fetchSheetParser(sst);
// To look up the Sheet Name / Sheet Order / rID,
// you need to process the core Workbook stream.
// Normally it's of the form rId# or rSheet#
InputStream sheet2 = r.getSheet("rId2");
InputSource sheetSource = new InputSource(sheet2);
parser.parse(sheetSource);
sheet2.close();
}
public void processAllSheets(String filename) throws Exception {
OPCPackage pkg = OPCPackage.open(filename);
XSSFReader r = new XSSFReader( pkg );
SharedStringsTable sst = r.getSharedStringsTable();
XMLReader parser = fetchSheetParser(sst);
Iterator<InputStream> sheets = r.getSheetsData();
while(sheets.hasNext()) {
System.out.println("Processing new sheet:\n");
InputStream sheet = sheets.next();
InputSource sheetSource = new InputSource(sheet);
parser.parse(sheetSource);
sheet.close();
System.out.println("");
}
}
public XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException, ParserConfigurationException {
XMLReader parser = SAXHelper.newXMLReader();
ContentHandler handler = new SheetHandler(sst);
parser.setContentHandler(handler);
return parser;
}
/**
* See org.xml.sax.helpers.DefaultHandler javadocs
*/
private static class SheetHandler extends DefaultHandler {
private SharedStringsTable sst;
private String lastContents;
private boolean nextIsString;
private SheetHandler(SharedStringsTable sst) {
this.sst = sst;
}
public void startElement(String uri, String localName, String name,
Attributes attributes) throws SAXException {
// c => cell
if(name.equals("c")) {
// Print the cell reference
System.out.print(attributes.getValue("r") + " - ");
// Figure out if the value is an index in the SST
String cellType = attributes.getValue("t");
if(cellType != null && cellType.equals("s")) {
nextIsString = true;
} else {
nextIsString = false;
}
}
// Clear contents cache
lastContents = "";
}
public void endElement(String uri, String localName, String name)
throws SAXException {
// Process the last contents as required.
// Do now, as characters() may be called more than once
if(nextIsString) {
int idx = Integer.parseInt(lastContents);
lastContents = sst.getItemAt(idx).getString();
nextIsString = false;
}
// v => contents of a cell
// Output after we've seen the string contents
if(name.equals("v")) {
System.out.println(lastContents);
}
}
public void characters(char[] ch, int start, int length) {
lastContents += new String(ch, start, length);
}
}
public static void main(String[] args) throws Exception {
ExampleEventUserModel example = new ExampleEventUserModel();
example.processOneSheet(args[0]);
example.processAllSheets(args[0]);
}
}
Can someone please explain to me how the Event API works? Is it the same as the event-based architecture or is it something else?
A *.xlsx file, which is Excel stored in the Office Open XML format and is what Apache POI handles as XSSF, is a ZIP archive containing the data in XML files within a directory structure. So we can unzip the *.xlsx file and get the data directly from the XML files.
There is /xl/sharedStrings.xml, which holds all the string cell values. There is /xl/workbook.xml, which describes the workbook structure. There are /xl/worksheets/sheet1.xml, /xl/worksheets/sheet2.xml, ..., which store the sheets' data. And there is /xl/styles.xml, which holds the style settings for all cells in the sheets.
By default, when an XSSFWorkbook is created, all those parts of the *.xlsx file become in-memory object representations: XSSFWorkbook, XSSFSheet, XSSFRow, XSSFCell, ... and further objects of org.apache.poi.xssf.*.*.
To get an impression of how memory-consuming XSSFSheet, XSSFRow and XSSFCell are, it helps to look into the sources. Each of those objects contains multiple Lists and Maps as internal members, and of course multiple methods too. Now imagine a sheet with hundreds of thousands of rows, each containing up to hundreds of cells. Each of those rows and cells is represented by an XSSFRow or an XSSFCell in memory. This is not a criticism of Apache POI, because those objects are necessary if you need to work with them. But if the need is really only to get the content out of the Excel sheet, then those objects are not needed. That's the reason for the XSSF and SAX (Event API) approach.
So if the need is only reading data from sheets, one can simply parse the XML of all the /xl/worksheets/sheet[n].xml files without creating memory-consuming objects for each sheet, each row, and each cell in those sheets.
Parsing XML in event-based mode means that the parser goes through the XML from top to bottom and calls callback methods whenever it detects the start of an element, the end of an element, or character content within an element. Those callback methods then decide what to do at the start, at the end, or with the character content of an element. So reading the XML file means running through it only once from top to bottom, handling the events (start, end, character content of an element), and collecting all the needed content along the way. Memory consumption is thus reduced to storing the text data extracted from the XML.
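To illustrate the callback style with nothing but the JDK, here is a sketch that parses a simplified, made-up fragment of worksheet XML with a SAX DefaultHandler and keeps only the text of the <v> (value) elements. Unlike SheetHandler, it does not resolve shared-string indexes; the class names SaxCellValues and ValueHandler are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxCellValues {

    // The parser walks the XML once, top down, and calls these methods;
    // only the text of <v> elements is kept in memory.
    static class ValueHandler extends DefaultHandler {
        final List<String> values = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inValue;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("v")) {
                inValue = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // may be called several times for one element, so accumulate
            if (inValue) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("v")) {
                values.add(text.toString());
                inValue = false;
            }
        }
    }

    static List<String> cellValues(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ValueHandler handler = new ValueHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sheetData><row r=\"1\"><c r=\"A1\"><v>42</v></c>"
                + "<c r=\"B1\" t=\"s\"><v>0</v></c></row></sheetData>";
        System.out.println(cellValues(xml)); // prints [42, 0]
    }
}
```

Note that the "0" in B1 is a shared-string index (t="s"), which is exactly what SheetHandler looks up in the SharedStringsTable.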
The XSSF and SAX (Event API) example uses the class SheetHandler, which extends DefaultHandler, for this.
But if we are already at the level where we get at the underlying XML data and process it ourselves, we could go one step further. Plain Java can handle ZIP archives and parse XML, so we would not need any additional libraries at all. See how read excel file having more than 100000 row in java?, where I have shown this. My code there uses the package javax.xml.stream, which provides XMLEventReader; that is also event based, but it uses linear (pull) code instead of callbacks. Maybe this code is simpler to understand because it is all in one place.
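A rough sketch of that pull-based style, using only the JDK: it finds the xl/worksheets/sheet1.xml entry in a ZIP stream and walks it with XMLEventReader in a plain loop. To stay self-contained it builds a tiny in-memory ZIP instead of opening a real *.xlsx file; the class and helper names are hypothetical, and this is not the code from the linked answer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class StaxZipSketch {

    // Pull parsing: the loop asks for the next event instead of receiving callbacks.
    static List<String> values(InputStream xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(xml);
        List<String> values = new ArrayList<>();
        StringBuilder text = null; // non-null only while inside a <v> element
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals("v")) {
                text = new StringBuilder();
            } else if (event.isCharacters() && text != null) {
                text.append(event.asCharacters().getData());
            } else if (event.isEndElement()
                    && event.asEndElement().getName().getLocalPart().equals("v")) {
                values.add(text.toString());
                text = null;
            }
        }
        reader.close(); // does not close the underlying stream
        return values;
    }

    // An .xlsx file is a ZIP archive, so its parts can be read with ZipInputStream.
    static List<String> valuesFromZip(InputStream zip, String entryName)
            throws IOException, XMLStreamException {
        try (ZipInputStream zis = new ZipInputStream(zip)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    return values(zis); // zis only yields the current entry's bytes
                }
            }
        }
        throw new IOException("entry not found: " + entryName);
    }

    // Builds a tiny in-memory ZIP that mimics one part of an .xlsx file.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("xl/worksheets/sheet1.xml"));
            String xml = "<worksheet><sheetData><row r=\"1\">"
                    + "<c r=\"A1\"><v>1</v></c><c r=\"B1\"><v>2.5</v></c>"
                    + "</row></sheetData></worksheet>";
            zos.write(xml.getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(valuesFromZip(new ByteArrayInputStream(sampleZip()),
                "xl/worksheets/sheet1.xml")); // prints [1, 2.5]
    }
}
```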
To detect whether a number format is a date format, and so whether the formatted cell contains a date/time value, one single Apache POI class, org.apache.poi.ss.usermodel.DateUtil, is used there. This is done to simplify the code. Of course, we could have coded even this class ourselves.
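As an illustration of part of the work DateUtil takes over, here is a sketch of converting an Excel serial date number into a LocalDateTime. It covers only the conversion, not the date-format detection, and it assumes the 1900 date system and serial numbers from 61 (1900-03-01) onward, where using 1899-12-30 as the epoch compensates for Excel's nonexistent 1900-02-29. The class name ExcelSerialDate is hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

class ExcelSerialDate {

    // Integer part = days since the epoch, fractional part = time of day.
    // Assumes the 1900 date system and serial >= 61 (see lead-in).
    static LocalDateTime toDateTime(double serial) {
        if (serial < 61) {
            throw new IllegalArgumentException("sketch only handles serials >= 61");
        }
        long days = (long) Math.floor(serial);
        // fraction of a day -> milliseconds, rounded to the nearest millisecond
        long millis = Math.round((serial - days) * 24 * 60 * 60 * 1000);
        return LocalDate.of(1899, 12, 30).atStartOfDay()
                .plusDays(days)
                .plusNanos(millis * 1_000_000L);
    }

    public static void main(String[] args) {
        System.out.println(toDateTime(45000.5)); // prints 2023-03-15T12:00
    }
}
```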

Trying to optimise the code and increase performance by reading the same text file once for use in different methods in Java

I am trying to reduce the code and increase performance by reading the same text file once and using it in different methods in Java.
Here is sample code that reads the text file in every method that needs it:
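import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

class FileReadingExample {
    public static void main(String[] args) throws IOException {
        method1();
        method2();
        method3();
        // ....
    }
    // Each method currently opens and reads file.txt on its own.
    static void method1() throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader("file.txt"))) {
            // ...
        }
    }
    static void method2() throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader("file.txt"))) {
            // ...
        }
    }
    static void method3() throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader("file.txt"))) {
            // .....
        }
    }
}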
What I want to know is: is there a way to read the text file once in one method and use its content in the other methods in Java?
If the content of the file doesn't change, you can:
- read its content, line by line, in a dedicated method
- call this method from the constructor
- store the returned data in a List field of the class
- refer to this field from the other methods:
  - method1()
  - method2()
  - method3()
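A minimal sketch of that approach, assuming UTF-8 text and a file small enough to hold in memory. The class name FileContent and what method1(), method2() and method3() compute are hypothetical placeholders:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

class FileContent {

    // Read once in the constructor; every method reuses this list.
    private final List<String> lines;

    FileContent(Path file) throws IOException {
        this.lines = Collections.unmodifiableList(readLines(file));
    }

    private static List<String> readLines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    // hypothetical method1: number of lines
    int method1() {
        return lines.size();
    }

    // hypothetical method2: number of lines containing a word
    long method2(String word) {
        return lines.stream().filter(l -> l.contains(word)).count();
    }

    // hypothetical method3: first line, or an empty string
    String method3() {
        return lines.isEmpty() ? "" : lines.get(0);
    }

    public static void main(String[] args) throws IOException {
        FileContent content = new FileContent(Paths.get(args.length > 0 ? args[0] : "file.txt"));
        System.out.println(content.method1() + " lines, first: " + content.method3());
    }
}
```

The list is wrapped with Collections.unmodifiableList so that no method can accidentally change the shared data the others depend on.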

get method to display containing folder, size, and time of last modification Java

I have a program I am writing for a class. I have the first part done, but I need help with the code for the next part: displaying the containing folder, size, and time of last modification.
Here is the challenge:
1. Create a file using any word-processing program or text editor. Write an application that displays the file’s name, containing folder, size, and time of last modification.
Below is my code so far:
import java.nio.file.*;
import java.nio.file.attribute.*;
import java.io.IOException;
import static java.nio.file.AccessMode.*;
public class FileStatistics
{
public static void main(String[] args)
{
Path filePath =
Paths.get("C:\\Users\\John\\Desktop\\N Drive\\St Leo Master folder\\COM-209\\module 6\\sixtestfile.txt");
System.out.println("Path is " + filePath.toString());
try
{
filePath.getFileSystem().provider().checkAccess
(filePath, READ, EXECUTE);
System.out.println("File can be read & executed");
}
catch(IOException e)
{
System.out.println("File cannot be used in this app");
}
}
}
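One way to get the remaining pieces, sketched with java.nio.file only: Path.getParent() for the containing folder, and Files.readAttributes with BasicFileAttributes for the size and last-modified time (Files.size and Files.getLastModifiedTime would work too). The class and method names are hypothetical, and the default file name in main is just a placeholder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

class FileStatisticsSketch {

    // Builds a small report with the file's name, folder, size and last modification time.
    static String describe(Path filePath) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(filePath, BasicFileAttributes.class);
        Path absolute = filePath.toAbsolutePath(); // so getParent() is not null for bare names
        String nl = System.lineSeparator();
        return "Name: " + absolute.getFileName() + nl
                + "Containing folder: " + absolute.getParent() + nl
                + "Size: " + attrs.size() + " bytes" + nl
                + "Last modified: " + attrs.lastModifiedTime();
    }

    public static void main(String[] args) throws IOException {
        Path filePath = Paths.get(args.length > 0 ? args[0] : "sixtestfile.txt");
        System.out.println(describe(filePath));
    }
}
```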

Download and save file from ClientRequest using ExchangeFunction in Project Reactor

I have a problem with correctly saving a file after its download completes in Project Reactor.
class HttpImageClientDownloader implements ImageClientDownloader {
private final ExchangeFunction exchangeFunction;
HttpImageClientDownloader() {
this.exchangeFunction = ExchangeFunctions.create(new ReactorClientHttpConnector());
}
@Override
public Mono<File> downloadImage(String url, Path destination) {
ClientRequest clientRequest = ClientRequest.create(HttpMethod.GET, URI.create(url)).build();
return exchangeFunction.exchange(clientRequest)
.map(clientResponse -> clientResponse.body(BodyExtractors.toDataBuffers()))
//.flatMapMany(clientResponse -> clientResponse.body(BodyExtractors.toDataBuffers()))
.flatMap(dataBuffer -> {
AsynchronousFileChannel fileChannel = createFile(destination);
return DataBufferUtils
.write(dataBuffer, fileChannel, 0)
.publishOn(Schedulers.elastic())
.doOnNext(DataBufferUtils::release)
.then(Mono.just(destination.toFile()));
});
}
private AsynchronousFileChannel createFile(Path path) {
try {
return AsynchronousFileChannel.open(path, StandardOpenOption.CREATE);
} catch (Exception e) {
throw new ImageDownloadException("Error while creating file: " + path, e);
}
}
}
So my questions are:
Is DataBufferUtils.write(dataBuffer, fileChannel, 0) blocking?
What about when the disk is slow?
My second question is about what happens when ImageDownloadException occurs.
In doOnNext I want to release the given data buffer; is that a good place for this kind of operation?
I also think this line:
.map(clientResponse -> clientResponse.body(BodyExtractors.toDataBuffers()))
could be blocking...
Here's another (shorter) way to achieve that:
Flux<DataBuffer> data = this.webClient.get()
.uri("/greeting")
.retrieve()
.bodyToFlux(DataBuffer.class);
Path file = Files.createTempFile("spring", null);
WritableByteChannel channel = Files.newByteChannel(file, StandardOpenOption.WRITE);
Mono<File> result = DataBufferUtils.write(data, channel)
.map(DataBufferUtils::release)
// file is a Path, so convert it to match Mono<File>
.then(Mono.just(file.toFile()));
Now DataBufferUtils::write operations are not blocking because they use non-blocking IO with channels. Writing to such channels means it'll write whatever it can to the output buffer (i.e. may write all the DataBuffer or just part of it).
Using Flux::map or Flux::doOnNext is the right place to do that. But you're right, if an error occurs, you're still responsible for releasing the current buffer (and all the remaining ones). There might be something we can improve here in Spring Framework, please keep an eye on SPR-16782.
I don't see how your last sample shows anything blocking: all methods return reactive types and none are doing blocking I/O.

Why does Java8 Stream generate nothing?

import java.util.Comparator;
import java.util.PriorityQueue;
public class TestPQ {
public static void main(String[] args){
Comparator<String> comparator = new StringLengthComparator();
PriorityQueue<String> queue = new PriorityQueue<String>(10, comparator);
queue.offer("Short");
queue.offer("ABCahahahha");
queue.offer("lululu");
queue.stream().map( s-> {
System.out.println("queue: "+ s);
return s;
});
}
}
I have this code and I expect to see "Short", "lululu" and "ABCahahahha" printed out.
But I don't see them. What's wrong with my code?
It compiles fine, and I am using the Java 8 compiler and runtime.
You don't have any terminal operation consuming your stream, so nothing happens. map() is an intermediate operation, which is not supposed to have side effects. Your code should be:
queue.stream().forEach(s-> {
System.out.println("queue: "+ s);
});
The map() method itself is intermediate and does not enforce the consumption of a Stream so it's a very bad idea to put side effects there.
In this case, you should use the dedicated forEach() method:
queue.stream()
.forEach(s -> System.out.println("queue: " + s));
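One caveat: even with forEach, the output won't necessarily be in the order the question expects. PriorityQueue's iterator, and therefore its stream, does not traverse the elements in any particular order; only poll() and peek() respect the comparator. The sketch below assumes StringLengthComparator orders strings by length and uses Comparator.comparingInt(String::length) in its place; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class PriorityOrder {

    // Drains a copy of the queue with poll(), which always removes the head,
    // so the result follows the comparator; the original queue is left untouched.
    static List<String> inPriorityOrder(PriorityQueue<String> queue) {
        PriorityQueue<String> copy = new PriorityQueue<>(queue); // keeps the same comparator
        List<String> result = new ArrayList<>();
        while (!copy.isEmpty()) {
            result.add(copy.poll());
        }
        return result;
    }

    public static void main(String[] args) {
        PriorityQueue<String> queue =
                new PriorityQueue<>(10, Comparator.comparingInt(String::length));
        queue.offer("Short");
        queue.offer("ABCahahahha");
        queue.offer("lululu");
        // stream order is the internal heap order, not guaranteed to be sorted
        queue.stream().forEach(s -> System.out.println("stream: " + s));
        // poll order: Short, lululu, ABCahahahha
        inPriorityOrder(queue).forEach(s -> System.out.println("poll: " + s));
    }
}
```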
Non-terminal (intermediate) operations don't do any processing on their own. Only a terminal operation starts the processing of all the intermediate operations and then of the terminal operation itself.
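A small sketch that makes this laziness visible by counting how often map()'s function runs before and after a terminal operation (the names LazyDemo and mapCalls are hypothetical):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyDemo {

    // Returns {calls before the terminal operation, calls after it}.
    static int[] mapCalls() {
        AtomicInteger calls = new AtomicInteger();
        List<String> source = Arrays.asList("Short", "ABCahahahha", "lululu");

        Stream<String> pipeline = source.stream().map(s -> {
            calls.incrementAndGet();
            return s;
        });
        int beforeTerminal = calls.get(); // 0: building the pipeline runs nothing

        pipeline.collect(Collectors.toList()); // terminal operation triggers the pipeline
        int afterTerminal = calls.get();       // 3: one call per element

        return new int[] {beforeTerminal, afterTerminal};
    }

    public static void main(String[] args) {
        int[] r = mapCalls();
        System.out.println("before: " + r[0] + ", after: " + r[1]); // before: 0, after: 3
    }
}
```

collect is used here rather than count() because, since Java 9, count() may skip executing the pipeline entirely when the size is known from the source, which would hide the effect.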
