List as an object field - how to handle it via Stream (Java 8)?

There is a Waybill object that has a Set<Packing> field, and each Packing object has a PRICE field.
I receive a List<Waybill>.
I need to calculate the total price of all Packing objects across the entire List<Waybill>.
What is the proper way to do this with a Stream?
Thank you.
class Waybill {
Set<Packing> setOfPacking;
}
class Packing {
int PRICE;
}
List<Waybill> allWaybills = ...

This worked for me:
int total = allWaybills.stream()
.flatMap(waybill -> waybill.setOfPacking.stream())
.mapToInt(packing -> packing.PRICE)
.sum();
I think it is easier to reason about because there aren't any multi-level stream operations.
I would be interested to see how to use flatMapToInt to replace both the flatMap and mapToInt operations with one operation without making it multi-level.
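For what it is worth, here is a minimal sketch of the flatMapToInt variant asked about above. It assumes the Waybill and Packing classes from the question; the class name FlatMapToIntExample, the Packing constructor and the helper totalPrice are made up for illustration. Note that the lambda still has to open a stream per waybill, so this merges the two steps into one call rather than removing the nesting.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FlatMapToIntExample {
    // Same shape as the classes in the question; the constructor is added for brevity.
    static class Packing {
        int PRICE;
        Packing(int price) { this.PRICE = price; }
    }

    static class Waybill {
        Set<Packing> setOfPacking = new HashSet<>();
    }

    // Sums every PRICE in every waybill with a single flatMapToInt step.
    // Each waybill is mapped to an IntStream of its prices, and the
    // resulting IntStreams are concatenated before summing.
    static int totalPrice(List<Waybill> allWaybills) {
        return allWaybills.stream()
                .flatMapToInt(w -> w.setOfPacking.stream().mapToInt(p -> p.PRICE))
                .sum();
    }

    public static void main(String[] args) {
        List<Waybill> allWaybills = new ArrayList<>();
        Waybill w1 = new Waybill();
        w1.setOfPacking.add(new Packing(1));
        w1.setOfPacking.add(new Packing(2));
        Waybill w2 = new Waybill();
        w2.setOfPacking.add(new Packing(3));
        w2.setOfPacking.add(new Packing(4));
        allWaybills.add(w1);
        allWaybills.add(w2);
        System.out.println("total = " + totalPrice(allWaybills)); // prints "total = 10"
    }
}
```

Whether this reads better than flatMap followed by mapToInt is a matter of taste; the result is the same.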
Here is a test program:
import java.util.Set;
import java.util.List;
import java.util.HashSet;
import java.util.ArrayList;
import java.util.stream.Collectors;
public class HelloWorld
{
public static class Packing
{
public int PRICE = 0;
}
public static class Waybill
{
public Set<Packing> setOfPacking = new HashSet<Packing>();
}
public static void main(String []args){
List<Waybill> allWaybills = new ArrayList<Waybill>();
Waybill w1 = new Waybill();
Packing p1 = new Packing(); p1.PRICE = 1; w1.setOfPacking.add(p1);
Packing p2 = new Packing(); p2.PRICE = 2; w1.setOfPacking.add(p2);
allWaybills.add(w1);
Waybill w2 = new Waybill();
Packing p3 = new Packing(); p3.PRICE = 3; w2.setOfPacking.add(p3);
Packing p4 = new Packing(); p4.PRICE = 4; w2.setOfPacking.add(p4);
allWaybills.add(w2);
int total = allWaybills.stream()
.flatMap(waybill -> waybill.setOfPacking.stream())
.mapToInt(packing -> packing.PRICE)
.sum();
System.out.println("total = "+total);
}
}

import java.util.stream.*;
List<Waybill> allWaybills = ...
int totalCost = allWaybills
.stream()
.mapToInt(w -> w.setOfPacking
.stream()
.mapToInt(p -> p.PRICE)
.sum()
)
.sum();

Related

Functional/Stream programming for the graph problem "Reconstruct Itinerary"

I am trying to solve the Reconstruct Itinerary problem (https://leetcode.com/problems/reconstruct-itinerary/) in Scala using a functional approach. The Java solution works, but the Scala one doesn't. One reason I found is that the hashmap seems to be updated, so every iteration sees the latest hashmap (even when returning from a recursive call), which is weird.
Here is the solution in Java:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
public class Solution1 {
private void dfg(Map<String, PriorityQueue<String>> adj, LinkedList<String> result, String vertex){
PriorityQueue<String> pq = adj.get(vertex);
while (pq!=null && !pq.isEmpty()){
System.out.println("Before :"+adj.get(vertex));
String v = pq.poll();
System.out.println("After :"+ adj.get(vertex));
dfg(adj,result,v);
}
result.addFirst(vertex);
}
public List<String> findItinerary(List<List<String>> tickets){
Map<String,PriorityQueue<String>> adj = new HashMap<>();
for(List<String> ticket: tickets){
adj.putIfAbsent(ticket.get(0),new PriorityQueue<>());
adj.get(ticket.get(0)).add(ticket.get(1));
}
LinkedList<String> result = new LinkedList<>();
dfg(adj,result,"JFK");
//not reverse yet
return result;
}
public static void main(String[] args){
List<List<String>> tickets = new ArrayList<>();
List<String> t1 = new ArrayList<>();
t1.add("JFK");
t1.add("SFO");
tickets.add(t1);
List<String> t2 = new ArrayList<>();
t2.add("JFK");
t2.add("ATL");
tickets.add(t2);
List<String> t3 = new ArrayList<>();
t3.add("SFO");
t3.add("ATL");
tickets.add(t3);
List<String> t4 = new ArrayList<>();
t4.add("ATL");
t4.add("JFK");
tickets.add(t4);
List<String> t5 = new ArrayList<>();
t5.add("ATL");
t5.add("SFO");
tickets.add(t5);
System.out.println();
Solution1 s1 = new Solution1();
List<String> finalRes = s1.findItinerary(tickets);
for(String model : finalRes) {
System.out.print(model + " ");
}
}
}
Here is my solution in Scala which is not working:
package graph
class Itinerary {
}
case class Step(g: Map[String,List[String]],sort: List[String]=List())
object Solution {
def main(arr: Array[String]) = {
val tickets = List(List("JFK","SFO"),List("JFK","ATL"),List("SFO","ATL"),List("ATL","JFK"),List("ATL","SFO"))
println(findItinerary(tickets))
}
def findItinerary(tickets: List[List[String]]): List[String] = {
val g = tickets.foldLeft(Map[String,List[String]]())((m,t)=>{
val key=t(0)
val value= t(1)
m + (key->(m.getOrElse(key,Nil) :+ value).sorted)
})
println(g)
// g.keys.foldLeft(Step())((s,n)=> dfs(n,g,s)).sort.toList
dfs("JFK",Step(g)).sort.toList
}
def dfs(vertex: String,step: Step): Step = {
println("Input vertex " + vertex)
println("Input map "+ step.g)
val updatedStep= step.g.getOrElse(vertex,Nil).foldLeft(step) ((s,n)=>{
//println("Processing "+n+" of vertex "+vertex)
//delete link
val newG = step.g + (vertex->step.g.getOrElse(vertex,Nil).filter(v=>v!=n))
// println(newG)
dfs(n,step.copy(g=newG))
})
println("adding vertex to result "+vertex)
updatedStep.copy(sort = updatedStep.sort:+vertex)
}
}
Scala is sometimes approached as a "better" Java, but that's really very limiting. If you can get into the FP mindset, and study the Standard Library, you'll find that it's a whole new world.
def findItinerary(tickets: List[List[String]]): List[String] = {
def loop(from : String
,jump : Map[String,List[String]]
,acc : List[String]) : List[String] = jump.get(from) match {
case None => if (jump.isEmpty) from::acc else Nil
case Some(next::Nil) => loop(next, jump - from, from::acc)
case Some(nLst) =>
nLst.view.map{ next =>
loop(next, jump+(from->(nLst diff next::Nil)), from::acc)
}.find(_.lengthIs > 0).getOrElse(Nil)
}
loop("JFK"
,tickets.groupMap(_(0))(_(1)).map(kv => kv._1 -> kv._2.sorted)
,Nil).reverse
}
To be honest, I didn't look through your code to see where the problem was. But the problem caught my interest, so I decided to give it a go; here is the code:
(I hope my code helps you.)
type Airport = String // Refined 3 upper case letters.
final case class AirlineTiket(from: Airport, to: Airport)
object ReconstructItinerary {
// I am using cats NonEmptyList to improve type safety, but you can easily remove it from the code.
private final case class State(
currentAirport: Airport,
availableDestinations: Map[Airport, NonEmptyList[Airport]],
solution: List[Airport]
)
def apply(tickets: List[AirlineTiket])(start: Airport): Option[List[Airport]] = {
@annotation.tailrec
def loop(currentState: State, checkpoints: List[State]): Option[List[Airport]] = {
if (currentState.availableDestinations.isEmpty) {
// We used all the tickets, so we can return this solution.
Some((currentState.currentAirport :: currentState.solution).reverse)
} else {
val State(currentAirport, availableDestinations, solution) = currentState
availableDestinations.get(currentAirport) match {
case None =>
// We got into nowhere, lets see if we can return to a previous state...
checkpoints match {
case checkpoint :: remaining =>
// If we can return from there
loop(currentState = checkpoint, checkpoints = remaining)
case Nil =>
// If we can't, then we can say that there is no solution.
None
}
case Some(NonEmptyList(destination, Nil)) =>
// If from the current airport we can only travel to one destination, we will just follow that.
loop(
currentState = State(
currentAirport = destination,
availableDestinations - currentAirport,
currentAirport :: solution
),
checkpoints
)
case Some(NonEmptyList(destination, destinations @ head :: tail)) =>
// If we can travel to more than one destination, we are going to try all in order.
val newCheckpoints = destinations.map { altDestination =>
val newDestinations = NonEmptyList(head = destination, tail = destinations.filterNot(_ == altDestination))
State(
currentAirport = altDestination,
availableDestinations.updated(key = currentAirport, value = newDestinations),
currentAirport :: solution
)
}
loop(
currentState = State(
currentAirport = destination,
availableDestinations.updated(key = currentAirport, value = NonEmptyList(head, tail)),
currentAirport :: solution
),
newCheckpoints ::: checkpoints
)
}
}
}
val availableDestinations = tickets.groupByNel(_.from).view.mapValues(_.map(_.to).sorted).toMap
loop(
currentState = State(
currentAirport = start,
availableDestinations,
solution = List.empty
),
checkpoints = List.empty
)
}
}
You can see the code running here.

page X of Y with re-ordered TOC: X will start from 1 again after the TOC

I was able to create "page x of y" and the re-ordered TOC separately, following the official examples. "Page x of y" follows the example Solving the "Page X of Y" problem from iText 7: Building Blocks, Chapter 7: Handling events; setting viewer preferences and writer properties; the TOC follows the iText 7 example TOC as first page.
Now I want the generated PDF to have both "page x of y" and the re-ordered TOC. "Page x of y" should be shown on all pages, i.e. the 1st page (the TOC page) should show "Page 1 of 35" and the 2nd page (the first page of the main text) should show "Page 2 of 35" (in this Jekyll and Hyde example, the TOC takes one page).
But when I combined "page x of y" with the re-ordered TOC, I found a problem in the generated PDF: the 1st page (the TOC page) correctly showed "Page 1 of 35", but the 2nd page (the first page of the main text) also showed "Page 1 of 35".
What is the trick to make the 2nd page show "Page 2 of 35" with the re-ordered TOC?
==code for Page X of Y and re-order TOC==
package main;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import com.itextpdf.io.IOException;
import com.itextpdf.io.font.FontConstants;
import com.itextpdf.kernel.events.Event;
import com.itextpdf.kernel.events.IEventHandler;
import com.itextpdf.kernel.events.PdfDocumentEvent;
import com.itextpdf.kernel.font.PdfFont;
import com.itextpdf.kernel.font.PdfFontFactory;
import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfName;
import com.itextpdf.kernel.pdf.PdfOutline;
import com.itextpdf.kernel.pdf.PdfPage;
import com.itextpdf.kernel.pdf.PdfString;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.action.PdfAction;
import com.itextpdf.kernel.pdf.canvas.PdfCanvas;
import com.itextpdf.kernel.pdf.canvas.draw.DottedLine;
import com.itextpdf.kernel.pdf.navigation.PdfDestination;
import com.itextpdf.kernel.pdf.xobject.PdfFormXObject;
import com.itextpdf.layout.Canvas;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.AreaBreak;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.layout.element.Tab;
import com.itextpdf.layout.element.TabStop;
import com.itextpdf.layout.hyphenation.HyphenationConfig;
import com.itextpdf.layout.layout.LayoutContext;
import com.itextpdf.layout.layout.LayoutResult;
import com.itextpdf.layout.property.AreaBreakType;
import com.itextpdf.layout.property.TabAlignment;
import com.itextpdf.layout.property.TextAlignment;
import com.itextpdf.layout.renderer.ParagraphRenderer;
public class CreateTOC {
public static final String SRC = "D:/work/java_workspace/result/jekyll_hyde.txt";
public static final String DEST = "D:/work/java_workspace/result/test_toc.pdf";
public static void main(String args[]) throws IOException, Exception {
File file = new File(DEST);
file.getParentFile().mkdirs();
new CreateTOC().createPdf(DEST);
}
public void createPdf(String dest) throws IOException, java.io.IOException {
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
pdf.getCatalog().setPageMode(PdfName.UseOutlines);
PageXofY event = new PageXofY(pdf);
pdf.addEventHandler(PdfDocumentEvent.END_PAGE, event);
PdfFont font = PdfFontFactory.createFont(FontConstants.TIMES_ROMAN);
PdfFont bold = PdfFontFactory.createFont(FontConstants.HELVETICA_BOLD);
Document document = new Document(pdf);
document.setTextAlignment(TextAlignment.JUSTIFIED)
.setHyphenation(new HyphenationConfig("en", "uk", 3, 3))
.setFont(font)
.setFontSize(11);
// // add the cover
// document.add(new Paragraph("this is the cover 1"));
// document.add(new AreaBreak(AreaBreakType.NEXT_PAGE));
//
//
// document.add(new Paragraph("this is the cover 2"));
// document.add(new AreaBreak(AreaBreakType.NEXT_PAGE));
// parse text to PDF
BufferedReader br = new BufferedReader(new FileReader(SRC));
String name, line;
Paragraph p;
boolean title = true;
int counter = 0;
PdfOutline outline = null;
List<SimpleEntry<String,SimpleEntry<String, Integer>>> toc = new ArrayList<>();
while ((line = br.readLine()) != null) {
p = new Paragraph(line);
p.setKeepTogether(true);
if (title) {
name = String.format("title%02d", counter++);
outline = createOutline(outline, pdf, line, name);
int pagesWithoutCover = pdf.getNumberOfPages();
SimpleEntry<String, Integer> titlePage = new SimpleEntry(line, pagesWithoutCover);
p.setFont(bold).setFontSize(12)
.setKeepWithNext(true)
.setDestination(name)
.setNextRenderer(new UpdatePageRenderer(p, titlePage));
title = false;
document.add(p);
toc.add(new SimpleEntry(name, titlePage));
}
else {
p.setFirstLineIndent(18);
if (line.isEmpty()) {
p.setMarginBottom(12);
title = true;
}
else {
p.setMarginBottom(0);
}
document.add(p);
}
}
document.add(new AreaBreak(AreaBreakType.NEXT_PAGE));
// create table of contents
int startToc = pdf.getNumberOfPages();
p = new Paragraph().setFont(bold).add("Table of Contents").setDestination("toc");
document.add(p);
toc.remove(0);
List<TabStop> tabstops = new ArrayList();
tabstops.add(new TabStop(580, TabAlignment.RIGHT, new DottedLine()));
for (SimpleEntry<String, SimpleEntry<String, Integer>> entry : toc) {
SimpleEntry<String, Integer> text = entry.getValue();
p = new Paragraph()
.addTabStops(tabstops)
.add(text.getKey())
// .setFixedLeading(150)
.add(new Tab())
.add(String.valueOf(text.getValue()))
.setAction(PdfAction.createGoTo(entry.getKey()));
document.add(p);
}
int tocPages = pdf.getNumberOfPages() - startToc;
// reorder pages
PdfPage page;
for (int i = 0; i <= tocPages; i++) {
page = pdf.removePage(startToc + i);
pdf.addPage(i + 1, page);
}
event.writeTotal(pdf);
document.close();
}
protected class UpdatePageRenderer extends ParagraphRenderer {
protected SimpleEntry<String, Integer> entry;
public UpdatePageRenderer(Paragraph modelElement, SimpleEntry<String, Integer> entry) {
super(modelElement);
this.entry = entry;
}
@Override
public LayoutResult layout(LayoutContext layoutContext) {
LayoutResult result = super.layout(layoutContext);
entry.setValue(layoutContext.getArea().getPageNumber());
return result;
}
}
public PdfOutline createOutline(PdfOutline outline, PdfDocument pdf, String title, String name) {
if (outline == null) {
outline = pdf.getOutlines(false);
outline = outline.addOutline(title);
outline.addDestination(PdfDestination.makeDestination(new PdfString(name)));
return outline;
}
PdfOutline kid = outline.addOutline(title);
kid.addDestination(PdfDestination.makeDestination(new PdfString(name)));
return outline;
}
protected class PageXofY implements IEventHandler {
protected PdfFormXObject placeholder;
protected float side = 20;
protected float x = 300;
protected float y = 25;
protected float space = 4.5f;
protected float descent = 3;
public PageXofY(PdfDocument pdf) {
placeholder = new PdfFormXObject(new Rectangle(0, 0, side, side));
}
@Override
public void handleEvent(Event event) {
PdfDocumentEvent docEvent = (PdfDocumentEvent) event;
PdfDocument pdf = docEvent.getDocument();
PdfPage page = docEvent.getPage();
int pageNumber = pdf.getPageNumber(page);
Rectangle pageSize = page.getPageSize();
PdfCanvas pdfCanvas = new PdfCanvas(
page.newContentStreamBefore(), page.getResources(), pdf);
Canvas canvas = new Canvas(pdfCanvas, pdf, pageSize);
Paragraph p = new Paragraph().add("Page ").add(String.valueOf(pageNumber)).add(" of");
canvas.showTextAligned(p, x, y, TextAlignment.RIGHT);
pdfCanvas.addXObject(placeholder, x + space, y - descent);
pdfCanvas.release();
}
public void writeTotal(PdfDocument pdf) {
Canvas canvas = new Canvas(placeholder, pdf);
canvas.showTextAligned(String.valueOf(pdf.getNumberOfPages()),
0, descent, TextAlignment.LEFT);
}
}
}
In general
You will obviously run into trouble if you first create pages including a "page x/y" using the current page number of each page and then re-order the pages.
If you know beforehand how many pages you will move up front, you can take this re-ordering into account by adding this number as offset to the page number in your event listener. Be sure to reset that offset when you start creating the TOC pages.
If you don't know that number, it does not make sense to try to number the pages before re-ordering at all. Instead add page numbers afterwards as described in the iText 7: Building Blocks Chapter 2: Working with the RootElement example Adding a Page X of Y footer, i.e. loop over every page in the document and add a "Page X of Y" Paragraph to each page:
int n = pdf.getNumberOfPages();
Paragraph footer;
for (int page = 1; page <= n; page++) {
footer = new Paragraph(String.format("Page %s of %s", page, n));
document.showTextAligned(footer, 297.5f, 20, page,
TextAlignment.CENTER, VerticalAlignment.MIDDLE, 0);
}
document.close();
Don't forget to set immediateFlush to false as described right after that example.
Using an offset
In a comment you indicated that you did not want to use the solution from chapter 2 referenced above as you didn't want to keep the whole PDF in memory. Then you posted your code.
Thus, let's try and implement the offset mentioned above in your code.
The offset variable is best located right in the event listener. With it added, the listener might look like this:
protected class PageXofY implements IEventHandler
{
// vvv added
int offset = 0;
// ^^^ added
protected PdfFormXObject placeholder;
protected float side = 20;
protected float x = 300;
protected float y = 25;
protected float space = 4.5f;
protected float descent = 3;
public PageXofY(PdfDocument pdf)
{
placeholder = new PdfFormXObject(new Rectangle(0, 0, side, side));
}
@Override
public void handleEvent(Event event)
{
PdfDocumentEvent docEvent = (PdfDocumentEvent) event;
PdfDocument pdf = docEvent.getDocument();
PdfPage page = docEvent.getPage();
int pageNumber = pdf.getPageNumber(page);
Rectangle pageSize = page.getPageSize();
PdfCanvas pdfCanvas = new PdfCanvas(
page.newContentStreamBefore(), page.getResources(), pdf);
Canvas canvas = new Canvas(pdfCanvas, pdf, pageSize);
// vvv changed
Paragraph p = new Paragraph().add("Page ").add(String.valueOf(pageNumber + offset)).add(" of");
// ^^^ changed
canvas.showTextAligned(p, x, y, TextAlignment.RIGHT);
pdfCanvas.addXObject(placeholder, x + space, y - descent);
pdfCanvas.release();
}
public void writeTotal(PdfDocument pdf)
{
Canvas canvas = new Canvas(placeholder, pdf);
canvas.showTextAligned(String.valueOf(pdf.getNumberOfPages()),
0, descent, TextAlignment.LEFT);
}
}
(PageXofY)
(You might want to add getters and setters for the offset.)
When importing the text body your page numbers currently are created off-by-one as the TOC page will later be pulled up to the front. Thus, you need to use an offset of 1 (1 page TOC) during that import.
Afterwards, before starting the TOC page, you will have to reset the offset to 0 as nothing will be pulled before the TOC page thereafter.
Id est:
public void createPdf(Reader reader, String dest) throws IOException
{
[...]
Document document = new Document(pdf);
document.setTextAlignment(TextAlignment.JUSTIFIED)
.setHyphenation(new HyphenationConfig("en", "uk", 3, 3))
.setFont(font)
.setFontSize(11);
// vvv added
event.offset = 1;
// ^^^ added
// // add the cover
[...]
document.add(new AreaBreak(AreaBreakType.NEXT_PAGE));
// vvv added
event.offset = 0;
// ^^^ added
// create table of contents
int startToc = pdf.getNumberOfPages();
[...]
}
(CreateTOC method createPdf)
In the current iText 7 development version, 7.0.3-SNAPSHOT, this results in the desired page numbering.
Beware: there have been reports of delayed page event execution, and the event timing has probably been changed in the meantime. With older versions, therefore, the code might still apply wrong page numbers.

How to add tags to a parsed tree that has no tag?

For example, take this parse tree from the Stanford Sentiment Treebank:
"(2 (2 (2 near) (2 (2 the) (2 end))) (3 (3 (2 takes) (2 (2 on) (2 (2 a) (2 (2 whole) (2 (2 other) (2 meaning)))))) (2 .)))",
where the number is the sentiment label of each node.
I want to add POS tagging information to each node, such as:
"(NP (ADJP (IN near)) (DT the) (NN end)) "
I have tried to parse the sentence directly, but the resulting tree is different from the one in the Sentiment Treebank (maybe because of the parser version or parameters; I tried to contact the author but got no response).
How can I obtain the tagging information?
I think the code in edu.stanford.nlp.sentiment.BuildBinarizedDataset should be helpful. The main() method steps through how these binary trees can be created in Java code.
Some key lines to look out for in the code:
LexicalizedParser parser = LexicalizedParser.loadModel(parserModel);
TreeBinarizer binarizer = TreeBinarizer.simpleTreeBinarizer(parser.getTLPParams().headFinder(), parser.treebankLanguagePack());
...
Tree tree = parser.apply(tokens);
Tree binarized = binarizer.transformTree(tree);
You can access the node tag information from the Tree object. You should look at the javadoc for edu.stanford.nlp.trees.Tree to see how to access this information.
Also in this answer I have some code that shows accessing a Tree:
How to get NN and NNS from a text?
You want to look at the label() of each tree and subtree to get the tag for a node.
Here is the reference on GitHub to BuildBinarizedDataset.java:
https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/sentiment/BuildBinarizedDataset.java
Please let me know if anything is unclear about this and I can provide further assistance!
First, you need to download the Stanford Parser
Set up
private LexicalizedParser parser;
private TreeBinarizer binarizer;
private CollapseUnaryTransformer transformer;
parser = LexicalizedParser.loadModel(PCFG_PATH);
binarizer = TreeBinarizer.simpleTreeBinarizer(
parser.getTLPParams().headFinder(), parser.treebankLanguagePack());
transformer = new CollapseUnaryTransformer();
Parse
Tree tree = parser.apply(tokens);
Access POSTAG
public String[] constTreePOSTAG(Tree tree) {
Tree binarized = binarizer.transformTree(tree);
Tree collapsedUnary = transformer.transformTree(binarized);
Trees.convertToCoreLabels(collapsedUnary);
collapsedUnary.indexSpans();
List<Tree> leaves = collapsedUnary.getLeaves();
int size = collapsedUnary.size() - leaves.size();
String[] tags = new String[size];
HashMap<Integer, Integer> index = new HashMap<Integer, Integer>();
int idx = leaves.size();
int leafIdx = 0;
for (Tree leaf : leaves) {
Tree cur = leaf.parent(collapsedUnary); // go to preterminal
int curIdx = leafIdx++;
boolean done = false;
while (!done) {
Tree parent = cur.parent(collapsedUnary);
if (parent == null) {
tags[curIdx] = cur.label().toString();
break;
}
int parentIdx;
int parentNumber = parent.nodeNumber(collapsedUnary);
if (!index.containsKey(parentNumber)) {
parentIdx = idx++;
index.put(parentNumber, parentIdx);
} else {
parentIdx = index.get(parentNumber);
done = true;
}
tags[curIdx] = parent.label().toString();
cur = parent;
curIdx = parentIdx;
}
}
return tags;
}
Here is the full source code of ConstituencyParse.java, which runs:
Usage:
java ConstituencyParse -tokpath outputtoken.toks -parentpath outputparent.txt -tagpath outputag.txt < input_sentence_in_text_file_one_sent_per_line.txt
(Note: the source code is adapted from the treelstm repo; you also need to change preprocess-sst.py so that it calls the ConstituencyParse.java file below.)
import edu.stanford.nlp.process.WordTokenFactory;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.util.StringUtils;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.parser.lexparser.TreeBinarizer;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;
import edu.stanford.nlp.trees.GrammaticalStructure;
import edu.stanford.nlp.trees.GrammaticalStructureFactory;
import edu.stanford.nlp.trees.PennTreebankLanguagePack;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.Trees;
import edu.stanford.nlp.trees.TreebankLanguagePack;
import edu.stanford.nlp.trees.TypedDependency;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.StringReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.HashMap;
import java.util.Properties;
import java.util.Scanner;
public class ConstituencyParse {
private boolean tokenize;
private BufferedWriter tokWriter, parentWriter, tagWriter;
private LexicalizedParser parser;
private TreeBinarizer binarizer;
private CollapseUnaryTransformer transformer;
private GrammaticalStructureFactory gsf;
private static final String PCFG_PATH = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";
public ConstituencyParse(String tokPath, String parentPath, String tagPath, boolean tokenize) throws IOException {
this.tokenize = tokenize;
if (tokPath != null) {
tokWriter = new BufferedWriter(new FileWriter(tokPath));
}
parentWriter = new BufferedWriter(new FileWriter(parentPath));
tagWriter = new BufferedWriter(new FileWriter(tagPath));
parser = LexicalizedParser.loadModel(PCFG_PATH);
binarizer = TreeBinarizer.simpleTreeBinarizer(
parser.getTLPParams().headFinder(), parser.treebankLanguagePack());
transformer = new CollapseUnaryTransformer();
// set up to produce dependency representations from constituency trees
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
gsf = tlp.grammaticalStructureFactory();
}
public List<HasWord> sentenceToTokens(String line) {
List<HasWord> tokens = new ArrayList<>();
if (tokenize) {
PTBTokenizer<Word> tokenizer = new PTBTokenizer(new StringReader(line), new WordTokenFactory(), "");
for (Word label; tokenizer.hasNext(); ) {
tokens.add(tokenizer.next());
}
} else {
for (String word : line.split(" ")) {
tokens.add(new Word(word));
}
}
return tokens;
}
public Tree parse(List<HasWord> tokens) {
Tree tree = parser.apply(tokens);
return tree;
}
public String[] constTreePOSTAG(Tree tree) {
Tree binarized = binarizer.transformTree(tree);
Tree collapsedUnary = transformer.transformTree(binarized);
Trees.convertToCoreLabels(collapsedUnary);
collapsedUnary.indexSpans();
List<Tree> leaves = collapsedUnary.getLeaves();
int size = collapsedUnary.size() - leaves.size();
String[] tags = new String[size];
HashMap<Integer, Integer> index = new HashMap<Integer, Integer>();
int idx = leaves.size();
int leafIdx = 0;
for (Tree leaf : leaves) {
Tree cur = leaf.parent(collapsedUnary); // go to preterminal
int curIdx = leafIdx++;
boolean done = false;
while (!done) {
Tree parent = cur.parent(collapsedUnary);
if (parent == null) {
tags[curIdx] = cur.label().toString();
break;
}
int parentIdx;
int parentNumber = parent.nodeNumber(collapsedUnary);
if (!index.containsKey(parentNumber)) {
parentIdx = idx++;
index.put(parentNumber, parentIdx);
} else {
parentIdx = index.get(parentNumber);
done = true;
}
tags[curIdx] = parent.label().toString();
cur = parent;
curIdx = parentIdx;
}
}
return tags;
}
public int[] constTreeParents(Tree tree) {
Tree binarized = binarizer.transformTree(tree);
Tree collapsedUnary = transformer.transformTree(binarized);
Trees.convertToCoreLabels(collapsedUnary);
collapsedUnary.indexSpans();
List<Tree> leaves = collapsedUnary.getLeaves();
int size = collapsedUnary.size() - leaves.size();
int[] parents = new int[size];
HashMap<Integer, Integer> index = new HashMap<Integer, Integer>();
int idx = leaves.size();
int leafIdx = 0;
for (Tree leaf : leaves) {
Tree cur = leaf.parent(collapsedUnary); // go to preterminal
int curIdx = leafIdx++;
boolean done = false;
while (!done) {
Tree parent = cur.parent(collapsedUnary);
if (parent == null) {
parents[curIdx] = 0;
break;
}
int parentIdx;
int parentNumber = parent.nodeNumber(collapsedUnary);
if (!index.containsKey(parentNumber)) {
parentIdx = idx++;
index.put(parentNumber, parentIdx);
} else {
parentIdx = index.get(parentNumber);
done = true;
}
parents[curIdx] = parentIdx + 1;
cur = parent;
curIdx = parentIdx;
}
}
return parents;
}
// convert constituency parse to a dependency representation and return the
// parent pointer representation of the tree
public int[] depTreeParents(Tree tree, List<HasWord> tokens) {
GrammaticalStructure gs = gsf.newGrammaticalStructure(tree);
Collection<TypedDependency> tdl = gs.typedDependencies();
int len = tokens.size();
int[] parents = new int[len];
for (int i = 0; i < len; i++) {
// if a node has a parent of -1 at the end of parsing, then the node
// has no parent.
parents[i] = -1;
}
for (TypedDependency td : tdl) {
// let root have index 0
int child = td.dep().index();
int parent = td.gov().index();
parents[child - 1] = parent;
}
return parents;
}
public void printTokens(List<HasWord> tokens) throws IOException {
int len = tokens.size();
StringBuilder sb = new StringBuilder();
for (int i = 0; i < len - 1; i++) {
if (tokenize) {
sb.append(PTBTokenizer.ptbToken2Text(tokens.get(i).word()));
} else {
sb.append(tokens.get(i).word());
}
sb.append(' ');
}
if (tokenize) {
sb.append(PTBTokenizer.ptbToken2Text(tokens.get(len - 1).word()));
} else {
sb.append(tokens.get(len - 1).word());
}
sb.append('\n');
tokWriter.write(sb.toString());
}
public void printParents(int[] parents) throws IOException {
StringBuilder sb = new StringBuilder();
int size = parents.length;
for (int i = 0; i < size - 1; i++) {
sb.append(parents[i]);
sb.append(' ');
}
sb.append(parents[size - 1]);
sb.append('\n');
parentWriter.write(sb.toString());
}
public void printTags(String[] tags) throws IOException {
StringBuilder sb = new StringBuilder();
int size = tags.length;
for (int i = 0; i < size - 1; i++) {
sb.append(tags[i]);
sb.append(' ');
}
sb.append(tags[size - 1]);
sb.append('\n');
tagWriter.write(sb.toString().toLowerCase());
}
public void close() throws IOException {
if (tokWriter != null) tokWriter.close();
parentWriter.close();
tagWriter.close();
}
public static void main(String[] args) throws Exception {
String TAGGER_MODEL = "stanford-tagger/models/english-left3words-distsim.tagger";
Properties props = StringUtils.argsToProperties(args);
if (!props.containsKey("parentpath") || !props.containsKey("tagpath")) {
System.err.println(
"usage: java ConstituencyParse -deps - -tokenize - -tokpath <tokpath> -parentpath <parentpath> -tagpath <tagpath>");
System.exit(1);
}
// whether to tokenize input sentences
boolean tokenize = false;
if (props.containsKey("tokenize")) {
tokenize = true;
}
// whether to produce dependency trees from the constituency parse
boolean deps = false;
if (props.containsKey("deps")) {
deps = true;
}
String tokPath = props.containsKey("tokpath") ? props.getProperty("tokpath") : null;
String parentPath = props.getProperty("parentpath");
String tagPath = props.getProperty("tagpath");
ConstituencyParse processor = new ConstituencyParse(tokPath, parentPath, tagPath, tokenize);
Scanner stdin = new Scanner(System.in);
int count = 0;
long start = System.currentTimeMillis();
while (stdin.hasNextLine()) {
String line = stdin.nextLine();
List<HasWord> tokens = processor.sentenceToTokens(line);
//end tagger
Tree parse = processor.parse(tokens);
// produce parent pointer representation
int[] parents = deps ? processor.depTreeParents(parse, tokens)
: processor.constTreeParents(parse);
String[] tags = processor.constTreePOSTAG(parse);
// print
if (tokPath != null) {
processor.printTokens(tokens);
}
processor.printParents(parents);
processor.printTags(tags);
count++;
if (count % 100 == 0) {
double elapsed = (System.currentTimeMillis() - start) / 1000.0;
System.err.printf("Parsed %d lines (%.2fs)\n", count, elapsed);
}
}
long totalTimeMillis = System.currentTimeMillis() - start;
System.err.printf("Done: %d lines in %.2fs (%.1fms per line)\n",
count, totalTimeMillis / 1000.0, totalTimeMillis / (double) count);
processor.close();
}
}

What is the Java 8 way to pull an object from a set?

I'd like to pull an item out of a set, and keep it, based on a predicate. It seems like this should be possible, but I can't find a way to avoid going through the set twice. Such an operation could be used to 'pop' an object based on a dynamic priority.
Perhaps I should stick with an iterator.
Here's an example:
import org.junit.Test;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class RemoveAndUse {
    class A {
        int x;
        A(int x) { this.x = x; }
    }

    class B {
        int y;
        B(int y) { this.y = y; }
    }

    @Test
    public void removeHappyPath() {
        Set<A> aList = new HashSet<>(Arrays.asList(new A(1), new A(2), new A(3)));
        B b = new B(2);
        // remove and keep an A that matches b
        // first pass: find the matching element
        A found = aList.stream()
                .filter(a -> a.x == b.y)
                .findAny().get();
        // second pass: remove it
        aList.removeIf(a -> a.x == b.y);
        // or: aList.remove(found);
        assert(!aList.contains(found));
        assert(found.x == b.y);
    }
}
Any other ideas?
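A found = null; // stays null if nothing matches
for (Iterator<A> it = aList.iterator(); it.hasNext();) {
    A a = it.next();
    if (a.x == b.y) {
        found = a;
        it.remove(); // remove through the iterator, so no second lookup is needed
        break;
    }
}
This makes a single pass over the set, so O(n) is guaranteed.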
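The iterator loop above can be wrapped in a small reusable helper. This is only a sketch: the class name `SetPop` and the method `removeFirst` are hypothetical names chosen for illustration, and it assumes the collection holds no null elements (`Optional.of` rejects null) and that its iterator supports `remove()`.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

class SetPop {
    // Hypothetical helper: removes and returns the first element that matches
    // the predicate, visiting each element at most once.
    static <T> Optional<T> removeFirst(Collection<T> items, Predicate<? super T> test) {
        for (Iterator<T> it = items.iterator(); it.hasNext();) {
            T item = it.next();
            if (test.test(item)) {
                it.remove();              // removal goes through the iterator
                return Optional.of(item); // assumes non-null elements
            }
        }
        return Optional.empty();          // nothing matched, collection unchanged
    }

    public static void main(String[] args) {
        Set<Integer> numbers = new HashSet<>(Arrays.asList(1, 2, 3));
        Optional<Integer> popped = removeFirst(numbers, n -> n == 2);
        System.out.println("popped = " + popped + ", remaining = " + numbers);
    }
}
```

Returning an `Optional` instead of a nullable local makes the "no match" case explicit at the call site, which the stream version with `findAny().get()` does not handle.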

How to find the address where width and height are stored inside an mp4 file?

I need to find the addresses where the width and height are stored, but the ITU version of the standard doesn't give a clear definition of the file format.
What I have found so far:
Both values are stored in "a QuickTime float". I couldn't find the format, but it seems to use two 16-bit integers: a signed one followed by an unsigned one.
Unlike many file formats, there is no fixed position, so it is file specific. It depends on the TrackHeaderBox address.
What I desperately need:
A clear canonical answer describing where to find just this kind of information. I don't want answers that only refer to third-party libraries (unless they are written in proper JavaScript). Some pseudo-C-like structures would help.
There is no fixed position. You need to parse the file. Please check this Java example.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

public class GetHeight {
    public static void main(String[] args) throws IOException {
        FileInputStream fis = new FileInputStream(new File(args[0]));
        GetHeight ps = new GetHeight();
        ps.find(fis);
    }

    // payload of the most recently seen track header box
    byte[] lastTkhd;

    private void find(InputStream fis) throws IOException {
        while (fis.available() > 0) {
            // every box starts with a 4-byte size and a 4-byte type
            // NOTE: read() may return fewer bytes than requested; a robust version would loop
            byte[] header = new byte[8];
            fis.read(header);
            long size = readUint32(header, 0);
            String type = new String(header, 4, 4, "ISO-8859-1");
            if (containers.contains(type)) {
                // container box: its children follow immediately
                find(fis);
            } else {
                if (type.equals("tkhd")) {
                    lastTkhd = new byte[(int) (size - 8)];
                    fis.read(lastTkhd);
                } else {
                    if (type.equals("hdlr")) {
                        byte[] hdlr = new byte[(int) (size - 8)];
                        fis.read(hdlr);
                        // handler type "vide" marks a video track
                        if (hdlr[8] == 0x76 && hdlr[9] == 0x69 && hdlr[10] == 0x64 && hdlr[11] == 0x65) {
                            System.out.println("Video Track Header identified");
                            // width and height are the last two 32-bit fields of tkhd
                            System.out.println("width: " + readFixedPoint1616(lastTkhd, lastTkhd.length - 8));
                            System.out.println("height: " + readFixedPoint1616(lastTkhd, lastTkhd.length - 4));
                            System.exit(0);
                        }
                    } else {
                        fis.skip(size - 8);
                    }
                }
            }
        }
    }

    public static long readUint32(byte[] b, int s) {
        // mask each byte as a long before shifting so a high top byte is not sign-extended
        long result = 0;
        result |= (b[s + 0] & 0xFFL) << 24;
        result |= (b[s + 1] & 0xFFL) << 16;
        result |= (b[s + 2] & 0xFFL) << 8;
        result |= (b[s + 3] & 0xFFL);
        return result;
    }

    public static double readFixedPoint1616(byte[] b, int s) {
        return ((double) readUint32(b, s)) / 65536;
    }

    List<String> containers = Arrays.asList(
        "moov",
        "mdia",
        "trak"
    );
}
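The decoding step on its own may be easier to follow with concrete bytes. In the track header box, width and height are the last two 32-bit big-endian fields, each a 16.16 fixed-point number (upper 16 bits integer part, lower 16 bits fraction). The sketch below decodes such a pair; the class name `FixedPoint1616` and the sample bytes are made up for illustration and are not taken from a real file.

```java
class FixedPoint1616 {
    // Reads four bytes as an unsigned big-endian 32-bit integer.
    // Each byte is masked as a long first so the top bit is never sign-extended.
    static long readUint32(byte[] b, int offset) {
        return ((b[offset] & 0xFFL) << 24)
             | ((b[offset + 1] & 0xFFL) << 16)
             | ((b[offset + 2] & 0xFFL) << 8)
             | (b[offset + 3] & 0xFFL);
    }

    // Interprets the 32-bit value as 16.16 fixed point: value / 2^16.
    static double readFixedPoint1616(byte[] b, int offset) {
        return readUint32(b, offset) / 65536.0;
    }

    public static void main(String[] args) {
        // Illustrative last 8 bytes of a tkhd payload:
        // 0x07800000 is 1920 << 16, 0x04380000 is 1080 << 16
        byte[] tail = {0x07, (byte) 0x80, 0x00, 0x00, 0x04, 0x38, 0x00, 0x00};
        System.out.println("width:  " + readFixedPoint1616(tail, 0));
        System.out.println("height: " + readFixedPoint1616(tail, 4));
    }
}
```

Because the two fields sit at the end of the box, they can be located from the box size alone (`length - 8` and `length - 4` within the payload), regardless of which tkhd version the file uses.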

Resources