How to find the address where width and height are stored inside an mp4 file? - data-structures

I need to find the addresses where the width and height are stored, but the IUT version of the standard doesn't give a clear definition of the file format.
What I found so far:
Both values are stored in "a QuickTime float". I couldn't find the format, but it seems to use two 16-bit integers: a signed one followed by an unsigned one.
Unlike many file formats, there is no fixed position, so it is file-specific. It depends on the TrackHeaderBox address.
What I desperately need:
A clear canonical answer describing where to find only this kind of information. I don't want answers that only refer to third-party libraries (unless they are written in proper JavaScript). Some pseudo-C-like structures would help.

There is no fixed position. You need to parse the file. Please check this Java example.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;
public class GetHeight {
public static void main(String[] args) throws IOException {
FileInputStream fis = new FileInputStream(new File(args[0]));
GetHeight ps = new GetHeight();
ps.find(fis);
}
byte[] lastTkhd;
private void find(InputStream fis) throws IOException {
while (fis.available() > 0) {
byte[] header = new byte[8];
fis.read(header);
long size = readUint32(header, 0);
String type = new String(header, 4, 4, "ISO-8859-1");
if (containers.contains(type)) {
find(fis);
} else {
if (type.equals("tkhd")) {
lastTkhd = new byte[(int) (size - 8)];
fis.read(lastTkhd);
} else {
if (type.equals("hdlr")) {
byte[] hdlr = new byte[(int) (size - 8)];
fis.read(hdlr);
if (hdlr[8] == 0x76 && hdlr[9] == 0x69 && hdlr[10] == 0x64 && hdlr[11] == 0x65) {
System.out.println("Video Track Header identified");
System.out.println("width: " + readFixedPoint1616(lastTkhd, lastTkhd.length - 8));
System.out.println("height: " + readFixedPoint1616(lastTkhd, lastTkhd.length - 4));
System.exit(0); // the video track was found: stop with a success status
}
} else {
fis.skip(size - 8);
}
}
}
}
}
public static long readUint32(byte[] b, int s) {
// Mask each byte to 0..255 as a long before shifting, so a set high bit cannot sign-extend the result.
long result = 0;
result |= (b[s + 0] & 0xFFL) << 24;
result |= (b[s + 1] & 0xFFL) << 16;
result |= (b[s + 2] & 0xFFL) << 8;
result |= (b[s + 3] & 0xFFL);
return result;
}
public static double readFixedPoint1616(byte[] b, int s) {
return ((double) readUint32(b, s)) / 65536;
}
List<String> containers = Arrays.asList(
"moov",
"mdia",
"trak"
);
}
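The example above takes the width and height from the last 8 bytes of the tkhd payload and decodes each as a 16.16 fixed-point number. A minimal stand-alone sketch of just that step follows. It assumes a version 0 tkhd payload of 84 bytes (the bytes after the 8-byte box header), which puts width at payload offset 76 and height at 80; the class name and the hand-built test payload are made up for illustration.

```java
import java.nio.ByteBuffer;

class TkhdDimensionsSketch {
    /** Reads a big-endian 16.16 fixed-point value: 16 integer bits, then 16 fractional bits. */
    static double readFixedPoint1616(byte[] b, int offset) {
        // ByteBuffer is big-endian by default; the mask turns the int into an unsigned value.
        long raw = ByteBuffer.wrap(b, offset, 4).getInt() & 0xFFFFFFFFL;
        return raw / 65536.0;
    }

    public static void main(String[] args) {
        // Hypothetical tkhd payload (version 0, 84 bytes) with only width and height filled in.
        byte[] payload = new byte[84];
        ByteBuffer.wrap(payload).putInt(76, 1920 << 16).putInt(80, 1080 << 16);

        // Width and height are the last two 32-bit fields of the payload.
        System.out.println("width: " + readFixedPoint1616(payload, payload.length - 8));  // width: 1920.0
        System.out.println("height: " + readFixedPoint1616(payload, payload.length - 4)); // height: 1080.0
    }
}
```

Indexing from the end of the payload, as the answer does, avoids special-casing version 1 boxes, whose wider time fields push both values 12 bytes further back.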


page X of Y with re-ordered TOC: X will start from 1 again after the TOC

I was able to create the "page x of y" footer and the re-ordered TOC separately, using the official examples. "Page x of y" follows iText 7: Building Blocks Chapter 7: Handling events; setting viewer preferences and writer properties, with the examples under Solving the "Page X of Y" problem; the TOC follows the iText 7 example TOC as first page.
Now I want the generated PDF to have both "page x of y" and the re-ordered TOC. "Page x of y" should be shown on all pages, i.e. the 1st page (the TOC page) should show "Page 1 of 35" and the 2nd page (the first page of the main text) should show "Page 2 of 35" (in this Jekyll and Hyde example, the TOC takes one page).
But when I combined "page x of y" with the re-ordered TOC, I found a problem in the generated PDF: the 1st page (the TOC page) correctly showed "Page 1 of 35", but the 2nd page (the first page of the main text) also showed "Page 1 of 35".
What is the trick to make the 2nd page show "Page 2 of 35" with a re-ordered TOC?
==code for Page X of Y and re-order TOC==
package main;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import com.itextpdf.io.IOException;
import com.itextpdf.io.font.FontConstants;
import com.itextpdf.kernel.events.Event;
import com.itextpdf.kernel.events.IEventHandler;
import com.itextpdf.kernel.events.PdfDocumentEvent;
import com.itextpdf.kernel.font.PdfFont;
import com.itextpdf.kernel.font.PdfFontFactory;
import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfName;
import com.itextpdf.kernel.pdf.PdfOutline;
import com.itextpdf.kernel.pdf.PdfPage;
import com.itextpdf.kernel.pdf.PdfString;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.action.PdfAction;
import com.itextpdf.kernel.pdf.canvas.PdfCanvas;
import com.itextpdf.kernel.pdf.canvas.draw.DottedLine;
import com.itextpdf.kernel.pdf.navigation.PdfDestination;
import com.itextpdf.kernel.pdf.xobject.PdfFormXObject;
import com.itextpdf.layout.Canvas;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.AreaBreak;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.layout.element.Tab;
import com.itextpdf.layout.element.TabStop;
import com.itextpdf.layout.hyphenation.HyphenationConfig;
import com.itextpdf.layout.layout.LayoutContext;
import com.itextpdf.layout.layout.LayoutResult;
import com.itextpdf.layout.property.AreaBreakType;
import com.itextpdf.layout.property.TabAlignment;
import com.itextpdf.layout.property.TextAlignment;
import com.itextpdf.layout.renderer.ParagraphRenderer;
public class CreateTOC {
public static final String SRC = "D:/work/java_workspace/result/jekyll_hyde.txt";
public static final String DEST = "D:/work/java_workspace/result/test_toc.pdf";
public static void main(String args[]) throws IOException, Exception {
File file = new File(DEST);
file.getParentFile().mkdirs();
new CreateTOC().createPdf(DEST);
}
public void createPdf(String dest) throws IOException, java.io.IOException {
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
pdf.getCatalog().setPageMode(PdfName.UseOutlines);
PageXofY event = new PageXofY(pdf);
pdf.addEventHandler(PdfDocumentEvent.END_PAGE, event);
PdfFont font = PdfFontFactory.createFont(FontConstants.TIMES_ROMAN);
PdfFont bold = PdfFontFactory.createFont(FontConstants.HELVETICA_BOLD);
Document document = new Document(pdf);
document.setTextAlignment(TextAlignment.JUSTIFIED)
.setHyphenation(new HyphenationConfig("en", "uk", 3, 3))
.setFont(font)
.setFontSize(11);
// // add the cover
// document.add(new Paragraph("this is the cover 1"));
// document.add(new AreaBreak(AreaBreakType.NEXT_PAGE));
//
//
// document.add(new Paragraph("this is the cover 2"));
// document.add(new AreaBreak(AreaBreakType.NEXT_PAGE));
// parse text to PDF
BufferedReader br = new BufferedReader(new FileReader(SRC));
String name, line;
Paragraph p;
boolean title = true;
int counter = 0;
PdfOutline outline = null;
List<SimpleEntry<String,SimpleEntry<String, Integer>>> toc = new ArrayList<>();
while ((line = br.readLine()) != null) {
p = new Paragraph(line);
p.setKeepTogether(true);
if (title) {
name = String.format("title%02d", counter++);
outline = createOutline(outline, pdf, line, name);
int pagesWithoutCover = pdf.getNumberOfPages();
SimpleEntry<String, Integer> titlePage = new SimpleEntry(line, pagesWithoutCover);
p.setFont(bold).setFontSize(12)
.setKeepWithNext(true)
.setDestination(name)
.setNextRenderer(new UpdatePageRenderer(p, titlePage));
title = false;
document.add(p);
toc.add(new SimpleEntry(name, titlePage));
}
else {
p.setFirstLineIndent(18);
if (line.isEmpty()) {
p.setMarginBottom(12);
title = true;
}
else {
p.setMarginBottom(0);
}
document.add(p);
}
}
document.add(new AreaBreak(AreaBreakType.NEXT_PAGE));
// create table of contents
int startToc = pdf.getNumberOfPages();
p = new Paragraph().setFont(bold).add("Table of Contents").setDestination("toc");
document.add(p);
toc.remove(0);
List<TabStop> tabstops = new ArrayList();
tabstops.add(new TabStop(580, TabAlignment.RIGHT, new DottedLine()));
for (SimpleEntry<String, SimpleEntry<String, Integer>> entry : toc) {
SimpleEntry<String, Integer> text = entry.getValue();
p = new Paragraph()
.addTabStops(tabstops)
.add(text.getKey())
// .setFixedLeading(150)
.add(new Tab())
.add(String.valueOf(text.getValue()))
.setAction(PdfAction.createGoTo(entry.getKey()));
document.add(p);
}
int tocPages = pdf.getNumberOfPages() - startToc;
// reorder pages
PdfPage page;
for (int i = 0; i <= tocPages; i++) {
page = pdf.removePage(startToc + i);
pdf.addPage(i + 1, page);
}
event.writeTotal(pdf);
document.close();
}
protected class UpdatePageRenderer extends ParagraphRenderer {
protected SimpleEntry<String, Integer> entry;
public UpdatePageRenderer(Paragraph modelElement, SimpleEntry<String, Integer> entry) {
super(modelElement);
this.entry = entry;
}
@Override
public LayoutResult layout(LayoutContext layoutContext) {
LayoutResult result = super.layout(layoutContext);
entry.setValue(layoutContext.getArea().getPageNumber());
return result;
}
}
public PdfOutline createOutline(PdfOutline outline, PdfDocument pdf, String title, String name) {
if (outline == null) {
outline = pdf.getOutlines(false);
outline = outline.addOutline(title);
outline.addDestination(PdfDestination.makeDestination(new PdfString(name)));
return outline;
}
PdfOutline kid = outline.addOutline(title);
kid.addDestination(PdfDestination.makeDestination(new PdfString(name)));
return outline;
}
protected class PageXofY implements IEventHandler {
protected PdfFormXObject placeholder;
protected float side = 20;
protected float x = 300;
protected float y = 25;
protected float space = 4.5f;
protected float descent = 3;
public PageXofY(PdfDocument pdf) {
placeholder = new PdfFormXObject(new Rectangle(0, 0, side, side));
}
@Override
public void handleEvent(Event event) {
PdfDocumentEvent docEvent = (PdfDocumentEvent) event;
PdfDocument pdf = docEvent.getDocument();
PdfPage page = docEvent.getPage();
int pageNumber = pdf.getPageNumber(page);
Rectangle pageSize = page.getPageSize();
PdfCanvas pdfCanvas = new PdfCanvas(
page.newContentStreamBefore(), page.getResources(), pdf);
Canvas canvas = new Canvas(pdfCanvas, pdf, pageSize);
Paragraph p = new Paragraph().add("Page ").add(String.valueOf(pageNumber)).add(" of");
canvas.showTextAligned(p, x, y, TextAlignment.RIGHT);
pdfCanvas.addXObject(placeholder, x + space, y - descent);
pdfCanvas.release();
}
public void writeTotal(PdfDocument pdf) {
Canvas canvas = new Canvas(placeholder, pdf);
canvas.showTextAligned(String.valueOf(pdf.getNumberOfPages()),
0, descent, TextAlignment.LEFT);
}
}
}
In general
You will obviously run into trouble if you first create pages including a "page x/y" using the current page number of each page and then re-order the pages.
If you know beforehand how many pages you will move up front, you can take this re-ordering into account by adding this number as offset to the page number in your event listener. Be sure to reset that offset when you start creating the TOC pages.
If you don't know that number, it does not make sense to try to number the pages before re-ordering at all. Instead add page numbers afterwards as described in the iText 7: Building Blocks Chapter 2: Working with the RootElement example Adding a Page X of Y footer, i.e. loop over every page in the document and add a "Page X of Y" Paragraph to each page:
int n = pdf.getNumberOfPages();
Paragraph footer;
for (int page = 1; page <= n; page++) {
footer = new Paragraph(String.format("Page %s of %s", page, n));
document.showTextAligned(footer, 297.5f, 20, page,
TextAlignment.CENTER, VerticalAlignment.MIDDLE, 0);
}
document.close();
Don't forget to set immediateFlush to false as described right after that example.
Using an offset
In a comment you indicated that you did not want to use the solution from chapter 2 referenced above as you didn't want to keep the whole PDF in memory. Then you posted your code.
Thus, let's try and implement the offset mentioned above in your code.
The offset variable is best located right in the event listener. Having added it, it might look like this:
protected class PageXofY implements IEventHandler
{
// vvv added
int offset = 0;
// ^^^ added
protected PdfFormXObject placeholder;
protected float side = 20;
protected float x = 300;
protected float y = 25;
protected float space = 4.5f;
protected float descent = 3;
public PageXofY(PdfDocument pdf)
{
placeholder = new PdfFormXObject(new Rectangle(0, 0, side, side));
}
@Override
public void handleEvent(Event event)
{
PdfDocumentEvent docEvent = (PdfDocumentEvent) event;
PdfDocument pdf = docEvent.getDocument();
PdfPage page = docEvent.getPage();
int pageNumber = pdf.getPageNumber(page);
Rectangle pageSize = page.getPageSize();
PdfCanvas pdfCanvas = new PdfCanvas(
page.newContentStreamBefore(), page.getResources(), pdf);
Canvas canvas = new Canvas(pdfCanvas, pdf, pageSize);
// vvv changed
Paragraph p = new Paragraph().add("Page ").add(String.valueOf(pageNumber + offset)).add(" of");
// ^^^ changed
canvas.showTextAligned(p, x, y, TextAlignment.RIGHT);
pdfCanvas.addXObject(placeholder, x + space, y - descent);
pdfCanvas.release();
}
public void writeTotal(PdfDocument pdf)
{
Canvas canvas = new Canvas(placeholder, pdf);
canvas.showTextAligned(String.valueOf(pdf.getNumberOfPages()),
0, descent, TextAlignment.LEFT);
}
}
(PageXofY)
(You might want to add getters and setters for the offset.)
When importing the text body your page numbers currently are created off-by-one as the TOC page will later be pulled up to the front. Thus, you need to use an offset of 1 (1 page TOC) during that import.
Afterwards, before starting the TOC page, you will have to reset the offset to 0 as nothing will be pulled before the TOC page thereafter.
Id est:
public void createPdf(Reader reader, String dest) throws IOException
{
[...]
Document document = new Document(pdf);
document.setTextAlignment(TextAlignment.JUSTIFIED)
.setHyphenation(new HyphenationConfig("en", "uk", 3, 3))
.setFont(font)
.setFontSize(11);
// vvv added
event.offset = 1;
// ^^^ added
// // add the cover
[...]
document.add(new AreaBreak(AreaBreakType.NEXT_PAGE));
// vvv added
event.offset = 0;
// ^^^ added
// create table of contents
int startToc = pdf.getNumberOfPages();
[...]
}
(CreateTOC method createPdf)
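The effect of the offset can be checked without iText. The following sketch only simulates the numbering arithmetic for a single TOC page that is created last and then moved to the front. It assumes, as the output described in the question suggests, that the TOC page gets its number only once it already sits at position 1. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class PageOffsetSketch {
    /** Returns the printed page numbers in final reading order (TOC page first). */
    static List<Integer> printedNumbers(int bodyPages, int offsetDuringBody) {
        List<Integer> printed = new ArrayList<>();
        // Assumption: the single TOC page is numbered after the move, with offset 0.
        printed.add(1);
        for (int position = 1; position <= bodyPages; position++) {
            // Body pages are numbered while they still occupy positions 1..bodyPages.
            printed.add(position + offsetDuringBody);
        }
        return printed;
    }

    public static void main(String[] args) {
        System.out.println(printedNumbers(3, 0)); // [1, 1, 2, 3]  (the symptom from the question)
        System.out.println(printedNumbers(3, 1)); // [1, 2, 3, 4]  (with an offset of 1)
    }
}
```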
In the current iText 7 development version, 7.0.3-SNAPSHOT, this results in the desired page numbering.
Beware: There have been reports of delayed page event execution. The event timing has probably been changed in the meantime. With older versions, therefore, the code might still apply wrong page numbers.

How to add tags to a parsed tree that has no tag?

For example, the parsing tree from Stanford Sentiment Treebank
"(2 (2 (2 near) (2 (2 the) (2 end))) (3 (3 (2 takes) (2 (2 on) (2 (2 a) (2 (2 whole) (2 (2 other) (2 meaning)))))) (2 .)))",
where the number is the sentiment label of each node.
I want to add POS tagging information to each node. Such as:
"(NP (ADJP (IN near)) (DT the) (NN end)) "
I have tried parsing the sentence directly, but the resulting tree differs from the one in the Sentiment Treebank (maybe because of the parser version or parameters; I tried to contact the author but got no response).
How can I obtain the tagging information?
I think the code in edu.stanford.nlp.sentiment.BuildBinarizedDataset should be helpful. The main() method steps through how these binary trees can be created in Java code.
Some key lines to look out for in the code:
LexicalizedParser parser = LexicalizedParser.loadModel(parserModel);
TreeBinarizer binarizer = TreeBinarizer.simpleTreeBinarizer(parser.getTLPParams().headFinder(), parser.treebankLanguagePack());
...
Tree tree = parser.apply(tokens);
Tree binarized = binarizer.transformTree(tree);
You can access the node tag information from the Tree object. You should look at the javadoc for edu.stanford.nlp.trees.Tree to see how to access this information.
Also in this answer I have some code that shows accessing a Tree:
How to get NN and NNS from a text?
You want to look at the label() of each tree and subtree to get the tag for a node.
Here is the reference on GitHub to BuildBinarizedDataset.java:
https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/sentiment/BuildBinarizedDataset.java
Please let me know if anything is unclear about this and I can provide further assistance!
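To see what "the label of each node" looks like without setting up CoreNLP, here is a small stand-alone sketch that reads the label of every preterminal out of a bracketed tree string. It is a stand-in for illustration only and does not use the CoreNLP Tree API; the class and method names are made up, and it assumes labels and words contain no parentheses or whitespace.

```java
import java.util.ArrayList;
import java.util.List;

class BracketTagSketch {
    /** Collects "label/word" for every preterminal "(LABEL word)" in a bracketed tree. */
    static List<String> preterminals(String tree) {
        List<String> out = new ArrayList<>();
        // Split the string into "(", ")" and plain symbols.
        String[] tokens = tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        for (int i = 0; i + 3 < tokens.length; i++) {
            // A preterminal is exactly the token sequence: "(" label word ")".
            if (tokens[i].equals("(") && !isParen(tokens[i + 1])
                    && !isParen(tokens[i + 2]) && tokens[i + 3].equals(")")) {
                out.add(tokens[i + 1] + "/" + tokens[i + 2]);
            }
        }
        return out;
    }

    static boolean isParen(String token) {
        return token.equals("(") || token.equals(")");
    }

    public static void main(String[] args) {
        // Tagged example from the question.
        System.out.println(preterminals("(NP (ADJP (IN near)) (DT the) (NN end))"));
        // prints [IN/near, DT/the, NN/end]
        // Sentiment-labelled fragment: the label slot holds the sentiment score instead.
        System.out.println(preterminals("(2 (2 near) (2 (2 the) (2 end)))"));
        // prints [2/near, 2/the, 2/end]
    }
}
```

Both strings yield the same words in the same order, which is what makes it possible to line up the tags from a fresh parse with the leaves of the Sentiment Treebank tree.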
First, you need to download the Stanford Parser
Set up
private LexicalizedParser parser;
private TreeBinarizer binarizer;
private CollapseUnaryTransformer transformer;
parser = LexicalizedParser.loadModel(PCFG_PATH);
binarizer = TreeBinarizer.simpleTreeBinarizer(
parser.getTLPParams().headFinder(), parser.treebankLanguagePack());
transformer = new CollapseUnaryTransformer();
Parse
Tree tree = parser.apply(tokens);
Access POSTAG
public String[] constTreePOSTAG(Tree tree) {
Tree binarized = binarizer.transformTree(tree);
Tree collapsedUnary = transformer.transformTree(binarized);
Trees.convertToCoreLabels(collapsedUnary);
collapsedUnary.indexSpans();
List<Tree> leaves = collapsedUnary.getLeaves();
int size = collapsedUnary.size() - leaves.size();
String[] tags = new String[size];
HashMap<Integer, Integer> index = new HashMap<Integer, Integer>();
int idx = leaves.size();
int leafIdx = 0;
for (Tree leaf : leaves) {
Tree cur = leaf.parent(collapsedUnary); // go to preterminal
int curIdx = leafIdx++;
boolean done = false;
while (!done) {
Tree parent = cur.parent(collapsedUnary);
if (parent == null) {
tags[curIdx] = cur.label().toString();
break;
}
int parentIdx;
int parentNumber = parent.nodeNumber(collapsedUnary);
if (!index.containsKey(parentNumber)) {
parentIdx = idx++;
index.put(parentNumber, parentIdx);
} else {
parentIdx = index.get(parentNumber);
done = true;
}
tags[curIdx] = parent.label().toString();
cur = parent;
curIdx = parentIdx;
}
}
return tags;
}
Here is the full source code of ConstituencyParse.java.
Run it with these parameters:
java ConstituencyParse -tokpath outputtoken.toks -parentpath outputparent.txt -tagpath outputag.txt < input_sentence_in_text_file_one_sent_per_line.txt
(Note: the source code is adapted from the treelstm repo; you also need to change preprocess-sst.py to call the ConstituencyParse.java file below.)
import edu.stanford.nlp.process.WordTokenFactory;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.util.StringUtils;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.sentiment.CollapseUnaryTransformer;
import edu.stanford.nlp.parser.lexparser.TreeBinarizer;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;
import edu.stanford.nlp.trees.GrammaticalStructure;
import edu.stanford.nlp.trees.GrammaticalStructureFactory;
import edu.stanford.nlp.trees.PennTreebankLanguagePack;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.Trees;
import edu.stanford.nlp.trees.TreebankLanguagePack;
import edu.stanford.nlp.trees.TypedDependency;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.StringReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.HashMap;
import java.util.Properties;
import java.util.Scanner;
public class ConstituencyParse {
private boolean tokenize;
private BufferedWriter tokWriter, parentWriter, tagWriter;
private LexicalizedParser parser;
private TreeBinarizer binarizer;
private CollapseUnaryTransformer transformer;
private GrammaticalStructureFactory gsf;
private static final String PCFG_PATH = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";
public ConstituencyParse(String tokPath, String parentPath, String tagPath, boolean tokenize) throws IOException {
this.tokenize = tokenize;
if (tokPath != null) {
tokWriter = new BufferedWriter(new FileWriter(tokPath));
}
parentWriter = new BufferedWriter(new FileWriter(parentPath));
tagWriter = new BufferedWriter(new FileWriter(tagPath));
parser = LexicalizedParser.loadModel(PCFG_PATH);
binarizer = TreeBinarizer.simpleTreeBinarizer(
parser.getTLPParams().headFinder(), parser.treebankLanguagePack());
transformer = new CollapseUnaryTransformer();
// set up to produce dependency representations from constituency trees
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
gsf = tlp.grammaticalStructureFactory();
}
public List<HasWord> sentenceToTokens(String line) {
List<HasWord> tokens = new ArrayList<>();
if (tokenize) {
PTBTokenizer<Word> tokenizer = new PTBTokenizer(new StringReader(line), new WordTokenFactory(), "");
for (Word label; tokenizer.hasNext(); ) {
tokens.add(tokenizer.next());
}
} else {
for (String word : line.split(" ")) {
tokens.add(new Word(word));
}
}
return tokens;
}
public Tree parse(List<HasWord> tokens) {
Tree tree = parser.apply(tokens);
return tree;
}
public String[] constTreePOSTAG(Tree tree) {
Tree binarized = binarizer.transformTree(tree);
Tree collapsedUnary = transformer.transformTree(binarized);
Trees.convertToCoreLabels(collapsedUnary);
collapsedUnary.indexSpans();
List<Tree> leaves = collapsedUnary.getLeaves();
int size = collapsedUnary.size() - leaves.size();
String[] tags = new String[size];
HashMap<Integer, Integer> index = new HashMap<Integer, Integer>();
int idx = leaves.size();
int leafIdx = 0;
for (Tree leaf : leaves) {
Tree cur = leaf.parent(collapsedUnary); // go to preterminal
int curIdx = leafIdx++;
boolean done = false;
while (!done) {
Tree parent = cur.parent(collapsedUnary);
if (parent == null) {
tags[curIdx] = cur.label().toString();
break;
}
int parentIdx;
int parentNumber = parent.nodeNumber(collapsedUnary);
if (!index.containsKey(parentNumber)) {
parentIdx = idx++;
index.put(parentNumber, parentIdx);
} else {
parentIdx = index.get(parentNumber);
done = true;
}
tags[curIdx] = parent.label().toString();
cur = parent;
curIdx = parentIdx;
}
}
return tags;
}
public int[] constTreeParents(Tree tree) {
Tree binarized = binarizer.transformTree(tree);
Tree collapsedUnary = transformer.transformTree(binarized);
Trees.convertToCoreLabels(collapsedUnary);
collapsedUnary.indexSpans();
List<Tree> leaves = collapsedUnary.getLeaves();
int size = collapsedUnary.size() - leaves.size();
int[] parents = new int[size];
HashMap<Integer, Integer> index = new HashMap<Integer, Integer>();
int idx = leaves.size();
int leafIdx = 0;
for (Tree leaf : leaves) {
Tree cur = leaf.parent(collapsedUnary); // go to preterminal
int curIdx = leafIdx++;
boolean done = false;
while (!done) {
Tree parent = cur.parent(collapsedUnary);
if (parent == null) {
parents[curIdx] = 0;
break;
}
int parentIdx;
int parentNumber = parent.nodeNumber(collapsedUnary);
if (!index.containsKey(parentNumber)) {
parentIdx = idx++;
index.put(parentNumber, parentIdx);
} else {
parentIdx = index.get(parentNumber);
done = true;
}
parents[curIdx] = parentIdx + 1;
cur = parent;
curIdx = parentIdx;
}
}
return parents;
}
// convert constituency parse to a dependency representation and return the
// parent pointer representation of the tree
public int[] depTreeParents(Tree tree, List<HasWord> tokens) {
GrammaticalStructure gs = gsf.newGrammaticalStructure(tree);
Collection<TypedDependency> tdl = gs.typedDependencies();
int len = tokens.size();
int[] parents = new int[len];
for (int i = 0; i < len; i++) {
// if a node has a parent of -1 at the end of parsing, then the node
// has no parent.
parents[i] = -1;
}
for (TypedDependency td : tdl) {
// let root have index 0
int child = td.dep().index();
int parent = td.gov().index();
parents[child - 1] = parent;
}
return parents;
}
public void printTokens(List<HasWord> tokens) throws IOException {
int len = tokens.size();
StringBuilder sb = new StringBuilder();
for (int i = 0; i < len - 1; i++) {
if (tokenize) {
sb.append(PTBTokenizer.ptbToken2Text(tokens.get(i).word()));
} else {
sb.append(tokens.get(i).word());
}
sb.append(' ');
}
if (tokenize) {
sb.append(PTBTokenizer.ptbToken2Text(tokens.get(len - 1).word()));
} else {
sb.append(tokens.get(len - 1).word());
}
sb.append('\n');
tokWriter.write(sb.toString());
}
public void printParents(int[] parents) throws IOException {
StringBuilder sb = new StringBuilder();
int size = parents.length;
for (int i = 0; i < size - 1; i++) {
sb.append(parents[i]);
sb.append(' ');
}
sb.append(parents[size - 1]);
sb.append('\n');
parentWriter.write(sb.toString());
}
public void printTags(String[] tags) throws IOException {
StringBuilder sb = new StringBuilder();
int size = tags.length;
for (int i = 0; i < size - 1; i++) {
sb.append(tags[i]);
sb.append(' ');
}
sb.append(tags[size - 1]);
sb.append('\n');
tagWriter.write(sb.toString().toLowerCase());
}
public void close() throws IOException {
if (tokWriter != null) tokWriter.close();
parentWriter.close();
tagWriter.close();
}
public static void main(String[] args) throws Exception {
String TAGGER_MODEL = "stanford-tagger/models/english-left3words-distsim.tagger";
Properties props = StringUtils.argsToProperties(args);
if (!props.containsKey("parentpath") || !props.containsKey("tagpath")) {
System.err.println(
"usage: java ConstituencyParse -deps - -tokenize - -tokpath <tokpath> -parentpath <parentpath> -tagpath <tagpath>");
System.exit(1);
}
// whether to tokenize input sentences
boolean tokenize = false;
if (props.containsKey("tokenize")) {
tokenize = true;
}
// whether to produce dependency trees from the constituency parse
boolean deps = false;
if (props.containsKey("deps")) {
deps = true;
}
String tokPath = props.containsKey("tokpath") ? props.getProperty("tokpath") : null;
String parentPath = props.getProperty("parentpath");
String tagPath = props.getProperty("tagpath");
ConstituencyParse processor = new ConstituencyParse(tokPath, parentPath, tagPath, tokenize);
Scanner stdin = new Scanner(System.in);
int count = 0;
long start = System.currentTimeMillis();
while (stdin.hasNextLine()) {
String line = stdin.nextLine();
List<HasWord> tokens = processor.sentenceToTokens(line);
//end tagger
Tree parse = processor.parse(tokens);
// produce parent pointer representation
int[] parents = deps ? processor.depTreeParents(parse, tokens)
: processor.constTreeParents(parse);
String[] tags = processor.constTreePOSTAG(parse);
// print
if (tokPath != null) {
processor.printTokens(tokens);
}
processor.printParents(parents);
processor.printTags(tags);
// print tag
StringBuilder sb = new StringBuilder();
int size = tags.length;
for (int i = 0; i < size - 1; i++) {
sb.append(tags[i]);
sb.append(' ');
}
sb.append(tags[size - 1]);
sb.append('\n');
count++;
if (count % 100 == 0) {
double elapsed = (System.currentTimeMillis() - start) / 1000.0;
System.err.printf("Parsed %d lines (%.2fs)\n", count, elapsed);
}
}
long totalTimeMillis = System.currentTimeMillis() - start;
System.err.printf("Done: %d lines in %.2fs (%.1fms per line)\n",
count, totalTimeMillis / 1000.0, totalTimeMillis / (double) count);
processor.close();
}
}

Feature Detection Opencv/Javacv not working

I am trying to run the JavaCV feature detection program to compare similar features in two images, but I am getting a RuntimeException. Since I am completely new to JavaCV, I don't know how to resolve this.
The exception trace is
OpenCV Error: Assertion failed (queryDescriptors.type() == trainDescCollection[0].type()) in unknown function, file ..\..\..\src\opencv\modules\features2d\src\matchers.cpp, line 351
Exception in thread "main" java.lang.RuntimeException: ..\..\..\src\opencv\modules\features2d\src\matchers.cpp:351: error: (-215) queryDescriptors.type() == trainDescCollection[0].type()
at com.googlecode.javacv.cpp.opencv_features2d$DescriptorMatcher.match(Native Method)
at Ex7DescribingSURF.main(Ex7DescribingSURF.java:63)
Here is the source code
import static com.googlecode.javacv.cpp.opencv_core.NORM_L2;
import static com.googlecode.javacv.cpp.opencv_core.cvCreateImage;
import static com.googlecode.javacv.cpp.opencv_features2d.drawMatches;
import static com.googlecode.javacv.cpp.opencv_highgui.cvLoadImage;
import java.util.Arrays;
import java.util.Comparator;
import javax.swing.JFrame;
import com.googlecode.javacv.CanvasFrame;
import com.googlecode.javacv.cpp.opencv_core.CvMat;
import com.googlecode.javacv.cpp.opencv_core.CvScalar;
import com.googlecode.javacv.cpp.opencv_core.CvSize;
import com.googlecode.javacv.cpp.opencv_core.IplImage;
import com.googlecode.javacv.cpp.opencv_features2d.BFMatcher;
import com.googlecode.javacv.cpp.opencv_features2d.DMatch;
import com.googlecode.javacv.cpp.opencv_features2d.DescriptorExtractor;
import com.googlecode.javacv.cpp.opencv_features2d.DrawMatchesFlags;
import com.googlecode.javacv.cpp.opencv_features2d.KeyPoint;
import com.googlecode.javacv.cpp.opencv_nonfree.SURF;
public class Ex7DescribingSURF {
/**
* Example for section "Describing SURF features" in chapter 8, page 212.
*
* Computes SURF features, extracts their descriptors, and finds best
* matching descriptors between two images of the same object. There are a
* couple of tricky steps, in particular sorting the descriptors.
*/
public static void main(String[] args) {
IplImage img = cvLoadImage("A.jpg");
IplImage template = cvLoadImage("B.jpg");
IplImage images[] = { img, template };
// Setup SURF feature detector and descriptor.
double hessianThreshold = 2500d;
int nOctaves = 4;
int nOctaveLayers = 2;
boolean extended = true;
boolean upright = false;
SURF surf = new SURF(hessianThreshold, nOctaves, nOctaveLayers,
extended, upright);
DescriptorExtractor surfDesc = DescriptorExtractor.create("SURF");
KeyPoint keyPoints[] = { new KeyPoint(), new KeyPoint() };
CvMat descriptors[] = new CvMat[2];
// Detect SURF features and compute descriptors for both images
for (int i = 0; i < 1; i++) {
surf.detect(images[i], null, keyPoints[i]);
// Create CvMat initialized with empty pointer, using simply `new
// CvMat()` leads to an exception.
descriptors[i] = new CvMat(null);
surfDesc.compute(images[i], keyPoints[i], descriptors[i]);
}
// Create feature matcher
BFMatcher matcher = new BFMatcher(NORM_L2, true);
DMatch matches = new DMatch();
// "match" is a keyword in Scala, to avoid conflict between a keyword
// and a method match of the BFMatcher,
// we need to enclose method name in ticks: `match`.
matcher.match(descriptors[0], descriptors[1], matches, null);
System.out.println("Matched: " + matches.capacity());
// Select only 25 best matches
DMatch bestMatches = selectBest(matches, 25);
// Draw best matches
IplImage imageMatches = cvCreateImage(new CvSize(images[0].width()
+ images[1].width(), images[0].height()), images[0].depth(), 3);
drawMatches(images[0], keyPoints[0], images[1], keyPoints[1],
bestMatches, imageMatches, CvScalar.BLUE, CvScalar.RED, null,
DrawMatchesFlags.DEFAULT);
CanvasFrame canvas = new CanvasFrame("");
canvas.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
canvas.showImage(imageMatches);
}
// ----------------------------------------------------------------------------------------------------------------
/** Select only the best matches from the list. Return new list. */
private static DMatch selectBest(DMatch matches, int numberToSelect) {
// Convert to Scala collection for the sake of sorting
int oldPosition = matches.position();
DMatch a[] = new DMatch[matches.capacity()];
for (int i = 0; i < a.length; i++) {
DMatch src = matches.position(i);
DMatch dest = new DMatch();
copy(src, dest);
a[i] = dest;
}
// Reset position explicitly to avoid issues from other uses of this
// position-based container.
matches.position(oldPosition);
// Sort
DMatch aSorted[] = a;
Arrays.sort(aSorted, new DistanceComparator());
// DMatch aSorted[]=sort(a);
// Create new JavaCV list
DMatch best = new DMatch(numberToSelect);
for (int i = 0; i < numberToSelect; i++) {
// Since there is no way to `put` objects into a list DMatch,
// we have to reassign all values individually, and hope that the API
// will not add any new ones.
copy(aSorted[i], best.position(i));
}
// Set position to 0 explicitly to avoid issues from other uses of this
// position-based container.
best.position(0);
return best;
}
private static void copy(DMatch src, DMatch dest) {
// TODO: use Pointer.copy() after JavaCV/JavaCPP 0.3 is released
// (http://code.google.com/p/javacpp/source/detail?r=51f4daa13d618c6bd6a5556ff2096d0e834638cc)
// dest.put(src)
dest.distance(src.distance());
dest.imgIdx(src.imgIdx());
dest.queryIdx(src.queryIdx());
dest.trainIdx(src.trainIdx());
}
static class DistanceComparator implements Comparator<DMatch> {
public int compare(DMatch o1, DMatch o2) {
// Order by distance and return 0 for ties, as the Comparator contract requires
return Float.compare(o1.distance(), o2.distance());
}
}
}
Does anybody know what else I might need to make this work? Any help is appreciated.
As the error clearly says, the descriptor types do not match. You have to check that the descriptor types match before calling the matcher.
A simple if statement before matcher.match would solve your problem:
if (descriptors[0].type() == descriptors[1].type())
{
matcher.match(descriptors[0], descriptors[1], matches, null);
System.out.println("Matched: " + matches.capacity());
}
The CvMat was not initialized properly, which caused the error:
descriptors[i] = new CvMat(null);
Instead, I initialized it like this, which solved the problem:
descriptors[i] = CvMat.create(1, 1);
I don't know if this is still needed, but I found the answer. The problem in the code is this loop:
for (int i = 0; i < 1; i++) {
surf.detect(images[i], null, keyPoints[i]);
// Create CvMat initialized with empty pointer, using simply `new
// CvMat()` leads to an exception.
descriptors[i] = new CvMat(null);
surfDesc.compute(images[i], keyPoints[i], descriptors[i]);
}
i only takes the value 0, then the loop exits, and you try to use descriptors[1], which was never initialized.
Change it to for (int i = 0; i < 2; i++) {

Why do I get an ArrayIndexOutOfBoundsException while rearranging the paragraphs of a .doc file?

Here is a code snippet. It throws an ArrayIndexOutOfBoundsException, and I don't know why.
import java.io.File;
import java.io.FileInputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xslf.usermodel.XSLFTextParagraph;
public class wordcount
{
public static void main(String[] args) throws Exception
{
File file = new File("E:\\myFiles\\abc.doc");
FileInputStream fis=new FileInputStream(file.getAbsolutePath());
HWPFDocument document=new HWPFDocument(fis);
WordExtractor extractor = new WordExtractor(document);
String [] fileData = extractor.getParagraphText();
for (int i = 0; i < fileData.length; i++)
{
// System.out.println(fileData[i].toString());
String[] paraword = fileData[i].toString().split(" ");
// out.println(paraword.length);
if(paraword[i].length() == 0 )
{
System.out.println("\n");
}
else if(paraword[i].length() > 0 && paraword[i].length() < 12)
{
for(int k=0 ; k < paraword[i].length()-1 ; k++)
{
System.out.println(paraword[k].toString());
}
}
else if(paraword[i].length() >= 12 )
{
for(int k=0 ; k < 12 ; k++)
{
System.out.println(paraword[k].toString());
}
}
System.out.println("\n");
}
}
}
This is the image of the abc.doc file
Note: The expected output is printed to the Java console and should contain 12 words per line. However, the error occurs after the first line has been processed.
Any help would be appreciated.
TIA
Honestly, I'm not familiar with the apache.org API, but just by looking at your logic it looks like you want to replace every instance of:
paraword[i].length()
with:
paraword.length
Because it looks like you want to check how many words are in the paragraph and not how long the first word of the paragraph is. Correct me if I'm wrong, but I think that will fix you up.
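To illustrate the difference, here is a minimal sketch that does not depend on POI. The class name IndexDemo and the sample sentence are made up for illustration. paraword.length counts the words in the paragraph, while paraword[i].length() counts the characters of the i-th word and throws as soon as i reaches the word count.

```java
// Hypothetical demo class; not part of the original question or of Apache POI.
class IndexDemo {
    // Number of words in a paragraph: the value the loop bound should be based on.
    static int wordCount(String paragraph) {
        return paragraph.split(" ").length;
    }

    public static void main(String[] args) {
        String[] paraword = "only three words".split(" ");
        System.out.println(paraword.length);      // number of words in the paragraph
        System.out.println(paraword[0].length()); // number of characters in the first word
        try {
            // Index 5 stands in for a paragraph index i that exceeds the word count.
            System.out.println(paraword[5].length());
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out of bounds: " + e.getMessage());
        }
    }
}
```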
Here is the corrected code snippet:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
public class ExtractWordDocument
{
public String myString() throws IOException
{
File file = new File("PATH FOR THE .doc FILE");
FileInputStream fis=new FileInputStream(file.getAbsolutePath());
HWPFDocument document=new HWPFDocument(fis);
WordExtractor extractor = new WordExtractor(document);
String [] fileData = extractor.getParagraphText();
ArrayList<Object> EntireDoc = new ArrayList<>();
for (int i = 0; i < fileData.length; i++)
{
String[] paraword = fileData[i].toString().split("\\s+");
if(paraword.length == 0 )
{EntireDoc.add("\n");}
else if(paraword.length > 0 && paraword.length < 12)
{
for(int k=0 ; k < paraword.length ; k++)
{EntireDoc.add(paraword[k].toString()+" ");}
}
else if(paraword.length >= 12 ) // >= so that paragraphs of exactly 12 words are not dropped
{
java.util.List<String> arrAsList = Arrays.asList(paraword);
String formatedString = arrAsList.toString()
.replace(",", "") //remove the commas
.replace("[", "") //remove the left bracket
.replace("]", ""); //remove the right bracket
StringBuilder sb = new StringBuilder(formatedString);
int i1 = 0;
while ((i1 = sb.indexOf(" ", i1 + 75)) != -1)
{sb.replace(i1, i1 + 1, "\n");}
EntireDoc.add(sb.toString());
}
EntireDoc.add("\n");
}
String formatedString = EntireDoc.toString()
.replace(",", "") //remove the commas
.replace("[", "") //remove the left bracket
.replace("]", ""); //remove the right bracket
return formatedString;
}
public static void main(String[] args)
{
try{
System.out.print(new ExtractWordDocument().myString());
}
catch(IOException ioe){System.out.print(ioe);}
}
}
Note: This code does not print 12 words per line; it breaks lines after roughly 75 characters instead.
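If the original requirement of 12 words per line is what you need, the grouping can be sketched in plain Java, independent of POI. This is only a sketch: the class WordWrap and the method wrap are hypothetical names, and the input is assumed to be one paragraph string such as an element of the array returned by getParagraphText().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the class and method names are illustrative only.
class WordWrap {
    // Splits a paragraph on whitespace and regroups it into lines of at most wordsPerLine words.
    static List<String> wrap(String paragraph, int wordsPerLine) {
        List<String> lines = new ArrayList<>();
        String trimmed = paragraph.trim();
        if (trimmed.isEmpty()) {
            return lines; // an empty paragraph produces no lines
        }
        String[] words = trimmed.split("\\s+");
        StringBuilder current = new StringBuilder();
        for (int k = 0; k < words.length; k++) {
            // Start a new line every wordsPerLine words.
            if (k > 0 && k % wordsPerLine == 0) {
                lines.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(words[k]);
        }
        lines.add(current.toString());
        return lines;
    }

    public static void main(String[] args) {
        String text = "one two three four five six seven eight nine ten eleven twelve thirteen fourteen";
        for (String line : wrap(text, 12)) {
            System.out.println(line);
        }
    }
}
```

Inside the paragraph loop, each fileData[i] could be passed to wrap(fileData[i], 12) and the returned lines printed one by one.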

Problems during counting strings in the txt file

I am developing a program which reads a text file and creates a report. The report contains the following: the number of every line in the file, its "status", and the first few characters of every line. It works well with files up to 100 MB.
But when I run the program with input files which are bigger than 1.5 GB and contain more than 100000 lines, I get the following error:
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOfRange(Unknown Source) at
> java.lang.String.<init>(Unknown Source) at
> java.lang.StringBuffer.toString(Unknown Source) at
> java.io.BufferedReader.readLine(Unknown Source) at
> java.io.BufferedReader.readLine(Unknown Source) at
> org.apache.commons.io.IOUtils.readLines(IOUtils.java:771) at
> org.apache.commons.io.IOUtils.readLines(IOUtils.java:723) at
> org.apache.commons.io.IOUtils.readLines(IOUtils.java:745) at
> org.apache.commons.io.FileUtils.readLines(FileUtils.java:1512) at
> org.apache.commons.io.FileUtils.readLines(FileUtils.java:1528) at
> org.apache.commons.io.ReadFileToListSample.main(ReadFileToListSample.java:43)
I increased the VM arguments to -Xms128m -Xmx1600m (in the Eclipse run configuration), but this did not help. Specialists from the OTN forum advised me to read some books and improve my program's performance. Could anybody help me improve it? Thank you.
code:
import org.apache.commons.io.FileUtils;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.PrintStream;
import java.util.List;
public class ReadFileToList {
public static void main(String[] args) throws FileNotFoundException
{
File file_out = new File ("D:\\Docs\\test_out.txt");
FileOutputStream fos = new FileOutputStream(file_out);
PrintStream ps = new PrintStream (fos);
System.setOut (ps);
// Create a file object
File file = new File("D:\\Docs\\test_in.txt");
FileReader fr = null;
LineNumberReader lnr = null;
try {
// Here we read a file, sample.txt, using FileUtils
// class of commons-io. Using FileUtils.readLines()
// we can read file content line by line and return
// the result as a List of string.
List<String> contents = FileUtils.readLines(file);
//
// Iterate the result to print each line of the file.
fr = new FileReader(file);
lnr = new LineNumberReader(fr);
for (String line : contents)
{
String begin_line = line.substring(0, 38); // return 38 chars from the string
String begin_line_without_null = begin_line.replace("\u0000", " ");
String begin_line_without_null_spaces = begin_line_without_null.replaceAll(" +", " ");
int stringlenght = line.length();
line = lnr.readLine();
int line_num = lnr.getLineNumber();
String status;
// some correct length for if
int c_u_length_f = 12;
int c_ea_length_f = 13;
int c_a_length_f = 2130;
int c_u_length_e = 3430;
int c_ea_length_e = 1331;
int c_a_length_e = 442;
int h_ext = 6;
int t_ext = 6;
if ( stringlenght == c_u_length_f ||
stringlenght == c_ea_length_f ||
stringlenght == c_a_length_f ||
stringlenght == c_u_length_e ||
stringlenght == c_ea_length_e ||
stringlenght == c_a_length_e ||
stringlenght == h_ext ||
stringlenght == t_ext)
status = "ok";
else status = "fail";
System.out.println(+ line_num + stringlenght + status + begin_line_without_null_spaces);
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
The specialists from OTN also said that this program opens the input and reads it twice. Maybe there is a mistake in the for statement? But I can't find it.
Thank you.
You're declaring variables inside the loop and doing a lot of unneeded work, including reading the file twice, which is not good for performance either. FileUtils.readLines also loads the whole file into a List in memory, which is where the OutOfMemoryError in your stack trace comes from. You can use the LineNumberReader to get both the line number and the text, and reuse the line variable (declared outside the loop). Here's a shortened version that does what you need. You'll need to complete the validLength method to check all the values, since I included only the first few tests.
import java.io.*;
public class TestFile {
//a method to determine if the length is valid implemented outside the method that does the reading
private static String validLength(int length) {
if (length == 12 || length == 13 || length == 2130) //you can finish it
return "ok";
return "fail";
}
public static void main(String[] args) {
try {
LineNumberReader lnr = new LineNumberReader(new FileReader(args[0]));
BufferedWriter out = new BufferedWriter(new FileWriter(args[1]));
String line;
int length;
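while (null != (line = lnr.readLine())) {
length = line.length();
// Keep at most the first 38 characters; substring(0,38) throws on shorter lines
line = line.substring(0, Math.min(38, length));
line = line.replace("\u0000", " ");
// Collapse runs of spaces, as replaceAll(" +", " ") does in the question
line = line.replaceAll(" +", " ");
// Separate the fields with spaces so the two ints are not added together
out.write( lnr.getLineNumber() + " " + length + " " + validLength(length) + " " + line);
out.newLine();
}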
out.close();
}
catch (Exception e) {
e.printStackTrace();
}
}
}
Call this as java TestFile D:\Docs\test_in.txt D:\Docs\test_out.txt (the output file must differ from the input file, otherwise the input is overwritten), or replace args[0] and args[1] with the file names if you want to hard-code them.
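The same streaming idea can also be sketched with try-with-resources and a set of valid lengths. This is only a sketch under assumptions: the class LineReport and its methods are hypothetical names, the valid lengths are copied from the constants in the question, and the space-separated output format is a guess at what the report should look like. Only one line is held in memory at a time, so heap use depends on the longest line rather than on the file size.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class; the names are illustrative only.
class LineReport {
    // Lengths treated as valid, copied from the constants in the question.
    static final Set<Integer> VALID =
            new HashSet<>(Arrays.asList(12, 13, 2130, 3430, 1331, 442, 6));

    // Reads one line at a time and writes: line number, length, status, first 38 characters.
    static void report(Reader in, Writer out) throws IOException {
        try (LineNumberReader lnr = new LineNumberReader(in)) {
            String line;
            while ((line = lnr.readLine()) != null) {
                int length = line.length();
                String begin = line.substring(0, Math.min(38, length))
                        .replace('\u0000', ' ')   // NUL characters become spaces
                        .replaceAll(" +", " ");   // runs of spaces collapse to one
                String status = VALID.contains(length) ? "ok" : "fail";
                out.write(lnr.getLineNumber() + " " + length + " " + status + " " + begin + "\n");
            }
        }
        out.flush();
    }

    // Convenience wrapper for in-memory input, used by main below.
    static String reportToString(String input) {
        StringWriter out = new StringWriter();
        try {
            report(new StringReader(input), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(reportToString("abcdef\nhello world\n"));
    }
}
```

For real files, report could be called with new FileReader(inputPath) and new BufferedWriter(new FileWriter(outputPath)), closing the writer afterwards.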

Resources