How to get XML parse of a tree to a string? - stanford-nlp

Tree tree = sentence.get(TreeAnnotation.class);
I want to get an XML representation of the parse tree into a string. How do I do it?

The string s will contain the XML output of the parse tree:
import java.io.StringWriter;
import java.io.PrintWriter;

StringWriter stringWriter = new StringWriter();
PrintWriter writer = new PrintWriter(stringWriter);
tree.indentedXMLPrint(writer, false);
writer.flush();  // make sure everything reaches the StringWriter
String s = stringWriter.toString();
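For completeness, here is a minimal end-to-end sketch of where the tree comes from. The pipeline setup (tokenize, ssplit, pos, parse annotators) and the sample sentence are my assumptions, not from the question:

import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Properties;

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, parse");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation annotation = new Annotation("The quick brown fox jumps over the lazy dog.");
pipeline.annotate(annotation);
for (CoreMap sentence : annotation.get(SentencesAnnotation.class)) {
    Tree tree = sentence.get(TreeAnnotation.class);
    StringWriter stringWriter = new StringWriter();
    PrintWriter writer = new PrintWriter(stringWriter);
    tree.indentedXMLPrint(writer, false);  // same call as in the answer above
    writer.flush();
    System.out.println(stringWriter.toString());  // the XML for this sentence's tree
}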

Related

Jackson ObjectWriter only writes first entry from stream

I want to create a Spring Boot controller which creates a CSV file using data from a stream. I use Jackson CSV (jackson-dataformat-csv 2.12.1) to write the data stream from the DB to a StreamingResponseBody.
To keep it simple, I replaced the actual data from the DB with a list containing 1, 2, 3. I want a CSV file which looks like this:
1
2
3
But it only contains the first entry (1). Can someone help me to identify the problem?
Please note that I don't want to create the file somewhere on the server; I want to stream the content directly to the user.
My code looks like this:
import com.fasterxml.jackson.dataformat.csv.CsvMapper
import org.springframework.http.MediaType
import org.springframework.http.ResponseEntity
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.web.bind.annotation.RequestMapping
import org.springframework.web.bind.annotation.RestController
import org.springframework.web.servlet.mvc.method.annotation.StreamingResponseBody
import javax.servlet.http.HttpServletResponse
@RestController
@RequestMapping(value = ["/test"], produces = [MediaType.TEXT_HTML_VALUE])
class TestController {

    @GetMapping("/download", produces = [MediaType.TEXT_EVENT_STREAM_VALUE])
    fun download(response: HttpServletResponse): ResponseEntity<StreamingResponseBody>? {
        response.setHeader("Content-Disposition", "attachment;filename=download.csv")
        response.status = HttpServletResponse.SC_OK
        val mapper = CsvMapper()
        val schema = mapper.schemaFor(Int::class.java)
        val writer = mapper.writer(schema)
        val input = listOf(1, 2, 3).stream()
        val stream = StreamingResponseBody { outputStream ->
            input.forEach { entity ->
                writer.writeValue(outputStream, entity)
            }
        }
        return ResponseEntity
            .ok()
            .header("Content-Disposition", "attachment;filename=download.csv")
            .body(stream)
    }
}
With the help of Andriy's comment I was able to find the cause and the solution. Jackson closes the stream when it's finished writing to it, see: ObjectMapper._writeValueAndClose.
To change this behavior you have to set JsonGenerator.Feature.AUTO_CLOSE_TARGET to false like this:
val jsonFactory = CsvFactory().configure(JsonGenerator.Feature.AUTO_CLOSE_TARGET, false)
val mapper = ObjectMapper(jsonFactory)
val writer = mapper.writer(CsvMapper().schemaFor(Int::class.java))
Note: There is no AUTO_CLOSE_TARGET option for the CsvGenerator, but using the JsonGenerator setting also works for the CsvFactory.
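For what it's worth, a plain-Java sketch of the same configuration (the servlet plumbing is omitted; outputStream stands in for the StreamingResponseBody's stream, and the integer list mirrors the question's sample data):

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.ObjectWriter;
import com.fasterxml.jackson.dataformat.csv.CsvFactory;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import java.util.List;

CsvFactory factory = new CsvFactory();
// Keep the target stream open across repeated writeValue() calls.
factory.configure(JsonGenerator.Feature.AUTO_CLOSE_TARGET, false);
CsvMapper mapper = new CsvMapper(factory);
ObjectWriter writer = mapper.writer(mapper.schemaFor(Integer.class));
for (Integer value : List.of(1, 2, 3)) {
    writer.writeValue(outputStream, value);  // one CSV row per call
}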
I didn't see a column separator assigned while writing to the stream in your code. Can you try the following:
CsvMapper mapper = new CsvMapper();
CsvSchema schema = mapper.schemaFor(Dummy.class);  // Dummy: your row type
schema = schema.withColumnSeparator('\t');
There is a better alternative than setting AUTO_CLOSE_TARGET to false, which is to use SequenceWriter.
val stream = StreamingResponseBody { outputStream ->
    val mapper = CsvMapper()
    val sequenceWriter = mapper.writer(mapper.schemaFor(Int::class.java).withHeader())
        .writeValues(outputStream)
    input.forEach { entity ->
        sequenceWriter.write(entity)
    }
}
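One caveat, as far as I can tell: you may want to close the SequenceWriter after the loop (sequenceWriter.close()) so any buffered output is flushed before the response body completes; closing it also respects the AUTO_CLOSE_TARGET setting on the underlying factory.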

How to use StanfordNLP Chinese segmentor in Java?

I have tried the following code; however, it does not work and only outputs null.
String text = "我爱北京天安门。";
StanfordCoreNLP pipeline = new StanfordCoreNLP();
Annotation annotation = pipeline.process(text);
String result = annotation.get(CoreAnnotations.ChineseSegAnnotation.class);
System.out.println(result);
The result:
...
done [0.6 sec].
Using mention detector type: rule
null
How to use StanfordNLP Chinese segmentor correctly?
Some sample code:
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.util.StringUtils;

import java.util.*;

public class ChineseSegmenter {

    public static void main(String[] args) {
        // set the properties to the standard Chinese pipeline properties
        Properties props = StringUtils.argsToProperties("-props", "StanfordCoreNLP-chinese.properties");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String text = "...";
        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation);
        List<CoreLabel> tokens = annotation.get(CoreAnnotations.TokensAnnotation.class);
        for (CoreLabel token : tokens) {
            System.out.println(token);
        }
    }
}
Note: Make sure the Chinese models jar is on your CLASSPATH. That file is available here: http://stanfordnlp.github.io/CoreNLP/download.html
The above code should print out the tokens created after the Chinese segmenter is run.
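If you just want the segmented text rather than full CoreLabel dumps, a small follow-on sketch (the space-joining is my choice, not part of the original answer):

List<CoreLabel> tokens = annotation.get(CoreAnnotations.TokensAnnotation.class);
StringBuilder segmented = new StringBuilder();
for (CoreLabel token : tokens) {
    if (segmented.length() > 0) {
        segmented.append(' ');  // separate the segments with spaces
    }
    segmented.append(token.word());
}
System.out.println(segmented.toString());  // the input sentence, one space per segment boundary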

How to split the result of PTBTokenizer into sentences?

I know I could use DocumentPreprocessor to split a text into sentences, but it does not provide enough information if one wants to convert the tokenized text back to the original text. So I have to use PTBTokenizer, which has an invertible option.
However, PTBTokenizer simply returns an iterator of all the tokens (CoreLabels) in a document. It does not split the document into sentences.
The documentation says:
The output of PTBTokenizer can be post-processed to divide a text into sentences.
But this is obviously not trivial.
Is there a class in the Stanford NLP library that can take as input a sequence of CoreLabels, and output sentences? Here's what I mean exactly:
List<List<CoreLabel>> split(List<CoreLabel> documentTokens);
I would suggest you use the StanfordCoreNLP class. Here is some sample code:
import java.io.*;
import java.util.*;
import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.ling.CoreAnnotations.*;
import edu.stanford.nlp.util.*;

public class PipelineExample {

    public static void main(String[] args) throws IOException {
        // build pipeline
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String text = " I am a sentence. I am another sentence.";
        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation);
        System.out.println(annotation.get(TextAnnotation.class));
        List<CoreMap> sentences = annotation.get(SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            System.out.println(sentence.get(TokensAnnotation.class));
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                System.out.println(token.after() != null);
                System.out.println(token.before() != null);
                System.out.println(token.beginPosition());
                System.out.println(token.endPosition());
            }
        }
    }
}
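Since each token carries character offsets into the original string (as the loop above prints), one way to get back the exact original text of each sentence is to slice the source string. A minimal sketch along those lines, reusing the text and sentences variables from the example:

for (CoreMap sentence : sentences) {
    List<CoreLabel> sentenceTokens = sentence.get(TokensAnnotation.class);
    int begin = sentenceTokens.get(0).beginPosition();
    int end = sentenceTokens.get(sentenceTokens.size() - 1).endPosition();
    System.out.println(text.substring(begin, end));  // the sentence exactly as written
}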

Hbase: Need suitable jar files for cloudera-quickstart-vm-5.4.2-0

I am trying to load data from a flat file into HBase through the API, but I am getting the following error:
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.parseInt(Integer.java:527)
at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:63)
at org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:63)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:354)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:94)
at Hbase.readFromFile.main(readFromFile.java:16)
Code :
package Hbase;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class readFromFile {
    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            Configuration conf = HBaseConfiguration.create(new Configuration());
            HBaseAdmin hba = new HBaseAdmin(conf);
            if (!hba.tableExists(args[0])) {
                // create the table with its column families
                HTableDescriptor ht = new HTableDescriptor(args[0]);
                ht.addFamily(new HColumnDescriptor("sample"));
                ht.addFamily(new HColumnDescriptor("region"));
                ht.addFamily(new HColumnDescriptor("time"));
                ht.addFamily(new HColumnDescriptor("product"));
                ht.addFamily(new HColumnDescriptor("sale"));
                ht.addFamily(new HColumnDescriptor("profit"));
                hba.createTable(ht);
                System.out.println("New Table Created");

                HTable table = new HTable(conf, args[0]);
                File f = new File("/home/training/Desktop/data");
                BufferedReader br = new BufferedReader(new FileReader(f));
                String line = br.readLine();
                int i = 1;
                String rowname = "row";
                while (line != null && line.length() != 0) {
                    System.out.println("Ok till here");
                    StringTokenizer tokens = new StringTokenizer(line, ",");
                    rowname = "row" + i;
                    Put p = new Put(Bytes.toBytes(rowname));
                    p.add(Bytes.toBytes("sample"), Bytes.toBytes("sampleNo."),
                            Bytes.toBytes(Integer.parseInt(tokens.nextToken())));
                    p.add(Bytes.toBytes("region"), Bytes.toBytes("country"),
                            Bytes.toBytes(tokens.nextToken()));
                    p.add(Bytes.toBytes("region"), Bytes.toBytes("state"),
                            Bytes.toBytes(tokens.nextToken()));
                    p.add(Bytes.toBytes("region"), Bytes.toBytes("city"),
                            Bytes.toBytes(tokens.nextToken()));
                    p.add(Bytes.toBytes("time"), Bytes.toBytes("year"),
                            Bytes.toBytes(Integer.parseInt(tokens.nextToken())));
                    p.add(Bytes.toBytes("time"), Bytes.toBytes("month"),
                            Bytes.toBytes(tokens.nextToken()));
                    p.add(Bytes.toBytes("product"), Bytes.toBytes("productNo."),
                            Bytes.toBytes(tokens.nextToken()));
                    p.add(Bytes.toBytes("sale"), Bytes.toBytes("quantity"),
                            Bytes.toBytes(Integer.parseInt(tokens.nextToken())));
                    p.add(Bytes.toBytes("profit"), Bytes.toBytes("earnings"),
                            Bytes.toBytes(tokens.nextToken()));
                    i++;
                    table.put(p);
                    line = br.readLine();
                }
                br.close();
                table.close();
            } else {
                System.out.println("Table already exists. Please enter another table name");
            }
        } else {
            System.out.println("Please enter the table name through the command line");
        }
    }
}
Please let me know whether I need to add any suitable jars. I am using cloudera-quickstart-vm-5.4.2-0.
Thanks,
VJ
If you read the error, it says that the Integer.parseInt method raised a NumberFormatException. This means that you attempted to convert a String of invalid format into an Integer. In your code, you call that method in this line:
Bytes.toBytes(Integer.parseInt(tokens.nextToken())));
You need to look at the tokens you're passing into this method via tokens.nextToken() and ensure that each can be converted to an Integer.
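For example, a defensive rewrite of one of those calls might look like this (a sketch; the skip-and-log behavior is my suggestion, not from the original code):

String token = tokens.nextToken();
try {
    p.add(Bytes.toBytes("sample"), Bytes.toBytes("sampleNo."),
            Bytes.toBytes(Integer.parseInt(token)));
} catch (NumberFormatException e) {
    // Log the offending value so you can see which input line breaks parsing.
    System.out.println("Non-numeric token in input line: " + token);
}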
I think the problem is with the Cloudera jar versions used. Please check that the jar versions match your CDH distribution; that should work.

How to Convert gson to LinkedHashMap<String, List<String>>?

I'm new to Gson and I wonder how to convert JSON data to LinkedHashMap<String, List<String>>.
My JSON data is shown below:
{ "data":
{
"data1": ["asdf", "qwer"],
"data2": ["xczv", "aweqrfds123", "sfdgq234"],
"data3": ["dsafasd", "xcvr123", "sdfa324123"]
}
}
The field names under data are dynamic, so I want to convert the JSON under data to a LinkedHashMap<String, List<String>>.
How can I do that?
You can use TypeToken to convert it into the expected type with Gson#fromJson(Reader, Type).
As per the JSON string, the matching type is LinkedHashMap<String, LinkedHashMap<String, ArrayList<String>>>.
Sample code:
BufferedReader reader = new BufferedReader(new FileReader(new File("json.txt")));
Type type = new TypeToken<LinkedHashMap<String, LinkedHashMap<String, ArrayList<String>>>>() {}.getType();
LinkedHashMap<String, LinkedHashMap<String, ArrayList<String>>> data = new Gson().fromJson(reader, type);
LinkedHashMap<String, ArrayList<String>> innerMap = data.get("data");
System.out.println(new GsonBuilder().setPrettyPrinting().create().toJson(innerMap));
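From there, each dynamic key maps directly to its list, for example (data1 is just one of the keys from the sample JSON):

ArrayList<String> values = innerMap.get("data1");
System.out.println(values);  // prints [asdf, qwer]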
This is not how it works in the Gson world - you can't convert JSON to any Java class you want unless you do all of the parsing manually. The common approach works as described below:
1. Create a Java class which matches your JSON format, e.g. using the Java class generator described here: http://jsongen.byingtondesign.com/
2. Use GsonBuilder to read your JSON from a file and map it onto the generated class.
I've used that approach and the Java file that has been generated (after I've fixed a minor syntax error in your initial JSON) looks like this:
package com.json;

import java.util.List;

public class Data {

    private List data1;
    private List data2;
    private List data3;

    public List getData1() {
        return this.data1;
    }

    public void setData1(List data1) {
        this.data1 = data1;
    }

    public List getData2() {
        return this.data2;
    }

    public void setData2(List data2) {
        this.data2 = data2;
    }

    public List getData3() {
        return this.data3;
    }

    public void setData3(List data3) {
        this.data3 = data3;
    }
}
To start working with the newly created class you can use the template below:
Reader is = new InputStreamReader(new FileInputStream(new File("<path-to-json>")), "UTF-8");
Gson gson = new GsonBuilder().create();
Data d = gson.fromJson(is, Data.class);
// Start using your d instance here
