How to use STANFORD PARSER from GATE - stanford-nlp

How can I use the Stanford Parser from GATE Embedded (i.e. using GATE through Java code)? I currently use GATE_Developer_7.0 on my machine; I know there is a plugin for the Stanford Parser in GATE, but I don't know how to use it from Java code.
Thanks

The usual approach we always recommend for GATE Embedded is to build up your pipeline using GATE Developer, test it out and get it debugged by processing sample documents in the GUI. Once you're happy with the application, use "save application state" or "export for GATECloud.net" to produce a saved state that you can then load in your embedded code using the PersistenceManager. This will automatically ensure that all the necessary plugins are loaded and is generally much simpler and less error-prone than trying to build up your pipeline by hand in your code.
The BatchProcessApp example on the GATE website shows how to load a saved application with the PersistenceManager; essentially it's:
Gate.init(); // always the first thing you do
CorpusController controller = (CorpusController) PersistenceManager
    .loadObjectFromFile(new File("/path/to/application.xgapp"));
Corpus corpus = Factory.newCorpus("myCorpus");
controller.setCorpus(corpus);
then for each document you want to process
Document doc = Factory.newDocument(....);
corpus.add(doc);
try {
  controller.execute();
  // code here to do stuff with the annotated document, e.g. extract
  // annotations/features
} finally {
  corpus.clear();
  Factory.deleteResource(doc);
}
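As for the "do stuff" part, the annotations produced by the pipeline can be read from the document's annotation sets. Here is a rough sketch (the annotation type "Person" and the use of the default annotation set are assumptions for illustration; adjust to whatever annotation set and types your saved application actually produces):

```java
// Inside the try block, after controller.execute():
AnnotationSet defaultSet = doc.getAnnotations();   // the default annotation set
for (Annotation ann : defaultSet.get("Person")) {  // "Person" is just an example type
  long start = ann.getStartNode().getOffset();
  long end = ann.getEndNode().getOffset();
  // gate.Utils.stringFor returns the document text covered by the annotation
  System.out.println(ann.getType() + " [" + start + "," + end + "): "
      + gate.Utils.stringFor(doc, ann));
}
```

If your pipeline writes to a named annotation set instead, use doc.getAnnotations("SetName") rather than the default set.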

Related

Empty output when reproducing Chinese coreference results on Conll-2012 using CoreNLP Neural System

Following the instructions on this page https://stanfordnlp.github.io/CoreNLP/coref.html#running-on-conll-2012, here's the code I used when I tried to reproduce the Chinese coreference results on CoNLL-2012:
public class TestCoref {
  public static void main(String[] args) throws Exception {
    Properties props = StringUtils.argsToProperties(args);
    props.setProperty("props", "edu/stanford/nlp/coref/properties/neural-chinese-conll.properties");
    props.setProperty("coref.data", "path-to/data/conll-2012");
    props.setProperty("coref.conllOutputPath", "path-to-output/conll-results");
    props.setProperty("coref.scorer", "path-to/reference-coreference-scorers/v8.01/scorer.pl");
    CorefSystem coref = new CorefSystem(props);
    coref.runOnConll(props);
  }
}
As output, I got 3 files like these:
date-time.coref.predicted.txt
date-time.coref.gold.txt
date-time.predicted.txt
but all of them are EMPTY!
I got my "conll-2012" data as follows:
First I downloaded the train/dev/test-key data from this page http://conll.cemantix.org/2012/data.html, as well as ontonotes-release-5.0 from the LDC. Then I ran the skeleton2conll.sh script provided with the official CoNLL-2012 data, which produced the _conll files.
The model I used was downloaded from http://nlp.stanford.edu/software/stanford-chinese-corenlp-models-current.jar
When I tried to find the problem, I noticed that there exists a function "annotate" in the class CorefSystem which seems to do the real job, but it is not used at all. https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/coref/CorefSystem.java
I wonder if there is a bug in the runOnConll function that prevents it from reading or annotating anything, or whether there is another way to reproduce the coreference results?
PS:
I especially want to produce results on conversational data like "tc" and "bc" in CoNLL-2012. I find that using the coreference API, I can only parse plain textual data. Is there any other way to use the neural coref system on conversational data (where different speakers should be indicated), apart from running on CoNLL-2012?
thanks in advance for help!
As a start, why don't you run this command from the command line:
java -Xmx10g -cp stanford-corenlp-3.9.1.jar:stanford-chinese-corenlp-models-3.9.1.jar:* edu.stanford.nlp.coref.CorefSystem -props edu/stanford/nlp/coref/properties/neural-chinese-conll.properties -coref.data <path-to-conll-data> -coref.conllOutputPath <where-to-save-system-output> -coref.scorer <path-to-scoring-script>

Can a build template be based on another build template on TeamCity?

I am using TeamCity 9.0.2, and I would like to make a template implement another template, or make a build configuration implement more than one template.
Can this be achieved?
This was not available when you asked the question, but since TeamCity 10 you can use Kotlin to configure your builds, and thus your templates.
From this you can make Templates implement other Templates.
I myself have made templates inherit from other templates to cut down on reconfiguration time and to avoid repeating myself so many times.
open class TheBaseTemplate(uuidIn: String, extIdIn: String, nameIn: String, additionalSettings: Template.() -> Unit) : Template({
    uuid = uuidIn
    extId = extIdIn
    name = nameIn
    /* all the other settings that are the same for the derived templates */
    additionalSettings()
})

object DerivedTemplateA : TheBaseTemplate("myUuidA", "myExtIdA", "myNameA", {
    params {
        param("set this", "to this")
    }
})

object DerivedTemplateB : TheBaseTemplate("myUuidB", "myExtIdB", "myNameB", {
    params {
        param("set this", "to that")
    }
})

object Project : Project({
    uuid = "project uuid"
    extId = "project extid"
    name = "project name"
    buildType {
        template(DerivedTemplateA)
        /* the uuid, extId and name are set here */
    }
    buildType {
        template(DerivedTemplateB)
        /* the uuid, extId and name are set here */
    }
    template(DerivedTemplateA)
    template(DerivedTemplateB)
})
The above code might be very hard to understand. It will take some time to familiarise yourself with Kotlin, what it does, and how it interfaces with TeamCity. I should point out that some imports are missing.
Additionally, take the example with a pinch of salt. It is a quick example to demonstrate one way of templates implementing other templates. Do not take this example as the definitive way to do things.
Unfortunately, this is currently not possible, but it has been a long-standing feature request; see TW-12153 (maybe you would like to vote for it).
To share several build steps among several build configurations or build configuration templates, I am using meta runners:
A Meta-Runner allows you to extract build steps, requirements and parameters from a build configuration and create a build runner out of them.
This build runner can then be used as any other build runner in a build step of any other build configuration or template.
Although using meta runners works as a workaround for us, editing meta runners is not as convenient as editing a build configuration template (as it usually requires editing the meta runner definition XML file by hand).
Update 2021
As @zosal points out in his answer, TeamCity meanwhile provides another way of sharing common build configuration data or logic by means of the Kotlin DSL. The Kotlin DSL is a very powerful tool but may not always fit your specific scenario. I would recommend at least giving it a try or watching one of the introductory tutorial videos.

Model evaluation in Stanford NER

I'm doing a project with the NER module from Stanford CoreNLP and I'm currently having some issues with the evaluation of the model.
I'm using the API to call the functionality from inside a java program instead of using the command line arguments and so far I've managed to train the model from several training files (in a tab-separated format; 2 columns with token and annotation/answer) and to serialize it to a file which was pretty easy.
Now I'm trying to evaluate the model I've trained on some test files (precision, recall, f1) and I'm kinda stuck there. First of all, what format should the test files be in? I'm assuming they should be the same as the training files (tab-separated) which would be the logical thing. I've looked through the JavaDoc documentation for information on how to use the classify method and also had a look at the NERDemo.java. I've managed to get the classifyToString method to work but that doesn't really help me with the evaluation. I've found the classifyAndWriteAnswers(String testFile, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) method that I assume would give me the precision and recall scores if I set outputScores to true.
However, I can't manage to get this to work. Which DocumentReaderAndWriter should I use as the second argument?
This is what I've got right now:
public static void evaluate(CRFClassifier classifier, File testFile) {
    try {
        classifier.classifyAndWriteAnswers(testFile.getPath(), new PlainTextDocumentReaderAndWriter(), true);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
This is what I get:
Unchecked call to 'classifyAndWriteAnswers(String, DocumentReaderAndWriter<IN>, boolean)' as a member of raw type 'edu.stanford.nlp.ie.AbstractSequenceClassifier'
Also, do I pass the path to the test file as the first argument or rather the file itself loaded into a String? Some help would be greatly appreciated.
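For what it's worth, one approach that may resolve both issues is to let the classifier build the reader/writer itself. This is a sketch, not verified against your setup: AbstractSequenceClassifier exposes makeReaderAndWriter(), which constructs a DocumentReaderAndWriter from the same properties (e.g. the column map) the classifier was trained with, so a tab-separated test file is parsed the same way as the training data. The first argument is the path to the file, not its contents, and parameterizing the classifier type removes the raw-type warning:

```java
public static void evaluate(CRFClassifier<CoreLabel> classifier, File testFile) {
    try {
        // makeReaderAndWriter() reuses the reader settings the classifier was
        // trained with, so the tab-separated test file is read consistently.
        // Passing true for outputScores prints precision/recall/F1 per class.
        classifier.classifyAndWriteAnswers(testFile.getPath(),
                classifier.makeReaderAndWriter(), true);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
```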

How to generate hash(SHA1) using beanshell in JMeter?

How do I generate a hash (SHA1) using Beanshell in JMeter, in order to sign up to an application?
I haven't been able to find a substantial answer on the net yet.
Generating a hash is pretty easy; just use the DigestUtils class from the Apache Commons Codec library (it's part of JMeter, so you won't need to install anything external).
Something like:
import org.apache.commons.codec.digest.DigestUtils;
String foo = "bar";
String sha1Hex = DigestUtils.sha1Hex(foo);
Usually SHA1 is being required for signing requests to OAuth-protected applications, if it is your case, I believe How to Run Performance Tests on OAuth Secured Apps with JMeter will be extremely helpful.
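If you ever need the same digest without Commons Codec (for instance in a standalone utility), the JDK's built-in java.security.MessageDigest produces an identical result. A minimal sketch, where the class and method names are just illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class Sha1Demo {
    // Compute the SHA-1 digest of a string and return it as lowercase hex,
    // equivalent to DigestUtils.sha1Hex(input).
    static String sha1Hex(String input) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sha1Hex("bar")); // 40 lowercase hex characters
    }
}
```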
There's a new JMeter function, __digest, currently available in nightly builds, which can be used to hash strings.
In your case to save in sha1Value variable the result of myVar variable use the following:
${__digest(SHA-1,${myVar},,,sha1Value)}
The 4th parameter controls upper-casing: pass true to automatically upper-case the result.

Motorola MC65 - EMDK .NET 2.6 - E_SCN_READTIMEOUT using ScanWait()

I'm looking to integrate the Barcode2 class from the EMDK 2.6 library into our existing barcode scanning interface.
I've wired the example code up to our interface method StartScan() and always get E_SCN_READTIMEOUT as the result, even though the code seems to be responding to the scan (the breakpoint at if (scan.Result == Results.SUCCESS) is hit in response to the scan).
public void StartScan()
{
    if (!barcode.IsScanPending)
    {
        ScanData scan = barcode.ScanWait(2000); // 2 second timeout
        if (scan.Result == Results.SUCCESS)
        {
            if (scan.IsText)
            {
                textbox1.Text = scan.Text;
            }
        }
    }
}
The result is always E_SCN_READTIMEOUT. I suspect this may be a conflict with DataWedge 3.4 running on the device, but the scanner and trigger functionality seem to depend on it.
Getting barcode scans to the clipboard using DataWedge is not an option for us. Is there a way to get the library to function despite DataWedge (assuming that is what is causing the read timeouts)?
The DataWedge application did need to be disabled (this can be done programmatically via the DataWedge API from Motorola; thanks Abdel for the hint here!).
https://docs.symbol.com/ReleaseNotes/Release%20Notes%20-%20DataWedge_3.3.htm
A little background on our Windows Mobile application for reference: we have a hardware singleton that contains interfaces for all hardware components and loads related types and assemblies via reflection. If we referenced the types directly, the code above worked.
The end solution ended up being to use the Symbol.Barcode library instead of Symbol.Barcode2.