Where can I find the Stanford NLP dependency manual? Is it available online?
The original manual can be found here: http://nlp.stanford.edu/software/dependencies_manual.pdf
The general website for the parser is: https://nlp.stanford.edu/software/lex-parser.html
A more specific page about the Neural Network Dependency Parser is: https://nlp.stanford.edu/software/nndep.html
In the latest CoreNLP release, Stanford Dependencies have been replaced by Universal Dependencies as a default model. You can find documentation for this annotation schema on the Universal Dependencies site.
I downloaded Stanford CoreNLP from the official website and GitHub.
In the guides, it is stated
On the Stanford NLP machines, training data is available in
/u/nlp/data/depparser/nn/data
or
HERE
The list of models currently distributed is:
edu/stanford/nlp/models/parser/nndep/english_UD.gz (default, English,
Universal Dependencies)
It may sound like a silly question, but I cannot find such files and folders in any distribution.
Where can I find the source data and models officially distributed with Stanford CoreNLP?
We don't distribute most of the CoreNLP training data. Quite a lot of it is non-free, licensed data produced by other people (such as LDC https://www.ldc.upenn.edu/).
However, a huge number of free dependency treebanks are available through the Universal Dependencies project: https://universaldependencies.org/.
All the Stanford CoreNLP models are available in the "models" jar files. edu/stanford/nlp/models/parser/nndep/english_UD.gz is in stanford-corenlp-3.9.2-models.jar, which is included in the zip file download http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip and can also be found on Maven here: http://central.maven.org/maven2/edu/stanford/nlp/stanford-parser/3.9.2/.
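For example, once that models jar is on your classpath, you can load the model by the classpath-internal path quoted above. This is only a minimal sketch using the neural dependency parser's documented loadFromModelFile entry point; the class name LoadNndepModel is just illustrative:

import edu.stanford.nlp.parser.nndep.DependencyParser;

public class LoadNndepModel {
    public static void main(String[] args) {
        // english_UD.gz lives inside stanford-corenlp-3.9.2-models.jar, so this
        // classpath-style path resolves once that jar is on the classpath.
        DependencyParser parser = DependencyParser.loadFromModelFile(
                "edu/stanford/nlp/models/parser/nndep/english_UD.gz");
        System.out.println("Loaded nndep model, ready to parse tagged sentences.");
    }
}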
We're planning on adding Fortran language support to SonarQube by creating a SonarQube Fortran plugin.
We already have an existing ANTLR grammar and generated parser for the Fortran language. Can we use this as the plugin's parser and build a rule engine on top of the AST generated by ANTLR? The officially supported solution seems to be to use SonarSource's SSLR for the parsing. I could find some older projects (Delphi support and Checkstyle) built on top of an ANTLR grammar, but those are both deprecated, so it got me wondering whether using ANTLR is still supported.
Just making sure before we start planning this in more detail.
You don't need to use SSLR to create a language plugin for SonarQube: the SonarQube APIs are independent of any parsing technology.
The SonarQube CheckStyle plugin is still supported, even if a good number of its rules were rewritten in the SonarQube Java plugin.
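As a rough illustration of that independence, a sensor can simply call your ANTLR-generated parser itself. This is only a sketch: FortranLexer, FortranParser, and the "program" start rule stand in for whatever your own grammar actually generates, and the language key "fortran" is assumed to match your plugin's language definition.

import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.sonar.api.batch.fs.InputFile;
import org.sonar.api.batch.sensor.Sensor;
import org.sonar.api.batch.sensor.SensorContext;
import org.sonar.api.batch.sensor.SensorDescriptor;

public class FortranSensor implements Sensor {

    @Override
    public void describe(SensorDescriptor descriptor) {
        descriptor.name("Fortran sensor (ANTLR-based)");
    }

    @Override
    public void execute(SensorContext context) {
        for (InputFile file : context.fileSystem().inputFiles(
                context.fileSystem().predicates().hasLanguage("fortran"))) {
            try {
                // FortranLexer/FortranParser are the classes ANTLR generates from
                // your existing grammar; "program" is assumed to be the start rule.
                FortranLexer lexer = new FortranLexer(CharStreams.fromString(file.contents()));
                FortranParser parser = new FortranParser(new CommonTokenStream(lexer));
                FortranParser.ProgramContext tree = parser.program();
                // Walk the ANTLR parse tree here and report findings for each
                // rule violation via context.newIssue().
            } catch (Exception e) {
                // A file that fails to parse should not break the whole analysis.
            }
        }
    }
}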
I think this is a little outdated; now the sonar-groovy plugin is a better approach IMHO if you want to develop a plugin with ANTLR: https://github.com/SonarSource/sonar-groovy
Is it possible to have more than one Stanford CoreNLP instance, each of them using a different language, in the same Java project?
In the CoreNLP documentation, it seems that the only way to change language is to add a different Maven dependency: what if I want to use all of them together?
If you include a dependency for each language, you will get all of the model files for Chinese, German, and Spanish, and with them all the resources you need to run on those languages.
Within your code, you determine the language by the .properties file you use to build the StanfordCoreNLP pipeline object. So you are free to build different pipelines with different .properties files.
The appropriate .properties files for the various languages can be found in the corresponding model jars.
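For example, here is a minimal sketch of two pipelines living in the same JVM. It assumes the German and Chinese model jars are on the classpath and that their properties files use the conventional names shown below (check the model jars for the exact names in your release):

import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import java.io.InputStream;
import java.util.Properties;

public class MultiLanguagePipelines {
    public static void main(String[] args) throws Exception {
        // Each language's model jar ships a .properties file configuring its pipeline.
        StanfordCoreNLP german = new StanfordCoreNLP(load("StanfordCoreNLP-german.properties"));
        StanfordCoreNLP chinese = new StanfordCoreNLP(load("StanfordCoreNLP-chinese.properties"));
        // Use `german` to annotate German text and `chinese` for Chinese text.
    }

    private static Properties load(String name) throws Exception {
        // Reads the properties file from the classpath (i.e. from the model jar).
        Properties props = new Properties();
        try (InputStream in = MultiLanguagePipelines.class
                .getClassLoader().getResourceAsStream(name)) {
            props.load(in);
        }
        return props;
    }
}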
I am working on a project which requires me to add POS tags to an input string. I am also going to use grammatical dependency structure generated by the Stanford parser for later processing.
A few things to point out before I jump to my problem:
For POS tagging I am using http://nlp.stanford.edu/software/tagger.shtml (version 3.3.1).
For grammatical dependency generation I am using http://nlp.stanford.edu/software/lex-parser.shtml#Download (version 3.3.1).
I included both of these jars in my class path. (By "include" I mean that I use Maven to pull the Stanford parser jar from the Maven repository and add the POS tagger jar using the steps mentioned later.)
Now the problem is that whenever I try to get the POS tags for an input string, I get the following error.
Exception in thread "main" java.lang.NoSuchMethodError: edu.stanford.nlp.tagger.maxent.TaggerConfig.getTaggerDataInputStream(Ljava/lang/String;)Ljava/io/DataInputStream;
My intuition says that this is because the Stanford parser jar also has a maxent package containing a TaggerConfig class. Every time I ask for the POS tags for a string, the program looks into the Stanford parser jar instead of the Stanford POS tagger jar, hence the error.
I am using Maven and couldn't find the POS tagger jar on Maven Central, so I added it to my local Maven repository using the instructions at http://charlie.cu.cc/2012/06/how-add-external-libraries-maven/.
I would really appreciate it if anyone could point out a solution to this problem.
You are using two jar files. Go to the build path and reverse the order of your imported jars. That should fix it.
The method Java is complaining about was in releases of the Stanford POS Tagger in the 2009-2011 period, but is not in any recent (or ancient) release.
So what this means is that you have another jar on your class path which contains an old version of the Stanford POS tagger hidden inside it, and its MaxentTagger has been invoked, not the one from the v3.3.1 jars (due to class path search order). You should find it and complain.
The most common case recently has been the CMU ark-tweet-nlp.jar. See: http://nlp.stanford.edu/software/corenlp-faq.shtml#nosuchmethoderror.
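One way to track the offending jar down (a standard JVM trick, not specific to Stanford NLP; the class name WhichJar is just illustrative) is to print where the conflicting class was actually loaded from:

public class WhichJar {
    public static void main(String[] args) {
        // Prints the jar (or directory) that TaggerConfig was loaded from,
        // revealing which class path entry is shadowing the 3.3.1 tagger jar.
        System.out.println(edu.stanford.nlp.tagger.maxent.TaggerConfig.class
                .getProtectionDomain().getCodeSource().getLocation());
    }
}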
The overlapping classes of the Stanford releases are not a problem: provided you use the same version of the tagger and parser, they are identical.