How to get training data and models of Stanford CoreNLP?

I downloaded Stanford CoreNLP from the official website and from GitHub.
The guides state:
On the Stanford NLP machines, training data is available in
/u/nlp/data/depparser/nn/data
or
HERE
The list of models currently distributed is:
edu/stanford/nlp/models/parser/nndep/english_UD.gz (default, English,
Universal Dependencies)
It may sound like a silly question, but I cannot find these files or folders in any distribution.
Where can I find the source data and models officially distributed with Stanford CoreNLP?

We don't distribute most of the CoreNLP training data. Quite a lot of it is non-free, licensed data produced by other organizations (such as the LDC: https://www.ldc.upenn.edu/).
However, a huge number of free dependency treebanks are available through the Universal Dependencies project: https://universaldependencies.org/.
All the Stanford CoreNLP models are available in the "models" jar files. edu/stanford/nlp/models/parser/nndep/english_UD.gz is in stanford-corenlp-3.9.2-models.jar, which is included in the zip download http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip and is also available on Maven: http://central.maven.org/maven2/edu/stanford/nlp/stanford-parser/3.9.2/.
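For instance, once both the code jar and the models jar are on the classpath, the model can be loaded directly by its classpath-relative resource name. A minimal sketch (class name is illustrative; it assumes stanford-corenlp-3.9.2.jar and stanford-corenlp-3.9.2-models.jar are on the classpath):

import edu.stanford.nlp.parser.nndep.DependencyParser;

public class LoadNndepModel {
  public static void main(String[] args) {
    // The models jar puts english_UD.gz on the classpath, so the parser
    // can load it by resource path rather than by a filesystem path.
    DependencyParser parser = DependencyParser.loadFromModelFile(
        "edu/stanford/nlp/models/parser/nndep/english_UD.gz");
    System.out.println("Dependency parser model loaded.");
  }
}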

Related

In an NLP project with pre-trained models, what is the difference between direct installation of libraries and manual installation?

Currently, I am working on:
NLP
Transformers pre-trained models (especially the BERT family)
Google Colab
Sometimes, when I clone projects from GitHub, I see that the programmers do not install the libraries directly from the Hugging Face repository, like:
!pip install transformers
import transformers
but instead download the libraries' files into the project manually.
My questions are:
Why this approach?
Is it better than installing directly?
How can I do the same in my project?

Multi-language CoreNLP

Is it possible to have more than one Stanford CoreNLP instance, each of them using a different language, in the same Java project?
In the CoreNLP documentation, it seems that the only way to change language is to add a different Maven dependency: what if I want to use all of them together?
If you include a dependency for each language, you will get all of the model files for Chinese, German, and Spanish, and you will have all the resources needed to run on those languages.
Within your code, you determine the language by the .properties file you use to build the StanfordCoreNLP pipeline object. So you are free to build different pipelines with different .properties files.
The appropriate .properties files for the various languages can be found in the corresponding model jars.
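A rough sketch of two pipelines coexisting in one JVM (class name and example sentences are illustrative; it assumes the Chinese and German model jars, which ship StanfordCoreNLP-chinese.properties and StanfordCoreNLP-german.properties, are on the classpath):

import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class MultiLangPipelines {
  public static void main(String[] args) {
    // Each pipeline is configured by the .properties file bundled
    // in the corresponding language model jar.
    StanfordCoreNLP chinese = new StanfordCoreNLP("StanfordCoreNLP-chinese.properties");
    StanfordCoreNLP german = new StanfordCoreNLP("StanfordCoreNLP-german.properties");

    CoreDocument zhDoc = new CoreDocument("今天天气很好。");
    CoreDocument deDoc = new CoreDocument("Das Wetter ist heute schön.");
    chinese.annotate(zhDoc);
    german.annotate(deDoc);
  }
}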

Updated Stanford dependencies manual [duplicate]

Where can I find the Stanford NLP dependency manual? Is it available online?
The original manual can be found here: http://nlp.stanford.edu/software/dependencies_manual.pdf
The general website for the parser is: https://nlp.stanford.edu/software/lex-parser.html
A more specific page about the Neural Network Dependency Parser is: https://nlp.stanford.edu/software/nndep.html
In the latest CoreNLP release, Stanford Dependencies have been replaced by Universal Dependencies as the default. You can find documentation for this annotation scheme on the Universal Dependencies site.

Conflict between Stanford Parser & Stanford POS tagger

I am working on a project which requires me to add POS tags to an input string. I am also going to use grammatical dependency structure generated by the Stanford parser for later processing.
Something to point out before I jump to my problem.
For POS tagging I am using http://nlp.stanford.edu/software/tagger.shtml (Version 3.3.1)
For grammatical dependency generation I am using http://nlp.stanford.edu/software/lex-parser.shtml#Download (version 3.3.1)
I included both of these jars in my classpath. (By "include" I mean that I use Maven to pull the Stanford Parser jar from the Maven repository and add the POS tagger jar using the steps mentioned later.)
Now the problem is whenever I try to get the POS tags for an input string I get the following error.
Exception in thread "main" java.lang.NoSuchMethodError: edu.stanford.nlp.tagger.maxent.TaggerConfig.getTaggerDataInputStream(Ljava/lang/String;)Ljava/io/DataInputStream;
My intuition says that this is because the Stanford Parser jar also contains a maxent package with a TaggerConfig class. Every time I ask for POS tags for a string, the program looks into the Stanford Parser jar instead of the Stanford POS tagger jar, hence the error.
I am using Maven and couldn't find the POS tagger jar on Maven Central, so I added it to my local Maven repository using the instructions at http://charlie.cu.cc/2012/06/how-add-external-libraries-maven/.
I would really appreciate it if anyone could point out a solution to this problem.
You are using two jar files. Go to the Build Path and reverse the order of your imported jars. That should fix it.
The method Java is complaining about was in releases of the Stanford POS Tagger in the 2009-2011 period, but is not in any recent (or ancient) release.
So what this means is that you have another jar on your class path which contains an old version of the Stanford POS tagger hidden inside it, and its MaxentTagger has been invoked, not the one from the v3.3.1 jars (due to class path search order). You should find it and complain.
The most common case recently has been the CMU ark-tweet-nlp.jar. See: http://nlp.stanford.edu/software/corenlp-faq.shtml#nosuchmethoderror.
The overlapping classes in the Stanford releases are not a problem: provided you use the same version of the tagger and parser, they are identical.
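One way to find the offending jar is to ask the JVM where it actually loaded the class from. A small diagnostic sketch (not part of either Stanford distribution; the class name is illustrative):

import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class WhichJar {
  public static void main(String[] args) {
    // Prints the URL of the jar (or directory) the class was loaded from,
    // revealing whether an old bundled copy shadows the v3.3.1 jar.
    System.out.println(MaxentTagger.class
        .getProtectionDomain().getCodeSource().getLocation());
  }
}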
