Stanford NLP equivalent of Apache OpenNLP chunking? - stanford-nlp

Apache OpenNLP has a Chunker tool (https://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.chunker), and I wonder what the Stanford NLP equivalent is.
The dependency parser seems closest: http://nlp.stanford.edu/software/stanford-dependencies.shtml
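(For what it's worth: as far as I know, CoreNLP does not ship a dedicated chunker annotator; the usual workaround is to run the constituency parser and read off the lowest phrasal subtrees as chunks. Below is a minimal Java sketch of that idea using the standard CoreNLP pipeline API; the base-phrase heuristic is my own illustration, not an official CoreNLP feature.)

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class ChunkSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, parse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Annotation doc = new Annotation("The quick brown fox jumps over the lazy dog.");
    pipeline.annotate(doc);

    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
      // Walk all subtrees and report the lowest NP/VP/PP phrases as "chunks".
      for (Tree subtree : tree) {
        String label = subtree.label().value();
        if ((label.equals("NP") || label.equals("VP") || label.equals("PP"))
            && isBasePhrase(subtree)) {
          System.out.println(label + ": " + phraseText(subtree));
        }
      }
    }
  }

  // A phrase is "base" if none of its children are themselves phrasal nodes.
  private static boolean isBasePhrase(Tree t) {
    for (Tree child : t.children()) {
      if (!child.isLeaf() && !child.isPreTerminal()) {
        return false;
      }
    }
    return true;
  }

  // Concatenate the words under a subtree.
  private static String phraseText(Tree t) {
    StringBuilder sb = new StringBuilder();
    for (Tree leaf : t.getLeaves()) {
      if (sb.length() > 0) sb.append(' ');
      sb.append(leaf.value());
    }
    return sb.toString();
  }
}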

Related

CoreNLP constituency parsing

How would you describe the status of constituency parsing in CoreNLP? Is it maintained but no longer being improved, now that the package has (as of 3.5.3?) moved on to dependency parsing, in line with the dominant research fashion of the past decade in computational linguistics?
Perhaps the java-nlp-user mailing list is the more appropriate place for this discussion, but a short authoritative answer would be much appreciated, if there is one.
Since state-of-the-art neural dependency parsers probably reach very good accuracy, would you recommend any package for converting from dependency to constituency parses?
Is there any form of (noisy) conversion code provided in CoreNLP, for converting from its dependency parses to constituency parses? Only a rule-based conversion in the opposite direction appears to be provided for some languages.
We are not actively developing constituency parsing in the Java Stanford CoreNLP package any more. I think any future improved constituency parsers will be in Python and neural based. I believe AllenNLP has such an implementation, and it's possible in the future we will add a neural model to our Python StanfordNLP package.
We do not offer any type of dependency to constituency conversion to the best of my knowledge.
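(As the question notes, the rule-based conversion in the opposite direction, from constituency trees to dependencies, does exist and can be invoked programmatically. A minimal Java sketch using the classic English head-finding converter; the toy bracketed tree here is made up for illustration.)

import edu.stanford.nlp.trees.GrammaticalStructure;
import edu.stanford.nlp.trees.GrammaticalStructureFactory;
import edu.stanford.nlp.trees.PennTreebankLanguagePack;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreebankLanguagePack;
import edu.stanford.nlp.trees.TypedDependency;

public class TreeToDeps {
  public static void main(String[] args) {
    // Toy constituency tree in Penn Treebank bracketing (illustrative only).
    Tree tree = Tree.valueOf("(ROOT (S (NP (DT The) (NN dog)) (VP (VBZ barks)) (. .)))");

    // Head-finding, rule-based conversion to typed dependencies.
    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(tree);

    for (TypedDependency dep : gs.typedDependenciesCCprocessed()) {
      System.out.println(dep);  // e.g. nsubj(barks-3, dog-2)
    }
  }
}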

CoreNLP Road Map

The road map for CoreNLP is unclear. Is it in maintenance mode? I'm happy to see the emphasis on StanfordNLP, but the lack of visibility into the direction is concerning. If the new neural models are better, will we see them wrapped in the Java CoreNLP APIs?
CoreNLP is not yet in maintenance mode. We are going to put in some quite significant (and compatibility-breaking) changes over the summer. Among other things, we're going to convert across to using UDv2 (from the current UDv1), we're going to make tokenization changes to English and perhaps other languages to better align with UD and "new" (since about 2004!) Penn Treebank tokenization, and we'll have more consistent availability and use of word vectors. These changes should increase compatibility between the Java and Python packages, and over time also make it possible for us to use more data to train Python stanfordnlp models. Now that the Python stanfordnlp v0.2 is out, work on CoreNLP should pick up.
On the other hand, most of the research energy in the Stanford NLP group has now moved to exploring neural models built in Python on top of the major deep learning frameworks. (Hopefully that's not a surprise to hear!) It is therefore less likely that major new components will be added to CoreNLP. It's hard to predict the future, but it is reasonable to expect that CoreNLP will head more in the direction of being a stable, efficient-on-CPU NLP package, rather than something implementing the latest neural models.

Where can I find the implementation of the SPIED tool from Stanford CoreNLP?

The paper "Improved Pattern Learning for Bootstrapped Entity Extraction" (Sonal Gupta and Christopher D. Manning, Proceedings of the Eighteenth Conference on Computational Natural Language Learning (CoNLL), 2014)
cites http://nlp.stanford.edu/software/patternviz.shtml as the link to the tool's implementation, but that page seems to have been taken down.
Oops! We should fix that; in the meantime, the new link is http://nlp.stanford.edu/software/patternslearning.html. The code is distributed with Stanford CoreNLP, so there's no extra download. An example invocation is:
java -cp stanford-corenlp-3.5.1.jar:stanford-corenlp-3.5.1-models.jar:javax.json.jar:joda-time.jar:jollyday.jar edu.stanford.nlp.patterns.GetPatternsFromDataMultiClass -props patterns/example.properties

How can I tell whether the Stanford dependency parser performs tokenization using rule-based methods or probabilistic models?

I am confused about whether the Stanford dependency parser tokenizes sentences and words based on probabilistic models or rule-based methods. I would also like to know what dependency grammar and dependency parsing are.
Please help!
Thanks
The tokenization is entirely rule-based. If you're curious, you can take a look at the (very lengthy) tokenizer definition for English.
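(If you want to see those rules in action, the English tokenizer is exposed as edu.stanford.nlp.process.PTBTokenizer and can be run standalone. A minimal Java sketch:)

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;
import java.io.StringReader;

public class TokenizeSketch {
  public static void main(String[] args) {
    String text = "Dr. Smith bought 2.5 kg of apples for $3.99.";
    // The tokenizer is deterministic: hand-written JFlex rules, no learned model.
    PTBTokenizer<CoreLabel> tokenizer = new PTBTokenizer<>(
        new StringReader(text), new CoreLabelTokenFactory(), "");
    while (tokenizer.hasNext()) {
      System.out.println(tokenizer.next().word());
    }
  }
}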
There is a short introduction to dependency parsing on this Stanford page, with some links to relevant papers as well.
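(For a concrete feel of what dependency parsing produces, here is a minimal Java sketch using the depparse annotator; each printed edge is a head-dependent grammatical relation, e.g. nsubj(chased-3, dog-2). The example sentence is my own.)

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class DepSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, depparse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Annotation doc = new Annotation("The dog chased the cat.");
    pipeline.annotate(doc);

    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      // One directed graph of grammatical relations per sentence.
      SemanticGraph graph =
          sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
      System.out.println(graph);
    }
  }
}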

Natural Language Processing in Ruby [closed]

I'm looking to do some sentence analysis (mostly for Twitter apps) and infer some general characteristics. Are there any good natural language processing libraries for this sort of thing in Ruby?
Similar to "Is there a good natural language processing library" but for Ruby. I'd prefer something very general, but any leads are appreciated!
Three excellent and mature NLP packages are Stanford Core NLP, OpenNLP and LingPipe. There are Ruby bindings to the Stanford Core NLP tools (GPL license) as well as the OpenNLP tools (Apache License).
On the more experimental side of things, I maintain a Text Retrieval, Extraction and Annotation Toolkit (Treat), released under the GPL, that provides a common API for almost every NLP-related gem that exists for Ruby. The following list of Treat's features can also serve as a good reference in terms of stable natural language processing gems compatible with Ruby 1.9.
Text segmenters and tokenizers (punkt-segmenter, tactful_tokenizer, srx-english, scalpel)
Natural language parsers for English, French and German and named entity extraction for English (stanford-core-nlp).
Word inflection and conjugation (linguistics), stemming (ruby-stemmer, uea-stemmer, lingua, etc.)
WordNet interface (rwordnet), POS taggers (rbtagger, engtagger, etc.)
Language (whatlanguage), date/time (chronic, kronic, nickel), keyword (lda-ruby) extraction.
Text retrieval with indexation and full-text search (ferret).
Basic machine learning with decision trees (decisiontree), MLPs (ruby-fann), SVMs (rb-libsvm) and linear classification (tomz-liblinear-ruby-swig).
Text similarity metrics (levenshtein-ffi, fuzzy-string-match, tf-idf-similarity).
Not included in Treat, but relevant to NLP: hotwater (string distance algorithms), yomu (bindings to Apache Tika for reading .doc, .docx, .pages, .odt, .rtf, .pdf), graph-rank (an implementation of GraphRank).
There are some things at Ruby Linguistics and some links therefrom, though it doesn't seem anywhere close to what NLTK is for Python, yet.
You can always use JRuby and call the Java libraries.
EDIT: The ability to run Ruby natively on the JVM and easily leverage Java libraries is a big plus for Rubyists. This is a good option that should be considered in a situation like this.
Which NLP toolkit to use in JAVA?
I found an excellent article detailing some NLP algorithms in Ruby here. This includes stemmers, date time parsers and grammar parsers.
TREAT – the Text REtrieval and Annotation Toolkit – is the most comprehensive toolkit I know of for Ruby: https://github.com/louismullie/treat/wiki/
I maintain a list of Ruby Natural Language Processing resources (libraries, APIs, and presentations) on GitHub that covers the libraries listed in the other answers here as well as some additional libraries.
Also consider using SaaS APIs like MonkeyLearn. You can easily train text classifiers with machine learning and integrate via an API. There's a Ruby SDK available.
Besides creating your own classifiers, you can pick pre-built modules for sentiment analysis, topic classification, language detection and more.
We also have extractors for keywords and entities, and we'll keep adding more public modules.
Other nice features:
You have a GUI to create/test algorithms.
Algorithms run really fast in our cloud computing platform.
You can integrate with Ruby or any other programming language.
Try this one
https://github.com/louismullie/stanford-core-nlp
About the stanford-core-nlp gem:
This gem provides high-level Ruby bindings to the Stanford Core NLP package, a set of natural language processing tools for tokenization, sentence segmentation, part-of-speech tagging, lemmatization, and parsing of English, French and German. The package also provides named entity recognition and coreference resolution for English.
http://nlp.stanford.edu/software/corenlp.shtml
Demo page:
http://nlp.stanford.edu:8080/corenlp/
You need to be much more specific about what these "general characteristics" are.
In NLP "general characteristics" of a sentence can mean a million different things - sentiment analysis (ie, the attitude of the speaker), basic part of speech tagging, use of personal pronoun, does the sentence contain active or passive verbs, what's the tense and voice of the verbs...
I don't mind if you're vague about describing it, but if we don't know what you're asking it's highly unlikely we can be specific in helping you.
My general suggestion, especially for NLP, is you should get the tool best designed for the job instead of limiting yourself to a specific language. Limiting yourself to a specific language is fine for some tasks where the general tools are implemented everywhere, but NLP is not one of those.
The other issue in working with Twitter is that a great deal of the sentences there will be half-baked or compressed in strange and wonderful ways, which most NLP tools aren't trained for. To help there, the NUS SMS Corpus consists of "about 10,000 SMS messages collected by students". Due to the similar restrictions and usage, analysing that may be helpful in your explorations with Twitter.
If you're more specific I'll try and list some tools that will help.
I would check out Mark Watson's free book Practical Semantic Web and Linked Data Applications, Java, Scala, Clojure, and JRuby Edition. He has chapters on NLP using Java, Clojure, Ruby, and Scala. He also provides links to the resources you need.
For people looking for something more lightweight and simple to implement this option worked well for me.
https://github.com/yohasebe/engtagger

Resources