I am using GATE for Arabic under Linux, and when using the Stanford Tagger there are no POS tags in the output. All parameters are left at their defaults. What can I do to make this task work?
Does this problem depend on the parameters? Which installation directory of the tagger must be set in the taggerBinary parameter?
First, you have to create a new StanfordPOSTaggerPR in GATE and initialise the tagger with the arabic.tagger model provided with the Stanford Tagger.
The latest version of GATE is 8.0, and it uses Stanford Tagger 3.4, so you'll have to download the models provided with that version.
Next, create a corpus pipeline with a sentence splitter and a tokeniser (I've tried it with the Unicode Tokeniser and the RegEx Sentence Splitter):
Finally, try the pipeline with a sample file:
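If the pipeline still produces no tags, it can help to check that the Arabic model works outside GATE first. A minimal sketch of running the tagger directly from the command line (the jar and file paths are assumptions; adjust them to your installation):

```shell
# Tag a plain-text Arabic file with the standalone Stanford POS Tagger.
# sample-arabic.txt is a hypothetical UTF-8 input file.
java -mx1g -cp stanford-postagger.jar \
  edu.stanford.nlp.tagger.maxent.MaxentTagger \
  -model models/arabic.tagger \
  -textFile sample-arabic.txt
```

If this prints tagged output but GATE does not, the problem is in the PR configuration rather than the model.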
Related
Stanford CoreNLP 3.9.2 very helpfully used to split rolled-together Spanish verbs and pronouns.
This is the 4.0.0 output:
The previous version had more .tagger files; these have not been included with the 4.0.0 distribution.
Is that the cause? Will they be added back?
There are some documentation updates that still need to be made for Stanford CoreNLP 4.0.0.
A major change is that a new multi-word-token annotator has been added, which makes tokenization conform to the UD standard. The new default Spanish pipeline should therefore run tokenize,ssplit,mwt,pos,depparse,ner. It may not be possible to run such a pipeline from the server demo at this time, as some modifications will need to be made. I can try to send you what such modifications would be soon. We will try to make a new release in early summer to handle issues like this that we missed.
It won't split the word in your example, unfortunately, but I think in many cases it will do the correct thing. The Spanish mwt model is based on a large dictionary of terms and was tuned to optimize performance on the Spanish training data.
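The pipeline described above can be sketched as a properties fragment (the file name is an assumption; the annotator list is the one given in this answer):

```properties
# spanish-mwt.properties (hypothetical name) -- Spanish pipeline for CoreNLP 4.0.0
# Note the new mwt annotator between ssplit and pos.
annotators = tokenize, ssplit, mwt, pos, depparse, ner
tokenize.language = es
```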
Since version 3.5.2 the Stanford Parser and Stanford CoreNLP output grammatical relations in the Universal Dependencies v1 representation by default.
I wonder whether Stanford is still improving the English_SD parser model or is concentrating on English_UD instead. When was English_SD last updated?
The mailing list archive tells me that new neural dependency parsing models for English were released in 3.7.0, but I'm not sure whether those are SD and/or UD models.
We are not updating SD any more; that description refers to a new UD model.
I am building my own CRF-based Stanford NER model, following the conventions given at this link. I want to add gazettes, following the same link: I list all of my gazettes with the property gazette=file1.txt;file2.txt and also set useGazettes=true in austen.prop. After building the model, when I test on data from my gazettes, it is not tagged correctly: the tag I specified in the files does not come out. These results are a little surprising to me, as Stanford NER is not giving the same tag as the one given in those files.
Are there some limitations of Stanford NER with gazettes, or am I still missing something? If somebody can help me, I will be thankful.
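One thing to keep in mind: in Stanford NER, gazettes only add features to the CRF — they are not hard rules, so the trained model can still override a gazette match. A sketch of the relevant properties (the property names are the real NERFeatureFactory ones; the file names are from the question):

```properties
# excerpt from the training .prop file
useGazettes = true
gazette = file1.txt;file2.txt
# sloppyGazette fires gazette features on partial matches;
# cleanGazette fires them only when the whole entry matches.
sloppyGazette = true
```

Each line of a gazette file is expected to be a class label followed by the phrase, e.g. `LOCATION Palo Alto`. Even with this set up correctly, entries can be out-voted by other features learned from the training data.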
The Stanford NLP POS tagger claims imperative verbs were added to a recent version. I've input lots of text with abundant and obvious imperatives, but there seems to be no tag for them in the output. Must one, after all, train it for this POS?
There is no special tag for imperatives; they are simply tagged as VB.
The info on the website refers to the fact that we added a number of manually annotated imperative sentences to our training data so that the POS tagger gets more of them right, i.e. tags the verb as VB.
I want to use gate-EN-twitter.model for POS tagging while parsing with the Stanford Parser. Is there a command-line option for that, like -pos.model gate-EN-twitter.model? Or do I have to run the Stanford POS tagger with the GATE model first and then use its output as input for the parser?
Thanks!
If I understand you correctly, you want to force the Stanford Parser to use the tags generated by this Twitter-specific POS tagger. That's definitely possible, though this tweet from Stanford NLP about this exact model should serve as a warning:
Tweet from Stanford NLP, 13 Apr 2014:
Using CoreNLP on social media? Try GATE Twitter model (iff not parsing…) -pos.model gate-EN-twitter.model https://gate.ac.uk/wiki/twitter-postagger.html #nlproc
(https://twitter.com/stanfordnlp/status/455409761492549632)
That being said, if you really want to try, we can't stop you :)
There is a parser FAQ entry on forcing in your own tags; see http://nlp.stanford.edu/software/parser-faq.shtml#f
Basically, you have two options (see the FAQ for full details):
If calling the parser from the command line, you can pre-tag your text file and then tell the parser that the text is pre-tagged using some command-line options.
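A sketch of the command-line route, following the options described in the parser FAQ (jar and file names are assumptions; pretagged.txt would hold whitespace-separated word/TAG tokens, one sentence per line):

```shell
# Parse a pre-tagged file, telling the parser the input is already
# tokenized and that tags are attached with "/".
java -cp stanford-parser.jar \
  edu.stanford.nlp.parser.lexparser.LexicalizedParser \
  -sentences newline \
  -tokenized -tagSeparator / \
  -tokenizerFactory edu.stanford.nlp.process.WhitespaceTokenizer \
  -tokenizerMethod newCoreLabelTokenizerFactory \
  edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz \
  pretagged.txt
```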
If parsing programmatically, the LexicalizedParser#parse method will accept any List&lt;? extends HasWord&gt; and, when the elements also implement HasTag, treat the tags in that list as gold. Just pre-tag your list (using the CoreNLP pipeline or MaxentTagger) and pass that token list on to the parser.
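The programmatic route can be sketched like this. It is only a sketch, assuming the GATE model file sits in the working directory and that the standard parser and tagger jars plus the englishPCFG model are on the classpath; the class name and example sentence are invented for illustration:

```java
import java.util.List;

import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;
import edu.stanford.nlp.trees.Tree;

public class PreTaggedParseSketch {
  public static void main(String[] args) {
    // Load the Twitter-specific tagger model (path is an assumption).
    MaxentTagger tagger = new MaxentTagger("gate-EN-twitter.model");
    LexicalizedParser parser = LexicalizedParser.loadModel(
        "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");

    // Build a token list, tag it, then hand the TaggedWord list
    // (TaggedWord implements HasTag) to the parser so the parser
    // keeps these tags instead of assigning its own.
    List<HasWord> tokens = Sentence.toWordList("just", "gonna", "chill", "today");
    List<TaggedWord> tagged = tagger.tagSentence(tokens);
    Tree tree = parser.parse(tagged);
    tree.pennPrint();
  }
}
```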