I want to get nodes of parseTree - opennlp

This is part of my code:
String sentence = "The system Does Not Require users to identify themselves to search for books according to certain criteria and to check the availability of a particular book. However to check out books, to check their respective book loan status, and to place holds on books that are already on loan, users must first identify themselves to the system.";
Parse[] topParses = ParserTool.parseLine(sentence, parser, /*numParses=*/ 3);
for (Parse parseTree : topParses) {
    parseTree.show();
}
How can I get the verbs in the sentence? More generally, how can I get the nodes of the parse tree?

If you only need the verbs from the sentence, the POS tagger in OpenNLP is sufficient. Use an OpenNLP tokenizer to get the tokens in an array and feed that array to POSTaggerME. It will give you the corresponding POS tags. You can then filter by the verb tags, such as VB, VBZ, etc.
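The filtering step on its own can be sketched as follows. This is a minimal illustration in Python, not OpenNLP code: it assumes you already have parallel token and tag arrays (as the tagger would return), and the sample tags below are hand-written for illustration rather than real tagger output. The name `extract_verbs` is hypothetical.

```python
# Penn Treebank verb tags: base form, past tense, gerund,
# past participle, non-3rd-person present, 3rd-person present.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


def extract_verbs(tokens, tags):
    """Return the tokens whose POS tag is one of the verb tags."""
    return [tok for tok, tag in zip(tokens, tags) if tag in VERB_TAGS]


# Hand-written example input (assumed tags, not produced by a tagger).
tokens = ["users", "must", "first", "identify", "themselves", "to", "the", "system"]
tags = ["NNS", "MD", "RB", "VB", "PRP", "TO", "DT", "NN"]

verbs = extract_verbs(tokens, tags)
print(verbs)
```

Note that modal verbs such as "must" carry the tag MD, so they are not picked up unless you add MD to the set.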

If you are looking for verb phrases, use the Chunker; if you need just the verbs, use the POS tagger.
Check out this answer:
How to extract the noun phrases using Open nlp's chunking parser

Related

How do I lookup Industry by Symbol on Yahoo using ruby?

I am trying to get company information for a given symbol. I have retrieved quote data using the wonderful 'yahoo-finance' gem, but now I need the company's industry information and can't find a way to get it.
Any ideas?
Just add :industry to the list of fields you want returned. available_fields gives you the full list. E.g.,
require 'yahoo_finance'
stocks = YahooFinance::Stock.new(['AAPL'], [:industry, :sector])
# use stocks.available_fields to search for the fields that you want
results = stocks.fetch; nil
results['AAPL'][:industry]
# "Electronic Equipment"
results['AAPL'][:sector]
# "Consumer Goods"

How to use the PubMed API to search for an article with an exact title?

I'm trying to use the PubMed API to search for articles with an exact title. As an example, I want to search for the title: The cost-effectiveness of mirtazapine versus paroxetine in treating people with depression in primary care.
I want up to 1000 results in JSON format, so I know that the first part of my URL should look like this:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmode=json&retmax=1000&term=
How do I add a title search as a GET parameter?
I've been using the Pubmed advanced search constructor, and that suggests that the query should look like The cost-effectiveness of mirtazapine versus paroxetine in treating people with depression in primary care[Title].
But if I try just adding that to the URL term=, PubMed tries to break down the title into all kinds of peculiar queries:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmode=json&retmax=1000&term=The%20cost-effectiveness%20of%20mirtazapine%20versus%20paroxetine%20in%20treating%20people%20with%20depression%20in%20primary%20care[Title]
How can I specify an exact title as a GET param?
Use field=title
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmode=json&retmax=1000&term=The%20cost-effectiveness%20of%20mirtazapine%20versus%20paroxetine%20in%20treating%20people%20with%20depression%20in%20primary%20care&field=title
Check out ESearch API for more information:
http://www.ncbi.nlm.nih.gov/books/NBK25499/#_chapter4_ESearch_
Use + instead of %20 (space).
For example:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmode=json&retmax=1000&term=cost+effectiveness+of+mirtazapine[title]
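Both suggestions above can be combined when the URL is built programmatically. The following is a minimal Python sketch that only constructs the URL and does not send a request. The helper name `build_title_search_url` is hypothetical. It relies on `urllib.parse.urlencode`, which encodes spaces as `+` by default.

```python
from urllib.parse import urlencode

ESEARCH_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_title_search_url(title, retmax=1000):
    """Build an ESearch URL that restricts the term to the title field."""
    params = {
        "db": "pubmed",
        "retmode": "json",
        "retmax": retmax,
        "term": title,
        "field": "title",  # limit the search to the title field
    }
    # urlencode escapes each value and joins the pairs with '&'.
    return ESEARCH_URL + "?" + urlencode(params)


url = build_title_search_url(
    "The cost-effectiveness of mirtazapine versus paroxetine "
    "in treating people with depression in primary care"
)
print(url)
```

Letting the library do the escaping avoids hand-encoding mistakes. If you prefer the `[Title]` suffix inside `term` instead of `field=title`, be aware that `urlencode` will percent-encode the square brackets.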

Wiktionary/MediaWiki Search & Suffix Filtering

I'm building an application that will hopefully use Wiktionary words and definitions as a data source. In my queries, I'd like to be able to search for all Wiktionary entries that are similar to user provided terms in either the title or definition, but also have titles ending with a specified suffix (or one of a set of suffixes).
For example, I want to find all Wiktionary entries that contain the words "large dog", like this:
https://en.wiktionary.org/w/api.php?action=query&list=search&srsearch=large%20dog
But further filter the results to only contain entries with titles ending with "d". So in that example, "boarhound", "Saint Bernard", and "unleashed" would be returned.
Is this possible with the MediaWiki search API? Do you have any recommendations?
This is mostly possible with ElasticSearch/CirrusSearch, but the feature is disabled on Wiktionary for performance reasons. You can still use it on your own wiki, or attempt smart search queries.
Usually for Wiktionary I use yanker, which can access the page table of the database. Your example (a one-letter suffix) would return a huge result set, but for instance .*hound$ finds:
Afghan_hound
Bavarian_mountain_hound
Foxhound
Irish_Wolfhound
Mahound
Otterhound
Russian_Wolfhound
Scottish_Deerhound
Tripehound
basset_hound
bearhound
black_horehound
bloodhound
boarhound
bookhound
boozehound
buckhound
chowhound
coon_hound
coonhound
covert-hound
covert_hound
coverthound
deerhound
double-nosed_andean_tiger_hound
elkhound
foxhound
gazehound
gorehound
grayhound
greyhound
harehound
heckhound
hell-hound
hell_hound
hellhound
hoarhound
horehound
hound
limehound
lyam-hound
minkhound
newshound
nursehound
otterhound
powder_hound
powderhound
publicity-hound
publicity_hound
rock_hound
rockhound
scent_hound
scenthound
shag-hound
sighthound
sleuth-hound
sleuthhound
slot-hound
slowhound
sluthhound
smooth_hound
smoothhound
smuthound
staghound
war_hound
whorehound
wolfhound
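If server-side suffix filtering is not available, one workaround is to run the normal `list=search` query and filter the returned titles on the client. The sketch below is a minimal Python illustration under that assumption. It works on an already parsed JSON response, which for `list=search` nests results under `query.search`, and makes no network call. The function name and the sample response are hand-written for illustration.

```python
def filter_titles_by_suffix(response, suffixes):
    """Return titles from a list=search response that end with any suffix."""
    hits = response.get("query", {}).get("search", [])
    suffix_tuple = tuple(suffixes)  # str.endswith accepts a tuple of options
    return [hit["title"] for hit in hits if hit["title"].endswith(suffix_tuple)]


# Hand-written stand-in for a parsed API response (not real API output).
sample_response = {
    "query": {
        "search": [
            {"title": "boarhound"},
            {"title": "Saint Bernard"},
            {"title": "mastiff"},
            {"title": "unleashed"},
        ]
    }
}

matches = filter_titles_by_suffix(sample_response, ["d"])
print(matches)
```

This only filters the pages the search actually returned, so for complete coverage you would still need to page through all search results.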

tweepy streaming track filter results

It seems that not all the tweets I get using filter contain the term ("health" in this case). How can I get only the tweets that contain this specific term? Can anyone help me?
Thanks so much in advance!
This is the line when I use filter:
sapi.filter(locations=[-79.55, 37.883, -75.067, 39.717],track = ["health"])
Unfortunately, the Streaming API does not allow filtering by both location and terms. From the docs:
Bounding boxes do not act as filters for other filter parameters. For example track=twitter&locations=-122.75,36.8,-121.75,37.8 would match any tweets containing the term Twitter (even non-geo tweets) OR coming from the San Francisco area.
So essentially the reason you are seeing some tweets that do not contain the word "health" is because you are receiving tweets containing the word "health", OR located within your bounding box (in this case, locations=[-79.55, 37.883, -75.067, 39.717]).
You can, however, try to filter by your term(s) then parse through the tweet data for the location, or alternately filter by location then search the tweet text for your term(s). I would probably suggest the latter if location is necessary to limit the scope of your tweet consumption.
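The second approach (filter by location on the stream, then check the text yourself) could look roughly like this. It is a minimal Python sketch of the text check only. The function names are hypothetical, and the hookup to tweepy is deliberately left out because the listener interface differs between tweepy versions. You would call `matches_terms` on each incoming tweet's text inside your own handler.

```python
def matches_terms(text, terms):
    """Return True if the tweet text contains any of the terms (case-insensitive)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def keep_matching(texts, terms):
    """Keep only the texts that mention at least one of the terms."""
    return [text for text in texts if matches_terms(text, terms)]


# Hand-written sample texts standing in for tweets from the bounding box.
sample_texts = [
    "Public Health update for the region",
    "Traffic is terrible today",
    "New health clinic opening downtown",
]

kept = keep_matching(sample_texts, ["health"])
print(kept)
```

This is a plain substring test, so "health" also matches words such as "healthy". Use a word-boundary regular expression if you need whole-word matches.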
It is very easy: if you only care about the term, filter on track alone by adding this line to your code.
twitterStream.filter(track=["health"])

How do I define a SemgrexPattern in Stanford API, to match nodes' text without using {lemma:...}?

I am working with the edu.stanford.nlp.semgrex and edu.stanford.nlp.trees.semgraph packages and am looking for a way to match nodes by their text value using something other than the lemma: directive.
I couldn't find all possible attribute names in javadoc for SemgrexPattern, only those for lemma, tag, and relational operators - is there a comprehensive list available?
For example, in the following sentence
My take-home pay is $20.
extracting the 'take-home' node is not possible; the following
(SemgrexPattern.compile( "{lemma:take-home}"))
.matcher( "My take-home pay is $20.").find()
yields false, because take-home is deemed not to be a lemma.
What do I need to do to match nodes with non-lemma, arbitrary text?
Thanks for any advice or comment.
Sorry - I realize that {word:take-home} would work in the example above.
Thanks..
