I've been following the spaCy quick-start guide for text classification.
Let's say I have a very simple dataset.
TRAIN_DATA = [
    ("beef", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("apple", {"cats": {"POSITIVE": 0, "NEGATIVE": 1}}),
]
I'm training a pipe to classify text. It trains and has a low loss rate.
import random
from spacy.util import minibatch

textcat = nlp.create_pipe("pytt_textcat", config={"exclusive_classes": True})
for label in ("POSITIVE", "NEGATIVE"):
    textcat.add_label(label)
nlp.add_pipe(textcat)
optimizer = nlp.resume_training()
for i in range(10):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for batch in minibatch(TRAIN_DATA, size=8):
        texts, cats = zip(*batch)
        nlp.update(texts, cats, sgd=optimizer, losses=losses)
    print(i, losses)
Now, how do I predict whether a new string of text is "POSITIVE" or "NEGATIVE"?
This will work:
doc = nlp(u'Pork')
print(doc.cats)
It gives a score for each category we've trained to predict on.
But that seems at odds with the docs, which say I should use the predict method on the pipeline component itself.
That doesn't work, though.
Calling textcat.predict('text'), textcat.predict(['text']) and so on throws:
AttributeError Traceback (most recent call last)
<ipython-input-29-39e0c6e34fd8> in <module>
----> 1 textcat.predict(['text'])
pipes.pyx in spacy.pipeline.pipes.TextCategorizer.predict()
AttributeError: 'str' object has no attribute 'tensor'
The predict methods of pipeline components actually expect Doc objects as input (an iterable of them, not raw strings), so you'll need to do something like textcat.predict([nlp(text)]). The nlp used there does not necessarily need to have a textcat component. The result of that call then needs to be fed into a call to set_annotations() as shown here.
However, your first approach is just fine:
...
nlp.add_pipe(textcat)
...
doc = nlp(u'Pork')
print(doc.cats)
...
Internally, when calling nlp(text), first the Doc for the text will be generated, and then each pipeline component, one by one, will run its predict method on that Doc and keep adding information to it with set_annotations. Eventually the textcat component will define the cats variable of the Doc.
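To make the predict / set_annotations split concrete, here is a minimal toy sketch in plain Python. It is not spaCy's actual code: Doc, ToyTextCat and ToyPipeline are hypothetical stand-ins, and the "model" is a hard-coded lookup, purely to illustrate the flow described above.

```python
class Doc:
    """Toy stand-in for a spaCy Doc: just the text plus a cats dict."""
    def __init__(self, text):
        self.text = text
        self.cats = {}


class ToyTextCat:
    """Hypothetical component that separates scoring from annotating."""

    def predict(self, docs):
        # Takes a list of Doc objects (not strings) and returns one score dict each.
        scores = []
        for doc in docs:
            # Fake "model": a hard-coded lookup instead of a trained network.
            pos = 1.0 if doc.text.lower() in ("beef", "pork") else 0.0
            scores.append({"POSITIVE": pos, "NEGATIVE": 1.0 - pos})
        return scores

    def set_annotations(self, docs, scores):
        # Writes the predicted scores back onto each Doc.
        for doc, score in zip(docs, scores):
            doc.cats = score

    def __call__(self, doc):
        # What the pipeline does for this component: predict, then annotate.
        self.set_annotations([doc], self.predict([doc]))
        return doc


class ToyPipeline:
    """Hypothetical stand-in for nlp: builds a Doc, then runs each component."""

    def __init__(self, components):
        self.components = components

    def __call__(self, text):
        doc = Doc(text)
        for component in self.components:
            doc = component(doc)
        return doc


toy_nlp = ToyPipeline([ToyTextCat()])
print(toy_nlp("Pork").cats)
```

Calling the pipeline and calling predict followed by set_annotations by hand end up filling the same cats attribute; the pipeline call simply does both steps for you.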
The API docs you're citing for the other approach give you more of a look "under the hood". So they're not really conflicting approaches ;-)
1) Please explain what TextGetTargetedSentiment does.
2) Please provide a Java code snippet that calls TextGetTargetedSentiment.
EDIT
API info is at
http://www.alchemyapi.com/api/sentiment/textc.html#targetedsentiment
As answered by Zach below, code snippet given by AlchemyAPI is
AlchemyAPI_TargetedSentimentParams sentimentParams = new AlchemyAPI_TargetedSentimentParams();
sentimentParams.setShowSourceText(true);
doc = alchemyObj.TextGetTargetedSentiment("This car is terrible.", "car", sentimentParams);
System.out.print(getStringFromDocument(doc));
Result is:
<totalTransactions>1</totalTransactions>
<language>english</language>
<text>This car is terrible.</text>
<docSentiment>
<score>-0.776261</score>
<type>negative</type>
</docSentiment>
If we change the statement to
"This car is superb."
then the result is:
<totalTransactions>1</totalTransactions>
<language>english</language>
<text>This car is superb.</text>
<docSentiment>
<score>0.695491</score>
<type>positive</type>
</docSentiment>
TextGetTargetedSentiment finds the sentiment for a specific keyword within a text. This can be contrasted with document level sentiment (the endpoint TextGetTextSentiment), which looks at the whole text to determine sentiment.
The AlchemyAPI Java SDK can help you get up and running quickly with the targeted sentiment call.
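If you want to pull the sentiment type and score out of a result like the one shown in the question, a minimal sketch in Python could look like the following. The XML is copied from the first result above; wrapping it in a <results> root element is an assumption made so that the fragment parses, and read_sentiment is a hypothetical helper, not part of any AlchemyAPI SDK.

```python
import xml.etree.ElementTree as ET

# Fragment from the first result in the question, wrapped in a <results>
# root (assumption: the post shows only the inner elements).
RESPONSE = """<results>
<totalTransactions>1</totalTransactions>
<language>english</language>
<text>This car is terrible.</text>
<docSentiment>
<score>-0.776261</score>
<type>negative</type>
</docSentiment>
</results>"""


def read_sentiment(xml_text):
    """Return (type, score) from the docSentiment element of a response."""
    root = ET.fromstring(xml_text)
    sentiment = root.find("docSentiment")
    return sentiment.findtext("type"), float(sentiment.findtext("score"))


label, score = read_sentiment(RESPONSE)
print(label, score)
```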
I would like to use Stanford CoreNLP for lemmatization, but there are some words that I do not want lemmatized. Is there a way to provide such an ignore list to the tool? I am following this code, and once the program calls this.pipeline.annotate(document); the lemmas are already set, so it would be hard to replace the occurrences afterwards. One solution is to create a mapping in which each word to be ignored is paired with lemmatize(word) (i.e., d = {(w1, lemmatize(w1)), (w2, lemmatize(w2)), ...}) and do the post-processing with this mapping. But I guess there should be an easier way than this.
Thanks for the help.
I think I found the solution with my friend's help.
for (CoreMap sentence : sentences) {
    // Iterate over all tokens in a sentence
    for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
        System.out.print(token.get(OriginalTextAnnotation.class) + "\t");
        System.out.println(token.get(LemmaAnnotation.class));
    }
}
You can get the original form of the word by calling token.get(OriginalTextAnnotation.class).
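Once both the original text and the lemma of each token are available, applying an ignore list is a small post-processing step. Here is a minimal, language-neutral sketch of that step in Python. The (original, lemma) pairs are hand-written illustrative data rather than real CoreNLP output, and apply_ignore_list is a hypothetical helper name.

```python
def apply_ignore_list(tokens, ignore):
    """Keep the original form for ignored words, the lemma for everything else.

    tokens: list of (original_text, lemma) pairs
    ignore: set of lower-cased words that must not be lemmatized
    """
    return [orig if orig.lower() in ignore else lemma for orig, lemma in tokens]


# Illustrative pairs, as they might be read from each token.
tokens = [("The", "the"), ("cats", "cat"), ("were", "be"), ("running", "run")]
ignore = {"cats"}

print(apply_ignore_list(tokens, ignore))
```

The same idea carries over to the Java loop: check the original text against the ignore set and pick either the original text or the lemma.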
FIRST:
Thank you for this great "Restler"!
MY CONTEXT:
I use Restler 3 to build my own API and return the total score of my users (on games).
Example: "www.mysite.com/api/score/12345" => returns the total score (JSON object) of user id "12345"
But to get the total score (across the different games), I need to use my own library (called "scorers") and its classes:
- vendor/damGames/scorers/scoreGameA.class.php
- vendor/damGames/scorers/scoreGameB.class.php
- vendor/damGames/scorers/scoreGameC.class.php
- etc...
MY QUESTION IS:
To load the classes of my library, should I use "Luracast\Restler\AutoLoader"? I don't understand how... If not, what should I do instead?
Thank you!
The only solution I found:
In the first lines of my Restler class called "Score", I put:
use Luracast\Restler\RestException;
use damGames\scorers\scoreGameA;
use damGames\scorers\scoreGameB;
use damGames\scorers\scoreGameC;
...
Is this a good way to do it? Is there something better?
I want to build a semi-natural language interface for a data warehouse. A simple data model looks for example like this:
Company
- attribute 'name'
- reference to 'Departments'
Department
- attribute 'type'
- reference to 'Employees'
Employee
- attribute 'age'
- attribute 'salary'
And I would like to make queries like so:
ACME employees, Bugs Bunny salary, ACME department types etc.
For input that is not in the grammar, I would call the database and resolve, say, ACME into Company.
... and turn the queries into paths that my database language will understand:
[Company].departments.employees, [Employee].salary, [Company].departments.type.
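To make the intended translation concrete, here is a minimal sketch in Python (not Prolog) of one way the three example queries could be mapped to paths. It is only an illustration under several assumptions: MODEL encodes the data model above by hand, ENTITIES is a hard-coded stand-in for the database lookup, the last word of the query is taken as the target, and plurals are handled by naively stripping a trailing "s".

```python
from collections import deque

# The data model from above: attributes plus named references to other classes.
MODEL = {
    "Company": {"attributes": ["name"], "references": {"departments": "Department"}},
    "Department": {"attributes": ["type"], "references": {"employees": "Employee"}},
    "Employee": {"attributes": ["age", "salary"], "references": {}},
}

# Stand-in for the database call that resolves a name into a class.
ENTITIES = {"ACME": "Company", "Bugs Bunny": "Employee"}


def find_path(start, target):
    """Breadth-first search for an attribute or reference called target."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        cls, path = queue.popleft()
        node = MODEL[cls]
        if target in node["attributes"] or target in node["references"]:
            return path + [target]
        for ref, child in node["references"].items():
            if child not in seen:
                seen.add(child)
                queue.append((child, path + [ref]))
    return None


def translate(query):
    """Turn e.g. 'ACME employees' into '[Company].departments.employees'."""
    for name, cls in ENTITIES.items():
        if query.startswith(name + " "):
            target = query[len(name):].split()[-1]
            # Try the word as given, then with a trailing "s" stripped.
            singular = target[:-1] if target.endswith("s") else target
            for candidate in (target, singular):
                path = find_path(cls, candidate)
                if path:
                    return "[%s].%s" % (cls, ".".join(path))
    return None


for q in ("ACME employees", "Bugs Bunny salary", "ACME department types"):
    print(q, "->", translate(q))
```

A real grammar would need to handle intermediate words such as "department" explicitly; this sketch ignores everything between the entity name and the last word.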
I remember using SWI-Prolog way back when to parse English sentences and say if they are correct. Is Prolog still the way to go in this case?
Thanks
Even better, I now use a DCG with embedded Prolog rules.
So, for a model that has classes and attributes like this:
c(company, class(company)) --> [company].
a(company, attribute(name)) --> [name].
I can ask for attributes of a class of a class:
q(q(A,C1,C2)) --> a(T1,A), [of], c(T1,C1)
    , [of], c(T2,C2), {is_child(T1, T2)}.
And get a tree back as an answer.
In SWI-Prolog, there is Chat80, ready to install. I think it could be very similar to what you are after, mutatis mutandis.
Here is a sample query from a session log (note: this was my own old port of Chat80 to SWI-Prolog; the pack is presumably more functional, but I haven't tried to run it):
what rivers are there ?
Parse: 0.0168457sec.
whq
   $VAR
      1
   s
      np
         3+plu
         np_head
            int_det(B)
            []
            river
         []
      verb(be,active,pres+fin,[],pos)
      void
      []
Semantics: 0.0170898sec.
answer([B]) :-
river(B)
& exists B
true
Planning: 0.0sec.
answer([B]) :-
river(B)
& exists B
true
amazon, amu_darya, amur, brahmaputra, colorado, congo_river, cubango, danube, don, elbe, euphrates, ganges, hwang_ho, indus, irrawaddy, lena, limpopo, mackenzie, mekong, mississippi, murray, niger_river, nile, ob, oder, orange, orinoco, parana, rhine, rhone, rio_grande, salween, senegal_river, tagus, vistula, volga, volta, yangtze, yenisei, yukon and zambesi.
Reply: 0.166992sec.
The logical form required by the discourse to answer a query is the central point of the system. It is not really easy to craft from the ground up!
I read the book Prolog and Natural Language Analysis, F. Pereira, S. Shieber, 1987
(translated into Italian), and it is still my favorite! The English original is freely available here.
I have ended up with this example, which translates a 'sentence' into a path in the model:
% root classes
class(ceo).
% model relations
attribute_of(age, ceo).
attribute_of(salary, ceo).
% grammar of relations
attribute('age', age).
attribute('salary', salary).
attribute('money', salary).
% answer format
answer([Class, Attribute], Class, Attribute).
% language rules
% query(Attribute,'of',Object, Answer).
query(AttributeQ, 'of', ClassQ, Answer) :-
    db(ClassQ, Class),
    attribute(AttributeQ, Attribute),
    attribute_of(Attribute, Class),
    answer(Answer, Class, Attribute).
% database
db('Bugs Bunny', ceo).
As an example, the following query:
?- query('age','of','Bugs Bunny', Answer).
...gives me:
Answer = [ceo, age].