How to combine two sentences using simplenlg - nlg

Given a set of sentences like "John has a cat" and "John has a dog", I would like to create a sentence like "John has a cat and dog".
Can I use SimpleNLG to do this?

The task you are asking about is called aggregation in Natural Language Generation (NLG). Whilst SimpleNLG does support aggregation with its realisation engine, it will not directly aggregate two strings such as those in your example.
It is possible however to use a syntactic parser and SimpleNLG to perform this task. I will first explain how to generate your target sentence using SimpleNLG grammar:
import simplenlg.framework.*;
import simplenlg.lexicon.*;
import simplenlg.realiser.english.*;
import simplenlg.phrasespec.*;
import simplenlg.features.*;

public class TestMain {
    public static void main(String[] args) throws Exception {
        Lexicon lexicon = Lexicon.getDefaultLexicon();
        NLGFactory nlgFactory = new NLGFactory(lexicon);
        Realiser realiser = new Realiser(lexicon);
        // Create the SPhraseSpec object (sentence phrase).
        SPhraseSpec p = nlgFactory.createClause();
        // Create a noun phrase and set it as the subject of your sentence
        NPPhraseSpec john = nlgFactory.createNounPhrase("John");
        p.setSubject(john);
        // Create a verb phrase and set it as the verb of your sentence
        VPPhraseSpec have = nlgFactory.createVerbPhrase("have");
        // Note that the verb is "have" not "has". Have is the base lemma.
        // The morphology of this will be handled based on the tense you set (see below)
        p.setVerb(have);
        // Create a determiner 'a'
        NPPhraseSpec a = nlgFactory.createNounPhrase("a");
        // Create two more noun phrases
        // One for cat
        NPPhraseSpec cat = nlgFactory.createNounPhrase("cat");
        // set the determiner
        cat.setDeterminer(a);
        // And one for dog.
        NPPhraseSpec dog = nlgFactory.createNounPhrase("dog");
        // set the determiner
        dog.setDeterminer(a);
        // Create a coordinated phrase
        // This tells SimpleNLG that these objects are a collection which should be aggregated
        CoordinatedPhraseElement coord = nlgFactory.createCoordinatedPhrase(cat, dog);
        // Set the coordinated phrase as the object of your sentence
        p.setObject(coord);
        // Print it
        String output = realiser.realiseSentence(p);
        System.out.println(output);
        // => John has a cat and a dog.
        // Now let's see what SimpleNLG can do!
        // Change the tense to past (present was the default)
        p.setTense(Tense.PAST);
        output = realiser.realiseSentence(p);
        System.out.println(output);
        // => John had a cat and a dog.
        // Change the tense to future
        p.setTense(Tense.FUTURE);
        output = realiser.realiseSentence(p);
        System.out.println(output);
        // => John will have a cat and a dog.
    }
}
That is how you work with language in the SimpleNLG realiser. It does not, however, answer your question of aggregating two strings directly. There may be other ways, but my first thought is to use a syntactic parser such as StanfordNLP or spaCy.
I use spaCy (a Python library) in my own work. I will show a brief example of what I mean here.
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp(u'John has a cat')
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
          token.shape_, token.is_alpha, token.is_stop)
This outputs:
John john PROPN NNP nsubj Xxxx True False
has have VERB VBZ ROOT xxx True True
a a DET DT det x True True
cat cat NOUN NN dobj xxx True False
You can see from the output that each token in the sentence has been marked as a noun, verb, determiner etc. You could use this information to format the input for SimpleNLG and then aggregate your sentences. I would suggest the XMLRealiser available in SimpleNLG would be better than just coding the grammar in Java. It takes XML as input.
NLP/NLG work is not trivial. Language is very complex. The above is just one way of approaching such a task. Tools might exist which just aggregate based on strings, but SimpleNLG is just a surface realiser so you would have to present it with input data in a suitable format as shown above.
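As a rough sketch of the step between parsing and realisation, the snippet below groups clauses that share a subject and verb, so that their objects can be handed to a coordinated phrase. It assumes the parse has already been reduced to (subject, verb lemma, object) triples, for example by reading off the nsubj, ROOT and dobj tokens; the aggregate function and the sample triples are hypothetical and are not part of SimpleNLG or spaCy.

```python
from collections import OrderedDict


def aggregate(clauses):
    """Group (subject, verb_lemma, object) triples that share subject and verb.

    Returns a list of (subject, verb_lemma, [objects]) in first-seen order.
    """
    grouped = OrderedDict()
    for subject, verb, obj in clauses:
        objects = grouped.setdefault((subject, verb), [])
        if obj not in objects:  # skip exact duplicates
            objects.append(obj)
    return [(s, v, objs) for (s, v), objs in grouped.items()]


# Hypothetical triples, as might be extracted from parsed sentences.
clauses = [
    ("John", "have", "a cat"),
    ("John", "have", "a dog"),
    ("Mary", "have", "a bird"),
]
result = aggregate(clauses)
print(result)
```

Each resulting group would then map onto one clause: the subject and verb go to setSubject and setVerb, and the list of objects goes to a coordinated phrase as in the Java example.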

Related

Do any functional programming languages have syntax sugar for changing part of an object?

In imperative programming, there is concise syntax sugar for changing part of an object, e.g. assigning to a field:
foo.bar = new_value
Or to an element of an array, or in some languages an array-like list:
a[3] = new_value
In functional programming, the idiom is not to mutate part of an existing object, but to create a new object with most of the same values, but a different value for that field or element.
At the semantic level, this brings about significant improvements in ease of understanding and composing code, albeit not without trade-offs.
I am asking here about the trade-offs at the syntax level. In general, creating a new object with most of the same values, but a different value for one field or element, is a much more heavyweight operation in terms of how it looks in your code.
Is there any functional programming language that provides syntax sugar to make that operation look more concise? Obviously you can write a function to do it, but imperative languages provide syntax sugar to make it more concise than calling a procedure; do any functional languages provide syntax sugar to make it more concise than calling a function? I could swear that I have seen syntax sugar for at least the object.field case, in some functional language, though I forget which one it was.
(Performance is out of scope here. In this context, I am talking only about what the code looks like and does, not how fast it does it.)
Haskell records have this functionality. You can define a record to be:
data Person = Person
  { name :: String
  , age :: Int
  }
And an instance:
johnSmith :: Person
johnSmith = Person
  { name = "John Smith"
  , age = 24
  }
And create an altered copy:
johnDoe :: Person
johnDoe = johnSmith {name = "John Doe"}
-- Result:
-- johnDoe = Person
-- { name = "John Doe"
-- , age = 24
-- }
This syntax, however, is cumbersome when you have to update deeply nested records. The lens library solves this problem quite well.
However, Haskell lists do not provide an update syntax because updating on lists will have an O(n) cost - they are singly-linked lists.
If you want efficient update on list-like collections, you can use Arrays in the array package, or Vectors in the vector package. They both have the infix operator (//) for updating:
alteredVector = someVector // [(1, "some value")]
-- similar to `someVector[1] = "some value"`
It is not built-in syntax, but I think infix notation is convenient enough!
One language with that kind of sugar is F#. It allows you to write
let myRecord3 = { myRecord2 with Y = 100; Z = 2 }
Scala also has sugar for updating a Map:
ms + (k -> v)
ms updated (k,v)
In a language such as Haskell, you would need to write this yourself. If you can express the update as a key-value pair, you might define
let structure' =
      update structure key value
or
update structure (key, value)
which would let you use infix notation such as
structure `update` (key, value)
structure // (key, value)
As a proof of concept, here is one possible (inefficient) implementation, which also fails if your index is out of range:
module UpdateList (updateList, (//)) where
import Data.List (splitAt)
updateList :: [a] -> (Int,a) -> [a]
updateList xs (i,y) = let ( initial, (_:final) ) = splitAt i xs
                      in initial ++ (y:final)
infixl 6 // -- Same precedence as +
(//) :: [a] -> (Int,a) -> [a]
(//) = updateList
With this definition, ["a","b","c","d"] // (2,"C") returns ["a","b","C","d"]. And [1,2] // (2,3) throws a runtime exception, but I leave that as an exercise for the reader.
H. Rhen gave an example of Haskell record syntax that I did not know about, so I’ve removed the last part of my answer. See theirs instead.

How to use Stanford parser to find the parent and child of the word?

Is the following a correct way to extract the parents and children of a word in a sentence?
String sentence = "Kumi is the girl who likes dogs.";
Tree parse = lp.parse(sentence);
SemanticGraph graph = SemanticGraphFactory.makeFromTree(parse);
IndexedWord wordConnective = graph.getNodeByIndexSafe(i);//i is the index of the word
List<IndexedWord> parentWords = graph.getParentList(wordConnective);
List<IndexedWord> childWord = graph.getChildList(wordConnective);
If I want to extract the heads of "who" in this sentence, how should I do it?
An easy solution, using the simple API, might be:
new Sentence("Kumi is the girl who likes dogs").governor(4);

Graphlab: How to avoid manually duplicating functions that differ only in a string variable?

I imported my dataset with SFrame:
products = graphlab.SFrame('amazon_baby.gl')
products['word_count'] = graphlab.text_analytics.count_words(products['review'])
I would like to do sentiment analysis on a set of words shown below:
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']
Then I would like to create a new column for each of the selected words in the products matrix, where each entry is the number of times that word occurs. I created a function for the word "awesome":
def awesome_count(word_count):
    # word_count is the bag-of-words dict for one review
    if 'awesome' in word_count:
        return word_count['awesome']
    else:
        return 0
products['awesome'] = products['word_count'].apply(awesome_count)
So far so good, but I would need to manually create a function for each of the other selected words in the same way, e.g., great_count. How can I avoid this manual effort and write cleaner code?
I think the SFrame.unpack command should do the trick. In fact, the limit parameter will accept your list of selected words and keep only these results, so that part is greatly simplified.
I don't know precisely what's in your reviews data, so I made a toy example:
# Create the data and convert to bag-of-words.
import graphlab
products = graphlab.SFrame({'review':['this book is awesome',
                                      'I hate this book']})
products['word_count'] = \
    graphlab.text_analytics.count_words(products['review'])
# Unpack the bag-of-words into separate columns.
selected_words = ['awesome', 'hate']
products2 = products.unpack('word_count', limit=selected_words)
# Fill in zeros for the missing values.
for word in selected_words:
    col_name = 'word_count.{}'.format(word)
    products2[col_name] = products2[col_name].fillna(value=0)
I also can't help but point out that GraphLab Create does have its own sentiment analysis toolkit, which could be worth checking out.
I actually found an easier way to do this:
def wordCount_select(wc, selectedWord):
    if selectedWord in wc:
        return wc[selectedWord]
    else:
        return 0
for word in selected_words:
    products[word] = products['word_count'].apply(lambda wc: wordCount_select(wc, word))
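The same idea can be shown without GraphLab. This is only a sketch: it assumes each entry of the word_count column behaves like an ordinary Python dict, and it uses a plain list of dicts as a stand-in for that column. The count_of helper is a hypothetical name; it builds one lookup function per word, and dict.get replaces the if/else.

```python
def count_of(word):
    """Return a function that looks up `word` in a bag-of-words dict."""
    def lookup(word_count):
        # dict.get returns the default (0) when the word is absent
        return word_count.get(word, 0)
    return lookup


selected_words = ['awesome', 'hate']

# Stand-in for products['word_count']: one bag-of-words dict per review.
word_counts = [
    {'this': 1, 'book': 1, 'is': 1, 'awesome': 1},
    {'i': 1, 'hate': 1, 'this': 1, 'book': 1},
]

# One "column" per selected word, built in a single loop.
columns = {word: [count_of(word)(wc) for wc in word_counts]
           for word in selected_words}
print(columns)
```

With an SFrame, the equivalent loop body would presumably be products[word] = products['word_count'].apply(count_of(word)).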

How to Connect Logic with Objects

I have a system that contains some number of strings. These strings are shown in a UI based on some logic. For example, string 1 should only show if the current time is past midday, and string 3 should only show if a randomly generated number between 0 and 1 is less than 0.5.
What would be the best way to model this?
Should the logic just be in code and be linked to a string by some sort of ID?
Should the logic somehow be stored with the strings?
NOTE The above is a theoretical example before people start questioning my logic.
It's usually better to keep resources (such as strings) separate from logic. So referring to strings by IDs is a good idea.
It seems that you have a bunch of rules which you have to link to the display of strings. I'd keep all three as separate entities: rules, strings, and the linking between them.
An illustration in Python, necessarily simplified:
STRINGS = {
    'morning': 'Good morning',
    'afternoon': 'Good afternoon',
    'luck': 'you must be lucky today',
}
# predicates
import datetime, random
def showMorning():
    return datetime.datetime.now().hour < 12
def showAfternoon():
    return datetime.datetime.now().hour >= 12
def showLuck():
    return random.random() > 0.5
# interconnection
RULES = {
    'morning': showMorning,
    'afternoon': showAfternoon,
    'luck': showLuck,
}
# usage
for string_id, predicate in RULES.items():
    if predicate():
        print(STRINGS[string_id])
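One possible variation, sketched under the assumption that you want to test the rules deterministically: pass the current time and the random roll into each predicate instead of reading them inside it. The visible_strings function and its now and roll parameters are hypothetical names introduced for this illustration.

```python
import datetime
import random

STRINGS = {
    'morning': 'Good morning',
    'afternoon': 'Good afternoon',
    'luck': 'you must be lucky today',
}

# Each rule receives the inputs it may depend on.
RULES = {
    'morning': lambda now, roll: now.hour < 12,
    'afternoon': lambda now, roll: now.hour >= 12,
    'luck': lambda now, roll: roll > 0.5,
}


def visible_strings(now=None, roll=None):
    """Return the strings whose rule holds, ordered by string ID."""
    now = datetime.datetime.now() if now is None else now
    roll = random.random() if roll is None else roll
    return [STRINGS[sid] for sid in sorted(RULES) if RULES[sid](now, roll)]


# Fixed inputs make the result reproducible: 15:00 with a roll of 0.9.
shown = visible_strings(datetime.datetime(2020, 1, 1, 15, 0), 0.9)
print(shown)
```

Calling visible_strings() with no arguments falls back to the real clock and a fresh random number, so the UI code does not need to change.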

Is Odersky serious with "bills !*&^%~ code!"?

In his book Programming in Scala (Chapter 5, Section 5.9, p. 93),
Odersky mentions this expression: "bills !*&^%~ code!"
In the footnote on the same page:
"By now you should be able to figure out that given this code, the Scala compiler would
invoke (bills.!*&^%~(code)).!()."
That's a bit too cryptic for me. Could someone explain what's going on here?
What Odersky means to say is that it would be possible to have valid code looking like that. For instance, the code below:
class BadCode(whose: String, source: String) {
  def ! = println(whose+", what the hell do you mean by '"+source+"'???")
}
class Programmer(who: String) {
  def !*&^%~(source: String) = new BadCode(who, source)
}
val bills = new Programmer("Bill")
val code = "def !*&^%~(source: String) = new BadCode(who, source)"
bills !*&^%~ code!
Just copy and paste it into the REPL.
The period is optional for calling a method that takes a single parameter, or has an empty parameter list.
When this feature is utilized, the next chunk after the space following the method name is assumed to be the single parameter.
Therefore,
(bills.!*&^%~(code)).!().
is identical to
bills !*&^%~ code!
The second exclamation mark calls a method on the returned value from the first method call.
I'm not sure if the book provides method signatures but I assume it's just a comment on Scala's syntactic sugar so it assumes if you type:
bill add monkey
where there is an object bill which has a method add which takes a parameter then it automatically interprets it as:
bill.add(monkey)
Being a little rusty with Scala, I'm not entirely sure how it splits code! into (code).!(), except for a vague tickling of the grey cells that the ! operator is used to fire off an actor, which in compiler terms might be interpreted as an implicit .!() method on the object.
The combination of the '.()' being optional with method calls (as Wysawyg explained above) and the ability to use (almost) whatever characters you like for naming methods, makes it possible to write methods in Scala that look like operator overloading. You can even invent your own operators.
For example, I have a program that deals with 3D computer graphics. I have my own class Vector for representing a 3D vector:
class Vector(val x: Double, val y: Double, val z: Double) {
  def +(v: Vector) = new Vector(x + v.x, y + v.y, z + v.z)
  // ...etc.
}
I've also defined a method ** (not shown above) to compute the cross product of two vectors. It's very convenient that you can create your own operators like that in Scala; not many other programming languages have this flexibility.

Resources