Steps to generate parse tree from CYK Algorithm (Natural Language Processing)

I am currently working on a project involving NLP. I have implemented a CKY recognizer as given in Jurafsky and Martin (algorithm on page 450). The table it produces stores the nonterminals in each cell (instead of the usual boolean values). The only problem I have is retrieving the parse tree.
Here is an illustration of what my CKY recognizer does.
This is my grammar:
S -> NP VP
S -> VP
NP -> MODAL PRON | DET NP | NOUN VF | NOUN | DET NOUN | DET FILENAME
MODAL -> 'MD'
PRON -> 'PPSS' | 'PPO'
VP -> VERB NP
VP -> VERB VP
VP -> ADVERB VP
VP -> VF
VERB -> 'VB' | 'VBN'
NOUN -> 'NN' | 'NP'
VF -> VERB FILENAME
FILENAME -> 'NN' | 'NP'
ADVERB -> 'RB'
DET -> 'AT'
And this is the algorithm:
for j from 1 to LENGTH(words) do
    table[j-1, j] = {A such that A -> POS(words[j])}
    for i from j-2 downto 0 do
        for k from i+1 to j-1 do
            table[i, j] = Union(table[i, j], {A such that A -> B C,
                                where B is in table[i, k] and C is in table[k, j]})
And this is what my parsing table looks like after being filled:
Since S resides in [0,5], I know that the string has been parsed, and that for k = 1 (as per the algorithm given in Jurafsky and Martin) we have S -> table[0][2] table[2][5],
i.e. S -> NP VP.
The only problem is that, although I can retrieve the rules used, they come out jumbled, i.e. not in the order in which they appear in the parse tree. Can someone suggest an algorithm to retrieve the correct parse tree?
Thank you.

You should recursively visit the cells of your table and unfold them the same way you did for the S node, until everything is a terminal (so there is nothing left to unfold). In your example, you first go to cell [0][2]; this is a terminal, so you don't have to do anything. Next you go to [2][5], which is a non-terminal made from [2][3] and [3][5]. You visit [2][3]; it's a terminal. [3][5] is a non-terminal made from two terminals. You are done. Here is a demo in Python:
class Node:
    '''Think of this as a cell in your table'''
    def __init__(self, left, right, type, word):
        self.left = left
        self.right = right
        self.type = type
        self.word = word

# Declare terminals
t1 = Node(None, None, 'MOD', 'can')
t2 = Node(None, None, 'PRON', 'you')
t3 = Node(None, None, 'VERB', 'eat')
t4 = Node(None, None, 'DET', 'a')
t5 = Node(None, None, 'NOUN', 'flower')

# Declare non-terminals
nt1 = Node(t1, t2, 'NP', None)
nt2 = Node(t4, t5, 'NP', None)
nt3 = Node(t3, nt2, 'VP', None)
nt4 = Node(nt1, nt3, 'S', None)

def unfold(node):
    # Check for a terminal
    if node.left is None and node.right is None:
        return node.word + "_" + node.type
    # Otherwise unfold both children and label the bracket with this node's type
    return "[" + unfold(node.left) + " " + unfold(node.right) + "]_" + node.type

print(unfold(nt4))
And the output:
[[can_MOD you_PRON]_NP [eat_VERB [a_DET flower_NOUN]_NP]_VP]_S

Related

Binary trees post traversal into a linked list

I'm studying for my exam and I can't figure out how to answer this question:
Make a function that returns a linked list of the postorder traversal of a binary tree. You cannot use any containers to store your data beforehand, and it must be recursive.
So I can't take the postorder traversal, put it in a list, and then iterate through it to make a linked list.
There was another question that asked for the same thing but for an inorder traversal, and this was the solution.
There are two solutions: one that I don't understand, and a clearer version of it.
THIS IS THE MAIN SOLUTION
def inorder(root: BTNode) -> LLNode:
    """Return the first node in a linked list that contains every value from the
    binary tree rooted at root, listed according to an inorder traversal.

    >>> b = BTNode(1, BTNode(2), BTNode(3))
    >>> repr(inorder(b))
    'LLNode(2, LLNode(1, LLNode(3)))'
    >>> b2 = BTNode(4, BTNode(5))
    >>> b3 = BTNode(7, b, b2)
    >>> str(inorder(b3))
    '2 -> 1 -> 3 -> 7 -> 5 -> 4'
    >>> # from the handout...
    >>> left = BTNode('B', None, BTNode('D', BTNode('G')))
    >>> right = BTNode('C', BTNode('E'), BTNode('F'))
    >>> root = BTNode('A', left, right)
    >>> str(inorder(root))
    'B -> G -> D -> A -> E -> C -> F'
    """
    return _inorder(root)[0]


def _inorder(root: BTNode) -> (LLNode, LLNode):
    """Return the first and last nodes in a linked list that contains every
    value from the binary tree rooted at root, listed according to an inorder
    traversal.
    """
    if root:
        head_left, tail_left = _inorder(root.left)
        head_right, tail_right = _inorder(root.right)
        node_root = LLNode(root.item, head_right)
        if tail_left:
            tail_left.link = node_root
        return head_left or node_root, tail_right or node_root
    else:
        return None, None
CLEAR SOLUTION
def inorder(root: BTNode) -> LLNode:
    """Return the first node in a linked list that contains every value from the
    binary tree rooted at root, listed according to an inorder traversal.

    >>> b = BTNode(1, BTNode(2), BTNode(3))
    >>> repr(inorder(b))
    'LLNode(2, LLNode(1, LLNode(3)))'
    >>> b2 = BTNode(4, BTNode(5))
    >>> b3 = BTNode(7, b, b2)
    >>> str(inorder(b3))
    '2 -> 1 -> 3 -> 7 -> 5 -> 4'
    >>> # from the handout...
    >>> left = BTNode('B', None, BTNode('D', BTNode('G')))
    >>> right = BTNode('C', BTNode('E'), BTNode('F'))
    >>> root = BTNode('A', left, right)
    >>> str(inorder(root))
    'B -> G -> D -> A -> E -> C -> F'
    """
    return _inorder(root)[0]  # what must this first item represent?


def _inorder(root: BTNode) -> (LLNode, LLNode):  # what are these 1st and 2nd things?
    """Return the first and last nodes in a linked list that contains every
    value from the binary tree rooted at root, listed according to an inorder
    traversal.

    >>> left = BTNode('B', None, BTNode('D', BTNode('G')))
    >>> right = BTNode('C', BTNode('E'), BTNode('F'))
    >>> root = BTNode('A', left, right)
    >>> str(inorder(root))
    'B -> G -> D -> A -> E -> C -> F'
    """
    if not root:
        return None, None
    else:
        # Start off by making a new node of our item, with None for a link
        # Obviously we will need to replace that None if we have a right branch
        new_node = LLNode(root.item)
        # Recursive call on right branch gives us its head and tail
        right_head, right_tail = _inorder(root.right)
        # The link on our new node should be the right head, even if it's None
        new_node.link = right_head
        # Ultimately the tail for this whole node will be the rightmost tail
        # If there is no right side, though, this node is the rightmost tail
        if not right_tail:
            right_tail = new_node
        # Recursive call on left branch gives us its head and tail
        left_head, left_tail = _inorder(root.left)
        # If there is a left tail, we should string our current node to the end
        if left_tail:
            left_tail.link = new_node
        # Ultimately the head for this whole node will be the leftmost head
        # If there is no left head, though, this node is the leftmost head
        if not left_head:
            left_head = new_node
        # Return the leftmost head and the rightmost tail
        return left_head, right_tail
I am not sure what keeps you from understanding the main solution. The comment explains it very well. In any case, the recursive call to _inorder(root.left) flattens the left subtree and returns the head and tail of the resulting list. In the same way, the recursive call to _inorder(root.right) flattens the right subtree. Now you have two lists,
head_left-> ... ->tail_left
head_right-> ... ->tail_right
stitch them with the root,
head_left-> ... ->tail_left->root->head_right-> ... ->tail_right
and return the resulting list
head_left-> ... ->tail_right
To achieve the postorder, stitch them as in
head_left-> ... ->tail_left->head_right-> ... ->tail_right->root
and return
head_left-> ... ->root
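The postorder stitching described above could be sketched as follows. This is only an illustration under assumptions: BTNode and LLNode here are minimal stand-ins for the classes from the exercise (the real ones were not shown), and to_list is a hypothetical helper used only to inspect the finished result, not to build it.

```python
class BTNode:
    """Minimal stand-in for the exercise's binary tree node (assumed shape)."""
    def __init__(self, item, left=None, right=None):
        self.item, self.left, self.right = item, left, right


class LLNode:
    """Minimal stand-in for the exercise's linked list node (assumed shape)."""
    def __init__(self, item, link=None):
        self.item, self.link = item, link


def postorder(root):
    """Return the first node of a linked list holding the postorder traversal."""
    return _postorder(root)[0]


def _postorder(root):
    """Return (head, tail) of the postorder list for the tree rooted at root."""
    if root is None:
        return None, None
    head_left, tail_left = _postorder(root.left)
    head_right, tail_right = _postorder(root.right)
    node_root = LLNode(root.item)
    # The left list is followed by the right list, or directly by the root
    # when there is no right subtree.
    if tail_left:
        tail_left.link = head_right or node_root
    # The right list is always followed by the root.
    if tail_right:
        tail_right.link = node_root
    # The root comes last, so it is always the tail of the combined list.
    return head_left or head_right or node_root, node_root


def to_list(node):
    """Display helper only: walk the finished linked list into a Python list."""
    items = []
    while node:
        items.append(node.item)
        node = node.link
    return items


left = BTNode('B', None, BTNode('D', BTNode('G')))
right = BTNode('C', BTNode('E'), BTNode('F'))
print(to_list(postorder(BTNode('A', left, right))))
# ['G', 'D', 'B', 'E', 'F', 'C', 'A']
```

The only difference from the inorder version is where the root node is linked in: it is created with no link and appended after whichever sublist ends last.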

DPLL algorithm and number of visited nodes

I'm implementing a DPLL algorithm that counts the number of visited nodes. I managed to implement DPLL without counting visited nodes, but I can't think of a solution to the counting problem. The main problem is that when the algorithm finds a satisfying valuation and returns True, the recursion unwinds and returns the counter value from the moment the recursion started. In an imperative language I would just use a global variable and increment it each time the function is invoked, but that is not an option in Haskell.
The code I pasted here does not represent my attempts to solve the counting problem; it is just my solution without it. I tried to use tuples such as (Bool,Int), but that always returns the integer value from the moment the recursion started.
This is my implementation, where (Node -> Variable) is a heuristic function, Sentence is a list of clauses in CNF to be satisfied, [Variable] is a list of literals not yet assigned, and Model is just a truth valuation.
dpll' :: (Node -> Variable) -> Sentence -> [Variable] -> Model -> Bool
dpll' heurFun sentence vars model
  | satisfiesSentence model sentence = True
  | falsifiesSentence model sentence = False
  | otherwise = applyRecursion
  where
    applyRecursion
      | pureSymbol /= Nothing = recurOnPureSymbol
      | unitSymbol /= Nothing = recurOnUnitSymbol
      | otherwise = recurUsingHeuristicFunction
      where
        pureSymbol = findPureSymbol vars sentence model
        unitSymbol = findUnitClause sentence model
        heurVar = heurFun (sentence,(vars,model))
        recurOnPureSymbol =
          dpll' heurFun sentence (vars \\ [getVar pureSymbol]) ((formAssignment pureSymbol):model)
        recurOnUnitSymbol =
          dpll' heurFun sentence (vars \\ [getVar unitSymbol]) ((formAssignment unitSymbol):model)
        recurUsingHeuristicFunction = case vars of
          (v:vs) -> (dpll' heurFun sentence (vars \\ [heurVar]) ((AS (heurVar,True)):model)
                  || dpll' heurFun sentence (vars \\ [heurVar]) ((AS (heurVar,False)):model))
          [] -> False
I would really appreciate any advice on how to count the visited nodes. Thank you.
EDIT:
The only libraries I'm allowed to use are System.Random, Data.Maybe and Data.List.
EDIT:
One possible solution I tried to implement is to use a tuple (Bool,Int) as the return value of the DPLL function. The code looks like:
dpll'' :: (Node -> Variable) -> Sentence -> [Variable] -> Model -> Int -> (Bool,Int)
dpll'' heurFun sentence vars model counter
  | satisfiesSentence model sentence = (True,counter)
  | falsifiesSentence model sentence = (False,counter)
  | otherwise = applyRecursion
  where
    applyRecursion
      | pureSymbol /= Nothing = recurOnPureSymbol
      | unitSymbol /= Nothing = recurOnUnitSymbol
      | otherwise = recurUsingHeuristicFunction
      where
        pureSymbol = findPureSymbol vars sentence model
        unitSymbol = findUnitClause sentence model
        heurVar = heurFun (sentence,(vars,model))
        recurOnPureSymbol =
          dpll'' heurFun sentence (vars \\ [getVar pureSymbol]) ((formAssignment pureSymbol):model) (counter + 1)
        recurOnUnitSymbol =
          dpll'' heurFun sentence (vars \\ [getVar unitSymbol]) ((formAssignment unitSymbol):model) (counter + 1)
        recurUsingHeuristicFunction = case vars of
          (v:vs) -> ((fst $ dpll'' heurFun sentence (vars \\ [heurVar]) ((AS (heurVar,True)):model) (counter + 1))
                  || (fst $ dpll'' heurFun sentence (vars \\ [heurVar]) ((AS (heurVar,False)):model) (counter + 1)),counter)
          [] -> (False,counter)
The basic idea of this approach is to increment the counter at each recursive call. However, the problem is that I have no idea how to retrieve the counter from the recursive calls in the OR expression. I'm not even sure if this is possible in Haskell.
You can retrieve the counter from the recursive call using case or similar.
recurUsingHeuristicFunction = case vars of
  v:vs -> case dpll'' heurFun sentence (vars \\ [heurVar]) (AS (heurVar,True):model) (counter + 1) of
    (result, counter') -> case dpll'' heurFun sentence (vars \\ [heurVar]) (AS (heurVar,False):model) counter' of
      (result', counter'') -> (result || result', counter'')
  [] -> (False,counter)
This is a manual implementation of the State monad. However, it's not clear to me why you are passing in a counter at all. Just return it. Then it is the simpler Writer monad instead. The code for this helper would look something like this:
recurUsingHeuristicFunction = case vars of
  v:vs -> case dpll'' heurFun sentence (vars \\ [heurVar]) (AS (heurVar,True):model) of
    (result, counter) -> case dpll'' heurFun sentence (vars \\ [heurVar]) (AS (heurVar,False):model) of
      (result', counter') -> (result || result', counter + counter' + 1)
  [] -> (False,0)
Other results would be similar -- returning 0 instead of counter and 1 instead of counter+1 -- and the call to the function would be simpler, with one fewer argument to worry about setting up correctly.
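To make the "just return the count" idea concrete outside of Haskell, here is a small language-neutral sketch in Python. It is not DPLL proper: there is no pure-symbol or unit-clause step, and the function name search, the integer literal encoding (a negative number means a negated variable) and the dictionary model are all assumptions made for the illustration. Each call returns its result together with the number of nodes it visited, and the caller adds the counts up.

```python
def search(clauses, variables, model):
    """Return (satisfiable, visited_nodes) for a plain backtracking search.

    clauses   -- list of clauses, each a list of non-zero ints (assumed encoding)
    variables -- list of variable numbers not yet assigned
    model     -- dict mapping variable number to True/False
    """
    def status(clause):
        # True if some literal is satisfied, False if all are falsified,
        # None if the clause is still undecided under the current model.
        undecided = False
        for lit in clause:
            value = model.get(abs(lit))
            if value is None:
                undecided = True
            elif value == (lit > 0):
                return True
        return None if undecided else False

    states = [status(c) for c in clauses]
    if all(s is True for s in states):
        return True, 1            # this node only
    if any(s is False for s in states) or not variables:
        return False, 1           # this node only

    var, rest = variables[0], variables[1:]
    found, count = search(clauses, rest, {**model, var: True})
    if found:
        # Short-circuit like ||: the second branch is never visited.
        return True, count + 1
    found2, count2 = search(clauses, rest, {**model, var: False})
    return found2, count + count2 + 1


print(search([[1, 2], [-1], [-2, 3]], [1, 2, 3], {}))  # (True, 5)
```

Note the design choice in the branching step: the second recursive call is made only when the first one fails, so the count reflects the nodes that were actually explored.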
Basically, what you described as your solution in an imperative language can be modeled by passing around a counting variable and adding the variable to the result at the time you return it (the bottom of the recursion that reaches a satisfying assignment), i.e. for a function a -> b you would create a new function a -> Int -> (b, Int). The Int argument is the current state of the counter, and the result is enriched with the updated state of the counter.
This can further be re-expressed elegantly using the state monad. A very nice tutorial on haskell in general and state monad is here. Basically, the transformation of a -> b to a -> Int -> (b, Int) can be seen as a transformation of a -> b into a -> State Int b, simply by giving a nicer name to the function Int -> (b, Int). There is a very nice blog that explains where these nice abstractions come from in a very accessible way.
import Control.Monad.Trans.State

type Count = Int

dpllM :: (Node -> Variable) -> Sentence -> [Variable] -> Model -> State Count Bool
dpllM heurFun sentence vars model | ... = do
  -- do your thing
  modify (+1)
  -- do your thing

dpll' :: (Node -> Variable) -> Sentence -> [Variable] -> Model -> Bool
-- evalState discards the final counter; use runState to get (Bool, Count) instead
dpll' heurFun sentence vars model = evalState (dpllM heurFun sentence vars model) 0
Maybe you want something like
f :: A -> Int -> (Bool, Int)
f a c =
  let a' = ...
      a'' = ...
      (b', c') = f a' c
  in f a'' c'

F# Tree: Node Insertion

This is a question that extends F# Recursive Tree Validation, which was nicely answered yesterday.
This question concerns inserting a child in an existing tree. This is the updated type I'd like to use:
type Name = string
type BirthYear = int
type FamilyTree = Person of Name * BirthYear * Children
and Children = FamilyTree list
My last question concerned checking the validity of the tree; this was the solution I decided to go with:
let rec checkAges minBirth = function
    | Person(_,b,_) :: t -> b >= minBirth && checkAges b t
    | [] -> true
let rec validate (Person(_,b,c)) =
    List.forall isWF c && checkAges (b + 16) c
Now I would like to be able to insert a Person Simon as a child of a specific Person Hans, in the following form:
insertChildOf "Hans" simon:Person casperFamily:FamilyTree;;
So the input should be the parent's name, the child, and the family tree. Ideally it should then return a modified family tree, that is, a FamilyTree option.
What I am struggling with is incorporating the validate function to make sure the result is legal, and finding a way to insert the child properly into the list of children if the insertion Person is already a parent - maybe as a separate function.
All help is welcome and very appreciated - thanks! :)
Following your comment, here's code that behaves as expected:
let insert pntName (Person(_, newPrsnYear, _) as newPrsn) (Person (n,y,ch)) =
    let rec ins n y = function
        | [] -> if y < newPrsnYear && n = pntName then Some [newPrsn] else None
        | (Person (name, year, childs) as person) :: bros ->
            let tryNxtBros() = Option.map (fun x -> person::x) (ins n y bros)
            if y < newPrsnYear && n = pntName then // father OK
                if newPrsnYear < year then // brother OK -> insert here
                    Some (newPrsn::person::bros)
                else tryNxtBros()
            else // keep looking, first into eldest child ...
                match ins name year childs with
                | Some i -> Some (Person (name, year, i) :: bros)
                | _ -> tryNxtBros() // ... then into other childs
    Option.map (fun x -> Person (n, y, x)) (ins n y ch)
As in my previous answer, I keep avoiding List functions, since I don't think they are a good fit for a tree structure unless the tree provides a traverse.
I might be a bit of a purist in the sense that I use either List functions (with lambdas and combinators) or pure recursion, but in general I don't like mixing them.

Implement tree builder with F#

I am pretty new to F# and I wanted to implement a solution to the following problem:
From a sequence of disk paths discovered in random order (e.g. "C:\Hello\foo", "C:", "C:\Hello\bar", etc.), how can I build the tree efficiently?
Assumption: the sequence is valid, which means the tree can actually be created.
So I tried to implement it with a recursive function ("mergeInto" below) that merges a list of strings (the split path, called "branch") into the tree "in place".
Here is my implementation. Immutability prevents side effects on the input tree, so I tried to use a ref cell for the input Tree, but I ran into difficulty with the recursion. Any solution?
open System
open Microsoft.VisualStudio.TestTools.UnitTesting

type Tree =
    | Node of string * list<Tree>
    | Empty

let rec branchToTree (inputList:list<string>) =
    match inputList with
    | [] -> Tree.Empty
    | head::tail -> Tree.Node (head, [branchToTree tail])

//branch cannot be empty list
let rec mergeInto (tree:Tree ref) (branch:list<string>) =
    match !tree,branch with
    | Node (value,_), head::tail when String.op_Inequality(value, head) -> raise (ApplicationException("Oops invariant loop broken"))
    | Node (value,_), [_] -> ignore() //the branch is singleton and by loop invariant its head is the current Tree node -> nothing to do.
    | Node (value,children), _ ->
        let nextBranchValue = branch.Tail.Head //valid because of previous match
        //broken attempt to retrieve a ref to the proper child
        let targetChild = children
                          |> List.map (fun(child) -> ref child)
                          |> List.tryFind (fun(child) -> match !child with
                                                         | Empty -> false
                                                         | Node (value,_) -> value = nextBranchValue)
        match targetChild with
        | Some x -> mergeInto x branch.Tail //a valid child match then go deeper. NB: branch.Tail cannot be empty here
        | None -> tree := Node(value, (Node (nextBranchValue,[])) :: children)//attach the next branch value to the children
    | Empty,_ -> tree := branchToTree branch

[<TestClass>]
type TreeTests () =
    [<TestMethod>]
    member this.BuildTree () =
        let initialTree = ref Tree.Empty
        let branch1 = ["a";"b";"c"]
        let branch2 = ["a";"b";"d"]
        do mergeInto initialTree branch1
        //-> my tree is ok
        do mergeInto initialTree branch2
        //->not ok, expected a
        //                 |
        //                 b
        //                / \
        //               d   c
You can't make a ref to an element in a list, change the ref and then expect the item in the list to change. If you really want to do that then you should put the references into your Tree type.
type Tree =
    | Node of string * list<Tree ref>
    | Empty

let rec branchToTree (inputList:list<string>) =
    match inputList with
    | [] -> Tree.Empty
    | head::tail -> Tree.Node(head, [ref (branchToTree tail)])
If you do that and remove the List.map (fun(child) -> ref child) part, your code works.
You might be interested in zippers, which allow you to do something similar but without mutation.
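For comparison, the same "references live inside the tree" idea can be sketched in Python rather than F#. This is a different representation from the Tree type above, chosen only for illustration: each node is a dict mapping a child name to that child's own dict, so every child is already a mutable reference. The function name merge_into is hypothetical.

```python
def merge_into(tree, branch):
    """Merge one split path (a list of names) into a nested-dict tree in place."""
    node = tree
    for name in branch:
        # Reuse the existing child if there is one, otherwise attach a new one.
        node = node.setdefault(name, {})
    return tree


tree = {}
merge_into(tree, ["a", "b", "c"])
merge_into(tree, ["a", "b", "d"])
print(tree)  # {'a': {'b': {'c': {}, 'd': {}}}}
```

Because the child dicts are shared rather than copied, updating a child found during the walk also updates the parent that holds it, which is exactly what the copied ref cells in the question could not do.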

Transform one word into another by changing, inserting, or deleting one character at a time

Given a finite dictionary of words and a start-end pair (e.g. "hands" and "feet" in the example below), find the shortest sequence of words such that any word in the sequence can be formed from either of its neighbors by either 1) inserting one character, 2) deleting one character, or 3) changing one character.
hands ->
hand ->
and ->
end ->
fend ->
feed ->
feet
For those who may be wondering - this is not a homework problem that was assigned to me or a question I was asked in an interview; it is simply a problem that interests me.
I am looking for a one- or two- sentence "top down view" of what approach you would take -- and for the daring, a working implementation in any language.
Instead of turning the dictionary into a full graph, use something with a little less structure:
For each word in the dictionary, you get a shortened_word by deleting character number i, for each i in range(len(word)). Map the pair (shortened_word, i) to the list of all words that produce it.
This makes it easy to look up all words with one replaced letter (because they must be in the same (shortened_word, i) bin for some i) and all words with one more letter (because they must be in some (word, i) bin for some i).
The Python code:
from collections import defaultdict, deque
from itertools import chain

def shortened_words(word):
    for i in range(len(word)):
        yield word[:i] + word[i + 1:], i

def prepare_graph(d):
    g = defaultdict(list)
    for word in d:
        for short in shortened_words(word):
            g[short].append(word)
    return g

def walk_graph(g, d, start, end):
    todo = deque([start])
    seen = {start: None}
    while todo:
        word = todo.popleft()
        if word == end: # end is reachable
            break
        same_length = chain(*(g[short] for short in shortened_words(word)))
        one_longer = chain(*(g[word, i] for i in range(len(word) + 1)))
        one_shorter = (w for w, i in shortened_words(word) if w in d)
        for next_word in chain(same_length, one_longer, one_shorter):
            if next_word not in seen:
                seen[next_word] = word
                todo.append(next_word)
    else: # no break, i.e. not reachable
        return None # not reachable
    path = [end]
    while path[-1] != start:
        path.append(seen[path[-1]])
    return path[::-1]
And the usage:
dictionary = ispell_dict # list of 47158 words
graph = prepare_graph(dictionary)
print(" -> ".join(walk_graph(graph, dictionary, "hands", "feet")))
print(" -> ".join(walk_graph(graph, dictionary, "brain", "game")))
Output:
hands -> bands -> bends -> bents -> beets -> beet -> feet
brain -> drain -> drawn -> dawn -> damn -> dame -> game
A word about speed: building the 'graph helper' is fast (1 second), but hands -> feet takes 14 seconds, and brain --> game takes 7 seconds.
Edit: If you need more speed, you can try using a graph or network library. Or you actually build the full graph (slow) and then find paths much faster. This mostly consists of moving the look-up of edges from the walking function to the graph-building function:
def prepare_graph(d):
    g = defaultdict(list)
    for word in d:
        for short in shortened_words(word):
            g[short].append(word)
    next_words = {}
    for word in d:
        same_length = chain(*(g[short] for short in shortened_words(word)))
        one_longer = chain(*(g[word, i] for i in range(len(word) + 1)))
        one_shorter = (w for w, i in shortened_words(word) if w in d)
        next_words[word] = set(chain(same_length, one_longer, one_shorter))
        next_words[word].remove(word)
    return next_words

def walk_graph(g, start, end):
    todo = deque([start])
    seen = {start: None}
    while todo:
        word = todo.popleft()
        if word == end: # end is reachable
            break
        for next_word in g[word]:
            if next_word not in seen:
                seen[next_word] = word
                todo.append(next_word)
    else: # no break, i.e. not reachable
        return None # not reachable
    path = [end]
    while path[-1] != start:
        path.append(seen[path[-1]])
    return path[::-1]
Usage: Build the graph first (slow, all timings on some i5 laptop, YMMV).
dictionary = ispell_dict # list of 47158 words
graph = prepare_graph(dictionary) # more than 6 minutes!
Now find the paths (much faster than before, times without printing):
print(" -> ".join(walk_graph(graph, "hands", "feet"))) # 10 ms
print(" -> ".join(walk_graph(graph, "brain", "game"))) # 6 ms
print(" -> ".join(walk_graph(graph, "tampering", "crunchier"))) # 25 ms
Output:
hands -> lands -> lends -> lens -> lees -> fees -> feet
brain -> drain -> drawn -> dawn -> damn -> dame -> game
tampering -> tapering -> capering -> catering -> watering -> wavering -> havering -> hovering -> lovering -> levering -> leering -> peering -> peeping -> seeping -> seeing -> sewing -> swing -> swings -> sings -> sines -> pines -> panes -> paces -> peaces -> peaches -> beaches -> benches -> bunches -> brunches -> crunches -> cruncher -> crunchier
A naive approach could be to turn the dictionary into a graph, with the words as nodes and the edges connecting "neighbors" (i.e. words that can be turned into one another via one operation). Then you could use a shortest-path algorithm to find the distance between word A and word B.
The hard part about this approach would be finding a way to efficiently turn the dictionary into a graph.
Quick answer: you can compute the Levenshtein distance, the "common" edit distance in most dynamic programming texts, and, from the computation table generated, try to build that path.
From the Wikipedia link:
d[i, j] := minimum
           (
             d[i-1, j] + 1,   // a deletion
             d[i, j-1] + 1,   // an insertion
             d[i-1, j-1] + 1  // a substitution
           )
You can take note of when each of these happens in your code (maybe in some auxiliary table), and it should then be easy to reconstruct a solution path from there.
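A minimal sketch of that idea, assuming the standard Levenshtein recurrence (substitution costs 0 when the two characters already match): the function name levenshtein_with_ops and the tuple format of the operations are made up for the illustration. Instead of an auxiliary table, it walks back through the finished distance table to recover one sequence of edits. Keep in mind that the intermediate strings produced by these edits are not checked against any dictionary.

```python
def levenshtein_with_ops(s, t):
    """Return (distance, ops) where ops is one minimal list of single edits."""
    m, n = len(s), len(t)
    # d[i][j] = edit distance between s[:i] and t[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete everything
    for j in range(n + 1):
        d[0][j] = j                      # insert everything
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # a deletion
                          d[i][j - 1] + 1,         # an insertion
                          d[i - 1][j - 1] + cost)  # a substitution or a match

    # Walk back from the bottom-right corner, recording which case applied.
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        cost = 1 if i > 0 and j > 0 and s[i - 1] != t[j - 1] else 0
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + cost:
            if cost:
                ops.append(("substitute", s[i - 1], t[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("delete", s[i - 1]))
            i -= 1
        else:
            ops.append(("insert", t[j - 1]))
            j -= 1
    ops.reverse()
    return d[m][n], ops


distance, ops = levenshtein_with_ops("kitten", "sitting")
print(distance)  # 3
print(ops)
```

Every recorded operation accounts for exactly one unit of distance, so the length of ops always equals the returned distance.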

Resources