Convert natural language into logical formulas

For days I have been trying to write an NLTK grammar that converts simple French sentences into logical formulas. My problem would be much the same with English sentences. My goal is for this grammar to accept several home-automation commands and convert them into logical formulas. Some example commands:
Turn on the light:
exists x.(turn_on(x) & light(x))
Turn on the green light:
exists x.(turn_on(x) & light(x) & green(x))
Turn on the light of the kitchen
exists x.(turn_on(x) & light(x) & exists y.(kitchen(y) & in(x, y)))
In these examples, the word turn_on is not really a logical predicate. It will be used in the next step of my program (which converts this formula into another representation).
However, I am having a lot of difficulty writing the rule for the possession relationship. I would like the rule to accept "infinite" recursion, like:
turn on the light of the kitchen (the light belongs to the kitchen in my database)
turn on the light of the kitchen of the house (the kitchen belongs to the house in my database)
turn on the light of the kitchen of the house of [...] (etc.)
I managed to convert the first sentence, but not the others. Here is my grammar (translated from French to English for easier understanding):
% start SV
SV[SEM=<?v(?sn)>] -> V[SEM=?v] SN[SEM=?sn]
SN[SEM=<?ap(?sn1, ?sn2)>] -> SN[SEM=?sn1] AP[SEM=?ap] SN[SEM=?sn2]
SN[SEM=<?ad(?n)>] -> AD[SEM=?ad] N[SEM=?n]
SN[SEM=?n] -> N[SEM=?n]
N[SEM=<?adj(?n)>] -> ADJ[SEM=?adj] N[SEM=?n]
V[SEM=<\P.P(\x.turn_on(x))>] -> 'turn' 'on'
N[SEM=<\x.light(x)>] -> 'light'
N[SEM=<\x.kitchen(x)>] -> 'kitchen'
N[SEM=<\x.house(x)>] -> 'house'
ADJ[SEM=<\P x.(P(x) & big(x))>] -> 'big'
ADJ[SEM=<\P x.(P(x) & green(x))>] -> 'green'
AD[SEM=<\P Q.exists x.(P(x) & Q(x))>] -> 'the'
AP[SEM=<\P Q R.Q(\x.P(\y.(in(y,x) & R(y))))>] -> 'of'
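To show how these entries compose, here is the reduction for the simplest command, worked out by hand (the conjunct order differs from my first example, but the formula is equivalent):
[[turn on]]   = \P.P(\x.turn_on(x))
[[the]]       = \P Q.exists x.(P(x) & Q(x))
[[light]]     = \x.light(x)
[[the light]] = [[the]]([[light]]) = \Q.exists x.(light(x) & Q(x))
[[turn on the light]] = [[turn on]]([[the light]])
                      = [[the light]](\x.turn_on(x))
                      = exists x.(light(x) & turn_on(x))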
With this grammar and the command "turn on the light of the kitchen", I get:
exists x.(kitchen(x) & exists z1.(light(z1) & in(z1,x) & turn_on(z1)))
But for the command "turn on the light of the kitchen of the house", I get:
exists x.(house(x) & exists z5.(kitchen(z5) & exists z2.(light(z2) & in(z2,z5) & in(z2,x) & turn_on(z2))))
To make it more readable, here is the same formula without the "exists" quantifiers:
(house(x4) & kitchen(x6) & light(x7) & in(x7,x6) & in(x7,x4) & turn_on(x7))
There is a problem with the "in" predicates: I want the light to be in the kitchen and the kitchen to be in the house. Here, however, the light is in the kitchen and in the house (true, but not what I want =/). Here is what I would like:
(house(x4) & kitchen(x6) & light(x7) & in(x7,x6) & in(x6,x4) & turn_on(x7))
(the difference: in(x6,x4) instead of in(x7,x4))
I have tried several approaches, but none of them worked... Can you help me, please? I don't know whether it is even possible with my grammar. My knowledge of logic and lambda calculus is limited; I am only just starting to explore these topics.
EDIT:
Here is the python code that I use for my tests:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import nltk

def exec(parser, query):
    try:
        trees = list(parser.parse(query.split()))
    except ValueError:
        print('Invalid query')
        return
    if len(trees) == 0:
        print('Invalid query')
        return
    print('query: %s' % query)
    print('results:')
    for t in trees:
        sem = t.label()['SEM']
        print('\t%s' % sem)
    print('')

if __name__ == '__main__':
    parser = nltk.load_parser('./en_grammar.fcfg')
    exec(parser, 'turn on the light')
    exec(parser, 'turn on the light of the kitchen')
    exec(parser, 'turn on the light of the kitchen of the house')
Thanks a lot and sorry for my English.

It is hard to say that an existential quantifier is the logical form of an imperative sentence. However, your question is really about another problem.
It seems that you have an ambiguous grammar. In particular, when you interpret "x of y" with the in(x, y) predicate, the same kind of ambiguity can arise as in the second phrase below:
the light of the kitchen in the house.
the ball of the kid in the yard.
The second phrase can mean either:
The ball which is in the yard.
The kid who is in the yard.
Based on your code, your grammar produces these two interpretations for the desired sentence:
query: turn on the light of the kitchen of the house
results:
exists x.(house(x) & exists z5.(kitchen(z5) & exists z2.(light(z2) & in(z2,z5) & in(z2,x) & turn_on(z2))))
exists x.(house(x) & exists z3.(kitchen(z3) & in(z3,x) & exists z6.(light(z6) & in(z6,z3) & turn_on(z6))))
The second interpretation, house(x) & exists z3.(kitchen(z3) & in(z3,x) ..., is exactly what you want.
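In other words, the two results correspond to the two possible attachments of the second of:
(the light of the kitchen) of the house   ->  in(light, kitchen) and in(light, house)
the light of (the kitchen of the house)   ->  in(light, kitchen) and in(kitchen, house)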
UPDATE:
Let's try to avoid the ambiguity in chains of "x of y of z".
One very quick solution to force "x of (y of z)" instead of "(x of y) of z" is to track the use of of in all noun phrases, and then require the phrase on the left side of an of to contain no OF itself:
SN[SEM=<?ap(?sn1, ?sn2)>, +OF] -> SN[SEM=?sn1, -OF] AP[SEM=?ap] SN[SEM=?sn2]
SN[SEM=<?ad(?n)>, -OF] -> AD[SEM=?ad] N[SEM=?n]
SN[SEM=?n, -OF] -> N[SEM=?n]
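Putting it together, a minimal sketch of the whole grammar with the OF feature added (the V, N, ADJ, AD and AP lexical entries are exactly as in the question):
% start SV
SV[SEM=<?v(?sn)>] -> V[SEM=?v] SN[SEM=?sn]
SN[SEM=<?ap(?sn1, ?sn2)>, +OF] -> SN[SEM=?sn1, -OF] AP[SEM=?ap] SN[SEM=?sn2]
SN[SEM=<?ad(?n)>, -OF] -> AD[SEM=?ad] N[SEM=?n]
SN[SEM=?n, -OF] -> N[SEM=?n]
N[SEM=<?adj(?n)>] -> ADJ[SEM=?adj] N[SEM=?n]
The right-hand SN after of is left unconstrained, so chains can still grow to the right. With this, "turn on the light of the kitchen of the house" should yield only the second, desired interpretation shown above.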

Related

How to refer to an equation in Sphinx

I have an equation written in a rst file as:
.. math::
F=\begin{bmatrix} \lambda_1 & 0 & 0\\0 & \lambda_2 & 0\\0 & 0 & \lambda_3\end{bmatrix}
:label: eq:6
It renders perfectly. Now I want to refer to this equation in the same rst file. I tried something like:
I need to refer to this :ref:`Link title < eq:6>`
However, it did not work. How can I link to (i.e., refer to) this equation?
You have mismatched indentation for your math directive, an incorrect option of label instead of name, incorrect ordering of the option and the directive's content, and an extra space after the < in your link reference.
The following works for me.
.. math::
   :name: eq:6

   F=\begin{bmatrix} \lambda_1 & 0 & 0\\0 & \lambda_2 & 0\\0 & 0 & \lambda_3\end{bmatrix}

I need to refer to this :ref:`Link title <eq:6>`
There is also a :math:numref: role for referencing equations, but I do not think that is what you want. Alternatively, there is ref, where a label can be used as the target of the reference.
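A side note beyond the original answer: if the file is built with Sphinx rather than plain docutils, the math domain also provides an :eq: role that pairs with the :label: option, along these lines (eq6 here is a hypothetical label, since Sphinx is stricter than docutils about label names):
.. math::
   :label: eq6

   F=\begin{bmatrix} \lambda_1 & 0 & 0\\0 & \lambda_2 & 0\\0 & 0 & \lambda_3\end{bmatrix}

See equation :eq:`eq6`.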

h2o.ai AutoML StackedEnsemble giving falsely superior validation metrics

I believe I've uncovered a bug (or limitation) in the h2o.ai AutoML StackedEnsemble validation metrics.
When running AutoML with only one model type (in this case XGBoost) and n-fold cross-validation, I was surprised to see that the BestOfFamily StackedEnsemble scored better than any of the individual XGBoost models. This should not be possible, since the BestOfFamily StackedEnsemble in this scenario contains only one model, the leading XGBoost model, and therefore should have validation metrics identical to it. I confirmed that the StackedEnsemble did indeed contain only the best XGBoost model, yet it had different, superior validation metrics.
My best hypothesis is that the metalearner algorithm (at least the default GLM one) does not take into account the weights I assigned to the training data. Some of the observations in the training data are related, and I needed to reduce their weight relative to more unique observations (if that doesn't make sense or is wrong, feel free to correct me, as I'm fairly amateur, but it's beside the point). When I discovered this anomaly I was only using XGBoost, but I do use AutoML with multiple model categories and am concerned the problem will affect those Stacked Ensembles and their rankings as well.
So unfortunately, unless this can be explained or corrected, I will not be able to use Stacked Ensembles in my current work, since I can't trust their validation metrics. Does anyone have an explanation, or a way to fix the problem?
h2o Version: 3.28.0.1
Used in conjunction with R Version: 3.6.1
As requested, below is some quickly cobbled-together R code that generates synthetic data and applies AutoML, which should reproduce the problem.
Further edit: When I make all the weights the same in this example, the validation metrics for the Best of Family Stacked Ensemble are closer to, but still not identical to, those of the single XGBoost model it contains. I don't understand how this can be true, as a Stacked Ensemble of one model should have outputs identical to that one model, correct?
h2o.init()
DF<-data.frame(c(rep(T,1000), rep(F,1000)))
colnames(DF)<-"RESULT"
DF$WEIGHT<-rep(c(rep(1,500), rep(2,500)), 2)
DF[which(DF$RESULT & DF$WEIGHT==1), "VAR_1"]<-rnorm(length(which(DF$RESULT & DF$WEIGHT==1)), mean = 1, sd = 1)
DF[which(DF$RESULT & DF$WEIGHT==2), "VAR_1"]<-rnorm(length(which(DF$RESULT & DF$WEIGHT==2)), mean = 1, sd = 2)
DF[which(!DF$RESULT & DF$WEIGHT==1), "VAR_1"]<-rnorm(length(which(!DF$RESULT & DF$WEIGHT==1)), mean = 2, sd = 1)
DF[which(!DF$RESULT & DF$WEIGHT==2), "VAR_1"]<-rnorm(length(which(!DF$RESULT & DF$WEIGHT==2)), mean = 2, sd = 2)
DF[which(DF$RESULT & DF$WEIGHT==1), "VAR_2"]<-rnorm(length(which(DF$RESULT & DF$WEIGHT==1)), mean = 1, sd = 1)
DF[which(DF$RESULT & DF$WEIGHT==2), "VAR_2"]<-rnorm(length(which(DF$RESULT & DF$WEIGHT==2)), mean = 1, sd = 2)
DF[which(!DF$RESULT & DF$WEIGHT==1), "VAR_2"]<-rnorm(length(which(!DF$RESULT & DF$WEIGHT==1)), mean = 2, sd = 1)
DF[which(!DF$RESULT & DF$WEIGHT==2), "VAR_2"]<-rnorm(length(which(!DF$RESULT & DF$WEIGHT==2)), mean = 2, sd = 2)
DF[which(DF$RESULT & DF$WEIGHT==1), "VAR_3"]<-rnorm(length(which(DF$RESULT & DF$WEIGHT==1)), mean = 1, sd = 1)
DF[which(DF$RESULT & DF$WEIGHT==2), "VAR_3"]<-rnorm(length(which(DF$RESULT & DF$WEIGHT==2)), mean = 1, sd = 2)
DF[which(!DF$RESULT & DF$WEIGHT==1), "VAR_3"]<-rnorm(length(which(!DF$RESULT & DF$WEIGHT==1)), mean = 2, sd = 1)
DF[which(!DF$RESULT & DF$WEIGHT==2), "VAR_3"]<-rnorm(length(which(!DF$RESULT & DF$WEIGHT==2)), mean = 2, sd = 2)
TRAIN<-as.h2o(DF, "TRAIN")
AUTOML <- h2o.automl(project_name = "ERROR_TEST",
                     training_frame = "TRAIN",
                     y = "RESULT",
                     weights_column = "WEIGHT",
                     stopping_metric = "logloss",
                     modeling_plan = list(list(name = "XGBoost", alias = 'defaults'), "StackedEnsemble"),
                     sort_metric = "logloss",
                     verbosity = "info")
print(AUTOML@leaderboard)

Error in setting max features parameter in Isolation Forest algorithm using sklearn

I'm trying to train a dataset with 357 features using the Isolation Forest sklearn implementation. I can successfully train and get results when the max features variable is set to 1.0 (the default value).
However, when max features is set to 2, it gives the following error:
ValueError: Number of features of the model must match the input.
Model n_features is 2 and input n_features is 357
It also gives the same error when the feature count is 1 (int) and not 1.0 (float).
My understanding was that when the feature count is 2 (an int), two features should be considered when creating each tree. Is this wrong? How can I change the max features parameter?
The code is as follows:
from sklearn.ensemble.iforest import IsolationForest

def isolation_forest_imp(dataset):
    estimators = 10
    samples = 100
    features = 2
    contamination = 0.1
    bootstrap = False
    random_state = None
    verbosity = 0
    estimator = IsolationForest(n_estimators=estimators, max_samples=samples,
                                contamination=contamination,
                                max_features=features,
                                bootstrap=bootstrap, random_state=random_state,
                                verbose=verbosity)
    model = estimator.fit(dataset)
    return model
In the documentation it states:
max_features : int or float, optional (default=1.0)
    The number of features to draw from X to train each base estimator.
    - If int, then draw `max_features` features.
    - If float, then draw `max_features * X.shape[1]` features.
So, from what I understand, 2 should mean take two features, 1.0 should mean take all of the features, 0.5 take half, and so on.
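In other words, a quick sanity check of that reading with my feature count:
n_features = 357                 # X.shape[1] for my dataset
print(int(1.0 * n_features))     # 357 -> max_features=1.0 draws all features
print(int(0.5 * n_features))     # 178 -> max_features=0.5 draws half of them
# whereas max_features=2 (int) should simply draw 2 features per estimator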
I think this could be a bug, since, taking a look in IsolationForest's fit:
# Isolation Forest inherits from BaseBagging, and when _fit is called,
# BaseBagging takes care of the features correctly
super(IsolationForest, self)._fit(X, y, max_samples,
                                  max_depth=max_depth,
                                  sample_weight=sample_weight)
# however, after _fit, decision_function is called using X - the whole
# sample - without taking max_features into account
self.threshold_ = -sp.stats.scoreatpercentile(
    -self.decision_function(X), 100. * (1. - self.contamination))
then:
# when the decision function _validate_X_predict is called with X unmodified,
# it calls the base estimator's (dt) _validate_X_predict with the whole X
X = self.estimators_[0]._validate_X_predict(X, check_input=True)
...
# from tree.py:
def _validate_X_predict(self, X, check_input):
    """Validate X whenever one tries to predict, apply, predict_proba"""
    if self.tree_ is None:
        raise NotFittedError("Estimator not fitted, "
                             "call `fit` before exploiting the model.")
    if check_input:
        X = check_array(X, dtype=DTYPE, accept_sparse="csr")
        if issparse(X) and (X.indices.dtype != np.intc or
                            X.indptr.dtype != np.intc):
            raise ValueError("No support for np.int64 index based "
                             "sparse matrices")

    # so, this check fails because X is the original X, without max_features applied
    n_features = X.shape[1]
    if self.n_features_ != n_features:
        raise ValueError("Number of features of the model must "
                         "match the input. Model n_features is %s and "
                         "input n_features is %s "
                         % (self.n_features_, n_features))
    return X
So, I am not sure how you can handle this. Maybe figure out the percentage that leads to just the two features you need, even though I am not sure it will work as expected.
Note: I am using scikit-learn v.0.18
Edit: as @Vivek Kumar commented, this is a known issue and upgrading to 0.20 should do the trick.
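For what it's worth, here is a minimal sketch of the same call against scikit-learn >= 0.20 (an assumption on my part: random synthetic data stands in for the real 357-feature dataset):
import numpy as np
from sklearn.ensemble import IsolationForest  # public import path in newer versions

rng = np.random.RandomState(0)
X = rng.randn(1000, 357)  # synthetic stand-in for a 357-feature dataset

clf = IsolationForest(n_estimators=10, max_samples=100, max_features=2,
                      random_state=rng)
clf.fit(X)
scores = clf.decision_function(X)  # no "Number of features" mismatch on >= 0.20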

Eliminate matrix header in esttab (latex) or outtable output

I have a matrix of values (very non-standard summary statistics) that I want to pass from Stata to LaTeX. The command:
esttab matrix(matname) using $myfilename.tex, replace booktabs f
gives the matrix in LaTeX form but also gives the title of the matrix within the fragment. The same is true for:
outtable using myfilename, mat(matname) replace nobox
Currently, every time I rerun my Stata do-file, I have to go and edit myfilename.tex by hand.
Is there any way to remove the matrix name from the Stata-to-LaTeX output without doing it manually?
I tried using the noheader option, which works here:
matrix list matname, noheader
but it doesn't seem to be available in esttab or outtable. It also occurred to me that if I could find a way to ask LaTeX to \input only PART of the fragment file (line 2 onward), that would work...
I think the nomtitles option will work. Here's a reproducible example:
sysuse auto
reg price trunk headroom
matrix myMat = e(V)
esttab matrix(myMat) using temp.tex, replace booktabs f nomtitles
This produces the text (.tex) file below:
& trunk& headroom& \_cons\\
\midrule
trunk & 10557.96& -35339.31& -39464.18\\
headroom & -35339.31& 269901.5& -321726.7\\
\_cons & -39464.18& -321726.7& 1612951\\
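Not part of the original answer, but for completeness: since the fragment now carries no matrix title, it can be \input into the main document unchanged inside your own table skeleton. A minimal sketch, assuming the booktabs package is loaded and the fragment was written to temp.tex:
\begin{table}[htbp]
\centering
\begin{tabular}{lccc}
\toprule
\input{temp}
\bottomrule
\end{tabular}
\end{table}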
Also, I used the following outtable command
outtable using "./temp", mat(myMat) replace center f(%9.2f) nobox
to produce this output:
% matrix: myMat file: ./temp.tex 10 Jun 2016 12:55:35
\begin{table}[htbp]
\begin{tabular}{lccc} \hline \hline
& trunk & headroom & cons \\ \hline
trunk & 10557.96 \\
headroom & -35339.31 & 269901.52 \\
cons & -39464.18 & -3.22e+05 & 1.61e+06 \\
\hline \hline \end{tabular}
\end{table}
While the matrix name is present in the output, it is commented out and so will not appear in the LaTeX document.

Simple debugging in Haskell

I am new to Haskell. Previously I have programmed in Python and Java. When I am debugging some code, I have a habit of littering it with print statements. However, doing so in Haskell changes the semantics, and I would have to change my function signatures to ones involving IO. How do Haskellers deal with this? I might be missing something obvious. Please enlighten me.
Other answers link the official doco and the Haskell wiki, but if you've made it to this answer, let's assume you bounced off those for whatever reason. The wikibook also has an example using Fibonacci, which I found more accessible. This is a deliberately basic example which might hopefully help.
Let's say we start with this very simple function, which for important business reasons, adds "bob" to a string, then reverses it.
bobreverse x = reverse ("bob" ++ x)
Output in GHCI:
> bobreverse "jill"
"llijbob"
We don't see how this could possibly be going wrong, but something near it is, so we add debug.
import Debug.Trace
bobreverse x = trace ("DEBUG: bobreverse " ++ show x) (reverse ("bob" ++ x))
Output:
> bobreverse "jill"
"DEBUG: bobreverse "jill"
llijbob"
We are using show just to ensure x is converted to a string correctly before output. We also added some parentheses to make sure the arguments were grouped correctly.
In summary, the trace function is a decorator which prints the first argument and returns the second. It looks like a pure function, so you don't need to bring IO or other signatures into the functions to use it. It does this by cheating, which is explained further in the linked documentation above, if you are curious.
Read this. You can use Debug.Trace.trace in place of print statements.
I was able to create a dual-personality IO / ST monad typeclass, which will print debug statements when a monadic computation is typed as IO, but not when it's typed as ST. Demonstration and code here: Haskell -- dual personality IO / ST monad? .
Of course Debug.Trace is more of a swiss army knife, especially when wrapped with a useful special case,
trace2 :: Show a => [Char] -> a -> a
trace2 name x = trace (name ++ ": " ++ show x) x
which can be used like (trace2 "first arg" 3) + 4
edit
You can make this even fancier if you want source locations
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE ScopedTypeVariables #-} -- needed for the (loc :: String) pattern signature
import Language.Haskell.TH
import Language.Haskell.TH.Syntax as TH
import Debug.Trace

withLocation :: Q Exp -> Q Exp
withLocation f = do
    let error = locationString =<< location
    appE f error
  where
    locationString :: Loc -> Q Exp
    locationString loc = do
        litE $ stringL $ formatLoc loc

    formatLoc :: Loc -> String
    formatLoc loc = let file = loc_filename loc
                        (line, col) = loc_start loc
                    in concat [file, ":", show line, ":", show col]

trace3' (loc :: String) msg x =
    trace2 ('[' : loc ++ "] " ++ msg) x

trace3 = withLocation [| trace3' |]
then, in a separate file [from the definition above], you can write
{-# LANGUAGE TemplateHaskell #-}
tr3 x = $trace3 "hello" x
and test it out
> tr3 4
[MyFile.hs:2:9] hello: 4
You can use Debug.Trace for that.
I really liked Don Stewart's (dons) short blog post about it:
https://donsbot.wordpress.com/2007/11/14/no-more-exceptions-debugging-haskell-code-with-ghci/
In short: use GHCi. For example, with a program called HsColour.hs:
$ ghci HsColour.hs
*Main> :set -fbreak-on-exception
*Main> :set args "source.hs"
Now run your program with tracing on, and GHCi will stop your program at the call to error:
*Main> :trace main
Stopped at (exception thrown)
Ok, good. We had an exception… Let’s just back up a bit and see where we are. Watch now as we travel backwards in time through our program, using the (bizarre, I know) “:back” command:
[(exception thrown)] *Main> :back
Logged breakpoint at Language/Haskell/HsColour/Classify.hs:(19,0)-(31,46)
_result :: [String]
This tells us that immediately before hitting error, we were in the file Language/Haskell/HsColour/Classify.hs, at line 19. We’re in pretty good shape now. Let’s see where exactly:
[-1: Language/Haskell/HsColour/Classify.hs:(19,0)-(31,46)] *Main> :list
18 chunk :: String -> [String]
vv
19 chunk [] = head []
20 chunk ('\r':s) = chunk s -- get rid of DOS newline stuff
21 chunk ('\n':s) = "\n": chunk s
^^
