Stanford NLP training sentiment model

I'm working on the Kaggle competition for Rotten Tomatoes NLP prediction.
The training set format was parsed as such:
PhraseId SentenceId Phrase Sentiment
1 1 A series of escapades demonstrating the adage that what is good for the goose is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story . 1
2 1 A series of escapades demonstrating the adage that what is good for the goose 2
However, the training set format must look like:
(3 (2 (2 The) (2 Rock)) (4 (3 (2 is) (4 (2 destined) (2 (2 (2 (2 (2 to) (2 (2 be) (2 (2 the) (2 (2 21st) (2 (2 (2 Century) (2 's)) (2 (3 new) (2 (2 ``) (2 Conan)))))))) (2 '')) (2 and)) (3 (2 that) (3 (2 he) (3 (2 's) (3 (2 going) (3 (2 to) (4 (3 (2 make) (3 (3 (2 a) (3 splash)) (2 (2 even) (3 greater)))) (2 (2 than) (2 (2 (2 (2 (1 (2 Arnold) (2 Schwarzenegger)) (2 ,)) (2 (2 Jean-Claud) (2 (2 Van) (2 Damme)))) (2 or)) (2 (2 Steven) (2 Segal))))))))))))) (2 .)))
Here's a snippet of the python code that I'm using:
phrasefind=str(train['Phrase'][i])+" " or " "+str(train['Phrase'][i]) or str(train['Phrase'][i])
phrase=train['Phrase'][i]
sent=rreplace(sent,phrasefind,"("+str(train['Sentiment'][i])+" "+str(phrase)+") ",1)
with the result:
(1 (2 (2 (2 A) series) of (2 escapades) (2 (2 demonstrating) the adage) (2 that) (2 what) is good for the goose) (2 is) (2 also) (3 good) (2 for) (2 the) (2 gander) (2 ,) (2 (2 some) of which) (2 occasionally) (3 amuses) (2 but) (2 none) (2 of which) (2 amounts) (2 to) (2 much) (2 of) (2 a story) .)
However, the sentiment package from Stanford won't recognize this format (works fine for their train.txt)
It is throwing the error:
Exception in thread "main" java.lang.NumberFormatException: null
Suggestions?

I am currently learning how to train the model myself.
Looking at your train.txt, the problem is that some of the words have no sentiment score. I have just made these changes to your result, and the command line now successfully adds it to my model:
(1 (2 (2 (2 A) series) (2 of) (2 escapades) (2 (2 demonstrating) (2 the) (2 adage)) (2 that) (2 what) (2 is) (3 good) (2 for) (2 the) (2 goose) (2 is) (2 also) (3 good) (2 for) (2 the) (2 gander) (2 ,) (2 (2 some) (2 of) (2 which)) (2 occasionally) (3 amuses) (2 but) (2 none) (2 of which) (2 amounts) (2 to) (2 much) (2 of) (2 a story) (2 .))
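To find such words before handing a file to the trainer, a rough check can be scripted. The following is only a sketch under my own assumptions: `unlabeled_words` is a hypothetical helper, not part of the Stanford package, and it simply flags bare words that sit next to labelled sub-nodes inside the same node. It may be stricter or looser than what the Stanford reader actually accepts.

```python
import re

def unlabeled_words(tree):
    """Return bare words that share a node with labelled sub-nodes.

    Hypothetical helper: every "(" must be followed by an integer label,
    and a node should hold either sub-nodes or plain words, not both.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", tree)
    problems = []
    stack = []  # one [words, has_subnode] record per open node
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            if stack:
                stack[-1][1] = True  # the parent now contains a sub-node
            label = tokens[i + 1] if i + 1 < len(tokens) else ""
            if not label.isdigit():
                raise ValueError("node without a numeric label near token %d" % i)
            stack.append([[], False])
            i += 2  # skip "(" and its label
            continue
        if not stack:
            raise ValueError("token outside of any node: %r" % tok)
        if tok == ")":
            words, has_subnode = stack.pop()
            if has_subnode:
                problems.extend(words)  # words mixed with sub-nodes
        else:
            stack[-1][0].append(tok)
        i += 1
    return problems

print(unlabeled_words("(1 (2 (2 A) series) of (2 escapades) .)"))
```

An empty list means that, under this assumption, every word sits in its own labelled node.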


Scheme making matrix

I am new to Scheme and I'm having problems with matrices. I need to create a function that takes one big and one small square matrix (with the condition that the small matrix's length is a divisor of the big one's) and makes a new matrix by performing an operation on the big one with the small one. I've successfully split the big matrix into pieces of the size I wanted, and I'm successfully operating on them to get the result.
Here is how I did it:
(define (matrix-op big small x y)
  (if (< y (/ (length big) (length small)))
      (if (< x (/ (length big) (length small)))
          (cons (calculate (split-y (split-x big small x) small y) small)
                (matrix-op big small (+ x 1) y))
          (matrix-op big small 0 (+ y 1))) ; <- this is where I need to split
      '()))
My calculate function returns only one atomic value, so when I run the function like this it gives me output like '(val val val val), but what I want is output formatted like '((val val) (val val)). How can I do it? Thanks in advance.
I realized that I didn't explain the problem properly. What I want is a function that takes two different square matrices, one big and one small, splits the big one into pieces the same size as the smaller one, and operates on them to create a new matrix of size m/n, where the big one is m x m and the small one is n x n. Example:
big:
'((8 0 3 1 5 3 2 2)
  (7 1 1 4 3 7 1 4)
  (1 3 7 4 3 6 6 3)
  (0 9 8 6 5 6 4 3)
  (1 7 6 9 6 6 7 2)
  (5 7 1 0 2 9 5 3)
  (0 5 4 6 6 6 3 0)
  (3 6 2 7 7 5 7 0))
small:
'((8 4)
  (9 5))
I need to split big into pieces the same size as small and calculate results like:
for x=0 y=0 the part is '((8 0) (7 1)) and the calculate result is 5
for x=1 y=0 the part is '((3 1) (1 4)) and the calculate result is 2
I did actually get the calculated results back, but with the method I gave above the return value was '(5 2 4 2 2 6 4 4 4 3 5 4 2 4 6 3), whereas I wanted it returned as:
'(
(5 2 4 2)
(2 6 4 4)
(4 3 5 4)
(2 4 6 3)
)
So how can I split the returned list where I want it split?
I think you are trying to do too much at once. It is always OK to split a bigger problem into smaller problems.
If I understand yours, the idea is to take two square matrices, one of which may be some multiple of the other's dimensions, and perform a pair-wise operation on the elements. For example:
'((1 2 3)              '((1 2 3)   '((7 7 7)       '(( 8  9 10)
  (4 5 6) + '((7)) -->   (4 5 6) +   (7 7 7)  -->    (11 12 13)
  (7 8 9))               (7 8 9))    (7 7 7))        (14 15 16))
I will continue with the assumption that this is what is desired.
Notice that if the two matrices were the same size, a simple nested map would easily combine all elements. What is left is the problem of the different sizes.
Solve that and you are golden.
Recap:
(define (f op small-M big-M)
  (f-apply-pairwise-op
   op
   (f-biggify small-M (/ (length big-M) (length small-M)))
   big-M))
Now you have broken the problem into two smaller pieces:
(define (f-apply-pairwise-op op A B) ...) ; produces pairwise 'A op B'
(define (f-biggify M n) ...) ; tile M n times wider and taller
Good luck!
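For readers who want to see the two pieces filled in, here is a minimal sketch in Python rather than Scheme. The names `biggify`, `apply_pairwise` and `f` mirror the stubs above, but the bodies are my own assumption about what was intended, not the answerer's code.

```python
import operator

def biggify(m, n):
    """Tile matrix m so that it becomes n times wider and n times taller."""
    return [row * n for _ in range(n) for row in m]

def apply_pairwise(op, a, b):
    """Combine two equally sized matrices element by element: a op b."""
    return [[op(x, y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

def f(op, small, big):
    """Grow small to the size of big, then combine the two pairwise."""
    n = len(big) // len(small)
    return apply_pairwise(op, biggify(small, n), big)

big = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(f(operator.add, [[7]], big))
```

With the 3x3 matrix and '((7)) from the example above, this prints [[8, 9, 10], [11, 12, 13], [14, 15, 16]].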

Preserving list structure with sorting a list of sublists in Lisp [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 6 years ago.
I have a list structure as follows:
(defparameter *list* '(((2 2 2) (0.1))
                       ((5 5 5) (0.4))
                       ((1 1 1) (1.2))
                       ((3 3 3) (3.4))
                       ((4 4 4) (4.5))))
I want to sort it so that it returns
'(((1 1 1) (1.2))
  ((2 2 2) (0.1))
  ((3 3 3) (3.4))
  ((4 4 4) (4.5))
  ((5 5 5) (0.4)))
So here is my attempt:
(sort *list*
#'(lambda (a b)
(< (squared a '(0 0 0))
(squared b '(0 0 0))))
:key #'first)
Where squared takes two lists, computes the squared difference of each pair of corresponding elements, and sums them (i.e. (squared '(1 2 3) '(0 3 5)) => 6).
I am sorting the list of lists by the first element of each sublist, '(# # #): I calculate its distance from '(0 0 0) and then sort by that distance.
But my attempt outputs the following => ((1) (1 1 1) (2) (3) (2 2 2) (4) (5) (3 3 3) (4 4 4) (5 5 5))
How do I sort by '(# # #) but also preserve the list structure? Also using Common Lisp!
Thank you!
EDIT
I had typed it into Lisp incorrectly, but correctly into this forum. I had typed the list as follows:
(defparameter list '( (2 2 2) (0.1)
(5 5 5) (0.4)
(1 1 1) (1.2)
(3 3 3) (3.4)
(4 4 4) (4.5) ))
Careful: sort may destroy the input data. Your input as shown here contains literal data. Modifying literal data has undefined consequences. Use copy-tree or copy-list to create non-literal data from literal data.
Actually my first attempt works! I just typed in the list incorrectly (forgot some parentheses). So it sorts and maintains the structure!
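The same idea (sort whole sublists, but compare only a value derived from their first element) can be sketched in Python for comparison. The function `squared` below is my guess at the asker's helper, based only on its description, and `sorted` is used because it returns a new list and leaves the input alone, which sidesteps the destructive-sort concern.

```python
def squared(a, b):
    """Sum of squared differences between corresponding elements."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

data = [
    [[2, 2, 2], [0.1]],
    [[5, 5, 5], [0.4]],
    [[1, 1, 1], [1.2]],
    [[3, 3, 3], [3.4]],
    [[4, 4, 4], [4.5]],
]

origin = [0, 0, 0]
# The key looks only at the first element of each pair;
# the pairs themselves are moved as whole units.
result = sorted(data, key=lambda pair: squared(pair[0], origin))

for pair in result:
    print(pair)
```

Each printed pair keeps its second element attached, so the nesting survives the sort.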

Implementation of Reverse algorithm LISP

I am trying to implement the reverse algorithm in Lisp. I am fairly new to the language, so any help would be appreciated.
I have the following code in Lisp. It seems logical to me, but it doesn't output anything when I run it in the terminal. This is what my rev.lisp file looks like:
(defun rev (list)
  (if (atom list)
      list
      (append (rev (cdr list))
              (list (rev (car list))))))
I run it on my terminal as:
%clisp rev.lisp
%
I am expecting the output 6 5 4 3 2 but it doesn't return anything.
You already have an answer, but here are some remarks.
Depending on your background and preferences, recursive algorithms are sometimes best understood with a trace of execution.
The TRACE macro can help you debug your code.
The exact output varies among implementations (here, I am using SBCL).
Since I would like to show you how many times APPEND is called, and because tracing standard functions is not allowed [1], I am defining a simple function around it and redefining your code to use it:
(defun append* (&rest args) (apply #'append args))
The result of TRACE is the following:
CL-USER> (rev '(2 3 4 5 6))
0: (REV (2 3 4 5 6))
1: (REV (3 4 5 6))
2: (REV (4 5 6))
3: (REV (5 6))
4: (REV (6))
5: (REV NIL)
5: REV returned NIL
5: (REV 6)
5: REV returned 6
5: (APPEND* NIL (6))
5: APPEND* returned (6)
4: REV returned (6)
4: (REV 5)
4: REV returned 5
4: (APPEND* (6) (5))
4: APPEND* returned (6 5)
3: REV returned (6 5)
3: (REV 4)
3: REV returned 4
3: (APPEND* (6 5) (4))
3: APPEND* returned (6 5 4)
2: REV returned (6 5 4)
2: (REV 3)
2: REV returned 3
2: (APPEND* (6 5 4) (3))
2: APPEND* returned (6 5 4 3)
1: REV returned (6 5 4 3)
1: (REV 2)
1: REV returned 2
1: (APPEND* (6 5 4 3) (2))
1: APPEND* returned (6 5 4 3 2)
0: REV returned (6 5 4 3 2)
(6 5 4 3 2)
Basics
First off, we see that REV is sometimes called on ATOM elements. Even though your implementation unwraps elements with CAR and wraps them back again with LIST, it makes little sense to do so. Reversing a list is a function that is applied to lists, and if you happen to pass a non-list argument, it should raise a red flag in your head. In order to build a recursive function for lists, it is generally sufficient to focus on the recursive definition of the datatype.
The LIST type is defined in Lisp as (OR NULL CONS), which is the union of the NULL type and the CONS type. In other words, a list is either empty or a cons-cell.
There are many ways to distinguish between both cases, which differ mostly in style. Following the above approach with types, you can use ETYPECASE, which dispatches on the type of its argument and signals an error if no clause matches:
(defun rev (list)
  (etypecase list
    (null <empty>)
    (cons <non-empty>)))
You could also use ENDP.
The reverse of an empty list is an empty list, so you are in the case where you can simply use WHEN and focus on the non-empty case:
(defun rev (list)
(when list
<non-empty>))
Above, we don't check that LIST is a cons-cell; it could be anything. However, the way we use it below can only apply to such objects, which means that runtime checks will detect erroneous cases early enough.
(defun rev (list)
(when list
(append* (rev (rest list))
(list (first list)))))
The above is quite similar to your code, except that I don't call REV on the first element. Also, I use FIRST and REST instead of CAR and CDR, because even though they are respective synonyms, the former pair better conveys the intent of working with lists (this is subjective of course, but most people follow this rule).
Append
What the trace above shows, and what you might have missed when only reading the code, is that APPEND is called for all intermediate lists. This is quite wasteful in terms of memory and processing, since APPEND necessarily has to traverse all elements to copy them into a fresh list. If you call APPEND n times, as you are doing since you iterate over a list of n elements, you end up with a quadratic algorithm, O(n^2).
You can solve the memory issues by reusing the same intermediate list with NCONC in place of APPEND. You still have to traverse this list many times, but at least the same underlying cons cells are reused. Typically, a recursive reverse is written with an additional parameter, an accumulator, which is used to store intermediate results and is returned at the deepest level:
(defun reverse-acc (list acc)
(etypecase list
;; end of input list, return accumulator
(null acc)
;; general case: put head of input list in front
;; of current accumulator and call recursively with
;; the tail of the input list.
(cons (reverse-acc (rest list)
(cons (first list) acc)))))
The above example is called with an empty accumulator. Even though it would be possible to make this function directly accessible to users, you might prefer to hide this implementation detail and export only a function with a single argument:
(defun rev (list) (reverse-acc list nil))
(trace rev reverse-acc)
0: (REV (2 3 4 5 6))
1: (REVERSE-ACC (2 3 4 5 6) NIL)
2: (REVERSE-ACC (3 4 5 6) (2))
3: (REVERSE-ACC (4 5 6) (3 2))
4: (REVERSE-ACC (5 6) (4 3 2))
5: (REVERSE-ACC (6) (5 4 3 2))
6: (REVERSE-ACC NIL (6 5 4 3 2))
6: REVERSE-ACC returned (6 5 4 3 2)
5: REVERSE-ACC returned (6 5 4 3 2)
4: REVERSE-ACC returned (6 5 4 3 2)
3: REVERSE-ACC returned (6 5 4 3 2)
2: REVERSE-ACC returned (6 5 4 3 2)
1: REVERSE-ACC returned (6 5 4 3 2)
0: REV returned (6 5 4 3 2)
(6 5 4 3 2)
The shape of the trace is typical of recursive functions for which tail-call elimination is possible. Indeed, the recursive invocation of REVERSE-ACC inside itself directly returns the result we want, and thus no intermediate memory is required to store and process intermediate results. However, Common Lisp implementations are not required by the standard to eliminate recursive calls in tail position, and the actual behavior of a specific implementation might even depend on optimization levels. A conforming program thus cannot assume the control stack won't ever grow linearly with the size of the list.
Recursion is best used on certain kinds of problems which are recursive by nature and where the height of the stack doesn't grow so fast w.r.t. the input. For iteration, use control structures like DO, LOOP, etc. In the following example, I used DOLIST in order to PUSH elements onto a temporary RESULT list, which is returned at the end of DOLIST:
(defun rev (list)
(let ((result '()))
(dolist (e list result)
(push e result))))
The trace is:
0: (REV (2 3 4 5 6))
0: REV returned (6 5 4 3 2)
(6 5 4 3 2)
[1] CLHS 11.1.2.1.2, Constraints on the COMMON-LISP Package for Conforming Programs
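The cost difference between the APPEND-based version and the accumulating version can also be made visible with a small Python sketch. This is only an illustration under my own assumptions: `rev_append` counts the elements that the append step would have to copy (as Lisp's APPEND copies its first argument), and `rev_push` models cons cells as nested `(head, tail)` tuples so that each push is a single new cell.

```python
def rev_append(items):
    """Recursive reverse by appending; returns (reversed, elements_copied).

    Only the elements copied by the append step are counted,
    mirroring how APPEND copies its first argument.
    """
    if not items:
        return [], 0
    rest, copied = rev_append(items[1:])
    return rest + [items[0]], copied + len(rest)

def rev_push(items):
    """Iterative reverse: push each element onto the front of a chain."""
    acc = None                      # empty list, like NIL
    for e in items:
        acc = (e, acc)              # like (push e result): one new cell
    out = []
    while acc is not None:          # unpack the chain into a Python list
        head, acc = acc
        out.append(head)
    return out

print(rev_append([2, 3, 4, 5, 6]))
print(rev_push([2, 3, 4, 5, 6]))
```

For five elements the append version copies 0 + 1 + 2 + 3 + 4 = 10 elements, which is the n(n-1)/2 growth behind the quadratic behaviour, while the push version creates exactly one cell per element.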
You are not printing anything, so you are not seeing anything.
Replace (rev '(2 3 4 5 6)) with (print (rev '(2 3 4 5 6))) and you will see (6 5 4 3 2) on the screen.

How to reduce the number of independent variables in Mathematica

I am not sure whether this is really a mathematical question or actually a Mathematica question. :D
Suppose I have a matrix
{{4/13 + (9 w11)/13 + (6 w12)/13,
6/13 + (9 w21)/13 + (6 w22)/13}, {-(6/13) + (6 w11)/13 + (4 w12)/
13, -(9/13) + (6 w21)/13 + (4 w22)/13}}
with w11, w12, w21, w22 as free parameters.
And I know by visual inspection that 3*w11+2*w12 can be represented as one variable, and 3*w21+2*w22 can be represented as another. So essentially this matrix only has two independent variables. Given any matrix of this form, is there any method to automatically reduce the number of independent variables? I guess I am stuck at formulating this in a precise mathematical way.
Please share your thoughts. Many thanks.
Edit:
My question is really the following.
Given a matrix like this
{{4/13 + (9 w11)/13 + (6 w12)/13,
6/13 + (9 w21)/13 + (6 w22)/13}, {-(6/13) + (6 w11)/13 + (4 w12)/
13, -(9/13) + (6 w21)/13 + (4 w22)/13}}
or one involving some other symbolic constants
{{a+4/13 + (9 w11)/13 + (6 w12)/13,
6/13*c + (9 w21)/13 + (6 w22)/13}, {-(6/13)/d + (6 w11)/13 + (4 w12)/
13, -(9/13) + (6 w21)/13 + (4 w22)/13}}
I want to use Mathematica to automatically identify the number n of independent variables (in this case 2), name these independent variables y1, y2, ..., yn, and then rewrite the matrix in terms of y1, y2, ..., yn instead of w11, w12, w21, w22.
Starting with
mat = {{4/13 + (9 w11)/13 + (6 w12)/13,6/13 + (9 w21)/13 + (6 w22)/13},
{-(6/13) + (6 w11)/13 + (4 w12)/13, -(9/13) + (6 w21)/13 + (4 w22)/13}};
Form a second matrix, of indeterminates, same dimensions.
mat2 = Array[y, Dimensions[mat]];
Now consider the polynomial (actually linear) system formed by setting mat-mat2==0. We can eliminate the original variables and look for dependencies amongst the new ones. Could use Eliminate; I'll show with GroebnerBasis.
GroebnerBasis[Flatten[mat - mat2], Variables[mat2], Variables[mat]]
Out[59]= {-3 + 2 y[1, 2] - 3 y[2, 2], -2 + 2 y[1, 1] - 3 y[2, 1]}
So we get a pair of explicit relations between the original matrix elements.
---edit---
You can get expressions for the new variables that clearly indicate the dependency of two of them on the other two. To do this, form the Groebner basis and use it in polynomial reduction.
gb = GroebnerBasis[Flatten[mat - mat2], Variables[mat2], Variables[mat]];
vars = Flatten[mat2];
PolynomialReduce[vars, gb, vars][[All, 2]]
Out[278]= {1 + 3/2 y[2, 1], 3/2 + 3/2 y[2, 2], y[2, 1], y[2, 2]}
---end edit---
Daniel Lichtblau
Wolfram Research
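As a cross-check on the count of independent variables, one can also look at the problem as plain linear algebra. This is a different technique from the GroebnerBasis approach above, and the sketch is only valid because the matrix entries are linear in w11, w12, w21, w22: the number of independent combinations is then the rank of the coefficient matrix. The helper `rank` and the table `coeffs` are my own illustration, written in Python with exact fractions rather than in Mathematica.

```python
from fractions import Fraction as F

def rank(rows):
    """Rank of a matrix by Gauss-Jordan elimination over exact fractions."""
    m = [list(r) for r in rows]
    rk = 0
    for col in range(len(m[0])):
        # find a row at or below position rk with a nonzero entry in this column
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(len(m)):
            if r != rk and m[r][col] != 0:
                factor = m[r][col] / m[rk][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

# One row per matrix entry, one column per parameter (w11, w12, w21, w22).
# Constant terms are left out because they do not involve the parameters.
coeffs = [
    [F(9, 13), F(6, 13), 0, 0],   # entry (1,1)
    [0, 0, F(9, 13), F(6, 13)],   # entry (1,2)
    [F(6, 13), F(4, 13), 0, 0],   # entry (2,1)
    [0, 0, F(6, 13), F(4, 13)],   # entry (2,2)
]

print(rank(coeffs))
```

The rank is 2, in agreement with the two relations found above. Additive symbolic constants such as a, c or d would not change this count, since they do not multiply any of the w parameters.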

Efficient program to print/return all increasing subsequences of size 3 in an array

Given an array like
1, 6, 5, 2, 3, 4
we need to print
1 2 3
1 3 4
1 2 4
2 3 4
What is the best way to do this?
Is this dynamic programming?
Is there a better way to do this than the brute-force O(n^3)? I am sure there is.
The reason I say dynamic programming is because I can see this as something like
for '1' (print all results of the subproblems of the rest of the array with subsequences of size 2),
for '2' (print all results of the subproblems of the rest of the array with subsequences of size 2),
and go on like this.
However, there is a lot of overlap in the above two results, so we need to find an efficient way of reusing that, I guess.
Well, these are just random thoughts. You can correct me with the right approach.
OK, let me correct myself: if not printed, I need the different increasing sequences returned. My point is that I need to find an approach that gets to these sequences in the most efficient way.
You can walk through the array and remember what partial sequences are possible until the current point. Print and forget any sequences that reach length 3.
Example:
(1 6 5 2 3 4)
^
remember ((1))
(1 6 5 2 3 4)
^
remember ((1) (1 6) (6))
(1 6 5 2 3 4)
^
remember ((1) (1 6) (6) (1 5) (5))
(1 6 5 2 3 4)
^
remember ((1) (1 6) (6) (1 5) (5) (1 2) (2))
(1 6 5 2 3 4)
^
remember ((1) (1 6) (6) (1 5) (5) (1 2) (2) (1 3) (1 2 3) (2 3) (3))
print and forget (1 2 3)
remember ((1) (1 6) (6) (1 5) (5) (1 2) (2) (1 3) (2 3) (3))
(1 6 5 2 3 4)
^
remember ((1) (1 6) (6) (1 5) (5) (1 2) (2) (1 3) (2 3) (3) (1 4) (1 2 4) (2 4)
(1 3 4) (2 3 4) (3 4) (4))
print and forget (1 2 4)
print and forget (1 3 4)
print and forget (2 3 4)
done.
The challenge seems to lie in the choice of an appropriate data structure for the remembered subsequences.
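The walk-through above can be sketched directly in Python, using a plain list of tuples as the (naive) data structure for the remembered subsequences. The function name and the `size` parameter are my own; a smarter structure would be needed to avoid rescanning every remembered sequence at each step.

```python
def increasing_subsequences(values, size=3):
    """Collect all strictly increasing subsequences of the given size.

    Shorter partial sequences are remembered; a sequence that reaches
    the target size is emitted and not remembered any further.
    """
    partial = []   # remembered sequences shorter than `size`
    found = []     # completed sequences, in the order they are reached
    for x in values:
        # extend every remembered sequence that x can continue
        extended = [p + (x,) for p in partial if p[-1] < x]
        extended.append((x,))          # x can also start a new sequence
        for seq in extended:
            if len(seq) == size:
                found.append(seq)      # "print and forget"
            else:
                partial.append(seq)    # "remember"
    return found

print(increasing_subsequences([1, 6, 5, 2, 3, 4]))
```

This prints [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)], the same order as in the walk-through. Since the number of results can itself grow cubically with the input, no approach can beat that bound in the worst case when all sequences must be listed.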
In the generalized case you have to calculate the complexity based on two things:
1- Count of input numbers (I will call it b)
2- Length of output (I will call it d)
A generalized method that I can think of is to construct a graph analogous to the problem in O(b^2):
If a larger number comes after a smaller number, there is a directed edge from the smaller one to it.
Now, in order to find all sequences of length d, you need to start from each number and output all paths of length (d - 1).
If you use a traversal method like BFS, the complexity will be less than O(d x (b ^ (d - 1))).
However, you can use adjacency matrix multiplication to find the paths of length d, which will bring the complexity down to something less than O((d - 2) x (b ^ 3)). (The Nth power of an adjacency matrix tells you how many paths of length N exist from each node to another.)
There are algorithms to reduce square matrix multiplication complexity a bit.
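To make the adjacency-matrix idea concrete, here is a small sketch in Python. Note that matrix powers only count the increasing subsequences; they do not list them. The function names are my own, and the multiplication is the plain cubic one, not one of the faster algorithms mentioned above.

```python
def matmul(a, b):
    """Plain cubic product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def count_increasing(values, d):
    """Count strictly increasing subsequences of length d.

    An edge i -> j exists when position i comes before j and holds a
    smaller value; a subsequence of length d is a path with d - 1 edges.
    """
    n = len(values)
    adj = [[1 if i < j and values[i] < values[j] else 0 for j in range(n)]
           for i in range(n)]
    # start from the identity matrix (paths with zero edges)
    paths = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(d - 1):
        paths = matmul(paths, adj)
    return sum(map(sum, paths))

print(count_increasing([1, 6, 5, 2, 3, 4], 3))
```

For the example array this prints 4, matching the four triples listed in the question.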
Create a list of ordered pairs (a,b) such that a<b and Index(a) < Index(b). O(n^2)
Sort this list (on either a or b -- doesn't matter) in O(n^2log(n)). Can be made O(nlog(n)) depending on data structure.
For each element in the list, find all matching ordered pairs using binary search -- worst case O(n^3log(n)), average case O(n^2log(n))
