Stanford NLP returns : instead of NNP - stanford-nlp

I am using Stanford NLP Parser (http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordParser.html) to analyze sentences.
The problem is that there is a discrepancy between the results from the library and the results from the online demo page (http://nlp.stanford.edu:8080/parser/index.jsp).
The problem is with the following sentence:
the stage. Plus, he
When I run this online this is the output that I get:
(ROOT (NP (DT the) (NN stage) (. .)))
(ROOT (NP
(NP (NNP Plus))
(, ,)
(NP (PRP he))))
Please note that the Plus is identified as NNP
The problem is that the same sentence yields a slightly different output when processed by the lib:
{(ROOT (FRAG (FRAG (NP (DT the) (NN stage)) (. .)) (: Plus) (FRAG (, ,) (NP (PRP he)))))}
As you can see, Plus is now tagged as :
How do I force the lib to return NNP?
Here is the code:
var sent = "the stage. Plus, he";
var lp = LexicalizedParser.loadModel(modelsDirectory + @"\lexparser\englishPCFG.ser.gz");
var f = PTBTokenizer.factory(new CoreLabelTokenFactory(), "");
var s = new java.io.StringReader(sent);
var w = f.getTokenizer(s).tokenize();
s.close();
var t = lp.apply(w);

Are you tokenizing on whitespace (the tokenize.whitespace property)? It seems like in your second example, the library is not splitting the passage into two sentences, which yields a different parse.
Also, more broadly, what are you trying to do here? These are sufficiently ungrammatical sentences that the parse is near meaningless (and the parser is near guaranteed to mess up), and the NNP tag on 'Plus' is also a mistake.

What does the ' represent in '+ in racket?

I have been searching Google and experimenting in DrRacket, trying to understand what the apostrophe ' before a procedure means in Racket and how I could remove it. What I'm trying to do is take a + from inside a list, e.g. '(+ 1 2). However, every time I do something like (first x) (where x is the list in the example) I receive '+ instead of just + (notice the apostrophe). How can I remove the apostrophe, and what is its purpose?
The ' apostrophe, pronounced quote, means that the stuff inside will be interpreted as data for an s-expression, not evaluated as code.
'x, 'hello, and '+ are all symbols, which are like strings with unique-identity properties. They contain only text, not "meaning", so the '+ symbol does not contain a reference to the + function.
If you use parentheses under a ' quote, it will create a list with all the elements also ' quoted. In other words, '(x y z) is equivalent to (list 'x 'y 'z). You can think of this as quote "distributing itself" over all the elements inside it.
In your case, '(+ 1 2) is equivalent to (list '+ '1 '2), which is the same as (list '+ 1 2) because numbers are already literal data.
In most cases, the best way to get rid of a ' quote is to not add one on the outside in the first place. Instead of '(+ 1 2) you could use list: (list + 1 2), or the more advanced forms ` quasiquote and , unquote: `(,+ 1 2). In either of these cases, the + never gets put under a quote in the first place. It never becomes a symbol like '+. The + outside of any quote has meaning as the addition function.
In other cases, you can't avoid having the symbol '+ because it comes from intrinsically textual data. In this case you can assign meaning to it with an interpreter. Somewhere in that interpreter you might want code like this:
(match sym ['+ +] ['- -] ['* *] ['/ /] [_ (error "unrecognized symbol")])
Something is needed to assign meaning externally, because the symbol '+ does not have that meaning internally. You can either define the interpreter yourself or use an existing one such as eval, as long as all the meanings in the interpreter correspond exactly to what you intend.
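The same idea can be sketched outside Racket. Below is a minimal Python illustration, not Racket code: plain strings stand in for symbols, and a dictionary plays the role of the match expression above. The names OPERATORS and evaluate are made up for this example, and the sketch assumes every operator receives at least one argument.

```python
import operator
from functools import reduce

# Strings stand in for Racket symbols: "+" is only text until it is looked up.
# OPERATORS is a hypothetical name chosen for this sketch.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def evaluate(expr):
    """Evaluate a nested list such as ["+", 1, 2]; numbers evaluate to themselves."""
    if not isinstance(expr, list):
        return expr
    symbol, *args = expr
    if symbol not in OPERATORS:
        # The symbol carries no meaning of its own, so an unknown one is an error.
        raise ValueError(f"unrecognized symbol: {symbol!r}")
    values = [evaluate(arg) for arg in args]
    # Assumption: at least one argument; the operator is folded left to right.
    return reduce(OPERATORS[symbol], values)

print(evaluate(["+", 1, 2]))
print(evaluate(["*", ["+", 1, 2], 4]))
```

The lookup table is the only place where the text "+" is tied to addition, which is the point made above: the meaning is assigned externally, and only for the symbols you chose to list.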

Use variable padding in iteration directive in FORMAT

Is there a way to do something like the following?
(format t "~{~va~}" '("aa" "bb" "cc") 4)
I need to iterate through a list. Each element of that list should be padded with a variable number of spaces (specified at runtime, so I cannot use "~4a").
Or more generally, is there a way to refer to a specific argument in the argument list of FORMAT?
By nesting calls to format, you can do what you want.
(format t (format nil "~~{~~~Aa~~}" 4) '("aa" "bb" "cc"))
;; prints: aa  bb  cc  (each element padded to width 4)
Here is the inner format call.
With nil as the first argument, format returns a string instead of printing it:
(format nil "~~{~~~Aa~~}" 4)
;; returns: "~{~4a~}" - and this is exactly what you want to give
;; to the outer `format` as second argument!
You can of course write a function for this:
(defun format-by-padding-over (lst padding)
  (format t (format nil "~~{~~~Aa~~}" padding) lst))
And then:
(format-by-padding-over '("aa" "bb" "cc") 4)
;; prints: aa  bb  cc  (each element padded to width 4)
;; NIL
I learned this trick here from @Sylwester (many thanks!).
You could also interleave the list with repetitions of the padding:
(format t "~{~va~}"
        (mapcan (lambda (element)
                  (list 4 element))
                list))
You can build the format control string using nested format calls, but then you have to take care to escape the tildes. When working with regular expressions (using CL-PPCRE), one can define regular expressions using trees, like (:alternation #\\ #\*), which helps prevent bugs and headaches related to escaping special characters. The same can be done with format strings, using format-string-builder, available in Quicklisp:
(lambda (v)
(make-format-string `((:map () (:str ,v)))))
Returns a closure, which can be used to build format strings:
(funcall * 10)
=> "~{~10a~}"

How to extract the last character of a string of unknown length?

I am writing a function that takes stringA and stringB as parameters and compares the first character of stringB with the last character of stringA. If they are equal, the function returns true; otherwise it returns false.
I have nearly the whole function ready, however I can't find a way to take the last character of stringA because its length is unknown. I checked the documentation and I found nothing. Any suggestions?
(cond
[(string=? (substring stringA ???) (substring stringB 0 2))"True"]
[else "False"])
You can get the position of the last character of a string from string-length (or rather, one less than it):
(string-ref str (sub1 (string-length str)))
Note that a character is different from a string of length 1. Thus the correct way to extract a character is with string-ref or the like, rather than substring.
It seems Chris answered your question. Just a reminder: when you use string-ref, which returns a character, you should compare with char=? (or equal?) rather than string=?.
I'd like to add another solution, which I find more elaborate and which requires installing the collections package. With that package, you can use the same functions on any collection rather than just strings, via the (last ..) and (first ..) functions of the module.
(require data/collection)
(let ([stringA "abcd"]
      [stringB "dcba"])
  (cond
    [(equal? (last stringA)
             (first stringB)) "True"]
    [else "False"]))
You could also use the SRFI-13 function string-take-right, which returns the last n characters of the argument string as a string.
Most languages have a length function for strings. In Racket, I found this:
https://docs.racket-lang.org/reference/strings.html#%28def.%28%28quote.~23~25kernel%29._string-length%29%29
The function is called as (string-length str).
Call it to get the length, and then you can extract the last character at index length minus one.

Emacs Lisp: getting ascii value of character

I'd like to translate a character in Emacs to its numeric ASCII code, similar to casting char a = 'a'; int i = (int)a in C. I've tried string-to-number and a few other functions, but none seem to make Emacs read the char as a number in the end.
What's the easiest way to do this?
To get the ASCII number that represents the character, as Drew said, put a question mark before the character and evaluate that expression:
?a ==> 97
The number appears in the echo area; with a C-u prefix, it is inserted after the expression.
The inverse also works:
(insert 97) will insert an "a" in the buffer.
By the way, in some cases the character must be escaped:
?\" evaluates to 34
A character is an integer in Emacs Lisp. There is no separate character data type.
Function string-to-char is built-in, and does what you want. (string-to-char "foo") is equivalent to (aref "foo" 0), which is @abo-abo's answer, but it is coded in C.
A string is an array.
(aref "foo" 0)

Regular expressions for password validation missing work

I need to validate a password that matches the following criteria:
Must be at least 6 characters long (?=.{6})
The string contains digits (0-9), at least 1 uppercase letter (A-Z), and at least 1 - character.
The string must not start or end with a - character (invalid: -ABCDE or ABCDE-, etc.)
valid Strings
A-BCDE
ABC-DE
1B-CDE
1-BCDE
AB-CD1
ABCD-1
my regex
^.*(?=.{6})(?=.*\d)(?=.*[A-Z])(?=.*[-]).*$
demo url :
http://www.rubular.com/r/YHdPCjSW6P
invalid strings
ABCDEF (no - character; there must be at least 1)
-ABCDE (- cannot be the first character)
ABCDE- (- cannot be the last character)
A-BC-D (there cannot be more than 1 - character)
Would this work for you?
^.*(?=.{6})(?=[^\-].*[a-zA-Z])[a-zA-Z0-9\-]{1,5}[^\-]$
See example here http://www.rubular.com/r/spfqXIVZyX
valid Strings
A-BCDE
ABC-DE
1B-CDE
1-BCDE
AB-CD1
ABCD-1
invalid strings
ABCDEF (no - character; there must be at least 1)
-ABCDE (- cannot be the first character)
ABCDE- (- cannot be the last character)
A-BC-D (there cannot be more than 1 - character)
my regex now
^.*(?=.{6})(?=[^\-].*[a-zA-Z])[a-zA-Z0-9\-]{1,5}[^\-]$
demo url :
http://www.rubular.com/r/3Q6Ozs4aVB
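For comparison, here is a minimal sketch in Python (not the Ruby flavour that Rubular uses) of one pattern that accepts all of the valid strings listed above and rejects all of the invalid ones. It rests on an assumed reading of the rules, inferred from the examples rather than stated outright: at least 6 characters, only A-Z, 0-9 and exactly one -, the - neither first nor last, and at least one uppercase letter. The names PASSWORD_RE and is_valid are made up for this example.

```python
import re

# Assumed rules (inferred from the example strings, not confirmed by the asker):
#   (?=.{6})      at least 6 characters
#   (?=.*[A-Z])   at least one uppercase letter somewhere
#   [A-Z0-9]+-[A-Z0-9]+   exactly one "-", with characters on both sides
PASSWORD_RE = re.compile(r"(?=.{6})(?=.*[A-Z])[A-Z0-9]+-[A-Z0-9]+")

def is_valid(candidate: str) -> bool:
    # fullmatch anchors the pattern at both ends, so no ^ or $ is needed.
    return PASSWORD_RE.fullmatch(candidate) is not None

valid = ["A-BCDE", "ABC-DE", "1B-CDE", "1-BCDE", "AB-CD1", "ABCD-1"]
invalid = ["ABCDEF", "-ABCDE", "ABCDE-", "A-BC-D"]

for s in valid + invalid:
    print(s, is_valid(s))
```

Putting the single - in the consuming part of the pattern, instead of in a lookahead, is what enforces both "not first or last" and "no more than one" at once. If digits are meant to be required rather than merely allowed, a further lookahead (?=.*\d) would be needed, but that would reject A-BCDE from the valid list.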
