I'm trying to write a JavaCC script for a (simple) XPath parser and I'm having problems with the part to parse individual steps.
My idea of the grammar is this:
Step ::= ( AxisName "::" )? NodeTest ( "[" Predicate "]" )*
I have transformed it into the following script snippet:
Step Step() :
{
Token t;
Step step;
Axis axis;
NodeTest nodeTest;
Expression predicate;
}
{
{ axis = Axis.child; }
(
t = <IDENTIFIER>
{ axis = Axis.valueOf(t.image); }
<COLON>
<COLON>
)?
t = <IDENTIFIER>
{ nodeTest = new NodeNameTest(t.image); }
{ step = new Step(axis, nodeTest); }
(
<OPEN_PAR>
predicate = Expression()
{ step.addPredicate(predicate); }
<CLOSE_PAR>
)*
{ return step; }
}
This, however, doesn't work. Given the following expression:
p
it throws the following error:
Exception in thread "main" java.lang.IllegalArgumentException: No enum constant cz.dusanrychnovsky.generator.expression.Axis.p
at java.lang.Enum.valueOf(Unknown Source)
at cz.dusanrychnovsky.generator.expression.Axis.valueOf(Axis.java:3)
at cz.dusanrychnovsky.generator.parser.XPathParser.Step(XPathParser.java:123)
at cz.dusanrychnovsky.generator.parser.XPathParser.RelativeLocationPath(XPathParser.java:83)
at cz.dusanrychnovsky.generator.parser.XPathParser.AbsoluteLocationPath(XPathParser.java:66)
at cz.dusanrychnovsky.generator.parser.XPathParser.Start(XPathParser.java:23)
at cz.dusanrychnovsky.generator.parser.XPathParser.parse(XPathParser.java:16)
at cz.dusanrychnovsky.generator.Main.main(Main.java:24)
I believe that what happens is that the parser sees an identifier on the input so it takes the axis branch even though no colons will follow, which the parser cannot know at that time.
What is the best way to fix this? Should I somehow increase the lookahead value for the Step rule, and if that's the case, then how exactly would I do that? Or do I need to rewrite the rule somehow?
Two choices:
( LOOKAHEAD(3)
t = <IDENTIFIER>
{ axis = Axis.valueOf(t.image); }
<COLON>
<COLON>
)?
or
( LOOKAHEAD( <IDENTIFIER> <COLON> <COLON> )
t = <IDENTIFIER>
{ axis = Axis.valueOf(t.image); }
<COLON>
<COLON>
)?
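As a rough illustration of what the syntactic lookahead decides, here is a hand-written Java sketch (not JavaCC output). The names StepLookahead, Kind, Token, tokenize, peek and parseStep are hypothetical, and the tokenizer is deliberately minimal. The axis branch is taken only when an identifier is followed by two colons; otherwise the identifier is left for the node test.

```java
import java.util.ArrayList;
import java.util.List;

public class StepLookahead {
    // Hypothetical token kinds mirroring <IDENTIFIER> and <COLON> in the grammar.
    enum Kind { IDENTIFIER, COLON, EOF }

    static final class Token {
        final Kind kind;
        final String image;
        Token(Kind kind, String image) { this.kind = kind; this.image = image; }
    }

    // Minimal tokenizer (an assumption): names of letters and digits, plus single colons.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ':') {
                tokens.add(new Token(Kind.COLON, ":"));
                i++;
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(Kind.IDENTIFIER, input.substring(start, i)));
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    // Kind of the token `offset` positions ahead of `pos`, or EOF past the end.
    static Kind peek(List<Token> tokens, int pos, int offset) {
        int i = pos + offset;
        return i < tokens.size() ? tokens.get(i).kind : Kind.EOF;
    }

    // Parses ( IDENTIFIER ":" ":" )? IDENTIFIER and returns "axis::name".
    static String parseStep(String input) {
        List<Token> tokens = tokenize(input);
        int pos = 0;
        String axis = "child"; // default axis when none is written
        // Counterpart of LOOKAHEAD( <IDENTIFIER> <COLON> <COLON> ):
        // commit to the axis branch only if all three tokens are there.
        if (peek(tokens, pos, 0) == Kind.IDENTIFIER
                && peek(tokens, pos, 1) == Kind.COLON
                && peek(tokens, pos, 2) == Kind.COLON) {
            axis = tokens.get(pos).image;
            pos += 3;
        }
        if (peek(tokens, pos, 0) != Kind.IDENTIFIER) {
            throw new IllegalArgumentException("expected a node name at token " + pos);
        }
        return axis + "::" + tokens.get(pos).image;
    }

    public static void main(String[] args) {
        System.out.println(parseStep("p"));         // child::p
        System.out.println(parseStep("parent::p")); // parent::p
    }
}
```

With a one-token lookahead the first call would enter the axis branch and fail, which is the situation described in the question.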
I'm testing streams in Java 8 and have come across a very frustrating issue.
I've got this code, which compiles fine:
List<String> words = Arrays.asList("Oracle", "Java", "Magazine");
List<String> wordLengths = words.stream().map((x) -> x.toUpperCase())
.collect(Collectors.toList());
And a second version (nearly the same), which does not compile:
List<String> words = Arrays.asList("Oracle", "Java", "Magazine");
List<String> wordLengths = words.stream().map((x) -> {
x.toUpperCase();
}).collect(Collectors.toList());
Error:
The method map(Function<? super String,? extends R>) in the type Stream<String> is not applicable for the arguments ((<no type> x) -> {})
What have these additional braces changed?
Your lambda expression returns a value. If you use braces, you need to add a return statement to the lambda body:
List<String> words = Arrays.asList("Oracle", "Java", "Magazine");
List<String> wordLengths = words.stream().map((x) -> {
return x.toUpperCase();
}).collect(Collectors.toList());
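One way to see the difference is to assign each form to an explicit functional interface. The sketch below uses only java.util.function; the class and field names are made up for illustration. A block body without return produces no value, so it fits Consumer but not Function, which is what map expects.

```java
import java.util.function.Consumer;
import java.util.function.Function;

public class LambdaBodies {
    // Block body without return: produces no value, so it fits Consumer.
    // Assigning this same lambda to a Function<String, String> would not compile.
    static final Consumer<String> DISCARDS = x -> { x.toUpperCase(); };

    // Block body with return: produces a value, so it fits Function.
    static final Function<String, String> RETURNS = x -> { return x.toUpperCase(); };

    // Expression body: the value of the expression is returned implicitly.
    static final Function<String, String> EXPRESSION = x -> x.toUpperCase();

    public static void main(String[] args) {
        DISCARDS.accept("Java");                      // computes "JAVA" and throws it away
        System.out.println(RETURNS.apply("Java"));    // JAVA
        System.out.println(EXPRESSION.apply("Java")); // JAVA
    }
}
```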
According to the official Oracle tutorial
A lambda expression consists of the following:
A comma-separated list of formal parameters enclosed in parentheses.
The CheckPerson.test method contains one parameter, p, which
represents an instance of the Person class.
Note: You can omit the data type of the parameters in a lambda
expression. In addition, you can omit the parentheses if there is only
one parameter. For example, the following lambda expression is also
valid:
p -> p.getGender() == Person.Sex.MALE
&& p.getAge() >= 18
&& p.getAge() <= 25
The arrow token, ->
A body, which consists of a single expression or a statement block.
This example uses the following expression:
p.getGender() == Person.Sex.MALE
&& p.getAge() >= 18
&& p.getAge() <= 25
If you specify a single expression, then the Java runtime evaluates
the expression and then returns its value. Alternatively, you can use
a return statement:
p -> {
return p.getGender() == Person.Sex.MALE
&& p.getAge() >= 18
&& p.getAge() <= 25;
}
A return statement is not an expression; in a lambda expression, you
must enclose statements in braces ({}). However, you do not have to
enclose a void method invocation in braces. For example, the following
is a valid lambda expression:
email -> System.out.println(email)
Since the provided lambda expression (x) -> x.toUpperCase() has only one parameter, we can omit the parentheses: x -> x.toUpperCase(). String#toUpperCase returns a new String, so there is no need for a return statement and braces. If instead we had a more complex block with return statements, we would have to enclose it in braces. In this case, however, it is better to use the method reference String::toUpperCase:
List<String> wordLengths = words.stream().map(String::toUpperCase).collect(Collectors.toList());
I am using Stanford coreNLP to parse some text. I get multiple sentences. On these sentences I managed to extract Noun Phrases using TregexPattern. So I get a child Tree that is my Noun Phrase. I also managed to figure out the Head of the noun phrase.
How is it possible to get the position or even the token/coreLabel of that Head in the sentence?
Even better, how is it possible to find the dependency relationships of the Head to the rest of the sentence?
Here's an example :
public void doSomeTextKarate(String text){
Properties props = new Properties();
props.put("annotators","tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
this.pipeline = pipeline;
// create an empty Annotation just with the given text
Annotation document = new Annotation(text);
// run all Annotators on this text
pipeline.annotate(document);
List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
SemanticGraph basicDeps = sentence.get(BasicDependenciesAnnotation.class);
Collection<TypedDependency> typedDeps = basicDeps.typedDependencies();
System.out.println("typedDeps ==> "+typedDeps);
SemanticGraph collDeps = sentence.get(CollapsedDependenciesAnnotation.class);
SemanticGraph collCCDeps = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
List<CoreMap> numerizedTokens = sentence.get(NumerizedTokensAnnotation.class);
List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);
Tree sentenceTree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
sentenceTree.percolateHeads(headFinder);
Set<Dependency<Label, Label, Object> > sentenceDeps = sentenceTree.dependencies();
for (Dependency<Label, Label, Object> dependency : sentenceDeps) {
System.out.println("sentence dep = " + dependency);
System.out.println(dependency.getClass() +" ( " + dependency.governor() + ", " + dependency.dependent() +") " );
}
//find noun phrases in sentence
TregexPattern pat = TregexPattern.compile("@NP");
TregexMatcher matcher = pat.matcher(sentenceTree);
while (matcher.find()) {
Tree nounPhraseTree = matcher.getMatch();
System.out.println("Found noun phrase " + nounPhraseTree);
nounPhraseTree.percolateHeads(headFinder);
Set<Dependency<Label, Label, Object> > npDeps = nounPhraseTree.dependencies();
for (Dependency<Label, Label, Object> dependency : npDeps ) {
System.out.println("nounPhraseTree dep = " + dependency);
}
Tree head = nounPhraseTree.headTerminal(headFinder);
System.out.println("head " + head);
Set<Dependency<Label, Label, Object> > headDeps = head.dependencies();
for (Dependency<Label, Label, Object> dependency : headDeps) {
System.out.println("head dep " + dependency);
}
//QUESTION :
//How do I get the position of "head" in tokens or numerizedTokens ?
//How do I get the dependencies where "head" is involved in typedDeps ?
}
}
}
In other words, I would like to query for ALL dependency relationships in the ENTIRE sentence where the "head" word/token/label is involved. So I thought I needed to figure out the position of that token in the sentence to correlate it with the typed dependencies, but maybe there is an easier way?
Thanks in advance.
[EDIT]
So I might have found an answer or the beginning of it.
If I call .label() on head I get myself a CoreLabel which is pretty much what I needed to find the rest. I can now iterate over the typed dependencies and search for dependencies where either the dominator label or dependent label has the same index as my headLabel.
Tree nounPhraseTree = matcher.getMatch();
System.out.println("Found noun phrase " + nounPhraseTree);
nounPhraseTree.percolateHeads(headFinder);
Tree head = nounPhraseTree.headTerminal(headFinder);
CoreLabel headLabel = (CoreLabel) head.label();
System.out.println("tokens.contains(headLabel)" + tokens.contains(headLabel));
System.out.println("");
System.out.println("Iterating over typed deps");
for (TypedDependency typedDependency : typedDeps) {
System.out.println(typedDependency.gov().backingLabel());
System.out.println("gov pos "+ typedDependency.gov() + " - " + typedDependency.gov().index());
System.out.println("dep pos "+ typedDependency.dep() + " - " + typedDependency.dep().index());
if(typedDependency.gov().index() == headLabel.index() ){
System.out.println("dep or gov backing label equals headlabel :" + (typedDependency.gov().backingLabel().equals(headLabel) ||
typedDependency.dep().backingLabel().equals(headLabel))); //why does this return false all the time ?
System.out.println(" !!!!!!!!!!!!!!!!!!!!! HIT ON " + headLabel + " == " + typedDependency.gov());
}
}
So it seems I can only match my head's label with the one from the typedDeps using the index. I wonder if this is the proper way to do this.
As you can see in my code, I also tried to use TypedDependency.backingLabel() to test equality between my headLabel and either the governor or the dependent, but it systematically returns false. I wonder why.
Any feedback appreciated.
You can get the position of a CoreLabel within its containing sentence with the CoreAnnotations.IndexAnnotation annotation.
Your method for finding all dependents of a given word seems correct, and is probably the easiest way to do it.
I need to recognize arrays of integers in Fortran's I4 format (stands for an integer of width four) as the following example:
Using a pure context-free grammar:
WS : ' ' ;
MINUS : '-' ;
DIGIT : '0'..'9' ;
int4:
WS WS (WS| MINUS ) DIGIT
| WS (WS| MINUS ) DIGIT DIGIT
| (WS| MINUS | DIGIT ) DIGIT DIGIT DIGIT
;
numbers
: int4*;
The above example is correctly matched:
However if I use semantic predicates to encode semantic constraints of rule int4 :
int4
scope { int n; }
@init { $int4::n = 0; }
: ( {$int4::n < 3}?=> WS {$int4::n++;} )*
( MINUS {$int4::n++;} )?
( {$int4::n < 4}?=> DIGIT{$int4::n++;} )+
{$int4::n == 4}?
;
it works for the int4 rule, but it's not the same for the numbers rule, because it doesn't recognize the array of integers of the first example:
In this case a pure context-free grammar may be the better choice, but what about the format I30 (an integer of width 30)?
The main question is: Is it possible to use Semantic Predicates with this grammar?
Your parse tree seems to end at the numbers rule because your numbers rule throws an exception (but it does not show up in the diagram...). You can see it if you run the code generated, and if you take a closer look at the exception, it says (line info may differ for you):
Exception in thread "main" java.util.EmptyStackException
at java.util.Stack.peek(Stack.java:102)
at FortranParser.numbers(FortranParser.java:305)
at Main.main(Main.java:9)
and the code throwing the exception is:
public final void numbers() throws RecognitionException {
....
else if ( (LA5_0==DIGIT) && ((int4_stack.peek().n < 4))) {
alt5=1;
}
So your problem is that the semantic predicate gets propagated to the numbers rule, and at that level the scope stack is empty, hence int4_stack.peek() throws an exception.
A trick to avoid it is that you use a variable in the global scope, e.g.:
@members {
int level=0;
}
and modify the semantic predicates to check level before the predicates, just like:
int4
scope { int n; }
@init { $int4::n = 0; level++; }
@after { level--; }
: ( {level==0 || $int4::n < 3}?=> WS {$int4::n++;} )*
( MINUS {$int4::n++;} )?
( {level==0 || $int4::n < 4}?=> DIGIT{$int4::n++;} )+
{$int4::n == 4}?
;
This is just a workaround to avoid the error that you get, maybe (knowing the error) there is a better solution and you don't need to mess up your semantic predicates.
But, I think, the answer is yes, it is possible to use semantic predicates with that grammar.
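The effect of the level check can be reproduced in plain Java, independent of ANTLR. In this sketch ScopeGuard, int4Stack, enterInt4 and exitInt4 are hypothetical stand-ins for the generated scope stack and for the rule's init and after actions. The short-circuit || keeps peek() from being called while no int4 scope is active.

```java
import java.util.EmptyStackException;
import java.util.Stack;

public class ScopeGuard {
    // Stand-in for the generated int4_stack; each frame holds the counter n.
    static final Stack<int[]> int4Stack = new Stack<>();
    // Number of int4 invocations currently active (the global "level").
    static int level = 0;

    // Mirrors entering the int4 rule: push a scope frame and bump the level.
    static void enterInt4() { int4Stack.push(new int[] {0}); level++; }

    // Mirrors leaving the int4 rule.
    static void exitInt4() { int4Stack.pop(); level--; }

    // Predicate as evaluated from the numbers rule: peek() fails on an empty stack.
    static boolean unguarded() { return int4Stack.peek()[0] < 4; }

    // Guarded predicate: when level == 0 the right-hand side is never evaluated.
    static boolean guarded() { return level == 0 || int4Stack.peek()[0] < 4; }

    // Reports whether the unguarded predicate throws in the current state.
    static boolean unguardedThrows() {
        try {
            unguarded();
            return false;
        } catch (EmptyStackException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(unguardedThrows()); // true: no int4 scope yet
        System.out.println(guarded());         // true: level == 0 short-circuits
        enterInt4();
        int4Stack.peek()[0] = 4;               // pretend four characters were consumed
        System.out.println(guarded());         // false: 4 < 4 does not hold
        exitInt4();
    }
}
```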
A while ago I was struggling with writing a JavaCC template for XPath steps so that it would support both a full step definition and a definition with axis name omitted (in which case the axis name would default to child). I posted a question on SO and got a working answer by Theodore Norvell.
Now I'm trying to extend the template so that the parser would, in addition to the two previous possibilities, also support using a "@" sign as a shortcut for the attribute axis.
The following snippet does not work:
Step Step() :
{
Token t;
Step step;
Axis axis;
NodeTest nodeTest;
Expression predicate;
}
{
{ axis = Axis.child; }
(
<AT>
{ axis = Axis.attribute; }
|
LOOKAHEAD( <IDENTIFIER> <DOUBLE_COLON> )
t = <IDENTIFIER>
{ axis = Axis.valueOf(t.image); }
<DOUBLE_COLON>
)?
t = <IDENTIFIER>
{ nodeTest = new NodeNameTest(t.image); }
{ step = new Step(axis, nodeTest); }
(
<OPEN_PAR>
predicate = Expression()
{ step.addPredicate(predicate); }
<CLOSE_PAR>
)*
{ return step; }
}
Instead it emits the following warning message:
Choice conflict in [...] construct at line 162, column 9.
Expansion nested within construct and expansion following construct
have common prefixes, one of which is: <IDENTIFIER>
Consider using a lookahead of 2 or more for nested expansion.
I have tried setting the LOOKAHEAD parameter in various ways but the only way that worked was to set it globally to 2. I would prefer changing it locally though.
How do I do that? And why doesn't the snippet shown in this question work?
Try this
(
<AT>
{ axis = Axis.attribute; }
|
LOOKAHEAD( <IDENTIFIER> <DOUBLE_COLON> )
t = <IDENTIFIER>
{ axis = Axis.valueOf(t.image); }
<DOUBLE_COLON>
|
{}
)
--Edit--
I'd forgotten to answer the second question: "Why doesn't the given snippet work?"
The lookahead spec that you have only applies to the alternation. I'm surprised JavaCC doesn't give you a warning, as the LOOKAHEAD is on the last alternative and hence useless. By the time the parser gets to the LOOKAHEAD, it has already decided (on the basis of the next token being an identifier) to process the part inside the ( ... )? construct. Another solution is thus:
( LOOKAHEAD( <AT> | <IDENTIFIER> <DOUBLE_COLON> )
(<AT> {...} | <IDENTIFIER> {...} <DOUBLE_COLON> )
)?
I've got this in a Prolog file:
:- module(test,[main/0, api_trial/2]).
:- use_module(library(prologbeans)).
main:-
register_query(assert_trial(Age,Res), api_trial(Age,Res)),
start.
person('John',10,'London').
person('Adam',10,'Manchester').
api_trial(Age,Res) :-
findall((P,Age,Add),person(P,Age,Add),Res).
In Java, I do the following query (after importing the correct classes etc):
public void trial() {
try{
Bindings bindings = new Bindings().bind("Age",10);
QueryAnswer answer = session.executeQuery("assert_trial(Age,Res)", bindings);
Term result = answer.getValue("Res");
System.out.println("Answer returned " + result);
} catch (IOException e) {
e.printStackTrace();
} catch (IllegalCharacterSetException e) {
e.printStackTrace();
}
}
Basically, my problem is the format in which the query result is returned in Java.
In Prolog, it's normal:
Res = [('John',10,'London'),('Adam',10,'Manchester')] ?
In Java, I get:
Answer returned [,(John,,(10,London)),,(Adam,,(10,Manchester))]
The formatting is messed up. How can I overcome this problem? Any help would be much appreciated.
Thanks.
Did you use toString() to create the Java output? Most likely toString()
implements a write that is in between write_canonical and write.
With write_canonical it shares the property that operators are not respected.
Instead, a term with operator f is written in functional notation, as f(a1,a2) or f(a1).
One operator definition that every Prolog has is the comma. That is why we can
input (A,B), but write_canonical will write it as ','(A,B):
?- X = (A,B), write_canonical(X).
','(_,_)
X = (A,B)
What I now see in your output is that it has also been stripped of the quotes.
This is the normal behaviour of write:
?- X = 'hello world!', write(X).
hello world!
X = 'hello world!'
The write operation that would both respect operators and put quotes where necessary
would be writeq.
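To make the two styles concrete, here is a small Java sketch with a made-up Term class. It is not the PrologBeans API, and the quoting rule is a simplification that only covers plain atoms and integers. The plain method ignores operators and drops quotes, which reproduces the shape of the output in the question; the quoted method treats the comma as an operator and quotes atoms where needed, in the spirit of writeq.

```java
public class TermWriter {
    // Hypothetical minimal term: an atom (no arguments) or a compound term.
    static final class Term {
        final String functor;
        final Term[] args;
        Term(String functor, Term... args) { this.functor = functor; this.args = args; }
        boolean isComma() { return functor.equals(",") && args.length == 2; }
    }

    // Functional notation, no operators, no quotes: f(a1,a2).
    static String plain(Term t) {
        if (t.args.length == 0) return t.functor;
        StringBuilder sb = new StringBuilder(t.functor).append('(');
        for (int i = 0; i < t.args.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(plain(t.args[i]));
        }
        return sb.append(')').toString();
    }

    // Simplified quoting (an assumption): lower-case identifiers and integers stay bare.
    static String quoteAtom(String name) {
        if (name.matches("[a-z][A-Za-z0-9_]*") || name.matches("-?[0-9]+")) return name;
        return "'" + name.replace("'", "''") + "'";
    }

    // Operator-aware for the comma only, with quotes where needed.
    static String quoted(Term t) {
        if (t.args.length == 0) return quoteAtom(t.functor);
        if (t.isComma()) return "(" + commaSequence(t) + ")";
        StringBuilder sb = new StringBuilder(quoteAtom(t.functor)).append('(');
        for (int i = 0; i < t.args.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(quoted(t.args[i]));
        }
        return sb.append(')').toString();
    }

    // The comma is right-associative, so a comma term on the right continues the sequence.
    static String commaSequence(Term t) {
        Term right = t.args[1];
        return quoted(t.args[0]) + "," + (right.isComma() ? commaSequence(right) : quoted(right));
    }

    // The first element of the list in the question: ('John',10,'London').
    static Term sample() {
        return new Term(",", new Term("John"),
                new Term(",", new Term("10"), new Term("London")));
    }

    public static void main(String[] args) {
        System.out.println(plain(sample()));  // ,(John,,(10,London))
        System.out.println(quoted(sample())); // ('John',10,'London')
    }
}
```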
Best Regards