JISON: How do I avoid "dog" being parsed as "do"? - jison

I have the following JISON file (lite version of my actual file, but reproduces my problem):
%lex
%%
"do" return 'DO';
[a-zA-Z_][a-zA-Z0-9_]* return 'ID';
"::" return 'DOUBLECOLON'
<<EOF>> return 'ENDOFFILE';
/lex
%%
start
: ID DOUBLECOLON ID ENDOFFILE
{$$ = {type: "enumval", enum: $1, val: $3}}
;
It is for parsing something like "AnimalTypes::cat". It works fine for things like "AnimalTypes::cat", but the when it sees dog instead of cat, it asumes it's a DO instead of an id. I can see why it does that, but how do I get around it? I've been looking at other JISON documents, but can't seem to spot the difference that (I assume) makes those work.
This is the error I get:
JisonParserError: Parse error on line 1:
PetTypes::dog
----------^
Expecting "ID", "enumstr", "id", got unexpected "DO"
Repro steps:
Install jison-gho globally from npm (or modify code to use local version). I use Node v14.6.0.
Save the JISON above as minimal-repro.jison
Run: jison -m es -o ./minimal.mjs ./minimal-repro.jison to create parser
Create a file named test.mjs with code like:
import Parser from "./minimal.mjs";
Parser.parser.parse("PetTypes::dog")
Run node test.mjs
Edit: Updated with a reproducible example.
Edit2: Simpler JISON

Unlike (f)lex, the jison lexer accepts the first matching pattern, even if it is not the longest matching pattern. You can get the (f)lex behaviour by using
%option flex
However, that significantly slows down the scanner.
The original jison automatically added \b to the end of patterns which ended with a literal string matching an alphabetic character, to make it easier to match keywords without incurring this overhead. In jison-gho, this feature was turned off unless you specify
%option easy_keyword_rules
See https://github.com/zaach/jison/wiki/Deviations-From-Flex-Bison#user-content-literal-tokens.
So either of those options will achieve the behaviour you expect.

Related

What is the equivalent of <<EOF>> in jison when using JSON format

I'm reading through the jison documentation and one of the examples gives a lexer rule that matches the end of file (<<EOF>>). However, that can only be used if you are writing the grammar in JISON format. Instead, I am using the JSON format to describe my grammar, but I cannot find anything in the documentation describing how to match the end of file. I have tried using "<<EOF>>" as the lexer rule, but that literally matches the string <<EOF>>.
How do I do this? Is there more documentation for jison somewhere that I'm missing?
After digging into the source code for lex-parser, it looks like $ does what I want. Instead of matching the end of a line, it matches the end of a file. <<EOF>> actually gets converted to $ when parsing the lex section of a jison file.

Ruby: Why do I get warning "regex literal in condition" here?

A simple Ruby program, which works well (using Ruby 2.0.0):
#!/usr/bin/ruby
while gets
print if /foo/../bar/
end
However, Ruby also outputs the warning warning: regex literal in condition. It seems that Ruby considers my flip-flop-expression /foo/../bar/ as dangerous.
My question: Where lies the danger in this program? And: Can I turn off this warning (ideally only for this statement, keeping other warnings active)?
BTW, I found on the net several discussions of this kind of code, also mentioning the warning, but never found a good explanation why we get warned.
You can avoid the warning by using an explicit match:
while gets
print if ~/foo/..~/bar/
end
Regexp#~ matches against $_.
I don't know why the warning is shown (to be honest, I wasn't even aware that Ruby matches regexp literals against $_ implicitly), but according to the parser's source code, it is shown unless you provide the -e option when invoking Ruby, i.e. passing the script as an argument:
$ ruby -e "while gets; print if /foo/../bar/ end"
I would avoid using $_ as an implicit parameter and instead use something like:
while line = gets
print line if line=~/foo/..line=~/bar/
end
I think Neil Slater is right: It looks like a bug in a parser. If I change the code to
#!/usr/bin/ruby
while gets
print if $_=~/foo/..$_=~/bar/
end
the warning disappears.
I'll file a bug report.

jamplus: link command line too long for osx

I'm using jamplus to build a vendor's cross-platform project. On osx, the C tool's command line (fed via clang to ld) is too long.
Response files are the classic answer to command lines that are too long: jamplus states in the manual that one can generate them on the fly.
The example in the manual looks like this:
actions response C++
{
$(C++) ##(-filelist #($(2)))
}
Almost there! If I specifically blow out the C.Link command, like this:
actions response C.Link
{
"$(C.LINK)" $(LINKFLAGS) -o $(<[1]:C) -Wl,-filelist,#($(2:TC)) $(NEEDLIBS:TC) $(LINKLIBS:TC))
}
in my jamfile, I get the command line I need that passes through to the linker, but the response file isn't newline terminated, so link fails (osx ld requires newline-separated entries).
Is there a way to expand a jamplus list joined with newlines? I've tried using the join expansion $(LIST:TCJ=\n) without luck. $(LIST:TCJ=#(\n)) doesn't work either. If I can do this, the generated file would hopefully be correct.
If not, what jamplus code can I use to override the link command for clang, and generate the contents on the fly from a list? I'm looking for the least invasive way of handling this - ideally, modifying/overriding the tool directly, instead of adding new indirect targets wherever a link is required - since it's our vendor's codebase, as little edit as possible is desired.
The syntax you are looking for is:
newLine = "
" ;
actions response C.Link
{
"$(C.LINK)" $(LINKFLAGS) -o $(<[1]:C) -Wl,-filelist,#($(2:TCJ=$(newLine))) $(NEEDLIBS:TC) $(LINKLIBS:TC))
}
To be clear (I'm not sure how StackOverflow will format the above), the newLine variable should be defined by typing:
newLine = "" ;
And then placing the carat between the two quotes and hitting enter. You can use this same technique for certain other characters, i.e.
tab = " " ;
Again, start with newLine = "" and then place carat between the quotes and hit tab. In the above it is actually 4 spaces which is wrong, but hopefully you get the idea. Another useful one to have is:
dollar = "$" ;
The last one is useful as $ is used to specify variables typically, so having a dollar variable is useful when you actually want to specify a dollar literal. For what it is worth, the Jambase I am using (the one that ships with the JamPlus I am using), has this:
SPACE = " " ;
TAB = " " ;
NEWLINE = "
" ;
Around line 28...
I gave up on trying to use escaped newlines and other language-specific characters within string joins. Maybe there's an awesome way to do that, that was too thorny to discover.
Use a multi-step shell command with multiple temp files.
For jamplus (and maybe other jam variants), the section of the actions response {} between the curly braces becomes an inline shell script. And the response file syntax #(<value>) returns a filename that can be assigned within the shell script, with the contents set to <value>.
Thus, code like:
actions response C.Link
{
_RESP1=#($(2:TCJ=#)#$(NEEDLIBS:TCJ=#)#$(LINKLIBS:TCJ=#))
_RESP2=#()
perl -pe "s/[#]/\n/g" < $_RESP1 > $_RESP2
"$(C.LINK)" $(LINKFLAGS) -o $(<[1]:C) -Wl,-filelist,$_RESP2
}
creates a pair of temp files, assigned to shell variable names _RESP1 and _RESP2. File at path _RESP1 is assigned the contents of the expanded sequence joined with a # character. Search and replace is done with a perl one liner into _RESP2. And link proceeds as planned, and jamplus cleans up the intermediate files.
I wasn't able to do this with characters like :;\n, but # worked as long as it had no adjacent whitespace. Not completely satisfied, but moving on.

Makefile syntax $(A,B,C)?

Consider the following code:
$ANIMAL = COW PIG CHICKEN VAMPIRE
all:
#echo $(ANIMAL, F, >.txt)
I strove to find a section in GNU make manual that mentions the above syntax, but I couldn't find anything related to it. What does it print and how is the syntax structured for the functionality?
Added: When a line starts with "#--" what does it mean?
#-- $(GEN_ENV); ...
To answer your addition: In regular Makefiles (read: POSIX, GNU, ...)
a leading '#' supresses echoing of the command.
a leading '-' says to ignore a non-zero exit status
both can be combined, and repetitions are okay, so #---###-#---echo foo is the same as #-echo foo
This is called "macro modifiers". This is not a GNU make feature. Take a look at this chapter of OPUS make tutorial. The general syntax of these modifiers:
$(name,modifier[,modifier]...)
name is macro expanded, then each modifier is applied in succession to the elements of the expanded value.
Take a look then at the list of modifiers and it becomes clear that it forms a list of file names (truncates paths of each variable in ANIMAL) with .txt added. So, in your case it shoud output:
COW.txt PIG.txt CHICKEN.txt VAMPIRE.txt
PS
I looked through the reference mentioned above and don't think the first line ($ANIMAL = ) is correct since macro definition should start without $.
Based on your comments it seems you are actually using OpusMake, rather than GNU make. You can find more information about it on the Opus Software, Inc. website, and also in this handy reference guide. From those sources you can see that you have an example of a macro employing macro modifiers in its expansion.
Generally speaking $(FOO) is expanded to the unmodified value of the variable FOO, while $(FOO,mod1[,mod2[,...]]]) expands to the value of FOO, modified according to the modifiers you specify. Note that you can string together any number of modifiers, and they will be applied in left-to-right order.
There's a ton of possible modifiers, but your example specifically uses two:
The F modifier, which means "use just the final path component of each pathname in the variable value"
The >str modifier, which means "append the text str to each space-separated word in the value".
Here's a quick example:
FOO=abc/def ghi/jkl
BAR=$(FOO,F)
BAZ=$(FOO,>.txt)
BOO=$(FOO,F,>.txt)
BAR will have the value def jkl (ie, just the filename portion of each path).
BAZ will have the value abc/def.txt ghi/jkl.txt (ie, append .txt to each space-separated word in the value)
BOO will have the value def.txt jkl.txt (ie, first take just the filename portion of each path, then append .txt to each)

trying to find a file/line for: .(eval):289: warning: don't put space before argument parentheses

So, I get this warning when I'm running my tests in ruby/RoR
.(eval):289: warning: don't put space before argument parentheses
I've checked every where (but obvoiusly not) and I can't find the origin of this error.
The above error just pops up inbetween the unit tests ...
Can someone clue me in onto how to find the location of this error?
The file and line number are contained in the backtrace. However, in your case, the warning is inside a string being evaled at runtime. Which means there is no file. (Actually, the eval method does take optional arguments for the file name and line number that should be displayed in a backtrace, but in this case whoever wrote the code in question unfortunately forgot to pass those arguments.)
I fear that you have no other choice than to manually examine every single call to eval in your entire codebase, and that includes Rails, your testing framework, your entire application, your tests, your plugins, your helpers, the ruby standard library, ...
Of course, you should be aware that the problem might not be obvious as in
eval 'foo (bar, baz)'
It could also be something like
def foo(*args)
puts args.join
end
bar = 'Hello'
baz = 'World'
foostr = 'foo' # in one file
barstr = 'bar' # in another file in a different directory
bazstr = 'baz' # in another file in a different directory
argstr = "(#{barstr}, #{bazstr})" # in yet another file
$, = ' ' # in some third-party plugin
str = [foostr, argstr].join # in a fourth file
eval str # somewhere else entirely
eval str, binding, __FILE__, __LINE__ # this is how it *should* be done
Note the difference between the two warning messages: the first one reads exactly like the one you posted, but the second one has the filename instead of (eval) and the line number inside the file instead of the line number inside the eval string.
By the way: the line number 289 in the warning message is the line number inside the evald string! In other words: somewhere in your application there is a string being evald, which is at least 289 lines long! (Actually, it is more likely that this done not in your application but rather in Rails. The Rails router used to be a particularly bad offender, I don't know if this is still the case.)
It sounds to me that there is a rule which forbids a space between a function name and the parentheses enclosing the arguments of the function.
In many languages this would be considered a permissible stylistic variation.
Is the eval mentioned in the warning message, the 'function' being complained about?
Does the number 289 mean anything as a line number?
Could you search your source files for a parenthesis preceded by a space?
Incidentally, the message says warning. What happens if you ignore it?
If it's happening in between the unit tests it might be in a setup or teardown method. Try searching for eval or try reducing the code you are running until the error goes away. Then you'll know where to look (the code you just removed).

Resources