I'm writing a compiler for a C-like language in JS, using Jison as the lexer/parser generator, with an Angular frontend. I've nearly got the result I expected, but one thing is puzzling me: how do I make Jison ignore comments (both /* block */ and // line)?
Is there an easy way to achieve this, keeping in mind that a comment can appear in the middle of any statement or expression?
You ignore comments the same way you ignore whitespace: with a lexer rule that has no action.
For example:
\s+ /* IGNORE */
"//".* /* IGNORE */
[/][*][^*]*[*]+([^/*][^*]*[*]+)*[/] /* IGNORE */
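The first line ignores whitespace, the second ignores single-line comments, and the third ignores block comments. Place these rules above any rule that matches / as a division operator. By default Jison's lexer takes the first rule that matches (not the longest, unless you enable %options flex), so an earlier / rule would split a comment into operator tokens.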
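To check that these patterns make comments vanish even in the middle of an expression, here is a rough Python sketch (not Jison itself) of a hand-written lexer. It tries the same three skip patterns first, in order, and only then the token rules. The token names NUMBER, IDENT and OP and the function `tokenize` are made up for illustration. The point is that a matched comment only advances the position and produces no token, which is what an action-less Jison rule does.

```python
import re

# The three "ignore" rules from the Jison lexer above, tried first and in order
# (mirroring Jison's default first-match behaviour).
SKIP = [re.compile(p) for p in (
    r"\s+",                                  # whitespace
    r"//.*",                                 # line comment ('.' stops at newline)
    r"[/][*][^*]*[*]+([^/*][^*]*[*]+)*[/]",  # block comment
)]

# Hypothetical token rules for a tiny C-like language (illustration only).
TOKENS = [(kind, re.compile(p)) for kind, p in (
    ("NUMBER", r"[0-9]+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[-+*/=;()]"),
)]


def tokenize(src):
    """Return (kind, text) pairs, silently dropping whitespace and comments."""
    pos, out = 0, []
    while pos < len(src):
        # First skip rule that matches here, if any.
        skipped = next((m for m in (p.match(src, pos) for p in SKIP) if m), None)
        if skipped:
            pos = skipped.end()  # "no action": the text is simply discarded
            continue
        for kind, pattern in TOKENS:
            m = pattern.match(src, pos)
            if m:
                out.append((kind, m.group()))
                pos = m.end()
                break
        else:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
    return out
```

For example, `tokenize("x = 1 /* one */ + 2; // done")` yields only the tokens for x, =, 1, +, 2 and ;, while `"a / b"` still lexes / as an operator because neither comment pattern matches a lone slash.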
I have created a custom directive for a documentation project of mine which is built using Sphinx and reStructuredText. The directive is used like this:
.. xpath-try:: //xpath[@expression="here"]
This will render the XPath expression as a simple code block, but with the addition of a link that the user can click to execute the expression against a sample XML document and view the matches (example link, example rendered page).
My directive specifies that it does not have content, takes one mandatory argument (the xpath expression) and recognises a couple of options:
from docutils.parsers.rst import Directive, directives

class XPathTryDirective(Directive):
    has_content = False
    required_arguments = 1
    optional_arguments = 0
    final_argument_whitespace = True
    option_spec = {
        'filename': directives.unchanged,
        'ns_args': directives.unchanged,
    }

    def run(self):
        xpath_expr = self.arguments[0]
        # xpath_try is the project's custom node class, defined elsewhere
        node = xpath_try(xpath_expr, xpath_expr)
        ...
        return [node]
Everything seems to be working exactly as intended, except that if the XPath expression contains a *, the syntax highlighting in my editor (gVim) gets really messed up. If I escape the * with a backslash, my editor is happy, but the backslash comes through in the output.
My questions are:
Are special characters in an argument to a directive supposed to be escaped?
If so, does the directive API provide a way to get the unescaped version?
Or is it working fine and the only problem is my editor is failing to highlight things correctly?
It may seem like a minor concern, but as I'm a novice at rST, I find the highlighting very helpful.
Are special characters in an argument to a directive supposed to be escaped?
No. As far as I can tell, no additional processing is performed on the arguments of rST directives, which matches your observation: whatever you specify as the directive's argument, you get back directly via self.arguments[0].
Or is it working fine and the only problem is my editor is failing to highlight things correctly?
Yes, this seems to be the case. The * character marks emphasis (italics) in rST, and the Vim syntax file apparently treats it as the start of emphasis even inside a directive argument, where it has no markup meaning.
This means the solution here would be to tweak or fix the Vim syntax file for reStructuredText.
The jsDoc documentation for namepaths says you should escape special characters:
Above is an example of a namespace with "unusual" characters in its
member names (the hash character, dashes, even quotes). To refer to
these you just need quote the names: chat."#channel",
chat."#channel"."op:announce-motd", and so on. Internal quotes in
names should be escaped with backslashes:
chat."#channel"."say-\"hello\""
However, this doesn't work for dots. If I have an event called "cellClick.dt" that I want to document, jsDoc drops its documentation from the output and generates an incorrect link in the table of contents. I have tried the following combinations:
myClass~event.namespace
'myClass~event.namespace'
myClass~event\.namespace
myclass~'event.namespace'
All of them generate broken docs in some way. The last one at least seems to generate correct links and docs, but the apostrophes still appear in the output. This makes it very cumbersome to document code that uses dots as namespace separators in events (as jQuery plugins do by default).
What's the correct way to do this? Is there one? The version I'm using is 3.3.0-alpha9.
I would suggest doing this:
/**
 * @class
 */
function myClass () {
}
/**
 * @memberof myClass
 * @event event.namespace
 */
The event is properly named and is a member of myClass. It's annoying to have to split the full name into two parts, but at least the result is not ugly.
I'm writing a Ruby script that uses regex to find all comments of a specific format in Objective-C source code files.
The format is
/* <Headline_in_caps> <#>:
<Comment body>
**/
I want to capture the headline in caps, the number and the body of the comment.
With the regex below I can find one comment in this format within a larger body of text.
My problem is that if there is more than one comment in the file, I end up with all the text, including code, between the first /* and the last **/. I don't want it to capture everything in between, only what is within each /* and **/.
The body of the comment can include any characters except **/ and */, both of which signify the end of a comment. Am I correct in assuming that a regex can find multiple whole matches while processing the text only once?
\/\*\s*([A-Z]+). (\d)\:([\w\d\D\W]+)\*{2}\//x
Broken apart, the regex does this:
\/\* —finds the start of a comment
\s* —finds whitespace
([A-Z]+) —captures caps word
.<space> —find the space in between caps word and digit
(\d) —capture the digit
\: —find the colon
([\w\W\d\D]+) —captures the body of a message which can include all valid characters, except **/ or */
\*{2}\/ —finds the end of a comment
Here is a sample; everything from the first /* to the second **/ is captured:
/*
HEADLINE 1:
Comment body.
**/
- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
{
// This text and method declaration are captured
// The regex captures from HEADLINE to the end of the comment "meddled in." inclusively.
/*
HEADLINE 2:
Should be captured separately and without Objective-C code meddled in.
**/
}
Here is the sample on Rubular: http://rubular.com/r/4EoXXotzX0
I'm using gsub to process the regex on a string containing the whole file, running Ruby 1.9.3. Another issue is that gsub gives me what Rubular ignores. Is this a regression, or is Rubular using a different method that gives what I want?
In the question "Regex matching multiple occurrences per file and per line", the answer is to use the g (global) flag, but that flag is not valid in Ruby regexes.
Change this: ([\w\W\d\D]+)
To this: ([\w\W\d\D]+?)
This will cause the regex to be non-greedy, stopping as soon as it sees the next closing **/. (Updated rubular: http://rubular.com/r/Whm31AJ6Kg)
Also, note that [\w\W\d\D] matches absolutely any character and can be written more simply as [\w\W]. Alternatively, you could match the body with ([^*\/]+), which also avoids the above problem of matching through the close, as long as the comment body itself never contains * or /. (Updated rubular: http://rubular.com/r/2h0kGYkdVQ)
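Here is the same fix sketched in Python rather than Ruby, just to show greedy versus lazy matching on the sample above. Python's re.findall collects every non-overlapping match, which is what Ruby's String#scan does, so neither language needs a g flag (gsub, by contrast, is for substitution). The pattern is a lightly tidied version of the one in the question (\s+ between headline and number instead of ". " under /x), and the helper `extract_comments` is hypothetical.

```python
import re

# The question's pattern, tidied: "/*", headline in caps, whitespace, number, colon, body, "**/".
GREEDY = re.compile(r"/\*\s*([A-Z]+)\s+(\d+):([\w\W]+)\*{2}/")
# Same pattern with a lazy body: each match stops at the first "**/" after its "/*".
LAZY = re.compile(r"/\*\s*([A-Z]+)\s+(\d+):([\w\W]+?)\*{2}/")

SAMPLE = """/*
HEADLINE 1:
Comment body.
**/
- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
{
// This text and method declaration are captured
/*
HEADLINE 2:
Should be captured separately and without Objective-C code meddled in.
**/
}
"""


def extract_comments(text, pattern=LAZY):
    """Return (headline, number, body) for every match, like Ruby's String#scan."""
    return [(head, num, body.strip()) for head, num, body in pattern.findall(text)]


greedy = extract_comments(SAMPLE, GREEDY)  # one match swallowing the Objective-C code
lazy = extract_comments(SAMPLE)            # one match per comment
```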
A solution:
Split the whole String with '*/' (end of a comment)
If the split returns only one element, there is no comment in the String
Otherwise, for each token except the last one, use the regexp %r{/\*(.*)}m (from '/*' to the end of the token; the m flag lets . match newlines, so a multi-line comment is captured whole) to capture the commented content. You can use a more complex regexp here to capture more data from the comment.
It may not be the most beautiful solution, but it should do the job. It's also not bulletproof: if your Objective-C source contains something like the line below, this solution will fail.
char *myString = "a comment /* */";
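The split approach can be sketched in Python too, with re.DOTALL playing the role of Ruby's m flag. The function name `comments_by_split` is hypothetical, and stripping the leftover * that **/ leaves at the end of each token is an addition for this particular comment format. The last test case reproduces the failure mentioned above: a /* */ inside a string literal yields a spurious, empty "comment".

```python
import re


def comments_by_split(source):
    """Split on '*/' and take everything from the first '/*' in each piece."""
    parts = source.split("*/")
    if len(parts) == 1:
        return []  # no '*/' anywhere, so no comment
    found = []
    for token in parts[:-1]:  # the last piece is whatever follows the final '*/'
        m = re.search(r"/\*(.*)", token, re.DOTALL)  # DOTALL: '.' also matches newlines
        if m:
            # A '**/' terminator leaves one '*' at the end of the token; drop it.
            found.append(m.group(1).rstrip("*").strip())
    return found
```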
//
// These are my comments at the beginning...
//
//-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- <--- These?
SETUP;
DRAW;
END;
I did some searching but I found myself not knowing what to call them.
If such a line must be given a name, call it an ASCII Art divider.
Such lines help guide the eye when flipping through a large printout, but only take up screen space in a modern editor, where syntax colouring is more effective.
(The multi-line comment as a whole is called a block comment, with or without such dividers.)
/* Suppose I have a multi-line comment with hard line-breaks
* that are roughly uniform on the right side of the text,
* and I want to add text to a line in order to make the
* comment a bit more descriptive.
*/
Now, most unfortunately, I need to add text to one of the top lines.
/* Suppose I have a multi-line comment with hard line-breaks (here is some added text for happy fun time)
* that are roughly uniform on the right side of the text,
* and I want to add text to a line in order to make the
* comment a bit more descriptive.
*/
Fixing the lines so that they roughly line up again takes O(n) time (n being the number of lines). The computer should do this, not me.
Are there tools to deal with this in our IDEs? What are they called?
Emacs has the command fill-paragraph, which is bound to M-q (Meta-q) by default.
Output of fill-paragraph on your second example:
/* Suppose I have a multi-line comment with hard line-breaks (here is
* some added text for happy fun time) that are roughly uniform on the
* right side of the text, and I want to add text to a line in order
* to make the comment a bit more descriptive.
*/
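If you want the same effect outside an editor, for example in a script, the idea behind fill-paragraph can be approximated in a few lines of Python: strip the /*, */ and leading * decoration, re-wrap the words with textwrap, and put the decoration back. This is a rough sketch, not Emacs's actual algorithm; the function name and the default width of 72 columns are assumptions, and it treats the whole comment as a single paragraph.

```python
import textwrap


def reflow_block_comment(comment, width=72):
    """Re-wrap a /* ... */ comment so every line fits in `width` columns."""
    text = comment.strip()
    if not (text.startswith("/*") and text.endswith("*/")):
        raise ValueError("expected a /* ... */ block comment")
    words = []
    for line in text[2:-2].splitlines():
        line = line.strip()
        if line.startswith("*"):  # drop the " * " gutter on continuation lines
            line = line[1:]
        words.extend(line.split())
    # Leave room for the 3-character "/* " or " * " prefix on each line.
    wrapped = textwrap.wrap(" ".join(words), width=width - 3)
    if not wrapped:
        return "/* */"
    lines = ["/* " + wrapped[0]] + [" * " + w for w in wrapped[1:]]
    return "\n".join(lines + [" */"])
```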
Eclipse has this built in (at least, I think it's what you want). After typing a comment, press Ctrl+Shift+F and it will format either all of your code or just the section you have highlighted.
I just tested it now and it aligned all my comments for me.