Can I add scope information to tags generated with `--regex-<LANG>` in exuberant ctags? - ctags

Technically, I'm using Tagbar in vim to view a file's tags, but this question should apply generally to exuberant ctags, v5.8.
Suppose I've got the following python file, call it foo.py:
class foo:
    def bar(baz):
        print(baz)
Let's run ctags on it: ctags foo.py. The resulting tags file looks like this:
!_ some ctags version / formatting stuff not worth pasting
bar foo.py /^ def bar(baz):$/;" m class:foo
foo foo.py /^class foo:$/;" c
The bit I'm interested in is the last field of the second line, class:foo. That's the scope of the bar() function. If I use tagbar in vim, it nests the function in the class accordingly.
Now suppose I'm adding support for a new language in my ~/.ctags. In fact, I'm adding support for this puppet file:
class foo {
    include bar
}
Suppose I use the following ~/.ctags arguments. The 'import' regex is ugly (errr... ugly for regex) but it gets the job done enough for this example:
--langdef=puppet
--langmap=puppet:.pp
--regex-puppet=/^class[ \t]*([:a-zA-Z0-9_\-]+)[ \t]*/\1/c,class,classes/
--regex-puppet=/^\ \ \ \ include[ \t]*([:a-zA-Z0-9_\-]+)/\1/i,include,includes/
That generates the following tag in my tags file:
bar foo.pp /^ include bar$/;" i
foo foo.pp /^class foo {$/;" c
Notice that neither line contains scoping information. My question is this: is there any way for me to construct the --regex-puppet argument, or --regex-<LANG> lines generally, to collect information about a tag's scope? Perhaps to declare that tags meeting criterion A are always going to be scope-parents of tags meeting criterion B?
man ctags suggests no clear way to add arbitrary scope information, but I might be overlooking another solution (snipped slightly for emphasis):
--regex-<LANG>=/regexp/replacement/[kind-spec/][flags]
Unless modified by flags, regexp is interpreted as a Posix extended regular expression. The replacement should expand for all matching lines to a non-empty string of
characters, or a warning message will be reported. An optional kind specifier for tags matching regexp may follow replacement, which will determine what kind of tag is
reported in the "kind" extension field (see TAG FILE FORMAT, below). The full form of kind-spec is in the form of a single letter, a comma, a name (without spaces), a
comma, a description, followed by a separator, which specify the short and long forms of the kind value and its textual description (displayed using --list-kinds). Either
the kind name and/or the description may be omitted. If kind-spec is omitted, it defaults to "r,regex". Finally, flags are one or more single-letter characters having the
following effect upon the interpretation of regexp:
b The pattern is interpreted as a Posix basic regular expression.
e The pattern is interpreted as a Posix extended regular expression (default).
i The regular expression is to be applied in a case-insensitive manner.

No, unfortunately that is not possible with the regex pattern support in ctags. The only way to get ctags to generate correct scopes is to write a parser as an additional module in C. I would like to add support for a better handling of new languages to ctags if I find the time, but so far that hasn't worked out and I'm also still unsure about the best approach.
If you're mostly interested in Tagbar support there is another approach, though: Tagbar supports arbitrary tag-generating programs as long as their output is compatible with that of ctags, so you could write a simple parser in, say, Python and configure Tagbar to use that. Have a look at :h tagbar-extend (especially the last subsection, "Writing your own tag-generating program") if that would be an option for you.
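As a rough illustration of that second approach, here is a minimal Python sketch of a tag-generating program for the Puppet example from the question. The names generate_tags and tag_line, the brace-tracking heuristic, and the use of "class" as the scope field are assumptions made for this sketch. It only mirrors the tag line shape shown in the question and has not been checked against everything :h tagbar-extend asks for.

```python
import re

# Patterns loosely modelled on the --regex-puppet lines in the question.
CLASS_RE = re.compile(r"^class[ \t]+([:a-zA-Z0-9_-]+)[ \t]*\{")
INCLUDE_RE = re.compile(r"^[ \t]*include[ \t]+([:a-zA-Z0-9_-]+)")
CLOSE_RE = re.compile(r"^[ \t]*\}")


def tag_line(name, filename, line, kind, scope):
    """Build one tab-separated tag line; add a class: field when inside a class."""
    # Backslashes and slashes are escaped inside the /^...$/ search pattern.
    escaped = line.replace("\\", "\\\\").replace("/", "\\/")
    fields = [name, filename, f'/^{escaped}$/;"', kind]
    if scope:
        fields.append("class:" + scope[-1])
    return "\t".join(fields)


def generate_tags(source, filename):
    """Return sorted tag lines, tracking the enclosing class with a stack."""
    tags = []
    scope = []  # stack of enclosing class names
    for line in source.splitlines():
        match = CLASS_RE.match(line)
        if match:
            tags.append(tag_line(match.group(1), filename, line, "c", scope))
            scope.append(match.group(1))
            continue
        match = INCLUDE_RE.match(line)
        if match:
            tags.append(tag_line(match.group(1), filename, line, "i", scope))
            continue
        # Simplification: every line starting with "}" closes a class.
        # Real Puppet code has other braces, so this is only a heuristic.
        if CLOSE_RE.match(line) and scope:
            scope.pop()
    return sorted(tags)


if __name__ == "__main__":
    example = "class foo {\n    include bar\n}\n"
    print("\n".join(generate_tags(example, "foo.pp")))
```

For the three-line Puppet file above, the include tag carries class:foo while the class tag itself has no scope field, which is the nesting information the regex-only configuration could not produce.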

I'm working on such a feature in the Universal Ctags project:
https://github.com/universal-ctags/ctags/pull/562
(Don't expect too much; a regex parser is not enough for complicated syntax.
The new feature is meant for languages with simple syntax.)
Example 1::
$ cat /tmp/input.foo
class foo:
    def bar(baz):
        print(baz)
class goo:
    def gar(gaz):
        print(gaz)
$ cat /tmp/foo.ctags
--langdef=foo
--map-foo=+.foo
--regex-foo=/^class[[:blank:]]+([[:alpha:]]+):/\1/c,class/{scope=set}
--regex-foo=/^[[:blank:]]+def[[:blank:]]+([[:alpha:]]+).*:/\1/d,definition/{scope=ref}
$ ~/var/ctags/ctags --options=/tmp/foo.ctags -o - /tmp/input.foo
bar /tmp/input.foo /^ def bar(baz):$/;" d class:foo
foo /tmp/input.foo /^class foo:$/;" c
gar /tmp/input.foo /^ def gar(gaz):$/;" d class:goo
goo /tmp/input.foo /^class goo:$/;" c
Example 2::
$ cat /tmp/input.pp
class foo {
    include bar
}
$ cat /tmp/pp.ctags
--langdef=pp
--map-pp=+.pp
--regex-pp=/^class[[:blank:]]*([[:alnum:]]+)[[:blank:]]*\{/\1/c,class,classes/{scope=push}
--regex-pp=/^[[:blank:]]*include[[:blank:]]*([[:alnum:]]+).*/\1/i,include,includes/{scope=ref}
--regex-pp=/^[[:blank:]]*\}.*//{scope=pop}{exclusive}
$ ~/var/ctags/ctags --options=/tmp/pp.ctags -o - /tmp/input.pp
bar /tmp/input.pp /^ include bar$/;" i class:foo
foo /tmp/input.pp /^class foo {$/;" c

Related

JISON: How do I avoid "dog" being parsed as "do"?

I have the following JISON file (lite version of my actual file, but reproduces my problem):
%lex
%%
"do" return 'DO';
[a-zA-Z_][a-zA-Z0-9_]* return 'ID';
"::" return 'DOUBLECOLON'
<<EOF>> return 'ENDOFFILE';
/lex
%%
start
: ID DOUBLECOLON ID ENDOFFILE
{$$ = {type: "enumval", enum: $1, val: $3}}
;
It is for parsing something like "AnimalTypes::cat". It works fine for input like "AnimalTypes::cat", but when it sees dog instead of cat, it assumes it's a DO instead of an ID. I can see why it does that, but how do I get around it? I've been looking at other JISON documents, but can't seem to spot the difference that (I assume) makes those work.
This is the error I get:
JisonParserError: Parse error on line 1:
PetTypes::dog
----------^
Expecting "ID", "enumstr", "id", got unexpected "DO"
Repro steps:
Install jison-gho globally from npm (or modify code to use local version). I use Node v14.6.0.
Save the JISON above as minimal-repro.jison
Run: jison -m es -o ./minimal.mjs ./minimal-repro.jison to create parser
Create a file named test.mjs with code like:
import Parser from "./minimal.mjs";
Parser.parser.parse("PetTypes::dog")
Run node test.mjs
Edit: Updated with a reproducible example.
Edit2: Simpler JISON
Unlike (f)lex, the jison lexer accepts the first matching pattern, even if it is not the longest matching pattern. You can get the (f)lex behaviour by using
%option flex
However, that significantly slows down the scanner.
The original jison automatically added \b to the end of patterns which ended with a literal string matching an alphabetic character, to make it easier to match keywords without incurring this overhead. In jison-gho, this feature was turned off unless you specify
%option easy_keyword_rules
See https://github.com/zaach/jison/wiki/Deviations-From-Flex-Bison#user-content-literal-tokens.
So either of those options will achieve the behaviour you expect.
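To make the difference concrete, here is a small Python sketch of a toy lexer that can run in "first match" or "longest match" mode. It only models the behaviour described above and is not how jison is implemented; tokenize, RULES and KEYWORD_RULES are names invented for this illustration.

```python
import re

# Same rule order as the lexer section in the question.
RULES = [
    ("DO", r"do"),
    ("ID", r"[a-zA-Z_][a-zA-Z0-9_]*"),
    ("DOUBLECOLON", r"::"),
]

# Keyword rule with a trailing \b, similar in spirit to what
# easy_keyword_rules is described as adding.
KEYWORD_RULES = [("DO", r"do\b")] + RULES[1:]


def tokenize(text, rules, longest=False):
    """Split text into (token, lexeme) pairs.

    longest=False: the first rule that matches wins.
    longest=True:  the longest match wins; ties go to the earlier rule.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in rules:
            match = re.compile(pattern).match(text, pos)
            if match and match.end() > pos:
                if best is None or (longest and match.end() > best[1].end()):
                    best = (name, match)
                if not longest:
                    break
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens


if __name__ == "__main__":
    print(tokenize("PetTypes::dog", RULES))                # first match
    print(tokenize("PetTypes::dog", RULES, longest=True))  # longest match
    print(tokenize("PetTypes::dog", KEYWORD_RULES))        # \b on the keyword
```

In first-match mode "dog" is split into a DO token followed by an ID for "g", which corresponds to the unexpected "DO" in the error message. Both the longest-match mode and the \b keyword rule yield a single ID for "dog".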

What's the name of "<<-JS" operator [duplicate]

For example:
code = <<-EOH
  bundle install
  bundle exec unicorn -c /etc/unicorn.cfg -D
EOH
What does this code do? What is <<- called?
It's called a heredoc: an easy way to define multiline strings that may include single or double quotes without needing to escape them.
Heredocs are often used to define large chunks of code. Some editors know about this and can highlight the syntax inside them (if you specify the language).
There is also a newer heredoc syntax in Ruby, <<~END, which more closely resembles what you would typically see in most shells and other languages. The ~ in place of the - tells Ruby to strip leading whitespace so that it matches the least-indented line in the block.
https://infinum.co/the-capsized-eight/multiline-strings-ruby-2-3-0-the-squiggly-heredoc
Looks to me like a heredoc. The - allows the closing delimiter to be preceded by whitespace, i.e. to be indented.
A simple Google Search gave me this.

How to embed shell snippets in doxygen documentation

When installing my package, the user should at some point type
./wand-new "`cat wandcfg_install.spell`"
Or whatever the configuration file is called. If I put this line inside \code ... \endcode, doxygen thinks it is C++ or... Anyway, the word "new" is treated as a keyword. How do I avoid this in a semantically correct way?
I think \verbatim is disqualified because it actually is code, right?
(I guess the answer is to suggest that Dimitri add support for more languages inside a code block, like the LaTeX listings package, or at least add a disableparse option to code in the meantime.)
Doxygen, as of July 2017, does not officially support documenting shell/Bash scripts, not even as an extension. There is an unofficial filter called bash-doxygen. It is simple to set up: one file to download and three settings to adjust:
Edit the Doxyfile to map shell files to the C parser: EXTENSION_MAPPING = sh=C
Set your shell script file name pattern as Doxygen input, e.g.: FILE_PATTERNS = *.sh
Mention doxygen-bash.sed in either the INPUT_FILTER or the FILTER_PATTERNS directive of your Doxyfile. If doxygen-bash.sed is in your $PATH, you can invoke it as is; otherwise use sed -n -f /path/to/doxygen-bash.sed --.
Please note that since it uses C language parsing, some limitations apply, as stated in the main README page of bash-doxygen. One of them, at least in my tests, is that \code {.sh} recognises shell syntax, but every line in the code block begins with an asterisk (*), apparently as a side effect of requiring that all Doxygen doc sections have lines starting with double hashes (##).

Makefile syntax $(A,B,C)?

Consider the following code:
$ANIMAL = COW PIG CHICKEN VAMPIRE
all:
	@echo $(ANIMAL, F, >.txt)
I searched the GNU make manual for a section that mentions the above syntax, but I couldn't find anything related to it. What does it print, and how is the syntax structured?
Added: When a line starts with "@--", what does it mean?
@-- $(GEN_ENV); ...
To answer your addition: In regular Makefiles (read: POSIX, GNU, ...)
a leading '@' suppresses echoing of the command.
a leading '-' says to ignore a non-zero exit status.
Both can be combined, and repetitions are okay, so @---@@@-@---echo foo is the same as @-echo foo.
These are called "macro modifiers", and they are not a GNU make feature. Take a look at the relevant chapter of the OPUS Make tutorial. The general syntax of these modifiers is:
$(name,modifier[,modifier]...)
name is macro expanded, then each modifier is applied in succession to the elements of the expanded value.
From the list of modifiers it becomes clear that the expression builds a list of file names (it strips the path from each word in ANIMAL) with .txt appended. So, in your case it should output:
COW.txt PIG.txt CHICKEN.txt VAMPIRE.txt
PS
I looked through the reference mentioned above and don't think the first line ($ANIMAL = ) is correct, since a macro definition should not start with $.
Based on your comments it seems you are actually using OpusMake, rather than GNU make. You can find more information about it on the Opus Software, Inc. website, and also in this handy reference guide. From those sources you can see that you have an example of a macro employing macro modifiers in its expansion.
Generally speaking, $(FOO) is expanded to the unmodified value of the variable FOO, while $(FOO,mod1[,mod2[,...]]) expands to the value of FOO, modified according to the modifiers you specify. Note that you can string together any number of modifiers, and they will be applied in left-to-right order.
There's a ton of possible modifiers, but your example specifically uses two:
The F modifier, which means "use just the final path component of each pathname in the variable value"
The >str modifier, which means "append the text str to each space-separated word in the value".
Here's a quick example:
FOO=abc/def ghi/jkl
BAR=$(FOO,F)
BAZ=$(FOO,>.txt)
BOO=$(FOO,F,>.txt)
BAR will have the value def jkl (ie, just the filename portion of each path).
BAZ will have the value abc/def.txt ghi/jkl.txt (ie, append .txt to each space-separated word in the value)
BOO will have the value def.txt jkl.txt (ie, first take just the filename portion of each path, then append .txt to each)
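The left-to-right application of modifiers can be mimicked in a few lines of Python. This is only a sketch of the two modifiers described above (F and >str); apply_modifiers is a hypothetical helper written for this illustration, and real OpusMake may behave differently in edge cases.

```python
import posixpath


def apply_modifiers(value, modifiers):
    """Apply OpusMake-style modifiers, left to right, to each word of value.

    Only two modifiers are modelled here:
      "F"     keep just the final path component of each word
      ">str"  append str to each word
    """
    words = value.split()
    for modifier in modifiers:
        if modifier == "F":
            words = [posixpath.basename(word) for word in words]
        elif modifier.startswith(">"):
            suffix = modifier[1:]
            words = [word + suffix for word in words]
        else:
            raise ValueError(f"modifier not modelled in this sketch: {modifier}")
    return " ".join(words)


if __name__ == "__main__":
    foo = "abc/def ghi/jkl"
    print(apply_modifiers(foo, ["F"]))           # like $(FOO,F)
    print(apply_modifiers(foo, [">.txt"]))       # like $(FOO,>.txt)
    print(apply_modifiers(foo, ["F", ">.txt"]))  # like $(FOO,F,>.txt)
```

Because each modifier works on the result of the previous one, the third call first reduces every word to its file name and only then appends .txt, matching the BOO example.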
