Are there any standards for tmlanguage keyword types? - syntax

.tmlanguage files work by defining a list of key value pairs. Regular expressions are the keys and the type of syntax is the value. This is done in the following XML-ish manner:
<key>match</key>
<string>[0-9]</string>
<key>name</key>
<string>constant.numeric</string>
My main question is: Is there a list of values that could go in place of constant.numeric if the file is to be used by a text editor like Sublime?

For a basic introduction, check out the Language Grammars section of the TextMate Manual. The Naming Conventions section describes some of the base scopes, like comment, keyword, meta, storage, etc. These classes can then be subclassed to give as much detail as possible - for example, constant.numeric.integer.long.hexadecimal.python. However, it is very important to note that these are not hard-and-fast rules - just suggestions. This will become obvious as you scan through different language definitions and see, for example, all the different ways that functions are scoped - meta.function-call, support.function.name, meta.function-call punctuation.definition.parameters, etc.
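To make the "subclassing" idea concrete, here is a minimal Python sketch of how a plain dotted selector can be matched against a more specific scope name. It assumes matching is done on whole dot-separated components and handles nothing else; real TextMate and Sublime selectors also support descendant selectors, exclusions and more. The function name selector_matches is made up for this illustration.

```python
def selector_matches(selector: str, scope: str) -> bool:
    """Return True if `selector` is a dotted prefix of `scope`.

    Hypothetical helper for illustration only: it compares whole
    dot-separated components, so "constant.num" does not match
    "constant.numeric".
    """
    selector_parts = selector.split(".")
    scope_parts = scope.split(".")
    return scope_parts[:len(selector_parts)] == selector_parts


scope = "constant.numeric.integer.long.hexadecimal.python"
print(selector_matches("constant", scope))            # base scope
print(selector_matches("constant.numeric", scope))    # more specific
print(selector_matches("constant.character", scope))  # different branch
```

Under this assumption, a color scheme rule written for constant.numeric also applies to every deeper scope beneath it, which is why adding detail to a scope name rarely breaks existing themes.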
The best way to learn about scopes is to examine existing .tmLanguage files, and to look through the source of different languages and see what scopes are assigned where. The XML format is very difficult to casually browse through, so I use the excellent PackageDev plugin to translate the XML to YAML. It is then much easier to scan and see which scopes are described by which regexes.
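If you only want a quick overview of a grammar, a few lines of Python can also flatten it into a readable list. The sketch below assumes the grammar is an XML property list with a top-level patterns array whose entries carry match and name keys, as in the snippet from the question. The sample grammar and the function name list_scopes are invented for the example, and nested constructs such as begin/end rules or repository includes are ignored.

```python
import plistlib

# A tiny, made-up grammar in the XML property-list format.
SAMPLE_GRAMMAR = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>scopeName</key>
    <string>source.example</string>
    <key>patterns</key>
    <array>
        <dict>
            <key>match</key>
            <string>[0-9]+</string>
            <key>name</key>
            <string>constant.numeric</string>
        </dict>
        <dict>
            <key>match</key>
            <string>#.*$</string>
            <key>name</key>
            <string>comment.line.number-sign</string>
        </dict>
    </array>
</dict>
</plist>
"""


def list_scopes(plist_bytes: bytes) -> list[tuple[str, str]]:
    """Return (scope name, regex) pairs for the simple top-level rules."""
    grammar = plistlib.loads(plist_bytes)
    return [
        (rule["name"], rule["match"])
        for rule in grammar.get("patterns", [])
        if "match" in rule and "name" in rule  # skip begin/end and include rules
    ]


for name, regex in list_scopes(SAMPLE_GRAMMAR):
    print(f"{name:30} {regex}")
```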
Another way to learn is to see how different language constructs are scoped, and for that I highly recommend using ScopeAlways. Once installed and activated, just place your cursor and the scope(s) that apply to that particular position are shown in the status bar. This is particularly useful when designing color schemes, as you can easily see which selectors will highlight a language feature of interest.
If you're interested, the color scheme used here is Neon, which I designed to make as many languages as possible look as good as possible, covering as many scopes as possible. Feel free to look through it to see how the different language elements are highlighted; this could also help you in designing your .tmLanguage to be consistent with other languages.
I hope all this helps, good luck!

Yes. The .tmlanguage format was originally used by TextMate. The TextMate manual provides full documentation for the format, including the possible types of language constructs.
Copied from the relevant docs page, in hierarchical format:
comment — for comments.
    line — line comments, we specialize further so that the type of comment start character(s) can be extracted from the scope
        double-slash — // comment
        double-dash — -- comment
        number-sign — # comment
        percentage — % comment
        character — other types of line comments.
    block — multi-line comments like /* … */ and <!-- … -->.
        documentation — embedded documentation.
constant — various forms of constants.
    numeric — those which represent numbers, e.g. 42, 1.3f, 0x4AB1U.
    character — those which represent characters, e.g. <, \e, \031.
        escape — escape sequences like \e would be constant.character.escape.
    language — constants (generally) provided by the language which are “special” like true, false, nil, YES, NO, etc.
    other — other constants, e.g. colors in CSS.
entity — an entity refers to a larger part of the document, for example a chapter, class, function, or tag. We do not scope the entire entity as entity.* (we use meta.* for that). But we do use entity.* for the “placeholders” in the larger entity, e.g. if the entity is a chapter, we would use entity.name.section for the chapter title.
    name — we are naming the larger entity.
        function — the name of a function.
        type — the name of a type declaration or class.
        tag — a tag name.
        section — the name is the name of a section/heading.
    other — other entities.
        inherited-class — the superclass/baseclass name.
        attribute-name — the name of an attribute (mainly in tags).
invalid — stuff which is “invalid”.
    illegal — illegal, e.g. an ampersand or lower-than character in HTML (which is not part of an entity/tag).
    deprecated — for deprecated stuff e.g. using an API function which is deprecated or using styling with strict HTML.
keyword — keywords (when these do not fall into the other groups).
    control — mainly related to flow control like continue, while, return, etc.
    operator — operators can either be textual (e.g. or) or be characters.
    other — other keywords.
markup — this is for markup languages and generally applies to larger subsets of the text.
    underline — underlined text.
        link — this is for links, as a convenience this is derived from markup.underline so that if there is no theme rule which specifically targets markup.underline.link then it will inherit the underline style.
    bold — bold text (text which is strong and similar should preferably be derived from this name).
    heading — a section header. Optionally provide the heading level as the next element, for example markup.heading.2.html for <h2>…</h2> in HTML.
    italic — italic text (text which is emphasized and similar should preferably be derived from this name).
    list — list items.
        numbered — numbered list items.
        unnumbered — unnumbered list items.
    quote — quoted (sometimes block quoted) text.
    raw — text which is verbatim, e.g. code listings. Normally spell checking is disabled for markup.raw.
    other — other markup constructs.
meta — the meta scope is generally used to markup larger parts of the document. For example the entire line which declares a function would be meta.function and the subsets would be storage.type, entity.name.function, variable.parameter etc. and only the latter would be styled. Sometimes the meta part of the scope will be used only to limit the more general element that is styled, most of the time meta scopes are however used in scope selectors for activation of bundle items. For example in Objective-C there is a meta scope for the interface declaration of a class and the implementation, allowing the same tab-triggers to expand differently, depending on context.
storage — things relating to “storage”.
    type — the type of something, class, function, int, var, etc.
    modifier — a storage modifier like static, final, abstract, etc.
string — strings.
    quoted — quoted strings.
        single — single quoted strings: 'foo'.
        double — double quoted strings: "foo".
        triple — triple quoted strings: """Python""".
        other — other types of quoting: $'shell', %s{...}.
    unquoted — for things like here-docs and here-strings.
    interpolated — strings which are “evaluated”: `date`, $(pwd).
    regexp — regular expressions: /(\w+)/.
    other — other types of strings (should rarely be used).
support — things provided by a framework or library should be below support.
    function — functions provided by the framework/library. For example NSLog in Objective-C is support.function.
    class — when the framework/library provides classes.
    type — types provided by the framework/library, this is probably only used for languages derived from C, which has typedef (and struct). Most other languages would introduce new types as classes.
    constant — constants (magic values) provided by the framework/library.
    variable — variables provided by the framework/library. For example NSApp in AppKit.
    other — the above should be exhaustive, but for everything else use support.other.
variable — variables. Not all languages allow easy identification (and thus markup) of these.
    parameter — when the variable is declared as the parameter.
    language — reserved language variables like this, super, self, etc.
    other — other variables, like $some_variables.

Related

What is the name of the variable naming convention used in VBS that includes the data type in the name? [duplicate]

This question already has an answer here:
What is Hungarian Notation? [duplicate]
I have recently started working in a system that includes the data type in the fieldnames for every record. I'm writing up the documentation for this system (in particular the coding conventions), and as a history lesson, I wanted to include a reference to this style of naming convention.
In the past, I know it was very standard to use names like
dim strName
dim intAge
dim fltIncome
to help keep track of data types in dynamically typed languages (VBS in the case above). I also know that this convention was named after someone who wrote a lengthy description of why it is a good idea.
Does anyone know the name of this convention, or have good references they could share?
COM doesn't use Hungarian Notation at all. The Windows API does, and it's useful enough. The part below that rules out Hungarian Notation in COM is the sentence beginning "Do not use names like HWND".
This is from https://learn.microsoft.com/ms-my/previous-versions/windows/desktop/automat/naming-conventions
Choose names for exposed objects, properties, and methods that can be
easily understood by users of the application. The guidelines in this
section apply to all of the following exposed items:
Objects — implemented as classes in an application.
Properties and methods — implemented as members of a class.
Named arguments — implemented as named parameters in a member function.
Constants and enumerations — implemented as settings for properties and methods.
Use Entire Words or Syllables
It is easier for users to remember complete words than to remember
whether you abbreviated Window as Wind, Wn, or Wnd.
When you need to abbreviate because an identifier would be too long,
try to use complete initial syllables. For example, use AltExpEval
instead of AlternateExpressionEvaluation.
Use            Don't use
Application    App
Window         Wnd
Use Mixed Case
All identifiers should use mixed case, rather than underscores, to
separate words.
Use              Don't use
ShortcutMenus    Shortcut_Menus, Shortcutmenus, SHORTCUTMENUS, SHORTCUT_MENUS
BasedOn          basedOn
Use the Same Word Used in the Interface
Use consistent terminology. Do not use names like HWND that are
based on Hungarian notation. Try to use the same word users would
use to describe a concept.
Use     Don't use
Name    Lbl
Use the Correct Plural for the Class Name
Collection classes should use the correct plural for the class name.
For example, if you have a class named Axis, store the collection of
Axis objects in an Axes class. Similarly, a collection of Vertex
objects should be stored in a Vertices class. In cases where English
uses the same word for the plural, append the word "Collection."
Use                 Don't use
Axes                Axiss
SeriesCollection    CollectionSeries
Windows             ColWindow
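As a rough illustration of the pluralization rule quoted above, the naming decision can be written as a tiny Python function. This is only a sketch: the function name collection_class_name is invented, and the caller is assumed to supply the correct English plural, since the guideline itself does not say how to derive it.

```python
def collection_class_name(singular: str, plural: str) -> str:
    """Pick a collection class name following the quoted guideline.

    Hypothetical helper: `plural` must be supplied by the caller.
    """
    if plural == singular:
        # English uses the same word for both forms, so append "Collection".
        return singular + "Collection"
    return plural


print(collection_class_name("Axis", "Axes"))
print(collection_class_name("Vertex", "Vertices"))
print(collection_class_name("Series", "Series"))
```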

In Lisp/Racket/Scheme how is it possible to have an argument named `list`?

Isn't list a keyword for creating a new list in Lisp? Yet it is possible to have an argument called list. I thought keywords in most programming languages, such as Java or C++, cannot be used as argument names. Is there a special reason why Lisp allows it?
The name list isn't a reserved keyword, it's an ordinary function. Reusing the name for another purpose can be confusing for the reader but doesn't present any problems for the language itself; it's the same as having two variables called x in different parts of the program.
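The same distinction between a reserved keyword and an ordinary predefined name exists outside Lisp. Python is not a Lisp, but it happens to have a built-in function called list that is not a keyword, so it makes a convenient analogy. The helper names below (total_length, try_compile) are made up for this sketch.

```python
import keyword


def total_length(list):
    # The parameter shadows the built-in `list` inside this function only.
    return len(list)


def try_compile(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


print(keyword.iskeyword("list"))                 # an ordinary name
print(keyword.iskeyword("if"))                   # a reserved keyword
print(try_compile("def f(list): return list"))   # accepted by the parser
print(try_compile("def f(if): return if"))       # rejected by the parser
print(total_length([1, 2, 3]))
```

Outside total_length the name list still refers to the built-in function, just as two variables called x in different parts of a program do not interfere with each other.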
Mainstream Lisp descendants and derivatives like Common Lisp and Scheme do not incorporate the concept of reserved keywords. It is alien to the way Lisp works.
When Lisp read syntax is scanned, identifier tokens which appear in it are converted into corresponding symbol objects. These tokens are all in the same lexical category: symbol.
When Lisp read syntax is scanned and turned into an object, such as a nested list representing program code, this is done without regard for the semantics (what the symbols mean).
This is different from the parsing of languages (such as some of those in the broad Fortran/Algol family) which have reserved keywords.
Roughly speaking, reserved keywords are tokens which look like symbols but are actually just punctuation. Lisp has punctuation also, like parentheses, sharpsign prefixes, various quotes and such.
These punctuation words have a fixed role in the phrase structure grammar, and the phrase structure grammar must be processed before the semantics of the program can be considered.
So for instance, the reserved BEGIN and END keywords in Pascal are essentially nothing more than verbose parentheses. The '(' and ')' tokens are similarly reserved in Lisp-like languages. Trying to use BEGIN as the name of a function or variable in Pascal is similar to trying to use ( as the name of a function or variable in Lisp.
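The point about the reader can be sketched with a toy s-expression reader in Python. It is a deliberately minimal illustration, not how any real Lisp implements read: symbols are represented as plain Python strings, and numbers, strings, quotes and error recovery are all ignored. Only the parentheses play a structural role; define, list and + all come out as the same kind of object.

```python
def tokenize(source: str) -> list[str]:
    """Split source text into parentheses and symbol tokens."""
    return source.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens: list[str]):
    """Consume tokens and build a nested list, ignoring what symbols mean."""
    token = tokens.pop(0)
    if token == "(":
        form = []
        while tokens[0] != ")":
            form.append(read(tokens))
        tokens.pop(0)  # discard the closing ")"
        return form
    if token == ")":
        raise SyntaxError("unexpected )")
    return token  # every other token is just a symbol name


code = "(define (total list) (apply + list))"
print(read(tokenize(code)))
```

Whether list names a function or a parameter is decided later, when the nested list is evaluated, so the reader has nothing to reserve.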
Some languages have keywords which determine phrase structure, yet allow identifiers which look exactly like reserved keywords to be used anyway. For instance, PL/I was famous for this:
IF IF=THEN THEN THEN=ELSE; ELSE ELSE=IF
Lisp dialects may assign special semantic treatment to certain symbols or certain categories of symbols. This is a sort of reservation, but not exactly the same as reserved keywords, because it is at the semantic level. For instance, in Common Lisp, the symbols nil and t (more specifically the nil and t in the common-lisp package, common-lisp:nil and common-lisp:t) may not be used as function or variable names. When either one appears as an expression, it evaluates to itself: the value of t is t and that of nil is nil. Moreover, nil is also the Boolean false value and the empty list. So, effectively, these symbols are reserved in some regards. Common Lisp also has a keyword package. All symbols in that package evaluate to themselves and may not be used as variables. They may be used as function names, and for any other purpose.
You say Lisp, but the answer changes depending on which Lisp you're talking about.
In Common Lisp, you can use list as a variable because Common Lisp is a Lisp-2, meaning that each symbol has a separate slot for a function binding and a variable binding. Common Lisp sets the function binding for the symbol list in the CL package, but doesn't set the variable binding. You can't change the function binding because Common Lisp doesn't allow you to redefine bindings for symbols that are set in the CL package (you can, of course, use whatever symbols you like in your own packages), but since the variable binding is free you're allowed to use it.
Scheme is a Lisp-1, which means that it only has one binding per symbol. There's no separation of function bindings and variable bindings (hence why you use define in Scheme, but defun and defvar in CL). The reason you can use "list" as a variable is because Scheme doesn't prevent you from rebinding its built-in symbols. It's just generally a bad idea, since by redefining list you can no longer call the list function.
Emacs Lisp is a Lisp-2 but doesn't prevent you from rebinding symbols, which means you can do things like (defun + (a b) (- a b)) and totally screw up your editing session. So... don't do that, unless you really know what you're doing.
Clojure is a Lisp-1. I don't have a working Clojure install at the moment so I can't comment on what it lets you do. I would suspect it's more strict than Scheme.

Why do Julia programmers need to prefix macros with the at-sign?

Whenever I see a Julia macro in use like @assert or @time I'm always wondering about the need to distinguish a macro syntactically with the @ prefix. What should I be thinking of when using @ for a macro? For me it adds noise and distraction to an otherwise very nice language (syntactically speaking).
I mean, for me '@' has a meaning of reference, i.e. a location like a domain or address. In the location sense @ does not have a meaning for macros other than that it is a different compilation step.
The @ should be seen as a warning sign which indicates that the normal rules of the language might not apply. E.g., a function call
f(x)
will never modify the value of the variable x in the calling context, but a macro invocation
@mymacro x
(or @mymacro f(x) for that matter) very well might.
Another reason is that macros in Julia are not based on textual substitution as in C, but substitution in the abstract syntax tree (which is much more powerful and avoids the unexpected consequences that textual substitution macros are notorious for).
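The difference between textual substitution and substitution in the syntax tree can be sketched in Python with the standard ast module. This is an analogy, not Julia's macro system: the two "macros" below both expand a hypothetical SQUARE(X) into X * X, once by pasting text as a C preprocessor would and once by splicing the parsed argument into a parsed template. The function and class names are invented for the example, and ast.unparse requires Python 3.9 or later.

```python
import ast


def square_textual(arg_src: str) -> str:
    # C-preprocessor style: paste the argument text into the template.
    return "X * X".replace("X", arg_src)


class _Substitute(ast.NodeTransformer):
    """Replace every Name node called `name` with a given expression node."""

    def __init__(self, name: str, replacement: ast.expr):
        self.name = name
        self.replacement = replacement

    def visit_Name(self, node: ast.Name):
        return self.replacement if node.id == self.name else node


def square_ast(arg_src: str) -> str:
    # Tree style: parse the template and the argument, then splice the trees.
    template = ast.parse("X * X", mode="eval")
    argument = ast.parse(arg_src, mode="eval").body
    expanded = _Substitute("X", argument).visit(template)
    return ast.unparse(expanded)


print(square_textual("1 + 2"), "=", eval(square_textual("1 + 2")))
print(square_ast("1 + 2"), "=", eval(square_ast("1 + 2")))
```

The textual version silently changes the grouping of the expression, while the tree version keeps the argument intact as a single operand.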
Macros have special syntax in Julia, and since they are expanded after parse time, the parser also needs an unambiguous way to recognise them
(without knowing which macros have been defined in the current scope).
ASCII characters are a precious resource in the design of most programming languages, Julia very much included. I would guess that the choice of @ mostly comes down to the fact that it was not needed for something more important, and that it stands out pretty well.
Symbols always need to be interpreted within the context they are used. Having multiple meanings for symbols, across contexts, is not new and will probably never go away. For example, no one should expect #include in a C program to go viral on Twitter.
Julia's Documentation entry Hold up: why macros? explains pretty well some of the things you might keep in mind while writing and/or using macros.
Here are a few snippets:
Macros are necessary because they execute when code is parsed,
therefore, macros allow the programmer to generate and include
fragments of customized code before the full program is run.
...
It is important to emphasize that macros receive their arguments as
expressions, literals, or symbols.
So, if a macro is called with an expression, it gets the whole expression, not just the result.
...
In place of the written syntax, the macro call is expanded at parse
time to its returned result.
It actually fits quite nicely with the semantics of the @ symbol on its own.
If we look up the Wikipedia entry for 'At symbol' we find that it is often used as a replacement for the preposition 'at' (yes it even reads 'at'). And the preposition 'at' is used to express a spatial or temporal relation.
Because of that we can use the @ symbol as an abbreviation for the preposition 'at' to refer to a spatial relation, i.e. a location like @tony's bar or @france, a memory location such as @0x50FA2C (e.g. for pointers/addresses), or the receiver of a message (@user0851, as Twitter and other forums use it), but just as well to a temporal relation, i.e. @05:00 am, @midnight, @compile_time or @parse_time.
Macros are processed at parse time (there you have it), and this is totally distinct from the other code, which is evaluated at run time (yes, there are many different phases in between, but that's not the point here).
So, to explicitly direct the programmer's attention to the fact that the following code fragment is processed at parse time, as opposed to run time, we use @.
For me this explanation fits nicely in the language.
thanks@all ;)

What is the meaning of the "@" prefix on some D attributes?

The D Programming Language has at least two attributes prefixed with the "@" symbol:
@disable
@property
What sort of meaning is "@" supposed to convey? I can't seem to locate anything relevant in the documentation.
Also, why is __gshared the only attribute with two leading underscores?
It has no meaning.
Yes, that probably wasn't what you were hoping to hear -- but that's what they've said in the newsgroups.
The @ doesn't really mean anything at this point. All of the @x words are function attributes. The @ was tacked on pretty much just to save keywords. So, in general, newer attributes have @ on them and older ones don't (though there was some shuffling around of that a while back where there was some debate over whether some of the attributes should have @ or not). If they were redone from scratch without caring what other languages have done, then you might have gotten @ on all of the function attributes, but there was no way that stuff like @public was going to happen, since it would have just made porting code harder for no real benefit. The end result is that what got @ and what didn't is fairly arbitrary. You just have to remember which attributes start with @ and which don't, but that's not all that much different from having to learn new keywords. It's just that these are prefixed with @ so that they aren't actually keywords and don't reduce the number of legal identifiers in the language.
Now, there's definitely a desire among many in the D community to use @ for custom attributes in the future, in which case, @ would indicate a custom attribute in the cases where the name used wasn't one built into the language, but for all of the ones built into the language, it pretty much just amounts to saving a keyword.
As Mehrdad shows (see the links in the comments), there's no special meaning to "@", they are how they are just for historical reasons.
As for your other question, __gshared isn't the only keyword with two underscores; there are also __thread and __traits. This naming convention is commonly used to denote internal data structures, which need to be exposed for practical reasons but are not "safe" to use in all cases (i.e. more a hack than a well-established feature). I'm not sure whether or not the D language follows this convention, but judging by this quote from the docs I believe that's the case:
__gshared is disallowed in safe mode.
I'm searching for more info about __thread and __traits (which indeed are not attributes), but so far could find very little.

What's a good way to make a type a plural when writing comments?

When writing comments, I sometimes find myself needing to talk about a type (class, struct, etc.) in the plural, such as:
/*
* getThings
* Get a list of --> Things <-- from somewhere.
*/
Thing *getThings(void);
The problem is, the type name is singular (namely, Thing), but I want to talk about them in plural in comments.
If I say Things, it suggests to the reader that I'm talking about a type called Things, which is not the case. If I say Thing's, it looks awkward because it's not grammatically correct (it's either possessive or "Thing is", not plural). I could talk around the problem and say "a list of Thing items".
What's a good convention to stick to when writing plurals of types?
Well, depending on the documentation system you're using, you can wrap the name of the type in a special syntax and put the s outside it. For example:
.NET XML comments
Get a list of <see cref="Thing"/>s from somewhere.
doxygen C/C++ comments
Get a list of \link Thing \endlink s from somewhere.
Not 100% certain on the doxygen variant but it should be something like that.
And if you're not using a particular documentation system and thus have no special comments, I'd do something like:
Get a list of [Thing]s from somewhere.
Or you could use ( ) or { }, depending on preference...
I would use the 's' in parentheses.
/* Get a list of Thing(s) from somewhere */
