How can I add semantic highlighting to my Visual Studio language service? - visual-studio

I'm writing a language service with MPF, and I already have basic syntax highlighting working, but I'd like to also add semantic highlighting.
C# does this for type names, for example. The color of an identifier is different when it's naming a type; even the same word in the same statement may be highlighted differently based on context.
The language I'm supporting has very complex rules for contextual keywords, so I'd like to rely on something higher-level than a tokenizer to distinguish between identifiers and keywords. Right now my scanner is just marking every possible keyword as a keyword, even though they may be identifiers in context.
How can I achieve this? Is there any example source code from another language service that does this?

Related

When to use an inline codeSystem in a FHIR ValueSet

When creating a FHIR ValueSet using ALL codes of a small externally defined code list, which would be more appropriate (and indeed correct per the FHIR specification) - a composition or an inline codeSystem?
As an example, creating a ValueSet from the following code list:
http://www.datadictionary.nhs.uk/data_dictionary/attributes/e/end/ethnic_category_code_de.asp
Would there be advantages/disadvantages of using either method?
Inline definition of a code system is used when the code system and value set are synonymous - you're inventing codes and saying the value set contains all of them. Places this occurs are when we're defining structural codes for FHIR (ones that we'll be maintaining rather than an external organization) or for things like Questionnaires where the codes might be specific to that particular questionnaire. In general, inventing your own code system isn't encouraged because it's less likely people will recognize it. It's better to draw codes from standardized code systems, be those international (like SNOMED, LOINC, ICD9, etc.), national, or even organization-maintained code systems.
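To make the two options concrete, here is a rough sketch of both shapes built as Python dictionaries. It assumes the DSTU2-era ValueSet layout, where an inline code system sits under codeSystem and a composition sits under compose.include. Both system URLs and the two sample codes are placeholders invented for illustration, not identifiers published by the NHS or HL7.

```python
import json

# Hypothetical identifiers, used only for illustration.
EXTERNAL_SYSTEM = "http://example.org/fhir/ethnic-category"
INLINE_SYSTEM = "http://example.org/fhir/my-questionnaire-codes"


def value_set_by_composition(system_url):
    """Value set that draws every code from an externally defined code system."""
    return {
        "resourceType": "ValueSet",
        "status": "draft",
        # An include that names only a system (no concept list, no filter)
        # selects all codes of that system.
        "compose": {"include": [{"system": system_url}]},
    }


def value_set_with_inline_codes(system_url, concepts):
    """Value set that defines its own codes, so value set and code system coincide."""
    return {
        "resourceType": "ValueSet",
        "status": "draft",
        "codeSystem": {
            "system": system_url,
            "concept": [{"code": c, "display": d} for c, d in concepts],
        },
    }


composed = value_set_by_composition(EXTERNAL_SYSTEM)
inline = value_set_with_inline_codes(
    INLINE_SYSTEM, [("X1", "Placeholder one"), ("X2", "Placeholder two")]
)
print(json.dumps(composed, indent=2))
print(json.dumps(inline, indent=2))
```

For an externally maintained list like the one in the question, the composition form is the one that matches the advice above, since the codes are not being invented by the value set author.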

Are there any standards for tmlanguage keyword types?

.tmlanguage files work by defining a list of key value pairs. Regular expressions are the keys and the type of syntax is the value. This is done in the following XML-ish manner:
<key>match</key>
<string>[0-9]</string>
<key>name</key>
<string>constant.numeric</string>
My main question is: Is there a list of values that could go in place of constant.numeric if the file is to be used by a text editor like Sublime?
For a basic introduction, check out the Language Grammars section of the TextMate Manual. The Naming Conventions section describes some of the base scopes, like comment, keyword, meta, storage, etc. These classes can then be subclassed to give as much detail as possible - for example, constant.numeric.integer.long.hexadecimal.python. However, it is very important to note that these are not hard-and-fast rules - just suggestions. This will become obvious as you scan through different language definitions and see, for example, all the different ways that functions are scoped - meta.function-call, support.function.name, meta.function-call punctuation.definition.parameters, etc.
The best way to learn about scopes is to examine existing .tmLanguage files, and to look through the source of different languages and see what scopes are assigned where. The XML format is very difficult to casually browse through, so I use the excellent PackageDev plugin to translate the XML to YAML. It is then much easier to scan and see what scopes are described by what regexes.
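If you just want a quick listing of which regex maps to which scope, a few lines of Python can do a rough version of that scan, since a .tmLanguage file is an XML property list. This is only a sketch: the grammar below is a made-up two-rule fragment, and list_scopes is a hypothetical helper that looks at top-level match rules only (real grammars also nest rules under begin/end patterns and a repository).

```python
import plistlib

# A tiny grammar in the same XML property-list format as a .tmLanguage file.
# The two rules are made up for illustration.
GRAMMAR_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>scopeName</key>
    <string>source.example</string>
    <key>patterns</key>
    <array>
        <dict>
            <key>match</key>
            <string>[0-9]+</string>
            <key>name</key>
            <string>constant.numeric</string>
        </dict>
        <dict>
            <key>match</key>
            <string>#.*$</string>
            <key>name</key>
            <string>comment.line.number-sign</string>
        </dict>
    </array>
</dict>
</plist>
"""


def list_scopes(grammar):
    """Return (scope name, regex) pairs for the top-level match rules."""
    return [
        (rule["name"], rule["match"])
        for rule in grammar.get("patterns", [])
        if "name" in rule and "match" in rule
    ]


grammar = plistlib.loads(GRAMMAR_XML)
for scope, regex in list_scopes(grammar):
    print(f"{scope:30} {regex}")
```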
Another way to learn is to see how different language constructs are scoped, and for that I highly recommend using ScopeAlways. Once installed and activated, just place your cursor and the scope(s) that apply to that particular position are shown in the status bar. This is particularly useful when designing color schemes, as you can easily see which selectors will highlight a language feature of interest.
If you're interested, the color scheme used here is Neon, which I designed to make as many languages as possible look as good as possible, covering as many scopes as possible. Feel free to look through it to see how the different language elements are highlighted; this could also help you in designing your .tmLanguage to be consistent with other languages.
I hope all this helps, good luck!
Yes. The .tmlanguage format was originally used by TextMate. The TextMate manual provides full documentation for the format, including the possible types of language constructs.
Copied from the relevant docs page, in hierarchical format:
comment — for comments.
    line — line comments, we specialize further so that the type of comment start character(s) can be extracted from the scope.
        double-slash — // comment
        double-dash — -- comment
        number-sign — # comment
        percentage — % comment
        character — other types of line comments.
    block — multi-line comments like /* … */ and <!-- … -->.
        documentation — embedded documentation.
constant — various forms of constants.
    numeric — those which represent numbers, e.g. 42, 1.3f, 0x4AB1U.
    character — those which represent characters, e.g. <, \e, \031.
        escape — escape sequences like \e would be constant.character.escape.
    language — constants (generally) provided by the language which are “special” like true, false, nil, YES, NO, etc.
    other — other constants, e.g. colors in CSS.
entity — an entity refers to a larger part of the document, for example a chapter, class, function, or tag. We do not scope the entire entity as entity.* (we use meta.* for that). But we do use entity.* for the “placeholders” in the larger entity, e.g. if the entity is a chapter, we would use entity.name.section for the chapter title.
    name — we are naming the larger entity.
        function — the name of a function.
        type — the name of a type declaration or class.
        tag — a tag name.
        section — the name is the name of a section/heading.
    other — other entities.
        inherited-class — the superclass/baseclass name.
        attribute-name — the name of an attribute (mainly in tags).
invalid — stuff which is “invalid”.
    illegal — illegal, e.g. an ampersand or lower-than character in HTML (which is not part of an entity/tag).
    deprecated — for deprecated stuff e.g. using an API function which is deprecated or using styling with strict HTML.
keyword — keywords (when these do not fall into the other groups).
    control — mainly related to flow control like continue, while, return, etc.
    operator — operators can either be textual (e.g. or) or be characters.
    other — other keywords.
markup — this is for markup languages and generally applies to larger subsets of the text.
    underline — underlined text.
        link — this is for links, as a convenience this is derived from markup.underline so that if there is no theme rule which specifically targets markup.underline.link then it will inherit the underline style.
    bold — bold text (text which is strong and similar should preferably be derived from this name).
    heading — a section header. Optionally provide the heading level as the next element, for example markup.heading.2.html for <h2>…</h2> in HTML.
    italic — italic text (text which is emphasized and similar should preferably be derived from this name).
    list — list items.
        numbered — numbered list items.
        unnumbered — unnumbered list items.
    quote — quoted (sometimes block quoted) text.
    raw — text which is verbatim, e.g. code listings. Normally spell checking is disabled for markup.raw.
    other — other markup constructs.
meta — the meta scope is generally used to markup larger parts of the document. For example the entire line which declares a function would be meta.function and the subsets would be storage.type, entity.name.function, variable.parameter etc. and only the latter would be styled. Sometimes the meta part of the scope will be used only to limit the more general element that is styled, most of the time meta scopes are however used in scope selectors for activation of bundle items. For example in Objective-C there is a meta scope for the interface declaration of a class and the implementation, allowing the same tab-triggers to expand differently, depending on context.
storage — things relating to “storage”.
    type — the type of something, class, function, int, var, etc.
    modifier — a storage modifier like static, final, abstract, etc.
string — strings.
    quoted — quoted strings.
        single — single quoted strings: 'foo'.
        double — double quoted strings: "foo".
        triple — triple quoted strings: """Python""".
        other — other types of quoting: $'shell', %s{...}.
    unquoted — for things like here-docs and here-strings.
    interpolated — strings which are “evaluated”: `date`, $(pwd).
    regexp — regular expressions: /(\w+)/.
    other — other types of strings (should rarely be used).
support — things provided by a framework or library should be below support.
    function — functions provided by the framework/library. For example NSLog in Objective-C is support.function.
    class — when the framework/library provides classes.
    type — types provided by the framework/library, this is probably only used for languages derived from C, which has typedef (and struct). Most other languages would introduce new types as classes.
    constant — constants (magic values) provided by the framework/library.
    variable — variables provided by the framework/library. For example NSApp in AppKit.
    other — the above should be exhaustive, but for everything else use support.other.
variable — variables. Not all languages allow easy identification (and thus markup) of these.
    parameter — when the variable is declared as the parameter.
    language — reserved language variables like this, super, self, etc.
    other — other variables, like $some_variables.
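The reason the dotted hierarchy matters is that a theme rule written for a parent scope also applies to everything below it, which is the inheritance the markup.underline.link entry describes. Here is a minimal sketch of that lookup, assuming a theme is just a dictionary from a single-scope selector to a style. The THEME contents are hypothetical, and real scope selectors also support descendant, grouped, and excluding forms that this ignores.

```python
def selector_matches(selector, scope):
    """True if scope equals selector or extends it at a dot boundary."""
    return scope == selector or scope.startswith(selector + ".")


def resolve_style(theme, scope):
    """Pick the style of the most specific (longest) matching selector."""
    candidates = [s for s in theme if selector_matches(s, scope)]
    if not candidates:
        return None
    return theme[max(candidates, key=len)]


# Hypothetical theme: selectors mapped to style descriptions.
THEME = {
    "markup.underline": "underline",
    "constant": "orange",
    "constant.numeric": "purple",
}

print(resolve_style(THEME, "markup.underline.link"))
print(resolve_style(THEME, "constant.numeric.integer.long.hexadecimal.python"))
print(resolve_style(THEME, "constant.language"))
print(resolve_style(THEME, "keyword.control"))
```

So a grammar that sticks to the conventional top-level names gets sensible colors from most themes, even if its more detailed sub-scopes are ones no theme has ever heard of.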

Writing ontologies in DL syntax?

I just discovered OWL and Protege. Upon reading through this reference page (which I quote below), I am left wondering whether it is possible to not use the abstract OWL syntax, and rather to write in DL syntax. My background is in logic, so it sounds like it would be more fun even if I would have to translate the ontologies later (though I am sure there must be applications to do this--besides, don't reasoners use DL?).
If it is possible, what configuration of settings should I use in Protege (or other software of your suggestion) in order to do this? I suspect it's not possible, but I want to be sure, as I see no good reason for this other than the awkwardness of special symbols.
EDIT: If it is NOT possible, how exactly are DL languages used?
OWL DL is the description logic SHOIN with support of data values, data types
and datatype properties, i.e., SHOIN(D), but since OWL is based on RDF(S), the
terminology slightly differs. ... For description of OWL ontology or knowledge
base, the DL syntax can be used. There is an "abstract" LISP-like syntax
defined that is easier to write in ASCII character set.
Here's a very brief working example of the two syntax styles for the same data.
don't reasoners use DL?
Not necessarily. They use all kinds of logics, some of which are DLs, some are not.
If it is possible, what configuration of settings should I use in Protege (or other software of your suggestion) in order to do this?
I'm pretty sure there is no such plugin for Protégé. But if you really want some fun, use a text editor and write your ontology by hand. There are many syntaxes you can use: the functional syntax, the OWL/XML syntax, the RDF/XML syntax are all normative. In addition, you can use the Manchester syntax, Turtle, N-Triples, JSON-LD, that will be future recommendations for writing RDF (and therefore OWL). Or the more exotic RDF/JSON, HDT. Or again, more "powerful" syntaxes like Notation3, TriG, TriX, NQuads. Plenty of fun!
In any case, if you would like to write in the DL syntax, you would need to use special Unicode characters or special commands like in LaTeX for instance. And the parser that deals with it would have to read those characters or commands. Not ideal if you are programming. But you can always use the DL syntax in your writings.
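To see how close the two notations are, here is a small sketch that prints one axiom both ways from the same in-memory structure. The four classes (Named, Some, And, SubClassOf) and the example axiom are hypothetical and cover only a sliver of OWL; the point is just that the DL rendering needs the special symbols while the functional-style rendering stays in ASCII.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Named:
    name: str


@dataclass(frozen=True)
class Some:
    """Existential restriction: some value of `prop` belongs to `filler`."""
    prop: str
    filler: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class SubClassOf:
    sub: object
    sup: object


def to_dl(e):
    """Render in description-logic notation (needs Unicode symbols)."""
    if isinstance(e, Named):
        return e.name
    if isinstance(e, Some):
        return f"∃{e.prop}.{to_dl(e.filler)}"
    if isinstance(e, And):
        return f"({to_dl(e.left)} ⊓ {to_dl(e.right)})"
    if isinstance(e, SubClassOf):
        return f"{to_dl(e.sub)} ⊑ {to_dl(e.sup)}"
    raise TypeError(e)


def to_functional(e):
    """Render in OWL 2 functional-style syntax (plain ASCII)."""
    if isinstance(e, Named):
        return f":{e.name}"
    if isinstance(e, Some):
        return f"ObjectSomeValuesFrom(:{e.prop} {to_functional(e.filler)})"
    if isinstance(e, And):
        return f"ObjectIntersectionOf({to_functional(e.left)} {to_functional(e.right)})"
    if isinstance(e, SubClassOf):
        return f"SubClassOf({to_functional(e.sub)} {to_functional(e.sup)})"
    raise TypeError(e)


axiom = SubClassOf(
    Named("Parent"),
    And(Named("Person"), Some("hasChild", Named("Person"))),
)
print(to_dl(axiom))
print(to_functional(axiom))
```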
BTW, the current standard Web Ontology Language is OWL 2. Its DL variant (viz., OWL 2 DL) is based on the even more irresistible SROIQ.

Is it possible to search intellisense in vstudio?

Is it possible to search or filter IntelliSense in Visual Studio?
Basically, I know there is an enum in the project that contains 'column', but the enum doesn't begin with 'c'.
There have been lots of times where I'd rather not scroll through the hundreds (if not thousands) of valid objects it gives me.
I wonder if the real answer here is (and I won't be surprised to be voted down for this) that your enum isn't properly named. If it were, I'd expect the name to be obvious in the context where it's used. Maybe consider renaming the enum?
You can search in Class View. Type "column" and hit enter.
Visual Studio 2010 changes all of this, giving you multiple very easy ways to do this type of search quickly.
If you're using ReSharper, you can use "Go To Symbol..." and type "column", and it will give you all symbols (types, properties, fields, methods, etc) that match.
Otherwise your best bet is to use the Object Browser and search.
I really don't know about doing that in intellisense itself, but assuming the objective is to actually find a member whose name you don't remember, you can write a small utility for that purpose using the underlying mechanism intellisense uses, reflection.
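As a rough illustration of that kind of utility, here is the idea sketched in Python rather than .NET, using the inspect module as the reflection mechanism. The Project class and its members are hypothetical stand-ins for whatever namespace you would actually search, and find_members is a made-up helper name.

```python
import enum
import inspect


def find_members(namespace, fragment):
    """Return public names in `namespace` that contain `fragment`, ignoring case."""
    fragment = fragment.lower()
    return sorted(
        name
        for name, _ in inspect.getmembers(namespace)
        if fragment in name.lower() and not name.startswith("_")
    )


# Hypothetical project namespace with an enum whose name does not start with 'c'.
class Project:
    class GridColumnKind(enum.Enum):
        TEXT = 1
        NUMBER = 2

    class RowStyle(enum.Enum):
        PLAIN = 1

    def sort_columns(self):
        pass


print(find_members(Project, "column"))
```

A .NET version would walk the loaded assemblies' types and members in the same way and apply the same substring test.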
Open the Object Browser under View menu. From there, you can search within all the language constructs available to you.

Are semantics and syntax the same?

What is the difference in meaning between 'semantics' and 'syntax'? What are they?
Also, what's the difference between things like "semantic website vs. normal website", "semantic social networking vs. normal social networking" etc.
Syntax is the grammar. It describes the way to construct a correct sentence. For example, this water is triangular is syntactically correct.
Semantics relates to the meaning. this water is triangular does not mean anything, though the grammar is ok.
Talking about the semantic web has become trendy recently. The idea is to enhance the markup (structural with HTML) with additional data so that computers can make sense of web pages more easily.
Syntax is the grammar of a language - the rules by which to form sentences or expressions.
Semantics is the meaning you are trying to express with your code.
A program that is syntactically correct will compile and run.
A program that is semantically correct will actually do what you as the programmer intended it to do. i.e. it doesn't have any bugs in it.
Two programs written to perform the same task in different languages will use different syntaxes, but they would be the same semantically.
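A small Python sketch of that distinction: the first snippet is rejected by the parser (a syntax problem), while the second parses and runs but computes the wrong thing (a semantic problem). The function name average and the deliberate off-by-one bug are made up for the example.

```python
import ast

# Missing colon after the parameter list: not valid Python syntax.
BROKEN_SYNTAX = "def average(xs) return sum(xs) / len(xs)"

# Valid syntax, but the divisor is off by one, so the meaning is wrong.
WRONG_MEANING = "def average(xs):\n    return sum(xs) / (len(xs) + 1)\n"


def is_syntactically_valid(source):
    """True if the Python parser accepts the source text."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(is_syntactically_valid(BROKEN_SYNTAX))  # rejected by the grammar
print(is_syntactically_valid(WRONG_MEANING))  # accepted by the grammar

# The second snippet runs, but does not compute what its name promises:
# the mean of 2, 4, 6 is 4, and this returns 12 / 4.
namespace = {}
exec(WRONG_MEANING, namespace)
result = namespace["average"]([2, 4, 6])
print(result)
```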
If you are talking about web (rather than programming languages):
The syntax of the language is whatever the browser (or processing program) can legally recognize and handle, and render to you. For example, your browser can render HTML, while your API can parse XML trees.
Semantics involve what is actually being represented. There's a lot of buzz now about semantic webs and all that stuff, but it essentially means that each entity is also associated with some human-readable information or metadata, so that a certain tag would have a supposed meaning and refer you to it.
Social networks are the same story. You put knowledge in the links.
"An ant ate an aunt." has correct syntax, but does not make sense semantically. A syntax is a set of rules that can be combined to produce an infinite number of grammatically valid sentences, but few, very few, of which have any semantics.
Syntax is the word order of a sentence. In English it would be the subject-verb-object form.
Semantics is the meaning behind words. E.g.: she ate a saw. The word saw doesn't fit the meaning of the sentence, but the sentence is grammatically correct, so its syntax is correct. =)
Specifically, semantic social networking means embedding the actual social relationships within the page markup. The standard format for doing this as defined by microformats is XFN, XHTML Friends Network. In regards to the semantic web in general, microformats should be the go-to guide for defining embedded semantic content.
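In practice XFN relationships are just values in the rel attribute of ordinary links, so a program can read the social graph straight out of the markup. Here is a minimal sketch using Python's standard HTML parser. The page snippet and its URLs are hypothetical, and XfnLinkCollector is a made-up class name.

```python
from html.parser import HTMLParser

# Relationship values defined by XFN 1.1.
XFN_VALUES = {
    "friend", "acquaintance", "contact", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse", "kin",
    "muse", "crush", "date", "sweetheart", "me",
}


class XfnLinkCollector(HTMLParser):
    """Collect (href, relationships) pairs from <a rel="..."> links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # Keep only rel tokens that are XFN relationship values.
        rels = set((attributes.get("rel") or "").split()) & XFN_VALUES
        if rels and attributes.get("href"):
            self.links.append((attributes["href"], sorted(rels)))


# Hypothetical blogroll markup.
PAGE = """
<a href="http://alice.example.org/" rel="friend met">Alice</a>
<a href="http://bob.example.org/" rel="co-worker">Bob</a>
<a href="http://example.org/about" rel="nofollow">About</a>
"""

collector = XfnLinkCollector()
collector.feed(PAGE)
print(collector.links)
```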
Semantic web sites use the concept of the semantic web, which aims to bring meaning to web content by using special annotations to identify certain concepts in a page. This makes possible the automatic (by a computer, not a human) reasoning about the content, which improves its aggregation, extraction, indexing and searching.
The explanations above are vague on the semantics side. Semantics could mean the different elements at one's disposal to build arguments of value (these being comprehensible to the human end user and digestible to the machine).
Of course this puts semantics and the programmer-editor-writer-communicator in the middle: he decides on the semantics that should be ideally defined to his public, comprehended by his public, general convention by his public and digestible to the machine-computer. Semantics should be agreed upon, are conceptual, must be implementable to both sides.
Say footnotes, inline and block-quotes, titles and on and on to end up in a well-defined and finite list. MediaWiki wikitext, as an example, fails in that perspective, defining syntax for elements of semantic meaning left undefined, no finite list agreed upon. "meaning by form" as additional of what a title as an example again carries as textual content. Example "This is a title" becomes only semantics integrated by the supposition within agreed upon semantics, and there can be more than one set of say "This is important and will be detailed".
AsciiDoc and pandoc markup are quite different in their semantics, regardless of how each translates this by convention of syntax to output formats.
Programming, output formats as html, pdf, epub can have consequentially meaning by form, by semantics, the syntax having disappeared as a temporary tool of translation, and as one more consequence thus the output can be scanned robotically for meaning, the champ of algorithms of 'grep': Google. Looking for the meaning of "what" in "What is it that is looked for" based upon whether a title or a footnote, or a link is considered.
Semantics, and there can be more than one layer, even the textual message carries (Chomsky) semantics thus could be translated as meaning by form, creating functional differences to anything else in the output chain, including a human being, the reader.
As a conclusion, programmers and academics should be integrated, no academic should be without knowledge of his tools, as any bread and butter carpenter. Programmers should be academics in the sense that the other end of the bridging they accomplish is the end user, the bridge... much so: semantics.
