Does a syntax highlighter in an IDE scan the whole file every time a letter is typed? - syntax-highlighting

Assuming a syntax highlighter uses a lexer to do the background work: when typing in an IDE with live syntax highlighting, does the lexer have to re-tokenize the entire file (in whatever language, ex. Java, C++, Python, etc), does the lexer only have to re-read and tokenize the current line, or does it only keep itself occupied with a single character/word at a time?
I'm asking because in a lot of editors/IDEs, most code remains the same as the programmer is typing, however, in some cases there's stuff like starting a string literal, which re-highlights the rest of the line, and in other cases like starting a multi-line comment, the whole text file becomes re-highlighted from the point where I start the multi-line comment, to the end of the file.
If the lexical analysis has to be done for the entire file for every single letter typed, wouldn't that make it slow, especially for larger (100.000+ lines) text files?

There is a syntax highlight and semantic highlight.
Syntax highlight is when editor only decorates based on language syntax - e.g. identifiers are black, keywords are pink and comments are green. Syntax highlight does not necessarily reparses (or, rather, tokenizes) the whole file - it can only tokenize "damaged region" (e.g. tokens around edit location). Of course, editor developer may opt to tokenize the whole input - as it is really fast, error-proof and easier to implement.
Semantic highlight (one that, for instance, can highlight global and local identifiers differently) usually require complete reparse - e.g. in Java adding "static" to function declaration would require you to invalidate function references both above and below the cursor. In some cases caching may be implemented (e.g. cache include files parse result as user edit does not change it that much). Semantic highlight is slow so it is usually combined with syntax highligh (you may see in Eclipse that the keywords are highlighted instantly - while member variable changes the color from the black after some small delay).

I didn't look this up, but I am pretty sure that it depends what is being highlighted. That is, comparing the local area you are typing in with basic syntax; versus, say an open comment that until closed highlights from that point until the end of the file.

Related

Supress cross reference hyperlink using exclamation mark

Prefixing the link with an ! suppresses the creation of a reference (e.g. :ref:`!no link` will be simply rendered as no link):
If you prefix the content with !, no reference/hyperlink will be created.
However, I can't think of any practical usage of this. Why should I first create a reference and then don't want to use it - it would be far easier to write plain text from the very beginning.
So - what is a typical use case of such a suppressed reference?
(Sphinx itself for instance doesn't use it in its docs.)
I can't think of any practical usage of this.
The use I can think of is before a build if you wanted to "turn of" hyperlink generation for one given cross-reference (that appears multiple times) how would you do it?
Well, the simplest way might be using some text editor's "find and replace", and arguably the least invasive way would be to add or remove a single character !. That way the length and structure of the cross-reference is kept in the source (and the title is still rendered in place). This could be convenient in several places, like a table where removing the whole cross-reference could misalign the source.
The most economical change possible would be turning this:
:ref:`a very long title <an.extremely.long.link.target>`
into this:
:ref:`!a very long title <an.extremely.long.link.target>`
The same could possibly be achieved programmatically using the Sphinx API, but a lot of Sphinx users are likely to prefer a text editing solution over a programmatic one.

SPSS: Create links/anchors in syntax

Is there a way to create links or anchors within SPSS syntax? Something like linking to a bookmark.
I am making changes and additions to a syntax file, and document these changes at the bottom of the file as comments. In these comments I would like to link to the part of the syntax that was changed. Now I just write the line number, but that changes as I add more syntax, so the reference becomes incorrect.
Bookmarks were the closest thing I found to what I want to do, but I can't turn them into a link. Moreover, I can only create a maximum of 9 bookmarks, which is not enough.
Trying to think creatively here:
instead of bookmarking all the changes, you could break up your syntax into many small syntaxes - each of which contains one of the parts where a change was made.
you can name and number the small syntaxes accordingly.
Then you create one syntax which contains a series of INSERT commands, which calls each of the small syntaxes in turn. You can add titles and remarks between the insert commands, so other users can follow the process and study the relevant small syntax that they need separately.
The Statistics Syntax Editor supports bookmarks - you can have up to 10. Generate a few in the SE and save the syntax file to see how these are represented (hint: look at the COMMENT BOOKMARK lines.

How to find foreign language used in "C comments"

I have a large source code where most of the documentation and source code comments are in english. But one of the minor contributors wrote comments in a different language, spread in various places.
Is there a simple trick that will let me find them ? I imagine first a way to extract all comments from the code and generate a single text file (with possible source file / line number info), then pipe this through some language detection app.
If that matters, I'm on Linux and the current compiler on this project is CLang.
The only thing that comes to mind is to go through all of the code manually and check it yourself. If it's a similar language, that doesn't contain foreign letters, consider using something with a spellchecker. This way, the text that isn't recognized will get underlined, and easy to spot.
Other than that, I don't see an easy way to go through with this.
You could make a program, that reads the files and only prints the comments out to another output file, where you then spell check that file, but this would seem to be a waste of time, as you would easily be able to spot the comments yourself.
If you do make a program for that, however, keep in mind that there are three things to check for:
If comment starts with /*, make sure it stops reading when encountering */
If comment starts with //, only read one line - unless:
If line starting with // ends with \, read next line as well
While it is possible to detect a language from a string automatically, you need way more words than fit in a usual comment to do so.
Solution: Use your own eyes and your own brain...

What is the option that makes Sublime Text add whitespace at the left of wrapped lines?

I want Sublime to treat my text files like they are source code files and show whitespace at the left of wrapped lines, like so:
beginning of long line blah blah blah, now it wraps
and it keeps going after an automatic indent
I tried opening the console and setting the option manually:
view.settings().set('indent_subsequent_lines', True)
but nothing changes, any ideas?
Well, that was fun. There's no easily accessible setting for this. But as you indicate, ST decides whether to add an extra indent when soft wrapping, from whether the syntax is considered code-like or plain text-like.
Being one or the other is up to the package defining the syntax to specify. So lacking a global setting from ST, you need to change your text packages. As an example, let's take Text. That contains Plain text.tmLanguage. In that you change
<key>scopeName</key>
<string>text.plain</string>
to
<key>scopeName</key>
<string>source.plain</string>
I'm unsure of whether there'll be ill effects from not keeping .plain.
One easy way to do this is to get the PackageResourceViewerpackage.
After install, do:
cmdshiftp
Type: Open Resource
return
Type: Text
return
Type: Plain text.tmLanguage
Make your edit and save.
PackageResourceViewer will save the modified Text package to your Packages directory. And sublime will display files considered to have Plain text-syntax, like they are code.
The caveat is that you need to do this for every text-syntax you want to be considered code.
As #AdamAL said, it's dependent on both the 'indent_subsequent_lines' setting as well as whether it's considered a "source" or a "text" language ... by default, "markup" languages (such as HTML, CSS, etc.) and plain text, etc., are considered "text", and programming languages such as C++, Java, PHP, etc., are considered "source".
"Source" languages will indent subsequent soft-wrapped lines (if the indent_subsequent_lines is true), whereas "text" languages will only indent up to the same level as the current line.
For each one you want to change, you'll need to edit the settings of the given language. #AdamAL's answer provides a great way to do this using the PackageResourceViewer package:
After install, do:
[Ctrl/Cmd]+[shift]+p, "Open Resource"
Find the name of the language you want to change, and find either the .sublime-syntax file or the .tmLanguage file. .sublime-syntax is supported from build 3084 of ST 3 and appears that it may trump values in the .tmLanguage file in supported versions, if present (when editing the definition of TaskPaper files provided by the "PlainTasks" package, my change didn't take when just editing the PlainTasks.tmLanguage, I had to edit the PlainTasks.sublime-syntax before it took).
In .sublime-syntax (which are YAML files) look for the first scope: line, where the main scope name of the language is identified (there will be lots of other scope: entries further down under contexts:).
In .tmLanguage (which are XML .plist files) look for the <string> following the <key>scopeName</key>.
Sublime Text Syntax Definition Documentation Reference:
scopeName
Name of the topmost scope for this syntax definition. Either source.<lang> or text.<lang>. Use source for programming languages and text for markup and everything else.
The <lang> (without brackets) is just an identifier string for the given syntax/language definition.
I noted that it seems that (in ST 3, anyway), no restart of Sublime Text is needed to get the changes to apply, if the edit is in the right place.
And also note that there may be other effects of changing this in more complex packages -- For example, in PlainTasks, the additional keybindings that it defines depended on it looking for a context that included text.todo, which I changed to source.todo in several places. So in order for the keybindings to work properly again, I also had to update my .sublime-keymap for that package. (This could also be because I changed it in a place besides the .sublime-syntax that I didn't need to. Just sayin' -- YMMV.)

Syntax Highlighting in Sublime Text 2

So I have been trying to figure out how to add syntax highlighting for the name of typedef's in c++ files, in sublime text.
For example, if I have typedef long long integer; I want integer to be highlighted (preferably the same color as the other types: int, bool, etc.). I went looked at the C.tmLanuage file, and tried to add the following regex code ^typedef.*?\s(\w+)\s*; to storage.type.c (line 49), but it didn't work. If I add the word string, it will highlight all instances of the word string. I tried going in the C++.tmLanguage file, and adding the regex code to storage.type.c++, but it still did not work.
Does anybody know how to get typedef's highlighted in sublime text?
Also, is there a way to get syntax highlighting for class name? Let's say I declare a string or vector, I would like either string or vector to be highlighted.
That regex would work (I believe) if you had something along the lines of typedef foo; To get the behavior you want, you will have to create a slightly more complex pattern entry in the tmLanguage file. As the language file is based on TextMates, you will want to have this as a reference (http://manual.macromates.com/en/language_grammars#language_grammars). I would also recommend using PlistJsonConverter (working in JSON is easier for me than working in XML). You will probably need to define begin and end patterns (begin will probably be typedef end will probably be ;). You can then apply whatever patterns you want to that group.
As for the class name highlighting, I would look to see what, if any scopes are being applied. If none are, you will have to come up with a regex to apply the scope to those. You can then add a color entry, or use a defined one from the color scheme.
Edit:
Actually they don't appear to be JSON. I see () rather than []. JSON is pretty simple to understand. You can look for something more in depth, but wikipedia is a good place to start. What you would probably be interested in are the things under the "Rule Keys" section. I did some searching (because I knew there were some better examples out there), and came across http://docs.sublimetext.info/en/latest/extensibility/syntaxdefs.html . It goes over syntax definitions from scratch, but the most relevant section is probably http://docs.sublimetext.info/en/latest/extensibility/syntaxdefs.html#analyzing-patterns. I don't have a regex to find class names, so you would have to come up with one yourself. If you haven't already though, you may want to search around to see if someone else has implemented a language file in a way that works for you.
You will want to start with the built in tmLanguage file and convert that from a Plist to json. You can then edit that file and move it back.

Resources