If I have an identifier with a same name as existing keyword, how do I escape it?
That's what I found (and this is probably the final answer):
It is possible to use # as a prefix in identifier names. However, by default it creates a different identifier (#a != a).
Since # is allowed, it is possible to add a new compiler step to the pipeline that will do TrimStart('#') on all identifiers. It works ok, you will just have to remember all types of things that have names.
If you are using Rhino.DSL, it has a UseSymbols step that converts #a into 'a', which had confused me a lot (I was working with project that included this step by default).
I don't think anything like the C# # prefix is implemented in Boo... but I'm pretty sure it could be achieved by inserting a custom compiler step to the beginning of the compiler pipeline.
Related
What is the most idiomatic way to get some form of context to the yacc parser in goyacc, i.e. emulate the %param command in traditional yacc?
I need to parse to my .Parse function some context (in this case including for instance where to build its parse tree).
The goyacc .Parse function is declared
func ($$rcvr *$$ParserImpl) Parse($$lex $$Lexer) int {
Things I've thought of:
$$ParserImpl cannot be changed by the .y file, so the obvious solution (to add fields to it) is right out, which is a pity.
As $$Lexer is an interface, I could stuff the parser context into the Lexer implementation, then force type convert $$lex to that implementation (assuming my parser always used the same lexer), but this seems pretty disgusting (for which read non-idiomatic). Moreover there is (seemingly) no way to put a user-generated line at the top of the Parse function like c := yylex.(*lexer).c, so in the many tens of places I want to refer to this variable, I have to use the rather ugly form yylex.(*lexer).c rather than just c.
Normally I'd use %param in normal yacc / C (well, bison anyway), but that doesn't exist in goyacc.
I'd like to avoid postprocessing my generated .go file with sed or perl for what are hopefully obvious reasons.
I want to be able to (go)yacc parse more than one file at once, so a global variable is not possible (and global variables are hardly idiomatic).
What's the most idiomatic solution here? I keep thinking I must be missing something simple.
My own solution is to modify goyacc (see this PR) which adds a %param directive allowing one or more fields to be added to the $$ParserImpl structure (accessible as $$rcvr in code). This seems the most idiomatic route. This permits not only passing context in, but the ability for the user to add additional func()s using $$ParserImpl as a receiver.
In the file file.h, following code is seen.
#ifndef FILE_H
#define FILE_H
...
...
#endif
QUESTION: Who generated FILE_H (is FILE_H called identifier?) What is this naming convention called? What I should read to understand more of this?
At the moment, I know this is called include guard, and stuff to do with preprocessor. But I can't seem to google futher. Any links would be highly appreciated.
Include guards are not actually a feature of the language itself, they're just a way to ensure the same header file isn't included twice into the same translation unit, and they're built from lower-level language features.
The actual features that make this possible are macro replacement (specifically object-like macros) and conditional inclusion.
Hence, to find out where the FILE_H comes from, you need to examine two things.
The first is the limitations imposed by the standard (C11 6.4.2 for example). In macro replacement, the macro name must be drawn from a limited character set, the minimum of which includes the upper and lower case letters, the underscore, and the digits (there are all sorts of extras that can be allowed, such as universal character names or other implementation-defined characters but this is the mandated baseline).
The second is the mind of the developer. Beyond the constraints of the standard, the developer must provide a unique identifier used for the include guard and the easiest way to do this is to make it reliant somehow on the file name itself. Hence, one practice is to use the uppercase file name with . replaced by an underscore.
That's why you'll end up with an include guard for btree.h being of the form:
#ifndef BTREE_H
#define BTREE_H
// weave your magic here
#endif
You should keep in mind however that it doesn't always work out well. Sometimes you may end up with two similarly named header files that use the same include guard name, resulting in one of the header files not being included at all. This happens infrequently enough that it's usually not worth being concerned about.
So, I'm building a custom backend for GCC for a processor. This processor has 4 address spaces: local, global, mmm, and mmr. I want to make it such that when writing c code, you can do this:
int global x = 5;
which would cause the compiler to spit out an instruction like this:
ldi.g %reg, 5
I know that certain processors like blackfin and MeP do something similar to this, so I figure its possible to do, however I have no idea how to do it. The technique that should allow me to do this is a variable attribute.
Any suggestions on how I could go about doing this?
You can add target-specific attributes by registering a struct attribute_spec table using TARGET_ATTRIBUTE_TABLE, as described in the GCC internals documentation. The details of struct attribute_spec can be found in the source (gcc/tree.h).
This handler doesn't need to do anything beyond returning NULL_TREE, although typically it will at least do some error checking. (Read the comments in gcc/tree.h, and look at examples in other targets.)
Later, you can obtain the list of attributes for a declaration tree node with DECL_ATTRIBUTES() (see the internals docs again), and use lookup_attribute() (see gcc/tree.h again) to see if a given attribute in the list.
You want to references to a symbol to generate different assembly based on your new attributes, so you probably want to use the TARGET_ENCODE_SECTION_INFO hook ("Define this hook if references to a symbol or a constant must be treated differently depending on something about the variable or function named by the symbol") to set a flag on the symbol_ref (as the docs suggest). You can define a predicate for testing this flag in the .md .
I'm considering how to do automatic bug tracking and as part of that I'm wondering what is available to match source code line numbers (or more accurate numbers mapped from instruction pointers via something like addr2line) in one version of a program to the same line in another. (Assume everything is in some kind of source control and is available to my code)
The simplest approach would be to use a diff tool/lib on the files and do some math on the line number spans, however this has some limitations:
It doesn't handle cross file motion.
It might not play well with lines that get changed
It doesn't look at the information available in the intermediate versions.
It provides no way to manually patch up lines when the diff tool gets things wrong.
It's kinda clunky
Before I start diving into developing something better:
What already exists to do this?
What features do similar system have that I've not thought of?
Why do you need to do this? If you use decent source version control, you should have access to old versions of the code, you can simply provide a link to that so people can see the bug in its original place. In fact the main problem I see with this system is that the bug may have already been fixed, but your automatic line tracking code will point to a line and say there's a bug there. Seems this system would be a pain to build, and not provide a whole lot of help in practice.
My suggestion is: instead of trying to track line numbers, which as you observed can quickly get out of sync as software changes, you should decorate each assertion (or other line of interest) with a unique identifier.
Assuming you're using C, in the case of assertions, this could be as simple as changing something like assert(x == 42); to assert(("check_x", x == 42)); -- this is functionally identical, due to the semantics of the comma operator in C and the fact that a string literal will always evaluate to true.
Of course this means that you need to identify a priori those items that you wish to track. But given that there's no generally reliable way to match up source line numbers across versions (by which I mean that for any mechanism you could propose, I believe I could propose a situation in which that mechanism does the wrong thing) I would argue that this is the best you can do.
Another idea: If you're using C++, you can make use of RAII to track dynamic scopes very elegantly. Basically, you have a Track class whose constructor takes a string describing the scope and adds this to a global stack of currently active scopes. The Track destructor pops the top element off the stack. The final ingredient is a static function Track::getState(), which simply returns a list of all currently active scopes -- this can be called from an exception handler or other error-handling mechanism.
What I like to do is remove all functions which has specific word for example, if the word is 'apple':
void eatapple()
{
// blah
// blah
}
I'd like to delete all code from 'void' to '}'.
What I tried is:
^void.*apple(.|\n)*}
But it took very long time I think something is wrong here.
I'm using Visual Studio. Thank you.
To clarify jeong's eventual solution, which I think is pretty clever: it works, but it depends on the code being formatted in a very particular way. But that's OK, because most IDE's can enforce a particular formatting standard anyway. If I may give a related example - the problem that brought me here - I was looking to find expressions of the form (in Java)
if (DEBUG) {
// possibly arbitrary statements or blocks
}
which, yes, isn't technically regular, but I ran the Eclispe code formatter on the files to make sure they all necessarily looked like this (our company's usual preferred code style):
if (DEBUG) {
statement();
while (whatever) {
blahblahblah(etc);
}
// ...
}
and then looking for this (this is Java regex syntax, of course)
^(\s*)if \(DEBUG.*(?:\n\1 .*)*\n\1\}
did the trick.
Finally did it.
^void.*(a|A)pple\(\)\n\{\n((\t.*\n)|(^$\n))*^\}
Function blocks aren't regular, so using a regular expression in this situation is a bad idea.
If you really have a huge number of functions that you need to delete (more than you can delete by hand (suggesting there's something wrong with your codebase — but I digress)) then you should write a quick brace-counting parser instead of trying to use regular expressions in this situation.
It should be pretty easy, especially if you can assume the braces are already balanced. Read in tokens, find one that matches "apple", then keep going until you reach the brace that matches with the one immediately after the "apple" token. Delete everything between.
In theory, regular language is not able to express a sentence described by context free grammar. If it is a one time job, why don't you just do it manually.
Switch to VI. Then you can select the opening brace and press d% to delete the section.