Implementing fork-merge parser for C - refactoring

I'm trying to implement a fork-merge parser for C using Java. I need to fork the parser whenever I find an #if directive. For example:
int x =
#if cond
3;
#else
4;
#endif
The above statement should be parsed as follows:
First I create a new parser for the #if branch and read in everything under it. In the above case, reading the value 3 directly would throw a syntax error, and I would then need to read back all the tokens that were already read. How do I do this?

Well, you could take any standard parser and replicate its state on encountering an #if. Then one version goes down the fork of the #if, the other down the fork of the #else. (No #else? Pretend you saw #if cond lexemes #else #endif.)
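As a minimal, purely illustrative sketch in Java (the ParserState class and its fields are hypothetical, not taken from any parser framework), the forking step is essentially "deep-copy everything the parser needs to continue":
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of parser state: a parse stack plus a position in the token stream.
// A real parser has more (symbol table, lookahead, error state), but the forking
// idea is the same: copy it all, then let one copy consume the #if branch and the
// other the #else branch (treat a missing #else as an empty branch).
class ParserState {
    final Deque<String> parseStack;
    int tokenIndex;

    ParserState(Deque<String> parseStack, int tokenIndex) {
        this.parseStack = new ArrayDeque<>(parseStack); // deep copy, not a shared reference
        this.tokenIndex = tokenIndex;
    }

    // Called when the lexer hands the parser an #if directive.
    ParserState fork() {
        return new ParserState(this.parseStack, this.tokenIndex);
    }
}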
Life gets messy if the preprocessor conditionals don't occur in nice places:
void p()
{ if (x<2)
{ y =
#if cond
3; } else { x=
#else
7;
#endif
}
The parse can end up in really different states at the end of each conditional subpart. A consequence: you need to expand preprocessor conditionals so they include clean language structures.
A serious problem with this approach is that if you fork every time you see an #if, you end up with 2^N forks for N #ifs. It is easy to find C code with dozens of conditionals; 2^24 is about 16 million, so this gets out of hand fast.
So you need a way to merge the parses back together again when you hit the #endif. That's not so easy; we have done this with a GLR parser but the answer is very complicated and won't fit easily here. This technical paper discusses how to do this merge for LR parsers: http://www.lirmm.fr/~ducour/Doc-objets/ECOOP2012/PLDI/pldi/p323.pdf
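To give the flavor only (this is not the paper's algorithm, just a hypothetical sketch of the cheapest possible merge rule): at #endif, if the two forks happen to have reached structurally identical parse stacks, you can collapse them back into one parser; if they haven't, both must be kept alive, which is exactly where the GLR/LR merge machinery becomes necessary.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Illustrative only: parse stacks are represented as plain lists of symbols.
class MergePoint {
    static Deque<List<String>> mergeAtEndif(List<String> thenStack, List<String> elseStack) {
        Deque<List<String>> survivors = new ArrayDeque<>();
        if (thenStack.equals(elseStack)) {
            survivors.add(thenStack);   // the forks reconverged: one parser continues
        } else {
            survivors.add(thenStack);   // the forks diverged: keep both alive,
            survivors.add(elseStack);   // each tagged with its own condition
        }
        return survivors;
    }
}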
There's a second complication: imagine
#if cond1
stuff1
#else
#if cond2
stuff4
#else
stuff5
#endif
#endif
Now you need to fork parsers inside parsers. Worse, stuff4 is only reached when cond1 is false and cond2 is true, so its condition is the conjunction ~cond1 & cond2, while stuff5's condition is ~cond1 & ~cond2. Realistically, you're going to need to compute and retain the condition under which each parse (and each generated subtree) occurs. You'll need some kind of symbolic condition computation to do this, and you'll need to handle specially the case where the composed condition is entirely false (just skip the content). Interestingly, our GLR solution and the above technical paper both agree that using BDDs for this is a good idea.
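A hypothetical sketch of that bookkeeping in Java, using a naive set-of-literals representation where a real implementation would use BDDs: each #if pushes its condition, #else negates the top entry, #endif pops, and a region whose composed condition is trivially false can simply be skipped.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Presence-condition tracking with string literals such as "cond1" or "!cond2".
// The composed condition is the conjunction of everything on the stack.
class PresenceConditions {
    private final Deque<String> stack = new ArrayDeque<>();

    void enterIf(String cond) { stack.push(cond); }
    void enterElse()          { stack.push(negate(stack.pop())); }
    void exitEndif()          { stack.pop(); }

    private static String negate(String lit) {
        return lit.startsWith("!") ? lit.substring(1) : "!" + lit;
    }

    // True when some literal and its negation are both on the stack, i.e. the
    // conjunction is unsatisfiable and the region can be skipped without parsing.
    boolean currentlyFalse() {
        Set<String> seen = new HashSet<>(stack);
        for (String lit : seen) {
            if (seen.contains(negate(lit))) return true;
        }
        return false;
    }
}
In the nested example above, the stack while parsing stuff4 would hold cond2 and !cond1 (i.e. ~cond1 & cond2), and while parsing stuff5 it would hold !cond2 and !cond1.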
If you want to do refactoring, you'll need to determine the meaning of names in the presence of conditionals:
#if cond1
float x;
#else
char* x;
#endif
...
x=
#if cond1
3.7
#else
"foobar"
#endif
;
This requires having a symbol table that carries conditional information with the symbols. See my technical paper http://www.rcost.unisannio.it/mdipenta/papers/scam2002.pdf for details on how to approach this.
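A hypothetical sketch of such a table in Java (the names are made up for illustration; the String condition simply stands in for whatever symbolic form, e.g. a BDD handle, the parser already maintains):
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Each name maps to *all* of its conditional declarations, so "x" can be a float
// under cond1 and a char* under !cond1 at the same time.
class ConditionalSymbolTable {
    record Entry(String condition, String type) {}

    private final Map<String, List<Entry>> table = new HashMap<>();

    void declare(String name, String condition, String type) {
        table.computeIfAbsent(name, k -> new ArrayList<>()).add(new Entry(condition, type));
    }

    // Lookup returns every conditional declaration of the name; the caller must
    // intersect each entry's condition with the current presence condition, and a
    // refactoring has to be checked against all entries that remain satisfiable.
    List<Entry> lookup(String name) {
        return table.getOrDefault(name, List.of());
    }
}
For the example above, lookup("x") would yield both (cond1, float) and (~cond1, char*), and each of the two assignments type-checks only under its matching condition.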
To do refactoring, you're going to need control and data flow analysis on top of all this.
Check out my bio for a tool where we are attempting to do all this. We have the conditional parsing part done right, we think. The rest is still up in the air.

Related

HLSL - Compile time iteration without a loop

I currently have no way to compile my shaders into assembly to investigate the results of this question, and I'm honestly not sure if I know enough about assembly to figure this out, even if I could.
Essentially, I'm wanting to do something like this:
int itr = 0;
#if X
DoSomethingX( itr++ );
#endif
#if Y
DoSomethingY( itr++ );
#endif
#if Z
DoSomethingZ( itr++ );
#endif
and be confident that the compiler would generate code like this in cases where X, Y, and Z are all true:
DoSomethingX( 0 );
DoSomethingY( 1 );
DoSomethingZ( 2 );
Will this work as-is? I would imagine it should, but then I wonder what the point of the [unroll] keyword is, if the compiler automatically does such things?
Is there any type of keyword to suggest this behavior? Or perhaps a completely different design/syntax that gives the same result? My specific usage scenario is to extract data from a dynamic buffer, where certain data is only uploaded when that feature is enabled, but I imagine this type of situation pops up elsewhere.
I'm using DirectX12 with (runtime) dxc, but general answers are welcome.

Is there any difference between #define and # define? [duplicate]

I know that #defines, etc. are normally never indented. Why?
I'm working in some code at the moment which has a horrible mixture of #defines, #ifdefs, #elses, #endifs, etc. All these often mixed in with normal C code. The non-indenting of the #defines makes them hard to read. And the mixture of indented code with non-indented #defines is a nightmare.
Why are #defines typically not indented? Is there a reason one wouldn't indent (e.g. like this code below)?
#ifdef SDCC
    #if DEBUGGING == 1
        #if defined (pic18f2480)
            #define FLASH_MEMORY_END 0x3DC0
        #elif defined (pic18f2580)
            #define FLASH_MEMORY_END 0x7DC0
        #else
            #error "Can't set up flash memory end!"
        #endif
    #else
        #if defined (pic18f2480)
            #define FLASH_MEMORY_END 0x4000
        #elif defined (pic18f2580)
            #define FLASH_MEMORY_END 0x8000
        #else
            #error "Can't set up flash memory end!"
        #endif
    #endif
#else
    #if DEBUGGING == 1
        #define FLASH_MEMORY_END 0x7DC0
    #else
        #define FLASH_MEMORY_END 0x8000
    #endif
#endif
The pre-ANSI C preprocessor did not allow space between the start of a line and the "#" character; the leading "#" always had to be placed in the first column.
Pre-ANSI C compilers are non-existent these days. Use whichever style you prefer (space before the "#", or space between the "#" and the identifier).
http://www.delorie.com/gnu/docs/gcc/cpp_48.html
As some have already said, some pre-ANSI compilers required the # to be the first character on the line, but they didn't require the preprocessor directive itself to be attached to it, so indentation was done this way:
#ifdef SDCC
#  if DEBUGGING == 1
#    if defined (pic18f2480)
#      define FLASH_MEMORY_END 0x3DC0
#    elif defined (pic18f2580)
#      define FLASH_MEMORY_END 0x7DC0
#    else
#      error "Can't set up flash memory end!"
#    endif
#  else
#    if defined (pic18f2480)
#      define FLASH_MEMORY_END 0x4000
#    elif defined (pic18f2580)
#      define FLASH_MEMORY_END 0x8000
#    else
#      error "Can't set up flash memory end!"
#    endif
#  endif
#else
#  if DEBUGGING == 1
#    define FLASH_MEMORY_END 0x7DC0
#  else
#    define FLASH_MEMORY_END 0x8000
#  endif
#endif
I've often seen this style in old Unix headers, but I hate it, as syntax coloring often fails on such code. I use a very visible color for preprocessor directives so that they stand out (they are at a meta-level, so they should not be part of the normal flow of code).
You can even see that SO does not color the sequence in a useful manner.
Regarding the parsing of preprocessor directives, the C99 standard (and the C89 standard before it) was clear about the sequence of operations performed logically by the compiler. In particular, I believe it means that this code:
/* */ # /* */ include /* */ <stdio.h> /* */
is equivalent to:
#include <stdio.h>
For better or worse, GCC 3.4.4 with '-std=c89 -pedantic' accepts the comment-laden line, at any rate. I'm not advocating that as a style - not for a second (it is ghastly). I just think that it is possible.
ISO/IEC 9899:1999 section 5.1.1.2 Translation phases says:
1. [Character mapping, including trigraphs]
2. [Line splicing - removing backslash newline]
3. The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined.
4. Preprocessing directives are executed, macro invocations are expanded, [...]
Section 6.10 Preprocessing directives says:
A preprocessing directive consists of a sequence of preprocessing tokens that begins with a # preprocessing token that (at the start of translation phase 4) is either the first character in the source file (optionally after white space containing no new-line characters) or that follows white space containing at least one new-line character, and is ended by the next new-line character.
The only possible dispute is the parenthetical expression '(at the start of translation phase 4)', which could mean that the comments before the hash must be absent since they are not otherwise replaced by spaces until the end of phase 4.
As others have noted, pre-standard C preprocessors did not behave uniformly in a number of ways, and spaces before and within preprocessor directives were one of the areas where different compilers did different things, including not recognizing preprocessor directives with spaces ahead of them.
It is noteworthy that backslash-newline removal occurs before comments are analyzed.
Consequently, you should not end // comments with a backslash.
I don't know why it's not more common. There are certainly times when I like to indent preprocessor directives.
One thing that keeps getting in my way (and sometimes convinces me to stop trying) is that many or most editors/IDEs will throw the directive to column 1 at the slightest provocation, which is annoying as hell.
These days I believe this is mainly a choice of style. I think at one point in the distant past, not all compilers supported the notion of indenting preprocessor directives. I did some research and was unable to back up that assertion. But in any case, it appears that all modern compilers support indenting preprocessor directives. I do not have a copy of the C or C++ standard, though, so I do not know whether this is standard behavior or not.
As to whether or not it's good style: personally, I like the idea of keeping them all to the left. It gives you a consistent place to look for them. Yeah, it can get annoying when there are very nested macros. But if you indent them, you'll eventually end up with even weirder-looking code:
#if COND1
void foo() {
    #if COND2
    int i;
        #if COND3
        i = someFunction();
        cout << i << endl;
        #endif
    #endif
}
#endif
For the example you've given it may be appropriate to use indentation to make it clearer, seeing as you have such a complex structure of nested directives.
Personally I think it is useful to keep them not indented most of the time, because these directives operate separately from the rest of your code. Directives such as #ifdef are handled by the pre-processor, before the compiler ever sees your code, so a block of code after an #ifdef directive may not even be compiled.
Keeping directives visually separated from the rest of your code is more important when they are interspersed with code (rather than a dedicated block of directives, as in the example you give).
In almost all currently available C/C++ compilers, indentation of preprocessor directives is not restricted. It's up to you to decide how you want to align your code.
So happy coding.
I'm working in some code at the moment which has a horrible mixture of #defines, #ifdefs, #elses, #endifs, etc. All these often mixed in with normal C code. The non-indenting of the #defines makes them hard to read. And the mixture of indented code with non-indented #defines is a nightmare.
A common solution is to comment the directives, so that you easily know what they refer to:
#ifdef FOO
/* a lot of code */
#endif /* FOO */
#ifndef FOO
/* a lot of code */
#endif /* not FOO */
I know this is an old topic, but I wasted a couple of days searching for a solution. I agree with the initial post that indenting makes the code cleaner if you have lots of directives (in my case I use them to enable/disable verbose logging). Finally, I found a solution that works in Visual Studio 2017.
If you like to indent #pragma expressions, you can enable it under: Tools > Options > Text Editor > C/C++ > Formatting > Indentation > Position of preprocessor directives > Leave indented
The only problem left is that automatic code formatting resets that indentation =(

Mathematica: conditional "compilation"

I'm trying to make a conditional expression which would initialize some functions, variables, etc. Something which would look like this in C:
#if option==1
int foo(int x){/*some code here*/}
int q=10;
#else
char foo(int x){/*some other code*/}
double q=3.141592;
#endif
use_q(q);
f(some_var);
In Mathematica I've tried using If, like this:
If[option==1,
foo[x_]=some_expression1;
q=10;
,
foo[x_]=some_expression2;
q=3.141592;
]
use_q[q];
f[some_var];
But the result is that functions' arguments are colored red, and nothing gets initialized or computed inside If.
So, how should I do instead to get conditional "compilation"?
Several things:
Do not use blanks (underscores) in variable names - in Mathematica these are reserved symbols, representing patterns.
If your condition does not evaluate to True or False, If does not evaluate either.
Thus:
In[12]:= If[option==1,Print["1"],Print["Not 1"]]
Out[12]= If[option==1,Print[1],Print[Not 1]]
thus your result. Red-colored arguments are not the issue in this particular case. You should either use === in place of ==, or TrueQ[option==1], to get what you want. Have a look at the Mathematica documentation on Equal vs SameQ for more information.
This sounds like something that would be better done as a function with an option, for example
Options[myfunction] = {Compiled -> False};
myfunction[x_,opts:OptionsPattern[]]:=
With[{comp= TrueQ[OptionValue[Compiled]]},
If[comp, compiledFunction[x], notcompiledFunction[x] ]]
(The local constant comp within the With statement is not strictly necessary for this example but would be useful if your code is at all complex and you use this conditional more than once.)
I do not recommend defining different cases of a function inside an If[] statement. You would be better off using the built-in pattern-matching abilities in Mathematica (see the Wolfram documentation on patterns and on defining functions).
The Wolfram documentation on using options within functions is also useful.

style opinion re. empty If block

I'm trying to curb some of the bad habits of a self-proclaimed "senior programmer." He insists on writing If blocks like this:
if (expression) {}
else {
statements
}
Or as he usually writes it in classic ASP VBScript:
If expression Then
Else
statements
End If
The expression could be something as easily negated as:
if (x == 0) {}
else {
statements
}
Other than clarity of coding style, what other reasons can I provide for my opinion that the following is preferred?
if (x != 0) {
statements
}
Or even the more general case (again in VBScript):
If Not expression Then
statements
End If
Reasons that come to my mind for supporting your opinion (which I agree with BTW) are:
Easier to read (which implies easier to understand)
Easier to maintain (because of point #1)
Consistent with 'established' coding styles in most major programming languages
I have NEVER come across the coding-style/form that your co-worker insists on using.
I've tried it both ways. McConnell in Code Complete says one should always include both the then and the else to demonstrate that one has thought about both conditions, even if the operation is nothing (NOP). It looks like your friend is doing this.
I've found this practice to add no value in the field because unit testing handles this or it is unnecessary. YMMV, of course.
If you really want to burn his bacon, calculate how much time he's spending writing the empty statements, multiply by 1.5 (for testing) and then multiply that number by his hourly rate. Send him a bill for the amount.
As an aside, I'd move the close curly bracket to the else line:
if (expression) {
} else {
statements
}
The reason being that it is tempting to (or easy to accidentally) add some statement outside the block.
For this reason, I abhor single-line (bare) statements, of the form
if (expression)
statement
Because it can get fugly (and buggy) really fast
if (expression)
statement1
statement2
statement2 will always run, even though it might look like it should be subject to expression. Getting in the habit of always using brackets will kill this stumbling point dead.

What are the pros and cons of putting as much logic as possible in a minimum(one-liners) piece of code?

Is it cool?
IMO, one-liners reduce readability and make debugging/understanding more difficult.
Maximize understandability of the code.
Sometimes that means putting (simple, easily understood) expressions on one line in order to get more code in a given amount of screen real-estate (i.e. the source code editor).
Other times that means taking small steps to make it obvious what the code means.
One-liners should be a side-effect, not a goal (nor something to be avoided).
If there is a simple way of expressing something in a single line of code, that's great. If it's just a case of stuffing in lots of expressions into a single line, that's not so good.
To explain what I mean - LINQ allows you to express quite complicated transformations in relative simplicity. That's great - but I wouldn't try to fit a huge LINQ expression onto a single line. For instance:
var query = from person in employees
            where person.Salary > 10000m
            orderby person.Name
            select new { person.Name, person.Department };
is more readable than:
var query = from person in employees where person.Salary > 10000m orderby person.Name select new { person.Name, person.Department };
It's also more readable than doing all the filtering, ordering and projection manually. It's a nice sweet spot.
Trying to be "clever" is rarely a good idea - but if you can express something simply and concisely, that's good.
One-liners, when used properly, transmit your intent clearly and make the structure of your code easier to grasp.
A python example is list comprehensions:
new_lst = [i for i in lst if some_condition]
instead of:
new_lst = []
for i in lst:
    if some_condition:
        new_lst.append(i)
This is a commonly used idiom that makes your code much more readable and compact. So, the best of both worlds can be achieved in certain cases.
This is by definition subjective, and due to the vagueness of the question, you'll likely get answers all over the map. Are you referring to a single physical line or a single logical line? E.g., are you talking about:
int x = BigHonkinClassName.GetInstance().MyObjectProperty.PropertyX.IntValue.This.That.TheOther;
or
int x = BigHonkinClassName.GetInstance().
        MyObjectProperty.PropertyX.IntValue.
        This.That.TheOther;
One-liners, to me, are a matter of "what feels right." In the case above, I'd probably break that into both physical and logic lines, getting the instance of BigHonkinClassName, then pulling the full path to .TheOther. But that's just me. Other people will disagree. (And there's room for that. Like I said, subjective.)
Regarding readability, bear in mind that, for many languages, even "one-liners" can be broken out into multiple lines. If you have a long set of conditions for the conditional ternary operator (? :), for example, it might behoove you to break it into multiple physical lines for readability:
int x = (/* some long condition */) ?
/* some long method/property name returning an int */ :
/* some long method/property name returning an int */ ;
At the end of the day, the answer is always: "It depends." Some frameworks (such as many DAL generators, e.g. SubSonic) almost require obscenely long one-liners to get any real work done. Other times, breaking that into multiple lines is quite preferable.
Given concrete examples, the community can provide better, more practical advice.
In general, I definitely don't think you should ever "squeeze" a bunch of code onto a single physical line. That doesn't just hurt legibility, it smacks of someone who has outright disdain for the maintenance programmer. As I used to teach my students: always code for the maintenance programmer, because it will often be you.
:)
One-liners can be useful in some situations:
int value = flag ? 1 : 0;
But for the most part they make the code harder to follow. I think you only should put things on one line when it is easy to follow, the intent is clear, and it won't affect debugging.
One-liners should be treated on a case-by-case basis. Sometimes it can really hurt readability and a more verbose (read: easy-to-follow) version should be used.
There are times, however when a one-liner seems more natural. Take the following:
int Total = (Something ? 1 : 2)
+ (SomethingElse ? (AnotherThing ? x : y) : z);
Or the equivalent (slightly less readable?):
int Total = Something ? 1 : 2;
Total += SomethingElse ? (AnotherThing ? x : y) : z;
IMHO, I would prefer either of the above to the following:
int Total;
if (Something)
    Total = 1;
else
    Total = 2;
if (SomethingElse)
    if (AnotherThing)
        Total += x;
    else
        Total += y;
else
    Total += z;
With the nested if-statements, I have a harder time figuring out the final result without tracing through it. The one-liner feels more like the math formula it was intended to be, and consequently easier to follow.
As far as the cool factor, there is a certain feeling of accomplishment / show-off factor in "Look Ma, I wrote a whole program in one line!". But I wouldn't use it in any context other than playing around; I certainly wouldn't want to have to go back and debug it!
Ultimately, with real (production) projects, whatever makes it easiest to understand is best. Because there will come a time that you or someone else will be looking at the code again. What they say is true: time is precious.
That's true in most cases, but in cases where one-liners are common idioms, it's acceptable. The ?: operator might be an example. Closures might be another.
No, it is annoying.
One liners can be more readable and they can be less readable. You'll have to judge from case to case.
And, of course, at the command prompt, one-liners rule.
VASTLY more important is developing and sticking to a consistent style.
You'll find bugs MUCH faster, be better able to share code with others, and even code faster if you merely develop and stick to a pattern.
One aspect of this is to make a decision on one-liners. Here's one example from my shop (I run a small coding department) - how we handle IFs:
Ifs shall never be all on one line if they overflow the visible line length, including any indentation.
Thou shalt never have else clauses on the same line as the if even if it comports with the line-length rule.
Develop your own style and STICK WITH IT (or, refactor all code in the same project if you change style).
The main drawback of "one liners" in my opinion is that it makes it hard to break on the code and debug. For example, pretend you have the following code:
a().b().c(d() + e())
If this isn't working, it's hard to inspect the intermediate values. However, it's trivial to break with gdb (or whatever other tool you may be using) in the following, and check each individual variable to see precisely what is failing:
A = a();
B = A.b();
D = d();
E = e(); // here I can query A, B, D and E
B.c(D + E);
One rule of thumb is whether you can express the concept of the one-liner in plain language in a very short sentence: "If it's true, set it to this, otherwise set it to that."
For a code construct whose whole objective is to decide what value to assign to a single variable, with appropriate formatting it is almost always clearer to put multiple conditionals into a single statement. With multiple nested if/else blocks, the overall objective - setting the variable:
variableName =
must be repeated in every nested clause, and the eye must read all of them to see this. With a single statement it is much clearer, and with appropriate formatting the complexity is more easily managed as well:
decimal cost =
    usePriority      ? PriorityRate * weight :
    useAirFreight    ? AirRate * weight :
    crossMultRegions ? MultRegionRate :
                       SingleRegionRate;
The pro: an easily understood one-liner that works.
The con: a concatenation of obfuscated gibberish crammed onto one line.
Generally, I'd call it a bad idea (although I do it myself on occasion) - it strikes me as something that's done more to show off how clever someone is than to make good code. "Clever tricks" of that sort are generally very bad.
That said, I personally aim to have one "idea" per line of code; if this burst of logic is easily encapsulated in a single thought, then go ahead. If you have to stop and puzzle it out a bit, best to break it up.
