I'm reading a research paper High Performance Dynamic Lock-Free Hash Tables
and List-Based Sets (Maged M. Michael) and I don't understand this pseudo-code syntax that's being used for examples.
Specifically these parts:
〈pmark,cur,ptag〉: MarkPtrType;
〈cmark,next,ctag〉: MarkPtrType;
nodeˆ.〈Mark,Next〉←〈0,cur〉;
if CAS(prev,〈0,cur,ptag〉,〈0,node,ptag+1〉)
Eg. (page 5, chapter 3)
UPDATE:
The ˆ. seems to be a Pascal notation for dereferencing the pointer and accessing the variable in the record (https://stackoverflow.com/a/1814936/8524584).
The ← arrow seems to be a Haskell like do notation operator for assignment, that assigns the result of an operation to a variable. This is a bit strange, since the paper is nearly a decade older than Haskell. It's probably some notation that Haskell also borrowed from. (https://en.wikibooks.org/wiki/Haskell/do_notation#Translating_the_bind_operator)
The 〈a, b〉 is a mathematical vector notation for an inner product of a vector (https://mathworld.wolfram.com/InnerProduct.html)
The wide angle bracket notation seems to be either an ad-hoc list notation or manipulation of multiple variables on a single line (thanks #graybeard for pointing it out). It might even be some kind of tuple.
This is what 〈pmark,cur,ptag〉: MarkPtrType; would look like in a C like language:
MarkPtrType pmark;
MarkPtrType cur;
MarkPtrType ptag;
// or some list assignment notation
// or a tuple
The ˆ. seems to be a Pascal notation for dereferencing the pointer and accessing the variable in the record (https://stackoverflow.com/a/1814936/8524584).
The ← arrow is an APL assignment notation, also similar to the Haskell's do notation operator for assignment.
For example, a simple check for an empty string:
if s == "" { return 0 }
Or, a for-loop to pre-fill an array with -1 (I don't think there's an easier way to do this):
for i := range m { m[i] = -1 }
Is this generally discouraged, even if these functions are extremely simple altogether? I don't mean to be pedantic, but am generally curious what the sentiment for this is.
Generally, the culture in Go is to format your code the way the command go fmt would format it. (The reasons why there is an accepted style are in the linked article.)
To the extent that go fmt puts structured statement bodies on separate lines means that yes, the practice is "discouraged" in the community, but only because of a desire to have a common look for as much Go source code as possible.
The reasons why one-liners are not part of go fmt are not as relevant as the fact that go fmt was chosen as the canonical style. If you wish to argue about the pros and cons of one-liners, you can look at the debates made in just about any curly-brace language, as they are not unique to Go. Of course, Go's mandating of braces does make the question slightly different than, say, C or Java, where unbraced bodies means it's harder to "add a new statement in the body," but basically many of the same arguments for readability do apply.
Sometimes the value of a variable accessed within the control-flow of a program cannot possibly have any effect on a its output. For example:
global var_1
global var_2
start program hello(var_3, var_4)
if (var_2 < 0) then
save-log-to-disk (var_1, var_3, var_4)
end-if
return ("Hello " + var_3 + ", my name is " + var_1)
end program
Here only var_1 and var_3 have any influence on the output, while var_2 and var_4 are only used for side effects.
Do variables such as var_1 and var_3 have a name in dataflow-theory/compiler-theory?
Which static dataflow analysis techniques can be used to discover them?
References to academic literature on the subject would be particularly appreciated.
The problem that you stated is undecidable in general,
even for the following very narrow special case:
Given a single routine P(x), where x is a parameter of type integer. Is the output of P(x) independent of the value of x, i.e., does
P(0) = P(1) = P(2) = ...?
We can reduce the following still undecidable version of the halting problem to the question above: Given a Turing machine M(), does the program
never stop on the empty input?
I assume that we use a (Turing-complete) language in which we can build a "Turing machine simulator":
Given the program M(), construct this routine:
P(x):
if x == 0:
return 0
Run M() for x steps
if M() has terminated then:
return 1
else:
return 0
Now:
P(0) = P(1) = P(2) = ...
=>
M() does not terminate.
M() does terminate
=> P(x) = 1 for a sufficiently large x
=> P(x) != P(0) = 0
So, it is very difficult for a compiler to decide whether a variable actually does not influence the return value of a routine; in your example, the "side effect routine" might manipulate one of its values (or even loop infinitely, which would most definitely change the return value of the routine ;-)
Of course overapproximations are still possible. For example, one might conclude that a variable does not influence the return value if it does not appear in the routine body at all. You can also see some classical compiler analyses (like Expression Simplification, Constant propagation) having the side effect of eliminating appearances of such redundant variables.
Pachelbel has discussed the fact that you cannot do this perfectly. OK, I'm an engineer, I'm willing to accept some dirt in my answer.
The classic way to answer you question is to do dataflow tracing from program outputs back to program inputs. A dataflow is the connection of a program assignment (or sideeffect) to a variable value, to a place in the application that consumes that value.
If there is (transitive) dataflow from a program output that you care about (in your example, the printed text stream) to an input you supplied (var2), then that input "affects" the output. A variable that does not flow from the input to your desired output is useless from your point of view.
If you focus your attention only the computations involved in the dataflows, and display them, you get what is generally called a "program slice" . There are (very few) commercial tools that can show this to you.
Grammatech has a good reputation here for C and C++.
There are standard compiler algorithms for constructing such dataflow graphs; see any competent compiler book.
They all suffer from some limitation due to Turing's impossibility proofs as pointed out by Pachelbel. When you implement such a dataflow algorithm, there will be places that it cannot know the right answer; simply pick one.
If your algorithm chooses to answer "there is no dataflow" in certain places where it is not sure, then it may miss a valid dataflow and it might report that a variable does not affect the answer incorrectly. (This is called a "false negative"). This occasional error may be satisfactory if
the algorithm has some other nice properties, e.g, it runs really fast on a millions of code. (The trivial algorithm simply says "no dataflow" in all places, and it is really fast :)
If your algorithm chooses to answer "yes there is a dataflow", then it may claim that some variable affects the answer when it does not. (This is called a "false positive").
You get to decide which is more important; many people prefer false positives when looking for a problem, because then you have to at least look at possibilities detected by the tool. A false negative means it didn't report something you might care about. YMMV.
Here's a starting reference: http://en.wikipedia.org/wiki/Data-flow_analysis
Any of the books on that page will be pretty good. I have Muchnick's book and like it lot. See also this page: (http://en.wikipedia.org/wiki/Program_slicing)
You will discover that implementing this is pretty big effort, for any real langauge. You are probably better off finding a tool framework that does most or all this for you already.
I use the following algorithm: a variable is used if it is a parameter or it occurs anywhere in an expression, excluding as the LHS of an assignment. First, count the number of uses of all variables. Delete unused variables and assignments to unused variables. Repeat until no variables are deleted.
This algorithm only implements a subset of the OP's requirement, it is horribly inefficient because it requires multiple passes. A garbage collection may be faster but is harder to write: my algorithm only requires a list of variables with usage counts. Each pass is linear in the size of the program. The algorithm effectively does a limited kind of dataflow analysis by elimination of the tail of a flow ending in an assignment.
For my language the elimination of side effects in the RHS of an assignment to an unused variable is mandated by the language specification, it may not be suitable for other languages. Effectiveness is improved by running before inlining to reduce the cost of inlining unused function applications, then running it again afterwards which eliminates parameters of inlined functions.
Just as an example of the utility of the language specification, the library constructs a thread pool and assigns a pointer to it to a global variable. If the thread pool is not used, the assignment is deleted, and hence the construction of the thread pool elided.
IMHO compiler optimisations are almost invariably heuristics whose performance matters more than effectiveness achieving a theoretical goal (like removing unused variables). Simple reductions are useful not only because they're fast and easy to write, but because a programmer using a language who understand basics of the compiler operation can leverage this knowledge to help the compiler. The most well known example of this is probably the refactoring of recursive functions to place the recursion in tail position: a pointless exercise unless the programmer knows the compiler can do tail-recursion optimisation.
I am experimenting with syntax mods in Mathematica, using the Notation package.
I am not interested in mathematical notation for a specific field, but general purpose syntax modifications and extensions, especially notations that reduce the verbosity of Mathematica's VeryLongFunctionNames, clean up unwieldy constructs, or extend the language in a pleasing way.
An example modification is defining Fold[f, x] to evaluate as Fold[f, First#x, Rest#x]
This works well, and is quite convenient.
Another would be defining *{1,2} to evaluate as Sequence ## {1,2} as inspired by Python; this may or may not work in Mathematica.
Please provide information or links addressing:
Limits of notation and syntax modification
Tips and tricks for implementation
Existing packages, examples or experiments
Why this is a good or bad idea
Not a really constructive answer, just a couple of thoughts. First, a disclaimer - I don't suggest any of the methods described below as good practices (perhaps generally they are not), they are just some possibilities which seem to address your specific question. Regarding the stated goal - I support the idea very much, being able to reduce verbosity is great (for personal needs of a solo developer, at least). As for the tools: I have very little experience with Notation package, but, whether or not one uses it or writes some custom box-manipulation preprocessor, my feeling is that the whole fact that the input expression must be parsed into boxes by Mathematica parser severely limits a number of things that can be done. Additionally, there will likely be difficulties with using it in packages, as was mentioned in the other reply already.
It would be easiest if there would be some hook like $PreRead, which would allow the user to intercept the input string and process it into another string before it is fed to the parser. That would allow one to write a custom preprocessor which operates on the string level - or you can call it a compiler if you wish - which will take a string of whatever syntax you design and generate Mathematica code from it. I am not aware of such hook (it may be my ignorance of course). Lacking that, one can use for example the program style cells and perhaps program some buttons which read the string from those cells and call such preprocessor to generate the Mathematica code and paste it into the cell next to the one where the original code is.
Such preprocessor approach would work best if the language you want is some simple language (in terms of its syntax and grammar, at least), so that it is easy to lexically analyze and parse. If you want the Mathematica language (with its full syntax modulo just a few elements that you want to change), in this approach you are out of luck in the sense that, regardless of how few and "lightweight" your changes are, you'd need to re-implement pretty much completely the Mathematica parser, just to make those changes, if you want them to work reliably. In other words, what I am saying is that IMO it is much easier to write a preprocessor that would generate Mathematica code from some Lisp-like language with little or no syntax, than try to implement a few syntactic modifications to otherwise the standard mma.
Technically, one way to write such a preprocessor is to use standard tools like Lex(Flex) and Yacc(Bison) to define your grammar and generate the parser (say in C). Such parser can be plugged back to Mathematica either through MathLink or LibraryLink (in the case of C). Its end result would be a string, which, when parsed, would become a valid Mathematica expression. This expression would represent the abstract syntax tree of your parsed code. For example, code like this (new syntax for Fold is introduced here)
"((1|+|{2,3,4,5}))"
could be parsed into something like
"functionCall[fold,{plus,1,{2,3,4,5}}]"
The second component for such a preprocessor would be written in Mathematica, perhaps in a rule-based style, to generate Mathematica code from the AST. The resulting code must be somehow held unevaluated. For the above code, the result might look like
Hold[Fold[Plus,1,{2,3,4,5}]]
It would be best if analogs of tools like Lex(Flex)/Yacc(Bison) were available within Mathematica ( I mean bindings, which would require one to only write code in Mathematica, and generate say C parser from that automatically, plugging it back to the kernel either through MathLink or LibraryLink). I may only hope that they will become available in some future versions. Lacking that, the approach I described would require a lot of low-level work (C, or Java if your prefer). I think it is still doable however. If you can write C (or Java), you may try to do some fairly simple (in terms of the syntax / grammar) language - this may be an interesting project and will give an idea of what it will be like for a more complex one. I'd start with a very basic calculator example, and perhaps change the standard arithmetic operators there to some more weird ones that Mathematica can not parse properly itself, to make it more interesting. To avoid MathLink / LibraryLink complexity at first and just test, you can call the resulting executable from Mathematica with Run, passing the code as one of the command line arguments, and write the result to a temporary file, that you will then import into Mathematica. For the calculator example, the entire thing can be done in a few hours.
Of course, if you only want to abbreviate certain long function names, there is a much simpler alternative - you can use With to do that. Here is a practical example of that - my port of Peter Norvig's spelling corrector, where I cheated in this way to reduce the line count:
Clear[makeCorrector];
makeCorrector[corrector_Symbol, trainingText_String] :=
Module[{model, listOr, keys, words, edits1, train, max, known, knownEdits2},
(* Proxies for some commands - just to play with syntax a bit*)
With[{fn = Function, join = StringJoin, lower = ToLowerCase,
rev = Reverse, smatches = StringCases, seq = Sequence, chars = Characters,
inter = Intersection, dv = DownValues, len = Length, ins = Insert,
flat = Flatten, clr = Clear, rep = ReplacePart, hp = HoldPattern},
(* body *)
listOr = fn[Null, Scan[If[# =!= {}, Return[#]] &, Hold[##]], HoldAll];
keys[hash_] := keys[hash] = Union[Most[dv[hash][[All, 1, 1, 1]]]];
words[text_] := lower[smatches[text, LetterCharacter ..]];
With[{m = model},
train[feats_] := (clr[m]; m[_] = 1; m[#]++ & /# feats; m)];
With[{nwords = train[words[trainingText]],
alphabet = CharacterRange["a", "z"]},
edits1[word_] := With[{c = chars[word]}, join ### Join[
Table[
rep[c, c, #, rev[#]] &#{{i}, {i + 1}}, {i, len[c] - 1}],
Table[Delete[c, i], {i, len[c]}],
flat[Outer[#1[c, ##2] &, {ins[#1, #2, #3 + 1] &, rep},
alphabet, Range[len[c]], 1], 2]]];
max[set_] := Sort[Map[{nwords[#], #} &, set]][[-1, -1]];
known[words_] := inter[words, keys[nwords]]];
knownEdits2[word_] := known[flat[Nest[Map[edits1, #, {-1}] &, word, 2]]];
corrector[word_] := max[listOr[known[{word}], known[edits1[word]],
knownEdits2[word], {word}]];]];
You need some training text with a large number of words as a string to pass as a second argument, and the first argument is the function name for a corrector. Here is the one that Norvig used:
text = Import["http://norvig.com/big.txt", "Text"];
You call it once, say
In[7]:= makeCorrector[correct, text]
And then use it any number of times on some words
In[8]:= correct["coputer"] // Timing
Out[8]= {0.125, "computer"}
You can make your custom With-like control structure, where you hard-code the short names for some long mma names that annoy you the most, and then wrap that around your piece of code ( you'll lose the code highlighting however). Note, that I don't generally advocate this method - I did it just for fun and to reduce the line count a bit. But at least, this is universal in the sense that it will work both interactively and in packages. Can not do infix operators, can not change precedences, etc, etc, but almost zero work.
(my first reply/post.... be gentle)
From my experience, the functionality appears to be a bit of a programming cul-de-sac. The ability to define custom notations seems heavily dependent on using the 'notation palette' to define and clear each custom notation. ('everything is an expression'... well, except for some obscure cases, like Notations, where you have to use a palette.) Bummer.
The Notation package documentation mentions this explicitly, so I can't complain too much.
If you just want to define custom notations in a particular notebook, Notations might be useful to you. On the other hand, if your goal is to implement custom notations in YourOwnPackage.m and distribute them to others, you are likely to encounter issues. (unless you're extremely fluent in Box structures?)
If someone can correct my ignorance on this, you'd make my month!! :)
(I was hoping to use Notations to force MMA to treat subscripted variables as symbols.)
Not a full answer, but just to show a trick I learned here (more related to symbol redefinition than to Notation, I reckon):
Unprotect[Fold];
Fold[f_, x_] :=
Block[{$inMsg = True, result},
result = Fold[f, First#x, Rest#x];
result] /; ! TrueQ[$inMsg];
Protect[Fold];
Fold[f, {a, b, c, d}]
(*
--> f[f[f[a, b], c], d]
*)
Edit
Thanks to #rcollyer for the following (see comments below).
You can switch the definition on or off as you please by using the $inMsg variable:
$inMsg = False;
Fold[f, {a, b, c, d}]
(*
->f[f[f[a,b],c],d]
*)
$inMsg = True;
Fold[f, {a, b, c, d}]
(*
->Fold::argrx: (Fold called with 2 arguments; 3 arguments are expected.
*)
Fold[f, {a, b, c, d}]
That's invaluable while testing