On complexity of recursive descent parsers

It's known that recursive descent parsers may require exponential time in some cases; could anyone point me to examples where this happens? I'm especially interested in cases for PEG (i.e. with prioritized choices).

Any top-down parser, including recursive descent, can theoretically become exponential if the combination of input and grammar is such that large numbers of backtracks are necessary. This happens if the grammar places determinative choices at the end of long sequences. For example, if you have a symbol like & meaning "all previous minuses are actually plusses" and then have data like "((((a - b) - c) - d) - e &)", then the parser has to go backwards and change all the minuses to plusses. If you start making nested expressions along these lines, you can create an effectively non-terminating set of input.
You have to realize you are stepping into a political issue here, because the reality is that most normal grammars and data sets are not like this. However, there are a LOT of people who systematically badmouth recursive descent because it is not easy to make RD automatically. All early parsers are LALR because they are MUCH easier to make automatically than RD. So what happened was that everyone just wrote LALR and badmouthed RD, because in the old days the only way to make an RD was to code it by hand. For example, if you read the dragon book you will find that Aho & Ullman write just one paragraph on RD, and it is basically just an ideological takedown saying "RD is bad, don't do it".
Of course, if you start hand coding RDs (as I have) you will find that they are much better than LALRs for a variety of reasons. In the old days you could always tell a compiler that had a hand-coded RD, because it had meaningful error messages with locational accuracy, whereas compilers with LALRs would show the error occurring like 50 lines away from where it really was. Things have changed a lot since the old days, but you should realize that when you start reading the FUD on RD, it is coming from a long, long tradition of verbally trashing RD in "certain circles".

It's because you can end up parsing the same things (check the same rule at the same position) many times in different recursion branches. It's kind of like calculating the n-th Fibonacci number using recursion.
Grammar:
A -> xA | xB | x
B -> yA | xA | y | A
S -> A
Input:
xxyxyy
Parsing:
xA(xxyxyy)
xA(xyxyy)
xA(yxyy) fail
xB(yxyy) fail
x(yxyy) fail
xB(xyxyy)
yA(yxyy)
xA(xyy)
xA(yy) fail
xB(yy) fail
x(yy) fail
xB(xyy)
yA(yy)
xA(y) fail
xB(y) fail
x(y) fail
xA(yy) fail *
x(xyy) fail
xA(yxyy) fail *
y(yxyy) fail
A(yxyy)
xA(yxyy) fail *
xB(yxyy) fail *
x(yxyy) fail *
x(xyxyy) fail
xB(xxyxyy)
yA(xyxyy) fail
xA(xyxyy) *
xA(yxyy) fail *
xB(yxyy) fail *
...
* - where we parse a rule at the same position where we have already parsed it in a different branch. If we had saved the results (which rules fail at which positions), we'd know xA(xyxyy) fails the second time around and we wouldn't go through its whole subtree again. I didn't want to write out the whole thing, but you can see it will repeat the same subtrees many times.
When does this happen? When you have many overlapping transformations (rules). Prioritized choice doesn't change things: if the lowest-priority rule ends up being the only correct one (or none are correct), you had to check all the rules anyway.
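To make the repeated-subtree effect concrete, here is a minimal Python sketch. It does not use the grammar above; it assumes a smaller PEG-style rule, A <- 'a' A 'b' / 'a' A 'c' / 'a', chosen only because its blow-up is easy to count. The function name and the evaluation counter are made up for this illustration. The same recognizer is run with and without saving results per position.

```python
def count_rule_evaluations(text, memoize):
    """Recognize a prefix of text with A <- 'a' A 'b' / 'a' A 'c' / 'a'.

    Returns (end_position_or_None, number_of_times_the_rule_body_ran).
    """
    cache = {}
    evaluations = 0

    def parse_a(pos):
        nonlocal evaluations
        if memoize and pos in cache:
            return cache[pos]          # reuse the saved result for this position
        evaluations += 1
        result = None
        if pos < len(text) and text[pos] == "a":
            result = pos + 1           # lowest-priority alternative: just 'a'
            # Higher-priority alternatives, tried in order: 'a' A 'b', then 'a' A 'c'.
            for closer in ("b", "c"):
                end = parse_a(pos + 1)  # without the cache this re-parses A each time
                if end is not None and end < len(text) and text[end] == closer:
                    result = end + 1
                    break
        if memoize:
            cache[pos] = result
        return result

    return parse_a(0), evaluations


if __name__ == "__main__":
    for n in (5, 10, 15):
        _, plain = count_rule_evaluations("a" * n, memoize=False)
        _, saved = count_rule_evaluations("a" * n, memoize=True)
        print(n, plain, saved)
```

On an input of n letters 'a', the plain version evaluates the rule 2^(n+1) - 1 times, because both higher-priority alternatives fail only after A has been parsed again at the next position. With saved results it evaluates the rule n + 1 times.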

An algorithm for compiler designing?

Recently I have been thinking about an algorithm I devised myself. I call it Replacement Compiling.
It works as follows:
Define a language as well as its operators' precedence, such as
(1) store <value> as <id>, replace with: var <id> = <value>, precedence: 1
(2) add <num> to <num>, replace with: <num> + <num>, precedence: 2
Accept a line of input, such as store add 1 to 2 as a;
Tokenize it: <kw,store><kw,add><num,1><kw,to><num,2><kw,as><id,a><EOF>;
Then scan through all the tokens until reaching the end-of-file, find the operation with the highest precedence, and "pack" the operation:
<kw,store>(<kw,add><num,1><kw,to><num,2>)<kw,as><id,a><EOF>
Replace the "sub-statement", the expression in parenthesis, with the defined replacement:
<kw,store>(1 + 2)<kw,as><id,a><EOF>
Repeat until no more statements left:
(<kw,store>(1 + 2)<kw,as><id,a>)<EOF>
(var a = (1 + 2))
Then evaluate the code with the built-in function, eval().
eval("var a = (1 + 2)")
Then my question is: would this algorithm work, and what are the limitations? Does this algorithm work better on simple languages?

This won't work as-is, because there's no way of deciding the precedence of operations and keywords, but you have essentially defined parsing (and thrown in an interpretation step at the end). This looks pretty close to operator-precedence parsing, but I could be wrong in the details of your vision. The real keys to what makes a parsing algorithm are the direction/precedence it reads the code, whether the decisions are made top-down (figure out what kind of statement and apply the rules) or bottom-up (assemble small pieces into larger components until the types of statements are apparent), and whether the grammar is encoded as code or data for a generic parser. (I'm probably overlooking something, but this should give you a starting point to make sense out of further reading.)
More typically, code is generally parsed using an LR technique (LL if it's top-down) that's driven from a state machine with look-ahead and next-step information, but you'll also find the occasional recursive descent. Since they're all doing very similar things (only implemented differently), your rough algorithm could probably be refined to look a lot like any of them.
For most people learning about parsing, recursive descent is the way to go, since everything is in the code instead of building what amounts to an interpreter for the state machine definition. But most parser generators build an LL or LR parser.
And I'm obviously over-simplifying the field, since you can see at the bottom of the Wikipedia pages that there's a smattering of related systems that partly revolve around the kind of grammar you have available. But for most languages, those are the big-three algorithms.

What you've defined is a rewriting system: https://en.wikipedia.org/wiki/Rewriting
You can make a compiler like that, but it's hard work and runs slowly, and if you do a really good job of optimizing it then you'll get a conventional table-driven parser. It would be better in the end to learn about those first and just start there.
If you really don't want to use a parser generating tool, then the easiest way to write a parser for a simple language by hand is usually recursive descent: https://en.wikipedia.org/wiki/Recursive_descent_parser
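As a rough illustration of the recursive descent suggestion, here is a minimal Python sketch for the toy language in the question. The grammar is an assumption read off the single example given (statement := 'store' expr 'as' identifier; expr := 'add' expr 'to' expr | number), and the function name translate is hypothetical. It tokenizes on whitespace and produces the target text, leaving the eval() step out.

```python
def translate(line):
    """Translate one toy-language statement into its replacement text."""
    tokens = line.strip().rstrip(";").split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(word):
        nonlocal pos
        if peek() != word:
            raise SyntaxError(f"expected {word!r}, found {peek()!r}")
        pos += 1

    def expr():
        # expr := 'add' expr 'to' expr | NUMBER   (assumed grammar)
        nonlocal pos
        if peek() == "add":
            pos += 1
            left = expr()
            expect("to")
            right = expr()
            return f"({left} + {right})"
        token = peek()
        if token is None or not token.isdigit():
            raise SyntaxError(f"expected a number, found {token!r}")
        pos += 1
        return token

    def statement():
        # statement := 'store' expr 'as' IDENTIFIER   (assumed grammar)
        nonlocal pos
        expect("store")
        value = expr()
        expect("as")
        name = peek()
        if name is None or not name.isidentifier():
            raise SyntaxError(f"expected an identifier, found {name!r}")
        pos += 1
        if pos != len(tokens):
            raise SyntaxError(f"unexpected token {peek()!r}")
        return f"var {name} = {value}"

    return statement()


if __name__ == "__main__":
    print(translate("store add 1 to 2 as a;"))  # var a = (1 + 2)
```

Each grammar rule becomes one function, and nesting such as "add add 1 to 2 to 3" falls out of the recursion without any precedence scan over the token list.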

Precision in Program analysis

According to David Brumley's Control Flow Integrity & Software Fault Isolation (PPT slide),
in the statements below, x is always 8 because the path to x = 7 is unrealizable, even with path-sensitive analysis.
Why is that?
Is it because the analysis cannot determine the values of n, a, b, and c in advance during the analysis? Or is it because there's no solution that can be calculated by a computer?
if (a^n + b^n == c^n && n>2 && a>0 && b>0 && c>0)
    x = 7; /* unrealizable path */
else
    x = 8;

In general, the task of determining which paths in a program are taken and which are not is undecidable. It is quite possible that a particular expression, as in your example, can be proved to have a specific value. However, the words "in general" and "undecidable" mean that you cannot write an algorithm that would be able to compute the value every time.
At this point the analysis algorithm can be optimistic or pessimistic. The optimistic one could pick 8 and be fine: it considers it possible that at run-time x would get this value. It could also pick 7 ("who knows, maybe x would be 7"). But if the analysis is required to be sound, and it cannot determine the value of the condition, it should assume that the first branch could be taken during one execution and the second branch during another, so x could be either 7 or 8.
In other words, there is a trade-off between soundness and precision. Or, actually, between soundness, precision, and decidability. The last property tells whether the analysis always terminates. Now, you have to pick what is needed:
Decidability: this is a common choice for compilers and code analyzers, because you would like to get an answer about your program in finite time. However, proof assistants could start some processes that could run up to the specified time limit, and if the limit is not set, forever: it's up to the user to stop it and to try something else.
Soundness: this is a common choice for compilers, because you would like to get the answer that matches the language specification. Code analyzers are more flexible. Many of them are unsound, but because of that they can find more potential issues in finite time, leaving the interpretation to the developer. I believe the example you mention talks about sound analysis.
Precision: this is a rare property. Compilers and code analyzers should be pessimistic, because otherwise some incorrect code could sneak in. But this might be parameterizable. E.g., if the compiler/analyzer supports constant propagation and folding, and all of the variables in the example are set to some known constants before the condition, it can figure out the exact value of x after it, and be completely precise.
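The sound-but-imprecise behaviour and the constant-propagation special case can be sketched in a few lines of Python. This is only an illustration, not how any particular analyzer works: the function name is made up, and None is used as an assumed marker for "value not known at analysis time".

```python
def possible_values_of_x(a=None, b=None, c=None, n=None):
    """Return the set of values x may hold after the if/else.

    None stands for an input the analysis could not determine.
    """
    if None in (a, b, c, n):
        # Sound but imprecise: the condition cannot be evaluated,
        # so both branches must be assumed reachable.
        return {7, 8}
    # All inputs are known constants: fold the condition and be precise.
    condition = a**n + b**n == c**n and n > 2 and a > 0 and b > 0 and c > 0
    return {7} if condition else {8}


if __name__ == "__main__":
    print(possible_values_of_x())            # {7, 8}
    print(possible_values_of_x(3, 4, 5, 3))  # {8}
```

The first call reports x = 7 as possible even though no integers satisfy the condition; that is the precision given up for soundness and termination.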

Prolog predicate arguments: readability vs. efficiency

I want to ask pros and cons of different Prolog representations in arguments of predicates.
For example in Exercise 4.3: Write a predicate second(X,List) which checks whether X is the second element of List. The solution can be:
second(X,List):- [_,X|_]=List.
Or,
second(X,[_,X|_]).
Both predicates behave similarly. The first one is more readable than the second, at least to me. But the second one uses more stack during the execution (I checked this with trace).
A more complicated example is Exercise 3.5: Binary trees are trees where all internal nodes have exactly two children. The smallest binary trees consist of only one leaf node. We will represent leaf nodes as leaf(Label) . For instance, leaf(3) and leaf(7) are leaf nodes, and therefore small binary trees. Given two binary trees B1 and B2 we can combine them into one binary tree using the functor tree/2 as follows: tree(B1,B2) . So, from the leaves leaf(1) and leaf(2) we can build the binary tree tree(leaf(1),leaf(2)) . And from the binary trees tree(leaf(1),leaf(2)) and leaf(4) we can build the binary tree tree(tree(leaf(1), leaf(2)),leaf(4)). Now, define a predicate swap/2 , which produces the mirror image of the binary tree that is its first argument. The solution would be:
A2.1:
swap(T1,T2):- T1=tree(leaf(L1),leaf(L2)), T2=tree(leaf(L2),leaf(L1)).
swap(T1,T2):- T1=tree(tree(B1,B2),leaf(L3)), T2=tree(leaf(L3),T3), swap(tree(B1,B2),T3).
swap(T1,T2):- T1=tree(leaf(L1),tree(B2,B3)), T2=tree(T3,leaf(L1)), swap(tree(B2,B3),T3).
swap(T1,T2):- T1=tree(tree(B1,B2),tree(B3,B4)), T2=tree(T4,T3), swap(tree(B1,B2),T3),swap(tree(B3,B4),T4).
Alternatively,
A2.2:
swap(tree(leaf(L1),leaf(L2)), tree(leaf(L2),leaf(L1))).
swap(tree(tree(B1,B2),leaf(L3)), tree(leaf(L3),T3)):- swap(tree(B1,B2),T3).
swap(tree(leaf(L1),tree(B2,B3)), tree(T3,leaf(L1))):- swap(tree(B2,B3),T3).
swap(tree(tree(B1,B2),tree(B3,B4)), tree(T4,T3)):- swap(tree(B1,B2),T3),swap(tree(B3,B4),T4).
The number of steps of the second solution was much smaller than that of the first one (again, I checked with trace). But regarding readability, I think the first one is easier to understand.
Probably the readability depends on the level of one's Prolog skill. I am at a beginner level in Prolog, and am used to programming in C++, Python, etc. So I wonder if skillful Prolog programmers agree with the above assessment of readability.
Also, I wonder if the number of steps can be a good measurement of the computational efficiency.
Could you give me your opinions or guidelines to design predicate arguments?
EDITED.
According to the advice from @coder, I made a third version that consists of a single rule:
A2.3:
swap(T1,T2):-
( T1=tree(leaf(L1),leaf(L2)), T2=tree(leaf(L2),leaf(L1)) );
( T1=tree(tree(B1,B2),leaf(L3)), T2=tree(leaf(L3),T3), swap(tree(B1,B2),T3) );
( T1=tree(leaf(L1),tree(B2,B3)), T2=tree(T3,leaf(L1)), swap(tree(B2,B3),T3) );
( T1=tree(tree(B1,B2),tree(B3,B4)), T2=tree(T4,T3), swap(tree(B1,B2),T3),swap(tree(B3,B4),T4) ).
I compared the number of steps in trace of each solution:
A2.1: 36 steps
A2.2: 8 steps
A2.3: 32 steps
A2.3 (readable single-rule version) seems to be better than A2.1 (readable four-rule version), but A2.2 (non-readable four-rule version) still outperforms.
I'm not sure if the number of steps in trace reflects the actual computational efficiency.
There are fewer steps in A2.2, but it spends more computation on pattern matching of the arguments.
So, I compared the execution time for 40000 queries (each query is a complicated one, swap(tree(tree(tree(tree(leaf(3),leaf(4)),leaf(5)),tree(tree(tree(tree(leaf(3),leaf(4)),leaf(5)),leaf(4)),leaf(5))),tree(tree(leaf(3),tree(tree(leaf(3),leaf(4)),leaf(5))),tree(tree(tree(tree(leaf(3),leaf(4)),leaf(5)),leaf(4)),leaf(5)))), _). ). The results were almost the same (0.954 sec, 0.944 sec, 0.960 sec respectively). This shows that the three representations A2.1, A2.2, A2.3 have close computational efficiency.
Do you agree with this result? (Probably this is case-specific; I need to vary the experimental setup.)

This question is a very good example of a bad question for a forum like Stackoverflow. I am writing an answer because I feel you might use some advice, which, again, is very subjective. I wouldn't be surprised if the question gets closed as "opinion based". But first, an opinion on the exercises and the solutions:
Second element of list
Definitely, second(X, [_,X|_]). is to be preferred. It just looks more familiar. But you should be using the standard library anyway: nth1(2, List, Element).
Mirroring a binary tree
The tree representation that the textbook suggests is a bit... unorthodox? A binary tree is almost invariably represented as a nested term, using two functors, for example:
t/3 which is a non-empty tree, with t(Value_at_node, Left_subtree, Right_subtree)
nil/0 which is an empty tree
Here are some binary trees:
The empty tree: nil
A binary search tree holding {1,2,3}: t(2, t(1, nil, nil), t(3, nil, nil))
A degenerate left-leaning binary tree holding the list [1,2,3] (if you traversed it pre-order): t(1, t(2, t(3, nil, nil), nil), nil)
So, to "mirror" a tree, you would write:
mirror(nil, nil).
mirror(t(X, L, R), t(X, MR, ML)) :-
    mirror(L, ML),
    mirror(R, MR).
The empty tree, mirrored, is the empty tree.
A non-empty tree, mirrored, has its left and right sub-trees swapped, and mirrored.
That's all. No need for swapping, really, or anything else. It is also efficient: for any argument, only one of the two clauses will be evaluated because the first arguments are different functors, nil/0 and t/3 (Look-up "first argument indexing" for more information on this). If you would have instead written:
mirror_x(T, MT) :-
    (   T = nil
    ->  MT = nil
    ;   T = t(X, L, R),
        MT = t(X, MR, ML),
        mirror_x(L, ML),
        mirror_x(R, MR)
    ).
Then not only is this less readable (well...) but probably less efficient, too.
On readability and efficiency
Code is read by people and evaluated by machines. If you want to write readable code, you still might want to address it to other programmers and not to the machines that are going to evaluate it. Prolog implementations have gotten better and better at being efficient at evaluating code that is also more readable to people who have read and written a lot of Prolog code (do you recognize the feedback loop?). You might want to take a look at Coding Guidelines for Prolog if you are really interested in readability.
A first step towards getting used to Prolog is trying to solve the 99 Prolog Problems (there are other sites with the same content). Follow the suggestion to avoid using built-ins. Then, look at the solutions and study them. Then, study the documentation of a Prolog implementation to see how many of these problems have been solved with built-in predicates or standard libraries. Then, study the implementations. You might find some real gems there: one of my favorite examples is the library definition of nth0/3. Just look at this beauty ;-).
There is also a whole book written on the subject of good Prolog code: "The Craft of Prolog" by Richard O'Keefe. The efficiency measurements are quite outdated though. Basically, if you want to know how efficient your code is, you end up with a matrix with at least three dimensions:
Prolog implementation (SWI-Prolog, SICStus, YAP, GNU Prolog...)
Data structure and algorithm used
Facilities provided by the implementation
You will end up having some holes in the matrix. Example: what is the best way to read line-based input, do something with each line, and output it? Read line by line, do the thing, output? Read all at once, do everything in memory, output at once? Use a DCG? In SWI-Prolog, since version 7, you can do:
read_string(In_stream, _, Input),
split_string(Input, "\n", "", Lines),
maplist(do_x, Lines, Xs),
atomics_to_string(Xs, "\n", Output),
format(Out_stream, "~s\n", Output)
This is concise and very efficient. Caveats:
The available memory might be a bottleneck
Strings are not standard Prolog, so you are stuck with implementations that have them
This is a very basic example, but it demonstrates at least the following difficulties in answering your question:
Differences between implementations
Opinions on what is readable or idiomatic Prolog
Opinions on the importance of standards
The example above doesn't even go into details about your problem, as for example what you do with each line. Is it just text? Do you need to parse the lines? Why are you not using a stream of Prolog terms instead? and so on.
On efficiency measurements
Don't use the number of steps in the tracer, or even the reported number of inferences. You really need to measure time, with a realistic input. Sorting with sort/2, for example, always counts as exactly one inference, no matter what is the length of the list being sorted. On the other hand, sort/2 in any Prolog is about as efficient as a sort on your machine would ever get, so is that an issue? You can't know until you have measured the performance.
And of course, as long as you make an informed choice of an algorithm and a data structure, you can at the very least know the complexity of your solution. Doing an efficiency measurement is interesting only if you notice a discrepancy between what you expect and what you measure: obviously, there is a mistake. Either your complexity analysis is wrong, or your implementation is wrong, or even the Prolog implementation you are using is doing something unexpected.
On top of this, there is the inherent problem of high-level libraries. With some of the more complex approaches, you might not be able to easily judge what the complexity of a given solution might be (constraint logic programming, as in CHR and CLPFD, is a prime example). Most real problems that fit nicely to the approach will be much easier to write, and more efficient than you could ever do without considerable effort and very specific code. But get fancy enough, and your CHR program might not even want to compile any more.
Unification in the head of the predicate
This is not opinion-based any more. Just do the unifications in the head if you can. It is more readable to a Prolog programmer, and it is more efficient.
PS
"Learn Prolog Now!" is a good starting point, but nothing more. Just work your way through it and move on.

In the first approach, for example for Exercise 3.5, you use the rule swap(T1,T2) four times, which means that Prolog will examine all four of these rules and will return true or fail for each of these four calls. Because these rules can't all be true together (each time only one of them will return true), for every input you waste three calls that will not succeed (that's why it demands more steps and more time). The only advantage of writing it the first way is that it is more readable. In general, when you have such cases of pattern matching, it's better to write the rules so that they are well defined and no two (or more) rules match the same input, if of course you require only one answer, as in the second way of writing the above example.
Finally, one example where it is required that more than one rule matches an input is the predicate member, which is written:
member(H,[H|_]).
member(H,[_|T]):- member(H,T).
where in this case you require more than one answer.
In the third approach you just write the first one without pattern matching. It has the form (condition1);...;(condition4), and if condition1 does not return true it examines the next condition. Most of the time the fourth condition returns true, but condition1-3 have already been called and tested and returned false. So it is almost the same as the first way of writing the solution, except that in the third solution, if condition1 is found to be true, the other conditions are not tested, so you save some wasted calls (compared to solution 1).
As for the running time, it was expected to be almost the same, because in the worst case solutions 1 and 3 do four times as many tests/calls as solution 2 does. So if solution 2 has O(g) complexity (for some function g), then solutions 1 and 3 are O(4g), which is O(g), so the running times will be very close.

How to recognize variables that don't affect the output of a program?

Sometimes the value of a variable accessed within the control-flow of a program cannot possibly have any effect on its output. For example:
global var_1
global var_2
start program hello(var_3, var_4)
    if (var_2 < 0) then
        save-log-to-disk (var_1, var_3, var_4)
    end-if
    return ("Hello " + var_3 + ", my name is " + var_1)
end program
Here only var_1 and var_3 have any influence on the output, while var_2 and var_4 are only used for side effects.
Do variables such as var_1 and var_3 have a name in dataflow-theory/compiler-theory?
Which static dataflow analysis techniques can be used to discover them?
References to academic literature on the subject would be particularly appreciated.

The problem that you stated is undecidable in general, even for the following very narrow special case:
Given a single routine P(x), where x is a parameter of type integer. Is the output of P(x) independent of the value of x, i.e., does P(0) = P(1) = P(2) = ...?
We can reduce the following still undecidable version of the halting problem to the question above: Given a Turing machine M(), does the program never stop on the empty input?
I assume that we use a (Turing-complete) language in which we can build a "Turing machine simulator":
Given the program M(), construct this routine:
P(x):
    if x == 0:
        return 0
    Run M() for x steps
    if M() has terminated then:
        return 1
    else:
        return 0
Now:
P(0) = P(1) = P(2) = ... => M() does not terminate.
M() does terminate => P(x) = 1 for a sufficiently large x => P(x) != P(0) = 0
So, it is very difficult for a compiler to decide whether a variable actually does not influence the return value of a routine; in your example, the "side effect routine" might manipulate one of its values (or even loop infinitely, which would most definitely change the return value of the routine ;-)
Of course overapproximations are still possible. For example, one might conclude that a variable does not influence the return value if it does not appear in the routine body at all. You can also see some classical compiler analyses (like Expression Simplification, Constant propagation) having the side effect of eliminating appearances of such redundant variables.
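The construction of P from M can be tried out with a small Python sketch. It assumes, purely for illustration, that M() is modelled as a zero-argument generator function in which each yield counts as one step; make_p and the two example machines are hypothetical names, not part of the argument above.

```python
def make_p(machine):
    """Build the routine P(x) from a generator function standing in for M()."""
    def p(x):
        if x == 0:
            return 0
        run = machine()
        for _ in range(x):          # run M() for x steps
            try:
                next(run)
            except StopIteration:
                return 1            # M() has terminated within the budget
        return 0                    # still running after x steps
    return p


def halts_after_three_steps():
    for _ in range(3):
        yield


def loops_forever():
    while True:
        yield


if __name__ == "__main__":
    p_halting = make_p(halts_after_three_steps)
    p_looping = make_p(loops_forever)
    print([p_halting(x) for x in range(6)])  # [0, 0, 0, 0, 1, 1]
    print([p_looping(x) for x in range(6)])  # [0, 0, 0, 0, 0, 0]
```

P is constant exactly when the machine never stops, so a procedure that decided "does x influence the output of P?" for every P would also decide that version of the halting problem.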

Pachelbel has discussed the fact that you cannot do this perfectly. OK, I'm an engineer, I'm willing to accept some dirt in my answer.
The classic way to answer your question is to do dataflow tracing from program outputs back to program inputs. A dataflow is the connection of a program assignment (or side effect) to a variable value, to a place in the application that consumes that value.
If there is (transitive) dataflow from a program output that you care about (in your example, the printed text stream) to an input you supplied (var_2), then that input "affects" the output. A variable that does not flow from the input to your desired output is useless from your point of view.
If you focus your attention only on the computations involved in the dataflows, and display them, you get what is generally called a "program slice". There are (very few) commercial tools that can show this to you.
Grammatech has a good reputation here for C and C++.
There are standard compiler algorithms for constructing such dataflow graphs; see any competent compiler book.
They all suffer from some limitation due to Turing's impossibility proofs as pointed out by Pachelbel. When you implement such a dataflow algorithm, there will be places that it cannot know the right answer; simply pick one.
If your algorithm chooses to answer "there is no dataflow" in certain places where it is not sure, then it may miss a valid dataflow and might incorrectly report that a variable does not affect the answer. (This is called a "false negative".) This occasional error may be satisfactory if the algorithm has some other nice properties, e.g., it runs really fast on millions of lines of code. (The trivial algorithm simply says "no dataflow" in all places, and it is really fast :)
If your algorithm chooses to answer "yes there is a dataflow", then it may claim that some variable affects the answer when it does not. (This is called a "false positive").
You get to decide which is more important; many people prefer false positives when looking for a problem, because then you have to at least look at possibilities detected by the tool. A false negative means it didn't report something you might care about. YMMV.
Here's a starting reference: http://en.wikipedia.org/wiki/Data-flow_analysis
Any of the books on that page will be pretty good. I have Muchnick's book and like it a lot. See also this page: (http://en.wikipedia.org/wiki/Program_slicing)
You will discover that implementing this is a pretty big effort for any real language. You are probably better off finding a tool framework that does most or all of this for you already.
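For a feel of what tracing back from the outputs looks like, here is a deliberately tiny Python sketch. It assumes straight-line code given as a list of (defined variable, used variables) pairs and ignores control flow, aliasing, and calls, which is exactly where a real tool has to work hard. The function name and the intermediate names greeting, result, and log_record are made up to model the example from the question.

```python
def relevant_variables(statements, outputs):
    """Walk the statements backwards and collect what the outputs depend on.

    statements: list of (defined_variable, set_of_used_variables), in program order.
    outputs:    variables whose final values are the program output.
    """
    needed = set(outputs)
    for target, uses in reversed(statements):
        if target in needed:
            # This definition feeds an output: its operands become needed instead.
            needed.discard(target)
            needed.update(uses)
    return needed


if __name__ == "__main__":
    program = [
        ("log_record", {"var_1", "var_2", "var_3", "var_4"}),  # side effect only
        ("greeting", {"var_3"}),
        ("result", {"greeting", "var_1"}),
    ]
    print(sorted(relevant_variables(program, {"result"})))  # ['var_1', 'var_3']
```

The statements that were visited while their target was needed form the slice; everything else, here the logging statement, cannot influence the chosen output under these simplifying assumptions.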

I use the following algorithm: a variable is used if it is a parameter or it occurs anywhere in an expression, excluding as the LHS of an assignment. First, count the number of uses of all variables. Delete unused variables and assignments to unused variables. Repeat until no variables are deleted.
This algorithm only implements a subset of the OP's requirement, and it is horribly inefficient because it requires multiple passes. A garbage collection may be faster but is harder to write: my algorithm only requires a list of variables with usage counts. Each pass is linear in the size of the program. The algorithm effectively does a limited kind of dataflow analysis by elimination of the tail of a flow ending in an assignment.
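A minimal Python sketch of this repeat-until-stable elimination, assuming a program reduced to a list of (assigned variable, variables used on the right-hand side) pairs. The function name and the roots argument (parameters plus variables read in non-assignment expressions such as a return) are assumptions made for the illustration, and side effects on the right-hand side are ignored.

```python
def eliminate_unused(roots, assignments):
    """Delete assignments to unused variables until a pass deletes nothing.

    roots:       variables that always count as used.
    assignments: list of (target, set_of_variables_used_on_the_rhs).
    Returns (remaining_assignments, number_of_passes).
    """
    remaining = list(assignments)
    passes = 0
    while True:
        passes += 1
        # Count uses: roots plus every variable read on some right-hand side.
        used = set(roots)
        for _, uses in remaining:
            used.update(uses)
        kept = [(target, uses) for target, uses in remaining if target in used]
        if len(kept) == len(remaining):
            return kept, passes
        remaining = kept


if __name__ == "__main__":
    program = [("a", {"x"}), ("b", {"a"}), ("c", {"b"}), ("d", {"x"})]
    print(eliminate_unused({"x", "d"}, program))  # ([('d', {'x'})], 4)
```

In the example the chain a, b, c is removed one link per pass, from the tail backwards, which is why several linear passes are needed.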
For my language the elimination of side effects in the RHS of an assignment to an unused variable is mandated by the language specification; it may not be suitable for other languages. Effectiveness is improved by running before inlining to reduce the cost of inlining unused function applications, then running it again afterwards, which eliminates parameters of inlined functions.
Just as an example of the utility of the language specification, the library constructs a thread pool and assigns a pointer to it to a global variable. If the thread pool is not used, the assignment is deleted, and hence the construction of the thread pool elided.
IMHO compiler optimisations are almost invariably heuristics whose performance matters more than effectiveness at achieving a theoretical goal (like removing unused variables). Simple reductions are useful not only because they're fast and easy to write, but because a programmer using a language who understands the basics of the compiler's operation can leverage this knowledge to help the compiler. The most well known example of this is probably the refactoring of recursive functions to place the recursion in tail position: a pointless exercise unless the programmer knows the compiler can do tail-recursion optimisation.

Is it worth it to rewrite an if statement to avoid branching?

Recently I realized I have been doing too much branching without caring about the negative impact it had on performance, so I have made up my mind to learn all about not branching. Here is a more extreme case, in an attempt to make the code have as few branches as possible.
Hence for the code
if(expression)
A = C; //A and C have to be the same type here obviously
expression can be A == B, or Q<=B; it could be anything that resolves to true or false, or, as I would like to think of it here, a result of 1 or 0
I have come up with this non branching version
A += (expression)*(C-A); //Edited with thanks
So my question would be, is this a good solution that maximizes efficiency?
If yes, why, and if not, why not?

Depends on the compiler, instruction set, optimizer, etc. When you use a boolean expression as an int value, e.g., (A == B) * C, the compiler has to do the compare, and then set some register to 0 or 1 based on the result. Some instruction sets might not have any way to do that other than branching. Generally speaking, it's better to write simple, straightforward code and let the optimizer figure it out, or find a different algorithm that branches less.

Jeez, no, don't do that!
Anyone who "penalize[s] [you] a lot for branching" would hopefully send you packing for using something that awful.
How is it awful, let me count the ways:
There's no guarantee you can multiply a quantity (e.g., C) by a boolean value (e.g., (A==B) yields true or false). Some languages will, some won't.
Anyone casually reading it is going to observe a calculation, not an assignment statement.
You're replacing a comparison, and a conditional branch with two comparisons, two multiplications, a subtraction, and an addition. Seriously non-optimal.
It only works for integral numeric quantities. Try this with a wide variety of floating point numbers, or with an object, and if you're really lucky it will be rejected by the compiler/interpreter/whatever.
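The integral-only point can be checked with a short Python sketch. Python is used here only because its bool is a subclass of int, so the multiplication is allowed; this says nothing about speed in C or C++, and the function names are made up for the comparison.

```python
def branched(a, c, condition):
    """The original form: if (expression) A = C;"""
    if condition:
        a = c
    return a


def branchless(a, c, condition):
    """The arithmetic form: A += (expression) * (C - A);"""
    # True * n == n and False * n == 0 because bool is an int subclass in Python.
    a += condition * (c - a)
    return a


if __name__ == "__main__":
    # Integers: both forms agree.
    print(branched(5, 9, 5 < 9), branchless(5, 9, 5 < 9))
    # Floats: c - a is rounded, so a + (c - a) need not give back c.
    print(branched(1e16, 1.0, True), branchless(1e16, 1.0, True))
```

With Python's arbitrary-precision integers the two forms always agree. With floating point, the rounding in C - A means the arithmetic form can return something other than C, as the second line shows.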

You should only ever consider doing this if you had analyzed the runtime properties of the program and determined that there is a frequent branch misprediction here, and that this is causing an actual performance problem. It makes the code much less clear, and it's not obvious that it would be any faster in general (this is something you would also have to measure, under the circumstances you are interested in).

After doing research, I came to the conclusion that when there is a bottleneck, it would be good to include a timed profiler, as this kind of code is usually not portable and is mainly used for optimization.
An exact example I had came after reading the following question:
Why is it faster to process a sorted array than an unsorted array?
I tested my code in C++ using that example, and my implementation was actually slower due to the extra arithmetic.
HOWEVER!
For this case below
if(expression) //branched version
A += C;
//OR
A += (expression)*(C); //non-branching version
The timings were as follows.
Branched, sorted list: approximately 2 seconds.
Branched, unsorted list: approximately 10 seconds.
My implementation (whether sorted or unsorted): 3 seconds in both cases.
This goes to show that in an unsorted bottleneck area, when we have a trivial branch that can simply be replaced by a single multiplication, it is probably more worthwhile to consider the implementation that I have suggested.
** Once again, this is mainly for the areas that are deemed to be the bottleneck **
