Shift/reduce and reduce/reduce conflicts - Oslo

I'm having a hard time wrapping my head around this and need some help understanding shift-reduce and reduce-reduce conflicts. I have a grammar and I can't seem to understand why it's problematic. I could attach the grammar, but I want to learn how this really works.
First question: what type of parser does MGrammar create? As I understand it, shift-reduce and reduce-reduce conflicts depend on the kind of parser.
Second question: what signifies a reduce-reduce conflict, and what signifies a shift-reduce conflict?
I know the basics of lexical analysis and formal grammars, but it's been a while since I worked with language design, so any help here is much appreciated.
Update:
I'm working with a whitespace-significant language and I'm wondering about the possibilities of doing this in MGrammar. Will I need look-ahead to resolve ambiguities?

Simple example:
if cond
    if cond2
        cmd
    else
        cmd2
Question: where does the else belong? To the human eye, the indentation says "to the second if", but that means nothing to a computer (except in Python ;)). This is a shift/reduce conflict.
An elegant solution is to treat the else as a left-binding operator of the highest precedence (which makes it "hang" onto the closest if).
A reduce/reduce conflict is an ambiguity: there are paths in the grammar where a single token could complete two different rules at the same time, and there is no additional information to decide which rule should take precedence. A classic case is two rules that can both derive the empty string at the same position, so the parser cannot tell which empty rule to reduce.
[EDIT] The Bison docs have a worked example of a reduce/reduce conflict.

Related

What exactly is the token count in functions/methods used for?

I've been using some tools to measure code quality and CCN (Cyclomatic Complexity Number), and some of those tools provide a count of tokens in functions. What does that count say about my function or method? What is it used for?
Cyclomatic Complexity Number is a metric that indicates the complexity of a function, procedure or program. The best (reasonably complete and intuitive) explanation I have found is provided here.
I think the tokens referred to are the conditional-statement tokens that are actually taken into account when computing the cyclomatic complexity.
[later edit]
A high CCN means complex code that:
is (much) harder to read and understand
is harder to maintain
is harder to cover with unit tests, since decent code coverage is reached with more difficulty
is more likely to contain bugs
CCN can be reduced using various techniques. Some examples can be seen here or here.
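As a rough illustration (a made-up Python example, not taken from the links above), one common way to lower CCN is to replace an if/elif chain with a table lookup, so the number of decision points stops growing with the number of cases:

# Before: every new operator adds another branch and raises the CCN.
def apply_op_branchy(op, a, b):
    if op == "add":
        return a + b
    elif op == "sub":
        return a - b
    elif op == "mul":
        return a * b
    else:
        raise ValueError(op)

# After: a single decision point, no matter how many operators are supported.
OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}

def apply_op(op, a, b):
    try:
        return OPS[op](a, b)
    except KeyError:
        raise ValueError(op) from None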
In the context of CCN tools, a token is any distinct operator or operand.
How this is implemented depends on the tool. Since the page on Lizard doesn't go into details, you will have to examine the source code (it's not many lines):
https://github.com/terryyin/lizard/tree/master/lizard_languages
If you search the source for 'token', you will see how the tool is parsing the code. In most cases it is looking for code blocks, expressions, annotations and accessing of methods/objects.
For example, according to java.py, Java is only parsed for '{', '#', and '.'
Not sure why it isn't looking for expressions...?
The OP has not said which tool they're using, but for lizard this has been asked as an issue, so the following might help someone.
A token is a word, an operator, etc.
For example, if (abc % 3 != 0) has 8 tokens: ['if', '(', 'abc', '%', '3', '!=', '0', ')'].
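As a quick sanity check of that count, here is a small sketch using Python's standard tokenize module. The choice of which token types to count is an assumption on my part; real tools differ in what they include.

import io
import token
import tokenize

def counted_tokens(code):
    # Keep only the lexical token types a CCN-style tool would typically count:
    # names/keywords, numbers, strings and operators.
    counted = {token.NAME, token.NUMBER, token.STRING, token.OP}
    return [tok.string
            for tok in tokenize.generate_tokens(io.StringIO(code).readline)
            if tok.type in counted]

toks = counted_tokens("if (abc % 3 != 0)\n")
print(toks, len(toks))
# ['if', '(', 'abc', '%', '3', '!=', '0', ')'] 8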
Another source has a similar description:
One program can have a maximum of 8192 tokens. Each token is a word
(e.g. variable name) or operator. Pairs of brackets, and strings count
as 1 token. commas, periods, LOCALs, semi-colons, ENDs, and comments
are not counted.
Now the next question is, would the number of tokens matter like CCN does? With the disclaimer that I am not an expert in code quality: it depends on the language. For example, in compiled languages you might want to break a complex line into multiple lines, which increases the number of tokens but significantly enhances the readability of the code. You should go for it; modern compilers are smart enough to optimize it.
However, this might not be as true in interpreted languages. Again, you should look into the specific language you are using to find out whether there is any optimization behind the scenes. That said, some languages such as Python provide syntax to reduce the number of tokens, which is great as long as it was designed into the language.
TL;DR: This factor doesn't matter as much as code readability. Double-check your code if it is high but don't mess up the code to lower it.

Simple tips for Haskell performance increases (on ProjectEuler problems)?

I'm new to programming and learning Haskell by reading and working through Project Euler problems. Of course, the most important thing one can do to improve performance on these problems is to use a better algorithm. However, it is clear to me that there are other simple and easy-to-implement ways to improve performance. A cursory search brought up this question and this question, which give the following tips:
Use the ghc flags -O2 and -fllvm.
Use the type Int instead of Integer, because it is unboxed (or even Int64 instead of Integer). This requires adding type signatures to your functions rather than letting the compiler decide on the fly.
Use rem, not mod, for division testing.
Use Schwartzian transformations when appropriate.
Use an accumulator in recursive functions (a tail-recursion optimization, I believe).
Memoization (?)
(One answer also mentions worker/wrapper transformation, but that seems fairly advanced.)
Question: What other simple optimizations can one make in Haskell to improve performance on Project Euler-style problems? Are there any other Haskell-specific (or functional programming specific?) ideas or features that could be used to help speed up solutions to Project Euler problems? Conversely, what should one watch out for? What are some common yet inefficient things to be avoided?
Here are some good slides by Johan Tibell that I frequently refer to:
Haskell Performance Patterns
One easy suggestion is to use hlint, a program that checks your source code and makes syntax-level suggestions for improvements. This might not increase speed, because most likely the compiler or lazy evaluation already takes care of it, but it might help the compiler in some cases. Furthermore, it will make you a better Haskell programmer, since you will learn better ways to do things, and it might make your program easier to understand and analyze.
Examples taken from http://community.haskell.org/~ndm/darcs/hlint/hlint.htm, such as:
darcs-2.1.2\src\CommandLine.lhs:94:1: Error: Use concatMap
Found:
concat $ map escapeC s
Why not:
concatMap escapeC s
and
darcs-2.1.2\src\Darcs\Patch\Test.lhs:306:1: Error: Use a more efficient monadic variant
Found:
mapM (delete_line (fn2fp f) line) old
Why not:
mapM_ (delete_line (fn2fp f) line) old
I think the largest improvements you can make in Project Euler problems come from understanding the problem and removing unnecessary computation. Even if you don't understand everything, some small fixes can make your program run twice as fast. Say you are looking for primes up to 1,000,000. You can of course do filter isPrime [1..1000000], but if you think a bit you realize that no even number above 2 is prime, which removes (about) half the work: instead do 2 : filter isPrime [3,5..999999]
There is a fairly large section of the Haskell wiki about performance.
One fairly common problem is too little (or too much) strictness (this is covered by the sections listed under General techniques on the performance page above). Too much laziness causes a large number of thunks to accumulate; too much strictness can cause too much to be evaluated.
These considerations are especially important when writing tail-recursive functions (i.e. those with an accumulator). And, on that note, depending on how the function is used, a tail-recursive function is sometimes less efficient in Haskell than the equivalent non-tail-recursive function, even with the optimal strictness annotations.
Also, as demonstrated by this recent question, sharing can make a huge difference to performance (in many cases, this can be considered a form of memoisation).
Project Euler is mostly about finding clever algorithmic solutions to the problems. Once you have the right algorithm, micro-optimization is rarely an issue, since even a straightforward or interpreted (e.g. Python or Ruby) implementation should run well within the speed constraints. The main technique you need is understanding lazy evaluation so you can avoid thunk buildups.

Code generation by genetic algorithms

Evolutionary programming seems to be a great way to solve many optimization problems. The idea is very simple and the implementation is straightforward.
I was wondering if there is any way to evolve a program as a Ruby/Python script (or in any other language)?
The idea is simple:
Create a population of programs
Perform genetic operations (roulette-wheel selection or any other selection), create new programs by inheriting from the best programs, etc.
Repeat step 2 until a program that satisfies our condition is found
But there are still a few problems:
How will the chromosomes be represented? For example, should one cell of a chromosome be one line of code?
How will the chromosomes be generated? If they are lines of code, how do we generate them to ensure that they are syntactically correct, etc.?
Example of a program that could be generated:
Create script that takes N numbers as input and returns their mean as output.
If there have been any attempts to create such algorithms, I'd be glad to see any links/sources.
If you are sure you want to do this, you want genetic programming rather than a genetic algorithm. GP allows you to evolve tree-structured programs. What you would do is give it a bunch of primitive operations (while($register), read($register), increment($register), decrement($register), divide($result $numerator $denominator), print, progn2 (this is GP-speak for "execute two commands sequentially")).
You could produce something like this:
progn2(
    progn2(
        read($1)
        while($1
            progn2(
                while($1
                    progn2( #add the input to the total
                        increment($2)
                        decrement($1)
                    )
                )
                progn2( #increment number of values entered, read again
                    increment($3)
                    read($1)
                )
            )
        )
    )
    progn2( #calculate result
        divide($1 $2 $3)
        print($1)
    )
)
You would use, as your fitness function, how close it is to the real solution. And therein lies the catch, that you have to calculate that traditionally anyway*. And then have something that translates that into code in (your language of choice). Note that, as you've got a potential infinite loop in there, you'll have to cut off execution after a while (there's no way around the halting problem), and it probably won't work. Shucks. Note also, that my provided code will attempt to divide by zero.
*There are ways around this, but generally not terribly far around it.
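To make the shape of such a system concrete, here is a minimal, mutation-only sketch in Python. The arithmetic primitives, the target function and all parameters below are invented for illustration; real GP systems also use crossover, richer primitives and bloat control.

import random

FUNCS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}
TERMINALS = ["x", 1, 2, 3]

def random_tree(depth=3):
    # A tree is either a terminal or a (function, left, right) tuple.
    if depth <= 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    return (random.choice(list(FUNCS)), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    return FUNCS[op](evaluate(left, x), evaluate(right, x))

def mutate(tree, depth=3):
    # Usually descend into a random branch; otherwise replace the whole subtree.
    if isinstance(tree, tuple) and random.random() < 0.7:
        op, left, right = tree
        if random.random() < 0.5:
            return (op, mutate(left, depth - 1), right)
        return (op, left, mutate(right, depth - 1))
    return random_tree(depth)

def fitness(tree):
    # Lower is better: squared error against a made-up target, x*x + x.
    return sum((evaluate(tree, x) - (x * x + x)) ** 2 for x in range(-5, 6))

population = [random_tree() for _ in range(200)]
for generation in range(50):
    population.sort(key=fitness)
    best = population[0]
    if fitness(best) == 0:
        break
    survivors = population[:100]                     # keep the fitter half
    population = survivors + [mutate(random.choice(survivors)) for _ in range(100)]

print(best, fitness(best))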
It can be done, but works very badly for most kinds of applications.
Genetic algorithms only work when the fitness function is continuous, i.e. you can determine which candidates in your current population are closer to the solution than others, because only then will you get improvements from one generation to the next. I learned this the hard way when I had a genetic algorithm with one strongly-weighted non-continuous component in my fitness function. It dominated all the others, and because it was non-continuous there was no gradual advancement towards greater fitness: candidates that were almost correct in that aspect were not considered any fitter than ones that were completely wrong.
Unfortunately, program correctness is utterly non-continuous. Is a program that stops with error X on line A better than one that stops with error Y on line B? Your program could be one character away from being correct, and still abort with an error, while one that returns a constant hardcoded result can at least pass one test.
And that's not even touching on the matter of the code itself being non-continuous under modifications...
Well, this is very possible, and @Jivlain correctly points out in his (nice) answer that genetic programming is what you are looking for (and not simple genetic algorithms).
Genetic programming is a field that has not reached a broad audience yet, partially because of some of the complications @MichaelBorgwardt indicates in his answer. But those are mere complications; it is far from true that this is impossible to do. Research on the topic has been going on for more than 20 years.
John Koza is one of the leading researchers on this (have a look at his 1992 work), and he demonstrated as early as 1996 how genetic programming can in some cases outperform naive GAs on some classic computational problems (such as evolving programs for Cellular Automata synchronization).
Here's a good Genetic Programming tutorial from Koza and Poli dated 2003.
For a recent reference you might wanna have a look at A field guide to genetic programming (2008).
Since this question was asked, the field of genetic programming has advanced a bit, and there have been some additional attempts to evolve code in configurations other than the tree structures of traditional genetic programming. Here are just a few of them:
PushGP - designed with the goal of evolving modular functions like human coders use, programs in this system store all variables and code on different stacks (one for each variable type). Programs are written by pushing and popping commands and data off of the stacks.
FINCH - a system that evolves Java byte-code. This has been used to great effect to evolve game-playing agents.
Various algorithms have started evolving C++ code, often with a step in which compiler errors are corrected. This has had mixed, but not altogether unpromising results. Here's an example.
Avida - a system in which agents evolve programs (mostly boolean logic tasks) using a very simple assembly code. Based off of the older (and less versatile) Tierra.
The language isn't the issue. Regardless of the language, you have to define some higher level of mutation, otherwise it will take forever to learn.
For example, since any Ruby program can be defined in terms of a text string, you could just randomly generate text strings and optimize that. Better would be to generate only legal Ruby programs. However, that would also take forever.
If you were trying to build a sorting program and you had high level operations like "swap", "move", etc. then you would have a much higher chance of success.
In theory, a bunch of monkeys banging on a typewriter for an infinite amount of time will output all the works of Shakespeare. In practice, it isn't a practical way to write literature. Just because genetic algorithms can solve optimization problems doesn't mean that it's easy or even necessarily a good way to do it.
The biggest selling point of genetic algorithms, as you say, is that they are dirt simple. They don't have the best performance or mathematical background, but even if you have no idea how to solve your problem, as long as you can define it as an optimization problem you will be able to turn it into a GA.
Programs aren't really suited for GAs precisely because code isn't good chromosome material. I have seen someone do something similar with (simpler) machine code instead of Python (although it was more of an ecosystem simulation than a GA per se), and you might have better luck if you encode your programs using automata / Lisp or something like that.
On the other hand, given how alluring GA's are and how basically everyone who looks at them asks this same question, I'm pretty sure there are already people who tried this somewhere - I just have no idea if any of them succeeded.
Good luck with that.
Sure, you could write a "mutation" program that reads a program and randomly adds, deletes, or changes some number of characters. Then you could compile the result and see if the output is better than the original program. (However we define and measure "better".) Of course 99.9% of the time the result would be compile errors: syntax errors, undefined variables, etc. And surely most of the rest would be wildly incorrect.
Try some very simple problem. Say, start with a program that reads in two numbers, adds them together, and outputs the sum. Let's say that the goal is a program that reads in three numbers and calculates the sum. Just how long and complex such a program would be of course depends on the language. Let's say we have some very high level language that lets us read or write a number with just one line of code. Then the starting program is just 4 lines:
read x
read y
total=x+y
write total
The simplest program to meet the desired goal would be something like
read x
read y
read z
total=x+y+z
write total
So through a random mutation, we have to add "read z" and "+z", a total of 9 characters including the space and the new-line. Let's make it easy on our mutation program and say it always inserts exactly 9 random characters, that they're guaranteed to be in the right places, and that it chooses from a character set of just 26 letters plus 10 digits plus 14 special characters = 50 characters. What are the odds that it will pick the correct 9 characters? 1 in 50^9 = 1 in 2.0e15. (Okay, the program would work if instead of "read z" and "+z" it inserted "read w" and "+w", but then I'm making it easy by assuming it magically inserts exactly the right number of characters and always inserts them in the right places. So I think this estimate is still generous.)
1 in 2.0e15 is a pretty small probability. Even if the program runs a thousand times a second, and you can test the output that quickly, the chance is still just 1 in 2.0e12 per second, or 1 in 5.4e8 per hour, 1 in 2.3e7 per day. Keep it running for a year and the chance of success is still only 1 in 62,000.
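For anyone who wants to check those numbers, here is a quick Python back-of-the-envelope using the same assumptions as above (50 possible characters, 9 positions, 1,000 attempts per second):

odds = 50 ** 9                       # ~1.95e15 possible 9-character insertions
per_second = odds / 1_000            # 1,000 attempts per second
print(f"1 in {odds:.2g} per attempt")                      # 1 in 2e+15
print(f"1 in {per_second:.2g} per second")                 # 1 in 2e+12
print(f"1 in {per_second / 3600:.2g} per hour")            # 1 in 5.4e+08
print(f"1 in {per_second / 86_400:.2g} per day")           # 1 in 2.3e+07
print(f"1 in {per_second / (365 * 86_400):.2g} per year")  # 1 in 6.2e+04 (~62,000)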
Even a moderately competent programmer should be able to make such a change in, what, ten minutes?
Note that changes must come in at least "packets" that are correct. That is, if a mutation generates "reax z", that's only one character away from "read z", but it would still produce compile errors, and so would fail.
Likewise adding "read z" but changing the calculation to "total=x+y+w" is not going to work. Depending on the language, you'll either get errors for the undefined variable or at best it will have some default value, like zero, and give incorrect results.
You could, I suppose, theorize incremental solutions. Maybe one mutation adds the new read statement, then a future mutation updates the calculation. But without the calculation, the additional read is worthless. How will the program be evaluated to determine that the additional read is "a step in the right direction"? The only way I see to do that is to have an intelligent being read the code after each mutation and see if the change is making progress toward the desired goal. And if you have an intelligent designer who can do that, that must mean that he knows what the desired goal is and how to achieve it. At which point, it would be far more efficient to just make the desired change rather than waiting for it to happen randomly.
And this is an exceedingly trivial program in a very easy language. Most programs are, what, hundreds or thousands of lines, all of which must work together. The odds against any random process writing a working program are astronomical.
There might be ways to do something that resembles this in some very specialized application, where you are not really making random mutations, but rather making incremental modifications to the parameters of a solution. Like, we have a formula with some constants whose values we don't know. We know what the correct results are for some small set of inputs. So we make random changes to the constants, and if the result is closer to the right answer, change from there, if not, go back to the previous value. But even at that, I think it would rarely be productive to make random changes. It would likely be more helpful to try changing the constants according to a strict formula, like start with changing by 1000's, then 100's then 10's, etc.
I just want to give you a suggestion. I don't know how successful you'd be, but perhaps you could try to evolve a Core War bot with genetic programming. Your fitness function is easy: just let the bots compete in a game. You could start with well-known bots and perhaps a few random ones, then wait and see what happens.

Pseudocode interpreter?

Like lots of you guys on SO, I often write in several languages. And when it comes to planning stuff, (or even answering some SO questions), I actually think and write in some unspecified hybrid language. Although I used to be taught to do this using flow diagrams or UML-like diagrams, in retrospect, I find "my" pseudocode language has components of C, Python, Java, bash, Matlab, perl, Basic. I seem to unconsciously select the idiom best suited to expressing the concept/algorithm.
Common idioms might include Java-like braces for scope, pythonic list comprehensions or indentation, C++like inheritance, C#-style lambdas, matlab-like slices and matrix operations.
I noticed that it's actually quite easy for people to recognise exactly what I'm trying to do, and quite easy for people to intelligently translate it into other languages. Of course, that step involves considering the corner cases, and the moments where each language behaves idiosyncratically.
But in reality, most of these languages share a subset of keywords and library functions which generally behave identically - maths functions, type names, while/for/if etc. Clearly I'd have to exclude many 'odd' languages like lisp, APL derivatives, but...
So my questions are,
Does code already exist that recognises the programming language of a text file? (Surely this must be a less complicated task than eclipse's syntax trees or than google translate's language guessing feature, right?) In fact, does the SO syntax highlighter do anything like this?
Is it theoretically possible to create a single interpreter or compiler that recognises what language idiom you're using at any moment and (maybe "intelligently") executes or translates to a runnable form. And flags the corner cases where my syntax is ambiguous with regards to behaviour. Immediate difficulties I see include: knowing when to switch between indentation-dependent and brace-dependent modes, recognising funny operators (like *pointer vs *kwargs) and knowing when to use list vs array-like representations.
Is there any language or interpreter in existence, that can manage this kind of flexible interpreting?
Have I missed an obvious obstacle to this being possible?
edit
Thanks all for your answers and ideas. I am planning to write a constraint-based heuristic translator that could, potentially, "solve" code for the intended meaning and translate it into real Python code. It will notice keywords from many common languages, and will use syntactic clues to disambiguate the human's intentions - like spacing, brackets, optional helper words like let or then, the context of how variables were previously used, etc., plus knowledge of common conventions (like capitalized names, i for iteration, and some simplistic, limited understanding of the naming of variables/methods, e.g. containing the word get, asynchronous, count, last, previous, my, etc.). In real pseudocode, variable naming is as informative as the operations themselves!
Using these clues it will make assumptions about the implementation of each operation (like 0/1-based indexing, whether exceptions should be caught or ignored, which variables ought to be const/global/local, where to start and end execution, which bits should be in separate threads, and when numerical units match or need converting). Each assumption will have a given certainty - and the program will list the assumptions on each statement as it coaxes what you write into something executable!
For each assumption, you can 'clarify' your code if you don't like the initial interpretation. The libraries issue is very interesting. My translator, like some IDE's, will read all definitions available from all modules, use some statistics about which classes/methods are used most frequently and in what contexts, and just guess! (adding a note to the program to say why it guessed as such...) I guess it should attempt to execute everything, and warn you about what it doesn't like. It should allow anything, but let you know what the several alternative interpretations are, if you're being ambiguous.
It will certainly be some time before it can manage unusual examples like @Albin Sunnanbo's ImportantCustomer example. But I'll let you know how I get on!
I think that is quite useless for everything but toy examples and strict mathematical algorithms. For everything else the language is not just the language. There are lots of standard libraries and whole environments around the languages. I think I write almost as many lines of library calls as I write "actual code".
In C# you have .NET Framework, in C++ you have STL, in Java you have some Java libraries, etc.
The difference between those libraries are too big to be just syntactic nuances.
<subjective>
There have been attempts at unifying the language constructs of different languages into a "unified syntax". That was called 4GL and it never really took off.
</subjective>
As a side note, I have seen a code example about a page long that was valid as C#, Java and JavaScript code. That can serve as an example of a case where it is impossible to determine the actual language used.
Edit:
Besides, the whole purpose of pseudocode is that it does not need to compile in any way. The reason you write pseudocode is to create a "sketch", however sloppy you like.
foreach c in ImportantCustomers{== OrderValue >=$1M}
SendMailInviteToSpecialEvent(c)
Now tell me what language it is and write an interpreter for that.
To detect what programming language is used: Detecting programming language from a snippet
I think it should be possible; the approach in 1. could be leveraged to do this. I would try to do it iteratively: detect the syntax used in the first line/clause of code, "compile" it to an intermediate form based on that detection, along with any important syntax (e.g. begin/end wrappers). Then the next line/clause, etc. Basically, write a parser that attempts to recognize each "chunk". Ambiguity could be flagged by the same algorithm. (A rough sketch of this kind of per-line detection follows below.)
I doubt that this has been done ... it seems like the cognitive load of learning to write, e.g., Python-compatible pseudocode would be much lower than the effort of debugging the cases where your interpreter fails.
a. I think the biggest problem is that most pseudocode is invalid in any language. For example, I might completely skip object initialization in a block of pseudocode because for a human reader it is almost always straightforward to infer. But for your case it might be completely invalid in the language syntax of choice, and it might be impossible to automatically determine e.g. the class of the object (it might not even exist). Etc.
b. I think the best you can hope for is an interpreter that "works" (subject to 4a) for your pseudocode only, no-one else's.
Note that I don't think that 4a,4b are necessarily obstacles to it being possible. I just think it won't be useful for any practical purpose.
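As a very rough sketch of the per-line detection idea mentioned above (the hint lists and scoring here are made up for illustration, not a real classifier):

HINTS = {
    "python": ["def ", "elif ", "lambda ", "None", ":"],
    "c_like": ["{", "}", ";", "::", "->"],
    "sql":    ["SELECT ", "FROM ", "WHERE "],
}

def guess_language(line):
    # Score each language by how many of its hint substrings appear in the line.
    scores = {lang: sum(h in line for h in hints) for lang, hints in HINTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

for line in ["def median(seq):", "total = x + y;", "SELECT name FROM customers"]:
    print(line, "->", guess_language(line))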
Recognizing what language a program is in is really not that big a deal. Recognizing the language of a snippet is more difficult, and recognizing snippets that aren't clearly delimited (what do you do if four lines are Python and the next one is C or Java?) is going to be really difficult.
Assuming you got the lines assigned to the right language, doing any sort of compilation would require specialized compilers for all languages that would cooperate. This is a tremendous job in itself.
Moreover, when you write pseudo-code you aren't worrying about the syntax. (If you are, you're doing it wrong.) You'll wind up with code that simply can't be compiled because it's incomplete or even contradictory.
And, assuming you overcame all these obstacles, how certain would you be that the pseudo-code was being interpreted the way you were thinking?
What you would have would be a new computer language, that you would have to write correct programs in. It would be a sprawling and ambiguous language, very difficult to work with properly. It would require great care in its use. It would be almost exactly what you don't want in pseudo-code. The value of pseudo-code is that you can quickly sketch out your algorithms, without worrying about the details. That would be completely lost.
If you want an easy-to-write language, learn one. Python is a good choice. Use pseudo-code for sketching out how processing is supposed to occur, not as a compilable language.
An interesting approach would be a "type-as-you-go" pseudocode interpreter. That is, you would set the language to be used up front, and then it would attempt to convert the pseudo code to real code, in real time, as you typed. An interactive facility could be used to clarify ambiguous stuff and allow corrections. Part of the mechanism could be a library of code which the converter tried to match. Over time, it could learn and adapt its translation based on the habits of a particular user.
People who program all the time will probably prefer to just use the language in most cases. However, I could see the above being a great boon to learners, "non-programmer programmers" such as scientists, and for use in brainstorming sessions with programmers of various languages and skill levels.
-Neil
Programs interpreting human input need to be given the option of saying "I don't know." The language PL/I is a famous example of a system designed to find a reasonable interpretation of anything resembling a computer program, and it could cause havoc when it guessed wrong: see http://horningtales.blogspot.com/2006/10/my-first-pli-program.html
Note that in the later language C++, when it resolves possible ambiguities it limits the scope of the type coercions it tries, and that it will flag an error if there is not a unique best interpretation.
I have a feeling that the answer to 2. is NO. All I need to prove it false is a code snippet that can be interpreted in more than one way by a competent programmer.
Does code already exist that recognises the programming language of a text file?
Yes, the Unix file command.
(Surely this must be a less complicated task than eclipse's syntax trees or than google translate's language guessing feature, right?) In fact, does the SO syntax highlighter do anything like this?
As far as I can tell, SO has a one-size-fits-all syntax highlighter that tries to combine the keywords and comment syntax of every major language. Sometimes it gets it wrong:
def median(seq):
    """Returns the median of a list."""
    seq_sorted = sorted(seq)
    if len(seq) & 1:
        # For an odd-length list, return the middle item
        return seq_sorted[len(seq) // 2]
    else:
        # For an even-length list, return the mean of the 2 middle items
        return (seq_sorted[len(seq) // 2 - 1] + seq_sorted[len(seq) // 2]) / 2
Note that SO's highlighter assumes that // starts a C++-style comment, but in Python it's the integer division operator.
This is going to be a major problem if you try to combine multiple languages into one. What do you do if the same token has different meanings in different languages? Similar situations are:
Is ^ exponentiation like in BASIC, or bitwise XOR like in C?
Is || logical OR like in C, or string concatenation like in SQL?
What is 1 + "2"? Is the number converted to a string (giving "12"), or is the string converted to a number (giving 3)?
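For instance, Python refuses to guess in that last case and raises an error instead:

>>> 1 + "2"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'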
Is there any language or interpreter in existence, that can manage this kind of flexible interpreting?
On another forum, I heard a story of a compiler (IIRC, for FORTRAN) that would compile any program regardless of syntax errors. If you had the line
= Y + Z
The compiler would recognize that a variable was missing and automatically convert the statement to X = Y + Z, regardless of whether you had an X in your program or not.
This programmer had a convention of starting comment blocks with a line of hyphens, like this:
C ----------------------------------------
But one day, they forgot the leading C, and the compiler choked trying to add dozens of variables between what it thought were subtraction operators.
"Flexible parsing" is not always a good thing.
To create a "pseudocode interpreter," it might be necessary to design a programming language that allows user-defined extensions to its syntax. There already are several programming languages with this feature, such as Coq, Seed7, Agda, and Lever. A particularly interesting example is the Inform programming language, since its syntax is essentially "structured English."
The Coq programming language allows "syntax extensions", so the language can be extended to parse new operators:
Notation "A /\ B" := (and A B).
Similarly, the Seed7 programming language can be extended to parse "pseudocode" using "structured syntax definitions." The while loop in Seed7 is defined in this way:
syntax expr: .while.().do.().end.while is -> 25;
Alternatively, it might be possible to "train" a statistical machine translation system to translate pseudocode into a real programming language, though this would require a large corpus of parallel texts.

Writing shorter code/algorithms, is more efficient (performance)?

After coming across the code golf trivia around the site, it is obvious people try to find ways to write code and algorithms as short as they possibly can in terms of characters, lines and total size, even if that means writing something like:
//Code by: job
//Topic: Code Golf - Collatz Conjecture
n=input()
while n>1:n=(n/2,n*3+1)[n%2];print n
So as a beginner I start to wonder whether size actually matters :D
It is obviously a very subjective question, highly dependent on the actual code being used, but what is the rule of thumb in the real world?
In the case that size doesn't matter, how come we don't focus more on performance rather than size?
I hope this does not become a flame war. Good code has many attributes, including:
Solving the use-case properly.
Readability.
Maintainability.
Performance.
Testability.
Low memory signature.
Good user interface.
Reusability.
The brevity of code is not that important in 21st century programming. It used to be more important when memory was really scarce. Please see this question, including my answer, for books referencing the attributes above.
A lot of good answers already about what's important versus what's not. In real life, (almost) nobody writes code like code golf, with shortened identifiers, minimal whitespace, and the fewest possible statements.
That said, "more code" does correlate with more bugs and complexity, and "less code" tends to correlate with better readability and performance. So all other things being equal, it's useful to strive for shorter code, but only in the sense of "these simple 30 lines of code do the same as that 100 complex lines of code".
Writing "code golf" solutions are often to do with showing how "clever" you are in getting the job done in the most succinct way even at the expense of readability. Quite often, however, more verbose code including, for example, memoization of function results, can be faster. Code size can matter for performance, smaller blocks of code can fit in the L1 CPU cache but this is an extreme case of optimization and a faster algorithm will most always be better. "Code Golf" code is not like production code - always write for clarity & readability of the solution rather than terseness if anyone, including yourself, ever intend to read that code again.
Whitespace has no effect on performance, so code like that is just silly (or perhaps the golf score was based on the character count?). The number of lines also has no effect, although the number of statements can have an effect. (Exception: Python, where whitespace is significant.)
The effect is complex, however. It's not at all uncommon to discover that you have to add statements to a function in order to improve its performance.
Still, without knowing anything else, bet that more statements means a larger object file and a slower program. But more lines don't do anything other than make code more readable, up to a point (after which adding more lines makes it less readable ;)
I don't believe that code golf has any practical significance. In practice, readable code is what counts. Which is in itself a conflicting requirement: readable code should be concise, but still easy to understand.
However, I would like to answer your question yet differently. Usually, there are fast and simple algorithms. However, if the speed is top priority, things can get complex real fast (and the resulting code will be longer). I don't believe that simplicity equals speed.
There are many aspects to performance. Performance can, for example, be measured by memory footprint, speed of execution, bandwidth consumption, framerate, maintainability, supportability and so on. Performance usually means spending as little as possible of the most scarce resource.
When applied to networking, brevity IS performance. If your webserver serves a little javascript snippet on every page, it doesn't exactly hurt to keep the variable names short. Pull up www.google.com and view source!
Sometimes DRY does not help performance. An example is that Microsoft found they don't want to loop through an array unless it is bigger than 3 elements:
String.Format has signatures for one, two and three arguments, and then one for an array.
There are many ways of trading one aspect for another. This is usually called caching.
You can, for example, trade memory footprint for speed of execution, e.g. by doing a lookup instead of a computation. It is just a matter of replacing () with [] in most popular languages. If you plan it so that the spaceship in your game can only go in a fixed number of directions, you can save on trigonometric function calls.
Or you can use a proxy server with a cache for looking up things over a network. DNS servers do this all the time.
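A rough Python sketch of the spaceship example above (the number of directions and the names are my own invention):

import math

N_DIRECTIONS = 64  # the ship can only face one of 64 headings
SIN = [math.sin(2 * math.pi * i / N_DIRECTIONS) for i in range(N_DIRECTIONS)]
COS = [math.cos(2 * math.pi * i / N_DIRECTIONS) for i in range(N_DIRECTIONS)]

def velocity(heading_index, speed):
    # () becomes []: an index into a precomputed table instead of a trig call.
    return speed * COS[heading_index], speed * SIN[heading_index]

print(velocity(16, 10.0))  # a quarter turn: roughly (0.0, 10.0)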
Finally, if development team availability is the most scarce resource, clarity of code is the best bet for maintainability performance, even if it doesn't run quite as fast or is quite as interesting or "elegant" in code.
Absolutely not. Code size and performance (however you measure it) are only very loosely connected. To make matters worse, what's a neat trick on one chip/compiler/OS may very well be the worst thing you can do on another architecture.
It's counter-intuitive, but a clear, well-written, as-simple-as-possible implementation is often far more efficient than a devious bag of tricks. Today's optimizing compilers like clear, uncomplicated code just as much as humans do, and complex trickery can cause them to abandon their best optimizing strategies.
Writing fewer lines of code tends to be better for a bunch of reasons. For example, the less code you have, the less chance for bugs. See for example Paul Graham's essay, "Succinctness is Power"
Notwithstanding that, the level reached by Code Golf is usually far beyond what makes sense. In Code Golf, people are trying to write code that is as short as possible, even if they know that it's less readable.
Efficiency is a much harder thing to decide. I'm guessing that less code is usually more efficient, but there are many cases where this isn't true.
So to answer the real question, why do we even have Code Golf competitions which aim at a low character count, if that's not a very important thing?
Two reasons:
Making code as short as possible means you have to be both clever, and know a language pretty well to find all kinds of tricks. This makes it a fun riddle.
Also, it's the easiest measure to use for a code competition. Efficiency, for example, is very hard to measure, especially using many different languages, especially since some solutions are more efficient in some cases, but less in others (big input vs small). Readability: that's a very personal thing, which often leads to heated debates.
In short, I don't think there is any way of doing Code Golf style competitions without using "shortness of code" as the criterion.
This is from "10 Commandments for Java Developers"
Keep in Mind - "Less is more" is not always better. - Code efficiency is a great thing, but in many situations writing less lines of code does not improve the efficiency of that code.
This is (probably) true for all programming languages (though in assembly it could be different).
It makes a difference if you're talking about little academic-style algorithms or real software, which can be thousands of lines of code. I'm talking about the latter.
Here's an example where a reasonably well-written program was sped up by a factor of 43x, and its code size was reduced by 4x.
"Code golf" is just squeezing code, like cramming undergraduates into a phone booth. I'm talking about reducing code by rewriting it in a form that is declarative, like a domain-specific-language (DSL). Since it is declarative, it maps more directly onto its requirements, so it is not puffed up with code that exists only for implementation's sake. That link shows an example of doing that.
This link shows a way of reducing size of UI code in a similar way.
Good performance is achieved by avoiding doing things that don't really have to be done. Of course, when you write code, you're not intentionally making it do unnecessary work, but if you do aggressive performance tuning as in that example, you'd be amazed at what you can remove.
The point of code golf is to optimise for one thing (source length), at the potential expense of everything else (performance, comprehensibility, robustness). If you accidentally improve performance that's a fluke - if you could shave a character off by doubling the runtime, then you would.
You ask "how come then we don't focus more on performance rather than size", but the question is based on a false premise that programmers focus more on code size than on performance. They don't, "code golf" is a minority interest. It's challenging and fun, but it's not important. Look at the number of questions tagged "code-golf" against the number tagged "performance".
As other people point out, making code shorter often means making it simpler to understand, by removing duplication and opportunities for obscure errors. That's usually more important than running speed. But code golf is a completely different thing, where you remove whitespace, comments, descriptive names, etc. The purpose isn't to make the code more comprehensible.

Resources