What is Eta abstraction in lambda calculus used for? - lambda-calculus

Eta abstraction in lambda calculus means the following:
A function f can be written as \x -> f x
Is Eta abstraction of any use while reducing lambda expressions?
Is it only an alternate way of writing certain expressions?
Practical use cases would be appreciated.

The eta reduction/expansion is just a consequence of the law that says that given
f = g
it must be, that for any x
f x = g x
and vice versa.
Hence given:
f x = (\y -> f y) x
we get, by beta reducing the right hand side
f x = f x
which must be true. Thus we can conclude
f = \y -> f y

First, to clarify the terminology, paraphrasing a quote from the Eta conversion article in the Haskell wiki (also incorporating Will Ness' comment above):
Converting from \x -> f x to f would
constitute an eta reduction, and moving in the opposite way
would be an eta abstraction or expansion. The term eta conversion can refer to the process in either direction.
Extensive use of η-reduction can lead to Pointfree programming.
It is also typically used in certain compile-time optimisations.
Summary of the use cases found:
Point-free (style of) programming
Allow lazy evaluation in languages using strict/eager evaluation strategies
Compile-time optimizations
1. Point-free (style of) programming
From the Tacit programming Wikipedia article:
Tacit programming, also called point-free style, is a programming
paradigm in which function definitions do not identify the arguments
(or "points") on which they operate. Instead the definitions merely
compose other functions
Borrowing a Haskell example from sth's answer (which also shows composition that I chose to ignore here):
inc x = x + 1
can be rewritten as
inc = (+) 1
This is because (following yatima2975's reasoning) inc x = x + 1 is just syntactic sugar for inc = \x -> (+) x 1, which is the same as \x -> (+) 1 x because addition is commutative, so
\x -> f x => f
\x -> ((+) 1) x => (+) 1
(Check Ingo's answer for the full proof.)
There is a good thread on Stackoverflow on its usage. (See also this repl.it snippet.)
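The same two directions can be mimicked outside Haskell. The following is only a sketch in Python: the names inc_pointed, inc_pointfree and eta_expand are made up for this illustration, and since Python has no built-in currying, functools.partial stands in for the partial application (+) 1.

```python
from functools import partial
from operator import add

# Pointed definition: the argument x is named explicitly.
def inc_pointed(x):
    return add(1, x)

# Eta-reduced (point-free) definition: no argument is named.
inc_pointfree = partial(add, 1)

# Eta expansion goes the other way: wrap a function in a lambda.
def eta_expand(f):
    return lambda x: f(x)

inc_expanded = eta_expand(inc_pointfree)
```

All three versions agree on every argument, which is exactly what eta conversion promises.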
2. Allow lazy evaluation in languages using strict/eager evaluation strategies
Makes it possible to use lazy evaluation in eager/strict languages.
Paraphrasing from the MLton documentation on Eta Expansion:
Eta expansion delays the evaluation of f until the surrounding function/lambda is applied, and will re-evaluate f each time the function/lambda is applied.
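A small sketch of that behaviour in Python, which is also a strict language. The function make_adder and the calls list are hypothetical helpers, introduced here only to make the moment of evaluation visible.

```python
calls = []  # records every time the "expensive" construction runs

def make_adder():
    # Hypothetical expensive expression whose value is a function.
    calls.append("built")
    return lambda x: x + 1

# Without eta expansion: make_adder() is evaluated once, right here.
f_direct = make_adder()

# With eta expansion: make_adder() runs only when the lambda is applied,
# and it runs again on every application.
f_delayed = lambda x: make_adder()(x)

built_after_definitions = len(calls)   # only f_direct has been built so far
results = [f_direct(1), f_direct(2), f_delayed(1), f_delayed(2)]
built_after_calls = len(calls)         # f_delayed built its function twice more
```

The eta-expanded version trades repeated work for delayed work, which is why it has to be applied deliberately.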
Interesting Stackoverflow thread: Can every functional language be lazy?
2.1 Thunks
I could be wrong, but I think the notion of thunking or thunks belongs here. From the wikipedia article on thunks:
In computer programming, a thunk is a subroutine used to inject an
additional calculation into another subroutine. Thunks are primarily
used to delay a calculation until its result is needed, or to insert
operations at the beginning or end of the other subroutine.
The 4.2 Variations on a Scheme — Lazy Evaluation of the Structure and Interpretation of Computer Programs (pdf) has a very detailed introduction to thunks (and even though the latter has not one occurrence of the phrase "lambda calculus", it is worth reading).
(This paper also seemed interesting, but I haven't had the time to look into it yet: Thunks and the λ-Calculus.)
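To make the "delay a calculation until its result is needed" part concrete, here is a minimal thunk in Python. The Thunk class and its force method are names chosen for this sketch, not a standard library API; unlike plain eta expansion, this version caches its result so the computation runs at most once.

```python
class Thunk:
    """A delayed computation that is run at most once (call-by-need)."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument function
        self._forced = False
        self._value = None

    def force(self):
        if not self._forced:
            self._value = self._compute()
            self._forced = True
            self._compute = None  # drop the closure once it is no longer needed
        return self._value

log = []

def expensive():
    log.append("ran")
    return 6 * 7

t = Thunk(expensive)   # nothing has run yet
first = t.force()      # runs expensive()
second = t.force()     # reuses the cached value
```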
3. Compile-time optimizations
Completely ignorant on this topic, therefore just presenting sources:
From Georg P. Loczewski's The Lambda Calculus:
In 'lazy' languages like Lambda Calculus, A++, SML, Haskell, Miranda etc., eta conversion, abstraction and reduction alike, are mainly used within compilers. (See [Jon87] page 22.)
where [Jon87] expands to
Simon L. Peyton Jones
The Implementation of Functional Programming Languages
Prentice Hall International, Hertfordshire,HP2 7EZ, 1987.
ISBN 0 13 453325 9.
search results for "eta" reduction abstraction expansion conversion "compiler" optimization
4. Extensionality
This is another topic that I know little about, and this is more theoretical, so here it goes:
From the Lambda calculus wikipedia article:
η-reduction expresses the idea of extensionality, which in this context is that two functions are the same if and only if they give the same result for all arguments.
Some other sources:
nLab entry on Eta-conversion that goes deeper into its connection with extensionality, and its relationship with beta-conversion
A ton of info in What's the point of η-conversion in lambda calculus? on the Theoretical Computer Science Stack Exchange (but beware: the author of the accepted answer seems to have a beef with the commonly held belief about the relationship between eta reduction and extensionality, so make sure to read the entire page. Most of it was over my head, so I have no opinions.)
The question above has been cross-posted to Math Exchange as well
Speaking of "over my head" stuff: here's Conor McBride's take. The only thing I understood was that eta conversions can be controversial in certain contexts; reading his reply felt like trying to figure out an alien language (couldn't resist).
Saved this page recursively in Internet Archive so if any of the links are not live anymore then that snapshot may have saved those too.


How to understand the recursive search in Prolog?

Here is a section of Prolog code defining numeral in a recursive way:
numeral(0).
numeral(succ(X)) :- numeral(X).
When given query numeral(X). Prolog will return:
X = 0 ;
X = succ(0) ;
X = succ(succ(0)) ;
X = succ(succ(succ(0))) ;
X = succ(succ(succ(succ(0)))) ;
X = succ(succ(succ(succ(succ(0))))) ;
X = succ(succ(succ(succ(succ(succ(0)))))) ;
X = succ(succ(succ(succ(succ(succ(succ(0))))))) ;
X = succ(succ(succ(succ(succ(succ(succ(succ(0))))))))
Based on what I have learned, when running the query, Prolog will first rename X to a fresh variable such as _G42, and then search the facts and rules for a match.
In this case, it will find 0 (a fact) as a match. Then it will also try to match the rule, that is, it considers the case where _G42 is not 0 but the succ of another number. Thus another variable is generated (say _G44); _G44 will match 0 and will also go further in the same way as _G42. Since _G44 matches 0, Prolog goes back to _G42 and gets _G42 = succ(_G44) = succ(0).
I am not sure whether my understanding is right. I made a diagram to show how I understand this problem.
If the analysis is correct, I still find it difficult to design a recursive definition like this. Since I am new to Prolog, I want to know whether this kind of definition is commonly used in applications (say, building an expert system or verifying protocols), or whether it is just for beginners to better understand the basic search procedure. If it is often used, what is the key to designing this kind of recursive definition?
My personal opinion: Especially as a beginner, you have zero chance to "understand the recursive search in Prolog". Countless beginners try to understand Prolog in this way, and they very consistently fail.
The sad part is that this hits the hardest workers the hardest: You always think you can somehow understand it, but in the end, you cannot, because there are too many ways to invoke even the simplest predicates, with uninstantiated and (partly) instantiated arguments, and even with aliased variables.
Your graph nicely illustrates that such a procedural reading gets extremely unwieldy very quickly for even the simplest conceivable recursive definitions.
A much more tractable approach for understanding the predicate is to read it declaratively:
0 is a numeral
If X is a numeral (whatever X is!), then succ(X) is also a numeral.
Note that :- means ←, i.e., an implication from right to left.
My recommendation is to focus on a clear declarative description of what ought to hold. To overcome the initial barriers with Prolog, you must let go of the idea that you can trace the steps that the CPU performs in the extreme detail in which you are currently trying to follow them. Prolog is too high-level to be amenable to tracing in this low-level way. It is like trying to interpret between French and English by tracing only the neuronal activities of the speakers.
Write a clear definition and then leave the search to Prolog. There are many other and working ways to understand and break down declarative definitions without getting swamped in low-level details. See for example program-slicing and failure-slicing. They work as long as you stay in the so-called pure monotonic subset of Prolog. Focus on this area, and you will be able to make very fast progress.
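For readers more at home in other languages, the two declarative statements can be mirrored one clause at a time. This is only a rough Python sketch with made-up names (ZERO, succ, is_numeral, numerals, show), and it captures just two of the many ways the Prolog predicate can be used: checking a given term, and enumerating answers in the order numeral(X) reports them.

```python
from itertools import islice

ZERO = "0"

def succ(term):
    return ("succ", term)

def is_numeral(term):
    # Clause 1: 0 is a numeral.
    if term == ZERO:
        return True
    # Clause 2: succ(X) is a numeral if X is a numeral.
    if isinstance(term, tuple) and len(term) == 2 and term[0] == "succ":
        return is_numeral(term[1])
    return False

def numerals():
    # Enumerate numerals: 0, succ(0), succ(succ(0)), ...
    term = ZERO
    while True:
        yield term
        term = succ(term)

def show(term):
    return term if term == ZERO else f"succ({show(term[1])})"

first_three = [show(t) for t in islice(numerals(), 3)]
```

Note that Python needs two separate functions for what the single Prolog definition covers, which is one more reason to read the Prolog version as a description rather than as a procedure.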

Representing syntactically different terms in TPTP

I am having a look at first-order logic theorem provers such as Vampire and E-Prover, and the TPTP syntax seems to be the way to go. I am more familiar with Logic Programming syntaxes such as Answer Set Programming and Prolog, and although I tried referring to a detailed description of the TPTP syntax, I still don't seem to grasp how to properly distinguish between interpreted and uninterpreted functors (and I might be using the terminology wrong).
Essentially, I am trying to prove a theorem by showing that no model acts as a counter-example. My first difficulty was that I did not expect the following logic program to be satisfiable.
fof(all_foo, axiom, ![X] : (pred(X) => (X = foo))).
fof(exists_bar, axiom, pred(bar)).
It is indeed satisfiable because nothing prevents bar from being equal to foo. So a first solution would be to insist that these two terms are distinct and we obtain the following unsatisfiable program.
fof(all_foo, axiom, ![X] : (pred(X) => (X = foo))).
fof(exists_bar, axiom, pred(bar)).
fof(foo_not_bar, axiom, foo != bar).
The Technical Report clarifies that different double-quoted strings are indeed different objects, so another solution is to put quotes here and there, obtaining the following unsatisfiable program.
fof(all_foo, axiom, ![X] : (pred(X) => (X = "foo"))).
fof(exists_bar, axiom, pred("bar")).
I am happy not to have to specify the inequality manually, as that would obviously not scale to a more realistic scenario. Moving closer to my real situation, I actually have to handle composed terms, and the following program is unfortunately satisfiable.
fof(all_foo, axiom, ![X] : (pred(X) => (X = f("foo")))).
fof(exists_bar, axiom, pred(g("bar"))).
I guess f("foo") is not a term but the function f applied to the object "foo", so its value could potentially coincide with that of g("bar"). A manual specification that f and g never coincide does the trick and makes the following program unsatisfiable, but I feel like I'm doing it wrong. It also probably wouldn't scale to my real setting, with plenty of terms that should all be interpreted as distinct when they are syntactically distinct.
fof(all_foo, axiom, ![X] : (pred(X) => (X = f("foo")))).
fof(exists_bar, axiom, pred(g("bar"))).
fof(f_not_g, axiom, ![X, Y] : f(X) != g(Y)).
I have tried throwing single quotes around, but I didn't find the proper way to do it.
How do I make syntactically different (composed) terms and test for syntactical equality?
Subsidiary question: the following program is satisfiable, because the automated theorem prover understands f as a function rather than an uninterpreted functor.
fof(exists_f_g, axiom, (?[I] : ((f(foo) = f(I)) & pred(g(I))))).
fof(not_g_foo, axiom, ~pred(g(foo))).
To make it unsatisfiable, I need to manually specify that f is injective. What would be the natural way to obtain this behaviour without specifying injectivity of all functors that occur in my program?
fof(exists_f_g, axiom, (?[I] : ((f(foo) = f(I)) & pred(g(I))))).
fof(not_g_foo, axiom, ~pred(g(foo))).
fof(f_injective, axiom, ![X,Y] : (f(X) = f(Y) => (X = Y))).
First of all, let me point you to the Syntax BNF of TPTP. In principle, you have Prolog terms with some predefined infix/prefix operators of appropriate precedences. This means variables are written in upper case and constants are written in lower case. Also like Prolog, escaping with single quotes allows us to write a constant starting with a capital letter, e.g. 'X'. I have never seen double-quoted atoms so far, so you might want to look up the instructions of the prover on how to interpret them.
But even though the syntax is Prolog-ish, automated theorem proving is a different kind of beast. There is no closed world assumption nor are different constants assumed to be different - that's why you cannot find a proof for:
fof(c1, conjecture, a=b ).
and neither for:
fof(c1, conjecture, ~(a=b) ).
So if you want to have syntactic dis-equality, you need to axiomatize it. Now, assuming a different from b trivially shows that they are different, so I at least claimed: "Suppose there are two different constants a and b, then there exists some variable which is not b."
fof(a1, axiom, ~(a=b)).
fof(c1, conjecture, ?[X]: ~(X=b)).
Since functions in first-order logic are not necessarily injective, you cannot avoid adding that assumption as an axiom either.
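If the axioms have to be written anyway, they can at least be generated. The following Python sketch is one way to do that under stated assumptions: distinctness_axioms, _term and _forall are hypothetical helpers, the signature is supplied by hand as a mapping from functor name to arity, and the emitted formulas follow the fof(...) shapes already used in the question. It has not been tuned for any particular prover.

```python
from itertools import combinations

def _term(name, arity, prefix):
    """Return (term text, variable list), e.g. ('f(X0)', ['X0'])."""
    variables = [f"{prefix}{i}" for i in range(arity)]
    text = f"{name}({','.join(variables)})" if arity else name
    return text, variables

def _forall(variables, body):
    if not variables:
        return body
    return f"![{','.join(variables)}] : ({body})"

def distinctness_axioms(signature):
    """signature maps each functor name to its arity (0 for constants)."""
    symbols = sorted(signature.items())
    axioms = []
    # Terms headed by different functors are never equal.
    for (f, n), (g, m) in combinations(symbols, 2):
        lhs, xs = _term(f, n, "X")
        rhs, ys = _term(g, m, "Y")
        formula = _forall(xs + ys, f"{lhs} != {rhs}")
        axioms.append(f"fof({f}_not_{g}, axiom, {formula}).")
    # Every functor that takes arguments is injective.
    for f, n in symbols:
        if n == 0:
            continue
        lhs, xs = _term(f, n, "X")
        rhs, ys = _term(f, n, "Y")
        pairs = [f"{x} = {y}" for x, y in zip(xs, ys)]
        same = pairs[0] if len(pairs) == 1 else " & ".join(f"({p})" for p in pairs)
        formula = _forall(xs + ys, f"({lhs} = {rhs}) => ({same})")
        axioms.append(f"fof({f}_injective, axiom, {formula}).")
    return axioms
```

For example, distinctness_axioms({"f": 1, "g": 1}) yields one axiom separating f from g plus one injectivity axiom per functor. The number of distinctness axioms grows quadratically with the size of the signature, which is part of why this approach does not scale well.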
Please also note the different roles of input formulas: so far you only stated axioms and no conjectures, i.e. you ask the prover to show that your axiom set is inconsistent. Some provers might even give up because they use resolution refinements (e.g. set of support) which restrict resolution between axioms[1]. In any case, you need to be aware that the formula you are trying to prove is of the form A1 ∧ ... ∧ An → C1 ∨ ... ∨ Cm where the A are axioms and the C are conjectures.[2]
I hope that at least the syntax is a bit clearer now. Unfortunately, the answer to the questions is rather that automated theorem provers don't make the assumptions you expect, so you have to axiomatize them. These axiomatizations are also often inefficient, and you might get better performance from specialized tools.
[1] As you already noticed, advanced provers like Vampire or E Prover report (counter-)satisfiability instead.
[2] A resolution based theorem prover will first negate that formula and perform a CNF transformation, but even though most TPTP accepting provers are resolution based, that's not a requirement.

Is there a formalised high-level notation for Pseudocode?

I'd like to be able to reason about code on paper better than just writing boxes or pseudocode.
The key thing here is paper. On a machine, I can most likely use a high-level language with a linter/compiler very quickly, and a keyboard restricts what can be done, somewhat.
A case study is APL, a language that we semi-jokingly describe as "write-only". Here is an example:
m ← +/3+⍳4
(Explanation: ⍳4 creates an array, [1,2,3,4], then 3 is added to each component, which are then summed together and the result stored in variable m.)
Look how concise that is! Imagine having to type those symbols in your day job! But, writing iota and arrows on a whiteboard is fine, saves time and ink.
Here's its Haskell equivalent:
m = foldl (+) 0 (map (+3) [1..4])
And Python:
from functools import reduce
m = reduce(lambda a, b: a + b, map(lambda x: x + 3, range(1, 5)))
But the principle behind these concise programming languages is different: they use words and punctuation to describe high-level actions (such as fold), whereas I want to write symbols for these common actions.
Does such a formalised pseudocode exist?
Not to be snarky, but you could use APL. It was after all originally invented as a mathematical notation before it was turned into a programming language. I seem to remember that there was something like what I think you are talking about in Backus' Turing Award lecture. Finally, maybe Z Notation is what you want: https://en.m.wikipedia.org/wiki/Z_notation

When to use various language pragmas and optimisations?

I have a fair understanding of Haskell, but I am always a little unsure about which pragmas and optimizations I should use, and where. For example:
When to use the SPECIALIZE pragma and what performance gains it has.
Where to use RULES. I hear people talking about a particular rule not firing. How do we check that?
When to make arguments of a function strict, and when does that help? I understand that making an argument strict will cause it to be evaluated to normal form, so why should I not add strictness to all function arguments? How do I decide?
How do I see and check whether I have a space leak in my program? What are the general patterns that lead to a space leak?
How do I see if there is a problem with too much laziness? I can always check the heap profile, but I want to know the general causes, examples and patterns where laziness hurts.
Is there any source which talks about advanced optimizations (both at higher and very low levels) especially particular to haskell?
When to use the SPECIALIZE pragma and what performance gains it has.
You let the compiler specialise a function if you have a (type class) polymorphic function, and expect it to be called often at one or a few instances of the class(es).
The specialisation removes the dictionary lookup where it is used, and often enables further optimisation: the class member functions can often be inlined, and they are then subject to strictness analysis. Both give potentially huge performance gains. If the only optimisation possible is the elimination of the dictionary lookup, the gain won't generally be huge.
As of GHC-7, it's probably more useful to give the function an {-# INLINABLE #-} pragma, which makes its (nearly unchanged, some normalising and desugaring is performed) source available in the interface file, so the function can be specialised and possibly even inlined at the call site.
Where to use RULES. I hear people talking about a particular rule not firing. How do we check that?
You can check which rules have fired by using the -ddump-rule-firings command line option. That usually dumps a large number of fired rules, so you have to search a bit for your own rules.
You use rules
when you have a more efficient version of a function for special types, e.g.
"realToFrac/Float->Double" realToFrac = float2Double
when some functions can be replaced with a more efficient version for special arguments, e.g.
"^2/Int" forall x. x ^ (2 :: Int) = let u = x in u*u
"^3/Int" forall x. x ^ (3 :: Int) = let u = x in u*u*u
"^4/Int" forall x. x ^ (4 :: Int) = let u = x in u*u*u*u
"^5/Int" forall x. x ^ (5 :: Int) = let u = x in u*u*u*u*u
"^2/Integer" forall x. x ^ (2 :: Integer) = let u = x in u*u
"^3/Integer" forall x. x ^ (3 :: Integer) = let u = x in u*u*u
"^4/Integer" forall x. x ^ (4 :: Integer) = let u = x in u*u*u*u
"^5/Integer" forall x. x ^ (5 :: Integer) = let u = x in u*u*u*u*u
when rewriting an expression according to general laws might produce code that's better to optimise, e.g.
"map/map" forall f g. (map f) . (map g) = map (f . g)
Extensive use of RULES in the latter style is made in fusion frameworks, for example in the text library, and for the list functions in base, a different kind of fusion (foldr/build fusion) is implemented using rules.
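The law behind the "map/map" rule can be checked on a small example. The Python sketch below only illustrates the equation itself; it says nothing about how GHC applies rewrite rules, and compose, double and inc are helper names invented for the illustration.

```python
def compose(f, g):
    return lambda x: f(g(x))

def double(x):
    return x * 2

def inc(x):
    return x + 1

xs = list(range(5))

# Left-hand side of the rule: two traversals with an intermediate list.
intermediate = [inc(x) for x in xs]
two_passes = [double(y) for y in intermediate]

# Right-hand side of the rule: one traversal, no intermediate list.
one_pass = [compose(double, inc)(x) for x in xs]
```

Both sides produce the same list, so replacing the first form by the second is safe and saves the intermediate structure.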
When to make arguments of a function strict, and when does that help? I understand that making an argument strict will cause it to be evaluated to normal form, so why should I not add strictness to all function arguments? How do I decide?
Making an argument strict will ensure that it is evaluated to weak head normal form, not to normal form.
You do not make all arguments strict because some functions must be non-strict in some of their arguments to work at all and some are less efficient if strict in all arguments.
For example, partition must be non-strict in its second argument to work at all on infinite lists; more generally, every function used in foldr must be non-strict in its second argument to work on infinite lists. On finite lists, having the function non-strict in the second argument can make it dramatically more efficient (foldr (&&) True (False:replicate (10^9) True)).
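The effect of a non-strict second argument can be imitated in a strict language by passing the rest of the fold as a function. The following Python sketch does that; foldr, lazy_and and strict_and are hypothetical names, and the "thunk" is simply a zero-argument function.

```python
from itertools import chain, repeat

def foldr(f, z, xs):
    """Right fold where f receives the rest of the fold as a thunk."""
    it = iter(xs)

    def go():
        try:
            x = next(it)
        except StopIteration:
            return z
        return f(x, go)   # go is only called if f asks for it

    return go()

def lazy_and(x, rest):
    return x and rest()   # rest() is skipped when x is False

def strict_and(x, rest):
    r = rest()            # forces the rest of the list first
    return x and r

infinite = chain([True, True, False], repeat(True))
answer = foldr(lazy_and, True, infinite)   # stops at the first False
```

With strict_and the same fold over the infinite input would never produce a result (in Python it would run into the recursion limit), while on finite inputs both versions agree.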
You make an argument strict, if you know that the argument must be evaluated before any worthwhile work can be done anyway. In many cases, the strictness analyser of GHC can do that on its own, but of course not in all.
A very typical case is an accumulator in a loop or tail recursion, where adding strictness prevents the building of huge thunks on the way.
I know no hard-and-fast rules for where to add strictness. For me it's a matter of experience: after a while you learn in which places adding strictness is likely to help and where it is likely to harm.
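What "building huge thunks" means can be simulated in Python, with the caveat that Python is strict, so the laziness here is faked with closures. The functions lazy_sum and strict_sum are made up for this sketch.

```python
def lazy_sum(xs):
    """Accumulate without evaluating: every step wraps the previous thunk."""
    acc = lambda: 0
    pending = 0
    for x in xs:
        acc = (lambda prev, x: lambda: prev() + x)(acc, x)
        pending += 1
    return acc, pending        # nothing has been added yet

def strict_sum(xs):
    """Accumulate eagerly: the accumulator is always a plain number."""
    acc = 0
    for x in xs:
        acc = acc + x
    return acc

thunk, pending = lazy_sum(range(1, 101))
lazy_total = thunk()           # forces a chain of 100 nested thunks
strict_total = strict_sum(range(1, 101))
```

Both give the same total, but the lazy version keeps one suspended addition per element alive until the very end, which is the memory behaviour a strict accumulator avoids.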
As a rule of thumb, it makes sense to keep small data (like Int) evaluated, but there are exceptions.
How do I see and check whether I have a space leak in my program? What are the general patterns that lead to a space leak?
The first step is to use the +RTS -s option (if the programme was linked with rtsopts enabled). That shows you how much memory was used overall, and you can often judge by that whether you have a leak.
A more informative output can be obtained from running the programme with the +RTS -hT option, that produces a heap profile that can help locating the space leak (also, the programme needs to be linked with enabled rtsopts).
If further analysis is required, the programme needs to be compiled with profiling enabled (-rtsopts -prof -fprof-auto; in older GHCs the -fprof-auto option wasn't available, and the -auto-all option is the closest equivalent there).
Then you run it with various profiling options and look at the generated heap profiles.
The two most common causes for space leaks are
too much laziness
too much strictness
Third place is probably taken by unwanted sharing: GHC does little common subexpression elimination, but it occasionally shares long lists even where that is not wanted.
For finding the cause of a leak, I know again no hard-and-fast rules, and occasionally, a leak can be fixed by adding strictness in one place or by adding laziness in another.
How do I see if there is a problem with too much laziness? I can always check the heap profile, but I want to know the general causes, examples and patterns where laziness hurts.
Generally, laziness is wanted where results can be built up incrementally, and unwanted where no part of the result can be delivered before processing is complete, like in left folds or generally in tail-recursive functions.
I recommend reading the GHC documentation on Pragmas and Rewrite Rules, as they address many of your questions about SPECIALIZE and RULES.
To briefly address your questions:
SPECIALIZE is used to force the compiler to build a specialized version of a polymorphic function for a particular type. The advantage is that applying the function in that case will no longer require the dictionary. The disadvantage is that it will increase the size of your program. Specialization is particularly valuable for functions called in "inner-loops", and it's essentially useless for infrequently called top-level functions. Refer to the GHC documentation for interactions with INLINE.
RULES allows you to specify rewrite rules that you know to be valid but the compiler couldn't infer on its own. The common example is {-# RULES "mapfusion" forall f g xs. map f (map g xs) = map (f.g) xs #-}, which tells GHC how to fuse map. It can be finicky to get GHC to use the rules because of interference with INLINE. Section 7.19.3 of that documentation touches on how to avoid conflicts and also how to force GHC to use a rule even when it would normally avoid it.
Strict arguments are most vital for something like an accumulator in a tail-recursive function. You know that the value will ultimately be fully calculated, and building up a stack of closures to delay the computation completely defeats the purpose. Enforced strictness must naturally be avoided anytime the function may be applied to a value which must be processed lazily, like an infinite list. Generally, the best idea is to initially only force strictness where it's obviously useful (like accumulators), and then add more later only as profiling shows it's needed.
My experience has been that most show-stopping space leaks came from lazy accumulators and unevaluated lazy values in very large data-structures, although I'm sure this is specific to the kinds of programs you're writing. Using unboxed data-structures whenever possible fixes a lot of the problems.
Outside of the instances where laziness causes space leaks, the major situation where it should be avoided is in IO. Lazily processing a resource inherently increases the amount of wall-clock time for which the resource is needed. This can be bad for cache performance, and it's obviously bad if something else wants exclusive rights to use the same resource.

What programming languages are context-free?

Or, to be a little more precise: which programming languages are defined by a context-free grammar?
From what I gather C++ is not context-free due to things like macros and templates. My gut tells me that functional languages might be context free, but I don't have any hard data to back that up with.
Extra rep for concise examples :-)
What programming languages are context-free? [...]
My gut tells me that functional languages might be context-free [...]
The short version: There are hardly any real-world programming languages that are context-free in any meaning of the word. Whether a language is context-free or not has nothing to do with it being functional. It is simply a matter of how complex the syntax is.
Here's a CFG for the imperative language Brainfuck:
Program → Instr Program | ε
Instr → '+' | '-' | '>' | '<' | ',' | '.' | '[' Program ']'
And here's a CFG for the functional SKI combinator calculus:
Program → E
E → 'S' E E E
E → 'K' E E
E → 'I'
E → '(' E ')'
These CFGs recognize all valid programs of the two languages because they're so simple.
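To show how directly such a grammar turns into code, here is a small recursive-descent recognizer for the Brainfuck grammar, sketched in Python. The function name parses_as_program is made up, and the sketch follows the grammar as written, so any character outside the eight instruction symbols is rejected (real Brainfuck implementations usually treat such characters as comments).

```python
def parses_as_program(src):
    """Recognizer for Program → Instr Program | ε."""
    pos = 0

    def program():
        nonlocal pos
        while pos < len(src) and src[pos] != "]":
            instr()

    def instr():
        nonlocal pos
        ch = src[pos]
        if ch in "+-><,.":
            pos += 1
        elif ch == "[":
            pos += 1              # '['
            program()             # Program
            if pos >= len(src) or src[pos] != "]":
                raise SyntaxError("missing ']'")
            pos += 1              # ']'
        else:
            raise SyntaxError(f"unexpected {ch!r}")

    try:
        program()
    except SyntaxError:
        return False
    return pos == len(src)        # a stray ']' leaves input unconsumed
```

Each nonterminal becomes one function, and the only state needed is the current position plus the call stack, which is the pushdown part of the automaton.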
The longer version: Usually, context-free grammars (CFGs) are only used to roughly specify the syntax of a language. One must distinguish between syntactically correct programs and programs that compile/evaluate correctly. Most commonly, compilers split language analysis into syntax analysis that builds and verifies the general structure of a piece of code, and semantic analysis that verifies the meaning of the program.
If by "context-free language" you mean "... for which all programs compile", then the answer is: hardly any. Languages that fit this bill hardly have any rules or complicated features, like the existence of variables, whitespace-sensitivity, a type system, or any other context: Information defined in one place and relied upon in another.
If, on the other hand, "context-free language" only means "... for which all programs pass syntax analysis", the answer is a matter of how complex the syntax alone is. There are many syntactic features that are hard or impossible to describe with a CFG alone. Some of these are overcome by adding additional state to parsers for keeping track of counters, lookup tables, and so on.
Examples of syntactic features that are not possible to express with a CFG:
Indentation- and whitespace-sensitive languages like Python and Haskell. Keeping track of arbitrarily nested indentation levels is essentially context-sensitive and requires separate counters for the indentation level: both how many spaces are used for each level and how many levels there are.
Allowing only a fixed level of indentation using a fixed amount of spaces would work by duplicating the grammar for each level of indentation, but in practice this is inconvenient.
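A common workaround is to keep that extra state in the lexer, so that the grammar proper only ever sees explicit block markers. The Python sketch below shows the idea with a stack of indentation widths; indent_tokens and the token names INDENT, DEDENT and LINE are chosen for this illustration, in the spirit of the INDENT/DEDENT tokens described in the Python language reference, and tabs are ignored for simplicity.

```python
def indent_tokens(lines):
    """Turn leading spaces into INDENT/DEDENT tokens using a stack."""
    stack = [0]                      # indentation widths currently open
    tokens = []
    for line in lines:
        if not line.strip():
            continue                 # blank lines do not affect nesting
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            tokens.append("INDENT")
        while width < stack[-1]:
            stack.pop()
            tokens.append("DEDENT")
        if width != stack[-1]:
            raise SyntaxError("inconsistent dedent")
        tokens.append(("LINE", line.strip()))
    # Close every block that is still open at the end of the input.
    tokens.extend("DEDENT" for _ in stack[1:])
    return tokens
```

Once the token stream contains INDENT and DEDENT, blocks can be described by an ordinary context-free rule, much like braces in C.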
The C Typedef Parsing Problem says that C programs are ambiguous during lexical analysis because the lexer cannot know from the grammar alone whether something is a regular identifier or a typedef alias for an existing type.
The example is:
typedef int my_int;
my_int x;
At the semicolon, the type environment needs to be updated with an entry for my_int. But if the lexer has already looked ahead to my_int, it will have lexed it as an identifier rather than a type name.
In context-free grammar terms, the X → ... rule that would trigger on my_int is ambiguous: It could be either one that produces an identifier, or one that produces a typedef'ed type; knowing which one relies on a lookup table (context) beyond the grammar itself.
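The usual way out, often called the lexer hack, is to let the lexer consult a table of known type names that is updated as declarations are processed. Below is a deliberately tiny Python sketch of that feedback loop; classify and its token kinds are hypothetical, it only understands statements of the form typedef <type> <alias> ;, and it assumes the input has already been split into words.

```python
def classify(tokens):
    """Tag each word as KEYWORD, TYPE_NAME or IDENTIFIER, tracking typedefs."""
    type_names = {"int", "char"}          # built-in types known up front
    tagged = []
    statement = []
    for tok in tokens:
        if tok == ";":
            # A finished `typedef <type> <alias>` statement registers the alias.
            if len(statement) == 3 and statement[0] == "typedef":
                type_names.add(statement[2])
            statement = []
            tagged.append((tok, "PUNCT"))
            continue
        if tok == "typedef":
            kind = "KEYWORD"
        elif tok in type_names:
            kind = "TYPE_NAME"
        else:
            kind = "IDENTIFIER"
        statement.append(tok)
        tagged.append((tok, kind))
    return tagged
```

The same word my_int is tagged as an identifier before the typedef statement ends and as a type name afterwards, which is precisely the context dependence described above.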
Macro- and template-based languages like Lisp, C++, Template Haskell, Nim, and so on. Since the syntax changes as it is being parsed, one solution is to make the parser into a self-modifying program. See also Is C++ context-free or context-sensitive?
Often, operator precedence and associativity are not expressed directly in CFGs even though it is possible. For example, a CFG for a small expression grammar where ^ binds tighter than ×, and × binds tighter than +, might look like this:
E → E ^ E
E → E × E
E → E + E
E → (E)
E → num
This CFG is ambiguous, however, and is often accompanied by a precedence / associativity table saying e.g. that ^ binds tightest, × binds tighter than +, that ^ is right-associative, and that × and + are left-associative.
Precedence and associativity can be encoded into a CFG in a mechanical way such that it is unambiguous and only produces syntax trees where the operators behave correctly. An example of this for the grammar above:
E₀ → EA E₁
EA → E₁ + EA
EA → ε
E₁ → EM E₂
EM → E₂ × EM
EM → ε
E₂ → E₃ EP
EP → ^ E₃ EP
EP → ε
E₃ → num
E₃ → (E₀)
But ambiguous CFGs + precedence / associativity tables are common because they're more readable and because various types of LR parser generator libraries can produce more efficient parsers by eliminating shift/reduce conflicts instead of dealing with an unambiguous, transformed grammar of a larger size.
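The layering idea (one nonterminal per precedence level) is also how hand-written recursive-descent parsers usually handle operators. The following Python sketch evaluates expressions over +, × and ^ that way. It is not a literal transcription of the grammar above: loops replace the list-like helper rules, the function names e0 to e3 merely echo the nonterminals, and characters the tokenizer does not know are silently skipped.

```python
import re

def evaluate(text):
    tokens = re.findall(r"\d+|[+×^()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok}")
        pos += 1
        return tok

    def e0():                      # sums, left-associative
        value = e1()
        while peek() == "+":
            take("+")
            value = value + e1()
        return value

    def e1():                      # products, left-associative
        value = e2()
        while peek() == "×":
            take("×")
            value = value * e2()
        return value

    def e2():                      # powers, right-associative
        base = e3()
        if peek() == "^":
            take("^")
            return base ** e2()
        return base

    def e3():                      # numbers and parenthesised expressions
        if peek() == "(":
            take("(")
            value = e0()
            take(")")
            return value
        tok = take()
        if not tok.isdigit():
            raise SyntaxError(f"unexpected {tok}")
        return int(tok)

    result = e0()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()}")
    return result
```

For instance, evaluate("2 ^ 3 ^ 2") groups to the right and evaluate("1 + 2 × 3") multiplies first, with no precedence table involved.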
In theory, all finite sets of strings are regular languages, and so all legal programs of bounded size are regular. Since regular languages are a subset of context-free languages, all programs of bounded size are context-free. The argument continues,
While it can be argued that it would be an acceptable limitation for a language to allow only programs of less than a million lines, it is not practical to describe a programming language as a regular language: The description would be far too large.
     (Torben Mogensen, Basics of Compiler Design, ch. 2.10.2)
The same goes for CFGs. To address your sub-question a little differently,
Which programming languages are defined by a context-free grammar?
Most real-world programming languages are defined by their implementations, and most parsers for real-world programming languages are either hand-written or use a parser generator that extends context-free parsing. It is unfortunately not that common to find an exact CFG for your favourite language. When you do, it's usually in Backus-Naur form (BNF), or a parser specification that most likely isn't purely context-free.
Examples of grammar specifications from the wild:
BNF for Standard ML
BNF-like for Haskell
Yacc grammar for PHP
The set of programs that are syntactically correct is context-free for almost all languages.
The set of programs that compile is not context-free for almost all languages. For example, if the set of all compiling C programs were context free, then by intersecting with a regular language (also known as a regex), the set of all compiling C programs that match
^int main\(void\) { int a+; a+ = a+; return 0; }$
would be context-free, but this is clearly isomorphic to the language a^kba^kba^k, which is well-known not to be context-free.
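The split between the regular shape and the non-context-free remainder can be made explicit. In the Python sketch below, the hypothetical function compiles models only the one rule that matters for the argument, namely that a variable must be declared under the same name it is used with; it is in no way a C compiler.

```python
import re

SHAPE = re.compile(r"^int main\(void\) \{ int (a+); (a+) = (a+); return 0; \}$")

def compiles(program):
    m = SHAPE.match(program)
    if m is None:
        return False             # fails the regular (shape) check
    declared, assigned, used = m.groups()
    # The non-context-free part: all three names must be the same
    # variable, i.e. the three runs of 'a' must have equal length.
    return declared == assigned == used
```

The regular expression alone accepts any three runs of a; it is the final comparison of the three names that corresponds to a^kba^kba^k.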
Depending on how you understand the question, the answer changes. But IMNSHO, the proper answer is that all modern programming languages are in fact context sensitive. For example there is no context free grammar that accepts only syntactically correct C programs. People who point to yacc/bison context free grammars for C are missing the point.
To go for the most dramatic example of a non-context-free grammar, Perl's grammar is, as I understand it, Turing-complete.
If I understand your question, you are looking for programming languages which can be described by context free grammars (cfg) so that the cfg generates all valid programs and only valid programs.
I believe that most (if not all) modern programming languages are therefore not context free. For example, once you have user defined types (very common in modern languages) you are automatically context sensitive.
There is a difference between verifying syntax and verifying semantic correctness of a program. Checking syntax is context free, whereas checking semantic correctness isn't (again, in most languages).
This, however, does not mean that such a language cannot exist. Untyped lambda calculus, for example, can be described using a context free grammar, and is, of course, Turing complete.
Most modern programming languages are not context-free. As an argument: a pushdown automaton (PDA), the machine corresponding to context-free languages, cannot recognize the language {ww | w is a string}, and most programming languages require exactly this kind of matching, for example between the declaration of a variable and its later use:
int fa; // w
fa=1; // w again: the parser has to match this fa against the declaration
VHDL is somewhat context sensitive:
VHDL is context-sensitive in a mean way. Consider this statement inside a process:
jinx := foo(1);
Well, depending on the objects defined in the scope of the process (and its
enclosing scopes), this can be either:
A function call
Indexing an array
Indexing an array returned by a parameter-less function call
To parse this correctly, a parser has to carry a hierarchical symbol table
(with enclosing scopes), and the current file isn't even enough. foo can be a
function defined in a package. So the parser should first analyze the packages
imported by the file it's parsing, and figure out the symbols defined in them.
This is just an example. The VHDL type/subtype system is a similarly
context-sensitive mess that's very difficult to parse.
(Eli Bendersky, “Parsing VHDL is [very] hard”, 2009)
Let's take Swift, where the user can define operators including operator precedence and associativity. For example, the operators + and * are actually defined in the standard library.
A context free grammar and a lexer may be able to parse a + b - c * d + e, but the semantics is "five operands a, b, c, d and e, separated by the operators +, -, * and +". That's what a parser can achieve without knowing about operators. A context free grammar and a lexer may also be able to parse a +-+ b -+- c, which is three operands a, b and c separated by operators +-+ and -+-.
A parser can "parse" a source file according to a context-free Swift grammar, but that's nowhere near the job done. Another step would be collecting knowledge about operators, and then change the semantics of a + b - c * d + e to be the same as operator+ (operator- (operator+ (a, b), operator* (c, d)), e).
So there is (or maybe there is, I haven't checked too closely) a context-free grammar, but it only gets you so far towards parsing a program.
I think Haskell and ML have context-free grammars. See this link for Haskell.
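That second step can be sketched as a small table-driven pass over the flat sequence the context-free parse produces. The Python code below uses precedence climbing; resolve and the layout of the operator table are assumptions made for this illustration and are not how the Swift compiler is actually organised.

```python
def resolve(flat, table):
    """flat: [operand, op, operand, op, ...]; table: op -> (precedence, assoc)."""
    pos = 0

    def climb(min_prec):
        nonlocal pos
        left = flat[pos]
        pos += 1
        while pos < len(flat):
            op = flat[pos]
            prec, assoc = table[op]
            if prec < min_prec:
                break
            pos += 1
            # Left-associative operators require strictly tighter operators
            # on their right-hand side; right-associative ones do not.
            right = climb(prec + 1 if assoc == "left" else prec)
            left = (op, left, right)
        return left

    return climb(0)

flat = ["a", "+", "b", "-", "c", "*", "d", "+", "e"]
usual = {"+": (1, "left"), "-": (1, "left"), "*": (2, "left")}
tree = resolve(flat, usual)
```

With the usual table the result matches the nesting given above; swapping the precedences in the table changes the tree without touching the grammar, which is why the operator declarations have to be collected first.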
