If I wanted to create a new programming language, is there a specific file format or extension used to define its grammar? Or should I just use a plain text file?
What you are looking for might be a metasyntax, i.e. a notation for writing down a formal grammar.
A common notation for describing at least certain types of formal languages is Extended Backus-Naur Form (EBNF). It defines the grammar by means of "production rules" that gradually expand a "start symbol" into any valid expression in the described language.
This format is, however, mainly a notation (i.e. it describes how you write down the language description, even when you're writing on paper). It is not so much a "file format" in the sense that there would be a standard extension for it.
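To make this concrete, here is a small ISO 14977-style EBNF grammar for arithmetic expressions (kept in comments, since EBNF has no standard file format), followed by a hand-written recursive-descent parser in Python that mirrors it one method per production rule. This is only a sketch: the grammar and all names in it are illustrative, and in practice you might feed a grammar like this to a parser generator instead of writing the parser by hand.

```python
import re

# EBNF for a tiny expression language (illustrative):
#   expr   = term , { ( "+" | "-" ) , term } ;
#   term   = factor , { ( "*" | "/" ) , factor } ;
#   factor = number | "(" , expr , ")" ;
#   number = digit , { digit } ;

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split the input into number and operator tokens."""
    pos, tokens = 0, []
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    """One method per production rule; each consumes tokens and returns a value."""
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):    # expr = term , { ("+" | "-") , term }
        value = self.term()
        while self.peek() in ("+", "-"):
            op, rhs = self.take(), self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):    # term = factor , { ("*" | "/") , factor }
        value = self.factor()
        while self.peek() in ("*", "/"):
            op, rhs = self.take(), self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):  # factor = number | "(" , expr , ")"
        tok = self.take()
        if tok == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

def evaluate(text):
    """Parse the whole input starting from the start symbol `expr`."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    return value

print(evaluate("2 * (3 + 4) - 5"))  # 9
```

Because each loop in `expr` and `term` folds from the left, `1 - 2 - 3` evaluates as `(1 - 2) - 3`, which is exactly what the `{ ... }` repetition in the grammar suggests.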
I was reading an article about markup languages versus programming languages, and one line in it confused me a bit:
".....the markup language is used to present information whereas programming language is used to give instructions to a computer to perform a particular task."
The computer still has to follow the instructions of a markup language to display information on the screen, right?
Please help, and thank you!
You are right, and the article is also right, but a bit of clarification is needed.
A markup language (HTML, XAML, Markdown, etc.) is an example of a declarative language, which says "this is what the result should be" (e.g. what the screen should look like, what input or output is expected, and so on). Declarative languages say nothing about how to achieve the result; they ultimately rely on an "engine" that gets there imperatively.
In contrast, an imperative language describes each step that must be taken to achieve the end result (i.e. an algorithm). That is a set of instructions.
Your statement, "The computer still has to follow instructions of markup language to display information on the screen", is correct because a declarative language must, at a lower level, rely on imperative constructs such as conditionals and loops to achieve what it declares.
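A toy illustration of that division of labor, in Python: the document below only states what should appear (a heading and a list), while `render` is the imperative "engine", full of loops and conditionals, that produces HTML from it. The mini markup (`#` for a heading, `*` for a list item) is invented for this example and only loosely resembles Markdown.

```python
# Declarative: describes WHAT the result is, not how to produce it.
DOCUMENT = """\
# Shopping
* apples
* bread
"""

def render(markup):
    """Imperative engine: step-by-step instructions that turn markup into HTML."""
    html, in_list = [], False
    for line in markup.splitlines():
        if line.startswith("* "):
            if not in_list:              # open the list on its first item
                html.append("<ul>")
                in_list = True
            html.append(f"<li>{line[2:]}</li>")
            continue
        if in_list:                      # any other line closes an open list
            html.append("</ul>")
            in_list = False
        if line.startswith("# "):
            html.append(f"<h1>{line[2:]}</h1>")
        elif line:
            html.append(f"<p>{line}</p>")
    if in_list:                          # close a list that runs to the end
        html.append("</ul>")
    return "\n".join(html)

print(render(DOCUMENT))
```

The author of `DOCUMENT` never writes a loop; all the "instructions" live in the engine, which is why both views in the question are right.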
Incidentally, referring back to your title... both are programming languages. They're just using different paradigms.
I have been curious lately about DSLs, specifically how to implement them in Lisp, since it looks like a piece of cake compared to the alternatives.
Searching the internet, I cannot find any example of a DSEL with non-Lisp syntax implemented in Lisp.
So my question is:
Is it possible to implement a DSL with non-Lisp syntax in Lisp using macros?
How is this achieved?
Can the Lisp reader be replaced by a custom reader that translates code into Lisp structures?
If the former is true, is this a common way to implement "non-Lispy" DSELs?
Short version: Racket does this.
In more detail: Racket, a descendant of Scheme, has a really well-thought-out story here. A Racket module/file can begin with a language declaration, e.g.
#lang algol60
... and then the rest of the file can be written in the given language. (Yes, algol60 is built in.)
To develop your own language, you write a package that is a language specification: it shows how to expand the syntax of your language into the syntax of the underlying language (in this case, Racket). Anyone can write such packages and distribute them so that others can write programs in that language. Racket ships with examples of such language specifications, e.g. the Algol 60 one mentioned earlier.
I think this is exactly what you're asking for?
ObDisclaimer: Yes, I am a Racket developer.
How do you implement the surface language of a programming language? You write a parser or use a parser generator. You can do that in Lisp, too.
There are many examples of general-purpose and domain-specific languages written in Lisp that do not use s-expression syntax.
Historically, the first ML (an extension language for a theorem prover) was written in Lisp. Macsyma (a language for computer algebra) is written in Lisp. In many cases there is some kind of 'end user' for whom a non-s-expression language needs to be written or supported. Sometimes existing languages need to be supported.
Using macros and reader macros you can implement some languages or extend the Lisp language. For example, it is easy to add JSON syntax to Lisp using a reader macro; the same goes for some kinds of infix syntax, or XML (example: XMLisp).
There's no problem in supporting non-Lisp-syntax DSLs in Lisp. You'll need a parser or parser-generator library, as Rainer has mentioned. A good example is esrap, which is used to parse Markdown (see 3bmd) and also the pgloader command language, which is exactly the kind of external DSL you're asking about.
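To show what an infix-syntax extension has to do, here is a Python sketch of the core transformation such a reader macro would perform: turning infix text into prefix s-expression data (nested lists), using Dijkstra's shunting-yard algorithm. This is not Lisp code; in Common Lisp the same logic would sit behind a reader macro installed with something like `set-macro-character`. The operator table is an assumption made for the example.

```python
import re

# Binary operators with (precedence, associativity): an assumed, minimal table.
OPERATORS = {"+": (1, "left"), "-": (1, "left"),
             "*": (2, "left"), "/": (2, "left"),
             "^": (3, "right")}

def read_infix(text):
    """Translate infix text into a prefix s-expression (nested lists)."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/^()]", text)
    output, ops = [], []          # operand stack and operator stack

    def reduce_top():
        op = ops.pop()
        right, left = output.pop(), output.pop()
        output.append([op, left, right])

    for tok in tokens:
        if tok in OPERATORS:
            prec, assoc = OPERATORS[tok]
            # Reduce operators that bind at least as tightly as this one.
            while ops and ops[-1] in OPERATORS:
                top_prec = OPERATORS[ops[-1]][0]
                if top_prec > prec or (top_prec == prec and assoc == "left"):
                    reduce_top()
                else:
                    break
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                reduce_top()
            ops.pop()             # discard the matching "("
        else:
            output.append(int(tok) if tok.isdigit() else tok)
    while ops:
        reduce_top()
    return output[0]

def to_sexpr(tree):
    """Print nested lists in Lisp notation."""
    if isinstance(tree, list):
        return "(" + " ".join(to_sexpr(t) for t in tree) + ")"
    return str(tree)

print(to_sexpr(read_infix("a + b * (c - 1)")))  # (+ a (* b (- c 1)))
```

Once the reader has produced ordinary list structure, the rest of Lisp (macros, the compiler) never needs to know the source was infix. Error handling is omitted: unbalanced parentheses simply raise an `IndexError` here.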
From Let Over Lambda, there is an implementation of Perl-style regular expressions: http://letoverlambda.com/index.cl/guest/chap4.html#sec_4.
Also there are several attempts at making a "non-lispy" version of Lisp, the main one being the Readable Lisp S-expression Project: http://readable.sourceforge.net/.
One implementation-specific solution that sticks out (if you want to use Scheme rather than CL) is Gambit Scheme's built-in support for infix syntax via its SIX-script extension.
This provides a rich set of loosely C-like operators and syntax forms, which can either be used out-of-the-box to write code in a C-like style, or redefined to mean whatever you want (you can easily redefine e.g. the function definition format, if you aren't a fan of type name(args) {}). for, case, := and so on (even goto) are all already present and ready to mean whatever you need.
The actual core of the syntax (operator precedence, expressions vs. statements) is fixed, but you can assign things like a Scheme binding construct to the s-expression produced by an operator for a reasonably large amount of freedom.
a = b * c;
is translated by the reader into
(six.x=y (six.identifier a) (six.x*y (six.identifier b) (six.identifier c)))
You can then override the definitions of those macros with your own to make the syntax do whatever you want. Turning the C-style base into a Haskell-looking functional language isn't too hard (strategically redefine = and -> and you're halfway there...).
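The mechanism is easy to mimic outside Scheme. The Python sketch below takes the reader output shown above for `a = b * c;` (written as nested tuples) and evaluates it through a table of handlers keyed by the same `six.*` names; swapping one handler changes what the syntax means, which is roughly what redefining the macros does in Gambit. The handler functions and the `evaluate` dispatcher are hypothetical stand-ins, not Gambit's API.

```python
# The reader output for `a = b * c;`, as nested tuples mirroring the s-expression.
TREE = ("six.x=y", ("six.identifier", "a"),
        ("six.x*y", ("six.identifier", "b"), ("six.identifier", "c")))

def evaluate(node, env, handlers):
    """Dispatch on the head symbol, like macro expansion followed by evaluation."""
    head, *args = node
    return handlers[head](args, env, handlers)

def identifier(args, env, handlers):
    return env[args[0]]

def multiply(args, env, handlers):
    x, y = (evaluate(a, env, handlers) for a in args)
    return x * y

def assign(args, env, handlers):
    (_, name), value = args          # the target is ("six.identifier", name)
    env[name] = evaluate(value, env, handlers)
    return env[name]

DEFAULT = {"six.identifier": identifier, "six.x*y": multiply, "six.x=y": assign}

env = {"b": 6, "c": 7}
print(evaluate(TREE, env, DEFAULT), env["a"])   # 42 42

# "Redefine the macro": make `*` build a pair instead of multiplying.
def pair(args, env, handlers):
    return tuple(evaluate(a, env, handlers) for a in args)

CUSTOM = {**DEFAULT, "six.x*y": pair}
env = {"b": 6, "c": 7}
print(evaluate(TREE, env, CUSTOM), env["a"])    # (6, 7) (6, 7)
```

The parse (operator precedence, the shape of the tree) stays fixed, just as the answer describes; only the meaning attached to each node type changes.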
I just discovered OWL and Protege. Upon reading through this reference page (which I quote below), I am left wondering whether it is possible to not use the abstract OWL syntax, and rather to write in DL syntax. My background is in logic, so it sounds like it would be more fun even if I would have to translate the ontologies later (though I am sure there must be applications to do this--besides, don't reasoners use DL?).
If it is possible, what configuration of settings should I use in Protege (or other software of your suggestion) in order to do this? I suspect it's not possible, but I want to be sure, as I see no good reason for this other than the awkwardness of special symbols.
EDIT: If it is NOT possible, how exactly are DL languages used?
OWL DL is the description logic SHOIN with support of data values, data types and datatype properties, i.e., SHOIN(D), but since OWL is based on RDF(S), the terminology slightly differs. ... For description of OWL ontology or knowledge base, the DL syntax can be used. There is an "abstract" LISP-like syntax defined that is easier to write in ASCII character set.
Here's a very brief working example of the two syntax styles for the same data.
don't reasoners use DL?
Not necessarily. They use all kinds of logics, some of which are DLs, some are not.
If it is possible, what configuration of settings should I use in Protege (or other software of your suggestion) in order to do this?
I'm pretty sure there is no such plugin for Protégé. But if you really want some fun, use a text editor and write your ontology by hand. There are many syntaxes you can use: the functional syntax, the OWL/XML syntax and the RDF/XML syntax are all normative. In addition, you can use the Manchester syntax, Turtle, N-Triples or JSON-LD; Turtle, N-Triples and JSON-LD have since become W3C Recommendations for writing RDF (and therefore OWL). Or the more exotic RDF/JSON and HDT. Or, again, more "powerful" syntaxes like Notation3, TriG, TriX and N-Quads. Plenty of fun!
In any case, if you would like to write in the DL syntax, you would need to use special Unicode characters or special commands like in LaTeX for instance. And the parser that deals with it would have to read those characters or commands. Not ideal if you are programming. But you can always use the DL syntax in your writings.
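To see the trade-off, here is a small Python sketch that renders one and the same axiom in DL notation (which needs the Unicode symbols ⊑, ⊓ and ∃) and in OWL 2 Functional-Style Syntax (plain ASCII keywords such as `SubClassOf` and `ObjectSomeValuesFrom`). The tuple-based axiom representation and the names `Parent`, `Person` and `hasChild` are made up for illustration, and the functional-syntax output is only a fragment: a real file would also need `Prefix(...)` and `Ontology(...)` declarations.

```python
# A toy abstract syntax for a few class expressions (hypothetical, for illustration).
def dl(expr):
    """Render in description-logic notation (needs Unicode symbols)."""
    kind = expr[0]
    if kind == "class":
        return expr[1]
    if kind == "and":
        return " ⊓ ".join(dl(e) for e in expr[1:])
    if kind == "some":
        filler = dl(expr[2])
        if expr[2][0] != "class":        # parenthesize complex fillers
            filler = f"({filler})"
        return f"∃{expr[1]}.{filler}"
    if kind == "subclass":
        return f"{dl(expr[1])} ⊑ {dl(expr[2])}"
    raise ValueError(kind)

def functional(expr):
    """Render in OWL 2 Functional-Style Syntax (ASCII keywords)."""
    kind = expr[0]
    if kind == "class":
        return ":" + expr[1]
    if kind == "and":
        return "ObjectIntersectionOf(" + " ".join(functional(e) for e in expr[1:]) + ")"
    if kind == "some":
        return f"ObjectSomeValuesFrom(:{expr[1]} {functional(expr[2])})"
    if kind == "subclass":
        return f"SubClassOf({functional(expr[1])} {functional(expr[2])})"
    raise ValueError(kind)

# "Every parent is a person who has some child that is a person."
AXIOM = ("subclass", ("class", "Parent"),
         ("and", ("class", "Person"), ("some", "hasChild", ("class", "Person"))))

print(dl(AXIOM))          # Parent ⊑ Person ⊓ ∃hasChild.Person
print(functional(AXIOM))
```

Both renderings carry the same information; the DL one is more pleasant to read, while the functional one is what standard tools can parse.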
BTW, the current standard Web Ontology Language is OWL 2. Its DL variant (viz., OWL 2 DL) is based on the even more irresistible SROIQ.
Perhaps I am limited by my experience with dynamic languages (Ruby on Netbeans and Groovy on Eclipse), but it seems to me that the nature of dynamic languages makes it impossible to refactor (renaming methods, classes, pushing-up, pulling-down, etc.) automatically.
Is it possible to refactor AUTOMATICALLY in any dynamic language (with any IDE/tool)? I am especially interested in Ruby, Python and Groovy, and how the refactoring compares to the 100% automatic refactoring available in all Java IDEs.
Given that automatic refactoring was invented in a dynamic language (Smalltalk), I would have to say "Yes".
In particular, John Brant, Don Roberts and Ralph Johnson developed the Refactoring Browser which is one of the core tools in, for instance, Squeak.
My Google-fu is weak today, but you could try to find this paper: Don Roberts, John Brant, and Ralph Johnson, "A Refactoring Tool for Smalltalk", Theory and Practice of Object Systems 3(4), 1997.
Smalltalk does not declare any types. The Refactoring Browser has successfully performed correct refactorings in commercial code since 1995 and is incorporated in nearly all current Smalltalk IDE's. - Don Roberts
Automatic Refactoring was invented in Smalltalk, a highly dynamic language.
And it has worked like a charm ever since.
You can try it yourself in a free Smalltalk (for instance http://pharo-project.org).
In a dynamic language you can also script refactorings yourself or query the system. A simple example that counts the test classes (all subclasses of TestCase):
TestCase allSubclasses size
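Python offers similar introspection, though `type.__subclasses__()` only reports direct subclasses, so a transitive count like Smalltalk's `allSubclasses` needs a small recursive helper. The `all_subclasses` helper and the test classes below are illustrative.

```python
import unittest

def all_subclasses(cls):
    """Collect every (transitive) subclass of cls currently loaded in the process."""
    found = set()
    for sub in cls.__subclasses__():
        found.add(sub)
        found |= all_subclasses(sub)
    return found

class BaseTest(unittest.TestCase):
    pass

class ParserTest(BaseTest):
    pass

class LexerTest(BaseTest):
    pass

print(len(all_subclasses(BaseTest)))           # 2
print(len(all_subclasses(unittest.TestCase)))  # also counts test classes from any other loaded module
```

Querying is the easy part; a refactoring tool built on this would still face the `eval`-style problems discussed below.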
I have wondered the same thing. I'm not a compiler/interpreter writer, but I think the answer will be that it is impossible to get it perfect. However, you can get it correct in most cases.
First, I'm going to use the term "interpreted" language instead of "dynamic" language, since that is what I think of with Ruby, JavaScript, etc. Interpreted languages tend to take advantage of run-time capabilities.
For instance, most scripting languages allow the following:
-- pseudo-code but you get the idea
eval("echo(a)");
I just "ran" a string! You would have to refactor that string too. And is "a" a variable, or does this language let you print the character a without quotes if there is no variable a?
I want to believe this kind of coding is the exception and that you will get good refactoring almost all of the time. Unfortunately, when I look through libraries for scripting languages, they seem to rely on such exceptions routinely, and some even base their architecture on them.
Or to up the ante a bit:
def functionThatAssumesInputWillCreateX(input)
    eval(input)
    echo(x)

def functionWithUnknownParms( ... )
    eval(argv[1]);
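Here is a runnable Python version of the `eval` problem. The sketch does a purely syntactic rename with the standard `ast` module: it renames the function `echo` and every identifier that refers to it, but it cannot see inside the string passed to `eval`, so the renamed program fails at run time. The `RenameName` transformer is a deliberately naive illustration, not how any particular IDE works, and it needs Python 3.9+ for `ast.unparse`.

```python
import ast

class RenameName(ast.NodeTransformer):
    """Rename a function: its definition and every identifier that refers to it."""
    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_FunctionDef(self, node):
        if node.name == self.old:
            node.name = self.new
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node

SOURCE = """
def echo(v):
    return v

a = 1
direct = echo(a)
indirect = eval("echo(a)")
"""

def rename(source, old, new):
    tree = RenameName(old, new).visit(ast.parse(source))
    return ast.unparse(tree)

def runs_cleanly(source):
    try:
        exec(source, {})
        return True
    except NameError as err:
        print("runtime failure:", err)
        return False

renamed = rename(SOURCE, "echo", "show")
print(renamed)                # `direct = show(a)`, but the eval string still says echo(a)
print(runs_cleanly(SOURCE))   # True
print(runs_cleanly(renamed))  # reports the NameError, then False
```

The tool did nothing wrong by its own rules; the reference to `echo` only exists as data until run time, which is exactly why such renames cannot be made fully safe by static analysis alone.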
At least when you refactor Java and change a variable from int to String, the compiler reports an error everywhere the int is still expected:
String wasInt = "3";
int out = 3 + wasInt;  // error: incompatible types: String cannot be converted to int
With interpreted languages you will probably not see this until run-time.
Ditto the points about the Refactoring Browser: it is highly effective in Smalltalk. However, I imagine some kinds of refactoring would be impossible without type information (whether it is obtained from explicit type annotations or through some form of type inference in a dynamic language is irrelevant). One example: when renaming a method in Smalltalk, the browser renames all implementors and senders of that method, which is usually fine but sometimes undesirable. With type information on variables, you could scope the rename to the implementors in the current class hierarchy and to the senders whose receiver is a variable declared with a type in that hierarchy (though I can imagine scenarios where even type declarations would break down and produce undesirable results).
What is the difference in meaning between 'semantics' and 'syntax'? What are they?
Also, what's the difference between things like "semantic website vs. normal website", "semantic social networking vs. normal social networking" etc.
Syntax is the grammar. It describes the way to construct a correct sentence. For example, "this water is triangular" is syntactically correct.
Semantics relates to meaning. "This water is triangular" does not mean anything, though the grammar is OK.
Talking about the semantic web has become trendy recently. The idea is to enrich the markup (which HTML provides only at the structural level) with additional data so that computers can make sense of web pages more easily.
Syntax is the grammar of a language - the rules by which to form sentences or expressions.
Semantics is the meaning you are trying to express with your code.
A program that is syntactically correct will be accepted by the parser; if it also passes the language's other static checks, it will compile and run.
A program that is semantically correct will actually do what you, the programmer, intended it to do, i.e. it has no bugs.
Two programs written to perform the same task in different languages will use different syntaxes, but they would be the same semantically.
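A quick way to see the distinction in Python: `ast.parse` checks only syntax, so it accepts a buggy function as readily as a correct one and rejects only the malformed one. The three `average` variants are made-up examples.

```python
import ast

GOOD = "def average(xs):\n    return sum(xs) / len(xs)\n"
BUGGY = "def average(xs):\n    return sum(xs) / 2\n"         # valid syntax, wrong meaning
BROKEN = "def average(xs)\n    return sum(xs) / len(xs)\n"   # missing colon: syntax error

def parses(src):
    """True if the source is syntactically valid Python."""
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False

def load(src):
    """Execute the source and return the `average` function it defines."""
    namespace = {}
    exec(src, namespace)
    return namespace["average"]

print(parses(GOOD), parses(BUGGY), parses(BROKEN))   # True True False
print(load(GOOD)([1, 2, 3]), load(BUGGY)([1, 2, 3]))  # 2.0 3.0
```

Only the first two pass the syntax check, and only the first is semantically correct.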
If you are talking about web (rather than programming languages):
The syntax of the language is whatever the browser (or processing program) can legally recognize and handle, and render to you. For example, your browser can render HTML, while your API can parse XML trees.
Semantics involves what is actually being represented. There's a lot of buzz now about the semantic web and all that, but it essentially means that each entity is also associated with machine-readable information or metadata, so that a given tag has a defined meaning and can refer you to it.
Social networks are the same story: you put knowledge in the links.
"An ant ate an aunt." has correct syntax but does not make sense semantically. Syntax is a set of rules that can be combined to produce an infinite number of grammatically valid sentences, very few of which are meaningful.
Syntax is the word order of a sentence. In English, that is typically the subject-verb-object form.
Semantics is the meaning behind words. E.g. "she ate a saw": the word "saw" doesn't fit the meaning of the sentence, but the sentence is grammatically correct, so its syntax is correct. =)
Specifically, semantic social networking means embedding the actual social relationships within the page markup. The standard microformat for doing this is XFN (XHTML Friends Network). With regard to the semantic web in general, microformats should be the go-to guide for defining embedded semantic content.
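XFN works by putting relationship keywords in the `rel` attribute of ordinary links, which lets a program recover the social graph from the page. The Python sketch below extracts those relationships with the standard `html.parser` module; the sample markup and URLs are made up, and `XFN_VALUES` is only a subset of the XFN vocabulary, chosen for illustration.

```python
from html.parser import HTMLParser

# A subset of XFN relationship values (not the full XFN list).
XFN_VALUES = {"friend", "acquaintance", "contact", "met", "co-worker",
              "colleague", "neighbor", "spouse", "sibling", "me"}

SAMPLE = """
<ul>
  <li><a href="https://alice.example.com/" rel="friend met">Alice</a></li>
  <li><a href="https://bob.example.org/" rel="colleague">Bob</a></li>
  <li><a href="https://news.example.net/">Not a person</a></li>
</ul>
"""

class XFNExtractor(HTMLParser):
    """Collect (href, relationships) pairs from links carrying XFN rel values."""
    def __init__(self):
        super().__init__()
        self.relations = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        rels = set((attributes.get("rel") or "").split()) & XFN_VALUES
        if rels:
            self.relations.append((attributes.get("href"), sorted(rels)))

parser = XFNExtractor()
parser.feed(SAMPLE)
for href, rels in parser.relations:
    print(href, rels)
```

To a browser all three links look the same; only the agreed-upon `rel` vocabulary gives the first two a meaning a machine can act on.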
Semantic web sites use the concept of the semantic web, which aims to bring meaning to web content by using special annotations to identify certain concepts in a page. This makes possible the automatic (by a computer, not a human) reasoning about the content, which improves its aggregation, extraction, indexing and searching.
The explanations above are vague on the semantics side. Semantics could mean the set of elements available for building meaningful content (comprehensible to the human end user and digestible by the machine).
Of course, this puts semantics, and the programmer-editor-writer-communicator, in the middle: he decides on the semantics that should ideally be defined for his audience, understood by his audience, accepted as a general convention by his audience, and digestible by the computer. Semantics should be agreed upon, is conceptual, and must be implementable on both sides.
Think of footnotes, inline and block quotes, titles and so on, ending up in a well-defined, finite list. MediaWiki's wikitext, for example, fails in this respect: it defines syntax for elements whose semantic meaning is left undefined, with no agreed-upon finite list. "Meaning by form" is what, for example, a title carries in addition to its textual content. The text "This is a title" acquires that extra meaning only through an agreed-upon set of semantics, and there can be more than one such set, e.g. one in which it says "This is important and will be detailed".
AsciiDoc and Pandoc markup are quite different in their semantics, regardless of how each translates them, by syntactic convention, into output formats.
In programming, output formats such as HTML, PDF and EPUB can consequently carry meaning by form, that is, by semantics, with the syntax having disappeared as a temporary tool of translation. As a further consequence, the output can be scanned automatically for meaning, and the champion of 'grep' algorithms is Google: it can interpret the "what" in "What is it that is looked for" depending on whether it appears in a title, a footnote or a link.
Semantics can have more than one layer; even the textual message itself carries semantics (Chomsky) and could thus be translated as meaning by form, creating functional differences for anything else in the output chain, including a human being, the reader.
In conclusion, programmers and academics should be integrated: no academic should be without knowledge of his tools, just like any bread-and-butter carpenter. Programmers should be academics in the sense that the other end of the bridge they build is the end user, and the bridge is, very much, semantics.