Some Macro terms in Racket - scheme

I have been confused by these terms for a long time, so I think it is worth asking what exactly they mean:
A. syntax B. syntax value C. syntax object D. s-expression E. datum (in syntax->datum)
What's the difference between s-expression and symbol?
What's the difference between s-expression and datum?
What's the difference between (syntax, syntax value and syntax object) and an s-expression?
Code examples for explanation will be appreciated.

"Syntax" is a type for representing source code in Racket, which is a wrapper around an S-expression (see a recent blog post for details). "Syntax value" and "syntax object" are both synonyms for this, and in the ancient days of the mzscheme language, functions that dealt with syntax used syntax-value in the name. These days we use just "syntax" more often, and for a plural form we use "syntaxes".
An "S-expression" is either a primitive piece of data that can be typed in code (symbols, numbers, strings, booleans, etc -- in Racket you could also include other types), or a list of these things. An S-expression is therefore any nested structure of lists made of these primitive types at the fringe. Sometimes this includes vectors too (since they can be typed in using the #(...) syntax) but more usually they're left out.
Finally, "datum" is another name for an S-expression, used mostly when you want to stress that it's a piece of data that has an input representation. You can see how R5RS introduces it: <Datum> may be any external representation of a Scheme object [...]. This notation is used to include literal constants in Scheme code.
As for your questions:
What's the difference between s-expression and symbol?
A symbol is an S-expression; an S-expression may contain symbols.
What's the difference between s-expression and datum?
Nothing really. (Although there may be subtle differences in intent.)
What's the difference between (syntax, syntax value and syntax object) and an s-expression?
They are the representation of program syntax used by macros in Racket -- they contain the S-expressions, but they add source location information, lexical context, syntax properties, and certificates. See that blog post for a quick introduction.
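The wrapper idea can be sketched outside Racket as well. The following is only a rough model in Python, not Racket's implementation: the Syntax class, its fields, and the file name example.rkt are all hypothetical, and lexical context and syntax properties are left out. It shows a datum wrapped with source location at every level, and a function that strips the wrappers again, loosely in the spirit of syntax->datum.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Syntax:
    """Hypothetical stand-in for a syntax object: a datum plus metadata."""
    datum: Any      # a symbol (modelled as str), a number, or a list of Syntax
    source: str     # where the code was read from
    line: int
    column: int


def syntax_to_datum(stx):
    """Strip the wrappers recursively, leaving only the plain S-expression."""
    inner = stx.datum if isinstance(stx, Syntax) else stx
    if isinstance(inner, list):
        return [syntax_to_datum(item) for item in inner]
    return inner


# Models the source text "(+ 1 2)" read from a file called example.rkt.
stx = Syntax(
    [Syntax("+", "example.rkt", 1, 1),
     Syntax(1, "example.rkt", 1, 3),
     Syntax(2, "example.rkt", 1, 5)],
    "example.rkt", 1, 0,
)
datum = syntax_to_datum(stx)
print(datum)                  # ['+', 1, 2]
print(stx.datum[0].column)    # 1: the location survives only on the wrapper
```

The datum keeps the nested structure but loses the location data, which is the practical difference a macro writer sees between the two.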

Related

In Lisp/Racket/Scheme how is it possible to have an argument named `list`?

Isn't list a keyword used to create a new list in Lisp? Yet it is possible to have an argument called list in Lisp. I thought keywords in most programming languages, such as Java or C++, cannot be used as argument names. Is there a special reason why they can in Lisp?
The name list isn't a reserved keyword, it's an ordinary function. Reusing the name for another purpose can be confusing for the reader but doesn't present any problems for the language itself; it's the same as having two variables called x in different parts of the program.
Mainstream Lisp descendants and derivatives like Common Lisp and Scheme do not incorporate the concept of reserved keywords. It is alien to the way Lisp works.
When Lisp read syntax is scanned, identifier tokens which appear in it are converted into corresponding symbol objects. These tokens are all in the same lexical category: symbol.
When Lisp read syntax is scanned and turned into an object, such as a nested list representing program code, this is done without regard for the semantics (what the symbols mean).
This is different from the parsing of languages (such as some of those in the broad Fortran/Algol family) which have reserved keywords.
Roughly speaking, reserved keywords are tokens which look like symbols but are actually just punctuation. Lisp has punctuation also, like parentheses, sharpsign prefixes, various quotes and such.
These punctuation words have a fixed role in the phrase structure grammar, and the phrase structure grammar must be processed before the semantics of the program can be considered.
So for instance, the reserved BEGIN and END keywords in Pascal are essentially nothing more than verbose parentheses. The '(' and ')' tokens are similarly reserved in Lisp-like languages. Trying to use BEGIN as the name of a function or variable in Pascal is similar to trying to use ( as the name of a function or variable in Lisp.
Some languages have keywords which determine phrase structure, yet allow identifiers which look exactly like reserved keywords to be used anyway. For instance, PL/I was famous for this:
IF IF=THEN THEN THEN=ELSE; ELSE ELSE=IF
Lisp dialects may assign special semantic treatment to certain symbols or certain categories of symbols. This is a sort of reservation, but not exactly the same as reserved keywords, because it is at the semantic level. For instance, in Common Lisp, the symbols nil and t (more specifically the nil and t in the common-lisp package, common-lisp:nil and common-lisp:t) may not be used as function or variable names. When either one appears as an expression, it evaluates to itself: the value of t is t and that of nil is nil. Moreover, nil is also the Boolean false value and the empty list. So, effectively, these symbols are reserved in some regards. Common Lisp also has a keyword package. All symbols in that package evaluate to themselves and may not be used as variables. They may be used as function names, and for any other purpose.
You say Lisp, but the answer changes depending on which Lisp you're talking about.
In Common Lisp, you can use list as a variable because Common Lisp is a Lisp-2, meaning that each symbol has a separate slot for a function binding and a variable binding. Common Lisp sets the function binding for the symbol list in the CL package, but doesn't set the variable binding. You can't change the function binding because Common Lisp doesn't allow you to redefine bindings for symbols that are set in the CL package (you can, of course, use whatever symbols you like in your own packages), but since the variable binding is free you're allowed to use it.
Scheme is a Lisp-1, which means that it only has one binding per symbol. There's no separation of function bindings and variable bindings (which is why you use define in Scheme, but defun and defvar in CL). The reason you can use "list" as a variable is that Scheme doesn't prevent you from rebinding its built-in symbols. It's just generally a bad idea, since wherever list is rebound you can no longer call the list function.
Emacs Lisp is a Lisp-2 but doesn't prevent you from rebinding symbols, which means you can do things like (defun + (a b) (- a b)) and totally screw up your editing session. So... don't do that, unless you really know what you're doing.
Clojure is a Lisp-1. I don't have a working Clojure install at the moment so I can't comment on what it lets you do. I would suspect it's more strict than Scheme.
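The difference between a reserved keyword and an ordinary bound name can also be seen in Python, which, like a Lisp-1, keeps functions and variables in one namespace. This is only an analogy, not Lisp code, and the function total_length is a made-up example. The name list is a builtin rather than a keyword, so it can be used as a parameter, at the cost of hiding the builtin inside that function.

```python
import builtins
import keyword


def total_length(list):
    # Inside this function, `list` names the argument, not the builtin type.
    return sum(len(item) for item in list)


print(keyword.iskeyword("list"))     # False: an ordinary name
print(keyword.iskeyword("if"))       # True: reserved by the grammar
print(total_length(["ab", "cde"]))   # 5
print(list is builtins.list)         # True: the builtin is untouched outside
```

Trying to name a parameter if instead would be rejected by the parser before any meaning is assigned, which is the reserved-keyword behaviour described above.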

What is the difference between syntax and semantics in programming languages?

What is the difference between syntax and semantics in programming languages (like C, C++)?
TL; DR
In summary, syntax is the concept that concerns itself only with whether or not the sentence is valid for the grammar of the language. Semantics is about whether or not the sentence has a valid meaning.
Long answer:
Syntax is about the structure or the grammar of the language. It answers the question: how do I construct a valid sentence? All languages, even English and other human (aka "natural") languages have grammars, that is, rules that define whether or not the sentence is properly constructed.
Here are some C language syntax rules:
separate statements with a semi-colon
enclose the conditional expression of an IF statement inside parentheses
group multiple statements into a single statement by enclosing in curly braces
data types and variables must be declared before the first executable statement (this restriction was dropped in C99; C99 and later allow declarations to be mixed with statements)
Semantics is about the meaning of the sentence. It answers the questions: is this sentence valid? If so, what does the sentence mean? For example:
x++; // increment
foo(xyz, --b, &qrs); // call foo
are syntactically valid C statements. But what do they mean? Is it even valid to attempt to transform these statements into an executable sequence of instructions? These questions are at the heart of semantics.
Consider the ++ operator in the first statement. First of all, is it even valid to attempt this?
If x is a struct (or any other non-scalar type), this statement has no meaning (according to the C language rules) and thus it is an error even though the statement is syntactically correct.
If x is a pointer to some data type, the meaning of the statement is to "add sizeof(some data type) to the value at address x and store the result into the location at address x".
If x is an integer or floating-point variable, the meaning of the statement is "add one to the value at address x and store the result into the location at address x".
Finally, note that some semantics can not be determined at compile-time and therefore must be evaluated at run-time. In the ++ operator example, if x is already at the maximum value for its data type, what happens when you try to add 1 to it? Another example: what happens if your program attempts to dereference a pointer whose value is NULL?
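The same split can be observed in Python, where the parser and the evaluator are separately callable. The sketch below is an illustration in a different language from the C examples above, and the helper name classify is made up. It shows one input rejected by the grammar, one that parses but has no valid meaning for its operand types, and one that is fine on both counts.

```python
import ast


def classify(source):
    """Return 'syntax error', 'semantic error', or 'ok' for an expression."""
    try:
        tree = ast.parse(source, mode="eval")    # checks the grammar only
    except SyntaxError:
        return "syntax error"
    try:
        # Evaluating the tree is where the expression is given a meaning.
        eval(compile(tree, "<example>", "eval"), {})
    except TypeError:
        return "semantic error"
    return "ok"


print(classify("1 + )"))       # syntax error
print(classify("'abc' + 1"))   # semantic error
print(classify("1 + 2"))       # ok
```

In Python the type problem is only found at run time, whereas a C compiler would report the equivalent mistake at compile time.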
Syntax refers to the structure of a language, tracing its etymology to how things are put together.
For example you might require the code to be put together by declaring a type then a name and then a semicolon, to be syntactically correct.
Type token;
On the other hand, the semantics is about meaning.
A compiler or interpreter could complain about syntax errors. Your co-workers will complain about semantics.
Semantics is what your code means--what you might describe in pseudo-code. Syntax is the actual structure--everything from variable names to semi-colons.
Wikipedia has the answer. Read syntax (programming languages) & semantics (computer science) wikipages.
Or think about the work of any compiler or interpreter. The first step is lexical analysis, where tokens are generated by dividing the input string into lexemes. Then comes parsing, which builds an abstract syntax tree (a representation of the syntax). The next steps involve transforming or evaluating this AST (semantics).
Also, observe that if you defined a variant of C where every keyword was transformed into its French equivalent (so if becoming si, do becoming faire, else becoming sinon, etc.) you would definitely change the syntax of your language, but you wouldn't change the semantics much: programming in that French-C wouldn't be any easier!
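These stages can be tried out with Python's standard library, which exposes its own tokenizer and parser. This is a small sketch of the idea rather than a description of how any particular C compiler works. Note that ast.parse tokenizes the source again internally instead of consuming the token list built in step 1.

```python
import ast
import io
import tokenize

source = "x = 1 + 2\n"

# Step 1, lexical analysis: split the character string into tokens.
significant = (tokenize.NAME, tokenize.OP, tokenize.NUMBER)
tokens = [tok.string
          for tok in tokenize.generate_tokens(io.StringIO(source).readline)
          if tok.type in significant]
print(tokens)                           # ['x', '=', '1', '+', '2']

# Step 2, parsing: build an abstract syntax tree from the source.
tree = ast.parse(source)
statement = tree.body[0]
print(type(statement).__name__)         # Assign
print(type(statement.value).__name__)   # BinOp

# Step 3, semantics: evaluate the tree to give the program its meaning.
namespace = {}
exec(compile(tree, "<example>", "exec"), namespace)
print(namespace["x"])                   # 3
```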
You need correct syntax to compile.
You need correct semantics to make it work.
Late to the party - but to me, the answers here seem correct but incomplete.
Pragmatically, I would distinguish between three levels:
Syntax
Low level semantics
High level semantics
1. SYNTAX
Syntax is the formal grammar of the language, which specifies a well-formed statement the compiler will recognise.
So in C, the syntax of variable initialisation is:
data_type variable_name = value_expression;
Example:
int volume = 66 * 22 * 55;
While in Go, which offers type inference, one form of initialisation is:
variable_name := value_expression
Example:
volume := 66 * 22 * 55
Clearly, a Go compiler won't recognise the C syntax, and vice versa.
2. LOW LEVEL SEMANTICS
Where syntax is concerned with form, semantics is concerned with meaning.
In natural languages, a sentence can be syntactically correct but semantically meaningless. For example:
The man bought the infinity from the store.
The sentence is grammatically correct but doesn't make real-world sense.
At the low level, programming semantics is concerned with whether a statement with correct syntax is also consistent with the semantic rules as expressed by the developer using the type system of the language.
For example, this is a syntactically correct assignment statement in Java, but semantically it's an error as it tries to assign an int to a String
String firstName = 23;
So type systems are intended to protect the developer from unintended slips of meaning at the low level.
Loosely typed languages like JavaScript or Python provide very little semantic protection, while languages like Haskell or F# with expressive type systems provide the skilled developer with a much higher level of protection.
For example, in F# your ShoppingCart type can specify that the cart must be in one of three states:
type ShoppingCart =
| EmptyCart // no data
| ActiveCart of ActiveCartData
| PaidCart of PaidCartData
Now the compiler can check that your code hasn't tried to put the cart into an illegal state.
In Python, you would have to write your own code to check for valid state.
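A minimal sketch of such a hand-written check might look like the following. The CartState enum and the constructor check are assumptions made for illustration; only the ShoppingCart name and the three states come from the F# example above.

```python
from enum import Enum


class CartState(Enum):
    EMPTY = "empty"
    ACTIVE = "active"
    PAID = "paid"


class ShoppingCart:
    def __init__(self, state):
        # This check runs only when the line executes, not at compile time.
        if not isinstance(state, CartState):
            raise ValueError(f"illegal cart state: {state!r}")
        self.state = state


cart = ShoppingCart(CartState.EMPTY)
print(cart.state.name)          # EMPTY

try:
    ShoppingCart("refunded")    # not one of the three states
except ValueError as error:
    print(error)                # illegal cart state: 'refunded'
```

The mistake is still caught, but only if the offending line is actually executed, for instance by a test.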
3. HIGH LEVEL SEMANTICS
Finally, at a higher level, semantics is concerned with what the code is intended to achieve - the reason that the program is being written.
This can be expressed as pseudo-code which could be implemented in any complete language. For example:
// Check for an open trade for EURUSD
// For any open trade, close if the profit target is reached
// If there is no open trade for EURUSD, check for an entry signal
// For an entry signal, use risk settings to calculate trade size
// Submit the order.
In this (heroically simplified) scenario, you are making a high-level semantic error if your system enters two trades at once for EURUSD, enters a trade in the wrong direction, miscalculates the trade size, and so on.
TL; DR
If you screw up your syntax or low-level semantics, your compiler will complain.
If you screw up your high-level semantics, your program isn't fit for purpose and your customer will complain.
Syntax is the structure or form of expressions, statements, and program units but Semantics is the meaning of those expressions, statements, and program units. Semantics follow directly from syntax.
Syntax refers to the structure/form of the code that a specific programming language specifies but Semantics deal with the meaning assigned to the symbols, characters and words.
Understanding how the compiler sees the code
Usually, syntax and semantics analysis of the code is done in the 'frontend' part of the compiler.
Syntax: The compiler generates a token for each keyword and symbol; each token records its type and its location in the code.
Using these tokens, an AST (short for Abstract Syntax Tree) is created and analysed.
What the compiler actually checks here is whether the code is syntactically well formed, i.e. does the sequence of tokens comply with the language rules? As suggested in previous answers, you can see it as the grammar of the language (not the sense/meaning of the code).
Side note: Syntax errors are reported in this phase (tokens are returned to the system with the error type).
Semantics: Now, the compiler will check whether the operations in your code make sense.
e.g. if the language performs type checking, a semantic error will be reported if you try to assign a string to a float, or declare the same variable twice.
These are errors in code that is grammatically (syntactically) correct but makes no sense as an operation.
Side note: To check whether the same variable is declared twice, the compiler maintains a symbol table.
So, the output of these 2 frontend phases is an annotated AST(with data types) and symbol table.
Understanding it in a less technical way
Considering the normal language we use; here, English:
e.g. He go to the school. - Incorrect grammar/syntax, though he wanted to convey a correct sense/semantic.
e.g. He goes to the cold. - cold is an adjective. In English, we might say this doesn't comply with grammar, but it actually is the closest example to incorrect semantic with correct syntax I could think of.
He drinks rice (wrong semantic- meaningless, right syntax- grammar)
Hi drink water (right semantic- has meaning, wrong syntax- grammar)
Syntax: This refers to the grammatical structure of the language. If you are writing C, you have to take care with how you use data types and tokens (a token can be a literal or a symbol; for example "printf()" consists of 3 tokens: "printf", "(" and ")"). In the same way, you have to be careful with how you use functions: their syntax, declaration, definition, initialization and calls.
Semantics, on the other hand, concerns the logic or concept behind a sentence or statement. If you say or write something that does not fit the concept or logic, then you are semantically wrong.

What is the formal term for the "#{}" token in Ruby syntax?

The Background
I recently posted an answer where I variously referred to #{} as a literal, an operator, and (in one draft) a "literal constructor." The squishiness of this definition didn't really affect the quality of the answer, since the question was more about what it does and how to find language references for it, but I'm unhappy with being unable to point to a canonical definition of exactly what to call this element of Ruby syntax.
The Ruby manual mentions this syntax element in the section on expression substitution, but doesn't really define the term for the syntax itself. Almost every reference to this language element says it's used for string interpolation, but doesn't define what it is.
Wikipedia Definitions
Here are some Wikipedia definitions that imply this construct is (strictly speaking) neither a literal nor an operator.
Literal (computer programming)
Operator (programming)
The Questions
Does anyone know what the proper term is for this language element? If so, can you please point me to a formal definition?
Ruby's parser calls #{} the "embexpr" operator. That's EMBedded EXPRession, naturally.
I would definitely call it neither a literal (that's more for, e.g. string literals or number literals themselves, but not parts thereof) nor an operator; those are solely for e.g. binary or unary (infix) operators.
I would either just refer to it without a noun (i.e. for string interpolation), or perhaps call those characters the string interpolation sequence or escape.
TL;DR
Originally, I'd hypothesized:
Embedded expression seems the most likely definition for this token, based on hints in the source code.
This turned out to be true, and has been officially validated by the Ruby 2.x documentation. Based on the updates to the Ripper documentation since this answer was originally written, it seems the parser token is formally defined as string_embexpr and the symbol itself is called an "embedded expression." See the Update for Ruby 2.x section at the bottom of this answer for detailed corroboration.
The remainder of the answer is still relevant, especially for older Rubies such as Ruby 1.9.3, and the methodology used to develop the original answer remains interesting. I am therefore updating the answer, but leaving the bulk of the original post as-is for historical purposes, even though the current answer could now be shorter.
Pre-2.x Answer Based on Ruby 1.9.3 Source Code
Related Answer
This answer calls attention to the Ruby source, which makes numerous references to embexpr throughout the code base. @Phlip suggests that this variable is an abbreviation for "EMBedded EXPRession." This seems like a reasonable interpretation, but neither the ruby-1.9.3-p194 source nor Google (as of this writing) explicitly references the term embedded expression in association with embexpr in any context, Ruby-related or not.
Additional Research
A scan of the Ruby 1.9.3-p194 source code with:
ack-grep -cil --type-add=YACC=.y embexpr .rvm/src/ruby-1.9.3-p194 |
sort -rnk2 -t: |
sed 's!^.*/!!'
reveals 9 files and 33 lines with the term embexpr:
test_scanner_events.rb:12
test_parser_events.rb:7
eventids2.c:5
eventids1.c:3
eventids2table.c:2
parse.y:1
parse.c:1
ripper.y:1
ripper.c:1
Of particular interest is the inclusion of string_embexpr on line 4,176 of the parse.y and ripper.y bison files. Likewise, TestRipper::ParserEvents#test_string_embexpr contains two references to parsing #{} on lines 899 and 902 of test_parser_events.rb.
The scanner, exercised in test_scanner_events.rb, is also noteworthy. This file defines tests in #test_embexpr_beg and #test_embexpr_end that scan for the token #{expr} inside various string expressions. The tests reference both embexpr and expr, raising the likelihood that "embedded expression" is indeed a sensible name for the thing.
Update for Ruby 2.x
Since this post was originally written, the documentation for the standard library's Ripper class has been updated to formally identify the token. The usage section provides "Hello, #{world}!" as an example, and says in part:
Within our :string_literal you’ll notice two #tstring_content, this is the literal part for Hello, and !. Between the two #tstring_content statements is a :string_embexpr, where embexpr is an embedded expression.
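For comparison only, a similar parse-tree shape can be inspected in Python, whose f-strings play the role that #{} plays in Ruby. This is an analogy in another language, not Ripper output, and it assumes Python 3.8 or later. The literal pieces and the embedded expression show up as separate node types in the tree.

```python
import ast

# Parse an f-string and list the node type of each of its parts.
tree = ast.parse('f"Hello, {world}!"', mode="eval")
parts = [type(node).__name__ for node in tree.body.values]
print(parts)   # ['Constant', 'FormattedValue', 'Constant']
```

The two Constant nodes correspond roughly to the tstring_content pieces, and FormattedValue to the embedded expression between them.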
This blog post suggests it is called an 'idiom':
http://kconrails.com/2010/12/08/ruby-string-interpolation/
The Wikipedia Article doesn't seem to contradict that:
http://en.wikipedia.org/wiki/Programming_idiom
#{} is called a placeholder and is used to reference variables within a string.
puts "My name is #{my_name}"

What are some example use cases for symbol literals in Scala?

The use of symbol literals is not immediately clear from what I've read up on Scala. Would anyone care to share some real world uses?
Is there a particular Java idiom being covered by symbol literals? What languages have similar constructs? I'm coming from a Python background and not sure there's anything analogous in that language.
What would motivate me to use 'HelloWorld vs "HelloWorld"?
Thanks
In Java terms, symbols are interned strings. This means, for example, that reference equality comparison (eq in Scala and == in Java) gives the same result as normal equality comparison (== in Scala and equals in Java): 'abcd eq 'abcd will return true, while "abcd" eq "abcd" might not, depending on JVM's whims (well, it should for literals, but not for strings created dynamically in general).
Other languages which use symbols are Lisp (which uses 'abcd like Scala), Ruby (:abcd), Erlang and Prolog (abcd; they are called atoms instead of symbols).
I would use a symbol when I don't care about the structure of a string and use it purely as a name for something. For example, if I have a database table representing CDs, which includes a column named "price", I don't care that the second character in "price" is "r", or about concatenating column names; so a database library in Scala could reasonably use symbols for table and column names.
If you have plain strings representing, say, method names in code, that perhaps get passed around, you're not quite conveying things appropriately. This is sort of the Data/Code boundary issue; it's not always easy to draw the line, but if we were to say that in that example those method names are more code than they are data, then we want something to clearly identify that.
A Symbol Literal comes into play where it clearly differentiates any old string data from a construct being used in the code. It's there for when you want to indicate that this isn't just some string data, but is in some way part of the code. The idea is that things like your IDE would highlight it differently and, given the tooling, you could refactor on those, rather than doing text search/replace.
This link discusses it fairly well.
Note: Symbols will be deprecated and then removed in Scala 3 (dotty).
Reference: http://dotty.epfl.ch/docs/reference/dropped-features/symlits.html
Because of this, I personally recommend not using Symbols anymore (at least in new scala code). As the dotty documentation states:
Symbol literals are no longer supported
it is recommended to use a plain string literal [...] instead
Python maintains an internal global table of "interned strings" with the names of all variables, functions, modules, etc. With this table, the interpreter can make faster searches and optimizations. You can force this process with the intern function (sys.intern in Python 3).
Also, Java and Scala automatically use "interned strings" for faster searches. In Scala, you can use the intern method to force a string to be interned, but this process doesn't work with all strings. Symbols benefit from being guaranteed to be interned, so a single reference equality check is sufficient to prove either equality or inequality.
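The effect of interning on identity checks can be tried directly in Python 3 with sys.intern. This is a small sketch; whether two equal strings built at run time happen to share an object is a CPython implementation detail, so that line carries no guarantee.

```python
import sys

# Build two equal strings at run time so that they are separate objects.
parts = ["hello", "world"]
a = " ".join(parts)
b = " ".join(parts)

print(a == b)    # True: same characters
print(a is b)    # typically False in CPython: two separate objects

ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # True: both names refer to the single interned copy
```

Once both sides are interned, an identity check is enough to decide equality, which is the property symbols give you by construction.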

Is a symbol table in Ruby any different from a symbol table in other languages

The wikipedia entry on Symbol tables is a good reference:
http://en.wikipedia.org/wiki/Symbol_table
But as I try to understand symbols in Ruby and how they are represented in the Array of Symbols (returned by the Symbol.all_symbols method),
I'm wondering whether Ruby's approach to the symbol table has any important differences from other languages?
Ruby doesn't really have a "symbol table" in that sense. It has bindings, and symbols (what lispers call atoms) but it isn't really doing it the way that article describes.
So in answer to your question: it isn't so much that ruby has the same thing done differently, but rather that it does two different things (:xxx notation --> unique ids and bindings in scopes) and uses similar / overlapping terminology for them.
To clarify:
The article you link to gives the conventional definition of a symbol table, to wit
where each identifier in a program's source code is associated with information relating to its declaration or appearance in the source, such as its type, scope level and sometimes its location
But this isn't what ruby's symbol table does. It just provides a globally unique identity for a certain class of objects which can be written as :something in the source code, including things like :+ and :"Hi bob!" which aren't identifiers. Also, merely using an identifier will not create a corresponding symbol. And finally, none of the information listed in the passage above is stored in ruby's list of symbols.
It's a coincidence of naming, and reading that article will not help you understand ruby's symbols.
The biggest difference is that (like Lisp) Ruby actually has a syntax for symbols, and it's easy to add/remove things at runtime yourself. If you say :balloon (or "balloon".intern) it will intern that for you. Even though you're referring to it by name in your source, internally it's just a pointer in the symbol table. If you compare symbols, it's just a pointer-compare, not a string-compare.
Languages like C don't really have a way to say simply "create a new symbol for me" at runtime. You can do it implicitly at compile-time by defining a function, but that's really its only use. Since C has no syntax for symbols, if you want to be able to say Balloon in your program but be able to compare it with a single machine instruction, you use enums (or #defines).
In Ruby, it takes only one character to make a symbol, so you can use it for all kinds of things (like hash keys).
Symbols in Ruby are used where other languages tend to use enums, defines, constants and the like. They're also often used for associative keys. Their use has little to do with a symbol table as discussed in that article, except that they obviously exist in one.
