In an answer to another SO question, passing mention was made of a Mathematica string escape syntax that looks like this: "\<...\>". Apparently, this syntax is useful for dealing with embedded newlines in strings. I've tried searching for documentation using various terms in the Mathematica help browser (and Google), but to no avail. Where can I find documentation on this syntax?
Answer Summary
@Mark points out that the construct is documented in Section 2.8.6 of the V5 Mathematica book. It is also mentioned in Section 2.8.7 of the V5.2 Mathematica Book. In both of those locations, the documentation states that Mathematica ignores line breaks and following tabs in strings -- unless they are enclosed between \< and \> in which case the line breaks (but not tabs) are retained.
In the corresponding section of the V6 documentation, it states that line breaks are retained in strings. Therefore, it appears that the escape syntax is no longer needed starting in V6 -- and is therefore no longer documented.
Note that many notebooks still use this syntax internally, even in V8. @Alexey points out that the cell expressions for strings that contain line breaks still use the syntax.
It's mentioned briefly in section 2.8.6 of the most recent edition of The Mathematica Book. Of course, that's for V5 of Mathematica. In fact, I just tried the following in both V5.2 and V6.0.3:
"Hi
There"
The results were quite different. In particular, V5.2 returned a single line, with no newline character. V6.0.3 returned two lines with the newline character formatted as expected. Strings were improved considerably in V6, so perhaps the "\<...\>" construct is no longer required.
And perhaps I've been using Mathematica for way too long. :)
Related
The man page for apparmor.d (5) uses the syntax element AARE in several places, such as in SIGNAL PEER = 'peer' '=' AARE.
The definition for AARE is this:
AARE = ?*[]{}^
See below for meanings.
My question now is: where is this "below"? I suspect AARE to maybe mean "AppArmor Regular Expression" but I simply cannot find anywhere in the AA documentation any explanation for it. Googling or DuckDuckGo'ing for "apparmor AARE" simply draws blanks except for the man page where I cannot find any explanation "below". There is a section about "Globbing" but it is totally unclear to me if the AAREs (regular expressions?) are actually referring to "globbing" -- but then globbing isn't regular expressions.
So what are AAREs, and what are some examples of their syntax? I really don't understand the meaning of ?*[]{}^.
Based on a discussion of my original question on the AppArmor mailing list, the simplified answer is: yes, AARE means "AppArmor Regular Expression". However, AAREs are closer to (shell) glob expressions, but with additional AppArmor variable expansion using the @{VAR} syntax. This is not to be confused with the glob syntax of alternatives {A,B,C}, which is also supported.
While there have been discussions about man page updates, I don't see them live in production yet; the proposed changes were supposed to concern the Globbing section.
I noticed in the Advanced Bash-Scripting Guide, that multiline comments are denoted as #+ rather than simply #. E.g. here.
(there is also a #% used in that particular example, denoting something like a bullet list(?), but this is literally the only location in the document where this is used, whereas the #+ syntax is used extensively)
I was wondering if this is some sort of convention, or if there is a particular reason for it other than the fact it just looks nice.
I note that it specifically seems to denote lines that are meant to be a continuation on a single line, rather than multi-line comments in general, so I'm wondering if it was simply done internally for parsing / documentation generation.
Has anyone else encountered this out in the wild before? Does anyone actually use this style?
Is it possible to invert the behavior of break hints in the Format module, e.g. using standard spaces as break hints, and adding some special notation for non-breakable spaces?
The current behavior leads to situations where one might be inclined to write "Hello@ world,@ this@ is@ a@ short@ phrase", where every space is converted into a break hint to mimic the behavior as seen e.g. in text editors, HTML renderers, etc.
For instance, the "Using the Format module" documentation explicitly recommends using break hints:
Generally speaking, a printing routine using "format", should not directly output white spaces: the routine should use break hints instead.
This behavior not only complicates writing messages, but it also makes it very hard to grep strings in the source code.
It seems that following the established convention that "every space is a break hint, unless marked as non-breaking space" would be a better alternative.
Are there simple techniques to wrap such strings and invert their behavior, preferably ones that neither incur excessive runtime cost nor lead to typing issues (e.g. due to conversions between string and format)?
Since 4.02 there is a Format.pp_print_text function that takes ordinary text and prints it, replacing each space with a break hint (as pp_print_space does) and each \n with a forced newline (as pp_force_newline does).
Using this function you can still print using printf and other convenience functions:
Format.printf "text '%a'" Format.pp_print_text "hello, this is a short phrase"
In general your question is more about library design. So, it is hard to answer anything about it. It is more suited for discussion on the OCaml mailing list.
Background:
I am implementing a language similar to Ruby, called Sapphire, as a way to try out some ideas I have on concurrency in programming languages. I am trying to copy Ruby's double-quoted strings with embedded code, which I find very useful as a programmer.
Question:
How do any of the Ruby interpreters turn a double-quoted string with embedded code into an AST?
eg:
puts "The value of foo is #{@foo}."
puts "this is an example of unmatched braces in code: #{ foo.go('}') }"
Details:
The problem I have is how to decide which } closes the code block. Code blocks can have other braces within them and with a little effort they can be unmatched. The lexer can find the beginning of a code block in a string, but without the aid of the parser, it cannot know for sure which character is the end of that block.
It looks like Ruby's parse.y file does both the lexing and parsing steps, but reading that thing is a nightmare: it is 11628 lines long, with no comments and lots of abbr.
True, Yacc files can be a bit daunting to read at first and parse.y is not the best file to start with. Have you looked at the various string production rules? Do you have any specific questions?
As for the actual parsing, it's indeed not uncommon for lexers to also parse numeric literals and strings; see e.g. the accepted answer to a similar question here on SO. If you approach things this way, it's not too hard to see how to go about it. Hitting #{ inside a string basically starts a new parsing context that gets parsed as an expression again. This means that the first } in your example can't be the terminating one for the interpolation, since it's part of a literal string within the expression. Once you reach the end of the expression (keep in mind expression separators like ;), the next } is the one you need.
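To make the idea concrete, here is a minimal sketch in Ruby. It is not what parse.y does: instead of parsing a full expression, it is a simplified stand-in that only tracks brace depth and quoted literals, which is enough to show why the } inside '}' cannot close the interpolation. The helper name find_interpolation_end is hypothetical, and the sketch assumes that a nested string literal does not itself contain an interpolation with quotes in it.

```ruby
# Hypothetical helper (not part of Ruby): returns the index of the "}" that
# closes the interpolation whose "#{" begins at index `start` of `src`,
# or nil if the interpolation is never closed.
def find_interpolation_end(src, start)
  i = start + 2   # skip the opening "#{"
  depth = 1       # braces opened but not yet closed
  quote = nil     # quote character of the literal we are inside, if any

  while i < src.length
    ch = src[i]
    if quote
      if ch == "\\"
        i += 1            # skip the escaped character
      elsif ch == quote
        quote = nil       # the nested literal has ended
      end
    elsif ch == "'" || ch == '"'
      quote = ch          # entering a nested string literal
    elsif ch == "{"
      depth += 1
    elsif ch == "}"
      depth -= 1
      return i if depth.zero?
    end
    i += 1
  end
  nil
end

# %q keeps the text literal, so "#{" is not interpolated here.
src = %q(unmatched braces in code: #{ foo.go('}') } end)
start = src.index('#{')
stop = find_interpolation_end(src, start)
puts src[(start + 2)...stop].strip
```

A real implementation would hand the text after #{ to the expression parser and let the grammar decide where the expression ends; the counter above only approximates that.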
This is not a complete answer, but I leave it in hopes that it might be useful either to me or one who follows me.
Matz gives a pretty detailed rundown of the yylex() function of parse.y in chapter 11 of his book. It does not directly mention strings, but it does describe how the lexer uses lex_state to resolve several locally ambiguous constructs in Ruby.
A reproduction of an English translation of this chapter can be found here.
Please bear in mind that they don't have to (create an AST at compile time).
Ruby strings can be assembled at runtime and will interpolate correctly. Therefore all the parsing and evaluation machinery has to be available at runtime. Any work done at compile time in that sense could be considered an optimisation.
So why does this matter? Because there are very effective stack-based techniques for parsing and evaluating expressions that do not create or decorate an AST. The string is read (parsed) from left to right, and as embedded tokens are encountered they are either evaluated or pushed on a stack, or cause stack contents to be popped and evaluated.
This is a simple technique to implement provided the expressions are relatively simple. If you really want the full power of the language inside every string, then you need the full compiler at runtime. Not everyone does.
Disclosure: I wrote a commercial language product that does exactly this.
Dart also supports expressions interpolated into strings like Ruby, and I've skimmed a few parsers for it. I believe what they do is define separate tokens for a string literal preceding interpolation and a string literal at the end. So if you tokenize:
"before ${the + expression} after"
You would get tokens like:
STRING_START "before "
IDENTIFIER the
PLUS
IDENTIFIER expression
STRING " after"
Then in your parser, it's a pretty straightforward process of handling STRING_START to parse the interpolated expression(s) following it.
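As a rough illustration of that token split, here is a small Ruby sketch that tokenizes the Dart-style example above. The method name tokenize_interpolated and the token names are assumptions made for this example, not taken from any Dart parser, and the embedded expression is restricted to identifiers and + with no nested braces or strings.

```ruby
require "strscan"

# Hypothetical tokenizer for a double-quoted literal containing ${...} parts.
# Text before an interpolation becomes STRING_START, the trailing text becomes
# STRING, and the expression in between is split into its own tokens.
def tokenize_interpolated(literal)
  body = literal[1...-1]                 # drop the surrounding double quotes
  tokens = []
  ss = StringScanner.new(body)

  until ss.eos?
    text = ss.scan_until(/\$\{/)
    if text.nil?
      tokens << [:STRING, ss.rest]       # no more interpolations: final segment
      break
    end
    tokens << [:STRING_START, text[0...-2]]   # strip the trailing "${"

    expr = ss.scan_until(/\}/) or raise ArgumentError, "unterminated interpolation"
    # Simplification: only identifiers and "+" are recognised in the expression.
    expr[0...-1].scan(/[A-Za-z_]\w*|\+/) do |lexeme|
      tokens << (lexeme == "+" ? [:PLUS] : [:IDENTIFIER, lexeme])
    end
  end
  tokens
end

tokenize_interpolated('"before ${the + expression} after"').each { |tok| p tok }
```

With the tokens separated like this, the parser can treat STRING_START as the signal to parse one expression and then expect either another STRING_START or the closing STRING.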
Our Ruby parser (see my bio) treats Ruby "strings" as complex objects having lots of substructures, including string start and end tokens, bare string literal fragments, lots of funny punctuation sequences representing the various regexp operators, and of course, recursively, most of Ruby itself for expressions nested inside such strings.
This is accomplished by allowing the lexer to detect and generate such string fragments in a (for Ruby, many) special lexing modes. The parser has a (sub)grammar that defines valid sequences of tokens. And that kind of parsing solves OP's original problem; the parser knows whether a curly brace matches other curly braces from the regexp content, and/or if the regexp has been completely assembled and the curly brace is a matching block end.
Yes, it builds an AST of the Ruby code, and of the regexps.
The purpose of all this is to allow us to build analyzers and transformers of Ruby code. See https://softwarerecs.stackexchange.com/q/11779/101
The Background
I recently posted an answer where I variously referred to #{} as a literal, an operator, and (in one draft) a "literal constructor." The squishiness of this definition didn't really affect the quality of the answer, since the question was more about what it does and how to find language references for it, but I'm unhappy with being unable to point to a canonical definition of exactly what to call this element of Ruby syntax.
The Ruby manual mentions this syntax element in the section on expression substitution, but doesn't really define the term for the syntax itself. Almost every reference to this language element says it's used for string interpolation, but doesn't define what it is.
Wikipedia Definitions
Here are some Wikipedia definitions that imply this construct is (strictly speaking) neither a literal nor an operator.
Literal (computer programming)
Operator (programming)
The Questions
Does anyone know what the proper term is for this language element? If so, can you please point me to a formal definition?
Ruby's parser calls #{} the "embexpr" operator. That's EMBedded EXPRession, naturally.
I would definitely call it neither a literal (that's more for, e.g. string literals or number literals themselves, but not parts thereof) nor an operator; those are solely for e.g. binary or unary (infix) operators.
I would either just refer to it without a noun (i.e. for string interpolation), or perhaps call those characters the string interpolation sequence or escape.
TL;DR
Originally, I'd hypothesized:
Embedded expression seems the most likely definition for this token, based on hints in the source code.
This turned out to be true, and has been officially validated by the Ruby 2.x documentation. Based on the updates to the Ripper documentation since this answer was originally written, it seems the parser token is formally defined as string_embexpr and the symbol itself is called an "embedded expression." See the Update for Ruby 2.x section at the bottom of this answer for detailed corroboration.
The remainder of the answer is still relevant, especially for older Rubies such as Ruby 1.9.3, and the methodology used to develop the original answer remains interesting. I am therefore updating the answer, but leaving the bulk of the original post as-is for historical purposes, even though the current answer could now be shorter.
Pre-2.x Answer Based on Ruby 1.9.3 Source Code
Related Answer
This answer calls attention to the Ruby source, which makes numerous references to embexpr throughout the code base. @Phlip suggests that this variable is an abbreviation for "EMBedded EXPRession." This seems like a reasonable interpretation, but neither the ruby-1.9.3-p194 source nor Google (as of this writing) explicitly references the term embedded expression in association with embexpr in any context, Ruby-related or not.
Additional Research
A scan of the Ruby 1.9.3-p194 source code with:
ack-grep -cil --type-add=YACC=.y embexpr .rvm/src/ruby-1.9.3-p194 |
sort -rnk2 -t: |
sed 's!^.*/!!'
reveals 9 files and 33 lines with the term embexpr:
test_scanner_events.rb:12
test_parser_events.rb:7
eventids2.c:5
eventids1.c:3
eventids2table.c:2
parse.y:1
parse.c:1
ripper.y:1
ripper.c:1
Of particular interest is the inclusion of string_embexpr on line 4,176 of the parse.y and ripper.y bison files. Likewise, TestRipper::ParserEvents#test_string_embexpr contains two references to parsing #{} on lines 899 and 902 of test_parser_events.rb.
The scanner, exercised in test_scanner_events.rb, is also noteworthy. This file defines tests in #test_embexpr_beg and #test_embexpr_end that scan for the token #{expr} inside various string expressions. The tests reference both embexpr and expr, raising the likelihood that "embedded expression" is indeed a sensible name for the thing.
Update for Ruby 2.x
Since this post was originally written, the documentation for the standard library's Ripper class has been updated to formally identify the token. The usage section provides "Hello, #{world}!" as an example, and says in part:
Within our :string_literal you’ll notice two @tstring_content, this is the literal part for Hello, and !. Between the two @tstring_content statements is a :string_embexpr, where embexpr is an embedded expression.
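This can be checked directly with Ripper from the standard library. The sketch below uses the same "Hello, #{world}!" example; Ripper.sexp and Ripper.lex are real methods, while parser_events and scanner_events are hypothetical helper names introduced here for illustration.

```ruby
require "ripper"

# Hypothetical helper: every symbol found in the parse tree for `source`.
def parser_events(source)
  Ripper.sexp(source).flatten.grep(Symbol)
end

# Hypothetical helper: the scanner event name of each token in `source`.
def scanner_events(source)
  Ripper.lex(source).map { |_pos, event, _text| event }
end

# Single quotes keep "#{world}" as literal source text for Ripper to parse.
source = '"Hello, #{world}!"'

# The parser reports the interpolation as a :string_embexpr node.
puts parser_events(source).include?(:string_embexpr)

# The scanner reports "#{" and "}" as on_embexpr_beg and on_embexpr_end.
p scanner_events(source)
```

The scanner event names match the #test_embexpr_beg and #test_embexpr_end tests mentioned earlier, and the parser node matches string_embexpr in parse.y.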
This blog post suggests it is called an 'idiom':
http://kconrails.com/2010/12/08/ruby-string-interpolation/
The Wikipedia article doesn't seem to contradict that:
http://en.wikipedia.org/wiki/Programming_idiom
#{} is called a placeholder and is used to reference variables within a string.
puts "My name is #{my_name}"