Stream implementations - data-structures

Can streams be implemented in DrRacket as they are in SICP?

Short answer: yes.
In fact, Racket has several things that might be useful to you in this regard; there are streams and stream constructors, generators, sequences, and a lazy language, along with the 'delay' and 'force' that (IIRC) SICP uses.
Just to get you started, here's the documentation page for Delayed Evaluation, which is probably closest to what you're asking about.
http://docs.racket-lang.org/reference/Delayed_Evaluation.html

Related

Scheme and Lisp best practices: recursion yes for Scheme, no for Lisp?

As I understand -- and I'm here to be corrected if wrong -- good Scheme practice is to do anything requiring looping, repeating with recursion, and, furthermore, overflow won't be a problem because tail-recursion is built-in. Lisp, however, doesn't have protection from overflow, hence, all the loop iterative macros (loop, while, etc). And so in real-world Lisp you usually don't use recursion, while Scheme wants it all but exclusively.
If my assumptions are true, is there a way to be "pure" with Lisp and not risk overflow? Or is this too much swimming against the stream to use recursion in Lisp? I recall from The Little Schemer how they give you a thorough workout with recursion. And there was an earlier edition called The Little Lisper. Did it give you the same recursion workout in Lisp? And then Land of Lisp left me confused about whether loops or recursion was "best practice."
What I'm trying to do is decide whether to use Racket inside of Emacs Org-mode or just use built-in Elisp for beginner students. I want students to stay as purely functional as possible, e.g., I don't want to explain the very new and difficult topic of recursion, then say "Oh, but we won't be using it..."
As I understand -- and I'm here to be corrected if wrong -- good Scheme practice is to do anything requiring looping, repeating with recursion, and, furthermore, overflow won't be a problem because tail-recursion is built-in.
This is correct as far as I know.
Lisp, however, doesn't have protection from overflow [...]
Not exactly. Most self-respective Common Lisp implementations provide tail-call merging (with some restrictions, see https://0branch.com/notes/tco-cl.html). The difference is that there is no requirements from the language specification to have it. That grants compiler writers more freedom when implementing various Common Lisp features. Emacs Lisp does not have TCO, except in the form of libraries like recur or tco (self-recursion).
... hence, all the loop iterative macros (loop, while, etc). And so in real-world Lisp you usually don't use recursion, while Scheme wants it all but exclusively.
The difference is mostly cultural. Take a REPL (Read-Eval-Print-Loop). I think it makes more sense to consider the interaction as a loop rather than a infinitely tail-recursive function. Somehow, there seem to be some reluctance in purely functional languages to even consider loops as a primitive control structure, whereas iterative processes are considered more elegant. Why not have both?
If my assumptions are true, is there a way to be "pure" with Lisp and not risk overflow? Or is this too much swimming against the stream to use recursion in Lisp?
You can certainly use recursion in Lisp, provided you do not abuse it, which should not be the case with small programs. Consider for example that map in OCaml is not tail-recursive, but people still use it regularly. If you use SBCL, there is a section in the manual that explains how to enforce tail-call elimination.
What I'm trying to do is decide whether to use Racket inside of Emacs Org-mode or just use built-in Elisp for beginner students. I want students to stay as purely functional as possible, e.g., I don't want to explain the very new and difficult topic of recursion, then say "Oh, but we won't be using it..."
If you want to teach functional programming, use a more functional language. In other words, between Racket and Emacs Lisp, I'd say Racket is more appropriate for students. There are more materials to teach functional programming with Racket, there is also Typed Racket.
There are a lot of differences between the typical Lisp dialects and the various Scheme dialects:
Scheme
requires tail call optimization
favors tail recursion over imperative loop constructs
dislikes imperative control flow
provides functional abstractions
tries to map control structures to functions (see for example CALL-WITH-CURRENT-CONTINUATION)
implements some looping macros as extensions
Lisp
Parts of this applies for Lisp dialects like Emacs Lisp, Common Lisp or ISLisp.
has low level control flow, for example GOTO like constructs: don't use them in user level code, sometimes they may be useful in macros.
does not standardize or even provide tail call optimization: for maximum portability don't require TCO in your code
provides simple macros: DO, DOTIMES, DOLIST: use them where directly applicable
provides complex macros: LOOP, or ITERATE as a library
provides functional abstractions like MAP, MAPCAR, REDUCE
Implementations typically have hard limits on stack sizes, which makes non-TCO recursive functions problematic. Typically one can set a large stack size upfront or extend the stack size at runtime.
Also tail call optimization has some incompatibilities with dynamic scope constructs like special variables or similar.
For TCO support in Common Lisp see: Tail Call Optimisation in Common Lisp Implementations
Style
Some prefer recursive functions, but I would not use it generally. Generally I (that's my personal opinion) prefer explicit loop constructs over general recursive calls. Others prefer the idea of a more mathematical approach of recursive functions.
If there is a higher-order function like MAP, use that.
If the loop code gets more complex, using a powerful LOOP statement can be useful
In Racket the looping construct of choice is for.
Racket (like Scheme) also has TCO, so in principle you can
write all looping constructs using function calls - it just isn't
very convenient.
There is no Tail Call Optimisation (TCO) in Emacs Lisp, so even tho I personally like recursion very much, I'd generally recommend against its use in Elisp (except of course in those cases where it's the only natural choice).
Elisp as a programming environment is fairly good, but I must agree with "coredump" and would recommend Racket instead, which has great support for beginners.

Efficiency of Streams in scheme

As far as I learned using streams in large programs are way more efficient than using normal lisp in DrRacket.So why not the default evaluation is lazy evaluation in DrRacket?I wrote and put a timer procedure which calculates the time the work needed to be complete and in every complex program lazy eval was a lot faster.
AFAIK using streams when you are doing something like sorting is a waste of cycles since you need to be finished with the sort in order to know the first element. If you have tasks that work like a sort so that you'll need to evaluate a whole set you'll end up using more time than without streams. The reason for that is that the whole stream system has a cost as well as benefits.
The benefits of streams are the fact that you can do calculations in parallel so that the program doesn't need to do a whole loop before processing the first element. If you have n layers of processing streams you'll benefit when your program quits and all the other layers hasn't served you the whole thing yet.
DrRacket is not a language but an IDE. Racket is both a language (#!racket as first line of source) and the name of the implementation that implements it.
Racket supports #!lazy which is a lazy version of Racket. Basically everything works just like streams do, everywhere. You'll have the same benefits and cost.
None of the mentioned languages are Scheme, but #!racket was based on and was a superset of #!r5rs. Since then you have #!r6rs and the new #!r7rs. None of the official Scheme reports are lazy. The reason is that its predecessor was eager and making it lazy would completely change the language and ruin all backwards compatibility.
The innovation of Scheme in 1975 was lexical closures. The creators made lazy evaluation by need in an later report (by implementing delay and force). Other languages, like Haskell, are built to be lazy from the ground and they have a more advanced compiler to constant fold and make its code snappy.

Examples where compiler-optimized functional code performs better than imperative code

One of the promises of side-effect free, referentially transparent functional programming is that such code can be extensively optimized.
To quote Wikipedia:
Immutability of data can, in many cases, lead to execution efficiency, by allowing the compiler to make assumptions that are unsafe in an imperative language, thus increasing opportunities for inline expansion.
I'd like to see examples where a functional language compiler outperforms an imperative one by producing a better optimized code.
Edit: I tried to give a specific scenario, but apparently it wasn't a good idea. So I'll try to explain it in a different way.
Programmers translate ideas (algorithms) into languages that machines can understand. At the same time, one of the most important aspects of the translation is that also humans can understand the resulting code. Unfortunately, in many cases there is a trade-off: A concise, readable code suffers from slow performance and needs to be manually optimized. This is error-prone, time consuming, and it makes the code less readable (up to totally unreadable).
The foundations of functional languages, such as immutability and referential transparency, allow compilers to perform extensive optimizations, which could replace manual optimization of code and free programmers from this trade-off. I'm looking for examples of ideas (algorithms) and their implementations, such that:
the (functional) implementation is close to the original idea and is easy to understand,
it is extensively optimized by the compiler of the language, and
it is hard (or impossible) to write similarly efficient code in an imperative language without manual optimizations that reduce its conciseness and readability.
I apologize if it is a bit vague, but I hope the idea is clear. I don't want to give unnecessary restrictions on the answers. I'm open to suggestions if someone knows how to express it better.
My interest isn't just theoretical. I'd like to use such examples (among other things) to motivate students to get interested in functional programming.
At first, I wasn't satisfied by a few examples suggested in the comments. On second thoughts I take my objections back, those are good examples. Please feel free to expand them to full answers so that people can comment and vote for them.
(One class of such examples will be most likely parallelized code, which can take advantage of multiple CPU cores. Often in functional languages this can be done easily without sacrificing code simplicity (like in Haskell by adding par or pseq in appropriate places). I' be interested in such examples too, but also in other, non-parallel ones.)
There are cases where the same algorithm will optimize better in a pure context. Specifically, stream fusion allows an algorithm that consists of a sequence of loops that may be of widely varying form: maps, filters, folds, unfolds, to be composed into a single loop.
The equivalent optimization in a conventional imperative setting, with mutable data in loops, would have to achieve a full effect analysis, which no one does.
So at least for the class of algorithms that are implemented as pipelines of ana- and catamorphisms on sequences, you can guarantee optimization results that are not possible in an imperative setting.
A very recent paper Haskell beats C using generalised stream fusion by Geoff Mainland, Simon Peyton Jones, Simon Marlow, Roman Leshchinskiy (submitted to ICFP 2013) describes such an example. Abstract (with the interesting part in bold):
Stream fusion [6] is a powerful technique for automatically transforming
high-level sequence-processing functions into efficient implementations.
It has been used to great effect in Haskell libraries
for manipulating byte arrays, Unicode text, and unboxed vectors.
However, some operations, like vector append, still do not perform
well within the standard stream fusion framework. Others,
like SIMD computation using the SSE and AVX instructions available
on modern x86 chips, do not seem to fit in the framework at
all.
In this paper we introduce generalized stream fusion, which
solves these issues. The key insight is to bundle together multiple
stream representations, each tuned for a particular class of stream
consumer. We also describe a stream representation suited for efficient
computation with SSE instructions. Our ideas are implemented
in modified versions of the GHC compiler and vector library.
Benchmarks show that high-level Haskell code written using
our compiler and libraries can produce code that is faster than both
compiler- and hand-vectorized C.
This is just a note, not an answer: the gcc has a pure attribute suggesting it can take account of purity; the obvious reasons are remarked on in the manual here.
I would think that 'static single assignment' imposes a form of purity -- see the links at http://lambda-the-ultimate.org/node/2860 or the wikipedia article.
make and various build systems perform better for large projects by assuming that various build steps are referentially transparent; as such, they only need to rerun steps that have had their inputs change.
For small to medium sized changes, this can be a lot faster than building from scratch.

Writing portable scheme code. Is anything "standard" beyond R5RS itself?

I'm learning scheme and until now have been using guile. I'm really just learning as a way to teach myself a functional programming language, but I'd like to publish an open source project of some sort to reenforce the study— not sure what yet... I'm a web developer, so probably something webby.
It's becoming apparent that publishing scheme code isn't very easy to do, with all these different implementations and no real standards beyond the core of the language itself (R5RS). For example, I'm almost certainly going to need to do basic IO on disk and over a TCP socket, along with string manipulation, such as scanning/regex, which seems not to be covered by R5RS, unless I'm not seeing it in the document. It seems like Scheme is more of a "concept" than a practical language... is this a fair assessment? Perhaps I should look to something like Haskell if I want to learn a functional programming language that lends itself more to use in open source projects?
In reality, how much pain do the differing scheme implementations pose when you want to publish an open source project? I don't really fancy having to maintain 5 different functions for basic things like string manipulation under various mainstream implementations (Chicken, guile, MIT, DrRacket). How many people actually write scheme for cross-implementation compatibility, as opposed to being tightly coupled with the library functions that only exist in their own scheme?
I have read http://www.ccs.neu.edu/home/dorai/scmxlate/scheme-boston/talk.html, which doesn't fill me with confidence ;)
EDIT | Let's re-define "standard" as "common".
I believe that in Scheme, portability is a fool's errand, since Scheme implementations are more different than they are similar, and there is no single implementation that other implementations try to emulate (unlike Python and Ruby, for example).
Thus, portability in Scheme is analogous to using software rendering for writing games "because it's in the common subset between OpenGL and DirectX". In other words, it's a lowest common denominator—it can be done, but you lose access to many features that the implementation offers.
For this reason, while SRFIs generally have a portable reference implementation (where practical), some of them are accompanied by notes that a quality Scheme implementation should tailor the library to use implementation-specific features in order to function optimally.
A prime example is case-lambda (SRFI 16); it can be implemented portably, and the reference implementation demonstrates it, but it's definitely less optimal compared to a built-in case-lambda, since you're having to implement function dispatch in "user" code.
Another example is stream-constant from SRFI 41. The reference implementation uses an O(n) simulation of circular lists for portability, but any decent implementation should adapt that function to use real circular lists so that it's O(1).†
The list goes on. Many useful things in Scheme are not portable—SRFIs help make more features portable, but there's no way that SRFIs can cover everything. If you want to get useful work done efficiently, chances are pretty good you will have to use non-portable features. The best you can do, I think, is to write a façade to encapsulate those features that aren't already covered by SRFIs.
† There is actually now a way to implement stream-constant in an O(1) fashion without using circular lists at all. Portable and fast for the win!
Difficult question.
Most people decide to be pragmatic. If portability between implementations is important, they write the bulk of the program in standard Scheme and isolate non-standard parts in (smallish) libraries. There have been various approaches of how exactly to do this. One recent effort is SnowFort.
http://snow.iro.umontreal.ca/
An older effort is SLIB.
http://people.csail.mit.edu/jaffer/SLIB
If you look - or ask for - libraries for regular expressions and lexer/parsers you'll quickly find some.
Since the philosophy of R5RS is to include only those language features that all implementors agree on, the standard is small - but also very stable.
However for "real world" programming R5RS might not be the best fit.
Therefore R6RS (and R7RS?) include more "real world" libraries.
That said if you only need portability because it seems to be the Right Thing, then reconsider carefully if you really want to put the effort in.
I would simply write my program on the implementation I know the best. Then if necessary port it afterwards. This often turns out to be easier than expected.
I write a blog that uses Scheme as its implementation language. Because I don't want to alienate users of any particular implementation of Scheme, I write in a restricted dialect of Scheme that is based on R5RS plus syntax-case macros plus my Standard Prelude. I don't find that overly restrictive for the kind of algorithmic programs that I write, but your needs may be different. If you look at the various exercises on the blog, you will see that I wrote my own regular-expression matcher, that I've done a fair amount of string manipulation, and that I've snatched files from the internet by shelling out to wget (I use Chez Scheme -- users have to provide their own non-portable shell mechanism if they use anything else); I've even done some limited graphics work by writing ANSI terminal sequences.
I'll disagree just a little bit with Jens. Instead of porting afterwards, I find it easier to build in portability from the beginning. I didn't use to think that way, but my experience over the last three years shows that it works.
It's worth pointing out that modern Scheme implementations are themselves fairly portable; you can often port whole programs to new environments simply by bringing the appropriate Scheme along. That doesn't help library programmers much, though, and that's where R7RS-small, the latest Scheme definition, comes in. It's not widely implemented yet, but it provides a larger common core than R5RS.

Cons of first class continuations

What are some of the criticisms leveled against exposing continuations as first class objects?
I feel that it is good to have first class continuations. It allow complete control over the execution flow of instructions. Advanced programmers can develop intuitive solutions to certain kind of problems. For instance, continuations are used to manage state on web servers. A language implementation can provide useful abstractions on top of continuations. For example, green threads.
Despite all these, are there strong arguments against first class continuations?
The reality is that many of the useful situations where you could use continuations are already covered by specialized language constructs: throw/catch, return, C#/Python yield. Thus, language implementers don't really have all that much incentive to provide them in a generalized form usable for roll-your-own solutions.
In some languages, generalized continuations are quite hard to implement efficiently. Stack-based languages (i.e. most languages) basically have to copy the whole stack every time you create a continuation.
Those languages can implement certain continuation-like features, those that don't break the basic stack-based model, a lot more efficiently than the general case, but implementing generalized continuations is quite a bit harder and not worth it.
Functional languages are more likely to implement continuations for a couple of reasons:
They are frequently implemented in continuation passing style, which means the "call stack" is probably a linked list allocated on the heap. This makes it trivial to pass a pointer to the stack as a continuation, since you don't need to overwrite the stack context when you pop the current frame and push a new one. (I've never implemented CPS but that's my understanding of it.)
They favor immutable data bindings, which make your old continuation a lot more useful because you will not have altered the contents of variables that the stack pointed to when you created it.
For these reasons, continuations are likely to remain mostly just in the domain of functional languages.
First up, there is more then just call/cc when it comes to continuation. I suggest starting with Mark Feelys paper: A better API for first class continuations
Next up I suggest reading about the control operators shift and reset, which is a different way of representing contunations.
A significant objection is implementation cost. If the runtime uses a stack, then first-class continuations require a stack copy at some point. The copy cost can be controlled (see Representing Control in the Presence of First-Class Continuations for a good strategy), but it also means that mutable variables cannot be allocated on the stack. This isn't an issue for functional or mostly-functional (e.g., Scheme) languages, but this adds significant overhead for OO languages.
Most programmers don't understand them. If you have code that uses them, it's harder to find replacement programmers who will be able to work with it.
Continuations are hard to implement on some platforms. For example, JRuby doesn't support continuations.
First-class continuations undermine the ability to reason about code, especially in languages that allow continuations to be imperatively assigned to variables, because the insides of closures can be brought alive again in hairy ways.
Cf. Kent Pitman's complaint about continuations, about the tricky way that unwind-protect interacts with call/cc
Call/cc is the 'goto' of advanced functional programming (a la the example here).
in ruby 1.8 the implementation was extremely slow. better in 1.9, and of course most schemes have had them built in and performing well from the outset.

Resources