What is the difference between an "interned" and an "uninterned" symbol. Is it only Racket that has uninterned symbols or do other dialects of scheme or lisp have them?
Interned symbols are eq? if and only if they have the same name. Uninterned symbols are not eq? to any other symbol, so they are a kind of unique token with an attached string. Interned symbols are the kind that are produced by the default reader. Uninterned symbols can be used as identifiers when generating code in a macro, such an identifier cannot be shadowed by any other identifier. Most Lisp dialects have this concepts, in Scheme it is rarer, since hygienic macros are supposed to reduce its usefulness.
Common Lisp has uninterned symbols. As Juho's answer says, an uninterned symbol is guaranteed not to be equal to any other value.
Common Lisp-style requires uninterned symbols in order to write many macros correctly (particularly macros whose expansion requires introducing and binding new variables), because any interned symbol you use in a macro expansion might capture or shadow a binding in its expansion site.
Scheme's hygienic macro systems, on the other hand, do not have this problem, so a Scheme system does not need to provide uninterned symbols. Still, many of them do. Why? Several reasons:
Some Scheme systems offer a Common Lisp-style defmacro capability.
In others, the hygienic macro system's implementation may use uninterned symbols internally, but the concept of an uninterned symbol may be exposed.
Uninterned symbols can be useful in many programs that use s-expressions to represent a language, and transform this language into another s-expr language. These sorts of tasks often benefit from an ability to generate an identifier guaranteed to be new.
The other two excellent answers nevertheless fail to mention the virtue of interned values generally, which is that they can be compared in constant time. This typically means that these values are represented as pointers to a table without duplicates. In Racket, as of a few months ago[*], other values--floating point values and strings used as literals, for instance--will also be interned. In addition to allowing faster comparisons, I believe that this enables better compile-time optimizations, because these values can be compared for equality without running the code.
Are there other systems that do things like this? I bet there are.
[*] I'm sure someone will correct me if I'm wrong :).
Related
There are many differences between Common Lisp and Scheme such as whether functions and variables share a namespace, whether macros are hygienic, and how strongly functional style is preferred; this shows up in some vocabulary differences such as setq vs set!.
But quite a bit of vocabulary is still shared, such as quote and cons.
I'm looking for a full list of vocabulary shared between the languages. Does such a thing exist?
Alternatively, I could make one myself given the vocabulary of each language, i.e. a list of all the known symbols including language primitives, standard library macros and functions. Do those exist for Common Lisp (as in the standard) and Scheme (as in any RxRS, or failing that, any dialect)?
Get all symbols of Common Lisp:
(sort (loop for sym being each external-symbol of "CL" collect sym)
#'string-lessp)
You are looking for this web page:
http://hyperpolyglot.org/lisp
Anyone knows if there is similar concept in other popular language compared to symbol literal in Ruby? Can I considered it just as an "Interned String"?
Yes, symbols (sometimes referred to as atoms in other languages) can be considered interned strings.
There's a ton of information on Ruby symbols here: Question - Understanding Symbols In Ruby
And, an afterthought, this question lists many examples of similar concepts in a couple languages:
Lisp and Erlang Atoms, Ruby and Scheme Symbols. How useful are they?
Anyone knows if there is similar concept in other popular language compared to symbol literal in Ruby?
Sure, symbols in Ruby come from symbols in Smalltalk, which in turn gets them from Lisp. Scala also has symbols, and Erlang's atoms are similar. Erlang probably got them from Prolog.
Can I considered it just as an "Interned String"?
You can consider it all sorts of things, but symbols are symbols. They aren't immutable strings or interned strings or whatever ... they are just symbols.
The two languages where I have used symbols are Ruby and Erlang and I've always found them to be extremely useful.
Haskell does have algebraic datatypes, but I still think symbols would be mighty convenient. An immediate use that springs to mind is that since symbols are isomorphic to integers you can use them where you would use an integral or a string "primary key".
The syntactic sugar for atoms can be minor - :something or <something> is an atom. All atoms are instances of a Type called Atom which derives Show and Eq. You can then use it for more descriptive error codes, for example
type ErrorCode = Atom
type Message = String
data Error = Error ErrorCode Message
loginError = Error :redirect "Please login first"
In this case :redirect is more efficient than using a string ("redirect") and easier to understand than an integer (404).
The benefit may seem minor, but I say it is worth adding atoms as a language feature (or at least a GHC extension).
So why have symbols not been added to the language? Or am I thinking about this the wrong way?
I agree with camccann's answer that it's probably missing mainly because it would have to be baked quite deeply into the implementation and it is of too little use for this level of complication. In Erlang (and Prolog and Lisp) symbols (or atoms) usually serve as special markers and serve mostly the same notion as a constructor. In Lisp, the dynamic environment includes the compiler, so it's partly also a (useful) compiler concept leaking into the runtime.
The problem is the following, symbol interning is impure (it modifies the symbol table). Because we never modify an existing object it is referentially transparent, however, but if implemented naïvely can lead to space leaks in the runtime. In fact, as currently implemented in Erlang you can actually crash the VM by interning too many symbols/atoms (current limit is 2^20, I think), because they can never get garbage collected. It's also difficult to implement in a concurrent setting without a huge lock around the symbol table.
Both problems can be (and have been) solved, however. For example, see Erlang EEP 20. I use this technique in the simple-atom package. It uses unsafePerformIO under the hood, but only in (hopefully) rare cases. It could still use some help from the GC to perform an optimisation similar to indirection shortening. It also uses quite a few IORefs internally which isn't too great for performance and memory usage.
In summary, it can be done but implementing it properly is non-trivial. Compiler writers always weigh the power of a feature against its implementation and maintenance efforts, and it seems like first-class symbols lose out on this one.
I think the simplest answer is that, of the things Lisp-style symbols (which is where both Ruby and Erlang got the idea, I believe) are used for, in Haskell most are either:
Already done in some other fashion--e.g. a data type with a bunch of nullary constructors, which also behave as "convenient names for integers".
Awkward to fit in--things that exist at the level of language syntax instead of being regular data usually have more type information associated with them, but symbols would have to either be distinct types from each other (nearly useless without some sort of lightweight ad-hoc sum type) or all the same type (in which case they're barely different from just using strings).
Also, keep in mind that Haskell itself is actually a very, very small language. Very little is "baked in", and of the things that are most are just syntactic sugar for other primitives. This is a bit less true if you include a bunch of GHC extensions, but GHC with -XAndTheKitchenSinkToo is not the same language as Haskell proper.
Also, Haskell is very amenable to pseudo-syntax and metaprogramming, so there's a lot you can do even without having it built in. Particularly if you get into TH and scary type metaprogramming and whatever else.
So what it mostly comes down to is that most of the practical utility of symbols is already available from other features, and the stuff that isn't available would be more difficult to add than it's worth.
Atoms aren't provided by the language, but can be implemented reasonably as a library:
http://hackage.haskell.org/package/simple-atom
There are a few other libs on hackage, but this one looks the most recent and well-maintained.
Haskell uses type constructors* instead of symbols so that the set of symbols a function can take is closed, and can be reasoned about by the type system. You could add symbols to the language, but it would put you in the same place that using strings would - you'd have to check all possible symbols against the few with known meanings at runtime, add error handling all over the place, etc. It'd be a big workaround for all the compile-time checking.
The main difference between strings and symbols is interning - symbols are atomic and can be compared in constant time. Both are types with an essentially infinite number of distinct values, though, and against the grain of Haskell's specifying arguments and results with finite types.
I'm more familiar with OCaml than Haskell, so "type constructor" may not be the right term. Things like None or Just 3.
An immediate use that springs to mind is that since symbols are isomorphic to integers you can use them where you would use an integral or a string "primary key".
Use Enum instead.
data FileType = GZipped | BZipped | Plain
deriving Enum
descr ft = ["compressed with gzip",
"compressed with bzip2",
"uncompressed"] !! fromEnum ft
This question already has answers here:
Closed 12 years ago.
Possible Duplicates:
Why don't more projects use Ruby Symbols instead of Strings?
What's the difference between a string and a symbol in Ruby?
I am new to ruby language.
From what I read I must use symbols instead strings where is possible.
Is that correct ?
You don't HAVE TO do anything. However, it is advisable to use symbols instead of strings in cases where you will be using the same string literal over and over again. The reason is that there is only ever 1 of the symbol held in memory, while if you use the string literal it will created anew whenever you assign it, therefore potentially wasting memory.
Strings are mutable where as Symbols are not. Symbols are better performance wise.
Read this article to better understand symbols, strings and their differences.
It's sort of up to you which one you use -- you can use a string anywhere you'd use a symbol, but not the other way around. Symbols do have a number of advantages, in a few cases.
Symbols give you a performance advantage because two symbols of the same name actually map to the same object in memory, whereas two strings with the same characters create different objects. Symbols are immutable and lightweight, which makes them ideal for elements that you won't be changing around at runtime; keys in a hash table, for example.
Here's a nice excerpt from the Ruby Newbie guide to symbols:
The granddaddy of all advantages is
also the granddaddy of advantages:
symbols can't be changed at runtime.
If you need something that absolutely,
positively must remain constant, and
yet you don't want to use an
identifier beginning with a capital
letter (a constant), then symbols are
what you need.
The big advantage of symbols is that
they are immutable (can't be changed
at runtime), and sometimes that's
exactly what you want.
Sometimes that's exactly what you
don't want. Most usage of strings
requires manipulation -- something you
can't do (at least directly) with
symbols.
Another disadvantage of symbols is
they don't have the String class's
rich set of instance methods. The
String class's instance method make
life easier. Much easier.
Symbols have two main advantages:
They are like intern'ed strings, so they can improve performance.
They are syntactically distinct which can make your code more readable, particularly for programs that use hashtables as complex build-in data structures.
Why does Ruby expose symbols for explicit use? Isn't that the sort of optimisation that's usually handled by the interpreter/compiler?
Part of the issue is that Ruby strings are mutable. Since every string Ruby allocates must be independent (it can't cache short/common ones), it's convenient to have a Symbol type to let the programmer have what are essentially immutable, memory-efficient strings.
Also, they share many characteristics with enum's, but with less pain for the programmer.
Ruby symbols are used in lieu of string constants in other similar languages. Besides the performance benefit, they can be used to semantically distinguish between string data and a more abstract symbol. Being syntactically different, they can clearly be distinguished in code.
Have a look at Ruby symbols post.