Is a symbol table in Ruby any different from a symbol table in other languages?

The Wikipedia entry on Symbol tables is a good reference:
http://en.wikipedia.org/wiki/Symbol_table
But as I try to understand symbols in Ruby and how they are represented in the Array of Symbols (returned by the Symbol.all_symbols method), I'm wondering whether Ruby's approach to the symbol table has any important differences from other languages?

Ruby doesn't really have a "symbol table" in that sense. It has bindings, and symbols (Lisp has symbols too; Erlang and Prolog call them atoms), but it isn't really doing it the way that article describes.
So in answer to your question: it isn't so much that Ruby has the same thing done differently, but rather that it does two different things (:xxx notation --> unique ids, and bindings in scopes) and uses similar/overlapping terminology for them.
To clarify:
The article you link to gives the conventional definition of a symbol table, to wit
where each identifier in a program's source code is associated with information relating to its declaration or appearance in the source, such as its type, scope level and sometimes its location
But this isn't what Ruby's symbol table does. It just provides a globally unique identity for a certain class of objects which can be written as :something in the source code, including things like :+ and :"Hi bob!", which aren't identifiers. And although the interpreter may intern identifier names it encounters while parsing, none of the information listed in the passage above (type, scope level, location) is stored in Ruby's list of symbols.
It's a coincidence of naming, and reading that article will not help you understand Ruby's symbols.

The biggest difference is that (like Lisp) Ruby actually has a syntax for symbols, and it's easy to add/remove things at runtime yourself. If you say :balloon (or "balloon".intern) it will intern that for you. Even though you're referring to it by name in your source, internally it's just a pointer in the symbol table. If you compare symbols, it's just a pointer-compare, not a string-compare.
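A quick way to see the interning described above is to compare object identity. This is a minimal sketch: `equal?` checks whether two references point to the very same object, which is what makes symbol comparison a pointer compare. Two equal strings, by contrast, are normally separate objects that must be compared character by character.

```ruby
# Every :balloon literal, and every "balloon".intern / "balloon".to_sym call,
# yields the very same Symbol object.
a = :balloon
b = "balloon".intern
c = "balloon".to_sym

puts a.equal?(b)                  # true: same object
puts a.object_id == c.object_id   # true: same object again

# Two strings with the same characters are distinct objects here,
# so comparing them means comparing their contents.
s1 = "balloon".dup
s2 = "balloon".dup
puts s1 == s2                     # true: same contents
puts s1.equal?(s2)                # false: different objects
```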
Languages like C don't really have a way to say simply "create a new symbol for me" at runtime. You can do it implicitly at compile-time by defining a function, but that's really its only use. Since C has no syntax for symbols, if you want to be able to say Balloon in your program but be able to compare it with a single machine instruction, you use enums (or #defines).
In Ruby, it takes only one character to make a symbol, so you can use it for all kinds of things (like hash keys).

Symbols in Ruby are used where other languages tend to use enums, defines, constants and the like. They're also often used for associative keys. Their use has little to do with a symbol table as discussed in that article, except that they obviously exist in one.

Related

In Lisp/Racket/Scheme how is it possible to have an argument named `list`?

Isn't list a keyword to create a new list in Lisp? Yet it is possible to have an argument called list in Lisp. I thought keywords in most programming languages, such as Java or C++, cannot be used as argument names. Is there a special reason that they can in Lisp?
The name list isn't a reserved keyword, it's an ordinary function. Reusing the name for another purpose can be confusing for the reader but doesn't present any problems for the language itself; it's the same as having two variables called x in different parts of the program.
Mainstream Lisp descendants and derivatives like Common Lisp and Scheme do not incorporate the concept of reserved keywords. It is alien to the way Lisp works.
When Lisp read syntax is scanned, identifier tokens which appear in it are converted into corresponding symbol objects. These tokens are all in the same lexical category: symbol.
When Lisp read syntax is scanned and turned into an object, such as a nested list representing program code, this is done without regard for the semantics (what the symbols mean).
This is different from the parsing of languages (such as some of those in the broad Fortran/Algol family) which have reserved keywords.
Roughly speaking, reserved keywords are tokens which look like symbols but are actually just punctuation. Lisp has punctuation also, like parentheses, sharpsign prefixes, various quotes and such.
These punctuation words have a fixed role in the phrase structure grammar, and the phrase structure grammar must be processed before the semantics of the program can be considered.
So for instance, the reserved BEGIN and END keywords in Pascal are essentially nothing more than verbose parentheses. The '(' and ')' tokens are similarly reserved in Lisp-like languages. Trying to use BEGIN as the name of a function or variable in Pascal is similar to trying to use ( as the name of a function or variable in Lisp.
Some languages have keywords which determine phrase structure, yet allow identifiers which look exactly like reserved keywords to be used anyway. For instance, PL/I was famous for this:
IF IF=THEN THEN THEN=ELSE; ELSE ELSE=IF
Lisp dialects may assign special semantic treatment to certain symbols or certain categories of symbols. This is a sort of reservation, but not exactly the same as reserved keywords, because it is at the semantic level. For instance, in Common Lisp, the symbols nil and t (more specifically the nil and t in the common-lisp package, common-lisp:nil and common-lisp:t) may not be used as function or variable names. When either one appears as an expression, it evaluates to itself: the value of t is t and that of nil is nil. Moreover, nil is also the Boolean false value and the empty list. So, effectively, these symbols are reserved in some regards. Common Lisp also has a keyword package. All symbols in that package evaluate to themselves and may not be used as variables. They may be used as function names, and for any other purpose.
You say Lisp, but the answer changes depending on which Lisp you're talking about.
In Common Lisp, you can use list as a variable because Common Lisp is a Lisp-2, meaning that each symbol has a separate slot for a function binding and a variable binding. Common Lisp sets the function binding for the symbol list in the CL package, but doesn't set the variable binding. You can't change the function binding because Common Lisp doesn't allow you to redefine bindings for symbols that are set in the CL package (you can, of course, use whatever symbols you like in your own packages), but since the variable binding is free you're allowed to use it.
Scheme is a Lisp-1, which means that it only has one binding per symbol. There's no separation of function bindings and variable bindings (hence why you use define in Scheme, but defun and defvar in CL). The reason you can use "list" as a variable is that Scheme doesn't prevent you from rebinding its built-in symbols. It's just generally a bad idea, since by rebinding list you can no longer call the list function in that scope.
Emacs Lisp is a Lisp-2 but doesn't prevent you from rebinding symbols, which means you can do things like (defun + (a b) (- a b)) and totally screw up your editing session. So... don't do that, unless you really know what you're doing.
Clojure is a Lisp-1. I don't have a working Clojure install at the moment so I can't comment on what it lets you do. I would suspect it's more strict than Scheme.

When to use symbols in Ruby

I'm not clear on the value and proper use of symbols.
The benefit seems to be that they remove the need for multiple copies of the same string by letting it exist only once in memory. I wonder whether this is true and what other benefits this brings.
If I were creating a user object with properties such as name, email, and password, and used symbols for each property instead of strings, does that mean that there is only one object for each property? It seems like this would avoid a string copy for the properties in the hash (which seems like a good thing).
Can someone help me understand what a symbol is and when it's better to use one over a string in a hash? What are the benefits and pitfalls of each?
Also, can anyone speak to the memory tradeoffs of each? With scalability being important, I'm curious if symbols would help with speed.
Symbols, sometimes described as interned strings, are useful for hash keys, common arguments, and other places where having many, many duplicate strings with the same value would be inefficient.
For example:
params[:name]
my_function(with: { arguments: [ ... ] })
record.state = :completed
These are generally preferable to strings because they will be repeated frequently.
The most common uses are:
Hash keys
Arguments to methods
Option flags or enum-type property values
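The three common uses listed above can be sketched in a few lines. This is illustrative only: `search` and `STATES` are hypothetical names, not part of any library.

```ruby
# Hash keys: the same :name symbol object is reused wherever it appears.
params = { name: "John", state: "pending" }
puts params[:name]                    # John

# Arguments to methods: keyword arguments are named by symbols.
def search(query, limit: 10, exact: false)   # hypothetical method
  "#{query} (limit=#{limit}, exact=#{exact})"
end
puts search("ruby", exact: true)      # ruby (limit=10, exact=true)

# Option flags / enum-type values: a fixed set of labels.
STATES = %i[pending active completed].freeze  # hypothetical set of states
state = :completed
puts STATES.include?(state)           # true
```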
It's better to use strings when handling user data of an unknown composition. Strings can always be garbage collected, but symbols were permanent before Ruby 2.2, and even since then some symbols (for example, those used as method names) are never freed. Converting arbitrary user data to symbols may fill up the symbol table with junk and, if someone's being malicious, possibly crash your application.
For example:
# symbolize_keys comes from ActiveSupport (Rails), not core Ruby
user_data = JSON.load(...).symbolize_keys
This would allow an attacker to create JSON data with intentionally long, randomized names that, in time, would bloat your process with all kinds of useless junk.
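One way to avoid that risk, assuming you know in advance which keys you care about, is to symbolize only whitelisted keys. This is a sketch; `ALLOWED_KEYS` and `symbolize_known_keys` are hypothetical names. Unknown keys are dropped before `to_sym` is ever called, so the input cannot create new symbols.

```ruby
require "json"

# Hypothetical whitelist: only these keys are ever turned into symbols.
ALLOWED_KEYS = %w[name email].freeze

# Convert only known keys to symbols. The `if` is evaluated first,
# so to_sym never runs on an unexpected (possibly attacker-chosen) key.
def symbolize_known_keys(hash)
  hash.each_with_object({}) do |(key, value), result|
    result[key.to_sym] = value if ALLOWED_KEYS.include?(key)
  end
end

raw = JSON.parse('{"name": "Ann", "email": "ann@example.com", "x9f3k2": "junk"}')
p symbolize_known_keys(raw)
```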
Besides avoiding the need for repeated memory allocation, symbols can be compared for equality faster than strings, and their hash codes can be computed faster than strings (so both storing and retrieving data from a Hash will be faster when symbol rather than string keys are used).
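To check the lookup-speed claim on your own machine, a rough micro-benchmark like this one can help. It is a sketch only: `elapsed_seconds` is a hypothetical helper, and absolute timings (and even the size of the gap) depend on the machine and Ruby version. That is why it prints the timings instead of asserting which is faster.

```ruby
# Hypothetical helper: elapsed wall-clock seconds for the given block.
def elapsed_seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

sym_hash = { name: "Ann", email: "ann@example.com" }
str_hash = { "name" => "Ann", "email" => "ann@example.com" }
str_key  = "email".dup  # a String key, as if it arrived at runtime

n = 200_000
sym_time = elapsed_seconds { n.times { sym_hash[:email] } }
str_time = elapsed_seconds { n.times { str_hash[str_key] } }

# Both lookups find the same value; only the cost of hashing/comparing differs.
puts sym_hash[:email] == str_hash[str_key]
puts format("symbol keys: %.4fs, string keys: %.4fs", sym_time, str_time)
```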
Internally, Ruby uses something closely related to symbols to identify methods, the names of classes, and so on. So, for example, when you retrieve a list of the methods an object supports (with obj.methods), you get back an array of symbols. When you want to call a method "dynamically", using a name stored in a variable or passed in as an argument, you normally pass that name as a symbol (methods like send also accept a string and convert it). Likewise for getting/setting the values of instance variables, constants, and so on.
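The reflection calls mentioned above all speak in symbols. This is a small sketch using a hypothetical `Point` class. It also shows that `public_send` accepts a string name, which it converts.

```ruby
class Point                      # hypothetical example class
  def initialize(x, y)
    @x = x
    @y = y
  end

  def norm2
    @x * @x + @y * @y
  end
end

pt = Point.new(3, 4)

# Method names come back as symbols (initialize is private, so it is not listed).
p Point.instance_methods(false)        # => [:norm2]

# Calling a method whose name is held in a variable.
name = :norm2
puts pt.public_send(name)              # 25
puts pt.public_send("norm2")           # 25: a string name is converted

# Reading an instance variable and looking up a constant by name.
puts pt.instance_variable_get(:@x)     # 3
p Object.const_get(:Point) == Point    # true
```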
Intuitively, you can think of it this way. If you've ever programmed in C, you have written things like:
#define SOMETHING 1
#define SOMETHING_ELSE 2
These defines eliminate the need to use "magic numbers" in your code. The names used (SOMETHING, etc) are not relevant to users of your program, just as the names of functions or classes are not relevant to users. They are just "labels" which are internal to the code, and are of concern only to the programmer. Symbols play a similar role in Ruby programs. They are a data type with performance properties similar to integers, but with a literal syntax which makes them appear as meaningful names to a human programmer.
Once you "get" the concept of Ruby symbols, understanding Lisp symbols will be much easier, if you ever program in Lisp. In Lisp, symbols are the basic data type which program code is composed of. (Because Lisp programs are data, and can be manipulated as such.)
You can think about symbols like numbers: each one is a constant, immutable object that is created on first use (and, before Ruby 2.2, was never garbage collected). Use them whenever you need to refer to something that cannot be duplicated, like:
messages aka methods (Ruby doesn't have overloading)
hash keys (Ruby doesn't have multi hashes)
Yes, your example is fine.
name, email, and password could all be symbols used as hash keys, while the values are still ordinary string objects:
{
:name => 'John doe',
:email => 'foo#hotmail.com',
:password => 'lassdgjkl23853'
}
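The hash above uses the older `:key => value` syntax. Since Ruby 1.9, the `key: value` shorthand produces exactly the same symbol keys. This is a short sketch; the email entry is omitted for brevity.

```ruby
# The hash from the answer, written with the "=>" syntax...
old_style = {
  :name => 'John doe',
  :password => 'lassdgjkl23853'
}

# ...and with the "key:" shorthand, which also produces symbol keys.
new_style = {
  name: 'John doe',
  password: 'lassdgjkl23853'
}

puts old_style == new_style          # true
puts old_style.keys.all?(Symbol)     # true: keys are symbols
puts old_style[:name].class          # String: values are ordinary strings
```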

Using Ruby Symbols

First time I tried learning Ruby was 2 years ago, now I have started again. The reason I stopped was because I could not understand the Symbol class. And now I am at the same point again, completely lost in when and why you use Symbols. I have read the other posts on Stackoverflow as well as Googled for several explanations. But I do not understand it yet.
At first I thought symbols were just a way to create some sort of "named constant" without having to go through the same process as in, let's say, Java.
:all
instead of making a constant with an arbitrary value: public static final int ALL = 8;
However it does not make much sense when you use it in e.g. attr_accessor :first_name etc.
Are Symbols just a lightweight String class? I am having problems understanding how I should interpret, when and how to use symbols both in my own classes and in frameworks.
In short, symbols are lightweight strings, but they are also immutable and, before Ruby 2.2, were never garbage collected.
You should not use them as immutable strings in your data processing tasks (in older Rubies, once a symbol was created, it could not be destroyed). You typically use symbols for naming things.
# typical use cases
# access hash value
user = User.find(params[:id])
# name something
attr_accessor :first_name
# set hash value in opts parameter
db.collection.update(query, update, multi: true, upsert: true)
Let's take the first example, params[:id]. In a moderately big Rails app there may be hundreds or thousands of those scattered around the codebase. If we accessed that value with a string, params["id"], that can mean a new string allocation each time (and that string needs to be collected afterwards). With a symbol, it's the same object everywhere. Less work for the memory allocator, the garbage collector and even you (: is faster to type than "").
If you have a simple one-word string that appears often in your code and you don't do something funky to it (interpolation, gsub, upcase, etc), then it's likely a good candidate to be a symbol.
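This also explains the attr_accessor :first_name case from the question. The symbol is just the *name* of the attribute, and attr_accessor uses it to define a reader and a writer method. Here is a minimal sketch with a hypothetical `User` class:

```ruby
class User                 # hypothetical class
  # attr_accessor receives the attribute's name as a symbol and
  # defines a reader (first_name) and a writer (first_name=) from it.
  attr_accessor :first_name
end

p User.instance_methods(false).sort   # => [:first_name, :first_name=]

u = User.new
u.first_name = "Ada"
puts u.first_name                     # Ada
```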
However, does this apply only to text that is used as part of the actual program logic such as naming, not text that you get while actually running the program...such as text from the user/web etc?
I cannot think of a single case where I'd want to turn data from the user/web into symbols (except for parsing command-line options, maybe), mainly because of the consequences (before Ruby 2.2, symbols lived forever once created).
Also, many editors color symbols differently to highlight them in the code.
The O'Reilly Ruby Cookbook (p. 15) quotes Jim Weirich as saying:
If the contents (the sequence of characters) of the object are important, use a string.
If the identity of the object is important, use a symbol.
Symbols are generally used as hash keys, because it's the identity of the key that's important. Symbols are also the conventional way to name the message when using methods like Object#send (which also accepts a string and converts it).
A Ruby implementation typically has a table in which it stores the names of all classes, methods and variables. It refers to, say, a method name by its position in the table, avoiding expensive string comparisons. But you can use this table too and add values to it: symbols.
If you write code that uses strings as identifiers rather than for their textual content, consider symbols. If you write a method that expects an argument to be either 'male' or 'female', consider using :male and :female. Comparing two symbols for equality is faster than comparing strings (that's why symbols make good hash keys).
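Here is a sketch of the :male / :female suggestion, with the symbols used as a closed set of labels. `salutation` is a hypothetical method. `case`/`when` compares the argument against each symbol, and any other value is rejected explicitly.

```ruby
# Hypothetical method that accepts only a fixed set of symbolic values.
def salutation(gender)
  case gender
  when :male   then "Mr."
  when :female then "Ms."
  else raise ArgumentError, "unknown value: #{gender.inspect}"
  end
end

puts salutation(:female)   # Ms.
puts salutation(:male)     # Mr.
```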
Symbols are used for naming things in the language: the names of classes, the names of methods, etc.
These are very much like strings, except that testing for equality is optimised to be very quick, and (before Ruby 2.2) they could never be garbage collected.
The Java implementation has a very similar thing, except that it is not available for runtime use. What I mean is, when you write Java code like obj.someMethod(4), the string 'someMethod' is converted by the compiler into a symbol which is embedded in a lookup table (the constant pool) in the .class file. These symbols are like 'special' strings which are not garbage collected, and which are very fast to compare for equality. This is almost identical to Ruby, except that Ruby allows you to create new symbols at runtime, whereas Java only allows it at compile time.
This is just like creating new methods -- Java allows it at compile time; Ruby allows it at runtime.
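A minimal sketch of that runtime side follows, using a hypothetical `Greeter` class. The method names are built from strings while the program runs, turned into symbols, and passed to `define_method`.

```ruby
class Greeter   # hypothetical class
  # Method names assembled at runtime: each string becomes a symbol
  # that names a newly defined method.
  %w[hello goodbye].each do |word|
    define_method("say_#{word}".to_sym) do |name|
      "#{word.capitalize}, #{name}!"
    end
  end
end

g = Greeter.new
puts g.say_hello("Bob")                  # Hello, Bob!
p Greeter.instance_methods(false).sort   # => [:say_goodbye, :say_hello]
```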
Since Ruby 2.2, symbols can be garbage collected: "mortal" symbols, i.e. those created dynamically from strings ("mortal".to_sym), get cleaned up from memory once nothing references them.
Check this out:
require 'objspace'
ObjectSpace.count_symbols
# Example result (the counts vary by Ruby version and program):
# {
#   :mortal_dynamic_symbol=>3,
#   :immortal_dynamic_symbol=>5,
#   :immortal_static_symbol=>3663,
#   :immortal_symbol=>3668
# }
source: https://www.rubyguides.com/2018/02/ruby-symbols/

Why aren't the arguments to File.new symbols instead of strings?

I was wondering why the people who wrote the File library decided to make the arguments that determine what mode the file is opened in strings instead of symbols.
For example, this is how it is now:
f = File.new('file', 'r+')  # 'r+' = read and write ('rw' is not a valid mode string)
But wouldn't it be a better design to do
f = File.new('file', :rw)
or even
f = File.new(:file, :rw)
for example? This seems to be the perfect place to use them since the argument definitely doesn't need to be mutable.
I am interested in knowing why it came out this way.
Update: I just got done reading a related question about symbols vs. strings, and I think the consensus was that symbols are just not as well known as strings, and everyone is used to using strings to index hash tables anyway. However, I don't think it would be valid for the designers of Ruby's standard library to plead ignorance on the subject of symbols, so I don't think that's the reason.
I'm no expert in the history of Ruby, but you really have three options when you want parameters to a method: strings, symbols, and static classes.
For example, exception handling. Each exception is actually a subclass of Exception.
ArgumentError.is_a? Class
=> true
So you could have each permission for the stream be its own class. But that would require even more classes to be generated for the system.
The thing about symbols is that, before Ruby 2.2, they were never deleted. Every symbol you generated was preserved indefinitely; that's why using the method '.to_sym' lightly was discouraged. It led to memory leaks.
Strings are just easier to manipulate. If you got the input mode from the user, you would need a '.to_sym' somewhere in your code, or at the very least, a large case statement. With a string, you can just pass the user input directly to the method (if you were so trusting, of course).
Also, in C, you pass the mode to fopen as a string ("r", "w+", and so on), and Ruby's mode strings follow the same convention. Seeing as how Ruby is built on C, that could be where it comes from.
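For comparison, Ruby's File API accepts the mode in several C-inspired forms: an fopen-style string, integer flag constants like those of open(2), or a `mode:` keyword option. This is a sketch that writes to a temporary directory; the file name is an arbitrary placeholder.

```ruby
require "tmpdir"

contents = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "example.txt")   # placeholder file name

  # fopen-style mode string: "w+" = read/write, create or truncate.
  File.open(path, "w+") { |f| f.write("hello") }

  # Integer flag constants, mirroring open(2): read/write, append.
  File.open(path, File::RDWR | File::APPEND) { |f| f.write(" world") }

  # Mode passed as a keyword option instead of a positional argument.
  contents = File.open(path, mode: "r") { |f| f.read }
end

puts contents   # hello world
```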
It is simply a relic from previous languages.

What are some example use cases for symbol literals in Scala?

The use of symbol literals is not immediately clear from what I've read up on Scala. Would anyone care to share some real world uses?
Is there a particular Java idiom being covered by symbol literals? What languages have similar constructs? I'm coming from a Python background and not sure there's anything analogous in that language.
What would motivate me to use 'HelloWorld vs "HelloWorld"?
Thanks
In Java terms, symbols are interned strings. This means, for example, that reference equality comparison (eq in Scala and == in Java) gives the same result as normal equality comparison (== in Scala and equals in Java): 'abcd eq 'abcd will return true, while "abcd" eq "abcd" might not, depending on JVM's whims (well, it should for literals, but not for strings created dynamically in general).
Other languages which use symbols are Lisp (which uses 'abcd like Scala), Ruby (:abcd), Erlang and Prolog (abcd; they are called atoms instead of symbols).
I would use a symbol when I don't care about the structure of a string and use it purely as a name for something. For example, if I have a database table representing CDs, which includes a column named "price", I don't care that the second character in "price" is "r", or about concatenating column names; so a database library in Scala could reasonably use symbols for table and column names.
If you have plain strings representing, say, method names in code that perhaps get passed around, you're not quite conveying things appropriately. This is sort of the Data/Code boundary issue. It's not always easy to draw the line, but if we were to say that in that example those method names are more code than they are data, then we want something to clearly identify that.
A Symbol Literal comes into play where it clearly differentiates just any old string data from a construct being used in the code. It's really there for when you want to indicate that this isn't just some string data, but in some way part of the code. The idea is that things like your IDE would highlight it differently, and given the tooling, you could refactor on those, rather than doing text search/replace.
This link discusses it fairly well.
Note: symbol literals are deprecated in Scala 2.13 and dropped in Scala 3 (Dotty).
Reference: http://dotty.epfl.ch/docs/reference/dropped-features/symlits.html
Because of this, I personally recommend not using Symbols anymore (at least in new scala code). As the dotty documentation states:
Symbol literals are no longer supported
it is recommended to use a plain string literal [...] instead
Python maintains an internal global table of "interned strings" with the names of all variables, functions, modules, etc. With this table, the interpreter can make faster lookups and optimizations. You can force this process with the intern function (sys.intern in Python 3).
Also, Java and Scala automatically intern string literals for faster lookups. Strings created at runtime are not interned automatically, although in Scala you can call the intern method on them explicitly. Symbols benefit from being guaranteed to be interned, so a single reference equality check is sufficient to prove either equality or inequality.
