Using Ruby Symbols - ruby

First time I tried learning Ruby was 2 years ago, now I have started again. The reason I stopped was because I could not understand the Symbol class. And now I am at the same point again, completely lost in when and why you use Symbols. I have read the other posts on Stackoverflow as well as Googled for several explanations. But I do not understand it yet.
First I thought symbols was just a way to create some sort of "named constant" without having to go through the same process as in let say Java.
:all
instead of making a constant with an arbitrary value public static final String ALL = 8;
However it does not make much sense when you use it in e.g. attr_accessor :first_name etc.
Are Symbols just a lightweight String class? I am having problems understanding how I should interpret, when and how to use symbols both in my own classes and in frameworks.

In short, symbols are lightweight strings, but they also are immutable and non-garbage-collectable.
You should not use them as immutable strings in your data processing tasks (remember, once symbol is created, it can't be destroyed). You typically use symbols for naming things.
# typical use cases
# access hash value
user = User.find(params[:id])
# name something
attr_accessor :first_name
# set hash value in opts parameter
db.collection.update(query, update, multi: true, upsert: true)
Let's take first example, params[:id]. In a moderately big rails app there may be hundreds/thousands of those scattered around the codebase. If we accessed that value with a string, params["id"], that means new string allocation each time (and that string needs to be collected afterwards). In case of symbol, it's actually the same symbol everywhere. Less work for memory allocator, garbage collector and even you (: is faster to type than "")
If you have a simple one-word string that appears often in your code and you don't do something funky to it (interpolation, gsub, upcase, etc), then it's likely a good candidate to be a symbol.
However, does this apply only to text that is used as part of the actual program logic such as naming, not text that you get while actually running the program...such as text from the user/web etc?
I can not think of a single case where I'd want to turn data from user/web to symbol (except for parsing command-line options, maybe). Mainly because of the consequences (once created symbols live forever).
Also, many editors provide different coloring for symbols, to highlight them in the code. Take a look at this example

The O'Reilly Ruby Cookbook (p. 15) quotes Jim Weirich as saying:
If the contents (the sequence of characters) of the object are important, use a string.
If the identity of the object is important, use a symbol.
Symbols are generally used as hash keys, because it's the identity of the key that's important. Symbols are also required when passing messages using certain methods like Object#send.

A Ruby implementation typically has a table in which it stores the names of all classes, methods and variables. It refers to say a method name by the position in the table, avoiding expensive string comparisons. But you can use this table too and add values to it: symbols.
If you write code that uses strings as identifiers rather than for their textual content, consider symbols. If you write a method that expects an argument to be either 'male' or 'female', consider using :male and :female . Comparing two symbols for equality is faster than strings (that's why symbols make good hash keys).

Symbols are used for naming things in the language: the names of classes, the names of methods etc.
These are very like strings, except they can never be garbage collected, and testing for equality is optimised to be very quick.
The Java implementation has a very similar thing, except that it is not available for runtime use. What I mean is, when you write java code like obj.someMethod(4), the string 'someMethod' is converted by the compiler into a symbol which is embedded in a lookup table in the .class file. These symbols are like 'special' strings which are not garbage collected, and which are very fast to compare for equality. This is almost identical to Ruby, except that Ruby allows you to create new symbols at runtime, whereas Java only allows it at compile time.
This is just like creating new methods -- Java allows it at compile time; Ruby allows it at runtime.

After ruby version 2.2 symbol GC was removed, so now mortal symbols i.e when we convert string to symbol ("mortal".to_sym) gets cleaned up from memory.
check this out:
require 'objspace'
ObjectSpace.count_symbols
{
:mortal_dynamic_symbol=>3,
:immortal_dynamic_symbol=>5,
:immortal_static_symbol=>3663,
:immortal_symbol=>3668
}
source: https://www.rubyguides.com/2018/02/ruby-symbols/

Related

When to use symbols in Ruby [duplicate]

This question already has answers here:
How to understand symbols in Ruby
(11 answers)
Using Ruby Symbols
(5 answers)
Closed 8 years ago.
I'm not clear on the value and proper use of symbols.
The benefit seems to be that they remove the need for multiple copies of the same hash by letting it exist only in memory. I wonder whether this is true and what other benefits this brings.
If I were creating a user object with properties such as name, email, and password, and used symbols for each property instead of strings, does that mean that there is only one object for each property? It seems like this would avoid a string copy for the properties in the hash (which seems like a good thing).
Can someone help me understand what a symbol is and when it's better to use one over a string in a hash? What are the benefits and pitfalls of each?
Also, can anyone speak to the memory tradeoffs of each? With scalability being important, I'm curious if symbols would help with speed.
Symbols, or "internals" as they're also referred to as, are useful for hash keys, common arguments, and other places where the overhead of having many, many duplicate strings with the same value is inefficient.
For example:
params[:name]
my_function(with: { arguments: [ ... ] })
record.state = :completed
These are generally preferable to strings because they will be repeated frequently.
The most common uses are:
Hash keys
Arguments to methods
Option flags or enum-type property values
It's better to use strings when handling user data of an unknown composition. Unlike strings which can be garbage collected, symbols are permanent. Converting arbitrary user data to symbols may fill up the symbol table with junk and possibly crash your application if someone's being malicious.
For example:
user_data = JSON.load(...).symbolize_keys
This would allow an attacker to create JSON data with intentionally long, randomized names that, in time, would bloat your process with all kinds of useless junk.
Besides avoiding the need for repeated memory allocation, symbols can be compared for equality faster than strings, and their hash codes can be computed faster than strings (so both storing and retrieving data from a Hash will be faster when symbol rather than string keys are used).
Internally, Ruby uses something closely related to symbols to identify methods, the names of classes, and so on. So, for example, when you retrieve a list of the methods an object supports (with obj.methods), you get back an array of symbols. When you want to call a method "dynamically", using a name stored in a variable or passed in as an argument, you must use a symbol. Likewise for getting/setting the values of instance variables, constants, and so on.
Intuitively, you can think of it this way. If you've ever programmed in C, you have written things like:
#define SOMETHING 1
#define SOMETHING_ELSE 2
These defines eliminate the need to use "magic numbers" in your code. The names used (SOMETHING, etc) are not relevant to users of your program, just as the names of functions or classes are not relevant to users. They are just "labels" which are internal to the code, and are of concern only to the programmer. Symbols play a similar role in Ruby programs. They are a data type with performance properties similar to integers, but with a literal syntax which makes them appear as meaningful names to a human programmer.
Once you "get" the concept of Ruby symbols, understanding Lisp symbols will be much easier, if you ever program in Lisp. In Lisp, symbols are the basic data type which program code is composed of. (Because Lisp programs are data, and can be manipulated as such.)
You should think about symbols like a numbers. It is constant, immutable and non-gc object that is created on first usage and you should use them whenever you need to reference to object that cannot be duplicated, like:
messages aka methods (Ruby doesn't have overloading)
hash keys (Ruby doesn't have multi hashes)
Yes, your example is fine.
name, email, and password could all be stored as symbols, even in a hash - the specific object could still be a string object.
{
:name => 'John doe',
:email => 'foo#hotmail.com',
:password => 'lassdgjkl23853'
}

Why aren't the arguments to File.new symbols instead of strings?

I was wondering why the people who wrote the File library decided to make the arguments that determine what mode the file is opened in strings instead of symbols.
For example, this is how it is now:
f = File.new('file', 'rw')
But wouldn't it be a better design to do
f = File.new('file', :rw)
or even
f = File.new(:file, :rw)
for example? This seems to be the perfect place to use them since the argument definitely doesn't need to be mutable.
I am interested in knowing why it came out this way.
Update: I just got done reading a related question about symbols vs. strings, and I think the consensus was that symbols are just not as well known as strings, and everyone is used to using strings to index hash tables anyway. However, I don't think it would be valid for the designers of Ruby's standard library to plead ignorance on the subject of symbols, so I don't think that's the reason.
I'm no expert in the history of ruby, but you really have three options when you want parameters to a method: strings, symbols, and static classes.
For example, exception handling. Each exception is actually a type of class Exception.
ArgumentError.is_a? Class
=> True
So you could have each permission for the stream be it's own class. But that would require even more classes to be generated for the system.
The thing about symbols is they are never deleted. Every symbol you generate is preserved indefinitely; it's why using the method '.to_sym' lightly is discouraged. It leads to memory leaks.
Strings are just easier to manipulate. If you got the input mode from the user, you would need a '.to_sym' somewhere in your code, or at the very least, a large switch statement. With a string, you can just pass the user input directly to the method (if you were so trusting, of course).
Also, in C, you pass a character to the file i/o method. There are no Chars in ruby, just strings. Seeing as how ruby is built on C, that could be where it comes from.
It is simply a relic from previous languages.

What are some example use cases for symbol literals in Scala?

The use of symbol literals is not immediately clear from what I've read up on Scala. Would anyone care to share some real world uses?
Is there a particular Java idiom being covered by symbol literals? What languages have similar constructs? I'm coming from a Python background and not sure there's anything analogous in that language.
What would motivate me to use 'HelloWorld vs "HelloWorld"?
Thanks
In Java terms, symbols are interned strings. This means, for example, that reference equality comparison (eq in Scala and == in Java) gives the same result as normal equality comparison (== in Scala and equals in Java): 'abcd eq 'abcd will return true, while "abcd" eq "abcd" might not, depending on JVM's whims (well, it should for literals, but not for strings created dynamically in general).
Other languages which use symbols are Lisp (which uses 'abcd like Scala), Ruby (:abcd), Erlang and Prolog (abcd; they are called atoms instead of symbols).
I would use a symbol when I don't care about the structure of a string and use it purely as a name for something. For example, if I have a database table representing CDs, which includes a column named "price", I don't care that the second character in "price" is "r", or about concatenating column names; so a database library in Scala could reasonably use symbols for table and column names.
If you have plain strings representing say method names in code, that perhaps get passed around, you're not quite conveying things appropriately. This is sort of the Data/Code boundary issue, it's not always easy to the draw the line, but if we were to say that in that example those method names are more code than they are data, then we want something to clearly identify that.
A Symbol Literal comes into play where it clearly differentiates just any old string data with a construct being used in the code. It's just really there where you want to indicate, this isn't just some string data, but in fact in some way part of the code. The idea being things like your IDE would highlight it differently, and given the tooling, you could refactor on those, rather than doing text search/replace.
This link discusses it fairly well.
Note: Symbols will be deprecated and then removed in Scala 3 (dotty).
Reference: http://dotty.epfl.ch/docs/reference/dropped-features/symlits.html
Because of this, I personally recommend not using Symbols anymore (at least in new scala code). As the dotty documentation states:
Symbol literals are no longer supported
it is recommended to use a plain string literal [...] instead
Python mantains an internal global table of "interned strings" with the names of all variables, functions, modules, etc. With this table, the interpreter can make faster searchs and optimizations. You can force this process with the intern function (sys.intern in python3).
Also, Java and Scala automatically use "interned strings" for faster searchs. With scala, you can use the intern method to force the intern of a string, but this process don't works with all strings. Symbols benefit from being guaranteed to be interned, so a single reference equality check is both sufficient to prove equality or inequality.

Is a symbol table in Ruby any different from a symbol table in other languages

The wikipedia entry on Symbol tables is a good reference:
http://en.wikipedia.org/wiki/Symbol_table
But as I try to understand symbols in Ruby and how they are represented in the Array of Symbols (returned by the Symbol.all_symbols method),
I'm wondering whether Ruby's approach to the symbol table has any important differences from other languages?
Ruby doesn't really have a "symbol table" in that sense. It has bindings, and symbols (what lispers call atoms) but it isn't really doing it the way that article describes.
So in answer to your question: it isn't so much that ruby has the same thing done differently, but rather that it does two different things (:xxx notation --> unique ids and bindings in scopes) and uses similar / overlapping terminology for them.
To clarify:
The article you link to gives the conventional definition of a symbol table, to wit
where each identifier in a program's source code is associated with information relating to its declaration or appearance in the source, such as its type, scope level and sometimes its location
But this isn't what ruby's symbol table does. It just provides a globally unique identity for a certain class of objects which can be written as :something in the source code, including things like :+ and :"Hi bob!" which aren't identifiers. Also, merely using an identifier will not create a corresponding symbol. And finally, none of the information listed in the passage above is stored in ruby's list of symbols.
It's a coincidence of naming, and reading that article will not help you understand ruby's symbols.
The biggest difference is that (like Lisp) Ruby actually has a syntax for symbols, and it's easy to add/remove things at runtime yourself. If you say :balloon (or "balloon".intern) it will intern that for you. Even though you're referring to it by name in your source, internally it's just a pointer in the symbol table. If you compare symbols, it's just a pointer-compare, not a string-compare.
Languages like C don't really have a way to say simply "create a new symbol for me" at runtime. You can do it implicitly at compile-time by defining a function, but that's really its only use. Since C has no syntax for symbols, if you want to be able to say Balloon in your program but be able to compare it with a single machine instruction, you use enums (or #defines).
In Ruby, it takes only one character to make a symbol, so you can use it for all kinds of things (like hash keys).
Symbols in Ruby are used where other languages tend to use enums, defines, constants and the like. They're also often used for associative keys. Their use has little to do with a symbol table as discussed in that article, except that they obviously exist in one.

Ruby symbols are not garbage collected!? Then, isn't it better to use a String?

If you create 10,000 strings in a loop, a lot of garbage collection has to take place which uses up a lot of resources.
If you do the same thing with symbols, you create objects which cannot be garbage collected.
Which is worse?
If you refer to the same symbol in your loop, then it doesn't have to recreate that object everytime i.e.
while i < 10000
i += 1
:im_using_this_symbol_here
end
Now if you use a string there instead, the string will be recreated 10K times. In general, use symbols in cases where you almost treat the literal like a constant or a key. A very good example for me would be
link_to "News", :action => 'news'
instead of
link_to "News", "action" => 'news'
action being re-used over and over again within your application.
Seeing as symbols are almost always created via literals, there isn't much potential for a memory explosion here. Their behavior is pretty much required by their usage: every time you refer to a symbol, it's the same one.
Similarly, strings need to be unique in Ruby. This is due to the way they're used - text processing etc.
Decide which one to use depending on their semantics, don't optimize prematurely.
If you are using Ruby 2.2.0 or later, it should usually be OK to dynamically create a lot of symbols, because they will be garbage collected according to the Ruby 2.2.0-preview1 announcement, which has a link to more details about the new symbol GC. However, if you pass your dynamic symbols to some kind of code that converts it to an ID (an internal Ruby implementation concept used in the C source code), then in that case it will get pinned and never get garbage collected. I'm not sure how commonly that happens.
When deciding whether to use symbols or strings you should consider:
Symbols cannot be changed after they are created.
Symbols do not have a lot of the methods that strings have, like start_with?
Symbols can very efficiently be compared to eachother for equality.
Symbols are supposed to represent the name of something according to the Symbol docs. I wouldn't use them to store anything that couldn't be considered a name.

Resources