I'm not clear on the value and proper use of symbols.
The benefit seems to be that they avoid multiple copies of the same string by letting a single object exist in memory. I wonder whether this is true and what other benefits they bring.
If I were creating a user object with properties such as name, email, and password, and used symbols for each property instead of strings, does that mean that there is only one object for each property? It seems like this would avoid a string copy for the properties in the hash (which seems like a good thing).
Can someone help me understand what a symbol is and when it's better to use one over a string in a hash? What are the benefits and pitfalls of each?
Also, can anyone speak to the memory tradeoffs of each? With scalability being important, I'm curious if symbols would help with speed.
Symbols, also known as "interned strings", are useful for hash keys, common arguments, and other places where the overhead of having many, many duplicate strings with the same value is inefficient.
For example:
params[:name]
my_function(with: { arguments: [ ... ] })
record.state = :completed
These are generally preferable to strings because they will be repeated frequently.
The most common uses are:
Hash keys
Arguments to methods
Option flags or enum-type property values
It's better to use strings when handling user data of unknown composition. Unlike strings, which can be garbage collected, symbols are permanent. Converting arbitrary user data to symbols may fill up the symbol table with junk and possibly crash your application if someone's being malicious.
For example:
user_data = JSON.load(...).symbolize_keys
This would allow an attacker to create JSON data with intentionally long, randomized names that, in time, would bloat your process with all kinds of useless junk.
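For illustration, here's a rough sketch of how attacker-controlled keys grow the symbol table (Symbol.all_symbols is standard Ruby; the 1,000 keys are arbitrary). On pre-2.2 Rubies these entries would never be reclaimed:
require 'securerandom'
before = Symbol.all_symbols.size
# simulate attacker-controlled keys being symbolized
1_000.times { SecureRandom.hex(16).to_sym }
puts Symbol.all_symbols.size - before # ~1000 new symbol table entries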
Besides avoiding the need for repeated memory allocation, symbols can be compared for equality faster than strings, and their hash codes can be computed faster than strings (so both storing and retrieving data from a Hash will be faster when symbol rather than string keys are used).
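As a rough micro-benchmark sketch (numbers vary by Ruby version and machine, and frozen string literals narrow the gap):
require 'benchmark'
sym_hash = { name: 'John' }
str_hash = { 'name' => 'John' }
Benchmark.bm(8) do |x|
  x.report('symbol:') { 1_000_000.times { sym_hash[:name] } }
  x.report('string:') { 1_000_000.times { str_hash['name'] } }
end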
Internally, Ruby uses something closely related to symbols to identify methods, the names of classes, and so on. So, for example, when you retrieve a list of the methods an object supports (with obj.methods), you get back an array of symbols. When you want to call a method "dynamically", using a name stored in a variable or passed in as an argument, you must use a symbol. Likewise for getting/setting the values of instance variables, constants, and so on.
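For example (User and greet are made-up names here; send and instance_variable_get are standard Ruby):
class User
  def initialize(name)
    @name = name
  end
  def greet
    "Hello, #{@name}!"
  end
end
u = User.new('Jane')
u.methods.include?(:greet)      # => true -- method names come back as symbols
u.send(:greet)                  # => "Hello, Jane!" -- dynamic call by symbol
u.instance_variable_get(:@name) # => "Jane"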
Intuitively, you can think of it this way. If you've ever programmed in C, you have written things like:
#define SOMETHING 1
#define SOMETHING_ELSE 2
These defines eliminate the need to use "magic numbers" in your code. The names used (SOMETHING, etc) are not relevant to users of your program, just as the names of functions or classes are not relevant to users. They are just "labels" which are internal to the code, and are of concern only to the programmer. Symbols play a similar role in Ruby programs. They are a data type with performance properties similar to integers, but with a literal syntax which makes them appear as meaningful names to a human programmer.
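A Ruby analogue of those C defines might look like this (the Order class and its statuses are hypothetical):
class Order
  attr_accessor :status # expected to be :pending, :paid, or :shipped
end
order = Order.new
order.status = :pending
case order.status
when :pending then puts 'awaiting payment'
when :paid    then puts 'ready to ship'
end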
Once you "get" the concept of Ruby symbols, understanding Lisp symbols will be much easier, if you ever program in Lisp. In Lisp, symbols are the basic data type which program code is composed of. (Because Lisp programs are data, and can be manipulated as such.)
You should think about symbols like numbers: a symbol is a constant, immutable, non-garbage-collected object that is created on first use. Use one whenever you need to refer to something that cannot be duplicated, like:
messages, a.k.a. methods (Ruby doesn't have overloading)
hash keys (Ruby doesn't have multi-hashes)
Yes, your example is fine.
name, email, and password can all be symbol keys, even in a hash - the values can still be string objects:
{
:name => 'John doe',
:email => 'foo@hotmail.com',
:password => 'lassdgjkl23853'
}
The first time I tried learning Ruby was two years ago; now I have started again. The reason I stopped was that I could not understand the Symbol class. And now I am at the same point again, completely lost as to when and why you use Symbols. I have read the other posts on Stack Overflow as well as Googled for several explanations, but I do not understand it yet.
At first I thought symbols were just a way to create some sort of "named constant" without having to go through the same process as in, let's say, Java.
:all
instead of making a constant with an arbitrary value, e.g. public static final int ALL = 8;
However, that does not make much sense when you see it used in, e.g., attr_accessor :first_name.
Are Symbols just a lightweight String class? I am having problems understanding how I should interpret, when and how to use symbols both in my own classes and in frameworks.
In short, symbols are lightweight strings, but they also are immutable and non-garbage-collectable.
You should not use them as immutable strings in your data-processing tasks (remember, once a symbol is created, it can't be destroyed). You typically use symbols for naming things.
# typical use cases
# access hash value
user = User.find(params[:id])
# name something
attr_accessor :first_name
# set hash value in opts parameter
db.collection.update(query, update, multi: true, upsert: true)
Let's take the first example, params[:id]. In a moderately big Rails app there may be hundreds or thousands of those scattered around the codebase. If we accessed that value with a string, params["id"], that would mean a new string allocation each time (and that string would need to be collected afterwards). In the case of a symbol, it's actually the same object everywhere. Less work for the memory allocator, the garbage collector, and even you (: is faster to type than "").
If you have a simple one-word string that appears often in your code and you don't do something funky to it (interpolation, gsub, upcase, etc), then it's likely a good candidate to be a symbol.
However, does this apply only to text that is used as part of the actual program logic, such as naming, and not to text that you get while actually running the program, such as text from the user/web?
I cannot think of a single case where I'd want to turn data from the user/web into a symbol (except maybe for parsing command-line options), mainly because of the consequences (once created, symbols live forever).
Also, many editors provide different coloring for symbols, to highlight them in the code.
The O'Reilly Ruby Cookbook (p. 15) quotes Jim Weirich as saying:
If the contents (the sequence of characters) of the object are important, use a string.
If the identity of the object is important, use a symbol.
Symbols are generally used as hash keys, because it's the identity of the key that's important. Symbols are also required when passing messages using certain methods like Object#send.
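You can see the identity difference directly in irb (the object_id values themselves vary from run to run; shown without the frozen_string_literal pragma):
:name.object_id == :name.object_id   # => true  -- always the very same object
'name'.object_id == 'name'.object_id # => false -- a new String each time
:name.equal?(:name)                  # => true
'name'.equal?('name')                # => false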
A Ruby implementation typically has a table in which it stores the names of all classes, methods and variables. It refers to, say, a method name by its position in the table, avoiding expensive string comparisons. But you can use this table too, and add values to it: symbols.
If you write code that uses strings as identifiers rather than for their textual content, consider symbols. If you write a method that expects an argument to be either 'male' or 'female', consider using :male and :female. Comparing two symbols for equality is faster than comparing strings (which is why symbols make good hash keys).
Symbols are used for naming things in the language: the names of classes, the names of methods etc.
These are very like strings, except they can never be garbage collected, and testing for equality is optimised to be very quick.
The Java implementation has a very similar thing, except that it is not available for runtime use. What I mean is, when you write java code like obj.someMethod(4), the string 'someMethod' is converted by the compiler into a symbol which is embedded in a lookup table in the .class file. These symbols are like 'special' strings which are not garbage collected, and which are very fast to compare for equality. This is almost identical to Ruby, except that Ruby allows you to create new symbols at runtime, whereas Java only allows it at compile time.
This is just like creating new methods -- Java allows it at compile time; Ruby allows it at runtime.
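In Ruby, that runtime creation looks like this; the new method's name immediately shows up as a symbol (Greeter and hello are made-up names):
class Greeter
  define_method(:hello) { |name| "Hello, #{name}!" }
end
Greeter.new.hello('world')      # => "Hello, world!"
Greeter.instance_methods(false) # => [:hello]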
Symbol GC was introduced in Ruby 2.2, so mortal symbols, i.e. dynamic symbols created at runtime (as when we convert a string with "mortal".to_sym), now get cleaned up from memory once they are no longer referenced.
check this out:
require 'objspace'
ObjectSpace.count_symbols
# => {
#      :mortal_dynamic_symbol   => 3,
#      :immortal_dynamic_symbol => 5,
#      :immortal_static_symbol  => 3663,
#      :immortal_symbol         => 3668
#    }
source: https://www.rubyguides.com/2018/02/ruby-symbols/
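A rough way to watch this happen (counts are illustrative; whether a given GC run collects them can vary by Ruby version):
require 'objspace'
before = ObjectSpace.count_symbols[:mortal_dynamic_symbol]
100.times { |i| "temp_#{i}".to_sym } # dynamic (mortal) symbols, no references kept
GC.start
after = ObjectSpace.count_symbols[:mortal_dynamic_symbol]
puts "mortal symbols: #{before} -> #{after}"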
The use of symbol literals is not immediately clear from what I've read up on Scala. Would anyone care to share some real world uses?
Is there a particular Java idiom being covered by symbol literals? What languages have similar constructs? I'm coming from a Python background and not sure there's anything analogous in that language.
What would motivate me to use 'HelloWorld vs "HelloWorld"?
Thanks
In Java terms, symbols are interned strings. This means, for example, that reference equality comparison (eq in Scala and == in Java) gives the same result as normal equality comparison (== in Scala and equals in Java): 'abcd eq 'abcd will return true, while "abcd" eq "abcd" might not, depending on JVM's whims (well, it should for literals, but not for strings created dynamically in general).
Other languages which use symbols are Lisp (which uses 'abcd like Scala), Ruby (:abcd), Erlang and Prolog (abcd; they are called atoms instead of symbols).
I would use a symbol when I don't care about the structure of a string and use it purely as a name for something. For example, if I have a database table representing CDs, which includes a column named "price", I don't care that the second character in "price" is "r", or about concatenating column names; so a database library in Scala could reasonably use symbols for table and column names.
If you have plain strings representing, say, method names in code, that perhaps get passed around, you're not quite conveying things appropriately. This is sort of the data/code boundary issue; it's not always easy to draw the line, but if we were to say that in that example those method names are more code than they are data, then we want something to clearly identify that.
A symbol literal comes into play where it clearly differentiates plain old string data from a construct being used in the code. It's there for when you want to indicate that this isn't just some string data, but is in fact, in some way, part of the code. The idea is that your IDE would highlight it differently and, given the tooling, you could refactor on those names rather than doing text search/replace.
Note: Symbols will be deprecated and then removed in Scala 3 (dotty).
Reference: http://dotty.epfl.ch/docs/reference/dropped-features/symlits.html
Because of this, I personally recommend not using Symbols anymore (at least in new Scala code). As the Dotty documentation states:
Symbol literals are no longer supported
it is recommended to use a plain string literal [...] instead
Python maintains an internal global table of "interned strings" with the names of all variables, functions, modules, etc. With this table, the interpreter can make faster lookups and optimizations. You can force this process with the intern function (sys.intern in Python 3).
Java and Scala also automatically use interned strings for faster lookups. With Scala, you can use the intern method to force the interning of a string, but this doesn't work with all strings. Symbols benefit from being guaranteed to be interned, so a single reference-equality check is sufficient to prove equality or inequality.
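Ruby (discussed above) happens to show both behaviors side by side, which makes the contrast easy to see (String#-@ deduplicates frozen strings, Ruby 2.5+):
:abcd.equal?(:abcd)       # => true  -- symbols are always interned
'abcd'.equal?('abcd')     # => false -- two separate String objects
(-'abcd').equal?(-'abcd') # => true  -- explicitly deduplicated frozen strings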
I've read some of the recent language vs. language questions with interest... Perl vs. Python, Python vs. Java, Can one language be better than another?
One thing I've noticed is that a lot of us have very superficial reasons for disliking languages. We notice these things at first glance and they turn us off. We shun what are probably perfectly good languages as a result of features that we'd probably learn to love or ignore in 2 seconds if we bothered.
Well, I'm as guilty as the next guy, if not more. Here goes:
Ruby: All the Ruby example code I see uses the puts command, and that's a sort of childish Yiddish anatomical term. So as a result, I can't take Ruby code seriously even though I should.
Python: The first time I saw it, I smirked at the whole significant whitespace thing. I avoided it for the next several years. Now I hardly use anything else.
Java: I don't like identifiersThatLookLikeThis. I'm not sure why exactly.
Lisp: I have trouble with all the parentheses. Things of different importance and purpose (function declarations, variable assignments, etc.) are not syntactically differentiated and I'm too lazy to learn what's what.
Fortran: uppercase everything hurts my eyes. I know modern code doesn't have to be written like that, but most example code is...
Visual Basic: it bugs me that Dim is used to declare variables, since I remember the good ol' days of GW-BASIC when it was only used to dimension arrays.
What languages did look right to me at first glance? Perl, C, QBasic, JavaScript, assembly language, BASH shell, FORTH.
Okay, now that I've aired my dirty laundry... I want to hear yours. What are your language hangups? What superficial features bother you? How have you gotten over them?
I hate Hate HATE "End Function" and "End IF" and "If... Then" parts of VB. I would much rather see a curly bracket instead.
PHP's function name inconsistencies.
// common parameters back-to-front
in_array(needle, haystack);
strpos(haystack, needle);
// _ to separate words, or not?
filesize();
file_exists();
// super globals prefix?
$GLOBALS;
$_POST;
I never really liked the keywords spelled backwards in some scripting shells
if-then-fi is bad enough, but case-in-esac is just getting silly
I just thought of another... I hate the mostly-meaningless URLs used in XML to define namespaces, e.g. xmlns="http://purl.org/rss/1.0/"
Pascal's Begin and End. Too verbose, not subject to bracket matching, and worse, there isn't a Begin for every End, e.g.:
Type foo = Record
// ...
end;
Although I'm mainly a PHP developer, I dislike languages that don't let me do enough things inline. E.g.:
$x = returnsArray();
$x[1];
instead of
returnsArray()[1];
or
function my_sort($a, $b) {
    return $a < $b;
}
usort($array, 'my_sort');
instead of
usort($array, function($a, $b) { return $a < $b; });
I like object-oriented style. So it bugs me in Python to see len(str) to get the length of a string, or splitting strings like split(str, "|") in another language. That is fine in C; it doesn't have objects. But Python, D, etc. do have objects and use obj.method() other places. (I still think Python is a great language.)
Inconsistency is another big one for me. I do not like inconsistent naming in the same library: length(), size(), getLength(), getlength(), toUTFindex() (why not toUtfIndex?), Constant, CONSTANT, etc.
The long names in .NET bother me sometimes. Can't they shorten DataGridViewCellContextMenuStripNeededEventArgs somehow? What about ListViewVirtualItemsSelectionRangeChangedEventArgs?
And I hate deep directory trees. If a library/project has a 5 level deep directory tree, I'm going to have trouble with it.
C and C++'s syntax is a bit quirky. They reuse operators for different things. You're probably so used to it that you don't think about it (nor do I), but consider how many meanings parentheses have:
int main() // function declaration / definition
printf("hello") // function call
(int)x // type cast
2*(7+8) // override precedence
int (*)(int) // function pointer
int x(3) // initializer
if (condition) // special part of syntax of if, while, for, switch
And if in C++ you saw
foo<bar>(baz(),baaz)
you couldn't know the meaning without the definition of foo and bar.
the < and > might be a template instantiation, or might be less-than and greater-than (unusual but legal)
the () might be a function call, or might be just surrounding the comma operator (i.e. evaluate baz() for side-effects, then return baaz).
The silly thing is that other languages have copied some of these characteristics!
Java, and its checked exceptions. I left Java for a while, dwelling in the .NET world, then recently came back.
It feels like, sometimes, my throws clause is more voluminous than my method content.
There's nothing in the world I hate more than PHP.
Variables with $, that's one extra odd character for every variable.
Members are accessed with -> for no apparent reason, one extra character for every member access.
No namespaces.
Strings are concatenated with a period (.).
A freakshow of a language, really.
All the []s and #s in Objective C. Their use is so different from the underlying C's native syntax that the first time I saw them it gave the impression that all the object-orientation had been clumsily bolted on as an afterthought.
I abhor the boilerplate verbosity of Java:
writing getters and setters for properties
checked exception handling and all the verbiage that implies
long lists of imports
Those, in connection with the Java convention of using veryLongVariableNames, sometimes have me thinking I'm back in the 80's, writing IDENTIFICATION DIVISION. at the top of my programs.
Hint: If you can automate the generation of part of your code in your IDE, that's a good hint that you're producing boilerplate code. With automated tools, it's not a problem to write, but it's a hindrance every time someone has to read that code - which is more often.
While I think it goes a bit overboard on type bureaucracy, Scala has successfully addressed some of these concerns.
Coding Style inconsistencies in team projects.
I'm working on a large team project where some contributors have used 4 spaces instead of the tab character.
Working with their code can be very annoying - I like to keep my code clean and with a consistent style.
It's bad enough when you use different standards for different languages, but in a web project with HTML, CSS, Javascript, PHP and MySQL, that's 5 languages, 5 different styles, and multiplied by the number of people working on the project.
I'd love to re-format my co-workers code when I need to fix something, but then the repository would think I changed every line of their code.
It irritates me sometimes how people expect there to be one language for all jobs. Depending on the task you are doing, each language has its advantages and disadvantages. I like the C-based syntax languages because it's what I'm most used to and I like the flexibility they tend to bestow on the developer. Of course, with great power comes great responsibility, and having the power to write 150 line LINQ statements doesn't mean you should.
I love the inline XML in the latest version of VB.NET although I don't like working with VB mainly because I find the IDE less helpful than the IDE for C#.
If Microsoft had to invent yet another C++-like language with C#, why didn't they correct Java's mistake and implement support for RAII?
Case sensitivity.
What kinda hangover do you need to think that differentiating two identifiers solely by caSE is a great idea?
I hate semi-colons. I find they add a lot of noise and you rarely need to put two statements on a line. I prefer the style of Python and other languages... end of line is end of a statement.
Any language that can't fully decide if Arrays/Loop/string character indexes are zero based or one based.
I personally prefer zero based, but any language that mixes the two, or lets you "configure" which is used can drive you bonkers. (Apache Velocity - I'm looking in your direction!)
snip from the VTL reference (default is 1, but you can set it to 0):
# Default starting value of the loop
# counter variable reference.
directive.foreach.counter.initial.value = 1
(try merging 2 projects that used different counter schemes - ugh!)
In no particular order...
OCaml
Tuple definitions use * to separate items rather than ,. So, ("Juliet", 23, true) has the type (string * int * bool).
For being such an awesome language, the documentation has this haunting comment on threads: "The threads library is implemented by time-sharing on a single processor. It will not take advantage of multi-processor machines. Using this library will therefore never make programs run faster." JoCaml doesn't fix this problem.
^^^ I've heard the Jane Street guys were working to add concurrent GC and multi-core threads to OCaml, but I don't know how successful they've been. I can't imagine a language without multi-core threads and GC surviving very long.
No easy way to explore modules in the toplevel. Sure, you can write module Q = List;; and the toplevel will happily print out the module definition, but that just seems hacky.
C#
Lousy type inference. Beyond the most trivial expressions, I have to give types to generic functions.
All the LINQ code I ever read uses method syntax, x.Where(item => ...).OrderBy(item => ...). No one ever uses expression syntax, from item in x where ... orderby ... select. Between you and me, I think expression syntax is silly, if for no other reason than that it looks "foreign" against the backdrop of all other C# and VB.NET code.
LINQ
Every other language uses the industry-standard names Map, Fold/Reduce/Inject, and Filter. LINQ has to be different and uses Select, Aggregate, and Where.
Functional Programming
Monads are mystifying. Having seen the Parser monad, Maybe monad, State, and List monads, I can understand perfectly how the code works; however, as a general design pattern, I can't seem to look at problems and say "hey, I bet a monad would fit perfect here".
Ruby
GRRRRAAAAAAAH!!!!! I mean... seriously.
VB
Module Hangups
Dim _juliet as String = "Too Wordy!"
Public Property Juliet() as String
Get
Return _juliet
End Get
Set (ByVal value as String)
_juliet = value
End Set
End Property
End Module
And setter declarations are the bane of my existence. Alright, so I change the data type of my property -- now I need to change the data type in my setter too? Why doesn't VB borrow from C# and simply incorporate an implicit variable called value?
.NET Framework
I personally like the Java casing convention: classes are PascalCase, methods and properties are camelCase.
In C/C++, it annoys me how there are different ways of writing the same code.
e.g.
if (condition)
{
callSomeConditionalMethod();
}
callSomeOtherMethod();
vs.
if (condition)
callSomeConditionalMethod();
callSomeOtherMethod();
Both equate to the same thing, but different people have different styles. I wish the original standard had been stricter about making a decision on this, so we wouldn't have this ambiguity. It leads to arguments and disagreements in code reviews!
I found Perl's use of "defined" and "undefined" values to be so useful that I have trouble using scripting languages without it.
Perl:
($lastname, $firstname, $rest) = split(' ', $fullname);
This statement performs well no matter how many words are in $fullname. Try the equivalent unpacking in Python, and it explodes if the full name doesn't contain exactly three words.
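For comparison, Ruby's multiple assignment behaves like the Perl version here, quietly filling the missing slots with nil:
lastname, firstname, rest = 'Madonna'.split(' ')
lastname  # => "Madonna"
firstname # => nil
rest      # => nil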
In SQL, they say you should not use cursors, and when you do, you really understand why...
It's so heavy-going!
DECLARE mycurse CURSOR LOCAL FAST_FORWARD READ_ONLY
FOR
SELECT field1, field2, fieldN FROM atable
OPEN mycurse
FETCH NEXT FROM mycurse INTO @Var1, @Var2, @VarN
WHILE @@FETCH_STATUS = 0
BEGIN
    -- do something really clever...
    FETCH NEXT FROM mycurse INTO @Var1, @Var2, @VarN
END
CLOSE mycurse
DEALLOCATE mycurse
Although I program primarily in Python, it irks me endlessly that lambda bodies must be expressions.
I'm still wrapping my brain around JavaScript, and as a whole it's mostly acceptable. But why is it so hard to create a namespace? In Tcl they're just ugly, but in JavaScript it's actually a rigmarole AND completely unreadable.
In SQL, how come everything is just one huge freakin' SELECT statement?
In Ruby, I very strongly dislike how methods do not require self. to be called on the current instance, but properties do (otherwise they clash with locals); i.e.:
def foo()
123
end
def foo=(x)
end
def bar()
  x = foo() # okay, same as self.foo()
  foo = 123 # not okay, assigns a local variable foo instead of calling foo=
  x = foo   # not okay, now reads the local variable foo rather than calling the method
end
To my mind, it's very inconsistent. I'd rather prefer to either always require self. in all cases, or to have a sigil for locals.
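For what it's worth, the usual workaround is to name the receiver explicitly when calling the writer:
def bar()
  self.foo = 123 # calls the foo= writer instead of creating a local
  x = self.foo   # an explicit receiver works for the reader too
end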
Java's packages. I find them complex, more so because I am not a corporation.
I vastly prefer namespaces. I'll get over it, of course - I'm playing with the Android SDK, and Eclipse removes a lot of the pain. I've never had a machine that could run it interactively before, and now I do I'm very impressed.
Prolog's if-then-else syntax.
x -> y ; z
The problem is that ";" is the "or" operator, so the above looks like "x implies y or z".
Java
Generics (Java's version of templates) are limited: because of type erasure, I cannot call methods of the type parameter and I cannot create instances of it. Generics are used by containers, but I could already have containers of Object instances.
No multiple inheritance. If a use of multiple inheritance does not lead to the diamond problem, it should be allowed. It should be possible to write a default implementation of interface methods. An example of the problem: the interface MouseListener has 5 methods, one for each event. If I want to handle just one of them, I have to implement the other 4 as empty methods.
It does not let you choose to manually manage the memory of some objects.
The Java API uses complex combinations of classes to do simple tasks. For example, if I want to read from a file, I have to combine several classes (e.g. a FileReader wrapped in a BufferedReader).
Python
Indentation is part of the syntax. I would prefer to use the word "end" to indicate the end of a block; then the word "pass" would not be needed.
In classes, the word "self" should not be needed as an argument to methods.
C++
Headers are the worst problem. I have to list the functions in a header file and implement them in a .cpp file. A header cannot hide the dependencies of a class: if a class A privately uses a class B as a field, then including the header of A includes the header of B too.
Strings and arrays inherited from C do not provide a length field. It is difficult to control whether std::string and std::vector will use the stack or the heap, and I have to use pointers with them if I want to assign, pass, or return them without the "=" operator copying the entire structure.
I cannot fully control construction and destruction. It is difficult to create an array of objects without a default constructor, or to choose which constructor to use with if and switch statements.
In most languages, file access. VB.NET is the only language so far where file access makes any sense to me. I do not understand why if I want to check if a file exists, I should use File.exists("") or something similar instead of creating a file object (actually FileInfo in VB.NET) and asking if it exists. And then if I want to open it, I ask it to open: (assuming a FileInfo object called fi) fi.OpenRead, for example. Returns a stream. Nice. Exactly what I wanted. If I want to move a file, fi.MoveTo. I can also do fi.CopyTo. What is this nonsense about not making files full-fledged objects in most languages? Also, if I want to iterate through the files in a directory, I can just create the directory object and call .GetFiles. Or I can do .GetDirectories, and I get a whole new set of DirectoryInfo objects to play with.
Admittedly, Java has some of this file stuff, but this nonsense of having to have a whole object to tell it how to list files is just silly.
Also, I hate ::, ->, => and all other multi-character operators except for <= and >= (and maybe -- and ++).
[Disclaimer: i only have a passing familiarity with VB, so take my comments with a grain of salt]
I Hate How Every Keyword In VB Is Capitalized Like This. I saw a blog post the other week (month?) about someone who tried writing VB code without any capital letters (they did something to a compiler that would let them compile VB code like that), and the language looked much nicer!
My big hangup is MATLAB's syntax. I use it, and there are things I like about it, but it has so many annoying quirks. Let's see.
Matrices are indexed with parentheses. So if you see something like Image(350,260), you have no clue from that whether we're getting an element from the Image matrix, or if we're calling some function called Image and passing arguments to it.
Scope is insane. I seem to recall that for loop index variables stay in scope after the loop ends.
If you forget to stick a semicolon after an assignment, the value will be dumped to standard output.
You may have only one (externally visible) function per file. This proves to be very annoying for organizing one's work.
I'm sure I could come up with more if I thought about it.