Why would one use String keys for a hash over symbols in Ruby?

This question is supposed to be the reverse of this one: Why use symbols as hash keys in Ruby?
As you can see, everyone agrees that symbols are the most logical choice for hash keys. However, I'm building a web application with the Sinatra framework, and the request params object is a hash that uses Strings as keys. Can anyone explain why this would be a design choice?

Because the source of the keys--the query string--is made up of strings, it is most direct and convenient to index the hash by those same strings.
Every Symbol that is created in the Ruby runtime is allocated and never released. There is a theoretical (but unlikely) DOS attack available by sending hundreds of thousands of requests with unique query string parameters. If these were symbolized, each request would slowly grow the runtime memory pool.
Strings, on the other hand, may be garbage collected. Thousands of unique strings handled across various requests will eventually go away, with no long-term impact.
Edit: Note that with Sinatra, symbols are also available for accessing the params hash. However, this works by keeping a hash indexed by strings and converting the symbols in your code to strings at lookup time. Unless you do something like the following:
params.each { |key, _| key.to_sym }  # needlessly interns every user-supplied key
...you are not at risk of any symbol pseudo-DOS attack.

Related

When to use symbols in Ruby

I'm not clear on the value and proper use of symbols.
The benefit seems to be that they remove the need for multiple copies of the same string by letting a single copy exist in memory. I wonder whether this is true and what other benefits they bring.
If I were creating a user object with properties such as name, email, and password, and used symbols for each property instead of strings, does that mean that there is only one object for each property? It seems like this would avoid a string copy for the properties in the hash (which seems like a good thing).
Can someone help me understand what a symbol is and when it's better to use one over a string in a hash? What are the benefits and pitfalls of each?
Also, can anyone speak to the memory tradeoffs of each? With scalability being important, I'm curious if symbols would help with speed.
Symbols, or "internals" as they're sometimes called, are useful for hash keys, common arguments, and other places where the overhead of having many, many duplicate strings with the same value would be inefficient.
For example:
params[:name]
my_function(with: { arguments: [ ... ] })
record.state = :completed
These are generally preferable to strings because they will be repeated frequently.
The most common uses are:
Hash keys
Arguments to methods
Option flags or enum-type property values
It's better to use strings when handling user data of an unknown composition. Unlike strings which can be garbage collected, symbols are permanent. Converting arbitrary user data to symbols may fill up the symbol table with junk and possibly crash your application if someone's being malicious.
For example:
user_data = JSON.load(...).symbolize_keys
This would allow an attacker to create JSON data with intentionally long, randomized names that, in time, would bloat your process with all kinds of useless junk.
Besides avoiding repeated memory allocation, symbols can be compared for equality faster than strings, and their hash codes can be computed faster, so both storing and retrieving data from a Hash are quicker with symbol keys than with string keys.
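A quick, unscientific way to see this for yourself (a minimal sketch; absolute numbers will vary by Ruby version and machine):
require 'benchmark'

n = 1_000_000
string_hash = { "name" => 1 }
symbol_hash = { :name  => 1 }

Benchmark.bm(13) do |x|
  # the same lookup a million times with each key type
  x.report("string keys:") { n.times { string_hash["name"] } }
  x.report("symbol keys:") { n.times { symbol_hash[:name] } }
end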
Internally, Ruby uses something closely related to symbols to identify methods, the names of classes, and so on. So, for example, when you retrieve a list of the methods an object supports (with obj.methods), you get back an array of symbols. When you want to call a method "dynamically", using a name stored in a variable or passed in as an argument, you must use a symbol. Likewise for getting/setting the values of instance variables, constants, and so on.
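For instance (the class and method names here are invented for illustration):
class Greeter
  def hello(name)
    "Hello, #{name}!"
  end
end

g = Greeter.new
g.methods.grep(/hello/)           # => [:hello] -- method names come back as symbols
g.send(:hello, "world")           # => "Hello, world!" -- dynamic call by name
g.instance_variable_set(:@lang, "en")
g.instance_variable_get(:@lang)   # => "en"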
Intuitively, you can think of it this way. If you've ever programmed in C, you have written things like:
#define SOMETHING 1
#define SOMETHING_ELSE 2
These defines eliminate the need to use "magic numbers" in your code. The names used (SOMETHING, etc) are not relevant to users of your program, just as the names of functions or classes are not relevant to users. They are just "labels" which are internal to the code, and are of concern only to the programmer. Symbols play a similar role in Ruby programs. They are a data type with performance properties similar to integers, but with a literal syntax which makes them appear as meaningful names to a human programmer.
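Translated into Ruby, the same idea might look like this (a minimal sketch):
# the C style: integer constants standing in for labels
SOMETHING      = 1
SOMETHING_ELSE = 2
mode = SOMETHING

# the Ruby style: the symbol itself is the label, no table of constants needed
mode = :something
mode == :something   # => true, roughly as cheap as an integer comparison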
Once you "get" the concept of Ruby symbols, understanding Lisp symbols will be much easier, if you ever program in Lisp. In Lisp, symbols are the basic data type which program code is composed of. (Because Lisp programs are data, and can be manipulated as such.)
You should think about symbols like numbers: a symbol is a constant, immutable, non-garbage-collected object that is created on first use. Use one whenever you need to refer to something that cannot be duplicated, like:
messages, a.k.a. methods (Ruby doesn't have overloading)
hash keys (Ruby doesn't have multi-hashes)
Yes, your example is fine.
name, email, and password could all be stored as symbols, even in a hash - the values themselves can still be string objects.
{
  :name     => 'John Doe',
  :email    => 'foo@hotmail.com',
  :password => 'lassdgjkl23853'
}

Using Ruby Symbols

The first time I tried learning Ruby was two years ago; now I have started again. The reason I stopped was that I could not understand the Symbol class. Now I am at the same point again, completely lost on when and why you use Symbols. I have read the other posts on Stack Overflow as well as Googled for several explanations, but I still do not understand them.
At first I thought symbols were just a way to create some sort of "named constant" without having to go through the same process as in, let's say, Java:
:all
instead of making a constant with an arbitrary value, e.g. public static final int ALL = 8;
However, that does not explain its use in, e.g., attr_accessor :first_name.
Are Symbols just a lightweight String class? I am having problems understanding how I should interpret them, and when and how to use symbols, both in my own classes and in frameworks.
In short, symbols are lightweight strings, but they are also immutable and non-garbage-collectable.
You should not use them as immutable strings in your data-processing tasks (remember, once a symbol is created, it can't be destroyed). You typically use symbols for naming things.
# typical use cases
# access hash value
user = User.find(params[:id])
# name something
attr_accessor :first_name
# set hash value in opts parameter
db.collection.update(query, update, multi: true, upsert: true)
Let's take the first example, params[:id]. In a moderately big Rails app there may be hundreds or thousands of those scattered around the codebase. If we accessed that value with a string, params["id"], that would mean a new string allocation each time (and that string needs to be collected afterwards). In the case of a symbol, it's actually the same symbol everywhere: less work for the memory allocator, the garbage collector, and even you (: is faster to type than "").
If you have a simple one-word string that appears often in your code and you don't do something funky to it (interpolation, gsub, upcase, etc), then it's likely a good candidate to be a symbol.
However, does this apply only to text that is used as part of the actual program logic such as naming, not text that you get while actually running the program...such as text from the user/web etc?
I cannot think of a single case where I'd want to turn data from the user/web into symbols (except perhaps when parsing command-line options), mainly because of the consequences: once created, symbols live forever.
Also, many editors provide distinct coloring for symbols, highlighting them in the code.
The O'Reilly Ruby Cookbook (p. 15) quotes Jim Weirich as saying:
If the contents (the sequence of characters) of the object are important, use a string.
If the identity of the object is important, use a symbol.
Symbols are generally used as hash keys, because it's the identity of the key that's important. Symbols are also the conventional argument when passing messages with methods like Object#send.
A Ruby implementation typically has a table in which it stores the names of all classes, methods and variables. It refers to, say, a method name by its position in that table, avoiding expensive string comparisons. But you can use this table too and add values to it: symbols.
If you write code that uses strings as identifiers rather than for their textual content, consider symbols. If you write a method that expects an argument to be either 'male' or 'female', consider using :male and :female. Comparing two symbols for equality is faster than comparing two strings (that's why symbols make good hash keys).
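A minimal sketch of that suggestion (the method name is invented):
def pronoun(sex)
  case sex
  when :male   then "he"
  when :female then "she"
  else raise ArgumentError, "expected :male or :female"
  end
end

pronoun(:male)    # => "he"
pronoun(:female)  # => "she"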
Symbols are used for naming things in the language: the names of classes, the names of methods etc.
These are very like strings, except they can never be garbage collected, and testing for equality is optimised to be very quick.
The Java implementation has a very similar thing, except that it is not available for runtime use. What I mean is, when you write java code like obj.someMethod(4), the string 'someMethod' is converted by the compiler into a symbol which is embedded in a lookup table in the .class file. These symbols are like 'special' strings which are not garbage collected, and which are very fast to compare for equality. This is almost identical to Ruby, except that Ruby allows you to create new symbols at runtime, whereas Java only allows it at compile time.
This is just like creating new methods -- Java allows it at compile time; Ruby allows it at runtime.
As of Ruby 2.2, symbol GC was introduced, so mortal symbols (i.e. symbols created dynamically, as when we convert a string with "mortal".to_sym) now get cleaned up from memory.
Check this out:
require 'objspace'
ObjectSpace.count_symbols
# => {
#      :mortal_dynamic_symbol   => 3,
#      :immortal_dynamic_symbol => 5,
#      :immortal_static_symbol  => 3663,
#      :immortal_symbol         => 3668
#    }
source: https://www.rubyguides.com/2018/02/ruby-symbols/
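You can watch the collection happen yourself (a rough sketch; the exact counts will vary by Ruby version and loaded libraries):
require 'objspace'

10_000.times { |i| "temp_sym_#{i}".to_sym }   # create mortal (dynamic) symbols
GC.start
ObjectSpace.count_symbols[:mortal_dynamic_symbol]
# stays small on Ruby >= 2.2, because the unreferenced dynamic
# symbols created above were garbage collected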

Why aren't the arguments to File.new symbols instead of strings?

I was wondering why the people who wrote the File library decided to make the arguments that determine what mode the file is opened in strings instead of symbols.
For example, this is how it is now:
f = File.new('file', 'rw')
But wouldn't it be a better design to do
f = File.new('file', :rw)
or even
f = File.new(:file, :rw)
for example? This seems to be the perfect place to use them since the argument definitely doesn't need to be mutable.
I am interested in knowing why it came out this way.
Update: I just got done reading a related question about symbols vs. strings, and I think the consensus was that symbols are just not as well known as strings, and everyone is used to using strings to index hash tables anyway. However, I don't think it would be valid for the designers of Ruby's standard library to plead ignorance on the subject of symbols, so I don't think that's the reason.
I'm no expert in the history of Ruby, but you really have three options when you want to pass parameters to a method: strings, symbols, and static classes.
For example, exception handling: each exception is actually a class descending from Exception.
ArgumentError.is_a? Class
# => true
So you could have each permission for the stream be its own class. But that would require even more classes to be generated for the system.
The thing about symbols is that they are never deleted. Every symbol you generate is preserved indefinitely; that's why casually calling .to_sym is discouraged: it can lead to memory leaks.
Strings are just easier to manipulate. If you got the input mode from the user, you would need a .to_sym somewhere in your code, or at the very least a large case statement. With a string, you can just pass the user input directly to the method (if you were so trusting, of course).
Also, in C, you pass a short mode string such as "r" or "w+" to fopen; there are no Chars in Ruby, just strings, so the mode string carries over naturally. Seeing as Ruby is built on C, that could be where it comes from.
It is simply a relic from previous languages.
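Incidentally, Ruby's File API already offers a non-string alternative that underlines the C heritage: the bitwise mode constants wrapping open(2)'s flags.
# same idea as a mode string: open read-write, creating the file if absent
f = File.new("example.txt", File::CREAT | File::RDWR)
f.close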

Using Hash Tables as function inputs

I am working in a group that is writing some APIs for tools that we are using in Ruby. When writing API methods, many of my teammates use hash tables as the method's only parameter, while I write my methods with each parameter specified.
For example, a class Apple defined as:
class Apple
  # commonName
  # volume
  # color
end
I would instantiate the class with the method:
Apple.new( commonName, volume, color )
My teammates would write it so the call looked like:
Apple.new( {"commonName"=>commonName, "volume"=>volume, "color"=>color} )
I don't like using a hash table as the input. To me it seems unnecessarily bulky and doesn't add any clarity to the code. While it doesn't appear to be a big deal in this example, some of our methods have more than 10 parameters, and there will often be hash tables nested inside other hash tables. I also noticed that using hash tables in this way is extremely uncommon in public APIs (net/telnet is the only exception that I can think of right now).
Question: What arguments could I make to my team members to not use hash tables as input parameters? The bulkiness of the code isn't a sufficient justification (they are not afraid of writing 200-400 character lines), and excessive memory/processing overhead won't work because it won't become an issue with the way our tools will be used.
Actually, if your method takes more than 10 arguments, you should either redesign your class or eat dirt and use hashes. For any method that takes more than 4 arguments, positional arguments can be counter-intuitive at the call site, because you have to remember the order correctly.
I think the best solution would be to simply redesign such methods and use something like the builder or fluent-interface patterns.
First of all, you should chide them for using strings instead of symbols for hash keys.
One issue with using a hash is that you then have to check that all the appropriate keys are in it. This makes it useful for optional parameters, but for mandatory ones, why not use the built-in functionality of the language? For example, with their method, what happens if I do this:
Apple.new( {"commonName"=>commonName, "volume"=>volume} )
Whereas, with Apple.new(commonName, volume), you know you'll get an ArgumentError.
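For what it's worth, keyword arguments (a sketch, assuming Ruby 2.1 or later) give you named parameters and that same built-in check at once:
class Apple
  # required keyword arguments: named like hash keys,
  # but a missing one raises at call time, like positional args do
  def initialize(common_name:, volume:, color:)
    @common_name = common_name
    @volume      = volume
    @color       = color
  end
end

Apple.new(common_name: "Gala", volume: 5, color: "red")   # fine
# Apple.new(common_name: "Gala")  # ArgumentError: missing keywords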
Named parameters make for more self-documenting code which is nice. But other than that there's not a lot of difference. The Hash allows for more flexibility, especially if you start doing any method aliasing. Also, the various Hash methods in ActiveSupport make setting defaults and verifying inputs pretty painless. I guess this probably wasn't the answer you were looking for.
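Even without ActiveSupport, plain Hash#fetch covers defaults and mandatory-key checks (a minimal sketch; the method and keys are invented):
def create_apple(opts = {})
  common_name = opts.fetch("commonName")    # raises KeyError if the caller omits it
  volume      = opts.fetch("volume", 1)     # falls back to a default
  color       = opts.fetch("color", "red")
  { "commonName" => common_name, "volume" => volume, "color" => color }
end

create_apple("commonName" => "Fuji")
# => {"commonName"=>"Fuji", "volume"=>1, "color"=>"red"}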

Ruby symbols are not garbage collected!? Then, isn't it better to use a String?

If you create 10,000 strings in a loop, a lot of garbage collection has to take place which uses up a lot of resources.
If you do the same thing with symbols, you create objects which cannot be garbage collected.
Which is worse?
If you refer to the same symbol in your loop, then it doesn't have to recreate that object every time, e.g.:
i = 0
while i < 10_000
  i += 1
  :im_using_this_symbol_here
end
Now if you use a string there instead, the string will be recreated 10K times. In general, use symbols in cases where you almost treat the literal like a constant or a key. A very good example for me would be
link_to "News", :action => 'news'
instead of
link_to "News", "action" => 'news'
action being re-used over and over again within your application.
Seeing as symbols are almost always created via literals, there isn't much potential for a memory explosion here. Their behavior is pretty much required by their usage: every time you refer to a symbol, it's the same one.
Strings, conversely, need to be distinct objects in Ruby because they are mutable; that follows from the way they're used: text processing and so on.
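You can see the identity difference directly (assuming the frozen_string_literal pragma is off, which is the default):
:foo.object_id == :foo.object_id     # => true  -- one symbol, everywhere
"foo".object_id == "foo".object_id   # => false -- each literal is a fresh string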
Decide which one to use depending on their semantics, don't optimize prematurely.
If you are using Ruby 2.2.0 or later, it should usually be OK to dynamically create a lot of symbols, because they will be garbage collected according to the Ruby 2.2.0-preview1 announcement, which has a link to more details about the new symbol GC. However, if you pass your dynamic symbols to some kind of code that converts it to an ID (an internal Ruby implementation concept used in the C source code), then in that case it will get pinned and never get garbage collected. I'm not sure how commonly that happens.
When deciding whether to use symbols or strings you should consider:
Symbols cannot be changed after they are created.
Symbols do not have a lot of the methods that strings have, like start_with?
Symbols can very efficiently be compared to each other for equality.
Symbols are supposed to represent the name of something according to the Symbol docs. I wouldn't use them to store anything that couldn't be considered a name.
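A few quick illustrations of those points:
sym = :first_name
sym.frozen?                     # => true -- symbols are immutable
sym.to_s.start_with?("first")   # => true -- convert to a string for string methods
:a == :a                        # => true -- a fast identity check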
