When to use dump vs. generate vs. to_json and load vs. parse in Ruby's JSON lib? - ruby

david4dev's answer to this question claims that there are three equivalent ways to convert an object to a JSON string using the json library:
JSON.dump(object)
JSON.generate(object)
object.to_json
and two equivalent ways to convert a JSON string to an object:
JSON.load(string)
JSON.parse(string)
But looking at the source code, each of them seems to be implemented quite differently, and there are some differences in behavior between them.
What are the differences among them? When to use which?

TL;DR:
In general:
Use to_json (or the equivalent JSON::generate).
Use JSON::parse.
For some special use cases, you may want dump or load, but it's unsafe to use load on data you didn't create yourself.
Extended Explanation:
JSON::dump vs JSON::generate
As part of its argument signature, JSON::generate allows you to set options such as indent levels and whitespace particulars. JSON::dump, on the other hand, calls ::generate within itself, with specific pre-set options, so you lose the ability to set those yourself.
According to the docs, JSON::dump is meant to be part of the Marshal::dump implementation scheme. The main reason you'd want to explicitly use ::dump yourself would be that you are about to stream your JSON data (over a socket for instance), since ::dump allows you to pass an IO-like object as the second argument. Unfortunately, the JSON data being produced is not really streamed as it is produced; it is created en masse and only sent once the JSON is fully created. This makes having an IO argument useful only in trivial cases.
The final difference between the two is that ::dump can also take a limit argument that causes it to raise an ArgumentError when a certain nesting depth is exceeded.
Comparison to #to_json
#to_json accepts options as arguments, so internal implementation aside, JSON::generate(foo, opts) and foo.to_json(opts) are equivalent.
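To make the comparison above concrete, here is a minimal sketch using only the standard json library. The sample hash and the variable names are made up for illustration, and details of ::dump may vary a little between json versions, so treat this as a rough guide rather than a reference.

```ruby
require 'json'
require 'stringio'

# Hypothetical sample data, used only for this illustration.
data = { 'name' => 'Ada', 'tags' => %w[a b] }

# generate and to_json produce the same compact string by default.
compact = JSON.generate(data)
via_to_json = data.to_json

# generate accepts formatting options such as indentation and newlines.
pretty = JSON.generate(data, indent: '  ', space: ' ',
                             object_nl: "\n", array_nl: "\n")

# dump can write to an IO-like object passed as the second argument.
io = StringIO.new
JSON.dump(data, io)
written = io.string

# dump with a depth limit raises ArgumentError once nesting exceeds it.
limit_error =
  begin
    JSON.dump([[[[1]]]], 2)
    nil
  rescue ArgumentError => e
    e
  end
```

Here compact, via_to_json and written all hold the same string, pretty holds a multi-line version of it, and limit_error holds the ArgumentError raised by the depth limit.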
JSON::load vs JSON::parse
Similar to ::dump calling ::generate internally, ::load calls ::parse internally. ::load, like ::dump, may also take an IO object, but again, the source is read all at once, so streaming is limited to trivial cases. However, unlike the ::dump/::generate duality, both ::load and ::parse accept options as part of their argument signatures.
::load can also be passed a proc, which will be called on every Ruby object parsed from the data; it also comes with a warning that ::load should only be used with trusted data. ::parse has no such restriction, and therefore JSON::parse is the correct choice for parsing untrusted data sources like user inputs and files or streams with unknown contents.
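A small sketch of the parsing side, again assuming nothing beyond the standard json library. The JSON text and variable names are invented for the example; it only exercises the harmless parts of ::load (IO input and blank input), not the features that make it risky on untrusted data.

```ruby
require 'json'
require 'stringio'

# Hypothetical input, used only for this illustration.
text = '{"id": 7, "tags": ["x", "y"]}'

# parse takes a String and accepts options such as symbolize_names.
parsed = JSON.parse(text)
symbolized = JSON.parse(text, symbolize_names: true)

# load also accepts an IO-like object, which it reads in full before parsing.
from_io = JSON.load(StringIO.new(text))

# load treats nil as blank input and returns nil; parse raises instead.
blank = JSON.load(nil)
parse_error =
  begin
    JSON.parse(nil)
    nil
  rescue TypeError => e
    e
  end
```

For anything that comes from users, files of unknown origin, or the network, the JSON.parse calls are the ones to copy.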

Related

Practical approach to pretty print vs. string conversion

I'd like to trivially provide a mechanism for logging data using pretty prints rather than plain type->string conversions which doesn't interfere with data transfer through strings.
I can add a type.String() converter method - which will then automatically be used by the fmt library which is generally what is being used for logging output.
However, this is likely to interfere in other domains which use type->string conversion and default to using the .String() mechanic (maybe there is a better standard interface that should be used when "give me this thing as a scannable string" is desired?)
What is the "right Go way" or a practical approach for writing type->string converters which are intended for data I/O - such as HTTP URI params or database I/O etc., vs. pretty print to logs?

Why do format conversion libraries lack a single method to write the output to a file?

From my experience with Ruby, libraries that parse/convert a format (such as YAML, JSON, XML, SASS, etc.) into objects often have a single method that covers from reading the file to parsing, which is usually named like load, load_file, etc. (In addition, they usually have a method that only does parsing on a string that was read in advance, which is usually named like decode, parse. etc.)
On the other hand, when it comes to converting the objects into the target file format, such libraries rarely have a single method that covers from conversion to writing to the destination file. Usually, they only have a single method that does only conversion, which is usually named like encode, render, etc., and the result string has to be written to the file using another method such as File.write.
What is the reason for this asymmetry? Why does writing to a file require an extra step?
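To illustrate the two-step pattern the question describes, here is a minimal sketch using JSON as the format. The file name and the settings hash are hypothetical; the point is only that conversion and file I/O are separate calls in both directions.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical data, used only for this illustration.
settings = { 'theme' => 'dark', 'font_size' => 12 }

restored = Dir.mktmpdir do |dir|
  path = File.join(dir, 'settings.json') # hypothetical file name

  # Writing: convert to a string, then hand the string to File.write.
  File.write(path, JSON.generate(settings))

  # Reading done the same way: read the string, then parse it.
  JSON.parse(File.read(path))
end
```

After this runs, restored is equal to settings, and every file-related option (encoding, permissions, mode) stays on the File call rather than on the library.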
I'd guess that it's because of error handling. Reading a file can go wrong in plenty of ways, but writing a file is even more error-prone. It seems silly for a library whose main purpose is parsing to have to deal with file writing. I don't know why these libraries even include file read & parse methods.
Also, for a library to include these kinds of method is useless as soon as you need to access any of the options of the file writing and reading methods. So then the library includes an options parameter that gets passed to the file method, and now the code is just an unclear mess.
That's my 2¢.

Using Ruby Symbols

First time I tried learning Ruby was 2 years ago, now I have started again. The reason I stopped was because I could not understand the Symbol class. And now I am at the same point again, completely lost in when and why you use Symbols. I have read the other posts on Stackoverflow as well as Googled for several explanations. But I do not understand it yet.
First I thought symbols were just a way to create some sort of "named constant" without having to go through the same process as in, let's say, Java.
:all
instead of making a constant with an arbitrary value public static final int ALL = 8;
However it does not make much sense when you use it in e.g. attr_accessor :first_name etc.
Are Symbols just a lightweight String class? I am having problems understanding how I should interpret, when and how to use symbols both in my own classes and in frameworks.
In short, symbols are lightweight strings, but they are also immutable and (in Ruby versions before 2.2) never garbage-collected.
You should not use them as immutable strings in your data processing tasks (remember, in those versions, once a symbol is created it can't be destroyed). You typically use symbols for naming things.
# typical use cases
# access hash value
user = User.find(params[:id])
# name something
attr_accessor :first_name
# set hash value in opts parameter
db.collection.update(query, update, multi: true, upsert: true)
Let's take the first example, params[:id]. In a moderately big Rails app there may be hundreds or thousands of those scattered around the codebase. If we accessed that value with a string, params["id"], that would mean a new string allocation each time (and that string needs to be collected afterwards). With a symbol, it's actually the same symbol everywhere. Less work for the memory allocator, the garbage collector and even you (: is faster to type than "").
If you have a simple one-word string that appears often in your code and you don't do something funky to it (interpolation, gsub, upcase, etc), then it's likely a good candidate to be a symbol.
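A quick way to see the "same symbol everywhere" point is to compare object identity. This is only a sketch; the variable names are made up, and the strings are built with String.new so that the result doesn't depend on whether string literals happen to be frozen in your setup.

```ruby
# The same symbol literal always refers to one object.
same_symbol = :first_name.equal?(:first_name)

# Two separately built strings are equal in content but distinct objects.
a = String.new('first_name')
b = String.new('first_name')
equal_content = (a == b)
same_string = a.equal?(b)

# Converting a string with the same text yields that one shared symbol.
converted = 'first_name'.to_sym.equal?(:first_name)
```

Here same_symbol and converted are true, equal_content is true, and same_string is false.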
However, does this apply only to text that is used as part of the actual program logic such as naming, not text that you get while actually running the program...such as text from the user/web etc?
I can not think of a single case where I'd want to turn data from user/web to symbol (except for parsing command-line options, maybe). Mainly because of the consequences (once created symbols live forever).
Also, many editors provide different coloring for symbols, to highlight them in the code.
The O'Reilly Ruby Cookbook (p. 15) quotes Jim Weirich as saying:
If the contents (the sequence of characters) of the object are important, use a string.
If the identity of the object is important, use a symbol.
Symbols are generally used as hash keys, because it's the identity of the key that's important. Symbols are also the usual way to name a method when passing messages with methods like Object#send (which accepts a string as well).
A Ruby implementation typically has a table in which it stores the names of all classes, methods and variables. It refers to say a method name by the position in the table, avoiding expensive string comparisons. But you can use this table too and add values to it: symbols.
If you write code that uses strings as identifiers rather than for their textual content, consider symbols. If you write a method that expects an argument to be either 'male' or 'female', consider using :male and :female . Comparing two symbols for equality is faster than strings (that's why symbols make good hash keys).
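Here is a small sketch of the "identity, not content" rule in practice. The title_for method and the person hash are hypothetical, invented just to show symbols as identifiers, as hash keys, and as method names.

```ruby
# Hypothetical method: the argument is an identifier, so symbols fit.
def title_for(gender)
  case gender
  when :male   then 'Mr.'
  when :female then 'Ms.'
  else raise ArgumentError, "unknown gender: #{gender.inspect}"
  end
end

# Symbols as hash keys: the key only names a slot, its text is never edited.
person = { name: 'Alex', gender: :female }
title = title_for(person[:gender])

# Object#send takes the method name as a symbol.
shouted = person[:name].send(:upcase)
```

The name 'Alex' stays a string because its characters matter, while :female, :name and :upcase are symbols because they only identify something.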
Symbols are used for naming things in the language: the names of classes, the names of methods etc.
These are very like strings, except they can never be garbage collected, and testing for equality is optimised to be very quick.
The Java implementation has a very similar thing, except that it is not available for runtime use. What I mean is, when you write java code like obj.someMethod(4), the string 'someMethod' is converted by the compiler into a symbol which is embedded in a lookup table in the .class file. These symbols are like 'special' strings which are not garbage collected, and which are very fast to compare for equality. This is almost identical to Ruby, except that Ruby allows you to create new symbols at runtime, whereas Java only allows it at compile time.
This is just like creating new methods -- Java allows it at compile time; Ruby allows it at runtime.
Since Ruby 2.2, symbols can be garbage collected, so mortal symbols, i.e. those created dynamically, as when we convert a string to a symbol ("mortal".to_sym), get cleaned up from memory.
check this out:
require 'objspace'
ObjectSpace.count_symbols
{
:mortal_dynamic_symbol=>3,
:immortal_dynamic_symbol=>5,
:immortal_static_symbol=>3663,
:immortal_symbol=>3668
}
source: https://www.rubyguides.com/2018/02/ruby-symbols/

Why aren't the arguments to File.new symbols instead of strings?

I was wondering why the people who wrote the File library decided to make the arguments that determine what mode the file is opened in strings instead of symbols.
For example, this is how it is now:
f = File.new('file', 'rw')
But wouldn't it be a better design to do
f = File.new('file', :rw)
or even
f = File.new(:file, :rw)
for example? This seems to be the perfect place to use them since the argument definitely doesn't need to be mutable.
I am interested in knowing why it came out this way.
Update: I just got done reading a related question about symbols vs. strings, and I think the consensus was that symbols are just not as well known as strings, and everyone is used to using strings to index hash tables anyway. However, I don't think it would be valid for the designers of Ruby's standard library to plead ignorance on the subject of symbols, so I don't think that's the reason.
I'm no expert in the history of ruby, but you really have three options when you want parameters to a method: strings, symbols, and static classes.
For example, exception handling. Each exception is actually a type of class Exception.
ArgumentError.is_a? Class
=> true
So you could have each permission for the stream be its own class. But that would require even more classes to be generated for the system.
The thing about symbols is they are never deleted. Every symbol you generate is preserved indefinitely; it's why using the method '.to_sym' lightly is discouraged. It leads to memory leaks.
Strings are just easier to manipulate. If you got the input mode from the user, you would need a '.to_sym' somewhere in your code, or at the very least, a large switch statement. With a string, you can just pass the user input directly to the method (if you were so trusting, of course).
Also, in C, you pass a character to the file i/o method. There are no Chars in ruby, just strings. Seeing as how ruby is built on C, that could be where it comes from.
It is simply a relic from previous languages.
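For reference, here is a minimal sketch of how the mode argument is used in practice. The file name is hypothetical. It shows two of the standard mode strings ('w' and 'r+'), the equivalent integer flag form, and what happens with 'rw', which is not one of the mode strings Ruby recognizes.

```ruby
require 'tmpdir'

result = Dir.mktmpdir do |dir|
  path = File.join(dir, 'example.txt') # hypothetical file name

  # 'w' creates or truncates the file for writing.
  File.open(path, 'w') { |f| f.write('hello') }

  # 'r+' opens an existing file for both reading and writing.
  content = File.open(path, 'r+') { |f| f.read }

  # The same read-write mode expressed as an integer flag.
  via_flags = File.open(path, File::RDWR) { |f| f.read }

  # 'rw' is not a recognized mode string, so Ruby rejects it.
  error =
    begin
      File.new(path, 'rw')
      nil
    rescue ArgumentError => e
      e
    end

  { content: content, via_flags: via_flags, error: error }
end
```

Both reads return 'hello', and result[:error] holds the ArgumentError raised for the invalid mode.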

Hashes vs. Multiple Params?

It is very common in Ruby to see methods that receive a hash of parameters instead of just passing the parameters to the method.
My question is - when do you use parameters for your method and when do you use a parameters hash?
Is it right to say that it is a good practice to use a parameter hash when the method has more than one or two parameters?
I use parameter hashes whenever they represent a set of options that semantically belong together. Any other parameters which are direct (often required) arguments to the function, I pass one by one.
You may want to use a hash when there are many optional params, or when you want to accept arbitrary params, as you can see in many rails's methods.
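As a sketch of the split described above, required arguments stay positional and the optional ones travel together. Both methods below are hypothetical; the second uses keyword arguments, which express the same idea while letting Ruby check the option names for you.

```ruby
# Hypothetical method: one required argument plus an options hash.
def find_users(role, options = {})
  defaults = { limit: 10, active_only: true }
  settings = defaults.merge(options)
  "role=#{role} limit=#{settings[:limit]} active_only=#{settings[:active_only]}"
end

# The same idea with keyword arguments; unknown keys raise ArgumentError.
def find_users_kw(role, limit: 10, active_only: true)
  "role=#{role} limit=#{limit} active_only=#{active_only}"
end

with_hash = find_users(:admin, limit: 5)
with_keywords = find_users_kw(:admin, limit: 5)
```

Both calls read the same at the call site and return the same string. The hash version silently ignores a misspelled option, whereas the keyword version rejects it.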
If you have more than two arguments, you should start thinking about using a hash.
This is good practice, clearly explained in Clean Code.
One obvious use case is overriding a method in a child class: accept the parent method's parameters as a hash, so you can pass them along when you call the parent method.
On another note, and this is not only related to Ruby but to all languages:
In APIs which are in flux, it is sometimes useful to declare some or all parameters to a function as a single parameters object (in Ruby these could be hashes, in C structs, and so on), so as to maintain API stability should the set of accepted arguments change in future versions. However, the obvious downside is that readability is drastically reduced, and I would never use this "pattern" unless I'd really really have to.

Resources