Why do format conversion libraries lack a single method to write the output to a file? - ruby

From my experience with Ruby, libraries that parse/convert a format (such as YAML, JSON, XML, SASS, etc.) into objects often have a single method that covers everything from reading the file to parsing, usually named something like load, load_file, etc. (In addition, they usually have a method that only parses a string that was read in advance, usually named something like decode, parse, etc.)
On the other hand, when it comes to converting the objects back into the target file format, such libraries rarely have a single method that covers everything from conversion to writing to the destination file. Usually they only have a method that does the conversion alone, named something like encode, render, etc., and the resulting string has to be written to the file using another method such as File.write.
What is the reason for this asymmetry? Why does writing to a file require an extra step?
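For example, with the YAML library that ships with Ruby (the file name is just illustrative):
require 'yaml'

# Reading: one method covers both file access and parsing.
config = YAML.load_file('config.yml')

# Writing: the library only converts; writing the file is a separate step.
File.write('config.yml', config.to_yaml)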

I'd guess that it's because of error handling. Reading a file can go wrong in plenty of ways, but writing a file is even more error prone. It seems silly for a library whose main purpose is parsing to have to deal with file writing. I don't know why these libraries even include combined read-and-parse methods.
Also, these kinds of methods become useless as soon as you need to access any of the options of the underlying file reading and writing methods. The library then has to accept an options parameter that it just forwards to the file method, and the code becomes an unclear mess.
That's my 2¢.

Related

goyacc: getting context to the yacc parser / no `%param`

What is the most idiomatic way to get some form of context to the yacc parser in goyacc, i.e. to emulate the %param directive in traditional yacc?
I need to pass some context to my .Parse function (in this case including, for instance, where to build its parse tree).
The goyacc .Parse function is declared
func ($$rcvr *$$ParserImpl) Parse($$lex $$Lexer) int {
Things I've thought of:
$$ParserImpl cannot be changed by the .y file, so the obvious solution (to add fields to it) is right out, which is a pity.
As $$Lexer is an interface, I could stuff the parser context into the Lexer implementation, then type-assert $$lex to that implementation (assuming my parser always used the same lexer), but this seems pretty disgusting (for which read non-idiomatic). Moreover there is (seemingly) no way to put a user-generated line at the top of the Parse function like c := yylex.(*lexer).c, so in the many tens of places I want to refer to this variable, I have to use the rather ugly form yylex.(*lexer).c rather than just c.
Normally I'd use %param in normal yacc / C (well, bison anyway), but that doesn't exist in goyacc.
I'd like to avoid postprocessing my generated .go file with sed or perl for what are hopefully obvious reasons.
I want to be able to (go)yacc parse more than one file at once, so a global variable is not possible (and global variables are hardly idiomatic).
What's the most idiomatic solution here? I keep thinking I must be missing something simple.
My own solution is to modify goyacc (see this PR), adding a %param directive that allows one or more fields to be added to the $$ParserImpl structure (accessible as $$rcvr in code). This seems the most idiomatic route. It permits not only passing context in, but also lets the user add additional func()s with $$ParserImpl as a receiver.

What is the use of the Erlang compile option "-compile({parse_transform, ms_transform})."?

As the title says, could anybody explain the use of parse_transform with ms_transform?
What is the difference between using it and not using it?
The -compile({parse_transform, ms_transform}). syntax invokes a parse transform.
A parse transform is a module which the compiler calls after the file or input has been parsed. The module is called with the full abstract syntax of the whole module and must return new abstract syntax for a whole module. The parse transform is allowed to do whatever it wants as long as the result is legal Erlang syntax. It is like a super macro facility which works on the whole module, not just single function calls. The resulting module is then compiled. You can have many parse transforms.
Parse transforms are typically used to do compile-time evaluation and code transformations. The ets:fun2ms call mentioned by #P_A is a typical example of this as it takes a fun and at compile-time transforms this into a match specification, see Matchspecs and ets:fun2ms. But parse transforms allow you to do much more, for example add and remove functions. An example of this is a parse transform which generates access functions for all the fields in a record.
It is a very powerful tool, but unfortunately easy to get wrong and so create a real mess. There are, however, some 3rd party support tools which can be very helpful.
The ms_transform module implements a parse transform that translates fun syntax into match specifications. For example, ets:fun2ms uses it.
Also you can use
-include_lib("stdlib/include/ms_transform.hrl").

When to use dump vs. generate vs. to_json and load vs. parse in Ruby's JSON lib?

david4dev's answer to this question claims that there are three equivalent ways to convert an object to a JSON string using the json library:
JSON.dump(object)
JSON.generate(object)
object.to_json
and two equivalent ways to convert a JSON string to an object:
JSON.load(string)
JSON.parse(string)
But looking at the source code, each of them seems to be implemented quite differently, and there are some differences in behavior between them (e.g., 1).
What are the differences among them? When to use which?
TL;DR:
In general:
Use to_json (or the equivalent JSON::generate).
Use JSON::parse.
For some special use cases, you may want dump or load, but it's unsafe to use load on data you didn't create yourself.
Extended Explanation:
JSON::dump vs JSON::generate
As part of its argument signature, JSON::generate allows you to set options such as indent levels and whitespace particulars. JSON::dump, on the other hand, calls ::generate within itself, with specific pre-set options, so you lose the ability to set those yourself.
According to the docs, JSON::dump is meant to be part of the Marshal::dump implementation scheme. The main reason you'd want to explicitly use ::dump yourself would be that you are about to stream your JSON data (over a socket, for instance), since ::dump allows you to pass an IO-like object as the second argument. Unfortunately, the JSON is not actually streamed as it is produced; it is built en masse and only written once the whole string is ready. This makes having an IO argument useful only in trivial cases.
The final difference between the two is that ::dump can also take a limit argument that causes it to raise an ArgumentError when a certain nesting depth is exceeded.
Comparison to #to_json
#to_json accepts options as arguments, so internal implementation aside, JSON::generate(foo, opts) and foo.to_json(opts) are equivalent.
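A short sketch of these differences (the object, file name, and option values are illustrative):
require 'json'

obj = { 'name' => 'Ada', 'langs' => ['Ruby', 'C'] }

# ::generate exposes formatting options; ::dump uses fixed ones.
puts JSON.generate(obj, indent: '  ', object_nl: "\n")

# #to_json takes the same options, so this prints the same string.
puts obj.to_json(indent: '  ', object_nl: "\n")

# ::dump can write to an IO-like object passed as the second argument.
File.open('out.json', 'w') { |io| JSON.dump(obj, io) }

# ::dump's third argument is the nesting limit.
begin
  JSON.dump(obj, nil, 1) # the nested array sits at depth 2
rescue ArgumentError => e
  puts e.message
end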
JSON::load vs JSON::parse
Similar to ::dump calling ::generate internally, ::load calls ::parse internally. ::load, like ::dump, may also take an IO object, but again, the source is read all at once, so streaming is limited to trivial cases. However, unlike the ::dump/::generate duality, both ::load and ::parse accept options as part of their argument signatures.
::load can also be passed a proc, which will be called on every Ruby object parsed from the data; it also comes with a warning that ::load should only be used with trusted data. ::parse has no such restriction, and therefore JSON::parse is the correct choice for parsing untrusted data sources like user inputs and files or streams with unknown contents.
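A corresponding sketch for the reading side (the JSON string is illustrative):
require 'json'

json = '{"a": 1, "b": [2, 3]}'

p JSON.parse(json)                        # => {"a"=>1, "b"=>[2, 3]}
p JSON.parse(json, symbolize_names: true) # => {:a=>1, :b=>[2, 3]}

# ::load additionally accepts a proc that is called on every parsed object.
JSON.load(json, ->(obj) { p obj })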

Why aren't the arguments to File.new symbols instead of strings?

I was wondering why the people who wrote the File library decided to make the arguments that determine what mode the file is opened in strings instead of symbols.
For example, this is how it is now:
f = File.new('file', 'r')
But wouldn't it be a better design to do
f = File.new('file', :r)
or even
f = File.new(:file, :r)
for example? This seems to be the perfect place to use them since the argument definitely doesn't need to be mutable.
I am interested in knowing why it came out this way.
Update: I just got done reading a related question about symbols vs. strings, and I think the consensus was that symbols are just not as well known as strings, and everyone is used to using strings to index hash tables anyway. However, I don't think it would be valid for the designers of Ruby's standard library to plead ignorance on the subject of symbols, so I don't think that's the reason.
I'm no expert in the history of ruby, but you really have three options when you want parameters to a method: strings, symbols, and static classes.
For example, exception handling. Each exception is actually a class inheriting from Exception.
ArgumentError.is_a? Class
=> true
So you could have each permission for the stream be its own class. But that would require even more classes to be generated for the system.
The thing about symbols is that they are never deleted. Every symbol you generate is preserved indefinitely, which is why casually using '.to_sym' is discouraged: it can lead to memory leaks. (Dynamic symbols have been garbage collected since Ruby 2.2, but the File API long predates that.)
Strings are just easier to manipulate. If you got the input mode from the user, you would need a '.to_sym' somewhere in your code, or at the very least, a large switch statement. With a string, you can just pass the user input directly to the method (if you were so trusting, of course).
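A sketch of that point (the file name and the way the mode is obtained are hypothetical):
mode = gets.chomp               # the user types e.g. "r+"
f = File.new('some_file', mode) # a string passes straight through
# If modes were symbols, you would need File.new('some_file', mode.to_sym)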
Also, in C, you pass the mode to fopen as a short string such as "r" or "w+" (and Ruby has no separate character type anyway, only strings). Seeing as how Ruby is built on C, that could be where it comes from.
It is simply a relic from previous languages.

Parsing CArchive (MFC classes) files in Ruby

I have a legacy app that seems to be exporting/saving files with CArchive (legacy MFC application).
We're currently refactoring the tool for the web. Is there a library I can look at in Ruby for parsing and loading these legacy files?
What possible libraries could I look into?
Problems with the file format, according to XML serialization for MFC, include:
- Non-robustness: your program will probably crash if you read an archive produced by another version of your program. This can be avoided by complex and unwieldy version management. By using XML, this can be largely avoided.
- Heavy dependencies between your program object model and the archived data. Change the program model and it is almost impossible to read data from a previous version.
- Archived data cannot be edited, understood, or changed, except with the associated application.
Also: four versions of the legacy software exist. How would I be able to overcome this object model / archived data problem across the different versions? Full backward (import) compatibility is required.
CArchive doesn't have a format that you can parse. It's just a binary file. You have to know what is in it to know how to read it. A library could make it easier to read some data types (CString, CArray, etc.) but I'm not sure you'll find anything like this.
CArchive works like this (storing part):
CFile file(_T("legacy.dat"), CFile::modeCreate | CFile::modeWrite); // illustrative file name
CArchive ar(&file, CArchive::store);
int i = 5;
float f = 5.42f;
CString str("string");
ar << i << f << str;
Then all of this is dumped into a binary file. You would have to read the binary data and somehow interpret it. This is easy in C++ because MFC knows how to serialize types, including complex types like CString and CArray. But you'll have to do this on your own in Ruby.
For example, you might read 4 bytes (because you know that an int is that big) and interpret them as an integer, then the next four bytes as the float. Then you have to see how to load the CString: it stores the length first and then the data, but you'll have to take a look at the exact format it uses. You could create utility functions for each type to make your life easier, but don't expect this to be simple.
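In Ruby, that byte-level interpretation could be done with String#unpack. A minimal sketch for the ar << i << f << str example above, assuming a little-endian 32-bit int and float and an ANSI CString short enough that MFC stored its length in a single byte (the length encoding grows for longer and for Unicode strings); the file name is hypothetical:
data = File.binread('legacy.dat')

i   = data[0, 4].unpack1('l<') # 4-byte signed int, little-endian
f   = data[4, 4].unpack1('e')  # 4-byte little-endian float
len = data[8].unpack1('C')     # CString length prefix (1 byte when < 255)
str = data[9, len]             # the raw string bytes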
You could write an exporter in C++ using the old functionality, that would read in the CArchive and then output an xml file or whatever of the contents. Reading CArchives directly from Ruby (or any other language than C++/MFC) is going to be a major project. Maybe you can get away with it if the data that is written is just a struct with a few ints or longs, but as soon as your CArchive contains UDT's you're in for a world of pain. For example I don't even think CArchive makes promises on alignment.
