Syntax when dereferencing database-backed tree - coding-style

I'm using MongoDB, so my clusters of data are in dictionaries. Some of these contain references to other Mongo objects. For example, say I have a Person document which has a separate Employer document. I would like to control element access so I can automatically dereference documents. I also have some data with dates, and since PyMongo can't store timezone info, I'd like to store a string timezone alongside the UTC time and have an accessor to the converted times easily.
Which of these options seems the best to you?
Person = {'employer': ObjectID}
Employer = {'name': str}
Option 1: Augmented operations are methods
Examples
print person.get_employer()['name']
person.get_employer()['name'] = 'Foo'
person.set_employer(new_employer)
Pro: Method syntax makes it clear that getting the employer is not just dictionary access
Con: Differences between the syntaxes between referenced objects and not, making it hard to normalize the schema if necessary. Augmenting an element would require changing the callers
Option 2: Everything is an attribute
Examples
print person.employer.name
person.employer.name = 'Foo'
person.employer = new_employer
Pro: Uniform syntax for augmented and non-augmented
?: Makes it unclear that this is backed by a dictionary, but provides a layer of abstraction?
Con: Requires morphing a dictionary to an object, not pythonic?
Option 3: Everything is a dictionary item
Examples
print person['employer']['name']
person['employer']['name'] = 'Foo'
person['employer'] = new_employer
Pro: Uniform syntax for augmented and non-augmented
?: Makes it unclear that some of these accesses are actually method calls, but provides a layer of abstraction?
Con: Dictionary item syntax is error-prone to type IMHO.

Your first 2 options would require making a "Person" class and an "Employer" class, and using __dict__ to read values and setattr for writing values. This approach will be slower, but will be more flexible (you can add new methods, validation, etc.)
The simplest way would be to use only dictionaries (option 3). It wouldn't require any need for oop. Personally, I also find it to be the most readable of the 3.
So, if I were you, I would use option 3. It is nice and simple, and easy to expand on later if you change your mind. If I had to choose between the first two, I would choose the second (I don't like overusing getters and setters).
P.S. I'd keep away from person.get_employer()['name'] = 'Foo', regardless of what you do.

Do not be afraid to write a custom class when that will make the subsequent code easier to write/read/debug/etc.
Option 1 is good when you're calling something that's slow/intensive/whatever -- and you'll want to save the results so can use option 2 for subsequent access..
Option 2 is your best bet -- less typing, easier to read, create your classes once then instantiate and away you go (no need to morph your dictionary).
Option 3 doesn't really buy you anything over option 2 (besides more typing, plus allowing typos to pass instead of erroring out)

Related

Is there such a thing as ‘class bloat’ - i.e. too many classes causing inefficiencies?

E.g. let’s consider I have the following classes:
Item
ItemProperty which would include objects such as Colour and Size. There's a relation-property of the Item class which lists all of the ItemProperty objects applicable to this Item (i.e. for one item you might need to specify the Colour and for another you might want to specify the Size).
ItemPropertyOption would include objects such as Red, Green (for Colour) and Big, Small (for Size).
Then an Item Object would relate to an ItemProperty, whereas an ItemChoice Object would relate to an ItemPropertyOption (and the ItemProperty which the ItemPropertyOption refers to could be inferred).
The reason for this is so I could then make use of queries much more effectively. i.e. give me all item-choices which are Red. It would also allow me to use the Parse Dashboard to quickly add elements to the site as I could easily specify more ItemPropertys and ItemPropertyOptions, rather than having to add them in the codebase.
This is just a small example and there's many more instances where I'd like to use classes so that 'options' for various drop-downs in forms are in the database and can easily be added and edited by me, rather than hard-coded.
1) I’ll probably be doing this in a similar way for 5+ more similar kinds of class-structures
2) there could be hundreds of nested properties that I want to access via ‘inverse querying’
So, I can think of 2 potential causes of inefficiency and wanted to know if they’re founded:
Is having lots of classes inefficient?
Is back-querying against nested classes inefficient?
The other option I can think of — if ‘class-bloat’ really is a problem — is to make fields on parent classes that, instead of being nested across other classes (that represent further properties, as above), just representing them as a nested JSON property directly.
The job of designing is to render in object descriptions truths about the world that are relevent to the system's requirements. In the world of the OP's "items", it's a fact that items have color, and it's a relevant fact because users care about an item's color. You'd only call a system inefficient if it consumes computing resources that it doesn't need to consume.
So, for something like a configurator, the fact that we have items, and that those items have properties, and those properties have an enumerable set of possible values sounds like a perfectly rational design.
Is it inefficient or "bloated"? The only place I'd raise doubt is in the explicit assertion that items have properties. Of course they do, but that's natively true of javascript objects and parse entities.
In other words, you might be able to get along with just item and several flavors of propertyOptions: e.g. Item has an attribute called "colorProperty" that is a pointer to an instance of "ColorProperty" (whose instances have a name property like 'red', 'green', etc. and maybe describe other pertinent facts, like a more precise description in RGB form).
There's nothing wrong with lots of classes if they represent relevant truth. Do that first. You might discover empirically that your design is too resource consumptive (I doubt you will in this case), at which point we'd start looking for cheats to be somehow skinnier. But do it the right way first, cheat later only if you must.
Is having lots of classes inefficient?
It's certainly inefficient for poor humans who have to remember what all those classes do and how they're related to each other. It takes time to write all those classes in the first place, and every line that you write is a line that has to be maintained.
Beyond that, there's certainly some cost for each class in any OOP language, and creating more classes than you really need will mean that you're paying more than you need to for the work that you're doing, which is pretty much the definition of inefficient.
I’ll probably be doing this in a similar way for 5+ more similar kinds of class-structures
Maybe you could spend some time thinking about the similarity between these cases and come up with a single set of more flexible classes that you can use in all those cases. Writing general code is harder than writing very specific code, but if you do a good job you'll recoup the extra effort many times over through reuse.

Is it good in Rails view to define constant just to be used once?

To a partial in a HAML file, I am passing parameters whose value is long, or methods whose name is long, for example:
"Some quite long string"
quiteLongMethodNameHere(otherConstant)
To make them shorter, I wrapped them in a constant/variable:
- message = "Some quite long string"
- is_important = quiteLongMethodNameHere(otherConstant)
= render :some_component, msg: message, is_important: is_important
Is this a good practice? Or should I just put the value on the param without wrapping it inside variable/constant?
It's a case-by-case decision. You want to balance the sometimes-competing interests of clarity and conciseness. For me, it depends on the expressiveness of both forms. If the long method name is clear, precise, and expressive, then I would be less interested in using an intermediate variable to hold its result than if it were not.
In other cases where the long form is less expressive, I will often use intermediate variables as "living" documentation, even if they are used only on the next line of code. This more explicitly reveals your intention to the reader (who may someone else, or you at some future point in time).
I find intermediate variables are much better than code comments because code comments can more easily become obsolete, and having the clarification in code makes it available for debuggers, etc. The performance hit of creating an extra variable is minimal, and significant in only the most unusual of cases.
Another factor is if you are aggregating things (in arrays, hashes, etc.) that include these function calls and values, then using the intermediate variable makes the code neater, and possibly easier to understand, as you can customize the name to make the most sense in the context of that collection.
Regardless of the length of the string, it makes sense to assign it to a variable/constant, and not directly refer to it in a view file. If it is a text, it makes more sense to put it in a i18n file.
However, it is not good to do that in the main view file. If you are going to do it, do it in the controller file or a helper file.

Why does delete return the deleted element instead of the new array?

In ruby, Array#delete(obj) will search and remove the specified object from the array. However, may be I'm missing something here but I found the returning value --- the obj itself --- is quite strange and a even a little bit useless.
My humble opinion is that in consistent with methods like sort/sort! and map/map! there should be two methods, e.g. delete/delete!, where
ary.delete(obj) -> new array, with obj removed
ary.delete!(obj) -> ary (after removing obj from ary)
For several reasons, first being that current delete is non-pure, and it should warn the programmer about that just like many other methods in Array (in fact the entire delete_??? family has this issue, they are quite dangerous methods!), second being that returning the obj is much less chainable than returning the new array, for example, if delete were like the above one I described, then I can do multiple deletions in one statement, or I can do something else after deletion:
ary = [1,2,2,2,3,3,3,4]
ary.delete(2).delete(3) #=> [1,4], equivalent to "ary - [2,3]"
ary.delete(2).map{|x|x**2"} #=> [1,9,9,9,16]
which is elegant and easy to read.
So I guess my question is: is this a deliberate design out of some reason, or is it just a heritage of the language?
If you already know that delete is always dangerous, there is no need to add a bang ! to further notice that it is dangerous. That is why it does not have it. Other methods like map may or may not be dangerous; that is why they have versions with and without the bang.
As for why it returns the extracted element, it provides access to information that is cumbersome to refer to if it were not designed like that. The original array after modification can easily be referred to by accessing the receiver, but the extracted element is not easily accessible.
Perhaps, you might be comparing this to methods that add elements, like push or unshift. These methods add elements irrespective of what elements the receiver array has, so returning the added element would be always the same as the argument passed, and you know it, so it is not helpful to return the added elements. Therefore, the modified array is returned, which is more helpful. For delete, whether the element is extracted depends on whether the receiver array has it, and you don't know that, so it is useful to have it as a return value.
For anyone who might be asking the same question, I think I understand it a little bit more now so I might as well share my approach to this question.
So the short answer is that ruby is not a language originally designed for functional programming, neither does it put purity of methods to its priority.
On the other hand, for my particular applications described in my question, we do have alternatives. The - method can be used as a pure alternative of delete in most situations, for example, the code in my question can be implemented like this:
ary = [1,2,2,2,3,3,3,4]
ary.-([2]).-([3]) #=> [1,4], or simply ary.-([2,3])
ary.-([2]).map{|x|x**2"} #=> [1,9,9,9,16]
and you can happily get all the benefits from the purity of -. For delete_if, I guess in most situations select (with return value negated) could be a not-so-great pure candidate.
As for why delete family was designed like this, I think it's more of a difference in point of view. They are supposed to be more of shorthands for commonly needed non-pure procedures than to be juxtaposed with functional-flavored select, map, etc.
I’ve wondered some of these same things myself. What I’ve largely concluded is that the method simply has a misleading name that carries with it false expectations. Those false expectations are what trigger our curiosity as to why the method works like it does. Bottom line—I think it’s a super useful method that we wouldn’t be questioning if it had a name like “swipe_at” or “steal_at”.
Anyway, another alternative we have is values_at(*args) which is functionally the opposite of delete_at in that you specify what you want to keep and then you get the modified array (as opposed to specifying what you want to remove and then getting the removed item).

Using Hash Tables as function inputs

I am working in a group that is writing some APIs for tools that we are using in Ruby. When writing API methods, many of my team mates use hash tables as the method's only parameter while I write my methods with each value specified.
For example, a class Apple defined as:
class Apple
#commonName
#volume
#color
end
I would instantiate the class with method:
Apple.new( commonName, volume, color )
My team mates would write it so the method looked like:
Apple.new( {"commonName"=>commonName, "volume"=>volume, "color"=>color )
I don't like using a hash table as the input. To me is seems unnecessarily bulky and doesn't add any clarity to the code. While it doesn't appear to be a big deal in this example, some of our methods have greater than 10 parameters and there will often be hash tables nested in inside other hash tables. I also noticed that using hash tables in this way is extremely uncommon in public APIs(net/telnet is the only exception that I can think of right now).
Question: What arguments could I make to my team members to not use hash tables as input parameters. The bulkiness of the code isn't a sufficient justification(they are not afraid of writing 200-400 character lines) and excessive memory/processing overhead won't work because it won't become an issue with the way our tools will be used.
Actually if your method takes more than 10 arguments, you should either redesign your class or eat dirt and use hashes. For any method that takes more than 4 arguments, using typical arguments can be counter-intuitive while calling the method, because you got to remember the order correctly.
I think best solution would be to simply redesign such methods and use something like builder or fluent patterns.
First of all, you should chide them for using strings instead of symbols for hash keys.
One issue with using a hash is that you then have to check that all the appropriate keys are in it. This makes it useful for optional parameters, but for mandatory one, why not use the built-in functionality of the language? For example, with their method, what happens if I do this:
Apple.new( {"commonName"=>commonName, "volume"=>volume} )
Whereas, with Apple.new(commonName, volume), you know you'll get an ArgumentError.
Named parameters make for more self-documenting code which is nice. But other than that there's not a lot of difference. The Hash allows for more flexibility, especially if you start doing any method aliasing. Also, the various Hash methods in ActiveSupport make setting defaults and verifying inputs pretty painless. I guess this probably wasn't the answer you were looking for.

What is an elegant way to track the size of a set of objects without a single authoritative collection to reference?

Update: Please read this question in the context of design principles, elegance, expression of intent, and especially the "signals" sent to other programmers by design choices.
I have two "views" of a set of objects. One is a dictionary/map indexing the objects by a string value. The other is a dictionary/map indexing the objects by an ordinal (ordering integer). There is no "master" collection of the objects by themselves that can serve as the authoritative source for the number of objects, but the two dictionaries should always both contain references to all the objects.
When a new item is added to the set a reference is added to both dictionaries, and then some processing needs to be done which is affected by the new total number of objects.
What should I use as the authoritative source to reference for the current size of the set of objects? It seems that all my options are flawed in one dimension or another. I can just consistently reference one of the dictionaries, but that would codify an implication of that dictionary's superiority over the other. I could add a 3rd collection, a simple list of the objects to serve as the authoritative list, but that increases redundancy. Storing a running count seems simplest, but also increases redundancy and is more brittle than referencing a collection's self-tracked count on the fly.
Is there another option that will allow me to avoid choosing the lesser evil, or will I have to accept a compromise on elegance?
I would create a class that has (at least) two collections.
A version of the collection that is
sorted by string
A version of the
collection that is sorted by ordinal
(Optional) A master collection
The class would handle the nitty gritty management:
The syncing of the contents for the collections
Standard collection actions (e.g. Allow users get the size, Add or retrieve items)
Let users get by string or ordinal
That way you can use the same collection wherever you need either behavior, but still abstract away the "indexing" behavior you are going for.
The separate class gives you a single interface with which to explain your intent regarding how this class is to be used.
I'd suggest encapsulation: create a class that hides the "management" details (such as the current count) and use it to expose immutable "views" of the two collections.
Clients will ask the "manglement" object for an appropriate reference to one of the collections.
Clients adding a "term" (for lack of a better word) to the collections will do so through the "manglement" object.
This way your assumptions and implementation choices are "hidden" from clients of the service and you can document that the choice of collection for size/count was arbitrary. Future maintainers can change how the count is managed without breaking clients.
BTW, yes, I meant "manglement" - my favorite malapropism for management (in any context!)
If both dictionaries contain references to every object, the count should be the same for both of them, correct? If so, just pick one and be consistent.
I don't think it is a big deal at all. Just reference the sets in the same order each time
you need to get access to them.
If you really are concerned about it you could encapsulate the collections with a wrapper that exposes the public interfaces - like
Add(item)
Count()
This way it will always be consistent and atomic - or at least you could implement it that way.
But, I don't think it is a big deal.

Resources