How to populate a Clojure record from a map? - data-structures

Is there something like struct-map for records? If not, should I use a struct (the docs discourage use of structs)?
Maybe I am doing the wrong thing completely? I have a rather complex function which currently takes a map of options. I am trying to clarify what option values are acceptable/used (by replacing it with a record). And now I want to interface that to code that has this information in maps (and which contain a superset of the data in the record).

It's not recommended to use records simply for "documentation" - plain old maps are more flexible, simpler, and easier. For documentation, you can merely add a docstring, or a comment, or create a function like (defn make-whatever [thing1 thing2]).
If you still want a record, you have a couple choices depending on whether you're using clojure version 1.3 or higher. If so, (defrecord Whatever ...) also defines a map->Whatever function, and a ->Whatever function that takes positional args. If not, you can write (into (Whatever. nil nil nil) some-map) (passing the right number of nils for the record type).

Related

Name a list in Scheme

I'm trying to make an array-like data structure in Scheme, and since I need to refer to it (and alter it!) often, I want to give it a name. But from what I've read on various tutorial sites, it looks like the only way to name the list for later reference is with define. That would be fine, except it also looks like once I initialize a list with define, it becomes more complicated altering or adding to said list. For example, it seems like I wouldn't be able to do just (append wordlist (element)), I'd need some manner of ! bang.
Basically my questions boil down to: Is define my only hope of naming a list? And if so, am I stuck jumping through hoops changing its elements? Thanks.
Yes, define is the way for naming things in Scheme. A normal list in Scheme won't allow you to change its elements, because it's immutable - that's one of the things you'll have to learn to live with when working with a functional data structure. Of course you can add elements to it or remove elements to it, but those operations will produce new lists, you can't change the elements in-place.
The other option is to use mutable lists instead of normal lists, but if you're just learning to use Scheme, it's better to stick to the immutable lists first and learn the Scheme way to do things in terms of immutable data.
Yes, define is the way to do "assignment" (really naming) in Scheme. Though, if you're writing some kind of package, you might consider wrapping the whole thing inside of a function and then using let to define something you refer to.
Then, of course, you have to have some sort of abstraction to unwrap the functions inside of your "package."
See SICP 2.5 Building Systems with Generic Operations
http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-18.html#%_sec_2.5
(append wordlist (element)) is creating a new list. What you might want is to use set! to redirect a reference to the new list, or define a reference to the new list using the same symbol's name.

Syntax when dereferencing database-backed tree

I'm using MongoDB, so my clusters of data are in dictionaries. Some of these contain references to other Mongo objects. For example, say I have a Person document which has a separate Employer document. I would like to control element access so I can automatically dereference documents. I also have some data with dates, and since PyMongo can't store timezone info, I'd like to store a string timezone alongside the UTC time and have an accessor to the converted times easily.
Which of these options seems the best to you?
Person = {'employer': ObjectID}
Employer = {'name': str}
Option 1: Augmented operations are methods
Examples
print person.get_employer()['name']
person.get_employer()['name'] = 'Foo'
person.set_employer(new_employer)
Pro: Method syntax makes it clear that getting the employer is not just dictionary access
Con: Differences between the syntaxes between referenced objects and not, making it hard to normalize the schema if necessary. Augmenting an element would require changing the callers
Option 2: Everything is an attribute
Examples
print person.employer.name
person.employer.name = 'Foo'
person.employer = new_employer
Pro: Uniform syntax for augmented and non-augmented
?: Makes it unclear that this is backed by a dictionary, but provides a layer of abstraction?
Con: Requires morphing a dictionary to an object, not pythonic?
Option 3: Everything is a dictionary item
Examples
print person['employer']['name']
person['employer']['name'] = 'Foo'
person['employer'] = new_employer
Pro: Uniform syntax for augmented and non-augmented
?: Makes it unclear that some of these accesses are actually method calls, but provides a layer of abstraction?
Con: Dictionary item syntax is error-prone to type IMHO.
Your first 2 options would require making a "Person" class and an "Employer" class, and using __dict__ to read values and setattr for writing values. This approach will be slower, but will be more flexible (you can add new methods, validation, etc.)
The simplest way would be to use only dictionaries (option 3). It wouldn't require any need for oop. Personally, I also find it to be the most readable of the 3.
So, if I were you, I would use option 3. It is nice and simple, and easy to expand on later if you change your mind. If I had to choose between the first two, I would choose the second (I don't like overusing getters and setters).
P.S. I'd keep away from person.get_employer()['name'] = 'Foo', regardless of what you do.
Do not be afraid to write a custom class when that will make the subsequent code easier to write/read/debug/etc.
Option 1 is good when you're calling something that's slow/intensive/whatever -- and you'll want to save the results so can use option 2 for subsequent access..
Option 2 is your best bet -- less typing, easier to read, create your classes once then instantiate and away you go (no need to morph your dictionary).
Option 3 doesn't really buy you anything over option 2 (besides more typing, plus allowing typos to pass instead of erroring out)

Hashes vs. Multiple Params?

It is very common in Ruby to see methods that receive a hash of parameters instead of just passing the parameters to the method.
My question is - when do you use parameters for your method and when do you use a parameters hash?
Is it right to say that it is a good practice to use a parameter hash when the method has more than one or two parameters?
I use parameter hashes whenever they represent a set of options that semantically belong together. Any other parameters which are direct (often required) arguments to the function, I pass one by one.
You may want to use a hash when there are many optional params, or when you want to accept arbitrary params, as you can see in many rails's methods.
if you have more than 2 arguements. you should start thinking of using hash.
This is good practise clearly explained in clean code link text
One obvious use case is when you are overriding a method in a child class, you should use hash parameters for the parent method's parameters for when you call it.
On another note, and this is not only related to Ruby but to all languages:
In APIs which are in flux, it is sometimes useful to declare some or all parameters to a function as a single parameters object (in Ruby these could be hashes, in C structs, and so on), so as to maintain API stability should the set of accepted arguments change in future versions. However, the obvious downside is that readability is drastically reduced, and I would never use this "pattern" unless I'd really really have to.

Using Hash Tables as function inputs

I am working in a group that is writing some APIs for tools that we are using in Ruby. When writing API methods, many of my team mates use hash tables as the method's only parameter while I write my methods with each value specified.
For example, a class Apple defined as:
class Apple
#commonName
#volume
#color
end
I would instantiate the class with method:
Apple.new( commonName, volume, color )
My team mates would write it so the method looked like:
Apple.new( {"commonName"=>commonName, "volume"=>volume, "color"=>color )
I don't like using a hash table as the input. To me is seems unnecessarily bulky and doesn't add any clarity to the code. While it doesn't appear to be a big deal in this example, some of our methods have greater than 10 parameters and there will often be hash tables nested in inside other hash tables. I also noticed that using hash tables in this way is extremely uncommon in public APIs(net/telnet is the only exception that I can think of right now).
Question: What arguments could I make to my team members to not use hash tables as input parameters. The bulkiness of the code isn't a sufficient justification(they are not afraid of writing 200-400 character lines) and excessive memory/processing overhead won't work because it won't become an issue with the way our tools will be used.
Actually if your method takes more than 10 arguments, you should either redesign your class or eat dirt and use hashes. For any method that takes more than 4 arguments, using typical arguments can be counter-intuitive while calling the method, because you got to remember the order correctly.
I think best solution would be to simply redesign such methods and use something like builder or fluent patterns.
First of all, you should chide them for using strings instead of symbols for hash keys.
One issue with using a hash is that you then have to check that all the appropriate keys are in it. This makes it useful for optional parameters, but for mandatory one, why not use the built-in functionality of the language? For example, with their method, what happens if I do this:
Apple.new( {"commonName"=>commonName, "volume"=>volume} )
Whereas, with Apple.new(commonName, volume), you know you'll get an ArgumentError.
Named parameters make for more self-documenting code which is nice. But other than that there's not a lot of difference. The Hash allows for more flexibility, especially if you start doing any method aliasing. Also, the various Hash methods in ActiveSupport make setting defaults and verifying inputs pretty painless. I guess this probably wasn't the answer you were looking for.

Return concrete or abstract datatypes?

I'm in the middle of reading Code Complete, and towards the end of the book, in the chapter about refactoring, the author lists a bunch of things you should do to improve the quality of your code while refactoring.
One of his points was to always return as specific types of data as possible, especially when returning collections, iterators etc. So, as I've understood it, instead of returning, say, Collection<String>, you should return HashSet<String>, if you use that data type inside the method.
This confuses me, because it sounds like he's encouraging people to break the rule of information hiding. Now, I understand this when talking about accessors, that's a clear cut case. But, when calculating and mangling data, and the level of abstraction of the method implies no direct data structure, I find it best to return as abstract a datatype as possible, as long as the data doesn't fall apart (I wouldn't return Object instead of Iterable<String>, for example).
So, my question is: is there a deeper philosophy behind Code Complete's advice of always returning as specific a data type as possible, and allow downcasting, instead of maintaining a need-to-know-basis, that I've just not understood?
I think it is simply wrong for the most cases. It has to be:
be as lenient as possible, be as specific as needed
In my opinion, you should always return List rather than LinkedList or ArrayList, because the difference is more an implementation detail and not a semantic one. The guys from the Google collections api for Java taking this one step further: they return (and expect) iterators where that's enough. But, they also recommend to return ImmutableList, -Set, -Map etc. where possible to show the caller he doesn't have to make a defensive copy.
Beside that, I think the performance of the different list implementations isn't the bottleneck for most applications.
Most of the time one should return an interface or perhaps an abstract type that represents the return value being returned. If you are returning a list of X, then use List. This ultimately provides maximum flexibility if the need arises to return the list type.
Maybe later you realise that you want to return a linked list or a readonly list etc. If you put a concrete type your stuck and its a pain to change. Using the interface solves this problem.
#Gishu
If your api requires that clients cast straight away most of the time your design is suckered. Why bother returning X if clients need to cast to Y.
Can't find any evidence to substantiate my claim but the idea/guideline seems to be:
Be as lenient as possible when accepting input. Choose a generalized type over a specialized type. This means clients can use your method with different specialized types. So an IEnumerable or an IList as an input parameter would mean that the method can run off an ArrayList or a ListItemCollection. It maximizes the chance that your method is useful.
Be as strict as possible when returning values. Prefer a specialized type if possible. This means clients do not have to second-guess or jump through hoops to process the return value. Also specialized types have greater functionality. If you choose to return an IList or an IEnumerable, the number of things the caller can do with your return value drastically reduces - e.g. If you return an IList over an ArrayList, to get the number of elements returned - use the Count property, the client must downcast. But then such downcasting defeats the purpose - works today.. won't tomorrow (if you change the Type of returned object). So for all purposes, the client can't get a count of elements easily - leading him to write mundane boilerplate code (in multiple places or as a helper method)
The summary here is it depends on the context (exceptions to most rules). E.g. if the most probable use of your return value is that clients would use the returned list to search for some element, it makes sense to return a List Implementation (type) that supports some kind of search method. Make it as easy as possible for the client to consume the return value.
I could see how, in some cases, having a more specific data type returned could be useful. For example knowing that the return value is a LinkedList rather than just List would allow you to do a delete from the list knowing that it will be efficient.
I think, while designing interfaces, you should design a method to return the as abstract data type as possible. Returning specific type would make the purpose of the method more clear about what they return.
Also, I would understand it in this way:
Return as abstract a data type as possible = return as specific a data type as possible
i.e. when your method is supposed to return any collection data type return collection rather than object.
tell me if i m wrong.
A specific return type is much more valuable because it:
reduces possible performance issues with discovering functionality with casting or reflection
increases code readability
does NOT in fact, expose more than is necessary.
The return type of a function is specifically chosen to cater to ALL of its callers. It is the calling function that should USE the return variable as abstractly as possible, since the calling function knows how the data will be used.
Is it only necessary to traverse the structure? is it necessary to sort the structure? transform it? clone it? These are questions only the caller can answer, and thus can use an abstracted type. The called function MUST provide for all of these cases.
If,in fact, the most specific use case you have right now is Iterable< string >, then that's fine. But more often than not - your callers will eventually need to have more details, so start with a specific return type - it doesn't cost anything.

Resources