Abstracting away from data structure implementation details in Clojure - data-structures

I am developing a complex data structure in Clojure with multiple sub-structures.
I know that I will want to extend this structure over time, and may at times want to change the internal structure without breaking different users of the data structure (for example I may want to change a vector into a hashmap, add some kind of indexing structure for performance reasons, or incorporate a Java type)
My current thinking is:
Define a protocol for the overall structure with various accessor methods
Create a mini-library of functions that navigate the data structure e.g. (query-substructure-abc param1 param2)
Implement the data structure using defrecord or deftype, with the protocol methods defined to use the mini-library
I think this will work, though I'm worried it is starting to look like rather a lot of "glue" code. Also it probably also reflects my greater familiarity with object-oriented approaches.
What is the recommended way to do this in Clojure?

I think that deftype might be the way to go, however I'd take a pass on the accessor methods. Instead, look into clojure.lang.ILookup and clojure.lang.Associative; these are interfaces which, if you implement them for your type, will let you use get / get-in and assoc / assoc-in, making for a far more versatile solution (not only will you be able to change the underlying implementation, but perhaps also to use functions built on top of Clojure's standard collections library to manipulate your structures).
A couple of things to note:
You should probably start with defrecord, using get, assoc & Co. with the standard defrecord implementations of ILookup, Associative, IPersistentMap and java.util.Map. You might be able to go a pretty long way with it.
If/when these are no longer enough, have a look at the sources for emit-defrecord (a private function defined in core_deftype.clj in Clojure's sources). It's pretty complex, but it will give you an idea of what you may need to implement.
Neither deftype nor defrecord currently define any factory functions for you, but you should probably do it yourself. Sanity checking goes inside those functions (and/or the corresponding tests).
The more conceptually complex operations are of course a perfect fit for protocol functions built on the foundation of get & Co.
Oh, and have a look at gvec.clj in Clojure's sources for an example of what some serious data structure code written using deftype might look like. The complexity here is of a different kind from what you describe in the question, but still, it's one of the few examples of custom data structure programming in Clojure currently available for public consumption (and it is of course excellent quality code).
Of course this is just what my intuition tells me at this time. I'm not sure that there is much in the way of established idioms at this stage, what with deftype not actually having been released and all. :-)

Related

How to Work with Ruby Duck Typing

I am learning Ruby and I'm having a major conceptual problem concerning typing. Allow me to detail why I don't understand with paradigm.
Say I am method chaining for concise code as you do in Ruby. I have to precisely know what the return type of each method call in the chain, otherwise I can't know what methods are available on the next link. Do I have to check the method documentation every time?? I'm running into this constantly running tutorial exercises. It seems I'm stuck with a process of reference, infer, run, fail, fix, repeat to get code running rather then knowing precisely what I'm working with during coding. This flies in the face of Ruby's promise of intuitiveness.
Say I am using a third party library, once again I need to know what types are allow to pass on the parameters otherwise I get a failure. I can look at the code but there may or may not be any comments or declaration of what type the method is expecting. I understand you code based on methods are available on an object, not the type. But then I have to be sure whatever I pass as a parameter has all the methods the library is expect, so I still have to do type checking. Do I have to hope and pray everything is documented properly on an interface so I know if I'm expected to give a string, a hash, a class, etc.
If I look at the source of a method I can get a list of methods being called and infer the type expected, but I have to perform analysis.
Ruby and duck typing: design by contract impossible?
The discussions in the preceding stackoverflow question don't really answer anything other than "there are processes you have to follow" and those processes don't seem to be standard, everyone has a different opinion on what process to follow, and the language has zero enforcement. Method Validation? Test-Driven Design? Documented API? Strict Method Naming Conventions? What's the standard and who dictates it? What do I follow? Would these guidelines solve this concern https://stackoverflow.com/questions/616037/ruby-coding-style-guidelines? Is there editors that help?
Conceptually I don't get the advantage either. You need to know what methods are needed for any method called, so regardless you are typing when you code anything. You just aren't informing the language or anyone else explicitly, unless you decide to document it. Then you are stuck doing all type checking at runtime instead of during coding. I've done PHP and Python programming and I don't understand it there either.
What am I missing or not understanding? Please help me understand this paradigm.
This is not a Ruby specific problem, it's the same for all dynamically typed languages.
Usually there are no guidelines for how to document this either (and most of the time not really possible). See for instance map in the ruby documentation
map { |item| block } → new_ary
map → Enumerator
What is item, block and new_ary here and how are they related? There's no way to tell unless you know the implementation or can infer it from the name of the function somehow. Specifying the type is also hard since new_ary depends on what block returns, which in turn depends on the type of item, which could be different for each element in the Array.
A lot of times you also stumble across documentation that says that an argument is of type Object, Which again tells you nothing since everything is an Object.
OCaml has a solution for this, it supports structural typing so a function that needs an object with a property foo that's a String will be inferred to be { foo : String } instead of a concrete type. But OCaml is still statically typed.
Worth noting is that this can be a problem in statically typed lanugages too. Scala has very generic methods on collections which leads to type signatures like ++[B >: A, That](that: GenTraversableOnce[B])(implicit bf: CanBuildFrom[Array[T], B, That]): That for appending two collections.
So most of the time, you will just have to learn this by heart in dynamically typed languages, and perhaps help improve the documentation of libraries you are using.
And this is why I prefer static typing ;)
Edit One thing that might make sense is to do what Scala also does. It doesn't actually show you that type signature for ++ by default, instead it shows ++[B](that: GenTraversableOnce[B]): Array[B] which is not as generic, but probably covers most of the use cases. So for Ruby's map it could have a monomorphic type signature like Array<a> -> (a -> b) -> Array<b>. It's only correct for the cases where the list only contains values of one type and the block only returns elements of one other type, but it's much easier to understand and gives a good overview of what the function does.
Yes, you seem to misunderstand the concept. It's not a replacement for static type checking. It's just different. For example, if you convert objects to json (for rendering them to client), you don't care about actual type of the object, as long as it has #to_json method. In Java, you'd have to create IJsonable interface. In ruby no overhead is needed.
As for knowing what to pass where and what returns what: memorize this or consult docs each time. We all do that.
Just another day, I've seen rails programmer with 6+ years of experience complain on twitter that he can't memorize order of parameters to alias_method: does new name go first or last?
This flies in the face of Ruby's promise of intuitiveness.
Not really. Maybe it's just badly written library. In core ruby everything is quite intuitive, I dare say.
Statically typed languages with their powerful IDEs have a small advantage here, because they can show you documentation right here, very quickly. This is still accessing documentation, though. Only quicker.
Consider that the design choices of strongly typed languages (C++,Java,C#,et al) enforce strict declarations of type passed to methods, and type returned by methods. This is because these languages were designed to validate that arguments are correct (and since these languages are compiled, this work can be done at compile time). But some questions can only be answered at run time, and C++ for example has the RTTI (Run Time Type Interpreter) to examine and enforce type guarantees. But as the developer, you are guided by syntax, semantics and the compiler to produce code that follows these type constraints.
Ruby gives you flexibility to take dynamic argument types, and return dynamic types. This freedom enables you to write more generic code (read Stepanov on the STL and generic programming), and gives you a rich set of introspection methods (is_a?, instance_of?, respond_to?, kind_of?, is_array?, et al) which you can use dynamically. Ruby enables you to write generic methods, but you can also explicity enforce design by contract, and process failure of contract by means chosen.
Yes, you will need to use care when chaining methods together, but learning Ruby is not just a few new keywords. Ruby supports multiple paradigms; you can write procedural, object oriend, generic, and functional programs. The cycle you are in right now will improve quickly as you learn about Ruby.
Perhaps your concern stems from a bias towards strongly typed languages (C++, Java, C#, et al). Duck typing is a different approach. You think differently. Duck typing means that if an object looks like a , behaves like a , then it is a . Everything (almost) is an Object in Ruby, so everything is polymorphic.
Consider templates (C++ has them, C# has them, Java is getting them, C has macros). You build an algorithm, and then have the compiler generate instances for your chosen types. You aren't doing design by contract with generics, but when you recognize their power, you write less code, and produce more.
Some of your other concerns,
third party libraries (gems) are not as hard to use as you fear
Documented API? See Rdoc and http://www.ruby-doc.org/
Rdoc documentation is (usually) provided for libraries
coding guidelines - look at the source for a couple of simple gems for starters
naming conventions - snake case and camel case are both popular
Suggestion - approach an online tutorial with an open mind, do the tutorial (http://rubymonk.com/learning/books/ is good), and you will have more focused questions.

Programming Go, using Unified Modelling Language Diagrams

I have just started using Go (GoLang) and I am finding it a great language. However, after many years of UML and the Object Oriented methods, I find that modelling Go programs (Reverse engineering) is a bit problematic, in that Go Structs contain properties/state, but no methods, and methods/functions that use Structs as parameters (even the ones that do magic so that it makes a Struct look like an object), don't contain methods, or state.
Does this mean I should be using another Methodology to model a Go Program or does UML sufficiently model the language constructs?
Yes I know that if you use methods on the Structs that the behavior of an object in UML can be mapped into Go via a combination of a Struct and a Struct Method, but I am finding this to be wrong, an impedance mismatch in paradigms of sorts.
Is it time for a new (perish the thought!) diagramming technique, for the brave new world where behavior is no longer controlled by an object? can behavior be modeled without reference to the state that it is affecting?
Update:
I am trying Data Flow Diagrams out, to see if they fit better to the paradigm. So far so good, but I think I am going to come unstuck when I model the Methods of a Struct, the compromise in the DFD being that they are treated as Functions. :(
Go supports inheritance!!! arghhh!!! (head is blown clean off.) you can compose a Struct which is made of another Struct, which has methods, that the Sub Struct now inherits...you getting this? my mind is blown. Means that UML IS valid...fully but it feels dirty.
Go Does not support inheritance, it just appears so. :) DFD's it is then!
The fact that methods are declared outside the definition of the struct itself should not be significant; it is just a minor syntax difference. The methods belong to the struct type just as surely as if they were inside the braces. (They are declared outside the braces because methods aren't limited to structs.)
The real potential problem with using UML with Go is that UML is normally used with traditional object-oriented design (i.e. class hierarchies), and Go takes a different approach to inheritance and polymorphism. Maybe UML can be adapted to Go's type system—I'm not familiar enough with UML to say—but your design approach will probably need to change somewhat whether you continue using UML or not.
Go is not intended to be used in the everything-is-an-object style of Smalltalk and Java. Idiomatic Go programs generally contain a large percentage of procedural code. If your design process focuses on object modeling, it will not work well for Go.
UML still gives you tools useful for analysis and design of components, interfaces, and data. Go is not an OO language, so you cannot use inheritance, polymorphism, methods, etc. You don't need a new paradigm, you may need an old one: Structured Analysis and Structured Design.
Go Structs contain properties/state, but no methods, and
methods/functions that use Structs as parameters (even the ones that
do magic so that it makes a Struct look like an object), don't contain
methods, or state.
As you may know, in C++ you can also declare methods on structs - just as in classes with the only difference that you won't be able to use access modifiers.
In OOP languages you declare methods of a class inside the class definition, giving the feeling that these methods are somehow part of the class. This is not true for most languages and Go makes this obvious.
When you declare something like the following pseudo-code in a traditional OOP language:
class Foo {
public function bar(x int) {
// ...
}
}
the linker will export a function that will look something like:
Foo__bar(this Foo, x int)
When you then do (assume f is an instance of Foo):
f.bar(3)
you are in fact (and indirectly, more on that later) doing:
Foo__bar(f, 3)
The class instance itself will only contain a so called vtable with function pointers to the methods it implements and/or inherits.
Additionally, methods do not contain state, at least not in the contemporary programming world.
Does this mean I should be using another Methodology to model a Go
Program or does UML sufficiently model the language constructs?
UML should suffice.
Is it time for a new (perish the thought!) diagramming technique, for
the brave new world where behavior is no longer controlled by an
object?
Naa.
Can behavior be modeled without reference to the state that it is
affecting?
Yes, that's what interfaces are for.
I am trying Data Flow Diagrams out, to see if they fit better to the
paradigm. So far so good, but I think I am going to come unstuck when
I model the Methods of a struct, the compromise in the DFD being that
they are treated as Functions. :(
Do not get lost in abstractions, break them. There is no perfect paradigm and there will never be.
If you like planning and modeling more than programming you should just stick with Java.
If you like building and maintaining the actual code and working systems you should try just planning your Go program on a piece of paper or a whiteboard and get programming.

Typing similar code

I've occasionally found myself in situations where I have to type out redundant code... where only one variable or two will change in each block of code. Usually I'll copy and paste this block and make the necessary changes on each block of code... but is there a better way to handle this?
Heavy use of cut and paste usually means there's something not quite right in the design of the code. Think about how you could refactor such as breaking out the cut/paste functionality into commonly called methods.
Yes. There is always a better way to do it than copy-and-paste. You should always get a little uneasy (kind of like you feel when you're about to give a speech in front of a huge crowd) when you're about to hit "Ctrl-V."
In almost any introductory class you're likely to be using a language that has functions, methods, or sub procedures. (What they're called and what they do depends on the language in question). Any variable that changes needs to be a parameter to that function/method/subprocedure.
When you do that (and the method/function/sub is accessible) you can replace the HUGE chunks of code with a single call to your new m-f-s.
There are a lot of other ways to do this, but when you're just getting started this is probably the way to go.
you have a lot of approaches to this situation. I don't know if you're working with OO or structured programming but you can build methods or functions and give them cohesion and unique responsibilities. I think it's an easy way of thinking...
In the OO paradigm we use some therms on how to avoid this situation: cohesion and low decoupling (you could search for them over the Internet). If you can apply both of them in your code, it will be easier to read and maintain.
That's all

Does Ruby have a Metaobject protocol and if not, is it possible to implement one?

Pardon my ignorance, but What is a Metaobject protocol, and does Ruby have one? If not, is it possible to implement one for Ruby? What features might a Metaobject protocol possess if Ruby was to have one?
What is a Metaobject protocol?
The best description I've come across is from the Class::MOP documentation:
A meta object protocol is an API to an object system.
To be more specific, it abstracts the components of an object system (classes, object, methods, object attributes, etc.). These abstractions can then be used to inspect and manipulate the object system which they describe.
It can be said that there are two MOPs for any object system; the implicit MOP and the explicit MOP. The implicit MOP handles things like method dispatch or inheritance, which happen automatically as part of how the object system works. The explicit MOP typically handles the introspection/reflection features of the object system.
All object systems have implicit MOPs. Without one, they would not work. Explicit MOPs are much less common, and depending on the language can vary from restrictive (Reflection in Java or C#) to wide open (CLOS is a perfect example).
Does Ruby have one?
According to this thread on Reopening builtin classes, redefining builtin functions? Perlmonks article I think the answer is no (at least in the strictest sense of what a MOP is).
Clearly there is some wriggle room here so it might be worth posting a question in the Perl side of SO because the Class::MOP / Moose author does answer questions there.
If you look closer to the definition, youll see that Ruby does have a MOP. Is it like the one in CLOS? No, CLOS is a meta-circular MOP which is great (I'd even say genius), but it's not the one true way, take a look at Smalltalk. To implement a (let's say basic) MOP all you need is to provide functions that allow your runtime to:
Create or delete a new class
Create a new property or method
Cause a class to inherit from a different class ("change the class structure")
Generate or change the code defining the methods of a class.
And Ruby provides a way to do all that.
On a side note: The author of Class::MOP is right (IMHO) when it claims that some of the things you can do with a meta circular MOP can be hard to do in Ruby (DISCLAIMER: I have zero, zilch, nada Perl knowledge, so I'm thinking Smalltalk like MOP vs CLOS like MOP here) but most of them are very specific (I'm thinking about metaclass instantation here) and there are ways to make things work without them. I think it all depends on your point view, meta circular MOPs are cooler but more on the academic side and non meta circular MOPs are more practical and easier to implement.

How does Linq work (behind the scenes)?

I was thinking about making something like Linq for Lua, and I have a general idea how Linq works, but was wondering if there was a good article or if someone could explain how C# makes Linq possible
Note: I mean behind the scenes, like how it generates code bindings and all that, not end user syntax.
It's hard to answer the question because LINQ is so many different things. For instance, sticking to C#, the following things are involved:
Query expressions are "pre-processed" into "C# without query expressions" which is then compiled normally. The query expression part of the spec is really short - it's basically a mechanical translation which doesn't assume anything about the real meaning of the query, beyond "order by is translated into OrderBy/ThenBy/etc".
Delegates are used to represent arbitrary actions with a particular signature, as executable code.
Expression trees are used to represent the same thing, but as data (which can be examined and translated into a different form, e.g. SQL)
Lambda expressions are used to convert source code into either delegates or expression trees.
Extension methods are used by most LINQ providers to chain together static method calls. This allows a simple interface (e.g. IEnumerable<T>) to effectively gain a lot more power.
Anonymous types are used for projections - where you have some disparate collection of data, and you want bits of each of the aspects of that data, an anonymous type allows you to gather them together.
Implicitly typed local variables (var) are used primarily when working with anonymous types, to maintain a statically typed language where you may not be able to "speak" the name of the type explicitly.
Iterator blocks are usually used to implement in-process querying, e.g. for LINQ to Objects.
Type inference is used to make the whole thing a lot smoother - there are a lot of generic methods in LINQ, and without type inference it would be really painful.
Code generation is used to turn a model (e.g. DBML) into code
Partial types are used to provide extensibility to generated code
Attributes are used to provide metadata to LINQ providers
Obviously a lot of these aren't only used by LINQ, but different LINQ technologies will depend on them.
If you can give more indication of what aspects you're interested in, we may be able to provide more detail.
If you're interested in effectively implementing LINQ to Objects, you might be interested in a talk I gave at DDD in Reading a couple of weeks ago - basically implementing as much of LINQ to Objects as possible in an hour. We were far from complete by the end of it, but it should give a pretty good idea of the kind of thing you need to do (and buffering/streaming, iterator blocks, query expression translation etc). The videos aren't up yet (and I haven't put the code up for download yet) but if you're interested, drop me a mail at skeet#pobox.com and I'll let you know when they're up. (I'll probably blog about it too.)
Mono (partially?) implements LINQ, and is opensource. Maybe you could look into their implementation?
Read this article:
Learn how to create custom LINQ providers
Perhaps my LINQ for R6RS Scheme will provide some insights.
It is 100% semantically, and almost 100% syntactically the same as LINQ, with the noted exception of additional sort parameters using 'then' instead of ','.
Some rules/assumptions:
Only dealing with lists, no query providers.
Not lazy, but eager comprehension.
No static types, as Scheme does not use them.
My implementation depends on a few core procedures:
map - used for 'Select'
filter - used for 'Where'
flatten - used for 'SelectMany'
sort - a multi-key sorting procedure
groupby - for grouping constructs
The rest of the structure is all built up using a macro.
Bindings are stored in a list that is tagged with bound identifiers to ensure hygiene. The binding are extracted and rebound locally where ever an expression occurs.
I did track the progress on my blog, that may provide some insight to possible issues.
For design ideas, take a look at c omega, the research project that birthed Linq. Linq is a more pragmatic or watered down version of c omega, depending on your perspective.
Matt Warren's blog has all the answers (and a sample IQueryable provider implementation to give you a headstart):
http://blogs.msdn.com/mattwar/

Resources