Compare two JDOM2 documents for equality - xmlunit

I have an application that generates XML documents using the JDOM2 library. In my unit and integration tests, I need to compare the generated documents against handmade sample documents for equality.
With standard org.w3c.dom.Document objects, I would do that via XMLUnit. How can I do that with JDOM2?

Doing a deep-equals with JDOM is not natively supported in the JDOM API. You will need to build your own. This is a good potential feature for JDOM 2.1.... hmmm. Perhaps I will add something like that (but it will need to be relatively complicated to get right).
Deep equals on two JDOM documents is complicated. You will need to compare namespaces, attributes, comments, processing instructions, etc. Often (some of) these differences are not important - like comments, or white-space differences: perhaps one side has the two text members Text("Hello ") and Text("World!"), and the other may have a single text member Text("Hello World!"). Are they the same?
My suggestion is to use some of the convenience features of JDOM2 (like the getDescendants() iterator) and then do your own logic to compare the two iterators one against the other.
I will consider a native JDOM API mechanism with some sort of callback interface so that a deep compare is possible (probably with something that produces a Comparable-style result: negative, 0, or positive for less-than, equal, or greater-than).
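The "compare the two iterators one against the other" idea can be sketched with a small generic helper. This is only a minimal sketch: the class name LockstepCompare and the method sameSequence are made up for illustration and are not part of JDOM2, and the equality rule is left entirely to the caller.

```java
import java.util.Iterator;
import java.util.function.BiPredicate;

// Hypothetical helper (not a JDOM2 API): walks two iterators in lockstep.
final class LockstepCompare {
    private LockstepCompare() {}

    /**
     * Returns true only if both iterators yield the same number of items
     * and every pair of items at the same position satisfies `same`.
     */
    static <T> boolean sameSequence(Iterator<T> left,
                                    Iterator<T> right,
                                    BiPredicate<T, T> same) {
        while (left.hasNext() && right.hasNext()) {
            if (!same.test(left.next(), right.next())) {
                return false; // first mismatch ends the comparison
            }
        }
        // One side running out before the other is also a difference.
        return !left.hasNext() && !right.hasNext();
    }
}
```

With JDOM2, the two arguments would be the iterators returned by getDescendants() on each document, and the predicate is where you decide which differences matter (namespaces, attributes, comments, and so on). Note that a plain lockstep walk still treats Text("Hello ") followed by Text("World!") as two items, so adjacent text would have to be merged, or whitespace skipped, before or inside the predicate.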

Related

Is there such a thing as ‘class bloat’ - i.e. too many classes causing inefficiencies?

E.g. let’s consider I have the following classes:
Item
ItemProperty which would include objects such as Colour and Size. There's a relation-property of the Item class which lists all of the ItemProperty objects applicable to this Item (i.e. for one item you might need to specify the Colour and for another you might want to specify the Size).
ItemPropertyOption would include objects such as Red, Green (for Colour) and Big, Small (for Size).
Then an Item Object would relate to an ItemProperty, whereas an ItemChoice Object would relate to an ItemPropertyOption (and the ItemProperty which the ItemPropertyOption refers to could be inferred).
The reason for this is so I could then make use of queries much more effectively. i.e. give me all item-choices which are Red. It would also allow me to use the Parse Dashboard to quickly add elements to the site as I could easily specify more ItemPropertys and ItemPropertyOptions, rather than having to add them in the codebase.
This is just a small example and there are many more instances where I'd like to use classes so that 'options' for various drop-downs in forms are in the database and can easily be added and edited by me, rather than hard-coded.
1) I’ll probably be doing this in a similar way for 5+ more similar kinds of class-structures
2) there could be hundreds of nested properties that I want to access via ‘inverse querying’
So, I can think of 2 potential causes of inefficiency and wanted to know if they’re founded:
Is having lots of classes inefficient?
Is back-querying against nested classes inefficient?
The other option I can think of, if ‘class-bloat’ really is a problem, is to stop nesting these properties across other classes (that represent further properties, as above) and instead represent them directly as a nested JSON field on the parent class.
The job of designing is to render in object descriptions truths about the world that are relevant to the system's requirements. In the world of the OP's "items", it's a fact that items have color, and it's a relevant fact because users care about an item's color. You'd only call a system inefficient if it consumes computing resources that it doesn't need to consume.
So, for something like a configurator, the fact that we have items, and that those items have properties, and those properties have an enumerable set of possible values sounds like a perfectly rational design.
Is it inefficient or "bloated"? The only place I'd raise doubt is in the explicit assertion that items have properties. Of course they do, but that's natively true of JavaScript objects and Parse entities.
In other words, you might be able to get along with just item and several flavors of propertyOptions: e.g. Item has an attribute called "colorProperty" that is a pointer to an instance of "ColorProperty" (whose instances have a name property like 'red', 'green', etc. and maybe describe other pertinent facts, like a more precise description in RGB form).
There's nothing wrong with lots of classes if they represent relevant truth. Do that first. You might discover empirically that your design is too resource consumptive (I doubt you will in this case), at which point we'd start looking for cheats to be somehow skinnier. But do it the right way first, cheat later only if you must.
Is having lots of classes inefficient?
It's certainly inefficient for poor humans who have to remember what all those classes do and how they're related to each other. It takes time to write all those classes in the first place, and every line that you write is a line that has to be maintained.
Beyond that, there's certainly some cost for each class in any OOP language, and creating more classes than you really need will mean that you're paying more than you need to for the work that you're doing, which is pretty much the definition of inefficient.
I’ll probably be doing this in a similar way for 5+ more similar kinds of class-structures
Maybe you could spend some time thinking about the similarity between these cases and come up with a single set of more flexible classes that you can use in all those cases. Writing general code is harder than writing very specific code, but if you do a good job you'll recoup the extra effort many times over through reuse.
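As a rough illustration of "a single set of more flexible classes", here is a minimal in-memory sketch in plain Java. It does not use the Parse SDK; the class names (Property, PropertyOption, ItemChoice, Catalog) and the method choicesWith are assumptions loosely based on the names in the question, and the linear scan merely stands in for whatever query the backend would run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generic model: one Property class covers Colour, Size, etc.
final class Property {
    final String name;
    Property(String name) { this.name = name; }
}

// One PropertyOption class covers Red, Green, Big, Small, etc.
final class PropertyOption {
    final Property property;
    final String value;
    PropertyOption(Property property, String value) {
        this.property = property;
        this.value = value;
    }
}

// An item's choice points at an option; the property is inferred from it.
final class ItemChoice {
    final String itemName;
    final PropertyOption option;
    ItemChoice(String itemName, PropertyOption option) {
        this.itemName = itemName;
        this.option = option;
    }
}

final class Catalog {
    private final List<ItemChoice> choices = new ArrayList<>();

    void add(ItemChoice choice) { choices.add(choice); }

    // "Inverse query": all item choices with a given option, e.g. Colour = Red.
    List<ItemChoice> choicesWith(String propertyName, String optionValue) {
        List<ItemChoice> result = new ArrayList<>();
        for (ItemChoice c : choices) {
            if (c.option.property.name.equals(propertyName)
                    && c.option.value.equals(optionValue)) {
                result.add(c);
            }
        }
        return result;
    }
}
```

New properties and options become data rather than new classes, which is what lets the same three classes serve the other "5+ similar kinds of class-structures".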

F# code quotation invocation, performance, and run-time requirements

Here are 4 deeply related questions about F# code quotations -
How do I invoke an F# code quotation?
Will it be invoked in a manner less efficient than if it were just a plain old F# lambda? to what degree?
Will it require run-time support for advanced reflection or code-emitting functionality (which is often absent or prohibited from embedded platforms I am targeting)?
Quotations are just data, so you can potentially "invoke" them in whatever clever way you come up with. For instance, you can simply walk the tree and interpret each node as you go, though that wouldn't perform particularly well if you're trying to use the value many times and it's not a simple value (e.g. if you've quoted a lambda that you want to invoke repeatedly).
If you want something more performant (and also simpler), then you can just use Linq.RuntimeHelpers.LeafExpressionConverter.EvaluateQuotation. This doesn't support all possible quotations (just roughly the set equivalent to C# LINQ expressions), and it's got to do a bit more work to actually generate IL, etc., but this should be more efficient if you're reusing the result. This does its work by first converting the quotation to a C# expression tree and then using the standard Compile function defined there, so it will only work on platforms that support that.
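To make the "walk the tree and interpret each node" option concrete, here is a small analogy in Java rather than F#. It does not use the F# quotation API at all: the node types (Constant, Variable, Add, Multiply) and the Interpreter class are invented for this sketch, standing in for code represented as data.

```java
import java.util.Map;

// Hypothetical expression tree: code represented as plain data.
interface Expr {}

final class Constant implements Expr {
    final int value;
    Constant(int value) { this.value = value; }
}

final class Variable implements Expr {
    final String name;
    Variable(String name) { this.name = name; }
}

final class Add implements Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
}

final class Multiply implements Expr {
    final Expr left, right;
    Multiply(Expr left, Expr right) { this.left = left; this.right = right; }
}

final class Interpreter {
    private Interpreter() {}

    // Recursively evaluates the tree; every call re-traverses every node.
    static int eval(Expr e, Map<String, Integer> env) {
        if (e instanceof Constant) {
            return ((Constant) e).value;
        }
        if (e instanceof Variable) {
            Integer bound = env.get(((Variable) e).name);
            if (bound == null) {
                throw new IllegalArgumentException("unbound variable");
            }
            return bound;
        }
        if (e instanceof Add) {
            Add a = (Add) e;
            return eval(a.left, env) + eval(a.right, env);
        }
        if (e instanceof Multiply) {
            Multiply m = (Multiply) e;
            return eval(m.left, env) * eval(m.right, env);
        }
        throw new IllegalArgumentException("unknown node type");
    }
}
```

The cost profile is the point of the sketch: each evaluation pays for the type tests and recursion again, which is why interpreting is fine for a one-off value but slow for something invoked repeatedly, whereas a compile step pays once up front.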

Are methods such as XContainer.Elements, XContainer.Nodes etc also considered Linq to XML query operators?

The book uses different terms for the Linq-to-XML methods/properties defined in classes such as XObject, XNode and XElement (for example XContainer.Elements, XContainer.Nodes, XObject.Document) and for the extension methods defined in the Extensions class. For the former it uses the term methods, while for the extension methods it uses the term query operators.
Is there a particular reason why the author uses two different terms, or are XContainer.Elements, XContainer.Nodes etc. also considered Linq-to-XML query operators?
Thank you
Ultimately I doubt that these terms are specified anywhere in a particularly definitive way - and I wouldn't worry too much about it.
I wouldn't be surprised to see the author using the terms inconsistently themselves. I'd be even less surprised if that were the case and the author turned out to be me ;)
I'm not sure which book you are referring to, but the Elements/Nodes/etc methods are considered Axis Methods (http://msdn.microsoft.com/en-us/library/bb387099.aspx). I would think the query operators would be things like Select/Where/OrderBy regardless of whether they exist directly on the type in question or if they were static extension methods.

Abstracting away from data structure implementation details in Clojure

I am developing a complex data structure in Clojure with multiple sub-structures.
I know that I will want to extend this structure over time, and may at times want to change the internal structure without breaking different users of the data structure (for example I may want to change a vector into a hashmap, add some kind of indexing structure for performance reasons, or incorporate a Java type)
My current thinking is:
Define a protocol for the overall structure with various accessor methods
Create a mini-library of functions that navigate the data structure e.g. (query-substructure-abc param1 param2)
Implement the data structure using defrecord or deftype, with the protocol methods defined to use the mini-library
I think this will work, though I'm worried it is starting to look like rather a lot of "glue" code. It probably also reflects my greater familiarity with object-oriented approaches.
What is the recommended way to do this in Clojure?
I think that deftype might be the way to go, however I'd take a pass on the accessor methods. Instead, look into clojure.lang.ILookup and clojure.lang.Associative; these are interfaces which, if you implement them for your type, will let you use get / get-in and assoc / assoc-in, making for a far more versatile solution (not only will you be able to change the underlying implementation, but perhaps also to use functions built on top of Clojure's standard collections library to manipulate your structures).
A couple of things to note:
You should probably start with defrecord, using get, assoc & Co. with the standard defrecord implementations of ILookup, Associative, IPersistentMap and java.util.Map. You might be able to go a pretty long way with it.
If/when these are no longer enough, have a look at the sources for emit-defrecord (a private function defined in core_deftype.clj in Clojure's sources). It's pretty complex, but it will give you an idea of what you may need to implement.
Neither deftype nor defrecord currently define any factory functions for you, but you should probably do it yourself. Sanity checking goes inside those functions (and/or the corresponding tests).
The more conceptually complex operations are of course a perfect fit for protocol functions built on the foundation of get & Co.
Oh, and have a look at gvec.clj in Clojure's sources for an example of what some serious data structure code written using deftype might look like. The complexity here is of a different kind from what you describe in the question, but still, it's one of the few examples of custom data structure programming in Clojure currently available for public consumption (and it is of course excellent quality code).
Of course this is just what my intuition tells me at this time. I'm not sure that there is much in the way of established idioms at this stage, what with deftype not actually having been released and all. :-)

How does Linq work (behind the scenes)?

I was thinking about making something like Linq for Lua, and I have a general idea how Linq works, but was wondering if there was a good article or if someone could explain how C# makes Linq possible.
Note: I mean behind the scenes, like how it generates code bindings and all that, not end user syntax.
It's hard to answer the question because LINQ is so many different things. For instance, sticking to C#, the following things are involved:
Query expressions are "pre-processed" into "C# without query expressions" which is then compiled normally. The query expression part of the spec is really short - it's basically a mechanical translation which doesn't assume anything about the real meaning of the query, beyond "order by is translated into OrderBy/ThenBy/etc".
Delegates are used to represent arbitrary actions with a particular signature, as executable code.
Expression trees are used to represent the same thing, but as data (which can be examined and translated into a different form, e.g. SQL).
Lambda expressions are used to convert source code into either delegates or expression trees.
Extension methods are used by most LINQ providers to chain together static method calls. This allows a simple interface (e.g. IEnumerable<T>) to effectively gain a lot more power.
Anonymous types are used for projections - where you have some disparate collection of data, and you want bits of each of the aspects of that data, an anonymous type allows you to gather them together.
Implicitly typed local variables (var) are used primarily when working with anonymous types, to maintain a statically typed language where you may not be able to "speak" the name of the type explicitly.
Iterator blocks are usually used to implement in-process querying, e.g. for LINQ to Objects.
Type inference is used to make the whole thing a lot smoother - there are a lot of generic methods in LINQ, and without type inference it would be really painful.
Code generation is used to turn a model (e.g. DBML) into code.
Partial types are used to provide extensibility to generated code.
Attributes are used to provide metadata to LINQ providers.
Obviously a lot of these aren't only used by LINQ, but different LINQ technologies will depend on them.
If you can give more indication of what aspects you're interested in, we may be able to provide more detail.
If you're interested in effectively implementing LINQ to Objects, you might be interested in a talk I gave at DDD in Reading a couple of weeks ago - basically implementing as much of LINQ to Objects as possible in an hour. We were far from complete by the end of it, but it should give a pretty good idea of the kind of thing you need to do (and buffering/streaming, iterator blocks, query expression translation etc). The videos aren't up yet (and I haven't put the code up for download yet) but if you're interested, drop me a mail at skeet#pobox.com and I'll let you know when they're up. (I'll probably blog about it too.)
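For a feel of what the streaming side of LINQ to Objects involves, here is a minimal sketch of lazy Where and Select operators, written in Java since it has no iterator blocks and the iterator state has to be spelled out by hand. The class LazyQuery and its method names are made up for this example and do not come from the talk or from any library.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical lazy operators: nothing runs until the result is iterated.
final class LazyQuery {
    private LazyQuery() {}

    // Streams only the items that satisfy the predicate.
    static <T> Iterable<T> where(Iterable<T> source, Predicate<T> predicate) {
        return () -> new Iterator<T>() {
            private final Iterator<T> it = source.iterator();
            private T pending;      // next matching item, once found
            private boolean ready;  // true when `pending` holds a match

            @Override public boolean hasNext() {
                // Pull from the source only as far as the next match.
                while (!ready && it.hasNext()) {
                    T candidate = it.next();
                    if (predicate.test(candidate)) {
                        pending = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                T result = pending;
                pending = null;
                return result;
            }
        };
    }

    // Projects each item on demand, one at a time.
    static <T, R> Iterable<R> select(Iterable<T> source, Function<T, R> projection) {
        return () -> new Iterator<R>() {
            private final Iterator<T> it = source.iterator();
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public R next() { return projection.apply(it.next()); }
        };
    }
}
```

A query such as "from n in numbers where n % 2 == 0 select n * 10" then becomes the nested call select(where(numbers, n -> n % 2 == 0), n -> n * 10), which mirrors the mechanical query-expression translation described above. The hand-written state in where is roughly the bookkeeping that a C# iterator block generates for you.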
Mono (partially?) implements LINQ, and is opensource. Maybe you could look into their implementation?
Read this article:
Learn how to create custom LINQ providers
Perhaps my LINQ for R6RS Scheme will provide some insights.
It is 100% semantically, and almost 100% syntactically the same as LINQ, with the noted exception of additional sort parameters using 'then' instead of ','.
Some rules/assumptions:
Only dealing with lists, no query providers.
Not lazy, but eager comprehension.
No static types, as Scheme does not use them.
My implementation depends on a few core procedures:
map - used for 'Select'
filter - used for 'Where'
flatten - used for 'SelectMany'
sort - a multi-key sorting procedure
groupby - for grouping constructs
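The eager, list-only approach can be sketched in Java as well (not Scheme, and not the author's actual code). The class EagerQuery and its three helpers are hypothetical stand-ins for filter, map and groupby; each one builds its whole result list before returning, in contrast to a lazy, streaming design.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical eager counterparts of the core procedures listed above.
final class EagerQuery {
    private EagerQuery() {}

    // 'Where': returns a fully built list of the matching items.
    static <T> List<T> filter(List<T> xs, Predicate<T> keep) {
        List<T> out = new ArrayList<>();
        for (T x : xs) {
            if (keep.test(x)) out.add(x);
        }
        return out;
    }

    // 'Select': returns a fully built list of projected items.
    static <T, R> List<R> map(List<T> xs, Function<T, R> f) {
        List<R> out = new ArrayList<>();
        for (T x : xs) {
            out.add(f.apply(x));
        }
        return out;
    }

    // Grouping: keys appear in the order they are first encountered.
    static <T, K> Map<K, List<T>> groupBy(List<T> xs, Function<T, K> key) {
        Map<K, List<T>> groups = new LinkedHashMap<>();
        for (T x : xs) {
            groups.computeIfAbsent(key.apply(x), k -> new ArrayList<>()).add(x);
        }
        return groups;
    }
}
```

A query like "from w in words where w.length() > 4 group w by w.charAt(0)" would then expand to groupBy(filter(words, w -> w.length() > 4), w -> w.charAt(0)).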
The rest of the structure is all built up using a macro.
Bindings are stored in a list that is tagged with bound identifiers to ensure hygiene. The bindings are extracted and rebound locally wherever an expression occurs.
I tracked the progress on my blog, which may provide some insight into possible issues.
For design ideas, take a look at C omega (Cω), the research project that birthed Linq. Linq is a more pragmatic or watered-down version of C omega, depending on your perspective.
Matt Warren's blog has all the answers (and a sample IQueryable provider implementation to give you a headstart):
http://blogs.msdn.com/mattwar/
