What is the best way to compute the transitive import closure of an XML Schema definition?

I have a set of XML schema definition resources (files). These files contain mutual import and include directives. For a specific task, users will instantiate element definitions from a particular XSD. I would like to provide them with an excerpt that contains only the XSD resources required for that task. This means I need to trace all imports and includes to other resources recursively until the set is closed (a transitive closure, in the Kleene-star sense).
I assume that this is implicitly done when I validate the schemata from the entry point, so there might be a callback that lists all dependencies resolved during the process that I could tap into.
The other solution I see is to use DOM and manually parse each schema for the import and include elements. This seems clunky, however.

I think the most convenient way to do this would be with an XSLT stylesheet to which you provide a list of starting points (URIs, or if you need to be careful about chameleon inclusion, namespace-name/URI pairs), and which then fetches the documents and computes the transitive closure, emitting either a list of URIs (or, again, namespace / URI pairs) or a sequence of XSD schema documents.
XQuery could also be used.
And as you suggest, DOM could also be used, with the host programming language of your choice. (I'd do it in XSLT or XQuery, myself, but that's because I do most of my programming in those languages.) Some validators may provide an API for getting a list of the schema documents consulted, or you may be able to extract that information from a validator's representation of the PSVI; APIs to XSD validation are not standardized.
Note that in the general case you need to watch out for and handle xsd:redefine and xsd:override, not just xsd:include and xsd:import.
And of course, if this is a one-shot task and the number of modules is likely to be less than fifty, it may be faster to do it by hand than by writing a program to do it automatically.
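For what it's worth, the DOM-style approach is only a few dozen lines in most languages. Here is a minimal sketch in Python (standard library only; the entry URI is a placeholder, error handling is omitted, and per the note above it also follows xsd:redefine and xsd:override):

    # Sketch: transitive closure of xs:include / xs:import / xs:redefine /
    # xs:override references, starting from one entry-point URI.
    import xml.etree.ElementTree as ET
    from urllib.parse import urljoin
    from urllib.request import urlopen

    XS = "http://www.w3.org/2001/XMLSchema"
    REF_TAGS = [f"{{{XS}}}{name}"
                for name in ("include", "import", "redefine", "override")]

    def schema_closure(entry_uri):
        seen, todo = set(), [entry_uri]
        while todo:
            uri = todo.pop()
            if uri in seen:
                continue
            seen.add(uri)
            with urlopen(uri) as doc:          # handles file:// and http(s)://
                root = ET.parse(doc).getroot()
            for tag in REF_TAGS:
                for el in root.iter(tag):
                    loc = el.get("schemaLocation")
                    if loc:                    # xs:import may omit schemaLocation
                        todo.append(urljoin(uri, loc))
        return seen

    # Print every schema document reachable from the entry point.
    for uri in sorted(schema_closure("file:///path/to/entry.xsd")):
        print(uri)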

Related

Transitive reference to contained resources

We have a query regarding referencing resources contained in another. Assume this scenario:
ResourceA
|- contains
   |- ResourceB
   |- ResourceC
And ResourceB has a reference to ResourceC.
In the above case, is it mandatory for ResourceA to contain a direct reference to both ResourceB and ResourceC? Or is it sufficient for A to refer to B, as there is already a transitive reference to C (A->B->C)?
Our interpretation of the FHIR spec is the latter (i.e. a transitive reference is sufficient), based on the statement below from https://www.hl7.org/fhir/references.html#contained
A contained resource SHALL only be included in a resource if something
in that resource (potentially another contained resource) has a
reference to it.
However, there is another note in the same documentation which is causing us confusion, mainly because the example only demonstrates a reference from current().
Implementation Note: Contained resources are still a reference rather
than being inlined directly into the element that is the reference
(e.g. "custodian" above) to ensure that a single approach to resolving
resource references can be used. Though direct containment would seem
simpler, it would still be necessary to support internal references
where the same contained resource is referenced more than once. In the
end, all that it would achieve is creating additional options in the
syntax. For users using XPath to process the resource, the following
XPath fragment resolves the internal reference:
ancestor::f:*[not(parent::f:*)]/f:contained/*[@id=substring-after(current()/f:reference/@value, '#')]
Could you please help clarify this. Thanks.
The intention is that what you're doing should be allowed. However, the way the invariant is currently defined, it doesn't work - in either XPath or FHIRPath (the latter being more important). To be honest, we haven't figured out how to fully express it properly because what we're looking for is that there is a reference chain from the containing resource to all contained resources or a reference chain from the contained resources to the containing resource. There's no simple FHIRPath or XPath expression that allows for a recursive checking of this. (XPath's 'ancestor' only works on the instance, not the logical reference hierarchy.) So - what you're doing is technically non-conformant with R4. It (hopefully) won't be non-conformant in R5. And it's certainly in the spirit of what we'd intended to allow in R4...
Your interpretation is definitely correct.
I believe the XPath expression is fine: substring-after(current()/f:reference/@value, '#') extracts the Reference.reference value from the current node and strips the leading #. This is relative to the reference element, not the contained resource or the container. The path ancestor::f:*[not(parent::f:*)] can traverse multiple steps up the ancestors to find the contained resources, so it is indifferent to whether they're in the same resource or in a container. It relies on the constraint that contained resources can't themselves contain more contained resources; otherwise it could be ambiguous.
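To see the ancestor trick in action, here is a small sketch using Python's lxml against a made-up instance (lxml's XPath engine has no current(), so the reference value is extracted in host code first; the match is also adapted to test the contained resource's id element, since FHIR XML serializes the id as an element with a value attribute):

    # Sketch: resolve a #fragment reference to a contained resource.
    from lxml import etree

    NS = {"f": "http://hl7.org/fhir"}

    doc = etree.fromstring(b"""
    <DocumentReference xmlns="http://hl7.org/fhir">
      <contained>
        <Organization>
          <id value="org1"/>
        </Organization>
      </contained>
      <custodian>
        <reference value="#org1"/>
      </custodian>
    </DocumentReference>
    """)

    ref = doc.find("f:custodian/f:reference", NS)
    target_id = ref.get("value").split("#", 1)[1]   # substring-after(., '#')

    # ancestor::f:*[not(parent::f:*)] climbs to the outermost (container)
    # resource, then searches its contained resources for the matching id.
    match = ref.xpath(
        "ancestor::f:*[not(parent::f:*)]/f:contained/*[f:id/@value=$id]",
        namespaces=NS, id=target_id)
    print(etree.QName(match[0]).localname)          # -> Organization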

How to avoid implicit include statements in Rhapsody code generation

I'm creating code for interfaces specified in IBM Rational Rhapsody. Rhapsody implicitly generates include statements for other data types used in my interfaces. But I would like to have more control over the include statements, so I specify them explicitly as text elements in the source artifacts of the component. Therefore I would like to prevent Rhapsody from generating the include statements itself. Is this possible?
If this can be done, it is most likely with Properties. In the features dialog, click on Properties and filter by 'include' to see some likely candidates. Not all of the properties have descriptions of what exactly they do, so good luck.
EDIT:
I spent some time looking through the properties as well and could not find any that do what you want. It seems likely you cannot do this with the basic version of Rhapsody. IBM does license an add-on to customize the code generation, called Rules Composer (I think); this would almost certainly allow you to customize the includes, but at quite a cost.
There are two other possible approaches. Depending on how you are customizing the include statements you may be able to write a simple shell script, perhaps using sed, and then just run that script to update your code every time Rhapsody generates it.
The other approach would be to use the Rhapsody API to create a plugin/tool that iterates through all the interfaces and changes the source artifacts accordingly. I have not tried this method myself but I know my coworkers have used the API to do similar things.
Finally I found the properties that let Rhapsody produce the required output: GenerateImplicitDependencies for several elements and GenerateDeclarationDependency for Type elements. Disabling these will avoid the generation of implicit include statements.

What is the best way to manage a large quantity of constants

I am currently working on a very complex program that processes rows from an input table and has a huge number of possible outcomes for each record. Because of this I have a very large number of constants defined for the outcome messages. There is one success message for the record, but a multitude of possible warnings and errors.
My first thought was to define all of my constants for these messages at the package body level, but then I decided to move each constant to the procedure where it is used. I'm now second-guessing that decision and thinking of moving everything back to the package body level. What is the best way to define this many constants? Ease of maintenance is my ultimate goal for this program, since it is so complex.
I think this is a matter of taste. In my application I put all error codes into an Error-Package. All main and commonly used constants I put into a separate package (without a package body).
Again, a matter of taste, but I tend to put a list of named constants at the package spec level rather than the package body so that they can be referenced by any portion of the application. If I ever want to change the error code that c_err_for_specific_reason_x uses, it becomes a single place to do so.
If I wanted to hide the codes and put them within the body, I would have a get_error_code(p_get_error_name varchar) function that did the translation based on the caller passing a valid constant name.
I've done both on different projects, but tend towards the list over the function most times. I tend to use the function when the data comes from a table-driven source.
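For illustration, a minimal sketch of that function-based approach (the package, names, and codes are all made up):

    CREATE OR REPLACE PACKAGE error_codes AS
      FUNCTION get_error_code(p_error_name IN VARCHAR2) RETURN VARCHAR2;
    END error_codes;
    /
    CREATE OR REPLACE PACKAGE BODY error_codes AS
      -- The constants stay hidden in the body; callers go through the function.
      c_err_duplicate_key CONSTANT VARCHAR2(10) := 'ERR-0001';
      c_err_missing_field CONSTANT VARCHAR2(10) := 'ERR-0002';

      FUNCTION get_error_code(p_error_name IN VARCHAR2) RETURN VARCHAR2 IS
      BEGIN
        RETURN CASE p_error_name
                 WHEN 'DUPLICATE_KEY' THEN c_err_duplicate_key
                 WHEN 'MISSING_FIELD' THEN c_err_missing_field
               END;  -- unknown names fall through to NULL in this sketch
      END get_error_code;
    END error_codes;
    /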
It ... wait for it ... depends!
Since you currently define your constants in the package body, you don't need them to be publicly accessible outside the package. So defining them in a spec really doesn't buy you anything.
Here is the rule I follow: define constants within the smallest scope needed. So if a constant is used only within one procedure, define it in that procedure. If it is used within more than one procedure, define it in the body. If it is used elsewhere by code in other packages (or non-packaged SPs), but only when using that particular package, define it in the spec of that package. If it is used by other code for general use, put it in a separate spec of such general constants.
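A short sketch of those scope levels, with illustrative names (none of them from the original question):

    -- Spec level: public, for constants the rest of the application needs.
    CREATE OR REPLACE PACKAGE app_constants AS
      c_success_msg CONSTANT VARCHAR2(100) := 'Record processed successfully';
    END app_constants;
    /
    CREATE OR REPLACE PACKAGE record_processor AS
      PROCEDURE validate_dates;
    END record_processor;
    /
    CREATE OR REPLACE PACKAGE BODY record_processor AS
      -- Body level: shared by this package's procedures, invisible outside it.
      c_warn_missing_name CONSTANT VARCHAR2(100) := 'Name column is empty';

      PROCEDURE validate_dates IS
        -- Procedure level: used only here, so declared only here.
        c_err_bad_range CONSTANT VARCHAR2(100) := 'End date precedes start date';
      BEGIN
        NULL;  -- validation logic elided
      END validate_dates;
    END record_processor;
    /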

How should I organise structs, variables and interfaces in Go?

I have a codebase where one file contains quite a lot of structs, interfaces, and variables in the same file as functions, and I'm not sure if I need to separate it into files named by kind. For example, accounts.go would become accounts_struct.go and accounts_interface.go, holding the structs and interfaces respectively.
What would be a good approach for the file organisation when you have growing codebase for Structs, Variables and Interfaces?
A good model to check out is the source code of Go itself: http://golang.org/src
(in addition to the official "Effective Go")
You will see that this approach (separating based on language items like struct, interface, ...) is never used.
All the files are based on features, and it is best to use a proximity principle approach, where you can find in the same file the definition of what you are using.
Generally, those features are grouped in one file per package, except for large ones, where one package is composed of many files (net, net/http)
If you want to separate anything, separate the source (xxx.go) from the tests/benchmarks (xxx_test.go)
As Thomas Jay Rush adds in the comments
Sometimes source code is automatically generated -- especially data structure definitions.
If the data structures are in the same file as the hand-wrought code, one must build capacity to preserve the hand-wrought portion in the code-generation phase.
If the data structures are separated in a different file, then inclusion allows one to simply write the data structure file without worry.
Dave Cheney offers an interesting perspective in "Absolute Unit (Test) # LondonGophers" (March 2019)
You should take a broader view of the "unit" under test.
The units are not each internal function you write, but a whole package. Specifically the public API of a package.
Organizing your files to facilitate testing their Public API is a good idea.
accounts_struct_test.go would not, in that regard, make much sense.
See also "How I organize packages in Go" by Bartłomiej Klimczak
Sometimes, a few handlers or repositories are needed.
For example, some information can be stored in a database and then sent via an event to a different part of your platform. Keeping only one repository with a method like saveToDb() isn’t that handy at all.
All elements like that are split by functionality: repository_order.go or service_user.go.
If there are more than 3 types of an object, they are moved to a separate subfolder.
Here is my mental model for designing a package.
a. A package should encompass one idea or concept. http is a concept, http client or http message is not.
b. A file in a package should encompass a set of related types, a good rule of thumb is if two files share the same set of imports, merge them. Using the previous example, http/client.go, http/server.go are a good level of granularity
c. Don't do one file per type, that's not idiomatic Go.
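As a toy illustration of points a-c and the proximity principle above (a hypothetical accounts feature, not from any real codebase), one file carries the type, the interface, and the behaviour together:

    // accounts.go: one file per feature, rather than accounts_struct.go
    // plus accounts_interface.go.
    package accounts

    import "errors"

    // Account is the core type of this feature.
    type Account struct {
        ID      string
        Balance int64
    }

    // Store abstracts persistence; it is declared next to the code that
    // needs it, not in a separate interfaces file.
    type Store interface {
        Load(id string) (Account, error)
        Save(a Account) error
    }

    // Withdraw is part of the package's public API, the "unit" that a test
    // in accounts_test.go would exercise.
    func Withdraw(s Store, id string, amount int64) error {
        a, err := s.Load(id)
        if err != nil {
            return err
        }
        if a.Balance < amount {
            return errors.New("insufficient funds")
        }
        a.Balance -= amount
        return s.Save(a)
    }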

Artifact naming convention

We're doing a big project on OSGi and adding some commons modules. There's some discussion about naming the artifact.
So, one possibility when naming the module is, for example, cmns-definitions (for common definitions); another is cmns-definition; still another is cmns-def. This also has some effect on the package name. Now it's
xx.xxx.xxx.xxx.xxx.commons.definitions; if we changed to cmns-def, it would be xx.xxx.xxx.xxx.xxx.commons.def.
Inside this package will be classes like enums and other definitions to be used throughout the system.
I personally lean toward cmns-definitions, since there's more than one definition inside the package. Other people point out that java.util, for example, doesn't contain only one utility either. Still, java.util reads as an abbreviation to me: it can mean java utility or java utilities. The same goes for commons-lang.
How would you name the package? Why would you choose this name?
cmns-definitions
cmns-definition
cmns-def
Bonus question: How to name something like cmns-exceptions? That's how I name it. Would you name it cmns-xcpt?
EDIT:
I'm throwing in my own thoughts on this in the hope of being either confirmed or contradicted. If you can, please do.
According to what I think, the underlying reason you name something is to make it easier to understand what's inside it. Or, according to Peter Kriens, to make it easy to remember and to be able to automate processes via patterns. Both are valid arguments.
My reasoning is as follows in terms of pattern:
1) When a substantivation occurs and it's well known in the industry, follow it in your naming.
Eg:
"features" is a case on this. We have a module called cmns-features. Does this mean we have many features on this module? No. It means "the module that implements the "features" file from Apache karaf".
"commons" is a substantivation of "common" well-accepted on the industry. It doesn't mean "many common". It means "Common code".
If I see extr-commons as a module name, I know that it contains common code for extr (in this case extraction), for example.
2) When the classes inside the module cooperate to give a distinct, "one and only one" meaning to the whole, use the singular form to name it.
The majority of modules fall in here. If I name something cmns-persistence-jpa, I mean that the classes inside cooperate to provide the JPA implementation of cmns-persistence-api. I don't expect 2 implementations inside it, but rather a myriad of classes that together make up one implementation. Crystal clear to me. No?
3) When a grouping of classes is done with the sole purpose of gathering classes by affinity, but the classes don't cooperate toward a single purpose, use the plural.
Here is the case for example of cmns-definitions (enums used by the whole system).
Alternatively, using an abbreviation circumvents the problem, e.g. cmns-def, which a human reader can still "expand" to cmns-definitions. Many people also use "xxxx-util" to mean xxxx-utilities.
A third option for packing things together is to use a name that is itself a pluralization. The word "api" comes to mind, but any word that pluralizes something would do, like "pack".
Support for case (3) comes from well-known modules like commons-collections (using the plural), commons-dbcp (using an abbreviation), commons-lang (again an abbreviation), and anything that uses "api" to pack classes together by affinity.
From apache:
commons-collections -> many powerful data structures that accelerate development of most significant Java applications
commons-lang -> host of helper utilities for the java.lang API
commons-dbcp -> package of several database connection pools
'it is just a name ...'
I find in my long career that these "just names" can make a tremendous difference in productivity. I do not think it makes a difference whether you use definitions, definition, or def, as long as you're consistent and use patterns in the name that are easy to remember and can be used to automate processes. A build based on a consistent naming scheme is infinitely easier to work with than a build with "nice human display" names that are ad hoc and have no discernible pattern.
If you use patterns, names tend to become shorter. The people working with these names usually spend a lot of time with them, so their readability is not nearly as important as their mnemonic value. It turns out that abbreviations of 3 or 4 characters are surprisingly powerful. One reason they work well is that at that length there is usually only one possible abbreviation, while longer forms have many candidates.
Anyway, the most important part is overall consistency. Good luck.
definitions (or def or definition) is a bad name because it doesn't have any semantic to a reader. You're in an object oriented world (I suppose) - try to follow its conventions and principles. Modules in Maven should be named after the biggest "abstraction" they contain. "Definition" is a form, not a meaning.
Your question is similar to: "Which class name is better FileUtilities or FileUtils". Answer: none.
Basically, what you do with the definitions and exceptions is provide a kind of API for your other modules. So I propose combining the definitions and exceptions, adding the interfaces, and calling the whole thing cmns-api. I normally prefer singular names as they are shorter, but you are free to decide, as it is just a name.
