YAML include automation - include

I use YAML to create and maintain the config for my Python program, and I love it. My config is huge, so I had to split it into smaller, logically distinct files, using the add_constructor() trick with an !include tag.
The config is hierarchical and looks as follows (well, with a lot more levels). Sections have their own YAML files, in a directory structure corresponding to the config hierarchy (i.e. A/B/C.yaml):
A:
  B:
    C: !include A/B/C
The added constructor for the !include tag gets 2 arguments, the loader and the node objects, as usual. I am pretty sure that the path in the node graph can be somehow figured out from the loader and/or node object.
I inspected the node object, but it does not have a path or any other attribute or method that would provide this info. The loader has references to the parser and other methods used in the load process, but I do not yet understand YAML loading well enough to figure out where the path is.
I am lazy and I only want to state a single "!include this" in my config, i.e. without specifying (again) the path in the hierarchy.
A:
  B:
    C: !include this
"this" is to be a sort-of keyword, so that the node value exists and I can leave the option open for specifying a real/different path. My constructor will then check the node value, and if the "this" keyword is found, calculates the path, otherwise uses the node value literally as the path.

I would suggest doing this instead:
!include_dir_hierarchy
path: "."
prefixes: [ Section_, Sub-section_, Sub-sub-section_ ]
Assuming path points to the root of the A/B/C hierarchy, you can then in the constructor walk the referenced path and generate the in-memory hierarchy based on the directory names and the list of prefixes. I deduce from your question that this kind of mapping between the file system path and the YAML hierarchy is possible.
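The directory walk could be sketched roughly as follows. This is only a minimal sketch under assumptions: build_hierarchy and the load_leaf callback are hypothetical names, the prefixes list is left out, and only files ending in .yaml are picked up.

```python
import os


def build_hierarchy(root, load_leaf, suffix=".yaml"):
    """Mirror the directory layout under `root` as nested dicts.

    `load_leaf` is a hypothetical callback that turns one file path into
    a config value (for example by parsing the file as YAML).
    """
    tree = {}
    for entry in sorted(os.listdir(root)):
        full = os.path.join(root, entry)
        if os.path.isdir(full):
            # A sub-directory becomes a nested mapping under its own name.
            tree[entry] = build_hierarchy(full, load_leaf, suffix)
        elif entry.endswith(suffix):
            # C.yaml becomes the key "C" at the current level.
            tree[entry[: -len(suffix)]] = load_leaf(full)
    return tree
```

Inside the constructor registered for !include_dir_hierarchy, root would come from the node's path value, and load_leaf would open each file and parse it with the same loader you already use for !include.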

Related

How to retrieve the absolute path represented by a `Dir` object in Ruby?

I want to write a method that takes a Dir-object as argument and does something with it, and for that I need to know the absolute path represented by this object. How do I retrieve this information?
The solution I came up with is something like File.absolute_path(dir.path) (dir is the Dir-object in question), which doesn't work if dir was instantiated with a relative path and the current working directory is different from the working directory at the time of the instantiation.

In YAML, must a quoted scalar be interpreted by a parser as a string?

I've seen advice around the Internet that if you want a YAML scalar value to be processed as a string, you should quote it:
foo : "2018-04-17"
In the example above, this advice is intended to tell me that the value 2018-04-17 will be processed by any given YAML parser as its native language's string type. For example, SnakeYAML would, if this advice were true, interpret this as a java.lang.String, and not as a java.util.Date. (As it happens, SnakeYAML interprets this as a java.util.Date, quotes or not, which is why I'm asking this question.)
But although this advice may happen to work with any given parser, I can't see where in the YAML 1.2 specification this advice might come from. The closest thing I can find is the following sentence:
YAML allows scalars to be presented in several formats. For example, the integer “11” might also be written as “0xB”. Tags must specify a mechanism for converting the formatted content to a canonical form for use in equality testing. Like node style, the format is a presentation detail and is not reflected in the serialization tree and representation graph.
And this one:
The scalar style is a presentation detail and must not be used to convey content information, with the exception that plain scalars are distinguished for the purpose of tag resolution.
And this one:
Note that resolution must not consider presentation details such as comments, indentation and node style.
Nevertheless, I see lots of YAML documents that rely on the double-quoting-the-value-means-it-will-be-parsed-as-a-string advice, which makes me think I'm misreading something. Is there contention on this subject?
Relevant section from the YAML 1.1 spec (note that SnakeYaml is YAML 1.1 and therefore, the 1.2 spec does not necessarily apply):
It is not required that all the tags of the complete representation be explicitly specified in the character stream. During parsing, nodes that omit the tag are given a non-specific tag: “?” for plain scalars and “!” for all other nodes. [...]
It is recommended that nodes having the “!” non-specific tag should be resolved as “tag:yaml.org,2002:seq”, “tag:yaml.org,2002:map” or “tag:yaml.org,2002:str” depending on the node’s kind. This convention allows the author of a YAML character stream to exert some measure of control over the tag resolution process. By explicitly specifying a plain scalar has the “!” non-specific tag, the node is resolved as a string, as if it was quoted or written in a block style. Note, however, that each application may override this behavior. For example, an application may automatically detect the type of programming language used in source code presented as a non-plain scalar and resolve it accordingly.
So to sum up, a YAML processor is not required to parse quoted scalars as strings, and YAML also does not dictate which native type tag:yaml.org,2002:str maps to. And in fact, most YAML implementations follow only parts of that advice. For example, if you deserialise YAML into a POJO/JavaBean with SnakeYaml, you typically do not use any explicit tags in your YAML, but your mappings are resolved to the corresponding Java classes in the root class' structure, instead of the generic Map which is what this advice suggests (since all mappings without explicit tags get the ! non-specific tag).
Note that this has been changed in YAML 1.2:
During parsing, nodes lacking an explicit tag are given a non-specific tag: “!” for non-plain scalars, and “?” for all other nodes.
That's closer to most implementations, but for example, if you deserialise into a class declared as class Foo { String bar; }, the following will still load, although the quoted key bar is used as a field name and not as a string:
"bar": some value
So the advice for using YAML is to specify the desired structure on the application side – in SnakeYaml, you would set the root class type, and then every value will be mapped to the required type at its point in the hierarchy, as long as it is able to map there, regardless of whether it is quoted or unquoted. In general, it makes more sense for the application to specify which kind of value it expects throughout the hierarchy instead of the YAML author to do that via quoting. This is also conformant with the YAML spec, which says
Resolving the tag of a node must only depend on the following three parameters: (1) the non-specific tag of the node, (2) the path leading from the root to the node, and (3) the content (and hence the kind) of the node.
Resolving a tag is the YAML term for determining the target type. And it is allowed to determine the target type based on its position in the hierarchy: The root type is determined by the fact that the element is the root of the YAML document and in the case of SnakeYaml, may be fed in via the API. All other types are determined by the fact that they are descendants from the root type.
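To illustrate the idea of resolving by position rather than by quoting, here is a toy sketch in Python. It is not SnakeYaml's API: SCHEMA, resolve and the released key are hypothetical names made up for this example.

```python
from datetime import date

# Hypothetical application-side schema: path from the root -> converter.
SCHEMA = {
    ("foo",): str,
    ("released",): date.fromisoformat,
}


def resolve(path, text, quoted, schema=SCHEMA):
    """Pick the native value for a scalar from its path alone.

    `quoted` is accepted but deliberately ignored: the scalar style is
    treated as a presentation detail.
    """
    convert = schema.get(tuple(path), str)  # unknown paths fall back to str
    return convert(text)
```

With this schema, the value at foo stays a string and the value at released becomes a date, whether or not either of them was quoted in the document.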
Final note: If you really really want something to be a string, !!str 2018-04-17 will do since it sets a specific tag for the node.

how to reference a relative file from code and tests

I need to reference patients.json from patients.go, here's the folder structure:
If I do:
filepath.Abs("../../conf/patients.json")
it works for go test ./... but fails for revel run
If I do:
filepath.Abs("conf/patients.json")
the exact opposite happens (revel is fine but tests fail).
Is there a way to correctly reference the file so that it works both for tests and normal program run?
Relative paths are always resolved against a base path, namely the current working directory, so they will always have limitations.
If you can live with always taking care of the proper working directory, you may keep using relative paths.
What I would suggest is to rely not on the working directory but on an explicitly specified base path. This may have a default value hard-coded in your application (which may be the working directory as well), and you should provide several ways to override its value.
Recommended ways to override the base path against which your "relative" paths are resolved:
Command line flag (see flag package)
Environment variable (see os.Getenv())
A config file with a fixed name in the user's home directory (see user.User and user.Current() in the os/user package)
Once you have the base path, you can get the full path by joining the base path and the relative path. For file system paths use filepath.Join() (path.Join() is meant for slash-separated paths such as URLs), e.g.:
// Get base path, from any or from the combination of the above mentioned solutions
base := "/var/myapp"
// Relative path, resource to read/write from:
relf := "conf/patients.json"
// Full path that identifies the resource:
full := filepath.Join(base, relf) // full will be "/var/myapp/conf/patients.json"
I've never used Revel myself but the following looks helpful to me:
http://revel.github.io/docs/godoc/revel.html
revel.BasePath
revel.AppPath
This is not a problem with the path, but a problem with your design.
You should design your code more carefully.
As far as I can tell, you share the same path between your test files and revel run. I guess that you hard-coded the JSON path in your model package, which is not recommended.
A better way is:
Have the model package get the JSON path from a global config, or init the model with the JSON path, like model := NewModel(config_path), so that revel run can init the model with any JSON file you want.
Hard-code "../../conf/patients.json" in your xxxx_test.go.

FHIR StructureDefinition - differential definitions

The DSTU2 May ballot version has a StructureDefinition resource (replaces Profile) that allows for "differential" definition of structures.
It's pretty straightforward to use this to add elements to an existing structure - all elements in the differential are "adds" to the base.
However, how does one modify or reduce an existing profile? More specifically:
How can an element in a base structure be reliably matched to an element in a differential structure so that the differential can modify the base?
I can see two possibilities:
Use Path. It is a required element, and it works for non-sliced elements but not for slices (extensions are always slices).
Use Name. However, it is optional, so if the base didn't name its element, this won't work.
Is there another way?
Working example here: http://hl7.org/fhir/2015May/extensibility-examples.html#1.16.2.1.2
In this example, matching by path would replace any other extension, and name matching won't work because neither element is named. The only option is to treat it as an addition (which luckily is the intent here). But if I wanted to further modify this structure using this one as the base (perhaps to set max="1") I'd be unable to.
Actually, adds aren't adds. Any additions would have to be slices of extension - you can't add new elements in a profile. So (unless you're defining resources - which only HL7 can do), every element you specify in a constraint StructureDefinition must specify a "path" that corresponds to an existing path in the base resource. To constrain an existing element, simply identify that path and assert your constraints. If what you want to constrain can't be identified just by a path (i.e. you want to constrain a slice defined in a parent profile), then you'll need to re-declare the slicing and assert the additional constraints on the relevant slice. Name is used to uniquely identify slices within a profile but isn't (presently) used across profiles.

What is a sensible data-structure for allowing efficient synchronisation between two root paths?

I am working on an application that involves maintaining consistency between two local directories. Specifically, the directories should be identical, with the exception that all files in one of the directories are modified in some particular way (this part is not important to my question).
While running, my application runs two processes that listen for changes occurring under each of the paths, and performs relevant operations to bring them back in sync when necessary.
In terms of my specific question: I'm looking for advice on the trickier situation of when one starts the application. At this point, each process needs to check all files/folders under the path that it is looking after, to see if anything has changed in any way whilst the application was not running. (Let us assume that the application cannot be notified by the OS of anything that happened while it was shut down, and thus will need to directly check every file/folder.)
Each process will have access to (and maintain) a persistent data-structure of all files/folders under its designated path. I was thinking that the following should be held within the data-structure for each of the files and folders:
File/folder name;
File hash (CRC32);
File/folder last-modified date; and
File/folder size.
These pieces of information will obviously help to check for any changes to files/folders, but what is the best way to store them?
It seems to me that one sensible way to approach the situation of an application start is for each process to recursively scan through all files/folders under its designated path, and compare the metadata for each file scanned to the metadata stored in its data-structure. Then the processes should also iterate through the data-structures to look for things that have been removed from the paths. Some cases that may be encountered during this process are:
file modified (file name found in data-structure, but hash differs);
file added (no identical filename or hash found in data-structure);
file renamed (file with same hash exists in data-structure, but not with same filename);
folder added (no folder name in data-structure);
folder removed (folder name in data-structure, but not under path);
folder renamed (tricky one).
So, what's the best data-structure to use for this task? In my head I'm thinking of some form of sorted associative array, e.g., a red-black tree, which stores file and folder objects. Each file object contains name, hash and mod-date attributes, while each folder object contains name and children attributes, where children stores another associative array with everything underneath. Given the path to an arbitrary file, e.g., /foo/bar/file.txt, you begin at the root (foo), check for bar and so on until you get to file.txt's parent object.
Another alternative I can think of is to merely store everything flatly, such that there is one red-black tree where each key is the full path to each file/folder, and the value is the file/folder object. This would probably be quicker for retrieval, but it won't be possible to detect renamed files/folders without iterating through all values anyway, which sounds expensive. In the first approach, it may be the case that identifying a rename would only involve checking a portion of the data-structure rather than all of it.
Sorry the above ideas aren't terribly well thought out. What's the state of the art in this area, and are there any well-trodden approaches to these types of problems?
You're modelling a filesystem, so it's quite natural to use a hierarchical data structure. After all, you don't need to compare the file at dir1\dir2\foo.txt to dir3\bar.txt, right? You didn't mention file moves between directories as something you're tracking.
So, the data structure could be:
interface IFSEntry {
    string name
    datetime creationDate
    pure virtual bool Compare(IFSEntry other)
    pure virtual void UpdateFrom(IFSEntry other)
    pure virtual bool WasRenamed(Dictionary<string,IFSEntry> possibleOriginals, out string oldName)
    ...
}
class File : IFSEntry {
    ...
}
class Directory : IFSEntry {
    private Dictionary<string,IFSEntry> children;
    ...
}
The Directory implementations of UpdateFrom and Compare would recurse down their children.
Detecting file renames would be relatively easy by comparing CRCs. You'd miss files that changed in both places and were renamed. You could add a CRC dictionary to the Directory class if the time to run the comparisons proves a performance problem.
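The per-directory comparison with a CRC lookup could be sketched like this in Python. It is a minimal sketch, not a full sync engine: classify and crc32_of are hypothetical names, each argument is assumed to be a {file name: CRC32} map for one directory, and sub-directories are not handled.

```python
import zlib


def crc32_of(data: bytes) -> int:
    """CRC32 of a file's contents, as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF


def classify(stored, scanned):
    """Compare two {file name: CRC32} maps for a single directory."""
    changes = {"modified": [], "renamed": [], "added": [], "removed": []}

    # Index the names that vanished by CRC, so a rename is one lookup.
    missing_by_crc = {}
    for name, crc in stored.items():
        if name not in scanned:
            missing_by_crc.setdefault(crc, []).append(name)

    for name, crc in sorted(scanned.items()):
        if name in stored:
            if stored[name] != crc:
                changes["modified"].append(name)
        elif missing_by_crc.get(crc):
            # Same content under a new name: treat it as a rename.
            changes["renamed"].append((missing_by_crc[crc].pop(0), name))
        else:
            changes["added"].append(name)

    # Whatever is still unmatched was removed.
    for names in missing_by_crc.values():
        changes["removed"].extend(names)
    changes["removed"].sort()
    return changes
```

A file that was both renamed and modified shows up here as one removal plus one addition, which is the case mentioned above that CRC matching cannot catch.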
For directory moves, if the child files also changed, then you've got a fuzzy logic situation. It would be best to have a merge tool that the user would operate for that situation.
If a file changes in both places, you also need a user-facing merge strategy if conflicting changes occur. I'd argue that is always a good idea, just to let the user eyeball that the document didn't lose coherence.
