Can a YAML anchor be defined elsewhere than its first usage? - yaml

I'm writing a fairly complex YAML document (such as for several Bitbucket Pipelines) and I have a section that is repeated in multiple places. I'd like to use a YAML anchor for modularity, that is, to ensure the section can be easily maintained (without drift or unintended discrepancies between these multiple places).
Does my &anchor have to be defined inline in the first place where the repeated node occurs? Or is it possible to just have a *reference in that location, and to instead define the anchor somewhere separate (i.e., either at the very top or bottom of the YAML file, preferably somehow outside of the node tree for the document, like as for a subroutine called from multiple places inside a main function)? I think this would be more readable in my case, because there would be more obvious symmetry between the lines of config that share the repetition (rather than making the initial instance look completely different from the subsequent instances).

Related

What is the essential difference between Document and Collectiction in YAML syntax?

Warning: This question is a more philosophical question than practical, but I find it well as to be asked and answered in practical contexts (forums like StackOverflow here, instead of the SoftwareEngineering stack-exchange website), due to the native development in the actual use de-facto of YAML and the way the way it's specification has evolved and features have been added to it over time. Let's ask:
As opposed to formats/languages/protocols such as JSON, the YAML format allows you (according to this link, that seems pretty official, or at least accurate and reliable source to understand the YAML specification) to embed multiple 'Documents' within one file/stream, using the three-dashes marking ("---").
If so, it's hard to ignore the fact that the concept/model/idea of 'Document' in YAML, is no longer an external definition, or "meta"-directive that helps the human/parser to organize multiple/distincted documents along each other (similar to the way file-systems defining the concept of "file" to organize different files, but each file in itself - does not necessarily recognize that it's a file, or that it's being part of a file system that wraps it, by definition, AFAIK.
However, when YAML allows for a multi-Document YAML files, that gather collections of Documents in a single YAML file (and perhaps in a way that is similar/analogous to HTTP Pipelining approach of HTTP protocol), the concept/model/idea/goal of Document receives a new, wider definition/character de-facto, as a part of the YAML grammar and it's produces, and not just of the YAML specification as an assistive concept or format description that helps to describe the specification.
If so, being a Document part of the language itself, what is the added value of this data-structure, compared to the existing, familiar and well-used good old data-structure of Collection (array of items)?
I'm asking it, because I've seen in this link (here) some snippet (in the second example), which describes a YAML sequence that is actually a collection of logs. For some reason, the author of the example, chose to prefer to present each log as a separate "Document" (separated with three-dashes), gathered together in the same YAML sequence/file, instead of writing a file that has a "Collection" of logs represented with the data-type of array. Why did he choose to do this? Is his choice fit, correct, ideal?
I can speculate that the added value of the distinction between a Document and a Collection become relevant when using more advanced features of the YAML grammar, such as Anchors, Tags, References. I guess every Document provide a guarantee that all these identifiers will be a unique set, and there is no collision or duplicates among them. Am I right? And if so, is this the only advantage, or maybe there are any more justifications for the existence of these two pretty-similar data structures?
My best for now, is to see Document as a "meta"-Collection, that is more strict, and lack of high-level logic, or as two different layers of collection schemes. Is it correct, accurate way of view?
And even if I am right, why in the above example (of the logs document from the link), when there's no use and not imply or expected to use duplications or collisions or even identifiers/anchors or compound structures at all - the author is still choosing to represent the collection's items as separate documents? Is this just not so successful selection of an example? Or maybe I'm missing something, and this is a redundancy in the specification, or an evolving syntactic-sugar due to practical needs?
Because the example was written on a website that looks serious with official information written by professionals who dealt with the essence of the language and its definition, theory and philosophy behind (as opposed to practical uses in the wild), and also in light of other provided examples I have seen in it and the added value of them being meticulous, I prefer not to assume that the example is just simply imperfect/meticulous/fit, and that there may be a good reason to choose to write it this way over another, in the specific case exampled.
First, let's look at the technical difference between the list of documents in a YAML stream and a YAML sequence (which is a collection of ordered items). For this, I'll discuss YAML tags, which are an advanced feature so I'll provide a quick overview:
YAML nodes can have tags, such as !!str (the official tag for string values) or !dice (a local tag that can be interpreted by your application but is unknown to others). This applies to all nodes: Scalars, mappings and sequences. Nodes that do not have such a tag set in the source will be assigned the non-specific tag ?, except for quoted scalars which get ! instead. These non-specific tags are later resolved to specific tags, thereby defining to which kind of data structure the node will be deserialized into.
YAML implementations in scripting languages, such as PyYAML, usually only implement resolution by looking at the node's value. For example, a scalar node containing true will become a boolean value, 42 will become an integer, and droggeljug will become a string.
YAML implementations for languages with static types, however, do this differently. For example, assume you deserialize your YAML into a Java class
public class Config {
String name;
int count;
}
Assume the YAML is
name: 42
count: five
The 42 will become a String despite the fact that it looks like a number. Likewise, five will generate an error because it is not a number; it won't be deserialized into a string. This means that not the content of the node defines how it will be deserialized, but the path to the node.
What does this have to do with documents? Well, the YAML spec says:
Resolving the tag of a node must only depend on the following three parameters: (1) the non-specific tag of the node, (2) the path leading from the root to the node and (3) the content (and hence the kind) of the node.)
So, the technical difference is: If you put your data into a single document with a collection at the top, the YAML processor is allowed to take into account the position of the data in the top-level collection when resolving a tag. However, when you put your data in different documents, the YAML processor must not depend on the position of the document in the YAML stream for resolving the tag.
What does this mean in practice? It means that YAML documents are structurally disjoint from one another. Whether a YAML document is valid or not must not depend on any preceeding or succeeding documents. Consequentially, even when deserialization runs into a semantic problem (such as with the five above) in one document, a following document may still be deserialized successfully.
The goal of this design is to be able to concatenate arbitrary YAML documents together without altering their semantics: A middleware component may, without understanding the semantics of the YAML documents, collect multiple streams together or split up a single stream. As long as they are syntactically correct, stream splitting and merging are sound operations that do not invalidate a YAML document even if another document is structurally invalid.
This design primary focuses on sending and receiving data over networks. Of course, nowadays, YAML is primarily used as configuration language. This is why this feature is seldom used and of rather little importance.
Edit: (Reply to comment)
What about end-cases like a string-tagged Document starts with a folded-string, making even its following "---" and "..." just a characters of the global string?
That is not the case, see rules l-bare-document and c-forbidden. A line containing un-indented ... not followed by non-whitespace will always end a document if one is open.
Moreover, ... doesn't do anything if no document is open. This ensures that a stream merger can always append ... to a document to ensure that the current document is closed, but no additional one is created.
--- has widely been adopted as separator between YAML documents (and, perhaps more prominently, between YAML front matter and content in tools like Jekyll) where ... would have been more appropriate, particularly in Jekyll. This gives the false impression that --- should be used by tooling to separate documents, when in reality ... is the syntactic element designed for that use-case.

Is there some standard for handling sets of ranges of numbers?

When you open the print dialog of an editor, you typically get a field for specifying which pages you want to print - which can be multiple ranges, e.g.: "5,11,31-33"
Now, there are other scenarios in which this kind of input from a user is relevant - especially in configuration files for sequential or iterative processes where you want to qualify which iterations or elements a certain action or feature should apply to.
However, I'm not aware of a name for this kind of strings; nor of an accepted standard format/convention for them (i.e. can you add spaces? Can you use semicolons instead of commas? Must the ranges be sorted? Are overlaps allowed and are they maintained or discarded? Can ranges use ".." instead of "-"? Can you range down instead of up? etc.).
Is there some such convention or such standard?
My motivation is double: I need to parse such ranges in a piece of code I'm looking at, any I want both to do it correctly (or rather per-convention), and secondly to go look for parsing functionality in existing libraries. Right now I don't even have a name to go on.

How does the sequence of tags like action-state and decision-state influence a Spring Webflow?

I have a generic understanding on how the Spring actions work and the format of a x-webflow.xml file.
Nevertheless I lived under the false impression that the sequence in which elements like <action-state>, <decision-state> and <view-state> are being written in the webflow.xml configuration file is not defining the "actual" flow, and the flow is resulting from the logic in which those elements reference eachother.
Nevertheless I have been shown that the order in which you write the elements in the config file is important. Could you help me with some example of clarification on how the order of the elements affects the webflow (like if you put this element before that one the flow is x, if you switch them the flow is y).
Only for the very first state in the file does the sequence matter. That defines the starting state. All the others are only referenced by other states, so their order doesn't strictly matter.
https://docs.spring.io/spring-webflow/docs/current/reference/html/defining-flows.html#flow-element
The first state defined becomes the flow's starting point.
IMO, it's a good convention to roughly order them in the order they're used, but you can never fully do that if you have more than one path through the flow.
Similarly, it's a good convention, but not necessary to put all <end-state> definitions at the end of the file.

Converting All Blocks to Lines and Text

When I receive a drawing, I wish to remove all definitions from previous drafters, such as blocks, styles, layers, groups, xrefs, etc. in order to retain only primitives: texts, lines and arcs, in summary, a single flat drawing.
This is a very routinary activity, and I've found many dissimilar answers through internet, often involving non-standard, non-canonical, combinations of the following commands:
LAYMRG, PURGE
AUDIT
SELECTSIMILAR
WBLOCK
EXPLODE, XPLODE
DIMSTYLE, BATTMAN
DXFOUT, WMFOUT, DXFIN, WMFIN
BURST
Unfortunately, after applying most them, the result sometimes retain many non-purgable objects, including:
Non-explodable blocks,
Dimensions with their own styles,
Blocks losing their text attributes (by XPLODE),
Changed fonts (by WMFOUT),
Do AutoCAD have some canonical way to do this?
I think it's not so easy. If there is such command, I don't know that, but...
In situation You described, You should attach drawing You get as External reference XRef . In that case, You can make such drawing displayed as darker or lighter, but without so many changes in drawing. Also if You get new version of such file, for example because Architect make some changes, You don't need to do anything, maybe only reload such file and new version is displayed.
You will have two separate files:
base, for example architecture
branch , for example electircal, HVAC, and so on. Your work.
Of corse You can think about some script (scr file of LISP) which will run all commands You want just by run one command. Create such script is not very complicated, but In my opinion it's easy and flexible enought to use XRef.

Drawing a user-defined tree

I am making a pretty abstract tree drawing system, but I am having quite a lot of trouble formalizing all the drawing features it should have. I'd very much appreciate if someone could point me to things to read about this topic, because unfortunately my searches have been in vain.
I am looking for/trying to make a meta-language for displaying trees. In these trees each node is an instance of a user-defined Object which have a user-defined graphical representation.
Each Object is associated with a Name, a graphical representation and has a finite number of childs ( 0+ ), which are only known to be Objects themselves. Object recursion is not allowed.
Each Object may have user-defined Options that are used to trigger conditions which would change their graphical representation ( in user-defined ways ). Some Options are automatically applied, others may require user interaction ( "Would you like this Object to be A or B?" ), thus explaining why Object trees need to be instanced.
Object
Name // The Object Name
Childs // List of Object Childs
ContextName // The Name of the Child within this context
Types // List of Objects' names. This child may be only one of them. Decided by the user during instancing.
Options // List of Options assigned to this child. Some of them may require user interaction, and apply other Options to the Child's childs.
*Priority // This is an integer which is used to decide the order in which childs are drawn.
Symbol Name // The Graphical representation of the Object
Once an Object Tree has been instanced, it has to be drawn without any addictional user input, and this is where I am having some trouble. The instancing of an Object tree assigns to each Object a particular graphical representation ( let's call it Symbol ). The assignment is however not known before the instancing. Different Objects may also have the same Symbol, which may be drawn differently depending on the Object's Options.
Because of this, Symbols must be defined separately from Objects, and must have a series of abstract mechanisms to be able to draw themselves ( and thir assigned childs ) correctly, following the user-specified rules.
Each Symbol is represented by an image ( or no image ) plus a finite number of Attachments. Attachments are relative positions to the Symbol's coordinates which tell the drawing code where to draw the Symbols of the Object's childs. Each one of them may have particular conditions to be used ( e.g. this Attachment may only be used by a Symbol that has a particular Option, or if N Symbols have already been drawn, no collisions with already drawn Symbols etc etc ).
The algorithm has to find a free Attachment for each Object's child, following the order specified by their Priority. If it is not possible to find an Attachment for a Child the user may specify beforehand rules that allow some automatic retries, but if they also fail then the whole tree drawing fails. Some of these rules allow for adding addictional child Symbols and/or assigning child Symbols to other childs ( making them grandChildren ) etc.
Symbol
Name
Main Image // Image Path, Height, Width
Attachments // List of the attachments, their position, requirements and addictional infos
Fail Rules // List of actions to do if it is not possible to successfully assign each Child to an Attachment
My main problem is that the number of variables that a Symbol should be able to access is pretty high. Each Symbol, which I'll again remind should be defined using this meta-language, should be able to access its child Symbols' informations ( not others to avoid deadlocks and circular referencing ): for example the user may want the heigth and width of a Symbol to be equal to the sum of the heigth and width of all the Child's Symbols, or to use the same picture, and so on. This is also caused by the fact that the user writes Symbols' rules independently from the final structure.
At the same time, since the tree must be drawn from top to bottom, some of these informations may not be available from the start, and may require a great deal of backtracking.
Also, since all of this has to be defined within a meta-language which I have to be able formalize and parse, I have to define which are the functions that the meta-language requires to allow the maximum grade of freedom to the language-writing user without being overly complex ( this is a vague limit, but essentially I don't want to have Tikz as a subset of my meta-language ). I am having however quite a bit of trouble identifying them.
As I said before, I am looking for informations about this kind of topic and/or methods for completing a project like this. Once I'll be able to fully complete the meta-language I think I won't have too much trouble implementing the code to do all of this, my problems are for the most part theoretical.
I have done a few similar projects with hierarchical data. I point you to where I started:
Joe Celko is the king of tree data. I recommend you start with his book. It is a mix of logic and business case. Trees and Hierarchies in SQL for Smarties even has a new edition out. There is a language for describing the hierarchies there, too.
I have used Oracle for storing my hierarchies which has a very efficient system for pulling and storing tree data. Look up "connect by" either in documentation or in the book: Mastering Oracle SQL by Mishra and Beaulieu.
You can use pointers to pull the images from the server so you wouldn't store them in the database. I have built several systems that use hierarchical displays of data with graphical objects, it keeps the overhead down this way. DevExpress and Telerik both have great viewers for displaying the trees and I have mine build the next levels dynamically. It doesn't know how many or what the next level is going to be until it is drilled down on. Try these examples and read the docs and you will be able to put this together in not time.
For telerik this link will show you multiple load on demand views: http://demos.telerik.com/aspnet-ajax/treeview/examples/programming/loadondemandmodes/defaultcs.aspx
For Devexpress: http://demos.devexpress.com/ASPxTreeListDemos/Data/VirtualMode.aspx
Think in HTML/DOM.
I was surprised, when I found, that the file format of the outliner I am using, NoteCase, is plain HTML. NoteCase can be found here: http://notecase.sourceforge.net/index1.html
If you don't familiar with it, outliner is an application type, which you can organize mainly text nodes in hierarchical tree. There are task outliners, too. When an outliner has graphical representation, it's called mind mapping. Anyway, the directory structure of a filesystem is an outline, too. There are lot of outliners for various areas. See Wikipedia for more details.
Notecase uses DL/DT/DD: DL is the list, DT is the item, and DD is the description of an item. They can be nested, of course.
If the format is HTML, you need only a CSS to show it in a browser easy readable for human eyes.
If you have additional properties, you can define additional tags or attributes, which browser will not show, but your renderer can.
You should write a converter, which transforms your source HTML file format to a more detailed HTML format, which contains computed fields (e.g. which sums of values from the sub-nodes, or replaces "inherit" marks in a sub-node with the inherited value from parent node), some additional formatting, or you can transform attributes into HTML nodes:
<node type="x" size="y" />
to
<div class="node">
<div class="param"> type: x </div>
<div class="param"> size: y </div>
</div>
Your data representation is a kinda DOM, and you should process it similar way. First, parse and read values from the file. Then, you should run some additional rounds (walk the tree) to fill missing values with defaults, calculate inherited and summarized values etc.
I think, you can't use standard DOM parsers, because you mentioned custom sorting of stuff depending on parameters, which DOM modell doesn't really support.
Don't afraid to walk the object tree just as many passes as many operation you want to perform on it. You can play with changing the order of the passes, enabling and disabling passes... as your have more and more features, it will articulated as new processing passes.
You may have passes, which must be run several times, e.g. if one pass can't calculate a value (because it's source should be calculated first), it may return a flag that "I've not done yet", and it should run again on the tree, until it results "no change mades, I've done".
I hope I've push you a bit.

Resources