Drawing a user-defined tree - algorithm

I am making a pretty abstract tree drawing system, but I am having quite a lot of trouble formalizing all the drawing features it should have. I'd very much appreciate it if someone could point me to things to read about this topic, because unfortunately my searches have been in vain.
I am looking for (or trying to make) a meta-language for displaying trees. In these trees each node is an instance of a user-defined Object, which has a user-defined graphical representation.
Each Object is associated with a Name and a graphical representation, and has a finite number of children (zero or more), which are only known to be Objects themselves. Object recursion is not allowed.
Each Object may have user-defined Options that are used to trigger conditions which change its graphical representation (in user-defined ways). Some Options are applied automatically; others may require user interaction ("Would you like this Object to be A or B?"), which is why Object trees need to be instanced.
Object
    Name // The Object Name
    Childs // List of the Object's children
        ContextName // The Name of the child within this context
        Types // List of Object names. This child may be only one of them, decided by the user during instancing.
        Options // List of Options assigned to this child. Some of them may require user interaction, and may apply other Options to the child's children.
        *Priority // An integer used to decide the order in which children are drawn.
    Symbol Name // The graphical representation of the Object
Once an Object tree has been instanced, it has to be drawn without any additional user input, and this is where I am having some trouble. Instancing an Object tree assigns each Object a particular graphical representation (let's call it a Symbol). However, this assignment is not known before instancing. Different Objects may also have the same Symbol, which may be drawn differently depending on the Object's Options.
Because of this, Symbols must be defined separately from Objects, and must have a set of abstract mechanisms that let them draw themselves (and their assigned children) correctly, following the user-specified rules.
Each Symbol is represented by an image (or no image) plus a finite number of Attachments. Attachments are positions relative to the Symbol's coordinates that tell the drawing code where to draw the Symbols of the Object's children. Each of them may have particular conditions for its use (e.g. this Attachment may only be used by a Symbol that has a particular Option, or only if N Symbols have already been drawn, or only if it causes no collisions with already drawn Symbols, etc.).
The algorithm has to find a free Attachment for each of the Object's children, following the order specified by their Priority. If it is not possible to find an Attachment for a child, the user may specify rules beforehand that allow some automatic retries, but if these also fail, the whole tree drawing fails. Some of these rules allow adding additional child Symbols and/or assigning child Symbols to other children (making them grandchildren), etc.
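The assignment step described above can be sketched as a greedy search. This is a minimal Python sketch under several assumptions: the names Child, Attachment and assign_attachments are hypothetical, a lower Priority value is assumed to mean "placed first", and an Attachment's conditions are modeled as a plain callback that sees the child and how many children are already placed.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Child:
    name: str
    priority: int  # assumption: lower value = placed first
    options: set = field(default_factory=set)


@dataclass
class Attachment:
    x: float  # position relative to the parent Symbol's origin
    y: float
    # Hypothetical condition hook: gets the child and the number of children
    # already placed, and returns True if this attachment may be used.
    condition: Callable[[Child, int], bool] = lambda child, placed: True


def assign_attachments(children, attachments) -> Optional[dict]:
    """Greedily map each child to the first free attachment whose condition holds.

    Children are visited in Priority order. Returns {child name: attachment index},
    or None when some child cannot be placed (the point where Fail Rules would run).
    """
    used = set()
    result = {}
    for child in sorted(children, key=lambda c: c.priority):
        for i, att in enumerate(attachments):
            if i not in used and att.condition(child, len(result)):
                used.add(i)
                result[child.name] = i
                break
        else:  # loop finished without break: nothing fits this child
            return None
    return result


attachments = [
    Attachment(0, -10, lambda child, placed: "top" in child.options),
    Attachment(10, 0),
    Attachment(-10, 0),
]
children = [Child("a", 2), Child("b", 1, {"top"}), Child("c", 3)]
print(assign_attachments(children, attachments))  # {'b': 0, 'a': 1, 'c': 2}
```

A greedy pass never revisits an earlier choice. If a low-priority child fails only because a higher-priority one took the single attachment it could use, a backtracking search (try each eligible attachment, recurse, undo on failure) would be a natural step before falling back to the Fail Rules.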
Symbol
    Name
    Main Image // Image Path, Height, Width
    Attachments // List of the attachments, their positions, requirements and additional info
    Fail Rules // List of actions to take if it is not possible to assign each child to an Attachment
My main problem is that the number of variables a Symbol should be able to access is pretty high. Each Symbol (which, I'll remind you, should be defined using this meta-language) should be able to access its child Symbols' information, but not that of other Symbols, to avoid deadlocks and circular references. For example, the user may want the height and width of a Symbol to be equal to the sum of the heights and widths of all its children's Symbols, or to use the same picture, and so on. This need also arises because the user writes Symbols' rules independently of the final structure.
At the same time, since the tree must be drawn from top to bottom, some of this information may not be available from the start, and may require a great deal of backtracking.
Also, since all of this has to be defined within a meta-language which I have to be able to formalize and parse, I have to define which functions the meta-language requires to give the language-writing user the maximum degree of freedom without being overly complex (this is a vague limit, but essentially I don't want to have TikZ as a subset of my meta-language). However, I am having quite a bit of trouble identifying them.
As I said before, I am looking for information about this kind of topic and/or methods for completing a project like this. Once I am able to fully complete the meta-language, I don't think I'll have too much trouble implementing the code to do all of this; my problems are mostly theoretical.
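One common way to avoid the top-down backtracking described above is to split drawing into two walks: a bottom-up pass that computes sizes (each Symbol reads only its children's results, matching the access rule above), then a top-down pass that assigns positions once every size is known. This is a minimal Python sketch of that idea, not a full layout engine; the Node class, the left-to-right placement and the gap parameter are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    width: float = 0.0   # size of the Symbol's own image (0 if it has none)
    height: float = 0.0
    children: list = field(default_factory=list)
    x: float = 0.0       # filled in by place()
    y: float = 0.0


def measure(node: Node) -> None:
    """Bottom-up (post-order) pass: a Symbol reads only its children's results.

    Example rule taken from the question: a Symbol with children is as wide as
    the sum of their widths and as tall as the sum of their heights.
    """
    for child in node.children:
        measure(child)
    if node.children:
        node.width = sum(c.width for c in node.children)
        node.height = sum(c.height for c in node.children)


def place(node: Node, x: float = 0.0, y: float = 0.0, gap: float = 10.0) -> None:
    """Top-down (pre-order) pass: sizes are already known, so no backtracking."""
    node.x, node.y = x, y
    cursor = x
    for child in node.children:
        place(child, cursor, y + node.height + gap, gap)
        cursor += child.width  # lay children out left to right


root = Node("root", children=[Node("a", 20, 5), Node("b", 30, 8)])
measure(root)
place(root)
print(root.width, root.height)  # 50 13
print([(c.name, c.x, c.y) for c in root.children])
```

Rules that need information from siblings or ancestors do not fit this two-pass shape; those are the cases where repeated passes or a constraint solver start to be needed.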

I have done a few similar projects with hierarchical data. Here is where I started:
Joe Celko is the king of tree data. I recommend you start with his book, which is a mix of logic and business cases. Trees and Hierarchies in SQL for Smarties even has a new edition out. There is a language for describing hierarchies in it, too.
I have used Oracle for storing my hierarchies; it has a very efficient system for storing and retrieving tree data. Look up "connect by", either in the documentation or in the book Mastering Oracle SQL by Mishra and Beaulieu.
You can use pointers to pull the images from the server so you don't store them in the database. I have built several systems that use hierarchical displays of data with graphical objects, and doing it this way keeps the overhead down. DevExpress and Telerik both have great viewers for displaying trees, and I have mine build the next levels dynamically: it doesn't know how many items the next level has, or what they are, until the user drills down into it. Try these examples and read the docs and you will be able to put this together in no time.
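Oracle's CONNECT BY is not available in most other databases; the standard SQL way to walk a stored hierarchy is a recursive common table expression (WITH RECURSIVE). This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Oracle; the node table, its columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO node VALUES (1, NULL, 'root'), (2, 1, 'a'), (3, 1, 'b'), (4, 2, 'a1');
""")

# Roughly what Oracle's START WITH parent_id IS NULL CONNECT BY PRIOR id = parent_id
# expresses: start at the root, then repeatedly join children onto the rows found so far.
rows = conn.execute("""
    WITH RECURSIVE tree(id, name, depth, path) AS (
        SELECT id, name, 0, name FROM node WHERE parent_id IS NULL
        UNION ALL
        SELECT n.id, n.name, t.depth + 1, t.path || '/' || n.name
        FROM node AS n JOIN tree AS t ON n.parent_id = t.id
    )
    SELECT name, depth FROM tree ORDER BY path
""").fetchall()

for name, depth in rows:
    print("  " * depth + name)  # indent each node by its depth
```

Ordering by the concatenated path gives a depth-first display order for this small example; it assumes names contain no characters that sort before "/".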
For telerik this link will show you multiple load on demand views: http://demos.telerik.com/aspnet-ajax/treeview/examples/programming/loadondemandmodes/defaultcs.aspx
For Devexpress: http://demos.devexpress.com/ASPxTreeListDemos/Data/VirtualMode.aspx

Think in HTML/DOM.
I was surprised when I found that the file format of the outliner I am using, NoteCase, is plain HTML. NoteCase can be found here: http://notecase.sourceforge.net/index1.html
If you aren't familiar with them, an outliner is a type of application in which you organize (mainly text) nodes in a hierarchical tree. There are task outliners, too. When an outliner has a graphical representation, it's called mind mapping. In any case, the directory structure of a filesystem is an outline, too. There are lots of outliners for various areas; see Wikipedia for more details.
NoteCase uses DL/DT/DD: DL is the list, DT is the item, and DD is the description of an item. They can be nested, of course.
If the format is HTML, you only need a CSS stylesheet to display it in a browser in a way that is easy for humans to read.
If you have additional properties, you can define additional tags or attributes, which the browser will not show but your renderer can.
You should write a converter which transforms your source HTML file format into a more detailed HTML format containing computed fields (e.g. sums of values from the sub-nodes, or "inherit" marks in a sub-node replaced with the value inherited from the parent node) and some additional formatting; or you can transform attributes into HTML nodes:
<node type="x" size="y" />
to
<div class="node">
  <div class="param"> type: x </div>
  <div class="param"> size: y </div>
</div>
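The attribute-to-node transformation above can be done with a small recursive converter. This is a minimal sketch using Python's xml.etree.ElementTree, assuming the source file is well-formed XML/XHTML (ElementTree rejects loose HTML); the function name expand_node is hypothetical, and a real converter would also compute the derived fields mentioned above.

```python
import xml.etree.ElementTree as ET


def expand_node(src: ET.Element) -> ET.Element:
    """Turn <node a="x" .../> into <div class="node"><div class="param"> a: x </div>...</div>.

    Nested <node> children are converted recursively, so the hierarchy survives.
    """
    div = ET.Element("div", {"class": "node"})
    for key, value in src.attrib.items():
        param = ET.SubElement(div, "div", {"class": "param"})
        param.text = f" {key}: {value} "
    for child in src.findall("node"):
        div.append(expand_node(child))
    return div


source = ET.fromstring('<node type="x" size="y"><node type="z" /></node>')
print(ET.tostring(expand_node(source), encoding="unicode"))
```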
Your data representation is a kind of DOM, and you should process it in a similar way. First, parse and read values from the file. Then run some additional rounds (walk the tree) to fill missing values with defaults, calculate inherited and summed values, etc.
I don't think you can use standard DOM parsers, because you mentioned custom sorting of elements depending on parameters, which the DOM model doesn't really support.
Don't be afraid to walk the object tree in as many passes as there are operations you want to perform on it. You can play with the order of the passes, enable and disable passes... As you add more features, they will naturally take the form of new processing passes.
You may have passes which must be run several times: e.g. if a pass can't calculate a value (because its source has to be calculated first), it may return a flag saying "I'm not done yet", and it should be run on the tree again until it reports "no changes made, I'm done".
I hope this gives you a push in the right direction.
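The "run until no changes" idea above can be sketched as a small driver loop. In this minimal Python sketch, nodes are plain dicts with hypothetical fields ("value", "total", "children"), and the pass is deliberately written pre-order so that a parent is visited before its children are ready, which forces the repeated rounds described above.

```python
def node(value, *children):
    """Hypothetical tree node: 'total' is a computed field, unknown at first."""
    return {"value": value, "total": None, "children": list(children)}


def total_pass(n):
    """One walk (deliberately pre-order): a node's total needs all child totals.

    Returns True if it computed at least one new value during this walk.
    """
    changed = False
    if n["total"] is None and all(c["total"] is not None for c in n["children"]):
        n["total"] = n["value"] + sum(c["total"] for c in n["children"])
        changed = True
    for child in n["children"]:
        changed = total_pass(child) or changed
    return changed


def run_until_stable(tree, passes, max_rounds=100):
    """Re-run all passes until a whole round reports "no changes made"."""
    for round_no in range(1, max_rounds + 1):
        changed = False
        for p in passes:
            changed = p(tree) or changed
        if not changed:
            return round_no
    raise RuntimeError("passes did not converge")


tree = node(1, node(2, node(3)), node(4))
rounds = run_until_stable(tree, [total_pass])
print(tree["total"], rounds)  # 10 4
```

A post-order walk would finish this particular pass in one round; the fixpoint loop earns its keep when several passes feed each other and no single traversal order satisfies all dependencies. The max_rounds guard turns a cyclic dependency into an error instead of an infinite loop.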

Related

What is the essential difference between a Document and a Collection in YAML syntax?

Warning: This is more a philosophical question than a practical one, but I find it worth asking and answering in practical contexts (forums like StackOverflow, rather than the Software Engineering Stack Exchange site), given how YAML is actually used de facto and the way its specification has evolved and features have been added to it over time. So let's ask:
Unlike formats/languages/protocols such as JSON, the YAML format allows you (according to this link, which seems to be a pretty official, or at least accurate and reliable, source for understanding the YAML specification) to embed multiple 'Documents' within one file/stream, using the three-dash marker ("---").
If so, it's hard to ignore the fact that the concept of a 'Document' in YAML is no longer an external definition or "meta"-directive that helps the human/parser organize multiple distinct documents alongside each other (similar to the way file systems define the concept of a "file" to organize different files, while each file in itself does not necessarily recognize that it is a file, or that it is part of a file system that wraps it, by definition, AFAIK).
However, when YAML allows multi-Document files that gather collections of Documents in a single YAML file (perhaps in a way similar to the HTTP pipelining approach of the HTTP protocol), the concept of a Document de facto takes on a new, wider definition, as part of the YAML grammar and its productions, and not just of the YAML specification as an assistive concept that helps describe the format.
If so, with the Document being part of the language itself, what is the added value of this data structure compared to the existing, familiar and well-used good old Collection (array of items)?
I'm asking because I've seen in this link (here) a snippet (in the second example) which describes a YAML sequence that is actually a collection of logs. For some reason, the author of the example chose to present each log as a separate "Document" (separated with three dashes), gathered together in the same YAML sequence/file, instead of writing a file that holds a "Collection" of logs represented as an array. Why did he choose to do this? Is his choice fitting, correct, ideal?
I can speculate that the added value of the distinction between a Document and a Collection becomes relevant when using more advanced features of the YAML grammar, such as Anchors, Tags and References. I guess every Document guarantees that these identifiers form a unique set, with no collisions or duplicates among them. Am I right? And if so, is this the only advantage, or are there more justifications for the existence of these two pretty similar data structures?
My best guess for now is to see a Document as a "meta"-Collection that is stricter and lacks high-level logic, or to see the two as different layers of collection schemes. Is this a correct, accurate way to view it?
And even if I am right, why, in the above example (the logs from the link), where there is no use of, and no expected need for, duplicates, collisions, identifiers/anchors or even compound structures, does the author still choose to represent the collection's items as separate documents? Is this just a not-so-successful choice of example? Or am I missing something, and this is a redundancy in the specification, or syntactic sugar that evolved out of practical needs?
Because the example was written on a website that looks serious, with official information written by professionals who dealt with the essence of the language and its definition, theory and philosophy (as opposed to practical uses in the wild), and also in light of the other examples I have seen there and how meticulous they are, I prefer not to assume that this example is simply imperfect; there may be a good reason to write it this way rather than another in this specific case.
First, let's look at the technical difference between the list of documents in a YAML stream and a YAML sequence (which is a collection of ordered items). For this, I'll discuss YAML tags, which are an advanced feature, so I'll provide a quick overview:
YAML nodes can have tags, such as !!str (the official tag for string values) or !dice (a local tag that can be interpreted by your application but is unknown to others). This applies to all nodes: Scalars, mappings and sequences. Nodes that do not have such a tag set in the source will be assigned the non-specific tag ?, except for quoted scalars which get ! instead. These non-specific tags are later resolved to specific tags, thereby defining to which kind of data structure the node will be deserialized into.
YAML implementations in scripting languages, such as PyYAML, usually only implement resolution by looking at the node's value. For example, a scalar node containing true will become a boolean value, 42 will become an integer, and droggeljug will become a string.
YAML implementations for languages with static types, however, do this differently. For example, assume you deserialize your YAML into a Java class
public class Config {
    String name;
    int count;
}
Assume the YAML is
name: 42
count: five
The 42 will become a String even though it looks like a number. Likewise, five will generate an error because it is not a number; it won't be deserialized into a string. This means that it is not the content of the node that defines how it will be deserialized, but the path to the node.
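The contrast between value-based and path-based resolution can be made concrete with two toy resolvers. This is a minimal Python sketch, not PyYAML's or any Java library's actual resolver: resolve_by_value handles only a simplified subset of the YAML core schema, and CONFIG_SCHEMA is a hypothetical stand-in for the static types of the Config class above.

```python
import re

INT_PATTERN = re.compile(r"[-+]?[0-9]+")


def resolve_by_value(text):
    """Roughly how dynamic-language loaders resolve an untagged plain scalar:
    only the scalar's own content is inspected (simplified core-schema subset)."""
    if text in ("true", "false"):
        return text == "true"
    if INT_PATTERN.fullmatch(text):
        return int(text)
    return text


# Hypothetical static types, mirroring the fields of the Config class.
CONFIG_SCHEMA = {"name": str, "count": int}


def resolve_by_path(key, text):
    """Roughly how a statically typed loader works: the expected type at this
    path decides the result, regardless of what the content looks like."""
    target = CONFIG_SCHEMA[key]
    if target is str:
        return text
    if target is int:
        if not INT_PATTERN.fullmatch(text):
            raise ValueError(f"{key}: {text!r} is not an integer")
        return int(text)
    raise TypeError(f"unsupported type {target}")


print(resolve_by_value("42"), resolve_by_value("true"), resolve_by_value("droggeljug"))
print(repr(resolve_by_path("name", "42")))  # '42' stays a string
```

Calling resolve_by_path("count", "five") raises ValueError, matching the error described for five above.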
What does this have to do with documents? Well, the YAML spec says:
Resolving the tag of a node must only depend on the following three parameters: (1) the non-specific tag of the node, (2) the path leading from the root to the node and (3) the content (and hence the kind) of the node.
So, the technical difference is: If you put your data into a single document with a collection at the top, the YAML processor is allowed to take into account the position of the data in the top-level collection when resolving a tag. However, when you put your data in different documents, the YAML processor must not depend on the position of the document in the YAML stream for resolving the tag.
What does this mean in practice? It means that YAML documents are structurally disjoint from one another. Whether a YAML document is valid or not must not depend on any preceding or succeeding documents. Consequently, even when deserialization runs into a semantic problem in one document (such as with the five above), a following document may still be deserialized successfully.
The goal of this design is to be able to concatenate arbitrary YAML documents together without altering their semantics: A middleware component may, without understanding the semantics of the YAML documents, collect multiple streams together or split up a single stream. As long as they are syntactically correct, stream splitting and merging are sound operations that do not invalidate a YAML document even if another document is structurally invalid.
This design primarily focuses on sending and receiving data over networks. Nowadays, however, YAML is primarily used as a configuration language, which is why this feature is seldom used and of rather little importance.
Edit: (Reply to comment)
What about edge cases like a string-tagged Document that starts with a folded string, making even the following "---" and "..." just characters of the global string?
That is not the case; see the rules l-bare-document and c-forbidden. A line containing an un-indented ... not followed by non-whitespace will always end a document if one is open.
Moreover, ... doesn't do anything if no document is open. This ensures that a stream merger can always append ... to a document to ensure that the current document is closed, but no additional one is created.
--- has widely been adopted as separator between YAML documents (and, perhaps more prominently, between YAML front matter and content in tools like Jekyll) where ... would have been more appropriate, particularly in Jekyll. This gives the false impression that --- should be used by tooling to separate documents, when in reality ... is the syntactic element designed for that use-case.
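The stream-merging property described above can be illustrated with a deliberately naive merger and splitter. This is a minimal Python sketch, not a YAML parser: the function names are hypothetical, and split_stream cuts only at un-indented "..." lines, ignoring "---", directives and comments. It shows that appending "..." to every document makes concatenation safe, and that a stray "..." with no open document is a no-op.

```python
def close_document(doc: str) -> str:
    """Make sure a document ends with an explicit '...' document-end marker."""
    doc = doc.rstrip("\n")
    if doc.split("\n")[-1].rstrip() != "...":
        doc += "\n..."
    return doc + "\n"


def merge_streams(documents):
    """Concatenate documents without understanding them: closing each one with
    '...' keeps the next document from being read as its continuation."""
    return "".join(close_document(d) for d in documents)


def split_stream(stream: str):
    """Naive splitter that cuts only at un-indented '...' lines.

    It ignores '---', directives and comments, so it is NOT a YAML parser.
    """
    docs, current = [], []
    for line in stream.split("\n"):
        if line.rstrip() == "...":
            if any(ln.strip() for ln in current):  # '...' with no open document: no-op
                docs.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(ln.strip() for ln in current):
        docs.append("\n".join(current))
    return docs


merged = merge_streams(["log: one", "log: two\n...\n", "log: three"])
print(split_stream(merged + "...\n"))  # the extra '...' adds no empty document
```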

Most elegant way to represent lots of fields

So, I have this list of attributes that I would like to represent in my Go code. That is, I'm looking for some data structure that can convey all these.
Mind like a diamond
Knows what's best
Shoes that cut
Eyes that burn like cigarettes
The right allocations
Fast
Thorough
Sharp as a tack
Playing with her jewellery
Putting up her hair
Touring the facilities
Picking up slack
Short skirt
Long jacket
Gets up early
Stays up late
Uninterrupted prosperity
Uses a machete to cut through red tape
Fingernails that shine like justice
Voice that is dark like tinted glass
Smooth liquidation
Right dividends
Wants a car with a cupholder arm rest
Wants a car that will get her there
Changing her name from Kitty to Karen
Trading her MG for a white Chrysler LeBaron
They're not going to change any time soon, but I will want to do the following things with them:
Read them from and write them to a JSON file.
Associate a boolean with each attribute (bonus points for a way to associate any one arbitrary type).
Easily display these strings to a human.
Easily count how many are "set" (That is, a function that, when called on someone who gets up early, has the right dividends, and is sharp as a tack, but nothing else, returns 3).
Should be as strongly typed as possible (It should be impossible, or at least difficult, to ask for an attribute that doesn't exist. You should be able to see all the attributes by looking at the code).
Preserving this order would be a plus, but not strictly necessary.
The approaches I've thought of so far are:
A struct with all of these as fields, but the count function is inelegant to write, and the 'nice' string name of the fields has to be stored elsewhere. Even if that can be solved with tags, the count problem remains.
A map with constant keys (or an array with enumerated values). Counting is a simple loop, and the keys can be the nice string names. However, this creates the problem of asking for keys that don't exist, and you have to hide the dictionary behind a function to stop users from trying to add new keys.
Is there something I'm missing here, or will I have to compromise?
I'd use a uint64 as a bitset (or an actual bitset type; none is built in, but there are open-source ones). It meets your requirements in the following ways:
JSON storage as "010101" string or base64 (11 chars).
Easy display: just define an array of the strings in the same order they are stored in the bitset.
Easily count how many are "set": use "popcount" (Go's standard library provides math/bits.OnesCount64, and open-source implementations are easy to come by).
Order is naturally preserved--the first attribute will always be bit 0, etc.
If you have more than 64 attributes, you can use math/big.
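The bitset answer above can be sketched compactly; this version is in Python for brevity (Python ints are arbitrary-precision, so there is no 64-attribute limit, much like math/big). In Go the same shape would be a uint64 with one constant per attribute declared via iota, which also gives compile-time checking of attribute names, plus math/bits.OnesCount64 for the count. The attribute list is shortened and the function names are illustrative.

```python
import json

# Shortened list; index i corresponds to bit i, so order is fixed by this tuple.
ATTRIBUTES = (
    "Mind like a diamond",
    "Knows what's best",
    "Fast",
    "Thorough",
    "Sharp as a tack",
    "Gets up early",
    "Stays up late",
    "Right dividends",
)
BIT = {name: i for i, name in enumerate(ATTRIBUTES)}


def with_attr(bits: int, name: str) -> int:
    return bits | (1 << BIT[name])  # unknown names raise KeyError


def has_attr(bits: int, name: str) -> bool:
    return (bits >> BIT[name]) & 1 == 1


def count_set(bits: int) -> int:
    return bin(bits).count("1")  # popcount


def describe(bits: int) -> list:
    return [name for i, name in enumerate(ATTRIBUTES) if (bits >> i) & 1]


def to_json(bits: int) -> str:
    return json.dumps(format(bits, f"0{len(ATTRIBUTES)}b"))  # bit 0 is the rightmost digit


def from_json(text: str) -> int:
    return int(json.loads(text), 2)


someone = 0
for attr in ("Gets up early", "Right dividends", "Sharp as a tack"):
    someone = with_attr(someone, attr)
print(count_set(someone), to_json(someone))  # 3 "10110000"
```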
You can use map[int]interface{} to store values of any type inside the interface, with an integer as the key, or you can use a slice of interfaces ([]interface{}) to loop over the data. Later on, you can easily type-assert the interface to get the underlying value, which is a string in your case.
Since it will be a slice or map of interfaces, you can easily marshal it to JSON and save it to a file, which can then be read and written easily.

arranging user profiles with singularitygs

In my web app I have several places (like the user profile) where I use columns to place propertyName -> propertyValue pairs. Sometimes property values are short and I can afford to have both the property name and value in one row, but sometimes property values are big (like a cloud of user interests or a paragraph describing something), so it makes sense to devote a whole new row to the property value.
Are there any recommended ways (mostly using singularity.gs) to make the property value go to another row if it is too long?
It's really tricky to answer without knowing more about the web app. Generally speaking I'd wrap the key value pairs in a div and attach a class to the wrapper that styles the elements inside.
E.g., if the key-value pair is "big", use block elements; if it's "small", use inline-blocks to stay inline.
I'd use singularity to lay out the outer wrappers, not the elements themselves. Basic CSS should cover the inner elements. If you don't know the length of the data coming out of a given key-value pair, then I don't think CSS alone can solve your challenge. JS or server-side scripting will likely be needed to attach the appropriate wrapper classes described above.
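The server-side approach mentioned above amounts to choosing a wrapper class from the value's length before the HTML is sent. This is a minimal Python sketch; the class names kv-inline and kv-block and the 40-character threshold are hypothetical, and character count is only a rough proxy because rendered width also depends on font and column size.

```python
from html import escape

# Hypothetical cut-off: values longer than this get their own row.
INLINE_MAX_CHARS = 40


def render_pair(name: str, value: str) -> str:
    """Wrap a property name/value pair in a class chosen by the value's length.

    CSS (not shown) would style .kv-inline children as inline-block and
    .kv-block children as block, so long values drop to their own row.
    """
    css_class = "kv-inline" if len(value) <= INLINE_MAX_CHARS else "kv-block"
    return (f'<div class="kv {css_class}">'
            f'<span class="kv-name">{escape(name)}</span>'
            f'<span class="kv-value">{escape(value)}</span></div>')


print(render_pair("City", "Oslo"))
print(render_pair("About me", "A long paragraph describing my many interests in detail."))
```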

Can Groups be used to emulate the "class" or "struct" data structures from other languages?

Is there a data structure within LiveCode that can be used as a "holder" for associated data, letting me handle it collectively? I come from a Java / JavaScript / C background, so I am looking for a class- or struct-like data structure.
I've found examples of Groups, which seem to have some of this functionality, but it feels a bit like I'm bending the language to meet my needs.
As a specific example, suppose I had an image field on my screen that would randomly display an image and, when pressed, play an associated sound clip. I'd expect to create a list of "structures" that contained the path to the image and the path to the associated sound clip, and use that data to populate the image field and to decide what sound clip to play.
Would a Group be the correct structure to use in this case? Or am I approaching this in a way that isn't really fitting with the way LiveCode works?
It takes a little getting used to, but the xTalk world is much simpler and more open than any ordinary procedural language. So much of what you once had to manage is no longer required.
So when splash21 said that you could store all your image and sound references in a custom property, he was really saying that the LiveCode environment contains intrinsic, high level functionality that makes these sorts of things instantly accessible, and the only thing required of you is to call for them, and they simply work.
The only way to appreciate this is to make a few simple programs, to really see what is possible. Make your application. Everything you mentioned can be accomplished with perhaps a dozen lines of code in a single handler. I recommend that you join the LiveCode use list and forums. The community is vibrant and eager to help, frequently with full-blown solutions to specific problems, but more importantly, as guides and mentors to new users.
Craig Newman
Arrays in LiveCode are actually associative arrays (like hash maps): a key is associated with a value, and the value may itself be an array.
Chapter 5.5.7 of the User's Guide says
Array elements may contain nested or sub-elements, making them multi-dimensional. This type of array is ideal for processing hierarchical data structures such as trees or XML. To access a sub-element, simply declare it using an additional set of square brackets.
put "ABC" into myVariable["myKeyName"]["aChildElement"]
see also
How to store pictures in a stack?
Dave, I'm hoping to get a struct-like container implemented in the near future. Meanwhile you can, as splash21 mentioned, use custom properties (or better yet, custom property sets) to do what you want. This will give you a pseudo-struct for each object, and you can store the image and sound file specifications in the properties. And if you use that in conjunction with a behavior object, you'll end up very close to a real inheritable class.

associate multiple strings to only one

I'm trying to make an algorithm that easily simplifies and groups synonyms (with mismatches, different capitalization, acronyms, etc.) into a single one. I suppose there should be a standard way to build a structure that, when looking up a string with possible mismatches, returns a normalized string key if the string exists in the structure. In short, the same concept can sometimes be written in several ways, but I only want to keep the concept.
For instance, suppose I want to normalize or simplify the appearances of
"General Director", "General Manager", "G, Dtor", "Gen Dir", ...
into
"GEN_DIR"
and keep only this result for further reference.
By the way, I suppose that building a Hash with key/value pairs like
hash["General Director"]="GEN_DIR"
hash["General Manager"]="GEN_DIR"
hash["G, Dtor"]="GEN_DIR"
hash["G, Dir"]="GEN_DIR"
could be a solution, but I suspect that there are more elegant or adequate solutions to that.
I would also need a way to persist this associative structure easily without any database, because it should grow as I find more mismatches of the same word or sentence. One possible approach, I think, is to define this structure by means of a DSL, but I'm open to suggestions.
Well, there is no rule, at least not a clear one.
My aim is to scrape some "structured" data from the web that is sometimes incorrectly or incompletely typed. Some fields are descriptions and can be left as is. But some fields are supposed to be "sets" but aren't correctly typed (as in my example). A human reading them immediately knows what they mean and can associate them with their meaning.
But I would like to automate, as much as possible, the process of reducing those possible mismatches to a single "string" (or symbol) before, for instance, saving it into a database. So what I need is a kind of hash or dictionary, as sawa correctly stated, that I can use to look up any of these dirty strings to get the normalized string or symbol.
Also, of course, it would be desirable for this hash (or whatever else it could be) to learn from new mismatches in some way and add new associations automatically (possibly based on a distance measure between the mismatched string and the normalized string: if it is lower than X, a new association is built). The whole association (i.e. the hash) should grow as new mismatches and concepts arise, so it should be persisted somewhere (possibly in an XML file, or something like what Mori answered below) for future use.
Any new ideas?
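The learning lookup table described above can be sketched as a dictionary plus a fuzzy fallback. This is a minimal Python sketch under stated assumptions: the Normalizer class is hypothetical, it swaps in difflib's similarity ratio (0 to 1, higher is closer) instead of an edit distance, the 0.8 cutoff is arbitrary and would need tuning on real data, and persistence uses a JSON file rather than XML.

```python
import difflib
import json


class Normalizer:
    """Map dirty variants to a canonical key, learning close misspellings."""

    def __init__(self, aliases=None, cutoff=0.8):
        self.aliases = {self._clean(k): v for k, v in (aliases or {}).items()}
        self.cutoff = cutoff

    @staticmethod
    def _clean(text):
        # Case-fold and collapse punctuation/whitespace: "G, Dtor" -> "g dtor"
        spaced = "".join(ch if ch.isalnum() else " " for ch in text)
        return " ".join(spaced.casefold().split())

    def lookup(self, text, learn=True):
        key = self._clean(text)
        if key in self.aliases:
            return self.aliases[key]
        match = difflib.get_close_matches(key, list(self.aliases), n=1, cutoff=self.cutoff)
        if not match:
            return None  # unknown concept: leave it for a human to classify
        canonical = self.aliases[match[0]]
        if learn:
            self.aliases[key] = canonical  # remember the new variant
        return canonical

    def save(self, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.aliases, f, indent=2, sort_keys=True)

    @classmethod
    def load(cls, path, cutoff=0.8):
        with open(path, encoding="utf-8") as f:
            return cls(json.load(f), cutoff)


norm = Normalizer({"General Director": "GEN_DIR", "General Manager": "GEN_DIR",
                   "G, Dtor": "GEN_DIR", "Gen Dir": "GEN_DIR"})
print(norm.lookup("GENERAL  director"))  # exact match after cleaning
print(norm.lookup("Generl Director"))    # close match, now learned
```

Automatic learning can drift: a learned variant becomes a new anchor for later matches, so logging new associations for occasional human review is a sensible safeguard.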
