Saving a heterogeneous object graph - Doctrine

I have an unmanaged graph of objects. If I get objects without identity, I can save them with:
$em->persist($obj);
But when I have one with identity (with $obj->id set), the persist() method wants to re-create it. I realize that merge() is meant for cases like this. The only problem is that it's recursive (so every other object in the graph would also need identity), and my graph is heterogeneous in this respect (some objects have their ids set, others don't).

Related

How to delete a vertex from arango DB in GO and have the edges automatically deleted?

I'm using the ArangoDB Go client and trying to delete a vertex and have the dangling edges automatically removed.
The ArangoDB documentation says about named graphs:
The underlying collections of the named graphs are still accessible using the standard methods for collections. However the graph module adds an additional layer on top of these collections giving you the following guarantees:
(...) If you delete a vertex all edges referring to this vertex will be deleted too
How do I harness the guarantees of the graph module using Go?
I created a named graph with the vertex and edge collections that I want to delete from, and still, if I just remove from the collection, I get dangling edges pointing to the newly removed vertex.
Is there a way to use AQL to do this? The documentation suggests otherwise:
Deleting vertices with associated edges is currently not handled via AQL while the graph management interface and the REST API for the graph module offer a vertex deletion functionality.
Instead it offers a more complex query to do the same.
But since this functionality exists in the graph web interface, and supposedly in the REST API, shouldn't it be present in the ArangoDB Go driver? Am I missing something?
It seems the advantages/guarantees of using named graphs aren't really there.
As @TomRegner suggested, accessing the collection through graph.VertexCollection works.
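For reference, a minimal sketch of that fix using github.com/arangodb/go-driver; the endpoint, database, graph and collection names are placeholders:

package main

import (
	"context"

	driver "github.com/arangodb/go-driver"
	"github.com/arangodb/go-driver/http"
)

func main() {
	ctx := context.Background()
	conn, err := http.NewConnection(http.ConnectionConfig{
		Endpoints: []string{"http://localhost:8529"}, // placeholder endpoint
	})
	if err != nil {
		panic(err)
	}
	client, err := driver.NewClient(driver.ClientConfig{Connection: conn})
	if err != nil {
		panic(err)
	}
	db, err := client.Database(ctx, "mydb")
	if err != nil {
		panic(err)
	}
	g, err := db.Graph(ctx, "mygraph")
	if err != nil {
		panic(err)
	}
	// The crucial part: obtain the collection from the graph, not from
	// the database, so the deletion goes through the graph module and
	// all edges referring to the vertex are removed as well.
	vertices, err := g.VertexCollection(ctx, "nodes")
	if err != nil {
		panic(err)
	}
	if _, err := vertices.RemoveDocument(ctx, "vertex-key"); err != nil {
		panic(err)
	}
}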

Is GraphQL related to graph databases?

According to Wikipedia (Graph database):
In computing, a graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph (or edge or relationship). The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes.
If a database has a GraphQL API, is this database a Graph database?
Both terms sound very similar.
They are not related. GraphQL is just an API technology, often compared to REST. Think of it as another way to implement a Web API; it has nothing to do with where the data is actually stored or the storage technology behind the scenes. For example, it can be used as a Web API to fetch data from PostgreSQL too.
But since GraphQL treats the data as an object graph, it may, in terms of API implementation, be a better match when working with a graph database. It may be easier to implement, as we can delegate some of the graph-loading problems to the graph database rather than solving them ourselves.
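To make the storage-independence point concrete, here is a hedged sketch in Go: a resolver-shaped function answering a GraphQL-style query such as { user(id: 1) { name } } straight from PostgreSQL, with no graph database involved. No real GraphQL library is used, and the table, DSN and names are made up:

package main

import (
	"context"
	"database/sql"
	"fmt"

	_ "github.com/lib/pq" // assumed PostgreSQL driver
)

type User struct {
	ID   int
	Name string
}

// resolveUser plays the role of a GraphQL resolver: the API layer sees
// an object graph, while the storage layer is plain relational SQL.
func resolveUser(ctx context.Context, db *sql.DB, id int) (*User, error) {
	u := &User{ID: id}
	err := db.QueryRowContext(ctx,
		"SELECT name FROM users WHERE id = $1", id).Scan(&u.Name)
	if err != nil {
		return nil, err
	}
	return u, nil
}

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/app?sslmode=disable")
	if err != nil {
		panic(err)
	}
	defer db.Close()
	u, err := resolveUser(context.Background(), db, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(u.Name)
}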

Efficient view updating with functional data model

In functional programming, data models are immutable, and updating a data model is done by applying a function to it and getting a new version of the data model in return. I'm wondering how people write efficient viewers/editors for such data models, though (more specifically, in Clojure).
A simplified example: suppose that you want to implement a viewer for a huge tree. In the non-functional world, you could have a controller for the Tree, with a function updateNode(Node, Value), which could then notify all observers to tell them that a specific node in the tree has been updated. On the viewer side, you would put all the nodes in a TreeView widget, keep a mapping of Node->WidgetNode, and when you are notified that a Node has changed, you can update just the one corresponding NodeWidget in the tree that needs updating.
The solution described in another Clojure MVC question talks about keeping the model in a ref, and adding a watcher. While this would indeed allow you to get notified of a change in the model, you still wouldn't know which node was updated, and would have to traverse the whole tree, correct?
The best thing I can come up with off the top of my head requires you, in the worst case, to update all the nodes on the path from the root to the changed node (as all of these nodes will be different).
What is the standard solution for updating views on immutable data models?
I'm not sure how this is a problem unique to functional programming. If you kept all of your state in a singly rooted mutable object graph with a notification mechanism for changes, the same issue would exist.
To get around this, you could simply store the current state of the model, plus some information about what changed in the last edit. You could even keep a history of these to allow for easy undo/redo, because Clojure's persistent data structures make that extremely efficient through their shared underlying state.
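A minimal sketch of that idea, written in Go for concreteness (in Clojure the persistent data structures would give you the structural sharing for free; all names here are illustrative): alongside the current tree value, the model carries a record of exactly what the last edit touched, so a viewer can repaint just the matching NodeWidget.

package main

// Node is the (treated-as-immutable) tree; an edit produces a new root
// that shares unchanged subtrees with the old one.
type Node struct {
	Value    string
	Children []*Node
}

// Path identifies a node by child indexes from the root,
// e.g. Path{2, 0} is the first child of the root's third child.
type Path []int

// Change describes what the last edit touched.
type Change struct {
	Path Path   // which node changed
	Kind string // "edit", "insert" or "delete"
}

// ModelState is what watchers receive: the new value plus enough
// information to update a single widget instead of diffing the tree.
type ModelState struct {
	Tree       *Node
	LastChange Change
}

// History retains past states for undo/redo; with structural sharing,
// old versions are cheap to keep. Assumes an initial state is pushed
// before Undo is called.
type History struct {
	states []ModelState
}

func (h *History) Push(s ModelState) { h.states = append(h.states, s) }

func (h *History) Undo() ModelState {
	if len(h.states) > 1 {
		h.states = h.states[:len(h.states)-1]
	}
	return h.states[len(h.states)-1]
}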
That's just one thought on how to attack it. I'm sure there are many more.
I also think it's worth asking, "How efficient does it need to be?" The answer is, "just efficient enough for the circumstances." It might be that a plain map of data will work, because you don't really have all that much data to deal with in a given application.

Networks: how to model them using aggregate roots?

A domain model defines, among other things, the relationships between entities, and we define aggregate roots to provide encapsulation and transaction boundaries. The well-known relationships are one-to-one (an entity or value object is contained within an aggregate root), one-to-many (an aggregate root contains a collection of child objects) and many-to-many. The latter are difficult, because many-to-many relationships between aggregate roots get you in trouble with the transaction boundary. Thus, in many cases, one direction of the many-to-many relationship is seen as more important, and only that direction is modeled as a one-to-many relation.
Now, take it one step further. Networks. Many-to-many relationships between equivalent partners. How can you model that without violating the transaction boundary on your aggregate roots?
Have a look at this widely-applicable example:
I have a network with nodes. Every node has a limited amount of ports. One port can only be connected to one port on another node. I have to be able to add and remove connections between nodes, using the ports.
An intuitive approach to this would be to model the nodes as aggregate roots containing ports. Connections seem to be value objects, and one port can have one connection. I could implement a Node.ConnectTo(nodeId, portId) method which adds the connection (between port X on node A and port Y on node B) to the aggregate root, node A. Preferably, I would call this method twice, once on node A and once on node B, and wrap the two calls in a transaction. However, this would violate the transaction boundary, so I decide to only store it on node A.
To see the connection on node B in the application client, a separate read model would be needed. But that's no problem; the CQRS architecture provides these possibilities. So adding, removing and viewing connections is not a problem.
The problem arises when I want to validate whether a port is still free before I add a connection to it. The result of respecting our transaction boundary is that (in the write model) the fact that a port is already connected might not be known to the aggregate root, but might be stored in any other aggregate root.
Of course, you could trust your client's validation, go ahead and add the connection if it's ok for the node you are adding it to and rely on a process running consistency checks to execute compensating actions for invalid connections. But that seems to be a big deal to me compared to wrapping a transaction around two ConnectTo calls...
This made me think that maybe my aggregate roots were chosen incorrectly. So I started thinking about Nodes and Networks as aggregate roots, where a Network is a collection of Connections. The good thing about a Network aggregate is that you could always validate adding or removing connections. Except when a new connection would result in the joining of two existing networks... And your aggregate could become big, possibly even ending up as a single huge network. Not feasible either.
So, how do you think this should be modeled? Do you see a solution where you respect aggregate roots as transaction boundaries, you can validate your network, and you do not risk storing your entire network as a single aggregate? Or am I asking for all three CAP properties here, and is it simply not possible?
I think your "new way" is flawed, since the View model should not produce an Exception that propagates "somehow" back to the domain model. The domain model need to resolve this by itself.
So, in this case (a 1-to-1 binding) you could utilize events within the domain model, so that:
NodeA.connect( "port1" ).to( NodeB ).on( "port3" );
NodeA reserves "port1" on itself.
NodeA sends a "portConnectionRequest" to NodeB.
NodeB binds "port3" if available.
NodeB sends "portConnectionConfirmed" or "portConnectionDenied".
NodeA receives the event and acts accordingly.
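A hedged, in-process sketch of those five steps in Go, where channels stand in for the messaging layer; every name is made up, and a real aggregate would serialize its commands rather than risk the data race that concurrent Connect calls could cause:

package main

import "fmt"

// portRequest is the "portConnectionRequest" event; reply carries
// "portConnectionConfirmed" (true) or "portConnectionDenied" (false).
type portRequest struct {
	fromNode, fromPort string
	port               string
	reply              chan bool
}

type Node struct {
	name  string
	ports map[string]string // port -> "node:port" it is bound to ("" = free)
	inbox chan portRequest
}

func NewNode(name string, ports ...string) *Node {
	n := &Node{name: name, ports: map[string]string{}, inbox: make(chan portRequest)}
	for _, p := range ports {
		n.ports[p] = ""
	}
	go n.serve()
	return n
}

// serve is steps 3 and 4: bind the port if available, then confirm or deny.
func (n *Node) serve() {
	for req := range n.inbox {
		if bound, ok := n.ports[req.port]; ok && bound == "" {
			n.ports[req.port] = req.fromNode + ":" + req.fromPort
			req.reply <- true
		} else {
			req.reply <- false
		}
	}
}

// Connect is steps 1, 2 and 5: reserve the local port, send the request,
// then commit or roll back depending on the reply.
func (a *Node) Connect(localPort string, b *Node, remotePort string) error {
	if bound, ok := a.ports[localPort]; !ok || bound != "" {
		return fmt.Errorf("%s: port %s not free", a.name, localPort)
	}
	a.ports[localPort] = "reserved"
	reply := make(chan bool)
	b.inbox <- portRequest{a.name, localPort, remotePort, reply}
	if <-reply {
		a.ports[localPort] = b.name + ":" + remotePort
		return nil
	}
	a.ports[localPort] = "" // denied: release the reservation
	return fmt.Errorf("connection to %s:%s denied", b.name, remotePort)
}

func main() {
	a, b := NewNode("NodeA", "port1"), NewNode("NodeB", "port3")
	fmt.Println(a.Connect("port1", b, "port3")) // <nil> on success
}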
The above assumes reliable messaging, which is easily achieved within the JVM (the in-process channels in the sketch get it trivially) but much harder in a distributed environment, yet that is where you want it most. If a reliable messaging system cannot be provided, I think you will have a Byzantine Agreement problem at hand, or a subset of it.
Ok, I read and thought some more about it and I guess this is the 'right' way to do it:
Before executing the ConnectTo method on node A, you validate whether the port on node B is still free, using an eventually consistent view model as your data source (not the domain model, which cannot validate this efficiently; see above).
ConnectTo is run only on Node A, thus no transaction boundary is violated.
If it then turns out that the port on node B was already in use, a true concurrency exception has happened and it must be signaled. Some action needs to be taken (either manual intervention or an automated process must pick it up). The probability of this concurrency exception will usually be very low.
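A short fragment sketching this flow, with hypothetical names: the read model answers the validation question, and only node A's aggregate is written.

package network

import "fmt"

// ReadModel is the eventually consistent view model.
type ReadModel interface {
	PortFree(node, port string) (bool, error)
}

// NodeAggregate is the write-side aggregate root for a single node.
type NodeAggregate interface {
	ConnectTo(remoteNode, remotePort, localPort string) error
}

func Connect(rm ReadModel, a NodeAggregate, localPort, remoteNode, remotePort string) error {
	// Validate against the view model, not against another aggregate.
	free, err := rm.PortFree(remoteNode, remotePort)
	if err != nil {
		return err
	}
	if !free {
		return fmt.Errorf("%s:%s is already in use", remoteNode, remotePort)
	}
	// Only node A changes, so the transaction boundary holds. If the
	// read model was stale, a background consistency check has to detect
	// the double booking and raise the (rare) concurrency exception.
	return a.ConnectTo(remoteNode, remotePort, localPort)
}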

Persisting a hierarchical ordered list (flatfile/sql/nosql)

I want to store hierarchical ordered lists. One example would be nested todo lists. Another example would be XML. It would just be a tree where the children are in a given order. For simplicity, entries are just strings of text.
The thing is that the list will be edited by the user, so it is important that the common operations are fast:
Edit an element
Delete an element
Insert an entry before another
I can imagine how to do this in a data structure: entries are linked-list nodes; if they have children, they also point to the head of another linked list. A hash table maps entry ids to the actual data.
Editing means looking up the id in the hash table and replacing the data part of the list node
Deletion means looking up the id and doing a linked-list deletion
Insertion means looking up the id and doing a linked-list insertion
However, I need to store the data, and I have no idea how to achieve this. I don't want to save the entire tree when only one element changes. What is the best way? Flat files/SQL/NoSQL/voodoo?
Using a relational database is a viable solution. For your needs - fast insert, update, delete - I'd use an Adjacency List with a few additional customizations:
id
parent_id
cardinality -- sort order for all nodes with the same parent_id
depth -- distance from the root node
Calculating cardinality and depth is done either in code or - preferably - with a database trigger on any insert, delete or update. In addition, for retrieving an entire hierarchy with one SELECT statement, a hierarchy bridge table is called for:
id
descendent_id
This table would also be populated via the same trigger mentioned above and serves as a means for retrieving all nodes above or beneath a given id.
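For concreteness, here are the two tables as SQL DDL, wrapped in a small Go program (SQLite is used purely to make the example runnable; the trigger maintaining cardinality, depth and the bridge rows is vendor-specific and left out, and the value column is simply the entry's text):

package main

import (
	"database/sql"

	_ "github.com/mattn/go-sqlite3" // assumed driver; any RDBMS works
)

const ddl = `
CREATE TABLE node (
    id          INTEGER PRIMARY KEY,
    parent_id   INTEGER REFERENCES node(id),
    cardinality INTEGER NOT NULL,  -- sort order among nodes sharing parent_id
    depth       INTEGER NOT NULL,  -- distance from the root node
    value       TEXT
);

-- hierarchy bridge: one row per (ancestor, descendant) pair, so all
-- nodes above or beneath a given id come back in a single SELECT
CREATE TABLE hierarchy (
    id            INTEGER REFERENCES node(id),
    descendent_id INTEGER REFERENCES node(id),
    PRIMARY KEY (id, descendent_id)
);`

func main() {
	db, err := sql.Open("sqlite3", "tree.db")
	if err != nil {
		panic(err)
	}
	defer db.Close()
	if _, err := db.Exec(ddl); err != nil {
		panic(err)
	}
}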
See this question for additional detail around Adjacency List, Hierarchy Bridge and other approaches for storing hierarchical data in a relational database.
Finally to provide some additional clarification on the options you listed:
Flat Files: a combination of linked lists and memory mapped files would probably serve, but you're really just rolling your own at that point, where a SQL or NoSQL solution would probably do better.
SQL: this would be my approach - tooling is the best here for data manipulation, backup and recovery.
XML: this is also a possibility with a database, but it's very vendor-specific; you'll need to study the syntax for node insert, update and delete. Can be very fast if the database offers an XML data type.
NoSQL: if you're talking key-value storage, the typical approach for hierarchical data appears to be materialized path (see the small illustration after this list), but this would require recalculating the entire path for all affected nodes on a change, which is likely slow. Instead, consider the Java Content Repository (JCR) - Apache Jackrabbit is an implementation - an entire API centered around representing hierarchical structured data and persisting it - though perhaps too heavyweight for the problem you're trying to solve.
voodoo: um...
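The materialized-path illustration promised above, with hypothetical keys: each node's key encodes its full ancestry, which is exactly why moving a subtree forces a rewrite of every descendant's key.

package main

import (
	"fmt"
	"strings"
)

func main() {
	keys := []string{
		"0001",           // root
		"0001.0002",      // a child of the root
		"0001.0002.0001", // its first child
	}
	// Moving the subtree rooted at "0001.0002" invalidates every key
	// sharing that prefix - the slow part on change.
	for _, k := range keys {
		if strings.HasPrefix(k, "0001.0002") {
			fmt.Println("must rewrite:", k)
		}
	}
}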
Update
If you implement all the pieces from this answer, adds are cheap, re-sorts are a small cost, and moves are expensive. The trade-off is fast hierarchy-traversing reads - for instance, finding a node's complete ancestry in one operation. Specifically, adding a leaf is an O(1) operation. A re-sort means updating cardinality for all peer nodes coming after the moved node. A move means updating (1) cardinality for the source and destination peer nodes coming after it, (2) depth for the moved node and its descendants, and (3) removing and re-adding ancestry rows in the hierarchy bridge table.
However, go with an Adjacency List alone (i.e. id, parent_id) and writes become cheap, reads of one level are cheap, but reads that traverse the hierarchy are expensive. The latter then require recursive SQL, such as Oracle's CONNECT BY or the Common Table Expressions found in SQL Server and other RDBMSs.
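For illustration, the recursive read in question - a fragment using PostgreSQL/SQLite-style WITH RECURSIVE syntax (SQL Server omits the RECURSIVE keyword), collecting a node's ancestry from id/parent_id alone:

package tree

// ancestrySQL walks from one node ($1) up to the root.
const ancestrySQL = `
WITH RECURSIVE ancestry AS (
    SELECT id, parent_id FROM node WHERE id = $1
  UNION ALL
    SELECT n.id, n.parent_id
    FROM node n
    JOIN ancestry a ON n.id = a.parent_id
)
SELECT id FROM ancestry`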
You store lists (or rather trees) and don't want to rewrite the entire tree once a small piece of it changes. From this I conclude the structures are huge and small changes happen relatively often.
Linked lists are all about pointer chasing, and pointers and what they reference are much like keys and values. You need to efficiently store key-value pairs. Order of items is preserved by the linked list structure.
Suppose that you use a typical key-value store, from xDBM or Berkeley DB to any of the modern NoSQL offerings. You could also take a compact SQL engine, e.g. SQLite. They typically use trees to index keys, so it takes O(log N) to access a key, or hash tables that take about as much or a bit less.
You haven't specified when you persist your data incrementally. If you only do it once in a while (not on every update), you'll need to effectively compare the database to your primary data structure. This will be relatively time-consuming, because you'll need to traverse the entire tree and look up each node ID in the database. This is logarithmic, but with a huge constant because of the necessary I/O. And then you'll want to clean your persistent store of items that are no longer referenced. It may turn out that just dumping the tree as JSON is far more efficient. In fact, that's what many in-memory databases do.
If you update your persistent structure with every update to the main structure, there's no point in having that main structure anyway. It's better to replace it with an in-memory key-value store such as Redis, which already has persistence mechanisms (and some other nice things).
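A sketch in Go of the per-entry key-value layout this implies (the Store interface is hypothetical; Berkeley DB, Redis or SQLite could all sit behind it): every entry lives under its own ID, so editing one node rewrites one key rather than the whole tree.

package tree

// EntryID is the key under which one list entry is stored.
type EntryID string

// Entry is one node: its text plus the ordered IDs of its children.
type Entry struct {
	Text     string
	Children []EntryID
}

// Store is any key-value engine.
type Store interface {
	Get(id EntryID) (Entry, error)
	Put(id EntryID, e Entry) error
	Delete(id EntryID) error
}

// EditText touches exactly one key, regardless of tree size.
func EditText(s Store, id EntryID, text string) error {
	e, err := s.Get(id)
	if err != nil {
		return err
	}
	e.Text = text
	return s.Put(id, e)
}

// InsertBefore writes two keys: the new entry and its parent, which owns
// the sibling order. Assumes refID is currently a child of parent.
func InsertBefore(s Store, parent, refID, newID EntryID, e Entry) error {
	p, err := s.Get(parent)
	if err != nil {
		return err
	}
	kids := make([]EntryID, 0, len(p.Children)+1)
	for _, c := range p.Children {
		if c == refID {
			kids = append(kids, newID)
		}
		kids = append(kids, c)
	}
	p.Children = kids
	if err := s.Put(newID, e); err != nil {
		return err
	}
	return s.Put(parent, p)
}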
