I read in a book that a DAG (directed acyclic graph) with topological sorting is a good way to present a genealogy (family) tree, but that algorithm depends on the order of the input data.
Genealogy databases typically use what's called a lineage-linked structure.
This means that partners (husbands/wives) are linked together as a family, and each family is linked to its children, with a link back from each child to its parent family.
I do not know of a specific graph type that represents this. Most programs implement it themselves, with a family table and an individual table and the appropriate links between them.
Genealogy databases generally follow this structure to match the GEDCOM (Genealogy Data Communications) standard that was developed to allow transfer of data between programs.
In that standard, you'll specifically see FAM and INDI records. FAM records are connected to INDI records with HUSB, WIFE and CHIL links. INDI records are connected to FAM records with FAMS (spouse) and FAMC (parent) links.
Using this data structure lets you easily read a GEDCOM file and import data from other genealogy software, and also export your data to a GEDCOM file so that other genealogy programs can read it.
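As a rough illustration of the lineage-linked structure described above, here is a minimal Python sketch. The class names, helper functions and sample people are made up for illustration; only the link names (HUSB, WIFE, CHIL, FAMS, FAMC) come from GEDCOM. It is not a GEDCOM parser.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Individual:
    id: str
    name: str
    fams: List[str] = field(default_factory=list)  # FAMS: families where this person is a spouse
    famc: Optional[str] = None                     # FAMC: family where this person is a child


@dataclass
class Family:
    id: str
    husb: Optional[str] = None                     # HUSB link to an individual
    wife: Optional[str] = None                     # WIFE link to an individual
    chil: List[str] = field(default_factory=list)  # CHIL links to individuals


def parents_of(person_id, individuals, families):
    """Follow FAMC, then HUSB/WIFE, to find a person's recorded parents."""
    famc = individuals[person_id].famc
    if famc is None:
        return []
    fam = families[famc]
    return [p for p in (fam.husb, fam.wife) if p is not None]


def siblings_of(person_id, individuals, families):
    """Other children listed in the same parent family record."""
    famc = individuals[person_id].famc
    if famc is None:
        return []
    return [c for c in families[famc].chil if c != person_id]


# Hypothetical sample data: one family with two children.
individuals = {
    "I1": Individual("I1", "John", fams=["F1"]),
    "I2": Individual("I2", "Mary", fams=["F1"]),
    "I3": Individual("I3", "Anna", famc="F1"),
    "I4": Individual("I4", "Paul", famc="F1"),
}
families = {"F1": Family("F1", husb="I1", wife="I2", chil=["I3", "I4"])}
```

Because a person can appear in several FAMS families, remarriages fit this model without any change to the structure.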
In genealogy, the so-called Ahnentafel indexing (German for "ancestor table") is used for representation of the ancestors of a single person; basically this is a suitable linearization of a binary tree.
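In Ahnentafel numbering the subject is number 1, the father of person n is 2n and the mother is 2n + 1. A small Python sketch of that arithmetic (the function names are my own):

```python
def father(n: int) -> int:
    """Ahnentafel number of the father of person n."""
    return 2 * n


def mother(n: int) -> int:
    """Ahnentafel number of the mother of person n."""
    return 2 * n + 1


def child(n: int) -> int:
    """Ahnentafel number of the child of ancestor n (n must be > 1)."""
    if n <= 1:
        raise ValueError("the subject (number 1) has no child in the table")
    return n // 2


def generation(n: int) -> int:
    """0 for the subject, 1 for the parents, 2 for the grandparents, ..."""
    return n.bit_length() - 1


def path_from_subject(n: int):
    """Read the binary digits after the leading 1: 0 = father, 1 = mother."""
    return ["mother" if bit == "1" else "father" for bit in bin(n)[3:]]
```

For example, number 5 is binary 101, so the path from the subject is father, then mother: the paternal grandmother.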
To present the relations between people found in historical records, Open Archives uses a flexible force-directed graph layout implementation. In this graph every node is a person, and there are two types of edges: one depicting a marriage (orange) and one depicting a parent relation (the red 'blood' line). An example of a graph can be seen here.
DAGs will not work. You might look at a prior post that uses the GEDCOM model in Neo4j.
Lineages can contain complex relationships such as double cousins, step-sibling marriages, consanguinity, etc. These are easily managed in a NoSQL database such as Neo4j.
I have started dedicating time to learning algorithms and data structures. My first and most basic question is: how do we choose a representation for the data depending on the context?
I have given it time and thought and came up with this conclusion.
Groups of same data -> List/Arrays
Classification of data [Like population on gender, then age etc.] -> Trees
Relations [Like relations between a product bought and others] -> Graphs
I am posting this question to find out what the Stack Overflow community thinks about my interpretation of data structures. Since it is a generic topic, I could not find a justification for my reasoning online. Please correct me if I am wrong.
This looks like oversimplifying things.
The data structure we want to use depends on what we are going to do with the data.
For example, when we store records about people and need fast access by index, we can use an array.
When we store the same records about people but need to find by name fast, we can use a search tree.
Graphs are a theoretical concept, not a data structure.
They can be stored as an adjacency matrix (a two-dimensional array, suitable for small or dense graphs), as adjacency lists (an array/list of dynamic arrays/lists, suitable for large or sparse graphs), implicitly (generated on the fly), or otherwise.
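To make the first two storage options concrete, here is a small Python sketch that builds both representations for the same undirected graph. The function names and the sample edges are made up for illustration.

```python
def to_adjacency_matrix(n, edges):
    """n x n matrix: matrix[u][v] == 1 if there is an edge between u and v."""
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1  # undirected: mirror the entry
    return matrix


def to_adjacency_list(n, edges):
    """One list of neighbours per vertex; memory grows with the edge count."""
    adjacency = [[] for _ in range(n)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    return adjacency


# Hypothetical graph with 4 vertices and 3 edges.
edges = [(0, 1), (0, 2), (2, 3)]
matrix = to_adjacency_matrix(4, edges)
adjacency = to_adjacency_list(4, edges)
```

The matrix always takes n * n cells regardless of how many edges exist, which is why it suits small or dense graphs, while the lists only store the edges that are actually present.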
I am currently working on a graph-based project and I am looking for an algorithm for slicing a dynamic graph. I have already done some research, but most of the algorithms I have found work only for static graphs. In my environment the graph is dynamic, meaning that users add/delete elements and create/delete dependencies at runtime.
(In reality I am working with UML models, but UML models can also be represented by typed graphs, which are composed of typed vertices and edges.)
I also searched for the term "graph fragmentation" but did not find anything. I would like to know whether such an algorithm for slicing a dynamic graph exists.
[UPDATE]
Sorry for not being clear; I am updating my question. Let me first explain the context.
In MDE (Model Driven Engineering), large-scale industrial systems nowadays involve hundreds of developers working on hundreds of models that represent parts of the whole system specification. In such a context, the commonly adopted approach is to use a central repository. The solution I am proposing for my project (I currently work in a research lab) is peer-to-peer oriented, meaning that every developer has their own replica of the system specification.
My main problem is how to replicate this data, the models.
For instance, imagine Alice and Bob working on this UML diagram, and Alice has the whole diagram in her repository. Bob wants only the elements {FeedOrEntry, Entry}. How can I slice this UML diagram?
I searched for the term "model slicing". I found one paper that gives an approach for slicing UML class diagrams, but the problem with this algorithm is that it only works for a static graph. In our context, developers add/update/remove elements constantly, and the shared elements should stay consistent with the other replicas.
Since UML models can also be seen as graphs, I also searched for the terms "graph slicing" and "graph fragment", but found nothing useful.
So I would like to know whether such an algorithm for slicing a dynamic graph exists.
If you make slicing atomic, I see no problem with using the algorithm shown in the paper you linked.
However, I believe your p2p approach is incompatible with your consistency constraints. The alternative is a merge operation, but I have no idea how that operation would work. It would probably have to be done manually, at least in part.
It sounds like you may need a NoSQL graph database such as Neo4j or FlockDB. They can store billions of vertices and edges.
What about normalizing the graph to an adjacent tree model? Then you can use a DFS or BFS to slice the graph.
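As a rough sketch of the BFS idea, the Python below collects the requested elements plus everything they transitively depend on. The function name and the dependency edges are invented for illustration (only FeedOrEntry and Entry appear in the question), and it says nothing about keeping replicas consistent once the graph changes.

```python
from collections import deque


def slice_graph(dependencies, seeds):
    """Return the sub-graph reachable from `seeds` by following dependency edges.

    `dependencies` maps each element to the elements it depends on.
    """
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for dep in dependencies.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    # Every dependency of a kept node was visited, so the slice is closed.
    return {node: list(dependencies.get(node, ())) for node in seen}


# Hypothetical dependency edges; the real diagram may differ.
dependencies = {
    "Entry": ["FeedOrEntry"],
    "Feed": ["FeedOrEntry"],
    "FeedOrEntry": ["Link"],
    "Link": [],
    "Author": [],
}
bobs_slice = slice_graph(dependencies, ["FeedOrEntry", "Entry"])
```

With these made-up edges, Bob's slice contains Entry, FeedOrEntry and Link, but not Feed or Author, since nothing he asked for depends on them.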
I want to implement a search similar to the one on http://maps.google.com/. If I type the name of a place or something similar, I see matching places. I know it uses AJAX.
My major concern is fast retrieval of matching data from the database, since the user can type almost anything: the name of a popular shop, the name of a place, or a shop name followed by a place name.
How can I design a database structure to support such a search? I just need pointers.
So, any pointers about search algorithms?
There's a whole field called spatial databases, or GIS (geographic information systems). Some major players are
Oracle Spatial
PostGIS
ESRI
Mapinfo
As for data structures, k-d trees are the typical spatial data structure. Lecture 3 here http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-851-advanced-data-structures-spring-2010/lecture-notes/ describes k-d trees nicely, if briefly.
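For a feel of how a k-d tree works, here is a minimal 2-D sketch in Python: build the tree by splitting on alternating axes, then do a nearest-neighbour search with pruning. The names and sample points are made up, and a real spatial database will use something far more tuned than this.

```python
from collections import namedtuple

Node = namedtuple("Node", "point left right axis")


def build(points, depth=0):
    """Recursively split the points on x, then y, then x, ... at the median."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid],
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1),
                axis)


def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest(node, target, best=None):
    """Return the stored point closest to `target` (None for an empty tree)."""
    if node is None:
        return best
    if best is None or sq_dist(node.point, target) < sq_dist(best, target):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Only search the other side if the splitting line is closer than the best hit.
    if diff * diff < sq_dist(best, target):
        best = nearest(far, target, best)
    return best


# Hypothetical coordinates.
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
closest = nearest(tree, (9, 2))
```

Note that this answers "what is near this coordinate", not "which names match this text"; the text-matching part of the question needs a separate index.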
hth
I want to run some analysis on network data that has multiple modes (i.e. multiple types of network nodes) and multiplex relations (i.e. multiple types of network edges).
The analysis will probably involve SNA or algorithms from graph theory, e.g. tie strength, centrality, betweenness, node distance, blocks, clusters, etc.
The source data is rather unstructured, so I first have to think about how to represent, store, and retrieve the data.
Here are some ideas. I would appreciate any feedback or further suggestions. :)
I know that there are already some great NoSQL databases for this kind of application, for example Neo4J and InfoGrid. But for extensibility reasons (e.g. licence, web standard...) I would prefer to use RDF to store and represent my data. The tools to use would be Sesame or Jena.
The idea of representing network/graph data with RDF is trivial.
For example:
Network/Graph data
*Alice* ----lend 100USD----> *Bob* ----- likes ----> *Skiing*
represented with RDF
*Alice* --src--> *lend_relation* <---target--- *Bob* ---likes---> *Skiing*
                        |
                    has_value
                       \|/
                     *100USD*
[Alice src lend_relation]
[Bob target lend_relation]
[lend_relation has_value 100USD]
[Bob likes Skiing]
However, the problem is that RDF, like SPARQL, lacks a graph-model perspective.
It is not efficient to traverse between nodes or to find the (shortest) distance with an RDF query.
That has to be done with extra analysis tools, for example JUNG or JGraphT,
and I must first construct a subgraph by querying the RDF store and then convert it into the data model used by JUNG or JGraphT. If I want extra visualization (from neither JUNG nor JGraphT), I must construct yet another data model for the visualization toolkit.
I don't know whether that is a clear or efficient integration.
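The conversion step described above (take triples out of the store, build an in-memory graph, then run a graph algorithm) could be sketched in plain Python like this. It uses the four triples from the example, treats every triple as an undirected edge, and stands in for JUNG/JGraphT with a hand-written breadth-first search; the function names are my own.

```python
from collections import defaultdict, deque

# The example triples, as (subject, predicate, object).
triples = [
    ("Alice", "src", "lend_relation"),
    ("Bob", "target", "lend_relation"),
    ("lend_relation", "has_value", "100USD"),
    ("Bob", "likes", "Skiing"),
]


def build_graph(triples):
    """Drop the predicate and connect subject and object in both directions."""
    graph = defaultdict(set)
    for subject, _predicate, obj in triples:
        graph[subject].add(obj)
        graph[obj].add(subject)
    return graph


def distance(graph, start, goal):
    """Number of edges on a shortest path, or None if there is no path."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


graph = build_graph(triples)
```

Note how the reified lend_relation node inflates distances: Alice and Bob end up two hops apart even though the original network has a single edge between them.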
thanks again for any suggestion!
If you want to do network analysis of your RDF data with SPARQL, have a look at SPARQL 1.1 Property Paths. I believe they are already implemented in Jena/ARQ (ARQ - Property Paths).
Property Paths, from the new SPARQL spec, let you query the RDF data model with graph patterns that are somewhat more complex than the ones you could define in SPARQL 1.0.
With this feature plus some logic at the application level you might be able to implement some interesting network analysis over your data.
So far I have encountered adjacency list, nested sets and nested intervals as models for storing tree structures in a database. I know these well enough and have migrated trees from one to another.
What are other popular models? What are their characteristics? What are good resources (books, web, etc) on this topic?
I'm not only looking for db storage but would like to expand my knowledge on trees in general. For example, I understand that nested sets/intervals are especially favorable for relational database storage and have asked myself, are they actually a bad choice in other contexts?
A variation is to use a direct hierarchical representation (i.e. a parent link in each node), but also store a path value.
For example, for a directory tree consisting of the following:
C:\
    Temp
    Windows
        System32
You would have the following nodes
Key  Name      Parent  Path
1    C:                *1*
2    Temp      1       *1*2*
3    Windows   1       *1*3*
4    System32  3       *1*3*4*
Path is indexed, which lets you quickly run a query that picks up a node and all its descendants, without having to manipulate ranges.
For example, to find C:\Temp and everything below it:
WHERE Path LIKE '*1*2*%'
This representation is the only place I can think of where storing IDs in a string like this is OK.
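Here is a small runnable sketch of the path approach using Python's built-in sqlite3 module. The table and column names are my own choice and the rows are the ones from the table above; whether a given database actually uses the index for a prefix LIKE depends on the engine and its settings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT, parent INTEGER, path TEXT)"
)
conn.execute("CREATE INDEX idx_nodes_path ON nodes (path)")
conn.executemany(
    "INSERT INTO nodes (id, name, parent, path) VALUES (?, ?, ?, ?)",
    [
        (1, "C:", None, "*1*"),
        (2, "Temp", 1, "*1*2*"),
        (3, "Windows", 1, "*1*3*"),
        (4, "System32", 3, "*1*3*4*"),
    ],
)


def subtree(conn, path):
    """Names of the node with this path and of everything below it."""
    cursor = conn.execute(
        "SELECT name FROM nodes WHERE path LIKE ? ORDER BY path",
        (path + "%",),  # '*' is a literal character in LIKE; only '%' is a wildcard here
    )
    return [row[0] for row in cursor]
```

For example, `subtree(conn, "*1*3*")` returns the Windows node and System32 in a single query, with no recursion over parent links.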
The seminal resource for this is chapters 28-30 of SQL for Smarties.
(I've recommended this book so much I figure Celko owes me royalties by now!)