I know that knowledge graphs are represented in RDF, but I am wondering whether Memgraph as a graph database can store this kind of data?
While Memgraph is not an RDF store, it can handle this kind of data using the labeled property graph (LPG) model. An LPG consists of nodes, relationships, properties (key-value attributes), and labels. RDF statements can be mapped directly onto the nodes, relationships, and properties of the graph, which are then explored with the Cypher query language. Therefore, both RDF and LPG allow the creation of a knowledge graph.
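As a rough sketch (this is not Memgraph's actual import tooling; the `Resource` label and `uri` property are made up for illustration), RDF-style triples can be mapped onto LPG nodes, relationships, and properties by emitting Cypher that any Cypher-speaking database such as Memgraph could run:

```python
# Toy sketch: map RDF-style (subject, predicate, object) triples onto the
# labeled property graph model by emitting Cypher statements.
# Resource objects become relationships; literal objects become properties.
triples = [
    ("Alice", "knows", "Bob"),   # resource object -> relationship
    ("Alice", "age", 30),        # literal object  -> node property
]

for s, p, o in triples:
    if isinstance(o, str):
        print(f"MERGE (a:Resource {{uri: '{s}'}}) "
              f"MERGE (b:Resource {{uri: '{o}'}}) "
              f"MERGE (a)-[:{p}]->(b)")
    else:
        print(f"MERGE (a:Resource {{uri: '{s}'}}) SET a.{p} = {o}")
```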
Related
According to Wikipedia: Graph database
In computing, a graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data.[1] A key concept of the system is the graph (or edge or relationship). The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes.
If a database has a GraphQL API, is this database a Graph database?
Both terms sound very similar.
They are not related. GraphQL is just an API technology, often compared to REST. Think of it as another way to implement a Web API; it has nothing to do with where the data is actually stored or with the storage technology behind the scenes. For example, it can be used as a Web API in front of PostgreSQL too.
But since GraphQL treats the data as an object graph, in terms of API implementation it may be a better match when working with a graph database. It may be easier to implement, as we can delegate some of the graph-loading problems to the graph database rather than solving them ourselves.
I came across multiple opinions that graph databases tend to have problems with aggregation operations. For example, if you have a set of users and want to get the maximum age, an RDBMS will outperform a graph database. Is this true, and if it is, what is the reason behind it? As far as I understand, the key difference between a relational and a graph database is that each graph database node somehow includes references to the nodes it is connected to. How does that impact a "get max age"-like query?
Disclaimer: most of what I have read was about Neo4j, but I suppose if these limitations exist, they should apply to any graph db.
The use of graph databases like Neo4j is recommended when dealing with connected data and complex queries.
The book Learning Neo4j by Rik Van Bruggen states that you should not use graph databases for simple, aggregate-oriented queries:
From the book:
(...) simple queries, where write patterns and read patterns align to
the aggregates that we are trying to store, are typically served quite
inefficiently in a graph, and would be more efficiently handled by an
aggregate-oriented Key-Value or Document store. If complexity is low,
the advantage of using a graph database system will be lower too.
The reason behind this is closely related to the nature of the persistence model. It is easier to compute a sum, max, or avg over tabular data than over data stored as a graph.
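As a toy illustration (the table and data are made up; the Cypher in the comment is only the equivalent query), the relational engine answers the aggregate with a plain scan of one column, while a graph store still has to visit every node record even though no traversal is involved:

```python
# Toy comparison: an aggregate like max(age) is a plain column scan in SQL.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, age INTEGER)")
db.executemany("INSERT INTO users VALUES (?, ?)",
               [("Alice", 34), ("Bob", 29), ("Carol", 41)])

print(db.execute("SELECT MAX(age) FROM users").fetchone()[0])  # 41

# The equivalent Cypher, e.g. in Neo4j or Memgraph, would be:
#   MATCH (u:User) RETURN max(u.age)
# It gives the same answer, but the store still touches every node record
# (with its relationship pointers) even though no traversal is required.
```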
I am aware that there are algorithms (and even tools) to transform relational databases (RDBMS) to Graph databases, and the other way around.
I do have several questions that are a bit larger than that:
Is there a common-practice, working algorithm out there for such a transformation, for example RDBMS => graph (or several such algorithms)?
Is this algorithm bijective? To be more precise:
2.1. Given said algorithm, is the transformation RDBMS => graph injective (one-to-one)? More plainly, can two different relational DBs be transformed into the same Graph DB (if so, the mapping is not injective)?
2.2. Similarly, can any Graph DB be represented by a relational DB? Basically, I'm asking whether the algorithm's function is surjective (onto).
TL;DR
There's typically an obvious bijection from a particular math notion of graph (node set, edge relation) to a relational representation. Essentially because the math uses sets and relations.
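For example, here is a minimal sketch (with made-up data) of that correspondence: the node set and the edge relation become two ordinary tables.

```python
# A graph as two relations: a node set and an edge relation.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE node (id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE edge (src TEXT, dst TEXT, "
           "FOREIGN KEY (src) REFERENCES node(id), "
           "FOREIGN KEY (dst) REFERENCES node(id))")

db.executemany("INSERT INTO node VALUES (?)", [("a",), ("b",), ("c",)])
db.executemany("INSERT INTO edge VALUES (?, ?)", [("a", "b"), ("b", "c")])

# Neighbours of 'a' -- the edge relation is queried like any other table.
print(db.execute("SELECT dst FROM edge WHERE src = ?", ("a",)).fetchall())
```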
There's no standard graph DBMS. And no standard way to use one to represent application/business situations. So there's no standard mapping between a graph database state & a relational state, let alone one that gives a representation in the other that is natural for the situations represented.
Without relation-valued attributes, mappings are not always bijective between non-relational structures and relational structures because we must sometimes pick relational surrogate values 1:1 with the relation values we would have used.
Sometimes we're not interested in a particular situation, we are just interested in a data structure. Then we can come up with (various) relational versions of it.
But a database or data structure variable typically represents an application/business situation. There is typically a one-to-many or one-to-one mapping from situations to representations. Under the relational model, every table has an associated (characteristic) predicate (statement template) and holds the rows that make a true proposition (statement) from its predicate. Other data structures are used in an ad hoc way to represent a situation.
What's special about the relational model is that you can generically query via predicate logic and/or relation operators--a query expression determines a predicate and its result holds the rows that make a true proposition from its predicate. (Calculated with certain complexity guarantees and certain opportunities for automated optimization.)
Mappings between structures that represent the same situation depend on how the databases represent situations. So there is no general mapping between representations, even for two representations using the same data structure.
On the other hand, you can define some generic mapping between two structures, and it might be bijective; but when a situation is represented by one, the other only tells you about that representation, and hence about the situation indirectly, not about the situation itself. So don't expect the relational version that describes the other structure's representation to be anything like a good relational design for that application/business.
This is the problem with ORMs & object databases. You can define a mapping from a particular object-oriented state to relations but the relations are only describing the object-oriented state, not its represented situation. Every time an object value holds an oid to an object referenced rather than contained, that referencing object is representing a relationship/association entity instance. But usually there is no explicit predicate given for the relation corresponding to the set of such objects. Instead we are given a representation function from some entire representing state to a represented situation. Whereas in a relational design every superkey value of every table (base or query result) is 1:1 with some (possibly associative) entity.
I found in one book that a DAG (directed acyclic graph) with topological sorting is good for representing a genealogy (family) tree, but this algorithm depends on the order of the input data.
Genealogy databases typically use what's called a lineage-linked structure.
This means that partners (husbands/wives) are linked together and called a family, and a family is linked to its children, with a link from each child back to its parent family.
I do not know of a specific graph type that represents this. Most programs implement it themselves with a family table and an individual table, with the appropriate links between them.
Genealogy databases generally follow this structure to match the GEDCOM (Genealogy Data Communications) standard that was developed to allow transfer of data between programs.
In that standard, you'll specifically see FAM and INDI records. FAM records are connected to INDI records with HUSB, WIFE and CHIL links. INDI records are connected to FAM records with FAMS (spouse) and FAMC (parent) links.
Using this data structure will allow you to easily read a GEDCOM file and import data from other genealogy software, and also to export your data to a GEDCOM file so that other genealogy programs can read it.
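Here is a minimal sketch of that lineage-linked shape in code (the field names loosely follow the GEDCOM record types mentioned above; this is an illustration, not a GEDCOM parser):

```python
# Sketch of a lineage-linked structure: individuals and families that
# point at each other, loosely following GEDCOM's INDI/FAM record types.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Individual:
    name: str
    famc: Optional["Family"] = None                       # family this person is a child in
    fams: List["Family"] = field(default_factory=list)    # families this person is a spouse in

@dataclass
class Family:
    husb: Optional[Individual] = None
    wife: Optional[Individual] = None
    chil: List[Individual] = field(default_factory=list)

# Build one family and wire up the links in both directions.
john, mary, sue = Individual("John"), Individual("Mary"), Individual("Sue")
fam = Family(husb=john, wife=mary, chil=[sue])
john.fams.append(fam)
mary.fams.append(fam)
sue.famc = fam

print(sue.famc.husb.name)  # John, reached through the FAMC link
```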
In genealogy, the so-called Ahnentafel indexing (German for "ancestor table") is used to represent the ancestors of a single person; basically this is a suitable linearization of a binary tree.
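A small sketch of that linearization, assuming the usual Ahnentafel convention that the subject is number 1, person n's father is 2n, and the mother is 2n + 1:

```python
# Ahnentafel numbering: the root person is 1; for person n,
# the father is 2n and the mother is 2n + 1.
def father(n: int) -> int:
    return 2 * n

def mother(n: int) -> int:
    return 2 * n + 1

def generation(n: int) -> int:
    return n.bit_length() - 1  # 1 -> 0, 2..3 -> 1, 4..7 -> 2, ...

print(father(1), mother(1))           # 2 3  (the subject's parents)
print(generation(mother(father(1))))  # 2    (a grandparent)
```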
To present the relations between people found in historic records, Open Archives uses a flexible force-directed graph layout implementation. In this graph every node is a person, and there are two types of edges: one depicting a marriage (orange) and one depicting a parent relation (the red 'blood' line). An example of such a graph can be seen here.
DAGs will not work. You might look at a prior post using the GEDCOM model in Neo4j.
The lineages can have complex relationships such as double cousins, step-sibling marriages, consanguinity, etc. These are easily managed in a non-SQL database such as Neo4j.
I want to run some analysis on networked data having multiple modes (i.e. multiple types of network nodes) and multiplex relations (i.e. multiple types of network edges).
The analysis is probably about SNA (social network analysis) or applying algorithms from graph theory, e.g. tie strength, centrality, betweenness, node distance, blocks, clusters, etc.
The source data is rather unstructured, so I should first think about how to represent, store, and retrieve the data.
Following are some ideas. I would appreciate any feedback or further suggestions. :)
I know that there are already some great NoSQL databases for this kind of application, for example Neo4j and InfoGrid. But for some extensibility reasons (e.g. licence, web standards...) I would prefer to use RDF to store and represent my data. The tools to use would be Sesame or Jena.
The idea of representing network/graph data with RDF is straightforward.
For example:
Network/Graph data
*Alice* ----lend 100USD----> *Bob* ----- likes ----> *Skiing*
represented with RDF
*Alice* --src--> *lend_relation* <---target--- *Bob* ---likes---> *Skiing*
                        |
                    has_value
                        v
                    *100USD*
[Alice src lend_relation]
[Bob target lend_relation]
[lend_relation has_value 100USD]
[Bob likes Skiing]
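For what it's worth, the same reified representation written with rdflib (a Python RDF library; the ex: namespace is made up for illustration) would look roughly like this:

```python
# A minimal sketch of the reified "lend" relation using rdflib.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/")
g = Graph()

# Reify the lend relation as its own resource, as in the diagram above.
g.add((EX.Alice, EX.src, EX.lend_relation))
g.add((EX.Bob, EX.target, EX.lend_relation))
g.add((EX.lend_relation, EX.has_value, Literal("100USD")))
g.add((EX.Bob, EX.likes, EX.Skiing))

for triple in g:
    print(triple)
```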
However, the problem is that RDF, as well as SPARQL, lacks a graph-model perspective.
It is not efficient to traverse between nodes or find the (shortest) distance between them with an RDF query.
That must be done with extra analysis tools, for example JUNG or JGraphT,
and I must first construct a subgraph by querying the RDF storage and then convert it into the data model used by JUNG or JGraphT. If I want extra visualization (from neither JUNG nor JGraphT), then I must construct yet another data model for the visualization toolkit.
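A hypothetical Python equivalent of this query-then-convert pipeline (rdflib and networkx standing in for Jena and JUNG/JGraphT, with made-up data) might look like this:

```python
# Sketch of the "query the RDF store, then convert to a graph toolkit" step.
from rdflib import Graph, Namespace
import networkx as nx

EX = Namespace("http://example.org/")
rdf = Graph()
rdf.add((EX.Alice, EX.knows, EX.Bob))
rdf.add((EX.Bob, EX.knows, EX.Carol))

# Select the edges of interest from the RDF store...
nxg = nx.Graph()
for s, p, o in rdf.triples((None, EX.knows, None)):
    nxg.add_edge(s, o)

# ...and run the actual network analysis in the graph toolkit.
print(nx.shortest_path(nxg, EX.Alice, EX.Carol))
```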
I don't know whether such an integration is clean or efficient.
Thanks again for any suggestions!
If you want to do network analysis of your RDF data with SPARQL, have a look at SPARQL 1.1 Property Paths. I believe this has already been implemented in Jena/ARQ (see ARQ - Property Paths).
Property paths, from the new SPARQL spec, allow you to query the RDF data model by defining graph patterns that are somewhat more complex than the ones you could define in SPARQL 1.0.
With this feature plus some logic at the application level you might be able to implement some interesting network analysis over your data.
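For example, a property path query along the lines of the following sketch finds everyone transitively reachable through a knows relation (run here with rdflib; the ex:knows data is made up, and the same pattern should also work in Jena/ARQ):

```python
# A small sketch of a SPARQL 1.1 property path query.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Alice, EX.knows, EX.Bob))
g.add((EX.Bob, EX.knows, EX.Carol))

# ex:knows+ matches chains of one or more ex:knows edges,
# i.e. everyone reachable from Alice through the "knows" network.
q = """
PREFIX ex: <http://example.org/>
SELECT ?person WHERE { ex:Alice ex:knows+ ?person }
"""
for row in g.query(q):
    print(row.person)
```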