Is GraphQL related to graph databases?

According to Wikipedia's article on graph databases:
In computing, a graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data.[1] A key concept of the system is the graph (or edge or relationship). The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes.
If a database has a GraphQL API, is this database a Graph database?
Both terms sound very similar.

They are not related. GraphQL is an API technology, usually compared to REST. Think of it as another way to implement a web API; it has nothing to do with where the data is actually stored or with the storage technology behind the scenes. For example, it can serve as a web API over data stored in PostgreSQL.
However, because GraphQL treats data as an object graph, it can be a more natural fit, in terms of API implementation, when the backing store is a graph database. The implementation may be simpler because some of the graph-loading work can be delegated to the graph database rather than solved ourselves.
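To make the point about storage concrete, here is a minimal sketch of how a nested, GraphQL-shaped result can be assembled from plain relational tables. It deliberately uses no GraphQL library, only Python's built-in `sqlite3` module. The schema and the `resolve_user` / `resolve_posts` functions are hypothetical names chosen for illustration, not part of any GraphQL framework.

```python
import sqlite3

# Hypothetical relational schema: users and the posts they wrote.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO posts VALUES (10, 1, 'Hello'), (11, 1, 'Graphs');
""")

def resolve_posts(user_id):
    """Resolver for the nested `posts` field: one SQL query per parent user."""
    rows = conn.execute(
        "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (user_id,)
    ).fetchall()
    return [{"title": title} for (title,) in rows]

def resolve_user(user_id):
    """Resolver for a query shaped like `{ user(id: 1) { name posts { title } } }`."""
    row = conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return None
    return {"name": row[1], "posts": resolve_posts(row[0])}

result = resolve_user(1)
```

The object graph exists only in the shape of the response. With a graph database behind the API, the nested lookup in `resolve_posts` is the kind of work that could be pushed down into a single traversal.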

Related

Can I create a knowledge graph in Memgraph?

I know that knowledge graphs are represented in RDF, but I am wondering whether Memgraph as a graph database can store this kind of data?
While Memgraph is not an RDF store, it is capable of handling this kind of data with the labeled property graph model (LPG). LPG is represented by a set of nodes, relationships, properties (key-value attributes) and labels. RDF statements can be directly treated as nodes, relationships and properties of the graph, which are explored using the Cypher query language. Therefore, both RDF and LPG allow the creation of a knowledge graph.
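As a rough illustration of that mapping, the sketch below splits a handful of RDF-style triples into LPG nodes, properties and relationships using plain Python. The `triples_to_lpg` function and the `ex:` identifiers are hypothetical, and the rule that quoted objects are literals is a simplifying assumption of this sketch, not how Memgraph or any RDF import tool actually works.

```python
def triples_to_lpg(triples):
    """Split triples into LPG nodes (with properties) and relationships.

    Assumption for this sketch: an object written as a quoted string,
    e.g. '"Alice"', is a literal and becomes a node property; anything
    else is a resource and becomes a node joined by a relationship.
    """
    nodes = {}          # node id -> dict of properties
    relationships = []  # (source id, relationship type, target id)
    for subject, predicate, obj in triples:
        nodes.setdefault(subject, {})
        if obj.startswith('"') and obj.endswith('"'):
            nodes[subject][predicate] = obj.strip('"')
        else:
            nodes.setdefault(obj, {})
            relationships.append((subject, predicate, obj))
    return nodes, relationships

triples = [
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:name", '"Bob"'),
]
nodes, relationships = triples_to_lpg(triples)
```

Here the two `ex:name` statements end up as properties on their nodes, while `ex:knows` becomes a relationship between two nodes.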

Neptune Graph Database Performance Cost to List Edges of a Type

The use case for the graph database is to have users and contents (vertices) linked by likes, favorites and reports relations (edges). The problem I have is that I will sometimes need to show the reported contents (from any users). Since this is not a standard graph traversal, I fear this would have a big performance hit.
Is it possible to index the edges of type "reports" to quickly get the list of all contents that have been reported? Is there a better way to do this?
No, you cannot, and do not need to, explicitly manage indices. Neptune uses a novel indexing strategy based on semi-clustered indices and offers excellent index performance out of the box. There is no need for custom indices.
From Neptune FAQs: https://aws.amazon.com/neptune/faqs/
Do I need to create indices on my data with Amazon Neptune?
No, existing graph database users are often forced to try and outguess the vendor implementation. Explicitly maintaining indices is just one aspect of that. Amazon Neptune does not require you to create specific indices to achieve good query performance, and it minimizes the need for such second guessing of the database design.
Can you share some details on the specific queries that you are looking for?

Do graph databases have problems with aggregation operations?

I came across multiple opinions that graph databases tend to have problems with aggregation operations. For example, if you have a set of users and want to get the maximum age, an RDBMS will outperform a graph database. Is this true, and if so, what is the reason behind it? As far as I understand, the key difference between relational and graph databases is that each graph database node somehow includes references to the nodes it is connected to. How does that impact a "get max age"-like query?
Disclaimer: most of what I have read was about Neo4j, but I suppose if these limitations exist, they should apply to any graph db.
The use of graph databases like Neo4j is recommended when dealing with connected data and complex queries.
The book Learning Neo4j by Rik Van Bruggen states that you should not use graph databases when dealing with simple, aggregate-oriented queries:
(...) simple queries, where write patterns and read patterns align to
the aggregates that we are trying to store, are typically served quite
inefficiently in a graph, and would be more efficiently handled by an
aggregate-oriented Key-Value or Document store. If complexity is low,
the advantage of using a graph database system will be lower too.
The reason behind this is closely tied to the nature of the persistence model. It is easier to compute a sum, max or avg over tabular data than over data stored as a graph.
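A small sketch of the two access patterns, in plain Python. This is only an illustration of the layouts, not a benchmark and not a description of how Neo4j or any particular engine stores data. The node records and the `knows` field are made up for the example.

```python
# Tabular layout: the ages sit together in one column, so max() is one scan
# over values that all have the same type and meaning.
age_column = [34, 27, 45]
max_age_table = max(age_column)

# Graph layout (hypothetical): each node is its own record that carries its
# properties plus references to neighbouring nodes. An aggregate has to visit
# every node record, check its label, and pick out the property.
nodes = {
    "u1": {"label": "User", "age": 34, "knows": ["u2"]},
    "u2": {"label": "User", "age": 27, "knows": ["u3"]},
    "u3": {"label": "User", "age": 45, "knows": []},
    "c1": {"label": "Content", "knows": []},
}
max_age_graph = max(
    node["age"] for node in nodes.values() if node["label"] == "User"
)
```

Both give the same answer. The neighbour references that make traversals cheap simply do not help a query that has to touch every user anyway.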

Efficient traversal/search algorithm to fetch data from RDF?

I have my data stored as an RDF graph in a database and I am retrieving it using SPARQL. The number of nodes (objects) in the graph has grown huge, and traversal/search has become much slower.
a. Can anyone suggest an efficient traversal/search algorithm to fetch the data?
As a next step, I have federated data, i.e. data from external applications like SAP. In this case, the search becomes even slower.
b. What efficient search algorithm should I use in this case?
This seems like a common issue in large enterprise systems, and any input on how these problems have been solved in such systems would also be helpful.
I had a similar problem. I was doing a lot of graph traversal using SPARQL property paths and it was too slow with an RDF-based repository. I was using Jena TDB, which is supposed to be fast, but it was still too slow!
As @Mikos suggested, I tried Neo4j, and it got much faster. As Mark Watson says in this blog entry,
RDF data stores support SPARQL queries: good for matching patterns in data.
Neo4j supports arbitrary graph structures and seems best for exploring
a neighborhood of a graph: start at a node and explore the connected
nodes. (graph traversal)
I used Neo4j, but you can try any tool that is built for graph traversal. I read that AllegroGraph 4 is RDF-based and has good graph traversal speed.
I am now using Neo4j, but I didn't give up on RDF. I still use URIs as identifiers and try to reuse the popular RDF vocabularies and relations. Later I'll add a feature to render my graphs as RDF. I know that with Neo4j you can also use Tinkerpop to render RDF, but I haven't tried it myself.
Graph traversal and efficient querying is a wide-ranging problem and the approach to use is dependent on your situation. I would suggest looking at a data-store like Neo4j and complementing it with a tool like Lucene.
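For reference, the "start at a node and explore the connected nodes" pattern mentioned above is essentially a bounded breadth-first search. Here is a minimal sketch in plain Python, assuming the graph has already been loaded into an in-memory adjacency dictionary. The `neighborhood` function and the sample data are hypothetical and not tied to Neo4j or any RDF store.

```python
from collections import deque

def neighborhood(adjacency, start, max_hops):
    """Return {node: hop distance} for every node within max_hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances

# Hypothetical adjacency list: node -> directly connected nodes.
adjacency = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
within_two = neighborhood(adjacency, "a", 2)
```

A store built for traversal can follow each `adjacency` lookup as a direct pointer hop, whereas a pattern-matching engine typically has to answer each hop as another join.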

apply graph analysis on networked data represented with RDF

I want to run some analysis on networked data that has multiple modes (i.e. multiple types of network nodes) and multiplex relations (i.e. multiple types of network edges).
The analysis will probably be SNA or the application of algorithms from graph theory, e.g. tie strength, centrality, betweenness, node distance, blocks, clusters, etc.
The source data is rather unstructured, so I should first think about how to represent, store, and retrieve the data.
Following are some ideas. I would appreciate any feedback or further suggestions. :)
I know that there are already some great NoSQL databases for this kind of application, for example Neo4j and InfoGrid. But for extensibility reasons (e.g. licence, web standards...) I would prefer to use RDF to store and represent my data. The tools to use would be Sesame or Jena.
The idea of representing network/graph data with RDF is trivial.
For example:
Network/graph data:
*Alice* ----lend 100USD----> *Bob* ----- likes ----> *Skiing*
Represented with RDF:
*Alice* --src--> *lend_relation* <---target--- *Bob* ---likes---> *Skiing*
                        |
                    has_value
                       \|/
                    *100USD*
[Alice src lend_relation]
[Bob target lend_relation]
[lend_relation has_value 100USD]
[Bob likes Skiing]
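The four statements above can be written as plain tuples. The sketch below shows one way the intermediate relation node could be folded back into a direct edge for analysis. The `collapse_reified` helper is hypothetical, and it assumes the `src`, `target` and `has_value` predicates are used exactly as in the example.

```python
triples = [
    ("Alice", "src", "lend_relation"),
    ("Bob", "target", "lend_relation"),
    ("lend_relation", "has_value", "100USD"),
    ("Bob", "likes", "Skiing"),
]

def collapse_reified(triples):
    """Turn relation nodes (those with a src and a target) into direct edges.

    Returns (source, edge type, target, value) tuples; value is None for
    ordinary statements that carry no has_value.
    """
    sources, targets, values = {}, {}, {}
    for subject, predicate, obj in triples:
        if predicate == "src":
            sources[obj] = subject
        elif predicate == "target":
            targets[obj] = subject
        elif predicate == "has_value":
            values[subject] = obj
    # Ordinary statements pass through unchanged.
    edges = [
        (s, p, o, None)
        for s, p, o in triples
        if p not in ("src", "target", "has_value")
    ]
    # Each relation node with both ends known becomes one direct edge.
    for relation, source in sources.items():
        if relation in targets:
            edges.append((source, relation, targets[relation], values.get(relation)))
    return edges

edges = collapse_reified(triples)
```

The result is the original two-edge picture again, with the amount attached to the lending edge.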
However, the problem is that RDF, like SPARQL, lacks a graph-model perspective.
It is not efficient to traverse between nodes or to find the (shortest) distance between them with RDF queries.
That must be done with extra analysis tools, for example JUNG or JGraphT,
so I must first construct a subgraph by querying the RDF store and then convert it into the data model used by JUNG or JGraphT. If I want extra visualization (provided by neither JUNG nor JGraphT), then I must construct yet another data model for the visualization toolkit.
I don't know whether that is a clean or efficient integration.
Thanks again for any suggestions!
If you want to do network analysis of your RDF data with SPARQL, you can have a look at SPARQL 1.1 Property Paths. I believe this has already been implemented in Jena/ARQ (see ARQ - Property Paths).
Property Paths, from the new SPARQL spec, allow you to query the RDF data model by defining graph patterns that are somewhat more complex than the ones you could define in SPARQL 1.0.
With this feature plus some logic at the application level you might be able to implement some interesting network analysis over your data.
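As an example of the application-level part, here is a minimal sketch that computes degree centrality from triples. It assumes the triples have already been fetched from the store by a SPARQL query (not shown) and treats every triple as an undirected edge. The `degree_centrality` function and the sample data are hypothetical.

```python
from collections import defaultdict

def degree_centrality(triples):
    """Degree centrality per node: distinct neighbours divided by (n - 1)."""
    neighbors = defaultdict(set)
    for subject, _predicate, obj in triples:
        # Touch both keys so every node is registered, even on a self-loop.
        neighbors[subject]
        neighbors[obj]
        if subject != obj:
            neighbors[subject].add(obj)
            neighbors[obj].add(subject)
    node_count = len(neighbors)
    if node_count < 2:
        return {node: 0.0 for node in neighbors}
    return {
        node: len(adjacent) / (node_count - 1)
        for node, adjacent in neighbors.items()
    }

triples = [
    ("Alice", "knows", "Bob"),
    ("Bob", "knows", "Carol"),
    ("Alice", "knows", "Carol"),
    ("Carol", "knows", "Dave"),
]
centrality = degree_centrality(triples)
```

In this sample, Carol is connected to all three other people and so gets the highest score. Measures such as betweenness need full path information and are probably better left to a dedicated graph library.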

Resources