Transform between relational database and graph-based database - algorithm

I am aware that there are algorithms (and even tools) to transform relational databases (RDBMS) to Graph databases, and the other way around.
I do have several questions that are a bit larger than that:
1. Is there a common-practice working algorithm (or several) for such a transformation, for example RDBMS => graph?
2. Is this algorithm bijective? To be more precise:
2.1. Given said algorithm, is the transformation RDBMS => graph injective (one-to-one)? More plainly, can there be two different relational DBs that are transformed into the same graph DB?
2.2. Similarly, can any graph DB be represented by a relational DB? Basically, I'm asking whether the algorithm's function is surjective (onto).

TL;DR
There's typically an obvious bijection from a particular math notion of graph (node set, edge relation) to a relational representation. Essentially because the math uses sets and relations.
There's no standard graph DBMS. And no standard way to use one to represent application/business situations. So there's no standard mapping between a graph database state & a relational state, let alone one that gives a representation in the other that is natural for the situations represented.
Without relation-valued attributes, mappings are not always bijective between non-relational structures and relational structures because we must sometimes pick relational surrogate values 1:1 with the relation values we would have used.
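The bijection mentioned above, between a graph in the math sense (node set, edge relation) and a relational state, can be sketched as follows. This is only a minimal illustration: the table layout NODE(id) and EDGE(source, target) and the function names are assumptions made for the example, not a standard mapping.

```python
def graph_to_relations(nodes, edges):
    """Map a graph (node set, edge relation) to two relations (sets of rows)."""
    node_table = frozenset((n,) for n in nodes)       # hypothetical NODE(id)
    edge_table = frozenset((s, t) for s, t in edges)  # hypothetical EDGE(source, target)
    return node_table, edge_table


def relations_to_graph(node_table, edge_table):
    """Inverse mapping: rebuild the node set and edge relation from the rows."""
    nodes = frozenset(row[0] for row in node_table)
    edges = frozenset((s, t) for s, t in edge_table)
    return nodes, edges


nodes = frozenset({"a", "b", "c"})
edges = frozenset({("a", "b"), ("b", "c")})

tables = graph_to_relations(nodes, edges)
# The round trip gives back exactly the graph we started with.
assert relations_to_graph(*tables) == (nodes, edges)
```

Note that this only describes the graph structure itself. It says nothing about what the nodes and edges mean for the application, which is the point made in the rest of the answer.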
Sometimes we're not interested in a particular situation, we are just interested in a data structure. Then we can come up with (various) relational versions of it.
But a database or data structure variable typically represents an application/business situation. There is typically a one-to-many or one-to-one mapping from situations to representations. Under the relational model, every table has an associated (characteristic) predicate (statement template) and holds the rows that make a true proposition (statement) from its predicate. Other data structures are used in an ad hoc way to represent a situation.
What's special about the relational model is that you can generically query via predicate logic and/or relation operators--a query expression determines a predicate and its result holds the rows that make a true proposition from its predicate. (Calculated with certain complexity guarantees and certain opportunities for automated optimization.)
Mappings between structures that represent the same situation depend on how the databases represent situations. So there is no general mapping between representations, even for two representations using the same data structure.
On the other hand you can define some generic mapping between two structures, and it might be bijective, but when a situation is represented by one, the other tells you about the other representation of the situation, hence the situation only indirectly, not the situation itself directly. So don't expect the relational version that describes the other structure's representation to be anything like a good relational design for that application/business.
This is the problem with ORMs & object databases. You can define a mapping from a particular object-oriented state to relations but the relations are only describing the object-oriented state, not its represented situation. Every time an object value holds an oid to an object referenced rather than contained, that referencing object is representing a relationship/association entity instance. But usually there is no explicit predicate given for the relation corresponding to the set of such objects. Instead we are given a representation function from some entire representing state to a represented situation. Whereas in a relational design every superkey value of every table (base or query result) is 1:1 with some (possibly associative) entity.

Related

Can you have different types of primitive data types in a dynamic array?

I am new to data structures in computer science. I am trying to find out about all the types of implementations of lists. I started with dynamic arrays and I wanted to know if it is possible to have different types of primitive data types in a dynamic array data structure.
I thought that "dynamic" only means that you can remove, insert and add to your array without caring about its size. But do you also have to care about the types of the elements in the array?
The terms you are searching for are heterogeneous and homogeneous. Heterogeneous lists can store different kinds of elements, while homogeneous lists are limited to one type of element.
Python is a good example of heterogeneous lists. This is implemented by storing references to the different objects in the list. So from a technical point of view, they store homogeneous references, but from a user's perspective they store different types, such as integers, strings, and other objects.
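A small sketch of what this looks like in practice. The variable names are just for illustration.

```python
# One list holding values of several different types.
items = [42, "forty-two", 4.2, None, [1, 2]]
types = [type(x).__name__ for x in items]
print(types)  # ['int', 'str', 'float', 'NoneType', 'list']

# The list stores references, not copies: the slot in `box` and the
# name `inner` refer to the same object.
inner = [1, 2]
box = [inner]
inner.append(3)
print(box[0])  # [1, 2, 3]
```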
The term dynamic data structure only refers to its size/structure at runtime, as in it can change at runtime.
So for example, in C++ a built-in array is a static data structure, whereas a std::vector or std::set is what you might call dynamic.
What you are referring to by having multiple data types in one data structure is a property of dynamically typed languages.
Most built-in containers will accept elements of different types if the language is dynamically typed, such as Python. The data structure itself need not be dynamic in size for that to happen.
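To see that "can grow and shrink" and "can hold mixed types" are separate properties, here is a small sketch using Python's standard `array` module, which provides a resizable array restricted to a single element type. The variable names are only illustrative.

```python
from array import array

# A dynamic (resizable) array that only accepts C ints.
ints = array("i", [1, 2, 3])
ints.append(4)   # growing is fine
ints.pop(0)      # shrinking is fine
print(list(ints))  # [2, 3, 4]

# Adding a value of another type is rejected.
try:
    ints.append("five")
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```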

Data Structures and data representation in a given context

I started dedicating time for learning algorithms and data structures. So my first and basic question is, how do we represent the data depending on the context.
I have given it time and thought and came up with this conclusion.
Groups of same data -> List/Arrays
Classification of data [Like population on gender, then age etc.] -> Trees
Relations [Like relations between a product bought and others] -> Graphs
I am posting this question to find out what the Stack Overflow community thinks about my interpretation of data structures. Since it is a generic topic, I could not find a justification for my thinking online. Please correct me if I am wrong.
This looks like oversimplifying things.
The data structure we want to use depends on what we are going to do with the data.
For example, when we store records about people and need fast access by index, we can use an array.
When we store the same records about people but need to find by name fast, we can use a search tree.
Graphs are a theoretical concept, not a data structure.
They can be stored as an adjacency matrix (two-dimensional array, suitable for small or dense graphs), or as lists of adjacent edges (array/list of dynamic arrays/lists, suitable for large or sparse graphs), or implicitly (generated on the fly), or otherwise.
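The three storage options above can be sketched like this for a small undirected graph. The example graph and the grid helper are made up for illustration.

```python
edges = [(0, 1), (0, 2), (2, 3)]
n = 4  # number of vertices

# 1. Adjacency matrix: n x n booleans, O(1) edge lookup, O(n^2) memory.
matrix = [[False] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = matrix[v][u] = True

# 2. Adjacency lists: one list of neighbours per vertex, memory
#    proportional to the number of edges.
adj = [[] for _ in range(n)]
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

print(matrix[0][2])  # True
print(adj[2])        # [0, 3]


# 3. Implicit graph: neighbours are computed on demand, nothing is stored.
def grid_neighbors(x, y, width, height):
    """Yield the 4-connected neighbours of cell (x, y) inside the grid."""
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield nx, ny


print(list(grid_neighbors(0, 0, 2, 2)))  # [(1, 0), (0, 1)]
```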

Abstract and Primitive Data Types (ADT)

I know this question has been asked a million times, but can someone please explain to me what ADT exactly means (in layman's terms if possible)?
I read this definition of ADT- ADT only mentions what operations are to be performed but not how these operations will be implemented.
So is the case with primitive data types.
Suppose we have a float data type. We know that multiplication, division, etc. can be performed (so we know what operations will be performed) but we don't know how they will be performed (in the case of multiplication we can just multiply or repeatedly add, so we have two processes giving the same result and therefore it's abstract). So both data types are essentially the same. (I know it's incorrect).
I know I'm getting it all wrong. Can someone please help me clear this concept?
Data types are classification of data in any programming language - for example integers, characters, floats etc.
Abstract data type is a theoretical concept. An abstract data type (ADT) is a mathematical model for data types where a data type is defined by its behaviour (semantics) from the point of view of a user of the data, specifically in terms of possible values, possible operations on data of this type, and the behaviour of these operations. In other words, it is a set of data values and associated operations that are precisely specified independent of any particular implementation.
For example, a stack is an abstract data type. A stack ADT can have the operations push, pop and peek. These three operations define what the type is, irrespective of the implementation language.
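As a rough sketch of the idea, the stack ADT can be written as an interface with push, pop and peek, and then implemented in more than one way. The class names `Stack`, `ListStack` and `LinkedStack` are invented for this example.

```python
from abc import ABC, abstractmethod


class Stack(ABC):
    """The ADT: only the operations, no storage decisions."""

    @abstractmethod
    def push(self, item): ...

    @abstractmethod
    def pop(self): ...

    @abstractmethod
    def peek(self): ...


class ListStack(Stack):
    """One implementation, backed by a Python list (dynamic array)."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def peek(self):
        return self._items[-1]


class LinkedStack(Stack):
    """Another implementation, backed by a chain of (item, next) pairs."""

    def __init__(self):
        self._head = None

    def push(self, item):
        self._head = (item, self._head)

    def pop(self):
        if self._head is None:
            raise IndexError("pop from empty stack")
        item, self._head = self._head
        return item

    def peek(self):
        if self._head is None:
            raise IndexError("peek at empty stack")
        return self._head[0]
```

Code that only uses push, pop and peek behaves the same with either class, which is what "independent of the implementation" means here.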
So we can say that primitive data types are a form of abstract data type. It is just that they are provided by the language makers and are very specific to the language. So basically there are two kinds of data types: primitive and user-defined. But both of them are abstract data types. I hope this makes it clear.
Abstract Data Types - These are the building blocks for manipulating data that would otherwise just be a string of 1s and 0s. Would you rather have liquid metal, or nuts and bolts to build with?
It is long since this question was asked but I have an answer that might be a little bit clearer.
Data Type - Deals with a set of values, their representation and a set of operations that can be applied to them.
Abstract Data Type (ADT) - Deals with a set of values and a set of operations that can be performed on them.
The difference between the two is that ADT is not concerned with the representation of these values.

is Tree, a data structure or abstract data type?

I hear many people referring to a tree as a data structure. But trees are mostly implemented using linked lists or arrays. So does that make it an abstract data type?
Given a type of structure, how can we determine whether it is a data structure or abstract data type?
If you are talking about a general Tree without specifying its implementation or any underlying data structure, then it is an Abstract Data Type (ADT). An ADT is any data type that doesn't specify its implementation.
But once you start talking about a concrete Tree with specific implementation using Linked List or Arrays, then that kind of concrete tree is a data structure.
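To make the distinction concrete, here is a sketch of the same small binary tree stored in two ways: as linked nodes and as a flat array where the children of index i sit at 2i+1 and 2i+2. The names `Node`, `preorder_linked` and `preorder_array` are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


# Data structure 1: linked nodes.
linked = Node(1, Node(2, Node(4)), Node(3))

# Data structure 2: array layout of the same tree.
array_tree = [1, 2, 3, 4]


def preorder_linked(node):
    """Pre-order traversal following left/right references."""
    if node is None:
        return []
    return [node.value] + preorder_linked(node.left) + preorder_linked(node.right)


def preorder_array(tree, i=0):
    """Pre-order traversal using index arithmetic instead of references."""
    if i >= len(tree) or tree[i] is None:
        return []
    return [tree[i]] + preorder_array(tree, 2 * i + 1) + preorder_array(tree, 2 * i + 2)


print(preorder_linked(linked))     # [1, 2, 4, 3]
print(preorder_array(array_tree))  # [1, 2, 4, 3]
```

Both traversals give the same result, so from the outside it is the same tree. Only the concrete data structure differs.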
With the above out of the way, the following may help you clear other confusions related to your question. Correct me if I'm wrong!
Data Type
The definition of data type from Wikipedia:
A data type or simply type is a classification identifying one of various types of data.
Data type is only a classification of data. It doesn't have any specifications about how those data are implemented. IMHO, data type is only a theoretical concept.
For example, any real number can be of the data type real. But along with integers, they can both be classified as a numeric data type, say number.
As I just pointed out, an ADT is one kind of data type. But can string and int be considered ADTs?
The answer is both yes and no.
Yes, because programming languages can have many ways to implement string and int, but on one condition: throughout all programming languages, these data types must share consistent properties.
No, because these primitive data types are not as abstract as stacks or queues. Since these data types seldom share consistent properties across programming languages, their users must know about underlying problems such as arithmetic overflow. Two languages may both have the int data type, but one is unbounded and the other is limited to 32 bits. Having to know this kind of technical detail is not what ADTs promise. Let's look at stacks instead. In every programming language, a stack can promise you consistent operations like pop and push. You don't need to know any implementation details; you just use them the same way in every language.
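The difference in int ranges can be seen from within Python alone. This is only a sketch: Python's own int is unbounded, and `ctypes.c_int32` is used here as a stand-in for a fixed-width 32-bit int such as the one found in C (ctypes does no overflow checking, so the value wraps around).

```python
import ctypes

big = 2**31 - 1  # largest value of a signed 32-bit int

# Python's int just keeps growing.
print(big + 1)  # 2147483648

# A fixed-width 32-bit int wraps around to the most negative value.
wrapped = ctypes.c_int32(big + 1).value
print(wrapped)  # -2147483648
```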
Data Structure
Let's see the definition of data structure from Wiki:
A data structure is a particular way of organizing data in a computer so that it can be used efficiently.
As you can see, data structure is all about implementations. It is not conceptual but concrete. In my opinion, every piece of data in a program can by definition be considered as a data structure. A string can. An int can. And a whole bunch of other things like LinkedList_Stack or Array_Stack are all data structures.
Some of you might ask why int is a data structure. It is a data structure at a lower level, from the view of a programming language's author, because a programming language can store an int in a computer in many ways. The most common solution is two's complement; other alternatives are offset binary and ones' complement. However, from a user's view, we see int as a primitive data type which a programming language offers out of the box, and we don't care about its implementation. It's just a building block of the programming language. So for us users, any data constructed from these building blocks (primitive data types) is more like a data structure. For authors of programming languages, the building blocks are lower-level machine code, so for them int is definitely a data structure.
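The three int encodings mentioned above can be compared with a short sketch. The helper names and the 8-bit width are choices made for this example only.

```python
def twos_complement(n, bits=8):
    """Bit pattern of n in two's complement."""
    return format(n & ((1 << bits) - 1), f"0{bits}b")


def ones_complement(n, bits=8):
    """Bit pattern of n in ones' complement: negatives invert every bit."""
    mask = (1 << bits) - 1
    return format(n if n >= 0 else ~(-n) & mask, f"0{bits}b")


def offset_binary(n, bits=8):
    """Bit pattern of n in offset binary (excess 2**(bits-1))."""
    return format(n + (1 << (bits - 1)), f"0{bits}b")


print(twos_complement(-5))  # 11111011
print(ones_complement(-5))  # 11111010
print(offset_binary(-5))    # 01111011
```

The same abstract value -5 ends up as three different bit patterns, which is the sense in which the storage of an int is an implementation decision.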
Put simply, whether one thing is a data structure or not really depends on how we look at it.
Via google:
In computer science, an abstract data type (ADT) is a mathematical model for a certain class of data structures that have similar behavior
So clearly, it is both.

normalize boolean expression for caching reasons. is there a more efficient way than truth tables?

My current project is an advanced tag database with boolean retrieval features. Records are being queried with boolean expressions like such (e.g. in a music database):
funky-music and not (live or cover)
which should yield all funky music in the music database but not live or cover versions of the songs.
When it comes to caching, the problem is that there exist queries which are equivalent but different in structure. For example, applying de Morgan's rule the above query could be written like this:
funky-music and not live and not cover
which would yield exactly the same records but would of course break caching if caching were implemented by hashing the query string, for example.
Therefore, my first intention was to create a truth table of the query which could then be used as a caching key as equivalent expressions form the same truth table. Unfortunately, this is not practicable as the truth table grows exponentially with the number of inputs (tags) and I do not want to limit the number of tags used in one query.
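For illustration, the truth-table idea looks roughly like this. The queries are written as plain Python lambdas and the tag `funky-music` is renamed `funky_music` so that it is a valid identifier; both are assumptions made for this sketch, not how the real tag database represents queries.

```python
from itertools import product


def truth_table(expr, tags):
    """Evaluate expr for every True/False assignment of the tags."""
    return tuple(
        bool(expr(**dict(zip(tags, values))))
        for values in product([False, True], repeat=len(tags))
    )


tags = ["funky_music", "live", "cover"]
q1 = lambda funky_music, live, cover: funky_music and not (live or cover)
q2 = lambda funky_music, live, cover: funky_music and not live and not cover

# Equivalent queries give the same table, so the table could act as a cache key.
same = truth_table(q1, tags) == truth_table(q2, tags)
print(same)  # True

# The table has 2**len(tags) rows, which is the exponential growth problem.
print(len(truth_table(q1, tags)))  # 8
```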
Another approach could be traversing the syntax tree applying rules defined by the boolean algebra to form a (minimal) normalized representation which seems to be tricky too.
Thus the overall question is: Is there a practicable way to implement recognition of equivalent queries without the need of circuit minimization or truth tables (edit: or any other algorithm which is NP-hard)?
The ne plus ultra would be recognizing already cached subqueries but that is no primary target.
A general and efficient algorithm to determine whether a query is equivalent to "False" could be used to solve NP-complete problems efficiently, so you are unlikely to find one.
You could try transforming your queries into a canonical form. Because of the above, there will always be queries that are very expensive to transform into any given form, but you might find that, in practice, some form works pretty well most of the time - and you can always give up halfway through a transformation if it is becoming too hard.
You could look at http://en.wikipedia.org/wiki/Conjunctive_normal_form, http://en.wikipedia.org/wiki/Disjunctive_normal_form, http://en.wikipedia.org/wiki/Binary_decision_diagram.
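As a cheap starting point, here is a sketch of a purely syntactic normalization: push negations down with De Morgan's rules, flatten nested and/or, drop duplicate operands and sort them. It is not one of the normal forms linked above and it will miss many equivalences (for example anything needing distributivity or absorption), but it does catch the example from the question. The tuple encoding of queries and the function names are assumptions for this sketch.

```python
def nnf(e, negate=False):
    """Push negations down to the tags using De Morgan's rules."""
    if isinstance(e, str):
        return ("not", e) if negate else e
    op, *args = e
    if op == "not":
        return nnf(args[0], not negate)
    if negate:
        op = "or" if op == "and" else "and"
    return (op, *[nnf(a, negate) for a in args])


def canonical(e):
    """Flatten nested and/or, remove duplicates, sort the operands."""
    if isinstance(e, str) or e[0] == "not":
        return e
    op, *args = e
    flat = set()
    for a in map(canonical, args):
        if isinstance(a, tuple) and a[0] == op:
            flat.update(a[1:])  # merge a nested node with the same operator
        else:
            flat.add(a)
    if len(flat) == 1:
        return next(iter(flat))
    return (op, *sorted(flat, key=repr))


def cache_key(e):
    return repr(canonical(nnf(e)))


q1 = ("and", "funky-music", ("not", ("or", "live", "cover")))
q2 = ("and", ("and", "funky-music", ("not", "live")), ("not", "cover"))
print(cache_key(q1) == cache_key(q2))  # True
```

Two queries with the same key are guaranteed to be equivalent, but two equivalent queries may still get different keys, so this only improves the cache hit rate rather than solving the general problem.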
You can convert the queries into conjunctive normal form (CNF). It is a standard, simple representation of boolean formulae that is normally the basis for SAT solvers. Note that a formula can have several equivalent CNFs, so it is not strictly canonical.
Most likely "large" queries are going to have lots of conjunctions (rather than lots of disjunctions) so CNF should work well.
The Quine-McCluskey algorithm should achieve what you are looking for. It is similar to Karnaugh maps, but easier to implement in software.

Resources