Efficient Mutable Graph Representation in Prolog? - performance

I would like to represent a mutable graph in Prolog in an efficient manner. I will searching for subsets in the graph and replacing them with other subsets.
I've managed to get something working using the database as my 'graph storage'. For instance, I have:
:- dynamic step/2.
% step(Type, Name).
:- dynamic sequence/2.
% sequence(Step, NextStep).
I then use a few rules to retract subsets I've matched and replace them with new steps using assert. I'm really liking this method... it's easy to read and deal with, and I let Prolog do a lot of the heavy pattern-matching work.
The other way I know to represent graphs is using lists of nodes and adjacency connections. I've seen plenty of websites using this method, but I'm a bit hesitant because it's more overhead.
Execution time is important to me, as is ease-of-development for myself.
What are the pros/cons for either approach?

As usual: Using the dynamic database gives you indexing, which may speed things up (on look-up) and slow you down (on asserting). In general, the dynamic database is not so good when you assert more often than you look up. The main drawback though is that it also significantly complicates testing and debugging, because you cannot test your predicates in isolation, and need to keep the current implicit state of the database in mind. Lists of nodes and adjacancy connections are a good representation in many cases. A different representation I like a lot, especially if you need to store further attributes for nodes and edges, is to use one variable for each node, and use variable attribtues (get_attr/3 and put_attr/3 in SWI-Prolog) to store edges on them, for example [edge_to(E1,N_1),edge_to(E2,N_2),...] where N_i are the variables representing other nodes (with their own attributes), and E_j are also variables onto which you can attach further attributes to store additional information (weight, capacity etc.) about each edge if needed.

Have you considered using SWI-Prolog's RDF database ? http://www.swi-prolog.org/pldoc/package/semweb.html

as mat said, dynamic predicates have an extra cost.
in case however you can construct the graph and then you dont need to change it, you can compile the predicate and it will be as fast as a normal predicate.
usually in sw-prolog the predicate lookup is done using hash tables on the first argument. (they are resized in case of dynamic predicates)
another solution is association lists where the cost of lookup etc is o(log(n))
after you understand how they work you could easily write an interface if needed.
in the end, you can always use a SQL database and use the ODBC interface to submit queries (although it sounds like an overkill for the application you mentioned)

Related

Change in the bayesian network structure

I'm learning a bayesian network and was wondering if it is possible to merge multiple children into a single child? For example in the figure below, could it be possible to have a single conditional probability table (Node DEF) from the three conditional probability tables (D, E, F).
If it's not possible, is there any work to make independent events to dependent event?
Thank you all
This question is not very well asked, so it is difficult to answer.
First of all the nodes of a BN are not events but random variables.
Secondly, merging nodes requires a justification of this merge. The first idea would be to generate a multidimensional variable with 8 values, which represents exactly the 8 possible values of DEF.
It is also possible to set up merging solutions like aggregators: an AND, an OR, a noisyOR, etc...
Finally, there is a large literature on sensor data fusion that certainly deserves to be looked at closely.

Intelligent purely functional sets

Set computations composed of unions, intersections and differences can often be expressed in many different ways. Are there any theories or concrete implementations that try to minimize the amount of computation required to reach a given answer?
For example, I first came across a practical application of this when trying to decompose atoms in a simulation of an amorphous material into neighbor shells where the first shell are the immediate neighbors of some given origin atom and the second shell are those atoms that are neighbors of the first shell not in either the first shell or the one before it:
nth 0 = singleton i
nth 1 = neighbors i
nth n = reduce union (map neighbors (nth(n-1))) - nth(n-1) - nth(n-2)
There are many different ways to solve this. You can incrementally test of membership in each set whilst composing the result or you can compute the union of three neighbor shells and use intersection to remove the previous two shells leaving the outermost one. In practice, solutions that require the construction of large intermediate sets are slower.
Presumably an intelligent set implementation could compose the expression that was to be evaluated and then optimize it (e.g. to reduce the size of intermediate sets) before evaluating it in order to improve performance. Do such set implementations exist?
Your question immediately reminded me of Haskell's stream fusion, described in this paper. The general principle can be summarized quite easily: Instead of storing a list, you store a way to build a list. Then the list transformation functions operate directly on the list generator, meaning that all the operations fuse into a single generation of the data without any intermediate structures. Then when you are done composing operations you run the generator and produce the data.
So I think the answer to your question is that if you wanted some similarly intelligent mechanism that fused computations and eliminated intermediate data structures, you'd need to find a way to transform a set into a "co-structure" (that's what the paper calls it) that generates a set and operate directly on that, then actually generate the set when you are done.
I think there's a very deep theory behind this concept that the paper hints at but never spells out, and if somebody else here knows what it is, please let me know, because this is very relevant to something else I am doing, too!

Iterable O(1) insert and random delete collection

I am looking to implement my own collection class. The characteristics I want are:
Iterable - order is not important
Insertion - either at end or at iterator location, it does not matter
Random Deletion - this is the tricky one. I want to be able to have a reference to a piece of data which is guaranteed to be within the list, and remove it from the list in O(1) time.
I plan on the container only holding custom classes, so I was thinking a doubly linked list that required the components to implement a simple interface (or abstract class).
Here is where I am getting stuck. I am wondering whether it would be better practice to simply have the items in the list hold a reference to their node, or to build the node right into them. I feel like both would be fairly simple, but I am worried about coupling these nodes into a bunch of classes.
I am wondering if anyone has an idea as to how to minimize the coupling, or possibly know of another data structure that has the characteristics I want.
It'd be hard to beat a hash map.
Take a look at tries.
Apparently they can beat hashtables:
Unlike most other algorithms, tries have the peculiar feature that the time to insert, or to delete or to find is almost identical because the code paths followed for each are almost identical. As a result, for situations where code is inserting, deleting and finding in equal measure tries can handily beat binary search trees or even hash tables, as well as being better for the CPU's instruction and branch caches.
It may or may not fit your usage, but if it does, it's likely one of the best options possible.
In C++, this sounds like the perfect fit for std::unordered_set (that's std::tr1::unordered_set or boost::unordered_set to you if you have an older compiler). It's implemented as a hash set, which has the characteristics you describe.
Here's the interface documentation. Note that the hash containers actually offer two sets of iterators, the usual ones and local ones which only go through one bucket.
Many other languages have "hash sets" as well, certainly Java and C#.

normalize boolean expression for caching reasons. is there a more efficient way than truth tables?

My current project is an advanced tag database with boolean retrieval features. Records are being queried with boolean expressions like such (e.g. in a music database):
funky-music and not (live or cover)
which should yield all funky music in the music database but not live or cover versions of the songs.
When it comes to caching, the problem is that there exist queries which are equivalent but different in structure. For example, applying de Morgan's rule the above query could be written like this:
funky-music and not live and not cover
which would yield exactly the same records but of cause break caching when caching would be implemented by hashing the query string, for example.
Therefore, my first intention was to create a truth table of the query which could then be used as a caching key as equivalent expressions form the same truth table. Unfortunately, this is not practicable as the truth table grows exponentially with the number of inputs (tags) and I do not want to limit the number of tags used in one query.
Another approach could be traversing the syntax tree applying rules defined by the boolean algebra to form a (minimal) normalized representation which seems to be tricky too.
Thus the overall question is: Is there a practicable way to implement recognition of equivalent queries without the need of circuit minimization or truth tables (edit: or any other algorithm which is NP-hard)?
The ne plus ultra would be recognizing already cached subqueries but that is no primary target.
A general and efficient algorithm to determine whether a query is equivalent to "False" could be used to solve NP-complete problems efficiently, so you are unlikely to find one.
You could try transforming your queries into a canonical form. Because of the above, there will be always be queries that are very expensive to transform into any given form, but you might find that, in practice, some form works pretty well most of the time - and you can always give up halfway through a transformation if it is becoming too hard.
You could look at http://en.wikipedia.org/wiki/Conjunctive_normal_form, http://en.wikipedia.org/wiki/Disjunctive_normal_form, http://en.wikipedia.org/wiki/Binary_decision_diagram.
You can convert the queries into conjunctive normal form (CNF). It is a canonical, simple representation of boolean formulae that is normally the basis for SAT solvers.
Most likely "large" queries are going to have lots of conjunctions (rather than lots of disjunctions) so CNF should work well.
The Quine-McCluskey algorithm should achieve what you are looking for. It is similiar to Karnaugh's Maps, but easier to implement in software.

Languages with native / syntactical / inline graph support?

The graph is arguably the most versatile and valuable data structure of all. I can store single variables, lists, hashes etc., and of course graphs, with it.
Given this, are there any languages that offer inline / native graph support and syntax? I can create variables, arrays, lists and hashes inline in Ruby, Python and Javascript, but if I want a graph, I have to either manage the representation myself with a matrix / list, or select a library, and use the graph through method calls.
Why on earth is this still the case in 2010? And, practically, are there any languages out there which offer inline graph support and syntax?
The main problem of what you are asking is that a more general solution is not the best one for a specific problem. It's just average for all of them but not a best one.
Ok, you can store a list in a graph assuming its degeneracy but why should you do something like that? And how would you store an hashmap inside a graph? Why would you need such a structure?
And do not forgot that graph implementation must be chosen accordingly to which operations you are going to do on it, otherwise it would be like using a hashtable to store a list of values or a list to store an ordered collection instead that a tree. You know that you can use an adjacency matrix, an edge list or adjacency lists.. every different implementation with it's own strenghts and weaknesses.
Then graphs can have really many properties compared to other collections of data, cyclic, acyclic, directed, undirected, bipartite, and so on.. and for any specific case you can implement them in a different way (assuming some hypothesis on the graph you need) so having them in native syntax would be overkill since you would need to configure them anyway (and language should provide many implementations/optimizations).
If everything is already made you remove the fun of developing :)
By the way just look for a language that allows you to write your own graph DSL and live with it!
Gremlin, a graph-based programming language: https://github.com/tinkerpop/gremlin/wiki
GrGen.NET (www.grgen.net) is a programming language for graph transformation plus an environment including a graphical debugger. You can define your graph model, the rewrite rules, and rule control with some nice special purpose languages and use the generated assemblies/C# code from any .NET language you like or from the supplied shell.
To understand why normal languages don't offer such a convenient/built-in interface to graphs, just take a look at the amount of code written for that project: the compiler alone is several man-years of work. That's a price tag too hefty for a feature/data structure only a minority of programmers ever need - so it's not included in general purpose programming languages.

Resources