The Rete algorithm is an efficient pattern matching algorithm for comparing a large collection of patterns to a large collection of objects. It is also used in one of the expert system shells that I am exploring right now: Drools.
What is the time complexity of the algorithm, based on the number of rules I have?
Here is a link for Rete Algorithm: http://www.balasubramanyamlanka.com/rete-algorithm/
Also for Drools: https://drools.org/
Estimating the complexity of RETE is a non-trivial problem.
Firstly, you cannot use the number of rules as a dimension. What you should look at are the single constraints or matches the rules contain. You can see a rule as a collection of constraints grouped together. This is all RETE reasons about.
Once you have a rough estimate of the number of constraints your rule base has, you will need to look at those which are inter-dependent. Inter-dependent constraints are the most complex matches and are similar in concept to JOINs in SQL queries. Their complexity varies based on their nature as well as on the state of your working memory.
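To make the JOIN analogy concrete, here is a minimal sketch (hypothetical fact types and field names, plain Python rather than Drools) of what a single inter-dependent constraint costs when evaluated naively: every fact passing the first per-fact filter has to be paired with every fact passing the second.

    # Hypothetical facts standing in for a rule like
    # "when Order(total > 5000, customer == $c) and Customer(id == $c, vip == true) then ..."
    orders = [{"customer": i % 100, "total": i} for i in range(10_000)]
    customers = [{"id": i, "vip": i % 10 == 0} for i in range(100)]

    # Per-fact (alpha) constraints: each fact is filtered on its own, linear cost.
    big_orders = [o for o in orders if o["total"] > 5_000]
    vip_customers = [c for c in customers if c["vip"]]

    # Inter-dependent (beta) constraint: a join between the two filtered sets.
    # Evaluated naively this is O(len(big_orders) * len(vip_customers)),
    # which is why join structure and working-memory size dominate the cost.
    matches = [(o, c) for o in big_orders for c in vip_customers
               if o["customer"] == c["id"]]
    print(len(matches))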
Then you will need to look at the size of your working memory. The number of facts you assert within a RETE-based expert system strongly influences its performance.
Lastly, you need to consider the engine's conflict resolution strategy. If you have several conflicting rules, it might take a lot of time to figure out in which order to execute them.
Regarding RETE performance, there is a very good PhD dissertation I'd suggest you look at: "Production Matching for Large Learning Systems" by Robert B. Doorenbos.
Recently, I was reading these books about algorithms, specifically the sections on the analysis of algorithms:
Introduction to Algorithms, 3rd ed., T. Cormen, C. Leiserson, R. Rivest & C. Stein (CLRS)
The Algorithm Design Manual, 2nd ed., S. Skiena
Algorithm Design, J. Kleinberg & É. Tardos
Algorithms, 4th ed., R. Sedgewick
Algorithms, S. Dasgupta, C. Papadimitriou & U. Vazirani
and a few other books
After that, I got a bit confused because I don't fully understand where the idea of counting the steps of an algorithm comes from.
I mean, in Introduction to Algorithms and The Algorithm Design Manual, something called the RAM model of computation is mentioned. These books say that under that model we count steps, but the other books do not mention a model of computation as such.
The other books talk about counting the steps along the path the algorithm takes, that is, in a common-sense or informal way. So, I would appreciate it if you could help me with these questions:
What's the relationship (or difference) between the step-count method (the other books) and using a model of computation (CLRS & S. Skiena) to do it?
When someone talks about counting steps to analyze algorithms, may I assume they are referring to using a model of computation (RAM)?
Our common sense is based on a model of computation that can be implicit or explicit. In an introductory course it is usually left implicit. When made explicit, it is usually the RAM model, which is based on the idea of sequential processing where each simple operation takes constant time. So you just count steps.
You can find a formal description of that model at http://people.seas.harvard.edu/~cs125/fall14/lec6.pdf.
Reality is, of course, rather different. As https://gist.github.com/jboner/2841832 shows, operations take wildly different amounts of time. I've personally seen jobs go from 5 days to 1 hour by switching to a sort instead of hash lookups. Yes, hash lookups are O(1), but with a horrible constant when the data is backed by disk. Distributed computing has things operating in parallel. Computing on a GPU gives you a tremendous amount of parallelism, as long as all computation operates in perfect lockstep. We are trying to build quantum computers, which could theoretically give us many, many orders of magnitude more parallelism, at the cost of losing irreversible operations like "if".
We can create models that deal with all of this complexity. But there is no need to consider any of it until you understand the basics, which is the standard "count operations" approach of the RAM model.
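As a toy illustration of that counting (my own example, not taken from any of the books listed above): charge one unit per simple operation, total the units, and report the growth rate.

    # Toy RAM-model step counting: one unit per simple operation.
    def linear_search(items, target):
        steps = 0
        for item in items:
            steps += 1          # advancing to the next element
            if item == target:  # the comparison
                steps += 1
                return steps, True
            steps += 1          # count the failed comparison too
        return steps, False

    # Worst case (target absent): the count grows linearly with len(items),
    # so under the RAM model the running time is summarized as O(n).
    print(linear_search(list(range(10)), -1))  # (20, False)
    print(linear_search(list(range(10)), 3))   # (8, True)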
In data mining, frequent itemsets are found using different algorithms like the Apriori algorithm, FP-Tree, etc. So are these the pattern evaluation methods?
You can try Association Rules (Apriori, for example), Collaborative Filtering (item-based or user-based) or even Clustering.
I don't know exactly what you are trying to do, but if you have a dataset and you need to find the most frequent itemsets, you should try some of the above techniques.
If you're using R you should explore the arules package for association rules (for example).
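To get a feel for what these techniques compute before reaching for a library such as arules, here is a minimal pure-Python sketch (toy transactions, my own example) that counts itemset support the way the first passes of Apriori do.

    from itertools import combinations
    from collections import Counter

    # Toy market-basket transactions (hypothetical data).
    transactions = [
        {"bread", "milk"},
        {"bread", "diapers", "beer", "eggs"},
        {"milk", "diapers", "beer", "cola"},
        {"bread", "milk", "diapers", "beer"},
        {"bread", "milk", "diapers", "cola"},
    ]
    min_support = 0.6  # fraction of transactions an itemset must appear in

    # Count every 1- and 2-itemset; Apriori would prune candidates between passes.
    counts = Counter()
    for t in transactions:
        for size in (1, 2):
            for itemset in combinations(sorted(t), size):
                counts[itemset] += 1

    n = len(transactions)
    frequent = {itemset: c / n for itemset, c in counts.items() if c / n >= min_support}
    print(frequent)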
The Apriori and FP-tree algorithms are used to find frequent itemsets in the given transactional data. This helps in market basket analysis applications. For pattern evaluation, there are many measures, namely:
support,
confidence,
lift,
imbalance ratio, etc.
More details can be found in the paper:
Selecting the right interestingness measure for association patterns, by Pang-Ning Tan, Vipin Kumar and Jaideep Srivastava, KDD 2002.
URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.1494&rep=rep1&type=pdf
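As a back-of-the-envelope sketch of the first three measures listed above (toy transactions and a single hypothetical rule, my own example):

    # Support, confidence and lift for one candidate rule {diapers} -> {beer}
    # over toy transactions (hypothetical data).
    transactions = [
        {"bread", "milk"},
        {"bread", "diapers", "beer", "eggs"},
        {"milk", "diapers", "beer", "cola"},
        {"bread", "milk", "diapers", "beer"},
        {"bread", "milk", "diapers", "cola"},
    ]
    n = len(transactions)
    antecedent, consequent = {"diapers"}, {"beer"}

    support_a  = sum(antecedent <= t for t in transactions) / n   # P(diapers)
    support_c  = sum(consequent <= t for t in transactions) / n   # P(beer)
    support_ac = sum((antecedent | consequent) <= t for t in transactions) / n

    confidence = support_ac / support_a   # P(beer | diapers)
    lift = confidence / support_c         # > 1 means positive association

    print(f"support={support_ac:.2f} confidence={confidence:.2f} lift={lift:.2f}")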
I'm looking for a supervised machine learning algorithm that would produce transparent rules or definitions that can be easily interpreted by a human.
Most algorithms that I work with (SVMs, random forests, PLS-DA) are not very transparent. That is, you can hardly summarize the models in a table in a publication aimed at a non-computer-scientist audience. What authors usually do is, for example, publish a list of variables that are important according to some criterion (for example, Gini index or mean decrease of accuracy in the case of RF), and sometimes supplement this list by indicating how these variables differ between the classes in question.
What I am looking for is a relatively simple output in the style of "if (any of the variables V1-V10 > median or any of the variables V11-V20 < 1st quartile) and any of the variables V21-V30 > 3rd quartile, then class A".
Is there any such thing around?
Just to constrain my question a bit: I am working with highly multidimensional data sets (tens of thousands to hundreds of thousands of often collinear variables). So, for example, regression trees would not be a good idea (I think).
You sound like you are describing decision trees. Why would regression trees not be a good choice? Maybe not optimal, but they work, and those are the most directly interpretable models. Anything that works on continuous values works on ordinal values.
There's a tension between wanting an accurate classifier, and wanting a simple and explainable model. You could build a random decision forest model, and constrain it in several ways to make it more interpretable:
Small max depth
High minimum information gain
Prune the tree
Only train on "understandable" features
Quantize/round decision thresholds
The model won't necessarily be as good.
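As a rough sketch of that trade-off (assuming scikit-learn; the synthetic data and parameter values are arbitrary), a heavily constrained tree can be printed as plain if/then rules:

    # A deliberately constrained decision tree whose rules print verbatim.
    # Assumes scikit-learn; dataset and parameter values are arbitrary.
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = make_classification(n_samples=500, n_features=50, n_informative=5,
                               random_state=0)

    tree = DecisionTreeClassifier(
        max_depth=3,                  # small max depth
        min_impurity_decrease=0.01,   # acts like a minimum information gain
        random_state=0,
    ).fit(X, y)

    # export_text dumps the model as nested "if feature <= threshold" rules,
    # i.e. the kind of table that fits in a publication.
    print(export_text(tree, feature_names=[f"V{i}" for i in range(1, 51)]))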
You can find interesting research on understanding AI methods in the work done by Been Kim at Google Brain.
I've been set the assignment of producing a solution for the capacitated vehicle routing problem using any algorithm that learns. From my brief search of the literature, tabu search variants seem to be the most successful. Can they be classed as learning algorithms, though, or are they just variants of local search?
Search methods are not "learning". Learning, in the context of computer science, is a term for learning machines, which improve their quality with training (experience). Metaheuristics, which simply search through some space, do not "learn"; they browse all possible solutions (in a heuristically guided manner) in order to optimize some function. In other words, optimization techniques are used to train some models, but these optimizers themselves don't "learn". Although this is a purely linguistic matter, I would distinguish between methods that learn, in the sense of trying to generalize knowledge from some set of examples, and algorithms that simply search for the best parameters of an arbitrarily given function. The core idea of machine learning (which distinguishes it from optimization itself) is that the aim is to maximize the quality of our model on unknown data, while in optimization (and in particular in tabu search) we are simply looking for the best quality on exactly known and well-defined data (a function).
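To make the distinction concrete, here is a toy sketch (my own example, not tied to tabu search or any CVRP solver): an optimizer is scored on the known objective it searches, while a learner is scored on held-out data it has never seen.

    import random

    random.seed(0)

    # "Optimization" (as in tabu search): the objective is fully known, and
    # quality is simply the best value found on that known function.
    objective = lambda x: (x - 3) ** 2
    best_x = min((random.uniform(-10, 10) for _ in range(10_000)), key=objective)

    # "Learning": fit on training examples, but judge the result on held-out
    # data, i.e. on how well the model generalizes.
    data = [(x, 2.0 * x + random.gauss(0, 0.1)) for x in range(100)]
    random.shuffle(data)
    train, test = data[:80], data[80:]

    # Least-squares slope through the origin, fit on the training split only.
    slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
    test_mse = sum((y - slope * x) ** 2 for x, y in test) / len(test)

    print(f"best_x={best_x:.2f}  slope={slope:.3f}  test_mse={test_mse:.4f}")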
In the past I had to develop a program which acted as a rule evaluator. You had an antecedent and some consequents (actions), so if the antecedent evaluated to true, the actions were performed.
At that time I used a modified version of the RETE algorithm (there are three versions of RETE, only the first being public) for the antecedent pattern matching. We're talking about a big system here, with millions of operations per rule and some operators "repeated" in several rules.
It's possible I'll have to implement it all over again in another language and, even though I'm experienced with RETE, does anyone know of other pattern matching algorithms? Any suggestions, or should I keep using RETE?
The TREAT algorithm is similar to RETE, but doesn't record partial matches. As a result, it may use less memory than RETE in certain situations. Also, if you modify a significant number of the known facts, then TREAT can be much faster because you don't have to spend time on retractions.
There's also RETE* which balances between RETE and TREAT by saving some join node state depending on how much memory you want to use. So you still save some assertion time, but also get memory and retraction time savings depending on how you tune your system.
You may also want to check out LEAPS, which uses a lazy evaluation scheme and incorporates elements of both RETE and TREAT.
I only have personal experience with RETE, but it seems like RETE* or LEAPS are the better, more flexible choices.
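As a very rough sketch of the memory/recomputation trade-off (toy facts and hand-rolled joins, not any real engine's API): a RETE-style engine materializes the join results, so retraction has to clean them up, while a TREAT-style engine recomputes the join when needed.

    # Toy contrast between RETE-style cached partial matches and TREAT-style
    # recomputation (hand-rolled, not any real engine's API).
    parents = [("alice", "bob"), ("bob", "carol")]

    # RETE-style beta memory: the grandparent join is materialized up front...
    beta_memory = [(gp, gc) for gp, p1 in parents
                            for p2, gc in parents if p1 == p2]

    def retract_rete(fact):
        parents.remove(fact)
        # ...so retracting a fact also means scrubbing the partial matches
        # that were built from it.
        beta_memory[:] = [m for m in beta_memory
                          if fact[0] not in m and fact[1] not in m]

    def grandparents_treat():
        # TREAT keeps no join state and recomputes the match on demand,
        # trading CPU time for memory and cheap retraction.
        return [(gp, gc) for gp, p1 in parents
                         for p2, gc in parents if p1 == p2]

    print(beta_memory)                         # [('alice', 'carol')]
    retract_rete(("bob", "carol"))
    print(beta_memory, grandparents_treat())   # [] []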