I have to implement the FP-growth algorithm in any language. The code should be serial, with no recursion. Is it possible to implement such an algorithm without recursion? I am not looking for code; I just need an explanation of how to do it.
FP-growth is a recursive algorithm. As other people have said here, you can always transform a recursive algorithm into a non-recursive one by using a stack, but I don't see any good reason to do that for FP-growth.
By the way, if you want a Java implementation of FPGrowth and other frequent pattern mining algorithms such as Apriori, HMine, Eclat, etc., you can check my website. I have implemented more than 40 algorithms for frequent pattern mining, association rule mining, etc.:
http://www.philippe-fournier-viger.com/spmf/
I don't know the algorithm you are talking about, but everything that is possible with recursion is also possible without it. You can implement such algorithms using an explicit stack.
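For instance, here is a minimal sketch (in Python, not FP-growth itself) of that standard trick: a pre-order traversal of a tree with only parent-to-child links, done with an explicit stack instead of recursion. The Node class and the small tree at the bottom are made up purely for illustration.

```python
class Node:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

def traverse_iterative(root):
    """Visit every node without recursion by managing our own stack."""
    if root is None:
        return []
    visited = []
    stack = [root]
    while stack:
        node = stack.pop()
        visited.append(node.value)
        # Push children in reverse so the leftmost child is processed first.
        for child in reversed(node.children):
            stack.append(child)
    return visited

# Example usage with a small hand-built tree.
tree = Node("a", [Node("b", [Node("d")]), Node("c")])
print(traverse_iterative(tree))  # ['a', 'b', 'd', 'c']
```

Any recursive algorithm can be rewritten this way: the items you push onto the stack play the role of the pending recursive calls.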
Here is a very clear explanation of how the code works. It looks like you have to build a tree and validate it.
Assuming that by "FP growth algorithm" you mean the Frequent Pattern growth algorithm, I would point you to this document, which gives a decent explanation of how it works:
http://www.florian.verhein.com/teaching/2008-01-09/fp-growth-presentation_v1%20%28handout%29.pdf
I wonder though, is this homework related?
You can probably visit http://code.google.com/p/lofia/ to find something on FP-Trees.
It is for longest frequent itemset mining.
You can take a look at the concept and implementation of the FP-growth algorithm in Mahout.
I have learnt the two ways of doing DP, top-down and bottom-up, but I am confused now. How do we choose between them under different conditions? I find that most of the time top-down is more natural for me. Can anyone tell me how to make the choice?
PS: I have read this older post but am still confused and need help. Please don't mark my question as a duplicate; I have explained how they are different. I would like to know how to choose, and when to think about a problem from the top-down or the bottom-up direction.
To keep it simple, I will explain based on my summary of a few sources.
Top-down: something that looks like a(n) = a(n-1) + a(n-2). With this equation, you can implement it in about 4-5 lines of code by making the function call itself. Its advantage, as you said, is that it is quite intuitive for most developers, but it costs more space (the call stack) to execute.
Bottom-up: you first calculate a(0), then a(1), and save them to some array (for instance); then you keep computing and saving a(i) = a(i-1) + a(i-2). With this approach you can significantly improve the performance of your code, and with big n you can avoid stack overflow.
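As a minimal sketch of both styles, using Fibonacci as a stand-in for a(n) = a(n-1) + a(n-2) (the function names are my own):

```python
from functools import lru_cache

# Top-down: the function calls itself, and a cache avoids recomputation.
@lru_cache(maxsize=None)
def a_top_down(n):
    if n < 2:
        return n          # base cases a(0) = 0, a(1) = 1
    return a_top_down(n - 1) + a_top_down(n - 2)

# Bottom-up: fill a table from the base cases upward; no recursion,
# so no risk of stack overflow for large n.
def a_bottom_up(n):
    if n < 2:
        return n
    table = [0] * (n + 1)
    table[1] = 1
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]

print(a_top_down(30), a_bottom_up(30))   # both print 832040
```

The two compute the same values; the difference is only in whether the order of evaluation is driven by the call stack or by an explicit loop.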
A slightly longer answer, but I have tried to explain my own approach to dynamic programming and what I have come to understand after solving such questions. I hope future readers find it helpful. Please feel free to comment and discuss:
A top-down solution comes more naturally when thinking about a dynamic programming problem. You start with the end result and try to figure out the ways you could have gotten there. For example, for fib(n), we know that we could have gotten here only through fib(n-1) and fib(n-2). So we call the function recursively again to calculate the answer for these two cases, which goes deeper and deeper into the tree until the base case is reached. The answer is then built back up until all the stacks are popped off and we get the final result.
To reduce duplicate calculations, we use a cache that stores a new result and returns it if the function tries to calculate it again. So, if you imagine a tree, the function call does not have to go all the way down to the leaves, it already has the answer and so it returns it. This is called memoization and is usually associated with the top-down approach.
Now, one important point I think for the bottom-up approach is that you must know the order in which the final solution has to be built. In the top-down case, you just keep breaking one thing down into many but in the bottom-up case, you must know the number and order of states that need to be involved in a calculation to go from one level to the next. In some simpler problems (eg. fib(n)), this is easy to see, but for more complex cases, it does not lend itself naturally. The approach I usually follow is to think top-down, break the final case into previous states and try to find a pattern or order to then be able to build it back up.
Regarding when to choose either of these, I would suggest the approach above: identify how the states are related to each other and how the solution is built. One important distinction you can find this way is how many calculations are really needed and how many might just be redundant. In the bottom-up case, you have to fill an entire level before you go to the next. In the top-down case, however, an entire subtree can be skipped if it is not needed, and in that way a lot of extra calculations can be saved.
Hence, the choice obviously depends on the problem, but also on the inter-relation between states. It is usually the case that bottom-up is recommended because it saves stack space compared to the recursive approach. However, if you feel the recursion isn't too deep but the state space is very wide, so that tabulation would lead to a lot of unnecessary calculations, you can go for the top-down approach with memoization.
For example, in this question: https://leetcode.com/problems/partition-equal-subset-sum/, if you look at the discussions, it is mentioned that top-down is faster than bottom-up: basically, the binary-tree approach with a cache versus the bottom-up knapsack build-up. I leave it as an exercise to understand the relation between the states.
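For reference, a minimal top-down sketch of that problem, assuming the usual include/skip formulation with memoization (the function names are my own, not taken from the linked discussion):

```python
from functools import lru_cache

def can_partition(nums):
    """Return True if nums can be split into two subsets of equal sum."""
    total = sum(nums)
    if total % 2:
        return False

    @lru_cache(maxsize=None)
    def reachable(i, remaining):
        if remaining == 0:
            return True
        if i == len(nums) or remaining < 0:
            return False
        # Either take nums[i] or skip it; the cache prunes repeated states,
        # and whole subtrees are skipped as soon as one branch succeeds.
        return reachable(i + 1, remaining - nums[i]) or reachable(i + 1, remaining)

    return reachable(0, total // 2)

print(can_partition([1, 5, 11, 5]))   # True  -> {1, 5, 5} and {11}
print(can_partition([1, 2, 3, 5]))    # False
```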
For many problems, the bottom-up and top-down DP approaches are the same in terms of time and space complexity. The differences are that bottom-up is a little faster, because it has no recursion overhead, and that, yes, top-down is more intuitive and natural.
But the real advantage of the top-down approach shows on tasks where you don't need to calculate the answer for all smaller subtasks: in such cases you can even reduce the time complexity.
For example, you can use the top-down approach with memoization to find the N-th Fibonacci number, where the sequence is defined as a[n] = a[n-1] + a[n-2]; either way you get O(N) time for calculating it (I am not comparing with the O(log N) matrix solution for this number). But look at a sequence like a[n] = a[n/2] + a[n/2-1], with some edge cases for small N. In the bottom-up approach you can't do better than O(N), whereas the top-down algorithm will work with complexity O(log N) (or maybe some poly-logarithmic complexity, I am not sure), because it only touches the subproblems it actually needs.
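A rough sketch of that idea, with arbitrary base cases for small n since the recurrence above only says "some edge cases" (so the returned values are illustrative, not meaningful):

```python
from functools import lru_cache

calls = 0   # counts how many distinct subproblems are actually evaluated

@lru_cache(maxsize=None)
def a(n):
    global calls
    calls += 1
    if n <= 1:
        return 1                     # assumed base cases a(0) = a(1) = 1
    return a(n // 2) + a(n // 2 - 1)

print(a(10**9))   # answered without touching anywhere near 10**9 subproblems
print(calls)      # small: only the subproblems reachable from n are computed
```

A bottom-up table, by contrast, would fill in all n+1 entries even though almost none of them are needed.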
To add on to the previous answers,
Optimal time:
if all sub-problems need to be solved → bottom-up approach
else → top-down approach
Optimal space:
bottom-up approach
The question Nikhil_10 linked (i.e. https://leetcode.com/problems/partition-equal-subset-sum/) doesn't require all subproblems to be solved, hence the top-down approach is more optimal there.
If top-down feels more natural to you, then use it, as long as you know you can implement it. Bottom-up is faster than top-down, and most of the time the bottom-up version is also easy to write. Make your decision depending on your situation.
Can someone please tell me what the effectiveness of an algorithm relates to? I understand what the efficiency component entails.
Thanks
Effectiveness relates to the ability to produce the desired result.
Some tasks inherently do not have strict definitions - for example, machine translation between two human languages. Different algorithms exist to translate, say, from English to Spanish; their effectiveness is a measure of how good the results they produce are. Their efficiency, on the other hand, measures how fast they are at producing the results, how much memory they use, how much disk space they need, and so on.
This question suggests that you have read something which refers to the effectiveness of algorithms and have not understood the author's explanation of the term, if the author provided one at all. I don't think there is a generally accepted interpretation of the term; I think it is one of those terms that fall under the Humpty-Dumpty rule, 'a word means what I say it means'.
It might refer to an aspect of some algorithms which return only approximate solutions to problems. For example, we all know that the travelling salesman problem is NP-hard; a practical algorithm which 'solves' the TSP might therefore only provide bounds on the difference between the solutions it can find and an optimal solution, which might take too long to find.
I'm trying to find a source or two on the web that explain these in simple terms. Also, can this notion be used in a practical fashion to improve an algorithm? If so, how? Below is a brief explanation I got from a contact.
I don't know where you can find a simple explanation, but I will try to explain it myself. Algorithm complexity is a term that describes the dependence between the amount of input information and the resources needed to process it. For example, if you need to find the maximum in an array, you enumerate all elements and compare each with your variable (e.g. max). Suppose there are N elements in the array; the complexity of this algorithm is O(N), because we enumerate all elements once. If, for every element, we enumerated all elements again (a nested loop), the complexity would be O(N*N). Binary search has complexity O(log2(N)), because it halves the search area at every step. You can also figure out the space complexity of your algorithm: the dependence between the space required by the program and the amount of input data.
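To make the two complexities above concrete, here is a small sketch (plain Python, nothing library-specific): a linear scan that touches every element once, and a binary search that halves the search area at each step.

```python
def find_max(arr):
    """O(N): every element is looked at exactly once."""
    best = arr[0]
    for x in arr[1:]:
        if x > best:
            best = x
    return best

def binary_search(sorted_arr, target):
    """O(log N): each step halves the remaining search area."""
    lo, hi = 0, len(sorted_arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_arr[mid] == target:
            return mid
        if sorted_arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1          # not found

data = [3, 1, 4, 1, 5, 9, 2, 6]
print(find_max(data))                       # 9
print(binary_search(sorted(data), 5))       # 5 (index of 5 in the sorted copy)
```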
It's not easy to say everything there is about complexity, but I think the wiki has a good explanation of it and is a good starting point; see:
Big O notation, for an introduction to this aspect (you can also look at the theta and omega notations).
Analysis of algorithms, to learn more about complexity.
And computational complexity theory, which is a big area of computer science.
As for optimization, you can look at the web and the wiki to find more, but the few lines your friend gave are a good sample for an introduction. These topics are not a one-night effort to understand: their usage, their calculation, and the thousands of pages of theory behind them.
All in all, you can become familiar with them as needed: read the wiki, and for more advanced reading, books like Garey and Johnson or Computational Complexity: A Modern Approach, but do not expect to know everything about them after that. You can also see these lecture notes: http://www.cs.rutgers.edu/~allender/lecture.notes/.
As your friend hinted, this isn't a simple topic. But it is worth investing some time to learn. Check out this book, commonly used as a textbook in CS courses on algorithms.
The course reader used in Stanford's introductory programming classes has a great chapter on algorithmic analysis by legendary CS educator Eric Roberts. The whole text is online at this link, and Chapter 8 might be worth looking at.
You can watch Structure and Interpretation of Computer Programs. It's a nice MIT course.
Also, can this notion be used in a practical fashion to improve an algorithm? If so, how?
It's not so much used for improving an algorithm as for evaluating the performance of algorithms and deciding which algorithm to use. For any given problem, you really want to avoid algorithms that are O(N!) or O(N^x), since they slow down dramatically as the size of N (your input) increases. What you want is O(N) or O(log(N)), or even better, O(1).
O(1) is constant time, which means the algorithm takes the same amount of time to execute for a million inputs as it does for one. O(N) is of course linear, which means the time it takes to execute the algorithm increases in proportion to the size of its input.
There are even some problems where any algorithm developed to solve them ends up being something like O(N!). Basically, no known fast algorithm can solve such a problem exactly (this class of problems is known as NP-complete). Once you realize you're dealing with such a problem, you can relax your requirements a bit and solve it imperfectly by "cheating". These cheats don't necessarily find the optimal solution but instead settle for good enough. My favorite cheats are genetic/evolutionary algorithms and rainbow tables.
Another example of how understanding algorithmic complexity changes how you think about programming is micro-optimization. Or rather, not doing it. You often see newbies asking questions like "is ++x faster than x++?". Seasoned programmers mostly don't care and will usually reply that the first rule of optimization is: don't.
The more useful answer is that changing x++ to ++x does not in any way alter your algorithm's complexity. The complexity of your algorithm has a much greater impact on the speed of your code than any form of micro-optimization. For example, it is much more productive to look at your code and reduce the number of deeply nested for loops than to worry about how your compiler turns your code into assembly.
Yet another example is how, in games programming, speeding up code counter-intuitively involves adding even more code instead of reducing it. The added code is in the form of filters (basically if..else statements) that decide which bits of data need further processing and which can be discarded. From a micro-optimizer's point of view, adding code means more instructions for the CPU to execute, but in reality those filters reduce the problem space by discarding data and therefore run faster overall.
By all means, understand data structures, algorithms, and big-O.
Design code carefully and well, keeping it as simple as possible.
But that's not enough.
The key to optimizing is knowing how to find problems, after the code is written.
Here's an example.
What properties should a problem have so that I can decide which method to use: dynamic programming or the greedy method?
Dynamic programming problems exhibit optimal substructure. This means that the solution to the problem can be expressed as a function of solutions to subproblems that are strictly smaller.
One example of such a problem is matrix chain multiplication.
Greedy algorithms can be used only when a locally optimal choice leads to a globally optimal solution. This can be harder to see right away, but it is generally easier to implement because you only have one thing to consider (the greedy choice) instead of many (the solutions to all smaller subproblems).
One famous greedy algorithm is Kruskal's algorithm for finding a minimum spanning tree.
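For illustration, a compact sketch of Kruskal's greedy choice: sort the edges by weight and keep an edge only if it connects two different components (union-find kept deliberately simple; the example graph is made up):

```python
def kruskal(num_vertices, edges):
    """edges: list of (weight, u, v). Returns the edges of a minimum spanning tree."""
    parent = list(range(num_vertices))

    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for weight, u, v in sorted(edges):          # greedy choice: cheapest edge first
        ru, rv = find(u), find(v)
        if ru != rv:                            # keep it only if it joins two components
            parent[ru] = rv
            mst.append((u, v, weight))
    return mst

edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
print(kruskal(4, edges))   # [(0, 1, 1), (1, 3, 2), (1, 2, 3)]
```

The greedy choice (always the cheapest safe edge) never has to be revisited, which is exactly what makes a greedy algorithm work here.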
The second edition of Cormen, Leiserson, Rivest and Stein's Introduction to Algorithms has a section (16.4) titled "Theoretical foundations for greedy methods" that discusses when greedy methods yield an optimal solution. It covers many cases of practical interest, but not all greedy algorithms that yield optimal results can be understood in terms of this theory.
I also came across a paper titled "From Dynamic Programming To Greedy Algorithms", linked here, that talks about how certain greedy algorithms can be seen as refinements of dynamic programming. From a quick scan, it may be of interest to you.
There's no really strict rule for knowing which to use. As someone already said, there are some things that should turn the red light on, but in the end, only experience will be able to tell you.
We apply the greedy method when a decision can be made based on the local information available at each stage, and we are sure that by following the decision made at each stage we will find the optimal solution.
However, in the dynamic programming approach we may not be sure about making a decision at one stage, so we carry a set of probable decisions, one of which may lead to the solution.
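A standard example of that difference (not from the answer above): coin change with denominations {1, 3, 4}. The greedy "take the largest coin" decision is wrong for amount 6, while DP, which carries all candidate decisions forward, gets it right.

```python
def greedy_coin_change(coins, amount):
    """Always grab the largest coin that fits; can be suboptimal."""
    count = 0
    for c in sorted(coins, reverse=True):
        count += amount // c
        amount %= c
    return count if amount == 0 else None

def dp_coin_change(coins, amount):
    """Consider every coin at every amount and keep the best."""
    INF = float("inf")
    best = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and best[a - c] + 1 < best[a]:
                best[a] = best[a - c] + 1
    return best[amount] if best[amount] != INF else None

coins = [1, 3, 4]
print(greedy_coin_change(coins, 6))   # 3 coins: 4 + 1 + 1
print(dp_coin_change(coins, 6))       # 2 coins: 3 + 3
```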
I am not sure, but I have heard of an algorithm that can only be implemented with recursion. Does anyone know of such a thing?
You can always simulate recursion by keeping your own stack. So no.
You need to clarify what kind of recursion you are talking about: there is algorithmic recursion and there is recursion in the implementation. It is true that any recursive algorithm allows a straightforward non-recursive implementation; one can easily do it by using the standard trick of simulating the recursion with a manual stack.
However, your question mentions algorithms only. If one assumes that it is specifically about algorithmic recursion, then the answer is yes, there are algorithms that are inherently and unavoidably recursive. In the general case it is not possible to replace an inherently recursive algorithm with a non-recursive one. The simplest way to build an inherently recursive algorithm is to start from an inherently recursive data structure. For example, let's say we need to traverse a tree with only parent-to-child links (and no child-to-parent links). It is impossible to come up with a reasonable non-recursive algorithm to solve this problem.
So, that's one example for you: traversal of a tree that has only parent-to-child links cannot be performed by a non-recursive algorithm.
Another example of an inherently recursive algorithm is the well-known QuickSort. QuickSort is always recursive, and it cannot be turned into a non-recursive algorithm simply because if you succeed in doing so, it will no longer be QuickSort; it will be a completely different algorithm. Granted, this sounds like an exercise in pure semantics, but nevertheless it is worth mentioning.
If I remember my algorithms correctly, there is nothing doable by recursion that cannot be done with a stack and a loop. I don't have the formal proof at my fingertips, however.
Edit: it occurs to me that possibly the only thing doable by recursion that is not doable with a stack and a loop is a stack overflow?
The following compares recursive and non-recursive implementations: http://www.sparknotes.com/cs/recursion/whatisrecursion/section1.html
Excerpt:
Given that recursion is in general less efficient, why would we use it? There are two situations where recursion is the best solution:
The problem is much more clearly solved using recursion: there are many problems where the recursive solution is clearer, cleaner, and much more understandable. As long as the efficiency is not the primary concern, or if the efficiencies of the various solutions are comparable, then you should use the recursive solution.
Some problems are much easier to solve through recursion: there are some problems which do not have an easy iterative solution. Here you should use recursion. The Towers of Hanoi problem is an example of a problem where an iterative solution would be very difficult. We'll look at Towers of Hanoi in a later section of this guide.
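For completeness, a minimal recursive sketch of the Towers of Hanoi move sequence mentioned in the excerpt (peg names are arbitrary):

```python
def hanoi(n, source, target, spare):
    """Move n disks from source to target using spare as scratch space.
    The recursive statement mirrors the problem almost word for word."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target)   # clear the way
    print(f"move disk {n} from {source} to {target}")
    hanoi(n - 1, spare, target, source)   # park the smaller disks back on top

hanoi(3, "A", "C", "B")   # prints the 7 moves for three disks
```

Writing the same move sequence iteratively is possible, but noticeably less clear, which is the excerpt's point.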
Are you just looking for a practical example of recursion? Recently my friends and I implemented the Haar Wavelet function (as an exercise to start learning Ruby), which seemed to require recursion. Unless anybody has an implementation of it without recursion?
I would imagine that any time you don't know the depth of the structure you are iterating over, recursion is the logical way to go. Sure, it may be doable with some hacked-up loops, but is that good code?