Implementing Latent Dirichlet Allocation (LDA) with PyMC - pymc

PyMC comes with numerous examples but LDA, which is a relatively simple graphical model, is not one of them. There are questions on numerous sites about this but never any references to implementations. I've considered how it might be implemented but it's not clear how PyMC would be used to establish the topic-word dependencies within each document.
Can LDA be implemented relatively easily with PyMC?

I tried to implement and explore the solution provided in the link by Tom Minka and found that it gives some nice results. However, I have yet to confirm with 100% confidence that this is a correct LDA implementation.
The ipython notebook can be viewed at: https://github.com/napsternxg/ipython-notebooks/blob/master/PyMC_LDA.ipynb
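For anyone who wants to compare, here is roughly what that model boils down to when written against the modern PyMC API (a minimal sketch with a made-up toy corpus; the notebook itself targets the older PyMC2 API, and for real corpora you would want to marginalize out the topic assignments rather than sample them directly):

```python
import numpy as np
import pymc as pm

# Toy corpus: two documents given as lists of word ids (hypothetical data)
K, V = 2, 4                          # number of topics, vocabulary size
docs = [[0, 1, 1, 3], [2, 3, 3, 0]]

with pm.Model() as lda:
    # theta[d]: topic mixture of document d; phi[k]: word distribution of topic k
    theta = pm.Dirichlet("theta", a=np.ones(K), shape=(len(docs), K))
    phi = pm.Dirichlet("phi", a=np.ones(V), shape=(K, V))
    for d, doc in enumerate(docs):
        # z_d[n]: latent topic assignment of the n-th word in document d
        z = pm.Categorical(f"z_{d}", p=theta[d], shape=len(doc))
        # each observed word is drawn from its assigned topic's word distribution
        pm.Categorical(f"w_{d}", p=phi[z], observed=np.array(doc))
    # PyMC assigns a discrete step method to z and NUTS to theta/phi
    trace = pm.sample(1000, chains=2)
```

The `phi[z]` indexing is the piece the question asks about: the topic-word dependency within each document is expressed by indexing one random variable with another.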

Related

Pyro vs PyMC? What are the differences between these probabilistic programming frameworks?

I used 'Anglican', which is based on Clojure, and I think it is not a good fit for me: poor documentation and too small a community to find help. Also, I still can't get comfortable with Scheme-style languages. So I want to change to something based on Python.
Maybe Pyro or PyMC could be the answer, but I have no real idea about either of them.
What are the differences between the two frameworks?
Can they be used for the same problems?
Are there examples, where one shines in comparison?
(Updated for 2022)
Pyro is built on PyTorch. It has full MCMC, HMC and NUTS support. It has excellent documentation and few if any drawbacks that I'm aware of.
PyMC was built on Theano, which is now a largely dead framework but has been revived by a project called Aesara. PyMC3 is now simply called PyMC, and it still exists and is actively maintained. Its reliance on an obscure tensor library besides PyTorch/Tensorflow likely makes it less appealing for widescale adoption--but as I note below, probabilistic programming is not really a widescale thing, so this matters much, much less in the context of this question than it would for a deep learning framework.
There still is something called Tensorflow Probability, with the same great documentation we've all come to expect from Tensorflow (yes that's a joke). My personal opinion as a nerd on the internet is that Tensorflow is a beast of a library that was built predicated on the very Googley assumption that it would be both possible and cost-effective to employ multiple full teams to support this code in production, which isn't realistic for most organizations let alone individual researchers.
That said, they're all pretty much the same thing, so try them all, try whatever the guy next to you uses, or just flip a coin. The best library is generally the one you actually use to make working code, not the one that someone on StackOverflow says is the best. As for which one is more popular, probabilistic programming itself is very specialized so you're not going to find a lot of support with anything.
From here
Pyro is a deep probabilistic programming language that focuses on variational inference and supports composable inference algorithms. Pyro aims to be more dynamic (by using PyTorch) and universal (allowing recursion).
Pyro embraces deep neural nets and currently focuses on variational inference. Pyro doesn't do Markov chain Monte Carlo (unlike PyMC and Edward) yet.
Pyro is built on PyTorch whereas PyMC3 is built on Theano. So you get PyTorch's dynamic programming, and it was recently announced that Theano will not be maintained after a year. However, I found that PyMC has excellent documentation and wonderful resources. Another alternative is Edward, built on top of Tensorflow, which is more mature and feature-rich than Pyro at the moment. The authors of Edward claim it's faster than PyMC3.
I guess the decision boils down to the features, documentation and programming style you are looking for.
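To make the programming-style difference concrete, here is the same trivial coin-flip model in both frameworks (a sketch assuming recent releases of each; the data and names are made up):

```python
data = [1., 0., 1., 1., 0., 1., 1., 1.]

# --- PyMC: declarative, model-as-context-manager style ---
import pymc as pm

with pm.Model() as model:
    p = pm.Beta("p", alpha=1.0, beta=1.0)       # prior on the coin's bias
    pm.Bernoulli("obs", p=p, observed=data)     # likelihood
    idata = pm.sample(1000)                     # NUTS by default

# --- Pyro: imperative, model-as-function style ---
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import MCMC, NUTS

def coin_model(y):
    p = pyro.sample("p", dist.Beta(1.0, 1.0))
    with pyro.plate("data", len(y)):
        pyro.sample("obs", dist.Bernoulli(p), obs=y)

mcmc = MCMC(NUTS(coin_model), num_samples=1000, warmup_steps=500)
mcmc.run(torch.tensor(data))
```

Same posterior, very different feel: PyMC builds a static graph inside a context manager, while a Pyro model is ordinary Python code you can put loops and recursion into.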

Recommendations for Fast Multipole Method implementation?

I'm interested in implementing the Fast Multipole Method to efficiently simulate a system of repulsive particles.
I've found a large collection of references discussing FMM, but none seem very approachable for non-mathematicians who want to fully understand the algorithm.
Can you recommend a ground-up reference that clearly explains the mathematics behind the process, and includes pseudocode exemplifying a proper implementation?
I am by no means an expert in FMM, but this java implementation and introduction is the best source I've found so far for explaining it carefully and slowly. The paper is good at defining terms before using them, and the code at least is useful as a reference point. The math still gets hairy very quickly, but it is what it is :)
A pedestrian introduction to fast multipole methods is a close second. It doesn't explain the actual details of a working FMM implementation, but it's a good introduction to the basic ideas.
I like the short course on FMM. It begins with FMM in 1D, then uses complex-variable theory to do FMM in 2D. And then there is the crazy 3D version, which uses the theory of spherical harmonics and which I guess can be very difficult for a non-mathematician. But if you only need FMM in 2D you should be fine.
Unfortunately, no pseudocode is given there.
But do you really need the accuracy of FMM? You might be fine with the Barnes-Hut algorithm.
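To give a flavour of the 2D complex-variable machinery the short course builds on, here is a toy multipole expansion (my own illustration, not code from any of the linked references): the far-field potential of a cluster of charges is compressed into a handful of expansion coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
src = rng.uniform(-0.5, 0.5, 20) + 1j * rng.uniform(-0.5, 0.5, 20)  # sources in a box
q = rng.uniform(0.5, 1.5, 20)                                       # their charges

# Multipole expansion about the box centre (0):
#   phi(z) = a_0 log(z) + sum_{k>=1} a_k / z^k,   valid for |z| >> box size
P = 10
a0 = q.sum()
a = np.array([-(q * src**k).sum() / k for k in range(1, P + 1)])

z = 3.0 + 2.0j                        # a far-away target point
approx = (a0 * np.log(z) + sum(a[k - 1] / z**k for k in range(1, P + 1))).real
exact = (q * np.log(z - src)).real.sum()
print(approx, exact)                  # agree to many digits; error shrinks geometrically in P
```

The FMM is essentially this observation applied hierarchically: group sources in a quadtree, form such expansions per box, and translate them between boxes, which is where the heavier mathematics comes in.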
After running into a similar issue to you, I ended up writing a fully-documented Python fast multipole method implementation, pybbfmm. I've also written a short, mathematics-free tutorial on how the method works. Together, I think they're substantially more accessible than any of the other presentations I could find.
(meta: Although this is effectively a linkpost, the OP is explicitly asking for a link. I've added what I think was missing from the last one - the name of the library - but I'm not sure how else to offer this answer except as a name and a link. Certainly it doesn't feel any more linkpost-y than the accepted answer. If this one gets deleted as well, I'll give up)

Examples for partially observable, sensorless environments

I am currently looking at partially observable environments and sensorless problems as described in Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig, chapter 4.
The only example of a partially observable and also sensorless problem I can find on the internet is the vacuum cleaner problem, also shown in the book.
Is there another example that also makes it possible to execute the mentioned algorithms?
The kind of problems you refer to are referred to in the literature as "conformant" planning problems (partially observable, no feedback). It's not a terribly "interesting" class of planning problems, because very little work has been done on them compared with more expressive models such as contingent planning (partially observable, partial feedback).
There's been some work done on it in recent years and you can take a look at the benchmarks by Joerg Hoffmann over here: http://www.loria.fr/~hoffmanj/ff/cff-tests.tgz
A more interesting kind of "applications" of conformant planning is that of mapping the problem of designing a finite state controller into that of solving a conformant planning problem. You might want to check this paper:
http://www.dtic.upf.edu/~hgeffner/fsc-nectar-aaai-2010.pdf
I think there are some follow-ups to this.
Note that in the above, the problems are described in STRIPS, extended so as to represent uncertainty in the initial state.
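If what you are after is simply something executable, here is a toy sensorless problem of my own (not from the book or the benchmarks above): a robot in a 1-D corridor of 4 cells with Left/Right actions and no sensors, solved by breadth-first search over belief states, i.e. the algorithm from chapter 4.

```python
from collections import deque

N = 4                                   # corridor cells 0..3
ACTIONS = {"L": -1, "R": +1}

def step(cell, delta):
    return min(max(cell + delta, 0), N - 1)   # walls stop the robot

def update(belief, action):
    # Belief-state transition: apply the action to every possible cell
    return frozenset(step(c, ACTIONS[action]) for c in belief)

def sensorless_plan(goal):
    start = frozenset(range(N))         # total ignorance about the position
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        belief, plan = frontier.popleft()
        if belief == frozenset({goal}): # belief has collapsed onto the goal
            return plan
        for a in ACTIONS:
            nb = update(belief, a)
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, plan + [a]))

print(sensorless_plan(0))               # ['L', 'L', 'L'] works from any start cell
```

The same skeleton scales to any problem you can phrase as (states, actions, transition function); conformant planners differ mainly in how compactly they represent the belief states.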

Sentiment Analysis of given text

This topic has many threads, but I am posting another one anyway. Each existing post suggests a way to do sentiment analysis, but I haven't found one that works for me.
I want to implement sentiment analysis myself, so I would ask you to show me a way. From my research, I gather that a Bayesian algorithm is commonly used: count positive and negative words and compute the probability of the sentence being positive or negative using a bag-of-words model.
That covers only the words, though; I guess we have to do some language processing too. So, does anyone with more knowledge have guidance, with some algorithms and links for reference, so that I can implement this? Anything in particular that may help me in my analysis would be welcome.
Also, can you recommend a language to work with? Some say Java is comparatively slow, so they don't recommend working with it.
Any type of help is much appreciated.
First of all, sentiment analysis is done on various levels, such as document, sentence, phrase, and feature level. Which one are you working on? There are many different approaches to each of them. You can find a very good intro to this topic here. For machine-learning approaches, the most important element is feature engineering and it's not limited to bag of words. You can find many other useful features in different applications from the tutorial I linked. What language processing you need to do depends on what features you want to use. You may need POS-tagging if POS information is needed for your features for example.
For classifiers, you can try Support Vector Machines, Maximum Entropy, and Naive Bayes (probably as a baseline) and these are frequently used in the literature, about which you can also find a pretty comprehensive list in the link. The Mallet toolkit contains ME and NB, and if you use SVMlight, you can easily convert the feature formats to the Mallet format with a function. Of course there are many other implementations of these classifiers.
For rule-based methods, Pointwise Mutual Information is frequently used, along with various kinds of scoring-based methods.
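To make the PMI idea concrete, here is a toy sketch of Turney-style semantic orientation (corpus and seed words are made up): a word is scored by how much more it co-occurs with a positive seed word than with a negative one.

```python
import math

# A tiny made-up corpus, one set of words per document
docs = [
    {"superb", "excellent", "acting"},
    {"dull", "poor", "plot"},
    {"superb", "excellent", "plot"},
    {"boring", "poor", "acting"},
]

def pmi(word, seed, eps=1.0):
    n = len(docs)
    c_w = sum(word in d for d in docs)
    c_s = sum(seed in d for d in docs)
    c_ws = sum(word in d and seed in d for d in docs)
    # add-eps smoothing keeps unseen pairs from going to -infinity
    return math.log2((c_ws + eps) * n / ((c_w + eps) * (c_s + eps)))

def semantic_orientation(word):
    return pmi(word, "excellent") - pmi(word, "poor")

print(semantic_orientation("superb"))   # > 0: leans positive
print(semantic_orientation("boring"))   # < 0: leans negative
```

In the original work the co-occurrence counts come from web search hit counts rather than a tiny document list, but the scoring is the same.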
Hope this helps.
For text analysis there is no language stronger than SNOBOL. In SNOBOL4, a Fortran interpreter, for example, takes only 60 lines.
NLTK offers really good algorithms for sentiment analysis. It is open source, so you can have a look at the source code and check out the algorithms used. You can even download the NLTK book, which is free and has some good material on sentiment analysis.
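For instance, a minimal bag-of-words Naive Bayes baseline with NLTK could look like this (a sketch using the movie_reviews corpus that ships with NLTK's data packages):

```python
import random
import nltk
from nltk.corpus import movie_reviews

nltk.download("movie_reviews")

# Each document: (set of its words, 'pos' or 'neg')
docs = [(set(movie_reviews.words(fid)), cat)
        for cat in movie_reviews.categories()
        for fid in movie_reviews.fileids(cat)]
random.shuffle(docs)

# Binary presence features over the 2000 most frequent words
freq = nltk.FreqDist(w.lower() for w in movie_reviews.words())
top = [w for w, _ in freq.most_common(2000)]

def features(words):
    return {f"has({w})": (w in words) for w in top}

data = [(features(words), cat) for words, cat in docs]
train, test = data[200:], data[:200]

clf = nltk.NaiveBayesClassifier.train(train)
print(nltk.classify.accuracy(clf, test))    # typically somewhere around 0.7-0.8
clf.show_most_informative_features(10)
```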
Coming to your second point, I don't think Java is that slow. I have been coding in C++ for years but lately also started with Java; if you look around, a lot of very popular open-source projects like Lucene, Solr, Hadoop, and Neo4j are all written in Java.

State of the art in classification algorithms

We know there are like a thousand classifiers; recently I was told that some people say AdaBoost is the off-the-shelf one.
1. Are there better algorithms (with that voting idea)?
2. What is the state of the art in classifiers? Do you have an example?
First, adaboost is a meta-algorithm which is used in conjunction with (on top of) your favorite classifier. Second, classifiers which work well in one problem domain often don't work well in another. See the No Free Lunch wikipedia page. So, there is not going to be AN answer to your question. Still, it might be interesting to know what people are using in practice.
Weka and Mahout aren't algorithms... they're machine learning libraries. They include implementations of a wide range of algorithms. So, your best bet is to pick a library and try a few different algorithms to see which one works best for your particular problem (where "works best" is going to be a function of training cost, classification cost, and classification accuracy).
If it were me, I'd start with naive Bayes, k-nearest neighbors, and support vector machines. They represent well-established, well-understood methods with very different tradeoffs. Naive Bayes is cheap, but not especially accurate. K-NN is cheap during training but (can be) expensive during classification, and while it's usually very accurate it can be susceptible to overtraining. SVMs are expensive to train and have lots of meta-parameters to tweak, but they are cheap to apply and generally at least as accurate as k-NN.
If you tell us more about the problem you're trying to solve, we may be able to give more focused advice. But if you're just looking for the One True Algorithm, there isn't one -- the No Free Lunch theorem guarantees that.
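If you work in Python, scikit-learn (my suggestion, not mentioned above) makes exactly this bake-off cheap to run:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)      # a small built-in toy problem

for name, clf in [
    ("Naive Bayes", GaussianNB()),
    ("k-NN (k=5)", KNeighborsClassifier(n_neighbors=5)),
    ("SVM (RBF)", SVC(kernel="rbf", C=10, gamma=0.001)),
]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Swap in your own X and y and the ranking may well change, which is the No Free Lunch point again.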
Apache Mahout (open source, Java) seems to be picking up a lot of steam.
Weka is a very popular and stable machine learning library. It has been around for quite a while and is written in Java.
Hastie et al. (2013, The Elements of Statistical Learning) conclude that the gradient boosting machine is the best "off-the-shelf" method, independent of the problem you have.
Definition (see page 352):
An “off-the-shelf” method is one that can be directly applied to the data without requiring a great deal of time-consuming data preprocessing or careful tuning of the learning procedure.
And a bit older meaning:
In fact, Breiman (NIPS Workshop, 1996) referred to AdaBoost with trees as the “best off-the-shelf classifier in the world” (see also Breiman (1998)).
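Both are a one-liner to try these days; for example with scikit-learn (my addition, hyperparameters left at their defaults):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)   # a built-in binary task

for name, clf in [
    ("AdaBoost with trees", AdaBoostClassifier()),
    ("Gradient boosting", GradientBoostingClassifier()),
]:
    print(name, cross_val_score(clf, X, y, cv=5).mean().round(3))
```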
