I'm a beginner in Prolog, and I'm wondering how to generate normally distributed random numbers in Prolog.
What I know is that with maybe/1 from library(random) one can set up probabilities, but what about other random distributions?
In general, languages provide you with a uniform distribution over 0 to 1. There are various algorithms for getting from that uniform distribution to another distribution, but this case is particularly common so there are a few ways to do it.
If you need a modest number of random values from a normal distribution, the Box-Muller transform is a very simple algorithm; it amounts to a little math on a few uniform random values:
% random_normal(-N): N is a normally distributed random number,
% obtained via the Box-Muller transform from two uniform values.
random_normal(N) :-
    random(U1), random(U2),
    Z0 is sqrt(-2 * log(U1)) * cos(2*pi*U2),
    Z1 is sqrt(-2 * log(U1)) * sin(2*pi*U2),
    (N = Z0 ; N = Z1).
This algorithm consumes two uniform values and produces two normal values; I'm providing both as solutions. Other ways of doing this might be better for some applications. For instance, you could use asserta/1 and retract/1 to cache the second value and return it without recomputing, though messing around in the dynamic store may be about as costly as just doing the work again (you'd have to benchmark it). Here's the usage:
?- random_normal(Z).
Z = -1.2418135230345024 ;
Z = -1.1135242997982466.
?- random_normal(Z).
Z = 0.6266801862581797 ;
Z = -0.4934840828548163.
?- random_normal(Z).
Z = 0.5525713772053663 ;
Z = -0.7118660644436128.
I'm not greatly confident of this but it may get you over the hump.
If you are using SWI-Prolog or SWISH then another option would be to use embedded R, which gives you a lot of flexibility with stats and probabilities.
http://swish.swi-prolog.org/example/Rserve.swinb
The R project provides statistical computing and data visualization. SWISH can access R through Rserve.
Integrative statistics with R:
Real is a C-based interface for connecting R to Prolog. See the documentation at doc/html/real.html for more information. There is also a paper [1] and a user's guide in doc/guide.pdf.
Real works on current versions of SWI and YAP. As of version 1.1 there is support for using Real on SWI web-servers.
I have a program which traverses an expression tree that does algebra on probability distributions, either sampling or computing the resulting distribution.
I have two implementations computing the distribution: one (computeDistribution) nicely reusable with monad transformers and one (simpleDistribution) where I concretize everything by hand. I would like to not concretize everything by hand, since that would be code duplication between the sampling and computing code.
I also have two data representations:
type Measure a = [(a, Rational)]
-- data Distribution a = Distribution (Measure a) deriving Show
newtype Distribution a = Distribution (Measure a) deriving Show
When I use the data version with the reusable code, computing the distribution of 20d2 (ghc -O3 program.hs; time ./program 20 > /dev/null) takes about one second, which seems way too long. Pick higher values of n at your own peril.
When I use the hand-concretized code, or I use the newtype representation with either implementation, computing 20d2 (time ./program 20 s > /dev/null) takes the blink of an eye.
Why?
How can I find out why?
My knowledge of how Haskell is executed is almost nil. I gather there's a graph of thunks in basically the same shape as the program, but that's about all I know.
I figure with newtype the representation of Distribution is the same as that of Measure, i.e. it's just a list, whereas with the data version each Distribution is kinda' like a single-field record, except with a pointer to the contained list, and so the data version has to perform more allocations. Is this true? If true, is this enough to explain the performance difference?
I'm new to working with monad transformer stacks. Consider the Let and Uniform cases in simpleDistribution — do they do the same as the walkTree-based implementation? How do I tell?
Here's my program. Note that Uniform n corresponds to rolling an n-sided die (in case the unary-ness was surprising).
Update: based on comments I simplified my program by removing everything not contributing to the performance gap. I made two semantic changes: probabilities are now denormalized and all wonky and wrong, and the simplification step is gone. But the essential shape of my program is still there. (See question edit history for the non-simplified program.)
Update 2: I made further simplifications, reducing Distribution down to the list monad with a small twist, removing everything to do with probabilities, and shortening the names. I still observe large performance differences when using data but not newtype.
import Control.Monad (liftM2)
import Control.Monad.Trans (lift)
import Control.Monad.Reader (ReaderT, runReaderT)
import System.Environment (getArgs)
import Text.Read (readMaybe)
main = do
  args <- getArgs
  let dieCount = case map readMaybe args of Just n : _ -> n; _ -> 10
  let f = if ["s"] == (take 1 $ drop 1 $ args) then fast else slow
  print $ f dieCount
fast, slow :: Int -> P Integer
fast n = walkTree n
slow n = walkTree n `runReaderT` ()
walkTree 0 = uniform
walkTree n = liftM2 (+) (walkTree 0) (walkTree $ n - 1)
data P a = P [a] deriving Show
-- newtype P a = P [a] deriving Show
class Monad m => MonadP m where uniform :: m Integer
instance MonadP P where uniform = P [1, 1]
instance MonadP p => MonadP (ReaderT env p) where uniform = lift uniform
instance Functor P where fmap f (P pxs) = P $ fmap f pxs
instance Applicative P where
  pure x = P [x]
  (P pfs) <*> (P pxs) = P $ pfs <*> pxs
instance Monad P where
  (P pxs) >>= f = P $ do
    x <- pxs
    case f x of P fxs -> fxs
How can I find out why?
This is, in general, hard.
The extreme way to do it is to look at the core code (which you can produce by running GHC with -ddump-simpl). This can get complicated really quickly, and it's basically a whole new language to learn. Your program is already big enough that I had trouble learning much from the core dump.
The other way to find out why is to just keep using GHC and asking questions and learning about GHC optimizations until you recognize certain patterns.
Why?
In short, I believe it's due to list fusion.
NOTE: I don't know for sure that this answer is correct, and it would take more time/work to verify than I'm willing to put in right now. That said, it fits the evidence.
First off, we can check whether the slowdown you're seeing comes from something truly fundamental or from a GHC optimization triggering or not, by compiling with -O0, that is, without optimizations. In this mode, both Distribution representations result in about the same (excruciatingly long) runtime. This leads me to believe that it's not the data representation that is inherently the problem, but rather that there's an optimization triggered by the newtype version that isn't triggered by the data version.
When GHC is run in -O1 or higher, it engages certain rewrite rules to fuse different folds and maps of lists together so that it doesn't need to allocate intermediate values. (See https://markkarpov.com/tutorial/ghc-optimization-and-fusion.html#fusion for a decent tutorial on this concept as well as https://stackoverflow.com/a/38910170/14802384 which additionally has a link to a gist with all of the rewrite rules in base.) Since computeDistribution is basically just a bunch of list manipulations (which are all essentially folds), there is the potential for these to fire.
The key is that with the newtype representation of Distribution, the newtype wrapper is erased during compilation, and the list operations are allowed to fuse. However, with the data representation, the wrappers are not erased, and the rewrite rules do not fire.
Therefore, I will make an unsubstantiated claim: If you want your data representation to be as fast as the newtype one, you will need to set up rewrite rules similar to the ones for list folding but that work over the Distribution type. This may involve writing your own special fold functions and then rewriting your Functor/Applicative/Monad instances to use them.
My current understanding:
I have tried reading a few papers and links about NMF. They all talk about how we can split an M×N matrix into an M×R and an R×N matrix (where R is smaller than M and N).
Question:
I have a list of users (U) and some assignments (A) for each user. Now I split this matrix (U×A) using NMF and get two matrices, U×R and R×A. How do I use these to predict which assignments (A') a new user (U') should have?
Any help would be appreciated as I couldn't understand this after trying to search for the answer.
Side question and opinion based:
Also, if anyone can tell me from their experience how they choose R, especially when the number of assignments is on the order of 50,000 or perhaps a hundred thousand. I have been trying this with the scikit-learn library.
Edit:
This can simply be done using model.inverse_transform(model.transform(User'))
You can think of this problem as a recommender system: you want to approximately decompose the matrix X into two nonnegative matrices, W and H.
see https://cambridgespark.com/content/tutorials/implementing-your-own-recommender-systems-in-Python/index.html
For Python's scikit-learn, you can use:
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html
from sklearn.decomposition import NMF
model = NMF(n_components=2, init='random', random_state=0)
W = model.fit_transform(X)
H = model.components_
where X is the matrix you want to decompose; W and H are the nonnegative factors.
To predict the assignments (A') of a new user (U'), you use the product WH to complete the matrix.
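As the question's edit points out, scikit-learn can also do this directly with model.transform and model.inverse_transform. Here is a minimal sketch of both routes; the toy matrix, the new-user row, and the variable names are made up for illustration:
import numpy as np
from sklearn.decomposition import NMF

# Toy user x assignment matrix (rows are users, columns are assignments).
X = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

model = NMF(n_components=2, init='random', random_state=0)
W = model.fit_transform(X)   # U x R
H = model.components_        # R x A

# Route 1: complete the known matrix with the low-rank product W H.
X_completed = W @ H

# Route 2: project a new (possibly partial) user row into the latent space
# and reconstruct it, as in the question's edit.
new_user = np.array([[3, 0, 0, 2]], dtype=float)   # hypothetical U'
predicted = model.inverse_transform(model.transform(new_user))
print(predicted)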
Is there a way to make this work?
add(X, X + 1)
input: add(1, Y).
expected output: Y = 2.
output: Y = 1+1.
Or is it only possible by doing this?
add(X, Y):- Y is X+1.
Historically, there have been many attempts to provide this functionality. Let me give as early examples CLP(ℜ) (about 1986) and, more recently, Prolog IV. However, sooner or later, one realizes that a programmer needs finer control over the kind of unification that is employed. Take as an example a program that wants to differentiate a formula: in that situation an interpreted functor would not be of any use. For this reason most constraints today ship as added predicates, leaving functors otherwise uninterpreted. In this manner they also fit into ISO Prolog, which permits constraints as extensions.
From a programmer's viewpoint, an extension such as yours would reduce the number of auxiliary variables needed; however, it would also require interpreting all terms to this end, which produces a lot of extra overhead.
?- X in 1..100.
X in 1..100.
?- X #> 0, X #< 101.
X in 1..100.
So far so good!
Now let's imagine that the sup bound of X depends on Y in 100..200.
?- Y in 100..200, X #> 0, X #=< Y.
Y in 100..200,
Y#>=X,
X in 1..200.
The domain of X propagates correctly to 1..200, with the constraint between X and Y.
But…
?- Y in 100..200, X in 1..Y.
ERROR: Arguments are not sufficiently instantiated
I thought that in was equivalent to the inequalities declaration, but it apparently is not.
Is there any hidden reason as to why in requires that both bounds are ground?
That's a very good question. It has a straight-forward answer that may at first appear deep, yet is really easy to understand once you are familiar with declarative debugging approaches that are beginning to become more widely available.
Declarative debugging relies on properties of logic programs that we know always hold. A key property of pure logic programs (see logical-purity) is monotonicity:
Generalizing a goal can at most increase the set of solutions.
Here is an example:
?- append([a], [b], [c]).
false.
Why does this fail? We want to find an actual cause of failure, an explanation why this does not succeed. It is tempting to think in procedural terms and trace the execution, but this quickly gets extremely complex and does not really show us the essence of the reason.
Instead, we can get to the essence by generalizing the goals systematically. For example, the following succeeds:
?- append([a], [b], Cs).
I have generalized the query by replacing the term [c] with Cs, which subsumes it.
You can do this automatically, see Ulrich Neumerkel's library(diadem) and especially the GUPU system where these ideas are fully developed and help tremendously to locate the precise location of mistakes in Prolog programs.
To apply such techniques to their utmost extent, you must aim to preserve as many interesting properties as possible in your programs, essentially by staying in the pure monotonic core of Prolog. I say "essentially" because these techniques also work somewhat beyond this subset on a class of programs that is very hard to classify formally. Note also that much of the ongoing work on logic programming is aimed towards increasing the pure monotonic subset of Prolog as far as possible, so the restriction becomes less and less severe over time.
Consider now the specifics of (in)/2: The domain argument of (in)/2 carries within it a declarative flaw in that it uses a defaulty representation of domains. There are several ways to see this, and notably from the following situation: Suppose I tell you that a domain has the form inf..High, then what is High? It can be either of:
sup
an integer.
The fact that we cannot clearly distinguish these cases tells you that this representation is defaulty.
A clean representation of boundaries would look like this:
inf, sup to denote the infinities
n(N) to denote the integer N.
With such a representation, I could tell you that the domain looks like this: inf..n(N), and you know that N must be an integer.
Now another example:
?- X in 1..0.
false.
Why does this fail? Let us again generalize the arguments to find a reason:
?- X in Low..0.
Suppose now that (in)/2 accepted this as an admissible query. Then what is Low? Again, it can be either an integer, or the atom inf. Unfortunately, there is no sensible way to constrain Low in this way: Constraining it to integers would make CLP(FD) constraints most appropriate, making it tempting to answer with:
X in Low..0,
Low in inf..0
but that is of course too much, because it precludes Low = inf:
?- Low in inf..0, Low = inf.
Type error: `integer' expected, found `inf' (an atom)
Knowing that we cannot decide the type of the term Low in such situations, and lacking a sensible way to state this, an instantiation error is raised to indicate that such a generalization is inadmissible.
From these considerations, the overall answer is really simple:
(in)/2 cannot be easily generalized due to the defaulty representation of domains.
Invariably, there are only two reasons for using such representations:
they are easier to type
they are easier to read.
The hefty drawback of course is that they stand massively in the way of declarative reasoning over your programs.
Note that internally, the CLP(FD) system of SWI-Prolog does use a clean representation of domains. This can also be carried over to the user side, so that we could write:
?- X in n(Low)..0.
with the declaratively perfectly valid answer:
X in n(Low)..0,
Low in inf..0.
Over time, such cleaner notations will definitely find their way to users. For now, it would be a bit too much to ask, at least in my experience.
In the specific examples you cite, the domain boundary is already constrained to integers, so of course we can distinguish the cases and translate this automatically to inequalities. However, this would not remove the core problem of defaultyness, and exchanging the goals would still raise an instantiation error.
I am a mechanical engineer with a computer science question. This is an example of what the equations I'm working with are like:
x = √((y-z)×2/r)
z = f×(L/D)×(x/2g)
f = something crazy with x in it
etc…(there are more equations with x in it)
The situation is this:
I need r to find x, but I need x to find z. I also need x to find f which is a part of finding z. So I guess a value for x, and then I use that value to find r and f. Then I go back and use the value I found for r and f to find x. I keep doing this until the guess and the calculated are the same.
My question is:
How do I get the computer to do this? I've been using mathcad, but an example in another language like C++ is fine.
The very first thing you should do when faced with an iterative algorithm is to write down on paper the sequence that will result from your idea, e.g.:
x_0 = ..., f_0 = ..., r_0 = ...
x_1 = ..., f_1 = ..., r_1 = ...
...
x_n = ..., f_n = ..., r_n = ...
Now, you have an idea of what you should implement (even if you don't know how). If you don't manage to find a closed form expression for one of the x_i, r_i or whatever_i, you will need to solve one dimensional equations numerically. This will imply more work.
Now, for the implementation part: if you have never written a program, you should seriously ask someone live who can help you (or hire an intern and have them write the code). We cannot help you start from scratch with, e.g., C programming, but we are willing to help you with the specific problems that arise when you write the program.
Please note that your algorithm is not guaranteed to converge, even if you strongly think there is a unique solution. Solving nonlinear equations is a difficult subject.
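If it helps to see that sequence materialize, here is a tiny sketch that just prints successive iterates of a hypothetical one-variable update x_{n+1} = g(x_n); the function g is a placeholder standing in for "substitute all the other equations and solve for the next x":
# Hypothetical update: everything except x has been substituted into a single
# function g, so the scheme is simply x_{n+1} = g(x_n).
def g(x):
    return (2.0 + x) ** 0.5   # placeholder formula, not from the question

x = 1.0                       # x_0: the initial guess
for n in range(10):
    print(f"x_{n} = {x}")
    x = g(x)
Watching the printed values makes it obvious whether the iterates settle down or wander off, which is exactly the convergence concern raised above.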
It appears that mathcad has many abstractions for iterative algorithms without the need to actually implement them directly using a "lower level" language. Perhaps this question is better suited for the mathcad forums at:
http://communities.ptc.com/index.jspa
If you are using Mathcad, it has this functionality built in. It is called a solve block.
Start with the keyword "given"
Given
define the guess values for all unknowns
x:=2
f:=3
r:=2
...
define your constraints
x = √((y-z)×2/r)
z = f×(L/D)×(x/2g)
f = something crazy with x in it
etc…(there are more equations with x in it)
calculate the solution
find(x, y, z, r, ...)=
Check Mathcad help or Quicksheets for examples of the exact syntax.
The simple answer to your question is this pseudo-code:
X = startingX;
lastF = Infinity;
F = 0;
tolerance = 1e-10;
while ((lastF - F)^2 > tolerance)
{
lastF = F;
X = ?;
R = ?;
F = FunctionOf(X,R);
}
This may not do what you expect at all. It may give a valid but nonsense answer or it may loop endlessly between alternate wrong answers.
This is standard substitution to convergence. There are more advanced techniques like DIIS but I'm not sure you want to go there. I found this article while figuring out if I want to go there.
In general, it really pays to think about how you can transform your problem into an easier problem.
In my experience it is better to pose your problem as a univariate bounded root-finding problem and use Brent's method if you can.
Next worst option is multivariate minimization with something like BFGS.
Iterative solutions are horrible, but are more easily solved once you think of them as X2 = f(X1) where X is the input vector and you're trying to reduce the difference between X1 and X2.
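To make the root-finding suggestion concrete, here is a minimal sketch using SciPy's brentq; the function g, the residual h, and the bracketing interval are hypothetical stand-ins for your actual equations:
from scipy.optimize import brentq

# Suppose substituting all the other equations into the one for x leaves a
# single fixed-point equation x = g(x). Rewrite it as h(x) = g(x) - x = 0
# and hand it to a bracketing root finder.
def g(x):
    return 2.0 + 0.5 * x ** 0.5   # placeholder, not from the question

def h(x):
    return g(x) - x

# The bracket [a, b] must be chosen so that h(a) and h(b) have opposite signs.
root = brentq(h, 0.1, 10.0)
print(root)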
As the commenters have noted, the mathematical aspects of your question are beyond the scope of the help you can expect here, and are even beyond the help you could be offered based on the detail you posted.
However, I think that even if you understood the mathematics thoroughly there are computer science aspects to your question that should be addressed.
When you write your code, try to organize it into functions that depend only upon the parameters you pass in. So write a subroutine that takes in values for y, z, and r and returns x. Make another that takes in f, L, D, and G and returns z. Now you have testable routines that you can check to make sure they are computing correctly. Check the input values inside your routines; for instance, in computing x you will get a divide-by-zero error if you pass in 0 for r. Think about how you want to handle this.
If you are going to solve this problem iteratively, you will need a method that decides, based on the results of one iteration, what the values for the next iteration will be. This should also be encapsulated in a subroutine. Now if you are using a language that allows only one value to be returned from a subroutine (which is the case in most common computation languages: C, C++, Java, C#), you need to package all your variables into some kind of data structure to return them. You could use an array of reals or doubles, but it would be nicer to make an object so you can reference the variables by name rather than position (less chance of error).
Another aspect of iteration is knowing when to stop. Certainly you'll do so when you get a solution that converges. Make this decision into another subroutine. Now when you need to change the convergence criteria there is only one place in the code to go to. But you need to consider other reasons for stopping - what do you do if your solution starts diverging instead of converging? How many iterations will you allow the run to go before giving up?
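Here is a minimal sketch of that organization in Python, built around the equations in the question. The form of f ("something crazy with x in it"), the value of g, and the numbers passed to solve are all placeholders:
import math

G = 9.81  # gravitational acceleration; assumed value for illustration

def compute_x(y, z, r):
    # x = sqrt((y - z) * 2 / r); raises an error if r == 0 or (y - z) / r < 0
    return math.sqrt((y - z) * 2.0 / r)

def compute_f(x):
    # placeholder for "something crazy with x in it"
    return 0.02 * (1.0 + 1.0 / x)

def compute_z(f, L, D, x):
    # z = f * (L / D) * (x / (2 g)), as written in the question
    return f * (L / D) * (x / (2.0 * G))

def converged(z_old, z_new, tol=1e-8):
    # the stopping criterion lives in exactly one place
    return abs(z_new - z_old) < tol

def solve(y, r, L, D, z_guess=0.0, max_iter=100):
    z = z_guess
    for i in range(max_iter):
        x = compute_x(y, z, r)
        f = compute_f(x)
        z_new = compute_z(f, L, D, x)
        if converged(z, z_new):
            return x, z_new, i + 1   # return everything the caller needs at once
        z = z_new
    raise RuntimeError("did not converge; the iteration may be diverging")

print(solve(y=10.0, r=4.0, L=50.0, D=0.1))
Because each piece is its own function, you can test compute_x and compute_z in isolation, and changing the convergence test or the iteration limit touches only one routine.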
Another aspect of iteration on a computer is round-off error. Mathematically, 10^40/10^38 is 100. Mathematically, 10^20 + 1 > 10^20. These statements are not true in most floating-point computations. Your calculations may need to take this into account or you will end up with numbers that are garbage. This is an example of a cross-cutting concern that does not lend itself to encapsulation in a subroutine.
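A quick way to see this for yourself, assuming IEEE double-precision floats (what Python, C, and most engineering tools use underneath):
# Doubles carry only about 15-16 significant decimal digits.
print(1e20 + 1 == 1e20)   # True: the +1 is below the precision available at 1e20
print(0.1 + 0.2 == 0.3)   # False: classic decimal round-off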
I would suggest that you go look at the Python language, and the pythonxy.com extensions. There are people in the associated forums that would be a good resource for helping you learn how to do iterative solving of a system of equations.