Could anybody explain independence in Bayesian nets?

Could anybody explain conditional independence to me in the following cases? Could you give me any other appropriate examples for each case?

The first and third examples fall under the rule that says: once all of a variable's parents are known, the variable only needs to care about its own descendants and is conditionally independent of every other (non-descendant) variable.
In the first example the random variable JohnCalls (child) is conditionally independent of the random variable Burglary (grandparent). That means that if we know the state of the random variable Alarm (parent), JohnCalls will behave accordingly regardless of whether there was a burglary or not.
A similar example would be WasPartying -> HomeworkWasntCompleted -> ReceivedBadGrade. Here, regardless of whether you were partying or not, if the homework wasn't completed (the parent is known), you are going to receive a bad grade. So if we already have the value of HomeworkWasntCompleted, learning the value of WasPartying doesn't give us any new information about ReceivedBadGrade.
In the third example it's the same: if we know that the Alarm is on, MaryCalls won't give us any new hint about JohnCalls, so JohnCalls is conditionally independent of MaryCalls given the value of Alarm.
The second example is a little bit trickier. Although we know all the parents of Burglary (trivially, since it doesn't have any), we can't say that Burglary is conditionally independent of Earthquake given Alarm. If we know that the Alarm is on and we then learn that there was an Earthquake, we would guess that the Alarm was triggered by the Earthquake, and the chance of a Burglary becomes considerably lower. So in this case Earthquake does give us information about Burglary. This example doesn't fall under the rule described above, because the two variables being tested for conditional independence share a common child (Alarm); conditioning on that child makes them dependent, which is often called "explaining away".
A similar example would be WasPartying -> HomeworkWasntCompleted <- DidntUnderstandTopic (pay attention to the arrow directions).
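To see the two claims numerically, here is a minimal sketch in plain Python on a tiny, made-up parameterisation of the burglary network (all the probability numbers below are my own illustrative assumptions, not part of the original question):

import itertools

p_b = 0.01          # P(Burglary = 1), an assumed number
p_e = 0.02          # P(Earthquake = 1), an assumed number

def p_a(b, e):      # P(Alarm = 1 | Burglary = b, Earthquake = e)
    return {(0, 0): 0.001, (0, 1): 0.29, (1, 0): 0.94, (1, 1): 0.95}[(b, e)]

def p_j(a):         # P(JohnCalls = 1 | Alarm = a)
    return 0.90 if a else 0.05

def joint(b, e, a, j):
    # Joint probability factorised along the network: B -> A <- E, A -> J.
    return ((p_b if b else 1 - p_b)
            * (p_e if e else 1 - p_e)
            * (p_a(b, e) if a else 1 - p_a(b, e))
            * (p_j(a) if j else 1 - p_j(a)))

def prob(query, given):
    # P(query | given), where both are dicts like {'b': 1, 'a': 1}.
    num = den = 0.0
    for b, e, a, j in itertools.product([0, 1], repeat=4):
        world = {'b': b, 'e': e, 'a': a, 'j': j}
        w = joint(b, e, a, j)
        if all(world[k] == v for k, v in given.items()):
            den += w
            if all(world[k] == v for k, v in query.items()):
                num += w
    return num / den

# Chain: given Alarm, also learning Burglary leaves P(JohnCalls) essentially unchanged (0.9).
print(prob({'j': 1}, {'a': 1}), prob({'j': 1}, {'a': 1, 'b': 1}))

# Explaining away: given Alarm, learning Earthquake lowers P(Burglary)
# (roughly 0.58 versus 0.03 with these made-up numbers).
print(prob({'b': 1}, {'a': 1}), prob({'b': 1}, {'a': 1, 'e': 1}))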
Here you can find a nice lecture about conditional independence.

Prover9 "Some, but not all, of the requested proofs were found"

I'm running some lattice proofs through Prover9/Mace4. Prover9 is saying Exit: Time limit, plus the message in the title.
I've doubled the time limit from 60 to 120 seconds. Same message (in twice the time). The weird thing is:
- there's only one statement to prove; that is, only one label(goal) in the report (so what's with the "but not all"?)
- it does seem to have completed the proof, in that it shows a last line of $F.
- Mace4 can't find any counter-examples (I upped its time to 120 seconds).
- I've found some GHits for that message, but they seem to be all in Chinese(?)
It's possible the axioms I've given are (mutually) recursive -- I'm trying to introduce a function and a nominated 'absorbing element' [**]; and that solving will need infinitary unification. Does Prover9 do that?
I'm happy to add the axioms and goal to this message. (I'm using a non-standard way to define the meet and join.) But first, are there any sanity checks I should go through?
[**] the absorbing element is neither lattice top nor lattice bottom; more like lattice left-corner. (The element will be lattice bottom just in case the lattice degenerates to two elements.) The function is a partial ordering 'at right angles' to top/bottom. The lattice I expect to be neither complemented nor distributive (again except when 2 elements).
I've reproduced this after much trying, but only by setting some strange option that I'm sure I wouldn't have touched. (The only option I usually change is the Time limit, and I Reset to defaults quite often, so that would have blatted any evidence.)
Here's my guess for what happened.
what's with the but not all?
You can enter multiple goals (providing they're all positive). [**]
With strange option settings, if Prover9 can prove the first but not the second, it'll keep trying until exhausted; but then only report the successful one -- with a $F. result OK.
If you double the Time limit, it'll still prove the first and still keep on trying for the second -- taking twice the time for the same outcome.
Mace4 will come across the first goal, and use up its time trying for a counter-example. There isn't one because it's provable. Again, doubling its Time limit will get the same outcome after twice as long.
[Note **] It's never that I intend to set multiple goals; but when I'm hacking/experimenting with axioms, I keep all the goals in the Goals: box so I can easily toggle un/comment. I guess I didn't comment-out one when I was uncommenting another.
The usual behaviour, as described in the manual, is that Prover9 reports success at the first goal it proves and doesn't go on to other goals. If there are multiple provable goals, it seems to choose the easiest/quickest(?) irrespective of their position in the file.
But with max_proofs set to more than the default 1, Prover9 will keep trying. (There's also an auto_denials flag that has something to do with it that I don't understand.)
I've no idea how I set max_proofs -- I didn't recognise the Options/Limits sub-screen when I eventually found it. Weird.

About definiteness of definition of algorithm?

I'm reading a note about the definition of an algorithm; it gives two requirements, and I don't know what the difference between them is.
Definiteness: Every instruction should be clear and unambiguous. (I found a source with exactly the same statement.)
From the resource I have there are 5 requirements: Input, Output, Definiteness, Finiteness, Effectiveness. I can understand the other four, but not Definiteness. Can anyone provide a better definition if the above one is not precise?
From the above I only suspect that there are at least two subtleties that should be considered...
Conclusion from the answers below: definiteness = defined (clear) + only one meaning (unambiguous).
An algorithm should be clear and unambiguous. Each of its steps (or phases), and their inputs/outputs, should be clear and must lead to only one meaning.
For example, if one step is to add two integers, we must define both “integers” as well as the “add” operation: we cannot for example use the same symbol to mean addition in one place and multiplication somewhere else.
If presented to an educated human, the text should allow him to simulate execution by hand in exactly the way you had in mind (same steps taken, same results obtained).
When you don't quite understand the definition of a term provided by some author, it's often helpful to look for other definitions of it. I especially like the one for "definite" from wiktionary.org:
Free from any doubt.
In this context, clear becomes understandable, and unambiguous becomes with a single meaning.
It just means that instructions in an algorithm should have one and only one interpretation. Moreover, the interpretation should be obvious.
A statement like "Repeat steps 1 to 4 a few times" does not fit the criteria, as "a few times" can mean a different number of repetitions to different people.
On the other hand, a statement like "Repeat steps 1 to 4 until x is equal to y", where x and y are some parameters of the algorithm, is indeed clear and unambiguous.
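To make the contrast concrete, here is a tiny sketch of my own (not from the answer above) showing that the definite version translates directly into code, while "a few times" cannot be written down at all:

# "Repeat steps 1 to 4 until x is equal to y", expressed as code: the stopping
# condition has exactly one interpretation. The increment below is just a
# placeholder for whatever steps 1 to 4 actually do.
def repeat_until_equal(x, y):
    while x != y:
        x = x + 1      # the repeated step(s) would go here
    return x

print(repeat_until_equal(0, 5))   # prints 5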

Prolog unknowns in the knowledge base

I am trying to learn Prolog, and it seems the completeness of the knowledge base is very important, because obviously if the knowledge base does not have a fact, or a fact is incorrect, that will affect the query results. I am wondering how best to handle unknown details of a fact. For example,
%life(<name>,<birth year>,<death year>)
%ruler(<name>,<precededBy>,<succeededBy>)
Some people I add to the knowledge base would still be alive, therefore their year of death is not known. In the example of rulers, the first ruler did not have a predecessor and the current ruler does not have a successor. When there are such unknowns, should I put in some kind of 'unknown' flag value, or can the detail be left out? In the case of the ruler whose predecessor is not known, would the fact look like this?
ruler(great_ruler,,second_ruler).
Well, you have a few options.
In this particular case, I would question your design. Rather than putting both previous and next on the ruler, you could just put next and use a rule to find the previous:
ruler(great_ruler, second_ruler).
ruler(second_ruler, third_ruler).
previous(Ruler, Previous) :- ruler(Previous, Ruler).
This predicate will simply fail for great_ruler, which is probably appropriate—there wasn't anyone before them, after all.
In other cases, it may not be straightforward. So you have to decide if you want to make an explicit value for unknown or use a variable. Basically, do you want to do this:
ruler(great_ruler, unknown, second_ruler).
or do you want to do this:
ruler(great_ruler, _, second_ruler).
In the first case, you might get spurious answers featuring unknown, unless you write some custom logic to catch it. But I actually think the second case is worse, because that empty variable will unify with anything, so lots of queries will produce weird results:
ruler(_, SucceededHimself, SucceededHimself)
will succeed, for instance, unifying SucceededHimself = second_ruler, which probably isn't what you want. You can check for variables using var/1 and ground/1, but at that point you're tampering with Prolog's search and it's going to get more complex. So a blank variable is not as much like NULL in SQL as you might want it to be.
In summary:
- prefer representations that do not lead to this problem
- if forced, use a special value

How to recognize variables that don't affect the output of a program?

Sometimes the value of a variable accessed within the control flow of a program cannot possibly have any effect on its output. For example:
global var_1
global var_2

start program hello(var_3, var_4)
    if (var_2 < 0) then
        save-log-to-disk (var_1, var_3, var_4)
    end-if
    return ("Hello " + var_3 + ", my name is " + var_1)
end program
Here only var_1 and var_3 have any influence on the output, while var_2 and var_4 are only used for side effects.
Do variables such as var_1 and var_3 have a name in dataflow-theory/compiler-theory?
Which static dataflow analysis techniques can be used to discover them?
References to academic literature on the subject would be particularly appreciated.
The problem that you stated is undecidable in general,
even for the following very narrow special case:
Given a single routine P(x), where x is a parameter of type integer. Is the output of P(x) independent of the value of x, i.e., does
P(0) = P(1) = P(2) = ...?
We can reduce the following (still undecidable) version of the halting problem to the question above: given a Turing machine M(), does M() never stop on the empty input?
I assume that we use a (Turing-complete) language in which we can build a "Turing machine simulator":
Given the program M(), construct this routine:
P(x):
    if x == 0:
        return 0
    run M() for x steps
    if M() has terminated then:
        return 1
    else:
        return 0
Now:
P(0) = P(1) = P(2) = ...  =>  M() does not terminate,
and conversely:
M() does terminate  =>  P(x) = 1 for a sufficiently large x  =>  P(x) != P(0) = 0.
So, it is very difficult for a compiler to decide whether a variable actually does not influence the return value of a routine; in your example, the "side effect routine" might manipulate one of its values (or even loop infinitely, which would most definitely change the return value of the routine ;-)
Of course, over-approximations are still possible. For example, one might conclude that a variable does not influence the return value if it does not appear in the routine body at all. You can also see some classical compiler analyses (like expression simplification or constant propagation) as having the side effect of eliminating appearances of such redundant variables.
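As a toy illustration of that crude over-approximation (a sketch of my own, using Python's ast module rather than any real compiler framework), flag a parameter as possibly irrelevant only if its name never appears in the function body at all:

import ast

# Hypothetical source resembling the hello() example from the question.
source = """
def hello(var_3, var_4):
    return "Hello " + var_3
"""

func = ast.parse(source).body[0]
# Collect every name that occurs anywhere in the body.
names_in_body = {node.id
                 for stmt in func.body
                 for node in ast.walk(stmt)
                 if isinstance(node, ast.Name)}
for arg in func.args.args:
    status = "used" if arg.arg in names_in_body else "possibly irrelevant"
    print(arg.arg, status)
# var_3 used, var_4 possibly irrelevant

Note this says nothing about variables that do appear in the body but still cannot affect the output; that is exactly where the undecidability above bites.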
Pachelbel has discussed the fact that you cannot do this perfectly. OK, I'm an engineer, I'm willing to accept some dirt in my answer.
The classic way to answer your question is to do dataflow tracing from program outputs back to program inputs. A dataflow is the connection from a program assignment (or side effect) that produces a variable's value to a place in the application that consumes that value.
If there is (transitive) dataflow from an input you supplied (e.g., var_2) to a program output that you care about (in your example, the printed text stream), then that input "affects" the output. A variable whose value does not flow to your desired output is useless from your point of view.
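To make that concrete, here is a minimal sketch of the transitive-dataflow check in Python, over a hand-made def-use graph for the hello() example (the edge set and the sink names return_value and log_file are my own reading of the pseudocode, not the result of a real analysis):

# An edge u -> v means "the value of u flows into v".
flows_into = {
    "var_1": {"return_value", "log_file"},
    "var_2": set(),                 # only controls whether the log is written
    "var_3": {"return_value", "log_file"},
    "var_4": {"log_file"},
}

def affects(source, sink, graph):
    # Is there a (transitive) dataflow path from source to sink?
    seen, stack = set(), [source]
    while stack:
        node = stack.pop()
        if node == sink:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return False

for v in ("var_1", "var_2", "var_3", "var_4"):
    print(v, affects(v, "return_value", flows_into))
# var_1 True, var_2 False, var_3 True, var_4 False -- matching the question.

Note that var_2 influences whether the log gets written only through a control dependence, which a pure dataflow trace like this one does not capture.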
If you focus your attention only on the computations involved in the dataflows, and display them, you get what is generally called a "program slice". There are (very few) commercial tools that can show this to you.
Grammatech has a good reputation here for C and C++.
There are standard compiler algorithms for constructing such dataflow graphs; see any competent compiler book.
They all suffer from some limitation due to Turing's impossibility proofs, as pointed out by Pachelbel. When you implement such a dataflow algorithm, there will be places where it cannot know the right answer; simply pick one.
If your algorithm chooses to answer "there is no dataflow" in certain places where it is not sure, then it may miss a valid dataflow and incorrectly report that a variable does not affect the answer (this is called a "false negative"). This occasional error may be satisfactory if the algorithm has some other nice properties, e.g., it runs really fast on millions of lines of code. (The trivial algorithm simply says "no dataflow" in all places, and it is really fast :)
If your algorithm chooses to answer "yes, there is a dataflow", then it may claim that some variable affects the answer when it does not (this is called a "false positive").
You get to decide which is more important; many people prefer false positives when looking for a problem, because then you have to at least look at possibilities detected by the tool. A false negative means it didn't report something you might care about. YMMV.
Here's a starting reference: http://en.wikipedia.org/wiki/Data-flow_analysis
Any of the books on that page will be pretty good. I have Muchnick's book and like it a lot. See also this page: http://en.wikipedia.org/wiki/Program_slicing
You will discover that implementing this is a pretty big effort for any real language. You are probably better off finding a tool framework that does most or all of this for you already.
I use the following algorithm: a variable is used if it is a parameter or it occurs anywhere in an expression, excluding as the LHS of an assignment. First, count the number of uses of all variables. Delete unused variables and assignments to unused variables. Repeat until no variables are deleted.
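Here is a minimal sketch of that counting loop in Python, on a toy flat representation of a program as a list of (target, variables-read) assignments (that representation and the outputs parameter are my own assumptions for illustration, not the OP's language):

def eliminate_unused(program, parameters, outputs):
    changed = True
    while changed:                      # repeat until no variables are deleted
        changed = False
        used = set(parameters) | set(outputs)
        for _target, reads in program:  # a variable is "used" if it occurs in any RHS
            used.update(reads)
        kept = [(t, r) for (t, r) in program if t in used]
        if len(kept) != len(program):   # dropping an assignment may orphan more variables
            program = kept
            changed = True
    return program

prog = [("a", ["x"]), ("b", ["a"]), ("c", ["b"]), ("result", ["x"])]
print(eliminate_unused(prog, parameters=["x"], outputs=["result"]))
# -> [('result', ['x'])]  (a, b and c are deleted over successive passes)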
This algorithm only implements a subset of the OP's requirement, and it is horribly inefficient because it requires multiple passes. A garbage-collection approach may be faster but is harder to write: my algorithm only requires a list of variables with usage counts. Each pass is linear in the size of the program. The algorithm effectively does a limited kind of dataflow analysis by eliminating the tail of a flow that ends in an assignment.
For my language, the elimination of side effects in the RHS of an assignment to an unused variable is mandated by the language specification; it may not be suitable for other languages. Effectiveness is improved by running it before inlining, to reduce the cost of inlining unused function applications, and then running it again afterwards, which eliminates parameters of inlined functions.
Just as an example of the utility of the language specification: the library constructs a thread pool and assigns a pointer to it to a global variable. If the thread pool is not used, the assignment is deleted, and hence the construction of the thread pool is elided.
IMHO compiler optimisations are almost invariably heuristics whose performance matters more than their effectiveness at achieving a theoretical goal (like removing unused variables). Simple reductions are useful not only because they're fast and easy to write, but because a programmer who understands the basics of how the compiler operates can leverage that knowledge to help the compiler. The best-known example of this is probably the refactoring of recursive functions to put the recursion in tail position: a pointless exercise unless the programmer knows the compiler can do tail-recursion optimisation.

How to get variable/function definitions set in Parallel (e.g. with ParallelMap)?

I have a function that I use to look up a value based on an index. The value takes some time to calculate, so I want to do it with ParallelMap, and it references another, similar function that returns a list of expressions, also based on an index.
However, when I set it all up in a seemingly reasonable fashion, I see some very bizarre behaviour. First, I see that the function appears to work, albeit very slowly. For large indexes, however, the processor activity in Taskmangler stays entirely at zero for an extended period of time (i.e. 2-4 minutes) during which all instances of Mathematica are seemingly inert. Then, without the slightest blip of CPU use, a result appears. Is this another case of Mathematica's spukhafte Fernwirkung ("spooky action at a distance")?
That is, I want to create a variable/function that stores an expression, here a list of integers (ListOfInts), and then on the parallel workers I want to perform some function on that expression (here I apply a set of replacement rules and take the Min). I want the result of that function to also be indexed by the same index under another variable/function (IndexedFunk), whose result is then available back on the main instance of Mathematica:
(*some arbitrary rules that will convert some of the integers to negative values:*)
rulez=Dispatch[Thread[Rule[Range[222],-Range[222]]]];
maxIndex = 333;
Clear[ListOfInts]
Scan[(ListOfInts[#]=RandomInteger[{1,999},55])&,Range[maxIndex ]]
(*just for safety's sake:*)
DistributeDefinitions[rulez, ListOfInts]
Clear[IndexedFunk]
(*I believe I have to have at least one value of IndexedFunk defined before I Share the definition to the workers:*)
IndexedFunk[1]=Min[ListOfInts[1]]/.rulez
(*... and this should let me retrieve the values back on the primary instance of MMA:*)
SetSharedFunction[IndexedFunk]
(*Now, here is the mysterious part: this just sits there on my multiprocessor machine for many minutes until suddenly a result appears. If I up maxIndex to say 99999 (and of course re-execute the above code again) then the effect can more clearly be seen.*)
AbsoluteTiming[Short[ParallelMap[(IndexedFunk[#]=Min[ListOfInts[#]/.rulez])&, Range[maxIndex]]]]
I believe this is some bug, but then I am still trying to figure out Mathematica Parallel, so I can't be too confident in this conclusion. Despite its being depressingly slow, it is nonetheless impressive in its ability to perform calculations without actually requiring a CPU to do so.
I thought perhaps it was due to whatever communications protocol is being used between the master and slave processes: perhaps it is so slow that it just appears that the processors are doing nothing, when in fact they are just waiting to send the next bit of some definition or other. In which case I thought ParallelMap[..., Method->"CoarsestGrained"] would be of some use. But no, that doesn't work either.
A question: "Am I doing something obviously wrong, or is this a bug?"
I am afraid you are. The problem is with the shared definition of a variable. Mathematica maintains a single coherent value in all copies of the variable across kernels, and therefore that variable becomes a single point of huge contention. The CPU is idle because kernels line up in a queue waiting for the variable IndexedFunk, and most time is spent in interprocess or inter-machine communication. Go figure.
By the way, there is no function SetSharedDefinition in any Mathematica version I know of. You probably intended to write SetSharedVariable. But remove that evil call anyway! To avoid contention, return results from the parallelized computation as a list of pairs, and then assemble them into downvalues of your variable at the main kernel:
Clear[IndexedFunk]
Scan[(IndexedFunk[#[[1]]] = #[[2]]) &,
ParallelMap[{#, Min[ListOfInts[#] /. rulez]} &, Range[maxIndex]]
]
ParallelMap takes care of distributing definitions automagically, so the call to DistributeDefinitions is superfluous. (As a minor note, it is not correct as written, since it omits the maxIndex variable, but the omission is automatically taken care of by ParallelMap in this particular case.)
EDIT, NB!: The automatic distribution applies only to version 8 of Mathematica. Thanks @MikeHoneychurch for the correction.
