Metropolis algorithm to solve a problem - pymc

I need to implement the Metropolis algorithm to solve the example titled "Cheating among students" here. In summary, the aim is to estimate the frequency of students cheating in an exam. The experiment is as follows:
Each student is asked to toss a coin. If the result is heads, the student will answer honestly whether he/she cheated. If the result is tails, the student will toss a second coin and answer "yes, I cheated" if it lands heads and "no, I didn't" if it lands tails.
The experiment involved $N=100$ trials and the interviewer got $X=35$ "yes" answers.
I need to find the frequency of cheating using a raw implementation of the Metropolis algorithm. To that end, I have identified the following variables in the experiment:
$FC \sim Bernoulli(\theta=0.5)$. The probability distribution for the first coin.
$SC \sim Bernoulli(\theta=0.5)$. The probability distribution for the second coin.
$TA \sim Bernoulli(\theta=P)$. The probability of a true answer. That is, a student honestly answering.
This is the point where I start to get lost. I think $P$ is the probability that I am interested in when using Metropolis. However, it depends on the two other probabilities $FC$ and $SC$. Moreover, I would say it depends on a new random variable $Z = FC \cdot TA + (1-FC)\cdot SC$ that combines the other two.
My first question is: am I right in my reasoning up to this point? If so,
from what I understand, Metropolis proposes a value at each iteration. The probability of this value is computed taking into account the distributions listed above. So, does that mean that at each iteration I have to evaluate the probability of the proposed value according to $Z$?
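For concreteness, here is roughly what I imagine a raw Metropolis sampler for this setup would look like (plain NumPy/SciPy rather than PyMC), assuming a flat Uniform(0, 1) prior on the cheating frequency and collapsing $FC$ and $SC$ analytically into $P(\text{yes}) = 0.5\,P + 0.25$; all names below are only illustrative:

```python
# Raw Metropolis sketch (illustrative, not PyMC).  The two coins are
# marginalised out: a student answers "yes" with probability 0.5 * p + 0.25,
# where p is the cheating frequency being sampled.
import numpy as np
from scipy.stats import binom

N, X = 100, 35                                   # trials and observed "yes" answers
rng = np.random.default_rng(0)

def log_posterior(p):
    if not 0.0 < p < 1.0:                        # flat Uniform(0, 1) prior on p
        return -np.inf
    return binom.logpmf(X, N, 0.5 * p + 0.25)    # Binomial likelihood of the data

p, samples = 0.5, []
for _ in range(20_000):
    proposal = p + rng.normal(0.0, 0.1)          # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(p):
        p = proposal                             # Metropolis accept step
    samples.append(p)

print("posterior mean of p:", np.mean(samples[5_000:]))   # drop burn-in
```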

Related

Percentage of cases they will contradict where at least one of them is correct

Braden and Fred are two independent journalists. Braden is usually correct in 33% of his reports and Fred in 70% of his reports. In what percentage of cases are they likely to contradict each other when talking about the same incident, where at least one of them is correct?
Is this question supposed to be solved with probability?
This is indeed supposed to be solved with probability.
There are 4 cases
Braden is right and Fred is right
Braden is right and Fred is wrong
Braden is wrong and Fred is right
Braden is wrong and Fred is wrong
We also know that the probability of two independent events both happening is equal to the product of the probabilities of the events.
From this, we can see that the probability of the first case is equal to the probability that Braden is right, multiplied by the probability that Fred is right.
Using this logic on all four cases (taking 33% as 1/3 and 70% as 7/10), we get the following probabilities:
Braden is right and Fred is right: 1/3 × 7/10 = 7/30
Braden is right and Fred is wrong: 1/3 × 3/10 = 3/30
Braden is wrong and Fred is right: 2/3 × 7/10 = 14/30
Braden is wrong and Fred is wrong: 2/3 × 3/10 = 6/30
Since we are only interested in the cases where they contradict each other, we can simply sum the probability of those two cases.
From that, we can see that the probability of the two contradicting each other is 3/30 + 14/30 = 17/30, or about 57%.
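A quick numeric sanity check of that arithmetic (a throwaway snippet, with 33% taken as exactly 1/3):

```python
from fractions import Fraction

b = Fraction(1, 3)     # P(Braden is right)
f = Fraction(7, 10)    # P(Fred is right)

contradict = b * (1 - f) + (1 - b) * f    # exactly one of them is right
print(contradict, float(contradict))      # 17/30, about 0.567
```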

How many independent times to run an algorithm to ensure at least one trial succeeds (minimum cut of a graph)

I just finished the first module of the algorithms specialization course on Coursera.
There was an exam question that I could not quite understand. I have passed that exam, so there's no point for me to retake it.
Out of curiosity, I want to learn the principles around this question.
The question was posted as such:
Suppose that a randomized algorithm succeeds (e.g., correctly computes the minimum cut of a graph) with probability p (with 0 < p < 1). Let ϵ be a small positive number (less than 1).
How many independent times do you need to run the algorithm to ensure that, with probability at least 1−ϵ, at least one trial succeeds?
The options given were:
log(1−p)/logϵ
log(p)/logϵ
logϵ/log(p)
logϵ/log(1−p)
I made two attempts and both were wrong. My attempts were:
log(1−p)/logϵ
logϵ/log(1−p)
It's not so much that I want to know the right answer; I want to learn the principles behind this question and what it's asking for, so that I know how to answer similar questions in the future.
I have posted this on the forum, but nobody answered after a month. So I am trying it out here.
No need to post the answer directly. If you get me to the aha moment, I will mark it as correct.
Thanks.
How many independent times do you need to run the algorithm to ensure that, with probability at least 1−ϵ, at least one trial succeeds?
Let's rephrase it a bit:
What is the smallest number of independent trials such that the probability of all of them failing is less than or equal to ϵ?
By the definition of independent events, the probability of all of them occurring is the product of their individual probabilities. Since the probability of one trial failing is (1-p), the probability of n trials failing is (1-p)^n.
This gives us an inequality for n:
(1-p)^n <= ϵ
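If it helps to build intuition, here is a tiny brute-force check of that inequality, without jumping to a closed form; p = 0.1 and ϵ = 0.01 are just example values and the function name is made up:

```python
# Count trials until the probability that *all* of them fail drops to eps.
def trials_needed(p, eps):
    n, all_fail = 0, 1.0
    while all_fail > eps:
        all_fail *= (1.0 - p)   # one more independent trial fails with prob (1 - p)
        n += 1
    return n

print(trials_needed(p=0.1, eps=0.01))   # 44 for these example values
```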

algorithm for solving resource allocation problems

Hi, I am building a program wherein students sign up for an exam which is conducted in several cities throughout the country. While signing up, students provide a list of three cities where they would like to take the exam, in order of preference. So a student may say his first preference for an exam centre is New York, followed by Chicago, followed by Boston.
Now, keeping in mind that the exam centres have limited capacity, they cannot accommodate every student's first choice. We would, however, try to give as many students as possible either their first or second choice of centre, and as far as possible avoid assigning a student their third-choice centre.
Any ideas for an algorithm that would make this process more efficient? The simple way to do this would be to first go through the list of students' first choices and allot as many as possible, then go through the list of second choices and allot those. However, this may lead to the students at the front of the list getting their first centre and the students at the end getting their third choice, or worse, none of their choices. Anything that could make this more efficient?
Sounds like a variant of the classic stable marriages problem or the college admission problem. The Wikipedia article lists a linear-time (in the number of preferences, O(n²) in the number of persons) algorithm for the former; the NRMP describes an efficient algorithm for the latter.
I suspect that if you randomly generate preferences of exam places for students (one Fisher–Yates shuffle per exam place) and then apply the stable marriages algorithm, you'll get a pretty fair and efficient solution.
This problem could be formulated as an instance of minimum cost flow. Let N be the number of students. Let each student be a source vertex with capacity 1. Let each exam center be a sink vertex with capacity, well, its capacity. Make an arc from each student to his first, second, and third choices. Set the cost of first choice arcs to 0; the cost of second choice arcs to 1; and the cost of third choice arcs to N + 1.
Find a minimum-cost flow that moves N units of flow. Assuming that your solver returns an integral solution (it should; flow LPs are totally unimodular), each student flows one unit to his assigned center. The costs minimize the number of third-choice assignments, breaking ties by the number of second-choice assignments.
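Here is a sketch of that formulation using networkx as the solver (the students, centres, and capacities below are made-up example data):

```python
import networkx as nx

students = {                        # student -> (first, second, third) choice
    "alice": ("NY", "CHI", "BOS"),
    "bob":   ("NY", "BOS", "CHI"),
    "carol": ("CHI", "NY", "BOS"),
}
capacity = {"NY": 1, "CHI": 1, "BOS": 1}
N = len(students)

G = nx.DiGraph()
for student, prefs in students.items():
    G.add_node(student, demand=-1)                  # each student supplies one unit
    for cost, centre in zip((0, 1, N + 1), prefs):  # first/second/third choice costs
        G.add_edge(student, centre, capacity=1, weight=cost)
for centre, cap in capacity.items():
    G.add_edge(centre, "sink", capacity=cap, weight=0)
G.add_node("sink", demand=N)                        # all N units must reach the sink

flow = nx.min_cost_flow(G)
assignment = {s: next(c for c, f in flow[s].items() if f) for s in students}
print(assignment)
```

If demand exceeds total capacity the flow is infeasible, so in practice you would add a dummy "unassigned" centre with a very high cost.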
There is a class of algorithms that addresses this kind of allocation of limited resources, called auctions. Basically, in this case each student would get a certain amount of money (a number they can spend), and then your software would make bids between those students. You might use a formula based on preferences.
An example would be for tutorial times. If you put down your preferences, then you would effectively bid more for those times and less for the times you don't want. So if you don't get your preferences you have more "money" to bid with for other tutorials.

Course assignment algorithm

I need to assign n people to m courses, where each person specified their first and second preference and each course has a maximum number of persons attending. Each person can only attend one course. The algorithm should find one solution where
the number of people assigned to one of their preferred courses is maximized
the number of people assigned their first choice is maximized (subject to 1, which has higher priority).
I guessed that this is not an uncommon problem, but a search returned nothing too useful, so I decided to roll my own. This is what I came up with so far:
For courses which have less first preferences than maximum numbers of people attending, assign all those persons to the course
For other courses: Put random people into the course which have selected this course as first choice until the course is full
For courses which have less second preferences than free spaces, assign all those persons to the course
For other courses: Put random people into the course which have selected this course as second choice until the course is full
For each person without a course: at their first (then second) preference, look for a person already in that course who has chosen another course where spots are still free (if more than one is found, take the one whose other choice has the most free spots); move this person to their other choice and assign the missing person
I still don't think this algorithm will find the optimal solution to the problem due to the last step. Any ideas how to make this one better? Are there other algorithms which solve this problem?
Place everyone in their first choice course if possible.
If there is anyone who didn't get it, place them in their second choice.
Now, we might get some who didn't get any of their choices. (the "losers".)
Find a person who got his first choice course, which is also the second choice of the "loser". This guy will be reassigned to his second choice, while the "loser" takes his slot. If there is no such person, then your problem is unsolvable.
Note that this maximizes the number of people who got their first choice:
If you got your second choice, then it means either:
someone else already got your first choice as his first choice
someone else got your first choice as his second choice, but only because his first choice was taken as someone else's second choice, and whose first choice was filled with first choice students.
(Possibly that last bit is a bit hard to follow, so here's a rewording:)
For person X with first choice A and second choice B:
If X got choice B, then:
Y took X's slot in A, and Y's first choice is A.
Y took X's slot in A, and Y's second choice is A. Y's first choice is C, but C's slots are all filled with other students whose first choice is C as well.
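A rough sketch of that procedure in Python; the data format is my own, and I have added the detail that the swapped-out person's second choice must still have room (and let the "loser" try both of their choices, not just the second):

```python
def assign_courses(prefs, capacity):
    """prefs: {person: (first, second)}, capacity: {course: free seats}."""
    seats, assigned = dict(capacity), {}

    # 1. Place everyone in their first choice course if possible.
    for person, (first, _) in prefs.items():
        if seats[first] > 0:
            assigned[person] = first
            seats[first] -= 1

    # 2. Anyone who didn't get it goes into their second choice, if possible.
    losers = []
    for person, (_, second) in prefs.items():
        if person in assigned:
            continue
        if seats[second] > 0:
            assigned[person] = second
            seats[second] -= 1
        else:
            losers.append(person)

    # 3. For each "loser", find someone sitting in their own first choice that
    #    the loser also wants and whose second choice still has room; move that
    #    person to their second choice and give the freed seat to the loser.
    for loser in losers:
        placed = False
        for wanted in prefs[loser]:
            for other, course in list(assigned.items()):
                o_first, o_second = prefs[other]
                if course == wanted == o_first and seats[o_second] > 0:
                    assigned[other] = o_second
                    seats[o_second] -= 1
                    assigned[loser] = wanted        # takes the vacated seat
                    placed = True
                    break
            if placed:
                break
        if not placed:
            raise ValueError(f"no feasible swap for {loser}")
    return assigned

print(assign_courses(
    {"ann": ("math", "bio"), "bob": ("math", "art"), "cat": ("art", "math")},
    {"math": 1, "art": 1, "bio": 1},
))   # ann -> bio, cat -> art, bob -> math
```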
This is similar to the stable marriage problem.
Given n men and n women, where each person has ranked all members of the opposite sex with a unique number between 1 and n in order of preference, marry the men and women together such that there are no two people of opposite sex who would both rather have each other than their current partners. If there are no such people, all the marriages are "stable".
Update:
Taking #bdares' comments into account, and the fact that the courses have a finite capacity, it would be hard to cast the problem as stable matching.
I would solve this as a linear program with the objective function based on the number of people who get their first choice and the course size as a constraint.
The first problem can be modeled as a maximum cardinality bipartite matching problem. The second problem can be modeled as a weighted bipartite matching problem (also known as the assignment problem).
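As a sketch of the second formulation, you can expand every course into one column per seat and feed a cost matrix to a Hungarian-style solver such as scipy.optimize.linear_sum_assignment. The data and cost values below are only illustrative; making the "no preferred course" cost larger than n second-choice costs is how the priority between the two objectives is encoded here:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

people = {                          # person -> (first choice, second choice)
    "p1": ("math", "physics"),
    "p2": ("math", "chemistry"),
    "p3": ("math", "physics"),
}
capacity = {"math": 1, "physics": 1, "chemistry": 1}

names = list(people)
seats = [c for c, cap in capacity.items() for _ in range(cap)]  # one column per seat
BIG = 10 * len(names)   # a non-preferred course outweighs any number of 2nd choices

cost = np.full((len(names), len(seats)), BIG)
for i, person in enumerate(names):
    first, second = people[person]
    for j, course in enumerate(seats):
        if course == first:
            cost[i, j] = 0
        elif course == second:
            cost[i, j] = 1

rows, cols = linear_sum_assignment(cost)            # solve the assignment problem
for i, j in zip(rows, cols):
    print(names[i], "->", seats[j])
```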
Sounds like a linear bottleneck assignment problem. While you're on the wiki page, check out the link provided in the references section.

How do you evaluate the efficiency of an algorithm, if the problem space is underspecified?

There was a post on here recently which posed the following question:
You have a two-dimensional plane of (X, Y) coordinates. A bunch of random points are chosen. You need to select the largest possible set of chosen points, such that no two points share an X coordinate and no two points share a Y coordinate.
This is all the information that was provided.
There were two possible solutions presented.
One suggested using a maximum flow algorithm, such that each selected point maps to a path linking (source → X → Y → sink). This runs in O(V³) time, where V is the number of vertices selected.
Another (mine) suggested using the Hungarian algorithm. Create an n×n matrix of 1s, then set every chosen (x, y) coordinate to 0. The Hungarian algorithm will give you the lowest cost for this matrix, and the answer is the number of coordinates selected which equal 0. This runs in O(n³) time, where n is the greater of the number of rows or the number of columns.
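For reference, the Hungarian variant is only a few lines with scipy's solver (the grid size and chosen points below are made-up example data):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

n = 5
chosen = [(0, 1), (0, 3), (2, 1), (3, 4), (4, 4)]   # (x, y) coordinates

cost = np.ones((n, n), dtype=int)                   # matrix of 1s ...
for x, y in chosen:
    cost[x, y] = 0                                  # ... with chosen cells set to 0

rows, cols = linear_sum_assignment(cost)            # Hungarian algorithm
print(sum(cost[r, c] == 0 for r, c in zip(rows, cols)))   # 3: size of the largest valid set
```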
My reasoning is that, for the vast majority of cases, the Hungarian algorithm is going to be faster; V is equal to n in the case where there's one chosen point for each row or column, and substantially greater for any case where there's more than that: given a 50×50 matrix with half the coordinates chosen, V is 1,250 and n is 50.
The counterargument is that there are some cases, like a 10⁹×10⁹ matrix with only two points selected, where V is 2 and n is 1,000,000,000. For this case, it takes the Hungarian algorithm a ridiculously long time to run, while the maximum flow algorithm is blindingly fast.
Here is the question: given that the problem doesn't provide any information regarding the size of the matrix or the probability that a given point is chosen (so you can't know for sure), how do you decide which algorithm, in general, is a better choice for the problem?
You can't, it's an imponderable.
You can only define which is better "in general" by defining what inputs you will see "in general". So for example you could whip up a probability model of the inputs, so that the expected value of V is a function of n, and choose the one with the best expected runtime under that model. But there may be arbitrary choices made in the construction of your model, so that different models give different answers. One model might choose co-ordinates at random, another model might look at the actual use-case for some program you're thinking of writing, and look at the distribution of inputs it will encounter.
You can alternatively talk about which has the best worst case (across all possible inputs with given constraints), which has the virtue of being easy to define, and the flaw that it's not guaranteed to tell you anything about the performance of your actual program. So for instance HeapSort is faster than QuickSort in the worst case, but slower in the average case. Which is faster? Depends whether you care about average case or worst case. If you don't care which case, you're not allowed to care which "is faster".
This is analogous to trying to answer the question "what is the probability that the next person you see will have an above (mean) average number of legs?".
We might implicitly assume that the next person you meet will be selected at random with uniform distribution from the human population (and hence the answer is "slightly less than one", since the mean is less than the mode average, and the vast majority of people are at the mode).
Or we might assume that your next meeting with another person is randomly selected with uniform distribution from the set of all meetings between two people, in which case the answer is still "slightly less than one", but I reckon not the exact same value as the first - one-and-zero-legged people quite possibly congregate with "their own kind" very slightly more than their frequency within the population would suggest. Or possibly they congregate less, I really don't know, I just don't see why it should be exactly the same once you take into account Veterans' Associations and so on.
Or we might use knowledge about you - if you live with a one-legged person then the answer might be "very slightly above 0".
Which of the three answers is "correct" depends precisely on the context which you are forbidding us from talking about. So we can't talk about which is correct.
Given that you don't know what each pill does, do you take the red pill or the blue pill?
If there really is not enough information to decide, there is not enough information to decide. Any guess is as good as any other.
Maybe, in some cases, it is possible to divine extra information to base the decision on. I haven't studied your example in detail, but it seems like the Hungarian algorithm might have higher memory requirements. This might be a reason to go with the maximum flow algorithm.
You don't. I think you illustrated that clearly enough. I think the proper practical solution is to spawn off both implementations in different threads, and then take the response that comes back first. If you're more clever, you can heuristically route requests to implementations.
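A minimal sketch of that racing idea with concurrent.futures; note that in CPython the losing thread keeps running to completion and CPU-bound threads contend for the GIL, so a process pool may be the better vehicle in practice (the solver names are placeholders):

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def race(*solvers, args=()):
    """Run all solvers concurrently and return whichever result comes back first."""
    pool = ThreadPoolExecutor(max_workers=len(solvers))
    futures = [pool.submit(solver, *args) for solver in solvers]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    result = next(iter(done)).result()
    pool.shutdown(wait=False)   # don't block on the slower solver; it finishes in the background
    return result

# result = race(hungarian_solver, max_flow_solver, args=(points,))
```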
Many algorithms require huge amounts of memory beyond the physical maximum of a machine, and in these cases the algorithm that is less efficient in time but more efficient in space is chosen.
Given that we have distributed parallel computing, I say you just let both horses run and let the results speak for themselves.
This is a valid question, but there's no "right" answer — they are incomparable, so there's no notion of "better".
If your interest is practical, then you need to analyze the kinds of inputs that are likely to arise in practice, as well as the practical running times (constants included) of the two algorithms.
If your interest is theoretical, where worst-case analysis is often the norm, then, in terms of the input size, the O(V³) algorithm is better: you know that V ≤ n², but you cannot polynomially bound n in terms of V, as you showed yourself. Of course, the theoretically best algorithm is a hybrid that runs both and stops as soon as either finishes, so its running time would be O(min(V³, n³)).
Theoretically, they are both the same, because you actually compare how the number of operations grows when the size of the problem is increased to infinity.
The way your problem is defined, it has two sizes, n and the number of points, so this question has no answer.
