Probability of event - probability


Here is a probability problem: you observe .5 cars on average passing in front of you every 5 minutes on a road. What is the probability of seeing at least 1 car in 10 minutes?
I'm trying to solve this in 2 ways. The first way is to say: P(no car in 5 minutes) = 1 - .5 = .5. P(no car in first 5 minutes and no car in second 5 minutes) = P(no car in first 5 minutes) * P(no car in second 5 minutes) by independence. Therefore P(at least 1 car in 10 minutes) = 1 - .5*.5 = .75.
However, if I try the same, with a Poisson distribution with rate lambda = .5 per unit of time, for 2 units of time, I get: P(at least 1 car in 2 units of time) = 1 - exp(-2*lambda) = .63.
Am I doing something wrong? If not, what explains the discrepancy?
Thanks!

Your first calculation is incorrect. An average of .5 cars per 5 minutes does not imply P(no car in 5 minutes) = 0.5. Consider, for instance, a process where in every five-minute period you see either no car with probability 90%, or 5 cars with probability 10%. On average you will see 0.5 cars every five minutes, but the probability of seeing 0 cars in the next 5 minutes is 90%, not 50%.
I haven't checked the computations for your second example. The calculation logic looks correct, but the conclusion is not: you are making an assumption about the distribution (Poisson) that is plausible but not implied by the problem statement.
If you take my example again, which is consistent with your problem description, and assume the two 5-minute periods are independent, the probability of seeing 0 cars in 10 minutes is 0.9 x 0.9 = 0.81, which gives you a 19% chance of seeing one car or more. We could arbitrarily change my example to give you a wide variety of probabilities.
From your problem statement, the only thing you can say is that "in the long run, you'll see 0.5 cars every 5 minutes". Beyond that, you can't say what to expect within 10 minutes unless you make some assumptions about the distribution of the car arrivals.
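To make this concrete, here is a small Python sketch that computes P(at least one car in 10 minutes) for three count models that all average 0.5 cars per 5 minutes: a 0-or-1 model (which is what the first calculation implicitly assumes), the 0-or-5 model from the example above, and a Poisson process. The first two models are illustrative assumptions, and the two 5-minute periods are assumed independent throughout.

```python
import math

# Hypothetical per-5-minute distributions of the car count, as {cars: probability}.
# Each one has a mean of 0.5 cars per 5 minutes.
discrete_models = {
    "0 or 1 car (50/50)": {0: 0.5, 1: 0.5},
    "0 or 5 cars (90/10)": {0: 0.9, 5: 0.1},
}


def mean(dist):
    return sum(k * p for k, p in dist.items())


def p_at_least_one(p_zero_per_block, blocks=2):
    # Independent blocks: P(no car in any block) = p_zero ** blocks
    return 1 - p_zero_per_block ** blocks


results = {}
for name, dist in discrete_models.items():
    assert math.isclose(mean(dist), 0.5)
    results[name] = p_at_least_one(dist.get(0, 0.0))

# Poisson process with rate 0.5 per 5 minutes: P(0 in one block) = exp(-0.5),
# so P(at least one in two blocks) = 1 - exp(-1).
results["Poisson (rate 0.5 / 5 min)"] = p_at_least_one(math.exp(-0.5))

for name, p in results.items():
    print(f"{name}: {p:.3f}")
```

It prints 0.750, 0.190 and 0.632 respectively: the same average rate gives three different answers, which is exactly why a distributional assumption is needed.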

Related

How to divide a fixed amount of reward points among players of a racing game in a fair way, depending on their finishing times

I need some kind of algorithm that, sadly, I can't figure out on my own.
My biggest problem is that I have no good way to describe the problem... :/
I will try like this:
Imagine you have a racing game where everyone can try to be the fastest on a track or map. Every map is worth 100 points in total. When someone finishes a map, their time is stored as a record in a database. If the player is the first and only player to finish this map, he earns all 100 points of this map.
Now, that's easy ;) but...
Now another player finishes the map. Let's imagine the first player finishes in 50 seconds and the 2nd player in 55 seconds, so a bit slower. I now need a calculation based on both records in the database. Each player now earns a part of the 100 points, the faster player a bit more than the slower player. If they had finished in exactly the same time, they would both get 50 of the 100 points, but as the first one is slightly faster, he should now earn around 53 points and the slower player just 47.
I started to calculate this like this:
The sum of both records is 105 seconds, and the faster player's time is 50/105 of this, so he earns 100-(50/105*100) points and the slower player 100-(55/105*100) points. The key requirement is that the points distributed among the players always add up to 100 in total. This works for 2 players, but it breaks for 3 or more.
For example:
Player 1 : 20 seconds
Player 2 : 20 seconds
Player 3 : 25 seconds
Calculation would be:
Player 1: 100-(20/65*100) ≈ 69.2 points
Player 2: 100-(20/65*100) ≈ 69.2 points
Player 3: 100-(25/65*100) ≈ 61.5 points
This no longer adds up to 100 points in total; it adds up to 200.
Fair would be something around values of:
Player 1 & 2 (same time) = 35 points
Player 3 = 30 points
My problem is that I can't figure out an algorithm that solves this, and I need the same algorithm to work for any number of players. Can someone help with an idea? I don't need a complete, finished algorithm; maybe just a hint at which step I went wrong. Maybe the sum of all times is already a bad start.
Thx in advance :)

We can give each player points proportional to the reciprocal of their time.
One player with t seconds gets 100 × (1/t) / (1/t) = 100 points.
Of the two players, the one with 50 seconds gets 100 × (1/50) / (1/50 + 1/55) ≈ 52.4, and the one with 55 gets 100 × (1/55) / (1/50 + 1/55) ≈ 47.6.
Of the three players, the ones with 20 seconds get 100 × (1/20) / (1/20 + 1/20 + 1/25) ≈ 35.7, and the one with 25 seconds gets 100 × (1/25) / (1/20 + 1/20 + 1/25) ≈ 28.6.
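A minimal Python sketch of this reciprocal-time rule; the function name reciprocal_shares is hypothetical, and it assumes every finishing time is positive.

```python
def reciprocal_shares(times, total_points=100.0):
    """Split total_points so that each player's share is proportional to 1/time."""
    if not times or any(t <= 0 for t in times):
        raise ValueError("need at least one positive finishing time")
    inverses = [1.0 / t for t in times]
    norm = sum(inverses)  # normalising by the sum makes the shares add up to total_points
    return [total_points * inv / norm for inv in inverses]


print([round(p, 1) for p in reciprocal_shares([50, 55])])      # [52.4, 47.6]
print([round(p, 1) for p in reciprocal_shares([20, 20, 25])])  # [35.7, 35.7, 28.6]
```

Because each share is divided by the same normalising sum, the result always totals 100 (up to floating-point rounding) for any number of players.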

Simple observation: Let the sum of times for all players be S. A person with lower time t would have a higher value of S-t. So you can reward points proportional to S-t for each player.
Formula:
Let the times for N players be a, b, c, ..., m, n, with total sum S = a+b+c+...+m+n. Then the score for a given player would be
score = [S-(player's time)]/[(N-1)*S] * 100
Using this formula, the scores of all players always add up to 100: summing S-t over all N players gives N*S - S = (N-1)*S, which is exactly the denominator.
Example 1:
S = 50 + 55 = 105, N-1 = 2-1 = 1
Player 1 : 50 seconds => score = ((105-50)/[1*105])*100 = 52.38
Player 2 : 55 seconds => score = ((105-55)/[1*105])*100 = 47.62
Similarly, for your second example,
S = 20 + 20 + 25 = 65
N - 1 = 3 - 1 = 2
For Player 1, (S-t) = 65-20 = 45
Player 1's score => (45/(2*65))*100 = 34.6
Player 2 => same as Player 1
For Player 3, (S-t) = 65-25 = 40
Player 3's score => (40/(2*65))*100 = 30.8
This method needs only one division per player, at the very end: if the times are integers, S-t and (N-1)*S are exact integers, so any floating-point rounding happens only in that final step.
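A minimal Python sketch of the S-t rule, assuming integer times (for example in milliseconds) so that fractions.Fraction can keep every value exact. The function name complement_shares is hypothetical, and the single-player case is an addition: the formula itself divides by N-1 = 0 there.

```python
from fractions import Fraction


def complement_shares(times, total_points=100):
    """Score each player proportionally to S - t, where S is the sum of all times.

    Assumes positive integer times; exact fractions make the shares sum to exactly total_points.
    """
    n = len(times)
    if n == 0:
        return []
    if n == 1:
        # N - 1 = 0 would divide by zero; a lone finisher simply gets everything.
        return [Fraction(total_points)]
    s = sum(times)
    return [Fraction(total_points * (s - t), (n - 1) * s) for t in times]


shares = complement_shares([20, 20, 25])
print([round(float(x), 1) for x in shares])  # [34.6, 34.6, 30.8]
print(sum(shares))                           # 100
```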

How to define an algorithm that gives a ranking number for a dentist?

I have some problems defining an algorithm that will calculate a ranking number for a dentist.
Assume, we have three different dentists:
dentist number 1: has 125 patients and has booked a time with 75 of them. 60% of them got a time.
dentist number 2: has 5 patients and has booked a time with 4 of them. 80% of them got a time.
dentist number 3: has 25 patients and has booked a time with 14 of them. 56% of them got a time.
If we use the formula:
patients booked time with / totalpatients * 100
it doesn't give the right ranking: it assumes that the higher the percentage, the better the dentist, which is wrong here. Ranked that way, the dentists would get:
dentist number 2 would have a ranking of 1 (80% got a time).
dentist number 1 would have a ranking of 2 (60% got a time).
dentist number 3 would have a ranking of 3 (56% got a time).
But, it should be in this way:
dentist number 1 = ranking 1
dentist number 2 = ranking 2
dentist number 3 = ranking 3
I don't know how to make an algorithm that also takes the number of patients into account in the ranking calculation.

It is quite arbitrary how you define what makes a better dentist in terms of number of patients and the percentage of those that have an appointment with them.
Let's call the number of patients P, the number of those that have an appointment A, and the function determining how "good" a dentist is f. So f would be a function of P and A: f(P, A).
One component of f could indeed be what you already calculated: A/P.
Another component would have to be P, but I would think that increasing P by 1 should affect f(P, A) much more when P is low than when P is high, so this component should not be a linear function. It would also be practical if this component had a value between 0 and 1, just like the other component.
Taking all this together, I suggest this definition of f, which will give a number between 0 and 1:
f(P,A) = 1/3 * P/(10 + P) + 2/3 * A/P
For the different dentists, this results in:
1: 1/3 * 125/135 + 2/3 * 75/125 = 0.7086419753...
2: 1/3 * 5/15 + 2/3 * 4/5 = 0.6444444444...
3: 1/3 * 25/35 + 2/3 * 14/25 = 0.6114285714...
You could play a bit with the constant factors in the formula, like increasing the term 10. Or you could change the factors 1/3 and 2/3 making sure that their sum is 1.
This is just one way to do it; there are infinitely many others.
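A minimal Python sketch of the f(P, A) defined above, with the constants exposed as parameters so they are easy to tune; the parameter names k (the 10) and w_size (the 1/3) are hypothetical.

```python
def dentist_score(patients, booked, k=10, w_size=1/3):
    """f(P, A) = w_size * P/(k + P) + (1 - w_size) * A/P, a value between 0 and 1."""
    if patients <= 0:
        raise ValueError("patients must be positive")
    return w_size * patients / (k + patients) + (1 - w_size) * booked / patients


# dentist number: (patients P, patients with a booked time A)
dentists = {1: (125, 75), 2: (5, 4), 3: (25, 14)}
ranking = sorted(dentists, key=lambda d: dentist_score(*dentists[d]), reverse=True)
for rank, d in enumerate(ranking, start=1):
    print(rank, d, round(dentist_score(*dentists[d]), 4))
```

With the default constants this reproduces the ranking 1, 2, 3 and the scores 0.7086, 0.6444 and 0.6114. Raising k makes the patient-count term grow more slowly; changing w_size shifts weight between the two terms while keeping their weights summing to 1.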

Minimize the number of trips or Group maximum possible orders

We have one distribution center (warehouse), and we receive orders in real time. The travel time from the warehouse to each order location, and between order locations, is known.
Travel-time matrix (minutes):
      W   O1  O2  O3
W     0    5  20   2
O1    5    0  21   7
O2   20   21   0  11
O3    2    7  11   0
Order time of O1 = 10:00 AM
Order time of O2 = 10:20 AM
Order time of O3 = 10:25 AM
I want to club together as many orders as possible, such that no order is delivered more than 2 hours after its order time. In other words, the goal is to minimize the number of trips (a trip is one run of the delivery agent out for deliveries).
I am trying to come up with an algorithm for this. There are two competing approaches:
1. Combine the orders in the sequence they arrive, as long as the batch still satisfies the constraint of delivering every order within 2 hours of its order time.
2. Modify approach 1 to find the bottleneck order (the one that prevents clubbing more orders in approach 1), pull it out of trip 1 and make it part of a new trip 2, then wait for further orders and club each of them into trip 1 or trip 2, depending on which fits.
All the orders arrive in real time. What would be the best approach for this situation? Let me know if you need more clarity on this.

A very safe and simple algorithm that is guaranteed not to exceed the maximum waiting time for an order:
Let TSP() be a function that returns an estimate of the time needed to visit the given places. The estimate is pessimistic, i.e. the actual ride time can be shorter than or equal to the estimate, but not longer. For a good start, you can implement TSP() very easily in a greedy way: from each place, go to the nearest unvisited place. You can subtract the length of the longer edge coming out of W to get a better estimate: the car always leaves W along the shorter edge, so the longer edge is the return leg, which delays no delivery. If TSP() happened to be optimal, the whole algorithm presented here would also be optimal. The overall algorithm is only as good as the TSP() implementation, so it depends heavily on a good estimate.
Let earliestOrderTime be the time of the earliest order not yet handled.
Repeat every minute:
If there is a new order: if s is empty, set earliestOrderTime to the current time. Add the order to the set s and calculate t = TSP(s + W).
If current time + t >= earliestOrderTime + 2 hours: send a car on the TSP(s + W) trip and make s an empty set.
Example
For your exemplary data it will work like this:
10:00. earliestOrderTime = 10:00. s = {O1}. t = TSP({O1, W}) = 10 - 5 = 5.
10:00 + 0:05 < 10:00 + 2:00, so we don't send a car yet, we wait.
...
10:20. s = {O1, O2}. t = 46 - 20 = 26.
10:20 + 0:26 < 10:00 + 2:00, so we wait.
...
10:25. s = {O1, O2, O3}. t = 2 + 7 + 21 + 20 - 20 = 30.
10:25 + 0:30 < 10:00 + 2:00, so we wait.
...
11:30.
11:30 + 0:30 >= 10:00 + 2:00, so we send a car to O3, O1, O2 and back to W. It reaches the orders at 11:32, 11:39 and 12:00 and comes back at 12:20. The customers waited 67, 99 and 100 minutes respectively.

How to calculate one certain value from a rolling-window estimation in Stata

I'm using Stata to estimate Value-at-Risk (VaR) with the historical simulation method. Basically, I create a rolling window of 100 observations to estimate VaR for each of the next 250 days (repeated 250 times). As far as I know, Stata's rolling prefix for time-series data should be useful in this case. Here is the process:
Input: 350 values
1. Sort the first 100 values in ascending order.
2. Take the 5th smallest value of that window.
3. Move the window forward by one observation and repeat, 250 times in total.
Output: a list of the 5th-smallest values (250 in total).
It sounds simple, but I can't get it right. This was my attempt:
program his,rclass
sort lnreturn
return scalar actual=lnreturn in 5
end
tsset stt
time variable: stt, 1 to 350
delta: 1 unit
rolling actual=r(actual), window(100) saving(C:\result100.dta, replace) : his
(running his on estimation sample)
And the result is:
Start end actual
1 100 -.047856
2 101 -.047856
3 102 -.047856
4 103 -.047856
.... ..... ......
251 350 -.047856
What I want is 250 different 5th-smallest values in the "actual" column, not the same value repeated like that.

If I understand this correctly, you want the 5th percentile of the values in a window of 100. That is available from summarize, detail or from centile. I see no need to write a program.
Your bug is that your program his calculates the same thing each time it is called. Nothing tells it about the window other than what is explicit in your code. It is like saying
move here: now add 2 + 2
move there: now add 2 + 2
move to New York: now add 2 + 2
The result is invariant to your supposed position.
Note that I doubt that
return scalar actual=lnreturn in 5
really is your code. lnreturn[5] should work.
UPDATE: You don't even need rolling here; looping over the data is easy enough. The data in this example are clearly fake.
clear
* sandpit
set obs 500
set seed 2803
gen y = ceil(exp(rnormal(3,2)))
l y in 1/5
* initialise
gen p5 = .
* overlapping windows of length 100: 1..100, 2..101, ..., 401..500
quietly forval j = 1/401 {
local J = `j' + 99
su y in `j'/`J', detail
replace p5 = r(p5) in `j'
}
* check first calculation
su y in 1/100, detail
l in 1/5
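For readers outside Stata, the same window logic in Python makes the missing piece explicit: each window's own slice is passed to the calculation, which is exactly what his never received. This sketch takes the 5th smallest value literally, as the question describes, which is not necessarily the same number as a 5th percentile computed by summarize, detail. The function name and the made-up data are hypothetical.

```python
def rolling_kth_smallest(values, window=100, k=5):
    """k-th smallest value in each window values[i:i + window], for i = 0, 1, 2, ..."""
    if not 1 <= k <= window:
        raise ValueError("k must be between 1 and window")
    return [sorted(values[i:i + window])[k - 1]
            for i in range(len(values) - window + 1)]


# 350 made-up values give 251 windows (1..100 through 251..350).
values = [(37 * i) % 350 for i in range(350)]
fifth = rolling_kth_smallest(values)
print(len(fifth), fifth[:3])
```

Note that 350 observations yield 251 windows of 100, as in the rolling output above; dropping the last one leaves the 250 windows that each forecast one following day.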

Algorithm for a health/activity score for completing tasks on a website

We're creating a basic site where users get to-dos delivered to them each week. I need to create an algorithm that assigns each user a health score from 0 to 100, based on their completion (or lack thereof) of these to-dos.
Between 0 and 4 to-dos are delivered each week.
A to-do can be completed, deleted (marked as irrelevant), or left pending.
If users aren't completing their to-dos (to-dos left in the pending state), they should have a low health score.
I'd also like to weight the pending states. For example, pending to-dos in the first week aren't as detrimental to the score as pending to-dos in the 4th week. I'm only thinking of using the last 4-6 weeks of data to determine the score.
Any help with the approach I should take would be much appreciated.
I'm currently using the following notation
t0 # total to-dos given in week 0
t0_c # completed to-dos from week 0
t0_d # deleted to-dos from week 0
t0_p # pending to-dos from week 0

This sounds like the perfect place for a moving average.
Example:
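health for a week = 100 - 100*((not done for that week)^1.5)/8
Since at most 4 to-dos arrive per week and 4^1.5 = 8, this gives 100 when nothing is left undone and 0 when all 4 are undone.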
Then use an exponential moving average on the historical and current health scores to get the current health.
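A minimal Python sketch of this idea, assuming health should fall as undone to-dos rise (100 minus the penalty 100*(not done)^1.5/8, so 0 to 4 undone to-dos map onto 100 down to 0). The smoothing factor alpha = 0.5 and the sample counts are arbitrary assumptions, and the function names are hypothetical.

```python
def weekly_health(not_done, max_todos=4, power=1.5):
    """Health in [0, 100] for one week: 100 minus a convex penalty for undone to-dos."""
    penalty = 100 * not_done ** power / max_todos ** power  # 4 ** 1.5 == 8
    return 100 - penalty


def ema_health(weekly_scores, alpha=0.5):
    """Exponential moving average over weekly scores, oldest week first."""
    if not weekly_scores:
        raise ValueError("need at least one week of data")
    health = weekly_scores[0]
    for score in weekly_scores[1:]:
        # each new week pulls the running value towards its own score
        health = alpha * score + (1 - alpha) * health
    return health


# Hypothetical counts of undone to-dos over the last five weeks, oldest first.
undone = [0, 1, 4, 2, 0]
scores = [weekly_health(n) for n in undone]
print([round(s, 1) for s in scores])
print(round(ema_health(scores), 1))
```

A larger alpha makes the score react faster to the most recent week; a smaller one gives older weeks more influence.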

Maybe you could assign a completion score to each week and give each week's completion score a weight. Since you are only considering, say, the last 5 weeks of scores, you could have something like this (1 for completed, 0 for pending):
week 1 [1,0,0,1] completion score = 50%
week 2 [1,0,1] completion score = 66.6%
week 3 [0] completion score = 0%
week 4 [1] completion score = 100%
week 5 [1,0,0,0] completion score = 25%
Not every week has 4 to-dos, as some may have been deleted as irrelevant.
Now assign weekly weights. Week 1 is 5 weeks back, so it gets the most weight, and the weight decreases down to week 5, something like:
week1 weight = 30%
week2 weight = 25%
week3 weight = 20%
week4 weight = 15%
week5 weight = 10%
Now just multiply each week's completion score by its weight and add up the terms:
(50*30 + 66.6*25 + 0*20 + 100*15 + 25*10)/100 = 49.15%
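A minimal Python sketch of this weighted average, using the example weeks and weights above. It assumes each week still has at least one non-deleted to-do (a week where everything was deleted would need to be dropped together with its weight); the function names are hypothetical.

```python
# 1 = completed, 0 = pending; deleted to-dos are left out. Week 1 is the oldest.
weeks = [[1, 0, 0, 1], [1, 0, 1], [0], [1], [1, 0, 0, 0]]
weights = [30, 25, 20, 15, 10]  # percent, oldest week weighted most


def completion_score(week):
    """Percentage of a week's non-deleted to-dos that are completed."""
    return 100 * sum(week) / len(week)


def weighted_health(weeks, weights):
    """Weighted average of the weekly completion scores."""
    if not weeks or len(weeks) != len(weights):
        raise ValueError("need one weight per week")
    total = sum(weights)
    return sum(completion_score(w) * wt for w, wt in zip(weeks, weights)) / total


print(round(weighted_health(weeks, weights), 2))  # 49.17
```

It prints 49.17 rather than 49.15 only because the hand calculation rounds week 2's 2/3 down to 66.6%.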
One downside of this approach:
Say Guy1 has 1 pending to-do in week 1, i.e.
week 1 [0] => score = 0%
Say Guy2 has 2 pending to-dos and 1 completed in week 1, i.e.
week 1 [0,0,1] => score = 33%
Guy2 gets a much higher score even though he has more pending work.
If the number of to-dos is roughly the same across customers on average, this won't be a big issue.
