I have a set of events that must occur randomly, but in a predefined frequency. i.e over a course of (totally) infinite events, event A should have occured 10% of the times, event B should have occured 3%, and so on... Of course the total sum of the percentages of the event list will add upto 100.
I want to achieve this programmatically. How do I do this?
You haven't specified a language, so here comes some pseudo-code
You basically want a function which will call other functions with various probabilities
Function RandomEvent
float roll = Random() -- Random number between 0 and 1
if roll < 0.1 then
EventA
else if roll < 0.13 then
EventB
....
interesting description. Without specific details constricting impementation, I can only offer an idea that you can modify to fit into the choices you've already made about your implementation. If you have a file for which every line contains a single event, then construct the file to have 10% A lines, 3% B lines, etc. Then when choosing an event, get an integer randomly generated to select a line number from the file.
You have to elaborate a little more on what you mean. If you just want the probabilities to be as you described, just pick a random number between 1-100 and map it to the events. That is, if the random number is 1-10, do Event A. If it's 11-13, do Event B, etc.
However, if you require things to come out exactly with those proportions at all times (not that this is really possible), you have to do it differently. Please confirm which meaning you are looking for and I'll edit if needed.
For each event, generate a random number between 0 and 100. If event A should occur 10% of the times, map values 0 - 10 to event A, and so on.
For instance, for 2 events:
n = 0 - 10 ==> Event A
n = 11 - 99 ==> Event B
If you do this, you can have your events occur at random times, and if the running time is long enough (and your RNG is good enough), event frequencies will add up to the desired percentage.
Generate a sequence of events in the exact proportions you want.
For each event, randomly generate a timestamp when each event should be delivered, within your time bounds.
Sort by that timestamp
Run through the list, delivering each event at the appropriate time.
Choose a random number from 1 to 100 inclusive. Assign each event a unique set of integers that represents the frequency that it should occur. If you randomly generated number falls within that particular selected range of numbers fire the associated event.
In the example above the event that should show 10% of the time you would assign it a range of integers 10 integers long (1-10, 12-21, etc...). How you store these integer rangess is up to you.
Like Michael said, since these are random numbers there is no way to guarantee said event fires exactly 10% of the time but over the long run it should...given an even distribution of random numbers.
Related
I'm trying to write a program that requires me to generator a random number.
I also need to make it so there's a variable chance to pick a set range.
In this case, I would be generating between 1-10 and the range with the percent chance is 7-10.
How would I do this? Could I be supplied with a formula or something like that?
So if I'm understanding your question, you want two number ranges, and a variable-defined probability that the one range will be selected. This can be described mathematically as a probability density function (PDF), which in this case would also take your "chance" variable as an argument. If, for example, your 7-10 range is more likely than the rest of the 1-10 range, your PDF might look something like:
One PDF such as a flat distribution can be transformed into another via a transformation function, which would allow you to generate a uniformly random number and transform it to your own density function. See here for the rigorous mathematics:
http://www.stat.cmu.edu/~shyun/probclass16/transformations.pdf
But since you're writing a program and not a mathematics thesis, I suggest that the easiest way is to just generate two random numbers. The first decides which range to use, and the second generates the number within your chosen range. Keep in mind that if your ranges overlap (1-10 and 7-10 obviously do) then the overlapping region will be even more likely, so you will probably want to make your ranges exclusive so you can more easily control the probabilities. You haven't said what language you're using, but here's a simple example in Python:
import random
range_chance = 30 #30 percent change of the 7-10 range
if random.uniform(0,100) < range_chance:
print(random.uniform(7,10))
else:
print(random.uniform(1,7)) #1-7 so that there is no overlapping region
I can choose from a list of "actions" to perform one once a second. Each action on the list has a numerical value representing how much it's worth, and also a value representing its "cooldown" -- the number of seconds I have to wait before using that action again. The list might look something like this:
Action A has a value of 1 and a cooldown of 2 seconds
Action B has a value of 1.5 and a cooldown of 3 seconds
Action C has a value of 2 and a cooldown of 5 seconds
Action D has a value of 3 and a cooldown of 10 seconds
So in this situation, the order ABA would have a total value of (1+1.5+1) = 3.5, and it would be acceptable because the first use of A happens at 1 second and the final use of A happens at 3 seconds, and then difference between those two is greater than or equal to the cooldown of A, 2 seconds. The order AAB would not work because you'd be doing A only a second apart, less than the cooldown.
My problem is trying to optimize the order in which the actions are used, maximizing the total value over a certain number of actions. Obviously the optimal order if you're only using one action would be to do Action D, resulting in a total value of 3. The maximum value from two actions would come from doing CD or DC, resulting in a total value of 5. It gets more complicated when you do 10 or 20 or 100 total actions. I can't find a way to optimize the order of actions without brute forcing it, which gives it complexity exponential on the total number of actions you want to optimize the order for. That becomes impossible past about 15 total.
So, is there any way to find the optimal time with less complexity? Has this problem ever been researched? I imagine there could be some kind of weighted-graph type algorithm that works on this, but I have no idea how it would work, let alone how to implement it.
Sorry if this is confusing -- it's kind of weird conceptually and I couldn't find a better way to frame it.
EDIT: Here is a proper solution using a highly modified Dijkstra's Algorithm:
Dijkstra's algorithm is used to find the shortest path, given a map (of a Graph Abstract), which is a series of Nodes(usually locations, but for this example let's say they are Actions), which are inter-connected by arcs(in this case, instead of distance, each arc will have a 'value')
Here is the structure in essence.
Graph{//in most implementations these are not Arrays, but Maps. Honestly, for your needs you don't a graph, just nodes and arcs... this is just used to keep track of them.
node[] nodes;
arc[] arcs;
}
Node{//this represents an action
arc[] options;//for this implementation, this will always be a list of all possible Actions to use.
float value;//Action value
}
Arc{
node start;//the last action used
node end;//the action after that
dist=1;//1 second
}
We can use this datatype to make a map of all of the viable options to take to get the optimal solution, based on looking at the end-total of each path. Therefore, the more seconds ahead you look for a pattern, the more likely you are to find a very-optimal path.
Every segment of a road on the map has a distance, which represents it's value, and every stop on the road is a one-second mark, since that is the time to make the decision of where to go (what action to execute) next.
For simplicity's sake, let's say that A and B are the only viable options.
na means no action, because no actions are avaliable.
If you are travelling for 4 seconds(the higher the amount, the better the results) your choices are...
A->na->A->na->A
B->na->na->B->na
A->B->A->na->B
B->A->na->B->A
...
there are more too, but I already know that the optimal path is B->A->na->B->A, because it's value is the highest. So, the established best-pattern for handling this combination of actions is (at least after analyzing it for 4 seconds) B->A->na->B->A
This will actually be quite an easy recursive algorithm.
/*
cur is the current action that you are at, it is a Node. In this example, every other action is seen as a viable option, so it's as if every 'place' on the map has a path going to every other path.
numLeft is the amount of seconds left to run the simulation. The higher the initial value, the more desirable the results.
This won't work as written, but will give you a good idea of how the algorithm works.
*/
function getOptimal(cur,numLeft,path){
if(numLeft==0){
var emptyNode;//let's say, an empty node wiht a value of 0.
return emptyNode;
}
var best=path;
path.add(cur);
for(var i=0;i<cur.options.length;i++){
var opt=cur.options[i];//this is a COPY
if(opt.timeCooled<opt.cooldown){
continue;
}
for(var i2=0;i2<opt.length;i2++){
opt[i2].timeCooled+=1;//everything below this in the loop is as if it is one second ahead
}
var potential=getOptimal(opt[i],numLeft-1,best);
if(getTotal(potential)>getTotal(cur)){best.add(potential);}//if it makes it better, use it! getTotal will sum up the values of an array of nodes(actions)
}
return best;
}
function getOptimalExample(){
log(getOptimal(someNode,4,someEmptyArrayOfNodes));//someNode will be A or B
}
End edit.
I'm a bit confused on the question but...
If you have a limited amount of actions, and that's it, then always pick the action with the most value, unless the cooldown hasn't been met yet.
Sounds like you want something like this (in pseudocode):
function getOptimal(){
var a=[A,B,C,D];//A,B,C, and D are actions
a.sort()//(just pseudocode. Sort the array items by how much value they have.)
var theBest=null;
for(var i=0;i<a.length;++i){//find which action is the most valuable
if(a[i].timeSinceLastUsed<a[i].cooldown){
theBest=a[i];
for(...){//now just loop through, and add time to each OTHER Action for their timeSinceLastUsed...
//...
}//That way, some previously used, but more valuable actions will be freed up again.
break;
}//because a is worth the most, and you can use it now, so why not?
}
}
EDIT: After rereading your problem a bit more, I see that the weighted scheduling algorithm would need to be tweaked to fit your problem statement; in our case we only want to take those overlapping actions out of the set that match the class of the action we selected, and those that start at the same point in time. IE if we select a1, we want to remove a2 and b1 from the set but not b2.
This looks very similar to the weighted scheduling problem which is discussed in depth in this pdf. In essence, the weights are your action's values and the intervals are (starttime,starttime+cooldown). The dynamic programming solution can be memoized which makes it run in O(nlogn) time. The only difficult part will be modifying your problem such that it looks like the weighted interval problem which allows us to then utilize the predetermined solution.
Because your intervals don't have set start and end times (IE you can choose when to start a certain action), I'd suggest enumerating all possible start times for all given actions assuming some set time range, then using these static start/end times with the dynamic programming solution. Assuming you can only start an action on a full second, you could run action A for intervals (0-2,1-3,2-4,...), action B for (0-3,1-4,2-5,...), action C for intervals (0-5,1-6,2-7,...) etc. You can then use union the action's sets to get a problem space that looks like the original weighted interval problem:
|---1---2---3---4---5---6---7---| time
|{--a1--}-----------------------| v=1
|---{--a2---}-------------------| v=1
|-------{--a3---}---------------| v=1
|{----b1----}-------------------| v=1.5
|---{----b2-----}---------------| v=1.5
|-------{----b3-----}-----------| v=1.5
|{--------c1--------}-----------| v=2
|---{--------c2---------}-------| v=2
|-------{-------c3----------}---| v=2
etc...
Always choose the available action worth the most points.
Say I want to schedule a collection of events in the period 00:00–00:59. I schedule them on full minutes (00:01, never 00:01:30).
I want to space them out as far apart as possible within that period, but I don't know in advance how many events I will have total within that hour. I may schedule one event today, then two more tomorrow.
I have the obvious algorithm in my head, and I can think of brute-force ways to implement it, but I'm sure someone knows a nicer way. I'd prefer Ruby or something I can translate to Ruby, but I'll take what I can get.
So the algorithm I can think of in my head:
Event 1 just ends up at 00:00.
Event 2 ends up at 00:30 because that time is the furthest from existing events.
Event 3 could end up at either 00:15 or 00:45. So perhaps I just pick the first one, 00:15.
Event 4 then ends up in 00:45.
Event 5 ends up somewhere around 00:08 (rounded up from 00:07:30).
And so on.
So we could look at each pair of taken minutes (say, 00:00–00:15, 00:15–00:30, 00:30–00:00), pick the largest range (00:30–00:00), divide it by two and round.
But I'm sure it can be done much nicer. Do share!
You can use bit reversing to schedule your events. Just take the binary representation of your event's sequential number, reverse its bits, then scale the result to given range (0..59 minutes).
An alternative is to generate the bit-reversed words in order (0000,1000,0100,1100,...).
This allows to distribute up to 32 events easily. If more events are needed, after scaling the result you should check if the resulting minute is already occupied, and if so, generate and scale next word.
Here is the example in Ruby:
class Scheduler
def initialize
#word = 0
end
def next_slot
bit = 32
while (((#word ^= bit) & bit) == 0) do
bit >>= 1;
end
end
def schedule
(#word * 60) / 64
end
end
scheduler = Scheduler.new
20.times do
p scheduler.schedule
scheduler.next_slot
end
Method of generating bit-reversed words in order is borrowed from "Matters Computational
", chapter 1.14.3.
Update:
Due to scaling from 0..63 to 0..59 this algorithm tends to make smallest slots just after 0, 15, 30, and 45. The problem is: it always starts filling intervals from these (smallest) slots, while it is more natural to start filling from largest slots. Algorithm is not perfect because of this. Additional problem is the need to check for "already occupied minute".
Fortunately, a small fix removes all these problems. Just change
while (((#word ^= bit) & bit) == 0) do
to
while (((#word ^= bit) & bit) != 0) do
and initialize #word with 63 (or keep initializing it with 0, but do one iteration to get the first event). This fix decrements the reversed word from 63 to zero, it always distributes events to largest possible slots, and allows no "conflicting" events for the first 60 iteration.
Other algorithm
The previous approach is simple, but it only guarantees that (at any moment) the largest empty slots are no more than twice as large as the smallest slots. Since you want to space events as far apart as possible, algorithm, based on Fibonacci numbers or on Golden ratio, may be preferred:
Place initial interval (0..59) to the priority queue (max-heap, priority = interval size).
To schedule an event, pop the priority queue, split the resulting interval in golden proportion (1.618), use split point as the time for this event, and put two resulting intervals back to the priority queue.
This guarantees that the largest empty slots are no more than (approximately) 1.618 times as large as the smallest slots. For smaller slots approximation worsens and sizes are related as 2:1.
If it is not convenient to keep the priority queue between schedule changes, you can prepare an array of 60 possible events in advance, and extract next value from this array every time you need a new event.
Since you can have only 60 events at maximum to schedule, then I suppose using static table is worth a shot (compared to thinking algorithm and testing it). I mean for you it is quite trivial task to layout events within time. But it is not so easy to tell computer how to do it nice way.
So, what I propose is to define table with static values of time at which to put next event. It could be something like:
00:00, 01:00, 00:30, 00:15, 00:45...
Since you can't reschedule events and you don't know in advance how many events will arrive, I suspect your own proposal (with Roman's note of using 01:00) is the best.
However, if you have any sort of estimation on how many events will arrive at maximum, you can probably optimize it. For example, suppose you are estimating at most 7 events, you can prepare slots of 60 / (n - 1) = 10 minutes and schedule the events like this:
00:00
01:00
00:30
00:10
00:40
00:20
00:50 // 10 minutes apart
Note that the last few events might not arrive and so 00:50 has a low probability to be used.
which would be fairer then the non-estimation based algorithm, especially in the worst-case scenario were all slots are used:
00:00
01:00
00:30
00:15
00:45
00:07
00:37 // Only 7 minutes apart
I wrote a Ruby implementation of my solution. It has the edge case that any events beyond 60 will all stack up at minute 0, because every free space of time is now the same size, and it prefers the first one.
I didn't specify how to handle events beyond 60, and I don't really care, but I suppose randomization or round-robin could solve that edge case if you do care.
each_cons(2) gets bigrams; the rest is probably straightforward:
class Scheduler
def initialize
#scheduled_minutes = []
end
def next_slot
if #scheduled_minutes.empty?
slot = 0
else
circle = #scheduled_minutes + [#scheduled_minutes.first + 60]
slot = 0
largest_known_distance = 0
circle.each_cons(2) do |(from, unto)|
distance = (from - unto).abs
if distance > largest_known_distance
largest_known_distance = distance
slot = (from + distance/2) % 60
end
end
end
#scheduled_minutes << slot
#scheduled_minutes.sort!
slot
end
def schedule
#scheduled_minutes
end
end
scheduler = Scheduler.new
20.times do
scheduler.next_slot
p scheduler.schedule
end
I have a short random number input, let's say int 0-999.
I don't know the distribution of the input. Now I want to generate a random number in range 0-99999 based on the input without changing the distribution shape.
I know there is a way to make the input to [0,1] by dividing it by 999 and then multiple 99999 to get the result. However, this method doesn't cover all the possible values, like 99999 will never get hit.
Assuming your input is some kind of source of randomness...
You can take two consecutive inputs and combine them:
input() + 1000*(input()%100)
Be careful though. This relies on the source having plenty of entropy, so that a given input number isn't always followed by the same subsequent input number. If your source is a PRNG designed to cycle between the numbers 0–999 in some fashion, this technique won't work.
With most production entropy sources (e.g., /dev/urandom), this should work fine. OTOH, with a production entropy source, you could fetch a random number between 0–99999 fairly directly.
You can try something like the following:
(input * 100) + random
where random is a random number between 0 and 99.
The problem is that input only specifies which 100 range to use. For instance 50 just says you will have a number between 5000 and 5100 (to keep a similar shape distribution). Which number between 5000 and 5100 to pick is up to you.
A user visits my website at time t, and they may or may not click on a particular link I care about, if they do I record the fact that they clicked the link, and also the duration since t that they clicked it, call this d.
I need an algorithm that allows me to create a class like this:
class ClickProbabilityEstimate {
public void reportImpression(long id);
public void reportClick(long id);
public double estimateClickProbability(long id);
}
Every impression gets a unique id, and this is used when reporting a click to indicate which impression the click belongs to.
I need an algorithm that will return a probability, based on how much time has past since an impression was reported, that the impression will receive a click, based on how long previous clicks required. Clearly one would expect that this probability will decrease over time if there is still no click.
If necessary, we can set an upper-bound, beyond which we consider the click probability to be 0 (eg. if its been an hour since the impression occurred, we can be pretty sure there won't be a click).
The algorithm should be both space and time efficient, and hopefully make as few assumptions as possible, while being elegant. Ease of implementation would also be nice. Any ideas?
Assuming you keep data on past impressions and clicks, it's easy: let's say that you have an impression, and a time d' has passed since that impression. You can divide your data into three groups:
Impressions which received a click in less than d'
Impressions which received a click after more than d'
Impressions which never received a click
Clearly the current impression is not in group (1), so eliminate that. You want the probability it is in group (2), which is then
P = N2 / (N2 + N3)
where N2 is the number of impressions in group 2, and similarly for N3.
As far as actual implementation, my first thought would be to keep an ordered list of the times d for past impressions which did receive clicks, along with a count of the number of impressions which never received a click, and just do a binary search for d' in that list. The position you find will give you N1, and then N2 is the length of the list minus N1.
If you don't need perfect granularity, you can store the past times as a histogram instead, i.e. a list that contains, in each element list[n], the number of impressions that received a click after at least n but less than n+1 minutes. (Or seconds, or whatever time interval you like) In that case you'd probably want to keep the total number of clicks as a separate variable so you can easily compute N2.
(By the way, I just made this up, I don't know if there are standard algorithms for this sort of thing that may be better)
I would suggest hypothesizing an arrival process (clicks per minute) and trying to fit a distribution to that arrival process using your existing data. I'll bet the result is negative binomial which is what you get when you have a poisson arrival process with a non-stationary mean if the mean has a gamma distribution. The inverse (minutes per click) gives you the distribution of the interarrival process. Don't know if there's a distribution named for that, but you can create an empirical one.
Hope this helps.