Even probabilities in sequential execution - probability

Imagine you have three actions which you execute sequentially:
doActionA
doActionB
doActionC
Additionally, each action has a probability of being performed (for example, 33% for each of them). As soon as one of the actions is performed, the remaining ones are skipped.
doActionA probability 33
doActionB probability 33
doActionC probability 33
For example, if doActionA succeeds, only ActionA gets executed; the results of the other two actions do not matter. If ActionA fails but ActionB succeeds, ActionC does not get executed. If ActionC succeeds, ActionA and ActionB have logically already failed. If none succeeds, none gets activated.
Now my question: since ActionA is first and ActionB is second in the sequence, their effective probabilities are higher than ActionC's, even though all three say probability 33, right? My goal is to make all three probabilities even. I think that, because of the sequential nature of this execution, ActionA must have a smaller probability value than ActionB, and ActionB a smaller one than ActionC. How can I calculate those probabilities?

Answer from #twalberg: If you're stuck with the existing algorithm, you probably want to set the probabilities to 33, 50 and 100% for the three actions. In that case the first action has a 33% chance, but if it doesn't happen, the second one has a 50% conditional probability, i.e. 67% * 50% = 33% total probability. And if neither of the first two happens, the third one always will...
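A quick way to sanity-check those numbers is a short simulation. The following is my own Python sketch, not from the original thread; it runs the 33/50/100 scheme many times and counts which slot fires.

import random
from collections import Counter

def run_sequential(probs):
    # probs: conditional probability for each slot, e.g. [1/3, 1/2, 1.0]
    # returns the index of the action that fired, or None if none did
    for i, p in enumerate(probs):
        if random.random() < p:
            return i
    return None

# each of the three slots should fire roughly a third of the time
counts = Counter(run_sequential([1/3, 1/2, 1.0]) for _ in range(100_000))
print(counts)

In general, for n actions that should each fire with probability 1/n, slot k (0-based) needs conditional probability 1/(n-k): the chance of even reaching slot k is (n-k)/n, and (n-k)/n * 1/(n-k) = 1/n.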

Related

Executing genetic algorithm per configuration

I'm trying to understand a paper I'm reading regarding genetic algorithms.
They run a GA there with parameters they present.
Some of the parameters are:
stop criterion - 50 generations.
Runs per configuration - 30.
After they presented the parameters they said that they executed the algorithm 20 times.
I don't understand two things:
1. What does the second parameter mean? That every configuration runs until it reaches 30 generations?
2. When they execute the algorithm 20 times, does that mean until they reach 20 generations, or that they executed 20 configurations?
Thanks.
Question 1
Usually in GAs, parameters such as mutation rates, selection rates, etc. need to be customised for the problem at hand. To get a reasonable measure of what is a good or bad parameter configuration, each configuration is repeated for a number of runs, after which the mean and standard deviation of the best solution found in each of these runs are computed. Note that these runs have nothing to do with the number of generations or any specific parameter of the GA; repeating runs is simply a way to reduce noise and increase reliability when comparing and choosing among different parameter configurations.
On the other hand, "number of runs" in itself can also be a parameter of study, but it's quite obvious that increasing the total number of runs will increase the chance of finding a really good solution (say, a large deviation from mean GA performance). If studying such a case, it's important that the total number of generations (effectively the total number of function evaluations) over all runs is the same between different parameter settings.
As an example, consider the following parameter configuration:
numberOfRuns = 30
numberOfGenerations = 50
==> a total of 1500 generations analysed
compared with the configuration:
numberOfRuns = 50
numberOfGenerations = 30
==> a total of 1500 generations analysed
A study like this could examine whether it is favourable to have more generations in each GA run (first scenario), or whether it's better to have fewer generations but more runs (second scenario: probably favourable for a GA with quick convergence to local optima but with a large standard deviation).
The following parameter configuration, however, would not be meaningful to benchmark against the two above, since it has a larger total number of generations:
numberOfRuns = 50
numberOfGenerations = 50
==> a total of 2500 generations analysed
Question 2
If they say they execute the algorithm 20 times, that would generally be equivalent to 20 runs as described above. Hence, 20 runs, each running over 50 generations.
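A minimal sketch of what "runs per configuration" looks like in code. This is Python, with a hypothetical run_ga(config) that is assumed to execute one full GA to its stop criterion (e.g. 50 generations) and return the best fitness found in that run:

import statistics

def evaluate_configuration(run_ga, config, runs_per_configuration=30):
    # repeat the whole GA several times with identical parameters...
    bests = [run_ga(config) for _ in range(runs_per_configuration)]
    # ...then summarise across runs to reduce noise when comparing configurations
    return statistics.mean(bests), statistics.stdev(bests)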

Python Probability Algorithm

I am looking for help with a Python algorithm that takes a percent or fraction (such as 45% or 4500/10000), tests it multiple times, and reports how many times it comes out true and how many times it comes out false.
Basically, I am looking for an algorithm that takes a probability, tests it multiple times, and gives results on how many times you, say, survived or died.
Is this possible, and can anyone help me?
Loop over the following for the number of trials you want:
Generate a random integer between 0 and the denominator minus 1 (if it's a fraction) or a real number between 0 and 1 (if it's a percent)
If the value is less than the numerator (or the percent expressed as a decimal), record a success; otherwise record a failure
You can find information on generating random values in the Python documentation, and how you determine whether you're working with a percent or a fraction will depend on how you accept and parse user input.
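A minimal Python sketch along those lines, assuming the input has already been parsed into a probability between 0 and 1:

import random

def run_trials(probability, trials):
    # probability: chance of "success" per trial, e.g. 0.45 for 45% or 4500/10000
    successes = 0
    for _ in range(trials):
        if random.random() < probability:  # a value below the probability counts as a success
            successes += 1
    return successes, trials - successes

survived, died = run_trials(4500 / 10000, 100_000)
print(f"survived: {survived}, died: {died}")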

Algorithm for optimizing the order of actions with cooldowns

Once a second, I can choose one action to perform from a list of "actions". Each action on the list has a numerical value representing how much it's worth, and also a value representing its "cooldown" -- the number of seconds I have to wait before using that action again. The list might look something like this:
Action A has a value of 1 and a cooldown of 2 seconds
Action B has a value of 1.5 and a cooldown of 3 seconds
Action C has a value of 2 and a cooldown of 5 seconds
Action D has a value of 3 and a cooldown of 10 seconds
So in this situation, the order ABA would have a total value of (1+1.5+1) = 3.5, and it would be acceptable because the first use of A happens at 1 second and the final use of A happens at 3 seconds; the difference between those two is greater than or equal to the cooldown of A, 2 seconds. The order AAB would not work because you'd be doing A only a second apart, less than its cooldown.
My problem is trying to optimize the order in which the actions are used, maximizing the total value over a certain number of actions. Obviously the optimal order if you're only using one action would be to do Action D, resulting in a total value of 3. The maximum value from two actions would come from doing CD or DC, resulting in a total value of 5. It gets more complicated when you do 10 or 20 or 100 total actions. I can't find a way to optimize the order of actions without brute-forcing it, which has complexity exponential in the total number of actions you want to optimize for. That becomes impossible past about 15 total actions.
So, is there any way to find the optimal order with less complexity? Has this problem ever been researched? I imagine there could be some kind of weighted-graph type algorithm that works on this, but I have no idea how it would work, let alone how to implement it.
Sorry if this is confusing -- it's kind of weird conceptually and I couldn't find a better way to frame it.
EDIT: Here is a proper solution using a highly modified Dijkstra's Algorithm:
Dijkstra's algorithm is used to find the shortest path through a map (a graph abstraction): a series of nodes (usually locations, but for this example let's say they are actions) inter-connected by arcs (in this case, instead of a distance, each arc will have a 'value').
Here is the structure in essence.
Graph{//in most implementations these are not Arrays, but Maps. Honestly, for your needs you don't need a graph, just nodes and arcs... this is just used to keep track of them.
node[] nodes;
arc[] arcs;
}
Node{//this represents an action
arc[] options;//for this implementation, this will always be a list of all possible Actions to use.
float value;//Action value
}
Arc{
node start;//the last action used
node end;//the action after that
dist=1;//1 second
}
We can use this data type to map out all of the viable options and find the optimal solution based on the end total of each path. Therefore, the more seconds ahead you look for a pattern, the more likely you are to find a near-optimal path.
Every segment of a road on the map has a distance, which represents its value, and every stop on the road is a one-second mark, since that is the time at which to decide where to go (what action to execute) next.
For simplicity's sake, let's say that A and B are the only viable options.
na means no action, because no actions are available.
If you are travelling for 4 seconds (the higher the amount, the better the results), your choices are...
A->na->A->na->A
B->na->na->B->na
A->B->A->na->B
B->A->na->B->A
...
There are more too, but I already know that the optimal path is B->A->na->B->A, because its value is the highest. So, the established best pattern for handling this combination of actions (at least after analyzing it for 4 seconds) is B->A->na->B->A.
This will actually be quite an easy recursive algorithm.
/*
cur is the action just performed (a Node). In this example every action is a
viable option after every other one, so it's as if every 'place' on the map has
a path going to every other place.
numLeft is the number of seconds left to run the simulation. The higher the
initial value, the more desirable the results.
cooldowns maps each action to the seconds remaining before it can be used again.
The helpers (copy, getTotal, log, freshCooldowns) are assumed; this is still a
sketch, but the bookkeeping is now consistent.
*/
function getOptimal(cur,numLeft,cooldowns){
if(numLeft==0){
return [];//nothing left to schedule
}
var best=[];//best sequence of actions found so far
for(var i=0;i<allActions.length;i++){//allActions includes a zero-value "no action" node
var opt=allActions[i];
if(opt!=noAction && cooldowns[opt]>0){
continue;//still cooling down, cannot be used yet
}
var next=copy(cooldowns);//a COPY, so siblings are not affected
if(opt!=noAction){next[opt]=opt.cooldown;}//the used action goes back on cooldown...
for(var a in next){//...and everything below this is as if it is one second ahead
if(next[a]>0){next[a]-=1;}
}
var candidate=[opt].concat(getOptimal(opt,numLeft-1,next));
if(getTotal(candidate)>getTotal(best)){//getTotal sums the values of an array of actions
best=candidate;
}
}
return best;
}
function getOptimalExample(){
log(getOptimal(noAction,4,freshCooldowns()));//freshCooldowns(): every action ready at t=0
}
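For reference, here is a small runnable Python version of the same brute-force search (my own sketch, not part of the original answer). It memoizes on the (seconds left, cooldown state) pair so repeated states are not re-explored; the values and cooldowns are the ones from the question:

from functools import lru_cache

# (value, cooldown in seconds) per action, taken from the question
ACTIONS = {"A": (1.0, 2), "B": (1.5, 3), "C": (2.0, 5), "D": (3.0, 10)}
NAMES = sorted(ACTIONS)

def best_sequence(seconds):
    @lru_cache(maxsize=None)
    def search(left, cooldowns):
        # cooldowns[i]: seconds before NAMES[i] may be used again
        if left == 0:
            return 0.0, ()
        best_val, best_seq = -1.0, ()
        # None stands for "na" (idle for one second)
        options = [None] + [i for i in range(len(NAMES)) if cooldowns[i] == 0]
        for i in options:
            nxt = list(cooldowns)
            if i is not None:
                nxt[i] = ACTIONS[NAMES[i]][1]        # the used action goes back on cooldown
            nxt = tuple(max(0, c - 1) for c in nxt)  # one second passes for everything
            val, seq = search(left - 1, nxt)
            if i is not None:
                val += ACTIONS[NAMES[i]][0]
                seq = (NAMES[i],) + seq
            else:
                seq = ("na",) + seq
            if val > best_val:
                best_val, best_seq = val, seq
        return best_val, best_seq

    return search(seconds, (0,) * len(NAMES))

print(best_sequence(2))  # (5.0, ('C', 'D')) - matches the two-action example in the question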
End edit.
I'm a bit confused on the question but...
If you have a limited set of actions, and that's it, then always pick the action with the most value, unless its cooldown hasn't elapsed yet.
Sounds like you want something like this (in pseudocode):
function getOptimal(){
var a=[A,B,C,D];//A,B,C, and D are actions
a.sort()//(just pseudocode. Sort the array items by how much value they have, highest first.)
var theBest=null;
for(var i=0;i<a.length;++i){//find the most valuable action that is ready
if(a[i].timeSinceLastUsed>=a[i].cooldown){
theBest=a[i];
for(...){//now just loop through, and add time to each OTHER Action's timeSinceLastUsed...
//...
}//That way, some previously used, but more valuable actions will be freed up again.
break;
}//because a[i] is the most valuable action you can use right now, so why not?
}
}
EDIT: After rereading your problem a bit more, I see that the weighted scheduling algorithm would need to be tweaked to fit your problem statement; in our case we only want to remove from the set those overlapping intervals that match the class of the action we selected, and those that start at the same point in time. I.e. if we select a1, we want to remove a2 and b1 from the set, but not b2.
This looks very similar to the weighted interval scheduling problem, which is discussed in depth in this pdf. In essence, the weights are your actions' values and the intervals are (starttime, starttime + cooldown). The dynamic programming solution can be memoized, which makes it run in O(n log n) time. The only difficult part will be modifying your problem so that it looks like the weighted interval problem, which then allows us to utilize the predetermined solution.
Because your intervals don't have set start and end times (i.e. you can choose when to start a certain action), I'd suggest enumerating all possible start times for all given actions assuming some set time range, then using these static start/end times with the dynamic programming solution. Assuming you can only start an action on a full second, you could run action A for intervals (0-2, 1-3, 2-4, ...), action B for (0-3, 1-4, 2-5, ...), action C for intervals (0-5, 1-6, 2-7, ...) etc. You can then take the union of the actions' interval sets to get a problem space that looks like the original weighted interval problem:
|---1---2---3---4---5---6---7---| time
|{--a1--}-----------------------| v=1
|---{--a2---}-------------------| v=1
|-------{--a3---}---------------| v=1
|{----b1----}-------------------| v=1.5
|---{----b2-----}---------------| v=1.5
|-------{----b3-----}-----------| v=1.5
|{--------c1--------}-----------| v=2
|---{--------c2---------}-------| v=2
|-------{-------c3----------}---| v=2
etc...
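For completeness, here is a sketch of the standard weighted-interval-scheduling DP in Python, assuming the intervals above have already been enumerated as (start, end, value) tuples with end = start + cooldown. It does not implement the extra removal rule from the edit above (same-action and same-start conflicts), so treat it as the skeleton to be tweaked:

import bisect

def max_total_value(intervals):
    # intervals: (start, end, value); touching intervals (end == next start) are compatible
    intervals = sorted(intervals, key=lambda iv: iv[1])  # sort by end time
    ends = [iv[1] for iv in intervals]
    dp = [0.0]  # dp[i] = best value achievable using only the first i intervals
    for i, (s, e, v) in enumerate(intervals):
        # number of earlier intervals that end at or before this one starts
        j = bisect.bisect_right(ends, s, 0, i)
        dp.append(max(dp[i], dp[j] + v))  # either skip this interval or take it
    return dp[-1]

print(max_total_value([(0, 2, 1.0), (1, 3, 1.0), (0, 3, 1.5), (0, 5, 2.0)]))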
Always choose the available action worth the most points.

load balancing algorithms - special example

Let's pretend I have two buildings in which I can build different units.
A building can only build one unit at a time, but it has a FIFO queue of at most 5 units, which will be built in sequence.
Every unit has a build time.
I need to know which assignment gets my units finished as fast as possible, considering the units already in the build queues of my buildings.
"Famous" algorithms like round-robin don't work here, I think.
Are there any algorithms that can solve this problem?
This reminds me a bit of StarCraft :D
I would just add an integer to each building's queue which represents the time it is busy.
Of course you have to update this variable once per time unit. (Time units are "s" here, for seconds.)
So let's say we have a building and we submit 3 units, each taking 5s to complete, which sums up to 15s total. We are at time = 0.
Then we have another building to which we submit 2 units that need 6s each to complete.
So we can have a table like this:
Time 0
Building 1, 3 units, 15s to complete.
Building 2, 2 units, 12s to complete.
Time 1
Building 1, 3 units, 14s to complete.
Building 2, 2 units, 12s to complete.
Now say we want to add another unit that takes 2s: we simply loop through the buildings and pick the one with the lowest time to complete.
In this case this would be building 2. This leads to Time 2...
Time 2
Building 1, 3 units, 13s to complete
Building 2, 3 units, 11s+2s=13s to complete
...
Time 5
Building 1, 2 units, 10s to complete (5s are over, the first unit pops out)
Building 2, 3 units, 10s to complete
And so on.
Of course you have to take care of the upper boundaries in your production facilities: if a building already has 5 elements queued, don't assign to it; pick the next building with the lowest time to complete.
I don't know if you can implement this easily with your engine, or if it even supports some kind of time units.
This just means updating all production facilities once per time unit, O(n) where n is the number of buildings that can produce something. Submitting a unit takes O(1) if you keep the buildings sorted by time to complete, lowest first - it's just a first-element lookup. You then have to re-sort the list after manipulating the queues, e.g. after cancelling or adding units.
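A minimal Python sketch of this per-second bookkeeping (my own illustration; the Building class and function names are made up):

from collections import deque

class Building:
    def __init__(self, name, max_queue=5):
        self.name = name
        self.queue = deque()  # remaining build time (in s) of each queued unit
        self.max_queue = max_queue
    def time_to_complete(self):
        return sum(self.queue)  # seconds until the whole queue is empty

def submit(buildings, build_time):
    # assign to the building that will be free soonest and still has queue space
    open_buildings = [b for b in buildings if len(b.queue) < b.max_queue]
    if not open_buildings:
        return None  # everything is full; the caller must retry later
    best = min(open_buildings, key=Building.time_to_complete)
    best.queue.append(build_time)
    return best.name

def tick(buildings):
    # advance the simulation by one second ("time unit")
    for b in buildings:
        if b.queue:
            b.queue[0] -= 1
            if b.queue[0] == 0:
                b.queue.popleft()  # the first unit pops out

# the example above: 3x5s in building 1, 2x6s in building 2,
# then a 2s unit arrives at time 2 and goes to building 2
b1, b2 = Building("Building 1"), Building("Building 2")
for t in (5, 5, 5): submit([b1], t)
for t in (6, 6): submit([b2], t)
tick([b1, b2]); tick([b1, b2])
print(submit([b1, b2], 2))  # Building 2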
Otherwise amit's answer seems possible, too.
This is an NP-complete problem (proof at the end of the answer), so your best hope of finding the optimal solution is trying all possibilities (2^n of them, where n is the number of tasks).
A possible heuristic was suggested in a comment (and improved in the comments by AShelly): sort the tasks from biggest to smallest and put them in one queue; each building then takes the next element from the queue whenever it finishes its current task.
This is of course not always optimal, but I think it will give good results in most cases.
Proof that the problem is NP-complete:
Let S = {u | u is a unit that needs to be produced} (S is the set containing all 'tasks').
Claim: if a perfect split is possible (both queues finish at the same time), it is optimal. Let this time be HalfTime.
This is true because in any other schedule at least one of the queues has to finish at t > HalfTime, and thus that schedule is not better.
Proof: assume we had an algorithm A that produces the optimal solution in polynomial time. Then we could solve the partition problem in polynomial time with the following algorithm:
1. Run A on the input.
2. If the two queues finish exactly at HalfTime, return True.
3. Else, return False.
This solves the partition problem because of the claim: if a perfect partition exists, it will be found by A, since it is optimal. All steps run in polynomial time (step 1 by assumption, steps 2 and 3 trivially). So the suggested algorithm would solve the partition problem in polynomial time; thus, our problem is NP-complete.
Q.E.D.
Here's a simple scheme:
Let U be the list of units you want to build, and F be the set of factories that can build them. For each factory, track its total time-til-complete, i.e. how long until its queue is completely empty.
Sort U by decreasing time-to-build. Maintain the sort order when inserting new items.
At the start, or at the end of any time tick in which a factory completes a unit or runs out of work:
Make a ready list of all the factories with space in their queue
Sort the ready list by increasing time-til-complete
Get the factory that will be done soonest
Take the first item from U and add it to that factory
Repeat until U is empty or all queues are full.
Googling "minimum makespan" may give you some leads into other solutions. This CMU lecture has a nice overview.
It turns out that if you know the set of work ahead of time, this problem is exactly Multiprocessor_scheduling, which is NP-Complete. Apparently the algorithm I suggested is called "Longest Processing Time", and it will always give a result no longer than 4/3 of the optimal time.
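A sketch of that greedy "Longest Processing Time" rule in Python, keeping a heap keyed on each factory's time-til-complete (the 5-slot queue cap is ignored here for brevity):

import heapq

def schedule_lpt(build_times, num_factories):
    # build_times: durations of the units to schedule, all known ahead of time
    heap = [(0, f) for f in range(num_factories)]  # (time-til-complete, factory id)
    assignment = {f: [] for f in range(num_factories)}
    for t in sorted(build_times, reverse=True):  # longest job first
        done, f = heapq.heappop(heap)            # the factory that frees up soonest
        assignment[f].append(t)
        heapq.heappush(heap, (done + t, f))
    makespan = max(done for done, f in heap)
    return assignment, makespan

print(schedule_lpt([5, 5, 5, 6, 6, 2], 2))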
If you don't know the jobs ahead of time, it is a case of online Job-Shop Scheduling
The paper "The Power of Reordering for Online Minimum Makespan Scheduling" says
for many problems, including minimum
makespan scheduling, it is reasonable
to not only provide a lookahead to a
certain number of future jobs, but
additionally to allow the algorithm to
choose one of these jobs for
processing next and, therefore, to
reorder the input sequence.
Because you have a FIFO on each of your factories, you essentially do have the ability to buffer the incoming jobs, because you can hold them until a factory is completely idle, instead of trying to keep all the FIFOs full at all times.
If I understand the paper correctly, the upshot of the scheme is to:
Keep a fixed-size buffer of incoming jobs. In general, the bigger the buffer, the closer to ideal scheduling you get.
Assign a weight w to each factory according to a given formula, which depends on the buffer size. In the case where buffer size = number of factories + 1, use weights of (2/3, 1/3) for 2 factories and (5/11, 4/11, 2/11) for 3.
Once the buffer is full, whenever a new job arrives, remove the job with the least time to build and assign it to a factory with time-to-complete < w*T, where T is the total time-to-complete over all factories.
If there are no more incoming jobs, schedule the remainder of the jobs in U using the first algorithm I gave.
The main problem in applying this to your situation is that you don't know when (if ever) there will be no more incoming jobs. But perhaps just replacing that condition with "if any factory is completely idle" and then restarting will give decent results.

Generating a set of random events at a predefined frequency

I have a set of events that must occur randomly, but at predefined frequencies: i.e. over a (potentially) infinite stream of events, event A should have occurred 10% of the time, event B 3%, and so on... Of course the percentages over the whole event list add up to 100.
I want to achieve this programmatically. How do I do this?
You haven't specified a language, so here is some pseudo-code.
You basically want a function which calls other functions with various probabilities:
Function RandomEvent
float roll = Random() -- Random number between 0 and 1
if roll < 0.1 then
EventA
else if roll < 0.13 then
EventB
....
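In Python, the same idea generalises to a table of (event, probability) pairs walked with a cumulative roll. A sketch; the event names and probabilities are placeholders:

import random

EVENTS = [("EventA", 0.10), ("EventB", 0.03), ("EventC", 0.87)]  # must sum to 1

def random_event(events=EVENTS):
    roll = random.random()  # uniform in [0, 1)
    cumulative = 0.0
    for name, p in events:
        cumulative += p
        if roll < cumulative:
            return name
    return events[-1][0]  # guard against floating-point rounding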
Interesting description. Without specific details constraining the implementation, I can only offer an idea that you can modify to fit the choices you've already made. If you have a file in which every line contains a single event, construct the file to have 10% A lines, 3% B lines, etc. Then, when choosing an event, randomly generate an integer to select a line number from the file.
You have to elaborate a little more on what you mean. If you just want the probabilities to be as you described, just pick a random number between 1-100 and map it to the events. That is, if the random number is 1-10, do Event A. If it's 11-13, do Event B, etc.
However, if you require things to come out exactly with those proportions at all times (not that this is really possible), you have to do it differently. Please confirm which meaning you are looking for and I'll edit if needed.
For each event, generate a random integer between 0 and 99. If event A should occur 10% of the time, map values 0 - 9 to event A, and so on.
For instance, for 2 events:
n = 0 - 9 ==> Event A
n = 10 - 99 ==> Event B
If you do this, your events can occur at random times, and if the running time is long enough (and your RNG good enough), the event frequencies will approach the desired percentages.
Generate a sequence of events in the exact proportions you want.
For each event, randomly generate a timestamp when each event should be delivered, within your time bounds.
Sort by that timestamp
Run through the list, delivering each event at the appropriate time.
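A sketch of that exact-proportions approach in Python, assuming you know how many of each event you want within some time window:

import random

def schedule_exact(counts, horizon):
    # counts: exact number of each event, e.g. {"A": 10, "B": 3, "other": 87}
    # horizon: the time window to spread the events over
    timeline = [(random.uniform(0, horizon), name)
                for name, n in counts.items()
                for _ in range(n)]
    timeline.sort()  # deliver in timestamp order
    return timeline

for t, name in schedule_exact({"A": 10, "B": 3, "other": 87}, horizon=60.0):
    pass  # deliver event `name` at time `t`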
Choose a random number from 1 to 100 inclusive. Assign each event a unique set of integers whose size represents the frequency at which it should occur. If your randomly generated number falls within an event's range, fire that event.
In the example above, the event that should occur 10% of the time would be assigned a range 10 integers long (1-10, 12-21, etc.). How you store these integer ranges is up to you.
Like Michael said, since these are random numbers there is no way to guarantee that an event fires exactly 10% of the time, but over the long run it should, given an even distribution of random numbers.
