What is the percentage of this algorithm?

I need help determining the percent chance an item has of being picked by my algorithm. Essentially, I created a hashmap of different items and added an integer as each value to make some items less likely to appear. The algorithm works by generating a random number between 0 and the size of the map to pick an item, then rolling another random number between 1 and that chosen item's assigned value. If the roll matches the assigned value, the item is added to an array. Keep in mind this algorithm chooses 3 items out of the map and all 3 must be different, so what is the percent chance each item will be chosen? Obviously you would need to know how many items there are and the values associated with each item, but I can fill those in on my own; I just want a general process for finding these percentages. After I figure out the percent chance each item has of showing up in the final set of 3, I also need to take into account that those 3 items only have a 40% chance of showing up at all.
The first step is to figure out the percent chance each item has of showing up in the final 3, and the last step is to determine the chance those final 3 items have of showing up at all. I was thinking I could divide 3 by the map's total size (for example 20), which would give a 15% chance of landing on each item before taking their assigned values into account, but where do I go from there? I am getting lost with all these different things I need to take into account.
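The exact math needs inclusion-exclusion over the order in which items get accepted, so the quickest sanity check is a Monte Carlo simulation. Here is a minimal sketch, assuming "matches" means the roll equals the item's assigned value (so an item with value v is accepted with probability 1/v per attempt); the class name and example weights are mine:

import java.util.*;

public class DropRateSimulator {
    // Estimates each item's chance of ending up in the final set of 3
    // by running the described algorithm many times.
    static Map<String, Double> simulate(Map<String, Integer> weights, int trials) {
        List<String> items = new ArrayList<>(weights.keySet());
        Map<String, Integer> hits = new HashMap<>();
        Random rng = new Random();
        for (int t = 0; t < trials; t++) {
            Set<String> chosen = new HashSet<>();
            while (chosen.size() < 3) {
                String item = items.get(rng.nextInt(items.size())); // uniform pick
                int value = weights.get(item);
                int roll = rng.nextInt(value) + 1;                  // roll 1..value
                if (roll == value) chosen.add(item);                // accept with prob 1/value
            }
            for (String item : chosen) hits.merge(item, 1, Integer::sum);
        }
        Map<String, Double> percent = new HashMap<>();
        for (String item : items)
            percent.put(item, 100.0 * hits.getOrDefault(item, 0) / trials);
        return percent;
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = Map.of("a", 1, "b", 2, "c", 3, "d", 4, "e", 5);
        simulate(weights, 1_000_000).forEach((item, p) ->
            System.out.printf("%s: %.2f%% of final sets, %.2f%% overall%n", item, p, p * 0.4));
    }
}

Each attempt picks an item uniformly, so its per-attempt acceptance chance is (1/mapSize) * (1/value); the simulation handles the "all 3 distinct" constraint that makes the closed form messy, and the final multiplication by 0.4 applies the 40% chance that the set of 3 appears at all.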

Related

Process to most efficiently fill 6 rows of 21 seats with various group sizes

I have 6 rows of 21 chairs in a building and a booking system where groups of different sizes can book in, with 2 chairs required between each group for social distancing purposes. We want to fit as many people as possible but maintain fairness, so smaller groups booking later don't usurp larger groups who booked earlier. The groups don't have to be seated in chronological order. Lastly, whilst an attempted booking of more than one person might get rightfully rejected because there's no space to accommodate them, a later booking of a single person CAN be accepted if they do fit.
I have almost achieved all the above, but not quite... This is how my process works (I could add a lot of code here, but I'm not really after code, rather I need the explanation of what to change that I can't quite grasp. I hope that's ok):
Order the bookings so far chronologically
Loop through each row, starting with 21 seats remaining
For each booking, check if the group size + 2 fits. If it does, add the group to that row's array, remove it from the bookings array, and reduce the number of seats remaining by the group size + 2. Do this until no remaining bookings will fit on this row.
If there are 0 seats remaining, there will be 2 unnecessary 'buffer' seats at the end of the row, but not even a group of 1 will fit with 2 seats between them and the previous group, so ignore this fact and move on.
If there are seats remaining, go through the remaining bookings AGAIN and see if any group will fit WITHOUT the 2-seat buffer. If one does, add it to the row array, remove it from the bookings array, break the loop, and move on to the next row.
Hopefully you can follow my logic there. It works really well, but it isn't filling the rows in the most efficient way. The bookings don't need to be seated in chronological order, but we can't have previously booked groups pushed out by smaller, more recent bookings just because they fit more efficiently.
Does anyone have any light to shed? My brain is melting!
Since adding small groups is easier than adding large groups, you should place the large groups first.
Suppose the situation is this: you currently have a list of groups that fit. Suddenly, a new group G attempts to book. Try to fit the new group by sorting all the groups by size and placing them largest first, smallest last. If this works, accept the new booking of G with the new placement. But if this results in an earlier group no longer fitting, reject the new group G and keep the old placement.
When you reject a group because you can't fit it, you can also keep the size of that group in memory; the next time a group of equal or larger size attempts to book, you can immediately reject it because you know that size won't fit.
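A minimal sketch of that check, assuming first-fit placement into rows and a 2-seat buffer only between adjacent groups; the class and method names are mine:

import java.util.*;

public class SeatPlacer {
    static final int ROWS = 6, SEATS_PER_ROW = 21, BUFFER = 2;

    // Tries to place all groups, largest first, with a buffer between
    // groups in the same row. Returns true if every group fits.
    static boolean fitsAll(List<Integer> groupSizes) {
        List<Integer> sorted = new ArrayList<>(groupSizes);
        sorted.sort(Collections.reverseOrder());
        int[] used = new int[ROWS]; // seats consumed per row, buffers included
        for (int size : sorted) {
            boolean placed = false;
            for (int r = 0; r < ROWS && !placed; r++) {
                // the first group in a row needs no leading buffer
                int needed = (used[r] == 0) ? size : size + BUFFER;
                if (used[r] + needed <= SEATS_PER_ROW) {
                    used[r] += needed;
                    placed = true;
                }
            }
            if (!placed) return false;
        }
        return true;
    }

    // A new booking is accepted only if all earlier groups still fit.
    static boolean tryBook(List<Integer> accepted, int newGroup) {
        List<Integer> candidate = new ArrayList<>(accepted);
        candidate.add(newGroup);
        if (fitsAll(candidate)) {
            accepted.add(newGroup);
            return true;
        }
        return false; // keep the old placement, reject G
    }
}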

Efficiently and dynamically rank many users in memory?

I run a Java game server where I need to efficiently rank players in various ways. For example, by score, money, games won, and other achievements. This is so I can recognize the top 25 players in a given category to apply medals to those players, and dynamically update them as the rankings change. Performance is a high priority.
Note that this cannot easily be done in the database only, as the ranks will come from different sources of data and different database tables, so my hope is to handle this all in memory, and call methods on the ranked list when a value needs to be updated. Also, potentially many users can tie for the same rank.
For example, let's say I have a million players in the database. A given player might earn some extra points and instantly move from 21,305th place to 23rd place, and then later drop back off the top 25 list. I need a way to handle this efficiently. I imagine that some kind of doubly-linked list would be used, but am unsure of how to handle quickly jumping many spots in the list without traversing it one at a time to find the correct new ranking. The fact that players can tie complicates things a little bit, as each element in the ranked list can have multiple users.
How would you handle this in Java?
I don't know whether there is a library that may help you, but I think you can maintain a min-heap in memory. When a player's score updates, compare it to the root of the heap: if it is less, do nothing; otherwise, adjust the heap.
That means you can maintain a min-heap of 25 nodes holding the highest 25 of all the players in one category.
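In Java this could be a PriorityQueue, which is a min-heap by default. A minimal sketch (it handles only newly offered scores; updating a player already in the heap would additionally need a removal step):

import java.util.PriorityQueue;

class TopScores {
    private static final int CAPACITY = 25;
    private final PriorityQueue<Long> heap = new PriorityQueue<>(); // min-heap: weakest at root

    void offerScore(long score) {
        if (heap.size() < CAPACITY) {
            heap.add(score);
        } else if (score > heap.peek()) { // beats the current 25th place
            heap.poll();                  // drop the weakest
            heap.add(score);
        } // else: not in the top 25, do nothing
    }
}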
Forget linked lists. They allow fast insertion but no efficient searching, so they're of no use here.
Use the following data:
double threshold;
ArrayList<Player> top;
ArrayList<Player> others; (3)
and manage the following properties
each player in top has a score greater than or equal to threshold
each player in others has a score lower than threshold
top is sorted
top.size() >= 25
top.size() < 25 + N where N is some arbitrary limit (e.g., 50)
Whenever some player raises their score, do the following:
- if they're in top, sort top (1)
- if they're in others, check if their score promotes them to top
- if so, remove them from others, insert them in top, and sort top
- if top grew too big, move the N/2 worst players from top to others and update threshold
Whenever some player lowers their score, do the following:
- if they're in others, do nothing
- if they're in top, check if their new score allows them to stay in top
- if so, sort top (1)
- otherwise, demote them to others, and check if top got too small
- if so, determine an appropriate new threshold and move all corresponding players to top (2)
(1) Sorting top is cheap as it's small. Moreover, TimSort (i.e., the algorithm behind Arrays.sort(Object[])) works very well on partially sorted sequences. Instead of sorting immediately, you can simply remember that top is unsorted and sort it later when needed.
(2) Determining a proper threshold can be expensive, and so can moving the players. That's why only N/2 players get moved away from top when it grows too big. This leaves some spare players and makes this case pretty improbable, assuming that players rarely lose score.
EDIT
For managing the objects, you also need to be able to find them in the lists. Either add a corresponding field to Player or use a TObjectIntHashMap.
EDIT 2
(3) When removing an element from the middle of others, simply replace it with the last element and shorten the list by one. You can do this because the order doesn't matter, and you must do it for speed. (4)
(4) The whole others list needn't actually be stored anywhere. All you need is a way to iterate over all the players not contained in top. This can be done with an additional Set or by simply iterating through all the players and skipping those scoring above threshold.
FINAL RECOMMENDATIONS
Forget the others list (unless I'm overlooking something, you won't need it).
I guess you will need no TObjectIntHashMap either.
Use a list top and a boolean isTopSorted, which gets cleared whenever a top score changes or a player gets promoted to top (simple condition: oldScore >= threshold | newScore >= threshold).
For handling ties, make top contain at least 25 differently scored players. You can check this condition easily when printing the top players.
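A rough sketch of these final recommendations, assuming N = 50 (so the list is trimmed at 25 + 50 = 75) and a Player type with a score field; the lazy-sort bookkeeping is the point, the names are mine:

import java.util.*;

class TopList {
    static class Player { String name; long score; }

    private final List<Player> top = new ArrayList<>();
    private long threshold = Long.MIN_VALUE; // every score qualifies until the first trim
    private boolean isTopSorted = true;

    // Call this on every score change.
    void onScoreChanged(Player p, long oldScore, long newScore) {
        if (oldScore >= threshold | newScore >= threshold) { // the simple condition above
            if (newScore >= threshold && !top.contains(p)) top.add(p); // promotion
            isTopSorted = false; // don't sort now; sort lazily when displaying
        }
    }

    // Returns the best players, sorting and trimming only when needed.
    List<Player> top25() {
        if (!isTopSorted) {
            top.sort(Comparator.comparingLong((Player pl) -> pl.score).reversed());
            while (top.size() > 75) top.remove(top.size() - 1); // 25 + N with N = 50
            if (top.size() >= 25) threshold = top.get(top.size() - 1).score;
            isTopSorted = true;
        }
        return top.subList(0, Math.min(25, top.size()));
    }
}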
I assume you may use plenty of memory to do this, or that memory is not a concern for you. Since you want only the top 25 entries for any category, I would suggest the following:
Have a HashSet of Player objects. Player objects have info like name, games won, money, etc.
Now have a HashMap of category name vs. a TreeSet of the top 25 player objects in that category. The category name may be a checksum of some columns, say games won, money, achievements, etc.
HashMap<String /* category name */, TreeSet<Player> /* sorted by the category's criteria */>
Whenever you update a player object, update the common HashSet first and then check if the player object is a candidate for the top 25 entries in any of the categories. If it is a candidate, some other player object may unfortunately lose its ranking and hence get kicked out of the corresponding TreeSet.
>> if you make the TreeSet sorted by the score, it'll break whenever the score changes (and the player will not be found in it)
Correct, now I get the point :). So I would do the following to mitigate the problem. The player object will have a field indicating which categories it is already in, basically a set of categories. While updating a player object, we check whether the player is already in some categories; if it is, we rearrange the corresponding TreeSets first, i.e., remove the player object, adjust the score, and add it back to the TreeSet. Whenever a player object is kicked out of a category, we remove that category from the player's set of categories.
Now, what do you do if the look-up is done with a brand new search criterion (meaning the top 25 has not been computed for this criterion already)?
Traverse the HashMap and build the top entries for this category from scratch. This will be an expensive operation, just like indexing something afresh.
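A sketch of that remove-update-reinsert step; the Player class, score field, and tie-breaking comparator are illustrative (the tie-breaker matters, since a TreeSet collapses comparator-equal players):

import java.util.*;

class CategoryRankings {
    static class Player {
        String name;
        long score;
        Set<String> categories = new HashSet<>(); // categories this player is ranked in
    }

    private final Map<String, TreeSet<Player>> top25ByCategory = new HashMap<>();

    void updateScore(Player p, long newScore) {
        // Remove from every TreeSet *before* changing the sort key,
        // otherwise the player can no longer be found in the set.
        for (String cat : p.categories) rankings(cat).remove(p);
        p.score = newScore;
        for (String cat : new ArrayList<>(p.categories)) { // copy: kicking may mutate the set
            TreeSet<Player> set = rankings(cat);
            set.add(p);
            if (set.size() > 25) {
                Player kicked = set.pollFirst(); // lowest score falls out
                kicked.categories.remove(cat);   // forget its membership
            }
        }
    }

    private TreeSet<Player> rankings(String cat) {
        return top25ByCategory.computeIfAbsent(cat, c ->
            new TreeSet<>(Comparator.comparingLong((Player pl) -> pl.score)
                                    .thenComparing(pl -> pl.name))); // tie-breaker
    }
}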

Algorithm to get probability of reaching a goal

Okay, I'm gonna be as detailed as possible here.
Imagine the user goes through a set of 'options' he can choose. Every time he chooses, he gets, say, 4 different options. There are many more options that can appear in those 4 'slots'. Each of those has a certain definite and known probability of appearing. Not all options are equally probable to appear, and some options require others to have already been selected previously, in a complex interdependence tree. (This I have already defined.)
When the user chooses one of the 4, he is presented another choice of 4 options. The pool of options is defined again and can depend on what the user has chosen previously.
Among all possible 'options' that can ever appear, there are a certain select few which are special, call them KEY options.
When the program starts, the user is presented the first 4 options. For every one of those 4, the program needs to compute the total probability that the user will 'achieve' all the KEY options in a period of (variable) N choices.
e.g. if there are 4 options altogether, the probability of achieving any one of them is exactly 1, since all of them appear right at the beginning.
If anyone can advise me as to what logic I should start with, I'd be very grateful.
I was thinking of counting all possible choice sequences and counting the ones resulting in KEY options being chosen within N 'steps', but the problem is that the probability is not uniform for all of them to appear, and the pool of options changes as the user chooses and accumulates his options.
I'm having difficulty turning the well-defined probabilities and dependencies of the options into an algorithm that can give a sensible total probability, so that the user knows each time which of the 4 puts him in the best position to eventually acquire the KEY options.
Any ideas?
EDIT:
here's an example:
say there are 7 options in the pool: option1, ..., option7
option7 requires option6; option6 requires option4 and option5;
option1 through option5 don't require anything and can appear immediately, with respective probabilities option1.p, ..., option5.p;
the KEY option is, say, option7;
user gets 4 randomly (but weighted) chosen options among 1-5, and the program needs to say something like:
"if you choose (first), you have ##% chance of getting option7 in at most N tries." analogous for the other 3 options.
naturally, for some low N it is impossible to get option7, and for some large N it is certain. N can be chosen but is fixed.
EDIT: So, the point here is NOT the user chooses randomly. Point is - the program suggests which option to choose, as to maximize the probability that eventually, after N steps, the user will be offered all key options.
For the above example, say we choose N = 4. So the program needs to tell us which of the first 4 options that appeared (any 4 among options 1-5), when chosen, yields the best chance of obtaining option7. Since option7 needs option6, and option6 needs option4 and option5, it is clear that you MUST select either option4 or option5 in the first set of choices. One of them is certain to appear, of course.
Let's say we get this for the first choice {option3, option5, option2, option4}. The program then says:
if you chose option3, you'll never get option7 in 4 steps. p = 0;
if you chose option5, you might get option7, p=....;
... option2, p = 0;
... option4, p = ...;
Whatever we choose, for the next 4 options the p's are recalculated. Clearly, if we chose option3 or option2, every further choice has exactly 0 probability of getting us to option7. But for option4 and option5, p > 0.
Is it clearer now? I don't know how to get these probabilities p.
This sounds like a moderately fiddly Markov chain type problem. Create a node for every state; a state has no history and depends only on the possible paths out of it (each weighted with some probability). You put a probability on each node, the chance that the user is in that state: for the first step, there will be a 1 on his starting node and 0 everywhere else. Then, according to which nodes are adjacent and the chances of getting to them, you iterate to the next step by updating the probabilities on each vertex. This way you can easily calculate which states the user could land on in, say, 15 steps, and the associated probabilities. If you are interested in asymptotic behaviour (what would happen if he could play forever), you set up a big pile of simultaneous linear equations and solve them, either directly or using some tricks if your tree or graph has a neat form. You often end up with cyclical solutions, where the user could get stuck in a loop, and so on.
If you think the user selects the options at random, and he is always presented the same distribution of options at a node, you can model this as a random walk on a graph. There was a recent nice post on calculating termination probabilities of particular random walks on the Mathematica blog.
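A minimal sketch of the iteration described above: hold one probability per state and push the mass forward through the transition weights for N steps. The transition matrix is a placeholder you would build from your option pool and dependencies; making every "all KEY options achieved" state absorbing (a self-loop with probability 1) lets the final vector read off the chance of reaching the goal within N choices:

public class MarkovWalk {
    // prob[s] after the loop = chance of being in state s after `steps` choices.
    static double[] propagate(double[][] transition, int start, int steps) {
        int n = transition.length;
        double[] prob = new double[n];
        prob[start] = 1.0; // the user starts here with certainty
        for (int step = 0; step < steps; step++) {
            double[] next = new double[n];
            for (int from = 0; from < n; from++) {
                if (prob[from] == 0) continue; // unreachable so far
                for (int to = 0; to < n; to++)
                    next[to] += prob[from] * transition[from][to];
            }
            prob = next;
        }
        return prob;
    }
}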

Amortizing the calculation of distribution (and percentile), applicable on App Engine?

This is applicable to Google App Engine, but not necessarily constrained for it.
On Google App Engine, the database isn't relational, so no aggregate functions (such as sum, average, etc.) are available. Each row is independent of the others. To calculate a sum or average, the app simply has to amortize the calculation by updating it on each individual new write to the database so that it's always up to date.
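For sum and average, that amortized bookkeeping is tiny; a sketch:

// Running aggregates updated on every write, so reads are O(1).
class RunningStats {
    private long count;
    private double sum;

    void onWrite(double value) { count++; sum += value; }
    double average() { return count == 0 ? 0 : sum / count; }
}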
How would one go about calculating percentile and frequency distribution (i.e. density)? I'd like to make a graph of the density of a field of values, and this set of values is probably on the order of millions. It may be feasible to loop through the whole dataset (the limit for each query is 1000 rows returned), and calculate based on that, but I'd rather do some smart approach.
Is there some algorithm to calculate or approximate density/frequency/percentile distribution that can be calculated over a period of time?
By the way, the data is indeterminate in that the maximum and minimum may be all over the place. So the distribution would have to take approximately 95% of the data and only do a density based on that.
Getting whole rows (with that limit of 1000 at a time...) over and over again just to get a single number per row is surely unappealing. So denormalize the data by recording that single number in a separate entity that holds a list of numbers (up to a limit of, I believe, 1 MB per entity, so with 4-byte numbers no more than 250,000 numbers per list).
So when adding a number, also fetch the latest "added data values" list entity; if it's full, make a new one instead. Append the new number and save it. There's probably no need to be transactional if a tiny error in the statistics is no killer, as you appear to imply.
If the data for an item can be changed, have separate entities of the same kind recording the "deleted" data values; to change one item's value from 23 to 45, add 23 to the latest "deleted values" list and 45 to the latest "added values" one. This covers item deletion as well.
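A sketch of that bookkeeping with plain objects standing in for the datastore entities; nothing here is an App Engine API, the names are illustrative:

import java.util.ArrayList;
import java.util.List;

class ValueLog {
    static final int MAX_PER_ENTITY = 250_000; // ~1 MB of 4-byte numbers

    static class ListEntity { List<Integer> values = new ArrayList<>(); }

    final List<ListEntity> added = new ArrayList<>();
    final List<ListEntity> deleted = new ArrayList<>();

    // Append to the latest entity, starting a fresh one when it's full.
    void record(List<ListEntity> log, int value) {
        if (log.isEmpty() || log.get(log.size() - 1).values.size() >= MAX_PER_ENTITY) {
            log.add(new ListEntity());
        }
        log.get(log.size() - 1).values.add(value);
    }

    // Changing an item's value from oldValue to newValue: log the old value
    // as deleted and the new one as added; deletion alone logs only the old value.
    void change(int oldValue, int newValue) {
        record(deleted, oldValue);
        record(added, newValue);
    }
}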
It may be feasible to loop through the whole dataset (the limit for each query is 1000 rows returned), and calculate based on that, but I'd rather do some smart approach.
This is the most obvious approach to me; why are you trying to avoid it?

Generating a set of random events at a predefined frequency

I have a set of events that must occur randomly, but at a predefined frequency; i.e., over a course of (totally) infinite events, event A should have occurred 10% of the time, event B 3%, and so on... Of course, the percentages of all the events in the list add up to 100.
I want to achieve this programmatically. How do I do this?
You haven't specified a language, so here comes some pseudo-code
You basically want a function which will call other functions with various probabilities:
Function RandomEvent
    float roll = Random() -- random number between 0 and 1
    if roll < 0.1 then
        EventA
    else if roll < 0.13 then
        EventB
    ....
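In Java, a minimal version of this (event bodies are placeholders) might be:

import java.util.Random;

public class EventDispatcher {
    private static final Random RNG = new Random();

    // Calls one event, chosen by cumulative probability thresholds.
    static void randomEvent() {
        double roll = RNG.nextDouble(); // uniform in [0, 1)
        if (roll < 0.10) {
            eventA();      // 10%
        } else if (roll < 0.13) {
            eventB();      // 3% (the 0.10 to 0.13 band)
        } else {
            eventOther();  // the remaining 87%
        }
    }

    static void eventA() { /* ... */ }
    static void eventB() { /* ... */ }
    static void eventOther() { /* ... */ }
}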
Interesting description. Without specific details constraining the implementation, I can only offer an idea that you can modify to fit the choices you've already made about your implementation. If you have a file in which every line contains a single event, construct the file to have 10% A lines, 3% B lines, etc. Then, when choosing an event, randomly generate an integer to select a line number from the file.
You have to elaborate a little more on what you mean. If you just want the probabilities to be as you described, just pick a random number between 1-100 and map it to the events. That is, if the random number is 1-10, do Event A. If it's 11-13, do Event B, etc.
However, if you require things to come out exactly with those proportions at all times (not that this is really possible), you have to do it differently. Please confirm which meaning you are looking for and I'll edit if needed.
For each event, generate a random number between 0 and 99. If event A should occur 10% of the time, map values 0-9 to event A, and so on.
For instance, for 2 events:
n = 0 - 9 ==> Event A
n = 10 - 99 ==> Event B
If you do this, you can have your events occur at random times, and if the running time is long enough (and your RNG is good enough), event frequencies will approach the desired percentages.
Generate a sequence of events in the exact proportions you want.
For each event, randomly generate a timestamp at which it should be delivered, within your time bounds.
Sort by that timestamp.
Run through the list, delivering each event at the appropriate time.
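A minimal sketch of those steps, with shuffling standing in for the generate-timestamps-and-sort pair (the two are equivalent when only the order matters); the event names and proportions are from the example above:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class EventSchedule {
    // Builds one "cycle" of events in exact proportions, then randomizes the order.
    static List<String> build(int total) {
        List<String> events = new ArrayList<>();
        for (int i = 0; i < total * 10 / 100; i++) events.add("A"); // exactly 10%
        for (int i = 0; i < total * 3 / 100; i++) events.add("B");  // exactly 3%
        while (events.size() < total) events.add("other");          // the remainder
        Collections.shuffle(events); // random order, exact proportions preserved
        return events;
    }
}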
Choose a random number from 1 to 100 inclusive. Assign each event a unique range of integers whose size represents the frequency with which it should occur. If your randomly generated number falls within a particular event's range, fire the associated event.
In the example above, the event that should occur 10% of the time would be assigned a range 10 integers long (1-10, 11-20, etc.). How you store these integer ranges is up to you.
Like Michael said, since these are random numbers there is no way to guarantee an event fires exactly 10% of the time, but over the long run it should, given an even distribution of random numbers.
