Algorithm to Make a Set of Random Outcomes Approach a Specific Percentage - algorithm

Currently, I have a pool of basketball players where I have a projected total of points for each player. Additionally, I have a normal distribution function that gives me a random drawing from a normal distribution for each player. Currently, I have an algorithm that calculates n unique random lineups of 8 players based on some constraints. Between each lineup, the normal distribution function runs again to produce new predictions for each player. Then the best lineup is produced for that specific set of predictions.
I would like to tweak this algorithm in the following way. I would like to have 4 tiers of maximum and minimum percentages where each player is assigned a tier. Within the number of lineups generated, I would like each specific player to occur with that frequency. So for example if I wanted to generate 10 lineups and player 1 is in tier 1 which requires the player to be between 50-60%, then the player would occur in 5-6 lineups ideally.
I'm struggling with how to modify my current algorithm to include this stipulation. Any thoughts would be greatly appreciated! I just don't know how to force each player within a specific range of percentages.

There are a lot of ways to do it.
Here is an easy approach. Keep a current relative odds of being picked for each player. The actual probability is the relative odds divided by the sum of the odds. Each person starts with the expected number of times be selected. Whenever someone is selected, their relative odds is reduced by 1. If it goes below 0, that person is out of the pool.
This approach guarantees that each player will not be in more than a maximum number of teams. It makes it unlikely, but not impossible, that any given player will be in fewer teams than you want.
An easy way to solve that is to randomly round people's desired frequencies up and down to get the right integer count. And now everything has to come even.
There is yet another problem, though. Which is that it is possible that you'll not succeed in assignment to fill all the teams. But if you go from the most popular player to the least, the odds of such mistakes should be acceptably low. Doubly so if you widen the ranges slightly by populating a few extra teams, then throwing away ones that didn't work out.

First draft
So if I understand correctly, you have N players that might appear in the first
position of the string. But you want them to be selected not at random, but according
to some percentage.
Now the first step is to normalize those percentages:
Alice 20%
Bob 40%
Charlie 10%
Doug 60%
Eric 30%
The sum is 160%, so you generate a random number from 1 to 160; say it's 97.
97 is more than 20, so subtract 20 and ignore Alice.
77 is more than 40, so subtract 40 and ignore Bob.
37 is more than 10, so subtract 10 and ignore Charlie.
27 is less than 60: Doug it is.
You can also pre-populate a 160-element array with 20 "Alice" indexes, 60 "Doug" indexes etc., and your player is players[array[random(160)]].

Related

Get all possible valid positions of ships in battleship game

I'm creating probability assistant for Battleship game - in essence, for given game state (field state and available ships), it would produce field where all free cells will have probability of hit.
My current approach is to do a monte-carlo like computation - get random free cell, get random ship, get random ship rotation, check if this placement is valid, if so continue with next ship from available set. If available set is empty, add how the ships were set to output stack. Redo this multiple times, use outputs to compute probability of each cell.
Is there sane algorithm to process all possible ship placements for given field state?
An exact solution is possible. But does not qualify as sane in my books.
Still, here is the idea.
There are many variants of the game, but let's say that we start with a worst case scenario of 1 ship of size 5, 2 of size 4, 3 of size 3 and 4 of size 2.
The "discovered state" of the board is all spots where shots have been taken, or ships have been discovered, plus the number of remaining ships. The discovered state naively requires 100 bits for the board (10x10, any can be shot) plus 1 bit for the count of remaining ships of size 5, 2 bits for the remaining ships of size 4, 2 bits for remaining ships of size 3 and 3 bits for remaining ships of size 2. This makes 108 bits, which fits in 14 bytes.
Now conceptually the idea is to figure out the map by shooting each square in turn in the first row, the second row, and so on, and recording the game state along with transitions. We can record the forward transitions and counts to find how many ways there are to get to any state.
Then find the end state of everything finished and all ships used and walk the transitions backwards to find how many ways there are to get from any state to the end state.
Now walk the data structure forward, knowing the probability of arriving at any state while on the way to the end, but this time we can figure out the probability of each way of finding a ship on each square as we go forward. Sum those and we have our probability heatmap.
Is this doable? In memory, no. In a distributed system it might be though.
Remember that I said that recording a state took 14 bytes? Adding a count to that takes another 8 bytes which takes us to 22 bytes. Adding the reverse count takes us to 30 bytes. My back of the envelope estimate is that at any point in our path there are on the order of a half-billion states we might be in with various ships left, killed ships sticking out and so on. That's 15 GB of data. Potentially for each of 100 squares. Which is 1.5 terabytes of data. Which we have to process in 3 passes.

How to give players a score on a ranking/prediction task?

I have a website built with php/mysql, and I am looking for help in communicating to a Programmer what I want him to do with a Poll/Prediction game that I am trying to create.
For purposes of discussion, assume a game where perhaps 100 players try to predict the top 5 finishers in a Golf Tournament of perhaps 9 Golfers.
I am looking for help in how to create and assign a score based upon the accuracy of prediction.
The players provide a rank ordering using a drag and drop function to order the players from 1 through 5. This ordering has already been coded, and the ranks are stored somehow in the DB (I do not know how).
My initial thinking is to ask the coder to create a script which will assign a score from 1 to 5 for each Golfer that the player nominated to be in the Top 5.
So, a player who predicted perfectly would be awarded a perfect score of 12345.
His first golfer received a 1 for finishing first, second a 2 for finishing second, third golfer receives a 3 for finishing third, and so on.
Anybody less than perfect would have a score higher than 12345.
Players who got the first four positions correct would have to be differentiated on the basis of the finish of their fifth Golfer.
So, one might score 12347 and the other 12348 and the player with the highest score (12348) would be the loser in a matchup of the two players.
A player who did poorly, might have a score of 53419.
Question:
Is this a viable way of creating a score which the players of my game can be ranked upon?
Is it possible to instead simply have something like a Spearman Rank-Order Correlation calculated comparing the Actual Finish Positions with the Predicted Finish Positions for each player,
and then rank players on the basis of the correlation coefficients for their rankings?
Thanks for any help in clarifying how to conceptualize this before approaching a programmer who gets annoyed when I don't really know what I want him to do ahead of time.
It's a quite interesting problem.
It seems that there are three components that need to be considered in the scoring: the number of correct predictions, the order of correct predictions, and the weight of correct predictions.
For example, assume the truth is:
1,5,10,15,20
Here are some predictions:
1,6,7,8,9 : only predicted first one
2,1,10,21,30 : 1 and 10, but the order of 1 is incorrect
20,15,1,5,30 : hit four in the top 5, but the orders are incorrect
It depends on what you value most. You may first check how many in the top 5 the user has predicted, add a value, and then penalize wrong orders. The weight for each position should also be different, this way
1,5,10,15,20 will rank higher than 1,5,10,20,15 and higher than 1,10,5,20,15
Spearman may be working, but I feel it could be too coarse for your purpose.
This is actually a very similar problem that search engines have. EG, in search engine evaluation, the actual outcomes are preferred results provided by humans, and the predicted outcomes are the results delivered by the search engine. In both your task and for search engines, I'd guess you care a lot more about the accuracy of the winner than the accuracy of the 5th place finisher. If that is the case, then the mean average precision is probably a good measure.

Evaluate Poker HandRange A vs. Poker HandRange B

I have this problem: I want to know how often a player holding a portfolio of poker hands beats another player holding a different portfolio of poker hands.
Each hand in a portfolio is given a weight (i.e. a likelihood). Each hand in a portfolio also knows it's own "strength". This effectively means all cards have been dealt. So please assume no more cards need to be dealt.
The reason this problem is annoying is because of duplicate card problems. For example, if I pick a random holding from each player's portfolio I must check that these holdings don't share a card -- obviously both plays can't be dealt the same card.
I want to do this quickly so that I can make many different RangeA vs RangeB comparisions per second. I have a solution, but I won't talk about it yet because I don't want to taint any responces.
-- For an Example --
Given a 5 card board of "Ah 3c 8c Td Jh":
HandRangeA = {{"As Ac", 2.5%}, {"As Ad", 2.5%}, {"Ac Kc", 5%}....}
HandRangeB = {{"As Ac", 7.5%}, {"As Ad", 7.5%}, {"Ac Kc", 5%}....}
(Each HandRange contains all possible holding that don't use a "board card")
Goal :: compute the probaility HandRangeA beats HandRangeB
I wrote some software that did this via monte carlo. That means I ran both hands to completion, 1000 times with random boards that could arrive given the situation, and counted wins and losses. It was surprisingly accurate.
Since I was doing it for texas holdem, I would do the same thing after the (1) deal, (2) flop, 3 (turn) so the player could see how their percentages changed given the board.
I really should have finished that software. But I stopped playing poker online....
I think Andrew Prock is considered the expert here; check out the discussion here, and links therein.
I think you want something like this:
probWin = 0
For Each HandA in RangeA
probA = getProbability(HandA)
For Each HandB in RangeB
probB = getConditionalProbability(HandB, HandA)
probWin += probA * probB * getProbabilityADefeatsB(HandA, HandB)
You need to consider conditional probability, because given HandA is As Ac, there is no longer a 7.5% chance that HandB is As Ac (in fact, there is a 0% chance of that). So you are taking the probability of A having a particular hand multiplied by the probability of B having a particular hand, given what A has, multiplied by the probability of A's hand beating B's hand. That should give you the probability of A having that hand against that particular hand of B's and winning. Iterating over all such pairs should give the desired result I think.
Since the approach is exhaustive, there is no need for any sort of Monte Carlo simulation. Of course this will be O(n^2) where n is the number of possible hands, but n here is relatively small.
EDIT:
I should note that since you are referring to the case where all cards have been dealt, the getProbabilityADefeatsB() function would return either 1 or 0. Also, getConditionalProbability() will either be exactly 0 (because the hands share a card) or simply what your regular weight was. It would be more complicated if the hands were less specific (if HandA is AA then HandB could be a different flavor of AA, but it is less likely).

slot machine payout calculation

There's this question but it has nothing close to help me out here.
Tried to find information about it on the internet yet this subject is so swarmed with articles on "how to win" or other non-related stuff that I could barely find anything. None worth posting here.
My question is how would I assure a payout of 95% over a year?
Theoretically, of course.
So far I can think of three obvious variables to consider within the calculation: Machine payout term (year in my case), total paid and total received in that term.
Now I could simply shoot a random number between the paid/received gap and fix slots results to be shown to the player but I'm not sure this is how it's done.
This method however sounds reasonable, although it involves building the slots results backwards..
I could also make a huge list of all possibilities, save them in a database randomized by order and simply poll one of them each time.
This got many flaws - the biggest one is the huge list I'm going to get (millions/billions/etc' records).
I certainly hope this question will be marked with an "Answer" (:
You have to make reel strips instead of huge database. Here is brief example for very basic 3-reel game containing 3 symbols:
Paytable:
3xA = 5
3xB = 10
3xC = 20
Reels-strip is a sequence of symbols on each reel. For the calculations you only need the quantity of each symbol per each reel:
A = 3, 1, 1 (3 symbols on 1st reel, 1 symbol on 2nd, 1 symbol on 3rd reel)
B = 1, 1, 2
C = 1, 1, 1
Full cycle (total number of all possible combinations) is 5 * 3 * 4 = 60
Now you can calculate probability of each combination:
3xA = 3 * 1 * 1 / full cycle = 0.05
3xB = 1 * 1 * 2 / full cycle = 0.0333
3xC = 1 * 1 * 1 / full cycle = 0.0166
Then you can calculate the return for each combination:
3xA = 5 * 0.05 = 0.25 (25% from AAA)
3xB = 10 * 0.0333 = 0.333 (33.3% from BBB)
3xC = 20 * 0.0166 = 0.333 (33.3% from CCC)
Total return = 91.66%
Finally, you can shuffle the symbols on each reel to get the reels-strips, e.g. "ABACA" for the 1st reel. Then pick a random number between 1 and the length of the strip, e.g. 1 to 5 for the 1st reel. This number is the middle symbol. The upper and lower ones are from the strip. If you picked from the edge of the strip, use the first or last one to loop the strip (it's a virtual reel). Then score the result.
In real life you might want to have Wild-symbols, free spins and bonuses. They all are pretty complicated to describe in this answer.
In this sample the Hit Frequency is 10% (total combinations = 60 and prize combinations = 6). Most of people use excel to calculate this stuff, however, you may find some good tools for making slot math.
Proper keywords for Google: PAR-sheet, "slot math can be fun" book.
For sweepstakes or Class-2 machines you can't use this stuff. You have to display a combination by the given prize instead. This is a pretty different task, so you may try to prepare a database storing the combinations sorted by the prize amount.
Well, the first problem is with the keyword assure, if you are dealing with random, you cannot assure, unless you change the logic of the slot machine.
Consider the following algorithm though. I think this style of thinking is more reliable then plotting graphs of averages to achive 95%;
if( customer_able_to_win() )
{
calculate_how_to_win();
}
else
no_win();
customer_able_to_win() is your data log that says how much intake you have gotten vs how much you have paid out, if you are under 95%, payout, then customer_able_to_win() returns true; in that case, calculate_how_to_win() calculates how much the customer would be able to win based on your %, so, lets choose a sampling period of 24 hours. If over the last 24 hours i've paid out 90% of the money I've taken in, then I can pay out up to 5%.... lets give that 5% a number such as 100$. So calculate_how_to_win says I can pay out up to 100$, so I would find a set of reels that would pay out 100$ or less, and that user could win. You could add a little random to it, but to ensure your 95% you'll have to have some other rules such as a forced max payout if you get below say 80%, and so on.
If you change the algorithm a little by adding random to the mix you will have to have more of these caveats..... So to make it APPEAR random to the user, you could do...
if( customer_able_to_win() && payout_percent() < 90% )
{
calculate_how_to_win(); // up to 5% payout
}
else
no_win();
With something like that, it will go on a losing streak after you hit 95% until you reach 90%, then it will go on a winning streak of random increments until you reach 95%.
This isn't a full algorithm answer, but more of a direction on how to think about how the slot machine works.
I've always envisioned this is the way slot machines work especially with video poker. Because the no_win() function would calculate how to lose, but make it appear to be 1 card off to tease you to think you were going to win, instead of dealing with a 'fair' game and the random just happens to be like that....
Think of the entire process of.... first thinking if you are going to win, how are you going to win, if you're not going to win, how are you going to lose, instead of random number generators determining if you will win or not.
I worked many years ago for an internet casino in Australia, this one being the only one in the world that was regulated completely by a government body. The algorithms you speak of that produce "structured randomness" are obviously extremely complex especially when you are talking multiple lines in all directions, double up, pick the suit, multiple progressive jackpots and the like.
Our poker machine laws for our state demand a payout of 97% of what goes in. For rudely to be satisfied that our machine did this, they made us run 10 million mock turns of the machine and then wanted to see that our game paid off at what the law states with the tiniest range of error (we had many many machines running a script to auto playing using a script to simulate the click for about a week before we hit the 10 mil).
Anyhow the algorithms you speak of are EXPENSIVE! They range from maybe $500k to several million per machine so as you can understand, no one is going to hand them over for free, that's for sure. If you wanted a single line machine it would be easy enough to do. Just work out you symbols/cards and what pay structure you want for each. Then you could just distribute those payouts amongst non-payouts till you got you respective figure. Obviously the more options there are means the longer it will take to pay out at that respective rate, it may even payout more early in the piece. Hit frequency and prize size are also factors you may want to consider
A simple way to do it, if you assume that people win a constant number of times a time period:
Create a collection of all possible tumbler combinations with how much each one pays out.
The first time someone plays, in that time period, you can offer all combinations at equal probability.
If they win, take that amount off the total left for the time period, and remove from the available options any combination that would payout more than you have left.
Repeat with the reduced combinations until all the money is gone for that time period.
Reset and start again for the next time period.

Voting algorithm: how to calculate rank?

I am trying to figure our a way to calculate rank. Right now it simply takes ratio of wins / losses of each individual entry, so e.g. one won 99 times out of a 100, it has 99% winning rank. BUT if an entry won 1 out of total 1 votes, it will have a 100% winning rank, but definitely it can't be higher that of the one that won 99 times. What would be a better way to do this?
Try something like this:
votes = wins + losses
score = votes * ( wins / votes )
That way, something with 50% wins, but a million votes would still be ahead of something with 100% wins but only one vote.
You can add in an extra weight based on age (in days in this example), too, something like
if age < 5:
score = score + ((highest real score on site) * ((5 - age) / 5)
This will put brand new entries right at the top of the first page, and then they will move slowly down the list over the course of the next 5 days (I'm assuming age is a fractional number, not just an integer). After the 5 days are up, they will be put in the list based solely on the score from the previous bit of pseudo-code.
Depending on how complicated you want to make it, the Elo system chess uses (or something similar) may be what you want: http://en.wikipedia.org/wiki/Elo_rating_system
Even if a person has won 1/1 matches, his rating would be far below someone who has won/lost hundreds of matches against tough opponents, for instance.
You could always use a point system rather than win/loss ratio. Winning would always give points and then you could play around with either removing points for losing, not awarding points at all for losing, or awarding less points for losing. It all depends on exactly how you want people to be ranked. For example you may want to give 2 points for winning and 1 point for losing if you want to favor people who participate over those who do not (which sounds kind of like what you were talking about in your example of the person playing 100 games vs 1 game). The NHL uses a similar technique for rankings (2 points for a win, 1 point for an overtime loss, 0 points for a regular loss). That might give you some more flexibility.
if i understand the question correctly, then whoever gets more votes has the higher rank.
Would it make sense to add more rank to winning entry if losing entry originally had a much higher rank, e.g. much stronger competitor?

Resources